In Situ Spatial Omics technology (Xenium In Situ) enables the expression analysis of hundreds of RNAs in Fresh Frozen (FF) and Formalin-Fixed Paraffin-Embedded (FFPE) tissues with precise subcellular localization (200-nanometer resolution). Compared to Visium spatial transcriptomics, it offers significantly higher detection precision. This technology is poised to profoundly impact our understanding of tumor microenvironments, immunology, neuroscience, cell specificity, and biological development, ushering in a new era of spatial single-cell/subcellular research.
1.Pioneering Platform Adoption: One of the earliest companies in China to introduce the Xenium platform. Novel Bio successfully completed the APT test experiment and has officially commenced client projects, enabling immediate sample processing to accelerate scientific outcomes.
2.Official Designation: Novel Bio is an officially designated Xenium In Situ Analysis Technology Service Provider by 10x Genomics.
3.Proven Expertise & Infrastructure: Contributed to multiple high-impact publications integrating single-cell and spatial transcriptomics. Equipped with advanced cryostats and digital slide scanners, and possesses extensive experience in spatial multi-omics data analysis.
4.Advanced Data Analysis: Leveraging Novel Bio's proprietary CytoNavigator, a high-throughput production-level data analysis system, to provide personalized, customized data analysis services, ensuring optimal research outcomes for clients.
Sample Types:
Fresh Frozen (FF) Samples: Fresh tissue embedded in OCT and snap-frozen at -80°C. Can be stored and transported with the embedding mold. Orientation marks should be made on the mold to determine sectioning direction.
FFPE Samples: Embedded paraffin blocks or 3-5 paraffin sections. Store at 4°C and transport dry.
Other Important Notes:
1.Researchers who mount sections themselves should contact Novel Bio to obtain Xenium-specific slides. Samples must be kept at low temperature throughout processing and transportation.
2.For multi-sample pooling on a single slide, the same Panel must be used for all samples on that slide.
3.Tissue size must be less than 10.45 mm × 22.45 mm (the actual capture area size).
4.Section QC: H&E staining for quality control to assess tissue morphology and spatial information preservation. DV200 > 30% (DV200 is the percentage of RNA fragments >200nt in the sample).
5.Species & Panel Info: Commercial panels currently support only Human and Mouse species (custom addition of up to 100 genes is possible). For other species with well-annotated transcriptomes, custom panel design is required (please contact us to discuss details before sample submission).
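The size and DV200 criteria in the notes above can be expressed as a small pre-submission check. The following is a minimal sketch only: the function names are hypothetical, and DV200 is computed here by simply counting fragments, whereas a fragment analyzer normally derives it from the electropherogram trace.

```python
# Hypothetical pre-submission check; all names are illustrative only.
CAPTURE_AREA_MM = (10.45, 22.45)  # capture area stated in the notes above
MIN_DV200_PERCENT = 30.0          # DV200 threshold stated in the notes above


def dv200(fragment_lengths_nt):
    """Percentage of RNA fragments longer than 200 nt (simple count-based estimate)."""
    if not fragment_lengths_nt:
        raise ValueError("no fragments supplied")
    long_fragments = sum(1 for length in fragment_lengths_nt if length > 200)
    return 100.0 * long_fragments / len(fragment_lengths_nt)


def fits_capture_area(width_mm, height_mm, area=CAPTURE_AREA_MM):
    """True if the tissue fits inside the capture area in either orientation."""
    short_side, long_side = sorted((width_mm, height_mm))
    area_short, area_long = sorted(area)
    return short_side < area_short and long_side < area_long


def passes_submission_check(width_mm, height_mm, fragment_lengths_nt):
    """Combine the size criterion and the DV200 criterion."""
    return (fits_capture_area(width_mm, height_mm)
            and dv200(fragment_lengths_nt) > MIN_DV200_PERCENT)
```

For example, a 9 mm × 20 mm section with half of its fragments above 200 nt would pass both criteria under these assumptions.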
Xenium analysis yields interpretable and visualizable results immediately after the run completes, requiring no additional sequencing. The data is also compatible with third-party algorithm packages like Seurat and stLearn, enabling cell-based and spatial dimension analysis and in-depth interpretation of target data, combining ease of use with analytical depth.
Ⅰ.Experimental Groups:
1.Xenium In Situ Platform: Mouse brain (FF/FFPE) & Human breast cancer, lung cancer, glioblastoma (FFPE)
2.Other Imaging Platforms:
MERFISH: Mouse brain (FF)
CosMx: Human breast cancer (FFPE)
MERSCOPE: Mouse brain (FF)
Molecular Cartography: Human breast cancer (FFPE)
3.Single-cell RNA-seq: Mouse whole brain atlas dataset
4.Visium Spatial Transcriptome: Mouse brain coronal section dataset
Ⅱ.Capture Platforms:
Xenium In Situ, MERFISH, CosMx, MERSCOPE, Molecular Cartography
Ⅲ.Primary Technical Methods:
Tissue Domain identification, Spatially Variable Feature (SVF) detection, Cell type annotation & clustering, Gene Imputation, etc.
The Xenium In Situ platform is a novel spatial transcriptomics product developed by 10x Genomics, capable of in situ localization of hundreds of genes at subcellular resolution. Faced with numerous spatial transcriptomics technologies, selecting the appropriate platform and establishing analysis guidelines becomes crucial. This study selected 25 Xenium datasets from different tissues and species, comparing them with eight other spatially resolved transcriptomics technologies and commercial platforms, focusing on scalability, resolution, data quality, functional advantages, and limitations. Concurrently, the researchers benchmarked the performance of various open-source computational tools on Xenium datasets, covering tasks like data preprocessing, cell segmentation, spatial feature selection, and domain identification. This study not only independently validates Xenium's performance but also provides best-practice guidelines and operational recommendations for analyzing such datasets.
Fig.1
Imaging-based spatially resolved transcriptomics (SRT) technologies enable targeted, high-throughput detection of individual RNA molecules through fluorescence microscopy. These methods are divided by their underlying chemistry into in situ hybridization approaches (e.g., MERFISH, seqFISH) and in situ sequencing methods (e.g., ISS, STARmap). Commercial platforms such as CosMx, Molecular Cartography, and seqFISH have rapidly emerged in this space. Among them, the 10x Genomics Xenium platform, which uses in situ sequencing chemistry, claims to deliver subcellular-resolution spatial mapping for hundreds of genes (Fig. 1a). While 10x Genomics has conducted internal technical validation and limited benchmarking on its own data, a comprehensive independent evaluation has been lacking. This study addresses that gap by systematically comparing Xenium against multiple SRT technologies, characterizing its data properties, advantages, and limitations, and optimizing computational analysis workflows to demonstrate its potential for innovative biological applications.
To thoroughly characterize Xenium data properties, researchers integrated 25 experimental datasets (comprising 14 independent studies) generated from 10x Genomics platforms and Xenium instruments. This comprehensive collection spanned multiple sample types, containing 1.2 billion reads across 6 million cells (Fig. 1b). Each sample profiled 210-392 genes, with all datasets providing 3D spatial coordinates (x,y,z), gene identities, and quality values (QV). Notably, 81% of reads demonstrated QV scores >20 (range: 72-91%), with no significant differences observed between fresh frozen (FF) and formalin-fixed paraffin-embedded (FFPE) sections (Fig. 1b). Following default segmentation parameters, each cell captured an average of 186.6 reads, with 76.8% of total reads successfully assigned to cells. Only 0.21% of cells were filtered out due to containing <10 reads, indicating that Xenium is well-suited for accurate cell type frequency assessment.
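The summary statistics quoted above (share of reads above a quality threshold, share of reads assigned to cells, share of cells dropped for having fewer than 10 reads) can be computed from a transcript table along the following lines. This is a sketch under assumptions: the record layout with `qv` and `cell_id` keys is hypothetical, and the quality filter is assumed to be applied before the other two statistics, which the text does not specify.

```python
from collections import Counter


def summarize_reads(reads, qv_threshold=20, min_reads_per_cell=10):
    """Summarize read quality and cell assignment.

    reads: list of dicts with keys 'qv' (quality value) and
    'cell_id' (None when the read is not assigned to a cell).
    """
    if not reads:
        raise ValueError("no reads supplied")
    # Keep reads whose quality value exceeds the threshold (QV > 20 in the text).
    high_quality = [r for r in reads if r["qv"] > qv_threshold]
    # Among those, keep reads that fall inside a segmented cell.
    assigned = [r for r in high_quality if r["cell_id"] is not None]
    reads_per_cell = Counter(r["cell_id"] for r in assigned)
    # Cells with fewer than the minimum number of reads are filtered out.
    filtered = [c for c, n in reads_per_cell.items() if n < min_reads_per_cell]
    return {
        "fraction_high_quality": len(high_quality) / len(reads),
        "fraction_assigned": len(assigned) / len(high_quality) if high_quality else 0.0,
        "fraction_cells_filtered": len(filtered) / len(reads_per_cell) if reads_per_cell else 0.0,
    }
```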
2.Xenium Enables Reproducible Identification of Populations
Fig.1
To deeply investigate the characteristics of Xenium datasets, researchers analyzed seven adjacent complete coronal section datasets from mouse brain. Xenium's cell identification algorithm first segments DAPI-stained nuclei, then expands the segmentation mask. By analyzing the gene-count matrix of segmented nuclei, researchers identified 50 cell types mappable to tissue structures, constructing a cell type atlas (Fig. 1c). When categorizing these cells into anatomical tissue regions, consistent distribution of region-specific cell types was observed across sections (Fig. 1d). Datasets generated from independent experiments using the same probe panel on similar samples showed high consistency in gene-specific detection efficiency, data dispersion, and single-cell read counts. Cell type proportions remained stable across experiments, with significant differences observed only for low-abundance cell populations due to biological variation between samples.
3.Xenium Retains Crucial 3D and Subcellular Information
Fig.1
Leveraging the three-dimensional coordinates provided by Xenium, researchers bypassed conventional segmentation steps and employed the segmentation-free SSAM model to directly analyze molecular spatial features. This approach successfully captured 44 cell type-specific clusters while revealing that 1.8% of cells exhibited signal mixing due to Z-axis overlap (Fig. 1e). Further application of the Points2Regions system enabled identification of subcellular mRNA clusters, which were categorized into three distinct classes: nuclear, cytoplasmic, and extracellular (Fig. 1f-i). The analysis uncovered systematic nuclear-cytoplasmic expression differences within identical cell populations, thereby transforming the dataset into a comprehensive 3D subcellular atlas that extends far beyond conventional 2D expression matrices.
4.Xenium Detection Efficiency Is Comparable to In Situ Hybridization Platforms
Fig.2
Using mouse brain tissue, whose cellular composition has been thoroughly characterized by scRNA-seq and multiple SRT methods, as a benchmark, researchers systematically compared the performance of Xenium against several imaging platforms (Vizgen MERSCOPE, HS-ISS, MERFISH, Resolve Molecular Cartography, and NanoString CosMx) and the sequencing-based Visium platform (Fig. 2a). To eliminate segmentation bias, all datasets were processed with uniform nucleus segmentation using Cellpose, with <10-30% of reads assigned to cells (Fig. 2a). When analysis was restricted to isocortex, hippocampal, and thalamic regions, CosMx demonstrated the highest reads per cell, with read counts increasing proportionally with the number of detected genes (Fig. 2b). Using scRNA-seq as a reference to calculate gene detection efficiency across platforms, results showed that Xenium's sensitivity was comparable to ISH technologies like MERSCOPE and Molecular Cartography, and 1.2-1.5 times higher than Chromium v2 (Fig. 2c). Cross-platform clustering further confirmed that molecules per cell were similar across commercial SRT platforms (Fig. 2e), indicating convergence in detection efficiency across the industry.
Fig.2
To compare Xenium with sequencing-based methods, researchers benchmarked it at the tissue level against the most commonly used spatial transcriptomics method, Visium (FF). Because Visium lacks single-cell resolution, gene-specific pseudo-bulk reads were counted and normalized by area within shared anatomical regions. Results showed Xenium had significantly higher sensitivity, with median read counts 12.8 times those of Visium, and genes only weakly detected by Visium were highly enriched in Xenium (Fig. 2g). To measure specificity, researchers proposed the Negative Co-expression Purity (NCP) metric (closer to 1 indicates higher specificity). The average NCP for all SRT technologies was >0.8, with HS-ISS and Molecular Cartography being the most specific. Xenium was slightly lower but consistently outperformed CosMx, a conclusion unchanged after removing highly expressed genes (Fig. 2d). Subcellular localization analysis showed that MERFISH/ISS reads clustered closer to the cell centroid, while reads from the commercial ISH platforms (CosMx, Molecular Cartography, MERSCOPE) were distributed farther out, with differences particularly prominent at the single-gene level (Fig. 2f).
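The area-normalized pseudo-bulk comparison described above can be illustrated as follows. This is only a sketch of the general idea, not the study's code: the function names, the per-gene count dictionaries, and the choice to skip genes with zero counts on the reference platform are all assumptions.

```python
from statistics import median


def read_density(counts_by_gene, area_mm2):
    """Pseudo-bulk reads per gene divided by the tissue area they were counted in."""
    if area_mm2 <= 0:
        raise ValueError("area must be positive")
    return {gene: count / area_mm2 for gene, count in counts_by_gene.items()}


def median_fold_change(counts_a, area_a_mm2, counts_b, area_b_mm2):
    """Median over shared genes of density_a / density_b.

    Genes missing from b or with zero counts in b are skipped (an assumption),
    because their ratio is undefined.
    """
    density_a = read_density(counts_a, area_a_mm2)
    density_b = read_density(counts_b, area_b_mm2)
    ratios = [density_a[g] / density_b[g]
              for g in density_a if density_b.get(g, 0) > 0]
    if not ratios:
        raise ValueError("no shared genes with non-zero reference counts")
    return median(ratios)
```

Called with the counts from one platform as `counts_a` and the other as `counts_b` for the same anatomical region, the function returns a single fold-change figure of the kind quoted in the text.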
5.Nuclear Expansion Impacts Cell Type Expression Profiles
Fig.3
Nuclear expression features are often sufficient to define cell populations in situ, but incorporating cytoplasmic reads can enhance cell clustering and marker identification. Based on this hypothesis, Xenium's default nuclear segmentation uses a 15 μm radius expansion. This expanded gene-cell matrix identified cell types clustering by region-specific patterns, contrasting with the more homogeneous classification obtained from unexpanded segmentation. For example, thalamic oligodendrocytes clustered with thalamic astrocytes rather than other oligodendrocytes, suggesting expansion captures region-specific expression features. To determine the optimal cell expansion parameter, researchers defined nuclear expression signatures for each cell type and region-specific background expression features.
Fig.3
Analysis showed that transcripts located on average more than 10.71 μm from the cell centroid had gene expression correlating more highly with region-specific background features than with cell type-specific nuclear features (Fig. 3a, b). This distance likely reflects the average radius of the cellular profile encompassing both nucleus and cytoplasm. Given that the average nuclear radius in this dataset was 5.06 μm, the ideal expansion for cells in the sample would be about 5.65 μm (10.71 μm − 5.06 μm). However, different cell types exhibited different optimal expansion distances (Fig. 3b), so a segmentation strategy that applies one fixed expansion after nuclear identification cannot suit every cell type.
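The relationship between the distances in this paragraph is a subtraction: the expansion is the estimated radius of the whole cellular profile minus the nuclear radius. A minimal sketch, with a hypothetical function name and using only the rounded values quoted above:

```python
def ideal_expansion_um(profile_radius_um, nuclear_radius_um):
    """Expansion beyond the nucleus needed to reach the estimated cell boundary."""
    if profile_radius_um < nuclear_radius_um:
        raise ValueError("cell profile radius cannot be smaller than the nuclear radius")
    return profile_radius_um - nuclear_radius_um


# Values quoted in the text: 10.71 um profile radius, 5.06 um mean nuclear radius.
expansion = ideal_expansion_um(10.71, 5.06)
print(f"ideal expansion: {expansion:.2f} um")
```

Because the optimal distance differs between cell types, the same function would in practice be evaluated per cell type rather than once per sample.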
6.Baysor and Cellpose Outperform Standard Xenium Segmentation
Fig.3
The impact of cell segmentation on cell typing accuracy prompted researchers to explore other segmentation methods. By comparing the performance of Xenium segmentation with commonly used strategies (Fig. 3c), they found these strategies could be categorized into three main classes: Stain-based (using dyes like DAPI to determine cell location, e.g., Watershed, MESMER, Cellpose); Read-based (defining cells based on tissue read density and composition, e.g., Baysor); and Hybrid models (using both stain and read positions, e.g., Baysor, Clustermap). As a baseline segmentation, binning-based methods assuming uniform tissue distribution were also included. Furthermore, researchers applied cell expansion parameters of 1, 2, 5, 10, and 15 μm to each segmentation result.
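To make the effect of the expansion parameter concrete, the sketch below assigns reads to nuclei for a given expansion distance. It is a deliberate simplification and not any of the benchmarked tools: nuclei are approximated as circles, each read goes to the nearest nucleus centre within radius plus expansion, and all names are hypothetical.

```python
import math

EXPANSIONS_UM = (1, 2, 5, 10, 15)  # expansion values listed in the text


def assign_to_nuclei(reads, nuclei, expansion_um):
    """Assign each read to the nearest nucleus within its expanded radius.

    reads:  list of (x_um, y_um, gene) tuples
    nuclei: dict mapping cell_id -> (centre_x_um, centre_y_um, radius_um)
    Returns a list with one cell_id (or None if unassigned) per read.
    """
    assignments = []
    for x, y, _gene in reads:
        best_id, best_dist = None, float("inf")
        for cell_id, (cx, cy, radius) in nuclei.items():
            dist = math.hypot(x - cx, y - cy)
            # A read is eligible if it lies inside the expanded circle.
            if dist <= radius + expansion_um and dist < best_dist:
                best_id, best_dist = cell_id, dist
        assignments.append(best_id)
    return assignments


def assigned_fraction(reads, nuclei, expansion_um):
    """Fraction of reads that end up inside some expanded nucleus."""
    assignments = assign_to_nuclei(reads, nuclei, expansion_um)
    return sum(a is not None for a in assignments) / len(assignments)
```

Sweeping `assigned_fraction` over `EXPANSIONS_UM` shows the trade-off discussed in this section: larger expansions capture more reads, but also more reads that may belong to neighbouring cells or background.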
Fig.3
Researchers then filtered for strategy combinations with similar performance (Fig. 3d). DAPI stain-based strategies generated similar outputs, with cell expansion being the primary driver of differences. Additionally, clustering results based on Baysor, Clustermap, and binning strategies showed method-specific patterns, indicating these strategies have distinct segmentation output characteristics. Researchers defined the optimal segmentation strategy as the one maximizing the proportion of reads assigned to cells while preserving specific expression patterns (quantified by Negative Marker Purity, NMP). NMP calculates the expected percentage of reads detected for each cell type based on reference single-cell RNA sequencing data. They found that Baysor-based strategies – particularly the combination incorporating Xenium nucleus segmentation (BA2 P0.8) – performed best (Fig. 3e).
Fig.3
Notably, using Xenium's segmentation results as a prior significantly reduced missed cells. These conclusions were consistent across all datasets. Finally, researchers jointly analyzed cells segmented using the optimal strategy (BA2 P0.8) with Xenium nucleus segmentation results (Fig. 1c). Although the Baysor strategy yielded higher single-cell counts, the cell populations identified by both segmentation strategies were fully concordant (Fig. 3f-h), with differences in cell type abundance manifesting mainly as minor fluctuations. Overall, the analysis indicates that Xenium's default nuclear segmentation mask provides sufficient information to define the major populations detected in situ, performing comparably to more complex segmentation strategies.
7.Preparing Xenium Data: Best Practices for Preprocessing
Fig.4
To identify cell populations in situ, researchers used the Census single-cell RNA sequencing dataset as a reference, simulating Xenium data through a three-step process involving gene subset reduction, adjusted detection efficiency, and injected segmentation errors & technical noise (Fig. 4a). They systematically compared various preprocessing workflows and hyperparameter combinations, including normalization, scaling, highly variable feature selection, and clustering. By assessing the similarity between new clusters and reference labels, they determined the optimal workflow to be: library size normalization to 100, log transformation, scaling, constructing a k-nearest neighbor graph using all principal components and 16 neighbors, and performing Louvain clustering. This combination consistently maximized the agreement between original and newly generated clusters (Fig. 4b, c).
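The first three steps of the workflow named above (library size normalization to 100, log transformation, scaling) can be sketched in plain Python as follows. The function names are hypothetical, `log1p` is assumed as the log transformation, and scaling is assumed to mean per-gene z-scoring with the population standard deviation; the k-nearest neighbor graph and Louvain clustering are left to dedicated single-cell libraries and are not reproduced here.

```python
import math
from statistics import mean, pstdev


def normalize_library_size(counts, target_sum=100.0):
    """Rescale each cell (row) so that its counts sum to target_sum."""
    normalized = []
    for cell in counts:
        total = sum(cell)
        factor = target_sum / total if total > 0 else 0.0
        normalized.append([value * factor for value in cell])
    return normalized


def log1p_transform(matrix):
    """Apply log(1 + x) to every entry."""
    return [[math.log1p(value) for value in cell] for cell in matrix]


def scale_genes(matrix):
    """Z-score each gene (column); genes with zero variance are set to 0."""
    n_cells, n_genes = len(matrix), len(matrix[0])
    scaled = [[0.0] * n_genes for _ in range(n_cells)]
    for g in range(n_genes):
        column = [matrix[i][g] for i in range(n_cells)]
        mu, sigma = mean(column), pstdev(column)
        for i in range(n_cells):
            scaled[i][g] = (column[i] - mu) / sigma if sigma > 0 else 0.0
    return scaled


def preprocess(counts):
    """Normalize to 100 reads per cell, log-transform, then scale."""
    return scale_genes(log1p_transform(normalize_library_size(counts, 100.0)))
```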
Fig.4
Surprisingly, some top-performing workflows could omit log transformation, retaining only scaling, indicating that there is no universal preprocessing recipe for spatial data. Subsequent validation on simulated data confirmed normalization method, library size, scaling factors, and number of principal components as key tuning parameters (Fig. 4d). Applying the identified optimal workflow to real Xenium data and re-tuning parameters (Fig. 4e) revealed that highly variable feature selection, normalization, and scaling were indispensable; omitting any of these steps significantly altered clustering results, a finding consistent across datasets and highly concordant with simulated data.
8.Selecting Spatially Variable Features with Xenium Datasets
Fig.4
As an alternative to Highly Variable Feature (HVF) selection, researchers compared eight common Spatially Variable Feature (SVF) algorithms (Moran's I, Geary's C, Hotspot, SomDE, SpatialDE, Sinfonia, Seurat mark variogram, Giotto) using the full Xenium dataset. They found the proportion of genes identified as SVFs varied significantly between methods, but gene rankings were largely consistent except for SpatialDE, Seurat mark variogram, and Geary's C (for 5000 cells) (Fig. 4f, g). When calling genes as SVFs, Seurat, Sinfonia, and others flagged a high proportion of genes, while Hotspot and Squidpy were more conservative (Fig. 4h-j). Using control probes lacking spatial variation to assess false positives, Hotspot performed best (erroneous selection <5%), while algorithms with high SVF proportions also had elevated false positives, suggesting that noise is still misclassified as spatial signal. In contrast, algorithms used for HVF selection completely excluded control probes and stably detected ≈18% HVFs (Fig. 4i, j).
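Of the statistics listed above, Moran's I is the simplest to write out. The sketch below uses the standard definition, I = (N / W) · Σ w_ij (x_i − x̄)(x_j − x̄) / Σ (x_i − x̄)², with binary weights for pairs of cells closer than a chosen radius. The radius-based neighbourhood and the function name are assumptions; the benchmarked packages build their neighbour graphs in their own ways.

```python
import math


def morans_i(coords, values, radius):
    """Moran's I with binary weights (w_ij = 1 if cells i and j are within radius)."""
    n = len(values)
    mean_value = sum(values) / n
    deviations = [v - mean_value for v in values]
    denominator = sum(d * d for d in deviations)
    if denominator == 0:
        return 0.0  # a constant feature carries no spatial signal
    weight_sum = 0.0
    cross_product = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= radius:
                weight_sum += 1.0
                cross_product += deviations[i] * deviations[j]
    if weight_sum == 0:
        raise ValueError("no neighbouring pairs within the given radius")
    return (n / weight_sum) * (cross_product / denominator)
```

Values near +1 indicate that similar expression levels cluster in space, values near 0 indicate no spatial structure, and negative values indicate an alternating pattern. The double loop is quadratic in the number of cells, so it is only suitable for illustration on small inputs.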
9.Benchmarking Gene Imputation Tools on Xenium Datasets
Fig.5
To overcome the gene throughput limitation of targeted SRT, researchers adapted the Li et al. pipeline (Fig. 5a), benchmarking seven imputation methods (gimVI, SpaGE, Tangram, LIGER, Seurat, SpaOTsc, novoSpaRc) using PCC, SSIM, RMSE, and JS metrics. Results consistently identified SpaGE as the best performer, with Seurat, Tangram, and SpaOTsc also performing very well, while gimVI integration performed worse than previously reported (Fig. 5b). By comparing measured and imputed expression, the pipeline also identified genes with low overall consistency between scRNA-seq and Xenium. Quantifying PCC differences per gene-method pair revealed significant variation in imputation performance between genes, a pattern consistent across methods (Fig. 5c). Further analysis showed that expression level and overall inter-gene correlation were the most significant determinants of imputation performance, while transcript subcellular localization or expression variability had minimal influence (Fig. 5d-f).
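Two of the metrics named above, PCC and RMSE, compare the measured expression of a held-out gene with its imputed expression across cells. A minimal sketch with hypothetical function names follows; RMSE is computed on the raw values here, although benchmarking pipelines may standardize expression first.

```python
import math


def pearson_cc(measured, imputed):
    """Pearson correlation coefficient between measured and imputed expression."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_i = sum(imputed) / n
    covariance = sum((a - mean_m) * (b - mean_i) for a, b in zip(measured, imputed))
    var_m = sum((a - mean_m) ** 2 for a in measured)
    var_i = sum((b - mean_i) ** 2 for b in imputed)
    if var_m == 0 or var_i == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return covariance / math.sqrt(var_m * var_i)


def rmse(measured, imputed):
    """Root mean square error between measured and imputed expression."""
    squared_errors = [(a - b) ** 2 for a, b in zip(measured, imputed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

A higher PCC and a lower RMSE both indicate better imputation for a given gene.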
10.Evaluating Computational Tools for Exploring Tissue Architecture
Fig.5
Deciphering tissue architecture is fundamental to understanding its function. While the development of reliable tissue domain identification tools has attracted significant attention, independent comparative studies have remained scarce. To address this gap, researchers benchmarked five domain identification algorithms (Banksy, DeepST, SpaGCN, SPACEL, and STAGATE) against expert manual annotations of regions in the Allen Brain Atlas coronal P56 section (Fig. 5g). The study also included two straightforward cell compartment identification methods (a binning-based approach and a neighborhood-based approach). Results demonstrated that binning-based clustering predictions showed the highest concordance with manual annotations, outperforming more complex algorithms (Fig. 5h). However, these findings might be influenced by the specific structural characteristics of the analyzed tissue type, suggesting that method performance may vary across different tissue types.
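The binning-based approach mentioned above can be pictured as tiling the tissue into square bins and describing each bin by its cell type composition; bins with similar compositions would then be grouped into domains by any clustering method. The sketch below covers only the composition step, and the bin size, input layout, and function name are assumptions rather than details taken from the study.

```python
import math
from collections import Counter, defaultdict


def bin_composition(cells, bin_size_um):
    """Cell type fractions per square spatial bin.

    cells: iterable of (x_um, y_um, cell_type) tuples
    Returns {(bin_x, bin_y): {cell_type: fraction}}.
    """
    counts = defaultdict(Counter)
    for x, y, cell_type in cells:
        key = (math.floor(x / bin_size_um), math.floor(y / bin_size_um))
        counts[key][cell_type] += 1
    composition = {}
    for key, type_counts in counts.items():
        total = sum(type_counts.values())
        composition[key] = {t: n / total for t, n in type_counts.items()}
    return composition
```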
Based on the evidence presented, the researchers propose an end-to-end optimized workflow for Xenium data processing and analysis (integrated graphical summary). Starting with raw Xenium data, first perform nucleus identification using Cellpose, then use Baysor for read assignment/cell resegmentation, without expansion. If segmentation is poor, segmentation-free methods such as SSAM or Points2Regions can be used instead to extract molecular features directly. Subsequently, process the cell-gene matrix using standard scRNA-seq workflows (cell filtering -> normalization & log transformation -> PCA -> dimensionality reduction -> clustering) for cell type identification. For Spatially Variable Feature ranking, multiple algorithms can be used, but their results differ and should be compared. If scRNA-seq data is available, use Seurat, SpaGE, or Tangram for gene imputation. Domain identification can employ binning-based strategies or algorithms such as Banksy or SPACEL.
Link to Original Article: https://doi.org/10.1038/s41592-025-02617-2