Generating a cell-by-gene matrix¶
After pairwise alignment, the same coordinate system is shared between the spatial barcodes and the staining images.
However, analysis (e.g., clustering, pseudotime, DGE...) is performed on single cells, not on individual capture areas (0.6 μm in the current version of the protocol).
So, we show how to aggregate the \(N\times G\) matrix (\(N\) spots; \(G\) genes) into a \(M\times G\) matrix (\(M\) segmented cells; \(G\) genes), where \(N\) maps to \(M\) via the segmentation mask.
Segmentation of staining image¶
To create such a spatial cell-by-gene (\(M\times G\)) expression matrix, you will first need a segmentation mask.
We efficiently segment cells (or nuclei) from staining images using cellpose. We provide a model that we fine-tuned for segmentation of fresh-frozen, H&E-stained tissue, here. You can specify any other model that works best for your data - refer to the cellpose documentation.
openst segment \
--adata <path_to_aligned_h5ad> \
--image-in <image_in_path> \
--output-mask <mask_out_path> \
--model <path>/HE_cellpose_rajewsky \
--chunked \ # divides the image into smaller chunks (lower memory usage)
--gpu \ # uses GPU for segmentation (Nvidia CUDA)
--num-workers 8 \ # processes the image in parallel
--dilate-px 10 # will extend the segmentation 10 micron around
<...>
). For instance,
<path_to_aligned_h5ad>
is the full path to the h5ad
file after pairwise alignment;
<image_in_path>
is the path to the image - a path to a file, or a location inside the h5ad
file,
like 'uns/spatial_pairwise_aligned/staining_image_transformed'
(our recommendation).
<mask_out_path>
is the location where the segmentation mask will be saved - can be a file or a location in the h5ad
file,
like uns/spatial_pairwise_aligned/mask_transformed_10px
(our recommendation). The <model_path>
for the parameter --model
is the name or location of the cellpose model weights.
We recommend using the model provided in our repo
for segmentation of H&E images The rest of parameters can be checked with openst segment --help
.
Tip
If your sample also contains very large cells (e.g., adipocytes) that are not segmented with the previous parameters, you can perform a second segmentation with a cellpose model, adjusting the diameter parameter.
openst segment \
--adata <path_to_aligned_h5ad> \
--image-in <image_in_path> \
--output-mask <mask_out_path_larger> \
--model <path>/HE_cellpose_rajewsky \
--chunked \ # divides the image into smaller chunks (lower memory usage)
--gpu \ # uses GPU for segmentation (Nvidia CUDA)
--num-workers 8 \ # processes the image in parallel
--dilate-px 50 \
--diameter 50 # diameter for the larger cell type
Replace the placeholders (<...>
) as before; in this case, the placeholder <mask_out_path_larger>
must be different from the
<mask_out_path>
provided above.
And then, you can combine the segmentation masks of both diameter configurations.
This command will apply an "AND" between all images, to only preserve mask of non-overlapping,
with the hierarchy provided in the --image-in
argument (first has higher priority).
openst segment_merge \
--adata <path_to_aligned_h5ad> \
--mask-in <mask_a> <mask_b>
--mask-out <mask_combined>
Replace the placeholders (<...>
) as before; in this case, the placeholder <mask_a>
, <mask_b>
... must correspond
to the placeholders <mask_out_path>
, <mask_out_path_larger>
...
Assigning transcripts to segmented cells¶
Now, we aggregate the initial \(N\times G\) matrix into an \(M\times G\) matrix, where \(N\) maps to \(M\) via the segmentation mask.
This step allows you to associate capture spots with segmented cells.
openst transcript_assign \
--adata <path_to_aligned_h5ad> \
--spatial-key spatial_pairwise_aligned_fine \
--mask-in-adata \
--mask <mask_out_path> \
--output <path_to_sc_h5ad>
Replace the placeholders (<...>
) as before; in this case, the placeholder <mask_in_path>
must be set to
be equal to the <mask_out_path>
(or <mask_combined>
if you ran multiple segmentation); also, <path_to_sc_h5ad>
must be set to a valid path and filename where the output cell-by-gene matrix (not barcode-by-cell) will be written.
Expected output¶
After running the steps above, you will have a single h5ad
file, containing the transcriptomic information per segmented cell,
with spatial coordinates compatible with the staining image. The staining image and the segmented image are provided in this object,
so it is possible to visualize it with squidpy or spatialdata,
among other tools.
So, this concludes the preprocessing of 2D spatial transcriptomics and imaging data of the Open-ST protocol. Next steps include 3D reconstruction, and downstream analysis of nD data.