3D reconstruction
Aligning Open-ST metastatic lymph node¶
In this tutorial, we will illustrate how to perform the alignment of the 3 serial sections from a human metastatic lymph node profiled with Open-ST, that we just preprocessed with spacemake
and openst
.
Make sure you have activated the openst
environment you created previously.
Importing stimwrap
¶
STIM
is a console-based tool.
However, when running your analysis in the Python
ecosystem (like here), you can transparently run STIM
from Python
by leveraging the wrapper stimwrap
.
stimwrap
provides bindings for all commands, and additional tools for data preprocessing and conversion prior to downstream analysis.
Creating a N5 container from AnnData
objects¶
STIM requires that the individual section data is resaved into a single N5 container.
This allows to have a single directory containing all data (spatial expression values), metadata (cell annotations), and the output from STIM registration (landmarks, transformation matrices).
You can find more about the N5 standard here.
Note
In this tutorial, we assume you could pairwise-align and segment all sections. However, you can also do 3D registration of hexbin objects, which are created automatically by spacemake
. In that case, adapt the paths accordingly.
import glob
import os
import re
workdir = "/home/user/openst_demo"
spacemake_folder = "spacemake/projects/mLN/processed_data/mLN_*/multimodal"
sections = glob.glob(os.path.join(workdir, spacemake_folder, "stitched_segmented.h5ad"))
sections_numbers = [2, 3, 4]
sections_numbers, sections = zip(*sorted(zip(sections_numbers, sections)))
# we create an alias because STIM needs unique file names
for section, number in zip(sections, sections_numbers):
alias = re.search(r'mLN_S\w+', section).group(0)
directory = os.path.dirname(section)
link_path = os.path.join(os.path.join(directory, f"{alias}_stitched_segmented.h5ad"))
if os.path.exists(link_path):
os.remove(link_path)
os.symlink(section, link_path)
sections = glob.glob(os.path.join(workdir, spacemake_folder, "*_stitched_segmented.h5ad"))
sections_numbers = [2, 3, 4]
sections_numbers, sections = zip(*sorted(zip(sections_numbers, sections)))
print(*[f"Section ID: {s}; Z-axis: {n}" for s, n in zip(sections, sections_numbers)], sep="\n")
Make sure the sections are in order, because these will be used as the relative Z-axis order.
With the code above, we have sorted the files according to their names, which contain the Z-axis offset (equal to section number).
Now, we add the slices into a single N5 container:
Pairwise registration¶
As indicated in its name, STIM will handle ST data as images. These are multi-channel images where the XY dimensions can be specified by a scaling factor (e.g., 1:1 to map 1 pixel to 1 ST unit), and the channels are genes.
During pairwise registration, STIM will automatically find corresponding points between pairs of sections for a subset of genes, and keep those with high quality/agreement across all genes for a pair of sections. This is required prior to assembling a global alignment model (when more than 2 sections are provided).
A subset of genes is used to avoid registering with all genes (in sequencing-based ST, this can lead to ~30,000 channels). This might be too time-consuming, and also most genes do not have sufficient information to render images with spatial patterns that can be used for feature detection (sparsity problem). By default, STIM detects genes with highest variance as a proxy for genes that might show suitable spatial patterns. Otherwise, the user can specify a set of genes used to render images for pairwise alignment. This is what we will do in this tutorial.
import scanpy as sc
import h5py
# Compute common genes
common_genes = set([g.decode("utf-8") for g in h5py.File(sections[0], 'r')['var/_index'][:].tolist()])
for s in sections[1:]:
with h5py.File(s, 'r') as s_data:
common_genes.intersection(set([g.decode("utf-8") for g in s_data['var/_index'][:].tolist()]))
# Load one of the sections; these data is already normalized
adata = sc.read_h5ad(sections[0])
# Filter and normalize
sc.pp.calculate_qc_metrics(adata, inplace=True)
sc.pp.filter_cells(adata, min_counts=50)
sc.pp.filter_cells(adata, max_counts=10000)
sc.pp.normalize_total(adata, inplace=True)
sc.pp.log1p(adata)
# Subset to genes common across all sections
adata = adata[:, adata.var_names.isin(common_genes)]
# Detect and select highly variable genes (common across all sections), this leads to 15 of them
sc.pp.highly_variable_genes(adata, flavor="seurat", min_mean=0.2, max_mean=0.6)
hvg_genes = adata.var_names[adata.var['highly_variable']].tolist()
Note
Adding more genes might increase the accuracy of pairwise alignment, but 10 seems to be enough for this example dataset. Adding more genes increases the time required for pairwise alignment.
Now, we run the pairwise alignment with the hvg_genes
Importantly, if you try to run pairwise alignment more than once, you need to specify the argument overwrite=True
when calling st.align_pairs
.
Otherwise, STIM
assumes that pairwise alignment was performed and will exit.
Visualization of results¶
Note
To run the interactive visualization or alignment tools via st.align_pairs_view
, st.align_interactive
or st.explorer
, make sure you are running this notebook in a computer with a graphical environment, or that you are doing redirection of the window server (e.g., X11 redirection via ssh -X ...
).
You can learn more about this here.
It is good practice to manually assess the results of pairwise alignment before proceeding or using these data for analysis, as the set of parameters used for registration might have not been suitable for the data at hand. Some reasons leading to poor alignment might be:
- Poor selection of the subset of genes used for alignment
- Scale (or render factor) parameter too large or too small
- Poor selection of alignment error (
--maxEpsilon
) parameter - Data is too noisy and might need some filtering (e.g., with
--ffMedian
or--ffSingleSpot
)
STIM provides GUI-based tools to interactively assess the result from pairwise alignment:
Alternatively, you can use an interactive pairwise alignment tool to find suitable parameters, iteratively
Global alignment¶
Once you are satisfied with the results from the pairwise alignment of pairs of sections, you can proceed with the global alignment.
This last step optimizes a global model taking into account all pairs of keypoints.
This reduces error propagation across sections, which might lead to very large distortions in the reconstruction.
Visualization of results¶
Similarly as before, it is good practice to visualize the results after the global alignment procedure.
STIM can leverage BigDataViewer
for 3D visualization of these data:
Note
To run BigDataViewer
, similarly to above, make sure you are running this notebook in a computer with a graphical environment, or that you are doing redirection of the window server (e.g., X11 redirection via ssh -X ...
).
Storing the 3D coordinates in AnnData
¶
Prior to analysing these objects with scanpy
or other tools from the scverse
ecosystem, you can apply the transformation model
and store the transformed 3D coordinates as a new layer in the AnnData
(or N5
) objects.
# iterate over datasets and apply the computed transformation
for z_axis, dataset_name in zip(sections_numbers, container.get_dataset_names()):
with container.get_dataset(dataset_name, mode="r+") as dataset:
dataset.apply_save_transform(transformation="model_sift",
locations='spatial',
destination='spatial_transform_sift',
z_coord=z_axis)
Demo: interoperability with scanpy
¶
Here, we showcase the interoperability of STIM (via AnnData
-backed N5) by plotting genes and running some data processing with scanpy
.
First of all, you can create a single AnnData
object that can be loaded at once with scanpy
(all cells in the same file).
Note¶
If you get errors in the cell above because of __DATA_TYPES__
or because of column-order
, run the following command:
Plotting gene expression¶
We transpose the Z-axis coordinates for plotting with scanpy (from a different point of view)
We use the pl.embedding
function for faster plotting (also, axes are scaled to the same magnitude). One can alternatively use the pl.spatial
function
Plotting normalized gene expression¶
In the plot above, raw counts are shown. As different sections had different sequencing depth, the intensities are not fully comparable.
For improved visualization (and downstream analysis), we can normalize the values across sections. This follows the typical scanpy workflow.
Now we can plot again, which will show depth and log-normalized counts.
From here we can proceed with downstream analysis, like cell type clustering, differential expression, discovery of spatial features...