spateo.tools#

Package Contents#

Classes#

Lasso

Lasso a region of interest (ROI) based on spatial cluster.

Label

Given categorizations for a set of points, wrap into a Label class.

LiveWireSegmentation

STGNN

Graph neural network for representation learning of spatial transcriptomics data from only the gene expression

Category_Model

Wraps all necessary methods for data loading and preparation, model initialization, parameterization,

Lagged_Model

Wraps all necessary methods for data loading and preparation, model initialization, parameterization,

Niche_LR_Model

Wraps all necessary methods for data loading and preparation, model initialization, parameterization,

Niche_Model

Wraps all necessary methods for data loading and preparation, model initialization, parameterization,

Functions#

archetypes(→ numpy.ndarray)

Identify archetypes from the anndata object.

archetypes_genes(→ dict)

Identify genes that belong to each expression archetype.

find_spatial_archetypes(→ Tuple[numpy.ndarray, ...)

Clusters the expression data and finds gene archetypes. Current implementation is based on hierarchical

find_spatially_related_genes(exp_mat, gene_names, ...)

Given a gene, find other genes which correlate well spatially.

get_genes_from_spatial_archetype(...)

Get a list of genes which are the best representatives of the archetype.

find_cci_two_group(→ dict)

Performing cell-cell transformation on an anndata object, while also

prepare_cci_cellpair_adata(→ anndata.AnnData)

prepare for visualization cellpairs by func st.tl.space, plot all_cell_pair,

prepare_cci_df(cci_df, means_col, pval_col, ...)

Given a dataframe generated from the output of find_cci_two_group, prepare for visualization by heatmap by

niches(→ anndata.AnnData)

Performing cell-cell transformation on an anndata object, while also

predict_ligand_activities(→ pandas.DataFrame)

Function to predict the ligand activity.

predict_target_genes(→ pandas.DataFrame)

Function to predict the target genes.

spagcn_vanilla(→ Optional[anndata.AnnData])

Integrating gene expression and spatial location to identify spatial domains via SpaGCN.

scc(→ Optional[anndata.AnnData])

Spatially constrained clustering (scc) to identify continuous tissue domains.

spagcn_pyg(→ Optional[anndata.AnnData])

Function to find clusters with spagcn.

compute_pca_components(→ Tuple[Any, int, float])

Calculate the inflection point of the PCA curve to

ecp_silhouette(→ float)

Here we evaluate the clustering performance by calculating the Silhouette Coefficient.

harmony_debatch(→ Optional[anndata.AnnData])

Use harmonypy [Korunsky19] to remove batch effects.

integrate(→ anndata.AnnData)

Concatenating all anndata objects.

pca_spateo(adata[, X_data, n_pca_components, pca_key, ...])

Do PCA for dimensional reduction.

pearson_residuals(adata[, n_top_genes, subset, theta, ...])

Preprocess UMI count data with analytic Pearson residuals.

sctransform(adata, rlib_path[, n_top_genes, ...])

Use sctransform with an additional flag vst.flavor="v2" to perform normalization and dimensionality reduction

find_all_cluster_degs(→ anndata.AnnData)

Find marker genes for each group of buckets based on gene expression.

find_cluster_degs(→ pandas.DataFrame)

Find marker genes between one group to other groups based on gene expression.

find_spatial_cluster_degs(→ pandas.DataFrame)

Function to search nearest neighbor groups in spatial space

top_n_degs(adata, group[, custom_score_func, sort_by, ...])

Find top n marker genes for each group of buckets based on differential gene expression analysis results.

AffineTrans(→ Tuple[numpy.ndarray, numpy.ndarray, ...)

Translate the x/y coordinates of data points by translating the centroid to the origin. Then data will be

align_slices_pca(→ None)

Coarsely align the slices based on the major axis, identified via PCA

pca_align(→ Tuple[numpy.ndarray, numpy.ndarray])

Use pca to rotate a coordinate matrix to reveal the largest variance on each dimension.

procrustes(→ Tuple[float, numpy.ndarray, dict])

A port of MATLAB's procrustes function to Numpy.

construct_geodesic_distance_matrix(→ anndata.AnnData)

Given an AnnData object and the key to the array of x- and y-coordinates, compute the geodesic distance between each sample and its

construct_nn_graph(→ None)

Constructing bucket-to-bucket nearest neighbors graph.

construct_spatial_distance_matrix(→ anndata.AnnData)

Given AnnData object and key to array of x- and y-coordinates, compute pairwise spatial distances between all

generate_spatial_weights_fixed_nbrs(...)

Starting from a k-nearest neighbor graph, generate a nearest neighbor graph.

generate_spatial_weights_fixed_radius(...)

Starting from a radius-based neighbor graph, generate a sparse graph (csr format) with weighted edges, where edge

weighted_expr_neighbors_graph(...)

Given an AnnData object, compute distance array in gene expression space.

weighted_spatial_graph(...)

Given an AnnData object, compute distance array with either a fixed number of neighbors for each bucket or a

glm_degs(→ Optional[anndata.AnnData])

Differential genes expression tests using generalized linear regressions. Here only size factor normalized gene

create_label_class(→ Union[Label, List[Label]])

Wraps categorical labels into custom Label class for downstream processing.

GM_lag_model(adata, group[, spatial_key, genes, ...])

Spatial lag model with spatial two stage least squares (S2SLS) with results and diagnostics; Anselin (1988).

lisa_geo_df(→ geopandas.GeoDataFrame)

Perform Local Indicators of Spatial Association (LISA) analyses on specific genes and prepare a geopandas

local_moran_i(adata, group[, spatial_key, genes, ...])

Identify cell type specific genes with local Moran's I test.

compute_shortest_path(→ List)

Inline function for easier computation of shortest_path in an image.

live_wire

This file implements the LiveWire segmentation algorithm. The code is ported from:

center_align(→ Tuple[anndata.AnnData, List[numpy.ndarray]])

Computes center alignment of slices.

generalized_procrustes_analysis(X, Y, pi)

Finds and applies optimal rotation between spatial coordinates of two layers (may also do a reflection).

mapping_aligned_coords(→ Tuple[dict, dict])

Optimal mapping coordinates between X and Y.

mapping_center_coords(→ dict)

Optimal mapping coordinates between X and Y based on intermediate coordinates.

pairwise_align(→ Tuple[numpy.ndarray, Optional[int]])

Calculates and returns optimal alignment of two slices.

cellbin_morani(→ pandas.DataFrame)

Calculate Moran's I score for each celltype (in segmented cell adata).

moran_i(→ pandas.DataFrame)

Identify genes with strong spatial autocorrelation with Moran's I test.

impute_and_downsample(→ Tuple[anndata.AnnData, ...)

Smooth gene expression distributions and downsample a spatial sample by selecting representative points from

fit_glm(→ Tuple[numpy.ndarray, numpy.ndarray, float, ...)

Wrapper for fitting a generalized elastic net linear model to large biological data, with automated finding of

plot_prior_vs_data(reconst, adata, kind, target_name, ...)

Plots distribution of observed vs. predicted counts in the form of a comparative density barplot.

get_align_labels(→ pandas.DataFrame)

Obtain the label information in anndata.obs[key] corresponding to the align_X coordinate.

models_align(→ List[anndata.AnnData])

Align spatial coordinates of models.

models_align_ref(→ Tuple[List[anndata.AnnData], ...)

Align the spatial coordinates of one model list through the affine transformation matrix obtained from another model list.

models_center_align(→ Tuple[anndata.AnnData, ...)

Align spatial coordinates of a list of models to a center model.

models_center_align_ref(→ Tuple[anndata.AnnData, ...)

Align the spatial coordinates of one model list to the central model through the affine transformation matrix obtained from another model list.

rigid_transform_2D(→ numpy.ndarray)

Compute optimal transformation based on the two sets of 2D points and apply the transformation to other points.

rigid_transform_3D(→ numpy.ndarray)

Compute optimal transformation based on the two sets of 3D points and apply the transformation to other points.

spateo.tools.archetypes(adata: anndata.AnnData, moran_i_genes: Union[numpy.ndarray, list], num_clusters: int = 5, layer: Union[str, None] = None) numpy.ndarray[source]#

Identify archetypes from the anndata object.

Parameters
adata

AnnData object of interest.

moran_i_genes

genes that are identified as significantly spatially autocorrelated based on Moran’s I.

num_clusters

number of archetypes.

layer

the layer for the gene expression, can be None which corresponds to adata.X.

Returns

archetypes: the archetypes within the genes with high Moran’s I scores.

Return type

numpy.ndarray

Examples

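>>> archetypes = st.tl.archetypes(adata, moran_i_genes)
>>> # `df` is assumed to be a DataFrame of archetype values indexed by adata.obs_names
>>> adata.obs = pd.concat((adata.obs, df), axis=1)
>>> arch_cols = adata.obs.columns
>>> st.pl.space(adata, basis="spatial", color=arch_cols, pointsize=0.1, alpha=1)
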
spateo.tools.archetypes_genes(adata: anndata.AnnData, archetypes: numpy.ndarray, num_clusters: int, moran_i_genes: Union[numpy.ndarray, list], layer: Union[str, None] = None) dict[source]#

Identify genes that belong to each expression archetype.

Parameters
adata

AnnData object of interest.

archetypes

the archetypes output of find_spatial_archetypes

num_clusters

number of archetypes.

moran_i_genes

genes that are identified as significantly spatially autocorrelated based on Moran’s I.

layer

the layer for the gene expression, can be None which corresponds to adata.X.

Returns

archetypes_dict: a dictionary where each key is the index of an archetype and each value holds the top genes for that archetype.

Return type

dict

Examples

>>> archetypes_dict = st.tl.archetypes_genes(adata, archetypes, num_clusters, moran_i_genes)
>>> dyn.pl.scatters(subset_adata,
...     basis="spatial",
...     color=['archetype %d'% i] + typical_genes.to_list(),
...     pointsize=0.03,
...     alpha=1,
...     figsize=(3, ptp_vec[1]/ptp_vec[0] * 3)
... )

spateo.tools.find_spatial_archetypes(num_clusters: int, exp_mat: numpy.ndarray) Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray][source]#

Clusters the expression data and finds gene archetypes. The current implementation is based on hierarchical clustering with the Ward method. Each archetype is simply the average of the genes that belong to the same cluster.

Parameters
num_clusters

number of gene clusters or archetypes.

exp_mat

expression matrix. Rows are genes and columns are buckets.

Returns

Returns the archetypes, the gene sets (clusters) and the Pearson correlations of every gene with respect to each archetype.
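The description above (Ward clustering of genes, per-cluster averages, then gene-to-archetype Pearson correlations) can be sketched roughly as follows. This is an illustration under assumptions, not spateo's implementation: the helper name `sketch_spatial_archetypes` and the toy data are hypothetical, and SciPy's `linkage`/`fcluster` stand in for whatever clustering routine the library uses internally.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def sketch_spatial_archetypes(num_clusters, exp_mat):
    """Illustrative only: Ward clustering of genes, then per-cluster averages."""
    # Rows are genes, columns are buckets.
    tree = linkage(exp_mat, method="ward")
    labels = fcluster(tree, t=num_clusters, criterion="maxclust")  # labels start at 1
    cluster_ids = np.unique(labels)
    # One archetype per cluster: the mean expression profile of its genes.
    archetypes = np.vstack([exp_mat[labels == c].mean(axis=0) for c in cluster_ids])
    # Pearson correlation of every gene with every archetype.
    n_genes = exp_mat.shape[0]
    corr = np.corrcoef(np.vstack([exp_mat, archetypes]))[:n_genes, n_genes:]
    return archetypes, labels, corr


# Toy data (hypothetical): four genes follow one spatial pattern, four follow another.
rng = np.random.default_rng(0)
pattern_a = np.linspace(0.0, 1.0, 50)
pattern_b = np.sin(np.linspace(0.0, 6.0, 50))
exp_mat = np.vstack(
    [pattern_a + 0.01 * rng.normal(size=50) for _ in range(4)]
    + [pattern_b + 0.01 * rng.normal(size=50) for _ in range(4)]
)
archetypes, labels, corr = sketch_spatial_archetypes(2, exp_mat)
```

Here `archetypes` has one row per cluster, `labels` assigns each gene to a cluster, and `corr` has one row per gene and one column per archetype.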

spateo.tools.find_spatially_related_genes(exp_mat, gene_names, ...)#

Given a gene, find other genes which correlate well spatially.

Parameters
exp_mat

expression matrix.

gene_names

gene name list that associates with the rows of expression matrix.

archetypes

the archetypes output of find_spatial_archetypes

gene

the index of the gene to be queried

pval_threshold

the pvalue returned from the pearsonr function

Returns

a list of genes that correlate well spatially with the queried gene

spateo.tools.get_genes_from_spatial_archetype(exp_mat: numpy.ndarray, gene_names: Union[numpy.ndarray, list], archetypes: numpy.ndarray, archetype: int, pval_threshold: float = 0) Union[numpy.ndarray, list][source]#

Get a list of genes which are the best representatives of the archetype.

Parameters
exp_mat

expression matrix.

gene_names

the gene names list that associates with the rows of expression matrix

archetypes

the archetypes output of find_spatial_archetypes

archetype

a number denoting the archetype

pval_threshold

the pvalue returned from the pearsonr function

Returns

a list of genes which are the best representatives of the archetype
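As a rough illustration of how representative genes could be selected, the sketch below correlates each gene with one archetype using `scipy.stats.pearsonr`, keeps the genes whose p-value passes the threshold, and sorts them by correlation. The helper name, the toy data, and the 0.05 threshold are assumptions made for this example; the library's own filtering and ordering may differ.

```python
import numpy as np
from scipy.stats import pearsonr


def sketch_genes_for_archetype(exp_mat, gene_names, archetypes, archetype, pval_threshold=0.05):
    """Illustrative only: rank genes by Pearson correlation with one archetype."""
    r_values, p_values = [], []
    for row in exp_mat:
        r, p = pearsonr(row, archetypes[archetype])
        r_values.append(r)
        p_values.append(p)
    r_values = np.array(r_values)
    p_values = np.array(p_values)
    # Keep genes whose correlation p-value passes the threshold.
    keep = np.where(p_values <= pval_threshold)[0]
    # Strongest correlation first.
    order = keep[np.argsort(r_values[keep])[::-1]]
    return [gene_names[i] for i in order]


# Toy data (hypothetical): one archetype and three genes.
profile = np.linspace(0.0, 1.0, 20)
alternating = np.array([1.0, -1.0] * 10)
archetypes = profile[None, :]
exp_mat = np.vstack(
    [
        alternating,                  # geneC: unrelated to the archetype
        profile + 0.1 * alternating,  # geneB: noisy copy of the archetype
        2.0 * profile + 1.0,          # geneA: perfectly correlated
    ]
)
selected = sketch_genes_for_archetype(exp_mat, ["geneC", "geneB", "geneA"], archetypes, 0)
```

In this toy case the unrelated gene is filtered out and the remaining genes are returned with the best match first.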

spateo.tools.find_cci_two_group(adata: anndata.AnnData, path: str, species: Literal[human, mouse, drosophila, zebrafish, axolotl] = 'human', layer: Tuple[None, str] = None, group: str = None, lr_pair: list = None, sender_group: str = None, receiver_group: str = None, filter_lr: Literal[outer, inner] = 'outer', top: int = 20, spatial_neighbors: str = 'spatial_neighbors', spatial_distances: str = 'spatial_distances', min_cells_by_counts: int = 0, min_pairs: int = 5, min_pairs_ratio: float = 0.01, num: int = 1000, pvalue: float = 0.05) dict[source]#

Performing cell-cell transformation on an anndata object, while also limiting the nearest neighbors per cell to n_neighbors. This function returns a dictionary whose keys are ‘cell_pair’ and ‘lr_pair’.

Parameters
adata

An AnnData object.

path

Path to ligand_receptor network of NicheNet (prior lr_network).

species

Which species is your adata generated from. Will be used to determine the proper ligand-receptor database.

layer

the key to the layer. If it is None, adata.X will be used by default.

group

The group name in adata.obs

lr_pair

given a lr_pair list.

sender_group

the name of the cell group that sends the ligands.

receiver_group

the name of the cell group that expresses the receptors.

spatial_neighbors

spatial neighbor key {spatial_neighbors} in adata.uns.keys().

spatial_distances

spatial neighbor distance key {spatial_distances} in adata.obsp.keys().

min_cells_by_counts

threshold for minimum number of cells expressing ligand/receptor to avoid being filtered out. Only used if ‘lr_pair’ is None.

min_pairs

minimum number of cell pairs between cells from two groups.

min_pairs_ratio

minimum ratio of cell pairs to theoretical cell pairs (n x M / 2) between cells from two groups.

num

number of permutations. It is recommended that this number be at least 1000.

pvalue

the p-value threshold that will be used to filter for significant ligand-receptor pairs.

filter_lr

filter ligands and receptors based on specific expression in the sender groups and receiver groups. ‘inner’: specific in both the sender groups and the receiver groups; ‘outer’: specific in the sender groups or the receiver groups.

top

the number of top expressed fractions in the given sender groups (receiver groups) for each gene (ligand or receptor).

Returns

result_dict: a dictionary whose keys are ‘cell_pair’ and ‘lr_pair’.

Return type

dict
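The `num` and `pvalue` parameters describe a permutation test on ligand-receptor pairs. A minimal sketch of that idea is given below, assuming the score is the product of the mean ligand expression in the sender group and the mean receptor expression in the receiver group. The function name and toy data are hypothetical, and the sketch ignores the spatial-neighbor constraint that the real function applies.

```python
import numpy as np


def lr_permutation_test(ligand, receptor, groups, sender, receiver, num=1000, seed=0):
    """Illustrative only: permutation p-value for one ligand-receptor pair."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)

    def score(labels):
        # Mean ligand expression in senders times mean receptor expression in receivers.
        return ligand[labels == sender].mean() * receptor[labels == receiver].mean()

    observed = score(groups)
    # Null distribution: shuffle the group labels and recompute the score.
    null = np.array([score(rng.permutation(groups)) for _ in range(num)])
    # Add-one correction so the p-value is never exactly zero.
    pval = (np.sum(null >= observed) + 1) / (num + 1)
    return observed, pval


# Toy data (hypothetical): the ligand is specific to group A, the receptor to group B.
groups = np.array(["A"] * 20 + ["B"] * 20)
ligand = np.where(groups == "A", 5.0, 0.1)
receptor = np.where(groups == "B", 5.0, 0.1)
observed, pval = lr_permutation_test(ligand, receptor, groups, "A", "B", num=200)
```

Pairs whose permutation p-value falls below the `pvalue` threshold would then be kept as significant.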

spateo.tools.prepare_cci_cellpair_adata(adata: anndata.AnnData, sender_group: str = None, receiver_group: str = None, group: str = None, cci_dict: dict = None, all_cell_pair: bool = False) anndata.AnnData[source]#

Prepare for visualization of cell pairs with st.pl.space: plot either all cell pairs (all_cell_pair) or the cell pairs constrained by spatial distance (output of find_cci_two_group).

Parameters
adata

An AnnData object.

sender_group

The name of the cell group that sends the ligands.

receiver_group

The name of the cell group that expresses the receptors.

group

The group name in adata.obs. Unused unless ‘all_cell_pair’ is True.

cci_dict

A dictionary resulting from find_cci_two_group, whose keys are ‘cell_pair’ and ‘lr_pair’. Unused unless ‘all_cell_pair’ is False.

all_cell_pair

Whether to show all cells of the sender and receiver cell groups. Defaults to False.

spatial_key

Key in .obsm containing coordinates for each bucket.

Returns

adata: Updated AnnData object containing ‘spec’ in .obs.

spateo.tools.prepare_cci_df(cci_df: pandas.DataFrame, means_col: str, pval_col: str, lr_pair_col: str, sr_pair_col: str)[source]#

Given a dataframe generated from the output of find_cci_two_group, prepare for visualization by heatmap by splitting it into two dataframes, corresponding to the mean cell type-cell type L:R product and the probability values from the permutation test.

Parameters
cci_df

CCI dataframe with columns for: ligand name, receptor name, L:R product, p value, and sender-receiver cell types

means_col

Label for the column corresponding to the mean product of L:R expression between two cell types

pval_col

Label for the column corresponding to the p-value of the interaction

lr_pair_col

Label for the column corresponding to the ligand-receptor pair in format “{ligand}-{receptor}”

sr_pair_col

Label for the column corresponding to the sending-receiving cell type pair in format “{sender}-{receiver}”

Returns

A dictionary with keys ‘means’ and ‘pvalues’, whose values are the mean cell type-cell type L:R product and the probability values, respectively.

Return type

dict

Example

>>> res = find_cci_two_group(adata, ...)
>>> # The dataframe to save can be found under "lr_pair"
>>> res["lr_pair"].to_csv(...)
>>> cci_dict = prepare_cci_df(res["lr_pair"], ...)
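The splitting step described above can be illustrated with a pandas pivot. This is a sketch under assumptions: the column labels (`lr_product`, `pval`, `lr_pair`, `sr_pair`), the toy values, and the helper name `split_cci_df` are hypothetical, and the orientation (ligand-receptor pairs as rows, sender-receiver pairs as columns) is a guess at a heatmap-friendly layout rather than the library's documented output.

```python
import pandas as pd

# Toy CCI table (hypothetical values and column labels).
cci_df = pd.DataFrame(
    {
        "lr_pair": ["L1-R1", "L1-R1", "L2-R2", "L2-R2"],
        "sr_pair": ["A-B", "B-A", "A-B", "B-A"],
        "lr_product": [2.0, 0.5, 1.2, 0.1],
        "pval": [0.001, 0.40, 0.03, 0.90],
    }
)


def split_cci_df(cci_df, means_col, pval_col, lr_pair_col, sr_pair_col):
    """Illustrative only: one wide table for means, one for p-values."""
    means = cci_df.pivot(index=lr_pair_col, columns=sr_pair_col, values=means_col)
    pvalues = cci_df.pivot(index=lr_pair_col, columns=sr_pair_col, values=pval_col)
    return {"means": means, "pvalues": pvalues}


out = split_cci_df(cci_df, "lr_product", "pval", "lr_pair", "sr_pair")
```

Each of the two resulting tables can be passed directly to a heatmap routine.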

spateo.tools.niches(adata: anndata.AnnData, path: str, layer: Tuple[None, str] = None, weighted: bool = False, spatial_neighbors: str = 'spatial_neighbors', spatial_distances: str = 'spatial_distances', species: Literal[human, mouse, drosophila, zebrafish, axolotl] = 'human', system: Literal[niches_c2c, niches_n2c, niches_c2n, niches_n2n] = 'niches_n2n', method: Literal[scipy.stats.gmean, mean, sum] = 'sum') anndata.AnnData[source]#

Performing cell-cell transformation on an anndata object, while also limiting the nearest neighbors per cell to k. This function returns another anndata object, in which the columns of the matrix are bucket-bucket pairs, while the rows are ligand-receptor mechanisms. The resulting anndata object allows flexible downstream manipulations such as dimensional reduction of its rows or columns.

Our method is adapted from: Micha Sam Brickman Raredon, Junchen Yang, Neeharika Kothapalli, Naftali Kaminski, Laura E. Niklason, Yuval Kluger. Comprehensive visualization of cell-cell interactions in single-cell and spatial transcriptomics with NICHES. doi: https://doi.org/10.1101/2022.01.23.477401

Parameters
adata

An AnnData object.

path

Path to ligand_receptor network of NicheNet (prior lr_network).

layer

the key to the layer. If it is None, adata.X will be used by default.

weighted

Whether to weight the edges according to the actual spatial distance (as in weighted kNN). Defaults to ‘False’, which means that all neighbor edge weights equal 1 and all others equal 0.

spatial_neighbors

neighbor_key {spatial_neighbors} in adata.uns.keys().

spatial_distances

neighbor_key {spatial_distances} in adata.obsp.keys().

system

‘niches_n2n’ (default). Cell-cell signaling (‘niches_c2c’) is defined as the signals passed between cells, determined by the product of the ligand expression of the sending cell and the receptor expression of the receiving cell. System-cell signaling (‘niches_n2c’) is defined as the signaling input to a cell, determined by taking the geometric mean of the ligand profiles of the surrounding cells and the receptor profile of the receiving cell. ‘niches_c2n’ and ‘niches_n2n’ are defined similarly.

Returns

An anndata of Niches, whose rows are mechanisms and whose columns are all possible cell x cell interactions.
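For the cell-cell case (‘niches_c2c’), the definition above amounts to multiplying the sender's ligand expression by the receiver's receptor expression for every neighboring pair. The sketch below shows this with plain NumPy; the function name `c2c_signaling`, the gene names, and the toy matrices are hypothetical, and the neighborhood-level systems are not covered.

```python
import numpy as np


def c2c_signaling(expr, genes, lr_pairs, adj):
    """Illustrative only: mechanisms x (sender-receiver pair) signaling matrix."""
    idx = {g: k for k, g in enumerate(genes)}
    lig = [idx[ligand] for ligand, _ in lr_pairs]
    rec = [idx[receptor] for _, receptor in lr_pairs]
    # Every nonzero adjacency entry is one directed sender -> receiver pair.
    senders, receivers = np.nonzero(adj)
    weights = adj[senders, receivers]
    # Ligand expression of the sender times receptor expression of the receiver.
    mat = expr[senders][:, lig].T * expr[receivers][:, rec].T * weights
    names = [f"{s}-{r}" for s, r in zip(senders, receivers)]
    return mat, names


# Toy data (hypothetical): 3 cells, 4 genes, binary neighbor graph.
genes = ["L1", "R1", "L2", "R2"]
expr = np.array(
    [
        [1.0, 0.0, 2.0, 0.0],
        [0.0, 3.0, 0.0, 1.0],
        [2.0, 1.0, 1.0, 2.0],
    ]
)
adj = np.array(
    [
        [0.0, 1.0, 0.0],
        [1.0, 0.0, 1.0],
        [0.0, 1.0, 0.0],
    ]
)
mat, names = c2c_signaling(expr, genes, [("L1", "R1"), ("L2", "R2")], adj)
```

With a weighted graph, the `adj` entries would scale each pair's signal instead of acting as a 0/1 mask.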

spateo.tools.predict_ligand_activities(adata: anndata.AnnData, path: str, sender_cells: Optional[List[str]] = None, receiver_cells: Optional[List[str]] = None, geneset: Optional[List[str]] = None, ratio_expr_thresh: float = 0.01, species: Literal[human, mouse] = 'human') pandas.DataFrame[source]#

Function to predict the ligand activity.

Our method is adapted from: Robin Browaeys, Wouter Saelens & Yvan Saeys. NicheNet: modeling intercellular communication by linking ligands to target genes. Nature Methods volume 17, pages 159–162 (2020).

Parameters
path

Path to ligand_target_matrix, lr_network (human and mouse).

adata

An AnnData object.

sender_cells

Ligand cells.

receiver_cells

Receptor cells.

geneset

The gene set of interest. This may be the differentially expressed genes in receiver cells (comparing cells in the case and control groups). Ligand activity prediction is based on this gene set. By default, all genes expressed in receiver cells are used.

ratio_expr_thresh

The minimum percentage of buckets expressing the ligand (target) in sender (receiver) cells.

Returns

A pandas DataFrame of the predicted ligand activities.

spateo.tools.predict_target_genes(adata: anndata.AnnData, path: str, sender_cells: Optional[List[str]] = None, receiver_cells: Optional[List[str]] = None, geneset: Optional[List[str]] = None, species: Literal[human, mouse] = 'human', top_ligand: int = 20, top_target: int = 300) pandas.DataFrame[source]#

Function to predict the target genes.

Parameters
path

Path to ligand_target_matrix of NicheNet.

adata

An AnnData object.

sender_cells

Ligand cells.

receiver_cells

Receptor cells.

geneset

The gene set of interest. This may be the differentially expressed genes in receiver cells (comparing cells in the case and control groups). Ligand activity prediction is based on this gene set. By default, all genes expressed in receiver cells are used.

top_ligand

int (default=20) select 20 top-ranked ligands for further biological interpretation.

top_target

int (default=300) Infer target genes of top-ranked ligands, and choose the top targets according to the general prior model.

Returns

A pandas DataFrame of the predicted target genes.

spateo.tools.spagcn_vanilla(adata: anndata.AnnData, spatial_key: str = 'spatial', key_added: Optional[str] = 'spagcn_pred', n_pca_components: Optional[int] = None, e_neigh: int = 10, resolution: float = 0.4, n_clusters: Optional[int] = None, refine_shape: Literal[hexagon, square] = 'hexagon', p: float = 0.5, seed: int = 100, numIterMaxSpa: int = 2000, copy: bool = False) Optional[anndata.AnnData]#

Integrating gene expression and spatial location to identify spatial domains via SpaGCN. Original Code Repository: https://github.com/jianhuupenn/SpaGCN

Reference:

Jian Hu, Xiangjie Li, Kyle Coleman, Amelia Schroeder, Nan Ma, David J. Irwin, Edward B. Lee, Russell T. Shinohara & Mingyao Li. SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nature Methods volume 18, pages 1342–1351 (2021)

Parameters
adata

An Anndata object after normalization.

spatial_key

the key in .obsm that corresponds to the spatial coordinate of each bucket.

key_added

adata.obs key under which to add the cluster labels. The initial clustering results of SpaGCN are under key_added, and the refined clustering results are under f’{key_added}_refined’.

n_pca_components

Number of principal components to compute. If n_pca_components == None, the value at the inflection point of the PCA curve is automatically calculated as n_comps.

e_neigh

Number of nearest neighbors in gene expression space. Used in dyn.pp.neighbors(adata, n_neighbors=e_neigh).

resolution

Resolution in the Louvain clustering method. Used when `n_clusters`==None.

n_clusters

Number of spatial domains wanted. If n_clusters != None, the suitable resolution in the initial Louvain clustering method will be automatically searched based on n_clusters.

refine_shape

Smooth the spatial domains with the given spatial topology: “hexagon” for Visium data, “square” for ST data. Defaults to “hexagon”.

p

Percentage of total expression contributed by neighborhoods.

seed

Global seed for random, torch, numpy. Defaults to 100.

numIterMaxSpa

SpaGCN maximum number of training iterations.

copy

Whether to copy adata or modify it inplace.

Returns

Depending on the parameter copy: when True, return an updated adata with the fields adata.obs[key_added] and adata.obs[f'{key_added}_refined'], containing the clustering results based on SpaGCN; otherwise, update the adata object in place.

spateo.tools.scc(adata: anndata.AnnData, spatial_key: str = 'spatial', key_added: Optional[str] = 'scc', pca_key: str = 'pca', e_neigh: int = 30, s_neigh: int = 6, cluster_method: Literal[leiden, louvain] = 'leiden', resolution: Optional[float] = None, copy: bool = False) Optional[anndata.AnnData]#

Spatially constrained clustering (scc) to identify continuous tissue domains.

Reference:

Ao Chen, Sha Liao, Mengnan Cheng, Kailong Ma, Liang Wu, Yiwei Lai, Xiaojie Qiu, Jin Yang, Wenjiao Li, Jiangshan Xu, Shijie Hao, Xin Wang, Huifang Lu, Xi Chen, Xing Liu, Xin Huang, Feng Lin, Zhao Li, Yan Hong, Defeng Fu, Yujia Jiang, Jian Peng, Shuai Liu, Mengzhe Shen, Chuanyu Liu, Quanshui Li, Yue Yuan, Huiwen Zheng, Zhifeng Wang, H Xiang, L Han, B Qin, P Guo, PM Cánoves, JP Thiery, Q Wu, F Zhao, M Li, H Kuang, J Hui, O Wang, B Wang, M Ni, W Zhang, F Mu, Y Yin, H Yang, M Lisby, RJ Cornall, J Mulder, M Uhlen, MA Esteban, Y Li, L Liu, X Xu, J Wang. Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell, 2022.

Parameters
adata

an Anndata object, after normalization.

spatial_key

the key in .obsm that corresponds to the spatial coordinate of each bucket.

key_added

adata.obs key under which to add the cluster labels.

pca_key

the key in .obsm that corresponds to the PCA result.

e_neigh

the number of nearest neighbors in gene expression space.

s_neigh

the number of nearest neighbors in physical space.

cluster_method

the method that will be used to cluster the cells.

resolution

the resolution parameter of the louvain clustering algorithm.

copy

Whether to return a new deep copy of adata instead of updating adata object passed in arguments. Defaults to False.

Returns

Depending on the argument copy, returns either an AnnData object with cluster info in “scc_e_{a}_s{b}” or None.
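The roles of e_neigh and s_neigh can be pictured as two k-nearest-neighbor graphs, one in expression (PCA) space and one in physical space, that are combined before community detection. The sketch below builds both graphs by brute force with NumPy. Taking the union of the two graphs is an assumption made for illustration, the helper `knn_adjacency` is hypothetical, and the clustering step itself (leiden or louvain) is not shown.

```python
import numpy as np


def knn_adjacency(points, k):
    """Illustrative only: binary, symmetrized k-nearest-neighbor adjacency."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    nbrs = np.argsort(d, axis=1)[:, :k]
    adj = np.zeros(d.shape, dtype=bool)
    adj[np.repeat(np.arange(len(points)), k), nbrs.ravel()] = True
    return adj | adj.T


rng = np.random.default_rng(0)
pcs = rng.normal(size=(30, 5))      # stand-in for adata.obsm[pca_key]
coords = rng.uniform(size=(30, 2))  # stand-in for adata.obsm[spatial_key]

expr_adj = knn_adjacency(pcs, k=5)        # plays the role of e_neigh
spatial_adj = knn_adjacency(coords, k=3)  # plays the role of s_neigh
combined = expr_adj | spatial_adj         # assumption: union of the two graphs
```

Because spatial edges are added on top of expression edges, clusters found on `combined` tend to be more spatially contiguous than clusters found on `expr_adj` alone.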

spateo.tools.spagcn_pyg(adata: anndata.AnnData, n_clusters: int, p: float = 0.5, s: int = 1, b: int = 49, refine_shape: Optional[str] = None, his_img_path: Optional[str] = None, total_umi: Optional[str] = None, x_pixel: str = None, y_pixel: str = None, x_array: str = None, y_array: str = None, seed: int = 100, copy: bool = False) Optional[anndata.AnnData]#

Function to find clusters with spagcn.

Reference:

Jian Hu, Xiangjie Li, Kyle Coleman, Amelia Schroeder, Nan Ma, David J. Irwin, Edward B. Lee, Russell T. Shinohara & Mingyao Li. SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nature Methods volume 18, pages 1342–1351 (2021)

Parameters
adata

an Anndata object, after normalization.

n_clusters

Desired number of clusters.

p

parameter p in spagcn algorithm. See SpaGCN for details. Defaults to 0.5.

s

alpha to control the color scale when calculating the adjacency matrix. Defaults to 1.

b

beta to control the range of the neighbourhood used to calculate the grey value for one spot when calculating the adjacency matrix. Defaults to 49.

refine_shape

Smooth the spatial domains with given spatial topology, “hexagon” for Visium data, “square” for ST data. Defaults to None.

his_img_path

The file path of the histology image used to calculate the adjacency matrix in the spagcn algorithm. Defaults to None.

total_umi

The key (column name) in adata.obs that contains the total UMIs (counts) for each spot. When a histology image is not provided, the function uses the total counts as a grayscale image. Ignored if his_img_path is not None. Defaults to “total_umi”.

x_pixel

The key(colname) in adata.obs which contains corresponding x-pixels in histology image. Defaults to None.

y_pixel

The key(colname) in adata.obs which contains corresponding y-pixels in histology image. Defaults to None.

x_array

The key(colname) in adata.obs which contains corresponding x-coordinates. Defaults to None.

y_array

The key(colname) in adata.obs which contains corresponding y-coordinates. Defaults to None.

seed

Global seed for random, torch, numpy. Defaults to 100.

copy

Whether to return a new deep copy of adata instead of updating adata object passed in arguments. Defaults to False.

Returns

An AnnData object with cluster info in “spagcn_pred”, and in “spagcn_pred_refined” if refine_shape is set. The adjacency matrix used in the spagcn algorithm is saved in adata.uns[“adj_spagcn”].

Return type

Optional[anndata.AnnData]

spateo.tools.compute_pca_components(matrix: Union[numpy.ndarray, scipy.sparse.spmatrix], random_state: Optional[int] = 1, save_curve_img: Optional[str] = None) Tuple[Any, int, float]#

Calculate the inflection point of the PCA curve to obtain the number of principal components that the PCA should retain.

Parameters
matrix

A dense or sparse matrix.

save_curve_img

If save_curve_img != None, save the image of the PCA curve and inflection points.

Returns

new_n_components: The number of principal components that PCA should retain.

new_components_stored: Percentage of variance explained by the retained principal components.
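One simple way to locate an inflection point on a PCA curve is to take the point of the cumulative explained-variance curve that lies farthest above the straight line joining its endpoints. The sketch below does this with NumPy's SVD. It is an assumption about what "inflection point" means here, not the library's actual detection rule, and the function name and toy matrix are hypothetical.

```python
import numpy as np


def sketch_n_components(matrix):
    """Illustrative only: elbow of the cumulative explained-variance curve."""
    centered = matrix - matrix.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    ratio = singular_values**2 / np.sum(singular_values**2)
    cumulative = np.cumsum(ratio)
    # Normalize both axes to [0, 1] and find the point farthest above the chord.
    x = np.arange(1, len(cumulative) + 1)
    x_norm = (x - x[0]) / (x[-1] - x[0])
    y_norm = (cumulative - cumulative[0]) / (cumulative[-1] - cumulative[0])
    elbow = int(np.argmax(y_norm - x_norm))
    return int(x[elbow]), float(cumulative[elbow])


# Toy data (hypothetical): three strong directions of variation plus weak noise.
rng = np.random.default_rng(0)
signal = np.zeros((200, 20))
signal[:, :3] = 3.0 * rng.normal(size=(200, 3))
matrix = signal + 0.1 * rng.normal(size=(200, 20))
n_components, explained = sketch_n_components(matrix)
```

The two returned values correspond in spirit to the retained component count and the fraction of variance those components explain.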

spateo.tools.ecp_silhouette(matrix: Union[numpy.ndarray, scipy.sparse.spmatrix], cluster_labels: numpy.ndarray) float#

Here we evaluate the clustering performance by calculating the Silhouette Coefficient. The silhouette analysis is used to choose an optimal value for clustering resolution.

The Silhouette Coefficient is a widely used method for evaluating clustering performance, where a higher Silhouette Coefficient score relates to a model with better defined clusters and indicates a good separation between the cell types.

Advantages of the Silhouette Coefficient:
  • The score is bounded between -1 for incorrect clustering and +1 for highly dense clustering. Scores around zero indicate overlapping clusters.

  • The score is higher when clusters are dense and well separated, which relates to a standard concept of a cluster.

Original Code Repository: https://scikit-learn.org/stable/modules/clustering.html#silhouette-coefficient

Parameters
matrix

A dense or sparse matrix of feature.

cluster_labels

An array of cluster labels, one for each sample.

Returns

Mean Silhouette Coefficient for all clusters.

Examples

>>> st.tl.ecp_silhouette(matrix=adata.obsm["X_pca"], cluster_labels=adata.obs["leiden"].values)

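For reference, the Silhouette Coefficient of a sample is (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is its mean distance to the members of the nearest other cluster. The NumPy sketch below computes the mean over all samples with Euclidean distances. The function name and toy points are hypothetical, and this brute-force version is only meant to make the definition concrete.

```python
import numpy as np


def mean_silhouette(matrix, cluster_labels):
    """Illustrative only: mean Silhouette Coefficient with Euclidean distances."""
    labels = np.asarray(cluster_labels)
    dist = np.linalg.norm(matrix[:, None, :] - matrix[None, :, :], axis=-1)
    scores = np.zeros(len(labels))
    for i, lab in enumerate(labels):
        same = labels == lab
        if same.sum() == 1:
            continue  # convention: a singleton cluster scores 0
        # a: mean distance to the other members of the same cluster.
        a = dist[i, same].sum() / (same.sum() - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(dist[i, labels == other].mean() for other in np.unique(labels) if other != lab)
        scores[i] = (b - a) / max(a, b)
    return float(scores.mean())


# Toy data (hypothetical): two tight, well-separated groups of points.
points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
good = mean_silhouette(points, [0, 0, 1, 1])  # labels match the geometry
bad = mean_silhouette(points, [0, 1, 0, 1])   # labels cut across the geometry
```

A labeling that matches the geometry scores close to +1, while one that cuts across it scores below zero.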
spateo.tools.harmony_debatch(adata: anndata.AnnData, key: str, basis: str = 'X_pca', adjusted_basis: str = 'X_pca_harmony', max_iter_harmony: int = 10, copy: bool = False) Optional[anndata.AnnData]#

Use harmonypy [Korunsky19] to remove batch effects. This function should be run after performing PCA but before computing the neighbor graph.

Original Code Repository: https://github.com/slowkow/harmonypy

Interesting example: https://slowkow.com/notes/harmony-animation/

Parameters
adata

An Anndata object.

key

The name of the column in adata.obs that differentiates among experiments/batches.

basis

The name of the field in adata.obsm where the PCA table is stored.

adjusted_basis

The name of the field in adata.obsm where the adjusted PCA table will be stored after running this function.

max_iter_harmony

Maximum number of rounds to run Harmony. One round of Harmony involves one clustering and one correction step.

copy

Whether to copy adata or modify it inplace.

Returns

Updates adata with the field adata.obsm[adjusted_basis], containing principal components adjusted by Harmony.

spateo.tools.integrate(adatas: List[anndata.AnnData], batch_key: str = 'slices', fill_value: Union[int, float] = 0) anndata.AnnData#

Concatenating all anndata objects.

Parameters
adatas

AnnData matrices to concatenate with.

batch_key

Add the batch annotation to obs using this key.

fill_value

Scalar value to fill newly missing values in arrays with.

Returns

integrated_adata: The concatenated AnnData, where adata.obs[batch_key] stores a categorical variable labeling the batch.

Return type

anndata.AnnData

spateo.tools.pca_spateo(adata: anndata.AnnData, X_data=None, n_pca_components: Optional[int] = None, pca_key: Optional[str] = 'X_pca', genes: Union[list, None] = None, layer: Union[str, None] = None, random_state: Optional[int] = 1)#

Perform PCA for dimensionality reduction.

Parameters
adata

An Anndata object.

X_data

The user supplied data that will be used for dimension reduction directly.

n_pca_components

The number of principal components that PCA will retain. If None, the inflection point of the PCA curve is calculated to obtain the number of principal components that the PCA should retain.

pca_key

Add the PCA result to obsm using this key.

genes

The list of genes that will be used to subset the data for dimension reduction and clustering. If None, all genes will be used.

layer

The layer that will be used to retrieve data for dimension reduction and clustering. If None, will use adata.X.

Returns

adata_after_pca: The processed AnnData, where adata.obsm[pca_key] stores the PCA result.
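
As an illustration of the projection that would be stored under pca_key, here is a minimal NumPy sketch of PCA via singular value decomposition. The helper pca_sketch and the toy data are hypothetical, and the automatic inflection-point selection of n_pca_components is not covered; this is not spateo's implementation.

```python
import numpy as np


def pca_sketch(X: np.ndarray, n_components: int):
    """Project X (samples x features) onto its top principal components via SVD."""
    X_centered = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    # Coordinates of every sample in the space of the leading components.
    scores = U[:, :n_components] * S[:n_components]
    explained_variance_ratio = (S ** 2) / (S ** 2).sum()
    return scores, explained_variance_ratio[:n_components]


rng = np.random.default_rng(1)
# Toy data: 50 samples, 5 features, with most variance along the first feature.
X_toy = rng.normal(size=(50, 5)) * np.array([10.0, 1.0, 1.0, 1.0, 1.0])
X_pca, ratio = pca_sketch(X_toy, n_components=2)
```

The explained variance ratio is the quantity one would typically inspect when choosing the number of components by hand.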

spateo.tools.pearson_residuals(adata: anndata.AnnData, n_top_genes: Optional[int] = 3000, subset: bool = False, theta: float = 100, clip: Optional[float] = None, check_values: bool = True)#

Preprocess UMI count data with analytic Pearson residuals.

Pearson residuals transform raw UMI counts into a representation where three aims are achieved:

1. Remove the technical variation that comes from differences in total counts between cells;

2. Stabilize the mean-variance relationship across genes, i.e. ensure that biological signal from both low and high expression genes can contribute similarly to downstream processing;

3. Genes that are homogeneously expressed (like housekeeping genes) have small variance, while genes that are differentially expressed (like marker genes) have high variance.

Parameters
adata

An anndata object.

n_top_genes

Number of highly-variable genes to keep.

subset

If True, subset adata in place to the highly variable genes; otherwise merely indicate the highly variable genes.

theta

The negative binomial overdispersion parameter theta for Pearson residuals. Higher values correspond to less overdispersion (var = mean + mean^2/theta), and theta=np.Inf corresponds to a Poisson model.

clip

Determines if and how residuals are clipped:

  • If None, residuals are clipped to the interval [-sqrt(n), sqrt(n)], where n is the number of cells in the dataset (default behavior).

  • If any scalar c, residuals are clipped to the interval [-c, c]. Set clip=np.Inf for no clipping.

check_values

If True, check whether the counts in the selected layer are integers; a warning is issued if they are not.

Returns

Updates adata with the field adata.obsm["pearson_residuals"], containing pearson_residuals.
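
The transformation can be sketched in a few lines of NumPy. The sketch below assumes the usual analytic formulation (expected count = cell total × gene total / grand total) together with the variance model var = mean + mean^2/theta and the clipping rule described above; the function name and toy counts are hypothetical, and whether spateo's implementation matches this term for term is not confirmed here.

```python
import numpy as np


def pearson_residuals_sketch(counts, theta=100.0, clip=None):
    """Analytic Pearson residuals for a cells x genes count matrix."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    # Expected count under the null model: (cell total * gene total) / grand total.
    mu = counts.sum(axis=1, keepdims=True) * counts.sum(axis=0, keepdims=True) / total
    # Negative binomial variance: mean + mean^2 / theta.
    residuals = (counts - mu) / np.sqrt(mu + mu ** 2 / theta)
    if clip is None:
        clip = np.sqrt(counts.shape[0])  # default: sqrt(number of cells)
    return np.clip(residuals, -clip, clip)


# Toy matrix: 4 cells x 3 genes; the second gene is a marker of the third cell.
toy_counts = np.array([[10, 0, 5], [8, 1, 4], [0, 20, 1], [9, 0, 6]])
toy_residuals = pearson_residuals_sketch(toy_counts)
unclipped = pearson_residuals_sketch(toy_counts, clip=np.inf)
```

With four cells the default clipping interval is [-2, 2], so the strong marker signal in the third cell is capped at 2 unless clip=np.inf is passed.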

spateo.tools.sctransform(adata: anndata.AnnData, rlib_path: str, n_top_genes: Optional[int] = 3000, save_sct_img_1: Optional[str] = None, save_sct_img_2: Optional[str] = None, **kwargs)#

Use sctransform with an additional flag vst.flavor=“v2” to perform normalization and dimensionality reduction.

Original Code Repository: https://github.com/saketkc/pySCTransform

Installation:

Conda:

`conda install R`

R:

```
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
```

`BiocManager::install(version = "3.14")`

`BiocManager::install("glmGamPoi")`

Python:

`pip install rpy2`

`pip install git+https://github.com/saketkc/pysctransform`

Examples

>>> sctransform(adata=adata, rlib_path="/Users/jingzehua/opt/anaconda3/envs/spateo/lib/R")
Parameters
adata

An Anndata object.

rlib_path

Library path of the R environment.

n_top_genes

Number of highly-variable genes to keep.

save_sct_img_1

If save_sct_img_1 != None, save the image of the GLM model parameters.

save_sct_img_2

If save_sct_img_2 != None, save the image of the final residual variances.

**kwargs

Additional keyword arguments to pysctransform.SCTransform.

Returns

Updates adata with the field adata.obsm["pearson_residuals"], containing pearson_residuals.

spateo.tools.scc(adata: anndata.AnnData, spatial_key: str = 'spatial', key_added: Optional[str] = 'scc', pca_key: str = 'pca', e_neigh: int = 30, s_neigh: int = 6, cluster_method: Literal[leiden, louvain] = 'leiden', resolution: Optional[float] = None, copy: bool = False) Optional[anndata.AnnData]#

Spatially constrained clustering (scc) to identify continuous tissue domains.

Reference:

Ao Chen, Sha Liao, Mengnan Cheng, Kailong Ma, Liang Wu, Yiwei Lai, Xiaojie Qiu, Jin Yang, Wenjiao Li, Jiangshan Xu, Shijie Hao, Xin Wang, Huifang Lu, Xi Chen, Xing Liu, Xin Huang, Feng Lin, Zhao Li, Yan Hong, Defeng Fu, Yujia Jiang, Jian Peng, Shuai Liu, Mengzhe Shen, Chuanyu Liu, Quanshui Li, Yue Yuan, Huiwen Zheng, Zhifeng Wang, H Xiang, L Han, B Qin, P Guo, PM Cánoves, JP Thiery, Q Wu, F Zhao, M Li, H Kuang, J Hui, O Wang, B Wang, M Ni, W Zhang, F Mu, Y Yin, H Yang, M Lisby, RJ Cornall, J Mulder, M Uhlen, MA Esteban, Y Li, L Liu, X Xu, J Wang. Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell, 2022.

Parameters
adata

an Anndata object, after normalization.

spatial_key

the key in .obsm that corresponds to the spatial coordinate of each bucket.

key_added

adata.obs key under which to add the cluster labels.

pca_key

the key in .obsm that corresponds to the PCA result.

e_neigh

the number of nearest neighbors in gene expression space.

s_neigh

the number of nearest neighbors in physical space.

cluster_method

the method that will be used to cluster the cells.

resolution

the resolution parameter of the clustering algorithm selected by cluster_method.

copy

Whether to return a new deep copy of adata instead of updating adata object passed in arguments. Defaults to False.

Returns

Depending on the argument copy, returns either an anndata.AnnData object with cluster info in “scc_e_{a}_s{b}” or None.

spateo.tools.spagcn_pyg(adata: anndata.AnnData, n_clusters: int, p: float = 0.5, s: int = 1, b: int = 49, refine_shape: Optional[str] = None, his_img_path: Optional[str] = None, total_umi: Optional[str] = None, x_pixel: str = None, y_pixel: str = None, x_array: str = None, y_array: str = None, seed: int = 100, copy: bool = False) Optional[anndata.AnnData]#

Function to find clusters with spagcn.

Reference:

Jian Hu, Xiangjie Li, Kyle Coleman, Amelia Schroeder, Nan Ma, David J. Irwin, Edward B. Lee, Russell T. Shinohara & Mingyao Li. SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nature Methods volume 18, pages 1342–1351 (2021)

Parameters
adata

an Anndata object, after normalization.

n_clusters

Desired number of clusters.

p

parameter p in spagcn algorithm. See SpaGCN for details. Defaults to 0.5.

s

alpha, which controls the color scale when calculating the adjacency matrix. Defaults to 1.

b

beta, which controls the range of the neighbourhood when calculating the grey value for one spot during the adjacency matrix calculation. Defaults to 49.

refine_shape

Smooth the spatial domains with given spatial topology, “hexagon” for Visium data, “square” for ST data. Defaults to None.

his_img_path

The file path of the histology image used to calculate the adjacency matrix in the spagcn algorithm. Defaults to None.

total_umi

The key (column name) in adata.obs that contains the total UMIs (counts) for each spot. When a histology image is not provided, the function uses the total counts as a grayscale image. Ignored if his_img_path is not None. Defaults to “total_umi”.

x_pixel

The key(colname) in adata.obs which contains corresponding x-pixels in histology image. Defaults to None.

y_pixel

The key(colname) in adata.obs which contains corresponding y-pixels in histology image. Defaults to None.

x_array

The key(colname) in adata.obs which contains corresponding x-coordinates. Defaults to None.

y_array

The key(colname) in adata.obs which contains corresponding y-coordinates. Defaults to None.

seed

Global seed for random, torch, numpy. Defaults to 100.

copy

Whether to return a new deep copy of adata instead of updating adata object passed in arguments. Defaults to False.

Returns

An anndata.AnnData object with cluster info in “spagcn_pred”, and in “spagcn_pred_refined” if refine_shape is set.

The adjacency matrix used in the spagcn algorithm is saved in adata.uns[“adj_spagcn”].

spateo.tools.find_all_cluster_degs(adata: anndata.AnnData, group: str, genes: Optional[List[str]] = None, layer: Optional[str] = None, X_data: Optional[numpy.ndarray] = None, copy: bool = True, n_jobs: int = 1) anndata.AnnData[source]#

Find marker genes for each group of buckets based on gene expression.

Parameters
adata

An AnnData object.

group

The column key/name that identifies the grouping information (for example, clusters that correspond to different cell types) of buckets. This will be used for calculating group-specific genes.

genes

The list of genes that will be used to subset the data for identifying DEGs. If None, all genes will be used.

layer

The layer that will be used to retrieve data for DEG analyses. If None and X_data is not given, .X is used.

X_data

The user supplied data that will be used for marker gene detection directly.

copy

If True (default) a new copy of the adata object will be returned, otherwise if False, the adata will be updated inplace.

n_jobs

The maximum number of concurrently running jobs. By default it is 1 and thus no parallel computing code is used at all. When -1 all CPUs are used.

Returns

An anndata.AnnData with a new property cluster_markers in the .uns attribute, which includes a concatenated pandas DataFrame of the differential expression analysis result for all groups and a dictionary where keys are cluster numbers and values are lists of marker genes for the corresponding clusters. Please note that the markers are not the top marker genes. To identify the top n marker genes, use st.tl.cluster_degs.top_n_degs(adata, group='louvain').

spateo.tools.find_cluster_degs(adata: anndata.AnnData, test_group: str, control_groups: List[str], genes: Optional[List[str]] = None, layer: Optional[str] = None, X_data: Optional[numpy.ndarray] = None, group: Optional[str] = None, qval_thresh: float = 0.05, ratio_expr_thresh: float = 0.1, diff_ratio_expr_thresh: float = 0, log2fc_thresh: float = 0, method: Literal[multiple, pairwise] = 'multiple') pandas.DataFrame[source]#

Find marker genes that distinguish one group from other groups based on gene expression.

Test each gene for differential expression between buckets in one group and the other groups via the Mann-Whitney U test. We calculate the percentage of buckets expressing the gene in the test group (ratio_expr), the difference between the percentages of buckets expressing the gene in the test group and control groups (diff_ratio_expr), and the expression fold change between the test and control groups (log2fc); qval is calculated using Benjamini-Hochberg. In addition, we calculate 1 minus the Jensen-Shannon distance between the distribution of the percentage of cells with expression across all groups and the hypothetical perfect distribution in which only the test group of cells has expression (jsd_adj_score), Pearson’s correlation coefficient between the gene vector of actually detected expression in all cells and an ideal marker gene that is only expressed in test_group cells (ppc_score), and cosine_score.

Parameters
adata

An AnnData object.

test_group

The group name from group for which markers have to be found.

control_groups

The list of group name(s) from group against which the markers have to be tested.

genes

The list of genes that will be used to subset the data for identifying DEGs. If None, all genes will be used.

layer

The layer that will be used to retrieve data for DEG analyses. If None and X_data is not given, .X is used.

group

The column key/name that identifies the grouping information (for example, clusters that correspond to different cell types) of buckets. This will be used for calculating group-specific genes.

X_data

The user supplied data that will be used for marker gene detection directly.

qval_thresh

The maximal threshold of qval to be considered as significant genes.

ratio_expr_thresh

The minimum percentage of buckets expressing the gene in the test group.

diff_ratio_expr_thresh

The minimum difference between the percentages of buckets expressing the gene in the test group and the control groups.

log2fc_thresh

The minimum expression log2 fold change.

method

Whether to identify differentially expressed genes by comparing the test group with the other groups one by one (“pairwise”) or with all of them combined (“multiple”, the default). Valid values are “multiple” and “pairwise”.

Returns

A pandas DataFrame of the differential expression analysis result between the two groups.

Raises

ValueError – If the method is not one of “pairwise” or “multiple”.
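
Several of the statistics named above are simple to compute by hand. The following NumPy sketch shows the Benjamini-Hochberg adjustment together with ratio_expr, diff_ratio_expr and log2fc for one toy gene. All function names are hypothetical, and the pseudocount in the fold change is an assumption made to avoid division by zero; the exact formulas inside find_cluster_degs may differ.

```python
import numpy as np


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values)."""
    pvals = np.asarray(pvals, dtype=float)
    n = len(pvals)
    order = np.argsort(pvals)
    ranked = pvals[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, starting from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    qvals = np.empty(n)
    qvals[order] = np.clip(ranked, 0.0, 1.0)
    return qvals


def ratio_expr(expr):
    """Fraction of buckets in which the gene is detected."""
    return float(np.mean(np.asarray(expr) > 0))


def log2_fold_change(test_expr, control_expr, pseudocount=1.0):
    """log2 ratio of mean expression; the pseudocount is an assumption of this sketch."""
    return float(np.log2((np.mean(test_expr) + pseudocount) / (np.mean(control_expr) + pseudocount)))


# One toy gene measured in four test-group and four control-group buckets.
test_expr = np.array([0.0, 2.0, 3.0, 1.0])
control_expr = np.array([0.0, 0.0, 1.0, 0.0])
diff_ratio = ratio_expr(test_expr) - ratio_expr(control_expr)
log2fc = log2_fold_change(test_expr, control_expr)
qvals = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A gene would then pass the filters if its qval is below qval_thresh and its ratio_expr, diff_ratio_expr and log2fc exceed the corresponding thresholds.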

spateo.tools.find_spatial_cluster_degs(adata: anndata.AnnData, test_group: str, x: Optional[List[int]] = None, y: Optional[List[int]] = None, group: Optional[str] = None, genes: Optional[List[str]] = None, k: int = 10, ratio_thresh: float = 0.5) pandas.DataFrame[source]#

Function to search nearest neighbor groups in spatial space for the given test group.

Parameters
adata

An AnnData object.

test_group

The group name from group for which neighbors have to be found.

x

x-coordinates of all buckets.

y

y-coordinates of all buckets.

group

The column key/name that identifies the grouping information (for example, clusters that correspond to different cell types) of buckets.

genes

The list of genes that will be used to subset the data for identifying DEGs. If None, all genes will be used.

k

Number of neighbors to use for kneighbors queries.

ratio_thresh

For each non-test group, if more than 50% (default) of its buckets are in the neighboring set, this group is then selected as a neighboring group.

Returns

A pandas DataFrame of the differential expression analysis result between the test group and neighbor groups.
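
The neighbor-group selection rule described under ratio_thresh can be sketched as follows. The function neighboring_groups and the toy coordinates are hypothetical; the sketch uses brute-force distances and assumes that no two buckets share the same coordinates, and it stops before the differential expression step.

```python
import numpy as np


def neighboring_groups(coords, groups, test_group, k=10, ratio_thresh=0.5):
    """Select groups with more than ratio_thresh of their buckets near the test group."""
    coords = np.asarray(coords, dtype=float)
    groups = np.asarray(groups)
    test_idx = np.flatnonzero(groups == test_group)
    # Distances from every test-group bucket to every bucket.
    dist = np.linalg.norm(coords[test_idx, None, :] - coords[None, :, :], axis=-1)
    # k nearest buckets of each test bucket; column 0 after sorting is the bucket itself.
    nearest = np.argsort(dist, axis=1)[:, 1:k + 1]
    neighbor_set = np.unique(nearest)
    selected = []
    for g in np.unique(groups):
        if g == test_group:
            continue
        members = np.flatnonzero(groups == g)
        # Fraction of this group's buckets that fall in the neighboring set.
        if np.isin(members, neighbor_set).mean() > ratio_thresh:
            selected.append(g)
    return selected


toy_coords = [[0, 0], [1, 0], [2.5, 0], [3.5, 0], [100, 0], [101, 0]]
toy_groups = ["A", "A", "B", "B", "C", "C"]
nearby = neighboring_groups(toy_coords, toy_groups, test_group="A", k=3)
```

Here group B sits next to group A and is selected, while the distant group C is not.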

spateo.tools.top_n_degs(adata: anndata.AnnData, group: str, custom_score_func: Union[None, Callable] = None, sort_by: Union[str, List[str]] = 'log2fc', top_n_genes=10, only_deg_list: bool = True)[source]#

Find top n marker genes for each group of buckets based on differential gene expression analysis results.

Parameters
adata

An AnnData object.

group

The column key/name that identifies the grouping information (for example, clusters that correspond to different cell types) of buckets. This will be used for calculating group-specific genes.

custom_score_func

A custom function to calculate the score based on the DEG analyses result. Note the columns in adata.uns[“cluster_markers”][“deg_tables”] includes:

  • "test_group",

  • "control_group",

  • "ratio_expr",

  • "diff_ratio_expr",

  • "person_score",

  • "cosine_score",

  • "jsd_adj_score",

  • "log2fc",

  • "combined_score",

  • "pval",

  • "qval".

sort_by

Column name or names to sort by.

top_n_genes

The number of top sorted markers.

only_deg_list

Whether to only return the marker gene list for each cluster.

class spateo.tools.Lasso(adata)[source]#

Lasso a region of interest (ROI) based on spatial cluster.

Examples

>>> L = st.tl.Lasso(adata)
>>> L.vi_plot(group='group', group_color='group_color')

__sub_inde = []#
sub_adata#
vi_plot(key='spatial', group: Optional[str] = None, group_color: Optional[str] = None)#

Plot spatial cluster result and lasso ROI.

Parameters
key

The column key in .obsm, default to be ‘spatial’.

group

The column key/name that identifies the grouping information (for example, clusters that correspond to different cell types) of buckets.

group_color

The key in .uns, corresponds to a dictionary that map group names to group colors.

Returns

sub_adata: subset of adata.

spateo.tools.AffineTrans(x: numpy.ndarray, y: numpy.ndarray, centroid_x: float, centroid_y: float, theta: Tuple[None, float], R: Tuple[None, numpy.ndarray]) Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray][source]#

Translate the x/y coordinates of the data points by moving the centroid to the origin. The data are then rotated by the angle theta.

Parameters
x

x coordinates for the data points (bins). 1D np.array.

y

y coordinates for the data points (bins). 1D np.array.

centroid_x

x coordinates for the centroid of data points (bins).

centroid_y

y coordinates for the centroid of data points (bins).

theta

the angle of rotation. The unit is radians, expressed via np.pi (so 90 degrees is np.pi / 2), and the value is defined in the clockwise direction.

R

the rotation matrix. If R is provided, theta will be ignored.

Returns

T_t: The translation matrix used in affine transformation.

T_r: The rotation matrix used in affine transformation.

trans_xy_coord: The matrix that stores the translated and rotated coordinates.
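
The two steps (translate the centroid to the origin, then rotate clockwise by theta) can be sketched with homogeneous coordinates. The function affine_trans_sketch is hypothetical, and the 3 x 3 matrix layout is an assumption of this sketch; the shapes returned by AffineTrans itself may differ.

```python
import numpy as np


def affine_trans_sketch(x, y, centroid_x, centroid_y, theta):
    """Translate the centroid to the origin, then rotate clockwise by theta radians."""
    # Translation matrix in homogeneous coordinates.
    T_t = np.array([[1.0, 0.0, -centroid_x],
                    [0.0, 1.0, -centroid_y],
                    [0.0, 0.0, 1.0]])
    # Clockwise rotation matrix in homogeneous coordinates.
    c, s = np.cos(theta), np.sin(theta)
    T_r = np.array([[c, s, 0.0],
                    [-s, c, 0.0],
                    [0.0, 0.0, 1.0]])
    xy1 = np.vstack([x, y, np.ones(len(x))])
    # Apply the translation first, then the rotation; drop the homogeneous row.
    trans_xy_coord = (T_r @ T_t @ xy1)[:2].T
    return T_t, T_r, trans_xy_coord


x_toy = np.array([1.0, 3.0])
y_toy = np.array([2.0, 2.0])
T_t, T_r, moved = affine_trans_sketch(x_toy, y_toy, centroid_x=2.0, centroid_y=2.0, theta=np.pi / 2)
```

After centering, the two toy points lie at (-1, 0) and (1, 0); a clockwise quarter turn moves them to (0, 1) and (0, -1).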

spateo.tools.align_slices_pca(adata: anndata.AnnData, spatial_key: str = 'spatial', inplace: bool = False, result_key: Tuple[None, str] = None) None[source]#

Coarsely align the slices based on the major axis, identified via PCA.

Parameters
adata

the input adata object that contains the spatial key in .obsm.

spatial_key

the key in .obsm that points to the spatial information.

inplace

whether the spatial coordinates will be updated in place or stored under a new key `spatial_.

result_key

when inplace is False, this points to the key in .obsm that stores the corrected spatial coordinates.

Returns

Nothing but updates the spatial coordinates either inplace or with the result_key key based on the major axis identified via PCA.

spateo.tools.pca_align(X: numpy.ndarray) Tuple[numpy.ndarray, numpy.ndarray][source]#

Use pca to rotate a coordinate matrix to reveal the largest variance on each dimension.

This can be used to correct, for example, embryo slices to the right orientation.

Parameters
X

The input coordinate matrix.

Returns

Y: The rotated coordinate matrix that has the major variances on each dimension.

R: The rotation matrix that was used to convert the input X matrix to output Y matrix.
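
A minimal sketch of such a rotation, assuming the coordinates are centered first and the principal axes are taken from an SVD. The function pca_align_sketch and the toy slice are hypothetical; the real pca_align may differ, for example in whether it centers the data. Note that the matrix R obtained this way is orthogonal but may include a reflection.

```python
import numpy as np


def pca_align_sketch(X):
    """Rotate coordinates so that the axes follow the directions of largest variance."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    R = Vt.T
    Y = X_centered @ R
    return Y, R


# Toy slice whose long axis runs along the diagonal instead of the x axis.
t = np.linspace(-5.0, 5.0, 21)
X_slice = np.column_stack([t, t + 0.1 * np.sin(7.0 * t)])
Y_slice, R_slice = pca_align_sketch(X_slice)
```

After alignment nearly all of the variance of the toy slice lies along the first coordinate.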

spateo.tools.procrustes(X: numpy.ndarray, Y: numpy.ndarray, scaling: bool = True, reflection: str = 'best') Tuple[float, numpy.ndarray, dict][source]#

A port of MATLAB’s procrustes function to Numpy.

This function will need to be rewritten just with scipy.spatial.procrustes and scipy.linalg.orthogonal_procrustes later.

Procrustes analysis determines a linear transformation (translation, reflection, orthogonal rotation and scaling) of the points in Y to best conform them to the points in matrix X, using the sum of squared errors as the goodness of fit criterion.

d, Z, [tform] = procrustes(X, Y)

Parameters
X

Matrix of target coordinates. X and Y must have equal numbers of points (rows), but Y may have fewer dimensions (columns) than X.

Y

Matrix of input coordinates. X and Y must have equal numbers of points (rows), but Y may have fewer dimensions (columns) than X.

scaling

If False, the scaling component of the transformation is forced to 1.

reflection

if ‘best’ (default), the transformation solution may or may not include a reflection component, depending on which fits the data best. setting reflection to True or False forces a solution with reflection or no reflection respectively.

Returns

d: the residual sum of squared errors, normalized according to a measure of the scale of X, ((X - X.mean(0))**2).sum().

Z: the matrix of transformed Y-values.

tform: a dict specifying the rotation, translation and scaling that maps X –> Y.
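
For intuition, the default case (scaling enabled, reflection chosen by best fit, X and Y with the same number of columns) can be sketched as below. The function procrustes_sketch is a hypothetical, reduced illustration of the standard SVD solution and does not reproduce every option of the function documented above; the dictionary keys are assumptions of this sketch.

```python
import numpy as np


def procrustes_sketch(X, Y):
    """Fit Z = b * Y @ T + c to X by least squares (default Procrustes case only)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    muX, muY = X.mean(axis=0), Y.mean(axis=0)
    X0, Y0 = X - muX, Y - muY
    normX, normY = np.linalg.norm(X0), np.linalg.norm(Y0)
    X0, Y0 = X0 / normX, Y0 / normY
    # Optimal orthogonal transformation from the SVD of the cross-product matrix.
    U, s, Vt = np.linalg.svd(X0.T @ Y0, full_matrices=False)
    T = Vt.T @ U.T
    trace = s.sum()
    b = trace * normX / normY      # optimal scale
    d = 1.0 - trace ** 2           # residual, normalized by the scale of X
    Z = normX * trace * (Y0 @ T) + muX
    c = muX - b * (muY @ T)        # translation
    return d, Z, {"rotation": T, "scale": b, "translation": c}


# Y is a rotated, doubled and shifted copy of X, so the fit should be exact.
X_toy = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
angle = 0.3
rot = np.array([[np.cos(angle), -np.sin(angle)], [np.sin(angle), np.cos(angle)]])
Y_toy = 2.0 * X_toy @ rot + np.array([5.0, -3.0])
d_toy, Z_toy, tform_toy = procrustes_sketch(X_toy, Y_toy)
```

Because the toy Y differs from X only by a similarity transformation, the residual is zero up to rounding and the recovered scale is 0.5.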

spateo.tools.construct_geodesic_distance_matrix(adata: anndata.AnnData, spatial_key: str = 'spatial', nbr_object: sklearn.neighbors.NearestNeighbors = None, method: str = 'ball_tree', n_neighbors: int = 30, min_dist_threshold: Optional[float] = None, max_dist_threshold: Optional[float] = None) anndata.AnnData[source]#

Given an AnnData object and a key to an array of x- and y-coordinates, compute the geodesic distance between each sample and its nearest neighbors (geodesic distance is the shortest path between vertices, where paths are lines in space that connect points).

Parameters
adata

AnnData object.

spatial_key

Key in .obsm in which x- and y-coordinates are stored.

nbr_object

An optional sklearn.neighbors.NearestNeighbors object. Can optionally create a nearest neighbor object with custom functionality.

method

Specifies algorithm to use in computing neighbors using sklearn’s implementation. Options: “ball_tree” and “kd_tree”.

n_neighbors

For each bucket, number of neighbors to include in the distance matrix.

min_dist_threshold

Optional, sets the max allowable distance that a cell can be from its nearest neighbor to avoid being filtered out. Used to remove singular isolated cells.

max_dist_threshold

Optional, used to remove clusters of isolated cells close to one another but far from all other cells.

Returns

adata: Input AnnData object with spatial distance matrix and geodesic distance matrix in .obsp.
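
The notion of geodesic distance used here can be illustrated with a shortest-path search over a k-nearest-neighbor graph. The sketch below is a hypothetical, brute-force version using Dijkstra's algorithm on a symmetric graph; it ignores the distance thresholds and is not how spateo computes the matrix.

```python
import heapq

import numpy as np


def geodesic_distances_sketch(coords, n_neighbors=2):
    """Shortest-path distances along a symmetric k-nearest-neighbor graph."""
    coords = np.asarray(coords, dtype=float)
    n = coords.shape[0]
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Build the graph: each point is linked to its n_neighbors nearest points.
    neighbors = [set() for _ in range(n)]
    for i in range(n):
        for j in np.argsort(dist[i])[1:n_neighbors + 1]:  # index 0 is the point itself
            neighbors[i].add(int(j))
            neighbors[int(j)].add(i)
    geodesic = np.full((n, n), np.inf)
    for source in range(n):
        # Dijkstra's algorithm from each source point.
        geodesic[source, source] = 0.0
        heap = [(0.0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > geodesic[source, u]:
                continue  # stale heap entry
            for v in neighbors[u]:
                nd = d + dist[u, v]
                if nd < geodesic[source, v]:
                    geodesic[source, v] = nd
                    heapq.heappush(heap, (nd, v))
    return geodesic


# Points along an L-shaped path: the geodesic route must go around the corner.
L_coords = np.array([[0, 0], [1, 0], [2, 0], [2, 1], [2, 2]], dtype=float)
geo = geodesic_distances_sketch(L_coords, n_neighbors=2)
```

Between the two ends of the L the geodesic distance is 4, whereas the straight-line distance is about 2.83.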

spateo.tools.construct_nn_graph(adata: anndata.AnnData, spatial_key: str = 'spatial', dist_metric: str = 'euclidean', n_neighbors: int = 8, exclude_self: bool = True, save_id: Union[None, str] = None) None[source]#

Construct a bucket-to-bucket nearest neighbors graph.

Parameters
adata

An anndata object.

spatial_key

Key in .obsm in which x- and y-coordinates are stored.

dist_metric

Distance metric to use. Options: ‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘dice’, ‘euclidean’, ‘hamming’, ‘jaccard’, ‘jensenshannon’, ‘kulczynski1’, ‘mahalanobis’, ‘matching’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘yule’.

n_neighbors

Number of nearest neighbors to compute for each bucket.

exclude_self

Set True to set elements along the diagonal to zero.

save_id

Optional string; if not None, will save the distance matrix and neighbors matrix to the paths ‘./neighbors/{save_id}_distance.csv’ and ‘./neighbors/{save_id}_neighbors.csv’, respectively.
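
A nearest neighbors graph of this kind can be sketched with brute-force Euclidean distances. The function knn_graph_sketch and its dense return values are hypothetical; how construct_nn_graph stores its results in adata is not shown here.

```python
import numpy as np


def knn_graph_sketch(coords, n_neighbors=8, exclude_self=True):
    """Dense adjacency and distance matrices of a k-nearest-neighbor graph."""
    coords = np.asarray(coords, dtype=float)
    n = coords.shape[0]
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    work = dist.copy()
    if exclude_self:
        np.fill_diagonal(work, np.inf)  # a bucket is never its own neighbor
    idx = np.argsort(work, axis=1)[:, :n_neighbors]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = idx.ravel()
    adjacency = np.zeros((n, n))
    distances = np.zeros((n, n))
    adjacency[rows, cols] = 1.0
    distances[rows, cols] = dist[rows, cols]
    return adjacency, distances


# Four buckets on a line; each one is linked to its single nearest neighbor.
line_coords = [[0, 0], [1, 0], [3, 0], [7, 0]]
adjacency, distances = knn_graph_sketch(line_coords, n_neighbors=1)
```

Note that the resulting graph is directed: the last bucket points to the third one, but not the other way round.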

spateo.tools.construct_spatial_distance_matrix(adata: anndata.AnnData, spatial_key: str = 'spatial', dist_metric: str = 'euclidean', min_dist_threshold: Optional[float] = None, max_dist_threshold: Optional[float] = None) anndata.AnnData[source]#

Given AnnData object and key to array of x- and y-coordinates, compute pairwise spatial distances between all samples.

Parameters
adata

An AnnData object.

spatial_key

Key in .obsm in which x- and y-coordinates are stored.

dist_metric

Distance metric to use. Options: ‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘dice’, ‘euclidean’, ‘hamming’, ‘jaccard’, ‘jensenshannon’, ‘kulczynski1’, ‘mahalanobis’, ‘matching’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘yule’.

min_dist_threshold

Optional, sets the max allowable distance that a cell can be from its nearest neighbor to avoid being filtered out. Used to remove singular isolated cells.

max_dist_threshold

Optional, used to remove clusters of isolated cells close to one another but far from all other cells.

Returns

adata: Input AnnData object with spatial distance matrix in .obsp.

spateo.tools.generate_spatial_weights_fixed_nbrs(adata: anndata.AnnData, spatial_key: str = 'spatial', num_neighbors: int = 10, method: str = 'ball_tree', decay_type: str = 'reciprocal', nbr_object: sklearn.neighbors.NearestNeighbors = None) Union[Tuple[scipy.sparse.csr_matrix, scipy.sparse.csr_matrix, anndata.AnnData]][source]#

Starting from a k-nearest neighbor graph, generate a weighted nearest neighbor graph whose edge weights are set according to decay_type.

Parameters
spatial_key

Key in .obsm where x- and y-coordinates are stored.

num_neighbors

Number of neighbors each bucket has.

method

Specifies algorithm to use in computing neighbors using sklearn’s implementation. Options: “ball_tree” and “kd_tree”.

decay_type

Sets method by which edge weights are defined. Options: “reciprocal”, “ranked”, “uniform”.

Returns

out_graph: Weighted k-nearest neighbors graph with shape [n_samples, n_samples].

distance_graph: Unweighted graph with shape [n_samples, n_samples].

adata: Updated AnnData object containing ‘spatial_distances’, ‘spatial_weights’, ‘spatial_connectivities’ in .obsp and ‘spatial_neighbors’ in .uns.

spateo.tools.generate_spatial_weights_fixed_radius(adata: anndata.AnnData, spatial_key: str = 'spatial', p: float = 0.05, sigma: float = 100, nbr_object: sklearn.neighbors.NearestNeighbors = None, method: str = 'ball_tree', verbose: bool = False) Tuple[scipy.sparse.csr_matrix, scipy.sparse.csr_matrix, anndata.AnnData][source]#

Starting from a radius-based neighbor graph, generate a sparse graph (csr format) with weighted edges, where edge weights decay with distance.

Note that decay is assumed to follow a Gaussian distribution.

Parameters
spatial_key

Key in .obsm where x- and y-coordinates are stored.

p

Cutoff for Gaussian (used to find where distribution drops below p * (max_value)).

sigma

Standard deviation of the Gaussian.

method

Specifies algorithm to use in computing neighbors using sklearn’s implementation. Options: “ball_tree” and “kd_tree”.

Returns

out_graph: Weighted nearest neighbors graph with shape [n_samples, n_samples].

distance_graph: Unweighted graph with shape [n_samples, n_samples].

adata: Updated AnnData object containing ‘spatial_distances’, ‘spatial_weights’, ‘spatial_connectivities’ in .obsp and ‘spatial_neighbors’ in .uns.
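
The relationship between p, sigma and the search radius can be made explicit. Assuming an unnormalized Gaussian exp(-r^2 / (2 sigma^2)) with peak height 1, the radius at which it drops to p is sigma * sqrt(-2 ln p). The helper names below are hypothetical, and the exact parameterization used by spateo is an assumption of this sketch.

```python
import numpy as np


def gaussian_cutoff_radius(p=0.05, sigma=100.0):
    """Distance at which exp(-r^2 / (2 sigma^2)) falls to p times its peak."""
    return sigma * np.sqrt(-2.0 * np.log(p))


def gaussian_weights(distances, p=0.05, sigma=100.0):
    """Gaussian edge weights, set to zero beyond the cutoff radius."""
    distances = np.asarray(distances, dtype=float)
    radius = gaussian_cutoff_radius(p, sigma)
    weights = np.exp(-distances ** 2 / (2.0 * sigma ** 2))
    weights[distances > radius] = 0.0
    return weights


radius = gaussian_cutoff_radius(p=0.05, sigma=100.0)
weights = gaussian_weights([0.0, 100.0, 300.0], p=0.05, sigma=100.0)
```

With the default p = 0.05 and sigma = 100 the cutoff radius is roughly 245 in the units of the coordinates, so a bucket 300 units away receives weight 0.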

spateo.tools.weighted_expr_neighbors_graph(adata: anndata.AnnData, nbr_object: sklearn.neighbors.NearestNeighbors = None, basis: str = 'pca', n_neighbors_method: str = 'ball_tree', n_pca_components: int = 30, num_neighbors: int = 30, decay_type: str = 'reciprocal') Tuple[scipy.sparse.csr_matrix, scipy.sparse.csr_matrix, anndata.AnnData][source]#

Given an AnnData object, compute distance array in gene expression space.

Parameters
adata

an anndata object.

nbr_object

An optional sklearn.neighbors.NearestNeighbors object. Can optionally create a nearest neighbor object with custom functionality.

basis

The space that will be used for nearest neighbor search. Valid names include, for example, pca, umap, or X. Defaults to ‘pca’.

n_neighbors_method

Specifies algorithm to use in computing neighbors using sklearn’s implementation. Options: “ball_tree” and “kd_tree”. Defaults to ‘ball_tree’.

n_pca_components

Only used if ‘basis’ is ‘pca’. Sets number of principal components to compute.

num_neighbors

Number of neighbors for each bucket, used in computing distance graph

decay_type

Sets method by which edge weights are defined. Options: “reciprocal”, “ranked”, “uniform”.

Returns

out_graph: Weighted k-nearest neighbors graph with shape [n_samples, n_samples].

distance_graph: Unweighted graph with shape [n_samples, n_samples].

adata: Updated AnnData object containing ‘spatial_distance’ in .obsp and ‘spatial_neighbors’ in .uns.

spateo.tools.weighted_spatial_graph(adata: anndata.AnnData, spatial_key: str = 'spatial', fixed: str = 'n_neighbors', n_neighbors_method: str = 'ball_tree', n_neighbors: int = 30, decay_type: str = 'reciprocal', p: float = 0.05, sigma: float = 100) Tuple[scipy.sparse.csr_matrix, scipy.sparse.csr_matrix, anndata.AnnData][source]#

Given an AnnData object, compute distance array with either a fixed number of neighbors for each bucket or a fixed search radius for each bucket.

Additional note: parameters ‘p’ and ‘sigma’ (used only if ‘fixed’ is ‘radius’) modulate the radius when defining neighbors using a fixed radius. ‘sigma’ is the standard deviation (e.g. in pixels, micrometers, etc.) of a Gaussian distribution with peak height ‘a’ that is centered at a particular bucket. To search for that bucket’s neighbors, ‘p’ is the cutoff height of the Gaussian, as a proportion of the peak height ‘a’. Essentially, to define the radius that should be used for all buckets, this function measures how far out from each bucket you would need to go before the Gaussian decays to e.g. 0.05 of its peak height. With knowledge of e.g. diffusion kinetics for particular soluble factors, the neighborhood can be defined taking this into account.

Parameters
adata

an anndata object.

spatial_key

Key in .obsm containing coordinates for each bucket.

fixed

Options: ‘n_neighbors’, ‘radius’- sets either fixed number of neighbors or fixed search radius for each bucket.

n_neighbors_method

Specifies algorithm to use in computing neighbors using sklearn’s implementation. Options: “ball_tree” and “kd_tree”. Unused unless ‘fixed’ is ‘n_neighbors’.

n_neighbors

Number of neighbors each bucket has. Unused unless ‘fixed’ is ‘n_neighbors’.

decay_type

Sets method by which edge weights are defined. Options: “reciprocal”, “ranked”, “uniform”. Unused unless ‘fixed’ is ‘n_neighbors’.

p

Cutoff for Gaussian (used to find where distribution drops below p * (max_value)). Unused unless ‘fixed’ is ‘radius’.

sigma

Standard deviation of the Gaussian. Unused unless ‘fixed’ is ‘radius’.

Returns

out_graph: Weighted nearest neighbors graph with shape [n_samples, n_samples].

distance_graph: Unweighted graph with shape [n_samples, n_samples].

adata: Updated AnnData object containing ‘spatial_distances’, ‘spatial_weights’, ‘spatial_connectivities’ in .obsp and ‘spatial_neighbors’ in .uns.

spateo.tools.glm_degs(adata: anndata.AnnData, X_data: Optional[numpy.ndarray] = None, genes: Optional[list] = None, layer: Optional[str] = None, key_added: str = 'glm_degs', fullModelFormulaStr: str = '~cr(time, df=3)', reducedModelFormulaStr: str = '~1', qval_threshold: Optional[float] = 0.05, llf_threshold: Optional[float] = -2000, ci_alpha: float = 0.05, inplace: bool = True) Optional[anndata.AnnData][source]#

Differential gene expression tests using generalized linear regressions. Only a size factor normalized gene expression matrix can be used here; SCT/Pearson residuals transformed gene expression cannot be used.

Tests each gene for differential expression as a function of integral time (the time estimated via the reconstructed vector field function) or pseudo-time using generalized additive models with natural spline basis. This function can also use other covariates as specified in the full (e.g. ~clusters) and reduced model formula to identify differentially expressed genes across different categories, groups, etc. glm_degs relies on the statsmodels package and is adapted from the differentialGeneTest function in Monocle. Note that glm_degs supports performing DEG analysis for any layer or normalized data in your adata object. That is, you can use the total, new, unspliced, velocity, etc. layers for the differential expression analysis.

Parameters
adata

An Anndata object. The anndata object must contain a size factor normalized gene expression matrix.

X_data

The user supplied data that will be used for differential expression analysis directly.

genes

The list of genes that will be used to subset the data for differential expression analysis. If genes = None, all genes will be used.

layer

The layer that will be used to retrieve data for differential expression analysis. If layer = None, .X is used.

key_added

The key that will be used for the glm_degs key in .uns.

fullModelFormulaStr

A formula string specifying the full model in differential expression tests (i.e. likelihood ratio tests) for each gene/feature.

reducedModelFormulaStr

A formula string specifying the reduced model in differential expression tests (i.e. likelihood ratio tests) for each gene/feature.

qval_threshold

Only keep the glm test results whose qval is less than the qval_threshold.

llf_threshold

Only keep the glm test results whose log-likelihood is less than the llf_threshold.

ci_alpha

The significance level for the confidence interval. The default ci_alpha = .05 returns a 95% confidence interval.

inplace

Whether to copy adata or modify it inplace.

Returns

An AnnData object is updated/copied with the key_added dictionary in the .uns attribute, storing the differential expression test results after the GLM test.
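The comparison of a full and a reduced model formula is a likelihood ratio test. As a minimal sketch of that idea only (not the glm_degs implementation, which according to the description above relies on statsmodels), the following standard-library example tests a Gaussian model with one two-level covariate. The function name `gaussian_lrt_two_groups` is hypothetical.

```python
import math


def gaussian_lrt_two_groups(y, group):
    """LRT of the full model y ~ 1 + group against the reduced model y ~ 1.

    Assumes Gaussian errors and exactly two group levels (1 degree of freedom).
    """
    levels = sorted(set(group))
    if len(levels) != 2:
        raise ValueError("this sketch only handles a two-level covariate")
    n = len(y)
    grand_mean = sum(y) / n
    # Reduced model: one common mean for all observations.
    rss_reduced = sum((v - grand_mean) ** 2 for v in y)
    # Full model: one mean per group level.
    rss_full = 0.0
    for level in levels:
        vals = [v for v, g in zip(y, group) if g == level]
        mean = sum(vals) / len(vals)
        rss_full += sum((v - mean) ** 2 for v in vals)
    # 2 * (llf_full - llf_reduced) with the ML variance estimate RSS / n.
    lr_stat = max(0.0, n * math.log(rss_reduced / rss_full))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    pval = math.erfc(math.sqrt(lr_stat / 2.0))
    return lr_stat, pval
```

A gene whose values differ clearly between the two groups yields a large statistic and a small p-value; a gene with identical group means yields a statistic of 0 and a p-value of 1.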

class spateo.tools.Label(labels_dense: Union[numpy.ndarray, list], str_map: Union[None, dict] = None, verbose: bool = False)[source]#

Bases: object

Given categorizations for a set of points, wrap into a Label class.

labels_dense: Numerical labels.

str_map: Optional mapping of numerical labels (keys) to strings (values).

verbose: Whether to print running info of row_normalize.

__repr__() str#

Return repr(self).

__str__() str#

Return str(self).

get_onehot() scipy.sparse.csr_matrix#

Return the one-hot sparse array of labels. If it has not already been computed, generate the sparse array from the dense label array.

get_normalized_onehot() scipy.sparse.csr_matrix#

Return normalized one-hot sparse array of labels.

generate_normalized_onehot() scipy.sparse.csr_matrix#

Generate a normalized one-hot matrix in which each row is normalized by the count of that label, e.g. a row [0 1 1 0 0] will be converted to [0 0.5 0.5 0 0].

generate_onehot() scipy.sparse.csr_matrix#

Convert an array of labels to a num_labels x num_samples sparse one-hot matrix. Labels MUST be integers starting from 0, but can have gaps in between, e.g. [0,1,5,9].
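The one-hot and row-normalized one-hot matrices can be illustrated with a small dense NumPy sketch. This is an illustration under assumptions, not the Label class code: it uses dense arrays instead of scipy.sparse, creates one row per distinct label (so gaps in the label ids produce no empty rows), and the helper names `dense_onehot` and `row_normalize` are hypothetical.

```python
import numpy as np


def dense_onehot(labels):
    """Return the sorted distinct labels and a (num_labels x num_samples) one-hot array."""
    labels = np.asarray(labels)
    ids = np.unique(labels)
    # Row r is 1 wherever the sample carries label ids[r].
    onehot = (labels[None, :] == ids[:, None]).astype(float)
    return ids, onehot


def row_normalize(onehot):
    """Divide each row by the number of samples carrying that label."""
    counts = onehot.sum(axis=1, keepdims=True)
    return onehot / counts


ids, onehot = dense_onehot([0, 5, 5, 1, 9])
normalized = row_normalize(onehot)
```

Here the row for label 5 is [0, 1, 1, 0, 0] before normalization and [0, 0.5, 0.5, 0, 0] afterwards.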

spateo.tools.create_label_class(adata: anndata.AnnData, cat_key: Union[str, List[str]]) Union[Label, List[Label]][source]#

Wraps categorical labels into custom Label class for downstream processing.

Parameters
adata

An anndata object.

cat_key

Keys in .obs containing categorical labels. This function and the Label class provide the most utility when this is used in conjunction with the results of multiple different runs of the Louvain algorithm.

Returns

Either an object of Label class or a list where each element is an object of Label class. Will return a list if given multiple arguments to 'cat_key'.

Return type

label

spateo.tools.GM_lag_model(adata: anndata.AnnData, group: str, spatial_key: str = 'spatial', genes: Tuple[None, list] = None, drop_dummy: Tuple[None, str] = None, n_neighbors: int = 5, layer: Tuple[None, str] = None, copy: bool = False, n_jobs=30)[source]#

Spatial lag model with spatial two stage least squares (S2SLS) with results and diagnostics; Anselin (1988).

\[ \log{P_i} = \alpha + \rho \log{P_{lag-i}} + \sum_k \beta_k X_{ki} + \epsilon_i \]

Reference:

https://geographicdata.science/book/notebooks/11_regression.html

http://darribas.org/gds_scipy16/ipynb_md/08_spatial_regression.html

Args:

adata: An adata object that has spatial information (via spatial_key key in adata.obsm).

group: The key to the cell group in the adata object.

spatial_key: The spatial key of the spatial coordinate of each bucket.

genes: The genes that will be used for S2SLS analyses, must be included in the data.

drop_dummy: The name of the dummy group.

n_neighbors: The number of nearest neighbors of each bucket that will be used in calculating the spatial lag.

layer: The key to the layer. If it is None, adata.X will be used by default.

copy: Whether to copy the adata object.

Returns:

Depending on the copy argument, return a deep-copied adata object (when copy = True) or an in-place updated adata object. The resulting adata will include the following new columns in adata.var:

{*}_GM_lag_coeff: coefficient of GM test for each cell group (denoted by {*})

{*}_GM_lag_zstat: z-score of GM test for each cell group (denoted by {*})

{*}_GM_lag_pval: p-value of GM test for each cell group (denoted by {*})

Examples:

>>> import spateo as st
>>> st.tl.GM_lag_model(adata, group='simpleanno')
>>> coef_cols = adata.var.columns[adata.var.columns.str.endswith('_GM_lag_coeff')]
>>> adata.var.loc[["Hbb-bt", "Hbb-bh1", "Hbb-y", "Hbb-bs"], :].T
>>> for i in coef_cols[1:-1]:
...     print(i)
...     top_markers = adata.var.sort_values(i, ascending=False).index[:5]
...     st.pl.space(adata, basis='spatial', color=top_markers, ncols=5, pointsize=0.1, alpha=1)
...     st.pl.space(adata.copy(), basis='spatial', color=['simpleanno'],
...                 highlights=[i.split('_GM_lag_coeff')[0]], pointsize=0.1, alpha=1, show_legend='on data')
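The "spatial lag" used by this model is the neighbor-weighted average of a variable. As a rough sketch, assuming row-standardized k-nearest-neighbor weights (equal weight 1/k per neighbor) and Euclidean distances, it can be computed as below. The helper name `knn_spatial_lag` is hypothetical and is not part of spateo.

```python
import numpy as np


def knn_spatial_lag(coords, values, n_neighbors=5):
    """Average `values` over each point's n_neighbors nearest neighbors."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    # Pairwise Euclidean distances; a point is never its own neighbor.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    neighbors = np.argsort(dist, axis=1)[:, :n_neighbors]
    # Row-standardized weights reduce to a plain mean over the neighbors.
    return values[neighbors].mean(axis=1)


lag = knn_spatial_lag([[0, 0], [1, 0], [3, 0], [7, 0]], [1, 2, 3, 4], n_neighbors=2)
```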

spateo.tools.lisa_geo_df(adata: anndata.AnnData, gene: str, spatial_key: str = 'spatial', n_neighbors: int = 8, layer: Tuple[None, str] = None) geopandas.GeoDataFrame[source]#

Perform Local Indicators of Spatial Association (LISA) analyses on specific genes and prepare a geopandas dataframe for downstream lisa plots to reveal the quantile plots and the hotspot, coldspot, doughnut and diamond regions.

Parameters
adata

An adata object that has spatial information (via spatial_key key in adata.obsm).

gene

The gene that will be used for lisa analyses, must be included in the data.

spatial_key

The spatial key of the spatial coordinate of each bucket.

n_neighbors

The number of nearest neighbors of each bucket that will be used in calculating the spatial lag.

layer

The key to the layer. If it is None, adata.X will be used by default.

Returns

A geopandas dataframe that includes the coordinates (x, y columns), expression (exp column), lagged expression (w_exp column), z-scores (exp_zscore, w_exp_zscore) and the LISA score (Is column).

Return type

df
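To make the hotspot, coldspot, doughnut and diamond terminology concrete, here is a minimal NumPy sketch of a local Moran statistic and the usual quadrant labels (high value with high neighbors, low with low, low with high, high with low). It assumes a user-supplied neighbor weight matrix that is row-standardized inside the function; the function name `local_moran_quadrants` is hypothetical, and significance testing by permutation is omitted.

```python
import numpy as np


def local_moran_quadrants(values, weights):
    """Return the local Moran statistic and a quadrant label for each point."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum(axis=1, keepdims=True)  # row-standardize the weights
    z = (x - x.mean()) / x.std()          # z-scored values
    lag = w @ z                           # spatially lagged z-scores
    local_i = z * lag
    labels = np.where(
        z > 0,
        np.where(lag > 0, "hotspot", "diamond"),    # high value: high / low neighbors
        np.where(lag > 0, "doughnut", "coldspot"),  # low value: high / low neighbors
    )
    return local_i, labels


# Six points on a chain, each connected to its immediate neighbors.
chain = np.eye(6, k=1) + np.eye(6, k=-1)
local_i, labels = local_moran_quadrants([9, 8, 2, 9, 1, 2], chain)
```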

spateo.tools.local_moran_i(adata: anndata.AnnData, group: str, spatial_key: str = 'spatial', genes: Tuple[None, list] = None, layer: Tuple[None, str] = None, n_neighbors: int = 5, copy: bool = False, n_jobs: int = 30)[source]#

Identify cell type specific genes with local Moran’s I test.

Parameters
adata

An adata object that has spatial information (via spatial_key key in adata.obsm).

group

The key to the cell group in the adata.obs.

spatial_key

The spatial key of the spatial coordinate of each bucket.

genes

The genes that will be used for LISA analyses; they must be included in the data.

layer

The key to the layer. If it is None, adata.X will be used by default.

n_neighbors

The number of nearest neighbors of each bucket that will be used in calculating the spatial lag.

copy

Whether to copy the adata object.

Returns

Depending on the copy argument, return a deep-copied adata object (when copy = True) or an in-place updated adata object. The resulting adata will include the following new columns in adata.var:

{*}_num_val: The maximum number of categories ({"hotspot", "coldspot", "doughnut", "diamond"}) across all cell groups.

{*}_frac_val: The maximum fraction of categories across all cell groups.

{*}_spec_val: The maximum specificity of categories across all cell groups.

{*}_num_group: The corresponding cell group with the largest number of each category (this can be affected by the cell group size).

{*}_frac_group: The corresponding cell group with the highest fraction of each category.

{*}_spec_group: The corresponding cell group with the highest specificity of each category.

{*} can be one of {"hotspot", "coldspot", "doughnut", "diamond"}.

Examples:

>>> import pandas as pd
>>> import spateo as st
>>> # `group` is the same .obs key that was passed to local_moran_i
>>> markers_df = (
...     pd.DataFrame(adata.var)
...     .query("hotspot_frac_val > 0.05 & mean > 0.05")
...     .groupby(['hotspot_spec_group'])['hotspot_spec_val']
...     .nlargest(5)
... )
>>> markers = markers_df.index.get_level_values(1)
>>>
>>> for i in adata.obs[group].unique():
...     if i in markers_df.index.get_level_values(0):
...         print(markers_df[i])
...         st.pl.space(adata, color=group, highlights=[i], pointsize=0.1, alpha=1, figsize=(12, 8))
...         st.pl.space(adata, color=markers_df[i].index, pointsize=0.1, alpha=1, figsize=(12, 8))

class spateo.tools.LiveWireSegmentation(image: Optional = None, smooth_image: bool = False, threshold_gradient_image: bool = False)[source]#

Bases: object

property image#
_smooth_image()#
_compute_gradient_image()#
_threshold_gradient_image()#
_compute_graph()#
compute_shortest_path(startPt, endPt)#

spateo.tools.compute_shortest_path(image: numpy.ndarray, startPt: Tuple[float, float], endPt: Tuple[float, float]) List[source]#

Inline function for easier computation of shortest_path in an image. This function will create a new instance of LiveWireSegmentation class every time it is called, calling for a recomputation of the gradient image and the shortest path graph. If you need to compute the shortest path in one image more than once, use the class-form initialization instead.

Parameters
image

image on which the shortest path should be computed

startPt

starting point for path computation

endPt

target point for path computation

Returns

shortest path as a list of tuples (x, y), including startPt and endPt

Return type

path

spateo.tools.live_wire(image: numpy.ndarray, smooth_image: bool = False, threshold_gradient_image: bool = False, interactive: bool = True) List[numpy.ndarray]#

Use the LiveWire segmentation algorithm (also known as intelligent scissors) for image segmentation. The general idea of the algorithm is to use image information for segmentation and avoid crossing object boundaries. A gradient image highlights the boundaries, and Dijkstra's shortest path algorithm computes a path using gradient differences as segment costs. Thus the line avoids strong gradients in the gradient image, which corresponds to following object boundaries in the original image.

The image is displayed using the matplotlib front end. A click on the image starts LiveWire segmentation. The suggested best segmentation appears as you move the mouse across the image. To accept a suggestion, click on the image a second time. To finish the segmentation, press the Escape key.

Parameters
image

image on which the shortest path should be computed.

smooth_image

Whether to smooth the original image using bilateral smoothing filter.

threshold_gradient_image

Whether to use Otsu's method to generate a thresholded gradient image for shortest path computation.

interactive

Whether to generate the path interactively.

Returns

A list of paths that are generated when running this algorithm. Paths can be used to segment a particular spatial domain of interest.
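The shortest-path step at the core of LiveWire can be sketched with Dijkstra's algorithm on a pixel grid. The example below is a simplified illustration, not the LiveWireSegmentation code: it assumes a precomputed per-pixel cost grid, 4-connected moves, and (row, col) coordinates, and the function name `grid_shortest_path` is hypothetical.

```python
import heapq


def grid_shortest_path(cost, start, end):
    """Dijkstra on a 4-connected grid; entering pixel (r, c) costs cost[r][c]."""
    rows, cols = len(cost), len(cost[0])
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale queue entry
        if node == end:
            break
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = node
                    heapq.heappush(heap, (nd, (nr, nc)))
    # Walk back from the end point to recover the path.
    path = [end]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


# The expensive middle column is avoided by going around it.
cost = [[1, 9, 1],
        [1, 9, 1],
        [1, 1, 1]]
path = grid_shortest_path(cost, (0, 0), (0, 2))
```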

spateo.tools.center_align(init_center_sample: anndata.AnnData, samples: List[anndata.AnnData], layer: str = 'X', genes: Optional[Union[list, numpy.ndarray]] = None, spatial_key: str = 'spatial', lmbda: Optional[numpy.ndarray] = None, alpha: float = 0.1, n_components: int = 15, threshold: float = 0.001, max_iter: int = 10, numItermax: int = 200, numItermaxEmd: int = 100000, dissimilarity: str = 'kl', norm: bool = False, random_seed: Optional[int] = None, pis_init: Optional[List[numpy.ndarray]] = None, distributions: Optional[List[numpy.ndarray]] = None, dtype: str = 'float32', device: str = 'cpu') Tuple[anndata.AnnData, List[numpy.ndarray]][source]#

Computes center alignment of slices.

Parameters
init_center_sample

Sample to use as the initialization for center alignment; Make sure to include gene expression and spatial information.

samples

List of samples to use in the center alignment.

layer

If ‘X’, uses sample.X to calculate dissimilarity between spots, otherwise uses the representation given by sample.layers[layer].

genes

Genes used for calculation. If None, use all common genes for calculation.

spatial_key

The key in .obsm that corresponds to the raw spatial coordinates.

lmbda

List of probability weights assigned to each slice; If None, use uniform weights.

alpha

Alignment tuning parameter. Note: 0 <= alpha <= 1. When alpha = 0, only the gene expression data is taken into account; when alpha = 1, only the spatial coordinates are taken into account.

n_components

Number of components in NMF decomposition.

threshold

Threshold for convergence of W and H during NMF decomposition.

max_iter

Maximum number of iterations for our center alignment algorithm.

numItermax

Max number of iterations for cg during FGW-OT.

numItermaxEmd

Max number of iterations for emd during FGW-OT.

dissimilarity

Expression dissimilarity measure: 'kl' or 'euclidean'.

norm

If True, scales spatial distances such that neighboring spots are at distance 1. Otherwise, spatial distances remain unchanged.

random_seed

Set random seed for reproducibility.

pis_init

Initial list of mappings between 'A' and 'slices' to pass to the solver. If not given, the mappings are calculated automatically.

distributions

Distributions of spots for each slice. Otherwise, default is uniform.

dtype

The floating-point number type. Only float32 and float64 are supported.

device

Device used to run the program. A specific GPU can also be selected, e.g. '0'.

Returns

  • Inferred center sample with full and low dimensional representations (W, H) of the gene expression matrix.

  • List of pairwise alignment mappings of the center sample (rows) to each input sample (columns).

spateo.tools.generalized_procrustes_analysis(X, Y, pi)[source]#

Finds and applies optimal rotation between spatial coordinates of two layers (may also do a reflection).

Parameters
X

np array of spatial coordinates.

Y

np array of spatial coordinates.

pi

mapping between the two layers output by PASTE.

Returns

Aligned spatial coordinates of X, Y and the mapping relations.
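A weighted Procrustes alignment of this kind can be sketched with an SVD. The example below follows the general PASTE-style recipe (center both coordinate sets with weights taken from pi, then solve for the optimal orthogonal transform); it is an illustrative sketch under those assumptions rather than the spateo function, and the name `weighted_procrustes` is hypothetical.

```python
import numpy as np


def weighted_procrustes(X, Y, pi):
    """Center X and Y using pi as weights and rotate Y onto X.

    X: (n, d) coordinates, Y: (m, d) coordinates, pi: (n, m) mapping.
    Returns centered X, centered and rotated Y, and the orthogonal matrix R.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    pi = np.asarray(pi, dtype=float)
    pi = pi / pi.sum()
    X_c = X - pi.sum(axis=1) @ X  # subtract the pi-weighted centroid of X
    Y_c = Y - pi.sum(axis=0) @ Y  # subtract the pi-weighted centroid of Y
    H = Y_c.T @ pi.T @ X_c
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T                # orthogonal; may include a reflection
    return X_c, Y_c @ R.T, R


X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 1.0], [3.0, 2.0]])
theta = 0.7
R0 = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
Y = X @ R0.T + np.array([5.0, -3.0])  # rotated and shifted copy of X
X_aligned, Y_aligned, R = weighted_procrustes(X, Y, np.eye(4) / 4)
```

With a one-to-one mapping and a pure rotation plus translation, the aligned coordinate sets coincide up to floating-point error.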

spateo.tools.mapping_aligned_coords(X: numpy.ndarray, Y: numpy.ndarray, pi: numpy.ndarray, keep_all: bool = False) Tuple[dict, dict][source]#

Optimal mapping coordinates between X and Y.

Parameters
X

Aligned spatial coordinates.

Y

Aligned spatial coordinates.

pi

Mapping between the two layers output by PASTE.

keep_all

Whether to retain all the optimal relationships obtained based only on the pi matrix. If keep_all is False, the optimal relationships are obtained based on both the pi matrix and the nearest coordinates.

Returns

Two dicts containing mapping_X, mapping_Y, pi_index and pi_value.

mapping_X is the X coordinates aligned with the Y coordinates.

mapping_Y is the Y coordinates aligned with the X coordinates.

pi_index is the index between optimal mapping points in the pi matrix.

pi_value is the value of the optimal mapping points.

spateo.tools.mapping_center_coords(modelA: anndata.AnnData, modelB: anndata.AnnData, center_key: str) dict[source]#

Optimal mapping coordinates between X and Y based on intermediate coordinates.

Parameters
modelA

modelA aligned with center model.

modelB

modelB aligned with center model.

center_key

The key in .uns that corresponds to the alignment info between modelA/modelB and center model.

Returns

A dict of raw_X, raw_Y, mapping_X, mapping_Y, pi_value.

raw_X is the raw X coordinates.

raw_Y is the raw Y coordinates.

mapping_X is the Y coordinates aligned with X coordinates.

mapping_Y is the X coordinates aligned with Y coordinates.

pi_value is the value of optimal mapping points.

spateo.tools.pairwise_align(sampleA: anndata.AnnData, sampleB: anndata.AnnData, layer: str = 'X', genes: Optional[Union[list, numpy.ndarray]] = None, spatial_key: str = 'spatial', alpha: float = 0.1, dissimilarity: str = 'kl', G_init=None, a_distribution=None, b_distribution=None, norm: bool = False, numItermax: int = 200, numItermaxEmd: int = 100000, dtype: str = 'float32', device: str = 'cpu') Tuple[numpy.ndarray, Optional[int]][source]#

Calculates and returns optimal alignment of two slices.

Parameters
sampleA

Sample A to align.

sampleB

Sample B to align.

layer

If ‘X’, uses sample.X to calculate dissimilarity between spots, otherwise uses the representation given by sample.layers[layer].

genes

Genes used for calculation. If None, use all common genes for calculation.

spatial_key

The key in .obsm that corresponds to the raw spatial coordinates.

alpha

Alignment tuning parameter. Note: 0 <= alpha <= 1. When alpha = 0, only the gene expression data is taken into account; when alpha = 1, only the spatial coordinates are taken into account.

dissimilarity

Expression dissimilarity measure: 'kl' or 'euclidean'.

G_init : array-like, optional

Initial mapping to be used in FGW-OT, otherwise default is uniform mapping.

a_distribution : array-like, optional

Distribution of sampleA spots, otherwise default is uniform.

b_distribution : array-like, optional

Distribution of sampleB spots, otherwise default is uniform.

norm

If True, scales spatial distances such that neighboring spots are at distance 1. Otherwise, spatial distances remain unchanged.

numItermax

Max number of iterations for cg during FGW-OT.

numItermaxEmd

Max number of iterations for emd during FGW-OT.

dtype

The floating-point number type. Only float32 and float64 are supported.

device

Device used to run the program. A specific GPU can also be selected, e.g. '0'.

Returns

pi: Alignment of spots.

obj: Objective function output of FGW-OT.
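The role of alpha can be illustrated by writing out a fused Gromov-Wasserstein style objective. The sketch below assumes a PASTE-style form, (1 - alpha) times the expression cost plus alpha times the spatial distortion; the exact scaling and dissimilarity used by pairwise_align may differ. The function name `fgw_objective` and the argument names M, D_a and D_b are hypothetical.

```python
import numpy as np


def fgw_objective(pi, M, D_a, D_b, alpha):
    """Evaluate (1 - alpha) * sum_ij M_ij pi_ij
    + alpha * sum_ijkl (D_a[i, k] - D_b[j, l])**2 pi_ij pi_kl.

    pi: (n, m) mapping, M: (n, m) expression dissimilarity,
    D_a: (n, n) and D_b: (m, m) within-slice spatial distances.
    """
    pi, M, D_a, D_b = (np.asarray(a, dtype=float) for a in (pi, M, D_a, D_b))
    expression_term = np.sum(M * pi)
    # diff[i, j, k, l] = D_a[i, k] - D_b[j, l]
    diff = D_a[:, None, :, None] - D_b[None, :, None, :]
    spatial_term = np.einsum("ijkl,ij,kl->", diff ** 2, pi, pi)
    return (1.0 - alpha) * expression_term + alpha * spatial_term


D = np.array([[0.0, 1.0], [1.0, 0.0]])
M = np.array([[0.0, 2.0], [2.0, 0.0]])
identity_map = np.eye(2) / 2
only_expression = fgw_objective(identity_map, M, D, D, alpha=0.0)
only_spatial = fgw_objective(identity_map, M, D, 2 * D, alpha=1.0)
```

The four-index array makes the sketch readable but scales as n^2 m^2 in memory, so it is only suitable for tiny examples.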

spateo.tools.cellbin_morani(adata_cellbin: anndata.AnnData, binsize: int, cluster_key: str = 'Celltype') pandas.DataFrame[source]#

Calculate Moran's I score for each cell type (in segmented-cell adata). Since the presence of cells is represented by boolean values, this function first summarizes the number of cells of each cell type using a given binsize, creating a spatial 2D matrix of cell counts. It then calculates Moran's I on that matrix as the spatial score for each cell type.

Parameters
adata_cellbin : AnnData

An AnnData object for segmented cells.

binsize : int

The binsize used to summarize cell counts for each celltype.

cluster_key : str (default=”Celltype”)

The key in adata.obs including celltype labels.

Returns

A pandas DataFrame containing the Moran's I score for each cell type.
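The binning step can be sketched as follows: cell coordinates are assigned to square bins of side binsize and counted, giving a 2D count matrix. This is a minimal NumPy illustration for a single cell type, not the spateo implementation, and the function name `bin_cell_counts` is hypothetical.

```python
import numpy as np


def bin_cell_counts(coords, binsize):
    """Count cells per (binsize x binsize) bin and return a 2D integer grid."""
    coords = np.asarray(coords, dtype=float)
    idx = np.floor(coords / binsize).astype(int)
    idx -= idx.min(axis=0)  # shift so the smallest bin index is 0
    grid = np.zeros(tuple(idx.max(axis=0) + 1), dtype=int)
    # np.add.at accumulates correctly when several cells share a bin.
    np.add.at(grid, (idx[:, 0], idx[:, 1]), 1)
    return grid


counts = bin_cell_counts([[0, 0], [1, 1], [5, 0], [9, 9]], binsize=5)
```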

spateo.tools.moran_i(adata: anndata.AnnData, genes: Optional[List[str]] = None, layer: Optional[str] = None, spatial_key: str = 'spatial', model: Literal[2d, 3d] = '2d', x: Optional[List[int]] = None, y: Optional[List[int]] = None, z: Optional[List[int]] = None, k: int = 5, weighted: Optional[List[str]] = None, permutations: int = 199, n_jobs: int = 1) pandas.DataFrame[source]#

Identify genes with strong spatial autocorrelation with Moran's I test. This can be used to identify genes that are potentially related to clusters.

Parameters
adata : AnnData

An AnnData object.

genes : list or None (default: None)

The list of genes that will be used to subset the data for the Moran's I test. If None, all genes will be used.

layer : str or None (default: None)

The layer that will be used to retrieve data for the Moran's I test. If None, .X is used.

spatial_key : str (default: 'spatial')

The key in .obsm that corresponds to the spatial coordinate of each cell.

x : list or None (default: None)

x-coordinates of all buckets.

y : list or None (default: None)

y-coordinates of all buckets.

z : list or None (default: None)

z-coordinates of all buckets.

k : int (default: 5)

Number of neighbors to use by default for kneighbors queries.

weighted : None or 'kernel' (default: None)

Spatial weights. The default is None; 'kernel' is based on kernel functions.

permutations : int (default: 199)

Number of random permutations for calculation of pseudo-p_values.

n_jobs : int (default: 1)

The maximum number of concurrently running jobs. If -1, all CPUs are used. If 1 is given, no parallel computing code is used at all.

Returns

A pandas DataFrame of the Moran's I test results.
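For reference, global Moran's I and a permutation-based pseudo p-value can be sketched in a few lines of NumPy. This is an illustration under assumptions (a dense, user-supplied weight matrix and a one-sided test for positive autocorrelation) and not the moran_i implementation; the function names `morans_i` and `permutation_pvalue` are hypothetical.

```python
import numpy as np


def morans_i(values, weights):
    """Global Moran's I: (n / S0) * (z' W z) / (z' z), with z the centered values."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()
    return (len(x) / w.sum()) * (z @ w @ z) / (z @ z)


def permutation_pvalue(values, weights, permutations=199, seed=0):
    """One-sided pseudo p-value: share of shuffles with I at least as large as observed."""
    rng = np.random.default_rng(seed)
    x = np.asarray(values, dtype=float)
    observed = morans_i(x, weights)
    hits = sum(morans_i(rng.permutation(x), weights) >= observed for _ in range(permutations))
    return (hits + 1) / (permutations + 1)


# Six buckets on a chain, each connected to its immediate neighbors.
chain = np.eye(6, k=1) + np.eye(6, k=-1)
clustered = morans_i([1, 1, 1, 5, 5, 5], chain)    # positive autocorrelation
alternating = morans_i([1, 5, 1, 5, 1, 5], chain)  # negative autocorrelation
```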

class spateo.tools.STGNN(adata: anndata.AnnData, spatial_key: str = 'spatial', random_seed: int = 50, add_regularization: bool = True, device: str = 'cpu')#

Graph neural network for representation learning of spatial transcriptomics data from only the gene expression matrix. Wraps preprocessing and training.

adata: class anndata.AnnData

spatial_key: Key in .obsm where x- and y-coordinates are stored

random_seed: Sets seed for all random number generators

add_regularization: Set True to include weight-based penalty term in representation learning.

device: Options: 'cpu', 'cuda:_'. Perform computations on CPU or GPU. If GPU, provide the name of the device to run computations

train_STGNN(**kwargs)#
Parameters
kwargs

Arguments that can be passed to the Trainer class.

Returns

AnnData object with the smoothed values stored in a layer, either "X_smooth_gcn" or "X_smooth_gcn_reg".

Return type

adata_output

spateo.tools.impute_and_downsample(adata: anndata.AnnData, filter_by_moran: bool = False, spatial_key: str = 'spatial', positive_ratio_cutoff: float = 0.1, imputation: bool = True, n_ds: Optional[int] = None, to_visualize: Union[None, str, List[str]] = None, cmap: str = 'magma', device: str = 'cpu', **kwargs) Tuple[anndata.AnnData, anndata.AnnData]#

Smooth gene expression distributions and downsample a spatial sample by selecting representative points from this smoothed slice.

Parameters
adata

AnnData object to model

filter_by_moran

Set True to split: for samples with highly uniform expression patterns, simple spatial smoothing will be used; for samples with localized patterns, a graph neural network will be used for smoothing. If False, the graph neural network will be applied to all genes.

spatial_key

Only used if ‘filter_by_moran’ is True; key in .obsm where x- and y-coordinates are stored.

positive_ratio_cutoff

Filter condition for genes: to be retained, each gene must be present in more than this proportion of the total number of cells.

imputation

Set True to perform imputation. If False, will only downsample.

n_ds

Optional number of cells to downsample to. If not given, downsampling is not performed.

kwargs

Additional arguments that can be provided to STGNN.train_STGNN. Options for kwargs:

  • learn_rate: Float, controls magnitude of gradient for network learning

  • dropout: Float between 0 and 1, proportion of weights in each layer to set to 0

  • act: String specifying activation function for each encoder layer. Options: "sigmoid", "tanh", "relu", "elu"

  • clip: Float between 0 and 1, threshold below which imputed feature values will be set to 0, as a percentile. Recommended between 0 and 0.1.

  • weight_decay: Float, controls degradation rate of parameters

  • epochs: Int, number of iterations of training loop to perform

  • dim_output: Int, dimensionality of the output representation

  • alpha: Float, controls influence of reconstruction loss in representation learning

  • beta: Float, weight factor to control the influence of contrastive loss in representation learning

  • theta: Float, weight factor to control the influence of the regularization term in representation learning

  • add_regularization: Bool, adds penalty term to representation learning

Returns

Input AnnData object (optional) adata_rex: (optional) adata: AnnData subsetted down to downsampled buckets.

Return type

adata_orig
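The gene filter controlled by positive_ratio_cutoff can be pictured as a simple mask over the columns of a cells-by-genes matrix. The sketch below assumes a dense array and treats any value greater than 0 as "present"; the function name `positive_ratio_mask` is hypothetical and the actual filtering code in spateo may differ.

```python
import numpy as np


def positive_ratio_mask(X, positive_ratio_cutoff=0.1):
    """Return a boolean mask of genes detected in more than the cutoff fraction of cells."""
    X = np.asarray(X)
    # Fraction of cells (rows) in which each gene (column) is positive.
    positive_ratio = (X > 0).mean(axis=0)
    return positive_ratio > positive_ratio_cutoff


X = np.array([[1, 0, 0],
              [2, 0, 0],
              [0, 0, 3],
              [1, 0, 0]])
keep = positive_ratio_mask(X, positive_ratio_cutoff=0.3)
```

In this toy matrix only the first gene passes, since it is detected in three of four cells while the third gene is detected in one of four.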

class spateo.tools.Category_Model(*args, **kwargs)#

Bases: Base_Model

Wraps all necessary methods for data loading and preparation, model initialization, parameterization, evaluation and prediction when instantiating a model for spatially-aware (but not spatially lagged) regression using categorical variables (specifically, the prevalence of categories within spatial neighborhoods) to predict the value of gene expression.

Arguments are passed to the Base_Model class. The only keyword argument that is used for this class is 'n_neighbors'.

Parameters
args

Positional arguments to the Base_Model class

kwargs

Keyword arguments to the Base_Model class

class spateo.tools.Lagged_Model(model_type: str = 'ligand', lig: Union[None, str, List[str]] = None, rec: Union[None, str, List[str]] = None, rec_ds: Union[None, str, List[str]] = None, species: Literal[human, mouse, axolotl] = 'human', normalize: bool = True, smooth: bool = False, log_transform: bool = True, *args, **kwargs)#

Bases: Base_Model

Wraps all necessary methods for data loading and preparation, model initialization, parameterization, evaluation and prediction when instantiating a model for spatially-lagged regression.

Can specify one of two models: “ligand”, which uses the spatial lag of ligand genes and the spatial lag of the regression target to predict the regression target, or “niche”, which uses the spatial lag of cell type colocalization and the spatial lag of the regression target to predict the regression target.

If "ligand" is specified, arguments to lig must be given, and it is recommended to provide species as well; the default species is human.

Arguments are passed to the Base_Model class.

Parameters
model_type

Either “ligand” or “niche”, specifies whether to fit a model that incorporates the spatial lag of ligand expression or the spatial lag of cell type colocalization.

lig

Name(s) of ligands to use as predictors

rec

Name(s) of receptors to use as regression targets. If not given, will search through database for all genes that correspond to the provided genes from ‘ligands’.

rec_ds

Name(s) of receptor-downstream genes to use as regression targets. If not given, will search through database for all genes that correspond to receptor-downstream genes.

species

Specifies L:R database to use

normalize

Perform library size normalization, to set total counts in each cell to the same number (adjust for cell size)

smooth

To correct for dropout effects, leverage gene expression neighborhoods to smooth expression

log_transform

Set True if log-transformation should be applied to expression (otherwise, will assume preprocessing/log-transform was computed beforehand)

args

Additional positional arguments to the Base_Model class

kwargs

Additional keyword arguments to the Base_Model class

run_GM_lag() Tuple[pandas.DataFrame, pandas.DataFrame, pandas.DataFrame]#

Runs spatially lagged two-stage least squares model
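As background, a spatial two-stage least squares fit for a lag model y = alpha + rho * W y + X beta + error can be sketched with NumPy. The example assumes row-standardized k-nearest-neighbor weights and uses the spatially lagged predictors W X as instruments for W y, which is a common textbook choice; it is not the run_GM_lag implementation, and the helper names `knn_weights` and `s2sls` are hypothetical.

```python
import numpy as np


def knn_weights(coords, k):
    """Row-standardized k-nearest-neighbor weight matrix (each row sums to 1)."""
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    W = np.zeros_like(dist)
    rows = np.arange(len(coords))[:, None]
    W[rows, np.argsort(dist, axis=1)[:, :k]] = 1.0 / k
    return W


def s2sls(y, X, W):
    """Return [intercept, betas..., rho] estimated by spatial 2SLS."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    ones = np.ones((len(y), 1))
    Z = np.hstack([ones, X, (W @ y)[:, None]])  # regressors, including the lag W y
    H = np.hstack([ones, X, W @ X])             # instruments
    # Stage 1: project the regressors onto the instrument space.
    Z_hat = H @ np.linalg.lstsq(H, Z, rcond=None)[0]
    # Stage 2: ordinary least squares of y on the projected regressors.
    return np.linalg.lstsq(Z_hat, y, rcond=None)[0]


rng = np.random.default_rng(0)
coords = rng.uniform(size=(40, 2))
W = knn_weights(coords, k=3)
X = rng.normal(size=(40, 1))
# Noise-free data generated from the lag model with alpha=1, beta=2, rho=0.4.
y = np.linalg.solve(np.eye(40) - 0.4 * W, 1.0 + 2.0 * X[:, 0])
estimates = s2sls(y, X, W)
```

Because the simulated data contain no noise, the estimates recover the generating parameters up to numerical precision; with real expression data they would carry sampling error and need the usual diagnostics.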

single(cur_g: str, X: pandas.DataFrame, X_variable_names: List[str], param_labels: List[str], adata: anndata.AnnData, w: numpy.ndarray, layer: Union[None, str] = None) Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]#

Defines the model run process for a single feature. Not callable by the user; all arguments are populated from the arguments passed on instantiation of the Base_Model class.

Parameters
cur_g

Name of the feature to regress on

X

Values used for the regression

X_variable_names

Names of the variables used for the regression

param_labels

Names of categories- each computed parameter corresponds to a single element in param_labels

adata

AnnData object to store results in

w

Spatial weights array

layer

Specifies the layer in AnnData to use. If None, .X is used.

Returns

coeffs: Coefficients for each categorical group for each feature

pred: Predicted values from regression for each feature

resid: Residual values from regression for each feature

class spateo.tools.Niche_LR_Model(lig: Union[None, str, List[str]], rec: Union[None, str, List[str]] = None, rec_ds: Union[None, str, List[str]] = None, species: Literal[human, mouse, axolotl] = 'human', niche_lr_r_lag: bool = True, *args, **kwargs)#

Bases: Base_Model

Wraps all necessary methods for data loading and preparation, model initialization, parameterization, evaluation and prediction when instantiating a model for spatially-aware regression using the prevalence of and connections between categories within spatial neighborhoods and the cell type-specific expression of ligands and receptors to predict the regression target.

Arguments are passed to the Base_Model class.

Parameters
lig

Name(s) of ligands to use as predictors

rec

Name(s) of receptors to use as regression targets. If not given, will search through database for all genes that correspond to the provided genes from ‘ligands’

rec_ds

Name(s) of receptor-downstream genes to use as regression targets. If not given, will search through database for all genes that correspond to receptors

species

Specifies L:R database to use

niche_lr_r_lag

Only used if ‘mod_type’ is “niche_lr”. Uses the spatial lag of the receptor as the dependent variable rather than each spot’s unique receptor expression. Defaults to True.

args

Additional positional arguments to the Base_Model class

kwargs

Additional keyword arguments to the Base_Model class

class spateo.tools.Niche_Model(*args, **kwargs)#

Bases: Base_Model

Wraps all necessary methods for data loading and preparation, model initialization, parameterization, evaluation and prediction when instantiating a model for spatially-aware regression using both the prevalence of and connections between categories within spatial neighborhoods to predict the value of gene expression.

Arguments are passed to the Base_Model class.

Parameters
args

Positional arguments to the Base_Model class

kwargs

Keyword arguments to the Base_Model class

spateo.tools.fit_glm(X: Union[numpy.ndarray, pandas.DataFrame], adata: anndata.AnnData, y_feat, calc_first_moment: bool = True, log_transform: bool = True, gs_params: Union[None, dict] = None, n_gs_cv: Union[None, int] = None, return_model: bool = True, **kwargs) Tuple[numpy.ndarray, numpy.ndarray, float, numpy.ndarray, Union[None, GLMCV]][source]#

Wrapper for fitting a generalized elastic net linear model to large biological data, with automated finding of optimum lambda regularization parameter and optional further grid search for parameter optimization.

Parameters
X

Array or DataFrame containing data for fitting; all columns in this array will be used as independent variables

adata

AnnData object from which dependent variable gene expression values will be taken

y_feat

Name of the feature in ‘adata’ corresponding to the dependent variable

log_transform

If True, will log transform expression. Defaults to True.

calc_first_moment

If True, will alleviate dropout effects by computing the first moment of each gene across cells, consistent with the method used by the original RNA velocity method (La Manno et al., 2018). Defaults to True.

gs_params

Optional dictionary where keys are variable names for either the classifier or the regressor and values are lists of potential values for which to find the best combination using grid search. Classifier parameters should be given in the following form: ‘classifier__{parameter name}’.

n_gs_cv

Number of folds for cross-validation, will only be used if gs_params is not None. If None, will default to a 5-fold cross-validation.

return_model

If True, returns fitted model. Defaults to True.

kwargs

Additional named arguments that will be provided to GLMCV. Valid options are:

  • distr: Distribution family; can be “gaussian”, “poisson”, “neg-binomial”, or “gamma”. Case sensitive.

  • alpha: The weighting between the L1 penalty (alpha=1.) and L2 penalty (alpha=0.) terms of the loss function

  • Tau: Optional array of shape [n_features, n_features]; the Tikhonov matrix for ridge regression. If not provided, Tau will default to the identity matrix.

  • reg_lambda: Regularization parameter \(\lambda\) of penalty term

  • n_lambdas: Number of lambdas along the regularization path. Only used if ‘reg_lambda’ is not given.

  • cv: Number of cross-validation repeats

  • learning_rate: Governs the magnitude of parameter updates for the gradient descent algorithm

  • max_iter: Maximum number of iterations for the solver

  • tol: Convergence threshold or stopping criteria. Optimization loop will stop when relative change in parameter norm is below the threshold.

  • eta: A threshold parameter that linearizes the exp() function above eta.

  • score_metric: Scoring metric. Options:
    • “deviance”: Uses the difference between the saturated (perfectly predictive) model and the true model.

    • “pseudo_r2”: Uses the coefficient of determination between the true and predicted values.

  • fit_intercept: Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function

  • random_seed: Seed of the random number generator used to initialize the solution. Default: 888

  • theta: Shape parameter of the negative binomial distribution (number of successes before the first failure). It is used only if ‘distr’ is equal to “neg-binomial”, otherwise it is ignored.

Returns

  • Beta: Array of shape [n_parameters, 1]; contains the weight for each parameter.

  • rex: Array of shape [n_samples, 1]. Reconstructed independent variable values.

  • reg: Instance of regression model. Returned only if ‘return_model’ is True.
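To make the interplay of alpha, reg_lambda and Tau more concrete, the following is a minimal NumPy sketch of an elastic net penalty. The exact scaling (a factor of 0.5 on the L2 term, with Tau applied inside the squared norm) follows a common convention and is an assumption here, not something confirmed from the fit_glm source; elastic_net_penalty is a hypothetical helper, not part of spateo.

```python
import numpy as np


def elastic_net_penalty(beta, reg_lambda, alpha, Tau=None):
    """Hypothetical helper: elastic net penalty for a coefficient vector."""
    beta = np.asarray(beta, dtype=float)
    if Tau is None:
        # Assumed default: the identity matrix, i.e. plain ridge on the L2 term
        Tau = np.eye(beta.shape[0])
    l1_term = np.sum(np.abs(beta))
    l2_term = 0.5 * np.sum((Tau @ beta) ** 2)
    # alpha=1 keeps only the L1 term, alpha=0 keeps only the L2 term
    return reg_lambda * (alpha * l1_term + (1.0 - alpha) * l2_term)


beta = np.array([1.0, -2.0, 0.0])
lasso_like = elastic_net_penalty(beta, reg_lambda=0.1, alpha=1.0)
ridge_like = elastic_net_penalty(beta, reg_lambda=0.1, alpha=0.0)
```

Intermediate values of alpha blend the two terms linearly, which is why the same reg_lambda can produce very different amounts of sparsity depending on alpha.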

spateo.tools.plot_prior_vs_data(reconst: pandas.DataFrame, adata: anndata.AnnData, kind: str = 'barplot', target_name: Union[None, str] = None, title: Union[None, str] = None, figsize: Union[None, Tuple[float, float]] = None, save_show_or_return: Literal[save, show, return, both, all] = 'save', save_kwargs: dict = {})[source]#

Plots distribution of observed vs. predicted counts in the form of a comparative density barplot.

Parameters
reconst

DataFrame containing values for reconstruction/prediction of targets of a regression model

adata

AnnData object containing observed counts

kind

Kind of plot to generate. Options: “barplot”, “scatterplot”. Case sensitive, defaults to “barplot”.

target_name

Optional, can be:
  • Column name in DataFrame/AnnData object: name of gene to subset to

  • “sum”: computes sum over all features present in ‘reconst’ to compare to the corresponding subset of ‘adata’.

  • “mean”: computes mean over all features present in ‘reconst’ to compare to the corresponding subset of ‘adata’.

If not given, will subset AnnData to features in ‘reconst’ and flatten both arrays to compare all values.

If not given, will compute the sum over all features present in ‘reconst’ and compare to the corresponding subset of ‘adata’.

save_show_or_return

Whether to save, show or return the figure. If “both”, the figure is saved and shown at the same time. If “all”, the figure is saved and displayed, and the associated axis and other objects are returned.

save_kwargs

A dictionary that will be passed to the save_fig function. By default it is an empty dictionary and the save_fig function will use {“path”: None, “prefix”: ‘scatter’, “dpi”: None, “ext”: ‘pdf’, “transparent”: True, “close”: True, “verbose”: True} as its parameters. Otherwise you can provide a dictionary that properly modifies those keys according to your needs.
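The ‘target_name’ options describe different ways of reducing predicted and observed values before they are plotted. The sketch below illustrates those reductions with pandas only. The helper reduce_for_comparison is hypothetical, a plain DataFrame stands in for the AnnData object, and the no-target branch follows the "flatten both arrays" behavior described above; none of this is taken from the plot_prior_vs_data source.

```python
import pandas as pd


def reduce_for_comparison(reconst, observed, target_name=None):
    """Hypothetical helper mirroring the documented 'target_name' options."""
    # Subset the observed data to the features that were predicted
    observed = observed[reconst.columns]
    if target_name == "sum":
        return reconst.sum(axis=1), observed.sum(axis=1)
    if target_name == "mean":
        return reconst.mean(axis=1), observed.mean(axis=1)
    if target_name is not None:
        # A single gene: compare that column only
        return reconst[target_name], observed[target_name]
    # No target given: flatten both tables and compare all values
    return reconst.to_numpy().ravel(), observed.to_numpy().ravel()


reconst = pd.DataFrame({"geneA": [1.0, 2.0], "geneB": [3.0, 4.0]})
observed = pd.DataFrame(
    {"geneA": [1.5, 2.5], "geneB": [2.0, 5.0], "geneC": [9.0, 9.0]}
)
pred_sum, obs_sum = reduce_for_comparison(reconst, observed, "sum")
```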

class spateo.tools.Category_Model(*args, **kwargs)#

Bases: Base_Model

Wraps all necessary methods for data loading and preparation, model initialization, parameterization, evaluation and prediction when instantiating a model for spatially-aware (but not spatially lagged) regression using categorical variables (specifically, the prevalence of categories within spatial neighborhoods) to predict the value of gene expression.

Arguments are passed to Base_Model. The only keyword argument that is used for this class is ‘n_neighbors’.

Parameters
args

Positional arguments to Base_Model

kwargs

Keyword arguments to Base_Model
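As an illustration of what "the prevalence of categories within spatial neighborhoods" can look like as a design matrix, the sketch below counts, for each spot, how many of its nearest neighbors fall into each category. It is a NumPy-only approximation under stated assumptions: neighborhood_category_counts is a hypothetical helper, neighbors are chosen by brute-force Euclidean distance, and a spot is excluded from its own neighborhood. How Category_Model builds its neighborhoods internally is not shown in this page.

```python
import numpy as np


def neighborhood_category_counts(coords, labels, n_neighbors):
    """Hypothetical helper: per-spot counts of each category among neighbors."""
    coords = np.asarray(coords, dtype=float)
    categories, codes = np.unique(labels, return_inverse=True)
    # Brute-force pairwise Euclidean distances between all spots
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a spot is not its own neighbor
    nn_idx = np.argsort(dists, axis=1)[:, :n_neighbors]
    one_hot = np.eye(len(categories))[codes]
    # Sum the one-hot labels of each spot's neighbors
    counts = one_hot[nn_idx].sum(axis=1)
    return categories, counts


coords = [[0, 0], [1, 0], [0, 1], [10, 10]]
labels = ["A", "B", "B", "A"]
categories, counts = neighborhood_category_counts(coords, labels, n_neighbors=2)
```

Each row of counts sums to n_neighbors, so dividing by n_neighbors would give category proportions instead of raw counts.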

class spateo.tools.Lagged_Model(model_type: str = 'ligand', lig: Union[None, str, List[str]] = None, rec: Union[None, str, List[str]] = None, rec_ds: Union[None, str, List[str]] = None, species: Literal[human, mouse, axolotl] = 'human', normalize: bool = True, smooth: bool = False, log_transform: bool = True, *args, **kwargs)#

Bases: Base_Model

Wraps all necessary methods for data loading and preparation, model initialization, parameterization, evaluation and prediction when instantiating a model for spatially-lagged regression.

Can specify one of two models: “ligand”, which uses the spatial lag of ligand genes and the spatial lag of the regression target to predict the regression target, or “niche”, which uses the spatial lag of cell type colocalization and the spatial lag of the regression target to predict the regression target.

If “ligand” is specified, arguments to lig must be given, and it is recommended to provide species as well; the default species is human.

Arguments are passed to Base_Model.

Parameters
model_type

Either “ligand” or “niche”, specifies whether to fit a model that incorporates the spatial lag of ligand expression or the spatial lag of cell type colocalization.

lig

Name(s) of ligands to use as predictors

rec

Name(s) of receptors to use as regression targets. If not given, will search through the database for all genes that correspond to the genes provided in ‘lig’.

rec_ds

Name(s) of receptor-downstream genes to use as regression targets. If not given, will search through database for all genes that correspond to receptor-downstream genes.

species

Specifies L:R database to use

normalize

Perform library size normalization, to set total counts in each cell to the same number (adjust for cell size)

smooth

To correct for dropout effects, leverage gene expression neighborhoods to smooth expression

log_transform

Set True if log-transformation should be applied to expression (otherwise, will assume preprocessing/log-transform was computed beforehand)

args

Additional positional arguments to Base_Model

kwargs

Additional keyword arguments to Base_Model

run_GM_lag() Tuple[pandas.DataFrame, pandas.DataFrame, pandas.DataFrame]#

Runs spatially lagged two-stage least squares model

single(cur_g: str, X: pandas.DataFrame, X_variable_names: List[str], param_labels: List[str], adata: anndata.AnnData, w: numpy.ndarray, layer: Union[None, str] = None) Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]#

Defines the model run process for a single feature. Not callable by the user; all arguments are populated from the arguments passed on instantiation of Base_Model.

Parameters
cur_g

Name of the feature to regress on

X

Values used for the regression

X_variable_names

Names of the variables used for the regression

param_labels

Names of categories- each computed parameter corresponds to a single element in param_labels

adata

AnnData object to store results in

w

Spatial weights array

layer

Specifies layer in AnnData to use- if None, will use .X.

Returns

  • coeffs: Coefficients for each categorical group for each feature

  • pred: Predicted values from regression for each feature

  • resid: Residual values from regression for each feature
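The "spatial lag" that Lagged_Model relies on is, in the usual spatial econometrics sense, a weighted average of a variable over each spot's neighbors. The sketch below shows that computation with a small spatial weights array. Row normalization of the weights is an assumption made for the illustration, and spatial_lag is a hypothetical helper rather than a spateo function.

```python
import numpy as np


def spatial_lag(w, y):
    """Hypothetical helper: neighbor-weighted average of y for each spot."""
    w = np.asarray(w, dtype=float)
    row_sums = w.sum(axis=1, keepdims=True)
    # Row-normalize so each row sums to 1; isolated spots keep a lag of 0
    w_norm = np.divide(w, row_sums, out=np.zeros_like(w), where=row_sums > 0)
    return w_norm @ np.asarray(y, dtype=float)


# Spot 0 neighbors spots 1 and 2; spots 1 and 2 only neighbor spot 0
w = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
y = np.array([1.0, 2.0, 4.0])
lag = spatial_lag(w, y)
```

In a lagged regression, a vector like lag is used alongside (or in place of) the raw values, so that a spot's prediction reflects expression in its surroundings.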

spateo.tools.get_align_labels(model: anndata.AnnData, align_X: numpy.ndarray, key: Union[str, List[str]], spatial_key: str = 'align_spatial') pandas.DataFrame[source]#

Obtain the label information in anndata.obs[key] corresponding to the align_X coordinate.

spateo.tools.models_align(models: List[anndata.AnnData], layer: str = 'X', genes: Optional[Union[list, numpy.ndarray]] = None, spatial_key: str = 'spatial', key_added: str = 'align_spatial', mapping_key_added: str = 'models_align', alpha: float = 0.1, numItermax: int = 200, numItermaxEmd: int = 100000, dtype: str = 'float32', device: str = 'cpu', keep_all: bool = False, **kwargs) List[anndata.AnnData][source]#

Align spatial coordinates of models.

Parameters
models

List of models (AnnData Object).

layer

If 'X', uses .X to calculate dissimilarity between spots, otherwise uses the representation given by .layers[layer].

genes

Genes used for calculation. If None, use all common genes for calculation.

spatial_key

The key in .obsm that corresponds to the raw spatial coordinate.

key_added

.obsm key under which to add the aligned spatial coordinate.

mapping_key_added

.uns key under which to add the alignment info.

alpha

Alignment tuning parameter. Note: 0 <= alpha <= 1.

When alpha = 0 only the gene expression data is taken into account, while when alpha =1 only the spatial coordinates are taken into account.

numItermax

Max number of iterations for cg during FGW-OT.

numItermaxEmd

Max number of iterations for emd during FGW-OT.

dtype

The floating-point number type. Only 'float32' and 'float64' are supported.

device

Device used to run the program, 'cpu' by default. To run on a specific GPU, pass its index as a string, e.g. '0'.

keep_all

Whether to retain all optimal relationships obtained from the pi matrix alone. If keep_all is False, the optimal relationships are obtained from both the pi matrix and the nearest coordinates.

**kwargs

Additional parameters that will be passed to pairwise_align function.

Returns

List of models (AnnData Object) after alignment.
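The role of alpha can be illustrated with the standard fused Gromov-Wasserstein objective, which combines an expression term and a spatial term for a given coupling pi between two slices. The sketch below evaluates that objective in NumPy. It assumes the textbook form of the objective with a squared-difference spatial loss; fgw_objective is a hypothetical helper and is not taken from the spateo source.

```python
import numpy as np


def fgw_objective(pi, expr_cost, dist_a, dist_b, alpha):
    """Hypothetical helper: fused Gromov-Wasserstein objective for a coupling."""
    # Linear term: expression dissimilarity between matched spots
    expr_term = np.sum(expr_cost * pi)
    # Quadratic term: mismatch between within-slice spatial distances,
    # diff[i, j, k, l] = dist_a[i, k] - dist_b[j, l]
    diff = dist_a[:, None, :, None] - dist_b[None, :, None, :]
    spatial_term = np.einsum("ijkl,ij,kl->", diff ** 2, pi, pi)
    return (1.0 - alpha) * expr_term + alpha * spatial_term


pi = np.array([[0.5, 0.0], [0.0, 0.5]])          # coupling between two slices
expr_cost = np.array([[0.0, 1.0], [1.0, 0.0]])   # expression dissimilarity
dist_a = np.array([[0.0, 1.0], [1.0, 0.0]])      # spatial distances, slice A
dist_b = np.array([[0.0, 2.0], [2.0, 0.0]])      # spatial distances, slice B

only_expression = fgw_objective(pi, expr_cost, dist_a, dist_b, alpha=0.0)
only_spatial = fgw_objective(pi, expr_cost, dist_a, dist_b, alpha=1.0)
```

With alpha = 0 the spatial term drops out entirely, and with alpha = 1 the expression term does, matching the description of the alpha parameter.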

spateo.tools.models_align_ref(models: List[anndata.AnnData], models_ref: Optional[List[anndata.AnnData]] = None, n_sampling: Optional[int] = 2000, sampling_method: str = 'trn', layer: str = 'X', genes: Optional[Union[list, numpy.ndarray]] = None, spatial_key: str = 'spatial', key_added: str = 'align_spatial', mapping_key_added: str = 'models_align', alpha: float = 0.1, numItermax: int = 200, numItermaxEmd: int = 100000, dtype: str = 'float32', device: str = 'cpu', **kwargs) Tuple[List[anndata.AnnData], List[anndata.AnnData]][source]#

Align the spatial coordinates of one model list through the affine transformation matrix obtained from another model list.

Parameters
models

List of models (AnnData Object).

models_ref

Another list of models (AnnData Object).

n_sampling

When models_ref is None, new data containing n_sampling coordinate points will be automatically generated for alignment.

sampling_method

The method to sample data points, can be one of ["trn", "kmeans", "random"].

layer

If 'X', uses .X to calculate dissimilarity between spots, otherwise uses the representation given by .layers[layer].

genes

Genes used for calculation. If None, use all common genes for calculation.

spatial_key

The key in .obsm that corresponds to the raw spatial coordinate.

key_added

.obsm key under which to add the aligned spatial coordinate.

mapping_key_added

.uns key under which to add the alignment info.

alpha

Alignment tuning parameter. Note: 0 <= alpha <= 1.

When alpha = 0 only the gene expression data is taken into account, while when alpha =1 only the spatial coordinates are taken into account.

numItermax

Max number of iterations for cg during FGW-OT.

numItermaxEmd

Max number of iterations for emd during FGW-OT.

dtype

The floating-point number type. Only 'float32' and 'float64' are supported.

device

Device used to run the program, 'cpu' by default. To run on a specific GPU, pass its index as a string, e.g. '0'.

**kwargs

Additional parameters that will be passed to pairwise_align function.

Returns

  • align_models: List of models (AnnData Object) after alignment.

  • align_models_ref: List of models_ref (AnnData Object) after alignment.

spateo.tools.models_center_align(init_center_model: anndata.AnnData, models: List[anndata.AnnData], layer: str = 'X', genes: Optional[Union[list, numpy.ndarray]] = None, spatial_key: str = 'spatial', key_added: str = 'align_spatial', mapping_key_added: str = 'models_align', lmbda: Optional[numpy.ndarray] = None, alpha: float = 0.1, n_components: int = 15, threshold: float = 0.001, max_iter: int = 10, numItermax: int = 200, numItermaxEmd: int = 100000, dissimilarity: str = 'kl', norm: bool = False, random_seed: Optional[int] = None, pis_init: Optional[List[numpy.ndarray]] = None, distributions: Optional[List[numpy.ndarray]] = None, dtype: str = 'float32', device: str = 'cpu', keep_all: bool = False) Tuple[anndata.AnnData, List[anndata.AnnData]][source]#

Align spatial coordinates of a list of models to a center model.

Parameters
init_center_model

AnnData object to use as the initialization for center alignment; Make sure to include gene expression and spatial information.

models

List of AnnData objects to use in the center alignment.

layer

If 'X', uses .X to calculate dissimilarity between spots, otherwise uses the representation given by .layers[layer].

genes

Genes used for calculation. If None, use all common genes for calculation.

spatial_key

The key in .obsm that corresponds to the raw spatial coordinate.

key_added

.obsm key under which to add the aligned spatial coordinate.

mapping_key_added

.uns key under which to add the alignment info.

lmbda

List of probability weights assigned to each slice; If None, use uniform weights.

alpha

Alignment tuning parameter. Note: 0 <= alpha <= 1.

When alpha = 0 only the gene expression data is taken into account, while when alpha =1 only the spatial coordinates are taken into account.

n_components

Number of components in NMF decomposition.

threshold

Threshold for convergence of W and H during NMF decomposition.

max_iter

Maximum number of iterations for our center alignment algorithm.

numItermax

Max number of iterations for cg during FGW-OT.

numItermaxEmd

Max number of iterations for emd during FGW-OT.

dissimilarity

Expression dissimilarity measure: 'kl' or 'euclidean'.

norm

If norm = True, scales spatial distances such that neighboring spots are at distance 1.

Otherwise, spatial distances remain unchanged.

random_seed

Set random seed for reproducibility.

pis_init

Initial list of mappings between A (the center model) and models, passed to the solver. If None, the mappings are calculated automatically.

distributions

Distributions of spots for each slice. If None, the distributions are uniform.

dtype

The floating-point number type. Only 'float32' and 'float64' are supported.

device

Device used to run the program, 'cpu' by default. To run on a specific GPU, pass its index as a string, e.g. '0'.

keep_all

Whether to retain all optimal relationships obtained from the pi matrix alone. If keep_all is False, the optimal relationships are obtained from both the pi matrix and the nearest coordinates.

Returns

  • new_center_model: The center model.

  • align_models: List of models (AnnData Object) after alignment.
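The ‘norm’ option describes rescaling coordinates so that neighboring spots end up at distance 1. One plausible way to do this, shown below, is to divide all coordinates by the smallest nonzero pairwise distance. This is an assumption about the approach rather than the confirmed implementation, and scale_to_unit_spacing is a hypothetical helper.

```python
import numpy as np


def scale_to_unit_spacing(coords):
    """Hypothetical helper: rescale so the closest pair of spots is 1 apart."""
    coords = np.asarray(coords, dtype=float)
    # Brute-force pairwise distances; fine for a small illustrative example
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    min_dist = dists[dists > 0].min()
    return coords / min_dist


coords = np.array([[0.0, 0.0], [0.0, 50.0], [50.0, 0.0]])
scaled = scale_to_unit_spacing(coords)
```

Rescaling of this kind makes the spatial term comparable across slices captured at different resolutions.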

spateo.tools.models_center_align_ref(init_center_model: anndata.AnnData, models: List[anndata.AnnData], models_ref: Optional[List[anndata.AnnData]] = None, n_sampling: Optional[int] = 1000, sampling_method: str = 'trn', layer: str = 'X', genes: Optional[Union[list, numpy.ndarray]] = None, spatial_key: str = 'spatial', key_added: str = 'align_spatial', mapping_key_added: str = 'models_align', lmbda: Optional[numpy.ndarray] = None, alpha: float = 0.1, n_components: int = 15, threshold: float = 0.001, max_iter: int = 10, numItermax: int = 200, numItermaxEmd: int = 100000, dissimilarity: str = 'kl', norm: bool = False, random_seed: Optional[int] = None, pis_init: Optional[List[numpy.ndarray]] = None, distributions: Optional[List[numpy.ndarray]] = None, dtype: str = 'float32', device: str = 'cpu') Tuple[anndata.AnnData, List[anndata.AnnData], List[anndata.AnnData]][source]#

Align the spatial coordinates of one model list to the central model through the affine transformation matrix obtained from another model list.

Parameters
init_center_model

AnnData object to use as the initialization for center alignment; Make sure to include gene expression and spatial information.

models

List of AnnData objects to use in the center alignment.

models_ref

List of AnnData objects with a small number of coordinates.

n_sampling

When models_ref is None, new data containing n_sampling coordinate points will be automatically generated for alignment.

sampling_method

The method to sample data points, can be one of [“trn”, “kmeans”, “random”].

layer

If 'X', uses .X to calculate dissimilarity between spots, otherwise uses the representation given by .layers[layer].

genes

Genes used for calculation. If None, use all common genes for calculation.

spatial_key

The key in .obsm that corresponds to the raw spatial coordinate.

key_added

.obsm key under which to add the aligned spatial coordinate.

mapping_key_added

.uns key under which to add the alignment info.

lmbda

List of probability weights assigned to each slice; If None, use uniform weights.

alpha

Alignment tuning parameter. Note: 0 <= alpha <= 1.

When alpha = 0 only the gene expression data is taken into account, while when alpha =1 only the spatial coordinates are taken into account.

n_components

Number of components in NMF decomposition.

threshold

Threshold for convergence of W and H during NMF decomposition.

max_iter

Maximum number of iterations for our center alignment algorithm.

numItermax

Max number of iterations for cg during FGW-OT.

numItermaxEmd

Max number of iterations for emd during FGW-OT.

dissimilarity

Expression dissimilarity measure: 'kl' or 'euclidean'.

norm

If norm = True, scales spatial distances such that neighboring spots are at distance 1.

Otherwise, spatial distances remain unchanged.

random_seed

Set random seed for reproducibility.

pis_init

Initial list of mappings between A (the center model) and models, passed to the solver. If None, the mappings are calculated automatically.

distributions

Distributions of spots for each slice. If None, the distributions are uniform.

dtype

The floating-point number type. Only 'float32' and 'float64' are supported.

device

Device used to run the program, 'cpu' by default. To run on a specific GPU, pass its index as a string, e.g. '0'.

Returns

  • new_center_model: The center model.

  • align_models: List of models (AnnData Object) after alignment.

  • align_models_ref: List of models_ref (AnnData Object) after alignment.

spateo.tools.rigid_transform_2D(coords: numpy.ndarray, coords_refA: numpy.ndarray, coords_refB: numpy.ndarray) numpy.ndarray[source]#

Compute optimal transformation based on the two sets of 2D points and apply the transformation to other points.

Parameters
coords

2D coordinate matrix to be transformed.

coords_refA

Reference 2D coordinate matrix before transformation.

coords_refB

Reference 2D coordinate matrix after transformation.

Returns

The 2D coordinate matrix after transformation

spateo.tools.rigid_transform_3D(coords: numpy.ndarray, coords_refA: numpy.ndarray, coords_refB: numpy.ndarray) numpy.ndarray[source]#

Compute optimal transformation based on the two sets of 3D points and apply the transformation to other points.

Parameters
coords

3D coordinate matrix to be transformed.

coords_refA

Reference 3D coordinate matrix before transformation.

coords_refB

Reference 3D coordinate matrix after transformation.

Returns

The 3D coordinate matrix after transformation
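Both rigid_transform_2D and rigid_transform_3D describe the same idea: estimate the transformation that maps one reference point set onto another, then apply it to other points. A common way to estimate such a rotation and translation is the SVD-based Kabsch procedure, sketched below for either dimensionality. This is an illustration of that general technique under the assumption that a rigid (rotation plus translation) fit is intended; rigid_transform is a hypothetical helper and is not the spateo implementation.

```python
import numpy as np


def rigid_transform(coords, coords_ref_a, coords_ref_b):
    """Hypothetical helper: fit ref_a -> ref_b rigidly, then move coords."""
    a = np.asarray(coords_ref_a, dtype=float)
    b = np.asarray(coords_ref_b, dtype=float)
    centroid_a, centroid_b = a.mean(axis=0), b.mean(axis=0)
    # Cross-covariance of the centered reference point sets
    h = (a - centroid_a).T @ (b - centroid_b)
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a rotation, not a reflection
    d = np.sign(np.linalg.det(vt.T @ u.T))
    correction = np.diag([1.0] * (a.shape[1] - 1) + [d])
    rotation = vt.T @ correction @ u.T
    translation = centroid_b - rotation @ centroid_a
    # Points are stored as rows, so rotate with the transpose
    return np.asarray(coords, dtype=float) @ rotation.T + translation


# Reference points before and after a 90-degree rotation plus a shift
ref_a = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
rot = np.array([[0.0, -1.0], [1.0, 0.0]])
shift = np.array([2.0, 3.0])
ref_b = ref_a @ rot.T + shift

moved = rigid_transform(np.array([[1.0, 1.0]]), ref_a, ref_b)
```

The same function works for 3D input because nothing in the procedure depends on the number of coordinate columns.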