`spateo.tools.find_neighbors`#

Functions for finding nearest neighbors and the distances between them in spatial transcriptomics data.

Module Contents#

Functions#

`weighted_spatial_graph`(...)	Given an AnnData object, compute distance array with either a fixed number of neighbors for each bucket or a
`weighted_expr_neighbors_graph`(...)	Given an AnnData object, compute distance array in gene expression space.
`transcriptomic_connectivity`(...)	Given an AnnData object, compute pairwise connectivity matrix in transcriptomic space
`remove_greater_than`(→ scipy.sparse.csr_matrix)	Remove values greater than a threshold from a sparse matrix.
`generate_spatial_distance_graph`(...)	Creates graph based on distance in space.
`generate_spatial_weights_fixed_nbrs`(...)	Starting from a k-nearest neighbor graph, generate a nearest neighbor graph with weighted edges.
`gaussian_weight_2d`(→ float)	Calculate normalized gaussian value for a given distance from central point
`p_equiv_radius`(→ float)	Find radius at which you eliminate fraction p of a radial Gaussian probability distribution with standard
`generate_spatial_weights_fixed_radius`(...)	Starting from a radius-based neighbor graph, generate a sparse graph (csr format) with weighted edges, where edge
`calculate_distance`(→ numpy.ndarray)	Given array of x- and y-coordinates, compute pairwise distances between all samples using Euclidean distance.
`construct_spatial_distance_matrix`(→ anndata.AnnData)	Given AnnData object and key to array of x- and y-coordinates, compute pairwise spatial distances between all
`construct_geodesic_distance_matrix`(→ anndata.AnnData)	Given AnnData object and key to array of x- and y-coordinates, compute geodesic distance between each sample and its
`construct_binned_spatial_distance`(adata[, bin_size, ...])	Given AnnData object and key to array of x- and y-coordinates, first "collapse" the dataset by aggregating
`construct_nn_graph`(→ None)	Constructing bucket-to-bucket nearest neighbors graph.
`normalize_adj`(→ numpy.ndarray)	Symmetrically normalize adjacency matrix, set diagonal to 1 and return processed adjacency array.

spateo.tools.find_neighbors.weighted_spatial_graph(adata: anndata.AnnData, spatial_key: str = 'spatial', fixed: str = 'n_neighbors', n_neighbors_method: str = 'ball_tree', n_neighbors: int = 30, decay_type: str = 'reciprocal', p: float = 0.05, sigma: float = 100) → Tuple[scipy.sparse.csr_matrix, scipy.sparse.csr_matrix, anndata.AnnData][source]#

Given an AnnData object, compute distance array with either a fixed number of neighbors for each bucket or a fixed search radius for each bucket.

The parameters ‘p’ and ‘sigma’ are used only if ‘fixed’ is ‘radius’, where they set the search radius. ‘sigma’ is the standard deviation (e.g. in pixels or micrometers) of a Gaussian centered on a given bucket with peak height ‘a’, and ‘p’ is the cutoff height of that Gaussian, expressed as a proportion of ‘a’. The radius used for all buckets is the distance from a bucket at which the Gaussian decays to p times its peak height (e.g. 0.05 of the peak). This allows the neighborhood to be defined from prior knowledge, e.g. the diffusion kinetics of particular soluble factors.

Parameters

adata: an anndata object.
spatial_key: Key in .obsm containing coordinates for each bucket.
fixed: Options: ‘n_neighbors’, ‘radius’. Sets either a fixed number of neighbors or a fixed search radius for each bucket.
n_neighbors_method: Specifies algorithm to use in computing neighbors using sklearn’s implementation. Options: “ball_tree” and “kd_tree”. Unused unless ‘fixed’ is ‘n_neighbors’.
n_neighbors: Number of neighbors each bucket has. Unused unless ‘fixed’ is ‘n_neighbors’.
decay_type: Sets method by which edge weights are defined. Options: “reciprocal”, “ranked”, “uniform”. Unused unless ‘fixed’ is ‘n_neighbors’.
p: Cutoff for Gaussian (used to find where distribution drops below p * (max_value)). Unused unless ‘fixed’ is ‘radius’.
sigma: Standard deviation of the Gaussian. Unused unless ‘fixed’ is ‘radius’.

Returns

out_graph: Weighted nearest neighbors graph with shape [n_samples, n_samples].
distance_graph: Unweighted graph with shape [n_samples, n_samples].
adata: Updated AnnData object containing ‘spatial_distances’, ‘spatial_weights’, ‘spatial_connectivities’ in .obsp and ‘spatial_neighbors’ in .uns.
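
As a worked illustration of the radius definition in the description above: assuming the Gaussian has the unnormalized form a·exp(−r²/(2σ²)) (the exact form used internally is not stated on this page), the cutoff radius r follows by setting the height equal to p times the peak:

```latex
a \exp\!\left(-\frac{r^{2}}{2\sigma^{2}}\right) = p\,a
\quad\Longrightarrow\quad
r = \sigma \sqrt{-2 \ln p}
```

Under that assumption, the defaults p = 0.05 and sigma = 100 give r ≈ 2.45 × 100 ≈ 245, in the same units as the coordinates.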

spateo.tools.find_neighbors.weighted_expr_neighbors_graph(adata: anndata.AnnData, nbr_object: sklearn.neighbors.NearestNeighbors = None, basis: str = 'pca', n_neighbors_method: str = 'ball_tree', n_pca_components: int = 30, num_neighbors: int = 30, decay_type: str = 'reciprocal') → Tuple[scipy.sparse.csr_matrix, scipy.sparse.csr_matrix, anndata.AnnData][source]#

Given an AnnData object, compute distance array in gene expression space.

Parameters

adata: an anndata object.
nbr_object: An optional sklearn.neighbors.NearestNeighbors object. Can optionally create a nearest neighbor object with custom functionality.
basis: str, default ‘pca’. The space that will be used for nearest neighbor search. Valid names include, for example, pca, umap, or X.
n_neighbors_method: str, default ‘ball_tree’. Specifies algorithm to use in computing neighbors using sklearn’s implementation. Options: “ball_tree” and “kd_tree”.
n_pca_components: Only used if ‘basis’ is ‘pca’. Sets number of principal components to compute.
num_neighbors: Number of neighbors for each bucket, used in computing distance graph
decay_type: Sets method by which edge weights are defined. Options: “reciprocal”, “ranked”, “uniform”.

Returns

out_graph: Weighted k-nearest neighbors graph with shape [n_samples, n_samples].
distance_graph: Unweighted graph with shape [n_samples, n_samples].
adata: Updated AnnData object containing ‘spatial_distance’ in .obsp and ‘spatial_neighbors’ in .uns.

spateo.tools.find_neighbors.transcriptomic_connectivity(adata: anndata.AnnData, nbr_object: sklearn.neighbors.NearestNeighbors = None, basis: str = 'pca', n_neighbors_method: str = 'ball_tree', n_pca_components: int = 30) → Tuple[sklearn.neighbors.NearestNeighbors, anndata.AnnData][source]#

Given an AnnData object, compute pairwise connectivity matrix in transcriptomic space

Parameters

adata: an anndata object.
nbr_object: An optional sklearn.neighbors.NearestNeighbors object. Can optionally create a nearest neighbor object with custom functionality.
basis: str, default ‘pca’. The space that will be used for nearest neighbor search. Valid names include, for example, pca, umap, or X.
n_neighbors_method: str, default ‘ball_tree’. Specifies algorithm to use in computing neighbors using sklearn’s implementation. Options: “ball_tree” and “kd_tree”.
n_pca_components: Only used if ‘basis’ is ‘pca’. Sets number of principal components to compute.
num_neighbors: Number of neighbors for each bucket, used in computing distance graph

Returns

nbrs: Object of class sklearn.neighbors.NearestNeighbors.
adata: Modified AnnData object.

spateo.tools.find_neighbors.remove_greater_than(graph: scipy.sparse.csr_matrix, threshold: float, copy: bool = False, verbose: bool = False) → scipy.sparse.csr_matrix[source]#

Remove values greater than a threshold from a sparse matrix.

Parameters

graph: The input scipy matrix of the graph.
threshold: Upper numerical threshold; values greater than this are removed.
copy: Set True to avoid altering original graph.
verbose: Set True to display messages at runtime. Not generally recommended, since this will print entire arrays.

Returns

graph: The updated graph with values greater than the threshold removed.
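
The thresholding described here can be sketched with SciPy alone. The helper name `drop_above` is hypothetical and this is not spateo's implementation; it only illustrates one way to drop stored values above a threshold from a CSR matrix, with `copy` controlling whether the input is modified.

```python
import numpy as np
from scipy import sparse


def drop_above(graph: sparse.csr_matrix, threshold: float, copy: bool = False) -> sparse.csr_matrix:
    """Zero out stored values greater than `threshold`, then drop them from the sparsity pattern."""
    out = graph.copy() if copy else graph
    # `data` holds only the explicitly stored entries of the CSR matrix.
    out.data[out.data > threshold] = 0
    # Remove the entries that were just set to zero so they no longer count as edges.
    out.eliminate_zeros()
    return out


# Small symmetric distance graph with edge lengths 1, 2 and 5.
dense = np.array([[0.0, 1.0, 5.0],
                  [1.0, 0.0, 2.0],
                  [5.0, 2.0, 0.0]])
graph = sparse.csr_matrix(dense)
pruned = drop_above(graph, threshold=2.0, copy=True)
```

With `copy=True` the original `graph` keeps all six stored entries, while `pruned` retains only the edges of length 1 and 2.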

spateo.tools.find_neighbors.generate_spatial_distance_graph(locations: numpy.ndarray, nbr_object: sklearn.neighbors.NearestNeighbors = None, method: str = 'ball_tree', num_neighbors: Union[None, int] = None, radius: Union[None, float] = None) → Tuple[sklearn.neighbors.NearestNeighbors, scipy.sparse.csr_matrix][source]#

Creates graph based on distance in space.

Parameters

locations: Spatial coordinates for each bucket with shape [n_samples, 2]
nbr_object: An optional sklearn.neighbors.NearestNeighbors object. Can optionally create a nearest neighbor object with custom functionality.
method: Specifies algorithm to use in computing neighbors using sklearn’s implementation. Options: “ball_tree” and “kd_tree”.
num_neighbors: Number of neighbors for each bucket.
radius: Search radius around each bucket.

Returns

nbrs: sklearn NearestNeighbors object.
graph_out: A sparse matrix of the spatial graph.
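
For orientation, the two modes (fixed neighbor count and fixed radius) correspond to two scikit-learn graph builders. The sketch below calls scikit-learn directly rather than spateo, and the toy coordinates are made up; how spateo combines these calls internally is not shown on this page.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Four buckets on a line; the last one is isolated.
locations = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [10.0, 0.0]])

nbrs = NearestNeighbors(n_neighbors=1, algorithm="ball_tree").fit(locations)

# Fixed number of neighbors: with no query array, a point is not its own neighbor.
knn_graph = nbrs.kneighbors_graph(mode="distance")

# Fixed search radius: buckets with nothing inside the radius get an empty row.
radius_graph = nbrs.radius_neighbors_graph(radius=2.5, mode="distance")
```

Both results are CSR matrices of shape [n_samples, n_samples] whose stored values are distances. The isolated bucket still receives one neighbor in `knn_graph` but none in `radius_graph`.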

spateo.tools.find_neighbors.generate_spatial_weights_fixed_nbrs(adata: anndata.AnnData, spatial_key: str = 'spatial', num_neighbors: int = 10, method: str = 'ball_tree', decay_type: str = 'reciprocal', nbr_object: sklearn.neighbors.NearestNeighbors = None) → Union[Tuple[scipy.sparse.csr_matrix, scipy.sparse.csr_matrix, anndata.AnnData]][source]#

Starting from a k-nearest neighbor graph, generate a nearest neighbor graph with weighted edges.

Parameters

spatial_key: Key in .obsm where x- and y-coordinates are stored.
num_neighbors: Number of neighbors each bucket has.
method: Specifies algorithm to use in computing neighbors using sklearn’s implementation. Options: “ball_tree” and “kd_tree”.
decay_type: Sets method by which edge weights are defined. Options: “reciprocal”, “ranked”, “uniform”.

Returns

Return type

out_graph

spateo.tools.find_neighbors.gaussian_weight_2d(distance: float, sigma: float) → float[source]#

Calculate the normalized Gaussian value for a given distance from the central point. Normalized by 2*pi*sigma-squared.

spateo.tools.find_neighbors.p_equiv_radius(p: float, sigma: float) → float[source]#

Find the radius at which you eliminate fraction p of a radial Gaussian probability distribution with standard deviation sigma.
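
The two helpers above can be illustrated with a small standard-library sketch. It assumes an isotropic 2D Gaussian density exp(−d²/(2σ²)) / (2πσ²) and reads "eliminate fraction p" as "a fraction p of the probability mass lies beyond the radius". Both are assumptions, and the function names are hypothetical rather than spateo's.

```python
import math


def gaussian_density_2d(distance: float, sigma: float) -> float:
    """Isotropic 2D Gaussian density at `distance` from the center, normalized by 2*pi*sigma^2."""
    return math.exp(-distance**2 / (2.0 * sigma**2)) / (2.0 * math.pi * sigma**2)


def tail_radius(p: float, sigma: float) -> float:
    """Radius beyond which a fraction `p` of the 2D Gaussian's mass lies (closed form)."""
    return sigma * math.sqrt(-2.0 * math.log(p))


def mass_outside(radius: float, sigma: float, steps: int = 20_000) -> float:
    """Numerically integrate density * 2*pi*r over r from `radius` to 12 sigma (midpoint rule)."""
    r_max = 12.0 * sigma
    dr = (r_max - radius) / steps
    total = 0.0
    for i in range(steps):
        r = radius + (i + 0.5) * dr
        total += gaussian_density_2d(r, sigma) * 2.0 * math.pi * r * dr
    return total


radius = tail_radius(p=0.05, sigma=100.0)
tail = mass_outside(radius, sigma=100.0)
```

The numerical integral serves as a check on the closed form: `tail` should come out very close to the requested p.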

spateo.tools.find_neighbors.generate_spatial_weights_fixed_radius(adata: anndata.AnnData, spatial_key: str = 'spatial', p: float = 0.05, sigma: float = 100, nbr_object: sklearn.neighbors.NearestNeighbors = None, method: str = 'ball_tree', verbose: bool = False) → Tuple[scipy.sparse.csr_matrix, scipy.sparse.csr_matrix, anndata.AnnData][source]#

Starting from a radius-based neighbor graph, generate a sparse graph (csr format) with weighted edges, where edge weights decay with distance.

Note that decay is assumed to follow a Gaussian distribution.

Parameters

spatial_key: Key in .obsm where x- and y-coordinates are stored.
p: Cutoff for Gaussian (used to find where distribution drops below p * (max_value)).
sigma: Standard deviation of the Gaussian.
method: Specifies algorithm to use in computing neighbors using sklearn’s implementation. Options: “ball_tree” and “kd_tree”.

Returns

out_graph: Weighted nearest neighbors graph with shape [n_samples, n_samples].
distance_graph: Unweighted graph with shape [n_samples, n_samples].
adata: Updated AnnData object containing ‘spatial_distances’, ‘spatial_weights’, ‘spatial_connectivities’ in .obsp and ‘spatial_neighbors’ in .uns.

spateo.tools.find_neighbors.calculate_distance(position: numpy.ndarray, dist_metric: str = 'euclidean') → numpy.ndarray[source]#

Given an array of x- and y-coordinates, compute pairwise distances between all samples using the metric given by dist_metric (Euclidean by default).
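
A minimal NumPy sketch of the Euclidean case, for illustration only. The helper name `pairwise_euclidean` is hypothetical and the library may compute this differently.

```python
import numpy as np


def pairwise_euclidean(position: np.ndarray) -> np.ndarray:
    """Return the [n_samples, n_samples] matrix of Euclidean distances between rows of `position`."""
    # Broadcasting builds every coordinate difference: shape [n, n, n_dims].
    diff = position[:, None, :] - position[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))


coords = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
dist = pairwise_euclidean(coords)
```

The result is symmetric with a zero diagonal. Memory grows quadratically with the number of samples, which is one reason neighbor graphs are often stored sparsely instead.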

spateo.tools.find_neighbors.construct_spatial_distance_matrix(adata: anndata.AnnData, spatial_key: str = 'spatial', dist_metric: str = 'euclidean', min_dist_threshold: Optional[float] = None, max_dist_threshold: Optional[float] = None) → anndata.AnnData[source]#

Given AnnData object and key to array of x- and y-coordinates, compute pairwise spatial distances between all samples.

Parameters

adata: An AnnData object.
spatial_key: Key in .obsm in which x- and y-coordinates are stored.
dist_metric: Distance metric to use. Options: ‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘dice’, ‘euclidean’, ‘hamming’, ‘jaccard’, ‘jensenshannon’, ‘kulczynski1’, ‘mahalanobis’, ‘matching’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘yule’.
min_dist_threshold: Optional, sets the max allowable distance that a cell can be from its nearest neighbor to avoid being filtered out. Used to remove singular isolated cells.
max_dist_threshold: Optional, used to remove clusters of isolated cells close to one another but far from all other cells.

Returns

Input AnnData object with spatial distance matrix in .obsp.

Return type

adata

spateo.tools.find_neighbors.construct_geodesic_distance_matrix(adata: anndata.AnnData, spatial_key: str = 'spatial', nbr_object: sklearn.neighbors.NearestNeighbors = None, method: str = 'ball_tree', n_neighbors: int = 30, min_dist_threshold: Optional[float] = None, max_dist_threshold: Optional[float] = None) → anndata.AnnData[source]#

Given AnnData object and key to array of x- and y-coordinates, compute geodesic distance between each sample and its nearest neighbors (geodesic distance is the shortest path between vertices, where paths are lines in space that connect points).

Parameters

adata: AnnData object.
spatial_key: Key in .obsm in which x- and y-coordinates are stored.
nbr_object: An optional sklearn.neighbors.NearestNeighbors object. Can optionally create a nearest neighbor object with custom functionality.
method: Specifies algorithm to use in computing neighbors using sklearn’s implementation. Options: “ball_tree” and “kd_tree”.
n_neighbors: For each bucket, number of neighbors to include in the distance matrix.
min_dist_threshold: Optional, sets the max allowable distance that a cell can be from its nearest neighbor to avoid being filtered out. Used to remove singular isolated cells.
max_dist_threshold: Optional, used to remove clusters of isolated cells close to one another but far from all other cells.

Returns

adata: Input AnnData object with spatial distance matrix and geodesic distance matrix in .obsp.
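
To make the notion of geodesic distance concrete, here is a standard-library sketch: build a k-nearest-neighbor graph by brute force, then run Dijkstra's algorithm along its edges. The helper names and the symmetrization step are assumptions made for illustration, not spateo's implementation.

```python
import heapq
import math


def knn_edges(points, k):
    """Symmetrized k-nearest-neighbor graph as {node: {neighbor: distance}}, built by brute force."""
    graph = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d  # keep the graph undirected
    return graph


def geodesic_from(graph, source):
    """Dijkstra: shortest path length from `source` to every node along graph edges."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


# Points laid out along an L-shaped path.
points = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
graph = knn_edges(points, k=2)
geodesic = geodesic_from(graph, source=0)
straight_line = math.dist(points[0], points[4])
```

Because paths must follow neighbor edges, the geodesic distance between the two ends of the L is longer than the straight-line distance between them.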

spateo.tools.find_neighbors.construct_binned_spatial_distance(adata: anndata.AnnData, bin_size: int = 1, coords_key: str = 'spatial', distance_method: str = 'spatial', min_dist_threshold: Optional[float] = None, max_dist_threshold: Optional[float] = None, distance_metric: Optional[str] = 'euclidean', n_neighbors: Optional[int] = 30)[source]#

Given AnnData object and key to array of x- and y-coordinates, first “collapse” the dataset by aggregating nearby cells together into bins, and then compute pairwise spatial distances between all samples.

Parameters

adata: AnnData object.
bin_size: Shrinking factor to be applied to spatial coordinates; the size of this factor dictates the size of the regions that will be combined into one pseudo-cell (larger -> generally higher number of cells in each bin).
coords_key: Key in .obsm in which spatial coordinates are stored.
distance_method: Options: “spatial” and “geodesic”, indicating that pairwise spatial distance or pairwise geodesic distance should be computed, respectively.
distance_metric: Optional, can be used to change the distance metric used when “distance_method” is “spatial”. Options: ‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘dice’, ‘euclidean’, ‘hamming’, ‘jaccard’, ‘jensenshannon’, ‘kulczynski1’, ‘mahalanobis’, ‘matching’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘yule’.
min_dist_threshold: Optional, sets the max allowable distance that a cell can be from its nearest neighbor to avoid being filtered out. Used to remove singular isolated cells.
max_dist_threshold: Optional, used to remove clusters of isolated cells close to one another but far from all other cells.
n_neighbors: For each bucket, number of neighbors to include in the distance matrix. Must be given if “distance_method” is “geodesic”.

Returns

adata_binned: New AnnData object generated by the binning process.
M: Pairwise distance array.
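
The "collapse" step can be pictured as integer-dividing coordinates by bin_size and grouping the cells that land in the same bin. The sketch below uses only the standard library; the helper name is hypothetical, and aggregating by summation is an assumption, since the aggregation rule is not stated on this page.

```python
from collections import defaultdict


def bin_coordinates(coords, values, bin_size):
    """Group cells by floor(coordinate / bin_size) and sum one value per cell within each bin."""
    bins = defaultdict(lambda: {"members": [], "total": 0.0})
    for idx, ((x, y), value) in enumerate(zip(coords, values)):
        key = (int(x // bin_size), int(y // bin_size))
        bins[key]["members"].append(idx)
        bins[key]["total"] += value  # assumed aggregation: sum
    return dict(bins)


coords = [(0.2, 0.4), (1.7, 0.9), (2.1, 3.5), (3.9, 2.2)]
values = [1.0, 2.0, 3.0, 4.0]
binned = bin_coordinates(coords, values, bin_size=2)
```

A larger bin_size maps more cells to the same key, which matches the description of bin_size above: larger values generally put more cells in each bin.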

spateo.tools.find_neighbors.construct_nn_graph(adata: anndata.AnnData, spatial_key: str = 'spatial', dist_metric: str = 'euclidean', n_neighbors: int = 8, exclude_self: bool = True, save_id: Union[None, str] = None) → None[source]#

Constructing bucket-to-bucket nearest neighbors graph.

Parameters

adata: An anndata object.
spatial_key: Key in .obsm in which x- and y-coordinates are stored.
dist_metric: Distance metric to use. Options: ‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘dice’, ‘euclidean’, ‘hamming’, ‘jaccard’, ‘jensenshannon’, ‘kulczynski1’, ‘mahalanobis’, ‘matching’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘yule’.
n_neighbors: Number of nearest neighbors to compute for each bucket.
exclude_self: Set True to set elements along the diagonal to zero.
save_id: Optional string; if not None, will save the distance matrix and neighbors matrix to ‘./neighbors/{save_id}_distance.csv’ and ‘./neighbors/{save_id}_neighbors.csv’, respectively.

spateo.tools.find_neighbors.normalize_adj(adj: numpy.ndarray, exclude_self: bool = True) → numpy.ndarray[source]#

Symmetrically normalize adjacency matrix, set diagonal to 1 and return processed adjacency array.

Parameters

adj: Pairwise distance matrix of shape [n_samples, n_samples].
exclude_self: Set True to set diagonal of adjacency matrix to 1.

Returns

adj_proc: The normalized adjacency matrix.
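
One common form of symmetric normalization is D^(-1/2) A D^(-1/2), applied after placing ones on the diagonal. The NumPy sketch below follows that reading. The order of operations, the helper name `sym_normalize`, and the `set_diagonal` flag are assumptions for illustration and may differ from what this function does.

```python
import numpy as np


def sym_normalize(adj: np.ndarray, set_diagonal: bool = True) -> np.ndarray:
    """Return D^(-1/2) A D^(-1/2), optionally after setting the diagonal of A to 1."""
    adj = np.asarray(adj, dtype=float).copy()
    if set_diagonal:
        np.fill_diagonal(adj, 1.0)
    degree = adj.sum(axis=1)
    # Guard against isolated nodes (zero degree) to avoid division by zero.
    inv_sqrt = np.zeros_like(degree)
    mask = degree > 0
    inv_sqrt[mask] = degree[mask] ** -0.5
    return adj * inv_sqrt[:, None] * inv_sqrt[None, :]


# Three nodes in a chain: 0 - 1 - 2.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]])
norm = sym_normalize(adj)
```

With ones on the diagonal the degrees are 2, 3 and 2, so the edge between nodes 0 and 1 is scaled by 1/sqrt(2 · 3). A symmetric input stays symmetric under this scaling.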
