spateo.tools.find_neighbors

Functions for finding nearest neighbors and the distances between them in spatial transcriptomics data.

Module Contents

Functions

weighted_spatial_graph(...)

Given an AnnData object, compute distance array with either a fixed number of neighbors for each bucket or a fixed search radius for each bucket.

weighted_expr_neighbors_graph(...)

Given an AnnData object, compute distance array in gene expression space.

transcriptomic_connectivity(...)

Given an AnnData object, compute pairwise connectivity matrix in transcriptomic space

remove_greater_than(→ scipy.sparse.csr_matrix)

Remove values greater than a threshold from a sparse matrix.

generate_spatial_distance_graph(...)

Creates graph based on distance in space.

generate_spatial_weights_fixed_nbrs(...)

Starting from a k-nearest neighbor graph, generate a weighted nearest neighbor graph.

gaussian_weight_2d(→ float)

Calculate normalized gaussian value for a given distance from central point

p_equiv_radius(→ float)

Find radius at which you eliminate fraction p of a radial Gaussian probability distribution with standard deviation sigma.

generate_spatial_weights_fixed_radius(...)

Starting from a radius-based neighbor graph, generate a sparse graph (csr format) with weighted edges, where edge weights decay with distance.

calculate_distance(→ numpy.ndarray)

Given array of x- and y-coordinates, compute pairwise distances between all samples using Euclidean distance.

construct_spatial_distance_matrix(→ anndata.AnnData)

Given AnnData object and key to array of x- and y-coordinates, compute pairwise spatial distances between all samples.

construct_geodesic_distance_matrix(→ anndata.AnnData)

Given AnnData object and key to array of x- and y-coordinates, compute geodesic distance between each sample and its nearest neighbors.

construct_binned_spatial_distance(adata[, bin_size, ...])

Given AnnData object and key to array of x- and y-coordinates, first "collapse" the dataset by aggregating nearby cells together into bins, and then compute pairwise spatial distances between all samples.

construct_nn_graph(→ None)

Construct a bucket-to-bucket nearest neighbors graph.

normalize_adj(→ numpy.ndarray)

Symmetrically normalize adjacency matrix, set diagonal to 1 and return processed adjacency array.

spateo.tools.find_neighbors.weighted_spatial_graph(adata: anndata.AnnData, spatial_key: str = 'spatial', fixed: str = 'n_neighbors', n_neighbors_method: str = 'ball_tree', n_neighbors: int = 30, decay_type: str = 'reciprocal', p: float = 0.05, sigma: float = 100) → Tuple[scipy.sparse.csr_matrix, scipy.sparse.csr_matrix, anndata.AnnData]

Given an AnnData object, compute distance array with either a fixed number of neighbors for each bucket or a fixed search radius for each bucket.

Parameters ‘p’ and ‘sigma’ are used only if ‘fixed’ is ‘radius’; together they set the search radius. ‘sigma’ is the standard deviation (in the units of the coordinates, e.g. pixels or micrometers) of a Gaussian distribution centered at a particular bucket with peak height ‘a’, and ‘p’ is the cutoff height of that Gaussian as a proportion of ‘a’. The radius used for all buckets is the distance from a bucket at which the Gaussian decays to p times its peak height (e.g. 0.05). With knowledge of, for example, the diffusion kinetics of particular soluble factors, the neighborhood can be defined to take this into account.
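As a worked example of this cutoff, assume the Gaussian has the unnormalized form shown on the first line below. That form is an assumption made for illustration; the description does not state it. Solving for the radius r_c at which the height falls to p times the peak height ‘a’ gives:

```latex
\begin{aligned}
w(r) &= a \exp\!\left(-\frac{r^{2}}{2\sigma^{2}}\right), \\
w(r_c) = p\,a \;&\Longrightarrow\; r_c = \sigma \sqrt{2 \ln(1/p)}, \\
p = 0.05,\ \sigma = 100 \;&\Longrightarrow\; r_c \approx 2.448 \times 100 \approx 244.8 .
\end{aligned}
```

Under this assumption, the default values of ‘p’ and ‘sigma’ would correspond to a search radius of roughly 245 coordinate units.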

Parameters
adata

An AnnData object.

spatial_key

Key in .obsm containing coordinates for each bucket.

fixed

Options: ‘n_neighbors’, ‘radius’. Sets either a fixed number of neighbors or a fixed search radius for each bucket.

n_neighbors_method

Specifies algorithm to use in computing neighbors using sklearn’s implementation. Options: “ball_tree” and “kd_tree”. Unused unless ‘fixed’ is ‘n_neighbors’.

n_neighbors

Number of neighbors each bucket has. Unused unless ‘fixed’ is ‘n_neighbors’.

decay_type

Sets method by which edge weights are defined. Options: “reciprocal”, “ranked”, “uniform”. Unused unless ‘fixed’ is ‘n_neighbors’.

p

Cutoff for Gaussian (used to find where distribution drops below p * (max_value)). Unused unless ‘fixed’ is ‘radius’.

sigma

Standard deviation of the Gaussian. Unused unless ‘fixed’ is ‘radius’.

Returns
out_graph

Weighted nearest neighbors graph with shape [n_samples, n_samples].

distance_graph

Unweighted graph with shape [n_samples, n_samples].

adata

Updated AnnData object containing ‘spatial_distances’, ‘spatial_weights’, ‘spatial_connectivities’ in .obsp and ‘spatial_neighbors’ in .uns.

spateo.tools.find_neighbors.weighted_expr_neighbors_graph(adata: anndata.AnnData, nbr_object: sklearn.neighbors.NearestNeighbors = None, basis: str = 'pca', n_neighbors_method: str = 'ball_tree', n_pca_components: int = 30, num_neighbors: int = 30, decay_type: str = 'reciprocal') → Tuple[scipy.sparse.csr_matrix, scipy.sparse.csr_matrix, anndata.AnnData]

Given an AnnData object, compute distance array in gene expression space.

Parameters
adata

An AnnData object.

nbr_object

Optional sklearn.neighbors.NearestNeighbors object. Supply one to use a nearest neighbor search with custom settings.

basis

str, default ‘pca’. The space that will be used for nearest neighbor search. Valid names include, for example, pca, umap, or X.

n_neighbors_method

str, default ‘ball_tree’. Specifies algorithm to use in computing neighbors using sklearn’s implementation. Options: “ball_tree” and “kd_tree”.

n_pca_components

Only used if ‘basis’ is ‘pca’. Sets number of principal components to compute.

num_neighbors

Number of neighbors for each bucket, used in computing the distance graph.

decay_type

Sets method by which edge weights are defined. Options: “reciprocal”, “ranked”, “uniform”.

Returns
out_graph

Weighted k-nearest neighbors graph with shape [n_samples, n_samples].

distance_graph

Unweighted graph with shape [n_samples, n_samples].

adata

Updated AnnData object containing ‘spatial_distance’ in .obsp and ‘spatial_neighbors’ in .uns.

spateo.tools.find_neighbors.transcriptomic_connectivity(adata: anndata.AnnData, nbr_object: sklearn.neighbors.NearestNeighbors = None, basis: str = 'pca', n_neighbors_method: str = 'ball_tree', n_pca_components: int = 30) → Tuple[sklearn.neighbors.NearestNeighbors, anndata.AnnData]

Given an AnnData object, compute pairwise connectivity matrix in transcriptomic space

Parameters
adata

An AnnData object.

nbr_object

Optional sklearn.neighbors.NearestNeighbors object. Supply one to use a nearest neighbor search with custom settings.

basis

str, default ‘pca’. The space that will be used for nearest neighbor search. Valid names include, for example, pca, umap, or X.

n_neighbors_method

str, default ‘ball_tree’. Specifies algorithm to use in computing neighbors using sklearn’s implementation. Options: “ball_tree” and “kd_tree”.

n_pca_components

Only used if ‘basis’ is ‘pca’. Sets number of principal components to compute.

num_neighbors

Number of neighbors for each bucket, used in computing the distance graph.

Returns
nbrs

Object of class sklearn.neighbors.NearestNeighbors.

adata

Modified AnnData object.

spateo.tools.find_neighbors.remove_greater_than(graph: scipy.sparse.csr_matrix, threshold: float, copy: bool = False, verbose: bool = False) → scipy.sparse.csr_matrix

Remove values greater than a threshold from a sparse matrix.

Parameters
graph

The input scipy matrix of the graph.

threshold

Upper numerical threshold; values greater than this are removed.

copy

Set True to avoid altering original graph.

verbose

Set True to display messages at runtime. Not generally recommended, since this will print entire arrays.

Returns
graph

The updated graph with values greater than the threshold removed.
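The thresholding described here can be sketched with SciPy alone. The helper below, `drop_above`, is a hypothetical stand-in written for illustration and is not the library's implementation; it zeroes the stored values above the threshold and then prunes them from the CSR structure.

```python
import numpy as np
from scipy import sparse


def drop_above(graph: sparse.csr_matrix, threshold: float, copy: bool = True) -> sparse.csr_matrix:
    """Hypothetical helper: remove stored entries greater than `threshold`."""
    out = graph.copy() if copy else graph
    # Zero out the stored entries that exceed the threshold ...
    out.data[out.data > threshold] = 0
    # ... then drop the explicit zeros from the sparse structure.
    out.eliminate_zeros()
    return out


# Toy distance graph for three buckets (values made up for illustration).
dist = sparse.csr_matrix(np.array([[0.0, 1.0, 5.0], [1.0, 0.0, 2.0], [5.0, 2.0, 0.0]]))
pruned = drop_above(dist, threshold=2.0)
```

With `copy=True` the input graph keeps all of its stored entries, while the returned graph no longer stores the two distances of 5.0.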

spateo.tools.find_neighbors.generate_spatial_distance_graph(locations: numpy.ndarray, nbr_object: sklearn.neighbors.NearestNeighbors = None, method: str = 'ball_tree', num_neighbors: Union[None, int] = None, radius: Union[None, float] = None) → Tuple[sklearn.neighbors.NearestNeighbors, scipy.sparse.csr_matrix]

Creates graph based on distance in space.

Parameters
locations

Spatial coordinates for each bucket with shape [n_samples, 2].

nbr_object

Optional sklearn.neighbors.NearestNeighbors object. Supply one to use a nearest neighbor search with custom settings.

method

Specifies algorithm to use in computing neighbors using sklearn’s implementation. Options: “ball_tree” and “kd_tree”.

num_neighbors

Number of neighbors for each bucket.

radius

Search radius around each bucket.

Returns
nbrs

sklearn NearestNeighbors object.

graph_out

A sparse matrix of the spatial graph.
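The two neighborhood definitions (fixed number of neighbors versus fixed radius) map onto two methods of scikit-learn's `NearestNeighbors`. The sketch below calls scikit-learn directly on made-up coordinates; it shows the kind of graph involved and is not a reproduction of this function's internals.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy coordinates with shape [n_samples, 2] (made up for illustration).
locations = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])

nbrs = NearestNeighbors(n_neighbors=2, algorithm="ball_tree").fit(locations)

# Fixed number of neighbors: with no query array, a point is not its own neighbor.
knn_graph = nbrs.kneighbors_graph(mode="distance")

# Fixed radius: every other point within 1.5 units of each point.
radius_graph = nbrs.radius_neighbors_graph(radius=1.5, mode="distance")
```

Both calls return sparse CSR matrices of shape [n_samples, n_samples]. In the radius graph the isolated point at (5, 5) has no neighbors, whereas the k-neighbors graph always gives it two.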

spateo.tools.find_neighbors.generate_spatial_weights_fixed_nbrs(adata: anndata.AnnData, spatial_key: str = 'spatial', num_neighbors: int = 10, method: str = 'ball_tree', decay_type: str = 'reciprocal', nbr_object: sklearn.neighbors.NearestNeighbors = None) → Union[Tuple[scipy.sparse.csr_matrix, scipy.sparse.csr_matrix, anndata.AnnData]]

Starting from a k-nearest neighbor graph, generate a weighted nearest neighbor graph.

Parameters
spatial_key

Key in .obsm where x- and y-coordinates are stored.

num_neighbors

Number of neighbors each bucket has.

method

Specifies algorithm to use in computing neighbors using sklearn’s implementation. Options: “ball_tree” and “kd_tree”.

decay_type

Sets method by which edge weights are defined. Options: “reciprocal”, “ranked”, “uniform”.

Returns
out_graph

Weighted k-nearest neighbors graph with shape [n_samples, n_samples].

distance_graph

Unweighted graph with shape [n_samples, n_samples].

adata

Updated AnnData object containing ‘spatial_distances’, ‘spatial_weights’, ‘spatial_connectivities’ in .obsp and ‘spatial_neighbors’ in .uns.
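The decay options are named here without being defined. As a rough illustration only, the sketch below assumes that ‘reciprocal’ means weight = 1 / distance and that ‘uniform’ means weight = 1 for every retained neighbor; ‘ranked’ is left out because its definition is not given on this page. The function name `knn_weights` and the coordinates are hypothetical.

```python
import numpy as np


def knn_weights(coords: np.ndarray, k: int, decay_type: str = "reciprocal") -> np.ndarray:
    """Hypothetical sketch: dense [n, n] weights over each point's k nearest neighbors."""
    # Pairwise Euclidean distances via broadcasting.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor

    weights = np.zeros_like(dist)
    nearest = np.argsort(dist, axis=1)[:, :k]  # column indices of the k nearest points
    rows = np.arange(coords.shape[0])[:, None]
    if decay_type == "reciprocal":
        weights[rows, nearest] = 1.0 / dist[rows, nearest]  # assumed: 1 / distance
    elif decay_type == "uniform":
        weights[rows, nearest] = 1.0  # assumed: equal weight for every neighbor
    else:
        raise ValueError(f"decay_type {decay_type!r} is not covered by this sketch")
    return weights


coords = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 4.0]])
w_recip = knn_weights(coords, k=1)
w_unif = knn_weights(coords, k=2, decay_type="uniform")
```

Under these assumptions, closer neighbors receive larger weights with ‘reciprocal’, while ‘uniform’ reduces the graph to plain connectivity.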

spateo.tools.find_neighbors.gaussian_weight_2d(distance: float, sigma: float) → float

Calculate the normalized Gaussian value for a given distance from a central point. Normalized by 2*pi*sigma-squared.
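Read literally, the description suggests the form below. This is a sketch based on the wording of the docstring (a Gaussian in distance, divided by 2*pi*sigma-squared), not a copy of the library code, and the function name is hypothetical.

```python
import math


def gaussian_weight_2d_sketch(distance: float, sigma: float) -> float:
    """Gaussian of `distance`, normalized by 2 * pi * sigma**2 (assumed from the docstring)."""
    return math.exp(-0.5 * (distance / sigma) ** 2) / (2.0 * math.pi * sigma ** 2)


peak = gaussian_weight_2d_sketch(0.0, sigma=100.0)         # value at the central point
one_sigma = gaussian_weight_2d_sketch(100.0, sigma=100.0)  # value one sigma away
```

The ratio `one_sigma / peak` is exp(-0.5), about 0.61, regardless of the normalizing constant.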

spateo.tools.find_neighbors.p_equiv_radius(p: float, sigma: float) → float

Find radius at which you eliminate fraction p of a radial Gaussian probability distribution with standard deviation sigma.
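For an isotropic two-dimensional Gaussian, the probability mass lying beyond radius r is exp(-r**2 / (2 * sigma**2)), so the radius that leaves out a fraction p has a closed form. The sketch below uses that identity; whether the library computes the radius in exactly this way is an assumption, and the function name is hypothetical.

```python
import math


def p_equiv_radius_sketch(p: float, sigma: float) -> float:
    """Radius beyond which a 2D isotropic Gaussian holds a fraction `p` of its mass."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    # Solve exp(-r**2 / (2 * sigma**2)) = p for r.
    return sigma * math.sqrt(-2.0 * math.log(p))


radius = p_equiv_radius_sketch(p=0.05, sigma=100.0)
# Sanity check: the tail mass at that radius is p again.
tail = math.exp(-radius ** 2 / (2.0 * 100.0 ** 2))
```

The same expression gives the distance at which an unnormalized Gaussian drops to p times its maximum value, which matches how ‘p’ is described as a cutoff elsewhere on this page.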

spateo.tools.find_neighbors.generate_spatial_weights_fixed_radius(adata: anndata.AnnData, spatial_key: str = 'spatial', p: float = 0.05, sigma: float = 100, nbr_object: sklearn.neighbors.NearestNeighbors = None, method: str = 'ball_tree', verbose: bool = False) → Tuple[scipy.sparse.csr_matrix, scipy.sparse.csr_matrix, anndata.AnnData]

Starting from a radius-based neighbor graph, generate a sparse graph (csr format) with weighted edges, where edge weights decay with distance.

Note that decay is assumed to follow a Gaussian distribution.

Parameters
spatial_key

Key in .obsm where x- and y-coordinates are stored.

p

Cutoff for Gaussian (used to find where distribution drops below p * (max_value)).

sigma

Standard deviation of the Gaussian.

method

Specifies algorithm to use in computing neighbors using sklearn’s implementation. Options: “ball_tree” and “kd_tree”.

Returns
out_graph

Weighted nearest neighbors graph with shape [n_samples, n_samples].

distance_graph

Unweighted graph with shape [n_samples, n_samples].

adata

Updated AnnData object containing ‘spatial_distances’, ‘spatial_weights’, ‘spatial_connectivities’ in .obsp and ‘spatial_neighbors’ in .uns.

spateo.tools.find_neighbors.calculate_distance(position: numpy.ndarray, dist_metric: str = 'euclidean') → numpy.ndarray

Given an array of x- and y-coordinates, compute pairwise distances between all samples using the metric given by ‘dist_metric’ (Euclidean distance by default).
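For the default Euclidean case, the pairwise computation can be sketched with NumPy broadcasting. The helper name and coordinates below are hypothetical, and the sketch covers only the Euclidean metric.

```python
import numpy as np


def pairwise_euclidean(position: np.ndarray) -> np.ndarray:
    """Hypothetical sketch: [n, n] Euclidean distances from [n, 2] coordinates."""
    diff = position[:, None, :] - position[None, :, :]  # shape [n, n, 2]
    return np.sqrt((diff ** 2).sum(axis=-1))


position = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
dist = pairwise_euclidean(position)
```

The result is symmetric with a zero diagonal. SciPy's `scipy.spatial.distance.cdist` accepts a `metric` string of the kind that ‘dist_metric’ takes, which may be a convenient route for non-Euclidean metrics.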

spateo.tools.find_neighbors.construct_spatial_distance_matrix(adata: anndata.AnnData, spatial_key: str = 'spatial', dist_metric: str = 'euclidean', min_dist_threshold: Optional[float] = None, max_dist_threshold: Optional[float] = None) → anndata.AnnData

Given AnnData object and key to array of x- and y-coordinates, compute pairwise spatial distances between all samples.

Parameters
adata

An AnnData object.

spatial_key

Key in .obsm in which x- and y-coordinates are stored.

dist_metric

Distance metric to use. Options: ‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘dice’, ‘euclidean’, ‘hamming’, ‘jaccard’, ‘jensenshannon’, ‘kulczynski1’, ‘mahalanobis’, ‘matching’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘yule’.

min_dist_threshold

Optional, sets the max allowable distance that a cell can be from its nearest neighbor to avoid being filtered out. Used to remove singular isolated cells.

max_dist_threshold

Optional, used to remove clusters of isolated cells close to one another but far from all other cells.

Returns
adata

Input AnnData object with spatial distance matrix in .obsp.

spateo.tools.find_neighbors.construct_geodesic_distance_matrix(adata: anndata.AnnData, spatial_key: str = 'spatial', nbr_object: sklearn.neighbors.NearestNeighbors = None, method: str = 'ball_tree', n_neighbors: int = 30, min_dist_threshold: Optional[float] = None, max_dist_threshold: Optional[float] = None) → anndata.AnnData

Given AnnData object and key to array of x- and y-coordinates, compute geodesic distance between each sample and its nearest neighbors (geodesic distance is the shortest path between vertices, where paths are lines in space that connect points).

Parameters
adata

AnnData object.

spatial_key

Key in .obsm in which x- and y-coordinates are stored.

nbr_object

Optional sklearn.neighbors.NearestNeighbors object. Supply one to use a nearest neighbor search with custom settings.

method

Specifies algorithm to use in computing neighbors using sklearn’s implementation. Options: “ball_tree” and “kd_tree”.

n_neighbors

For each bucket, number of neighbors to include in the distance matrix.

min_dist_threshold

Optional, sets the max allowable distance that a cell can be from its nearest neighbor to avoid being filtered out. Used to remove singular isolated cells.

max_dist_threshold

Optional, used to remove clusters of isolated cells close to one another but far from all other cells.

Returns
adata

Input AnnData object with spatial distance matrix and geodesic distance matrix in .obsp.
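To make the difference between geodesic and straight-line distance concrete, the sketch below builds a small k-nearest-neighbor graph over points laid out in a U shape and runs Dijkstra's algorithm along its edges. It assumes the geodesic paths follow neighbor-graph edges; the coordinates and helper names are made up, and the code is independent of this function's implementation.

```python
import heapq
import math

# Toy coordinates tracing a U shape (made up for illustration).
points = [(0.0, 2.0), (0.0, 1.0), (0.0, 0.0), (1.0, 0.0),
          (2.0, 0.0), (3.0, 0.0), (3.0, 1.0), (3.0, 2.0)]
k = 2  # neighbors per point


def knn_edges(pts, k):
    """Map each point index to its k nearest neighbors as (distance, index) pairs."""
    graph = {}
    for i, p in enumerate(pts):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(pts) if j != i)
        graph[i] = dists[:k]
    return graph


def geodesic_from(source, graph):
    """Dijkstra shortest-path lengths from `source` along neighbor-graph edges."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best.get(u, math.inf):
            continue  # stale queue entry
        for w, v in graph[u]:
            if d + w < best.get(v, math.inf):
                best[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return best


graph = knn_edges(points, k)
geodesic = geodesic_from(0, graph)[7]        # path length along the U
straight = math.dist(points[0], points[7])   # direct Euclidean distance
```

The two tips of the U are 3 units apart in a straight line but 7 units apart along the graph, which is the sense in which geodesic distance respects the shape of the tissue.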

spateo.tools.find_neighbors.construct_binned_spatial_distance(adata: anndata.AnnData, bin_size: int = 1, coords_key: str = 'spatial', distance_method: str = 'spatial', min_dist_threshold: Optional[float] = None, max_dist_threshold: Optional[float] = None, distance_metric: Optional[str] = 'euclidean', n_neighbors: Optional[int] = 30)

Given AnnData object and key to array of x- and y-coordinates, first “collapse” the dataset by aggregating nearby cells together into bins, and then compute pairwise spatial distances between all samples.

Parameters
adata

AnnData object.

bin_size

Shrinking factor to be applied to spatial coordinates; the size of this factor dictates the size of the regions that will be combined into one pseudo-cell (larger -> generally higher number of cells in each bin).

coords_key

Key in .obsm in which spatial coordinates are stored.

distance_method

Options: “spatial” and “geodesic”, indicating that pairwise spatial distance or pairwise geodesic distance should be computed, respectively.

distance_metric

Optional, can be used to change the distance metric used when “distance_method” is “spatial”. Options: ‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘dice’, ‘euclidean’, ‘hamming’, ‘jaccard’, ‘jensenshannon’, ‘kulczynski1’, ‘mahalanobis’, ‘matching’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘yule’.

min_dist_threshold

Optional, sets the max allowable distance that a cell can be from its nearest neighbor to avoid being filtered out. Used to remove singular isolated cells.

max_dist_threshold

Optional, used to remove clusters of isolated cells close to one another but far from all other cells.

n_neighbors

For each bucket, number of neighbors to include in the distance matrix. Must be given if “distance_method” is “geodesic”.

Returns
adata_binned

New AnnData object generated by the binning process.

M

Pairwise distance array.
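One way to picture the "collapse" step is sketched below, assuming that binning means dividing each coordinate by ‘bin_size’, flooring the result, and summing the counts of cells that land in the same bin. The sketch uses plain Python containers rather than AnnData, and all names and values are hypothetical.

```python
import math
from collections import defaultdict

bin_size = 10
# (x, y) coordinates and a per-cell count for one gene (toy values).
coords = [(1.0, 2.0), (4.0, 9.0), (12.0, 3.0), (18.0, 7.0), (25.0, 25.0)]
counts = [3, 1, 2, 5, 4]

# Aggregate cells whose shrunken, floored coordinates coincide.
binned = defaultdict(int)
for (x, y), c in zip(coords, counts):
    binned[(math.floor(x / bin_size), math.floor(y / bin_size))] += c

# Pairwise Euclidean distances between the resulting bins, in bin units.
bins = sorted(binned)
dist = [[math.dist(a, b) for b in bins] for a in bins]
```

Five cells collapse into three pseudo-cells here, so the pairwise distance array shrinks from 5 x 5 to 3 x 3; a larger ‘bin_size’ would merge more cells per bin.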

spateo.tools.find_neighbors.construct_nn_graph(adata: anndata.AnnData, spatial_key: str = 'spatial', dist_metric: str = 'euclidean', n_neighbors: int = 8, exclude_self: bool = True, save_id: Union[None, str] = None) → None

Construct a bucket-to-bucket nearest neighbors graph.

Parameters
adata

An AnnData object.

spatial_key

Key in .obsm in which x- and y-coordinates are stored.

dist_metric

Distance metric to use. Options: ‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘dice’, ‘euclidean’, ‘hamming’, ‘jaccard’, ‘jensenshannon’, ‘kulczynski1’, ‘mahalanobis’, ‘matching’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘yule’.

n_neighbors

Number of nearest neighbors to compute for each bucket.

exclude_self

Set True to set elements along the diagonal to zero.

save_id

Optional string; if not None, will save the distance matrix and neighbors matrix to ‘./neighbors/{save_id}_distance.csv’ and ‘./neighbors/{save_id}_neighbors.csv’, respectively.

spateo.tools.find_neighbors.normalize_adj(adj: numpy.ndarray, exclude_self: bool = True) → numpy.ndarray

Symmetrically normalize adjacency matrix, set diagonal to 1 and return processed adjacency array.

Parameters
adj

Pairwise distance matrix of shape [n_samples, n_samples].

exclude_self

Set True to set diagonal of adjacency matrix to 1.

Returns
adj_proc

The normalized adjacency matrix.
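Symmetric normalization is commonly written as D^(-1/2) A D^(-1/2), where D is the diagonal matrix of row sums of A. The sketch below applies that formula and then sets the diagonal to 1. Treating this as what normalize_adj does is an assumption based on its one-line description, and the function name is hypothetical.

```python
import numpy as np


def normalize_adj_sketch(adj: np.ndarray) -> np.ndarray:
    """D^(-1/2) A D^(-1/2), with the diagonal then set to 1 (assumed behavior)."""
    degree = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(degree, dtype=float)
    nonzero = degree > 0
    inv_sqrt[nonzero] = degree[nonzero] ** -0.5  # isolated nodes keep a zero scale
    norm = adj * inv_sqrt[:, None] * inv_sqrt[None, :]
    np.fill_diagonal(norm, 1.0)
    return norm


adj = np.array([[0.0, 1.0, 1.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
adj_proc = normalize_adj_sketch(adj)
```

For this three-node star graph, each edge ends up with weight 1 / sqrt(2), because the hub has degree 2 and each leaf has degree 1.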