Note
This page was generated from 4_svg_archetype.ipynb. Interactive online version: . Some tutorial content may look better in light mode.
4.Identification of spatial variable genes and archetypes#
This notebook demonstrates how to use spateo to identify spatial variable genes, by calculating Moran‘s I score, and how to get spatial gene archetypes.
[1]:
import spateo as st
import numpy as np
st.configuration.set_pub_style_mpltex()
%matplotlib inline
2022-11-13 20:18:33.040154: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
network.py (36): The next major release of pysal/spaghetti (2.0.0) will drop support for all ``libpysal.cg`` geometries. This change is a first step in refactoring ``spaghetti`` that is expected to result in dramatically reduced runtimes for network instantiation and operations. Users currently requiring network and point pattern input as ``libpysal.cg`` geometries should prepare for this simply by converting to ``shapely`` geometries.
Data source#
bin60_clustered_h5ad: https://www.dropbox.com/s/wxgkim87uhpaz1c/mousebrain_bin60_clustered.h5ad?dl=0
[2]:
# Load annotated binning data
fname_bin60 = "mousebrain_bin60_clustered.h5ad"
adata_bin60 = st.sample_data.mousebrain(fname_bin60)
adata_bin60
[2]:
AnnData object with n_obs × n_vars = 7765 × 21667
obs: 'area', 'n_counts', 'Size_Factor', 'initial_cell_size', 'louvain', 'scc', 'scc_anno'
var: 'pass_basic_filter'
uns: '__type', 'louvain', 'louvain_colors', 'neighbors', 'pp', 'scc', 'scc_anno_colors', 'scc_colors', 'spatial', 'spatial_neighbors'
obsm: 'X_pca', 'X_spatial', 'bbox', 'contour', 'spatial'
layers: 'count', 'spliced', 'unspliced'
obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances'
SVG identification (Moran’s I)#
[3]:
# Filter out low expressed and uninterested, such as mitochondrial genes
mito_genes = adata_bin60.var_names.str.startswith("mt-")
adata_bin60 = adata_bin60[: ,~mito_genes]
st.pp.filter.filter_genes(adata_bin60,min_cells= adata_bin60.n_obs*0.05, inplace=True)
adata_bin60
[3]:
AnnData object with n_obs × n_vars = 7765 × 5059
obs: 'area', 'n_counts', 'Size_Factor', 'initial_cell_size', 'louvain', 'scc', 'scc_anno'
var: 'pass_basic_filter'
uns: '__type', 'louvain', 'louvain_colors', 'neighbors', 'pp', 'scc', 'scc_anno_colors', 'scc_colors', 'spatial', 'spatial_neighbors'
obsm: 'X_pca', 'X_spatial', 'bbox', 'contour', 'spatial'
layers: 'count', 'spliced', 'unspliced'
obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances'
[ ]:
# Calculate Moran's I score for genes
m = st.tl.moran_i(adata_bin60, n_jobs=30)
[5]:
# Filter genes by Moran's I score and adjusted p-value
m_filter = m[(m.moran_i > 0.15)&(m.moran_q_val<0.05)].sort_values(by=['moran_i'],ascending=False)
m_filter[0:10]
[5]:
moran_i | moran_p_val | moran_z | moran_q_val | |
---|---|---|---|---|
Plp1 | 0.529320 | 0.005 | 78.720267 | 0.014116 |
Gm28928 | 0.516217 | 0.005 | 72.395455 | 0.014116 |
Ntng1 | 0.511843 | 0.005 | 72.667184 | 0.014116 |
Mbp | 0.510432 | 0.005 | 77.079408 | 0.014116 |
Gm42418 | 0.468743 | 0.005 | 69.624860 | 0.014116 |
Phactr1 | 0.468161 | 0.005 | 70.079342 | 0.014116 |
Mobp | 0.457718 | 0.005 | 61.732588 | 0.014116 |
Cmss1 | 0.436734 | 0.005 | 65.651464 | 0.014116 |
Kcnip4 | 0.421696 | 0.005 | 61.525662 | 0.014116 |
Camk1d | 0.406778 | 0.005 | 56.055133 | 0.014116 |
[6]:
# Visualize top SVGs
st.pl.space(
adata_bin60,
genes=m_filter.index[0:6].tolist(),
pointsize=0.1,
ncols=3,
show_legend="upper left",
figsize=(2,2),
)
SVG archetypes#
[7]:
# Cluster identified SVGs into archetypes
num_clusters = 15
archetypes, clusters, gene_corrs = st.tl.find_spatial_archetypes(
exp_mat = adata_bin60[:,m_filter.index].X.A.T,
num_clusters = num_clusters,
)
|-----> [Finding gene archetypes] in progress: 100.0000%
|-----> [Finding gene archetypes] finished [0.1889s]
|-----> done!
[8]:
# Create dataframe with genes and its archetype
df_clusters = pd.DataFrame({"gene":m_filter.index.to_list(), "archetype":clusters-1})
df_clusters[0:10]
# Save gene cluster info
# df_clusters.to_csv("archetype_clusters.csv")
[8]:
gene | archetype | |
---|---|---|
0 | Plp1 | 11 |
1 | Gm28928 | 14 |
2 | Ntng1 | 8 |
3 | Mbp | 11 |
4 | Gm42418 | 2 |
5 | Phactr1 | 5 |
6 | Mobp | 11 |
7 | Cmss1 | 1 |
8 | Kcnip4 | 4 |
9 | Camk1d | 1 |
[9]:
# Calculate genes having significant spatial correlation with each archetype
archetypes_dict = st.tl.archetypes_genes(
adata=adata_bin60,
archetypes=archetypes,
num_clusters=num_clusters,
moran_i_genes=m_filter.index,
)
|-----? No genes with significant correlation were found at the current p-value threshold.
[10]:
# Calculate expression scores for archetypes
adata_bin60.obs = st.tl.archetypes(adata_bin60,m_filter.index,num_clusters=num_clusters)
|-----> [Finding gene archetypes] in progress: 100.0000%
|-----> [Finding gene archetypes] finished [0.1891s]
|-----> done!
[11]:
# Visualize archetypes
arch_cols = ["archetype %d" % i for i in np.arange(num_clusters)]
st.pl.space(
adata_bin60,
color = adata_bin60.obs[arch_cols].columns.tolist(),
pointsize=0.1,
ncols=5,
show_legend="upper left",
figsize=(2,2),
)