Note

This page was generated from 4_svg_archetype.ipynb. Interactive online version: . Some tutorial content may look better in light mode.

4.Identification of spatial variable genes and archetypes#

This notebook demonstrates how to use spateo to identify spatial variable genes, by calculating Moran‘s I score, and how to get spatial gene archetypes.

[1]:

import spateo as st

import numpy as np

st.configuration.set_pub_style_mpltex()
%matplotlib inline

2022-11-13 20:18:33.040154: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
network.py (36): The next major release of pysal/spaghetti (2.0.0) will drop support for all ``libpysal.cg`` geometries. This change is a first step in refactoring ``spaghetti`` that is expected to result in dramatically reduced runtimes for network instantiation and operations. Users currently requiring network and point pattern input as ``libpysal.cg`` geometries should prepare for this simply by converting to ``shapely`` geometries.

Data source#

bin60_clustered_h5ad: https://www.dropbox.com/s/wxgkim87uhpaz1c/mousebrain_bin60_clustered.h5ad?dl=0

[2]:

# Load annotated binning data
fname_bin60 = "mousebrain_bin60_clustered.h5ad"
adata_bin60 = st.sample_data.mousebrain(fname_bin60)

adata_bin60

[2]:

AnnData object with n_obs × n_vars = 7765 × 21667
    obs: 'area', 'n_counts', 'Size_Factor', 'initial_cell_size', 'louvain', 'scc', 'scc_anno'
    var: 'pass_basic_filter'
    uns: '__type', 'louvain', 'louvain_colors', 'neighbors', 'pp', 'scc', 'scc_anno_colors', 'scc_colors', 'spatial', 'spatial_neighbors'
    obsm: 'X_pca', 'X_spatial', 'bbox', 'contour', 'spatial'
    layers: 'count', 'spliced', 'unspliced'
    obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances'

SVG identification (Moran’s I)#

[3]:

# Filter out low expressed and uninterested, such as mitochondrial genes
mito_genes = adata_bin60.var_names.str.startswith("mt-")
adata_bin60 = adata_bin60[: ,~mito_genes]

st.pp.filter.filter_genes(adata_bin60,min_cells= adata_bin60.n_obs*0.05, inplace=True)
adata_bin60

[3]:

AnnData object with n_obs × n_vars = 7765 × 5059
    obs: 'area', 'n_counts', 'Size_Factor', 'initial_cell_size', 'louvain', 'scc', 'scc_anno'
    var: 'pass_basic_filter'
    uns: '__type', 'louvain', 'louvain_colors', 'neighbors', 'pp', 'scc', 'scc_anno_colors', 'scc_colors', 'spatial', 'spatial_neighbors'
    obsm: 'X_pca', 'X_spatial', 'bbox', 'contour', 'spatial'
    layers: 'count', 'spliced', 'unspliced'
    obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances'

[ ]:

# Calculate Moran's I score for genes
m = st.tl.moran_i(adata_bin60, n_jobs=30)

[5]:

# Filter genes by Moran's I score and adjusted p-value
m_filter = m[(m.moran_i > 0.15)&(m.moran_q_val<0.05)].sort_values(by=['moran_i'],ascending=False)
m_filter[0:10]

[5]:

	moran_i	moran_p_val	moran_z	moran_q_val
Plp1	0.529320	0.005	78.720267	0.014116
Gm28928	0.516217	0.005	72.395455	0.014116
Ntng1	0.511843	0.005	72.667184	0.014116
Mbp	0.510432	0.005	77.079408	0.014116
Gm42418	0.468743	0.005	69.624860	0.014116
Phactr1	0.468161	0.005	70.079342	0.014116
Mobp	0.457718	0.005	61.732588	0.014116
Cmss1	0.436734	0.005	65.651464	0.014116
Kcnip4	0.421696	0.005	61.525662	0.014116
Camk1d	0.406778	0.005	56.055133	0.014116

[6]:

# Visualize top SVGs
st.pl.space(
    adata_bin60,
    genes=m_filter.index[0:6].tolist(),
    pointsize=0.1,
    ncols=3,
    show_legend="upper left",
    figsize=(2,2),
)

../../../_images/tutorials_notebooks_2_cluster_digitization_4_svg_archetype_8_0.png

SVG archetypes#

[7]:

# Cluster identified SVGs into archetypes
num_clusters = 15

archetypes, clusters, gene_corrs = st.tl.find_spatial_archetypes(
    exp_mat = adata_bin60[:,m_filter.index].X.A.T,
    num_clusters = num_clusters,
)

|-----> [Finding gene archetypes] in progress: 100.0000%
|-----> [Finding gene archetypes] finished [0.1889s]
|-----> done!

[8]:

# Create dataframe with genes and its archetype
df_clusters = pd.DataFrame({"gene":m_filter.index.to_list(), "archetype":clusters-1})
df_clusters[0:10]

# Save gene cluster info
# df_clusters.to_csv("archetype_clusters.csv")

[8]:

	gene	archetype
0	Plp1	11
1	Gm28928	14
2	Ntng1	8
3	Mbp	11
4	Gm42418	2
5	Phactr1	5
6	Mobp	11
7	Cmss1	1
8	Kcnip4	4
9	Camk1d	1

[9]:

# Calculate genes having significant spatial correlation with each archetype
archetypes_dict = st.tl.archetypes_genes(
    adata=adata_bin60,
    archetypes=archetypes,
    num_clusters=num_clusters,
    moran_i_genes=m_filter.index,
)

|-----? No genes with significant correlation were found at the current p-value threshold.

[10]:

# Calculate expression scores for archetypes
adata_bin60.obs = st.tl.archetypes(adata_bin60,m_filter.index,num_clusters=num_clusters)

|-----> [Finding gene archetypes] in progress: 100.0000%
|-----> [Finding gene archetypes] finished [0.1891s]
|-----> done!

[11]:

# Visualize archetypes
arch_cols = ["archetype %d" % i for i in np.arange(num_clusters)]
st.pl.space(
    adata_bin60,
    color = adata_bin60.obs[arch_cols].columns.tolist(),
    pointsize=0.1,
    ncols=5,
    show_legend="upper left",
    figsize=(2,2),
)

../../../_images/tutorials_notebooks_2_cluster_digitization_4_svg_archetype_14_0.png