Analysis of Enteric Glia on TAURUS dataset

Author

Jay V. Patankar

Published

January 20, 2025

This notebook analyzes enteric glia signatures in the TAURUS dataset of IBD patients (Thomas et al., doi.org/10.1038/s41590-024-01994-8).

Importing packages and downloading the dataset

System requirements

At least 64 GB of RAM is recommended for this analysis.

Jump to data

The initial parts of the code explore the structure of the dataset. For the analysis itself, jump to the section "Clean up UMAP to exclude outlier cells".

```{python}
# Import necessary libraries
import scanpy as sc
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

# Enable axis ticks globally in Scanpy
sc.settings.set_figure_params(dpi=300, frameon=True, figsize=(4, 4))
sc.settings._vector_friendly = True  # Enable better vector compatibility

# Download the vascular h5ad object from Zenodo
# (-O sets the output file name; lowercase -o would write wget's log there instead)
#!wget -O ./vasc_final.h5ad "https://zenodo.org/records/14007626/files/vasc_final.h5ad?download=1"

# Load the h5ad file
adata = sc.read_h5ad("./vasc_final.h5ad")

# View the AnnData object
print(adata)
```

AnnData object with n_obs × n_vars = 71809 × 33075
    obs: 'sample_id', 'Patient', 'Disease', 'Site', 'Treatment', 'Disease_duration', 'Inflammation', 'Age', 'Gender', 'Ethnicity', 'Inflammation_score', 'Ileum_vs_Colon', 'LibraryType', 'CellsLoaded', 'Match', 'Batch', 'doublet_scores', 'predicted_doublets', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'total_counts_rp', 'pct_counts_rp', 'total_counts_hb', 'pct_counts_hb', 'total_counts_ig', 'pct_counts_ig', 'S_score', 'G2M_score', 'phase', 'cellbarcode', 'diff', 'final_analysis', 'minor', 'major', 'sub_bucket', 'bucket', 'Remission_status'
    var: 'feature_types', 'gene_id', 'hb', 'ig', 'rp', 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'gene_symbol'
    uns: 'final_analysis_colors', 'log1p', 'sample_id_colors'
    obsm: 'X_harmony', 'X_pca', 'X_umap'
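
The Zenodo URL carries a `?download=1` query string, which can end up in the saved file name when no output name is given. The following is a minimal sketch, using only the Python standard library, of deriving a clean local file name from such a URL. The helper `local_filename` is a hypothetical name introduced here for illustration, and the download call is left commented out because the file is large.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def local_filename(url: str) -> str:
    """Return the last path component of a URL, ignoring any query string."""
    return PurePosixPath(urlsplit(url).path).name


url = "https://zenodo.org/records/14007626/files/vasc_final.h5ad?download=1"
filename = local_filename(url)
print(filename)  # vasc_final.h5ad

# To actually download (large file), one option is the standard library:
# from urllib.request import urlretrieve
# urlretrieve(url, filename)
```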

```{python}
# View the expression matrix dimensions (cells x genes)
display(adata.shape)

# View metadata (observations: cell-level data)
display(adata.obs.head())

# View variable (features: gene-level data)
display(adata.var.head())
```

(71809, 33075)

	sample_id	Patient	Disease	Site	Treatment	Disease_duration	Inflammation	Age	Gender	Ethnicity	...	G2M_score	phase	cellbarcode	diff	final_analysis	minor	major	sub_bucket	bucket	Remission_status
AAACCCAGTCACGACC-1-CID003352-2	CID003352-2	UC1	UC	Rectum	Post	29.0	Inflamed	26	M	Caucasian	...	-1.251166	G1	AAACCCAGTCACGACC-1-CID003352-2	8.0	Arterial endothelium	Blood_endothelium	Endothelium	Vascular	Stroma	Non_Remission
AAACGAATCAAAGAAC-1-CID003352-2	CID003352-2	UC1	UC	Rectum	Post	29.0	Inflamed	26	M	Caucasian	...	-0.800117	G1	AAACGAATCAAAGAAC-1-CID003352-2	8.0	Venous endothelium	Blood_endothelium	Endothelium	Vascular	Stroma	Non_Remission
AAAGAACTCACGATAC-1-CID003352-2	CID003352-2	UC1	UC	Rectum	Post	29.0	Inflamed	26	M	Caucasian	...	-0.780303	G1	AAAGAACTCACGATAC-1-CID003352-2	8.0	Arterial endothelium	Blood_endothelium	Endothelium	Vascular	Stroma	Non_Remission
AAAGGGCCAGTACTAC-1-CID003352-2	CID003352-2	UC1	UC	Rectum	Post	29.0	Inflamed	26	M	Caucasian	...	-0.189394	G1	AAAGGGCCAGTACTAC-1-CID003352-2	8.0	Arterial endothelium	Blood_endothelium	Endothelium	Vascular	Stroma	Non_Remission
AAAGGGCGTGCCGTAC-1-CID003352-2	CID003352-2	UC1	UC	Rectum	Post	29.0	Inflamed	26	M	Caucasian	...	-0.406760	G1	AAAGGGCGTGCCGTAC-1-CID003352-2	8.0	Arterial endothelium	Blood_endothelium	Endothelium	Vascular	Stroma	Non_Remission

5 rows × 39 columns

	feature_types	gene_id	hb	ig	rp	mt	n_cells_by_counts	mean_counts	pct_dropout_by_counts	total_counts	gene_symbol
MIR1302-2HG	Gene Expression	ENSG00000243485	False	False	False	False	7	0.000004	99.999586	7.0	MIR1302-2HG
FAM138A	Gene Expression	ENSG00000237613	False	False	False	False	0	0.000000	100.000000	0.0	FAM138A
OR4F5	Gene Expression	ENSG00000186092	False	False	False	False	10	0.000006	99.999408	10.0	OR4F5
AL627309.1	Gene Expression	ENSG00000238009	False	False	False	False	2267	0.001351	99.865826	2283.0	AL627309.1
AL627309.3	Gene Expression	ENSG00000239945	False	False	False	False	93	0.000055	99.994496	93.0	AL627309.3

```{python}
# List all metadata columns
print(adata.obs.columns)

```

Index(['sample_id', 'Patient', 'Disease', 'Site', 'Treatment',
       'Disease_duration', 'Inflammation', 'Age', 'Gender', 'Ethnicity',
       'Inflammation_score', 'Ileum_vs_Colon', 'LibraryType', 'CellsLoaded',
       'Match', 'Batch', 'doublet_scores', 'predicted_doublets',
       'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt',
       'total_counts_rp', 'pct_counts_rp', 'total_counts_hb', 'pct_counts_hb',
       'total_counts_ig', 'pct_counts_ig', 'S_score', 'G2M_score', 'phase',
       'cellbarcode', 'diff', 'final_analysis', 'minor', 'major', 'sub_bucket',
       'bucket', 'Remission_status'],
      dtype='object')

```{python}
# Display rows in .var where 'mt' is True
print(adata.var[adata.var["mt"] == True])

# Display rows in .var where 'rp' is True
print(adata.var[adata.var["rp"] == True])

```

           feature_types          gene_id     hb     ig     rp    mt  \
MT-ND1   Gene Expression  ENSG00000198888  False  False  False  True   
MT-ND2   Gene Expression  ENSG00000198763  False  False  False  True   
MT-CO1   Gene Expression  ENSG00000198804  False  False  False  True   
MT-CO2   Gene Expression  ENSG00000198712  False  False  False  True   
MT-ATP8  Gene Expression  ENSG00000228253  False  False  False  True   
MT-ATP6  Gene Expression  ENSG00000198899  False  False  False  True   
MT-CO3   Gene Expression  ENSG00000198938  False  False  False  True   
MT-ND3   Gene Expression  ENSG00000198840  False  False  False  True   
MT-ND4L  Gene Expression  ENSG00000212907  False  False  False  True   
MT-ND4   Gene Expression  ENSG00000198886  False  False  False  True   
MT-ND5   Gene Expression  ENSG00000198786  False  False  False  True   
MT-ND6   Gene Expression  ENSG00000198695  False  False  False  True   
MT-CYB   Gene Expression  ENSG00000198727  False  False  False  True   

         n_cells_by_counts  mean_counts  pct_dropout_by_counts  total_counts  \
MT-ND1             1655299    65.479156               2.029778   110633184.0   
MT-ND2             1651504    55.436966               2.254388    93665968.0   
MT-CO1             1678673   205.760086               0.646368   347651008.0   
MT-CO2             1680832   188.335724               0.518586   318210912.0   
MT-ATP8             964999     3.289358              42.885747     5557680.0   
MT-ATP6            1680678   158.229828               0.527701   267344176.0   
MT-CO3             1679870   205.103928               0.575523   346542368.0   
MT-ND3             1671689    98.200294               1.059722   165918624.0   
MT-ND4L            1249812     6.788721              26.028857    11470182.0   
MT-ND4             1672441   119.530106               1.015214   201957344.0   
MT-ND5             1580236    27.525661               6.472442    46507192.0   
MT-ND6              694416     1.071487              58.900422     1810378.0   
MT-CYB             1674761   113.263618               0.877903   191369536.0   

        gene_symbol  
MT-ND1       MT-ND1  
MT-ND2       MT-ND2  
MT-CO1       MT-CO1  
MT-CO2       MT-CO2  
MT-ATP8     MT-ATP8  
MT-ATP6     MT-ATP6  
MT-CO3       MT-CO3  
MT-ND3       MT-ND3  
MT-ND4L     MT-ND4L  
MT-ND4       MT-ND4  
MT-ND5       MT-ND5  
MT-ND6       MT-ND6  
MT-CYB       MT-CYB  
            feature_types          gene_id     hb     ig    rp     mt  \
RPL22     Gene Expression  ENSG00000116251  False  False  True  False   
RPL11     Gene Expression  ENSG00000142676  False  False  True  False   
RPS6KA1   Gene Expression  ENSG00000117676  False  False  True  False   
RPA2      Gene Expression  ENSG00000117748  False  False  True  False   
RPS8      Gene Expression  ENSG00000142937  False  False  True  False   
...                   ...              ...    ...    ...   ...    ...   
RPS5      Gene Expression  ENSG00000083845  False  False  True  False   
RPS4Y1    Gene Expression  ENSG00000129824  False  False  True  False   
RPS4Y2    Gene Expression  ENSG00000280969  False  False  True  False   
RPL3      Gene Expression  ENSG00000100316  False  False  True  False   
RPS19BP1  Gene Expression  ENSG00000187051  False  False  True  False   

          n_cells_by_counts  mean_counts  pct_dropout_by_counts  total_counts  \
RPL22               1241204     4.611098              26.538328     7790884.0   
RPL11               1437165    10.694373              14.940216    18069148.0   
RPS6KA1              169508     0.135060              89.967531      228196.0   
RPA2                 142949     0.099401              91.539447      167948.0   
RPS8                1445519    13.642202              14.445778    23049784.0   
...                     ...          ...                    ...           ...   
RPS5                1284961     6.358028              23.948534    10742487.0   
RPS4Y1               433926     0.760512              74.317736     1284956.0   
RPS4Y2                  416     0.000259              99.975379         438.0   
RPL3                1410277    10.791222              16.531605    18232784.0   
RPS19BP1             409100     0.381764              75.787083      645026.0   

         gene_symbol  
RPL22          RPL22  
RPL11          RPL11  
RPS6KA1      RPS6KA1  
RPA2            RPA2  
RPS8            RPS8  
...              ...  
RPS5            RPS5  
RPS4Y1        RPS4Y1  
RPS4Y2        RPS4Y2  
RPL3            RPL3  
RPS19BP1    RPS19BP1  

[147 rows x 11 columns]

```{python}
# View the first few rows of metadata
print(adata.obs.head())
```

                                  sample_id Patient Disease    Site Treatment  \
AAACCCAGTCACGACC-1-CID003352-2  CID003352-2     UC1      UC  Rectum      Post   
AAACGAATCAAAGAAC-1-CID003352-2  CID003352-2     UC1      UC  Rectum      Post   
AAAGAACTCACGATAC-1-CID003352-2  CID003352-2     UC1      UC  Rectum      Post   
AAAGGGCCAGTACTAC-1-CID003352-2  CID003352-2     UC1      UC  Rectum      Post   
AAAGGGCGTGCCGTAC-1-CID003352-2  CID003352-2     UC1      UC  Rectum      Post   

                                Disease_duration Inflammation  Age Gender  \
AAACCCAGTCACGACC-1-CID003352-2              29.0     Inflamed   26      M   
AAACGAATCAAAGAAC-1-CID003352-2              29.0     Inflamed   26      M   
AAAGAACTCACGATAC-1-CID003352-2              29.0     Inflamed   26      M   
AAAGGGCCAGTACTAC-1-CID003352-2              29.0     Inflamed   26      M   
AAAGGGCGTGCCGTAC-1-CID003352-2              29.0     Inflamed   26      M   

                                Ethnicity  ...  G2M_score phase  \
AAACCCAGTCACGACC-1-CID003352-2  Caucasian  ...  -1.251166    G1   
AAACGAATCAAAGAAC-1-CID003352-2  Caucasian  ...  -0.800117    G1   
AAAGAACTCACGATAC-1-CID003352-2  Caucasian  ...  -0.780303    G1   
AAAGGGCCAGTACTAC-1-CID003352-2  Caucasian  ...  -0.189394    G1   
AAAGGGCGTGCCGTAC-1-CID003352-2  Caucasian  ...  -0.406760    G1   

                                                   cellbarcode  diff  \
AAACCCAGTCACGACC-1-CID003352-2  AAACCCAGTCACGACC-1-CID003352-2   8.0   
AAACGAATCAAAGAAC-1-CID003352-2  AAACGAATCAAAGAAC-1-CID003352-2   8.0   
AAAGAACTCACGATAC-1-CID003352-2  AAAGAACTCACGATAC-1-CID003352-2   8.0   
AAAGGGCCAGTACTAC-1-CID003352-2  AAAGGGCCAGTACTAC-1-CID003352-2   8.0   
AAAGGGCGTGCCGTAC-1-CID003352-2  AAAGGGCGTGCCGTAC-1-CID003352-2   8.0   

                                      final_analysis              minor  \
AAACCCAGTCACGACC-1-CID003352-2  Arterial endothelium  Blood_endothelium   
AAACGAATCAAAGAAC-1-CID003352-2    Venous endothelium  Blood_endothelium   
AAAGAACTCACGATAC-1-CID003352-2  Arterial endothelium  Blood_endothelium   
AAAGGGCCAGTACTAC-1-CID003352-2  Arterial endothelium  Blood_endothelium   
AAAGGGCGTGCCGTAC-1-CID003352-2  Arterial endothelium  Blood_endothelium   

                                      major  sub_bucket  bucket  \
AAACCCAGTCACGACC-1-CID003352-2  Endothelium    Vascular  Stroma   
AAACGAATCAAAGAAC-1-CID003352-2  Endothelium    Vascular  Stroma   
AAAGAACTCACGATAC-1-CID003352-2  Endothelium    Vascular  Stroma   
AAAGGGCCAGTACTAC-1-CID003352-2  Endothelium    Vascular  Stroma   
AAAGGGCGTGCCGTAC-1-CID003352-2  Endothelium    Vascular  Stroma   

                                Remission_status  
AAACCCAGTCACGACC-1-CID003352-2     Non_Remission  
AAACGAATCAAAGAAC-1-CID003352-2     Non_Remission  
AAAGAACTCACGATAC-1-CID003352-2     Non_Remission  
AAAGGGCCAGTACTAC-1-CID003352-2     Non_Remission  
AAAGGGCGTGCCGTAC-1-CID003352-2     Non_Remission  

[5 rows x 39 columns]

```{python}
# count the cells in each category of selected metadata columns
display(adata.obs["Ethnicity"].value_counts())
display(adata.obs["Site"].value_counts())
display(adata.obs["Treatment"].value_counts())
display(adata.obs["Remission_status"].value_counts())

```

Ethnicity
Caucasian         68726
Afro-Carribean     1810
Middle_Eastern     1273
Name: count, dtype: int64

Site
Rectum              23429
Descending_Colon    20929
Sigmoid             11014
Ascending_Colon      8459
Terminal_Ileum       7978
Name: count, dtype: int64

Treatment
Pre     36632
Post    32523
Name: count, dtype: int64

Remission_status
Non_Remission    36491
Remission        30345
None              2654
Not_avail         2319
Name: count, dtype: int64

```{python}
adata.obs["major"].unique()
```

['Endothelium', 'Cycling_stroma', 'Glial']
Categories (3, object): ['Cycling_stroma', 'Endothelium', 'Glial']

```{python}
# count the number of cells with a certain metadata affiliation
display(adata.obs["major"].value_counts())
display(adata.obs["minor"].value_counts())
```

major
Endothelium       55278
Glial             14783
Cycling_stroma     1748
Name: count, dtype: int64

minor
Blood_endothelium        54710
Glial                    14783
Cycling_stroma            1748
Lymphatic_endothelium      568
Name: count, dtype: int64

Compute UMAP, subset data and perform QC

```{python}
# Compute the neighbor graph and the UMAP embedding (this overwrites the stored X_umap)
sc.pp.neighbors(adata)
sc.tl.umap(adata)

```

```{python}
# Subset to cells annotated as 'Glial' (UMAP-based outlier removal follows in a later section)
adata_glial = adata[adata.obs["major"] == "Glial"]


# Check the unique values in the 'major' column to confirm that subsetting worked
adata_glial.obs["major"].unique()

```

['Glial']
Categories (1, object): ['Glial']

```{python}
# view basic qc metrics
sc.pl.violin(adata_glial, ['n_genes_by_counts', 'total_counts', 'pct_counts_mt', 'pct_counts_rp'], 
             jitter=0.4, multi_panel=True)

```

```{python}
# compute the 98th percentile of genes detected per cell to use as an upper cutoff
upper_lim = np.quantile(adata_glial.obs.n_genes_by_counts.values, .98)

```

```{python}
upper_lim
```

2982.720000000001
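
The non-integer cutoff arises because `np.quantile` interpolates linearly between neighboring values by default. As a small worked example on toy data (the integers 1 to 100 stand in for the per-cell gene counts and are purely illustrative):

```python
import numpy as np

# Toy stand-in for adata_glial.obs["n_genes_by_counts"]: the integers 1..100
n_genes = np.arange(1, 101)

# 98th percentile using NumPy's default linear interpolation
upper_lim = np.quantile(n_genes, 0.98)

# Keep cells strictly below the cutoff, mirroring the filtering step
kept = n_genes[n_genes < upper_lim]

print(upper_lim)
print(kept.size)  # 98
```

Here the cutoff falls between the values 98 and 99, so the two largest values are dropped.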

```{python}
# filter the cells with high genes by count values
adata_glial = adata_glial[adata_glial.obs.n_genes_by_counts < upper_lim]

```

```{python}
# filter the cells with high mt percentages
adata_glial = adata_glial[adata_glial.obs.pct_counts_mt < 30]

```

```{python}
# filter the cells with a high fraction of counts from ribosomal protein genes (RPL/RPS)
adata_glial = adata_glial[adata_glial.obs.pct_counts_rp < 17.5]

```
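
The three QC filters above can also be expressed as one combined boolean mask, which makes it easy to see how many cells each criterion removes. This is a sketch on a toy table, not the notebook's actual data: the column names follow `adata.obs`, the 30 and 17.5 thresholds are the ones used above, and the gene-count cutoff of 3000 is a made-up value for the toy cells.

```python
import pandas as pd

# Toy per-cell QC table; column names follow adata.obs in this notebook
obs = pd.DataFrame(
    {
        "n_genes_by_counts": [500, 3500, 1200, 800, 2000],
        "pct_counts_mt": [5.0, 10.0, 45.0, 12.0, 8.0],
        "pct_counts_rp": [10.0, 12.0, 9.0, 25.0, 15.0],
    },
    index=[f"cell{i}" for i in range(5)],
)

upper_lim = 3000  # hypothetical gene-count cutoff for the toy data

# One boolean mask per QC criterion (True = cell passes)
filters = {
    "n_genes": obs["n_genes_by_counts"] < upper_lim,
    "mt": obs["pct_counts_mt"] < 30,
    "rp": obs["pct_counts_rp"] < 17.5,
}

# Report how many cells fail each criterion
for name, mask in filters.items():
    print(name, int((~mask).sum()))

# A cell is kept only if it passes all criteria
keep = filters["n_genes"] & filters["mt"] & filters["rp"]
print(list(obs.index[keep]))  # ['cell0', 'cell4']
```

On the real object, a mask like `keep` could be passed to `adata_glial[keep.values]` to apply all filters in one step.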

```{python}
# view basic qc metrics
sc.pl.violin(adata_glial, ['n_genes_by_counts', 'total_counts', 'pct_counts_mt', 'pct_counts_rp'], 
             jitter=0.4, multi_panel=True)

```

```{python}
# Create the UMAP plot
sc.pl.umap(
    adata_glial,
    color="major",
    add_outline=True,
    legend_loc="on data",
    show=False  # Do not immediately show the plot
)

# Customize the axis ticks
plt.gca().tick_params(axis="both", which="both", length=5, labelsize=10)  # Adjust size as needed
plt.xlabel("UMAP1", fontsize=12)  # Add x-axis label
plt.ylabel("UMAP2", fontsize=12)  # Add y-axis label

# Show the updated plot
plt.show()

```

Clean up UMAP to exclude outlier cells

```{python}
# some cells are outliers from the core umap and need to be removed
import pandas as pd

# Extract UMAP coordinates
umap_adata_glia_df = pd.DataFrame(adata_glial.obsm["X_umap"], columns=["UMAP1", "UMAP2"], index=adata_glial.obs.index)

# Add metadata (e.g., cell type, condition)
umap_adata_glia_df["major"] = adata_glial.obs["major"]
umap_adata_glia_df["minor"] = adata_glial.obs["minor"]
umap_adata_glia_df["Disease"] = adata_glial.obs["Disease"]
umap_adata_glia_df["Inflammation"] = adata_glial.obs["Inflammation"]

```

```{python}
# to identify the umap coordinates of these cells, use plotly
import plotly.express as px
import plotly.io as pio

# Set the renderer to 'browser'
pio.renderers.default = "browser"

# Create a Plotly scatter plot
fig = px.scatter(
    umap_adata_glia_df,
    x="UMAP1",
    y="UMAP2",
    color="Inflammation",  # Color by inflammation
    hover_data=["Disease"],  # Display condition on hover
    title="Interactive UMAP Plot"
)

fig.show()

```

```{python}
# Check if indices match. If False, reindex the df
print(adata_glial.obs.index.equals(umap_adata_glia_df.index))

```

True

```{python}
# Reindex the DataFrame to align with adata_glial.obs
umap_adata_glia_df = umap_adata_glia_df.reindex(adata_glial.obs.index)

# Subset the AnnData object
adata_glial = adata_glial[
    (umap_adata_glia_df["UMAP1"] <= 2.9),
    :
].copy()

```

```{python}
# Reindex the DataFrame to align with adata_glial.obs
umap_adata_glia_df = umap_adata_glia_df.reindex(adata_glial.obs.index)

adata_glial = adata_glial[
    (umap_adata_glia_df["UMAP2"] <= 5),
    :
].copy()

```

```{python}
# Reindex the DataFrame to align with adata_glial.obs
umap_adata_glia_df = umap_adata_glia_df.reindex(adata_glial.obs.index)

adata_glial = adata_glial[
    (umap_adata_glia_df["UMAP2"] >= -3.5),
    :
].copy()

```
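
The three sequential cuts above (UMAP1 ≤ 2.9, UMAP2 ≤ 5, UMAP2 ≥ -3.5) are equivalent to a single rectangular mask on the coordinate table. A minimal sketch on toy coordinates, where the cell names and values are invented for illustration and only the thresholds come from this notebook:

```python
import pandas as pd

# Toy UMAP coordinates; thresholds below are the ones used in this notebook
umap_df = pd.DataFrame(
    {
        "UMAP1": [0.5, 3.4, 1.0, -2.0, 2.9],
        "UMAP2": [1.0, 0.0, 6.2, -4.1, -3.5],
    },
    index=[f"cell{i}" for i in range(5)],
)

# Series.between is inclusive on both ends by default
in_core = (umap_df["UMAP1"] <= 2.9) & umap_df["UMAP2"].between(-3.5, 5)

print(list(umap_df.index[in_core]))  # ['cell0', 'cell4']
```

Building the mask once from a table that shares its index with `adata_glial.obs` avoids reindexing between steps.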

```{python}
print(f"Number of cells in the subset: {adata_glial.n_obs}")

```

Number of cells in the subset: 13440

```{python}
# check the umap of the subsetted object
sc.pl.umap(adata_glial, color="major", add_outline=True, legend_loc="on data")

```

Perform glia clustering and cluster-wise differential expression

```{python}
# Leiden clustering
sc.tl.leiden(adata_glial, resolution=0.1)
sc.pl.umap(adata_glial, color="leiden", legend_loc="on data", size=40, add_outline=True, save="TAURUS_EGC_Clustering.png")  # Visualize clusters

```

WARNING: saving figure to file figures/umapTAURUS_EGC_Clustering.png

```{python}
# Normalize the data
sc.pp.normalize_total(adata_glial, target_sum=1e4)

# Find marker genes for clusters
sc.tl.rank_genes_groups(adata_glial, groupby="leiden", method="t-test")

# Extract DE results from adata_glial
result = adata_glial.uns['rank_genes_groups']
groups = result['names'].dtype.names  # Cluster/group names


# Create an empty list to store data for all clusters
all_data = []

# Loop through each group (cluster) and extract data
for group in groups:
    group_data = pd.DataFrame({
        'cluster': group,  # Add cluster/group name
        'gene': result['names'][group],
        'pval': result['pvals'][group],
        'logfoldchange': result['logfoldchanges'][group],
        'score': result['scores'][group]
    })
    all_data.append(group_data)

# Combine all groups into a single DataFrame
de_table = pd.concat(all_data, ignore_index=True)

# Export to CSV
de_table.to_csv("de_per_cluster_EGC_TAURUS_scIBD.csv", index=False)

print("Differential expression results saved to 'de_per_cluster_EGC_TAURUS_scIBD.csv'")

```

Differential expression results saved to 'de_per_cluster_EGC_TAURUS_scIBD.csv'
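
The loop above works because the entries of `rank_genes_groups` are NumPy structured arrays with one named field per cluster, so `dtype.names` lists the clusters and `array[group]` selects one cluster's column. The toy arrays below imitate that layout; the genes and scores are invented for illustration and are not results from this dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for result["names"] and result["scores"]: one field per cluster
names = np.array(
    [("PLP1", "CXCL9"), ("S100B", "CD74")],
    dtype=[("0", "U10"), ("1", "U10")],
)
scores = np.array(
    [(12.5, 9.1), (8.0, 7.3)],
    dtype=[("0", "f4"), ("1", "f4")],
)

groups = names.dtype.names  # field names double as cluster labels

# Build one long-format table with a row per (cluster, gene) pair
de_table = pd.concat(
    [
        pd.DataFrame({"cluster": g, "gene": names[g], "score": scores[g]})
        for g in groups
    ],
    ignore_index=True,
)
print(de_table)
```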

```{python}
# Generate violin plots for top 20 DE genes
sc.pl.rank_genes_groups_violin(adata_glial, n_genes=20)

```

```{python}
# Heatmap
sc.pl.heatmap(adata_glial, var_names=["PLP1", "S100B", "MPZ", "SPP1", "MAL", "NRXN1", "CD74", "STAT1", "SOCS3", "CXCL9"], 
              groupby=["Inflammation", "Disease"], vmax=8, dendrogram=True)

# Dotplot
sc.pl.dotplot(adata_glial, var_names=["PLP1", "S100B", "SOX10", "MAL", "MPZ", "CCK", "CD74", "DCN", "SPP1", "GFRA3", "NRXN1", "CXCL9"], 
              groupby="leiden")

```

WARNING: dendrogram data not found (using key=dendrogram_Inflammation_Disease). Running `sc.tl.dendrogram` with default parameters. For fine tuning it is recommended to run `sc.tl.dendrogram` independently.

Add and visualize cell death pathway-related genes

```{python}
# load in cell death list
cell_death = pd.read_csv("cell_death_combined_Hs_scIBD.csv")

```

```{python}
# df to list
cell_death_combined_list = cell_death["Genes"].tolist()
```

```{python}
# Extract expression data from the adata 
# Check which genes from the list are present in adata.var
genes_in_adata = [gene for gene in cell_death_combined_list if gene in adata_glial.var_names]

# Extract expression data for these genes, converting to dense only if the matrix is sparse
expression_data = adata_glial[:, genes_in_adata].X
if hasattr(expression_data, "toarray"):
    expression_data = expression_data.toarray()

```

```{python}
import numpy as np

# Calculate the mean expression across selected genes for each cell
mean_expression = np.mean(expression_data, axis=1)  # Axis 1: across genes

```

```{python}
# add mean expression as metadata 
adata_glial.obs["cell_death_signature"] = mean_expression

```
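
The signature computed above is a plain per-cell mean over the genes from the list that are present in the object. The toy example below illustrates that arithmetic and also reports which listed genes are missing, which is worth checking before interpreting the score. The gene names and values are hypothetical placeholders, not the contents of the cell death CSV.

```python
import pandas as pd

# Toy expression matrix (cells x genes); values are illustrative only
expr = pd.DataFrame(
    [[1.0, 0.0, 2.0, 5.0], [3.0, 1.0, 0.0, 0.0]],
    index=["cellA", "cellB"],
    columns=["CASP3", "MLKL", "GSDMD", "PLP1"],
)

# Hypothetical gene list; RIPK3 is deliberately absent from the toy matrix
gene_list = ["CASP3", "MLKL", "GSDMD", "RIPK3"]

present = [g for g in gene_list if g in expr.columns]
missing = sorted(set(gene_list) - set(present))
print("missing genes:", missing)

# Per-cell mean across the genes that were found
signature = expr[present].mean(axis=1)
print(signature)
```

Scanpy's `sc.tl.score_genes` is an alternative that additionally subtracts the average expression of a reference gene set; it is not what this notebook uses.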

```{python}
# List all metadata columns
print(adata_glial.obs.columns)

```

Index(['sample_id', 'Patient', 'Disease', 'Site', 'Treatment',
       'Disease_duration', 'Inflammation', 'Age', 'Gender', 'Ethnicity',
       'Inflammation_score', 'Ileum_vs_Colon', 'LibraryType', 'CellsLoaded',
       'Match', 'Batch', 'doublet_scores', 'predicted_doublets',
       'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt',
       'total_counts_rp', 'pct_counts_rp', 'total_counts_hb', 'pct_counts_hb',
       'total_counts_ig', 'pct_counts_ig', 'S_score', 'G2M_score', 'phase',
       'cellbarcode', 'diff', 'final_analysis', 'minor', 'major', 'sub_bucket',
       'bucket', 'Remission_status', 'leiden', 'cell_death_signature'],
      dtype='object')

```{python}
# subset to create disease specific objects
adata_EGC_UC = adata_glial[adata_glial.obs["Disease"] == "UC"]

adata_EGC_CD = adata_glial[adata_glial.obs["Disease"] == "CD"]

adata_EGC_CTRL = adata_glial[adata_glial.obs["Disease"] == "Healthy"]

```

```{python}
adata_EGC_UC.obs["Inflammation"].value_counts()
```

Inflammation
Inflamed        3710
Non_Inflamed    3209
Name: count, dtype: int64

```{python}
adata_EGC_CD.obs["Inflammation"].value_counts()
```

Inflammation
Non_Inflamed    4052
Inflamed        1408
Name: count, dtype: int64

```{python}
adata_EGC_CTRL.obs["Inflammation"].value_counts()
```

Inflammation
Healthy    1061
Name: count, dtype: int64
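
The per-disease counts above can also be obtained in one table with `pd.crosstab`, without creating separate objects first. A sketch on a toy metadata table; the rows are invented, and only the column names and category labels follow this notebook:

```python
import pandas as pd

# Toy cell-level metadata with the same column names as adata_glial.obs
obs = pd.DataFrame(
    {
        "Disease": ["UC", "UC", "UC", "CD", "CD", "Healthy"],
        "Inflammation": [
            "Inflamed", "Inflamed", "Non_Inflamed",
            "Non_Inflamed", "Inflamed", "Healthy",
        ],
    }
)

# Rows: disease; columns: inflammation status; values: number of cells
counts = pd.crosstab(obs["Disease"], obs["Inflammation"])
print(counts)
```

On the real data the equivalent call would be `pd.crosstab(adata_glial.obs["Disease"], adata_glial.obs["Inflammation"])`.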

```{python}
sc.pl.umap(adata_EGC_CTRL, color = ["PLP1", "CXCL9", "cell_death_signature", "Inflammation"], cmap="viridis", save="umap_CTRL_EGCs_markers.png",add_outline=True,
          size=50, alpha=0.7, vmax=3)
```

WARNING: saving figure to file figures/umapumap_CTRL_EGCs_markers.png

```{python}
sc.pl.umap(adata_EGC_CTRL, color = ["PLP1", "CXCL9", "cell_death_signature", "Inflammation"], cmap="viridis", save="umap_CTRL_EGCs_markers_PLP.png",add_outline=True,
          size=50, alpha=0.7, vmax=25)
```

WARNING: saving figure to file figures/umapumap_CTRL_EGCs_markers_PLP.png

```{python}
sc.pl.umap(adata_EGC_CD, color = ["PLP1", "CXCL9", "cell_death_signature", "Inflammation"], cmap="viridis", save="umap_CD_EGCs_markers.png", add_outline=True,
         size=50, alpha=0.7, vmax=3)
```

WARNING: saving figure to file figures/umapumap_CD_EGCs_markers.png

```{python}
sc.pl.umap(adata_EGC_CD, color = ["PLP1", "CXCL9", "cell_death_signature", "Inflammation"], cmap="viridis", save="umap_CD_EGCs_markers_PLP.png", add_outline=True,
         size=50, alpha=0.7, vmax=25)
```

WARNING: saving figure to file figures/umapumap_CD_EGCs_markers_PLP.png

```{python}
sc.pl.umap(adata_EGC_UC, color = ["PLP1", "CXCL9", "cell_death_signature", "Inflammation"], cmap="viridis", save="umap_UC_EGCs_markers.png", add_outline=True,
         size=50, alpha=0.7, vmax=3)
```

WARNING: saving figure to file figures/umapumap_UC_EGCs_markers.png

```{python}
sc.pl.umap(adata_EGC_UC, color = ["PLP1", "CXCL9", "cell_death_signature", "Inflammation"], cmap="viridis", save="umap_UC_EGCs_markers_PLP.png", add_outline=True,
         size=50, alpha=0.7, vmax=25)
```

WARNING: saving figure to file figures/umapumap_UC_EGCs_markers_PLP.png

```{python}
display(adata_glial.obs.head())
```

	sample_id	Patient	Disease	Site	Treatment	Disease_duration	Inflammation	Age	Gender	Ethnicity	...	cellbarcode	diff	final_analysis	minor	major	sub_bucket	bucket	Remission_status	cell_death_signature
AATGGCTAGTTCAACC-1-CID003352-2	CID003352-2	UC1	UC	Rectum	Post	29.0	Inflamed	26	M	Caucasian	...	AATGGCTAGTTCAACC-1-CID003352-2	8.0	Glia	Glial	Glial	Glial	Stroma	Non_Remission	0.765580
AGCGCTGGTACCTGTA-1-CID003352-2	CID003352-2	UC1	UC	Rectum	Post	29.0	Inflamed	26	M	Caucasian	...	AGCGCTGGTACCTGTA-1-CID003352-2	8.0	Glia	Glial	Glial	Glial	Stroma	Non_Remission	1.871865
AGCGTATTCGTGCTCT-1-CID003352-2	CID003352-2	UC1	UC	Rectum	Post	29.0	Inflamed	26	M	Caucasian	...	AGCGTATTCGTGCTCT-1-CID003352-2	8.0	Glia	Glial	Glial	Glial	Stroma	Non_Remission	1.157531
ATGCCTCAGAGTGTGC-1-CID003352-2	CID003352-2	UC1	UC	Rectum	Post	29.0	Inflamed	26	M	Caucasian	...	ATGCCTCAGAGTGTGC-1-CID003352-2	8.0	Glia	Glial	Glial	Glial	Stroma	Non_Remission	1.925165
ATTCACTAGTATAGAC-1-CID003352-2	CID003352-2	UC1	UC	Rectum	Post	29.0	Inflamed	26	M	Caucasian	...	ATTCACTAGTATAGAC-1-CID003352-2	8.0	Glia	Glial	Glial	Glial	Stroma	Non_Remission	0.768600

5 rows × 41 columns

```{python}
# violin plots grouped by inflammation status
sc.pl.violin(adata_glial,
            keys=["PLP1", "CXCL9", "cell_death_signature"],
            groupby="Inflammation",
            jitter=True,
            stripplot=True,
            rotation=45)

```

```{python}
# violin plots grouped by disease
sc.pl.violin(adata_glial,
            keys=["PLP1", "CXCL9", "cell_death_signature"],
            groupby="Disease",
            jitter=True,
            stripplot=True,
            rotation=45, save="violins_PLP1_CXCL9_celldeath_disease.png")
```

WARNING: saving figure to file figures/violinviolins_PLP1_CXCL9_celldeath_disease.png

```{python}
# export requirements file
!echo "# Python version: $(python --version)" > requirements_TAURUS.txt
!pip freeze >> requirements_TAURUS.txt

```