Asthma Transcriptome Clustering

Introduction

Asthma, a chronic inflammatory airway disease, affects millions of people worldwide, and the search for more effective treatments continues. This analysis focuses on the human airway smooth muscle (HASM) transcriptome, using RNA-Seq data to examine gene expression patterns under several treatment conditions.

Background: Asthma and Treatment Modalities

Asthma is characterized by chronic inflammation in the airways, leading to symptoms such as wheezing, shortness of breath, and coughing. Medications commonly used for asthma management include \(\beta_2\)-agonists and glucocorticosteroids. These drugs primarily target the airway smooth muscle, aiming to alleviate bronchoconstriction and inflammation.

Treatments

The RNA-Seq data, sourced from the GEO database, cover four distinct treatment conditions in HASM cells:

No Treatment (Baseline): To understand the natural state of HASM cells.
\(\beta_2\)-Agonist Treatment (Albuterol, 1\(\mu\)M for 18h): Mimicking the effects of a common bronchodilator.
Glucocorticosteroid Treatment (Dexamethasone, 1\(\mu\)M for 18h): Representing anti-inflammatory therapy.
Simultaneous Treatment with \(\beta_2\)-Agonist and Glucocorticosteroid: Investigating potential synergistic effects.

Data Acquisition and Processing

import pandas as pd
import gdown
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

url = 'https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE52778&format=file&file=GSE52778%5FAll%5FSample%5FFPKM%5FMatrix%2Etxt%2Egz'
gdown.download(url, 'matrix.gz', quiet=True)
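# The separator is a regular expression, so it is written as a raw string
# to avoid Python's invalid-escape warning for '\s'.
df = pd.read_csv('matrix.gz', compression='gzip', header=0, sep=r'\s', index_col=0, engine='python')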

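# Keep the columns from position 24 onward (assumed here to be the per-sample FPKM columns).
df = df.iloc[:, 24:]
# Rank genes by total expression across the retained columns and keep the top 1000.
df['Total'] = df.sum(axis=1)
df = df.sort_values(by='Total', ascending=False)
df = df.drop('Total', axis=1)
df = df.iloc[:1000]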

The gene expression data from the GEO dataset GSE52778 were processed with Python’s pandas library, retaining the 1000 genes with the highest total expression summed across the selected sample columns.
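The selection step can also be written without a temporary column. The following is a minimal sketch on a toy matrix; the helper name `top_expressed` and the gene and sample labels are hypothetical and are not taken from GSE52778.

```python
import pandas as pd


def top_expressed(expr: pd.DataFrame, n: int) -> pd.DataFrame:
    """Return the n rows (genes) with the largest total expression, highest first."""
    totals = expr.sum(axis=1)  # total expression per gene across all samples
    return expr.loc[totals.nlargest(n).index]


# Toy matrix with hypothetical gene and sample names (illustration only).
toy = pd.DataFrame(
    {"s1": [1.0, 50.0, 7.0, 0.0], "s2": [2.0, 40.0, 9.0, 0.5]},
    index=["geneA", "geneB", "geneC", "geneD"],
)

top2 = top_expressed(toy, 2)
print(top2.index.tolist())
```

Applied to the real matrix, `top_expressed(df, 1000)` should select the same genes as the sort-and-slice approach, apart from possible differences in how ties are ordered.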

Correlation Analysis and Hierarchical Clustering

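# np.corrcoef treats each row as a variable, so this is a gene-by-gene Pearson correlation matrix.
gene_corr = np.corrcoef(df)

# linkage receives the rows of the correlation matrix as feature vectors, so genes are
# grouped by the Euclidean distance between their correlation profiles.
linkage_matrix = linkage(gene_corr, method='average')
dendrogram(linkage_matrix)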
plt.title('Dendrogram of Genes Based on Correlation')
plt.xlabel('Genes')
plt.ylabel('Distance')
plt.show()

A gene-by-gene Pearson correlation matrix was computed across the samples, and average-linkage hierarchical clustering was used to visualize the relationships between genes based on their expression patterns. The resulting dendrogram highlights clusters of genes that respond differently to the treatment conditions.
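One common variant, shown here only as an alternative and not as the method used above, turns the correlation directly into a distance, \(d = 1 - r\), and then cuts the tree into a fixed number of flat clusters. The sketch below uses a small toy matrix; the function name `correlation_clusters` and the choice of two clusters are assumptions for illustration, and the approach assumes no gene has zero variance (which would make its correlation undefined).

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def correlation_clusters(expr: np.ndarray, n_clusters: int) -> np.ndarray:
    """Cluster rows (genes) by 1 - Pearson correlation and return flat cluster labels."""
    corr = np.corrcoef(expr)             # gene-by-gene correlation
    dist = 1.0 - corr                    # 0 = perfectly correlated, 2 = anti-correlated
    np.fill_diagonal(dist, 0.0)          # remove floating-point noise on the diagonal
    dist = np.clip(dist, 0.0, 2.0)       # guard against tiny negative values
    condensed = squareform(dist, checks=False)  # condensed form expected by linkage
    tree = linkage(condensed, method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Toy data: two genes that rise across samples and two that fall (illustration only).
toy = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.1, 6.0, 8.2],
    [4.0, 3.0, 2.0, 1.0],
    [8.1, 6.0, 4.2, 2.0],
])

labels = correlation_clusters(toy, 2)
print(labels)
```

With this distance, genes whose profiles move together land in the same cluster regardless of their absolute expression level, which is often the intent when clustering "based on correlation".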