TWAS FUSION

TWAS / FUSION (gusevlab.org)

Installation

  • Download and unpack the FUSION software package from GitHub:
wget https://github.com/gusevlab/fusion_twas/archive/master.zip
unzip master.zip
cd fusion_twas-master
  • Download and unpack the (1000 Genomes) LD reference data:
wget https://data.broadinstitute.org/alkesgroup/FUSION/LDREF.tar.bz2
tar xjvf LDREF.tar.bz2
  • Download and unpack the plink2R library:
wget https://github.com/gabraham/plink2R/archive/master.zip
unzip master.zip
  • Launch R and install required libraries:
install.packages(c('optparse','RColorBrewer'))
install.packages('plink2R-master/plink2R/',repos=NULL)
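
Before moving on, it can help to confirm that the downloads above unpacked as expected. The following is a minimal sketch, not part of FUSION: `check_fusion_install` is a hypothetical helper name, and the check assumes the LD reference unpacks into `LDREF/` as per-chromosome PLINK files named `1000G.EUR.<chr>.bed/.bim/.fam`, which matches the `--ref_ld_chr ./LDREF/1000G.EUR.` prefix used later in these notes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of FUSION): list expected files that are missing.
# Assumes LDREF holds per-chromosome PLINK files named 1000G.EUR.<chr>.{bed,bim,fam}.
check_fusion_install() {
  local dir=$1 missing=0 chr ext
  [ -f "$dir/FUSION.assoc_test.R" ] || { echo "missing: $dir/FUSION.assoc_test.R"; missing=1; }
  for chr in $(seq 1 22); do
    for ext in bed bim fam; do
      [ -f "$dir/LDREF/1000G.EUR.$chr.$ext" ] || { echo "missing: $dir/LDREF/1000G.EUR.$chr.$ext"; missing=1; }
    done
  done
  # Return 0 when everything was found, 1 otherwise.
  return "$missing"
}

# Usage (run from the directory that contains fusion_twas-master):
#   check_fusion_install ./fusion_twas-master && echo "FUSION files look complete"
```

The function only checks files on disk; whether the R packages installed correctly still has to be verified inside R.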

Typical analysis and output: Examples

TWAS of GTEx whole-blood expression using the PGC schizophrenia GWAS summary statistics

First, download and prepare the GWAS and GTEx whole blood data:

wget https://data.broadinstitute.org/alkesgroup/FUSION/SUM/PGC2.SCZ.sumstats
mkdir WEIGHTS
cd WEIGHTS
wget https://data.broadinstitute.org/alkesgroup/FUSION/WGT/GTEx.Whole_Blood.tar.bz2
tar xjf GTEx.Whole_Blood.tar.bz2

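You can use **LDSC munge_sumstats.py** to convert GWAS results into the format FUSION needs. Feed in the full set of GWAS results; do not filter them by a significance threshold.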
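
A quick way to see whether a summary-statistics file is already in a usable shape is to look at its header. The sketch below assumes, following the FUSION documentation, that the file needs a header containing at least the columns SNP, A1, A2 and Z; `check_sumstats_header` is a hypothetical helper name, and the toy file exists only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of SNP, A1, A2, Z are absent from the header
# of a whitespace-delimited summary-statistics file (plain text or .gz).
check_sumstats_header() {
  local file=$1
  { case "$file" in *.gz) gzip -dc "$file" ;; *) cat "$file" ;; esac; } | head -n 1 | awk '
    {
      for (i = 1; i <= NF; i++) seen[$i] = 1
      n = split("SNP A1 A2 Z", need, " ")
      missing = ""
      for (j = 1; j <= n; j++)
        if (!(need[j] in seen)) missing = (missing == "" ? "" : missing " ") need[j]
      if (missing != "") { print "missing: " missing; exit 1 }
      print "ok"
    }'
}

# Toy example: a file that has P instead of Z.
tmp=$(mktemp -d)
printf 'SNP\tA1\tA2\tP\nrs1\tA\tG\t0.2\n' > "$tmp/toy.sumstats"
check_sumstats_header "$tmp/toy.sumstats" || echo "run munge_sumstats.py first"
```

The check looks only at column names, so it does not confirm that the alleles or Z-scores themselves are sensible.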

Finally, we run FUSION.assoc_test.R using this data on chromosome 22:

Rscript FUSION.assoc_test.R \
--sumstats PGC2.SCZ.sumstats \
--weights ./WEIGHTS/GTEx.Whole_Blood.pos \
--weights_dir ./WEIGHTS/ \
--ref_ld_chr ./LDREF/1000G.EUR. \
--chr 22 \
--out PGC2.SCZ.22.dat
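
Once the run finishes, the result file can be screened for significant genes. The sketch below is one possible approach under stated assumptions: it assumes the output is a tab-delimited table with a header that includes the columns `ID`, `TWAS.Z` and `TWAS.P` (the names given in the FUSION documentation), and it applies a Bonferroni threshold of 0.05 divided by the number of genes with a non-NA p-value in the file, which is a common convention rather than something these notes prescribe. `twas_significant` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print ID, TWAS.Z and TWAS.P for genes passing a Bonferroni
# threshold (0.05 / number of genes with a non-NA TWAS.P in the file).
# Columns are located by header name, so their order does not matter.
twas_significant() {
  awk -F'\t' '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    $(col["TWAS.P"]) != "NA" {
      n++
      id[n] = $(col["ID"]); z[n] = $(col["TWAS.Z"]); p[n] = $(col["TWAS.P"])
    }
    END {
      if (n == 0) exit
      thr = 0.05 / n
      for (k = 1; k <= n; k++)
        if (p[k] + 0 < thr) print id[k] "\t" z[k] "\t" p[k]
    }' "$1"
}

# Toy example with made-up values, for illustration only.
tmp=$(mktemp -d)
printf 'ID\tCHR\tTWAS.Z\tTWAS.P\nGENE_A\t22\t-5.1\t1e-6\nGENE_B\t22\t0.7\t0.5\nGENE_C\t22\tNA\tNA\n' > "$tmp/toy.dat"
twas_significant "$tmp/toy.dat"
```

For a genome-wide analysis the threshold would normally be computed over all chromosomes together, so this is best applied to a merged result file rather than to a single chromosome.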

Hands-on: TWAS analysis with a FinnGen dataset

## 1. Download the GWAS summary statistics
wget https://storage.googleapis.com/finngen-public-data-r10/summary_stats/finngen_R10_H7_GLAUCCLOSEPRIM.gz

## 2. Data preparation: convert to a GWAS format the downstream tools can read
# Steps: decompress the file, extract the columns of interest in the required order,
# keep lines containing 'rs', drop lines containing a comma, and save the result.
# -F'\t' keeps empty tab-delimited fields from shifting the column numbers.
zcat finngen_R10_H7_GLAUCCLOSEPRIM.gz \
| awk -F'\t' '{print $5"\t"$3"\t"$4"\t"$11"\t"$9"\t"$10"\t"$7}' \
| grep 'rs' \
| grep -v ',' \
> finngen_R10_H7_GLAUCCLOSEPRIM.gwas
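
Selecting columns by position breaks silently if a different release orders its columns differently. As a hedged alternative, the sketch below picks the same columns by header name. The names `rsids`, `ref`, `alt`, `beta` and `pval` are the ones passed to munge_sumstats.py in these notes; `af_alt` and `sebeta` are assumed names for columns 11 and 10 and should be checked against the actual header. `finngen_to_gwas` is a hypothetical helper, and its filter is slightly stricter than the grep version because it tests only the rsids column.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract columns by header name from a gzipped, tab-delimited file.
# Column names after "want=" are assumptions; adjust them to the real header.
finngen_to_gwas() {
  gzip -dc "$1" | awk -F'\t' -v OFS='\t' -v want="rsids ref alt af_alt beta sebeta pval" '
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i
      n = split(want, name, " ")
      for (j = 1; j <= n; j++)
        if (!(name[j] in col)) { print "missing column: " name[j] > "/dev/stderr"; exit 1 }
    }
    # Keep the header, plus rows whose rsids field is a single rs ID (no commas).
    NR == 1 || ($(col["rsids"]) ~ /^rs/ && $(col["rsids"]) !~ /,/) {
      line = $(col[name[1]])
      for (j = 2; j <= n; j++) line = line OFS $(col[name[j]])
      print line
    }'
}

# Toy example with made-up rows, for illustration only.
tmp=$(mktemp -d)
printf '#chrom\tpos\tref\talt\trsids\tnearest_genes\tpval\tmlogp\tbeta\tsebeta\taf_alt\n1\t100\tA\tG\trs1\tGENE1\t0.01\t2\t0.5\t0.1\t0.3\n1\t200\tC\tT\trs2,rs3\tGENE2\t0.5\t0.3\t0.1\t0.2\t0.4\n' | gzip -c > "$tmp/toy.gz"
finngen_to_gwas "$tmp/toy.gz"
```

The output keeps the original header names, so the later `--snp rsids --a1 alt --a2 ref --p pval` arguments still apply.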

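To use the LDSC tools below, open the Anaconda command line and activate the environment in which LDSC is installed (Python 2.7), then run munge_sumstats.py with the commands that follow.

The w_hm3.snplist file can be downloaded from: https://github.com/perslab/CELLECT/blob/master/data/ldsc/w_hm3.snplist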

### 2. TWAS analysis
## Convert the data to the TWAS input format
# Clean the data (--signed-sumstats takes the column name and its null value, e.g. beta,0)
munge_sumstats.py \
--sumstats finngen_R10_H7_GLAUCCLOSEPRIM.gwas \
--signed-sumstats beta,0 \
--snp rsids \
--a1 alt \
--a2 ref \
--p pval \
--N 392582 \
--merge-alleles w_hm3.snplist \
--out finngen_R10_H7_GLAUCCLOSEPRIM \
--chunksize 500000

## Command as run on the HPC cluster
python "/public/home/fanfangzhou/R/TWAS/ldsc/munge_sumstats.py" --sumstats finngen_R10_H7_GLAUCCLOSEPRIM.gwas --merge-alleles w_hm3.snplist --N 392582 --snp rsids --a1 alt --a2 ref --p pval --out finngen_R10_H7_GLAUCCLOSEPRIM --chunksize 500000

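Here --N is the sample size of the study.
finngen_R10_H7_GLAUCCLOSEPRIM.gwas is the input file produced in the previous step; --out sets the output prefix, so the result is written to finngen_R10_H7_GLAUCCLOSEPRIM.sumstats.gz.
w_hm3.snplist lists the SNPs to include in the analysis and has three columns: the rs ID, A1 (effect allele), and A2 (non-effect allele). This filtering step is optional.
To include all SNPs instead, use: munge_sumstats.py --sumstats summary.txt --N 17115 --out s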
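
After munging, it is worth checking how many SNPs actually carry a Z-score, since filtering against a SNP list can leave many rows without one. The sketch below assumes the munged output is a gzipped, tab-delimited file with a header containing a `Z` column, and that missing values appear as an empty field, `NA` or `nan`; both points are assumptions to verify against your own output. `count_usable_snps` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count rows with a non-missing Z in a munged .sumstats.gz file.
# Assumes a tab-delimited layout with a header that includes a Z column.
count_usable_snps() {
  gzip -dc "$1" | awk -F'\t' '
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("Z" in col)) { print "no Z column found" > "/dev/stderr"; exit 1 }
      next
    }
    {
      total++
      z = $(col["Z"])
      if (z != "" && z != "NA" && z != "nan") usable++
    }
    END { if (NR > 0 && ("Z" in col)) printf "%d of %d SNPs have a Z-score\n", usable, total }'
}

# Toy example with made-up rows, for illustration only.
tmp=$(mktemp -d)
printf 'SNP\tA1\tA2\tZ\tN\nrs1\tA\tG\t1.5\t100\nrs2\tA\tG\t\t\n' | gzip -c > "$tmp/toy.sumstats.gz"
count_usable_snps "$tmp/toy.sumstats.gz"
```

A very low usable fraction usually points to a mismatch in SNP identifiers or allele columns rather than to a problem with the GWAS itself.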

## R environment
conda activate twasR

## TWAS analysis, one chromosome at a time
for i in $(seq 1 22); do Rscript "/public/home/fanfangzhou/R/TWAS/fusion_twas_master/FUSION.assoc_test.R" --sumstats finngen_R10_H7_GLAUCCLOSEPRIM.sumstats.gz --weights "/public/home/fanfangzhou/R/TWAS/GTExV8/GTExv8.EUR.Whole_Blood.pos" --weights_dir "/public/home/fanfangzhou/R/TWAS/GTExV8/" --ref_ld_chr "/public/home/fanfangzhou/R/TWAS/fusion_twas_master/LDREF/1000G.EUR." --chr $i --out qgy_blood_$i.twas; done

## The same loop written across multiple lines.
## Note: the FUSION directory is spelled fusion_twas_master above and fusion_twas-master here;
## use whichever matches your installation.
for i in $(seq 1 22); do
Rscript "/public/home/fanfangzhou/R/TWAS/fusion_twas-master/FUSION.assoc_test.R" \
--sumstats finngen_R10_H7_GLAUCCLOSEPRIM.sumstats.gz \
--weights "/public/home/fanfangzhou/R/TWAS/GTExV8/GTExv8.EUR.Whole_Blood.pos" \
--weights_dir "/public/home/fanfangzhou/R/TWAS/GTExV8/" \
--ref_ld_chr "/public/home/fanfangzhou/R/TWAS/fusion_twas-master/LDREF/1000G.EUR." \
--chr $i \
--out qgy_blood_$i.twas
done

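The TWAS step above runs in R, so switch to the Anaconda environment that contains R and make sure the required R packages are installed there.

The weights are the reference expression models; here they are taken from GTEx v8.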
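
The per-chromosome loop leaves 22 separate result files. A minimal sketch for combining them into one table is shown below; it assumes each file starts with the same single header line. `merge_twas` is a hypothetical helper name, and the prefix and suffix arguments are whatever was passed to `--out` in the loop.

```shell
#!/usr/bin/env bash
# Hypothetical helper: concatenate <prefix><chr><suffix> for chr 1..22,
# keeping the header line from the first file only.
merge_twas() {
  local prefix=$1 suffix=$2 first=1 f chr
  for chr in $(seq 1 22); do
    f="${prefix}${chr}${suffix}"
    [ -s "$f" ] || { echo "skipping missing or empty file: $f" >&2; continue; }
    if [ "$first" -eq 1 ]; then
      cat "$f"            # first file: header plus data
      first=0
    else
      tail -n +2 "$f"     # later files: data rows only
    fi
  done
}

# Usage, matching the output names used in the loop above:
#   merge_twas qgy_blood_ .twas > qgy_blood_all.twas
```

Missing chromosomes are reported on standard error rather than treated as fatal, so check the messages before interpreting the merged table.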

Hands-on 2: LUAD

Make sure all the required files are set up before you start.

ssh admin2
screen -ls
screen -r SESSION_ID ## attach to the screen session (replace SESSION_ID with an ID listed by screen -ls)
cd "/public/home/fanfangzhou/R/2024science/10.LUAD/1.SMR/"
source activate /public/home/fanfangzhou/anaconda/anaconda3/envs/ldsc ## the Python 2.7 environment set up earlier for LDSC

# LUAD.ma contains the columns SNP A1 A2 freq b se p n, laid out in the standard GWAS format that SMR expects
python "/public/home/fanfangzhou/R/TWAS/ldsc/munge_sumstats.py" --sumstats LUAD.ma --N 66756 --out LUAD --chunksize 500000

conda activate /public/home/fanfangzhou/anaconda/anaconda3/envs/twasR
## TWAS analysis: whole blood
for i in $(seq 1 22); do Rscript "/public/home/fanfangzhou/R/TWAS/fusion_twas_master/FUSION.assoc_test.R" --sumstats LUAD.sumstats.gz --weights "/public/home/fanfangzhou/R/TWAS/GTExV8/GTExV8.EUR/GTExv8.EUR.Whole_Blood.pos" --weights_dir "/public/home/fanfangzhou/R/TWAS/GTExV8/GTExV8.EUR/" --ref_ld_chr "/public/home/fanfangzhou/R/TWAS/fusion_twas_master/LDREF/1000G.EUR." --chr $i --out TWAS_Whole_Blood_$i.twas; done

## TWAS analysis: lung
for i in $(seq 1 22); do Rscript "/public/home/fanfangzhou/R/TWAS/fusion_twas_master/FUSION.assoc_test.R" --sumstats LUAD.sumstats.gz --weights "/public/home/fanfangzhou/R/TWAS/GTExV8/GTExV8.EUR/GTExv8.EUR.Lung.pos" --weights_dir "/public/home/fanfangzhou/R/TWAS/GTExV8/GTExV8.EUR/" --ref_ld_chr "/public/home/fanfangzhou/R/TWAS/fusion_twas_master/LDREF/1000G.EUR." --chr $i --out LUAD_lung_$i.twas; done
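
The two loops above differ only in the tissue name, so they can be generated from a list of tissues. The sketch below prints the commands instead of running them, which makes it easy to review them first. Everything in it is illustrative: `print_fusion_cmds` is a hypothetical helper, `FUSION_DIR` and `WEIGHTS_DIR` are placeholder variables to point at your own installation, and the `<prefix>_<tissue>_<chr>.twas` output naming is an assumption that differs slightly from the names used above.

```shell
#!/usr/bin/env bash
# Placeholder locations; override them before calling the function.
: "${FUSION_DIR:=./fusion_twas-master}"
: "${WEIGHTS_DIR:=./WEIGHTS}"

# Hypothetical helper: print one FUSION command per tissue and chromosome.
# Usage: print_fusion_cmds SUMSTATS OUT_PREFIX TISSUE [TISSUE...]
print_fusion_cmds() {
  local sumstats=$1 prefix=$2 tissue chr
  shift 2
  for tissue in "$@"; do
    for chr in $(seq 1 22); do
      printf 'Rscript %s/FUSION.assoc_test.R --sumstats %s --weights %s/GTExv8.EUR.%s.pos --weights_dir %s/ --ref_ld_chr %s/LDREF/1000G.EUR. --chr %s --out %s_%s_%s.twas\n' \
        "$FUSION_DIR" "$sumstats" "$WEIGHTS_DIR" "$tissue" "$WEIGHTS_DIR" "$FUSION_DIR" "$chr" "$prefix" "$tissue" "$chr"
    done
  done
}

# Review the first few commands for two tissues.
print_fusion_cmds LUAD.sumstats.gz LUAD Whole_Blood Lung | head -n 3
```

After checking the printed commands, they can be saved to a file and run with `bash`, or submitted to a scheduler one line at a time.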

TWAS-related databases

Pre-computed expression models available for TWAS

GTEx v8 multi-tissue expression

Each archive contains two sets of pos files, one for genes with significant heritability and one for all genes (labeled no_filter). Using genes that achieved significant heritability is recommended for typical analyses. Using weights from “All Samples” will also typically increase sensitivity, unless analyzing highly European-specific regions. A detailed comparison of models by population and GTEx version is provided here. Positions in the pos files are taken from GTEx annotations.
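
To see how much the two pos files in an archive differ, one can simply count the genes listed in each. The sketch below assumes only that a pos file is a text table with one header line followed by one gene per row; `pos_summary` is a hypothetical helper name, and the file names in the usage comment are placeholders, not the actual names inside the archives.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each pos file name with its number of gene rows
# (non-empty lines after the header).
pos_summary() {
  local f
  for f in "$@"; do
    printf '%s\t%s\n' "$f" "$(awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }' "$f")"
  done
}

# Usage (placeholder file names):
#   pos_summary significant_genes.pos all_genes.pos
```

A large gap between the two counts shows how many genes did not reach significant heritability in that tissue.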

Weights were kindly estimated and provided by Junghyun Jung in the Mancuso lab.

| Tissue | All Samples | link | EUR Samples | link |
|---|---|---|---|---|
| Adipose - Subcutaneous | 581 | download | 479 | download |
| Adipose - Visceral (Omentum) | 469 | download | 393 | download |
| Adrenal Gland | 233 | download | 194 | download |
| Artery - Aorta | 387 | download | 329 | download |
| Artery - Coronary | 213 | download | 175 | download |
| Artery - Tibial | 584 | download | 476 | download |
| Brain - Amygdala | 129 | download | 119 | download |
| Brain - Anterior cingulate cortex (BA24) | 147 | download | 135 | download |
| Brain - Caudate (basal ganglia) | 194 | download | 172 | download |
| Brain - Cerebellar Hemisphere | 175 | download | 157 | download |
| Brain - Cerebellum | 209 | download | 188 | download |
| Brain - Cortex | 205 | download | 183 | download |
| Brain - Frontal Cortex (BA9) | 175 | download | 157 | download |
| Brain - Hippocampus | 165 | download | 150 | download |
| Brain - Hypothalamus | 170 | download | 156 | download |
| Brain - Nucleus accumbens (basal ganglia) | 202 | download | 181 | download |
| Brain - Putamen (basal ganglia) | 170 | download | 153 | download |
| Brain - Spinal cord (cervical c-1) | 126 | download | 115 | download |
| Brain - Substantia nigra | 114 | download | 100 | download |
| Breast - Mammary Tissue | 396 | download | 329 | download |
| Skin - Transformed fibroblasts | 483 | download | 403 | download |
| Blood - EBV-transformed lymphocytes | 147 | download | 113 | download |
| Colon - Sigmoid | 318 | download | 266 | download |
| Colon - Transverse | 368 | download | 294 | download |
| Esophagus - Gastroesophageal Junction | 330 | download | 275 | download |
| Esophagus - Mucosa | 497 | download | 411 | download |
| Esophagus - Muscularis | 465 | download | 385 | download |
| Heart - Atrial Appendage | 372 | download | 316 | download |
| Heart - Left Ventricle | 386 | download | 327 | download |
| Kidney - Cortex | 73 | download | 65 | download |
| Liver | 208 | download | 178 | download |
| Lung | 515 | download | 436 | download |
| Minor Salivary Gland | 144 | download | 114 | download |
| Muscle - Skeletal | 706 | download | 588 | download |
| Nerve - Tibial | 532 | download | 438 | download |
| Ovary | 167 | download | 138 | download |
| Pancreas | 305 | download | 243 | download |
| Pituitary | 237 | download | 219 | download |
| Prostate | 221 | download | 181 | download |
| Skin - Not Sun Exposed (Suprapubic) | 517 | download | 430 | download |
| Skin - Sun Exposed (Lower leg) | 605 | download | 508 | download |
| Small Intestine - Terminal Ileum | 174 | download | 141 | download |
| Spleen | 227 | download | 179 | download |
| Stomach | 324 | download | 260 | download |
| Testis | 322 | download | 272 | download |
| Thyroid | 574 | download | 482 | download |
| Uterus | 129 | download | 107 | download |
| Vagina | 141 | download | 120 | download |
| Whole Blood | 670 | download | 558 | download |

For reproducibility, legacy models from GTEx v7 are available for significant genes and all genes. Legacy models from GTEx v6 are also available for significant genes. See here for a comparison of GTEx v6 and v7 model performance.