Mendelian randomization: removing confounding SNPs with LDlink (1)

1. Register on the LDlink website (LDlink | An Interactive Web Tool for Exploring Linkage Disequilibrium in Population Groups)

A 12-character token is then sent to your email address. The token must be supplied every time the API is called.

2. Functions in the LDlinkR package

LDexpress	Determine if genomic variants are associated with gene expression.
LDhap	Calculates population-specific haplotype frequencies of all haplotypes observed for a list of query variants.
LDmatrix	Generates a data frame of pairwise linkage disequilibrium statistics.
LDpair	Investigates potentially correlated alleles for a pair of variants.
LDpop	Investigates allele frequencies and linkage disequilibrium patterns across 1000 Genomes Project populations.
LDproxy	Explore proxy and putative functional variants for a single query variant.
LDproxy_batch	Query LDproxy using a list of query variants, one per line.
LDtrait	Determine if genomic variants are associated with a trait or disease.
list_chips	Provides a data frame listing the names and abbreviation codes for available commercial SNP Chip Arrays from Illumina and Affymetrix.
list_gtex_tissues	Provides a data frame listing the GTEx full names, 'LDexpress' full names (without spaces) and acceptable abbreviation codes of the 54 non-diseased tissue sites collected for the GTEx Portal and used as input for the 'LDexpress' function.
list_pop	Provides a data frame listing the available reference populations from the 1000 Genomes Project.
SNPchip	Find commercial genotyping chip arrays for variants of interest.
SNPclip	Prune a list of variants by linkage disequilibrium.

3. Confounder removal relies mainly on LDtrait, whose arguments are:

LDtrait(
  snps,                     # 1-50 variants, as rsIDs or chromosome coordinates (e.g. "chr7:24966446"); every input variant must match a bi-allelic variant
  pop = "CEU",              # reference population; run list_pop() to see the supported population codes
  r2d = "r2",               # "r2" filters the output by a threshold on estimated LD R2 (R squared), "d" by LD D' (D-prime); default "r2"
  r2d_threshold = 0.1,      # filtering threshold; values below it are filtered out
  win_size = 5e+05,         # genomic window size (bp) for the LD calculation; default 500,000
  token = NULL,             # the token received by email after registration
  file = FALSE,             # FALSE: do not save the results to a file
  genome_build = "grch37",  # one of 'grch37' (GRCh37/hg19), 'grch38' (GRCh38/hg38) or 'grch38_high_coverage' (GRCh38 High Coverage 1000 Genomes Project data); default GRCh37 (hg19)
  api_root = "https://ldlink.nih.gov/LDlinkRest"
)

4. Example results:

5. Interpretation

In this example output, rs123 (or a variant in LD with it) is associated with the GWAS traits Highest math class taken (MTAG) and Educational attainment (MTAG).
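Reading the result amounts to grouping the GWAS_Trait column by the Query column. The sketch below assumes a data frame with those two column names (the names used by the tcL() function in this post); `res` is a hand-made stand-in, not real LDtrait output, and the rs456 row is invented for illustration:

```r
# Hand-made stand-in for an LDtrait result (hypothetical; only the two columns
# used in this post). The rs123 rows mirror the example above; rs456 is invented.
res <- data.frame(
  Query = c("rs123", "rs123", "rs456"),
  GWAS_Trait = c("Highest math class taken (MTAG)",
                 "Educational attainment (MTAG)",
                 "Body mass index"),
  stringsAsFactors = FALSE
)

# One list element per query SNP, holding the distinct traits reported for it
traits_by_snp <- lapply(split(res$GWAS_Trait, res$Query), unique)
print(traits_by_snp)
```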

6. Removal principle

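First retrieve the traits associated with each SNP, then remove every SNP that is associated with a trait regarded as a confounder.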
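The two steps can be tried offline on a hand-made table before calling the API. A minimal sketch, assuming the LDtrait result has `Query` and `GWAS_Trait` columns as in the tcL() function below; all rows, the SNP IDs other than rs123, and the confounder keywords are hypothetical:

```r
# Hypothetical stand-in for an LDtrait result (Query and GWAS_Trait columns only)
res <- data.frame(
  Query = c("rs123", "rs123", "rs456", "rs789"),
  GWAS_Trait = c("Highest math class taken (MTAG)", "Educational attainment (MTAG)",
                 "Body mass index", "Height"),
  stringsAsFactors = FALSE
)
snps <- c("rs123", "rs456", "rs789", "rs999")        # rs999 has no associations at all
confounders <- c("educational attainment", "body mass")  # hypothetical keywords

# Step 1: flag rows whose trait contains any confounder keyword (case-insensitive regex)
flagged <- Reduce(`|`,
                  lapply(confounders, grepl, x = res$GWAS_Trait, ignore.case = TRUE),
                  rep(FALSE, nrow(res)))

# Step 2: drop every query SNP that has at least one flagged trait
confounded <- unique(res$Query[flagged])
kept <- snps[!(snps %in% confounded)]
print(kept)
```

SNPs that LDtrait reports no associations for (rs999 here) are kept, because they never appear in the Query column.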

7. Implementation as an R function

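library(LDlinkR)

# Return the input SNPs minus those that LDtrait links to any confounding trait.
# snps:  rsIDs or "chrN:pos" coordinates (LDtrait accepts 1-50 variants per call)
# trait: keywords for the confounding traits, matched case-insensitively as regular expressions
tcL <- function(snps, trait) {
  res <- LDtrait(snps = snps, pop = "ALL", r2d = "r2", r2d_threshold = 0.1,
                 win_size = 5e+05, token = "XXXXXXXX",  # replace with your own token
                 file = FALSE, genome_build = "grch37",
                 api_root = "https://ldlink.nih.gov/LDlinkRest")
  # Without the expected columns there is nothing to filter (assumed: no associations returned)
  if (!all(c("Query", "GWAS_Trait") %in% colnames(res))) return(snps)
  # TRUE for rows whose GWAS_Trait matches at least one trait keyword
  hit <- Reduce(`|`, lapply(trait, grepl, x = res$GWAS_Trait, ignore.case = TRUE),
                rep(FALSE, nrow(res)))
  # Query SNPs with at least one confounding association
  confounded <- unique(res$Query[hit])
  # Remove them and return the remaining SNPs
  snps[!(snps %in% confounded)]
}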
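Because LDtrait accepts at most 50 variants per call, tcL() can only screen 50 SNPs at a time. A minimal sketch of a wrapper (the name `tcL_batched` is hypothetical) that splits a longer list into batches and calls the tcL() function above on each batch; it assumes tcL() is already defined with a valid token:

```r
# Hypothetical wrapper: screen any number of SNPs by calling tcL() on batches
# of at most `size` variants (LDtrait accepts 1-50 variants per request).
tcL_batched <- function(snps, trait, size = 50) {
  if (length(snps) == 0) return(character(0))
  # Consecutive batches: positions 1-50 -> batch 1, 51-100 -> batch 2, ...
  batches <- split(snps, ceiling(seq_along(snps) / size))
  # tcL() returns the SNPs of each batch that survived the confounder filter
  kept <- lapply(batches, tcL, trait = trait)
  unlist(kept, use.names = FALSE)
}
```

Batches are sent one after another rather than in parallel, which keeps the load on the LDlink server low.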
