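1. Registration: LDlink | An Interactive Web Tool for Exploring Linkage Disequilibrium in Population Groups
After registering, a 12-character token is sent to your email address. The token must be supplied whenever you call the functions.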
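Since the token has to accompany every call, one common way to keep it out of scripts is to read it from an environment variable. The following is only a sketch: the variable name `LDLINK_TOKEN` and the helper `get_ldlink_token()` are assumptions made for this example, not part of LDlink.

```r
# Hypothetical helper: read the LDlink token from an environment variable.
# The variable name LDLINK_TOKEN is an assumption made for this example.
get_ldlink_token <- function(var = "LDLINK_TOKEN") {
  token <- Sys.getenv(var, unset = "")
  if (!nzchar(token)) {
    stop("Environment variable ", var, " is not set; register to obtain a token first.")
  }
  token
}

# Set the variable for the current session (placeholder value), then read it back.
Sys.setenv(LDLINK_TOKEN = "XXXXXXXXXXXX")
token <- get_ldlink_token()
```

The resulting `token` can then be passed to the `token` argument of the functions listed below.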
2. Functions in the package
LDexpress | Determine if genomic variants are associated with gene expression. |
LDhap | Calculates population-specific haplotype frequencies of all haplotypes observed for a list of query variants. |
LDmatrix | Generates a data frame of pairwise linkage disequilibrium statistics. |
LDpair | Investigates potentially correlated alleles for a pair of variants. |
LDpop | Investigates allele frequencies and linkage disequilibrium patterns across 1000 Genomes Project populations. |
LDproxy | Explore proxy and putative functional variants for a single query variant. |
LDproxy_batch | Query LDproxy using a list of query variants, one per line. |
LDtrait | Determine if genomic variants are associated with a trait or disease. |
list_chips | Provides a data frame listing the names and abbreviation codes for available commercial SNP Chip Arrays from Illumina and Affymetrix. |
list_gtex_tissues | Provides a data frame listing the GTEx full names, 'LDexpress' full names (without spaces) and acceptable abbreviation codes of the 54 non-diseased tissue sites collected for the GTEx Portal and used as input for the 'LDexpress' function. |
list_pop | Provides a data frame listing the available reference populations from the 1000 Genomes Project. |
SNPchip | Find commercial genotyping chip arrays for variants of interest. |
SNPclip | Prune a list of variants by linkage disequilibrium. |
3. Removing confounders mainly relies on LDtrait
LDtrait(
  snps,                     # between 1 and 50 variants, using an rsID or chromosome coordinate (e.g. "chr7:24966446"); all input variants must match a bi-allelic variant
  pop = "CEU",              # population; use list_pop() to see the supported population abbreviations
  r2d = "r2",               # use "r2" to filter output by a threshold on estimated LD R2 (R squared), or "d" for LD D' (D-prime); default = "r2"
  r2d_threshold = 0.1,      # filter threshold; results below it are excluded
  win_size = 5e+05,
  token = NULL,             # the token received by email after registration
  file = FALSE,
  genome_build = "grch37",  # one of three options: 'grch37' for genome build GRCh37 (hg19), 'grch38' for GRCh38 (hg38), 'grch38_high_coverage' for GRCh38 High Coverage (hg38) 1000 Genomes Project data sets; default is GRCh37 (hg19)
  api_root = "https://ldlink.nih.gov/LDlinkRest"
)
4. Results:
5. Interpretation
The rs123 locus is associated with Highest math class taken (MTAG) and Educational attainment (MTAG).
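To read such a result at a glance, the traits can be grouped by query variant. The sketch below uses a small mock data frame that only mimics the `Query` and `GWAS_Trait` columns used later in these notes; the two rows are built from the rs123 example above and are not real output.

```r
# Mock rows mimicking the Query and GWAS_Trait columns of an LDtrait result.
# Illustrative only: the rows restate the rs123 example from the text.
res <- data.frame(
  Query = c("rs123", "rs123"),
  GWAS_Trait = c("Highest math class taken (MTAG)",
                 "Educational attainment (MTAG)"),
  stringsAsFactors = FALSE
)

# List the distinct traits reported for each query variant.
traits_by_query <- lapply(split(res$GWAS_Trait, res$Query), unique)
print(traits_by_query)
```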
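6. Removal principle
First look up the traits associated with each variant, then remove every variant whose associated traits match a confounding trait.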
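The principle can be tried offline on a mock table before calling the API. This is a minimal sketch: the rsIDs rs456 and rs789, the trait "Height" and the confounder pattern are made-up values, and only the `Query` and `GWAS_Trait` columns are modelled.

```r
# Mock LDtrait-style result (illustrative values, not real output).
res <- data.frame(
  Query = c("rs123", "rs123", "rs456"),
  GWAS_Trait = c("Highest math class taken (MTAG)",
                 "Educational attainment (MTAG)",
                 "Height"),
  stringsAsFactors = FALSE
)
snps <- c("rs123", "rs456", "rs789")
confounders <- c("educational attainment")  # hypothetical confounder pattern

# Flag rows whose trait matches any confounder pattern (case-insensitive).
pattern <- paste(confounders, collapse = "|")
hit <- grepl(pattern, res$GWAS_Trait, ignore.case = TRUE)

# Drop every query variant that has at least one flagged row.
kept <- snps[!(snps %in% unique(res$Query[hit]))]
print(kept)  # rs456 and rs789 remain
```

The patterns are treated as regular expressions, so a trait name containing characters such as parentheses, for example "(MTAG)", would need to be escaped before being used as a pattern.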
7. R implementation
tcL <- function(snps, trait) {
  # Retrieve the traits associated with each query variant
  res <- LDtrait(snps = snps, pop = "ALL", r2d = "r2", r2d_threshold = 0.1,
                 win_size = 5e+05, token = "XXXXXXXX", file = FALSE,
                 genome_build = "grch37",
                 api_root = "https://ldlink.nih.gov/LDlinkRest")
  # Remove the SNPs associated with any of the traits in `trait`
  gwas_traits <- res[, "GWAS_Trait"]
  # lapply (rather than sapply) always returns a list, whatever the number of matches per pattern
  matches <- lapply(trait, function(x) grep(x, gwas_traits, ignore.case = TRUE))
  # Row indices matching at least one trait; empty if nothing matched
  ma <- unique(unlist(matches))
  sn1 <- unique(res[ma, "Query"])
  snp1 <- snps[!(snps %in% sn1)]
  return(snp1)
}
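The parameter notes above state that LDtrait accepts between 1 and 50 variants per call, so a longer list has to be split before it is passed to tcL. The sketch below shows one way to do that; `chunk_snps()` is a hypothetical helper and the rsIDs are made up.

```r
# Hypothetical helper: split a vector of variants into batches of at most `size`.
chunk_snps <- function(snps, size = 50) {
  if (length(snps) == 0) return(list())
  unname(split(snps, ceiling(seq_along(snps) / size)))
}

# Example with 120 made-up rsIDs: batches of 50, 50 and 20.
snps <- paste0("rs", seq_len(120))
batches <- chunk_snps(snps)
print(lengths(batches))

# Each batch could then be passed to tcL() and the kept variants combined, e.g.
# kept <- unlist(lapply(batches, tcL, trait = c("educational attainment")))
```

The final call is left commented out because it needs network access and a valid token.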