CNV (Copy Number Variation) Analysis (GISTIC Online Analysis, maftools)

What is CNV (copy number variation) analysis? Here is the description from the official TCGA website:

“The copy number variation (CNV) pipeline uses Affymetrix SNP 6.0 array data to identify genomic regions that are repeated and infer the copy number of these repeats. This pipeline is built onto the existing TCGA level 2 data generated by Birdsuite and uses the DNAcopy R-package to perform a circular binary segmentation (CBS) analysis. CBS translates noisy intensity measurements into chromosomal regions of equal copy number. The final output files are segmented into genomic regions with the estimated copy number for each region. The GDC further transforms these copy number values into segment mean values, which are equal to log2(copy-number/ 2). Diploid regions will have a segment mean of zero, amplified regions will have positive values, and deletions will have negative values.”
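The relation quoted above, segment mean = log2(copy-number / 2), can be inverted to recover an estimated copy number from a segment mean. The following is a minimal sketch of that arithmetic only; the function names are hypothetical and are not part of any GDC or TCGA tool.

```python
import math


def copy_number_to_segment_mean(copy_number: float) -> float:
    """Segment mean as defined in the quote: log2(copy_number / 2)."""
    return math.log2(copy_number / 2.0)


def segment_mean_to_copy_number(segment_mean: float) -> float:
    """Inverse of the formula above: copy_number = 2 * 2 ** segment_mean."""
    return 2.0 * 2.0 ** segment_mean


if __name__ == "__main__":
    # Diploid (2 copies) maps to 0, gains are positive, losses are negative.
    for copy_number in (1, 2, 3, 4):
        print(copy_number, round(copy_number_to_segment_mean(copy_number), 3))
```

A diploid region therefore gives a segment mean of 0, a single-copy loss gives -1, and a four-copy region gives +1, which matches the sign convention in the quoted text.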

1. Downloading and processing the segment file

1.1 Downloading the data from TCGA

File types available for download:
Copy Number Segment: A table that associates contiguous chromosomal segments with genomic coordinates, mean array intensity, and the number of probes that bind to each segment.

Masked Copy Number Segment: A table with the same information as the Copy Number Segment except that segments with probes known to contain germline mutations are removed.

Here I use Masked Copy Number Segment as the example.

rm(list = ls())
options(stringsAsFactors = F)
options(scipen = 200)

library(SummarizedExperiment)
library(TCGAbiolinks)

query <- GDCquery(project = "TCGA-BLCA",
                  data.category = "Copy Number Variation",
                  data.type = "Masked Copy Number Segment")
GDCdownload(query,method = "api")
BLCA_CNV_download <- GDCprepare(query = query, save = TRUE, save.filename = "BLCA_CNV_download.rda")

1.2 Data processing

# Load the .rda file; load() returns the name of the restored object
A <- load("C:/Users/Meredith/Desktop/BLCA_CNV_download.rda")
tumorCNV <- get(A)

# Keep the six segment columns and put Sample first
tumorCNV <- tumorCNV[, 2:7]
tumorCNV <- tumorCNV[, c('Sample','Chromosome','Start','End','Num_Probes','Segment_Mean')]
write.table(tumorCNV, file = 'BLCA_CNV.txt', sep = '\t', quote = FALSE, row.names = FALSE)

BLCA_CNV.txt

# Extract the samples ending in 01A (I used Python here; you can do the same in R)
filename = 'BLCA_CNV.txt'
finalResultName = 'segment_file.txt'

read_file = open(filename)
out_file = open(finalResultName, "w")  # "w" creates the file; "r+" fails if it does not exist yet
for line in read_file.readlines():
    data = line.split()
    x = data[0][13:
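The filtering step described in the comment above (keep only samples ending in 01A) can be sketched as follows. This is not the author's code, only an illustration under stated assumptions: the first column holds a TCGA barcode in the standard layout, where characters 14 to 16 (`barcode[13:16]`) give the sample type and vial, and the header row is kept in the output. The function name `filter_by_sample_code` and the demo barcodes are made up for the example.

```python
import csv
import io


def filter_by_sample_code(in_handle, out_handle, sample_code="01A"):
    """Copy the header and every row whose barcode has `sample_code`
    at characters 14-16 (assumed standard TCGA barcode layout)."""
    reader = csv.reader(in_handle, delimiter="\t")
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    header = next(reader)
    writer.writerow(header)  # assumption: downstream tools accept a header row
    kept = 0
    for row in reader:
        if row and row[0][13:16] == sample_code:
            writer.writerow(row)
            kept += 1
    return kept


if __name__ == "__main__":
    # Made-up rows in the column order written to BLCA_CNV.txt.
    demo = io.StringIO(
        "Sample\tChromosome\tStart\tEnd\tNum_Probes\tSegment_Mean\n"
        "TCGA-AA-0001-01A-11D-0000-01\t1\t100\t200\t10\t0.12\n"
        "TCGA-AA-0001-10A-01D-0000-01\t1\t100\t200\t10\t0.01\n"
    )
    result = io.StringIO()
    print(filter_by_sample_code(demo, result), "row(s) kept")
    print(result.getvalue(), end="")
```

To run it on real files, pass handles opened with `open('BLCA_CNV.txt', newline='')` and `open('segment_file.txt', 'w', newline='')`. Slicing `[13:16]` rather than comparing the end of the string means the check works whether the column holds a 16-character sample barcode or a full aliquot barcode.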