What is CNV (copy number variation) analysis? Here is the description from the TCGA website:
“The copy number variation (CNV) pipeline uses Affymetrix SNP 6.0 array data to identify genomic regions that are repeated and infer the copy number of these repeats. This pipeline is built onto the existing TCGA level 2 data generated by Birdsuite and uses the DNAcopy R-package to perform a circular binary segmentation (CBS) analysis. CBS translates noisy intensity measurements into chromosomal regions of equal copy number. The final output files are segmented into genomic regions with the estimated copy number for each region. The GDC further transforms these copy number values into segment mean values, which are equal to log2(copy-number/ 2). Diploid regions will have a segment mean of zero, amplified regions will have positive values, and deletions will have negative values.”
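The relationship quoted above, segment mean = log2(copy-number / 2), can be checked with a few lines of Python. This is only a minimal sketch: the function names and the `tolerance` parameter are hypothetical and are not part of the GDC pipeline. With the default tolerance of zero, the classification simply follows the sign rule in the quote.

```python
import math


def copy_number_to_segment_mean(copy_number: float) -> float:
    """Apply segment_mean = log2(copy_number / 2) from the quoted description."""
    if copy_number <= 0:
        raise ValueError("copy_number must be positive for log2 to be defined")
    return math.log2(copy_number / 2.0)


def segment_mean_to_copy_number(segment_mean: float) -> float:
    """Invert the formula: copy_number = 2 * 2 ** segment_mean."""
    return 2.0 * 2.0 ** segment_mean


def classify_segment(segment_mean: float, tolerance: float = 0.0) -> str:
    """Label a segment by the sign of its segment mean.

    `tolerance` is a hypothetical knob (not from the source) for treating
    values close to zero as diploid; the default of 0.0 uses the pure sign rule.
    """
    if segment_mean > tolerance:
        return "amplified"
    if segment_mean < -tolerance:
        return "deleted"
    return "diploid"


if __name__ == "__main__":
    for cn in (1, 2, 4):
        sm = copy_number_to_segment_mean(cn)
        print(cn, sm, classify_segment(sm))
```

For example, a diploid region (copy number 2) gives a segment mean of 0, a single-copy loss (copy number 1) gives -1, and four copies give +1.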
Contents
1. Downloading and processing the segment file
1.1 Downloading the data from TCGA
File types available for download:
Copy Number Segment: A table that associates contiguous chromosomal segments with genomic coordinates, mean array intensity, and the number of probes that bind to each segment.
Masked Copy Number Segment: A table with the same information as the Copy Number Segment, except that segments with probes known to contain germline mutations are removed.
Here I use Masked Copy Number Segment as the example.
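# Start from a clean workspace and set global options
rm(list = ls())
options(stringsAsFactors = FALSE)
options(scipen = 200)
library(SummarizedExperiment)
library(TCGAbiolinks)
# Query the masked copy number segments for the TCGA-BLCA project
query <- GDCquery(project = "TCGA-BLCA",
                  data.category = "Copy Number Variation",
                  data.type = "Masked Copy Number Segment")
# Download the files through the GDC API
GDCdownload(query, method = "api")
# Combine the downloaded files into one object and save it as an rda file
BLCA_CNV_download <- GDCprepare(query = query, save = TRUE, save.filename = "BLCA_CNV_download.rda")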
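1.2 Data processing
# Load the rda file; load() returns the name of the object it restored
A <- load("C:/Users/Meredith/Desktop/BLCA_CNV_download.rda")
tumorCNV <- get(A)
# Keep columns 2 to 7 and put them in segment-file order
tumorCNV <- tumorCNV[, 2:7]
tumorCNV <- tumorCNV[, c('Sample', 'Chromosome', 'Start', 'End', 'Num_Probes', 'Segment_Mean')]
# Write a tab-separated table without quotes or row names
write.table(tumorCNV, file = 'BLCA_CNV.txt', sep = '\t', quote = FALSE, row.names = FALSE)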

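# Extract the samples ending in 01A (I used Python here; you can do this in R instead)
filename = 'BLCA_CNV.txt'
finalResultName = 'segment_file.txt'
read_file = open(filename)
# Open in "w" mode so the output file is created if it does not exist ("r+" fails on a missing file)
out_file = open(finalResultName, "w")
for line in read_file.readlines():
    data = line.split()
    x = data[0][13: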
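The 01A filtering step could be completed along the following lines. This is a minimal sketch, not the author's original code. It assumes that the sample-type and vial code sits at characters 14 to 16 of the TCGA barcode (Python slice `[13:16]`), that the first line of `BLCA_CNV.txt` is the header written by `write.table`, and that the header should be kept in the output. The function names `keep_01a_rows` and `filter_segment_file` are hypothetical.

```python
from typing import Iterable, Iterator


def keep_01a_rows(lines: Iterable[str], sample_code: str = "01A") -> Iterator[str]:
    """Yield the header line plus every row whose barcode carries `sample_code`.

    Assumption: the first tab-separated field is a TCGA barcode such as
    TCGA-XX-XXXX-01A-..., so characters [13:16] hold the sample type and vial.
    """
    for index, line in enumerate(lines):
        if index == 0:
            # Assumed header row (Sample, Chromosome, Start, ...); keep it as-is.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if fields and fields[0][13:16] == sample_code:
            yield line


def filter_segment_file(in_path: str, out_path: str) -> None:
    """Read the full segment table and write only the 01A rows to `out_path`."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(keep_01a_rows(src))


if __name__ == "__main__":
    # File names taken from the snippet above.
    filter_segment_file("BLCA_CNV.txt", "segment_file.txt")
```

Using `with` closes both files automatically, and iterating over the file object avoids loading the whole table into memory with `readlines()`.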




