ATAC-seq (Harvard FAS Informatics)

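This article describes the quality standards for ATAC-seq data: the required numbers of biological and technical replicates, non-redundant fragments, control of PCR amplification effects, and peak file requirements. It also covers experimental design, data preprocessing steps, the use of the main analysis tools such as Genrich and MACS2, and downstream visualization, comparison, and annotation.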

Notes on ATAC-seq data quality assessment

ENCODE ATAC-seq data standards

Uniform Processing Pipeline Restrictions

  • The read length prior to any trimming should be a minimum of 45 base pairs.
  • Sequencing may be paired- or single-ended, as long as the sequencing type is specified and paired sequences are indicated.
  • All Illumina platforms are supported for use in the uniform pipeline, though data from different platforms should be processed separately; colorspace (SOLiD) reads are not supported.
  • Barcodes, if present in the fastq, must be indicated.
  • Library insert size range must be indicated.

Current Standards

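  1. Experiments must have two or more biological replicates. Rare samples that cannot be replicated biologically still require at least two technical replicates.
  2. Each replicate should have 25 million non-duplicate, non-mitochondrial, aligned fragments. This means 25 million reads for single-end sequencing, or 50 million reads (25 million fragments) for paired-end sequencing.
  3. The alignment rate should be > 95%; > 80% is acceptable.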
  4. Replicate concordance is measured with IDR (Irreproducible Discovery Rate). The experiment passes if both the rescue and self-consistency ratios are < 2.
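  5. Library complexity, which PCR amplification can reduce, is measured with the Non-Redundant Fraction (NRF) and the PCR Bottlenecking Coefficients 1 and 2 (PBC1 and PBC2). Preferred values: NRF > 0.9, PBC1 > 0.9, PBC2 > 3.
  6. Peak files must meet the following requirements: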

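     • Each replicate peak file should contain > 150,000 peaks; > 100,000 is acceptable (ENCODE's own ATAC-seq peak files could not be used).
     • IDR peak files should contain > 70,000 peaks; > 50,000 is acceptable.
     • A nucleosome-free region (NFR) must be present.
     • A mononucleosome peak must be present. Good ATAC-seq data include fragments that span nucleosomes, so they reveal nucleosome positions as well as open chromatin.

  7. The fraction of reads in called peak regions (FRiP score) should be > 0.3; > 0.2 is acceptable. FRiP is not enforced for rare samples, but TSS enrichment remains a key signal-to-noise measure.
  8. TSS enrichment score thresholds depend on the reference genome and annotation files used.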
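Several of the thresholds above are simple ratios. The sketch below shows how NRF, PBC1, PBC2, FRiP, the IDR rescue and self-consistency ratios, and fragment-size fractions could be computed. It assumes paired-end fragments are already available as `(chrom, start, end)` tuples (0-based, half-open) and peaks as intervals in the same form.

The function names and input format are assumptions for illustration. So are the 147 bp / 294 bp cutoffs, which use one and two nucleosome lengths as proxies for "nucleosome-free" and "mononucleosome". Real pipelines compute these metrics directly from BAM and peak files.

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical input format: one fragment per read pair,
# (chrom, start, end) with 0-based, half-open coordinates.
Fragment = tuple[str, int, int]


def library_complexity(fragments: list[Fragment]) -> dict[str, float]:
    """NRF, PBC1 and PBC2, treating each fragment's (chrom, start, end)
    as its genomic location."""
    reads_per_location = Counter(fragments)
    total = len(fragments)
    distinct = len(reads_per_location)
    # Number of locations carrying exactly 1 read, exactly 2 reads, ...
    locations_by_depth = Counter(reads_per_location.values())
    m1 = locations_by_depth.get(1, 0)
    m2 = locations_by_depth.get(2, 0)
    return {
        "NRF": distinct / total,                  # distinct locations / total reads
        "PBC1": m1 / distinct,                    # M1 / M_distinct
        "PBC2": m1 / m2 if m2 else float("inf"),  # M1 / M2
    }


def merge_intervals(peaks: list[Fragment]) -> dict[str, list[list[int]]]:
    """Sort peaks and merge overlapping or touching ones, per chromosome."""
    merged: dict[str, list[list[int]]] = {}
    for chrom, start, end in sorted(peaks):
        ivs = merged.setdefault(chrom, [])
        if ivs and start <= ivs[-1][1]:
            ivs[-1][1] = max(ivs[-1][1], end)
        else:
            ivs.append([start, end])
    return merged


def frip(fragments: list[Fragment], peaks: list[Fragment]) -> float:
    """Fraction of fragments that overlap at least one peak."""
    merged = merge_intervals(peaks)
    starts = {chrom: [s for s, _ in ivs] for chrom, ivs in merged.items()}
    in_peaks = 0
    for chrom, start, end in fragments:
        ivs = merged.get(chrom)
        if not ivs:
            continue
        # Last merged peak starting before this fragment ends. Merged peaks
        # do not overlap, so it is the only peak that can overlap the fragment.
        j = bisect_left(starts[chrom], end) - 1
        if j >= 0 and ivs[j][1] > start:
            in_peaks += 1
    return in_peaks / len(fragments) if fragments else 0.0


def idr_ratios(n_true: int, n_pooled: int,
               n_self1: int, n_self2: int) -> tuple[float, float]:
    """Rescue ratio (true-replicate vs pooled-pseudoreplicate IDR peak counts)
    and self-consistency ratio (the two self-pseudoreplicate peak counts)."""
    rescue = max(n_true, n_pooled) / min(n_true, n_pooled)
    self_consistency = max(n_self1, n_self2) / min(n_self1, n_self2)
    return rescue, self_consistency


def fragment_size_fractions(fragments: list[Fragment], nfr_max: int = 147,
                            mono_max: int = 294) -> dict[str, float]:
    """Share of fragments shorter than one nucleosome (NFR proxy) and
    between one and two nucleosome lengths (mononucleosome proxy)."""
    sizes = [end - start for _, start, end in fragments]
    n = len(sizes)
    return {
        "NFR": sum(s < nfr_max for s in sizes) / n,
        "mono": sum(nfr_max <= s < mono_max for s in sizes) / n,
    }
```

Consider seven fragments: three copies of one, two copies of another, and two singletons. For that input, `library_complexity` gives NRF = 4/7, PBC1 = 0.5 and PBC2 = 2.0. That library would fail the NRF and PBC thresholds listed above.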

Core ATAC-seq analysis workflow

References:
1. Article: https://peerj.com/articles/4040/
2. ChIP-seq course: https://github.com/hbctraining/In-depth-NGS-Data-Analysis-Course/tree/master/sessionV/lessons
3. Harvard FAS Informatics - ATAC-seq Guidelines: https://informatics.fas.harvard.edu/atac-seq-guidelines.html


Harvard FAS Informatics - ATAC-seq Guidelines

Experimental design

Detailed protocol

  • Replicates
  • Controls: a control sample is usually not included. It adds cost and provides limited value. An ATAC-seq control would be a sample sequenced without transposase treatment.
  • PCR amplification: amplify samples with as few PCR cycles as possible to limit amplification artifacts.
  • Sequencing depth: the optimal depth depends on the size of the reference genome and the expected degree of chromatin openness. For human samples, more than 50 million reads per sample is recommended.
  • Sequencing mode: paired-end sequencing is recommended for ATAC-seq, for three reasons.
    (1) It reduces alignment ambiguity, because the two reads of a pair jointly constrain where the fragment maps.
    (2) We want to know both ends of each DNA fragment generated by the assay, since the ends indicate where the transposase inserted.
    (3) PCR duplicates are identified more accurately. PCR duplicates are artifacts of the procedure and should be removed as part of the analysis pipeline. Duplicate-removal programs typically identify duplicates by comparing the ends of aligned reads. With single-end reads there is only one position to compare, so any reads whose 5' ends match are considered duplicates. This can produce many false positives, removing perfectly good reads from further analysis. With paired-end sequencing, both ends of the original DNA fragment are defined. A fragment is declared a duplicate only if both of its ends match both ends of another fragment, which is far less likely to occur by chance. Paired-end sequencing therefore leads to fewer false positives.
  • Mitochondria: ATAC-seq data typically contain a large proportion of reads from mitochondrial DNA. Mitochondrial DNA is naked (not packaged into nucleosomes), so Tn5 can also recognize and cut it; chloroplast DNA in plants behaves the same way. The mitochondrial genome contains no peaks of interest for ATAC-seq, so these reads are discarded during analysis, wasting sequencing capacity. Mitochondria can be removed from samples with detergents before sequencing.
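The last two bullets describe read-level filters. The sketch below first contrasts single-end and paired-end duplicate marking on the same fragments, then drops mitochondrial fragments. It assumes fragments are `(chrom, start, end, strand)` tuples, where `strand` is the orientation of read 1 and coordinates are 0-based, half-open.

The key functions, the keep-the-first-copy rule, and the mitochondrial chromosome names `chrM`/`MT` are illustrative assumptions. This is not the exact logic of tools such as Picard MarkDuplicates.

```python
# Hypothetical fragment format: (chrom, start, end, strand of read 1),
# 0-based, half-open coordinates.
Fragment = tuple[str, int, int, str]


def single_end_key(frag: Fragment) -> tuple[str, str, int]:
    """Only read 1 is known: its 5' end is `start` on '+', `end - 1` on '-'."""
    chrom, start, end, strand = frag
    five_prime = start if strand == "+" else end - 1
    return (chrom, strand, five_prime)


def paired_end_key(frag: Fragment) -> tuple[str, int, int]:
    """Both ends of the original fragment are known."""
    chrom, start, end, _ = frag
    return (chrom, start, end)


def mark_duplicates(fragments: list[Fragment], key) -> list[bool]:
    """Flag each fragment whose key was already seen; the first copy is kept."""
    seen = set()
    flags = []
    for frag in fragments:
        k = key(frag)
        flags.append(k in seen)
        seen.add(k)
    return flags


def drop_mito(fragments: list[Fragment],
              mito_names: tuple[str, ...] = ("chrM", "MT")) -> tuple[list[Fragment], float]:
    """Remove mitochondrial fragments and report the fraction removed."""
    kept = [f for f in fragments if f[0] not in mito_names]
    removed = len(fragments) - len(kept)
    return kept, (removed / len(fragments) if fragments else 0.0)
```

Suppose two fragments share a 5' start but end at different positions. The single-end key flags both extra copies as duplicates, while the paired-end key flags only the one that matches at both ends.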

Quality control

FastQC

FastQC processes a file of 20 million reads in about 5 minutes using less than 250 MB of memory. It reports quality scores, GC content, PCR duplicate levels, and adapter content.
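To make those metrics concrete, here is a minimal sketch of a per-file summary in the spirit of a FastQC report, computed from plain FASTQ text. It is not FastQC's algorithm, and the following choices are assumptions for illustration:

- Phred+33 quality encoding.
- Exact repeats of a read sequence are counted as duplicates.
- `CTGTCTCTTATA`, the start of the Nextera/Tn5 adapter sequence, is used as the adapter probe.

```python
def parse_fastq(text: str):
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    for i in range(0, len(lines) - 3, 4):
        yield lines[i], lines[i + 1], lines[i + 3]


def summarize_fastq(text: str, adapter_probe: str = "CTGTCTCTTATA",
                    phred_offset: int = 33) -> dict[str, float]:
    """Mean base quality, GC fraction, exact-duplicate fraction and the
    fraction of reads containing the adapter probe."""
    n_reads = n_bases = gc = qual_sum = duplicates = with_adapter = 0
    seen = set()
    for _, seq, qual in parse_fastq(text):
        n_reads += 1
        n_bases += len(seq)
        gc += seq.count("G") + seq.count("C")
        qual_sum += sum(ord(ch) - phred_offset for ch in qual)
        if seq in seen:
            duplicates += 1
        else:
            seen.add(seq)
        if adapter_probe in seq:
            with_adapter += 1
    if n_bases == 0:
        raise ValueError("no reads found")
    return {
        "reads": n_reads,
        "mean_base_quality": qual_sum / n_bases,
        "gc_fraction": gc / n_bases,
        "duplicate_fraction": duplicates / n_reads,
        "adapter_fraction": with_adapter / n_reads,
    }
```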

Adapter removal

For reads derived from short DNA fragments, the 3' ends may contain portions of the Illumina sequencing adapter.
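Two ways of removing this adapter read-through are sketched below, both simplified to exact matching.

- `trim_3prime_adapter` follows the known-adapter approach that cutadapt uses for 3' adapters. It cuts at the first full adapter match, or at a partial adapter match at the very end of the read.
- `trim_by_overlap` shows how read-through can be found in paired-end data without knowing the adapter sequence. When the insert is shorter than the reads, read 1 and the reverse complement of read 2 overlap in a "dovetailed" way, and whatever extends past the insert is adapter.

Real tools tolerate sequencing errors and score alignments. The exact-match rule, the `min_overlap` defaults, and the Nextera adapter sequence used in the test are assumptions.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Known-adapter trimming: remove the adapter and everything after it."""
    min_overlap = max(min_overlap, 1)  # a zero-length match would erase the read
    hit = read.find(adapter)
    if hit != -1:
        return read[:hit]
    # Partial adapter at the 3' end: the longest adapter prefix that ends the read.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


def trim_by_overlap(read1: str, read2: str, min_overlap: int = 5) -> tuple[str, str]:
    """Adapter-free trimming for a read pair with a short insert.

    If the insert (length L) is shorter than the reads, the first L bases of
    read 1 equal the last L bases of revcomp(read2); anything beyond L in
    either read is adapter read-through.
    """
    rc2 = revcomp(read2)
    for length in range(min(len(read1), len(rc2)) - 1, min_overlap - 1, -1):
        if read1[:length] == rc2[-length:]:
            return read1[:length], read2[:length]
    return read1, read2
```

The overlap approach only applies to paired-end data. With exact matching, a repetitive insert can also produce a spurious overlap, which is one reason real trimmers score alignments instead.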

Cutadapt
NGmerge

Unlike cutadapt, NGmerge does not require that the adapter sequences be provided, nor does it require a parameter for the minimum length of adapter to match (in
