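RRBS methylation pipeline: principles and Bismark usage

This article introduces the concept of epigenetics and its two main research directions: histone modification and DNA methylation. It focuses on the role of DNA methylation in gene silencing and genome stability, and explains the potential consequences of aberrant methylation. It also describes how DNA methylation levels are measured with bisulfite sequencing and how the Bismark software works.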

Introduction

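Epigenetics
The branch of genetics that studies heritable changes in gene expression that occur without any change in the nucleotide sequence.
Research directions:
1. Histone modification (chromatin accessibility and compaction)
2. DNA methylation

Types of methylation:

Maintenance methylation
DNA methylation serves to:
1. Silence gene expression
2. Keep repetitive sequences silenced, which preserves genome stability
Aberrant methylation can lead to:
1. Errors in early development
2. Epigenetic syndromes
3. Cancer
Allele-specific expression: caused by DNA methylation of CGIs (CpG islands)
(PS: CpG denotes a dinucleotide in which a G immediately follows a C on the same DNA strand. CpG dinucleotides are under-represented in the human genome. However, around the promoters or transcription start sites (TSS) of many genes, methylation is often suppressed. These regions contain a relatively high density of CpG dinucleotides and are called CpG islands; their length typically ranges from a few hundred to a few thousand nucleotides.)
DNA methylation patterns are also reset during epigenetic reprogramming (in the germline and in the early embryo).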
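
To make "a relatively high density of CpG dinucleotides" concrete, here is a minimal sketch that counts CpG dinucleotides in a sequence and reports them per 100 bases. The function name `cpg_per_100bp` and both toy sequences are made up for illustration; real CpG island annotation relies on additional criteria that are not covered here.

```bash
# Illustrative sketch: count CpG dinucleotides (a C immediately followed by a G)
# and report the count per 100 bases. The function name is hypothetical.
cpg_per_100bp() {
  printf '%s\n' "$1" | awk '{
    seq = toupper($0)
    n = length(seq)
    if (n == 0) { print "NA"; exit }
    cpg = gsub(/CG/, "CG", seq)   # gsub returns the number of matches it replaced
    printf "%d CpG, %.1f per 100 bp\n", cpg, 100 * cpg / n
  }'
}

cpg_per_100bp "CGCGACGTCG"   # CpG-rich toy sequence
cpg_per_100bp "ATTCATTGAA"   # CpG-poor toy sequence
```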

Measuring DNA methylation levels with bisulfite sequencing

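Bismark's operating principle, in brief:
Bisulfite treatment converts unmethylated C to U on both strands (read as T after PCR), while methylated C stays C, so reads no longer match the reference directly. Bismark therefore converts every remaining C in the reads to T in silico, and converts the genome sequence in the same way.
The genome needs two converted versions, one per strand:
C->T on the top strand, which matches reads from the original top strand
C->T on the bottom strand, equivalent to G->A on the top strand, which matches reads from the original bottom strand
A read should then align to a single genomic position (its true position in the genome).
The unconverted genomic sequence at that position is then retrieved and compared with the original read to infer the methylation state.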
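
The conversion step can be mimicked with `tr`. This is only a sketch of the idea, not Bismark's own code: the function names `convert_ct` and `convert_ga` and the 10-base toy sequences are assumptions made for this example.

```bash
# Illustrative only: in-silico bisulfite conversion with tr.
# Function names are hypothetical and not part of Bismark.

# C->T conversion: a fully converted top-strand sequence.
convert_ct() { printf '%s\n' "$1" | tr 'Cc' 'Tt'; }

# G->A conversion: the bottom strand's C->T change, written in top-strand coordinates.
convert_ga() { printf '%s\n' "$1" | tr 'Gg' 'Aa'; }

genome="ACGTTCGGAC"
convert_ct "$genome"   # ATGTTTGGAT
convert_ga "$genome"   # ACATTCAAAC

# A top-strand read in which only the second base (a methylated C) survived
# bisulfite treatment. After full conversion it matches the C->T genome.
read_seq="ACGTTTGGAT"
if [ "$(convert_ct "$read_seq")" = "$(convert_ct "$genome")" ]; then
  echo "converted read matches converted genome"
fi
```

Because both the read and the genome lose all their cytosines, the alignment itself is blind to methylation; the methylation state is recovered afterwards from the unconverted sequences.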
How does Bismark work?

Sequence reads are first transformed into fully bisulfite-converted forward (C->T) and reverse read (G->A conversion of the forward strand) versions, before they are aligned to similarly converted versions of the genome (also C->T and G->A converted). Sequence reads that produce a unique best alignment from the four alignment processes against the bisulfite genomes (which are running in parallel) are then compared to the normal genomic sequence and the methylation state of all cytosine positions in the read is inferred. For use with Bowtie 1, a read is considered to align uniquely if one alignment exists that has fewer mismatches to the genome than any other alignment (or if there is no other alignment). For Bowtie 2, a read is considered to align uniquely if an alignment has a unique best alignment score (as reported by the Bowtie 2 AS:i field). If a read produces several alignments with the same number of mismatches or with the same alignment score (AS:i field), the read (or read pair) is discarded altogether.
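
The final comparison against the normal genomic sequence can be sketched as follows. This is a simplified illustration, not Bismark's output format: it assumes an ungapped alignment of a read from the original top strand, and the function name `call_methylation` and the `M`/`u`/`.` symbols are invented for this example.

```bash
# Illustrative sketch: infer methylation by comparing an aligned read with the
# unconverted genome, position by position. Assumes a top-strand read, an
# ungapped alignment, and equal lengths. Names and symbols are hypothetical.
call_methylation() {
  awk -v genome="$1" -v rd="$2" 'BEGIN {
    calls = ""
    for (i = 1; i <= length(genome); i++) {
      g = substr(genome, i, 1)
      r = substr(rd, i, 1)
      if (g == "C" && r == "C")      calls = calls "M"   # C kept: methylated
      else if (g == "C" && r == "T") calls = calls "u"   # C read as T: unmethylated
      else                           calls = calls "."   # not a cytosine, or a mismatch
    }
    print calls
  }'
}

call_methylation "ACGTTCGGAC" "ACGTTTGGAT"
```

For a bottom-strand read the same logic applies to G positions of the reference, with G read as A indicating an unmethylated cytosine on the bottom strand.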


Bismark alignment and methylation call report

Upon completion, Bismark produces a run report containing information about the following:

- Summary of alignment parameters used
- Number of sequences analysed
- Number of sequences with a unique best alignment (mapping efficiency)
- Statistics summarising the bisulfite strand the unique best alignments came from
- Number of cytosines analysed
- Number of methylated and unmethylated cytosines
- Percentage methylation of cytosines in CpG, CHG or CHH context (where H can be either A, T or C)

This percentage is calculated individually for each context following the equation:

% methylation (context) = 100 * methylated Cs (context) / (methylated Cs (context) + unmethylated Cs (context)).
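
As a worked example of the equation above, the sketch below computes the percentage for one context from the two counts. The function name `percent_methylation` and the counts are hypothetical; the zero-coverage guard is an addition of this example, not something the report is stated to do.

```bash
# Illustrative sketch of: 100 * methylated / (methylated + unmethylated).
# Arguments: methylated C count, unmethylated C count (for a single context).
percent_methylation() {
  awk -v m="$1" -v u="$2" 'BEGIN {
    if (m + u == 0) { print "NA"; exit }   # no cytosines seen in this context
    printf "%.1f\n", 100 * m / (m + u)
  }'
}

percent_methylation 30 10   # 30 methylated and 10 unmethylated Cs give 75.0
percent_methylation 0 0     # no coverage gives NA
```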

It should be stressed that the percent methylation value (context) is just a very rough calculation performed directly at the mapping step. Actual methylation levels after post-processing or filtering have been applied may vary.

Directional BS-Seq libraries (default)

Bisulfite treatment of DNA and subsequent PCR amplification can give rise to four (bisulfite converted) strands for a given locus. Depending on the adapters used, BS-Seq libraries can be constructed in two different ways:

- If a library is directional, only reads which are (bisulfite converted) versions of the original top strand (OT) or the original bottom strand (OB) will be sequenced. Even though the strands complementary to OT (CTOT) and OB (CTOB) are generated in the BS-PCR step they will not be sequenced as they carry the wrong kind of adapter at their 5’-end. By default, Bismark performs only 2 read alignments to the OT and OB strands, thereby ignoring alignments coming from the complementary strands as they should theoretically not be present in the BS-Seq library in question.
- Alternatively, BS-Seq libraries can be constructed so that all four different strands generated in the BS-PCR can and will end up in the sequencing library with roughly the same likelihood. In this case all four strands (OT, CTOT, OB, CTOB) can produce valid alignments and the library is called non-directional. Specifying --non_directional instructs Bismark to use all four alignment outputs.

To summarise again: alignments to the original top strand or to the strand complementary to the original top strand (OT and CTOT) will both yield methylation information for cytosines on the top strand. Alignments to the original bottom strand or to the strand complementary to the original bottom strand (OB and CTOB) will both yield methylation information for cytosines on the bottom strand, i.e. they will appear to yield methylation information for G positions on the top strand of the reference genome.
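
The strand logic summarised above can be written down as two small lookup functions. This is only a restatement of the text in code form; the function names `strands_used` and `strand_informs_on` are hypothetical and are not Bismark commands.

```bash
# Illustrative sketch of the strand logic described above.
# Function names are hypothetical, not Bismark commands.

# Which alignment strands are considered for each library type.
strands_used() {
  case "$1" in
    directional)     echo "OT OB" ;;
    non_directional) echo "OT CTOT OB CTOB" ;;
    *) echo "unknown library type: $1" >&2; return 1 ;;
  esac
}

# Which genomic strand's cytosines an alignment reports on.
strand_informs_on() {
  case "$1" in
    OT|CTOT) echo "top" ;;      # cytosines on the top strand
    OB|CTOB) echo "bottom" ;;   # cytosines on the bottom strand (G positions on the top strand)
    *) echo "unknown strand: $1" >&2; return 1 ;;
  esac
}

for s in $(strands_used non_directional); do
  printf '%s -> %s strand\n' "$s" "$(strand_informs_on "$s")"
done
```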

For more information about how to extract methylation information of the four different alignment strands please see below in the section on the Bismark methylation extractor.

Online Bismark methylation analysis training material: http://www.bioinformatics.bbsrc.ac.uk/training.html
English slides on how Bismark works:
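### RRBS example commands and tutorial

Running an RRBS (Reduced Representation Bisulfite Sequencing) analysis on Linux usually involves a chain of tools. Bismark is a widely used package for the alignment and methylation-calling stage[^1].

#### Installing dependencies

Bismark needs an aligner (Bowtie 2 here) and Samtools; FastQC and Trimmomatic are used below for quality control and trimming. One common route, assuming conda with the Bioconda channel is already set up:

```bash
conda install -c conda-forge -c bioconda bowtie2 samtools fastqc trimmomatic
```

#### Installing Bismark

Bismark is a set of Perl scripts and is not installed through `pip`. It can be installed from Bioconda in the same way, or downloaded from the project's release page and added to `PATH`:

```bash
conda install -c conda-forge -c bioconda bismark
```

#### Preparing the genome index

Building the bisulfite index of the reference genome is a required step. Assuming the reference FASTA file `ref_genome.fa` is in the current directory:

```bash
bismark_genome_preparation .
```

This creates a `Bisulfite_Genome/` subfolder in that directory, holding the C->T and G->A converted genomes and the indexes needed for read mapping[^2].

#### Quality control and adapter trimming

FastQC assesses the quality of the raw sequencing files; Trimmomatic removes low-quality bases and adapter contamination. A simple example, in which `adapters.fasta` is a placeholder for your adapter file:

```bash
mkdir -p qc_results
fastqc *.fastq.gz -o qc_results/

trimmomatic PE \
  S1_R1.fastq.gz S1_R2.fastq.gz \
  trimmed_S1_R1.fastq.gz unpaired_S1_R1.fastq.gz \
  trimmed_S1_R2.fastq.gz unpaired_S1_R2.fastq.gz \
  ILLUMINACLIP:adapters.fasta:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
```

For MspI-digested RRBS libraries, Trim Galore with its `--rrbs` option is often used instead, because it also removes the filled-in bases at the read ends.

#### Aligning paired-end reads to the reference genome

Use the index built earlier to align the trimmed reads and call the methylation state of their cytosines:

```bash
bismark --genome . --bowtie2 -p 8 -1 trimmed_S1_R1.fastq.gz -2 trimmed_S1_R2.fastq.gz
```

This command points Bismark at the prepared genome folder, selects Bowtie 2 as the aligner, and runs the alignment with 8 threads[^3].

#### Extracting methylation calls

The last step extracts the methylation call of every cytosine (in CpG, CHG and CHH context, not only within CpG islands) from the aligned reads:

```bash
bismark_methylation_extractor --paired-end --comprehensive --bedGraph --counts *_bismark_bt2_pe.bam
```

The glob matches Bismark's default name for paired-end Bowtie 2 output. The resulting bedGraph files can be loaded directly into a viewer such as IGV, or summarised into tables for downstream bioinformatics analysis.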