Guide to Using a Custom Genome in SigProfilerMatrixGenerator

Article link: https://blog.youkuaiyun.com/gitblog_07663/article/details/148271305

SigProfilerMatrixGenerator creates mutational matrices for all types of somatic mutations. It allows downsizing the generated mutations to only parts of the genome (e.g., the exome or a custom BED file). The tool integrates with other SigProfiler tools. Project address: https://gitcode.com/gh_mirrors/si/SigProfilerMatrixGenerator

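Introduction

A reference genome is the starting point for many bioinformatics workflows. SigProfilerMatrixGenerator, a widely used tool for mutational signature analysis, supports several common reference genomes out of the box. When you need to analyze a non-model organism or an unusual assembly, however, you have to supply a custom genome. This article explains how to install and use a custom genome in SigProfilerMatrixGenerator.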

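Preparation

Before using a custom genome, prepare the following two key inputs:

Genome FASTA files: the genome must be split by chromosome, with each chromosome's sequence saved as a separate gzip-compressed file. Obtain the sequences from an authoritative source such as the UCSC Genome Browser, and exclude non-standard chromosomes.
Transcript annotation file: export this from a tool such as BioMart, selecting the following key fields:
- Gene stable ID
- Transcript stable ID
- Chromosome name
- Transcript start position
- Transcript end position
- Strand
- Gene name
- Transcript name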
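
If the genome was downloaded as a single multi-sequence FASTA file, the per-chromosome split described above could be sketched as follows. This is a minimal illustration, not part of SigProfilerMatrixGenerator: the function name split_fasta_by_chromosome and the output naming scheme (<name>.fa.gz) are assumptions, so check the file names the installer expects before relying on them.

```python
import gzip
from pathlib import Path


def split_fasta_by_chromosome(fasta_path, out_dir, keep=None):
    """Write each record of a multi-FASTA file to its own gzip file.

    keep: optional set of sequence names to retain (for example the
    standard chromosomes); records with any other name are skipped.
    Returns the list of files written, in input order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Accept both plain-text and gzip-compressed input.
    opener = gzip.open if str(fasta_path).endswith(".gz") else open
    written = []
    out = None
    try:
        with opener(fasta_path, "rt") as src:
            for line in src:
                if line.startswith(">"):
                    # A header starts a new record: close the previous file.
                    if out is not None:
                        out.close()
                        out = None
                    parts = line[1:].split()
                    name = parts[0] if parts else ""
                    if name and (keep is None or name in keep):
                        # Assumed naming scheme: <name>.fa.gz
                        path = out_dir / f"{name}.fa.gz"
                        out = gzip.open(path, "wt")
                        written.append(path)
                        out.write(line)
                elif out is not None:
                    # Sequence line belonging to a record we are keeping.
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return written
```

Passing a set such as {"chr1", "chr2", "chrX"} as keep is one way to drop non-standard contigs during the split.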

Installing the Custom Genome

Install the custom genome with the following Python code:

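from SigProfilerMatrixGenerator import install as genInstall

# Register the custom genome under a name of your choice.
genInstall.install('custom_genome_name',
                   custom=True,                                    # install from local files
                   fastaPath='/path/to/fasta/files/',              # directory of per-chromosome FASTA files
                   transcriptPath='/path/to/transcript_file.txt',  # transcript annotation export
                   exomePath=None)                                 # no exome file supplied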

When installation finishes, the tool prints "Installation complete", indicating that the reference files were created successfully.

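Troubleshooting

When you use the custom genome after installation, you may hit an error reporting that no checksum information is available. This happens because SigProfilerMatrixGenerator verifies the integrity of its reference files. To resolve it:

- Locate the reference_genome_manager.py file in the project
- Add the corresponding md5sum checksum values for the custom genome
- Update the TSB file list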
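
The md5sum values mentioned in the steps above can be computed with Python's standard hashlib module. This is only a sketch of the hashing step: which files need a checksum, and how reference_genome_manager.py expects the values to be recorded, are not covered here, so inspect that file before editing it. The function name md5_of_file is hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large reference files fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The result is the same hexadecimal string that the md5sum command-line utility prints for the file.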

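Generating Mutational Matrices with the Custom Genome

After a successful installation, generate the mutational matrices with the following code: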

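from SigProfilerMatrixGenerator.scripts import SigProfilerMatrixGeneratorFunc as matGen

matrices = matGen.SigProfilerMatrixGeneratorFunc(
    "project_name",           # label for this analysis
    "custom_genome_name",     # must match the name passed to install()
    "/path/to/input/files/",  # directory holding the input mutation files; results are written beneath it
    plot=True,                # also produce plots of the matrices
    exome=False,              # do not restrict mutations to the exome
    bed_file=None,            # no custom BED regions
    chrom_based=False,        # no per-chromosome matrices
    tsb_stat=False,           # skip the transcriptional strand bias test
    seqInfo=False,            # do not export per-mutation sequence information
    cushion=100               # bases of padding around exome/BED ranges
)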
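
A quick sanity check on the result might look like the sketch below. It assumes that the returned matrices object is a dictionary mapping a context name to a table-like object with a shape attribute (such as a pandas DataFrame); that is an assumption about the return value rather than something stated in this article, and summarize_matrices is a hypothetical helper.

```python
def summarize_matrices(matrices):
    """Map each matrix name to its (rows, columns) shape.

    Assumes `matrices` is dict-like and every value exposes `.shape`.
    """
    return {name: tuple(table.shape) for name, table in matrices.items()}
```

Printing summarize_matrices(matrices) after the call above makes it easy to spot an empty matrix, which often points to a genome name or chromosome naming mismatch.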