#与Edwards A. Elizabeth的database的rdhA_nt_all_customized_2022_04_06_23_50.fasta对比
(方法参照《Genomic characterization of Dehalococcoidesmccartyi strain JNA that reductively dechlorinates tetrachloroethene andpolychlorinated biphenyls》)
#第二次注释,直接从第四步开始就好!
-
下载RdhA aa序列(Reductive DehalogenaseDatabase - RDaseDB (utoronto.ca))rdhA_nt_all_customized_2022_04_06_23_50.fasta
-
将序列上传到Clustal Omega(https://www.ebi.ac.uk/Tools/msa/clustalo/)默认参数运行,获取对齐序列,注意RDaseDB_202411.hmm使用蛋白序列构建!

-
将对齐序列导入hmmer3(参照hummer3’ document:Searching a sequencedatabase with a profile)
docker start metage
docker attach metage
hmmbuild RdhA.hmm RdhAclustalo-I20230321-131948-0151-95221659-p1m.clustal_num

-
OFRs与RdhA.hmm比对
OFRs(metabat.71.gene.fnn)与RdhA数据库(这是用核酸序列构建的)比对(参照hummer3’ document:Searching a sequencedatabase with a profile)1.1#进入docker megate环境
RdhA.hmm保存在/home/ys2022/gene/rdhA/database/RdhA.hmm
hmmsearch /home/ys2022/rdhA/database/RdhA.hmm bin.1.orig.Prodigal_nucl.fna > RdhAbin.1.orig.Prodigal_nucl.list
宏基因组直接以cluster之后的基因核酸序列:3.gene/uniq_genes_nucl.fna
Bin需要先做prodigal预测:
prodigal -i metabat.3.fa -o metabat.3.gene.coords.gbk -d metabat.3.Prodigal_nucl.fna -a metabat.3.Prodigal_prot.faa
再以预测的基因核酸序列作为hmmsearch的输入
用conda鉴定
conda activate snakemake
prodigal -i bin.11.orig.fa -o bin.11.orig.gbk -d bin.11.orig.Prodigal_nucl.fna -a bin.11.orig.Prodigal_prot.faa
hmmsearch /ZYdata1/ys2022/gene/rdhA/HMM/RdhA202411.hmm bin.11.orig.Prodigal_prot.faa > RdhAbin.11.orig.faa.list
1.2 挑选基因序列
从faa中筛选指定的序列,提前将序列名称保存在A.txt
from Bio import SeqIO
# 读取基因名称列表
with open("A.txt", "r") as gene_file:
gene_names = set(line.strip() for line in gene_file)
# 读取并筛选FASTA文件
input_fasta = "bin.1.orig.Prodigal_prot.faa"
output_fasta = "bin.1.orig.All_HMMsearch.faa"
with open(output_fasta, "w") as out_file:
for record in SeqIO.parse(input_fasta, "fasta"):
if record.id in gene_names:
SeqIO.write(record, out_file, "fasta")
print("筛选完成并保存到", output_faa)
# 读取并筛选FASTA文件
input_fasta1 = "bin.1.orig.Prodigal_nucl.fna"
output_fasta1 = "bin.1.orig.All_HMMsearch.fna"
with open(output_fasta1, "w") as out_file1:
for record in SeqIO.parse(input_fasta1, "fasta"):
if record.id in gene_names:
SeqIO.write(record, out_file1, "fasta")
print("筛选完成并保存到", output_fna)
将序列上NCBI blast NR数据库,跳转到结果页面后,点击Alignments,点击genBank,查看成功比对上的数据库基因的功能描述!
关于宏基因组中rdhA注释问题:序列选择>1300 bp或者> 400 aa作为rdhA的输出结果,因为小于这个长度的rdhA并不完整。基于InterPro,blast这些rdhA的domain,发现短于这个长度的rdhA可能只含有RDH_dom,并不完整。但是它们的功能仍是需要探究的!
宏基因组中用megahit组装输出contig用于hmm预测rdhAbin,用metawrap从read开始,经过BIN_REASSEMBLY后获得的bin用于hmm预测rdhA