Non-overlapping Intervals

本文介绍了一种解决区间重叠问题的方法。通过将区间按结束时间排序,并采用贪心策略,找出最少需要移除的区间数量,以确保剩余区间不重叠。

原题链接: https://leetcode.com/problems/non-overlapping-intervals/description/

题目: Given a collection of intervals, find the minimum number of intervals you need to remove to make the rest of the intervals non-overlapping.

Note:
You may assume the interval’s end point is always bigger than its start point.
Intervals like [1,2] and [2,3] have borders “touching” but they don’t overlap each other.

Solution: 题目给了我们一组区间,让我们求至少需要移除多少个区间,才能使余下的区间没有重叠。那么我们首先给这些区间排序,按照每个区间的结束时间end来排序,越早结束的排在前面。然后,我们可以采用贪心的策略,当我们发现一个区间和另一个区间重叠时,我们总是选择结束时间更早的那个区间,而移除结束时间更晚的那个区间,这样使得剩余的时间更多,可以选择更多的区间。

/**
 * Definition for an interval.
 * struct Interval {
 *     int start;
 *     int end;
 *     Interval() : start(0), end(0) {}
 *     Interval(int s, int e) : start(s), end(e) {}
 * };
 */
class Solution {
public:
    static bool comp(Interval a, Interval b) {
        return a.end < b.end;
    }

    int eraseOverlapIntervals(vector<Interval>& intervals) {
        if (intervals.empty()) return 0;

        sort(intervals.begin(), intervals.end(), comp);

        int needToRemove = 0;
        int curEnd = intervals[0].end;

        for (int i = 1; i < intervals.size(); i++) {
            if (intervals[i].start < curEnd) {
                needToRemove++;
            } else {
                curEnd = intervals[i].end;
            }
        }

        return needToRemove;
    }
};
注意” yahs -h Usage: yahs [options] <contigs.fa> <hic.bed>|<hic.bam>|<hic.pa5>|<hic.bin> Options: -a FILE AGP file (for rescaffolding) [none] -r INT[,INT,...] list of resolutions in ascending order [automate] -R INT rounds to run at each resoultion level [1] -e STR restriction enzyme cutting sites [none] -l INT minimum length of a contig to scaffold [0] -q INT minimum mapping quality [10] --no-contig-ec do not do contig error correction --no-scaffold-ec do not do scaffold error correction --no-mem-check do not do memory check at runtime --file-type STR input file type BED|BAM|PA5|BIN, file name extension is ignored if set --read-length read length (required for PA5 format input) [150] --telo-motif STR telomeric sequence motif -o STR prefix of output files [yahs.out] -v INT verbose level [0] -? print long help with extra option list --version show version number (Hic) [scb3201@ln137%bscc-a6 Hic]$ which juicer_tools ~/anaconda3/envs/Hic/bin/juicer_tools (Hic) [scb3201@ln137%bscc-a6 Hic]$ juicer_tools -h WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance. WARN [2025-12-05T22:53:25,675] [Globals.java:138] [main] Development mode is enabled Juicer Tools Version 2.20.00 Usage: dump <observed/oe> <NONE/VC/VC_SQRT/KR> <hicFile(s)> <chr1>[:x1:x2] <chr2>[:y1:y2] <BP/FRAG> <binsize> [outfile] dump <norm/expected> <NONE/VC/VC_SQRT/KR> <hicFile(s)> <chr> <BP/FRAG> <binsize> [outfile] dump <loops/domains> <hicFile URL> [outfile] pre [options] <infile> <outfile> <genomeID> addNorm <input_HiC_file> [input_vector_file] pearsons [-p] <NONE/VC/VC_SQRT/KR> <hicFile(s)> <chr> <BP/FRAG> <binsize> [outfile] eigenvector -p <NONE/VC/VC_SQRT/KR> <hicFile(s)> <chr> <BP/FRAG> <binsize> [outfile] apa <hicFile(s)> <PeaksFile> <SaveFolder> arrowhead <hicFile(s)> <output_file> hiccups <hicFile> <outputDirectory> hiccupsdiff <firstHicFile> <secondHicFile> <firstLoopList> <secondLoopList> <outputDirectory> validate <hicFile> -h, --help print help -v, --verbose verbose mode -V, --version print version Type juicer_tools <commandName> for more detailed usage instructions (Hic) [scb3201@ln137%bscc-a6 Hic]$ bedtools -h bedtools is a powerful toolset for genome arithmetic. Version: v2.31.1 About: developed in the quinlanlab.org and by many contributors worldwide. Docs: http://bedtools.readthedocs.io/ Code: https://github.com/arq5x/bedtools2 Mail: https://groups.google.com/forum/#!forum/bedtools-discuss Usage: bedtools <subcommand> [options] The bedtools sub-commands include: [ Genome arithmetic ] intersect Find overlapping intervals in various ways. window Find overlapping intervals within a window around an interval. closest Find the closest, potentially non-overlapping interval. coverage Compute the coverage over defined intervals. map Apply a function to a column for each overlapping interval. genomecov Compute the coverage over an entire genome. merge Combine overlapping/nearby intervals into a single interval. cluster Cluster (but don't merge) overlapping/nearby intervals. complement Extract intervals _not_ represented by an interval file. shift Adjust the position of intervals. subtract Remove intervals based on overlaps b/w two files. slop Adjust the size of intervals. flank Create new intervals from the flanks of existing intervals. sort Order the intervals in a file. random Generate random intervals in a genome. shuffle Randomly redistribute intervals in a genome. sample Sample random records from file using reservoir sampling. spacing Report the gap lengths between intervals in a file. annotate Annotate coverage of features from multiple files. [ Multi-way file comparisons ] multiinter Identifies common intervals among multiple interval files. unionbedg Combines coverage intervals from multiple BEDGRAPH files. [ Paired-end manipulation ] pairtobed Find pairs that overlap intervals in various ways. pairtopair Find pairs that overlap other pairs in various ways. [ Format conversion ] bamtobed Convert BAM alignments to BED (& other) formats. bedtobam Convert intervals to BAM records. bamtofastq Convert BAM records to FASTQ records. bedpetobam Convert BEDPE intervals to BAM records. bed12tobed6 Breaks BED12 intervals into discrete BED6 intervals. [ Fasta manipulation ] getfasta Use intervals to extract sequences from a FASTA file. maskfasta Use intervals to mask sequences from a FASTA file. nuc Profile the nucleotide content of intervals in a FASTA file. [ BAM focused tools ] multicov Counts coverage from multiple BAMs at specific intervals. tag Tag BAM alignments based on overlaps with interval files. [ Statistical relationships ] jaccard Calculate the Jaccard statistic b/w two sets of intervals. reldist Calculate the distribution of relative distances b/w two files. fisher Calculate Fisher statistic b/w two feature files. [ Miscellaneous tools ] overlap Computes the amount of overlap from two intervals. igv Create an IGV snapshot batch script. links Create a HTML page of links to UCSC locations. makewindows Make interval "windows" across a genome. groupby Group by common cols. & summarize oth. cols. (~ SQL "groupBy") expand Replicate lines based on lists of values in columns. split Split a file into multiple files with equal records or base pairs. summary Statistical summary of intervals in a file. [ General Parameters ] --cram-ref Reference used by a CRAM input [ General help ] --help Print this help menu. --version What version of bedtools are you using?. --contact Feature requests, bugs, mailing lists, etc. (Hic) [scb3201@ln137%bscc-a6 Hic]$ bwa -h [main] unrecognized command '-h' (Hic) [scb3201@ln137%bscc-a6 Hic]$ bwa Program: bwa (alignment via Burrows-Wheeler transformation) Version: 0.7.19-r1273 Contact: Heng Li <hli@ds.dfci.harvard.edu> Usage: bwa <command> [options] Command: index index sequences in the FASTA format mem BWA-MEM algorithm fastmap identify super-maximal exact matches pemerge merge overlapping paired ends (EXPERIMENTAL) aln gapped/ungapped alignment samse generate alignment (single ended) sampe generate alignment (paired ended) bwasw BWA-SW for long queries (DEPRECATED) shm manage indices in shared memory fa2pac convert FASTA to PAC format pac2bwt generate BWT from PAC pac2bwtgen alternative algorithm for generating BWT bwtupdate update .bwt to the new format bwt2sa generate SA from BWT and Occ Note: To use BWA, you need to first index the genome with `bwa index'. There are three alignment algorithms in BWA: `mem', `bwasw', and `aln/samse/sampe'. If you are not sure which to use, try `bwa mem' first. Please `man ./bwa.1' for the manual. (Hic) [scb3201@ln137%bscc-a6 Hic]$ assembly-stats usage: stats [options] <list of fasta/q files> Reports sequence length statistics from fasta and/or fastq files options: -l <int> Minimum length cutoff for each sequence. Sequences shorter than the cutoff will be ignored [1] -s Print 'grep friendly' output -t Print tab-delimited output -u Print tab-delimited output with no header line -v Print version and exit“
最新发布
12-06
<think>我们注意到用户之前尝试使用yahs进行Hi-C scaffolding,并试图将结果转换为juicer格式时遇到了一系列问题。根据提供的帮助信息,我们重新设计一个完整的流程,从原始Hi-C数据开始,到最终生成juicer的.hic文件。 步骤: 1. 准备参考基因组索引和染色体大小文件。 2. 使用bwa mem对Hi-C reads进行比对。 3. 使用samtools将比对结果转换为BAM并排序。 4. 使用pairtools对BAM文件进行配对和去重。 5. 使用yahs进行scaffolding。 6. 将yahs的输出转换为juicer格式。 7. 使用juicer_tools生成.hic文件。 由于用户环境中有bwa, samtools, bedtools, pairtools, yahs, juicer_tools等工具,我们编写一个完整的脚本。 注意:我们假设用户有两个fastq文件(read1和read2)和一个参考基因组fasta文件。 我们将脚本分为几个部分,并详细注释。 </think> # 完整的Hi-C分析流程:基于Yahs和Juicer Tools 根据您提供的工具帮助信息,我设计了一个完整的Hi-C分析流程,从原始数据到最终的可视化文件。以下是完整的脚本和详细说明: ```bash #!/bin/bash #SBATCH -p amd_256 #SBATCH -N 1 #SBATCH -n 32 #SBATCH -t 120:00:00 #SBATCH --mem=256G #SBATCH -J hic_analysis #SBATCH -o /public1/home/scb3201/rawdata/Hic/logs/%j_hic_out.log #SBATCH -e /public1/home/scb3201/rawdata/Hic/logs/%j_hic_err.log ########################################################## # 1. 环境设置与路径定义 ########################################################## echo "[INFO] 开始Hi-C分析流程... $(date)" # 激活Conda环境 source ~/.bashrc conda activate Hic || { echo "[ERROR] Hic环境激活失败! $(date)" exit 1 } # 核心路径 BASE_DIR="/public1/home/scb3201/rawdata/Hic" RESULTS_DIR="/public1/home/scb3201/results/genome/Hic" REF_GENOME="/public1/home/scb3201/results/genome/Medaka/genome.nextpolish.medaka.fasta" READ1="${BASE_DIR}/reads/hic_R1.fastq.gz" READ2="${BASE_DIR}/reads/hic_R2.fastq.gz" # 子目录 ALIGN_DIR="${RESULTS_DIR}/alignment" SCAFFOLD_DIR="${RESULTS_DIR}/scaffold" JUICEBOX_DIR="${RESULTS_DIR}/juicebox" LOG_DIR="${BASE_DIR}/logs" TMP_DIR="${BASE_DIR}/tmp" # 创建目录 mkdir -p ${ALIGN_DIR} ${SCAFFOLD_DIR} ${JUICEBOX_DIR} ${LOG_DIR} ${TMP_DIR} # 设置临时目录 export TMPDIR=${TMP_DIR} export _JAVA_OPTIONS="-Xmx200G -Xms64G" ########################################################## # 2. 参考基因组准备 ########################################################## echo "[STEP] 准备参考基因组... $(date)" # 生成BWA索引 if [[ ! -f ${REF_GENOME}.bwt ]]; then echo "[INFO] 创建BWA索引..." bwa index ${REF_GENOME} 2> ${LOG_DIR}/bwa_index.log || { echo "[ERROR] BWA索引创建失败! $(date)" exit 1 } fi # 生成染色体长度文件 if [[ ! -f ${REF_GENOME}.fai ]]; then echo "[INFO] 创建参考基因组索引..." samtools faidx ${REF_GENOME} 2> ${LOG_DIR}/samtools_faidx.log || { echo "[ERROR] 参考基因组索引创建失败! $(date)" exit 1 } fi # 创建染色体大小文件 cut -f1,2 ${REF_GENOME}.fai > ${JUICEBOX_DIR}/chrom.sizes 2> ${LOG_DIR}/chrom_sizes.log || { echo "[ERROR] 染色体大小文件生成失败! $(date)" exit 1 } ########################################################## # 3. Hi-C数据比对 ########################################################## echo "[STEP] 比对Hi-C reads... $(date)" BAM_OUT="${ALIGN_DIR}/aligned.bam" # 使用BWA MEM进行比对 bwa mem -t ${SLURM_NTASKS} -5SP ${REF_GENOME} ${READ1} ${READ2} 2> ${LOG_DIR}/bwa_mem.log | \ samtools view -h -b -F 2304 - > ${BAM_OUT} 2> ${LOG_DIR}/samtools_view.log || { echo "[ERROR] BWA比对失败! $(date)" exit 1 } # 排序BAM文件 echo "[INFO] 排序比对结果..." samtools sort -@ ${SLURM_NTASKS} -m 4G -T ${TMP_DIR}/sort_temp -o ${ALIGN_DIR}/aligned.sorted.bam ${BAM_OUT} 2> ${LOG_DIR}/samtools_sort.log || { echo "[ERROR] BAM排序失败! $(date)" exit 1 } # 索引BAM文件 samtools index ${ALIGN_DIR}/aligned.sorted.bam 2> ${LOG_DIR}/samtools_index.log || { echo "[WARNING] BAM索引创建失败,继续执行... $(date)" } ########################################################## # 4. 使用Yahs进行Scaffolding ########################################################## echo "[STEP] Yahs scaffolding... $(date)" # 转换BAM为Yahs输入格式 echo "[INFO] 准备Yahs输入文件..." samtools view -h ${ALIGN_DIR}/aligned.sorted.bam | \ awk 'BEGIN{OFS="\t"} { if($1 ~ /^@/) {print; next} if($7 != "=") next # 只保留正常配对的reads if($6 ~ /^[0-9]+S/) next # 过滤软裁剪 print }' | samtools view -b - > ${SCAFFOLD_DIR}/yahs_input.bam 2> ${LOG_DIR}/yahs_prep.log || { echo "[ERROR] Yahs输入准备失败! $(date)" exit 1 } # 运行Yahs echo "[INFO] 运行Yahs scaffolding..." yahs \ -o ${SCAFFOLD_DIR}/yahs_out \ ${REF_GENOME} \ ${SCAFFOLD_DIR}/yahs_input.bam 2> ${LOG_DIR}/yahs_run.log || { echo "[ERROR] Yahs运行失败! $(date)" exit 1 } ########################################################## # 5. 转换为Juicer格式 ########################################################## echo "[STEP] 转换为Juicer格式... $(date)" # 方法1: 使用Yahs内置功能 echo "[METHOD] 尝试Yahs内置转换..." yahs juicer \ ${SCAFFOLD_DIR}/yahs_out.bin \ ${JUICEBOX_DIR}/chrom.sizes \ ${SCAFFOLD_DIR}/yahs_out_scaffolds_final.agp \ ${REF_GENOME} \ ${JUICEBOX_DIR}/juicer 2> ${LOG_DIR}/yahs_juicer.log if [[ $? -ne 0 ]]; then # 方法2: 手动转换 echo "[METHOD] Yahs内置转换失败,使用备用转换方法..." # 步骤1: 提取有效交互 echo "[SUBSTEP] 提取有效交互..." samtools view -h ${SCAFFOLD_DIR}/yahs_out.bin | \ awk 'BEGIN{OFS="\t"} { if($1 ~ /^@/) next if($7 != "=") next # 只保留正常配对的reads split($1, a, "/"); read_id = a[1] if(read_id != prev_id) { prev_id = read_id next } print $3, $4, $4+length($10), $7, $8, $8+length($10) }' > ${JUICEBOX_DIR}/interactions.bedpe 2> ${LOG_DIR}/interactions_extract.log || { echo "[ERROR] 交互提取失败! $(date)" exit 1 } # 步骤2: 转换为Juicer格式 echo "[SUBSTEP] 转换为Juicer矩阵..." awk 'BEGIN{OFS="\t"} { # Juicer格式: <str1> <chr1> <pos1> <frag1> <str2> <chr2> <pos2> <frag2> # 计算中点位置 pos1 = int(($2+$3)/2) pos2 = int(($5+$6)/2) # 输出交互矩阵 print 0, $1, pos1, 0, 0, $4, pos2, 1 }' ${JUICEBOX_DIR}/interactions.bedpe > ${JUICEBOX_DIR}/juicer.2.txt 2> ${LOG_DIR}/juicer_convert.log || { echo "[ERROR] Juicer格式转换失败! $(date)" exit 1 } # 步骤3: 创建片段文件 echo "[SUBSTEP] 创建片段文件..." awk 'BEGIN{OFS="\t"} { print $1, 0, $2, 1 }' ${JUICEBOX_DIR}/chrom.sizes > ${JUICEBOX_DIR}/juicer.3.txt 2> ${LOG_DIR}/juicer_fragments.log || { echo "[ERROR] 片段文件创建失败! $(date)" exit 1 } # 复制染色体大小文件 cp ${JUICEBOX_DIR}/chrom.sizes ${JUICEBOX_DIR}/juicer.chromsizes fi ########################################################## # 6. 生成.hic文件 ########################################################## echo "[STEP] 生成.hic文件... $(date)" # 排序Juicer文件 echo "[INFO] 排序Juicer输入文件..." LC_ALL=C sort -k2,2d -k6,6d -k1,1d -k5,5d -T ${TMP_DIR} --parallel=${SLURM_NTASKS} \ ${JUICEBOX_DIR}/juicer.2.txt > ${JUICEBOX_DIR}/juicer.2.sorted.txt 2> ${LOG_DIR}/juicer_sort.log || { echo "[WARNING] 排序失败,使用原始文件... $(date)" mv ${JUICEBOX_DIR}/juicer.2.txt ${JUICEBOX_DIR}/juicer.2.sorted.txt } # 使用juicer_tools创建.hic文件 echo "[INFO] 运行juicer_tools..." juicer_tools pre \ -j ${SLURM_NTASKS} \ -t ${TMP_DIR} \ ${JUICEBOX_DIR}/juicer.2.sorted.txt \ ${JUICEBOX_DIR}/aligned.hic \ ${JUICEBOX_DIR}/juicer.chromsizes 2> ${LOG_DIR}/juicer_pre.log || { echo "[ERROR] .hic文件生成失败! $(date)" # 备选方法:直接使用Java调用 echo "[FIX] 尝试Java直接调用..." JUICER_JAR=$(find ${CONDA_PREFIX}/bin -name 'juicer_tools*.jar' | head -1) if [[ -f ${JUICER_JAR} ]]; then java -jar ${JUICER_JAR} pre \ -j ${SLURM_NTASKS} \ -t ${TMP_DIR} \ ${JUICEBOX_DIR}/juicer.2.sorted.txt \ ${JUICEBOX_DIR}/aligned.hic \ ${JUICEBOX_DIR}/juicer.chromsizes 2> ${LOG_DIR}/juicer_java.log || { echo "[CRITICAL] Java调用失败! $(date)" exit 1 } else echo "[CRITICAL] 找不到Juicer Tools JAR文件! $(date)" exit 1 fi } ########################################################## # 7. 结果验证与报告 ########################################################## echo "[STEP] 验证结果... $(date)" # 验证.hic文件 juicer_tools validate ${JUICEBOX_DIR}/aligned.hic > ${LOG_DIR}/validate.log 2>&1 VALIDATION_EXIT=$? # 生成统计报告 echo "==== Hi-C分析结果报告 ====" > ${LOG_DIR}/final_report.txt echo "生成时间: $(date)" >> ${LOG_DIR}/final_report.txt echo "运行时长: $SECONDS 秒" >> ${LOG_DIR}/final_report.txt echo ".hic文件大小: $(du -sh ${JUICEBOX_DIR}/aligned.hic | cut -f1)" >> ${LOG_DIR}/final_report.txt echo "验证结果: $(grep -i 'valid' ${LOG_DIR}/validate.log | head -1)" >> ${LOG_DIR}/final_report.txt echo "Yahs输出文件:" >> ${LOG_DIR}/final_report.txt ls -lh ${SCAFFOLD_DIR}/yahs_out* >> ${LOG_DIR}/final_report.txt # 染色体交互统计 if [[ ${VALIDATION_EXIT} -eq 0 ]]; then echo "染色体交互统计:" >> ${LOG_DIR}/final_report.txt juicer_tools stats ${JUICEBOX_DIR}/aligned.hic >> ${LOG_DIR}/final_report.txt 2> ${LOG_DIR}/stats.log fi ########################################################## # 8. 清理与完成 ########################################################## echo "[INFO] 清理临时文件... $(date)" rm -rf ${TMP_DIR} ${ALIGN_DIR}/aligned.bam ${JUICEBOX_DIR}/interactions.bedpe ${JUICEBOX_DIR}/juicer.2.txt echo "[SUCCESS] Hi-C分析完成! $(date)" echo "结果文件: ${JUICEBOX_DIR}/aligned.hic" echo "报告文件: ${LOG_DIR}/final_report.txt" ``` ## 关键工具使用说明 ### 1. BWA MEM比对 ```bash bwa mem -t 32 -5SP ${REF_GENOME} ${READ1} ${READ2} ``` - `-t 32`: 使用32个线程 - `-5SP`: 特殊参数,适用于Hi-C数据比对 - `-5`: 对于包含/1和/2的read名称使用特殊处理 - `-S`: 跳过配对对齐(适用于长reads) - `-P`: 标记次要对齐位置(用于下游处理) ### 2. Samtools处理 ```bash samtools view -h -b -F 2304 ``` - `-F 2304`: 过滤标志位 - 2304 = 512 (PCR重复) + 1024 (光学重复) + 768 (未映射/未配对) - 保留正常的配对比对结果 ### 3. Yahs scaffolding ```bash yahs -o yahs_out ${REF_GENOME} yahs_input.bam ``` - `-o`: 指定输出前缀 - 输入:参考基因组 + 过滤后的BAM文件 - 输出:`.bin` (压缩的交互矩阵), `.agp` (scaffold信息)等 ### 4. Juicer Tools转换 ```bash juicer_tools pre -j 32 -t tmp juicer.2.sorted.txt aligned.hic chrom.sizes ``` - `pre`: 预处理命令 - `-j 32`: 使用32个线程 - `-t tmp`: 指定临时目录 - 输入:排序后的交互矩阵 + 染色体大小文件 - 输出:`.hic` 二进制文件(用于可视化) ## 高级功能扩展 ### 1. 多分辨率分析 ```bash # 使用不同分辨率生成多个.hic文件 RESOLUTIONS=(1000000 500000 100000 50000 10000) for res in ${RESOLUTIONS[@]}; do juicer_tools pre \ -r ${res} \ -j ${SLURM_NTASKS} \ ${JUICEBOX_DIR}/juicer.2.sorted.txt \ ${JUICEBOX_DIR}/aligned_${res}.hic \ ${JUICEBOX_DIR}/juicer.chromsizes done ``` ### 2. 交互矩阵可视化 ```bash # 生成特定区域的交互矩阵 juicer_tools dump observed KR \ ${JUICEBOX_DIR}/aligned.hic \ chr1 chr1 \ BP 100000 \ ${JUICEBOX_DIR}/chr1_matrix.txt # 转换为热图格式 awk 'BEGIN{OFS="\t"} { print $1, $2, $3, $7 # 位置1, 位置2, 计数 }' ${JUICEBOX_DIR}/chr1_matrix.txt > ${JUICEBOX_DIR}/chr1_heatmap.txt ``` ### 3. Scaffolding质量评估 ```bash # 计算N50统计 assembly-stats -t ${SCAFFOLD_DIR}/yahs_out_scaffolds_final.fa > ${LOG_DIR}/scaffold_stats.txt # 比较原始和scaffolded基因组 diamond blastx \ -d nr \ -q ${REF_GENOME} \ -o ${SCAFFOLD_DIR}/original_blast.txt diamond blastx \ -d nr \ -q ${SCAFFOLD_DIR}/yahs_out_scaffolds_final.fa \ -o ${SCAFFOLD_DIR}/scaffolded_blast.txt ``` ## 错误处理与优化建议 ### 1. BWA比对优化 ```bash # 增加内存限制 bwa mem -t 32 -K 100000000 -Y ... ``` - `-K 100000000`: 每次处理100,000,000个reads - `-Y`: 使用soft-clipping进行补充对齐 ### 2. Yahs参数调优 ```bash yahs -r 10000,50000,100000 -R 3 ... ``` - `-r`: 指定分辨率列表(从低到高) - `-R`: 每个分辨率的迭代次数 ### 3. Juicer内存优化 ```bash # 增加Java内存 export _JAVA_OPTIONS="-Xmx200G -Xms64G" # 使用磁盘排序 juicer_tools pre --disk ... ```
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值