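-e 0.1 -O 5 -m 50 are standard parameters; read the README yourself for the details. Setting -m to 50 means that any read shorter than 50 bases after adapter removal is discarded, which suits my data because it is PE150 sequencing. This is also why the adapters have to be removed in PE mode: it guarantees that the two read files still contain the same number of reads after filtering.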
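To see why the length filter has to be pair-aware, here is a minimal Python sketch. It is not cutadapt's code: the read pairs are made-up placeholders, and `filter_independently` and `filter_pairs` are hypothetical helpers that only mimic the effect of -m 50 on already-trimmed reads.

```python
# Illustrative only: made-up trimmed read pairs, not real data.
MIN_LENGTH = 50  # mirrors -m 50


def filter_independently(reads, min_length=MIN_LENGTH):
    """Filter one file on its own, ignoring the mate (what single-end mode would do)."""
    return [r for r in reads if len(r) >= min_length]


def filter_pairs(pairs, min_length=MIN_LENGTH):
    """Keep a pair only if both mates are long enough, so the files stay in sync."""
    return [
        (r1, r2)
        for r1, r2 in pairs
        if len(r1) >= min_length and len(r2) >= min_length
    ]


pairs = [
    ("A" * 150, "C" * 150),
    ("A" * 30, "C" * 150),   # read 1 too short
    ("A" * 120, "C" * 45),   # read 2 too short
    ("A" * 80, "C" * 60),
    ("A" * 20, "C" * 100),   # read 1 too short
]

r1_only = filter_independently([r1 for r1, _ in pairs])
r2_only = filter_independently([r2 for _, r2 in pairs])
kept = filter_pairs(pairs)

print(len(r1_only), len(r2_only))  # 3 4: the two files no longer line up
print(len(kept))                   # 2: every surviving read still has its mate
```

Filtering each file separately leaves 3 reads in one file and 4 in the other, which breaks any downstream tool that expects matched pairs.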
cutadapt version 1.9.1
cutadapt removes adapter sequences from high-throughput sequencing reads.
Usage:
cutadapt -a ADAPTER [options] [-o output.fastq] input.fastq
For paired-end reads:
cutadapt -a ADAPT1 -A ADAPT2 [options] -o out1.fastq -p out2.fastq in1.fastq in2.fastq
Replace "ADAPTER" with the actual sequence of your 3' adapter. IUPAC wildcard
characters are supported. The reverse complement is *not* automatically
searched. All reads from input.fastq will be written to output.fastq with the
adapter sequence removed. Adapter matching is error-tolerant. Multiple adapter
sequences can be given (use further -a options), but only the best-matching
adapter will be removed.
Input may also be in FASTA format. Compressed input and output is supported and
auto-detected from the file name (.gz, .xz, .bz2). Use the file name '-' for
standard input/output. Without the -o option, output is sent to standard output.
Citation:
Marcel Martin. Cutadapt removes adapter sequences from high-throughput
sequencing reads. EMBnet.Journal, 17(1):10-12, May 2011.
Use "cutadapt --help" to see all command-line options.
Options:
--version show program's version number and exit
-h, --help show this help message and exit
--debug Print debugging information.
-f FORMAT, --format=FORMAT
Input file format; can be either 'fasta', 'fastq' or
'sra-fastq'. Ignored when reading csfasta/qual files
(default: auto-detect from file name extension).
Options that influence how the adapters are found:
Each of the three parameters -a, -b, -g can be used multiple times and
in any combination to search for an entire set of adapters of possibly
different types. Only the best matching adapter is trimmed from each
read (but see the --times option). Instead of giving an adapter
directly, you can also write file:FILE and the adapter sequences will
be read from the given FASTA FILE.
-a ADAPTER, --adapter=ADAPTER
Sequence of an adapter that was ligated to the 3' end.
The adapter itself and anything that follows is
trimmed. If the adapter sequence ends with the '$'
character, the adapter is anchored to the end of the
read and only found if it is a suffix of the read.
-g ADAPTER, --front=ADAPTER
Sequence of an adapter that was ligated to the 5' end.
If the adapter sequence starts with the character '^',
the adapter is 'anchored'. An anchored adapter must
appear in its entirety at the 5' end of the read (it
is a prefix of the read). A non-anchored adapter may
appear partially at the 5' end, or it may occur within
the read. If it is found within a read, the sequence
preceding the adapter is also trimmed. In all cases,
the adapter itself is trimmed.
-b ADAPTER, --anywhere=ADAPTER
Sequence of an adapter that was ligated to the 5' or
3' end. If the adapter is found within the read or
overlapping the 3' end of the read, the behavior is
the same as for the -a option. If the adapter overlaps
the 5' end (beginning of the read), the initial
portion of the read matching the adapter is trimmed,
but anything that follows is kept.
-e ERROR_RATE, --error-rate=ERROR_RATE
Maximum allowed error rate (no. of errors divided by
the length of the matching region) (default: 0.1)
--no-indels Do not allow indels in the alignments (allow only
mismatches). (default: allow both mismatches and
indels)
-n COUNT, --times=COUNT
Remove up to COUNT adapters from each read (default:
1)
-O LENGTH, --overlap=LENGTH
Minimum overlap length. If the overlap between the
read and the adapter is shorter than LENGTH, the read
is not modified. This reduces the no. of bases trimmed
purely due to short random adapter matches (default:
3).
--match-read-wildcards
Allow IUPAC wildcards in reads (default: False).
-N, --no-match-adapter-wildcards
Do not interpret IUPAC wildcards in adapters.
--no-trim Match and redirect reads to output/untrimmed-output as
usual, but do not remove adapters.
--mask-adapter Mask adapters with 'N' characters instead of trimming
them.
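The way -e and -O interact can be sketched in a few lines of Python. This is only my reading of the two help entries above, not cutadapt's alignment code: `accept_match` is a hypothetical helper that takes an already-computed overlap length and error count and applies the two thresholds.

```python
def accept_match(overlap, errors, error_rate=0.1, min_overlap=3):
    """Decide whether a candidate adapter match would count as a hit.

    overlap     -- length of the region where read and adapter align
    errors      -- number of mismatches/indels in that region
    error_rate  -- mirrors -e (errors divided by the matching length)
    min_overlap -- mirrors -O (shorter overlaps leave the read unmodified)
    """
    if overlap < min_overlap:
        return False
    return errors / overlap <= error_rate


# With the settings used in this post: -e 0.1 -O 5
print(accept_match(4, 0, error_rate=0.1, min_overlap=5))   # False: overlap below -O
print(accept_match(5, 1, error_rate=0.1, min_overlap=5))   # False: 1/5 = 0.2 exceeds -e
print(accept_match(12, 1, error_rate=0.1, min_overlap=5))  # True: 1/12 is about 0.083
```

Raising -O from the default 3 to 5 mainly cuts down on trimming caused by short random matches at the read end.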
Additional read modifications:
-u LENGTH, --cut=LENGTH
Remove LENGTH bases from the beginning or end of each
read. If LENGTH is positive, bases are removed from
the beginning of each read. If LENGTH is negative,
bases are removed from the end of each read. This
option can be specified twice if the LENGTHs have
different signs.
-q [5'CUTOFF,]3'CUTOFF, --quality-cutoff=[5'CUTOFF,]3'CUTOFF
Trim low-quality bases from 5' and/or 3' ends of reads
before adapter removal. If one value is given, only
the 3' end is trimmed. If two comma-separated cutoffs
are given, the 5' end is trimmed with the first
cutoff, the 3' end with the second. See documentation
for the algorithm. (default: no trimming)
--quality-base=QUALITY_BASE
Assume that quality values in FASTQ are encoded as
ascii(quality + QUALITY_BASE). This needs to be set to
64 for some old Illumina FASTQ files. Default: 33
--trim-n Trim N's on ends of reads.
-x PREFIX, --prefix=PREFIX
Add this prefix to read names. Use {name} to insert
the name of the matching adapter.
-y SUFFIX, --suffix=SUFFIX
Add this suffix to read names; can also include {name}
--strip-suffix=STRIP_SUFFIX
Remove this suffix from read names if present. Can be
given multiple times.
--length-tag=TAG Search for TAG followed by a decimal number in the
description field of the read. Replace the decimal
number with the correct length of the trimmed read.
For example, use --length-tag 'length=' to correct
fields like 'length=123'.
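Two of the read modifications above are easy to get backwards, so here is a small Python sketch of the sign convention of -u and the ascii(quality + QUALITY_BASE) encoding. `cut_read` and `decode_qualities` are hypothetical helper names and the inputs are made up; the sketch follows the help text only and works on bare strings, not FASTQ records.

```python
def cut_read(seq, length):
    """Mimic -u: positive LENGTH removes bases from the start, negative from the end."""
    if length > 0:
        return seq[length:]
    if length < 0:
        return seq[:length]
    return seq


def decode_qualities(quality_string, quality_base=33):
    """Turn a FASTQ quality string back into numeric scores (character code minus base)."""
    return [ord(ch) - quality_base for ch in quality_string]


print(cut_read("ACGTACGT", 3))     # TACGT  (first 3 bases removed)
print(cut_read("ACGTACGT", -2))    # ACGTAC (last 2 bases removed)
print(decode_qualities("II5"))     # [40, 40, 20] with the default base 33
print(decode_qualities("hh", 64))  # [40, 40] for old Illumina files using base 64
```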
Options for filtering of processed reads:
--discard-trimmed, --discard
Discard reads that contain an adapter. Also use -O to
avoid discarding too many randomly matching reads!
--discard-untrimmed, --trimmed-only
Discard reads that do not contain the adapter.
-m LENGTH, --minimum-length=LENGTH
Discard trimmed reads that are shorter than LENGTH.
Reads that are too short even before adapter removal
are also discarded. In colorspace, an initial primer
is not counted (default: 0).
-M LENGTH, --maximum-length=LENGTH
Discard trimmed reads that are longer than LENGTH.
Reads that are too long even before adapter removal
are also discarded. In colorspace, an initial primer
is not counted (default: no limit).
--max-n=COUNT Discard reads with too many N bases. If COUNT is an
integer, it is treated as the absolute number of N
bases. If it is between 0 and 1, it is treated as the
proportion of N's allowed in a read.
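The two meanings of COUNT in --max-n are sketched below in Python. `too_many_n` is a hypothetical helper, and treating any value below 1 as a proportion is my assumption about the boundary; the help text only says "between 0 and 1".

```python
def too_many_n(seq, max_n):
    """Return True if the read would be discarded under a --max-n style threshold.

    Assumption: values below 1 are proportions of the read length,
    values of 1 or more are absolute counts of N bases.
    """
    n_count = seq.upper().count("N")
    if max_n < 1:
        if not seq:
            return False  # an empty read has no N bases to exceed the limit
        return n_count / len(seq) > max_n
    return n_count > max_n


read = "ACGTNNAC"  # made-up read: 2 N bases out of 8 (proportion 0.25)
print(too_many_n(read, 1))     # True: 2 N bases exceed an absolute limit of 1
print(too_many_n(read, 2))     # False: exactly at the limit
print(too_many_n(read, 0.2))   # True: 0.25 exceeds a proportion of 0.2
print(too_many_n(read, 0.25))  # False: exactly at the limit
```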
Options that influence what gets output to where:
--quiet Do not print a report at the end.
-o FILE, --output=FILE
Write modified reads to FILE. FASTQ or FASTA format is
chosen depending on input. The summary report is sent
to standard output. Use '{name}' in FILE to
demultiplex reads into multiple files. (default:
trimmed reads are written to standard output)
--info-file=FILE Write information about each read and its adapter
matches into FILE. See the documentation for the file
format.
-r FILE, --rest-file=FILE
When the adapter matches in the middle of a read,
write the rest (after the adapter) into FILE.
--wildcard-file=FILE
When the adapter has N bases (wildcards), write
adapter bases matching wildcard positions to FILE.
When there are indels in the alignment, this will
often not be accurate.
--too-short-output=FILE
Write reads that are too short (according to length
specified by -m) to FILE. (default: discard reads)
--too-long-output=FILE
Write reads that are too long (according to length
specified by -M) to FILE. (default: discard reads)
--untrimmed-output=FILE
Write reads that do not contain the adapter to FILE.
(default: output to same file as trimmed reads)
Colorspace options:
-c, --colorspace Enable colorspace mode: Also trim the color that is
adjacent to the found adapter.
-d, --double-encode
Double-encode colors (map 0,1,2,3,4 to A,C,G,T,N).
-t, --trim-primer Trim primer base and the first color (which is the
transition to the first nucleotide)
--strip-f3 Strip the _F3 suffix of read names
--maq, --bwa MAQ- and BWA-compatible colorspace output. This
enables -c, -d, -t, --strip-f3 and -y '/1'.
--no-zero-cap Do not change negative quality values to zero in
colorspace data. By default, they are changed to zero
since many tools have problems with negative
qualities.
-z, --zero-cap Change negative quality values to zero. This is
enabled by default when -c/--colorspace is also
enabled. Use the above option to disable it.
Paired-end options:
The -A/-G/-B/-U options work like their -a/-b/-g/-u counterparts.
-A ADAPTER 3' adapter to be removed from second read in a pair.
-G ADAPTER 5' adapter to be removed from second read in a pair.
-B ADAPTER 5'/3' adapter to be removed from second read in a pair.
-U LENGTH Remove LENGTH bases from the beginning or end of each
second read (see --cut).
-p FILE, --paired-output=FILE
Write second read in a pair to FILE.
--pair-filter=(any|both)
Which of the reads in a paired-end read have to match
the filtering criterion in order for it to be
filtered. Default: any.
--interleaved Read and write interleaved paired-end reads.
--untrimmed-paired-output=FILE
Write second read in a pair to this FILE when no
adapter was found in the first read. Use this option
together with --untrimmed-output when trimming paired-
end reads. (Default: output to same file as trimmed
reads.)
--too-short-paired-output=FILE
Write second read in a pair to this file if pair is
too short. Use together with --too-short-output.
--too-long-paired-output=FILE
Write second read in a pair to this file if pair is
too long. Use together with --too-long-output.
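The difference between the two --pair-filter values comes down to "or" versus "and". The Python sketch below follows the wording of the help entry only; `discard_pair` is a hypothetical helper, and the two booleans stand for "this read meets the filtering criterion" (for example, it is shorter than the -m threshold).

```python
def discard_pair(read1_matches, read2_matches, pair_filter="any"):
    """Return True if the whole pair would be filtered out.

    read1_matches / read2_matches -- whether each mate meets the filtering
    criterion (e.g. is too short); pair_filter mirrors --pair-filter.
    """
    if pair_filter == "any":
        return read1_matches or read2_matches
    if pair_filter == "both":
        return read1_matches and read2_matches
    raise ValueError("pair_filter must be 'any' or 'both'")


# Only read 1 is too short:
print(discard_pair(True, False, "any"))   # True: one short mate is enough
print(discard_pair(True, False, "both"))  # False: the pair is kept
```

With the default "any", a single short mate removes the whole pair, which is the behavior I rely on with -m 50.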