Seq crumbs
Remove contaminant adapter sequences from your reads prior to other NGS processing
All seq_crumbs tools aim to share a consistent interface. By default, most of them read from standard input and write to standard output, so they can be easily combined using Unix pipes. Alternatively, several input sequence files can be given as a list of arguments. Output can also be directed to a specific file with the -o (or --outfile) parameter.
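The stdin/stdout convention described above can be sketched in a few lines of Python. This is not seq_crumbs' own code: it is a hypothetical minimal FASTA length filter, and the names `read_fasta`, `filter_min_length` and the `--min-len` option are invented for illustration. Only `-o`/`--outfile` mirrors the option mentioned above.

```python
import argparse
import sys


def read_fasta(handle):
    """Yield (name, sequence) tuples from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_min_length(records, min_len):
    """Keep only the records whose sequence has at least min_len residues."""
    return (rec for rec in records if len(rec[1]) >= min_len)


def main(argv=None):
    parser = argparse.ArgumentParser(description="Hypothetical FASTA length filter")
    parser.add_argument("inputs", nargs="*", help="input files (default: stdin)")
    parser.add_argument("-o", "--outfile", help="output file (default: stdout)")
    parser.add_argument("--min-len", type=int, default=1)  # illustrative option
    args = parser.parse_args(argv)

    out = open(args.outfile, "w") if args.outfile else sys.stdout
    try:
        # With no input files given, fall back to standard input.
        for path in args.inputs or [None]:
            handle = open(path) if path else sys.stdin
            try:
                for name, seq in filter_min_length(read_fasta(handle), args.min_len):
                    out.write(f">{name}\n{seq}\n")
            finally:
                if path:
                    handle.close()
    finally:
        if args.outfile:
            out.close()
```

A script that ends by calling `main()` could then sit in the middle of a pipeline, reading whatever the previous command writes and passing its own output on to the next one.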
seq_crumbs supports gzip, BGZF and bzip2 compressed files. Compressed input files are detected automatically, and the output can also be written compressed.
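One common way to detect compressed input is to look at the first bytes of the file. The sketch below illustrates that idea with the Python standard library. It is an assumption about how such detection can work, not a description of seq_crumbs internals, and the function names are hypothetical. BGZF files are made of gzip-compatible blocks, so they share the gzip signature.

```python
import bz2
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # also matches BGZF, which is gzip-compatible
BZIP2_MAGIC = b"BZh"


def guess_compression(path):
    """Return 'gzip', 'bzip2' or None by inspecting the file's first bytes."""
    with open(path, "rb") as handle:
        head = handle.read(3)
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head.startswith(BZIP2_MAGIC):
        return "bzip2"
    return None


def open_maybe_compressed(path):
    """Open path for text reading, decompressing transparently if needed."""
    kind = guess_compression(path)
    if kind == "gzip":
        return gzip.open(path, "rt")
    if kind == "bzip2":
        return bz2.open(path, "rt")
    return open(path, "r")
```

This version works on named files only. Doing the same on standard input would need a buffered stream that can be peeked without consuming the data.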
The sequence formats accepted by seq_crumbs are those supported by Biopython's SeqIO module. For output, only fasta and Sanger and Illumina fastq files are supported.
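The two fastq variants differ in how quality scores are encoded as characters: Sanger fastq adds 33 to each Phred score, while the older Illumina fastq (versions 1.3 to 1.7) adds 64. The following sketch shows the conversion between them. The helper names are hypothetical and this is not how seq_crumbs or Biopython implement it.

```python
SANGER_OFFSET = 33    # Phred+33, Sanger fastq
ILLUMINA_OFFSET = 64  # Phred+64, Illumina 1.3 to 1.7 fastq


def decode_qualities(quality_string, offset):
    """Turn a fastq quality string into a list of integer Phred scores."""
    return [ord(char) - offset for char in quality_string]


def encode_qualities(scores, offset):
    """Turn a list of integer Phred scores into a fastq quality string."""
    return "".join(chr(score + offset) for score in scores)


def illumina_to_sanger(quality_string):
    """Re-encode an Illumina (Phred+64) quality string as Sanger (Phred+33)."""
    scores = decode_qualities(quality_string, ILLUMINA_OFFSET)
    return encode_qualities(scores, SANGER_OFFSET)
```

Because the same character means different scores in each variant, a file read with the wrong offset yields qualities that are off by 31, which is why the exact fastq variant matters when converting.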
seq_crumbs can take advantage of multiprocessor computers by splitting the computational load into several processes.
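As a rough illustration of splitting work across processes, the sketch below divides a list of sequences into chunks and hands each chunk to a worker from Python's `multiprocessing.Pool`. The chunking strategy, the GC-counting task and all function names are assumptions made for the example. They do not describe how seq_crumbs distributes its load.

```python
from itertools import islice
from multiprocessing import Pool


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def count_gc(chunk):
    """Worker: count G and C residues in one chunk of sequences."""
    return sum(seq.upper().count("G") + seq.upper().count("C") for seq in chunk)


def parallel_gc(seqs, processes=2, chunk_size=1000):
    """Split seqs into chunks and sum the per-chunk counts from a pool."""
    with Pool(processes) as pool:
        return sum(pool.imap_unordered(count_gc, chunked(seqs, chunk_size)))
```

In a script, `parallel_gc` should be called from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. Chunks should be large enough that the per-chunk work outweighs the cost of sending data between processes.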
The filtering seq crumbs can be made aware of paired reads and can filter both reads of pairs at once.
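A pair-aware filter can be sketched as follows, assuming the reads arrive interleaved (forward, reverse, forward, reverse, ...) as `(name, sequence)` tuples. The policy shown here, dropping the whole pair when either mate fails, is one possible choice made for the example, and the function names are hypothetical.

```python
def group_in_pairs(reads):
    """Group an interleaved read stream into (forward, reverse) tuples."""
    iterator = iter(reads)
    for forward in iterator:
        reverse = next(iterator, None)
        if reverse is None:
            raise ValueError(f"read {forward[0]!r} has no mate")
        yield forward, reverse


def filter_pairs(reads, keep):
    """Yield both reads of a pair only when every mate satisfies keep()."""
    for pair in group_in_pairs(reads):
        if all(keep(read) for read in pair):
            yield from pair
```

For example, `filter_pairs(reads, lambda read: len(read[1]) >= 50)` would discard every pair in which at least one mate is shorter than 50 residues, so the output never contains orphaned reads.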
You can find more information about seq_crumbs on the seq_crumbs website[1].
Available Crumbs
Crumb | Description |
---|---|
sff_extract | Extracts reads from an SFF file used by 454 and Ion Torrent. |
split_matepairs | Splits mate-pairs separated by an oligo sequence. |
filter_by_quality | Filters sequences according to mean quality. |
filter_by_length | Filters sequences according to maximum and minimum length thresholds. |
filter_by_name | Filters sequences with a list of names given in a file. |
filter_by_blast | Filters the sequences using BLAST. |
filter_by_complexity | Filters sequences according to their complexity. |
filter_by_bowtie2 | Filters the sequences using bowtie2. |
trim_by_case | Trims sequences according to case. |
trim_edges | Removes a fixed number of residues from sequence edges. |
trim_quality | Removes, using a sliding window, regions of low quality in the edges. |
trim_blast_short | Removes oligonucleotides by using the blast-short algorithm. |
convert_format | Converts between the different supported sequence formats. |
guess_seq_format | Guesses the format of a file, including Sanger and Illumina fastq formats. |
cat_seqs | Concatenates one or several input sequence files, possibly in different formats, into one output. |
seq_head | Outputs only the first sequences of the given input. |
sample_seqs | Outputs a random sampling of the input sequences. |
count_seqs | Counts the sequences in the input files. |
change_case | Modifies the case of sequences. Case can be converted to lower or upper, or swapped. |
pair_matcher | Filters out orphaned read pairs. |
interleave_pairs | Interleaves two ordered paired read files. |
deinterleave_pairs | Splits an ordered file of paired reads into two files, one for each end. |
calculate_stats | Generates basic statistics for the given sequence files. |
orientate_transcripts | Reverse complements transcripts according to polyA, ORF or BLAST hits. |
fastqual_to_fastq | Converts fasta and qual files to a fastq format file. |
References: