NGSUtils
Manipulate FastQ and BAM NGS data
NGSUtils [1][2] is a suite of software tools for working with next-generation sequencing datasets. Starting in 2009, we (Liu Lab @ Indiana University School of Medicine) began working with next-generation sequencing data. We initially wrote custom code for each project in a one-off manner. It quickly became apparent that this was an inefficient way to work, so we started assembling smaller utilities that could be adapted into larger, more complicated workflows. We have used them for Illumina, SOLiD, and 454 sequencing data, and for DNA and RNA resequencing, ChIP-Seq, CLIP-Seq, and targeted resequencing (Agilent exome capture and PCR targeting). These tools are also used heavily in our in-house DNA and RNA mapping pipelines.
These tools have been of great use within our lab group, so we are happy to make them available to the greater community.
NGSUtils is made up of 50+ programs (full list), mainly written in Python. These are separated into modules based on the type of file that is to be analyzed. There are four modules:
- bamutils (BAM/SAM files)
- bedutils (BED files)
- fastqutils (FASTQ files, base- and color-space)
- gtfutils (GTF gene models)
Each of these modules contains many commands for manipulating, filtering, converting, or analyzing these types of files. Check out the documentation for each module for more information about some of the commands available.
To get help, type: ngsutils help 'command-name'
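When scripting around these tools, the help lookup can be wrapped in a small helper. The sketch below follows the `<module> help CMD` form printed at the end of each module's usage text. The function names `help_argv` and `show_help` are hypothetical, and the code assumes the module executables are on the PATH; it returns nothing rather than failing when they are not.

```python
import shutil
import subprocess

# The four module executables named in this document.
MODULES = ("bamutils", "bedutils", "fastqutils", "gtfutils")


def help_argv(module, command):
    """Build the argument list for '<module> help <command>' (hypothetical helper)."""
    if module not in MODULES:
        raise ValueError("unknown module: %s" % module)
    return [module, "help", command]


def show_help(module, command):
    """Run the help command and return its text, or None if the tool is not on PATH."""
    argv = help_argv(module, command)
    if shutil.which(argv[0]) is None:
        return None  # NGSUtils is not installed, or not on PATH
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.stdout + result.stderr
```

For example, `show_help("bamutils", "filter")` would return the help text for the `filter` command when NGSUtils is installed.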
bamutils
Usage: bamutils COMMAND

Commands
  DNA-seq
    basecall - Base/variant caller
  RNA-seq
    count - Calculates counts/FPKM for genes/BED regions/repeats (also CNV)
  General
    best - Filter out multiple mappings for a read, selecting only the best
    convertregion - Converts region mapping to genomic mapping
    export - Export reads, mapped positions, and other tags
    expressed - Finds regions expressed in a BAM file
    extract - Extracts reads based on regions in a BED file
    filter - Removes reads from a BAM file based on criteria
    innerdist - Calculate the inner mate-pair distance from two BAM files
    keepbest - Parses BAM file and keeps the best mapping for reads that have multiple mappings
    merge - Combine multiple BAM files together (taking best-matches)
    pair - Given two separately mapped paired files, re-pair the files
    peakheight - Find the size (max height, width) of given peaks (BED) in a BAM file
    renamepair - Postprocesses a BAM file to rename pairs that have an extra /N value
    split - Splits a BAM file into smaller pieces
    stats - Calculates simple stats for a BAM file
    tag - Update read names with a suffix (for merging)
  Conversion
    tobed - Convert BAM reads to BED regions
    tobedgraph - Convert BAM coverage to bedGraph (for visualization)
    tofasta - Convert BAM reads to FASTA sequences
    tofastq - Convert BAM reads back to FASTQ sequences
  Misc
    check - Checks a BAM file for corruption
    cleancigar - Fixes BAM files where the CIGAR alignment has a zero length element

Run 'bamutils help CMD' for more information about a specific command

ngsutils 0.5.5-2232b67
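The `count` command reports FPKM values. As a reminder of what that unit means, here is a minimal sketch of the standard FPKM definition (fragments per kilobase of region per million mapped fragments). It is not taken from the NGSUtils source, and `bamutils count` may normalize differently; the function name `fpkm` is hypothetical.

```python
def fpkm(fragment_count, region_length_bp, total_mapped_fragments):
    """Standard FPKM: fragments per kilobase of region per million mapped fragments.

    This is the textbook formula, not necessarily what 'bamutils count' computes.
    """
    if region_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("region length and total fragments must be positive")
    # count / (length / 1e3) / (total / 1e6)  ==  count * 1e9 / (length * total)
    return fragment_count * 1e9 / (region_length_bp * total_mapped_fragments)
```

With 500 fragments on a 2,000 bp gene and 10 million mapped fragments in total, this gives an FPKM of 25.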
bedutils
Usage: bedutils COMMAND

Commands
  General
    clean - Cleans a BED file (score should be integers)
    extend - Extends BED regions (3')
    reduce - Merges overlapping BED regions
    refcount - Given a number of BED files, calculate the number of samples that overlap regions in a reference BED file
    sizes - Extract the sizes of BED regions
    sort - Sorts a BED file (in place)
    stats - Calculates simple stats for a BED file
    subtract - Subtracts one set of BED regions from another
  Conversion
    annotate - Annotate BED files by adding / altering columns
    frombasecall - Converts a file in basecall format to BED3 format
    fromprimers - Converts a list of PCR primer pairs to BED regions
    fromvcf - Converts a file in VCF format to BED6
    tobed3 - Removes extra columns from a BED (or BED compatible) file
    tobed6 - Removes extra columns from a BED (or BED compatible) file
    tobedgraph - BED to BedGraph
    tofasta - Extract BED regions from a reference FASTA file
  Misc
    cleanbg - Cleans up a bedgraph file

Run 'bedutils help CMD' for more information about a specific command

ngsutils 0.5.5-2232b67
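To illustrate the idea behind `reduce` (merging overlapping BED regions), here is a minimal sketch of interval merging. It is an illustration only: it ignores strand, merges only regions that strictly overlap (adjacent regions that merely touch are kept separate), and the function name `merge_overlapping` is hypothetical. How `bedutils reduce` treats strand and adjacent regions is not stated in this document.

```python
def merge_overlapping(regions):
    """Merge overlapping (chrom, start, end) regions.

    Coordinates are BED-style: 0-based start, exclusive end.
    Assumption: strand is ignored and touching regions are not merged.
    """
    merged = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            # Overlaps the previous region on the same chromosome: extend it.
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged
```

Sorting first means each region only needs to be compared with the most recently merged one, so the merge itself is a single pass.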
fastqutils
Usage: fastqutils COMMAND

Commands
  General
    barcode_split - Splits a FASTQ/FASTA file based on sequence barcodes
    filter - Filter out reads using a number of metrics
    merge - Merges paired FASTQ files into one file
    names - Write out the read names
    properpairs - Find properly paired reads (when fragments are filtered separately)
    revcomp - Reverse compliment a FASTQ file
    sort - Sorts a FASTQ file by name or sequence
    split - Splits a FASTQ file into N chunks
    stats - Calculate summary statistics for a FASTQ file
    tag - Adds a prefix or suffix to the read names in a FASTQ file
    tile - Splits long FASTQ reads into smaller (tiled) chunks
    trim - Remove 5' and 3' linker sequences (slow, S/W aligned)
    truncate - Truncates reads to a maximum length
    unmerge - Unmerged paired FASTQ files into two (or more) files
  Conversion
    convertqual - Converts qual values from Illumina to Sanger scale
    csencode - Converts color-space FASTQ file to encoded FASTQ
    fromfasta - Converts (cs)FASTA/qual files to FASTQ format
    fromqseq - Converts Illumina qseq (export/sorted) files to FASTQ
    tofasta - Converts to FASTA format (seq or qual)

Run 'fastqutils help CMD' for more information about a specific command

ngsutils 0.5.5-2232b67
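The kind of conversion that `convertqual` describes can be sketched as a re-encoding of the quality string. The example below assumes "Illumina scale" means the Phred+64 encoding used by Illumina pipeline versions 1.3 through 1.7, and that "Sanger scale" means Phred+33. The older Solexa scale uses a different score formula and is not handled here. The function name `illumina_to_sanger` is hypothetical, and this is not the NGSUtils implementation.

```python
def illumina_to_sanger(qual):
    """Re-encode a Phred+64 quality string as Phred+33.

    Assumption: input uses the Illumina 1.3-1.7 Phred+64 encoding.
    """
    out = []
    for ch in qual:
        score = ord(ch) - 64  # decode the Phred score
        if score < 0:
            raise ValueError("character %r is not valid Phred+64" % ch)
        out.append(chr(score + 33))  # encode with the Sanger offset
    return "".join(out)
```

A quality character of `h` (Phred 40 in the +64 encoding) becomes `I`, which is Phred 40 in the +33 encoding.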
gtfutils
Usage: gtfutils COMMAND

Commands
  General
    add_isoform - Appends isoform annotation from UCSC isoforms file
    add_reflink - Appends isoform/name annotation from RefSeq/refLink
    add_xref - Appends name annotation from UCSC Xref file
    annotate - Annotates genomic positions based on a GTF model
    filter - Filter annotations from a GTF file
    genesize - Extract genomic/transcript sizes for genes
    junctions - Build a junction library from FASTA and GTF model
    query - Query a GTF file by coordinates
  Conversion
    tobed - Convert a GFF/GTF file to BED format

Run 'gtfutils help CMD' for more information about a specific command

ngsutils 0.5.5-2232b67
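Converting GTF to BED, as `tobed` does, involves a coordinate shift that is easy to get wrong: GTF positions are 1-based and inclusive, while BED positions are 0-based with an exclusive end. The sketch below shows that shift for a single line. The choice of `gene_id` as the BED name, the fallback to the feature type, and the function name `gtf_line_to_bed6` are assumptions made for illustration; the columns that `gtfutils tobed` actually writes may differ.

```python
import re


def gtf_line_to_bed6(line):
    """Convert one tab-separated GTF line to a BED6 line (illustrative only)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated GTF fields")
    chrom, _source, feature, start, end, score, strand, _frame, attrs = fields
    # Assumption: use gene_id as the BED name, falling back to the feature type.
    match = re.search(r'gene_id "([^"]+)"', attrs)
    name = match.group(1) if match else feature
    # Simplification: a missing GTF score ('.') becomes 0; other values pass through.
    bed_score = "0" if score == "." else score
    # 1-based inclusive start becomes 0-based; the end is numerically unchanged.
    return "\t".join([chrom, str(int(start) - 1), end, name, bed_score, strand])
```

An exon spanning GTF positions 101 to 200 therefore becomes the BED interval 100 to 200, which has the same length of 100 bases.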
References:
- [1] Marcus R Breese, Yunlong Liu. NGSUtils: a software suite for analyzing and manipulating next-generation sequencing datasets. Bioinformatics 2013, 29(4):494-6. PubMed: 23314324
- [2] http://ngsutils.org
- [3] https://github.com/mbreese/tabutils