PGA2 (Prokaryotic Genome Assembly & Annotation) Pipeline

We have developed the PGA2 pipeline for whole genome assembly and annotation of prokaryotic genomes from Next Generation Sequencing (NGS) data. Genome sequencers produce their raw data as FASTQ reads, which are the input to the pipeline.

PGA2 is an automated pipeline that 1) filters the FASTQ reads, 2) assembles the filtered reads, and 3) annotates the assembled genome.

The PGA2 pipeline is written in Perl and integrates several existing tools. The NGSQC toolkit (Patel RK et al., 2012) filters the FASTQ reads, the Velvet assembler (https://www.ebi.ac.uk/~zerbino/velvet/) performs the genome assembly, and the PROKKA pipeline (Prokka: Prokaryotic Genome Annotation System) (Torsten Seemann, 2014) annotates the assembled genome.

Users provide short reads in FASTQ format to the PGA2 pipeline and obtain the filtered reads, the assembled contigs, and the annotated genome.

PGA2 Pipeline Usage

The PGA2 pipeline runs on Linux, and users can download and run it on any Linux platform.

Download PGA2 Pipeline

Installation

1. After downloading the Pga2.tar.gz package, extract it with the command tar -zxvf Pga2.tar.gz.

2. Run the Perl program install.pl (perl install.pl) and provide the path where PGA2 should be installed.

3. After installation, all Perl programs and their configuration files (config_file_ALL, config_file_filter, config_file_assemble, pga2_filter.pl, pga2_assemble.pl, pga2_annotate.pl and pga2_ALL.pl) will be available in the bin directory.
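A quick way to confirm that the installation is complete is to check the bin directory for each file listed in step 3. The Bash function below is a minimal sketch; the name verify_pga2_install and its directory argument are hypothetical and not part of the PGA2 package.

```shell
#!/usr/bin/env bash
# Sketch: check that a PGA2 bin directory holds every file listed in step 3.
# verify_pga2_install is a hypothetical helper, not part of the PGA2 package.

verify_pga2_install() {
    local bin_dir=${1:-} f status=0
    # Require an existing directory as the only argument.
    if [ -z "$bin_dir" ] || [ ! -d "$bin_dir" ]; then
        echo "usage: verify_pga2_install PGA2_BIN_DIR" >&2
        return 2
    fi
    # Report every expected file that is absent; do not stop at the first one.
    for f in config_file_ALL config_file_filter config_file_assemble \
             pga2_filter.pl pga2_assemble.pl pga2_annotate.pl pga2_ALL.pl; do
        if [ ! -f "$bin_dir/$f" ]; then
            echo "missing: $bin_dir/$f" >&2
            status=1
        fi
    done
    return "$status"
}
```

After sourcing the file, call it as verify_pga2_install /path/to/pga2/bin, where /path/to/pga2 is a placeholder for the path given to install.pl. A zero exit status means all seven files were found.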

Usage

1. Filtering of whole genome sequencing data: perl pga2_filter.pl config_file_filter

2. Whole genome assembly: perl pga2_assemble.pl config_file_assemble

3. Prokaryotic genome annotation: perl pga2_annotate.pl contig.fasta (where contig.fasta is the assembled contig file)

4. Complete PGA2 pipeline (whole genome assembly and annotation from raw sequencing data): perl pga2_ALL.pl config_file_ALL

Note: Users can change the parameters in the configuration files to suit their requirements.
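The three stepwise commands can also be chained in a small Bash script so that each step runs only if the previous one succeeded. This is a minimal sketch, not part of PGA2: the function run_pga2_stepwise and the PGA2_BIN and PERL variables are hypothetical, the configuration files are assumed to sit in the bin directory as described under Installation, and because the name of the contig file written by pga2_assemble.pl is not documented here, it is passed in as an argument.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of PGA2): run filter, assemble and annotate
# in order, stopping at the first step that fails.
# ASSUMPTIONS: PGA2_BIN is the PGA2 bin directory holding both the scripts
# and the edited config files; the contig file name is supplied by the user.

PGA2_BIN=${PGA2_BIN:-./bin}   # hypothetical default; point it at your install
PERL=${PERL:-perl}            # Perl interpreter used to run each step

run_pga2_stepwise() {
    local contigs=${1:-}
    if [ -z "$contigs" ]; then
        echo "usage: run_pga2_stepwise CONTIG_FASTA" >&2
        return 2
    fi
    # Step 1: filter the FASTQ reads.
    "$PERL" "$PGA2_BIN/pga2_filter.pl" "$PGA2_BIN/config_file_filter" || return 1
    # Step 2: assemble the filtered reads.
    "$PERL" "$PGA2_BIN/pga2_assemble.pl" "$PGA2_BIN/config_file_assemble" || return 1
    # Step 3: annotate the assembled contigs.
    "$PERL" "$PGA2_BIN/pga2_annotate.pl" "$contigs" || return 1
}
```

For the complete workflow, perl pga2_ALL.pl config_file_ALL remains the single-command option; a wrapper like this is mainly useful when you want to inspect or rerun individual steps.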

Dependencies

The PGA2 pipeline is a user-friendly package, and anyone can download and install its standalone package from this webpage.

The PGA2 pipeline has the following dependencies:

Requirements

GD::Graph Perl module (optional, used to prepare graphs)
String::Approx Perl module (required to speed up the string matching for primer/adapter)
BioPerl >= 1.6.2
BLAST+ >= 2.2
HMMer >= 3.1
Aragorn >= 1.0
Prodigal >= 2.0
tbl2asn >= 21.0
GNU Parallel >= 20130422

Optional

RNAmmer >= 1.2 (requires patch to ensure it uses older HMMer 2.x)
HMMer >= 2.0 (for RNAmmer)
SignalP >= 3.0 (for --gram / sig_peptide predictions)

For a fully functional PGA2 pipeline, all of these dependencies should be installed.
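Before running PGA2, it can help to confirm that the required tools and Perl modules are visible. The Bash sketch below assumes the usual executable name for each tool (blastp for BLAST+, hmmscan for HMMer 3, aragorn, prodigal, tbl2asn, and parallel for GNU Parallel) and checks BioPerl through its Bio::Perl module. It checks only presence, not the minimum versions listed above, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report which PGA2 dependencies are visible on this system.
# ASSUMPTION: the executable names below are the usual command names for
# each tool; edit them if your installation differs. Versions are NOT checked.

# Succeeds if a command is found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Succeeds if perl is available and can load the given module.
have_perl_module() {
    have_cmd perl && perl -M"$1" -e 1 >/dev/null 2>&1
}

# Prints one line per dependency and returns the number of missing
# required dependencies (0 means everything required was found).
check_pga2_deps() {
    local missing=0 tool mod
    # Required tools: BLAST+, HMMer 3, Aragorn, Prodigal, tbl2asn, GNU Parallel.
    for tool in blastp hmmscan aragorn prodigal tbl2asn parallel; do
        if have_cmd "$tool"; then
            echo "found:    $tool"
        else
            echo "MISSING:  $tool"
            missing=$((missing + 1))
        fi
    done
    # Required Perl modules (Bio::Perl ships with BioPerl).
    for mod in String::Approx Bio::Perl; do
        if have_perl_module "$mod"; then
            echo "found:    $mod"
        else
            echo "MISSING:  $mod"
            missing=$((missing + 1))
        fi
    done
    # GD::Graph is optional (only used for graphs), so it is not counted.
    if have_perl_module GD::Graph; then
        echo "found:    GD::Graph (optional)"
    else
        echo "absent:   GD::Graph (optional)"
    fi
    return "$missing"
}

# Run the check and summarise on failure without aborting the shell.
check_pga2_deps || echo "$? required dependencies missing" >&2
```

The optional tools (RNAmmer, HMMer 2.x, SignalP) are left out on purpose; add them to the first loop if your configuration uses them.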

Availability

The PGA2 pipeline is available on request by email to support@nextgenhelper.com.

Acknowledgements

1. NGSQC toolkit developers (Patel RK et al., PLoS One 7(2): e30619, 2012).

2. PROKKA pipeline developers (Torsten Seemann, Bioinformatics 30(14): 2068-2069, 2014).

3. Velvet tool developers (D.R. Zerbino and E. Birney, Genome Research 18: 821-829, 2008).