PGA2 (Prokaryotic Genome Assembly & Annotation) Pipeline

We have developed the PGA2 pipeline for whole-genome assembly and annotation of prokaryotic genomes from data generated by Next Generation Sequencing (NGS) technologies. NGS sequencers produce their raw data as reads in FASTQ format.

PGA2 is an automated pipeline that 1) filters the FASTQ reads, 2) assembles the filtered reads, and 3) annotates the assembled genome.

The PGA2 pipeline is written in Perl and integrates several existing tools. The NGSQC Toolkit (Patel RK et al., 2012) is used to filter the FASTQ reads, Velvet (Zerbino and Birney, 2008) is used for genome assembly, and the Prokka prokaryotic genome annotation pipeline (Seemann, 2014) is used to annotate the assembled genome.
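The filtering stage can be pictured as a per-read quality test. The Python sketch below is not NGSQC code: it assumes Phred+33 quality encoding (Illumina 1.8 and later) and keeps a read only if at least 70% of its bases have a Phred score of 20 or more, thresholds assumed here to resemble NGSQC's default high-quality criterion. The values PGA2 actually applies are set in config_file_filter. The function names read_fastq, is_high_quality and filter_fastq are hypothetical.

```python
import io
from typing import Iterator, List, TextIO, Tuple

# (header, sequence, quality) for one FASTQ record
Record = Tuple[str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a FASTQ stream with exactly 4 lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record starting at {header!r}")
        yield header, seq, qual


def is_high_quality(qual: str, min_phred: int = 20, min_fraction: float = 0.70,
                    offset: int = 33) -> bool:
    """True if at least min_fraction of the bases have Phred >= min_phred.

    Assumes Phred+33 encoding; the Q20 / 70% thresholds are illustrative assumptions.
    """
    if not qual:
        return False
    good = sum(1 for ch in qual if ord(ch) - offset >= min_phred)
    return good / len(qual) >= min_fraction


def filter_fastq(handle: TextIO, **thresholds) -> List[Record]:
    """Return only the reads that pass the quality test."""
    return [rec for rec in read_fastq(handle) if is_high_quality(rec[2], **thresholds)]


if __name__ == "__main__":
    sample = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n!!!!\n"
    for header, seq, _ in filter_fastq(io.StringIO(sample)):
        print(header, seq)  # only r1 passes ('I' encodes Q40, '!' encodes Q0)
```

NGSQC also trims and removes adapter or primer contaminated reads, which this sketch leaves out.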

Users provide short reads in FASTQ format to the PGA2 pipeline and obtain the filtered reads, the assembled contigs, and the annotated genome.
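For a first look at the assembled contigs (for example contig.fasta), a short script can report the number of contigs, the total assembly length, and N50, the contig length L such that contigs of length L or longer cover at least half of the assembled bases. This is an independent illustrative helper, not part of PGA2, and the function names are hypothetical.

```python
import io
from typing import Dict, List, TextIO


def contig_lengths(handle: TextIO) -> List[int]:
    """Return the length of each sequence in a (possibly line-wrapped) FASTA stream."""
    lengths: List[int] = []
    current = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Largest L such that contigs of length >= L hold at least half the total bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # no contigs


def contig_summary(handle: TextIO) -> Dict[str, int]:
    """Count, total length, longest contig and N50 for a FASTA stream."""
    lengths = contig_lengths(handle)
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
    }


if __name__ == "__main__":
    # On a real assembly: with open("contig.fasta") as fh: print(contig_summary(fh))
    demo = io.StringIO(">c1\nACGTAC\n>c2\nACG\n>c3\nAC\n")
    print(contig_summary(demo))
```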

PGA2 Pipeline Usage

The PGA2 pipeline runs on Linux; users can download it and run it on any standard Linux system.

Download PGA2 Pipeline



1. After downloading the Pga2.tar.gz package, extract it with the command: tar -zxvf Pga2.tar.gz

2. Run the Perl installation program and provide the path where PGA2 should be installed.

3. After installation, all Perl programs and their configuration files (e.g. config_file_ALL, config_file_filter and config_file_assemble) are available in the bin directory.



1. Filtering of whole-genome sequencing data: perl config_file_filter

2. Whole-genome assembly: perl config_file_assemble

3. Prokaryotic genome annotation: perl contig.fasta (contig.fasta is the contig file)

4. Complete PGA2 pipeline (whole-genome assembly and annotation from raw sequencing data): perl config_file_ALL

Note: Users can adjust the parameters in the configuration files to suit their requirements.



The PGA2 pipeline is a standalone package that users can download from this webpage and install locally.

The PGA2 pipeline has the following dependencies:


  • GD::Graph perl module (optional, used to prepare graphs)

  • String::Approx perl module (required to speed up the string matching for primer/adapter)

  • BioPerl >= 1.6.2

  • BLAST+ >= 2.2

  • HMMer >= 3.1

  • Aragorn >= 1.0

  • Prodigal >= 2.0

  • tbl2asn >= 21.0

  • GNU Parallel >= 20130422



  • Barrnap >= 0.1 (fast rRNA searching using NHMMER)

  • MINCED >= 0.1.4 (find CRISPRs)

  • Infernal >= 1.1rc (for --rfam /non-coding RNA predictions)



  • RNAmmer >= 1.2 (requires patch to ensure it uses older HMMer 2.x)

  • HMMer >= 2.0 (for RNAmmer)

  • SignalP >= 3.0 (for --gram / sig_peptide predictions)

For a fully functional PGA2 pipeline, all of these dependencies should be installed.
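Before running PGA2, it can help to confirm that the dependencies are reachable. The Python sketch below looks up on PATH the executables each tool usually installs and asks perl whether the listed Perl modules load. The executable names (for example blastp, hmmscan, cmscan, velveth) are assumptions based on the tools' usual distributions, not a list taken from PGA2; Velvet is included because the pipeline integrates it for assembly, and Bio::SeqIO is used as a stand-in for BioPerl as a whole.

```python
import shutil
import subprocess
from typing import Callable, Dict, List, Optional

# Assumed executable names per dependency (typical install names, not read from PGA2).
EXPECTED_EXECUTABLES: Dict[str, List[str]] = {
    "Velvet": ["velveth", "velvetg"],
    "BLAST+": ["blastp", "makeblastdb"],
    "HMMer 3": ["hmmscan"],
    "Aragorn": ["aragorn"],
    "Prodigal": ["prodigal"],
    "tbl2asn": ["tbl2asn"],
    "GNU Parallel": ["parallel"],
    "Barrnap": ["barrnap"],
    "MINCED": ["minced"],
    "Infernal": ["cmscan"],
    "RNAmmer": ["rnammer"],
    "SignalP": ["signalp"],
}

# Perl modules from the dependency list (Bio::SeqIO stands in for BioPerl).
PERL_MODULES: List[str] = ["GD::Graph", "String::Approx", "Bio::SeqIO"]


def missing_executables(
    expected: Dict[str, List[str]] = EXPECTED_EXECUTABLES,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, List[str]]:
    """Map each tool to those of its executables that are not found on PATH."""
    missing: Dict[str, List[str]] = {}
    for tool, exes in expected.items():
        absent = [exe for exe in exes if which(exe) is None]
        if absent:
            missing[tool] = absent
    return missing


def perl_module_available(module: str) -> bool:
    """True if `perl -M<module> -e 1` exits successfully."""
    if shutil.which("perl") is None:
        return False
    result = subprocess.run(["perl", f"-M{module}", "-e", "1"], capture_output=True)
    return result.returncode == 0


if __name__ == "__main__":
    for tool, exes in missing_executables().items():
        print(f"missing {tool}: {', '.join(exes)}")
    for module in PERL_MODULES:
        if not perl_module_available(module):
            print(f"missing Perl module: {module}")
```

The which parameter is injectable so the lookup can be tested without the tools installed. The script only checks presence, not the minimum versions listed above.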


The PGA2 pipeline is available on special request to us by email.


1. NGSQC toolkit developers (Patel RK et al., PLoS One 7(2): e30619, 2012).

2. Prokka pipeline developers (Seemann T, Bioinformatics 30(14): 2068–2069, 2014).

3. Velvet tool developers (Zerbino DR and Birney E, Genome Research 18: 821–829, 2008).
