MEGAN6 pipeline

Hello there! This is the 4th part of the pipeline: setting up MEGAN6 Community Edition [1] to process the data coming from the DIAMOND [2] read aligner.

  1. HISAT2
  2. Samtools
  3. DIAMOND
  4. MEGAN6
  5. Plots

Requirements

MEGAN6 Community Edition, version 6.16 or greater
eggNOG, InterPro2GO and SEED mapping files
Java runtime environment, version 11 or greater
xvfb-run, needed to run the compute-comparison tool

Pipeline

The MEGAN6 pipeline we set up runs from the command line and has two parts:

  1. Meganize all .daa files with the daa-meganizer tool provided by MEGAN6, so they are ready for the second part of the pipeline.
  2. Compute the comparison between all “meganized” files with the compute-comparison tool, also provided by MEGAN6.

Note: These tools are in the tools folder of the MEGAN6 installation.

Explaining the pipeline

We use the command line to improve the throughput of the pipeline: this way we can run it on the multi-core supercomputers available for our research. The first part of the MEGAN6 pipeline prepares the samples, stored as .daa files, so that they can be read by the compute-comparison tool. The second part takes all the samples and generates a single comparison file containing the mappings to the databases provided (eggNOG, InterPro2GO and SEED). This file can then be analysed in the MEGAN6 GUI.

Pipeline process

We had two supercomputers available for our computation: NPAD and Bioinformatics3. NPAD has an older version of MEGAN6 (version 6.12.3, built 14 Aug 2018), which works just fine for meganizing with separate database files, but has a couple of drawbacks: it does not contain the compute-comparison tool and, at the time of writing this tutorial, it does not have the latest databases available. On the Bioinformatics3 server, on the other hand, it was possible to set up what was then the latest version of MEGAN6 (version 6.18.5, built 14 Feb 2020). This version uses a single main database file containing all the updated databases we map our files to. This small difference (one file with all databases, as opposed to separate files) leads to a different daa-meganizer syntax.
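The difference between the two syntaxes can be sketched as a small helper that picks the mapping options from the kind of mapping files at hand. This is only an illustration: the function name mapping_args is hypothetical, and the flags are the ones used in the two scripts of Part 1 (-mdb for the single database file; -a2t, -a2eggnog, -a2interpro2go and -a2seed for the separate files).

```shell
# Hypothetical helper: print the mapping options to pass to daa-meganizer.
#   one argument ending in .db -> single mapping database (MEGAN6 6.18.5 setup)
#   four arguments             -> separate taxonomy, eggNOG, InterPro2GO and
#                                 SEED files (MEGAN6 6.12.3 setup)
# Paths containing spaces are not handled in this sketch.
mapping_args() {
  if [ "$#" -eq 1 ]; then
    case "$1" in
      *.db) printf '%s\n' "-mdb $1" ;;
      *) echo "expected a .db mapping file" >&2; return 1 ;;
    esac
  elif [ "$#" -eq 4 ]; then
    printf '%s\n' "-a2t $1 -a2eggnog $2 -a2interpro2go $3 -a2seed $4"
  else
    echo "usage: mapping_args MAP.db | TAX EGGNOG INTERPRO SEED" >&2
    return 1
  fi
}

# Single database file (Bioinformatics3)
mapping_args megan-map-Oct2019.db

# Separate files (NPAD)
mapping_args nucl_acc2tax-Jul2019.abin acc2eggnog-Jul2019X.abin \
  acc2interpro-Jul2019X.abin acc2seed-May2015XX.abin
```

The first call prints the options for the single-file setup and the second one the options for the separate-file setup; everything else on the daa-meganizer command line stays the same.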

Part 1 - Meganizing files

Bioinformatics3 server

Assuming you have downloaded and installed MEGAN6, and also downloaded the database map file, it is good practice to set up a batch script to run the computation:

#!/bin/bash

# Setting constants
global_dir="/data/home/leozenon"
db_dir="$global_dir/db"
map="$db_dir/megan-map-Oct2019.db"
file="1.INF.daa"

# I have tools as a symbolic link to the MEGAN tools in my home folder:
# 28 Mar  3 18:31 tools -> /data/home/root/megan6/tools

# Meganizing (paths are quoted so that spaces do not break the command)
./tools/daa-meganizer -i "$db_dir/$file" -mdb "$map" -pr

# Moving the file to another directory after the job is done
mv "$db_dir/$file" "$db_dir/mega/"
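The script above handles a single file (1.INF.daa). A minimal sketch of how the same two commands could be repeated for every .daa file in a folder follows. The function name meganize_all and the DRY_RUN switch are assumptions made for this illustration, not part of MEGAN6; the daa-meganizer options are the ones used in the script above.

```shell
# Sketch: meganize every .daa file in a folder, one after another.
# With DRY_RUN=1 (the default here) the commands are only printed,
# so they can be checked before anything is actually run.
meganize_all() {
  in_dir=$1    # folder with the .daa files produced by DIAMOND
  map=$2       # mapping database, e.g. megan-map-Oct2019.db
  out_dir=$3   # folder that receives the meganized files
  for f in "$in_dir"/*.daa; do
    [ -e "$f" ] || continue   # the glob matched nothing
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "./tools/daa-meganizer -i $f -mdb $map -pr"
      echo "mv $f $out_dir/"
    else
      # Move the file only if meganizing succeeded
      ./tools/daa-meganizer -i "$f" -mdb "$map" -pr && mv "$f" "$out_dir"/
    fi
  done
}

meganize_all "/data/home/leozenon/db" \
  "/data/home/leozenon/db/megan-map-Oct2019.db" \
  "/data/home/leozenon/db/mega"
```

Setting DRY_RUN=0 before the call runs the commands for real. Unlike the single-file script, this sketch moves a file only when daa-meganizer exits successfully, so failed samples stay in the input folder.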

NPAD server

The NPAD server has a different way of setting up its jobs, which is described on its own FAQ website. First, I copied the daa-meganizer tool to my global folder under the name daa-meg and edited it to use more of the server's resources:

#!/bin/bash
# Runs the DAA-Meganizer command-line program
#
# Copyright (C) 2019 Daniel H. Huson
#
# Use only as permitted under MEGAN Ultimate Edition license agreement, do not redistribute.

bin_dir=`dirname "/opt/npad/shared/softwares/python/3.6-anaconda-5.0.1/envs/megan6/opt/megan-6.12.3/tools/daa-meganizer"`
bin_dir=`cd "$bin_dir"  && pwd` # ensure absolute path
jars_dir="$bin_dir/../jars"

jre_dir=/opt/npad/shared/softwares/python/3.6-anaconda-5.0.1/envs/megan6

if [ -z $jre_dir ]
then
    java=java
    vmOptions="-Xmx64G"
    classpath="../antbuild/MEGAN.jar:$jars_dir/MALT.jar:$jars_dir/data.jar:"
else
    java=$jre_dir/bin/java
    vmOptions="-Xmx64000M"
    classpath="$jars_dir/MEGAN.jar:$jars_dir/MALT.jar:$jars_dir/data.jar:"
fi

options=$*
if [ $# ==  0 ]
then
    options="-h"
fi

java_flags="-server -Duser.language=en -Duser.region=US -Djava.awt.headless=true $vmOptions"

$java $java_flags -cp "$classpath" megan.tools.DAAMeganizer $options

After that, using the NPAD supercomputer resources (memory and cores) requires a specific syntax, which is explained in NPAD's own tutorial [3]. This is the batch script used for this specific pipeline:

#!/bin/bash

#SBATCH --time=1-0:0
#SBATCH --mem=64000

global_dir="/home/lztassi/global"
db_dir="$global_dir/db"
acc_nucl="$global_dir/acc/nucl_acc2tax-Jul2019.abin"
acc_egg="$global_dir/acc/acc2eggnog-Jul2019X.abin"
acc_inter="$global_dir/acc/acc2interpro-Jul2019X.abin"
acc_seed="$global_dir/acc/acc2seed-May2015XX.abin"
file="1.INF.daa"

# Meganizing with the separate mapping files (paths are quoted so that spaces do not break the command)
./daa-meg -i "$db_dir/$file" -pr -alg weighted -a2t "$acc_nucl" -a2eggnog "$acc_egg" -a2interpro2go "$acc_inter" -a2seed "$acc_seed"

# Moving the file to another directory after the job is done
mv "$db_dir/$file" "$db_dir/mega/"

The expected output is similar to the following (this log comes from the Bioinformatics3 server, for sample 3.INF.daa):

Version   MEGAN Community Edition (version 6.18.5, built 14 Feb 2020)
Copyright (C) 2019 Daniel H. Huson. This program comes with ABSOLUTELY NO WARRANTY.
Java version: 11.0.1
Functional classifications to use: EGGNOG, INTERPRO2GO, SEED
Loading ncbi.map: 2,175,506
Loading ncbi.tre: 2,175,510
Loading eggnog.map:    30,875
Loading eggnog.tre:    30,986
Loading interpro2go.map:    13,501
Loading interpro2go.tre:    29,204
Loading seed.map:    13,662
Loading seed.tre:    21,085
Meganizing: /data/home/leozenon/db/3.INF.daa
Meganizing init
Annotating DAA file using FAST mode (accession database and first accession per line)
Annotating references
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (17.0s)
Writing
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (3.1s)
100% (0.0s)
Binning reads Initializing...
Initializing binning...
WARNING: Not an RMA6 file, will ignore paired read information
Using 'Naive LCA' algorithm for binning: Taxonomy
Using Best-Hit algorithm for binning: SEED
Using Best-Hit algorithm for binning: EGGNOG
Using Best-Hit algorithm for binning: INTERPRO2GO
Binning reads...
100% (1.1s)
Binning reads Analyzing alignments
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (363.1s)
Total reads:        8,394,155
With hits:           8,394,155 
Alignments:        171,438,213
Assig. Taxonomy:     8,367,088
Assig. SEED:           300,729
Assig. EGGNOG:       4,889,350
Assig. INTERPRO2GO:  6,517,782
MinSupport set to: 4197
100% (0.0s)
Binning reads Applying min-support & disabled filter to Taxonomy...
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (0.4s)
Min-supp. changes:       1,657
100% (21.3s)
Binning reads Writing classification tables
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (11.9s)
Binning reads Syncing
100% (0.1s)
Class. Taxonomy:           154
Class. SEED:               966
Class. EGGNOG:           4,235
Class. INTERPRO2GO:      8,936
Total time:  427s
Peak memory: 22.8 of 536.9G

Part 2 - Compute Comparison

As stated at the beginning of this section, this computation could only be done on the Bioinformatics3 server, because of its MEGAN6 version.

After meganizing all the .daa files, the comparison is computed with the command xvfb-run ./tools/compute-comparison -i files -o output.

Example:

xvfb-run ./tools/compute-comparison -i \
  db/mega/1.INF.daa db/mega/2.INF.daa db/mega/3.INF.daa db/mega/4.INF.daa \
  db/mega/5.INF.daa db/mega/6.INF.daa db/mega/7.INF.daa db/mega/8.INF.daa \
  db/mega/9.INF.daa db/mega/10.INF.daa db/mega/11.INF.daa db/mega/12.INF.daa \
  db/mega/13.INF.daa db/mega/14.INF.daa db/mega/15.INF.daa db/mega/16.INF.daa \
  db/mega/17.INF.daa db/mega/18.INF.daa db/mega/19.INF.daa \
  db/mega/3.NO.daa db/mega/4.NO.daa db/mega/5.NO.daa db/mega/6.NO.daa \
  db/mega/7.NO.daa db/mega/8.NO.daa db/mega/9.NO.daa db/mega/10.NO.daa \
  db/mega/11.NO.daa db/mega/12.NO.daa db/mega/13.NO.daa db/mega/14.NO.daa \
  db/mega/15.NO.daa db/mega/16.NO.daa db/mega/17.NO.daa db/mega/18.NO.daa \
  db/mega/19.NO.daa \
  -o compute.all.megan
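Typing 36 file names by hand is error-prone, so the command line can also be assembled from the contents of the folder. The sketch below is one possible way to do it, not part of MEGAN6: the function names list_group and compare_cmd are hypothetical, the INF and NO group names are taken from the file names used in this tutorial, and sort -V (natural ordering of the sample numbers) assumes GNU coreutils. It only prints the command, so it can be inspected before being run.

```shell
# Print the files "<dir>/<number>.<group>.daa" in numeric order, one per line.
list_group() {
  for f in "$1"/*."$2".daa; do
    if [ -e "$f" ]; then printf '%s\n' "$f"; fi
  done | sort -V
}

# Print the compute-comparison command for all INF samples followed by all NO samples.
# File names containing spaces are not handled in this sketch.
compare_cmd() {
  files=$( { list_group "$1" INF; list_group "$1" NO; } | tr '\n' ' ' )
  if [ -z "$files" ]; then
    echo "no meganized .daa files found in $1" >&2
    return 1
  fi
  # $files already ends with a space, hence "%s-o"
  printf 'xvfb-run ./tools/compute-comparison -i %s-o %s\n' "$files" "$2"
}

compare_cmd db/mega compute.all.megan || echo "nothing to compare" >&2
```

Once the printed command looks right, it can be copied to the terminal or run directly with eval "$(compare_cmd db/mega compute.all.megan)".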

The expected output of compute-comparison is:

[leozenon@bioinformatica3 ~]$ xvfb-run ./tools/compute-comparison -i all_.daa_meganized_files -o compute.all.megan
ES2 Prism: Error - GLX extension is not supported
    GLX version 1.3 or higher is required
Version   MEGAN Community Edition (version 6.18.5, built 14 Feb 2020)
Copyright (C) 2019 Daniel H. Huson. This program comes with ABSOLUTELY NO WARRANTY.
GLib-GIO-Message: 23:38:00.404: Using the 'memory' GSettings backend.  Your settings will not be saved or shared with other applications.
Executing: open file='db/mega/1.INF.daa' readOnly=true;update;
Loading MEGAN File: 1.INF.daa
Info: Opened file '1.INF.daa' with 10,933,575 reads
Executing: update;
updating viewer
Induced tree has 1 of 1 nodes
updating viewer
Induced tree has 1 of 1 nodes
Executing: open file='db/mega/2.INF.daa' readOnly=true;update;
Loading MEGAN File: 2.INF.daa
Opened file 'db/mega/2.INF.daa' with 8,612,519 reads
Executing: update;
updating viewer
Induced tree has 1 of 1 nodes
updating viewer
Induced tree has 1 of 1 nodes
Executing: open file='db/mega/3.INF.daa' readOnly=true;update;
Loading MEGAN File: 3.INF.daa
Opened file 'db/mega/3.INF.daa' with 8,394,155 reads
Executing: update;
updating viewer
Induced tree has 1 of 1 nodes
updating viewer
Induced tree has 1 of 1 nodes
Executing: open file='db/mega/4.INF.daa' readOnly=true;update;
Loading MEGAN File: 4.INF.daa
Opened file 'db/mega/4.INF.daa' with 9,920,947 reads
Executing: update;
updating viewer
Induced tree has 1 of 1 nodes
updating viewer
Induced tree has 1 of 1 nodes
Executing: open file='db/mega/5.INF.daa' readOnly=true;update;
Loading MEGAN File: 5.INF.daa
Opened file 'db/mega/5.INF.daa' with 7,699,142 reads
Executing: update;
updating viewer
Induced tree has 1 of 1 nodes
updating viewer
Induced tree has 1 of 1 nodes
Executing: open file='db/mega/6.INF.daa' readOnly=true;update;
Loading MEGAN File: 6.INF.daa
Opened file 'db/mega/6.INF.daa' with 9,409,751 reads
Executing: update;
updating viewer
Induced tree has 1 of 1 nodes
updating viewer
Induced tree has 1 of 1 nodes
Executing: open file='db/mega/7.INF.daa' readOnly=true;update;
Loading MEGAN File: 7.INF.daa
Opened file 'db/mega/7.INF.daa' with 8,974,968 reads
Executing: update;
updating viewer
Induced tree has 1 of 1 nodes
updating viewer
Induced tree has 1 of 1 nodes
Executing: open file='db/mega/8.INF.daa' readOnly=true;update;
Loading MEGAN File: 8.INF.daa
Opened file 'db/mega/8.INF.daa' with 6,616,825 reads
Executing: update;
updating viewer
Induced tree has 1 of 1 nodes
updating viewer
Induced tree has 1 of 1 nodes
Executing: open file='db/mega/9.INF.daa' readOnly=true;update;
Loading MEGAN File: 9.INF.daa
Opened file 'db/mega/9.INF.daa' with 8,837,959 reads
Executing: update;
updating viewer
Induced tree has 1 of 1 nodes
updating viewer
Induced tree has 1 of 1 nodes
Executing: open file='db/mega/10.INF.daa' readOnly=true;update;
Loading MEGAN File: 10.INF.daa
Opened file 'db/mega/10.INF.daa' with 10,470,593 reads
Executing: update;
updating viewer
Induced tree has 1 of 1 nodes
updating viewer
Induced tree has 1 of 1 nodes
Executing: open file='db/mega/11.INF.daa' readOnly=true;update;
Loading MEGAN File: 11.INF.daa
Opened file 'db/mega/11.INF.daa' with 9,517,603 reads
Executing: update;
updating viewer
Induced tree has 1 of 1 nodes
updating viewer
Induced tree has 1 of 1 nodes
Executing: open file='db/mega/12.INF.daa' readOnly=true;update;
Loading MEGAN File: 12.INF.daa
Opened file 'db/mega/12.INF.daa' with 9,551,516 reads
Executing: update;
updating viewer
Induced tree has 1 of 1 nodes
updating viewer
Induced tree has 1 of 1 nodes
Executing: open file='db/mega/13.INF.daa' readOnly=true;update;
Loading MEGAN File: 13.INF.daa
Opened file 'db/mega/13.INF.daa' with 12,039,786 reads
Executing: update;
updating viewer
Induced tree has 1 of 1 nodes
updating viewer
Induced tree has 1 of 1 nodes
Executing: open file='db/mega/14.INF.daa' readOnly=true;update;
Loading MEGAN File: 14.INF.daa
Opened file 'db/mega/14.INF.daa' with 8,711,183 reads
Executing: update;
updating viewer
Induced tree has 1 of 1 nodes
updating viewer
Induced tree has 1 of 1 nodes
Executing: open file='db/mega/15.INF.daa' readOnly=true;update;
Loading MEGAN File: 15.INF.daa
Opened file 'db/mega/15.INF.daa' with 9,211,980 reads
Executing: update;
updating viewer
Induced tree has 1 of 1 nodes
updating viewer
Induced tree has 1 of 1 nodes
Executing: open file='db/mega/16.INF.daa' readOnly=true;update;
Loading MEGAN File: 16.INF.daa
Opened file 'db/mega/16.INF.daa' with 11,314,084 reads
Executing: update;
updating viewer
Induced tree has 1 of 1 nodes
updating viewer
Induced tree has 1 of 1 nodes
Executing: open file='db/mega/17.INF.daa' readOnly=true;update;
Loading MEGAN File: 17.INF.daa
Opened file 'db/mega/17.INF.daa' with 11,975,812 reads
Executing: update;
updating viewer
Induced tree has 1 of 1 nodes
updating viewer
Induced tree has 1 of 1 nodes
Executing: open file='db/mega/18.INF.daa' readOnly=true;update;
Loading MEGAN File: 18.INF.daa
Opened file 'db/mega/18.INF.daa' with 7,793,529 reads
Executing: update;
updating viewer
Induced tree has 1 of 1 nodes
updating viewer
Induced tree has 1 of 1 nodes
Executing: open file='db/mega/19.INF.daa' readOnly=true;update;
Loading MEGAN File: 19.INF.daa
Opened file 'db/mega/19.INF.daa' with 10,105,954 reads
Executing: update;
updating viewer
Induced tree has 1 of 1 nodes
updating viewer
Induced tree has 1 of 1 nodes
Executing: open file='db/mega/3.NO.daa' readOnly=true;update;
Loading MEGAN File: 3.NO.daa
Opened file 'db/mega/3.NO.daa' with 6,887,738 reads
Executing: update;
updating viewer
Induced tree has 1 of 1 nodes
updating viewer
Induced tree has 1 of 1 nodes
Executing: open file='db/mega/4.NO.daa' readOnly=true;update;
Loading MEGAN File: 4.NO.daa
Opened file 'db/mega/4.NO.daa' with 7,967,337 reads
Executing: update;
updating viewer
Induced tree has 1 of 1 nodes
updating viewer
Induced tree has 1 of 1 nodes
Executing: open file='db/mega/5.NO.daa' readOnly=true;update;
Loading MEGAN File: 5.NO.daa
Opened file 'db/mega/5.NO.daa' with 8,655,375 reads
Executing: update;
updating viewer
Induced tree has 1 of 1 nodes
updating viewer
Induced tree has 1 of 1 nodes
Executing: open file='db/mega/6.NO.daa' readOnly=true;update;
Loading MEGAN File: 6.NO.daa
Opened file 'db/mega/6.NO.daa' with 7,933,496 reads
Executing: update;
updating viewer
Induced tree has 1 of 1 nodes
updating viewer
Induced tree has 1 of 1 nodes
Executing: open file='db/mega/7.NO.daa' readOnly=true;update;
Loading MEGAN File: 7.NO.daa
Opened file 'db/mega/7.NO.daa' with 8,841,339 reads
Executing: update;
updating viewer
Induced tree has 1 of 1 nodes
updating viewer
Induced tree has 1 of 1 nodes
Executing: open file='db/mega/8.NO.daa' readOnly=true;update;
Loading MEGAN File: 8.NO.daa
Opened file 'db/mega/8.NO.daa' with 3,933,650 reads
Executing: update;
updating viewer
Induced tree has 1 of 1 nodes
updating viewer
Induced tree has 1 of 1 nodes
Executing: open file='db/mega/9.NO.daa' readOnly=true;update;
Loading MEGAN File: 9.NO.daa
Opened file 'db/mega/9.NO.daa' with 6,924,690 reads
Executing: update;
updating viewer
Induced tree has 1 of 1 nodes
updating viewer
Induced tree has 1 of 1 nodes
Executing: open file='db/mega/10.NO.daa' readOnly=true;update;
Loading MEGAN File: 10.NO.daa
Opened file 'db/mega/10.NO.daa' with 11,334,423 reads
Executing: update;
updating viewer
Induced tree has 1 of 1 nodes
updating viewer
Induced tree has 1 of 1 nodes
Executing: open file='db/mega/11.NO.daa' readOnly=true;update;
Loading MEGAN File: 11.NO.daa
Opened file 'db/mega/11.NO.daa' with 8,432,444 reads
Executing: update;
updating viewer
Induced tree has 1 of 1 nodes
updating viewer
Induced tree has 1 of 1 nodes
Executing: open file='db/mega/12.NO.daa' readOnly=true;update;
Loading MEGAN File: 12.NO.daa
Opened file 'db/mega/12.NO.daa' with 10,121,509 reads
Executing: update;
updating viewer
Induced tree has 1 of 1 nodes
updating viewer
Induced tree has 1 of 1 nodes
Executing: open file='db/mega/13.NO.daa' readOnly=true;update;
Loading MEGAN File: 13.NO.daa
Opened file 'db/mega/13.NO.daa' with 8,554,302 reads
Executing: update;
updating viewer
Induced tree has 1 of 1 nodes
updating viewer
Induced tree has 1 of 1 nodes
Executing: open file='db/mega/14.NO.daa' readOnly=true;update;
Loading MEGAN File: 14.NO.daa
Opened file 'db/mega/14.NO.daa' with 7,049,651 reads
Executing: update;
updating viewer
Induced tree has 1 of 1 nodes
updating viewer
Induced tree has 1 of 1 nodes
Executing: open file='db/mega/15.NO.daa' readOnly=true;update;
Loading MEGAN File: 15.NO.daa
Opened file 'db/mega/15.NO.daa' with 6,801,339 reads
Executing: update;
updating viewer
Induced tree has 1 of 1 nodes
updating viewer
Induced tree has 1 of 1 nodes
Executing: open file='db/mega/16.NO.daa' readOnly=true;update;
Loading MEGAN File: 16.NO.daa
Opened file 'db/mega/16.NO.daa' with 6,367,556 reads
Executing: update;
updating viewer
Induced tree has 1 of 1 nodes
updating viewer
Induced tree has 1 of 1 nodes
Executing: open file='db/mega/17.NO.daa' readOnly=true;update;
Loading MEGAN File: 17.NO.daa
Opened file 'db/mega/17.NO.daa' with 7,168,004 reads
Executing: update;
updating viewer
Induced tree has 1 of 1 nodes
updating viewer
Induced tree has 1 of 1 nodes
Executing: open file='db/mega/18.NO.daa' readOnly=true;update;
Loading MEGAN File: 18.NO.daa
Opened file 'db/mega/18.NO.daa' with 7,515,418 reads
Executing: update;
updating viewer
Induced tree has 1 of 1 nodes
updating viewer
Induced tree has 1 of 1 nodes
Executing: open file='db/mega/19.NO.daa' readOnly=true;update;
Loading MEGAN File: 19.NO.daa
Opened file 'db/mega/19.NO.daa' with 8,460,638 reads
Executing: update;
updating viewer
Induced tree has 1 of 1 nodes
updating viewer
Induced tree has 1 of 1 nodes
Computing comparison:
Normalizing to: 0 reads per sample
Total assigned:   62,038,417 normalized
Saving to file: compute.all.megan
done
Total time:  17s
Peak memory: 0.3 of 536.9G

After the computation, the result is saved to compute.all.megan, which can be analysed in the MEGAN6 GUI.
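The compute-comparison log reports how many reads each file contains, which is handy as a quick sanity check before opening the GUI. A minimal sketch for turning those "Opened file ... with N reads" lines into a two-column table (sample file, reads) follows. The function name summarize_reads is hypothetical, and the parsing relies only on the log format shown above.

```shell
# Print "<file name><TAB><read count>" for every "Opened file" line of a
# compute-comparison log. Reads the files given as arguments, or standard
# input when called without arguments.
summarize_reads() {
  awk -F"'" '/Opened file/ {
    name = $2
    sub(/.*\//, "", name)      # drop the directory part
    reads = $3
    gsub(/[^0-9]/, "", reads)  # keep digits only (removes thousands separators)
    printf "%s\t%s\n", name, reads
  }' "$@"
}

# Example with three lines copied from the log above
summarize_reads <<'EOF'
Info: Opened file '1.INF.daa' with 10,933,575 reads
Executing: update;
Opened file 'db/mega/2.INF.daa' with 8,612,519 reads
EOF
```

To use it on a real run, the output of compute-comparison can be redirected to a file and that file passed as the argument, for example summarize_reads compare.log, where compare.log is a name chosen here for illustration.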

Plotting

The 5th part of the pipeline produces plots to help evaluate all the processed data.



  1. HUSON, D. H. et al. MEGAN Community Edition - Interactive Exploration and Analysis of Large-Scale Microbiome Sequencing Data. PLOS Computational Biology, v. 12, n. 6, p. e1004957, 21 Jun. 2016.

  2. BUCHFINK, B.; XIE, C.; HUSON, D. H. Fast and sensitive protein alignment using DIAMOND. Nature Methods, v. 12, n. 1, p. 59-60, 2015.

  3. Núcleo de Processamento de Alto Desempenho - NPAD - UFRN