PHYLOGENY

T-REX (Tree and reticulogram REConstruction) - is dedicated to the reconstruction of phylogenetic trees and reticulation networks, and to the inference of horizontal gene transfer (HGT) events. T-REX includes several popular bioinformatics applications such as MUSCLE, MAFFT, Neighbor Joining, NINJA, BioNJ, PhyML, RAxML, a random phylogenetic tree generator and some well-known sequence-to-distance transformation models. It also comprises fast and effective methods for inferring phylogenetic trees from complete and incomplete distance matrices as well as for reconstructing reticulograms and HGT networks (Reference: Boc, A. et al. 2012. Nucl. Acids Res. 40 (W1): W573-W579).

Phylogeny.fr - is a simple-to-use web service dedicated to reconstructing and analysing phylogenetic relationships between molecular sequences. It includes multiple alignment (MUSCLE, T-Coffee, ClustalW, ProbCons), phylogeny (PhyML, MrBayes, TNT, BioNJ), tree viewer (Drawgram, Drawtree, ATV) and utility programs (e.g. Gblocks to eliminate poorly aligned positions and divergent regions) (Reference: A. Dereeper et al. 2008. Nucl. Acids Res. 36 (Web Server Issue): W465-9). Also available here. I have used this resource exclusively in the production of ICTV viral taxonomy proposals (TaxoProps).

NGPhylogeny.fr - is more flexible than Phylogeny.fr in terms of tools and workflows, easily installable, and more scalable. It integrates numerous tools in their latest version (e.g. TNT, FastME, MrBayes) as well as new ones designed in the last ten years (e.g. PhyML, SMS, FastTree, trimAl, BOOSTER). These tools cover a large range of usage (sequence searching, multiple sequence alignment, model selection, tree inference and tree drawing) and a large panel of standard methods (distance, parsimony, maximum likelihood and Bayesian). They are integrated in workflows, which have been already configured (‘One click’), can be customized (‘Advanced’), or are built from scratch (‘A la carte’). (Reference: Lemoine F et al. (2019) Nucleic Acids Res 47(W1): W260–W265).

iTOL (Interactive Tree Of Life) - is a very impressive online tool for the display, manipulation and annotation of phylogenetic and other trees. (Reference: Letunic I & Bork P (2019) Nucleic Acids Res 47(W1): W256-W259).

FastME provides distance algorithms to infer phylogenies. FastME is based on balanced minimum evolution, which is the very principle of NJ. FastME improves over NJ by performing topological moves using fast, sophisticated algorithms. The first version of FastME only included Nearest Neighbor Interchange (NNI). The new 2.0 version also includes Subtree Pruning and Regrafting (SPR), while remaining as fast as NJ and providing a number of facilities: distance estimation for DNA and proteins with various models and options, bootstrapping, and parallel computations. (Reference: Lefort V. et al. Molecular Biology & Evolution 32(10): 2798-800, 2015).
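As an illustration of what "distance estimation for DNA" with a substitution model involves, the following minimal sketch computes the Jukes-Cantor (JC69) distance between two aligned DNA sequences. JC69 is chosen here only because it is the simplest well-known model; the function names are hypothetical and this is not FastME's own code or interface.

```python
from math import log


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned DNA sequences.

    Columns containing gaps or ambiguous bases are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (x, y)
        for x, y in zip(seq_a.upper(), seq_b.upper())
        if x in "ACGT" and y in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jc69_distance(seq_a, seq_b):
    """Jukes-Cantor distance: d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        # The correction is undefined once sequences are saturated.
        raise ValueError("sequences too divergent for the JC69 correction")
    return -0.75 * log(1.0 - 4.0 * p / 3.0)
```

A matrix of such pairwise distances is the kind of input that distance-based methods such as NJ start from.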

PhyML - has been widely used because of its simplicity and a fair compromise between accuracy and speed. In the meantime research on PhyML has continued, and new algorithms and methods have been implemented in the program. (Reference: V. Lefort et al. Molecular Biology and Evolution, msx149, 2017).

Treeview (part of ETE Toolkit) - the Environment for Tree Exploration (ETE) is a computational framework that simplifies the reconstruction, analysis, and visualization of phylogenetic trees and multiple sequence alignments. Here, we present ETE v3, featuring numerous improvements in the underlying library of methods, and providing a novel set of standalone tools to perform common tasks in comparative genomics and phylogenetics. The new features include (i) building gene-based and supermatrix-based phylogenies using a single command, (ii) testing and visualizing evolutionary models, (iii) calculating distances between trees of different size or including duplications, and (iv) providing seamless integration with the NCBI taxonomy database. (Reference: Huerta-Cepas J et al. (2016) Mol Biol Evol. 33(6): 1635-1638).

MAFFT version 7 (Multiple alignment program for amino acid or nucleotide sequences) (Reference: Katoh K., J. Rozewicki & K.D. Yamada (2019) Brief Bioinformatics 20: 1160–1166). Also includes links to downloadable versions of this program.

Evolview - is an interactive tree visualization tool designed to help researchers visualize phylogenetic trees and annotate them with additional information. It offers the user a platform to upload trees in most common tree formats, such as Newick/Phylip, Nexus, Nhx and PhyloXML, and provides a range of visualization options, using fifteen types of custom annotation datasets. The new version of Evolview was designed to provide simple tree uploads, manipulation and viewing options with additional annotation types. (Reference: Subramanian B et al. (2019) Nucleic Acids Res. 47(W1): W270-W275). Requires registration.

RAxML (Randomized Axelerated Maximum Likelihood) is a program for sequential and parallel Maximum Likelihood based inference of large phylogenetic trees (Reference: Stamatakis, A. 2006. Bioinformatics 22:2688–2690).

Phylemon2 - a suite of web-tools for molecular evolution, phylogenetics and phylogenomics (Reference: Sánchez, R. et al. 2011. Nucl. Acids Res. 39/suppl_2/W470)

POWER (PhylOgenetic Web Repeater) - allows users to run phylogenetic analyses repeatedly with most programs of the PHYLIP package. POWER provides two pipelines to process the analysis. One of them includes multiple sequence alignment (MSA) at the beginning of the pipeline, whereas the other begins the phylogenetic analysis with aligned sequences. Very user friendly. (Reference: C.-Y. Lin et al. 2005. Nucl. Acids Res. 33: W553-W556).

Phylodendron - phylogenetic tree printer (D.G. Gilbert, Indiana Univ.) - very useful for visualizing *.dnd files from alignments and saving the results as .GIF, .PS or .PDF files. N.B. The font style and size can be altered in the .PDF output format.

W-IQ-TREE - is an intuitive and user-friendly web interface and server for IQ-TREE, an efficient phylogenetic software for maximum likelihood analysis. W-IQ-TREE supports multiple sequence types (DNA, protein, codon, binary and morphology) in common alignment formats and a wide range of evolutionary models including mixture and partition models. W-IQ-TREE performs fast model selection, partition scheme finding, efficient tree reconstruction, ultrafast bootstrapping, branch tests, and tree topology tests. (Reference: Trifinopoulos J et al. (2016) Nucl. Acids Res. 44(W1): W232–W235)

Phylogenetic tree prediction - GeneBee service (Belozersky Institute of Physico-chemical Biology, Moscow State University, Russia)

Phylogenetic Tree Plot (Laboratory of Bioinformatics, Wageningen UR, The Netherlands) - submit tree descriptions in PHYLIP (Newick) format only

Phylogenetic tree (newick) viewer - is an online tool for phylogenetic tree view (newick format) that allows multiple sequence alignments to be shown together with the trees (fasta format). It uses the tree drawing engine implemented in the ETE toolkit, and offers transparent integration with the NCBI taxonomy database. Currently, alignments can be displayed in condensed or block-based format. Leaf names in the newick tree should match those in the fasta alignment.
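Because the leaf names in the Newick tree must match the names in the FASTA alignment, it can be worth checking them before uploading. The following is a minimal sketch of such a check; the function names are hypothetical, and it assumes simple Newick strings without quoted labels or bracketed comments.

```python
import re


def newick_leaf_names(newick):
    """Return the set of leaf labels in a simple Newick string.

    A leaf label directly follows '(' or ','; labels that follow ')'
    (internal node names or support values) are ignored.
    """
    return set(re.findall(r"[(,]\s*([^\s():,;]+)", newick))


def fasta_names(fasta_text):
    """Return the set of sequence names (first word of each '>' header)."""
    names = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def compare_names(newick, fasta_text):
    """Return (names only in the tree, names only in the alignment)."""
    leaves = newick_leaf_names(newick)
    seqs = fasta_names(fasta_text)
    return sorted(leaves - seqs), sorted(seqs - leaves)
```

If both returned lists are empty, the tree and the alignment use the same set of names.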

CVTree4 constructs whole-genome based phylogenetic trees without sequence alignment by using a Composition Vector (CV) approach. It was first developed to infer evolutionary relatedness of microbial organisms and then successfully applied to viruses, chloroplasts, and fungi. CVTree3 makes comparison with taxonomy and reports tree-branch monophyleticity from domain to species. (Reference: G. Zuo, & B. Hao (2015) Genomics Proteomics & Bioinformatics, 13: 321-331).
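The general idea of an alignment-free, composition-based comparison can be sketched as follows: each sequence is turned into a vector of k-mer counts, and two vectors are compared through their cosine. This is a deliberately simplified illustration with hypothetical function names. It omits the background-subtraction step of the published CV method, and the (1 - cosine)/2 scaling is an assumption of this sketch, so it does not reproduce CVTree results.

```python
from collections import Counter
from math import sqrt


def kmer_vector(seq, k=3):
    """Count every overlapping k-mer in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_distance(vec_a, vec_b):
    """Dissimilarity (1 - cosine) / 2 between two k-mer count vectors."""
    dot = sum(vec_a[key] * vec_b[key] for key in vec_a.keys() & vec_b.keys())
    norm_a = sqrt(sum(v * v for v in vec_a.values()))
    norm_b = sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a sequence is shorter than k")
    return (1.0 - dot / (norm_a * norm_b)) / 2.0
```

Computing this value for every pair of genomes gives a distance matrix from which a tree can be built without any alignment.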

webPRANK - incorporates phylogeny-aware multiple sequence alignment, visualisation and post-processing in an easy-to-use web interface. (Reference: Löytynoja, A., & Goldman, N. 2010. BMC Bioinformatics. 11:579).

AnnoTree - is an interactive, functionally annotated bacterial tree of life that integrates taxonomic, phylogenetic and functional annotation data from over 27 000 bacterial and 1500 archaeal genomes. AnnoTree enables visualization of millions of precomputed genome annotations across the bacterial and archaeal phylogenies, thereby allowing users to explore gene distributions as well as patterns of gene gain and loss in prokaryotes. Using AnnoTree, we examined the phylogenomic distributions of 28 311 gene/protein families, and measured their phylogenetic conservation, patchiness, and lineage-specificity within bacteria. (Reference: Mendler K et al. (2019) Nucleic Acids Res. 47(9): 4442-4448).

FuncTree2 - is a user-friendly web application that turns hierarchical classifications into interactive and highly customizable radial trees, and enables researchers to visualize their data simultaneously on all its levels. FuncTree2 features mapping of data from multiple samples and several navigation features like zooming, panning, re-rooting and collapsing of nodes or levels. (Reference: Darzi Y et al. (2019) Bioinformatics. 35(21): 4519-4521).

eShadow - Evolutionary phylogenetic SHADOWing of closely related species (Reference: Ovcharenko, D. et al. (2004) Genome Research, 14(6): 1191-1198)

SIFTER (Statistical Inference of Function Through Evolutionary Relationships) is a statistical approach to predicting protein function that uses a protein family's phylogenetic tree as the natural structure for representing protein relationships. (Reference: S.M. Sahraeian et al. 2015. Nucl. Acids Res. 43 (W1): W141-W147).

PATH - is a novel method to infer distant homology relations of two proteins, that accounts for frameshift and point mutations that may have affected the coding sequences. We design a dynamic programming alignment algorithm over memory-efficient graph representations of the complete set of putative DNA sequences of each protein, with the goal of determining the two putative DNA sequences which have the best scoring alignment under a powerful scoring system designed to reflect the most probable evolutionary process. (Reference: Gîrdea, M. et al. Algorithms for Molecular Biology 5: 6 ; 2010).

ReplacementMatrix - maximum-likelihood estimation of amino acid replacement rate matrices. (Reference: Dang, C.C. et al. 2011. Bioinformatics. 27(19):2758-2760).

DIVEIN - starting with a set of aligned sequences, DIVEIN estimates evolutionary parameters and phylogenetic trees while allowing the user to choose from a variety of evolutionary models; it then reconstructs the consensus (CON), most recent common ancestor (MRCA), and center of tree (COT) sequences. DIVEIN also provides tools for further analyses. (Reference: Deng, W. et al. 2010. Biotechniques. 48(5):405-408).

Molecular Taxonomy

AmphoraNet - is capable of assigning a probability-weighted taxonomic group for each phylogenetic marker gene found in the input metagenomic sample; the webserver is based on the AMPHORA2 workflow. It uses 31 bacterial and 104 archaeal protein coding marker genes for metagenomic phylotyping. Most of these are single copy genes, therefore AmphoraNet is suitable for estimating the taxonomic composition of bacterial and archaeal communities from metagenomic shotgun sequencing data. (Reference: Kerepesi, C. et al. 2014. Gene 533: 538–540).

VIRIDIC (Virus Intergenomic Distance Calculator) - the first level of bacteriophage classification by ICTV involves computing the overall DNA sequence identity between two viruses. This new tool computes pairwise intergenomic distances/similarities amongst phage genomes. To run it, upload a single fasta file with all phage genomes of interest, create a project and press run. Save the project ID that will be displayed when the project is created. You will need it to access the data if the calculations take a long time. (Reference: Moraru C, Varsani A, Kropinski AM. 2020. Viruses. 12(11): 1268.)
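If the phage genomes are stored as separate FASTA files, they need to be merged into the single file that is uploaded. A minimal sketch of that step is shown below; the function name is hypothetical, and the check for duplicate sequence names is a precaution of this sketch rather than a documented VIRIDIC requirement.

```python
from pathlib import Path


def combine_fasta(inputs, output):
    """Concatenate FASTA files into one multi-FASTA file.

    Blank lines are dropped. Returns the number of sequence records written.
    """
    seen = set()
    count = 0
    with open(output, "w") as out:
        for path in inputs:
            for line in Path(path).read_text().splitlines():
                line = line.strip()
                if not line:
                    continue
                if line.startswith(">"):
                    fields = line[1:].split()
                    name = fields[0] if fields else ""
                    if name in seen:
                        raise ValueError(f"duplicate sequence name: {name!r}")
                    seen.add(name)
                    count += 1
                out.write(line + "\n")
    return count
```

For example, `combine_fasta(["phage1.fasta", "phage2.fasta"], "all_phages.fasta")` would write both genomes to `all_phages.fasta` (the file names are placeholders).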

GGDC - Genome-to-Genome Distance Calculator - The species concept for Bacteria and Archaea is ultimately based on DNA-DNA hybridization (DDH). This web service can be used for genome-based species delineation with complete or incomplete genome sequences. The server calculates intergenomic distances, which are converted into similarity values analogous to DDH and sent to you via e-mail. (Reference: Meier-Kolthoff, J.P. et al. 2013. BMC Bioinformatics 14:60).

VICTOR (Virus Classification and Tree Building Online Resource; Leibniz-Institut DSMZ-Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH). This web service compares bacterial and archaeal viruses ("phages") using their genome or proteome sequences. The results include phylogenomic trees inferred using the Genome-BLAST Distance Phylogeny method (GBDP), with branch support, as well as suggestions for the classification at the species, genus and family level. (The service can be applied to other kinds of viruses, too, but has not yet been tested in this respect.) Upload your FASTA files, GenBank files and/or GenBank accession IDs. (Reference: JP Meier-Kolthoff & M Göker. 2017. Bioinformatics 33(21): 3396–3404).

VIRFAM is dedicated to the recognition of head-neck-tail modules and of recombinase genes in phage genomes. You can use this server to search for remote homologs of specific protein families within protein sequences of bacteriophages. Input: protein sequences of your phage; output includes a phylogenetic tree with the placement of your virus. (Reference: Lopes A et al. Nucleic Acids Res. (2010) 38(12): 3952-62).

MyTaxa - represents a new algorithm that extends the Average Amino Acid Identity (AAI) concept to identify the taxonomic affiliation of a query genome sequence or a sequence of a contig assembled from a metagenome, including short sequences, and to classify sequences representing novel taxa at three levels i.e., species, genus and phylum. MyTaxa can assign a larger number of sequences and with higher accuracy compared to other tools available for the same purposes. This is largely attributed to the fact that MyTaxa considers all genes present in an unknown (query) sequence as classifiers and quantifies the classifying power of each gene using predetermined weights, which are derived from the analysis of orthologs of the gene from all available complete genomes. (Reference: Luo, C. et al. 2014. Nucl. Acids Res).

ANI calculator - estimates the average nucleotide identity using both best hits (one-way ANI) and reciprocal best hits (two-way ANI) between two genomic datasets. Typically, the ANI values between genomes of the same species are above 95% while values below 75% are not to be trusted, and AAI should be used instead. This tool supports both complete and draft genomes (multi-fasta). (Reference: Goris J. et al. 2007. Int J Syst Evol Microbiol. 57:81-91).
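The rule of thumb above can be written as a small helper for annotating ANI results. This is only a sketch of the thresholds quoted in the description (95% and 75%); the function name and the wording of the returned labels are hypothetical.

```python
def interpret_ani(ani_percent):
    """Label an ANI value (in percent) using the thresholds quoted above."""
    if not 0.0 <= ani_percent <= 100.0:
        raise ValueError("ANI must be a percentage between 0 and 100")
    if ani_percent < 75.0:
        # Values this low are not to be trusted; AAI is the better measure.
        return "unreliable, use AAI instead"
    if ani_percent > 95.0:
        return "typical of genomes of the same species"
    return "below the typical same-species range"
```

The thresholds are guidelines rather than strict cut-offs, so values close to 95% deserve a closer look.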

Average Nucleotide Identity (ANI) calculator - their ANI Calculator uses the OrthoANIu algorithm, an improved iteration of the original OrthoANI algorithm, which uses USEARCH instead of BLAST (Reference: Yoon, S. H. et al. (2017). Antonie van Leeuwenhoek. 110:1281–1286).

TaxMan - inspect your rRNA amplicons and taxa assignments - In microbiome analyses, often rRNA gene databases are used to assign taxonomic names to sequence reads. The TaxMan server facilitates the analysis of the taxonomic distribution of your reads in two ways. First, you can check what taxonomic names are assigned to the sequences produced by your primers and what taxa you will lose. Second, the produced amplicon sequences with lineages in the FASTA header can be downloaded. This can result in a much more efficient analysis with respect to run time and memory usage, since the amplicon sequences are considerably shorter than the full length rRNA gene sequences. In addition, you can download a lineage file that includes the counts of all taxa for your primers and for the used reference. (Reference: Brandt, B.W. et al. 2012. Nucleic Acids Research 40:W82-W87).

VipTree - generates a "proteomic tree" of viral genome sequences based on genome-wide sequence similarities computed by tBLASTx. The original proteomic tree concept (i.e., "the Phage Proteomic Tree") was developed by Rohwer and Edwards, 2002. A proteomic tree is a dendrogram that reveals global genomic similarity relationships between tens, hundreds, and thousands of viruses. It has been shown that viral groups identified in a proteomic tree correspond well to established viral taxonomies. (Reference: Nishimura Y et al. (2017) Bioinformatics 33: 2379–2380)

VirClust - is a novel tool capable of performing i) hierarchical clustering of viruses based on intergenomic distances calculated from their protein cluster content, ii) identification of core proteins and iii) annotation of viral proteins. VirClust groups proteins into clusters based both on BLASTP sequence similarity, which identifies more closely related proteins, and on hidden Markov models (HMM), which identify more distantly related proteins. Furthermore, VirClust provides an integrated visualization of the hierarchical clustering tree and of the distribution of the protein content, which allows the identification of the genomic features responsible for the respective clustering. (Reference: Moraru C (2021) doi: https://doi.org/10.1101/2021.06.14.448304)