SITES: A number of excellent sites exist, all of which permit translation in all six reading frames. I would recommend "ORF Finder" for its visuals, and Pipeline or GeneMark if you are seriously interested in identifying genes within your sequence. The latter two programs permit the analysis of long sequences (submit by attachment, not in the box).
path :: protein back-translation and alignment - addresses the problem of finding distant protein homologies where the divergence is the result of frameshift mutations and substitutions. Given two input protein sequences, the method implicitly aligns all the possible pairs of DNA sequences that encode them, by manipulating memory-efficient graph representations of the complete set of putative DNA sequences for each protein. (Reference: Gîrdea M et al. 2010. Algorithms for Molecular Biology 5:)
Simple translation tools - DNA to protein sequences:
Open Reading Frame Finder (NCBI) - searches for open reading frames (ORFs) in the DNA sequence you enter. The program returns the range of each ORF, along with its protein translation. Use ORF Finder to search newly sequenced DNA for potential protein-coding segments, and verify the predicted protein using the newly developed SMART BLAST or regular BLASTP.
Six-frame translations can be done at Tuebingen, Russia, Bioline, and Science Launcher.
EMBOSS Sixpack (EMBL-EBI) - reads a DNA sequence and outputs the three forward and (optionally) three reverse translations in a visual manner. Alternatively, use EMBOSS Transeq.
MBS Translator (JustBio Tools) - an excellent new site, since one can translate specifically from ATG and the results are presented with the nucleotide sequence overlaying the amino acid sequence. Ideal for cutting and pasting into a manuscript. You need to register to use this free tool. Other quick translation tools are here and here.
Translator (fr33.net, France) or DNA to protein translation
Translate (ExPASy, Switzerland) - a tool that translates a nucleotide (DNA/RNA) sequence into a protein sequence.
Transcription and Translation Tool (Attotron Biosensor Corporation)
DNA to protein translation (University of the Basque Country, Spain) and here.
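To illustrate what the six-frame tools above do, here is a minimal Python sketch that translates a DNA sequence in the three forward and three reverse frames. It assumes the standard genetic code (NCBI translation table 1) only. The web tools also offer alternative codes, which this sketch does not. Unknown or ambiguous codons are shown as X and stops as *. Numbering conventions for the reverse frames vary between tools. Here, -1 to -3 are simply offsets 0 to 2 along the reverse complement.

```python
# Assumption: standard genetic code (NCBI table 1); codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def clean(dna: str) -> str:
    """Upper-case the sequence and treat RNA input (U) as DNA (T)."""
    return dna.upper().replace("U", "T")


def reverse_complement(dna: str) -> str:
    return clean(dna).translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate codon by codon from the first base; a trailing partial codon is ignored."""
    dna = clean(dna)
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna: str) -> dict:
    """Return {'+1'..'+3', '-1'..'-3': protein}; reverse frames are read from the reverse complement."""
    forward, reverse = clean(dna), reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG").items():
        print(frame, protein)
```

For the example sequence, frame +1 reads MAIVMGR*KGAR*, which is the kind of frame a tool such as ORF Finder would then highlight.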
Translation of multiple sequences:
Virtual Ribosome (Reference: R. Wernersson. 2006. Nucl. Acids Res. 34 (Web Server issue): W385-W388) - I find that the output from this site and the next (RevTrans) is optimal for translating multiple DNA sequences.
RevTrans 1.4 Server (CBS, Technical University of Denmark)
TranslatorX - is a web server designed to align protein-coding nucleotide sequences based on their corresponding amino acid translations. TranslatorX novelties include: (i) use of all documented genetic codes and the possibility of assigning different genetic codes for each sequence; (ii) a battery of different multiple alignment programs; (iii) translation of ambiguous codons when possible; (iv) an innovative criterion to clean nucleotide alignments with GBlocks based on protein information; and (v) a rich output, including Jalview-powered graphical visualization of the alignments, codon-based alignments coloured according to the corresponding amino acids, measures of compositional bias and first, second and third codon position specific alignments. (Reference: Abascal F, et al. (2010) Nucleic Acids Res. 38: W7-13).
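RevTrans and TranslatorX (and PAL2NAL, listed further on) rely on the same basic idea. First align the proteins, then thread the original codons back onto the protein alignment so that the gaps stay in frame. Below is a minimal sketch of that threading step. It assumes each coding sequence encodes its aligned protein exactly, with no UTRs, frameshifts or mismatches. PAL2NAL handles those cases; this sketch does not. The function names are illustrative only.

```python
def thread_codons(aligned_protein: str, cds: str) -> str:
    """Convert one gapped protein alignment row into the matching codon alignment row.

    Each residue consumes the next three nucleotides of `cds`; each gap '-'
    becomes '---'. Leftover nucleotides (e.g. a trailing stop codon) are dropped.
    Assumption: `cds` encodes the ungapped protein exactly, starting at its first base.
    """
    cds = cds.upper().replace("U", "T")
    n_residues = len(aligned_protein.replace("-", ""))
    if len(cds) < 3 * n_residues:
        raise ValueError("coding sequence is too short for the protein")
    columns = []
    pos = 0
    for residue in aligned_protein:
        if residue == "-":
            columns.append("---")
        else:
            columns.append(cds[pos:pos + 3])
            pos += 3
    return "".join(columns)


def codon_alignment(protein_alignment: dict, cds_by_name: dict) -> dict:
    """Apply thread_codons to every row of a {name: aligned protein} alignment."""
    return {name: thread_codons(row, cds_by_name[name]) for name, row in protein_alignment.items()}


if __name__ == "__main__":
    aln = {"seq1": "MK-W", "seq2": "MKRW"}
    cds = {"seq1": "ATGAAATGGTAA", "seq2": "ATGAAGCGTTGG"}
    for name, row in codon_alignment(aln, cds).items():
        print(name, row)
```

Because every protein column becomes exactly three nucleotide columns, the codon alignment can be used directly for codon-position or dN/dS analyses.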
Back-translation, i.e. taking a protein sequence and converting it into a possible encoding DNA sequence:
Back Translation - part of The Sequence Manipulation Suite; limited choice of codon usage (E. coli and H. sapiens).
Protein to DNA reverse translation - includes a wide range of genetic codes
Reverse translation of aminoacid sequences - probably the best, in that it includes the genetic codes of seven organisms (E. coli and 6 eukaryotes) and provides consensus and detailed output of results in RNA or DNA.
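Because the genetic code is degenerate, a back-translation is not a single DNA sequence. One way to summarize all possibilities is a consensus built with IUPAC ambiguity codes, similar in spirit to the "consensus" output mentioned above. Below is a minimal sketch that assumes the standard genetic code and applies no codon-usage weighting.

```python
# Assumption: standard genetic code (NCBI table 1); codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# IUPAC nucleotide ambiguity codes, keyed by the set of bases each one stands for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def consensus_codon(amino_acid: str) -> str:
    """Collapse all synonymous codons of one amino acid ('*' = stop) into an IUPAC triplet."""
    codons = [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]
    if not codons:
        raise ValueError(f"not a standard amino acid code: {amino_acid!r}")
    return "".join(IUPAC[frozenset(codon[pos] for codon in codons)] for pos in range(3))


def back_translate_consensus(protein: str) -> str:
    """Back-translate a protein into one IUPAC-coded DNA consensus string."""
    return "".join(consensus_codon(aa) for aa in protein.upper())


if __name__ == "__main__":
    print(back_translate_consensus("MWFLS"))
```

The consensus is deliberately permissive. For example, Leu becomes YTN, which also matches the Phe codons TTT and TTC, and Ser becomes WSN. This is why the tools above also offer codon-usage-based "most likely" sequences.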
Identification of open-reading frames:
StarORF - facilitates the identification of the protein(s) encoded within a DNA sequence. Using StarORF, the DNA sequence is first transcribed into RNA and then translated into all the potential open reading frames (ORFs) encoded within each of the six translation frames (3 in the forward direction and 3 in the reverse direction). This allows students to identify the translation frame that results in the longest protein coding sequence.
TICO - Translation Initiation site COrrection - provides an interface for direct post processing of the predictions obtained from GLIMMER to improve the accuracy of annotated Translation Initiation Sites (TIS). (Reference: M. Tech et al. 2005. Bioinformatics 21: 3568-3569)
GeneMark Homepage (M. Borodovsky, Georgia Institute of Technology, Atlanta, U.S.A.) offers a family of programs for ORF analysis. This site links to a growing number of programs for modeling phage, bacterial, and eukaryotic data. Extensive control over the output is possible, i.e. one can request the nucleotide and protein sequences of the ORFs. Two programs to consider are GeneMarkS (Reference: Besemer J et al. 2001. Nucleic Acids Research 29: 2607-2618) or GeneMarkS-2, and the Heuristic Approach for Gene Prediction (Reference: Besemer J & Borodovsky M. 1999. Nucleic Acids Research 27: 3911-3920). For metagenomic analysis, use MetaGeneMark (Reference: Zhu, W. et al. 2010. Nucleic Acids Research 38: e132).
EasyGene (Technical University of Denmark; Reference: T.S. Larsen and A. Krogh. 2003. EasyGene - a prokaryotic gene finder that ranks ORFs by statistical significance. BMC Bioinformatics 4:21) - produces a list of predicted genes from a prokaryotic DNA sequence. Each prediction is given a significance score (R-value) indicating how likely it is to be just a non-coding open reading frame rather than a real gene. The user needs only to specify the organism hosting the query sequence. If you are interested in the analysis of existing bacterial genomes, consult EasyGene 1.2.
AMIGene (Annotation of MIcrobial Genes) - (Reference: Bocs, S. et al. 2003. Nucl. Acids Res. 31: 3723-3726)
FgenesB (SoftBerry) - fast pattern/Markov chain-based bacterial operon and gene prediction. Somewhat limited range of model bacteria and archaea.
ZCURVE is an ab initio program for gene finding in bacterial or archaeal genomes; its latest version is 3.0. Based on cross-validation on 422 prokaryotic genomes, ZCURVE 3.0 has slightly higher accuracy than Glimmer 3.02. (Reference: Hua, Z-G. 2015. Nucl. Acids Res.).
FramePlot 2.3.2 (National Institute of Health, Japan) - this site permits one to select the minimal size of the ORF and the start codon (ATG or GTG being the most common). Although the presentation (a series of coloured arrows) is somewhat confusing, clicking on any arrow displays the DNA and protein sequence, which can then be used in homology (BLASTN & BLASTP) searches. (Reference: Ishikawa, J. & Hotta, K. 1999. FEMS Microbiol. Lett. 174: 251-253).
ExPASy – Translate tool (ExPASy, University of Geneva, Switzerland). I find this site useful if I have a gene which begins with an alternative start codon. An alternative site is Translate Nucleic Acid Sequence Tool (University of Massachusetts Medical School, U.S.A.) which permits choice of reading frame(s) and genetic code.
Third Position GC Skew Display (The Institute for Genomic Research, U.S.A.) predicts genes by comparing possible open reading frames (variety of initiation codon options) to a third position GC plot. This tool is apparently most effective for genomes with a high G+C content.
FSFinder2 (Frameshift Signal Finder) - Programmed ribosomal frameshifting is involved in the expression of certain genes from a wide range of organisms, such as viruses, bacteria and eukaryotes, including humans. In programmed frameshifting, the ribosome switches to an alternative frame at a specific site in response to a special signal in a messenger RNA. Programmed frameshifting plays a role in viral particle morphogenesis, autogenous control, and alternative enzymatic activities. The most common frameshift is a -1 frameshift, in which the ribosome shifts a single nucleotide in the upstream direction. The major elements of -1 frameshifting are a slippery site, where the ribosome changes reading frames, and a stimulatory RNA structure, such as a pseudoknot or stem-loop, located a few nucleotides downstream. +1 frameshifts are much less common than -1 frameshifts but are observed in diverse organisms.
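ORF finders such as StarORF, FramePlot and ORF Finder all build on the same simple scan. In each of the six frames, find a start codon and extend to the first in-frame stop. Below is a minimal Python sketch of that scan. It assumes the standard genetic code, and it takes the minimum length and the allowed start codons as parameters, much as FramePlot does. It does not score coding potential the way GeneMark, EasyGene or ZCURVE do, and it does not model programmed frameshifts. `find_orfs` is an illustrative name, not any tool's API.

```python
# Assumption: standard genetic code (NCBI table 1); codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def find_orfs(dna: str, min_codons: int = 30, starts=("ATG",)) -> list:
    """Return (strand, frame, start, end, protein) for every ORF on both strands.

    An ORF runs from the first start codon in a frame to the next in-frame stop.
    start/end are 0-based, end-exclusive forward-strand coordinates (stop included);
    frames on the '-' strand are counted along the reverse complement.
    ORFs with no stop codon before the end of the sequence are not reported.
    """
    dna = dna.upper().replace("U", "T")
    n = len(dna)
    strands = (("+", dna), ("-", dna.translate(COMPLEMENT)[::-1]))
    orfs = []
    for strand, seq in strands:
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = seq[i:i + 3]
                if start is None:
                    if codon in starts:
                        start = i
                elif CODON_TABLE.get(codon) == "*":
                    if (i - start) // 3 >= min_codons:
                        # An alternative start codon (e.g. GTG) is still read as Met at initiation.
                        protein = "M" + "".join(
                            CODON_TABLE.get(seq[j:j + 3], "X") for j in range(start + 3, i, 3)
                        )
                        end = i + 3
                        if strand == "+":
                            orfs.append((strand, frame + 1, start, end, protein))
                        else:
                            orfs.append((strand, frame + 1, n - end, n - start, protein))
                    start = None
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATAGCC", min_codons=2))
```

Passing `starts=("ATG", "GTG")` mimics the start-codon choice offered by FramePlot. Real gene finders then rank such candidates statistically rather than by length alone.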
When you have identified a potential gene, you might want to determine its codon usage. The Codon Adaptation Index (CAI) is a technique for analyzing codon usage bias: it measures the deviation of a given protein-coding gene sequence with respect to a reference set of genes.
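Following Sharp & Li (1987), cited further below, each codon first receives a relative adaptiveness w. This is the count of that codon in the reference set divided by the count of the most-used synonymous codon. The CAI of a gene is then the geometric mean of w over its codons. Below is a minimal Python sketch that assumes the standard genetic code. It skips Met and Trp (single-codon amino acids) and stop codons. Codons absent from the reference set are given a count of 0.5 (a commonly used convention, assumed here) so that the logarithm stays defined.

```python
import math
from collections import Counter

# Assumption: standard genetic code (NCBI table 1); codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def codons(cds: str):
    """Yield successive in-frame codons (upper-case DNA) from a coding sequence."""
    cds = cds.upper().replace("U", "T")
    for i in range(0, len(cds) - 2, 3):
        yield cds[i:i + 3]


def relative_adaptiveness(reference_cds: list, pseudocount: float = 0.5) -> dict:
    """w for every codon: its count divided by the count of its most-used synonym."""
    counts = Counter(c for cds in reference_cds for c in codons(cds) if c in CODON_TABLE)
    weights = {}
    for aa in set(CODON_TABLE.values()):
        synonyms = [c for c, a in CODON_TABLE.items() if a == aa]
        # Assumption: unseen codons get `pseudocount` instead of 0.
        observed = {c: counts[c] or pseudocount for c in synonyms}
        top = max(observed.values())
        weights.update({c: observed[c] / top for c in synonyms})
    return weights


def cai(cds: str, weights: dict) -> float:
    """Geometric mean of w over the gene's codons, excluding Met, Trp, stops and ambiguous codons."""
    logs = [
        math.log(weights[c])
        for c in codons(cds)
        if c in CODON_TABLE and CODON_TABLE[c] not in ("M", "W", "*")
    ]
    if not logs:
        raise ValueError("no informative codons in the sequence")
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    w = relative_adaptiveness(["GCTGCTGCTGCC"])
    print(round(cai("GCTGCC", w), 4))
```

In practice, the reference set is a collection of highly expressed genes from the organism of interest. The E-CAI server listed below addresses whether a given CAI value is meaningful.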
For quantitative data on general codon usage in different cells, consult the Codon Usage Database (Kazusa DNA Research Institute, Japan). Unfortunately, the data are presented in frequency charts which have to be converted manually to % codon usage for specific amino acids. In addition, the data have not been updated since 2007. For information on codons, see DNA analysis (Codon Usage), which is part of The Sequence Manipulation Suite (Paul Stothard) at Bioinformatics.org/The Open Lab.
Inidon (Andre Villegas, Public Health Ontario, Canada) - this Java-based program reads GenBank *.ffn files (FASTA-formatted gene files) and reports the number and percentage usage of start codons. The *.ffn files for sequenced genomes can be downloaded from the GenBank genome site. For bacteriophages and other smaller genomes, locate the file using the "search genome" function at NCBI and select "Views - coding regions"; from the next screen use "Save - FASTA nucleotide." This program is currently unavailable online, but the Perl script can be downloaded from here.
CodonW - is designed to simplify the multivariate analysis (correspondence analysis) of codon and amino acid usage. It also calculates standard indices of codon usage. Details of this program are provided here.
CAI Calculator 2 (John Peden) - Codon usage is biased within and across genomes. The unequal frequency of codons results mainly from the overall base composition of the genome; however, some genes, such as those that are highly expressed, tend to exhibit stronger codon bias. Sharp & Li (1987) proposed using the codon adaptation index to evaluate how well a gene is adapted to the translational machinery. CAI is a single value that summarizes the codon usage of a gene relative to the codon usage of a reference set of genes. A higher CAI value usually suggests that the gene of interest is likely to be highly expressed. This site offers the choice of the Sharp & Li (1987) or Eyre-Walker (1996) equations for calculating CAI.
CAIcal - performs several computations in relation to codon usage and the codon adaptation of DNA or RNA sequences to host organisms. (Reference: Puigbo, P. et al. 2008. Biology Direct 3:38).
E-CAI (Expected CAI calculation) - calculates the expected value of the Codon Adaptation Index (CAI) for a set of query sequences by generating random sequences with similar G+C content and amino acid composition to the input. This expected CAI therefore provides a direct threshold value for discerning whether the differences in the CAI value are statistically significant and arise from the codon preferences or whether they are merely artifacts that arise from internal biases in the G+C composition and/or amino acid composition of the query sequences. (Reference: Puigbo, P. et al. 2008. BMC Bioinformatics 9:65).
gcua - Graphical Codon Usage (Universität Regensburg Naturwissenschaftliche Fakultät III, Germany) - offers three possibilities: (a) each triplet position vs usage table - the fraction of usage of each codon in the selected organism is presented; (b) each codon vs. usage table - the fraction of usage of each codon in the submitted sequence will be computed and plotted against the fraction of usage of the codon in the selected organism; and, (c) compare two usage tables - submit or choose two codon usage tables. The fraction of usage of each codon in the submitted usage tables will be compared graphically.
Amino Acid and Codon Usage Statistics (Institute of Bioinformatics, University of Georgia, U.S.A.) - a very useful resource.
CodonO - synonymous codon usage biases are associated with various biological factors, such as gene expression level, gene length, gene translation initiation signal, protein amino acid composition, protein structure, tRNA abundance, mutation frequency and patterns, and GC composition. CodonO is a user-friendly tool for codon usage bias analyses across and within genomes in real time. (Reference: M.C. Angellotti et al. 2007. Nucl. Acids Res. 35 (Web Server issue): W132-W136)
Rare codon analysis (GenScript USA Inc.) - it is extremely useful to analyze your coding sequences for codon usage prior to attempting protein expression. This tool offers two bacteria (E. coli & Streptomyces), a variety of plants (Nicotiana & Arabidopsis), animals (human & insects) and yeasts (Pichia & Saccharomyces).
PAL2NAL - a program that converts a multiple sequence alignment of proteins and the corresponding DNA (or mRNA) sequences into a codon alignment. The program automatically assigns the corresponding codon sequence even if the input DNA sequence has mismatches with the input protein sequence, or contains UTRs or poly(A) tails. It can also deal with frameshifts in the input alignment, which makes it suitable for the analysis of pseudogenes. The resulting codon alignment can then be used to calculate synonymous (dS) and non-synonymous (dN) substitution rates. (Reference: Suyama M et al. 2006. Nucleic Acids Res. 34: W609-W612).
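The manual conversion mentioned above for the Codon Usage Database is a simple per-amino-acid normalization: frequencies per thousand codons become % usage among the synonymous codons for each amino acid. Below is a minimal sketch that assumes the table has been copied into a Python dict keyed by codon (RNA or DNA letters) and that the standard genetic code applies. The example values are illustrative only, not taken from any organism.

```python
from collections import defaultdict

# Assumption: standard genetic code (NCBI table 1); codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def percent_usage(freq_per_thousand: dict) -> dict:
    """Convert {codon: frequency per thousand} into {codon: % of that amino acid's codons}."""
    # Kazusa-style tables use RNA codons (UUU); normalize them to DNA (TTT).
    table = {codon.upper().replace("U", "T"): f for codon, f in freq_per_thousand.items()}
    totals = defaultdict(float)
    for codon, f in table.items():
        totals[CODON_TABLE[codon]] += f
    return {
        codon: 100.0 * f / totals[CODON_TABLE[codon]] if totals[CODON_TABLE[codon]] else 0.0
        for codon, f in table.items()
    }


if __name__ == "__main__":
    example = {"UUU": 20.0, "UUC": 30.0, "AUG": 25.0}  # illustrative values only
    for codon, pct in sorted(percent_usage(example).items()):
        print(codon, f"{pct:.1f}%")
```

With the illustrative values, TTT and TTC come out at 40% and 60% of Phe codons, and ATG at 100% of Met codons.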
If you want to express a gene in an organism with different codon usage:
JCat - Codon Adapter Tool - offers a complete range of eukaryotic & prokaryotic cells; and, the ability to select against rho-independent terminators and restriction sites. (Reference: A. Grote et al. 2005. Nucl. Acids Res. 33: W526-W531).
OPTIMIZER: a web server for optimizing the codon usage of DNA sequences - one can use pre-computed tables from more than 150 prokaryotic species under strong translational selection. Three methods of optimization are available: the 'one amino acid - one codon' approach, a random approach, or an intermediate one. Several options (such as avoiding specific restriction sites) and several outputs are also available. This server can be useful for predicting and optimizing the expression level of a gene in heterologous gene expression. (Reference: P. Puigbò et al. 2007. Nucl. Acids Res. 35:
RBS Calculator - the authors developed a biophysical model employing thermodynamic first principles and a four-parameter free energy model to accurately predict the ribosome’s translation initiation rates for 136 synthetic 5′ UTRs with large structures, diverse shapes and multiple standby site modules. The model predicts, and experiments confirm, that the ribosome can readily bind distant standby site modules that support high translation rates, providing a physical mechanism for observed context effects and long-range post-transcriptional regulation. (Reference: A. E. Borujeni, et al. 2014. Nucleic Acids Research; 42 (4): 2646–2659,
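OPTIMIZER's 'one amino acid - one codon' method, described above, can be sketched in a few lines. For each amino acid, always use the host's most frequent codon, then check the result for restriction sites you want to avoid. Below is a minimal sketch that assumes a host codon usage table given as a Python dict. The table values and the EcoRI example are illustrative only, and the function names are hypothetical. Unlike OPTIMIZER and JCat, it only reports unwanted sites rather than redesigning around them. It also ignores mRNA structure and the translation-initiation effects modelled by the RBS Calculator.

```python
# Assumption: standard genetic code (NCBI table 1); codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def preferred_codons(usage: dict) -> dict:
    """Pick the most frequent codon for each amino acid ('*' = stop) from a usage table."""
    best = {}
    for codon, value in usage.items():
        codon = codon.upper().replace("U", "T")
        aa = CODON_TABLE[codon]
        if aa not in best or value > best[aa][1]:
            best[aa] = (codon, value)
    return {aa: codon for aa, (codon, _) in best.items()}


def optimize_one_codon(protein: str, usage: dict) -> str:
    """Back-translate `protein` using only the host's preferred codon for each amino acid."""
    table = preferred_codons(usage)
    missing = sorted(set(protein.upper()) - set(table))
    if missing:
        raise ValueError(f"usage table has no codons for: {', '.join(missing)}")
    return "".join(table[aa] for aa in protein.upper())


def find_sites(dna: str, sites: dict) -> dict:
    """Return {enzyme: [0-based positions]} for each recognition sequence found in `dna`."""
    hits = {}
    for name, site in sites.items():
        positions = [i for i in range(len(dna) - len(site) + 1) if dna.startswith(site, i)]
        if positions:
            hits[name] = positions
    return hits


if __name__ == "__main__":
    # Illustrative usage values (per thousand) for a hypothetical host.
    usage = {"ATG": 27.0, "AAA": 33.0, "AAG": 10.0, "GAA": 39.0, "GAG": 18.0,
             "TTT": 15.0, "TTC": 25.0, "TAA": 2.0, "TGA": 1.0}
    dna = optimize_one_codon("MKEF*", usage)
    print(dna, find_sites(dna, {"EcoRI": "GAATTC"}))
```

The example shows why site checking matters: choosing the "best" codons for Glu-Phe creates GAATTC, an EcoRI site. A random or intermediate approach, as offered by OPTIMIZER, gives more freedom to avoid such sites.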