Convert Sequence
Several sites are available for conversion of sequence from one format to another. These include:
Genome2D Genome Tools
Genome2D Genome Tools (Dr. Anne de Jong, Molecular Genetics, University of Groningen, The Netherlands) - this is my go-to site for all manner of analyses. Under "Genome Tools" select "Conversions." This will allow you to convert a GenBank flatfile (gbk) to GFF (General Feature Format, table), CDS (coding sequences), Proteins (FASTA Amino Acids, faa), DNA sequence (Fasta format).
Galaxy
Galaxy is an open, web-based platform for accessible, reproducible, and transparent computational biomedical research. This web server makes analysis tools, genomic data, tutorial demonstrations, persistent workspaces, and publication services available to any scientist. Extensive user documentation applicable to any public or local Galaxy instance is available. It offers a huge variety of tools for analysis and file interconversion.
Sequence Manipulation Suite (SMS)
Sequence Manipulation Suite (SMS) - this program allows you to remove digits and blank spaces from a sequence to make it suitable for other applications. Also found here.
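The clean-up step described above (removing digits and blank spaces) is easy to sketch locally. The snippet below is only an illustration and not the SMS implementation; the function name `clean_sequence` is hypothetical.

```python
import re


def clean_sequence(raw: str) -> str:
    """Remove digits and all whitespace (spaces, tabs, line breaks) from a raw sequence."""
    return re.sub(r"[\d\s]+", "", raw)


# A fragment copied from a numbered sequence listing (made-up example data).
raw = """        1 atgaccatga ttacgccaag
       21 cttgcatgcc"""
print(clean_sequence(raw))  # atgaccatgattacgccaagcttgcatgcc
```

Only digits and whitespace are stripped here; other stray characters (HTML tags, punctuation) would need additional rules.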
Sequence conversion
Sequence conversion (Bioinf @ Bugaco) - a huge suite of conversion tools.
Readseq
Readseq, developed by D.G. Gilbert (Indiana University), reads and converts biosequences between a selection of common biological sequence formats, including EMBL, GenBank and FASTA. It is available here.
EMBOSS Seqret
EMBOSS Seqret reads and writes (returns) sequences. It is useful for a variety of tasks, including extracting sequences from databases, displaying sequences, reformatting sequences, producing the reverse complement of a sequence, extracting fragments of a sequence, sequence case conversion or any combination of the above functions.
(Reference: Olson SA (2002) Brief Bioinform 3(1):87-91).
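Three of the operations listed above (reverse complement, fragment extraction, case conversion) can be sketched in a few lines of plain Python. This is not how Seqret is implemented; the helper names are hypothetical, and only unambiguous A/C/G/T bases are handled.

```python
# Translation table for unambiguous DNA bases, preserving case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_fragment(seq: str, start: int, end: int) -> str:
    """Return positions start..end, using 1-based inclusive coordinates."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("coordinates fall outside the sequence")
    return seq[start - 1:end]


seq = "ATGCCG"
print(reverse_complement(seq))      # CGGCAT
print(extract_fragment(seq, 2, 4))  # TGC
print(seq.lower())                  # atgccg
```

IUPAC ambiguity codes (N, R, Y and so on) pass through `reverse_complement` unchanged in this sketch, so they would need to be added to the table for real data.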
Format Converter
Format Converter - This program takes as input a sequence or sequences (e.g., an alignment) in an unspecified format and converts the sequence(s) to a different user-specified format. Also converts *.gbk to *.gff3.
ApolloRNA Convert data
ApolloRNA Convert data - Transforms TransTermHP, CRISPRfinder, MOSAIC, PatScan, DARN! (GFF), and GenBank output data into GFF and GAME XML formats that can be read by Apollo.
GenBank Trans Extractor
GenBank Trans Extractor accepts a GenBank file as input and returns each of the protein translations described in the file in FASTA format. GenBank Trans Extractor should be used when you are more interested in the predicted protein translations of a DNA sequence than the DNA sequence itself. Part of the Sequence Manipulation Suite.
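For readers who want to do the same thing offline, the following is a rough sketch of pulling `/translation` qualifiers out of a GenBank flatfile and writing them as FASTA. It is not the Sequence Manipulation Suite code. It assumes the standard flatfile layout (feature keys indented by five spaces, qualifiers indented further), and the function names and the `SAMPLE` text are invented for illustration.

```python
import re


def extract_translations(genbank_text):
    """Return (label, protein) pairs for every CDS feature with a /translation qualifier."""
    proteins = []
    cds_count = 0
    # Feature keys follow exactly five spaces; qualifier lines are indented further,
    # so this split yields one chunk per feature.
    for feature in re.split(r"\n {5}(?=\S)", genbank_text):
        if not re.match(r"CDS\s", feature):
            continue
        cds_count += 1
        translation = re.search(r'/translation="([^"]+)"', feature)
        if translation is None:
            continue  # some CDS features carry no /translation qualifier
        name = re.search(r'/(?:locus_tag|protein_id)="([^"]+)"', feature)
        label = name.group(1) if name else f"cds_{cds_count}"
        # Translations wrap across lines; drop the line breaks and indentation.
        proteins.append((label, re.sub(r"\s+", "", translation.group(1))))
    return proteins


def to_fasta(pairs, width=60):
    """Format (label, sequence) pairs as FASTA text with wrapped sequence lines."""
    lines = []
    for label, seq in pairs:
        lines.append(f">{label}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines)


# Made-up feature table fragment used only to exercise the functions.
SAMPLE = (
    "FEATURES             Location/Qualifiers\n"
    "     CDS             1..30\n"
    '                     /locus_tag="demo_0001"\n'
    '                     /translation="MKTAYIAKQR\n'
    '                     QISFVK"\n'
    "     CDS             complement(40..60)\n"
    '                     /translation="MSLLT"\n'
    "ORIGIN\n"
)
print(to_fasta(extract_translations(SAMPLE)))
```

A dedicated parser is a safer choice for production work, since a regular-expression approach like this one will not cope with every variation found in real GenBank records.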
FeatureExtract 1.2
FeatureExtract 1.2 - extracts sequence and feature annotation, such as intron/exon structure, from GenBank entries and other GenBank format files.
(Reference: R. Wernersson (2005) Nucleic Acids Res. 33(Web Server issue): W567-W569).
Sequence editor
Sequence editor (part of Shiladitya DasSarma's HaloWeb: The Haloarchaeal Genomes Database) - converts DNA and RNA sequences. Generates antiparallel, complement and inverse sequences.
Format conversion
Format conversion - the data type (single sequence, set of sequences, alignment, tree, matrix, ...) and format are automatically recognized. (Output: FASTA, NEXUS, PHYLIP, Clustal, EMBL, Newick, New Hampshire).
Fasta dataset splitter
Fasta dataset splitter - Part of FaBox (see below)
GenBank 2 Sequin
GenBank 2 Sequin (P. Lehwark & S. Greiner, Max-Planck Institute for Molecular Plant Physiology, Germany) - this extremely useful program is designed to convert revised GeSeq output into the Sequin format, which used to be required for NCBI submission. It also generates formats which can be used for small genome submissions. Nonetheless, any custom GenBank file can be prepared for NCBI submission using GenBank 2 Sequin.
(Reference: Lehwark P & Greiner S. (2019) Genomics. 111(4): 759-761).
JaMBW
JaMBW (European Molecular Biology Laboratory of Heidelberg, Germany) - Java based Molecular Biologist's Workbench. Select Chapter 1 for sequence format conversion (upper <-> lower case; T <-> U; reverse or complement sequence).
(Reference: Toldo L (1997) Comput Appl Biosci 13(4):475-476).
Nucleic Acid Sequence Massager
Nucleic Acid Sequence Massager (Allotron Biosensor Corporation) - in addition to removing spurious material (numbers, breaks, HTML, spaces), it changes the format (upper to lower case, complement, reverse, RNA to DNA, and triplets).
extractUpStreamDNA
extractUpStreamDNA (A. Villegas, Public Health Ontario) - takes a GenBank flatfile (*.gbk) as input and, for every CDS that it finds, extracts a pre-determined length of upstream DNA (the length is supplied as an argument and will include 3 nt for the initiation codon). The output is an FFN file of these upstream DNA sequences. N.B. this only works for prokaryotic sequences because it does not handle the splits or joins found in eukaryotic sequences. The output can then be analyzed with programs such as MEME. This program is temporarily unavailable online, though one can download it from here.
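The coordinate arithmetic behind this kind of extraction can be sketched as follows. This is not the extractUpStreamDNA source; the function name and arguments are hypothetical. It assumes 1-based inclusive CDS coordinates on the top strand, simple (unspliced) features, and one particular reading of the description: the result is `length` upstream nucleotides followed by the 3-nt initiation codon.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def upstream_region(genome: str, start: int, end: int, strand: int, length: int) -> str:
    """Return up to `length` nt upstream of a CDS plus its 3-nt start codon.

    start/end are 1-based inclusive top-strand coordinates; strand is +1 or -1.
    The result is always reported 5'->3' relative to the gene.
    """
    if strand == 1:
        # Upstream lies to the left of `start`; clip at the start of the sequence.
        left = max(0, start - 1 - length)
        return genome[left:start - 1 + 3]
    if strand == -1:
        # On the complement strand the start codon occupies end-2..end,
        # and upstream lies to the right of `end` on the top strand.
        right = min(len(genome), end + length)
        return genome[end - 3:right].translate(_COMPLEMENT)[::-1]
    raise ValueError("strand must be +1 or -1")


# Forward-strand CDS (ATGAAATAA) at positions 13..21 of a made-up sequence.
print(upstream_region("TTGACAGCTAGCATGAAATAA", 13, 21, 1, 5))   # CTAGCATG
# The same CDS encoded on the complement strand at positions 1..9.
print(upstream_region("TTATTTCATGCTAGC", 1, 9, -1, 4))          # TAGCATG
```

Genes near the end of a linear sequence yield a shorter region in this sketch; a circular genome would need wrap-around handling instead of clipping.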
Convert GenBank to Fasta
Convert GenBank to Fasta (G. Rocap, School of Oceanography, University of Washington, U.S.A.) - Select a GenBank formatted file containing a feature table. Select whether to extract translated peptide sequences, DNA sequence for each feature, or the entire DNA sequence of the whole record. If you choose "Peptide Sequence", your feature table must have "translation" sub-features.
GenBank-JSON-Conversion
GenBank-JSON-Conversion - this converter accepts multi-sequence GenBank, DDBJ, or EMBL/ENA files (*.gb, *.gbk, *.gbf, *.genbank, *.gbff, *.ena, *.embl or *.txt). Alternatively, enter NCBI accession number(s) or upload them in a *.csv file.
FaBox
FaBox (Palle Villesen Fredsted, Aarhus University, Denmark) - an online fasta sequence toolbox, including Fasta header editor, Fasta header replacer, Fasta sequence extractor, Fasta sequence subtractor, Fasta sequence joiner, and Fasta dataset splitter/divider.
(Reference: Villesen P (2007) Molecular Ecology Notes 7 (6), 965-968).
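As an offline illustration of what a FASTA dataset splitter does, here is a minimal sketch that parses FASTA text and divides the records into batches of a fixed size. It is not FaBox code; `parse_fasta`, `split_records` and the example data are assumptions made for this sketch.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_records(records, per_file):
    """Group records into consecutive batches of at most `per_file` entries."""
    if per_file < 1:
        raise ValueError("per_file must be at least 1")
    records = list(records)
    return [records[i:i + per_file] for i in range(0, len(records), per_file)]


fasta = ">seq1\nATGC\nGGTA\n>seq2\nTTAA\n>seq3\nCCGG\n"
for number, batch in enumerate(split_records(parse_fasta(fasta), 2), start=1):
    print(f"batch {number}: {[header for header, _ in batch]}")
```

Each batch could then be written to its own file; that step is left out so the sketch does not touch the file system.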
SeqScrub
SeqScrub - a web application that cleans up FASTA file headers and appends information from external databases.
(Reference: Foley G et al. (2019) BioTechniques 67(2): 50-54).
PAL2NAL
PAL2NAL - a program that converts a multiple sequence alignment of proteins and the corresponding DNA (or mRNA) sequences into a codon alignment. The program automatically assigns the corresponding codon sequence even if the input DNA sequence has mismatches with the input protein sequence, or contains UTRs or polyA tails. It can also deal with frame shifts in the input alignment, which is suitable for the analysis of pseudogenes. The resulting codon alignment can further be subjected to the calculation of synonymous (dS) and non-synonymous (dN) substitution rates.
(Reference: Suyama M et al. (2006) Nucl Acids Res 34: W609-W612).
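The core idea of a codon alignment, threading codons onto a gapped protein sequence, can be shown with a deliberately simplified sketch. Unlike PAL2NAL, it assumes the coding sequence matches the protein exactly, starts in frame, and has no UTRs, mismatches or frame shifts; the function name `thread_codons` is hypothetical.

```python
def thread_codons(aligned_protein: str, cds: str) -> str:
    """Map an ungapped coding sequence onto a gapped protein sequence, codon by codon."""
    # Split the CDS into complete codons, ignoring any trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(codons) < residues:
        raise ValueError("coding sequence is too short for the protein")
    aligned = []
    position = 0
    for aa in aligned_protein:
        if aa == "-":
            aligned.append("---")  # one protein gap becomes three nucleotide gaps
        else:
            aligned.append(codons[position])
            position += 1
    return "".join(aligned)


print(thread_codons("MK-L", "ATGAAACTG"))  # ATGAAA---CTG
```

Applying the function to each row of a protein alignment gives rows of equal length, because every alignment column expands to exactly three characters. Any codons left over after the last residue (for example a stop codon) are simply dropped.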
Shuffle DNA and Sequence Randomizer
Shuffle DNA and Sequence Randomizer permit one to randomize a sequence to compare with one's own.
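A shuffled control of this kind keeps the composition of the original sequence while destroying its order. The sketch below shows one simple way to produce it and is not taken from either tool; `shuffle_sequence` and its `seed` parameter are assumptions for illustration.

```python
import random


def shuffle_sequence(seq: str, seed=None) -> str:
    """Return a random permutation of the residues in `seq`."""
    rng = random.Random(seed)  # a fixed seed makes the shuffle reproducible
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


original = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
control = shuffle_sequence(original, seed=1)
print(control)
# Same length and base counts as the original, different order.
print(sorted(control) == sorted(original))  # True
```

This preserves single-residue frequencies only; preserving dinucleotide or codon frequencies requires a different shuffling scheme.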
Random Sequence Generator
Random Sequence Generator (Vladimír Cermák, molbiotools.com) - generate random DNA, RNA or protein sequences. Based on the Mersenne Twister algorithm. No unwanted repeats are generated even in very long sequences. Can be used for calculations of DNA, RNA and protein molecular weights and for string reverse and complement transformations.
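Python's standard `random` module also uses the Mersenne Twister as its core generator, so a comparable random sequence can be drawn locally. This sketch is not the molbiotools.com code; the function name, default alphabet and `seed` parameter are assumptions, and every residue is drawn with equal probability.

```python
import random

DNA = "ACGT"
RNA = "ACGU"
PROTEIN = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def random_sequence(length: int, alphabet: str = DNA, seed=None) -> str:
    """Draw `length` residues uniformly and independently from `alphabet`."""
    if length < 0:
        raise ValueError("length must be non-negative")
    rng = random.Random(seed)  # Mersenne Twister under the hood
    return "".join(rng.choices(alphabet, k=length))


print(random_sequence(30, DNA, seed=42))
print(random_sequence(30, PROTEIN, seed=42))
```

Passing the same seed reproduces the same sequence, which is convenient when a random control has to be regenerated later.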
Updated: February, 2026