PROTEIN CHEMISTRY
BACKGROUND INFORMATION: You might want to consult Robert Russell's Guide to Structure Prediction. For the biochemical properties of amino acids see PROWL, Amino Acid Hydrophobicity and Amino Acid Chart and Reference Table (GenScript). If you are specifically interested in antibodies I would recommend that you visit "The Antibody Resource Page."
Table of amino acid abbreviations
Amino acid composition, mass & pI:
Amino acid composition & Mass – ProtParam (ExPASy, Switzerland)
Isoelectric Point - Compute pI/Mw tool (ExPASy, Switzerland). If you want a plot of the relationship between charge and pH use ProteinChemist (ProteinChemist.com) or JVirGel Proteomic Tools (PRODORIC Net, Germany).
Mass, pI, composition and mol% acidic, basic, aromatic, polar etc. amino acids - PEPSTATS (EMBOSS). Biochemistry-online (Vitalonic, Russia) gives the % composition, molecular weight, pI, and charge at any desired pH.
Peptide Molecular Weight Calculator (GenScript) - the online calculator determines the chemical formula and molecular weight of your peptide of interest. You can also specify post-translational modifications, such as N- and C- terminal modifications and positioning of disulfide bridges, to obtain more accurate outputs.
Isoelectric Point Calculator 2.0 (IPC 2.0) - is a server for the prediction of isoelectric points and pKa values using a mixture of deep learning and support vector regression models. The prediction accuracy (RMSD) of IPC 2.0 for proteins and peptides outperforms previous algorithms. (Reference: Kozlowski LP (2021) Nucl. Acids Res. Web Server issue).
Composition/Molecular Weight Calculation (Georgetown University Medical Center, U.S.A.) - the only problem with this site is that when run in batch mode it does not identify the sequence by name, merely sequential number
Batch Protein Isoelectric Point determination - part of the Sequence Manipulation Suite or ENDMEMO
Batch Protein Molecular Weight determination - part of the Sequence Manipulation Suite or ENDMEMO
Protein calculator (C. Putnam, The Scripps Research Institute, U.S.A.) - calculates mass, pI, charge at a given pH, counts amino acid residues etc.
Tm Predictor (P.C. Lyu Lab., National Tsing-Hua University, Taiwan) - calculates the theoretical protein melting temperature.
Computation of size of DNA and Protein Fragments from Their Electrophoretic Mobility (Reference: Raghava, G. P. S. 2001. Biotech Software and Internet Report 2:198-200).
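As a rough illustration of what the composition and pI tools listed above compute, the following Python sketch counts residues and estimates the net charge and isoelectric point from the Henderson-Hasselbalch equation. The pKa values are an illustrative assumption (each server uses its own set, so results will differ from tool to tool), and the function names are hypothetical.

```python
from collections import Counter

# Illustrative pKa values (an assumption; every tool uses its own set).
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERMINUS_PKA = 8.6
C_TERMINUS_PKA = 3.6


def composition(sequence):
    """Return the percentage of each residue in the sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in sorted(counts.items())}


def net_charge(sequence, ph):
    """Estimate the net charge at a given pH (Henderson-Hasselbalch)."""
    # Free N-terminus contributes a positive charge, free C-terminus a negative one.
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERMINUS_PKA))
    charge -= 1.0 / (1.0 + 10 ** (C_TERMINUS_PKA - ph))
    for aa in sequence.upper():
        if aa in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[aa] - ph))
    return charge


def isoelectric_point(sequence, tolerance=1e-4):
    """Find the pH where the net charge is zero by bisection on pH 0-14."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        # Net charge falls as pH rises, so a positive charge means pI is higher.
        if net_charge(sequence, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


if __name__ == "__main__":
    peptide = "ACDEFGHIKLMNPQRSTVWY"
    print(composition(peptide))
    print(round(net_charge(peptide, 7.0), 2))
    print(round(isoelectric_point(peptide), 2))
```

Evaluating `net_charge` over a range of pH values gives the kind of charge versus pH curve that some of the servers plot. The sketch ignores post-translational modifications and any shift of pKa values in the folded protein.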
Antigenicity and allergenicity: a good place to start would be The Immune Epitope Database (IEDB)
Abie Pro Peptide Antibody Design (Chang Bioscience)
Allergenicity servers: AllerTOP (Reference: Dimitrov, I. et al. 2013. BMC Bioinformatics 14(Suppl 6): S4), AlgPred - prediction of allergenic proteins and mapping of IgE epitopes (Reference: Saha, S. and Raghava, G.P.S. 2006. Nucleic Acids Research 34: W202-W209.), and SDAP - Structural Database of Allergenic Proteins (Reference: Ivanciuc, O. et al. 2003. Nucleic Acids Res. 31: 359-362).
EpiToolKit - is a virtual workbench for immunological questions with a focus on vaccine design. It offers an array of immunoinformatics tools covering MHC genotyping, epitope and neo-epitope prediction, epitope selection for vaccine design, and epitope assembly. In its recently re-implemented version 2.0, EpiToolKit provides a range of new functionality and for the first time allows combining tools into complex workflows. For inexperienced users it offers simplified interfaces to guide the users through the analysis of complex immunological data sets. (Reference: Schubert S et al. (2015) Bioinformatics 31(13): 2211–2213).
VIOLIN - Vaccine Investigation and OnLine Information Network - allows easy curation, comparison and analysis of vaccine-related research data across various human pathogens. VIOLIN is expected to become a centralized source of vaccine information and to provide investigators in basic and clinical sciences with curated data and bioinformatics tools for vaccine research and development. VBLAST: Customized BLAST Search for Vaccine Research allows various search strategies against 77 genomes of 34 pathogens. (Reference: He, Y. et al. 2014. Nucleic Acids Res. 42(Database issue): D1124-32).
SVMTriP - is a new method to predict antigenic epitopes using the latest sequence input from the IEDB database. It uses a Support Vector Machine (SVM) that combines Tri-peptide similarity and Propensity scores (hence SVMTriP) to achieve better prediction performance. Moreover, SVMTriP is capable of recognizing viral peptides from a human protein sequence background. (Reference: Yao B et al. (2012) PLoS One 7(9): e45152).
Solubility and crystallizability:
EnzymeMiner - offers automated mining of soluble enzymes with diverse structures, catalytic properties and stabilities. The solubility prediction employs the in-house SoluProt predictor developed using machine learning. (Reference: Hon J et al. 2020. Nucl Acids Res 48 (W1): W104–W109).
ESPRESSO (EStimation of PRotein ExpreSsion and SOlubility) - is a sequence-based predictor for estimating protein expression and solubility for three different protein expression systems: in vivo Escherichia coli, Brevibacillus, and wheat germ cell-free. (Reference: Hirose S, & Noguchi T. 2013. Proteomics. 13:1444-1456).
SABLE - Accurate sequence-based prediction of relative Solvent AccessiBiLitiEs, secondary structures and transmembrane domains for proteins of unknown structure. (Reference: Adamczak R et al. 2004. Proteins 56:753-767).
SPpred (Soluble Protein prediction) (Bioinformatics Center, Institute of Microbial Technology, Chandigarh, India) - is a web server for predicting the solubility of a protein on overexpression in E. coli. The prediction is made by a hybrid SVM model trained on the PSSM profile generated by a PSI-BLAST search of the 'nr' protein database and on split amino acid composition.
Protein–Sol - is a web server for predicting protein solubility. Using available data for Escherichia coli protein solubility in a cell-free expression system, 35 sequence-based properties are calculated. Feature weights are determined from separation of low and high solubility subsets. The model returns a predicted solubility and an indication of the features which deviate most from average values. (Reference: Hebditch M et al. 2017. Bioinformatics 33(19): 3098–3100).
CamSol - for the rational design of protein variants with enhanced solubility. The method works by performing a rapid computational screening of tens of thousands of mutations to identify those with the greatest impact on the solubility of the target protein while maintaining its native state and biological activity. (Reference: Sormanni P et al. (2015) J Molec Biol 427(2): 478-490). N.B. Requires registration.
Surface Entropy Reduction prediction (SERp) - this exploratory tool aims to aid identification of sites that are most suitable for mutation designed to enhance crystallizability by a Surface Entropy Reduction approach. (Reference: Goldschmidt L. et al. 2007. Protein Science. 16:1569-1576).
CRYSTALP2 - for in-silico prediction of protein crystallization propensity. (Reference: Kurgan L, et al. 2009. BMC Structural Biology 9: 50); and, PPCpred - sequence-based prediction of propensity for production of diffraction-quality crystals, production of crystals, purification and production of the protein material. (Reference: M.J. Mizianty & L. Kurgan. 2011. Bioinformatics 27: i24-i33).
Antimicrobial peptides, vaccines and toxins:
APD (Antimicrobial Peptide Database) (Reference: Wang, Z. and Wang, G. 2004. Nucl. Acids Res. 32: D590-D592).
The Type III Secretion System (T3SS) is an essential mechanism for host-pathogen interaction in the infection process. The proteins secreted through the T3SS machinery of many Gram-negative bacteria are known as T3SS effectors (T3SEs). These can either be localized subcellularly in the host, or be part of the needle tip of the T3SS that interacts directly with the host membrane to bring other effectors into the target cell. T3SEdb is an effort to assemble a comprehensive database of all experimentally determined and putative T3SEs into a web-accessible site. BLAST search is available. (Reference: Tay DM et al. 2010. BMC Bioinformatics. 11 Suppl 7:S4).
Effective (University of Vienna, Austria & Technical University of Munich, Germany) - Bacterial protein secretion is the key virulence mechanism of symbiotic and pathogenic bacteria, in which effector proteins are transported from the bacterial cytosol into the extracellular medium or directly into the eukaryotic host cell. The Effective portal provides precalculated predictions of bacterial effectors in all publicly available pathogenic and symbiotic genomes, and also lets users predict effectors in their own protein sequence data.
Vaxign is the first web-based vaccine design system that predicts vaccine targets based on genome sequences using the strategy of reverse vaccinology. Predicted features in the Vaxign pipeline include protein subcellular location, transmembrane helices, adhesin probability, conservation to human and/or mouse proteins, sequence exclusion from genome(s) of nonpathogenic strain(s), and epitope binding to MHC class I and class II. The precomputed Vaxign database contains prediction of vaccine targets for >350 genomes. (Reference: He Y et al. 2010. J Biomed Biotechnol. 2010: 297505). A newer version Vaxign 2 Beta is available here.
VacTarBac is a platform which stores vaccine candidates against several pathogenic bacteria. The vaccine candidates are designed on the basis of their probability of acting as epitopes, and thus have the potential to induce any of the several arms of the immune system. These epitopes have been predicted against the virulence factors and essential genes of 14 bacterial species. (Reference: Nagpal G et al. (2018) Front Immunol. 9: 2280).
Abpred - will take a single amino acid sequence for a Fv and calculate the predicted performance on 12 biophysical platforms (Reference: Hebditch M & J Warwicker (2019) PeerJ. 7: e8199).
T3SE - Type III secretion system effector prediction (Reference: Löwer M, & Schneider G. 2009. PLoS One. 4:e5917. Erratum in: PLoS One. 2009;4(7)).
SIEVE Server is a public web tool for prediction of type III secreted effectors. The SIEVE Server scores potential secreted effectors from genomes of bacterial pathogens with type III secretion systems using a model learned from known secreted proteins. The SIEVE Server requires only protein sequences of proteins to be screened and returns a conservative probability that each input protein is a type III secreted effector. (Reference: McDermott JE et al. 2011. Infect Immun. 79:23-32).
Circular dichroism:
Circular Dichroism (Birkbeck College, School of Crystallography, England) - DICHROWEB is an interactive web site which allows the deconvolution of data from Circular Dichroism spectroscopy experiments. It offers an interface to a range of deconvolution algorithms (CONTINLL, SELCON3, CDSSTR, VARSLC, K2D).
K2D2: Prediction of percentages of protein secondary structure from CD spectra - allows analysis of 41 CD spectrum data points ranging from 200 nm to 240 nm or 51 data points for the 190-240 nm range (Reference: Perez-Iratxeta C & Andrade-Navarro MA. 2008. BMC Structural Biology 8:25).
K2D3 is a web server to estimate the α-helix and β-strand content of a protein from its circular dichroism spectrum. K2D3 uses a database of theoretical spectra derived with Dichrocalc (Reference: Louis-Jeune C et al. 2012. Proteins: Structure, Function, & Bioinformatics 80: 374–381).
Cysteine Residues:
DiANNA - will predict cysteine oxidation state (76% accuracy), cysteine pairs (81% accuracy) and disulfide bond connectivity (86% accuracy). (Reference: F. Ferrè & P. Clote. 2005. Nucl. Acids Res. 33: W230-W232).
CYSREDOX (Rockefeller University, U.S.A.) and CYSPRED (CIRB Biocomputing Group, University of Bologna, Italy) calculate the redox state of cysteine residues in proteins.
Hydrophobicity Plotter (Innovagen) and Protein Hydroplotter - select under Tools (ProteinLounge, San Diego, CA).
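For readers who want to see what a hydrophobicity plot is built from, here is a minimal Python sketch of a sliding-window hydropathy profile using the Kyte-Doolittle scale. The window size and function name are assumptions chosen for illustration; the plotters listed above may use other scales and window settings.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Return the mean hydropathy of every full window along the sequence."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    if window > len(values):
        return []
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    # Each value corresponds to the window starting at that position.
    for start, score in enumerate(hydropathy_profile("MKTLLILAVVAAALAKRDEE", 9)):
        print(start + 1, round(score, 2))
```

Plotting the returned values against window position gives the familiar profile, where sustained positive stretches suggest hydrophobic segments. Residues outside the 20 standard letters raise a `KeyError` in this sketch.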
Proteolysis and Mass Spectrometry:
Proteolysis - PeptideCutter (ExPASy, Switzerland) which also predicts cleavage sites for enzymes and chemicals. An alternative proteolysis site is Mobility_plot 4.1 (Advanced Proteolytic Fingerprinting, IGH, France).
For more sophisticated protein analysis involving mass spectrometry ExPASy has introduced FindMod to predict potential protein post-translational modifications in peptides; and, GlycoMod which can predict the possible oligosaccharide structures that occur on proteins from their experimentally determined masses.
ProFound - is a tool for searching a protein sequence database using information from mass spectra of peptide maps. A Bayesian algorithm is used to rank the protein sequences in the database according to their probability of producing the peptide map. A simplified version can be accessed here (Rockefeller University, New York, U.S.A.). One cannot use one's own protein database.
ProteinProspector (University of California) - offers a wide variety of tools (e.g. MS-Fit, MS-Tag, MS-Seq, MS-Pattern, MS-Homology) for the protein mass spectrometrist.
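To make the idea of cleavage-site prediction concrete, the sketch below performs a simple in-silico tryptic digest in Python. It assumes only the commonly quoted rule that trypsin cleaves after K or R unless the next residue is P; PeptideCutter and similar tools apply more detailed rules, so this is an approximation and the function name is hypothetical.

```python
def trypsin_digest(sequence):
    """Split a sequence after K or R, except when the next residue is P."""
    sequence = sequence.upper()
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    # Keep whatever remains after the last cleavage site.
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


if __name__ == "__main__":
    for peptide in trypsin_digest("MAKGGRPLLKDE"):
        print(peptide)
```

The resulting peptide list is the starting point for peptide mass fingerprinting, where calculated peptide masses are compared with those observed in a mass spectrum. Missed cleavages are not modelled here.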
Repeats:
Repeats in protein sequences can be discovered using Radar (Rapid Automatic Detection and Alignment of Repeats, European Bioinformatics Institute) or REPRO (Reference: George RA. & Heringa J. 2000. Trends Biochem. Sci. 25: 515-517).
REPPER (REPeats and their PERiodicities) - detects and analyzes regions with short gapless repeats in proteins. It finds periodicities by Fourier Transform (FTwin) and internal similarity analysis (REPwin). FTwin assigns numerical values to amino acids that reflect certain properties, for instance hydrophobicity, and gives information on corresponding periodicities. REPwin uses self-alignments and displays repeats that reveal significant internal similarities. They are complemented by PSIPRED and coiled coil prediction (COILS), making the server a useful analytical tool for fibrous proteins. (Reference: M. Gruber et al. 2005. Nucl. Acids Res. 33: W239-W243).
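The Fourier approach described for FTwin can be sketched in a few lines of Python: residues are converted to numbers and the strength of a given period is read from a discrete Fourier sum. This is not REPPER's implementation. The binary hydrophobic/polar encoding and the function names are simplifying assumptions made for illustration.

```python
import cmath
import math

# Simplified, hypothetical grouping: 1 for hydrophobic residues, 0 otherwise.
HYDROPHOBIC = set("AVILMFWC")


def periodicity_power(sequence, period):
    """Return the Fourier power of the hydrophobic pattern at one period."""
    values = [1.0 if aa in HYDROPHOBIC else 0.0 for aa in sequence.upper()]
    mean = sum(values) / len(values)
    # Subtract the mean so that overall composition does not dominate.
    total = sum(
        (value - mean) * cmath.exp(-2j * math.pi * position / period)
        for position, value in enumerate(values)
    )
    return abs(total) ** 2 / len(values)


def strongest_period(sequence, candidate_periods):
    """Return the candidate period (in residues) with the highest power."""
    return max(candidate_periods, key=lambda p: periodicity_power(sequence, p))


if __name__ == "__main__":
    alternating = "LK" * 20
    print(strongest_period(alternating, [2, 3, 3.5, 4, 5, 7]))
```

Scanning a range of periods and plotting the power gives a periodicity spectrum. A strictly alternating hydrophobic/polar sequence, as in the example, peaks at a period of two residues.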
Two-dimensional gels:
JVirGel calculation of virtual two-dimensional protein gels - creates virtual 2D proteomes from a huge list of eukaryotes & prokaryotes (or an individual protein). Two versions: html (limited) and Java applet (incredible, but you need to install the Java Runtime Environment). (Reference: K. Hiller et al. 2003. Nucl. Acids Res. 31: 3862-3865).
Draw Virtual Two-Dimensional Protein Gels (PRODORIC Net, Germany) - using your own protein sequence data or for different organisms.
Metasite:
Scratch Protein Predictor (Institute for Genomics and Bioinformatics, University of California, Irvine) - programs include: ACCpro: the relative solvent accessibility of protein residues; CMAPpro: Prediction of amino acid contact maps; COBEpro: Prediction of continuous B-cell epitopes; CONpro: predicts whether the number of contacts of each residue in a protein is above or below the average for that residue; DIpro: Prediction of disulphide bridges; DISpro: Prediction of disordered regions; DOMpro: Prediction of domains; SSpro: Prediction of protein secondary structure; SVMcon: Prediction of amino acid contact maps using Support Vector Machines; and, 3Dpro: Prediction of protein tertiary structure (Ab Initio).
Mutagenesis:
Gene Mutagenesis Designer (GenScript) is designed to make point mutagenesis of DNA straightforward. To design a mutation from the wild type, enter the wild-type gene sequence in the input field, then click the “from selection” button to select the amino acid(s) of interest. Clicking “submit” generates the new gene sequence encoding the mutated protein. You can select from a number of expression systems.
I-Mutant2.0: predictor of protein stability changes upon mutation - choose either a PDB reference number or paste your own protein. The answer (by email) indicates whether the protein is more or less stable, a fact which could be of use in designing "better" proteins. (Reference: E. Capriotti et al. 2005. Nucl. Acids Res. 33: W306-W310).
SIFT - The Sorting Intolerant from Tolerant (SIFT) algorithm predicts the effect of coding variants on protein function i.e. it predicts whether an amino acid substitution affects protein function based on sequence homology and the physical properties of amino acids. SIFT can be applied to naturally occurring nonsynonymous polymorphisms and laboratory-induced missense mutations. (Reference: N-L Sim et al. 2012. Nucleic Acids Research; 40(1): W452–W457).
mCSM-membrane - predicts the effects of mutations on transmembrane proteins. (Reference: Pires DEV et al. 2020. Nucl Acids Res 48 (W1): W147–W153).
PlaToLoCo (PLAtform of TOols for LOw COmplexity) - is the first web meta-server for visualization and annotation of low complexity regions in proteins which employs five different state-of-the-art tools for discovering LCRs and provides functional annotations such as domain detection, transmembrane segment prediction, and calculation of amino acid frequencies. (Reference: Jarnot P et al. 2020. Nucl Acids Res 48 (W1): W77–W84).