PROTEIN CHEMISTRY
BACKGROUND INFORMATION: You might want to consult Robert Russell's Guide to Structure Prediction. For the biochemical properties of amino acids see PROWL, Amino Acid Hydrophobicity and Amino Acid Chart and Reference Table (GenScript). If you are specifically interested in antibodies I would recommend that you visit "The Antibody Resource Page."
Table of amino acid abbreviations
Amino acid composition, mass & pI:
Amino acid composition & Mass – ProtParam (ExPASy, Switzerland)
Isoelectric Point - Compute pI/Mw tool (ExPASy, Switzerland). If you want a plot of the relationship between charge and pH use ProteinChemist (ProteinChemist.com) or JVirGel Proteomic Tools (PRODORIC Net, Germany).
Mass, pI, composition and mol% acidic, basic, aromatic, polar etc. amino acids - PEPSTATS (EMBOSS). Biochemistry-online (Vitalonic, Russia) reports the % composition, molecular weight, pI, and charge at any desired pH.
Peptide Molecular Weight Calculator (GenScript) - the online calculator determines the chemical formula and molecular weight of your peptide of interest. You can also specify post-translational modifications, such as N- and C- terminal modifications and positioning of disulfide bridges, to obtain more accurate outputs.
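The calculation behind these composition and mass tools can be sketched in a few lines. The example below is a minimal illustration, not the method of any specific server listed here: it sums standard average residue masses and adds one water for the free termini. The function names are hypothetical, and modifications such as disulfide bridges or terminal blocking groups are not handled.

```python
from collections import Counter

# Average residue masses in daltons (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # added once for the free N- and C-termini


def composition(seq):
    """Count each residue in a one-letter protein sequence (hypothetical helper)."""
    return Counter(seq.upper())


def molecular_weight(seq):
    """Average molecular weight of an unmodified linear peptide, in daltons."""
    seq = seq.upper()
    unknown = set(seq) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


if __name__ == "__main__":
    peptide = "ACDEFGHIK"  # arbitrary example sequence
    print(composition(peptide))
    print(round(molecular_weight(peptide), 2))
```

As a sanity check, molecular_weight("G") gives about 75.07 Da, the mass of free glycine.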
Isoelectric Point Calculator 2.0 (IPC 2.0) - is a server for the prediction of isoelectric points and pKa values using a mixture of deep learning and support vector regression models. The prediction accuracy (RMSD) of IPC 2.0 for proteins and peptides outperforms previous algorithms. (Reference: Kozlowski LP (2022) Nucl. Acids Res. Web Server issue 50(Issue D1): D1535–D1540).
Composition/Molecular Weight Calculation (Georgetown University Medical Center, U.S.A.) - the only problem with this site is that, when run in batch mode, it does not identify each sequence by name, merely by sequential number.
Batch Protein Isoelectric Point determination - part of the Sequence Manipulation Suite or ENDMEMO
Batch Protein Molecular Weight determination - part of the Sequence Manipulation Suite or ENDMEMO
Protein calculator (C. Putnam, The Scripps Research Institute, U.S.A.) - calculates mass, pI, charge at a given pH, counts amino acid residues etc.
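Charge-at-pH and pI calculators of this kind generally rest on the Henderson-Hasselbalch equation. The sketch below shows the idea under stated assumptions: the pKa values are one illustrative set (each tool uses its own, so results differ between servers), every ionizable group is treated as independent, and the function names are hypothetical. The pI is found by bisection, since net charge falls steadily as pH rises.

```python
# Illustrative pKa values (an assumption; published sets differ).
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA = 8.6
C_TERM_PKA = 3.6


def net_charge(seq, ph):
    """Net charge of a linear peptide at a given pH (Henderson-Hasselbalch)."""
    seq = seq.upper()
    # Free N-terminus contributes a positive charge, free C-terminus a negative one.
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))
    for aa in seq:
        if aa in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[aa] - ph))
    return charge


def isoelectric_point(seq, low=0.0, high=14.0, tolerance=1e-4):
    """Find the pH at which the net charge is zero, by bisection."""
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(seq, mid) > 0:
            low = mid   # still positive: the pI lies at a higher pH
        else:
            high = mid  # negative or zero: the pI lies at a lower pH
    return (low + high) / 2.0


if __name__ == "__main__":
    peptide = "ACDEFGHIK"  # arbitrary example sequence
    print(round(isoelectric_point(peptide), 2))
    # Charge-versus-pH table, the data behind a titration plot.
    for ph in range(2, 13, 2):
        print(ph, round(net_charge(peptide, ph), 2))
```

With these pKa values a lone glycine has a pI of 6.1, the midpoint of its two terminal pKa values. Real proteins deviate from this model because neighbouring groups and folding shift individual pKa values.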
Tm Predictor (P.C. Lyu Lab., National Tsing-Hua University, Taiwan) - calculates the theoretical protein melting temperature.
Computation of size of DNA and Protein Fragments from Their Electrophoretic Mobility (Reference: Raghava, G. P. S. 2001. Biotech Software and Internet Report 2:198-200).
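A common way to estimate fragment size from electrophoretic mobility is a calibration curve, in which the logarithm of size is treated as roughly linear in relative mobility over a limited range. The sketch below illustrates that general approach with an ordinary least-squares fit; it is not a description of the cited program, and the function names and marker values are hypothetical.

```python
import math


def fit_calibration(markers):
    """Fit log10(size) = slope * mobility + intercept by least squares.

    markers: list of (relative_mobility, size) pairs for known standards.
    """
    if len(markers) < 2:
        raise ValueError("at least two markers are needed")
    xs = [mobility for mobility, _ in markers]
    ys = [math.log10(size) for _, size in markers]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("markers must have different mobilities")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_size(mobility, slope, intercept):
    """Interpolate the size of an unknown band from its relative mobility."""
    return 10 ** (slope * mobility + intercept)


if __name__ == "__main__":
    # Hypothetical protein standards: (relative mobility, mass in Da).
    standards = [(0.15, 97000), (0.30, 66000), (0.50, 45000),
                 (0.70, 30000), (0.90, 20000)]
    slope, intercept = fit_calibration(standards)
    print(round(estimate_size(0.60, slope, intercept)))
```

The fit is only reliable within the range spanned by the standards; bands that run well outside it should not be extrapolated.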
Antigenicity and allergenicity: a good place to start would be The Immune Epitope Database (IEDB)
Allergenicity servers: AllerTOP (Reference: Dimitrov, I. et al. 2013. BMC Bioinformatics 14(Suppl 6): S4), AlgPred - prediction of allergenic proteins and mapping of IgE epitopes (Reference: Saha, S. and Raghava, G.P.S. 2006. Nucleic Acids Research 34: W202-W209.), and SDAP - Structural Database of Allergenic Proteins (Reference: Ivanciuc, O. et al. 2003. Nucleic Acids Res. 31: 359-362).
VIOLIN - Vaccine Investigation and OnLine Information Network - allows easy curation, comparison and analysis of vaccine-related research data across various human pathogens. VIOLIN is expected to become a centralized source of vaccine information and to provide investigators in basic and clinical sciences with curated data and bioinformatics tools for vaccine research and development. VBLAST: Customized BLAST Search for Vaccine Research allows various search strategies against 77 genomes of 34 pathogens. (Reference: He, Y. et al. 2014. Nucleic Acids Res. 42(Database issue): D1124-32).
SVMTriP - is a method for predicting antigenic epitopes using the latest sequence input from the IEDB database. It uses a Support Vector Machine (SVM) that combines tri-peptide similarity and propensity scores to improve prediction performance. SVMTriP is also capable of recognizing viral peptides against a human protein sequence background. (Reference: Yao B et al. (2012) PLoS One 7(9): e45152).
Solubility and crystallizability:
EnzymeMiner - offers automated mining of soluble enzymes with diverse structures, catalytic properties and stabilities. The solubility prediction employs the in-house SoluProt predictor developed using machine learning. (Reference: Hon J et al. 2020. Nucl Acids Res 48 (W1): W104–W109).
ESPRESSO (EStimation of PRotein ExpreSsion and SOlubility) - is a sequence-based predictor for estimating protein expression and solubility for three different protein expression systems: in vivo Escherichia coli, Brevibacillus, and wheat germ cell-free. (Reference: Hirose S, & Noguchi T. 2013. Proteomics. 13:1444-1456).
SABLE - Accurate sequence-based prediction of relative Solvent AccessiBiLitiEs, secondary structures and transmembrane domains for proteins of unknown structure. (Reference: Adamczak R et al. 2004. Proteins 56:753-767).
Protein–Sol - is a web server for predicting protein solubility. Using available data for Escherichia coli protein solubility in a cell-free expression system, 35 sequence-based properties are calculated. Feature weights are determined from separation of low and high solubility subsets. The model returns a predicted solubility and an indication of the features which deviate most from average values. (Reference: Hebditch M et al. 2017. Bioinformatics 33(19): 3098–3100).
SOLUPROT - was created using the gradient boosting machine technique with the TargetTrack database as a training set. When evaluated against a balanced independent test set derived from the NESG database, SoluProt's accuracy of 58.5% and AUC of 0.62 exceeded those of a suite of alternative solubility prediction tools. (Reference: Hon, J., et al. (2021) Bioinformatics 37 (1): 23–28).
CamSol - for the rational design of protein variants with enhanced solubility. The method works by performing a rapid computational screening of tens of thousands of mutations to identify those with the greatest impact on the solubility of the target protein while maintaining its native state and biological activity. (Reference: Sormanni P et al. (2015) J Molec Biol 427(2): 478-490). N.B. Requires registration.
Surface Entropy Reduction prediction (SERp) - this exploratory tool aims to aid identification of sites that are most suitable for mutation designed to enhance crystallizability by a Surface Entropy Reduction approach. (Reference: Goldschmidt L. et al. 2007. Protein Science. 16:1569-1576)
CRYSTALP2 - for in-silico prediction of protein crystallization propensity (Reference: Kurgan L, et al. 2009. BMC Structural Biology 9: 50); and PPCpred - sequence-based prediction of propensity for production of diffraction-quality crystals, production of crystals, purification and production of the protein material. (Reference: M.J. Mizianty & L. Kurgan. 2011. Bioinformatics 27: i24-i33).
Antimicrobial peptides, vaccines and toxins:
APD3 (Antimicrobial Peptide Database) (Reference: Wang, G., Li, X. & Wang, Z. (2016) Nucl. Acids Res. 44: D1087-D1093.)
The Type III Secretion System (T3SS) is an essential mechanism for host-pathogen interaction in the infection process. The proteins secreted through the T3SS machinery of many Gram-negative bacteria are known as T3SS effectors (T3SEs). These can either be localized subcellularly in the host, or be part of the needle tip of the T3SS that interacts directly with the host membrane to bring other effectors into the target cell. T3SEdb is an effort to assemble a comprehensive, web-accessible database of all experimentally determined and putative T3SEs. BLAST search is available. (Reference: Wang Y et al. 2012. BMC Bioinformatics. 13: 66.). T3Enc - is an encyclopedia of bacterial type III secretion systems that provides an update on the discovery of T3SSs (Reference: Hu Y et al. (2017) Environ. Microbiol. 19(10): 3879-3895).
T3SEpp - is a prediction pipeline which integrates the results of individual modules, resulting in high accuracy (i.e., ∼0.94) and >1-fold reduction in the false-positive rate compared to that of state-of-the-art software tools. (Reference: Hui X et al. 2020. mSystems 5 (4): e00288-20). DeepT3 2.0 - integrates different deep learning models for genome-wide prediction of T3SEs from a bacterium of interest.
DeepT3 - combines various deep learning architectures including convolutional, recurrent, convolutional-recurrent and multilayer neural networks to learn N-terminal representations of proteins specifically for T3SS prediction. (Reference: Luo J et al. 2021. NAR Genomics and Bioinformatics 3(4): ).
Effective (University of Vienna, Austria & Technical University of Munich, Germany) - Bacterial protein secretion is the key virulence mechanism of symbiotic and pathogenic bacteria. Effector proteins are transported from the bacterial cytosol into the extracellular medium or directly into the eukaryotic host cell. The Effective portal provides precalculated predictions of bacterial effectors in all publicly available pathogenic and symbiotic genomes, and also lets users predict effectors in their own protein sequence data.
Vaxign is the first web-based vaccine design system that predicts vaccine targets based on genome sequences using the strategy of reverse vaccinology. Predicted features in the Vaxign pipeline include protein subcellular location, transmembrane helices, adhesin probability, conservation to human and/or mouse proteins, sequence exclusion from genome(s) of nonpathogenic strain(s), and epitope binding to MHC class I and class II. The precomputed Vaxign database contains prediction of vaccine targets for >350 genomes. (Reference: He Y et al. 2010. J Biomed Biotechnol. 2010: 297505).
VacTarBac is a platform which stores vaccine candidates against several pathogenic bacteria. The vaccine candidates are designed on the basis of their probability of acting as epitopes, and thus have the potential to induce any of the several arms of the immune system. These epitopes have been predicted against the virulence factors and essential genes of 14 bacterial species. (Reference: Nagpal G et al. (2018) Front Immunol. 9: 2280).
Abpred - will take a single amino acid sequence for a Fv and calculate the predicted performance on 12 biophysical platforms (Reference: Hebditch M & J Warwicker (2019) PeerJ. 7: e8199).
SIEVE Server is a public web tool for prediction of type III secreted effectors. The SIEVE Server scores potential secreted effectors from genomes of bacterial pathogens with type III secretion systems using a model learned from known secreted proteins. The SIEVE Server requires only protein sequences of proteins to be screened and returns a conservative probability that each input protein is a type III secreted effector. (Reference: McDermott JE et al. 2011. Infect Immun. 79:23-32).
Victors - is a database of genes experimentally observed to be necessary for virulence. Included are virulence factors for many different bacteria, viruses, parasites and fungi which are pathogenic to animals and humans. Victors holds these virulence factors together with the corresponding sequence information taken from NCBI when available. LPS and capsule structures are also included as virulence factors, but have no attached sequence information because they are tertiary gene products with no single sequence. Has a BLAST interface. (Reference: Sayers S et al. (2019) Nucl. Acids Res. 47(D1): D693-D700).
Circular dichroism:
Circular Dichroism (Birkbeck College, School of Crystallography, England) - DICHROWEB is an interactive web site which allows the deconvolution of data from Circular Dichroism spectroscopy experiments. It offers an interface to a range of deconvolution algorithms (CONTINLL, SELCON3, CDSSTR, VARSLC, K2D).
K2D2: Prediction of percentages of protein secondary structure from CD spectra - allows analysis of 41 CD spectrum data points ranging from 200 nm to 240 nm or 51 data points for the 190-240 nm range (Reference: Perez-Iratxeta C & Andrade-Navarro MA. 2008. BMC Structural Biology 8: 25).
K2D3 is a web server to estimate the α helix and β strand content of a protein from its circular dichroism spectrum. K2D3 uses a database of theoretical spectra derived with Dichrocalc (Reference: Louis-Jeune C et al. 2012. Proteins: Structure, Function, & Bioinformatics 80: 374–381).
Cysteine Residues:
CYSREDOX (Rockefeller University, U.S.A.) and CYSPRED (CIRB Biocomputing Group, University of Bologna, Italy) calculate the redox state of cysteine residues in proteins.
Hydrophobicity Plotter (Innovagen) - and Protein Hydroplotter - select under Tools (ProteinLounge, San Diego, CA).
Proteolysis and Mass Spectrometry: An excellent proteomic resource is the Rockefeller University Proteomics Resource Center's Useful Links.
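A hydrophobicity plot is a sliding-window average of per-residue hydropathy values along the sequence. The sketch below uses the Kyte-Doolittle scale, one commonly used choice; which scales and window sizes the plotters above offer is not stated here, so the scale, the default window of 9 and the function name are assumptions for illustration.

```python
# Kyte-Doolittle hydropathy index (positive values are hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Return the mean hydropathy of every full window along the sequence."""
    seq = seq.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]  # KeyError on unknown codes
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


if __name__ == "__main__":
    sequence = "MKTLLILAVVAAALAEEKRRDDSG"  # arbitrary example sequence
    for start, value in enumerate(hydropathy_profile(sequence, window=7), 1):
        print(start, round(value, 2))
```

Each value belongs to the window starting at that position; plotting tools usually assign it to the window's central residue. Longer windows (around 19) are typically used when looking for membrane-spanning segments.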
Proteolysis - PeptideCutter (ExPASy, Switzerland) which also predicts cleavage sites for enzymes and chemicals.
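Cleavage-site prediction is rule-based at its simplest. The sketch below applies the widely quoted simplified trypsin rule: cut after lysine (K) or arginine (R) unless the next residue is proline (P). It is an illustration only; PeptideCutter's own rule sets are more detailed, and the function name is hypothetical.

```python
def trypsin_digest(seq):
    """Split a protein into peptides using the simplified trypsin rule.

    Cleaves on the C-terminal side of K or R, except when followed by P.
    """
    seq = seq.upper()
    peptides = []
    start = 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # C-terminal peptide with no K/R at its end
    return peptides


if __name__ == "__main__":
    for peptide in trypsin_digest("AKRPGKML"):
        print(peptide)
```

The example sequence yields AK, RPGK and ML: the arginine before proline is left uncut. The resulting peptides can then be passed to a mass calculation to predict a peptide mass fingerprint.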
For more sophisticated protein analysis involving mass spectrometry, ExPASy offers FindMod, which predicts potential post-translational modifications in peptides, and GlycoMod, which predicts the possible oligosaccharide structures that occur on proteins from their experimentally determined masses.
ProteinProspector (Dr. Alma Burlingame, University of California) - offers a wide variety of tools (e.g. MS-Fit, MS-Tag, MS-Seq, MS-Pattern, MS-Homology) for the protein mass spectroscopist.
Repeats:
Repeats in protein sequences can be discovered using Radar (Rapid Automatic Detection and Alignment of Repeats, European Bioinformatics Institute) or REPRO (Reference: George RA. & Heringa J. 2000. Trends Biochem. Sci. 25: 515-517).
REP2 - is a web server to detect common tandem repeats in protein sequences. (Reference: Kamel M, et al. (2021) J. Molec. Biol. 433(11):166895).
Two-dimensional gels:
JVirGel calculation of virtual two-dimensional protein gels - creates virtual 2D proteomes from a huge list of eukaryotes & prokaryotes (or an individual protein). Two versions: html (limited) and Java applet (incredible, but you need to install the Java Runtime Environment). (Reference: K. Hiller et al. 2003. Nucl. Acids Res. 31: 3862-3865).
Draw Virtual Two-Dimensional Protein Gels (PRODORIC Net, Germany) - using your own protein sequence data or for different organisms. Also see Proteome-pI, which is a database of pre-computed isoelectric points and molecular weights for proteins and digest peptides from model organism proteomes.
Metasite:
Scratch Protein Predictor - (Institute for Genomics and Bioinformatics, University of California, Irvine) - programs include: ACCpro: the relative solvent accessibility of protein residues; CMAPpro: Prediction of amino acid contact maps; COBEpro: Prediction of continuous B-cell epitopes; CONpro: predicts whether the number of contacts of each residue in a protein is above or below the average for that residue; DIpro: Prediction of disulphide bridges; DISpro: Prediction of disordered regions; DOMpro: Prediction of domains; SSpro: Prediction of protein secondary structure; SVMcon: Prediction of amino acid contact maps using Support Vector Machines; and, 3Dpro: Prediction of protein tertiary structure (Ab Initio).
Mutagenesis:
Gene Mutagenesis Designer (GenScript) simplifies the design of point mutations in a gene. Enter the wild-type starting sequence, click the “from selection” button to select the amino acid(s) of interest, and click “submit” to generate the new gene sequence encoding the mutated protein. You can select from a number of expression systems.
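The core step in designing a point mutation is swapping one codon for a codon that encodes the desired amino acid. The sketch below shows one simple strategy, chosen here purely for illustration: among the codons for the new residue, pick the one that differs from the wild-type codon at the fewest bases. This is not how the GenScript tool is documented to work (it offers codon choices by expression system), and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, laid out in TCAG order with the first base varying slowest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame coding sequence; trailing partial codons are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def mutate(dna, position, new_aa):
    """Replace the codon at a 1-based residue position with one encoding new_aa.

    Chooses the codon with the fewest base changes relative to the wild type.
    """
    dna = dna.upper()
    if position < 1:
        raise ValueError("position is 1-based")
    start = (position - 1) * 3
    wild_codon = dna[start:start + 3]
    if len(wild_codon) != 3:
        raise ValueError("position lies outside the coding sequence")
    candidates = [c for c, aa in CODON_TABLE.items() if aa == new_aa.upper()]
    if not candidates:
        raise ValueError(f"no codon encodes {new_aa!r}")
    best = min(candidates, key=lambda c: sum(a != b for a, b in zip(c, wild_codon)))
    return dna[:start] + best + dna[start + 3:]


if __name__ == "__main__":
    wild_type = "ATGGCTAAA"             # encodes M-A-K
    mutant = mutate(wild_type, 2, "V")  # A2V substitution
    print(mutant, translate(mutant))
```

In this example the alanine codon GCT becomes GTT, a single base change. A production design would also weigh codon usage in the chosen expression host and avoid creating unwanted restriction sites.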
I-Mutant2.0 - predicts protein stability changes upon mutation - choose either a PDB reference number or paste your own protein. The answer (by email) indicates whether the protein is more or less stable, a fact which could be of use in designing "better" proteins. (Reference: E. Capriotti et al. 2005. Nucl. Acids Res. 33: W306-W310).
SIFT - The Sorting Intolerant from Tolerant (SIFT) algorithm predicts the effect of coding variants on protein function i.e. it predicts whether an amino acid substitution affects protein function based on sequence homology and the physical properties of amino acids. SIFT can be applied to naturally occurring nonsynonymous polymorphisms and laboratory-induced missense mutations. (Reference: N-L Sim et al. 2012. Nucleic Acids Research; 40(1): W452–W457).
mCSM-membrane - predicts the effects of mutations on transmembrane proteins. (Reference: Pires DEV et al. 2020. Nucl Acids Res 48 (W1): W147–W153).
EnzymeMiner - allows automated mining of soluble enzymes with diverse structures, catalytic properties and stabilities. The solubility prediction employs the in-house SoluProt predictor developed using machine learning. (Reference: Hon J et al. 2020. Nucl Acids Res 48 (W1): W104–W109).
PlaToLoCo (PLAtform of TOols for LOw COmplexity) - is the first web meta-server for visualization and annotation of low complexity regions in proteins which employs five different state-of-the-art tools for discovering LCRs and provides functional annotations such as domain detection, transmembrane segment prediction, and calculation of amino acid frequencies. (Reference: Jarnot P et al. 2020. Nucl Acids Res 48 (W1): W77–W84).