
Help File For Chemical Shift Prediction

Purpose

This Web-based server is for those interested in rapidly and accurately predicting the 1H, 13C and/or 15N chemical shifts of proteins and peptides using only the protein sequence as input. It uses sequence alignments to previously assigned proteins and a parameterized extrapolation method to predict the chemical shifts of a given query sequence.

Applications

Predicted chemical shifts can be a useful starting point in conducting, completing and verifying chemical shift assignments in peptides and proteins. The primary purpose of this server, therefore, is to facilitate the assignment process for previously unassigned peptides and proteins.

Number of Proteins/Peptides in the Database:
Total: 4720
1H: 2025
13C: 1273
15N: 1422

Running the Program

1. Before starting this application, be sure to have your protein/peptide sequence ready.

2. Select the type of chemical shift you want predicted for your protein. If you want proton shifts only, select "1H". If you want carbon shifts only, select "13C". If you want nitrogen shifts (and amide protons) only, select "15N". If you want to predict all possible shifts, select "All". Note that because there are substantially more proteins with 1H assignments, you are far more likely to get 1H chemical shift predictions than 13C or 15N chemical shift predictions. This is why the default is normally set to "1H". If you choose "All" and only 1H and 15N chemical shifts are known for a given homologue, then only these two types of chemical shifts will be predicted.

Tip: If you selected "13C" or "15N" and you find that you do not get a "hit", try selecting "1H" or "All". This will improve your chances of getting a useful chemical shift prediction.

3. Select the number of sequence matches for the program to keep. This refers to the number of predictions that the program will attempt to make. Normally this is set to "1" to indicate that you only want the best match to be kept and used for the chemical shift prediction. Tests (ref. 1) have shown that a chemical shift prediction generated from the most homologous sequence in a database is always better than that generated from a less homologous sequence or from the weighted average of several homologues.

Tip: If your query protein has many homologues in the database (say it's a calcium binding protein), then you may want to keep several matches for additional comparisons or to distinguish between apo and Ca saturated forms of the homologues.

4. Select the output file type. This determines how the predicted chemical shifts will be presented to you. Selecting the "SHIFTY" output file type produces a multi-column, relatively compact presentation of the chemical shift prediction. The SHIFTY format is most useful if you wish to print the data. Selecting the "NMRSTAR" output file type produces a much longer single-column presentation of the chemical shift prediction. The NMRSTAR format is most useful if the data is to be processed later by a computer. "SHIFTY" is the default format.

5. Type or paste your peptide or protein sequence into the sequence window. DO NOT add a title, name or accession number to the beginning of the sequence. All characters entered into this window are assumed to be amino acids. Be sure to use the IUPAC one-letter code for all amino acids. Do not use the three-letter amino acid code. The amino acid sequence may be entered in either upper or lower case.

6. Press the "Submit" button to start the prediction process. A result should be returned to you within 5 to 10 seconds, depending on how busy the server is. The complete prediction will appear in your browser window in the format you previously selected. The prediction may be saved, printed or deleted as you wish. If the query sequence does not have a homologue with more than 25% sequence identity, the server will return the message "Sorry, No Prediction Possible".
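
The input rules in step 5 can be checked before you submit a sequence. The Python sketch below is not part of the server. The function name `clean_sequence`, the decision to strip whitespace, and the restriction to the 20 standard one-letter codes are all assumptions made for illustration.

```python
# Hypothetical pre-submission check; not part of the SHIFTY server.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # 20 standard one-letter codes


def clean_sequence(raw):
    """Return the sequence in upper case with all whitespace removed.

    Raises ValueError if a FASTA-style title line is present or if any
    character is not a standard one-letter amino acid code.
    """
    if raw.lstrip().startswith(">"):
        raise ValueError("remove the title line before submitting")
    sequence = "".join(raw.split()).upper()
    if not sequence:
        raise ValueError("sequence is empty")
    unexpected = sorted(set(sequence) - STANDARD_AMINO_ACIDS)
    if unexpected:
        raise ValueError("not one-letter amino acid codes: " + "".join(unexpected))
    return sequence


print(clean_sequence("sdasfsasdf sdfasdfsdf\n"))  # SDASFSASDFSDFASDFSDF
```

Note that a check of this kind cannot detect a three-letter code such as "SerAsp", because every letter in it is also a valid one-letter code.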

Reading the Output File

It is important to be aware of how the output file is organized and to understand the presentation conventions that are used. The first few lines of the output file indicate the version of the program, the time the prediction was performed and the parameters (which are fixed) used during the database search and alignment process.

******************************************************

 Program:   SHIFTY (version 1.3)
 Description:   Predict Chemical Shifts
 Date:    Tue Aug 4 14:37:04 1998 
 Sequence Name: Web_Submission_Sequence
 Amino acids:   20 
 Scoring Mat:   wt.homology
 Gap Penalty:   10
 Gap Size Pen:  2 
 Sort Method:   1 

******************************************************

 Number of proteins tested:   176
 Number of alignments found:  1

In the next few lines, the results of the matching and prediction process are presented. This includes the percent pairwise sequence identity -- which should always be greater than 25%. It also includes the number of matches between the two aligned sequences, the name of the protein or peptide, the BMRB accession number (if known) along with a URL link to that chemical shift file in the BMRB.

**************************(1)**************************
 Title:            BETA 2 MICROGLOBULIN (HUMAN) 
 BMRB Link:                                Accession 3078 
 Matches:            8 
 Percent Identity:         40.00 

Based on formulae previously reported by Wishart et al. (1) the quality of the chemical shift prediction is also reported. This includes an indication of the expected correlation coefficient (a measure of the agreement between the predicted and observed chemical shifts) and the expected error (reported as ± X.X ppm) for the given nucleus. Generally if the correlation coefficient is > 0.90 you can be quite confident in the prediction and you should be able to almost completely assign your peptide/protein using this data alone.

 Reliability of Chemical Shift Prediction: 76.00 
 Expected rmsd:          0.24 
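
The server reports expected values derived from the formulae in ref. 1, which are not reproduced here. Once you have measured shifts for your protein, the two underlying statistics can be computed directly. The sketch below shows the standard definitions of the correlation coefficient and the root-mean-square deviation. The function names are illustrative, and the "observed" values in the example are hypothetical.

```python
import math


def pearson_r(predicted, observed):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


def rmsd(predicted, observed):
    """Root-mean-square deviation (in ppm) between two equal-length lists."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(predicted))


# Predicted HA shifts for residues 52, 53, 54 and 55 of the example output.
predicted_ha = [4.67, 4.81, 4.35, 4.78]
# Hypothetical measured values, for illustration only.
observed_ha = [4.60, 4.90, 4.30, 4.70]

print("r    = %.3f" % pearson_r(predicted_ha, observed_ha))
print("rmsd = %.3f ppm" % rmsd(predicted_ha, observed_ha))
```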

Below that, a compact presentation of the sequence alignment between the query sequence and the database match is shown. The vertical bars "|" indicate exact matches. The * characters indicate near matches.

Query Seq:                                                     
Matching :                                                     
Database :   IQRTPKIQVYSRHPAENGKSNFLNCYVSGFHPSDIEVDLLKNGERIEKVE      50
Structure:   CCCCCBBBBBBCCCCCCCCCBBBBBBBBBCCCCCBBBBBBCCCCCCCBBB

Query Seq:    SDASFSASDFSDFASDFSDF                                   20
Matching :    || |||  |*| *   * *|                             
Database :   HSDLSFSK DWSFYLLYYTEFTPTEKDEYACRVNHVTLSQPKIVKWDRDM      99
Structure:   BBBBBBCC CCCBBBBBBBBBBCCCCBBBBBBBBCCCCCCBBBBBCCCCC
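
The "Matching" line can be tallied to reproduce the "Matches" and "Percent Identity" values reported in the header. The sketch below assumes that percent identity is the number of exact matches divided by the length of the query sequence. That assumption agrees with the example (8 matches over 20 residues gives 40.00), but the help file does not state the formula.

```python
# The 20 characters of the "Matching" line that sit under the query sequence.
MATCHING = "|| |||  |*| *   * *|"
QUERY = "SDASFSASDFSDFASDFSDF"


def tally(matching, query_length):
    """Return (exact matches, near matches, percent identity)."""
    exact = matching.count("|")  # vertical bars mark exact matches
    near = matching.count("*")   # asterisks mark near matches
    # Assumption: identity is expressed relative to the query length.
    percent_identity = 100.0 * exact / query_length
    return exact, near, percent_identity


print(tally(MATCHING, len(QUERY)))  # (8, 4, 40.0)
```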

Below this is a table describing the chemical shift types or atom names associated with each residue. This is intended to help the user read the file if they are unfamiliar with the new IUPAC nomenclature or if they are unfamiliar with the SHIFTY format.

H = amide shift   HA = alpha proton shift
HB = beta proton shift  HG = gamma proton shift
HD = delta proton shift HE = epsilon proton shift


Output format;
 For Ala; H HA HB2
 For Gly; H HA HA
 For Thr; H HA HB2 HG
 For Ser; H HA HB2 HB3
 For Asn; H HA HB2 HB3
 For Tyr; H HA HB2 HB3
 For Cys; H HA HB2 HB3
 For His; H HA HB2 HB3
 For Asp; H HA HB2 HB3
 For Phe; H HA HB2 HB3
 For Trp; H HA HB2 HB3
 For Val; H HA HB2 HG2 HG3
 For Gln; H HA HB2 HB3 HG2 HG3
 For Glu; H HA HB2 HB3 HG2 HG3
 For Met; H HA HB2 HB3 HG2 HG3 HE2
 For Leu; H HA HB2 HB3 HG2 HD2 HD3
 For Ile; H HA HB2 HG2 HG2 HG3 HD2
 For Arg; H HA HB2 HB3 HG2 HG3 HD2 HD3
 For Pro; H HA HB2 HB3 HG2 HG3 HD2 HD3
 For Lys; H HA HB2 HB3 HG2 HG3 HD2 HD3 HE2 HE

The prediction appears below this "residue" table. Here is an example of the SHIFTY format:

Num I D S H HA HB2 HB3 HG2 HG3 HD2 HD3 HE2 HE3
50 E B **** **** **** **** **** **** **** **** **** ****
51 H B **** **** **** **** **** **** **** **** **** ****
52S=S B 9.13 4.67 4.48 4.09 **** **** **** **** **** ****
53D=D B **** 4.81 2.76 2.57 **** **** **** **** **** ****
54A L B **** 4.35 1.58 **** **** **** **** **** **** ****
55S=S B 8.10 4.78 3.43 2.79 **** **** **** **** **** ****
56F=F B 8.08 5.14 2.62 2.62 **** **** **** **** **** ****
57S=S C 8.35 4.56 3.93 3.68 **** **** **** **** **** ****
58A K C **** 3.80 1.31 **** **** **** **** **** **** ****
59S **** **** **** **** **** **** **** **** **** ****
60D=D C 7.71 4.36 2.30 2.30 **** **** **** **** **** ****
61F*W C 7.53 4.20 3.01 2.89 **** **** **** **** **** ****
62S=S C 7.54 3.93 3.68 3.68 **** **** **** **** **** ****
63D F B 7.90 5.27 2.10 1.36 **** **** **** **** **** ****
64F*Y B 8.40 5.58 3.13 2.89 **** **** **** **** **** ****
65A L B 9.25 4.62 1.69 **** **** **** **** **** **** ****
66S L B 8.32 5.64 4.73 3.68 **** **** **** **** **** ****
67D Y B 9.38 5.47 2.73 2.36 **** **** **** **** **** ****
68F*Y B 9.14 6.07 3.33 2.69 **** **** **** **** **** ****
69S T B 8.46 4.99 3.76 3.04 **** **** **** **** **** ****
70D*E B 8.42 4.58 2.42 2.31 **** **** **** **** **** ****
71F=F B 8.76 4.83 2.78 2.68 **** **** **** **** **** ****
72 T B **** **** **** **** **** **** **** **** **** ****

The left-most column contains the residue numbers of the matching database sequence. The second column shows the query sequence. The third column indicates the degree of similarity between the two sequences: "=" for identity, "*" for a partial (near) match and a blank for none. The fourth column shows the sequence of the homologous database sequence. The fifth column indicates the secondary structure of the database sequence (if known): "H" indicates helix, "B" indicates beta-strand and "C" indicates random coil. If no secondary structure is available, the entire column is marked with "C". Finally, the chemical shifts of all "predictable" resonances are presented in columns 6 and beyond, with "****" marking resonances for which no shift is predicted. At the top of each column is an indication of the atom type, with H corresponding to amide protons, HA corresponding to alpha protons, etc. Note that glycine has entries in the HA and HB2 columns, with the HB2 column holding the second alpha proton.
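
If you want to process SHIFTY-format rows with a script, the column layout described above can be parsed as in the following sketch. It is not part of the server. It assumes the whitespace-collapsed layout of the example rows shown in this help file and ten atom columns named H through HE3. The function name and the returned dictionary keys are illustrative.

```python
import re

# Assumed atom columns, taken from the example header line.
ATOM_COLUMNS = ["H", "HA", "HB2", "HB3", "HG2", "HG3", "HD2", "HD3", "HE2", "HE3"]
SIMILARITY = {"=": "identity", "*": "partial", None: "none"}
# First token: residue number, optional query residue, then optionally a
# similarity symbol followed by the database residue (e.g. "52S=S", "54A", "50").
FIRST_TOKEN = re.compile(r"^(\d+)([A-Z])?(?:([=*])([A-Z]))?$")


def parse_shifty_row(line):
    """Parse one SHIFTY row into a dictionary; '****' fields are skipped."""
    tokens = line.split()
    if len(tokens) <= len(ATOM_COLUMNS):
        raise ValueError("row is too short: %r" % line)
    prefix = tokens[:-len(ATOM_COLUMNS)]
    shift_fields = tokens[-len(ATOM_COLUMNS):]
    match = FIRST_TOKEN.match(prefix[0])
    if match is None:
        raise ValueError("unrecognized residue field: %r" % prefix[0])
    number, query, symbol, database = match.groups()
    rest = prefix[1:]
    structure = None
    if database is None and len(rest) == 2:
        database, structure = rest   # e.g. "54A L B" or "50 E B"
    elif len(rest) == 1:
        structure = rest[0]          # e.g. "52S=S B"
    shifts = {atom: float(field)
              for atom, field in zip(ATOM_COLUMNS, shift_fields)
              if field != "****"}
    return {"number": int(number), "query": query, "database": database,
            "similarity": SIMILARITY[symbol], "structure": structure,
            "shifts": shifts}


row = parse_shifty_row("52S=S B 9.13 4.67 4.48 4.09 **** **** **** **** **** ****")
print(row["number"], row["similarity"], row["shifts"])
```

Rows with no query residue (for example residue 50) or no database residue (residue 59, which falls in an alignment gap) return `None` for the missing fields.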

Here is an example of the same prediction presented using the NMRSTAR format:

Num I D S Atom Shift
52S=S B H 9.13
52S=S B HA 4.67
52S=S B HB2 4.48
52S=S B HB3 4.09
55S=S B H 8.10
55S=S B HA 4.78
55S=S B HB2 3.43
55S=S B HB3 2.79
56F=F B H 8.08
56F=F B HA 5.14
56F=F B HB2 2.62
56F=F B HB3 2.62
57S=S C H 8.35
57S=S C HA 4.56
57S=S C HB2 3.93
57S=S C HB3 3.68
60D=D C H 7.71
60D=D C HA 4.36
60D=D C HB2 2.30
60D=D C HB3 2.30
61F*W C H 7.53
61F*W C HA 4.20
61F*W C HB2 3.01
61F*W C HB3 2.89
62S=S C H 7.54
62S=S C HA 3.93
62S=S C HB2 3.68
62S=S C HB3 3.68
63D F B H 7.90
63D F B HA 5.27
63D F B HB2 2.10
63D F B HB3 1.36
64F*Y B H 8.40
64F*Y B HA 5.58
64F*Y B HB2 3.13
64F*Y B HB3 2.89
65A L B H 9.25
65A L B HA 4.62
65A L B HB2 1.69
66S L B H 8.32
66S L B HA 5.64
66S L B HB2 4.73
66S L B HB3 3.68
67D Y B H 9.38
67D Y B HA 5.47
67D Y B HB2 2.73
67D Y B HB3 2.36
68F*Y B H 9.14
68F*Y B HA 6.07
68F*Y B HB2 3.33
68F*Y B HB3 2.69
69S T B H 8.46
69S T B HA 4.99
69S T B HB2 3.76
69S T B HB3 3.04
70D*E B H 8.42
70D*E B HA 4.58
70D*E B HB2 2.42
70D*E B HB3 2.31
71F=F B H 8.76
71F=F B HA 4.83
71F=F B HB2 2.78
71F=F B HB3 2.68

The output for this file type is relatively self-explanatory.
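
The two formats carry the same information: each predicted value in a SHIFTY row becomes one line of the single-column listing, and "****" entries are dropped. The sketch below illustrates that relationship. It is not the server's own conversion code, the function name is illustrative, and it returns tuples and does not attempt to reproduce the exact text layout.

```python
# Assumed atom columns, taken from the SHIFTY example header line.
ATOM_COLUMNS = ["H", "HA", "HB2", "HB3", "HG2", "HG3", "HD2", "HD3", "HE2", "HE3"]


def wide_to_long(residue_label, structure, shift_fields):
    """Return one (label, structure, atom, shift) tuple per predicted atom."""
    rows = []
    for atom, field in zip(ATOM_COLUMNS, shift_fields):
        if field != "****":  # no prediction for this atom
            rows.append((residue_label, structure, atom, float(field)))
    return rows


# Residue 65 of the example: only H, HA and HB2 are predicted.
fields = "9.25 4.62 1.69 **** **** **** **** **** **** ****".split()
for row in wide_to_long("65A L", "B", fields):
    print(row)
```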

Limitations

The prediction method used by this server is exceedingly accurate (correlation coeff. > 0.90) provided that the query protein is more than ~30% homologous to another previously assigned protein. If a query protein/peptide has little or no sequence homology (< 25%) to any previously studied protein, the predictions from this server are not worth using and will not be sent to the user. The odds are approximately 35% that a protein sequence you submit will find a reasonable match to something already in the BMRB database. As the size of the BMRB grows, the odds of a successful match or a successful prediction will grow accordingly.

Because there are substantially more proteins with 1H assignments only, you are far more likely to get 1H chemical shift predictions than 13C or 15N chemical shift predictions. This server is not capable of predicting 13C or 15N chemical shifts if only the 1H shifts of a matching homologue are known.

Note that this server uses only sequence information; it does not calculate or predict chemical shifts from 3D structural coordinates. Consequently, it cannot distinguish between identical protein sequences whose shifts may have been collected in different solvents (TFE vs. water), at different temperatures or under different conditions (with and without calcium). Also note that SHIFTY does not predict the aromatic shifts of tryptophan, tyrosine, phenylalanine or histidine. It is up to the user to be aware of the special conditions associated with the presumptive matching homologue identified by this program.
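
The 25% identity cutoff and the "number of sequence matches" setting interact in a simple way, which the following sketch illustrates. It is not the server's code. The function name is illustrative, and every candidate other than the beta 2 microglobulin entry from the example output is hypothetical.

```python
IDENTITY_CUTOFF = 25.0  # percent; matches at or below this are discarded


def select_matches(candidates, keep=1):
    """Return up to `keep` (title, percent_identity) pairs, best first.

    An empty list corresponds to "Sorry, No Prediction Possible".
    """
    usable = [c for c in candidates if c[1] > IDENTITY_CUTOFF]
    usable.sort(key=lambda c: c[1], reverse=True)
    return usable[:keep]


candidates = [
    ("hypothetical protein A", 31.5),        # invented for illustration
    ("BETA 2 MICROGLOBULIN (HUMAN)", 40.0),  # from the example output
    ("hypothetical protein B", 18.0),        # invented; below the cutoff
]

print(select_matches(candidates))          # best match only (the default)
print(select_matches(candidates, keep=3))  # every match above the cutoff
```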

References

1. Wishart, D.S., Watson, M.S., Boyko, R.F. & Sykes, B.D. "Automated Chemical Shift Prediction using the BioMagResBank" J. Biomol. NMR 10, 329-336 (1997).

Appendix

Graphs showing the relationship between the quality of chemical shift predictions and the percent sequence identity.


Last updated August 25, 1998