Setting up input files - Rapid Determination of Protein Structures in Solution Using NMR Dipola

6. Chemical shift table with updated assignments that can be read into SPARKY (‘sparky_all.out’).

7. Detailed information about predicted chemical shifts, number of reliable assignments, number of constraints for each pseudoresidue, matrices matching experimental and back-calculated chemical shifts and/or RDCs and pseudoenergy matrices at each iteration step (‘mars.log’).

A.2 Setting up input files

A.2.1 Obligatory

1. A Mars run is controlled by the parameter setup file (mars.inp), which has to be adjusted to the available experimental data. Please see below for a detailed description of the parameters. Lines with a ‘#’ sign as first character as well as empty lines are ignored. Do not change the variable names such as nIter.

mars.inp (MARSHOME/example/noStructure/1ubq/input)

fragSize:    5                 # Maximum length of pseudoresidue fragments
cutoffCO:    0.25              # Connectivity cutoff (ppm) of CO [0.25]
cutoffCA:    0.2               # Connectivity cutoff (ppm) of CA [0.5]
cutoffCB:    0.5               # Connectivity cutoff (ppm) of CB [0.5]
cutoffHA:    0.25              # Connectivity cutoff (ppm) of HA [0.25]
fixConn:     fix_con.tab       # Table for fixing sequential connectivity
fixAss:      fix_ass.tab       # Table for fixing residue type and(or) assignment
pdb:         0                 # 3D structure available [0/1]
resolution:  NO                # Resolution of 3D structure [Angstrom]
pdbName:     NO                # Name of PDB file (protons required!)
tensor:      NO                # Method for obtaining alignment tensor [0/1/2/3/4]
nIter:       NO                # Number of iterations [2/3/4]
dObsExh:     NO                # Name of RDC table for exhaustive SVD (PALES format)
dcTab:       NO                # Name of RDC table (PALES format)
deuterated:  0                 # Protonated proteins [0]; perdeuterated proteins [1]
sequence:    1ubq_fasta.tab    # Primary sequence (FASTA format)
secondary:   1ubq_psipred.tab  # Secondary structure (PSIPRED format)
csTab:       1ubq_cs.tab       # Chemical shift table
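The layout of mars.inp (a parameter name, a colon, a value and an optional trailing ‘#’ comment) can be checked before a run with a few lines of Python. The following is a minimal sketch and not part of Mars: the function name parse_mars_inp is hypothetical, and treating everything after a ‘#’ on a parameter line as a comment is an assumption based on the example listing.

```python
def parse_mars_inp(text):
    """Return the 'name: value' pairs of a mars.inp-style file as strings."""
    params = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Empty lines and lines starting with '#' are ignored.
        if not stripped or stripped.startswith("#"):
            continue
        # Assumption: anything after '#' on a parameter line is a comment.
        body = stripped.split("#", 1)[0]
        name, _, value = body.partition(":")
        params[name.strip()] = value.strip()
    return params


example = """\
# Excerpt of the example setup file
fragSize:   5            # Maximum length of pseudoresidue fragments
cutoffCA:   0.2          # Connectivity cutoff (ppm) of CA [0.5]

pdb:        0            # 3D structure available [0/1]
csTab:      1ubq_cs.tab  # Chemical shift table
"""

params = parse_mars_inp(example)
print(params)
```

Values are kept as strings because entries such as resolution or nIter may hold either a number or the placeholder NO.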

2. The chemical shift table follows the SPARKY format. It consists of a header, pseudoresidues and chemical shifts. The header has to be defined before the listing of chemical shift values starts and includes the variable names for the chemical shifts.

Currently 10 different chemical shifts are supported and should be indicated by ‘CA’, ‘CA-1’, ‘CB’, ‘CB-1’, ‘CO’, ‘CO-1’, ‘HA’, ‘HA-1’, ‘H’ and ‘N’. These variable names have to be in the same order as the columns for the different chemical shifts. The first column has to be the pseudoresidue column; the other columns are chemical shift columns. A pseudoresidue is the name of a group of peaks that share the same (or, owing to experimental imperfection, similar) N and HN chemical shifts. Lines with a ‘#’ sign as first character as well as empty lines are ignored. Missing chemical shift values have to be indicated by ‘ - ’.

1ubq_cs.tab (MARSHOME/example/noStructure/1ubq/1ubq_cs.tab)

           N         CO-1      H       CA-1     CA
PR_2       123.220   170.540   8.900   54.450   55.080
PR_3?      115.340   175.920   8.320   55.080   -
PR_4?      118.110   172.450   8.610   59.570   55.210
PR_5GLY    121.000   175.320   9.300   55.210   60.620
PR_6GLY    127.520   -         8.820   60.620   54.520
PR_7       115.400   177.140   8.730   54.520   60.470
PR_8       121.330   176.910   9.100   60.470   57.580
PR_9       105.590   178.800   7.630   57.580   61.400
PR_10??    108.890   175.520   7.810   61.400   45.460
   :          :         :        :

Any combination of characters can be used as a pseudoresidue name, but the name has to be shorter than 25 characters.
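The rules for the chemical shift table (header with supported variable names, pseudoresidue in the first column, ‘-’ for missing values, names shorter than 25 characters) can be illustrated with a small reader. This is a sketch under stated assumptions, not the parser used by Mars: the function name parse_shift_table and the choice to represent missing values as None are illustrative only.

```python
SUPPORTED = {"CA", "CA-1", "CB", "CB-1", "CO", "CO-1", "HA", "HA-1", "H", "N"}


def parse_shift_table(text):
    """Return (header, rows); rows maps pseudoresidue name -> {column: shift}."""
    header = None
    rows = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Empty lines and lines starting with '#' are ignored.
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        if header is None:
            # The first data line is the header with the variable names.
            unknown = [name for name in fields if name not in SUPPORTED]
            if unknown:
                raise ValueError(f"unsupported shift names: {unknown}")
            header = fields
            continue
        name, values = fields[0], fields[1:]
        if len(name) >= 25:
            raise ValueError(f"pseudoresidue name too long: {name}")
        if len(values) != len(header):
            raise ValueError(f"{name}: expected {len(header)} values")
        # '-' marks a missing shift; it is stored as None here (assumption).
        rows[name] = {
            column: None if value == "-" else float(value)
            for column, value in zip(header, values)
        }
    return header, rows


example = """\
          N        CO-1     H      CA-1    CA
PR_2      123.220  170.540  8.900  54.450  55.080
PR_6GLY   127.520  -        8.820  60.620  54.520
"""

header, rows = parse_shift_table(example)
print(rows["PR_6GLY"])
```

A row with too few values is rejected, which catches a forgotten ‘-’ placeholder early.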

3. The primary sequence of the protein has to be in FASTA format.

IMPORTANT: ‘X’ and ‘Z’ cannot be used as characters in the sequence.

1ubq_fasta.tab (MARSHOME/example/noStructure/1ubq/1ubq_fasta.tab)

> ubq
MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYN
IQKESTLHLVLRLRGG
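Because ‘X’ and ‘Z’ are not allowed in the sequence, it can be useful to check the FASTA file before starting a run. The following sketch assumes a single-record FASTA file; the function name read_fasta_sequence is hypothetical, and removing blanks and upper-casing the residues are conveniences of this example rather than documented Mars behaviour.

```python
def read_fasta_sequence(text):
    """Return the sequence of a single-record FASTA file as one string."""
    parts = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip empty lines and the '>' title line.
        if not stripped or stripped.startswith(">"):
            continue
        # Assumption: blanks are dropped and letters are upper-cased.
        parts.append(stripped.replace(" ", "").upper())
    sequence = "".join(parts)
    forbidden = sorted(set(sequence) & {"X", "Z"})
    if forbidden:
        raise ValueError(f"sequence contains forbidden characters: {forbidden}")
    return sequence


example = """\
> ubq
MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYN
IQKESTLHLVLRLRGG
"""

sequence = read_fasta_sequence(example)
print(len(sequence))
```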

4. The secondary structure prediction table has to be in PSIPRED format. Use the PSIPRED web server to obtain the table.

1ubq_psipred.tab (MARSHOME/example/noStructure/1ubq/1ubq_psipred.tab)

PSIPRED PREDICTION RESULTS

Key

Conf: Confidence (0=low, 9=high)
Pred: Predicted secondary structure (H=helix, E=strand, C=coil)
  AA: Target sequence

Conf: 968896699888999867863189999999997689875658887777738887136726
Pred: CEEEEECCCCCEEEEEECCCCCHHHHHHHHHHHHCCCHHHEEEEECCEECCCCCCHHHHC
  AA: MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYN
              10        20        30        40        50        60

Conf: 8988889999950699
Pred: CCCCCEEEEEEECCCC
  AA: IQKESTLHLVLRLRGG
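The PSIPRED table splits the prediction into blocks of ‘Conf’, ‘Pred’ and ‘AA’ lines. As a rough illustration of how the blocks line up, the sketch below joins them into three strings of equal length. It assumes the layout of the example above; the function name parse_psipred is hypothetical, and legend lines are recognized only by the blanks in their text.

```python
def parse_psipred(text):
    """Return (conf, pred, aa) strings joined over all blocks of the table."""
    parts = {"Conf": [], "Pred": [], "AA": []}
    for line in text.splitlines():
        label, separator, rest = line.strip().partition(":")
        rest = rest.strip()
        # Legend lines such as 'Conf: Confidence (0=low, 9=high)' contain
        # blanks, whereas data lines are one unbroken string (assumption).
        if separator and label in parts and rest and " " not in rest:
            parts[label].append(rest)
    conf, pred, aa = ("".join(parts[key]) for key in ("Conf", "Pred", "AA"))
    if not (len(conf) == len(pred) == len(aa)):
        raise ValueError("Conf, Pred and AA lines differ in length")
    return conf, pred, aa


example = """\
PSIPRED PREDICTION RESULTS

Key

Conf: Confidence (0=low, 9=high)
Pred: Predicted secondary structure (H=helix, E=strand, C=coil)
  AA: Target sequence

Conf: 8988889999950699
Pred: CCCCCEEEEEEECCCC
  AA: IQKESTLHLVLRLRGG
"""

conf, pred, aa = parse_psipred(example)
print(pred.count("E"), "of", len(aa), "residues predicted as strand")
```

The length check is a quick way to confirm that the table was produced for the same sequence as the FASTA file.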

If a 3D structure and experimental RDCs are available:

5. All standard PDB files can be used (including MOLMOL files).

IMPORTANT: When using shape prediction, all atoms in the PDB file will be used, including pseudo atoms (ANI).

6. Experimental dipolar couplings are supplied according to the PALES table format:

• The protein sequence should be given as shown by one or more ‘DATA SEQUENCE’ lines. Space characters in the sequence will be ignored.

• The table must include columns for residue ID, three-character residue name and the atom name for both atoms that are involved in the dipolar coupling as well as the dipolar coupling itself, its error and a weighting factor. Segment ID and Chain ID are optional.

IMPORTANT: The atom notation must match that of the PDB file.

• The table must include a ‘VARS’ line that labels the corresponding columns of the table.

• The table must include a ‘FORMAT’ line that defines the data type of the corresponding columns of the table.

• Lines with a ‘#’ sign as first character as well as empty lines are ignored.

DATA SEQUENCE MQIFVKTLTG KTITLEVEPS DTIENVKAKI QDKEGIPPDQ QRLIFAGKQL
DATA SEQUENCE EDGRTLSDYN IQKESTLHLV LRLRGG

VARS   RESID_I RESNAME_I ATOMNAME_I RESID_J RESNAME_J ATOMNAME_J D DD W
FORMAT %5d %6s %6s %5d %6s %6s %9.3f %9.3f %.2f
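The header requirements listed above (one or more ‘DATA SEQUENCE’ lines, a ‘VARS’ line and a matching ‘FORMAT’ line) can be verified with a short script. This is only a sketch: the function name read_pales_header is hypothetical, the list of required columns is taken from the ‘VARS’ line of the example, and data rows are not interpreted.

```python
REQUIRED = [
    "RESID_I", "RESNAME_I", "ATOMNAME_I",
    "RESID_J", "RESNAME_J", "ATOMNAME_J",
    "D", "DD", "W",
]


def read_pales_header(text):
    """Return (sequence, columns, formats) from a PALES-style RDC table."""
    sequence_parts, columns, formats = [], None, None
    for line in text.splitlines():
        fields = line.split()
        # Empty lines and lines starting with '#' are ignored.
        if not fields or fields[0].startswith("#"):
            continue
        if fields[:2] == ["DATA", "SEQUENCE"]:
            # Blanks inside the sequence are ignored.
            sequence_parts.extend(fields[2:])
        elif fields[0] == "VARS":
            columns = fields[1:]
        elif fields[0] == "FORMAT":
            formats = fields[1:]
    if columns is None or formats is None:
        raise ValueError("VARS and FORMAT lines are both required")
    if len(columns) != len(formats):
        raise ValueError("VARS and FORMAT must describe the same columns")
    missing = [name for name in REQUIRED if name not in columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return "".join(sequence_parts), columns, formats


example = """\
DATA SEQUENCE MQIFVKTLTG KTITLEVEPS DTIENVKAKI QDKEGIPPDQ QRLIFAGKQL
DATA SEQUENCE EDGRTLSDYN IQKESTLHLV LRLRGG

VARS   RESID_I RESNAME_I ATOMNAME_I RESID_J RESNAME_J ATOMNAME_J D DD W
FORMAT %5d %6s %6s %5d %6s %6s %9.3f %9.3f %.2f
"""

sequence, columns, formats = read_pales_header(example)
print(len(sequence), len(columns), len(formats))
```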

1. When additional information such as specific amino acid type labeling or initial manual assignments is available, the assignment of pseudoresidues can be restricted to a single residue or to a set of residues. The first column has to be a pseudoresidue name, followed by the residue numbers or amino acid types to which the assignment should be restricted. Assignments can be fixed one by one by specifying the corresponding residue numbers, or restricted to a whole residue fragment by specifying the starting and ending residue numbers (inclusive) connected by ‘-’ (without a blank between the start and end number!).

At the same time, amino acid types can be fixed by specifying the corresponding one-letter code. More than one amino acid type can be specified by concatenating the corresponding one-letter codes (i.e. attach additional one-letter codes without a blank in between).

fix_ass.tab (MARSHOME/example/noStructure/1ubq/fix_ass.tab)

PR_3   3
PR_10  10-15 23 34
PR_12  12 34-36
PR_13  13
PR_14  14 16 HKT
PR_15  LFR 66-69 13-16 9 71
PR_16  EVA
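The entries of the assignment table mix single residue numbers, inclusive ranges and concatenated one-letter codes. The sketch below expands each line into a list of residue numbers and a list of amino acid types. It is an illustration only: the function name parse_fix_ass is hypothetical, and how Mars combines residue numbers with amino acid types on the same line is not stated here, so the two are simply reported side by side.

```python
def parse_fix_ass(text):
    """Map pseudoresidue name -> (sorted residue numbers, sorted residue types)."""
    restrictions = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        name, tokens = fields[0], fields[1:]
        residues, types = set(), set()
        for token in tokens:
            if token.isdigit():
                # A single residue number.
                residues.add(int(token))
            elif "-" in token:
                # An inclusive range such as 10-15 (no blank around '-').
                start, end = (int(part) for part in token.split("-"))
                residues.update(range(start, end + 1))
            elif token.isalpha():
                # Concatenated one-letter amino acid codes such as HKT.
                types.update(token.upper())
            else:
                raise ValueError(f"{name}: cannot interpret '{token}'")
        restrictions[name] = (sorted(residues), sorted(types))
    return restrictions


example = """\
PR_10 10-15 23 34
PR_14 14 16 HKT
PR_15 LFR 66-69 13-16 9 71
"""

restrictions = parse_fix_ass(example)
for name, (residues, types) in restrictions.items():
    print(name, residues, types)
```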

2. Sequential connectivities can also be fixed. This is especially useful when the assignment is done iteratively, alternating between Mars and manual work. The first and second columns are pseudoresidue names. The first column is the name of the pseudoresidue whose intra-residual chemical shift can be connected to the inter-residual chemical shift of the pseudoresidue in the second column.

fix_con.tab (MARSHOME/example/noStructure/1ubq/fix_con.tab)

PR_2   PR_3
PR_3   PR_4
PR_4   PR_5
PR_11  PR_12
PR_12  PR_13
PR_13  PR_14
PR_25  PR_26
PR_26  PR_27
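Each line of the connectivity table links one pseudoresidue to its sequential successor, so consecutive lines implicitly define longer fragments. The following sketch reads the pairs and chains them together to make those fragments visible. It is not part of Mars: the function names parse_fix_con and chain_fragments are hypothetical, and the chaining assumes that each pseudoresidue has at most one fixed successor.

```python
def parse_fix_con(text):
    """Return the (pseudoresidue, successor) pairs of a connectivity table."""
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 2:
            raise ValueError(f"expected two pseudoresidue names: {line!r}")
        pairs.append((fields[0], fields[1]))
    return pairs


def chain_fragments(pairs):
    """Join pairs that share a pseudoresidue into ordered fragments."""
    # Assumption: every pseudoresidue has at most one fixed successor.
    successor = dict(pairs)
    followers = set(successor.values())
    fragments = []
    for start in successor:
        if start in followers:
            continue  # not the first member of a fragment
        fragment = [start]
        while fragment[-1] in successor:
            following = successor[fragment[-1]]
            if following in fragment:
                break  # guard against an accidental loop in the table
            fragment.append(following)
        fragments.append(fragment)
    return fragments


example = """\
PR_2 PR_3
PR_3 PR_4
PR_4 PR_5
PR_11 PR_12
PR_12 PR_13
PR_13 PR_14
PR_25 PR_26
PR_26 PR_27
"""

fragments = chain_fragments(parse_fix_con(example))
for fragment in fragments:
    print(" -> ".join(fragment))
```

Listing the fragments this way makes it easy to spot a fixed stretch that is longer than intended.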

From the document Rapid Determination of Protein Structures in Solution Using NMR Dipolar Couplings (pages 141-146)