Chemistry LibreTexts

6.2: Heteronuclear 3D NMR- Resonance Assignment in Proteins

  • Serge L. Smirnov and James McCarty
  • Western Washington University

In the previous Chapter we described 2D NMR spectroscopy, which offers significantly greater spectral resolution than basic 1D spectra. In this Chapter we show how the well-resolved 2D 15N-HSQC resonances can be assigned to specific residues and chemical groups within protein samples. As an example, we consider two complementary types of 3D NMR data, HNCACB and CBCA(CO)NH, and their joint application to heteronuclear NMR resonance assignment in proteins. Such an assignment opens a number of ways to probe the structure and function (e.g. ligand binding) of the target protein.

Learning Objectives

  • Grasp why the resonance assignment of 2D 15N-HSQC can be beneficial: the case of ligand (drug) binding by a protein (therapeutic target)
  • Become familiar with 3D heteronuclear through-bond (J-coupling) NMR: introduction and the case of the HNCACB and CBCA(CO)NH pair of 3D experiments
  • Follow an example of assignment of heteronuclear NMR resonances (1HN, 15NH, 13Cα, 13Cβ) from a combination of 2D 15N-HSQC and 3D HNCACB/CBCA(CO)NH

15N-HSQC as an assay for probing protein-ligand interactions: the need for the NMR resonance assignment

During rational drug design, it is often necessary to characterize the interactions between the therapeutic target (protein) and the candidate drug (ligand) beyond determining the binding affinity (Kd). The heteronuclear solution NMR experiment 15N-HSQC can provide significant insight into such interactions. Recall that most of the signals in this 2D NMR spectrum originate from backbone H-N amide groups and a minority from side-chain NH and NH2 groups. The positions of 15N-HSQC resonances are defined by the 1HN and 15NH chemical shift values, which in turn depend on the local electronic environment. Ligand binding changes this environment for the residues forming the binding site even if the tertiary structure of the rest of the protein is not perturbed. In such a case, the 15N-HSQC resonance pattern undergoes local changes: only the resonances representing NH groups involved in the binding site change their position significantly (>0.05 ppm in the 1H and/or >0.2 ppm in the 15N dimension) or their signal intensity (including peak disappearance). Figure VI.2.A illustrates such a change.
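The thresholds quoted above can be applied mechanically once peak positions have been picked in both spectra. The following is a minimal sketch, assuming each peak is stored as a (1H ppm, 15N ppm) pair keyed by its letter label; the peak positions and the function name `classify_change` are hypothetical and are not read from Figure VI.2.A.

```python
from typing import Dict, Optional, Tuple

Position = Tuple[float, float]  # (1H shift in ppm, 15N shift in ppm)

H_THRESHOLD_PPM = 0.05  # 1H threshold quoted in the text
N_THRESHOLD_PPM = 0.2   # 15N threshold quoted in the text


def classify_change(free: Position, bound: Optional[Position]) -> str:
    """Classify one resonance by comparing free and ligand-bound spectra."""
    if bound is None:
        # The peak is not observed after ligand addition.
        return "disappeared"
    d_h = abs(bound[0] - free[0])
    d_n = abs(bound[1] - free[1])
    if d_h > H_THRESHOLD_PPM or d_n > N_THRESHOLD_PPM:
        return "shifted"
    return "unchanged"


# Hypothetical peak lists (illustration only).
free_peaks: Dict[str, Position] = {
    "a": (8.20, 120.5),
    "f": (7.95, 115.0),
    "s": (8.60, 124.1),
}
bound_peaks: Dict[str, Optional[Position]] = {
    "a": (8.21, 120.6),
    "f": None,
    "s": (8.68, 124.2),
}

changes = {
    label: classify_change(free_peaks[label], bound_peaks[label])
    for label in free_peaks
}
for label, change in changes.items():
    print(label, change)
```

Intensity changes short of complete disappearance are not handled here; they would need peak heights or volumes in addition to positions.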

Figure VI.2.A

Importantly, every 15N-HSQC resonance in Figure VI.2.A is labeled with a single letter to help identify the specific peaks which undergo spectral changes upon ligand binding. These data would have much greater impact if the peaks showing the most pronounced changes in position and/or intensity were assigned to specific amino acid residues within the polypeptide and to chemical groups within those residues (backbone vs. side chain). The rest of this Chapter demonstrates some of the fundamentals of the heteronuclear NMR resonance assignment methodology.

Heteronuclear 3D NMR introduction: CBCA(CO)NH spectrum as an example

Just as every 2D 15N-HSQC resonance reports a J-coupling through a covalent bond between a 15N and a 1H spin-½ nucleus, there are 3D NMR experiments which report resonances originating from through-bond J-coupling among three types of spin-½ nuclei (1H, 13C, 15N). In this section we introduce two such types of 3D NMR data: HNCACB and CBCA(CO)NH. To produce a protein sample with nearly complete uniform 13C and 15N labeling, bacterial recombinant protein expression can be performed in a minimal medium supplemented with 13C-labeled glucose and 15N-labeled ammonium chloride as the sole sources of carbon and nitrogen, respectively. Figure VI.2.B introduces the general concept of 3D NMR data and shows an element of a 3D CBCA(CO)NH spectrum.

Figure VI.2.B

Each resonance (“cross-peak”) of a 3D CBCA(CO)NH spectrum indicates a through-bond (scalar J-coupling) interaction between the two nuclei of the backbone amide group (1HN and 15NH) of residue j and the Cα or Cβ nucleus (13C) of the preceding residue j-1. The name of the experiment, CBCA(CO)NH, refers to the specific spin-½ nuclei involved (and not involved) in the relevant J-coupling interactions: Cβ and Cα are correlated with NH, while the connecting carbonyl carbon does not report any NMR signal (although its magnetization state is affected during the experiment). Two types of residues generate special CBCA(CO)NH peak patterns. Prolines have no amide proton, so they have no CBCA(CO)NH peaks linked with their own amide groups. Glycine residues have no Cβ, so for any residue following a glycine only a single CBCA(CO)NH resonance will be observed (from that residue's NH to the glycine Cα).

The NMR resonance assignment: combined use of two complementary datasets HNCACB and CBCA(CO)NH

By itself, CBCA(CO)NH conveys little sequential information. Another heteronuclear 3D NMR dataset, HNCACB, is a powerful complement. Just like CBCA(CO)NH, HNCACB reports resonances originating from J-coupling between the backbone amide group and Cα/Cβ nuclei. The difference is that HNCACB reports two additional peaks, both intra-residue: between the HN group and the Cα and Cβ spins of the same residue (Figure VI.2.C).
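The peak-counting rules for the two experiments (no amide proton in Pro, no Cβ in Gly, inter-residue peaks only in CBCA(CO)NH, both intra- and inter-residue peaks in HNCACB) can be written down compactly. This is an idealized sketch: it assumes every expected correlation is actually observed, which real spectra do not guarantee, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple


def side_carbons(residue: str) -> List[str]:
    """Return the Calpha/Cbeta nuclei a residue can contribute (3-letter code)."""
    return ["CA"] if residue.upper() == "GLY" else ["CA", "CB"]


def expected_peaks(
    experiment: str, residue: str, previous: Optional[str] = None
) -> List[Tuple[str, str]]:
    """List idealized cross-peaks observed on the amide group of `residue`.

    Each peak is reported as (origin, nucleus), where origin is "j" for the
    residue itself and "j-1" for the preceding residue.
    """
    if experiment not in ("HNCACB", "CBCA(CO)NH"):
        raise ValueError(f"unsupported experiment: {experiment}")
    if residue.upper() == "PRO":
        # Proline has no amide proton, hence no amide-detected peaks.
        return []
    peaks: List[Tuple[str, str]] = []
    if experiment == "HNCACB":
        # Only HNCACB reports the intra-residue correlations.
        peaks += [("j", nucleus) for nucleus in side_carbons(residue)]
    if previous is not None:
        # Both experiments report the carbons of the preceding residue.
        peaks += [("j-1", nucleus) for nucleus in side_carbons(previous)]
    return peaks


lys_after_met = expected_peaks("HNCACB", "LYS", previous="MET")
print(len(lys_after_met), lys_after_met)
```

Passing `previous=None` models the first residue of a chain fragment, for which no preceding carbons are available.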

Figure VI.2.C

Typically, HNCACB and CBCA(CO)NH are acquired with identical parameters, including the spectral width in all three dimensions and the same number of data points in the 15N dimension (or 15N planes, as in panel B of Figure VI.2.B). Now, let’s imagine that we go through every 15N plane and build the pairs of “residue j / residue j-1” HNCACB/CBCA(CO)NH peaks. This does not yet give us the sequence-specific NMR resonance assignments, but it already links pairs of 3D cross-peaks to di-peptides within the sequence. Next, let’s take into account that for some residue types the 13Cα and 13Cβ chemical shift values differ remarkably from those of other residue types. For details, take a look at the BMRB chemical shift statistics for amino acid residues, with emphasis on Gly, Ala, Ser and Thr. Knowing where such residues are positioned within the polypeptide sequence, we can start “connecting the dots” by mapping HNCACB/CBCA(CO)NH planes and di-peptides onto the actual amino acid sequence.
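Building the “residue j / residue j-1” pairs amounts to comparing the 13C shifts seen on one amide in the two spectra: whatever appears in CBCA(CO)NH belongs to residue j-1, and the HNCACB peaks left over belong to residue j. A minimal sketch of that comparison follows, assuming peak lists have already been picked per amide; the 0.3 ppm tolerance, the shift values and the function name `split_strip` are illustrative assumptions.

```python
from typing import List, Tuple

TOLERANCE_PPM = 0.3  # assumed 13C matching tolerance, not taken from the text


def split_strip(
    hncacb: List[float], cbcaconh: List[float], tol: float = TOLERANCE_PPM
) -> Tuple[List[float], List[float]]:
    """Split the 13C shifts seen on one amide into own (j) and preceding (j-1).

    Shifts present in CBCA(CO)NH are attributed to residue j-1; HNCACB shifts
    with no CBCA(CO)NH counterpart within `tol` are attributed to residue j.
    """
    preceding = sorted(cbcaconh)
    own = sorted(
        shift
        for shift in hncacb
        if all(abs(shift - p) > tol for p in preceding)
    )
    return own, preceding


# Hypothetical 13C shifts (ppm) observed on a single amide.
own, preceding = split_strip(
    hncacb=[56.4, 32.9, 52.6, 19.1],
    cbcaconh=[52.5, 19.2],
)
print("residue j:", own)
print("residue j-1:", preceding)
```

The sketch fails when a carbon of residue j happens to have the same shift as a carbon of residue j-1, since the two peaks then cannot be told apart by position alone.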

Figure VI.2.D

Figure VI.2.D provides a general idea of how the two 3D NMR experiments HNCACB and CBCA(CO)NH can be used together to map the signals onto the amino acid sequence of a protein sample. The Cβ of Ala residues typically has chemical shift values below 20.0 ppm, which is unique. This allows identification of Ala in HNCACB/CBCA(CO)NH spectral patterns. Starting from these points (as well as from other distinct values, e.g. Cα for Gly and Cβ for Ser/Thr), one can continue the “connecting the dots” process outlined in Figure VI.2.D to cover the entire sequence. If these two 3D NMR datasets contain resonance overlaps that are impossible to resolve, more 3D NMR dataset pairs are used in a similar way, e.g. HNCO/HN(CA)CO and others. This process allows assignment of nearly all backbone and some side-chain resonances (1HN, 15NH, 13Cα, 13Cβ) to specific residues and chemical groups. Methods for assigning side-chain chemical shift values are not discussed in this chapter, but conceptually they are similar to the ones described here.
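The “connecting the dots” walk can be sketched as a search for the amide whose preceding-residue shifts match the own shifts of the current one, starting from an Ala anchor found through its Cβ below 20.0 ppm. The strip labels, shift values and 0.3 ppm tolerance below are hypothetical, and every strip is assumed to carry both a Cα and a Cβ value, so Gly would need special handling.

```python
from typing import Dict, List, Optional, Tuple

Shifts = Tuple[float, float]  # (Calpha ppm, Cbeta ppm)

ALA_CB_MAX_PPM = 20.0  # Ala Cbeta criterion quoted in the text
TOLERANCE_PPM = 0.3    # assumed matching tolerance

# Hypothetical amide strips: own shifts (residue j) and preceding (j-1).
strips: Dict[str, Dict[str, Shifts]] = {
    "p": {"own": (52.5, 19.2), "prev": (58.3, 63.9)},
    "q": {"own": (56.4, 32.9), "prev": (52.6, 19.1)},
    "r": {"own": (62.1, 69.8), "prev": (56.3, 33.0)},
}


def matches(a: Shifts, b: Shifts, tol: float = TOLERANCE_PPM) -> bool:
    """True if both Calpha and Cbeta agree within the tolerance."""
    return all(abs(x - y) <= tol for x, y in zip(a, b))


def successor(label: str) -> Optional[str]:
    """Find the unique strip whose j-1 shifts match this strip's own shifts."""
    hits = [
        other
        for other, data in strips.items()
        if other != label and matches(strips[label]["own"], data["prev"])
    ]
    return hits[0] if len(hits) == 1 else None


def chain_from(label: str) -> List[str]:
    """Walk forward along the sequence until no unambiguous link remains."""
    chain = [label]
    while True:
        nxt = successor(chain[-1])
        if nxt is None or nxt in chain:
            return chain
        chain.append(nxt)


ala_strips = [name for name, data in strips.items() if data["own"][1] < ALA_CB_MAX_PPM]
chain = chain_from(ala_strips[0])
print("Ala anchors:", ala_strips)
print("linked strips:", chain)
```

Ambiguous matches simply stop the walk here; in practice such cases are where additional dataset pairs such as HNCO/HN(CA)CO come in.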

With the general process of the protein NMR resonance assignment described, let’s assume that this method was successfully applied to the protein target (T) sample presented in Figure VI.2.A. The resonance assignment completion allows one to replace letter labels with residue-number labels (similar to the ones used in Figure VI.2.D). This in turn allows one to determine the specific residues affected directly or allosterically by binding of the ligand (L) to the target. In many cases, such information together with other data leads to the determination of the ligand binding residues within the target. If the ligand is a candidate therapeutic agent, identification of the ligand binding residues greatly advances ensuing efforts to optimize the drug.

Example 1

Analyze Figure VI.2.A and list at least two resonances which undergo major spectral changes upon binding of the unlabeled ligand (L) to the 15N-labeled target protein (T). Major spectral changes for this model spectrum include resonances moving by >0.05 ppm in the 1H or >0.2 ppm in the 15N dimension, as well as peak disappearance (peak intensity going down to zero).

Upon binding of ligand (L) to the target protein (T), resonance f disappears and resonance s moves by >0.05 ppm in the 1H dimension.

Example 2

Inspect BMRB entry 50205 and list all the heteronuclear NMR datasets utilized for the NMR resonance assignment.

BMRB entry 50205 contains the chemical shift assignment data for the target sample and offers several ways to look at its underlying NMR data, including the list of experiments used to perform the NMR resonance assignment and the chemical shift values. For example, the NMR-STAR v3 text file has a section titled _Experiment_list, which sums up the heteronuclear NMR data types used for making the assignments: 2D 1H-15N HSQC and 3D HNCACB, CBCA(CO)NH, HNCO and HN(CA)CO.

Example 3

How many 3D HNCACB resonances would you expect to originate from a Lys residue which is preceded by a Met?

Four, as both Lys and Met have backbone amide (HN) groups and both have Cα and Cβ atoms: the Lys amide correlates with its own Cα and Cβ and with the Cα and Cβ of the preceding Met.

Practice Problems

Problem 1. Analyze Figure VI.2.A and list all the resonances which undergo major spectral changes upon binding of the unlabeled ligand (L) to the 15N-labeled target protein (T). Example 1 above will help you start the analysis.

Problem 2. From the BMRB entry linked to PDB 5VNT, list all the heteronuclear NMR datasets utilized for the NMR resonance assignment for the target sample.

Problem 3. Let’s consider panel B of Figure VI.2.B. Imagine that the 13C dimension is taken out of the spectrum (all 13C planes are collapsed together). What type of 2D spectrum will remain after such a dimension reduction?

Problem 4. How many 3D HNCACB resonances would you expect to originate from a Gly residue which is preceded by a Pro?

Problem 5. How many 3D HNCACB resonances would you expect to originate from a Pro residue which is preceded by a Gly?

Problem 6*. Look up the amino acid NMR chemical shift statistics table presented in the BMRB repository and list the average values for the following resonances: 15N, 13Cα and 13Cβ for Gly, Ala, Tyr, Glu, Arg, Ser, Thr, Pro. From this analysis, suggest what types of residues tend to report unusually low or high chemical shift values in comparison with the rest of the amino acids.

PINE is a probabilistic method for automated protein backbone and sidechain assignments, detection and correction of referencing, and secondary structure determination from an input protein sequence and NMR peak lists. Expand the “SUBMISSION” tab below for free access to the PINE analysis web-server maintained by NMRFAM.

The process of assigning a finite set of tags or labels to a collection of observations, subject to side conditions, is notable for its computational complexity. This labeling paradigm is of theoretical and practical relevance to a wide range of biological applications, including the analysis of data from DNA microarrays, metabolomics experiments, and biomolecular nuclear magnetic resonance (NMR) spectroscopy. We present a novel algorithm, called Probabilistic Interaction Network of Evidence (PINE), that achieves robust, unsupervised probabilistic labeling of data. The computational core of PINE uses estimates of evidence derived from empirical distributions of previously observed data, along with consistency measures, to drive a fictitious system M with Hamiltonian H to a quasi-stationary state that produces probabilistic label assignments for relevant subsets of the data. We demonstrate the successful application of PINE to a key task in protein NMR spectroscopy: that of converting peak lists extracted from various NMR experiments into assignments associated with probabilities for their correctness. This application, called PINE-NMR, is available from a freely accessible computer server (http://pine.nmrfam.wisc.edu). The PINE-NMR server accepts as input the sequence of the protein plus user-specified combinations of data corresponding to an extensive list of NMR experiments; it provides as output a probabilistic assignment of NMR signals (chemical shifts) to sequence-specific backbone and aliphatic side chain atoms plus a probabilistic determination of the protein secondary structure. PINE-NMR can accommodate prior information about assignments or stable isotope labeling schemes. As part of the analysis, PINE-NMR identifies, verifies, and rectifies problems related to chemical shift referencing or erroneous input data. PINE-NMR achieves robust and consistent results that have been shown to be effective in subsequent steps of NMR structure determination.

Access the PINE webserver submission form  here .

PINE  is freely available for use as a web-server, operated and maintained at NMRFAM.  Users can submit information and file inputs to the web-server via the submission form in the right column of this page.  Outputs of PINE analysis are promptly returned to the user via the input e-mail address.

PINE requires several types of input information be entered into the submission form:

  • User Information (Name, contact e-mail, principal investigator name, institution)
  • Protein Sequence
  • NMR Experiment Peaklists

For more details on these inputs, see below.

Input fields denoted with an asterisk,  * , are required for submission.

Once all inputs are entered into the submission form, click the “Submit” button at the bottom.

Protein sequence input file

The protein sequence input text file should contain the amino acid sequence in a single column, using either 1- or 3-letter amino acid codes.
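A one-line sequence string can be converted to the single-column layout described above with a few lines of code. This is a minimal sketch for 1-letter codes only; the function name and example sequence are hypothetical, and any further requirements the server may place on the file are not covered by the description here.

```python
def sequence_to_column(sequence: str) -> str:
    """Return the sequence with one 1-letter residue code per line.

    Whitespace, digits and other non-letter characters are dropped, which
    helps when the sequence was copied from a numbered listing.
    """
    residues = [char for char in sequence.upper() if char.isalpha()]
    return "\n".join(residues) + "\n"


# Hypothetical short sequence used for illustration only.
column_text = sequence_to_column("mkt ayi 10")
print(column_text, end="")
```

The returned text can then be saved with any file-writing call, for example `open("sequence.txt", "w").write(column_text)`, where the file name is arbitrary.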

PINE returns several types of output, detailing the probabilistic chemical shift assignments and secondary structure determined, in several different formats.

  • Protein backbone assignments with probabilities and sub-optimals (Native PINE format)
  • Protein sidechain assignments with probabilities and sub-optimals (Native PINE format)
  • Assignments in NMR-STAR format (ver. 2.1 and 3.1)
  • Graphical depictions (.jpg) of backbone assignment and secondary structure

For more details on these outputs, see below.

Bahrami, A., Assadi, A., Markley, J. L. & Eghbalnia, H., “Probabilistic Interaction Network of Evidence Algorithm and its Application to Complete Labeling of Peak lists from Protein NMR Spectroscopy”,  PLoS Comput Biol. 2009 Mar;5(3):e1000307.

Protein NMR

A practical guide, double resonance backbone assignment.

For smaller proteins, it is possible to do the backbone assignment using just 15N-labelled protein. The spectra used for this are the 15N-NOESY-HSQC and the 15N-TOCSY-HSQC. The 15N-NOESY-HSQC will show for each NH group all 1H resonances which are within about 5-7 Å of the NH hydrogen. Assignment is done on the assumption that the two neighbouring NH groups are always visible. Thus two NH groups can be linked because they each have an NOE to the other NH group.

Note that you always end up with a square motif between strips which are linked by an NOE: each strip has an NOE to the diagonal peak of the other strip.
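The square motif is a reciprocity condition: strip A has a cross-peak at the amide proton shift of strip B, and strip B has one at the amide proton shift of strip A. The following is a minimal sketch of a search for such pairs, assuming each strip is reduced to its diagonal 1H shift and a list of NOE cross-peak 1H shifts; the labels, shift values and 0.02 ppm tolerance are hypothetical.

```python
from typing import Dict, List, Tuple

TOLERANCE_PPM = 0.02  # assumed 1H matching tolerance

# Hypothetical strips: diagonal amide 1H shift and NOE cross-peak 1H shifts.
noesy_strips: Dict[str, Dict[str, object]] = {
    "A": {"hn": 8.31, "noes": [7.92, 4.40]},
    "B": {"hn": 7.92, "noes": [8.30, 8.75, 4.10]},
    "C": {"hn": 8.75, "noes": [7.93, 4.65]},
}


def sees(source: str, target: str, tol: float = TOLERANCE_PPM) -> bool:
    """True if `source` has an NOE cross-peak at the diagonal shift of `target`."""
    target_hn = noesy_strips[target]["hn"]
    return any(abs(shift - target_hn) <= tol for shift in noesy_strips[source]["noes"])


def reciprocal_pairs() -> List[Tuple[str, str]]:
    """Return all strip pairs linked by NOEs in both directions (square motif)."""
    labels = sorted(noesy_strips)
    pairs = []
    for index, first in enumerate(labels):
        for second in labels[index + 1:]:
            if sees(first, second) and sees(second, first):
                pairs.append((first, second))
    return pairs


linked = reciprocal_pairs()
print(linked)
```

A reciprocal pair only says that two NH groups are close in space; whether they are sequential neighbours or, for example, cross-strand partners has to be decided from the wider pattern of NOEs.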

Helical sections are generally easier to assign, as NOEs from NH(i) are visible not only to NH(i±1), but also to NH(i±2) and sometimes NH(i±3).

β-sheet structures include short NH-NH distances between the strands. This means that in addition to the NH(i±1) NOEs, a strong cross-strand NOE is also observed.

Having a rough idea of the secondary structure and topology of the protein can thus significantly aid backbone assignment using double resonance spectra only. Further help with assignment is provided by the 15N-TOCSY-HSQC. This should show links from the backbone NH group to all side-chain hydrogens of that residue. Using this spectrum the amino acid type can be identified or narrowed down significantly. The side-chain NOEs from the 15N-NOESY-HSQC can also be useful during the assignment process, as NH(i)-Hα(i-1) are generally very strong, in particular in β-sheet sections.

  • Open access
  • Published: 26 January 2018

Automated NMR resonance assignments and structure determination using a minimal set of 4D spectra

  • Thomas Evangelidis 1 ,
  • Santrupti Nerli 2 , 3 ,
  • Jiří Nováček 1 ,
  • Andrew E. Brereton 4 ,
  • P. Andrew Karplus 4 ,
  • Rochelle R. Dotas 5 ,
  • Vincenzo Venditti 5 , 6 ,
  • Nikolaos G. Sgourakis 3 &
  • Konstantinos Tripsianes 1  

Nature Communications volume 9, Article number: 384 (2018)

  • Protein structure predictions
  • Solution-state NMR

Automated methods for NMR structure determination of proteins are continuously becoming more robust. However, current methods addressing larger, more complex targets rely on analyzing 6–10 complementary spectra, suggesting the need for alternative approaches. Here, we describe 4D-CHAINS/autoNOE-Rosetta, a complete pipeline for NOE-driven structure determination of medium- to larger-sized proteins. The 4D-CHAINS algorithm analyzes two 4D spectra recorded using a single, fully protonated protein sample in an iterative ansatz where common NOEs between different spin systems supplement conventional through-bond connectivities to establish assignments of sidechain and backbone resonances at high levels of completeness and with a minimum error rate. The 4D-CHAINS assignments are then used to guide automated assignment of long-range NOEs and structure refinement in autoNOE-Rosetta. Our results on four targets ranging in size from 15.5 to 27.3 kDa illustrate that the structures of proteins can be determined accurately and in an unsupervised manner in a matter of days.

Introduction

Nuclear magnetic resonance (NMR) structure determination relies on recording a network of nuclear Overhauser enhancement (NOE) restraints from multidimensional spectra 1 . Obtaining near-unambiguous assignments of long-range NOEs is challenging due to substantial overlap in the spectra which becomes more pronounced for larger proteins. This is typically addressed through first establishing the chemical shift assignments of backbone and sidechain atoms using multiple (6–10) triple-resonance spectra 2 , 3 , which are then used as anchors to guide the assignment of NOEs during iterative structure refinement 4 . State-of-the-art tools such as FLYA 5 , PINE 6 and UNIO 7 can automate the resonance assignment and structure determination process. In principle, recording a smaller number of higher dimensionality spectra can provide a complementary approach to increase signal dispersion and resolve ambiguities 8 . With the emergence of non-uniform sampling and reconstruction methods, such datasets can be recorded in reasonable time 9 . Recent approaches for automated resonance assignments based on three- and four-dimensional (3D and 4D) NOE data make use of a known structure to guide the assignment process 10 , 11 . However, for de novo structure determination, further development is needed to perform resonance assignments at the high levels of completeness and correctness that are required for NOE data-driven structure determination.

Recent methods allow for structure modeling guided by NMR chemical shifts, used as a means to optimize a physically realistic energy function that reproduces the native features of protein structures 12 , 13 . Chemical shift Rosetta (CS-Rosetta) relies on backbone assignments along with Rosetta’s Monte Carlo fragment assembly protocol to model protein structures in the 10–12 kDa range 12 . CS-Rosetta was superseded by resolution-adapted structural recombination (RASREC)-Rosetta, extending the size limit to 25 kDa using backbone residual dipolar couplings (RDCs) and amide NOEs 14 , or to 40 kDa using sparse NOE data acquired on methyl-labeled, perdeuterated NMR samples, assigned manually 15 . In addition, the use of evolutionary information in conjunction with NMR chemical shift data can be used to model protein targets in the 25–40 kDa range 16 , 17 , 18 . Finally, autoNOE-Rosetta performs automated assignment of long-range NOEs and structure refinement using iterations of parallel RASREC-Rosetta calculations 19 . In all these methods, the use of advanced conformational sampling methodologies enables protein structure determination using a sparse network of restraints 20 . However, a significant bottleneck remains in establishing correct sidechain assignments at sufficient completeness levels to drive the automated assignment of long-range NOEs 20 . Moreover, the use of methyl-labeled samples requires extensive deuteration, which can be challenging for several biologically important systems 21 .

Here we combine the powerful autoNOE-Rosetta approach with a new automated assignment algorithm (4D-CHAINS) in a complete pipeline for NMR structure determination. First, 4D-CHAINS utilizes two complementary experimental datasets, a 4D-TOCSY (Total Correlated Spectroscopy) and a 4D-NOESY (Nuclear Overhauser Effect Spectroscopy), to obtain near-complete resonance assignments of backbone and sidechain 1 H, 13 C and 15 N atoms. The resonance lists provided by 4D-CHAINS form the basis for iterative assignment of long-range NOEs and structure determination using autoNOE-Rosetta, which exploits through-space correlations recorded in two 4D-NOESY datasets, one amide to aliphatic, and one aliphatic to aliphatic. The combined approach allows us to obtain structural ensembles for proteins up to 27 kDa, without the need for deuteration or selective labeling, by leveraging the well-resolved spectral features of the 4D datasets together with Rosetta’s energy function. Our NMR data and detailed analysis, performed for one benchmark case with known X-ray structure and three additional blind targets, illustrate that the new approach can consistently deliver high-resolution structural ensembles of biologically relevant proteins by greatly reducing the number of required experiments and human time spent.

Development of the 4D-CHAINS assignment algorithm

Towards developing 4D-CHAINS, we recorded a 4D HC(CC-TOCSY(CO))NH and a 4D 13C,15N-edited HMQC-NOESY-HSQC (HCNH) experiment for four different protein targets ranging in size from 15.5 to 27.3 kDa. The largest protein target (27.3 kDa) was chosen based on its apparent correlation time of ~15 ns, which still allows for TOCSY transfer to occur (Supplementary Figure 1). We also recorded a 4D 13C,13C-edited HMQC-NOESY-HSQC (HCCH) experiment to further assist in structure determination. To address the assignment problem, 4D-CHAINS uses 2D probability density maps of correlated 13C–1H chemical shifts to effectively identify possible spin systems (Fig. 1, Supplementary Figure 2). In particular, 4D-CHAINS combines sequential information present in the 4D-HCNH TOCSY and intraresidue information present in the 4D-HCNH NOESY 13C–1H planes, respectively, by clustering TOCSY or NOESY peaks to Amino Acid Index Groups (AAIGs) via their common 15N–1H frequency (Supplementary Figure 3). 4D-CHAINS computes probability scores at several steps (amino acid-type prediction, sequential AAIG relations based on TOCSY–NOESY connectivities, alignment of peptides to the protein sequence) to yield a confidence score for a given AAIG being assigned to a specific protein residue. Finally, 4D-CHAINS uses an Overlap Layout Consensus (OLC) assembly approach adopted from genome assembly 22 to match continuous AAIG segments along the protein sequence (Fig. 2). The final assignment solutions are consistent with both the joined probability score and the OLC model.
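The grouping of 4D peaks by their common 15N–1H frequency can be illustrated with a simple greedy clustering. This is only a sketch of the idea and not the 4D-CHAINS implementation: the tolerances, the peak tuple layout (1H, 13C, 15N, 1HN) and the function name are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Peak = Tuple[float, float, float, float]  # (1H, 13C, 15N, amide 1H) in ppm

TOL_N_PPM = 0.4    # assumed 15N tolerance
TOL_HN_PPM = 0.04  # assumed amide 1H tolerance


def cluster_by_amide(peaks: List[Peak]) -> List[Dict[str, object]]:
    """Greedily group peaks that share the same (15N, amide 1H) frequencies.

    The first peak of each group defines the reference frequencies against
    which later peaks are compared.
    """
    groups: List[Dict[str, object]] = []
    for peak in peaks:
        _, _, n_shift, hn_shift = peak
        for group in groups:
            same_n = abs(group["n"] - n_shift) <= TOL_N_PPM
            same_hn = abs(group["hn"] - hn_shift) <= TOL_HN_PPM
            if same_n and same_hn:
                group["peaks"].append(peak)
                break
        else:
            groups.append({"n": n_shift, "hn": hn_shift, "peaks": [peak]})
    return groups


# Hypothetical 4D peaks belonging to two different amide groups.
example_peaks: List[Peak] = [
    (4.35, 56.2, 120.10, 8.21),
    (1.80, 32.8, 120.12, 8.22),
    (4.10, 52.4, 115.60, 7.95),
    (1.38, 19.0, 115.58, 7.95),
]
groups = cluster_by_amide(example_peaks)
for group in groups:
    print(group["n"], group["hn"], len(group["peaks"]))
```

Overlapping amide frequencies would merge distinct residues into one group under this scheme, which is one reason a real implementation needs scoring beyond a fixed tolerance.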

Figure 1

Comparison of uncorrelated and correlated chemical shifts probabilities and their power in amino acid-type prediction. a The 1D probability distributions for the C α and H α atoms of Trp (left) and their joint probability distribution (right). b The 2D probability distribution of correlated C α –H α chemical shifts of Trp (left) and the corresponding smoothed 2D probability density map (right) after applying a Gaussian kernel function. Graphs were generated from the VASCO database with a bin size of 0.04 p.p.m. for protons and 0.2 p.p.m. for carbons. The color gradient scaling differs for ( a) uncorrelated and ( b) correlated distributions. c Pie charts displaying the ranking of amino acid-type predictions in the current dataset (four proteins) using uncorrelated (left) or correlated distributions (right) of 13 C– 1 H chemical shifts. d Analysis of common (correct or wrong) predictions made by using uncorrelated (red) or correlated distributions (blue) of 13 C– 1 H chemical shifts. Horizontal axis plots the mean value and standard error of correct predictions ( P 1) to the second probable ( P 2), and vertical axis plots the mean value and standard error of wrong predictions ( Pn ) to the first probable ( P 1). Pairwise t -test shows that the improvement in predictions made by using correlated instead of uncorrelated 13 C– 1 H chemical shifts (arrows) is statistically significant for the correct ones ( P -value = 0.00016) but not for the wrong ones ( P -value = 0.2175). The discrepancy is attributed to the relatively small number of wrong predictions available for the test (405 vs 88)
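The correlated 2D distribution described in the caption can be approximated by binning (1H, 13C) shift pairs into a normalized 2D histogram with the stated bin sizes. The sketch below covers only the binning and lookup; the Gaussian smoothing step is omitted, and the shift pairs are hypothetical values rather than entries from the VASCO database.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

BIN_H_PPM = 0.04  # proton bin size stated in the caption
BIN_C_PPM = 0.2   # carbon bin size stated in the caption

Bin = Tuple[int, int]


def bin_index(h_shift: float, c_shift: float) -> Bin:
    """Map a (1H, 13C) shift pair to integer bin coordinates."""
    return (math.floor(h_shift / BIN_H_PPM), math.floor(c_shift / BIN_C_PPM))


def joint_histogram(pairs: List[Tuple[float, float]]) -> Dict[Bin, float]:
    """Build a normalized 2D histogram of correlated shift pairs."""
    counts = Counter(bin_index(h, c) for h, c in pairs)
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


def probability(hist: Dict[Bin, float], h_shift: float, c_shift: float) -> float:
    """Look up the probability of the bin containing the given shift pair."""
    return hist.get(bin_index(h_shift, c_shift), 0.0)


# Hypothetical (Halpha, Calpha) observations for one residue type.
observed = [(4.70, 57.7), (4.71, 57.75), (4.30, 55.1), (4.70, 57.7)]
hist = joint_histogram(observed)
print(probability(hist, 4.70, 57.7))
print(probability(hist, 3.00, 40.0))
```

Keeping the two shifts paired in one bin is what distinguishes this from multiplying two independent 1D distributions, which would also assign probability to combinations that never occur together.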

Figure 2

Schematic representation of 4D-CHAINS decision making using Overlap Layout Consensus assembly. a Chains of AAIGs of given length L are translated to peptides and aligned to the query sequence. b Contigs of overlapping chains (overlap length L-1) are generated. Peptides that do not form contigs are discarded. c Contigs are lined up and absolute consensus AAIGs (shapes) are identified. If there is agreement with the associated confidence score (for details see Methods), then AAIGs are assigned to the corresponding amino acids of the sequence (black shapes). d Assigned AAIGs are restrained in the next rounds of decision making. False peptides and contigs no longer form and, using the same rules, additional AAIGs are assigned (triangle, polygon)
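The contig step in panel b, joining chains of length L that overlap by L-1 elements, can be illustrated with a small sketch. It deliberately leaves out the peptide translation, sequence alignment and confidence scoring that the caption describes, and the AAIG labels and function name are hypothetical.

```python
from typing import List, Tuple

Chain = Tuple[str, ...]


def extend_contig(chains: List[Chain], start: Chain) -> List[str]:
    """Grow a contig from `start` by appending chains that overlap by L-1.

    A chain extends the contig when its first L-1 elements equal the last
    L-1 elements of the contig. Extension stops when no chain, or more than
    one chain, fits. Chains must have length L >= 2.
    """
    overlap = len(start) - 1
    contig = list(start)
    used = {start}
    while True:
        tail = tuple(contig[-overlap:])
        candidates = [c for c in chains if c not in used and c[:-1] == tail]
        if len(candidates) != 1:
            return contig
        contig.append(candidates[0][-1])
        used.add(candidates[0])


# Hypothetical chains of AAIG labels with L = 3.
aaig_chains: List[Chain] = [
    ("a1", "a2", "a3"),
    ("a2", "a3", "a4"),
    ("a3", "a4", "a5"),
    ("a7", "a8", "a9"),
]
contig = extend_contig(aaig_chains, aaig_chains[0])
print(contig)
```

The isolated chain ("a7", "a8", "a9") joins no contig in this example, mirroring the caption's statement that peptides which do not form contigs are discarded.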

A uniform 4D-CHAINS protocol was applied to all four targets (Fig.  3a ). The algorithm mapped correctly all AAIGs to the respective protein sequences with >95% completeness (Supplementary Figure  4 ). The TOCSY-based assignments alone covered approximately 80% of all aliphatic chemical shifts with an error rate of <0.5% (Fig.  3b , Supplementary Figure  5 ). To increase the overall assignment completeness, we obtained additional information from the HCNH NOESY spectrum by employing the concept of common NOEs between successive residues 21 . The TOCSY–NOESY combination enabled more complete assignments with 94% correct aliphatic chemical shifts and a combined error rate of 1.3% (Fig.  3b , Supplementary Figures  5– 7 , Supplementary Table  1 ). The concept of common NOEs in obtaining assignments was further tested by providing 4D-CHAINS fixed 15 N– 1 H assignments and the HCNH NOESY spectrum alone (Supplementary Figure  8 ). This assignment scenario (NOESY) allows users to extend existing backbone assignments, obtained conventionally, to cover sidechain atoms with 86% accuracy and an error rate of approximately 5% (Fig.  3b , Supplementary Figure  5 ).

Figure 3

Automated structure determination using 4D-CHAINS/autoNOE-Rosetta. a Flowchart of the 4D-CHAINS algorithm for automated NMR resonance assignment from two 4D spectra (TOCSY and NOESY). b Quality of 4D-CHAINS assignments for supervised, TOCSY–NOESY, and NOESY settings, expressed as the average for the four different protein targets. c, d Performance of different 4D-CHAINS assignment scenarios for a 198 aa protein, α-lytic protease, calculated using autoNOE-Rosetta. c Goodness of structural ensembles is measured using the Rosetta all-atom energy function, backbone heavy atom RMSD to X-ray structure (PDB ID 1P01) and degree of structural convergence. Average energy values (in Rosetta Energy Units (REU)) for ensembles calculated using indicated data/assignment scenarios, with error bars shown at 1 standard deviation. Also shown is the average energy of 10 locally refined X-ray structures (diamond); refinement adapts the X-ray structure to the local optimum of the Rosetta energy, with a minimum change in RMSD (0.5 Å). The color of points represents the average % of converged residues in each ensemble, according to the color scale on the right. d Lowest-energy structures in each ensemble (shown in the same color as the points in c) superimposed on the X-ray reference structure (gray). Images of structures were produced using Chimera (https://www.cgl.ucsf.edu/chimera)

Performance of 4D-CHAINS relative to existing methods

To test the performance of 4D-CHAINS relative to existing assignment programs, we performed calculations using a popular assignment method, FLYA 5 , for all protein targets used in the current study. While 4D-CHAINS relies exclusively on the combination of 4D-HCNH TOCSY and 4D-HCNH NOESY, the FLYA algorithm is designed to combine peak patterns from any number of input spectra. Therefore, we provided FLYA with all available spectra (4D-HCNH TOCSY, 4D-HCNH NOESY, 4D-HCCH NOESY). Notwithstanding, 4D-CHAINS outperforms FLYA consistently for all four protein targets in our benchmark. For three proteins, namely RTT, ms6282 and Enzyme I (nEIt), FLYA outputs 90% correct assignments with 7-8% error rate, while for α-lytic protease (aLP) the number of correct assignments is limited to 25% (Supplementary Figure  5 ). Finally, we manually inspected and extended the 4D-CHAINS results to establish the maximum number of highly accurate assignments for all 13 C– 1 H correlations that can be observed in our 4D spectra (>98%), as a “best-effort” resonance list requiring a modest time investment by a trained user. These supervised assignment lists also contain aromatic and sidechain amide chemical shifts, not considered by the automated 4D-CHAINS protocol (Supplementary Figure  6 ).

Iterative structure calculations using autoNOE-Rosetta

We evaluated the performance of assignments obtained using 4D-CHAINS in driving Rosetta structure determination of a 20 kDa target, aLP, for which several known X-ray structures are available in the Protein Data Bank 23 . Inspection of the X-ray structural ensemble shows a highly complex all-β fold, with two sub-domains each containing a 6-stranded antiparallel β-sheet. In order to establish a “best effort” limit of the Rosetta automated NOE assignment and structure determination protocol, we first performed autoNOE-Rosetta calculations 19 using the supervised assignments together with both NOE datasets (HCNH+HCCH). Additionally, we carried out automated 4D-CHAINS/autoNOE-Rosetta structure calculations (Fig.  3a , Supplementary Figure  9 ) under four different scenarios, as described above (TOCSY–NOESY or NOESY assignments, each using HCNH alone or HCNH+HCCH NOEs). To evaluate the quality of the resulting structural ensembles, we used the following criteria: (i) fraction of residues converged within 2.0 Å backbone heavy atom root-mean-square deviation (RMSD) in the final ensemble, (ii) average Rosetta all-atom energies and (iii) RMSD to X-ray structure. Since the Rosetta energy function 24 has a global minimum at the native fold, lower-energy models should also exhibit higher convergence towards the native structure. We observed a good correlation between Rosetta all-atom energy, degree of convergence and structural accuracy (correlation coefficient of 0.93, Fig.  3c ). Specifically, using the supervised assignments and HCNH or HCNH+HCCH NOEs, we obtained highly converged structural ensembles (>98%, computed over the core secondary structure regions) that are within 0.7 Å RMSD from the X-ray structure.

Notably, ensembles calculated using the 4D-CHAINS (TOCSY–NOESY or NOESY) automated assignments using both NOE datasets also achieved a high level of convergence (>90%), to within 1.3 and 1.7 Å RMSD from the X-ray, respectively. Using the same 4D-CHAINS assignment lists and the HCNH NOEs alone, the accuracy relative to the X-ray was slightly reduced to 1.7 and 1.9 Å, respectively, while the convergence decreased to approximately 86%, but the models still recapitulated the protein fold and β-sheet topology. This trend is highlighted in a superposition of the lowest-energy aLP model sampled in each calculation on the X-ray structure reference (Fig.  3d ).

Convergence of aLP structures towards the X-ray reference

To evaluate the relative accuracy and precision of NOE-driven structure determination approaches using different input assignments and NOE datasets (HCNH or HCNH+HCCH), we performed a single joint Ensemblator 25 , 26 analysis of the 12 resulting aLP structural ensembles (six each using Rosetta or CYANA 27 ) and a set of 51 X-ray structures from the CoDNaS 28 database ( see Methods). A dimensionality-reduced visualization of the relationships between the models (Fig.  4a ) reveals that the Rosetta models are consistently closer to the X-ray models than the corresponding CYANA models generated using the same datasets. Overall, the Rosetta models show better convergence, and convergence for all groups correlates strongly with their similarity to the X-ray structures (Fig.  4b ). The two Rosetta-generated ensembles based on supervised assignments are nearly equivalent and show the best convergence and greatest similarity to the X-ray structures (Fig.  4a, b ). Here, the fragment-based structure refinement in Rosetta allows the generation of highly accurate ensembles from HCNH NOEs alone, which is not feasible using standard simulated annealing in CYANA. In particular, poor convergence and low similarity to the X-ray ensemble are seen for models calculated by CYANA from the HCNH NOEs alone (Fig.  4a, b ; orange and red circles). Conversely, the Rosetta ensembles produced from a single NOESY dataset (HCNH) are in good agreement with the X-ray ensemble (Fig.  4c ), and quantitative comparison shows that the structural variability pattern along the protein chain is rather similar, although the NMR ensemble typically has a greater variability than the X-ray ensemble (Fig.  4d ). These results suggest that the uncertainty of atom positions in solution correlates with variability associated with different crystal packing environments.

figure 4

Ensemblator 25 , 26 analysis of α-lytic protease NMR and X-ray ensembles. a t-SNE dimensionality reduction results showing the relationships of the aLP models. The shape and color of each point (see plot key) convey the method and the data used to generate that model; a black outline highlights the most representative model (exemplar) of each type. b For each type of NMR ensemble, the convergence of the ensemble during refinement is compared to the distance of its exemplar to that of the X-ray ensemble. Shapes and colors of each point (see plot key) convey the method and the data used to generate that model. The best fit to these data (black line) has an adjusted R 2 of 0.96. c Wire diagram showing the traces of the backbone heavy atoms for the 4D-CHAINS/autoNOE-Rosetta ensemble obtained using supervised assignments and HCNH NOEs (red) and the X-ray ensemble (blue). The defined core atoms (21.4% of all atoms using a 1.6 Å cutoff) are shown in green. d Per-residue global backbone heavy atom variation (bottom panel) for the 4D-CHAINS/autoNOE-Rosetta ensemble obtained using supervised assignments and HCNH NOEs (green) and the X-ray ensemble (blue), along with the closest approach distance for models in each group (red). The discrimination index (top plot, black line) reveals regions of similarity (low values) and difference (high values); the median discrimination index (DI) is indicated by a horizontal line.

Consistent blind structure determination of protein targets

To further test our method in a fully unbiased manner, we performed blind structure calculations for three additional protein targets, RTT 29 , 30 , ms6282 and nEIt, of sizes 133, 145 and 248 amino acids (aa), respectively (Table 1 ). To establish a baseline performance, we carried out CS-Rosetta calculations guided by chemical shifts alone 15 , as well as reference CYANA calculations using both input NOE datasets (HCNH+HCCH). With the exception of the smallest target (RTT), the resulting CS-Rosetta models failed to converge (Supplementary Figure 10 ) and instead sampled conformations with sub-optimal energies (Fig. 5 ; right column, black). Conformational sampling is drastically improved in autoNOE-Rosetta calculations guided by either supervised or automated 4D-CHAINS assignments, and the resulting structural ensembles are very similar for all targets (Fig. 5 ; left column). For the largest target, the 27.3 kDa Enzyme I from Thermoanaerobacter tengcongensis , NOE contacts provided sufficient constraints to elucidate the structure of the individual domains, but the overall orientation of the two domains was not converged due to the lack of contacts at their interface (domain A, defined by residues 1–143 and domain B, defined by residues 144–248) (Supplementary Figure 11 ). Here, the use of 15 N– 1 H residual dipolar couplings allowed us to sample still lower energies and obtain better convergence by restraining the relative orientation of the two domains (Fig. 5d , Supplementary Figure 11d ).

figure 5

Comparison of structural ensembles calculated from supervised versus fully automated assignments. a Rtt103 (RTT, 133 aa), b KanY (ms6282, 145 aa), c α-lytic protease (aLP, 198 aa) and d Enzyme I (nEIt, 248 aa). Columns 1 and 2: autoNOE-Rosetta ensembles of 10 lowest-energy structures guided by “best effort” supervised assignments or by automated 4D-CHAINS assignments (TOCSY–NOESY), respectively. Column 3: Sequence map of distance restraints assigned by autoNOE-Rosetta in iterative structure refinement calculations. Here, the upper triangular region shows restraints obtained using supervised assignments, while the lower triangular region using automated 4D-CHAINS assignments. Column 4: Rosetta energy (in Rosetta Energy Units (REU)) distributions and total numbers of assigned long-range restraints. The energy distribution was computed from the 100 lowest-energy structures sampled during the final stage of autoNOE-Rosetta calculations using supervised assignments (purple), 4D-CHAINS assignments (green) and chemical shift fragment-based RASREC-Rosetta calculations without NOEs (black). The bars represent the total number of HCNH (amide to aliphatic) and HCCH (aliphatic to aliphatic) long-range NOE restraints assigned by autoNOE-Rosetta, including ambiguous restraints derived for different stereo-specific groups. RDCs were used to obtain converged Enzyme I structures with respect to the orientation of the two domains reported in row d . Images of structural ensembles were produced using Chimera ( https://www.cgl.ucsf.edu/chimera )

To evaluate the effect of different levels of assignment completeness on the performance of autoNOE-Rosetta, we carried out benchmark calculations by randomly removing entries from our “best effort” supervised assignment lists for target aLP and found that autoNOE-Rosetta can identify the correct protein fold with as few as 60–70% of the sidechain assignments. In addition, we performed a detailed comparison of assigned NOE contacts and Rosetta energy distributions, relative to control calculations guided by the supervised assignments. We observe that the use of fully automated assignments results in a small decrease in the total number of NOE contacts identified by Rosetta (approximately 80% for all targets). Furthermore, we obtain similar distributions of assigned NOE contacts among residue pairs in the protein sequence (Fig. 5 ; middle column). The respective lowest-energy models are built using hundreds of automatically assigned, long-range NOE restraints and exhibit a minimal number of violations (1–4%) involving pairs of atoms that are typically within 1 Å of their estimated upper distance limits (Supplementary Table 2 ). Methyl–methyl NOE contacts play a critical role in defining the hydrophobic core of the protein, and we found that ∼25% of the total contacts identified by autoNOE-Rosetta are contributed by methyl NOEs for structure calculations using supervised or automated 4D-CHAINS assignments (Supplementary Table 3 ). Finally, the distributions of energies among the 100 best sampled structures are generally shifted relative to RASREC-Rosetta and show good overlap with their supervised counterparts (Fig. 5 ; right column).

NMR remains the only biophysical technique that can deliver high-resolution structures of proteins and other biomolecules in their functional, aqueous environment, which constitutes the basis for studying interactions with other molecules and therapeutic compounds. However, standard approaches for NMR resonance assignment rely on recording several complementary datasets which can be limiting for larger, more complex systems due to increased resonance overlap and require a significant time investment by a trained expert to analyze the spectra and establish a complete list of resonance assignments aided by computational tools 31 . Established methods to overcome this problem utilize selective isotopic labeling 32 , which can be limiting in terms of the information content present in the NMR data, expensive and challenging to perform for certain systems.

Here, we propose an automated approach for full structure determination using 2–3 4D NMR spectra recorded on a 13 C, 15 N uniformly labeled sample. First, 4D-CHAINS addresses the assignment problem in an efficient and highly robust manner, yielding the correct assignments for at least 95% of residues and error rates of less than 1.5% (Supplementary Table 1 ). It is further worth noting that the vast majority of resonances corresponding to sidechain methyls, which are important probes in identifying the protein fold, are correctly assigned by our method. Therefore, the use of 4D-CHAINS allows near-complete assignment of sidechain methyls without the need for site-specific labeling on a perdeuterated background 33 (Supplementary Table 4 ). Second, autoNOE-Rosetta uses a highly parallelizable iterative algorithm run on a computer cluster to perform assignment of long-range NOEs alongside the structure determination process. The full pipeline takes approximately 10–12 days to execute for a typical protein sample, including the time needed for NMR data acquisition, and requires minimal supervision.

In addition to recapitulating the correct protein fold, autoNOE-Rosetta models obtained using the 4D-CHAINS assignments show accurate placement of sidechains for most residues in the protein structure (Supplementary Table  5 ). Specifically, close inspection of sidechain conformations in the Rosetta ensembles computed using the supervised assignments shows good overall agreement with the X-ray rotamers for most buried residues (>10 Å 2 BSA), while using the fully automated assignments results in a small decrease (<10%) in accuracy relative to the models derived using supervised assignments (Supplementary Figure  12 and Supplementary Table  5 ). Finally, an analysis of long-range NOEs assigned by autoNOE-Rosetta versus predicted from the X-ray structure using a 5.5 Å distance cutoff between all pairs of protons shows good recovery of crystallographic contacts at levels of 67–86%, which are distributed across the entire protein fold (Fig.  6 ). Taken together, our results underpin that the automated 4D-CHAINS/autoNOE-Rosetta approach yields models that accurately capture the correct global fold as well as atomic features of the native structure.

figure 6

Comparison of assigned NOE contacts versus those predicted from the X-ray structure. Contacts are shown as a function of residue pairs along the sequence of α-lytic protease. The upper triangular region shows NOE contacts identified during iterative structure refinement by autoNOE-Rosetta, while the lower triangular region shows expected contacts as predicted from the X-ray structure (PDB ID 1P01) using a 5.5 Å distance cutoff between all possible proton atom pairs and further removing redundancies due to chemically equivalent protons. Different combinations of input assignments and NOE datasets used are shown as follows. a Supervised assignments with HCNH NOEs. b Supervised assignments with HCNH+HCCH NOEs. c Total number of NOE restraints assigned in a and b (orange) versus predicted from X-ray structure (gray). d 4D-CHAINS TOCSY–NOESY automated assignments with HCNH NOEs. e 4D-CHAINS TOCSY–NOESY automated assignments with HCNH+HCCH NOEs. f Total number of NOE restraints assigned in d and e (orange) versus predicted from X-ray structure (gray).

The convergence of structures obtained using supervised assignments for the three target proteins, RTT, ms6282 and aLP, is better than or comparable to the convergence of structures obtained using 4D-CHAINS automated assignments, as expected (Table 1 ). Notably, for Enzyme I, autoNOE-Rosetta achieves a higher level of structural convergence using automated assignments due to enhanced resampling of the correct protein fold during the early stages of the autoNOE-Rosetta structure calculation process. Overall, our results suggest that the fully automated assignment process introduced by 4D-CHAINS has a minimal impact on the performance and quality of the structural ensembles derived by autoNOE-Rosetta, which remain highly consistent with all available input data.

Relative to CYANA, autoNOE-Rosetta can achieve a similar degree of structural convergence using the same input resonance assignments with both the aliphatic and amide NOE peak lists. Although the total number of structurally degenerate HCNH+HCCH long-range NOE contacts identified by CYANA is higher by (i) ∼ 10% for aLP and ms6282 and (ii) ∼ 25% for RTT and nEIt (Supplementary Figure 13 ), for three targets, RTT, ms6282 and aLP, the degree of structural convergence achieved by autoNOE-Rosetta is comparable to CYANA; while for the largest target, Enzyme I (nEIt), the autoNOE-Rosetta ensemble is significantly more converged towards the correct fold (Supplementary Figure  14 ). Generally, the structural ensembles determined using autoNOE-Rosetta are closer to the nearest PDB reference structures by approximately 0.5 Å for RTT, 0.2 Å for ms6282, 0.5 Å for aLP and >2.2 Å for nEIt relative to the structures predicted by CYANA (Supplementary Figure  14 ). When using the amide NOEs alone together with either automated or supervised resonance assignments, CYANA does not yield converged models, while autoNOE-Rosetta can still deliver models showing the correct protein fold, albeit with reduced convergence relative to calculations performed using both input peak lists, as shown in detail for aLP (Fig.  3 ) and as outlined for all other targets (Table  1 ).

In summary, we demonstrate that 4D-CHAINS provides highly accurate and near-complete NMR resonance assignments from two 4D spectra, which are effective in guiding high-resolution structure determination using autoNOE-Rosetta. Our results on four targets in the 15.5–27.3 kDa range indicate that the use of our automated pipeline has a minimal impact on the precision and quality of the resulting structural ensembles, while allowing for a tremendous reduction in human effort and NMR spectrometer time. Lastly, our structural evaluation criteria, in terms of convergence and Rosetta all-atom energy, can clearly distinguish the correct structures, allowing our protocol to be used extensively for generating high-quality models in a truly unsupervised manner. Therefore, our combined approach could be of great practical utility in both high-throughput structural determination projects 34 and NMR-based screening for small-molecule and protein–protein interactions 35 .

NMR sample details

For each uniformly 13 C-, 15 N-labeled protein sample, the concentration, buffer composition and NMR data collection temperature are as follows:

The 0.8 mM RTT in 35 mM potassium phosphate (pH 6.8), 100 mM KCl, 5% D 2 O, 25 °C.

The 1.2 mM ms6282 in 50 mM sodium phosphate (pH 6.5), 150 mM NaCl, 7% D 2 O, 25 °C.

The 2.0 mM aLP in 10 mM deuterated sodium acetate (pH 4.0), 50 mM NaCl, 8% D 2 O, 25 °C.

The 1.8 mM nEIt in 20 mM sodium phosphate (pH 6.5), 100 mM NaCl, 5% D 2 O, 37 °C.

NMR data collection

For each protein target, a set of three sparsely sampled 4D NMR experiments was acquired on 850 or 950 MHz Bruker Avance III spectrometers equipped with a 1 H/ 13 C/ 15 N TCI cryogenic probehead with z -axis gradients. All NMR spectra were recorded at the CEITEC Josef Dadok National NMR Centre using pulse sequences adapted from the Bruker library. The 4D HC(CC-TOCSY(CO))NH experiment was acquired with chemical shift evolution performed in a semi-constant-time manner in t 1 ( 1 H ali ) and t 3 ( 15 N) and using a FLOPSY16 spin-lock of 12 ms, which yielded the best overall signal-to-noise ratio. The spectral widths were set to 12,500 (acq) × 2000 ( 15 N) × 8000 ( 13 C ali ) × 6250 ( 1 H ali ) Hz and maximal acquisition times in the indirectly detected dimensions were set to 50 ms for 15 N, 10 ms for 13 C ali and 16 ms for 1 H ali . The experiment was acquired with 16 scans per increment and a single-scan recycling delay of 1.0 s. A total of 1536 points was collected in the acquisition dimension and 2500 hypercomplex points were sparsely distributed over the indirectly detected dimensions. Prior to recording the full 4D HC(CC-TOCSY(CO))NH experiment, we recorded the 15 N/ 1 H 2D plane of the experiment using a full (incremental) sampling list, since our methodology is applicable if the number of signals observed in the 2D plane is ≥50% of that expected from a standard, sensitivity-enhanced 2D 15 N/ 1 H HSQC experiment. In the 4D 13 C, 15 N edited HMQC-NOESY-HSQC (HCNH) experiment, the HMQC building block was used to transfer the magnetization between 1 H ( t 1 ) and 13 C ( t 2 ) with evolution of the 1 H chemical shift in a semi-constant-time manner during both transfer and refocusing of magnetization. The magnetization transfer between 1 H ( t 4 ) and 15 N ( t 3 ) was designed using a reverse HSQC building block after the 70 ms NOESY mixing time.
The data were collected with spectral widths set to 12,500 (acq) × 2000 ( 15 N) × 8000 ( 13 C) × 10,000 ( 1 H) Hz, respectively. The maximal evolution times in the indirectly detected dimensions were set to 50 ms for 15 N, 10 ms for 13 C and 20 ms for 1 H. The experiment was acquired using a 1.0 s single-scan recycle delay and an 8-step phase cycle with 8 scans per increment. In all, 1536 complex points were acquired in the direct dimension and a total of 5000 hypercomplex points was non-uniformly distributed over the indirectly detected dimensions. The 4D 13 C, 13 C edited HMQC-NOESY-HSQC (HCCH) experiment uses the same HMQC-type building block as described for the HCNH NOESY experiment (see above) within the first 1 H ( t 1 ), 13 C ( t 2 ) transfer of magnetization. The second 1 H ( t 4 ), 13 C ( t 3 ) transfer of magnetization following the 70 ms NOESY mixing time is performed using a gradient-based HSQC building block. Data were collected with spectral widths set to 12,500 (acq) × 8000 ( 13 C) × 8000 ( 13 C) × 10,000 ( 1 H) Hz, and the maximal acquisition times in the indirectly detected dimensions were set to 20 ms for 1 H ( t 1 ) and 10 ms for 13 C ( t 2 , t 3 ). The experiment was acquired with 8 repetitions per increment and a 1.0 s single-scan recycling delay. A total of 1536 complex points was collected in the acquisition dimension and 5000 hypercomplex points were distributed over the indirectly detected dimensions. For each spectrum the NMR acquisition time was 4 days. From our setup, we can observe that the experimental time needed to acquire three 4D non-uniform sampling spectra is comparable to the total acquisition time of several conventional 3D experiments. However, the analysis of 3D experiments is laborious and further complicated by resonance overlap, which becomes more pronounced with increasing target size.
Thus, from the user’s standpoint, it is preferable to operate using a pair of complementary experiments which yield the same information in a higher-dimensionality dataset. Finally, our benchmark data illustrate that any additional relaxation losses during the extra chemical shift evolution step needed to acquire the fourth indirect dimension are not prohibitive for highly concentrated samples of stable proteins, which can still yield very rich datasets. All the pulse sequences used in our experiments are available upon request.
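To give a feel for how sparse the sampling described above is, the following sketch estimates the size of the full indirect-dimension grid and the fraction of it that was actually sampled. It assumes that the number of grid points in each indirect dimension is simply the spectral width multiplied by the maximal evolution time; this convention, and the function names, are illustrative assumptions rather than details given in the text.

```python
def grid_points(spectral_width_hz, max_evolution_s):
    """Grid size for one indirect dimension (assumed: points = SW x t_max)."""
    return round(spectral_width_hz * max_evolution_s)


def sampling_density(indirect_dims, n_sampled):
    """Return (full grid size, sampled fraction) for a list of (SW in Hz, t_max in s)."""
    full_grid = 1
    for sw_hz, t_max_s in indirect_dims:
        full_grid *= grid_points(sw_hz, t_max_s)
    return full_grid, n_sampled / full_grid


# 4D HC(CC-TOCSY(CO))NH: 15N, 13C(ali), 1H(ali) with 2500 hypercomplex points
tocsy_dims = [(2000, 0.050), (8000, 0.010), (6250, 0.016)]
tocsy_grid, tocsy_density = sampling_density(tocsy_dims, 2500)

# 4D HCNH NOESY: 15N, 13C, 1H with 5000 hypercomplex points
hcnh_dims = [(2000, 0.050), (8000, 0.010), (10000, 0.020)]
hcnh_grid, hcnh_density = sampling_density(hcnh_dims, 5000)

print(f"TOCSY: {tocsy_grid} grid points, {100 * tocsy_density:.2f}% sampled")
print(f"HCNH:  {hcnh_grid} grid points, {100 * hcnh_density:.2f}% sampled")
```

Under this assumption both experiments sample roughly 0.3% of the full grid, which is why conventional (fully sampled) acquisition of such 4D spectra would be impractical.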

Sparse sampling and data processing

On-grid Poisson disc sampling 36 was used to distribute individual acquisition points in the indirectly detected dimensions. This sampling scheme enforces a minimum distance between the generated time points and has been shown to reduce the level of sampling artifacts in the direct vicinity of signals after reconstruction 36 .
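The idea of on-grid Poisson disc sampling can be illustrated with a naive "dart-throwing" sketch: random grid points are proposed and accepted only if they lie at least a minimum distance from every point already accepted. This is not the generator used in the cited work; the function name, the grid shape, the minimum distance and the choice to always include the first time point are assumptions made for illustration.

```python
import random


def poisson_disc_on_grid(shape, n_points, min_dist, seed=0, max_tries=100000):
    """Naive dart-throwing Poisson disc sampler on an integer grid.

    shape     : grid size in each indirect dimension
    n_points  : number of sampling points requested
    min_dist  : minimum Euclidean distance (in grid units) between points
    """
    rng = random.Random(seed)
    points = [(0,) * len(shape)]  # assumption: always keep the first time point
    tries = 0
    while len(points) < n_points and tries < max_tries:
        tries += 1
        candidate = tuple(rng.randrange(size) for size in shape)
        # Accept only if the candidate is far enough from all accepted points
        if all(
            sum((a - b) ** 2 for a, b in zip(candidate, p)) >= min_dist ** 2
            for p in points
        ):
            points.append(candidate)
    return points


# Small hypothetical grid; real 4D experiments use far larger grids
schedule = poisson_disc_on_grid((20, 16, 20), n_points=50, min_dist=2.0, seed=1)
print(len(schedule), schedule[:3])
```

The quadratic cost of the distance check is acceptable for a sketch; a production generator would use a spatial index and typically a distance that varies across the grid.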

The 4D data were first processed using a sparse Fourier transform algorithm 37 to check the data quality. Final processing was performed in an iterative manner using the signal separation analysis approach as implemented in the program cleaner4d 37 (SSA package). Prior to processing with the cleaner4d program, the data were weighted with a squared cosine function in the directly acquired dimension and zero-filled to 2k points using NMRPipe/NMRDraw 3.0 38 . The 4D spectra were analyzed in Sparky 39 .

Peak picking

Peaks were picked automatically and curated manually using a restricted peak picking strategy. First, the 4D-HCNH NOESY spectrum was picked at a user-defined noise level using both 15 N, 1 H- and 13 C, 1 H-HSQC peaks as filters. Then the 4D-HCNH TOCSY spectrum was picked using the 4D-NOESY peaks as filters. Accordingly, all planes were inspected simultaneously in all spectra and picked artifacts were removed. Synchronization of all four shared dimensions in the spectra allows for a highly efficient peak picking and curation process.

Measurement of RDC restraints for nEIt

Backbone amide 1 D NH RDCs were measured by taking the difference in 1 J NH scalar couplings in aligned and isotropic media 40 . The alignment medium employed was phage Pf1 (16 mg ml −1 ; ASLA Biotech) 41 , and 1 J NH couplings were measured using the ARTSY pulse scheme 42 . NMR measurements were performed on a Bruker 800 MHz spectrometer equipped with a z -shielded gradient triple-resonance cryoprobe. Spectra were processed using NMRPipe 38 and analyzed using the program Sparky 39 .

Automated resonance assignment using 4D-CHAINS

4D-CHAINS is an automated resonance assignment algorithm for backbone and sidechain chemical shifts of proteins. As input it requires the protein sequence in FASTA format and peak lists from 1 H, 15 N HSQC (root), 4D HC(CC-TOCSY(CO))NH and 4D 13 C, 15 N edited HMQC-NOESY-HSQC experiments in Sparky format. The 4D-CHAINS algorithm tackles the assignment problem in a conventional way 43 , 44 . The distinct feature of 4D-CHAINS is that all available aliphatic 13 C– 1 H coupled frequencies are used for amino acid-type prediction and sequential connectivities, drastically decreasing the ambiguity level (Supplementary Figure 9 ).

The overall assignment accuracy of 4D-CHAINS depends on chemical shift statistics, currently available in the form of one-dimensional (1D) distributions of proton or carbon resonances for every atom of the 20 amino acids (Fig. 1a ). Since the 4D spectra provide direct information on 13 C– 1 H correlated chemical shifts, we reasoned that statistical distributions of correlated chemical shifts would improve 4D-CHAINS performance in addressing the assignment problem. For the different 13 C– 1 H moieties of every amino acid, correlated chemical shift maps were generated from the VASCO-corrected data 45 . The VASCO dataset was chosen instead of the larger BMRB dataset because chemical shift values of aliphatic carbons that were improperly referenced have been corrected, thus avoiding distortion of the information content used. The resulting 2D probability distributions have bins with zero frequency, due to the relatively small sample size (Fig. 1b ). Therefore, we created probability density maps by applying a Gaussian kernel function, given by Eq. 1 , to estimate the density at any point

where n is the data size, and h H and h C are the bandwidths for the proton and the carbon dimension, respectively. For optimal bandwidth selection we used Scott’s rule of thumb, h = n^(−1/6). Based on our analysis, the 2D probability density maps of correlated chemical shifts provide improved predictive power when compared to joint probabilities derived from 1D histograms of proton and carbon chemical shifts (Fig. 1c, d ).
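Equation 1 is not reproduced in this copy of the text. As a rough illustration only, the sketch below evaluates a standard product-Gaussian kernel density estimate, f(x_H, x_C) = 1/(n h_H h_C) Σ K((x_H − x_H,i)/h_H) K((x_C − x_C,i)/h_C), which is consistent with the symbols defined above but may differ from the authors' exact expression. Scaling Scott's factor n^(−1/6) by the sample standard deviation of each dimension is a common convention and is an assumption here, as are the function names and the toy chemical shifts.

```python
import math
import statistics


def scott_bandwidth(values):
    """Scott's rule for 2D data: n**(-1/6), scaled by the sample std (assumed)."""
    return statistics.stdev(values) * len(values) ** (-1.0 / 6.0)


def gaussian(u):
    """Standard normal kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def density(x_h, x_c, h_shifts, c_shifts):
    """Product-Gaussian kernel density estimate at the point (x_h, x_c)."""
    n = len(h_shifts)
    bw_h = scott_bandwidth(h_shifts)
    bw_c = scott_bandwidth(c_shifts)
    total = sum(
        gaussian((x_h - hi) / bw_h) * gaussian((x_c - ci) / bw_c)
        for hi, ci in zip(h_shifts, c_shifts)
    )
    return total / (n * bw_h * bw_c)


# Hypothetical 1H/13C shifts (p.p.m.) for one atom type; not real database values
h_obs = [4.1, 4.3, 4.4, 4.6, 4.2]
c_obs = [55.0, 56.5, 57.0, 58.2, 54.8]
print(density(4.3, 56.0, h_obs, c_obs))   # near the data: relatively high density
print(density(1.0, 20.0, h_obs, c_obs))   # far from the data: essentially zero
```

Unlike a 2D histogram, the estimate is non-zero everywhere near the data, which is the property the text relies on to avoid empty bins.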

4D-CHAINS is written in the Python programming language and consists of two modules: an NH-mapping module and an atom-type assignment module. As output, it provides TOCSY and NOESY (intraresidue and sequential) assignments of the input 4D peak lists, allowing visual verification of results, and a chemical shift list in XEASY format that can be input together with NOESY peak lists to automated structure determination software.

First, 4D-CHAINS clusters the 4D-HCNH TOCSY and 4D-HCNH NOESY peaks via the common root resonance ( 15 N– 1 H) they share to generate AAIGs of 13 C– 1 H correlated chemical shifts. For a given root resonance, the TOCSY AAIG provides sequential information, that is, the 13 C– 1 H aliphatic resonances of the previous amino acid in the sequence ( i -1), whereas the NOESY AAIG reports on any 13 C– 1 H moiety that is in close spatial proximity. By virtue of the NOE distance dependence, the NOESY AAIG contains most, if not all, of the intraresidue 13 C– 1 H resonances ( i ).
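The clustering step described above can be sketched as a simple tolerance match between each 4D peak and the HSQC root resonances. The tolerances, the tuple layout of the peaks and the rule of skipping peaks that match zero or several roots are illustrative assumptions, not the behaviour of the actual 4D-CHAINS code.

```python
def cluster_by_root(root_peaks, peaks_4d, tol_n=0.2, tol_hn=0.02):
    """Group 4D peaks by the 15N-1H root resonance they share.

    root_peaks : {label: (n_ppm, hn_ppm)} from the 2D HSQC
    peaks_4d   : list of (h_ali_ppm, c_ali_ppm, n_ppm, hn_ppm)
    Returns {label: [(c_ali_ppm, h_ali_ppm), ...]}.
    """
    groups = {label: [] for label in root_peaks}
    for h_ali, c_ali, n, hn in peaks_4d:
        matches = [
            label
            for label, (root_n, root_hn) in root_peaks.items()
            if abs(n - root_n) <= tol_n and abs(hn - root_hn) <= tol_hn
        ]
        if len(matches) == 1:  # assumption: ambiguous or unmatched peaks are skipped
            groups[matches[0]].append((c_ali, h_ali))
    return groups


# Hypothetical root resonances and TOCSY peaks
roots = {"R1": (120.0, 8.20), "R2": (115.5, 7.90)}
tocsy_peaks = [
    (4.30, 56.0, 120.05, 8.21),
    (1.40, 19.0, 119.95, 8.19),
    (3.90, 45.0, 115.50, 7.91),
    (2.00, 30.0, 130.00, 9.50),  # matches no root and is ignored
]
tocsy_groups = cluster_by_root(roots, tocsy_peaks)
print(tocsy_groups)
```

Running the same function on the NOESY peak list would give the corresponding NOESY groups for each root.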

For each TOCSY AAIG, 4D-CHAINS calculates the probability of an amino acid type for the preceding residue in the protein sequence using a probabilistic model 46 . Let us denote any amino acid of the 20 types by AA and the set of correlated chemical shifts in a TOCSY AAIG by CCS. The conditional probability P (AA|CCS) of an amino acid type given the observed C–H resonances is given by Eq. 2

where P (CCS|AA) is the conditional probability of 13 C− 1 H resonances for a given amino acid type, P (AA) is the prior probability of finding the given amino acid type in the protein sequence independent of the observed 13 C– 1 H resonances, and P (CCS) is the sum of the P (CCS|AA) terms over the 20 amino acid types. P (CCS|AA) can be accurately estimated for any amino acid type using the probability density maps of 13 C– 1 H correlated chemical shifts (Supplementary Figure 2 ). For a given number of 13 C– 1 H frequencies in a TOCSY AAIG, all permutations of atom-type combinations are considered in calculating the probability for amino acids whose number of possible atom types is equal to or larger than the number of TOCSY frequencies. In practice, however, only a small number of combinations is computed, because many 13 C– 1 H frequencies have non-zero probability only for distinct atom types of any amino acid (Supplementary Figure 2 ). P (CCS|AA) is taken as the probability of the most probable combination, expressed as the product of probabilities of each 13 C– 1 H pair belonging to different atom types of a given amino acid. If the number of TOCSY 13 C– 1 H pairs is larger than the expected number of atom types of a given amino acid, then P (CCS|AA) is set to zero. For every TOCSY AAIG several amino acid-type predictions are made and ranked according to their conditional probabilities. Depending on the TOCSY transfer efficiency, amino acids with long sidechains are predicted rather unambiguously. In our datasets, accurate predictions, defined as the correct amino acid type being the most probable, reached 89% (Fig. 1c ).
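Equation 2 is not reproduced in this copy, but the description corresponds to Bayes' rule. The sketch below ranks amino acid types by posterior probability, taking the prior from the composition of the protein sequence. Note one deliberate choice: the normalising term is computed as the prior-weighted sum of the likelihoods, which is the textbook form and may differ from the normalisation described in the text. The likelihood values and the sequence are hypothetical.

```python
from collections import Counter


def sequence_priors(sequence):
    """Prior P(AA): relative frequency of each amino acid type in the sequence."""
    counts = Counter(sequence)
    return {aa: count / len(sequence) for aa, count in counts.items()}


def posterior(likelihoods, priors):
    """P(AA|CCS) from P(CCS|AA) and P(AA) using Bayes' rule (textbook normaliser)."""
    joint = {aa: likelihoods.get(aa, 0.0) * p for aa, p in priors.items()}
    evidence = sum(joint.values())
    if evidence == 0.0:
        return {aa: 0.0 for aa in joint}
    return {aa: value / evidence for aa, value in joint.items()}


# Hypothetical sequence and hypothetical likelihoods P(CCS|AA) for one TOCSY group
priors = sequence_priors("AALKGA")
likelihoods = {"A": 0.02, "L": 0.30, "K": 0.10, "G": 0.0}
post = posterior(likelihoods, priors)
ranked = sorted(post, key=post.get, reverse=True)
print(ranked, post)
```

In this toy case leucine ranks first even though alanine is the most abundant residue, because the likelihood term dominates the prior.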

Next, sequential connectivity information is obtained by matching the 13 C– 1 H frequencies of every TOCSY AAIG ( i -1) to 13 C– 1 H frequencies present in any other NOESY AAIG ( i ), excluding the NOESY AAIG with the same root resonance ( 15 N– 1 H) as the TOCSY AAIG (Supplementary Figure 3a ). The sequential connectivities established for each TOCSY AAIG may vary in occupancy rate, defined as the ratio of matched frequencies to the total number of TOCSY frequencies (Supplementary Figure 3a ). The algorithm creates a directed rooted tree from each AAIG and progressively adds edges and nodes using the connectivity information, until it reaches a maximum chain length (Supplementary Figure 3b ). As a tradeoff between efficiency and memory consumption, the maximum length is set to six. To estimate the significance of each chain, each chain X is then assigned a probability of occurrence given by the product of the probabilities of each connectivity in that chain, as shown in Eq. 3

where L is the chain length and k the position in the chain. In principle, chains with higher occupancy rate of connectivities are more likely to be correct.
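A minimal sketch of the two quantities discussed above, the occupancy rate of a connectivity and the product score of a chain, is given below. Equation 3 is not reproduced in this copy, so the per-connectivity probabilities are simply taken to be the occupancy rates; that substitution, the matching tolerances and the function names are assumptions made for illustration.

```python
import math


def occupancy(tocsy_pairs, noesy_pairs, tol_c=0.2, tol_h=0.02):
    """Fraction of TOCSY (13C, 1H) pairs that have a match in a NOESY group."""
    matched = sum(
        1
        for c, h in tocsy_pairs
        if any(abs(c - c2) <= tol_c and abs(h - h2) <= tol_h for c2, h2 in noesy_pairs)
    )
    return matched / len(tocsy_pairs)


def chain_score(connectivity_probs):
    """Product of the per-connectivity probabilities along a chain."""
    return math.prod(connectivity_probs)


# Hypothetical TOCSY group (residue i-1) and candidate NOESY group (residue i)
tocsy = [(56.0, 4.30), (19.0, 1.40), (30.0, 2.00)]
noesy = [(56.1, 4.31), (19.05, 1.41), (62.0, 3.80)]
rate = occupancy(tocsy, noesy)
print(rate)                             # two of three TOCSY pairs are matched
print(chain_score([rate, 1.0, 0.5]))    # score of a hypothetical 4-member chain
```

Because the score is a product, a single weak connectivity lowers the score of the whole chain, in line with the statement that chains with higher occupancy rates are more likely to be correct.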

Subsequently, the chains are used to generate a larger number of peptide sequence segments using the amino acid-type predictions obtained earlier. Each peptide is aligned to the protein sequence using the Needleman–Wunsch algorithm 47 . Many peptides are discarded at this stage due to alignment mismatches. For the aligned peptides, an alignment score S align is computed using the BLOSUM90 similarity matrix 48 , which quantifies the importance of the alignment to a specific amino acid sequence. Taking all of the above into account, the weighted probability of assigning an AAIG from chain X to a specific residue in the protein sequence is given by Eq. 4 .
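The alignment step can be illustrated with a small dynamic-programming sketch in the style of Needleman–Wunsch. Two simplifications are made and should be read as assumptions: a plain match/mismatch score replaces the BLOSUM90 matrix, and unaligned protein residues before and after the peptide are not penalised, so that a short peptide can be placed anywhere along the sequence. The sequences are hypothetical.

```python
def align_peptide(peptide, protein, match=2, mismatch=-1, gap=-4):
    """Align a short peptide to a protein; return (best score, end position).

    Needleman-Wunsch-style recursion with free end gaps in the protein.
    The end position is the number of protein residues consumed.
    """
    m, n = len(peptide), len(protein)
    # score[i][j]: best score for peptide[:i] aligned so that it ends at protein[:j]
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        score[i][0] = i * gap  # peptide residues with nothing to align against
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if peptide[i - 1] == protein[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + s,  # align the two residues
                score[i - 1][j] + gap,    # gap in the protein
                score[i][j - 1] + gap,    # gap in the peptide
            )
    best_end = max(range(n + 1), key=lambda j: score[m][j])
    return score[m][best_end], best_end


protein_seq = "MKTAYIAKQRQISFVK"
print(align_peptide("AKQR", protein_seq))  # exact match ending at residue 10
print(align_peptide("AKWR", protein_seq))  # one mismatch, same placement
```

A peptide whose best score falls below a chosen threshold would be discarded, which mirrors the filtering of mismatched peptides described above.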

Since several different chains can be mapped at overlapping positions in the protein sequence, multiple AAIGs may correspond to each protein residue. For each position in the protein sequence, a confidence score ( C s ) is computed for every AAIG corresponding to the given position by summing over all the chains as indicated in Eq. 5 .

In order to identify the correct AAIGs mapped to the protein sequence from the large pool of aligned chains, the 4D-CHAINS algorithm exploits the overlap information by performing OLC assembly similar to DNA assembly techniques 22 (Fig. 2 ). From the N- to the C-terminus of the protein sequence, series of aligned chains are merged into contigs with identical overlap of length L -1, where L is the length of the chains. A contig terminates if there is no overlap to extend or if it encounters an AAIG that is already part of it. Chains that cannot be merged into contigs are considered spurious and discarded. Finally, all contigs are aligned to the protein sequence. For an AAIG to be assigned to a given residue in the protein sequence, two conditions must be met. First, only the absolute consensus AAIGs among the different contigs are taken into account for a given position in the sequence and, second, the consensus AAIG must have the highest confidence score for the given position (Fig. 2 ).
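The merging rule described above (identical overlap of length L − 1, termination when no extension exists or when an AAIG would repeat, and removal of chains that cannot be merged) can be sketched as follows. This is a simplified illustration, not the 4D-CHAINS implementation: it extends a contig only when exactly one continuation exists, and it ignores the sequence alignment and consensus steps. The chain labels are hypothetical.

```python
def assemble_contigs(chains):
    """Merge equal-length chains that overlap by L-1 members into contigs."""
    length = len(chains[0])
    by_prefix = {}
    for chain in chains:
        by_prefix.setdefault(chain[: length - 1], []).append(chain)
    suffixes = {chain[1:] for chain in chains}

    contigs = []
    for start in chains:
        if start[: length - 1] in suffixes:
            continue  # another chain precedes this one, so it is not a contig start
        contig = list(start)
        merged = 1
        while True:
            candidates = by_prefix.get(tuple(contig[-(length - 1):]), [])
            # Stop if the extension is missing, ambiguous, or would repeat a member
            if len(candidates) != 1 or candidates[0][-1] in contig:
                break
            contig.append(candidates[0][-1])
            merged += 1
        if merged > 1:  # lone chains are treated as spurious and dropped
            contigs.append(contig)
    return contigs


chains = [("a", "b", "c"), ("b", "c", "d"), ("c", "d", "e"), ("x", "y", "z")]
print(assemble_contigs(chains))  # the isolated chain x-y-z is discarded
```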

Mapping of AAIGs to the protein sequence is accomplished by a succession of iterations that differ in three parameters used: (i) the length of chains built; (ii) a Z -score cutoff that controls the amino acid-type predictions to be considered per TOCSY AAIG when chains are translated to peptides; and (iii) the occupancy rate of connectivities between a TOCSY AAIG and all matched NOESY AAIGs.

In the first iteration, stringent criteria are applied to ensure greater fidelity of predictions and extract long chains ( L = 6) that are less likely to be aligned in a wrong position of the sequence. OLC assembly provides an additional level of scrutiny and removes isolated chains that cannot be extended at either end and are likely false. Accordingly, only consensus AAIGs are selected and, if there is agreement with the probabilistic model (confidence score), they are mapped to specific positions of the protein sequence. Mapped AAIGs are restrained in successive rounds by eliminating all amino acid-type predictions and connectivities they participate in that are inconsistent with the NH mapping. This reduces noise and allows us to proceed gradually with shorter chains (minimum length 3) to fill short regions in the sequence that are flanked by gaps in connectivities or proline residues, incorporate AAIGs in the sequence that fulfill the connectivity criteria but have low amino acid-type probability due to abnormal chemical shifts, and finally account for the fact that NOESY AAIGs may not match all frequencies of a TOCSY AAIG. In each round both the OLC and the probabilistic rule must be met for additionally mapped AAIGs to be accepted and restrained in the following round.

In the present implementation of the 4D-CHAINS algorithm, no mapping mistakes were made for the four protein targets. The NH-mapping coverage varied between 96 and 100% (Supplementary Figure 4; left column). To better evaluate the mapping performance of 4D-CHAINS, only the ¹³C–¹H correlated frequencies of α- and β-atoms were retained in the TOCSY input peak list, imitating the scenario of using a 4D CBHBCAHA(CO)NH experiment in conjunction with the 4D ¹³C, ¹⁵N edited HMQC-NOESY-HSQC. The coverage dropped only slightly and, again, no mistakes were introduced (Supplementary Figure 4; right column). This control experiment highlights the robustness of 4D-CHAINS, which stems mainly from the predictive power of ¹³C–¹H correlated chemical shifts and the reliability of the connectivities established when carbon and proton frequencies are coupled.

For all AAIGs mapped to the protein sequence, 4D-CHAINS obtains the assignment of aliphatic atoms by matching the observed ¹³C–¹H correlated frequencies to their distributions in the 2D probability density maps. First, the TOCSY frequencies are assigned to atom types of the previous amino acid in the sequence. Based on the amino acid type, pairs of ¹³C–¹H frequencies that differ by 0.2 p.p.m. or less in the carbon frequency are grouped into methylene moieties. The atom-type probability for these moieties is taken as the logarithmic average of the individual probabilities. All permutations of atom types over the observed frequencies are then computed, and the permutation with the highest probability provides the atom-type assignments for the TOCSY observed frequencies. It has been reported before 49 that automated assignments based on TOCSY-type transfer may interchange atoms of certain amino acids because their chemical shift distributions overlap partially, as seen in the 2D probability density maps of correlated chemical shifts (Supplementary Figure 2). Another source of erroneous assignments is incomplete TOCSY patterns, which become more common as the size of the protein increases and also depend on the isotropic mixing period. For instance, in Leu residues the observed TOCSY correlations often correspond to the α, β and one of the isopropyl atoms. Because of the missing correlations, any of the methyl groups could be wrongly assigned to the γ atom and vice versa, depending on the observed chemical shifts. 4D-CHAINS assigned all TOCSY correlations with an average error rate of 0.2% (4 errors out of 1845 ¹³C–¹H moieties in total) (Supplementary Figures 6 and 7). Minor TOCSY-based misassignments should in principle have little impact on structure calculations driven by long-range NOEs because they involve intraresidue atoms.
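The grouping and permutation search described above can be sketched as follows. This is an illustration under stated assumptions, not the 4D-CHAINS code: the "logarithmic average" is interpreted here as the mean of log-probabilities (the log of a geometric mean), the 2D density maps are replaced by a toy Gaussian `toy_prob` with made-up centres, and all function names are hypothetical.

```python
import math
from itertools import permutations


def group_methylenes(peaks, tol=0.2):
    """Pair (carbon_ppm, proton_ppm) peaks whose carbon shifts differ by <= tol."""
    ordered = sorted(peaks)
    groups, i = [], 0
    while i < len(ordered):
        if i + 1 < len(ordered) and abs(ordered[i + 1][0] - ordered[i][0]) <= tol:
            groups.append((ordered[i], ordered[i + 1]))  # methylene pair
            i += 2
        else:
            groups.append((ordered[i],))
            i += 1
    return groups


def mean_log_prob(probs):
    """Mean log-probability; the floor avoids log(0) for impossible matches."""
    return sum(math.log(max(p, 1e-300)) for p in probs) / len(probs)


def best_assignment(groups, atom_types, prob):
    """Try every permutation of atom types over the groups, keep the best one."""
    best, best_score = None, float("-inf")
    for perm in permutations(atom_types, len(groups)):
        score = sum(
            mean_log_prob([prob(atom, peak) for peak in group])
            for atom, group in zip(perm, groups)
        )
        if score > best_score:
            best, best_score = dict(zip(perm, groups)), score
    return best, best_score


# Toy stand-in for the 2D density maps; centres and widths are invented.
CENTERS = {"CA": (56.0, 4.3), "CB": (30.0, 1.9)}


def toy_prob(atom, peak):
    c0, h0 = CENTERS[atom]
    return math.exp(-((peak[0] - c0) / 5.0) ** 2 - ((peak[1] - h0) / 1.0) ** 2)


peaks = [(56.2, 4.25), (30.1, 1.95), (30.0, 1.80)]
groups = group_methylenes(peaks)
best, score = best_assignment(groups, ["CA", "CB"], toy_prob)
```

Working in log space keeps the comparison stable when some permutations have vanishingly small probability. The source also conditions the grouping on the amino acid type, which this sketch omits.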

Next, all TOCSY-derived assignments are transferred to the NOESY spectra, starting from the last residue and going backwards. For every amino acid, the intraresidue NOE peaks (i) are assigned first by matching the TOCSY assigned peaks of the successive residue, and then any sequential NOE peaks (i−1) by matching its own TOCSY assigned peaks. It has been noted from the early days of NMR 1, and supported later by inter-proton statistics 21, that for any given amide the observed NOE correlations to aliphatic protons are predominantly intraresidue (i) and sequential (i−1). Consistent with this, 4D-CHAINS traced 99% of TOCSY peaks as intraresidue NOE correlations and 89% as sequential NOE correlations. This analysis demonstrates that, for any given amide and its successive one, a large portion of the NOEs they share correspond to the aliphatic atoms of the former. This is particularly true for the methyl groups, which in principle yield strong NOE correlations to amides, both intraresidue and sequential. The only exception is the distant methyl group of Met. Intraresidue NOE correlations are uniformly present in all types of secondary structure (α-helix, β-sheet, loops), whereas sequential NOE correlations are most prevalent in β-sheets. Most of the missing sequential NOE correlations correspond to certain atom types (δ and ε of Lys, δ of Arg, ε of Met, and to a lesser extent γ of Leu and γ1 of Ile).

Common NOEs are utilized to derive missing TOCSY-based assignments. For every residue separately, the NOESY peak intensities are normalized and peaks with low intensity (threshold 0.1 unless specified otherwise) are left out. 4D-CHAINS scans the sequence backwards. For each residue with a missing assignment, it matches its unassigned NOE peaks with the unassigned NOE peaks of the next residue. For each matched peak, a probability is derived for the missing atom-type assignments from the 2D density map of the particular residue type. Atom-type probabilities must belong to the 80th percentile of the density maps to be considered further. This filter prevents any decision from being made when the correct assignment does not belong to any of the matched peaks. Each probability is then weighted by the intensity of the corresponding peak to give a score. This step is necessary to identify the correct assignment of methyl groups among the available options, because intraresidue methyl-amide NOEs yield stronger correlations. Several intensity transformations were tested extensively (see below); the best performance for obtaining NOESY-type methyl assignments is given by the product of the 2D histogram probability and the intensity of the NOESY peak transformed by exponentiation, e.g., 2Dprob × (100 × intensity²). The highest score or product of scores provides the assignments for the missing atom types.
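A small Python sketch shows how the intensity-weighted score could rank candidate peaks. It assumes that normalization means dividing by the strongest peak of the residue (the text does not specify the scheme), omits the 80th-percentile filter, and uses hypothetical names and toy numbers throughout.

```python
def score_candidates(candidates, min_intensity=0.1):
    """Rank candidate NOE peaks for one missing atom type.

    `candidates` is a list of (label, probability, raw_intensity) tuples.
    Returns the (label, score) pair with the highest score.
    """
    # Assumed normalization: scale by the strongest peak of the residue.
    top = max(intensity for _, _, intensity in candidates)
    scored = []
    for label, prob, intensity in candidates:
        norm = intensity / top
        if norm < min_intensity:
            continue  # weak peaks are left out, as described in the text
        # Score from the text: 2Dprob x (100 x intensity^2)
        scored.append((label, prob * (100.0 * norm ** 2)))
    return max(scored, key=lambda item: item[1])


# Toy data: peak_c has the best probability but is too weak to be considered.
candidates = [("peak_a", 0.30, 50.0), ("peak_b", 0.20, 100.0), ("peak_c", 0.90, 5.0)]
best = score_candidates(candidates)
```

Squaring the normalized intensity penalizes weak peaks more strongly than a linear weight would, which favors the strong intraresidue methyl-amide correlations mentioned above.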

The efficiency of 4D-CHAINS in obtaining atom-type assignments from the 4D-HCNH NOESY spectrum was tested in three different scenarios (Supplementary Figure 5 and Supplementary Table 1). In the first scenario, the current workflow, 4D-CHAINS sought assignments not present in the TOCSY spectra. For the four different datasets, it assigned 13% of additionally assignable aliphatic atoms (277 carbon types) with an average error rate of 8.7% (24 errors out of 277 types). In the second scenario, it operated on synthetic data of a 4D CBHBCAHA(CO)NH experiment. It performed NH mapping successfully, correctly assigned all α- and β-atoms (1253 carbon types or 56.2% of all assignable atoms) and completed the missing assignments from the NOESY spectrum, where it assigned 37.9% of additionally assignable atoms (847 carbon types) with an average error rate of 5.1% (43 errors out of 847 types). In the third scenario, only the backbone amide ¹⁵N, ¹H HSQC frequencies were provided and 4D-CHAINS was asked to assign all aliphatic atoms from the 4D-HCNH NOESY spectrum (Supplementary Figure 8). 4D-CHAINS was able to assign 91.1% of all assignable aliphatic atoms (2033 carbon types) with an average error rate of 5.5% (112 errors out of 2033 types). In all cases the assignment error rate for methyl groups was lower: 7.9% for the first scenario (6 errors out of 76 methyls assigned), 3.8% for the second scenario (15 errors out of 393 methyls assigned) and 3.3% for the third scenario (15 errors out of 457 methyls assigned).
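As a quick arithmetic check, the quoted error rates follow directly from the error counts in the text. The short Python snippet below recomputes them; the dictionary keys are informal labels chosen here, not terms from the paper.

```python
def error_rate(errors, total):
    """Error rate in percent."""
    return 100.0 * errors / total


# (errors, assigned) pairs copied from the text above.
all_atoms = {"scenario 1": (24, 277), "scenario 2": (43, 847), "scenario 3": (112, 2033)}
methyls = {"scenario 1": (6, 76), "scenario 2": (15, 393), "scenario 3": (15, 457)}

for name in all_atoms:
    overall = round(error_rate(*all_atoms[name]), 1)
    methyl = round(error_rate(*methyls[name]), 1)
    print(f"{name}: all atoms {overall}%, methyls {methyl}%")
```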

Automated atom assignments using FLYA

For FLYA resonance assignment calculations, all available spectra were used as input, that is, 4D-HCNH TOCSY, 4D-HCNH NOESY and 4D-HCCH NOESY. The calculations were performed using the demo script provided with the CYANA distribution, as shown in the Supplementary Methods.

NOE assignment and structure determination using Rosetta

NOE-based structural ensembles were generated using the csrosetta3 toolbox integrated within the Rosetta3 software suite. autoNOE-Rosetta 19 is one of the protocols included in the toolbox; it performs automatic assignment of NOEs and structure determination based on the highly parallel RASREC-Rosetta 50 conformational sampling engine, which can successfully determine well-converged structures from sparse NMR data 19,51. The main principle is to iterate the NOE assignment algorithm alongside a multi-stage (I–VIII) conformational sampling process, in order to obtain a network of long-range NOEs that drives structure refinement towards the global minimum of the Rosetta energy function 24. The protocol uses as input initial assignments of NOE cross-peaks, derived from the chemical shift lists provided by 4D-CHAINS. The selection of high-ranking initial assignments of NOE cross-peaks depends on several factors, including symmetry of the peaks, chemical shift matching score and network anchoring. Long-range NOE restraints and backbone chemical shift fragments guide the generation of batches of preliminary, low-resolution structures, which are in turn used to evaluate and refine the NOE assignments. Short- and medium-range NOEs are also assigned by the program, but are not used in structure refinement. autoNOE-Rosetta further eliminates peaks during the sampling process. In the final stages, only highly converged, lowest-energy models that satisfy the maximum number of assigned NOE distance restraints are retained.

In practice, the process of setting up the protocol and generating models involves the following steps. (1) Preparation of chemical shift, NOE peaks and sequence files. From the 4D-CHAINS XEASY chemical shift table, we generate a TALOS 52 file and perform empirical prediction of backbone torsion angles using TALOS-N 53 for a given protein sequence. Based on the predicted chemical shift order parameter, we retain only the rigid regions of the structure. (2) Fragment selection 54 from high-resolution structures in the PDB 23 . We use the TALOS-N φ , ψ and secondary structure-type predictions to bias the selection of 3- and 9-residue backbone fragments, excluding fragments derived from homologs to the target sequence present in the database. (3) Automated setup of autoNOE-Rosetta calculations for a range of restraint weight values.

According to this general procedure, we performed two sets of calculations for each target using NOE peak lists that included either (i) amide to aliphatic (HCNH) only, or (ii) amide to aliphatic and aliphatic to aliphatic (HCNH+HCCH). To improve sampling for nEIt, NOEs were supplemented by one RDC dataset. All calculations were set up with standard restraint weights of 5, 10, 25 and 50. The optimum restraint weight was selected based on an empirical cost function that considers individual restraint weights, Rosetta all-atom energies (talaris2014.wts 24) and the degree of structural convergence in each calculation. Finally, we selected an ensemble of the 10 lowest-energy structures that show minimum NOE violations. A detailed method to set up the calculations and analyze the models is available in the Supplementary Methods.

NOE assignment and structure determination using CYANA

The 3D structural ensembles of all four target proteins used in this study were calculated using the CYANA 27 software suite, supplied with the same input datasets as with autoNOE-Rosetta. Depending on the size of each protein target, CYANA calculations required 45–90 min on 4 CPUs. The script with all parameters for CYANA calculations is available in the Supplementary Methods.

Ensemblator analysis

Analysis of the ensembles was performed using the Ensemblator 25 , 26 software for atom- and residue-level global and local comparisons. The Ensemblator first iteratively overlays pairs of structures and finally defines a “common core” of atoms that are consistently within a specified cutoff distance. For each set of comparisons, the needed cutoff distance was automatically determined by the Ensemblator to yield 20–40% of the atoms in the common core. These comparisons also yield pairwise weighted distance metrics that are used to embed the models into an N -dimensional space where N is the number of models. Also, for any specified group of models, an exemplar was defined as the model having the shortest average distance to all other models in its own group. Global comparisons between groups are performed after the common core atoms are used to overlay structures and local backbone comparisons are calculated based on the locally overlaid dipeptide residual which converts φ, ψ differences to a single distance 25 . The global and local comparisons involve quantifying the levels of variation for each residue within and between defined groups so that the level of intragroup variation can be compared with the intergroup variation.
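The exemplar definition above translates directly into code. The sketch below is an illustration only and is not taken from the Ensemblator: the function name, the plain nested-list distance matrix and the toy values are assumptions.

```python
def exemplar(dist, group):
    """Return the model in `group` with the shortest mean distance to the others.

    `dist` is a symmetric matrix (list of lists) of pairwise model distances,
    and `group` is a list of model indices belonging to one group.
    """
    if len(group) == 1:
        return group[0]  # a single model is trivially its own exemplar

    def mean_distance(i):
        others = [j for j in group if j != i]
        return sum(dist[i][j] for j in others) / len(others)

    return min(group, key=mean_distance)


# Toy distance matrix for three models (values are invented).
dist = [
    [0.0, 1.0, 4.0],
    [1.0, 0.0, 2.0],
    [4.0, 2.0, 0.0],
]
center = exemplar(dist, [0, 1, 2])
```

Here model 1 is selected because its mean distance to the other two models (1.5) is smaller than that of model 0 (2.5) or model 2 (3.0).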

Finally, the models in the crystallographic ensemble consisting of 51 aLP structures, used for Ensemblator analysis, were obtained from the CoDNaS 28 database by searching for α-lytic protease and utilizing all the available X-ray structures.

Restraint violation analysis

NOE restraint violations among the 10 lowest-energy models in each calculation are reported separately for different classes of restraints assigned by autoNOE-Rosetta (Supplementary Tables 2, 6–9). First, restraints are automatically divided into three confidence classes according to a total assignment probability score 19: highly confident (HI) (probability >70%), confident (MED) (probability >45%) and least confident (LOW) (probability <45%). Second, the ambiguity score reflecting assignment uniqueness 19 further classifies restraints as ambiguous (AMBIG) (ambiguity score >0.1), near unambiguous (NEAR_UNAMBIG) (ambiguity score <0.1) or unambiguous (UNAMBIG) (ambiguity score <0.01). According to these criteria, each restraint is classified into one of six classes: HI_UNAMBIG, HI_NEAR_UNAMBIG, HI_AMBIG, MED_UNAMBIG, MED_AMBIG and LOW_AMBIG. Finally, because 4D-CHAINS does not provide stereo-specific assignments, the resulting autoNOE restraints are structurally degenerate and are therefore treated using an effective distance computed as the r⁻⁶ average over all possible pairs of atoms 55. We used a 7 Å upper distance bound to identify violations in the resulting structurally degenerate NOE restraints, reported as an average over the 10 lowest-energy models in each structural ensemble. The 7 Å upper bound was chosen because, with the 70 ms mixing time used, through-space magnetization transfer between protons can occur up to a maximum distance of about 7 Å 56. This bound was confirmed by direct inspection of the distances corresponding to confidently assigned NOEs in the X-ray structure of aLP (PDB ID 1PO1).
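The classification thresholds and the effective-distance check can be sketched in Python as follows. This is not the autoNOE-Rosetta code. The handling of values that fall exactly on a threshold is an assumption, as is the choice between a mean-based and a sum-based r⁻⁶ convention, which the text does not spell out; both are offered through the `average` flag.

```python
def confidence_class(probability):
    """Map a total assignment probability (0-1) to HI, MED or LOW."""
    if probability > 0.70:
        return "HI"
    if probability > 0.45:
        return "MED"
    return "LOW"


def ambiguity_class(score):
    """Map an ambiguity score to UNAMBIG, NEAR_UNAMBIG or AMBIG."""
    if score < 0.01:
        return "UNAMBIG"
    if score < 0.1:
        return "NEAR_UNAMBIG"
    return "AMBIG"


def effective_distance(distances, average=True):
    """r^-6 effective distance over all atom pairs of a degenerate restraint.

    average=True uses the mean of r^-6; average=False uses the plain sum,
    another common convention for ambiguous restraints.
    """
    terms = [r ** -6 for r in distances]
    total = sum(terms) / len(terms) if average else sum(terms)
    return total ** (-1.0 / 6.0)


def is_violated(distances, upper_bound=7.0):
    """A restraint is violated when its effective distance exceeds the bound (in Å)."""
    return effective_distance(distances) > upper_bound


label = confidence_class(0.80) + "_" + ambiguity_class(0.005)
```

Because of the r⁻⁶ weighting, the shortest pair dominates the effective distance, so a degenerate restraint is satisfied as soon as one of its atom pairs is close enough.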

Computational cost

4D-CHAINS takes about half an hour on average to run on a commodity computer. All autoNOE-Rosetta structure calculations were performed at the UCSC Baker cluster with 13 compute nodes (AMD Opteron(tm) Processor 6378, 2.4 GHz) and 32 cores per compute node. Typical message passing interface (MPI) calculations are run in parallel on 100 cores and, depending on target size, take on average (i) 6–8 h (150 aa), (ii) 12–14 h (200 aa) and (iii) 16–18 h (250 aa). A total of approximately 2 million CPU hours was used for the various development stages of the method.

CS-Rosetta support for NMR Exchange Format

NMR restraint datasets are now represented using a new open standard, NMR Exchange Format (NEF) 57 . NEF is a self-contained format designed to be machine readable by common NMR structure determination software tools. The file is divided into sections where each section corresponds to the data used for structure calculation. Full specification of each section in NEF can be found at: https://github.com/NMRExchangeFormat/NEF/blob/master/specification/Overview.md .

NEF provides support for a set of identifiers to be used by software tools. We use the identifiers provided by the NEF specification to design NEF converter and NEF parser tools as part of the csrosetta3 toolbox. The NEF converter takes the sequence information, chemical shift assignments, RDC data (if available), distance restraints and peak information used for structure calculation and converts them to the standard NEF file format for deposition in databases that support NEF.

Similarly, we provide a series of tools to extract the respective information from a NEF file into FASTA, NOE restraint, chemical shift and peak files for subsequent automatic setup of CS-Rosetta structure calculations. See the Supplementary Information for detailed commands to convert to NEF and to parse a NEF file.

wwPDB data deposition

Currently, wwPDB does not support the NEF format, and we therefore utilized the NEF to BMRB translator program provided by BMRB 58 ( https://github.com/kumar-physics/BMRBTranslator ) to convert from NEF to NMR-STAR format for data deposition. The deposited NMR-STAR file consists of chemical shifts, peaks and RDCs used for structure calculation.

Code availability

4D-CHAINS is available on GitHub ( https://github.com/tevang/4D-CHAINS ) for non-commercial usage. The updated CS-Rosetta (version 3.4) software and detailed documentation for installation and usage can be obtained at the CS-Rosetta web server ( https://csrosetta.chemistry.ucsc.edu ). The current version (3.4) of CS-Rosetta also supports conversion of data to NMR Exchange Format for deposition to the wwPDB.

Data availability

Biological Magnetic Resonance Bank: chemical shifts, peak lists and RDCs have been deposited under BMRB codes 30322, 30325, 30326 and 30327. Protein Data Bank: restraint lists and coordinates have been deposited under PDB codes 5WOT, 5WOX, 5WOY and 5WOZ. Other data are available from the corresponding authors upon reasonable request.

References

1. Wüthrich, K. NMR of Proteins and Nucleic Acids. (Wiley: New York, 1986).

2. Ikura, M., Kay, L. & Bax, A. A novel approach for sequential assignment of proton, carbon-13, and nitrogen-15 spectra of larger proteins: heteronuclear triple-resonance three-dimensional NMR spectroscopy. Application to Calmodulin. Biochemistry 29, 4659–4667 (1990).

3. Kay, L. E., Ikura, M., Tschudin, R. & Bax, A. Three-dimensional triple-resonance NMR spectroscopy of isotopically enriched proteins. J. Magn. Reson. 89, 496–514 (1990).

4. Güntert, P. Automated structure determination from NMR spectra. Eur. Biophys. J. 38, 129–143 (2009).

5. Schmidt, E. & Güntert, P. A new algorithm for reliable and general NMR resonance assignment. J. Am. Chem. Soc. 134, 12817–12829 (2012).

6. Bahrami, A., Assadi, A. H., Markley, J. L. & Eghbalnia, H. R. Probabilistic interaction network of evidence algorithm and its application to complete labeling of peak lists from protein NMR spectroscopy. PLoS Comput. Biol. 5, e1000307 (2009).

7. Guerry, P., Duong, V. & Herrmann, T. CASD-NMR 2: robust and accurate unsupervised analysis of raw NOESY spectra and protein structure determination with UNIO. J. Biomol. NMR 62, 473–480 (2015).

8. Kay, L., Clore, G., Bax, A. & Gronenborn, A. Four-dimensional heteronuclear triple-resonance NMR spectroscopy of interleukin-1 beta in solution. Science 249, 411–414 (1990).

9. Kazimierczuk, K. & Orekhov, V. Non-uniform sampling: post-Fourier era of NMR data collection and processing. Magn. Reson. Chem. 53, 921–926 (2015).

10. Trautwein, M., Fredriksson, K., Moller, H. M. & Exner, T. Automated assignment of NMR chemical shifts based on a known structure and 4D spectra. J. Biomol. NMR 65, 217–236 (2016).

11. Pritišanac, I. et al. Automatic assignment of methyl-NMR spectra of supramolecular machines using graph theory. J. Am. Chem. Soc. 139, 9523–9533 (2017).

12. Shen, Y. et al. Consistent blind protein structure generation from NMR chemical shift data. Proc. Natl. Acad. Sci. USA 105, 4685–4690 (2008).

13. Cavalli, A., Salvatella, X., Dobson, C. M. & Vendruscolo, M. Protein structure determination from NMR chemical shifts. Proc. Natl. Acad. Sci. USA 104, 9615–9620 (2007).

14. Raman, S. et al. NMR structure determination for larger proteins using backbone-only data. Science 327, 1014–1018 (2010).

15. Lange, O. F. et al. Determination of solution structures of proteins up to 40 kDa using CS-Rosetta with sparse NMR data from deuterated samples. Proc. Natl. Acad. Sci. USA 109, 10873–10878 (2012).

16. Thompson, J. M. et al. Accurate protein structure modeling using sparse NMR data and homologous structure information. Proc. Natl. Acad. Sci. USA 109, 9875–9880 (2012).

17. Shen, Y. & Bax, A. Homology modeling of larger proteins guided by chemical shifts. Nat. Methods 12, 747–750 (2015).

18. Tang, Y. et al. Protein structure determination by combining sparse NMR data with evolutionary couplings. Nat. Methods 12, 751–754 (2015).

19. Lange, O. F. Automatic NOESY assignment in CS-RASREC-Rosetta. J. Biomol. NMR 59, 147–159 (2014).

20. Zhang, Z., Porter, J., Tripsianes, K. & Lange, O. F. Robust and highly accurate automatic NOESY assignment and structure determination with Rosetta. J. Biomol. NMR 59, 135–145 (2014).

21. Xu, Y., Zheng, Y., Fan, J. & Yang, D. A new strategy for structure determination of large proteins in solution without deuteration. Nat. Methods 3, 931–937 (2006).

22. Li, Z. et al. Comparison of the two major classes of assembly algorithms: overlap-layout-consensus and de-bruijn-graph. Brief. Funct. Genom. 11, 25–37 (2012).

23. Berman, H. M. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000).

24. O’Meara, M. J. et al. Combined covalent-electrostatic model of hydrogen bonding improves structure prediction with Rosetta. J. Chem. Theory Comput. 11, 609–622 (2015).

25. Clark, S. A., Tronrud, D. E. & Karplus, P. A. Residue-level global and local ensemble-ensemble comparisons of protein domains. Protein Sci. 24, 1528–1542 (2015).

26. Brereton, A. E. & Karplus, P. A. Ensemblator v3: Robust atom-level comparative analyses and classification of protein structure ensembles. Protein Sci. 27, 41–50 (2018).

27. Güntert, P. & Buchner, L. Combined automated NOE assignment and structure calculation with CYANA. J. Biomol. NMR 62, 453–471 (2015).

28. Monzon, A. M., Rohr, C. O., Fornasari, M. S. & Parisi, G. CoDNaS 2.0: a comprehensive database of protein conformational diversity in the native state. J. Biol. Databases Curation. https://doi.org/10.1093/database/baw038 (2016).

29. Jasnovidova, O., Krejcikova, M., Kubicek, K. & Stefl, R. Structural insight into recognition of phosphorylated threonine-4 of RNA polymerase II C-terminal domain by Rtt103p. EMBO Rep. 18, 906–913 (2017).

30. Jasnovidova, O. et al. Structure and dynamics of the RNAPII CTDsome with Rtt103. Proc. Natl. Acad. Sci. USA 114, 11133–11138 (2017).

31. Frueh, D. P. Practical aspects of NMR signal assignment in larger and challenging proteins. Prog. Nucl. Magn. Reson. Spectrosc. 78, 47–75 (2014).

32. Kainosho, M. et al. Optimal isotope labelling for NMR protein structure determinations. Nature 440, 52–57 (2006).

33. Tugarinov, V., Kanelis, V. & Kay, L. E. Isotope labeling strategies for the study of high-molecular-weight proteins by solution NMR spectroscopy. Nat. Protoc. 1, 749–754 (2006).

34. Vinarov, D. A. & Markley, J. L. High-throughput automated platform for nuclear magnetic resonance–based structural proteomics. Expert. Rev. Proteom. 2, 49–55 (2005).

35. Dias, D. M. & Ciulli, A. NMR approaches in structure-based lead discovery: recent developments and new frontiers for targeting multi-protein complexes. Prog. Biophys. Mol. Biol. 116, 101–112 (2014).

36. Kazimierczuk, K., Zawadzka, A., Koźmiński, W. & Zhukov, I. Random sampling of evolution time space and Fourier transform processing. J. Biomol. NMR 36, 157–168 (2006).

37. Stanek, J., Augustyniak, R. & Koźmiński, W. Suppression of sampling artefacts in high-resolution four-dimensional NMR spectra using signal separation algorithm. J. Magn. Reson. 214, 91–102 (2012).

38. Delaglio, F. et al. NMRPipe: a multidimensional spectral processing system based on UNIX pipes. J. Biomol. NMR 6, 277–293 (1995).

39. Goddard, T. D. & Kneller, D. G. SPARKY 3 (University of California, San Francisco, San Francisco, 2000).

40. Bax, A., Kontaxis, G. & Tjandra, N. Dipolar couplings in macromolecular structure determination. Methods Enzymol. 339, 127–174 (2001).

41. Hansen, M. R., Mueller, L. & Pardi, A. Tunable alignment of macromolecules by filamentous phage yields dipolar coupling interactions. Nat. Struct. Mol. Biol. 5, 1065–1074 (1998).

42. Fitzkee, N. C. & Bax, A. Facile measurement of 1H–15N residual dipolar couplings in larger perdeuterated proteins. J. Biomol. NMR 48, 65–70 (2010).

43. Gronwald, W. & Kalbitzer, H. R. Automated structure determination of proteins by NMR spectroscopy. Prog. Nucl. Magn. Reson. Spectrosc. 44, 33–96 (2004).

44. Baran, M. C., Huang, Y. J., Moseley, H. N. B. & Montelione, G. T. Automated analysis of protein NMR assignments and structures. Chem. Rev. 104, 3541–3556 (2004).

45. Rieping, W. & Vranken, W. F. Validation of archived chemical shifts through atomic coordinates. Proteins 78, 2482–2489 (2010).

46. Marin, A., Malliavin, T. E., Nicolas, P. & Delsuc, M. A. From NMR chemical shifts to amino acid types: Investigation of the predictive power carried by nuclei. J. Biomol. NMR 30, 47–60 (2004).

47. Needleman, S. B. & Wunsch, C. D. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443–453 (1970).

48. Henikoff, S. & Henikoff, J. G. Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA 89, 10915–10919 (1992).

49. Hiller, S., Joss, R. & Wider, G. Automated NMR assignment of protein side chain resonances using automated projection spectroscopy (APSY). J. Am. Chem. Soc. 130, 12073–12079 (2008).

50. Lange, O. & Baker, D. Resolution-adapted recombination of structural features significantly improves sampling in restraint-guided structure calculation. Proteins 80, 884–895 (2012).

51. Warner, L. R. et al. Structure of the BamC two-domain protein obtained by Rosetta with a limited NMR data set. J. Mol. Biol. 411, 83–95 (2011).

52. Cornilescu, G., Delaglio, F. & Bax, A. Protein backbone angle restraints from searching a database for chemical shift and sequence homology. J. Biomol. NMR 13, 289–302 (1999).

53. Shen, Y. & Bax, A. Protein backbone and sidechain torsion angles predicted from NMR chemical shifts using artificial neural networks. J. Biomol. NMR 56, 227–241 (2013).

54. Gront, D., Kulp, D. W., Vernon, R. M., Strauss, C. E. M. & Baker, D. Generalized fragment picking in Rosetta: design, protocols and applications. PLoS ONE e23294 (2011).

55. Nilges, M. Ambiguous distance data in the calculation of NMR structures. Fold Des. 2, S53–S57 (1997).

56. Foster, M. P., McElroy, C. A. & Amero, C. D. Solution NMR of large molecules and assemblies. Biochem. (Mosc.). 46, 331–340 (2007).

57. Gutmanas, A. et al. NMR Exchange Format: a unified and open standard for representation of NMR restraint data. Nat. Struct. Mol. Biol. 22, 433–434 (2015).

58. Ulrich, E. L. et al. BioMagResBank. Nucleic Acids Res. 36, D402–D408 (2008).

Acknowledgements

We thank Peter Lukavsky, Richard Stefl and Arie Geerlof for providing NMR samples and Alison Barrett for assistance during the early stages of developing the csrosetta3 code. This research was supported by a grant from the Czech Science Foundation (15-22380Y), project CEITEC 2020 (LQ1601) with financial contribution from the MEYS CR and National Programme for Sustainability II, and a Marie Curie Career Integration Grant (618223) to K.T., by NIH grant R01GM083136 to P.A.K., a K-22 Career Development and an R35 Outstanding Investigator Award to N.G.S. through NIAID(AI2573-01) and NIGMS(1R35GM125034-01), respectively. CIISB research infrastructure project LM2015043 funded by MEYS CR is gratefully acknowledged for the financial support of the measurements at CEITEC Josef Dadok National NMR Centre. We acknowledge the UCSC 800 MHz NMR facility supported by the Office of the Director, NIH, under High End Instrumentation (HIE) Grant S10OD018455.

Author information

Authors and affiliations.

CEITEC—Central European Institute of Technology, Masaryk University, Kamenice 5, Brno, 62500, Czech Republic

Thomas Evangelidis, Jiří Nováček & Konstantinos Tripsianes

Department of Chemistry and Biochemistry, University of California Santa Cruz, Santa Cruz, CA, 95064, USA

Santrupti Nerli

Department of Computer Science, University of California Santa Cruz, Santa Cruz, CA, 95064, USA

Santrupti Nerli & Nikolaos G. Sgourakis

Department of Biochemistry and Biophysics, 2011 Ag & Life Sciences Bldg, Oregon State University, Corvallis, OR, 97331, USA

Andrew E. Brereton & P. Andrew Karplus

Department of Chemistry, Iowa State University, 2438 Pammel Drive, Ames, IA, 50011, USA

Rochelle R. Dotas & Vincenzo Venditti

Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA, 50011, USA

Vincenzo Venditti

Contributions

K.T. conceived and together with N.G.S. designed the project. T.E. and K.T. developed and executed the 4D-CHAINS algorithm. J.N. and K.T. developed, recorded and analyzed 4D NMR experiments. S.N. and N.G.S. developed the CS-Rosetta3 software and performed parallel structure calculations, structure analysis and validation, additional testing of 4D-CHAINS and NMR experiments. A.E.B. and P.A.K. performed Ensemblator analysis. R.R.D. and V.V. prepared labeled protein samples and recorded RDC data for nEIt. N.G.S., S.N., T.E. and K.T. wrote the paper, with feedback from all authors.

Corresponding authors

Correspondence to Nikolaos G. Sgourakis or Konstantinos Tripsianes .

Ethics declarations

Competing interests.

The authors declare no competing financial interests.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Evangelidis, T., Nerli, S., Nováček, J. et al. Automated NMR resonance assignments and structure determination using a minimal set of 4D spectra. Nat Commun 9 , 384 (2018). https://doi.org/10.1038/s41467-017-02592-z

Received: 20 August 2017

Accepted: 12 December 2017

Published: 26 January 2018

DOI: https://doi.org/10.1038/s41467-017-02592-z

nmr backbone assignment

New Solid-State NMR Method for Protein Backbone Assignment

Introduction

Solid-state nuclear magnetic resonance (NMR) spectroscopy has emerged as a well-established technique due to ongoing developments regarding higher magnetic fields (e.g. Bruker AVANCE 1000 MHz NMR spectrometer), advanced probe technology (e.g. ongoing development toward smaller rotor sizes to reach higher magic-angle spinning (MAS) rates), and selected pulse sequences.

A crucial step for solid-state NMR site-specific protein characterization and structure elucidation is the time-consuming resonance assignment. The most common 2D and 3D experiments for intra- as well as inter-residue assignment use homonuclear 13C-13C dipolar recoupling sequences (e.g. proton-driven spin diffusion (PDSD [1, 2]) or dipolar assisted rotational resonance (DARR [3])) as well as specific heteronuclear 15N-13C magnetization transfer [4] combined with 13C-13C transfer (e.g. NCαCx, NCOCx).

In the latter case, the magnetization transfer from Cα to CO within one amino acid residue ‘i’ as well as from the CO of a given residue ‘i’ to Cα of the preceding residue ‘i-1’ is essential for protein backbone assignment (Figure 1).

Previously, this homonuclear 13C-13C transfer was mainly done by adding a dipolar-recoupling sequence, such as PDSD or DARR, to the specific NCx sequence. Recently, Adam Lange and co-workers introduced the robust and highly efficient dipolar-based band-selective homonuclear cross polarization (BSH CP) to transfer magnetization between CO and Cα spins. [5-7]

The technique is of special interest for deuterated as well as protonated proteins that can be measured at moderate MAS rates only (e.g. due to lack of a probe with small rotor diameter size, limited signal-to-noise ratio, etc.).

As a first-order recoupling mechanism, BSH CP efficiently suppresses unwanted sequential and long-range CO-Cα and Cα-Cα magnetization transfer by dipolar truncation. Furthermore, no undesirable transfer to Cβ resonances appears as it would in the conventional NCxCx sequences (compare with Figure 2A). Reasonable 3D experiments can be recorded within just one day. [5-7]

This article (and supporting Application Note) presents a set of BSH CP experiments to facilitate the recording of spectra and to simplify the assignment process. A detailed description of necessary parameters is given.

BSH CP – Experiments

In cooperation with Adam Lange and coworkers, Bruker offers five different BSH CP pulse programs (‘ppg’) from 1D up to 4D:

  • 1D-3D hNCαCO (Figure 2B, ppg: ‘hNCaCO3D.bsh’)
  • 1D-3D hNCOCα (Figure 2C, ppg: ‘hNCOCa3D.bsh’)
  • 1D-4D hNCOCαCβ (Figure 3A, ppg: ‘hNCOCaCb4D.bsh’)
  • 1D-4D hCαNCOCα (Figure 3B, ppg: ‘hCaNCOCO4D.bsh’)
  • 1D-3D hCα(N)COCα (Figure 3B, ppg: ‘hCa_n_COCa3D.bsh’)

While the hNCαCO is the only experiment providing intraresidue correlations, all other sequences deliver sequential interresidue correlations (Figure 1). The two last experiments contain two specific double CP transfer steps to correlate vicinal Cα resonances with each other.  

With the combination of these experiments, a fast protein backbone assignment is possible that can be combined with further experiments to elucidate the secondary and tertiary protein structure.

BSH CP – Pulse Sequences

In general, all BSH CP experiments follow the same pulse sequence architecture (Figures 2B, C & 3). The initial magnetization derives from a 90° excitation pulse on 1H (black) followed by the initial CP transfer (light blue). While in the first three experiments this is an H-N polarization transfer, experiments four and five use an H-Cα CP. In any case the initial CP is followed by a specific double CP transfer (dark blue), either from N to Cα or CO or from Cα to N. In experiments four and five a further specific double CP, from N to CO, follows (Figure 3B, purple).

Finally, the homonuclear C-C transfer is achieved by the BSH CP (light red), either from Cα to CO or from CO to Cα. The hNCOCαCβ experiment comprises another homonuclear transfer to correlate Cα of a residue with its side chain Cβ atom. Here, a DREAM (Dipolar Recoupling Enhancement through Amplitude Modulation, [8]) transfer using a ramped shape has been proven to be most efficient (Figure 5B).[7]

During the evolution times (t1 – t4) high power 1H decoupling is used (light gray). During specific double CP and BSH CP transfers, continuous wave (CW, dark gray) 1H decoupling is active. JCN decoupling is provided by centered 180° 13C or 15N pulses during evolution times and low power 15N decoupling during acquisition.

Finally, there are two important trim pulses. The first one (red-filled) is applied before a CO-Cα BSH CP to prepare the CO magnetization for the CO-Cα transfer. The second pulse (red-framed) is needed whenever the CO signal is to be detected, to prepare the CO magnetization for maximum detection. The theory behind both pulses is introduced below.

Which Experimental Conditions to Choose?

The BSH CP approach is dependent on moderate MAS rates (≤ 26 kHz) and high external magnetic B0 fields (≥ 600 MHz), since band-selectivity only works if the isotropic chemical shift difference between CO and Cα (Δ) considerably exceeds the MAS rate νR:

\[\Delta_{\mathrm{Hz}} > \nu_R \qquad (1)\]

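Usually, the average CO and Cα resonances appear at approximately 174 ppm and 54 ppm, respectively, resulting in a chemical shift difference (Δppm) of 120 ppm. For the conversion from ppm into Hz (ΔHz) the 13C gyromagnetic ratio γC and the external magnetic field B0 have to be taken into account:

\[\Delta_{\mathrm{Hz}} = \Delta_{\mathrm{ppm}} \times \frac{\gamma_C}{\gamma_H} \times \nu_0(^{1}\mathrm{H}) \qquad (2)\]

where γC/γH ≈ 0.25 and ν0(1H) is the 1H Larmor frequency of the spectrometer in MHz.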

For example, on a 600 MHz spectrometer the chemical shift difference of 120 ppm on 13 C equals approximately 18 kHz:

\[\Delta_{\mathrm{Hz}} = 120\ \mathrm{ppm} \times 0.25 \times 600\ \mathrm{MHz} = 18\,000\ \mathrm{Hz}\]

Revisiting Equation (1) shows that only MAS rates below 18 kHz are applicable for the use of BSH CP experiments:

ΔHz = 18 kHz > νR

Before starting any BSH CP experiment, the external magnetic field B0 as well as the rotor size and the resulting MAS rate should be considered carefully. A detailed overview of suitable experimental settings is summarized in Table 1.
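
As a rough illustration of Equations (1) and (2), the following Python sketch converts the CO–Cα shift difference from ppm to Hz and checks it against a chosen MAS rate. The function names and the rounded ratio γC/γH ≈ 0.25 (taken from the worked example above) are assumptions for illustration only and are not part of any Bruker software.

```python
def shift_difference_hz(delta_ppm, h1_freq_mhz, gamma_ratio=0.25):
    """Convert a 13C chemical shift difference from ppm to Hz.

    gamma_ratio approximates gamma_C / gamma_H (assumed 0.25, as in the
    worked example); h1_freq_mhz is the 1H Larmor frequency in MHz.
    """
    # ppm (1e-6) times MHz (1e6 Hz) gives Hz directly.
    return delta_ppm * gamma_ratio * h1_freq_mhz


def band_selective(delta_hz, mas_rate_hz):
    """Return True if the shift difference exceeds the MAS rate (Equation (1))."""
    return delta_hz > mas_rate_hz


# Average CO (174 ppm) and C-alpha (54 ppm) shifts on a 600 MHz spectrometer.
delta_hz = shift_difference_hz(174.0 - 54.0, 600.0)
print(delta_hz)                           # 18000.0
print(band_selective(delta_hz, 15000.0))  # True
print(band_selective(delta_hz, 20000.0))  # False
```

The check only encodes the inequality of Equation (1); how far ΔHz must exceed νR in practice is summarized in Table 1.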

Parameter Optimization

Once these experimental conditions have been chosen, certain parameters need to be optimized: 90° hard pulses for 1H, 13C, and 15N, HC CP, HN CP, 1H high power decoupling, NCα double CP, NCO double CP, CαN double CP, 1H CW decoupling, and the different offset frequencies (center of CO/ Cα/ Call/ CαCβ resonances). Here, the use of Bruker’s TopSolidsbio leads to fast results.

Finally, the BSH CP specific parameters (CP contact time and power level, first and second trim pulse lengths where necessary, see Table 2 for detailed information) need to be optimized – ideally in their 1D ppg versions.

The provided Bruker pulse programs comprise detailed information about well-chosen parameter settings. Furthermore, Bruker offers a script called ‘calcbshcp’ that automatically calculates all relevant BSH CP parameters based on the chosen experimental conditions. Nevertheless, a theoretical overview and formulae for manual parameter setting are presented in the following paragraphs.

How to Achieve CO – Cα Band-Selective CP Transfer

For cross polarization between two different nuclei, the Hartmann-Hahn (HH) condition needs to be fulfilled. The same holds true for the homonuclear BSH CP:

\[n \times \nu_R = B_{\mathrm{eff,CO}} \pm B_{\mathrm{eff,C\alpha}}, \quad n = 1, 2 \qquad (3)\]

Both zero-quantum (ZQ, n = 1) and double-quantum (DQ, n = 2) homonuclear recoupling can be achieved when the sum or difference of the effective field strengths (Beff) acting on CO and Cα is equal to once or twice the MAS rate (νR). Experimental data have shown that DQ provides higher transfer efficiency compared to ZQ transfer. [5, 6]

The effective field (Beff,x) for a spin X is determined by the applied RF irradiation field (B1) and the chemical shift offset (Ω) from the carrier frequency (known as the offset ‘o1’ in TopSpin, Figure 4A, B):

\[B_{\mathrm{eff},X} = \sqrt{B_1^2 + \Omega^2} \qquad (4)\]

The 13C ΔHz of CO and Cα is known from Equation (2). Hence, if the carrier frequency is set on either of the two resonance bands (CO or Cα) during the BSH CP transfer, the chemical shift difference equals the offset: Ω = ΔHz. Experimental results have shown that the Cα resonance band is not very dependent on the resonance position, while the CO chemical shift dispersion affects the effective field much more strongly and can rapidly lead to a mismatch of the BSH CP condition. [5] Thus, the carrier frequency for RF irradiation should always be set on resonance in the middle of the Cα band (Figure 4A, D), resulting in an effective Cα field (Beff,Cα) that equals the applied B1 field:

\[B_{\mathrm{eff,C\alpha}} = B_1 \qquad (5)\]

With the knowledge of the chemical shift offset and the MAS rate, the RF field strength B1 that allows for BSH CP transfer (called ‘BBSH’ in the following) between CO and Cα can be calculated. When choosing the favored DQ transfer, the HH condition (Equation (3)) results in:

\[2 \times \nu_R = B_{\mathrm{eff,CO}} + B_{\mathrm{eff,C\alpha}}\]

Considering Equation (5), with B1 = BBSH, this can be solved for Beff,CO:

\[B_{\mathrm{eff,CO}} = 2 \times \nu_R - B_{\mathrm{BSH}}\]

This expression can be equated to Equation (4), evaluated for CO with Ω = ΔHz:

\[2 \times \nu_R - B_{\mathrm{BSH}} = \sqrt{B_{\mathrm{BSH}}^2 + \Delta_{\mathrm{Hz}}^2}\]

and solved for BBSH, the field strength needed for the BSH CP condition:

\[B_{\mathrm{BSH}} = \nu_R - \frac{\Delta_{\mathrm{Hz}}^2}{4 \times \nu_R} \qquad (6)\]

Revisiting the earlier example, we know that ΔHz is 18 kHz. When spinning at e.g. 15 kHz, the field strength of the BSH CP spin-lock pulse that fulfills the HH condition becomes:

\[B_{\mathrm{BSH}} = 15\ \mathrm{kHz} - \frac{(18\ \mathrm{kHz})^2}{4 \times 15\ \mathrm{kHz}} = 9.6\ \mathrm{kHz}\]

The last column in Table 1 summarizes the effective B1 fields for the suggested experimental conditions.
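
The calculation of the spin-lock field can be sketched in a few lines of Python. This is only a minimal illustration, assuming the DQ condition with the carrier on the Cα band, so that the field follows from νR − ΔHz²/(4νR); the function names are hypothetical and unrelated to Bruker’s ‘calcbshcp’ script, whose internals are not described here.

```python
import math


def bsh_field_khz(mas_rate_khz, delta_khz):
    """RF field (kHz) for the DQ BSH CP condition with the carrier on C-alpha.

    Solves 2*nu_R - B = sqrt(B**2 + delta**2) for B.
    """
    b_bsh = mas_rate_khz - delta_khz ** 2 / (4.0 * mas_rate_khz)
    if b_bsh <= 0.0:
        # The closed form gives no positive field unless 2*nu_R > delta.
        raise ValueError("no positive RF field satisfies the DQ condition")
    return b_bsh


def effective_field_khz(b1_khz, offset_khz):
    """Effective field for a spin at a given offset from the carrier."""
    return math.hypot(b1_khz, offset_khz)


b_bsh = bsh_field_khz(15.0, 18.0)            # about 9.6 kHz
b_eff_co = effective_field_khz(b_bsh, 18.0)  # about 20.4 kHz
# Sum of the CO and C-alpha effective fields should equal 2 * nu_R = 30 kHz.
print(round(b_bsh, 2), round(b_eff_co, 2), round(b_bsh + b_eff_co, 2))
```

Summing the two effective fields is a convenient consistency check: for the example values it returns twice the MAS rate, as the DQ Hartmann-Hahn condition requires.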

CO Trim Pulse for CO-Cα BSH CP

In BSH CP sequences with a polarization transfer from Cα to CO, CO magnetization is aligned along Z before the BSH CP. Because of the chemical shift offset from CO to Cα, the RF irradiation field (BBSH), which is on resonance with Cα, flips CO magnetization directly into the needed Beff,CO direction (XZ-plane, light red) to fulfill the HH condition (compare Figure 4B and D).

On the other hand, in BSH CP sequences with a polarization transfer from CO to Cα, Cα magnetization is aligned along Z, while CO magnetization is oriented along X after the specific NCO CP (Figure 2, dark blue). Thus, by applying BBSH along X, the effective field of CO would not be flipped into the needed direction for CP transfer, but stay along X.

Therefore, a trim pulse has to precede the BSH CP to flip CO magnetization into the effective off resonance CO field (Beff,CO) of BBSH (Figure 2 and Figure 4C, D). This can be done by applying a hard trim pulse on resonance with the CO resonance band (red-filled pulse in Figure 2C, known as ‘p28’ in Bruker pulse programs).

The required flip angle θ is calculated by:

\[\theta = \arctan\left(\frac{\Delta_{\mathrm{Hz}}}{B_{\mathrm{BSH}}}\right) \qquad (7)\]

In our example, the flip angle is:

\[\theta = \arctan\left(\frac{18\ \mathrm{kHz}}{9.6\ \mathrm{kHz}}\right) \approx 62^{\circ}\]

Usually, the flip angle θ is about 62-63°. With respect to the 13 C 90° pulse length (‘p1’ in Bruker pulse programs), p28 is calculated as:

\[p28 = p1 \times \frac{\theta}{90^{\circ}} \qquad (8)\]

Assuming that the 13 C 90° pulse has a strength of 55.6 kHz (corresponding to a pulse length of 4.5 μs) a 62° flip pulse would need a length of:

\[p28 = 4.5\ \mu\mathrm{s} \times \frac{62^{\circ}}{90^{\circ}} \approx 3.1\ \mu\mathrm{s}\]
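
For readers who prefer to check the numbers, the first trim pulse can be estimated with the short Python sketch below. It assumes that the flip angle is the tilt of the CO effective field away from the X axis, i.e. arctan(ΔHz/BBSH), which reproduces the 62-63° quoted above, and that the pulse length scales linearly with the 13C 90° pulse length ‘p1’. The function names are hypothetical.

```python
import math


def co_trim_flip_angle_deg(delta_khz, b_bsh_khz):
    """Tilt angle (degrees) of the CO effective field away from the X axis."""
    return math.degrees(math.atan2(delta_khz, b_bsh_khz))


def p28_us(theta_deg, p90_us):
    """Length of the first trim pulse, scaled from the 13C 90 degree pulse."""
    return p90_us * theta_deg / 90.0


theta = co_trim_flip_angle_deg(18.0, 9.6)  # about 61.9 degrees
print(round(theta, 1))
print(round(p28_us(theta, 4.5), 2))        # about 3.1 microseconds
```

Because the hard trim pulse is applied at the full 13C power level, only the ratio θ/90° enters; the absolute field strength of 55.6 kHz is implied by the 4.5 μs pulse length.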

Trim Pulse for CO Detection

A further trim pulse has to be applied whenever the CO signal is to be detected, e.g. in the hNCαCO, but also in a 2D hNCOCα (Figure 2, red-framed pulse, known as ‘p29’ in Bruker pulse programs). In any other experiment where CO is already detected in an indirect dimension (as e.g. in the 3D hNCOCα version), no second trim pulse is needed, since it would only maximize diagonal CO-CO peaks, but no CO-Cα cross correlation peaks. [5-7] As described in the relevant ppg, the trim pulse can be activated for the different dimensions by setting the flag ‘-Dflip’ in the ‘zgoptns’.

Figure 5 depicts the location of effective fields (A) and magnetization (B) during a BSH CP. While Cα magnetization (MCα,BSH) is spin-locked along X, CO magnetization (MCO,BSH, corresponding to Beff,CO) is located in the XZ-plane (light red), but needs to be flipped into XY for detection.

To further spin-lock the Cα magnetization for detection as well, this second trim pulse (Btrim2) can only be applied along X. Under these conditions the maximum remaining CO magnetization can only be detected when it is aligned exactly on the Y axis (resulting in a 90° phase shift of CO signal compared to Cα signal).

Viewing direction of Figure 4D rotated around the Z axis

As can be seen in Figure 5B, after the BSH CP the MCO,BSH is perpendicular to the Y axis (purple plane). Hence, to flip the CO magnetization onto the Y axis (MCO,after trim2), only a trim pulse along X is permitted that creates an effective field (Beff2,CO) which is perpendicular to both MCO,after trim2 and MCO,BSH. Figure 5C shows that Btrim2 must be applied along -X to produce this Beff2,CO. The angle between MCO,BSH and the X axis equals the angle between Beff2,CO and the Z axis (θ).

As in Equation (7) θ can be expressed as:

\[\theta = \arctan\left(\frac{B_{\mathrm{trim2}}}{\Delta_{\mathrm{Hz}}}\right) \qquad (9)\]

When equating Equations (7) and (9), we can solve for the missing field strength of the second trim pulse:

\[B_{\mathrm{trim2}} = \frac{\Delta_{\mathrm{Hz}}^2}{B_{\mathrm{BSH}}} \qquad (10)\]

The calculated amplitude of Btrim2 needs to be converted into the length of a 90° pulse, because the effective Beff2,CO field is shifted by 90° from MCO,BSH. Finally p29 becomes:

\[p29 = \frac{1}{4 \times B_{\mathrm{trim2}}} \qquad (11)\]

For our example this would mean a Btrim2 field and a length p29 of:

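\[B_{\mathrm{trim2}} = \frac{(18\ \mathrm{kHz})^2}{9.6\ \mathrm{kHz}} = 33.75\ \mathrm{kHz}, \qquad p29 = \frac{1}{4 \times 33.75\ \mathrm{kHz}} \approx 7.4\ \mu\mathrm{s}\]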

Finally, when applying the RF field strength Btrim2 for 7.4 μs, the CO magnetization becomes detectable in our example.
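
The second trim pulse can be checked the same way. The sketch below assumes that the trim field follows from equating the two expressions for θ (ΔHz/BBSH = Btrim2/ΔHz) and that ‘p29’ is the 90° pulse length at that field strength; both assumptions are consistent with the 7.4 μs quoted above. The function names are hypothetical.

```python
def trim2_field_khz(delta_khz, b_bsh_khz):
    """Field strength (kHz) of the second trim pulse: delta**2 / B_BSH."""
    return delta_khz ** 2 / b_bsh_khz


def p29_us(b_trim2_khz):
    """Length (microseconds) of a 90 degree pulse at the given field in kHz."""
    # A 90 degree pulse lasts a quarter of the nutation period;
    # the factor 1e3 converts 1/kHz (milliseconds) into microseconds.
    return 1.0e3 / (4.0 * b_trim2_khz)


b_trim2 = trim2_field_khz(18.0, 9.6)  # about 33.75 kHz
print(round(b_trim2, 2))
print(round(p29_us(b_trim2), 1))      # about 7.4 microseconds
```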

  [1] N.M. Szeverenyi et al. Observation of spin exchange by two-dimensional Fourier transform 13C cross polarization-magic-angle spinning. J Magn Reson (1982) 47:462-475.
  [2] A.G. Pines et al. Proton-enhanced NMR of dilute spins in solids. J Chem Phys (1973) 59:22.
  [3] K. Takegoshi et al. 13C-1H dipolar-assisted rotational resonance in magic-angle spinning NMR. Chem Phys Lett (2001) 344:631-637.
  [4] M. Baldus et al. Cross polarization in the tilted frame: assignment and spectral simplification in heteronuclear spin systems. Mol Phys (1998) 95:1197-1207.
  [5] V. Chevelkov et al. Efficient CO–CA transfer in highly deuterated proteins by band-selective homonuclear cross-polarization. J Magn Reson (2013) 230:205-211.
  [6] V. Chevelkov et al. Efficient band-selective homonuclear CO–CA cross-polarization in protonated proteins. J Biomol NMR (2013) 56:303-311.
  [7] C. Shi et al. BSH-CP based 3D solid-state NMR experiments for protein resonance assignment. J Biomol NMR (2014) Epub ahead of print.
  [8] R. Verel et al. A homonuclear spin-pair filter for solid-state NMR based on adiabatic-passage techniques. Chem Phys Lett (1998) 287:421-428.
Encyclopedia of Biophysics, pp 2033–2037

Protein NMR Resonance Assignment

  • Fuyuhiko Inagaki

Biosynthetic labeling ; Side chain assignment ; Spectroscopic assignment

Overview of Protein Resonance Assignment

Until the introduction of the sequential assignment procedure developed by Kurt Wüthrich and his coworkers in the 1980s (Wüthrich 1986), most protein assignment work was accomplished with reference to the crystal structure. The establishment of the sequential assignment procedure was therefore a milestone for protein NMR. Backbone amide proton (\({\rm H}^{\rm N}\)) and α proton (\({\rm H}^{\alpha}\)) signals were sequentially assigned based on the distance information between \({\rm H}^{\rm N}_{\rm i}\) and \({\rm H}^{\alpha}_{{\rm i}-1}\), and were aligned on the amino acid sequence of the particular protein. This made NMR independent of X-ray crystallography, and the structure of proteins in solution could be determined by NMR using the assignment of proton signals and proton-proton distance information. However, due to limited resolution in 1H 2D-NMR spectra, the molecular weight of the target protein...

Bax A, Grzesiek S. Methodological advances in protein NMR. Acc Chem Res. 1993;26:131–8.

Cavanagh J, Fairbrother W, Palmer AG, Rance M, Skelton NJ. Protein NMR spectroscopy. 2nd ed. Amsterdam: Elsevier; 2007.

Kainosho M, Tsuji T. Assignment of the three methionyl carbonyl carbon resonances in Streptomyces subtilisin inhibitor by a carbon-13 and nitrogen-15 double-labeling technique. A new strategy for structural studies of proteins in solution. Biochemistry. 1982;21:6273–9.

Kay LE. Nuclear magnetic resonance methods for high molecular weight proteins: a study involving a complex of maltose binding protein and β-cyclodextrin. In: James TL, Dotsch V, Schmitz U, editors. Methods in enzymology 339. New York: Academic; 2001. p. 174–203.

McIntosh LP, Dahlquist FW. Biosynthetic incorporation of 15 N and 13 C for assignment and interpretation of nuclear magnetic resonance spectra of proteins. Q Rev Biophys. 1990;23:1–38.

Morita EH, Shimizu M, Ogasawara T, Endo Y, Tanaka R, Kohno T. A novel way of amino acid-specific assignment in 1 H- 15 N HSQC spectra with a wheat germ cell-free protein synthesis system. J Biomol NMR. 2004;30:37–45.

Shen Y, Delaglio F, Cornilescu G, Bax A. TALOS+: a hybrid method for predicting protein backbone torsion angles from NMR chemical shifts. J Biomol NMR. 2009;44:213–23.

Wüthrich K. NMR of proteins and nucleic acids. New York: Wiley; 1986.

Wüthrich K, Wider K. Transverse relaxation-optimized NMR spectroscopy with biomacromolecular structure in solution. Magn Reson Chem. 2003;41:S80–8.

Author information

Authors and affiliations.

Department of Structural Biology, Hokkaido University, N21, W11, Kita-ku, Sapporo, 001-0021, Japan

Fuyuhiko Inagaki

Corresponding author

Correspondence to Fuyuhiko Inagaki .

Editor information

Editors and affiliations.

Department of Biochemistry, University of Leicester, Leicester, UK

Gordon C. K. Roberts

Copyright information

© 2013 European Biophysical Societies' Association (EBSA)

Inagaki, F. (2013). Protein NMR Resonance Assignment. In: Roberts, G.C.K. (eds) Encyclopedia of Biophysics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16712-6_312

DOI : https://doi.org/10.1007/978-3-642-16712-6_312

Publisher Name : Springer, Berlin, Heidelberg

Print ISBN : 978-3-642-16711-9

Online ISBN : 978-3-642-16712-6

eBook Packages : Biomedical and Life Sciences Reference Module Biomedical and Life Sciences

Integrated Structural Biology

Chapter 1: Decoding Atomic Addresses: Solution NMR Resonance Assignment of Proteins

  • 1.1 Deciphering Resonance Assignment: The What, Why and How?
  • 1.2 Routine Backbone Assignment
  • 1.2.1 Backbone Sequential Walk for Small Proteins (<25 kDa)
  • 1.2.2 Backbone Sequential Walk for Larger Proteins (>25 kDa)
  • 1.2.3 Amino Acid Type Identification and Resonance Assignment
  • 1.2.4 Selective Isotopic Labeling and Unlabeling of Individual Amino Acid Types
  • 1.2.5 Secondary Structure Assessment
  • 1.3 Routine Side-Chain Assignment
  • 1.3.1 Assignment of Aliphatic Spin Systems
  • 1.3.2 Choice of the TOCSY Sequence
  • 1.3.3 Assignment of Aromatic Spin Systems
  • 1.4 New Frontiers in Backbone Resonance Assignment of Proteins
  • 1.4.1 Metabolic Labeling Using Pyruvate to Mitigate 13C Homonuclear Coupling
  • 1.4.2 Incorporating Additional Information on the CA Resonance by Band-Selective CA/CB Decoupling
  • 1.5 New Developments for Side-Chain Assignment
  • 1.5.1 New Ideas for Side-Chain Assignment
  • 1.5.2 Structure-Guided Assignment of Methyl Groups
  • 1.6 Covariance NMR for Resonance Assignment
  • 1.7 Strategies for Assignment of Intrinsically Disordered Proteins
  • 1.7.1 Higher Dimensionality Experiments
  • 1.7.2 13C-Direct Detected Strategies for Resonance Assignment
  • 1.7.3 Direct 15N Detection: Exploiting Slow Relaxation Properties and Absence of Homonuclear Coupling for Enhanced Resolution at High Magnetic Fields
  • 1.8 Conclusion

  • Published: 08 Dec 2023
  • Special Collection: 2023 ebook collection Series: New Developments in NMR

T. Viennet, A. Dubey, R. Törner, M. A. Droemer, P. Coote, D. P. Frueh, ... H. Arthanari, in Integrated Structural Biology, ed. T. Polenova, C. M. Quinn, and A. M. Gronenborn, Royal Society of Chemistry, 2023, vol. 30, ch. 1, pp. 1-42.

NMR is a powerful analytical technique that permits the exploration of biomolecules under physiological conditions with atomic resolution. It is especially applicable for examining protein structures and their interactions and dynamics in environments closely resembling their native state, extending its utility to uniquely study disordered proteins. Nevertheless, to extract atomic resolution details, one must successfully correlate observed resonances with their originating nuclei, a process known as ‘resonance assignment’. Even with over fifty years of technical advancements, resonance assignment frequently becomes a bottleneck in the utilization of NMR for the comprehensive study of structure, dynamics, and interactions. In this context, we delve into both the traditional methods and the emerging frontiers in protein resonance assignment strategies for solution NMR. Our goal is to provide a comprehensive view of the existing experimental methodologies, with a focused discussion on their strengths and potential limitations. In this chapter, we will strictly focus on resonance assignment strategies for proteins.

Resonance assignment signifies the process through which an NMR signal at a certain resonance frequency, or chemical shift, is correlated to a specific atom in a protein. The resonance frequency depends, quite naturally, on the type of nucleus involved – such as 1 H, 13 C, or 15 N – but also hinges on the precise chemical environment of the nuclei, which in turn is dictated by the three-dimensional structure of the molecule. This aspect renders the chemical shift particularly sensitive to modifications in the molecular structure, dynamics, and interactions. The true power of NMR lies in its potential to chart these changes to the chemical shift, providing an atomic resolution picture of proteins. However, actualizing this potential necessitates a thorough understanding of the point of origin of the resonance, commonly referred to as its assignment.

For small to medium-sized molecules, ranging from 500 to 2000 Da, which are gaining prominence as a new pharmacological modality, resonance calculations can be performed using a blend of quantum mechanics, such as density functional theory, and empirical data. These calculations take into account factors such as the structure and the solvent. However, predicting chemical shifts with the necessary precision and creating an assignment solely based on computational methods remains a challenge, particularly for molecules containing chemically similar moieties. Therefore, a combination of homonuclear 1H–1H 2D experiments, including DQF-COSY, TOCSY, and NOESY/ROESY, is usually employed for resonance assignments of such small to medium-sized molecules.

When we consider a polypeptide and focus specifically on the i-th amino acid residue, the majority of protons within that amino acid are part of a single 1H–1H 3J scalar coupling network. Given the unique side-chain structure of each amino acid residue, the 3J spin coupling patterns and chemical shifts can be utilized to determine the amino acid type. However, these 3J scalar coupling networks are separated by amide bonds, and as such the connection between amino acid residues cannot be determined using 1H–1H scalar couplings alone. This connectivity is instead defined by the spatial proximity of protons of the i-th and the (i − 1)-th residue, as identified through dipolar coupling-mediated NOESY/ROESY experiments. Regardless of the secondary structure, proton pairs within a 4 Å distance can be found between the i-th and (i − 1)-th residues. Therefore, a NOESY spectrum can effectively establish sequential connectivity between neighboring residues in polypeptides. It is worth noting that NOESY/ROESY-based sequential assignment strategies have also been established for nucleic acids. However, strategies that rely exclusively on proton resonances face limitations due to the pronounced chemical shift degeneracy of proton resonances and the challenge of distinguishing sequential NOEs from long-range NOEs to non-sequential amino acids that arise from the tertiary structure. As a result, this strategy is generally applicable up to peptides consisting of around 50 amino acid residues and becomes difficult to apply for larger proteins and nucleic acids. It should be noted that for proteins in this size range isotopic labeling is not required.

Pioneering advancements in stable isotopic labeling, NMR instrumentation, and novel NMR methods have significantly expanded the molecular weight range for resonance assignments. Protein NMR spectroscopists utilize an array of experiments specifically designed for assigning protein residue resonances, factoring in the system’s molecular weight and relaxation rates. These experiments necessitate isotopic enrichment of proteins with 13C and 15N, enabling a sequential walk along the protein backbone by correlating the amide 1HN–15N resonances of a given residue to those of its predecessor. This is achieved by transferring magnetization through 13CA nuclei and encoding 13CA, 13CO, or 13CB resonances, as demonstrated in the triple-resonance experiments (refer to Figure 1.1). This approach to resonance assignment capitalizes on the fact that there is substantial scalar coupling between the amide nitrogen nucleus and its corresponding CA nucleus, as well as the CA nucleus of the preceding amino acid. The goal of this chapter is to provide a comprehensive review of both the currently available experiments and novel methodologies employed for protein resonance assignments in solution NMR. This resource is intended for the biomolecular NMR community, and it assumes that the reader possesses a fundamental understanding of protein NMR.

Figure 1.1. Schematic workflow for protein backbone resonance assignment. Each cross-peak in a 15N–1H HSQC spectrum (allocated random numbers #1–4) is inspected in a third 13C dimension visualized as a strip at given 1H and 15N chemical shifts. The combination of different spectra allows the attribution of cross-peaks to the given 13CA, 13CB and 13CO chemical shifts of the own and previous spin systems. This information is used to classify spin systems in groups of amino acid types and to sort spin systems in their order in the primary sequence. Finally, this information is mapped onto an unambiguous suite of residues in the primary sequence and cross-peaks are assigned to their corresponding residues. Assignments are then used to understand the protein secondary structure and topology or to map interactions and dynamics onto the primary sequence (or 3D structure if known).
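
The sorting step of this workflow can be illustrated with a toy Python sketch. It is not an implementation of any published assignment program: the `SpinSystem` class, the 0.2 ppm matching tolerance, and all chemical shift values are invented for illustration. Each spin system carries its own 13CA/13CB shifts and those of the preceding residue, and the walk proceeds backwards as long as exactly one other spin system matches.

```python
from dataclasses import dataclass


@dataclass
class SpinSystem:
    label: int      # arbitrary HSQC peak number
    ca: float       # own 13CA shift (ppm)
    cb: float       # own 13CB shift (ppm)
    ca_prev: float  # 13CA shift of the preceding residue (ppm)
    cb_prev: float  # 13CB shift of the preceding residue (ppm)


def find_predecessors(target, systems, tol=0.2):
    """Labels of spin systems whose own CA/CB match target's i-1 shifts."""
    return [s.label for s in systems
            if s.label != target.label
            and abs(s.ca - target.ca_prev) <= tol
            and abs(s.cb - target.cb_prev) <= tol]


def walk_backwards(start, systems, tol=0.2):
    """Follow unambiguous i -> i-1 links; stop at zero or multiple matches."""
    by_label = {s.label: s for s in systems}
    chain = [start.label]
    current = start
    while True:
        candidates = [c for c in find_predecessors(current, systems, tol)
                      if c not in chain]
        if len(candidates) != 1:
            break
        current = by_label[candidates[0]]
        chain.append(current.label)
    return chain


# Invented shifts: peak 1 follows peak 2, which follows 3, which follows 4.
systems = [
    SpinSystem(1, 58.2, 30.1, 55.0, 41.3),
    SpinSystem(2, 55.0, 41.3, 62.4, 69.8),
    SpinSystem(3, 62.4, 69.8, 52.7, 19.0),
    SpinSystem(4, 52.7, 19.0, 60.0, 33.0),
]
print(walk_backwards(systems[0], systems))  # [1, 2, 3, 4]
```

Reversing the returned chain gives the order of the spin systems along the primary sequence. Real data require handling of missing 13CB shifts (e.g. glycine), overlapping candidates, and amino acid type information, none of which this sketch attempts.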

The early 1990s marked a significant period in protein NMR assignment as key technologies became readily available for these experiments. One major advancement was the recombinant expression of proteins, which facilitated isotope labeling and thereby expanded NMR-visible nuclei from solely 1 H to include 1 H, 13 C, and 15 N. This expansion introduced an additional dispersion of resonances along these heteronuclear dimensions and enabled the use of 15 N– 13 C heteronuclear couplings for scalar coupling-based sequential assignments, as opposed to the older NOE-based method. This method made sequential assignments of macromolecules more streamlined and reliable.

Simultaneously, NMR spectrometer hardware evolved, increasing its power both in terms of magnetic field, with magnets capable of reaching up to a 600 MHz 1H Larmor frequency, and probe design, which now included 4-channel probes for 1H, 13C, and 15N triple-resonance experiments and 2H lock/decoupling. Lastly, all the essential components of triple-resonance pulse sequences were developed. These include efficient magnetization transfer schemes such as INEPT [1] and TOCSY [2]; heteronuclear decoupling using continuous wave [3] or composite pulses [4]; homonuclear decoupling using constant-time evolution [5]; water suppression schemes such as WATERGATE [6, 7] and excitation sculpting [8, 9]; sensitivity enhancement blocks (preservation of equivalent pathways or PEP) [10, 11]; and quadrature detection and phase cycling schemes [12, 13].

In conjunction, these advancements resulted in the development of an initial set of triple-resonance experiments in the early 1990s. This suite of experiments is still in routine use today, with new methods continually being explored (as reviewed in Sections 1.4 and 1.5 ). In this chapter, we will delve into the optimal experiment suite for achieving a backbone sequential walk for both small and larger proteins, and explain how to garner amino acid type information to achieve assignment.

For small folded proteins up to approximately 25 kDa, their favorable relaxation properties and low spectral crowding allow them to be readily assigned without deuteration using the simplest set of triple-resonance experiments. Essentially, three sets of correlations are generated to match the spin system of a given residue i (H i , N i , CA i , CO i , and CB i ) to the carbon chemical shifts of the preceding residue in the primary sequence (CA i −1 , CO i −1 , and CB i −1 ). These chemical shifts are then used to find the corresponding spin system of residue i  − 1, including the amide chemical shifts (H i −1 , N i −1 ). This procedure is reiterated until an unambiguous stretch of residues can be mapped onto the primary sequence, and the spin system can be assigned to specific residues (refer to Figure 1.1 ). The following set of experiments are employed:

2D 15 N– 1 H HSQC: this experiment correlates the chemical shifts of amide 1 H and amide 15 N for each residue. This acts as the protein fingerprint as one residue yields one cross-peak (Pro is not present, and Trp, Gln, Lys, Asn, and Arg provide additional side-chain cross-peaks). 14  

3D HNCA: an INEPT is used to transfer magnetization from 15 N to 13 CA. Given the proximity of the J -coupling values for the intra-residue N i –CA i (around 11 Hz) and sequential N i –CA i −1 (around 8 Hz), it correlates an HSQC cross-peak to the preceding CA i −1 and its own CA i resonances. 15  

3D HNCO: in contrast to HNCA, the J -coupling value of the intra-residue N i –CO i pair is too low for efficient magnetization transfer, so HNCO only correlates a 15 N– 1 H HSQC cross-peak to the preceding CO i −1 resonance. 9 , 15  

3D HN(co)CA: magnetization is first transferred to CO i −1 but is not encoded. Instead, it is transferred again to CA, establishing correlation of a 15 N– 1 H HSQC cross-peak to the preceding CA i −1 resonance. 16   Due to the utilization of two strong 1 J NCO and 1 J COCA couplings, HN(co)CA is often more efficient than HNCA, particularly for small proteins in low magnetic fields (<800 MHz) (see Figure 1.2A ).

3D HN(ca)CO: magnetization is initially transferred to both CA i and CA i −1 but is not frequency-encoded. It is instead transferred again to CO, correlating a 15 N– 1 H HSQC cross-peak to both CO i and CO i −1 resonances. 17  

3D HNCACB: extending from HNCA, magnetization is further transferred from CA to CB using an INEPT and the approximately 35 Hz CA–CB J -coupling. This experiment can be tuned to achieve either both CA and CB encoding (half transfer) or CB-only encoding (full transfer). The latter often exhibits better sensitivity in detecting CB resonances for small proteins, despite requiring coherence transfer steps that are twice as long. It correlates a 15 N– 1 H HSQC cross-peak to the preceding CB i −1 and its own CB i resonances. 18  

3D HN(coca)CB: magnetization is transferred to CO, then CA, and finally to CB, which enables the correlation of a 15 N– 1 H HSQC cross-peak to solely the preceding CB i −1 resonance. 19  
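The sequential walk described above can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for dedicated assignment software: the `SpinSystem` class, the function names, the matching tolerances (0.2 ppm for CA, 0.3 ppm for CB) and the three example spin systems are all hypothetical choices made for this sketch. Each spin system stores its own CA/CB shifts and the CA/CB shifts of the preceding residue, and the walk follows unambiguous matches backwards.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpinSystem:
    """One 15N-1H HSQC cross-peak with its carbon correlations (ppm)."""
    name: str
    ca: float                  # own CA(i)
    cb: Optional[float]        # own CB(i); None if absent (e.g. Gly)
    ca_prev: float             # CA(i-1)
    cb_prev: Optional[float]   # CB(i-1); None if absent


def shifts_match(a: Optional[float], b: Optional[float], tol: float) -> bool:
    """Two shifts agree if both are absent or both lie within tol ppm."""
    if a is None or b is None:
        return a is None and b is None
    return abs(a - b) <= tol


def find_predecessors(target: SpinSystem, systems: List[SpinSystem],
                      tol_ca: float = 0.2, tol_cb: float = 0.3) -> List[str]:
    """Names of spin systems whose own CA/CB match the target's CA/CB(i-1)."""
    return [
        s.name for s in systems
        if s is not target
        and shifts_match(s.ca, target.ca_prev, tol_ca)
        and shifts_match(s.cb, target.cb_prev, tol_cb)
    ]


def walk_back(start: SpinSystem, systems: List[SpinSystem]) -> List[str]:
    """Follow unambiguous i -> i-1 links; return the chain in sequence order."""
    by_name = {s.name: s for s in systems}
    chain = [start.name]
    current = start
    while True:
        candidates = find_predecessors(current, systems)
        # Stop on a dead end, an ambiguous match, or a loop.
        if len(candidates) != 1 or candidates[0] in chain:
            break
        chain.append(candidates[0])
        current = by_name[candidates[0]]
    return chain[::-1]


# Made-up shifts for three linked spin systems (illustration only).
systems = [
    SpinSystem("A", ca=58.10, cb=30.20, ca_prev=45.30, cb_prev=None),
    SpinSystem("B", ca=62.40, cb=69.80, ca_prev=58.10, cb_prev=30.25),
    SpinSystem("C", ca=52.90, cb=19.10, ca_prev=62.45, cb_prev=69.70),
]
print(walk_back(systems[2], systems))  # ['A', 'B', 'C']
```

In real data, the walk stops wherever a match is missing or ambiguous, which is why CO correlations are typically added as a third matching criterion.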


Figure 1.2. Considerations in the sensitivity of triple-resonance experiments. (A) Relative first scan sensitivity of regular out-and-back experiments for small proteins (here GB1) at 600 MHz. (B) Relative first scan sensitivity of 1 H N TROSY-based experiments for large deuterated proteins (here MBP) at 800 MHz. (C) 13 C strips for selected GB1 residues exemplifying the relative intensities of 3D cross-peaks from different experiments (top) and statistical analysis of all non-overlapped 3D cross-peak intensities for GB1. (D) Magnetic field dependence of the relative signal height of TROSY and non-TROSY 1 H N detected experiments. (E) NaCl concentration dependence of the relative signal height of 1 H N and 15 N H TROSY detected experiments.

A significant limitation of the assignment routine that involves collecting four to six 3D triple-resonance experiments is the associated measurement time. Therefore, speeding up data collection can be particularly beneficial. Established non-uniform sampling schedules 20   and reconstruction algorithms 21–24   can be deployed to achieve a substantial time reduction, approximately tenfold, for 3D experiments.

Furthermore, the principle of SOFAST, 25   or band-selective excitation short-transient (BEST), can be applied to backbone assignment experiments. In essence, the magnetization of aliphatic and solvent 1 H nuclei is left undisturbed, achieved through the use of narrow bandwidth amide-selective shaped pulses. This strategy serves as a reservoir to accelerate the T 1 relaxation of amide protons, thereby minimizing inter-scan delays. A full suite of BEST triple-resonance experiments is available for use. 26  

The following out-and-stay experiments are available:

3D (hbha)CBCANH: this experiment correlates a 15 N– 1 H HSQC cross-peak to the previous CB i −1 /CA i −1 and the own CB i /CA i resonances. 27  

3D (hbha)CBCA(co)NH: this correlates a 15 N– 1 H HSQC cross-peak to the previous CB i −1 /CA i −1 resonances. 27  

For proteins larger than approximately 25 kDa, slower tumbling in solution leads to unfavorable relaxation properties and losses in both sensitivity and achievable resolution. These factors collectively render sequential assignment challenging. However, this issue can be circumvented through perdeuteration of proteins expressed in bacteria, followed by their exchange into a protonated buffer solution (back exchange). This process results in samples where labile amide hydrogens are NMR active ( 1 H), but background aliphatic/aromatic deuterons remain invisible to conventional triple-resonance experiments ( 2 H). Effectively, this removes the contribution of side-chain 1 H nuclei to relaxation, and the impact of the 2 H– 13 C J -coupling in aliphatic spin systems can readily be eliminated using 2 H decoupling during 13 C evolution. Deuteration renders “out-and-stay” type experiments unfeasible, thereby necessitating sole reliance on “out-and-back” experiments.

A crucial aspect of protein perdeuteration is the back-exchange process, through which amide protons are re-protonated by interacting with the solvent. This process often fails for amide groups concealed within the core of well-structured proteins, rendering these residues undetectable to triple-resonance experiments. It is critical to ensure sufficient back-exchange by comparing the number of cross-peaks on the 15 N– 1 H HSQC spectrum of the deuterated sample with that of its protonated counterpart. To optimize back-exchange, the sample may be exposed to high temperature and basic pH to increase the rate of exchange. Alternatively, or additionally, mild denaturing conditions (1–2 M GdnHCl or urea) can be applied to ‘open’ the protein core and increase the accessibility of the buried amide groups. However, it is crucial to test and optimize these procedures for each sample, as some proteins may lack stability under these conditions.

Furthermore, TROSY selection instead of 15 N decoupling during 1 H-direct detection becomes advantageous for larger proteins, where the slowest relaxing component has increased resolution and signal-to-noise ratio. A suite of experiments employing both TROSY and BEST approaches is available but generally does not support 2 H decoupling and is thus advantageous mostly for larger disordered proteins (see Section 1.7 ). 28   BEST experiments depend on aliphatic hydrogens to serve as a sink, assisting in the rapid longitudinal relaxation of the amide hydrogens. This effect is absent in deuterated samples, which must instead rely on bulk water as a sink, an approach that is effective for disordered proteins. The routine experiments with TROSY selection and 2 H decoupling are as follows:

2D 15 N– 1 H TROSY-HSQC: this correlates the chemical shifts of amide 1 H and amide 15 N. 29  

3D TROSY HNCA 2H : this correlates a 15 N– 1 H HSQC cross-peak to the previous CA i −1 and the own CA i resonances. 30 , 31  

3D TROSY HNCO 2H : this correlates a 15 N– 1 H HSQC cross-peak to the previous CO i −1 resonance. 31 , 32  

3D TROSY HN(co)CA 2H : this correlates a 15 N– 1 H HSQC cross-peak to the previous CA i −1 resonance. 31 , 32  

3D TROSY HN(ca)CO 2H : this correlates a 15 N– 1 H HSQC cross-peak to both CO i and CO i −1 resonances. 31 , 32  

3D TROSY HNCACB 2H : this correlates a 15 N– 1 H HSQC cross-peak to the previous CB i −1 and the own CB i resonances. 31–33  

3D TROSY HN(coca)CB 2H : this correlates a 15 N– 1 H HSQC cross-peak to only the previous CB i −1 resonance. 31 , 33  

The optimal field strength for the 1 H N TROSY was initially estimated to be around 900 MHz (21.14 T). 29   This initial rationale was based on where the quadratic B 0 field dependence of the TROSY relaxation rates reaches a minimum: the optimal DD-CSA interference gives rise to the longest transverse relaxation times ( T 2 ) of TROSY components at this magnetic field strength. However, in terms of sensitivity, the peak height is not only directly proportional to T 2 but also scales with the magnetic field strength as B 0 ^(3/2). The sensitivity is therefore proportional to the product of T 2 ( B 0 ) and B 0 ^(3/2), which shifts the maximum of the peak height to a significantly higher field. Considering this effect, the maximum sensitivity of the 1 H N TROSY shifts to around 1.5 GHz. 34   Theoretical estimates clearly show the advantage of higher-field magnets, above 1 GHz ( Figure 1.2D ); however, in practice, this gain in sensitivity for 1 H-detected TROSY can be partly offset by adverse ionic strength effects in high Q -factor probes at higher field strengths ( Figure 1.2E ). One should also consider the contribution of additional relaxation due to exchange broadening ( R ex ). This is especially true when the exchange is fast on the NMR time scale, where R ex scales with the square of the field strength. In such a case, the advantages of higher-field magnets are lower in magnitude, while the field dependencies are largely unchanged. Furthermore, while the 1 H N TROSY prefers high field, 13 CO resonances suffer from the large contribution of CSA to their transverse relaxation rates, which scales quadratically with the magnetic field. More generally, one should consider the effect of the magnetic field strength on the relaxation of each coherence participating in transfer periods when assessing the sensitivity of multidimensional experiments.
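The sensitivity argument in this paragraph can be restated compactly. The short derivation below only assumes the proportionality given in the text; the symbol S for peak height is introduced here for convenience.

```latex
S(B_0) \;\propto\; T_2(B_0)\, B_0^{3/2}
\qquad\Longrightarrow\qquad
\frac{\mathrm{d}S}{\mathrm{d}B_0} = 0
\;\iff\;
\frac{\mathrm{d}\ln T_2}{\mathrm{d}\ln B_0} = -\frac{3}{2}
```

At the field where T 2 itself is longest, the logarithmic derivative of T 2 is zero rather than −3/2, so the peak height is still rising there. The sensitivity maximum is only reached at a higher field, where T 2 has started to fall steeply enough to cancel the gain from the field-dependent factor.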


Figure 1.3. Chemical shift statistics for nuclear spins in the protein backbone. Data are grouped by amino acid type and secondary structure and their mean value ± one standard deviation is plotted. Note that 13 CA and 13 CB are plotted on the same panel but are easily distinguishable ( 13 CA 45–65 ppm; 13 CB 15–45 ppm), except for Ser/Thr which have 13 CB around 62–72 ppm.

Once the backbone sequential walk has been performed, multiple smaller or longer series of 15 N– 1 H HSQC cross-peaks that have been linked together in their primary sequence order are available. In practice, it is often impossible to create a single series spanning the whole primary sequence due to missing cross-peaks in the spectra and to the presence of Pro residues which by essence do not have 15 N– 1 H HSQC cross-peaks. To assign these series of cross-peaks onto the primary sequence, one needs amino acid type information that can fortunately be extracted from the chemical shifts measured for each spin system ( Figure 1.1 ).

The Biological Magnetic Resonance Data Bank (BMRB) 35   is a public archive of chemical shifts of proteins, nucleic acids and metabolites. It is used for depositing, querying, and extracting chemical shift data and currently has over 6000 protein data sets. We used the data deposited in BMRB as of April 2023 to calculate the mean and standard deviation of chemical shifts for backbone nuclear spins. Chemical shift values that differ more than 5 standard deviations from the mean were omitted ( ca. 0.04% of the dataset). Ultimately 561 776 1 H, 431 739 1 HA, 529 242 15 N, 375 712 13 CO, 520 054 13 CA and 460 486 13 CB were used for statistics. Values were grouped by amino acid type and are plotted in Figure 1.3 . All amino acid type and nuclei-specific entries have at least 10 000 chemical shifts. The only exception was Pro where only 2563 chemical shifts of 15 N are available. Notably, the 13 CB chemical shift of Cys shows high standard deviation because Cys exists in oxidized and reduced states. Indeed, it has been shown that the 13 CB chemical shift for oxidized cysteine is 40.7 ± 3.8 ppm and that for reduced cysteine is 28.4 ± 2.4 ppm. 36  
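The outlier filtering and summary statistics described above can be sketched as follows. This is an illustrative assumption about the procedure rather than the script actually used: the function name `filtered_stats` is hypothetical, the filter is applied in a single pass, and the input values are synthetic rather than BMRB data.

```python
import statistics
from typing import List, Tuple


def filtered_stats(shifts: List[float],
                   n_sigma: float = 5.0) -> Tuple[float, float, int]:
    """Drop values farther than n_sigma standard deviations from the mean,
    then return (mean, standard deviation, number removed) of what remains."""
    mean = statistics.fmean(shifts)
    sd = statistics.stdev(shifts)
    kept = [x for x in shifts if abs(x - mean) <= n_sigma * sd]
    return statistics.fmean(kept), statistics.stdev(kept), len(shifts) - len(kept)


# Synthetic amide 1H shifts (ppm) plus one obviously wrong entry.
values = [8.0 + 0.01 * (i % 50) for i in range(200)] + [25.0]
mean, sd, n_removed = filtered_stats(values)
print(f"mean = {mean:.3f} ppm, sd = {sd:.3f} ppm, removed = {n_removed}")
```

A single-pass filter is the simplest reading of the text. An iterative version, which recomputes the mean and standard deviation after each removal, would reject slightly more values.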

Figure 1.3 reveals the distinctive power of certain nuclear spin chemical shifts. 1 H does not provide information on the amino acid type. However, Gly and Pro can easily be distinguished based on 15 N chemical shifts alone. Next, 13 CO and 1 HA contain little distinguishing power. 13 CA is more useful and a group of amino acid residues (Val/Thr/Pro/Ile) can be identified from their higher chemical shifts. On the other hand, 13 CB has the highest amino acid type dependence. Six categories can be made using 13 CB chemical shifts: Ala; Glu/Gln/His/Lys/Met/Pro/Arg/Val/Trp; Asp/Phe/Ile/Leu/Asn/Tyr; Gly; Ser; and Thr. Cys can occur in an oxidized or reduced state which makes it difficult to identify a priori . This classification of amino acid types based on backbone resonances is in practice enough to remove most ambiguities in the assignment of a series of cross-peaks to the primary sequence.

As proteins under study increase in size, spectral crowding and relaxation cause significant challenges that demand attention. The base fingerprint spectrum 1 H– 15 N HSQC may lack sufficient resolution to discern all residues, and more complex spectra such as HNCA and, more crucially, HNCACB – which provides the indispensable amino acid type information necessary for backbone assignment – may become unmanageable due to rapid relaxation. To mitigate these issues, selective labeling of amino acid residues can be deployed to simplify spectra, resolve overlapping resonances, and identify amino acid types. Here we are able to reduce the complexity by isotopically labeling only one type of amino acid. For instance, consider a protein composed of 500 residues, of which 480 are non-proline residues and 25 are valine. With standard labeling, one would observe 480 resonances in the 1 H– 15 N HSQC spectrum. However, if we specifically label valine, we would observe only 25 resonances, significantly reducing the spectral overlap.
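The reduction in spectral complexity can be estimated directly from the primary sequence. The sketch below is a simplification with a hypothetical function name and a made-up ten-residue sequence: it counts one backbone amide cross-peak per non-proline residue, as in the example above, and ignores side-chain peaks and the N-terminal residue.

```python
from typing import Optional, Set


def expected_backbone_peaks(sequence: str,
                            labeled: Optional[Set[str]] = None) -> int:
    """Count backbone amide cross-peaks expected in a 15N-1H HSQC.

    sequence: one-letter amino acid codes.
    labeled:  residue types carrying 15N; None means uniform labeling.
    """
    count = 0
    for aa in sequence:
        if aa == "P":
            continue  # proline has no backbone amide proton
        if labeled is None or aa in labeled:
            count += 1
    return count


toy_sequence = "MVKPVLGAVP"  # made-up sequence, illustration only
print(expected_backbone_peaks(toy_sequence))          # uniform labeling: 8
print(expected_backbone_peaks(toy_sequence, {"V"}))   # Val-only labeling: 3
```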

Selective amino acid labeling involves introducing stable isotopes ( 13 C and/or 15 N) into only selected types of amino acids in proteins, a departure from the uniform labeling typically carried out for resonance assignment. This approach can easily be implemented in bacterial culture, by adding selected isotope-labeled amino acids into an amino acid-depleted medium like M9, devoid of nitrogen or carbon sources. The remaining amino acids are supplemented in their NMR-inactive form. Generally, a 15 N– 1 H HSQC is utilized as readout and examined for the presence or absence of cross-peaks in differentially labeled samples. This offers a straightforward method for assigning amino acid types to specific cross-peaks.

The approach can be made more sophisticated by employing different 15 N- and 13 C-labeled amino acids, enabling the identification of specific dipeptides, which provides sequential information in addition to amino acid type information. This technique is referred to as combinatorial labeling. 37   For instance, to assign three Methionine residues appearing as distinct dipeptides in the protein sequence (Met-Cys, Met-Val, and Met-Asn), only two samples are needed, with 13 C-Met/ 15 N-Val and 13 C-Met/ 15 N-Cys. 38   This combinatorial labeling can be extended to selectively labeled tripeptides to further eliminate ambiguities in sequence-specific assignment. 39  
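The logic of the Met example can be illustrated with a short script. The assumption here is that a sample with 13 C-labeled residue type X and 15 N-labeled residue type Y yields a signal only for X-Y dipeptides, for example through an HNCO-type correlation. The function name and the sequence are hypothetical and chosen to contain exactly the three dipeptides named above.

```python
from typing import List


def labeled_pairs(sequence: str, c13_type: str, n15_type: str) -> List[int]:
    """1-based positions of 15N-labeled residues that directly follow
    a 13C-labeled residue, i.e. the amides expected to give a signal."""
    return [
        i + 1
        for i in range(1, len(sequence))
        if sequence[i - 1] == c13_type and sequence[i] == n15_type
    ]


toy_sequence = "AMCGGMVKLMNE"  # made-up; contains Met-Cys, Met-Val, Met-Asn

sample_1 = labeled_pairs(toy_sequence, "M", "V")  # 13C-Met / 15N-Val
sample_2 = labeled_pairs(toy_sequence, "M", "C")  # 13C-Met / 15N-Cys
print(sample_1, sample_2)  # [7] [3]

# The Met that gives no signal in either sample is assigned by elimination.
met_positions = [i + 1 for i, aa in enumerate(toy_sequence) if aa == "M"]
seen = {p - 1 for p in sample_1 + sample_2}  # Met precedes the detected amide
remaining = [p for p in met_positions if p not in seen]
print(remaining)  # [10], the Met of the Met-Asn dipeptide
```

Two samples suffice for three dipeptides because the last one follows by elimination, which is the saving that combinatorial labeling exploits.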

The major limitation of selective labeling approaches is the price of isotope-labeled amino acids. To circumvent this problem, selective ‘unlabeling’ of amino acids can be achieved by supplementing 12 C/ 14 N amino acids in otherwise uniformly labeled minimal medium. 40   The resonances of selectively unlabeled residues will disappear from the NMR spectra relative to a uniformly labeled control sample, leading to direct identification of amino acid types. Additionally, experiments were designed to exploit the absence of 12 C– 15 N J -coupling to filter the neighboring residues of unlabeled amino acid residues. If the tripeptide is unique in the protein sequence, resonance assignment is directly achieved. 41   An important concern in labeling/unlabeling strategies is scrambling of isotopes in amino acids through metabolic pathways in the organism expressing the protein of interest. In practice, in Escherichia coli , selective 15 N unlabeling of the following groups of amino acids can be achieved: Arg, Lys, Asn, Gln, His, Met, Ala/Trp, Phe/Tyr, Ile/Leu/Val, Gly/Cys/Ser/Thr. 42   The main source of 15 N scrambling in bacteria is transaminase activity, which is typically lower in insect cells and in vitro expression systems, and thus more amino acids can be selectively labeled in these expression systems. It is worth mentioning that the amino acid selective labeling strategy is also established in these systems and contributes to the sensitive detection of NMR resonances in large molecular weight proteins that are difficult to express in E. coli . 43   An expression strain that is auxotrophic for a particular amino acid can also be exploited to efficiently incorporate a specifically isotopically labeled amino acid. Today, numerous auxotrophic strains are available that are compatible with the widely-used T7 RNA polymerase overexpression systems, which minimizes metabolic scrambling and facilitates efficient incorporation of labeled amino acids. 44 , 45  

It should be also noted that 13 C is typically less prone to scrambling than 15 N, and thus the number of amino acid residues that can be selectively 13 C-labeled/unlabeled is increased. This can be achieved by adding amino acids that are 13 C-labeled at the carbonyl position to a deuterated minimal medium supplemented with 15 N ammonium chloride (for uniform 2 H– 15 N labeling). Selective TROSY-HNCO cross-peaks can be observed with high sensitivity for amino acid pairs connected by the labeling schemes (combinatorial labeling), and the amide proton resonance of the residue following the 13 C labeled amino acid is very sharp since its HA position is deuterated. This approach is cost effective and has been successfully applied to proteins larger than 40 kDa. 46  

The correlation between the chemical shift and protein secondary structure was experimentally recognized almost fifty years ago. 47   We used our BMRB dataset (April 2023) and, where applicable, assigned SwissProt identifiers to the BMRB entries. These were used to extract experimentally determined residue-specific secondary structure information using the PDB entries referenced in UniProt. Where available, this secondary structure information was used to annotate the original chemical shift entries from the BMRB. This yielded a total of 2 879 009 entries with secondary structure information confirmed by experimental methods, after filtering of outliers (832 703 α-helix, 700 370 β-strand, 103 996 turn; and 1 241 940 unstructured). Figure 1.3 illustrates the power of 1 HA, 13 CA, 13 CB and 13 CO chemical shifts in determining the secondary structure, whereas 1 H N and 15 N do not have strong dependence on the secondary structure.

Several automated methods have been developed to predict the secondary structure of proteins using chemical shifts. We can broadly classify these prediction methods into two categories. The first type of method, akin to circular dichroism, interprets NMR spectra in terms of secondary structure content without requiring sequence assignments. One such approach is the ‘CD-by-NMR’ method, which uses various unassigned 2D NMR spectra to extract information about the secondary structure. Here, an ‘average chemical shift (ACS)’ is computed for each nucleus type, and empirical equations are subsequently used to resolve the proportions of α-helix and β-strand structures. 48  

In contrast, the second category of methods, represented by the ‘chemical shift index (CSI)’ approach, relies on backbone assignments to provide residue-specific secondary structure information. 49   The authors generated a database of chemical shifts from residues known to assume ‘random coil’ conformations. They calculate the secondary shifts of 1 HA, 13 CA, 13 CB and 13 CO for a given residue by subtracting these random coil chemical shifts from the observed shifts. They define threshold values for secondary shifts, beyond which a specific secondary structure is assigned to the residue. This is carried out for each nucleus type, and a consensus is reached for each residue, which is demonstrated to improve accuracy. 50   The strength of the CSI method lies in its ability to specify the location (start and end residues) and type of secondary structure (α-helix, β-sheet, random coil; later expanded to 11 types of secondary and super-secondary structures 51   ). However, the CSI approach does face limitations, including its sensitivity to the chosen thresholds and random coil chemical shift values, as well as the difficulties it encounters with missing or incomplete assignments. Improved methods like PSSI (probabilistic secondary structure identification) 52   partially address these challenges. PECAN (protein energetic conformation analysis using NMR) 53   also mitigates some of these issues by incorporating sequence information and energetics models to refine the boundary determination of secondary structure elements, achieving approximately 90% accuracy in determining residue-specific secondary structure information.
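The per-residue part of the CSI logic (secondary shift, threshold, consensus across nuclei) can be sketched as below. This is a simplified illustration and not the published algorithm: the thresholds and the random coil values are placeholders chosen for this example, and the original method additionally requires runs of consecutive residues before calling a secondary structure element. Only the signs of the secondary shifts expected for helices and strands are taken as established.

```python
from collections import Counter
from typing import Dict

# Sign of the secondary shift expected in an alpha-helix for each nucleus;
# a beta-strand shows the opposite sign.
HELIX_SIGN = {"HA": -1, "CA": +1, "CB": -1, "CO": +1}

# Placeholder thresholds in ppm (assumed for this sketch).
THRESHOLD = {"HA": 0.1, "CA": 0.7, "CB": 0.7, "CO": 0.5}


def classify_nucleus(nucleus: str, observed: float, random_coil: float) -> str:
    """Classify one nucleus from its secondary shift (observed - random coil)."""
    delta = observed - random_coil
    if abs(delta) < THRESHOLD[nucleus]:
        return "coil"
    return "helix" if delta * HELIX_SIGN[nucleus] > 0 else "strand"


def consensus(observed: Dict[str, float], random_coil: Dict[str, float]) -> str:
    """Majority vote over all available nuclei; ties are reported as coil."""
    votes = Counter(
        classify_nucleus(n, observed[n], random_coil[n]) for n in observed
    )
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "coil"
    return ranked[0][0]


# Approximate, illustrative random coil shifts for an Ala residue (ppm).
rc_ala = {"HA": 4.32, "CA": 52.5, "CB": 19.1, "CO": 177.8}
helical = {"HA": 4.05, "CA": 55.0, "CB": 18.3, "CO": 179.5}
extended = {"HA": 4.80, "CA": 50.9, "CB": 21.5, "CO": 176.0}

print(consensus(helical, rc_ala))   # helix
print(consensus(extended, rc_ala))  # strand
```

Because `consensus` iterates over the nuclei present in `observed`, residues with incomplete assignments can still be classified from the shifts that are available, although with less confidence.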

Besides assigning secondary structure types, it can be desirable to calculate actual torsion angles, e.g. for 3D structure determination. Three-bond scalar couplings 3 J HA-HN , 3 J CO-HA and 3 J N-HA directly correlate with torsion angles 54   but are not easily measured in large proteins. Alternatively, chemical shifts of 1 HA, 13 CA, 13 CB and 13 CO can be used to predict torsion angles as done by the software TALOS 55   and its upgraded version TALOS-N (based on neural networks). 56  
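The dependence of three-bond scalar couplings on torsion angles is commonly expressed through a Karplus-type relationship. The general form is shown below as a reminder; the coefficients A, B and C are empirical, depend on the coupling and on the parametrization used, and are deliberately not given numerical values here.

```latex
{}^{3}J(\theta) = A\cos^{2}\theta + B\cos\theta + C
```

For the 3 J HA-HN coupling, θ is usually taken as φ − 60°. Because this function is not one-to-one, a single measured coupling can be compatible with several torsion angles, which is one reason why chemical-shift-based predictions are a useful complement.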

Similar to the case of backbone resonance assignment, most experiments still used for assigning side-chain resonances were developed in the early 1990s from newly developed pulse sequence building blocks at the time. The basic idea is to achieve magnetization transfer from the side-chain aliphatic 1 H nuclei to a previously assigned nucleus (from backbone assignment), e.g. , 13 CB or 13 CA. Since the base spectrum here is a 13 C– 1 H HSQC, it is advisable to assign 1 HB/ 1 HA to establish unambiguous HSQC cross-peaks as starting points for the assignment process. As the frequencies of side-chain resonances help to determine the amino acid type, incorporating the side-chain resonance information already during backbone assignment is often useful. Technically, rather than using multiple INEPT steps, isotropic mixing of all 13 C nuclei along the side chain is used to transfer magnetization (TOCSY). 2   We detail the use of TOCSY sequences and their advantages in terms of bandwidth and sensitivity in Section 1.3.2 . This results in correlation of all 1 H or 13 C nuclei along the side chain. Such an approach can result in overly crowded spectra, in which case replacing the TOCSY by a COSY step simplifies the spectra by correlating only pairs of neighboring 13 C nuclei. Notably, for small proteins (less than 10 kDa), the 1 H– 15 N HSQC-TOCSY experiment, which relies on weak coupling between the side-chain hydrogens, is used to achieve resonance assignment of these side-chain hydrogens. 57   This method, however, is not effective for larger proteins due to the challenges posed by relaxation and the requirement for extended TOCSY mixing times, necessitated by the weak scalar coupling between side-chain hydrogens.

The experiments routinely used for side-chain assignment are as follows:

13 C– 1 H HSQC: this correlates the chemical shifts of each 1 H and 13 C pair in aliphatic side chains.

HBHA(cbca)NH: this correlates a 15 N– 1 H HSQC cross-peak to the previous HA/HB i −1 and the own HA/HB i resonances. 58   It is identical to the (hbha)CBCANH experiment, but aliphatic 1 H nuclei are encoded instead of 13 C.

HBHA(cbcaco)NH: this correlates a 15 N– 1 H HSQC cross-peak to the previous HA/HB i −1 resonances. 58  

H(cc)H COSY: this correlates pairs of neighboring 1 H resonances within a side-chain spin system. Magnetization is first transferred from 1 H to their attached 13 C nuclei using INEPT, followed by 90° COSY transfer to the neighboring 13 C nuclei, and then INEPT back to 1 H nuclei for detection. 59  

H(c-c)H TOCSY: this correlates all 1 H resonances within a side-chain spin system. The experiment is identical to H(cc)H COSY, except that isotropic mixing between all 13 C nuclei is employed (see Section 1.3.2 ). 60–62  

(h)C-CH TOCSY: this correlates all 13 C resonances within a side-chain spin system. The experiment is identical to H(c-c)H-TOCSY, except that chemical shift evolution happens after the initial INEPT transfer, on 13 C. 60–62  

The approach described above is based on a combination of the most sensitive spectra that can be used to achieve exhaustive assignment of aliphatic side chains; however, it requires prior rigorous assignment of 1 HA/ 13 CA (and 1 HB/ 13 CB) which is not always easy to obtain. For that reason, more complex experiments have been developed that can directly correlate side-chain nuclei to a cross-peak on a 15 N– 1 H HSQC spectrum. The following two experiments are available:

H(c-cco)NH: this correlates all 1 H resonances within a side chain to the successive 1 H i +1 / 15 N i +1 cross-peak of an HSQC. Magnetization is first transferred from 1 H to their attached 13 C via INEPT, then it is mixed between all 13 C nuclei using TOCSY. Finally, magnetization on 13 CA i is transferred by successive INEPT steps to 13 CO i , 15 N i +1 and ultimately 1 HN i +1 for detection. 63 , 64  

(h)C-C(co)NH: this correlates all 13 C resonances within a side chain to the successive 1 H i +1 / 15 N i +1 cross-peak of an HSQC. Chemical shift evolution happens after the initial INEPT transfer on 13 C. 63 , 64  

These experiments can readily be combined with TROSY selection for larger proteins; however, they require a protonated sample since they are 1 H-start out-and-stay type. Whenever relaxation is particularly problematic, the equivalent 13 C-start experiment CC(co)NH is available but otherwise suffers from reduced sensitivity due to the lower magnetization of 13 C as compared to 1 H.

To achieve isotropic mixing along the side chain, total correlation spectroscopy (TOCSY) sequences utilize the relatively large and homogenous 1 J CC couplings in aliphatic systems ( ca. 35 Hz). In this system, magnetization transfer is allowed under the Hartmann–Hahn condition, i.e. when the difference in resonance frequencies between neighboring 13 C nuclei is much smaller than 2π 1 J CC ( ca. 220 Hz). 65   However, this is unrealistic in protein samples at high magnetic fields. For example, the aliphatic carbons span about 70 ppm, which corresponds to a bandwidth of about 14 kHz at 18.8 T (800 MHz). The purpose of a TOCSY pulse is thus to remove effective chemical shift differences from the sample, while maintaining at least some of the coupling magnitude. This offers a way to indirectly fulfill the Hartmann–Hahn mixing condition and achieve TOCSY mixing over a larger range of chemical shifts.
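The two numbers quoted in this paragraph can be checked with a few lines of arithmetic. The gyromagnetic ratio of 13 C relative to 1 H (about 0.2514) is the only value not taken from the text, and the function name is hypothetical.

```python
import math

GAMMA_RATIO_13C_1H = 0.2514  # approximate gamma(13C) / gamma(1H)


def carbon_bandwidth_hz(proton_mhz: float, span_ppm: float) -> float:
    """Width in Hz of a 13C chemical shift range at a given 1H frequency."""
    carbon_mhz = proton_mhz * GAMMA_RATIO_13C_1H
    return span_ppm * carbon_mhz  # ppm x MHz = Hz


j_cc_hz = 35.0
hartmann_hahn_limit = 2 * math.pi * j_cc_hz           # about 220
aliphatic_width = carbon_bandwidth_hz(800.0, 70.0)    # about 14 kHz

print(f"2*pi*J = {hartmann_hahn_limit:.0f}")
print(f"aliphatic 13C width at 800 MHz = {aliphatic_width / 1e3:.1f} kHz")
print(f"ratio = {aliphatic_width / hartmann_hahn_limit:.0f}")
```

The aliphatic range is therefore several tens of times wider than the offset difference that the unmodified Hartmann–Hahn condition tolerates, which is why dedicated mixing sequences are required.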

The simplest example of a TOCSY pulse is a high-power transverse plane pulse, known as a spin-lock pulse. If this pulse power is much stronger than the chemical shift offsets, then it continually and rapidly reverses the sense of the chemical shifts’ precession. This repeatedly refocuses the dispersion caused by chemical shift offsets. Importantly, this conserves the homonuclear J -coupling evolution so that the effective Hartmann–Hahn condition is satisfied, and TOCSY transfer takes place. This scheme requires an extremely high RF power in the spin-lock pulse to cover experimentally relevant bandwidths. In practice, such RF power levels cannot safely be utilized in NMR spectrometers, 66   and the problem grows as higher magnetic fields become available. This motivates the development of improved TOCSY sequences with better effective bandwidths, robustness to RF inhomogeneity, and minimized signal losses from relaxation during the mixing period.

The first composite pulse to achieve broadband mixing by effectively satisfying the Hartmann–Hahn condition worked by repeated application of the 90 x –180 y –90 x compensated inversion element, which has a slightly broader inversion profile than a simple 180° pulse. 67   The compensated inversion elements were arranged into an MLEV-16 supercycle so that the phase of the inversion element was varied systematically, which averages out some of the error, especially around the edges of the bandwidth. The mixing bandwidth of this approach is about 80% of the RF amplitude. 68  

The most widely used mixing pulses in liquid state NMR are DIPSI-2 69   (decoupling in the presence of scalar interactions) and FLOPSY-16 70   (flip–flop spectroscopy). The design principle underlying the DIPSI scheme is to apply an arbitrary sequence of hard pulses with arbitrary duration along a single axis, i.e. , alternating in the plus or minus x -direction. The pulse durations were numerically optimized to maximize the pulse fidelity. The FLOPSY sequence works in a similar way, but the arbitrary sequence of hard pulses is applied with an arbitrary phase. The flip angles and phases were numerically optimized, and improved performance over DIPSI was achieved. The mixing bandwidths of DIPSI-2 and FLOPSY-16 are limited to approximately twice the RF amplitude. 68  

TOCSY sequences behave differently from one another in a variety of ways, such as the offset dependence of transfer efficiency, robustness to RF inhomogeneity, and relaxation effects. Therefore, attempts to quantitatively compare and rank the performance of different sequences must employ a precise definition of performance that considers these various properties. Active bandwidth and the global quality factor are the two main metrics used to quantitatively compare the performance of TOCSY mixing sequences. Active bandwidth refers to the spectral region over which at least 50% of the magnetization is transferred via a coupling. Global quality factors score mixing sequences according to the worst-case transfer over a range of possible mixing times and chemical shifts. It is possible to use both active bandwidth and the global quality factor as cost functions in optimal control theory to directly optimize a TOCSY sequence. This approach yielded a shaped TOCSY pulse termed RRF-AB that scores 23% higher than FLOPSY in active bandwidth. 71 , 72  

Figure 1.4 shows a side-by-side comparison of active bandwidths for MLEV-16, DIPSI-2, FLOPSY-16 and RRF-AB, plotted as contour lines for transfer efficiencies of 90%, 70%, 50% (red line) and 30%. Of note, the point marked with a red ‘x’ corresponds to the farthest off-diagonal cross-peak that experiences 50% transfer efficiency using RRF-AB. This cross-peak has under 30% transfer efficiency using any of the other sequences. The active bandwidths for the four sequences are 0.61 A, 0.85 A, 1.05 A, and 1.31 A, respectively (where A is the RF amplitude).

Simulation of the efficiency of various TOCSY sequences: (A) MLEV-16, (B) DIPSI-2, (C) FLOPSY-16 and (D) RRF-AB. Simulation parameters were root-mean-square RF amplitude A = 4 kHz, J = 35 Hz, and t_mix = 1/(2J). Contour lines are plotted at 90%, 70%, 50% (red), and 30% transfer efficiencies.
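The quoted active bandwidths are given in units of the RF amplitude, so they can be converted to absolute frequencies for the simulation parameters in the caption. The short sketch below does only this arithmetic; the dictionary and function names are illustrative, and the numbers are copied from the text.

```python
RF_AMPLITUDE_HZ = 4000.0  # A = 4 kHz, from the figure caption
J_HZ = 35.0               # coupling constant used in the simulation

# Active bandwidths in units of A, as quoted in the text
ACTIVE_BANDWIDTH_IN_A = {
    "MLEV-16": 0.61,
    "DIPSI-2": 0.85,
    "FLOPSY-16": 1.05,
    "RRF-AB": 1.31,
}

def active_bandwidth_hz(name, rf_amplitude_hz=RF_AMPLITUDE_HZ):
    """Active bandwidth in Hz for a given RF amplitude."""
    return ACTIVE_BANDWIDTH_IN_A[name] * rf_amplitude_hz

def relative_gain(name, reference):
    """Fractional gain in active bandwidth of `name` over `reference`."""
    return ACTIVE_BANDWIDTH_IN_A[name] / ACTIVE_BANDWIDTH_IN_A[reference] - 1.0

mixing_time_s = 1.0 / (2.0 * J_HZ)  # t_mix = 1/(2J)

for name in ACTIVE_BANDWIDTH_IN_A:
    print(f"{name:10s} {active_bandwidth_hz(name):7.0f} Hz")
print(f"RRF-AB vs FLOPSY-16: {relative_gain('RRF-AB', 'FLOPSY-16'):.1%}")
print(f"t_mix = {mixing_time_s * 1e3:.2f} ms")
```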

The strategy for assigning aromatic side chains mirrors that of aliphatic side chains, where all side-chain resonances are interconnected and ultimately linked to resonances identified from backbone assignment routines (typically 13 CB). Aromatic side chains exhibit unique NMR properties, most notably a significant chemical shift anisotropy arising from the planarity of aromatic rings, which subsequently enhances transverse relaxation rates. However, the beneficial interplay between chemical shift anisotropy and dipolar interactions can be harnessed to boost resolution in aromatic side chains using TROSY selection. 73   Still, the robust homonuclear 1 J CC couplings (approximately 57 Hz) curtail the resolution potential of the aromatic TROSY experiment. The implementation of constant-time evolution may help, yet it often results in signal reduction in rapidly relaxing, high molecular weight complexes. To mitigate this issue, alternative labeling of aromatic side chains can be achieved by utilizing 2- 13 C-glycerol or 2- or 3- 13 C-pyruvate as a carbon source during protein expression. 74   The combination of TROSY and alternative labeling generates highly resolved 2D spectra, laying the groundwork for aromatic side-chain assignment.

The assignment of aromatic side chains via isotropic TOCSY mixing presents significant challenges, primarily due to two factors: (i) the substantial chemical shift distance exceeding 80 ppm between aromatic and aliphatic carbons, which is not conducive to TOCSY transfer; and (ii) the aromatic carbons resonating around 120 ppm are strongly coupled amongst themselves (∼57 Hz), while exhibiting weak coupling with the CB carbon (∼35 Hz). This discrepancy severely restricts the TOCSY-based transfer of magnetization from aromatic carbons to the protein backbone. Hence most experiments employed for assignment rely on COSY-type magnetization transfer steps. These steps correlate 1 HB/ 13 CB to various aromatic carbons or protons in a series of 2D experiments. The use of TOCSY is generally limited to isotropic mixing of aromatic 13 C nuclei alone. The following experiments can be employed:

Aromatic TROSY: this correlates the resonances of each 1 H and 13 C pair in aromatic side chains. 73  

(hc)C-(c)CH-TOCSY: this correlates the resonances of all 1 H to all 13 C nuclei. 75  

(hb)CB(cgcd)HD: this correlates the resonances of 13 CB and 1 HD. 76  

(hb)CB(cgcdce)HE: this correlates the resonances of 13 CB and 1 HE. 76  

HB(cb)CG: this correlates the resonances of 1 HB and 13 CG. 75 , 77  

HD(cd)CG: this correlates the resonances of 1 HD and 13 CG. 75 , 77  

HE(cecd)CG: this correlates the resonances of 1 HE and 13 CG. 75 , 77  

HZ(czcecd)CG: this correlates the resonances of 1 HZ and 13 CG. 77  

Histidine is a special case of an aromatic residue: its imidazole side chain contains two nitrogen atoms (ND and NE), whose protonation state can be important for protein function ( e.g. enzyme catalysis). NE and ND resonances can be assigned using H(c)N and H C N experiments. H(c)N transfers magnetization from HD/HE to CE/CD first, and then to NE/ND in a second step, both via INEPT transfer using the 1 J coupling. The H C N experiment, however, uses the 2 J HN coupling directly. 78 Notably, the protonation state of NE and ND changes the relative 1 J and 2 J coupling strengths. Additionally, measurement of the chemical shifts of CE and CD allows for determination of the tautomeric state of singly protonated histidine. 79

Figure 1.2 demonstrates the striking disparity in sensitivity among the triple-resonance experiments typically used for routine sequential backbone assignment, especially for large molecular weight systems. Despite implementing deuteration and TROSY selection, those experiments with prolonged coherence steps are significantly hindered by rapid transverse relaxation. This challenge is particularly critical in the HNCACB experiment due to its lengthy 13 CA– 13 CB INEPT transfer steps, a predicament illustrated by the 40 kDa protein VSP/PTEN. 80   The HNCA experiment, being roughly four times more sensitive than HNCACB for larger proteins at high magnetic fields, 81   offers an optimal trade-off between sensitivity and information content for assignment.

In theory, the HNCA experiment encapsulates all necessary data for sequential assignments (both CA i and CA i −1 resonances). However, the limited dispersion of CA chemical shifts in practical scenarios generates ambiguities, obstructing sequential assignment. This challenge can be partially mitigated by conducting an HNCA with high resolution in the CA dimension, a feat made feasible due to the availability of high magnetic fields and the use of non-uniform sampling, 20   complemented by reconstruction algorithms. 21–24   Deuteration significantly mitigates the relaxation rates of CA, thus facilitating the acquisition of high-resolution signals even for exceptionally large proteins. However, the presence of 13 CA– 13 CB coupling places a practical limit on the achievable resolution.
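The matching step and the ambiguity it can produce can be illustrated with a small sketch. It assumes each HNCA strip has already been reduced to a pair of shifts (CA i, CA i−1); the peak names, shift values and tolerance are hypothetical and chosen only to show how a looser tolerance (that is, lower resolution) creates several candidates.

```python
def sequential_candidates(peaks, tolerance_ppm=0.1):
    """For each peak, list the peaks whose internal CA shift matches
    this peak's sequential (i-1) CA shift within `tolerance_ppm`.

    peaks: dict mapping a peak name to (ca_i, ca_prev) in ppm.
    """
    matches = {}
    for name, (_, ca_prev) in peaks.items():
        matches[name] = [
            other
            for other, (ca_i, _) in peaks.items()
            if other != name and abs(ca_i - ca_prev) <= tolerance_ppm
        ]
    return matches

# Hypothetical HNCA strips: name -> (CA_i, CA_i-1) in ppm
EXAMPLE_PEAKS = {
    "a": (58.20, 55.56),
    "b": (55.56, 62.10),
    "c": (55.60, 52.30),
    "d": (62.10, 45.10),
}

print(sequential_candidates(EXAMPLE_PEAKS, tolerance_ppm=0.10))
print(sequential_candidates(EXAMPLE_PEAKS, tolerance_ppm=0.02))
```

With the looser tolerance, peak "a" has two possible predecessors; tightening the tolerance, which corresponds to higher resolution in the CA dimension, leaves only one.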

Moreover, 13 CA chemical shifts are generally insufficient for identifying amino acid types, adding another layer of complexity in assigning the protein backbone exclusively from HNCA. Therefore, it becomes extremely advantageous to amplify the benefits of the sensitive HNCA experiment by integrating additional layers of information. The strategies detailed below supplement the conventional HNCA experiment to surmount chemical shift degeneracy and enhance the identification of amino acid types, thus enabling backbone assignment using exclusively HNCA spectra.

The most direct method to achieve a high-resolution HNCA spectrum without 13 C splitting is to ‘unlabel’ the 13 CB nucleus during sample preparation, a process referred to as metabolic decoupling. Several approaches to metabolic labeling are available, including the use of 2- 13 C-glycerol instead of uniform 13 C-glucose in growth media 82   and stereo-array isotope labeling (SAIL) which employs specialized 13 C-labeled and deuterated precursors. 83   However, these strategies have limitations in terms of incorporation rates and costs. More recently, pyruvate has been utilized as the sole carbon source during bacterial growth. 84   E. coli can synthesize all metabolites from pyruvate as a precursor and does not require an additional carbon source. Various pyruvate isotopomers (1-, 2-, 3- 13 C or combinations) are commercially available, and pyruvate labeling can be readily combined with deuteration. This is easily accomplished by dissolving protonated pyruvate in D 2 O at pD = 13. The CH acidity of the methyl protons allows them to exchange with deuterons before the pD is restored by adding a phosphate buffer. Consequently, the contributions of both 1 H– 13 C and 13 C– 13 C couplings to 13 CA transverse relaxation rates are eliminated, and HNCA spectra with exceptional resolution can be obtained, significantly reducing sequential matching ambiguities.

In terms of pyruvate metabolism, amino acids can be categorized into three groups: (i) amino acids synthesized directly from pyruvate or through conjugation with directly pyruvate-derived metabolites that conserve the carbon structure of pyruvate (Ala, Ser, Cys, Gly, Trp, Phe, Tyr, Lys, His); (ii) the branched-chain amino acids Val and Leu that are synthesized by conjugating pyruvate with acetyl-CoA, and Ile that is formed by conjugating pyruvate with a threonine derivative; and (iii) TCA-cycle-derived amino acids (glutamate-type Glu, Gln, Pro, Arg and aspartate-type Asp, Asn, Thr, Met). Based on this knowledge, incorporation of 13 C at the CA and CB positions can be calculated for all amino acid types.

Notably, it was shown that by using a mix of 2- 13 C pyruvate and 3- 13 C pyruvate (mixed pyruvate labeling), the 13 CA– 13 CB coupling is reintroduced for specific amino acid types. 84 The respective 13 CA signals in a high-resolution HNCA spectrum exhibit superpositions of singlet (from 13 CA that is attached to a 12 CB) and doublet signals (from 13 CA that is attached to a 13 CB), which allows the relative amount of adjacently incorporated 13 CB to be quantified. Thus, as a result of the biochemical pathway, we obtain unique peak shapes for different amino acid types. This introduces amino acid type information in the HNCA experiment because, as a general trend, amino acids in group (i) show singlets, those in group (ii) show mostly doublets, and those in group (iii) show a singlet/doublet mix. The singlet-to-doublet ratio is always identical in the internal peak and its sequential match and thus provides another means to distinguish between ambiguous sequential matches. 84 This pyruvate labeling technique provides a high-resolution CA resonance, free of coupling, and an additional split doublet that furnishes further information about the amino acid type. The well-defined, uncoupled central resonance facilitates resonance matching for establishing sequential connectivity, and any existing degeneracy can be resolved using the additional data derived from peak shape. Importantly, this is achieved while maintaining the relaxation demands of an HNCA experiment.
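The singlet/doublet superposition can be sketched numerically. The example below models a 13 CA trace as a weighted sum of a singlet and a doublet split by the 13 CA– 13 CB coupling, and recovers the doublet fraction by linear least squares. The Lorentzian line shape, the 35 Hz coupling and the 8 Hz linewidth are assumed values for illustration, not parameters from the cited work, and noise is ignored.

```python
import numpy as np

def lorentzian(freq_hz, center_hz, linewidth_hz):
    """Absorptive Lorentzian with unit height; linewidth is the full width at half maximum."""
    half = linewidth_hz / 2.0
    return half**2 / ((freq_hz - center_hz) ** 2 + half**2)

def ca_lineshape(freq_hz, doublet_fraction, j_hz=35.0, linewidth_hz=8.0):
    """13CA trace: (1 - f) * singlet + f * doublet, centered at 0 Hz."""
    singlet = lorentzian(freq_hz, 0.0, linewidth_hz)
    doublet = 0.5 * (
        lorentzian(freq_hz, -j_hz / 2.0, linewidth_hz)
        + lorentzian(freq_hz, +j_hz / 2.0, linewidth_hz)
    )
    return (1.0 - doublet_fraction) * singlet + doublet_fraction * doublet

def fit_doublet_fraction(freq_hz, trace, j_hz=35.0, linewidth_hz=8.0):
    """Estimate the doublet fraction by fitting singlet and doublet basis shapes."""
    basis = np.column_stack([
        ca_lineshape(freq_hz, 0.0, j_hz, linewidth_hz),  # pure singlet
        ca_lineshape(freq_hz, 1.0, j_hz, linewidth_hz),  # pure doublet
    ])
    coeffs, *_ = np.linalg.lstsq(basis, trace, rcond=None)
    return coeffs[1] / coeffs.sum()

freq = np.linspace(-100.0, 100.0, 801)
trace = ca_lineshape(freq, doublet_fraction=0.3)
print(fit_doublet_fraction(freq, trace))
```

Because the same ratio is expected in the internal peak and its sequential match, comparing the fitted fractions of two candidate peaks gives one more criterion for sequential matching.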

Elimination of the effect of 13 CA– 13 CB coupling can be achieved by adding decoupling schemes to the HNCA pulse sequence. Constant time evolution for 13 CA chemical shift encoding can be used but greatly limits sensitivity, especially for large proteins. Alternatively, band-selective pulses such as adiabatic WURST pulses can be used. 85   However, three bands are required to decouple all CB nuclei (including those of Ala and Ser/Thr) and need proper calibration not to touch 13 CA (which is almost inevitable for Gly). Moreover, such homonuclear decoupling schemes give rise to Bloch–Siegert shifts 86   of the 13 CA resonances that have to be properly calibrated and compensated for.

More recently, a suite of homonuclear shaped decoupling pulses was designed using optimal control theory and optimized to invert only selected 13 CB frequency bands without perturbing other frequencies. This suite of pulses was termed beta/alpha decoupling pulse (BADCOP). 87   Importantly, the pulses have been designed to avoid introducing any Bloch–Siegert shifts. Overall, they allow refocusing of the 13 CA– 13 CB coupling using a single pulse placed in the center of the 13 CA evolution period. This technique allows selective decoupling of 13 CB nuclei resonating in a defined chemical shift range, resulting in a singlet in the HNCA (while other cross-peaks appear as doublets). The individual line shape of a 13 CA cross-peak is therefore only determined by the chemical shift of its adjacent 13 CB. Because 13 CB is particularly sensitive to the amino acid type, this effectively encodes information on the amino acid type in an HNCA spectrum.

Three variations of BADCOP pulses were designed which vary in the chemical shift bandwidth and range of 13 CB inversion. 87 They all additionally invert all 13 CO resonances. BADCOP1 inverts 13 CB spins with chemical shifts <35 ppm, thereby decoupling all amino acids but Leu, Tyr, Asp, Asn, Phe, Ile, Ser, and Thr (that appear as doublets). BADCOP2 decouples 13 CB with chemical shifts between 28 and 35 ppm (Cys, Arg, Met, His, Lys, Glu, Val, Trp, Pro, and Gln). BADCOP3 decouples 13 CB spins with chemical shifts <43 ppm so that only Ser and Thr cross-peaks appear as doublets. Acquiring multiple high resolution HNCA spectra using different BADCOP pulses enables assignment of signals to defined groups of amino acid types (depending on their respective 13 CB chemical shifts). 87 Moreover, the line shapes are always identical in the internal peak and its sequential match. Differential line shape distortions of signals with chemical shifts close to the different BADCOP cut-off frequencies are of extraordinary value to identify the correct sequential match out of a set of candidates. Similarly, the variety of 13 CA– 13 CB coupling constants, which usually causes issues when implementing homonuclear decoupling strategies, becomes beneficial as a characteristic signal feature. Thus, BADCOP decoupling provides an excellent means to increase the resolution of the HNCA experiment, remove ambiguities in sequential matching and extract amino acid type information for backbone assignment.
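The decoupling windows quoted above translate into a simple lookup from a 13 CB chemical shift to the expected 13 CA line shape in each spectrum. The sketch below treats the windows as hard thresholds, which is a simplification: real pulses have transition regions that distort line shapes near the cut-offs. The example 13 CB shifts are approximate database averages and are assumptions of this sketch.

```python
# Decoupled 13CB windows (ppm) as quoted in the text; hard edges are a simplification
BADCOP_WINDOWS_PPM = {
    "BADCOP1": (float("-inf"), 35.0),
    "BADCOP2": (28.0, 35.0),
    "BADCOP3": (float("-inf"), 43.0),
}

def expected_pattern(cb_shift_ppm):
    """Expected 13CA multiplet for each BADCOP variant, given the 13CB shift."""
    return {
        name: "singlet" if low < cb_shift_ppm < high else "doublet"
        for name, (low, high) in BADCOP_WINDOWS_PPM.items()
    }

# Approximate average 13CB shifts in ppm (assumed values, for illustration only)
TYPICAL_CB_PPM = {"Ala": 19.0, "Arg": 30.7, "Ile": 38.6, "Ser": 63.8}

for residue, shift in TYPICAL_CB_PPM.items():
    print(residue, expected_pattern(shift))
```

Each residue type yields a three-letter "signature" across the three spectra, which is how acquiring several BADCOP variants narrows down the amino acid type.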

Note that pyruvate labeling and band-selective CB decoupling are orthogonal methods that do not rely on the same principles and therefore do not overlap in the information they provide. In particularly tricky cases where many residues’ 13 C frequencies are nearly degenerate (whose likelihood increases in larger proteins), combining the two methods proves to be very efficient in obtaining faithful sequential links and ultimately resonance assignments. A simulated case is presented in Figure 1.5 , where the proper sequential match (the Arg) can only be identified by combining pyruvate labeling and band-selective CB decoupling.

Complementarity of pyruvate labeling and band-selective decoupling. Three sequential match candidates (CAi candidates in blue matching to CAi−1 in orange) have completely degenerate 13CA chemical shifts of 55.56 ppm in a constant time HNCA spectrum. Mixed pyruvate labeling allows the last candidate to be excluded via line shape matching, but the first two candidates remain. BADCOP1 generates distinct splitting patterns (singlet and doublet) for these two cross-peaks, allowing the first candidate to be excluded. Additionally, combining these methods provides finer amino acid type distinction than either method alone.

The applicability of assignment strategies discussed in Sections 1.2 and 1.3 is limited to proteins of about 50 kDa as the sensitivity is severely decreased for fast-relaxing high molecular weight assemblies. The power of methyl NMR was recognized as a way to tackle much larger proteins and molecular assemblies up to the MDa range. Detecting the multiple quantum coherence between 13 C and 1 H in methyl groups has favorable relaxation properties with a small dependence on the overall tumbling rate, referred to as the ‘methyl TROSY’ effect. 88   To make use of the full power of methyl TROSY, isolated isotope labeled methyl groups need to be introduced in otherwise deuterated proteins of interest. Several approaches have been developed to introduce Met, Ala, Thr, Ile, Val and Leu residues with specific methyl isotope labeling during protein recombinant expression. These approaches rely either on addition of synthetic selectively labeled amino acids in the growth medium (for Met, 89   Ala 90   and Thr 91   ) or on using metabolic precursors (α-ketobutyrate for Ile and α-ketoisovalerate for Leu and Val). 92   Notably, utilization of stereo-isotopomers of acetolactate allows stereo-selective labeling of pro-S and pro-R methyl groups in Val and Leu. 93 , 94   Furthermore, a synthetic route to stereo-selective methyl labeled Leu has been reported which expands the use of this approach to non-bacterial protein expression systems (cell-free, insect cells). 43  

Application of methyl TROSY also necessitated adaptations in side-chain assignment strategies. Isotropic mixing (TOCSY) performs poorly in such spin systems due to the presence of undesirable 1 J CC couplings at branching points. 95   Protein labeling with a linearized chain of 13 C nuclei along the side chain allows efficient magnetization transfer in perdeuterated proteins. 96 , 97   Moreover, unidirectional transfer from methyl groups to backbone 13 CA or 13 CO or even amides can be achieved with COSY-type relay experiments. Magnetization is transferred stepwise from carbon to carbon by using suitably designed concatenated blocks and selective pulses to avoid magnetization leakage in two separate directions at each step. Different COSY-based experiments are used for different amino acids: (hm)CM(cgcbca)NH 95 , 96   for Ile and Leu, and (hm)CM(cbca)NH for Val. 96   Out-and-back type HMCM(cgcbca)CO and HMCM(cg)CBCA experiments are also available and can outperform the aforementioned experiments in some cases.

A different approach uses precursors which generate linearized 13 C-labeled versions of Leu and Val side chains in a deuterated background. 98   A TOCSY out-and-back (hmcm)CCMHM pulse sequence is used to connect methyl resonances with other aliphatic 13 C resonances. 98   The J -splitting between neighboring 13 C nuclei greatly limits achievable resolution but can be deconvoluted during processing using machine learning tools. 99   The advantages are that TOCSY mixing is shorter than relayed COSY transfer and that it can be easily tuned just by changing the mixing time, which increases the versatility of experiments.

As pointed out in Section 1.5.1 , assigning methyl resonances by transferring magnetization and correlating methyl resonances with backbone resonances ( e.g. , using HCCH-TOCSY type experiments; see Section 1.3.1 ) is challenging for large proteins. However, in the era of AlphaFold, a structure or structural model of nearly all proteins is available and can be used to predict methyl–methyl NOESY patterns. Then, these can be compared to experimentally measured NOESY spectra ( e.g. , using 3D (h)CCH HMQC NOESY) to extract methyl-specific assignments. However, the problem of mapping methyl resonances to a protein structure requires searching in a high-dimensional space and the number of maps to test grows exponentially with the number of methyl bearing residues in the protein. Algorithms have been designed to incorporate heuristic and probabilistic approaches to navigate the problem of combinatorial explosion. Several such algorithms are available that employ different methods to combine NOE data and protein structures and optimize the accuracy of methyl assignments.

FLAMEnGO 100 uses Monte Carlo simulations along with a scoring function to assign methyl resonances. It starts from random assignments and then iteratively swaps assignments using Monte Carlo simulations, and the swap is accepted or rejected based on a scoring function whose main contributing term is the difference in simulated and experimental methyl–methyl NOE spectra. Methyl Assignment by Graph Matching (MAGMA) 101 uses exact graph matching to compare the data graph generated using NOE restraints and the structure graph generated using a high-resolution protein structure. Heuristics are used to prioritize the matching of vertices and to maximize the number of edges in the data graph explained by the structure graph. Methyl Assignment by Graphing Inference Construct (MAGIC) 102 works in a similar manner but prioritizes assigning high density vertices, and the high confidence local search results are propagated to the next iteration. This avoids combinatorial explosion and is faster than MAGMA as it traverses the search space hierarchically from a local dense network to global assignment. Methyl Fully Automated Assignment (MethylFLYA) 103 uses an evolutionary algorithm to optimize methyl assignment. At each iteration, a subset of parents is selected based on scoring, mutated, and recombined to produce the next generation. The entire process is run multiple times with different random seeds each time, and solutions that consistently appear in several runs are finally selected. Methyl Assignment Using Satisfiability (MAUS) 104 generates a structure graph and all possible data graphs from NOE data. The fitting of a sparse data graph to a structure graph is an NP-complete (nondeterministic polynomial-time complete) subgraph isomorphism problem. MAUS converts it to a satisfiability problem and uses the general-purpose solver CryptoMiniSat to solve it.
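The common core of these methods, matching a data graph of NOE-connected peaks onto a structure graph of spatially close methyl groups, can be shown with a brute-force toy example. This sketch is not the algorithm of any of the programs named above: it simply enumerates every mapping and counts the NOE edges explained by the structure, which is only feasible for a handful of methyl groups. All peak names, methyl labels and contacts are hypothetical.

```python
from itertools import permutations

def explained_edges(assignment, data_edges, structure_edges):
    """Number of NOE edges (peak pairs) that map onto methyl pairs in contact."""
    return sum(
        frozenset((assignment[p], assignment[q])) in structure_edges
        for p, q in data_edges
    )

def best_assignments(peaks, methyls, data_edges, structure_edges):
    """Exhaustively score every peak -> methyl mapping; return the top score and all ties."""
    structure_edges = {frozenset(edge) for edge in structure_edges}
    best_score, best = -1, []
    for perm in permutations(methyls, len(peaks)):
        assignment = dict(zip(peaks, perm))
        score = explained_edges(assignment, data_edges, structure_edges)
        if score > best_score:
            best_score, best = score, [assignment]
        elif score == best_score:
            best.append(assignment)
    return best_score, best

# Hypothetical structure graph: methyl pairs within NOE distance
EXAMPLE_METHYLS = ["I10", "L25", "V40", "V62"]
EXAMPLE_STRUCTURE = [("I10", "L25"), ("L25", "V40"), ("V40", "V62"), ("L25", "V62")]
# Hypothetical (sparse) data graph: observed methyl-methyl NOEs between peaks
EXAMPLE_PEAKS = ["p1", "p2", "p3", "p4"]
EXAMPLE_NOES = [("p1", "p2"), ("p2", "p3"), ("p2", "p4")]

score, solutions = best_assignments(
    EXAMPLE_PEAKS, EXAMPLE_METHYLS, EXAMPLE_NOES, EXAMPLE_STRUCTURE
)
print(score, len(solutions))
```

In this toy case only the peak with three NOE partners is pinned to a single methyl group; the remaining peaks stay interchangeable, which illustrates why sparse NOE networks leave ambiguities. The number of mappings tested grows factorially with the number of methyl groups, which is why the published programs rely on heuristics, stochastic search or dedicated solvers instead of enumeration.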

Practical considerations for the applicability of these structure-based assignment methods are the quality of the available high-resolution structure and the quality of NOE data. Maximizing the number of methyl probes is a straightforward way to improve the density of the NOE network. In addition, deuteration increases the maximum distance of the measured NOE cross-peaks up to 12 Å 105 and therefore the number of NOEs in the network. Assigning amino acid types to methyl resonances also largely reduces ambiguities in assigning the NOE network. Whilst Ile and Met are easily identified by their distinct 1 HM and 13 CM chemical shifts, identifying the signals of Leu, Val, Ala and Thr is not as straightforward and should be addressed by selective labeling (Thr, Ala) 90 , 91 and use of optimal-control pulses that effectively invert the sign of Leu vs. Val methyl resonances. 106 Stereospecific labeling of proR and proS methyl groups in Leu and Val further improves the assignment outcome by increasing the NOE distance threshold, reducing spectral crowding, and reducing ambiguities in the NOE network. Additionally, knowledge of the internal dynamics of the protein (from crystal structure B-factors or NMR relaxation experiments) is useful in assessing the outcome of structure-guided assignment and refining the network of NOEs that should be observed. Finally, in homo-oligomeric complexes, distinction between inter- and intramolecular NOEs is useful either as prior information (omission of interchain NOEs) or as a validation tool. 107

One take-home message from the discussions above is that the information provided separately by different spectra is increased when all are considered simultaneously. For example, side-chain proton resonances can be tentatively assigned with an H(c-cco)NH spectrum, whereas HC(-c)H or (h)C-CH TOCSY spectra help identify (H,C) correlations belonging to the same side chain but do not pair them with assigned resonances unless HA and HB have been assigned in addition to CA and CB. However, when both are used together, unassigned 1 H and 13 C resonances can be linked to the assigned amide resonances reliably, albeit somewhat tediously. Covariance NMR aims at exploiting this property by combining the information provided by separate spectra into new, artificial correlation maps. Here, spectra are treated as multi-dimensional arrays 108 that can be subjected to matrix operations. 109–111 Most importantly, matrix multiplication along dimensions common to two different spectra will lead to correlations between signals in the remaining dimensions if they carry the same signals in the subsumed dimension. For example, multiplying the H N /H ali planes of H(c-cco)NH ( Figure 1.6A ) with the H C /H ali planes of HC(-c)H TOCSY ( Figure 1.6B ) will provide correlations (H N , H C ) only if the 1D traces along H ali at the H N and H C coordinates feature overlapping signals ( Figure 1.6C ). By repeating this procedure for all points in the nitrogen and carbon dimensions of H(c-cco)NH and HC(-c)H TOCSY, respectively, one can build an [H,N,H,C] four-dimensional array, in which the 13 C– 1 H HSQC of an unassigned side chain can be visualized for every assigned amide (H,N) correlation ( Figure 1.6D ). The assignment then becomes trivial.

Unfortunately, while such matrix operations have been routinely employed for small molecules, 112 , 113 too many residues feature similar frequencies in proteins and, in combination with line-broadening and artefactual correlations, the resulting correlation maps are impractical. To overcome this challenge, a spectral derivative must first be taken along the subsumed dimension. 114 The dispersive spectra then feature inflexion points instead of maxima, and partially overlapped but mismatched signals will cancel each other during matrix multiplication. For spectra where several common signals are compared during multiplication, applying matrix square-rooting further reduces artefacts. 115 Figure 1.6C and D demonstrate the quality of correlation maps combining spectral derivatives and square-rooting. 116 In practice, the entire 3D input arrays are subjected to spectral derivatives along H ali and multiplied and rooted via singular-value decomposition. 117
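The plane-by-plane multiplication and the effect of the spectral derivative can be illustrated with synthetic data. The sketch below uses Gaussian traces on an arbitrary point grid, multiplies an (H N × H ali) plane with an (H C × H ali) plane along the shared H ali axis, and repeats the operation on first derivatives. Peak positions and widths are made up, and matrix square-rooting is deliberately left out.

```python
import numpy as np

def gaussian_trace(n_points, center, width=3.0):
    """Synthetic 1D trace along the shared (H_ali) dimension."""
    x = np.arange(n_points)
    return np.exp(-0.5 * ((x - center) / width) ** 2)

def correlation_map(plane_a, plane_b, use_derivative=False):
    """Multiply two planes along their common last axis.

    plane_a: (n_HN, n_Hali), plane_b: (n_HC, n_Hali) -> result (n_HN, n_HC).
    With use_derivative=True, both planes are differentiated along H_ali first.
    """
    if use_derivative:
        plane_a = np.gradient(plane_a, axis=1)
        plane_b = np.gradient(plane_b, axis=1)
    return plane_a @ plane_b.T

n = 100
# Two amide rows with side-chain signals at points 20 and 60
plane_hn = np.vstack([gaussian_trace(n, 20), gaussian_trace(n, 60)])
# Three carbon-bound proton rows: exact match to 60, near miss at 25, exact match to 20
plane_hc = np.vstack([gaussian_trace(n, 60), gaussian_trace(n, 25), gaussian_trace(n, 20)])

plain = correlation_map(plane_hn, plane_hc)
deriv = correlation_map(plane_hn, plane_hc, use_derivative=True)
print(np.round(plain / plain.max(), 3))
print(np.round(deriv / deriv.max(), 3))
```

In the plain map, the near-miss row still correlates substantially with the first amide; after differentiation the same element turns negative, while the exact matches remain the largest positive entries. This is the cancellation of partially overlapped but mismatched signals described above.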

Covariance NMR for resonance assignments. Multiplications between the HN/Hali planes (A) of H(c-cco)NH and the HC/Hali planes of HC(-c)H TOCSY lead to HN/HC planes (C). HN and HC resonances that are correlated via common signals along Hali (dashed lines in A and B) lead to correlations in the output correlation map (grey arrow in C). (D) A four-dimensional array is built by repeating the process for all nitrogen and carbon datapoints in H(c-cco)NH and HC(-c)H TOCSY, respectively. With this 4D array, each assigned amide is paired with a 13C–1H HSQC of its unassigned side chain, and side-chain resonance assignments are done through visual inspection. To remove the artefacts in (D), spectral derivatives were applied along the Hali dimension and matrix square-rooting was applied to the 4D array. In practice, the entire 3D input arrays are subject to spectral derivatives along Hali and multiplied and rooted via singular-value decomposition.

The procedure has also been successfully employed to assign amide resonances using maps calculated from HNCO, HN(ca)CO, HNCA, HN(co)CA, HN(ca)CB, and HN(coca)CB. 118   Here, correlations between sequential residues can be seen in [H,N,Hs,Ns] correlation maps, where Hs and Ns refer to either residue i  + 1 or i  − 1 depending on how the 4D is transposed. Further improvements are obtained when applying element-wise multiplication between the maps obtained from different carbon dimensions as CA, CB, and CO chemical shifts are all unlikely to be accidentally the same for non-sequential residues. Asparagine side chains can be assigned with a different combination of the same spectra. 116   Similarly, methyl resonances can be assigned by combining HMCMCGCB or HMCM(cg)CBCA spectra with HNCA and HN(ca)CB. 119  

Although covariance correlation maps have been greatly improved by recent advances, they are not meant to replace peak picking and visual inspection but to help overcome challenges. 120 Notably, artefacts may remain when large dynamic ranges in sensitivity are present, e.g. for a disordered region attached to a folded core, as artefacts stemming from intense signals may be comparable in intensity to true correlations between weak signals. Thus, correlation maps are best used in conjunction with the original data. Since the maps do not require specific data acquisition, we envision that they may be routinely employed for quality control and rescue of stalled resonance assignments.

NMR spectroscopy is particularly useful to study intrinsically disordered proteins and protein regions, where other structural biology techniques fail due to their high flexibility and degrees of freedom. However, these proteins come with their own challenges in terms of resonance assignment. The traditional resonance assignment experiments involving 1 H N direct detection suffer from poor chemical shift dispersion and from exchange broadening of 1 H N with the solvent protons. Further, IDPs are rich in Pro residues which lack amide protons and are thus absent from traditional experiments.

The counterpart to this is that disordered protein regions have favorable relaxation properties, which allows multiple magnetization transfer steps and encoding of more than two indirect dimensions. However, uniform sampling in four or more dimensions would require unrealistic measurement times. Consequently, various approaches have been proposed to decrease the number of data points recorded during the experiment. The availability of efficient non-uniform sampling (NUS) schedules 20 and reconstruction algorithms 21–24 makes NUS a valid approach to increase the dimensionality of NMR spectra. The most attractive feature of NUS is that the fraction of Nyquist-grid points that must be collected decreases as the number of reconstructed dimensions increases (10% for 2 reconstructed dimensions, 4% for 3 reconstructed dimensions, etc.).
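As a rough illustration of what such a schedule looks like, the sketch below draws a uniformly random subset of the indirect-dimension Nyquist grid at a chosen sampling fraction. This is an assumption made for simplicity: schedules used in practice are usually weighted (for example towards short evolution times) rather than uniformly random, and the grid sizes here are arbitrary.

```python
import numpy as np

def random_nus_schedule(grid_shape, fraction, seed=0):
    """Pick round(fraction * grid size) distinct grid points uniformly at random.

    grid_shape: number of increments in each indirect dimension, e.g. (64, 48).
    Returns an integer array of shape (n_sampled, n_dims) with increment indices.
    """
    rng = np.random.default_rng(seed)
    total = int(np.prod(grid_shape))
    n_sampled = max(1, round(fraction * total))
    flat = rng.choice(total, size=n_sampled, replace=False)
    return np.column_stack(np.unravel_index(np.sort(flat), grid_shape))

schedule_2d = random_nus_schedule((64, 48), fraction=0.10)
schedule_3d = random_nus_schedule((32, 32, 32), fraction=0.04)
print(schedule_2d.shape, schedule_3d.shape)
```

Each row of the returned array is one set of increment indices to record; the grid points that are not listed are skipped and left to the reconstruction algorithm.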

Another way of measuring high-dimensionality NMR spectra in a time-efficient way is automated projection spectroscopy (APSY), which allows reconstruction of up to 7-dimensional spectra from a set of 2D projection spectra at different angles. An automated algorithm retrieves the peak positions and yields a final N- dimensional peak list. 121   The strategy relies on incrementing two or more evolution periods simultaneously, first described as accordion spectroscopy. 122  
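The link between projection angle and peak position can be sketched for a single peak. When two evolution periods are incremented together as t1 = t cos α and t2 = t sin α, a peak at (ω1, ω2) appears in the projection at ω1 cos α + ω2 sin α. The code below computes these projected positions and recovers the two frequencies from several projections by least squares. It is a single-peak simplification with made-up numbers; the automated procedure additionally has to pick peaks and decide which projected peaks belong together.

```python
import numpy as np

def projected_shift(omega1, omega2, angle_deg):
    """Position of a peak (omega1, omega2) in a projection at angle alpha."""
    alpha = np.deg2rad(angle_deg)
    return omega1 * np.cos(alpha) + omega2 * np.sin(alpha)

def recover_shifts(angles_deg, projected):
    """Least-squares estimate of (omega1, omega2) from projected positions."""
    alpha = np.deg2rad(np.asarray(angles_deg, dtype=float))
    design = np.column_stack([np.cos(alpha), np.sin(alpha)])
    solution, *_ = np.linalg.lstsq(design, np.asarray(projected, dtype=float), rcond=None)
    return solution

true_peak = (350.0, -120.0)          # hypothetical offsets from the carrier, in Hz
angles = [0.0, 30.0, 60.0, 90.0]     # hypothetical projection angles
observed = [projected_shift(*true_peak, a) for a in angles]
print(recover_shifts(angles, observed))
```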

Both APSY and NUS have successfully been applied to increase the number of dimensions by correlating different combinations of internal and sequential N, H, CO, CA and HA nuclei, which increases spectral dispersion for crowded IDPs with repetitive amino acid sequences. The following experiments have successfully been utilized:

4D HNCOCA: this correlates a set of H i , N i , CO i −1 and CA i −1 resonances. 121  

4D HNCACO: this correlates H i , N i , CO i , CO i −1 , CA i and CA i −1 resonances. 123  

4D HACANH: this correlates H i , N i , HA i , HA i −1 , CA i and CA i −1 resonances. 123  

5D HACACONH: this correlates a set of H i , N i , CO i −1 , HA i −1 and CA i −1 resonances. 121  

4D CBCANH: this correlates H i , N i , CA i , CA i −1 , CB i and CB i −1 resonances. 123  

5D CBCACONH: this correlates a set of H i , N i , CO i −1 , CA i −1 and CB i −1 resonances. 123  

5D HNCOCACB: this is an out-and-back experiment correlating a set of H i , N i , CO i −1 , CA i −1 and CB i −1 resonances. 124  

Similarly, the high number of transfer steps and encoded nuclei allows creation of new connectivities for unambiguous sequential matching. In essence, these experiments shift the base spectrum for backbone assignment from a 2D 15 N– 1 H HSQC to a 3D HNCO:

5D HACA(n)CONH: this connects HA i , CA i and HA i −1 , CA i −1 to a given set of N i , H i , CO i −1 (HNCO) coordinates. 124  

5D (haca)CON(ca)CONH: this is an out-and-stay experiment connecting CO i −1 , N i and CO i −2 , N i −1 to a given set of N i , H i , CO i −1 (HNCO) coordinates. 124  

5D (h)NCO(nca)CONH: this is an out-and-back experiment connecting CO i −1 , N i and CO i −2 , N i −1 to a given set of N i , H i , CO i −1 (HNCO) coordinates. 124  

6D HNCO(nca)CONH: this is a variant of the previous experiment additionally encoding the first amide proton resonance, essentially connecting two sequential HNCO cross-peaks. 125  

7D HNCO(n)CACONH: this is a variant of the previous experiment additionally encoding the CA resonance. 125  

In a similar but easier-to-implement approach, correlation of sequential nitrogen resonances was proposed early on to benefit from the higher resolution of 15 N in order to alleviate spectral crowding. 126 These experiments are referred to as HNN experiments and are particularly useful for IDPs:

4D HN(ca)NH: this correlates a set of H i , N i resonances (HSQC cross-peak) to H i −1 and N i −1 (previous HSQC cross-peak) and H i +1 , N i +1 (following HSQC cross-peak). 127  

4D HN(coca)NH: this correlates a set of H i , N i resonances (HSQC cross-peak) to H i −1 and N i −1 (previous HSQC cross-peak). 128  

4D HN(cocanca)NH: this correlates a set of H i , N i resonances (HSQC cross-peak) to H i −2 and N i −2 resonances, which allows us to ‘walk’ through Pro residues that are abundant in IDPs but invisible to 1 H detected NMR. 129  

Direct detection of heteronuclei with a low gyromagnetic ratio ( 13 C and 15 N) can overcome the problems described above by circumventing the use of 1 H N nuclei altogether. It is beneficial to detect 13 C/ 15 N in the direct dimension to fully utilize the available high resolution and high chemical shift dispersion by sampling more points ‘for free’. Sampling a higher number of points in the direct dimension does not add to the experimental time. Owing to the low gyromagnetic ratio of 13 C or 15 N, these experiments are less sensitive than 1 H N direct detection, but with improvements in hardware and the availability of high-field magnets they have become feasible without impractical sample concentrations. Note that these experiments can be engineered with starting magnetization on 1 HA to retain higher sensitivity and short interscan delays. 1 HA direct detected experiments were also developed 130 which retain sensitive detection and observe Pro residues, but the issue of poor chemical shift dispersion of 1 HA remains.

Historically 13 C direct detection experiments were designed for studying proteins with a paramagnetic center without involving 1 H in any step, thus named ‘protonless NMR’. 131   Due to its low gyromagnetic ratio, 13 C does not suffer from transverse paramagnetic relaxation rate enhancement as much as 1 H. The benefits of 13 C direct detection for disordered proteins were later recognized and a suite of experiments were designed with polarization starting on 1 H. 132  

Ideally one would like to detect 13 CA owing to its large chemical shift dispersion and rich information content in terms of the amino acid type and secondary structure. However, 13 CA has large one-bond homonuclear scalar couplings with neighboring 13 CO and 13 CB nuclei. Several schemes have been developed to remove the couplings including band selective decoupling pulses applied by interrupting the FID collection 133   and machine learning algorithms for virtual decoupling during processing. 99   However, the most commonly used technique relies on the spin state selection method, i.e. , the in-phase/anti-phase (IPAP) scheme. The in-phase and anti-phase spin states are evolved and selected in an interleaved fashion and the respective FIDs are stored and processed separately. Since the one-bond scalar coupling 1 J CACO is constant, the two peaks in the doublet are shifted by half of 1 J CACO to the center frequency and added to obtain a decoupled peak. For 13 CA detection, this must be done twice to remove the effect of 1 J CACO and 1 J CACB separately, which requires the measurement of four FIDs per increment (double IPAP). 131  
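The IPAP recombination described above can be illustrated numerically. The sketch below simulates idealized Lorentzian lines for a single doublet, separates the two components from the in-phase and anti-phase spectra, shifts each by half the coupling toward the center, and adds them. The coupling value (55 Hz, a typical approximate size for the CA–CO coupling), the line width, the sign convention for the anti-phase spectrum, and all function names are assumptions for illustration only.

```python
import numpy as np

def lorentzian(freq, center, width=5.0):
    """Absorptive Lorentzian line of unit height (width = full width at half height, Hz)."""
    hw = width / 2.0
    return hw**2 / ((freq - center) ** 2 + hw**2)

def ipap_decouple(freq, in_phase, anti_phase, j_hz):
    """Combine in-phase and anti-phase spectra into a single decoupled line."""
    step = freq[1] - freq[0]
    shift = int(round((j_hz / 2.0) / step))      # half the coupling, in grid points
    low = (in_phase - anti_phase) / 2.0          # component at center - J/2
    high = (in_phase + anti_phase) / 2.0         # component at center + J/2
    # Move each component by J/2 toward the center frequency and add them.
    # np.roll wraps around, which is harmless here because the lines sit far from the edges.
    return np.roll(low, shift) + np.roll(high, -shift)

freq = np.arange(-200.0, 200.5, 0.5)             # frequency axis in Hz
j_caco = 55.0                                    # assumed coupling constant in Hz
center = 10.0                                    # hypothetical resonance position in Hz

low_line = lorentzian(freq, center - j_caco / 2)
high_line = lorentzian(freq, center + j_caco / 2)
in_phase = low_line + high_line                  # both doublet components positive
anti_phase = high_line - low_line                # components with opposite sign

decoupled = ipap_decouple(freq, in_phase, anti_phase, j_caco)
print(freq[np.argmax(decoupled)], decoupled.max())
```

For double IPAP the same recombination would be applied a second time with the second coupling, which is why four FIDs per increment are needed.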

A way of alleviating the need for double IPAP is with an alternate 13 C– 12 C isotope labeling scheme. This can be achieved using mixtures of 2- 13 C pyruvate and 3- 13 C pyruvate or of 2- 13 C glycerol and 1,3- 13 C glycerol as carbon sources for bacterial expression. This strategy enables alternate 13 C– 12 C labeling at most positions and allows detection of 13 CA without the substantial loss in sensitivity for high molecular weight systems due to the use of IPAP schemes. Complete deuteration at HA sites is critical for taking advantage of reduced dipole relaxation and can readily be achieved in 100% D 2 O medium with protonated amino acid precursors. 134   A 2D NCA experiment optimized for deuterated and alternately- 13 C labeled proteins correlates the chemical shifts of 13 CA with the two neighboring nitrogen nuclei (CA i –N i and CA i –N i +1 ). Thus, it allows sequential linking of the backbone resonances. The strategy can be extended to 3D experiments for higher spectral dispersion, like the 3D CANCA experiment for perdeuterated proteins that correlates 13 CA with the i and i  + 1 15 N and 13 CA resonances. 135  

However, strategies relying on direct detection of 13 CO are more popular, where only a single IPAP to remove the 1 J COCA coupling is necessary. Protein resonance assignment can be achieved using the following experiments that use the basic building blocks discussed in Sections 1.2.1 and 1.3.1 : 132 , 136  

(ha)CANCO: this connects both CA i and CA i −1 resonances to a specific N i –CO i −1 cross-peak.

(ha)CA(co)NCO: this connects only the CA i −1 resonance to a specific N i –CO i −1 cross-peak.

(hbha)CBCA(co)NCO: this connects the CA i −1 and CB i −1 resonances to a specific N i –CO i −1 cross-peak.

(hc)C(co)NCO: this connects all i  − 1 aliphatic side-chain resonances (TOCSY mixing) to a specific N i –CO i −1 cross-peak.

As intrinsically disordered proteins of interest become larger and their spectra become crowded and complex, optimal chemical shift dispersion and spectral resolution are desired, which are theoretically achieved by 15 N-direct detection. The main limitation of 15 N detection resides in its poor sensitivity due to the very low gyromagnetic ratio of 15 N. Nevertheless, 15 N has the advantage of a low CSA (compared to 13 CO), and thus 15 N detection becomes advantageous as the magnetic field increases. Moreover, 15 N detection does not require homonuclear decoupling which avoids the need for IPAP schemes. Broadband 13 C decoupling is readily achieved using adiabatic WURST pulses. Finally, TROSY selection can be employed in 15 N direct detection. It is also shown that, under physiological pH and ionic strength, 15 N TROSY direct detection is comparably sensitive to 1 H N TROSY detection. 137   This is especially important in high-field magnets with high Q -factor probes. Indeed, in a triple-resonance cryogenically cooled TXO probe (inner coil for 15 N and 13 C), the signal height of 1 H-detected resonances was reduced by more than 75% from 10 mM to 1 M NaCl concentration ( Figure 1.2E ). In contrast, the signal height of the 15 N-detected resonances was substantially less affected, reduced by 20% from 10 mM to 1 M NaCl concentration.

Unlike for 1 H-detected TROSY, deuteration is not mandatory to benefit from 15 N-detected TROSY due to its lower sensitivity to 1 H dipolar broadening. 137   This facilitates studies of proteins that require eukaryotic expression and cannot be fully deuterated. It is shown that the combination of 15 N TROSY detection with CRINEPT coherence transfer allows the observation of the main chain amide resonances of nondeuterated proteins as large as 150 kDa. 138  

Two approaches have successively been developed for 15 N detection. The first relies on buffer deuteration and direct detection of the slowly relaxing 15 N 2H resonances. 139   The second employs TROSY selection of the slowly relaxing spin state of 15 N in protonated buffers. 140   It can be used in physiological conditions, but it requires the IPAP scheme for TROSY selection. At ultra-high magnetic fields (1 GHz and above), TROSY 15 N direct detection is expected to provide the best sensitivity and resolution for intrinsically disordered proteins. 34   In both cases, protein resonance assignment is done with the following experiments:

(haca)COCAN: this connects both CA i and CA i −1 resonances to a specific N i –CO i −1 cross-peak.

(ha)CACON: this connects only the CA i −1 resonance to a specific N i –CO i −1 cross-peak.

(hbha)CBCACON: this connects the CA i −1 and CB i −1 resonances to a specific N i –CO i −1 cross-peak.

(hc)C(ca)CON: this connects all i −1 aliphatic side-chain resonances (TOCSY mixing) to a specific N i –CO i −1 cross-peak.

Advances in solution NMR have opened previously inaccessible areas of structural and mechanistic biology. Techniques such as relaxation dispersion and CEST experiments have revealed the existence of minor conformations and their biological significance. Key phenomena such as protein allostery, conformational exchange and selection, drug binding to allosteric pockets, and the mechanistic effects of disease-causing mutations have been brought to the fore by NMR studies. However, these investigations invariably hinge on resonance assignments. A range of methods is now available for obtaining resonance assignments of proteins, as detailed in the preceding sections. For larger proteins (those exceeding 50 kDa), backbone amide and side-chain methyl (AILVMT) resonances form the cornerstone of effective analysis. Furthermore, strategic inclusion of key aromatic residues in a deuterated background can prove invaluable when necessary. For disordered proteins, 13 C and 15 N detection experiments offer several benefits, including enhanced resolution and the ability to detect proline residues, which are often abundant in disordered proteins. The application of non-uniform sampling further heightens resolution in the indirect dimension, with the added benefit of augmenting sensitivity through the collection of more scans. Progress in NMR resonance assignment will come from a combination of inventive biochemistry (such as pyruvate labeling and ILV labeling), hardware improvements (such as high-field magnets, cryogenically cooled probes, and specially designed probes for low-gamma nuclei), and the development of novel NMR experimental methods and data processing and analysis techniques. The judicious use of chemical shift resources can significantly support these efforts.

Looking ahead, a growing array of techniques, including those leveraging artificial intelligence and machine learning, is poised to automate resonance assignment. These methods have the potential to build on existing technologies and techniques, transforming resonance assignment and providing access to larger molecular weight systems in the coming years.

  • © Royal Society of Chemistry 2023

  • PMC10358425


Solution NMR Backbone Assignment of the C-Terminal Region of Human Dynein Light Intermediate Chain 2 (LIC2-C) Unveils Structural Resemblance with Its Homologue LIC1-C

Morkos A. Henen

1 Department of Biochemistry and Molecular Genetics, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA;

Natasia Paukovich

Rytis Prekeris

2 Department of Cell and Developmental Biology, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA;

Beat Vögeli

Associated data.

The chemical shift assignment of the human LIC2-C (BMRB code 51890) has been deposited in the Biological Magnetic Resonance Data Bank.

Dynein, a homodimeric protein complex, plays a pivotal role in retrograde transportation along microtubules within cells. It consists of various subunits, among which the light intermediate chain (LIC) performs diverse functions, including cargo adaptor binding. In contrast to the vertebrate LIC homolog LIC1, LIC2 has received relatively limited characterization thus far, despite partially orthogonal functional roles. In this study, we present a near-to-complete backbone NMR chemical shift assignment of the C-terminal region of the light intermediate chain 2 of human dynein 1 (LIC2-C). We perform a comparative analysis of the secondary structure propensity of LIC2-C with the one previously reported for LIC1-C and show that the two transient helices in LIC1 that interact with motor adaptors are also present in LIC2.

1. Biological Context

Dynein 1 (‘dynein’), one of the two classes of dynein in vertebrates, is the predominant microtubule minus-end-directed motor [ 1 , 2 ]. Dynein is actively involved in a diverse range of cellular trafficking processes, including the cargo transport of proteins, RNA and vesicles, nuclear migration, or cell division [ 3 , 4 ]. Processive movement of dynein on microtubules requires binding to the cofactor dynactin, and specific adaptor proteins link the dynein–dynactin complex to cargo [ 5 ].

Dynein is a homodimeric multi-protein complex, composed of two heavy chains and multiple accessory chains consisting of intermediate, light, and light intermediate chains [ 6 ]. The light intermediate chain (LIC) binds the various adaptors and has been identified as one factor responsible for metaphase to anaphase progression through the inactivation of the spindle assembly checkpoint (SAC) [ 7 ]. The close homologs LIC1 and LIC2 are believed to work in mutually exclusive complexes to perform distinct [ 8 , 9 ] but also compensatory functions [ 10 ]. LIC1 plays a larger role in maintaining spindle pole integrity. In contrast, LIC2 has been reported to govern spindle orientation through interactions with 14-3-3, Par3, and NuMA [ 11 ]. Notably, LIC2, but not LIC1, has been found to transport NuMA asymmetrically to the spindle poles [ 11 ]. The structural basis of this disparity between LIC1 and LIC2 is not known. While we have studied LIC1 in solution at the structural level [ 12 ], no such data are available for LIC2.

Employing NMR chemical shifts, we have previously shown that the C-terminal region of LIC1 is disordered with a propensity to form two helices (helices 1 and 2) [ 12 ]. Our NMR and animal model data demonstrated that helix 1 is essential for binding all tested dynein adaptors, and helix 2 plays an additional but non-essential role. Pulldown assays indicated that LIC2-C also binds to these adaptors, but we did not pursue a more detailed characterization, such as an NMR analysis. Sequence alignment between the C-terminal regions of LIC1 and LIC2 reveals that the two regions which comprise helical propensity in LIC1-C are conserved in LIC2-C ( Figure 1 ).

Figure 1. Sequence alignment between LIC1-C (residues 381–523) and LIC2-C (375–492). These regions exhibit ~47% sequence similarity, but the regions harboring helix 1 and helix 2 in LIC1-C are highly conserved in LIC2-C. Residue colors indicate side-chain chemistry: yellow, proline; blue, hydrophobic; red, positively charged; magenta, negatively charged; green, polar uncharged; orange, glycine; cyan, aromatic.

A comprehensive understanding of the mechanism by which LIC2-C contributes to intracellular transport, and of its specific differences from LIC1, is critical to rationalize the implications of dysfunctions and mutations in various diseases [ 13 , 14 ]. In this article, we take the first steps towards the characterization of LIC2-C using NMR, by assigning the chemical shifts of the backbone and deciphering secondary structure features.

2. Methods and Experiments

2.1. Protein Expression and Purification

(A) Construction Design and Cloning:

To obtain the LIC2-C construct (residues 375–492 of LIC2), we introduced an additional tryptophan residue at the N-terminus for convenient measurement of protein concentration using absorbance at 280 nm. Furthermore, a C-terminal 6xHis tag was incorporated. The construct was cloned into the pGEX-6p-1 expression vector, which contains an N-terminal GST tag fused to a PreScission protease cleavage site. The plasmid was transformed into Escherichia coli strain BL21 (LEMO21-DE3).
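The reason for adding a tryptophan is that the absorbance at 280 nm can then be converted to a concentration with the Beer–Lambert law, c = A / (ε · l). The sketch below uses commonly quoted approximate per-residue extinction coefficients (Trp 5500, Tyr 1490, cystine 125 M⁻¹ cm⁻¹); the residue counts and absorbance reading are placeholders and do not represent the actual composition or measurements of LIC2-C.

```python
# Approximate molar extinction coefficients at 280 nm, in M^-1 cm^-1 (assumed values).
EPS_TRP = 5500
EPS_TYR = 1490
EPS_CYSTINE = 125

def extinction_coefficient(n_trp, n_tyr, n_cystine=0):
    """Estimate the molar extinction coefficient of a protein at 280 nm."""
    return EPS_TRP * n_trp + EPS_TYR * n_tyr + EPS_CYSTINE * n_cystine

def concentration_molar(a280, epsilon, path_cm=1.0):
    """Beer-Lambert law: concentration (M) = absorbance / (epsilon * path length)."""
    return a280 / (epsilon * path_cm)

# Placeholder example: a construct whose only chromophore is the single added Trp.
eps = extinction_coefficient(n_trp=1, n_tyr=0)
conc = concentration_molar(a280=0.55, epsilon=eps)
print(f"{conc * 1e6:.0f} uM")
```

Without the added tryptophan, a sequence lacking aromatic residues would give an extinction coefficient close to zero, and A280 would not be a usable readout.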

(B) Media Preparation and Induction:

For protein expression, M9 media was prepared, containing 1 g/L 15 N-ammonium chloride and 2 g/L 13 C-glucose. The expression was induced with 0.4 mM isopropyl-1-thio-d-galactopyranoside when the optical density at 600 nm (A600) of the culture reached 0.6. The induced culture was incubated overnight at 20 °C.

(C) Cell Harvesting and Lysis:

The cells were harvested by centrifugation at 4 °C for 10 min at 5000× g . Subsequently, the cell pellet was resuspended in low-imidazole binding buffer (20 mM HEPES, 200 mM NaCl, 1 mM EGTA, 1 mM MgCl 2 , 1 mM NaN 3 , 20 mM imidazole, pH 7.3), subjected to sonication for cell disruption, and the lysate was clarified by centrifugation at 30,900× g .

(D) Protein Purification:

The clarified lysate was purified using a HisTrap FF column (Cytiva). The protein of interest was eluted using high-imidazole buffer (20 mM HEPES, 200 mM NaCl, 1 mM EGTA, 1 mM MgCl 2 , 1 mM NaN 3 , 200 mM imidazole, pH 7.3). Subsequently, the eluted protein was dialyzed in preScission protease cleavage buffer (50 mM Tris-HCl, 150 mM NaCl, 10 mM EDTA, 1 mM DTT, pH 8.0, 20% glycerol). The protein was incubated with the protease overnight at 4 °C, and the cleavage efficiency was assessed by analyzing the cleaved product on a 4–12% gradient SDS-PAGE gel.

The cleaved protein was concentrated to a final volume of 4 mL. Further purification was performed using a size-exclusion HiLoad 16/600 Superdex 75 pg column (Cytiva) in NMR buffer (50 mM NaP, 150 mM NaCl, 1 mM DTT, 0.02% NaN 3 , pH 6.5). The relevant fractions were concentrated using a 3000 MWCO concentrator to achieve a final concentration of 840 μM for subsequent NMR measurements.

2.2. NMR Spectroscopy

A 13 C- and 15 N-labeled NMR sample of LIC2 was prepared in the NMR buffer at a concentration of 840 μM and measured in a 5 mm Shigemi tube. NMR spectra were acquired on Bruker Avance NEO 600 MHz spectrometers equipped with triple-resonance cryoprobes at 25 °C. Backbone assignment was accomplished using 1 H- 15 N HSQC, 3D HNCACB, HN(co)CACB [ 15 , 16 ], and HNN pulse sequences [ 17 ]. The 3D spectra were obtained using a nonuniform sampling (NUS) scheme generated by the NUS@HMS scheme generator [ 18 ]. In the direct dimension, 2048 complex data points were acquired, while the indirect 13 C and 15 N dimensions were subsampled by 25–30% from the original 256 and 92 points, respectively. The spectral widths used for all experiments were 8196 Hz ( 1 H), 2129 Hz ( 15 N), and 12,076 Hz ( 13 Cα/ 13 Cβ). The number of scans was set to 16 and 32 for 3D experiments and 2D HSQC, respectively, with an interscan delay of 1.0 s. Reconstruction of the 3D NUS spectra was performed using the hmsIST software [ 18 ], while linearly acquired 2D spectra were zero-filled using NUS as an alternative to linear prediction. A solvent subtraction function was applied in the direct dimension. Further data processing and visualization were conducted using NMRPipe/NMRDraw [ 19 ] and NMRFAM-SPARKY [ 20 ]. The resonance assignment was carried out using the CcpNmr Analysis software v2.5.1 [ 21 ].
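As a quick sanity check on the acquisition parameters, the spectral widths quoted in Hz can be converted to ppm by dividing by the Larmor frequency of each nucleus in MHz. The sketch below assumes a nominal 600.0 MHz proton frequency and approximate gyromagnetic ratios relative to 1 H; the exact spectrometer frequencies are not given in the text, so the results are approximate.

```python
# Approximate gyromagnetic ratios relative to 1H (assumed, rounded values).
GAMMA_RATIO = {"1H": 1.0, "13C": 0.25145, "15N": 0.10137}

def sweep_width_ppm(sw_hz, nucleus, proton_mhz=600.0):
    """Convert a spectral width in Hz to ppm for the given nucleus."""
    larmor_mhz = proton_mhz * GAMMA_RATIO[nucleus]
    return sw_hz / larmor_mhz

# Spectral widths quoted in the text.
widths_hz = {"1H": 8196.0, "15N": 2129.0, "13C": 12076.0}
widths_ppm = {nuc: sweep_width_ppm(sw, nuc) for nuc, sw in widths_hz.items()}
for nuc, ppm in widths_ppm.items():
    print(f"{nuc}: {ppm:.1f} ppm")
```

Under these assumptions the widths come out at roughly 13.7 ppm ( 1 H), 35 ppm ( 15 N), and 80 ppm ( 13 C), the last being wide enough to cover both Cα and Cβ resonances.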

3. Assignment and Data Deposition

We confirmed our prediction [ 12 ] that the LIC2-C protein is disordered through the narrow peak dispersion observed in the 1 H- 15 N HSQC spectrum ( Figure 2 ). Because of extensive peak overlap, which arises from the disorder of the protein and the existence of numerous redundant charged regions, we incorporated an HNN experiment into our standard backbone experiments. This additional experiment offers orthogonal connectivity information ( Figure 2A ). Thus, we were able to obtain a near-complete backbone assignment (93.3%) ( Table 1 ; Figure 2B ). The missing residues are either heavily overlapped or in a sequence-redundant region. We have deposited the backbone assignment for LIC2-C in the BMRB under the accession code 51890.
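The sequential linking that underlies this kind of backbone assignment can be sketched in a few lines: each amide spin system carries its own Cα/Cβ shifts (from HNCACB) and those of the preceding residue (from HN(co)CACB), and two spin systems are linked when these agree within a tolerance. The class, function, tolerance, and chemical shift values below are hypothetical and serve only to illustrate the matching logic; glycine (no Cβ), proline, and the HNN-based nitrogen matching are not handled.

```python
from dataclasses import dataclass

@dataclass
class SpinSystem:
    name: str
    ca: float        # own C-alpha shift in ppm (HNCACB)
    cb: float        # own C-beta shift in ppm (HNCACB)
    ca_prev: float   # preceding residue C-alpha in ppm (HN(co)CACB)
    cb_prev: float   # preceding residue C-beta in ppm (HN(co)CACB)

def find_predecessors(target, candidates, tol=0.2):
    """Return spin systems whose own CA/CB match the target's i-1 shifts within tol (ppm)."""
    return [
        c for c in candidates
        if c is not target
        and abs(c.ca - target.ca_prev) <= tol
        and abs(c.cb - target.cb_prev) <= tol
    ]

# Hypothetical spin systems with made-up shifts.
a = SpinSystem("A", 56.1, 30.2, 62.3, 32.8)
b = SpinSystem("B", 62.3, 32.7, 52.5, 19.1)
c = SpinSystem("C", 52.6, 19.0, 58.4, 63.9)
systems = [a, b, c]

for s in systems:
    print(s.name, "<-", [p.name for p in find_predecessors(s, systems)])
```

In a crowded spectrum several candidates can match within the tolerance, which is where additional connectivity information such as the HNN experiment helps to break ties.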

Figure 2. Assignment of LIC2-C 375–492 using 3D backbone NMR and HNN experiments. ( A ) F3-F2 plane displaying 1 H- 15 N correlations through the 3D HNN spectrum at the F1- 15 N chemical shift of 119.9 ppm (orange) superimposed on the 2D 1 H- 15 N HSQC spectrum (blue). Off-diagonal peaks (circles) indicate the 15 N chemical shifts of the preceding and succeeding residues relative to the diagonal peak (square; V428). ( B ) 1 H- 15 N HSQC spectrum demonstrating peak assignment.

Table 1. Backbone assignment statistics of LIC2-C.

4. Chemical Shift Analysis

We analyzed LIC2-C and compared it to LIC1-C by calculating the Secondary Structure Propensity (SSP) score [ 22 ] using HN, N, Cα, and Cβ chemical shifts. The SSP score varies between +1, representing a fully formed helix, and −1, indicating a fully formed β-strand. Generally, loops and disordered residues have an SSP score of around 0. The majority of the LIC2-C residues have a negative score, mostly more negative than −0.2 ( Figure 3 ). This is indicative of structural disorder. However, we found two regions, residues 423–439 and 468–477, that show a positive score, usually between 0 and +0.2, but peaking at ~0.3. These values are indicative of moderate helical propensity and, thus, of transiently formed helices. Interestingly, these regions coincide with those that also show helical structure in LIC1 [ 12 ]. Of note, the scores for LIC2 are generally larger than for LIC1, suggesting a higher helical propensity. Following the nomenclature used for LIC1, we designate these LIC2 regions helix 1 (423–439) and helix 2 (468–477).
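Once per-residue SSP scores are available, regions of helical propensity such as those described above can be picked out as runs of consecutive residues with positive scores. The helper below is a hypothetical sketch of that bookkeeping, not part of the SSP program; the threshold, the minimum run length, and the example scores are illustrative assumptions.

```python
def positive_stretches(scores, threshold=0.0, min_len=3):
    """Find runs of consecutive residues whose score exceeds the threshold.

    scores: dict mapping residue number -> SSP score (unassigned residues simply absent).
    Returns a list of (first_residue, last_residue) tuples.
    """
    stretches, run = [], []
    for res in sorted(scores):
        is_positive = scores[res] > threshold
        contiguous = bool(run) and res == run[-1] + 1
        if is_positive and (contiguous or not run):
            run.append(res)
        else:
            # The current run ends here; keep it only if it is long enough.
            if len(run) >= min_len:
                stretches.append((run[0], run[-1]))
            run = [res] if is_positive else []
    if len(run) >= min_len:
        stretches.append((run[0], run[-1]))
    return stretches

# Made-up scores; residue 8 is missing to mimic an unassigned position.
example = {1: -0.3, 2: 0.1, 3: 0.2, 4: 0.15, 5: -0.1,
           6: 0.1, 7: 0.05, 9: 0.2, 10: 0.3, 11: 0.1}
regions = positive_stretches(example)
print(regions)
```

A gap in the assignment breaks a run in this sketch, so stretches that span unassigned residues would need to be merged by inspection.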

Figure 3. Secondary structure propensities for LIC2-C in comparison to LIC1-C. Data are derived from chemical shift values of 1 HN, 15 N, Cα, and Cβ. An SSP score of +1 indicates a fully formed α-helix, while −1 indicates a fully formed β-strand. Top: LIC2-C, regions 423–439 and 468–477 show helical propensity (helix 1 and helix 2) and are separated by more disordered parts. Bottom: LIC1-C, regions 442–453 and 493–502 have been shown to have helical propensity (helix 1 and helix 2) [ 12 ].

In LIC1, helix 1 constitutes the main interaction site for all dynein adaptors tested thus far and appears as a fully formed helix in dynein-dynactin-adaptor complexes in X-ray crystallography and cryo-electron microscopy images [ 23 , 24 ]. On the other hand, helix 2 is a secondary binding site for these adaptors except for Rab interacting lysosomal protein (RILP). Considering the conservation of both helix 1 and 2 in LIC2, it is reasonable to hypothesize that they may have a functional role in LIC2 similar to that observed in LIC1. These findings are in agreement with our previous prediction of the presence of helical content, also based on the high amino acid conservation of these residues between LIC1 and LIC2 [ 12 ]. The results are also in line with our previous pulldown experiments, where we narrowed down the main adaptor interaction site to lie between residues 375 and 450, including L436 and L437.

The similarity of the helices stands in strong contrast to the fact that the phosphorylation sites on LIC1 are very different from those in LIC2 [ 25 ]. The C-terminal segment (LIC-C) is a hotspot for phosphorylation, and there is increasing evidence that LIC-adaptor interactions are spatiotemporally regulated by phosphorylation [ 26 ]. In contrast to the diverse range of kinesin transporters involved in plus-end directed intracellular transport, there is only one dynein available for many different types of transport, engaging many cargo-specific adaptors. Our results suggest that the multifunctionality of the dynein complex is not facilitated by interaction sites located in different regions of LIC1 and LIC2, but more likely by post-translational modifications. The resonance assignment of LIC2, in combination with the previously published assignment of LIC1, serves as a foundation for future investigations into these unknown mechanisms.

Acknowledgments:

The authors thank David Jones (University of Colorado, Denver) for his help and support. We also thank Reto Gassmann, Institute for Research and Innovation in Health (i3S), Porto, Portugal, for fruitful discussions and for providing the plasmid for LIC2-C.

The project was supported by NIH grants R01 GM130694 to BV, 1R21 AI171827 to MAH, and R01 GM143774 to R.P., University of Colorado Cancer Center Support Grant P30 CA046934, and NIH Biomedical Research Support Shared Grant S10 OD025020-01.

Conflicts of Interest: The authors declare they have no conflict of interest.

Data Availability Statement:

IMAGES

  1. NMR spectra and assignment of HuPrP(90–231, M129, Q212P). (A) 1H-15N...

    nmr backbone assignment

  2. CBCANH / HNCACB

    nmr backbone assignment

  3. (PDF) NMR backbone assignment and dynamics of Profilin from Heimdallarchaeota

    nmr backbone assignment

  4. Triple Resonance Backbone Assignment

    nmr backbone assignment

  5. (PDF) RIBRA

    nmr backbone assignment

  6. (PDF) RIBRA

    nmr backbone assignment

VIDEO

  1. Final Day of the Jackal 05.09.13

  2. Final Paper 1: FR

  3. bhic 134 previous year solve paper

  4. NMR Spectroscopy

  5. NMR 28 06 Part 1

  6. ACD/2D NMR Manager Manual Structure Assignment (Przypisanie strukury)

COMMENTS

  1. 6.2: Heteronuclear 3D NMR- Resonance Assignment in Proteins

    Figure VI.2.D An example of combined use of HNCACB and CBCA(CO)NH experiments for the backbone NMR resonance assignment in proteins. Cα and Cα labels are color coded: blue for intra-residual signals and green for preceding carbons (Cα-1, Cβ-1). HNCACB contours are color-coded: black for positive signals (Cα) and red for negative ones (Cβ).

  2. Robust automated backbone triple resonance NMR assignments of ...

    Assignment of resonances of nuclear magnetic resonance (NMR) spectra to specific atoms within a protein remains a labor-intensive and challenging task. Automation of the assignment process...

  3. Triple Resonance Backbone Assignment

    Standard triple resonance backbone assignment of proteins is based on the CBCANNH and CBCA (CO)NNH spectra. The idea is that the CBCANNH correlates each NH group with the Cα and Cβ chemical shifts of its own residue (strongly) and of the residue preceding (weakly).

  4. PDF CcpNmr Analysis Version 3 Backbone Assignment Tutorial

    Backbone Assignment Tutorial 0 Introduction These tutorials are designed to guide you through a sequential triple resonance backbone assignment using Ccpnmr AnalysisAssign Version 3.0, they are not intended to teach any theoretical aspects of NMR. For a practical guide, please visit http://www.protein-nmr.org.uk.

  5. Backbone assignment of perdeuterated proteins by solid-state NMR using

    As a guide, this protocol describes a procedure for the chemical shift assignment of the backbone atoms of proteins in the solid state by 1 H-detected ssNMR. It requires a perdeuterated,...

  6. Backbone-independent NMR resonance assignments of methyl ...

    Preparation of NMR samples, backbone, and methyl assignments. All proteins were overexpressed in M9 medium in 2 H 2 O containing 2 g l −1 2 H 13 C glucose (Sigma #552151) and 1 g l −1 15 NH 4 Cl.

  7. Practical aspects of NMR signal assignment in larger and challenging

    Backbone assignment is often considered completed when all signals in an H-N correlation map have been assigned. For small proteins, with little overlap, the first step in defining spin systems consists in peak picking all H/N signals in an HSQC spectrum. ... Many NMR assignment software packages feature automated routines for performing this ...

  8. Towards automatic protein backbone assignment using proton ...

    Backbone assignment, associating NMR signals to the backbone nuclei of a certain residue, is the first and essential step of every NMR study. For proton-detected experiments, several different backbone assignment strategies were developed depending on the protonation level.

  9. Assignment Theory

    The most simple and straight forward method of backbone resonance assignment involves the use of 15 N, 13 C labelled protein and the measurement of CBCANNH and CBCA (CO)NNH spectra. Large Proteins Large proteins give worse NMR spectra, because they tumble more slowly.

  10. NMR Backbone Assignment of Large Proteins by Using

    A highly sensitive assignment: NMR backbone assignment with a high accuracy of 97.7 % is accomplished by using only HN (CO)CA and HNCA spectra, which show a very high sensitivity even for large proteins.

  11. APSY-NMR for protein backbone assignment in high-throughput structural

    A standard set of three APSY-NMR experiments has been used in daily practice to obtain polypeptide backbone NMR assignments in globular proteins with sizes up to about 150 residues, which had been identified as targets for structure determination by the Joint Center for Structural Genomics (JCSG) under the auspices of the Protein Structure Initiative (PSI). In a representative sample of 30 ...

  12. A Practical Implementation of Cross-spectrum in Protein Backbone

    In protein NMR spectroscopy the backbone resonance assignment is a key step in the characterization of protein structure and dynamics. Numerous efforts have been made to expedite this critical, time-consuming, and labor-intensive task by improvements to the NMR data collection schemes as well as the analysis methods.

  13. Pine

    Pine. PINE is a probailistic method for automated protein backbone and sidechain assignments, detection and correction of referencing and secondary structure determination from input protein sequence and NMR data set peak lists. Expand the "SUBMISSION" tab below for free access to the PINE analysis web-server maintained by NMRFAM.

  14. NMR Structure Determination for Larger Proteins Using Backbone-Only

    The first step in protein structure determination by nuclear magnetic resonance (NMR) is chemical-shift assignment for the backbone atoms. In contrast to the subsequent assignment of the side chains, this process is now rapid, reliable, and largely automated (1-5).Global backbone structural information complementing the local structure information provided by backbone chemical-shift ...

  15. Double Resonance Backbone Assignment

    The 15 N-NOESY-HSQC will show for each NH group all 1 H resonances that are within about 5-7 Å of the NH hydrogen. Assignment is done on the assumption that the two neighbouring NH groups are always visible. Thus two NH groups can be linked because they each have an NOE to the other NH group.

  16. Automated NMR resonance assignments and structure determination using a

    This is typically addressed through first establishing the chemical shift assignments of backbone and sidechain atoms using multiple (6-10) triple-resonance spectra (2, 3), which are then used as...

  17. Backbone-independent NMR resonance assignments of methyl probes in

    To circumvent the need for existing backbone assignments, methyl NOE data and a known structure of the target protein can be used to derive a set of possible assignments, by fitting local NOE networks to methyl distances derived from the three-dimensional (3D) structure.

  18. New Solid-State NMR Method for Protein Backbone Assignment

    C. Shi et al. BSH-CP based 3D solid-state NMR experiments for protein resonance assignment, J Biomol NMR (2014) Epub ahead of print. R. Verel et al. A homonuclear spin-pair filter for solid-state NMR based on adiabatic-passage techniques. Chem Phys Lett (1998) 287:421-428.

  19. Protein NMR Resonance Assignment

    Backbone amide proton (\( {\rm H}^{\rm N} \)) and α proton (\( {\rm H}^{\alpha} \)) signals were sequentially assigned based on the distance information between \( {\rm H}^{\rm N}_{\rm i} \) and \( {\rm H}^{\alpha}_{{\rm i}-1} \), and were aligned on the amino acid sequence of the particular protein.

  20. NMR Backbone Assignment of VIM-2 and Identification of the Active

    The NMR backbone resonance assignment for the metallo-β-lactamase VIM-2 (84%) is disclosed, providing the basis for rational development of a clinically applicable inhibitor, which will be a long sought tool in fighting antibiotic resistance.

  21. Decoding Atomic Addresses: Solution NMR Resonance Assignment of

    The early 1990s marked a significant period in protein NMR assignment as key technologies became readily available for these experiments. ... the striking disparity in sensitivity among the triple-resonance experiments typically used for routine sequential backbone assignment, especially for large molecular weight systems. Despite implementing ...

  22. NMR backbone assignment of the Cε4 domain of immunoglobulin E

    Although the backbone assignments of the Cε2 and Cε3 monomers have been completed using solution state NMR techniques (McDonnell et al. 2001; Borthakur et al. 2012), the backbone assignment of Cε4 has not been published.

  23. Solution NMR Backbone Assignment of the C-Terminal Region of Human

    In this study, we present a near-to-complete backbone NMR chemical shift assignment of the C-terminal region of the light intermediate chain 2 of human dynein 1 (LIC2-C). We perform a comparative analysis of the secondary structure propensity of LIC2-C with the one previously reported for LIC1-C and show that the two transient helices in LIC1 ...
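
The reciprocal-NOE linking rule quoted in entry 15 above (two NH groups are linked when each shows an NOE to the other) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function name `reciprocal_noe_links`, the 0.02 ppm matching tolerance, and the chemical shift values are hypothetical and are not taken from any of the listed sources or from real data.

```python
from itertools import combinations


def reciprocal_noe_links(amide_shift, noe_strips, tol=0.02):
    """Return pairs of NH groups that each show an NOE at the other's amide 1H shift.

    amide_shift: dict mapping an NH group label to its own amide 1H shift (ppm).
    noe_strips:  dict mapping the same labels to the 1H shifts (ppm) of the
                 NOE cross peaks seen in that group's 15N-NOESY-HSQC strip.
    tol:         matching tolerance in ppm (assumed value, tune for real data).
    """

    def sees(a, b):
        # True if the strip of group a contains a peak at the amide shift of group b.
        return any(abs(peak - amide_shift[b]) <= tol for peak in noe_strips[a])

    return [
        (a, b)
        for a, b in combinations(sorted(amide_shift), 2)
        if sees(a, b) and sees(b, a)
    ]


# Hypothetical shifts for three NH groups (not real data).
amide_shift = {"A": 8.10, "B": 7.65, "C": 8.90}
noe_strips = {
    "A": [7.66, 4.30, 1.20],  # peak near B's amide proton
    "B": [8.11, 8.89, 4.10],  # peaks near A's and C's amide protons
    "C": [7.64, 3.95],        # peak near B's amide proton
}

links = reciprocal_noe_links(amide_shift, noe_strips)
print(links)
```

With these made-up shifts, A links to B and B links to C, which suggests the chain A-B-C. The sketch only finds candidate neighbours; it does not determine chain direction or resolve ambiguities caused by overlapping amide shifts.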