
Joint use of Small Angle X-ray Scattering with high resolution methods to study flexible biological macromolecules

Dissertation

for the award of the doctoral degree

at the Faculty of Mathematics, Informatics and Natural Sciences, Department of Chemistry,

Universität Hamburg

submitted by Giancarlo Tria


The work presented here was carried out between August 2010 and February 2014 at the European Molecular Biology Laboratory (Hamburg Outstation c/o DESY) in the group of Dr. Dmitri I. Svergun.

First reviewer: Dr. Dmitri I. Svergun

European Molecular Biology Laboratory (Hamburg Outstation c/o DESY)

Second reviewer: Prof. Dr. Andrew Torda

Centre for Bioinformatics (Department of Chemistry, Universität Hamburg)


To my family

Our Glass Bead Game combines within itself the three principles: science, veneration of the beautiful, and meditation, so that a true bead-game player should be steeped in serenity as a ripe fruit is in its sweet juice, and above all should carry within himself the serenity of music, which is nothing other than courage, a serene step and a smiling dance through the horrors and flames of the world, the festive offering of a sacrifice. (Hermann Hesse)


Acknowledgments

I would like to thank the people who made my PhD time at EMBL Hamburg really worthwhile. There are probably too many people to acknowledge, and I am afraid of forgetting someone. If so, I apologize for the oversight.

First of all, I would like to thank the entire BioSAXS group at EMBL Hamburg for guiding me through the world of Small Angle X-ray Scattering. Interesting discussions and practical feedback, combined with a great working atmosphere, made my PhD experience something really special.

Many thanks to Margret Fischer and to the entire EMBL administration team for their invaluable contribution in making my life easier and for taking care of me like a family member.

I would also like to thank all my colleagues for making my time at EMBL fascinating and fun at the same time. I would especially like to thank Fabio Dell’Antonia, Melissa Gräwert and Al Kikhney, with whom I shared an office, for patiently listening to my philosophical speeches about science and life in general.

My great thanks go to my EMBL colleagues Matthew Dunne, Melissa Gräwert, Cy Jeffries, Haydyn Mertens and Stacey Southall for proofreading my work, including this dissertation.

I would also like to thank my university supervisor Andrew Torda and my Thesis Advisory Committee members Victor Lamzin and Gerard Kleywegt for supporting and supervising my work. Additionally, I would like to thank Pau Bernadó for providing me with all the information necessary to understand the tool EOM, the subject of this dissertation, and for all the constructive discussions about possible improvements. Special thanks go to Dritan Siliqi for introducing me to the world of Macromolecular Crystallography and for the many interesting discussions about potential combinations of Small Angle X-ray Scattering in solution with Macromolecular Crystallography. I would like to thank Dritan Siliqi even more for the valuable advice he provided almost daily about general aspects of science and society.


I would like to take this chance to thank my collaborators Christian Koch (section 2.1), Andrea Ilari (section 2.3), Daniela Schneeberger (section 5.5), Bodo Sander (section 6.1) and Magda Muller (section 6.2) for our successful collaborations.

I acknowledge the EMBL International PhD Programme for providing me with a fellowship covering the entire PhD time. In this regard, I would also like to acknowledge Universität Hamburg and the IUCr for providing me with financial support to attend courses and conferences all around the world.

Last but definitely not least, my greatest acknowledgment goes to Dmitri Svergun for believing in me and giving me the opportunity to be part of the BioSAXS group at EMBL Hamburg. Under his careful supervision I had the chance to embark on a wonderful, interesting and challenging adventure in the field of science.


Abstract

Small-Angle X-ray Scattering (SAXS) in solution is an established technique for the structural characterization of biological macromolecules. SAXS allows one to study proteins, nucleic acids and their complexes in near-physiological environments. Since no crystals are needed, structural responses to changes in experimental conditions such as temperature, pressure and chemical modifications can be addressed. Over the last decades, important improvements in data analysis methods have significantly increased the number of biological questions that can be answered using SAXS. The method is especially powerful in combination with other biophysical and biochemical techniques, and SAXS often plays a major role in complicated structural and functional investigations using hybrid approaches. Recent developments have made solution SAXS very useful for the study of flexible systems such as intrinsically disordered proteins (IDPs) and multi-domain proteins with unstructured regions. For many systems, especially large complexes, SAXS is the only structural technique that directly assesses particle flexibility. A recently developed approach in which flexible systems are represented by an ensemble of conformations rather than a single model (Ensemble Optimization Method (EOM), Bernado et al. 2007) has made it possible to provide quantitative descriptions of macromolecular flexibility in solution. The ensemble approach, implemented in several computer programs, has become widely used by the community, but its use has revealed some limitations. These include a restricted range of applicability to different types of macromolecules (with different flexibility scenarios) and potential over- or under-interpretation of the data. The present dissertation aims to enhance the capabilities offered by SAXS in solution for flexible macromolecules.

The improvements described in this work, mainly devoted to molecular modelling and data analysis, are intended to overcome many of the limitations encountered when SAXS is employed to assess flexibility. The concept of using an ensemble to characterize macromolecular motion is further exploited here and broadened to complicated scenarios so far inaccessible to the method. The implementation of a rapid and effective procedure to generate flexible fragments, with the possibility of accounting for particle symmetry, allows one to build adequate models of multi-subunit particles. This makes SAXS suitable for the quantitative characterization of challenging systems where disorder occurs in combination with other protein features, e.g. point group symmetry. In another development, the ensemble representation has been extended to account for mixtures of different oligomeric states of a protein, making it possible to study polydisperse solutions. Additionally, the problems of potential over- or under-representation of the data, typical of ensemble approaches, were addressed, and two new metrics, Rflex and Rcheck, are introduced. Here, Rflex is used as a measure of flexibility, based on the concept of entropy from information theory, whereas Rcheck is employed as a control parameter to detect potential artifacts. For a more intuitive interpretation, these metrics are complemented by a detailed graphical representation providing a comprehensive description of the behavior of flexible particles in solution. These new capabilities are described extensively in this dissertation, with a particular focus on the methodology and on its application in structural studies that have already profited from the prototype implementation of the method (program EOM 2.0). At a more general level, this dissertation aims to encourage the use of a hybrid approach as the standard modus operandi in structural biology, in which detailed atomic observations, offered by high resolution techniques, are synergistically combined with the overall picture offered by SAXS in solution.


Zusammenfassung (Summary)

Small-angle X-ray scattering (SAXS) is an established biophysical method for the structural characterization of biological macromolecules in solution. It enables the analysis of proteins, nucleic acids and their complexes under near-physiological conditions. Since no crystal is required, structural responses to changes in experimental parameters such as temperature, pressure and chemical modifications can be investigated. Over the last decades, important improvements in data analysis methods have significantly extended the range of biological questions that can be answered with SAXS. Especially in combination with other biophysical and biochemical techniques, SAXS is a particularly powerful method and is therefore regarded as essential for carrying out complicated structural and functional studies using hybrid approaches. Furthermore, SAXS in solution is particularly well suited to the study of flexible systems such as intrinsically disordered proteins (IDPs) and proteins that consist of several domains and contain unstructured regions. For many systems, especially very large complexes, SAXS is the only means of analysing the flexibility of the particle. A recently developed approach, in which the motion of the protein is represented by an ensemble of conformations rather than a single model (Ensemble Optimization Method (EOM), Bernado et al. 2007), yields a quantitative description of macromolecular flexibility in solution. This ensemble approach, implemented in several computer programs, has been widely used by the community of SAXS users. Its use has, however, revealed some limitations.

These limitations include a restricted range of applicability to different types of macromolecules (for scenarios with different flexibility) as well as the potential misinterpretation of the data. The aim of the present dissertation is to broaden the range of applications of SAXS for the study of flexible macromolecules in solution. The improvements presented here, devoted mainly to molecular modelling and data analysis, are intended to help overcome the practical limitations encountered when SAXS is applied to study flexibility. The concept of using an ensemble to characterize macromolecular motion has been further exploited here and extended to more complicated scenarios that were previously inaccessible to this method. The implementation of a fast and effective procedure for generating flexible fragments, with the option of taking the symmetry of the particle into account, makes it possible to reconstruct adequate models of particles with multiple domains. As a result, SAXS can also be applied to the quantitative characterization of challenging systems that exhibit both disorder and other features such as point group symmetry. In another development, the ensemble method has also been extended to mixtures of full-length proteins, thus enabling the analysis of polydisperse solutions. In addition, problems of over- and under-representation, which typically arise with this approach, were tackled, and two new metrics, Rflex and Rcheck, are introduced. Here Rflex, based on the concept of entropy from the field of information and communication technology, is used to measure flexibility, whereas Rcheck is applied as a control parameter for detecting potential artifacts. For a more intuitive interpretation of the results, these metrics are complemented by a detailed graphical representation in order to obtain a comprehensive description of the behavior of flexible particles in solution. These new capabilities are described extensively in this dissertation, with particular attention to the methodology and its application in structure/function studies that have already benefited from the prototype implementation of the method in the program EOM 2.0.

At a more general level, this dissertation is intended to encourage the use of the hybrid approach described as the standard modus operandi in structural biology, so that detailed atomic observations, available through high-resolution structure determination methods, are combined synergistically with the view of the "big picture" available through SAXS in solution.


Table of contents

Acknowledgments

Abstract

Zusammenfassung

Table of contents

List of abbreviations

List of figures

List of tables

1 Introduction

2 SAXS basics

2.1 Scattering by macromolecules in solution and data processing

2.2 Ab-initio shape determination

2.3 Rigid body modeling

2.4 Mixtures

2.5 Flexibility

2.6 Summary

3 Detection of flexibility: a critical approach

3.1 Kratky representation

3.2 Amino-acid number-to-Rg ratio as an indicator of globularity

3.3 Absence of correlation peaks in the P(r) function

3.4 Porod-Debye representation

3.5 Low correlation densities in ab initio reconstruction

3.6 Isolated domains in rigid body modeling approach

3.7 Combined approaches using additional complementary data

3.7.1 Radii of gyration (Rg) versus Hydrodynamic radius (Rh)

3.7.2 Prediction of disorder using bioinformatics

3.8 Conclusion

4 Dealing with flexibility using SAXS: through the literature

4.1 Ensemble Optimization Method (EOM) (Bernado et al. 2007)

4.2 MES – Minimal Ensemble Search (Pelikan, Hura, and Hammel 2009)

4.3 BSS-SAXS (Yang et al. 2010)

4.4 Ensemble Refinement of SAXS (EROS) (Rozycki, Kim, and Hummer 2011)

4.5 ENSEMBLE (Krzeminski et al. 2013)

4.6 Summary

5.1 Smart missing section reconstruction

5.2 Fixing domain positions

5.3 Point group symmetry

5.4 Measure of flexibility: Rflex & Rcheck

5.5 Flexible chromosome size

5.6 Multipools & multicurves fitting

5.7 New graphical data representation

5.8 Conclusions

6 EOM 2.0: first successful applications

6.1 Structural characterization of gephyrin by AFM and SAXS reveals a mixture of compact and extended states

6.2 Small Angle X-Ray Scattering Studies of Mitochondrial Glutaminase C Reveal Extended Flexible Regions, and Link Oligomeric State with Enzyme Activity

6.3 Structural Basis of a Kv7.1 (KCNQ1) Potassium Channel Gating Module: Studies of the Intracellular C-terminal Domain in Complex with Calmodulin

6.4 Summary

7 Conclusions

References

Appendix I – Curriculum Vitae

Appendix II – Scientific publications

Appendix III – Courses

Gefahrstoffe und KMR-Substanzen (Hazardous substances and CMR substances)

Eidesstattliche Versicherung (Statutory declaration)


List of abbreviations

1D One-dimensional

2D Two-dimensional

3D Three-dimensional

Å Angstrom

AA Amino acid

AFM Atomic force microscopy

AI Artificial intelligence

AUC Analytical ultracentrifugation

BSA Bovine serum albumin

CD Circular dichroism

DESY Deutsches Elektronen-Synchrotron

DLS Dynamic light scattering

Dmax Maximum dimension

DR Dummy residue

DTT Dithiothreitol

EM Electron microscopy

EMBL European Molecular Biology Laboratory

EMBL-EBI European Molecular Biology Laboratory – European Bioinformatics Institute

EOM Ensemble Optimization Method

EPR Electron paramagnetic resonance

Fig. Figure

FRET Fluorescence resonance energy transfer

GA Genetic algorithm

HMM Hidden Markov model

IDP Intrinsically disordered protein

IDR Intrinsically disordered region

LED Light emitting diode

mg Milligram

ml Milliliter


MX Macromolecular crystallography

nm Nanometer

NMR Nuclear magnetic resonance

NN Neural network

PDB Protein Data Bank

PDBe Protein Data Bank in Europe

PDBj Protein Data Bank Japan

RDC Residual dipolar coupling

Rg Radius of gyration

Rh Hydrodynamic radius

SANS Small angle neutron scattering

SAS Small angle scattering

SAXS Small angle X-ray scattering

SVM Support Vector Machine

Tab. Table

VDAM Dummy atom model volume

VP Hydrated particle volume


List of figures

Figure 1.1 World map of SAXS beamlines either dedicated to or actively employed for biological solution SAXS. Extracted with permission from (Graewert and Svergun 2013) © Current Opinion in Structural Biology

Figure 1.2 Number of publications referring to biological SAXS in solution over the last decades.

Figure 1.3 SAXS dedicated beamlines at EMBL Hamburg. (top) P12 beamline currently in operation at the 3rd generation ring PETRA III (c/o DESY, Hamburg, Germany). (bottom) Archive picture of the X33 beamline at the 2nd generation ring DORIS (c/o DESY, Hamburg, Germany), shut down in 2012 and replaced by P12. Both pictures are available at the official website of the EMBL BioSAXS group (http://www.embl-hamburg.de/biosaxs/).

Figure 2.1 Schematic representation of a standard SAXS in solution measurement.

Figure 2.2 Small angle X-ray scattering of Bovine Serum Albumin (PDBe id 3V03) measured at X33 (DORIS, EMBL Hamburg). (black) Scattering pattern from a 5 mg/ml solution of BSA in 50 mM HEPES, pH 7.5, (red) solvent scattering and (blue) the difference curve containing the contribution from the protein alone (usually presented on a semi-logarithmic scale). The figure also highlights the information content of the experimental curve, which can be summarized as information about shape at low angles and about secondary elements at high angles.

Figure 2.3 Standard plots for characterization by SAXS. SAXS curves (A) and Guinier plots (B) for BSA samples measured at X33 (DORIS, EMBL Hamburg) in different buffer conditions showing (1) aggregation, (2) good quality data and (3) inter-particle repulsion. The Guinier fits for estimation of Rg and I(0) are displayed, with the linear regions defining smin and smax used for parameter estimation indicated by the thick lines. Extracted with permission from (Mertens and Svergun 2010) © Journal of Structural Biology

Figure 2.4 Ab initio reconstruction of the bifunctional PpoA wild-type enzyme from Aspergillus nidulans (Koch et al. 2013). (A) SDS-PAGE showing that wild-type PpoA is monodisperse with MM ~110 kDa. (B) Overall parameters extracted from the experimental SAXS data. According to the estimated MM (from Porod volume and DAMMIF ab initio with P1 symmetry), the particle assembles as a trimer in solution. (C) Densely packed shape reconstruction computed using DAMMIF/DAMAVER with P3 symmetry imposed as a constraint.

Figure 2.5 Ab initio GASBOR reconstruction (grey) from a 5 mg/ml solution of Ubiquitin (PDBe code 3EHV (blue)) in 20 mM MES, pH 6.0, measured at P12 (PETRA III, EMBL Hamburg). A superimposition (tool SUPCOMB) of the MX atomic model and the SAXS ab initio model is also shown.

Figure 2.6 Schematic representation of a trial-and-error approach where each generated model is compared against the experimental curve in order to minimize the discrepancy (χ) of the final solution.

Figure 2.7 SeZnuA-SeZinT complex interaction as derived from rigid body modeling using SAXS in solution (Ilari et al. 2013). Atomic structures of both SeZnuA (red, PDBe code 2XQV) and SeZinT (black, PDBe code 4ARH) were used and assembled using MASSHA into a rather globular complex fully compatible with the ab initio shape computed from experimental data using DAMMIF/DAMAVER (grey). A missing loop (118-141) in SeZnuA was then modelled using BUNCH (blue). The figure also shows the quality of the final solution (χ=1.148) as well as the linearity of the Guinier region and the derived pair-distance distribution function P(r).


Figure 2.9 Polydisperse 6 mg/ml solution of Bovine Serum Albumin (BSA, PDBe id 3V03) in 50 mM HEPES, pH 7.5, measured at X33 (DORIS, EMBL Hamburg).

Figure 2.10 Schematic ensemble representation of a flexible two-domain (yellow) protein with an interdomain disordered region (blue). Each conformation generates an individual scattering profile Ik(s) that contributes to the final scattering I(s).

Figure 2.11 Schematic representation of the applications of SAXS in solution to macromolecules.

Figure 3.1 Snapshot of the first page of the manuscript Breaking the protein rules (Chouard 2011).

Figure 3.2 Schematic representation of protein dynamics according to time-scale. The dashed red line identifies the range of motions detectable using SAXS in solution.

Figure 3.3 Comparison between Log plot (A, top-left) and Kratky plot (B, top-right) for three coarse-grained degrees of compactness: folded particle (light blue), multi-domain particle with flexible inter-domain linker (green) and completely unfolded particle (red).

Figure 3.4 Graphical representation of the dependence of the theoretical Rg on the number of residues in the chain according to Flory’s equation.

Figure 3.5 Scattering profiles, Kratky plots and P(r) functions for dynamic (a, b, c) and static (d, e, f) scenarios simulated for the four systems 2- (red), 3- (green), 4- (blue), and 5-ubiquitin (pink), respectively. Adapted with permission from (Bernado 2010) © European Biophysics Journal

Figure 3.6 Kratky plot and Porod-Debye plot for the intrinsically disordered region Rad51 AP1 (red) and the construct MBP-Rad51 AP1 (MBP = E. coli maltose binding protein) (black). (A) The Kratky plot displays the classical well-defined bell shape typical of a partially folded particle (black) and the monotonic function typical of the IDP (red). (B) The Porod-Debye plot displays the so-called Porod plateau for the partially folded MBP-Rad51 AP1, whereas for the IDP Rad51 AP1 a monotonic trend is displayed (no plateau) (red). Please note that in the figure the scattering vector s is represented using the letter q. Extracted with permission from (Rambo and Tainer 2011) © 2011 Wiley Periodicals, Inc.

Figure 3.7 Ten independent ab initio shape determinations (from top-left to bottom-right; computed using DAMMIF and superimposed using DAMAVER) for TAU protein measured at the X33 beamline (DORIS, EMBL Hamburg) at a concentration of 2 mg/ml. The Kratky plot in the inset suggests the presence of disorder, confirmed by the non-superimposable DAMMIF reconstructions.

Figure 3.8 Graphical representation of a hypothetical pipeline for the critical assessment of flexibility based on SAXS data. Each step of the pipeline represents an indicator for the presence of a disordered or flexible state. In order to avoid potential over-fitting due to an inappropriate use of the ensemble representation, the assessment of flexibility should follow a step-by-step approach (the arrow), making the evidence of disorder stronger and stronger as multiple indicators are used. Moreover, external information (bioinformatics prediction, shape factor) can be employed to discriminate ambiguous cases.

Figure 4.1 Graphical representation of the challenge associated with the interpretation of the scattering profile from a flexible system. Starting from an ensemble of models known a priori, the theoretical scattering is easily computed. The inverse operation, deconvolution of the experimental scattering into single model scattering, is far from trivial.

Figure 4.2 Schematic representation of the EOM strategy for the analysis of SAXS data from flexible systems. M conformations/curves/genes are generated as the initial pool (left part, red) and used to create C chromosomes/ensembles used to feed the genetic algorithm. Through processes of mutation, crossing and elitism the GA runs for G generations, providing a final ensemble solution composed of N models/curves that fit the experimental profile. The complete process is repeated R independent times. The Rg distribution of the selected (N × R) conformations (right part, blue) is compared with that derived from the pool, which represents the complete conformational freedom scenario for the protein under observation. From this comparison it is possible to infer a quantitative structural estimation of the protein motion in solution. Extracted with permission from (Bernado and Svergun 2012) © Molecular BioSystems

Figure 4.3 (a) Domain structure of HMGB1 with the structured domains, A box (red) and B box (blue), the basic linkers (yellow) and acidic tail (green) highlighted. (b) Schematic representations of a hypothetical compact (tail-bound) form of HMGB1 in which the DNA-binding faces are occluded, in equilibrium with an open (tail-free) form of the protein. (c) Comparison of EOM Rg distributions for the selected ensembles (red) and the initial pool of structures (grey). Adapted with permission from (Stott et al. 2010) © Journal of Molecular Biology

Figure 4.4 (A) SAXS characterization of PCNA-Ub complexes in solution formed by either split-fusion (green) or chemical cross-linking (blue). The high χ2 values for both suggest arrangements of the particles in solution that differ from that observed in the crystal structure 3L10.pdb (χ2 = 23.8 and χ2 = 6.8 for split-fusion and cross-linking, respectively). (B) Guinier regions indicating the good quality of the experimental SAXS data collected. (C) Ab initio shape reconstructions calculated from experimental scattering curves of split-fusion or cross-linked PCNA–Ub. Extracted from (Tsutakawa, S. E. et al., 2011) © Proc. Nat. Acad. Sci., USA, 2011.

Figure 4.5 (A) Schematic representation of MES methodology. One hundred thirty ubiquitins per PCNA homotrimer were placed at the crystallographic (x) position, the MD-identified positions (a,b,c), or the not refined run of MES (f). (B) The scattering curve of the best MES ensemble fits the experimental scattering data better than the crystal structure (3L10). (C) Real space representation for the P(r) plots showing the good fit of the MES ensemble to the experimental data. (D) The ensemble of three models that best fit the experimental scattering curve is shown as ribbon models. Relative populations of each position in x, a/b/c, or f show that ubiquitin adopts both the crystallographic and the computationally determined discrete positions, indicating the presence of flexibility in solution. Extracted from (Tsutakawa, S. E. et al., 2011) © Proc. Nat. Acad. Sci., USA, 2011.

Figure 4.6 Representative assembly conformational states, ranging in architecture from fully (1) to partially assembled (5) to disassembled states (9), used to represent the motion of the multi-domain Hck kinase. The catalytic domain is coloured in blue, SH2 in green, and SH3 in yellow. Extracted from (Yang et al. 2010) © Proc. Nat. Acad. Sci., USA, 2011.

Figure 4.7 (A) Discrepancy χ2 between experimental and simulated SAXS data at low (blue) and high (red) salt concentration, respectively, versus relative entropy S. The entropy threshold S0 below which the simulated data are considered over-fitted is indicated with a black dot. (B) Optimized weights wk at the entropy threshold S0 versus initial weights wk(0). The black dashed line indicates wk = wk(0), whereas the dashed gray lines indicate a cluster free energy change of kT log(wk/wk(0)) = ±2 kT. (C-D) Discrepancy χ2 as a function of the cumulative weight of the clusters at low and high salt, respectively. Extracted with permission from (Rozycki, Kim, and Hummer 2011) © Structure

Figure 4.8 Fractional contact plots and cluster analysis of final ensembles. (A) Fractional

fraction of conformers with heavy atom distances shorter than 6 Å. Positions of phosphorylation sites (Thr-5, Thr-33, Thr-45, Ser-69, Ser-76, Ser-80) are indicated by solid black lines and contacts involving CPDs are marked by ovals. Positions of spin labels are indicated by red dashed lines. (B, C) The conformers from the three combined final ensembles for (B) Sic1 and (C) pSic1 are partitioned into 8 and 5 clusters, respectively, based on Cα-RMSDs. Fractional contact plots and one representative conformer from each cluster are depicted for ascending Rg (mean Rg for the cluster presented). Conformers are shown as rainbow-colored cartoons from blue to red from N- to C-terminus. Extracted with permission from (Mittag et al. 2010) © Structure

Figure 5.1 Bond angle distributions for random (red) and native (blue) conformations used in EOM 2.0 as constraints for the modeling of the missing sections.

Figure 5.2 Distribution of Cα backbone angles and dihedrals for native (left) and random (right) modelling.

Figure 5.3 Minimum number of rejects necessary to generate a chain (using threshold Cα-Cα distance, bond angles and dihedral angles as constraints) versus the length of the chain in number of residues for both scenarios: random (red) and native (blue).

Figure 5.4 Example of representative coil generation in random (red) as well as native (blue) mode. Random mode tends to generate completely unfolded coils, whereas native mode produces chains with a higher probability of having structure-like sections.

Figure 5.5 Comparison of the scaling factor parameter v (Flory’s relationship) extracted from the models generated using EOM 2.0 with the values obtained theoretically (LeGuillou & Zinn-Justin, 1977) as well as experimentally (Kohn et al., 2004; Bernadò & Svergun, 2012) for random (red section) and native (blue section) conformations, respectively.


Figure 5.6 Comparison between the pool generated using EOM 2.0 (represented as box-and-whisker plots) and the theoretical Rg expectations based on Flory’s relationship using the parameters R0 = 1.927 and v = 0.598 in the case of chemically denatured proteins (Kohn et al. 2004) (red), and R0 = 2.54 and v = 0.522 for intrinsically disordered proteins (Bernado and Svergun 2012) (blue).

Figure 5.7 Distributions of end-to-end distances of pools containing 10000 models for chains of 100 and 500 amino acids, compared with normal distributions with the same mean and standard deviation values. (Tria et al., in preparation)

Figure 5.8 (i) Different views of a multi-domain protein, solved by MX, with 2 subunits (A, grey; B, yellow) connected by a missing loop 30 AA long (red spheres, left area). (ii) Multiple inter-domain linker reconstructions (loops of multiple colors) computed with EOM (upper-right area). (iii) Different view of multiple inter-domain loop reconstructions computed with EOM 2.0 using the option to keep each domain fixed at its original 3D coordinates (bottom-right area).

Figure 5.9 Case study on a flexible multi-domain particle with a long inter-domain region (122 AA) and an N-terminal tail (31 AA), both disordered. The domains are available as crystal structures showing different oligomerization arrangements: pentamer (N-terminal domain) and monomer (C-terminal domain). The full length protein is observed in solution as a pentamer.

Figure 5.10 Pool reconstruction performed using EOM 2.0 for the case study in Fig. 5.9. Transparency is used to show multiple representative conformations present in the pool. (A) Flexible pentameric particle modelled by extending the symmetry present in the pentameric core to the entire particle. (B) Flexible pentameric particle modelled as asymmetric, with symmetry present only in the core.

Figure 5.11 Graphical representation of the strategy used by EOM 2.0 to generate

study in Fig. 5.9 with the oligomerization interface (β-sheet in the dashed red box) and a Cα-Cα atoms distance (10Å). (B) Symmetry generation strategy performed moving the domain at the center of the coordinates in order to apply symmetry operations according to the oligomerization interface defined. ... 98

Figure 5.12 Selection of symmetric models generated using the same interface (as in Fig. 5.11 (A)) but different Cα-Cα distances (20 Å vs. 10 Å). Smaller distances (lower dashed box) result in limited differences in the oligomerization, such that the effect is similar to that of an MD approach. ... 99

Figure 5.13 Graphical representation of some case studies involving very complicated particles that can now be modelled with an ensemble approach using EOM 2.0. (A) Multi-domain particle with p62 symmetry reconstructed through a user-defined oligomerization interface. (B) Large virus particle (~2 MDa) composed of 2 domains (available at atomic resolution) connected by a long disordered region. ... 100

Figure 5.14 Comparison of the entropy calculated for four different distributions over the

same interval of values. The distributions are: Gaussian (black), Gaussian with low variance (orange), Uniform (cyan), and Single interval (red). ... 102

Figure 5.15 Comparison of the flexibility calculated using the metric Rflex for two

theoretical distributions (Gaussian, black; bimodal, red) computed over the same interval of values. ... 103

Figure 5.16 Measures of flexibility. (A) Pool (black), EOM(1) (purple), EOM(2)

(orange), EOM(3) (pink), and EOM(4) (dark green) represent potential output of EOM 2.0 in terms of Rg distributions whereas Uniform (red), Compact (lightblue), Bimodal (cyan) represent theoretical cases of maximal flexibility, rigidity and potential artifact, respectively. (B) Rflex measured over all the distributions in A. (C) All Rflex values compared to the value calculated for the pool (~89%, threshold to define randomness) and associated to RCheck in order to detect potential artifacts. ... 105

Figure 5.17 Graphical representation of the workflow in the genetic algorithm of EOM.

Starting from a pool of curves (genes), ensembles (chromosomes) are first composed and then mutated and then crossed. At each generation, the best m ensembles are selected and used for the next generation where the process of selection, mutation and crossing is repeated. The ensemble that better resists (minimizes discrepancy) through all the generations is presented as final solution. ... 107

Figure 5.18 Global picture of a SAXS analysis for a flexible mutated Collybistin (Soykan et al., submitted). The particle is composed of two domains, SH3 and DH/PH, connected by an inter-domain disordered region (33 AA). Further disordered regions are also present: an N-terminal tail (8 AA) and a C-terminal tail (20 AA). (A) SAXS overall parameter analysis and ensemble fits computed using a predefined (=20) ensemble size (EOM) and a flexible ensemble size (EOM 2.0). (B) Ensemble distribution results with the measure of flexibility according to the metrics Rflex (~90% for both) and Rcheck (~1.2 and ~1.1). (C) Composition of the optimized ensemble solution provided by EOM 2.0. (D) Graphical representation of the flexibility with the N-terminal SH3 domain (several colours) in different conformations and the C-terminal DH/PH (dark blue) shown as cartoon. Disordered regions are shown in transparency. ... 109

Figure 5.19 Dependence of the relative error of Rg determination by EOM 2.0 on level

of noise. ... 110

Figure 5.20 EOM 2.0 analysis for a user project at DORIS X33 (EMBL Hamburg).

(A-B) Graphical description of the particle as in Fig. 5.9. (C) Representation of the different observed measuring the same particle at different ionic strengths (low and high) using the classical distribution comparison as well as the distribution descriptors and the metrics Rflex and Rcheck introduced above... 112

Figure 6.1 SAXS ensemble analysis on gephyrin. (A) Domain architecture of gephyrin

domain (332-750) present as dimer. A predicted disordered inter-domain linker is also present (182-332). (B) EOM 2.0 ensemble solution scattering compared with the experimental curve (χ=1.00). (C) EOM 2.0 Rg distribution of the initial random pool (red dashed line) and the selected ensemble (different grey shades) for full-length gephyrin. The broad distribution mirrors the conformational heterogeneity of the sample; however, there is a slight preference for compact states as indicated by the larger area under the curve for this fraction when compared with intermediate and extended conformers. (D) Graphical representation of the trimeric models composing the final ensemble for gephyrin (coloured differently). (E) Representation of the proposed hexagonal lattice relying on the simultaneous use of GephG trimers (G3; blue) and GephE dimers (E2; one protomer in red and the other in salmon). This assembly, which could be continued multiple times, is thought to build a bridge between bound glycine or GABAA receptors and the cytoskeleton. ... 117

Figure 6.2 (A) Different views of the full length enzyme (GAC wild type construct) with the N-terminus loop (green) and the C-terminus loop (cyan) highlighted. The cartoon representation is used to highlight regions observed at atomic resolution. (B) Different views of the dimeric GAC as observed in the crystal structure (PDB id 3ss3) with the linkers reconstructed by EOM 2.0. (C) Tentative tetrameric GACWT arrangement. The rigid core was modelled using the program SASREFMX (ATSAS package) by combining the atomic resolution structure (PDB id 3ss3) with the GACWT concentration series data. The disordered linkers were finally reconstructed using EOM 2.0. (D) Tentative octameric GACWT arrangement. The core was initially generated by EOM 2.0 using the dimeric atomic resolution structure (PDB id 3ss3) and then refined using MASSHA (ATSAS package). The disordered linkers were finally reconstructed using EOM 2.0. (E) Concentration screen analysis for the GACWT and GACc constructs. Rg distribution corresponding to the pool (composed of dimers, tetramers and octamers, dashed green) compared to the Rg distributions (several other colours) extracted at different solution concentrations by EOM 2.0. ... 120

Figure 6.3 Structural solution studies of the CT/CaM complex. (A) depicts a tentative

molecular model for the CT/CaM complex and includes: the proximal CT/CaM model as based on the crystal structure (CT grey, CaM red), a model for the helix C coiled coil module (labeled), and the crystal structure of the helix D module (labeled). C4 molecular symmetry has been imposed. (B) Shows the experimental SAXS data and the calculated p(r) function in the inset. The blue solid line denotes scattering from the tentative model in panel A, while the red solid line denotes intensity calculated for the EOM 2.0 selected model seen in panel D, against the experimental data. (C) Ab-initio shape of the CT/CaM complex calculated and averaged using the programs DAMMIF/DAMAVER. (D) EOM 2.0 selected model with the proximal CT/CaM modules and the helix D module both twisted with respect to helix C. (E) Depicts a superposition between the EOM 2.0 model and the ab initio DAMMIF model (aligned by SUPCOMB). ... 123

List of tables

Table 3.1 Schematic representation of the NSD values between all the pairs of models

present in Fig. 3.7. The averaged <NSD> value tending to 1 suggests the low shape stability of the sample. ... 61

Table 3.2 Shape factors to identify the anisometry of the particle once Rg and Rh are

known. ... 64

Table 3.3 List of servers for bioinformatics prediction of disorder. The list is present on

the webpage (http://www.idpbynmr.eu/home/science/research-tools.html) of the Marie Curie project IDPbyNMR. ... 66

1 Introduction

After being originally introduced in the field of material science, small-angle scattering (SAS) rapidly found a fertile environment in structural biology. The first SAS experiments on biological samples date back to the 1950s and were performed on easily purified proteins such as haemoglobin and ovalbumin (Ritland, Kaesberg, and Beeman 1950). At that time, the interpretation of SAS data was restricted to the determination of simple parameters such as the radius of gyration (Schmidt 1962). Subsequently, the growing interest in the field, together with the increase in available computational power, made possible the development of increasingly accurate analysis techniques for the extraction of shape-related information (Feigin and Svergun 1987). A further remarkable step forward has been taken in the last two decades with the advances in high-flux synchrotron beamlines and neutron facilities dedicated to biological investigations. In addition, modern photon-counting detectors, novel collection strategies and automated sample chambers have significantly improved the quality of SAS measurements and reduced the time for data collection from days to milliseconds (Hura et al. 2009, Round et al. 2008, Teixeira et al. 2008, Toft et al. 2008). These advances have turned small-angle scattering of X-rays (SAXS) and neutrons (SANS) into powerful and robust techniques for the structural characterization of biological macromolecules. At a resolution of a few nm, both SAXS and SANS allow the study of macromolecules in environmental conditions ranging from almost native-like to highly denatured, and from a few kilodaltons to gigadaltons in molecular mass (Svergun et al. 2013). This makes both techniques particularly suitable for the detection of conformational rearrangements upon changes in the environment, as the experiments are usually performed on particles dissolved in close to physiological solutions and no crystals are needed. Although conceptually similar, SAXS and SANS differ substantially in some basic details. In SAXS, X-rays are scattered by the electron clouds surrounding each atom. Accordingly, the scattering is proportional to the electron density of each atom within the molecule. Neutrons, on the other hand, are neutral particles with no associated electric field. As such, they penetrate the electron cloud and are scattered by the atomic nuclei. These differences may lead to different information content in the experimental data, such that SAXS and SANS are often viewed as complementary techniques. Furthermore, SAXS is more widespread and popular thanks to the broader availability of experimental facilities (laboratory X-ray cameras and synchrotron beamlines, see Fig. 1.1) as compared to neutron facilities. Although some of the concepts are shared by the two approaches, this dissertation is entirely devoted to SAXS, as the described methodologies still have to be adapted for SANS.

Figure 1.1 World map of SAXS beamlines either dedicated or actively employed for biological

solution SAXS. Extracted with permission from (Graewert and Svergun 2013) © Current Opinion in Structural Biology

The year-by-year increase in the popularity of solution SAXS is also due to the continuous development of publicly available computer routines and user-friendly software for data collection and interpretation. This has allowed the technique to spread throughout the biological community, and a large number of biologists are now attracted by its potential. The upward trend in the number of publications referring to biological SAXS over the last decade clearly reflects this (Fig. 1.2). In that respect, the program package ATSAS (Petoukhov et al. 2012, Konarev et al. 2006) appears to be the most widely used public SAXS software worldwide. The ATSAS package is a comprehensive suite of tools for data manipulation and interpretation, developed and maintained by the BioSAXS group at EMBL Hamburg (European Molecular Biology Laboratory, Hamburg Outstation), an internationally recognized centre of excellence in the field of solution SAXS on macromolecules with an advanced high-brilliance synchrotron beamline dedicated to SAXS (Fig. 1.3).

Figure 1.2 Number of publications referring to biological SAXS in solution over the last decades.

As the number of questions that can now be answered with SAXS in solution is growing rapidly, its application to biology has already opened, and continues to open, possibilities that were unimaginable until a few years ago. In particular, SAXS is now employed for the analysis of flexible systems, including multi-domain proteins with flexible linkers and further extremely challenging objects like intrinsically disordered proteins (IDPs). IDPs are highly unlikely to crystallize, such that very little is known about their structural organization. As will be extensively discussed later, the new possibilities offered by SAXS have helped to make obsolete the "closed world" view of disordered regions, in which the lack of structural knowledge has often led them to be labelled as functionally irrelevant. Moreover, the importance of SAXS as a technique for the study of potentially flexible proteins is also demonstrated by the attention that solution SAXS has received in recent European research programs for structural biology such as WeNMR¹ and IDPbyNMR². The aim of these projects is to characterize flexible particles through the hybrid use of complementary techniques including Nuclear Magnetic Resonance (NMR), SAXS, and bioinformatics.

Figure 1.3 SAXS dedicated beamlines at EMBL Hamburg. (top) EMBL SAXS beamline P12, currently in operation at the 3rd generation ring PETRA III (c/o DESY, Hamburg, Germany). (bottom) Archive picture of the EMBL SAXS beamline X33 at the 2nd generation ring DORIS (c/o DESY, Hamburg, Germany), shut down in 2012 (end of operation, October 22nd 2012) and replaced by P12. Both pictures are available at the official website of the EMBL BioSAXS group (http://www.embl-hamburg.de/biosaxs/).

1 WeNMR is an e-Infrastructure project funded under the 7th framework of the EU. Contract no. 261572
2 IDPbyNMR is a Marie Curie activity funded under the 7th framework of the EU. Contract no. 264257

The major challenge in studying flexible particles lies in the fact that they cannot be represented by a single model. Until recently, in fact, it was only possible to distinguish qualitatively between globular and unfolded states of macromolecules from the SAXS data. The situation has changed significantly with the advent of methods utilizing an ensemble approach. Here, flexibility is described through a set of models (different conformations of the same particle), such that quantitative information about motion can be extracted from the scattering data. Several tools have been developed for SAXS ensemble analysis, and the results of these analyses are often complemented by other methods providing structural information on such systems (the above-mentioned NMR, but also Macromolecular Crystallography (MX), circular dichroism (CD), calorimetry, structure prediction, etc.). This topic will be extensively discussed, in theory and practice, in the present dissertation. The growing interest in this direction of research is also confirmed by a recently established databank dedicated to disordered proteins, DisProt (Sickmeier et al. 2007), in which fully and partially disordered proteins can be deposited in a manner similar to PDBe. It currently holds nearly 700 proteins with more than 1500 characterized disordered regions (January 2014).

With the extensive use of tools for the characterization of flexibility, several limitations restricting the ensemble analysis of some systems have become evident. Mainly because of the challenge of modelling, the tools currently available cannot characterize complicated scenarios where disorder is combined with other structural features (e.g., point group symmetry). The present work is devoted to overcoming these limitations, in response to strong demand from the biological community.

In this dissertation, an enhanced ensemble approach that makes SAXS in solution suitable for the detailed characterization of complicated flexible systems will be presented. The work is structured as follows: first, an introduction to the theoretical and experimental basics of SAXS in solution is given (Chapter 2). This introduction is necessary in order to understand the power of the technique as well as the new concepts developed later in the dissertation. Next, an overview of the crucial aspects of the detection of flexibility based on SAXS data is given (Chapter 3), followed by some practical applications from the literature where ensemble analyses have been employed (Chapter 4). The aim here is to highlight the capabilities of an ensemble approach. New developments will then be described in detail (Chapter 5), followed by a description of practical cases that benefited greatly from the enhanced ensemble approach (Chapter 6). Final remarks and conclusions are given at the end of the dissertation (Chapter 7).

All the work presented in this dissertation was performed at the BioSAXS group (EMBL Hamburg) under the supervision of: (i) Dr. Dmitri Svergun, head of the BioSAXS group and direct supervisor of the work, (ii) Prof. Dr. Andrew Torda (Zentrum für Bioinformatik (ZBH), University of Hamburg) as an academic supervisor, (iii) Dr. Victor Lamzin (EMBL Hamburg) and (iv) Dr. Gerard Kleywegt (EMBL-EBI), additional members of an EMBL Thesis Advisory Committee established according to the rules of the EMBL International PhD Programme.

2 SAXS basics

Small-angle X-ray scattering (SAXS) is nowadays an essential biophysical tool for the structural investigation of macromolecules in solution. At low (nanometer-scale) resolution it allows one to study macromolecules in terms of shape, conformation, oligomerization and folding state (structural disorder vs. compactness). There are no size limitations, so protein-protein interactions as well as protein-nucleic acid complexes can also be investigated. Since no crystals are needed, macromolecules can be assessed in near-physiological conditions, and conformational changes can be studied in response to changes in the environment (i.e., of the solution). Although discovered as early as the late 1930s, SAXS on macromolecules became very popular only in the last decades, thanks to the remarkable progress in data collection and analysis. The increasing availability of high-flux third-generation synchrotron radiation sources as well as in-house cameras (see Introduction) allows one to retrieve rich structural information, especially when SAXS is combined with other structural techniques. Thanks also to the progress in computational methods over the last years, SAXS in solution has become a powerful technique complementary to higher-resolution methods such as Macromolecular Crystallography (MX) and Nuclear Magnetic Resonance (NMR). This progress has prompted the development of hybrid approaches for the characterization of macromolecules. A number of studies have also shown the high synergy of SAXS in solution with biophysical techniques such as Fluorescence Resonance Energy Transfer (FRET), Circular Dichroism (CD), Analytical Ultracentrifugation (AUC) and Electron Microscopy (EM), to mention only a few.

In this chapter the basics of SAXS in solution will be presented with particular focus on what kind of information can be extracted when SAXS in solution is applied to biological macromolecules.

2.1 Scattering by macromolecules in solution and data processing

Small-angle X-ray scattering is based on the elastic scattering of photons by the electrons within macromolecules. When an object is irradiated by a monochromatic beam of wavelength λ, it generates a secondary outgoing wave (Vachette and Svergun 2000). In the case of SAXS on macromolecules in solution, the signal arises from the excess scattering length density Δρ(r) = ρ(r) – ρs, i.e., the difference between the scattering length density at position r within the particle (ρ(r)) and that of the solvent (ρs) in which the particle is dissolved (Feigin and Svergun 1987). The scattering amplitude of the macromolecule is the Fourier transform of the excess scattering length density:

$$A(\mathbf{s}) = \int_V \Delta\rho(\mathbf{r}) \exp(i\,\mathbf{s}\cdot\mathbf{r})\, d\mathbf{r}$$

(Equation 2.1)

where the integration is performed over the particle volume. In solution the particles are usually randomly distributed and the intensity of the entire ensemble of monodisperse particles is a continuous isotropic function proportional to the scattering from a single particle averaged over all orientations (Ω) (Svergun and Koch 2003):

$$I(s) = \left\langle A(\mathbf{s})\, A^{*}(\mathbf{s}) \right\rangle_{\Omega}$$

(Equation 2.2)

where the momentum transfer s = k1 – k0 (no energy transfer) is a function of the angle between the monochromatic incident beam k0 and the scattered beam k1, namely scattering angle (2θ) (s = 4πsin(θ)/λ) (Fig. 2.1). The maximum s value where scattering intensities can be recorded determines the nominal resolution of the experiment as d = 2π/smax (Koch, Vachette, and Svergun 2003) (note that the momentum transfer s is also denoted in the literature as q). The scattered photons are collected as a 2D image on a detector and presented as a radially averaged 1D curve I(s) (often in semi-logarithmic scale). In a classical SAXS measurement on a macromolecule the intensity curve I(s) for the particle as well as for the solvent are collected. The difference between the two curves represents the SAXS pattern for the particle in solution (Fig. 2.2).
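The relations quoted in this paragraph can be illustrated with a minimal sketch. The function names, the wavelength and the intensity values below are hypothetical and chosen only for illustration; the sketch assumes that sample and solvent curves are recorded on the same s grid.

```python
import math

def momentum_transfer(two_theta_deg, wavelength):
    """Modulus of the momentum transfer, s = 4*pi*sin(theta)/lambda.

    two_theta_deg is the full scattering angle 2*theta in degrees;
    s is returned in the inverse of the wavelength unit.
    """
    theta = math.radians(two_theta_deg) / 2.0
    return 4.0 * math.pi * math.sin(theta) / wavelength

def nominal_resolution(s_max):
    """Nominal resolution d = 2*pi/s_max."""
    return 2.0 * math.pi / s_max

def subtract_solvent(sample_curve, solvent_curve):
    """Point-by-point difference curve (identical s grid assumed)."""
    return [i_sample - i_solvent
            for i_sample, i_solvent in zip(sample_curve, solvent_curve)]

# Hypothetical numbers, for illustration only.
wavelength_nm = 0.15
s_max = momentum_transfer(5.0, wavelength_nm)    # in nm^-1
resolution_nm = nominal_resolution(s_max)        # in nm
difference = subtract_solvent([12.0, 7.5, 3.1], [2.0, 1.5, 1.1])
```

With these assumed values, a maximum scattering angle of 5° corresponds to a nominal resolution slightly below 2 nm, in line with the nanometre-scale resolution discussed above.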

Figure 2.1 Schematic representation of a standard SAXS in solution measurement.

Figure 2.2 Small angle X-ray scattering of Bovine Serum Albumin (PDBe id 3V03) measured at X33 (DORIS, EMBL Hamburg). (black) Scattering pattern from a 5 mg/ml solution of BSA in 50 mM HEPES, pH 7.5, (red) solvent scattering and (blue) the difference curve containing the contribution from the protein alone (usually presented on a semi-logarithmic scale). The figure also highlights the information content of the experimental curve, which can be summarized as information about shape at low angles and about secondary elements at high angles.

The experimental scattering is a product of the scattering from the particle (called form factor) and a term arising from inter-particle interactions (called structure factor) (Svergun and Koch 2003). For a successful SAXS measurement, the sample must be monodisperse (≥ 90%) and the interactions must be minimized as far as possible. If inter-particle interactions are observed, scattering patterns from several sample dilutions are usually recorded in order to extrapolate the scattering to infinite dilution (ideal solution). After this procedure, the scattering of the entire system is proportional to the scattering of a single particle averaged over all orientations (Eq. 2.2).
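The extrapolation to infinite dilution can be sketched as follows. This is only one common way of doing it, assumed here for illustration: at every s value the concentration-normalized intensity I(s)/c is fitted linearly against c and the intercept at c = 0 is taken. The function names and the synthetic dilution series are hypothetical.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = slope*x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def extrapolate_to_zero_concentration(concentrations, curves):
    """curves[k][j] is I(s_j) measured at concentrations[k].

    Returns I(s_j)/c extrapolated to c -> 0 for every point j.
    """
    extrapolated = []
    for j in range(len(curves[0])):
        normalized = [curves[k][j] / concentrations[k]
                      for k in range(len(concentrations))]
        _, intercept = linear_fit(concentrations, normalized)
        extrapolated.append(intercept)
    return extrapolated

# Synthetic dilution series: an "ideal" curve damped linearly with
# concentration, mimicking a repulsive inter-particle interaction.
ideal = [10.0, 5.0, 1.0]
concs = [1.0, 2.0, 4.0]
series = [[c * i * (1.0 - 0.05 * c) for i in ideal] for c in concs]
extrapolated = extrapolate_to_zero_concentration(concs, series)
```

For real data the concentration dependence is only approximately linear and is most pronounced at the lowest angles, so the result should always be inspected rather than trusted blindly.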

Contrary to what is observed in crystallography, where macromolecules are packed and regularly positioned with correlated orientations, during a SAXS measurement no reflections are observed and the information about particle orientation is lost. However, information about the macromolecule is still present in its scattering pattern, and low-resolution structural parameters can be estimated directly from the experimental data, without any model assumptions. The interatomic distance distribution p(r), also called pair-wise distance distribution, is obtained as an inverse Fourier transformation of the scattering pattern I(s):

$$p(r) = \frac{r^2}{2\pi^2} \int_0^{\infty} s^2 I(s)\, \frac{\sin(sr)}{sr}\, ds$$

(Equation 2.3)

and describes the frequency with which an interatomic distance r occurs within the particle, in real space (Feigin and Svergun 1987). The value of r beyond which p(r) is equal to zero is the maximum particle diameter (Dmax) and can be estimated iteratively using programs such as GNOM (Semenyuk and Svergun 1991) and ITP (Glatter 1977). Accordingly,

$$I(s) = 4\pi \int_0^{D_{max}} p(r)\, \frac{\sin(sr)}{sr}\, dr$$

(Equation 2.4)

is the representation of the distance distribution function back in reciprocal space. Estimation of the maximum distance is a non-trivial task, helped by the fact that the p(r) function must tend smoothly to zero when approaching Dmax (p’(Dmax) = 0) and that negative values are, at least for protein solutions, not allowed. In addition to Dmax, further parameters can be derived from the raw data, including the radius of gyration (Rg), molecular mass (MM), and hydrated particle volume (Vp).
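As a numerical illustration of Eq. 2.4, the sketch below transforms the known p(r) of a homogeneous sphere into I(s) and compares the result with the analytical sphere form factor. The sphere, its radius (3 nm) and all function names are assumptions introduced only for this example; the integral is evaluated with a simple trapezoidal rule.

```python
import math

def sphere_pr(r, radius):
    """Unnormalized p(r) = r^2 * gamma0(r) of a homogeneous sphere.

    It vanishes smoothly at Dmax = 2*radius, as required for p(r).
    """
    if r < 0.0 or r > 2.0 * radius:
        return 0.0
    t = r / radius
    return r * r * (1.0 - 0.75 * t + t ** 3 / 16.0)

def intensity_from_pr(s, r_values, p_values):
    """I(s) = 4*pi * integral of p(r) sin(sr)/(sr) dr (trapezoidal rule)."""
    def integrand(r, p):
        x = s * r
        return p if x == 0.0 else p * math.sin(x) / x
    f = [integrand(r, p) for r, p in zip(r_values, p_values)]
    total = sum(0.5 * (f[i] + f[i + 1]) * (r_values[i + 1] - r_values[i])
                for i in range(len(f) - 1))
    return 4.0 * math.pi * total

def sphere_form_factor(s, radius):
    """Analytical I(s)/I(0) of a homogeneous sphere."""
    x = s * radius
    if x == 0.0:
        return 1.0
    return (3.0 * (math.sin(x) - x * math.cos(x)) / x ** 3) ** 2

radius_nm = 3.0                                   # hypothetical particle
n_steps = 2000
r_grid = [2.0 * radius_nm * k / n_steps for k in range(n_steps + 1)]
p_grid = [sphere_pr(r, radius_nm) for r in r_grid]
i_zero = intensity_from_pr(0.0, r_grid, p_grid)
normalized = {s: intensity_from_pr(s, r_grid, p_grid) / i_zero
              for s in (0.2, 0.5, 1.0)}
```

The inverse direction (Eq. 2.3) is deliberately not computed by direct integration here: with noisy data over a finite s range it is an ill-posed problem, which is why regularizing programs such as GNOM are used in practice.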

The Rg corresponds to the root-mean-square distance of the scattering elements from the center of mass, weighted by their scattering length, and provides information about the mass distribution within a particle. Objects with the same mass but different shapes have different Rg values, such that Rg is an important indicator of the conformation of the particle. Developed in the late 1930s for material science, the Guinier approximation (Guinier 1939) is still the most straightforward method to extract the Rg and the forward scattering intensity, I(0) (also called intensity at zero angle). According to the Guinier approximation, at very small angles the scattered intensity can be approximated as

$$I(s) \cong I(0) \exp\left(-\frac{(sR_g)^2}{3}\right)$$

(Equation 2.5)

The Rg and I(0) values can be estimated directly from the scattering intensity by finding a linear region in the Guinier plot [ln(I(s)) vs. s²] and extracting its slope (equal to −Rg²/3) and y-axis intercept (equal to ln I(0)), respectively, as shown in Fig. 2.3. For SAXS on macromolecules, an upper limit of s < 1.3/Rg works well for rather globular particles, whereas a smaller upper limit must be used when analysing elongated shapes. A non-linear Guinier region is usually considered an indicator of poor sample quality.
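A minimal sketch of a Guinier analysis is given below. It is not the procedure of any particular program: the iterative shrinking of the fitting range until s·Rg < 1.3 holds, the function names and the synthetic data are assumptions made for illustration, and the s values are assumed to be sorted in ascending order.

```python
import math

def linear_fit(x, y):
    """Ordinary least-squares line y = slope*x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def guinier_fit(s, intensity, s_rg_limit=1.3, max_iter=20):
    """Fit ln I(s) = ln I(0) - (Rg^2/3) s^2 on the points with s*Rg < limit.

    The fitting range is re-adjusted after every fit until it no longer
    changes (or max_iter is reached). Returns (Rg, I(0), points used).
    """
    n_use = len(s)
    for _ in range(max_iter):
        x = [si * si for si in s[:n_use]]
        y = [math.log(ii) for ii in intensity[:n_use]]
        slope, intercept = linear_fit(x, y)
        if slope >= 0.0:
            raise ValueError("no decaying Guinier region found")
        rg = math.sqrt(-3.0 * slope)
        i_zero = math.exp(intercept)
        n_new = max(3, sum(1 for si in s if si * rg < s_rg_limit))
        if n_new == n_use:
            break
        n_use = n_new
    return rg, i_zero, n_use

# Synthetic, noise-free Guinier curve with Rg = 2.5 nm and I(0) = 100.
s_values = [0.01 * k for k in range(1, 101)]
i_values = [100.0 * math.exp(-(si * 2.5) ** 2 / 3.0) for si in s_values]
rg_fit, i_zero_fit, n_used = guinier_fit(s_values, i_values)
```

On experimental data the first few points are often discarded as well (beam stop, aggregation), which this sketch does not attempt to do.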

Figure 2.3 Standard plots for characterization by SAXS. SAXS curves (A) and Guinier plots

condition showing (1) aggregation, (2) good quality data and (3) inter-particle repulsion. The Guinier fits for estimation of Rg and I(0) are displayed, with the linear regions defining smin and smax used for parameter estimation indicated by the thick lines. Extracted with permission3 from (Mertens and Svergun 2010) © Journal of Structural Biology

An alternative approach that uses the entire scattering pattern estimates Rg from the second moment of the pair-distance distribution p(r) (Feigin and Svergun 1987). This method tends to be more precise and, in combination with the Guinier approximation, provides a good check of data quality and consistency.
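The real-space estimate can be written as Rg² = ∫ r² p(r) dr / (2 ∫ p(r) dr), with both integrals taken from 0 to Dmax. The sketch below evaluates this expression numerically for the p(r) of a homogeneous sphere, for which Rg = √(3/5)·R is known analytically. The sphere, its radius and the function names are assumptions introduced only for the example.

```python
import math

def trapezoid(x, y):
    """Trapezoidal rule for samples y on a grid x."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i])
               for i in range(len(x) - 1))

def rg_from_pr(r_values, p_values):
    """Rg from the second moment of p(r); normalization of p(r) cancels."""
    numerator = trapezoid(r_values,
                          [r * r * p for r, p in zip(r_values, p_values)])
    denominator = trapezoid(r_values, p_values)
    return math.sqrt(numerator / (2.0 * denominator))

def sphere_pr(r, radius):
    """Unnormalized p(r) of a homogeneous sphere (Dmax = 2*radius)."""
    t = r / radius
    return r * r * (1.0 - 0.75 * t + t ** 3 / 16.0)

radius_nm = 3.0                                   # hypothetical particle
n_steps = 2000
r_grid = [2.0 * radius_nm * k / n_steps for k in range(n_steps + 1)]
p_grid = [sphere_pr(r, radius_nm) for r in r_grid]
rg_real_space = rg_from_pr(r_grid, p_grid)
```

Comparing such a real-space value with the Guinier estimate is the consistency check mentioned above.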

The experimental values of I(0) are proportional to the concentration and, for a given atomic composition, to the molecular mass (MM) of the particles. Consequently, the MM can be estimated directly from the experimental data normalized by the solute concentration, by comparison with the scattering profile of a previously characterized standard particle. In a classical SAXS experiment a standard particle (e.g., Bovine Serum Albumin (BSA), PDBe code 3V03 and MM = ~66 kDa) is usually measured first, and its forward scattering I(0)BSA is then used to estimate the MM of the particle of interest from its own forward scattering I(0)particle using Eq. 2.6:

$$MM_{particle} = \frac{I(0)_{particle}}{I(0)_{BSA}}\, MM_{BSA}$$

(Equation 2.6)

where MMBSA and I(0)BSA are the molecular mass and the forward scattering, respectively, of the standard particle. However, although very straightforward, this method depends strongly on the solute concentration, which must be measured accurately. An incorrect concentration would lead, through the normalization computed during the data pre-processing step, to an incorrect I(0) and thus to an incorrect MM estimate (Mylonas and Svergun 2007).
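The sensitivity of Eq. 2.6 to the concentration can be made explicit with a small worked example. The sketch assumes that the function receives forward scatterings that have not yet been normalized, together with the corresponding concentrations; the function name and all numerical values are hypothetical.

```python
def mm_from_standard(i0_sample, conc_sample, i0_standard, conc_standard,
                     mm_standard=66.0):
    """MM (kDa) from forward scattering relative to a standard (Eq. 2.6).

    Both I(0) values are first normalized by the solute concentration.
    """
    normalized_sample = i0_sample / conc_sample
    normalized_standard = i0_standard / conc_standard
    return normalized_sample / normalized_standard * mm_standard

# Hypothetical measurement: BSA at 5 mg/ml, sample at 2 mg/ml.
mm_correct = mm_from_standard(264.0, 2.0, 330.0, 5.0)
# Same data, but the sample concentration is overestimated by 10%.
mm_biased = mm_from_standard(264.0, 2.2, 330.0, 5.0)
```

In this example a 10% overestimate of the concentration lowers the apparent MM from 132 kDa to 120 kDa, i.e., the error in the concentration propagates almost one-to-one into the MM.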

There is, however, an alternative method for the estimation of the MM from the SAXS data, which utilizes the hydrated particle volume (Vp). The latter is estimated from the scattering profile using Porod’s equation (Porod 1982):

$$V_p = \frac{2\pi^2 I(0)}{Q}, \qquad Q = \int_{s_{min}}^{s_{max}} s^2 \left[ I(s) - K \right] ds$$

(Equation 2.7)

where Q is the so-called Porod invariant and K is a constant subtracted to ensure that the intensity decays asymptotically as s⁻⁴ at higher angles (Glatter and Kratky 1982). The portion of the scattering data up to smax ≈ 8/Rg has been found empirically to be an optimal range for a reliable computation of Vp, and in most cases it corresponds to the second minimum of the scattering profile in the Porod plot [I(s)·s⁴ vs. s]. The hydrated particle volume (Vp) allows an estimation of the MM independently of the solute concentration. For globular proteins the MM is computed as MM [kDa] ≈ Vp [nm³]/1.6 (with an accuracy of about 20%) (Petoukhov et al. 2012).
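Equation 2.7 and the volume-to-mass conversion can be sketched numerically as follows. The example uses the analytical scattering of an ideal homogeneous sphere (unit contrast), for which no constant K is needed, and integrates over a range extended far beyond 8/Rg so that truncation of the invariant is negligible; both choices are assumptions made to obtain a checkable result and do not reproduce the treatment of experimental data. All names and values are hypothetical.

```python
import math

def sphere_intensity(s, radius):
    """I(s) of a homogeneous sphere with unit contrast, I(0) = V^2."""
    volume = 4.0 / 3.0 * math.pi * radius ** 3
    x = s * radius
    amplitude = 1.0 if x == 0.0 else 3.0 * (math.sin(x) - x * math.cos(x)) / x ** 3
    return (volume * amplitude) ** 2

def porod_volume(s_values, intensities, i_zero, k_const=0.0):
    """Vp = 2*pi^2*I(0)/Q with Q = integral of s^2 (I(s) - K) ds."""
    f = [s * s * (i - k_const) for s, i in zip(s_values, intensities)]
    q = sum(0.5 * (f[j] + f[j + 1]) * (s_values[j + 1] - s_values[j])
            for j in range(len(f) - 1))
    return 2.0 * math.pi ** 2 * i_zero / q

def mm_from_porod_volume(vp_nm3, ratio=1.6):
    """Empirical estimate for globular proteins: MM [kDa] ~ Vp [nm^3]/1.6."""
    return vp_nm3 / ratio

radius_nm = 3.0                                   # hypothetical particle
s_grid = [0.005 * k for k in range(20001)]        # 0 ... 100 nm^-1
i_grid = [sphere_intensity(s, radius_nm) for s in s_grid]
vp = porod_volume(s_grid, i_grid, i_grid[0])
mm_estimate = mm_from_porod_volume(vp)
```

If the same integral is truncated at smax ≈ 8/Rg without any correction, the volume of this ideal sphere comes out roughly 10% too large, which illustrates why the subtraction of K and the stated accuracy of about 20% matter in practice.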

2.2 Ab-initio shape determination

As shown above, a SAXS pattern contains information about the overall parameters of the particle, which can be directly extracted from the data. These parameters provide useful guidance about the gross structure of the macromolecule, but modern approaches allow one to obtain more detailed information about the 3D particle shape at low resolution.

Based on the assumption of monodispersity (see above), a 3D envelope can be reconstructed (at a resolution of 1-2 nm) starting from a 1D isotropic SAXS pattern (Svergun et al. 2013). Algorithms for shape determination were already developed in the late 1960s (Stuhrmann 1970b), but a new generation of modelling techniques exploiting the vastly improved capabilities of modern computers became available from the 1990s onwards [DAMMIN (Svergun 1999); DALAI_GA (Chacon et al. 2000); SAXS3D (Walther, Cohen, and Doniach 2000); GA_STRUCT (Heller et al. 2002); DAMMIF (Franke and Svergun 2009)]. The most advanced and rapid methods utilize the so-called spherical harmonics expansion, where I(s) is considered as a sum of independent contributions from the substructures corresponding to different spherical harmonics according to Eq. 2.8

$$I(s) = 2\pi^2 \sum_{l=0}^{\infty} \sum_{m=-l}^{l} \left| A_{lm}(s) \right|^2$$

(Equation 2.8)

where Alm(s) are the partial scattering amplitudes (Stuhrmann 1970a, Svergun et al. 1996), computed as the Hankel transforms of the radial functions ρlm(r)

$$A_{lm}(s) = i^{l} \sqrt{\frac{2}{\pi}} \int_0^{\infty} \rho_{lm}(r)\, j_l(sr)\, r^2\, dr$$

(Equation 2.9)

where the jl(sr) are the spherical Bessel functions (Stuhrmann 1970b).

DAMMIN (Dummy Atom Model Minimisation) is the milestone among the ab initio programs that make use of the spherical harmonics expansion. Inside a constrained (usually spherical) search volume, with a maximum diameter defined by the experimentally determined Dmax (Eq. 2.3), the program represents the particle as a collection of M (>>1) densely packed beads. Each bead is randomly assigned to the solvent (index = 0) or to the solute (index = 1), and the particle structure is described by a binary string X of length M (Svergun 1999). The theoretical scattering from a low-resolution 3D envelope, represented as a densely packed bead model, can therefore be computed analytically in a very rapid manner (Svergun, Petoukhov, and Koch 2001) and compared with the experimental curve through the discrepancy χ. The shape modelling is then conducted by simulated annealing (SA) (Kirkpatrick, Gelatt, and Vecchi 1983), starting from a random initial approximation and minimizing the discrepancy according to Eq. 2.10.

$$\chi^2 = \frac{1}{N-1} \sum_{j} \left[ \frac{I_{exp}(s_j) - c\, I(s_j)}{\sigma(s_j)} \right]^2$$

(Equation 2.10)

where N is the number of experimental points, I_exp(s_j) and I(s_j) are the experimental and theoretical scattering intensities, respectively, c is a scaling factor, and σ(s_j) is the associated standard deviation (Feigin and Svergun 1987). Shape determination algorithms are now also included in an automated pipeline for high-throughput data processing and analysis (Franke, Kikhney, and Svergun 2012). In this pipeline, the program DAMMIF (a version of DAMMIN that is ~30 times faster) is run automatically, and envelope reconstructions are available within about a minute after the measurement.
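The bead-model search described above (a binary string X, simulated annealing, and the discrepancy of Eq. 2.10) can be illustrated with a toy sketch. It is not DAMMIN: the scattering is computed with the Debye formula for identical point-like beads instead of the spherical harmonics expansion, and the 3 x 3 x 3 grid, the s range, the error model, the single-bead flip move and the geometric cooling schedule are all assumptions made for this example. The scaling factor c is taken as the least-squares optimum, which is also an assumption.

```python
import math
import random

def debye_table(coords, s_values):
    """Precompute sin(s*r_ij)/(s*r_ij) for every bead pair and every s value."""
    table = []
    for s in s_values:
        rows = []
        for a in coords:
            row = []
            for b in coords:
                x = s * math.dist(a, b)
                row.append(1.0 if x < 1e-12 else math.sin(x) / x)
            rows.append(row)
        table.append(rows)
    return table

def intensity(state, table):
    """Debye-formula scattering of the beads currently assigned to the solute."""
    solute = [i for i, bit in enumerate(state) if bit == 1]
    return [sum(rows[i][j] for i in solute for j in solute) for rows in table]

def chi_squared(i_exp, sigma, i_calc):
    """Eq. 2.10, with c chosen as the least-squares scaling factor (assumption)."""
    denom = sum((ic / sg) ** 2 for ic, sg in zip(i_calc, sigma))
    if denom == 0.0:  # empty particle: nothing to scale, treat as worst case
        return float("inf")
    c = sum(ie * ic / sg ** 2 for ie, ic, sg in zip(i_exp, i_calc, sigma)) / denom
    total = sum(((ie - c * ic) / sg) ** 2 for ie, ic, sg in zip(i_exp, i_calc, sigma))
    return total / (len(i_exp) - 1)

def anneal(i_exp, sigma, table, n_beads, rng, t_start=1.0, cooling=0.95, sweeps=40):
    """Flip one bead at a time and accept the move with the Metropolis criterion."""
    state = [rng.randint(0, 1) for _ in range(n_beads)]  # random initial string X
    current = chi_squared(i_exp, sigma, intensity(state, table))
    first = current
    best_state, best = state[:], current
    temperature = t_start
    for _ in range(sweeps):
        for _ in range(n_beads):
            k = rng.randrange(n_beads)
            state[k] ^= 1  # move the bead between solvent (0) and solute (1)
            trial = chi_squared(i_exp, sigma, intensity(state, table))
            delta = trial - current
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                current = trial
                if current < best:
                    best_state, best = state[:], current
            else:
                state[k] ^= 1  # reject: undo the flip
        temperature *= cooling  # assumed geometric cooling schedule
    return best_state, best, first

def run_demo(seed=0):
    """Hypothetical toy problem: recover a slab inside a 3 x 3 x 3 bead grid."""
    coords = [(x, y, z) for x in range(3) for y in range(3) for z in range(3)]
    s_values = [0.2 * k for k in range(1, 9)]              # assumed units: nm^-1
    table = debye_table(coords, s_values)
    target = [1 if z < 2 else 0 for (_, _, z) in coords]   # the "true" shape
    i_exp = intensity(target, table)                       # noise-free synthetic data
    sigma = [0.03 * v + 1e-3 for v in i_exp]               # assumed error model
    return anneal(i_exp, sigma, table, len(coords), random.Random(seed))

if __name__ == "__main__":
    state, best, first = run_demo()
    print("initial chi^2:", first, "best chi^2:", best)
    print("best string X:", "".join(map(str, state)))
```

Because the best configuration seen so far is stored separately from the current one, the returned discrepancy can never exceed that of the random starting string, even when the annealing accepts uphill moves along the way.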
