Maximum likelihood trajectories from single molecule fluorescence resonance energy transfer experiments
Gunnar F. Schröder and Helmut Grubmüller a)
Theoretical Molecular Biophysics Group, Max-Planck-Institute for Biophysical Chemistry, Am Fassberg 11, 37077 Göttingen, Germany
(Received 4 June 2003; accepted 14 August 2003)
Single molecule fluorescence resonance energy transfer (FRET) experiments are a powerful and versatile tool for studying conformational motions of single biomolecules. However, the small number of recorded photons typically limits the achieved time resolution. We develop a maximum likelihood theory that uses the full information of the recorded photon arrival times to reconstruct nanometer distance trajectories. In contrast to the conventional, intensity-based approach, our maximum likelihood approach does not suffer from biased a priori distance distributions.
Furthermore, by providing probability distributions for the distance, the theory also yields rigorous error bounds. Applied to a burst of 230 photons obtained from a FRET dye pair site-specifically linked to the neural fusion protein syntaxin-1a, the theory enables one to distinguish time-resolved details of millisecond fluctuations from shot noise. From cross validation, an effective diffusion coefficient is also determined from the FRET data. © 2003 American Institute of Physics.
[DOI: 10.1063/1.1616511]
I. INTRODUCTION
Fluorescence resonance energy transfer (FRET) measurements allow one to determine the distance between two dyes at a nanometer scale [1–3]. In a typical set-up (Fig. 1), information on the structure of a biomolecule such as DNA or a protein is obtained from a pair of FRET dyes, a donor and an acceptor, which are covalently attached at defined positions to the biomolecule. After excitation of the donor, and depending on the distance and relative orientation between the two dyes, energy is transferred to the acceptor by the Förster mechanism [1]. Thus, by measuring donor and acceptor fluorescence intensities, $I_D$ and $I_A$, the distance $r$ between the two dyes is obtained, usually via

$$\frac{I_A}{I_A + I_D} = \frac{1}{1 + (r/r_0)^6}, \tag{1}$$

where $r_0$ is the dye-specific effective Förster radius, which also includes (averaged) dye orientation effects [2]. This approach is valid if the relative dye rotations are faster than the lifetime of the excited state of the donor, which is usually the case.
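As a minimal illustration of Eq. (1), the following Python sketch evaluates the transfer efficiency and inverts it for the distance. The function names are hypothetical and chosen here for illustration only; the value $r_0 = 6.5$ nm is the one quoted in the caption of Fig. 2.

```python
def fret_efficiency(r, r0):
    """Eq. (1): I_A / (I_A + I_D) for a dye distance r and Foerster radius r0."""
    return 1.0 / (1.0 + (r / r0) ** 6)


def distance_from_intensities(i_a, i_d, r0):
    """Invert Eq. (1): r = r0 * (I_D / I_A)**(1/6)."""
    if i_a <= 0 or i_d < 0:
        raise ValueError("need I_A > 0 and I_D >= 0")
    return r0 * (i_d / i_a) ** (1.0 / 6.0)


r0 = 6.5  # nm, value quoted for the dye pair in Fig. 2
# Equal donor and acceptor intensities correspond to r = r0.
print(distance_from_intensities(i_a=30.0, i_d=30.0, r0=r0))
```

Because of the sixth power, the inferred distance is sensitive to the intensity ratio only in a fairly narrow range around $r_0$.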
Recently, time-resolved FRET experiments have matured to a level that allows one to record arrival times of individual photons from single molecules [4–11]. From the arrival times, fluorescence intensity variations, $I_D(t)$ and $I_A(t)$, are obtained [10,12,13], which, using Eq. (1), allow one to track distance changes $r(t)$ between the two dyes, and hence to monitor conformational motions of the studied biomolecule [12,13].
In the conventional analysis, the required FRET intensities are computed from photon counts in time windows [8,10] (cf. also Ref. 14). For a typical window size of 1 ms, however, the small number of only 10–50 photons per window [10] implies considerable statistical uncertainty ("shot noise" [15]) and thus limits the time resolution for $r(t)$. Furthermore, the choice of the window size is somewhat arbitrary and guided only by the need to trade off shot noise against time resolution. Finally, the traditional method tacitly assumes a uniform a priori probability for the FRET intensities (rather than for the distances). Therefore, and contrary to what one might intuitively assume, the traditional method cannot be considered a model-free approach. Rather, because the distance $r$ depends nonlinearly on the intensities, Eq. (1), the assumed uniform intensity distribution transforms into a nonuniform distance distribution,

$$p(r) \propto \frac{(r/r_0)^5}{\left[1 + (r/r_0)^6\right]^2}. \tag{2}$$

This distribution is centered at the Förster radius and has a half width of about $\tfrac{1}{3} r_0$, implying preferred distances near $r_0$; it describes the unjustified bias introduced by the conventional analysis.
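The shape of Eq. (2) can be checked numerically. The sketch below assumes the normalization constant $6/r_0$, which follows from $|dE/dr|$ with $E$ given by Eq. (1) but is not stated in the text, and evaluates the distribution on a grid using a simple midpoint rule. All names are illustrative.

```python
def implied_distance_prior(r, r0):
    """Eq. (2) with the assumed normalization 6/r0, i.e. |dE/dr| for uniform E."""
    x = r / r0
    return (6.0 / r0) * x ** 5 / (1.0 + x ** 6) ** 2


r0 = 6.5                      # nm, as quoted in Fig. 2
n = 20000
r_max = 5.0 * r0              # the tail beyond 5*r0 carries negligible weight
dr = r_max / n
grid = [(i + 0.5) * dr for i in range(n)]          # midpoint grid
p = [implied_distance_prior(r, r0) for r in grid]

norm = sum(p) * dr                                  # should be close to 1
mode = grid[max(range(n), key=p.__getitem__)]       # position of the maximum
above_half = [r for r, v in zip(grid, p) if v >= 0.5 * max(p)]
hwhm = 0.5 * (above_half[-1] - above_half[0])       # half width at half maximum

print(norm, mode / r0, hwhm / r0)
```

Setting the derivative of Eq. (2) to zero gives a maximum at $(5/7)^{1/6} r_0 \approx 0.95\, r_0$, slightly below the Förster radius, and the numerical half width at half maximum comes out a little below $r_0/3$, in line with the estimate given above.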
In many cases where only limited or noisy data are available, the maximum-likelihood approach has been successfully applied [16–22]. In this article, we develop a maximum-likelihood theory to reconstruct $r(t)$ from the photons recorded in single molecule FRET measurements. In particular, we aim at calculating the time-dependent probability distribution $P(r,t \mid \{t_i^D, t_i^A\})$ for the distance $r$ during a measurement of length $\Delta T$, given that $n_D$ photons from the donor dye have been recorded at times $t_i^D$, $i = 1,\dots,n_D$, and $n_A$ acceptor photons at times $t_i^A$, $i = 1,\dots,n_A$. Finally, we will extract an effective diffusion coefficient for the biomolecular motion from the FRET data.

a) Phone: ++49-551-201-1763; Fax: ++49-551-201-1089; Electronic mail: hgrubmu@gwdg.de

0021-9606/2003/119(18)/9920/5/$20.00 © 2003 American Institute of Physics
II. THEORY
To that aim, in a first step we consider a statistical ensemble of distance trajectories, $\{r(t)\}$, and compute for each full trajectory the conditional probability $P[r(t) \mid \{t_i^D, t_i^A\}]$ that $r(t)$ is realized for the given photon registration times.
Assuming Bayesian statistics, this probability is given by the product of the a priori probability $P[r(t)]$ for each trajectory and the conditional probability that the $n_A + n_D$ photons are observed at the measured time instances for the given trajectory,

$$P[r(t) \mid \{t_i^D, t_i^A\}] \propto P[r(t)]\, P[\{t_i^D, t_i^A\} \mid r(t)]. \tag{3}$$

To evaluate these two distributions, the time interval $\Delta T$ is discretized into $N$ bins $[\tau_{j-1}, \tau_j]$, $j = 1,\dots,N$, and subsequently $N \to \infty$ is considered. The time discretization $\Delta\tau := \tau_j - \tau_{j-1} = \Delta T/N$ is always chosen fine enough such that not more than one photon per interval $[\tau_{j-1}, \tau_j]$ is recorded.
For a discretized trajectory $r_1,\dots,r_N$, where $r_j$ is the distance at time $\tfrac{1}{2}(\tau_{j-1} + \tau_j)$, the conditional probability to observe the recorded photon pattern $E_1,\dots,E_N$ is

$$P[E_1,\dots,E_N \mid r_1,\dots,r_N] = \Delta\tau^{\,n_A + n_D} \prod_{j=1}^{N} f_j, \tag{4}$$

where the probabilities $f_j$ are chosen according to which of the three possible events $E_j$ [donor photon is recorded (D), acceptor photon is recorded (A), or no photon is recorded (0)] occurs during $[\tau_{j-1}, \tau_j]$,

$$f_j = \begin{cases} I_D(r_j)\,[1 - \Delta\tau\, I_A(r_j)] & \text{for D}, \\ I_A(r_j)\,[1 - \Delta\tau\, I_D(r_j)] & \text{for A}, \\ [1 - \Delta\tau\, I_D(r_j)]\,[1 - \Delta\tau\, I_A(r_j)] & \text{for 0}. \end{cases} \tag{5}$$

Here, $I_A(r_j)$ and $I_D(r_j)$ are specified from Eq. (1), and the required (average) total intensity $I_0 = I_A(t) + I_D(t) = (n_A + n_D)/\Delta T$ is estimated from the recorded number of photons. Note that for the $n_D + n_A$ events D and A, the $f_j$ denote probability densities, which have to be scaled by $\Delta\tau$ to obtain the desired probabilities, hence the prefactor in Eq. (4).
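The per-bin factors of Eq. (5) can be sketched in Python as follows. The sketch assumes that the total intensity $I_0$ is split between the two channels according to Eq. (1), i.e. $I_A(r) = I_0 E(r)$ and $I_D(r) = I_0 [1 - E(r)]$; this reading, as well as the function names and the chosen bin width, are assumptions made here for illustration.

```python
def intensities(r, r0, i0):
    """Return (I_D, I_A), assuming I_A = I0*E(r) and I_D = I0*(1 - E(r)), Eq. (1)."""
    e = 1.0 / (1.0 + (r / r0) ** 6)
    return i0 * (1.0 - e), i0 * e


def bin_likelihood(event, r, r0, i0, dtau):
    """Eq. (5): factor f_j for event 'D', 'A' or '0' in a bin of width dtau."""
    i_d, i_a = intensities(r, r0, i0)
    if event == "D":
        return i_d * (1.0 - dtau * i_a)      # probability density (per unit time)
    if event == "A":
        return i_a * (1.0 - dtau * i_d)      # probability density (per unit time)
    if event == "0":
        return (1.0 - dtau * i_d) * (1.0 - dtau * i_a)   # probability
    raise ValueError("event must be 'D', 'A' or '0'")


i0 = 230 / 10e-3   # photons per second: 230 photons in a 10 ms burst
dtau = 1e-6        # s, illustrative bin width with dtau * i0 << 1
r = 6.5            # nm
total = (dtau * bin_likelihood("D", r, 6.5, i0, dtau)
         + dtau * bin_likelihood("A", r, 6.5, i0, dtau)
         + bin_likelihood("0", r, 6.5, i0, dtau))
print(total)       # close to 1; the deficit is of order (dtau * i0)**2
```

After scaling the two densities by $\Delta\tau$, the three event probabilities add up to one except for a term of order $\Delta\tau^2$, which corresponds to the neglected case of two photons in one bin.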
For the a priori probability $P[r(t)] \propto \lim_{N\to\infty} P[r_1,\dots,r_N]$, we assume that $r(t)$ results from a one-dimensional diffusion process with effective diffusion coefficient $D$. This is realistic, e.g., for the overdamped millisecond opening and closure domain motions of the solvated macromolecule at hand [10]. The discretized version is a random walk with transition probabilities

$$g_{j+1|j} \propto \frac{1}{\sqrt{4\pi D \Delta\tau}} \exp\left[-\frac{(r_{j+1} - r_j)^2}{4 D \Delta\tau}\right]. \tag{6}$$

Note that this implies that all possible distances are assigned equal a priori probabilities, which is reasonable if the energy landscape that governs the distance distribution is unknown.
If there is additional information on the energy landscape, this can be incorporated into $g_{j+1|j}$ in a Smoluchowski-type generalization. Note also that two- or three-dimensional diffusion of the dyes can be described in a similar manner by an appropriate effective energy landscape that accounts for the projection of the higher-dimensional diffusion onto the one-dimensional distance coordinate $r(t)$. Thus, $P[r_1,\dots,r_N] = \prod_{j=2}^{N} g_{j|j-1}$, and Eq. (3) reads
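
$$P[r_1,\dots,r_N \mid \{t_i^D, t_i^A\}] \propto f_1 \prod_{j=2}^{N} g_{j|j-1} f_j. \tag{7}$$

In a second step the probability distribution for the distance $r_k$ at times $(\tau_{k-1} + \tau_k)/2$ is calculated by integration over all other distances,

$$P(r_k \mid \{t_i^D, t_i^A\}) \propto \int \cdots \int dr_1 \cdots dr_{k-1}\, dr_{k+1} \cdots dr_N\, P[r_1,\dots,r_N \mid \{t_i^D, t_i^A\}]. \tag{8}$$

Using Eq. (7) and rearranging integrals, one obtains

$$P(r_k \mid \{t_i^D, t_i^A\}) \propto L_k f_k R_k \tag{9}$$

with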
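
$$\begin{aligned} L_k &= \int dr_{k-1}\, g_{k|k-1} f_{k-1} \int dr_{k-2} \cdots \int dr_1\, g_{2|1} f_1, \\ R_k &= \int dr_{k+1}\, g_{k+1|k} f_{k+1} \int dr_{k+2} \cdots \int dr_N\, g_{N|N-1} f_N. \end{aligned} \tag{10}$$

The above two equations obey the recursion relations

$$\begin{aligned} L_k &= \int dr_{k-1}\, g_{k|k-1} f_{k-1} L_{k-1}, \\ R_k &= \int dr_{k+1}\, g_{k+1|k} f_{k+1} R_{k+1}, \end{aligned} \tag{11}$$

which, in the continuum limit (i.e., $\Delta\tau \to 0$, $\tau_j \to t$, and $r_k \to r$), transform into forward and backward Schrödinger-type equations that resemble generalized diffusion equations for $L_k \to L(r,t)$ and $R_k \to R(r,t)$,

$$\begin{aligned} \partial_t L(r,t) &= \lim_{\Delta\tau \to 0} \left\{ D\,\partial_r^2 \left[ \left(1 + \Delta\tau F(r,t)\right) L(r,t) \right] + \left[ F(r,t) + \Delta\tau\, \partial_{\Delta\tau} F(r,t) \right] L(r,t) \right\}, \\ \partial_t R(r,t) &= -\lim_{\Delta\tau \to 0} \left\{ D\,\partial_r^2 \left[ \left(1 + \Delta\tau F(r,t)\right) R(r,t) \right] + \left[ F(r,t) + \Delta\tau\, \partial_{\Delta\tau} F(r,t) \right] R(r,t) \right\}, \end{aligned} \tag{12}$$

where, to ensure convergence, $f_k$ has been written in the form $f_k = 1 + \Delta\tau F(r,t)$. For the derivation of Eqs. (12), the recursion relations Eqs. (11) have been expanded in $\Delta\tau$ up to first order, using $\partial_{\Delta\tau} g_{k|k-1} = D\,\partial_{r_{k-1}}^2 g_{k|k-1} = D\,\partial_{r_k}^2 g_{k|k-1}$, and partial integration in $r$, noting that $L(r,t)$ and $R(r,t)$ as well as their derivatives with respect to $r$ vanish for $r \to \pm\infty$.

FIG. 1. Typical single molecule FRET experiment. A donor and an acceptor dye molecule are attached to a protein that exhibits conformational dynamics. By probing the interdye distance trajectory $r(t)$, measurement of the FRET efficiency provides time-resolved information on the dynamics of the studied protein (arrows).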
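Solving Eqs. (12) yields, after normalization, the desired probability distribution to find the distance $r$ at time $t$,

$$P(r,t \mid \{t_i^D, t_i^A\}) \propto L(r,t)\,[1 + \Delta\tau F(r,t)]\,R(r,t). \tag{13}$$

By combining the three definitions for $f_j$, Eq. (5), into one expression using a Gaussian limit representation for the $\delta$-function, $\delta(t - t') = \lim_{\Delta\tau \to 0} h(t - t')$, with

$$h(t - t') = \frac{1}{\sqrt{2\pi}\,\Delta\tau} \exp\left[-\frac{(t - t')^2}{2\Delta\tau^2}\right], \tag{14}$$

and neglecting higher orders of $\Delta\tau$, one obtains

$$F(r,t) = [I_D(r) - 1] \sum_{j=1}^{n_D} h(t - t_j^D) + [I_A(r) - 1] \sum_{j=1}^{n_A} h(t - t_j^A) - I_0. \tag{15}$$

With this expression, Eq. (12) for $L(r,t)$ reads

$$\begin{aligned} \partial_t L(r,t) = \lim_{\Delta\tau \to 0} \Bigg\{ & \int dr'\, g(r - r', \Delta\tau)\, D\,\partial_{r'}^2 \Bigg[ L(r',t) \Bigg( 1 + \Delta\tau\,[I_D(r') - 1] \sum_{j=1}^{n_D} h(t - t_j^D) + \Delta\tau\,[I_A(r') - 1] \sum_{j=1}^{n_A} h(t - t_j^A) \Bigg) \Bigg] \\ & + \int dr'\, g(r - r', \Delta\tau)\, L(r',t) \Bigg[ \frac{I_D(r') - 1}{\Delta\tau^2} \sum_{j=1}^{n_D} (t - t_j^D)^2\, h(t - t_j^D) + \frac{I_A(r') - 1}{\Delta\tau^2} \sum_{j=1}^{n_A} (t - t_j^A)^2\, h(t - t_j^A) - I_0 \Bigg] \Bigg\}. \end{aligned} \tag{16}$$

A similar expression is obtained for $R(r,t)$. For times $t$ at which no photon arrives, Eq. (16) simplifies to

$$\begin{aligned} \partial_t L(r,t) &= D\,\partial_r^2 L(r,t) - I_0 L(r,t), \\ \partial_t R(r,t) &= -D\,\partial_r^2 R(r,t) + I_0 R(r,t), \end{aligned} \tag{17}$$
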
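with solutions that propagate in time according to

$$\begin{aligned} L(r,t) &= e^{-I_0 (t - t')} \int dr'\, L(r',t') \exp\left[-\frac{(r - r')^2}{4D(t - t')}\right], \\ R(r,t) &= e^{-I_0 (t' - t)} \int dr'\, R(r',t') \exp\left[-\frac{(r - r')^2}{4D(t' - t)}\right] \end{aligned} \tag{18}$$

for $t > t'$ and $t < t'$, respectively. To also include the photon arrival times $t_j$, note that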
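
$$\lim_{\Delta\tau \to 0} \frac{(t - t_j)^2\, h(t - t_j)}{\Delta\tau^2} = \lim_{\Delta\tau \to 0} h(t - t_j) + \lim_{\Delta\tau \to 0} \Delta\tau^2\, \partial_t^2 h(t - t_j) = \delta(t - t_j), \tag{19}$$

where the second term is $\propto \partial_t^2 \delta(t - t_j)$ and is dropped, because $\int_{-\epsilon}^{\epsilon} \delta''(x)\,dx = 0$. This gives rise to additive singularities in Eqs. (17) of the form $L(r,t)\,[I_D(r) - 1]\,\delta(t - t_j)$, due to which $L(r,t)$ and $R(r,t)$ exhibit discontinuities at all $t_j$,

$$\begin{aligned} \lim_{t \to (t_j^D)^+} L(r,t) &= I_D(r) \lim_{t \to (t_j^D)^-} L(r,t), \\ \lim_{t \to (t_j^A)^+} L(r,t) &= I_A(r) \lim_{t \to (t_j^A)^-} L(r,t), \\ \lim_{t \to (t_j^D)^-} R(r,t) &= I_D(r) \lim_{t \to (t_j^D)^+} R(r,t), \\ \lim_{t \to (t_j^A)^-} R(r,t) &= I_A(r) \lim_{t \to (t_j^A)^+} R(r,t). \end{aligned} \tag{20}$$
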
Equations (18) and (20) are the main result of this article.
Starting with the boundary condition $L(r,0) = 1$, Eqs. (18) and (20), when applied alternately, propagate $L(r,t)$ in time from one photon arrival to the next. Similarly, starting from $R(r,\Delta T) = 1$, $R(r,t)$ is propagated in reverse time direction, which, by using Eq. (13), yields $P(r,t \mid \{t_i^D, t_i^A\})$ for all times $t$. Note that, from Eqs. (20), the discontinuities in $L(r,t)$ and $R(r,t)$ cancel in Eq. (13), such that $P(r,t \mid \{t_i^D, t_i^A\})$ is nondifferentiable, but continuous also at $t = t_j$.
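The alternating use of Eqs. (18) and (20) can be sketched on a distance grid as follows. This is a minimal pure-Python illustration under stated assumptions, not the FRETtrace implementation: all names are hypothetical, the grid is uniform and simply truncated at its ends, and factors that do not depend on $r$ (such as $e^{-I_0 \Delta t}$ and $I_0$ itself) are dropped because they cancel on normalization. Distances are in nm and times in ms, so that $D = 0.2 \times 10^{-14}$ m$^2$/s corresponds to 2 nm$^2$/ms.

```python
import math


def efficiency(r, r0):
    """Eq. (1): FRET efficiency at distance r."""
    return 1.0 / (1.0 + (r / r0) ** 6)


def diffuse(values, grid, d_coef, dt):
    """Eq. (18) on a grid, up to r-independent factors; no padding at the grid ends."""
    if dt <= 0.0:
        return list(values)
    out = [sum(v * math.exp(-(r - rp) ** 2 / (4.0 * d_coef * dt))
               for v, rp in zip(values, grid)) for r in grid]
    top = max(out)
    return [v / top for v in out]          # rescale to avoid numerical underflow


def posterior_at_photons(times, kinds, grid, r0, d_coef, t_end):
    """Return one normalized distribution P(r, t_k | photons) per photon k."""
    e = [efficiency(r, r0) for r in grid]
    weight = {"D": [1.0 - x for x in e], "A": e}   # proportional to I_D(r), I_A(r)
    flat = [1.0] * len(grid)

    # Forward sweep: L(r, t) just before each photon, starting from L(r, 0) = 1.
    l_before, cur, t_prev = [], flat, 0.0
    for t, k in zip(times, kinds):
        cur = diffuse(cur, grid, d_coef, t - t_prev)
        l_before.append(cur)
        cur = [c * w for c, w in zip(cur, weight[k])]           # jump, Eq. (20)
        t_prev = t

    # Backward sweep: R(r, t) just after each photon, starting from R(r, t_end) = 1.
    r_after, cur, t_next = [None] * len(times), flat, t_end
    for i in range(len(times) - 1, -1, -1):
        cur = diffuse(cur, grid, d_coef, t_next - times[i])
        r_after[i] = cur
        cur = [c * w for c, w in zip(cur, weight[kinds[i]])]    # jump, Eq. (20)
        t_next = times[i]

    # Combine as in Eq. (13) and normalize on the uniform grid.
    dr = grid[1] - grid[0]
    posteriors = []
    for lb, k, ra in zip(l_before, kinds, r_after):
        p = [a * w * b for a, w, b in zip(lb, weight[k], ra)]
        norm = sum(p) * dr
        posteriors.append([v / norm for v in p])
    return posteriors


def mean_distance(p, grid):
    """Average distance of a normalized distribution on the uniform grid."""
    dr = grid[1] - grid[0]
    return sum(r * v for r, v in zip(grid, p)) * dr


# Toy input (not the measured burst): ten acceptor photons, one every 0.5 ms.
grid = [2.0 + 0.2 * i for i in range(51)]      # 2 ... 12 nm
times = [0.5 * (i + 1) for i in range(10)]     # ms
post = posterior_at_photons(times, ["A"] * 10, grid, r0=6.5, d_coef=2.0, t_end=5.5)
print(mean_distance(post[5], grid))            # below r0 for acceptor-only photons
```

Evaluating the distribution only at the photon arrival times keeps the sketch short; for intermediate times, $L$ and $R$ would each be propagated to the time of interest with `diffuse` before being multiplied.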
III. RESULTS AND DISCUSSION
As an example, Figs. 2(b)–2(d) show the application of our theory to the 230 photon arrival times (wedges) from a 10 ms single molecule photon burst recorded in a FRET measurement, for which donor and acceptor dyes have been covalently linked to the flexible domains of the neuronal fusion protein syntaxin-1a [10], as sketched in Fig. 1. Three different diffusion coefficients $D$ have been chosen. Each of the three plots shows, gray-shaded, the time dependent distance distribution $P(r,t \mid \{t_i^D, t_i^A\})$, together with the average distance (bold) and $1\sigma$ intervals (dashed). As expected from Eq. (1), larger distances are obtained for higher donor and lower acceptor photon intensities. For comparison, Fig. 2(a) shows the traditional method, which directly uses Eq. (1) with intensities and error bars evaluated in successive time bins [23], here of 0.5 ms width.
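For comparison, a window-based analysis of the kind shown in Fig. 2(a) might look as follows. The error estimate used here, binomial shot noise on the efficiency propagated to the distance to first order, is an assumption made for this sketch; the procedure of Ref. 23 may differ. Function and variable names are illustrative.

```python
import math


def binned_distances(donor_times, acceptor_times, t_end, width, r0):
    """Window-averaged distances from Eq. (1); returns (bin center, r, sigma_r) tuples."""
    n_bins = int(math.ceil(t_end / width))
    result = []
    for b in range(n_bins):
        lo, hi = b * width, (b + 1) * width
        n_d = sum(1 for t in donor_times if lo <= t < hi)
        n_a = sum(1 for t in acceptor_times if lo <= t < hi)
        center = 0.5 * (lo + hi)
        if n_d == 0 or n_a == 0:
            result.append((center, None, None))   # Eq. (1) cannot be inverted here
            continue
        r = r0 * (n_d / n_a) ** (1.0 / 6.0)
        # Assumed error model: binomial noise on E = n_a / n, first-order propagation
        # with |dr/dE| = r / (6 E (1 - E)), which gives the expression below.
        sigma_r = r / (6.0 * math.sqrt(n_a * n_d / (n_a + n_d)))
        result.append((center, r, sigma_r))
    return result


# Toy input: two donor and two acceptor photons in a single 0.5 ms window.
bins = binned_distances([0.1, 0.2], [0.3, 0.4], t_end=0.5, width=0.5, r0=6.5)
print(bins)
```

With only a handful of photons per window, the estimated uncertainty becomes comparable to the distance changes of interest, which illustrates the shot-noise limitation of this approach.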
Apparently, the choice of $D$ is critical. For small values, the distance can change only slowly. Therefore, it does not fully reflect the significant intensity fluctuations encoded in the recorded photon arrival times, and rather yields smooth trajectories with small amplitude. For very small values (below $0.01 \times 10^{-14}$ m$^2$/s), the distance distribution becomes time independent and approaches the distance given by the average intensities (data not shown). Increasing $D$ entails fluctuations of correspondingly increased frequencies. These fluctuations arise from both intensity fluctuations due to actual distance variations and (undesirable) probability fluctuations due to the broadening of $L(r,t)$ and $R(r,t)$ between subsequent photons. As can be seen from Eqs. (18), the latter become relevant for $4D > I_0 \sigma^2$, where $\sigma$ is the width of $P(r,t \mid \{t_i^D, t_i^A\})$. The lower panel in Fig. 2 shows an example for which, due to the large $D$ chosen, the data are apparently overfitted. In between these two limiting cases, an optimal value for $D$ is expected to provide the best description of the data [Fig. 2(c)].
That optimal value was determined by calculating the agreement between the obtained time-dependent distance distribution and the measured photon arrival times as a function of the chosen $D$. This type of cross-validation underlies, e.g., the free R value used to assess the accuracy of macromolecular x-ray structures [24]. In a similar spirit, one photon $k$ was excluded from the FRET data, and a new distance distribution,

$$P_k(r_k) \equiv P_k(r_k, t_k \mid \{t_i^D, t_i^A, i \neq k\}) \tag{21}$$

was obtained for the arrival time $t_k$ of the excluded photon.
Using this distribution, the likelihood $P_k(D)$ for the actually observed photon $k$ was determined for varying $D$,

$$P_k(D) \propto \int_0^\infty dr_k\, P_k(r_k)\, I_{D/A}(r_k), \tag{22}$$

with $I_{D/A}$ chosen according to the type of the excluded photon. Assuming that the likelihoods $P_k(D)$ obtained for different omitted photons $k$ are statistically independent, one obtains from the maximum of the (normalized) joint likelihood $P(D) \propto \prod_k P_k(D)$ (inset of Fig. 2) a diffusion coefficient $D = 0.2 \times 10^{-14}$ m$^2$/s that describes the measured photon arrival times best. In the figure, no scale for $P(D)$ is given to avoid its erroneous interpretation as the (absolute) probability that $D$ is the correct diffusion constant.
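The scoring step of Eqs. (21) and (22) can be sketched as follows. The sketch assumes that the leave-one-out distributions $P_k(r_k)$ have already been computed on a distance grid by repeating the reconstruction without photon $k$; the argument `loo_posteriors` is therefore hypothetical input. The constant factor $I_0$ is dropped, since it does not depend on $D$, and the product over photons is evaluated as a sum of logarithms.

```python
import math


def efficiency(r, r0):
    """Eq. (1): FRET efficiency at distance r."""
    return 1.0 / (1.0 + (r / r0) ** 6)


def photon_likelihood(posterior, grid, kind, r0):
    """Eq. (22) by the trapezoidal rule, up to the constant factor I_0."""
    w = [efficiency(r, r0) if kind == "A" else 1.0 - efficiency(r, r0) for r in grid]
    y = [p * x for p, x in zip(posterior, w)]
    return sum(0.5 * (y[i] + y[i + 1]) * (grid[i + 1] - grid[i])
               for i in range(len(grid) - 1))


def joint_log_likelihood(loo_posteriors, grid, kinds, r0):
    """log P(D) up to a constant: sum of log P_k(D) over all excluded photons k."""
    return sum(math.log(photon_likelihood(p, grid, k, r0))
               for p, k in zip(loo_posteriors, kinds))


# Toy check: a narrow, normalized distribution at r0 predicts either photon type
# with weight 1/2.
grid = [6.4, 6.5, 6.6]
narrow = [0.0, 10.0, 0.0]
print(photon_likelihood(narrow, grid, "A", 6.5))
print(joint_log_likelihood([narrow, narrow], grid, ["A", "D"], 6.5))
```

Repeating this evaluation for a series of trial values of $D$ and selecting the largest joint log-likelihood corresponds to locating the maximum of $P(D)$ described above.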
Clearly, the fewer photons are available, the less information on $r(t)$ can be obtained. As an extreme case, Fig. 3(a) shows the result of our analysis with only every fourth photon from the original data used. As expected, the distance distribution becomes broader, and only some of the features seen in Fig. 2 remain. Yet, despite the very small number of photons used (58), our analysis still reveals a statistically significant distance fluctuation at the $1\sigma$ level. This finding suggests that a correspondingly improved time resolution can be achieved by our method.
To check whether the width of the calculated distance distribution correctly describes the actual statistical uncertainty, we have finally used the average trajectory calculated from the original data [thick line in Fig. 2(c)] to create a new (hypothetical) set of 230 random photon arrival times obeying Eq. (1). Thus, for these data, the underlying trajectory is known. From that set, a new distance distribution was recalculated and compared with the correct trajectory [Fig. 3(b)].
FIG. 2. (a) Intensity-based calculation of donor/acceptor distances $r(t)$ from a set of 230 photon arrival times (wedges) with $r_0 = 6.5$ nm (Ref. 10) using Eq. (1); intensities are obtained from 0.5 ms bins. (b)–(d) Time dependent distance probability distributions $P(r,t \mid \{t_i^D, t_i^A\})$ (gray-shaded) calculated from the same set for three different diffusion coefficients $D$. Also shown are average distance trajectories (bold) and $1\sigma$ intervals (dashed). The inset shows the (normalized) likelihood $P(D)$ as a function of $D$; three arrows denote the three chosen values for $D$.
FIG. 3. (a) Distance distribution for a reduced set of 58 photons (wedges) and $D = 0.2 \times 10^{-14}$ m$^2$/s; notation as in Fig. 2. (b) Recalculated distance distribution (gray-shaded) for a hypothetical set of 230 photons (wedges) that has been calculated from the original average trajectory in Fig. 2(c), also shown in bold here; $D = 0.2 \times 10^{-14}$ m$^2$/s. The dashed lines denote the $1\sigma$ interval for the recalculated distance distribution.
As can be seen, most of the correct trajectory (bold) stays within the $1\sigma$ range of the recalculated distance distribution, thus showing the reliability of our method.
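A consistency test of this kind needs synthetic photons drawn from a known trajectory. The sketch below assumes a constant total intensity, so that arrival times are uniformly distributed over the burst, and assigns each photon to the acceptor channel with the probability given by Eq. (1). These choices, the names and the toy trajectory are assumptions made for illustration; the text specifies only that the photons obey Eq. (1).

```python
import random


def synthetic_photons(trajectory, n_photons, t_end, r0, seed=0):
    """Draw arrival times and channels ('A' or 'D') for a known trajectory r(t)."""
    rng = random.Random(seed)
    # Assumption: constant total intensity, hence uniformly distributed arrival times.
    times = sorted(rng.uniform(0.0, t_end) for _ in range(n_photons))
    kinds = []
    for t in times:
        e = 1.0 / (1.0 + (trajectory(t) / r0) ** 6)   # Eq. (1)
        kinds.append("A" if rng.random() < e else "D")
    return times, kinds


# Toy trajectory (hypothetical): linear opening from 5 nm to 8 nm within 10 ms.
times, kinds = synthetic_photons(lambda t: 5.0 + 0.3 * t, n_photons=230,
                                 t_end=10.0, r0=6.5)
print(len(times), kinds.count("A"), kinds.count("D"))
```

Feeding such a photon set back into the reconstruction and comparing the result with the known input trajectory reproduces the kind of check shown in Fig. 3(b).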
We have developed a theory that enables reconstruction of nanometer distance trajectories from single molecule single photon FRET recordings. In contrast to the commonly used method of window averaging, the full single photon information is used, and rigorous error bounds are obtained.
Furthermore, the method is expected to be robust with respect to variation of the excitation intensity $I_0$, e.g., due to diffusion of the particle through the laser focus. In addition, our approach allows one to extract an effective diffusion constant from the FRET recordings and thus avoids the usual ad hoc choice of an averaging interval for the determination of intensities. Finally, the likelihood approach avoids the severe bias of the usual distance determination due to the tacit assumption of uniform a priori probabilities for the FRET intensities, which implies, via Eq. (1), preferred distances near $r_0$. Possible extensions of the method concern position- and dye-dependent detection efficiencies. Because low count rates are also often encountered in many other types of single molecule experiments, we expect our approach to be of wide applicability. A software package that implements this theory (FRETtrace) can be downloaded from the webpage of the authors.
ACKNOWLEDGMENTS
We thank C. Seidel for providing his FRET data, for valuable discussions, and for carefully reading the manuscript. This work was supported by the Volkswagen Foundation, Grant No. I/75 321.
[1] T. Förster, Ann. Phys. (Leipzig) 2, 55 (1948).
[2] B. W. van der Meer, G. Cooker, and S.-Y. Chen, Resonance Energy Transfer: Theory and Data (VCH, New York, 1994).
[3] S. H. Lin, W. Z. Xiao, and W. Dietz, Phys. Rev. E 47, 3698 (1993).
[4] S. Weiss, Science 283, 1676 (1999).
[5] S. Weiss, Nat. Struct. Biol. 7, 724 (2000).
[6] T. J. Ha, A. Y. Ting, J. Liang, W. B. Caldwell, A. A. Deniz, D. S. Chemla, P. G. Schultz, and S. Weiss, Proc. Natl. Acad. Sci. U.S.A. 96, 893 (1999).
[7] W. E. Moerner and Michel Orrit, Science 283, 1670 (1999).
[8] X. Zhuang, L. E. Bartley, H. P. Babcock, R. Russell, T. Ha, D. Herschlag, and S. Chu, Science 288, 2048 (2000).
[9] N. L. Goddard, G. Bonnet, O. Krichevsky, and A. Libchaber, Phys. Rev. Lett. 85, 2400 (2000).
[10] M. Margittai, J. Widengren, E. Schweinberger, et al. (unpublished).
[11] X. Zhuang, H. Kim, M. J. B. Pereira, H. P. Babcock, N. G. Walter, and S. Chu, Science 296, 1473 (2002).
[12] T. J. Ha, A. Y. Ting, J. Liang, A. A. Deniz, D. S. Chemla, P. G. Schultz, and S. Weiss, Chem. Phys. 247, 107 (1999).
[13] R. Kühnemuth and C. A. M. Seidel, Single Mol. 2, 251 (2001).
[14] E. Barkai, Y. J. Jung, and R. Silbey, Phys. Rev. Lett. 87, 207403 (2001).
[15] C. Eggeling, J. R. Fries, L. Brand, R. Gunther, and C. A. M. Seidel, Proc. Natl. Acad. Sci. U.S.A. 95, 1556 (1998).
[16] F. G. Ball, Y. Cai, J. B. Kadane, and A. O'Hagan, Proc. R. Soc. London, Ser. A 455, 2879 (1999).
[17] W. A. Carrington, R. M. Lynch, E. D. W. Moore, G. Isenberg, K. E. Fogarty, and F. S. Fredric, Science 268, 1483 (1995).
[18] T. Dudok de Wit and E. Floriani, Phys. Rev. E 58, 5115 (1998).
[19] T. J. Loredo and D. Q. Lamb, Phys. Rev. D 65, 063002 (2002).
[20] J. Enderlein, Appl. Opt. 34, 514 (1995).
[21] C. Zander, M. Sauer, K. H. Drexhage, D.-S. Ko, A. Schulz, J. Wolfrum, L. Brand, C. Eggeling, and C. A. M. Seidel, Appl. Phys. B: Lasers Opt. 63, 517 (1996).
[22] J. Enderlein, P. M. Goodwin, A. Van Orden, W. P. Ambrose, R. Erdmann, and R. A. Keller, Chem. Phys. Lett. 270, 464 (1997).
[23] P. J. Rothwell, S. Berger, O. Kensch, S. Felekyan, M. Antonik, B. M. Wöhrl, T. Restle, R. S. Goody, and C. A. M. Seidel, Proc. Natl. Acad. Sci. U.S.A. 100, 1655 (2003).
[24] A. Brünger, Nature (London) 355, 472 (1992).