Piecewise polynomial regression with fractional residuals for the analysis of calcium imaging data


Dissertation submitted for the academic degree of Doctor of Natural Sciences

submitted by Arno Weiershäuser

at the

Section of Mathematics and Natural Sciences, Department of Mathematics and Statistics

Date of the oral examination: 6 February 2012

Referees

Prof. Dr. Jan Beran (Universität Konstanz), Prof. Dr. Holger Dette (Universität Bochum)

Konstanzer Online-Publikations-System (KOPS) URL: http://nbn-resolving.de/urn:nbn:de:bsz:352-188670


First and foremost, I am deeply grateful to my supervisor Prof. Dr. Jan Beran for his guidance, support and the excellent conditions at his chair throughout my entire studies at the University of Konstanz. His ideas and advice were essential in finishing this project.

I am also thankful for the opportunity to travel to many international conferences. In addition, my thanks go to Prof. Dr. Holger Dette for reviewing this thesis. Furthermore, I would like to thank my colleagues at the Beran group for their professional and personal support during the last four years. Special thanks go to Prof. Dr. Giovanni Galizia and his group for providing the data and for valuable discussion regarding the biological context.

In particular, I would like to thank Martin Strauch for his enduring patience in answering my questions and Daniel Münch for his suggestions on the biological introduction of this thesis. Lastly, my thanks go to Gareth Cox and Loudina Erasmus for proofreading large parts of the manuscript.


In this work we deal with the mathematical analysis and application of piecewise (or segmented) polynomial regression. Motivated by an application in neurobiology we allow the residual processes of our model to exhibit long memory, short memory or antipersistence.

As a solid biological background is essential for understanding the application in this work, we start with an introduction to neurobiology and the related experimental techniques. We conclude this introduction with a sample of data sets by means of which we illustrate piecewise polynomial regression.

Thereafter, we discuss least squares estimation with piecewise polynomials when the residuals exhibit antipersistence, short memory or long memory. This purely mathematical discussion is completely detached from the initial biological application. We start with an introduction to the related mathematical foundations and then discuss consistency and the asymptotic properties of the least squares estimator. In addition to the usual least squares estimator, we also treat the weighted least squares estimator. The asymptotic distribution is represented as a stochastic integral with respect to a fractional Brownian motion (in the case of antipersistence, short memory or long memory) or as a stochastic integral with respect to a Hermite process (in the case of long memory). We derive our results by means of fractional calculus, which allows us to state a unifying formula for the asymptotic covariance matrix that covers all three correlation structures. In the case of an unknown number of segments, we show that an information criterion can be used to estimate this number. However, as the precise normalisation of the information criterion depends on the underlying correlation structure of the residuals, the latter result is only of theoretical interest.

We conclude this work by applying the derived methods to a large biological data set. In this analysis, we apply piecewise polynomials to estimate the trend function of temporal response patterns. These estimates then serve as input for an errors-in-variables regression model.


This thesis deals with the mathematical analysis of piecewise (segmented) polynomial regression models with different correlation structures (long memory, short memory and antipersistence) in the residuals, as well as with the application of these models to the coding of olfactory information in the insect brain. Since a certain amount of biological background knowledge is necessary for understanding the application, basic concepts from neurobiology and the associated experimental methodology are introduced first. The biological introduction closes with a presentation of some sample data sets which illustrate the possibilities of piecewise polynomial regression.

Detached from this application, the two subsequent chapters contain a purely mathematical discussion of least squares estimation with piecewise polynomials. After an introduction to the basic mathematical concepts, consistency and the asymptotic properties of the least squares estimator are discussed. In addition to the usual least squares estimator, this discussion also covers the weighted least squares estimator. The asymptotic distribution is represented by means of a fractional Brownian motion (in the case of antipersistence, short memory or long memory) or of a Hermite process (in the case of long memory). The proofs rely essentially on the methods of fractional calculus. Among other things, this makes it possible to give a unified representation of the asymptotic covariance matrix of the least squares estimator for all three correlation structures. In the case of an unknown number of segments, it is shown that this number can in theory be estimated consistently by means of an information criterion. However, since the exact normalisation of the information criterion depends on the underlying correlation structure of the residuals, this last result is only of theoretical significance.

The dissertation closes with an application of the derived methods to a large biological data set. Here, piecewise polynomials are used for trend estimation in neurobiological time series. These estimates are subsequently compared by means of an errors-in-variables model.


Contents

I Introduction

II Fractional and self-similar processes
1 Preliminaries
1.1 Basic definitions
1.2 Self-similar processes
1.3 Limit theorems for partial sums
2 Fractional calculus
2.1 Fractional integrals on a compact interval
2.2 Fractional derivatives on a compact interval
2.3 Fractional integrals on the real axis
2.4 Fractional derivatives on the real axis
2.5 Miscellaneous results
2.6 Fourier transform and inner product spaces
3 Limit theorems for weighted sums
3.1 Stochastic integrals with respect to self-similar processes
3.2 Asymptotics for weighted sums

III Regression with piecewise polynomials
4 Basic concepts
4.1 Piecewise polynomials
4.2 Least squares regression
5 Limit theorems
5.1 Consistency
5.2 Asymptotic distribution
5.3 Estimating the dimension of the model
6 Simulation studies
6.1 Asymptotic covariance matrix
6.2 Consistency of the information criterion

IV Application
7 Experimental procedure
8 Descriptive analysis
8.1 A first graphical analysis
8.2 AUC
9 A rigorous analysis
9.1 An introduction to errors-in-variables regression
9.2 Two errors-in-variables regression models
9.3 Fitting the model
9.4 Results

V Supplemental material
10 Tables and figures
10.1 Tables and figures for chapter III
10.2 Tables and figures for chapter IV
11 R and Maple code
11.1 Maple code for chapter III
11.2 R code for chapter III
11.3 R code for chapter IV

List of Tables

List of Figures

Bibliography


Introduction

In neurobiology, problems of time series analysis occur on a regular basis. In this thesis, we deal with the neuronal representation of odours in the brain of the honeybee Apis mellifera. Although major advances have been made during the last 15 years, the olfactory system¹ of insects (including that of the honeybee) is still only partially understood and subject to ongoing research.

As we will deal with rather specific neurobiological problems, we start with a brief introduction to neurons in general and the olfactory system of insects in particular. General references on neuroscience are, for example, [DMS01, Hal92, BCP96]. The material about the olfactory system presented here is primarily taken from [Gal08]. For additional reading in this regard, we recommend [GM00, Sac02, SG06].

Figure 1 shows four neurons with some characteristic features. Typically, several branches, so-called dendrites, arise from the central cell body of the neuron. Most neurons possess one axon which transmits the output signal of the neuron. In most cases, the axon ends in a synapse which connects the axon to a dendrite of another neuron, the receiving neuron. The axon branches several times so the output signal is sent to many different neurons simultaneously. To communicate with other neurons, a stimulated neuron sends an electrical impulse, termed action potential, along the membrane of its axon.

The entire membrane of a neuron is equipped with several ion channels, in particular voltage-activated calcium channels. When reaching the axon terminal, the electrical impulse temporarily opens the calcium channels there and calcium ions from the surrounding extracellular fluid flow into the axon terminal. This presynaptic calcium influx causes the axon terminal to release chemical neurotransmitters which in turn stimulate the receptors of the postsynaptic neuron. Depending on the neurotransmitter, the subsequent (postsynaptic) neuron is either inhibited or excited. If the excitation is sufficiently strong, the postsynaptic neuron itself sends an action potential along its membrane. Again, voltage-activated calcium channels open and calcium flows into the postsynaptic neuron. This latter influx is referred to as the postsynaptic calcium influx. As we will see in the next section, this postsynaptic calcium influx can be used in experiments to visualise neural activity.

¹ The olfactory system is the collectivity of all receptors, neurons, and brain centers used for olfaction, i.e. the sense of smell.

[Figure 1 labels: nucleus, axons, dendrites, synapses, axon terminal (presynaptic element), postsynaptic dendrite, receptors, synaptic cleft, Ca²⁺]

Figure 1 Schematic view of four neurons with close-up view of a synapse. Grey circles indicate the nucleus of each neuron and red circles indicate the synapses. Additional organelles (e.g. mitochondria) are omitted. Ca²⁺ indicates a higher concentration of calcium ions in the extracellular fluid compared to the interior of the neuron.

Figure 2(a) illustrates odour processing in a honeybee schematically: the primary olfactory sensors are the two antennae with approximately 60,000 to 65,000 olfactory sensory neurons each, see e.g. [GM00, Gal08]. These special neurons express olfactory receptors on their dendrites which allow them to interact with odour molecules. The honeybee has approximately 160 types of olfactory sensory neurons, each type being sensitive to different molecules and/or molecule groups. Once stimulated, the olfactory sensory neurons send electrical impulses along their axons into the first olfactory brain center, termed the antennal lobe, compare figures 2(b) and 2(c). In the antennal lobe, the signals of the olfactory sensory neurons are processed by a network of approximately 4,000 local interneurons and 800 projection neurons. The local interneurons primarily control the communication within the antennal lobe. Their axons branch inside the antennal lobe and synapse with other local interneurons and/or projection neurons, see [Gal08, page 735]. Projection neurons are responsible for transmitting the olfactory information to the lateral protocerebrum and the mushroom body, where the information is processed further. Their axons leave the antennal lobe and lead to these higher-order brain centers.

Figure 2 (a) Schematic view of odour processing in the honeybee. Indicated in blue: antenna with approximately 60,000 to 65,000 olfactory sensory neurons. Indicated in red: the two antennal lobes, each consisting of approximately 160 glomeruli with 4,000 local interneurons and 800 projection neurons. The projection neurons lead to higher-order brain centres, such as the mushroom bodies and the lateral protocerebrum. Adapted from http://neuro.uni-konstanz.de/. (b) Close-up computer model of the antennal lobe with glomeruli (coloured balls) and antennal nerve (blue). (c) Schematic view of the antennal lobe. The glomeruli are indicated by black circles. Within the glomeruli, the axons of the olfactory sensory neurons (OSN) synapse with the local neurons (LN) and the projection neurons (PN). Adapted from [SG06, page 257] by courtesy of MIT Press.

The synapses between olfactory sensory neurons, local interneurons, and projection neurons are not scattered randomly across the antennal lobe but organised in approximately 160 discrete spherical structures on the outer shell of the antennal lobe, the so-called glomeruli. Outside these glomerular structures, only “very few synapses are found”, see [Gal08, page 730]. The actual central cell bodies, termed soma (or somata in plural), are clustered outside the antennal lobe. Although all insects share this basic structure with the honeybee, the total numbers of glomeruli, local interneurons and projection neurons differ widely among species.

Whilst the general structure of the antennal lobe is well known, its precise function is far less understood. As a general rule for insects (including the honeybee), each glomerulus corresponds to a single type of olfactory sensory neuron. In other words, all axons of olfactory sensory neurons sharing the same response profile converge onto the same glomerulus.² Depending on the actual composition of the odour, different types of olfactory sensory neurons are activated, which in turn leads to an increase or decrease of the neuronal activity in different glomeruli. These so-called glomerular activity patterns are believed to play an important role in the representation of odours within the brain, see [Gal08, pages 742-748]. The processing of odours within the antennal lobe is far more complex than just averaging within each receptor type. It is, for example, well known that the 4,000 local interneurons have to a large extent an inhibitory effect on other local interneurons, projection neurons, and olfactory sensory neurons, see [SG06, pages 235-255]. Moreover, the activation and deactivation of a glomerulus is not a binary on/off operation but a continuous one, as “each glomerulus can be activated to a certain degree” [Gal08, page 742]. The analysis of these activity patterns is the starting point of this thesis.

To visualise the neural activity patterns in insects, calcium imaging techniques are used as introduced in [GJK⁺97]. Briefly, the neurons of interest are stained with a calcium-sensitive dye. Then, by analysing the emission spectrum of the stained neurons, one can infer the concentration of calcium within the neurons.

² A well-known exception to this rule is the locust, see [Gal08, page 731].

The general scope of calcium imaging reaches far beyond olfactory research in insects, which makes it impossible to discuss this technique in full generality. Rather, we restrict ourselves to the experimental setup which was used to obtain the data discussed in this thesis. General references on calcium imaging are e.g. [VP10, Dei00, MC91]. For technical details on calcium imaging in olfactory research in insects, see [GV04]. A technical description of the particular experiment can be found in [RMS⁺]. Other experiments with the same staining protocol can be found in [SG03] and [Szy05, chapter 1].

During the pre-experimental treatment, the projection neurons of the right antennal lobe of several honeybees are stained with the calcium-sensitive dye Fura-2 dextran. Under normal circumstances, the absorption spectrum of Fura-2 molecules is maximal at 365 nm. However, once inside the cell, Fura-2 binds to free intracellular calcium. These Fura-2 calcium complexes have different fluorescence spectra, see figure 3 and [Dei00, page 31]. In particular, the maximum of the absorption spectrum is now attained at 340 nm. By comparing the intensity of the fluorescence emission under 340 nm excitation with that under 380 nm excitation, we can determine the concentration of calcium, see [Dei00, page 33].
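The ratiometric principle can be illustrated with a short sketch. It computes the ratio R of the emission under 340 nm excitation to that under 380 nm excitation and, as an optional second step, converts R into a concentration with the standard ratiometric calibration formula [Ca²⁺] = K_d · β · (R − R_min)/(R_max − R). The function names, the intensities and all calibration constants below are illustrative assumptions; they are not taken from the experiment discussed in this thesis.

```python
import numpy as np

def fura2_ratio(f340, f380, background=0.0):
    """Ratio R = F340 / F380 of background-corrected emission intensities."""
    f340 = np.asarray(f340, dtype=float) - background
    f380 = np.asarray(f380, dtype=float) - background
    return f340 / f380

def calcium_from_ratio(ratio, kd, r_min, r_max, beta):
    """Ratiometric calibration: [Ca] = kd * beta * (R - r_min) / (r_max - R).

    kd, r_min, r_max and beta have to be obtained from a calibration of the
    actual setup; the values used below are placeholders for illustration.
    """
    ratio = np.asarray(ratio, dtype=float)
    return kd * beta * (ratio - r_min) / (r_max - ratio)

# Hypothetical emission intensities for three frames (arbitrary units).
f340 = [120.0, 180.0, 150.0]
f380 = [240.0, 200.0, 220.0]
ratio = fura2_ratio(f340, f380, background=20.0)

# Placeholder calibration constants (assumptions, not measured values).
calcium = calcium_from_ratio(ratio, kd=224.0, r_min=0.2, r_max=5.0, beta=10.0)
```

Because the ratio is formed from two measurements of the same cell, it does not depend on the absolute amount of dye, which is the main reason for using two excitation wavelengths.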

Under this staining protocol, the visualised calcium fluctuations in the antennal lobe correspond mainly to an increase of intracellular calcium within the dendrites of the projection neurons caused by voltage-activated calcium channels, see [Szy05, page 33]. Galizia and Kimmerle [GK04] showed that the increase of calcium in the dendrites of projection neurons “is related to action potential rates” [Szy05, page 33]. Thus, these calcium signals “qualitatively reflect the electrical activity of their respective cell compartments, as would be measured electrophysiologically”, see [Szy05, page 33]. However, in contrast to electrophysiological measurements, where only a limited number of neurons are observed, calcium imaging allows for a simultaneous measurement of the entire antennal lobe while maintaining a sufficient spatial and temporal resolution, see e.g. [GJK⁺97, page 61] and [Sac02, page 7].

Once staining is completed, the bee is placed in a recording chamber (figure 4) and the actual imaging experiment starts. Depending on the specific protocol, the bee is subsequently exposed to odours and/or rest phases in order to record the spatio-temporal evolution of odour-evoked glomerular patterns and/or to analyse the background noise. Throughout the experiment the stained antennal lobe is exposed to monochromatic excitation light alternating between 340 nm and 380 nm. Exposure times range between 5 and 15 ms for 380 nm and between 20 and 60 ms for 340 nm. The resulting fluorescence emission of the antennal lobe is recorded in a digital video with a typical temporal resolution of five frames per second. Figure 5 shows a sample of five frames obtained during a calcium imaging experiment.

In the first step of the data analysis, the glomeruli on the video frames need to be identified. Algorithmic approaches to this problem have been proposed only recently, see [SG08, RMS⁺]. For related techniques in the context of research in mammals, see [RSO⁺07]. Before that, the identification had to be done manually by human experts. An algorithmic solution has to cope with several difficulties. First of all, one needs to project a three-dimensional structure (the antennal lobe) onto a two-dimensional structure (the video frames). Although the principal structure of the antennal lobe is the same for each bee, each individual is subject to biological variation (slightly different response behaviour and/or arrangement of the glomeruli). The problem is further complicated by possible experimental variation, e.g. different viewing angles on the antennal lobe, movement of the bee during the experiment and possible experimental artefacts.

Figure 3 Schematic view of calcium imaging. (a) Stained part of a neuron with calcium ions within the extracellular fluid. The blue rectangle indicates the cell body (nucleus, organelles, and calcium channels are omitted). Magenta balls indicate Fura-2 dye molecules within the cell body and grey balls indicate calcium ions. (b) The same part after an action potential has been released: some calcium ions have passed the cell membrane through the calcium channels and are now inside the cell. Some Fura-2 molecules have bound to the intracellular calcium ions; some calcium ions and Fura-2 molecules remain free. (c) If the cell is exposed to monochromatic excitation light at 340 nm, bound Fura-2 molecules emit more strongly than free Fura-2 molecules. (d) Excitation is repeated with 380 nm. This time, free Fura-2 molecules emit more strongly than the bound ones. By comparing the two emission pictures, the concentration of calcium can be calculated.

Figure 4 Close-up view of the experimental setup: odour delivery device (A), bee fixed in a plexiglass box (B) and microscope objective (C).

Figure 5 Sample of five non-consecutive frames obtained during a calcium imaging experiment with superimposed map of identified glomeruli. Red colours indicate a high intracellular calcium concentration in response to an odour stimulus. Glomeruli are identified after the actual experiment by means of a manual analysis and/or identification algorithms.

The algorithmic approach proposed in [SG08] is based on independent component analysis as introduced in e.g. [CJ10, HO00]: each frame is considered as an n × m dimensional random vector, where n × m denotes the number of pixels of each frame. A large number, say p, of independent components is then estimated from the data. Each independent component results in a single grey scale image of the same dimension as the original frames, with the colour of each pixel representing the weight of its contribution to the independent component. In general, the estimated components turn out to be sparse, i.e. they often contain a single anatomic component and some background noise. To extract the object from the noise, Otsu thresholding as in [Ots79] is applied: only those pixels of the picture are kept whose contribution to the independent component is sufficiently strong. Note that in addition to glomeruli, other anatomic objects and/or experimental artefacts are identified as well. These non-glomerular components are filtered out by different criteria (e.g. size and circularity). The remaining objects are the basis of the map as shown in figure 5. Popular extensions of this algorithm include a dimension reduction by a principal component analysis before performing the independent component analysis and a correction for movement of the bee during the experiment.
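The thresholding step can be sketched as follows. Otsu's method chooses the threshold that maximises the between-class variance of the two groups of pixel values it separates. The function name, the histogram resolution and the synthetic component image (weak noise plus one bright disc) are assumptions made for illustration; the sketch does not reproduce the implementation used in [SG08].

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """Return the threshold that maximises the between-class variance."""
    values = np.asarray(values, dtype=float).ravel()
    hist, edges = np.histogram(values, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    weights = hist / hist.sum()

    w0 = np.cumsum(weights)                  # probability of the lower class
    w1 = 1.0 - w0                            # probability of the upper class
    cum_mean = np.cumsum(weights * centers)  # unnormalised mean of lower class
    total_mean = cum_mean[-1]

    valid = (w0 > 0) & (w1 > 0)
    between = np.zeros_like(w0)
    between[valid] = (total_mean * w0[valid] - cum_mean[valid]) ** 2 / (
        w0[valid] * w1[valid]
    )
    k = int(np.argmax(between))
    return edges[k + 1]                      # upper edge of the last "noise" bin

# Synthetic stand-in for one independent component: noise plus one object.
rng = np.random.default_rng(0)
component = rng.normal(0.0, 0.05, size=(40, 40))
yy, xx = np.mgrid[:40, :40]
blob = (yy - 20) ** 2 + (xx - 20) ** 2 <= 5 ** 2
component[blob] += 1.0

threshold = otsu_threshold(component)
mask = component > threshold                 # pixels kept as part of the object
```

Criteria such as size and circularity would then be evaluated on the connected regions of `mask`.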

A graph-based algorithm is then used to match this empty map optimally with a three-dimensional atlas of the antennal lobe. Several atlases are available in the literature, see for example [GMM99]. Briefly, the generated map is represented as a graph G_m with the unidentified glomeruli acting as nodes and an edge being drawn between two nodes if the corresponding glomeruli touch each other on the map. Likewise, the theoretical atlas is represented as a graph G_a, again with edges being drawn between neighbouring glomeruli. Then, a branch and bound strategy is applied to search G_a for subgraphs which are isomorphic or at least close to isomorphic to G_m. To allow for biological variation, violations in the edge configuration are admitted but penalised by a scoring function. The branch and bound algorithm thus aims to find a subgraph with a low overall score. This procedure can be supported by employing certain marker odours (e.g. nonanol). These chemicals trigger a characteristic, well-known response pattern which allows the experimenter to identify a certain glomerulus on the map a priori. In case the search algorithm yields many solutions with similar scores, those contradicting this a priori information can immediately be discarded. Alternatively, the identified glomeruli can be used as seed nodes for the branch and bound search.
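A toy version of such a search is sketched below, with glomeruli as graph nodes and neighbourhood relations as edges. The penalty used here (the number of map edges without a counterpart in the atlas), the function name and the two small example graphs are assumptions for illustration only; the scoring function of the actual procedure is not specified in this text. The optional argument `fixed` mimics a glomerulus identified a priori by a marker odour.

```python
def match_map_to_atlas(map_edges, n_map, atlas_edges, n_atlas, fixed=None):
    """Assign each map node to a distinct atlas node with minimal penalty.

    Penalty = number of map edges whose image is not an edge of the atlas.
    `fixed` optionally pins map nodes to atlas nodes (a priori information).
    """
    atlas = {frozenset(e) for e in atlas_edges}
    neighbours = [[] for _ in range(n_map)]
    for u, v in map_edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    fixed = fixed or {}
    best = {"score": float("inf"), "assignment": None}
    assignment = {}

    def extend(node, score):
        if score >= best["score"]:           # bound: branch cannot improve
            return
        if node == n_map:                    # every map node is assigned
            best["score"], best["assignment"] = score, dict(assignment)
            return
        candidates = [fixed[node]] if node in fixed else range(n_atlas)
        for target in candidates:
            if target in assignment.values():
                continue                     # the mapping must be injective
            # count violated edges towards neighbours that are already placed
            extra = sum(
                1
                for nb in neighbours[node]
                if nb in assignment
                and frozenset((target, assignment[nb])) not in atlas
            )
            assignment[node] = target
            extend(node + 1, score + extra)
            del assignment[node]

    extend(0, 0)
    return best["score"], best["assignment"]

# Hypothetical atlas with five glomeruli and a map with three glomeruli
# that all touch each other.
atlas_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 3)]
map_edges = [(0, 1), (1, 2), (0, 2)]

score, assignment = match_map_to_atlas(map_edges, 3, atlas_edges, 5)
score_fixed, assignment_fixed = match_map_to_atlas(
    map_edges, 3, atlas_edges, 5, fixed={0: 3}
)
```

Pinning a node in advance shrinks the search tree, which is the role the marker odours play in the procedure described above.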

Finally, signals are extracted from the centre of each identified glomerulus. To that end, a spatial average over a radius of, for instance, 5 pixels is computed around the centre of each object. So, as the outcome of each experiment, we obtain a multivariate time series with each dimension accounting for an identified glomerulus.
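The extraction step amounts to averaging, frame by frame, over a small disc around each centre. The following sketch assumes that the video is available as an array of shape (frames, height, width); the function name, the synthetic video and the two centre positions are hypothetical.

```python
import numpy as np

def extract_signal(frames, centre, radius=5):
    """Mean intensity within `radius` pixels of `centre` for every frame.

    frames: array of shape (n_frames, height, width); centre: (row, col).
    """
    frames = np.asarray(frames, dtype=float)
    rows, cols = np.ogrid[: frames.shape[1], : frames.shape[2]]
    disc = (rows - centre[0]) ** 2 + (cols - centre[1]) ** 2 <= radius ** 2
    return frames[:, disc].mean(axis=1)

# Synthetic video: noise around a constant level, with a local increase
# from frame 30 onwards in the region around the first centre.
rng = np.random.default_rng(1)
frames = rng.normal(100.0, 1.0, size=(110, 40, 40))
frames[30:, 14:27, 14:27] += 5.0

centres = {"glomerulus A": (20, 20), "glomerulus B": (6, 33)}
series = np.column_stack([extract_signal(frames, c) for c in centres.values()])
```

Each column of `series` is one component of the resulting multivariate time series.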

Figures 6, 7 and 8 show typical outcomes of calcium imaging experiments. The samples in figures 6 and 7 were obtained in an experiment which was designed to study the background noise before and after an odour stimulus. In total, 6000 frames (20 minutes) were observed; the odour stimulus was given at frame 3000. Figure 7 shows three commonly observed response behaviours: the most frequent reaction to a stimulus is a sharp increase of the activity followed by a gradual decay (glomerulus 9). An inhibitory reaction to an odour stimulus (glomerulus 2) or no reaction at all (glomerulus 7) are also regularly observed. Figure 8 shows a small sample of a large series of experiments in which the response behaviour itself was studied under varying experimental conditions. Here, only 110 frames were observed per measurement and the stimulus was given after 6 seconds (frame 30).

Typical questions associated with either of the two experiments are, for example: (a) Does a specific glomerulus react at all? (b) How long does the reaction of a glomerulus last? and (c) Is a particular response stronger or weaker than a response measured under different experimental conditions? The first step in answering these questions is to provide an adequate model for the mean function. The samples presented thus far illustrate some aspects one has to bear in mind when modeling glomerular response data: on the one hand, the patterns exhibit quite a large biological variability between different datasets. On the other hand, the variation within one dataset is also high. The actual variability of the biological system is potentially further increased by measurement errors and/or misidentification of the glomeruli. Very likely, a parametric model will not be able to reproduce all the different shapes observed in the different datasets. A classical nonparametric model (kernel estimate, smoothing spline) will either fail to capture the sharp peaks or almost interpolate the data, depending on the choice of the bandwidth parameter.

[Figure 6 panels: (a) and (b) "Odour response" (fluorescence against frame); (c) "Residuals" (residuals against frame); (d) "Normal qq-plot of residuals" (quantiles of residuals against quantiles of the standard normal distribution); (e) "Spectrum (log-log scale)" (power against λ)]

Figure 6 Sample of a response pattern as presented in [BW11]. (a) Frames 2000 to 4000; the reaction to the stimulus at t₀ = 3000 is clearly visible. (b) Frames 3005 to 3605 directly after the stimulus. The red line indicates the least squares fit of a quadratic spline with the free knot positioned at 3093.2. (c) Residuals of the spline fit and (d) corresponding Q-Q plot. (e) Periodogram of the residuals in log-log scale with the spectral density of the fitted FARIMA(1,d,0) model. The order p of the AR part was determined by the Bayesian information criterion. The estimate of d is significantly different from zero.

[Figure 7 panels: "Glomerulus 9", "Glomerulus 2" and "Glomerulus 7", each shown for frames 0 to 6000 and for frames 2600 to 3400 (horizontal axis: frame)]

Figure 7 Three samples of odour responses. Shown are the entire series (6000 frames) and a close-up of the period before and after the stimulus (800 frames). The red line indicates the least squares fit of piecewise cubic polynomials (glomerulus 2 and 9) and a global cubic polynomial (glomerulus 7). Red points indicate the location of knots. Note that in this figure, the numbering of glomeruli is done just for convenience of notation without any anatomic significance.

[Figure 8 axes: fluorescence against frame in (a) and (b), power against λ in (c). Legible panel titles in (a): jr090702bee77 [glomerulus 48] hexanol (10⁻² M) [control / 1 mM octopamine]; jr090702bee80 [glomerulus 33] nonanol (10⁻² M) [control / w/o octopamine]; jr090702bee80 [glomerulus 33] nonanol (10⁻² M) [control / 10 mM octopamine]; jr090707bee88 [glomerulus 60] heptanone (10⁻² M) [dsAmOA1 / w/o octopamine]; jr090713bee140 [glomerulus 36] heptanone (10⁻² M) [control / 1 mM octopamine]; jr090807bee182 [glomerulus 60] hexanol (10⁻² M) [dsAmOA1 / 1 mM octopamine]; jr090830bee322 [glomerulus 36] nonanol (10⁻⁴ M) [control / 1 mM octopamine]]

Figure 8 (a) Sample of response patterns to various odorant stimuli. (b) First 45 frames of each series. The red line indicates the least squares fit of a linear spline with one fixed knot (at frame 30) and one free knot. (c) Periodogram in log-log scale of the corresponding residuals. Also displayed are spectral densities obtained from a FARIMA(0,d,0) fit (red line), an AR(1) fit (green line) and an MA(1) fit (blue line). The data and the related experiment are discussed in detail in chapter IV.

Piecewise models offer a good compromise between the parametric and the nonparametric approach. In our case, spline functions or, more generally, piecewise polynomials as introduced in chapter III turn out to be fruitful. Briefly, the total observation period is divided into two or more intervals. Within each interval, the trend function is then modeled by a polynomial. We usually add a global regularity constraint which ensures that the trend function remains at least continuous at the boundaries of the intervals. These boundaries are typically referred to as knots or joins.
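As an illustration only, and as an assumption about notation rather than the definition given in chapter III, a continuous linear spline with two knots $\eta_1 < \eta_2$ can be written in the truncated power basis. The symbols $a_0, a_1, b_1, b_2$ and $\eta_1, \eta_2$ are introduced here purely for this sketch.

```latex
\mu(t) = a_0 + a_1 t + b_1 (t - \eta_1)_+ + b_2 (t - \eta_2)_+ ,
\qquad (x)_+ = \max(x, 0).
```

In this form continuity at the knots holds automatically, and $b_j$ is the change in slope at $\eta_j$. For fixed knots the function is linear in $(a_0, a_1, b_1, b_2)$; it is nonlinear only in the knot positions.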

In figure 7, a fixed knot was placed at frame 3000, the time of the stimulus. Within the first interval we used a cubic polynomial, and the second interval was modeled by a cubic spline with three free knots. At the fixed knot at frame 3000 we impose only continuity, not differentiability. All free parameters were chosen by minimising the squared error loss. As the model is nonlinear in the knot parameters, we applied a grid search to approximate their optimal position. In figure 6, the model is fitted to an observation period of 600 frames after the stimulus. We fit a quadratic spline with one free knot, i.e. two quadratic polynomials with the global function being continuously differentiable at the knot. Fitting of the parameters is done as before. Finally, figure 8 shows the least squares fits of a linear spline to the first 45 frames of each series. As before, the first knot is fixed at the time of the odour stimulus. The position of the second knot is again determined by a grid search.
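The grid search described above can be sketched as follows for the simplest case, a linear spline with one fixed and one free knot. This is a minimal illustration on synthetic data, not the code used for the figures: the function names `design_matrix` and `fit_free_knot`, the candidate grid and the simulated series are all assumptions made for this example. For each candidate knot the remaining coefficients enter linearly and are obtained by ordinary least squares; the knot with the smallest residual sum of squares is kept.

```python
import numpy as np

def design_matrix(t, knots):
    """Truncated power basis of a continuous linear spline with the given knots."""
    columns = [np.ones_like(t), t]
    columns += [np.maximum(t - k, 0.0) for k in knots]
    return np.column_stack(columns)

def fit_free_knot(t, y, fixed_knot, candidates):
    """Grid search over the free knot; linear coefficients by least squares."""
    best = None
    for knot in candidates:
        X = design_matrix(t, [fixed_knot, knot])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ coef) ** 2))
        if best is None or rss < best[0]:
            best = (rss, float(knot), coef)
    return best

# Synthetic series (hypothetical, not the thesis data): flat until frame 30,
# rising until frame 37, then slowly decaying.
rng = np.random.default_rng(0)
t = np.arange(1.0, 46.0)
true_mean = 0.5 * np.maximum(t - 30.0, 0.0) - 0.8 * np.maximum(t - 37.0, 0.0)
y = true_mean + rng.normal(scale=0.05, size=t.size)

# Candidate positions for the free knot, strictly after the fixed knot.
candidates = np.arange(31.0, 44.5, 0.5)
rss, knot_hat, coef_hat = fit_free_knot(t, y, fixed_knot=30.0, candidates=candidates)
print(f"estimated free knot: {knot_hat}, RSS: {rss:.4f}")
```

The grid resolution bounds the accuracy of the knot estimate, so in practice one might refine the grid around the first minimiser. With several free knots the same idea applies, but the number of candidate combinations grows quickly.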

In all cases, we see that piecewise polynomials capture the shapes of the data reasonably well. In addition to these good empirical findings, piecewise models offer two further, more conceptual advantages. First, the position of a knot has an intuitive interpretation as the time at which the glomerulus passes from one state to another. For example, in figure 6 the location of the knot may be interpreted as the approximate time when the effect of the stimulus ceases to be noticeable. Secondly, adding knots at certain time points (e.g. the time of the stimulation) or within certain periods naturally allows us to incorporate a priori knowledge in our model.

The log-log periodogram of the residuals in figure 6(e) shows a negative slope, which is a heuristic indicator for long memory in the residuals, see [Ber94]. Fitting a fractional autoregressive process (see 1.1.7) by means of maximum likelihood as defined in [HR89] in R (package fracdiff), with model choice based on BIC (see [BBO98]), yields an autoregressive order $p = 1$. The estimated value of $d$ together with a 95% confidence interval is $0.219 \pm 0.177$, and for the autoregressive parameter we have $\hat{\varphi}_1 = 0.481 \pm 0.198$. Figure 6(e) shows good agreement between the fitted spectral density and the periodogram. The log-log periodograms in figure 8(c) show a large variation in behaviour. In particular, some of them show a positive slope at the origin, which is a heuristic indicator for antipersistent behaviour of the residuals. We will conduct a detailed analysis of this dataset in chapter IV.
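The slope heuristic can be illustrated with a short sketch. It is not the maximum likelihood fit from the fracdiff package mentioned above; it only computes the periodogram and regresses its logarithm on the log frequency at the lowest Fourier frequencies. For a FARIMA(0,d,0) process the spectral density behaves like a constant times $\lambda^{-2d}$ near the origin, so the slope is roughly $-2d$: negative under long memory, positive under antipersistence. The function names, the truncated moving average simulation and all parameter values below are assumptions chosen for this example.

```python
import numpy as np

def periodogram(x):
    """Periodogram at the Fourier frequencies 2*pi*j/n, j = 1, ..., (n-1)//2."""
    x = np.asarray(x, dtype=float)
    n = x.size
    dft = np.fft.fft(x - x.mean())
    m = (n - 1) // 2
    freqs = 2.0 * np.pi * np.arange(1, m + 1) / n
    power = np.abs(dft[1:m + 1]) ** 2 / (2.0 * np.pi * n)
    return freqs, power

def loglog_slope(x, n_low):
    """Least squares slope of log periodogram on log frequency, lowest n_low frequencies."""
    freqs, power = periodogram(x)
    slope, _ = np.polyfit(np.log(freqs[:n_low]), np.log(power[:n_low]), 1)
    return float(slope)

def simulate_fd(n, d, rng, n_coef=5000):
    """Approximate FARIMA(0,d,0) sample via a truncated MA(infinity) representation."""
    k = np.arange(1, n_coef)
    # Coefficients of (1 - B)^(-d): psi_0 = 1, psi_k = psi_{k-1} * (k - 1 + d) / k.
    psi = np.concatenate(([1.0], np.cumprod((k - 1 + d) / k)))
    eps = rng.normal(size=n + n_coef - 1)
    return np.convolve(eps, psi, mode="valid")

rng = np.random.default_rng(1)
x_long = simulate_fd(4096, 0.4, rng)    # long memory, d > 0
x_anti = simulate_fd(4096, -0.4, rng)   # antipersistence, d < 0

slope_long = loglog_slope(x_long, n_low=100)
slope_anti = loglog_slope(x_anti, n_low=100)
print(f"slope (d = 0.4): {slope_long:.2f}, slope (d = -0.4): {slope_anti:.2f}")
```

The number of low frequencies used in the regression is a tuning choice; with short series such as the 45-frame segments in figure 8 the slope is very noisy, which is one reason to treat it as a heuristic only.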

The findings of the previous paragraph suggest that a piecewise polynomial might be a suitable model for the mean structure of the glomerular time series. At the same time, however, the assumption of iid residuals in these time series seems very unrealistic. Rather, an adequate model needs to incorporate long memory and even antipersistent noise as well.

This leads to the main problem of this thesis, namely the asymptotic properties of the least squares estimator of a piecewise polynomial with long-range dependent or antipersistent noise. More generally, these models belong to the class of nonlinear regression models or change point problems. Discussing this topic in the desired generality requires considerable theoretical preparation. The main part of chapter II is devoted to these preparations. We introduce the main objects of this thesis, namely fractional processes together with their corresponding limit theorems. In this context, we also study related concepts, e.g. Hermite and Appell polynomials, and give a rather detailed introduction to fractional calculus. We also recall the stochastic integral introduced by Pipiras and Taqqu [PT00b], which allows us to prove a limit theorem for linear regression or, more generally, for weighted sums of a fractional process.

Chapter III contains the main results of this thesis. We introduce piecewise polynomials and study least squares estimation rigorously. For the identified model we discuss asymptotic well-definition, consistency and asymptotic linearity of the least squares estimator in detail. Our discussion includes both the weighted and unweighted least squares estimator, where weight functions are chosen as in [Dah95]. Once asymptotic linearity has been established, the asymptotic distribution follows from the results in chapter II.

Furthermore, we give a partial result on the use of information criteria in our setting