4.13 Discussion

Our comparisons include the Laplace approximation and the expectation propagation algorithm [Kuss and Rasmussen, 2005]. We extend the latter to the cumulative logistic likelihood.

We apply Kullback-Leibler divergence minimisation to Gaussian process classification and derive an efficient Newton algorithm. Although the principles behind this method have been known for some time, we are not aware of a previous implementation of this method for GPs in practice. The existing variational method [Gibbs and MacKay, 2000, Jaakkola and Jordan, 1996] is extended by a lower bound on the cumulative Gaussian likelihood, and we provide an implementation based on Newton's method. Furthermore, we give a detailed analysis of the factorial variational method [Csató et al., 2000].
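
As a sketch of the objective being minimised (the notation here is assumed rather than taken from the text: Gaussian approximation q(f) = N(f; m, V), GP prior p(f) = N(f; 0, K), likelihood p(y|f) and marginal likelihood Z), the KL divergence to the posterior decomposes as

    KL(q || p) = E_q[log q(f)] - E_q[log p(y|f)] - E_q[log p(f)] + log Z,

where, among the terms depending on (m, V), only the middle one involves the likelihood; Newton's method then optimises this objective jointly over (m, V).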

All methods are considered in a common framework: approximation quality is assessed, predictive performance is measured, and model selection is benchmarked.

In practice, an approximation method has to satisfy a wide range of requirements. If runtime is the major concern or one is interested in error rate only, the Laplace approximation or label regression should be considered. But only expectation propagation and – although a lot slower – the KL method deliver accurate marginals as well as reliable class probabilities and allow for faithful model selection.

If an application demands a non-standard likelihood function, this also affects the choice of algorithm: the Laplace approximation requires derivatives of the likelihood, while expectation propagation and the factorial variational method need integrability with respect to Gaussian measures. The KL method, however, simply needs to evaluate the likelihood (see the sketch below), and known lower bounds naturally lead to the VB algorithm.
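
To make this point concrete, here is a minimal sketch (in Python, with hypothetical function names; not code from this thesis) of how an expectation of the form E_q[log p(y|f)] can be computed by Gauss-Hermite quadrature using nothing but pointwise likelihood evaluations:

    import numpy as np

    def gauss_hermite_expectation(log_lik, y, m, v, n=20):
        # E_{f ~ N(m, v)}[log_lik(y, f)] via Gauss-Hermite quadrature;
        # only pointwise evaluations of the likelihood are required.
        x, w = np.polynomial.hermite.hermgauss(n)  # nodes/weights for e^{-x^2}
        f = m + np.sqrt(2.0 * v) * x               # change of variables
        return np.dot(w, log_lik(y, f)) / np.sqrt(np.pi)

    # Example: cumulative logistic likelihood, evaluated pointwise only.
    log_logistic = lambda y, f: -np.log1p(np.exp(-y * f))
    print(gauss_hermite_expectation(log_logistic, y=+1, m=0.5, v=1.0))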

Finally, if the classification problem contains a lot of label noise (σ_f is small), the exact posterior distribution is effectively close to Gaussian. In that case, the choice of the approximation method is not crucial, since in the Gaussian regime they will all give the same answer.

For weakly coupled training data, the factorial variational method can lead to quite reasonable approximations.

A future goal remains an in-depth understanding of the properties of sparse and online approximations to the posterior, as well as coverage of a broader range of covariance functions.

Also, the approximation techniques discussed can be applied to non-Gaussian inference problems beyond the binary GP classification considered here, and there is hope that some of the insights presented may be useful more generally.

Chapter 5

Adaptive Compressed Sensing of Natural Images

Multivariate real-world signals are highly structured: for example, the redundancy contained in natural images, e.g. sparsity after some linear transform, can be used for compression without perceptible loss. As a consequence, one can store an image much more efficiently than an unstructured collection of independent pixels. However, typical image acquisition devices such as digital cameras are not aware of this structure during the acquisition process: they measure every pixel independently. Only later, when the image is stored, is redundancy exploited in compression schemes like JPEG.
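
To illustrate the kind of redundancy meant here, the following sketch (Python with the PyWavelets package; the wavelet choice and the kept fraction are illustrative assumptions, not parameters from this chapter) keeps only a small fraction of an image's wavelet coefficients and reconstructs:

    import numpy as np
    import pywt  # PyWavelets

    def compress_keep_fraction(img, keep=0.05, wavelet='db2', level=4):
        # Keep the `keep` fraction of wavelet coefficients largest in
        # magnitude; for natural images little visible detail is lost.
        coeffs = pywt.wavedec2(img, wavelet, level=level)
        arr, slices = pywt.coeffs_to_array(coeffs)
        k = max(1, int(keep * arr.size))
        thresh = np.sort(np.abs(arr), axis=None)[-k]  # k-th largest magnitude
        arr[np.abs(arr) < thresh] = 0.0               # discard the rest
        coeffs = pywt.array_to_coeffs(arr, slices, output_format='wavedec2')
        return pywt.waverec2(coeffs, wavelet)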

Recently, the research field of compressed sensing (CS) [Candès et al., 2006, Donoho, 2006a], with theoretical underpinnings from approximation theory [Ismagilov, 1974, Kashin, 1978, Garnaev and Gluskin, 1984], emerged. Its main goal is to exploit redundancy already in the acquisition process. The main result is that structured signals like images can be sampled below the Nyquist limit and still be reconstructed to satisfaction, if nonlinear reconstruction algorithms are used and regular undersampling designs are avoided. The randomised measurement design, however, is non-adaptive to the particular signal being measured.
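
As a self-contained illustration of this recipe (random measurements plus nonlinear reconstruction; the signal size, sparsity level and penalty below are illustrative assumptions), iterative soft-thresholding recovers a sparse vector from far fewer measurements than its dimension:

    import numpy as np

    def ista(Phi, y, lam=0.05, iters=500):
        # Iterative soft-thresholding for
        #   min_x 0.5*||Phi x - y||^2 + lam*||x||_1
        step = 1.0 / np.linalg.norm(Phi, 2) ** 2      # 1 / Lipschitz constant
        x = np.zeros(Phi.shape[1])
        for _ in range(iters):
            g = x - step * Phi.T @ (Phi @ x - y)      # gradient step
            x = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)
        return x

    rng = np.random.default_rng(0)
    n, m, k = 256, 80, 8                              # dim, measurements, nonzeros
    x_true = np.zeros(n)
    x_true[rng.choice(n, k, replace=False)] = rng.normal(size=k)
    Phi = rng.normal(size=(m, n)) / np.sqrt(m)        # random Gaussian design
    x_hat = ista(Phi, Phi @ x_true)                   # noiseless measurements
    print(np.max(np.abs(x_hat - x_true)))             # approximate recovery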

In this chapter, which is an extended version of Seeger and Nickisch [2008a], we address the CS problem within the general framework of statistical (Bayesian) experimental design. For particular natural images, we optimise the sub-Nyquist image measurement architecture so that the subsequently nonlinearly reconstructed image contains as much information as possible. We present experimental results shedding more light on how to make CS work for images.

In a large study using 75 standard images, we compare various CS reconstruction methods utilising random measurement filters from different ensembles to a number of techniques which sequentially search for these filters, including our own, and Bayesian projection optimisation [Ji and Carin, 2007]. Similar to Weiss et al. [2007], we find that a simple heuristic of measuring wavelet coefficients in a fixed, top-down order (sketched below) significantly outperforms CS methods using random measurements, even if modern CS reconstruction algorithms are applied; the approach of Ji and Carin [2007] performs even worse. Beyond that, we show that our efficient approximation to sequential Bayesian design can be used to learn measurement filters which indeed outperform measuring wavelet coefficients top-down. Our results show that the property of incoherence of a measurement design, which plays a central role in the “unstructured except for random sparsity” theoretical CS setting, bears little significance for measuring real natural images. As we will discuss in more detail, our findings indicate that certainly for natural images, but also for other signals with non-Gaussian but structured statistics, measurement designs can be optimised in a data-driven way from little concrete prior knowledge, with outcomes that can be significantly superior to uninformed or even coloured random designs.
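
A minimal rendering of the fixed top-down heuristic (Python with PyWavelets; the within-band ordering and wavelet choice are assumptions for illustration, not the exact protocol of this chapter):

    import numpy as np
    import pywt

    def topdown_wavelet_measure(img, n_meas, wavelet='db2', level=4):
        # Measure n_meas wavelet coefficients in a fixed coarse-to-fine
        # order; all unmeasured coefficients are set to zero.
        coeffs = pywt.wavedec2(img, wavelet, level=level)  # coarse -> fine
        budget = n_meas
        kept = []
        for c in coeffs:
            bands = [c] if isinstance(c, np.ndarray) else list(c)
            new = []
            for b in bands:
                flat = b.ravel().copy()
                take = min(budget, flat.size)
                flat[take:] = 0.0                          # beyond the budget
                budget -= take
                new.append(flat.reshape(b.shape))
            kept.append(new[0] if len(new) == 1 else tuple(new))
        return pywt.waverec2(kept, wavelet)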

The main property driving the design optimisation in our case is the ability of the Bayesian reconstruction method to maintain valid uncertainty beliefs about its point estimates at all times.
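
To indicate how such uncertainty beliefs can drive the design, here is a simplified Gaussian-linear stand-in (the actual method of this chapter uses a non-Gaussian sparse prior and approximate inference; the names and noise level below are hypothetical): candidate filters are scored by the information gain of their measurement, and the posterior covariance is updated by a rank-one formula.

    import numpy as np

    def next_filter(Sigma, candidates, noise_var=0.01):
        # Score each candidate filter phi by the information gain of
        # measuring phi^T x: 0.5 * log(1 + phi^T Sigma phi / noise_var).
        scores = [0.5 * np.log1p(phi @ Sigma @ phi / noise_var)
                  for phi in candidates]
        return int(np.argmax(scores))

    def update_posterior(Sigma, phi, noise_var=0.01):
        # Rank-one update of the posterior covariance after the
        # measurement phi^T x + noise has been acquired.
        s = Sigma @ phi
        return Sigma - np.outer(s, s) / (noise_var + phi @ s)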

The structure of the chapter is as follows. The experimental design approach to CS is introduced in section 5.1 and our image acquisition model is detailed in section 5.2. Our framework for approximate inference is described in section 5.3, where we also show how to apply it to large problems, especially for sequential acquisition. Other approaches to the same problem are reviewed in section 5.4. The empirical validation encompasses a series of experiments, comparing a range of adaptive compressed sensing methods on artificial data (section 5.5.1), and on the problem of measuring natural images (section 5.5.2).