

In the document Inverse Problems in Asteroseismology (pages 79–83)


1.4 Inverse Problems

1.4.1 Evolution Inversions

With the equations of Section 1.2 and some chosen initial conditions, we can simulate the life of a star and, at each step of the way, determine what observations of that star would yield. Thus we have a forward model M which is parameterized by initial conditions x and time τ, and yields data y:

M(x, τ) = y (1.127)

x = [M, Y0, Z0, αMLT, . . .] (1.128)
y = [L, Teff, [Fe/H], ν, . . .]. (1.129)

We now seek to interpret observations of a star in the context of the theory of stellar evolution. In other words, we seek the inverse function:

M−1(y) = [x, τ]. (1.130)

Of course, we can also seek a function that outputs additional quantities at the present age, such as the radius if it has not been observed. Several approaches have been taken to solve this problem, which I will now review.

Scaling Relations

A simple approach to estimate stellar properties is to “scale” them from solar values using the equations of stellar structure and pulsation. While such an approach does not solve the full evolution inversion problem, it shares a common goal of estimating (a more limited set of) properties such as the stellar mass.

A simple example comes from the Stefan-Boltzmann law (Equation 1.22).

Rewriting this equation in terms of ratios with respect to the solar values, we obtain

R/R⊙ = (L/L⊙)^(1/2) (Teff/Teff,⊙)^(−2) (1.131)

from which we can estimate an unknown stellar radius R from a measured stellar luminosity L and effective temperature Teff. In principle, this relation works; in practice, the luminosities of most stars are unknown, and effective temperatures are measured rather imprecisely (∼50 K uncertainty).
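Equation (1.131) amounts to a one-line computation. The following minimal sketch assumes the IAU nominal solar effective temperature of 5772 K; the function name is my own.

```python
# Radius from the Stefan-Boltzmann law in solar units:
# R/Rsun = (L/Lsun)**(1/2) * (Teff/Teff_sun)**(-2)
TEFF_SUN = 5772.0  # K (IAU nominal solar effective temperature)

def radius_scaling(L, Teff):
    """Estimate R/Rsun from luminosity L (in Lsun) and Teff (in K)."""
    return L**0.5 * (Teff / TEFF_SUN)**-2

# A star twice as luminous as the Sun at solar Teff is sqrt(2) times larger:
print(radius_scaling(2.0, TEFF_SUN))  # ≈ 1.414
```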

The same kind of manipulation can be applied to the asymptotic equations of stellar pulsation to obtain stellar masses and radii. From manipulation of Equations (1.62) and (1.66) we find (e.g., Kjeldsen and Bedding 1995):

Δν/Δν⊙ ≈ (M/M⊙)^(1/2) (R/R⊙)^(−3/2) (1.132)
νmax/νmax,⊙ ≈ (M/M⊙) (R/R⊙)^(−2) (Teff/Teff,⊙)^(−1/2) (1.133)

which hold to decent approximation. Viani et al. (2017) recently pointed out that the νmax scaling relation can be improved by including a term for the mean molecular weight.
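The two scaling relations can be solved simultaneously for the mass and radius. A sketch of that inversion follows, assuming the commonly adopted solar reference values Δν⊙ ≈ 135.1 μHz and νmax,⊙ ≈ 3090 μHz (the function name is my own):

```python
# Inverting the asteroseismic scaling relations
# Δν ∝ M^(1/2) R^(-3/2) and νmax ∝ M R^(-2) Teff^(-1/2) for M and R.
DNU_SUN, NUMAX_SUN, TEFF_SUN = 135.1, 3090.0, 5772.0  # μHz, μHz, K (assumed references)

def seismic_mass_radius(dnu, numax, teff):
    """Scaling-relation estimates of M/Msun and R/Rsun
    from Δν and νmax (in μHz) and Teff (in K)."""
    d = dnu / DNU_SUN
    n = numax / NUMAX_SUN
    t = (teff / TEFF_SUN)**0.5
    R = n * d**-2 * t          # from eliminating M
    M = n**3 * d**-4 * t**3    # from eliminating R
    return M, R
```

By construction, solar inputs return (M, R) = (1, 1); the relations degrade for evolved stars, as noted below.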

As stars evolve into giants, the assumption of homology breaks down more and more, leading to systematic errors as high as ∼15% (e.g., Gaulme et al. 2016).

By comparing theoretical red-giant model mode frequencies with those given by the scaling relations, Guggenberger et al. (2016, 2017) developed metallicity-dependent and mass- and metallicity-dependent corrections to the Δν scaling relation.

These scaling relations do not tell us about the age or evolution of the star.

We saw previously that the small frequency separation probes the sound-speed gradient, which on the main sequence is an indicator of the conditions in the core, and therefore of main-sequence age. The so-called C–D diagram shows the core-hydrogen abundance and stellar mass as a function of the frequency separations (Christensen-Dalsgaard 1984, see also Figure 1.26). If all stars had the solar abundances and solar mixing length, it would suffice to look up their mass and core-hydrogen abundance in this diagram. Since they do not, a more sophisticated approach is required.

Repeated Forward Modelling

A more involved approach to determining the properties of stars is through repeated forward modelling. Such an approach can also be applied to non-solar-like stars (e.g., evolved stars) where homology relations break down. These methods still make no attempt to determine the function M−1. Though there are variations, they instead try to optimize the result of the forward operator against the observations:

[x̂, τ̂] = arg min_{[x,τ]} [M(x, τ) − y]^T Σy^(−1) [M(x, τ) − y] (1.134)

where ·̂ denotes the optimal value of ·, and Σy is the covariance matrix of the observations.
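This χ²-style objective can be sketched with a toy stand-in for the forward model. Everything here is hypothetical: the `forward` mapping, the grid spacings, and the (diagonal) covariance are illustrative choices, not a stellar code.

```python
import itertools

# Toy sketch of Eq. (1.134): minimize the covariance-weighted misfit
# over a pre-computed grid of forward-model evaluations.
def forward(M, tau):
    # hypothetical smooth mapping (mass, age) -> ("luminosity", "Teff")
    return (M**3.5 * (1 + 0.1 * tau), 5772.0 * M**0.5 * (1 - 0.01 * tau))

y_obs = forward(1.0, 4.6)   # pretend these were the observations
sigma = (0.05, 50.0)        # uncorrelated uncertainties (diagonal Σ_y)

def chi2(M, tau):
    return sum(((m - o) / s)**2
               for m, o, s in zip(forward(M, tau), y_obs, sigma))

grid = itertools.product([0.8 + 0.01 * i for i in range(41)],   # M
                         [0.1 * j for j in range(101)])         # τ
M_hat, tau_hat = min(grid, key=lambda p: chi2(*p))
print(M_hat, tau_hat)  # recovers M ≈ 1.0, τ ≈ 4.6
```

Even this toy version hints at the drawbacks discussed next: the grid grows combinatorially with the number of input parameters, and an iterative optimizer substituted for the exhaustive search may stall in a local minimum.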

There are several drawbacks with this approach:

Speed. This approach can be prohibitively slow, especially if new models need to be computed for each input, or if multiple input parameters are being optimized. This is often dealt with by applying additional assumptions to simplify the problem. For example, the mixing length parameter can be kept fixed to the solar-calibrated value (e.g., Silva Aguirre et al. 2015, 2017). Another simplification is to calculate the initial helium abundance from the initial metallicity by assuming a galactic chemical evolution law (e.g., Silva Aguirre et al. 2015, 2017). This is usually achieved by fitting a line through two points: the primordial helium abundance from models of Big Bang nucleosynthesis [Yp = 0.2463, Zp = 0] (e.g., Coc et al. 2014) and the calibrated initial solar mixture, e.g., [Y0,⊙ = 0.273, Z0,⊙ = 0.019], so ΔY/ΔZ ≈ 1.4. The optimization is then performed over a limited set of input parameters (e.g., [M, Z0]) and potentially on a pre-computed grid of models as well. However, the end result then has (typically unpropagated) systematic errors.

FIGURE 1.26. The C–D diagram. The small frequency separation is a proxy for core hydrogen abundance (Xc, dashed lines) through the sound speed gradient, and the large frequency separation is a proxy for stellar mass (M, solid lines) through the mean density. The gray lines are evolutionary simulations varied in their initial mass and evolved along the main sequence. The frequencies of the models have been calculated using GYRE (Townsend and Teitler 2013). Stars with M ≳ 1.8 M⊙ do not have convective envelopes on the main sequence and are therefore not theoretically predicted to harbor solar-like oscillations. The points are LEGACY stars observed by Kepler, colored by their metallicity (Lund et al. 2017). Many of the stars fall off the diagram, illustrating its limitations as a look-up table for stellar properties. Figure adapted from Bellinger et al. 2017a.
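The chemical evolution line described in the Speed item above is a two-point fit, and the quoted slope ΔY/ΔZ ≈ 1.4 follows directly from the numbers given in the text (the function name is my own):

```python
# Galactic chemical evolution law as a line through two points:
# the primordial abundances (Yp, Zp) from Big Bang nucleosynthesis
# and the solar-calibrated initial mixture (Y0_sun, Z0_sun).
Yp, Zp = 0.2463, 0.0
Y0_sun, Z0_sun = 0.273, 0.019

slope = (Y0_sun - Yp) / (Z0_sun - Zp)
print(round(slope, 2))  # 1.41, i.e. ΔY/ΔZ ≈ 1.4 as quoted in the text

def initial_helium(Z0):
    """Initial helium abundance implied by an initial metallicity Z0."""
    return Yp + slope * Z0
```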

Local Minima. Commonly, iterative numerical optimization algorithms such as Levenberg–Marquardt (Levenberg 1944; Marquardt 1963) and Nelder–Mead (1965) are applied for this task (e.g., Lebreton and Goupil 2014, Appourchaux et al. 2015). These approaches can have difficulty finding the global minimum of the solution. There are also no currently known theoretical bounds on the complexity of a Nelder–Mead search (Singer and Singer 1999). It is, however, known that this algorithm scales poorly to high dimensions (e.g., Chen et al. 2015).

Redundancy. This approach implicitly assumes that each bit of observable information provides a fully independent constraint on the stellar model, and weights each observation only by its uncertainty. In reality, the observations have some degree of redundancy with respect to the aspects of the model that they constrain (Angelou & Bellinger et al. 2017, see also Chapter 3). Matching such an aspect of the model is then arbitrarily up-weighted. Some practitioners deal with this problem by applying ad hoc weightings (e.g., Paxton et al. 2013).

We therefore seek an approach that naturally avoids these problems.

Random Forest Regression

In recent years, machine learning techniques have become increasingly popular for solving inverse problems (e.g., Rosasco et al. 2005, Fai et al. 2017, Adler and Öktem 2017). Some applications include automatic photograph coloration (Larsson et al. 2016), image reconstruction (e.g., Schlemper et al. 2017), and medical imaging (e.g., Prato and Zanni 2008, Jin et al. 2017). In fact, supervised learning itself can be viewed as an inverse problem (Vito et al. 2005).

In Chapter 2 we propose a solution to the evolution inversion problem based on machine learning. In particular, we use the variant of random forest regression (Breiman 2001) known as extremely randomized trees (Geurts et al. 2006) to learn the function M−1 from a dense grid of evolutionary simulations. Ensemble tree-based algorithms are known to be quick to train (especially because the task is ‘embarrassingly’ parallelizable), quick to predict (when the number of trees is not very large), and to have very good predictive performance (e.g., Caruana and Niculescu-Mizil 2006). Furthermore, the bootstrap aggregation (“bagging”) that is performed helps with problem degeneracy and dimensionality (e.g., Skurichina and Duin 2002). Random forests can suffer from reduced performance if the number of redundant variables is large (Louppe 2014); however, there are strategies to deal with this drawback (Tuv et al. 2009).
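The idea of learning M−1 from a grid of simulations can be sketched in a few lines with scikit-learn's implementation of extremely randomized trees. The forward model below is the same hypothetical stand-in used earlier, not a stellar code; the grid size and hyperparameters are illustrative (though n_estimators = 256 matches the cross-validated value quoted below).

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

# Sketch: train an emulator of M^-1 on a toy "grid" of simulations.
rng = np.random.default_rng(0)
x = rng.uniform([0.8, 0.0], [1.2, 10.0], size=(2000, 2))    # inputs (M, τ)
y = np.column_stack([x[:, 0]**3.5 * (1 + 0.1 * x[:, 1]),    # toy "L"
                     x[:, 0]**0.5 * (1 - 0.01 * x[:, 1])])  # toy "Teff" (scaled)

# Learn the mapping from observables y back to parameters x:
inv = ExtraTreesRegressor(n_estimators=256, random_state=0).fit(y, x)

# Predict (M, τ) for a held-out "star":
y_obs = [[1.0**3.5 * (1 + 0.1 * 4.6), 1.0**0.5 * (1 - 0.01 * 4.6)]]
print(inv.predict(y_obs))  # should be close to [[1.0, 4.6]]
```

Unlike the repeated forward modelling of Equation (1.134), training happens once; each subsequent star costs only a prediction.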

Louppe (2014) derived the worst-case time complexity of training extremely randomized trees to be O(MKN²), where M is the number of trees, N is the number of samples, and K is the number of features randomly drawn at each node. In Chapter 2, we cross-validate M and find satisfactory performance at M = 256. The parameter K varies between 2 and 9, depending on the types of observations available for a given star.

To obtain the posterior distribution of solutions for an observed star with measurement uncertainties, we pass random instances of the observations, perturbed by their uncertainties, through the trained random forest. We must choose how many random instances to use. This number should be chosen such that the sample distribution converges to a reasonable degree to the population distribution. A useful way to quantify the difference between distributions is the Kullback–Leibler (KL) divergence, also known as relative entropy:

DKL(P‖Q) = ∫ p(x) log [p(x)/q(x)] dx (1.135)

where P and Q are continuous probability distributions and p and q are their respective densities (Kullback and Leibler 1951). A low relative entropy indicates similarity.

We seek to determine how many random samples we need to generate in order for our posterior distributions to converge to a reasonable degree to their true distributions. A proxy for this is the KL divergence between the normal distribution and sample normal distributions of varying sizes. Figure 1.27 shows an example of a standard normal distribution ψ and sample normal densities with different sample sizes. The figure furthermore shows the KL divergence of these sample normal distributions as a function of sample size, averaged over 1,000 random trials. The distribution converges around 10,000 samples. Thus, we propagate 10,000 random instances of the measurement uncertainty through the random forest. Applying the technique fleshed out in detail in Chapter 2 to 94 stars observed by Kepler, we find the estimates shown in Figure 1.28.
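This convergence experiment is straightforward to reproduce in miniature. For two normal distributions, Equation (1.135) has a closed form: DKL(N(μ, σ²) ‖ N(0, 1)) = ½(σ² + μ² − 1 − ln σ²). The sketch below fits a normal to samples of increasing size and evaluates that formula (the sample sizes and number of trials are smaller than in the figure, for speed):

```python
import math
import random

# Closed-form KL divergence between a normal fitted to `samples`
# and the standard normal, as a proxy for posterior convergence.
def kl_fitted_vs_standard(samples):
    n = len(samples)
    mu = sum(samples) / n
    var = sum((s - mu)**2 for s in samples) / n
    return 0.5 * (var + mu**2 - 1.0 - math.log(var))

random.seed(0)
for n in (100, 1000, 10000):
    trials = [kl_fitted_vs_standard([random.gauss(0, 1) for _ in range(n)])
              for _ in range(20)]
    print(n, sum(trials) / len(trials))  # divergence shrinks roughly like 1/n
```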
