
Interpretability of the neural networks

contribution to the energy. The calculation can, however, be seen as a highly complex non-linear fit.

Algorithmic transparency, on the other hand, is related to the grasp of the error surface and the ability to predict the output of the model based on the inputs. In this regard, neural networks are frowned upon due to their non-convexity. In fact, the same activation functions that are responsible for the success of neural networks, due to their non-linearity, are also responsible for the multiple local minima to which neural network optimizations tend to converge. This topic has been thoroughly discussed, for example in [506]. Nevertheless, even a local minimum can provide more useful and accurate results than other methods.

Furthermore, we believe that training on forces and stresses is very helpful, as it restricts the optimization of the neural network to a Pareto curve. We would also like to point out that these quantities are calculated in a consistent and physical manner in the neural network force-fields that we presented: through analytical differentiation of the energy.
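To make the last point concrete, the sketch below differentiates a toy one-neuron "energy network" analytically and checks the resulting force against a finite difference. The parameters and the single-coordinate setup are invented for illustration; this is not one of the force-fields discussed in this work.

```python
import numpy as np

# Toy "network": E(x) = w2 * tanh(w1 * x + b), a stand-in for a neural
# network potential with a single input coordinate x.
# The parameters below are hypothetical.
w1, b, w2 = 0.7, -0.2, 1.3

def energy(x):
    return w2 * np.tanh(w1 * x + b)

def force(x):
    # Analytical differentiation of the energy: F = -dE/dx, obtained
    # with the chain rule, using d tanh(u)/du = 1 - tanh(u)**2.
    t = np.tanh(w1 * x + b)
    return -w2 * (1.0 - t**2) * w1

# Consistency check against a central finite difference.
x, h = 0.4, 1e-6
fd = -(energy(x + h) - energy(x - h)) / (2 * h)
print(abs(force(x) - fd))  # tiny: analytic and numerical forces agree
```

The same chain-rule construction carries over to real architectures, where the derivative is taken with respect to the atomic positions through the descriptors.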

Finally, post hoc interpretability concerns the knowledge that can be gained from the model itself. One can, for example, study the importance of pairwise or three-body interactions to the energy using the symmetry functions. Neural networks are often regarded as black-box algorithms. However, the picture that has been painted over the years might not be so grim. With every study, the understanding of neural networks increases and, at the very least, they can be seen as powerful mathematical tools that are capable of efficiently and accurately approximating functions.

So, what can we expect from neural network, or other machine learning, force-fields? We do not believe DFT will ever be replaced by machine learning force-fields. Yet, they will provide descriptions for regions of the PES that are not quite accessible within DFT and, by extension, other electronic structure methods. Moreover, these force-fields will allow for more accurate simulations (such as MD runs) than those provided by simpler fitting methods (such as classical force-fields), and will permit considerably faster sampling of the PES, which translates into a considerable speed-up for global structure prediction methods [414, 431, 434].

Chapter 5

Copper based materials and cluster expansions

The world to me was a secret, which I desired to discover; to her it was a vacancy, which she sought to people with imaginations of her own.

Mary W. Shelley, Frankenstein

Discover... Our objective is to discover new materials, and often that requires the development of different techniques and methodologies to study them. Usually a certain compound admits many crystal phases; however, only a fraction of them are indeed stable, i.e., the compound only crystallizes in some of these phases. Moreover, some of these phases can even interact with each other and change the properties of the compound. For example, among copper-based materials, a rather common occurrence is the stabilization of a compound due to copper vacancies.

In this chapter we present studies of copper based materials using cluster expansions.

We start by explaining cluster expansions and then we discuss photovoltaic materials and transparent conducting semiconductors (TCSs), in particular CZTS and cuprous iodide, respectively. Our work on CZTS focuses on a stability study using genetic algorithms, and on the transition between the kesterite and stannite structures under the incorporation of iron. Our application to cuprous iodide involves the formation of stable phases with copper vacancy complexes.

5.1 Cluster expansions

In the previous chapters, we discussed the construction of approximations for the potential energy surface using machine learning. A similar approach consists in expanding the energy of a system in terms of effective cluster interactions (ECIs), which embody the energetic information of the underlying crystal structure. This approach is usually denoted as a cluster expansion [507–509] and can be understood as a generalization of the Ising model Hamiltonian.

The definition of the cluster expansion starts with the mapping of each site i in a parent lattice to an occupation variable σi. For the case of a binary alloy, these variables mimic the spin and can take values of ±1 according to the type of atom that occupies the site.

A specific arrangement of these occupation variables denotes a configuration and can be represented by a vector σ containing all the individual occupation variables. We note that the cluster expansion can be formally defined for arbitrary multi-component alloys [509].

However, here we focus on cluster expansions for binary alloys as implemented in the MAPS code of the alloy theoretic automated toolkit (ATAT) [510], which we used to construct the cluster expansions presented in this chapter.

Continuing with the definition, the energy of an alloy can then be parameterized as the following polynomial of the occupation vector:

\[
E(\sigma) = \sum_{\alpha} J_{\alpha}\, m_{\alpha} \left\langle \prod_{i \in \alpha'} \sigma_i \right\rangle ,
\qquad (5.1)
\]

where the sum is taken over all the non-equivalent clusters α and the averaged product over all the clusters α′ equivalent to α. By a cluster we mean a set of sites i, and two clusters are equivalent if one can be transformed into the other by a symmetry operation of the space group of the parent lattice. Furthermore, mα denotes the number of equivalent clusters and Jα the coefficients of the expansion; in this formalism, they are usually designated as multiplicities and ECIs, respectively. The product of the multiplicities and the spin products averaged over the entire lattice defines the correlation matrix, which can be understood as the probability of finding the cluster α in a configuration σ. This quantity can be written more explicitly as

\[
\Pi_{k,n}(\sigma) = m_{\alpha} \left\langle \prod_{i \in \alpha'} \sigma_i \right\rangle
= \frac{1}{k} \cdot \frac{1}{\lambda} \cdot \frac{1}{m_{\alpha}}
\sum_{I=1}^{\lambda} \sum_{J=1}^{m_{\alpha}} \sigma_{I1} \sigma_{I2} \cdots \sigma_{Ik} ,
\qquad (5.2)
\]

where λ is the number of atomic sites i in the unit cell, and k the number of vertices in the cluster. If all the clusters α are included in the sum, then the cluster expansion can represent any function of the configuration (such as the energy E(σ)), provided we have appropriate ECIs. However, the expansion converges quickly in practice, so usually only compact clusters are considered, such as small pairs and triplets.
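As a toy illustration of eqs. (5.1) and (5.2), the sketch below evaluates the point and nearest-neighbour pair correlations of a small one-dimensional binary ring and assembles the energy from assumed ECIs. The lattice, the truncation, and all numerical values are hypothetical; only the restriction to compact clusters mirrors the text.

```python
import numpy as np

def cluster_expansion_energy(sigma, J0, J1, J2):
    """Energy per site of a 1D binary ring via eq. (5.1), truncated to
    the empty cluster, the point cluster, and the nearest-neighbour pair.
    sigma: sequence of +/-1 occupation variables; J0, J1, J2: assumed ECIs."""
    sigma = np.asarray(sigma, dtype=float)
    # Correlations: spin products averaged over all equivalent clusters.
    point = sigma.mean()                        # <sigma_i>
    pair = (sigma * np.roll(sigma, 1)).mean()   # <sigma_i sigma_{i+1}>
    # Per-site multiplicities of 1 are assumed for this toy lattice.
    return J0 + J1 * point + J2 * pair

# Two configurations of a 6-site ring with made-up ECIs (0.0, 0.1, -0.5):
ordered     = [+1, +1, +1, +1, +1, +1]   # point = 1, pair = 1
alternating = [+1, -1, +1, -1, +1, -1]   # point = 0, pair = -1
print(cluster_expansion_energy(ordered, 0.0, 0.1, -0.5))      # J1 + J2
print(cluster_expansion_energy(alternating, 0.0, 0.1, -0.5))  # -J2
```

With a negative pair ECI favouring like neighbours, the ordered ring indeed comes out lower in energy than the alternating one.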

The Jα remain as the only unknown variables and their determination follows from the structure inversion method, also known as the Connolly–Williams method [511]. Basically, this method requires the calculation of the energy of a small number of configurations using first-principles methods (in our case DFT, as described in appendix A), together with the calculation of the correlation matrix. Then, the Jα correspond to the least-squares solution of eq. (5.1), i.e., the solution of its normal equations [512].
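The structure inversion step can be sketched as follows: build the correlation matrix for a set of configurations, obtain their "reference" energies (here from a hidden toy model standing in for DFT), and recover the ECIs as the least-squares solution of eq. (5.1). The 1D lattice, the cluster truncation, and all numbers are assumptions for illustration, not the calculations of this chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

def correlations(sigma):
    """One row of the correlation matrix for a 1D ring: empty, point,
    and nearest-neighbour pair clusters (a toy truncation)."""
    sigma = np.asarray(sigma, dtype=float)
    return np.array([1.0,
                     sigma.mean(),
                     (sigma * np.roll(sigma, 1)).mean()])

# Hidden toy ECIs playing the role of the unknown physics; in practice
# the reference energies come from DFT, not from a known model.
J_true = np.array([0.0, 0.1, -0.5])
configs = [rng.choice([-1, 1], size=12) for _ in range(20)]
X = np.vstack([correlations(s) for s in configs])   # correlation matrix
E_ref = X @ J_true                                  # reference energies

# Structure inversion: least-squares solution of the linear system,
# equivalent to solving the normal equations X^T X J = X^T E_ref.
J_fit, *_ = np.linalg.lstsq(X, E_ref, rcond=None)
print(J_fit)  # should recover the hidden ECIs
```

With noiseless toy energies the fit recovers the hidden ECIs essentially exactly; with real DFT data the residual is non-zero, which is what the cross-validation score quantifies.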

To measure the predictive power of the cluster expansion, the MAPS code uses the cross-validation score, defined as

\[
\mathrm{CV} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( E_i^{\mathrm{ref}} - E_{(i)} \right)^2 } ,
\qquad (5.3)
\]

where n represents the total number of structures, E_i^ref the energy of structure i obtained with the reference method, and E_(i) the prediction of the cluster expansion obtained from
