

Im Dokument LJUBOV JAANUSKA Haar Wavelet Method (Seite 36-39)

2.3 Machine learning

2.3.1 Artificial neural networks

The artificial neural network (ANN) is defined by Hecht-Nielsen, the inventor of the neurocomputer, as “a computing system made up of a number of simple, highly interconnected processing elements, which process information by their dynamic state response to external inputs” [Caud 88]. Specifically, the basic component of any ANN is an artificial neuron (also called a node or processing unit). A neuron receives signals from other neurons.

On the way to the neuron, the signal passes through synapses. A synapse denotes the strength of the connection between two nodes. Mathematically, it is represented as a floating-point value, which can be positive or negative. The synaptic weights of a neuron can be presented as a vector:

W = (w1, ..., wn). (2.12)

The “learning” part of an ANN means continuous adjustment of these weight values.

In the present thesis, a sum net function is used to sum all the input signals weighted by the corresponding synaptic weights and the bias b:

NET = WX + b = w1x1 + w2x2 + ... + wnxn + b.

The output of the sum net function is a positive or negative floating-point value which is passed to the activation function. ANNs can learn a non-linear relationship between input and output vectors if the activation function is non-linear. ANNs support various non-linear activation functions; the most popular are the sigmoid and the hyperbolic tangent [Word 11]. In this thesis, the Elliot symmetric sigmoid activation function is used. It works approximately four times faster than the symmetric sigmoid since it does not use exponents [Beal 16].
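The sum net function and the Elliot symmetric sigmoid can be illustrated with a short sketch (the function and variable names below are illustrative, not taken from the thesis; the Elliot symmetric sigmoid is x / (1 + |x|), which maps any input into (-1, 1) without computing exponents):

```python
def net(weights, inputs, bias):
    """Sum net function: NET = WX + b (weighted sum of the inputs plus the bias)."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def elliot_symmetric(x):
    """Elliot symmetric sigmoid: x / (1 + |x|), range (-1, 1), no exponentials."""
    return x / (1.0 + abs(x))

# One artificial neuron: weighted sum followed by the activation function.
w = [0.5, -0.25, 0.75]   # synaptic weights (positive or negative floats)
x = [1.0, 2.0, -1.0]     # input signals arriving from other neurons
b = 0.1                  # bias
output = elliot_symmetric(net(w, x, b))
```

Because the activation avoids exponentials, each evaluation costs only one division and one absolute value, which is the source of the speed advantage mentioned above.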

How many neurons are needed to train an ANN to make reliable predictions and not to memorise patterns? This is one of the most challenging questions.

An optimum number of neurons can be ascertained after conducting empirical analysis. Some general guidelines on choosing the number of neurons can be found in [Sing 03, Beal 16].

Different distributions of neurons among the input, hidden and output layers, the number of layers and the input-output procedures influence the architecture and the scope of an application [Hoss 17]. In the present thesis, the multilayer feed-forward network (MLFFNN) is used. According to [Shar 13], the MLFFNN is a universal approximator which can be used when little prior knowledge of the relationship between inputs and targets is available. An MLFFNN with one hidden layer generally produces excellent results [Sing 03, Beal 16]. If the results are not adequate, more layers might be added; notably, more training data are needed in such a case.

Broadly, training an MLFFNN is not an easy process since it has hidden layers. The optimal values of the hidden neuron outputs are not known; hence, the weight of each hidden neuron cannot be adjusted appropriately knowing only the overall error value in the output layer of the network. According to several authors [Hayk 99, Word 11, Niel 15], the most appropriate technique for training the MLFFNN is back propagation learning, or simply propagation of error. It is a supervised learning algorithm introduced by Bryson and Ho in 1969 and rediscovered by Werbos in 1974 [Hayk 99]. The process consists of two passes through the network: the forward propagation and the backward propagation.

During the forward propagation, the input vector is fed into the network and transmitted through the hidden layers to the output layer. The output of the network is then compared to the desired output, and an error value is calculated for each of the neurons in the output layer. The error values are then propagated backwards.
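The two passes can be sketched for a small network with one hidden layer (a minimal illustration in pure Python; the 2-2-1 layer sizes, weights and inputs are assumed for the example, and the derivative used is that of the Elliot symmetric sigmoid, 1 / (1 + |x|)²):

```python
def act(x):
    """Elliot symmetric sigmoid: x / (1 + |x|)."""
    return x / (1.0 + abs(x))

def act_deriv(x):
    """Derivative of the Elliot symmetric sigmoid: 1 / (1 + |x|)^2."""
    return 1.0 / (1.0 + abs(x)) ** 2

def forward(x, w_hid, b_hid, w_out, b_out):
    """Forward propagation: input -> hidden layer -> output layer."""
    net_hid = [sum(w * xi for w, xi in zip(ws, x)) + b for ws, b in zip(w_hid, b_hid)]
    hid = [act(n) for n in net_hid]
    net_out = [sum(w * h for w, h in zip(ws, hid)) + b for ws, b in zip(w_out, b_out)]
    out = [act(n) for n in net_out]
    return net_hid, hid, net_out, out

def backward(target, net_hid, net_out, out, w_out):
    """Backward propagation: output-layer errors are pushed back to the hidden layer."""
    # Error value (delta) at each output neuron: (output - target) * f'(net).
    d_out = [(o - t) * act_deriv(n) for o, t, n in zip(out, target, net_out)]
    # Hidden deltas: weighted sum of the output deltas, times f'(net) of the hidden neuron.
    d_hid = [act_deriv(nh) * sum(d_out[k] * w_out[k][j] for k in range(len(d_out)))
             for j, nh in enumerate(net_hid)]
    return d_out, d_hid

# A 2-2-1 network with illustrative weights and one training example.
w_hid = [[0.1, 0.2], [-0.3, 0.4]]; b_hid = [0.0, 0.1]
w_out = [[0.5, -0.6]];             b_out = [0.2]
x, target = [1.0, 0.5], [0.25]
net_hid, hid, net_out, out = forward(x, w_hid, b_hid, w_out, b_out)
d_out, d_hid = backward(target, net_hid, net_out, out, w_out)
```

The deltas are what the weight adjustment needs: for instance, the gradient of the error with respect to a hidden-to-output weight is d_out[k] * hid[j], which is exactly the quantity that cannot be obtained for hidden neurons without the backward pass.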

The loss (also cost or error) function shows the effectiveness of the training process (the correction of the weights): how far the computed values of the output neurons are from the target values. Mathematically, the desired value of the cost function is a global minimum, as that is the point where the training error is lowest. Reaching the global minimum by minimising the cost function is not easy, as the process is tangled by local minima. Furthermore, presenting the training set to the system only once rarely gives the desired result. That means the system has to go through the same training set again and discover more relevant associations between the input and output values by adjusting the weights. This leads to the error-performance surface, or simply the error surface.
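One common choice of cost function, shown here only as an illustration (the text above does not fix a particular form), is the mean squared error over the output neurons:

```python
def mse(outputs, targets):
    """Mean squared error: how far the computed outputs are from the target values."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)

# One pass over the whole training set is one epoch; training normally repeats
# many epochs, recomputing the cost after the weights are adjusted.
cost = mse([0.9, 0.1], [1.0, 0.0])
```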

The true error surface is averaged over all possible input-output examples. For the network to improve the training performance, the operating point has to move down towards the global minimum on the error surface.

The procedure used to carry out the learning process in the ANN is called an optimisation algorithm. The algorithm finds a set of weights and biases which makes the cost as small as possible. Some optimisation algorithms are the gradient descent, the conjugate gradient, the Newton method, the quasi-Newton method and the Levenberg-Marquardt algorithm [Hayk 99]. The algorithms have different characteristics and performance in terms of memory requirements, processing speed, and numerical precision. The description of each technique is out of the scope of the present thesis (detailed information about the optimisation algorithms can be found in various resources, for example, [Hayk 99, Niel 15]).
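Of these, gradient descent is the simplest to sketch: each weight is moved a small step against the gradient of the cost (the quadratic cost, learning rate and step count below are illustrative assumptions, not values used in the thesis):

```python
def gradient_descent(grad, w0, lr=0.1, steps=100):
    """Repeatedly step against the gradient to make the cost as small as possible."""
    w = list(w0)
    for _ in range(steps):
        g = grad(w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

# Illustrative quadratic cost C(w) = (w0 - 3)^2 + (w1 + 1)^2,
# whose gradient is 2 * (w - minimum); the minimum lies at (3, -1).
grad = lambda w: [2 * (w[0] - 3), 2 * (w[1] + 1)]
w_min = gradient_descent(grad, [0.0, 0.0])
```

On a real error surface the cost is not quadratic and the descent can stall in a local minimum, which is why the more elaborate algorithms listed above exist.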

A list of the optimisation algorithms implemented in the MATLAB Neural Network Toolbox training functions can be found in [Beal 16] (the choice of the computational environment was motivated by the availability and popularity of machine learning libraries at the beginning of the PhD studies). According to Lahiri and Ghanta [Lahi 09], no single algorithm suits all problems; therefore, a quantitative analysis has to be performed in order to find a suitable optimisation algorithm for the damage parameter quantification problem.

Table 2.1 [Beal 16] summarises the MATLAB training functions used in the present thesis with regard to the optimisation algorithms implemented in them.

Table 2.1: Training functions of the ANNs.

trainrp (RP): Resilient back propagation is a network training function that updates weight and bias values according to the resilient back propagation algorithm. It is a simple batch mode training algorithm with fast convergence and minimal storage requirements.

trainscg (SCG): Scaled conjugate gradient back propagation is a network training function that updates weight and bias values according to the scaled conjugate gradient method. The method performs particularly well for networks with a large number of weights.

trainbfg (BFG): Broyden-Fletcher-Goldfarb-Shanno back propagation is a network training function that updates weight and bias values according to the quasi-Newton method. It requires more storage and more computation per iteration than the conjugate gradient method, but usually converges in fewer iterations.

trainlm (LM): Levenberg-Marquardt back propagation is a network training function that updates weight and bias values according to Levenberg-Marquardt optimisation. It is the fastest training algorithm for networks of moderate size, but slower than the gradient methods.

trainbr (BR): Bayesian regularisation back propagation is a network training function that updates the weight and bias values according to Levenberg-Marquardt optimisation. It minimises a combination of squared errors and weights, penalising large weights, and then determines the correct combination to produce a network that generalises well. The method performs well even on small noisy datasets since it does not require a validation dataset to be separated from the training dataset.
