

5.1.3 Training a Neural Network

ANNs can be formulated in terms of the minimization of a loss function that depends on adaptive parameters such as the synaptic weights and biases. For training a neural network, gradient-based algorithms have been widely employed as they are known to converge quickly, and a common method for computing gradients through recursive application of the chain rule is known as backpropagation [112, 113, 150, 151]. The backpropagation algorithm was first proposed by Paul Werbos [150] and became widely popular in the 1980s through the work of Rumelhart et al. [113]. Backpropagation has mostly been discussed in the context of supervised learning, where it uses the desired output for each input and attempts to minimize the final loss function. However, it can also be used for unsupervised learning, where the desired output is equal to the input and the network attempts to learn a compact representation of the input distribution. Examples of using backpropagation for unsupervised tasks include the training of autoencoders [11], which are typically useful for dimensionality reduction, and of the more recently developed deep belief networks [54, 55]. The backpropagation process mainly includes two phases, a forward pass and a backward pass. For supervised learning, the input data is introduced to the network and processed through its successive layers until it arrives at the output layer, where the actual output is compared with the desired output. The error between the actual and desired output, in terms of the loss function, is then calculated for each neuron in the output layer.

The errors computed at the neurons in the output layer are then propagated backwards through the network; during this process, backpropagation uses these errors to compute the gradients of the loss function with respect to the weights in the network. Finally, the computed gradients, in conjunction with a suitable optimization method, are used to update the weights of the neurons with the ultimate goal of minimizing the loss. In this way, backpropagation allows randomly initialized neurons in a neural network to find the set of parameters that lets them learn the features of the input dataset relevant for successful predictions.

After training, the goal of such a network is to use these learned parameters to accurately identify similar patterns (or features) in new input data that was unseen during the training phase and is introduced to the network without any information regarding the expected output. Below, the two phases of backpropagation are described in more detail.
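The interplay of the two phases can be illustrated with a minimal sketch: a single sigmoid neuron trained by gradient descent. The input, target, learning rate, and initial parameters below are arbitrary illustrative choices, not taken from the text:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(w, b, x, desired, lr=0.5):
    # Forward pass: propagate the input and compare with the desired output.
    out = sigmoid(w * x + b)
    loss = 0.5 * (desired - out) ** 2
    # Backward pass: the chain rule gives the gradient of the loss
    # w.r.t. the weight and the bias.
    delta = -(desired - out) * out * (1.0 - out)
    # Gradient-descent update of the parameters.
    w -= lr * delta * x
    b -= lr * delta
    return w, b, loss

# Arbitrary illustrative values for the input, target, and initial parameters.
w, b = 0.1, 0.0
losses = []
for _ in range(200):
    w, b, loss = train_step(w, b, x=1.0, desired=0.9)
    losses.append(loss)
```

Repeating the forward and backward passes drives the loss down, which is exactly the iterative minimization described above.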

Forward and Backward Propagation

Consider a simple ANN with two inputs, two hidden neurons, and two output neurons, as shown in figure 5.3. The goal of backpropagation is to optimize the weights associated with each neuron such that the network learns to correctly map arbitrary inputs to the outputs. Starting from the input layer consisting of the inputs $x_1$ and $x_2$, the data reaches the hidden layer consisting of the neurons $h_1$ and $h_2$ and finally produces the training outputs at $o_1$ and $o_2$. Each input interacts with every individual neuron in the hidden layer, and the net input of a neuron in the hidden layer is given by a combination of its weights and biases, as shown below for the neuron $h_1$:

$$\text{net}_{h_1} = w_1 x_1 + w_2 x_2 + b_1. \qquad (5.7)$$

Similarly, the net input $\text{net}_{h_2}$ of $h_2$ is computed by replacing the weights with $w_3$ and $w_4$ in the above equation. Using a sigmoid activation function, the output of $h_1$ is given by:

$$\text{out}_{h_1} = \frac{1}{1 + e^{-\text{net}_{h_1}}}. \qquad (5.8)$$
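For concreteness, equations 5.7 and 5.8 can be evaluated in a few lines of Python; the numeric values chosen below for the weights, bias, and inputs are hypothetical, not taken from the text:

```python
import math

# Hypothetical values for the weights, bias, and inputs (illustrative only).
w1, w2, b1 = 0.15, 0.20, 0.35
x1, x2 = 0.05, 0.10

net_h1 = w1 * x1 + w2 * x2 + b1           # equation 5.7: weighted sum plus bias
out_h1 = 1.0 / (1.0 + math.exp(-net_h1))  # equation 5.8: sigmoid activation
```

The same two lines, with $w_3$ and $w_4$ in place of $w_1$ and $w_2$, yield $\text{out}_{h_2}$.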


Fig. 5.3. Artificial Neural Network Training. A simple ANN representation consisting of two inputs, two hidden neurons, and two output neurons.

In the same way, the activation is applied to $\text{net}_{h_2}$. The above process is then repeated for the output layer neurons, using the outputs $\text{out}_{h_1}$ and $\text{out}_{h_2}$ of the hidden layer neurons as inputs to the final output layer. Thus, the net input $\text{net}_{o_1}$ at $o_1$ is given by:

$$\text{net}_{o_1} = w_5\,\text{out}_{h_1} + w_6\,\text{out}_{h_2} + b_2. \qquad (5.9)$$

With application of the sigmoid activation function, we get:

$$\text{out}_{o_1} = \frac{1}{1 + e^{-\text{net}_{o_1}}}. \qquad (5.10)$$

After computing the output $\text{out}_{o_2}$ at $o_2$ in the same manner, we compute the error for both output neurons $o_1$ and $o_2$ using an appropriate loss function. One common choice, the sum-of-squared errors, is computed as:

$$E = \sum \tfrac{1}{2}\left(\text{output}_{\text{desired}} - \text{output}_{\text{actual}}\right)^2. \qquad (5.11)$$

In the above equation, $\text{output}_{\text{desired}}$ is given by the ground truth labels, while $\text{output}_{\text{actual}}$ is the network output given by $\text{out}_{o_1}$ or $\text{out}_{o_2}$. The total error from the output neurons $o_1$ and $o_2$ is given by:

$$E = E_{o_1} + E_{o_2}. \qquad (5.12)$$
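The complete forward pass and the loss of equations 5.7 through 5.12 can be computed in a few lines. All numeric values below are hypothetical placeholders, and both hidden neurons are assumed to share the bias $b_1$ (the text leaves the bias of $h_2$ unspecified):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights, biases, inputs, and targets (illustrative only).
x1, x2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60
desired_o1, desired_o2 = 0.01, 0.99

# Hidden layer (equations 5.7-5.8).
out_h1 = sigmoid(w1 * x1 + w2 * x2 + b1)
out_h2 = sigmoid(w3 * x1 + w4 * x2 + b1)
# Output layer (equations 5.9-5.10).
out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)
out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b2)
# Sum-of-squared errors (equations 5.11-5.12).
E_o1 = 0.5 * (desired_o1 - out_o1) ** 2
E_o2 = 0.5 * (desired_o2 - out_o2) ** 2
E = E_o1 + E_o2
```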

This leads to the next phase of backward propagation, where the goal is to update each of the weights in the network such that the actual output is brought closer to the desired output by minimizing the loss function for each neuron in the network. Thus, using the chain rule, we compute the change in the error due to the contribution of the weight $w_5$ as the partial derivative of $E$ with respect to $w_5$:

$$\frac{\partial E}{\partial w_5} = \frac{\partial E}{\partial \text{out}_{o_1}} \cdot \frac{\partial \text{out}_{o_1}}{\partial \text{net}_{o_1}} \cdot \frac{\partial \text{net}_{o_1}}{\partial w_5}. \qquad (5.13)$$

Taking the partial derivative of equation 5.11 w.r.t. $\text{out}_{o_1}$, we get:

$$\frac{\partial E}{\partial \text{out}_{o_1}} = -(\text{desired}_{o_1} - \text{out}_{o_1}), \qquad (5.14)$$


where $\text{desired}_{o_1}$ is the ground truth output value at $o_1$, and the partial derivative of $\text{out}_{o_1}$ w.r.t. $\text{net}_{o_1}$ is computed as:

$$\frac{\partial \text{out}_{o_1}}{\partial \text{net}_{o_1}} = \text{out}_{o_1}(1 - \text{out}_{o_1}). \qquad (5.15)$$

Considering equation 5.9, the partial derivative of $\text{net}_{o_1}$ w.r.t. $w_5$ is equal to $\text{out}_{h_1}$:

$$\frac{\partial \text{net}_{o_1}}{\partial w_5} = \text{out}_{h_1}. \qquad (5.16)$$
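The output-layer gradient built from equations 5.13 through 5.16 can be checked numerically. The network values below are hypothetical (with both hidden neurons assumed to share the bias $b_1$):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical network values (illustrative only, not from the text).
x1, x2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6 = 0.40, 0.45
b1, b2 = 0.35, 0.60
desired_o1 = 0.01

# Forward pass (equations 5.7-5.10).
out_h1 = sigmoid(w1 * x1 + w2 * x2 + b1)
out_h2 = sigmoid(w3 * x1 + w4 * x2 + b1)
out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)

dE_dout_o1 = -(desired_o1 - out_o1)         # equation 5.14
dout_dnet_o1 = out_o1 * (1.0 - out_o1)      # equation 5.15
dnet_dw5 = out_h1                           # equation 5.16
dE_dw5 = dE_dout_o1 * dout_dnet_o1 * dnet_dw5  # chain rule, equation 5.13
```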

Similarly, the chain rule is applied to compute the changes in the error with respect to $w_6$, $w_7$, and $w_8$. The backward pass then continues to compute the change in the error associated with the neurons in the hidden layer, such as for $w_1$, given by:

$$\frac{\partial E}{\partial w_1} = \frac{\partial E}{\partial \text{out}_{h_1}} \cdot \frac{\partial \text{out}_{h_1}}{\partial \text{net}_{h_1}} \cdot \frac{\partial \text{net}_{h_1}}{\partial w_1}. \qquad (5.18)$$

Since the output of each hidden layer neuron contributes to the errors of both output neurons $o_1$ and $o_2$, the first factor in equation 5.18 splits into two terms:

$$\frac{\partial E}{\partial \text{out}_{h_1}} = \frac{\partial E_{o_1}}{\partial \text{out}_{h_1}} + \frac{\partial E_{o_2}}{\partial \text{out}_{h_1}}, \qquad (5.19)$$

each of which is again expanded with the chain rule:

$$\frac{\partial E_{o_1}}{\partial \text{out}_{h_1}} = \frac{\partial E_{o_1}}{\partial \text{out}_{o_1}} \cdot \frac{\partial \text{out}_{o_1}}{\partial \text{net}_{o_1}} \cdot \frac{\partial \text{net}_{o_1}}{\partial \text{out}_{h_1}}. \qquad (5.20)$$

Following equation 5.14, the partial derivative of $E_{o_1}$ w.r.t. $\text{out}_{o_1}$ is given by:

$$\frac{\partial E_{o_1}}{\partial \text{out}_{o_1}} = -(\text{desired}_{o_1} - \text{out}_{o_1}), \qquad (5.23)$$

and from equation 5.15, we know that:

$$\frac{\partial \text{out}_{o_1}}{\partial \text{net}_{o_1}} = \text{out}_{o_1}(1 - \text{out}_{o_1}). \qquad (5.24)$$

Now, using equation 5.9, the partial derivative of $\text{net}_{o_1}$ w.r.t. $\text{out}_{h_1}$ is found to be $w_5$, so that all three factors in equation 5.20 are known:

$$\frac{\partial E_{o_1}}{\partial \text{out}_{h_1}} = -(\text{desired}_{o_1} - \text{out}_{o_1}) \cdot \text{out}_{o_1}(1 - \text{out}_{o_1}) \cdot w_5. \qquad (5.26)$$

The process is similarly repeated for the second term of equation 5.19, $\frac{\partial E_{o_2}}{\partial \text{out}_{h_1}}$.

The sum of the above two contributions (equations 5.26 and 5.27), denoted as $E_{\text{tot}}$, gives $\frac{\partial E}{\partial \text{out}_{h_1}}$, while the remaining terms of equation 5.18 are given by:

$$\frac{\partial \text{out}_{h_1}}{\partial \text{net}_{h_1}} = \text{out}_{h_1}(1 - \text{out}_{h_1}), \qquad (5.28)$$

$$\frac{\partial \text{net}_{h_1}}{\partial w_1} = x_1. \qquad (5.29)$$

Finally, equation 5.18 can be solved as:

$$\frac{\partial E}{\partial w_1} = E_{\text{tot}} \cdot \text{out}_{h_1}(1 - \text{out}_{h_1}) \cdot x_1. \qquad (5.30)$$

The changes in the error with respect to $w_2$, $w_3$, and $w_4$ can be computed similarly. In this way, by calculating partial derivatives from the output layer back through the hidden layer, all weights in the network are updated while minimizing the loss function, and this process is repeated iteratively to bring the network output as close as possible to the desired output.
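The hidden-layer gradient of equation 5.30 can be verified against an independent finite-difference approximation of the loss. As before, all numeric values are hypothetical, and both hidden neurons are assumed to share the bias $b_1$:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical network values (illustrative only, not from the text).
x1, x2 = 0.05, 0.10
w2, w3, w4 = 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60
d_o1, d_o2 = 0.01, 0.99

def total_error(w1):
    # Forward pass (equations 5.7-5.10) followed by the loss (5.11-5.12).
    out_h1 = sigmoid(w1 * x1 + w2 * x2 + b1)
    out_h2 = sigmoid(w3 * x1 + w4 * x2 + b1)
    out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)
    out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b2)
    return 0.5 * (d_o1 - out_o1) ** 2 + 0.5 * (d_o2 - out_o2) ** 2

w1 = 0.15
out_h1 = sigmoid(w1 * x1 + w2 * x2 + b1)
out_h2 = sigmoid(w3 * x1 + w4 * x2 + b1)
out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)
out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b2)

# Output-layer delta terms (equations 5.23-5.24 and the analogue for o2).
delta_o1 = -(d_o1 - out_o1) * out_o1 * (1.0 - out_o1)
delta_o2 = -(d_o2 - out_o2) * out_o2 * (1.0 - out_o2)
# E_tot: contributions of both output neurons (equation 5.19).
E_tot = delta_o1 * w5 + delta_o2 * w7
# Hidden-layer gradient (equation 5.30).
dE_dw1 = E_tot * out_h1 * (1.0 - out_h1) * x1

# Central finite difference as an independent check of the chain rule.
h = 1e-6
numeric = (total_error(w1 + h) - total_error(w1 - h)) / (2 * h)
```

The agreement between the analytic and the numeric gradient is a standard sanity check when implementing backpropagation by hand.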