

5.1.3 Training a Neural Network

ANNs can be formulated in terms of the minimization of a loss function that depends on adaptive parameters such as the synaptic weights and biases. For training a neural network, gradient-based algorithms have been widely employed as they are known to converge quickly, and a common method for computing gradients through recursive application of the chain rule is known as backpropagation [112, 113, 150, 151]. The backpropagation algorithm was first proposed by Paul Werbos [150] and became widely popular in the 1980s through the work of Rumelhart et al. [113]. Backpropagation has mostly been discussed in the context of supervised learning, where it uses the desired output for each input and attempts to minimize the final loss function. However, it can also be used for unsupervised learning, where the desired output is equal to the input and the network attempts to learn a compact representation of the input distribution. Examples of using backpropagation for unsupervised tasks include the training of autoencoders [11], which are typically useful for dimensionality reduction, and of the more recently developed deep belief networks [54, 55]. The backpropagation process mainly includes two phases, a forward pass and a backward pass. For supervised learning, the input data is introduced to the network and processed through its successive layers until it arrives at the output layer, where the actual output is compared with the desired output. The error between the actual and desired output, in terms of the loss function, is then calculated for each neuron in the output layer.

The errors computed at the neurons in the output layer are then propagated backwards through the network; during this process, backpropagation uses these errors to compute the gradients of the loss function with respect to the weights in the network. Finally, the computed gradients, in conjunction with a suitable optimization method, are used to update the weights of the neurons with the ultimate goal of minimizing the loss. In this way, backpropagation allows randomly initialized neurons in a neural network to find the set of parameters that lets them learn the features of the input dataset relevant for successful predictions.

After training, the goal of such a network is to use these learned parameters to accurately identify similar patterns (or features) in new input data that was unseen during the training phase and is introduced to the network without any information regarding the expected output. Below, the two phases of backpropagation are described in more detail.
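The interplay of the two phases can be illustrated with a minimal sketch: a single sigmoid neuron trained by gradient descent. The input, target, learning rate, and initial parameters below are arbitrary illustrative choices, not taken from the text:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(w, b, x, desired, lr=0.5):
    # Forward pass: propagate the input and compare with the desired output.
    out = sigmoid(w * x + b)
    loss = 0.5 * (desired - out) ** 2
    # Backward pass: the chain rule gives the gradient of the loss
    # w.r.t. the weight and the bias.
    delta = -(desired - out) * out * (1.0 - out)
    # Gradient-descent update of the parameters.
    w -= lr * delta * x
    b -= lr * delta
    return w, b, loss

# Arbitrary illustrative values for the input, target, and initial parameters.
w, b = 0.1, 0.0
losses = []
for _ in range(200):
    w, b, loss = train_step(w, b, x=1.0, desired=0.9)
    losses.append(loss)
```

Repeating the forward and backward passes drives the loss down, which is exactly the iterative minimization described above.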

Forward and Backward Propagation

Consider a simple ANN with two inputs, two hidden neurons, and two output neurons, as shown in figure 5.3. The goal of backpropagation is to optimize the weights associated with each neuron such that the network learns to correctly map arbitrary inputs to the outputs. Starting from the input layer consisting of the inputs $x_1$ and $x_2$, the data reaches the hidden layer consisting of the neurons $h_1$ and $h_2$ and finally produces the training outputs at $o_1$ and $o_2$. Each input interacts with every individual neuron in the hidden layer, and the net input of a neuron in the hidden layer is given by a combination of its weights and biases, as shown below for the neuron $h_1$:

$$\text{net}_{h_1} = w_1 x_1 + w_2 x_2 + b_1. \qquad (5.7)$$

Similarly, the net input $\text{net}_{h_2}$ of $h_2$ is computed by replacing the weights with $w_3$ and $w_4$ in the above equation. Using a sigmoid activation function, the output of $h_1$ is given by:

$$\text{out}_{h_1} = \frac{1}{1 + e^{-\text{net}_{h_1}}}. \qquad (5.8)$$
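For concreteness, equations 5.7 and 5.8 can be evaluated in a few lines of Python; the numeric values chosen below for the weights, bias, and inputs are hypothetical, not taken from the text:

```python
import math

# Hypothetical values for the weights, bias, and inputs (illustrative only).
w1, w2, b1 = 0.15, 0.20, 0.35
x1, x2 = 0.05, 0.10

net_h1 = w1 * x1 + w2 * x2 + b1           # equation 5.7: weighted sum plus bias
out_h1 = 1.0 / (1.0 + math.exp(-net_h1))  # equation 5.8: sigmoid activation
```

The same two lines, with $w_3$ and $w_4$ in place of $w_1$ and $w_2$, yield $\text{out}_{h_2}$.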


Fig. 5.3. Artificial Neural Network Training. A simple ANN representation consisting of two inputs, two hidden neurons, and two output neurons.

In the same way, the activation is applied to $\text{net}_{h_2}$. The above process is then repeated for the output layer neurons, using the outputs $\text{out}_{h_1}$ and $\text{out}_{h_2}$ of the hidden layer neurons as inputs to the final output layer. Thus, the net input $\text{net}_{o_1}$ at $o_1$ is given by:

$$\text{net}_{o_1} = w_5\,\text{out}_{h_1} + w_6\,\text{out}_{h_2} + b_2. \qquad (5.9)$$

With application of the sigmoid activation function, we get:

$$\text{out}_{o_1} = \frac{1}{1 + e^{-\text{net}_{o_1}}}. \qquad (5.10)$$

After computing the output $\text{out}_{o_2}$ at $o_2$ in the same manner, we compute the error for both output neurons $o_1$ and $o_2$ using an appropriate loss function. One common choice, the sum-of-squared errors, is computed as:

$$E = \sum \tfrac{1}{2}\left(\text{output}_{\text{desired}} - \text{output}_{\text{actual}}\right)^2. \qquad (5.11)$$

In the above equation, $\text{output}_{\text{desired}}$ is given by the ground truth labels, while $\text{output}_{\text{actual}}$ is the network output given by $\text{out}_{o_1}$ or $\text{out}_{o_2}$. The total error from the output neurons $o_1$ and $o_2$ is given by:

$$E = E_{o_1} + E_{o_2}. \qquad (5.12)$$
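The complete forward pass and the loss of equations 5.7 through 5.12 can be computed in a few lines. All numeric values below are hypothetical placeholders, and both hidden neurons are assumed to share the bias $b_1$ (the text leaves the bias of $h_2$ unspecified):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights, biases, inputs, and targets (illustrative only).
x1, x2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60
desired_o1, desired_o2 = 0.01, 0.99

# Hidden layer (equations 5.7-5.8).
out_h1 = sigmoid(w1 * x1 + w2 * x2 + b1)
out_h2 = sigmoid(w3 * x1 + w4 * x2 + b1)
# Output layer (equations 5.9-5.10).
out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)
out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b2)
# Sum-of-squared errors (equations 5.11-5.12).
E_o1 = 0.5 * (desired_o1 - out_o1) ** 2
E_o2 = 0.5 * (desired_o2 - out_o2) ** 2
E = E_o1 + E_o2
```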

This leads to the next phase of backward propagation, where the goal is to update each of the weights in the network such that the actual output is brought closer to the desired output by minimizing the loss function for each neuron in the network. Thus, using the chain rule, we compute the change in the error due to the contribution of the weight $w_5$ as the partial derivative of $E$ with respect to $w_5$:

$$\frac{\partial E}{\partial w_5} = \frac{\partial E}{\partial \text{out}_{o_1}} \cdot \frac{\partial \text{out}_{o_1}}{\partial \text{net}_{o_1}} \cdot \frac{\partial \text{net}_{o_1}}{\partial w_5}. \qquad (5.13)$$

Taking the partial derivative of equation 5.11 w.r.t. $\text{out}_{o_1}$, we get:

$$\frac{\partial E}{\partial \text{out}_{o_1}} = -(\text{desired}_{o_1} - \text{out}_{o_1}), \qquad (5.14)$$


where $\text{desired}_{o_1}$ is the ground truth output value at $o_1$, and the partial derivative of $\text{out}_{o_1}$ w.r.t. $\text{net}_{o_1}$ is computed as:

$$\frac{\partial \text{out}_{o_1}}{\partial \text{net}_{o_1}} = \text{out}_{o_1}(1 - \text{out}_{o_1}). \qquad (5.15)$$

Considering equation 5.9, the partial derivative of $\text{net}_{o_1}$ w.r.t. $w_5$ is equal to $\text{out}_{h_1}$:

$$\frac{\partial \text{net}_{o_1}}{\partial w_5} = \text{out}_{h_1}. \qquad (5.16)$$
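The output-layer gradient built from equations 5.13 through 5.16 can be checked numerically. The network values below are hypothetical (with both hidden neurons assumed to share the bias $b_1$):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical network values (illustrative only, not from the text).
x1, x2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6 = 0.40, 0.45
b1, b2 = 0.35, 0.60
desired_o1 = 0.01

# Forward pass (equations 5.7-5.10).
out_h1 = sigmoid(w1 * x1 + w2 * x2 + b1)
out_h2 = sigmoid(w3 * x1 + w4 * x2 + b1)
out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)

dE_dout_o1 = -(desired_o1 - out_o1)         # equation 5.14
dout_dnet_o1 = out_o1 * (1.0 - out_o1)      # equation 5.15
dnet_dw5 = out_h1                           # equation 5.16
dE_dw5 = dE_dout_o1 * dout_dnet_o1 * dnet_dw5  # chain rule, equation 5.13
```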

Similarly, the chain rule is applied to compute the changes in the error with respect to $w_6$, $w_7$, and $w_8$. The backward pass then continues to compute the change in the error associated with the neurons in the hidden layer, such as for $w_1$, given by:

$$\frac{\partial E}{\partial w_1} = \frac{\partial E}{\partial \text{out}_{h_1}} \cdot \frac{\partial \text{out}_{h_1}}{\partial \text{net}_{h_1}} \cdot \frac{\partial \text{net}_{h_1}}{\partial w_1}. \qquad (5.18)$$

Since the output of each hidden layer neuron contributes to the errors of both output neurons $o_1$ and $o_2$, the first factor in equation 5.18 splits into two terms:

$$\frac{\partial E}{\partial \text{out}_{h_1}} = \frac{\partial E_{o_1}}{\partial \text{out}_{h_1}} + \frac{\partial E_{o_2}}{\partial \text{out}_{h_1}}, \qquad (5.19)$$

each of which is again expanded with the chain rule:

$$\frac{\partial E_{o_1}}{\partial \text{out}_{h_1}} = \frac{\partial E_{o_1}}{\partial \text{out}_{o_1}} \cdot \frac{\partial \text{out}_{o_1}}{\partial \text{net}_{o_1}} \cdot \frac{\partial \text{net}_{o_1}}{\partial \text{out}_{h_1}}. \qquad (5.20)$$

Following equation 5.14, the partial derivative of $E_{o_1}$ w.r.t. $\text{out}_{o_1}$ is given by:

$$\frac{\partial E_{o_1}}{\partial \text{out}_{o_1}} = -(\text{desired}_{o_1} - \text{out}_{o_1}), \qquad (5.23)$$

and from equation 5.15, we know that:

$$\frac{\partial \text{out}_{o_1}}{\partial \text{net}_{o_1}} = \text{out}_{o_1}(1 - \text{out}_{o_1}). \qquad (5.24)$$

Now, using equation 5.9, the partial derivative of $\text{net}_{o_1}$ w.r.t. $\text{out}_{h_1}$ is found to be $w_5$, so that all three factors in equation 5.20 are known:

$$\frac{\partial E_{o_1}}{\partial \text{out}_{h_1}} = -(\text{desired}_{o_1} - \text{out}_{o_1}) \cdot \text{out}_{o_1}(1 - \text{out}_{o_1}) \cdot w_5. \qquad (5.26)$$

The process is similarly repeated for the second term of equation 5.19, $\frac{\partial E_{o_2}}{\partial \text{out}_{h_1}}$.

The sum of the above two contributions (equations 5.26 and 5.27), denoted as $E_{\text{tot}}$, gives $\frac{\partial E}{\partial \text{out}_{h_1}}$, while the remaining terms of equation 5.18 are given by:

$$\frac{\partial \text{out}_{h_1}}{\partial \text{net}_{h_1}} = \text{out}_{h_1}(1 - \text{out}_{h_1}), \qquad (5.28)$$

$$\frac{\partial \text{net}_{h_1}}{\partial w_1} = x_1. \qquad (5.29)$$

Finally, equation 5.18 can be solved as:

$$\frac{\partial E}{\partial w_1} = E_{\text{tot}} \cdot \text{out}_{h_1}(1 - \text{out}_{h_1}) \cdot x_1. \qquad (5.30)$$

The changes in the error with respect to $w_2$, $w_3$, and $w_4$ can be computed similarly. In this way, by calculating partial derivatives from the output layer back through the hidden layer, all weights in the network are updated while minimizing the loss function, and this process is repeated iteratively to bring the network output as close as possible to the desired output.
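The hidden-layer gradient of equation 5.30 can be verified against an independent finite-difference approximation of the loss. As before, all numeric values are hypothetical, and both hidden neurons are assumed to share the bias $b_1$:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical network values (illustrative only, not from the text).
x1, x2 = 0.05, 0.10
w2, w3, w4 = 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60
d_o1, d_o2 = 0.01, 0.99

def total_error(w1):
    # Forward pass (equations 5.7-5.10) followed by the loss (5.11-5.12).
    out_h1 = sigmoid(w1 * x1 + w2 * x2 + b1)
    out_h2 = sigmoid(w3 * x1 + w4 * x2 + b1)
    out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)
    out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b2)
    return 0.5 * (d_o1 - out_o1) ** 2 + 0.5 * (d_o2 - out_o2) ** 2

w1 = 0.15
out_h1 = sigmoid(w1 * x1 + w2 * x2 + b1)
out_h2 = sigmoid(w3 * x1 + w4 * x2 + b1)
out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)
out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b2)

# Output-layer delta terms (equations 5.23-5.24 and the analogue for o2).
delta_o1 = -(d_o1 - out_o1) * out_o1 * (1.0 - out_o1)
delta_o2 = -(d_o2 - out_o2) * out_o2 * (1.0 - out_o2)
# E_tot: contributions of both output neurons (equation 5.19).
E_tot = delta_o1 * w5 + delta_o2 * w7
# Hidden-layer gradient (equation 5.30).
dE_dw1 = E_tot * out_h1 * (1.0 - out_h1) * x1

# Central finite difference as an independent check of the chain rule.
h = 1e-6
numeric = (total_error(w1 + h) - total_error(w1 - h)) / (2 * h)
```

The agreement between the analytic and the numeric gradient is a standard sanity check when implementing backpropagation by hand.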