
1.3 Reliability-Aware Architectures

1.3.1 Artificial Neural Networks

"Artificial neural networks are an attempt at modeling the information processing capabilities of nervous systems." [Roj96, p. 3] The study of nervous systems is a branch of neuroscience2 [KSJ00, Fin01]. Research has investigated the neural networks within brains and, based on these discoveries, developed computational models: the artificial neural networks (ANNs).

2 Neuroscience as a field of study dates back to early periods of human history. There is evidence that surgical practice on brains was already performed during Neolithic times to relieve cranial pressure or to cure headaches. However, not until the mid-nineteenth century was extensive neuroscientific knowledge gained through systematic research, with a significant scientific increase through non-invasive studies of the brains of healthy test subjects [KSJ00, Fin01]. Neuroscience as a topic includes a broad range of further studies: molecular, cellular, developmental, structural, functional, evolutionary, computational, and medical aspects of the nervous system. The techniques have expanded from individual nerve cells and their composition to complex activities of the brain.

Figure 1.8: Reliable Architecture using an ANN [Roj96, p. 126]

The first of such computational models was introduced in 1943 by Warren McCulloch and Walter Pitts and laid the foundation for applying neural networks as instances of artificial intelligence [MP43]. After a period of depression in the 1970s and early 1980s, neural network research experienced a renaissance in the mid-1980s through associative memory, perceptrons, support vector machines and, more recently, through deep learning [ZDL90, Hay98].

ANNs are systems of interconnected artificial neurons, as illustrated in Figure 1.8. "The input is processed and relayed from one layer to the other, until the final result has been computed." [Roj96, p. 126] Sensor data, for example, serve as parameters of the input layer. The intermediate layers with their nodes are called hidden layers, since they do not directly interact with the external environment [BH00].

"The determination of the appropriate number of hidden layers and number of hidden nodes (NHN) in each layer is one of the most critical tasks in the ANN design." [BH00, p. 22] The output layer represents the network functions, or the tasks the network has to process; all the steps needed for a successful execution are represented by the hidden layers. Adding or deleting connections between neurons increases the quality of the output [vdM90]. Further, changing the weights of the interconnections results in different network functions and different outcomes of the tasks [Roj96]. The proper selection of the weights, the activation function and the net topology enables ANNs to learn to solve complex non-linear functions and to execute various tasks, such as flying an aircraft autonomously [Cen03]. Therefore, modeling ANNs requires the definition of three important elements [Roj96]:

1. The structure of each artificial neuron (node),

2. The topology of the interconnections (network),


Figure 1.9: Artificial Neuron, showing the inputs x_1 … x_n, weights w_i,j, transfer function δ, net value s_j, threshold θ_j, activation function φ_j, and output o_j (based on [Smi97, p. 461] and [BH00])

3. The learning algorithm, which determines the weights of all interconnections.
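As a minimal sketch of the first two elements, the following code models sigmoid nodes arranged in a layered topology; the layer sizes and weight values are illustrative placeholders (a learning algorithm, the third element, would normally determine them), not values taken from the cited works.

```python
import math

def sigmoid(s):
    """Common node activation; squashes the net value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-s))

def forward(x, layers):
    """Relay the input from one layer to the next (cf. Figure 1.8).

    `layers` is a list of weight matrices, one per layer; each row of a
    matrix holds the incoming weights of one node in that layer.
    """
    for weights in layers:
        x = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in weights]
    return x

# Illustrative topology: 2 inputs -> 2 hidden nodes -> 1 output node
hidden = [[0.5, -0.5], [0.3, 0.8]]
output = [[1.0, -1.0]]
result = forward([1.0, 0.0], [hidden, output])
```

Changing a weight in `hidden` or `output` changes the computed network function, which is precisely the effect described above.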

Figure 1.9 shows the structure of an artificial neuron (node) [Smi97, BH00], described by its four basic elements:

Weight function: All input parameters x_i : i ∈ {1..n} are weighted with w_i,j ∈ [−1..1] against each other, defining the ratio of influence each input has upon the neuron. A weight of zero for an input is equivalent to a non-existing edge, neglecting that input; the sign separates the inhibitory influence (negative) from the excitatory influence (positive) of that input.

Transfer function: The transfer function δ evaluates the overall influence of all inputs, the net value s_j of the neuron. Equation (1.1) states that the input parameters and the weights define the transfer function, but other characteristics of the architecture may also be included.

Activation function: The activation function φ_j evaluates the net value s_j against the threshold value θ_j and determines the output o_j of the neuron. Further, φ_j is defined by the topology of the network and represents the influence each neuron has upon the overall system [MS10]. The functions are usually monotonically increasing, for example a ramp, a piecewise linear function, a sigmoid, or a hyperbolic tangent function [SMN11]. Especially in multilayer perceptron neural networks, the sigmoid function is used most often [Hay98].

Threshold: The threshold value θ_j characterizes the minimum net value a neuron needs to be activated, which corresponds to the threshold potential of biological neurons.

The output o_j of the artificial neuron is evaluated by subtracting the threshold value, as equation (1.2) states [Roj96, BH00].

s_j = δ = Σ_{i=1..n} (x_i · w_i,j)   (1.1)

o_j = φ_j(s_j − θ_j)   (1.2)
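Equations (1.1) and (1.2) can be sketched directly in code; the sigmoid is one of the common activation functions named above, and all numeric values here are illustrative assumptions, not values from the cited works.

```python
import math

def neuron_output(x, w, theta):
    """Single artificial neuron as in Figure 1.9.

    Transfer function (eq. 1.1): weighted sum of the inputs.
    Activation (eq. 1.2): sigmoid applied to s_j - theta_j.
    """
    s = sum(xi * wi for xi, wi in zip(x, w))      # net value s_j, eq. (1.1)
    return 1.0 / (1.0 + math.exp(-(s - theta)))   # o_j = phi_j(s_j - theta_j)

# Illustrative example: three inputs, weights in [-1, 1], threshold 0.5
o = neuron_output([1.0, 0.0, 1.0], [0.8, -0.3, 0.4], 0.5)
```

Here the zero input x_2 contributes nothing regardless of its (inhibitory) weight, matching the weight-function description above.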

In [Mea89, Roj96, MS10] Hardware Neural Networks (HNNs) were introduced, which begin to supersede the numerous software-based implementations [MS10]. In [MS10] the development of hardware implementations of ANNs over the past 25 years is summarized, and the advantages of HNNs compared to software ANNs are stated as:

• The increase in speed by taking advantage of hardware parallelism,

• The decrease in costs by lowering, for example, the component counts and power requirements, and

• The ability to counteract degradation through fault tolerance, keeping the system running with reduced performance.

Digital architectures use shift registers, latches or memories to store the dynamically changeable weights w_i,j and threshold values θ_j, while look-up tables, standardized adders and multipliers are used for the neuron architecture [Roj96, MS10]. The advantage lies in the simplicity and the scalability (the cascadability and flexibility) of the components, the high signal-to-noise ratio and cheap fabrication [MS10]. Analog architectures use resistors, charge-coupled devices, capacitors and floating-gate transistors to store the dynamically changeable values [Roj96, MS10].

Learning involves updating the weights w_i,j dynamically while the size of the components is fixed, which is done by varying the stored charges. Further, those architectures benefit from the physical effects of currents and voltages and are in general optimized in size [Roj96]. However, "obtaining consistently precise analog circuits, especially to compensate for variations in temperature and control voltages, requires sophisticated design and fabrication" [MS10]. Hybrid architectures represent mixed-signal implementations of the ANN, as shown in [SLM99, SMN11]. Their focus is to combine the advantages of both domains while minimizing the weaknesses.


Figure 1.10: Implementation of an Artificial Neural Network [SMN11]

In [SMN11] a detailed description of a hardware implementation of an ANN based on Field-Programmable Gate Arrays (FPGAs) is provided. Next to the presented hardware layer with the neuron architecture, a global ANN learning unit is needed to transfer the learning effects onto the hardware.

"The control unit [...] commands the digital neuron arithmetics only." [SMN11, p. 655] The split between the control unit and the arithmetic unit allows the hardware to adapt to different multilayer perceptron neural network topologies [Roj96, SMN11], since the number of inputs and the dynamically changeable values can be altered on-the-fly. Therefore, the hardware is able to perform different ANN applications. Figure 1.10 shows the block diagram of the ANN hardware presented in [SMN11].

The strength of ANNs lies in machine learning and classification/ranking, achieved by the learning ability of neurons. The weights and threshold values can be chosen randomly at the beginning; even a basic trial-and-error learning algorithm adjusts the values until the wanted behavior or result occurs. Therefore, a period of learning effectively shapes the output of an ANN [Roj96].

"The problem is the time required for the learning process, which can increase exponentially with the size of the network." [Roj96, p. 451] ANNs were successfully embedded in computer vision and speech & pattern recognition [Mea89]. A further field of application of ANNs is robotics, with a focus on reliability and specialized actuators/manipulators like prostheses [MS10].
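As a hedged illustration of such trial-and-error weight adjustment, the following sketch uses a perceptron-style delta rule, one classic instance of this idea rather than necessarily the algorithm meant above; the learning rate, the step activation and the AND-gate task are illustrative assumptions.

```python
def train_step(x, w, theta, target, eta=0.1):
    """One trial-and-error update: compare output to target, nudge weights."""
    s = sum(xi * wi for xi, wi in zip(x, w))
    o = 1 if s - theta >= 0 else 0                      # step activation
    err = target - o
    w = [wi + eta * err * xi for xi, wi in zip(x, w)]   # adjust weights
    theta -= eta * err                                  # threshold acts as a bias
    return w, theta

# Learn a logical AND from random-free (all-zero) initial values
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, theta = [0.0, 0.0], 0.0
for _ in range(20):                                     # repeated trial epochs
    for x, t in data:
        w, theta = train_step(x, w, theta, t)
```

After a few epochs the adjusted weights and threshold reproduce the wanted behavior on all four input patterns.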

The strength of learning is also its weakness. ANNs tend to memorize the training data and blind themselves by that, unable to re-adjust to new data [BH00]. Only carefully chosen net topologies prevent this overfitting [BH00], often discussed as the bias-variance trade-off [GJP95, EPP00, SS01]. Also, the training data need to be collected or generated manually [Pom93, Roj96]. It is essential that a large diversity of real-world operations is maintained in the training data [Pom93].

For this thesis the learning unit is regarded as a circuit black box and not investigated further. Furthermore, to compare the ANN to the other approaches, it has to be abstracted into a task distribution system.

ANN as Task Distribution System The most basic approach is to add the neural network as a centralized distribution unit containing, for example, m·N-many neurons. The neural network is connected to the global learning unit and to each working core, as illustrated in Figure 1.11(a). Each core has its own monitor supplying the life signs to the neural network, while the output signals of the neural network unit represent the task-on signals of each core.

The other approach, seen in Figure 1.11(b), is slicing the neural network into pieces and equipping each core with an appropriate slice. This leads to the following:

1. The neural network is dispersed over the system and therefore less likely to be destroyed by a single impact, increasing the reliability.

2. The slices may differ in size; each core only needs those neurons placed locally which are capable of triggering the core to allocate an appropriate task.

The global learning unit is neglected with regard to the symmetry of the mechanism. The unit is used beforehand, offline, to determine the weights w_i,j and threshold values θ_j, but remains idle during the online operation time. This semi-decentralized symmetric approach is used for any further consideration regarding ANNs.
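The sliced, semi-decentralized variant of Figure 1.11(b) can be sketched as follows; the class and signal names are hypothetical, and the weights and thresholds are assumed to have already been fixed offline by the learning unit.

```python
class CoreSlice:
    """Local slice of the ANN held by one core: it maps the life signs of
    all monitored cores to this core's task-on signal."""

    def __init__(self, weights, theta):
        self.weights = weights    # one weight per monitored core, set offline
        self.theta = theta        # activation threshold, set offline

    def task_on(self, life_signs):
        # Threshold activation over the weighted life signs (eqs. 1.1/1.2)
        s = sum(w * l for w, l in zip(self.weights, life_signs))
        return s - self.theta >= 0

# Two cores: core 1 is weighted to take over once core 0 stops sending
# life signs (illustrative weight values)
slices = [CoreSlice([1.0, 0.0], 0.5), CoreSlice([-1.0, 1.0], 0.5)]
both_alive = [sl.task_on([1, 1]) for sl in slices]   # core 0 runs the task
core0_dead = [sl.task_on([0, 1]) for sl in slices]   # core 1 takes over
```

Because each slice evaluates only its locally stored neurons, a single impact destroys at most one slice while the remaining cores keep distributing tasks, which is the reliability argument made above.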