Backpropagation Algorithm and its Hardware Implementations: A Review

Artificial intelligence is advancing at a rapid pace. Integrating hardware with Artificial Neural Networks (ANNs) has paved the way for applications in areas such as control engineering, robotics, and navigation. Extensive research in machine learning has enabled these systems to be used in advanced applications, and combining hardware and software modules helps build robust systems. This paper focuses on the back propagation algorithm and its hardware implementations using FPGAs, ASICs, memristors, and microcontrollers. The implementation of the back propagation algorithm on Field Programmable Gate Arrays (FPGAs) is analysed in detail; the parallel processing capability of FPGAs makes them well suited for neural networks, and parallel processing models implement back propagation to calculate the errors in the hidden layers of the network. This review helps researchers understand how back propagation is implemented on various hardware platforms, and the comparative analysis of key parameters helps identify suitable hardware for their requirements.


Introduction
Artificial intelligence appears in most modern software technologies and hardware implementations. Artificial neural networks form the basis of many modern applications. ANNs mimic the nervous system of the human body: just as neurons in the brain are connected by nerves, artificial neural networks consist of nodes connected according to rules. As shown in Figure 1, a simple neural network has two inputs, hidden nodes, and two outputs. A node can be an input, output, or hidden node, and the nodes are connected by rules or algorithms.
ANNs incorporate many algorithms for different kinds of applications [1]. Algorithms are implemented by following specific ANN rules, and different algorithms have been devised to provide specific features. Depending on the functionality required, several interconnected layers are present. The hidden nodes are interconnected in a pattern, rules are applied in the form of equations in feed-forward networks, and these equations are iterated multiple times. Feed-forward networks follow the delta rule: errors that occur at the output nodes are calculated, back propagated, and distributed across the hidden nodes. Back propagation is one of the algorithms used extensively in ANNs, and it can be implemented in hardware using FPGAs, ASICs, memristors, or microcontrollers. Any neural structure depends on the performance and stability of the system. Machine learning configures many algorithms to implement different functionalities on hardware; some functionality can be realized in software and some in hardware modules, and combining software and hardware modules improves system capabilities such as speed, performance, area utilization, cost, and memory. This paper is divided into five sections. Sections II and III describe artificial intelligence and the back propagation algorithm, respectively. Section IV describes the hardware implementations and discusses a comparative analysis. Section V concludes the review.

Artificial Intelligence
A neural network follows a specific pattern or implements a learning rule. In smaller networks, when the input nodes are activated, synaptic weights are updated and forwarded to the output nodes. These kinds of synapses are required to follow the rules [2]. Different training algorithms are available, such as back propagation, genetic, and krill herd algorithms [2], [3].
Calculating synaptic weights requires training the hidden nodes through functions, rules, and algorithms. When the input nodes are excited, information is processed by the hidden layers and the processed information is passed to the output nodes. Back propagation algorithms are widely used in machine learning applications.
Machine learning (ML) is a branch of artificial intelligence. In the modern era of software, machine learning relies on prediction from data sets based on experience, employing various algorithms for different modules in the software. ML is one approach to artificial intelligence and has applications in many areas, including EDA [4]. Moreover, ML takes samples from different methods and dynamically builds timing models automatically [4]. Machine learning employs advanced algorithms on complex, multi-dimensional data sets, forming known or unknown patterns from the multidimensional data [4]. The back propagation algorithm is extensively employed in machine learning for feed-forward networks.

Back Propagation Algorithm
The backpropagation (BP) algorithm has extensive applications in artificial neural networks. An algorithm can be executed in software or in hardware [5]. The complexity of executing an algorithm in software increases when the modules require parallel processing and many iterations. When parallel processing is required [6], implementing the algorithm in hardware improves speed [7], and the timescale constructs of hardware description languages give flexibility for expressing concurrency.
Feed-forward networks do not have feedback connections, whereas recurrent networks do. The Perceptron and Adaline are single-layer feed-forward networks.
In multi-layer feed-forward networks, there are several hidden layers [2]. A node in a hidden layer takes data from the layer below and sends data directly to the layer above; there are no connections within a layer.
The nodes must be trained using a particular learning rule [2]. There are two learning rules for training single-layer networks: the perceptron rule and the delta rule, both iterative procedures for adjusting the weights of linear networks. The functionality that single-layer networks can implement is very limited, so multi-layer networks are employed for more complex features [2]. In 1969, Minsky's analysis of feed-forward networks paved the way for a new era of designers constructing ANNs. A single-layer network has no hidden layers, whereas a multi-layer network does, but the problem of adjusting the weights from the input nodes to the hidden nodes was not solved in [2].
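The delta rule mentioned above can be sketched for a single-layer linear network in a few lines. The following Python sketch is purely illustrative; the learning rate, target mapping, and initialization are hypothetical choices, not values from the reviewed papers.

```python
import numpy as np

def train_delta_rule(X, t, eta=0.1, epochs=500):
    """Delta rule for a single linear unit: w <- w + eta*(target - output)*input."""
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=X.shape[1])  # small random initial weights
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(X, t):
            y = w @ x + b          # linear output of the single-layer network
            err = target - y       # error at the output node
            w += eta * err * x     # iterative weight adjustment (delta rule)
            b += eta * err
    return w, b

# Learn the linear mapping t = 2*x1 - x2 (a hypothetical example target)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
t = np.array([0., -1., 2., 1.])
w, b = train_delta_rule(X, t)
```

Because the target here is linear, the delta rule recovers the weights exactly; it is precisely this limitation of single-layer networks that multi-layer networks with back propagation overcome.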
The weight-adjustment problem is solved by back propagating the errors. While distributing the weights across the nodes, some errors arise in the calculation of a function. When a learning rule is followed or a teaching pattern is applied, the errors from the output layer must be distributed across the hidden nodes using the back propagation property. Two rules are applied for multi-layer networks: i) the generalized delta rule and ii) the chain rule. In a feed-forward network, the hidden nodes are modelled with non-linear functions, and the errors that occur while calculating and updating the weights are distributed across all the hidden nodes according to the chain rule. The generalized delta rule has two stages: in the first stage, for every input propagated through the network, the output is calculated and compared with the desired value; the second stage is a backward pass through the network, where the error signal is passed to each node, the appropriate weight changes are calculated, and the synaptic weights of both layers are updated. Before applying back propagation to the hidden nodes, the weighted delta value from each output node and the activation function of the hidden unit must be considered [8]. The flow of the back propagation algorithm is shown in Figure 2.
Step 1. The neural network is initialized, and all the weights are initialized.
Step 2. The input activation functions are applied.
Step 3. The weights for the hidden nodes are calculated.
Step 4. The errors between the desired and actual output values are calculated.
Step 5. The weights of the hidden nodes are updated.
Step 6. Check the error and go to Step 3 until the error becomes less than 0.00001.
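The six steps above can be sketched in Python for a small 2x2x1 network, using a tansig (tanh) hidden layer and a pure linear output node. The AND target pattern, learning rate, and iteration cap are illustrative assumptions, not values taken from the cited implementations.

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(scale=0.5, size=(2, 2))   # Step 1: initialize all weights
b1 = np.zeros(2)
W2 = rng.normal(scale=0.5, size=2)
b2 = 0.0

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([0., 0., 0., 1.])            # illustrative AND target pattern
eta = 0.1                                  # hypothetical learning rate

for epoch in range(10000):
    sse = 0.0
    for x, t in zip(X, T):
        h = np.tanh(W1 @ x + b1)          # Steps 2-3: forward pass, hidden acts
        y = W2 @ h + b2                   # linear output node
        e = t - y                         # Step 4: desired minus actual output
        sse += e * e
        d_out = e                          # backward pass: generalized delta rule
        d_hid = (1.0 - h * h) * (W2 * d_out)  # chain rule through tanh
        W2 += eta * d_out * h             # Step 5: update the weights
        b2 += eta * d_out
        W1 += eta * np.outer(d_hid, x)
        b1 += eta * d_hid
    if sse < 1e-5:                         # Step 6: stop when error is small
        break

preds = np.array([W2 @ np.tanh(W1 @ x + b1) + b2 for x in X])
```

Step 6 uses the same 0.00001 error threshold as the flow above; practical implementations additionally cap the number of epochs, as done here.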
The back propagation algorithm has a low convergence rate, and the system can get stuck in local minima. Various advanced algorithms based on the same learning rule have been developed. The local minima problem can be rectified to some extent by the generalized back propagation algorithm, whose convergence rate is improved because it uses ordinary differential equations [9].
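One widely used remedy for the slow convergence noted above is adding a momentum term to the weight update; this is a generic technique, distinct from the ODE-based generalized back propagation of [9]. A minimal sketch on a toy one-dimensional error surface:

```python
# Momentum update: v <- mu*v - eta*grad;  w <- w + v
# Toy error surface E(w) = (w - 3)^2, so grad = 2*(w - 3).
mu, eta = 0.9, 0.1          # illustrative momentum and learning rate
v, w = 0.0, 0.0
for _ in range(200):
    grad = 2.0 * (w - 3.0)
    v = mu * v - eta * grad  # momentum accumulates a velocity across steps
    w = w + v                # weight moves toward the minimum at w = 3
```

The velocity term lets the update carry speed across flat regions of the error surface and can roll through shallow local minima, which is why momentum variants of BP are common in practice.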

Hardware Implementation
The back propagation algorithm can be implemented in different hardware modules. In this paper, microcontroller, memristor, ASIC, and FPGA implementations are studied and discussed. A portion of the algorithm can be implemented in software and the remainder in hardware; the software part of the BP algorithm is usually written in MATLAB, C, or an HDL.

A. Microcontroller Implementation
In [24], [25], a multi-layer neural network implements the BP algorithm on microcontroller boards. Offline training of the network uses the tangent-sigmoid function to activate the neurons, while the real-time implementation uses a piecewise linear approximation of the tangent-sigmoid function. The 89C52 microcontroller can be used to realize a perceptron for applications such as a robotic car [10]. Extra circuitry is required to implement the neural hardware [11].
MATLAB's Neural Network Toolbox implements the firing function of the neurons by a piecewise approximation of the tangent-sigmoid function so that it can run on a microcontroller. The network topology is selected on a trial-and-error basis: the hidden layer is activated by the tangent-sigmoid (tansig) function, and a pure linear function is applied at the output layer. Frameworks such as TensorFlow can also be employed to design AI systems for new modules, ideas, and interfaces.
A ternarized back propagation algorithm is aimed particularly at edge-computing applications [12].
As shown in Figure 3, the input sensor data has to be learned. A transfer function in the simulation tool maps the inputs onto the microcontroller-based neural network, and the microcontroller board provides the memory block. The neural network is trained with the BP learning algorithm, and the outputs obtained are displayed through the output ports of the microcontroller.

B. Memristor Implementation
In 2014, a memristor made of metal-oxide crossbars was used experimentally to implement a perceptron. A memristor can be used to design a single-layer neural network [18]. The memristor experiment was extended to a three-layer perceptron, which was simulated using the back propagation algorithm [19], [5]. Two memristors in series work as a half-bridge rectifier. As shown in Figure 4, the memristances change with the polarity of the applied input voltage, and these memristances act as the weights in the neural hardware [8].

C. ASIC Implementation
The ASIC design flow [20] in simplified form is shown in Figure 5. The register transfer level (RTL) code is written in VHDL and synthesized [21], generating a gate-level netlist. The hardware implementation flow for an ASIC has additional steps compared with an FPGA, such as floor planning, placement and routing, and physical implementation, and these additional steps take longer than the FPGA flow.

D. FPGA Implementation
The Zynq 7000 series FPGA architecture is an SoC that integrates an ARM-based processor with programmable FPGA fabric. On a single device, it combines a central processing unit, a digital signal processor, and ASSP functionality. The 7 series architecture is more complex than older families, and its performance is better than that of previous families. All Xilinx FPGAs consist of slices, memory, multipliers, programmable interconnects, and clock buffers. The neurons in an artificial neural network operate concurrently, so they require parallel processing when implemented in hardware; since parallel processing is a core feature of FPGAs, they are well suited to neural hardware [22]. Figure 6 shows the flow [23] from a high-level language such as C or Java to an FPGA. Many compilers are available; SeaCucumber is a synthesizing compiler that generates circuits from Java class files [18]. Similarly, online compilers can be used to convert C code to a hardware module interface [24]. Digital design engineers code in HDLs; IEEE Standard 1364 governs Verilog HDL for describing various components and circuits [25]. MATLAB can be used to design algorithms and simulate them with Simulink, and with HDL Coder, VHDL or Verilog code can be generated from MATLAB.
FPGAs have an array of programmable logic blocks. As shown in Figure 7 [4], an FPGA has several blocks of I/O, memory, multipliers, and general logic, with a programmable routing fabric. The I/O blocks connect the FPGA chip to the external world. FPGAs can be programmed even after fabrication is completed: by writing code in an HDL, an FPGA can be programmed for a particular function or task. FPGAs are re-programmable, and machine learning algorithms can be deployed across different FPGA architectures [26], [4]. Because the re-programmable characteristic of FPGAs suits machine learning algorithms, parallel processing implementations have many applications in machine learning [27]. Field programmable gate arrays are preferable to application-specific integrated circuits for various reasons. FPGAs offer fast programming and testing by the end user; they are well suited to prototyping, and the ease of shifting to a final design makes them unique. The re-programmable feature of FPGAs gives the designer flexibility: FPGAs can be reused for other designs, and design changes and modifications are far easier than with ASICs.
The financial risk associated with FPGAs is lower than with ASICs, and the turn-around time of ASICs is much longer. In terms of cost, FPGAs incur low costs in small volumes, meaning they offer lower start-up costs, and the design tools used for FPGAs are less expensive than those for ASICs. The major disadvantages of FPGAs compared with ASICs are lower speed, larger area, more transistors per logic function, and higher power consumption [25]. An FPGA requires more area than an ASIC, is slower than an ASIC, and consumes more dynamic power than an ASIC [20], [28].

E. Performance of Back Propagation Algorithm on FPGA
Diverse FPGA generic libraries have been created for ANN algorithms, and these libraries are useful when coding the back propagation algorithm. A 2x2x1 neural network with two inputs, two hidden nodes, and one output has been designed; the back propagation learning algorithm is implemented entirely on the FPGA, and resource utilization is optimized [28]. To keep the hidden-layer outputs in the range [-1, 1], tansig is used as the hidden-layer activation function; to limit the output neuron to the range [0, 1], the sigmoid activation function is used [28]. To train the neural network, the tansig and sigmoid functions are applied iteratively and programmed on the FPGA [28]. Implementing the sigmoid function efficiently on an FPGA is one of the toughest tasks for designers, so alternative approximations of the sigmoid are often employed. As shown in Table 1, the back propagation algorithm can be implemented more easily on an FPGA or microcontroller than on an ASIC or memristor.
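As a concrete illustration of such an alternative, sigmoid evaluation on an FPGA is often replaced by a precomputed lookup table (ROM) in fixed point. The Python sketch below models this approach; the table size, input range, and fixed-point format are hypothetical design choices, not parameters from [28].

```python
import math

FRAC_BITS = 12                  # fractional bits of the stored fixed-point values
TABLE_SIZE = 256                # ROM depth; covers inputs in [-8, 8)
X_MIN, X_MAX = -8.0, 8.0
STEP = (X_MAX - X_MIN) / TABLE_SIZE

# ROM contents: sigmoid sampled and quantized to FRAC_BITS fractional bits,
# exactly as they would be stored in block RAM on the FPGA.
LUT = [round(1.0 / (1.0 + math.exp(-(X_MIN + i * STEP))) * (1 << FRAC_BITS))
       for i in range(TABLE_SIZE)]

def sigmoid_lut(x):
    """Approximate sigmoid by indexing the ROM; saturates outside the table."""
    if x < X_MIN:
        return 0.0
    if x >= X_MAX:
        return 1.0
    return LUT[int((x - X_MIN) / STEP)] / (1 << FRAC_BITS)
```

With 256 entries of 12 fractional bits the worst-case error is under 0.02, which is typically acceptable for BP training, while the hardware cost is one ROM read per neuron instead of an exponential evaluation.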

F. Comparative Analysis of various Hardware implementation
FPGAs occupy more area but are less expensive than ASICs. As the memristor is a nanoscale device, back propagation can be implemented for a small number of nodes, but it becomes difficult when memristor crossbars are employed. Microcontroller boards are readily available, but they offer less flexibility than FPGA boards when implementing the back propagation algorithm. Production time is very short for FPGAs and microcontrollers, but FPGAs offer more flexibility than microcontrollers. Hardware complexity is lower for FPGAs and ASICs. The coding languages used for FPGAs and ASICs are Verilog and VHDL, while microcontrollers use C and C++; memristor simulation can be done in LTspice or MATLAB. Power consumption is lower for ASICs, microcontrollers, and memristors than for FPGAs.

Conclusions
Various back propagation implementations on FPGA architectures, microcontrollers, ASICs, and memristors have been studied and analyzed. Memristor-based neural networks occupy less area, but their hardware complexity is higher.