Touch All the Bases
Give Your .NET App Brains and Brawn with the Intelligence of Neural Networks
Christopher M. Frenz
This article uses the following technologies:
.NET, Visual Basic
Code download available at: NeuralNetworks.exe (128 KB)
Pattern recognition is an increasingly complex field. Every day technologies such as handwriting recognition software, spam filters, and search engines are required to identify ever more complicated patterns. The difficulty that arises when these tasks are attempted through traditional programming is that they involve a multitude of variables and, more often than not, the relationships between these variables cannot be explicitly defined. For example, the differences between spam and legitimate e-mail are often fuzzy, so hardcoding a set of criteria to differentiate between the two can be difficult. To deal with these and similar issues, programmers are beginning to move away from such approaches and are adopting nonlinear programming techniques such as neural networks.
Artificial neural networks are designed much like biological neural networks. Both comprise a series of simple information processing units that operate in parallel. In both artificial and biological networks, these simple units are called neurons. Signals can be passed between neurons through a series of weighted connections. The pattern of these connections defines the architecture of the neural network and influences the functionality for which the neural net is best suited (pattern recognition, classification, and so on). Neural networks are able to "learn" by adjusting the strengths of these connections until they can approximate a function that computes the proper output for a given input pattern.
In this article I'll examine one of the most common types of neural networks, the feed-forward neural network, which is often used for pattern recognition and predictive purposes. I'll provide a small example program from medical informatics. The premise is that you have been hired by a group of doctors who are trying to predict their patients' risk for developing heart disease. Over the years they have monitored changes in potential risk factors of past patients, such as blood pressure, weight, and so on, and recorded whether these patients developed heart disease. The neural network under development will be trained with this information so that the doctors can predict the heart disease risk for their patients and take appropriate preventative action.
The typical architecture of a feed-forward neural network contains three layers: an input layer, a hidden layer, and an output layer (see Figure 1). The input layer transfers the array of input values into the neural network. The input layer data is then multiplied by a weight matrix (wij) and passed into the hidden layer neurons. Every possible interconnection possesses its own weight. Therefore the weight matrix has n × m dimensions, where n is the number of input layer neurons and m is the number of hidden layer neurons. Note that not every interconnection must actually exist, a case that can be modeled using a weight of zero for that interconnection.
Figure 1 Feed-Forward Neural Net
The hidden layer neurons allow the network to represent how the elements of a complex pattern work together to produce a given output. The hidden layer increases the number of weighted interconnections, which means that the neural network can approximate more complex functions. In fact, the most basic neural network architectures, such as single-layer perceptrons, lack the ability to even approximate the XOR function. A multilayer neural network like the one shown in Figure 1 can readily approximate the XOR function (such that specifying binary input values at the input layer will yield the correct XOR value of these input values at the output layer). For highly complex patterns, some neural networks employ additional hidden layers.
A similar weight matrix (wjk) connects the hidden layer neurons to the output neurons. A bias is also added to each neuron in the hidden and output layers, which scales the neurons' input values before they pass through the neurons' transfer function. The transfer function, sometimes known as the activation function, takes the sum of all the neurons' weighted inputs and uses the value to calculate the neurons' output.
Programming Neural Networks
Now that you have an idea of what neural networks are and how they operate, let's start a new Windows® Application project in Visual Studio®. First you need to specify the number of neurons in each layer. To determine the number of input neurons, look at the DataSet provided by the physicians. The doctors provided three variables of interest: change in cholesterol, change in weight, and a family history of the disease. This means that you'll need a total of three input neurons. The number of hidden layer neurons is generally determined during the training process, which I'll discuss later, but for now pick an initial value of 3. Furthermore, since the physicians only want to know whether or not the individual is at risk for the disease, you'll only need a single output neuron, yielding a 3-3-1 architecture for the network.
Once you've established the number of neurons necessary for your network, you need to add the biases and weighted connections between these neurons. Since there's no way of knowing the appropriate weights and biases prior to training, randomly initialize each weight and bias with a value between –0.5 and 0.5 (see Figure 2). For larger neural networks it may be advisable to employ a more sophisticated initialization procedure, such as Nguyen-Widrow (a simple modification to the common random weight initialization algorithm that can provide for faster training times).
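For readers curious about the Nguyen-Widrow procedure mentioned above, here is a minimal sketch of how it is commonly described for the input-to-hidden layer: start from the usual random weights, rescale each hidden neuron's weight vector to a length of 0.7 × m^(1/n), and draw each hidden bias from the range –beta to beta. The module name, function name, and use of System.Random are assumptions made for illustration; they are not part of the article's download.

```vb
Imports System

Module NguyenWidrowSketch
    ' Returns an n-by-m input-to-hidden weight matrix and fills hbias
    ' with m hidden layer biases (hypothetical helper, not from the article).
    Function InitNguyenWidrow(ByVal n As Integer, ByVal m As Integer, _
                              ByVal rng As Random, ByRef hbias() As Double) As Double(,)
        Dim w(n - 1, m - 1) As Double
        ReDim hbias(m - 1)
        ' Scale factor: beta = 0.7 * m^(1/n)
        Dim beta As Double = 0.7 * Math.Pow(m, 1.0 / n)
        For J As Integer = 0 To m - 1
            ' Start from ordinary random weights between -0.5 and 0.5
            Dim norm As Double = 0
            For I As Integer = 0 To n - 1
                w(I, J) = rng.NextDouble() - 0.5
                norm += w(I, J) * w(I, J)
            Next I
            norm = Math.Sqrt(norm)
            ' Rescale this neuron's weight vector to length beta
            If norm > 0 Then
                For I As Integer = 0 To n - 1
                    w(I, J) = beta * w(I, J) / norm
                Next I
            End If
            ' Bias drawn uniformly between -beta and beta
            hbias(J) = (2 * rng.NextDouble() - 1) * beta
        Next J
        Return w
    End Function
End Module
```

The hidden-to-output weights and the output bias would still be initialized as in Figure 2.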
Figure 2 Randomizing Neuron Weights
Private Sub Init(ByVal n As Integer, ByVal m As Integer)
    Dim I, J As Integer
    Randomize()
    ' Input-to-hidden weights, each between -0.5 and 0.5
    For I = 0 To n - 1
        For J = 0 To m - 1
            hweight(I, J) = (Rnd() - 0.5)
            hweight2(I, J) = hweight(I, J)
        Next J
    Next I
    ' Hidden layer biases and hidden-to-output weights
    For J = 0 To m - 1
        hbias(J) = (Rnd() - 0.5)
        hbias2(J) = hbias(J)
        oweight(J) = (Rnd() - 0.5)
        oweight2(J) = oweight(J)
    Next J
    ' Output neuron bias
    obias = (Rnd() - 0.5)
    obias2 = obias
End Sub
Within the code in Figure 2, the variables prefixed with "h," such as hweight, refer to the variables used by the hidden layer neurons, whereas those prefixed with "o" refer to output neuron variables. The variables without the numeric suffix are the actual weights and biases used by the network, while those with the suffix 2 will play a role in the training process. The training algorithm requires that you keep track of the current weights and biases as well as the values from the previous training iteration. At this point there have been no previous training iterations, so you can simply initialize both sets of variables to the same values. For the sake of simplicity, many variables in this code were not declared at the procedural level, but rather as private form-level variables, because they will be used by multiple procedures during program execution. Thus, if a variable is used in a subroutine, but not declared within that subroutine, make sure to declare that variable in the declarations section of your project's code.
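As a rough guide, the form-level declarations might look like the following sketch. The variable names are the ones used by the procedures in this article; the array sizes are assumptions that fit the 3-3-1 network and a training set of eight cases, and would need to change for a different architecture or data set.

```vb
' Form-level declarations: place inside the form class, above the procedures.
' Array sizes are assumptions for a 3-3-1 network and eight training cases.
Private n, m As Integer                 ' input and hidden layer neuron counts
Private alpha, mu As Double             ' learning rate and momentum term

Private InputNeuron(2) As Double        ' input layer values
Private hweight(2, 2) As Double         ' input-to-hidden weights
Private hweight2(2, 2) As Double        ' weights from the previous iteration
Private dhweight(2, 2) As Double        ' weight adjustments
Private hbias(2) As Double              ' hidden layer biases
Private hbias2(2) As Double
Private dhbias(2) As Double
Private hin(2) As Double                ' hidden neuron inputs
Private hout(2) As Double               ' hidden neuron outputs
Private hdelta As Double                ' hidden neuron error term

Private oweight(2) As Double            ' hidden-to-output weights
Private oweight2(2) As Double
Private doweight(2) As Double
Private obias, obias2, dobias As Double ' output neuron bias values
Private oin, oout, odelta As Double     ' output neuron input, output, error term

Private X1(7, 3) As Double              ' eight cases: three inputs plus a target
Private targval(7) As Double            ' target value for each case
```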
At this point the architecture of the neural network has been laid out and the strength of the interconnections initialized, but you still lack a means of transferring data between layers. The first step is to establish a way to transfer the input data from the input neurons to the hidden layer neurons. When coding, it's important to remember that each hidden layer neuron is connected to every input neuron and that each individual hidden layer neuron will receive the sum of the weighted input neuron values. Applying the bias of this hidden layer neuron then scales this summation. Thus, for each individual neuron the process should proceed according to the formula

$$\mathit{hin}_j = b_j + \sum_{i=1}^{n} X_i\, w_{ij}$$

where Xi represents the input neuron values, wij represents the weight connecting input neuron i to hidden layer neuron j, and bj is the bias of hidden layer neuron j. In other words, the input value to a node is the bias for that node added to the sum of each input interconnection, where the value of an interconnection is the input neuron's value multiplied by the weight of the connection. You can see this in the code that follows. The subroutine in this code snippet contains two nested loops. The inner loop of the HiddenInput subroutine carries out the summation process for each individual hidden layer neuron, as shown in the equation. The outer loop ensures that this process is repeated for each hidden layer neuron in the neural network:
Private Sub HiddenInput(ByVal n As Integer, ByVal m As Integer)
    Dim I, J As Integer
    Dim sum As Double
    For J = 0 To m - 1
        ' Weighted sum of all input neuron values for hidden neuron J
        sum = 0
        For I = 0 To n - 1
            sum = sum + (InputNeuron(I) * hweight(I, J))
        Next I
        hin(J) = hbias(J) + sum
    Next J
End Sub
Once the hidden layer neuron input values have been determined, it is time for each hidden layer neuron to process its input by passing the value through its transfer function. Hidden layer transfer functions are typically bipolar sigmoid functions that control the excitation state, or value, of the neuron. A bipolar sigmoid will generally yield an output that approaches 1 or –1, although the sigmoid of the output neuron can be scaled to yield a range of output values that is appropriate for the given application. Typically, the equation for this type of sigmoid is as follows:

$$f(x) = \frac{2}{1 + e^{-x}} - 1$$

where x is the neuron's scaled input. Here is how you accomplish this programmatically:
Private Sub HiddenTransfer(ByVal m As Integer)
    Dim J As Integer
    For J = 0 To m - 1
        hout(J) = Trans(hin(J))
    Next J
End Sub

Private Function Trans(ByVal Val As Double) As Double
    ' Bipolar sigmoid: output lies between -1 and 1
    Dim f As Double
    f = (2 / (1 + (System.Math.Exp(-Val)))) - 1
    Trans = f
End Function
The HiddenTransfer subroutine ensures that the scaled input of each hidden layer neuron is processed, while the Trans function encodes the actual bipolar sigmoid. The hout values yielded by this code now need to be passed over the next layer of weighted connections to the output neuron.
This process is almost identical to the process used to transfer values from the input neurons to the hidden layer neurons, although these connections have their own unique set of weights and the output neuron has its own unique bias. You can accomplish this data transfer with the following code:
Private Sub OutputInput(ByVal m As Integer)
    Dim J As Integer
    Dim sum As Double
    ' Weighted sum of all hidden neuron outputs for the single output neuron
    sum = 0
    For J = 0 To m - 1
        sum = sum + (hout(J) * oweight(J))
    Next J
    oin = obias + sum
End Sub
In this segment of code, the nested loop structures that were used before aren't necessary since there is only a single output neuron.
The output neuron also possesses a transfer function just like the hidden layer neurons. When writing a transfer function for your output neuron, it is important to consider the range of output values over which you want your network to make predictions, and scale the bipolar sigmoid accordingly. Some networks employ a more linear function as a transfer function to encompass a wider range of possible output values. In this case, the doctors just want to know whether their patient is at increased or decreased risk, so stick with the same bipolar sigmoid already used and just consider a value of –1 to represent decreased risk and a value of 1 to represent increased risk. Since you are choosing to use the same transfer function, just add the code that will pass the output neuron's scaled input into this transfer function.
Private Sub OutputTransfer()
    oout = Trans(oin)
End Sub
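If an application did need a different output range, one way to scale the bipolar sigmoid is a simple linear mapping from (–1, 1) onto the desired interval. The sketch below is illustrative only; ScaledTrans and its lo and hi parameters are hypothetical names, not part of the article's code.

```vb
Imports System

Module ScaledTransferSketch
    ' Maps the bipolar sigmoid's (-1, 1) output onto the interval (lo, hi).
    Function ScaledTrans(ByVal x As Double, ByVal lo As Double, _
                         ByVal hi As Double) As Double
        Dim f As Double = (2 / (1 + Math.Exp(-x))) - 1
        Return lo + ((f + 1) / 2) * (hi - lo)
    End Function
End Module
```

For example, ScaledTrans(oin, 0, 100) would report the output on a 0 to 100 scale. Note that if the output neuron were trained with a scaled transfer function, the derivative used during training would pick up the same factor of (hi - lo) / 2.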
Back-Propagation of Error with Momentum
Now the architecture of the neural network is laid out, but it still lacks the ability to learn. The next step is to develop a method of adjusting the weights and biases so that over time the neural network will be able to accurately predict disease based on the input variables that it is presented. This is accomplished through a process known as back-propagation of error, which utilizes a gradient descent algorithm (a form of hill climbing) that seeks to minimize the error of the values that are output from the neural network.
The first step in this process is to take an output computed by the neural network for a given pattern and compare it to a corresponding target value. Target values are known outcomes for given patterns; they are used as part of the training process so that the network can learn which patterns can be associated with which output values. An error information term, delta (δ), is then calculated by multiplying the difference between these two terms by the derivative of the activation function.
This error term is then used to compute a weight adjustment term as well as a bias adjustment term. The computation of these terms, however, also requires two additional terms to be taken into account. One term is the learning rate (alpha), which limits the size of a weight/bias adjustment step in a single training iteration. The smaller the value of alpha, the longer a network will take to train. If alpha is too large, however, the network may never reach a reasonable solution to the problem; the large step size will result in the algorithm making the network step over the set of weights and biases where the error is minimized.
The simplest way to determine the proper value for a learning rate is trial and error during the training process. Additionally, it may be advantageous to have a value for alpha that is not constant, but rather that adapts as training progresses (for example, large at first to improve speed and then smaller later to improve accuracy). Thus one possible improvement to the method being presented here would be to make alpha an adaptive value by employing the delta-bar-delta rule in which past error values can be used to make educated guesses about future calculated error values. With these rough estimates, the system can make more informed choices when adjusting weights.
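The delta-bar-delta rule is beyond the scope of this example, but the simpler idea of a learning rate that starts large and shrinks over time can be sketched with a fixed decay schedule. This is a stand-in for illustration, not the delta-bar-delta rule; the function name and the decay parameter are assumptions.

```vb
Module LearningRateSketch
    ' Returns a learning rate that starts at alpha0 and shrinks as the
    ' epoch count grows (a simple inverse-time decay schedule).
    Function DecayedAlpha(ByVal alpha0 As Double, ByVal decay As Double, _
                          ByVal epoch As Integer) As Double
        Return alpha0 / (1 + decay * epoch)
    End Function
End Module
```

A training loop could then recompute alpha at the start of each epoch, for example alpha = DecayedAlpha(0.3, 0.01, epoch), where epoch is the loop's iteration counter.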
The second term is mu (µ), the momentum term. Momentum is an addition to the weight adjustment equation that enables the weight to change in response to both the current gradient step and the previous one. It allows the network to find a reasonable solution in fewer training iterations: when the current step and the previous step are in agreement, the effective step size is larger. Momentum also reduces the effects of anomalous data, since in that case the momentum-dictated change will oppose the learning rate-dictated change. The weight and bias update equations including the momentum terms are as follows:

$$w_{jk}(t+1) = w_{jk}(t) + \alpha\,\delta_k\,z_j + \mu\,[\,w_{jk}(t) - w_{jk}(t-1)\,]$$

$$b_k(t+1) = b_k(t) + \alpha\,\delta_k + \mu\,[\,b_k(t) - b_k(t-1)\,]$$

where wjk is the weight connecting hidden layer neuron j to the output neuron, bk is the output neuron's bias, δk is the output neuron's error term, zj is the output of hidden layer neuron j, t represents the current set of weights/biases, t-1 the previous set, and t+1 the new set being calculated. The corresponding code is found in Figure 3.
Figure 3 UpdateOut Subroutine
Private Sub UpdateOut(ByVal I As Integer, ByVal m As Integer)
    Dim J As Integer
    ' Error term for the output neuron
    odelta = dtrans(oin) * (targval(I) - oout)
    For J = 0 To m - 1
        doweight(J) = (alpha * odelta * hout(J)) + _
            (mu * (oweight(J) - oweight2(J)))
        oweight2(J) = oweight(J)
        oweight(J) = oweight(J) + doweight(J)
    Next J
    dobias = (alpha * odelta) + (mu * (obias - obias2))
    obias2 = obias
    obias = obias + dobias
End Sub
The process here is to first determine the value of odelta, and then enter a loop structure, which will update all of the weighted interconnections between the output neuron and the hidden layer neurons. Before this update occurs, the previous oweight values are shifted to the oweight2 array. This allows you to keep track of past weights and effectively utilize momentum. A similar update procedure is then carried out for the bias value.
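The dtrans function called in Figure 3 is not listed in the article text. A minimal sketch, assuming it is simply the derivative of the bipolar sigmoid used by Trans, follows. For f(x) = 2 / (1 + e^(-x)) - 1 the derivative can be written in terms of f itself as 0.5 × (1 + f(x)) × (1 - f(x)). The module name is hypothetical; because Visual Basic is not case sensitive, DTrans here matches the dtrans calls in the figures.

```vb
Imports System

Module TransferSketch
    ' Bipolar sigmoid, same formula as the article's Trans function.
    Function Trans(ByVal x As Double) As Double
        Return (2 / (1 + Math.Exp(-x))) - 1
    End Function

    ' Derivative of the bipolar sigmoid: f'(x) = 0.5 * (1 + f(x)) * (1 - f(x)).
    Function DTrans(ByVal x As Double) As Double
        Dim f As Double = Trans(x)
        Return 0.5 * (1 + f) * (1 - f)
    End Function
End Module
```

The derivative is largest (0.5) when the neuron's input is 0 and falls toward 0 as the output saturates near 1 or –1, so a saturated neuron receives only small weight adjustments.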
The UpdateOut subroutine back-propagates the error to the interconnections present between the output and hidden layer neurons, but a procedure is still needed to do the same for the interconnections between the hidden layer and the input neurons. The procedure for updating the weights is similar to that used for the weights of the connections between the output and hidden layers, with the difference being that there are no target values that can be used to calculate the error of each neuron. Instead, you can calculate the error term of each hidden layer neuron using the value of odelta multiplied by the weight of the connection between the current hidden layer neuron and the output neuron. This allows for the distribution of the error of the output unit back to all units within the hidden layer. From this point onward, the procedure is the same as the previous UpdateOut method (see Figure 4).
Figure 4 UpdateHidden Subroutine
Private Sub UpdateHidden(ByVal n As Integer, ByVal m As Integer)
    Dim I, J As Integer
    For J = 0 To m - 1
        ' Error term for hidden neuron J, distributed back from the output neuron
        hdelta = (odelta * oweight(J)) * dtrans(hin(J))
        For I = 0 To n - 1
            dhweight(I, J) = (alpha * hdelta * InputNeuron(I)) + _
                (mu * (hweight(I, J) - hweight2(I, J)))
            hweight2(I, J) = hweight(I, J)
            hweight(I, J) = hweight(I, J) + dhweight(I, J)
        Next I
        dhbias(J) = (alpha * hdelta) + (mu * (hbias(J) - hbias2(J)))
        hbias2(J) = hbias(J)
        hbias(J) = hbias(J) + dhbias(J)
    Next J
End Sub
The only notable difference between the UpdateHidden subroutine and the UpdateOut subroutine is that since there can be multiple hidden layer neurons as well as multiple input neurons, you need to utilize some additional loops to ensure that you update every weight and bias appropriately.
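Written out as equations, the hidden layer update in Figure 4 can be summarized as follows. The notation is an assumption chosen to mirror the code: δk is odelta, wjk is oweight(J), hin_j is hin(J), Xi is InputNeuron(I), wij is hweight(I, J), bj is hbias(J), and f' is the derivative of the transfer function.

```latex
\begin{aligned}
\delta_j &= \delta_k\, w_{jk}\, f'(\mathit{hin}_j) \\
w_{ij}(t+1) &= w_{ij}(t) + \alpha\,\delta_j\,X_i + \mu\,[\,w_{ij}(t) - w_{ij}(t-1)\,] \\
b_j(t+1) &= b_j(t) + \alpha\,\delta_j + \mu\,[\,b_j(t) - b_j(t-1)\,]
\end{aligned}
```

The first line is the only step that differs from the output layer: the hidden neuron has no target of its own, so its error term is the output neuron's error term weighted by the connection between them.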
Putting the Network to Work
Now that all the functional components of the neural network are laid out, you need to properly utilize these components. It's time to train the neural network using a set of training data. A proper training set will contain a set of input patterns with a corresponding set of target output values.
I use a DataSet to hold values for a patient's change in cholesterol, change in weight, and family history of the disease. For the two change-based values, the neural network will be able to accept floating point or integer values; a positive value indicates an increase in weight or cholesterol and a negative value indicates a decrease. Since family history of the disease is a yes or no situation, the values are –1 if there is no family history or 1 if there is a family history of the disease.
Although there are only three variables in this network, neural networks can successfully process many more variables. This would make the pattern more complex and would likely require a longer training time and/or a larger training set. Long training times, however, are especially problematic with the back-propagation algorithm used here. The code in this article is just a sample to demonstrate neural networks; further reading on more sophisticated techniques should be conducted before production-scale neural networks are considered. A list of useful references can be found in the Suggested References box. Also, you should note that Analysis Services 2005 provides a neural network implementation that you can take advantage of. For more information, see Jamie MacLennan's article in the September 2004 issue of MSDN® Magazine, "SQL Server 2005: Unearth the New Data Mining Features of Analysis Services 2005."
Sufficient training data is also critical to the success of neural networks, since the network will need to use a diversity of possible patterns to be able to process novel patterns. The more complex the pattern, the larger the training set required. The three input values are then followed by a –1 or 1 value that indicates whether the patient is at increased or decreased risk for developing heart disease. Within the training set there is data for eight such patients.
To incorporate training into the application, add a button control to the form and add the code found in Figure 5. In the button's Click event handler, specify a learning rate and momentum term as well as the number of neurons in the input and hidden layers. Next instantiate a new StreamReader, and use the ReadLine method of the StreamReader to read the training data into array X1 from a file entitled SampleData.txt (for simplicity of example I've hardcoded both the path to the file and the number of data elements in the file, but in a real application you would obviously parameterize these input values). Then call the previously coded Init procedure to lay out the architecture of the neural network as well as randomly initialize the weights and biases of the network.
Figure 5 Adding Training
Private Sub Button1_Click(ByVal sender As System.Object, _
        ByVal e As System.EventArgs) Handles Button1.Click
    alpha = 0.3   ' learning rate
    mu = 0.8      ' momentum term
    n = 3         ' input neurons
    m = 3         ' hidden layer neurons
    Dim TrainError As Double
    Dim I, J, K As Integer
    Dim NumCases As Integer = 8
    Dim TrainSR As New IO.StreamReader("C:\NNData.txt")
    ' Each case is n input values followed by one target value, one per line
    For J = 0 To NumCases - 1
        For I = 0 To n
            X1(J, I) = TrainSR.ReadLine
        Next I
        targval(J) = X1(J, n)
    Next J
    TrainSR.Close()
    Init(n, m)
    J = 0
    ' Each pass through this loop is one training epoch
    Do Until J = 1000
        TrainError = 0
        For I = 0 To NumCases - 1
            For K = 0 To n - 1
                InputNeuron(K) = X1(I, K)
            Next
            ' Forward pass
            HiddenInput(n, m)
            HiddenTransfer(m)
            OutputInput(m)
            OutputTransfer()
            ' Back-propagation of error
            UpdateOut(I, m)
            UpdateHidden(n, m)
            Debug.WriteLine(I & " " & oout)
            TrainError = TrainError + (targval(I) - oout) ^ 2
        Next I
        ' Root mean square error over the whole training set
        TrainError = System.Math.Sqrt(TrainError / NumCases)
        If TrainError < 0.01 Then
            Exit Do
        End If
        J = J + 1
    Loop
End Sub
Next, you see a nested loop structure in which the outer loop controls the maximum number of training iterations. The "For I" loop controls the actual training process. A training set input pattern is transferred from the X1 array to the input neurons. The HiddenInput subroutine uses these input neuron values to calculate the input into each hidden layer neuron. The HiddenTransfer subroutine then calculates the outputs of the hidden layer neurons, and the OutputInput subroutine uses these to determine the value that will be sent into the output neuron. The OutputTransfer subroutine then calculates the value output from the neural network. You can use the inserted Debug statement to write the output values to the Output window. Watch how they start off being fairly inaccurate and then increase in accuracy.
Of course, given random initialization of weights and biases, the initial output value is most likely far from the actual target value. So you can call the UpdateOut and UpdateHidden subroutines to update the weights and biases between the output and hidden layer and between the hidden layer and input layer, respectively. At this point the next training pattern is transferred to the input neurons from X1 and the process repeated.
Once all eight training patterns have been used, the first training iteration (or epoch) is complete and the next epoch can begin. Training will continue until the value of J reaches its specified cutoff or the root mean square error stored in the variable TrainError drops below the specified point.
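For reference, the root mean square error that Figure 5 accumulates in TrainError over one epoch can be written as follows, where N is the number of training cases (NumCases), t_p is the target value for pattern p (targval), and o_p is the network's output for that pattern (oout). The symbols are chosen here for illustration.

```latex
E_{\mathrm{RMS}} = \sqrt{\frac{1}{N}\sum_{p=1}^{N}\left(t_p - o_p\right)^2}
```

Because the targets in this example are –1 or 1, an RMS error below the 0.01 cutoff means the outputs are, on average, within about 0.01 of their targets.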
When training a neural network, especially for more complex patterns, it is possible that the first attempt never reasonably approximates the output values for the given input patterns. Then it's time to adjust some parameters. The first things to consider adjusting are the learning rate and the momentum term. Since these values control the degree of weight adjustment with each training step, an inappropriate value could cause the training algorithm to be either unable to converge upon a solution in the number of iterations specified or unable to converge at all. This is the trial-and-error approach.
The second parameter you could consider adjusting is the number of training epochs, since it is possible that the number of iterations allotted was insufficient for the network to converge upon a solution. The remaining parameter that you could consider adjusting is the number of hidden layer neurons, since the more hidden layer neurons within the network, the more sophisticated the internal representations of the network can become. Avoid using more hidden layer neurons than needed, since too many promotes a condition known as over-training.
An over-trained neural network will generally be able to output highly accurate values for the training set input patterns, but it loses the ability to predict novel patterns. In other words, the network is only able to create accurate predictions for sets it is familiar with. Losing the ability to deal with novel patterns greatly diminishes the usefulness of neural networks.
Luckily, over-training can be easily tested for using a set of validation data. Validation data is similar to training data in that you are aware of what the output should be for each input pattern in the set, but it should not repeat patterns contained in the training set. A validation set is basically a set of known unknowns, in that the patterns are novel to the neural network, but you know what the answers should be, and as such can accurately assess the performance of the network. The validation set can then be input into the neural network and the predicted results compared to the expected results. If the results match to within a predetermined degree of accuracy (90 percent, for instance) the neural network can be considered properly trained and used to make predictions for true unknowns (patterns that neither you nor the neural network have seen before). If the network fails to predict the validation set to within the specified degree of accuracy, you can assume that the network has been improperly trained and therefore discard it. A new network can then be trained and validated, and the process repeated until a network that successfully passes validation results.
If a given set of network parameters continues to train successfully but continually fails to validate, then it may be beneficial to try modifying the training parameters somewhat, since the training routine is likely converging on a local minimum rather than the global minimum.
Figure 7 Unsuccessful Validation
Let's assume that you now have a successfully trained neural network. You can begin to examine the validation process by adding a button control and the code in Figure 6 to the application. You can see from the code that the validation procedure has much in common with the training aspect of the neural network. It reads in a set of data, computes an output value for each pattern in the data set, and then compares the output value to a target value. Through this comparison it determines how many members of the validation set were correctly predicted, and then determines if at least two out of three data set members were predicted correctly. If at least two out of three members were correctly predicted, the network is properly validated and can be used for unknown evaluation. If fewer than two members were correctly predicted (see Figure 7), then you'll need to discard this network, retrain another, and repeat the validation process once again.
Figure 6 Validation Procedure
Private Sub Button3_Click(ByVal sender As System.Object, _
        ByVal e As System.EventArgs) Handles Button3.Click
    Dim NumCases As Integer
    NumCases = 3
    ReDim X1(NumCases, n + 1)
    ReDim targval(NumCases)
    Dim I, J, K As Integer
    Dim Correct As Integer
    Dim TrainSR As New IO.StreamReader("C:\NNValid.txt")
    ' Each case is n input values followed by one target value, one per line
    For J = 0 To NumCases - 1
        For I = 0 To n
            X1(J, I) = TrainSR.ReadLine
        Next I
        targval(J) = X1(J, n)
    Next J
    TrainSR.Close()
    For I = 0 To NumCases - 1
        For K = 0 To n - 1
            InputNeuron(K) = X1(I, K)
        Next
        ' Forward pass only; weights and biases are not modified
        HiddenInput(n, m)
        HiddenTransfer(m)
        OutputInput(m)
        OutputTransfer()
        If targval(I) = System.Math.Round(oout) Then
            Correct = Correct + 1
        End If
    Next I
    TextBox5.Text = Correct & " out of " & NumCases & " Match. "
    If Correct >= 2 Then
        TextBox5.Text = TextBox5.Text & "Validation Successful."
    Else
        TextBox5.Text = TextBox5.Text & "Validation Unsuccessful."
    End If
End Sub
In this case I chose to validate by determining the overall number of predictions that are correct, which is an acceptable method for binary outputs. For nonbinary outputs, validation is usually performed by considering the numeric discrepancies between the output values and the target values.
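For nonbinary outputs, one way to consider the numeric discrepancies is to compute the root mean square error over the validation set and compare it to a tolerance. The following is a minimal sketch of that idea; the module, the function names, and the tolerance parameter are assumptions for illustration and do not appear in the article's download.

```vb
Imports System

Module ValidationSketch
    ' Root mean square difference between target and predicted values.
    Function RmsError(ByVal targets() As Double, ByVal outputs() As Double) As Double
        If targets.Length = 0 OrElse targets.Length <> outputs.Length Then
            Throw New ArgumentException("Arrays must be non-empty and equal in length.")
        End If
        Dim sum As Double = 0
        For I As Integer = 0 To targets.Length - 1
            Dim diff As Double = targets(I) - outputs(I)
            sum += diff * diff
        Next I
        Return Math.Sqrt(sum / targets.Length)
    End Function

    ' True when the validation set is predicted to within the given tolerance.
    Function PassesValidation(ByVal targets() As Double, ByVal outputs() As Double, _
                              ByVal tolerance As Double) As Boolean
        Return RmsError(targets, outputs) <= tolerance
    End Function
End Module
```

The appropriate tolerance depends on the range of the output values and on how much error the application can accept.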
Evaluating Unknown Patterns
To make the trained neural network accessible to the physicians, you'll need to add some textboxes for inputting the variables for each patient, as well as a third command button for launching the evaluation of the input pattern. You'll also need to add a fourth textbox that will indicate to the physicians whether the patient is at increased or decreased risk. Add the code found in Figure 8 to the click event of the command button just created. The code in Figure 8 simply reads in the values entered into each of the textboxes and sequentially calls the HiddenInput, HiddenTransfer, OutputInput, and OutputTransfer subroutines to obtain the neural network's prediction for the provided input pattern. A simple conditional then translates the neural network's numeric output into a written statement about the patient's risk (see Figure 9).
Figure 8 Evaluating Risk
Private Sub Button2_Click(ByVal sender As System.Object, _
        ByVal e As System.EventArgs) Handles Button2.Click
    ' Read the three input values entered for the patient
    InputNeuron(0) = TextBox1.Text
    InputNeuron(1) = TextBox2.Text
    InputNeuron(2) = TextBox3.Text
    ' Forward pass through the trained network
    HiddenInput(n, m)
    HiddenTransfer(m)
    OutputInput(m)
    OutputTransfer()
    If oout < 0 Then
        TextBox5.Text = "The patient is at a reduced risk for disease"
    Else
        TextBox5.Text = "The patient is at an increased risk for disease"
    End If
End Sub
Figure 9 Sample Output
It is important to note that in order to make a prediction using a trained neural network, the weights and biases do not need to be modified. If this application were being designed for the real world, it would be beneficial to add code that could save the weights and biases of a trained network to a file. An alternate initialization procedure should also be provided, where rather than randomly initializing values, previously saved weights and biases could be loaded to allow the neural network to immediately make effective predictions without further training.
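One possible shape for such save and load routines is sketched below. It assumes the weights and biases have been copied into a single one-dimensional array in a fixed order, and it writes one value per line, the same layout the training data file uses. SaveValues, LoadValues, and the file layout are assumptions for illustration only.

```vb
Imports System
Imports System.Globalization
Imports System.IO

Module WeightFileSketch
    ' Writes each value on its own line using a culture-independent format.
    Sub SaveValues(ByVal fileName As String, ByVal values() As Double)
        Using sw As New StreamWriter(fileName)
            For Each v As Double In values
                sw.WriteLine(v.ToString("R", CultureInfo.InvariantCulture))
            Next
        End Using
    End Sub

    ' Reads back the given number of values, one per line.
    Function LoadValues(ByVal fileName As String, ByVal count As Integer) As Double()
        Dim values(count - 1) As Double
        Using sr As New StreamReader(fileName)
            For I As Integer = 0 To count - 1
                values(I) = Double.Parse(sr.ReadLine(), CultureInfo.InvariantCulture)
            Next I
        End Using
        Return values
    End Function
End Module
```

For the 3-3-1 network there would be 16 values to store: nine input-to-hidden weights, three hidden biases, three hidden-to-output weights, and one output bias. The loading routine must copy them back into hweight, hbias, oweight, and obias in the same order used when saving.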
I have briefly examined the operations behind one of the most common types of neural networks. Even this simple example shows that neural networks can provide a highly useful methodology for dealing with pattern matching and predictive tasks. Neural networks are a diverse field; in addition to the feed-forward network discussed here, numerous other types of networks can be employed, depending on the task at hand. There are even variants of the feed-forward network, such as networks with multiple hidden layers or networks that also provide direct weighted interconnections between the input and output layers, in addition to the typical hidden layer connections.
Many advances have been made in training algorithms as well. While they all still apply the same basic principles as the back-propagation variant discussed, many of these newer algorithms are able to converge on a solution in far fewer iterations, which can be highly advantageous for patterns with a large number of values.
All in all, the neural network coded in this article only demonstrates a fraction of the power of a modern implementation, but it should have provided you with a glimpse of the evolving and robust arena that is the world of neural networks.
Suggested References
- Bishop, Christopher M. Neural Networks for Pattern Recognition (Oxford University Press, 1995)
- Fausett, Laurene V. Fundamentals of Neural Networks (Prentice Hall, 1994)
- Reed, Russell D. and Marks, Robert J. II. Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks (MIT Press, 1999)
Christopher M. Frenz is a bioinformaticist and uses neural networks to model biological systems. He is the author of Visual Basic and Visual Basic .NET for Scientists and Engineers (Apress, 2002). He can be reached at firstname.lastname@example.org.