NeuralNetworkSolutions.com Homepage

Supervised Learning in Neural Networks

The Delta Rule

To get around the perceptron's limitations, researchers began developing neural networks with an extra hidden layer of neurones between the inputs and outputs. This hidden layer provided a pool of units to help process data. However, the perceptron rule did not easily extend to multiple layers.

An alternative but related approach to the perceptron learning rule is known as the delta rule. While the perceptron training rule is based on the idea of modifying weights according to some fraction of the difference between the output and target, the delta rule is based on the more general idea of "gradient descent". For example, consider the task of training a single threshold logic unit (TLU) with a set of input patterns $p$, each with a desired target output $t_p$. The global error $E$ is a function of the weights $w$; that is, as the weights change, the error changes. The goal is to move in "weight space" down the slope of the error function with respect to each weight, with the size of each move proportional to the magnitude of the slope. How is the slope calculated? Using calculus, the slope for weight $w_i$ may be expressed as the partial derivative of the error with respect to that weight:

$$\frac{\partial E}{\partial w_i}$$

If $e_p$ is the error produced by the network when processing a particular pattern $p$, then the global error $E$ is the mean error over all $N$ patterns in the training set:

$$E = \frac{1}{N} \sum_{p} e_p$$

The simplest way of determining the pattern error $e_p$ is to take the target output $t_p$ minus the actual output $y_p$:

$$e_p = t_p - y_p$$

However, the above equation has several problems. First, the subtraction means that the term may be either positive or negative rather than a simple magnitude, so errors of opposite sign on different patterns could cancel when the mean is taken. This issue is managed by squaring the term:

$$e_p = (t_p - y_p)^2$$

The second problem is more subtle. To perform gradient descent, the error must vary continuously with the weights, but the TLU output $y$ is a step function of the activation and jumps between its two values. This can be remedied by substituting the activation $a$ for the output $y$:

$$e_p = (t_p - a_p)^2$$

When doing this the targets should be defined carefully: if the threshold is set to 0, one target should be positive and the other negative, e.g. -1 and 1.

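The final modification is to divide the entire term by 2, simply to make differentiation easier: the factor of 2 produced by differentiating the square then cancels.

$$e_p = \frac{1}{2} (t_p - a_p)^2$$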
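As a rough illustration of the error measure built up so far, the sketch below computes the pattern error and the global error for a single unit in Python. The function names and the AND training set are assumptions made for this example and do not come from the original text. A constant first input of 1 is used so that the first weight plays the role of the threshold, which keeps the threshold itself at 0.

```python
def activation(weights, inputs):
    """Weighted sum of the inputs (the activation a)."""
    return sum(w * x for w, x in zip(weights, inputs))


def pattern_error(weights, inputs, target):
    """e_p = 1/2 * (t_p - a_p)^2 for a single pattern."""
    a = activation(weights, inputs)
    return 0.5 * (target - a) ** 2


def global_error(weights, patterns):
    """E = mean of e_p over all (inputs, target) pairs."""
    return sum(pattern_error(weights, x, t) for x, t in patterns) / len(patterns)


# Hypothetical training set: logical AND with targets -1 and +1.
# The constant first input of 1.0 lets the first weight act as a bias.
AND_PATTERNS = [
    ((1.0, 0.0, 0.0), -1.0),
    ((1.0, 0.0, 1.0), -1.0),
    ((1.0, 1.0, 0.0), -1.0),
    ((1.0, 1.0, 1.0), 1.0),
]

# With all weights at zero every activation is 0, so each e_p is 0.5.
print(global_error([0.0, 0.0, 0.0], AND_PATTERNS))  # 0.5
```

Because the error is computed from the activation rather than the thresholded output, it changes smoothly as the weights change, which is what gradient descent needs.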

Since $E$ is the mean over all patterns, $\partial E / \partial w_i$ cannot strictly be calculated until the entire set of patterns has been presented. However, this is computationally intensive, so $\partial e_p / \partial w_i$ is usually evaluated for each training pattern individually and used as an approximation:

$$\frac{\partial e_p}{\partial w_i} = -(t_p - a_p)\, x_{pi}$$
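The difference between the full gradient and the per-pattern approximation can be sketched as follows. This assumes the standard result that, for $e_p = \frac{1}{2}(t_p - a_p)^2$, the pattern gradient is $-(t_p - a_p) x_{pi}$. The function names and the AND training set are illustrative assumptions, not part of the original page.

```python
def activation(weights, inputs):
    """Weighted sum of the inputs (the activation a)."""
    return sum(w * x for w, x in zip(weights, inputs))


def pattern_gradient(weights, inputs, target):
    """de_p/dw_i = -(t_p - a_p) * x_pi, one entry per weight."""
    a = activation(weights, inputs)
    return [-(target - a) * x for x in inputs]


def batch_gradient(weights, patterns):
    """dE/dw_i: the mean of the pattern gradients over the whole set."""
    grads = [pattern_gradient(weights, x, t) for x, t in patterns]
    return [sum(g[i] for g in grads) / len(patterns) for i in range(len(weights))]


# Hypothetical training set: logical AND, targets -1 and +1, constant bias input.
AND_PATTERNS = [
    ((1.0, 0.0, 0.0), -1.0),
    ((1.0, 0.0, 1.0), -1.0),
    ((1.0, 1.0, 0.0), -1.0),
    ((1.0, 1.0, 1.0), 1.0),
]

weights = [0.0, 0.0, 0.0]
# The true gradient needs every pattern before a single step can be taken ...
print(batch_gradient(weights, AND_PATTERNS))  # [0.5, 0.0, 0.0]
# ... whereas the approximation is available after just one pattern.
print(pattern_gradient(weights, *AND_PATTERNS[3]))  # [-1.0, -1.0, -1.0]
```

The per-pattern gradient is a noisy estimate of the batch gradient, but averaged over a pass through the training set the two point in the same direction.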

I will not detail the proof for the differentiation, but the equation can be seen to be intuitive. The expression $(t_p - a_p)$ is the error. The term $x_{pi}$ is the $i$-th input for pattern $p$: only active inputs contribute to the activation, so only their weights should be adjusted. This derivative can now be used to determine the modification of the weights, where $\alpha$ is the learning rate:

$$\Delta w_i = \alpha (t_p - a_p)\, x_{pi}$$
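A minimal sketch of training with this update is given below, assuming the weight change for each pattern is the learning rate times $(t_p - a_p)$ times the input. The learning rate of 0.05, the 500 passes through the data, the function names and the AND training set are all assumptions chosen for illustration.

```python
def activation(weights, inputs):
    """Weighted sum of the inputs (the activation a)."""
    return sum(w * x for w, x in zip(weights, inputs))


def train_delta(patterns, rate=0.05, epochs=500):
    """Apply the delta rule pattern by pattern, starting from zero weights."""
    weights = [0.0] * len(patterns[0][0])
    for _ in range(epochs):
        for inputs, target in patterns:
            a = activation(weights, inputs)
            # delta w_i = rate * (t_p - a_p) * x_pi
            weights = [w + rate * (target - a) * x for w, x in zip(weights, inputs)]
    return weights


def tlu_output(weights, inputs):
    """Threshold the activation at 0 to recover the TLU output."""
    return 1 if activation(weights, inputs) >= 0 else -1


# Hypothetical training set: logical AND, targets -1 and +1, constant bias input.
AND_PATTERNS = [
    ((1.0, 0.0, 0.0), -1.0),
    ((1.0, 0.0, 1.0), -1.0),
    ((1.0, 1.0, 0.0), -1.0),
    ((1.0, 1.0, 1.0), 1.0),
]

weights = train_delta(AND_PATTERNS)
print([tlu_output(weights, x) for x, _ in AND_PATTERNS])  # [-1, -1, -1, 1]
```

Training uses the activation, but the trained unit is still read out through the threshold. For this data the weights settle close to the least-squares fit of roughly -1.5 for the bias and 1 for each input.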

Now there is one further change that can be made to the above rule. The first attempt at determining the error $e_p$ used the TLU output $y$. This had to be abandoned and the activation was used instead. However, it is possible to use the output $y$ within the rule if the transfer function is continuous. Traditionally a sigmoid transfer function $\sigma$ has been used for this purpose. This introduces a further term for the derivative of the sigmoid, $d\sigma(a)/da$, which is often written $\sigma'(a)$:

$$\Delta w_i = \alpha\, \sigma'(a_p) (t_p - y_p)\, x_{pi}$$
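The sigmoid form of the rule might be sketched as follows. This assumes the logistic sigmoid, whose derivative can be written as $y(1 - y)$, and because its output lies between 0 and 1 the targets here are 0 and 1 rather than -1 and 1. The learning rate, epoch count, function names and training set are illustrative assumptions.

```python
import math


def sigmoid(a):
    """Logistic sigmoid transfer function."""
    return 1.0 / (1.0 + math.exp(-a))


def unit_output(weights, inputs):
    """Output y of the unit: the sigmoid of the weighted sum of the inputs."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)))


def train_sigmoid_delta(patterns, rate=0.5, epochs=5000):
    """Delta rule using the sigmoid output y instead of the activation a."""
    weights = [0.0] * len(patterns[0][0])
    for _ in range(epochs):
        for inputs, target in patterns:
            y = unit_output(weights, inputs)
            slope = y * (1.0 - y)  # sigma'(a) for the logistic sigmoid
            # delta w_i = rate * sigma'(a_p) * (t_p - y_p) * x_pi
            weights = [w + rate * slope * (target - y) * x
                       for w, x in zip(weights, inputs)]
    return weights


# Hypothetical training set: logical AND, targets 0 and 1, constant bias input.
AND_PATTERNS_01 = [
    ((1.0, 0.0, 0.0), 0.0),
    ((1.0, 0.0, 1.0), 0.0),
    ((1.0, 1.0, 0.0), 0.0),
    ((1.0, 1.0, 1.0), 1.0),
]

weights = train_sigmoid_delta(AND_PATTERNS_01)
for inputs, target in AND_PATTERNS_01:
    print(inputs[1:], target, round(unit_output(weights, inputs), 3))
```

The slope term is largest when the activation is near 0 and shrinks as the unit saturates, so weights change most while the unit is still undecided about a pattern.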

The above equation is very similar to the perceptron rule. However, there are some differences. Unlike the perceptron rule, which stops modifying weights once every pattern is classified correctly, the delta rule continues to modify weights for as long as the output differs from the target. Also, if no solution to a problem exists, the perceptron's weights never settle and continue to oscillate, whereas a neurone using the delta rule converges (given a sufficiently small learning rate) on the weights that minimise the error. The most important difference is that the delta rule, unlike the perceptron rule, may be generalised to hidden layers.
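The first of these differences can be seen in a small experiment. The sketch below trains one unit with each rule on the same data and reports how much the weights moved during the final pass. It is an illustration under assumed settings (the AND training set, the learning rates, the epoch counts and the function names are not from the original text), not a general proof.

```python
def activation(weights, inputs):
    """Weighted sum of the inputs (the activation a)."""
    return sum(w * x for w, x in zip(weights, inputs))


def train(patterns, use_threshold, rate, epochs):
    """Return (weights, total absolute weight change during the final epoch).

    use_threshold=True  -> perceptron rule: error uses the TLU output y
    use_threshold=False -> delta rule: error uses the activation a
    """
    weights = [0.0] * len(patterns[0][0])
    last_change = 0.0
    for _ in range(epochs):
        last_change = 0.0
        for inputs, target in patterns:
            a = activation(weights, inputs)
            response = (1.0 if a >= 0 else -1.0) if use_threshold else a
            step = [rate * (target - response) * x for x in inputs]
            weights = [w + s for w, s in zip(weights, step)]
            last_change += sum(abs(s) for s in step)
    return weights, last_change


# Hypothetical training set: logical AND, targets -1 and +1, constant bias input.
AND_PATTERNS = [
    ((1.0, 0.0, 0.0), -1.0),
    ((1.0, 0.0, 1.0), -1.0),
    ((1.0, 1.0, 0.0), -1.0),
    ((1.0, 1.0, 1.0), 1.0),
]

p_weights, p_change = train(AND_PATTERNS, use_threshold=True, rate=0.5, epochs=20)
d_weights, d_change = train(AND_PATTERNS, use_threshold=False, rate=0.05, epochs=500)

print(p_weights, p_change)  # [-3.0, 2.0, 1.0] 0.0
print(d_change > 0)         # True
```

Once the perceptron classifies every pattern correctly its error term is exactly zero and the weights freeze. The delta rule's error term is the gap between target and activation, which is rarely exactly zero, so small adjustments continue on every pattern.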


© 2008 Marcus bros

