Next, the input-to-hidden weight gradients and the hidden bias gradients are calculated:
for (int i = 0; i < numInput; ++i)
  for (int j = 0; j < numHidden; ++j)
    ihGrads[i][j] = hSignals[j] * inputs[i];

for (int j = 0; j < numHidden; ++j)
  hbGrads[j] = hSignals[j] * 1.0;
As before, the dummy 1.0 input value used for the bias gradients can be dropped if you wish. After all gradients have been calculated and stored, method Train updates all weights and bias values, using the gradients. Unlike gradients, where output gradients must be calculated before hidden gradients, weights and biases can be updated in any order. First, the input-to-hidden weights are updated:
for (int i = 0; i < numInput; ++i)
{
  for (int j = 0; j < numHidden; ++j)
  {
    double delta = ihGrads[i][j] * learnRate;
    ihWeights[i][j] += delta;
    ihWeights[i][j] += ihPrevWeightsDelta[i][j] * momentum;
    ihPrevWeightsDelta[i][j] = delta; // save
  }
}
Here, the statement computing the value of variable delta corresponds to equation 8 in Figure 3. That delta value is saved for use in the next iteration to implement the momentum mechanism. Notice that all weights and bias values are updated after the gradients have been calculated for a single training data item. In principle, gradients should be calculated by accumulating error signals over all training items. But updating weights and biases after each training data item essentially estimates the overall gradients and is more efficient. This approach is usually called "online" or "stochastic" training. The alternative is usually called "batch" training.
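For comparison, here is a minimal sketch of what batch-style accumulation might look like for just the input-to-hidden gradients. The accumulator array ihGradsAcc and the iteration over trainData are illustrative assumptions, not code from the demo program:

// sketch of batch-style training for the input-to-hidden weights
// ihGradsAcc is a hypothetical accumulator array, zeroed before each epoch
foreach (double[] trainItem in trainData)
{
  // ... set inputs and targets from trainItem, compute outputs and hSignals ...
  for (int i = 0; i < numInput; ++i)
    for (int j = 0; j < numHidden; ++j)
      ihGradsAcc[i][j] += hSignals[j] * inputs[i]; // accumulate over all items
}
for (int i = 0; i < numInput; ++i)
  for (int j = 0; j < numHidden; ++j)
    ihWeights[i][j] += ihGradsAcc[i][j] * learnRate; // one update per epoch

With batch training, each weight is updated only once per pass through the training data, using the accumulated gradient, rather than once per training item.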
Next, hidden node biases are updated:
for (int j = 0; j < numHidden; ++j)
{
  double delta = hbGrads[j] * learnRate;
  hBiases[j] += delta;
  hBiases[j] += hPrevBiasesDelta[j] * momentum;
  hPrevBiasesDelta[j] = delta;
}
Next, hidden-to-output weights are updated:
for (int j = 0; j < numHidden; ++j)
{
  for (int k = 0; k < numOutput; ++k)
  {
    double delta = hoGrads[j][k] * learnRate;
    hoWeights[j][k] += delta;
    hoWeights[j][k] += hoPrevWeightsDelta[j][k] * momentum;
    hoPrevWeightsDelta[j][k] = delta;
  }
}
The each-data-item loop and the main training loop finish after the output node biases are updated:
for (int k = 0; k < numOutput; ++k)
{
  double delta = obGrads[k] * learnRate;
  oBiases[k] += delta;
  oBiases[k] += oPrevBiasesDelta[k] * momentum;
  oPrevBiasesDelta[k] = delta;
}
} // Each training item
} // While
At this point, the best weights and bias values are stored in the neural network object. Method Train concludes by fetching those values and returning them:
double[] bestWts = GetWeights();
return bestWts;
} // Train
Here, a serialized copy of the internal weights and bias values is created using class method GetWeights. An alternative is to refactor method Train to return void, and then call GetWeights as needed after training.
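Method GetWeights isn't shown in this part of the article. A minimal sketch of one possible implementation follows, using the same fields as the code above; the ordering (input-to-hidden weights, then hidden biases, then hidden-to-output weights, then output biases) is an assumption for illustration:

public double[] GetWeights()
{
  // copy all weights and biases into a single flat array
  int numWts = (numInput * numHidden) + numHidden +
    (numHidden * numOutput) + numOutput;
  double[] result = new double[numWts];
  int p = 0; // pointer into result
  for (int i = 0; i < numInput; ++i)
    for (int j = 0; j < numHidden; ++j)
      result[p++] = ihWeights[i][j];
  for (int j = 0; j < numHidden; ++j)
    result[p++] = hBiases[j];
  for (int j = 0; j < numHidden; ++j)
    for (int k = 0; k < numOutput; ++k)
      result[p++] = hoWeights[j][k];
  for (int k = 0; k < numOutput; ++k)
    result[p++] = oBiases[k];
  return result;
}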
Wrapping Up
There are several ways you can modify the back-propagation code presented in this article. The version of back-propagation presented here is based on a mathematical assumption about how training error is computed. Specifically, the equations in Figure 3 assume that the goal of training is to find weights and bias values that minimize mean squared error between computed and target output values. There is an alternative form of error, called cross entropy error, which leads to a slightly different version of back-propagation.
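For example, with softmax output activation and cross-entropy error, the calculation of the output node signals simplifies because the derivative term cancels out. A minimal sketch, assuming the target values for the current training item are stored in an array named targets (the demo program may use a different name):

// sketch: cross-entropy error with softmax output activation
// the (1 - output) * output derivative term cancels, leaving just the difference
for (int k = 0; k < numOutput; ++k)
  oSignals[k] = targets[k] - outputs[k];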
Another modification is to use weight decay, also called regularization. The idea here is to use a form of error that penalizes large weight values, which in turn helps prevent model over-fitting where the model predicts very well on the training data, but predicts poorly when presented with new data.
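One common form is L2 weight decay, where each weight is shrunk slightly toward zero on every update. A minimal sketch applied to the input-to-hidden update, where the weightDecay constant is an illustrative assumption rather than part of the demo program:

// sketch of L2 weight decay added to the input-to-hidden weight update
double weightDecay = 0.0001; // hypothetical small decay constant
for (int i = 0; i < numInput; ++i)
{
  for (int j = 0; j < numHidden; ++j)
  {
    double delta = ihGrads[i][j] * learnRate;
    ihWeights[i][j] += delta;
    ihWeights[i][j] += ihPrevWeightsDelta[i][j] * momentum;
    ihWeights[i][j] -= ihWeights[i][j] * weightDecay; // shrink toward zero
    ihPrevWeightsDelta[i][j] = delta;
  }
}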
Website: https://visualstudiomagazine.com/Articles/2015/04/01/Back-Propagation-Using-C.aspx?m=1&Page=3
About the Author:
Dr. James McCaffrey works for Microsoft Research in Redmond, WA. James has worked on several key Microsoft products such as Internet Explorer and Bing. James can be reached at [email protected]