I always think of backpropagation in terms of the chain rule of differentiation. A neural network can be written as a composition f(g(h(...(input)...))), where each function is one layer of the network. Training minimizes an error function, which is itself a function of the difference between the predicted and actual values, so we end up differentiating a function of functions, and that is exactly what the chain rule does.
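To make the idea concrete, here is a minimal sketch on scalars. The three "layers" h, g, f, the squared-error loss, and the helper names are all made up for illustration, and the derivative is taken with respect to the input x as a stand-in for a weight. It multiplies the local derivatives from the output back to the input and compares the result with a finite-difference estimate.

```python
import math

# Hypothetical three-"layer" network on scalars: y = f(g(h(x)))
def h(x):
    return 3.0 * x            # h'(x) = 3

def g(u):
    return math.tanh(u)       # g'(u) = 1 - tanh(u)^2

def f(v):
    return v * v              # f'(v) = 2v

def loss(x, target):
    # Assumed error function: half the squared (predicted - actual)
    return 0.5 * (f(g(h(x))) - target) ** 2

def loss_grad(x, target):
    # Forward pass, keeping the intermediate values
    u = h(x)
    v = g(u)
    y = f(v)
    # Backward pass: one local derivative per function in the chain
    dL_dy = y - target
    dy_dv = 2.0 * v
    dv_du = 1.0 - math.tanh(u) ** 2
    du_dx = 3.0
    # Chain rule: dL/dx = dL/dy * dy/dv * dv/du * du/dx
    return dL_dy * dy_dv * dv_du * du_dx

def numeric_grad(x, target, eps=1e-6):
    # Central finite difference, used only as a sanity check
    return (loss(x + eps, target) - loss(x - eps, target)) / (2 * eps)

if __name__ == "__main__":
    print(loss_grad(0.2, 1.0), numeric_grad(0.2, 1.0))
```

The two printed numbers should agree to several decimal places. In a real network the same product is formed with matrices and taken with respect to each layer's weights, but the structure is the same.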