Great post as an introduction to computer vision.
I just want to clarify one thing: the reason this algorithm and other simple computer vision algorithms use gray-scale is that it's easier to work with.
When you have a gray-scale image, you can easily represent it as a single 2-dimensional array of numbers, one per pixel, where each number is that pixel's brightness (for example, 0 is black and 1 is white).
When you have a color image, the most common representation is "RGB" (Red-Green-Blue), which needs 3 different 2-dimensional arrays, one per color, each holding the amount of that color for every pixel of the image. That complicates things a lot.
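To make the difference concrete, here is a rough sketch in plain Python of the two representations on a tiny 2x2 image. The pixel values and the `rgb_to_gray` helper are made up for illustration, and averaging the three channels is just one simple way to get gray-scale, not necessarily what the post's algorithm does.

```python
# Gray-scale: one 2-D grid, one brightness value per pixel
# (0.0 = black, 1.0 = white).
gray = [
    [0.0, 0.5],
    [1.0, 0.25],
]

# RGB: three 2-D grids of the same size, one per color channel.
red   = [[1.0, 0.0], [0.0, 0.5]]
green = [[0.0, 1.0], [0.0, 0.5]]
blue  = [[0.0, 0.0], [1.0, 0.5]]


def rgb_to_gray(red, green, blue):
    """Collapse three channels into one grid by averaging them per pixel.

    Hypothetical helper: a plain average is an assumption made here for
    simplicity; weighted formulas are also common.
    """
    return [
        [(r + g + b) / 3 for r, g, b in zip(r_row, g_row, b_row)]
        for r_row, g_row, b_row in zip(red, green, blue)
    ]


# After conversion there is a single grid to loop over instead of three.
converted = rgb_to_gray(red, green, blue)
```

With the gray-scale version, any later step only has to read one number per pixel instead of keeping three grids in sync.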
But that doesn't mean all algorithms see images in gray-scale. There are algorithms that use color, but when possible, we try not to over-complicate things.