Machine Vision (MV) is how computerized systems, whether robots or intelligent software, see and interpret images. It is how driverless vehicles, drones, face ID systems, factory robots, rovers, and other AI systems identify objects in their environment. Human vision differs from computer vision, of course, mainly in how the image is formed and how what we see is identified: we are organic, computers are electronic. Driving around, say, Hollywood illustrates the complexity involved. There are countless objects to identify as we move, and we can do it in split seconds.

Our eyes function like a camera, capturing an image and focusing on what we see. Human vision has an estimated resolution of about 576 MP, roughly 18,000 pixels on the vertical, which is so high that we see no pixelation. No camera has a resolution that comes even close to the human eye, at least not at the moment. Canon has created a 250 MP CMOS sensor prototype, the closest so far (as of this writing, October 2017). The eye can also process up to a theoretical limit of about 1,000 fps, though this varies per person depending on eye and brain coordination. People with good eyesight have refresh rates between 200 and 400 Hz, while a UHD TV can hit a 240 Hz refresh rate.

So in order to see images, a computer needs a camera and display, a powerful CPU and GPU (like the brain), and AI software to identify images and objects. Though it does not come close to human vision, it can be optimized through machine learning by feeding the software a large repository of images from which the computer can make identifications. Much like the human brain, this requires a large memory subsystem as well.
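The arithmetic behind those resolution figures is easy to check. A short sketch (the percentage comparison is mine, not from the article):

```python
# The 576 MP estimate for human vision, at roughly 18,000 pixels of
# vertical resolution, implies a horizontal resolution of:
human_mp = 576_000_000
vertical_px = 18_000
horizontal_px = human_mp // vertical_px
print(horizontal_px)  # 32000

# Canon's 250 MP prototype sensor, by comparison:
canon_mp = 250_000_000
print(round(canon_mp / human_mp * 100))  # 43 (percent of the human estimate)
```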
![MV.jpeg](https://images.hive.blog/768x0/https://steemitimages.com/DQmWNBsvWQ7PfBGkcgioT7aBd1bmP1JtMBWYorvyQwU8f5t/MV.jpeg)
The steps involved in MV, summarized into three common stages, are the following:
Image Acquisition — This requires the proper equipment to capture an image and store it in a digital format. The most common device used is the camera. The image is stored in a digital format like uncompressed RAW, or in a lossy compressed format like JPEG. The image resolution is determined by the camera's sensor. Once in digital format, the image is translated into pixels: the greater the number of pixels, the higher the resolution, and the more detail and quality it holds.
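A minimal sketch of what acquisition boils down to: a digitized image is just a grid of pixel values whose dimensions set the resolution (the 12 MP sensor size below is illustrative, not from the article):

```python
# A typical 12 MP sensor produces a 4000 x 3000 grid of pixels.
width, height = 4000, 3000
megapixels = width * height / 1e6
print(megapixels)  # 12.0

# A grayscale frame can be modeled as rows of intensity values (0-255);
# a color frame would hold an (R, G, B) triple per pixel instead.
frame = [[0] * width for _ in range(height)]
print(len(frame), len(frame[0]))  # 3000 4000
```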
Image Processing — This is the lengthy process of taking the captured image and applying different methods to extract its contents for further analysis. Processing the image involves pixel counting, filtering, combining 2D or 3D images using stitching methods, segmentation analysis, edge detection, color analysis, blob detection and extraction, and pattern recognition. If there is text involved, OCR is performed. At this point the software will begin to detect and isolate the objects in an image, depending on what it was programmed to identify or find.
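To make one of these steps concrete, here is a minimal, pure-Python sketch of edge detection using the Sobel operator (a standard technique; real MV systems would use an optimized library). A tiny synthetic grayscale image containing a bright square is scanned, and pixels with a high gradient magnitude are marked as edges:

```python
def sobel_edges(img, threshold=100):
    """Return a binary edge map for a 2D list of 0-255 intensities."""
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal and vertical Sobel gradients.
            gx = (img[y-1][x+1] + 2*img[y][x+1] + img[y+1][x+1]
                  - img[y-1][x-1] - 2*img[y][x-1] - img[y+1][x-1])
            gy = (img[y+1][x-1] + 2*img[y+1][x] + img[y+1][x+1]
                  - img[y-1][x-1] - 2*img[y-1][x] - img[y-1][x+1])
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                edges[y][x] = 1
    return edges

# An 8x8 black image with a white 4x4 square in the middle.
img = [[255 if 2 <= y <= 5 and 2 <= x <= 5 else 0 for x in range(8)]
       for y in range(8)]
edge_map = sobel_edges(img)
print(sum(sum(row) for row in edge_map))  # count of detected edge pixels
```

The border of the square lights up in the edge map while the uniform interior stays at zero, which is exactly the isolation of object outlines described above.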
Output Result — The processed data is then compared with target values for pass/fail decisions. If the detection was looking for certain content in an image, say a license plate number, it will be able to identify it and separate it from the rest of the image. In automated environments, MV can be used to inspect machine parts, check for defects in manufactured parts, and inspect food packaging, among other tasks. Whereas most image processing produces another image as output, in MV the output is an action triggered by a decision. In retail, MV can be used to identify the label and price of items captured from a smartphone using a smart app. Much like Google's Cloud Vision API, MV systems analyze an image; however, MV requires further steps that deal with real-world applications. This will be used in autonomous vehicles or self-driving cars, as well as in industrial and utility robots. We surely would not want a security robot to report every person it sees walking the premises as a trespasser, so MV will be used with advanced AI software to determine what constitutes a threat.
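The output stage above can be sketched in a few lines: a processed result is compared against target values and an action is triggered. The tolerance, plate list, and action names below are illustrative assumptions, not from the article:

```python
def inspect_part(measured_mm, target_mm, tolerance_mm=0.5):
    """Pass/fail decision for a measured dimension of a manufactured part."""
    return "PASS" if abs(measured_mm - target_mm) <= tolerance_mm else "FAIL"

def check_plate(detected_plate, authorized_plates):
    """Trigger a gate action based on a recognized license plate."""
    if detected_plate in authorized_plates:
        return "open_gate"
    return "alert_operator"

print(inspect_part(25.3, 25.0))                     # PASS
print(inspect_part(26.1, 25.0))                     # FAIL
print(check_plate("ABC123", {"ABC123", "XYZ789"}))  # open_gate
```

The point is that the end product of the pipeline is a decision (open a gate, reject a part), not another image.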
As the world moves more toward automating repetitive tasks, MV will play a big role in the process. Software designed with MV, ML, and AI can also handle other tasks that would otherwise require a human operator. This will also find application in the IoT industry, particularly in smart home security systems. The emerging field of AR will also get a boost from better MV that can help users identify and study objects, enhancing learning in training environments. MV-enabled robots can collaborate alongside human workers to accomplish tasks, thereby increasing productivity. Its use in autonomous vehicles will help address safety concerns as the systems become more agile at identifying their surroundings in order to avoid accidents. Its significance is particularly felt in the automotive and healthcare industries. As MV gets better with time and trials, we should expect it to become an integral discipline within AI.
Always loved AI and Machine learning.
I guess this is the technology Google is using to develop its Google Lens that can recognise images.
Yes, it was the successor to Google Goggles.