The ability of a system or software to identify objects, people, places, and actions in images. It uses machine vision technologies with artificial intelligence and trained algorithms to recognize images through a camera system.
Recent advancements in artificial intelligence and machine learning have contributed to the growth of computer vision and image recognition.
From controlling a driverless car to performing face detection for biometric access, image recognition helps process and categorize objects based on trained algorithms.
Humans can clearly recognize and distinguish different features of objects because our brains have been unconsciously trained on similar sets of images, developing the ability to differentiate between things effortlessly.
Unlike the human brain, a computer views visuals as an array of numerical values and looks for patterns in the digital image, be it a still, a video frame, a graphic, or a live feed, to recognize and distinguish key features of the image.
The rising adoption of Industry 4.0 and automation in manufacturing industries has further stimulated the demand for computer vision.
Many organizations are investing in image recognition to interpret and analyze data coming primarily from visual sources for a number of uses, such as medical image analysis, identifying objects in autonomous cars, and face detection for security purposes.
Digital image
A matrix of numerical values representing the data associated with the pixels of the image.
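As a minimal illustration (plain Python, no image libraries), a tiny grayscale image is just such a matrix, and pixel access is matrix indexing:

```python
# A 3x3 grayscale "image": each entry is a pixel intensity in [0, 255].
image = [
    [  0, 128, 255],
    [ 64, 192,  32],
    [255,   0, 128],
]

# Pixel access is matrix indexing: row first, then column.
top_right = image[0][2]  # intensity 255
center = image[1][1]     # intensity 192

# A color image adds a third dimension, e.g. an (R, G, B) triple per pixel.
print(top_right, center)
```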
How image recognition works
1. The information fed to recognition systems consists of the intensities and locations of the pixels in the image
2. Systems learn to map relationships or patterns in subsequent images supplied as part of the learning process
3. System performance on test data is validated
4. Weights to the neural networks are modified to improve the accuracy of the systems
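The steps above can be sketched with a toy single-layer perceptron in plain Python (a hypothetical example for illustration, not from the source): pixel intensities go in, predictions are checked, and weights are nudged to improve accuracy.

```python
# Toy "image recognition" loop: classify 2x2 grayscale patches
# (4 pixel intensities in [0, 1]) as bright (1) or dark (0).
train = [
    ([0.9, 0.8, 0.9, 0.7], 1),  # bright patch
    ([0.1, 0.2, 0.0, 0.1], 0),  # dark patch
    ([0.8, 0.9, 1.0, 0.9], 1),
    ([0.2, 0.0, 0.1, 0.2], 0),
]
weights = [0.0] * 4
bias = 0.0
lr = 0.5  # learning rate

def predict(pixels):
    s = bias + sum(w * p for w, p in zip(weights, pixels))
    return 1 if s > 0 else 0

# Steps 1-2: feed in pixel intensities and learn a pattern.
for _ in range(10):                      # training epochs
    for pixels, label in train:
        error = label - predict(pixels)  # step 3: validate the prediction
        # Step 4: modify the weights to improve accuracy.
        bias += lr * error
        weights = [w + lr * error * p for w, p in zip(weights, pixels)]

print([predict(p) for p, _ in train])  # [1, 0, 1, 0] after training
```

Real systems replace this single layer with deep neural networks, but the feed / validate / adjust-weights cycle is the same.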
Algorithms used in image recognition
SIFT (Scale-invariant Feature Transform)
SURF (Speeded Up Robust Features)
PCA (Principal Component Analysis)
LDA (Linear Discriminant Analysis)
MeanShift
CAMShift (Continuously Adaptive Mean Shift)
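Of these, PCA is easy to sketch in a few lines of NumPy (an illustrative example, not from the source): it finds the directions of greatest variance in the data, which can be used to compress image features into a compact representation.

```python
import numpy as np

# Toy 2-D feature vectors that vary mostly along the diagonal y = x.
data = np.array([[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9], [5.0, 5.1]])

# Center the data, then take eigenvectors of the covariance matrix.
centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)  # eigenvalues ascending

# The last column is the principal component (largest-variance direction);
# for this data it points along the diagonal, roughly (0.707, 0.707) up to sign.
principal = eigenvectors[:, -1]
print(principal)
```

Projecting data onto the top few components is how PCA-based recognizers (e.g. eigenfaces) reduce a high-dimensional image to a handful of numbers.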
Frame differencing is one of the simplest techniques that can be used to identify the moving parts in a video.
The white lines in the frame differencing result represent the silhouette of the moving object.
Filtering out extra non-relevant information allows other algorithms in the system to process the relevant information from images a lot more efficiently and therefore react to any impending danger more quickly.
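Frame differencing can be sketched with NumPy on synthetic frames (a real pipeline would read video frames, e.g. with OpenCV):

```python
import numpy as np

# Two synthetic 8x8 grayscale frames: a bright 2x2 block moves right by 2 px.
frame1 = np.zeros((8, 8), dtype=np.uint8)
frame2 = np.zeros((8, 8), dtype=np.uint8)
frame1[3:5, 1:3] = 255  # block's old position
frame2[3:5, 3:5] = 255  # block's new position

# Absolute difference highlights pixels that changed between frames;
# a threshold then keeps only significant changes (the moving parts).
diff = np.abs(frame1.astype(np.int16) - frame2.astype(np.int16))
motion_mask = (diff > 50).astype(np.uint8)

# Nonzero pixels mark both the old and the new position of the block.
print(int(motion_mask.sum()))  # 8 changed pixels (2x2 old + 2x2 new)
```

Everything outside the mask is exactly the non-relevant information that later stages no longer need to examine.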
The HSV color space is used for object tracking as it is an intuitive color space model that is closer to how humans perceive color.
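Python's standard-library colorsys module shows why HSV suits tracking: a color's identity lives mostly in the hue channel, so brightness changes leave hue untouched. A small sketch:

```python
import colorsys

# Convert RGB (each channel in [0, 1]) to HSV.
bright_red = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)
dark_red = colorsys.rgb_to_hsv(0.4, 0.0, 0.0)

# Hue and saturation are identical for both reds; only value (brightness)
# differs, which makes HSV thresholds robust to lighting changes.
print(bright_red)  # (0.0, 1.0, 1.0)
print(dark_red)    # (0.0, 1.0, 0.4)
```

A tracker can therefore select an object with a narrow hue range and keep finding it as the lighting varies.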
Haar cascades are cascade classifiers based on Haar features, an effective machine learning technique that can be trained to detect objects of many kinds.
How the CAMShift algorithm works
1. Select a region of interest containing the object to be tracked
2. Compute the centroid of the region
3. Move the enclosing boundary so that the new centroid becomes the geometric center of the bounding box
4. Repeat for every frame to track the object in real time
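The centroid-and-recentre loop above is the mean-shift core of CAMShift. A simplified NumPy sketch on a synthetic probability map (full CAMShift also adapts the window size and orientation, which is omitted here):

```python
import numpy as np

# Synthetic "back-projection" map: a bright blob (the tracked object)
# centered near (row=20, col=20) in a 50x50 frame.
weights = np.zeros((50, 50))
weights[18:23, 18:23] = 1.0

# Search window: a center and half-size (CAMShift would also resize it).
center = np.array([15.0, 15.0])
half = 6

for _ in range(10):  # repeat per frame / until convergence
    r0, r1 = int(center[0]) - half, int(center[0]) + half + 1
    c0, c1 = int(center[1]) - half, int(center[1]) + half + 1
    window = weights[max(r0, 0):r1, max(c0, 0):c1]
    rows, cols = np.nonzero(window)
    if rows.size == 0:
        break  # no object mass inside the window
    # Centroid of the weights inside the window, in image coordinates;
    # moving the window there is one mean-shift step.
    center = np.array([rows.mean() + max(r0, 0), cols.mean() + max(c0, 0)])

print(center)  # converges to the blob center, (20, 20)
```

Each iteration pulls the window toward the densest region of object probability, which is why the loop tracks the object as it moves between frames.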
Face detection refers to detecting the location of a face in a given image, while face recognition is the process of identifying who the person is.
Integral images are used to compute Haar features in a computationally efficient way for real-time face detection.
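An integral image stores, at each position, the sum of all pixels above and to the left; any rectangular sum (the building block of a Haar feature) then costs only four lookups, regardless of rectangle size. A small NumPy sketch:

```python
import numpy as np

image = np.arange(16, dtype=np.int64).reshape(4, 4)  # toy 4x4 image

# Integral image, padded with a zero row/column for easy corner lookups.
integral = np.zeros((5, 5), dtype=np.int64)
integral[1:, 1:] = image.cumsum(axis=0).cumsum(axis=1)

def rect_sum(r0, c0, r1, c1):
    """Sum of image[r0:r1, c0:c1] using four integral-image lookups."""
    return (integral[r1, c1] - integral[r0, c1]
            - integral[r1, c0] + integral[r0, c0])

# A Haar-like feature compares adjacent rectangle sums, e.g. left vs right:
feature = rect_sum(0, 0, 4, 2) - rect_sum(0, 2, 4, 4)
print(rect_sum(1, 1, 3, 3), feature)  # 30 -16
```

Because every feature evaluation is constant-time, a cascade can test thousands of Haar features per window fast enough for real-time detection.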
Eye detection works similarly to face detection, using an eye cascade file instead of a face cascade file.
Eye tracking can be used to determine the rate at which a driver's eyes are blinking or if their eyes are closed to detect if they are getting tired.
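Once an eye detector reports whether the eyes are open in each frame, fatigue cues reduce to simple bookkeeping over that signal. A hypothetical sketch in plain Python (the detector itself, e.g. a Haar eye cascade, is assumed; the frame rate and drowsiness threshold are illustrative choices):

```python
# Per-frame eye state from a detector: True = eyes open, False = closed.
eye_open = [True, True, False, True, True, False, False, False, True]

# A blink is an open -> closed transition; a long closed run suggests fatigue.
blinks = 0
for prev, cur in zip(eye_open, eye_open[1:]):
    if prev and not cur:
        blinks += 1

longest_closed = run = 0
for state in eye_open:
    run = run + 1 if not state else 0
    longest_closed = max(longest_closed, run)

fps = 30  # hypothetical camera frame rate
drowsy = longest_closed / fps > 0.5  # eyes closed for over half a second
print(blinks, longest_closed, drowsy)  # 2 3 False
```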