Why is image recognition a key function of AI?


Image recognition, in the context of machine vision, is the ability of software to identify objects, places, people, writing and actions in images. Computers can use machine vision technologies in combination with a camera and artificial intelligence software to achieve image recognition. So, why is image recognition a key function of AI?

salma khalil

Posted 2019-04-13T18:23:28.700

Reputation: 429

Welcome to AI.SE @salma khalil. This looks like a good start to a question, but perhaps you could clarify why you think image recognition is a "key function" of AI, and what you mean by "key".

Image recognition is definitely part of AI, and it's something that AI researchers work to improve, but I don't think it's widely viewed as an especially essential part (compared to any of the other large fields, like learning). – John Doucette – 2019-04-14T00:41:48.457

It's not a function, but an application. – Oliver Mason – 2019-04-14T14:41:54.280



Image recognition is an important application of AI techniques, as images usually act as sensory input for further problems to be solved.

For example, a self-driving car needs to take into account its environment; it needs to recognise the path/road it is driving on, obstacles, other traffic, traffic signs, etc. All this is visual input which needs to be recognised. Without image recognition a self-driving car would not be possible.

A machine that operates in a warehouse picking goods: in order to handle the goods, eg with a robotic arm, it needs to recognise where the goods are and what they look like, so that the arm can be controlled.

A system controlling a CCTV camera should be able to distinguish an intruder from an animal that happens to just walk past. Again, with image recognition and classification this could be achieved.

An AI system will react to sensory input from the 'outside' world, and act on it. The type of sensory input depends on the application: it could be a numerical value (like a bank balance), language (text or speech), tactile (eg a robotic hand), or visual. If it is a stock brokering system, image recognition is not needed. Machine translation would also not need it. But for many real-world interactions, having 'eyes' is either necessary or at the very least useful.
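The pipeline described above — raw visual input, recognition, then action — can be sketched in a few lines. This is a minimal illustration with hypothetical names, not a real vision system: `classify_image` stands in for a trained recogniser, and the labels and actions are invented for the self-driving example.

```python
def classify_image(image):
    """Stand-in for an image recogniser: maps raw input to a label.

    A real system would run a trained model here; a lookup table keeps
    the sketch self-contained.
    """
    known = {"red_octagon": "stop_sign", "striped_pole": "barrier"}
    return known.get(image, "unknown")


def decide_action(label):
    """Policy: choose an action from the recognised label."""
    if label == "stop_sign":
        return "brake"
    if label == "barrier":
        return "steer_around"
    return "proceed_cautiously"


# Recognition sits between the raw sensory input and the action taken:
action = decide_action(classify_image("red_octagon"))
```

The point of the sketch is the middle step: without `classify_image`, the agent has pixels but no basis for choosing an action.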

Oliver Mason

Posted 2019-04-13T18:23:28.700

Reputation: 3 755

"Image recognition" is not usually a sensory input. The sensor input is the image. Image recognition (and, in general, pattern recognition) is an operation that is performed on an image. I think you should clarify this. – nbro – 2019-05-22T15:32:07.253

Thanks -- I clarified this. You are correct. Sloppy use of terminology on my part. – Oliver Mason – 2019-05-22T16:00:32.177


The ability to make choices based on what is sensed is key to the sustainability of life, whether the choices are instinctual or cognitive. On the artificial side, such abilities are similarly valuable for many products designed to assist humans. For example, one early success of optically guided machines, now ubiquitous in postal systems, is the automated sorting of mail by destination address.

When considering vision systems, a control systems view of where they are typically deployed may be helpful.

We know from evolutionary biology that the same system features reappear independently in widely separated evolutionary paths. When this occurs, it implies that the recurring biological feature leads to sustainability in more than one niche in the biosphere. The exploitation of optics is one of those recurring features.

A high level view of sustainable biological systems often involves closed loop feedback around these four kinds of system elements.

  • Input of external information
  • Internal processing
  • Action based on processing results
  • Objects in the external environment

The changed environment is then sensed again in the first item of the four. This closed loop feedback arrangement is also common in human industry, for good reasons. Control systems engineers know that any system missing any of the four is in some way operating below optimality.
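The four-element loop above can be sketched with the classic minimal example, a thermostat. All names and constants here are illustrative assumptions chosen only to show the loop structure: sense, process, act, and let the environment respond before sensing again.

```python
def run_closed_loop(temperature, setpoint=20.0, steps=10):
    """Iterate a minimal closed feedback loop toward a setpoint."""
    for _ in range(steps):
        sensed = temperature            # 1. input of external information
        error = setpoint - sensed       # 2. internal processing
        heating = 0.5 * error           # 3. action based on processing results
        temperature += heating - 0.1    # 4. the environment changes (heat loss)
        # The changed environment is sensed again on the next pass.
    return temperature


final = run_closed_loop(15.0)
```

Remove any one of the four steps — the sensing, the error computation, the heating action, or the environment's response — and the loop no longer converges, which is the "operating blind" condition described above.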

Systems without closed loop feedback are disabled in a sense. The full range of capability and purpose has been diminished, thus the common phrase, "We are operating blind," when decision making input is absent.

Each of these four examples is missing one of the four kinds of system elements, leading to what could be considered a tragic existence.

  • A deaf and blind person that has no finger sensitivity to read braille.
  • A person who has lost all normal brain function and can move only randomly.
  • A person who is paralyzed, including vocal apparatus and facial muscles.
  • Someone in a disabled space capsule on a trajectory away from any habitable planet.

For products that humans design, we can draw a system boundary around any single group of elements; exclude one, two, or three of the system element types listed above; and package this subset as a product. However, the resulting product, to behave consistently in varying environments, must be employed in such a way as to incorporate the other types to form the closed loop.

The quality enhanced by closed loop control, in systems engineering, is formally called stability. This system quality is apparent in every biological system, where it is often called homeostasis. James Watt was one of the practical pioneers of stable machines. LeRoy MacColl, Norbert Wiener, and others demonstrated more rigorously, in mathematics, how it works. In information systems, the quality is called intelligence, and the closed loop enables learning.

This is worthy of note in answer to this question because any optical sensory capability in a biological system or any full featured robotic system will need to learn how to recognize new objects and trajectory patterns to be reliable and therefore sustainable in the field.

Leaning on the etymology of the words a little, we can say that recognition of an object, trajectory, or phenomenon in a sequence of images relies on initial cognition.

$$ \text{Initial Cognition} \Rightarrow \text{Recognition} $$

A newborn does not recognize her or his mother upon birth. The optics are in place, but focus comes later. Then association between the decrease in pain and a particular face arises. The ability for that cause-and-effect association to form is arguably a low level feature of intelligence. The use of that association after it has begun to form is recognition.

Image recognition can be more accurately phrased, "Cognition in connection with an optical input." Its importance in AI hinges on the richness of information in optical inputs about a portion of the environment closely related to system objectives. When neck or eye muscles are used to change the focus of the optics, it is usually related to avoiding danger or obtaining food energy, nutrients, or some other sustenance. Hunting, searching, and navigating are sequences of closed loop activity.

Closed loops involving information rich inputs and the associated information interpretation can lead to learning and subsequent forms of recognition that increase the reliability of the system in a wider variety of environmental conditions, leading to more sustainable usage scenarios and general usefulness.

Although sound, pressure, temperature, and many other input types can provide much information to aid in closing a cognition loop, the qualities of light give optical inputs particularly rich information content.

Douglas Daseeco

Posted 2019-04-13T18:23:28.700

Reputation: 7 174