The environment that we live in contains a wide variety of things that we can readily identify. This act of identification is called object recognition. It does not necessarily mean that we can name the thing in question, but it does mean that we are able to make an appropriate response to it. We use the term ‘object’ to refer to a wide assortment of different types of visual event, but they all share some common attributes. All visual objects are spatially defined and circumscribed so that they occupy volume in the scene; they tend to be solid and bounded by a distinct surface.
Recognizing an object essentially involves matching an incoming perceptual representation of the object with an internally held specification. For example, in recognizing some object as a chair, the perceptual process will generate a description of the image of the object in terms of the surfaces that it has and their spatial arrangement. This description will then be compared against a set of features that all (or at least the majority of) chairs have, such as a flat horizontal surface of appropriate size held at an appropriate height by a number of upright legs. If the description compares well, then a judgement that the object is a chair can be made with reasonable confidence.
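The comparison step described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Description` fields, the numeric ranges in `CHAIR_SPEC`, and the 0.75 threshold are hypothetical and are not a claim about how the visual system encodes chairs. It only shows the idea of scoring a perceptual description against an internal set of features and accepting the match when the score is high enough.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Description:
    """Hypothetical perceptual description of an object (illustrative fields only)."""
    has_flat_horizontal_surface: bool
    surface_height_m: float
    surface_width_m: float
    upright_legs: int


# Hypothetical internal specification of a chair: each entry is a named feature test.
# The numeric ranges are assumptions chosen only to make the example concrete.
CHAIR_SPEC = {
    "flat horizontal surface": lambda d: d.has_flat_horizontal_surface,
    "appropriate height": lambda d: 0.35 <= d.surface_height_m <= 0.60,
    "appropriate size": lambda d: 0.30 <= d.surface_width_m <= 0.70,
    "upright legs": lambda d: d.upright_legs >= 3,
}


def match_score(description, spec):
    """Return the fraction of the specification's features that the description satisfies."""
    passed = sum(1 for test in spec.values() if test(description))
    return passed / len(spec)


def is_recognized(description, spec, threshold=0.75):
    """Accept the match when most features agree ("all, or at least the majority")."""
    return match_score(description, spec) >= threshold


kitchen_chair = Description(True, 0.45, 0.40, 4)
dining_table = Description(True, 0.75, 1.50, 4)

print(is_recognized(kitchen_chair, CHAIR_SPEC))  # all four features satisfied
print(is_recognized(dining_table, CHAIR_SPEC))   # height and size tests fail
```

Using a threshold below 1.0 reflects the point that the features need only hold for most chairs, so a description can fail one test and still be judged a chair.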
There is a specific problem in the process of object recognition. The image of an object, and its visual appearance, depend to a very considerable extent on the circumstances of the object: how it is illuminated, what direction it is being viewed from, and what its surroundings are. Thus the information that is to be used to recognize the object is highly variable.
There are various possible versions of the process of object recognition, which differ in the types of information that are being compared. At one extreme, it is possible that the comparison process could involve matching 2-dimensional templates of objects against the image to find the template that matches best. In this case the template is effectively a single complex feature. The drawback with this approach is that the natural variability of objects would far exceed the capacity of any template. Thus a specific chair might well be recognizable from its match to a template of itself, but no other chair would be. At the other extreme, it is possible that a description of the object made up of a large number of small features could be computed from the incoming information and then compared with an equivalent full internal structural description. The drawback to this approach is that the process of creating a full structural description of the image of the object is itself highly complex.
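The template extreme, and its weakness, can be shown with a small sketch. It assumes a toy setting that is not in the text: images are tiny grids of 0s and 1s, and the quality of a match is measured by the sum of squared differences (SSD) between the template and each image window, with 0 meaning a perfect match. The grids and function names are hypothetical.

```python
def ssd(image, template, top, left):
    """Sum of squared differences between the template and one image window."""
    return sum(
        (image[top + r][left + c] - template[r][c]) ** 2
        for r in range(len(template))
        for c in range(len(template[0]))
    )


def best_match(image, template):
    """Slide the template over every position; return (lowest SSD, (row, col))."""
    rows = len(image) - len(template) + 1
    cols = len(image[0]) - len(template[0]) + 1
    return min(
        (ssd(image, template, r, c), (r, c))
        for r in range(rows)
        for c in range(cols)
    )


# Toy "L" shape stored as the 2-dimensional template (hypothetical data).
TEMPLATE = [
    [1, 0],
    [1, 0],
    [1, 1],
]

# The same shape in a scene, seen exactly as stored: a perfect match exists.
SAME_VIEW = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
]

# The same shape mirrored, as if viewed from the other side: no perfect match.
MIRRORED_VIEW = [
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 1, 1, 0],
]

print(best_match(SAME_VIEW, TEMPLATE))      # → (0, (1, 1))
print(best_match(MIRRORED_VIEW, TEMPLATE))  # lowest SSD is greater than 0
```

Even in this toy case a simple change of view leaves the stored template without an exact match, which is the variability problem in miniature. Covering every view, illumination and instance of an object would require a separate template for each.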