Trends in Cognitive Sciences
Review
The role of context in object recognition
Introduction
The ability of humans to recognize thousands of object categories in cluttered scenes, despite variability in pose, changes in illumination and occlusions, is one of the most surprising capabilities of visual perception, still unmatched by computer vision algorithms. Object recognition is generally posed as the problem of matching a representation of the target object with the available image features, while rejecting the background features. In typical visual-search experiments, the context of a target is a random collection of distractors that serve only to make the detection process as hard as possible. However, in the real world, the other objects in a scene are a rich source of information that can serve to help rather than hinder the recognition and detection of objects. In this article, we review work on visual context and its role in object recognition.
Section snippets
Contextual influences on object recognition
In the real world, objects tend to co-vary with other objects and particular environments, providing a rich collection of contextual associations to be exploited by the visual system. A large body of evidence in the literature on visual cognition [1–8], computer vision [9–11] and cognitive neuroscience [12–16] has shown that contextual information affects the efficiency of the search and recognition of objects. There is a general consensus that objects appearing
The effects of context
Early studies have shown that context has effects at multiple levels: semantic (e.g. a table and chair are probably present in the same images, whereas an elephant and a bed are not), spatial configuration (e.g. a keyboard is expected to be below a monitor), and pose (e.g. chairs are oriented towards the table, a pen should have a particular pose relative to the paper to be useful for writing and a car will be oriented along the driving directions of a street).
Hock et al. [17] and Biederman and
Implicit learning of contextual cues
Fiser and Aslin [24,25] have shown that humans are good at routinely extracting temporal and spatial statistical regularities between objects and do so from an early age. Seminal work by Chun and Jiang [26] revealed that human observers can implicitly learn the contingencies that exist between arbitrary configurations of distractor objects (e.g. a set of the letter L) and the location of a target object (e.g. a letter T), a form of learning called ‘contextual cueing’ (reviewed in Ref. [27]).
By
Perception of sets and summary statistics
A representation of context on the basis of object-to-object associations treats objects as the atomic elements of perception. It is an object-centered view of scene understanding. Here and in the next section, we will review work suggesting a cruder but extremely effective representation of contextual information, providing a complementary rather than an alternative source of information for contextual inference. In the same way that the representation of an object can be mediated by features
Global context: insights from computer vision
In computer vision, the most common approach to localizing objects in images is to slide a window across all locations and scales in the image and classify each local window as containing either the target or background. This approach has been successfully used to detect objects such as faces, cars and pedestrians (reviewed in Ref. [49]). However, contextual information can be used in conjunction with local approaches to improve performance, efficiency and tolerance to image degradation. One of
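The sliding-window procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any system reviewed here: the names `sliding_windows`, `detect` and `mean_brightness` are hypothetical, scale is handled by varying the window size rather than by resizing the image, and the classifier is a toy stand-in for a trained detector.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Image = List[List[float]]            # grayscale image as rows of pixel values
Window = Tuple[int, int, int]        # (top row, left column, side length)


def sliding_windows(height: int, width: int, sizes: Sequence[int],
                    stride_fraction: float = 0.5) -> Iterator[Window]:
    """Enumerate square windows over all locations and scales (sizes)."""
    for size in sizes:
        if size > height or size > width:
            continue  # window does not fit in the image at this scale
        stride = max(1, int(size * stride_fraction))
        for row in range(0, height - size + 1, stride):
            for col in range(0, width - size + 1, stride):
                yield (row, col, size)


def detect(image: Image, classify: Callable[[Image], float],
           sizes: Sequence[int], threshold: float) -> List[Tuple[int, int, int, float]]:
    """Score every local window and keep those labelled as target."""
    height, width = len(image), len(image[0])
    detections = []
    for row, col, size in sliding_windows(height, width, sizes):
        patch = [r[col:col + size] for r in image[row:row + size]]
        score = classify(patch)
        if score >= threshold:       # target vs. background decision
            detections.append((row, col, size, score))
    return detections


def mean_brightness(patch: Image) -> float:
    """Toy classifier (assumption): bright patches count as the target."""
    values = [v for row in patch for v in row]
    return sum(values) / len(values)


# Hypothetical 8x8 image with a bright 4x4 "object" at rows 2-5, columns 2-5.
image = [[0.0] * 8 for _ in range(8)]
for r in range(2, 6):
    for c in range(2, 6):
        image[r][c] = 1.0

detections = detect(image, mean_brightness, sizes=[4], threshold=0.9)
print(detections)
```

Every window is scored independently of the rest of the image, which is why contextual information has to be added as a separate source of evidence.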
Contextual effects on eye movements
When exploring a scene for an object, an ideal observer will fixate the image locations that have the highest posterior probability of containing the target object according to the available image information [57]. Attention can be driven by global scene properties (e.g. when exploring a street scene for a parking meter, attention is directed to regions near the ground plane) and salient objects contextually related to the target (e.g. when looking for a computer mouse, the region near a
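The ideal-observer rule at the start of this section, fixating the location with the highest posterior probability of containing the target, can be illustrated with a small Python sketch. The factorization used here (local evidence multiplied by a context-based location prior, then normalized) is an assumption made for illustration, and the function names and all numbers are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]


def posterior_map(likelihood: Grid, prior: Grid) -> Grid:
    """Combine local evidence with a contextual location prior (Bayes' rule)."""
    unnormalized = [[l * p for l, p in zip(l_row, p_row)]
                    for l_row, p_row in zip(likelihood, prior)]
    total = sum(sum(row) for row in unnormalized)
    if total == 0:
        raise ValueError("posterior is undefined: all products are zero")
    return [[v / total for v in row] for row in unnormalized]


def next_fixation(posterior: Grid) -> Tuple[int, int]:
    """Return the (row, column) with the highest posterior probability."""
    cells = [(r, c) for r, row in enumerate(posterior) for c in range(len(row))]
    return max(cells, key=lambda rc: posterior[rc[0]][rc[1]])


# Hypothetical local evidence: a strong false alarm at the top left and a
# slightly weaker response at the bottom centre.
likelihood = [[0.6, 0.1, 0.1],
              [0.1, 0.1, 0.1],
              [0.1, 0.5, 0.1]]

# Hypothetical context prior: the target is expected near the ground plane,
# so lower rows receive more weight (normalization happens in posterior_map).
prior = [[1.0, 1.0, 1.0],
         [3.0, 3.0, 3.0],
         [6.0, 6.0, 6.0]]

posterior = posterior_map(likelihood, prior)
fixation_with_context = next_fixation(posterior)
fixation_without_context = next_fixation(likelihood)
print(fixation_with_context, fixation_without_context)
```

In this toy case the contextual prior moves the first fixation away from the false alarm in the upper part of the image and towards the lower region where the target is expected.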
Concluding remarks
A scene composed of contextually related objects is more than just the sum of the constituent objects. Objects presented in a familiar context are faster to localize and recognize. In the absence of enough local evidence about an object's identity, the scene structure and prior knowledge of world regularities might provide the additional information needed for recognizing and localizing an object. Even if objects can be identified by intrinsic information, context can simplify the object
Acknowledgements
We thank George Alvarez, Moshe Bar, Timothy Brady, Michelle Greene, Barbara Hidalgo-Sotelo, Jeremy Wolfe and three anonymous reviewers for insightful comments on the manuscript. A.O. was funded by a National Science Foundation Career award (IIS 0546262) and a National Science Foundation contract (IIS 0705677). A.T. was partly funded by the National Geospatial-Intelligence Agency NEGI-1582–04–0004.
References (70)
Scene perception: detecting and judging objects undergoing relational violations
Cognit. Psychol.
(1982)
Cortical analysis of visual context
Neuron
(2003)
Contextual cueing: Implicit learning and memory of visual context guides spatial attention
Cognit. Psychol.
(1998)
In what ways do eye movements contribute to everyday activities?
Vision Res.
(2001)
Representation of statistical properties
Vision Res.
(2003)
Statistical processing: computing the average size in perceptual groups
Vision Res.
(2005)
The three dimensions of human visual sensitivity to first-order contrast statistics
Vision Res.
(2007)
On image classification: city images vs. landscapes
Pattern Recognit.
(1998)
Scene context guides eye movements during visual search
Vision Res.
(2006)
Coarse-to-fine eye movement strategy in visual search
Vision Res.
(2007)
Modeling visual-attention via selective tuning
Artif. Intell.
Non-target objects can influence perceptual processes during object recognition
Psychon. Bull. Rev.
Scene consistency in object and background perception
Psychol. Sci.
Framing pictures: the role of knowledge in automatized encoding and memory of gist
J. Exp. Psychol. Gen.
Attentional allocation during the perception of scenes
J. Exp. Psychol. Hum. Percept. Perform.
Effects of semantic consistency on eye movements during scene viewing
J. Exp. Psychol. Hum. Percept. Perform.
The effects of contextual scenes on the identification of objects
Mem. Cognit.
Does consistent scene context facilitate object detection
J. Exp. Psychol. Gen.
Contextual guidance of attention in natural scenes: the role of global features on object search
Psychol. Rev.
Modeling global scene factors in attention
J. Opt. Soc. Am. A
Putting objects in perspective
Proc. IEEE Comp. Vis. Pattern Recog.
Visual objects in context
Nat. Rev. Neurosci.
Cortical areas involved in object, background, and object-background processing revealed with functional magnetic resonance adaptation
J. Neurosci.
The parahippocampal cortex mediates spatial and non-spatial associations
Cereb. Cortex
Contextual relations: the influence of familiarity, physical plausibility, and belongingness
Percept. Psychophys.
Contextual priming for object detection
Int. J. Comput. Vis.
Attentional cues in real scenes, saccadic targeting and Bayesian priors
Psychol. Sci.
Computational mechanisms for gaze direction in interactive visual environments
Using the forest to see the trees: a graphical model relating features, objects and scenes
Adv. in Neural Information Processing Systems
Top-down attentional guidance based on implicit learning of visual covariation
Psychol. Sci.
Familiar interacting object pairs are perceptually grouped
J. Exp. Psychol. Hum. Percept. Perform.
Unsupervised statistical learning of higher-order spatial structures from visual scenes
Psychol. Sci.
Encoding multi-element scenes: statistical learning of visual feature hierarchies
J. Exp. Psychol. Gen.
Contextual cueing: reciprocal influences between attention and implicit learning