Review
The role of context in object recognition

https://doi.org/10.1016/j.tics.2007.09.009

In the real world, objects never occur in isolation; they co-vary with other objects and particular environments, providing a rich source of contextual associations to be exploited by the visual system. A natural way of representing the context of an object is in terms of its relationship to other objects. Alternatively, recent work has shown that a statistical summary of the scene provides a complementary and effective source of information for contextual inference, which enables humans to quickly guide their attention and eyes to regions of interest in natural scenes. A better understanding of how humans build such scene representations, and of the mechanisms of contextual analysis, will lead to a new generation of computer vision systems.

Introduction

The ability of humans to recognize thousands of object categories in cluttered scenes, despite variability in pose, changes in illumination and occlusions, is one of the most surprising capabilities of visual perception, still unmatched by computer vision algorithms. Object recognition is generally posed as the problem of matching a representation of the target object with the available image features, while rejecting the background features. In typical visual-search experiments, the context of a target is a random collection of distractors that serve only to make the detection process as hard as possible. However, in the real world, the other objects in a scene are a rich source of information that can serve to help rather than hinder the recognition and detection of objects. In this article, we review work on visual context and its role in object recognition.

Contextual influences on object recognition

In the real world, objects tend to co-vary with other objects and particular environments, providing a rich collection of contextual associations to be exploited by the visual system. A large body of evidence in the literature on visual cognition 1, 2, 3, 4, 5, 6, 7, 8, computer vision 9, 10, 11 and cognitive neuroscience 12, 13, 14, 15, 16 has shown that contextual information affects the efficiency of the search and recognition of objects. There is a general consensus that objects appearing

The effects of context

Early studies have shown that context has effects at multiple levels: semantic (e.g. a table and a chair are likely to appear in the same image, whereas an elephant and a bed are not), spatial configuration (e.g. a keyboard is expected to be below a monitor) and pose (e.g. chairs are oriented towards the table, a pen should have a particular pose relative to the paper to be useful for writing, and a car will be oriented along the driving direction of the street).
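As a purely illustrative aside (not part of the original article), one way to make these three levels concrete is to encode semantic co-occurrence, typical spatial offset and relative pose as explicit object-to-object priors; the object pairs and all numbers below are invented placeholders, not estimates from the studies reviewed here.

```python
# Illustrative sketch only: a toy data structure for the three levels of contextual
# relations (semantic co-occurrence, spatial configuration, relative pose).
from dataclasses import dataclass

@dataclass
class PairwiseContext:
    co_occurrence: float                # semantic level: how likely the two objects share a scene
    mean_offset: tuple[float, float]    # spatial level: expected (dx, dy) of object B relative to A
    relative_pose_deg: float            # pose level: expected orientation of B relative to A

# Toy contextual knowledge base, indexed by (anchor object A, related object B).
context_prior = {
    ("monitor", "keyboard"): PairwiseContext(0.9, (0.0, +0.3), 0.0),    # keyboard expected below monitor
    ("table", "chair"):      PairwiseContext(0.8, (0.4, +0.1), 180.0),  # chair faces the table
    ("bed", "elephant"):     PairwiseContext(0.01, (0.0, 0.0), 0.0),    # implausible pairing
}

if __name__ == "__main__":
    prior = context_prior[("monitor", "keyboard")]
    print(f"P(keyboard | monitor) ~ {prior.co_occurrence}, expected offset {prior.mean_offset}")
```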

Hock et al. [17] and Biederman and

Implicit learning of contextual cues

Fiser and Aslin 24, 25 have shown that humans are adept at routinely extracting temporal and spatial statistical regularities between objects, and do so from an early age. Seminal work by Chun and Jiang [26] revealed that human observers can implicitly learn the contingencies that exist between arbitrary configurations of distractor objects (e.g. a set of L shapes) and the location of a target object (e.g. the letter T), a form of learning called ‘contextual cueing’ (reviewed in Ref. [27]).

By

Perception of sets and summary statistics

A representation of context on the basis of object-to-object associations treats objects as the atomic elements of perception. It is an object-centered view of scene understanding. Here and in the next section, we will review work suggesting a cruder but extremely effective representation of contextual information, providing a complementary rather than an alternative source of information for contextual inference. In the same way that the representation of an object can be mediated by features
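To make the idea of a ‘statistical summary’ concrete, here is a minimal, hypothetical sketch (our illustration, not the authors' model): a display containing many items is reduced to a few ensemble statistics per feature dimension instead of a list of individually represented objects.

```python
# Minimal sketch of an ensemble ("summary statistics") representation.
# The feature dimensions and distributions are arbitrary choices for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical display: orientations (degrees) and sizes of 50 items.
orientations = rng.normal(loc=45.0, scale=10.0, size=50)
sizes = rng.lognormal(mean=1.0, sigma=0.3, size=50)

# The summary representation keeps only a handful of numbers per feature,
# discarding the identity and exact properties of individual items.
summary = {
    "orientation_mean": float(orientations.mean()),
    "orientation_sd": float(orientations.std()),
    "size_mean": float(sizes.mean()),
    "size_sd": float(sizes.std()),
}
print(summary)
```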

Global context: insights from computer vision

In computer vision, the most common approach to localizing objects in images is to slide a window across all locations and scales in the image and classify each local window as containing either the target or background. This approach has been successfully used to detect objects such as faces, cars and pedestrians (reviewed in Ref. [49]). However, contextual information can be used in conjunction with local approaches to improve performance, efficiency and tolerance to image degradation. One of
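The sketch below illustrates the sliding-window scheme and one simple way a contextual prior over target location could be folded into it. The toy classifier, the prior map and the additive combination of log scores are our assumptions for illustration, not a specific published detector.

```python
# Sliding-window detection combined with a contextual prior (illustrative sketch).
import numpy as np

def sliding_window_scores(image, window, stride, local_classifier):
    """Score every window position with a local appearance classifier."""
    H, W = image.shape
    h, w = window
    rows = (H - h) // stride + 1
    cols = (W - w) // stride + 1
    scores = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            patch = image[i * stride:i * stride + h, j * stride:j * stride + w]
            scores[i, j] = local_classifier(patch)
    return scores

def detect_with_context(image, window, stride, local_classifier, prior_map):
    """Combine local evidence with a pixel-wise contextual prior by adding log scores."""
    local = sliding_window_scores(image, window, stride, local_classifier)
    h, w = window
    rows, cols = local.shape
    # Evaluate the prior at each window centre.
    centre_r = np.arange(rows) * stride + h // 2
    centre_c = np.arange(cols) * stride + w // 2
    prior = prior_map[np.ix_(centre_r, centre_c)]
    return local + np.log(prior + 1e-9)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    image = rng.random((64, 64))
    # Toy "classifier": mean patch brightness stands in for a trained detector score.
    classifier = lambda patch: patch.mean()
    # Toy contextual prior: the target is expected in the lower half of the image.
    prior = np.ones((64, 64))
    prior[:32, :] = 0.1
    combined = detect_with_context(image, (16, 16), 8, classifier, prior)
    i, j = np.unravel_index(np.argmax(combined), combined.shape)
    print("best window at (row, col):", i * 8, j * 8)
```

In this toy example the prior simply down-weights windows in the upper half of the image, capturing the intuition that, say, a pedestrian detector need not spend its false alarms on the sky region.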

Contextual effects on eye movements

When exploring a scene for an object, an ideal observer will fixate the image locations that have the highest posterior probability of containing the target object according to the available image information [57]. Attention can be driven by global scene properties (e.g. when exploring a street scene for a parking meter, attention is directed to regions near the ground plane) and salient objects contextually related to the target (e.g. when looking for a computer mouse, the region near a
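One hedged way to write this ideal-observer rule down (the notation is ours, not taken from the article): with local image measurements $\ell_x$ at location $x$ and global scene features $g$, the next fixation is

$$
x^{*} \;=\; \arg\max_{x}\, p(\text{target at } x \mid \ell_x, g),
\qquad
p(\text{target at } x \mid \ell_x, g) \;\propto\; p(\ell_x \mid \text{target at } x)\; p(\text{target at } x \mid g),
$$

so gaze is drawn to locations where the local evidence and the scene-based contextual prior jointly peak.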

Concluding remarks

A scene composed of contextually related objects is more than just the sum of the constituent objects. Objects presented in a familiar context are faster to localize and recognize. In the absence of enough local evidence about an object's identity, the scene structure and prior knowledge of world regularities might provide the additional information needed for recognizing and localizing an object. Even if objects can be identified by intrinsic information, context can simplify the object

Acknowledgements

We thank George Alvarez, Moshe Bar, Timothy Brady, Michelle Greene, Barbara Hidalgo-Sotelo, Jeremy Wolfe and three anonymous reviewers for insightful comments on the manuscript. A.O. was funded by a National Science Foundation CAREER award (IIS 0546262) and a National Science Foundation contract (IIS 0705677). A.T. was partly funded by the National Geospatial-Intelligence Agency (NEGI-1582-04-0004).

References (70)

  • J.K. Tsotsos. Modeling visual attention via selective tuning. Artif. Intell. (1995)
  • M.E. Auckland. Non-target objects can influence perceptual processes during object recognition. Psychon. Bull. Rev. (2007)
  • J.L. Davenport et al. Scene consistency in object and background perception. Psychol. Sci. (2004)
  • A. Friedman. Framing pictures: the role of knowledge in automatized encoding and memory of gist. J. Exp. Psychol. Gen. (1979)
  • R.D. Gordon. Attentional allocation during the perception of scenes. J. Exp. Psychol. Hum. Percept. Perform. (2004)
  • J.M. Henderson. Effects of semantic consistency on eye movements during scene viewing. J. Exp. Psychol. Hum. Percept. Perform. (1999)
  • S.E. Palmer. The effects of contextual scenes on the identification of objects. Mem. Cognit. (1975)
  • A. Hollingworth et al. Does consistent scene context facilitate object detection. J. Exp. Psychol. Gen. (1998)
  • A. Torralba. Contextual guidance of attention in natural scenes: the role of global features in object search. Psychol. Rev. (2006)
  • A. Torralba. Modeling global scene factors in attention. J. Opt. Soc. Am. A (2003)
  • D. Hoiem. Putting objects in perspective. Proc. IEEE Comp. Vis. Pattern Recog. (2006)
  • M. Bar. Visual objects in context. Nat. Rev. Neurosci. (2004)
  • J.O.S. Goh. Cortical areas involved in object, background, and object-background processing revealed with functional magnetic resonance adaptation. J. Neurosci. (2004)
  • E. Aminoff. The parahippocampal cortex mediates spatial and non-spatial associations. Cereb. Cortex (2007)
  • N. Gronau et al. Integrated contextual representation for objects' identities and their locations. J. Cogn. Neurosci.
  • H.S. Hock. Contextual relations: the influence of familiarity, physical plausibility, and belongingness. Percept. Psychophys. (1974)
  • A. Torralba. Contextual priming for object detection. Int. J. Comput. Vis. (2003)
  • M.P. Eckstein. Attentional cues in real scenes, saccadic targeting and Bayesian priors. Psychol. Sci. (2006)
  • R.J. Peters et al. Computational mechanisms for gaze direction in interactive visual environments
  • K.P. Murphy. Using the forest to see the trees: a graphical model relating features, objects and scenes. Adv. Neural Information Processing Systems (2003)
  • M.M. Chun et al. Top-down attentional guidance based on implicit learning of visual covariation. Psychol. Sci. (1999)
  • C. Green et al. Familiar interacting object pairs are perceptually grouped. J. Exp. Psychol. Hum. Percept. Perform. (2006)
  • J. Fiser et al. Unsupervised statistical learning of higher-order spatial structures from visual scenes. Psychol. Sci. (2001)
  • J. Fiser et al. Encoding multi-element scenes: statistical learning of visual feature hierarchies. J. Exp. Psychol. Gen. (2005)
  • Y. Jiang et al. Contextual cueing: reciprocal influences between attention and implicit learning
