Linear integration of sensory evidence over space and time underlies face categorization

Gouki Okazawa; Long Sha; Roozbeh Kiani

doi:10.1101/2020.11.27.396705

Abstract

Visual object recognition relies on elaborate sensory processes that transform retinal inputs into object representations, but it also requires decision-making processes that read out object representations and function over prolonged time scales. The computational properties of these decision-making processes remain underexplored for object recognition. Here, we study these computations by developing a stochastic multi-feature face categorization task. Using quantitative models and tight control of spatiotemporal visual information, we demonstrate that humans categorize faces through an integration process that first linearly adds the evidence conferred by task-relevant features over space to create aggregated momentary evidence, and then linearly integrates it over time with minimal information loss. Discrimination of stimuli along different category boundaries (e.g., identity or expression of a face) is implemented by adjusting feature weights of spatial integration. This linear but flexible integration process over space and time bridges past studies on simple perceptual decisions to complex object recognition behavior.
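
The two-stage computation described above can be illustrated with a minimal sketch. It is not the authors' fitted model: the three-feature layout, the weight values, the Gaussian stimulus fluctuations, the optional decision bound, and all function names are assumptions made for illustration only. Spatial integration is a weighted linear sum of feature evidence within a stimulus frame, and temporal integration is a running sum of that momentary evidence across frames. Switching the category boundary is modeled by swapping the weight vector.

```python
import random


def momentary_evidence(features, weights):
    """Spatial integration: weighted linear sum of feature evidence in one frame."""
    return sum(w * f for w, f in zip(weights, features))


def categorize(frames, weights, bound=float("inf")):
    """Temporal integration: accumulate momentary evidence across frames.

    Stops early if the accumulated evidence reaches +/- bound (an assumed
    stopping rule; the default of infinity integrates over all frames).
    Returns (choice, decision_variable, frames_used), with choice +1 or -1.
    """
    dv = 0.0
    used = 0
    for features in frames:
        dv += momentary_evidence(features, weights)
        used += 1
        if abs(dv) >= bound:
            break
    choice = 1 if dv >= 0 else -1
    return choice, dv, used


def simulate_trial(mean_strength, n_frames, n_features, noise_sd, rng):
    """Hypothetical stimulus: each feature fluctuates independently per frame."""
    return [
        [rng.gauss(mean_strength, noise_sd) for _ in range(n_features)]
        for _ in range(n_frames)
    ]


# Hypothetical weights for three facial features under two category boundaries.
identity_weights = [0.5, 0.3, 0.2]
expression_weights = [0.2, 0.1, 0.7]

rng = random.Random(0)  # seeded for reproducibility
frames = simulate_trial(mean_strength=0.2, n_frames=10, n_features=3,
                        noise_sd=1.0, rng=rng)

# Same stimulus, different task: only the spatial weights change.
identity_choice, identity_dv, _ = categorize(frames, identity_weights)
expression_choice, expression_dv, _ = categorize(frames, expression_weights)
```

Because both stages are linear, the final decision variable equals the weighted sum of each feature's evidence totaled over time, so the order of spatial and temporal summation does not change the result in this sketch.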