Smart Eyes: Attending and Recognizing Instances of Salient Events

© Photo Fraunhofer FIT

The SEARISE project is developing a trinocular active cognitive visual system, the Smart Eyes, for the detection, tracking and categorization of salient events and behaviors. The system will have human-like capabilities: it will learn from and adjust itself to ever-changing visual input, fixate on salient events and follow their motion, and categorize salient events visually on the basis of environmental context and a set of policy rules. It is being prototyped in a public location to monitor events attended by large numbers of people.

The Smart Eyes hardware consists of a fixed camera for global monitoring of the scene, complemented by two active stereo cameras that automatically fixate on salient events, much like the saccadic (searching) movements of the human eye. The system performs multi-scale analysis by zooming in on individual parts of attended events, which may either reveal an object's identity or show its salient actions in detail.
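How the active cameras choose where to look next can be illustrated with a classic saliency-driven scheme: winner-take-all selection of the most salient location, followed by inhibition of return so the next "saccade" moves elsewhere. The following is a minimal sketch under that assumption, not the project's own control law; the function `select_fixations` and its `ior_radius` parameter are hypothetical, and the saliency map is assumed to come from the fixed overview camera.

```python
import numpy as np


def select_fixations(saliency, n_fixations, ior_radius):
    """Return up to n_fixations (row, col) points from a 2-D saliency map.

    Hypothetical sketch: winner-take-all plus inhibition of return.
    """
    s = np.asarray(saliency, dtype=float).copy()
    rows, cols = np.indices(s.shape)
    fixations = []
    for _ in range(n_fixations):
        if np.all(np.isneginf(s)):
            break  # every location has already been inhibited
        # Winner-take-all: fixate the currently most salient location.
        r, c = np.unravel_index(np.argmax(s), s.shape)
        fixations.append((int(r), int(c)))
        # Inhibition of return: suppress a disc around the fixated point
        # so that the next "saccade" moves to a different event.
        inhibited = (rows - r) ** 2 + (cols - c) ** 2 <= ior_radius ** 2
        s[inhibited] = -np.inf
    return fixations


saliency = np.zeros((10, 10))
saliency[2, 3] = 5.0  # strongest event
saliency[2, 4] = 4.0  # adjacent to it, so inhibited after the first fixation
saliency[7, 8] = 3.0  # a separate, weaker event
print(select_fixations(saliency, 2, ior_radius=2))  # [(2, 3), (7, 8)]
```

Without inhibition of return the cameras would keep returning to the single strongest event; suppressing its neighbourhood lets attention shift to the second, separate event instead of the adjacent pixel.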

The Smart Eyes software implements a computational cognitive model of visual processing that replicates major principles and computational strategies of the mammalian visual cortex.

Processing starts by extracting local motion and form features from the visual input. These operations generate tremendous amounts of data in real time, which must be processed on specialized high-performance graphics hardware.
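To give a concrete sense of what "local form and motion features" means, the sketch below uses two deliberately simple stand-ins: gradient magnitude as form (edge) energy and absolute frame difference as motion energy. These are assumptions for illustration only; the project's cortex-inspired filters are not specified here, and the function names are hypothetical.

```python
import numpy as np


def form_features(frame):
    """Local form (edge) energy: gradient magnitude of one grey-level frame."""
    f = np.asarray(frame, dtype=float)
    gy, gx = np.gradient(f)  # derivatives along rows (y) and columns (x)
    return np.hypot(gx, gy)


def motion_features(prev_frame, frame):
    """Local motion energy: absolute change between consecutive frames."""
    return np.abs(np.asarray(frame, dtype=float) - np.asarray(prev_frame, dtype=float))


# A vertical edge that moves one pixel to the right between two frames.
prev = np.zeros((5, 5))
prev[:, 2:] = 1.0
curr = np.zeros((5, 5))
curr[:, 3:] = 1.0
form = form_features(curr)            # non-zero only around the edge (columns 2 and 3)
motion = motion_features(prev, curr)  # non-zero only where the edge moved (column 2)
```

Both operations compute each output pixel from a small neighbourhood of input pixels, independently of all other outputs. That kind of data-parallel work is what graphics hardware handles well when it has to be done for every pixel of every frame.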

At the next layer of the hierarchy, the learning module takes the modulated form and motion responses and learns typical representations of object shapes and of temporal dependencies in motion patterns. These learning mechanisms need to be integrated with the control of attention to salient events.
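One simple way to picture "learning typical representations" while still adapting to ever-changing input is an online, per-location estimate of the mean and variance of the feature responses, updated with exponential forgetting. This is an assumed stand-in rather than the project's learning module; the class `TypicalActivityModel` and its `rate` parameter are hypothetical. It captures only per-location statistics and does not model the temporal dependencies in motion patterns mentioned above.

```python
import numpy as np


class TypicalActivityModel:
    """Per-location running estimate of typical feature responses (hypothetical)."""

    def __init__(self, shape, rate=0.05, init_var=1.0):
        self.rate = rate                     # learning rate: higher adapts faster
        self.mean = np.zeros(shape)          # typical response at each location
        self.var = np.full(shape, init_var)  # spread of the typical response

    def update(self, response):
        x = np.asarray(response, dtype=float)
        delta = x - self.mean
        self.mean = self.mean + self.rate * delta
        # Exponentially weighted variance, incremental form.
        self.var = (1.0 - self.rate) * (self.var + self.rate * delta ** 2)


model = TypicalActivityModel((2, 2), rate=0.5)
for _ in range(50):
    model.update(np.full((2, 2), 2.0))  # a constant response becomes "typical"
```

After repeated exposure the mean converges to the constant response and the variance shrinks towards zero, so any later deviation at that location stands out. The learning rate trades stability against how quickly the model follows slow changes in the scene.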

In parallel with learning, a surprise map is computed as a measure of the inconsistency between the typical events learnt at each layer of the hierarchy and the activity currently observed in the scene. Ideally, the Smart Eyes system will react to events that are likely to attract the attention of a trained human observer.
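A minimal way to turn "inconsistency between typical and observed activity" into a map is to measure, at every location and layer, the squared deviation of the observed response from the learnt mean in units of the learnt variance, and then sum over layers. This sketch assumes that measure (a normalized, Mahalanobis-style distance) and assumes all layers have been resampled to a common grid. It is not necessarily the project's surprise measure; Bayesian formulations based on the KL divergence between prior and posterior beliefs are a common alternative. The name `surprise_map` is hypothetical.

```python
import numpy as np


def surprise_map(layers, eps=1e-6):
    """Sum normalized squared deviations from the typical activity over all layers.

    layers: iterable of (observed, typical_mean, typical_var) triples whose
    arrays share one shape (assumes layers were resampled to a common grid).
    """
    total = None
    for observed, mean, var in layers:
        obs = np.asarray(observed, dtype=float)
        dev = (obs - np.asarray(mean, dtype=float)) ** 2 / (np.asarray(var, dtype=float) + eps)
        total = dev if total is None else total + dev
    if total is None:
        raise ValueError("at least one layer is required")
    return total


observed = np.array([[0.0, 3.0], [0.0, 0.0]])  # one unexpected response
typical_mean = np.zeros((2, 2))
typical_var = np.ones((2, 2))
smap = surprise_map([(observed, typical_mean, typical_var)] * 2)
```

Locations whose activity matches what was learnt contribute nothing, while the single unexpected response dominates the map. Its peak is therefore a natural candidate for the active cameras to fixate next.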