Perception of differences in naturalistic dynamic scenes, and a V1-based model

Michelle P S To*, Iain D. Gilchrist, David J. Tolhurst

*Corresponding author for this work

Research output: Contribution to journal, Article (Academic Journal), peer-review

4 Citations (Scopus)


We investigate whether a computational model of V1 can predict how observers rate perceptual differences between paired movie clips of natural scenes. Observers viewed 198 pairs of movie clips, rating how different the two clips appeared to them on a magnitude scale. Sixty-six of the movie pairs were naturalistic, and the remaining pairs were low-pass or high-pass spatially filtered versions of those originals. We examined three ways of comparing a movie pair. The Spatial Model compared corresponding frames between the two movies pairwise, combining those differences using Minkowski summation. The Temporal Model compared successive frames within each movie, summed those differences for each movie, and then compared the overall differences between the paired movies. The Ordered-Temporal Model combined elements from both models, and yielded the single strongest predictions of observers' ratings. We modeled naturalistic sustained and transient impulse functions, and also compared frames directly with no temporal filtering. Overall, modeling naturalistic temporal filtering improved the models' performance; in particular, the predictions of the ratings for low-pass spatially filtered movies were much improved by employing a transient impulse function. The correlations between model predictions and observers' ratings rose from 0.507 without temporal filtering to 0.759 (p = 0.01%) when realistic impulses were included. The sustained impulse function and the Spatial Model carried more weight in ratings for normal and high-pass movies, whereas the transient impulse function with the Ordered-Temporal Model was most important for spatially low-pass movies. This is consistent with models in which high spatial frequency channels with sustained responses primarily code for spatial details in movies, while low spatial frequency channels with transient responses code for dynamic events.
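The three comparison schemes described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: a frame is a flat list of pixel values, the V1 model is replaced by a plain pixel-wise difference, the Minkowski exponent `m = 3.0` is an arbitrary placeholder, Minkowski pooling stands in for the summation in the Temporal Model, and the Ordered-Temporal Model is given one plausible reading (compare the frame-to-frame change in one movie with the change at the same position in the other). All function names are hypothetical.

```python
from typing import Sequence

Frame = Sequence[float]   # one frame, flattened to pixel values (assumption)
Movie = Sequence[Frame]   # a movie is an ordered sequence of frames


def minkowski_sum(values: Sequence[float], m: float = 3.0) -> float:
    """Pool magnitudes with Minkowski exponent m (m = 3.0 is a placeholder)."""
    return sum(abs(v) ** m for v in values) ** (1.0 / m)


def frame_difference(a: Frame, b: Frame, m: float = 3.0) -> float:
    """Stand-in for the V1 model: Minkowski-pooled pixel-wise differences."""
    return minkowski_sum([x - y for x, y in zip(a, b)], m)


def spatial_model(movie_a: Movie, movie_b: Movie, m: float = 3.0) -> float:
    """Compare corresponding frames of the two movies, then pool over time."""
    diffs = [frame_difference(fa, fb, m) for fa, fb in zip(movie_a, movie_b)]
    return minkowski_sum(diffs, m)


def temporal_energy(movie: Movie, m: float = 3.0) -> float:
    """Pool the differences between successive frames within one movie."""
    diffs = [frame_difference(movie[i], movie[i + 1], m)
             for i in range(len(movie) - 1)]
    return minkowski_sum(diffs, m)


def temporal_model(movie_a: Movie, movie_b: Movie, m: float = 3.0) -> float:
    """Compare the overall within-movie change of the two movies."""
    return abs(temporal_energy(movie_a, m) - temporal_energy(movie_b, m))


def ordered_temporal_model(movie_a: Movie, movie_b: Movie, m: float = 3.0) -> float:
    """Assumed reading: compare frame-to-frame changes position by position."""
    def changes(movie: Movie) -> list:
        return [[y - x for x, y in zip(movie[i], movie[i + 1])]
                for i in range(len(movie) - 1)]

    diffs = [frame_difference(ca, cb, m)
             for ca, cb in zip(changes(movie_a), changes(movie_b))]
    return minkowski_sum(diffs, m)


# Two toy 2-pixel movies that contain the same changes in a different order.
movie_a = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
movie_b = [[0.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

print("spatial:", spatial_model(movie_a, movie_b))
print("temporal:", temporal_model(movie_a, movie_b))
print("ordered-temporal:", ordered_temporal_model(movie_a, movie_b))
```

In this toy pair, the within-movie change is the same size in both movies, so the sketched Temporal Model returns 0, while the Spatial and Ordered-Temporal sketches both register a difference because they keep track of which frames are being compared.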

Original language: English
Pages (from-to): 15.1.19
Number of pages: 13
Journal: Journal of Vision
Issue number: 1
Publication status: Published - 1 Jan 2015

Bibliographical note

© 2015 ARVO.

Structured keywords

  • Cognitive Science
  • Visual Perception


Keywords

  • Computational modeling
  • Natural scenes
  • Spatial processing
  • Temporal processing
  • Visual discrimination
