About the selection process in dynamic scene observation

Dear authors,

I have read your paper with great interest, paying particular attention to the selection process in the observation of dynamic scenes, about which you wrote: “To effectively interact with our natural and social world, we selectively gaze at and process a limited number of local scene regions or visual items that are informative or interesting to us.”

My group and I observed what seems to be a very similar phenomenon while studying the interpretation of written natural-language messages (work published in PeerJ in 2015): a subjective and unpredictable selection of words and phrases while reading an unabridged message drawn from a real-life situation. Have you investigated the source of the selection you detected while studying the observation of dynamic scenes?

Have you observed, perhaps, whether different people gazing at the same scene are attracted by different scene regions or visual items?

Thank you very much and best regards

Roberto


Thanks for your interest, Roberto.

In this study, probably due to the nature of the driving video clips, we did not observe large individual differences in the fixated regions (as indicated by the high human inter-observer similarity scores in Figure 2). However, as human gaze allocation is partly determined by top-down influences such as past experience, memory and expectation, significant individual differences would be expected with different stimuli (e.g. complex urban scenes) and task demands (e.g. free viewing).


Thank you very much.

The methodological question of differences between laboratory conditions and real-life observation is highly debated, and we also discussed it briefly in our work. In our case, we intentionally chose a naturalistic approach (real-life materials).

My compliments to all the authors and best regards

Roberto


Dear Dr. Roberto,

This is Jiawei, who was in charge of the numerical analysis of the human eye-gaze data. Good question, and I do agree with your viewpoint!

I think my supervisor Kun Guo could give you a more explicit answer regarding individual differences when viewing the driving clips. Here is my understanding of the computation.

Regarding personal preferences in scene selection, we classified the driving videos into 10 conditions based on their motion cues, collected data from 35 participants, and averaged the data to obtain the "ground truth". I should admit that there are unavoidable "outliers" caused by participants getting distracted.

Admittedly, averaging is not always precise. To address this, I used an iterative computation: one participant at a time was taken out of the 35 and the computation was repeated. In the end, two distinct differences between human observers and visual attention models emerged:

1. Humans gaze at different locations under different stimulus presentations (normal or reversed), whereas visual attention models, even so-called "spatiotemporal visual attention models", predicted gaze points based on low-level saliency; i.e., their predictions differed little between normal and reversed videos.

2. Central bias is less pronounced in visual attention models than in humans: the gaze predictions of computational vision models (even those using deep learning) are simply combinations of low-level or mid-level features. There are still several gaps between current visual attention models and human observers.
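The leave-one-out computation described above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the code used in the study: each participant's fixations are assumed to be already converted into a flattened fixation map of equal length, and agreement is scored with the Pearson correlation coefficient (CC), a common saliency metric chosen here purely for illustration (the paper's actual metric may differ). The function names pearson_cc, average_maps and leave_one_out_similarity are hypothetical.

```python
from statistics import mean, pstdev


def pearson_cc(a, b):
    """Pearson correlation between two flattened maps (assumed metric)."""
    ma, mb = mean(a), mean(b)
    sa, sb = pstdev(a), pstdev(b)
    if sa == 0 or sb == 0:
        # A constant map carries no spatial information; score it as 0.
        return 0.0
    cov = mean((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / (sa * sb)


def average_maps(maps):
    """Pixel-wise mean of several flattened fixation maps."""
    return [mean(values) for values in zip(*maps)]


def leave_one_out_similarity(maps):
    """Hold out each observer in turn and compare them with the mean of the rest.

    Returns the mean score and the per-observer scores (hypothetical helper).
    """
    scores = []
    for i, held_out in enumerate(maps):
        others = maps[:i] + maps[i + 1:]
        ground_truth = average_maps(others)  # "ground truth" without observer i
        scores.append(pearson_cc(held_out, ground_truth))
    return mean(scores), scores


if __name__ == "__main__":
    # Toy example: three observers, four-pixel maps.
    toy_maps = [[1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1]]
    avg, per_observer = leave_one_out_similarity(toy_maps)
    print(avg, per_observer)
```

With real data, the per-observer scores would show how well each participant agrees with everyone else; a participant with an unusually low score could be a candidate for the distracted "outliers" mentioned above, while the mean score acts as an inter-observer baseline against which model predictions can be compared.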

I did not state this in the article, but during the data analysis I found that participants with driving experience and those with little or no driving experience had different eye-gaze distributions over the temporal sequences. However, I did not have time to investigate this further, as this research was part of a PhD at Lincoln, which had to be completed within 3.5 years.

For further details, please do not hesitate to contact Kun or Federica. Please cite our paper, because it is really very interesting work! Thank you!

Best wishes,

Jiawei


Dear Jiawei,

I thank you very much for your detailed answer. Thanks also to Federica Menchinelli.

My compliments on your analysis. I also found interesting the observation you did not report in your article: the driving experience of the experimental subjects influences how they gaze at the scenes. This seems consistent, for example, with observations on mirror neurons: the experience of observers (say, martial arts athletes or dancers) influences the intensity of mirror-neuron discharge. In my opinion, this is also consistent with our observations on real-life text reading: the selection of words/phrases to focus on is certainly influenced by the readers' backgrounds.

With my best regards

Roberto
