Image evaluation tasks are often conducted using paired comparisons or ranking. To elicit interval scales, both methods rely on Thurstone's Law of Comparative Judgment, in which objects closer in psychological space are more often confused in preference comparisons by a putative discriminal random process. It is often debated whether paired comparisons and ranking yield the same interval scales. An experiment was conducted to assess scale production using paired comparisons and ranking. For this experiment a Pioneer Plasma Display and an Apple Cinema Display were used for stimulus presentation. Observers performed rank-order and paired-comparison tasks on both displays. For each of five scenes, six images were created by manipulating attributes such as lightness, chroma, and hue using six different settings. The intention was to simulate the variability from a set of digital cameras or scanners. Nineteen subjects (5 females, 14 males), ranging from 19 to 51 years of age, participated in this experiment. Using a paired comparison model and a ranking model, scales were estimated for each display and image combination, yielding ten scale pairs, ostensibly measuring the same psychological scale. The Bradley-Terry model was used for the paired comparisons data and the Bradley-Terry-Mallows model was used for the ranking data. Each model was fit using maximum likelihood estimation and assessed using likelihood ratio tests. Approximate 95% confidence intervals were also constructed using likelihood ratios. Model fits for paired comparisons were satisfactory for all scales except those from two image/display pairs; the ranking model fit uniformly well on all data sets. Arguing from overlapping confidence intervals, we conclude that paired comparisons and ranking produce no conflicting decisions regarding the ultimate ordering of treatment preferences, but paired comparisons yield greater precision at the expense of some lack of fit.
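For readers unfamiliar with the Bradley-Terry formulation, the sketch below shows one conventional way to obtain interval-scale values from a paired-comparison win matrix by maximum likelihood. The win counts, the six-image setup, and the use of SciPy's BFGS optimizer are illustrative assumptions, not the study's data or code.

```python
# Minimal sketch of Bradley-Terry maximum-likelihood scaling for paired
# comparisons; the win-count matrix below is hypothetical.
import numpy as np
from scipy.optimize import minimize

# wins[i, j] = number of times image i was preferred over image j
wins = np.array([
    [0, 8, 12, 15, 17, 18],
    [11, 0, 9, 13, 15, 16],
    [7, 10, 0, 11, 14, 15],
    [4, 6, 8, 0, 10, 13],
    [2, 4, 5, 9, 0, 11],
    [1, 3, 4, 6, 8, 0],
])
n = wins.shape[0]

def neg_log_likelihood(s):
    # P(i beats j) = exp(s_i) / (exp(s_i) + exp(s_j)); s_0 is fixed at 0
    # for identifiability, so only n-1 parameters are free.
    scale = np.concatenate(([0.0], s))
    diff = scale[:, None] - scale[None, :]
    log_p = -np.log1p(np.exp(-diff))          # log P(i beats j)
    return -np.sum(wins * log_p)

result = minimize(neg_log_likelihood, x0=np.zeros(n - 1), method="BFGS")
scale_values = np.concatenate(([0.0], result.x))
print("Estimated Bradley-Terry scale values:", np.round(scale_values, 3))
```

In the same spirit, the fitted deviance can be compared against a saturated model via a likelihood ratio test, and profile-likelihood intervals give the approximate 95% confidence bounds mentioned above.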
Eye movement behavior was investigated for image-quality and chromatic adaptation tasks. The first experiment examined the differences between paired comparison, rank order, and graphical rating tasks, and the second examined the strategies adopted when subjects were asked to select or adjust achromatic regions in images. Results indicate that subjects spent about 4 seconds per image in the rank order task, 1.8 seconds per image in the paired comparison task, and 3.5 seconds per image in the graphical rating task. Fixation density maps from the three tasks correlated highly in four of the five images. Eye movements gravitated toward faces and semantic features, and introspective reports were not always consistent with fixation density peaks. In adjusting a gray square in an image to appear achromatic, observers spent 95% of their time looking only at the patch. When subjects did look around (less than 5% of the time), they did so early. Foveations were directed to semantic features, not achromatic regions, indicating that people do not seek out near-neutral regions to verify that their patch appears achromatic relative to the scene. Nor do observers scan the image in order to adapt to its average chromaticity. In selecting the most achromatic region in an image, viewers spent 60% of the time scanning the scene. Unlike the achromatic adjustment task, foveations were directed to near-neutral regions, showing behavior similar to a visual search task.
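As a rough illustration of how fixation density maps might be computed and compared across tasks, the sketch below accumulates fixation points on a pixel grid, smooths them with a Gaussian, and correlates the resulting maps. The coordinates, image size, and smoothing width are hypothetical and not taken from the experiment.

```python
# Hedged sketch: build fixation density maps from (x, y) fixation points and
# correlate them across tasks; all inputs here are made-up illustration data.
import numpy as np
from scipy.ndimage import gaussian_filter

def fixation_density(fixations, shape=(480, 640), sigma=25):
    """Accumulate fixation points on a grid and blur with a Gaussian
    whose width (sigma, in pixels) roughly stands in for foveal extent."""
    density = np.zeros(shape)
    for x, y in fixations:
        density[int(y), int(x)] += 1
    return gaussian_filter(density, sigma=sigma)

rng = np.random.default_rng(0)
fix_rank = rng.uniform([0, 0], [640, 480], size=(40, 2))   # hypothetical fixations
fix_pair = rng.uniform([0, 0], [640, 480], size=(20, 2))

map_rank = fixation_density(fix_rank)
map_pair = fixation_density(fix_pair)

# Pearson correlation between flattened density maps as a similarity measure.
r = np.corrcoef(map_rank.ravel(), map_pair.ravel())[0, 1]
print(f"Density-map correlation: {r:.2f}")
```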
A wearable eye tracker was used to record photographers' eye movements while they took digital photographs of person, sculpture, and interior scenes. Eye movement sequences were also recorded as the participants selected and cropped their images on a computer. Preliminary analysis revealed that during image capture people spent approximately the same amount of time looking at the camera regardless of the scene being photographed. The time spent looking at either the primary object or the surround, however, differed significantly across the three scenes. Results from the editing phase support previous reports that observers fixate on semantically rich regions in the image, which, in this task, were important in the final cropping decision. However, the spread of fixations, edit time, and number of crop windows did not differ significantly across the three image classes. This suggests that, unlike image capture, the cropping task was highly regular and less influenced by image content.
We explore the way in which people look at images of different semantic categories and directly relate those results to computational approaches for automatic image classification. Our hypothesis is that the eye movements of human observers differ for images of different semantic categories, and that this information can be effectively used in automatic content-based classifiers. First, we present eye tracking experiments that show the variation in eye movements across different individuals for images of five different categories: handshakes, crowd, landscapes, main object in uncluttered background, and miscellaneous. The eye tracking results suggest that similar viewing patterns occur when different subjects view different images in the same semantic category. Using these results, we examine how empirical data obtained from eye tracking experiments across different semantic categories can be integrated with existing computational frameworks, or used to construct new ones. In particular, we examine the Visual Apprentice, a system in which image classifiers are learned from user input as the user defines a multiple-level object-definition hierarchy based on an object and its parts and labels examples for specific classes. The resulting classifiers are applied to automatically classify new images. Although many eye tracking experiments have been performed, to our knowledge this is the first study that specifically compares eye movements across categories and links category-specific eye tracking results to automatic image classification techniques.
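The sketch below is not the Visual Apprentice; it only illustrates, under assumed data, how summary statistics of scanpaths (fixation count, durations, spatial spread) could serve as features for a supervised category classifier. All fixation data, the feature choices, and the random-forest classifier are assumptions made for illustration.

```python
# Illustrative sketch: eye-movement statistics as features for a hypothetical
# image-category classifier; the scanpath data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def scanpath_features(fixations, durations):
    """Summarize a scanpath: count, mean/std duration, and spatial spread."""
    fixations = np.asarray(fixations, dtype=float)
    return np.array([
        len(fixations),
        np.mean(durations),
        np.std(durations),
        np.std(fixations[:, 0]),   # horizontal spread
        np.std(fixations[:, 1]),   # vertical spread
    ])

rng = np.random.default_rng(1)
X, y = [], []
for label in range(5):                       # 5 semantic categories
    for _ in range(30):                      # 30 synthetic scanpaths per category
        n_fix = rng.integers(5, 25)
        fixations = rng.uniform(0, 640, size=(n_fix, 2))
        durations = rng.gamma(2.0, 100 + 40 * label, size=n_fix)  # ms
        X.append(scanpath_features(fixations, durations))
        y.append(label)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, np.array(X), np.array(y), cv=5)
print(f"Cross-validated accuracy: {scores.mean():.2f}")
```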
Visual perception, operating below conscious awareness, effortlessly provides the experience of a rich representation of the environment, continuous in space and time. Conscious visual perception is made possible by the 'foveal compromise,' the combination of the high-acuity fovea and a sophisticated suite of eye movements. Our illusory visual experience cannot be understood by introspection, but monitoring eye movements lets us probe the processes of visual perception. Four tasks representing a wide range of complexity were used to explore visual perception: image-quality judgments, map reading, model building, and hand washing. Very short fixation durations were observed in all tasks, some as short as 33 msec. While some tasks showed little variation in eye movement metrics, differences in eye movement patterns and high-level strategies were observed in the model building and hand washing tasks. Performance in the hand washing task revealed a new type of eye movement: 'planful' eye movements, made to objects well in advance of a subject's interaction with the object. Often occurring in the middle of another task, they provide 'overlapping' temporal information about the environment, offering a mechanism for producing our conscious visual experience.