We have collected a large dataset of subjective image quality “*nesses,” such as sharpness or colorfulness. The dataset comes from seven studies and contains 39,415 quotations from 146 observers who evaluated 62 scenes either in printed images or on a display. We analyzed the subjective evaluations and formed a hierarchical image quality attribute lexicon for *nesses, which is visualized as an image quality wheel (IQ-Wheel). Similar wheel diagrams for attributes have become industry standards in other sensory experience fields, such as flavor and fragrance sciences. The IQ-Wheel contains frequency information for 68 attributes relating to image quality. Only 20% of the attributes were positive, which agrees with previous findings showing a preference for negative attributes in image quality evaluation. Our results also show that, apart from the physical paper attribute of gloss, observers use similar terminology when evaluating printed images and images viewed on a display. The IQ-Wheel can be used to guide the selection of scenes and distortions when designing subjective experimental setups and creating image databases.
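As a rough illustration of how such an attribute lexicon can be tallied, the following Python sketch counts attribute frequencies from quotation records and computes the share of positive attributes; the record format and the example attributes and valences are hypothetical, not taken from the dataset.

```python
# Illustrative sketch (not the study's code): tally attribute frequencies from
# quotation records and compute the share of positively valenced attributes.
# The record format and the example attributes/valences are hypothetical.
from collections import Counter

quotations = [
    {"attribute": "sharpness",  "valence": "positive"},
    {"attribute": "graininess", "valence": "negative"},
    {"attribute": "darkness",   "valence": "negative"},
    {"attribute": "sharpness",  "valence": "positive"},
]

freq = Counter(q["attribute"] for q in quotations)   # wheel sector sizes
positive = {q["attribute"] for q in quotations if q["valence"] == "positive"}

print(freq.most_common())
print(f"share of positive attributes: {len(positive) / len(freq):.0%}")
```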
Evaluating algorithms used to assess image and video quality requires performance measures. Traditional performance measures (e.g., Pearson’s linear correlation coefficient, Spearman’s rank-order correlation coefficient, and root mean square error) compare the quality predictions of algorithms to subjective mean opinion scores (MOS) or differential mean opinion scores (DMOS). We propose a subjective root-mean-square error (SRMSE) performance measure for evaluating the accuracy of algorithms used to assess image and video quality. The SRMSE performance measure takes into account the dispersion between observers. Another important property of the SRMSE performance measure is its measurement scale, which is calibrated in units of the number of average observers. The results of the SRMSE performance measure indicate the extent to which the algorithm can replace the subjective experiment, expressed as a number of observers. Furthermore, we present the concept of target values, which define the performance level of an ideal algorithm. We have calculated the target values for all sample sets of the CID2013, CVD2014, and LIVE multiply distorted image quality databases. The target values and a MATLAB implementation of the SRMSE performance measure are available on the project page of this study.
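The following Python sketch illustrates the underlying idea of expressing an algorithm's error in units of average observers; it is not the published SRMSE formula, and the synthetic ratings, panel size, and bootstrap procedure are assumptions made purely for the illustration.

```python
# Hedged sketch of an "observer-calibrated" error: compare an algorithm's RMSE
# against the RMSE of random observer subsets relative to the full-panel MOS,
# and report roughly how many observers the algorithm could replace.
import numpy as np

rng = np.random.default_rng(0)
ratings = rng.normal(3.0, 1.0, size=(30, 100))    # hypothetical: 30 observers x 100 images
mos = ratings.mean(axis=0)                        # full-panel mean opinion scores
pred = mos + rng.normal(0.0, 0.4, size=mos.size)  # hypothetical algorithm predictions

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

alg_rmse = rmse(pred, mos)

def subset_rmse(k, n_boot=200):
    """Average error of the mean of k randomly drawn observers vs. the full panel."""
    errs = []
    for _ in range(n_boot):
        idx = rng.choice(ratings.shape[0], size=k, replace=False)
        errs.append(rmse(ratings[idx].mean(axis=0), mos))
    return float(np.mean(errs))

# Largest panel size whose error the algorithm still matches or beats.
replaceable = max((k for k in range(1, ratings.shape[0])
                   if alg_rmse <= subset_rmse(k)), default=0)
print(f"algorithm RMSE {alg_rmse:.2f} is comparable to a panel of ~{replaceable} observers")
```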
An established way of validating and testing new image quality assessment (IQA) algorithms has been to compare how well they correlate with subjective data on various image databases. One of the most common measures is to calculate the linear correlation coefficient (LCC) and Spearman’s rank-order correlation coefficient (SROCC) against the subjective mean opinion score (MOS). Recently, databases with multiply distorted images have emerged [1,2]. However, with multidimensional stimuli there is more disagreement between observers, as the task is more preferential than one of distortion detection. This reduces the statistical differences between image pairs. If the subjects cannot distinguish a difference between some of the image pairs, should we demand any better performance from IQA algorithms? This paper proposes alternative performance measures for the evaluation of IQA algorithms on the CID2013 database. One proposed performance measure is the root-mean-square error (RMSE) of the subjective data as a function of the number of observers. The other is the number of statistically significant differences between image pairs. This study shows that after 12 subjects the RMSE value saturates around a level of three, meaning that a target RMSE value for an IQA algorithm on the CID2013 database should be three. In addition, this study shows that state-of-the-art IQA algorithms identified the better image of a pair with a probability of 0.85 when only the image pairs with statistically significant differences were taken into account.
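The pairwise measure can be illustrated with the Python sketch below, which counts image pairs whose subjective ratings differ significantly and checks how often a prediction orders those pairs the same way as the observers. The paired t-test, the significance level, and the synthetic data are assumptions for the illustration, not the exact procedure of the study.

```python
# Hedged sketch: proportion of statistically distinguishable image pairs that a
# (here synthetic) IQA prediction orders the same way as the observers' MOS.
import itertools
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(1)
true_q = rng.uniform(1, 5, size=20)                   # hypothetical latent quality of 20 images
ratings = true_q + rng.normal(0, 0.7, size=(15, 20))  # 15 observers x 20 images
pred = true_q + rng.normal(0, 0.5, size=20)           # hypothetical IQA predictions

mos = ratings.mean(axis=0)
significant, correct = 0, 0
for i, j in itertools.combinations(range(ratings.shape[1]), 2):
    _, p = ttest_rel(ratings[:, i], ratings[:, j])
    if p < 0.05:                                      # observers can tell the pair apart
        significant += 1
        if np.sign(mos[i] - mos[j]) == np.sign(pred[i] - pred[j]):
            correct += 1

if significant:
    print(f"{correct}/{significant} significant pairs ordered correctly "
          f"({correct / significant:.2f})")
else:
    print("no statistically significant pairs found")
```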
Image-quality assessment measures are largely based on the assumption that an image is distorted by only one type of distortion at a time. These conventional measures perform poorly if an image contains more than one distortion. In consumer photography, captured images are subject to many sources of distortion and modification. We searched for feature subsets that predict the quality of photographs captured by different consumer cameras. For this, we used the new CID2013 image database, which includes photographs captured by a large number of consumer cameras. Principal component analysis showed that the features classified consumer camera images in terms of sharpness and noise energy. The sharpness dimension included lightness, detail reproduction, and contrast. A support vector regression model using the selected feature subset predicted human observations well compared with state-of-the-art measures.
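A minimal sketch of this kind of analysis pipeline, using synthetic data rather than CID2013, might look as follows; the feature set, the chosen subset, and the SVR hyperparameters are illustrative assumptions.

```python
# Hedged sketch: inspect the principal components of a set of image features and
# fit a support vector regression model that maps a feature subset to subjective
# quality. Features, data, and hyperparameters are illustrative only.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(2)
X = rng.normal(size=(480, 12))                     # e.g. sharpness, noise, contrast ... per image
mos = X[:, 0] * 0.8 - X[:, 1] * 0.5 + rng.normal(0, 0.3, size=480)

# How do the features organize the images? (cf. the sharpness/noise dimensions)
pca = PCA(n_components=2).fit(StandardScaler().fit_transform(X))
print("explained variance ratio:", pca.explained_variance_ratio_)

# Regression from a candidate feature subset to MOS.
subset = [0, 1, 4]                                 # hypothetical selected features
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.1))
scores = cross_val_score(model, X[:, subset], mos, cv=5, scoring="r2")
print("cross-validated R^2:", scores.mean())
```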
To understand the viewing strategies employed in a quality estimation task, we compared two visual tasks: quality estimation and difference estimation. The estimation was done for pairs of natural images with small global changes in quality. Two groups of observers estimated the same set of images, but with different instructions: one group estimated the difference in quality between the images in a pair, the other the overall difference between them. The results demonstrated the use of different visual strategies in the two tasks. Quality estimation was found to include more visual planning during the first fixation than difference estimation, but afterward needed only a few long fixations on the semantically important areas of the image. Difference estimation used many short fixations. Salient image areas were mainly attended to when these areas were also semantically important. The results support the hypothesis that the general characteristics of these tasks (evaluation time, number of fixations, area fixated on) reveal differences in processing, but also suggest that examining only single fixations when comparing tasks is too narrow a view. When planning a subjective experiment, one must remember that a small change in the instructions can lead to a noticeable change in viewing strategy.
The most common tasks in subjective image estimation are change detection (a detection task) and image quality estimation (a preference task). We examined how the task influences gaze behavior by comparing the detection and preference tasks. The eye movements of 16 naïve observers were recorded, with 8 observers in each task. The setting was a flicker paradigm, in which the observers see a non-manipulated image, a manipulated version of the image, and again the non-manipulated image, and then estimate the difference they perceived between them. The material was photographic, with different image distortions and contents. To examine the spatial distribution of fixations, we defined the regions of interest using a memory task and calculated information entropy to estimate how concentrated the fixations were on the image plane. The quality estimation task was faster and needed fewer fixations, and the first eight fixations were more concentrated on certain image areas than in the change detection task. The bottom-up influences of the image also caused more variation in gaze behavior in the quality estimation task than in the change detection task. The results show that quality estimation is faster and that regions of interest are emphasized more on certain images, whereas change detection is a scan task in which the whole image is always examined thoroughly. In conclusion, in subjective image estimation studies it is important to consider the task carefully.
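A minimal sketch of one way to compute such a fixation-concentration entropy is given below; the grid resolution, image size, and fixation data are illustrative assumptions, not the study's parameters.

```python
# Hedged sketch: bin fixation positions into a grid over the image plane and
# compute the Shannon entropy of the resulting distribution. Low entropy means
# fixations are concentrated on a few regions; high entropy means they spread out.
import numpy as np

def fixation_entropy(x, y, width, height, grid=(8, 8)):
    counts, _, _ = np.histogram2d(x, y, bins=grid, range=[[0, width], [0, height]])
    p = counts.ravel() / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(3)
# Fixations clustered around the image center vs. spread over the whole image.
concentrated = fixation_entropy(rng.normal(640, 60, 50), rng.normal(360, 60, 50), 1280, 720)
spread = fixation_entropy(rng.uniform(0, 1280, 50), rng.uniform(0, 720, 50), 1280, 720)
print(f"concentrated: {concentrated:.2f} bits, spread: {spread:.2f} bits")
```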
Subjective image quality data for 9 image processing pipes and 8 image contents (captured with a mobile phone camera, 72 natural-scene test images altogether) were collected from 14 test subjects. A triplet comparison setup and a hybrid qualitative/quantitative methodology were applied. MOS data and spontaneous, subjective image quality attributes for each test image were recorded. The experimental subjects' use of positive and negative image quality attributes suggested a significant difference between the subjective spaces of low and high image quality. The robustness of the attribute data was shown by correlating DMOS data of the test images against the corresponding average subjective attribute vector length data. The findings demonstrate the information value of spontaneous, subjective image quality attributes in evaluating image quality at variable quality levels. We discuss the implications of these findings for the development of sensitive performance measures and methods for profiling image processing systems and their components, especially at high image quality levels.
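One plausible reading of the attribute-vector-length correlation is sketched below in Python with synthetic data; treating the vector length as the Euclidean norm of per-image attribute counts is an assumption made for the illustration.

```python
# Hedged sketch: correlate DMOS with a per-image "attribute vector length",
# here taken as the Euclidean norm of attribute mention counts. Data are synthetic.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(4)
n_images, n_attributes = 72, 30
counts = rng.poisson(2.0, size=(n_images, n_attributes))   # attribute mentions per image

attr_length = np.linalg.norm(counts, axis=1)               # attribute vector length per image
dmos = attr_length * 2 + rng.normal(0, 3, n_images)        # synthetic link to DMOS for the demo

r, p = pearsonr(dmos, attr_length)
print(f"Pearson r = {r:.3f} (p = {p:.3g})")
```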
We present a Videospace framework for classifying selected videos with chosen user groups, device types, or device classes. Photospace has proven effective in classifying large amounts of still images via simple technical parameters. We use the measures of subject-camera distance, scene lighting, and object motion to classify single videos and finally represent all videos of the chosen group in a three-dimensional space. An expert-rated sample of videos was collected to obtain an estimate of the parameters for a chosen group of videos. Sub-groups of videos were found using the Videospace measures. The presented framework can be used to obtain information about the technical requirements of general device use and the typical shooting conditions of end users. Future measurement efficiency and precision could be improved by using computer-based algorithms or device-based measurement techniques to obtain better samples of Videospace parameters. Videospace information could be used to find the most meaningful benchmarking contexts or to obtain information about shooting in general with chosen devices or device groups. Using information about the typical parameters of a chosen video group, algorithm and device development can be focused on typical shooting situations, for example when processing power and device size must otherwise be reduced.
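A minimal sketch of the Videospace idea, with synthetic values standing in for the expert ratings and k-means as an assumed choice for finding sub-groups, might look like this:

```python
# Hedged sketch: place each video in a 3-D space spanned by subject-camera
# distance, scene lighting, and object motion, then look for sub-groups.
# The clustering method and the synthetic data are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
distance_m = rng.lognormal(1.0, 0.8, 200)       # subject-camera distance
lighting_lux = rng.lognormal(5.0, 1.5, 200)     # scene illuminance
motion = rng.uniform(0, 1, 200)                 # relative object motion

videospace = StandardScaler().fit_transform(
    np.column_stack([np.log(distance_m), np.log(lighting_lux), motion]))
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(videospace)
print("videos per sub-group:", np.bincount(labels))
```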
A subjective quality rating does not reflect the properties of the image directly; it is the outcome of a quality decision-making process that includes quantification of the subjective quality experience. Such rich subjective content is often ignored. We conducted two experiments (with 28 and 20 observers) to study the effect of paper grade on the image quality experience of ink-jet prints. Image quality experience was studied using a grouping task and a quality rating task. Both tasks included an interview, but in the latter task we examined the relations of different subjective attributes in this experience. We found that the observers use an attribute hierarchy in which the high-level attributes are more experiential, general, and abstract, while the low-level attributes are more detailed and concrete. This may reflect the hierarchy of the human visual system. We also noticed that although the observers show variable subjective criteria for image quality, the reliability of the average subjective estimates is high: when two different observer groups estimated the same images in the two experiments, correlations between the mean ratings were between .986 and .994, depending on the image content.
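The between-group reliability check can be illustrated with the short sketch below; the synthetic ratings and group sizes are placeholders, and the reported correlations are not reproduced.

```python
# Hedged sketch: correlate the mean quality ratings that two independent observer
# groups gave to the same images. Data are synthetic placeholders.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(6)
true_q = rng.uniform(1, 7, size=16)                       # 16 printed images
group_a = true_q + rng.normal(0, 0.6, size=(28, 16))      # 28 observers
group_b = true_q + rng.normal(0, 0.6, size=(20, 16))      # 20 observers

r, _ = pearsonr(group_a.mean(axis=0), group_b.mean(axis=0))
print(f"between-group correlation of mean ratings: r = {r:.3f}")
```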
This study presents a methodology for forming contextually valid scales for subjective video quality measurement. Any single quality value, e.g., a Mean Opinion Score (MOS), can have multiple underlying causes; hence this kind of quality measure is not sufficient, for example, for describing the performance of a video capturing device. By applying the Interpretation-Based Quality (IBQ) method as a qualitative/quantitative approach, we collected attributes that are familiar to the end user and extracted directly from the observers' comments. Based on these findings, we formed contextually valid assessment scales from the typically used quality attributes. A large set of data was collected from 138 observers to generate the video quality vocabulary. The video material was shot with three types of cameras: digital video cameras (4), digital still cameras (9), and mobile phone cameras (9). From the quality vocabulary, we formed 8 unipolar 11-point scales to gain better insight into video quality. Viewing conditions were adjusted to meet the ITU-T Rec. P.910 requirements. We suggest that the applied qualitative/quantitative approach is especially efficient for finding image quality differences in video material where the quality variations are multidimensional in nature, and especially when image quality is rather high.
The subjective quality of an image is a non-linear product of several simultaneously contributing subjective factors, such as experienced naturalness, colorfulness, lightness, and clarity. We have studied subjective image quality using a hybrid qualitative/quantitative method in order to disclose the attributes relevant to experienced image quality. We describe our approach to mapping the image quality attribute space in three cases: a still studio image, video clips of a talking head and moving objects, and the use of image processing pipes for 15 still image contents. Naive observers participated in three image quality research contexts in which they were asked to freely and spontaneously describe the quality of the presented test images. Standard viewing conditions were used. The data show which attributes are most relevant for each test context and how they differentiate between the selected image contents and processing systems. The role of non-HVS-based image quality analysis is discussed.
We present an effective method for comparing the subjective audiovisual quality of different video cameras and the features related to the quality changes. The method provides both a quantitative estimate of overall quality and a qualitative description of the critical quality features. The aim was to combine two image quality evaluation methods, the quantitative Absolute Category Rating (ACR) method with hidden reference removal and the qualitative Interpretation-Based Quality (IBQ) method, in order to see how they complement each other in audiovisual quality estimation tasks. Twenty-six observers estimated the audiovisual quality of six different cameras, mainly mobile phone video cameras. To achieve an efficient subjective estimation of audiovisual quality, only two contents with different quality requirements were recorded with each camera. The results show that the subjectively important quality features were more related to the overall estimations of the cameras' visual video quality than to features related to sound. The data revealed two significant quality dimensions related to visual quality: darkness and sharpness. We conclude that the qualitative methodology can complement quantitative quality estimations also with audiovisual material. The IBQ approach is especially valuable when the induced quality changes are multidimensional.