Real-time videoconferencing using cellular devices provides natural communication to the Deaf community. For
this application, compressed American Sign Language (ASL) video must be evaluated in terms of the intelligibility
of the conversation and not in terms of the overall aesthetic quality of the video. This work presents a paired
comparison experiment to determine the subjective preferences of ASL users in terms of the trade-off between
intelligibility and quality when varying the proportion of the bitrate allocated explicitly to the regions of the
video containing the signer. A rate-distortion optimization technique, which jointly optimizes a quality criterion
and an intelligibility criterion according to a user-specified parameter, generates test video pairs for the subjective
experiment. Experimental results suggest that at sufficiently high bitrates, all users prefer videos in which the
non-signer regions in the video are encoded with some nominal rate. As the total encoding bitrate decreases,
users generally prefer video in which a greater proportion of the rate is allocated to the signer. The specific
operating points preferred in the quality-intelligibility trade-off vary with the demographics of the users.
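
The user-specified trade-off described above can be pictured as a weighted distortion inside a Lagrangian cost. The following is a minimal sketch under stated assumptions, not the formulation used in the experiment: the weight `alpha`, the multiplier `lam`, and the `Candidate` fields are all hypothetical names introduced for illustration.

```python
from typing import NamedTuple, Sequence


class Candidate(NamedTuple):
    """One hypothetical encoding option (all fields are illustrative)."""
    label: str
    rate_bits: float      # bits spent by this option
    quality_dist: float   # distortion over the whole frame (e.g. MSE)
    intel_dist: float     # distortion restricted to the signer regions


def joint_cost(c: Candidate, alpha: float, lam: float) -> float:
    """Weighted distortion plus lam * rate.

    alpha = 1 weights only the intelligibility term,
    alpha = 0 weights only the overall quality term.
    """
    distortion = alpha * c.intel_dist + (1.0 - alpha) * c.quality_dist
    return distortion + lam * c.rate_bits


def best_candidate(cands: Sequence[Candidate], alpha: float, lam: float) -> Candidate:
    """Return the option with the smallest joint cost."""
    return min(cands, key=lambda c: joint_cost(c, alpha, lam))
```

Sweeping `alpha` from 0 to 1 at a fixed rate would, under these assumptions, produce encodings that shift bits from the background toward the signer, which is the kind of pair a paired comparison test could present.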

The subjective tests used to evaluate image and video quality estimators (QEs) are expensive and time-consuming.
More problematic, the majority of subjective testing is not designed to find systematic weaknesses in the evaluated
QEs. As a result, a motivated attacker can take advantage of these systematic weaknesses to gain unfair monetary
advantage. In this paper, we draw on some lessons of software testing to propose additional testing procedures
that target a specific QE under test. These procedures supplement, but do not replace, the traditional subjective
testing procedures that are currently used. The goal is to motivate the design of objective QEs that
characterize human quality assessment more accurately.

Communication of American Sign Language (ASL) over mobile phones would be very beneficial to the Deaf
community. ASL video encoded to achieve the rates provided by current cellular networks must be heavily
compressed, and appropriate assessment techniques are required to analyze the intelligibility of the compressed
video. As an extension to a purely spatial measure of intelligibility, this paper quantifies the effect of temporal
compression artifacts on sign language intelligibility. These artifacts can result from frame rate reductions
or from motion-compensation errors that distract the observer. They reduce the perception of smooth motion
and disrupt the temporal coherence of the video. Motion-compensation errors that affect temporal coherence
are identified by measuring the block-level correlation between co-located macroblocks in adjacent frames. The
impact of frame rate reductions was quantified through experimental testing. A subjective study was performed
in which fluent ASL participants rated the intelligibility of sequences encoded at five frame rates
and three levels of distortion. The subjective data is used to parameterize an objective intelligibility
measure that is highly correlated with subjective ratings at multiple frame rates.
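
The block-level correlation between co-located macroblocks can be sketched as below. This is an illustrative sketch only: it assumes frames are plain 2-D lists of luma samples, uses the Pearson correlation coefficient as the correlation measure, and treats flat blocks with a convention chosen here for convenience. The abstract does not specify any of these details.

```python
from math import sqrt
from typing import List, Sequence

Frame = Sequence[Sequence[float]]  # luma samples indexed as frame[row][col]


def block_pixels(frame: Frame, top: int, left: int, size: int) -> List[float]:
    """Flatten the size x size block whose top-left corner is (top, left)."""
    return [frame[r][c]
            for r in range(top, top + size)
            for c in range(left, left + size)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long sample lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        # Flat block: correlation is undefined. Assumption made here:
        # identical blocks count as fully coherent, anything else as 0.
        return 1.0 if list(x) == list(y) else 0.0
    return cov / sqrt(vx * vy)


def colocated_block_correlation(prev: Frame, curr: Frame,
                                size: int = 16) -> List[List[float]]:
    """Correlation of each block in curr with the same block in prev."""
    rows, cols = len(curr), len(curr[0])
    return [
        [pearson(block_pixels(prev, top, left, size),
                 block_pixels(curr, top, left, size))
         for left in range(0, cols - size + 1, size)]
        for top in range(0, rows - size + 1, size)
    ]
```

Blocks whose correlation falls well below that of their neighbors would be candidates for the motion-compensation errors described above; any threshold for flagging them would have to be set from subjective data.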

Sign language users are eager for the freedom and convenience of video communication over cellular devices.
Compression of sign language video in this setting poses unique challenges. The low bitrates available make
encoding decisions extremely important, while the power constraints of the device limit the encoder complexity.
The ultimate goal is to maximize the intelligibility of the conversation given the rate-constrained cellular
channel and the power-constrained encoding device. This paper uses an objective measure of intelligibility, based
on subjective testing with members of the Deaf community, for rate-distortion optimization of sign language video
within the H.264 framework. Performance bounds are established by using the intelligibility metric in a Lagrangian
cost function along with a trellis search to make optimal mode and quantizer decisions for each macroblock. The
optimal quantization parameter (QP) values are analyzed and the unique structure of sign language is exploited to
reduce complexity by three orders of magnitude relative to the trellis search technique with no loss in
rate-distortion performance. Further reductions in complexity are made by eliminating rarely occurring modes in
the encoding process. The low-complexity sign language (SL) optimization technique increases the measured
intelligibility by up to 3.5 dB at fixed rates and reduces rate by as much as 60% at fixed levels of
intelligibility, relative to a rate control algorithm designed for aesthetic distortion as measured by MSE.
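
A trellis search over per-macroblock quantizer choices can be sketched as a Viterbi-style dynamic program. The sketch below is a hedged illustration and not the paper's encoder: `block_cost` stands for a hypothetical per-macroblock Lagrangian cost (distortion plus multiplier times rate), and `transition_cost` stands for a hypothetical penalty that couples neighboring choices, for example the extra bits needed to signal a QP change.

```python
from typing import Callable, List, Sequence, Tuple


def trellis_qp_search(
    num_blocks: int,
    qps: Sequence[int],
    block_cost: Callable[[int, int], float],
    transition_cost: Callable[[int, int], float],
) -> Tuple[float, List[int]]:
    """Return (minimum total cost, QP per block) for num_blocks >= 1.

    block_cost(b, q): cost of coding block b with QP q (hypothetical).
    transition_cost(p, q): cost of moving from QP p to QP q (hypothetical).
    """
    # best[j] = cheapest cost of any path that ends at qps[j] for this block
    best = [block_cost(0, q) for q in qps]
    back: List[List[int]] = []  # back[b-1][j] = best predecessor index

    for b in range(1, num_blocks):
        new_best: List[float] = []
        pointers: List[int] = []
        for q in qps:
            # Cost of arriving at q from every possible previous QP.
            arrive = [best[i] + transition_cost(p, q)
                      for i, p in enumerate(qps)]
            i_min = min(range(len(qps)), key=arrive.__getitem__)
            new_best.append(arrive[i_min] + block_cost(b, q))
            pointers.append(i_min)
        best = new_best
        back.append(pointers)

    # Trace the cheapest path backwards from the last block.
    j = min(range(len(qps)), key=best.__getitem__)
    total = best[j]
    path = [qps[j]]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(qps[j])
    path.reverse()
    return total, path
```

The work per block grows with the square of the number of candidate states, which suggests why restricting the candidate set, as the abstract describes, can reduce complexity so sharply.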

For members of the Deaf community in the United States, current communication tools include TTY/TDD
services, video relay services, and text-based communication. With the growth of cellular technology, mobile
sign language conversations are becoming a possibility. Proper coding techniques must be employed to compress
American Sign Language (ASL) video for low-rate transmission while maintaining the quality of the conversation.
In order to evaluate these techniques, an appropriate quality metric is needed. This paper demonstrates that
traditional video quality metrics, such as PSNR, fail to predict subjective intelligibility scores. By considering
the unique structure of ASL video, an appropriate objective metric is developed. Face and hand segmentation
is performed using skin-color detection techniques. The distortions in the face and hand regions are optimally
weighted and pooled across all frames to create an objective intelligibility score for a distorted sequence. The
objective intelligibility metric performs significantly better than PSNR in terms of correlation with subjective
responses.
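
The contrast between PSNR and a region-weighted score can be illustrated with a small sketch. It assumes the face and hand masks are already available (the skin-color segmentation step is not reproduced), pools by a simple mean over frames, and uses placeholder weights `w_face` and `w_hand`. The optimal weights from the paper are not given in this abstract, so these values are hypothetical.

```python
from math import log10
from typing import Sequence

Frame = Sequence[Sequence[float]]  # luma samples indexed as frame[row][col]
Mask = Sequence[Sequence[bool]]    # True where the pixel belongs to a region


def region_mse(ref: Frame, dist: Frame, mask: Mask) -> float:
    """MSE over pixels where mask is True; 0.0 if the region is empty."""
    errs = [(ref[r][c] - dist[r][c]) ** 2
            for r in range(len(ref))
            for c in range(len(ref[0]))
            if mask[r][c]]
    return sum(errs) / len(errs) if errs else 0.0


def pooled_region_distortion(refs: Sequence[Frame], dists: Sequence[Frame],
                             face_masks: Sequence[Mask],
                             hand_masks: Sequence[Mask],
                             w_face: float = 0.5,
                             w_hand: float = 0.5) -> float:
    """Weighted face/hand MSE per frame, averaged over all frames."""
    per_frame = [
        w_face * region_mse(r, d, f) + w_hand * region_mse(r, d, h)
        for r, d, f, h in zip(refs, dists, face_masks, hand_masks)
    ]
    return sum(per_frame) / len(per_frame)


def psnr(ref: Frame, dist: Frame, peak: float = 255.0) -> float:
    """Conventional whole-frame PSNR in dB."""
    full = [[True] * len(ref[0]) for _ in ref]
    mse = region_mse(ref, dist, full)
    return float("inf") if mse == 0 else 10.0 * log10(peak ** 2 / mse)
```

Two distorted frames with the same error energy receive the same PSNR whether the error falls on the face or on the background, while the region-weighted score separates them. That is the intuition behind a metric restricted to the face and hand regions.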