UMHexagonS is the fast integer motion estimation algorithm adopted in H.264/AVC. It is very effective at reducing motion estimation time while preserving rate-distortion performance, from QCIF format up to HD format: compared with FFS, UMHexagonS achieves more than 90% time reduction with an average PSNR loss of 0.05 dB, exceeding 0.1 dB in only one case. However, UMHexagonS uses a fixed search range; by adjusting the search range dynamically, motion estimation efficiency can be improved considerably. Building on a strategy that combines UMHexagonS with dynamic search range adaptation, this paper proposes two new dynamic search range algorithms, NDSR and PDSR, and thus two new fast integer motion estimation algorithms, NDSR+UMHexagonS and PDSR+UMHexagonS. Experimental results show that, compared with UMHexagonS, the NDSR method achieves an average 20% time reduction with almost no PSNR loss, and the PDSR method achieves an average 50-60% time reduction with an average PSNR loss of no more than 0.1 dB, which is very valuable for real-time applications.
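The NDSR/PDSR rules themselves are not reproduced in this abstract; as a rough illustration of what "adjusting the search range dynamically" means, the following sketch derives a per-block search range from the motion vectors of already-coded neighbors. The constants and the scaling rule are illustrative assumptions, not the proposed algorithms.

```python
# Illustrative sketch: choose the integer-pel search range per block from
# neighboring motion vectors instead of a fixed range. The constants and
# the scaling rule are assumptions, not the NDSR/PDSR definitions.

def dynamic_search_range(neighbor_mvs, sr_min=4, sr_max=32, scale=1.5):
    """neighbor_mvs: [(mvx, mvy), ...] of already-coded neighboring blocks."""
    if not neighbor_mvs:
        return sr_max  # no context yet: fall back to the full range
    peak = max(max(abs(mx), abs(my)) for mx, my in neighbor_mvs)
    sr = int(round(scale * peak)) or sr_min
    return max(sr_min, min(sr_max, sr))

# Blocks whose neighbors moved little are searched over a small window:
print(dynamic_search_range([(1, 0), (2, -1), (0, 0)]))   # -> 4
print(dynamic_search_range([(14, -9), (12, -10)]))       # -> 21
```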
In this paper, we propose a minimum variation (MINVAR) distortion criterion for the rate-distortion tradeoff in video coding. The MINVAR-based framework provides a local optimization strategy for rate control in real-time video coding: it minimizes the variation in distortion from frame to frame while the corresponding bit rate fluctuation is absorbed by the encoder buffer. We use the H.264 video codec to evaluate the performance of the proposed method. As the simulation results show, the decoded picture quality of the proposed approach is smoother than that of the traditional H.264 joint model (JM) rate control algorithm: the global video quality, measured as average PSNR, is maintained while a better subjective visual quality is achieved.
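A minimal sketch of the local decision this describes, assuming hypothetical per-frame rate and distortion models (the paper's actual models are not given in the abstract): among the quantizers that keep the encoder buffer within bounds, pick the one whose distortion is closest to the previous frame's.

```python
# Sketch of a MINVAR-style QP decision. d_model(qp) and r_model(qp) are
# hypothetical callables modeling the current frame; not the paper's models.

def minvar_qp(d_model, r_model, d_prev, buf, buf_size, drain,
              qp_range=range(10, 52)):
    feasible = [qp for qp in qp_range
                if 0 <= buf + r_model(qp) - drain <= buf_size]
    candidates = feasible or list(qp_range)   # relax if buffer can't be met
    return min(candidates, key=lambda qp: abs(d_model(qp) - d_prev))

# Toy models: distortion grows and rate shrinks with QP.
qp = minvar_qp(d_model=lambda q: 0.9 * q,      # hypothetical D(QP)
               r_model=lambda q: 60000 / q,    # hypothetical R(QP) in bits
               d_prev=30.0, buf=120000, buf_size=300000, drain=50000)
print(qp)   # -> 33: the QP whose distortion best matches the last frame
```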
MPEG-4 treats a scene as a composition of several objects, or video object planes (VOPs), that are encoded and decoded separately. Such a flexible video coding framework makes it possible to code different video objects with different distortion scales. It is necessary to analyze the priority of the video objects according to their semantic importance, intrinsic properties, and psycho-visual characteristics, so that the bit budget can be distributed properly among the objects to improve the perceptual quality of the compressed video. This paper provides an automatic video object priority definition method based on an object-level visual attention model and further proposes an optimization framework for video object bit allocation. One significant contribution of this work is that human visual system characteristics are incorporated into the video coding optimization process. Another advantage is that the priority of each video object is obtained automatically, instead of fixing weighting factors before encoding or relying on user interactivity. To evaluate the proposed approach, we compare it with the traditional verification model bit allocation and with optimal multiple-video-object bit allocation algorithms. Compared with traditional bit allocation algorithms, the objective quality of the objects with higher priority is significantly improved under this framework. These results demonstrate the usefulness of this unsupervised subjective quality lifting framework.
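The paper's optimization framework is not reproduced here; as a baseline illustration of priority-driven allocation, a frame's bit budget can simply be split in proportion to the attention-derived priorities (object names and priority values below are hypothetical).

```python
# Sketch, not the paper's optimization: split the bit budget among video
# objects in proportion to their attention-model priorities.

def allocate_bits(budget_bits, priorities):
    total = sum(priorities.values())
    return {vo: int(budget_bits * p / total) for vo, p in priorities.items()}

# Hypothetical priorities for three VOPs:
print(allocate_bits(120000, {"face": 0.5, "player": 0.3, "background": 0.2}))
# -> {'face': 60000, 'player': 36000, 'background': 24000}
```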
XML-based metadata is widely adopted across different communities, and plenty of commercial and open source tools for processing and transforming it are available on the market. However, all of these tools have one thing in common: they operate on plain-text-encoded metadata, which can become a burden in constrained and streaming environments, i.e., when metadata needs to be processed together with multimedia content on the fly. In this paper we present an efficient approach for transforming such metadata encoded with MPEG's Binary Format for Metadata (BiM) without additional en-/decoding overhead, i.e., entirely within the binary domain. To this end, we have developed an event-based push parser for BiM-encoded metadata which transforms the metadata with a limited set of processing instructions, based on traditional XML transformation techniques, operating on bit patterns instead of cost-intensive string comparisons.
H.264/AVC is the newest block-based video coding standard from MPEG and VCEG. It not only provides superior and efficient video coding at various bit rates, it also has a "network-friendly" representation thanks to a series of new techniques that provide error robustness. Flexible Macroblock Ordering (FMO) is one of the new error resilience tools included in H.264/AVC. Here, we present an alternative use of FMO that exploits its ability to combine non-neighboring macroblocks in one slice. Instead of creating a scattered pattern, which is useful when transmitting data over an error-prone network, we divide the picture into a number of regions of interest and one remaining region of disinterest. It is assumed that viewers pay much more attention to the regions of interest than to the remainder of the video, so we compress the regions of interest at a higher bit rate than the region of disinterest, thus lowering the overall bit rate. Simulations show that the overhead introduced by using rectangular regions of interest is minimal, while the bit rate can be reduced by 30% and more in most cases; even at those reductions the video stays pleasant to watch. Transcoders can use this information as well by reducing only the quality of the region of disinterest, instead of the quality of the entire picture, when applying SNR scalability. In extreme cases the region of disinterest can even be dropped entirely, reducing the overall bit rate even further.
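As a sketch of the ROI-to-slice-group idea (geometry and QP values are assumptions, not the paper's settings), each macroblock inside a rectangular ROI is mapped to its own slice group and given a lower QP than the leftover region:

```python
# Illustrative sketch: assign macroblocks inside rectangular ROIs to their
# own slice groups, encoded at a lower QP than the region of disinterest.

def fmo_roi_map(mb_cols, mb_rows, rois, qp_roi=26, qp_rest=38):
    """rois: list of (x0, y0, x1, y1) rectangles in macroblock units."""
    group, qp = {}, {}
    for y in range(mb_rows):
        for x in range(mb_cols):
            g = next((i for i, (x0, y0, x1, y1) in enumerate(rois)
                      if x0 <= x <= x1 and y0 <= y <= y1), len(rois))
            group[(x, y)] = g                    # last group = disinterest
            qp[(x, y)] = qp_roi if g < len(rois) else qp_rest
    return group, qp

group, qp = fmo_roi_map(22, 18, rois=[(4, 3, 10, 9)])   # one ROI, CIF grid
print(group[(5, 5)], qp[(5, 5)], group[(0, 0)], qp[(0, 0)])   # 0 26 1 38
```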
This work presents a partial development within the Internet2 Catalan project called "Integrated Project", which aims to design and build an advanced Internet environment based on the Universal Multimedia Access (UMA) concept, using MPEG-21 standard tools to enable transparent and augmented use of multimedia content across a wide range of networks, devices, and users. The project integrates several modules through a Web Service architecture in an interoperable manner to form a complete distributed system. Within this framework, the DI Management & Personalization module provides services such as content recommendation, advanced searches, selection of the best content adaptation possibilities, and session mobility management. By means of cataloguing tools and user preferences that are set and updated according to the user's consumption habits, it offers content recommendations that also take into account user preferences, terminal capabilities, and network characteristics. Finally, during the consumption process, the Adaptation Decision Engine selects the best adaptation process in each case, taking into account network characteristics, terminal capabilities, and the state of the AV content transcoding servers. The module makes extensive use of the MPEG-21 and MPEG-7 standards, ensuring interoperability with other similar systems.
It is highly desirable for many broadcast video applications to support many diverse user devices, such as devices with different resolutions, without incurring the bitrate penalty of simulcast encoding. On the other hand, video decoding is a very complex operation, and its complexity depends strongly on the resolution of the coded video. Low-power portable devices typically have very strict complexity restrictions and reduced-resolution displays. For such environments the total bitrate efficiency of the combined layers is an important requirement, but the bitrate efficiency of a lower layer individually, although desirable, is not. In this paper, we propose a complexity-constrained scalable system, based on the Reduced Resolution Update mode, that enables low decoding complexity while achieving better rate-distortion performance than an equivalent simulcast-based system. Our system is targeted at broadcast environments in which some terminals have very limited computational and power resources.
In this paper, we present a novel video coding scheme based on overcomplete motion compensated temporal filtering (OMCTF) for robust video transmission over wireless channels. We introduce error resilient entropy coding (EREC) and an unequal error protection strategy under the OMCTF framework. The intrinsic nature of the OMCTF structure not only provides fully scalable features for time-varying mobile wireless channels but also facilitates unequal error protection for robust video transmission. Since the most destructive effect of channel-induced errors in video communications over error-prone networks is the loss of synchronization at the decoder, we apply EREC to the compressed video bitstream to gain additional error resilience with a negligible increase in bit budget. With EREC, the bitstream is reorganized into fixed-length slots so that synchronization can easily be regained at the beginning of the next uncorrupted slot, which greatly limits error propagation in the scalable video bitstream. To further enhance robustness over error-prone channels, we introduce Rate Compatible Punctured Convolutional (RCPC) codes for the protection of transmitted video over mobile wireless links. Due to the natural prioritization order in the OMCTF framework, the RCPC codes can easily be applied to achieve unequal error protection in the proposed scheme. Integrating these strategies under the OMCTF-based video coding framework enables robust video transmission over mobile wireless channels with increased error resilience. Experimental results demonstrate the performance of the proposed approach.
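A compact sketch of the fixed-slot reorganization EREC performs (encoder side only; the simple offset sequence 1, 2, ..., N-1 below is one conventional choice, not necessarily the paper's):

```python
# Sketch of EREC slot packing: variable-length coded blocks are rearranged
# into equal-length slots so a decoder regains sync at every slot boundary.

def erec_pack(blocks, slot_len):
    """blocks: list of bit strings, e.g. ['1101', '10', ...]."""
    n = len(blocks)
    assert sum(len(b) for b in blocks) <= n * slot_len, "slots too small"
    slots = [""] * n
    rest = list(blocks)
    for i in range(n):                       # stage 0: own slot first
        slots[i], rest[i] = rest[i][:slot_len], rest[i][slot_len:]
    for offset in range(1, n):               # later stages: spill over
        for i in range(n):
            j = (i + offset) % n
            space = slot_len - len(slots[j])
            if space and rest[i]:
                slots[j] += rest[i][:space]
                rest[i] = rest[i][space:]
    return slots

print(erec_pack(["110101", "10", "0111", "0"], slot_len=4))
# -> ['1101', '1001', '0111', '0']: block 0's two leftover bits ('01')
#    spill into the spare space of slot 1.
```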
In this paper, different types of multihypothesis techniques are studied. Starting from the transform-domain multihypothesis provided by the redundant discrete wavelet transform (RDWT), other multihypothesis methods, in the spatial or temporal domain, are added to improve on a single-multihypothesis scheme. Experimental results show that combining two types of multihypothesis motion compensation (MHMC) is promising, and that with careful choices a gain is still possible when combining three MHMC methods. However, as more and more hypotheses are involved, the room for improvement shrinks, and the growing overhead burden also limits the gain of adding further hypotheses. The limitations of combining MHMC techniques, along with the motion vector overhead, are also discussed.
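At its core, multihypothesis motion compensation forms the final prediction as a weighted combination of several candidate predictions; a minimal sketch (uniform weights are an assumption, and real hypotheses would come from temporal, spatial, and RDWT-domain predictors):

```python
# Sketch: the MHMC prediction is a weighted average of several hypotheses.

import numpy as np

def combine_hypotheses(hypotheses, weights=None):
    """hypotheses: list of equally-sized prediction blocks (2-D arrays)."""
    h = np.stack([np.asarray(b, dtype=np.float64) for b in hypotheses])
    if weights is None:
        weights = [1.0] * len(hypotheses)    # assumed uniform weighting
    w = np.asarray(weights, dtype=np.float64)
    return np.tensordot(w / w.sum(), h, axes=1)   # weighted average block

temporal = np.full((4, 4), 100.0)    # toy hypotheses
spatial = np.full((4, 4), 110.0)
print(combine_hypotheses([temporal, spatial])[0, 0])   # 105.0
```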
In this work, we address phase-signature based digital image watermarking. The signature is extracted from the Fourier phase information of the digital media. It is then embedded in the Fourier magnitude spectrum. The detection and/or authentication is based on the well established area of phase-only filter based correlation techniques in the optics community. We propose to analyze the distortion coming out of the embedding process, model it, and eventually parameterize it so that optimal embedding can be done by trading off with other aspects of watermarking. It will be shown why permutation like functions facilitate the signature embedding process by minimizing the embedding degradation.
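A minimal sketch of the embed step under assumed parameters (coordinate choices, signature rule, and strength are illustrative, not the paper's): derive a binary signature from the phase at selected mid-frequency positions and embed it multiplicatively into the magnitudes at those positions.

```python
# Sketch: phase-derived signature embedded in the DFT magnitude spectrum.

import numpy as np

def embed_phase_signature(img, coords, alpha=0.05):
    F = np.fft.fft2(img.astype(np.float64))
    sig = np.sign(np.angle(F[tuple(zip(*coords))]))        # +/-1 signature
    for (u, v), s in zip(coords, sig):
        for a, b in ((u, v), (-u % F.shape[0], -v % F.shape[1])):
            F[a, b] = F[a, b] * (1 + alpha * s)    # scale conjugate pair too
    return np.real(np.fft.ifft2(F)), sig

rng = np.random.default_rng(0)
img = rng.uniform(0, 255, (64, 64))
coords = [(5, 9), (7, 3), (11, 6)]        # assumed mid-band positions
marked, sig = embed_phase_signature(img, coords)
print(sig, float(np.max(np.abs(marked - img))) < 5.0)   # small distortion
```

Scaling each coefficient and its conjugate pair by the same real factor keeps the spectrum Hermitian, so the watermarked image stays real-valued.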
Digital watermarking is considered a major technology for the protection of multimedia data. Some of the important applications are broadcast monitoring, copyright protection, and access control. In this paper, we present a semi-blind watermarking scheme for embedding a logo in color images using the DFT domain. After computing the DFT of the luminance layer of the cover image, the magnitudes of the DFT coefficients are compared and modified. A given watermark is embedded in three frequency bands: low, middle, and high. Our experiments show that the watermarks extracted from the lower frequencies have the best visual quality under low-pass filtering, added Gaussian noise, JPEG compression, resizing, rotation, and scaling, while the watermarks extracted from the higher frequencies have the best visual quality under cropping, intensity adjustment, histogram equalization, and gamma correction. Extractions from fragmented and translated images are identical to extractions from the unattacked watermarked image. Collusion and rewatermarking attacks do not provide the attacker with useful tools.
This paper presents a scheme that applies the pixel-wise masking technique used in image watermarking to video sequences. The proposed algorithm deploys video watermarking in the redundant discrete wavelet transform (RDWT) domain. The advantages of using an overcomplete wavelet transform instead of the traditional critically subsampled discrete wavelet transform (DWT) are discussed. The redundancy in the transform domain facilitates better detection of the texture characteristics of a video sequence. This leads to an efficient watermark casting scheme in which a stronger watermark can be embedded into the video sequence while remaining imperceptible. Different methods of using RDWT or DWT coefficients to add the watermark are compared. Experimental results show that RDWT-domain video watermarking offers greater robustness than the DWT-domain method.
Picture ID authentication is very important for identity verification and extremely critical for homeland security. Here we propose a unique picture ID authentication apparatus which combines invisible watermark embedding and detection technology with facial recognition techniques. To demonstrate this apparatus, we implemented a system for Boeing that is capable of fast and secure verification of the integrity and authenticity of ID documents carrying face images. The proposed invisible watermarks tolerate the most common attacks, such as recompression. We believe that with only minor improvements this picture ID authentication system can be deployed in real environments at airports and country borders.
Motion plays a fundamental role in the coding and processing of video signals. Existing approaches to modeling video sources are mostly based on explicitly estimating motion information from intensity values. Despite its conceptual simplicity, motion estimation (ME) is a long-standing open problem in itself, and accordingly the performance of a system operating on inaccurate motion information is unlikely to be optimal. In this paper, we present a novel approach to modeling video signals without explicit ME. Instead, motivated by a duality between the edge contours of images and the motion trajectories of video, we demonstrate that the spatio-temporal redundancy in video can be exploited by least-squares (LS) based adaptive filtering techniques. We consider the application of such implicit motion models to the problem of error concealment, more generally known as video inpainting. Our experimental results show the excellent performance of the proposed LS-based error concealment techniques under a variety of information loss conditions.
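A sketch of the implicit-motion idea in its simplest spatial form (the paper's filters are spatio-temporal; this is 2-D only, with an assumed causal tap pattern): train a least-squares linear predictor on correctly received pixels around the lost block, then synthesize the missing pixels from the same neighbor pattern.

```python
# Sketch: LS-trained adaptive filter fills a lost block without explicit ME.

import numpy as np

def ls_conceal(frame, y0, x0, size):
    f = frame.astype(np.float64).copy()
    offs = [(-1, 0), (0, -1), (-1, -1), (-1, 1)]     # causal neighbor taps
    A, b = [], []
    for y in range(max(1, y0 - 8), y0):              # train above the hole
        for x in range(x0, x0 + size):
            A.append([f[y + dy, x + dx] for dy, dx in offs])
            b.append(f[y, x])
    w, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    for y in range(y0, y0 + size):                   # fill in raster order
        for x in range(x0, x0 + size):
            f[y, x] = sum(wk * f[y + dy, x + dx]
                          for wk, (dy, dx) in zip(w, offs))
    return f

img = np.add.outer(np.arange(32.0), np.arange(32.0))   # smooth ramp image
damaged = img.copy(); damaged[12:20, 12:20] = 0
err = np.abs(ls_conceal(damaged, 12, 12, 8)[12:20, 12:20] - img[12:20, 12:20])
print(err.max())   # ~0: the learned filter recovers the ramp exactly
```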
Error concealment is very important in video communication when compressed video sequences are transmitted over error-prone networks and received erroneously. In this paper, we propose a novel error concealment scheme in which the concealment problem is formulated as minimizing, in a weighted manner, the difference between the gradient of the reconstructed data and a prescribed vector field under given boundary conditions. Instead of using the motion compensated block directly as the final recovered pixel values, we use the gradient of the motion compensated block together with the correctly decoded pixels surrounding the damaged block to reconstruct the lost data. Both temporal and spatial correlations of the video signal are exploited in the proposed scheme. A carefully designed weighting factor controls the regularization level along a desired direction according to the local blockiness degree at the boundaries of the recovered block. Experimental results show that the proposed algorithm achieves higher PSNR as well as better visual quality than the error concealment feature implemented in the H.264 reference software: blocking artifacts are greatly alleviated while the structural information in the interior of the recovered block is well preserved.
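A sketch of the unweighted form of this reconstruction (the paper adds edge-adaptive weights, which are omitted here): keep the correctly decoded boundary pixels fixed and iteratively solve the Poisson equation whose guidance field is the gradient of the motion-compensated block.

```python
# Sketch: gradient-domain block recovery via Gauss-Seidel Poisson solve.

import numpy as np

def poisson_fill(frame, guide, y0, x0, n, iters=400):
    f = frame.astype(np.float64).copy()
    g = guide.astype(np.float64)
    # discrete Laplacian of the guidance (motion-compensated) block
    lap = (np.roll(g, 1, 0) + np.roll(g, -1, 0) +
           np.roll(g, 1, 1) + np.roll(g, -1, 1) - 4 * g)
    f[y0:y0 + n, x0:x0 + n] = g[y0:y0 + n, x0:x0 + n]    # initial guess
    for _ in range(iters):
        for y in range(y0, y0 + n):
            for x in range(x0, x0 + n):
                f[y, x] = (f[y - 1, x] + f[y + 1, x] +
                           f[y, x - 1] + f[y, x + 1] - lap[y, x]) / 4
    return f

base = np.add.outer(np.arange(24.0), np.arange(24.0))
damaged = base.copy(); damaged[8:16, 8:16] = 0
mc = base + 3.0              # toy MC block: right texture, wrong DC offset
out = poisson_fill(damaged, mc, 8, 8, 8)
print(np.abs(out[8:16, 8:16] - base[8:16, 8:16]).max() < 1e-3)   # True
```

Note how the wrong DC offset of the motion-compensated block is discarded while its gradient structure is kept, which is exactly why this approach suppresses blockiness at the boundaries.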
Compressed video data is very vulnerable to channel disturbances when transmitted through time-varying wireless channels with high bit error rates (BER). In this paper, a novel, efficient, content-adaptive spatial error concealment scheme is presented. Specifically, we adopt a finite-state Markov chain as a mathematically tractable model for wireless channels and investigate the corresponding error patterns of video transmitted under different channel conditions. Then, by analyzing multiple features extracted from the surrounding available blocks, the proposed algorithm first estimates the characteristics of the error block (EB), and suitable methods are then employed to conceal the EB accordingly. Simulation results demonstrate that the proposed algorithm obtains very good subjective quality whether or not the content of the EB contains complex texture, and that it is superior to existing approaches for countering the artifacts caused by wireless transmission. The proposed method is also quite practical due to its low computational complexity compared with conventional methods.
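The classic two-state (good/bad) Markov chain is the usual finite-state model for such bursty channels; a small sketch that generates an error pattern from it (all transition and per-state error rates below are assumed values, not the paper's):

```python
# Sketch: Gilbert-Elliott-style two-state Markov channel error pattern.

import random

def markov_error_pattern(n, p_gb=0.02, p_bg=0.25,
                         ber_good=1e-4, ber_bad=0.2, seed=1):
    rng, state, errs = random.Random(seed), "good", []
    for _ in range(n):
        ber = ber_good if state == "good" else ber_bad
        errs.append(1 if rng.random() < ber else 0)
        flip = p_gb if state == "good" else p_bg
        if rng.random() < flip:
            state = "bad" if state == "good" else "good"
    return errs

pattern = markov_error_pattern(10000)
print(sum(pattern) / len(pattern))   # overall error rate; errors are bursty
```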
In this paper, we propose a two-stage FEC scheme with an enhanced MAC protocol for multimedia data transmission over wireless LANs. The proposed scheme enables the joint optimization of protection strategies across the protocol stack, and packets with errors are delivered to the application layer for correction or dropping. In stage 1, packet-level FEC is added across packets at the application layer to correct packet losses due to congestion and route disruption. In stage 2, bit-level FEC is applied within both application packets and stage-one FEC packets to recover from bit errors in the MAC/PHY layer. Header CRC/FEC are used to enhance the MAC/PHY layer and to cooperate with the two-stage FEC scheme. Thus, we add FEC only at the application layer, yet can correct both application-layer packet drops and MAC/PHY-layer bit errors. We evaluate both the efficiency of bandwidth utilization and video performance using the scalable video coder MC-EZBC and ns-2 simulations. Simulation results show that the proposed scheme outperforms conventional IEEE 802.11.
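In its simplest form, stage-1 packet-level FEC is one XOR parity packet per group of k application packets, which recovers any single loss in the group (the paper's codes are more general; this only shows the principle):

```python
# Sketch: one XOR parity packet recovers a single packet loss in a group.

def xor_packets(packets):
    out = bytearray(len(packets[0]))
    for p in packets:
        for i, byte in enumerate(p):
            out[i] ^= byte
    return bytes(out)

group = [b"pkt0-data", b"pkt1-data", b"pkt2-data"]   # equal-length packets
parity = xor_packets(group)

received = [group[0], None, group[2], parity]        # packet 1 lost
recovered = xor_packets([p for p in received if p is not None])
print(recovered == group[1])   # True: XOR of survivors restores the loss
```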
The James Webb Space Telescope (JWST) is expected to produce a vast number of images that are valuable for astronomical research and education. To support research activities related to the JWST mission, NASA has provided funds to establish the Structures Pointing and Control Engineering (SPACE) Laboratory at California State University, Los Angeles (CSULA). One of the research activities in the SPACE lab is to design an effective and efficient transmission system to disseminate JWST images across the Internet.
This paper presents a prioritized transmission method that provides the best quality of the transferred image based on the joint optimization of content-based retransmission and error concealment. First, the astronomical image is compressed using a scalable wavelet-based approach and then packetized into independently decodable packets. To facilitate the joint optimization of the two mutually dependent error control methods, a novel content index is defined to represent the significance of a packet's content as well as its importance for error concealment. Based on this content index, the optimal retransmission schedule is determined to maximize the quality of the received image under a delay constraint with the given error concealment method. Experimental results demonstrate that the proposed approach is very effective at combating packet loss during transmission and achieves a desirable quality for the received astronomical images.
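A toy sketch of such a schedule (the packet fields, budget, and greedy rule are assumptions; the paper solves an optimization, not this heuristic): resend the lost packets with the highest content index that still fit within the delay budget.

```python
# Sketch: greedy content-index-driven retransmission under a delay budget.

def schedule_retransmissions(lost, budget_ms, rtt_ms=80):
    """lost: list of (packet_id, content_index) for lost packets."""
    plan, used = [], 0.0
    for pid, idx in sorted(lost, key=lambda p: -p[1]):
        if used + rtt_ms <= budget_ms:      # one more round trip fits
            plan.append(pid)
            used += rtt_ms
    return plan

lost = [(3, 0.9), (7, 0.2), (11, 0.7), (14, 0.5)]
print(schedule_retransmissions(lost, budget_ms=200))   # -> [3, 11]
```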
Turbo codes are a promising technique for distributed source coding (DSC) in sensor networks because of their simple encoding implementation in the sensors and high decoding performance at the receiver. Unlike the channel coding scenario, where the only distortion comes from the physical channel, in a distributed source coding scenario two types of distortion co-exist: that of the physical channel and that of the binary symmetric channel (BSC) modeling the correlation between sources. In this paper, the conventional Turbo decoding is first modified to handle BSC distortion, and then further modified to decode mixed data with both types of distortion simultaneously. By redefining the channel reliabilities and calculating the extrinsic information with both distortions taken into account, the new decoding algorithm matches the realistic DSC scenario well and indeed improves decoding performance.
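The core of "redefining channel reliabilities" is that each bit gets the log-likelihood ratio matching its channel; a minimal sketch with the two standard LLR formulas (the mapping of which bits see which channel is the assumed DSC setup, not a detail given in the abstract):

```python
# Sketch: per-channel LLRs for a decoder facing BSC and AWGN distortion.

import math

def llr_bsc(bit, p):
    """LLR of x given the BSC output bit, crossover probability p."""
    return (1 - 2 * bit) * math.log((1 - p) / p)

def llr_awgn(y, sigma2):
    """LLR of x given y = (1 - 2x) + noise (BPSK over AWGN)."""
    return 2.0 * y / sigma2

print(llr_bsc(0, 0.1))       # +2.197: strong belief x = 0
print(llr_bsc(1, 0.1))       # -2.197: strong belief x = 1
print(llr_awgn(0.3, 1.0))    # +0.6:   weak belief x = 0
```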
In this paper, we discuss the maximum a-posteriori probability (MAP) decoding of variable-length codes (VLCs) and propose a novel decoding scheme for Huffman VLC coded data in the presence of noise. First, we provide some simulation results of VLC MAP decoding and highlight some features that have not yet been discussed in existing work. We show that the improvement of MAP decoding over conventional VLC decoding comes mostly from the memory information in the source, and we give some observations regarding the advantage of soft VLC MAP decoding over hard VLC MAP decoding for the AWGN channel. Second, recognizing that the difficulty in VLC MAP decoding is the lack of synchronization between the symbol sequence and the coded bit sequence, which makes parsing from the latter to the former extremely complex, we propose a new MAP decoding algorithm that integrates the information of self-synchronization strings (SSSs), an important feature of the codeword structure, into conventional MAP decoding. A consistent performance improvement and decoding complexity reduction over conventional VLC MAP decoding is achieved with the new scheme.
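A small demonstration of the self-synchronization property the new decoder exploits: after a bit error, a Huffman decoder typically re-aligns with the true symbol boundaries within a few codewords. The prefix code below is an arbitrary example, not taken from the paper.

```python
# Demo: a Huffman decoder regains codeword alignment soon after a bit error.

CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}
DECODE = {v: k for k, v in CODE.items()}

def decode_boundaries(bits):
    """Return the bit positions where each decoded codeword ends."""
    ends, buf = [], ""
    for i, b in enumerate(bits):
        buf += b
        if buf in DECODE:
            ends.append(i + 1)
            buf = ""
    return ends

msg = "abacdbaabcda"
bits = "".join(CODE[s] for s in msg)
corrupt = bits[:3] + ("1" if bits[3] == "0" else "0") + bits[4:]  # flip bit 3
true_ends, bad_ends = decode_boundaries(bits), decode_boundaries(corrupt)
resync = min(e for e in set(true_ends) & set(bad_ends) if e > 3)
print(f"error at bit 3; decoder re-aligns at bit {resync} of {len(bits)}")
```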
An algorithm for robust transmission of compressed 3-D mesh data is proposed in this work. In the encoder, we partition a 3-D mesh adaptively according to the surface complexity, and then encode each partition separately to reduce the error propagation effect. To encode joint boundaries compactly, we propose a boundary edge collapse rule, which also enables the decoder to zip partitions seamlessly. In the decoder, an error concealment scheme is employed to improve the visual quality of corrupted partitions. The concealment algorithm utilizes the information in neighboring partitions and reconstructs the lost surface based on semi-regular connectivity reconstruction and polynomial interpolation. Simulation results demonstrate that the proposed algorithm provides good rendering quality even under severe error conditions.
In this paper, we propose a network-adaptive transport error control (TEC) mechanism over the IEEE 802.11b wireless LAN (WLAN). The proposed TEC mechanism adaptively combines packet-level FEC (forward error correction) and interleaving based on the monitored network status: the FEC component is designed to combat the varying packet loss (i.e., erasure) rate, while the interleaving addresses the burstiness of packet losses. Using end-to-end monitoring combined with the packet transport, the receiver sends feedback about the measured packet loss rate and delay, and the proposed TEC mechanism adjusts the level of error protection according to the latest status of the underlying IEEE 802.11b WLAN. To verify the feasibility of the proposed TEC mechanism, we have evaluated the transport performance over a real IEEE 802.11b WLAN testbed as well as over an emulated WLAN environment. The experimental results show that the proposed mechanism can enhance the reliability of video streaming.
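A sketch of the interleaving component in its simplest (rows × cols) block form: a burst of consecutive packet losses on the wire lands in several different FEC groups after deinterleaving, turning one long burst into several short, correctable ones.

```python
# Sketch: block interleaver spreads a loss burst across FEC groups.

def interleave(packets, rows, cols):
    assert len(packets) == rows * cols
    grid = [packets[r * cols:(r + 1) * cols] for r in range(rows)]
    return [grid[r][c] for c in range(cols) for r in range(rows)]

def deinterleave(packets, rows, cols):
    return interleave(packets, cols, rows)   # transposing twice = identity

pkts = list(range(12))
sent = interleave(pkts, rows=3, cols=4)      # [0, 4, 8, 1, 5, 9, ...]
lost_on_wire = set(sent[3:6])                # burst of 3 consecutive packets
print(sorted(lost_on_wire))                  # {1, 5, 9}: 3 different rows
print(deinterleave(sent, 3, 4) == pkts)      # True
```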
Providing a certain quality of service (QoS) for multimedia transmissions over a noisy wireless channel has always been a challenge. The IEEE 802.11 standardization dedicates a working group, group E, to investigating and proposing a solution for enabling IEEE 802.11 networks to provide multimedia transmissions with certain QoS support. In its latest draft, the IEEE 802.11e working group proposes a contention-based mechanism to achieve the transmission of prioritized traffic, which in turn provides a framework to support multimedia transmissions over IEEE 802.11 networks. However, such a contention-based priority scheme does not deliver a strong QoS capability.
In this paper, we first study the characteristics of the IEEE 802.11e network, investigating the capacity characteristics of all four defined priorities of IEEE 802.11e. We then design a resource allocation technique to better utilize the bandwidth and improve the performance of video transmissions. Our design uses a QoS mapping scheme tailored to the IEEE 802.11e protocol characteristics to deliver scalable video. In addition, we design an appropriate cross-layer video adaptation mechanism for the scalable video that further improves video quality in combination with our proposed resource allocation technique. We have evaluated the proposed technique via NS2 simulations, using PSNR as the video quality measure. Our results show improvements in video quality and resource usage when our proposed technique is implemented.
This paper presents a cost-effective realization of an uncompressed HD (high definition) video transport system over high-speed IP networks. The system is motivated by the emergence of wide-area high-speed optical networks, where up to 10 Gbps can be achieved. The uncompressed version of HD (commonly referred to as HDTV) video requires around 1.5 Gbps, so transporting this huge uncompressed HD video content in real time is a real challenge. On the other hand, uncompressed transport maintains ultra-high quality with low-latency visualization, since the time- and resource-consuming encoding/decoding processes are eliminated. Thus, to support an interactive video-based collaboration environment with the ultimate HD video quality, we have designed and implemented a flexible HD video transport prototype system. The experiments conducted with the prototype system demonstrate the feasibility of the proposed implementation and identify the performance bottlenecks and the remaining tasks to be refined and completed.
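For context on the ~1.5 Gbps figure (assuming the common 1080i, 4:2:2, 10-bit HD-SDI format; the abstract does not specify the exact format): the serial rate counts the full raster including blanking, while the active-picture payload alone is already well over 1 Gbps.

```python
# Worked arithmetic: where the ~1.5 Gbps uncompressed-HD figure comes from.

samples_per_line, lines, frames = 2200, 1125, 30     # full 1080i raster
bits_per_sample_pair = 20                            # 10-bit Y + 10-bit C
serial = samples_per_line * lines * frames * bits_per_sample_pair
active = 1920 * 1080 * frames * bits_per_sample_pair
print(f"serial rate: {serial / 1e9:.3f} Gbps")       # 1.485 Gbps
print(f"active video: {active / 1e9:.3f} Gbps")      # 1.244 Gbps
```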
We propose a proxy-based scheme for segment-based distributed streaming services among UPnP (Universal Plug and Play)-enabled home networks. We design a "SHARE" module that extends the HG (home gateway) with a UPnP-compatible protocol. By relaying SSDP (Simple Service Discovery Protocol) messages defined in the UPnP device architecture, the SHARE module provides the connectivity needed to control other UPnP devices for streaming services among home networks. To provide the streaming services, the SHARE module coordinates the distribution of streaming loads among multiple senders using a many-to-one distributed streaming service. It also tries to minimize the quality degradation of streaming services based on the system and network resource status of each sender by leveraging the UPnP QoS services; that is, pre-allocation of HG resources according to the UPnP QoS services can be used to improve the quality of streaming services. Based on the UPnP components, the SHARE module provides transparent content sharing to users. Through design-level verification and partial implementations of the proposed SHARE module, we validate the feasibility of our work.
Multi-party collaborative environments are extensively utilized for distance learning, e-science, and other events for distributed global collaboration. In such environments, improved media services are expected to improve QoE (quality of experience) through better perception by the participants in a collaboration session. In this paper, we design and implement a high-quality video service designed to operate with a multi-party collaborative environment, the Access Grid (AG), over heterogeneous networks. The proposed service combines high-quality video formats, network monitoring, and network-adaptive transmission. Its primary application is to provide a seamless high-resolution video delivery service that accounts for the one-to-many nature of video distribution in AG-based multi-party collaborative environments. The implementation is evaluated over a multicast-enabled network testbed, and the experimental results verify the improved video service capability.
The Access Grid (AG) provides collaboration environments over IP multicast networks by enabling the efficient exchange of multimedia content among remote users; however, since many current networks are still multicast-disabled, it is not easy to deploy this multicast-based multi-party AG. For this problem, the AG provides multicast bridges as a solution, by putting a relay server into the multicast network. Multicast-disabled clients make UDP connections with this relay server and receive the forwarded multicast traffic in unicast UDP packets. This solution faces several limitations, since it requires duplicate forwarding of the same packet for each unicast peer. In this paper, we therefore propose an alternative solution to the multicast connectivity problem of the AG, based on UMTP (the UDP multicast tunneling protocol). By taking advantage of the flexibility of UMTP, the proposed solution is designed to improve the efficiency of network and system utilization, to allow the reuse of multicast-based AG applications without modification, and to partially address NAT/firewall traversal issues. To verify the feasibility of the proposed solution, we have implemented a prototype AG connectivity tool based on UMTP, named the AG Connector.
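For reference, a minimal sketch of the relay pattern being improved upon: receive a multicast group on one socket and forward each datagram to registered unicast peers. UMTP adds tunneling headers and peer management; this shows only the forwarding core, and all addresses below are placeholders.

```python
# Sketch of a multicast-to-unicast relay (the duplication UMTP avoids).

import socket
import struct

GROUP, PORT = "224.2.2.2", 5004            # placeholder AG media group
PEERS = [("192.0.2.10", 5004)]             # placeholder unicast clients

recv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
recv.bind(("", PORT))
mreq = struct.pack("4s4s", socket.inet_aton(GROUP),
                   socket.inet_aton("0.0.0.0"))
recv.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
while True:
    data, _ = recv.recvfrom(65535)         # one multicast datagram
    for peer in PEERS:                     # duplicated once per peer --
        send.sendto(data, peer)            # the inefficiency noted above
```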
This paper describes an internet-based system for telepathology. The system provides support for multiple users and exploits the opportunities for optimization that arise in a multi-user environment. Techniques for increasing system responsiveness by improving resource utilization and lowering network traffic are explored. The proposed optimizations include an auto-focus module, client- and server-side caching, and request reordering. Such systems can be an economical solution not only for remote pathology consultation but also for pathology and biology education.
Meeting environments, such as conference rooms, executive briefing centers, and exhibition spaces, are now commonly equipped with multiple displays, and will become increasingly display-rich in the future. Existing authoring/presentation tools such as PowerPoint, however, provide little support for effective utilization of multiple displays. Even using advanced multi-display enabled multimedia presentation tools, the task of assigning material to displays is tedious and distracts presenters from focusing on content.
This paper describes a framework for automatically assigning presentation material to displays, based on a model of the quality of views of audience members. The framework is built on a model of visual fidelity that takes into account the presentation content, audience members' locations, the limited resolution of human eyes, and display location, orientation, size, resolution, and frame rate. The model can be used to determine the placement of presentation material based on the average or worst-case audience member view quality, and to warn about material that would be illegible.
By integrating this framework with a previous system for multi-display presentation [PreAuthor, others], we created a tool that accepts PowerPoint and/or other media input files, and automatically generates a layout of material onto displays for each state of the presentation. The tool also provides an interface allowing the presenter to modify the automatically generated layout before or during the actual presentation. This paper discusses the framework, possible application scenarios, examples of the system behavior, and our experience with system use.
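One ingredient such a view-quality model needs is a legibility test; a sketch of the simplest form, where all numbers (acuity threshold, geometry) are illustrative assumptions rather than the paper's model:

```python
# Sketch: is text legible from a given seat? Compare the visual angle a
# character subtends against an assumed acuity threshold.

import math

def char_visual_angle_arcmin(char_height_m, viewer_distance_m):
    return math.degrees(
        2 * math.atan(char_height_m / (2 * viewer_distance_m))) * 60

def legible(char_height_m, viewer_distance_m, threshold_arcmin=15.0):
    """~15 arcmin of character height is a common legibility rule of thumb."""
    return char_visual_angle_arcmin(
        char_height_m, viewer_distance_m) >= threshold_arcmin

# 24-pt text (~8.5 mm tall) seen from the back row at 6 m:
print(round(char_visual_angle_arcmin(0.0085, 6.0), 1), legible(0.0085, 6.0))
# -> 4.9 False: the system should warn that this material is illegible
```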
Technology abounds for capturing presentations. However, no simple solution exists that is completely automatic. ProjectorBox is a "zero user interaction" appliance that automatically captures, indexes, and manages presentation multimedia. It operates continuously to record the RGB information sent from presentation devices, such as a presenter's laptop, to display devices, such as a projector. It seamlessly captures high-resolution slide images, text and audio. It requires no operator, specialized software, or changes to current presentation practice. Automatic media analysis is used to detect presentation content and segment presentations. The analysis substantially enhances the web-based user interface for browsing, searching, and exporting captured presentations. ProjectorBox has been in use for over a year in our corporate conference room, and has been deployed in two universities. Our goal is to develop automatic capture services that address both corporate and educational needs.
Sponsored by the National Aeronautics and Space Administration (NASA), the Synergetic Education and Research in Enabling NASA-centered Academic Development of Engineers and Space Scientists (SERENADES) Laboratory was established at California State University, Los Angeles (CSULA). An important ongoing research activity in this lab is to develop easy-to-use image analysis software with automated object detection capability to facilitate astronomical research. This paper presents a fast object detection algorithm based on the characteristics of astronomical images. The algorithm consists of three steps. First, the foreground and background are separated using a histogram-based approach. Second, connectivity analysis is conducted to extract individual objects. The final step is post-processing, which refines the detection results. To improve detection accuracy when some objects are blocked by clouds, a top-hat transform is employed to split the sky into cloudy and non-cloudy regions, and a multi-level thresholding algorithm is developed to select the optimal threshold for each region. Experimental results show that the proposed approach can successfully detect objects blocked by clouds.
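A sketch of the three basic steps on a synthetic star field (the paper's multi-level thresholding for cloudy regions is not shown; the 4-sigma threshold and minimum size are assumptions):

```python
# Sketch: threshold -> connected components -> prune, on a synthetic sky.

import numpy as np
from scipy import ndimage

rng = np.random.default_rng(3)
sky = rng.normal(10, 2, (128, 128))                   # background
for y, x in [(20, 30), (64, 64), (100, 90)]:          # three "stars"
    sky[y - 2:y + 3, x - 2:x + 3] += 60

threshold = sky.mean() + 4 * sky.std()                # histogram-based cut
mask = sky > threshold
labels, n = ndimage.label(mask)                       # connectivity analysis
sizes = ndimage.sum(mask, labels, list(range(1, n + 1)))
keep = [i + 1 for i, s in enumerate(sizes) if s >= 4] # prune 1-px noise
print(f"{len(keep)} objects detected")                # -> 3
centers = ndimage.center_of_mass(mask, labels, keep)
print([tuple(round(c) for c in ctr) for ctr in centers])
```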
Linear Discriminant Analysis (LDA) has been widely applied in the field of face identification because of its simplicity and efficiency in capturing the most discriminant features. However, LDA often fails under changes in illumination or pose, or with small training sets. To overcome those difficulties, Principal Component Analysis (PCA), which recovers the most descriptive/informative features in the reduced-dimension feature space, is often used in a preprocessing stage. Although there is a trend of preferring LDA over PCA in classification, it has been found that PCA may perform better than LDA in some cases, especially when the size of the training set is small. To better combine the merits of PCA and LDA, rule-based parametric combinations of the two methods have been proposed. However, in those methods the optimal parameter setting is not guaranteed and can only be approximated by exhaustive search. In this paper we propose a learning-based framework that unifies PCA and LDA by adaptively finding features that are both discriminant and descriptive. To eliminate the parameter selection, we incorporate a non-linear boosting process that enhances a pool of hybrid classifiers and combines them into a more accurate one. To evaluate the performance of our boosted hybrid method, we compare it to state-of-the-art LDA variants and the traditional PCA-LDA technique on three widely used face image benchmark databases. The experimental results show that our boosted hybrid discriminant analysis outperforms the other techniques as well as the best single hybrid classifier.
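For reference, a sketch of the classical PCA+LDA baseline the paper builds on (not the proposed boosted hybrid): PCA reduces dimensionality first so that LDA's scatter matrices stay well-conditioned despite small training sets. The dataset and component count are illustrative choices.

```python
# Sketch: the traditional PCA-then-LDA face identification baseline.

from sklearn.datasets import fetch_olivetti_faces
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

faces = fetch_olivetti_faces()                # 400 faces, 40 subjects
Xtr, Xte, ytr, yte = train_test_split(
    faces.data, faces.target, test_size=0.25, stratify=faces.target,
    random_state=0)

pca_lda = Pipeline([
    ("pca", PCA(n_components=100, whiten=True, random_state=0)),
    ("lda", LinearDiscriminantAnalysis()),
])
pca_lda.fit(Xtr, ytr)
print(f"PCA+LDA accuracy: {pca_lda.score(Xte, yte):.3f}")
```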
In managing large collections of digital photographs, there have been many research efforts to compute low-level image features such as texture and color to aid various management tasks (e.g., query-by-example applications or scene classification for image clustering). In this paper, we focus on the assessment of image quality as a complementary feature to improve the manageability of images. Specifically, we propose an effective and efficient algorithm to analyze the focus quality of photographs and provide a quantitative measurement of the assessment. In this algorithm, global figures of merit are computed from matrices of local image statistics such as sharpness, brightness, and color saturation. The global figures of merit represent how well each image meets prior assumptions about the focus quality of natural images, and a collection of them is then used to decide how well-focused an image is. Experimental results show that the method can detect 90% of the out-of-focus photographs labeled by experts while producing 11% false positives. We further apply this quantitative measure to image management tasks, including focus-based content filtering/sorting and image retrieval.
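The paper's exact figures of merit are not reproduced in the abstract; as an illustration of a local-statistics-based focus measure, gradient energy is a crude but standard proxy for sharpness:

```python
# Sketch: gradient-energy sharpness, a stand-in for a focus figure of merit.

import numpy as np
from scipy import ndimage

def sharpness(gray):
    gy, gx = np.gradient(gray.astype(np.float64))
    return float(np.mean(gx ** 2 + gy ** 2))

rng = np.random.default_rng(0)
sharp = rng.uniform(0, 1, (64, 64))               # stand-in textured image
blurred = ndimage.gaussian_filter(sharp, sigma=2) # "out-of-focus" version
print(sharpness(sharp) > 5 * sharpness(blurred))  # True: blur kills gradients
```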
We present the research and development status of two MPEG-7 indexing/search systems under development at the Computer Research Institute of Montreal (CRIM). The first (called ERIC-7) targets content-based encoding of still images and is mainly designed to experiment with the various aspects of the visual MPEG-7/XML schema with the help of analysis and exploration tools. The interface allows navigating graphically among the various descriptors in the XML files and through interactive UML graphics. The second (called MADIS) aims at providing a practical audio-visual MPEG-7 indexing/retrieval tool, within the framework of a light architecture. MADIS is designed to (1) be fully MPEG-7 compliant, (2) address both encoding and search, (3) combine audio, speech and visual modalities and (4) have search capability on the Internet. MADIS currently targets content-based indexing of documentary films.
Video retrieval is mostly based on using text from dialogue, and this remains the most significant component despite progress in other aspects. One problem with this arises when a searcher wants to locate video based on what appears in the video rather than what is being spoken about. Alternatives such as automatically detected features and image-based keyframe matching can be used, though these still need further improvement in quality.
Another modality for video retrieval is based on segmenting objects from video and allowing end-users to use these as part of querying. This uses similarity between query objects and objects from video, and in theory allows retrieval based on what is actually appearing on-screen. The main hurdles to greater use of this approach are the overhead of object segmentation on large amounts of video and the question of whether effective object-based retrieval can actually be achieved.
We describe a system to support object-based video retrieval where a user selects example video objects as part of the query. During a search a user builds up a set of these which are matched against objects previously segmented from a video library. This match is based on MPEG-7 Dominant Colour, Shape Compaction and Texture Browsing descriptors. We use a user-driven semi-automated segmentation process to segment the video archive which is very accurate and is faster than conventional video annotation.
The problem of heterogeneous data mining deals with the computational challenges of searching multimedia data in a unified computational framework that can answer data mining similarity queries accurately and efficiently. Advances in data collection methodologies have generated large data warehouses in an assortment of application domains, including, but not limited to, Internet applications for multimedia retrieval and exchange. Heterogeneous data indexing has proven to be a valuable tool for complex data mining in large data domains that are inherently semi-structured. We propose a solution that integrates the feature vectors of images and text by cooperatively representing them in a multidimensional spatial data structure which has previously exhibited superior search performance in image database domains. We evaluate the results of content-based similarity queries on the indexing schema independently in the image and textual domains, and then study the effect of the choice of similarity metric on the queries. We then propose an indexing schema that integrates the feature vectors of text and images to answer integrated queries on the unified heterogeneous data space. An added advantage of the proposed methodology is that a textual feature vector can query a heterogeneous database to retrieve both text and images as query results. This avoids querying each data domain separately and sequentially scanning the integrated database for similarity results. The proposed methodology is time- and space-efficient, and is capable of answering complex heterogeneous data mining queries in multimedia domains.
In this paper, we describe a user-oriented audiovisual retrieval model called the Semantic Views Model and its application to querying and browsing of audiovisual data described by MPEG-7. The Semantic Views Model was designed based on a user study in a professional TV production and archiving environment. Its goal is to provide a common and simple structure for the description of various audiovisual characteristics which are needed during querying and browsing. We present a query language and a hypermedia model, which are both designed based on the Semantic Views Model, and we describe their implementation to retrieve audiovisual data described by MPEG-7.
Content-based image retrieval in many cases involves performing a direct mapping operation between a query image and images stored in a database. Preliminary results are discussed on using image mapping through unsupervised learning, in the form of a hybrid evolutionary algorithm (HEA), to search for 3-dimensional objects that may be present in the database images. The content-based retrieval problem is formulated as the optimization problem of finding the proper mapping between the stored and the query images. The paper proposes an extension of the HEA-based method of 2-dimensional image mapping to the 3-dimensional case. A set of image transformations is sought such that each transformation is applied to a different section of the image subject to mapping; the sought image transformation becomes a piecewise approximation of the actual 3-D transformation of the object. The 2-D optimization problem of finding a parameter vector minimizing the difference between the images turns into a multi-objective optimization problem of finding a set of feasible parameter vectors that minimize the differences between the sections of the compared images. The search for a proper set of image transformations is conducted in a feature space formed by local image responses, as opposed to the pixel-wise comparison of the actual images used in the 2-D case. Using the image response allows the computational cost of the search to be reduced by applying thresholding techniques and a piecewise approximation of the response matrix. The difference between the images is evaluated in the response space by minimizing the distance between the two-dimensional central moments of the image responses.
Automatic music segmentation and structure analysis from audio waveforms based on a three-level hierarchy is examined in this research, where the three-level hierarchy includes notes, measures and parts. The pitch class profile (PCP) feature is first extracted at the note level. Then, a similarity matrix is constructed at the measure level, where a dynamic time warping (DTW) technique is used to enhance the similarity computation by taking the temporal distortion of similar audio segments into account. By processing the similarity matrix, we can obtain a coarse-grain music segmentation result. Finally, dynamic programming is applied to the coarse-grain segments so that a song can be decomposed into several major parts such as intro, verse, chorus, bridge and outro. The performance of the proposed music structure analysis system is demonstrated for pop and rock music.
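The DTW enhancement of the measure-level similarity computation can be pictured with the standard dynamic-programming recursion; the sketch below (illustrative, with PCP vectors assumed as frame features) returns the warped alignment cost between two measures:

    import numpy as np

    def dtw_cost(seq_a, seq_b):
        """DTW cost between two feature sequences (rows are frames,
        e.g. 12-dimensional pitch class profile vectors)."""
        n, m = len(seq_a), len(seq_b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
                D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m]

    # Measure-level similarity matrix: S[i, j] = exp(-dtw_cost(M_i, M_j)),
    # tolerant to temporal distortion between similar audio segments.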
As a result of advances in audio compression, the availability of broadband Internet access at home, and the popularity of electronic music distribution systems, today's consumers acquire and store an ever-increasing number of songs in their local databases. Moreover, consumer devices with mass random-access storage and sophisticated rendering capabilities make the whole electronic music database available for instant playback. As opposed to traditional music playback, where only a limited number of songs are manually selected, there is a strong need for intelligent play-list generation techniques that utilize the whole database while taking the user's interests into account. Moreover, it is desirable to present these songs in a seamlessly streaming manner with smooth transitions. In this paper, we propose a systematic expressive content retrieval system, called AutoDJ, that achieves both objectives. It automatically creates a play-list by sorting songs according to their low-level features and plays them in a smooth, rhythmically consistent way after audio mixing. AutoDJ first builds a profile for each song using features such as tempo, beat, and key. Afterwards, it uses a similarity metric to build up a play-list based on a "seed" song. Finally, it introduces a smooth transition from one song (profile) to the other by equalizing the tempo and synchronizing the beat phase. We present the system design principles and the signal processing techniques used, as well as a simple AutoDJ demonstrator.
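A minimal sketch of the play-list step (the profile layout and distance are assumptions, not the paper's exact features): starting from a seed song, greedily chain each song to its nearest unplayed neighbor in profile space:

    import numpy as np

    def build_playlist(profiles, seed, length):
        """profiles: (n_songs, d) array, e.g. [tempo, beat strength, key];
        greedily appends the closest unplayed song to the current one."""
        remaining = set(range(len(profiles))) - {seed}
        playlist, current = [seed], seed
        while remaining and len(playlist) < length:
            nxt = min(remaining, key=lambda i: np.linalg.norm(
                profiles[i] - profiles[current]))
            playlist.append(nxt)
            remaining.discard(nxt)
            current = nxt
        return playlist

Chaining on the current song rather than the seed keeps successive transitions gradual, which matches the goal of rhythmically consistent mixing.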
The problem of musical instrument recognition is investigated, and its application to music segmentation is examined in this research. We propose a new framework for extracting features, in which audio frames are not placed uniformly as is done traditionally. The equal spacing method inevitably means that some frames may contain a transition between notes and/or two notes may be included in one frame, so a music frame may consist of sounds from multiple instruments. Onset detection is integrated with frame location selection in this work to mitigate this phenomenon. This new framing scheme, called the onset-aware framing scheme, provides comparable or better performance compared with traditional methods. A new histogram-based feature is also presented and used with other common features in the musical instrument classification task. Feature reduction is adopted to reduce the dimensionality of the feature space. We conduct experiments on data sets of different sizes with both synthesized and real musical signals. Finally, a simple segmentation method based on musical instrument classification is proposed and demonstrated by an example.
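A crude stand-in for the onset detector that drives the frame placement (an energy-jump rule; the thresholds are illustrative, not the paper's) can be sketched as:

    import numpy as np

    def onset_frames(signal, frame_len=1024, hop=512, ratio=1.5):
        """Flag hop positions where short-time energy jumps by more
        than `ratio` over the previous frame; analysis windows are
        then anchored at these onsets instead of uniform spacing."""
        energies = np.array([np.sum(signal[i:i + frame_len] ** 2)
                             for i in range(0, len(signal) - frame_len, hop)])
        return [i for i in range(1, len(energies))
                if energies[i] > ratio * energies[i - 1] + 1e-12]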
We present a consumer video browsing system that enables use of multiple alternative summaries in a simple and effective user interface suitable for consumer electronics platforms. We present a news and talk video segmentation and summary generation technique for this platform. We use face detection on consumer video, and use simple face features such as face count, size, and x-location to classify video segments. More specifically, we cluster 1-face segments using face sizes and x-locations. We observe that different scenes such as anchorperson, outdoor correspondent, weather report, etc. form separate clusters. We then apply temporal morphological filtering on the label streams to obtain alternative summary streams for smooth summaries and effective browsing through stories. We also apply our technique to talk show video to generate separate summaries of monologue segments and guest interviews.
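The temporal morphological filtering of the label stream can be illustrated as follows (a sketch under the assumption that each frame carries one cluster label; the window length is a placeholder):

    import numpy as np
    from scipy.ndimage import binary_closing, binary_opening

    def smooth_label_stream(labels, min_len=5):
        """Per-label morphological opening then closing, so runs
        shorter than min_len frames are removed or merged into their
        neighbors, yielding smooth summary segments."""
        labels = np.asarray(labels)
        out = labels.copy()
        struct = np.ones(min_len, dtype=bool)
        for lab in np.unique(labels):
            mask = binary_closing(binary_opening(labels == lab, struct), struct)
            out[mask] = lab
        return out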
For multimedia servers, data independence is as beneficial as it is for databases: users access the data without referring to the storage format, and the server returns it in many different formats and qualities. The storage format can then be chosen at will, and it should be selected to support a large variety of accesses. For video, some of these accesses even require real-time processing. LLV1 is a layered video format for storing videos without loss of information. Its layers can be read separately, so that scalability is achieved in terms of bandwidth and computational resources. LLV1 has been developed on the basis of XviD, a state-of-the-art implementation of the MPEG-4 Part 2 standard, and is designed for use in multimedia servers to facilitate real-time format conversions, a requirement for data-independent access to media objects. Thus, XviD's advantages in efficient video compression are inherited by LLV1. Orthogonality of the layering is provided by the different enhancement layers with respect to temporal resolution and spatial properties. The compression efficiency is comparable to other lossless formats; however, only LLV1 provides scalability features that can be exploited in real-time processing. Moreover, the scalable design of the decompression algorithm allows for adaptable execution and thus makes QoS control possible. Additionally, the coding algorithm is asymmetric, which further reduces the computational requirements for delivering the multimedia content from storage to the end user.
In this paper, we design and implement a prototype system with an adaptive intra-media synchronization scheme for haptic interactions in distributed virtual environments (DVEs). Interacting with haptic interfaces in DVEs requires a stringent level of QoS from the network; in particular, network delay jitter and packet loss over the Internet may seriously degrade the output quality of haptic media. The proposed system model consists of three layers (application, synchronization, and network) to support haptic interactions over time-varying IP networks. The intra-media synchronization scheme implemented in the system controls buffering time and transmission rate to enhance the QoS of networked haptic interactions. To demonstrate the usefulness of the proposed scheme, we simulate it using the NS-2 simulator and implement simple haptic-based DVEs. According to the simulation results, the proposed scheme provides more stable playout of haptic data under network delay jitter. Moreover, when many clients participate in the DVEs, it can decrease the transmission rate of haptic data, thereby reducing network delay. The experimental results indicate that the proposed scheme can be applied to real DVEs under network delay jitter.
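The buffering-time control can be pictured with a jitter-adaptive playout rule (a generic sketch; the smoothing coefficients are illustrative assumptions, not the paper's values):

    class AdaptivePlayoutBuffer:
        """Intra-media synchronization sketch: buffering time tracks an
        exponentially weighted estimate of mean delay plus a jitter margin."""
        def __init__(self, alpha=0.9, k=4.0):
            self.alpha, self.k = alpha, k
            self.mean_delay, self.jitter = 0.0, 0.0

        def update(self, measured_delay):
            dev = abs(measured_delay - self.mean_delay)
            self.mean_delay = (self.alpha * self.mean_delay
                               + (1 - self.alpha) * measured_delay)
            self.jitter = self.alpha * self.jitter + (1 - self.alpha) * dev
            return self.mean_delay + self.k * self.jitter  # playout delay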
In this paper, we propose a scalable visualization system that offers high-resolution visualization in multiparty collaborative environments. The proposed system provides a coordination technique for driving large-scale high-resolution display systems and for displaying multiple high-quality videos effectively on systems with limited resources. To this end, it includes a distributed visualization application with a generic structure that supports high-resolution video formats, such as DV (digital video) and HDV (high-definition video) streaming, and a decomposable decoding-and-display structure that assigns the separated visualization tasks (decoding and display) to different system resources. The system is built on a high-performance local area network, and the network between the decoding and display tasks is used as a system bus to transfer the decoded pixel data. The main focus of this paper is the network-based decoupling of decoding and display, which allows multiple high-resolution videos to be handled effectively. We explore the feasibility of the proposed system by implementing a prototype and evaluating it over a high-performance network. The experimental results verify the improved scalability of the display system under the proposed structure.
A geometry compression algorithm for 3-D QSplat data using vector quantization (VQ) is proposed in this work. The positions of child spheres are transformed to a local coordinate system determined by the parent-child relationship. The coordinate transform makes child positions more compactly distributed in 3-D space, facilitating effective quantization. Moreover, we develop a constrained encoding method for sphere radii, which guarantees hole-free surface rendering at the decoder side. Simulation results show that the proposed algorithm provides faithful rendering quality even at low bitrates.
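The parent-local coordinate transform can be sketched as follows (illustrative only: a scalar quantizer stands in for the paper's vector quantizer, and the radius normalization is an assumption):

    import numpy as np

    def to_parent_local(child_center, parent_center, parent_radius):
        """Child sphere center in the parent's local frame, normalized
        by the parent radius so children cluster compactly near the
        origin, which eases quantization."""
        return (np.asarray(child_center, float)
                - np.asarray(parent_center, float)) / parent_radius

    def quantize_local(v, levels=32):
        """Uniform quantization of local coordinates assumed in [-1, 1]
        (stand-in for the actual VQ codebook lookup)."""
        q = np.rint((np.clip(v, -1, 1) + 1) / 2 * (levels - 1))
        return q.astype(int)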
In this work, we propose two compression algorithms for PointTexture 3D sequences: an octree-based scheme and a motion-compensated prediction scheme. The first scheme represents each PointTexture frame hierarchically using an octree. The geometry information in the octree nodes is encoded by the prediction by partial matching (PPM) method. The encoder supports progressive transmission of the 3D frame by transmitting the octree nodes in a top-down manner. The second scheme adopts motion-compensated prediction to exploit the temporal correlation in 3D sequences. It first divides each frame into blocks, and then estimates the motion of each block using the block matching algorithm. In contrast to motion-compensated 2D video coding, the prediction residual may take more bits than the original signal. Thus, in our approach, motion compensation is used only for the blocks that can be replaced by their matching blocks; the other blocks are PPM-encoded. Extensive simulation results demonstrate that the proposed algorithms provide excellent compression performance.
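The per-block mode decision described above can be sketched as follows (an illustrative reconstruction, not the authors' code: a candidate is accepted for motion compensation only if it replaces the block exactly):

    import numpy as np

    def choose_block_mode(cur_block, prev_candidates):
        """prev_candidates: list of (motion_vector, block) pairs from the
        block matching search in the previous frame. Motion compensation
        is used only when a candidate reproduces the current block, since
        a residual may cost more bits than the original signal."""
        for mv, cand in prev_candidates:
            if np.array_equal(cur_block, cand):
                return ("MC", mv)     # transmit the motion vector only
        return ("PPM", None)          # PPM-encode the block itself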
A compound adaptive digital watermarking algorithm is presented, which can embed a robust watermark and a fragile watermark in one image simultaneously. The paper realizes and verifies the embedding strategy and embedding process, as well as the extraction strategy and extraction process, of the compound watermark. The validation results indicate that the compound embedding algorithm sacrifices only a small amount of robust-watermark performance while achieving a dual-protection function for the original images.
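A generic illustration of compound embedding (loudly hypothetical: a mid-frequency DCT perturbation stands in for the robust watermark and an LSB parity bit for the fragile one; this is not the paper's algorithm):

    import numpy as np
    from scipy.fft import dctn, idctn

    def embed_compound(img, robust_bit, strength=8.0):
        """img: 2-D uint8 grayscale image, at least 9x9. Embeds one
        robust bit in a mid-frequency DCT coefficient, then a fragile
        parity bit in a pixel LSB; tampering breaks the parity while
        the DCT mark survives mild processing."""
        coefs = dctn(img.astype(float), norm="ortho")
        coefs[8, 8] += strength if robust_bit else -strength
        marked = np.clip(np.rint(idctn(coefs, norm="ortho")),
                         0, 255).astype(np.uint8)
        parity = int((marked >> 1).sum()) & 1
        marked[0, 0] = (marked[0, 0] & 0xFE) | parity
        return marked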
In this paper, a new watermarking scheme that counters geometric attacks is proposed, based on the matching of corner points extracted by the Harris corner detector. In the embedding process, the watermark is adaptively embedded according to the human visual system (HVS). In the detection process, a new matching method performs coarse matching of the corner points, and random sample consensus (RANSAC) is used to refine the matches. The parameters of the affine transform are then precisely estimated from the matched corner points, and the watermark is detected after registration of the geometrically attacked watermarked image. The experimental results show that the proposed scheme can not only counter geometric attacks and signal processing but also improve the watermark embedding capacity.
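The registration step can be approximated with off-the-shelf tools (a sketch: OpenCV's Harris-based corner extraction, with pyramidal Lucas-Kanade tracking standing in for the paper's coarse matching method, and RANSAC refining the affine estimate):

    import cv2
    import numpy as np

    def estimate_affine(img_ref, img_att):
        """img_ref, img_att: 8-bit grayscale images. Returns the 2x3
        affine matrix mapping reference corners to attacked corners;
        its inverse registers the attacked image for detection."""
        pts_ref = cv2.goodFeaturesToTrack(img_ref, 200, 0.01, 10,
                                          useHarrisDetector=True)
        pts_att, status, _ = cv2.calcOpticalFlowPyrLK(img_ref, img_att,
                                                      pts_ref, None)
        good = status.ravel() == 1    # keep coarsely matched corners
        M, inliers = cv2.estimateAffine2D(pts_ref[good], pts_att[good],
                                          method=cv2.RANSAC)
        return M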
Audio retrieval is an important research topic in the audio field. A good audio retrieval system greatly helps users find target audio materials. In such a system, while audio features are fundamental in representing an audio clip, the similarity measure is also an important factor affecting retrieval performance. Previous research has proposed many audio features and distance measures, but there has not yet been a thorough study of the effectiveness of different features and distance measures. Therefore, in this paper, we perform a comparative study of various audio features and various distance (similarity) measures. The compared audio features include Mel-frequency cepstral coefficients (MFCC), linear predictive coding (LPC) coefficients, sub-band energy distribution, and some other temporal/spectral features, while the compared distances include the Euclidean distance, Kullback-Leibler (K-L) divergence, Mahalanobis distance, and Bhattacharyya distance. The study is expected to be helpful in the further design of audio retrieval systems.
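For concreteness, the compared distances can be written out for per-clip features modeled as a Gaussian with mean m and covariance C (standard formulas, not tied to the paper's exact setup):

    import numpy as np

    def euclidean(x, y):
        return float(np.linalg.norm(x - y))

    def mahalanobis(x, y, cov):
        d = x - y
        return float(np.sqrt(d @ np.linalg.inv(cov) @ d))

    def kl_gaussian(m1, c1, m2, c2):
        """K-L divergence KL(N1 || N2) between two Gaussians."""
        k, ic2, d = len(m1), np.linalg.inv(c2), m2 - m1
        return 0.5 * (np.trace(ic2 @ c1) + d @ ic2 @ d - k
                      + np.log(np.linalg.det(c2) / np.linalg.det(c1)))

    def bhattacharyya_gaussian(m1, c1, m2, c2):
        c, d = (c1 + c2) / 2, m2 - m1
        return (d @ np.linalg.inv(c) @ d / 8
                + 0.5 * np.log(np.linalg.det(c)
                               / np.sqrt(np.linalg.det(c1) * np.linalg.det(c2))))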
In this paper an audio separation algorithm is presented, which is based on independent component analysis (ICA). Audio separation could be the basis for many applications, for example in the field of telecommunications, quality enhancement of audio recordings, or audio classification tasks. Well-known ICA algorithms are currently not usable for real-world recordings, because they are designed for signal mixtures based on linear and time-constant mixing matrices. To adapt a standard ICA algorithm to real-world two-channel auditory scenes with two audio sources, the input audio streams are segmented in the time domain and a constant mixing matrix is assumed within each segment. The next steps are a time-delay estimation for each audio source in the mixture and a determination of the number of existing sources. In the following processing steps, the input signals are time-shifted for each source and a standard ICA for linear mixtures is performed. After that, the remaining tasks are an evaluation of the ICA results and the construction of the resulting audio streams containing the separated sources.
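The time-delay estimation and per-segment ICA steps can be sketched with standard building blocks (GCC-PHAT for the delay, and scikit-learn's FastICA as a stand-in for the standard linear-mixture ICA applied to one segment):

    import numpy as np
    from sklearn.decomposition import FastICA

    def gcc_phat_delay(x, y):
        """Delay of y relative to x in samples, via generalized
        cross-correlation with phase transform."""
        n = len(x) + len(y)
        X, Y = np.fft.rfft(x, n), np.fft.rfft(y, n)
        r = X * np.conj(Y)
        cc = np.fft.irfft(r / (np.abs(r) + 1e-12), n)
        shift = int(np.argmax(np.abs(cc)))
        return shift if shift < n // 2 else shift - n

    def separate_segment(mix):
        """mix: (n_samples, 2) time-aligned two-channel segment with an
        assumed constant mixing matrix; returns two separated sources."""
        return FastICA(n_components=2, random_state=0).fit_transform(mix)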
In this paper, we propose a multiple description coder for motion vectors (MV-MDC) based on the data-partitioned bitstream of the H.264/AVC standard. The proposed multiple description (MD) encoder separates the motion vector (MV) into two parts having the same priority and transmits each part in an independent packet. The proposed MD decoding scheme utilizes two matching criteria to find an accurate MV estimate when one of the MV descriptions is lost. Simulation results show that, compared to simply duplicated bitstream transmission, the proposed MV-MDC scheme saves a large amount of data without serious visual quality loss in the reconstructed picture.
Most fast block motion estimation algorithms reported in the literature aim to reduce computation in terms of the number of search points, and thus do not fit well with multimedia processors due to their irregular data flow. For multimedia processors, proper reuse of data is more important than reducing the number of absolute difference operations, because the execution-cycle performance depends strongly on the number of off-chip memory accesses. Therefore, in this paper, we propose a sub-sampling predictive line search (SS-PLS) algorithm that uses a line search pattern to increase data reuse from the on-chip local buffer, and checks sub-sampled points in the line search pattern to reduce unnecessary SAD operations. Our experimental results show that the prediction error (MAE) performance of the proposed SS-PLS is similar to that of the full search block matching algorithm (FSBMA) and better than that of the hexagonal-based search (HEXBS). The proposed SS-PLS also requires far fewer off-chip memory accesses than conventional fast motion estimation algorithms such as HEXBS and the predictive line search (PLS). As a result, the proposed SS-PLS algorithm requires fewer execution cycles on a multimedia processor.
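The data-reuse idea can be illustrated with a sub-sampled horizontal line search (a sketch with illustrative parameters, not the exact SS-PLS settings): all candidates share the same reference rows, so they can be served from an on-chip buffer, and skipping every other candidate cuts SAD work:

    import numpy as np

    def line_search(cur_blk, ref, x, y, pred_mv, reach=8, step=2):
        """Evaluate SAD on a horizontal line through the predicted MV.
        Returns the best (motion_vector, sad) pair."""
        h, w = cur_blk.shape
        best_mv, best_cost = None, float("inf")
        for dx in range(-reach, reach + 1, step):   # sub-sampled points
            mx, my = pred_mv[0] + dx, pred_mv[1]
            if y + my < 0 or x + mx < 0:
                continue
            cand = ref[y + my:y + my + h, x + mx:x + mx + w]
            if cand.shape != cur_blk.shape:
                continue
            cost = int(np.abs(cand.astype(int) - cur_blk.astype(int)).sum())
            if cost < best_cost:
                best_mv, best_cost = (mx, my), cost
        return best_mv, best_cost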
The standardization of the scalable extension of H.264 has called for additional functionality, based on the H.264 standard, to support combined spatio-temporal and SNR scalability. For the entropy coding of the H.264 scalable extension, the Context-based Adaptive Binary Arithmetic Coding (CABAC) scheme has been considered so far. In this paper, we present a new context modeling scheme that uses the inter-layer correlation between syntax elements, which improves the coding efficiency of entropy coding in the H.264 scalable extension. Simulation results for encoding the syntax element mb_type show that the proposed method improves coding efficiency by up to 16% in terms of bit savings, owing to the estimation of a more adequate probability model.
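The gist of inter-layer context modeling can be shown with a toy adaptive binary model (a deliberate simplification of CABAC's probability state machine; the counting scheme is an assumption for illustration): bins of the enhancement-layer mb_type are coded with a probability model selected by the co-located base-layer mb_type, so statistically different cases keep separate models.

    class BinaryContext:
        """Toy adaptive binary probability model with smoothed counts."""
        def __init__(self):
            self.c0 = self.c1 = 1

        def p1(self):
            return self.c1 / (self.c0 + self.c1)

        def update(self, bin_val):
            if bin_val:
                self.c1 += 1
            else:
                self.c0 += 1

    contexts = {}

    def code_bin(base_mb_type, bin_val):
        """Select the context by the base-layer mb_type, read the
        probability that would drive the arithmetic coder, then adapt."""
        ctx = contexts.setdefault(base_mb_type, BinaryContext())
        p = ctx.p1()
        ctx.update(bin_val)
        return p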