KEYWORDS: Video, Computer programming, Optical engineering, Video processing, Multimedia, Optimization (mathematics), Distortion, Detection and tracking algorithms, Fluctuations and noise, Sun
Video transcoding is usually conducted when the device does not support the current format or has limited storage capacity. Video transcoding is a computation-intensive process that changes one format to another one, and various multimedia applications have made it important in recent years. We present a new motion vector (MV) composition algorithm for arbitrary frame-size video transcoding. The proposed method uses the relation between the prediction error and the required bits when encoding MVs to form an auxiliary function called the Lagrange function. Therefore, MV composition is converted into a constrained optimization problem. Through the Lagrangian optimization, a dominant MV is selected from a set of candidate MVs by minimizing this cost function. The major contribution of the proposed method is that we emphasize the effect of the bits required to encode MVs; therefore, at the same target bitrate, the proposed method provides better coding performance. Experimental results show that the proposed method has better performance in terms of both objective and subjective qualities than other existing methods.
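The Lagrangian selection step described in this abstract can be illustrated with a minimal sketch. It assumes the common rate-distortion form J = D + λR, where D is a prediction-error measure and R is the number of bits needed to encode the MV; the abstract does not give the exact cost function, so the function names, the toy tables, and the value of λ below are all hypothetical.

```python
from typing import Callable, Iterable, Tuple

MV = Tuple[int, int]  # (horizontal, vertical) displacement


def lagrangian_cost(distortion: float, bits: float, lam: float) -> float:
    """Assumed cost J = D + lam * R (not taken from the paper)."""
    return distortion + lam * bits


def select_dominant_mv(
    candidates: Iterable[MV],
    distortion_of: Callable[[MV], float],
    bits_of: Callable[[MV], float],
    lam: float,
) -> MV:
    """Pick the candidate MV with the smallest Lagrangian cost."""
    return min(
        candidates,
        key=lambda mv: lagrangian_cost(distortion_of(mv), bits_of(mv), lam),
    )


# Hypothetical toy data: prediction error and MV bit cost per candidate.
toy_distortion = {(0, 0): 120.0, (4, -2): 90.0, (8, 6): 85.0}
toy_bits = {(0, 0): 2, (4, -2): 8, (8, 6): 14}


def pick(lam: float) -> MV:
    """Run the selection on the toy tables for a given multiplier."""
    return select_dominant_mv(
        toy_distortion, toy_distortion.get, toy_bits.get, lam
    )
```

With the toy tables, a multiplier of zero favors the lowest-distortion candidate, while larger multipliers shift the choice toward MVs that are cheaper to encode, which is the trade-off the abstract emphasizes.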
This paper proposes a new approach to improving the coding performance of intra block coding in H.264/AVC using a finite state machine. Exploiting the high correlation between neighboring blocks, a finite state machine is employed at both the encoder and the decoder to reduce the number of bits required for encoding. Two extra intra prediction modes are created in the proposed method. Through these two modes, the number of bits required to denote the current block is greatly reduced and a low bit rate can be achieved. Experimental results show that the proposed method can greatly improve the coding efficiency of intra macroblock coding in H.264/AVC.
An image authentication and tampering localization technique based on a wavelet-based digital watermarking procedure [Opt. Express 3(12), 491-496 (1998)] is proposed. To determine whether a given watermarked image has been tampered with or not, the similarity between the extracted and embedded watermarks is measured. If the similarity is less than a threshold value, the proposed sequential watermark alignment based on a coefficient stamping (SWACS) scheme is used to determine the modified wavelet coefficients corresponding to the tampered region. Then, the morphological region growing and subband duplication (MRGSD) scheme are used to include neighboring wavelet coefficients and then duplicate the wavelet coefficients in other subbands. The experimental results show that the proposed SWACS and MRGSD schemes can efficiently identify different types of image tampering. Moreover, the detection performance of the proposed system on various sizes of the watermark and tampered region is also evaluated.
A modified partial distortion search algorithm considering the neighboring-block correlation property is proposed for fast motion estimation. The motion vector information of neighboring coded blocks is used to predict the possible occurrence region of the global-minimum-distortion position of the current block. In addition, a dynamic search-range decision algorithm is also proposed for automatically changing the size of the search range. Afterwards, the normalized partial distortion search is performed in the selected region instead of the whole search window. Through the proposed algorithms, the computational complexity can be significantly reduced with slight objective quality degradation.
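As a rough illustration of the idea, the sketch below shows a basic partial distortion search with early termination, restricted to a small region around a predicted center. It is not the normalized partial distortion search or the dynamic search-range decision from the abstract: the row-wise SAD accumulation, the function name, and the `center` and `radius` parameters are assumptions made for this example.

```python
def partial_distortion_search(cur, ref, top, left, size, center, radius):
    """Find the displacement (dy, dx) minimizing the SAD of one block.

    cur, ref : 2-D lists of pixel values (current and reference frames)
    top, left, size : position and side length of the block in `cur`
    center : (dy, dx) predicted from neighboring blocks (assumed given)
    radius : half-width of the restricted search region
    """
    best_mv, best_sad = None, float("inf")
    cy, cx = center
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + size > len(ref) or x + size > len(ref[0]):
                continue
            sad = 0
            for r in range(size):
                sad += sum(
                    abs(cur[top + r][left + c] - ref[y + r][x + c])
                    for c in range(size)
                )
                # Early termination: the partial sum already matches or
                # exceeds the best full SAD, so this candidate cannot win.
                if sad >= best_sad:
                    break
            else:
                best_mv, best_sad = (dy, dx), sad
    return best_mv, best_sad
```

The saving comes from abandoning most candidates after only a few rows, and from visiting only the positions near the predicted center instead of the whole search window.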
In this paper, we propose a scheme for TV news segmentation that exploits efficient visual features. The proposed scheme is divided into three parts: shot change detection based on skin color, probable anchorperson shot detection, and anchorperson detection. According to experimental results, the proposed method can efficiently decompose TV news into anchorperson shots and report shots. Compared with traditional face detection methods, the proposed method can robustly exclude non-anchorperson shots in report shots, such as interview scenes. Experimental results are given to demonstrate the feasibility and efficiency of the proposed technique.
A multimedia database system should deal efficiently with both image compression and retrieval functions. It is critical to develop image indexing techniques that search databases based on their content in a compressed domain. We propose a new scheme, query by index image, based on vector quantization, to facilitate image retrieval in a compressed domain. The proposed algorithm exploits different index images obtained by sorting codevectors to capture various kinds of image feature. Hence, intrablock correlation and interblock correlation in an image can be efficiently represented. Our proposed algorithm not only can extract features from the pixel domain but also from a transform domain, such as that of wavelet coefficients. Experimental results demonstrate that the retrieval performance of the proposed scheme is more accurate than that of other similar methods.
Arbitrary shaped coding is an important issue of MPEG-4. In this study, an efficient shaped coding method, called the boundary block-searching (BBS) algorithm, which can enhance the coding efficiency of conventional padding schemes, is proposed. The proposed BBS algorithm assumes that boundary blocks have strong correlation even though they are not connected. For an input boundary block, the most similar block (only object pixels are considered) is sought from the previously coded data. Instead of being encoded by the use of discrete cosine transform, the boundary block is encoded by a position vector, which indicates the relative position of the most similar block. Therefore, the number of bits required to denote the boundary block is greatly reduced and low bit rate can be achieved. For two video sequences under different test conditions, simulation results show that the proposed BBS algorithm can greatly improve coding efficiency.
Intelligent video pre-processing and authoring techniques that help people create MTV-style music video clips are investigated in this research. First, we present an automatic approach to detect and remove bad shots that often occur in home video, such as shots with poor lighting or motion blur. Then, we consider the generation of MTV-style video clips by performing video and music tempo analysis and seeking an effective way to match these two tempos. Experimental results are given to demonstrate the feasibility and efficiency of the proposed techniques for home video editing.
An approach to extract traffic events by integrating the low-level, middle-level, and high-level feature extraction modules is developed in this research. To be more specific, the low-level module extracts features such as motion, size, and location. The middle-level module builds a bridge between the road surface plane in the real world and the captured image plane by geometric analysis. Finally, the high-level module looks for traffic events such as "traffic jam", "lane change", and "traffic rule violation", which require the understanding of the video contents in a specific knowledge domain. In the high-level module, various traffic events are related to motion characteristics obtained from the middle-level module. It is demonstrated by experimental results that the proposed system can achieve robust traffic event extraction. The effectiveness of the proposed technique is analyzed. Conventional traffic event extraction methods demand the knowledge of capturing conditions for camera calibration. This requirement can be greatly relaxed in our proposed scheme.
A skimming system for movie content exploration is proposed using story units extracted via general tempo analysis of audio and visual data. Quite a few schemes have been proposed to segment video data into shots with low-level features, yet the grouping of shots into meaningful units, called story units here, is important and challenging. In this work, we detect similar shots using key frames and include these similar shots as a node in the scene transition graph. Then, an importance measure is calculated based on the total length of each node. Finally, we select sinks and shots according to this measure. Based on these semantic shots, meaningful skims can be successfully generated. Simulation results are presented to show that the proposed video skimming scheme can preserve the essential and significant content of the original video data.
Story units are extracted in this research by general tempo analysis, which includes the tempos of audio and visual information. Although many schemes have been proposed to successfully segment video data into shots using basic low-level features, how to group shots into meaningful units called story units is still a challenging problem. By focusing on a certain type of video such as sport or news, we can explore models with the specific application domain knowledge. For movie contents, many heuristic rules based on audiovisual clues have been proposed with limited success. We propose a method to extract story units using general tempo analysis. Experimental results are given to demonstrate the feasibility and efficiency of the proposed technique.
KEYWORDS: Distortion, Databases, System identification, Feature extraction, Internet, Signal processing, Bandpass filters, System integration, Multimedia, Data processing
In this work, we present an audio content identification system that identifies some unknown audio material by comparing its fingerprint with those extracted off-line and saved in the music database. We will describe in detail the procedure to extract audio fingerprints and demonstrate that they are robust to noise and content-preserving manipulations. The main feature in the proposed system is the zero-crossing rate extracted with the octave-band filter bank. The zero-crossing rate can be used to describe the dominant frequency in each subband with a very low computational cost. The size of audio fingerprint is small and can be efficiently stored along with the compressed files in the database. It is also robust to many modifications such as tempo change and time-alignment distortion. Besides, the octave-band filter bank is used to enhance the robustness to distortion, especially those localized on some frequency regions.
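To make the zero-crossing feature concrete, here is a minimal sketch of how a zero-crossing rate can indicate the dominant frequency of a signal. It operates on a single band and omits the octave-band filter bank and the fingerprint construction from the abstract; the function names and the sign convention for zero-valued samples are assumptions made for this example.

```python
def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ.

    Zero is treated as non-negative (an assumption of this sketch).
    """
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


def estimate_dominant_frequency(samples, sample_rate):
    """Rough frequency estimate in Hz for a roughly sinusoidal band.

    A sinusoid crosses zero twice per cycle, so the frequency is about
    half the number of crossings per second.
    """
    return zero_crossing_rate(samples) * sample_rate / 2.0
```

The cost is one comparison per sample, which is consistent with the abstract's point that the feature is very cheap to compute.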
Vision-based highway monitoring systems play an important role in transportation management and services owing to their powerful ability to extract a variety of information. The detection accuracy of vision-based systems is, however, sensitive to environmental factors such as lighting, shadow, and weather conditions, and it is still a challenging problem to maintain detection robustness at all times. In this research, we present a novel method to enhance detection and tracking accuracy at nighttime based on rear-view monitoring. In addition, a method is proposed to improve background detection and extraction, which usually serves as the first step in moving object region detection. Finally, the effectiveness of the rear-view technique is analyzed. We compare the tracking accuracy of the front-view and rear-view techniques, and show that the proposed system can achieve higher detection accuracy at nighttime.
KEYWORDS: Video, Visualization, Information visualization, Image segmentation, Feature extraction, Data processing, Sensors, Cameras, System integration, Visual information processing
A robust TV commercial detection system is proposed in this research. Even though several methods were investigated to address the TV commercial detection problem and interesting results were obtained before, most previous work focuses on features within a short temporal window. These methods are suitable for on-line detection, but often result in higher false alarm rates as a trade-off. To reduce the false alarm rate, we explore audiovisual features in a larger temporal window. Specifically, we group shots into scenes using audio data processing, and then obtain features that are related to commercial characteristics from scenes. Experimental results are given to demonstrate the effectiveness of the proposed system.
The vision-based traffic monitoring system provides an attractive solution in extracting various traffic parameters such as the count, speed, flow and concentration from the processing of video data captured by a camera system. The detection accuracy is however affected by various environment factors such as shadow, occlusion, and lighting. Among these, the occurrence of occlusion is one of the major problems. In this work, a new scheme is proposed to detect the occlusion and determine the exact location of each vehicle. The proposed algorithm is based on the matching of images from multiple cameras. In the proposed scheme, we do not need edge detection, region segmentation, and camera calibration operations, which often suffer from the variation of environmental conditions. Experimental results are given to verify that the proposed technique is effective for vision-based highway surveillance systems.