In this paper, a method for detecting infringements or modifications of a video in real time is proposed. The method first segments a video stream into shots and then extracts reference frames as keyframes. This process employs a Singular Value Decomposition (SVD) technique developed in this work. Next,
for each input video (represented by its keyframes), an ordinal-based signature and SIFT (Scale Invariant Feature Transform) descriptors are generated. The ordinal-based method employs a two-level bitmap indexing scheme to construct the index for each video signature: the first level clusters all input keyframes into k clusters, while the second level converts the ordinal-based signatures into bitmap vectors. The SIFT-based
method directly uses the descriptors as the index. Given a suspect video (streamed or transferred over the Internet), we generate its signature (ordinal and SIFT descriptors) and then compute the similarity between this signature and those stored in the database, using the ordinal signatures and the SIFT descriptors separately. For the similarity measure, Boolean operators are utilized alongside the Euclidean distance during the matching process. We have tested our system through several experiments on 50 videos (each about half an hour in duration) obtained from the TRECVID 2006 data set. For the experimental setup, we followed the conditions specified for the TRECVID 2009 "Content-based copy detection" task; in addition, we referred to the requirements issued in the MPEG call for proposals on a similar task. Initial results show that our framework is effective and robust. Compared with our previous work, in addition to the reductions in storage space and processing time achieved by the ordinal-based method, the introduction of SIFT features raises the overall accuracy to an F1 measure of about 96% (an improvement of about 8%).
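To make the ordinal-to-bitmap conversion and Boolean matching concrete, the following is a minimal Python sketch rather than the paper's exact implementation: the 3x3 block grid, the one-hot rank encoding, and all function names are illustrative assumptions.

```python
import numpy as np

def ordinal_signature(frame, grid=(3, 3)):
    """Ordinal signature: rank the average intensities of a grid of blocks."""
    h, w = frame.shape
    bh, bw = h // grid[0], w // grid[1]
    means = [frame[i*bh:(i+1)*bh, j*bw:(j+1)*bw].mean()
             for i in range(grid[0]) for j in range(grid[1])]
    return np.argsort(np.argsort(means))  # rank of each block, 0..n-1

def to_bitmap(signature):
    """One-hot encode each block's rank into a flat bit vector."""
    n = len(signature)
    bits = np.zeros(n * n, dtype=np.uint8)
    for block, rank in enumerate(signature):
        bits[block * n + rank] = 1
    return bits

def bitmap_similarity(a, b):
    """Boolean matching: fraction of blocks whose ranks agree,
    computed with AND instead of a Euclidean distance."""
    return np.bitwise_and(a, b).sum() / a.sum()
```

Two identical frames share every rank bit (similarity 1.0), while a uniform brightness shift leaves the block ranks, and hence the bitmap, unchanged; this is the intuition behind the robustness of the ordinal measure.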
In this paper, we propose a framework for detecting near-duplicate copies of a video based on an ordinal method. The framework also incorporates a bitmap indexing structure in place of the conventional indexing structure used in our previously published work. With this method, two levels of indices are constructed. The first level clusters the keyframes representing each input video into k clusters; these clusters and their associated keyframes form the first-level index. The second level converts the ordinal-based video signatures (generated using the technique developed in our earlier work) into bitmap vectors. By adopting this two-level indexing scheme, query processing times are significantly reduced, because the system needs to match only the videos in clusters relevant to the query rather than every video in the database. Additionally, the bitmap structure used for indexing requires less storage space. Furthermore, we are able to employ low-cost Boolean operations such as AND, OR, and XOR in the matching process instead of Euclidean distance or similar matching algorithms, which reduces the computational time for video matching. The system has been shown to effectively reduce the space needed to store collections of video signatures in a database, as well as to improve overall system performance. In addition, initial results show that the system is effective and robust to several transformations, such as changes in brightness, color, contrast, and resolution (reduction), as well as the addition of noise.
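One way the two-level scheme described above might be organized is sketched below in Python. This is a hedged illustration: the class name TwoLevelBitmapIndex, the use of precomputed cluster centroids in place of the actual clustering step, and the XOR bit-count scoring are our assumptions, not the paper's exact design.

```python
import numpy as np

class TwoLevelBitmapIndex:
    """Level 1: route keyframe features to one of k clusters.
       Level 2: store each video signature as a bitmap inside its cluster."""

    def __init__(self, centroids):
        self.centroids = np.asarray(centroids)       # k cluster centres
        self.buckets = {i: [] for i in range(len(centroids))}

    def _cluster(self, feature):
        # Nearest-centroid assignment (stand-in for the clustering step).
        return int(np.linalg.norm(self.centroids - feature, axis=1).argmin())

    def add(self, video_id, feature, bitmap):
        self.buckets[self._cluster(feature)].append((video_id, bitmap))

    def query(self, feature, bitmap, top=5):
        # Only signatures in the query's cluster are compared, and each
        # comparison is a cheap XOR bit count (Hamming distance).
        candidates = self.buckets[self._cluster(feature)]
        scored = [(vid, int(np.bitwise_xor(bm, bitmap).sum()))
                  for vid, bm in candidates]
        return sorted(scored, key=lambda t: t[1])[:top]
```

Because both the routing (level 1) and the scoring (level 2) touch only one cluster, query time scales with the size of a cluster rather than with the whole collection.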
This paper proposes an algorithm for generating a video signature based on an ordinal measure. Current methods that use a measure of temporal ordinal rank are robust to many transformations but can only detect the entire query video, not a segment of it, while methods that use local features may be more robust to certain transformations but less robust to excessive noise. The proposed algorithm incorporates region-based spatial information while remaining strongly robust to noise, differing resolutions, illumination shifts, and video file formats. In our method, a frame is first divided into blocks. For each pixel in a block, a slice (a binary image obtained by comparing the greyscale intensity of every pixel in the frame with that of the reference pixel) is generated. The slices of all the pixels in a block are then added component-wise to obtain a metaslice for the block. To compute the distance between any two frames, the Euclidean distance between corresponding metaslices of the two frames is computed, giving the metadistance between two blocks. Summing the metadistances over all blocks and normalizing gives the final measure of distance between the two frames. To improve the speed of the algorithm, keyframes are first downsized, and pixel intensity values are represented by the average of a small block. A table of frame differences between two sets of keyframes from two video sequences is constructed and then converted into a similarity matrix using a threshold. The longest chain of consecutive similar keyframes is then found, yielding the best-matching video segment between the two videos. This algorithm can account for differences between videos at various scales and is useful for finding duplicate or modified copies of a query video in a database. Preliminary experimental results are encouraging and demonstrate the potential of the proposed algorithm.
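The slice/metaslice computation and the chain matching lend themselves to a short sketch. The Python code below is one illustrative reading of the description above, not the authors' implementation; in particular, the strict greater-than comparison for slices, the normalization factor, and the diagonal-run search for the longest chain are our assumptions.

```python
import numpy as np

def metaslices(frame, grid=(2, 2)):
    """Metaslice of a block = component-wise sum of the binary slices of
    its pixels, where slice(p)[x, y] = 1 iff frame[x, y] > frame[p].
    Equivalently, metaslice[x, y] counts the block pixels whose intensity
    is below frame[x, y], so no slice need be formed explicitly."""
    h, w = frame.shape
    bh, bw = h // grid[0], w // grid[1]
    flat = frame.ravel()
    out = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            block = np.sort(frame[i*bh:(i+1)*bh, j*bw:(j+1)*bw].ravel())
            ms = np.searchsorted(block, flat, side='left')  # strict '<' count
            out.append(ms.reshape(h, w).astype(np.float32))
    return out

def frame_distance(f1, f2, grid=(2, 2)):
    """Sum the Euclidean metadistances between corresponding metaslices;
    the normalization by frame size and block count is one plausible choice."""
    m1, m2 = metaslices(f1, grid), metaslices(f2, grid)
    return sum(np.linalg.norm(a - b) for a, b in zip(m1, m2)) / (len(m1) * f1.size)

def longest_chain(dist_table, threshold):
    """Longest diagonal run of below-threshold entries in the keyframe
    difference table -> the best-matching segment between the two videos."""
    sim = dist_table < threshold
    n, m = sim.shape
    best_len, best_off = 0, 0
    for off in range(-(n - 1), m):
        run = cur = 0
        for s in np.diagonal(sim, off):
            cur = cur + 1 if s else 0
            run = max(run, cur)
        if run > best_len:
            best_len, best_off = run, off
    return best_len, best_off   # chain length and diagonal offset
```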
KEYWORDS: Image processing, Video, Digital signal processing, RGB color model, Video surveillance, Video processing, Signal processing, LCDs, Digital image processing, Raster graphics
In this work, we developed and implemented an image capture and processing system capable of capturing images from an input video in real time. The input video can come from a PC, a video camcorder, or a DVD player. The system has two modes of operation. In the first mode, an input image from the PC is processed on the processing board (a development platform with a digital signal processor) and displayed on the PC. In the second mode, the currently captured image from the video camcorder (or DVD player) is processed on the board but displayed on an LCD monitor. The major difference between our system and existing conventional systems is that the image-processing functions are performed on the board rather than on the PC, so that these functions can be reused in further developments on the board. The user controls the operation of the board through a Graphical User Interface (GUI) provided on the PC. To ensure smooth image data transfer between the PC and the board, we employed Real Time Data Transfer (RTDX) technology to create a link between them. For image processing, we developed three main groups of functions: (1) Point Processing, (2) Filtering, and (3) 'Others'. Point Processing includes rotation, negation, and mirroring. The Filtering category provides median, adaptive, smoothing, and sharpening filters in the time domain. The 'Others' category provides auto-contrast adjustment, edge detection, segmentation, and sepia coloring; these functions either add an effect to the image or enhance it. We developed and implemented our system in the C/C# programming languages on the TMS320DM642 (DM642) board from Texas Instruments (TI). The system was showcased at the College of Engineering (CoE) exhibition 2006 at Nanyang Technological University (NTU), where more than 40 users tried it, demonstrating that our system is adequate for real-time image capture. Our system can be applied in areas such as medical imaging and video surveillance.
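The board implementation itself is in C/C# on the DM642; purely to illustrate the kind of point-processing and 'Others' functions listed above, here is a short NumPy sketch. The sepia coefficients are a commonly used matrix, not necessarily the ones used on the board.

```python
import numpy as np

def negate(img):
    """Point processing: invert an 8-bit greyscale or RGB image."""
    return 255 - img

def mirror(img):
    """Point processing: horizontal mirroring."""
    return img[:, ::-1]

def sepia(rgb):
    """'Others' effect: apply a commonly used sepia tone matrix."""
    m = np.array([[0.393, 0.769, 0.189],
                  [0.349, 0.686, 0.168],
                  [0.272, 0.534, 0.131]], dtype=np.float32)
    return np.clip(rgb.astype(np.float32) @ m.T, 0, 255).astype(np.uint8)
```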
This paper presents a multi-modal, two-level framework for news story segmentation designed to cope with large news video corpora such as the data used in the TREC video retrieval (TRECVID) evaluations. We divide our system into two levels: a shot level, which assigns one of several pre-defined semantic tags to each input shot, and a story level, which performs story segmentation based on the output of the shot level and other temporal features. We demonstrate the generality of our framework by employing two machine-learning approaches at the story level. The first approach uses a statistical method, Hidden Markov Models (HMMs), whereas the second uses a rule-induction technique. We tested both approaches on approximately 120 hours of news video provided by TRECVID 2003. The results demonstrate that our two-level machine-learning framework is effective and adequate for large-scale practical problems.
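At the story level, the HMM approach amounts to decoding the most likely sequence of story states from the shot-level tags. The sketch below shows standard Viterbi decoding in Python; the two-state setup (story-begin vs. story-continue), the tag alphabet, and all probabilities are toy assumptions for illustration, not values from the paper.

```python
import numpy as np

def viterbi(obs, log_start, log_trans, log_emit):
    """Most likely hidden-state path for a sequence of observation indices."""
    T, S = len(obs), len(log_start)
    delta = log_start + log_emit[:, obs[0]]        # best log-prob per state
    back = np.zeros((T, S), dtype=int)             # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_trans        # [previous, current]
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[:, obs[t]]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Toy example: states 0 = story-begin, 1 = story-continue;
# shot tags 0 = anchor, 1 = report, 2 = misc (numbers are illustrative).
log_start = np.log([0.5, 0.5])
log_trans = np.log([[0.1, 0.9],      # begin -> begin / continue
                    [0.2, 0.8]])     # continue -> begin / continue
log_emit  = np.log([[0.7, 0.2, 0.1],   # 'begin' mostly emits anchor shots
                    [0.2, 0.5, 0.3]])
tags = [0, 1, 1, 0, 2, 1]
print(viterbi(tags, log_start, log_trans, log_emit))
```

Shots tagged as anchor-person shots tend to decode as story-begin states, so the decoded path directly yields the story boundaries.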
Conference Committee Involvement (2)
Multimedia Systems and Applications X
10 September 2007 | Boston, MA, United States
Multimedia Systems and Applications IX
2 October 2006 | Boston, Massachusetts, United States