Realtime music identification has attracted growing interest in recent years. Possible applications include monitoring a radio station in order to create a playlist, or scanning network traffic for copyright-protected material.
This paper presents a client-server application for identifying an unknown segment of music. The extraction and exchange of descriptive data is done exclusively with MPEG-7. The paper also explains how to define the similarity between two segments of music and evaluates its robustness against perceptual audio coding and filtering. It further introduces an indexing system that reduces the number of segments which have to be compared with the query.
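The two ideas in this abstract, a segment-to-segment similarity measure and an index that prunes the candidate set, can be sketched as follows. This is a minimal illustration only: the mean per-frame Euclidean distance and the coarse bucketing by quantized mean descriptor are assumptions, not the paper's actual metric or index.

```python
import numpy as np

def segment_distance(a, b):
    """Mean Euclidean distance between two equally long sequences of
    per-frame descriptor vectors (rows). One plausible similarity
    measure; the paper's exact metric is not reproduced here."""
    return float(np.linalg.norm(a - b, axis=1).mean())

class CoarseIndex:
    """Toy index: buckets segments by a quantized mean descriptor so
    that only segments in the query's bucket need a full comparison."""

    def __init__(self, step=1.0):
        self.step = step
        self.buckets = {}

    def _key(self, seg):
        # Quantize the segment's mean descriptor to a grid cell.
        return tuple(np.floor(seg.mean(axis=0) / self.step).astype(int))

    def add(self, seg_id, seg):
        self.buckets.setdefault(self._key(seg), []).append(seg_id)

    def candidates(self, query):
        return self.buckets.get(self._key(query), [])
```

A query that is a mildly distorted copy of an indexed segment (as after perceptual coding) typically falls into the same bucket, so the expensive `segment_distance` is only computed for a handful of candidates.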
Driven by the increasing amount of music available electronically, automatic classification systems for music become both more feasible and more important. Currently, most search engines for music are based on textual descriptions such as artist and/or title.
This paper presents a system for the automatic description, classification and visualization of a set of songs. The system is designed to extract significant features of a piece of music in order to find songs of a similar genre or with similar sound characteristics. The description is done exclusively with MPEG-7; the classification and visualization use the self-organizing map algorithm.
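The self-organizing map mentioned above can be sketched in a few lines of NumPy. This is a generic SOM with linearly decaying learning rate and Gaussian neighborhood, not the paper's specific configuration; the per-song feature vectors are assumed to have already been extracted from the MPEG-7 descriptors.

```python
import numpy as np

def train_som(data, grid=(8, 8), epochs=20, lr0=0.5, sigma0=3.0, seed=0):
    """Minimal self-organizing map. `data` holds one feature vector per
    song; the trained map places similar songs at nearby grid cells."""
    rng = np.random.default_rng(seed)
    h, w = grid
    weights = rng.normal(size=(h * w, data.shape[1]))
    coords = np.array([(i, j) for i in range(h) for j in range(w)], float)
    steps = epochs * len(data)
    t = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            lr = lr0 * (1 - t / steps)            # decaying learning rate
            sigma = sigma0 * (1 - t / steps) + 0.5  # shrinking neighborhood
            # Best-matching unit: the weight vector closest to the song.
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            # Gaussian neighborhood on the 2-D grid around the BMU.
            d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            neigh = np.exp(-d2 / (2 * sigma ** 2))
            weights += lr * neigh[:, None] * (x - weights)
            t += 1
    return weights.reshape(h, w, -1)
```

After training, each song is visualized at the grid position of its best-matching unit, so songs with similar feature vectors cluster in the same map region.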
In this paper we present an audio segmentation technique based on searching for similar sections within a song. The search is performed on MPEG-7 low-level audio feature descriptors, a growing source of multimedia metadata. These descriptors are available for every 10 ms of audio data. For each block, the similarity to every other block is determined. The result of this operation is a matrix whose off-diagonal stripes represent similar regions. Because the raw similarity matrix is heavily disturbed by noise, some postprocessing is necessary. Using the a-priori knowledge that the off-diagonal stripes we search for must represent several seconds of audio data, we implemented a filter that enhances the structure of the similarity matrix. The last step is to extract the off-diagonal stripes and map them back into the time domain of the audio data.
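The pipeline described above can be illustrated with a small sketch: build a block-by-block similarity matrix, then enhance off-diagonal stripes by averaging along the diagonal direction. The distance-to-similarity mapping and the averaging filter are simplifying assumptions; the paper's exact descriptor distance and filter are not reproduced here.

```python
import numpy as np

def similarity_matrix(features):
    """Pairwise similarity between feature blocks (rows of `features`),
    mapped from Euclidean distances into [0, 1]."""
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return 1.0 - dist / (dist.max() + 1e-12)

def enhance_diagonals(sim, length):
    """Average `sim` along the off-diagonal direction over `length`
    blocks, exploiting the prior that matching regions must span
    several seconds (i.e. many consecutive blocks)."""
    n = sim.shape[0]
    out = np.zeros_like(sim)
    for i in range(n):
        for j in range(n):
            k = min(length, n - i, n - j)
            out[i, j] = sim[i:i + k, j:j + k].diagonal().mean()
    return out
```

A repeated section of the song produces a run of high values parallel to the main diagonal; the averaging filter keeps such runs bright while suppressing isolated spurious matches, after which the stripes can be thresholded and mapped back to time positions.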
A novel concept for SNR scalability with motion compensation in the enhancement layer is introduced. The prediction error is quantized at different quantization step sizes within the same loop, which allows the application of bit-plane coding if the quantizers are configured appropriately. Since a layered prediction is employed at the encoder, drift can occur at a base-layer decoder. The concept is therefore extended by a drift-limitation operation, for which two approaches are investigated: one is based on a modification of the prediction error, while in the other the drift is controlled by dynamic clipping of the enhancement-layer prediction. The proposed SNR scalability concept is applied to the lowpass band of a wavelet-based video coding scheme, and its performance is compared with a conventional approach to SNR scalability with two and three quantization layers, respectively.