The analysis of video for the recognition of Instrumental Activities of Daily Living (IADL) through object detection and context analysis, applied to assessing the capacities of patients with Alzheimer's disease and age-related dementia, has recently attracted considerable interest. Incorporating human perception into recognition, search, detection and visual content understanding tasks has become one of the main tools for developing systems and technologies that support people in their daily activities. In this paper we propose a model for automatic segmentation of the salient region where the objects of interest appear in egocentric video, using fully convolutional networks (FCN). The segmentation exploits information about human perception, yielding a more accurate segmentation at the pixel level. It covers both the objects of interest and the salient region of the egocentric video, providing precise input to object detection and automatic video indexing systems, which in turn improves their performance in IADL recognition. To measure the model's segmentation of the salient region, we benchmark it on two databases: the Georgia Tech Egocentric Activity dataset and our own database. Results show that the method achieves significantly better precision in the semantic segmentation of the region containing the objects of interest than the GBVS (Graph-Based Visual Saliency) method.
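As a rough illustration of the kind of fully convolutional pixel-labeling model the abstract refers to, the sketch below builds a minimal FCN-style saliency network in PyTorch. The VGG-16 backbone, channel widths, upsampling scheme and loss are illustrative assumptions, not the authors' exact architecture.

# Minimal FCN-style sketch for binary (salient / non-salient) pixel labeling.
# Backbone, channel widths and the BCE loss are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision.models import vgg16

class SaliencyFCN(nn.Module):
    def __init__(self):
        super().__init__()
        # Reuse a VGG-16 convolutional encoder (downsamples the input by 32).
        self.encoder = vgg16(weights=None).features
        self.score = nn.Conv2d(512, 1, kernel_size=1)        # 1-channel saliency logits
        self.upsample = nn.Upsample(scale_factor=32, mode="bilinear",
                                    align_corners=False)      # back to input resolution

    def forward(self, x):
        features = self.encoder(x)
        return self.upsample(self.score(features))            # per-pixel logits

model = SaliencyFCN()
frame = torch.randn(1, 3, 224, 224)                           # one egocentric frame
logits = model(frame)
loss = nn.BCEWithLogitsLoss()(logits, torch.zeros_like(logits))  # against a binary mask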
Nowadays there is a trend towards using unimodal databases, built from a single type of content such as text, speech or images, for multimedia description, organization and retrieval applications; bimodal databases, in contrast, semantically associate two different types of content, such as audio-video or image-text. Generating a bimodal audio-video database implies creating a connection between the multimedia content through a semantic relation that links the actions present in both types of information. This paper describes in detail the characteristics and methodology used to create a bimodal database of violent content; the semantic relationship is established through the proposed concepts that describe the audiovisual information. Using bimodal databases in applications for audiovisual content processing increases semantic performance if and only if those applications process both types of content. The bimodal database contains 580 annotated audiovisual segments, totalling 28 minutes and divided into 41 classes. Bimodal databases are a tool for building applications for the semantic web.
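To make the idea of one annotated segment concrete, here is a hypothetical record layout showing how an audio description and a video description can be tied to a shared semantic class. Field names and the example concept are assumptions for illustration, not the actual database schema.

# Hypothetical layout of one bimodal (audio-video) annotated segment.
# Field names and the example values are illustrative, not the database schema.
from dataclasses import dataclass

@dataclass
class BimodalSegment:
    video_file: str      # source clip
    start_s: float       # segment start, in seconds
    end_s: float         # segment end, in seconds
    concept: str         # shared semantic class (one of the 41 classes)
    audio_label: str     # what is heard
    video_label: str     # what is seen

segment = BimodalSegment("clip_001.mp4", 12.0, 14.9, "shooting",
                         audio_label="gunshot", video_label="person firing weapon")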
KEYWORDS: Convolutional neural networks, Buildings, Image classification, RGB color model, Visualization, Cultural heritage, Image processing, Data modeling, Convolution, Video
We propose a convolutional neural network that classifies images of buildings using sparse features at the network's input together with primary color pixel values. The result is a trained neural model that classifies Mexican buildings into three architectural styles: pre-Hispanic, colonial, and modern, with an accuracy of 88.01%. We face the problem of an impoverished training dataset caused by the unequal availability of cultural material, and propose a data augmentation and oversampling method to address it. The results are encouraging and allow prefiltering of content in search tasks.
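A minimal sketch of class balancing by oversampling combined with light augmentation is shown below, assuming the usual flip/crop/color-jitter transforms and a folder-per-class dataset layout; it is a common-practice stand-in, not the authors' exact pipeline.

# Sketch: oversample minority classes and apply light augmentation (PyTorch).
# The transforms, folder layout and WeightedRandomSampler balancing are
# assumptions reflecting common practice, not the method used in the paper.
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler
from torchvision import transforms, datasets

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])

# Assumed folder layout: buildings/{prehispanic,colonial,modern}/*.jpg
dataset = datasets.ImageFolder("buildings", transform=augment)
class_counts = torch.bincount(torch.tensor(dataset.targets))
weights = (1.0 / class_counts.float())[dataset.targets]   # rarer class => higher weight
sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
loader = DataLoader(dataset, batch_size=32, sampler=sampler)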
In the computing world, the consumption and generation of multimedia content grow constantly due to the popularization of mobile devices and new communication technologies. Retrieving information from multimedia content to describe Mexican buildings is a challenging problem. Our objective is to determine patterns related to three building eras (pre-Hispanic, colonial and modern). For this purpose, recognition systems need to process large numbers of videos and images, and automatic learning systems train their recognition capability on a semantically annotated database. We built such a database taking into account high-level feature concepts, user knowledge and experience. The annotations help correlate context and content so that the data in the multimedia files can be understood; without a method, a user would have to remember everything and register this data manually. This article presents a methodology for quick image annotation using a graphical interface and intuitive controls, emphasizing the two most important aspects: the time consumed by the annotation task and the quality of the selected images. Accordingly, we classify images only by era and quality. The result is a dataset of Mexican buildings that preserves contextual information through semantic annotations, for training and testing building recognition systems; research on low-level content descriptors is another possible use for this dataset.
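As an illustration of the kind of keyboard-driven labeling loop the abstract describes, the sketch below annotates images by era and quality with single key presses. The key bindings, file layout and CSV output are assumptions, not the interface built by the authors.

# Sketch of a keyboard-driven image annotation loop (OpenCV).
# Keys, class names, paths and the CSV output are illustrative assumptions.
import csv
import glob
import cv2

KEYS = {ord("p"): "prehispanic", ord("c"): "colonial", ord("m"): "modern"}

with open("labels.csv", "w", newline="") as out:
    writer = csv.writer(out)
    for path in glob.glob("images/*.jpg"):
        cv2.imshow("annotate (p/c/m = era, x = discard low quality)", cv2.imread(path))
        key = cv2.waitKey(0) & 0xFF
        if key in KEYS:
            writer.writerow([path, KEYS[key], "good"])
        elif key == ord("x"):
            writer.writerow([path, "", "discarded"])
cv2.destroyAllWindows()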
Biometrics refers to identifying people through physical or behavioral characteristics such as fingerprints, face, DNA, hand geometry, retina and iris patterns. Typically, the iris pattern is acquired at short distance to recognize a person; in the past few years, however, identifying a person by the iris pattern at a distance in non-cooperative environments has become a challenge. This challenge comprises: 1) acquiring a high-quality iris image, 2) light variation, 3) blur reduction, 4) reduction of specular reflections, 5) the distance from the acquisition system to the user, and 6) standardizing the iris size and the pixel density of the iris texture. Solving this challenge will add robustness and improve iris recognition rates. For this reason, we describe the technical issues that must be considered during iris acquisition, including the camera sensor, the lens, and the mathematical analysis of depth of field (DOF) and field of view (FOV) for iris recognition. Finally, based on these issues, we present an experiment comparing captures obtained with our camera at a distance against captures obtained with cameras at very short distance.
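To make the DOF/FOV analysis concrete, the following back-of-envelope computation uses the standard thin-lens approximations to estimate the field of view, the usable depth of field, and the pixel density across the iris at a given stand-off distance. The example numbers (focal length, f-number, sensor, distance, circle of confusion) are assumptions for illustration and are not the parameters used in the paper.

# Back-of-envelope optics for iris capture at a distance (thin-lens approximations).
# All numeric parameters below are illustrative assumptions.
import math

f_mm      = 200.0      # focal length
N         = 4.0        # f-number (aperture)
coc_mm    = 0.005      # circle of confusion ~ one 5 um pixel
sensor_mm = 23.6       # sensor width (APS-C class)
pixels    = 6000       # horizontal resolution
s_mm      = 1500.0     # subject distance (1.5 m)
iris_mm   = 12.0       # typical iris diameter

# Horizontal field of view at the subject plane (thin lens).
fov_mm = sensor_mm * (s_mm - f_mm) / f_mm

# Approximate total depth of field for a subject well inside the hyperfocal distance.
dof_mm = 2.0 * N * coc_mm * s_mm**2 / f_mm**2

# Pixel density across the iris: magnification m = f / (s - f).
m = f_mm / (s_mm - f_mm)
iris_pixels = iris_mm * m / (sensor_mm / pixels)

print(f"FOV {fov_mm:.0f} mm, DOF {dof_mm:.1f} mm, {iris_pixels:.0f} px across the iris")

With these illustrative values the depth of field is only a few millimetres, which is exactly why focus, blur and stand-off distance dominate the acquisition problem at a distance.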
Current search engines are based upon methods that combine words (text-based search), which has been efficient until now. However, the Internet's continuing growth brings more diversity with each passing day, and text-based searches are becoming limited, since most information on the Internet exists as so-called multimedia content (images, audio files, video files).
What needs to be improved in current search engines is the content searched and the precision of the search, as well as an accurate display of the results the user expects. Any search can be made more precise by using more text parameters, but that does not improve the content or the speed of the search itself. One solution is to improve search engines through characterization of the content of multimedia files. In this article, an analysis of the new generation of multimedia search engines is presented, focusing on the needs arising from new technologies.
Multimedia content has become a central part of the flow of information in our daily life. This underlines the need for multimedia search engines, as well as the need to know the real tasks they must fulfill. The analysis shows that few search engines can perform content-based searches. Research on new-generation multimedia search engines is a multidisciplinary area in constant growth, generating tools that satisfy the different needs of new-generation systems.
Multimedia content production and storage in repositories is now an increasingly widespread practice, and indexing concepts for search in multimedia libraries are very useful to users of those repositories. However, content-based retrieval and automatic video tagging tools still lack consistency. Regardless of how these systems are implemented, it is vitally important to have large collections of videos whose concepts are tagged with ground truth (training and testing sets). This paper describes a novel methodology for making complex annotations on video resources with the ELAN software. The concepts are annotated and related to Mexican nature, starting from the High Level Features (HLF) of the TRECVID 2014 development set, in a collaborative environment. Based on this set, each nature concept observed is tagged on each video shot using concepts of the TRECVID 2014 dataset. We also propose new concepts, such as tropical settings, urban scenes, actions, events, weather and places, as well as specific concepts that best describe video content of Mexican culture. We have been careful to obtain a database tagged with nature concepts and ground truth, and it is evident that a collaborative environment is well suited to annotating concepts with ground truth. As a result, a Mexican nature database was built; it also provides the testing and training sets needed to automatically classify new multimedia content about Mexican nature.
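ELAN stores its annotations as .eaf XML files, so tagged shots can be read back programmatically for training and evaluation. The sketch below follows the standard EAF element names (TIME_SLOT, TIER, ALIGNABLE_ANNOTATION), but the tier contents are project-specific; treat it as an assumption about the workflow, not the paper's tooling.

# Sketch: read concept annotations back out of an ELAN .eaf file (XML).
# Element names follow the standard ELAN format; the file name and tier
# semantics are illustrative assumptions.
import xml.etree.ElementTree as ET

def read_eaf(path):
    root = ET.parse(path).getroot()
    # Map each time-slot id to its value in milliseconds.
    slots = {ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE", 0))
             for ts in root.iter("TIME_SLOT")}
    for tier in root.iter("TIER"):
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            value = ann.find("ANNOTATION_VALUE").text or ""
            yield (tier.get("TIER_ID"),               # e.g. a concept tier
                   slots[ann.get("TIME_SLOT_REF1")],  # shot start (ms)
                   slots[ann.get("TIME_SLOT_REF2")],  # shot end (ms)
                   value)                             # annotated concept

for tier_id, start_ms, end_ms, concept in read_eaf("video_001.eaf"):
    print(tier_id, start_ms, end_ms, concept)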
Video annotation is important for web indexing and browsing systems; indeed, in order to evaluate the performance of video query and mining techniques, databases with concept annotations are required. It is therefore necessary to generate a database with a semantic index that represents the digital content of the Mexican bullfighting milieu. This paper proposes a scheme for making complex annotations on video within the frame of a multimedia search engine project. Each video is partitioned with our segmentation algorithm, which creates shots of different lengths and different numbers of frames. We use the ELAN software to make complex annotations about the video in two steps: first, we note the overall content of each shot; second, we describe the actions in terms of camera parameters such as direction, position and depth. As a consequence, we obtain a more complete descriptor of every action. In both cases we use the concepts of the TRECVID 2014 dataset, and we also propose new concepts. This methodology allows us to generate a database with the information needed to create descriptors and algorithms capable of detecting actions, in order to automatically index and classify new bullfighting multimedia content.
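The abstract does not detail its shot segmentation algorithm, so as a generic stand-in the sketch below detects cuts from color-histogram differences between consecutive frames with OpenCV. The histogram setup and threshold are assumptions; this is not the authors' algorithm.

# Sketch: simple shot-boundary detection by color-histogram difference (OpenCV).
# Generic cut detector used as a stand-in; threshold and histogram bins are assumptions.
import cv2

def detect_shot_boundaries(video_path, threshold=0.4):
    cap = cv2.VideoCapture(video_path)
    boundaries, prev_hist, frame_idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [32, 32], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            # Low correlation between consecutive histograms suggests a cut.
            similarity = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL)
            if similarity < 1.0 - threshold:
                boundaries.append(frame_idx)
        prev_hist, frame_idx = hist, frame_idx + 1
    cap.release()
    return boundaries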