In this paper, we present a novel approach for the adaptation of large images to small display sizes. As a recent
study suggests, most viewers prefer the loss of content over the insertion of deformations in the retargeting
process [1]. Therefore, we combine the two image retargeting operators seam carving and cropping in order to
resize an image without manipulating the important objects in an image at all. First, seams are removed carefully
until a dynamic energy threshold is reached to prevent the creation of visible artifacts. Then, a cropping window
is selected in the image that has the smallest possible window size without having the removed energy rise above
a second dynamic threshold. As the number of removed seams and the size of the cropping window are not fixed,
the process is repeated iteratively until the target size is reached. Our results show that by using this method,
more important content of an image can be included in the cropping window than in normal cropping. The
"squeezing" of objects which might occur in approaches based on warping or scaling is also prevented.
In order to display a high dynamic range (HDR) video on a regular low dynamic range (LDR) screen, it needs
to be tone mapped. A great number of tone mapping (TM) operators exist, most of them designed to tone
map one image at a time. Using them on each frame of an HDR video individually leads to flicker in the
resulting sequence. In our work, we analyze three tone mapping operators with respect to flicker. We propose
a criterion for the automatic detection of image flicker by analyzing the log average pixel brightness of the tone
mapped frame. Flicker is detected if the difference between the averages of two consecutive frames is larger
than a threshold derived from Stevens' power law. Fine-tuning of the threshold is done in a subjective study.
Additionally, we propose a generic method to reduce flicker as a post-processing step. It is applicable to all tone
mapping operators. We begin by tone mapping a frame with the chosen operator. If the flicker detection reports
a visible variation in the frame's brightness, its brightness is adjusted. As a result, the brightness variation is
smoothed over several frames, becoming less disturbing.
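The detection criterion and the smoothing step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `eps` guard against log(0), and the clamping rule in `smooth_keys` are assumptions, and `threshold` is left as a plain parameter because the abstract does not give the value derived from Stevens' power law.

```python
import math

def log_average(luminance, eps=1e-6):
    """Log average (geometric mean) of a frame's pixel luminances.

    `luminance` is a list of rows of non-negative floats; `eps` is an
    assumed guard that avoids log(0) for black pixels.
    """
    pixels = [p for row in luminance for p in row]
    return math.exp(sum(math.log(eps + p) for p in pixels) / len(pixels))

def detect_flicker(frames, threshold):
    """Return the indices i whose log average differs from frame i-1 by more than threshold."""
    keys = [log_average(f) for f in frames]
    return [i for i in range(1, len(keys)) if abs(keys[i] - keys[i - 1]) > threshold]

def smooth_keys(keys, threshold):
    """Clamp each key to within `threshold` of the previous adjusted key.

    Assumed adjustment rule: a large jump is spread over several frames,
    each step being at most `threshold`.
    """
    out = [keys[0]]
    for k in keys[1:]:
        lo, hi = out[-1] - threshold, out[-1] + threshold
        out.append(min(max(k, lo), hi))
    return out
```

Under these assumptions, a flagged frame would be rescaled by the ratio of its smoothed key to its measured key, so the brightness change reaches its final level gradually instead of in a single frame.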
In this paper, we propose a new method to adapt the resolution of images to the limited display resolution
of mobile devices. We use the seam carving technique to identify and remove less relevant content in images.
Seam carving achieves a high adaptation quality for landscape images, and distortions caused by the removal of
seams are very low compared to other techniques like scaling or cropping. However, if an image depicts objects
with straight lines or regular patterns like buildings, the visual quality of the adapted images is much lower.
Errors caused by seam carving are especially obvious if straight lines become curved or disconnected. In order
to preserve straight lines, our algorithm applies line detection in addition to the normal energy function of seam
carving. The energy in the local neighborhood of the intersection point of a seam and a straight line is increased
to prevent other seams from removing adjacent pixels. We evaluate our improved seam carving algorithm and
compare the results with regular seam carving. In case of landscape images with no straight lines, traditional
seam carving and our enhanced approach lead to very similar results. However, in the case of objects with
straight lines, the quality of our results is significantly better.
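The energy increase around seam and line intersections can be sketched as below. This is an illustrative sketch only: the function name, the square neighborhood, and the `radius` and `boost` parameters are assumptions, and the line mask is taken as given (the abstract does not specify the line detector). For simplicity, coordinates refer to the image before the seam is removed.

```python
def protect_line_crossings(energy, seam, line_mask, radius=2, boost=1000.0):
    """Raise the energy around every pixel where a seam crosses a detected line.

    energy:    list of rows of floats (the regular seam carving energy map)
    seam:      seam[y] is the column removed in row y (a vertical seam)
    line_mask: line_mask[y][x] is True where a straight line was detected
    radius, boost: assumed neighborhood size and energy increment
    """
    h, w = len(energy), len(energy[0])
    out = [row[:] for row in energy]  # leave the input map untouched
    for y, x in enumerate(seam):
        if not line_mask[y][x]:
            continue  # this seam pixel does not cut a line
        for ny in range(max(0, y - radius), min(h, y + radius + 1)):
            for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                out[ny][nx] += boost
    return out
```

Because later seams follow the lowest-energy path, the raised neighborhood makes it unlikely that another seam removes pixels adjacent to the same crossing, which is what would bend or disconnect the line.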
We enhance an existing in-circuit, inline tester for printed circuit assemblies (PCA) by video-based automatic optical
inspection (Video-AOI). Our definition of video is that we continuously capture images of a moving PCA, such that each
PCA component is contained in multiple images, taken under varying viewing conditions like angle, time, camera settings
or lighting. This can then be exploited for an efficient detection of faults. The first part of our paper focuses on the
parameters of such a Video-AOI system and shows how they can be determined. In the second part, we introduce techniques
to capture and preprocess a video of a PCA, so that it can be used for inspection.
In this paper, we introduce our new visualization service which presents web pages and images on arbitrary devices with
differing display resolutions. We analyze the layout of a web page and simplify its structure and formatting rules. The
small screen of a mobile device is used much better this way. Our new image adaptation service combines several
techniques. In a first step, border regions which do not contain relevant semantic content are identified. Cropping is used
to remove these regions. Attention objects are identified in a second step. We use face detection, text detection and
contrast based saliency maps to identify these objects and combine them into a region of interest. Optionally, the seam
carving technique can be used to remove inner parts of an image. Additionally, we have developed a software tool to
validate, add, delete, or modify all automatically extracted data. This tool also simulates different mobile devices, so that
the user gets a feeling for what an adapted web page will look like. We have performed user studies to evaluate our web
and image adaptation approach. Questions regarding software ergonomics, quality of the adapted content, and perceived
benefit of the adaptation were asked.
A large number of recorded videos cannot be viewed on mobile devices (e.g., PDAs or mobile phones) due to
inappropriate screen resolutions or color depths of the displays. Recently, automatic transcoding algorithms have
been introduced which facilitate the playback of previously recorded videos on new devices. One major challenge
of transcoding is the preservation of the semantic content of the videos. Although much work has been done on the
adaptation of the image resolution, color adaptation of videos has not been addressed in detail before. In this
paper, we present a novel color adaptation algorithm for videos which preserves the semantics. In our approach,
the color depth of a video is adapted to facilitate the playback of videos on mobile devices which support only
a limited number of different colors. We analyze our adaptation approach in the experimental results, visualize
adapted keyframes, and illustrate that we obtain better quality and can recognize many more details
with our approach.
The recognition of human postures and gestures is considered to be highly relevant semantic information in videos and surveillance systems. We present a new three-step approach to classifying the posture or gesture of a person based on segmentation, classification, and aggregation. A background image is constructed from successive frames using motion compensation, and shapes of people are segmented by comparing the background image with each frame. We use a modified curvature scale space (CSS) approach to classify a shape. A major drawback of the standard CSS approach is its poor representation of convex segments in shapes: convex objects cannot be represented at all since there are no inflection points. We have extended the CSS approach to generate feature points for both the concave and convex segments of a shape. The key idea is to reflect each contour pixel and map the original shape to a second one whose curvature is the reverse: strongly convex segments in the original shape are mapped to concave segments in the second one and vice versa. For each shape, a CSS image is generated whose feature points characterize the shape of a person very well. The last step aggregates the matching results. A transition matrix is defined that classifies possible transitions between adjacent frames, e.g., a person who is sitting on a chair in one frame cannot be walking in the next. A valid transition requires at least several frames where the posture is classified as "standing-up". We present promising results and compare the classification rates of postures and gestures for the standard CSS and our new approach.
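The aggregation step can be illustrated with a small sketch of transition filtering. Everything specific here is assumed for illustration: the posture labels other than "standing-up", the contents of `ALLOWED`, and the rule of keeping the previous posture when a transition is invalid. The requirement of several consecutive "standing-up" frames is not modeled.

```python
# Hypothetical transition table: ALLOWED[a] is the set of postures that may
# directly follow posture a in the next frame.
ALLOWED = {
    "sitting": {"sitting", "standing-up"},
    "standing-up": {"standing-up", "standing", "sitting"},
    "standing": {"standing", "walking", "sitting"},
    "walking": {"walking", "standing"},
}

def aggregate(labels, allowed=ALLOWED):
    """Filter per-frame classifications so that only valid transitions remain.

    A label that is not reachable from the previously accepted posture is
    treated as a misclassification and replaced by that previous posture.
    """
    result = [labels[0]]
    for label in labels[1:]:
        prev = result[-1]
        result.append(label if label in allowed[prev] else prev)
    return result
```

With this table, an isolated "walking" classification between two "sitting" frames is rejected, matching the example that a seated person cannot be walking in the next frame.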
Object-oriented coding in the MPEG-4 standard enables the separate processing of foreground objects and the scene background (sprite). Since the background sprite only has to be sent once, transmission bandwidth can be saved. This paper shows that the concept of merging several views of a non-changing scene background into a single background sprite is usually not the most efficient way to transmit the background image. We have found that the counter-intuitive approach of splitting the background into several independent parts can reduce the overall amount of data. For this reason, we propose an algorithm that provides an optimal partitioning of a video sequence into independent background sprites (a multi-sprite), resulting in a significant reduction of the involved coding cost. Additionally, our algorithm results in background sprites with better quality by ensuring that the sprite resolution is at least the final display resolution throughout the sequence. Even though our sprite generation algorithm creates multiple sprites instead of a single background sprite, it is fully compatible with the existing MPEG-4 standard. The algorithm has been evaluated with several test sequences, including the well-known Table-tennis and Stefan sequences. The total coding cost could be reduced by factors of about 2.7 or even higher.
Many TV broadcasters and film archives are planning to make their
collections available on the Web. However, a major problem with large
film archives is the fact that it is difficult to search the content
visually. A video summary is a sequence of video clips extracted from
a longer video. Much shorter than the original, the summary preserves
its essential messages. Hence, video summaries may speed up the search.
Videos that have full horizontal and vertical resolution will usually
not be accepted on the Web, since the bandwidth required to transfer
the video is generally very high. If the resolution of a video is
reduced in an intelligent way, its content can still be understood. We
introduce a new algorithm that reduces the resolution while preserving
as much of the semantics as possible.
In the MoCA (movie content analysis) project at the University of
Mannheim we developed the video summarization component and tested it
on a large collection of films. In this paper we discuss the
particular challenges which the reduction of the video length poses,
and report empirical results from the use of our summarization tool.
We propose an automatic camera calibration algorithm for court sports. The obtained camera calibration parameters are required for applications that need to convert positions in the video frame to real-world coordinates or vice versa. Our algorithm uses a model of the arrangement of court lines for calibration. Since the court
model can be specified by the user, the algorithm can be applied to a variety of different sports.
The algorithm starts with a model initialization step which locates the court in the image without any user assistance or a-priori knowledge about the most probable position. Image pixels are classified as court line pixels if they pass several tests including color and local texture constraints. A Hough transform is applied to
extract line elements, forming a set of court line candidates. The subsequent combinatorial search establishes correspondences between lines in the input image and lines from the court model. For the succeeding input frames, an abbreviated calibration algorithm is used, which predicts the camera parameters for the new image
and optimizes the parameters using a gradient-descent algorithm.
We have conducted experiments on a variety of sport videos (tennis, volleyball, and goal area sequences of soccer games). Video scenes with considerable difficulties were selected to test the robustness of the algorithm. Results show that the algorithm is very robust to occlusions, partial court views, bad lighting conditions, or
This paper presents a new algorithm for video-object segmentation,
which combines motion-based segmentation, high-level object-model
detection, and spatial segmentation into a single framework.
This joint approach overcomes the disadvantages of these algorithms
when applied independently. These disadvantages include the low semantic accuracy of spatial segmentation and the inexact object boundaries obtained from object-model matching and motion segmentation. The proposed algorithm alleviates three problems common to all motion-based segmentation algorithms. First, it completes object areas that cannot be clearly distinguished
from the background because their color is near the background color.
Second, parts of the object that would otherwise be excluded because they are not moving are still added to the object mask. Finally, when several objects are moving, of which only one is of interest, the algorithm detects that the remaining regions
do not belong to any object model and removes them from the foreground. This suppresses regions erroneously considered as moving, as well as objects that are moving but are completely irrelevant to the user.
The live-wire approach is a well-known algorithm based on a graph search to locate boundaries for image segmentation. We will extend the original cost function, which is solely based on finding strong edges, so that the approach can take a large variety of boundaries into account. The cost function adapts to the local characteristics of a boundary by analyzing a user-defined sample using a continuous wavelet decomposition. We will finally extend the approach into 3D in order to segment objects in volumetric data, e.g., from medical CT and MR scans.
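The graph search at the core of live-wire can be sketched as a shortest-path computation over a per-pixel cost map. This is a generic Dijkstra sketch, not the extended method: the function name `live_wire` and the 4-connected neighborhood are assumptions, and the cost map is taken as given, so the wavelet-based adaptation of the cost function is not shown.

```python
import heapq

def live_wire(cost, start, goal):
    """Cheapest 4-connected pixel path from start to goal.

    cost[y][x] is the non-negative cost of entering pixel (y, x); low values
    are assumed to mark likely boundary pixels. start and goal are (y, x).
    """
    h, w = len(cost), len(cost[0])
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, (y, x) = heapq.heappop(heap)
        if (y, x) == goal:
            break
        if d > dist[(y, x)]:
            continue  # stale queue entry
        for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
            if 0 <= ny < h and 0 <= nx < w:
                nd = d + cost[ny][nx]
                if nd < dist.get((ny, nx), float("inf")):
                    dist[(ny, nx)] = nd
                    prev[(ny, nx)] = (y, x)
                    heapq.heappush(heap, (nd, (ny, nx)))
    # Walk the predecessor links back from the goal to recover the boundary.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]
```

In an interactive tool, `start` would be the last user-placed seed point and `goal` the current cursor position, so the boundary snaps to the cheapest path as the cursor moves.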
In this paper, we propose a new system for video object detection
based on user-defined models. Object models are described by
'model graphs' in which nodes represent image regions and edges
denote spatial proximity. Each node is attributed with color and
shape information about the corresponding image region. Model
graphs are specified manually based on a sample image of the
object. Object recognition starts with automatic color segmentation of the input image. For each region, the same features are extracted as specified in the model graph. Recognition is based on finding a
subgraph in the image graph that matches the model graph. Evidently, it is not possible to find an isomorphic subgraph, since node and edge attributes will not match exactly. Furthermore, the automatic segmentation step leads to an oversegmented image. For this reason, we employ inexact graph matching, where several nodes of the image graph may be mapped onto a single node in the model graph. We have applied our object recognition algorithm to cartoon sequences. This class of sequences is difficult to handle with current automatic segmentation algorithms because the motion estimation has difficulties arising from large homogeneous regions and because the object appearance is typically highly variable. Experiments show that our algorithm can robustly detect the specified objects and also accurately find the object boundary.
We present a method for analyzing and resynthesizing inhomogeneously textured regions in images for the purpose of advanced compression. First, the user defines image blocks so that they cover regions with homogeneous texture. These blocks are each transformed in turn. For the transform, we use the so-called Principal Component Analysis. After the transform into the new domain, we statistically analyze the resulting coefficients. To resynthesize new texture, we generate random numbers that exactly meet these statistics. Using the inverse transform, the random coefficients are finally transformed back into the spatial domain. The visual appearance of the resulting artificial texture matches the original to a very high degree.
The number of video conferences conducted over the Internet has steadily increased in recent years. The need to archive the multimedia data streams of the conferences became apparent, and a number of tools accomplishing this task for audio and video streams were developed. In many video conferencing scenarios, shared whiteboards are used in addition to audio and video to transmit slides or to sketch ideas. However, none of the existing recording tools provides an efficient recording service for the data streams of these tools. In this paper, we present a new approach to the recording and playback of shared whiteboard media streams. We discuss generic design issues of a shared whiteboard recorder, and we present a novel algorithm that enables efficient random access to the recorded streams. We describe an implementation of our algorithms for the media streams of our digital lecture board.
In this paper, we consider the problem of similarity between video sequences. Three basic questions are raised and (partially) answered. Firstly, at what temporal duration can video sequences be compared? The frame, shot, scene and video levels are identified. Secondly, given some image or video feature, what are the requirements on its distance measure and how can it be 'easily' transformed into the visual similarity desired by the inquirer? Thirdly, how can video sequences be compared at different levels? A general approach based on either a set or sequence representation with variable degrees of aggregation is proposed and applied recursively over the different levels of temporal resolution. It allows the inquirer to fully control the importance of temporal ordering and duration. Promising experimental results are presented.