Automatic surface inspection has been used in the industry to reliably detect all kinds of surface defects and
to measure the overall quality of a produced piece. Structured light systems (SLS) are based on the reconstruction
of the 3D information of a selected area by projecting several phase-shifted sinusoidal patterns onto
a surface. Due to the high speed of production lines, surface inspection systems require extremely fast imaging
methods and substantial computational power. The cost of such systems can easily become considerable. The use
of standard PCs and Graphics Processing Units (GPUs) for data processing tasks facilitates the construction
of cost-effective systems. We present a parallel implementation of the required algorithms written in C with
CUDA extensions. In our contribution, we describe the challenges of the design on a GPU, compared with a
traditional CPU implementation. We provide a qualitative evaluation of the results and a comparison of the
algorithm speed performance on several platforms. The system is able to compute two-megapixel height maps
with a 100-micrometer spatial resolution in less than 200 ms on a mid-budget laptop. Our GPU implementation
runs about ten times faster than our previous C code implementation.
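The phase recovery at the heart of such a pipeline is compact enough to sketch. Below is a minimal NumPy version of the standard four-step phase-shifting formula; the paper's actual CUDA kernels, pattern count and unwrapping method are not specified here, so this is an illustrative sketch rather than the authors' implementation:

```python
import numpy as np

def wrapped_phase(i1, i2, i3, i4):
    """Four-step phase shifting: recover the wrapped phase from four
    captures of a sinusoidal pattern shifted by 0, 90, 180 and 270 degrees."""
    return np.arctan2(i4 - i2, i1 - i3)

# Synthetic check: simulate the four captures for a known phase map.
phi = np.linspace(-np.pi / 2, np.pi / 2, 5)             # ground-truth phase
shifts = (0.0, np.pi / 2, np.pi, 3 * np.pi / 2)
captures = [100 + 50 * np.cos(phi + s) for s in shifts]
est = wrapped_phase(*captures)                          # recovers phi
```

Because every pixel is independent, this map is embarrassingly parallel, which is what makes a GPU implementation attractive.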
The increased sensing and computing capabilities of mobile devices can provide an enhanced mobile user experience.
Integrating the data from different sensors offers a way to improve application performance in camera-based
applications. A key advantage of using cameras as an input modality is that they enable recognizing the context.
Therefore, computer vision has been traditionally utilized in user interfaces to observe and automatically detect
the user actions. The imaging applications can also make use of various sensors for improving the interactivity and
the robustness of the system. In this context, two applications fusing the sensor data with the results obtained
from video analysis have been implemented on a Nokia Nseries mobile device. The first solution is a real-time
user interface that can be used for browsing large images. The solution enables the display to be controlled by
the motion of the user's hand using the built-in sensors as complementary information. The second application
is a real-time panorama builder that uses the device's accelerometers to improve the overall quality, while also
providing instructions during capture. The experiments show that fusing the sensor data improves camera-based
applications especially when the conditions are not optimal for approaches using camera data alone.
The future multi-modal user interfaces of battery-powered mobile devices are expected to require computationally
costly image analysis techniques. Graphics Processing Units are well suited for parallel processing, and the
addition of programmable stages and high-precision arithmetic provides opportunities to implement complete
algorithms in an energy-efficient manner. At the moment the first mobile graphics accelerators
with programmable pipelines are available, enabling the GPGPU implementation of several image processing
algorithms. In this context, we consider a face tracking approach that uses efficient gray-scale invariant texture
features and boosting. The solution is based on the Local Binary Pattern (LBP) features and makes use of the
GPU in the pre-processing and feature extraction phases. We have implemented a series of image processing
techniques in the shader language of OpenGL ES 2.0, compiled them for a mobile graphics processing unit and
performed tests on a mobile application processor platform (OMAP3530). In our contribution, we describe the
challenges of designing on a mobile platform, present the performance achieved and provide measurement results
for the actual power consumption in comparison to using the CPU (ARM) on the same platform.
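The feature extraction stage can be illustrated with a minimal CPU reference of the basic 8-neighbour LBP operator (the 3x3 variant; the shader implementation and any multi-scale extensions used in the paper are not reproduced here):

```python
import numpy as np

def lbp_8(image):
    """Basic 3x3 Local Binary Pattern: each interior pixel is encoded by
    thresholding its 8 neighbours against the centre value, giving an
    8-bit, gray-scale-invariant texture code."""
    h, w = image.shape
    c = image[1:-1, 1:-1]
    # Neighbour offsets, clockwise from the top-left pixel; each offset
    # contributes one bit of the code.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        nb = image[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        code |= (nb >= c).astype(np.uint8) << bit
    return code
```

Each neighbour comparison is independent, which maps naturally onto a fragment shader, one output code per pixel.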
Since more processing power and new sensing and display technologies are already available in mobile devices, there
has been increased interest in building systems to communicate via different modalities such as speech, gesture,
expression, and touch. In context identification based user interfaces, these independent modalities are combined
to create new ways for users to interact with hand-helds. While these are unlikely to completely replace
traditional interfaces, they will considerably enrich and improve the user experience and task performance. We
demonstrate a set of novel user interface concepts that rely on built-in multiple sensors of modern mobile devices
for recognizing the context and sequences of actions. In particular, we use the camera to detect whether the user
is watching the device, for instance, to make the decision to turn on the display backlight. In our approach the
motion sensors are first employed for detecting the handling of the device. Then, based on ambient illumination
information provided by a light sensor, the cameras are turned on. The frontal camera is used for face detection,
while the back camera provides supplemental contextual information. The subsequent applications triggered
by the context can be, for example, image capturing, or bar code reading.
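The staged use of sensors described above can be sketched as a simple decision cascade; the threshold value and function names below are hypothetical illustrations, not the actual product logic:

```python
def backlight_decision(device_handled, ambient_lux, detect_face):
    """Sensor cascade sketch: cheap always-on sensors gate the expensive
    camera stage.  detect_face is called only when the earlier stages
    pass, so the power-hungry camera stays off most of the time."""
    if not device_handled:       # motion sensors saw no handling
        return False
    if ambient_lux < 5:          # hypothetical darkness threshold:
        return False             # too dark for reliable face detection
    return detect_face()         # camera-based check as the last stage
```

The point of the ordering is energy: each stage is more expensive than the previous one and runs only when all cheaper stages agree.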
Modern mobile communication devices frequently contain built-in cameras allowing users to capture high-resolution
still images, but at the same time the imaging applications are facing both usability and throughput
bottlenecks. The difficulties of taking ad hoc pictures of printed paper documents with a multi-megapixel cellular
phone camera, a common business use case, illustrate these problems. The result can be
examined only after several seconds, and is often blurry, so a new picture is needed, even though the view-finder
image had looked good. The process can be frustrating, with waits and no way for the user to predict
the quality beforehand. The problems can be traced to the processor speed and camera resolution mismatch,
and application interactivity demands. In this context we analyze building mosaic images of printed documents
from frames selected from VGA resolution (640x480 pixel) video. High interactivity is achieved by providing
real-time feedback on the quality, while simultaneously guiding the user actions. The graphics processing unit of
the mobile device can be used to speed up the reconstruction computations. To demonstrate the viability of the
concept, we present an interactive document scanning application implemented on a Nokia N95 mobile phone.
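Frame selection by quality can be driven by a simple focus measure; a common choice, assumed here for illustration since the paper does not name its metric, is the variance of the Laplacian response:

```python
import numpy as np

def sharpness(gray):
    """Focus measure: variance of a 4-neighbour Laplacian response.
    Blurry frames give low values, sharp frames high ones."""
    g = gray.astype(np.float64)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return lap.var()

def pick_sharpest(frames):
    """Select the frame with the highest focus measure."""
    return max(frames, key=sharpness)
```

Evaluating such a measure on the live view-finder stream is what allows the application to warn the user and drop blurred frames before they reach the mosaic.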
The video applications on mobile communication devices have usually been designed for content creation, access,
and playback. For instance, many recent mobile devices replicate the functionalities of portable video cameras
and video recorders, and digital TV receivers. These are all demanding uses, but nothing new from the consumer
point of view. However, many of the current devices have two cameras built in, one for capturing high resolution
images, and the other for lower, typically VGA (640x480 pixels) resolution video telephony. We employ video to
enable new applications and describe four actual solutions implemented on mobile communication devices. The
first one is a real-time motion based user interface that can be used for browsing large images or documents such
as maps on small screens. The motion information is extracted from the image sequence captured by the camera.
The second solution is a real-time panorama builder, while the third one assembles document panoramas, both
from individual video frames. The fourth solution is a real-time face and eye detector. It provides another type
of foundation for motion based user interfaces, as knowledge of the presence and motion of a human face in the
view of the camera can be a powerful application enabler.
Video coding standards, such as MPEG-4, H.264, and VC1, define hybrid transform based block motion compensated techniques that employ almost the same coding tools. This observation has been a foundation for defining the MPEG Reconfigurable Multimedia Coding framework that aims to facilitate multi-format codec design. The idea is to send a description of the codec with the bit stream, and to reconfigure the coding tools accordingly on-the-fly. This kind of approach favors software solutions, and is a substantial challenge for the implementers of mobile multimedia devices that aim at high energy efficiency. In particular, as high definition formats are about to be required from mobile multimedia devices, variable length decoders are becoming a serious bottleneck. Even at current moderate mobile video bitrates, software based variable length decoders swallow a major portion of the resources of a mobile processor. In this paper we present a Transport Triggered Architecture (TTA) based programmable implementation of Context Adaptive Binary Arithmetic Coding (CABAC) decoding that is used, e.g., in the main profile of H.264 and in JPEG2000. The solution can be used for other variable length codes as well.
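The interval-subdivision principle behind arithmetic coding can be shown with a deliberately simplified toy coder. Unlike CABAC, the sketch below is non-adaptive (one fixed bit probability instead of context models) and uses exact rational arithmetic instead of integer renormalization, so it illustrates only the core idea:

```python
from fractions import Fraction

P0 = Fraction(3, 4)  # fixed probability of a 0-bit (toy, non-adaptive model)

def encode(bits):
    """Shrink [0, 1) around the chosen sub-interval for each bit and
    return one number inside the final interval."""
    low, high = Fraction(0), Fraction(1)
    for b in bits:
        split = low + P0 * (high - low)
        if b == 0:
            high = split          # 0-bit takes the lower sub-interval
        else:
            low = split           # 1-bit takes the upper sub-interval
    return (low + high) / 2       # any value inside the interval works

def decode(value, n):
    """Replay the same subdivisions, reading each bit off the side of
    the split that the encoded value falls on."""
    low, high = Fraction(0), Fraction(1)
    out = []
    for _ in range(n):
        split = low + P0 * (high - low)
        if value < split:
            out.append(0)
            high = split
        else:
            out.append(1)
            low = split
    return out
```

Real CABAC additionally selects a probability model per bin from the syntax-element context, updates it after each decision, and renormalizes fixed-width integer intervals, which is exactly the serial, branch-heavy work that makes it a bottleneck.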
This paper presents a comparison of two systems that can simultaneously decode multiple videos on a simple
CPU and dedicated function-level hardware accelerators. The first system is implemented in a traditional way,
such that the decoder instances access the accelerators concurrently without external coordination. The second
system implementation coordinates the tasks' accelerator accesses by scheduling. The solutions are compared
by execution cycles, energy consumption and cache hit ratios. In the traditional solution each decoder task
continuously requests access to the needed hardware accelerators. However, since the other tasks are competing
for the same resources, the tasks must often yield and wait for their turn, which reduces the energy-efficiency.
The scheduling-based approach assumes that the accelerator latencies are deterministic and assigns time slots for
accelerator accesses required by each task. The accelerator access schedule is re-designed for each macroblock at
run-time, thus avoiding the over-allocation of resources and improving energy-efficiency. Deterministic accelerator
latencies ensure that the CPU is not interrupted when an accelerator finishes. The contribution of this study is
the comparison of the accelerator timing solution against the traditional approach.
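A greedy variant of such time-slot assignment can be sketched as follows; the accelerator names and cycle counts are made up for illustration, and the real system re-plans the schedule per macroblock:

```python
def schedule(tasks):
    """Greedy time-slot assignment for deterministic-latency accelerators.
    tasks: {name: [(accelerator, cycles), ...]} in program order.
    Returns a list of (start, end, task, accelerator) slots such that
    no accelerator is double-booked and each task's steps stay ordered."""
    accel_free = {}                        # next free cycle per accelerator
    task_ready = {t: 0 for t in tasks}     # next free cycle per task
    steps = {t: list(seq) for t, seq in tasks.items()}
    plan = []
    while any(steps.values()):
        for t in tasks:                    # round-robin, one step per task
            if not steps[t]:
                continue
            accel, cycles = steps[t].pop(0)
            start = max(task_ready[t], accel_free.get(accel, 0))
            end = start + cycles
            task_ready[t] = end
            accel_free[accel] = end
            plan.append((start, end, t, accel))
    return plan
```

Because the latencies are deterministic, every slot boundary is known in advance and the CPU can poll or co-issue work at exactly the right cycle instead of taking an interrupt.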
Image stitching is used to combine several images into one wide-angled mosaic image. Traditionally, mosaic
images have been constructed from a few separate photographs, but now that video recording has become
commonplace even on mobile phones, video sequences can also be considered as a source for mosaic images.
However, most stitching methods require vast amounts of computational resources that make them unusable on
low-resource devices.
We present a novel panorama stitching method that is designed to create high-quality image mosaics from
both video clips and separate images even on low-resource devices. The software is able to create both
360-degree panoramas and perspective-corrected mosaics. Features of the software include, among others, detection
of moving objects, inter-frame color balancing and rotation correction. The application selects only the frames
of highest quality for the final mosaic image. Low-quality frames are dropped on the fly while recording the
frames for the mosaic.
The complete software is implemented in Matlab, but a mobile phone version also exists. We present a
complete solution from frame acquisition to panorama output, with different resource profiles that suit various
devices.
The multimedia capabilities of emerging high-end battery powered mobile devices rely on monolithic hardware
accelerators with long latencies to minimize interrupt and software overheads. When compared to pure software
implementations, monolithic hardware accelerator solutions need an order of magnitude less power. However, they are
rather inflexible and difficult to modify to provide support for multiple coding standards. A more flexible alternative is
to employ finer grained short latency accelerators that implement the individual coding functions. Unfortunately, with
this approach the software overheads can become very high, if interrupts are used for synchronizing the software and
hardware. Preferably, the cost of hardware accelerator interfacing should be at the same level as that of software
functions. In this paper we study the benefits attainable from such an approach. As a case study we restructure an MPEG-4 video
decoder in a manner that enables the simultaneous decoding of multiple bit streams using short latency hardware
accelerators. The approach takes multiple video bit streams as input and produces a multiplexed stream that is used to
control the hardware accelerators without interrupts. The decoding processes of each stream can be considered as
threads that share the same hardware resources. Software simulations predict that the energy efficiency of the approach
would be significantly better than for a pure software implementation.
Multimedia processing in battery powered mobile communication devices is pushing their computing power requirements to the level of desktop computers. At the same time, the energy dissipation limit stays at 3 W, which is the practical maximum to prevent the devices from becoming too hot to handle. In addition, several hours of active usage time should be provided on battery power. During the last ten years the active usage times of mobile communication devices have remained essentially the same regardless of large energy efficiency improvements at the silicon level. The reasons can be traced to design paradigms that are not explicitly targeted at creating energy efficient systems, but at facilitating the implementation of complex software solutions by large teams. Consequently, the hardware and software architectures, including the operating system principles, are the same for both mainframe computer systems and current mobile phones. In this paper, we consider the observed developments against the needs of video processing in mobile communication devices and consider means of implementing energy efficient video codecs both in hardware and software. Although inflexible, monolithic video acceleration hardware is an attractive solution, while software based codecs are becoming increasingly difficult to implement in an energy efficient manner due to increasing system complexity. Approaches that combine both the flexibility of software and the energy efficiency of hardware remain to be seen.
A non-supervised clustering based method for classifying paper according to its quality is presented. The method is simple to train, requiring minimal human involvement. The approach is based on Self-Organizing Maps and texture features that discriminate the texture of paper effectively.
Multidimensional texture feature vectors are first extracted from paper images. The dimensionality of the data is then reduced by a Self-Organizing Map (SOM). In dimensionality reduction, the feature data are projected to a two-dimensional space and clustered according to their similarity. The clusters represent different paper qualities and can be labeled according to the quality information of the training samples. After that, it is easy to find the quality class of the inspected paper by checking where a sample is placed in the low-dimensional space.
Tests based on images taken in a laboratory environment from four different paper quality classes provided very promising results. Local Binary Pattern (LBP) texture features combined with the SOM-based approach classified the test data almost perfectly: the error percentage was only 0.2% with the multiresolution version of LBP and 1.6% with the regular LBP. The improvement over the previously used texture features in paper inspection is huge: the classification error is reduced more than 40-fold. In addition to the excellent classification accuracy, the method also offers an intuitive user interface and a synthetic view of the inspected data.
Dimensionality reduction methods for visualization typically map the original high-dimensional data into two dimensions. The mapping preserves the important information in the data and, in order to be useful, fulfils the needs of a human observer.
We have proposed a self-organizing map (SOM) based approach for visual surface inspection. The method provides the advantages of unsupervised learning and an intuitive user interface that allows one to very easily set and tune the class boundaries based on observations made on the visualization, for example, to adapt to changing conditions or materials. There are, however, some problems with a SOM. It does not preserve the true distances between data points, and it has a tendency to ignore rare samples in the training set in favor of a more accurate representation of common samples. In this paper, some alternative methods to a SOM are evaluated. These methods, PCA, MDS, LLE, ISOMAP, and GTM, are used to reduce dimensionality in order to visualize the data. Their principal differences are discussed and their performances quantitatively evaluated in a few special classification cases, such as wood inspection using centile features.
For the test material experimented with, SOM and GTM outperform the others when classification performance is considered. For data mining kinds of applications, ISOMAP and LLE appear to be more promising methods.
A key problem in using automatic visual surface inspection in industry is training and tuning the systems to perform in a desired manner. This may take from minutes up to a year after installation, and can be a major cost. Based on our experiences the training issues need to be taken into account from the very beginning of system design. In this presentation we consider approaches for visual surface inspection and system training. We advocate using a non-supervised learning based visual training method.
We have developed a self-organizing map (SOM) based approach for training and classification in visual surface inspection applications. The approach combines the advantages of non-supervised and supervised training and offers an intuitive visual user interface. The training is less sensitive to human errors, since labeling of large amounts of individual training samples is not necessary. In the classification, the user interface allows on-line control of class boundaries. Earlier experiments show that our approach gives good results in wood inspection. In this paper, we evaluate its real-time capability. When quite simple features are used, the bottleneck in real-time inspection is the nearest SOM code vector search during the classification phase. In experiments, we compare acceleration techniques that are suitable for the high dimensional nearest neighbor search typical of the method. We show that even simple acceleration techniques can improve the speed considerably, and that the SOM approach can be used in real time with a standard PC.
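One classic acceleration of this kind is the partial-distance search, which abandons a code vector as soon as its accumulating squared distance already exceeds the best candidate found so far. Whether this particular technique was among those compared is not stated here, so the sketch is illustrative:

```python
def nearest_partial_distance(codebook, x):
    """Nearest-neighbour search with the partial-distance early exit:
    stop summing a candidate's squared distance the moment it can no
    longer beat the current best."""
    best_idx, best_d = -1, float("inf")
    for i, c in enumerate(codebook):
        d = 0.0
        for a, b in zip(c, x):
            d += (a - b) ** 2
            if d >= best_d:        # cannot beat the current best: abandon
                break
        else:                      # completed all dimensions: new best
            best_idx, best_d = i, d
    return best_idx, best_d
```

The saving grows with feature dimensionality, since most candidates are rejected after only a few of the terms have been added.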
In automated visual surface inspection based on statistical pattern recognition, the collection of training material for setting up the classifier may prove difficult. Getting a representative set of labeled training samples requires the training personnel to scan through large amounts of image material, which is an error prone and laborious task. Problems are further caused by the variations of the inspected materials and imaging conditions, especially with color imaging. Approaches based on adaptive defect detection and robust features may prove inapplicable because they lose some faint or large-area defects. Adjusting the classifier to adapt to a changed situation may also be difficult because of the inflexibility of classifier implementations. This may lead to impractical, often repeated training material collection and classifier retraining cycles. In this paper we propose a non-segmenting defect detection technique combined with a self-organizing map (SOM) based classifier and user interface. The purpose is to avoid the problems of adaptive detection techniques, and to provide an intuitive user interface for classification, helping in training material collection and labeling, with a possibility of easily adjusting the class boundaries. The approach is illustrated with examples from wood surface inspection.
The accuracy of the visual inspection process that performs the quality classification of lumber is one of the key interest areas in the mechanical wood industry. In principle, the quality classification of wood is straightforward: the class of each board depends on its defects and their distribution, as defined by the quality standard. However, even the appearance of sound wood varies greatly, and no two boards or defects have exactly the same properties, such as color and texture. We describe the development of a color vision technology for grading softwood lumber. Much attention has been given to the early and cheap recognition of sound wood regions, as only a minor portion of the surface area of boards, around 5-10%, is defective. The non-interesting regions can be discarded, which eases the hardware and communication bandwidth requirements at the later defect identification stages. In the end, the description of the board and its defects is passed to a grader that searches for all the applicable quality classes from the given set of standards. Extensive comparative tests have been carried out in a complete simulated system. The effects of changes in the spectrum of illumination have been evaluated to identify robust color features and to produce the requirements for color calibration.
High development costs of machine vision systems can be reduced by designing more general algorithms that can be used in a wide range of applications, by using modular system architectures in which new algorithms and more computational power can be added easily, depending on the needs of the given applications, and by using powerful tools for the design of, for example, new algorithms, hardware, software, optics, and illumination. This paper overviews the related research in progress at the University of Oulu. The topics to be discussed include hybrid computer architecture for machine vision, a color segmentation algorithm based on hierarchical connected components analysis, an interactive tool for performance analysis of parallel vision programs, and an object-oriented image processing library.
The purpose of this work has been to find solutions for reliable real-time monocular visual tracking. The goal is to estimate the relative motion of a camera with respect to a rigid 3-D scene by tracking features. In the beginning, the 3-D locations of the features are not known accurately, but during the tracking process these uncertainties are reduced through the integration of new observations. Most attention has been given to modeling measurement uncertainties and selecting the features to be extracted from image frames. The experimental system under implementation employs a bank of extended Kalman filtering based trackers each of which calculates estimates for location and motion using measurements of a few feature points at a time. The small number of points makes the trackers sensitive to various measurement errors, simplifying the detection of tracking failures, thereby giving potential for improving reliability. The preliminary experiments have been performed with satisfactory results for sequences of images at the rates of 22 to 35 frames per second.
The accuracy and speed of motion determination are critical factors for vision based environment modeling of autonomous moving machines. In this paper, two motion estimation algorithms for this purpose are compared using simulations. The algorithms, based on extended Kalman filtering (EKF) and Gauss-Newton minimization, determine the translations and rotations from corresponding 3-D point pairs measured from consecutive stereo images. Both solutions take the stereo measurement uncertainties into account. The simulation results show that with the same input data the accuracy of the EKF is slightly lower than that of the Gauss-Newton minimization approach, but its computational cost is substantially smaller.
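The Gauss-Newton side of such a comparison can be sketched in two dimensions for brevity; the paper works with 3-D point pairs and weights by stereo measurement uncertainty, both of which this noise-free toy omits:

```python
import numpy as np

def gauss_newton_pose(p, q, iters=20):
    """Estimate a 2-D rotation angle and translation mapping points p
    onto q by Gauss-Newton on the residuals r_i = R(theta) p_i + t - q_i."""
    theta, t = 0.0, np.zeros(2)
    for _ in range(iters):
        c, s = np.cos(theta), np.sin(theta)
        R = np.array([[c, -s], [s, c]])
        dR = np.array([[-s, -c], [c, -s]])   # dR/dtheta
        r = (p @ R.T + t) - q                # residuals, shape (n, 2)
        # Jacobian: dr/dtheta = dR p_i, dr/dt = identity.
        J = np.zeros((2 * len(p), 3))
        J[:, 0] = (p @ dR.T).ravel()
        J[0::2, 1] = 1.0
        J[1::2, 2] = 1.0
        delta, *_ = np.linalg.lstsq(J, -r.ravel(), rcond=None)
        theta += delta[0]
        t += delta[1:]
    return theta, t
```

The EKF alternative folds the same measurements in recursively, one correspondence at a time, which is cheaper per update but forgoes re-linearizing over the whole batch.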
This paper presents a vision-guided control system for an industrial robot capable of picking up an object, moving it to a goal, and placing it there. Tasks given to the control system are based on imperfect knowledge about the environment. The control system corrects the task parameters by matching them against range information gained from the environment. The control system is part of a larger system, which includes a high-level goal-oriented planner. The planner consists of hierarchically organized planning-executing-monitoring triplets, which execute given tasks by dividing them into subtasks, by sending the subtasks either to other triplets or to the control system described in this paper, and by monitoring the execution of the subtasks. The planner sees the robot and the control system as an intelligent robot capable of executing pick-and-place tasks in a dynamic, partly unknown environment. This paper presents the results of the testing of the control system with an industrial 6-axis robot and a structured light-based range sensor. Also the principle of calibrating the robot and the sensor is presented.
This paper describes the morphological methods used in a prototype for an automated visual on-line metal strip inspection system. The system is capable of both detecting and classifying surface defects in copper alloy strips, and it has been installed for evaluation in a production line in a rolling mill. Mathematical morphology is used for the preprocessing and segmentation of images. This approach is powerful because the metal strip defects cannot be discriminated from the defect-free background by their contrast alone, but only by reference to their shape and size as well. The algorithms have been mapped onto commercial hardware modules.
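Shape- and size-sensitive filtering of this kind can be illustrated with a tiny binary morphology sketch; the system's actual structuring elements and gray-scale pipeline are not reproduced here:

```python
import numpy as np

def erode(img):
    """Binary erosion with a 3x3 structuring element (zero-padded):
    a pixel survives only if its whole 3x3 neighbourhood is set."""
    p = np.pad(img, 1, constant_values=0)
    h, w = img.shape
    out = np.ones_like(img)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out &= p[1 + dy:h + 1 + dy, 1 + dx:w + 1 + dx]
    return out

def dilate(img):
    """Binary dilation with a 3x3 structuring element:
    a pixel is set if any pixel in its 3x3 neighbourhood is set."""
    p = np.pad(img, 1, constant_values=0)
    h, w = img.shape
    out = np.zeros_like(img)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out |= p[1 + dy:h + 1 + dy, 1 + dx:w + 1 + dx]
    return out

def opening(img):
    """Opening = erosion then dilation: removes shapes that cannot
    contain the structuring element, regardless of their contrast."""
    return dilate(erode(img))
```

Because the filter acts on shape and size rather than intensity, it separates defect candidates that contrast thresholding alone would miss, which is exactly the property the abstract relies on.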