In this paper, we report the instability of Adversarial Discriminative Domain Adaptation (ADDA), an unsupervised domain adaptation method. The accuracy of ADDA is not stable, and we show that the instability comes from the initialization of the CNN for the target domain, not from pre-training with the source domain.
In this paper, we propose a method for estimating the number of rallies in table tennis match videos. A play scene is extracted from a video using frame similarity, and a ball area is detected on the upper side of the table with frame differencing and color thresholding. The detected ball is either a ball received by the upper-side player or a high-toss service by the bottom-side player. High-toss services are filtered out using the sound of the ball colliding with the table or racket. Experiments on estimating the number of rallies show an accuracy of 63.7% on 157 play scenes from Japanese T-League match videos.
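The ball-detection step above can be sketched as a frame difference combined with a color threshold; the threshold values, the near-white-ball assumption, and the centroid output below are illustrative choices, not the paper's exact parameters:

```python
import numpy as np

def detect_ball(prev_frame, curr_frame, diff_thresh=30, color_lo=200):
    """Detect a bright moving ball by frame difference + color thresholding.

    Frames are HxWx3 uint8 RGB arrays; thresholds are illustrative.
    Returns the (row, col) centroid of candidate pixels, or None.
    """
    # Motion mask: pixels whose grayscale intensity changed strongly
    gray_prev = prev_frame.mean(axis=2)
    gray_curr = curr_frame.mean(axis=2)
    motion = np.abs(gray_curr - gray_prev) > diff_thresh
    # Color mask: near-white pixels (a table tennis ball is bright in all channels)
    color = (curr_frame > color_lo).all(axis=2)
    mask = motion & color
    if not mask.any():
        return None
    rows, cols = np.nonzero(mask)
    return (rows.mean(), cols.mean())
```

Restricting the search to the upper side of the table, as in the paper, would amount to cropping the frames before calling this function.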
In current pose estimation studies, bottom-up approaches, which estimate the pose from detected body parts and their relations, have been the dominant methods. These approaches enable fast and accurate pose estimation, and there are many fields to which pose estimation could be applied. In this paper, we propose a bottom-up method for excavator pose estimation. For training, we generate ground-truth confidence maps from the annotations and evaluate a loss function between the estimated maps and the ground-truth maps. We evaluated the model by computing the loss between the estimated map and the true map.
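Ground-truth confidence maps of the kind described above are commonly built by placing a 2-D Gaussian at each annotated joint; a minimal sketch (the Gaussian form, sigma, and the L2 loss below are common choices assumed for illustration, not taken from the paper):

```python
import numpy as np

def confidence_map(h, w, keypoints, sigma=2.0):
    """Ground-truth confidence map: a 2-D Gaussian peak at each annotated joint.

    keypoints is a list of (x, y) pixel coordinates.
    """
    ys, xs = np.mgrid[0:h, 0:w]
    cmap = np.zeros((h, w))
    for (x, y) in keypoints:
        g = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
        cmap = np.maximum(cmap, g)  # keep the strongest peak per pixel
    return cmap

def mse_loss(pred, target):
    """L2 loss between the estimated map and the ground-truth map."""
    return np.mean((pred - target) ** 2)
```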
Traditionally, RGB rendering, which calculates the intensity of light in only three components, has often been used for generating photorealistic images in global illumination environments, but it cannot render wavelength-dependent phenomena accurately. On the other hand, spectral rendering generates photorealistic images in various kinds of scenes, including wavelength-dependent phenomena such as interference and fluorescence. However, it is computationally expensive compared to RGB rendering, especially in global illumination environments, because the spectral intensity of light over the visible range must be calculated each time a light ray collides with a scene object during ray tracing. To reduce the computational cost of spectral rendering, we introduce image-based lighting (IBL), in which target objects are rendered without many iterations of rays bouncing among scene objects, by using a light probe image as ambient light. We extend IBL to a spectral IBL in order to combine IBL with spectral rendering; that is, a spectral image that includes the spectral intensity of ultraviolet light in addition to visible light is used as the light probe image, and the spectral intensity of light is calculated to render the target objects. The proposed method renders wavelength-dependent phenomena realistically in a shorter time, because the number of intersections between rays and objects is much smaller than without IBL. We have implemented the proposed method in the PBRT renderer and rendered scenes including fluorescence effects to demonstrate its usefulness.
We propose an efficient optical tomography method with a discretized path integral. We first introduce a primal-dual approach to solve the inverse problem formulated as a constrained optimization problem. Next, we develop efficient formulations for computing the Jacobian and Hessian of the cost function of the constrained nonlinear optimization problem. Numerical experiments show that the proposed formulation is faster (23.14±1.32 s) than the previous work using the log-barrier interior point method (91.17±1.48 s) for the Shepp–Logan phantom with a grid size of 24×24, while keeping the quality of the estimation results (root-mean-square error increasing by up to 12%).
We present a framework for optical tomography based on a path integral. Instead of directly solving the radiative transport equations, which have been widely used in optical tomography, we use a path integral that has been developed for rendering participating media based on the volume rendering equation in computer graphics. For a discretized two-dimensional layered grid, we develop an algorithm to estimate the extinction coefficients of each voxel with an interior point method. Numerical simulation results demonstrate that the proposed method works well.
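The attenuation a ray accumulates across the discretized grid follows the Beer–Lambert law, and this is the forward quantity the extinction-coefficient estimation has to reproduce; a minimal sketch, assuming a straight ray and equal per-voxel path lengths:

```python
import numpy as np

def transmittance(sigma_t, path, dl=1.0):
    """Transmittance of a ray across a discretized 2-D grid (Beer-Lambert law).

    sigma_t : 2-D array of per-voxel extinction coefficients
              (the unknowns the tomography estimates)
    path    : list of (row, col) voxels the ray traverses
    dl      : path length inside each voxel (assumed equal here)
    """
    tau = sum(sigma_t[i, j] for (i, j) in path)  # accumulated optical depth
    return np.exp(-dl * tau)
```

The inverse problem then searches for the sigma_t grid whose predicted transmittances match the measured ones, which the abstract above solves with an interior point method.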
This paper investigates the trade-off between computation time and recognition rate of local descriptor-based recognition for colorectal endoscopic NBI image classification. Recent recognition methods using local descriptors have been successfully applied to medical image classification. The accuracy of these methods may depend on the quality of vector quantization (VQ) and descriptor encoding; however, accurate quantization takes a long time. This paper reports how a simple sampling strategy affects performance with different encoding methods. First, we extract about 7.7 million local descriptors from the training images of a dataset of 908 NBI endoscopic images. Second, we randomly choose subsets of between 19K and 7.7M descriptors for VQ. Third, we use three encoding methods (BoVW, VLAD, and Fisher vector) with different numbers of descriptors. A linear SVM is used for classification in a three-class problem. The computation time for VQ was drastically reduced by a factor of 100, while the peak performance was retained. Performance improved by roughly 1% to 2% when more descriptors obtained by over-sampling were used for encoding. Performance with descriptors extracted at every pixel ("grid1") or every two pixels ("grid2") is similar, while the computation time is very different; grid2 is 5 to 30 times faster than grid1. The main finding of this work is twofold. First, recent encoding methods such as VLAD and Fisher vector are as insensitive to the quality of VQ as BoVW. Second, there is a trade-off between computation time and performance when encoding over-sampled descriptors with BoVW and Fisher vector, but not with VLAD.
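The sampling-then-encoding pipeline can be illustrated with the simplest of the three encoders, BoVW; the subsampling helper mirrors the paper's random-subset strategy, while the toy codebook and nearest-codeword assignment below are generic textbook choices rather than the paper's setup:

```python
import numpy as np

def subsample_descriptors(descriptors, n, seed=0):
    """Random subset of local descriptors used for vector quantization,
    mirroring the paper's simple sampling strategy."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(descriptors), size=min(n, len(descriptors)),
                     replace=False)
    return descriptors[idx]

def bovw_encode(descriptors, codebook):
    """Bag-of-visual-words encoding: assign each descriptor to its nearest
    codeword and return the L1-normalized histogram of assignments."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    hist = np.bincount(d2.argmin(axis=1), minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```

In the paper's setting the codebook would come from k-means on the subsampled descriptors, and the resulting histograms would feed the linear SVM.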
KEYWORDS: Image segmentation, Image processing, Feature extraction, Image processing algorithms and systems, Color image processing, Optical spheres, Mouth, RGB color model, Information science, Information technology
We propose a method for segmenting a color image into object-regions, each of which corresponds to the projection of one object in the scene onto the image plane. With conventional segmentation methods, it is not easy to extract an object-region as a single region. Our proposed method uses geometric features of regions. First, the image is segmented into small regions. Next, geometric features such as inclusion, area ratio, smoothness, and continuity are calculated for each region. Then the regions are merged based on these geometric features. This merging enables us to obtain an object-region even if the surface of the object is textured with a variety of reflectances, which conventional segmentation methods do not take into account. We show experimental results demonstrating the effectiveness of the proposed method.
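A reduced version of the merging step, using only the area-ratio feature, can be sketched as follows; the threshold and the dominant-neighbor rule are illustrative assumptions, since the paper combines inclusion, area ratio, smoothness, and continuity:

```python
import numpy as np

def merge_small_regions(labels, min_area_ratio=0.05):
    """Merge every region whose area ratio (region area / image area) falls
    below a threshold into its dominant 4-neighbor region.

    The threshold and the "dominant neighbor" rule are illustrative; the
    paper's criteria combine several geometric features.
    """
    labels = labels.copy()
    total = labels.size
    for lab in np.unique(labels):
        mask = labels == lab
        if mask.sum() / total >= min_area_ratio:
            continue
        # Pixels just outside the region (4-connectivity via shifts)
        grown = np.zeros_like(mask)
        grown[1:, :] |= mask[:-1, :]
        grown[:-1, :] |= mask[1:, :]
        grown[:, 1:] |= mask[:, :-1]
        grown[:, :-1] |= mask[:, 1:]
        neighbor_labels = labels[grown & ~mask]
        neighbor_labels = neighbor_labels[neighbor_labels != lab]
        if neighbor_labels.size:
            # Absorb the small region into its most frequent neighbor
            labels[mask] = np.bincount(neighbor_labels).argmax()
    return labels
```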