In recent years, virtual fitting has been actively studied in the fashion field. Much of this work has been put to practical use for ready-made clothes, with companies using information on the shape, size, and fabric of their garments to offer virtual fitting to users. When such data are not available, many methods exist for estimating the shape and size of clothes from images. Using these methods, users can virtually try on the clothes they want, fitted to their own body shape and pose. In contrast, a method for estimating the fabric of clothes has yet to be developed. Because the material determines how soft a garment appears in virtual fitting, conventional virtual fitting systems have difficulty reproducing realistic movement and wrinkles. This study proposes a method for estimating the fabric material from images of clothes, with the aim of realistic virtual fitting. We construct a dataset that focuses on the texture and luster of each fabric and estimate the material using a convolutional neural network (CNN).
In recent years, virtual reality (VR) and augmented reality (AR) have been developed and applied to various simulations for business and commercial use. In these simulations, computer graphics (CG) plays an important role in rendering virtual objects, and many studies address the rendering of cloth. Representing cloth in CG requires several optical properties of the object. These properties depend on the thread material, the thread count, and the thickness, which makes it difficult to render clothes in a way that follows changes in these factors. This study proposes a method for formulating the reflectance and transmittance as functions of the composition of the cloth. To formulate the reflectance, we use the Kubelka-Munk theory together with cloth composition data that can be easily obtained with a device such as a smartphone.
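As a rough illustration of the Kubelka-Munk theory mentioned above, the sketch below evaluates the standard hyperbolic solution for a single homogeneous layer over a black background. The function names and the parameters K (absorption coefficient), S (scattering coefficient), and X (thickness) are illustrative assumptions; the abstract does not state how the cloth composition is mapped to these quantities.

```python
import math


def km_layer(K, S, X):
    """Reflectance and transmittance of one Kubelka-Munk layer.

    K: absorption coefficient, S: scattering coefficient, X: thickness.
    Assumes a black background behind the layer (hypothetical setup).
    """
    if S <= 0 or K < 0 or X < 0:
        raise ValueError("require S > 0, K >= 0, X >= 0")
    if K == 0:
        # Non-absorbing limit of the same equations.
        return S * X / (1 + S * X), 1 / (1 + S * X)
    a = 1 + K / S
    b = math.sqrt(a * a - 1)
    sh = math.sinh(b * S * X)
    ch = math.cosh(b * S * X)
    denom = a * sh + b * ch
    return sh / denom, b / denom


def km_infinite_reflectance(K, S):
    """Reflectance of an optically thick layer, R_inf = a - sqrt(a^2 - 1)."""
    a = 1 + K / S
    return a - math.sqrt(a * a - 1)


if __name__ == "__main__":
    # Made-up coefficients, for illustration only.
    R, T = km_layer(K=0.5, S=2.0, X=0.3)
    print(f"R = {R:.4f}, T = {T:.4f}")
```

In this form, a thicker or denser cloth simply corresponds to a larger X or larger K and S, which is one way the dependence on thickness and thread count could enter such a model.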
This paper proposes a method that uses convolutional neural networks (CNNs) to estimate 3D information about objects in a monocular image, such as shape, orientation, size, and position, and to reproduce the scene as 3D point clouds. The proposed network combines depth estimation, object detection, and point cloud estimation. It requires networks for object detection and segmentation, together with a point cloud estimation network for estimating object shape. The point cloud estimation network reproduces object surfaces robustly and can handle unknown objects through a semantic understanding of object shape. In addition, we incorporate a depth estimation network to estimate the depth of the entire scene and the distance between the camera and each object. This paper focuses on the point cloud estimation network: we estimate point clouds for real objects in the images of the dataset and evaluate the output point clouds.
Object recognition using convolutional neural networks (CNNs) has recently become widespread. However, medical image datasets tend to be small because the training data require a doctor's findings. On such small-scale datasets, a CNN cannot achieve sufficiently high recognition accuracy. One solution is transfer learning, which reuses weights learned on a large dataset. There is also research on pruning parameters that are unimportant for the target task during transfer learning. In this study, after transfer learning, we evaluate the convolution filters using a pruning criterion and replace the filters with low scores with those with high scores. To confirm the usefulness of the proposed method in terms of recognition accuracy, we compare it with three methods: transfer learning only, pruning, and re-initializing the filters. The proposed method achieved higher recognition accuracy than the other methods. These results suggest that replacing filters can affect CNN performance in object recognition on small-scale datasets.
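The filter replacement step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the evaluation criterion, so the L1 norm of each filter (a common pruning score) is assumed here, and each filter is represented as a flat list of weights rather than a framework tensor. The names `l1_score` and `replace_low_filters` are hypothetical.

```python
def l1_score(filt):
    """Importance score of one filter: sum of absolute weights (assumed criterion)."""
    return sum(abs(w) for w in filt)


def replace_low_filters(filters, k):
    """Overwrite the k lowest-scoring filters with copies of the k highest-scoring ones.

    filters: list of filters, each a flat list of weights.
    Returns (new_filters, low_indices, high_indices); the input is not modified.
    """
    n = len(filters)
    if k < 0 or 2 * k > n:
        raise ValueError("k must satisfy 0 <= 2k <= number of filters")
    # Indices sorted from least to most important.
    order = sorted(range(n), key=lambda i: l1_score(filters[i]))
    low = order[:k]
    high = order[n - k:][::-1]  # most important first
    out = [list(f) for f in filters]
    for lo_idx, hi_idx in zip(low, high):
        out[lo_idx] = list(filters[hi_idx])
    return out, low, high


if __name__ == "__main__":
    # Toy layer with four two-weight filters.
    layer = [[3.0, -3.0], [1.0, 0.0], [4.0, 4.0], [-2.0, 2.0]]
    new_layer, low, high = replace_low_filters(layer, k=1)
    print(low, high, new_layer)
```

In a real network the replaced layer would normally be fine-tuned again afterwards, since duplicated filters only become useful once training lets them diverge.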
In urban development, it is important to make plans that take into account how the appearance of natural objects will change over decades. This study proposes a method for simulating tree growth in order to predict such changes in appearance.
Cleaning is an inseparable part of daily life, but it is impossible to see with the naked eye which parts of a room have actually been cleaned. For this reason, when several people clean together and cannot share information on where cleaning has been done, some areas may be left uncleaned. If augmented reality (AR) can be used to visualize the area over which a hand or cleaning tool has passed, it should improve cleaning efficiency and increase motivation. The purpose of this research is to use simultaneous localization and mapping (SLAM) to obtain the location of the area passed over by the hand or cleaning tool and to superimpose it in AR.
When printed material is captured by a monocular digital camera, geometric distortions caused by folds make it look different from the original content. This study aims to reproduce the original appearance by correcting the captured image. In the proposed method, the image of the printed material is divided into local areas, and the geometric distortion is corrected by deforming each local area. In addition, brightness changes caused by shading are corrected.
In recent years, many simultaneous localization and mapping (SLAM) systems have demonstrated impressive dense scene reconstruction. However, a typical SLAM system builds 3D scenes at the point level without any semantic information. Many computer vision applications require a high level of scene understanding, for which point-based SLAM is insufficient. This paper studies the fusion of 3D object recognition into a SLAM system. We use a hand-held RGB-D camera and RTAB-Map to reconstruct a dense point cloud of a 3D indoor scene, and then over-segment the scene with supervoxel-based point cloud segmentation. A 3D object classification model trained with PointNet is added to merge the segmentation process with object recognition. Our experiments in indoor environments show the effectiveness of this system.
There are various kinds of learning systems in the world, and quite a lot of them use video sources. These video sources also vary according to the content and aim of the learning. This paper describes the usability of learning systems that use a super-high-definition video source, focusing on the creation and handling of super-high-resolution video. Furthermore, it considers present problems and future progress by proposing an on-demand learning system that uses a super-high-definition video source. Super high resolution here means 4K (4096x2160 dots).
Geometric registration between a virtual object and the real space is the most basic problem in augmented reality. Model-based tracking methods allow us to estimate the three-dimensional (3-D) position and orientation of a real object by using a textured 3-D model instead of a visual marker. However, it is difficult to apply existing model-based tracking methods to objects that have movable parts, such as the display of a mobile phone, because these methods assume a single rigid-body model.
In this research, we propose a novel model-based registration method for multi-rigid-body objects. For each frame, the 3-D models of each rigid part of the object are first rendered according to the motion and transformation estimated from the previous frame. Second, control points are determined by detecting the edges of the rendered image and sampling pixels on these edges. Motion and transformation are then simultaneously calculated from the distances between the edges and the control points. The validity of the proposed method is demonstrated through experiments using synthetic videos.
Two methods are described to accurately estimate diffuse and specular reflectance parameters for colors, gloss intensity and surface roughness, over the dynamic range of the camera used to capture input images. Neither method needs to segment color areas on an image, or to reconstruct a high dynamic range (HDR) image. The second method improves on the first, bypassing the requirement for specific separation of diffuse and specular reflection components. For the latter method, diffuse and specular reflectance parameters are estimated separately, using the least squares method. Reflection values are initially assumed to be diffuse-only reflection components, and are subjected to the least squares method to estimate diffuse reflectance parameters. Specular reflection components, obtained by subtracting the computed diffuse reflection components from reflection values, are then subjected to a logarithmically transformed equation of the Torrance-Sparrow reflection model, and specular reflectance parameters for gloss intensity and surface roughness are finally estimated using the least squares method. Experiments were carried out using both methods, with simulation data at different saturation levels, generated according to the Lambert and Torrance-Sparrow reflection models, and the second method, with spectral images captured by an imaging spectrograph and a moving light source. Our results show that the second method can estimate the diffuse and specular reflectance parameters for colors, gloss intensity and surface roughness more accurately and faster than the first one, so that colors and gloss can be reproduced more efficiently for HDR imaging.
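The log-linear fitting step can be illustrated with a small sketch. It assumes the commonly used simplified Torrance-Sparrow form I_s = K_s exp(-alpha^2 / (2 sigma^2)) / cos(theta_r), with the Fresnel and geometric attenuation terms folded into the constant K_s; the abstract does not give the exact equation, so this form, the function name `fit_specular`, and its arguments are assumptions. Taking logarithms gives ln(I_s cos(theta_r)) = ln K_s - alpha^2 / (2 sigma^2), which is linear in alpha^2 and can be fitted by ordinary least squares.

```python
import math


def fit_specular(alphas, thetas_r, specular):
    """Estimate gloss intensity K_s and roughness sigma by a log-linear fit.

    alphas:   angles between the surface normal and the half vector (radians)
    thetas_r: viewing angles (radians)
    specular: specular components, already separated from the diffuse part
    """
    if any(s <= 0 for s in specular):
        raise ValueError("specular values must be positive before taking the log")
    xs = [a * a for a in alphas]
    ys = [math.log(s * math.cos(t)) for s, t in zip(specular, thetas_r)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # equals -1 / (2 sigma^2)
    intercept = mean_y - slope * mean_x    # equals ln K_s
    if slope >= 0:
        raise ValueError("data do not show a specular lobe that decays with alpha")
    return math.exp(intercept), math.sqrt(-1.0 / (2.0 * slope))


if __name__ == "__main__":
    # Synthetic samples generated from the assumed model.
    k_true, sigma_true = 0.8, 0.1
    alphas = [0.0, 0.05, 0.1, 0.15, 0.2]
    thetas = [0.1, 0.2, 0.3, 0.4, 0.5]
    spec = [k_true / math.cos(t) * math.exp(-a * a / (2 * sigma_true ** 2))
            for a, t in zip(alphas, thetas)]
    print(fit_specular(alphas, thetas, spec))
```

With saturated input, one plausible use is to pass only the samples below the sensor's saturation level, since the fit needs no more than a few unsaturated points on the lobe.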
To overcome the shortcomings of digital images, or to reproduce the grain of traditional silver halide photographs, some photographers add noise (grain) to digital images. In an effort to find what makes noise preferable, we analyzed how a professional photographer introduces noise into B&W digital images and found two noticeable characteristics: 1) there is more noise in the mid-tones, gradually decreasing in the highlights and shadows toward the ends of the tonal range, and 2) histograms in the highlights are skewed toward the shadows and vice versa, while they are almost symmetrical in the mid-tones. Next, we examined whether the professional's noise could be reproduced. The symmetrical histograms were approximated by a Gaussian distribution and the skewed ones by a chi-square distribution. The images on which the noise was reproduced were judged by the professional himself to be satisfactory. As the professional said he added the noise so that "it looked like the grain of B&W gelatin silver photographs," we compared the two kinds of noise and found that they have two characteristics in common: 1) more noise in the mid-tones but almost none in the brightest highlights and deepest shadows, and 2) asymmetrical histograms in the highlights and shadows. We think these common characteristics might be one condition for "good" noise.
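The two characteristics above can be mimicked with a short sketch. All numeric choices here are hypothetical and not taken from the study: the parabolic amplitude envelope, the tone thresholds at 1/3 and 2/3, the four degrees of freedom, and the strength value. "Skewed toward the shadows" is read as a longer tail on the dark side, which is also an assumption.

```python
import math
import random


def grain_sample(tone, rng, strength=0.05, dof=4):
    """Draw one noise value for a pixel of the given tone (0 = black, 1 = white)."""
    # Amplitude is largest in the mid-tones and exactly zero at both ends.
    envelope = 4.0 * tone * (1.0 - tone)
    if 1.0 / 3.0 <= tone <= 2.0 / 3.0:
        z = rng.gauss(0.0, 1.0)                      # symmetric mid-tone noise
    else:
        chi = rng.gammavariate(dof / 2.0, 2.0)       # chi-square with `dof` degrees of freedom
        z = (chi - dof) / math.sqrt(2.0 * dof)       # centre and scale to unit variance
        if tone > 2.0 / 3.0:
            z = -z                                   # highlights: long tail toward the shadows
    return strength * envelope * z


def add_grain(pixels, seed=0, strength=0.05, dof=4):
    """Return a copy of `pixels` (values in [0, 1]) with tone-dependent grain added."""
    rng = random.Random(seed)
    return [min(1.0, max(0.0, p + grain_sample(p, rng, strength, dof)))
            for p in pixels]


if __name__ == "__main__":
    ramp = [i / 10 for i in range(11)]
    print(add_grain(ramp, seed=1))
```

A chi-square variate is generated through `random.gammavariate`, using the fact that a chi-square distribution with k degrees of freedom is a gamma distribution with shape k/2 and scale 2.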
Since commercial image detectors, such as charge-coupled device (CCD) cameras, have a limited dynamic range, it is difficult to obtain images that are truly unsaturated, and as a result the reflectance parameters may be estimated inaccurately. To solve this problem, we describe a method to estimate reflectance parameters from saturated spectral images. We separate reflection data into diffuse and specular components at 5-nm intervals between 380 nm and 780 nm for each pixel of the spectral images, which are captured at different incident angles, and estimate the diffuse reflectance parameters by applying the Lambertian model to the diffuse components. To estimate the specular reflectance parameters from the specular components, we transform the Torrance-Sparrow equation to a linear form, assuming that the Fresnel reflectance is constant. We then estimate the specular parameters for the intensity of the specular reflection and the standard deviation of the Gaussian distribution, using the least squares method on the unsaturated values of the specular components. Since Fresnel reflectance contributes to the physically based Torrance-Sparrow model in computer graphics and vision, we estimate both the Fresnel reflectance, in terms of the Fresnel equation for the incident angle, and the refractive index of the surface for dielectric materials, which varies with wavelength. We carried out experiments with measured data, and with simulated specular components at different saturation levels, generated according to the Torrance-Sparrow model. Our experimental results reveal that the diffuse and specular reflectance parameters are estimated with high accuracy.
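For reference, the Fresnel equation for a dielectric surface can be evaluated as in the sketch below. It assumes light arriving from air (refractive index 1) onto a smooth dielectric of real refractive index n, and unpolarized illumination; the function names are illustrative, and the wavelength dependence of n mentioned above is not modelled.

```python
import math


def fresnel_rs_rp(theta_i, n):
    """Reflectances (R_s, R_p) for the two polarizations at an air-dielectric interface.

    theta_i: incident angle in radians, 0 <= theta_i <= pi/2
    n:       refractive index of the dielectric, n >= 1
    """
    if n < 1.0 or not 0.0 <= theta_i <= math.pi / 2:
        raise ValueError("require n >= 1 and 0 <= theta_i <= pi/2")
    cos_i = math.cos(theta_i)
    sin_t = math.sin(theta_i) / n            # Snell's law
    cos_t = math.sqrt(1.0 - sin_t * sin_t)
    r_s = (cos_i - n * cos_t) / (cos_i + n * cos_t)
    r_p = (n * cos_i - cos_t) / (n * cos_i + cos_t)
    return r_s * r_s, r_p * r_p


def fresnel_unpolarized(theta_i, n):
    """Fresnel reflectance for unpolarized light: the mean of R_s and R_p."""
    rs, rp = fresnel_rs_rp(theta_i, n)
    return 0.5 * (rs + rp)


if __name__ == "__main__":
    for deg in (0, 30, 60, 80):
        print(deg, fresnel_unpolarized(math.radians(deg), 1.5))
```

At normal incidence the expression reduces to ((n - 1) / (n + 1))^2, which is why a refractive index can be recovered from an estimated Fresnel reflectance when the incident angle is known.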
We propose a new framework for interactive augmented reality (AR) and mixed reality (MR) representation using both visible and invisible projection onto physical target objects. The projection-based approach to constructing AR/MR uses physical objects such as walls, books, plaster ornaments, and anything else onto which computer-generated content can be optically projected. In other words, projection makes it possible to use real objects as displays.
We mainly focus on capturing and utilizing the 3D shape of the object surface, which allows the AR/MR system to maintain visual consistency when merging physical and rendered objects. The 3D shape data of the object can be used to compensate for the distortion caused by the difference between the positions of the projectors and the viewer. Another advantage is the ability to generate proper visual occlusion between physical and virtual objects so that they seem to coexist in front of the viewer.
In this study, we demonstrate the use of near-infrared pattern projection for triangulation, so that the geometry data of the object are scanned and updated automatically in a background process. The AR/MR representation can thus be provided in parallel and can follow dynamic changes in the physical geometry.
We propose a new technique to faithfully reproduce both the color and the gloss of an object on a computer, using multispectral images. An imaging spectrograph equipped with a monochrome charge-coupled device (CCD) camera is fixed in front of the target object. Multispectral images of a linear portion of the object's surface are captured at suitable intervals by a measuring system which comprises a light source orbiting the target object. To obtain spectral images for the whole surface, the target object is also rotated. The reflection is separated into diffuse and specular components, according to the dichromatic reflection model, and the diffuse parameters are estimated at 5-nm intervals between 380 nm and 780 nm for each pixel. Since the CCD camera used to capture images has a limited dynamic range, we suppose that the specular reflection is independent of wavelength for dielectrics, and that the specular reflections are saturated, although some of them can be non-saturated. We adopt the Torrance-Sparrow reflectance model for the specular reflection, and estimate the specular parameters using the least squares method for each pixel. Our experimental results reveal that the diffuse parameters for the color and the specular parameters for the gloss of the target object are satisfactorily estimated.
In this paper, we propose a new technique for measuring the whole three-dimensional shape of small moving objects. The proposed measurement system has a very simple structure, consisting of a CCD camera fitted with a fish-eye lens and a cylinder whose inner surface is coated with a mirror. The CCD camera is placed at the top of the cylinder, with its optical axis aligned with the center of the cylinder. A captured image contains two types of information: a direct view of the target and a reflected view. These two views are used to measure the shape of the target by stereo matching. Because the proposed method can acquire the shape of the target from a single image, the three-dimensional shape of a moving target can be obtained from an image sequence.
Wearable 3D measurement makes it possible to acquire 3D information about an object or an environment using a wearable computer. In Japan, mobile phones can now send pictures as well as voice and sound, and it will soon become easy to capture and send short movies with them. Meanwhile, computers are becoming more compact and powerful, and they can easily connect to the Internet by wireless LAN. In the near future, we will be able to use wearable computers anytime and anywhere, and to send three-dimensional data measured by a wearable computer as a new kind of data. This paper proposes a method and a system for measuring three-dimensional data of an object using a wearable computer. The method uses slit light projection for 3D measurement and relies on the user's motion instead of a scanning system.
With the growing demand for energy saving, city planners should consider the efficiency of energy consumption from the beginning. Diversified analysis of end-use energy consumption is indispensable for exploring desirable energy systems in urban areas. When the visualization is available on the Internet, city planners can freely discuss given plans online and ask experts for help and comments. This paper proposes a VR-based interactive visualization system that utilizes the hyperlink function of VRML. The proposed visualization relates end-use energy consumption to the geometrical arrangement of consumers and nests sets of visualizations. City planners can observe them in a virtual environment over the Internet. The proposed system was applied to a set of end-use electric power consumption data for a certain area. Experimental results show that the visualization lets users comprehend trends among end users and the characteristics of each consumer.