This PDF file contains the front matter associated with SPIE Proceedings Volume 11515, including the Title Page, Copyright information, Table of Contents, Author and Conference Committee lists.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks. You are receiving this notice because your organization may not have SPIE eBooks access. Shibboleth/OpenAthens users: please sign in to access your institution's subscriptions. To obtain this item, you may purchase the complete book in print or electronic format on SPIE.org.
We investigated the relationship between individuals' face recognition performance and the eye movement characteristics measured while each subject observed faces displayed on a screen. We modeled the statistical nature of their eye movements from a machine-learning perspective by applying a hidden Markov model (HMM). We used a set of computer-generated faces that included both images of actual faces and synthetic images obtained by slightly transforming the impressions of the original faces. With these visual stimuli, we conducted a simple face recognition experiment in which subjects judged whether they had seen the faces before, yielding a quantitative hit-rate score for each stimulus and subject. We also tracked their eye movements with an eye-tracking system and recorded their fixation points as temporal chains. For each class of face stimulus and subject, we estimated the HMM parameters from training samples of the eye movements. Given eye movement data as test samples, we conducted a classification test among the pre-defined classes based on the differences in the log-likelihood values obtained from each HMM. Better discrimination of the subjects by the HMM-based classification of the eye movement data corresponded to lower face recognition scores by the subjects, suggesting that individually consistent eye movement patterns may lower human face recognition performance.
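The HMM-based scoring step can be sketched as follows. This is a minimal discrete-emission sketch with hypothetical per-class parameters and fixation points quantized into symbols; the paper's actual state count, emission model, and training procedure are not specified here:

```python
import numpy as np

def hmm_log_likelihood(obs, log_pi, log_A, log_B):
    """Forward algorithm in log space for a discrete-emission HMM.

    obs    : sequence of observation symbol indices (e.g. quantized fixation regions)
    log_pi : (S,)   log initial-state probabilities
    log_A  : (S, S) log transition probabilities
    log_B  : (S, V) log emission probabilities
    """
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        # log-sum-exp over previous states, then emit the current symbol
        m = alpha.max()
        alpha = m + np.log(np.exp(alpha - m) @ np.exp(log_A)) + log_B[:, o]
    m = alpha.max()
    return m + np.log(np.exp(alpha - m).sum())

def classify(obs, models):
    """Pick the class whose HMM assigns the highest log-likelihood."""
    scores = {k: hmm_log_likelihood(obs, *m) for k, m in models.items()}
    return max(scores, key=scores.get)
```

With one trained HMM per class, a fixation sequence is assigned to whichever model scores it highest, mirroring the log-likelihood comparison described in the abstract.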
Eliminating reflections in a single image has been a challenging problem in image processing and computer vision, because defining an elaborate physical model that separates irregular reflections is almost impossible. In fact, while human vision can automatically focus on the transmitted object, basic deep neural networks are limited in their ability to learn such an attentive mechanism. In this paper, to solve this problem, a generative adversarial network guided by depth of field (DoF) is proposed. The DoF is formulated using image statistics and indicates the focused region of the image. By adding this information to both the generative and discriminative networks, the generator focuses on the transmitted layer and the discriminator can estimate the local consistency of the restored areas. Since it is intractable to obtain the ground-truth transmitted layer in real images, a dataset with synthetic reflections is used for quantitative evaluation. The experimental results demonstrate that the proposed method outperforms existing approaches in both PSNR and SSIM. The visual outputs indicate that the proposed network convincingly eliminates reflections and produces plausible transmitted layers compared to previous methods.
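The DoF cue is described only as "formulated by using image statistics". As an illustrative stand-in (local variance as a crude focus proxy, not the paper's actual formulation), a focus map could be computed like this:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def focus_map(gray, k=7):
    """Rough in-focus map from local variance (an assumed proxy for the
    paper's DoF statistic): in-focus regions tend to show stronger local
    contrast than defocused or reflected ones."""
    pad = k // 2
    img = np.pad(gray.astype(np.float64), pad, mode="reflect")
    win = sliding_window_view(img, (k, k))        # (H, W, k, k) windows
    mean = win.mean(axis=(2, 3))
    var = (win ** 2).mean(axis=(2, 3)) - mean ** 2
    return var / (var.max() + 1e-12)              # normalized to [0, 1]
```

Such a map could then be concatenated as an extra input channel to both the generator and the discriminator, which is the role DoF information plays in the proposed architecture.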
Writer verification is usually conducted by checking the similarity between characters of the same type written by known and unknown writers. However, when the same character types do not appear in both writers' documents, writer verification becomes very difficult. In this paper, we propose a method to extract handwriting features independent of character type to solve this problem. The proposed model is based on an autoencoder and applies Adaptive Batch Normalization (AdaBN) and Adaptive Instance Normalization (AdaIN) to each layer of the encoder or decoder to extract the desired features. We conducted a writer verification experiment using pairs of handwriting images of different character types from the ETL-1 Character Database (ETL-1). The results confirmed that the proposed method can perform writer verification with high accuracy even in this setting.
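In the proposed model these normalizations sit inside an autoencoder; the AdaIN operation itself simply re-standardizes each channel of one feature map and imposes another feature map's channel statistics, e.g.:

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Adaptive Instance Normalization: normalize each channel of the
    content features to zero mean / unit variance, then impose the style
    features' channel-wise mean and standard deviation.
    content, style: (C, H, W) feature maps."""
    c_mu = content.mean(axis=(1, 2), keepdims=True)
    c_sd = content.std(axis=(1, 2), keepdims=True)
    s_mu = style.mean(axis=(1, 2), keepdims=True)
    s_sd = style.std(axis=(1, 2), keepdims=True)
    return s_sd * (content - c_mu) / (c_sd + eps) + s_mu
```

AdaBN works analogously but swaps in batch-level rather than instance-level statistics; in the paper, such layers help separate writer-specific style from character-type content.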
Recently, as chemotherapy has advanced, it has become important to accurately diagnose the histological type of lung cancer (adenocarcinoma, squamous cell carcinoma, or small cell carcinoma). In a previous study, an automated classification method for lung cancers in cytological images using a deep convolutional neural network (DCNN) was proposed. However, its classification accuracy is approximately 70%, so improvement is required. In this study, we aimed to improve the classification accuracy of lung cancer type by combining liquid-based cytology images with electronic medical records. First, the cytological images were collected, and the original microscopic images were cropped into 256 × 256 pixel patches. We then collected personal clinical data (age, gender, smoking status, laboratory test values, tumor markers, and so on) corresponding to the cytological images. Next, image features were extracted from the cytological images using a VGG-16 model pretrained on the ImageNet dataset; the 4096 features preceding the final fully connected layer were extracted and reduced in dimensionality by PCA. The image features obtained from the DCNN and the clinical data corresponding to each cytological image were given to the classifier, which produced a classification into the three histological categories. Evaluation results showed that combining cytological images with clinical records improved classification accuracy over cytological images alone. These results indicate that the proposed method may be useful for the histological classification of lung tumors.
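The feature-combination step can be sketched as follows. The shapes, component count, and clinical variables here are illustrative assumptions, and the SVD-based PCA stands in for whatever implementation the authors used:

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components (SVD-based PCA)."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

# Hypothetical shapes: 4096-d DCNN features for 30 images, plus a few
# clinical variables (age, smoking status, tumor markers, ...)
rng = np.random.default_rng(0)
deep_feats = rng.normal(size=(30, 4096))
clinical = rng.normal(size=(30, 8))

reduced = pca_reduce(deep_feats, 16)        # (30, 16) compressed image features
combined = np.hstack([reduced, clinical])   # (30, 24) vector fed to the classifier
```

The concatenated vector is what a downstream classifier would consume to predict one of the three histological categories.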
Sparse models have been widely used in image denoising and have achieved state-of-the-art performance in past years. Dictionary learning and sparse code estimation are the two key issues for sparse models. Once a dictionary has been learned, sparse code estimation is equivalent to a general least absolute shrinkage and selection operator (LASSO) problem. However, LASSO has two limitations: (1) it gives rise to a biased estimation, and (2) it cannot select highly correlated features simultaneously. In recent years, methods for dictionary construction based on the nonlocal self-similarity property and weighted sparse models relying on noise estimation have been proposed. These methods can reduce the bias of the estimation and thus achieve promising results for image denoising. In this paper, we propose an elastic net with adaptive weights for image denoising. Our proposed model can achieve nearly unbiased estimation and select highly correlated features. Experimental results show that our proposed method outperforms other state-of-the-art image denoising methods.
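The sparse coding step for such a model can be sketched as an ISTA-style solver for the weighted elastic net objective. In practice the adaptive weights w would come from noise estimation as the paper describes; here they are simply given:

```python
import numpy as np

def weighted_elastic_net(D, y, w, lam1=0.1, lam2=0.01, n_iter=200):
    """ISTA-style solver for
        min_x 0.5 * ||y - D x||^2 + lam1 * sum_i w_i |x_i| + 0.5 * lam2 * ||x||^2
    where w holds per-coefficient adaptive weights."""
    L = np.linalg.norm(D, 2) ** 2 + lam2      # Lipschitz constant of the smooth part
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y) + lam2 * x   # gradient of the smooth terms
        z = x - grad / L
        # weighted soft-thresholding handles the l1 term
        x = np.sign(z) * np.maximum(np.abs(z) - lam1 * w / L, 0.0)
    return x
```

The l2 term keeps groups of correlated atoms active together, while down-weighting w_i on strong coefficients reduces the bias of plain LASSO shrinkage.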
Recent work on light field (LF) image enhancement has focused on specific tasks such as motion deblurring and super-resolution. State-of-the-art methods are limited to the specific case of 3-degree-of-freedom (3-DOF) camera motion (for motion deblurring) or a straightforward high-resolution neural network (for super-resolution (SR)). In this work, we propose a framework that uses a deep neural network to solve LF spatial super-resolution and deblurring under 6-DOF camera motion. The neural network is designed in an end-to-end fashion and trained in multiple stages to perform robust super-resolution and deblurring. Our neural network achieves superior quantitative and qualitative performance compared to recent state-of-the-art LF deblurring and SR algorithms.
Binarizing historically degraded as-built drawing (HDAD) maps is a challenging job, especially in removing noise, yellowed areas, and fold lines while preserving the foreground components. This paper first proposes a convolutional neural network-based (CNN-based) color classifier to determine the dominant color class of each HDAD block. Then, a dominant-color-driven, CNN-based binarization method is proposed, producing a high-quality binarized HDAD map. Thorough experiments on a real HDAD dataset show that, in terms of F-measure and perceptual effect, our binarization method substantially outperforms existing state-of-the-art binarization methods.
Due to its simplicity, the median filter is a popular and useful tool in fields such as image processing and computer graphics. It is mainly used for eliminating irrelevant details, especially for removing salt-and-pepper noise in images. Compared with the box filter and the Gaussian filter, the median filter can preserve structural edges, but this ability is limited: as the radius of the filter window grows, the edge-preserving ability weakens considerably. In this paper, we propose a median-like filter that removes small details, including salt-and-pepper noise, while preserving edges more strongly than the classical median filter. The filter computes the output at the observed pixel using eight sub-windows and one full window. Four of these sub-windows are built on the four quadrants, and the other four on the left, right, top, and bottom half-planes; all of them contain the observed pixel. Moreover, since the medians are computed from histograms, we update column histograms and kernel histograms with simple subtraction and addition operations to accelerate the filtering. The computational complexity of the proposed median-like filter is therefore independent of the window size, i.e., constant time per pixel. An SSIM (Structural SIMilarity) evaluation demonstrates that the proposed median-like filter performs well.
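The histogram-update idea can be illustrated in one dimension: a Huang-style running median over an 8-bit row, where the kernel histogram is updated by one subtraction and one addition per step. The paper's actual filter maintains 2-D column and kernel histograms over nine windows, which this sketch omits:

```python
import numpy as np

def median_filter_row(row, radius):
    """Running median over a 1-D row of 8-bit values. Per-pixel cost is
    O(256) for the histogram scan plus O(1) updates, independent of the
    window size."""
    n = len(row)
    k = 2 * radius + 1
    padded = np.pad(row, radius, mode="edge")
    hist = np.zeros(256, dtype=int)
    for v in padded[:k]:                  # build the initial window histogram
        hist[v] += 1
    out = np.empty(n, dtype=row.dtype)
    mid = k // 2 + 1                      # rank of the median element
    for i in range(n):
        cum = 0
        for v in range(256):              # read the median off the histogram
            cum += hist[v]
            if cum >= mid:
                out[i] = v
                break
        if i + 1 < n:
            hist[padded[i]] -= 1          # value leaving the window
            hist[padded[i + k]] += 1      # value entering the window
    return out
```

Sliding the window costs only one increment and one decrement, which is exactly why the full 2-D scheme stays constant-time with respect to the kernel radius.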
In recent years, with the improvement of computer performance, it has become possible to represent dense mesh models in computer graphics. However, manipulating dense mesh models can be costly. To reduce the computational cost during manipulation, a dense model is often manipulated through a coarse bounding cage that encloses it. However, generating cages manually is usually tedious and time-consuming. In this paper, we propose a method for automatic cage generation based on variational remeshing. We first evaluate features of the original triangle model, such as curvature and dihedral angles, and then voxelize it. We extract and triangulate the outer faces of the voxels and transfer the features of the original model to these faces. Finally, we apply a variational remeshing method to this triangular mesh; variational remeshing minimizes an energy function whose minimum corresponds to a good solution by global relaxation until convergence. An experimental result demonstrates that our method is effective.
Dual-energy computed tomography (DECT) enhances tissue characterization by obtaining two or three material images from two measurements at different X-ray source potentials. Recently, multi-material decomposition (MMD) in DECT has been studied to obtain decomposed material images. MMD needs to reduce noise while maintaining the spatial resolution of the decomposed images. However, no studies have reported total nuclear variation (TNV) as a noise suppression method for MMD to improve decomposition accuracy. We propose noise suppression using TNV for direct MMD: the TNV method is applied to the CT data before material decomposition to reduce noise. The tissue characterization phantom Model 467 was employed as the test object in this study. To investigate the effect of various basis materials, we selected four basis materials. The volume fraction (VF) was calculated to quantitatively evaluate the quality of the decomposed images, and the results of the direct MMD method and the proposed method were compared. In all decomposed images, the VF accuracies of the proposed method were better than those of the direct MMD method. Furthermore, the proposed method can provide decomposed images with only a small difference in separated density. In conclusion, the proposed method can provide more quantitatively accurate images.
In this paper, we propose a metric (Euclidean) rectification method for a target spatial plane appearing in a single-viewpoint image. Our approach computes a homography matrix without using four point correspondences, unlike general well-known metric rectification methods. The method first estimates a projection matrix for a pre-defined world coordinate system on a specific plane regarded as the base plane, and then sequentially estimates the adjacent spatial planes appearing in the input image using our previous method.1 Once the target plane is estimated, the world coordinate system on the base plane is moved onto the estimated target plane and regarded as the new world coordinate system. In this way, the homography matrix for the target plane can be calculated only from the original projection matrix and the transformation parameters (rotation and translation) between the old and new world coordinate systems. Finally, metric rectification is performed using the calculated homography matrix; i.e., the proposed method can obtain 3D information such as lengths and angles on a plane where four point correspondences are not easily determined, using only a single-view image. Experimental results showed the validity and effectiveness of our approach.
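The key observation, that the target plane's homography follows from the projection matrix and the frame transform alone, can be written down directly. For the world plane Z = 0 the homography is simply three columns of the 3 × 4 projection matrix P, and moving the world frame by (R, t) multiplies P by the corresponding 4 × 4 transform (a sketch of the geometry, not the authors' implementation):

```python
import numpy as np

def plane_homography(P):
    """Homography mapping plane coordinates (X, Y, 1) on the world plane
    Z = 0 to image coordinates: the Z column of P drops out."""
    return P[:, [0, 1, 3]]

def moved_plane_homography(P, R, t):
    """Homography for a target plane expressed as a moved world frame:
    the new frame relates to the old by rotation R and translation t, so
    the projection in the new frame is P @ [R | t; 0 0 0 1]."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return plane_homography(P @ T)
```

No point correspondences enter: once P and the frame motion are known, the target plane's homography, and hence metric quantities on it, follow algebraically.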
This paper presents a method for reconstructing texture patterns using deep learning. The proposed method is based on a deep neural network, the pix2pix generative adversarial network (GAN), which learns the conversion process between input and output images. It extends pix2pix by adding constraints to the network to change the underlying image pattern while retaining the input's fine texture. Using texture images with underlying patterns and fine textures as test data, we verified the effectiveness of our modification through several computational experiments. Although the generated images preserve the input color and edge information, they are blurred, and in some cases the input texture information cannot be reproduced. These problems remain to be addressed in future research.
In this paper, we propose convolutional neural networks for semantic segmentation of road markings in the situation where sequential ground-truth segmentation masks are available. The proposed model aggregates temporal information and context information from multiple frames. Moreover, we employ CGNet as the backbone network to reduce the number of trainable parameters and the computation time. In the experiments, we evaluate the model on the Gifu-city Road Marking Segmentation Dataset, which includes road markings on open roads in Gifu city. The results show improved segmentation performance for classes such as white center lines and white dashed lines.
Manga (Japanese comics) are globally popular content. In recent years, sales of e-comics converted from paper-based manga to electronic data have been increasing because of the widespread use of electronic terminals. Against this background, tagging manga images with metadata has been proposed to improve the accessibility of e-comics. To assign metadata more efficiently, technology that automatically extracts elements such as characters and speech is required. One way to classify characters is to extract image features from the characters' faces and cluster them. Previous research has shown that the intermediate output of a CNN fine-tuned on character face images is effective for character face recognition. We previously proposed a clustering method using Density-Based Spatial Clustering of Applications with Noise (DBSCAN) to classify character face images without specifying the number of clusters. However, DBSCAN is greatly affected by its hyperparameters. The purpose of this study is to automatically classify character face images without complicated hyperparameter tuning. We examine the application of Ordering Points To Identify the Clustering Structure (OPTICS) and Hierarchical DBSCAN (HDBSCAN), density-based clustering algorithms that extend DBSCAN: OPTICS finds clusters in spatial data, and HDBSCAN extracts a flat partition from hierarchical cluster data. We also investigate which CNN model is effective as the feature extractor for face images. Experimental results showed that HDBSCAN is effective for character face image clustering.
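For reference, the baseline whose hyperparameter sensitivity motivates the move to OPTICS and HDBSCAN can be written compactly. This is a minimal DBSCAN over feature vectors (toy 2-D points standing in for CNN face features), not the paper's implementation; note how both eps and min_pts must be supplied:

```python
import numpy as np

def dbscan(X, eps=0.5, min_pts=3):
    """Minimal DBSCAN: grow clusters from core points (those with at
    least min_pts neighbors within eps); label -1 marks noise."""
    n = len(X)
    d = np.linalg.norm(X[:, None] - X[None, :], axis=2)   # pairwise distances
    neighbors = [np.where(d[i] <= eps)[0] for i in range(n)]
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(neighbors[i]) < min_pts:
            continue                      # already assigned, or not a core point
        labels[i] = cluster
        stack = list(neighbors[i])
        while stack:                      # expand the density-connected region
            j = stack.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if len(neighbors[j]) >= min_pts:
                    stack.extend(neighbors[j])
        cluster += 1
    return labels
```

HDBSCAN effectively sweeps over all eps values and extracts the most stable clusters from the resulting hierarchy, which is why it removes the distance threshold that makes plain DBSCAN brittle.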
We propose an articulation awareness system with a three-dimensional (3D) tongue using virtual reality (VR). Human speech sounds are produced through a combination of vocal fold vibration (the voice source) and tongue and lip motion (articulation). Articulation is the movement of the jaws, tongue, and lips, and a speaker should pay attention to the importance of these movements. In this study, the tongue shape is visualized to raise awareness of the speech organs. The subjects observed the inside and outside of the mouth as if they were the size of a thumb. Models of the oral area were created from magnetic resonance imaging data collected during vowel production. The subjects reported that they became aware of the articulators after experiencing the 3D tongue in the VR system.
Virtual reality (VR) content has become popular. However, VR can cause motion sickness, and one of its main factors is thought to be the discrepancy between predicted and actual physical experiences. To prevent such a discrepancy, an effective approach is to synchronize the user's movements in the virtual space with the user's somatosensory movements. A VR system providing such a somatosensory interface has been developed, but it is large, expensive, and difficult to operate.
In this paper, we propose a simple somatosensory interface for moving through virtual space by imitating swimming motions. Users of our system move through the virtual space as if swimming in water, using kicks of both feet and strokes of both arms. We developed a sensory interface that measures the user's imitated swimming motions with four gyro sensors. Each is a 9-axis sensor comprising a 3-axis angular velocity sensor, a 3-axis acceleration sensor, and a 3-axis geomagnetic sensor; the four sensors measure the strokes of both arms and the kicks of both feet, respectively.
Five students evaluated our interface by navigating a specified course and then answering questionnaires. The experimental results verified that our interface was useful and effective for moving through the virtual space without causing VR motion sickness.
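The per-limb event detection implied by this setup can be sketched as a threshold detector on the angular-velocity magnitude; the threshold and refractory period below are hypothetical, not values from the paper:

```python
import numpy as np

def count_strokes(gyro, thresh=2.0, refractory=10):
    """Count swim-stroke events in one limb's 3-axis angular-velocity
    stream: an event fires when the magnitude crosses thresh, and a
    refractory period (in samples) suppresses double counting."""
    mag = np.linalg.norm(gyro, axis=1)    # rad/s magnitude per sample
    strokes, last = 0, -refractory
    for i, m in enumerate(mag):
        if m > thresh and i - last >= refractory:
            strokes += 1
            last = i
    return strokes
```

Running one such detector per sensor (both arms, both feet) would yield the stroke and kick events that drive the avatar's motion through the virtual water.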
When a consumer purchases a product packaged in a box or a bag for the first time, it is difficult to estimate the volume of the invisible contents. The purpose of this study is to help the consumer estimate the volume of the contents of a packaged product. We developed augmented reality (AR) advertising that presents a virtual model of the product so that the consumer can perceive its volume. In this paper, we compared the perceived volume of a food product under four conditions: the product alone, AR advertising on a tablet terminal, AR advertising on a head-mounted display (HMD), and point-of-purchase (POP) advertising. We found that presenting the virtual model as AR advertising was effective for estimating the actual volume of the product. In addition, our results suggested that the product's volume was easier to grasp with the HMD than with the tablet terminal, and that AR advertising using the virtual model was useful for increasing purchase intention.
The number of Internet of Things (IoT) devices is increasing rapidly, and users who do not understand the security status of IoT devices increasingly have opportunities to use them. Therefore, this article proposes a visualization system based on mixed reality (MR) technology that helps users understand the security status of the IoT devices around them. The proposed system uses image recognition to obtain the type and location of each device. Based on the device locations and the relevant security information stored in a database, the system overlays on each device a color indicating its security level and messages recommending actions. If the user reacts correctly, the reactions are stored as the action history for the device, and the system updates the device's security status based on this history and the security information in the database. Running on a head-mounted display (HMD), the proposed system expresses the security status of IoT devices more intuitively than conventional visualizations such as text logs or graphs, and it can also express vulnerabilities related to the positional relations of the devices. Such an intuitive, position-aware representation of security is expected to help even non-experts act appropriately.
This article proposes a virtual reality (VR) system for acquiring the knowledge and skills of tuna dismantling. Students have to learn this in a hands-on style as well as through conventional lectures and self-study; however, the high cost of tuna gives them few chances for practical work. Therefore, a VR-based system is proposed as a supplemental tool in which students can study and practice dismantling a tuna in a virtual space. The proposed system consists of a PC, a VR HMD, and controllers for both hands. One controller is treated as a cutting knife, and the other is used for tracking the position of the other hand. The system encourages the user to cut the virtual tuna with the knife controller. From the position and direction of the knife as it cuts in, the system determines the correctness of the cut. If the cut is within the normal range, the system renders the cut and encourages the user to proceed; if not, it warns the user with a message and/or vibration and encourages him/her to cut again. The system also has a "test mode," in which it gives no cautions or hints while still judging the correctness of the user's cuts. The accumulated judgments are converted into a score so that the user can easily understand his/her level of achievement. Thus, the system is expected to be effective as a supplemental tool for continuous study.
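The cut-correctness judgment from knife position and direction might be implemented as a simple tolerance test against a reference cut line; the tolerances below are hypothetical, not values from the paper:

```python
import numpy as np

def cut_is_correct(knife_pos, knife_dir, target_pos, target_dir,
                   pos_tol=0.05, ang_tol_deg=15.0):
    """Judge a virtual cut: the knife must be close enough to the
    reference cut point and aligned (up to sign) with the reference cut
    direction."""
    knife_dir = knife_dir / np.linalg.norm(knife_dir)
    target_dir = target_dir / np.linalg.norm(target_dir)
    pos_ok = np.linalg.norm(knife_pos - target_pos) <= pos_tol
    # abs() makes the test indifferent to which way the blade points along the line
    ang = np.degrees(np.arccos(np.clip(abs(knife_dir @ target_dir), -1, 1)))
    return pos_ok and ang <= ang_tol_deg
```

A passing cut would trigger the rendered cut and "proceed" feedback; a failing one would trigger the warning message or vibration described above.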
Artificial Intelligence and Interdisciplinary Research I
This paper presents a novel deep-learning approach for analyzing fish feeding intensity from images of fish tanks captured during the feeding process. The grade of feeding intensity is an important indicator of fish appetite; in the design of a smart feeding system for aquaculture, this information is of great significance for guiding feeding and optimizing fish production. However, conventional fish appetite assessment methods are inefficient and subjective. To solve these problems, we propose a deep-learning approach for grading fish feeding intensity based on a space-time two-stream 3D CNN. The approach proceeds as follows. First, a fixed RGB camera captures videos of the fish tanks during the feeding process, which also yields a dataset for training the two-stream network. The fish appetite levels are then graded using the trained model. Finally, the performance of the method is evaluated and compared with other CNN-based deep learning approaches. The results show that the grading accuracy reached 91.18%, outperforming the compared CNN-based approaches. Thus, the model can be used to detect and evaluate fish appetite to guide production practices.
In this paper, we propose a novel element-aware domain enhancement and adaptation (EDEA) approach for semantic segmentation to increase segmentation accuracy. In the proposed EDEA approach, we first analyze the warning elements that cause invalid segmented objects in the testing step, such as falling leaves, manhole covers, cirrus clouds, and advertisements. Then, we create a new GTA5-like (Grand Theft Auto V-like) dataset containing scenarios that include these warning elements. Further, we perform domain adaptation on the created GTA5-like dataset to generate a photo-realistic GTA5-like dataset. Finally, we combine the generated dataset with the original photo-realistic GTA5 dataset and the realistic CamVid dataset to constitute a more diverse training dataset. Comprehensive experimental results confirm the semantic segmentation accuracy improvement of the proposed EDEA approach relative to two previous domain adaptation methods.
This study investigates the potential of a radiomic feature-based model for predicting non-small cell lung cancer (NSCLC) recurrence within two years on chest CT images. First, tumor areas are defined as intra-tumoral areas manually segmented by a radiologist, and the largest tumor ROI is selected as the representative cross-section. Second, a total of 68 radiomic features, including intensity, texture, and shape features, are extracted within the tumor area. Then, three features whose weights are clearly distinguished from the others are defined as significant features using the Relief-F algorithm. Finally, to predict lung cancer recurrence within two years, random forests and an SVM are trained to classify the two groups representing recurrence and non-recurrence within two years. In the experiments, the accuracy, sensitivity, specificity, and AUC were 71.42%, 80.95%, 61.90%, and 0.74 for the random forest and 66.66%, 61.90%, 71.42%, and 0.65 for the SVM, so the prediction model constructed with the random forest shows better performance. Kaplan-Meier curves fitted to the separated patient groups show the probability estimated by the radiomic-based prediction model.
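As a hedged illustration of the Relief-style feature weighting mentioned above (not the authors' implementation; the simplified binary-class variant, L1 distance, and iteration count below are illustrative), a feature that separates the two recurrence groups well accumulates a large positive weight:

```python
import numpy as np

def relief_weights(X, y, n_iter=100, rng=None):
    """Simplified binary-class Relief feature weighting (sketch only).

    For a randomly picked sample, find its nearest same-class neighbor
    (hit) and nearest other-class neighbor (miss); features that differ
    more toward the miss than toward the hit gain weight.
    """
    rng = np.random.default_rng(rng)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        i = rng.integers(n)
        dists = np.abs(X - X[i]).sum(axis=1)   # L1 distance to sample i
        dists[i] = np.inf                      # exclude the sample itself
        same = (y == y[i])
        same[i] = False
        hit = np.argmin(np.where(same, dists, np.inf))
        miss = np.argmin(np.where(~same, dists, np.inf))
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n_iter
```

Picking the few features whose weights stand clearly above the rest then mirrors the selection step described in the abstract.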
We generate super-resolution orbital bone CT images using a 3D SRGAN network to improve the segmentation accuracy of thin bones in volumes with a large slice thickness. Our method consists of data preparation and super-resolution thin-bone image generation. Experimental results show that the thin-bone areas are clearly observed in the super-resolution images generated by the 3D SRGAN, and that these images are more similar to the original high-resolution images than those produced by bilinear and bicubic interpolation.
Equipment used in a crime provides clues for solving cases. Cars are often used in crimes for transportation. Security cameras are installed throughout downtown areas, and in Japan drive recorders are installed in many cars. Both can record an image of a license plate; however, the license plate often occupies only a small part of the image, which must be enlarged to be read. The enlarged image is low-resolution and blurry, so its edges are not clear, and reading the license plate from such an image is difficult. Machine learning has been used to identify blurry letters in an image, and the performance of machine-learning methods depends on the classifier. In this paper, the performance of five classifiers for identifying license plates is compared. The proposed research can potentially contribute to solving criminal cases.
Total-variation-based techniques for JPEG artifact removal have been proposed in the literature; they first formulate artifact removal as a constrained minimization problem and then solve it with a convex optimization method. However, these techniques often incur a large computational cost due to the large number of iteration steps. In this paper, we therefore propose a fast solution technique based on another convex optimization method, the accelerated alternating direction method of multipliers. The experimental results demonstrate that the proposed solution technique is advantageous over the conventional ones in terms of computational cost.
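To make the TV-minimization setting concrete, here is a minimal, plain (non-accelerated) ADMM solver for the 1D total-variation denoising problem minimize 0.5*||x - b||^2 + lam*||Dx||_1; the paper's 2D, JPEG-constrained formulation and its accelerated variant are considerably more involved, and the parameters here are illustrative:

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding operator, the proximal map of the L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def tv_denoise_admm(b, lam=0.5, rho=1.0, n_iter=200):
    """Plain ADMM for 1D TV denoising (illustrative sketch).

    Splitting z = D x, ADMM alternates a quadratic x-update,
    a soft-threshold z-update, and a dual (u) update.
    """
    n = len(b)
    D = np.diff(np.eye(n), axis=0)        # first-difference operator, (n-1, n)
    A = np.eye(n) + rho * D.T @ D         # system matrix of the x-update
    x = b.copy()
    z = np.zeros(n - 1)
    u = np.zeros(n - 1)
    for _ in range(n_iter):
        x = np.linalg.solve(A, b + rho * D.T @ (z - u))
        z = soft(D @ x + u, lam / rho)
        u = u + D @ x - z
    return x
```

The acceleration the paper proposes targets exactly the iteration count of loops like this one.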
In video coding algorithms, the transform is one of the most important compression tools. A transform converts residual signals into frequency-domain data to obtain decorrelated signals, and the decorrelated data are used to improve coding efficiency in video compression. In this paper, an efficient transform selection method for video compression is described, based on the transform block size and intra coding mode, on top of the HEVC standard.
In recent years, a computer-aided diagnostic (CAD) system for endoscopic stomach images has been required to reduce the diagnostic burden of stomach screening. In our previous study, an automated polyp detection method for endoscopic images using the SSD (Single Shot MultiBox Detector) was developed, with a detection rate of 93.7%. However, the detection target of that method was limited to fundic gland polyps. In this paper, we propose a method for the automated detection and classification of two different types of polyp, fundic gland polyps (FGP) and hyperplastic polyps (HP), from endoscopic images using the SSD. In the experiment, 71 and 96 practical endoscopic images of FGP and HP, respectively, were used. For training the SSD, 11210 and 5053 training images of FGP and HP were generated by data augmentation, respectively, and 20% of the training images were automatically selected and used as verification images. For test samples including 132 polyps (69 FGPs and 63 HPs), the detection rate for all polyps was 96.2% (127/132), and the classification rate for the two types of polyp was 88.6% (117/132). Only one false positive occurred throughout the experiment.
In this paper, we propose a method for automatically determining the motion parameters of robots to execute target tasks such as “scooping powdered tea and putting it into a teacup”. For robots to handle everyday objects, it is necessary to determine the motion parameters of robots for handling objects of various shapes and sizes. There are two methods for determining motion parameters. One involves using a 3D model of an object and the other does not involve using such a model. The latter method is effective in places where there are a wide variety of everyday objects such as in homes. However, it is assumed with this method that an object is placed face up. Therefore, this method cannot be used when an object is placed face down. We propose a method for determining motion parameters for handling changes in the shapes, sizes and poses (face up or face down) of objects. Our method uses a 3D deep neural network to recognize an object’s functions (e.g., “scoop” and “grasp”) and recognizes the poses of an object from the function information. Motion parameters are then determined based on the recognition results. We conducted an experiment to evaluate the performance of the method by testing it on five spoons of different shapes, sizes, and poses. The method had a success rate of approximately 86%.
We previously proposed a lossless video coding method based on intra/inter-frame example search and probability model optimization. In this method, several examples, i.e., sets of pels whose neighborhoods are similar to the local texture of the target pel to be encoded, are searched from already encoded areas of the current and previous frames with integer-pel accuracy. The probability distribution of the image value at the target pel is then modeled as a weighted sum of Gaussian functions whose peak positions are given by the individual examples. Furthermore, the model parameters that control the shapes of the Gaussian functions are numerically optimized so that the resulting coding rate is minimized. In this paper, the above example search process is enhanced to allow fractional-pel positions for more accurate probability modeling.
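The probability model described here (a weighted sum of Gaussians peaked at the example values) can be sketched as follows; the weights and sigma are illustrative, whereas the paper optimizes such shape parameters numerically to minimize the coding rate:

```python
import numpy as np

def example_pmf(examples, weights, sigma, levels=256):
    """Probability mass over pixel values, modeled as a weighted sum of
    Gaussians centered at the example values (sketch of the model)."""
    v = np.arange(levels)[:, None]                      # (levels, 1)
    g = np.exp(-0.5 * ((v - np.asarray(examples)) / sigma) ** 2)
    p = g @ np.asarray(weights, dtype=float)            # mix the Gaussians
    return p / p.sum()                                  # valid distribution

def code_length(p, value):
    """Ideal code length in bits of a pixel value under the model;
    minimizing its sum over pels is the paper's optimization target."""
    return -np.log2(p[value])
```

A pixel value near the examples gets a short code, which is why good example search directly lowers the coding rate.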
High-quality depth estimation from light field (LF) images is an important and challenging task for which many algorithms have been developed. While compression is inevitably required in practice for LF data due to its huge volume, most depth estimation methods have not yet paid sufficient attention to the effect of compression. In this paper, we investigate various LF depth estimation methods in order to design an LF compression method that supports good depth estimation. Noting that building the data cost is the very first step in most depth estimation algorithms and that the data cost computation has a great impact on the eventual quality of the depth image, we present an in-depth analysis of data cost computation for LF depth estimation in the context of compression. Our results show that building the data cost on the Epipolar Plane Image (EPI) outperforms the other methods tested in this paper and is more robust to compression.
This paper describes an efficient lossless coding method for HDR color images stored in a floating-point format called radiance RGBE. In this method, the three mantissa parts of the RGB components, as well as a common exponent part, each represented in 8-bit depth, are encoded by a block-adaptive prediction technique. To improve the prediction accuracy, the mantissa parts of the RGB components used in the prediction are adjusted so that their exponent parts can be regarded as the same. Moreover, not only the same color component but also other, already encoded color components are used in the prediction to exploit inter-color correlations. Simulation results indicate that introducing the above exponent equalization as well as inter-color prediction considerably improves the coding efficiency.
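For readers unfamiliar with the radiance RGBE layout (an 8-bit mantissa per RGB channel plus one shared 8-bit exponent), a minimal encode/decode sketch follows; this is one common variant, and real implementations differ in rounding details:

```python
import math

def rgbe_encode(r, g, b):
    """Pack linear RGB into 4 bytes: three mantissas and a shared
    exponent biased by 128 (common variant; rounding is simplified)."""
    m = max(r, g, b)
    if m < 1e-32:
        return (0, 0, 0, 0)
    frac, exp = math.frexp(m)          # m = frac * 2**exp, frac in [0.5, 1)
    scale = frac * 256.0 / m           # maps the largest channel near 255
    return (int(r * scale), int(g * scale), int(b * scale), exp + 128)

def rgbe_decode(rm, gm, bm, e):
    """Unpack 4 RGBE bytes back to linear RGB."""
    if e == 0:
        return (0.0, 0.0, 0.0)
    f = math.ldexp(1.0, e - 136)       # 2**(e - 128) / 256
    return (rm * f, gm * f, bm * f)
```

The exponent equalization mentioned in the abstract amounts to rescaling mantissas of neighboring pixels so they share a common exponent before prediction.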
Seam carving and its variants are popular content-aware image resizing methods. However, they often suffer from the problem that excessive downscaling causes perceptually annoying distortions, mainly because penetration of the seams into important objects becomes unavoidable at the latter stage of the processing. As a solution, we previously proposed a nonlinear downscaling technique that iteratively applied a DCT-based locally linear scaling operator within ‘belt-like seams’, i.e., seams with a certain width. To enhance this idea, in this paper we replace the latter processing stage with a global linear scaling operator. The transition point between the nonlinear and linear processing stages is automatically determined based on a preservation measurement for the important objects. Simulation results show that our approach produces subjectively better results than conventional nonlinear downscaling methods.
This paper presents a new method for action recognition using an extremely low-resolution infrared imaging sensor. Thermopile arrays preserve users' privacy, but this comes at the price of limited captured information, and the question of which methods are applicable to this sensor remains open. In our work, we adopt a two-stream deep learning architecture that accepts both spatial and temporal sequences, processes them separately with CNN and stacked GRU layers, and finally fuses the features for action classification. To the best of our knowledge, this is the first optical-flow-based method used in combination with extremely low-resolution thermal image sequences. We use a dataset of 16 × 16 pixel image sequences introduced by a related work to directly compare the results and demonstrate the superiority of our method. Experiments show that we achieve a gain of nearly 6% (96.98% vs. 91.07%) in recognition accuracy in a 5-class classification setup.
Traditionally, RGB rendering, which calculates the intensity of light in only three components, has often been used to generate photorealistic images in a global illumination environment, but it cannot render wavelength-dependent phenomena accurately. Spectral rendering, on the other hand, generates photorealistic images in various kinds of scenes that include wavelength-dependent phenomena such as interference and fluorescence. However, it is computationally expensive compared to RGB rendering, especially in a global illumination environment, because the spectral intensity of light over the visible range must be calculated each time light rays collide with scene objects in the ray-tracing process. To reduce the computational cost of spectral rendering, we introduce image-based lighting (IBL), in which target objects are rendered without numerous iterations of ray bouncing among scene objects by using a light probe image as ambient light. We extend IBL to spectral IBL in order to combine it with spectral rendering; that is, a spectral image that includes the spectral intensity of ultraviolet in addition to visible light is used as the light probe image, and the spectral intensity of light is calculated to render the target objects. The proposed method is able to render wavelength-dependent phenomena realistically in a shorter time, because the number of intersections between rays and objects is much smaller than without IBL. We implemented the proposed method in a PBRT renderer and rendered scenes including fluorescence effects to demonstrate its usefulness.
Few moving-object detection techniques deal with severely distorted video imagery, such as video taken from above a wavy water surface. In this paper, a method is proposed that identifies image frames containing a moving object in a video taken from above a wavy water surface. Considering the difficulty of applying common video processing techniques to a video suffering from such severe distortion, the proposed method utilizes dynamic mode decomposition, a data-driven method for the analysis of dynamical systems, to develop an algorithm that extracts information about a moving object from a video stream. The experimental evaluation shows that the proposed method is able to identify image frames containing a moving object in a severely distorted video stream.
The purpose of this research is to propose a method that provides visually impaired users with environmental understanding via modality conversion from visual distance to haptic vibration information. According to studies on ecological perception, optical flow, which represents dynamic visual variation, plays a significant role in environmental understanding for humans. We have developed a head-mounted wearable device equipped with a 2-dimensional distance sensor and five vibro-motors arranged around the head. The vibration magnitude of the vibro-motors is determined not by the static distance to an obstacle but by the dynamic distance variation caused by the user's movement; the vibro-stimuli are generated based on optical flow characteristics. Thus, if the user moves toward an obstacle, the user feels stronger vibration; in contrast, if the user pauses, the user feels no vibration, since the distance to obstacles remains constant. To evaluate the basic performance of the proposed method, we asked five blindfolded subjects to walk toward a wall 50 times each. The first 10 trials by each subject were treated as a practice phase, and the remaining 40 trials were evaluated. Of the 200 trials in total, 97.5% were successful: almost all subjects were able to perceive the wall and stopped without colliding with it. The experiment demonstrated the basic validity of our proposed method. The main contribution of this paper is utilizing dynamic distance variation to determine vibration magnitude and providing the user with vibro-stimuli derived from optical flow to support the user's localization in the environment.
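The central design choice, driving vibration by distance variation rather than by static distance, can be captured in a few lines; the gain and clipping values are hypothetical, not the authors' parameters:

```python
def vibration_magnitude(prev_dist, cur_dist, dt, gain=1.0, vmax=1.0):
    """Vibration level in [0, vmax] driven by the approach speed toward
    an obstacle; zero when the user pauses or moves away (sketch with
    hypothetical gain and clipping, not the paper's exact mapping)."""
    approach = (prev_dist - cur_dist) / dt      # positive when approaching
    return min(vmax, gain * max(0.0, approach))
```

Under this mapping a stationary user feels nothing even next to a wall, which matches the behavior described in the abstract.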
We introduce ElectroMagnetic Guitar (EMG), a new guitar interface that supports chord playing, mainly for beginner guitar players. Right-handed players finger chords with their left hand on the fretboard and pick the strings with their right hand; our guitar guides the player's left-hand fingering by magnetic force. To realize this concept, we built an original fretboard with controllable electromagnets and attached it to a normal classical guitar. EMG ‘draws’ the player's left-hand fingers, which have magnetic material attached, toward the proper chord positions on the fretboard. In an informal user study, our EMG prototype provided sufficient magnetic force to hold a chord form, though not enough to properly draw the fingers into position.
In this research, we aim to improve the accuracy of wide-area projection using multiple projectors and to speed up the measurement needed for projection in environments surrounded by walls, such as indoor spaces. In previous work, we realized a method for easily projecting large-scale images by projecting a pattern onto the indoor space and measuring it with a combination of a normal lens and a fish-eye lens. However, when patterns are projected in an indoor space surrounded by walls, there are areas where measurement cannot be performed correctly because of indirectly reflected light, which distorts the projection result. We therefore reduce the influence of indirect reflection by dividing the projected pattern light. In addition, because indirect reflection loses its high-frequency components in the reflection process, it can be greatly reduced by a simple difference process that exploits the similarity of the indirect reflections of a sufficiently divided pattern image. On the other hand, dividing the pattern image increases the time required to capture the shape of the space. Therefore, the divided patterns are designed to minimize their number by using color information and an efficient division method while still removing indirect reflection effectively.
When a blind person starts to cross a crosswalk, he or she may need to track moving objects such as pedestrians and pets in order to avoid collisions. This paper proposes a method of moving-obstacle tracking on a crosswalk for a blind-person navigation system. In the method, the borders of the crosswalk in front of the blind person are first determined using the straight white lines in an intensity image, while a depth image is simultaneously used to detect candidate moving obstacles from differences among neighboring frames. The moving objects are then determined by the height of the candidates, and the positions of the moving objects, which are assumed to be pedestrians and pets, are obtained from the depth image. This position information from previous image frames is finally used to estimate a movement vector based on the moving-object trajectory, and rectangular windows are employed to track the moving objects in the next frame. To evaluate the performance of the proposed method, experiments with pedestrians at a crosswalk were performed, and the results showed the effectiveness of the proposed method.
In this paper, we propose a novel privacy-preserving machine learning scheme with encrypted images, called EtC (Encryption-then-Compression) images. The use of machine learning algorithms in cloud environments has been spreading in many fields. However, it raises serious issues for end users, due to semi-trusted cloud providers. Accordingly, we propose using EtC images, which were originally proposed for EtC systems with JPEG compression. In this paper, a novel property of EtC images is considered under z-score normalization. It is demonstrated that the use of EtC images allows us not only to protect the visual information of images, but also to preserve both the Euclidean distance and the inner product between vectors. In addition, dimensionality reduction is shown to be applicable to EtC images for fast and accurate matching. In an experiment, the proposed scheme is applied to a facial recognition algorithm to confirm its effectiveness with a support vector machine (SVM) using the kernel trick.
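The distance-preservation property can be illustrated with a toy scrambling, a keyed permutation plus sign flips; this is an orthogonal map loosely analogous to, but not, the actual EtC block operations:

```python
import numpy as np

def toy_scramble(x, key):
    """Toy stand-in for EtC-style scrambling: a keyed permutation plus
    sign flips. Being orthogonal, it hides the signal's appearance while
    preserving Euclidean distances and inner products."""
    rng = np.random.default_rng(key)
    perm = rng.permutation(x.size)
    signs = rng.choice([-1.0, 1.0], size=x.size)
    return signs * x[perm]
```

Because kernel machines such as an SVM with an RBF or linear kernel only consume distances and inner products, they behave identically on vectors transformed this way, which is the intuition behind training on encrypted images.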
We propose a scalable AR (Augmented Reality) multiplayer robotic platform that enables multiple players to control different machines (a drone and a robot) in shared virtual and real environments. We use state-of-the-art visual SLAM (Simultaneous Localization and Mapping) algorithms to track machine poses based on camera and IMU (Inertial Measurement Unit) inputs. Players observe consistent AR objects thanks to our backend system, which synchronizes the AR objects between players. Moreover, the system is scalable in terms of hardware (e.g., IMU, camera, machine type) and software (SLAM algorithm), as we use ROS for communication between modules. We demonstrate our system with a game developed in Unity, a robust and widely used game engine, and present statistics of the game such as its frames-per-second performance.
The people of Indonesia are experiencing a serious cultural crisis: the younger generation prefers foreign cultures to local ones, so local cultures are fading away. The aim of this research is to develop a portal about Indonesian culture based on crowdsourced content to raise the recognition of local cultures. This research seeks to change the perception of local cultures from old-fashioned to contemporary by combining them with information and communication technology, which is currently used by people from all walks of life. The method used in this study is crowdsourcing, because filling in the content requires the help of all members of the communities; the managed data will eventually have a large volume, so cloud computing is also needed. The website portal was successfully created, and the next challenge is to expand the website content with videos, photographs, and narration.
In the field of outdoor education, learning evaluation is a challenge, as the conventional method of subjective evaluation can be inaccurate. This study attempts to overcome this challenge by evaluating learning using motion sensors. Considering that the moments when a learner moves his or her head during a lecture carry important information related to concentration, we measure the learner's concentration using the angular velocity of his or her head. However, from the information obtained from only one learner, it is not clear whether the cause of a movement is actually a change in concentration level. Therefore, we propose a method for distinguishing the concentration of learners by comparing head movements across the entire learning group. To confirm the effectiveness of the proposed method, we measured the angular velocity of the learners' heads in outdoor education by attaching angular velocity sensors to them. When the lecture induced a gaze direction, the head-movement timing of many learners coincided shortly after the gaze induction. In addition, we found that learners with higher levels of concentration did not move their heads except at the times of gaze induction.
Pitching with the correct form is essential for preventing injury and improving skill, yet it is not easy for athletes and instructors to check whether a pitcher is throwing with the correct form. In this study, we record a pitcher from the catcher's direction with a monocular camera and estimate the pitcher's skeletal pose using OpenPose. We propose a new method to evaluate whether the pitcher pitches with the correct form by examining the estimated pose. We use the SSE (Shoulder, Shoulder, Elbow) line as an evaluation index: when the pitcher's upper body faces the batter, the SSE-line should be straight. To find the right frame, at which the pitcher's body turns squarely to the batter, we use the distance between the shoulders in each video frame; when this distance becomes largest, the shape of the SSE-line is measured. Since the pitcher's motion is fast, we use a 240 fps camera to investigate the relationship between the shape of the SSE-line and the shoulder distance. We evaluated this relationship with the 240 fps camera and discussed the pitchers' pitching properties based on the evaluation.
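As a minimal sketch of the geometric idea above (not the paper's implementation): the squared-up frame is the one maximizing the shoulder distance, and SSE-line straightness can be scored as the elbow's perpendicular distance from the shoulder-shoulder line. The keypoint format, as (x, y) tuples, is an assumption.

```python
import math

def shoulder_distance(left, right):
    """Euclidean distance between the two shoulder keypoints (x, y)."""
    return math.hypot(left[0] - right[0], left[1] - right[1])

def squared_up_frame(frames):
    """Index of the frame with the largest shoulder distance, i.e. where
    the pitcher's upper body faces the batter most squarely."""
    return max(range(len(frames)),
               key=lambda i: shoulder_distance(*frames[i]))

def sse_line_deviation(shoulder_l, shoulder_r, elbow):
    """Perpendicular distance of the elbow from the shoulder-shoulder
    line; zero when the SSE-line is perfectly straight."""
    (x1, y1), (x2, y2), (x0, y0) = shoulder_l, shoulder_r, elbow
    num = abs((y2 - y1) * x0 - (x2 - x1) * y0 + x2 * y1 - y2 * x1)
    return num / math.hypot(x2 - x1, y2 - y1)
```

At 240 fps, consecutive frames differ only slightly, so selecting the single maximum-distance frame is a reasonable proxy for the squared-up moment.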
Light field (LF) image/video data provide both spatial and angular information of a scene, but at the cost of a tremendous data volume for storage and transmission. At the moment, MPEG Multi-view Video Coding (MVC) is one of the promising compression solutions for LF video data, so better prediction structures that effectively reduce the redundancy in LF video data deserve investigation. Several prediction structures have been investigated, but only with limited experimental evaluations, owing to the lack of datasets and non-identical testing configurations. This practical problem can now be mitigated by the availability of new datasets and the common test condition recently proposed by MPEG. As a first step toward designing a good compression method for LF video data, in this paper we evaluate the performance of existing prediction structures for MVC-based LF video coding, following the MPEG common test condition and its dataset.
Cleaning is an inseparable part of life, but it is impossible to see with the naked eye which parts of a room have actually been cleaned. For this reason, if the locations that have already been cleaned cannot be shared when multiple people clean together, some areas may remain uncleaned. If Augmented Reality (AR) can be used to visualize the area passed over by a hand or cleaning tool, cleaning efficiency will improve and motivation will increase. The purpose of this research is to obtain and superimpose the location information of the passed-over area using Simultaneous Localization and Mapping (SLAM), in order to visualize with AR the area passed over by the hand or the cleaning tool.
The availability of pedestrian location estimation is one of the critical issues in realizing a reliable navigation system for pedestrians in daily scenes. We propose a new pedestrian location estimation system that combines the image-retrieval approach we have developed with a SLAM (Simultaneous Localization and Mapping) approach. Both approaches need only a single camera unit as a sensor, and in both, the location is estimated by computer vision technology. The problem is that a high processing cost is required to operate the two approaches simultaneously; it could be impractical to run both on a single wearable computing unit. We solve this problem by executing the two approaches on two separate computers connected over a network. We have implemented a preliminary system that unites the two approaches in a hybrid fashion over two computers, and we measured its performance in typical daily scenes on our campus. The results are promising for further implementation.
In the High Efficiency Video Coding (HEVC) standard, the best intra prediction mode is decided by choosing the smallest rate-distortion cost of actual encoding among a total of 35 modes, with the MPM (Most Probable Mode) scheme compressing the mode encoding by reference to the adjacent reference blocks of the current prediction unit. This causes heavy computational complexity. In this paper, a deep neural network is conceived and evaluated as a possible module for the intra prediction mode decision process inside the HEVC encoding scheme. The neural network is trained and tested with a ground-truth dataset constructed from actual HEVC intra encoding of original images. Performance is measured as accuracy: the percentage of modes output by the designed neural network that match the ground-truth mode. The experimental results show that the neural network does not achieve good accuracy on the exact mode; however, accuracy increases when a similar angular mode is also counted as correct. The special modes DC and Planar in the MPM scheme are also analyzed in this paper.
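The two accuracy notions discussed above can be sketched as follows. This is an illustrative reconstruction, not the paper's evaluation code; the convention that modes 0 and 1 are Planar/DC and modes 2-34 are angular follows the HEVC standard, while the tolerance value is a hypothetical parameter.

```python
def exact_accuracy(pred, truth):
    """Fraction of predictions that match the ground-truth mode exactly."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

def relaxed_accuracy(pred, truth, tolerance=1):
    """Also count angular predictions within `tolerance` modes of the
    truth as correct; Planar/DC (modes 0 and 1) must match exactly."""
    hits = 0
    for p, t in zip(pred, truth):
        if t < 2 or p < 2:
            hits += p == t          # non-angular: exact match only
        else:
            hits += abs(p - t) <= tolerance
    return hits / len(truth)
```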
It is very important to assess buildings that have been subjected to earthquakes to determine their safety. In some regions, an emergency safety evaluation must be conducted within 24 hours of a huge earthquake. Some structural health monitoring systems enable rapid evaluation; however, they generally require many vibration sensors. Our research group has studied a video-based micro-vibration measurement system that can evaluate the safety of buildings without vibration sensors. We propose a method of estimating camera fluctuation for this video-based micro-vibration measurement system. The proposed method estimates the camera fluctuation as a global movement across the entire image: it finds the group of pixels exhibiting the mode of spatial motion using the time difference of the spatial phase, and the time-variant signals of those mode pixels are taken as the camera fluctuation. We found that the proposed method can estimate the camera vibration frequency even under conditions where multiple objects exist within the angle of view.
Techniques for dealing with point clouds can be applied in a wide range of fields. In archaeological and historical research, some of these methods are very useful for analyzing the three-dimensional features of point clouds obtained by 3D measurement of artifacts excavated from ruins. Jomon potteries have various shapes depending on their area and age of production, and investigating their characteristics requires methods for analyzing the rim parts and surface patterns. Rubbing, in which patterns are copied by applying ink to paper placed on the surface of the pottery, is the popular method for copying the patterns of pottery surfaces, but it is manual. If the surface patterns of Jomon potteries can instead be extracted as digital 3D point clouds, the risk of breaking or soiling the potteries can be reduced. In addition, 3D coordinate point clouds have the advantage that they can be segmented so that only the necessary parts are analyzed. Photogrammetry can obtain a 3D coordinate point cloud without contact with the object: it computes a three-dimensional point set, along with the camera parameters, from many photographs of the object taken from various angles. When earthenware is photographed, not only its outer surface but also its inside and rim are converted into 3D data at the same time, so introducing photogrammetry avoids the manual creation and adjustment of 3D models of the earthenware. However, to use the measured point cloud in later processes such as pattern extraction, it is necessary to segment it into the outer surface, the rim edge, and the inside. In this paper, we propose a method for segmenting an earthenware point cloud obtained by photogrammetry into outer and inner parts, and as an application, we examine a method for extracting the surface pattern from the segmented outer point cloud.
In background subtraction, principal component analysis (PCA)-based algorithms have shown a remarkable ability to decompose foreground and background in video acquired by a static camera. The algorithm based on the closed-form solution of the L1-norm Tucker-2 decomposition is one of the real-time background subtraction algorithms. The closed-form solution is obtained as a linear combination of video frame vectors with a coefficient vector composed only of +1 and -1 entries. However, since the optimal coefficient vector is unknown, the method becomes a complicated combinatorial optimization problem when the number of input frames is large. In this paper, to solve this problem, we propose a background subtraction method based on Bayesian optimization (BayesOpt), a black-box, derivative-free global optimization technique. The method finds the optimal coefficient combination without evaluating the linear combination for all possible coefficient combinations, using a Bayesian statistical model and the Expected Improvement (EI) acquisition function. The Bayesian statistical model measures the uncertainty at unsampled coefficient combinations, and the EI function is a surrogate that indicates the next coefficient combination to sample. The experimental results confirm the efficiency of the proposed method.
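For readers unfamiliar with the EI acquisition function mentioned above, a standard closed-form version (for maximization, under a Gaussian posterior with mean mu and standard deviation sigma) can be sketched as follows. This is the generic textbook formula, not the paper's specific implementation.

```python
import math

def norm_pdf(z):
    """Standard normal probability density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal cumulative distribution via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, best, xi=0.0):
    """EI of sampling a point with posterior (mu, sigma) over the
    incumbent value `best`; xi is an optional exploration margin."""
    if sigma == 0.0:
        return max(mu - best - xi, 0.0)
    z = (mu - best - xi) / sigma
    return (mu - best - xi) * norm_cdf(z) + sigma * norm_pdf(z)
```

At each BayesOpt iteration, the candidate coefficient combination maximizing EI is evaluated next, which is how the method avoids enumerating every +1/-1 combination.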
This study proposes a method for generating a 3D model of furniture from 3D point cloud data of a room captured by an RGB-D camera, in order to realize layout simulation of a real room with its furniture.
As an important subtask of video restoration, video super-resolution has attracted much attention in the community, as it can ultimately benefit a wide range of technologies, e.g., video transmission systems. A recent video super-resolution model1 achieves cutting-edge performance: it efficiently uses a recurrent neural network architecture to gradually aggregate details from previous frames. Nevertheless, this method has a serious drawback: it is sensitive to occlusion, blur, and large motion changes, since it takes only the previously generated output as the recurrent input to the super-resolution model. This leads to undesirable, rapid information loss during the recurrent generation process, and performance therefore decreases dramatically. Our work focuses on addressing this rapid information loss in recurrent video super-resolution models. By producing attention maps through a selective fusion module, the recurrent model can adaptively aggregate the necessary details across all previously generated high-resolution (HR) frames according to their informativeness. The proposed method preserves high-frequency details collected progressively from each frame while removing noisy artifacts, which significantly improves the average quality of the super-resolved video.
We propose a visualization method that combines ocean satellite images and fishery data to understand the sea conditions under which the catch amount is high. We focus on dated ocean satellite images and fishery data given as a list of triplets (date, catch amount, ocean satellite image), with the sea off Iwate prefecture in Japan as our target. Our method employs an autoencoder to calculate similarity distances between images. An autoencoder is a pair of an encoder and a decoder: the encoder transforms high-dimensional input data into a low-dimensional feature vector, and the decoder recovers the data from the feature vector. We train a convolutional neural network model with our ocean satellite images as both input and output, so that a feature vector can be computed for each image. After training the autoencoder, images are grouped by their features. First, each image is labeled "Positive", "Negative", or "None" based on the catch amount. For each positive image, we find its ten nearest neighbors in the feature vector space. If the number of positive images in the group is greater than or equal to a threshold α (α = 6 in this paper), we judge that the group expresses sea conditions with a high catch amount. The number of neighbors and the threshold α were selected by trial and error. Our results show that each extracted image group has high internal similarity and that different groups are visually distinctive. We expect these results to be helpful for examining sea conditions in which the catch amount will be high.
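The grouping rule above can be sketched directly; the feature vectors here stand in for the autoencoder embeddings, and the function names are hypothetical.

```python
import math

def nearest_neighbors(idx, feats, k=10):
    """Indices of the k nearest feature vectors to feats[idx]."""
    others = [j for j in range(len(feats)) if j != idx]
    others.sort(key=lambda j: math.dist(feats[idx], feats[j]))
    return others[:k]

def high_catch_groups(feats, labels, k=10, alpha=6):
    """For each Positive image, form a group with its k nearest
    neighbours; keep the group if at least alpha neighbours are
    also Positive."""
    groups = []
    for i, lab in enumerate(labels):
        if lab != "Positive":
            continue
        nn = nearest_neighbors(i, feats, k)
        if sum(labels[j] == "Positive" for j in nn) >= alpha:
            groups.append([i] + nn)
    return groups
```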
This paper proposes a novel approach to extract distinctive keywords from historical newspaper images without using character recognition. We converted an image of the text block on an entire newspaper page into a sequence of codes based on discretization of the feature vectors, an approach that eliminated the errors introduced by optical character recognition (OCR). This conversion makes it possible to analyze untranscribed newspaper images by using text-processing methods. We examined the daily occurrence of every tri-gram string, and extracted strings with a dense appearance as distinctive keywords. In addition, we highlighted articles that contain distinctive keywords as distinctive articles. The proposed method was evaluated on an archive of Japanese newspaper images published in the 19th century, and the results were promising.
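The tri-gram step described above can be sketched in a few lines; this is an illustrative reconstruction with hypothetical names, where each day's text block is assumed to already be a sequence of discrete codes.

```python
from collections import Counter

def trigrams(codes):
    """All consecutive length-3 windows over a code sequence."""
    return [tuple(codes[i:i + 3]) for i in range(len(codes) - 2)]

def distinctive_trigrams(daily_codes, threshold):
    """Tri-grams whose total occurrence across the given days reaches
    the threshold are flagged as distinctive keywords."""
    counts = Counter()
    for codes in daily_codes:
        counts.update(trigrams(codes))
    return {g for g, c in counts.items() if c >= threshold}
```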
Evaluation of an outdoor education program is generally based on questionnaires or evaluator observations, which require considerable manpower and time. Assuming that learners' gaze information, such as the object of attention and fixation time, can be used for educational evaluation, we propose a method that detects each learner's head orientation during a learning activity, for use as an input parameter of a gaze estimation system. This is achieved by sequentially calculating the position and posture of a small camera mounted on each learner's head using a Structure from Motion (SfM) system. In addition, to match the features of key points extracted from stationary objects, the method requires background images. The experimental results are obtained from a video recording of the actual field, part of which is used to extract 16 input images. Furthermore, 781 background images of the stationary objects in the educational field were captured after the learning activity was completed. We observed that, among the 16 head-mounted camera images extracted every second, nine showed errors of less than 7.13 degrees.
In this study, we propose a method of controlling pseudo-force intensity (PFI) in pseudo-force devices using a software signal. A pseudo-force is a virtual tractive force perceived by holding an asymmetrically accelerating vibrator. Asymmetrically accelerating vibrators have been reported to vary PFI when the driving voltage is altered, and with our method the voltage can be altered directly with digital signals. We altered the voltage using the pulse width modulation (PWM) control method to regulate the PFI variations; the PWM method controls the output voltage by switching the input voltage over constant periods. Here, we used the pulsed signal to represent the PFI and change the voltage. PFI was evaluated as the tractive force on experimental weights, with the vibrator and weights connected by a wire. Subjects held the vibrator, experienced its pseudo-force, and recorded their experiences. The duty ratios of the pulse signal were varied to alter the voltage. The PFI increased with an increase in the HFP duty ratio, so PWM can effectively control PFI through the alteration of the voltage.
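The PWM relationship underlying the experiment can be stated explicitly: the effective (time-averaged) voltage over one period is the supply voltage scaled by the duty ratio. This is the standard PWM relation, sketched with hypothetical function names, not the authors' driver code.

```python
def duty_ratio(on_time, period):
    """Duty ratio: on-time divided by the PWM period (same time unit)."""
    return on_time / period

def pwm_average_voltage(v_supply, duty):
    """Average output voltage for a given supply voltage and duty ratio
    in [0, 1]; raising the duty ratio raises the effective voltage."""
    if not 0.0 <= duty <= 1.0:
        raise ValueError("duty ratio must be between 0 and 1")
    return v_supply * duty
```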
In recent years, virtual reality (VR) techniques have been introduced for learning local history, because VR can reproduce old buildings and terrain at low cost, and many people can experience local history through it. One of the problems in creating VR of historical cityscapes is the difficulty of preparing 3D landscape models. This paper therefore proposes a method for creating 3D terrain models from handwritten maps that contain no numerical terrain information, such as height or gradient, for reproducing VR content of a historical cityscape. First, a handwritten map is converted into an image. Second, topographic features such as mountain heights and gradients are estimated as numerical terrain data. Finally, following these steps, a 3D model for VR content is created using the contours and the estimated topographic features. The 3D model created by the proposed method is shown as a result.
In this paper, we propose a method for estimating the relative height of nonadjacent pieces of earthenware. To estimate the height of a piece of earthenware without finding its adjacent pieces, a point cloud of the piece's cross-section is obtained, and the obtained point cloud is approximated by an elliptic curve. The ellipse approximated from the cross-section of one piece is compared with the ellipses approximated from the cross-sections of another piece, and the relative height of the pieces is estimated from the positions of the most similar ellipses.
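The matching step can be illustrated with a minimal sketch under simplifying assumptions: each cross-section ellipse is summarized here only by its height on the piece and its semi-axis lengths, and the similarity measure is a hypothetical stand-in for whatever comparison the authors use.

```python
def ellipse_distance(e1, e2):
    """Dissimilarity of two ellipses (height, semi_major, semi_minor)
    by their semi-axis lengths only; the heights are not compared."""
    return abs(e1[1] - e2[1]) + abs(e1[2] - e2[2])

def relative_height(ellipses_a, ellipses_b):
    """Height offset of piece A's best-matching cross-section over
    piece B's, read off from the most similar ellipse pair."""
    best = min(((ea, eb) for ea in ellipses_a for eb in ellipses_b),
               key=lambda pair: ellipse_distance(*pair))
    return best[0][0] - best[1][0]
```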
In this research, we propose a 3D measurement system that combines structured light and speckle-based pose estimation using two cameras with different settings. The proposed system consists of two lasers, a spot laser and a line laser, and two cameras, one with and one without a lens, so that both focused and defocused images can be obtained at once. Local shapes are measured from the focused images by a structured light method: the 3D positions of the laser-projected points are calculated by triangulation. Pose changes are estimated from speckle information in the defocused images: displacements of speckle patterns are detected as optical flow by the Phase Only Correlation (POC) method, and the pose changes are estimated from the speckle displacements by solving equations derived from the physical nature of speckle. The whole target shape is reconstructed by integrating the local shapes into common coordinates using the estimated pose changes. In the experiment, a texture-less flat board was measured while in motion. The experimental results confirmed that the shape of the board was reconstructed correctly by the proposed 3D measurement system.
In urban development, it is important to make plans that take into account how the appearance of natural objects will change over decades. This study proposes a tree-growth simulation method for predicting such changes in the appearance of natural objects.
It is a challenging issue to generate an avatar's natural facial expressions from a user's facial images. One of the most difficult problems is analyzing the user's facial images and estimating his or her emotions. We have already proposed a method that estimated the user's emotions by combining analysis of facial images taken with a video camera and measurement of heart rate information, pNN50. Here, pNN50 is the percentage of differences between adjacent heartbeat intervals greater than 50 ms, calculated from heart rate information. Each user's emotion was estimated as one of 'Joy', 'Disgust', 'Sadness', or 'Anger'. Our previous method estimated the user's emotions and generated appropriate avatar face images; however, further improvement in the measurement of each emotion is needed for natural avatar facial expressions. In this paper, we propose a new method that also measures the skin conductance level (SCL) using a biometric sensor. We build on Russell's circumplex model of affect, in which each emotion is described as a 2D vector of 'valence' and 'arousal'. Our method calculates the valence from pNN50 and the arousal from the SCL; the four quadrants of Russell's circumplex model correspond to 'Joy', 'Disgust', 'Sadness', and 'Anger', respectively. Based on these multiple sources of biological information, i.e., pNN50 and SCL, the proposed method estimates emotions and generates the avatar's natural facial expressions, successfully and appropriately adding comic symbols (called "Manpu"). "Manpu" can emphasize emotions, so human emotions can be communicated more easily.
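The pNN50 measure defined above can be computed directly from a sequence of inter-beat (RR) intervals; the following is a minimal sketch of that definition, with a hypothetical function name.

```python
def pnn50(rr_intervals_ms):
    """Percentage of successive RR-interval differences exceeding
    50 ms, given inter-beat intervals in milliseconds."""
    diffs = [abs(b - a) for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    if not diffs:
        return 0.0
    return 100.0 * sum(d > 50 for d in diffs) / len(diffs)
```

A higher pNN50 reflects greater beat-to-beat variability (parasympathetic activity), which is what lets it serve as a valence signal in the method above.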
Artificial Intelligence and Interdisciplinary Research II
Many public bus services have timetables that provide arrival and/or departure times at waypoints along the route. On the other hand, some bus services have no fixed schedule, such as those in Ulaanbaatar, the capital of Mongolia, which causes much confusion for passengers. Predicting bus travel time can help provide services such as efficient trip scheduling for passengers, so that they avoid long waits. For this purpose, we investigate several machine learning methods for predicting bus travel time. Concretely, on bus travel data, we employ three regression methods to predict travel time: linear regression (LR), support vector regression (SVR), and an artificial neural network (ANN). The performance of these methods is estimated and compared using conventional measures such as mean absolute error and root mean squared error. In a quantitative study, the artificial neural network is the best model, with errors of less than 1 minute in most cases. We also performed a qualitative study investigating the details of our prediction results using heatmap visualizations, which make it easy to grasp the tendencies of the travel times and error values.
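The two error measures used for the comparison above are standard and can be written out explicitly; this pure-Python sketch uses illustrative values, not the paper's data.

```python
import math

def mean_absolute_error(y_true, y_pred):
    """MAE: average absolute deviation between truth and prediction."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def root_mean_squared_error(y_true, y_pred):
    """RMSE: square root of the average squared deviation; penalizes
    large errors more heavily than MAE."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```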
For general lossless data compression in information theory, researchers have repeatedly expanded the stochastic models that express the target data and designed codes for the expanded models. In this paper, we apply this approach to lossless image compression. We expand an auto-regressive hidden Markov model into a two-dimensional model that expresses images containing a single diagonal edge. Then, we design a Bayes code with approximate parameter estimation by variational Bayesian methods. Experimental results on synthetic images show that the proposed model is sufficiently flexible for the target images and that the parameter estimation is accurate enough. We also confirm the behavior of the proposed method on real images.
In video transmission, videos are encoded and decoded, and bit-rate control is performed by specifying the quantization parameter (QP). The video undergoes various processing steps to remove redundancy, and the video signal is then orthogonally transformed into the frequency domain; the frequency-domain coefficients are quantized and transmitted. By specifying QP, the quantization step is changed, and so the amount of data can be controlled. A codec using super-resolution has been proposed. In CNN-based super-resolution of encoded images, the degradation of the input image due to encoding depends on the characteristics of the image; as a result, the optimal CNN weights for the input image change with the image characteristics. To solve this problem, we propose a method that adaptively performs super-resolution according to the image degradation.
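As background for the QP-to-data-amount relationship mentioned above: in HEVC/AVC-style codecs, the quantization step size grows exponentially with QP, doubling every 6 QP values (approximately Qstep = 2^((QP - 4) / 6)). This is the standard relation, not something defined in this paper.

```python
def quantization_step(qp):
    """Approximate HEVC/AVC quantization step size for a given QP:
    doubles every 6 QP values, with Qstep = 1 at QP = 4."""
    return 2.0 ** ((qp - 4) / 6.0)
```

Larger steps quantize coefficients more coarsely, lowering bitrate but increasing the degradation the super-resolution CNN must compensate for.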
Ultrasound examination is a difficult operation because the doctor must operate the ultrasound scanner while interpreting the images in real time, which may increase the risk of overlooking tumors. To prevent this, we study a liver tumor detection method using convolutional neural networks, toward realizing computer-assisted diagnosis systems. In this paper, we propose a liver tumor detection method within a false-positive reduction framework. The proposed method uses YOLOv3 [1] to find tumor candidate regions in real time and VGG16 [2] to reduce false positives. The proposed method achieved an F-measure of 0.837, which shows its effectiveness for liver tumor detection. Future work includes collecting training data from more hospitals and using it effectively to improve detection accuracy.
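The two-stage framework can be sketched schematically as follows. Here `detector` and `classifier` are placeholder callables standing in for models such as YOLOv3 and VGG16, and the threshold is illustrative, not the paper's setting.

```python
def detect_tumors(image, detector, classifier, score_threshold=0.5):
    """Two-stage detection: a fast detector proposes candidate boxes,
    then a second-stage classifier score filters out false positives."""
    candidates = detector(image)           # [(box, detector_score), ...]
    accepted = []
    for box, det_score in candidates:
        fp_score = classifier(image, box)  # prob. the box is a true tumor
        if fp_score >= score_threshold:
            accepted.append((box, det_score, fp_score))
    return accepted
```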
Path planning is a well-known problem in mobile robotics: the robot must move from a starting point to a destination while avoiding obstacles. To create an agent that is able to reach destinations in a given environment, the agent needs the ability to process the information gathered by its sensors and to roam freely in the environment. In this paper, we design mobile agents that solve local path planning problems in a 3D environment using an Evolutionary Neural Network (ENN) algorithm. ENN combines an Evolutionary Algorithm (EA) with a neural network; we chose a Genetic Algorithm (GA) for the EA part and designed a simple feed-forward neural network for the neural network part. We evaluate which ENN configuration values work best on a local path planning problem. Experimental results show that the lowest iteration rate is 1.8 with one hidden layer and 50 hidden nodes when the population size is 50.
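The ENN idea can be sketched minimally, under assumed details (the paper's network topology, weight encoding, and fitness function are not specified here): a GA evolves the weight vector of a small feed-forward network, with fitness measured on a toy regression task standing in for the path-planning evaluation.

```python
import math, random

random.seed(0)
HIDDEN = 4  # one hidden layer, echoing the single-hidden-layer configuration

def forward(weights, x):
    """Feed-forward net: 1 input -> HIDDEN tanh units -> 1 linear output.
    `weights` packs w1 (HIDDEN), b1 (HIDDEN), w2 (HIDDEN), b2 (1)."""
    w1, b1 = weights[:HIDDEN], weights[HIDDEN:2 * HIDDEN]
    w2, b2 = weights[2 * HIDDEN:3 * HIDDEN], weights[3 * HIDDEN]
    h = [math.tanh(w1[i] * x + b1[i]) for i in range(HIDDEN)]
    return sum(w2[i] * h[i] for i in range(HIDDEN)) + b2

SAMPLES = [(x / 10.0, (x / 10.0) ** 2) for x in range(-10, 11)]

def fitness(weights):
    """Negative mean squared error on the toy task (higher is better)."""
    return -sum((forward(weights, x) - y) ** 2 for x, y in SAMPLES) / len(SAMPLES)

def evolve(pop_size=50, generations=40, n_weights=3 * HIDDEN + 1):
    pop = [[random.uniform(-1, 1) for _ in range(n_weights)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]          # truncation selection (elitist)
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(n_weights)  # one-point crossover
            child = a[:cut] + b[cut:]
            i = random.randrange(n_weights)    # single-gene Gaussian mutation
            child[i] += random.gauss(0, 0.1)
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print(round(-fitness(best), 4))  # final MSE on the toy task
```

In the actual system the fitness would instead score how quickly the agent reaches its destination without collisions.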
With the advantage of a large field of view, fisheye cameras are widely used in many applications. In order to generate a precise view, calibration of the fisheye cameras is very important. In this paper, we propose a method for calibrating the extrinsic parameters of multiple fisheye cameras operating in man-made structures. A Manhattan World assumption is used, which describes man-made structures as sets of planes that are either orthogonal or parallel to each other. The orientation of the cameras is obtained by extracting vanishing points that denote the orthogonal principal directions in images captured by the cameras at the same time. With the proposed method, the calibration of extrinsic parameters is very convenient, and the system can be recalibrated remotely.
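The orientation recovery can be illustrated as follows: under the Manhattan World assumption the three vanishing points correspond to three mutually orthogonal 3D directions, and stacking their unit vectors as columns yields the camera rotation. A simplified sketch with synthetic directions (a real pipeline would first extract the vanishing points from the fisheye images):

```python
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def rotation_from_directions(d1, d2, d3):
    """Build a rotation matrix whose columns are the three (assumed
    mutually orthogonal) principal directions given by vanishing points."""
    cols = [normalize(d) for d in (d1, d2, d3)]
    return [[cols[j][i] for j in range(3)] for i in range(3)]

def is_rotation(R, tol=1e-9):
    """Check R^T R = I, i.e. the columns are orthonormal."""
    for i in range(3):
        for j in range(3):
            dot = sum(R[k][i] * R[k][j] for k in range(3))
            if abs(dot - (1.0 if i == j else 0.0)) > tol:
                return False
    return True

# Synthetic example: principal directions of a scene yawed by 30 degrees.
t = math.radians(30)
R = rotation_from_directions((math.cos(t), math.sin(t), 0),
                             (-math.sin(t), math.cos(t), 0),
                             (0, 0, 1))
print(is_rotation(R))  # True
```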
To detect abnormal breathing in a patient, it is necessary to image the patient's chest and abdomen and analyze their movement. The ultimate goal of this study is to develop a system that determines the triage level by measuring the 3D deformation of the patient's abdomen and chest in real time and analyzing the results. Because there is little texture information on the abdomen and chest, we focused on moire analysis, which is effective for 3D measurement in such cases. This research requires dynamic 3D measurement using image sequences, so we adopted the sampling moire method, which has been widely used in recent years because of its high computational efficiency. In moire analysis, the absolute fringe order of the moire fringes must be determined by another method. In static measurement, the distance to the reference fringe can be measured in advance, but this is difficult in dynamic measurement. We therefore propose projecting a spot light together with the slit light used for moire analysis and applying the dynamic stereo method. The spot-light irradiation position on the object can be adjusted adaptively for each frame so that it lies on the reference stripe. The depth to that position is calculated with the dynamic stereo method, so 3D measurement can be performed independently for each frame. The effectiveness and accuracy of this method were confirmed through experiments on real image sequences.
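At the core of moire analysis is recovering the fringe phase. The sketch below shows the standard N-step phase calculation from phase-shifted fringe intensities; it is a simplification, since the sampling moire method obtains the shifted fringes by down-sampling and interpolating a single image rather than by capturing N images:

```python
import math

def phase_from_shifts(intensities):
    """Recover the fringe phase from N equally phase-shifted intensities
    I_k = A + B*cos(phi + 2*pi*k/N), using the N-step algorithm."""
    n = len(intensities)
    s = sum(I * math.sin(2 * math.pi * k / n) for k, I in enumerate(intensities))
    c = sum(I * math.cos(2 * math.pi * k / n) for k, I in enumerate(intensities))
    return math.atan2(-s, c)

# Simulate four phase-shifted moire intensities for a known phase.
A, B, phi = 10.0, 3.0, 0.7
samples = [A + B * math.cos(phi + 2 * math.pi * k / 4) for k in range(4)]
print(round(phase_from_shifts(samples), 6))  # 0.7
```

The recovered phase is wrapped to (−π, π]; determining the absolute fringe order, as the abstract notes, requires extra information such as the proposed spot-light depth reference.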
We have designed an interface that allows a guitar player to manipulate another channel, such as visual effects, while playing the guitar. This interface connects passive and active performance seamlessly. Our goal is to create an interface that immediately reflects the performer's intentions without turning away from the guitar-playing motion. We propose a method to distinguish three picking states (chord stroke, single note, and non-picking) by sensing the pressure on the pick and the rotation angle of the player's index finger. A brief experiment showed that our method can identify the picking state with high accuracy. We also introduce an example application that manipulates visual and sound effects through our interface.
We propose a framework, CSSNet, to exchange upper clothes across people with different poses, body shapes and clothing. Our approach consists of three stages: (1) disentangling features such as clothing, body pose and semantic segmentation from the source and target person; (2) synthesizing realistic, high-resolution images of the target dressing style; and (3) transferring complex logos from the source clothing to the target wearer. The proposed end-to-end neural network architecture can generate images of a specific person wearing the target clothing. In addition, we propose a post-processing method to recover complex logos that are missing or blurred in the network outputs. Our results are more realistic and of higher quality than those of previous methods, while preserving cloth shape and texture simultaneously.
This paper presents a semantic scene modeling technique for constructing a cloud-based aquaculture surveillance system using an autonomous drone. The emergence of low-cost drones has created opportunities to find new solutions for a number of problems in computer vision and artificial-intelligence-based internet-of-things (AIoT). However, vision-based activity detection using a mobile RGB camera remains a challenging task, since the activities in different regions of the monitored scene are quite different and the objects detected from a drone are often very small. In this work, the 3D model of an aquaculture environment is first constructed using the calibrated intrinsic camera parameters, the depth maps, and the pose parameters of frames in the video captured by a drone. Next, our semantic scene modeling algorithm represents the visual and geometrical information of the semantic objects, which define the checkpoints for routine data gathering and environmental inspection. To associate each checkpoint with the GPS signal and the altitude of the drone, our approach combines automatic drone navigation, computer vision, and machine learning algorithms to detect checkpoint-specific activities. The scene modeling algorithm transfers the essential knowledge to the mobile drone through the aquaculture cloud for monitoring the fish, persons, nets, and feeding systems at an aquaculture site on a daily basis. Thus, the drone becomes a flyable intelligent robot that helps the manager of an aquaculture site automatically collect valuable data for optimizing fish production with further decision-making algorithms. Experiments show that our approach attains high semantics-based activity recognition accuracy without sacrificing operation speed.
Diminished Reality (DR), which visually removes real objects, is expected to be useful in various scenes. It is, however, difficult to remove an object in scenes with textureless backgrounds. One main factor is the difficulty of estimating geometric information with feature-point-based approaches. This paper therefore proposes a method that realizes DR in textureless scenes based on feature lines. Missing backgrounds are filled in for each of the divided background planes based on geometric information. The background is divided along the boundaries of the background planes, which are obtained using the Line Segment Detector (LSD), the Hough transform, and the condition for concurrency of three straight lines. The object to be removed is tracked by feature-point matching. Missing backgrounds are filled by image inpainting only in the first frame; from the second frame onward, removal regions obtained by projecting the background boundaries are overlaid with the completed background image. The proposed approach achieves object removal in textureless scenes.
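The concurrency condition mentioned above has a compact algebraic form: three lines a_i x + b_i y + c_i = 0 pass through a common point exactly when the determinant of their coefficient matrix vanishes (the degenerate case of three mutually parallel lines also satisfies it). A small check with illustrative lines:

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def concurrent(l1, l2, l3, tol=1e-9):
    """Each line is (a, b, c) for a*x + b*y + c = 0; the three lines are
    concurrent when det([l1, l2, l3]) == 0 (also true if all parallel)."""
    return abs(det3([list(l1), list(l2), list(l3)])) < tol

# x = 1, y = 1, and x + y = 2 all pass through (1, 1):
print(concurrent((1, 0, -1), (0, 1, -1), (1, 1, -2)))  # True
# Shift the third line and they no longer meet in one point:
print(concurrent((1, 0, -1), (0, 1, -1), (1, 1, -3)))  # False
```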
We are investigating a system that provides a computer graphics (CG)-based television program to viewers by sending a script to the terminal using CG and voice data. In this study, we assess the viewer satisfaction of CG programs produced using a text-based TV program making language (TVML). To verify whether CG programs can be used as a substitute for real programs, we conducted a comparative evaluation experiment between real and CG programs. In addition, to verify the best device for CG program viewing, we compared the same CG program viewed on a PC display and a smartphone. The results suggest that CG programs are acceptable substitutes for real news and information-related programs, and that smartphones might be more suitable than PC displays for viewing CG programs.
Acquiring microscope operation skills is part of the Japanese secondary-education curriculum. Although frequent opportunities for observation with a microscope should inspire students to study science more deeply, the curriculum allows only limited time and budget. In this paper, a virtual reality (VR) self-learning system for microscope operation is proposed, which equips inexperienced students with fundamental operation skills. The system, composed of a computer, a head-mounted display (HMD) and a dial-type input device, shows the student a virtual microscope in the center of the virtual world. In the virtual world, the student can manipulate the virtual microscope with the controllers and the dial-type input device, and can also see the procedures of the operation. Focusing with the focus knob is an especially important part of the procedure, so the system adopts a dial-type input device to make the experience closer to reality through the sense of touch. The system displays how to operate a microscope through visual information in texts and figures, and judges whether the user's operation is correct by comparing its result, such as the position and inclination of virtual objects, with the correct result stored in the system. If the user operates the virtual microscope incorrectly, the system presents an alert with audiovisual stimulation. The system is expected to help users learn microscope operation skills practically at relatively low cost.
While Japanese calligraphy is part of the Japanese elementary- and secondary-education curriculum, the teachers in charge of calligraphy education are not always proficient, since calligraphy skill is not required for a teacher's license. To overcome this situation, not only teacher training but also an ICT-based support system that lets students learn the skill by themselves is needed. This paper therefore proposes a system that combines a self-study support system for inexperienced people learning Japanese calligraphy skills with a virtual writing system for practicing calligraphy without ink in a virtual environment. The proposed system is an augmented reality (AR) system consisting of a computer, a head-mounted display (HMD) and a non-contact motion sensor. First, the system uses AR technology to visualize motion data previously recorded from an expert's action. Second, users practice calligraphy on the system by imitating the expert's brush motion on virtual paper. To obtain the user's motion, the motion sensor captures the position and tilt of the brush in each frame; simultaneously, the system simulates the handwriting from these data and superimposes it on the tip of the brush using AR technology. After the user's practice, the system calculates the differences in brush position, tilt and speed between the expert's and the user's brush motion using DP matching, and finally encourages the user to improve his or her motion. This system is expected to overcome problematic situations in calligraphy practice.
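The DP matching step can be illustrated with a classic dynamic-programming (DTW-style) distance between two motion sequences. Here each sample is a single scalar (e.g. brush tilt), whereas the actual system compares position, tilt and speed together:

```python
def dp_matching(seq_a, seq_b):
    """Dynamic-programming distance between two 1-D sequences, allowing
    local stretching/compression of the time axis (DTW)."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(seq_a[i - 1] - seq_b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # stretch seq_b locally
                                 d[i][j - 1],      # stretch seq_a locally
                                 d[i - 1][j - 1])  # match step for step
    return d[n][m]

expert = [0.0, 0.2, 0.5, 0.9, 0.5, 0.2, 0.0]
user = [0.0, 0.25, 0.5, 0.85, 0.5, 0.2, 0.05]   # same stroke, small deviations
print(dp_matching(expert, expert))  # 0.0
print(dp_matching(expert, user))    # > 0: penalty for deviating from the expert
```

A low distance would let the system praise the stroke; a high distance at a particular alignment would point at where the user's motion diverged.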
This study was an experiment that examined the possible relationship between VR and the gamelan frequencies in animated puppet shows, considered in terms of instrument blocking. Gamelan instruments provide a clear picture of the frequency spectrum as it is affected by various types of treatment, apart from each instrument's own ability to produce sound with many characteristics. The key experiment in this study was to arrange the instruments in a semicircle, in contrast to the conventional lined-up arrangement. The author believes this arrangement has its own impact in relation to VR. Both slendro and pelog gamelan were used to obtain two different data sets. It can be assumed that this semicircular blocking produces a more focused frequency distribution for the proportions of the listening room and for VR users, and from the semicircular blocking experiment it can be concluded that the sound is more effective and easier to identify.
With the rapid development of computer technology and the advent of the information age, diverse medical imaging devices are emerging. However, limited by their imaging principles, single-mode images have their own advantages and disadvantages, and it is difficult for one modality to express all practical information, which limits diagnosis. Accordingly, medical image fusion is an inevitable trend: it can integrate or highlight complementary information, enhance image quality, reduce redundancy, and support a reliable diagnosis. Many methods have been proposed in the past, but their effect depends largely on the experimental data. Based on this, in this study we propose a new image fusion method based on the discrete wavelet transform (DWT) and a fuzzy radial basis function neural network (FRBFNN). First, we analyze the detail and feature information of the two images to be processed by DWT. Here, we use a 2-level decomposition, so that each image is decomposed into 7 parts comprising high-frequency and low-frequency sub-bands. Subsequently, for the parts at the same position in the two images, we feed them into the proposed FRBFNN; with the operation of these seven neural networks, we obtain seven fused parts in turn. Finally, through the inverse wavelet transform, we obtain the final fused image. For training the neural network, we adopt a combination of the error backpropagation algorithm and the gravitational search algorithm. The final experimental results demonstrate that our method performs significantly better than the other algorithms.
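The decomposition bookkeeping can be sketched with a plain Haar wavelet: one 2-D DWT level splits an image into four sub-bands (LL, LH, HL, HH), and applying a second level to LL yields 3×2 + 1 = 7 parts in total, matching the seven per-position fusions described above. A minimal pure-Python sketch (a real implementation would use a proper wavelet library and the authors' chosen wavelet):

```python
def haar_pair(v):
    """One Haar step on a 1-D even-length sequence: averages and details."""
    a = [(v[i] + v[i + 1]) / 2 for i in range(0, len(v), 2)]
    d = [(v[i] - v[i + 1]) / 2 for i in range(0, len(v), 2)]
    return a, d

def transpose(m):
    return [list(r) for r in zip(*m)]

def haar2d(m):
    """One 2-D Haar DWT level: returns the LL, LH, HL, HH sub-bands."""
    lo, hi = [], []
    for row in m:                      # transform rows first
        a, d = haar_pair(row)
        lo.append(a)
        hi.append(d)
    def split_cols(mat):               # then transform columns
        approx, detail = [], []
        for col in transpose(mat):
            a, d = haar_pair(col)
            approx.append(a)
            detail.append(d)
        return transpose(approx), transpose(detail)
    LL, LH = split_cols(lo)
    HL, HH = split_cols(hi)
    return LL, LH, HL, HH

img = [[float(i + j) for j in range(4)] for i in range(4)]
LL1, LH1, HL1, HH1 = haar2d(img)       # level 1
LL2, LH2, HL2, HH2 = haar2d(LL1)       # level 2 applied to the approximation
parts = [LL2, LH2, HL2, HH2, LH1, HL1, HH1]
print(len(parts))    # 7 parts, one FRBFNN fusion per part
print(LL2[0][0])     # 3.0: the global mean of this 4x4 image
```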
We propose an automatic meniscus segmentation method for knee MR images using a cascaded segmentation network consisting of 2D and 3D convolutional neural networks and 2D conditional random fields (CRFs). First, the 2D segmentation network and 2D CRF are applied to narrow the field of view around the medial and lateral meniscus. Second, the 3D segmentation network, which considers local and spatial information, segments the medial and lateral meniscus. The 2D segmentation network alone showed under-segmentation inside the meniscus. The under-segmentation was prevented by the 2D CRF, but over-segmentation occurred in nearby ligaments with similar intensity. The 3D segmentation network prevented both under- and over-segmentation by considering local and spatial information, and showed the best performance. The average Dice similarity coefficients of the proposed method were 92.27% and 90.27% for the medial and lateral meniscus, improvements of 4.78% and 9.96% at the medial meniscus and 3.94% and 9.58% at the lateral meniscus compared with segmentation using 2D U-Net alone and the combined 2D U-Net and 2D CRF, respectively. The medial meniscus shows higher accuracy than the lateral meniscus because of less leakage into the collateral ligament.
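The Dice similarity coefficient used for evaluation is DSC = 2|A∩B| / (|A| + |B|); a sketch over sets of voxel coordinates (the coordinates are illustrative):

```python
def dice(pred, truth):
    """Dice similarity coefficient between two sets of voxel coordinates."""
    pred, truth = set(pred), set(truth)
    if not pred and not truth:
        return 1.0
    return 2.0 * len(pred & truth) / (len(pred) + len(truth))

# Toy example: 8 predicted voxels, 8 ground-truth voxels, 6 shared.
pred = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1)]
truth = [(0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 1), (2, 2), (3, 2)]
print(dice(pred, truth))  # 2*6 / (8+8) = 0.75
```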
Computer-aided diagnosis (CAD) systems have been proven useful in clinical routine. However, different software installed on different machines limits the wide usage of CAD by doctors. We transferred our previous CAD system for liver disease into a web-based program that enables users to diagnose potential hepatic abnormalities over the internet, using the XOJO platform, which makes it easy to build web applications in a BASIC-based programming language and provides a virtual server function in running mode. Three methods for tumor classification were investigated in the web program: GLCM-ANN and a Restricted Boltzmann Machine (RBM) with and without edge computing. The results show that deep learning performs better than the conventional ANN, but the large weight matrix is a heavy burden on web response speed without edge computing. Our CAD system can easily be opened on different operating systems and at any location with a network connection. Such convenience makes diagnosis less time consuming while collecting datasets via the internet from different hospitals or even from patients themselves.
Nowadays, social networking sites (SNS) are used for posting and viewing beautiful photographs. Although many visitors take photographs in theme parks and post them on SNS, finding appropriate photo spots through SNS is not easy because of the enormous number of posted images. This study proposes recommendation algorithms that help visitors find appropriate photo spots. As a test case, we chose Tokyo Disneyland (TDL), whose visitors post their photographs on Twitter. Twitter users characteristically merge several photographs into a single image called a collage, which shows a visitor's excursion history. Based on these histories, we apply a collaborative filtering algorithm that recommends photo spots to visitors. Before designing a photo spot recommendation system for theme parks, we must know the intentions and preferences of theme park visitors. To acquire this knowledge, we conducted a questionnaire survey. The results suggested that male subjects prefer scenic and pose-friendly photo spots, whereas female subjects tend to choose spots that render them “instagrammable,” “beautiful,” and “cute.” Based on these findings, we categorized photo spots inside TDL and created a prototype recommendation system.
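A user-based collaborative-filtering recommender over binary visit histories can be sketched as follows; the spot names and histories here are made up for illustration, and the paper's actual similarity measure and neighborhood size are not specified:

```python
import math

def cosine(a, b):
    """Cosine similarity between two binary visit sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def recommend(target, histories, k=2):
    """Score spots the target has not visited by how often they appear in
    the k visit histories most similar to the target's."""
    ranked = sorted(histories, key=lambda h: cosine(target, h), reverse=True)
    scores = {}
    for h in ranked[:k]:                 # k nearest neighbours
        for spot in h - target:          # only spots not yet visited
            scores[spot] = scores.get(spot, 0) + 1
    return sorted(scores, key=lambda s: (-scores[s], s))

# Hypothetical collage histories: each set is one visitor's photo spots.
histories = [
    {"castle", "teacup", "parade"},
    {"castle", "teacup", "splash"},
    {"mountain", "splash", "canoe"},
]
me = {"castle", "teacup"}
print(recommend(me, histories))  # spots liked by visitors similar to `me`
```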
Keyword searches are generally used when searching for illustrations of anime characters. However, keyword searches require that the illustrations be tagged first. The illustration information that a tag can express is limited, and it is difficult to search for a specific illustration. We focus on character attributes that are difficult to express using tags. We propose a new search method using the vectorization degrees of character attributes. Accordingly, we first created a character illustration dataset limited to the hair length attribute and then trained a convolutional neural network (CNN) to extract the features. We obtained a vector representation of the character attributes using CNN and confirmed that they could be used for new searches.
An efficient system to support referencing external information is proposed. External information is information that is not displayed in the original work but is related to it. We employ an annotation system to display this information to the user. We are developing Wappen (Web-based Annotation Appending and Sharing Framework) to provide annotation of any scene on a primary terminal and to reference prepared scene annotations on a secondary terminal while the user is viewing a scene on the primary terminal. Experimental results indicate that the time required to obtain the desired information by viewing annotations using two terminals is less than that using a single terminal. A subjective evaluation also showed that using two terminals was rated higher than using a single terminal.
This paper proposes a warping-based motion compensation (MC) technique for video coding, which adaptively selects translation-based or warping-based compensation for triangular patches formed on a target frame. Overall MC efficiency is improved by introducing several techniques employed in HEVC, e.g., recursive patch partitioning, differential MV coding, and merge mode. In addition, a partial merge mode is newly introduced to further improve MC efficiency. The advantage of the proposed technique is demonstrated by comparison with an existing affine-based MC technique.
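Warping-based compensation for a triangular patch can be sketched with barycentric interpolation: a pixel inside the triangle is displaced by blending the motion vectors of the three vertices with its barycentric weights, and translation-based MC is the special case where all three vectors are equal. The details below are illustrative, not the paper's exact formulation:

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates of point p in triangle (a, b, c)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    l1 = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    l2 = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return l1, l2, 1.0 - l1 - l2

def warp_point(p, tri, mvs):
    """Motion-compensated position of p: blend the per-vertex motion
    vectors with p's barycentric weights inside the triangle."""
    w = barycentric(p, *tri)
    dx = sum(w[i] * mvs[i][0] for i in range(3))
    dy = sum(w[i] * mvs[i][1] for i in range(3))
    return (p[0] + dx, p[1] + dy)

tri = ((0.0, 0.0), (8.0, 0.0), (0.0, 8.0))
mvs = ((1.0, 0.0), (0.0, 1.0), (-1.0, 0.0))   # per-vertex motion vectors
print(warp_point((0.0, 0.0), tri, mvs))  # a vertex moves by its own MV
centroid = (8.0 / 3.0, 8.0 / 3.0)
print(warp_point(centroid, tri, mvs))    # the centroid blends all three MVs
```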
A thermal image taken by a thermal camera and an RGB color picture were arranged as a dataset pair, and the datasets were learned using a deep learning algorithm called pix2pix. After sufficient training, it was possible to generate a thermal image from a color image taken with a digital camera. This paper provides a new method of generating thermal images without a thermal camera; the thermal camera is required only when creating the training materials. Furthermore, a method for detecting abnormal temperatures using deep learning is proposed. Features of the thermal images are considered, and the results of our method are evaluated.
In this paper, we propose a method to make children's playground equipment more thrilling using VR technology. A user plays on playground equipment while wearing smartphone VR goggles. Another smartphone is attached to the playground equipment to acquire its movement. The same playground equipment used in the real space is prepared in the VR space, and the user can play with it in the VR space as well as the real space. At this time, the size and movement of the playground equipment can be changed in the VR space. By increasing the size or accelerating the movement, equipment for children can become more thrilling in the VR space while the user feels the actual movement of the playground equipment. We implemented a prototype system and tested our method.
Because patients are often dissatisfied after total knee arthroplasty (TKA), we offer an improved protocol with a three dimensional (3D) joint instability analysis system that generates basic data for navigation during TKA. The system detects knee joint instability on images and analyzes it using a weight/moment ratio provided by a six-axis force sensor. We used an Intel RealSense depth camera to acquire 3D template data of the knee joint. We then searched for similar data in a 3D point cloud of data of the entire leg, thereby locating the knee joint in that cloud. We established a template shape using two algorithms, RANSAC and ICP, and then measured knee joint angles. By setting the knee joint detection result as the center position of the 3D point cloud of the entire leg, the point cloud data could be divided into thigh and shin portions. The thigh and shin axes created by each obtained point cloud were then determined, and an angle with two degrees of freedom formed by the two axial directions was calculated. This angle was considered the measurement result of the knee joint angle. Experiments using real images confirmed that our method has sufficient accuracy for this navigation system.
Versatile Video Coding (VVC) is a new state-of-the-art video compression technology currently under standardization. It targets about two times higher coding efficiency than the existing HEVC, supporting HD/UHD/8K video and high dynamic range (HDR) video. It also targets versatile functionality such as screen content coding, adaptive resolution change, and independent sub-pictures. To develop an effective coding method for the chroma intra prediction mode, in this paper we investigate its binarization process in CABAC (context-adaptive binary arithmetic coding) and test a method that assigns shorter bin strings to more frequent chroma intra modes and longer bin strings to less frequent ones, based on the chroma mode statistics.
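The effect of frequency-ordered bin assignment can be quantified directly: with a truncated-unary binarization over N modes, giving the shortest bin strings to the most frequent modes minimizes the average number of bins per symbol. An illustrative calculation (the mode frequencies here are hypothetical, not VVC's actual statistics):

```python
def truncated_unary_len(index, n_modes):
    """Bin-string length of `index` under truncated unary with n_modes
    symbols: 0 -> '0' (1 bin), 1 -> '10' (2 bins), ..., and the last two
    indices share length n_modes - 1 (the terminating '0' is dropped)."""
    return index + 1 if index < n_modes - 1 else n_modes - 1

def avg_bins(frequencies):
    """Average bins/symbol when mode i (in the given order) gets index i."""
    n = len(frequencies)
    total = sum(frequencies)
    return sum(f * truncated_unary_len(i, n)
               for i, f in enumerate(frequencies)) / total

# Hypothetical chroma-mode frequencies, most frequent first.
freqs = [50, 25, 15, 7, 3]
print(avg_bins(freqs))                  # frequency-ordered assignment: 1.85
print(avg_bins(list(reversed(freqs))))  # worst case, reversed order: 3.62
```

Fewer bins per symbol means less work for the CABAC engine and, before context adaptation, a shorter bitstream.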
Three-dimensional computer graphics (3DCG) of disaster scenes can visualize the situation of disaster areas. 3DCG created from photographs taken by victims or rescue teams is a useful tool for disaster relief. We propose a system based on Structure from Motion (SfM) technology to reconstruct 3D models of disaster areas. The system also uses the GPS data attached to photographs taken by smartphones at the disaster sites to localize the reconstructed 3D models on a global map. We describe how the system registers the 3DCG to a global map and how it estimates the positions and orientations of the 3DCG. Moreover, users can edit the reconstructed 3D models with a digital pen and a dot screen, onto which a projector projects the 3DCG. This paper shows the use of GPS information and presents simulations conducted at the disaster experience facility "Sona Area Tokyo" of the Tokyo Rinkai Disaster Prevention Park and at the 921 Earthquake Museum of Taiwan in the National Museum of Natural Science.
In current pose estimation studies, bottom-up approaches, which estimate the pose from detected body parts and their relations, have been the major methods. This approach enables fast and accurate pose estimation. On the other hand, there are many fields to which pose estimation could be applied. In this paper, we propose a bottom-up method for excavator pose estimation. For pose estimation, we generate ground-truth confidence maps according to the annotations and evaluate a loss function for training. We evaluated the model by calculating the loss between the estimated map and the true map.
OpenCV is an open source programming library for computer vision and image processing, and has been used worldwide in industry and academia as a de facto standard for more than 10 years, from large-scale computers to embedded devices such as smartphones. OpenCV provides the image data representation classes cvMat and cvUMat in its image processing API; the latter can implement parallel computation using heterogeneous computing frameworks such as OpenCL. Operator overloading is a technique that enables intuitive programming using arithmetic operations and assignment operators, and it is defined for cvMat. Operator overloading is not defined for cvUMat, however, so the appropriate functions must be called explicitly. As a result, programming with cvUMat is difficult, and the code is not compatible with cvMat code that uses operator overloading. In addition, cvMat operator overloading incurs extra memory reallocation at runtime, so it is not appropriate to apply it directly to cvUMat. In this paper, we therefore propose a method that realizes operator overloading without extra memory reallocation at runtime and that can be equivalently converted to cvUMat function calls at compile time. This method enables intuitive programming with operator overloading and no runtime overhead.
Motion blur inevitably occurs in images of fast-moving objects, and its removal, known as motion deblurring, is one of the most well-known ill-posed problems. In this paper, we investigate the motion deblurring problem using modulated external light. Noting that motion blur depends on both ambient light and the modulated external light, we investigate how to design a motion deblurring method that considers not only the external light but also the ambient light. The deblurring performance of the proposed method is compared to that of the conventional method, which considers only the modulated external light.
In this paper, a novel algorithm is proposed using speckle reducing anisotropic diffusion (SRAD) and gradient domain guided image filtering (GDGIF) to reduce speckle in synthetic aperture radar (SAR) images. SRAD is suitable for reducing multiplicative noise in SAR images because it can directly process log-compressed data. Since GDGIF has edge-aware weighting, it is adaptively applied to SRAD result images to additionally reduce speckle noise. Experimental results demonstrate that the proposed algorithm, compared to existing filtering methods, shows excellent speckle noise reduction performance and a low computational complexity.
The spread of broadband networks has resulted in countless videos on the internet. Advances in video analysis technology make it possible to extract more exact metadata, making it easier to find the videos we want. However, existing search services are not suitable for ambiguous, affective video searches in which a specific query cannot be given, such as wanting something to watch while relaxing. To solve this problem, this paper considers a video search method that can handle such ambiguous requests by utilizing existing search services, and proposes replacing an ambiguous request with queries composed of multiple concrete metadata items, such as names and situations of objects. We perform classification experiments with three ambiguous requests using metadata automatically attached to the videos by services such as Google Cloud Video Intelligence, and confirm that automatic classification by machine learning performs close to manual classification. This suggests that the proposed method is feasible.
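The request-expansion step can be sketched as mapping an ambiguous request to concrete metadata terms and scoring videos by overlap. The concept dictionary and tags below are hypothetical placeholders, not the paper's data:

```python
def expand_request(request, concept_map):
    """Replace an ambiguous request with a set of concrete metadata
    terms via a (hypothetical) concept dictionary."""
    return set(concept_map.get(request, []))

def rank_videos(request, concept_map, video_metadata):
    """Rank videos by how many metadata tags they share with the
    expanded request (a simple stand-in for the learned classifier)."""
    terms = expand_request(request, concept_map)
    scores = {vid: len(terms & set(tags))
              for vid, tags in video_metadata.items()}
    return sorted(scores, key=scores.get, reverse=True)
```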
Although many studies of iterative reconstruction algorithms have addressed dose reduction in medical applications, there is little research on systematic acquisition methods for reducing dose. In this study, we propose a hybrid imaging technique using a half-ROI and full-view scan as a new acquisition method. A prototype CT system (TVX-IL1500H, GERI, Korea) was used. The source-to-detector and source-to-center-of-rotation distances were 1178 and 905 mm, respectively. A total of 720 projections were obtained, from which full-view CT images were reconstructed as reference images using filtered back projection (FBP). Interior ROI CT images were reconstructed using the 720 truncated projections. The proposed hybrid CT image was reconstructed using 360 truncated projections and a variable number of non-truncated full-size projections. We set a total of six acquisition parameters according to the number of full-size projections and acquired hybrid images. Dose reduction can be achieved through the half-ROI and full-view scan. As the number of non-truncated full-size projections in the hybrid image increases, the CNR value increases. FOM values were also derived for each parameter. In conclusion, an appropriate acquisition parameter for the half-ROI and full-view scan was derived based on the FOM values.
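The CNR and dose-normalized figure of merit used to compare acquisition parameters can be sketched as below. The FOM = CNR²/dose form is a common convention in CT image-quality assessment; the paper's exact definition may differ:

```python
def cnr(roi_mean, background_mean, background_std):
    """Contrast-to-noise ratio between an ROI and the background."""
    return abs(roi_mean - background_mean) / background_std

def figure_of_merit(cnr_value, relative_dose):
    """Dose-normalized figure of merit: FOM = CNR^2 / dose
    (assumed convention, not necessarily the paper's)."""
    return cnr_value ** 2 / relative_dose
```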
To display holographic reconstructed images in color, the characteristics of images reconstructed by a multiplex reconstruction process and a spatially multiplexed holographic projection process using blue-violet laser light must be studied sufficiently. In general, reconstruction becomes more difficult as the laser wavelength becomes shorter, and few results have been reported on reconstruction with blue-violet laser light. We therefore first study the relation between the number of object points and the reconstructed image quality using a multiplex reconstruction process with blue-violet laser light. Next, we study a basic coloring process using a spatially multiplexed holographic projection technique with blue-violet laser light. In this paper, we report how effectively holographic image reconstruction with blue-violet laser light (wavelength: 405 nm) presents high-quality images through time-sharing multiplex reconstruction and enlarges the color region of the reconstructed image through time-sharing spatially multiplexed holographic projection. In addition, we studied a process that produces two-colored images on the screen by time-sharing spatially multiplexed projection, using a time-sharing multiplex reconstruction technique with two lasers: blue-violet and red (or green). As a result, we confirmed that the single-color multiplex reconstruction process, which recovers the divided parts of objects formed from multiple points, is also applicable to a time-sharing spatially multiplexed reproduction process using two colors. This suggests the possibility of enlarging the color region of the recovered holographic images.
We have been investigating a depth estimation system for real-time applications. The stereo camera method is highly sensitive to slight variations of baseline length due to vibration and temperature, and it also suffers from occlusion. On the other hand, the monocular depth-from-focus method cannot balance wide-area estimation with real-time estimation. Therefore, we have proposed a method that adopts tilted-lens optics. In this method, the plane of sharp focus (POF) is inclined and the depth of field (DOF) extends along the depth direction. We can then obtain a depth value at each pixel from the ratio of the sharpness values of two tilted-optics images using a monocular camera system with a spectroscopic element. This advantage is useful for real-time applications such as automotive tasks. In this paper, we introduce a novel method to realize a compact imaging device that consists of only one pair of an image sensor and tilted optics. We adopt multiple apertures with a color filter to achieve this. By using not only a normal aperture but also an aperture with a green color filter, smaller than the normal aperture and at the same position in the optics, we can easily obtain the two differently blurred images needed for tilted-optics depth estimation.
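The sharpness-ratio-to-depth step can be sketched as below, using local variance as a stand-in sharpness measure and a calibrated lookup table to invert the ratio. Both choices are illustrative assumptions, not the paper's actual measures:

```python
def local_sharpness(signal, i, radius=2):
    """Local contrast (variance) around index i as a simple sharpness
    measure (a stand-in for the paper's sharpness metric)."""
    w = signal[max(0, i - radius): i + radius + 1]
    m = sum(w) / len(w)
    return sum((x - m) ** 2 for x in w) / len(w)

def depth_from_ratio(ratio, calibration):
    """Map a sharpness ratio to a depth value via a calibrated lookup
    table (list of (ratio, depth) pairs); nearest entry wins."""
    return min(calibration, key=lambda rd: abs(rd[0] - ratio))[1]
```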
In this paper, we propose a method for recognizing products of different sizes that are sold in convenience stores. The need for robots that manipulate products in stores instead of human employees has recently been increasing. Such robots are required to estimate the 6DoF poses of more than 2,000 products in order to display and dispose of them. Previous methods for 6DoF pose estimation use three-dimensional computer-aided design (3D-CAD) models of objects. However, these methods require high computational costs because a model must be prepared for each object. Our method instead uses object shapes, i.e., “cuboid”, “isosceles triangular prism”, “cylinder” and “regular triangular prism”. The method consists of two modules: the first generates candidate hypotheses for the shape and size of an object, and the second extracts the optimal hypothesis. First, many hypotheses for various sizes and 6DoF poses are generated using the number of surfaces of each shape and the positional relationships among these surfaces. Second, the sizes and 6DoF poses of objects given by the hypotheses are evaluated on depth images. Finally, the optimal hypothesis is determined using a validation module. We conducted an experiment to evaluate the proposed method by generating 100 objects of different sizes in a virtual space and applying the method to them. The recognition rate was 84% for isosceles triangular prisms, 93% for cuboids, and 94% for cylinders. Thus, the objects were recognized without the need for 3D-CAD models.
Digital cameras are used in various scenarios; however, the sharpness of images captured outdoors may be reduced by bad weather conditions such as fog and haze. Therefore, to obtain a clear image, it is essential to remove haze. In this study, we propose a haze removal method that separates the sky and foreground regions and applies a different process to the sky region, because it has different features from the foreground; this improves the visibility of the image. We assume that the sky region is a bright region that changes little throughout the image, extract multiple sky region candidates, and merge them according to color distance. Next, we estimate the atmospheric light and the transmittance of haze. The atmospheric light is the light scattered over the entire image, and the transmittance of haze describes the amount of scattered light. The sky region determines the brightness of the atmospheric light, and the impression of the entire image determines its color. The transmittance of haze is estimated using dark channel features and morphological operations. The conventional method uses a fixed-size patch, so a smooth transmission map may not be generated; our method generates a smooth transmission map for any image by changing the patch size. The haze is removed using the estimated atmospheric light and transmittance; however, the resulting image is dark, so brightness correction is performed to obtain a clean, haze-free image.
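The core dark-channel dehazing step can be sketched per pixel as below, following the standard haze model I = J·t + A·(1 − t). This is a minimal per-pixel sketch with the patch minimum and the paper's sky/brightness refinements omitted:

```python
def dark_channel(rgb_pixels):
    """Per-pixel dark channel: minimum over the three color channels
    (the patch-wise minimum is omitted for brevity)."""
    return [min(p) for p in rgb_pixels]

def dehaze_pixel(pixel, airlight, omega=0.95, t0=0.1):
    """Estimate transmittance from the dark channel, then invert the
    haze model I = J*t + A*(1-t) to recover the haze-free pixel."""
    t = max(t0, 1.0 - omega * min(c / a for c, a in zip(pixel, airlight)))
    return [(c - a) / t + a for c, a in zip(pixel, airlight)]
```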
Depth completion, which predicts dense depth from sparse depth, has important applications in robotics, autonomous driving, and virtual reality. It compensates for the low accuracy of monocular depth estimation. However, previous depth completion works processed every depth pixel uniformly and ignored the statistical properties of the depth value distribution. In this paper, we propose a self-supervised framework that can generate accurate dense depth from RGB images and sparse depth without the need for dense depth labels. We propose a novel attention-based loss that takes the statistical properties of the depth value distribution into account. We evaluate our approach on the KITTI dataset. The experimental results show that our method achieves state-of-the-art performance, and an ablation study shows that our method effectively improves the accuracy of the results.
It is very difficult to recognize cursive characters (kuzushiji) in classic Japanese literature. One of the difficulties is that multiple characters are connected. In this study, we propose a method for correctly recognizing consecutive kuzushiji characters by using multiple candidate regions as input to a neural network. An evaluation using an image database of three consecutive kuzushiji characters demonstrated that the proposed method had a higher accuracy rate than a method in which character detection preceded character recognition.
This paper proposes a simple refinement method for improving depth information estimated with a deep neural network (DNN) from a single-view RGB image. There have been many effective depth prediction methods using DNNs, such as ResNet-UpProj [1]. However, such learning-based methods sometimes estimate unclear or uncertain depth information, especially around edges, even with sufficient training. This paper aims to improve the predicted depth around edges by applying simple image processing to the depth image, based on the positions of edges in the original RGB image. Experiments with the NYU Depth v2 dataset [2] showed that our simple approach can decrease the root mean square error of depth by about 14.3%.
With the recent expansion of camera applications, image filtering is essential in image processing. Weighted median filtering is one such image denoising method. The weighted median filter can be useful for removing noise and correcting blur; however, its computational cost is high. Halide is a domain-specific language for image processing that makes it easy to optimize image processing code. In this study, we present a weighted median filter written in Halide. Experimental results show that the weighted median filter can be written concisely in Halide.
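The weighted median itself (independent of the Halide implementation) can be sketched as: sort the window values, then pick the smallest value whose cumulative weight reaches half the total weight:

```python
def weighted_median(values, weights):
    """Weighted median: the smallest value whose cumulative weight
    reaches half of the total weight. With unit weights this reduces
    to the ordinary median."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    acc = 0.0
    for v, w in pairs:
        acc += w
        if acc >= half:
            return v
```

In a filter, `values` would be the pixels of a local window and `weights` would come from, e.g., spatial or range affinities.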
In this paper, we propose an efficient single image super-resolution (SR) method for multi-scale image texture recovery, based on a Deep Skip Connection and Multi-Deconvolution Network. Our proposed method focuses on enhancing the expressive capability of the convolutional neural network so as to significantly improve the accuracy of the reconstructed higher-resolution texture details. The deep skip connection (DSC) makes full use of low-level information together with the rich deep features. The multi-deconvolution layers (MDL) decrease the feature dimension, reducing the computation caused by deepening the network. Together, these components reconstruct high-quality SR images. Experimental results show that our proposed method achieves state-of-the-art performance.
The Joint Video Experts Team (JVET) is developing a new video coding standard beyond High Efficiency Video Coding (HEVC), named Versatile Video Coding (VVC). VVC adopts various new prediction modes compared to HEVC, and Combined Inter-Intra Prediction (CIIP) is one of them. CIIP combines inter prediction and intra prediction with derived weights to form the final prediction. In the existing CIIP, the weights are derived from the prediction modes of the two adjacent blocks (left and above), and only the planar mode is used as the intra prediction mode. In this paper, we propose methods to enhance CIIP with more accurate weights for combining both predictions, as well as extending the intra prediction modes to be combined based on the coding modes of adjacent blocks. According to empirical observations, the below-left and above-right blocks are correlated with the left and above blocks, respectively, in terms of prediction mode. The first proposed method therefore derives finer weight values by using the prediction modes of up to three adjacent blocks (left, above, and above-left) from the five adjacent blocks used for deriving regular merge candidates. The second proposed method uses the intra-coded modes of the left and above adjacent blocks, which are used to derive MPM candidates, instead of the planar mode used in the current CIIP. Experimental results show that the proposed methods slightly improve the performance of CIIP in the VVC Test Model (VTM).
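The existing CIIP blending rule can be sketched as below, following the VVC-style weighting in which the intra weight grows with the number of intra-coded neighbors (my reading of the standard's rule; sample values are illustrative):

```python
def ciip_blend(inter_pred, intra_pred, left_is_intra, above_is_intra):
    """CIIP-style blending: weight the intra prediction more heavily
    when more neighboring blocks are intra-coded.
    w in {1, 2, 3}; final = ((4-w)*inter + w*intra + 2) >> 2."""
    w = 1 + int(left_is_intra) + int(above_is_intra)
    return [((4 - w) * p_inter + w * p_intra + 2) >> 2
            for p_inter, p_intra in zip(inter_pred, intra_pred)]
```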
A synthetic aperture radar (SAR) aboard a satellite irradiates the earth with observation microwave pulses, which scatter in every direction when they reach the earth's surface; the backscattered component is received by the satellite's antenna. Because of this observation principle, satellite SAR allows us to observe the earth 24 hours a day regardless of the weather. On the other hand, since the satellite must irradiate the earth's surface obliquely rather than vertically downward, radar-shadow areas, which the observation pulses cannot reach because of obstruction by higher terrain, appear depending on the geographical features. However, since satellite SAR observation data are recorded in order of arrival at the receiver, i.e., in order of distance from the emitter, it is quite difficult to extract radar-shadow areas directly from the observation data. Consequently, when we use satellite SAR images to detect natural disasters, we may perform useless computations on steep valley-bottom regions, which have a higher possibility of landslides, because such regions are often included in the radar-shadow (no-data) area. If the radar-shadow area corresponding to the latest observation is known, the precision of disaster detection from satellite SAR images can be expected to improve. Furthermore, since a SAR satellite flies on a fixed orbit, radar-shadow information is suitable for building a permanent database. In this paper, we propose a method for efficiently constructing, as a pre-process, a database of the radar shadow cast by ALOS-2/PALSAR-2 (a satellite operated by JAXA) using high-resolution (5-meter mesh) DEM data.
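The shadow-casting idea over a DEM can be sketched in one dimension: walk along the radar look direction, track the height of the grazing ray, and mark cells that fall below it. This is a simplified flat-earth sketch, not the paper's actual geometry:

```python
import math

def shadow_mask(elevations, spacing, incidence_deg):
    """Mark shadowed cells along a 1D DEM profile in the radar look
    direction. A cell is shadowed when the obliquely incident ray
    grazing the previous terrain passes above it.
    incidence_deg: incidence angle measured from vertical."""
    # The grazing ray descends by this height per ground metre.
    slope = 1.0 / math.tan(math.radians(incidence_deg))
    shadowed = [False] * len(elevations)
    horizon = elevations[0]
    for i in range(1, len(elevations)):
        horizon -= slope * spacing       # ray height above this cell
        if elevations[i] < horizon:
            shadowed[i] = True           # terrain below the ray: shadow
        else:
            horizon = elevations[i]      # terrain re-emerges into view
    return shadowed
```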
In autonomous driving systems, cameras and LiDAR are widely used to recognize the surrounding environment. To fuse images and 3D point clouds, calibration between the camera and LiDAR is necessary. Existing methods utilize checkerboards for calibration, detecting the corresponding planes in the images and 3D point clouds of the checkerboards. The checkerboard planes in the images can be detected using the corners of the grid on the checkerboard. However, the existing approach has the disadvantage of requiring planes to be extracted manually from the 3D LiDAR point cloud. We therefore propose a method for automatically extracting planes from the LiDAR point cloud using Iterative Closest Point. By defining a new objective function for the optimization, we also show in the experimental results that the calibration performance between the camera and LiDAR is improved.
A recent image sensor in a camera contains millions of pixels. In a Bayer-patterned sensor, each pixel records only the intensity of a single color channel: red, green, or blue. When the camera position moves slightly, however, light of a different wavelength can be recorded at a given pixel. In other words, we can complement the missing color signals in the Bayer pattern with the rays captured from the moved camera. Consequently, a full-color image can be generated without demosaicing, although the capturing method may cause false colors. In our proposed method, we take more than 50 images to complement the pixels, because the camera position is moved artificially rather than mechanically. In addition, we use dense optical flow to track the movement of all pixels between the reference image and the other images; all images are converted from Bayer images to grayscale images to calculate the optical flow. The pixels are then linked according to the optical flow, and the missing color values in the reference Bayer image are complemented from the corresponding pixels in the other Bayer images. In our experiments, we generated sharper images than the demosaicing method.
Halide is a domain-specific language for image processing. Halide separates a program into an algorithm part and a scheduling part: the algorithm part describes what the image processing computes, and the scheduling part describes how it is computed. The scheduling part has the restriction that it must not change the calculation result, which prevents efficient code generation for some kinds of image processing. In this paper, we propose high-performance recursive filters with Halide and OpenMP. The recursive filter is one of the algorithms that is difficult to optimize with Halide. In our implementation, we divide an input image into multiple tiles, each tile is processed with Halide code, and the per-tile processing is parallelized by OpenMP. The boundary conditions are computed approximately to forcefully cut the influence of neighboring tiles, so the resulting image is slightly degraded; however, the closed tiles improve cache efficiency. In the experiment, the processing time of box image filtering with and without tiling was compared. Box filtering is the simplest recursive filter: it uses an integral-image technique to run in constant time per pixel, and its calculation includes recursive filters. In the proposed tiled code, the tile size is 128×128. The experimental results showed that the proposed method had better computational time than the code without tiling.
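The tiled box filter via an integral (prefix-sum) image can be sketched in one dimension as below. Clamping windows at tile borders is the boundary approximation the abstract describes; in 2D the prefix sum becomes an integral image:

```python
def box_filter_tiled(signal, radius, tile=128):
    """1D box filter computed per tile with a prefix sum, clamping
    windows at tile borders (the approximate boundary handling)."""
    out = []
    for start in range(0, len(signal), tile):
        t = signal[start:start + tile]
        # Prefix sum: the 1D analogue of an integral image.
        prefix = [0]
        for x in t:
            prefix.append(prefix[-1] + x)
        for i in range(len(t)):
            lo = max(0, i - radius)
            hi = min(len(t), i + radius + 1)
            out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out
```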
Source camera identification is a fundamental area in forensic science, which deals with attributing a photo to the camera device that has captured it. It provides useful information for further forensic analysis, and also in the verification of evidential images involving child pornography cases. Source camera identification is a difficult task, especially in cases involving small-sized query images. Recently, many deep learning-based methods have been developed for camera identification, by learning the camera processing pipeline directly from the images of the camera under consideration. However, most of the proposed methods have considerably good identification accuracy for identifying the camera models, but less accurate results on individual or instance-based source camera identification. In this paper, we propose to train an accurate deep residual convolutional neural network (ResNet), with the use of curriculum learning (CL) and preprocessed noise residues of camera images, so as to suppress contamination of camera fingerprints and extract highly discriminative features for camera identification. The proposed ResNet consists of five convolutional layers and two fully connected layers with residual connections. For the curriculum learning in this paper, we propose a manual and an automatic curriculum learning algorithm. Furthermore, after training the proposed ResNet with CL, the flattened output of the last convolutional layer is extracted to form the deep features, which are then used to learn one-vs-rest linear support vector machines for predicting the camera classes. Experimental results on 10 cameras from the Dresden database show the efficiency and accuracy of the proposed methods, when compared with some existing state-of-the-art methods.
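The noise-residue preprocessing step can be sketched as image minus a denoised copy. The 3×3 mean filter here is only a stand-in for the stronger denoisers used in camera-fingerprint work:

```python
def noise_residue(image):
    """Camera-fingerprint style residue: image minus a denoised copy.
    A 3x3 mean filter (with edge clamping) stands in for the denoiser
    actually used in practice."""
    h, w = len(image), len(image[0])
    residue = []
    for y in range(h):
        row = []
        for x in range(w):
            neigh = [image[j][i]
                     for j in range(max(0, y - 1), min(h, y + 2))
                     for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(image[y][x] - sum(neigh) / len(neigh))
        residue.append(row)
    return residue
```

Residues like this, rather than raw pixels, are what the network would be trained on to suppress scene content and expose the sensor fingerprint.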
To realize a movable display that projects onto high-rise walls or arbitrary objects in physical space as digital signage, a system called a drone-projector, consisting of a beam projector mounted on a drone, has been investigated. Vibration during hovering, caused by the propeller motors of the drone, introduces distortion into the projected image. In this paper, we extend an existing sensor-based stabilization method by compensating for the varying scale of the projected image caused by the varying distance between the drone-projector and the projection surface. Our experimental results show that the distortion of the projected image is substantially attenuated by the proposed stabilization method.
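The scale compensation described above can be sketched as follows: under a simple pinhole-projection assumption, the size of the projected image grows linearly with the throw distance, so pre-scaling the source image by the inverse ratio keeps the on-surface size constant. The function name and reference-distance parameter are illustrative assumptions, not the paper's notation.

```python
def compensation_scale(distance, ref_distance):
    """Return the factor by which to pre-scale the projected content.

    Projected size grows linearly with throw distance (pinhole
    assumption), so scaling by ref_distance / distance keeps the
    image the same size on the surface as at the reference distance.
    """
    if distance <= 0 or ref_distance <= 0:
        raise ValueError("distances must be positive")
    return ref_distance / distance
```

For example, if the drone drifts to twice the reference distance, the content must be rendered at half size to appear unchanged on the wall.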
With the development of computer graphics and three-dimensional (3D) modeling technology, 3D model retrieval has been widely used in applications such as industrial design, virtual reality, and medical diagnosis. Massive data brings new opportunities and challenges to the development of 3D model retrieval technology. However, with the emergence of complex models, traditional retrieval algorithms are no longer fully applicable. One important reason is that traditional content-based retrieval methods do not take the spatial information of 3D models into account during feature extraction. Therefore, how to use the spatial information of a 3D model to obtain a more comprehensive feature has become a significant issue. In our proposed algorithm, we first normalize and voxelize the model, and then extract features from different views of the voxelized model. Secondly, deep features are extracted using our proposed feature learning network. Then, a new feature weighting algorithm is applied to the 3D view-based features, which emphasizes the more important views of a 3D model and thereby improves retrieval performance. Experimental results on a standard 3D model dataset, Princeton ModelNet10, show that our model achieves promising performance.
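The view-weighting step above can be sketched as a softmax-weighted aggregation of per-view feature vectors, so that views with higher importance scores contribute more to the final model descriptor. This is a common view-attention scheme and only a sketch; the paper's exact weighting algorithm is not given in the abstract.

```python
import numpy as np

def weighted_view_descriptor(view_feats, scores):
    """Combine per-view feature vectors into one model descriptor.

    view_feats: (n_views, dim) array of features, one row per view.
    scores:     (n_views,) importance scores; softmax-normalized
                so that higher-scoring views dominate the sum.
    """
    feats = np.asarray(view_feats, dtype=float)
    scores = np.asarray(scores, dtype=float)
    w = np.exp(scores - scores.max())   # stable softmax
    w /= w.sum()
    return (feats * w[:, None]).sum(axis=0)
```

With equal scores this reduces to a plain average over views; skewed scores shift the descriptor toward the dominant views.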
Inactive students are a problem often experienced by campuses in Indonesia. Several factors cause students to become inactive, and these can be grouped into two categories, official and unofficial inactivity; such inactivity can in turn lead to further problems. Therefore, colleges require processing of student-activity data to read and visualize trends that can serve as supporting data for policy making. Using a data warehouse application, the ETL process is run as a backend process that records and processes data on inactive students. Visualizing inactive-student trends helps the campus present the data in an easily understood form, enabling it to study the trends and determine strategic policies that reduce the number of students who become inactive in their lectures.
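The extract-transform-load flow described above can be illustrated with a minimal sketch: extract raw enrollment rows, transform them by flagging non-active statuses, and load a per-term count suitable for trend visualization. The record fields and status values here are illustrative assumptions; the paper's warehouse schema and tooling are not specified in the abstract.

```python
from collections import Counter

def etl_inactive_trend(records):
    """Minimal ETL sketch for an inactive-student trend table.

    records: iterable of dicts with assumed keys "term" and "status".
    Returns a mapping term -> number of inactive students, ready to
    be plotted as a trend for policy makers.
    """
    inactive = Counter()
    for rec in records:                    # extract
        if rec.get("status") != "active":  # transform: flag inactive
            inactive[rec["term"]] += 1     # load: per-term aggregate
    return dict(inactive)
```

In a real warehouse this aggregation would land in a fact table queried by the visualization frontend; the sketch only shows the shape of the backend computation.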