This PDF file contains the front matter associated with SPIE Proceedings Volume 12527, including the Title Page, Copyright information, Table of Contents, and Conference Committee information.
In an era of ever-increasing demand for computing power, deep learning (DL) algorithms have become critical to success in various domains, such as making accessible and processing the vast quantities of data present in the physical, digital, and biological realms. Medical image segmentation is one such application of DL in the healthcare sector. The segmentation of medical images, such as retinal images, enables an efficient analytical process for diagnostics and medical procedures. To segment regions of interest in medical images, U-Net has primarily been used as the baseline DL architecture; it consists of contracting and expanding paths for capturing semantic features and precise localization. Although several variants of U-Net have shown promise, limitations such as hardware memory requirements and inaccurate localization of nonstandard shapes still need to be addressed effectively. In this work, we propose U-PEN++, which reconfigures the previously developed U-PEN (U-Net with Progressively Expanded Neuron) architecture by introducing a new module, the Progressively Expanded Neuron with Attention (PEN-A), consisting of the Maclaurin series of a nonlinear function and a multi-head attention mechanism. The proposed PEN-A module enriches the feature representation by capturing more relevant contextual information than the U-PEN model. Moreover, the proposed model removes excessive hidden layers, resulting in fewer trainable parameters than U-PEN. Experimental analysis on the DRIVE and CHASE datasets demonstrates the more effective segmentation and greater parameter efficiency of the proposed U-PEN++ architecture for retinal image segmentation tasks compared to the U-Net, U-PEN, and Residual U-Net architectures.
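To make the PEN-A idea concrete, the following PyTorch sketch expands a feature map with the first few Maclaurin-series terms of tanh and then applies multi-head self-attention over spatial positions. The layer sizes, number of series terms, and 1x1 fusion step are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class PENA(nn.Module):
    """Sketch of a Progressively Expanded Neuron with Attention (PEN-A) block.

    Expands input features with the first `num_terms` Maclaurin-series terms
    of tanh, fuses them with a 1x1 convolution, then applies multi-head
    self-attention across spatial positions. All sizes are illustrative.
    """
    def __init__(self, channels, num_terms=3, num_heads=4):
        super().__init__()
        assert num_terms <= 3, "only the first three tanh terms are coded here"
        self.num_terms = num_terms
        self.fuse = nn.Conv2d(channels * num_terms, channels, kernel_size=1)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x):
        # Maclaurin series of tanh: x - x^3/3 + 2x^5/15 - ...
        coeffs = [1.0, -1.0 / 3.0, 2.0 / 15.0]
        terms = [coeffs[k] * x.pow(2 * k + 1) for k in range(self.num_terms)]
        f = self.fuse(torch.cat(terms, dim=1))
        n, c, h, w = f.shape
        seq = f.flatten(2).transpose(1, 2)      # (N, H*W, C) token sequence
        out, _ = self.attn(seq, seq, seq)       # self-attention over pixels
        return out.transpose(1, 2).reshape(n, c, h, w)

# Example: enrich a 64-channel feature map from a U-Net encoder stage.
feats = torch.randn(1, 64, 32, 32)
print(PENA(64)(feats).shape)   # torch.Size([1, 64, 32, 32])
```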
Quantification of legume biological nitrogen fixation (BNF) is normally done via analytical methods that require sampling, drying, grinding, and laboratory processing. These methods are time-consuming, expensive, and not accessible to growers. The correlation between BNF quantity and nodule number and mass can be used to develop tools that allow rapid assessments of BNF. In this work, we developed a graphical user interface (GUI)-based deep learning and image processing system for legume nodule segmentation and classification that determines the characteristics associated with legume nodules from digital images. During image acquisition, the legume root samples were imaged using a smartphone camera and a lab-made imaging setup. A total of 1468 digital images were collected from 367 root systems. After the first run of imaging, nodules were separated from the roots, and another image was obtained from the nodules of each sample. For comparison and validation, nodules were manually counted, dried, and weighed. A categorized image data library was developed and utilized for deep learning and image processing. Digital image processing filters and an image segmentation method were applied to the digital images of the root systems to determine the number of nodules and extract their characteristics, and deep learning models were used to classify the images into different legume classes. Furthermore, a GUI was developed to simplify the use of the deep learning and digital image processing algorithms. The preliminary results of this study demonstrate that our system has great potential to accurately quantify, characterize, and count nodules, which could be extremely valuable to growers.
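A minimal classical baseline for the counting step might look like the following OpenCV sketch (not the authors' pipeline; the Otsu thresholding strategy and the minimum-area filter are assumptions for illustration): segment candidate nodules, then count connected components above a size cutoff.

```python
import cv2
import numpy as np

def count_nodules(image_path, min_area=30):
    """Illustrative baseline: threshold a grayscale root/nodule image and
    count connected components larger than `min_area` pixels."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    blur = cv2.GaussianBlur(gray, (5, 5), 0)            # suppress sensor noise
    _, mask = cv2.threshold(blur, 0, 255,
                            cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    areas = stats[1:, cv2.CC_STAT_AREA]                 # label 0 is background
    return int(np.sum(areas >= min_area))
```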
Over the past decade, several approaches have been proposed to learn disentangled representations for video prediction. However, reported experiments are mostly based on standard benchmark datasets such as Moving MNIST and Bouncing Balls. In this work, we address the problem of learning disentangled representations for video prediction in an industrial environment. To this end, we use a decompositional disentangled variational auto-encoder, a deep generative model that aims to decompose and recognize overlapping boxes on a pallet. Specifically, this approach disentangles each frame into a time-invariant component (box appearance) and a temporally varying component (box location). We evaluate this approach on a new dataset containing 40,000 video sequences. The experimental results demonstrate the ability to learn both the decomposition of the bounding boxes and their reconstruction without explicit supervision.
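To make the disentanglement concrete, here is a hypothetical PyTorch encoder (sizes and architecture are placeholders, not the paper's model) with two separate variational heads: one latent for appearance, one for location.

```python
import torch
import torch.nn as nn

class DisentangledEncoder(nn.Module):
    """Sketch of per-frame disentanglement: a shared backbone feeds two
    variational heads, one for time-invariant appearance and one for the
    time-varying location. All dimensions are illustrative."""
    def __init__(self, feat_dim=256, z_app=32, z_loc=8):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        self.app_head = nn.Linear(feat_dim, 2 * z_app)  # mean and log-variance
        self.loc_head = nn.Linear(feat_dim, 2 * z_loc)

    def forward(self, frame):
        h = self.backbone(frame)
        return self.app_head(h).chunk(2, -1), self.loc_head(h).chunk(2, -1)

# Example: encode one 64x64 frame into appearance and location posteriors.
(mu_app, logvar_app), (mu_loc, logvar_loc) = \
    DisentangledEncoder()(torch.randn(1, 3, 64, 64))
```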
Underwater plants can alter the biodiversity and function of aquatic ecosystems, leading to significant positive or negative environmental impacts. In the Great Lakes region, many efforts have been made to identify and map underwater plants early for cost-effective monitoring, management, and containment of potentially invasive species. However, traditional patrolling and visual inspection methods are time-consuming and labor-intensive. Recent advances in sensors, computing platforms, and machine learning algorithms have enabled unprecedented achievements in applications related to marine biodiversity and aquatic ecosystems. In this work, we use an ROV to collect a new underwater plant detection dataset comprising 414 underwater images of three main categories of aquatic plants, namely Leafy, Bushy, and Tapey, at different lakes in the Upper Peninsula, Michigan, USA. We then present a comparative analysis of the performance of common object detectors, including YOLOv8, Faster R-CNN, and RetinaNet, on our dataset. The results demonstrate the potential of such pre-trained detectors for detecting underwater plants in the noisy images acquired by ROVs and for building fully automated plant detection, mapping, and management systems.
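For reference, fine-tuning one of the compared detectors on a custom dataset like this one takes only a few lines with the ultralytics package; the dataset YAML name and the hyperparameters below are placeholders, not the authors' settings.

```python
from ultralytics import YOLO

# Fine-tune a pretrained YOLOv8 model on a custom underwater-plant dataset.
# "plants.yaml" is a hypothetical dataset file listing the image paths and
# the three class names (Leafy, Bushy, Tapey); epochs/imgsz are illustrative.
model = YOLO("yolov8n.pt")
model.train(data="plants.yaml", epochs=100, imgsz=640)
metrics = model.val()   # mAP and per-class metrics on the validation split
```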
Deep learning is particularly effective for biomedical applications involving semantic segmentation. One of the most popular deep learning architectures for semantic segmentation is U-Net, which is specifically designed to cascade features for pixel classification. Several variants of U-Net, such as Residual U-Net (ResU-Net), Recurrent U-Net (RU-Net), and Recurrent Residual U-Net (R2U-Net), have been proposed for improved performance. A recurrent connection in a layer of a neural network creates a cycle that feeds the layer's output back to itself as input, so each layer's output responses can be thought of as additional input variables. The new model is based on the Residues in Succession U-Net, in which residues from successive layers extract reinforced information from the previous layers; combined with the recurrent feedback loop, this offers several advantages. The proposed model precisely extracts and accumulates features from each layer, reinforcing learning in subsequent layers, and the combination of recurrence and residues in successive layers ensures better feature representation for segmentation tasks. We use a benchmark expert-annotated dataset, the Structured Analysis of the Retina (STARE), to measure the ability of the Residues in Succession Recurrent U-Net (RSR U-Net) to segment blood vessels in retinal images. The testing and evaluation results show that the new model improves performance compared to U-Net, R2U-Net, and the Residues in Succession U-Net in the same experimental setup.
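The recurrent-feedback idea can be sketched in PyTorch as follows: a generic recurrent convolution block in the R2U-Net style, where the same convolution is reapplied to its own output a fixed number of steps. Channel count and step count are illustrative, not the paper's exact design.

```python
import torch
import torch.nn as nn

class RecurrentConvBlock(nn.Module):
    """Sketch of a recurrent convolution block: the layer's output is fed
    back as input for `steps` iterations, so later passes accumulate
    features extracted by earlier ones (shared weights throughout)."""
    def __init__(self, channels, steps=2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.steps = steps

    def forward(self, x):
        h = self.conv(x)
        for _ in range(self.steps):
            h = self.conv(x + h)   # recurrent feedback with the same weights
        return h

# Example: refine a 32-channel feature map twice with shared weights.
print(RecurrentConvBlock(32)(torch.randn(1, 32, 64, 64)).shape)
```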
Iris recognition is one of the well-known areas of biometric research. However, in real-world scenarios, subjects may not always present fully open eyes, which can degrade the performance of existing systems. Therefore, detecting blinking eyes in iris images is crucial to ensure reliable biometric data. In this paper, we propose a deep learning-based method using a convolutional neural network to classify blinking eyes in off-angle iris images into four categories: fully-blinked, half-blinked, half-opened, and fully-opened. The dataset used in our experiments includes 6500 images of 113 subjects and contains a mixture of frontal and off-angle views of the eyes from −50° to 50° in gaze angle. We train and test our approach on both frontal and off-angle images and achieve high classification performance for both. Compared to training the network with only frontal images, our approach performs significantly better when tested on off-angle images. These findings suggest that training the model with a more diverse set of off-angle images can improve its performance for off-angle blink detection, which is crucial for real-world applications where iris images are often captured at different angles. Overall, the deep learning-based blink detection method can be used as a standalone algorithm or integrated into existing standoff biometrics frameworks to improve their accuracy and reliability, particularly in scenarios where subjects may blink.
Identifying individuals from their images has become increasingly important for maintaining security by granting access to authorized users and blocking unknown ones. A new access control approach based on facial recognition, using the pix2pix generative classifier with a new decision-making method, was tested on the Olivetti Research Laboratory (ORL) database. This approach requires only a simple comparison between the generated data and a reference database using a predefined threshold. For testing, varying numbers of individuals were excluded from the training database, and the network was generally able to reject unknown individuals while recognizing and identifying those with access. Out of 200 unknowns, an average of 92.12% were rejected; the remaining 7.88% were incorrectly accepted as known.
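The threshold-based decision rule described above might be sketched as follows; the function names and the mean-absolute-difference distance are assumptions for illustration, not the paper's exact metric.

```python
import numpy as np

def classify_or_reject(generated, references, threshold):
    """Sketch of a threshold decision rule: compare the generator's output
    against each reference image and reject the probe as unknown when no
    reference is close enough.

    generated:  (H, W) float array produced by the generator
    references: dict mapping identity -> (H, W) reference image
    """
    dists = {pid: float(np.mean(np.abs(generated - ref)))
             for pid, ref in references.items()}
    best = min(dists, key=dists.get)          # closest known identity
    return best if dists[best] <= threshold else "unknown"
```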
Warehouses are storage areas with a high flow of goods. As part of the robotization of these areas, one of the major problems attracting researchers' attention is the automation of dispatching and restocking tasks when new products arrive. With little information available, automatically detecting these new products normally requires an update of the intelligent system, i.e., a new training step and therefore a stop in the warehouse production line. In the literature, the branch of computer vision concerned with identifying and locating objects in an image from little information is called Low Shot Object Detection (LSOD). With this approach, neural networks can automatically find new products without any additional training step. To do so, neural network architectures have evolved to merge extracted features. This article presents a novel method that merges convolutional layers of a Siamese ResNet network to accommodate new products.
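One way to picture the "no retraining" property is embedding-based matching with a frozen Siamese backbone; this sketch is an assumption about the general technique, not the paper's exact network or fusion scheme.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# A frozen ResNet backbone embeds product crops; new products are then
# recognized by nearest-neighbor similarity instead of retraining.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()   # pooled features become the embedding
backbone.eval()

@torch.no_grad()
def embed(batch):                   # batch: (N, 3, 224, 224), normalized
    return F.normalize(backbone(batch), dim=1)

def match(query_emb, catalog_embs):
    # Cosine similarity of each query against the known-product catalog;
    # adding a new product only requires adding one row to the catalog.
    return (catalog_embs @ query_emb.T).argmax(dim=0)
```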
This paper is concerned with the correlation between social media usage in the African American community and the degree of COVID-19 vaccine hesitancy and other general health attributes. Various past studies have found associations between social media use and beliefs in conspiracy theories and misinformation; however, most of these studies focus on large datasets that lack accuracy or are too general, and they lack sufficient quantitative methodologies such as machine learning techniques. In this work, we conducted a pilot study with a small group from the African American community regarding the COVID-19 vaccine and related social beliefs. This pilot study is important for improving the quality and efficiency of the future main study, and it helps in understanding the pattern of a particular community's views.
This paper presents a comparison between grayscale and color-based deep learning algorithms for long-distance optical UAV detection using robotic telescope systems. Three deep learning object detection algorithms are trained on a custom dataset of RGB images, and their performance is evaluated against the same algorithms trained on the same dataset converted to grayscale. Both training from scratch and fine-tuning are evaluated. The results for all algorithms show that fine-tuning with RGB images maximizes detection performance, scoring about 5% better in mean average precision (mAP@0.5) than fine-tuning on grayscale images.
Generative adversarial networks (GANs) have been widely developed to generate new data and have been used in several different applications. Some networks classify data at the discriminator level, either by modifying the loss function or by adding a classifier. In this paper, the generative classifier pix2pix, a classifier based on the pix2pix generative adversarial network, is introduced. The classification is done without keeping the discriminator or adding additional networks; only the generator is used to classify the data. This classification requires the preparation of a reference dataset. The generative classifier pix2pix was applied to a character recognition task using 50,000 images for training, achieving 99.36%, and to the ORL face recognition task using 360 images for training, achieving an average of 97.99%.
Measuring classroom engagement is an important but challenging task in education. In this paper, we present an automated method for assessing the degree of classroom engagement using computer vision techniques that integrate data from multiple sensors placed at the front and back of the students' seating arrangement. Student engagement is evaluated based on attributes such as facial expression, gesture, head position, and distractions visible from the frontal view of the students. Moreover, using the videos from the back of the classroom, the professor's teaching content, as well as its alignment with student engagement, is assessed. We leverage deep learning methods to extract emotion and behavior features that aid in the evaluation of engagement. These AI methods quantify the classroom engagement process.
Heart disease is the leading cause of death in the world, and Australia has one of the highest incidences: approximately 125 lives are lost every day, one life every 12 minutes. Heart disease describes a range of conditions that affect the heart or blood vessels and can affect anyone at any age; a major concern is that it can lead to heart attack or stroke. Symptoms may include chest pain, shortness of breath, dizziness, fatigue, or nausea, and serious risk factors such as diabetes and high cholesterol may lead to heart attacks. A healthy lifestyle, quitting smoking, and exercising are small steps toward avoiding heart disease, and heart disease is easier to treat when detected early. In this paper, an effective heart disease classification framework is proposed. Support Vector Machine (SVM), Multilayer Perceptron (MLP), Random Forest (RF), and Radial Basis Function (RBF) techniques are used for classification. Moreover, feature selection is performed with the InfoGainAttributeEval ranker algorithm to reduce the feature set and improve accuracy. The classification techniques and feature selection algorithm are applied to the LIAC Heart Statlog dataset, which is derived from the heart disease dataset. The effectiveness of the results is described by accuracy, precision, recall, and the ROC curve.
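A scikit-learn sketch of the described pipeline follows: rank features by information gain (mutual information), keep the top k, then cross-validate a classifier. The dataset loader and k are stand-ins, since the paper's Statlog data is not bundled with scikit-learn.

```python
from sklearn.datasets import load_breast_cancer  # stand-in for the heart data
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Information-gain-style feature selection followed by an SVM classifier,
# evaluated with 5-fold cross-validated accuracy. k=10 is illustrative.
X, y = load_breast_cancer(return_X_y=True)
pipe = make_pipeline(
    StandardScaler(),
    SelectKBest(mutual_info_classif, k=10),
    SVC(kernel="rbf"),
)
print(cross_val_score(pipe, X, y, cv=5).mean())
```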
Accurate and efficient corrosion detection is a difficult but important issue with immediate relevance to the maintenance of Naval ships. The current process requires an inspector to physically access the space and perform a manual visual inspection. Considering the schedules of both the inspector and the ship, coordinating the inspection of hundreds of tanks and voids is not always a straightforward process. There is a significant amount of research into automatic detection of corrosion via computer vision algorithms, but performing pixel-level segmentation introduces added difficulty, for two key reasons: the lack of annotated data and the inherent difficulty of the problem. In this work, we utilized a combination of annotated data from a different domain and a small hand-labeled dataset of panoramic images from our target domain: the inside of empty ship tanks and voids. We trained two High-Resolution Network (HRNet) models for our corrosion detector: the first with a dataset outside our target domain, the second with our hand-annotated panoramic tank images. By ensembling the two models, the F1-score increased by about 120% and the IoU score by about 176% with respect to the single baseline corrosion detector. The data collection process via LiDAR scanning allows the inspection process to be performed remotely. Additionally, the setup of the detector leads to a natural expansion of the corrosion dataset, as panoramas from LiDAR scans are continually fed through the detector and the detections are validated. This allows the corrosion models to be retrained later for potential improvements in accuracy and robustness.
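The ensembling step might look like the following PyTorch sketch; the exact fusion rule (probability averaging) and the binary-mask threshold are assumptions, not necessarily the authors' scheme.

```python
import torch

@torch.no_grad()
def ensemble_corrosion_mask(model_a, model_b, image, threshold=0.5):
    """Average the per-pixel corrosion probabilities from two segmentation
    models (e.g., the out-of-domain and in-domain HRNets), then threshold
    the mean to get a binary corrosion mask."""
    probs = (torch.sigmoid(model_a(image)) + torch.sigmoid(model_b(image))) / 2
    return probs > threshold
```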
When using a molding machine to produce plastic samples, unwanted residuals can occur. In this study, two image processing methods for detecting residuals on plastic samples are evaluated. The aim of both methods is to reliably detect the position of the residuals on the plastic sample and to transform the image-based information into laser machine coordinates. Using the transferred coordinates, the laser machine can remove the detected residuals by laser cutting accurately, without damaging the sample. The measurement setup for both methods is identical; the difference lies in the processing of the captured raw image. The first method compares the raw image with an image masking template to determine the residual. The second method processes the raw image directly, comparing the light intensity transmitted through the sample to distinguish the residual from the main sample. Once the residuals are detected, binary shifting is performed to locate the cut lines for the residuals. The lines obtained from the image in pixel scale must then be accurately converted to millimeter scale so that the laser machine can use them. Comparing the two methods, the template-based method gives more accurate and detailed results, leaving no small residuals on the sample, whereas the transmitted-light-intensity method left undetected residuals and did not always produce the desired straight cut line. However, the template-matching method has drawbacks, such as requiring each measurement to be taken in the same position; thus, a more detailed design process is needed to stabilize the measurement. In this study, both hardware and software were designed, including a GUI for setting several important measurement parameters. The result is a system that can cut the residuals without damaging the sample.
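The pixel-to-millimeter conversion can be sketched as a scale-and-offset transform; this is a simplified model (any camera rotation or skew is omitted), and the function and parameter names are placeholders.

```python
import numpy as np

def pixels_to_machine_mm(points_px, scale_mm_per_px, origin_px):
    """Convert cut-line endpoints from image pixels to laser machine
    millimeters, assuming a calibrated scale factor and a known pixel
    position of the machine origin.

    points_px:        (N, 2) array of (x, y) points in pixels
    scale_mm_per_px:  millimeters per pixel from calibration
    origin_px:        (2,) pixel coordinates of the machine origin
    """
    return (np.asarray(points_px, dtype=float) - origin_px) * scale_mm_per_px

# Example: two endpoints of a cut line at 0.05 mm/px, origin at (120, 80).
print(pixels_to_machine_mm([(200, 80), (200, 400)], 0.05, (120, 80)))
```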
Cancer has a tremendous impact on human existence due to its extremely high global death rate; malignant melanoma of the skin alone accounts for 20 deaths per day in the United States. Malignant melanoma (MEL), basal cell carcinoma (BCC), actinic keratoses and intraepithelial carcinoma (AKIEC), melanocytic nevi (NV), benign keratosis-like lesions (BKL), dermatofibroma (DF), and vascular lesions (VASC) are the seven main classes of skin lesions. It can be challenging to recognize and classify different cancer types from biomedical imaging, as there are many sub-types that differ significantly from one another, and several researchers and doctors are currently trying to pinpoint the most effective means of spotting skin cancer in its earliest stages. Using multiple residual and sequential convolutional neural networks, we present a learning strategy for cancer classification in this research; in particular, an effort is made to more precisely categorize the MEL, BCC, and BKL classes. F1-score, precision, recall, and accuracy are used to verify the validity of the proposed model, and the results show its reliability and validity.
Wildfires have greatly increased in frequency and intensity over the past decade in the American West, so up-to-date data on the fuels that make up a fire regime are crucial. Semantic segmentation algorithms such as U-Net have proven very effective at similar tasks in biomedical and geospatial imagery due to the powerful feature extraction of these deep networks. In this paper, we compare the performance of the U-Net and Recurrent Residual U-Net (R2U-Net) architectures on multi-class pixel-wise segmentation. A portion of Northern California was the target area for the study, and segmentation masks were based on the 40 Scott & Burgan fuel models, utilizing Sentinel-2 imagery and LANDFIRE's LF2020 survey data. After training, the R2U-Net achieved an accuracy of 59.64% while the U-Net achieved an accuracy of 61.28%.
Analysis of human gait using 3-dimensional co-occurrence skeleton joints extracted from Lidar sensor data has been shown to be a viable method for predicting person identity. Co-occurrence-based networks rely on the spatial changes between frames of each joint in the skeleton data sequence. Normally, this data is obtained using a Lidar skeleton extraction method that estimates these co-occurrence features from raw Lidar frames, which can be prone to incorrect joint estimates when part of the body is occluded. Such datasets can also be time-consuming and expensive to collect and typically offer only a small number of samples for training and testing network models. The small number of samples and the occlusion can make it challenging to train deep neural networks to perform real-time tracking of the person in the scene. We present preliminary results with a deep reinforcement learning actor-critic network for person tracking from 3D skeleton data using a small dataset. The proposed approach achieves an average tracking rate of 68.92±15.90% given limited examples to train the network.
Gun muzzle flash produces characteristic signatures at the 766 nm and 769 nm wavelengths that can be passively picked up from a distance using ultra-sensitive SPAD arrays for immediate localization. Sifting through the massive number of pulses generated by the arrays in real time, however, poses a challenge, especially when deep-learning models are used for classification. We present a novel FPGA-based expandable system consisting of a two-tier detection architecture that decouples the computationally intensive deep-learning model from the data-rate-intensive SPAD arrays. Our slope-based first-tier algorithm provides an FPGA-efficient first-look filter, and our ResNet-based deep-learning model provides high sensitivity across different lighting conditions while maintaining high specificity in the face of potential false positives in an urban environment. The deep-learning model was trained with synthetic datasets generated from small samples of gun muzzle flashes from the various weapon and ammunition types available to us, plus sources of likely false positives in an urban environment. In testing, our system achieves a detection rate of 99.8%, with 99.9% specificity and 99.6% sensitivity, for shots fired from distances between 50 and 450 m.
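A slope-based first-look filter of the kind described above might be prototyped as follows before committing it to FPGA logic; the thresholds, smoothing window, and signal layout are assumptions for illustration.

```python
import numpy as np

def slope_trigger(counts, slope_thresh, window=3):
    """Flag time bins where SPAD photon counts rise faster than
    `slope_thresh` counts per bin (after light smoothing), so that only
    candidate muzzle-flash pulses reach the heavier CNN stage.

    counts: 1D array of photon counts per time bin from one SPAD pixel
    """
    slope = np.convolve(np.diff(counts), np.ones(window) / window, mode="same")
    return np.flatnonzero(slope > slope_thresh)

# Example: a synthetic trace with one sharp pulse around bin 50.
trace = np.zeros(100); trace[50:55] = [5, 40, 90, 60, 10]
print(slope_trigger(trace, slope_thresh=10.0))
```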
Significant bodies of research have explored the topics of computer vision and image quality, but research on the intersection of these two disciplines remains limited. Additionally, evidence suggests that image quality as determined by the human visual system may differ from image quality as determined by the performance of computer vision algorithms. Furthermore, most of the research on the relationships between image quality and computer vision performance has focused on single-label image classification and has not considered tasks such as semantic segmentation or object detection. Here, we consider the relationship between three primary image quality factors (resolution, blur, and noise) and the performance of deep-learning-based object detection models. To do so, we examine the impacts of these image quality variables on the mean average precision (mAP) of object detection models, evaluating both models trained only on high-quality images and models fine-tuned on lower-quality images. Additionally, we map our primary image quality variables to the terms used in the General Image Quality Equation (GIQE), namely ground sample distance (GSD), relative edge response (RER), and signal-to-noise ratio (SNR), and assess the suitability of the GIQE functional form for modeling object detector performance in the presence of significant image distortions.
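For reference, the GIQE functional form relates predicted interpretability (NIIRS) to these same terms. The GIQE-4 version, as we recall it (coefficients should be verified against the published standard), is

```latex
\mathrm{NIIRS} = 10.251 \;-\; a\,\log_{10}(\mathrm{GSD})
               \;+\; b\,\log_{10}(\mathrm{RER})
               \;-\; 0.656\,H \;-\; 0.344\,\frac{G}{\mathrm{SNR}}
```

with GSD in inches, H the mean edge overshoot, G the noise gain, and (a, b) = (3.32, 1.559) when RER ≥ 0.9, else (3.16, 2.817). The log-linear dependence on GSD and RER is what the paper tests as a functional form for object detector mAP.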
Automatic ship detection against complex backgrounds, during both day and night, in infrared images is an important task, and the detector should handle ships of various scales, orientations, and shapes. In this paper, we propose the use of neural network technology for this purpose. The algorithm used for this task, the Deep Neural Machine (DNM), contains three parts: backbone, neck, and head. Combining the three, the algorithm extracts features, creates prediction layers from different scales of the backbone, and outputs object predictions at each of those scales. The experimental results show that our algorithm is robust and efficient in detecting ships against complex backgrounds.
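The backbone-neck-head layout can be sketched structurally in PyTorch as below; the module contents are placeholders standing in for the DNM internals, which the abstract does not specify.

```python
import torch.nn as nn

class ThreePartDetector(nn.Module):
    """Structural sketch of a backbone/neck/head detector: the backbone
    yields multi-scale feature maps, the neck fuses them across scales,
    and one head per scale emits object predictions."""
    def __init__(self, backbone, neck, heads):
        super().__init__()
        self.backbone = backbone            # image -> list of feature maps
        self.neck = neck                    # list -> list of fused maps
        self.heads = nn.ModuleList(heads)   # one prediction head per scale

    def forward(self, x):
        feats = self.backbone(x)
        fused = self.neck(feats)
        return [head(f) for head, f in zip(self.heads, fused)]
```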
Cancer, hematological malignancies, and inherited genetic diseases can be diagnosed by detecting chromosome abnormalities, and this detection is crucial for the management and follow-up of these diseases. Biologically, chromosome abnormalities fall into two categories: abnormalities of number and abnormalities of structure. The process of karyotyping involves creating an ordered representation of the 23 pairs of chromosomes; each pair presents a specific band pattern, and in normal cases both chromosomes of a pair are identical. Karyotype images are manually analyzed by qualified cytogeneticists to detect any changes in the chromosomes. Computer vision methods make it possible to automate the detection of chromosome abnormalities, which can assist cytogeneticists in the diagnosis process, yet little research has been done on automating the detection of structural abnormalities with such techniques. In this study, we are interested in detecting a specific abnormality: the deletion of the long arm of chromosome 5, known as the del(5q) deletion. We focused our work on the convolutional neural network (CNN) approach, which has shown its ability to provide reliable solutions to computer vision problems. On a collected database, we trained three CNN models and tested their ability to differentiate between a healthy and a deleted chromosome 5. The highest performance was provided by VGG19, achieving an accuracy of 98.66%, a sensitivity of 89.33%, and a specificity of 100%.
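A typical transfer-learning setup for the VGG19 variant could look like this torchvision sketch; freezing the features and the two-class head are our assumptions about the general recipe, not the paper's exact training details.

```python
import torch.nn as nn
from torchvision import models

# Reuse VGG19's ImageNet-pretrained convolutional features and replace the
# final classifier layer with a binary healthy-vs-del(5q) output.
model = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
for p in model.features.parameters():
    p.requires_grad = False                  # freeze the feature extractor
model.classifier[6] = nn.Linear(4096, 2)     # healthy vs. del(5q)
```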
Corneal endothelium assessment is carried out via specular microscopy imaging. However, automated image analysis often fails due to inadequate image quality or the presence of dark regions in pathologies such as Fuchs' dystrophy. Therefore, a reliable early image classification stage is required before automated evaluation based on cell segmentation. Moreover, conventional classification approaches rely on manually labeled data, which are difficult to obtain. We propose a two-stage semi-supervised classification algorithm, comprising feature detection and prediction of blurring level and guttae severity, that allows us to cluster images by their degree of segmentation complexity. For validation, we developed a web-based annotation application and asked a pair of expert ophthalmologists to grade a portion of the 1169 images. Preliminary results show that this approach provides a reliable and fast method for corneal endothelial cell (CEC) image classification.
The human-object interaction (HOI) detection task refers to localizing humans, localizing objects, and predicting the interactions between each human-object pair. HOI is considered one of the fundamental steps in truly understanding complex visual scenes. For detecting HOI, it is important to utilize relative spatial configurations and object semantics to find salient spatial regions of images that highlight the interactions between human-object pairs. This issue is addressed by a novel self-attention-based guided transformer network, GTNet. GTNet encodes this spatial contextual information into human and object visual features via self-attention, achieving state-of-the-art results on both the V-COCO and HICO-DET datasets. Code is available online.
The massive shift in temperatures in the Arctic region has intensified the ice-albedo feedback, as more solar energy is absorbed by the darker surfaces exposed by melting ice and snow. This continuous regional warming results in further melting of glaciers and loss of sea ice. Arctic melt ponds are important indicators of Arctic climate change, and high-resolution aerial photographs are invaluable for identifying different sea ice features and are a great source for validating, tuning, and improving climate models. Due to the complex shapes and unpredictable boundaries of melt ponds, manually analyzing these remote sensing data is extremely tedious, taxing, and time-consuming, which motivates automating the technique. Deep learning is a powerful tool for semantic segmentation, and one of the most popular deep learning architectures for feature cascading and effective pixel classification is the UNet architecture. We introduce an automatic and robust technique for predicting the bounding boxes of melt ponds using a Multiclass Recurrent Residual UNet (R2UNet) with UNet as a base model. R2UNet mainly consists of two important components in each layer of the architecture, namely a residual connection and a recurrent block. The residual learning approach prevents vanishing gradients in deep networks by introducing shortcut connections, while the recurrent block provides a feedback connection in a loop, allowing the outputs of a layer to be influenced by subsequent inputs to the same layer. The algorithm is evaluated on the Healy-Oden Trans Arctic Expedition (HO-TRAX) dataset containing melt ponds photographed during helicopter flights between 5 August and 30 September 2005. The testing and evaluation results show that R2UNet provides improved and superior performance compared to UNet, Residual UNet (Res-UNet), and Recurrent U-Net (R-UNet).
In recent years, great progress has been made in the study of crowd counting. Although crowd counting networks proposed to solve various problems have achieved satisfactory counting results, differences in crowd density and scale within the same scene still degrade overall counting performance. To deal with this problem, we propose a Multi-Scale Attention Grading Crowd Counting Network (MSAGNet), which attends to different crowd densities in the scene through an attention mechanism and fuses multi-scale information to reduce scale differences. Specifically, the grading attention module focuses on regions of different density through the attention mechanism and adaptively assigns corresponding weights to them. Dense regions receive larger weights, allowing the model to focus on them and making the training of those regions more accurate and effective. In addition, the multi-scale density feature fusion module fuses the feature maps containing density information to generate the final feature maps; these contain attention information at different scales and are mapped to the estimated density maps. This method can focus on regions of different density in the same scene while fusing multi-scale information and attention weights, effectively addressing the difficulty of counting dense regions. Extensive experiments on existing crowd counting datasets (UCF_CC_50, ShanghaiTech, UCF-QNRF) show that our method can effectively improve counting performance.
Makeup transfer aims to extract a specific makeup style from one face and transfer it to another, and can be widely used in portrait beautification and cosmetics marketing. Existing methods can transfer entire facial makeup, but transfer quality suffers when there is a mismatch between the two images. In this paper, we propose a facial makeup transfer network based on the Laplacian pyramid, which better preserves the facial structure of the source image and achieves high-quality transfer results. The model consists of three parts: makeup feature extraction, facial structure feature extraction, and makeup fusion. The makeup extraction part extracts the facial makeup from the reference image. The facial structure extraction part extracts the facial structure from the source image; to avoid the loss of facial detail in this step, we use a Laplacian-pyramid-based method. The makeup fusion part then fuses the facial makeup with the facial structure features. Extensive experiments on the MT dataset show that this method transfers makeup successfully without changing the original facial structure and achieves advanced performance across various makeup styles.
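For background, the standard Laplacian-pyramid decomposition that underpins the detail preservation can be built with OpenCV as follows; the level count is illustrative, and the network's use of the pyramid is of course more involved than this sketch.

```python
import cv2

def laplacian_pyramid(img, levels=4):
    """Decompose an image into band-pass detail layers plus a low-frequency
    residual: each level is the difference between the image and the
    upsampled version of its downsampled copy."""
    pyramid, cur = [], img
    for _ in range(levels):
        down = cv2.pyrDown(cur)
        up = cv2.pyrUp(down, dstsize=(cur.shape[1], cur.shape[0]))
        pyramid.append(cv2.subtract(cur, up))   # band-pass detail layer
        cur = down
    pyramid.append(cur)                          # low-frequency residual
    return pyramid
```

Reconstructing the source image is the reverse walk (upsample the residual and add each detail layer back in), which is why structural detail held in the upper layers survives the transfer.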
Accurate crowd counting in congested scenes remains challenging due to the trade-off between efficiency and generalization. To address this issue, we propose a mobile-friendly solution for network deployment in scenarios demanding high response speed. To bring the potential of global crowd representations to lightweight counting models, this work presents a novel mobile vision transformer architecture for crowd counting (CCMTNet), which aims to improve efficiency and model universality for real-time crowd counting on resource-constrained computing devices. A linear CNN structure interleaved with self-attention blocks endows the model with both local feature extraction and global high-dimensional crowd information processing at low computational cost. In addition, several experimental networks of different scales based on the proposed architecture are comprehensively evaluated to balance accuracy loss against reduced computing cost. Extensive experiments on three mainstream crowd counting datasets demonstrate the effectiveness of the proposed network. In particular, CCMTNet reconciles counting accuracy and efficiency in comparison with traditional lightweight CNN networks.
Video sensors are ubiquitous in the realm of security and defense, and successive image data from those sensors can serve as an integral part of early-warning systems by drawing attention to suspicious anomalies. Using object detection, computer vision, and machine learning to automate some of those detection and classification tasks helps maintain a consistent level of situational awareness in environments with ever-present threats. Specifically, the ability to detect small objects in video feeds would help people and systems protect themselves against faraway or small hazards. This work proposes a way to accentuate features in video stills by subtracting pixels from surrounding frames to extract motion information. The features extracted from a sequence of frames can be used alone, or the motion signal can be concatenated onto the original image to highlight a moving object of interest. Using a two-stage object detector, we explore the impact of frame differencing on Drone vs. Bird videos from both stationary cameras and cameras that pan and zoom. Our experiments demonstrate that this algorithm is capable of detecting objects that move in a scene regardless of the state of the camera.
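A minimal OpenCV sketch of the frame-differencing step follows; combining the two neighboring differences with a bitwise AND is one common variant, and appending the result as a fourth channel mirrors the concatenation described above.

```python
import cv2

def motion_channel(prev_frame, frame, next_frame):
    """Isolate motion by differencing a frame against its neighbors, then
    concatenate the motion map onto the original image as a 4th channel."""
    g = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY)
         for f in (prev_frame, frame, next_frame)]
    # Keep only pixels that changed relative to BOTH neighbors: this
    # suppresses noise that flickers in a single frame pair.
    diff = cv2.bitwise_and(cv2.absdiff(g[1], g[0]), cv2.absdiff(g[2], g[1]))
    return cv2.merge([*cv2.split(frame), diff])   # B, G, R + motion channel
```

The resulting 4-channel image can be fed to a detector whose first convolution accepts four input channels, or the motion map can be used on its own.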