This PDF file contains the front matter associated with SPIE Proceedings Volume 12571, including the Title Page, Copyright information, Table of Contents, and Conference Committee information.
Video microscopy has a long history of providing insights and breakthroughs for a broad range of disciplines, from physics to biology. Image analysis to extract quantitative information from video microscopy data has traditionally relied on algorithmic approaches, which are often difficult to implement, time-consuming, and computationally expensive. Recently, alternative data-driven approaches using deep learning have greatly improved quantitative digital microscopy, potentially offering automated, accurate, and fast image analysis. However, the combination of deep learning and video microscopy remains underutilized, primarily due to the steep learning curve involved in developing custom deep-learning solutions.
To overcome this issue, we have introduced DeepTrack, a software package (currently at version 2.1) to design, train, and validate deep-learning solutions for digital microscopy. We use it to exemplify how deep learning can be employed for a broad range of applications, from particle localization, tracking, and characterization to cell counting and classification. Thanks to its user-friendly graphical interface, DeepTrack 2.1 can be easily customized for user-specific applications, and, thanks to its open-source, object-oriented codebase, it can be easily expanded to add features and functionalities, potentially introducing deep-learning-enhanced video microscopy to a far wider audience.
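As a rough illustration of the kind of task DeepTrack 2.1 automates (this sketch does not use the DeepTrack API itself and all parameters are illustrative), a few lines of Keras code can train a small CNN to regress the position of a simulated particle from a noisy image:

```python
# Minimal sketch (not the DeepTrack API): train a small CNN to regress the
# (x, y) position of a single simulated Gaussian "particle" from a noisy image.
import numpy as np
import tensorflow as tf

def make_sample(size=64, sigma=2.0):
    """Render one Gaussian particle at a random position plus noise."""
    x0, y0 = np.random.uniform(10, size - 10, 2)
    yy, xx = np.mgrid[0:size, 0:size]
    img = np.exp(-((xx - x0) ** 2 + (yy - y0) ** 2) / (2 * sigma ** 2))
    img += np.random.normal(0, 0.05, img.shape)
    return img[..., None].astype("float32"), np.array([x0, y0], "float32") / size

X, Y = zip(*(make_sample() for _ in range(2000)))
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(64, 64, 1)),
    tf.keras.layers.MaxPool2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation="sigmoid"),  # normalized (x, y)
])
model.compile(optimizer="adam", loss="mse")
model.fit(np.stack(X), np.stack(Y), epochs=5, batch_size=32)
```

Libraries such as DeepTrack wrap this kind of simulation-plus-training loop behind a higher-level interface, so users can design pipelines without writing the model and data generation code by hand.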
The expectations and demands for real-time processing of visual information have consistently overloaded contemporary computer systems. Increasing hardware performance enabled the use of more complex analysis, and vice versa. For a long time, this led to a mutual fertilization of electronics and computer programs, or of architectures and algorithms with coherent data structures. Appropriate tool sets marked cornerstones and boosted some significant developments. However, up to now one has always had the impression that the requirements were running ahead of the achievable performance, and the increasingly serious open question is: where will it end? This paper provides a short rush through four decades of real-time image processing and concludes with an outlook on future opportunities and current developments, providing guidelines for hosting corresponding research results at this conference in the coming years. New concepts, programming paradigms, and production technologies are paving the way for future image processing solutions with real-time capabilities.
Place recognition is a key task in an autonomous vehicle's Simultaneous Localization and Mapping (SLAM). The motion estimation is bound to drift over time due to cumulative errors. Fortunately, the correct identification of a revisited area provided by the place recognition module enables further optimizations that correct drifting errors if detected in real-time. Place recognition based on structural information of the scene is more robust to luminosity changes, which can lead to false detections in the case of feature-based descriptors. However, such descriptors have mainly been investigated in the context of depth sensors. Inspired by a LiDAR-based descriptor, we extend this global geometric descriptor to structural information from a stereo vision system. Using this descriptor, we achieve real-time place recognition by focusing on the structural appearance of the scene derived from a 3D vision system. First, we introduce the approach used to record the 3D structural information of the visible space based on stereo images. Then, we conduct a parametric optimization protocol for precise place recognition in a given environment. Our experiments on the KITTI dataset show that the proposed approach is comparable to state-of-the-art methods, all while being low-cost. We studied the algorithm's complexity to propose an optimized parallelization on GPU and SoC architectures. Performance evaluation on different hardware (GeForce RTX 3080, Jetson AGX Xavier, and Arria 10 SoC-FPGA) shows that the real-time requirements of an embedded system are met. Compared to a CPU implementation, processing times showed a speed-up of between 4x and 16x, depending on the architecture.
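For readers unfamiliar with such global geometric descriptors, the following hedged sketch shows a Scan Context-style polar descriptor computed from a point cloud (here assumed to come from stereo triangulation) and a shift-search distance for loop-closure matching; grid sizes, range, and the distance metric are illustrative, not the paper's parameters:

```python
# Illustrative polar height descriptor and rotation-tolerant matching distance.
import numpy as np

def polar_descriptor(points, n_rings=20, n_sectors=60, max_range=80.0):
    """points: (N, 3) array of x, y, z coordinates in the vehicle frame."""
    x, y, z = points.T
    r = np.hypot(x, y)
    theta = np.mod(np.arctan2(y, x), 2 * np.pi)
    ring = np.minimum((r / max_range * n_rings).astype(int), n_rings - 1)
    sector = np.minimum((theta / (2 * np.pi) * n_sectors).astype(int), n_sectors - 1)
    desc = np.zeros((n_rings, n_sectors))
    np.maximum.at(desc, (ring, sector), z)  # keep the max height per bin
    return desc

def descriptor_distance(d1, d2):
    """Try every column (yaw) shift and keep the best column-wise cosine score."""
    best = np.inf
    for shift in range(d2.shape[1]):
        d2s = np.roll(d2, shift, axis=1)
        num = (d1 * d2s).sum(axis=0)
        den = np.linalg.norm(d1, axis=0) * np.linalg.norm(d2s, axis=0) + 1e-9
        best = min(best, 1 - np.mean(num / den))
    return best
```

A revisit candidate is then declared when the distance between the current descriptor and a previously stored one falls below a tuned threshold.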
In recent years, sperm analysis has become increasingly important in the treatment of fertility issues. Traditionally, sperm quality was assessed by an expert through the use of a microscope to examine samples. CASA (Computer-Assisted Sperm Analysis) systems were developed to aid these experts in measuring factors that impact sperm quality, such as semen volume, total number of sperm, concentration, vitality, motility, or morphology, with the aim of selecting optimal sperm for fertility treatments. Computer vision techniques are used to estimate these parameters by counting the number of individuals, checking the motility of each one, and even classifying them by their morphology on a small portion of the sample. Recently, deep learning methods have been improving the performance of the computer vision tasks of detection, classification, and tracking needed to perform the analysis. However, such methods often use models with a large number of parameters to achieve high levels of accuracy and precision. Some disadvantages of using big models are the need for high-end GPUs in both training and inference stages and long processing times. These drawbacks often turn image processing into the bottleneck of semen quality assessment. Lighter models are proving to be capable of real-time processing with good results. Our paper studies the performance of a simple proposed tracking optimization method on different hardware, including a high-end server, a standard personal laptop, and an embedded system with a GPU. This work seeks to find a compromise between model accuracy and processing time, studying low-parameter models that can be used in real-time scenarios.
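As an illustration of the kind of tracking step being optimized (a generic frame-to-frame association scheme, not necessarily the paper's exact method), detections can be matched to existing tracks with the Hungarian algorithm on centroid distance:

```python
# Generic track-to-detection association; the gate value is an assumption.
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(tracks, detections, gate=20.0):
    """tracks, detections: (N, 2) and (M, 2) centroid arrays in pixels."""
    if len(tracks) == 0 or len(detections) == 0:
        return [], list(range(len(tracks))), list(range(len(detections)))
    cost = np.linalg.norm(tracks[:, None, :] - detections[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)            # optimal assignment
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= gate]
    matched_t = {m[0] for m in matches}
    matched_d = {m[1] for m in matches}
    unmatched_tracks = [r for r in range(len(tracks)) if r not in matched_t]
    unmatched_dets = [c for c in range(len(detections)) if c not in matched_d]
    return matches, unmatched_tracks, unmatched_dets
```

Motility statistics then follow from the per-track displacement history accumulated over frames.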
Video super-resolution reconstruction consists of generating high-resolution frames by processing low-resolution ones. This process enhances the video quality, allowing the visualisation of fine details. Moreover, it can be considered a primary step in a video processing pipeline for further applications, such as object detection, classification, and tracking from uncrewed aerial vehicles (UAVs). For this reason, the super-resolution process should be performed quickly and accurately. Implementing a real-time video super-resolution method through parallel programming contributes to the efficiency of this pipeline. This work proposes two parallel super-resolution approaches for videos taken from UAVs: one using multi-core CPUs and another on a GPU architecture. The method is based on sparse representation and wavelet transforms. First, it performs an edge correction in the wavelet domain, then employs dictionaries previously trained with k-Singular Value Decomposition (k-SVD) to reconstruct the wavelet subbands of the frames, and the high-resolution frames are computed from the Inverse Discrete Wavelet Transform (IDWT). The performance of this method was measured with the Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM), and Edge Preservation Index (EPI). The implementations are tested on a workstation with a Ryzen multi-core processor and a CUDA-enabled GPU; furthermore, they are compared with the non-parallel method regarding algorithm complexity and computing time.
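The wavelet-domain skeleton of such a pipeline can be sketched as follows; the dictionary-based subband reconstruction and edge correction are replaced here by a trivial bicubic placeholder, so this is only a structural illustration of the decompose-process-recompose flow, not the proposed method:

```python
# Structural sketch: DWT -> per-subband processing (placeholder) -> IDWT.
import cv2
import numpy as np
import pywt

def sr_frame(lr_frame, scale=2):
    """lr_frame: single-channel (grayscale/luma) low-resolution frame."""
    lr = lr_frame.astype("float32")
    ll, (lh, hl, hh) = pywt.dwt2(lr, "haar")
    # Placeholder for the k-SVD dictionary reconstruction of each subband:
    up = lambda b: cv2.resize(b, None, fx=scale, fy=scale,
                              interpolation=cv2.INTER_CUBIC)
    hr = pywt.idwt2((up(ll), (up(lh), up(hl), up(hh))), "haar")
    return np.clip(hr, 0, 255).astype("uint8")
```

Because each frame (and each subband) is processed independently, the loop over frames parallelizes naturally across CPU cores or GPU threads.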
With the growth of digital data, its protection has become a requirement for its dissemination through telecommunication networks. Nowadays, people can easily generate, edit, and share images with their own electronic devices using applications or software to process them. For this reason, in some cases, it is necessary to prove the authenticity of digital images. This paper proposes a semi-fragile color image watermarking scheme for authentication. The proposed scheme embeds the EXIF (EXchangeable Image File) metadata of an image as a digital watermark using the LSB method into the Discrete Cosine Transform (DCT) coefficients. EXIF metadata stores relevant information from the image and the digital camera used to organize and classify images, such as date and time information, camera settings, and image characteristics. The embedding algorithm modifies only one mid-frequency coefficient in each eight-by-eight non-overlapping block of the DCT, which offers a significant advantage in reduced processing time. The experimental results demonstrate watermark imperceptibility according to the objective quality measures PSNR and SSIM of the watermarked image (43 dB and 0.99, respectively). Additionally, the EXIF metadata can be extracted with 99% accuracy using a completely blind extraction process; it is performed without the original image, original watermark, original camera, or any other derived information. The simulation results of the proposed method in parallel implementations (multicore CPU and GPU) show an effective real-time implementation of image watermarking.
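A toy version of the embedding step illustrates the idea of hiding one bit per 8x8 block in the LSB of a quantized mid-frequency DCT coefficient; the chosen coefficient position and quantization step below are arbitrary stand-ins rather than the paper's parameters:

```python
# Toy DCT-domain LSB embedding, one watermark bit per 8x8 block.
import cv2
import numpy as np

def embed_bits(channel, bits, pos=(4, 3), q=16):
    """channel: 8-bit grayscale/luma plane; bits: sequence of 0/1 values."""
    img = channel.astype("float32").copy()
    h, w = img.shape
    k = 0
    for y in range(0, h - 7, 8):
        for x in range(0, w - 7, 8):
            if k >= len(bits):
                break
            block = cv2.dct(img[y:y + 8, x:x + 8])
            coeff = int(round(block[pos] / q))
            block[pos] = ((coeff & ~1) | int(bits[k])) * q  # force coefficient LSB
            img[y:y + 8, x:x + 8] = cv2.idct(block)
            k += 1
    return np.clip(img, 0, 255).astype("uint8")
```

The watermark bits could, for example, come from np.unpackbits applied to the EXIF byte string; blind extraction would simply re-quantize the same coefficient in each block and read back its LSB. Since blocks are independent, the loop maps directly onto multicore or GPU parallelism.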
Modelling of planet formation requires empirical data on the collisions involved in the earliest stages of the process. Laboratory-based studies are required to gain these data by colliding dusty, icy particles in conditions analogous to those found in protoplanetary disks. Having technology to capture experimental footage and extract the three-dimensional motions of ensembles of particles is crucial to generating accurate collisional data within a practical timeframe. The cost of microgravity-based experiments drives a need to minimize the form factor of such an imaging system, leading this work to use light-field techniques to provide the depth element of tracking from a single camera. This work focused on the development of software to perform light-field-based, three-dimensional tracking and its application to real-time analysis of mm-scale particle collisions.
In recent years, emerging immersive imaging modalities, e.g., light fields, have been receiving growing attention, becoming increasingly widespread. Light fields are often captured through multi-camera arrays or plenoptic cameras, with the goal of measuring the light coming from every direction at every point in space. Light field cameras are often sensitive to noise, making light field denoising a crucial pre- and post-processing step. A number of conventional methods for light field denoising have been proposed in the state of the art, making use of the redundant information coming from the different views to remove the noise. While learning-based denoising has demonstrated good performance in the context of image denoising, only preliminary works have studied the benefit of using neural networks to denoise light fields. In this paper, a learning-based light field denoising technique based on a convolutional neural network is investigated by extending a state-of-the-art image denoising method and taking advantage of the redundant information generated by different views of the same scene. The performance of the proposed approach is compared in terms of accuracy and scalability to state-of-the-art methods for image and light field denoising, both conventional and learning-based. Moreover, the robustness of the proposed method to different types of noise and different noise strengths is reviewed. To facilitate further research on this topic, the code is made publicly available at https://github.com/mmspg/Light-Field-Denoising
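The general idea of letting a single-image denoiser exploit inter-view redundancy can be sketched as follows; this is a DnCNN-style residual network with illustrative depth and width whose input simply stacks the noisy centre view with a few neighbouring views as extra channels, not the paper's architecture:

```python
# Schematic multi-view residual denoiser (illustrative sizes).
import tensorflow as tf

def lf_denoiser(n_views=5, depth=8, width=32):
    inp = tf.keras.Input(shape=(None, None, n_views))  # centre view + neighbours
    x = tf.keras.layers.Conv2D(width, 3, padding="same", activation="relu")(inp)
    for _ in range(depth - 2):
        x = tf.keras.layers.Conv2D(width, 3, padding="same", activation="relu")(x)
    residual = tf.keras.layers.Conv2D(1, 3, padding="same")(x)   # predicted noise
    center = tf.keras.layers.Lambda(lambda t: t[..., :1])(inp)
    out = tf.keras.layers.Subtract()([center, residual])         # denoised centre view
    return tf.keras.Model(inp, out)
```

Sweeping the choice of centre view over the light field then denoises every sub-aperture image with the same network.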
Three dimensional (3D) particle tracking technique has gained significant attention in recent years due to its ability to provide accurate and reliable data on the motion of particles, including micro and nanoscale particles. Here, we present a novel technique based on chromatic aberration that achieves 3D tracking using two cameras. Due to chromatic aberration of the lens, the axial position of the particle is mapped onto the lateral position in the image plane at a specific color, allowing us to use a diffraction grating to determine the lateral position on the color spectrum and thus the axial position of the particle. We also perform experiments on a 6.24 μm polystyrene particle in water, and collect data. We finally implement our method in Python and demonstrate that it performs within the tolerable error range and processes images and makes 3D coordinate predictions at a speed of 1.5 kHz.
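The calibration idea behind mapping the lateral spectral position to depth can be illustrated with a simple polynomial fit; all values below are synthetic and purely for illustration of the fit-then-invert procedure, not measured data:

```python
# Hypothetical calibration: lateral spectral position u -> axial position z.
import numpy as np

z_known = np.linspace(-10, 10, 21)                    # calibration depths (arbitrary units)
u_measured = 3.2 * z_known + 0.05 * z_known**2 \
             + np.random.normal(0, 0.1, z_known.size)  # simulated streak positions

coeffs = np.polyfit(u_measured, z_known, deg=3)       # fit the u -> z mapping
z_from_u = np.poly1d(coeffs)

print(z_from_u(u_measured[5]), "vs ground truth", z_known[5])
```

At run time, each detected particle's streak position is pushed through the fitted mapping to obtain its axial coordinate, while the second camera provides the lateral (x, y) position.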
In this work, we present a performance study of our preliminary Automatic Parking Space Detection (APSD) system. The purpose of the APSD prototype is to enrich an information system with automatically located parking spaces. It uses images captured from a vehicle to suggest available parking spaces in urban environments. To carry out this performance evaluation, we tested three different platforms: a desktop computer with an NVIDIA RTX 2070 GPU as an upper-bound performance system, and two embedded solutions, an NVIDIA Jetson Xavier NX module and an NVIDIA Jetson TX1 module. We analyze the effect of different modifications on the system, including the use of different state-of-the-art networks on the different architectures, and an ablation study to verify the effect of using lower-resolution images and optimizing the detection network by means of TensorRT. The evaluation results presented demonstrate that the proposed APSD system meets the requirement of real-time processing. This study highlights the importance of the choice of neural network architectures used in the system, as well as the limitations of the hardware devices used in the evaluation.
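One common route to the TensorRT optimization mentioned above (a generic example with a placeholder classification backbone, not the authors' detector or toolchain) is to export the network to ONNX and then build an engine with the trtexec tool on the target device:

```python
# Export a placeholder PyTorch model to ONNX as a first step toward TensorRT.
import torch
import torchvision

model = torchvision.models.mobilenet_v2().eval()   # stand-in for the detection network
dummy = torch.randn(1, 3, 224, 224)                # illustrative input resolution
torch.onnx.export(model, dummy, "model.onnx", opset_version=11)

# Then, on the desktop GPU or Jetson module:
#   trtexec --onnx=model.onnx --fp16 --saveEngine=model.engine
```

Lowering the input resolution in the export step is one of the knobs such an ablation study can turn, since it directly reduces the per-frame compute on the embedded boards.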
COVID-19 is an infectious disease caused by the severe acute respiratory syndrome coronavirus SARS-CoV-2. It was first identified in December 2019 in Wuhan, China. This ongoing pandemic has caused many infections, including many deaths, around the world. The coronavirus is spread mainly by air droplets near an infected person due to sneezing, coughing, and talking. Pretrained DL models utilize large CNN layers, which require more disk space on IoT-embedded devices and affect real-time detection. This research presents an integrated lightweight DL approach for real-time, multi-task (social distancing, mask detection, and facial temperature) video measurement to control the spread of coronavirus among individuals. The three tasks use the most recent YOLO detector (YOLOv7-tiny), an object detection model optimized from the original YOLOv7 to simplify the neural network architecture. The trained models have been evaluated in terms of mean average precision, recall, and precision to assess algorithm performance. The proposed approach has been deployed and executed on NVIDIA devices (Jetson Nano, Jetson Xavier AGX) equipped with visible and thermal cameras. The visible camera is used for face mask detection, while the thermal camera is used for facial temperature measurement and social distancing. This research enriches COVID-19 prevention systems through the integrated approach compared to state-of-the-art methodologies. In addition, we obtained promising results for real-time detection. The proposed approach is suitable for a surveillance system to monitor social distancing, face mask detection, and facial temperature among individuals.
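The social-distancing check downstream of the detector can be sketched in a few lines; this is a simplified stand-in in which a fixed pixel-to-metre factor replaces the proper camera calibration such a system would use:

```python
# Simplified social-distancing check on detector output.
import itertools
import numpy as np

def too_close(boxes, px_per_metre=100.0, min_dist_m=1.5):
    """boxes: list of person boxes (x1, y1, x2, y2); returns violating index pairs."""
    feet = [((x1 + x2) / 2, y2) for x1, y1, x2, y2 in boxes]  # bottom-centre points
    violations = []
    for (i, a), (j, b) in itertools.combinations(enumerate(feet), 2):
        dist_m = np.hypot(a[0] - b[0], a[1] - b[1]) / px_per_metre
        if dist_m < min_dist_m:
            violations.append((i, j))
    return violations
```

In the integrated pipeline described above, the same per-person boxes would also index the thermal frame to read out a facial temperature estimate.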
Deep learning-based Single Image Super-Resolution (SISR) has recently provided great performance when compared with state-of-the-art approaches. However, this performance usually comes at the cost of high computational complexity and memory management, even at the inference stage. In this paper, we aim to reduce the structural complexity of a state-of-the-art Deep Neural Network (DNN)1 approach to propose a cost-effective solution to the problem of SISR. We have investigated how the different components of the model (baseline) affect the overall complexity while minimizing the negative effect on its quality performance. This has provided a solution whose quality performance is comparable to the baseline model, while reducing the number of parameters by approximately more than one order of magnitude, the spatial complexity (GPU memory) by up to 1/6, and the inference time by 1/2.
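One generic way to trade parameters for quality, shown purely for illustration and not as the paper's model, is to replace full convolutions with depthwise-separable ones and upscale with a sub-pixel (pixel-shuffle) layer; width, depth, and scale below are arbitrary:

```python
# Toy lightweight SR network built from depthwise-separable convolutions.
import tensorflow as tf

def light_sr(scale=2, width=32, n_blocks=4):
    inp = tf.keras.Input(shape=(None, None, 3))
    x = tf.keras.layers.Conv2D(width, 3, padding="same", activation="relu")(inp)
    for _ in range(n_blocks):
        x = tf.keras.layers.SeparableConv2D(width, 3, padding="same",
                                            activation="relu")(x)
    x = tf.keras.layers.Conv2D(3 * scale * scale, 3, padding="same")(x)
    out = tf.keras.layers.Lambda(lambda t: tf.nn.depth_to_space(t, scale))(x)
    return tf.keras.Model(inp, out)

print(light_sr().count_params())   # compare against a heavy baseline
```

Counting parameters and measuring GPU memory and latency for each such variant is exactly the kind of component-by-component ablation the paper describes.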
KeyWord Spotting (KWS), i.e. the capability to identify vocal commands as they are pronounced, is becoming one of the most important features of Human-Machine Interfaces (HMI), thanks also to the pervasive diffusion of high-performance MEMS audio sensors with very small dimensions. In-Sensor Computing (ISC) appears to be the most viable solution to get the maximum advantage from KWS, since the dimensions of MEMS microphones remain small and minimally invasive. ISC, indeed, represents the extreme evolution of the edge computing paradigm, where the processing circuits are moved close to the audio sensor, integrated into its auxiliary circuitry or in the same package. However, ISC introduces severe area and power constraints that must be traded off against processing speed to meet the real-time operation naturally required by KWS. In this work, we show a neural network-based KWS suitable for ISC contexts, in which audio sensor data are converted into mel spectrogram images and a Depthwise Separable Convolutional Neural Network (DSCNN) with feature extraction capabilities is designed. To show the advantages of this approach, the DSCNN is compared with an alternative Fully Connected Neural Network (FCNN) operating on audio signals not converted into images. The considered models have been profiled on a microcontroller and implemented on an FPGA. Their performance is compared in terms of classification accuracy and HW resources. Comparisons show that the FCNN is very far from meeting the ISC real-time processing requirements, exhibiting a number of parameters and a frame latency respectively 3 and 1 orders of magnitude higher than required by the DSCNN alternative when mapped to a Xilinx Zynq UltraScale+ MPSoC.
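A minimal sketch of the front end and a tiny DSCNN classifier follows; the mel parameters, frame count, and layer sizes are illustrative rather than those profiled in the paper:

```python
# Mel-spectrogram front end plus a small depthwise-separable CNN classifier.
import librosa
import numpy as np
import tensorflow as tf

def mel_image(wav, sr=16000, n_mels=40):
    """Convert a 1 s audio clip into a (n_mels, frames, 1) log-mel image."""
    m = librosa.feature.melspectrogram(y=wav, sr=sr, n_mels=n_mels, hop_length=160)
    return librosa.power_to_db(m)[..., None].astype("float32")

def dscnn(n_classes=10, input_shape=(40, 101, 1)):
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, (10, 4), strides=2, padding="same",
                               activation="relu", input_shape=input_shape),
        tf.keras.layers.SeparableConv2D(32, 3, padding="same", activation="relu"),
        tf.keras.layers.SeparableConv2D(32, 3, padding="same", activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])
```

The separable convolutions keep the parameter count and multiply-accumulate budget small, which is what makes this style of model attractive for the tight area and power envelope of in-sensor computing.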
Driver posture and micro-movements are main indicators of the driver's attention and situation awareness, as well as of their capability to suddenly take control if necessary. Therefore, the real-time detection of wrong postures is essential to mitigate the risk of accidents. In this work we show that, by using a custom Convolutional Neural Network (CNN) for image processing, a very accurate driver posture recognition system can be realized with a limited number of pressure sensors, grouped in a small carpet placed only on the driver's seat, regardless of its shape. Data from the sensor carpet are converted into images reproducing the different pressure regions of the driver's body, so that the CNN can extract features and classify 8 postures with an average accuracy of 98.81% in real-time. According to the edge computing paradigm, the CNN implements an end-to-end classification by exploiting a quantization scheme for weights and binarized activations to reduce the number of required resources and allow a compact, low-power HW implementation on a small FPGA. When implemented on a Xilinx Artix-7 FPGA, the CNN consumes less than 7 mW of dynamic power at an operating frequency of 47.64 MHz. Such a frequency is compatible with a sensor Output Data Rate (ODR) of 16.50 kHz, which is fundamental in critical applications requiring continuous monitoring and real-time action. Results of a 130 nm CMOS standard-cell synthesis are also reported.
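The conversion of the carpet readings into an image and a small classifier can be sketched as follows; the grid size, upsampling, and architecture are stand-ins, and only the eight posture classes mirror the description above (the quantized/binarized FPGA version is not reproduced here):

```python
# Illustrative pressure-map-to-image conversion and a small posture classifier.
import numpy as np
import tensorflow as tf

def pressure_to_image(readings, grid=(8, 8), out_size=(32, 32)):
    """readings: flat array of pressure values normalized to [0, 1]."""
    img = np.asarray(readings, "float32").reshape(grid)
    return tf.image.resize(img[..., None], out_size).numpy()

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, activation="relu", input_shape=(32, 32, 1)),
    tf.keras.layers.MaxPool2D(),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(8, activation="softmax"),   # 8 driver postures
])
```

In the hardware version described above, the same structure would be trained with quantized weights and binarized activations so the whole inference fits in a few milliwatts on a small FPGA.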
Steered-Mixtures-of-Experts (SMoE) present a unified framework for sparse representation and compression of image data with arbitrary dimensionality. Recent work has shown great improvements in the performance of such models for image and light-field representation. However, for videos the straightforward application yields limited success, as the SMoE framework leads to a piecewise-linear representation of the underlying imagery that is disrupted by nonlinear motion. We incorporate a global motion model into the SMoE framework which allows for higher temporal steering of the kernels. This drastically increases its capability to exploit correlations between adjacent frames: adding only 2 to 8 motion parameters per frame to the model decreases the required number of kernels by 54.25% on average, while maintaining the same reconstruction quality and thus yielding higher compression gains. By halving the number of necessary kernels, we achieve a significant reduction in complexity on the decoder side, a crucial step towards real-time processing.
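For orientation, the SMoE reconstruction at a single space-time coordinate can be written as a gate-weighted sum of linear experts; the sketch below is purely illustrative, uses diagonal bandwidths instead of full steering covariances, and is not the paper's implementation:

```python
# Minimal numeric sketch of SMoE reconstruction at one (x, y, t) coordinate.
import numpy as np

def smoe_value(coord, centres, bandwidths, expert_w, expert_b):
    """coord: (d,) point; centres, bandwidths, expert_w: (K, d); expert_b: (K,)."""
    # With the global motion model, `coord` would first be warped by the
    # per-frame motion parameters before gating and expert evaluation.
    diff = coord - centres
    logits = -0.5 * np.sum((diff / bandwidths) ** 2, axis=1)
    gates = np.exp(logits - logits.max())
    gates /= gates.sum()                          # softmax gating over kernels
    experts = expert_w @ coord + expert_b         # (K,) linear expert predictions
    return float(gates @ experts)
```

Fewer kernels K directly means fewer gate and expert evaluations per reconstructed pixel, which is why halving the kernel count cuts decoder complexity so substantially.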
Research in recent years introduced Steered Mixture-of-Experts (SMoE) as a framework to form sparse, edge-aware models for 2D and higher-dimensional pixel data, applicable to compression, denoising, and beyond, and capable of competing with state-of-the-art compression methods. To circumvent the computationally demanding, iterative optimization method used in prior works, an autoencoder design is introduced that reduces the run-time drastically while simultaneously improving reconstruction quality for block-based SMoE approaches. Coupling a deep encoder network with a shallow, parameter-free SMoE decoder enforces an efficient and explainable latent representation. Our initial work on the autoencoder design presented a simple model with limited applicability to compression and beyond. In this paper, we build on the foundation of the first autoencoder design and improve the reconstruction quality by expanding it to models of higher complexity and different block sizes. Furthermore, we improve the noise robustness of the autoencoder for SMoE denoising applications. Our results reveal that the newly adapted autoencoders allow ultra-fast estimation of parameters for complex SMoE models with excellent reconstruction quality, both for noise-free input and under severe noise. This enables the SMoE image model framework for a wide range of image processing applications, including compression, noise reduction, and super-resolution.
The application of optical and visual sensors, including image processing for the extraction of guidance information for robotics, is not a new topic. The presenter's experience over 40 years of robotics, spanning the aerospace, defence, food, and agricultural sectors, has included the application of technologies such as computer vision, ultrasonic sensing, and laser imaging, many of which remain restrictive or subjects of research when practical robotic applications are involved. Performing tasks that require live assessment of a work environment and work pieces, with the sensory and decision processes that enable a robot to perform tasks to the same degree of skill as human workers, has not been a trivial undertaking in industrial robotics or indeed in R&D. The key issues and an understanding of industry needs will be presented, using examples both in current use and in development.
Construction cranes account for a high proportion of industrial accidents compared to other machines on construction sites. For this reason, technology for preventing collisions between lifted loads and obstacles is in strong demand. In this study, we propose an intelligent safety management method based on rotational obstacle detection that detects obstacles around a crane by learning from a private dataset acquired in an environment similar to an actual construction site. The rotational obstacle detection model of the proposed method is designed to more accurately predict obstacles around a crane using RGB video sequence images from the multi-domain dataset. It is composed of a real-time object detection model, one of the typical one-stage detectors, and the self-attention distillation (SAD) method. In the experimental results, it achieves an accuracy of over 70% mAP. This study can be applied not only to cranes but also to other machines in safety monitoring systems across various domains.