KEYWORDS: Image segmentation, Transformers, Mammography, Visual process modeling, Medical imaging, Data modeling, Education and training, Performance modeling, Breast, Breast cancer
Mammography is one of the main tests used for breast cancer risk assessment. However, because masses closely resemble the surrounding breast tissue and have blurred edges, segmenting breast masses in mammograms is challenging. To reduce computational costs and radiologists' workloads, deep learning techniques based on computer vision have become a common tool in medical image segmentation. However, owing to the locality of the convolution operation, such networks cannot effectively learn global and long-range semantic information. We propose a novel gated axial transformer network (GATNet) for mass segmentation in mammograms. GATNet uses an encoder-decoder structure. First, we use axial attention to decompose 2D self-attention into two one-dimensional self-attentions. Second, we construct an efficient, position-sensitive gated transformer module that establishes long-range contextual dependencies among image features. We evaluated the proposed GATNet on two publicly available breast mass segmentation datasets; the average Dice similarity coefficients on the INbreast and CBIS-DDSM datasets were 80.98% and 83.63%, respectively. The experimental results indicate that GATNet effectively reduces both the computational and spatial complexity of medical image segmentation networks while demonstrating remarkable performance.
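The paper's gated, position-sensitive module is not fully specified in the abstract; the NumPy sketch below shows only the core idea of factorizing 2D self-attention into a height-axis and then a width-axis 1D attention, with identity Q/K/V projections for brevity and without the gating units or positional terms.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def axial_attention(x, axis):
    # Attend along a single spatial axis of an (H, W, C) feature map.
    x = np.moveaxis(x, axis, 0)                      # (L, M, C): L is the attended axis
    q = k = v = x                                    # identity projections for brevity
    scores = np.einsum('lmc,nmc->mln', q, k) / np.sqrt(x.shape[-1])
    attn = softmax(scores, axis=-1)                  # (M, L, L), rows sum to 1
    out = np.einsum('mln,nmc->lmc', attn, v)
    return np.moveaxis(out, 0, axis)

def axial_block(x):
    # Factorized 2D self-attention: height-axis then width-axis 1D attention.
    return axial_attention(axial_attention(x, 0), 1)

y = axial_block(np.random.rand(8, 8, 4))
```

Each 1D attention costs O(L·L) per row or column instead of O((HW)²) for full 2D self-attention, which is the complexity reduction the abstract refers to.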
To address the information loss and feature masking of traditional pooling methods, this paper proposes a new Gaussian stochastic pooling (GS-pooling) method. Each element in the pooling domain is assigned a weight, an element is then drawn by multinomial sampling according to these weights, and the pooling result is finally obtained from the corresponding Gaussian kernel. We conduct comparative experiments on the MNIST, CIFAR-10, and Market-1501 datasets. The results demonstrate that our pooling method achieves the best performance.
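The abstract does not give the exact formulation, so the following is a hypothetical sketch of one plausible reading: weights proportional to (non-negative, post-ReLU) activations, a multinomial draw per window, and the drawn activation modulated by the Gaussian kernel value at its position.

```python
import numpy as np

def gaussian_kernel(k, sigma=1.0):
    ax = np.arange(k) - (k - 1) / 2.0
    g = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2 * sigma ** 2))
    return g / g.sum()

def gs_pool(x, k=2, rng=None):
    # Hypothetical GS-pooling sketch on a single 2D activation map x.
    if rng is None:
        rng = np.random.default_rng(0)
    g = gaussian_kernel(k).ravel()
    H, W = x.shape
    out = np.empty((H // k, W // k))
    for i in range(H // k):
        for j in range(W // k):
            win = x[i * k:(i + 1) * k, j * k:(j + 1) * k].ravel()
            s = win.sum()
            p = win / s if s > 0 else np.full(win.size, 1.0 / win.size)
            idx = rng.choice(win.size, p=p)           # multinomial draw
            out[i, j] = win[idx] * g[idx] * win.size  # kernel-modulated result
    return out
```

For a 2×2 window the Gaussian kernel is uniform, so the sketch reduces to plain stochastic pooling there; larger windows bias the result toward central elements.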
Action recognition methods based on the human skeleton can explicitly represent human actions and have gradually become one of the important research directions in computer vision. To address the problems that the skeleton graph in graph convolutional networks is fixed to represent only the physical structure of the human body and lacks the ability to adapt the skeleton topology, this paper proposes a dual-stream non-local graph convolutional network based on the attention mechanism. First, the temporal convolution layer is extended to a parallel structure with multiple kernels, and different temporal convolution kernel modules are adaptively selected to collect features according to channel weights; second, an attention model consisting of an attention pooling layer is proposed to capture the correlation and temporal continuity among joints; finally, with the non-local graph convolutional network as the basic framework, a dual-stream fusion model is constructed from joint information, skeletal information, and their respective motion information. The proposed method is compared with mainstream methods of recent years on the NTU RGB+D action recognition dataset, and the experimental results show that it achieves high recognition accuracy.
Because the differences between distracted driving actions are small and some actions are highly similar, this paper proposes a distracted driver behavior recognition method (BACNN) that combines a bilinear fusion network and an attention mechanism with a convolutional neural network (CNN) and current mainstream deep learning algorithms. We use the State Farm driver dataset, with 75% for training and 25% for testing. Features are extracted from the driver behavior images with our convolutional neural network model and classified with a fully connected layer. Experiments demonstrate that this method achieves better recognition results than features extracted by a single-model network.
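The abstract does not detail the bilinear fusion step; a common formulation (assumed here, not confirmed by the source) combines the two CNN streams by summing outer products of their local features, followed by signed square root and L2 normalization:

```python
import numpy as np

def bilinear_fuse(fa, fb):
    # fa, fb: (N, Ca) and (N, Cb) features from two CNN streams,
    # flattened over the N spatial locations of the feature maps.
    b = fa.T @ fb / fa.shape[0]             # sum of outer products -> (Ca, Cb)
    b = np.sign(b) * np.sqrt(np.abs(b))     # signed square root
    return (b / (np.linalg.norm(b) + 1e-12)).ravel()  # L2-normalized descriptor
```

The resulting vector would feed the fully connected classification layer mentioned in the abstract.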
To address blurred image edges and the loss of middle hidden-layer features in semantic segmentation, an attention mechanism jointly trained in the channel and spatial domains is proposed. A multi-objective joint training loss function and residual connection modules are used to learn the semantic features, and the hidden-layer features produced during training are added to the loss calculation. The experiments use the PASCAL VOC 2012 dataset, and the results show that introducing the multi-channel attention mechanism and residual connections into the segmentation network helps improve segmentation quality.
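The joint channel/spatial attention is not specified in detail; a minimal sketch in the common squeeze-then-gate style (the channel-mixing weights `wc` are an assumed stand-in for learned parameters) looks like this:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_spatial_attention(x, wc):
    # x: (C, H, W) hidden-layer feature map; wc: (C, C) channel-mixing
    # weights (learned in a real network, passed in here).
    ca = sigmoid(wc @ x.mean(axis=(1, 2)))         # channel attention, (C,)
    x = x * ca[:, None, None]
    sa = sigmoid(x.mean(axis=0) + x.max(axis=0))   # spatial attention, (H, W)
    return x * sa[None, :, :]
```

The channel branch reweights whole feature maps while the spatial branch gates individual locations, which is how such a module can sharpen object edges.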
Video tracking uses the semantic information between video image sequences to process and analyze a target in order to track it. During tracking, sharp changes in target position, large deformations, and occlusion caused by similar background interference, changing lighting conditions, and varying target shapes make tracking algorithms less accurate and less robust to changes in target appearance. An improved target tracking algorithm based on the multi-domain network (MDNet) is proposed to solve these problems. An Image-Align layer is added to the video tracking task to obtain more accurate target estimates; a directed acyclic graph recurrent neural network (DAG-RNN) is combined with a convolutional neural network to model neighborhood context dependencies in the target region, alleviating the poor robustness to appearance changes that results when conventional networks only perform multi-layer extraction of the target's appearance features; and an ROI Align layer is added after the convolution layers to speed up target feature extraction.
This paper proposes a new model-based gait recognition method. Unlike other methods that use 3D (three-dimensional) keypoint and skeleton information, we directly stack the 2D (two-dimensional) keypoint heatmaps of the gait sequence along the time dimension and feed them into a network based on a 3D-CNN (three-dimensional convolutional neural network). Gait is then analyzed across the two dimensions of time and space to obtain effective gait features. Compared with other model-based methods, the feature extraction process of this method is clearer, more concise, and more elegant. Tests on the CASIA-B dataset show that our method achieves competitive performance among model-based gait recognition methods.
Integral imaging stereoscopic display technology uses a microlens array to record and display 3D spatial scene information, and research on integral-imaging LED displays provides a new development path for the LED industry. In this paper, we perform stereoscopic display experiments with a microlens array on LED display screens with a dot pitch of 1.25 mm. We find a light-dispersion ("rainbow") phenomenon caused by crosstalk, which is most severe when a white light source is displayed. To solve this problem, based on the relationship between the emission characteristics of the LED point light sources and the degree of crosstalk of the LED display pixels behind the lens, a barrier lens is designed that effectively eliminates the dispersion.
In a haze environment, the visible image collected by a single sensor expresses the shape, color, and texture details of a target well, but the haze lowers sharpness and parts of the target subject are lost. The infrared image collected by a single sensor, owing to its expression of thermal radiation and strong penetration, clearly shows the target subject but loses detail information. A multi-source image fusion method is therefore proposed to exploit their respective advantages. First, an improved dark channel prior algorithm preprocesses the hazy visible image. Second, an improved SURF algorithm registers the infrared image with the dehazed visible image. Finally, a weighted fusion algorithm based on information complementarity fuses the images. Experiments show that the proposed method improves the clarity of visible targets and highlights occluded infrared targets for target recognition.
Low-illumination color images usually suffer from low brightness, low contrast, blurred detail, and heavy salt-and-pepper noise, which greatly hinder later image recognition and information extraction. To address this degradation of night images, an improved version of the traditional Retinex algorithm is proposed. Specifically: first, the original RGB low-illumination image is converted to the YUV color space (Y represents luminance, UV represent chrominance), and the background light is estimated from the Y component with a sampling-accelerated guided filter; then the reflection component is computed with the classical Retinex formula and the brightness enhancement ratio between the original and enhanced images is calculated; finally, the YUV-to-RGB color space conversion and the feedback enhancement of the UV chrominance components are carried out.
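The pipeline above can be sketched as follows; this is a minimal NumPy version in which a separable mean filter stands in for the paper's sampling-accelerated guided filter, and the UV "feedback enhancement" is assumed to scale chrominance by the luminance gain (the paper's exact rule is not given).

```python
import numpy as np

def mean_blur(img, r=7):
    # Separable mean filter standing in for the guided illumination filter.
    k = np.ones(2 * r + 1) / (2 * r + 1)
    img = np.apply_along_axis(lambda m: np.convolve(m, k, mode='same'), 0, img)
    return np.apply_along_axis(lambda m: np.convolve(m, k, mode='same'), 1, img)

M = np.array([[0.299, 0.587, 0.114],
              [-0.147, -0.289, 0.436],
              [0.615, -0.515, -0.100]])  # RGB -> YUV

def retinex_enhance(rgb):
    # rgb: float image in [0, 1], shape (H, W, 3).
    yuv = rgb @ M.T
    y = np.clip(yuv[..., 0], 1e-4, 1.0)
    illum = np.clip(mean_blur(y), 1e-4, 1.0)   # estimated background light
    refl = y / illum                           # classical Retinex reflectance
    gain = np.clip(refl / y, 1.0, 3.0)         # brightness enhancement ratio
    yuv[..., 0] = np.clip(refl, 0.0, 1.0)
    yuv[..., 1:] *= gain[..., None]            # feedback-enhance UV (assumed rule)
    return np.clip(yuv @ np.linalg.inv(M).T, 0.0, 1.0)
```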
A TOF (time-of-flight) depth camera is a 3D imaging device that obtains high-precision distance information and estimates 3D structure directly, without traditional computer-vision algorithms. However, it generates depth maps with low resolution and large random noise. To overcome these limitations, we propose a new method that combines the confidence map with interpolation to improve image resolution. In the up-sampling process, we exploit the relationship between the distance information produced by the TOF camera and the confidence map: the traditional nearest-neighbor, bilinear, and bicubic interpolation algorithms are reweighted by confidence values, increasing the weight of depth samples with high confidence in each scheme. Then, using multi-directional edge-gradient changes, pixels in edge regions are optimized during interpolation. Experimental results show that our method improves the resolution of the depth image and optimizes edge quality.
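A minimal sketch of the confidence-weighted idea, shown here for the bilinear case only (the nearest-neighbor and bicubic variants, and the edge-gradient optimization, are omitted): each neighbor's bilinear weight is multiplied by its confidence and the weighted sum is renormalized.

```python
import numpy as np

def conf_weighted_upsample(depth, conf, scale=2):
    # Confidence-weighted bilinear upsampling of a low-resolution depth map.
    H, W = depth.shape
    ys = np.clip((np.arange(H * scale) + 0.5) / scale - 0.5, 0, H - 1)
    xs = np.clip((np.arange(W * scale) + 0.5) / scale - 0.5, 0, W - 1)
    y0 = np.clip(np.floor(ys).astype(int), 0, H - 2)
    x0 = np.clip(np.floor(xs).astype(int), 0, W - 2)
    fy = (ys - y0)[:, None]
    fx = (xs - x0)[None, :]
    out = np.zeros((H * scale, W * scale))
    wsum = np.zeros_like(out)
    for dy, wy in ((0, 1 - fy), (1, fy)):
        for dx, wx in ((0, 1 - fx), (1, fx)):
            w = wy * wx * conf[y0 + dy][:, x0 + dx]   # confidence-scaled weight
            out += w * depth[y0 + dy][:, x0 + dx]
            wsum += w
    return out / np.maximum(wsum, 1e-12)
```

With uniform confidence this reduces exactly to ordinary bilinear interpolation; low-confidence (noisy) samples contribute less near depth edges.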
Visible-light LEDs are being used for indoor optical wireless communication as well as room illumination. An LED-based visible light communication system can attain high luminosity as a lighting source, and thus high-quality transmission for an optical wireless system. In indoor diffuse optical wireless links, multipath dispersion limits the maximum transmission data rate. The relation between optimal light source placement strategies and received optical power is discussed. Optical diversity reception is applied to eliminate intersymbol interference and increase the SNR, and a layout model of the optical detectors is given. A system simulation model is built, and the curves relating BER to root-mean-square delay spread for OOK-NRZ and OOK-RZ are given.
We propose a method to extract thin occlusions from multi-focus images. The occluders are thin structures of various shapes (e.g., fences, window shutters, tree branches, and football nets). The proposed method can recognize and extract thin occlusions in a variety of complex scenes by using color similarity and image registration. Experimental results on real images show the validity of the proposed method.
Video image fusion uses technical means to make videos obtained by different image sensors complement each other, yielding video that is rich in information and suited to the human visual system. Infrared cameras penetrate harsh environments such as smoke, fog, and low light, but capture image detail poorly and do not match the human visual system. Visible-light imaging alone yields detailed, high-resolution images suited to the visual system, but visible images are easily affected by the external environment. Fusing infrared and visible video involves algorithms of high complexity and computational cost, which occupy considerable memory and demand high clock rates; most implementations are in software (e.g., C and C++), with few on hardware platforms. In this paper, based on the imaging characteristics of infrared and visible-light images, software and hardware are combined: the registration parameters are obtained in MATLAB, and gray-level weighted-average fusion is implemented on a hardware platform. The fused image effectively improves information acquisition and increases the amount of information in the image.
To quickly obtain a 3D model of real-world objects, multi-point ranging is very important. However, traditional measurement usually proceeds point by point or line by line, which is slow and inefficient. In this paper, a scanning-free depth imaging system based on TOF (time of flight) is proposed. The system comprises a light source circuit, a special infrared image sensor module, an image data processor and controller, a data cache circuit, a communication circuit, and so on. Following the TOF measurement principle, an image sequence is collected by a high-speed CMOS sensor, the distance information is obtained by identifying the phase difference, and the amplitude image is also calculated. Experiments show that the system achieves scanning-free depth imaging with good performance.
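The phase-difference step can be illustrated with the standard four-bucket demodulation used by continuous-wave TOF sensors (the paper's exact demodulation scheme is not specified, so this is a generic sketch): four correlation samples per pixel yield the phase, amplitude, and distance.

```python
import math

C = 299_792_458.0  # speed of light, m/s

def tof_depth(a0, a90, a180, a270, f_mod):
    # Four-bucket demodulation: a_k = B + A*cos(phi + theta_k), sampled at
    # phase offsets theta_k = 0, 90, 180, 270 degrees of the modulation.
    phase = math.atan2(a270 - a90, a0 - a180) % (2 * math.pi)
    amplitude = 0.5 * math.hypot(a270 - a90, a0 - a180)
    # Distance is unambiguous up to C / (2 * f_mod).
    distance = C * phase / (4 * math.pi * f_mod)
    return distance, amplitude
```

At a 20 MHz modulation frequency the unambiguous range is about 7.5 m; the amplitude value doubles as the "amplitude image" mentioned in the abstract.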
Image fusion technology combines information from multiple images of the same scene into a single image, so that the fused image is often more informative than any source image. Considering the characteristics of low-light visible images, this study presents an image fusion technique to improve the contrast of low-light images. An adaptive threshold-based fusion rule is proposed, where the threshold is related to the brightness distribution of the original images and determines the fusion of the low-frequency coefficients. A pulse-coupled neural network (PCNN)-based rule is proposed for fusing the high-frequency coefficients: the firing times of the PCNN reflect the amount of detail information, so the high-frequency coefficient with the maximum firing times is chosen as the fused coefficient. Experimental results demonstrate that the proposed method obtains high-contrast images and outperforms traditional fusion approaches in image quality.
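The high-frequency rule can be sketched with a simplified PCNN (the paper's exact neuron model and parameters are not given; the constants below are illustrative): each pixel neuron fires when its coefficient-driven activity exceeds a decaying dynamic threshold, and the coefficient whose neuron fires more often wins.

```python
import numpy as np

def pcnn_firing_times(s, iters=20, alpha=0.1, beta=0.2, vtheta=20.0):
    # Simplified PCNN: firing count per pixel indicates local detail strength.
    s = np.abs(s) / (np.abs(s).max() + 1e-12)
    y = np.zeros_like(s)          # firing map of the previous iteration
    theta = np.ones_like(s)       # dynamic threshold
    fire = np.zeros_like(s)       # accumulated firing times
    k = np.array([[0.5, 1.0, 0.5], [1.0, 0.0, 1.0], [0.5, 1.0, 0.5]])
    for _ in range(iters):
        # Linking input: weighted sum of 8-neighbour firings (zero-padded).
        p = np.pad(y, 1)
        l = np.zeros_like(y)
        for i in range(3):
            for j in range(3):
                l += k[i, j] * p[i:i + y.shape[0], j:j + y.shape[1]]
        u = s * (1 + beta * l)              # modulated internal activity
        y = (u > theta).astype(float)
        fire += y
        theta = np.exp(-alpha) * theta + vtheta * y  # decay, jump after firing
    return fire

def fuse_highfreq(c1, c2):
    # Keep the high-frequency coefficient whose PCNN fires more often.
    return np.where(pcnn_firing_times(c1) >= pcnn_firing_times(c2), c1, c2)
```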
Visibility of nighttime video imagery is of great significance for military and medical applications, but nighttime video quality is so poor that targets and background are hard to distinguish. We therefore enhance nighttime video by fusing infrared and visible video images. According to the characteristics of infrared and visible images, we propose an improved SIFT algorithm and an αβ-weighted algorithm to fuse heterologous nighttime images. A transfer matrix derived from the improved SIFT algorithm rapidly registers the heterologous nighttime images, and the αβ-weighted algorithm can be applied to any scene. In the video fusion system, the transfer matrix registers every frame and the αβ-weighted method then fuses it, meeting the real-time requirements of video. The fused video not only retains the clear target information of the infrared video, but also retains the detail and color information of the visible video, and it plays back fluently.
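The per-frame fusion step can be sketched as a simple weighted combination; the α and β values below are illustrative placeholders (the abstract does not state them), and the SIFT-based registration that precedes fusion is assumed to have been applied already.

```python
import numpy as np

def alpha_beta_fuse(ir, vis, alpha=0.6, beta=0.4):
    # Weighted fusion of one registered infrared/visible frame pair in [0, 1].
    return np.clip(alpha * ir + beta * vis, 0.0, 1.0)

def fuse_video(ir_frames, vis_frames_registered):
    # Apply the same rule frame by frame; registration with the SIFT-derived
    # transfer matrix is assumed to have happened upstream.
    return [alpha_beta_fuse(a, b)
            for a, b in zip(ir_frames, vis_frames_registered)]
```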
Bad weather conditions often cause visual distortions in images captured by outdoor vision systems. Rain is one specific example: rain streaks are small and fall at high velocity. Traditional rain removal methods often blur the image, have high time complexity, and leave some rain streaks in the de-rained result. Based on the characteristics of rain streaks, a novel rain removal technique is proposed that removes rain streaks effectively while retaining much of the detail. Experiments show that the proposed method outperforms traditional rain removal methods. It can be widely used in intelligent traffic, civilian surveillance, national security, and so on.
KEYWORDS: Video, Video surveillance, Image processing, Field programmable gate arrays, Video processing, Image resolution, Digital signal processing, Linear filtering, Fiber optic gyroscopes, Image transmission
Owing to scattering by atmospheric particles, video captured by outdoor surveillance systems has low contrast and brightness, which directly reduces the system's application value. Traditional defogging technology mostly studies single-frame defogging algorithms in software; these algorithms involve heavy computation and high time complexity. Video defogging based on a Digital Signal Processor (DSP) suffers from complex peripheral circuitry, cannot run in real time, and is hard to debug and upgrade. In this paper, using an improved dark channel prior algorithm, we propose a video defogging technology based on a Field Programmable Gate Array (FPGA). Compared with traditional defogging methods, high-resolution video can be processed in real time, and the function modules of the system are designed in a hardware description language. The results show that the FPGA-based defogging system processes video at a minimum resolution of 640×480 in real time, and the brightness and contrast of the defogged video are effectively improved. The proposed defogging technology therefore has a great variety of applications, including aviation, forest fire prevention, national security, and other important surveillance.
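For reference, the classic dark channel prior pipeline that the FPGA design improves upon can be sketched as follows; the paper's specific improvements and hardware mapping are not reproduced, and the window size and constants below are the commonly used defaults, not values from the paper.

```python
import numpy as np

def dark_channel(img, r=3):
    # Per-pixel channel minimum followed by a local minimum filter.
    d = img.min(axis=2)
    p = np.pad(d, r, mode='edge')
    H, W = d.shape
    out = np.full((H, W), np.inf)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out = np.minimum(out, p[dy:dy + H, dx:dx + W])
    return out

def defog(img, omega=0.95, t0=0.1):
    # img: (H, W, 3) in [0, 1]. Classic single-image dehazing.
    dark = dark_channel(img)
    n = max(1, dark.size // 1000)
    idx = np.argsort(dark.ravel())[-n:]             # brightest dark-channel pixels
    A = img.reshape(-1, 3)[idx].mean(axis=0)        # atmospheric light estimate
    t = np.maximum(1 - omega * dark_channel(img / A), t0)  # transmission map
    return np.clip((img - A) / t[..., None] + A, 0.0, 1.0)
```

On an FPGA the local minimum filter and the per-pixel recovery step pipeline naturally, which is what makes real-time 640×480 processing feasible.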
A depth map is critical in a free-viewpoint television (FTV) system, and the quality of the reconstructed depth map affects the quality of the rendered view. A depth map obtained from a TOF camera not only exhibits large flat areas and sharp edges but also contains considerable noise. To decrease the noise while keeping the edges of the depth map accurate, this paper proposes an iterative trilateral filter that combines a bilateral filter with an introduced illumination-normal factor. The experimental results show that the proposed method reduces noise markedly while preserving the edges of the TOF depth map well.
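A sketch of the iterative trilateral idea: the spatial and range kernels of a bilateral filter are multiplied by a third Gaussian kernel over a guidance map, which here stands in for the paper's illumination-normal factor (the exact definition of that factor is not given in the abstract).

```python
import numpy as np

def trilateral_filter(depth, guide, iters=3, r=2,
                      sigma_s=2.0, sigma_r=0.1, sigma_g=0.1):
    # Iterative bilateral filter with a third, guidance-based weight.
    H, W = depth.shape
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]
    ws = np.exp(-(ys ** 2 + xs ** 2) / (2 * sigma_s ** 2))   # spatial kernel
    for _ in range(iters):
        pd = np.pad(depth, r, mode='edge')
        pg = np.pad(guide, r, mode='edge')
        num = np.zeros_like(depth)
        den = np.zeros_like(depth)
        for dy in range(2 * r + 1):
            for dx in range(2 * r + 1):
                nd = pd[dy:dy + H, dx:dx + W]   # neighbour depth
                ng = pg[dy:dy + H, dx:dx + W]   # neighbour guidance value
                w = (ws[dy, dx]
                     * np.exp(-(nd - depth) ** 2 / (2 * sigma_r ** 2))   # range
                     * np.exp(-(ng - guide) ** 2 / (2 * sigma_g ** 2)))  # guidance
                num += w * nd
                den += w
        depth = num / den
    return depth
```

The range term preserves depth edges while repeated iterations smooth the flat regions, matching the flat-area/sharp-edge structure of TOF depth maps.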
KEYWORDS: 3D image processing, Imaging arrays, Integral imaging, 3D displays, 3D image reconstruction, Optics manufacturing, 3D acquisition, Optical components, Image processing, Distortion
Integral imaging is a three-dimensional imaging technology that uses a micro-lens array to record and display 3D objects in a scene. Traditional optical methods have many shortcomings: the manufacture of optical components is complex and expensive, and the generated images easily overlap. We propose a computer-based method that uses depth-information matching to obtain the elemental image array; it can quickly produce higher-resolution images without distortion. Finally, we reconstructed the elemental images, recovered the original three-dimensional scene, and verified the correctness of the proposed method.
The traditional orthogonal subspace projection method requires the background spectrum vectors before performing hyperspectral target detection, but in many cases accurate prior knowledge of the background spectrum is unavailable. The constrained energy minimization algorithm detects targets without prior background information, but it performs poorly on large targets and cannot effectively extract target contours. We therefore propose a sample-weighted orthogonal subspace projection algorithm: a weighted autocorrelation matrix is defined to estimate the background, and the orthogonal subspace projection method is then used to detect the targets. The algorithm effectively reduces the proportion of target pixels in the sample autocorrelation matrix and suppresses the background better, overcoming the inherent defects of orthogonal subspace projection and constrained energy minimization; the experimental results show a better detection effect.
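The abstract does not specify the weighting scheme, so the sketch below is one plausible reading: pixels whose spectra resemble the target signature are down-weighted before forming the autocorrelation matrix, and a CEM-style filter is then built on that background-dominated matrix (the weighting function `w` and `gamma` are illustrative assumptions).

```python
import numpy as np

def weighted_cem(X, d, gamma=1.0):
    # X: (N, B) pixel spectra; d: (B,) target signature.
    # Down-weight target-like pixels so the autocorrelation matrix
    # describes the background (hypothetical weighting scheme).
    sim = (X @ d) / (np.linalg.norm(X, axis=1) * np.linalg.norm(d) + 1e-12)
    w = np.exp(-gamma * np.clip(sim, 0.0, 1.0) ** 2)
    R = (X * w[:, None]).T @ X / w.sum()            # weighted autocorrelation
    Rinv = np.linalg.inv(R + 1e-6 * np.eye(X.shape[1]))
    f = Rinv @ d / (d @ Rinv @ d)                   # CEM filter on weighted R
    return X @ f                                    # detector output per pixel
```

By construction the filter responds with exactly 1 to the target signature itself, while the weighted background estimate suppresses everything else.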
Integral imaging (II) is an autostereoscopic technique that provides 3D images viewable in full parallax without special glasses. Obtaining volumetric images consists of a pickup stage and a reconstruction stage. In this work, we study the principles of the mapping process using a preliminarily generated simple object. Based on the mathematical description of straight lines, we implemented a mapping algorithm and, to verify it, reconstructed the result in our experiment.
In an Integral Imaging (II) system, the amount of image data in a captured three-dimensional integral image is large, and the resolution of images reconstructed by conventional computational methods is low. To overcome these problems, a computational reconstruction method based on sampling the elemental image array is proposed. The method makes full use of the matching pixels in adjacent elemental images, so high-resolution reconstruction is achieved using only the sampled elemental images. Experimental results show that the proposed method improves the resolution of reconstructed images and reduces the amount of data used in reconstruction, which facilitates the storage and transmission of integral images.
Free-space laser communication is a research direction in the wireless communication domain, and image transmission, compression coding, and storage are among its most important techniques. To improve error resilience and transmission efficiency for image transmission over noisy, varying channels in free-space laser communication (FSLC), we contrive a joint source and channel coding (JSCC) scheme in which extended set partitioning in hierarchical trees (ESPIHT) and rate-compatible punctured convolutional (RCPC) codes are used. Specifically, ESPIHT segments an image into blocks of different sizes while determining different levels of source significance; an RCPC code then provides unequal error protection for the individual blocks. Various simulations show that this scheme supports robust and efficient image transmission very effectively. Experiments demonstrate that the proposed method clearly improves image quality compared with the classical approach of separate source and channel coding, and the peak SNR of the reconstructed image also improves. It is an effective, fast image compression method.