Class imbalance is common in deep-learning-based object detection and has attracted extensive attention. It arises when the numbers of instances belonging to different classes in a dataset are clearly unequal, which biases the object detection model towards over-represented classes during training. To handle foreground-foreground class imbalance, we design a constraint function that balances the number of positive samples across classes, and on this basis propose the improved Class-Balanced Training Sample Assignment (CBTSA) method. The constraint function uses the quantitative characteristics of the classes in the training set to keep the classifier balanced by equalizing the numbers of positive training samples for ground-truth boxes of all classes. The Hungarian algorithm, combined with the constrained positive sample numbers, the CIoU loss and an extended cost matrix, is then used to compute the globally optimal positive sample allocation scheme. Experiments on the challenging MS COCO 2017 benchmark are carried out to verify the effectiveness of the method. The results demonstrate that CBTSA boosts classifier performance on under-represented classes and improves the detection accuracy of the baseline detector.
Salient object detection (SOD) is particularly important for applications such as autonomous driving, which require real-time inference speed and high performance. Most previous works, however, focus on global object accuracy rather than on the connections between local objects. In this paper, we first convert the Cityscapes dataset into a saliency detection dataset that focuses on distinguishing moving objects on the road from moving objects on the sidewalk. To enable the saliency detection network to learn the connections between target categories, we propose a gated convolution (GCov), which controls the input of the feature layer. For the evaluation of SOD, we combine a variety of loss functions to form a mixed loss. Equipped with GCov and the mixed loss, the proposed architecture effectively distinguishes targets of the same category by the semantics of their location. Experimental results on the dataset show that our method is competitive with other saliency detection networks.
The high-voltage switch is an important piece of transmission equipment in the power system, and detecting the state of the switches is a major task in autonomous inspection. For state determination, this paper presents an angle measurement algorithm that estimates the angle between the two arms of a switch. The GR-PBAS algorithm is proposed to obtain accurate foreground segmentation of the switch while eliminating ghosts. Line segments are then extracted with PPHT to represent the arms. A line assignment algorithm based on K-means maps the line segments to the two arms of the switch, after which the angle can be calculated directly. Experiments and validation on real-scene data demonstrate the effectiveness of the proposed algorithm.
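As a rough illustration of the final step only, the angle between the two arms can be computed once each arm is represented by a single line segment. The sketch below is not the paper's implementation: the function name `angle_between_segments` and the endpoint representation are assumptions, and the segments are treated as undirected lines, so the result lies in [0, 90] degrees.

```python
import math


def angle_between_segments(seg_a, seg_b):
    """Angle in degrees between two undirected line segments.

    Each segment is ((x1, y1), (x2, y2)). Hypothetical helper for illustration.
    """
    (ax1, ay1), (ax2, ay2) = seg_a
    (bx1, by1), (bx2, by2) = seg_b
    ux, uy = ax2 - ax1, ay2 - ay1
    vx, vy = bx2 - bx1, by2 - by1
    norm = math.hypot(ux, uy) * math.hypot(vx, vy)
    if norm == 0:
        raise ValueError("segments must have non-zero length")
    # abs() makes the result independent of endpoint order;
    # min() guards acos against rounding slightly above 1.
    cos_theta = min(1.0, abs(ux * vx + uy * vy) / norm)
    return math.degrees(math.acos(cos_theta))
```

If the arms are modeled as rays leaving a common hinge, dropping the `abs()` would give an angle in [0, 180] degrees instead.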
Image-based angle measurement of high-voltage switches is a crucial technology for automatically evaluating switch state. To automate the detection of switch status, we analyze limited data and propose an angle measurement algorithm based on an ER-FT feature fusion strategy, which combines the extra-red feature with the frequency-tuned saliency feature to detect the red positioning marks in the image. To improve the accuracy and robustness of angle detection under complex backgrounds and varying weather and illumination, we propose a two-stage segmentation scheme and design an improved Gamma correction (Gamma-M) as an image pre-processing step to balance brightness contrast. Experiments on a RasPi are carried out to demonstrate the effectiveness of our method using evaluation indexes such as IoU, the success rate of angle estimation and the average angle error. The experimental results demonstrate that the ER-FT algorithm with Gamma-M pre-processing significantly improves the success rate of angle estimation and achieves higher segmentation accuracy for the red marks while keeping the average angle error low. An outdoor test on the RasPi also shows that the proposed algorithm is effective and applicable.
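For readers unfamiliar with the pre-processing step, the sketch below shows standard power-law gamma correction on an 8-bit grayscale image. It is not the Gamma-M variant, whose modification is not described in the abstract; the function name `gamma_correct` and the list-of-rows image format are assumptions made for illustration.

```python
def gamma_correct(image, gamma):
    """Apply standard power-law gamma correction to an 8-bit grayscale image.

    `image` is a list of rows of ints in [0, 255]. gamma < 1 brightens dark
    regions, gamma > 1 darkens them. This is plain gamma correction, not the
    paper's Gamma-M.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    # Precompute a 256-entry lookup table so each pixel is a single index.
    lut = [round(255 * (v / 255) ** gamma) for v in range(256)]
    return [[lut[p] for p in row] for row in image]
```

With `gamma=0.5`, a dark pixel value of 64 is lifted to about half of full scale, while 0 and 255 are left unchanged.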
Visual object tracking has attracted a lot of interest due to its applications in numerous fields such as industry and security. Because changes in illumination can cause RGB tracking to fail, more and more researchers have focused in recent years on RGB-T tracking methods based on the fusion of the visible and thermal infrared spectra. To exploit complementary dual-modal information adaptively, we design a weight-aware dual-modal feature aggregation mechanism and propose the WF DiMP algorithm for RGB-T tracking. In WF DiMP, deep features of the visible and thermal infrared images are extracted by ResNet50 and used to produce heterogeneous response maps, from which dual-modal weights are learned adaptively. The weighted deep features are then concatenated and fed to the classifier and the bounding box estimation module of the DiMP (Discriminative Model Prediction) network to obtain the final confidence map and an object bounding box. Experiments on the VOT-RGBT2019 dataset show that WF DiMP achieves higher tracking accuracy and robustness. The evaluation indexes PR and SR reach 82.1% and 56.3% respectively, which demonstrates the effectiveness of the proposed mechanism.
When an unmanned aerial vehicle rotates, the position of an object in the image can shift considerably, which easily leads to tracking failure. To solve this problem, this paper designs a motion compensation model based on the Kalman Filter and Homography Transformation (KFHT) to predict the position of trackers and compensate for the position offset, and proposes an improved online multiple object tracking algorithm based on KFHT. In our algorithm, object appearance features are extracted by a residual CNN, and the feature similarity and location association of objects are used to discriminate objects through two-stage matching. To verify the effectiveness of the improved algorithm, experimental evaluation is carried out on the VisDrone2019 dataset using YOLOv5 detection results and prior ground truth respectively. The results demonstrate that the proposed algorithm reduces the number of identity switches by 17% with YOLOv5 and by 66% with prior ground truth, and increases tracking accuracy by about 1.5% and 3.6% in MOTA respectively, showing that the algorithm based on the KFHT model is effective.
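The idea of combining a motion prediction with a homography can be sketched as follows. This is a simplified illustration under stated assumptions, not the KFHT model itself: it uses only the constant-velocity prediction of the state mean (no covariance update), assumes a 3x3 homography `H` between consecutive frames is already available, and the function names are hypothetical.

```python
def apply_homography(H, point):
    """Map an image point through a 3x3 homography given as nested lists."""
    x, y = point
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under H")
    # Divide by the homogeneous coordinate to return to pixel coordinates.
    return (xh / w, yh / w)


def compensate_prediction(position, velocity, H, dt=1.0):
    """Constant-velocity prediction followed by camera-motion compensation.

    The prediction is made in the previous frame's coordinates, then warped
    into the current frame with the inter-frame homography H (assumed given).
    """
    predicted = (position[0] + velocity[0] * dt,
                 position[1] + velocity[1] * dt)
    return apply_homography(H, predicted)
```

Because a homography is defined only up to scale, multiplying every entry of `H` by the same non-zero constant leaves the compensated position unchanged.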
This paper proposes a novel dense stereo matching method based on TC-MST (Threshold Constrained Minimum Spanning Tree), which aims to improve the accuracy of distance measurement. Because the threshold has a great impact on the image segmentation results, we adopt an iterative threshold method to select a better threshold. We then use the MST to perform cost aggregation and apply the winner-take-all algorithm to the aggregated costs to obtain the disparity. Finally, the proposed method is used in a distance measuring system. The experimental results show that this method improves distance measuring accuracy compared with BM (block matching).
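The winner-take-all step is simple enough to sketch directly: for each pixel, pick the disparity with the lowest aggregated cost. The cost volume layout `cost_volume[d][y][x]` and the function name are assumptions for illustration; the MST aggregation that would produce the costs is not shown.

```python
def winner_take_all(cost_volume):
    """Select, per pixel, the disparity index with the minimum cost.

    cost_volume[d][y][x] is the aggregated matching cost of disparity d at
    pixel (x, y). Returns a disparity map indexed as disparity[y][x].
    """
    num_disp = len(cost_volume)
    height = len(cost_volume[0])
    width = len(cost_volume[0][0])
    return [
        [min(range(num_disp), key=lambda d: cost_volume[d][y][x])
         for x in range(width)]
        for y in range(height)
    ]
```

Ties are resolved in favor of the smallest disparity, because `min` returns the first minimal element it encounters.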
In conventional face recognition, most researchers have focused on improving precision when the input data already belong to a member of the database, and have paid less attention to confirming whether the input data belong to the database at all. This paper proposes a face recognition approach using two-dimensional principal component analysis (2DPCA) and designs a novel composite classifier founded on statistical techniques. The classifier exploits the advantages of SVM and Logic Regression in classification, which improves its accuracy considerably. To test the performance of the composite classifier, experiments are conducted on the ORL and FERET databases, and the results are presented and evaluated.
In this paper, we present a novel binary descriptor with orientation, called Intensity-Centroid LDB (IC-LDB). This descriptor addresses two problems: current non-binary descriptors are too computationally expensive to achieve real-time performance in the nonlinear scale space, and the original Local Difference Binary (LDB) descriptor has no orientation component to keep it rotation invariant. Experimental results demonstrate that IC-LDB is faster than the non-binary descriptors previously used in nonlinear scale space while performing as well in many situations.
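The descriptor's name suggests that orientation is derived from the intensity centroid of the patch, the technique also used by ORB. Assuming that reading, a minimal sketch of the orientation estimate looks like this; the function name and the list-of-rows patch format are illustrative choices, and the abstract does not give the exact formulation used in IC-LDB.

```python
import math


def intensity_centroid_orientation(patch):
    """Orientation (radians) of a patch from its first-order intensity moments.

    Coordinates are measured from the patch center, with x to the right and
    y downward, so the angle points from the center towards the centroid.
    """
    height = len(patch)
    width = len(patch[0])
    cy = (height - 1) / 2
    cx = (width - 1) / 2
    m10 = 0.0  # sum of x * I(x, y)
    m01 = 0.0  # sum of y * I(x, y)
    for y, row in enumerate(patch):
        for x, value in enumerate(row):
            m10 += (x - cx) * value
            m01 += (y - cy) * value
    return math.atan2(m01, m10)
```

Rotating the sampling pattern by this angle before computing the binary tests is what would make such a descriptor rotation invariant.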
In this paper, a new embedded intelligent monitoring system based on face recognition is proposed. The system uses a Raspberry Pi as the central processor. A sensor group with a Zigbee module is designed to assist the system, and two alarm modes are provided, using the Internet and a 3G modem. The experimental results show that the system can recognize human faces under various light intensities and send alarm information in real time.
High-speed attitude maneuvers of an Unmanned Aerial Vehicle (UAV) cause large motion between adjacent frames of the video stream produced by the camera fixed on the UAV body, which severely degrades the performance of image object tracking. To solve this problem, this paper proposes a method that uses a gyroscope fixed on the camera to measure the angular velocity of the camera and then predicts the resulting large change of the object position in the video stream. Object tracking itself is based on template matching. Experimental results show that the efficiency and robustness of the object tracking algorithm improve with the embedded gyroscope information.
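To make the prediction step concrete, the sketch below estimates how far a point near the image center moves between two frames when the camera undergoes a pure rotation. It assumes a pinhole camera with focal length `focal_px` in pixels and gyroscope rates about the camera's vertical (yaw) and horizontal (pitch) axes; the function name, the parameters and the sign conventions are all assumptions, and the abstract does not state the model actually used.

```python
import math


def predict_pixel_shift(omega_yaw, omega_pitch, dt, focal_px):
    """Approximate image shift (dx, dy) in pixels caused by camera rotation.

    omega_yaw, omega_pitch: angular velocities in rad/s from the gyroscope.
    dt: time between frames in seconds.
    focal_px: focal length in pixels (pinhole model, assumed known).
    Valid for points near the optical axis; signs depend on the axis
    conventions of the actual sensor and camera.
    """
    theta_yaw = omega_yaw * dt      # rotation accumulated over one frame
    theta_pitch = omega_pitch * dt
    dx = focal_px * math.tan(theta_yaw)
    dy = focal_px * math.tan(theta_pitch)
    return dx, dy
```

The predicted shift can then be used to re-center the template-matching search window before searching the new frame.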
In this paper, we propose a novel 30-dimensional descriptor named SIFTRO (SIFT of Ring Order), generated from 3 local ring areas, to increase matching speed. A new element reordering method is presented to ensure the descriptor’s rotation invariance. To obtain the best scale factor for the SIFTRO descriptor, a weight hierarchy decision model based on AHP is designed. The experiments show that the SIFTRO descriptor inherits invariance to image scaling, rotation and affine transformation, greatly speeds up image matching, and improves precision compared with the original SIFT.
In order to meet the accuracy requirement of a target recognition system, a target recognition algorithm based on the support vector machine is proposed in this paper. First, a fast multi-threshold image segmentation method, which uses a novel search path for particle swarm optimization, separates the target from the background. Then characteristics of the target samples, such as moment features, affine invariant features and texture features based on the co-occurrence matrix, are extracted, and parameter optimization is carried out according to the corresponding rule. After comparison with other kernel functions, the radial basis function kernel is selected to build a target classifier for one particular typical target. Meanwhile, a target recognition system based on a BP neural network is implemented for comparison. Finally, the proposed target recognition method is applied to airplane recognition. The experimental results show that the algorithm can effectively detect and recognize the image target automatically. It can be applied to both single-target and multi-target recognition, and real-time recognition can be achieved for a single target.
The wavelet transform is a new branch of mathematics that has developed rapidly in recent years. Because it overcomes some defects of the Fourier transform, wavelet analysis has received more and more attention and is widely used in various fields of engineering, especially in image processing. In this paper, after wavelet pyramidal decomposition of the image, different quantization and coding schemes are applied to each subimage in accordance with its statistical properties and the distribution of its coefficients. Computer simulation results show that this compression system attains a good reconstructed image while ensuring a satisfactory compression ratio.