It is not valid to assert that everyone can paint, because people's drawing abilities differ greatly; "bad drawings" are therefore common. How to automatically optimize the large number of real, low-quality hand-drawn sketches produced by people who "cannot draw" or "cannot draw well" — so as to ease the pressure of drawing, help most people meet their own expectations when doodling, and, more importantly, enhance the discriminability, interpretability, and applicability of such vector-style sketches — is a critical open problem in freehand sketch processing. In this paper, we address the optimization of low-quality vector sketches based on a diffusion model. Concretely, we first propose an information entropy (IE) criterion to define sketch quality and use it as a preprocessing step to screen the sketch data; a latent structure based on the diffusion model is then used to optimize the low-quality sketches. Extensive experiments demonstrate the effectiveness of the proposed method; furthermore, the optimized sketches can be used for downstream tasks such as image recognition and understanding.
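As a rough illustration of the kind of entropy-based screening described above, the sketch below computes the Shannon entropy of a rasterized sketch's intensity histogram and keeps only sketches whose entropy lies in a plausible range. This is a minimal sketch under assumptions of our own; the function names, thresholds, and histogram settings are illustrative and not the authors' exact criterion.

```python
import numpy as np

def information_entropy(gray: np.ndarray, bins: int = 256) -> float:
    """Shannon entropy (in bits) of the pixel-intensity histogram of a grayscale sketch."""
    hist, _ = np.histogram(gray, bins=bins, range=(0, 255))
    p = hist / max(hist.sum(), 1)          # normalize counts to probabilities
    p = p[p > 0]                           # ignore empty bins (0 * log 0 := 0)
    return float(-(p * np.log2(p)).sum())

def screen_sketches(sketches, low=0.2, high=6.0):
    """Keep sketches whose entropy falls in an assumed plausible range;
    near-blank or noise-like drawings fall outside it."""
    return [s for s in sketches if low <= information_entropy(s) <= high]
```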
Accurate and efficient video quality assessment (VQA) methods provide important guidance for optimizing network video quality, improving video compression performance, and recommending compression coding parameters. No-reference VQA methods fall mainly into bitstream-based methods, which are fast but less accurate, and pixel-based methods, which are accurate but time-consuming and demanding on hardware. Neither category is well suited to assessing the huge volume of network videos. To address this problem, we propose an efficient and accurate cross-domain no-reference network video quality assessment method (CDNRVQA) based on the bitstream. CDNRVQA takes I-frames as representative frames and uses a deep neural network to extract high-level semantic features and distortion features from them; it then derives temporal and additional spatial features from the macroblocks and motion vectors in the compressed video, and applies a feature fusion strategy to combine the pixel-domain and compressed-domain features into comprehensive cross-domain features that capture the distortion introduced by content, motion, and compression. CDNRVQA performs well on large VQA datasets: it achieves accuracy comparable to state-of-the-art VQA models while greatly reducing prediction time and hardware consumption, making it practical to apply VQA methods on video platforms.
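The sketch below illustrates the cross-domain fusion idea in its simplest form: CNN features from decoded I-frames are concatenated with a vector of compressed-domain statistics (e.g., derived from macroblocks and motion vectors) before regression to a quality score. The backbone choice, layer widths, and feature dimensions are assumptions for illustration, not the CDNRVQA architecture itself.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class CrossDomainFusion(nn.Module):
    def __init__(self, bitstream_dim: int = 32):
        super().__init__()
        backbone = models.resnet18(weights=None)            # pixel-domain branch on I-frames
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])
        self.head = nn.Sequential(                           # fusion + score regression
            nn.Linear(512 + bitstream_dim, 128), nn.ReLU(),
            nn.Linear(128, 1))

    def forward(self, iframe: torch.Tensor, bitstream_feat: torch.Tensor) -> torch.Tensor:
        pixel_feat = self.cnn(iframe).flatten(1)             # (B, 512) semantic/distortion features
        fused = torch.cat([pixel_feat, bitstream_feat], 1)   # cross-domain feature vector
        return self.head(fused)                              # predicted quality score
```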
With the flourishing development of deep learning in computer vision, defocus deblurring based on it has gradually become a research hotspot. However, most work focuses on defocus region detection or defocus map estimation, and algorithms that directly generate restored images are less studied. Focusing on the defocus deblurring problem, we propose a deep defocus deblurring model based on multi-scale information and convolutional neural networks. Concretely, we first perform efficient and concise multi-scale information fusion with a selective receptive field module, so that the model can adapt to the scale sensitivity of defocused regions. We then use a residual channel attention module in the bottleneck to extract inter-channel correlations, which enhances informative channels and suppresses uninformative ones. Finally, a fused objective function combining an edge loss with a mean square error loss is proposed to enhance the edge details of the image. Experimental results on a large-scale dual-pixel defocus deblurring dataset demonstrate that the proposed model outperforms traditional and existing deep-learning-based methods. Compared with state-of-the-art methods, the proposed model improves PSNR by 0.44 dB.
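To make the fused objective concrete, the sketch below combines a mean-square-error term with an edge term computed from Sobel gradient magnitudes. The Sobel formulation and the weighting factor lambda are assumptions for illustration; the abstract does not specify the exact edge operator or weights used by the authors.

```python
import torch
import torch.nn.functional as F

def edge_map(x: torch.Tensor) -> torch.Tensor:
    """Approximate edge magnitude with fixed Sobel kernels, applied per channel."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=x.device).view(1, 1, 3, 3).repeat(x.shape[1], 1, 1, 1)
    ky = kx.transpose(-1, -2)
    gx = F.conv2d(x, kx, padding=1, groups=x.shape[1])
    gy = F.conv2d(x, ky, padding=1, groups=x.shape[1])
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

def deblur_loss(pred: torch.Tensor, target: torch.Tensor, lam: float = 0.1) -> torch.Tensor:
    """MSE on pixels plus an L1 penalty on edge maps to sharpen restored details."""
    return F.mse_loss(pred, target) + lam * F.l1_loss(edge_map(pred), edge_map(target))
```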
As people take ever more pictures in daily life, image assessment technology that can automatically and quickly select high-quality pictures has become particularly important. Many algorithms use the peak signal-to-noise ratio (PSNR) to assess image quality, but images with high PSNR scores are not necessarily those that people find beautiful. Image aesthetic assessment comes closer to human aesthetic standards. We report a method named the saliency symbiosis network for image aesthetic assessment. It improves on conventional convolutional neural network (CNN) methods by adding saliency features to the CNN, bringing it closer to the human visual mechanism. To address the limitation of fixed CNN input sizes, we also propose a pooling strategy that allows the model to accept inputs of arbitrary size. Finally, we propose an effective mean Huber loss function, which is less sensitive to outliers and trains the model to convergence quickly. Experimental results show that the proposed method achieves the highest accuracy in image aesthetic assessment and classification while using very little training data.
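The mean Huber loss mentioned above is quadratic for small errors and linear for large ones, which is why it down-weights outliers. A minimal sketch of such a loss for aesthetic score regression is shown below; the delta parameter and score normalization are illustrative assumptions.

```python
import torch

def mean_huber_loss(pred: torch.Tensor, target: torch.Tensor, delta: float = 1.0) -> torch.Tensor:
    """Mean Huber loss: 0.5*e^2 for |e| <= delta, delta*(|e| - 0.5*delta) otherwise."""
    err = torch.abs(pred - target)
    quadratic = torch.clamp(err, max=delta)   # the part treated quadratically
    linear = err - quadratic                  # the part treated linearly (outliers)
    return torch.mean(0.5 * quadratic ** 2 + delta * linear)
```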
Convolutional neural networks (CNN) have given rise to a new generation of video super-resolution (SR) techniques. However, most existing CNN-based video SR algorithms treat the consecutive frames as a series of feature maps, just as in single-image SR. We propose an end-to-end three-dimensional (3-D) CNN video SR framework in which the input frames are treated as a cube, and 3-D convolution is performed on it to extract features along the spatial and temporal dimensions. Image prior knowledge, such as optical flow, is introduced in reconstruction. A combination of mean square error loss and multiscale structural similarity (MS-SSIM) loss is used to optimize the model. Experimental results show that the proposed method reconstructs high-resolution frames with more accurate and visually pleasing structures than state-of-the-art video SR algorithms, while achieving comparable PSNR/SSIM with less computation time.
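The core idea of treating consecutive frames as a cube rather than a stack of independent feature maps can be sketched as below: a clip tensor of shape (batch, channels, time, height, width) is passed through 3-D convolutions that mix information across both spatial and temporal axes. The layer widths and depths are assumptions, not the authors' network.

```python
import torch
import torch.nn as nn

class Spatiotemporal3DBlock(nn.Module):
    def __init__(self, in_ch: int = 3, feat: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(in_ch, feat, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(feat, feat, kernel_size=3, padding=1), nn.ReLU(inplace=True))

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (B, 3, T, H, W) frame cube -> features along spatial and temporal dimensions
        return self.body(clip)
```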
Image style transfer is an important part of image processing, in which information such as the color, silhouette, and lines of an image are transformed into other styles by computer. It has made great progress in recent years with the development of deep learning. This paper designs a deep architecture for generating highly stylized images based on convolutional neural networks and generative adversarial networks (GANs). In addition, we construct an image style similarity measure model that can discriminate whether a generated stylized image is similar to real ones. Experimental results show that the generated stylized images achieve good visual quality compared with related image style transfer algorithms. Overall, we propose an efficient real-time image style transfer model for generating highly stylized images.
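For readers unfamiliar with the adversarial setup, the sketch below shows one generic GAN training step for a style-transfer generator G and a discriminator D that judges whether an image matches the target style. The networks, optimizers, and any content or style losses are assumed to be defined elsewhere; this is not the authors' architecture or similarity measure model.

```python
import torch
import torch.nn.functional as F

def train_step(G, D, opt_g, opt_d, content_img, real_style_img):
    # --- discriminator update: separate real style images from generated ones ---
    with torch.no_grad():
        fake = G(content_img)
    real_logits, fake_logits = D(real_style_img), D(fake)
    d_loss = (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
              + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # --- generator update: produce stylized images that fool the discriminator ---
    fake_logits = D(G(content_img))
    g_loss = F.binary_cross_entropy_with_logits(fake_logits, torch.ones_like(fake_logits))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```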