Deep neural networks have become a powerful tool for denoising speech signals, improving intelligibility, speech quality, and signal-to-noise ratio. An important element in training deep speech networks is the choice of a loss function that improves both subjective and objective measures. In our work, we use a loss function based on a well-trained deep network that classifies whether a signal is noisy or clean. The denoising network is then trained by minimizing the difference between the deep features of the clean and enhanced signals. Our work shows that using only deep features in the loss function yields a significant improvement in measures of speech signal quality. A further novelty is the feature extractor, which is trained as a multi-objective noise classifier. We believe that a deep-feature loss could help in optimizing functions that are difficult to differentiate.
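The idea of comparing clean and enhanced signals in the feature space of a frozen classifier can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the extractor is a tiny network with fixed random weights standing in for the trained noise classifier, the per-layer distance is a mean absolute difference, and the names `make_frozen_extractor`, `extract_features`, and `deep_feature_loss` are hypothetical. The abstract does not state which layers or distance the authors use.

```python
import math
import random


def make_frozen_extractor(in_dim, hidden_dims, seed=0):
    """Build fixed random weights that stand in for a pretrained noise classifier."""
    rng = random.Random(seed)
    layers = []
    prev = in_dim
    for dim in hidden_dims:
        scale = 1.0 / math.sqrt(prev)
        layers.append([[rng.gauss(0.0, scale) for _ in range(prev)] for _ in range(dim)])
        prev = dim
    return layers


def extract_features(layers, signal):
    """Return the activations of every layer for one signal frame."""
    activations = []
    x = signal
    for weights in layers:
        # Linear map followed by tanh (an assumed nonlinearity).
        x = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]
        activations.append(x)
    return activations


def deep_feature_loss(layers, clean, enhanced):
    """Sum over layers of the mean absolute difference between deep features."""
    loss = 0.0
    clean_feats = extract_features(layers, clean)
    enhanced_feats = extract_features(layers, enhanced)
    for f_clean, f_enh in zip(clean_feats, enhanced_feats):
        loss += sum(abs(a - b) for a, b in zip(f_clean, f_enh)) / len(f_clean)
    return loss


extractor = make_frozen_extractor(in_dim=8, hidden_dims=[16, 8])
clean = [math.sin(0.5 * n) for n in range(8)]
# Toy "enhanced" frame: the clean frame plus a small alternating residual.
noisy = [c + 0.3 * ((-1) ** n) for n, c in enumerate(clean)]

print(deep_feature_loss(extractor, clean, clean))  # identical inputs give zero loss
print(deep_feature_loss(extractor, clean, noisy))  # residual noise gives a positive loss
```

In a real training setup the extractor weights would stay frozen while gradients of this loss flow back into the denoising network; the sketch only evaluates the loss and does not train anything.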
In this paper, deep learning algorithms are tuned for face alignment and pose estimation problems. For pose estimation, the classical indirect method (from fp68 landmarks via the Candide model to pose) is compared with a direct method in which both the landmarks and the pose are obtained by regressive deep neural network (DNN) algorithms of the VGG type. The indirect method proved slightly more accurate than the direct one with respect to inter-ocular, inter-pupil, and box-diagonal measures. We also analyzed both indirect and direct DNN algorithms in two scenarios of resolution reduction for convolved data tensors: via max-pooling and via strided convolution operations. The striding algorithms have a relatively small number of parameters (compressed to around 10 percent of the max-pooling version), traded for a slight loss of accuracy.
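As an illustration of the inter-ocular measure mentioned above, the following sketch computes a mean landmark error normalized by the distance between two eye reference points. It is a common formulation rather than the authors' confirmed definition; the function name `normalized_mean_error`, the toy four-point landmark set, and the choice of eye indices are assumptions made for this example.

```python
import math


def normalized_mean_error(predicted, reference, left_eye_idx, right_eye_idx):
    """Mean point-to-point landmark error divided by the reference inter-ocular distance."""
    if len(predicted) != len(reference):
        raise ValueError("landmark sets must have the same length")
    inter_ocular = math.dist(reference[left_eye_idx], reference[right_eye_idx])
    mean_error = sum(math.dist(p, r) for p, r in zip(predicted, reference)) / len(reference)
    return mean_error / inter_ocular


# Toy landmark set (assumed layout): two eye points, a nose tip, and a chin point.
reference = [(30.0, 40.0), (70.0, 40.0), (50.0, 60.0), (50.0, 80.0)]
predicted = [(30.0, 42.0), (70.0, 38.0), (53.0, 64.0), (50.0, 80.0)]

nme = normalized_mean_error(predicted, reference, left_eye_idx=0, right_eye_idx=1)
print(nme)
```

Swapping the two eye indices for pupil centres or replacing the denominator with the bounding-box diagonal would give the inter-pupil and box-diagonal variants under the same assumptions.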
This paper presents a virtual reality application framework and an application concept for mobile devices. The framework
uses the Google Cardboard library for the Android operating system. It allows the creation of a virtual reality 360
video player using standard OpenGL ES rendering methods. The framework provides network methods for
connecting to a web server that acts as the application resource provider. Resources are delivered as JSON responses
to HTTP requests. The web server also uses the Socket.IO library for synchronous communication between the application
and the server. The framework implements methods for creating an event-driven process of rendering additional content based
on the video timestamp and the virtual reality head point of view.
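The event-driven selection of additional content described above can be sketched in a language-neutral way. The framework itself targets Android, so this Python version only illustrates the logic. The JSON schema (`start`, `end`, `yaw`, `pitch`, `tolerance`) and the function names are hypothetical, since the abstract does not specify the response format.

```python
import json

# Hypothetical JSON response from the resource server (schema assumed for illustration).
RESPONSE = """
{"events": [
  {"id": "caption-1", "start": 5.0, "end": 12.0, "yaw": 90.0, "pitch": 0.0, "tolerance": 30.0},
  {"id": "hotspot-2", "start": 10.0, "end": 20.0, "yaw": -45.0, "pitch": 10.0, "tolerance": 20.0}
]}
"""


def angular_difference(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def active_events(events, timestamp, head_yaw, head_pitch):
    """Return ids of events whose time window contains the timestamp and whose
    anchor direction is within the event tolerance of the head direction."""
    active = []
    for event in events:
        in_window = event["start"] <= timestamp <= event["end"]
        in_view = (
            angular_difference(head_yaw, event["yaw"]) <= event["tolerance"]
            and abs(head_pitch - event["pitch"]) <= event["tolerance"]
        )
        if in_window and in_view:
            active.append(event["id"])
    return active


events = json.loads(RESPONSE)["events"]
print(active_events(events, timestamp=11.0, head_yaw=100.0, head_pitch=5.0))
```

A renderer could call `active_events` once per frame with the current playback time and head orientation, then draw only the returned items. The yaw comparison wraps around 360 degrees so that content anchored near the seam of the panorama is handled consistently.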