KEYWORDS: Spine, Principal component analysis, 3D modeling, Data modeling, Statistical modeling, 3D image reconstruction, Artificial intelligence, 3D image processing, Shape analysis
Statistical shape models (SSMs) represent the distribution of labeled points across a training set of shapes. The standard practice for SSMs based on principal component analysis (PCA) is to use clipping, thresholding the latent representation so that all shapes lie within 3 standard deviations of the mean. This practice precludes the representation of shapes that are not well represented by the training set: it constrains the model to realistic solutions, but makes it impossible to work with shapes at the edges of the statistical population. In this study, we investigate the impact of clipping in a PCA-based SSM and whether L2 regularization is a good replacement for clipping in the context of automatic 2D-to-3D reconstruction of the spine. We first show that using L2 regularization is equivalent to using a probabilistic PCA with two error variables, accounting for the suppression of the least important principal components and for the fact that the training set cannot perfectly represent all shapes at test time. Second, we use two data sets of 1746 and 768 patients with adolescent idiopathic scoliosis to study the effect of regularization, for different regularization weights and with or without clipping, for removing landmark detection errors using simulated noise or a reconstruction pipeline. In both sets of experiments, we show that regularization removes noise much like clipping does, without preventing the reconstruction of out-of-distribution shapes, leading to outputs closer to ground truth. This demonstrates that a regularized SSM should be preferred to clipping.
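The contrast between clipping and L2 regularization can be sketched as follows; the function names, the ridge-style per-mode shrinkage, and all parameter values are ours, not the paper's:

```python
import numpy as np

def fit_pca(shapes, n_components):
    """Fit PCA on a (n_samples, n_features) matrix of flattened shapes."""
    mean = shapes.mean(axis=0)
    u, s, vt = np.linalg.svd(shapes - mean, full_matrices=False)
    components = vt[:n_components]                       # principal directions
    stds = s[:n_components] / np.sqrt(len(shapes) - 1)   # per-mode std devs
    return mean, components, stds

def project_clipped(shape, mean, components, stds, k=3.0):
    """Standard practice: clip latent coefficients to +/- k std devs."""
    b = components @ (shape - mean)
    b = np.clip(b, -k * stds, k * stds)
    return mean + components.T @ b

def project_regularized(shape, mean, components, stds, weight=0.1):
    """L2 alternative: shrink coefficients toward zero instead of clipping.
    Closed-form per-mode shrinkage, since the components are orthonormal."""
    b = components @ (shape - mean)
    b = b * (stds**2 / (stds**2 + weight))   # ridge-style shrinkage per mode
    return mean + components.T @ b
```

Clipping saturates each latent coefficient at a hard bound, while the regularized projection shrinks every coefficient smoothly, so out-of-distribution shapes remain representable at the cost of mild attenuation.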
Most algorithms that detect and identify anatomical structures in medical images require either initialization close to the target structure, prior knowledge that the structure is present in the image, or training on a homogeneous database (e.g. all full body or all lower limbs). Detecting these structures when there is no guarantee that the structure is present in the image, or when the image database is heterogeneous (mixed configurations), is a challenge for automatic algorithms. In this work we compared two state-of-the-art machine learning techniques to determine which is the most appropriate for predicting target locations from image patches. Given the positions of thirteen landmark points, labeled by an expert on EOS frontal radiographs, we learn the displacement between salient points detected in the image and these thirteen landmarks. The learning step explores two machine learning methods: a Convolutional Neural Network (CNN) and a Random Forest (RF). The automatic detection of the thirteen landmark points in a new image is then obtained by averaging the positions of each landmark as estimated from all the salient points in the new image. For CNN and RF respectively, we obtain an average prediction error (mean ± standard deviation, in mm) of 29 ± 18 and 30 ± 21 for the thirteen landmark points, indicating the approximate location of anatomical regions. On the other hand, the learning time is 9 days for CNN versus 80 minutes for RF. We provide a comparison of the results between the two machine learning approaches.
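The averaging step described above can be sketched roughly as follows; the function name and data layout are ours, and the CNN or RF regressor that produces the per-point displacements is assumed to exist elsewhere:

```python
# Hypothetical sketch of the voting scheme: every salient point predicts a
# displacement to a given landmark, and the landmark position is taken as
# the average of all resulting position estimates.

def estimate_landmark(salient_points, predicted_displacements):
    """salient_points: list of (x, y); predicted_displacements: list of
    (dx, dy), one per salient point, as output by the learned regressor."""
    estimates = [(x + dx, y + dy)
                 for (x, y), (dx, dy) in zip(salient_points,
                                             predicted_displacements)]
    n = len(estimates)
    return (sum(e[0] for e in estimates) / n,
            sum(e[1] for e in estimates) / n)
```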
The 3D analysis of spine deformities (scoliosis) has high potential for clinical diagnosis and treatment. In a biplanar radiograph context, 3D analysis requires a 3D reconstruction from a pair of 2D X-rays. Whether fully automatic, semi-automatic, or manual, this task is complex because of noise, structure superimposition, and the partial information available from a limited number of projections. Because they are involved in the axial vertebral rotation (AVR), a fundamental clinical parameter for scoliosis diagnosis, pedicles are important landmarks for 3D spine modeling and pre-operative planning. In this paper, we focus on the extension of a fully automatic 3D spine reconstruction method in which the Vertebral Body Centers (VBCs) are automatically detected using a Convolutional Neural Network (CNN) and then regularized using a Statistical Shape Model (SSM) framework. In this global process, pedicles are inferred statistically during the SSM regularization. Our contribution is to add a CNN-based regression model for pedicle detection, allowing better pedicle localization and improving the estimation of clinical parameters (e.g. AVR, Cobb angle). From 476 datasets including healthy patients and Adolescent Idiopathic Scoliosis (AIS) cases of different scoliosis grades (Cobb angles up to 116°), we used 380 for training, 48 for testing, and 48 for validation. Adding the local CNN-based pedicle detection decreases the mean absolute error of the AVR by 10%. The 3D mean Euclidean distance error between detected pedicles and ground truth decreases by 17% and the maximum error by 19%. Moreover, a general improvement is observed in the 3D spine reconstruction, reflected in lower errors in the Cobb angle estimation.
Two experiments were conducted to examine the visual comfort of stereoscopic images. The test video sequences
consisted of moving meteorite-like objects against a blue sky background. In the first experiment, a panel of viewers
rated stereoscopic sequences in which the objects moved back and forth in depth. The velocity of movement, disparity
(depth) range, and disparity type (i.e., depth position with respect to the screen plane: front, behind, or front/behind) of
the objects varied across sequences. In the second experiment, the same viewers rated stereoscopic test sequences in
which the target objects moved horizontally across the screen. Also in this case, the velocity, disparity magnitude, and
disparity type of the objects varied across sequences. For motion in the depth direction, the results indicate that visual
comfort is significantly influenced by the velocity, disparity range, and disparity type of the moving objects. We also
found significant interactions between velocity and disparity type and between disparity type and disparity range. For
motion across the screen in the horizontal plane, ratings of visual comfort depended on velocity and disparity
magnitude. The results also indicate a significant interaction between velocity and disparity. In general, the overall
results confirm that changes in disparity of stereoscopic images over time are a significant contributor to visual
discomfort. Interestingly, the detrimental effect of object velocity on visual comfort is manifested even when the
changes are confined within the generally accepted visual comfort zone of less than 60 arc minutes of horizontal
disparity.
KEYWORDS: Video, 3D vision, Glasses, 3D video compression, Molybdenum, 3D displays, Video processing, Video coding, Computer programming, Image quality
In the stereoscopic frame-compatible format, the separate high-definition left and high-definition right views are reduced
in resolution and packed to fit within the same video frame as a conventional two-dimensional high-definition signal.
This format has been suggested for 3DTV since it does not require additional transmission bandwidth and entails only
small changes to the existing broadcasting infrastructure. In some instances, the frame-compatible format might be used
to deliver both 2D and 3D services, e.g., for over-the-air television services. In those cases, the video quality of the 2D
service is bound to decrease since the 2D signal will have to be generated by up-converting one of the two views. In this
study, we investigated such loss by measuring the perceptual image quality of 1080i and 720p up-converted video as
compared to that of full resolution original 2D video. The video was encoded with either a MPEG-2 or a H.264/AVC
codec at different bit rates and presented for viewing with either no polarized glasses (2D viewing mode) or with
polarized glasses (3D viewing mode). The results confirmed a loss of video quality for the up-converted 2D
material. The loss due to the sampling processes inherent to the frame-compatible format was rather small for both 1080i
and 720p video formats; the loss became more substantial with encoding, particularly for MPEG-2 encoding. The 3D
viewing mode provided higher quality ratings, possibly because the visibility of the degradations was reduced.
Depth maps are important for generating images with new camera viewpoints from a single source image for
stereoscopic applications. In this study we examined the usefulness of smoothing depth maps for reducing the
cardboard effect that is sometimes observed in stereoscopic images with objects appearing flat like cardboard
pieces. Six stereoscopic image pairs, manifesting different degrees of the cardboard effect, were tested. Depth
maps for each scene were synthesized from the original left-eye images and then smoothed (low-pass filtered).
The smoothed depth maps and the original left-eye images were then used to render new views to create new
"processed" stereoscopic image pairs. Subjects were asked to assess the cardboard effect of the original
stereoscopic images and the processed stereoscopic images on a continuous quality scale, using the double-stimulus
method. In separate sessions, depth quality and visual comfort were also assessed. The results from
16 viewers indicated that the processed stereoscopic image pairs tended to exhibit a reduced cardboard effect,
compared to the original stereoscopic image pairs. Although visual comfort was not compromised with the
smoothing of the depth maps, depth quality was significantly reduced when compared to the original.
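The smoothing step can be illustrated with a minimal low-pass filter; a separable box blur stands in here for whatever filter the study actually used, and the kernel radius is ours:

```python
# Minimal sketch of low-pass filtering a gray-level depth map, stored as a
# list of lists. A separable box (moving-average) blur is used for brevity.

def box_blur_1d(row, radius):
    """Moving average with clamped borders."""
    n = len(row)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(row[lo:hi]) / (hi - lo))
    return out

def smooth_depth_map(depth, radius=2):
    """Low-pass filter the depth map: blur rows, then columns."""
    rows = [box_blur_1d(r, radius) for r in depth]
    cols = [box_blur_1d(list(c), radius) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]
```

Smoothing a sharp depth discontinuity spreads it over several pixels, which rounds off the flat, cut-out appearance of objects in the rendered view.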
In depth image based rendering, video sequences and their associated depth maps are used to render new camera
viewpoints for stereoscopic applications. In this study, we examined the effect of temporal downsampling of the
depth maps on stereoscopic depth quality and visual comfort. The depth maps of four eight-second video sequences
were temporally downsampled by dropping all frames, except the first, for every 2, 4, or 8 consecutive frames. The
dropped frames were then replaced by the retained frame. Test stereoscopic sequences were generated by using the
original image sequences for the left-eye view and the rendered image sequences for the right-eye view. The
downsampled versions were compared to a reference version with full depth maps that were not downsampled.
Based on the data from 21 viewers, ratings of depth quality for the downsampled versions were lower. Importantly,
ratings depended on the content characteristics of the stereoscopic video sequences. Results were similar for visual
comfort, except that the differences in ratings between sequences were larger. The present results suggest that more
processing, such as interpolation of depth maps, might be required to counter the negative effects of temporal
downsampling, especially beyond a downsampling of two.
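The downsampling scheme described above, keeping only the first of every group of consecutive depth frames and replacing the dropped ones with it, can be sketched as:

```python
# Sketch of the temporal downsampling of depth maps: for every group of
# `factor` consecutive frames, keep the first and substitute it for the rest.

def downsample_depth_frames(frames, factor):
    """frames: list of depth maps; factor: 2, 4, or 8 in the study."""
    out = []
    for i, frame in enumerate(frames):
        if i % factor == 0:
            kept = frame      # first frame of the group is retained
        out.append(kept)      # dropped frames are replaced by the kept one
    return out
```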
The ability to convert 2D video material to 3D would be extremely valuable for the 3D-TV industry. Such
conversion might be achieved using depth maps extracted from the original 2D content. We previously
demonstrated that surrogate depth maps with limited or imprecise depth information could be used to produce
effective stereoscopic images. In the current study, we investigated whether gray intensity images associated
with the Cr colour component of standard 2D-colour video sequences could be used effectively as surrogate
depth maps. Colour component-based depth maps were extracted from ten video sequences and used to render
images for the right-eye view. These were then combined with the original images for the left-eye view to form
ten stereoscopic test sequences. A panel of viewers assessed the depth quality and the visual comfort of the
synthesized test sequences and, for comparison, of monoscopic and camera-captured stereoscopic versions of
the same sequences. The data showed that the ratings of depth quality for the synthesized test sequences were
higher than those of the monoscopic versions, but lower than those of the camera-captured stereoscopic
versions. For visual comfort, ratings were lower for the synthesized than for the monoscopic sequences but
either equal to or higher than those of the camera-captured versions.
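A minimal sketch of extracting the Cr component as a surrogate depth map, assuming 8-bit RGB input and the approximate BT.601 studio-range conversion (the exact conversion used in the study is not specified here):

```python
# Hypothetical sketch: use the Cr (red-difference chroma) component of an
# RGB frame as a surrogate depth map.

def cr_component(r, g, b):
    """Approximate BT.601 Cr in [16, 240] for 8-bit RGB inputs."""
    return 128 + 0.439 * r - 0.368 * g - 0.071 * b

def surrogate_depth_map(rgb_image):
    """rgb_image: rows of (r, g, b) tuples -> rows of Cr 'depth' values."""
    return [[cr_component(r, g, b) for (r, g, b) in row] for row in rgb_image]
```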
Intermediate view reconstruction is an essential step in content preparation for multiview 3D displays and free-viewpoint
video. Although many approaches to view reconstruction have been proposed to date, most of them share the need to model and estimate scene depth first, and follow with the estimation of unknown-view texture using this depth and other views. The approach we present in this paper follows this path as well. First, assuming
a reliable disparity (depth) map is known between two views, we present a spline-based approach to unknown-view
texture estimation, and compare its performance with standard disparity-compensated interpolation. A distinguishing feature of the spline-based reconstruction is that all virtual views between the two known views can be reconstructed from a single disparity field, unlike in disparity-compensated interpolation. In the second part
of the paper, we concentrate on the recovery of reliable disparities especially at object boundaries. We outline
an occlusion-aware disparity estimation method that we recently proposed; it jointly computes disparities in
visible areas, inpaints disparities in occluded areas and implicitly detects occlusion areas. We then show how
to combine occlusion-aware disparity estimation with spline-based view reconstruction presented earlier, and we
experimentally demonstrate its benefits compared to occlusion-unaware disparity-compensated interpolation.
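Disparity-compensated interpolation, the baseline the spline approach is compared against, can be sketched in 1-D as follows; the names and the nearest-pixel rounding are ours:

```python
# Hypothetical 1-D sketch of disparity-compensated interpolation: a pixel at
# position x in the left view matches position x - d(x) in the right view,
# and the virtual view at fraction a in between blends both views along
# that match.

def interpolate_view(left, right, disparity, a):
    """left, right: scanlines; disparity: per-pixel shift left -> right;
    a in [0, 1]: position of the virtual view (0 = left, 1 = right)."""
    n = len(left)
    out = []
    for x in range(n):
        xr = min(n - 1, max(0, round(x - disparity[x])))  # matched right pixel
        out.append((1 - a) * left[x] + a * right[xr])
    return out
```

Note that each intermediate position `a` requires re-walking the disparity field here, whereas the spline-based reconstruction in the paper recovers all virtual views from a single disparity field.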
Three-dimensional television (3D-TV) will become the next big step in the development of advanced TV systems.
One of the major challenges for the deployment of 3D-TV systems is the diversity of display technologies and
the high cost of capturing multi-view content. Depth image-based rendering (DIBR) has been identified as a key
technology for the generation of new views for stereoscopic and multi-view displays from a small number of views
captured and transmitted. We propose a disparity compensation method for DIBR that does not require spatial
interpolation of the disparity map. We use a forward-mapping disparity compensation with real precision. The
proposed method deals with the irregularly sampled image resulting from this disparity compensation process by
applying a re-sampling algorithm based on a bi-cubic spline function space that produces smooth images. The
fact that no approximation is made on the position of the samples implies that geometrical distortions in the
final images due to approximations in sample positions are minimized. We also paid attention to the occlusion
problem. Our algorithm detects the occluded regions in the newly generated images and uses simple depth-aware
inpainting techniques to fill the gaps created by newly exposed areas. We tested the proposed method in the
context of generating the views needed for viewing on SynthaGram™ auto-stereoscopic displays. We used as
input either a 2D image plus a depth map or a stereoscopic pair with the associated disparity map. Our results
show that this technique provides high quality images to be viewed on different display technologies such as
stereoscopic viewing with shutter glasses (two views) and lenticular auto-stereoscopic displays (nine views).
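The forward-mapping idea can be sketched in 1-D; the linear resampling below stands in for the paper's bi-cubic spline space purely to keep the sketch short, and all names are ours:

```python
# Hypothetical sketch of forward-mapping disparity compensation with real
# precision: each source pixel is pushed to a real-valued target position,
# producing irregular samples that a re-sampling stage turns back into a
# regular grid (the paper uses a bi-cubic spline space for this stage).

def forward_map(scanline, disparity):
    """Return (positions, values): irregular samples of the new view."""
    positions = [x + disparity[x] for x in range(len(scanline))]
    return positions, list(scanline)

def resample_to_grid(positions, values, n):
    """Resample sorted irregular samples onto integer positions 0..n-1 by
    linear interpolation (a stand-in for the bi-cubic spline space)."""
    out = []
    j = 0
    for x in range(n):
        while j + 1 < len(positions) and positions[j + 1] <= x:
            j += 1
        if j + 1 < len(positions) and positions[j] <= x <= positions[j + 1]:
            t = (x - positions[j]) / (positions[j + 1] - positions[j])
            out.append((1 - t) * values[j] + t * values[j + 1])
        else:
            out.append(values[min(j, len(values) - 1)])  # clamp at borders
    return out
```

Because the sample positions are never rounded before re-sampling, the geometric distortions that rounding would introduce are avoided, which is the point made in the abstract.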
Previously we demonstrated that surrogate depth maps, consisting of "depth" values mainly at object boundaries in the
image of a scene, are effective for converting 2D images to stereoscopic 3D images using depth image based rendering.
In this study we examined the use of surrogate depth maps whose depth edges were derived from cast shadows located
in multiple images (Multiflash method). This method has the capability to delineate actual depth edges, in contrast to
methods based on (Sobel) edge identification and (Standard Deviation) local luminance distribution. A group of 21 non-expert
viewers assessed the depth quality and visual comfort of stereoscopic images generated using these three methods
on two sets of source images. Stereoscopic images based on the Multiflash method provided an enhanced depth quality
that is better than the depth provided by a reference monoscopic image. Furthermore, the enhanced depth was
comparable to that observed with the other two methods. However, all three methods generated images that were rated
"mildly uncomfortable" or "uncomfortable" to view. It is concluded that there is no advantage in the use of the
Multiflash method for creating surrogate depth maps. As well, even though the depth quality produced with surrogate
depth maps is sufficiently good, the visual comfort of the stereoscopic images needs to be improved before this approach
of using surrogate depth maps can be deemed suitable for general use.
KEYWORDS: Video, Information operations, Detection and tracking algorithms, Surveillance, Video surveillance, Image segmentation, Image processing, Error analysis, Visualization, Image processing algorithms and systems
This paper proposes a novel algorithm for the real-time detection and correction of occlusion and split in feature-based
tracking of objects for surveillance applications. The proposed algorithm detects sudden variations of
spatio-temporal features of objects in order to identify possible occlusion or split events. The detection is
followed by a validation stage that uses past tracking information to prevent false detection of occlusion or split.
Special care is taken in case of heavy occlusion, when there is a large superposition of objects. In this case
the system relies on long-term temporal behavior of objects to avoid updating the video object features with
unreliable (e.g. shape and motion) information. Occlusion is corrected by separating occluded objects. For
the detection of splits, in addition to the analysis of spatio-temporal changes in objects features, our algorithm
analyzes the temporal behavior of split objects to discriminate between errors in segmentation and real separation
of objects, such as in the deposit of an object. Split is corrected by physically merging the objects detected to be
split. To validate the proposed approach, objective and visual results are presented. Experimental results show
the ability of the proposed algorithm to detect and correct both split and occlusion of objects. The proposed
algorithm is most suitable in video surveillance applications due to: its good performance in multiple, heavy, and
total occlusion; its distinction between real object separation and faulty object split; its handling of simultaneous
occlusion and split events; and its low computational complexity.
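The first detection step, flagging sudden variations of a spatio-temporal feature, can be sketched with object area standing in for the feature; the relative-change threshold is ours, not the paper's:

```python
# Hypothetical sketch: flag a possible occlusion or split event when an
# object's area changes abruptly between consecutive frames. A validation
# stage using past tracking information would follow, as described above.

def sudden_change(areas, threshold=0.4):
    """areas: per-frame object areas; return frame indices where the
    relative change from the previous frame exceeds the threshold."""
    return [i for i in range(1, len(areas))
            if abs(areas[i] - areas[i - 1]) / max(areas[i - 1], 1e-9) > threshold]
```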
Depth image based rendering (DIBR) is a method for converting 2D material to stereoscopic 3D. With DIBR, information contained in a gray-level (luminance intensity) depth map is used to shift pixels in the 2D image to generate a new image as if it were captured from a new viewpoint. The larger the shift (binocular parallax), the larger is the perceived depth of the generated stereoscopic pair. However, a major problem with DIBR is that the shifted pixels now occupy new positions and leave areas that they originally occupied "empty." These disoccluded regions have to be filled properly, otherwise they can degrade image quality. In this study we investigated different methods for filling these disoccluded regions: (a) Filling regions with a constant color, (b) filling regions with horizontal linear interpolation of values on the hole border, (c) solving the Laplace equation on the hole boundary and propagate the values inside the region, (d) horizontal extrapolation with depth information taken into account, (e) variational inpainting with depth information taken into account, and (f) preprocessing of the depth map to prevent disoccluded regions from appearing. The methods differed in the time required for computing and filling, and the appearance of the filled-in regions. We assessed the subjective image quality outcome for several stereoscopic test images in which the left-eye view was the source and the right-eye view was a rendered view, in line with suggestions in the literature for the asymmetrical coding of stereoscopic images.
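Method (b), horizontal linear interpolation across each disoccluded run, can be sketched as follows; the handling of holes touching the image border is our own choice:

```python
# Hypothetical sketch of hole filling by horizontal linear interpolation:
# each run of disoccluded pixels in a scanline is filled by interpolating
# between the valid pixels on its two borders.

def fill_holes_linear(scanline, hole):
    """scanline: pixel values; hole: bool per pixel, True = disoccluded.
    Runs touching the image border are filled with the nearest valid value."""
    out = list(scanline)
    n = len(out)
    x = 0
    while x < n:
        if hole[x]:
            start = x
            while x < n and hole[x]:
                x += 1                      # find the end of the hole run
            left = out[start - 1] if start > 0 else (out[x] if x < n else 0)
            right = out[x] if x < n else left
            length = x - start + 1
            for i in range(start, x):       # interpolate across the run
                t = (i - start + 1) / length
                out[i] = (1 - t) * left + t * right
        else:
            x += 1
    return out
```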
Depth image based rendering (DIBR) is useful for multiview autostereoscopic systems because it can produce a set of new images with different camera viewpoints, based on a single two-dimensional (2D) image and its corresponding depth map. In this study we investigated the role of object boundaries in depth maps for DIBR. Using a standard subjective assessment method, we asked viewers to evaluate the depth and the image quality of stereoscopic images in which the view for the right eye was rendered using (a) full depth maps, (b) partial depth maps containing full depth information but that was only located at object boundaries and edges, and (c) partial depth maps containing binary depth information at object boundaries and edges. Results indicate that depth quality was enhanced and image quality was slightly reduced for all test conditions, compared to a reference condition consisting of 2D images. The present results confirm previous observations indicating that depth information at object boundaries is sufficient in DIBR to create new views such as to produce a stereoscopic effect. However, depth ratings for the partial depth maps tended to be slightly lower than those generated with the full depth maps. The present study also indicates that more research is needed to increase the depth and image quality of the rendered stereoscopic images based on DIBR before the technique can be of wide and practical use.