This PDF file contains the front matter associated with SPIE Proceedings Volume 11599, including the Title Page, Copyright information, and Table of Contents.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks. You are receiving this notice because your organization may not have SPIE eBooks access. Shibboleth/OpenAthens users: please sign in to access your institution's subscriptions. To obtain this item, you may purchase the complete book in print or electronic format on SPIE.org.
Welcome and Introduction to SPIE Medical Imaging Conference 11599: Image Perception, Observer Performance, and Technology Assessment
Mammography screening for breast cancer is an exemplar of an emerging role for artificial intelligence (AI) in image interpretation. Focusing on AI for mammography, the presentation will outline the need to build the evidence base to enable broader adoption of AI in health systems. This includes shifting research focus from early studies of AI development and validation using standardised cancer-enriched imaging datasets, to large scale comparative studies that improve the quality and generalisability of the evidence. Other challenges for the adoption of AI into practice will also be discussed to inform research.
In SPECT, the list-mode (LM) format allows data to be stored at higher precision than binned data. There is significant interest in investigating whether this higher precision translates to improved performance on clinical tasks. Towards this goal, we quantitatively investigated whether processing data in LM format, and in particular the energy attribute of each detected photon, improves performance on the task of absolute quantification of region-of-interest (ROI) uptake compared to processing the data in binned format. The evaluation used a DaTscan brain SPECT acquisition protocol, as used for imaging patients with Parkinson's disease, applied to a synthetic phantom. A signal-known-exactly/background-known-statistically (SKE/BKS) setup was considered. An ordered-subset expectation-maximization algorithm was used to reconstruct images from data acquired in LM format, including the scatter-window data and the energy attribute of each LM event. Using a realistic 2-D SPECT system simulation, quantification tasks were performed on the reconstructed images. The results demonstrated improved quantification performance when the LM data were used compared to binning the attributes, in all the conducted evaluation studies. Overall, we observed that LM data, including the energy attribute, yielded improved performance on absolute quantification tasks compared to binned data.
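As a toy illustration of the distinction the study evaluates (all energies, window limits, and weights below are illustrative values, not the study's actual acquisition parameters):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical list-mode (LM) events: each detected photon carries a
# continuous energy attribute (keV).
true_energies = np.concatenate([
    rng.normal(159.0, 8.0, 5000),   # photopeak-like events (I-123, DaTscan)
    rng.normal(130.0, 15.0, 2000),  # scattered events with degraded energy
])

# Binned processing: collapse the energy attribute into a few windows.
photopeak = np.sum((true_energies > 143) & (true_energies < 175))
scatter_win = np.sum((true_energies > 110) & (true_energies < 143))

# LM processing keeps the full-precision energy of every event, so a
# reconstruction algorithm can weight each event individually instead of
# assigning one weight per window (illustrative linear weighting here).
per_event_weight = np.clip((true_energies - 110.0) / (159.0 - 110.0), 0, 1)

print(photopeak, scatter_win, per_event_weight.shape)
```

Binned processing commits every event to one of a few windows, while LM processing can weight each event by its own energy attribute, which is the extra information the study exploits.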
Objective evaluation of new and improved methods for PET imaging requires access to images with ground truth, as can be obtained through simulation studies. However, for these studies to be clinically relevant, it is important that the simulated images are clinically realistic. In this study, we develop a stochastic and physics-based method to generate realistic oncological two-dimensional (2-D) PET images in which the ground-truth tumor properties are known. The developed method extends a previously proposed approach that captures the observed variabilities in tumor properties from an actual patient population. Further, we extend that approach to model intra-tumor heterogeneity using a lumpy object model. To quantitatively evaluate the clinical realism of the simulated images, we conducted a human-observer study. This was a two-alternative forced-choice (2AFC) study with trained readers (five PET physicians and one PET physicist). Our results showed that the readers had an average of ∼50% accuracy in the 2AFC study. Further, the developed simulation method was able to generate a wide variety of clinically observed tumor types. These results provide evidence for the application of this method to 2-D PET imaging applications and motivate the development of this method to generate 3-D PET images.
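The lumpy object model referenced above is commonly formulated as a Poisson-distributed number of Gaussian blobs at random locations (after Rolland and Barrett). A minimal 2-D sketch, with illustrative parameter values:

```python
import numpy as np

def lumpy_object(shape=(64, 64), mean_lumps=10, amplitude=1.0,
                 width=4.0, rng=None):
    """Sample one realization of a simple lumpy object model: a
    Poisson-distributed number of Gaussian 'lumps' at uniformly random
    positions. Parameters here are illustrative, not the paper's."""
    rng = rng or np.random.default_rng()
    n = rng.poisson(mean_lumps)
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    img = np.zeros(shape)
    for _ in range(n):
        cy, cx = rng.uniform(0, shape[0]), rng.uniform(0, shape[1])
        img += amplitude * np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2)
                                  / (2 * width ** 2))
    return img

heterogeneous_tumor = lumpy_object(rng=np.random.default_rng(1))
print(heterogeneous_tumor.shape)
```

Adding such a realization inside a tumor mask gives the intra-tumor heterogeneity the abstract describes, while the lump statistics remain known ground truth.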
Emerging uses for extended-reality (XR) head-mounted displays (HMDs) within medical environments include visualizations of medical data across imaging modalities including radiography, computed tomography, ultrasound, and magnetic resonance imaging. Rendering medical data in XR environments requires real-time updates to account for user movement within the environment. Unlike stationary 2D medical displays, XR HMDs also require real-time stereoscopic rendering capabilities with high-performance graphics processing units. Furthermore, performance depends on the status of additional subsystems, including tracking sensor technology, the user's input data, and, in the case of augmented reality (AR), spatial mapping and image registration. These temporal considerations have implications for the interpretation of medical data; however, methods for evaluating their effects on image quality are not yet well defined, and in the context of medical XR devices such definitions are at best inconsistent, if not completely lacking. In this work, we compare the effects and causes of three classes of XR spatiotemporal characteristics affecting medical image quality: temporal artifacts, luminance artifacts, and spatial mapping artifacts. We describe the XR system components, starting from user movement recognized by inertial measurement units and camera sensors and ending with user perception of the display through the optics of the HMD. We summarize our findings and highlight the device performance areas contributing to the different effects.
Subjective reading remains the predominant approach in current medical image diagnostics, and the visual presentation of images is one important factor that may affect reading performance and diagnostic quality. In computed tomography (CT), CT numbers are converted into greyscale images by the display window settings. The display window width and window level therefore significantly influence image visibility, and object detectability can be enhanced with appropriate window settings. In this study, we propose a new approach in which the window settings are automatically adjusted based on a greyscale-based contrast-to-noise ratio, which takes into account the effect of the window settings on image quality. With optimized window settings, the greyscale-based image contrast is enhanced and reading performance is improved.
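The window conversion and a greyscale-based contrast-to-noise ratio can be sketched as follows (a minimal illustration; the paper's exact optimization procedure is not reproduced here):

```python
import numpy as np

def apply_window(hu, level, width):
    """Map CT numbers (HU) to 8-bit greyscale using display window
    settings: values outside [level - width/2, level + width/2] clip."""
    lo, hi = level - width / 2.0, level + width / 2.0
    g = (np.clip(hu, lo, hi) - lo) / (hi - lo)
    return (255 * g).astype(np.uint8)

def greyscale_cnr(grey, roi_mask, bg_mask):
    """Contrast-to-noise ratio measured on displayed grey levels, so it
    reflects the chosen window settings (clipping, quantization)."""
    roi = grey[roi_mask].astype(float)
    bg = grey[bg_mask].astype(float)
    return abs(roi.mean() - bg.mean()) / (bg.std() + 1e-9)
```

Because the CNR is measured on the displayed grey levels rather than on the HU values, it captures the clipping and quantization introduced by a given window, which is what makes it usable as an objective for automatic window adjustment.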
The purpose of this work is to propose a framework that could help to accelerate the development of task models and figures of merit for fluoroscopy applications. Our final goal is to use this framework to establish an imaging task based on pediatric vesicoureteral reflux (VUR) diagnosis, and to assess a reader study design that mimics contrast medium uptake. The proposed framework is based on fine-tuning the task and observer study over consecutive virtual trials. Radiographs of neonates were selected by a radiologist for phantom and observer study development. Ureter depictions of five VUR grades were segmented from published references and used as imaging tasks. A tool to simulate patient+task images was developed based on well-known x-ray imaging models. To validate this tool, two quality assurance phantoms were simulated and compared to actual acquisitions, showing good agreement in terms of maximum resolvable line-pair frequency and contrast resolution; the noise texture and magnitude were also very similar. To facilitate virtual trials, a web-based application was developed that displays the simulated images and asks the observer to grade them. Preliminary tests have shown that the application is practical and accessible and provides the flexibility needed for testing different study designs. In conclusion, a framework to facilitate phantom profiling and observer study design has been developed. With this framework it has been possible to simulate and score pediatric VUR diagnostic tasks embedded in realistic anatomical backgrounds, with the goal of developing a study design that can be performed in real time.
Multi-object tracking (MOT) in computer vision and cell tracking in biomedical image analysis are two similar research fields, whose common aim is instance-level object detection/segmentation and the association of such objects across video frames. However, one major difference between the two tasks is that cell tracking also aims to detect mitosis (cell division), which is typically not considered in MOT. Therefore, acyclic oriented graph matching (AOGM) has been used as the de facto standard evaluation metric for cell tracking, rather than the evaluation metrics used in computer vision, such as multiple object tracking accuracy (MOTA), ID switches (IDS), and ID F1 score (IDF1). However, based on our experiments, we found that AOGM does not always function as expected for mitosis events. In this paper, we exhibit the limitations of evaluating mitosis with AOGM using both simulated and real cell tracking data.
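For reference, the computer-vision MOTA metric mentioned above pools misses, false positives, and identity switches over all frames and ground-truth objects:

```python
def mota(num_fn, num_fp, num_ids, num_gt):
    """Multiple Object Tracking Accuracy: 1 minus the pooled rate of
    false negatives (misses), false positives, and identity switches,
    relative to the total number of ground-truth objects."""
    return 1.0 - (num_fn + num_fp + num_ids) / float(num_gt)

print(mota(num_fn=3, num_fp=2, num_ids=1, num_gt=100))  # → 0.94
```

AOGM, by contrast, scores the weighted cost of graph-edit operations needed to transform the computed tracking graph into the reference graph, and mitosis appears there as a branching edge, which is where the limitations discussed in the paper arise.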
Medical imaging systems are commonly assessed and optimized by use of objective measures of image quality (IQ) that quantify the performance of an observer at specific tasks. Variation in the objects to be imaged is an important source of variability that can significantly limit observer performance. This object variability can be described by stochastic object models (SOMs). In order to establish SOMs that can accurately model realistic object variability, it is desirable to use experimental data. To achieve this, an augmented generative adversarial network (GAN) architecture called AmbientGAN has been developed and investigated. However, AmbientGANs cannot be immediately trained by use of advanced GAN training methods such as the progressive growing of GANs (ProGANs). Therefore, the ability of AmbientGANs to establish realistic object models is limited. To circumvent this, a progressively-growing AmbientGAN (ProAmGAN) has been proposed. However, ProAmGANs are designed for generating two-dimensional (2D) images, while medical imaging modalities are commonly employed for imaging three-dimensional (3D) objects. Moreover, ProAmGANs that employ traditional generator architectures lack the ability to control specific image features, such as fine-scale textures, that are frequently considered when optimizing imaging systems. In this study, we address these limitations by proposing two advanced AmbientGAN architectures: 3D ProAmGANs and Style-AmbientGANs (StyAmGANs). Stylized numerical studies involving magnetic resonance (MR) imaging systems are conducted. The ability of 3D ProAmGANs to learn 3D SOMs from imaging measurements and the ability of StyAmGANs to control fine-scale textures of synthesized objects are demonstrated.
Model Observers (MO) are algorithms designed to evaluate and optimize the parameters of newly developed medical imaging technologies by providing a measure of human accuracy for a given diagnostic task. If designed well, these algorithms can expedite, and reduce the expense of, coordinating sessions with radiologists to evaluate the diagnostic potential of such reconstruction technologies. During the last decade, classic machine learning techniques along with feature engineering have proved to be a good MO choice, allowing the models to be trained to detect or localize defects and therefore potentially reduce the extent of needed human observer studies. More recently, with developments in computer processing speed and capabilities, Convolutional Neural Networks (CNN) have been introduced as MOs, eliminating the need for feature engineering. In this paper, we design, train and evaluate the accuracy of a fully convolutional U-Net structure as an MO for a defect forced-localization task in simulated images. This work focuses on the optimization of parameters, hyperparameters and the choice of objective functions for CNN model training. Results are shown in the form of human accuracy vs. model accuracy as well as efficiencies with respect to the ideal observer, and reveal a strong agreement between the human and the MO for the chosen defect localization task.
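The efficiency with respect to the ideal observer mentioned above is typically defined through the detectability index d′. A sketch (shown for a 2AFC task for concreteness; the paper's task is forced localization, where the mapping from percent correct to d′ differs):

```python
import numpy as np
from scipy.stats import norm

def dprime_2afc(pc):
    """Detectability index from percent correct in a two-alternative
    forced-choice experiment: d' = sqrt(2) * Phi^{-1}(PC)."""
    return np.sqrt(2.0) * norm.ppf(pc)

def efficiency(d_obs, d_ideal):
    """Observer efficiency relative to the ideal observer,
    defined as the squared ratio of detectability indices."""
    return (d_obs / d_ideal) ** 2

d_model, d_ideal = dprime_2afc(0.85), dprime_2afc(0.95)
print(round(efficiency(d_model, d_ideal), 3))
```

Plotting human accuracy against model accuracy, with both converted to d′ and normalized by the ideal observer, is what allows the "strong agreement" claim to be made on a common scale.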
It is widely accepted that medical imaging reconstruction strategies should be optimized by maximizing the chance of a radiologist making a correct diagnosis. This implies organizing costly sessions with doctors to evaluate the images and provide feedback. Model Observers (MO) are algorithms designed to act as human surrogates in evaluating and providing feedback as a measure of diagnostic accuracy, and should be tuned to make the same diagnosis as the human's, regardless of correctness. In this work, we use a previously trained and optimized Convolutional Neural Network (CNN) based MO to construct classification images that show how diagnostic information is accessed by the MO, in the form of a perceptual filter. A single MO was trained for a forced-localization task in simulated data with three different power-law noise backgrounds representing different levels of background variability. The classification images were computed in the same way as they would be for a human observer, using 10,000 simulated images with a defect. The frequency profiles of the MO classification images show that the frequency weights appear band-pass in nature and are highly correlated with the frequency weights from human-observer classification images.
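A classification image of the kind described can be sketched as a difference of decision-conditioned noise averages. Here a stand-in observer that only looks at the center pixel is used purely for illustration, so the recovered "perceptual filter" concentrates at that pixel:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical study: N noisy backgrounds and the observer's binary
# decision on each (1 = "defect here", 0 = "no defect").
N, sz = 2000, 32
noise = rng.normal(size=(N, sz, sz))
decisions = (noise[:, sz // 2, sz // 2] > 0).astype(int)  # stand-in observer

# Classification image: difference of the mean noise fields conditioned
# on the two decisions; it reveals which pixels drove the decisions.
ci = noise[decisions == 1].mean(axis=0) - noise[decisions == 0].mean(axis=0)

# Its radial frequency profile is the quantity the paper compares
# between the CNN model observer and the human observers.
spectrum = np.abs(np.fft.fftshift(np.fft.fft2(ci)))
print(ci.shape)
```

With a real observer (human or CNN MO) the same conditioning on trial-by-trial decisions, applied over the 10,000 defect-present images, produces the perceptual-filter estimates whose frequency weights are compared in the paper.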
In this work, we focus on developing a channelized Hotelling observer (CHO) that estimates ideal linear observer performance for signal detection in images resulting from non-linear image reconstruction in computed tomography. In particular, many options for specifying the channel functions are explored. A hybrid channel model is proposed in which a set of traditional Laguerre-Gauss functions is concatenated with a set of central pixel functions. This expanded channel set allows the CHO to perform robustly over a wide range of image reconstruction and system parameters. Applying this model observer to parameter determination for the total-variation-constrained least-squares algorithm yields images that are seen to favor detection of small, subtle signals.
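A minimal sketch of Laguerre-Gauss channels and the resulting CHO SNR, assuming the standard form LG_n(r) = exp(-πr²/a²) L_n(2πr²/a²); the paper's hybrid model would additionally append central-pixel channels (one-hot columns) to `channels`:

```python
import numpy as np
from scipy.special import eval_laguerre

def lg_channel(order, a, shape=(64, 64)):
    """2-D rotationally symmetric Laguerre-Gauss channel:
    LG_n(r) = exp(-pi r^2 / a^2) * L_n(2 pi r^2 / a^2), unit-normalized."""
    cy, cx = (shape[0] - 1) / 2.0, (shape[1] - 1) / 2.0
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    r2 = (yy - cy) ** 2 + (xx - cx) ** 2
    lg = np.exp(-np.pi * r2 / a ** 2) * eval_laguerre(order, 2 * np.pi * r2 / a ** 2)
    return lg / np.linalg.norm(lg)

def cho_snr(imgs_sig, imgs_bkg, channels):
    """Channelized Hotelling observer SNR estimated from two image
    classes. imgs_*: (N, H*W) flattened images; channels: (H*W, C)."""
    v1, v0 = imgs_sig @ channels, imgs_bkg @ channels  # channel outputs
    dv = v1.mean(axis=0) - v0.mean(axis=0)             # mean channel difference
    S = 0.5 * (np.cov(v1.T) + np.cov(v0.T))            # intra-class covariance
    w = np.linalg.solve(S, dv)                         # Hotelling template
    return float(np.sqrt(dv @ w))
```

Working in the low-dimensional channel space is what makes the covariance inversion tractable; the central-pixel additions simply give the observer direct access to the pixels where small, subtle signals live.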
In this study, we implement a CNN-based multi-slice model observer for 3D CBCT images and compare it with conventional linear model observers. To evaluate detection performance, we considered an SKE/BKS four-alternative detection task for 3D CBCT images. To generate training and testing datasets, we used a power-law spectrum to generate the anatomical noise structure. The generated anatomical noise was reconstructed using the FDK algorithm with a CBCT geometry. We employed msCHO and vCHO with LG channels as comparative linear model observers. Our CNN-based multi-slice model observer mimicked msCHOa and was composed of multiple CNNs. Each CNN consisted of convolutional operators, batch normalization, and a Leaky-ReLU activation function, and had the following characteristics: (1) to reduce the number of variables, we used a fully convolutional network and set the filter size to 3×3; (2) since downscaling layers discard high-frequency components, we did not use any downscaling layer. We used the ADAM optimizer and the cross-entropy loss function to train the network. We compared the detection performance of the CNN-based multi-slice model observer, vCHO, and msCHO using 1,000 trial cases when the number of slices was three, five, and seven. For all numbers of slices, the CNN-based multi-slice model observer provided higher detection performance than the conventional linear model observers. However, the CNN-based multi-slice model observer required more than 50,000 signal-present and signal-absent images to reach optimized performance, while msCHO required about 5,000 image pairs. Strategies to reduce the amount of training data will be a future research topic.
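The power-law anatomical noise used to build such datasets can be sketched by shaping white Gaussian noise in the frequency domain (the exponent beta is illustrative; beta ≈ 3 is often quoted for breast backgrounds):

```python
import numpy as np

def power_law_background(shape=(128, 128), beta=3.0, rng=None):
    """Random background with an isotropic 1/f^beta power spectrum,
    a common model of anatomical clutter, normalized to zero mean
    and unit standard deviation."""
    rng = rng or np.random.default_rng()
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    f = np.sqrt(fy ** 2 + fx ** 2)
    f[0, 0] = f[f > 0].min()          # avoid division by zero at DC
    amp = f ** (-beta / 2.0)          # amplitude filter ~ sqrt(power)
    white = np.fft.fft2(rng.normal(size=shape))
    img = np.real(np.fft.ifft2(white * amp))
    return (img - img.mean()) / img.std()

bkg = power_law_background(rng=np.random.default_rng(0))
print(bkg.shape)
```

In the study's pipeline, slices of such noise volumes would additionally be passed through forward projection and FDK reconstruction so that the observers see the noise as it appears in reconstructed CBCT images.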
The ideal observer (IO) sets an upper performance limit among all observers and has been advocated for use in assessing and optimizing imaging systems. For joint detection-estimation tasks, the estimation ROC (EROC) curve has been proposed for evaluating observer performance. However, in practice, it is generally difficult to accurately approximate the IO that maximizes the area under the EROC curve (AEROC) for a general detection-estimation task. In this study, a hybrid method that employs machine learning is proposed to accomplish this. Specifically, a hybrid approach is developed that combines a multi-task convolutional neural network (CNN) and a Markov-chain Monte Carlo (MCMC) method in order to approximate the IO for detection-estimation tasks. The multi-task CNN is designed to estimate the likelihood ratio and the parameter vector, while the MCMC method is employed to compute the utility-weighted posterior mean of the parameter vector. The IO test statistic is subsequently formed as the product of the likelihood ratio and the posterior mean of the parameter vector. Computer simulation studies were conducted to validate the proposed method, including background-known-exactly (BKE) and background-known-statistically (BKS) tasks. The proposed method provides a new approach for approximating the IO and may enable the application of EROC analysis for optimizing imaging systems.
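In symbols, one reading of the abstract's construction is the following sketch (Λ is the likelihood ratio, θ the parameter vector, u the utility; the authors' exact EROC formulation may weight the terms differently):

```latex
\Lambda(\mathbf g) = \frac{p(\mathbf g \mid H_1)}{p(\mathbf g \mid H_0)},
\qquad
\langle \boldsymbol\theta \rangle_{u}
  = \int u(\boldsymbol\theta)\, \boldsymbol\theta\,
    p(\boldsymbol\theta \mid \mathbf g, H_1)\, d\boldsymbol\theta,
\qquad
t_{\mathrm{IO}}(\mathbf g) = \Lambda(\mathbf g)\,\langle \boldsymbol\theta \rangle_{u} .
```

The multi-task CNN supplies Λ(g) and a point estimate of θ, while the MCMC sampler approximates the utility-weighted posterior integral.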
Model observers are mathematical models used to perform a specific task, such as lesion detection. In this document, we restrict ourselves to ideal model observers, which do not try to mimic human performance but try to perform perfect classification. However, we will not follow the usual definition of an ideal model observer, which describes the model observer as a statistical classifier between two classes. Instead, we define a GAN and train it to generate images from class H0, without lesions, and then use the discriminator of the GAN as a model observer. Our method relies on pix2pix, a type of conditional GAN: the network is first trained to generate SPECT-reconstruction-like data from the corresponding CT images. The discriminator is then applied to simulated lesions to validate its use as a classifier.
Task-based assessment of image quality in undersampled magnetic resonance imaging (MRI) with constrained reconstruction is important because of the need to quantify the effect of the artifacts on task performance. Fluid-attenuated inversion recovery (FLAIR) images are used in the detection of small metastases in the brain. In this work we carry out two-alternative forced-choice (2-AFC) studies with a small signal known exactly (SKE) but with varying background, for reconstructed FLAIR images from undersampled multi-coil data. Using 4x undersampling and a total variation (TV) constraint, we found that human-observer detection performance remained fairly constant over a broad range of values of the regularization parameter before decreasing at large values. Using the TV constraint did not improve task performance. The non-prewhitening eye (NPWE) observer and the sparse difference-of-Gaussians (S-DOG) observer with internal noise were used to model human-observer detection. The parameters for the NPWE observer and the internal noise for the S-DOG observer were chosen to match the average percent correct (PC) in 2-AFC studies for three observers using no regularization. The NPWE model observer tracked the performance of the human observers as the regularization was increased but slightly over-estimated the PC for large amounts of regularization. The S-DOG model observer with internal noise tracked human performance for all levels of regularization studied. To our knowledge, this is the first time that model observers have been used to track human-observer detection for undersampled MRI.
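A minimal sketch of the NPWE test statistic, t = sᵀEᵀEg, applying a radial eye filter once to the image and once to the expected signal (the eye-filter exponent and decay constant below are illustrative placeholders, not the paper's fitted values):

```python
import numpy as np

def eye_filter(f, n=1.3, c=0.04):
    """Radial visual-response model E(f) = f^n * exp(-c f^2).
    The exponent n and decay c are illustrative, not fitted values."""
    return f ** n * np.exp(-c * f ** 2)

def npwe_statistic(img, signal):
    """Non-prewhitening-eye observer: correlate the eye-filtered image
    with the eye-filtered expected signal, i.e. t = s^T E^T E g."""
    fy = np.fft.fftfreq(img.shape[0])[:, None]
    fx = np.fft.fftfreq(img.shape[1])[None, :]
    E = eye_filter(np.sqrt(fy ** 2 + fx ** 2) * img.shape[0])  # cycles/image
    img_e = np.real(np.fft.ifft2(np.fft.fft2(img) * E))
    sig_e = np.real(np.fft.ifft2(np.fft.fft2(signal) * E))
    return float(np.sum(img_e * sig_e))
```

In a 2-AFC trial the observer picks the alternative with the larger statistic; adding internal noise to the statistic, as done for the S-DOG observer in the paper, degrades the model toward human percent-correct levels.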
We have previously presented a method for sorting textures based on whether they obscure a signal, and thus hinder the ability of an observer to perform a signal-detection task, or whether their presence can be easily ignored by the observer and thus does little to impede performance. This analysis led to a surrogate figure of merit that was demonstrated to correlate with observer performance as measured by the channelized Hotelling observer. In this work, we generalize our previous results to include more tasks, including estimation and combined detection/estimation tasks. We demonstrate the ability of this method to determine which textures present in a set of images are the most detrimental to the specified task. We further devise alternative surrogate figures of merit that can utilize this texture-compression method as a mechanism for generating channels for observer-performance computations.
The purpose of this study was to evaluate radiologists' performance in detecting lung nodules in chest computed tomography (CT) scans when assisted by a computer-aided detection (CAD) system with a vessel suppression function. Three radiologists participated in this preliminary observer study, conducted on 80 CT scans including 94 nodules. The ratio of nodule-free scans to scans with nodules was 1:1. CAD systems with (CAD-VS) and without (CAD-nVS) a vessel suppression function were developed to assist radiologists in reading chest CT scans. The radiologists read the CT scans in a two-session process with at least a one-month interval in between. Free-response receiver operating characteristic (FROC) curves and localization receiver operating characteristic (LROC) curves were utilized to analyze the nodule detection results. The CAD-VS and CAD-nVS detected 96.8% and 93.6% of nodules, respectively, at 0.5 false positives per scan. For the observer study, the mean area under the LROC curve (LROC-AUC) for nodule detection improved from 0.877 with the CAD-nVS to 0.942 with the CAD-VS. Radiologists on average detected 94.0% and 96.5% of nodules with the CAD-nVS and CAD-VS, respectively; average specificity increased from 71.7% to 81.7%. The CAD-VS improved radiologists' performance for lung nodule detection compared to the conventional CAD-nVS. This suggests that the CAD-VS technique can help radiologists further improve the clinical detection accuracy of lung nodules in chest CT scans.
This study investigated the possibility of building an end-to-end deep learning-based model for the prediction of future breast cancer based on prior negative mammograms. We explored whether the probability of abnormal class membership given by the model was correlated with the gist of the abnormal as perceived by radiologists in negative prior mammograms. To build the model, an end-to-end network previously developed for breast cancer detection was fine-tuned for breast cancer prediction using a dataset containing 650 prior mammograms from women who were diagnosed with breast cancer in a subsequent screening and mammograms from 1000 cancer-free women. On a set of 630 test images, the model achieved an AUC of 0.73. For extracting gist responses, 17 experienced radiologists were recruited; each viewed mammograms for 500 milliseconds and gave a score, on a scale of 0-100, indicating whether they would categorize the case as normal or abnormal. The image set contained 40 normal and 40 current-cancer images along with 72 prior mammograms from women who would eventually develop breast cancer. We averaged the scores from the 17 readers to produce a single score per image. The network achieved an AUC of 0.75 for differentiating prior images from normal images. For the 72 prior mammograms, the output of the network was significantly correlated with the strength of the gist of the abnormal as perceived by experienced radiologists (Spearman's correlation = 0.84, p < 0.01). This finding suggests that the network successfully learned the representation of the gist of the abnormal in prior mammograms as perceived by experienced radiologists.
Deep neural network (DNN)-based image denoising methods have been proposed for use with medical images. These methods are commonly optimized and evaluated by use of traditional physical measures of image quality (IQ); however, objective, task-based evaluation of IQ for such methods remains largely lacking. In this study, task-based IQ measures are used to evaluate the performance of DNN-based denoising methods. Specifically, we consider signal detection tasks under background-known-statistically conditions. The performance of the ideal observer (IO) and the Hotelling observer (HO) is quantified, and detection efficiencies are computed to investigate the impact of the denoising operation on task performance. The experimental results show that, in the cases considered, the application of a denoising network generally results in a loss of task-relevant information. The impact of the depth of the denoising networks on task performance is also assessed: while mean squared error improved as network depth increased, signal detection performance degraded. These results highlight the need for objective evaluation of IQ for DNN-based denoising technologies and may suggest future avenues for improving their effectiveness in medical imaging applications.
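The Hotelling observer SNR and a detection efficiency of the kind computed in such studies can be sketched as follows (covariances here are estimated from samples, and the function names are our own):

```python
import numpy as np

def hotelling_snr(imgs_sig, imgs_bkg):
    """Hotelling observer SNR: SNR^2 = dv^T S^{-1} dv, with dv the mean
    class difference and S the average intra-class covariance, both
    estimated from flattened (N, D) image samples."""
    dv = imgs_sig.mean(axis=0) - imgs_bkg.mean(axis=0)
    S = 0.5 * (np.cov(imgs_sig.T) + np.cov(imgs_bkg.T))
    return float(np.sqrt(dv @ np.linalg.solve(S, dv)))

def detection_efficiency(snr_processed, snr_original):
    """Efficiency of an operation (e.g. a denoising network) on the
    detection task: the squared ratio of observer SNRs."""
    return (snr_processed / snr_original) ** 2
```

An efficiency below one after denoising is the quantitative signature of the "loss of task-relevant information" the abstract reports, even when mean squared error improves.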
Class Activation Mapping (CAM) can be used to obtain a visual understanding of the predictions made by Convolutional Neural Networks (CNNs), facilitating qualitative insight into these neural networks when they are, for instance, used for medical image analysis. In this paper, we investigate to what extent CAM also enables a quantitative understanding of CNN-based classification models through the creation of segmentation masks from class activation maps, targeting the use case of brain tumor classification. To that end, when a class activation map has been created for a correctly classified brain tumor, we additionally perform tumor segmentation by binarizing the aforementioned map, leveraging different methods for thresholding. We then compare this CAM-based segmentation mask to the segmentation ground truth, measuring similarity through Intersection over Union (IoU). Our experimental results show that, although our CNN-based classification models have a similarly high accuracy of between 86.0% and 90.8%, their generated masks differ: for example, our Modified VGG-16 model scores an mIoU of 12.2%, whereas AlexNet scores an mIoU of 2.1%. When comparing with the mIoU obtained by our U-Net-based models, which lies between 66.6% and 67.3%, and where U-Net is a dedicated pixel-wise segmentation model, our experimental results point to a significant difference in segmentation effectiveness. As such, the use of CAM for proxy segmentation, or as a ground-truth segmentation mask generator, comes with several limitations.
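A minimal sketch of the CAM binarization and IoU computation described above; the Gaussian "activation map" and square ground-truth mask below are illustrative stand-ins, and the fixed 0.5 threshold is just one of the thresholding choices the paper compares:

```python
import numpy as np

def cam_to_mask(cam, threshold=0.5):
    """Binarize a class activation map after scaling it to [0, 1]."""
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return cam >= threshold

def iou(mask, truth):
    """Intersection over Union between a CAM-derived mask and ground truth."""
    inter = np.logical_and(mask, truth).sum()
    union = np.logical_or(mask, truth).sum()
    return inter / union if union else 0.0

# Toy example: a blob-like CAM against a square ground-truth tumor mask
yy, xx = np.mgrid[0:64, 0:64]
cam = np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / (2 * 8 ** 2))
truth = np.zeros((64, 64), dtype=bool)
truth[24:40, 24:40] = True
score = iou(cam_to_mask(cam, 0.5), truth)
```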
Tomographic imaging is an ill-posed linear inverse problem and is often regularized using prior knowledge of the sought-after object property. However, typical hand-crafted priors, such as sparsity-promoting penalties, may be insufficient to comprehensively describe the prior knowledge of the object to be imaged. In order to utilize more detailed prior knowledge, data-driven methods using deep neural networks have recently been explored for learning a prior from existing image data. However, the ability of such data-driven methods to generalize to data that may lie outside the training distribution is still under investigation. This is particularly critical for medical imaging applications. To address such concerns, in this work we propose to understand the effect of the prior imposed by a reconstruction method by comparing the null-space components of the sought-after object and its reconstructed estimate, when ground-truth objects are available. The concept of a hallucination map is introduced for the purpose of assessing non-data-driven and data-driven regularization for image reconstruction. Numerical studies were conducted using stylized undersampled k-space measurements from publicly available magnetic resonance imaging (MRI) datasets. It is demonstrated that the proposed method can be employed to identify the source of false structures in estimates of the sought-after object for a given reconstruction method.
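One plausible formalization of the null-space comparison described above, for a discrete linear imaging operator A, uses the pseudoinverse projector A⁺A onto the measured subspace; the toy operator and objects below are illustrative only, and the exact definition in the paper may differ:

```python
import numpy as np

def null_space_component(P_meas, f):
    """Null-space component of object f, given the projector P_meas = A^+ A
    onto the measurement (row) space of the imaging operator A."""
    return f - P_meas @ f

def hallucination_map(A, f_true, f_est):
    """Difference of the null-space components of the estimate and the
    ground-truth object -- structures the measurements cannot explain."""
    P_meas = np.linalg.pinv(A) @ A
    return null_space_component(P_meas, f_est) - null_space_component(P_meas, f_true)

# Toy 1D example: a 2x-undersampled measurement of an 8-pixel object
rng = np.random.default_rng(2)
A = rng.normal(size=(4, 8))               # undersampled linear operator
f_true = rng.normal(size=8)
f_est = f_true + 0.1 * rng.normal(size=8) # a hypothetical reconstruction
h = hallucination_map(A, f_true, f_est)
```

By construction the map lies in the null space of A, i.e. A @ h ≈ 0, so it isolates structure invisible to the measurements.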
Understanding the repeatability of individual classifiers, in the context of overall classification performance and operating points, can contribute to improved design of computer-aided diagnosis (CADx) systems. Breast lesions (243 benign, 853 malignant; 1,096 total) were segmented using a fuzzy c-means method from dynamic contrast-enhanced magnetic resonance images acquired over 2005-2015. Thirty-eight radiomic features were extracted. Overall classification performance, case-based classification repeatability, and attainment of ‘preferred’ target and ‘optimal’ sensitivity and specificity were investigated for three classifiers: linear discriminant analysis, support vector machine, and random forest, using a 1000-iteration 0.632 bootstrap. The area under the receiver operating characteristic curve (AUC) for the task of classifying lesions as malignant or benign was determined using the 0.632+ bootstrap correction. AUC was compared between classifiers; statistical significance was indicated when the 98.33% confidence interval (CI) of the difference in AUC (corrected for multiple comparisons) excluded zero. Classifier repeatability was determined through the 95% CI width of classifier output by case across the classifier output range. Classifier output thresholds were determined from the training folds for target sensitivity (95%), target specificity (95%), and for a selected ‘optimal’ operating point determined by minimizing (1 − sensitivity)² + (1 − specificity)², and were applied to the test folds. No difference in AUC was observed between the three classifiers. Classifier output, however, was more repeatable when the random forest classifier was used, as indicated by a lower 95% CI width of classifier output overall. Moreover, only limited differences by classifier were observed in the thresholds required to attain the target and ‘optimal’ sensitivities and specificities, and in the sensitivities and specificities attained. CADx design may benefit from these considerations when selecting a classifier.
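The case-based repeatability measure described above, the 95% CI width of classifier output per case across bootstrap iterations, can be sketched as follows; the two synthetic "classifiers" are stand-ins, with NaN marking iterations in which a case fell in the training fold:

```python
import numpy as np

def case_ci_width(outputs, alpha=0.05):
    """95% CI width of classifier output per case across bootstrap iterations.

    outputs : (n_iterations, n_cases) classifier output for each case; NaN
              where the case was not in that iteration's test fold.
    """
    lo = np.nanpercentile(outputs, 100 * alpha / 2, axis=0)
    hi = np.nanpercentile(outputs, 100 * (1 - alpha / 2), axis=0)
    return hi - lo

# Synthetic stand-ins: a repeatable and a less repeatable classifier, 3 cases
rng = np.random.default_rng(3)
stable = 0.7 + 0.01 * rng.normal(size=(1000, 3))
noisy = 0.7 + 0.10 * rng.normal(size=(1000, 3))
w_stable = case_ci_width(stable)
w_noisy = case_ci_width(noisy)
```

A narrower per-case CI width (as for `stable` here) corresponds to the higher repeatability the study reports for the random forest.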
For some imaging modalities (e.g., Digital Breast Tomosynthesis, DBT), radiologists are provided, in addition to the 3D image stack, a 2D image known as the C-view, synthesized from the corresponding 3D slices. The functional perceptual interaction between the 2D image and 3D search remains unexplored: we have yet to elucidate the basic perceptual mechanisms of visual search and attention that drive possible added benefits of incorporating the C-view image in the diagnostic process. We explore how the presence of a 2D synthesized view influences signal detectability and eye movements during 3D search in 1/f^2.8 filtered noise backgrounds. Six trained observers searched for a microcalcification-like signal and a mass-like signal in 3D volumes (100 slices) with or without an additional 2D synthesized image (2D-S). The 2D-S was obtained by applying a high-pass filter and a pixelwise maximum operation across the slices. We found that the detection and localization of small microcalcification-like signals in the 3D images improves when they are presented together with the 2D-S (p < 0.01). For larger mass-like signals, there was an improvement, but not to the same extent as for the microcalcifications. Additionally, search times are significantly shorter for both signals when the 3D volume is accompanied by the 2D-S than when it is used alone (p < 0.05). Eye movement analysis showed significantly fewer search errors in the 2D-S + 3D condition relative to the 3D condition for the microcalcification (p < 0.001) but not for the mass. The results suggest that a 2D-S allows an observer to efficiently identify suspicious locations, guide the search in 3D, and mitigate detrimental effects of peripheral vision on the detectability of small signals.
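The 2D-S construction described above (high-pass filtering each slice, then taking a pixelwise maximum across slices) can be sketched as follows; the Gaussian-based high-pass filter and the toy single-voxel "microcalcification" are assumptions for illustration:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def synthesize_2d(volume, sigma=2.0):
    """Build a 2D synthesized view from a 3D stack: high-pass filter each
    slice (here via unsharp masking with a per-slice Gaussian blur), then
    take the pixelwise maximum across slices.

    volume : (n_slices, H, W)
    """
    low_pass = gaussian_filter(volume, sigma=(0, sigma, sigma))  # blur in-plane only
    high_pass = volume - low_pass
    return high_pass.max(axis=0)

# Toy stack: a small bright "microcalcification" on slice 30 of 100
vol = np.zeros((100, 32, 32))
vol[30, 16, 16] = 1.0
view = synthesize_2d(vol)
```

The bright point survives the projection, which is the property that lets the 2D-S guide 3D search toward suspicious locations.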
Mammographic test sets are a prominent form of quality assurance in breast screening, and in laboratory settings they have been associated with positive changes in radiologists’ performance. Focusing on this educational value, we examined the clinical audit history of 19 participants in the BreastScreen Reader Assessment Strategy (BREAST) test sets to investigate whether changes in clinical performance reflected test-set participation. Included participants were radiologists who read for BreastScreen New South Wales (NSW) between 2010 and 2018 and who read, on average, 2000 or more cases per year in that period. Their audit data covered the 2 years before and the 2 years after test-set participation. Wilcoxon signed-rank tests were used to investigate the differences in recall rates, cancer detection rates, and positive predictive value (PPV) for the cohort before and after test-set participation. The data indicated that, over time, radiologists significantly improved their recall rate (screening rounds 2+), PPV, and detection of ductal carcinoma in situ (DCIS). These results suggest that breast screening readers who participate in test-set readings improve their clinical performance.
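The paired before/after comparison described above can be sketched with SciPy's Wilcoxon signed-rank test; the per-reader PPV values below are synthetic stand-ins for the audit data:

```python
import numpy as np
from scipy.stats import wilcoxon

def before_after_test(before, after):
    """Paired Wilcoxon signed-rank test on a per-reader audit metric
    (e.g., PPV) from the 2 years before vs. after test-set participation."""
    stat, p = wilcoxon(before, after)
    return stat, p

# Synthetic stand-in: 19 readers whose PPV improved after the test set
rng = np.random.default_rng(4)
ppv_before = rng.uniform(0.05, 0.15, size=19)
ppv_after = ppv_before + rng.uniform(0.01, 0.05, size=19)
stat, p = before_after_test(ppv_before, ppv_after)
```

A signed-rank test is the natural choice here because the same readers are measured in both periods and the metric need not be normally distributed.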
This study aimed to investigate the effect on reading performance of how long radiologists have been awake (“time awake”) and the number of hours they slept at night (“hours slept at night”) before a reading session. Data from 133 mammographic reading assessments were extracted from the BreastScreen Reader Assessment Strategy database. Analysis of covariance was performed to determine whether sensitivity, specificity, lesion sensitivity, ROC, and JAFROC figures of merit were influenced by the time awake and the hours slept at night. The results showed that less experienced radiologists’ performance varied significantly with the time awake: lesion sensitivity was significantly lower among radiologists who performed readings after being awake for less than 2 h (44.6%) than among those who had been awake for 8 to <10 h (71.03%; p = 0.013); likewise, the same metric was significantly lower among those who had been awake for 4 to <6 h (47.7%) than among those who had been awake for 8 to <10 h (71.03%; p = 0.002) or 10 to <12 h (63.46%; p = 0.026). The ROC values of the less experienced radiologists also appeared to depend on the hours slept at night: values for those who had slept ≤6 h (0.72) were significantly lower than for those who had slept >6 h (0.77) (p = 0.029). The results indicate that less experienced radiologists’ performance may be affected by the time awake and the hours slept the night before a reading session.
Currently in the UK, a national trial testing the effect of a transition from traditional Full Field Digital Mammography (FFDM) to Digital Breast Tomosynthesis (DBT) is being conducted. DBT, having a higher sensitivity and specificity than FFDM alone, could be a better modality for national breast cancer screening. However, incorporating it into the very busy and detailed UK screening program is difficult: DBT reading times have been shown to be longer and more strenuous (Connor et al., 2012), so further research is needed to develop recommendations for efficient reading. One key factor in DBT reading is the progression of fatigue, as both a cause and an effect of prolonged reading times. We aimed to develop a program to process raw real-time eye-tracking data to identify changes in fatigue state through blink detection. Our focus was on analysing the whole data set and defining blinks through observed events. Two real-time signals generated by the eye tracker, the left and right ‘Eyelid Opening’ values, were considered. Through assessment of these signals, blinks of varying duration were identified. Additional parameters, such as recorded frame sequences and time stamps, were added to the processing to delineate the exact occurrence of these blinks during the reading process. We aim to analyse past and future large eye-tracked DBT reading files with our processing software to identify the point of fatigue onset in a DBT reading session.
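A minimal sketch of blink detection from a raw eyelid-opening signal, along the lines described above: a blink is taken to be a maximal run of frames in which the eyelid opening falls below a closure threshold. The threshold, frame interval, and toy signal are illustrative assumptions:

```python
import numpy as np

def detect_blinks(eyelid_opening, closed_threshold, frame_ms):
    """Find blinks in a raw eyelid-opening signal from an eye tracker.

    Returns (start_frame, duration_ms) pairs, one per maximal run of
    consecutive frames below `closed_threshold`. `frame_ms` is the
    sampling interval in milliseconds.
    """
    closed = eyelid_opening < closed_threshold
    blinks, start = [], None
    for i, c in enumerate(closed):
        if c and start is None:
            start = i                                  # closure begins
        elif not c and start is not None:
            blinks.append((start, (i - start) * frame_ms))  # closure ends
            start = None
    if start is not None:                              # signal ends mid-blink
        blinks.append((start, (len(closed) - start) * frame_ms))
    return blinks

# Toy signal sampled at 4 ms/frame with two eyelid closures
sig = np.array([9, 9, 1, 1, 1, 9, 9, 9, 1, 1, 9], dtype=float)
blinks = detect_blinks(sig, closed_threshold=5.0, frame_ms=4)
```

In practice the left- and right-eye signals would be processed separately and cross-checked, and the timestamps recorded by the tracker would replace the fixed frame interval.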
The UK national screening program for breast cancer currently uses Full Field Digital Mammography (FFDM). Various studies have shown that Digital Breast Tomosynthesis (DBT) has a higher sensitivity and specificity in identifying early breast cancer and separating it from benign pathologies, even in very dense breasts. This potentially makes DBT a better screening modality for detecting early breast cancer while minimizing false positive recall rates. However, DBT comprises multiple image slices, making case reading an inherently longer and potentially more visually fatiguing task. Our previous studies (Dong et al., 2017 and 2018) demonstrated the impact of institutional training on reading techniques in DBT, and the reading technique itself appears to affect total reading time. In other follow-on studies we have employed eye tracking, which gives rise to complex data sets, including parameters such as eyelid opening and pupil diameter measures, which can be employed to gauge blinks and fatigue onset. Findings from this work have guided changes in our blink identification techniques, and we have now developed semi-automated programmed processes that can analyze the large data set and provide a more accurate assessment of fatigue and vigilance parameters through blink detection. Here, we considered the ‘eyelid opening’ parameters of the left and the right eye separately; this separated approach allowed us to tease out particular aspects of blinking. Similar to Schleicher et al. (2008), we found ultra-short blinks (30–50 ms), short blinks (51–100 ms), long blinks (101–500 ms), and microsleeps (>500 ms). We argue that changes in the frequencies of these blinks can be used as a measure of vigilance and fatigue during DBT reading.
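The blink-duration categories reported above can be encoded directly; the bin boundaries follow the abstract, while the handling of events shorter than 30 ms is an assumption:

```python
def classify_blink(duration_ms):
    """Bin a blink duration into the categories used above
    (cf. Schleicher et al., 2008)."""
    if duration_ms > 500:
        return "microsleep"      # > 500 ms
    if duration_ms > 100:
        return "long"            # 101-500 ms
    if duration_ms > 50:
        return "short"           # 51-100 ms
    if duration_ms >= 30:
        return "ultra-short"     # 30-50 ms
    return "sub-blink"           # assumed: too brief to count as a blink

# Tally categories over a toy list of detected blink durations
counts = {}
for d in [40, 75, 120, 650, 20]:
    label = classify_blink(d)
    counts[label] = counts.get(label, 0) + 1
```

Tracking how these per-category counts drift over a reading session is the proposed vigilance/fatigue measure.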
Several previous studies have investigated the performance of radiologists in Western countries when reading 3D mammographic cases; however, the diagnostic efficacy of this modality in China is understudied. This study aimed to improve understanding of the reading performance of 3D mammography among Chinese radiologists and to compare their performance with that of Australian radiologists. A test set consisting of 35 3D mammography cases was used to assess reading performance. Twelve Chinese and twelve Australian radiologists read the test set independently and provided a score of 1–5 for each perceived cancer lesion. Case sensitivity, specificity, lesion sensitivity, and area under the receiver operating characteristic curve (AUC) were used to assess performance, and radiologists’ characteristics were collected. Performance metrics and characteristics were compared using Mann-Whitney U tests and Fisher’s exact tests. Higher specificity (0.65 vs 0.38, p=0.0003), lesion sensitivity (0.70 vs 0.40, p=0.0172), and AUC (0.81 vs 0.57, p=0.0001) were found for the Australian radiologists compared to their Chinese counterparts. There was no difference in case sensitivity (0.82 vs 0.75, p=0.31). The Australian group also reported more years of reading 3D mammography (p=0.0194), more cases read per week (p=0.0122), and more hours per week spent reading 2D mammography (p=0.0094). In conclusion, Australian radiologists showed higher reading performance on a 3D mammography test set than Chinese radiologists; training and education programs in 3D mammography may effectively address this discrepancy.
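The group comparison described above can be sketched with a two-sided Mann-Whitney U test; the per-reader AUC values below are synthetic stand-ins roughly matching the reported group means, not the study data:

```python
import numpy as np
from scipy.stats import mannwhitneyu

def compare_groups(scores_a, scores_b):
    """Compare a performance metric (e.g., per-reader AUC) between two
    independent reader cohorts with a two-sided Mann-Whitney U test."""
    stat, p = mannwhitneyu(scores_a, scores_b, alternative="two-sided")
    return stat, p

# Synthetic stand-ins: 12 readers per cohort
rng = np.random.default_rng(5)
auc_group1 = rng.normal(0.81, 0.03, size=12)   # e.g., Australian readers
auc_group2 = rng.normal(0.57, 0.05, size=12)   # e.g., Chinese readers
stat, p = compare_groups(auc_group1, auc_group2)
```

A rank-based test suits the small cohorts (n=12 each), where normality of the per-reader metrics cannot be assumed.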
In medical imaging, it is sometimes desirable to acquire high-resolution images that reveal anatomical and physiological information to support clinical practice such as diagnosis and image-guided therapies. However, for certain imaging modalities (e.g., magnetic resonance imaging (MRI)), acquiring high-resolution images can be a very time-consuming and resource-intensive process. One popular solution recently developed is to create a high-resolution version of the acquired low-resolution image by use of deep learning-based super-resolution (DL-SR) methods. It has been demonstrated in the literature that deep super-resolution networks can improve image quality as measured by traditional physical metrics such as mean squared error (MSE), the structural similarity index measure (SSIM), and peak signal-to-noise ratio (PSNR). However, it is not clear how well these metrics quantify the diagnostic value of the generated SR images. Here, a task-based super-resolution (SR) image quality assessment is conducted to quantitatively evaluate the efficiency and performance of DL-SR methods. A Rayleigh task is designed to investigate the impact of signal length and super-resolution network complexity on binary detection performance. Numerical observers (NOs), including the regularized Hotelling observer (RHO), an anthropomorphic Gabor channelized observer (Gabor CHO), and a ResNet-approximated ideal observer (ResNet-IO), are implemented to assess Rayleigh task performance. For the datasets considered in this study, little to no improvement in the task performance of the considered NOs was observed due to the considered DL-SR networks, despite substantial improvement in traditional IQ metrics.
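Numerical observers are often compared via detectability indices derived from AUC under an equal-variance Gaussian assumption, with observer efficiency defined as a squared ratio of detectabilities. A small sketch of those standard conversions (the AUC values are illustrative, not the study's results):

```python
import numpy as np
from scipy.stats import norm

def detectability(auc):
    """Detectability index d' from AUC under the equal-variance Gaussian
    assumption: AUC = Phi(d'/sqrt(2))  =>  d' = sqrt(2) * Phi^{-1}(AUC)."""
    return np.sqrt(2.0) * norm.ppf(auc)

def efficiency(auc_obs, auc_ref):
    """Observer efficiency relative to a reference observer:
    the squared ratio of their detectability indices."""
    return (detectability(auc_obs) / detectability(auc_ref)) ** 2

# e.g., a channelized observer (AUC 0.85) relative to an IO-approximating
# network (AUC 0.95) -- illustrative numbers only
eff = efficiency(0.85, 0.95)
```

Comparing such efficiencies before and after super-resolution is one way to quantify whether a DL-SR network preserves task-relevant information, independent of MSE/SSIM/PSNR gains.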
We propose a deep learning-based model observer (DLMO) to assess the quality of computed tomography (CT) images generated with a tin-filter-based spectral shaping technique. The DLMO was constructed from a simplified VGG neural network trained from scratch. The training and test image datasets were obtained by scanning an anthropomorphic phantom with high-fidelity pulmonary structure at four dose levels, with and without the tin filter. Spherical urethane foams were attached at various positions in the pulmonary tree to mimic ground-glass nodules (GGNs). These low-dose CT scan images were assessed by the trained DLMO for lung nodule detection. The results demonstrated that spectral shaping by tin filter can provide additional benefits in detection accuracy at an ultra-low dose level (~0.2 mGy), but faces challenges at an extremely low dose level (~0.05 mGy) due to significant noise. In the normal dose range (~0.5 to 1 mGy), images from scans with and without the tin filter achieved comparable detection accuracy on the mimic GGN objects. A human observer (HO) study performed by 8 experienced CT image quality engineers on the same dataset, set up as a signal-known-exactly (SKE) nodule detection task, indicated similar results.
Model observers for image classification usually rely on either knowing the statistics of the two classes or being able to estimate them. This is a reasonable assumption when designing simple experiments for object detection or localization, but it does not transfer well to more complex problems. In this paper, we present a new methodology for task-based image quality assessment based on a two-alternative forced choice comparison with an ensemble of cases characterizing the normal class (H0), without the need to fully describe the abnormal class (H1).
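One way to realize an H0-only statistic of the kind described above is a Mahalanobis distance to the normal-class ensemble, scored in a two-alternative forced choice (2AFC) fashion; this is a sketch under that assumption, not necessarily the paper's exact statistic, and the feature vectors below are synthetic:

```python
import numpy as np

def abnormality_score(image, normal_ensemble):
    """Squared Mahalanobis distance of an image (feature vector) to the
    normal-class ensemble -- an H0-only statistic; no H1 model is needed."""
    mu = normal_ensemble.mean(axis=0)
    S = np.cov(normal_ensemble, rowvar=False)
    d = image - mu
    return float(d @ np.linalg.solve(S, d))

def two_afc_pc(normal_scores, test_scores):
    """2AFC proportion correct: how often a test image out-scores a normal
    one (ties counted as half), equivalent to an empirical AUC."""
    wins = sum(t > n for t in test_scores for n in normal_scores)
    ties = sum(t == n for t in test_scores for n in normal_scores)
    return (wins + 0.5 * ties) / (len(test_scores) * len(normal_scores))

# Synthetic stand-ins: a normal ensemble and a shifted "abnormal" set
rng = np.random.default_rng(6)
normals = rng.normal(size=(500, 5))
abnormals = rng.normal(size=(50, 5)) + 1.5
s_norm = [abnormality_score(x, normals) for x in rng.normal(size=(50, 5))]
s_abn = [abnormality_score(x, normals) for x in abnormals]
pc = two_afc_pc(s_norm, s_abn)
```

Because only H0 statistics enter the score, the same machinery applies when the abnormal class is too heterogeneous to model explicitly.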