This PDF file contains the front matter associated with SPIE Proceedings Volume 11316, including the Title Page, Copyright information, Table of Contents, Author and Conference Committee lists.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks. You are receiving this notice because your organization may not have SPIE eBooks access.* *Shibboleth/Open Athens users: please sign in to access your institution's subscriptions. To obtain this item, you may purchase the complete book in print or electronic format on SPIE.org.
The study of human perception is as old as medical imaging. Understanding perception has yielded the rules of engagement for radiologists as they tackle “Where’s Waldo?” situations, the satisfaction-of-search problem, distractions, fatigue, the varying subtleties of disease states and normal anatomy, their prior training and experience, and the seemingly endless non-image-interpretation tasks associated with a radiology practice. The influence of artificial intelligence (AI) on a radiologist’s interpretation can range from weighing the suggestions of a first-year resident to incorporating the insights of a seasoned expert. Kundel’s eye-gaze experiments, which demonstrated the search patterns of radiologists and laymen, continue to be used today to understand the added influence of AI on the end user’s performance. Multi-disciplinary perception research has evolved from understanding human performance in the interpretation of medical images, to understanding computer-aided diagnosis (CAD), and now to understanding AI, whether as an aid to radiologists (a second reader, a concurrent reader, or a primary reader) or as a complete replacement. This lecture will take the audience through history to appreciate the role and necessity of perception (and its associated metrics of performance) in the development, validation, and ultimate future implementation of AI in the clinical radiology workflow.
We investigated the effects of using games to identify hidden abnormalities to enhance visual diagnostic skills in radiology residents. Radiology residents viewed 50 chest images while their eye position was recorded. They were then given a Where’s Waldo? book to study over 3 weeks, after which they reviewed the 50 chest images again. Performance in detecting abnormalities and visual search parameters were analyzed. There was no significant difference in detection performance between the Waldo and control groups from pre- to post-testing. With respect to the eye-tracking data, the first measure considered was total viewing time. Overall, there was a significant difference as a function of Waldo vs. control group (z = 2.332, p = 0.0197) and pre vs. post (F = 43.48, p < 0.0001), with those in the Waldo group experiencing a larger drop in viewing time from pre (mean = 20.65, sd = 15.29 Waldo; mean = 12.43, sd = 6.91 control) to post (mean = 14.62, sd = 10.32 Waldo; mean = 10.86, sd = 6.11 control). Fixation durations were significantly shorter (F = 16.51, p < 0.0001) post (mean = 328.36, sd = 210.41 Waldo; mean = 359.45, sd = 228.17 control) than pre (mean = 332.31, sd = 232.47 Waldo; mean = 378.32, sd = 234.88 control), and shorter for the Waldo group than controls (F = 188.56, p < 0.0001). Practicing with Where’s Waldo? or similar non-radiology search-task images may facilitate the acquisition of radiology image interpretation skills.
Pathology in the UK is on the verge of transformation from analogue to digital practice through the development of digital pathology (DP). Advances in technology have allowed this change to occur through the use of high-throughput slide scanners that digitize whole histopathology glass slides for viewing on computer workstations rather than through a conventional light microscope (LM). Previous studies have shown that the use of digital imaging to view histopathology slides benefits pathology departments. It allows pathologists to analyse samples remote from the laboratory, makes sharing of slides between pathologists more straightforward, and enables expert review out of hours. With the ability to electronically transfer slides from the laboratory to the reporting pathologist, DP may provide solutions for local shortages of pathologists across NHS trusts in the UK. However, a number of researchers argue that the costs of implementing digital pathology may outweigh its advantages. Moreover, images produced by DP systems are often of inferior resolution compared to conventional light microscopy. The lack of literature on this subject limits the adoption of this new technology by laboratories across the country. This multi-centre study aims to analyse how the study pathologists examine DP images of different pathology modalities using eye-tracking technology, using data on their reading and interpretation technique to improve performance and contribute to the adoption of DP across the UK.
Numerous factors contribute to radiologist image reading discrepancy and interpretive errors. However, a factor often overlooked is how interpretations might be affected by the time of day when the image reading takes place, a factor that other disciplines have shown to be a determinant of competency. This study therefore investigates whether radiologists’ reading performance varies according to the time of day at which the readings take place. We evaluated 197 mammographic reading assessments collected from the BreastScreen Reader Assessment Strategy (BREAST) database, which included reading timestamps and radiologists’ demographic data, and conducted an analysis of covariance to determine whether time of day influenced the radiologists’ specificity, lesion sensitivity, and jackknife alternative free-response receiver operating characteristic (JAFROC) figure of merit. After adjusting for radiologist experience and fellowship, we found a significant effect of the time of day of the readings on specificity but none on lesion sensitivity or JAFROC. Radiologist specificity was significantly lower in the late morning (10 am–12 pm) and late afternoon (4 pm–6 pm) than in the early morning (8 am–10 am) or early afternoon (2 pm–4 pm), indicating a higher rate of false-positive interpretations in the late morning and late afternoon. Thus, the time of day at which mammographic image readings take place may influence radiologists’ performance, specifically their ability to identify normal images correctly. These findings present significant implications for radiologic clinicians.
Computer Vision Syndrome (CVS) is an umbrella term for a pattern of symptoms associated with prolonged digital screen exposure, such as eyestrain, headaches, blurred vision, dry eyes, and neck/shoulder pain. Commercially available blue light filtering lenses (BLFL) are advertised as improving CVS. This pilot study evaluated the effectiveness of BLFL in reducing CVS symptoms and fatigue in a cohort of radiology trainees. In this Institutional Review Board approved prospective crossover study, 10 radiology residents were randomized into two cohorts: one wearing BLFL first and then a sham pair (non-BLFL), and the other wearing the sham pair first and then BLFL, over the course of a typical clinical work day for 5 days. Every evening, participants filled out a questionnaire based on a previously validated CVS questionnaire (CVS-Q: 16 questions, Likert scale 1–5) and the Swedish Occupational Fatigue Index (SOFI: 16 questions, Likert scale 0–10). Ten radiology residents (8 PGY-2, 1 PGY-3, and 1 PGY-4; 4 male, 6 female) participated. Although none of the 32 symptoms demonstrated statistically significant differences, 11/16 (68.8%) symptoms measured on the CVS-Q and 13/16 (81.3%) symptoms measured on the SOFI were reduced with the BLFL compared to the sham glasses. Two symptoms, “drowsy” and “lack of concern,” decreased in the BLFL cohort and approached statistical significance (p = 0.057 and p = 0.075, respectively). Use of BLFL may ameliorate CVS symptoms. Future studies with larger sample sizes and participants of different ages are required to verify the potential of BLFL.
Purpose: Blinded Independent Central Review (BICR) is a well-accepted method employed in many oncology registration trials. Ongoing monitoring of radiologist “reader” performance is both good clinical trial practice and a requirement of regulatory authorities. We continue to use the Reader Disagreement Index (RDI) as an important measure in BICR. In this work we studied RDI as an early indicator for identifying an outlier reader during the monitoring of reader performance in BICR. Early indication would enable early intervention and thus possibly improve trial outcomes.
Methods: We performed a retrospective analysis of readers’ RDIs in nineteen different clinical trials. Ninety-two reader performances were examined at five intervals in each trial. These individual trial reviews were conducted by forty-three board-certified radiologist readers using several established imaging assessment trial criteria. The objective was to see how well RDI performance above a threshold at progressive monitoring intervals would “flag” a potential overall end-point performance “issue” for that specific reader.
Results: We present results for the prediction of exceeding threshold (one standard deviation above a study mean RDI). Sensitivity, Specificity, Positive Predictive Value (PPV) and Negative Predictive Value (NPV) were determined for the predicted performance outcomes. We explored interpreting multiple “flags” for each trial to improve the aforementioned metrics.
Conclusions: One would expect that a “flag” of RDI exceeding threshold at an early stage would likely give a useful prediction of end-point reader performance. We refined our methods to use multiple flags, which enables statistically improved Specificity and PPV. Improved predictive capability at early-stage intervals, coupled with persistent monitoring across subsequent intervals, will enable trial managers to focus on specific readers. An earlier indication of possible reader performance issues can permit proactive intervention and enhance good trial practices.
Given the wide variety of CT reconstruction algorithms currently available (from filtered back projection, to non-linear iterative algorithms, and now even deep learning approaches), there is a pressing need for reconstruction quality metrics that correlate well with task-specific goals. For detection tasks, metrics based on a model observer framework are an attractive option. In this framework, a reconstruction algorithm is assessed based on how well a statistically optimal "model observer" performs on a signal-present/signal-absent detection task. However, computing exact model observers requires a detailed description of the statistics of the reconstructed images, which are often unknown or computationally intractable to obtain, especially in the case of non-linear reconstruction algorithms. Instead, we study the feasibility of using supervised machine learning approaches to approximate model observers in a CT reconstruction setting. In particular, we show that we can well-approximate the Hotelling observer, i.e., the optimal linear classifier, for a signal-known-exactly/background-known-exactly task by training on labeled images in the case of FBP reconstruction. We also investigate the feasibility of training multi-layer neural networks to approximate the ideal observer in the case of total-variation-constrained iterative reconstruction. Our results demonstrate that supervised machine learning methods achieve close to ideal performance in both cases.
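As a concrete illustration of the Hotelling observer referenced above: for a signal-known-exactly task it can be estimated directly from labeled sample images as w = S^-1 (difference of class means), with S the average within-class covariance. The sketch below uses hypothetical white-noise images, not the paper's CT reconstructions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 16                       # images per class, pixels per vectorized image

# Hypothetical setup: white-noise backgrounds plus a known additive signal.
signal = np.zeros(d)
signal[6:10] = 1.5                   # signal-known-exactly template
absent = rng.normal(size=(n, d))
present = rng.normal(size=(n, d)) + signal

# Hotelling template: w = S^-1 (mean_present - mean_absent).
delta = present.mean(axis=0) - absent.mean(axis=0)
S = 0.5 * (np.cov(present, rowvar=False) + np.cov(absent, rowvar=False))
w = np.linalg.solve(S, delta)

# Linear test statistic and its empirical detectability.
t_a, t_p = absent @ w, present @ w
snr = (t_p.mean() - t_a.mean()) / np.sqrt(0.5 * (t_p.var() + t_a.var()))
auc = (t_p[:, None] > t_a[None, :]).mean()   # empirical AUC over all pairs
```

The supervised-learning approach in the abstract approximates the same template from labeled training images when S cannot be computed or inverted directly.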
In medical imaging systems, task-based metrics have been advocated as a means of evaluating image quality. Mathematical observers are one method of computing such metrics. Although the Bayesian Ideal Observer (IO) is optimal by definition, it is frequently intractable and non-linear. Linear approximations to the IO are sometimes employed to obtain task-based statistics when computing the IO is infeasible. The optimal linear observer for maximizing the SNR of the test statistic is the Hotelling Observer (HO). However, the computational cost of computing the HO increases with image size and becomes intractable for larger images. Channelized methods of reducing the dimensionality of the data before computing the HO have become popular, with efficient channels capable of approximating the HO’s performance at significantly reduced computational cost. State-of-the-art channels have been learned by using an autoencoder (AE) to encode data with a known signal template as the desired reconstruction, but the method is dependent on a high-quality estimate of the signal. An alternative to channels is approximating the test statistic directly using a feed-forward neural network (FFNN). However, this approach can overfit when the amount of training data is limited. In this work, a generalized method for learning channels utilizing an AE with dual losses (AEDL) is proposed. The AEDL framework jointly minimizes both task-specific and reconstruction losses to learn a set of efficient channels, even when the number of training images is relatively small. Preliminary results indicate that the proposed network outperforms state-of-the-art methods on the selected imaging task. Additionally, the AEDL framework suffers from less overfitting than the FFNN.
The Ideal Observer (IO) performance has been advocated when optimizing medical imaging systems for signal detection tasks. However, analytical computation of the IO test statistic is generally intractable. To approximate the IO test statistic, sampling-based methods that employ Markov-Chain Monte Carlo (MCMC) techniques have been developed. However, current applications of MCMC techniques have been limited to a few object models, such as a lumpy object model and a binary texture model, and it remains unclear how MCMC methods can be implemented with other, more sophisticated object models. Deep learning methods that employ generative adversarial networks (GANs) hold great promise for learning stochastic object models (SOMs) from image data. In this study, we describe a method to approximate the IO by applying MCMC techniques to SOMs learned by use of GANs. The proposed method can be employed with arbitrary object models that can be learned by use of GANs, thereby extending the domain of applicability of MCMC techniques for approximating IO performance. Both signal-known-exactly (SKE) and signal-known-statistically (SKS) binary signal detection tasks are considered. The IO performance computed by the proposed method is compared to that computed by the conventional MCMC method. The advantages of the proposed method are discussed.
Model observers that replicate human observers are useful tools for assessing image quality based on detection tasks. Linear model observers including nonprewhitening matched filters (NPWMFs) and channelized Hotelling observers (CHOs) have been widely studied and applied successfully to evaluate and optimize detection performance. However, there is still room for improvement in predicting human observer responses in detection tasks. In this study, we used a convolutional neural network to predict human observer responses in a two-alternative forced choice (2AFC) task for PET imaging. Lesion-absent and lesion-present images were reconstructed from clinical PET data with simulated lesions added to the liver and lungs and were used for the 2AFC task. We trained the convolutional neural network to discriminate images that human observers chose as lesion-present and lesion-absent in the 2AFC task. We evaluated the performance of the trained network by calculating the concordance between human observer responses and predicted responses from the network output and compared it to those of NPWMF and CHO. The trained network showed better agreement with human observers than the linear NPWMF and CHO model observers. The results demonstrate the potential for convolutional neural networks as model observers that better predict human performance. Such model observers can be used for optimizing scanner design, imaging protocols, and image reconstruction to improve lesion detection in PET imaging.
Model Observers (MO) are algorithms designed to evaluate and optimize the parameters of new medical imaging reconstruction methodologies by providing a measure of human accuracy for a diagnostic task. In contrast with a computer-aided diagnosis system, MOs are not designed to outperform human diagnosis but only to find a defect if a radiologist would be able to detect it. These algorithms can make the search for optimal reconstruction parameters cheaper and faster by reducing the number of sessions with expert radiologists, which are costly and prolonged. Convolutional Neural Networks (CNN or ConvNet) have been successfully used in the computer vision field for image classification, segmentation, and video analytics. In this paper, we propose and test several U-Net configurations as MOs for a defect localization task on synthetic images with different levels of correlated noisy backgrounds. Preliminary results show that the CNN-based MO has potential and that its accuracy correlates well with that of human observers.
Radiologists often read screening mammograms in batches of sequentially read images. This work investigates ways that readers change over the course of a batch. We evaluate sequential reading effects in terms of suspicion scores and reading times from an ongoing study in the Netherlands.
A set of 3510 screening cases read as part of a national screening program by 10 qualified radiologist readers forms the basis for our study. The readers give a suspicion score (on a standalone device) in addition to their standard screening report. The score is time-stamped so that reading order and batch grouping can be assessed. Batches are defined as groups of cases with less than 10 minutes (600 s) between sequential readings. We use Kendall’s tau, weighted by batch size, as a measure of association between batch position and suspicion score or reading time. Randomization is used to get confidence intervals on the null hypothesis (τ = 0).
We find significant associations between batch position and both of the variables under investigation (suspicion scores and reading time). The associations are negative, suggesting that both suspicion and reading time are reduced at later points in a batch. These results are consistent with the hypothesis that readers are becoming visually adapted to the properties of the images as they progress through a batch of cases, affecting their perception and decisions about the images.
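The tau-plus-randomization analysis described above can be sketched as follows. The data here are simulated (a single batch with a linearly declining reading time), and the plain tau-a statistic stands in for the batch-size-weighted version used in the study.

```python
import numpy as np

def kendall_tau(x, y):
    """Plain (tau-a) Kendall correlation via pairwise concordance counts."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    s = 0.0
    for i in range(n):
        s += np.sum(np.sign(x[i + 1:] - x[i]) * np.sign(y[i + 1:] - y[i]))
    return 2.0 * s / (n * (n - 1))

rng = np.random.default_rng(1)
batch_pos = np.arange(30)                              # position within a batch
read_time = 20 - 0.2 * batch_pos + rng.normal(0, 2, 30)  # simulated decline

tau_obs = kendall_tau(batch_pos, read_time)

# Randomization: permute reading times to sample the null distribution of tau.
null = np.array([kendall_tau(batch_pos, rng.permutation(read_time))
                 for _ in range(2000)])
lo, hi = np.quantile(null, [0.025, 0.975])             # 95% interval under tau = 0
significant = (tau_obs < lo) or (tau_obs > hi)
```

A negative tau_obs falling below the randomization interval corresponds to the study's finding that suspicion and reading time decline with batch position.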
Breast density is an important risk factor for breast cancer and has a substantial effect on the sensitivity of mammography screening. This study aimed to evaluate intra- and inter-reader variability of visual breast density assessment in Saudi Arabia, using the American College of Radiology (ACR) Breast Imaging Reporting and Data System (BI-RADS) breast density categories (5th edition) and Visual Analogue Scales (VAS). A random sample of 102 screening mammograms from the Saudi National Breast Cancer Screening Programme (SNBCSP) was assessed twice by two breast screening consultant radiologists for intra-reader variability. Inter-reader variability was assessed using screening mammograms from 1132 women. Each mammogram was assessed by two readers from a pool of 11 radiologists. Inter-reader variability for two mammography technologists using a sample of 75 mammograms is also reported. For intra-reader variability, radiologist A had excellent agreement for VAS [Intraclass Correlation Coefficient (ICC) = 0.95] and BI-RADS [weighted kappa (κ) = 0.88]; radiologist B had lower but still excellent agreement for VAS [ICC = 0.88] and substantial agreement for BI-RADS [κ = 0.71]. Inter-reader variability between radiologists showed overall moderate agreement for BI-RADS [κ = 0.61], while VAS had excellent agreement [ICC = 0.89]. Inter-reader agreement between the two mammography technologists was fair using BI-RADS [κ = 0.35] and moderate using VAS [ICC = 0.41]. In conclusion, agreement in breast density assessment by radiologists in the Saudi breast screening programme is acceptable. Mammography technologists showed lower agreement for both methods. Training is essential to increase reader agreement; double reading is also important in such population-based breast cancer screening programmes.
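The weighted kappa used above for BI-RADS agreement has a compact form. Below is a minimal sketch with quadratic disagreement weights; the abstract does not specify which weighting scheme was used, so this is illustrative only.

```python
import numpy as np

def quadratic_weighted_kappa(r1, r2, n_cat):
    """Quadratic-weighted kappa between two raters' ordinal labels in 0..n_cat-1."""
    O = np.zeros((n_cat, n_cat))
    for a, b in zip(r1, r2):
        O[a, b] += 1                              # observed agreement table
    O /= O.sum()
    E = np.outer(O.sum(axis=1), O.sum(axis=0))    # expected table under independence
    i, j = np.indices((n_cat, n_cat))
    d = (i - j) ** 2 / (n_cat - 1) ** 2           # quadratic disagreement weights
    return 1.0 - (O * d).sum() / (E * d).sum()

# Example with BI-RADS density categories a-d mapped to 0-3:
# identical ratings give kappa = 1; fully reversed ratings give kappa = -1.
k_perfect = quadratic_weighted_kappa([0, 1, 2, 3, 2, 1], [0, 1, 2, 3, 2, 1], 4)
k_reversed = quadratic_weighted_kappa([0, 1, 2, 3], [3, 2, 1, 0], 4)
```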
We evaluated a radiomics/machine learning method for dynamic contrast-enhanced magnetic resonance (DCE-MR) images of breast lesions and the impact of case-based classification repeatability on sensitivity and specificity. DCE-MR images of 1,169 unique breast lesions (267 benign, 902 malignant) were retrospectively collected under HIPAA/IRB. Lesions were automatically segmented using a fuzzy c-means method, and thirty-eight radiomic features were extracted. Three classification tasks were investigated: (i) benign vs. malignant, (ii) pure ductal carcinoma in situ (DCIS) vs. DCIS with invasive ductal carcinoma (IDC), and (iii) luminal A or luminal B cancers vs. other molecular subtypes. Case-based repeatability of classifier output was assessed using 0.632+ bootstrap sampling (1000 iterations) with classification by support vector machine (SVM). Repeatability profiles were constructed for each task using the 95% confidence interval widths of the classifier output for cases in the test folds over all bootstrap iterations. The relationships between classifier output repeatability and variability in sensitivity and specificity over the bootstrap test folds were investigated. Most cases fell within the highest repeatability of classifier output over all three classification tasks. Sensitivity and specificity demonstrated more variability in the test folds than in the training folds at corresponding thresholds for the classifier output. Higher repeatability of classifier output was associated with lower variability in sensitivity and specificity in tasks (i) and (ii) but not in task (iii). Case-based repeatability profiles may be important for characterizing the impact of using radiomics with desired sensitivity and specificity.
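The per-case repeatability profile described above can be sketched as follows. Simulated features and a nearest-mean linear classifier stand in for the paper's radiomic features and SVM, and plain out-of-bag bootstrap resampling (without the 0.632+ correction) is used for brevity.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 200, 5
X = rng.normal(size=(n, d))
y = rng.integers(0, 2, n)
X[y == 1] += 0.8                                 # "malignant" cases shifted

B = 200
outputs = [[] for _ in range(n)]                 # per-case classifier outputs
for _ in range(B):
    boot = rng.integers(0, n, n)                 # bootstrap training sample
    oob = np.setdiff1d(np.arange(n), boot)       # out-of-bag test fold
    mu0 = X[boot][y[boot] == 0].mean(axis=0)
    mu1 = X[boot][y[boot] == 1].mean(axis=0)
    w = mu1 - mu0                                # nearest-mean direction (stand-in for SVM)
    for i, s in zip(oob, X[oob] @ w):
        outputs[i].append(s)

# Repeatability profile: width of the 95% interval of each case's outputs.
widths = np.array([np.quantile(o, 0.975) - np.quantile(o, 0.025)
                   for o in outputs])
```

Cases with small interval widths are "highly repeatable" in the sense of the abstract; the distribution of widths forms the repeatability profile.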
This study explored the possibility of using the gist signal (radiologists’ first impression about a case) to improve the performance of two recently developed deep learning-based breast cancer detection tools. We investigated whether combining the cancer class probability from the networks with the gist signal can achieve higher performance in identifying malignant cases. In total, we recruited 53 radiologists, who provided an abnormality score on a scale from 0 to 100 to unilateral mammograms following a 500-millisecond presentation of the image. Twenty cancer cases, 40 benign cases, and 20 normal cases were included. Two state-of-the-art deep learning-based tools (M1 and M2) for breast cancer detection were adopted. The abnormality scores from the networks and the gist responses for each observer were fed into a support vector machine (SVM). The SVM was personalized for each radiologist and its performance was evaluated using leave-one-out cross-validation. We also considered the average reader, whose gist responses were the mean abnormality scores given by all 53 readers to each image. The mean and range of AUCs in the gist experiment were 0.643 and 0.492-0.794, respectively. The AUC values for M1 and M2 were 0.789 (0.632-0.892) and 0.814 (0.673-0.897), respectively. For the average reader, the AUCs for gist, gist+M1, and gist+M2 were 0.760 (0.617-0.862), 0.847 (0.754-0.928), and 0.897 (0.789-0.946), respectively. For 45 readers, the performance of at least one of the models improved after aggregating its output with the gist signal. The results showed that the gist signal has the potential to improve the performance of the adopted deep learning-based tools.
A common study design for comparing the diagnostic performance of imaging modalities is to obtain modality-specific ratings from multiple readers of multiple cases whose true statuses are known. Typically, there is overlap between the modalities, readers, and/or cases, for which special analytical methods are needed to perform statistical comparisons. We describe our new R software package MRMCaov, which is designed for multi-reader multi-case comparisons of two or more imaging modalities. The software allows for the comparison of reader performance metrics, such as area under the receiver operating characteristic curve (ROC AUC), with analysis of variance methods originally proposed by Obuchowski and Rockette (1995) and later unified and improved by Hillis and colleagues (2005, 2007, 2008, 2018). MRMCaov is an open-source package with an integrated command-line interface for performing multi-reader multi-case statistical analysis, plotting, and presenting results. Features of the package include (1) ROC curves estimated parametrically or non-parametrically; (2) reader-specific ROC curves and performance metrics; (3) user-definable performance metrics; (4) modality-specific estimates of mean performance along with confidence intervals and p-values for statistical comparisons; (5) support for factorial, nested, or partially paired study designs; (6) inference for random readers and cases, random readers and fixed cases, or fixed readers and random cases; (7) DeLong, jackknife, or unbiased covariance estimation; and (8) compatibility with Microsoft Windows, Mac OS, and Linux.
Two-alternative forced-choice (2AFC) reader studies are useful for evaluating medical imaging devices because humans can rapidly make direct comparisons with high precision, leading to low variability in study results. We propose a method for estimating the receiver operating characteristic (ROC) curve, reader performance (area under the ROC curve, AUC), and uncertainty on AUC from a series of 2AFC trials on a finite data set. Our method greatly reduces the number of 2AFC comparisons required by using a sorting algorithm, in this case merge sort. By altering the algorithm to work in discrete layers, we can make unbiased estimates as the study proceeds. Because the merging is pre-planned with a tree structure, we can use a Hanley-McNeil approximation to predict the reduction in variance of AUC from performing more 2AFC comparisons. The algorithm is also altered to increase the amount of time between the reader seeing the same image repeatedly, thus decreasing potential learning effects. We compare our method with that of Massanes and Brankov (2016).
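The Hanley-McNeil approximation mentioned above is a standard closed form for the variance of an empirical AUC; a minimal implementation shows how predicted variance shrinks as the numbers of positive and negative cases grow. The example AUC and case counts below are illustrative, not values from the study.

```python
def hanley_mcneil_var(auc, n_pos, n_neg):
    """Hanley-McNeil (1982) approximation to the variance of an empirical AUC."""
    q1 = auc / (2.0 - auc)                 # P(two positives outrank one negative)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)      # P(one positive outranks two negatives)
    return (auc * (1.0 - auc)
            + (n_pos - 1) * (q1 - auc ** 2)
            + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)

# Doubling the case counts roughly halves the predicted AUC variance.
v_small = hanley_mcneil_var(0.8, 50, 50)
v_large = hanley_mcneil_var(0.8, 100, 100)
```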
Multi-reader multi-case (MRMC) studies are widely used in assessing medical imaging and computer-aided diagnosis devices to demonstrate the generalizability of diagnostic performance to both the population of patient cases and the population of physician readers. Simulation of MRMC study data plays an important role in validating MRMC data analysis methods or sizing a pivotal study based on pilot data. The popular Roe and Metz simulation model is a linear mixed-effect model that expresses a human reader's latent decision variable in assessing a patient's likelihood of disease as the sum of a fixed modality effect, a random reader effect, a random case effect, random interaction effects, and a random error term. The fixed effect is represented by a mean parameter, and each random effect is represented by the variance parameter of a zero-mean Gaussian distribution. The purpose of this paper is to develop a method for setting these parameters such that the simulated data have realistic ROC performance characteristics (mean AUC, variance components of AUC, and inter-reader/inter-modality correlations). To this end, we derived quasi-closed-form expressions for the mean AUC and its U-statistic variance components as functions of the simulation model parameters. We then developed a numerical algorithm to solve for the simulation model parameters from the mean AUC and its U-statistic variance components. Since the mean AUC and its U-statistic variance components can be estimated from real-world reader study data, the simulated data have performance characteristics similar to those of the real-world data. Simulation studies were conducted to verify our parameter transformation algorithm.
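The linear mixed-effect structure described above can be sketched directly: for each truth state, a score is the fixed modality separation (added only for diseased cases) plus independent zero-mean Gaussian reader, case, and interaction effects plus error. The variance values below are illustrative placeholders, not calibrated Roe-Metz parameters:

```python
import numpy as np

def simulate_rm_scores(n_mod=2, n_reader=5, n_case=100, delta=1.5,
                       var_r=0.01, var_c=0.1, var_tr=0.01,
                       var_tc=0.1, var_rc=0.2, seed=0):
    """Draw Roe-Metz-style latent decision variables, shaped
    (n_mod, n_reader, n_case), for non-diseased and diseased cases.
    Error variance is set so each score has unit total variance."""
    rng = np.random.default_rng(seed)

    def block(shift):
        R  = rng.normal(0, var_r ** 0.5,  (1, n_reader, 1))       # reader
        C  = rng.normal(0, var_c ** 0.5,  (1, 1, n_case))         # case
        TR = rng.normal(0, var_tr ** 0.5, (n_mod, n_reader, 1))   # mod x reader
        TC = rng.normal(0, var_tc ** 0.5, (n_mod, 1, n_case))     # mod x case
        RC = rng.normal(0, var_rc ** 0.5, (1, n_reader, n_case))  # reader x case
        e  = rng.normal(0, (1 - var_r - var_c - var_tr - var_tc
                            - var_rc) ** 0.5, (n_mod, n_reader, n_case))
        return shift + R + C + TR + TC + RC + e

    return block(0.0), block(delta)   # non-diseased, diseased
```

From such simulated score arrays, the empirical AUC and its U-statistic variance components can be computed per reader and modality, which is exactly the mapping the paper inverts to calibrate the model parameters.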
The most frequently used model for simulating MRMC data that emulate confidence-of-disease ratings from diagnostic imaging studies has been the Roe and Metz (RM) model, proposed in 1997. The RM model generates continuous confidence-of-disease ratings based on an underlying equal-variance binormal model for each reader, with the separation between the normal and abnormal rating distributions varying across readers. A problem with the RM model is that its parameters are expressed in terms of the rating distributions, as opposed to the reader performance outcomes. Because MRMC analysis results are almost always expressed in terms of reader performance outcomes, and not in terms of the rating-data distributions, it has been difficult to assess how similar the simulated data are to MRMC data encountered in practice. To remedy this situation, Hillis (2018) recently derived formulas expressing parameters that describe the distribution of empirical AUC outcomes computed from RM-simulated data as functions of the RM parameters. An examination of these values revealed several problems with the realism of the simulated data. This paper continues that work by providing the inverse mapping, i.e., by deriving an algorithm that expresses the RM parameters as functions of the empirical AUC distribution parameters. This result will enable the creation of a recalibrated RM model that more closely emulates real-data studies.
The overlapping structures in a chest radiograph can make the detection of pneumothorax difficult. In addition, the visual signs of a pneumothorax, including a fine line at the edge of the lung and a change in texture outside the lung, can be subtle. Some published studies have reported high performance using deep learning for the detection of pneumothorax in chest radiographs using the publicly available ChestX-ray8 dataset. However, at the image input sizes these studies used, 256 x 256 or 224 x 224 pixels, the visual signs of a pneumothorax are typically not visible. In this study, radiographs labeled as pneumothorax in the ChestX-ray8 dataset were interpreted by a radiologist and then confirmed using the radiologist-defined truth from a pneumothorax challenge database. In addition, chest radiographs with and without pneumothorax were obtained from our institution and verified. Therefore, the entire dataset of 5,346 radiographs had truth confirmed by two radiologists. The dataset was used for fine-tuning a VGG19 neural network for the task of detecting pneumothorax in chest radiographs. After fine-tuning was complete, network visualization was performed using Grad-CAM to determine the most influential regions of the radiograph for the network's classification. It was found that 67% of Grad-CAM heatmaps for correctly classified pneumothorax cases did not have regions of high influence that overlapped with the actual location of the pneumothorax. Overall, the independent test set yielded an AUC of 0.78 (95% confidence interval: 0.74, 0.82) in the task of distinguishing between radiographs with and without pneumothorax.
Laparoscopic videos can be affected by different distortions that may impair surgical performance and introduce surgical errors. In this work, we propose a framework for automatically detecting and identifying such distortions and their severity using video quality assessment. This work makes three major contributions: (i) a novel video enhancement framework for laparoscopic surgery; (ii) a publicly available database for quality assessment of laparoscopic videos, evaluated by both expert and non-expert observers; and (iii) objective video quality assessment of laparoscopic videos, including correlations with expert and non-expert scores.
The objective optimization of medical imaging systems requires full characterization of all sources of randomness in the measured data, which includes the variability within the ensemble of objects to-be-imaged. This can be accomplished by establishing a stochastic object model (SOM) that describes the variability in the class of objects to-be-imaged. Generative adversarial networks (GANs) can be potentially useful to establish SOMs because they hold great promise to learn generative models that describe the variability within an ensemble of training data. However, because medical imaging systems record imaging measurements that are noisy and indirect representations of object properties, GANs cannot be directly applied to establish stochastic models of objects to-be-imaged. To address this issue, an augmented GAN architecture named AmbientGAN was developed to establish SOMs from noisy and indirect measurement data. However, because the adversarial training can be unstable, the applicability of the AmbientGAN can be potentially limited. In this work, we propose a novel training strategy, Progressive Growing of AmbientGANs (ProAGAN), to stabilize the training of AmbientGANs for establishing SOMs from noisy and indirect imaging measurements. An idealized magnetic resonance (MR) imaging system and clinical MR brain images are considered. The proposed methodology is evaluated by comparing signal detection performance computed by use of ProAGAN-generated synthetic images and images that depict the true object properties.
Assessment of computed tomography (CT) images can be complex due to a number of dependencies that affect system performance. In particular, it is well known that noise in CT is object-dependent. Such object-dependence can become more pronounced and extend to resolution and image texture with the increasing adoption of model-based reconstruction and processing with machine learning methods. Moreover, such processing is often inherently nonlinear, complicating assessments with simple measures of spatial resolution. Similarly, recent advances in CT system design have attempted to improve fine resolution details, e.g., with newer detectors and smaller focal spots. Recognizing these trends, there is a greater need for imaging assessments that consider specific features of interest that can be placed within an anthropomorphic phantom for realistic emulation and evaluation. In this work, we devise a methodology for 3D-printing phantom inserts using procedural texture generation for evaluating the performance of high-resolution CT systems. Accurate representation of texture has previously been a hindrance to the adoption of processing methods like model-based reconstruction, and texture serves as an important diagnostic feature (e.g., lesion heterogeneity is a marker for malignancy). We consider the ability of different systems to reproduce various textures (as a function of the intrinsic feature sizes of the texture), comparing microCT, cone-beam CT, and diagnostic CT using normal- and high-resolution modes. We expect that this general methodology will provide a pathway for repeatable and robust assessments of different imaging systems and processing methods.
In this study, we show that when a training data set is supplemented by drawing samples from a distribution that differs from that of the target population, the differences between the original and supplemental training distributions should be considered to maximize the performance of the classifier in the target population. Depending on these distributions, drawing a large number of cases from the supplemental distribution may result in lower performance than limiting the number of added cases. This is relevant for medical images when synthetic data are used for training a machine learning algorithm, which may result in a mixed distribution for the training set. We simulated a two-class classification problem and determined the performance of a linear classifier and a neural network classifier on test cases when trained with cases from only the target distribution, and when cases from a shifted, supplemental distribution were added to a limited number of cases from the target distribution. We show that adding data from a supplemental distribution for machine learning classifier training may improve performance on the target test distribution. However, given the same number of training cases from a mixed distribution, the performance may not reach that of training only on data from the target distribution. In addition, the increase in performance will peak or plateau, depending on the shift in the distribution and the number of cases from the supplemental distribution.
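The effect is easy to reproduce numerically (an illustrative setup, not the authors' simulation): fit a linear discriminant on training draws, possibly mixed with a mean-shifted "supplemental" population, and score its nonparametric AUC on held-out target-distribution cases.

```python
import numpy as np

def lda_auc(tr_x, tr_y, te_x, te_y):
    """Fit a Fisher linear discriminant on the training set and return
    its empirical AUC on the test set."""
    mu0, mu1 = tr_x[tr_y == 0].mean(0), tr_x[tr_y == 1].mean(0)
    cov = np.cov(tr_x.T) + 1e-6 * np.eye(tr_x.shape[1])
    w = np.linalg.solve(cov, mu1 - mu0)          # Hotelling-style template
    s0, s1 = te_x[te_y == 0] @ w, te_x[te_y == 1] @ w
    diff = s1[:, None] - s0[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def draw(rng, n, mean_shift=(0.0, 0.0)):
    """n cases per class; class means separated along the first axis,
    optionally shifted to emulate a supplemental distribution."""
    x0 = rng.normal(0, 1, (n, 2)) + mean_shift
    x1 = rng.normal(0, 1, (n, 2)) + mean_shift + (2.0, 0.0)
    return np.vstack([x0, x1]), np.repeat([0, 1], n)
```

Training once on a small target-only set and once with many supplemental cases added (e.g., `draw(rng, 200, mean_shift=(0.0, 2.0))`), while always testing on target draws, reproduces the peak-then-plateau behavior described above as the supplemental fraction grows.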
In general, deep networks are biased by the truth data provided to the network in training. Many recent studies have focused on understanding and avoiding biases in deep networks so that they can be corrected in future predictions. In particular, as deep networks see increasingly widespread deployment, it is important that biases are explored to understand where predictions can fail. One potential source of bias is the truth data provided to the network. For example, if a training set consists of only white males, predictive performance will likely be better on a testing set of white males than on a testing set of African-American females. The U-Net architecture is a deep network that has seen widespread use over recent years, particularly for medical imaging segmentation tasks. The network is trained using a binary mask delineating the object to be segmented, which is typically produced using manual or semi-automated methods. It is possible for the manual/semi-automated method to yield biased truth; thus, the purpose of our study is to evaluate the impact of varying truth data, as provided by two different observers, on U-Net segmentation performance. Additionally, a common problem in medical imaging research is a lack of data, forcing many studies to be performed with insufficient datasets. However, the U-Net has been shown to achieve sufficient segmentation performance with small training set sizes, so we also investigate the impact of training set size on U-Net performance for a simple segmentation task in low-dose thoracic CT scans. This also supports that the results produced in the observer-variability portion of this study are not caused by a lack of sufficient training data.
We investigate a series of two-alternative forced-choice (2AFC) discrimination tasks based on malignant features of abnormalities in low-dose lung CT scans. Three tasks are evaluated: a size-discrimination task, a boundary-sharpness task, and an irregular-interior task. Target and alternative signal profiles for these tasks are modulated by one of two system transfer functions and embedded in ramp-spectrum noise that has been apodized for noise control in one of four different ways. This gives the resulting images statistical properties related to weak ground-glass lesions in axial slices of low-dose lung CT images. We investigate observer performance in these tasks using a combination of statistical efficiency and classification images. We report results of 24 2AFC experiments involving the three tasks. A staircase procedure is used to find the approximate 80%-correct discrimination threshold in each task, with a subsequent set of 2,000 trials at this threshold. These data are used to estimate statistical efficiency with respect to the ideal observer for each task, and to estimate the observer template using the classification-image methodology. We find that efficiency varies between the tasks, with the lowest efficiency in the boundary-sharpness task and the highest in the irregular-interior task. All three tasks produce clearly visible patterns of positive and negative weighting in the classification images. The spatial-frequency plots of the classification images show how apodization results in larger weights at higher spatial frequencies.
Model observers for image quality assessment have been extensively used in the field of medical imaging. The majority of model observer developments have involved signal detection tasks with a small number of signal locations and models that have not explicitly incorporated the varying resolution of visual processing across the visual field (foveated vision). Here, we evaluate search performance by human and model observers in 3D search and 2D single-slice search with digital breast tomosynthesis (DBT) virtual phantom images containing a single simulated macrocalcification. We compare the ability of a channelized Hotelling observer (CHO) and a foveated channelized Hotelling observer (FCHO) to predict human performance across 2D and 3D search. Human performance in detecting the macrocalcification signal was significantly higher in 2D than in 3D (proportion correct, PC = 0.89 vs. 0.68). However, the CHO model predicted lower performance in 2D than in 3D search (PC = 0.84 vs. 0.93). The FCHO, which processes the visual field with decreasing spatial detail as distance from the point of fixation increases, executes eye movements, and scrolls across slices, correctly predicts the relative performance for detection of the macrocalcification in 2D and 3D search (PC = 0.92 vs. 0.59). These results suggest that foveation is a key component of model observers that predict human performance in detecting small signals in DBT search.
Medical imaging systems are commonly assessed by use of objective image quality measures. Supervised deep learning methods have been investigated to implement numerical observers for task-based image quality assessment. However, labeling large amounts of experimental data to train deep neural networks is tedious, expensive, and prone to subjective errors. Computer-simulated image data can potentially be employed to circumvent these issues; however, it is often difficult to computationally model complicated anatomical structures, noise sources, and the response of real-world imaging systems. Hence, simulated image data will generally possess physical and statistical differences from the experimental image data they seek to emulate. Within the context of machine learning, these differences between the two sets of images are referred to as domain shift. In this study, we propose and investigate the use of an adversarial domain adaptation method to mitigate the deleterious effects of domain shift between simulated and experimental image data for deep learning-based numerical observers (DL-NOs) that are trained on simulated images but applied to experimental ones. In the proposed method, a DL-NO is initially trained on computer-simulated image data and subsequently adapted for use with experimental image data, without the need for any labeled experimental images. As a proof of concept, a binary signal detection task is considered. The success of this strategy as a function of the degree of domain shift present between the simulated and experimental image data is investigated.
Purpose: To develop a deep learning approach for channelization of the Hotelling model observer (DL-CHO) and apply it to task-based image quality evaluation of digital breast tomosynthesis (DBT) using a structured phantom. Methods: An acrylic semi-cylindrical container was filled with different sizes of acrylic spheres and water. Five 3D-printed non-spiculated mass models were also inserted in the phantom, each with a different size (diameter from 1.5 mm to 5.7 mm). The phantom was scanned on 8 different DBT systems at 3 dose levels on each system, giving a total of 594 DBT scans. Nearly half of the image dataset was read by human readers using a 4-alternative forced choice (4-AFC) paradigm. From the human results, an anthropomorphic DL-CHO was developed and trained, utilizing a single convolutional layer with five kernels functioning as channels. After 50 training epochs, the convolutional kernels were fixed and then validated with the second half of the image dataset. Statistical analysis of the goodness of fit between the newly developed DL-CHO and human observers was performed to estimate the appropriateness of the new CHO for multivendor tomosynthesis studies. Results: The DL-CHO shows good agreement with human observers for all 8 DBT systems, with Pearson's correlation between 0.90 and 0.99, linear regression slope between 0.60 and 1.17, and mean error between -5.6 and 12 percent correct (PC). The DL-CHO shows better reproducibility than human observers for most of the lesion sizes. Conclusions: The DL-CHO offers a robust and efficient means of evaluating DBT test-object images for the purpose of DBT system image quality evaluation.
We propose a convolutional neural network (CNN)-based anthropomorphic model observer to predict human observer detection performance for breast cone-beam CT images. We generated the breast background with a 50% volume glandular fraction and inserted a 2 mm diameter spherical signal near the center. Projection data were acquired using a forward projection algorithm and were reconstructed using Feldkamp-Davis-Kress reconstruction. To generate different noise structures, the projection data were filtered with Hanning, Shepp-Logan, and Ram-Lak filters, with and without Fourier interpolation, resulting in six different noise structures. To investigate the benefits of non-linearity in the CNN, we used two different network architectures: a linear CNN (Li-CNN) without any activation function and a multi-layer CNN (ML-CNN) with a leaky rectified linear unit. For comparison, we also used a nonprewhitening observer with an eye filter (NPWE) having its peak value at a frequency of 7 cyc/deg, based on our previous work. We trained the CNN to minimize the mean squared error using 12,000 pairs of signal-present and signal-absent images that were labeled with the decision variable from the NPWE. When labeling, the eye-filter parameter of the NPWE was fine-tuned separately for each noise structure to match percent correct to that of human observers. Note that we trained a single network for the different noise structures, whereas the template of the NPWE was estimated for each noise structure. We conducted four-alternative forced-choice detection experiments, and the percent correct of human and model observers was compared. The results show that the proposed ML-CNN better predicts the detection performance of human observers than the NPWE and Li-CNN.
In this work, we aim to accurately segment the cerebral vasculature on MRI-TOF images. This study is part of a wider project in which we intend to characterize arterial bifurcations in order to estimate the risk of occurrence of intra-cranial aneurysms (ICA). A very accurate segmentation of the vasculature is needed along the Circle of Willis (where most intra-cranial aneurysms occur) before launching the bifurcation characterization: an imprecise segmentation of the Circle of Willis will inevitably lead to a deficient characterization, and thus an erroneous ICA risk estimation. This study was motivated by the lack of efficiency of various state-of-the-art segmentation methods. In this work, we try to mimic the behavior of the human visual system in order to correctly segment the Circle of Willis on TOF imaging of the brain. When neuroradiologists diagnose an aneurysm on an MRI volume, they modulate the image contrast and luminance so that the vasculature is highlighted within the image. We first consider the display monitor behavior and exploit a model that mimics the perception of contrast by a human observer in order to accentuate the vasculature before the final segmentation step. Thanks to this perceptual contrast enhancement, the amplitude of the vasculature moves beyond the rest of the image (parenchyma, cerebrospinal fluid, etc.); this perceptual contrast stretching then simplifies the final thresholding step.
We have implemented a technique for analyzing and characterizing the textures in medical images. This technique generates a list of characteristic textures and sorts them from most important to least important for the task of detecting a specific signal in the image. The effects of the human visual system can be incorporated into this method through the use of an eye filter. The final set of sorted textures can be quickly utilized to analyze new sets of images and make comparisons regarding performance on the same task. This analysis is based upon whether the new set of images contains textures that are similar or dissimilar to those of the original set of images. We present the method for analyzing and sorting textures based on how well signals can be distinguished. We also discuss the importance of the most "obscuring" textures that make signal detection difficult. Results and comparisons of task performance are presented.
Image quality assessment is important for maintaining and improving imaging system performance, and conducting a human observer study is considered the most desirable approach because the human makes the diagnostic decision. However, performing a human observer study is time-consuming and expensive. As an alternative, mathematical model observers that mimic human observer performance have been proposed. In this work, we propose a convolutional neural network (CNN)-based anthropomorphic model observer and compare its performance with that of a human observer and a dense difference-of-Gaussian channelized Hotelling observer (D-DOG CHO) for breast tomosynthesis images. The proposed network contained input image, 2D convolution, batch normalization, leaky ReLU, fully connected, and regression layers, and we trained the network using a stochastic gradient descent with momentum (SGDM) optimizer, varying design parameters such as filter size and number of filters. For the training, validation, and testing data sets, anatomical background with a 30% volume glandular fraction was generated using the power-law spectrum of breast anatomy, and a sphere with a 1 mm diameter was used as the lesion for the detection task. In-plane breast tomosynthesis images were obtained using filtered back-projection-based tomosynthesis reconstruction. To evaluate the detection performance of the human observer, the D-DOG CHO, and the proposed network, we calculated percent correct (Pc) as a figure of merit. Our results show that the detectability of the proposed network containing twenty 11 x 11 convolution filters is most similar to that of the human observer.
The signal-known-statistically (SKS) detection task is more clinically relevant than the signal-known-exactly (SKE) detection task. However, anthropomorphic model observers for SKS tasks have not been studied as extensively as those for SKE tasks. In this study, we compare the ability of conventional model observers (i.e., the channelized Hotelling observer and the nonprewhitening observer with an eye filter) and convolutional neural networks (CNNs) to predict human observer performance on SKS and background-known-statistically tasks in breast cone-beam CT images. As model observers, we implement (1) a model that combines the responses of each signal template and (2) a two-layer CNN. We implement the two-layer CNN in linear and nonlinear schemes; the nonlinear CNN contains a max-pooling layer and a nonlinear activation function, which the linear CNN does not. Both linear and nonlinear CNN-based model observers predict the rank of human observer performance for different noise structures better than the conventional model observers.
Subjective reading remains the dominant mode of current medical image diagnostics, and how images are visualized to the observer strongly affects reading performance. Display window settings play a significant role in the display quality of CT images. To improve greyscale-based image contrast detectability, we propose automatically adjusting the window settings according to human visual properties. With the optimized window settings, greyscale-based image contrast is enhanced, reading performance is improved by maximizing the visibility of the target objects the observer is focusing on, and the image impression is kept reasonably consistent.
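The window setting being optimized above is the standard center/width mapping from CT values to display grey levels; a minimal sketch of that mapping (not the authors' adjustment algorithm) is:

```python
# Illustrative sketch: mapping CT values in Hounsfield units (HU) to display
# greyscale with a window center/width -- the setting this work proposes to
# adjust automatically according to human visual properties.

def apply_window(hu, center, width, levels=256):
    """Clamp-and-scale an HU value into [0, levels-1] display grey levels."""
    lo = center - width / 2.0
    hi = center + width / 2.0
    if hu <= lo:
        return 0
    if hu >= hi:
        return levels - 1
    return int((hu - lo) / (hi - lo) * (levels - 1))

# Example: a soft-tissue window (center 40 HU, width 400 HU).
print(apply_window(40, 40, 400))     # mid-grey: 127
print(apply_window(-1000, 40, 400))  # air clips to black: 0
print(apply_window(500, 40, 400))    # dense bone clips to white: 255
```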
When using a conventional analogue dermatoscope, imperfect focus or other image quality problems are readily detected and corrected by the dermatologist. In digital dermoscopy, however, images may be inspected by the dermatologist well after their acquisition, leaving no possibility to remedy image quality deficiencies. This underscores the importance of image quality in digital dermoscopy. In this paper, we investigate the effects of resolution, blur, and color consistency on the perceived quality of digital dermoscopic images. The effects are quantified using task-based adaptive measurements of human observers' perceived thresholds for each imaging parameter. Our results indicate that the thresholds for the imaging parameters depend on the image content and the observer, and that the acceptable quality threshold lies well below what is achievable with our current imaging equipment. An unexpected and interesting observation made in the course of the experiments is that a user may prefer a less colorful image because it enhances the visualization of certain details.
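The adaptive threshold measurements above belong to the family of staircase procedures; the specific rule used in the paper is not stated, so a classic 1-up/2-down staircase (which converges to the roughly 70.7%-correct point of the psychometric function) is sketched here as an illustration.

```python
# Hedged sketch of a task-based adaptive (staircase) threshold measurement.
# The degradation level (e.g., blur strength) decreases after two consecutive
# detections and increases after each miss, homing in on the threshold.

def staircase(respond, start, step, n_trials):
    """respond(level) -> True if the observer detects the degradation.
    Returns the sequence of visited levels."""
    level, hits, levels = start, 0, []
    for _ in range(n_trials):
        levels.append(level)
        if respond(level):
            hits += 1
            if hits == 2:           # two consecutive detections: step down
                level, hits = level - step, 0
        else:                        # miss: step up, reset the hit counter
            level, hits = level + step, 0
    return levels

# Toy observer with a hard detection threshold at level 5.
history = staircase(lambda lv: lv > 5, start=10, step=1, n_trials=12)
print(history)  # descends from 10 and oscillates around the threshold
```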
Image quality assessment (IQA) is an important step in determining whether computed tomography (CT) images are suitable for diagnosis. Since high-dose CT images are usually not accessible in clinical practice, no-reference (NR) CT IQA must be used. Most deep learning based NR-IQA methods for CT images focus on global information and ignore local characteristics, i.e., the contrast and edges of local regions. In this work, to address this issue, we present a new NR-IQA framework for CT images that combines global and local information, termed NR-GL-IQA for simplicity. First, NR-GL-IQA adopts a convolutional neural network to predict entire-image quality blindly, without a reference image; in this stage, an elaborate strategy automatically labels the entire-image quality for network training, avoiding the time-consuming manual annotation of massive numbers of CT images. Second, the Perception-based Image QUality Evaluator (PIQUE) is used to predict local region quality, because PIQUE can adaptively capture local region characteristics. Finally, the overall image quality is estimated by combining the global and local IQA. Experimental results on the Mayo dataset demonstrate that NR-GL-IQA accurately predicts CT image quality, and that the combination of global and local IQA comes closer to the radiologists' assessment than either assessment alone.
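The final step, fusing the global (network-predicted) and local (PIQUE-style) scores into one overall quality index, is not detailed in the abstract; a weighted average is a minimal illustrative form.

```python
# Illustrative fusion of a global and a local quality score. The actual
# combination rule in the paper may differ; both scores are assumed to be
# normalized to [0, 1], with alpha weighting the global branch.

def combine_quality(global_score, local_score, alpha=0.5):
    assert 0.0 <= alpha <= 1.0
    return alpha * global_score + (1.0 - alpha) * local_score

# Example: strong global quality, weaker local quality, global-leaning weight.
print(combine_quality(0.8, 0.6, alpha=0.7))
```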
Purpose: This work aims to develop an anthropomorphic convolutional neural network (CNN) classifier based on the ResNet18 deep learning network and to validate it for task-based image quality evaluation of digital breast tomosynthesis (DBT) using a structured phantom with non-spiculated simulated mass lesions. Methods: The phantom is constructed from an acrylic breast-shaped container filled with acrylic spheres and water resembling the background. Five 3D-printed non-spiculated mass targets are inserted in the phantom, with sizes ranging from 1.5 mm to 5.7 mm. The phantom was scanned 530 times on 8 different DBT systems at 3 dose levels. Half of the image dataset was read by human readers in a 4-alternative forced-choice (4-AFC) paradigm, and the 4-AFC human scores were used to label the cropped signal-present and signal-absent images. A pre-trained ResNet18 neural network was modified for binary classification, and the labeled images were used to further train the network for the specific non-spiculated mass detection task. After 50 training epochs, the resulting ResNet18 classifier was validated with the second half of the image dataset against the human results. During the training process the loss and accuracy were stored, and statistical analysis was performed to validate the ResNet18 against the human observers. Results and conclusions: The ResNet18 classifier shows good agreement with the human observers for most of the DBT systems and reading sessions; the overall correlation was higher than 0.92. The study shows that a CNN can successfully approximate human scores and can be used for future DBT system image quality estimation studies.
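The correlation quoted above can be computed as a Pearson coefficient between the classifier's scores and the human 4-AFC scores; a minimal sketch follows, with made-up example values (the paper's actual data are not reproduced here).

```python
# Minimal sketch of the validation statistic: Pearson correlation between
# CNN scores and human 4-AFC scores across systems/reading sessions.
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical per-condition scores, for illustration only.
cnn_scores = [0.62, 0.71, 0.80, 0.90, 0.95]
human_scores = [0.60, 0.74, 0.78, 0.88, 0.97]
print(round(pearson(cnn_scores, human_scores), 3))
```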
With the advent of powerful convolutional neural networks (CNNs), recent studies have extended early applications of neural networks to imaging tasks, making CNNs a potential new tool for assessing medical image quality. Here, we compare a CNN to model observers in a search task for two possible signals (a simulated mass and a smaller simulated microcalcification) embedded in filtered noise and in single slices of Digital Breast Tomosynthesis (DBT) virtual phantoms. For the case of filtered noise, we show how a CNN can approximate the ideal observer for a search task, achieving a statistical efficiency of 0.77 for the microcalcification and 0.78 for the mass. For search in single slices of DBT phantoms, we show that Channelized Hotelling Observer (CHO) performance is degraded by false positives related to anatomic variations, resulting in detection accuracy below human observer performance. In contrast, the CNN learns to identify and discount the backgrounds, achieving performance comparable to that of human observers and superior to the model observers (proportion correct for the microcalcification: CNN = 0.96; humans = 0.98; CHO = 0.84; proportion correct for the mass: CNN = 0.98; humans = 0.83; CHO = 0.51). Together, our results provide an important evaluation of CNN methods by benchmarking their performance against human and model observers in complex search tasks.
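The statistical efficiency quoted above is the squared ratio of detectability indices relative to the ideal observer. The search task in the paper is more complex, but the SKE-style 2-AFC relation Pc = Φ(d′/√2), hence d′ = √2·Φ⁻¹(Pc), gives a minimal sketch of the computation.

```python
# Hedged sketch of statistical efficiency relative to the ideal observer,
# using the 2-AFC relation between proportion correct and d'. The paper's
# search-task efficiency is computed on a more complex task; this is the
# simplest SKE-style version of the same quantity.
import math
from statistics import NormalDist

def dprime_2afc(pc):
    """d' from 2-AFC proportion correct: d' = sqrt(2) * inverse-Phi(Pc)."""
    return math.sqrt(2.0) * NormalDist().inv_cdf(pc)

def efficiency(pc_obs, pc_ideal):
    """Efficiency = (d'_observer / d'_ideal)^2, in (0, 1] for a suboptimal observer."""
    return (dprime_2afc(pc_obs) / dprime_2afc(pc_ideal)) ** 2

# Example: an observer at 85% correct vs. an ideal observer at 90% correct.
print(round(efficiency(0.85, 0.90), 2))
```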
Diverse approaches compete for the objective assessment of skin erythema, but spectral measurement techniques stand out by identifying distinct states of skin health through a unique spectral signature. In this work, we selected two spectral techniques for the assessment of erythema induced by radiation therapy of skin cancer: diffuse reflectance spectroscopy (DRS) measurements and hyperspectral imaging (HSI). The purpose of this work is to evaluate the performance of DRS and HSI against the visual assessment (VA) technique, the gold standard for erythema assessment. For evaluation purposes, erythema indices were computed for both spectral techniques, and the Pearson correlation with the VA scores was then computed. The results showed that HSI had a higher correlation with VA than the DRS technique. In sum, DRS suffers from a limited measurement region and from requiring contact with the skin, neither of which is the case with HSI.
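The erythema indices mentioned above are typically log-ratio quantities built from reflectance in the green band (strongly absorbed by hemoglobin) and the red band (largely unaffected); one simple form is sketched below, though the exact indices computed in the paper may differ.

```python
# Illustrative erythema index from reflectance values: redder (inflamed) skin
# absorbs more green light while red reflectance changes little, so a
# green-vs-red log-absorbance difference rises with erythema severity.
import math

def erythema_index(r_red, r_green):
    """log10 apparent-absorbance difference between green and red bands."""
    return math.log10(1.0 / r_green) - math.log10(1.0 / r_red)

# Hypothetical reflectance values, for illustration only.
normal = erythema_index(r_red=0.55, r_green=0.40)
inflamed = erythema_index(r_red=0.50, r_green=0.20)
print(inflamed > normal)  # True: more erythema gives a larger index
```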