The application of computer-vision algorithms in medical imaging has increased rapidly in recent years. However, algorithm training is challenging due to limited sample sizes, a lack of labeled samples, and privacy concerns regarding data sharing. To address these issues, we previously developed (Bergen et al. 2022) a synthetic PET dataset for Head & Neck (H&N) cancer using the temporal generative adversarial network (TGAN) architecture and evaluated its utility for lesion segmentation and radiomics feature identification in the synthesized images. In this work, a two-alternative forced-choice (2AFC) observer study was performed to quantitatively evaluate the ability of human observers to distinguish between real and synthesized oncological PET images. In the study, eight trained readers, including two board-certified nuclear medicine physicians, read 170 real/synthetic image pairs presented as 2D transaxial slices using a dedicated web app. For each image pair, the observer was asked to identify the "real" image and to record their confidence on a 5-point Likert scale. P-values were computed using the binomial test and the Wilcoxon signed-rank test. A heat map was used to compare the distribution of response accuracy underlying the signed-rank test. Response accuracy for all observers ranged from 36.2% [27.9-44.4] to 63.1% [54.8-71.3]. Six out of eight observers did not identify the real image with statistical significance, indicating that the synthetic dataset was reasonably representative of oncological PET images. Overall, this study adds validity to the realism of our simulated H&N cancer dataset, which may be implemented in the future to train AI algorithms while favoring patient confidentiality and privacy protection.
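The binomial test used for the 2AFC responses can be sketched as follows. Under the null hypothesis of chance performance (p = 0.5, i.e., the observer cannot distinguish real from synthetic), the two-sided exact p-value is the total probability of all outcomes at least as extreme as the observed number of correct identifications. The function name and the example count below are illustrative, not taken from the study:

```python
from math import comb

def binom_two_sided(k, n):
    """Two-sided exact binomial test p-value under chance (p = 0.5).

    Sums the probability of every outcome whose probability mass is
    no larger than that of the observed count k.
    """
    pmf = [comb(n, i) for i in range(n + 1)]  # unnormalized binomial masses
    threshold = pmf[k]
    tail = sum(m for m in pmf if m <= threshold)
    return tail / 2 ** n

# Illustrative: an observer correct on 107 of 170 image pairs
p = binom_two_sided(107, 170)
print(f"p = {p:.4f}")
```

An accuracy near 50% yields p close to 1 (indistinguishable from guessing), while accuracies far from 50% in either direction yield small p-values.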
Artificial intelligence (AI)-based methods are showing substantial promise in segmenting oncologic positron emission tomography (PET) images. For clinical translation of these methods, assessing their performance on clinically relevant tasks is important. However, these methods are typically evaluated using metrics that may not correlate with performance on those tasks. One such widely used metric is the Dice score, a figure of merit that measures the spatial overlap between the estimated segmentation and a reference standard (e.g., manual segmentation). In this work, we investigated whether evaluating AI-based segmentation methods using Dice scores yields a similar interpretation as evaluation on the clinical tasks of quantifying metabolic tumor volume (MTV) and total lesion glycolysis (TLG) of the primary tumor from PET images of patients with non-small cell lung cancer. The investigation was conducted via a retrospective analysis of the ECOG-ACRIN 6668/RTOG 0235 multi-center clinical trial data. Specifically, we evaluated different structures of a commonly used AI-based segmentation method using both Dice scores and the accuracy in quantifying MTV/TLG. Our results show that evaluation using Dice scores can lead to findings that are inconsistent with evaluation using the task-based figure of merit. Thus, our study motivates the need for objective task-based evaluation of AI-based segmentation methods for quantitative PET.
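The two kinds of figures of merit contrasted here are easy to state concretely. A minimal sketch, treating segmentations as sets of voxel indices (the function names and the 1-D example are illustrative, not the study's implementation):

```python
def dice_score(seg, ref):
    """Dice similarity: 2|A ∩ B| / (|A| + |B|) over sets of voxel indices."""
    if not seg and not ref:
        return 1.0  # convention: two empty segmentations agree perfectly
    return 2 * len(seg & ref) / (len(seg) + len(ref))

def mtv_tlg(seg, suv, voxel_volume_ml):
    """Task-based quantities: MTV (mL) and TLG (mL * SUV).

    MTV is the segmented volume; TLG is MTV times the mean SUV
    over the segmented voxels (suv maps voxel index -> SUV).
    """
    mtv = len(seg) * voxel_volume_ml
    suv_mean = sum(suv[v] for v in seg) / len(seg)
    return mtv, mtv * suv_mean

# Illustrative 1-D example
ref = {1, 2, 3, 4}
est = {2, 3, 4, 5}
print(dice_score(est, ref))  # 0.75
```

Note that two segmentations with the same Dice score against a reference can yield different MTV/TLG errors, which is exactly why the abstract argues spatial overlap alone may not predict task performance.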
Preclinical PET imaging is widely used to quantify in vivo biological and metabolic processes at the molecular level in small animals. In preclinical PET, low-count acquisition has numerous benefits in terms of animal logistics, maintaining integrity in longitudinal multi-tracer studies, and increased throughput. Low-count acquisition can be realized either by decreasing the injected dose or by shortening the acquisition time. However, both approaches reduce the number of detected photons, generating PET images with low signal-to-noise ratio (SNR) that exhibit poor image quality, lesion contrast, and quantitative accuracy. This study is aimed at developing a deep-learning (DL) based framework to generate high-count PET (HC-PET) from low-count PET (LC-PET) images using Residual U-Net (RU-Net) and Dilated U-Net (D-Net) architectures. Preclinical PET images at different photon count levels were simulated using a stochastic and physics-based method and fed into the framework. The integration of residual learning in the U-Net architecture enhanced feature propagation, while the dilated kernels enlarged the receptive field of view to incorporate multiscale context. Both DL methods exhibited significantly (p≤0.05) better performance in terms of Structural Similarity Index Metric (SSIM), Peak Signal-to-Noise Ratio (PSNR), and Normalized Root Mean Square Error (NRMSE) when compared to existing non-DL denoising techniques such as Non-Local Means (NLM) and BM3D filtering. In an objective evaluation of the quantification task, the DL-based approaches yielded significantly lower bias in determining the mean standardized uptake value (SUVmean) of the liver and of tumor lesions than the non-DL approaches. Of the DL frameworks, D-Net based generation of HC-PET had the least bias and coefficient of variation at all photon count levels. Our study suggests that DL can predict HC-PET images with improved visual quality and quantitative accuracy from LC-PET (preclinical) images.
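Two of the fidelity metrics named above have simple closed forms. A minimal sketch over flattened image vectors, assuming the common conventions of PSNR in dB for a given dynamic range and NRMSE normalized by the reference range (other normalizations exist; this is one choice, not necessarily the study's):

```python
from math import log10, sqrt

def rmse(est, ref):
    """Root mean square error between two equal-length image vectors."""
    return sqrt(sum((e - r) ** 2 for e, r in zip(est, ref)) / len(ref))

def psnr(est, ref, data_range):
    """Peak signal-to-noise ratio in dB: 20 * log10(MAX / RMSE)."""
    return 20 * log10(data_range / rmse(est, ref))

def nrmse(est, ref):
    """RMSE normalized by the reference dynamic range (one common convention)."""
    return rmse(est, ref) / (max(ref) - min(ref))
```

Higher PSNR and lower NRMSE indicate a denoised image closer to the high-count reference; SSIM additionally weighs local structure and is usually computed with a library such as scikit-image rather than by hand.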
Objective evaluation of quantitative imaging (QI) methods with patient data is highly desirable but is hindered by the lack or unreliability of an available gold standard. To address this issue, techniques that can evaluate QI methods without access to a gold standard are being actively developed. These techniques assume that the true and measured values are linearly related by a slope, bias, and Gaussian-distributed noise term, where the noise between measurements made by different methods is independent of each other. However, this noise arises in the process of measuring the same quantitative value, and thus can be correlated. To address this limitation, we propose a no-gold-standard evaluation (NGSE) technique that models this correlated noise with a multivariate Gaussian distribution parameterized by a covariance matrix. We derive a maximum-likelihood-based approach to estimate the parameters that describe the relationship between the true and measured values, without any knowledge of the true values. We then use the estimated slopes and diagonal elements of the covariance matrix to compute the noise-to-slope ratio (NSR) to rank the QI methods on the basis of precision. The proposed NGSE technique was evaluated with multiple numerical experiments. Our results showed that the technique reliably estimated the NSR values and yielded accurate rankings of the considered methods for 83% of 160 trials. In particular, the technique correctly identified the most precise method for ∼ 97% of the trials. Overall, this study demonstrates the efficacy of the NGSE technique to accurately rank different QI methods in the presence of correlated noise and without access to any knowledge of the ground truth. The results motivate further validation of this technique with realistic simulation studies and patient data.
Objective evaluation of new and improved methods for PET imaging requires access to images with ground truth, as can be obtained through simulation studies. However, for these studies to be clinically relevant, it is important that the simulated images are clinically realistic. In this study, we develop a stochastic and physics-based method to generate realistic oncological two-dimensional (2-D) PET images, where the ground-truth tumor properties are known. The developed method extends a previously proposed approach that captures the variability in tumor properties observed in an actual patient population. Further, we extend that approach to model intra-tumor heterogeneity using a lumpy object model. To quantitatively evaluate the clinical realism of the simulated images, we conducted a human-observer study. This was a two-alternative forced-choice (2AFC) study with trained readers (five PET physicians and one PET physicist). Our results showed that the readers had an average of ∼ 50% accuracy in the 2AFC study. Further, the developed simulation method was able to generate a wide variety of clinically observed tumor types. These results provide evidence for the application of this method to 2-D PET imaging applications and motivate development of this method to generate 3-D PET images.
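The lumpy object model referred to above is a standard stochastic object model: a Poisson-distributed number of Gaussian "lumps" placed at uniformly random centers and summed. The sketch below illustrates the general idea on a small 2-D grid; the parameter names and values are illustrative and not the study's configuration:

```python
import math
import random

def lumpy_image(size, mean_lumps, amplitude, width):
    """2-D lumpy object model: N ~ Poisson(mean_lumps) Gaussian lumps
    at uniformly random centers, summed on a size x size grid."""
    # Poisson draw via Knuth's multiplication method (no stdlib Poisson sampler)
    limit, n_lumps, prod = math.exp(-mean_lumps), 0, 1.0
    while True:
        prod *= random.random()
        if prod <= limit:
            break
        n_lumps += 1
    centers = [(random.uniform(0, size), random.uniform(0, size))
               for _ in range(n_lumps)]
    # Sum the Gaussian lumps at every pixel
    return [[sum(amplitude * math.exp(-((x - cx) ** 2 + (y - cy) ** 2)
                                      / (2 * width ** 2))
                 for cx, cy in centers)
             for x in range(size)]
            for y in range(size)]
```

Because both the lump count and the lump positions are random, repeated draws produce the kind of realization-to-realization texture variability used here to mimic intra-tumor heterogeneity.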
Nicotinamide has been shown to affect blood flow in both tumor and normal tissues, including skeletal muscle. Intraperitoneal injection of nicotinamide was used as a simple intervention to test the sensitivity of noninvasive diffuse correlation spectroscopy (DCS) to changes in blood flow in the murine left quadriceps femoris skeletal muscle. DCS was then compared with the gold-standard fluorescent microsphere (FM) technique for validation. The nicotinamide dose-response experiment showed that relative blood flow measured by DCS increased following treatment with 500 and 1000 mg/kg nicotinamide. The DCS and FM technique comparison showed that the blood flow index measured by DCS was correlated with FM counts quantified by image analysis. The results of this study show that DCS is sensitive to nicotinamide-induced blood flow elevation in the murine left quadriceps femoris. Additionally, the results of the comparison were consistent with similar studies in higher-order animal models, suggesting that mouse models can be effectively employed to investigate the utility of DCS for various blood flow measurement applications.