1. INTRODUCTION

Cardiac computed tomography (CT) has emerged as a noninvasive method to evaluate coronary artery disease and assess cardiac function. However, image quality can be limited by the motion of cardiac structures. For example, even slow coronary vessel motion (~15 mm/s) can cause significant blurring of vessels [2]. Improved hardware, such as faster gantry rotation or dual-source designs, can reduce motion artifacts, but further improvement appears limited by physical constraints. Machine learning algorithms [3], [4] have been used to correct motion artifacts in reconstructed images. However, these approaches need a true motion vector field for training, which is unavailable in clinical data. Recently, implicit neural representations (INRs) [5] have been used to improve the reconstruction of medical images [6], [7]. Gupta et al. [1] developed an INR-based framework, called “NeuralCT”, to improve the reconstruction of CT data corrupted by object motion. NeuralCT takes CT sinograms as input, produces time-resolved images, and was shown to correct motion artifacts. A key benefit of NeuralCT is that it neither imposes a motion model nor requires estimates of the object motion. An overview is shown in Fig. 1. NeuralCT uses a neural network to implicitly represent the moving object boundary via signed distance functions (SDFs). Concretely, the INR maps each spatiotemporal point (a position at a particular time) to an SDF value: the signed distance from that point to the object boundary at that time. In the original NeuralCT [1], the neural representation was initialized using an intensity-based segmentation of the motion-corrupted filtered backprojection (FBP) result. The representation was then optimized via differentiable rendering (DR) [5], a technique used to identify the shape of an object that best “explains” its acquired projections.
Thus, NeuralCT aims to identify the optimal time-varying shape of the moving object such that the resulting projections agree with the CT sinogram (ground-truth projections). We emphasize that NeuralCT is not a learning task requiring training and testing datasets; such approaches depend on data-driven priors, which tend to introduce bias into the reconstruction. Instead, NeuralCT builds on work in which INR problems are solved via optimization. NeuralCT performs the reconstruction as an optimization: it forward-renders the moving object to obtain projection estimates, calculates the error between these estimates and the true sinogram, and then updates the reconstruction by backpropagating the error via gradient descent. In the initial description of NeuralCT, Gupta et al. showed high-quality motion correction for a single foreground object with large angular motion (up to 200° displacement per gantry rotation) as well as complex topological deformation [1]. However, clinical CT scans contain more than a single foreground class. Therefore, the core contribution of this study is to extend NeuralCT to correct motion artifacts in scenes with multiple moving objects of different intensities. In particular, we observed that imaging multiple moving objects with different attenuations can limit the accuracy of intensity-based segmentation and consequently decrease reconstruction performance. We therefore incorporate spatial information into the segmentation and compare the resulting reconstructions with the initial NeuralCT and FBP.

2. METHODS

A. NeuralCT Framework

The NeuralCT framework is described in Fig. 1, and the full description can be found in Ref. [1]. The CT sinogram is the input and a time-resolved attenuation map $I^*(x, t)$ (the motion-corrected image) is the output.
The steps of the algorithm are as follows.

Step 1: FBP images are created via backprojection of the sinogram $P$, which comprises a set of projections $\{P_1, P_2, \ldots, P_n\}$ for $n$ gantry positions. This results in a series of motion-corrupted attenuation images $I_{FBP}(x, t)$.

Step 2: A segmentation Seg is used to identify the different foreground objects in $I_{FBP}(x, t)$. The choice of Seg is discussed further in Sections 2.B and 2.C. Segmentation results in a set of time-varying binary images $B(x, t, k)$, where the kth channel corresponds to the kth foreground object.

Step 3: The time-varying scene of binary images $B(x, t, k)$ is implicitly represented using the signed distance function (SDF). Specifically, $SDF(x, t, k)$ is the signed distance from a point at location $x$ at time $t$ to the boundary of the kth object.

Step 4: For each location $x \in \mathbb{R}^N$, where $N$ is the number of spatial dimensions, the temporal evolution of an object's SDF is represented by Fourier features (FF) with Fourier coefficients $\{A_0, A_1, \ldots, A_M, B_0, B_1, \ldots, B_M\}$ (Eqn. 1), where $\omega_i$ are $M$ randomly sampled frequencies. In our work, we approximated the SDF map $SDF(x, t, k)$ with a SIREN neural network [8], an architecture that efficiently captures high-frequency information. This network $g(x, k; w)$, where $w$ are the network weights, was trained to output the Fourier coefficients of Eqn. 1, $A_i(x, k; g)$ and $B_i(x, k; g)$. The weights $w$ were initialized randomly and then updated by standard gradient descent on the total loss $L = L_{SDF} + \lambda L_E$. Here, $L_{SDF}$ is the mean difference between the true SDF map (derived from FBP) and $SDF(x, t, k; g)$ over all $x$, $t$, $k$; $L_E$ is the Eikonal constraint, computed as the mean of $\big|\,\|\nabla_x SDF(x, t, k; g)\|_2 - 1\,\big|$ over all positions $x$; and $\lambda$ is a regularization factor.
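
Steps 3 and 4 can be illustrated with a minimal NumPy sketch. It assumes a specific Fourier-feature form, $SDF(x,t,k) \approx A_0 + \sum_{i=1}^{M} [A_i \cos(\omega_i t) + B_i \sin(\omega_i t)]$, which may differ in detail from Eqn. 1. It also replaces the SIREN $g$ with a direct per-pixel least-squares fit of the coefficients, so it shows the temporal representation and the Eikonal term rather than the network itself. The half-pixel boundary convention, the test scene and all function names are hypothetical.

```python
import numpy as np


def sdf_from_mask(mask):
    """Step 3 (sketch): signed distance, in pixels, to the boundary of a 2D binary mask.
    Negative inside, positive outside; the boundary is placed halfway between pixel centers.
    Brute force over all pixel pairs, so only suitable for small images
    (scipy.ndimage.distance_transform_edt is the usual fast alternative).
    Assumes the mask contains both object and background pixels."""
    mask = np.asarray(mask, dtype=bool)
    coords = np.indices(mask.shape).reshape(2, -1).T.astype(float)  # (H*W, 2), row-major
    flat = mask.ravel()
    inside, outside = coords[flat], coords[~flat]

    def nearest(src, dst):
        # distance from each src pixel to the closest dst pixel
        return np.sqrt(((src[:, None, :] - dst[None, :, :]) ** 2).sum(axis=-1)).min(axis=1)

    sdf = np.empty(mask.size)
    sdf[flat] = -(nearest(inside, outside) - 0.5)
    sdf[~flat] = nearest(outside, inside) - 0.5
    return sdf.reshape(mask.shape)


def fourier_basis(t, omegas):
    """Columns: 1, cos(w_1 t)..cos(w_M t), sin(w_1 t)..sin(w_M t) (assumed form of Eqn. 1)."""
    t = np.asarray(t, dtype=float)[:, None]
    return np.hstack([np.ones_like(t), np.cos(t * omegas), np.sin(t * omegas)])


def fit_fourier_coeffs(sdf_seq, t, omegas):
    """Per-pixel least-squares Fourier coefficients for an SDF sequence of shape (T, H, W).
    Stands in for the SIREN g, which in NeuralCT outputs these coefficients for each x and k."""
    T = sdf_seq.shape[0]
    coeffs, *_ = np.linalg.lstsq(fourier_basis(t, omegas), sdf_seq.reshape(T, -1), rcond=None)
    return coeffs.reshape((-1,) + sdf_seq.shape[1:])  # (2M+1, H, W)


def eval_sdf(coeffs, t, omegas):
    """Evaluate the Fourier-feature SDF at times t; returns (len(t), H, W)."""
    return np.tensordot(fourier_basis(t, omegas), coeffs, axes=1)


def eikonal_loss(sdf_frame):
    """Mean of | ||grad_x SDF||_2 - 1 | over pixels (finite differences, unit spacing)."""
    g_row, g_col = np.gradient(sdf_frame)
    return float(np.mean(np.abs(np.hypot(g_row, g_col) - 1.0)))


# Usage on a hypothetical scene: a disk of radius 5 moving once around a circle.
n, T = 24, 32
t = np.linspace(0.0, 2 * np.pi, T, endpoint=False)
omegas = np.array([1.0, 2.0, 3.0])  # stand-ins for the M randomly sampled frequencies
rr, cc = np.indices((n, n))
masks = np.stack([(rr - 12 - 4 * np.sin(tk)) ** 2 + (cc - 12 - 4 * np.cos(tk)) ** 2 <= 25
                  for tk in t])
sdfs = np.stack([sdf_from_mask(m) for m in masks])  # Step 3: (T, H, W)
coeffs = fit_fourier_coeffs(sdfs, t, omegas)         # Step 4, without the network
approx = eval_sdf(coeffs, t, omegas)
```

The least-squares fit is only a stand-in: it gives each pixel its own coefficients, whereas the SIREN shares weights across all positions and is additionally regularized by $L_E$.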
To summarize, Steps 1-4 produce a SIREN network $g$ that implicitly approximates the SDF map of the motion-corrupted FBP images; $g$ therefore still contains the motion artifacts present after FBP.

Step 5: Differentiable rendering (DR) is used to optimize $g$ such that it represents a scene consistent with the acquired sinogram. Specifically, DR is used to identify the optimal shape $S^*$ of an object that minimizes the projection loss $L_P$ between the true projections $P_i$ and the projections obtained by rendering the estimated shape $S$ (Eqn. 3). Here, $DR(S; \theta_i)$ is the differentiable rendering operator. In CT, it represents the projection of an object shape $S$ from the spatiotemporal attenuation space $I(x, t)$ to the projection space $P_i$ via the line integral of attenuation along the x-ray path $u$ traversing the scene at gantry position $\theta_i$ (Eqn. 4), where $R_\theta(t)$ is the time-varying rotation matrix describing the gantry rotation by angle $\theta_i$. The spatiotemporal attenuation maps $I(x, t)$ in Eqn. 4 are obtained from the SIREN SDF, $SDF(x, t, k; g)$, by first converting the SDF to an occupancy map $\varepsilon$ (a negative SDF value means the pixel is occupied) and then multiplying $\varepsilon$ by the object's attenuation $a(k)$ (Eqn. 5). $a(k)$ was approximated as the median attenuation of the kth segmented object in the FBP image. Combining Eqns. 3-5 makes the loss $L_P$ a differentiable function of $g$. Additional loss terms were added to constrain the result: $L_E$ (Eikonal constraint), and $L_{TVS}$ and $L_{TVT}$ (total variation terms computed from the gradient of the SDF with respect to $x$ and $t$, respectively). The total loss is $L = L_P + \lambda_1 L_E + \lambda_2 L_{TVS} + \lambda_3 L_{TVT}$, where $\lambda_1$ to $\lambda_3$ are regularization weights.

Step 6: After optimization, the resulting $SDF(x, t, k; g^*)$ is converted via Eqn. 5 to the motion-corrected image $I^*(x, t)$, the final product of the NeuralCT reconstruction.
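
The forward model of Step 5 (SDF to occupancy to attenuation to parallel-beam projections) can be sketched in NumPy. This minimal sketch rests on several assumptions:

- the occupancy $\varepsilon$ is relaxed to a sigmoid of width `tau` so that it is smooth;
- objects are combined by summing $\varepsilon_k a(k)$;
- line integrals are approximated by sampling along each ray with bilinear interpolation;
- $L_P$ is taken as a mean squared error.

NumPy does not compute gradients, so this shows only the forward operator. Optimizing $g$ would require the same operations written in an automatic-differentiation framework. All names are hypothetical.

```python
import numpy as np


def occupancy(sdf, tau=0.5):
    """Soft occupancy: close to 1 where SDF < 0 (occupied), close to 0 where SDF > 0.
    The sigmoid relaxation of width tau (pixels) is an assumption of this sketch."""
    z = np.clip(sdf / tau, -50.0, 50.0)  # clip to avoid overflow in exp
    return 1.0 / (1.0 + np.exp(z))


def render_attenuation(sdfs, attenuations, tau=0.5):
    """I(x) = sum_k occupancy(SDF_k(x)) * a(k), for sdfs of shape (K, H, W)."""
    return np.tensordot(np.asarray(attenuations, dtype=float), occupancy(sdfs, tau), axes=1)


def bilinear(img, r, c):
    """Bilinearly interpolate img at fractional (row, col) positions; zero outside the grid."""
    H, W = img.shape
    r0, c0 = np.floor(r).astype(int), np.floor(c).astype(int)
    fr, fc = r - r0, c - c0
    out = np.zeros(r.shape)
    for dr, dc, w in ((0, 0, (1 - fr) * (1 - fc)), (1, 0, fr * (1 - fc)),
                      (0, 1, (1 - fr) * fc), (1, 1, fr * fc)):
        rr, cc = r0 + dr, c0 + dc
        ok = (rr >= 0) & (rr < H) & (cc >= 0) & (cc < W)
        out[ok] += w[ok] * img[rr[ok], cc[ok]]
    return out


def parallel_projection(img, theta):
    """Parallel-beam line integrals of a square image at gantry angle theta (radians)."""
    n = img.shape[0]
    center = (n - 1) / 2.0
    s = np.arange(n) - center                                # detector bins, 1-pixel spacing
    u = np.linspace(-n / np.sqrt(2), n / np.sqrt(2), 2 * n)  # samples along each ray
    du = u[1] - u[0]
    # Detector axis (sin, cos) and ray direction (cos, -sin), both in (row, col) coordinates.
    r = center + s[:, None] * np.sin(theta) + u[None, :] * np.cos(theta)
    c = center + s[:, None] * np.cos(theta) - u[None, :] * np.sin(theta)
    return bilinear(img, r, c).sum(axis=1) * du


def render_sinogram(sdf_at, attenuations, thetas, times, tau=0.5):
    """Projection i renders the scene at its own acquisition time t_i, then projects it at theta_i.
    sdf_at(t) must return an array of shape (K, H, W)."""
    return np.stack([parallel_projection(render_attenuation(sdf_at(t), attenuations, tau), th)
                     for th, t in zip(thetas, times)])


def projection_loss(sino_est, sino_true):
    """L_P taken as a mean squared error (assumed; the norm in Eqn. 3 may differ)."""
    return float(np.mean((sino_est - sino_true) ** 2))


# Usage on a hypothetical static scene: one disk (radius 8, attenuation 0.7), 8 views over 180 degrees.
n = 48
rr, cc = np.indices((n, n))
sdf = np.hypot(rr - (n - 1) / 2.0, cc - (n - 1) / 2.0) - 8.0
thetas = np.linspace(0.0, np.pi, 8, endpoint=False)
sino = render_sinogram(lambda t: sdf[None], [0.7], thetas, times=np.arange(8))
```

Passing each projection its own time `t_i` is what makes the rendered sinogram time-resolved. For a moving scene, `sdf_at` would evaluate the SDF representation at that instant.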
B. NeuralCT with Intensity-based Segmentation

As outlined above, a key step in the NeuralCT framework is the initialization in Step 4, where the SIREN $g$ approximates the SDF map of the scene of interest. Gupta et al. [1] used a Gaussian mixture model (GMM) [9] based solely on the intensity histogram of $I_{FBP}(x, t)$. The GMM fits a finite number of Gaussian distributions to the intensity histogram and assigns pixels whose intensities belong to the same Gaussian distribution to the same class. After excluding the background, the top k classes with the most pixels were used to identify foreground objects. As shown in [1], this segmentation method, hereafter referred to as SegGMM, worked well in scenes with a single foreground object, as it readily separates the object from the background despite motion artifacts.

C. NeuralCT with Spatially Aware Segmentation

However, when SegGMM is applied to a scene with multiple moving objects of different attenuations, it becomes difficult to differentiate the objects based solely on the intensity distribution. Fig. 2 shows a failure of SegGMM on the FBP reconstruction of two moving dots with different attenuations (top = 0.7, bottom = 0.2). From the histogram, the GMM selects the two intensity classes with the highest pixel counts. However, this incorrectly labels both dots as a single bright foreground class, while the second class is a dimmer "object" spread throughout the image. The core contribution of this study is to improve NeuralCT performance for scenes containing objects of multiple intensities by resolving this segmentation error. We did so by applying a spatially aware segmentation approach, SegSI, which incorporates both the spatial (S) and intensity (I) information of each object in the FBP image. SegSI aims to assign different classes to objects at different spatial positions while distinguishing the intensity of the real object from that of the motion artifacts.
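
The intensity-only SegGMM rule from Section 2.B can be sketched with a small EM fit of a one-dimensional GMM to the pixel intensities. The fit is followed by the selection rule "exclude the background, keep the top k classes by pixel count". This is a minimal sketch, not the implementation used in [1]. The even spread of initial means, treating the darkest component as background, and all names are assumptions. Nothing in it uses pixel positions, which is the limitation discussed above.

```python
import numpy as np


def fit_gmm_1d(values, n_components, n_iter=200):
    """Fit a 1D Gaussian mixture to pixel intensities with EM.
    Returns mixture weights, means, variances and a hard label per pixel."""
    x = np.asarray(values, dtype=float).ravel()
    means = np.linspace(x.min(), x.max(), n_components)  # spread initial means over the range
    var = np.full(n_components, x.var() / n_components + 1e-6)
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: posterior probability (responsibility) of each component for each pixel
        log_p = (np.log(weights) - 0.5 * np.log(2 * np.pi * var)
                 - 0.5 * (x[:, None] - means) ** 2 / var)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture weights, means and variances
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6
    return weights, means, var, resp.argmax(axis=1)


def seg_gmm(image, k, n_components=3):
    """SegGMM (sketch): label pixels by GMM class, drop the background class, keep the k largest.
    Assumes the component with the lowest mean intensity is the background."""
    _, means, _, labels = fit_gmm_1d(image, n_components)
    labels = labels.reshape(image.shape)
    counts = np.bincount(labels.ravel(), minlength=n_components)
    counts[np.argmin(means)] = -1                 # exclude background from the ranking
    keep = np.argsort(counts)[::-1][:k]           # top-k classes by pixel count
    return np.stack([labels == c for c in keep])  # (k, H, W) binary masks


# Usage on a hypothetical, motion-free frame: two dots (0.7 and 0.2) plus mild noise.
rng = np.random.default_rng(0)
n = 32
rr, cc = np.indices((n, n))
top = (rr - 9) ** 2 + (cc - 16) ** 2 <= 25
bottom = (rr - 23) ** 2 + (cc - 16) ** 2 <= 25
img = 0.7 * top + 0.2 * bottom + rng.normal(0.0, 0.02, (n, n))
masks = seg_gmm(img, k=2)
```

On this motion-free frame the three intensity levels are well separated, so the two foreground classes coincide with the two dots. With the motion-corrupted FBP image of Fig. 2, the same rule produces the mislabeling described above.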
A spatially aware segmentation of this kind can be implemented with various approaches, such as region-of-interest (ROI) definition plus thresholding, or data-driven methods (e.g., deep learning segmentation). Here, we focus on demonstrating that this improvement in segmentation leads to improved NeuralCT performance. Fig. 2 shows a simple approach to adding spatial information: bounding boxes guide a thresholding-based segmentation. Each bounding box was defined to contain only one moving dot, so that one class is assigned to each box. Within each box, the intensity threshold was set to $\gamma \times I_{max}$, where $I_{max}$ is the maximum intensity within that box, to capture the real object; $\gamma = 0.7$ was set empirically. Because artifacts will always be present in the initial FBP images, we emphasize that the goal of this new segmentation is not a perfect segmentation. The aim is a segmentation that is not so poor that it precludes improvement by the NeuralCT framework. We hypothesize that improving the initial segmentation will avoid overt failures and improve the image quality obtained with NeuralCT.

3. EXPERIMENTS AND RESULTS

We performed two experiments to demonstrate the impact of segmentation on the final reconstruction and to evaluate the improvement associated with our new segmentation approach.

Experiment 1: Angular Displacement of Two Dots

As shown in Fig. 2, we imaged two circular dots that translate with an angular displacement ΔØ per full gantry rotation. The two dots had different attenuation levels (top = 0.7, bottom = 0.2), mimicking the difference between contrast-enhanced vessels and the myocardium in cardiac CT. The background attenuation was 0. The image resolution was 128 × 128, and a parallel-beam CT geometry was used with 720 gantry positions per rotation.
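
The bounding-box thresholding used for SegSI (Section 2.C) can be sketched together with the RMSE and DICE metrics used in the experiments. The test frame is hypothetical. It contains two static dots with the Experiment 1 attenuations (0.7 and 0.2) on a zero background, smeared horizontally as a crude stand-in for motion blur rather than an actual FBP reconstruction. The box format `(row0, row1, col0, col1)` and all names are assumptions.

```python
import numpy as np


def seg_si(image, boxes, gamma=0.7):
    """SegSI (sketch): one class per bounding box, thresholded at gamma * (max inside that box).
    boxes: list of (row0, row1, col0, col1) half-open ranges, one per foreground object."""
    masks = np.zeros((len(boxes),) + image.shape, dtype=bool)
    for k, (r0, r1, c0, c1) in enumerate(boxes):
        patch = image[r0:r1, c0:c1]
        masks[k, r0:r1, c0:c1] = patch >= gamma * patch.max()
    return masks


def dice(a, b):
    """DICE = 2|A and B| / (|A| + |B|) for binary masks."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())


def rmse(a, b):
    """Root-mean-square error between two images."""
    return float(np.sqrt(np.mean((np.asarray(a, float) - np.asarray(b, float)) ** 2)))


# Hypothetical frame: two dots (top = 0.7, bottom = 0.2), box-blurred over 9 columns.
n = 64
rr, cc = np.indices((n, n))
top = (rr - 16) ** 2 + (cc - 32) ** 2 <= 36
bottom = (rr - 48) ** 2 + (cc - 32) ** 2 <= 36
truth = 0.7 * top + 0.2 * bottom
blurred = np.mean([np.roll(truth, s, axis=1) for s in range(-4, 5)], axis=0)

masks = seg_si(blurred, boxes=[(0, 32, 0, 64), (32, 64, 0, 64)])
# For comparison: a single global threshold at gamma * (global maximum).
global_mask = blurred >= 0.7 * blurred.max()
```

Because each threshold is relative to the maximum inside its own box, the dim bottom dot is still captured. A single global threshold at 0.7 times the global maximum (0.49 here) lies above the bottom dot's peak of 0.2 and misses that dot entirely.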
Two NeuralCT variants were then evaluated across a range of ΔØ (20° to 200° per gantry rotation): one with intensity-based segmentation (NCT-SegGMM) and one with spatially aware segmentation (NCT-SegSI). Performance was evaluated using the root-mean-square error (RMSE) and the DICE coefficient relative to the ground-truth image. As shown by the images and metrics in Fig. 3, FBP motion artifacts increased with ΔØ. Reconstruction with NCT-SegGMM degraded when ΔØ > 60°. In contrast, NCT-SegSI maintained high-quality motion-corrected reconstructions for all ΔØ and achieved low RMSE (< 0.028) and high DICE (> 0.89) for ΔØ up to 160°.

Experiment 2: Complex Deformation of Letters

In Experiment 2, we evaluated the ability of NCT-SegSI to improve the reconstruction of scenes with complex topological changes. As shown in Fig. 4, we simulated CT imaging during the transformation of letters. The top letter transformed from “A” to “B” to “A” (attenuation = 0.7), while the bottom letter transformed from “B” to “A” to “B” (attenuation = 0.4). NCT-SegSI (red line) substantially reduced the severity of the artifacts observed with FBP (blue) and NCT-SegGMM (orange), especially during the transformation periods (2nd-3rd and 5th-6th columns). Quantitatively, the median RMSE of NCT-SegSI (0.050 [0.042-0.061]) was significantly lower (p < 0.05) than that of NCT-SegGMM (0.090 [0.076-0.096]) and FBP (0.069 [0.047-0.085]). The median DICE of NCT-SegSI (0.89 [0.86-0.93]) was significantly higher (p < 0.05) than that of NCT-SegGMM (0.72 [0.69-0.76]) and FBP (0.72 [0.64-0.87]). Lastly, NCT-SegSI increased the percentage of frames with RMSE < 0.05 (NCT-SegSI: 45.7%, NCT-SegGMM: 0%, FBP: 28.0%) and with DICE > 0.85 (NCT-SegSI: 89.6%, NCT-SegGMM: 0%, FBP: 27.6%).

4. SUMMARY

Reconstruction of moving scenes using a framework based on neural implicit representations (NeuralCT) can improve image quality without the need for a prior motion model or motion estimation.
Here, we show that when imaging scenes with multiple moving objects, NeuralCT performance can be limited by poor segmentation of the motion-corrupted FBP images. Using a spatially aware segmentation method that incorporates both spatial and intensity information can result in a NeuralCT solution that maintains high reconstruction performance for moving objects with multiple attenuation levels, despite large angular motion and complex topological changes.

REFERENCES

[1] K. Gupta, B. Colvert, and F. Contijoch, “Neural Computed Tomography,” arXiv:2201.06574 (2022). http://arxiv.org/abs/2201.06574
[2] Z. Chen et al., “Precise measurement of coronary stenosis diameter with CCTA using CT number calibration,” Med. Phys. 46(12), 5514-5527 (2019). https://doi.org/10.1002/mp.v46.12
[3] T. Lossau et al., “Motion estimation and correction in cardiac CT angiography images using convolutional neural networks,” Comput. Med. Imaging Graph. 76, 101640 (2019). https://doi.org/10.1016/j.compmedimag.2019.06.001
[4] Y. Ko, S. Moon, J. Baek, and H. Shim, “Rigid and non-rigid motion artifact reduction in X-ray CT using attention module,” Med. Image Anal. 67, 101883 (2021). https://doi.org/10.1016/j.media.2020.101883
[5] P. Wang, L. Liu, Y. Liu, C. Theobalt, T. Komura, and W. Wang, “NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction,” arXiv:2106.10689 (2021). http://arxiv.org/abs/2106.10689
[6] L. Shen, J. Pauly, and L. Xing, “NeRP: Implicit Neural Representation Learning with Prior Embedding for Sparsely Sampled Image Reconstruction,” arXiv:2108.10991 (2021). http://arxiv.org/abs/2108.10991
[7] Q. Wu et al., “An Arbitrary Scale Super-Resolution Approach for 3-Dimensional Magnetic Resonance Image using Implicit Neural Representation,” arXiv:2110.14476 (2021). http://arxiv.org/abs/2110.14476
[8] V. Sitzmann et al., “Implicit Neural Representations with Periodic Activation Functions,” arXiv:2006.09661 (2020). http://arxiv.org/abs/2006.09661
[9] D. Reynolds et al., “Gaussian Mixture Models,” in Encyclopedia of Biometrics, 659-664 (2009). https://doi.org/10.1007/978-0-387-73003-5