Motion correction image reconstruction using NeuralCT improves with spatially aware object segmentation
Zhennong Chen, Kunal Gupta, Francisco Contijoch
Proceedings Volume 12304, 7th International Conference on Image Formation in X-Ray Computed Tomography; 123041A (17 October 2022). https://doi.org/10.1117/12.2646402
Event: Seventh International Conference on Image Formation in X-Ray Computed Tomography (ICIFXCT 2022), 2022, Baltimore, United States
Abstract
NeuralCT [1] has recently been proposed as an implicit neural representation-based image reconstruction method that produces time-resolved images from CT sinograms and reduces motion artifacts, even when objects undergo complex motion. NeuralCT does not require a prior motion model or estimation of object motion. Instead, it utilizes a network to implicitly represent the time-varying object boundary via signed distance functions and optimizes the network via differentiable rendering. In this work, we modify the NeuralCT framework to reconstruct scenes that contain multiple moving objects with distinct attenuation levels. We show that the performance of NeuralCT reconstruction depends on the quality of the initialization of the network (in this case, object segmentation in the motion-corrupted FBP image). We show how spatially aware object segmentation can improve motion-corrected reconstruction of moving objects with multiple attenuation levels despite high angular motion and complex topological changes.

1.

INTRODUCTION

Cardiac computed tomography (CT) has emerged as a noninvasive method to evaluate coronary artery disease and assess cardiac function. However, image quality can be limited by motion of cardiac structures. For example, even slow coronary vessel motion (~15 mm/s) can cause significant blurring of vessels [2]. Improved hardware such as faster gantry rotation or dual-source designs can avoid or reduce motion artifacts, but further improvement appears limited by physical constraints. Machine learning algorithms [3], [4] have been used to correct motion artifacts in reconstructed images. However, current approaches are limited because the true motion vector field needed for training is unavailable in clinical data.

Recently, implicit neural representations (INRs) [5] have been used to improve reconstruction of medical images [6], [7]. Gupta et al. [1] recently developed an INR-based framework to improve reconstruction of CT data corrupted by object motion. This framework, called “NeuralCT”, takes CT sinograms as input, produces time-resolved images, and was shown to correct motion artifacts. A key benefit of NeuralCT is that it does not impose a motion model nor require estimates of the object motion. An overview is shown in Fig. 1.

Fig. 1.

NeuralCT framework. FBP = filtered backprojection, SDF = signed distance function, DR = differentiable rendering. In this study, we propose a new segmentation (red box) to extend NeuralCT to more complicated scenes.


NeuralCT utilizes a neural network to implicitly represent (a neural representation of) the moving object boundary via signed distance functions (SDFs). Concretely, the INR maps the spatiotemporal domain of the moving object (a point at a particular position and time) to the SDF value domain (the relative position of this point with respect to the object boundary at that time). In this framework, the neural representation is initialized using an intensity-based segmentation of the motion-corrupted filtered backprojection (FBP) result. The representation is then optimized via differentiable rendering (DR) [5], a technique used to identify the shape of an object that best “explains” its acquired projections. Thus, NeuralCT aims to identify the optimal time-varying shape of the moving object such that the resultant projections agree with the CT sinogram (ground truth projections). We emphasize that NeuralCT is not a learning task that requires training and testing datasets, as such approaches depend on data-driven priors, which can introduce bias into the reconstruction. Instead, NeuralCT builds on work where INR problems are solved via optimization. In this case, NeuralCT performs optimized reconstruction by forward rendering the moving object to acquire projection estimates, calculating the error between the projection estimates and the true sinogram, and then updating the reconstruction by backpropagating the error via gradient descent.
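To make this optimization loop concrete, the sketch below shows the render-compare-backpropagate structure in PyTorch. It is a minimal illustration, not the authors' implementation: a small MLP stands in for the SIREN network g, a soft-occupancy sum along image rows stands in for the rendering operator DR (gantry rotation is omitted), and `true_sino` is random placeholder data.

```python
import torch

# Minimal sketch of the NeuralCT optimization loop (all names hypothetical).
g = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.Tanh(),
                        torch.nn.Linear(64, 1))            # (x, y, t) -> SDF
opt = torch.optim.Adam(g.parameters(), lr=1e-4)

xs = torch.linspace(-1, 1, 64)
grid = torch.stack(torch.meshgrid(xs, xs, indexing="ij"), dim=-1)  # (64, 64, 2)
true_sino = torch.rand(10, 64)            # placeholder for the acquired sinogram

for step in range(100):
    opt.zero_grad()
    loss = 0.0
    for i, t in enumerate(torch.linspace(0, 1, 10)):   # one view per time point
        coords = torch.cat([grid, t.expand(64, 64, 1)], dim=-1)
        sdf = g(coords).squeeze(-1)                    # forward render the scene
        occupancy = torch.sigmoid(-sdf / 0.05)         # negative SDF -> occupied
        proj = occupancy.sum(dim=0)                    # line integral along rays
        loss = loss + torch.mean((proj - true_sino[i]) ** 2)
    loss.backward()                       # backpropagate the projection error
    opt.step()                            # update the implicit representation
```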

In the initial description of NeuralCT, Gupta et al. showed high-quality motion correction for a single foreground object with high angular motion (up to 200° displacement per gantry rotation) as well as complex topological deformation [1]. However, clinical CT scans are not composed of a single foreground class. Therefore, the core contribution of this study is to extend NeuralCT to successfully correct motion artifacts in scenes with multiple moving objects of different intensities. In particular, we observed that imaging multiple moving objects with different attenuations can limit the accuracy of intensity-based segmentation and consequently decrease reconstruction performance. We therefore incorporate spatial information into the segmentation and compare the improved reconstruction results with the initial NeuralCT and FBP.

2.

METHODS

A.

NeuralCT Framework

The NeuralCT framework is illustrated in Fig. 1, and the full description can be found in Ref. [1]. The CT sinogram is the input and a time-resolved attenuation map I*(x, t) (the motion-corrected image) is the output. The steps of the algorithm are:

Step 1: FBP images are created via backprojection of the sinogram P (comprising a set of projections {P_1, P_2, …, P_n} for n gantry positions). This results in a series of motion-corrupted attenuation images I_FBP(x, t).
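As a point of reference, Step 1 can be prototyped with scikit-image's parallel-beam `radon`/`iradon` pair; the phantom and view sampling below are illustrative choices, not the paper's simulation code.

```python
import numpy as np
from skimage.transform import radon, iradon

# Toy static scene; in the paper the scene moves during acquisition, which is
# what corrupts the FBP result.
phantom = np.zeros((128, 128))
phantom[40:60, 40:60] = 0.7

thetas = np.linspace(0.0, 180.0, 720, endpoint=False)       # gantry positions
sinogram = radon(phantom, theta=thetas)                     # projections {P_1, ..., P_n}
i_fbp = iradon(sinogram, theta=thetas, filter_name="ramp")  # filtered backprojection
```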

Step 2: A segmentation Seg is used to identify the different foreground objects in I_FBP(x, t). The choice of Seg is discussed further in Sections 2.B and 2.C. Segmentation results in a binary time-varying image B(x, t, k) where the kth channel corresponds to the kth foreground object.

Step 3: The time-varying scene of k binary images B(x, t, k) is implicitly represented using the signed distance function (SDF). Specifically, SDF(x, t, k) gives the signed distance from a point at location x in space at a particular time t to the boundary of the kth object.
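A binary mask can be converted to a signed distance map with two Euclidean distance transforms, for example with SciPy; this is a standard construction and one plausible way to implement Step 3.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def signed_distance(mask: np.ndarray) -> np.ndarray:
    """SDF for one binary mask B(x, t, k) at a fixed (t, k):
    negative inside the object, positive outside."""
    inside = distance_transform_edt(mask)        # distance to nearest background pixel
    outside = distance_transform_edt(1 - mask)   # distance to nearest object pixel
    return outside - inside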

Step 4: For each location x ∈ ℝ^N, where N is the number of spatial dimensions, the temporal evolution of an object’s SDF is represented by Fourier features (FF) using Fourier coefficients {A_0, A_1, …, A_M, B_0, B_1, …, B_M}:

$$\mathrm{SDF}(x, t, k) = \sum_{i=0}^{M} \big[ A_i \cos(\omega_i t) + B_i \sin(\omega_i t) \big] \tag{1}$$

Here, the ω_i are M randomly sampled frequencies. In our work, we approximated the SDF map SDF(x, t, k) with a SIREN neural network [8] (an efficient framework for capturing high-frequency information). This neural network g(x, k; w), where w are the weights of the network, was trained to output the Fourier coefficients {A_i, B_i} in Eqn. 1: A_i(x, k; g) and B_i(x, k; g). The weights w were initialized randomly and then updated by standard gradient descent,

$$w \leftarrow w - \eta \nabla_w L \tag{2}$$

where L = L_SDF + λL_E and η is the learning rate. L is the total loss; L_SDF is the mean difference between the true SDF map (derived from FBP) and SDF(x, t, k; g) over all x, t, k; L_E is the Eikonal constraint, computed as the mean of |‖∇_x SDF(x, t, k; g)‖_2 − 1| over all positions x; and λ is a regularization factor. To conclude, after Steps 1-4 a SIREN neural network g has been created that implicitly approximates the SDF map of the motion-corrupted FBP images, so g contains the motion artifacts present after FBP.
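The sketch below illustrates Step 4 under the definitions above: evaluating Eqn. 1 from predicted Fourier coefficients, and the initialization loss L = L_SDF + λL_E. The tensor shapes, the coefficient layout, and the use of autograd for the Eikonal term are assumptions; the network that predicts A and B is omitted.

```python
import torch

M = 8
omegas = torch.rand(M + 1) * 10.0                # randomly sampled frequencies

def sdf_from_fourier(A, B, t):
    """Evaluate Eqn. 1. A, B: (..., M+1) coefficients for one (x, k);
    t: (T,) time samples. Returns (..., T) SDF values."""
    phase = omegas[None, :] * t[:, None]                             # (T, M+1)
    basis = torch.cat([torch.cos(phase), torch.sin(phase)], dim=-1)  # (T, 2M+2)
    return torch.cat([A, B], dim=-1) @ basis.T   # weighted sum of cos/sin terms

def init_loss(pred_sdf, true_sdf, coords, lam=0.1):
    """L = L_SDF + lambda * L_E. `coords` must have requires_grad=True and
    `pred_sdf` must be computed from it so autograd can supply the gradient."""
    l_sdf = (pred_sdf - true_sdf).abs().mean()   # match the FBP-derived SDF
    grad = torch.autograd.grad(pred_sdf.sum(), coords, create_graph=True)[0]
    l_eik = (grad.norm(dim=-1) - 1.0).abs().mean()  # Eikonal: | ||grad SDF|| - 1 |
    return l_sdf + lam * l_eik
```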

Step 5: Differentiable rendering (DR) is used to optimize g such that it represents a scene consistent with the acquired sinogram. Specifically, DR is used to identify the optimized shape S* of an object that minimizes the projection loss L_P between the true projections (P_i) and the projections obtained via rendering of the estimated shape S:

$$S^{*} = \underset{S}{\arg\min} \sum_{i=1}^{n} L_P\!\left( \mathrm{DR}(S;\, \theta_i),\; P_i \right) \tag{3}$$

Here, DR(S; θ_i) is the differentiable rendering operator; in CT, it represents the projection of an object shape S from the “spatiotemporal attenuation space” I(x, t) to the “projection space” P_i via the line integral of attenuation along the x-ray path u traversing the scene at gantry position θ_i:

$$\mathrm{DR}(S;\, \theta_i) = \int I\!\left( R_{\theta}(t)\, u,\; t \right) du \tag{4}$$

where R_θ(t) is the time-varying rotation matrix describing the gantry rotation to angle θ_i.
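One differentiable way to realize Eqn. 4 on a pixel grid is to rotate the attenuation map with `grid_sample` and sum along one axis; this is an illustrative stand-in for the paper's renderer, shown for a single gantry angle.

```python
import torch
import torch.nn.functional as F

def project(attenuation: torch.Tensor, theta: float) -> torch.Tensor:
    """attenuation: (H, W) float map I(x, t) at the time the gantry reaches theta.
    Returns a (W,) parallel-beam projection; fully differentiable."""
    c, s = torch.cos(torch.tensor(theta)), torch.sin(torch.tensor(theta))
    zero = torch.tensor(0.0)
    rot = torch.stack([torch.stack([c, -s, zero]),
                       torch.stack([s, c, zero])])           # R_theta as 2x3 affine
    grid = F.affine_grid(rot[None], [1, 1, *attenuation.shape], align_corners=False)
    rotated = F.grid_sample(attenuation[None, None], grid, align_corners=False)
    return rotated[0, 0].sum(dim=0)   # line integral along the ray direction u
```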

The spatiotemporal attenuation maps I(x, t) in Eqn. 4 were obtained from the SIREN SDF (SDF(x, t, k; g)) by first converting the SDF to an occupancy map ε (where a negative SDF value means the pixel is occupied) and then multiplying ε by the object’s attenuation a(k) (Eqn. 5). a(k) was approximated as the median attenuation of the kth segmented object in the FBP image.

$$I(x, t) = \sum_{k} \varepsilon\!\left( \mathrm{SDF}(x, t, k;\, g) \right) a(k) \tag{5}$$
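In code, Eqn. 5 amounts to thresholding the SDF into occupancy and weighting by a(k). A sigmoid of −SDF is a common differentiable surrogate for the hard rule “negative SDF = occupied”; the sharpness parameter below is an assumption.

```python
import torch

def attenuation_map(sdf: torch.Tensor, a: torch.Tensor, tau: float = 0.05):
    """sdf: (K, H, W) per-object SDF at one time t; a: (K,) attenuations a(k).
    Returns the combined attenuation image I(x, t) of Eqn. 5."""
    occupancy = torch.sigmoid(-sdf / tau)             # ~1 inside, ~0 outside
    return (occupancy * a[:, None, None]).sum(dim=0)  # sum contributions over k
```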

Combining Eqns. 3-5, this approach enables the loss L_P to be defined as a differentiable function of g. Additional loss terms were added to constrain the result: L_E (the Eikonal constraint) and L_TVS and L_TVT (total variation terms computed from the gradients of the SDF with respect to x and t). This leads to a total loss L = L_P + λ_1L_E + λ_2L_TVS + λ_3L_TVT, where λ_1 to λ_3 serve as regularization weights.
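A finite-difference version of this total loss might look as follows; the weights and the discrete approximations of the Eikonal and total variation terms are illustrative, not the paper's exact formulation.

```python
import torch

def total_loss(proj_est, proj_true, sdf, l1=0.1, l2=0.01, l3=0.01):
    """sdf: (T, H, W) SDF of one object over time; proj_*: matching sinograms."""
    l_p = torch.mean((proj_est - proj_true) ** 2)   # projection loss L_P
    gx = sdf[:, 1:, :] - sdf[:, :-1, :]             # spatial finite differences
    gy = sdf[:, :, 1:] - sdf[:, :, :-1]
    l_e = (torch.sqrt(gx[:, :, :-1] ** 2 + gy[:, :-1, :] ** 2) - 1).abs().mean()
    l_tvs = gx.abs().mean() + gy.abs().mean()       # spatial total variation L_TVS
    l_tvt = (sdf[1:] - sdf[:-1]).abs().mean()       # temporal total variation L_TVT
    return l_p + l1 * l_e + l2 * l_tvs + l3 * l_tvt
```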

Step 6: After optimization, the result SDF(x, t, k; g*) is converted to the motion-corrected image I*(x, t) (i.e., the final product of the NeuralCT reconstruction) via Eqn. 5.

B.

NeuralCT with Intensity-based Segmentation

As outlined above, a key step in the NeuralCT framework is the initialization described in Step 4, where the SIREN g aims to approximate the SDF map of the scene of interest. Gupta et al. [1] used a Gaussian mixture model (GMM) [9] based solely on the intensity histogram of I_FBP(x, t). The GMM fits a finite number of Gaussian distributions to the intensity histogram and assigns pixels whose intensities belong to the same Gaussian distribution to the same class. After excluding the background, the top k classes with the most pixels were used to identify the foreground objects. As shown in [1], this segmentation method, hereafter referred to as Seg_GMM, worked well in scenes with a single foreground object, as it readily separates the object from the background despite motion artifacts.
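A minimal version of Seg_GMM can be written with scikit-learn's GaussianMixture; the component count and the corner-pixel heuristic for identifying the background class are assumptions for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def seg_gmm(i_fbp: np.ndarray, n_components: int = 3, k: int = 1) -> np.ndarray:
    """Cluster pixel intensities and keep the top-k non-background classes."""
    gmm = GaussianMixture(n_components=n_components, random_state=0)
    labels = gmm.fit_predict(i_fbp.reshape(-1, 1)).reshape(i_fbp.shape)
    background = labels[0, 0]           # assume a corner pixel is background
    counts = [(np.sum(labels == c), c)
              for c in range(n_components) if c != background]
    keep = [c for _, c in sorted(counts, reverse=True)[:k]]
    return np.stack([(labels == c).astype(np.uint8) for c in keep])  # (k, H, W)
```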

C.

NeuralCT with Spatially Aware Segmentation

However, when Seg_GMM is applied to a scene with multiple moving objects, each with a different attenuation, it becomes difficult to differentiate the objects based solely on the intensity distribution. Fig. 2 shows a failure of Seg_GMM when analyzing the FBP reconstruction of two moving dots with two different attenuations (top = 0.7, bottom = 0.2). Based on the histogram, the GMM identifies the two intensity classes with the most pixel counts. However, this incorrectly labels the bright dot as one foreground class while merging the dimmer dot with the motion artifacts spread throughout the image.

Fig. 2.

Two different object segmentation approaches used in NeuralCT. The first image shows the ground truth motion of two dots (top: intensity = 0.7, moving from left to right; bottom: intensity = 0.2, moving from right to left). ΔØ is the angular displacement per gantry rotation. Seg_GMM: the Gaussian mixture model incorrectly assigned the motion artifacts and the bottom dot to the same class. Seg_SI: spatially aware segmentation utilized both spatial information (bounding boxes in this example) and intensity information (thresholding), leading to correct detection of both the top and bottom dots.


The core contribution of this study is to improve NeuralCT performance in the case of objects with multiple intensities by resolving this segmentation error. We did so by applying a spatially aware segmentation approach, Seg_SI, which incorporates both the spatial (S) and intensity (I) information of each object in the FBP image. Seg_SI aims to assign different classes to objects at different spatial positions while distinguishing the intensities of the real objects from those of the motion artifacts. This can be achieved using various approaches, such as region-of-interest (ROI) definition plus thresholding, or data-driven methods (e.g., deep learning segmentation). Here, we focus on demonstrating that this improvement in segmentation leads to improvements in NeuralCT performance. In Fig. 2, we show a simple approach to adding spatial information. Specifically, bounding boxes were used to guide thresholding-based segmentation. Each bounding box was defined to contain only one moving dot, and one individual class was assigned to each box. Within each box, we defined an intensity threshold γ × I_max, where I_max is the maximum intensity within the box, to capture the real object. γ = 0.7 was set empirically.
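The bounding-box variant of Seg_SI used in Fig. 2 reduces to a few lines; the box format below is a hypothetical convention.

```python
import numpy as np

def seg_si(i_fbp: np.ndarray, boxes: list, gamma: float = 0.7) -> np.ndarray:
    """boxes: list of (row0, row1, col0, col1), one per moving object.
    Returns one binary mask per box, i.e., one class per object."""
    masks = np.zeros((len(boxes), *i_fbp.shape), dtype=np.uint8)
    for k, (r0, r1, c0, c1) in enumerate(boxes):
        patch = i_fbp[r0:r1, c0:c1]
        # Threshold at gamma times the maximum intensity within this box to
        # separate the real object from its (dimmer) motion artifacts.
        masks[k, r0:r1, c0:c1] = patch >= gamma * patch.max()
    return masks
```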

Given that artifacts will always be present in the initial FBP images, we highlight that the goal of this new segmentation is not to achieve a perfect segmentation but rather to provide one that is not so poor that it precludes improvement by the NeuralCT framework. We hypothesize that by improving the initial segmentation, we will avoid overt failures and improve the image quality obtained with NeuralCT.

3.

EXPERIMENTS AND RESULTS

We performed two experiments to demonstrate the impact of the segmentation on the subsequent result and evaluate the improvement associated with our new segmentation approach.

Experiment 1: Angular Displacement of Two Dots

As shown in Fig. 2, two circular dots that translate with angular displacement ΔØ per full gantry rotation were imaged. The two dots had different attenuation levels (top = 0.7, bottom = 0.2), mimicking the difference between contrast-enhanced vessels and the myocardium in cardiac CT. The background attenuation was 0. The image resolution was set to 128 × 128, and a parallel-beam CT geometry was used with 720 gantry positions per rotation. Two NeuralCT frameworks were then evaluated – intensity-based segmentation (NCT-Seg_GMM) and spatially aware segmentation (NCT-Seg_SI) – across a range of ΔØ (from 20° to 200° per gantry rotation). Performance was evaluated using the root-mean-square error (RMSE) and DICE coefficient relative to the ground truth image.
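The two metrics follow their standard definitions; the binarization threshold used for DICE below is an assumption, as the paper does not state one.

```python
import numpy as np

def rmse(recon: np.ndarray, truth: np.ndarray) -> float:
    """Root-mean-square error between reconstruction and ground truth."""
    return float(np.sqrt(np.mean((recon - truth) ** 2)))

def dice(recon: np.ndarray, truth: np.ndarray, thresh: float = 0.1) -> float:
    """DICE overlap after binarizing both images at `thresh`."""
    a, b = recon > thresh, truth > thresh
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())
```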

As shown by the images and metrics in Fig. 3, FBP motion artifacts increased at higher ΔØ. Reconstruction with NCT-Seg_GMM was limited when ΔØ > 60°. In contrast, NCT-Seg_SI maintained high-quality motion-corrected reconstructions for all ΔØ and achieved low RMSE (<0.028) and high DICE (>0.89) for ΔØ up to 160°.

Fig. 3.

NCT-Seg_SI accurately depicts the moving dots with two attenuations and high angular displacements. FBP suffers from motion artifacts at all ΔØ; NCT-Seg_GMM failed to reconstruct at high ΔØ (>60°); only NCT-Seg_SI maintained high-quality motion-corrected reconstruction for all ΔØ, with higher DICE and lower RMSE compared with FBP and NCT-Seg_GMM. ΔØ = angular displacement per gantry rotation.


Experiment 2: Complex Deformation of Letters

In Experiment 2, we evaluated the ability of NCT-Seg_SI to improve reconstruction of scenes with complex topological changes. As shown in Fig. 4, we simulated CT imaging during the transformation of letters. The top letter transformed from “A” to “B” to “A” (attenuation = 0.7) while the bottom letter transformed from “B” to “A” to “B” (attenuation = 0.4). NCT-Seg_SI (red line) significantly reduced the severity of artifacts observed with FBP (blue) and NCT-Seg_GMM (orange), especially during the transformation periods (2nd-3rd and 5th-6th columns). Quantitatively, the median RMSE of NCT-Seg_SI (0.050 [0.042-0.061]) was significantly lower (p < 0.05) than that of NCT-Seg_GMM (0.090 [0.076-0.096]) and FBP (0.069 [0.047-0.085]). The median DICE of NCT-Seg_SI (0.89 [0.86-0.93]) was significantly higher (p < 0.05) than that of NCT-Seg_GMM (0.72 [0.69-0.76]) and FBP (0.72 [0.64-0.87]). Lastly, NCT-Seg_SI increased the percentage of frames with RMSE < 0.05 (NCT-Seg_SI: 45.7%, NCT-Seg_GMM: 0%, FBP: 28.0%) as well as with DICE > 0.85 (NCT-Seg_SI: 89.6%, NCT-Seg_GMM: 0%, FBP: 27.6%).

Fig. 4.

NCT-Seg_SI accurately depicts complex topological change with multiple attenuations. The ground truth image (red box) contains two letters that transform over two gantry rotations. Seven frames are displayed, including three stationary phases (columns 1, 4, 7) and four intermediate transformation phases (columns 2-3, 5-6). Both the reconstructed images and the quantitative metrics indicate that NCT-Seg_SI improved the imaging of a complex scene.


4.

SUMMARY

Reconstruction of moving scenes using an implicit neural representation-based framework (NeuralCT) can improve image quality without the need for a prior motion model or motion estimation. Here, we showed that when imaging scenes with multiple moving objects, the performance of NeuralCT can be limited by poor segmentation of the motion-corrupted FBP images. Using a spatially aware object segmentation method that incorporates both spatial and intensity information results in a NeuralCT solution that maintains high reconstruction performance for moving objects with multiple attenuation levels despite high angular motion and complex topological changes.

REFERENCES

[1] K. Gupta, B. Colvert, and F. Contijoch, “Neural Computed Tomography,” arXiv:2201.06574 (2022). http://arxiv.org/abs/2201.06574

[2] Z. Chen et al., “Precise measurement of coronary stenosis diameter with CCTA using CT number calibration,” Med. Phys., 46(12), 5514–5527 (2019). https://doi.org/10.1002/mp.v46.12

[3] T. Lossau et al., “Motion estimation and correction in cardiac CT angiography images using convolutional neural networks,” Comput. Med. Imaging Graph., 76, 101640 (2019). https://doi.org/10.1016/j.compmedimag.2019.06.001

[4] Y. Ko, S. Moon, J. Baek, and H. Shim, “Rigid and non-rigid motion artifact reduction in X-ray CT using attention module,” Med. Image Anal., 67, 101883 (2021). https://doi.org/10.1016/j.media.2020.101883

[5] P. Wang, L. Liu, Y. Liu, C. Theobalt, T. Komura, and W. Wang, “NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction,” arXiv:2106.10689 (2021). http://arxiv.org/abs/2106.10689

[6] L. Shen, J. Pauly, and L. Xing, “NeRP: Implicit Neural Representation Learning with Prior Embedding for Sparsely Sampled Image Reconstruction,” arXiv:2108.10991 (2021). http://arxiv.org/abs/2108.10991

[7] Q. Wu et al., “An Arbitrary Scale Super-Resolution Approach for 3-Dimensional Magnetic Resonance Image using Implicit Neural Representation,” arXiv:2110.14476 (2021). http://arxiv.org/abs/2110.14476

[8] V. Sitzmann et al., “Implicit Neural Representations with Periodic Activation Functions,” arXiv:2006.09661 (2020). http://arxiv.org/abs/2006.09661

[9] D. Reynolds et al., “Gaussian Mixture Models,” Encyclopedia of Biometrics, 659–664 (2009). https://doi.org/10.1007/978-0-387-73003-5
© 2022 Society of Photo-Optical Instrumentation Engineers (SPIE).
KEYWORDS

Image segmentation, Signal attenuation, Computed tomography, Image quality, Image restoration, Motion models, Neural networks
