*in vivo* gadolinium-enhanced magnetic resonance imaging in the delayed phase. We use MRI of formalin-fixed human *ex vivo* liver samples as phantoms that mimic the textural contrast of *in vivo* Gd-MRI. We have developed a local texture analysis that is applied to phantom images, and the results are used to train model observers to detect HF. The performance of the observer is assessed with the area-under-the-receiver–operator-characteristic curve (AUROC) as the figure-of-merit. To optimize the MRI pulse sequence, phantoms were scanned multiple times at a range of flip angles. The flip angle associated with the highest AUROC was chosen as optimal for the task of detecting HF.

## 1.

## Introduction

Chronic liver disease (CLD) is a widespread health concern that represents a common disease pathway for a number of important etiologies, including nonalcoholic steatohepatitis (NASH), alcoholic cirrhosis, and viral hepatitis.^{1}^{,}^{2} These diseases lead to inflammation and damage, usually first involving the portal triad region surrounding the hepatic lobules, resulting in the deposition of collagen scar tissue in the extracellular matrix (ECM), a process diagnosed as hepatic fibrosis (HF).^{1}^{,}^{3}^{–}^{5} HF is the hallmark of CLD.^{2}^{,}^{6}^{–}^{8} Monitoring for the presence of HF and staging (quantifying) the severity and progression over time are essential for the diagnosis and therapeutic management of CLD.

The current reference standard for CLD diagnosis and HF staging is needle biopsy of the liver.^{6}^{,}^{9}^{–}^{11} Biopsy provides cellular-resolution images that make it possible for a pathologist to identify fibrotic tissue and stage severity. When providing a diagnosis of HF, a pathologist will report severity using a numerical staging system based on one of several alternative scoring methods. Two of these techniques are the “Ishak score,” using a seven-point scale, and the “METAVIR score,” which uses a five-point severity scale.^{3}^{,}^{9} Each scale has metrics for determining the severity of HF and each institution or medical organization adopts a particular scale.

Needle biopsy can provide diagnostic specificity for HF, but the technique suffers from multiple drawbacks.^{2}^{,}^{8}^{,}^{10}^{,}^{12}^{,}^{13} The sample recovered in needle biopsy is $\sim 1\,\mathrm{mm}^{3}$ and is used to determine the health of an organ $\sim 50{,}000$ times larger than the sample’s volume; CLD is a nonuniform disease affecting different regions to different degrees, making biopsies prone to volume sampling errors. Additionally, pathologists require extensive training to make a quantitative assessment of HF, and there is significant variance between scores,^{6}^{,}^{9}^{,}^{10}^{,}^{14}^{,}^{15} either between pathologists within the same center or between centers, which may be aggravated when different scales are used. There is also risk for the patient, who must undergo an invasive procedure that has potential complications, including pain, bleeding, and/or infection.

HF caused by CLD can be treated with therapies that delay progression or reverse damage to the liver; therapy is most effective if HF is diagnosed in early-stage disease. Change of lifestyle is effective at delaying or preventing progression of CLD in the cases of alcoholic liver disease and NASH.^{16}^{–}^{18} There are also antiviral treatments for viral hepatitis.^{19}^{,}^{20} A quantitative, whole-liver, noninvasive MRI surrogate measure for HF would have a pivotal impact on the diagnosis and management of CLD and on the further development of new therapies.

As mentioned, the METAVIR score is a method to stage HF from needle biopsy.^{9} It has also been suggested as a reference scale for studies that use MRI techniques to stage HF. The METAVIR score is based on a five-point scale ranging from F0 to F4. F0 corresponds to a healthy liver with no detectable HF. F1 is diagnosed when collagen has formed around the portal triads, the veins that supply blood to the hepatocytes that perform the primary function of removing toxins from the blood; F2 is based on identifying HF extending from the portal triads with fibrosis branching out between the triads; F3 stage is based upon identifying fibrosis bridging across portal triads; and F4, also referred to as cirrhosis, is called when thickened fibrotic bands both bridge triads and encase liver lobules.

Magnetic-resonance elastography (MRe)^{13}^{,}^{21}^{–}^{24} is another MRI method that has been developed for HF detection and staging. MRe utilizes an externally triggered transducer that produces mechanical, pneumatically driven, longitudinal pressure distortion waves on the surface of the patient’s body that transfer to the liver.^{21}^{,}^{22}^{,}^{25}^{,}^{26} Deformation in response to the pressure waves is measured spatially and temporally using a phase-sensitive MRI technique, allowing conversion to a measure of liver tissue stiffness.^{21}^{,}^{24}^{,}^{27}^{–}^{29} Liver stiffness has been shown to correlate with the presence of HF.^{13}^{,}^{24}^{,}^{27}^{,}^{30}^{–}^{32} Limitations of MRe include the requirement for specialized equipment, additional time in the MR scanner, potential patient discomfort, and relative insensitivity to early stages of HF ($\le \mathrm{F}2$ disease).^{23}^{,}^{27}^{,}^{29} Our goal is to develop a quantitative MR imaging technique that is directly sensitive to the textural changes the pathologist observes, while being noninvasive, fast, compatible with standard MRI systems, and capable of whole-liver sampling.

Gadolinium (Gd) contrast agent (gadobenate dimeglumine) has previously been shown to accumulate in the extracellular space where collagen has emerged in the liver, providing contrast between healthy and fibrotic tissue.^{2}^{,}^{6} Gadolinium reduces the T1 and T2 wherever it accumulates. The effect on T1 is much greater than on T2, making gadolinium an effective *in vivo* T1-imaging contrast agent.^{33} *In vivo* images suitable for the detection of HF are collected with a T1-weighted MRI pulse sequence at the delayed phase of Gd-enhancement. The spatial resolution achievable in images acquired by clinical MRI is very near the scale of the characteristic size of the hepatic lobule and larger HF bands. However, the images are challenging to analyze reproducibly and quantitatively by unaided radiologists.

Since the statistics of the data are not known, we chose to assume that the data obey normal statistics and to use a Hotelling observer to assess local texture in MR images of formalin-fixed liver samples obtained at autopsy. Data were collected with a 3-D gradient-echo pulse sequence using a TR/TE/NA of $9.79\,\mathrm{ms}/4.44\,\mathrm{ms}/2$. These settings were chosen, based on results outlined in Section 2.1, to recreate the contrast observed *in vivo* in Gd-enhanced images that radiologists read to assess HF.

In clinical MRI, the operator has control over the TR, TE, and flip angle, and receives radiologist feedback to confirm whether the sequence is collecting images with diagnostically acceptable contrast. The imaging sequence used is not necessarily ideal for the task of separating F0 and F4 liver images with a mathematical observer. Our goal is to use task-based performance assessment with an ideal linear (Hotelling) observer to determine the parameters that maximize sensitivity to fibrotic structures in MR imaging.

The TR, TE, and flip angle parameters directly contribute to the contrast of an MR image. However, TR and TE also directly impact the scan time of the image sequence. Increasing the duration of the MR acquisition in the abdomen is not desirable due to increased artifacts from motion associated with the patient’s breathing. Changing the flip angle has a similar effect on overall contrast without significantly impacting the length of the sequence. For this reason, we focus on determining the ideal flip angle for an MR sequence that will be used to assess HF.

We use area-under-the-receiver–operator-characteristic curve (AUROC) as the figure-of-merit, and present results of a study to find the optimal flip angle for detecting HF in liver phantoms. This optimization method is translatable to the clinical setting.

## 2.

## Materials and Methods

## 2.1.

### Liver Phantoms

Liver specimens were recovered from the University of Arizona’s Department of Pathology for use as MRI phantoms. One- to two-inch thick slices were sectioned during autopsy and fixed in formalin. After the liver was fixed in formalin, biopsies were collected and the phantom placed in an air-tight container. The containers were then placed in the MRI to collect images.

To confirm our observation that MRI of formalin-fixed tissue is comparable to clinical contrast-enhanced MRI, we compared the textures of the liver tissues using a technique introduced by Burgess et al. The method takes the Fourier transform of the data and measures the radially averaged power spectrum in frequency space. The results are reported on a log–log scale to determine the slope of the power spectrum. Imaging modalities with similar slopes in the spectral density have similar image features.^{34}^{,}^{35} We compared healthy and cirrhotic patient data to healthy and cirrhotic phantom images. Patient data were collected at a TR/TE/FA of $4.36\,\mathrm{ms}/2.3\,\mathrm{ms}/10\,\mathrm{deg}$ and phantom data were collected at a TR/TE/FA of $9.79\,\mathrm{ms}/4.44\,\mathrm{ms}/10\,\mathrm{deg}$. All data sets were collected with an in-plane resolution of $1.5\,\mathrm{mm}^{2}$ at a slice thickness of 3 mm. Results for this experiment are shown in Fig. 1. All of the collected spectra exhibited similar slopes to within experimental error. From this result, we conclude that the formalin-fixed tissue produces images with textures similar to *in vivo* clinical images of fibrosis. Higher-resolution *in vivo* experiments are not yet possible due to the limitations imposed by patient respiratory motion.
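The spectral comparison described above can be illustrated with a minimal sketch. It assumes a single 2-D slice as input, linear radial bins up to the Nyquist frequency, and an unweighted least-squares fit on the log–log scale; the function names `radial_power_spectrum` and `loglog_slope` and these binning choices are illustrative assumptions, not the procedure used in the study.

```python
import numpy as np

def radial_power_spectrum(image):
    """Return (radial frequency in cycles/pixel, radially averaged power)."""
    img = np.asarray(image, dtype=float)
    img = img - img.mean()                      # suppress the DC term
    power = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    ny, nx = img.shape
    fy = np.fft.fftshift(np.fft.fftfreq(ny))
    fx = np.fft.fftshift(np.fft.fftfreq(nx))
    radius = np.hypot(*np.meshgrid(fy, fx, indexing="ij")).ravel()
    # Assumed binning: linear radial bins from 0 to Nyquist (0.5 cycles/pixel).
    n_bins = min(ny, nx) // 2
    edges = np.linspace(0.0, 0.5, n_bins + 1)
    which = np.digitize(radius, edges) - 1
    valid = (radius > 0) & (which < n_bins)     # drop DC and corner frequencies
    sums = np.bincount(which[valid], weights=power.ravel()[valid], minlength=n_bins)
    counts = np.bincount(which[valid], minlength=n_bins)
    keep = counts > 0
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers[keep], sums[keep] / counts[keep]

def loglog_slope(freq, power):
    """Least-squares slope of log10(power) versus log10(frequency)."""
    ok = (freq > 0) & (power > 0)
    slope, _intercept = np.polyfit(np.log10(freq[ok]), np.log10(power[ok]), 1)
    return slope
```

Under these assumptions, two image sets would be judged texturally similar when `loglog_slope` returns comparable values for both; the tolerance that counts as "comparable" is left to the experimental error analysis.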

## 2.2.

### Magnetic Resonance Imaging

All images in this optimization study were collected on a Siemens 3T Skyra MRI with the Siemens flex body imaging coil, using a 3-D gradient-echo T1-weighted imaging sequence (3-D VIBE, Siemens) with $\mathrm{TR}/\mathrm{TE}=9.79\,\mathrm{ms}/4.44\,\mathrm{ms}$, 2 averages, and a range of FAs from $\sim 10$ to $\sim 50\,\mathrm{deg}$. The FAs available are based on hardware limitations. A field-of-view (FOV) of $26.5\times 26.2\times 3.36\,\mathrm{cm}$ with a sampling matrix of $768\times 760\times 96$ was selected, resulting in images with isotropic voxels of approximately $0.35\,\mathrm{mm}$ per side. All images were collected at room temperature (22°C). The total scan time at one FA was $\sim 25\,\mathrm{min}$. We collected enough data to fully train the covariance matrices required to calculate the Hotelling observer. This MRI optimization method will eventually be repeated with *in vivo* data sets once enough patient images are collected.

Before the mathematical observer was trained to perform the task of HF-detection on liver tissue, a basic threshold was implemented to segment out and remove areas of the image that contained blood vessels from the analysis. We found this to be a necessary step in developing the observer technique.
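A minimal sketch of such a threshold step is given below. The text does not state the threshold value or how masked pixels affect region selection, so the following are assumptions made only for illustration: vessels are darker than tissue, a single global `threshold` is supplied by the user, and a region is discarded if it contains any masked pixel. The function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def tissue_mask(image, threshold):
    """True where a pixel is bright enough to be treated as liver tissue."""
    return np.asarray(image) > threshold

def roi_is_usable(mask, size=7, step=7):
    """For each size x size window on the grid, True if no pixel is masked out."""
    windows = sliding_window_view(mask, (size, size))[::step, ::step]
    return windows.all(axis=(2, 3))
```

With `step=size` the windows tile the slice without overlap, and with `step=1` every window position is tested; both choices are shown only to indicate how a mask could be combined with region selection.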

## 2.3.

### Biopsy

Tissue biopsy was used as the gold standard to determine the METAVIR score for the phantoms prior to imaging. Eight scalpel biopsies of $10\times 10\times 1\,\mathrm{mm}^{3}$ size were collected from each formalin-fixed liver. Large biopsies were possible since the tissue was excised at autopsy. H&E stained slides were prepared for review by a pathologist, who evaluated each sample for the stage of fibrosis. Only phantoms with consistent METAVIR scores were selected to train and test the observers.

## 2.4.

### Local Texture Analysis

We tested texture analyses based on a local, normalized, 2-D discrete autocorrelation (2DAC) and a local, normalized, 2-D discrete circular autocorrelation (2DCC). We found that a Hotelling observer working with 2DCC is capable of distinguishing whether local regions are drawn from an $\mathrm{F}0/\mathrm{F}1$ or an F4 liver while requiring a modest amount of training data. The 2DCC is given by

## Eq. (1)

$$S_{k,l}^{\prime}=\sum_{m=1}^{M}\sum_{n=1}^{N}f(m,n)\,f^{*}[\sigma(m-k+M),\sigma(n-l+N)],\quad 1\le k\le M,\quad 1\le l\le N,$$

$$S_{k,l}=\frac{S_{k,l}^{\prime}}{{\overrightarrow{S^{\prime}}}_{\mathrm{max}}},$$

## 2.5.

### Optimal Linear Observer

To perform the classification task between a signal-absent class corresponding to normal liver and a signal-present class corresponding to fibrotic liver, an optimal linear observer that maximizes detection signal-to-noise ratio (SNR), also known as the Hotelling observer, was trained using local 2DCCs from 2-D slices of MR images of phantoms with METAVIR scores confirmed via biopsy. The set of pixels from the texture analysis was ordered as a $P\times 1$ vector $\overrightarrow{g}$.^{36}^{,}^{37} The optimal linear observer template $\overrightarrow{w}$ is also a $P\times 1$ vector, and a test statistic is calculated as the inner product between a data vector and the observer vector^{36}^{,}^{37}

## Eq. (2)

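$$\tau(\overrightarrow{g})=\sum_{p=1}^{P}w_{p}g_{p}=\overrightarrow{w}^{t}\overrightarrow{g}.$$

The template is computed from the sample means $\overline{\overrightarrow{g_0}}$ and $\overline{\overrightarrow{g_1}}$ and the covariance matrices $\mathbf{K}_{\mathbf{0}}$ and $\mathbf{K}_{\mathbf{1}}$ of the signal-absent and signal-present training data, respectively, as^{36}^{,}^{37}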

## Eq. (3)

$$\overrightarrow{w}={\left(\frac{\mathbf{K}_{\mathbf{1}}+\mathbf{K}_{\mathbf{0}}}{2}\right)}^{-1}(\overline{\overrightarrow{g_1}}-\overline{\overrightarrow{g_0}}).$$

The recovered test statistic $\tau(\overrightarrow{g})$ is used to make a decision based on a threshold ${\tau}_{\mathrm{th}}$. If $\tau(\overrightarrow{g})$ is greater than ${\tau}_{\mathrm{th}}$, it is decided that ${H}_{1}$ (signal present) is true, whereas if $\tau(\overrightarrow{g})$ is less than ${\tau}_{\mathrm{th}}$, it is decided that ${H}_{0}$ (signal absent) is true.^{37}
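As an illustration of Eqs. (2) and (3), the sketch below trains a template from two arrays of training vectors and applies it. The function names are hypothetical, and the use of a pseudo-inverse in place of the matrix inverse is an assumption of this sketch: the text does not say how the inverse was computed.

```python
import numpy as np

def train_hotelling(g0, g1):
    """Template from Eq. (3); g0 and g1 have shape (n_samples, P)."""
    g0 = np.asarray(g0, dtype=float)
    g1 = np.asarray(g1, dtype=float)
    mean_diff = g1.mean(axis=0) - g0.mean(axis=0)
    k_avg = 0.5 * (np.cov(g1, rowvar=False) + np.cov(g0, rowvar=False))
    # Assumption: a pseudo-inverse stands in for the inverse in Eq. (3),
    # so the sketch still runs when the sample covariance is singular.
    return np.linalg.pinv(k_avg) @ mean_diff

def hotelling_statistic(w, g):
    """Test statistic from Eq. (2) for one vector (P,) or many (n, P)."""
    return np.asarray(g, dtype=float) @ w

def decide(tau, tau_th):
    """True where H1 (signal present) is decided."""
    return np.asarray(tau) > tau_th
```

The pseudo-inverse was chosen here because a normalized circular autocorrelation of a real image has a zero-shift element that is always 1 and is symmetric under a sign change of the shift, so its flattened vector contains constant and duplicated entries, and a sample covariance built from such vectors is rank deficient.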

## 2.6.

### Ideal Quadratic Observer

The Hotelling observer is the ideal observer only when ${\mathbf{K}}_{\mathbf{0}}\cong {\mathbf{K}}_{\mathbf{1}}$. However, if the statistics are multivariate normal but the covariance matrices are not equal or nearly equal, then the quadratic observer is the ideal observer. It is given by

## Eq. (4)

$$\tau(\overrightarrow{g})=\frac{1}{2}{(\overrightarrow{g}-\overline{\overrightarrow{g_0}})}^{t}\mathbf{K}_{\mathbf{0}}^{-1}(\overrightarrow{g}-\overline{\overrightarrow{g_0}})-\frac{1}{2}{(\overrightarrow{g}-\overline{\overrightarrow{g_1}})}^{t}\mathbf{K}_{\mathbf{1}}^{-1}(\overrightarrow{g}-\overline{\overrightarrow{g_1}}).$$

The ideal quadratic observer can be computed with the same data required to train the linear observer.^{38} Decisions are made in the same manner as for the linear observer, by comparing $\tau(\overrightarrow{g})$ to ${\tau}_{\mathrm{th}}$ to decide whether $\overrightarrow{g}$ is a member of ${H}_{0}$ or ${H}_{1}$.
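A minimal sketch of Eq. (4) follows, reusing the same class means and covariance matrices that the linear observer needs. The function names are hypothetical, and, as an assumption of this sketch, pseudo-inverses stand in for $\mathbf{K}_{\mathbf{0}}^{-1}$ and $\mathbf{K}_{\mathbf{1}}^{-1}$ so that the code also runs on rank-deficient sample covariances.

```python
import numpy as np

def train_quadratic(g0, g1):
    """Return (mean0, K0 inverse, mean1, K1 inverse) from (n_samples, P) arrays."""
    g0 = np.asarray(g0, dtype=float)
    g1 = np.asarray(g1, dtype=float)
    # Assumption: pseudo-inverses replace the inverses written in Eq. (4).
    k0_inv = np.linalg.pinv(np.cov(g0, rowvar=False))
    k1_inv = np.linalg.pinv(np.cov(g1, rowvar=False))
    return g0.mean(axis=0), k0_inv, g1.mean(axis=0), k1_inv

def quadratic_statistic(g, mean0, k0_inv, mean1, k1_inv):
    """Eq. (4) evaluated for one vector (P,) or a batch (n, P)."""
    d0 = np.asarray(g, dtype=float) - mean0
    d1 = np.asarray(g, dtype=float) - mean1
    q0 = np.einsum("...i,ij,...j->...", d0, k0_inv, d0)
    q1 = np.einsum("...i,ij,...j->...", d1, k1_inv, d1)
    return 0.5 * q0 - 0.5 * q1
```

Eq. (4) carries no log-determinant term. For multivariate normal data that term does not depend on $\overrightarrow{g}$, so omitting it shifts every test statistic by the same constant and leaves the ROC curve unchanged.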

## 2.7.

### Receiver Operator Characteristic Analysis

We recovered the receiver–operator-characteristic (ROC) curves and calculated the AUROC as the figure-of-merit for the observer. To perform ROC analysis, ${\tau}_{\mathrm{th}}$ was varied across the range of possible $\tau$ values spanned by the test statistics ${\tau}_{0}$ and ${\tau}_{1}$, where ${\tau}_{0}$ denotes the test statistics from confirmed signal-absent testing data and ${\tau}_{1}$ denotes the test statistics from confirmed signal-present testing data. At each threshold ${\tau}_{\mathrm{th}}$, the false-positive fraction (FPF) and true-positive fraction (TPF) were calculated^{37} as the fractions of the ${\tau}_{0}$ and ${\tau}_{1}$ values, respectively, that exceed ${\tau}_{\mathrm{th}}$.

For each threshold, the TPF was plotted as a function of the FPF, forming the ROC curve. The figure-of-merit for the ROC curve is the AUROC, which has a possible range from 0.5 to 1.0. An AUROC of 0.5 denotes a situation where the distributions of the test statistics fully overlap one another and the observer can do no better than random guessing. An AUROC (or AUC for short) of 1.0 means a complete separation of the test-statistic distributions and perfect observer performance.
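The threshold sweep can be sketched as follows. The sketch assumes that the thresholds are taken at every observed test-statistic value and that the area is obtained by trapezoidal integration; neither choice is stated in the text, and the function names are illustrative.

```python
import numpy as np

def roc_curve(tau0, tau1):
    """Sweep the threshold over all observed values; return (FPF, TPF)."""
    s0 = np.sort(np.asarray(tau0, dtype=float))   # signal-absent statistics
    s1 = np.sort(np.asarray(tau1, dtype=float))   # signal-present statistics
    observed = np.unique(np.concatenate((s0, s1)))[::-1]
    thresholds = np.concatenate(([np.inf], observed, [-np.inf]))
    # Fraction of each class strictly above the threshold.
    fpf = 1.0 - np.searchsorted(s0, thresholds, side="right") / s0.size
    tpf = 1.0 - np.searchsorted(s1, thresholds, side="right") / s1.size
    return fpf, tpf

def auroc(fpf, tpf):
    """Trapezoidal area under the ROC curve."""
    return float(np.sum(np.diff(fpf) * (tpf[1:] + tpf[:-1]) / 2.0))

if __name__ == "__main__":
    fpf, tpf = roc_curve([0.0, 1.0, 2.0], [3.0, 4.0, 5.0])
    print(auroc(fpf, tpf))  # fully separated toy statistics
```

The toy values in the usage lines are placeholders rather than study data. They illustrate the two limiting cases described above: fully separated statistics give an area of 1.0, and identical distributions give 0.5.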

## 3.

## Results

## 3.1.

### Liver Phantoms and Biopsy

We collected biopsies from four liver phantoms fixed in formalin for imaging with MRI. Each phantom had biopsies from eight regions assessed by a pathologist. Only phantoms with homogeneous biopsy results were used in this study. Two phantoms were reported as F4, one as F0, and one as F1. Representative images of biopsy slides are provided for each phantom in Fig. 2.

The F0 sample showed no sign of fibrosis, whereas the F1 biopsy showed early fibrosis forming around the portal veins. The two F4 phantoms showed a complete ECM, and lobules were clearly visible in the biopsy slides. The F0 and F1 livers were used to define the signal-absent class of the linear and quadratic observers, and the two F4 livers were used to define the signal-present class to train the model observers. Figure 3 provides a representative MR image of each phantom, acquired at a TR/TE of $9.79\,\mathrm{ms}/4.44\,\mathrm{ms}$ and an FA of 19 deg, associated with each biopsy sample.

## 3.2.

### MRI of Liver Phantoms

Each phantom was imaged at five flip angles: 8, 15, 19, 30, and 45 deg in order. Selected slices from each phantom at 19 deg are shown in Fig. 3.

The images from the F0 and F1 phantoms appear relatively untextured and the liver tissue appears uniform in signal throughout a majority of the tissue. The F1 liver in Fig. 3(b) has some features associated with vasculature. This is dependent on the location the slice is removed from during autopsy. The vasculature features, which are dark, are ignored by our analysis. The F4 images suggest that there is visible contrast between the ECM and liver tissue in the cirrhotic livers that appears at the expected length scale associated with fibrosis.

## 3.3.

### Training Model Observers

The set of local 2DCCs from the F0 and F1 phantoms comprised our signal-absent data for training a model observer, and the 2DCCs from the two F4 phantoms comprised the signal-present data. To avoid bias in the results, one phantom from each class was used to train the model observer, and the other phantom from each class was used as the testing data. With four phantoms, two in each class, we could derive and test four independent observers to check for reproducibility. ROIs of $7\times 7$ pixels were selected with independent gridding to calculate the means and covariance matrices. With this selection method, the liver in Fig. 3(a) had 248,439 ROIs, the liver in Fig. 3(b) had 56,865 ROIs, the liver in Fig. 3(c) had 49,858 ROIs, and the liver in Fig. 3(d) had 127,150 ROIs. The linear observers for each FA are shown in flattened 1-D form of length $P$ in Fig. 4, based on the notation in Eq. (5). The observer index is the vector component. We find that the templates all detect the same features, regardless of the choice of training and testing data and FA, namely the peaks in the 2DCC function associated with the ECM cell size. Figure 5 provides the 2-D representation $M\times N$ of the templates at each FA for one set of training data, based on the indexing in Eq. (1). The 2-D templates show a high degree of rotational symmetry, as expected for 2DCCs, which makes the results invariant to image rotation.
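The feature extraction behind these training vectors can be sketched as below, under several assumptions that the text does not confirm: $\sigma$ in Eq. (1) is read as wrapping indices circularly, "independent gridding" is read as non-overlapping $7\times 7$ tiles, the images are real valued (so the complex conjugate has no effect), and shifts are indexed from zero, which is a cyclic reordering of the 1-based indexing of Eq. (1). The function names are hypothetical, and vessel masking is omitted.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def circular_autocorrelation_2d(roi):
    """Normalized 2-D circular autocorrelation of one real-valued ROI."""
    f = np.asarray(roi, dtype=float)
    M, N = f.shape
    s = np.empty((M, N))
    for k in range(M):
        for l in range(N):
            # np.roll(f, (k, l))[m, n] equals f[(m - k) % M, (n - l) % N]
            s[k, l] = np.sum(f * np.roll(f, shift=(k, l), axis=(0, 1)))
    return s / s.max()        # undefined for an all-zero ROI

def extract_rois(image, size=7, step=7):
    """step=size gives non-overlapping tiles; step=1 gives a sliding window."""
    windows = sliding_window_view(np.asarray(image, dtype=float), (size, size))
    return windows[::step, ::step].reshape(-1, size, size)

def texture_vectors(image, size=7, step=7):
    """One flattened 2DCC vector of length P = size * size per ROI."""
    rois = extract_rois(image, size, step)
    return np.stack([circular_autocorrelation_2d(r).ravel() for r in rois])
```

The explicit double loop mirrors the double sum in Eq. (1) for readability. For large numbers of ROIs, the same quantity can be obtained as the inverse FFT of the squared magnitude of the FFT of each ROI.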

The Hotelling observer has a template form $\overrightarrow{w}$ that one can visualize, whereas the quadratic observer does not. The sample covariance matrices for each flip angle for a representative signal-absent and signal-present training combination are shown in Fig. 6.

## 3.4.

### ROC Analysis and Curve Fitting

The four phantoms allowed for four different combinations of training and testing data, for which ROC analysis was performed and the AUC for each combination was calculated as a function of flip angle. ROIs were selected with a sliding window to increase sensitivity to local changes in texture; the liver in Fig. 3(a) had 7,686,034 ROIs, the liver in Fig. 3(b) had 2,952,176 ROIs, the liver in Fig. 3(c) had 2,503,899 ROIs, and the liver in Fig. 3(d) had 6,053,726 ROIs. The AUC values are plotted as a function of flip angle in Fig. 7.

The mean relative AUCs, after a minimal least squares adjustment to remove overall offsets between training and testing combinations, were computed as a function of flip angle, and plotted for the linear and quadratic observers in Figs. 8 and 9, respectively. The optimal flip angle was chosen based on maximizing the AUC and for both the linear and quadratic observers was found to be near 24 deg. The AUC values for the quadratic observer did not improve upon the AUC values of the linear observer.

## 4.

## Conclusions and Future Work

Task-based optimization of MRI acquisition sequence parameters can be carried out whenever a model observer is applied to the MRI images, and this method should have extensive utility for a variety of clinical applications.

The method of optimization shown in this work was focused on phantoms, but the approach can be translated to clinical practice. Similar studies are planned that use data collected from *in vivo* scans and will thus be useful for improving patient sequences. The optimal flip angle determined for *ex vivo* phantoms at our resolution is not necessarily the optimal flip angle for the *in vivo* experiments.

Additionally, more moderate cases of HF will be collected to establish the AUCs for early detection. This will have the challenge of identifying intermediate cases, i.e., F1, F2, and F3, with gold standard verification. We expect to extend our techniques to repeat the optimization experiment for best multiple-class decisions.

We are acquiring more phantom data with early stage liver disease to further develop this tool, but this is difficult due to limited access to autopsy tissue samples. Even though only a limited number of phantoms were used in the current work, we were able to collect enough training data to calculate the required covariance matrices and test the Hotelling template. We are also considering alternatives to the 2DCC for local texture analysis.

## Acknowledgments

This research was supported in part by a grant from the Arizona Biomedical Research Commission (ADHS14-082996) and the Biomedical Imaging and Spectroscopy Fellowship (NIH/NIBIB T32-EB000809). We thank the faculty and staff of the University of Arizona, Banner University Medical Center Pathology Department, with special thanks to Dr. Bruce Parks, M.D., Dr. Richard Sobonya, M.D., Dr. Cornel Moga, M.D., and Samuel Kinghorn for providing the tissue samples and other necessary resources for preparing our phantoms. Finally, the authors thank Eric A. Clarkson, Ph.D., and Matt A. Kupinski, Ph.D., for their instruction in model observer training and testing.

## References

## Biography

**Jonathan F. Brand** studied for his PhD under Lars R. Furenlid at the University of Arizona in the College of Optical Sciences. His work was in the development of classification techniques in medical imaging using model observers. His current interest is in optical design and image processing for biomedical applications.

**Lars R. Furenlid** is a professor at the University of Arizona and co-director of the Center for Gamma-Ray Imaging, with appointments in the Department of Medical Imaging (Radiology) and the College of Optical Sciences. He is also a member of the Graduate Interdisciplinary Degree Program in Biomedical Engineering and the Arizona Cancer Center. Before moving to the University of Arizona, he was a physicist at the National Synchrotron Light Source at Brookhaven National Laboratory.

**Tulshi Bhattacharyya** received his BS degree in physiology from the University of Arizona. He is a PACS analyst and research technician at Banner University of Arizona Medical Center—Radiology Department. Prior research in pathology includes work at the SAVAHCS ALS Brain Bank and University of Arizona Cancer Center.

**Ali Bilgin** is an associate professor with the Departments of Biomedical Engineering, Electrical, and Computer Engineering, and Medical Imaging at the University of Arizona, Tucson, AZ. His current research interests are in the areas of signal and image processing, and include image and video coding, data compression, and magnetic resonance imaging.