## 1. Introduction

Hyperspectral imagery (HSI) contains hundreds of contiguous spectral bands, which enable the discrimination of different materials and make a variety of civilian and military applications possible.^{1}^{,}^{2} Target detection aims to detect a low-probability target with a known signature against an unknown background.^{3}^{–}^{5} When the target spectral signature is unknown, unsupervised anomaly detection has to be applied instead; it finds anomalous pixels whose spectral signatures differ from those of their surroundings.^{6}^{,}^{7} As a classic anomaly detector, the Reed-Xiaoli (RX) algorithm^{8}^{–}^{10} was developed under a hypothesis-testing framework in which the conditional probability density functions under the two hypotheses (without and with anomaly) are assumed to be Gaussian. The solution turns out to be an adaptive Mahalanobis distance between the pixel under test and the local background. A local background is preferred because it captures nonstationary statistics, and its advantage over a global background covariance matrix has been demonstrated in the literature.^{11}^{–}^{13}

The RX detector has become the benchmark of anomaly detection algorithms in HSI. The key to its success is an appropriate estimate of the local background covariance matrix for effective background suppression. An adaptive RX detector employs a dual-window strategy: the inner window is slightly larger than the pixel size, the outer window is larger than the inner one, and only the samples in the outer region (i.e., between the frames of the inner and outer windows) are used to estimate the background covariance matrix, so that potentially anomalous pixels are not used. Intuitively, the number of pixels in the outer region (determined by the sizes of the inner and outer windows) should exceed the number of bands so that the resulting covariance matrix is full-rank and can be inverted. However, even when the covariance matrix is rank-deficient, its inverse can still be computed by several strategies, such as eigen-decomposition and reconstruction from the nonzero eigenvalues and their eigenvectors, data dimensionality reduction, or simple matrix regularization. Thus, in this work, we do not limit our discussion to the case of a full-rank local covariance matrix.
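
Two of the strategies named above, regularization and reconstruction from the nonzero eigenpairs, can be illustrated with a minimal NumPy sketch. The function names, the loading factor `eps`, and the tolerance `tol` are illustrative assumptions, not values taken from this work.

```python
import numpy as np

def regularized_inverse(cov, eps=1e-3):
    """Invert cov after diagonal loading; eps is an assumed, illustrative value."""
    d = cov.shape[0]
    scale = np.trace(cov) / d              # average eigenvalue, sets the loading scale
    return np.linalg.inv(cov + eps * scale * np.eye(d))

def eig_pseudo_inverse(cov, tol=1e-10):
    """Rebuild an inverse from the eigenpairs whose eigenvalue is not negligible."""
    vals, vecs = np.linalg.eigh(cov)       # cov is symmetric, so eigh applies
    keep = vals > tol * vals.max()
    return (vecs[:, keep] / vals[keep]) @ vecs[:, keep].T

rng = np.random.default_rng(0)
samples = rng.normal(size=(20, 8))         # d = 20 bands, s = 8 samples, so s < d
cov = np.cov(samples)                      # 20 x 20 sample covariance, rank-deficient
rank = np.linalg.matrix_rank(cov)
inv_reg = regularized_inverse(cov)
inv_eig = eig_pseudo_inverse(cov)
```

Either result can stand in for the inverse covariance in a Mahalanobis-type detector when fewer background samples than bands are available.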

In addition to the classical RX detector, a number of extensions and other anomaly detection algorithms have also been proposed for hyperspectral data. A time-efficient method has been introduced for anomaly detection in Ref. 14, the kurtosis maximization-based anomaly detection was improved in Ref. 15, the subpixel anomaly detection was discussed in Ref. 16, a random-selection-based anomaly detector was introduced in Ref. 17, weighted and linear filter-based RX was analyzed in Ref. 18, subspace-projection-based detectors were proposed in Ref. 19, and discriminative metric learning was applied to anomaly detection in Ref. 20. In particular, kernel-based detectors, such as kernel RX (KRX),^{21} kernel eigenspace separation transform,^{22} and kernel regression analysis^{23} for anomaly detection were introduced. In addition, different background modeling approaches were proposed, such as support vector data description,^{24} automated modeling methods in Ref. 25, and the collaborative-representation-based method.^{26} However, the dual-window-based RX algorithm remains the benchmark due to its relative robustness and easy implementation.

A multiple-window-based RX (MW-RX) detector was recently discussed in Ref. 27, whose final output is independent of the window sizes. In MW-RX, RX was implemented several times with different dual windows, but for each pixel, only the maximum RX output was used to generate the final detection map. In this paper, we propose a decision-fusion approach for hyperspectral anomaly detection using multiple windows, where a decision map is produced for each dual-window detector and the final decision map is generated with a voting strategy. Experimental results demonstrate that the proposed strategy can reduce the false alarm rate while maintaining the same true positive rate.

## 2. Proposed Anomaly Detection Method

### 2.1. Dual-Window RX Detector

Consider a three-dimensional hyperspectral cube reshaped into samples $\mathbf{X}={\{{\mathbf{x}}_{i}\}}_{i=1}^{n}$ in ${\mathbb{R}}^{d}$, where $d$ is the number of spectral bands and $n$ is the total number of samples. For each pixel $\mathbf{y}$ (of size $d\times 1$), the surrounding data are collected from inside the outer window (of size ${w}_{\mathrm{out}}\times {w}_{\mathrm{out}}$) but outside the inner window (of size ${w}_{\mathrm{in}}\times {w}_{\mathrm{in}}$), both centered at the pixel $\mathbf{y}$. The selected data are arranged into a two-dimensional matrix ${\mathbf{X}}_{s}={\{{\mathbf{x}}_{i}\}}_{i=1}^{s}$, where $s={w}_{\mathrm{out}}\times {w}_{\mathrm{out}}-{w}_{\mathrm{in}}\times {w}_{\mathrm{in}}$ is the number of chosen samples. Hence, a matrix ${\mathbf{X}}_{s}$ (of size $d\times s$) is obtained for every pixel $\mathbf{y}$ from its own local window.
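
The sample collection described above, and the Mahalanobis-distance score of Eq. (1) that it feeds, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, window positions that fall outside the image are simply skipped, and a small diagonal loading `eps` is assumed so that the covariance is always invertible.

```python
import numpy as np

def dual_window_samples(cube, row, col, w_in, w_out):
    """Return X_s (d x s): pixels inside the outer window but outside the inner one."""
    h, w, _ = cube.shape
    r_out, r_in = w_out // 2, w_in // 2
    samples = []
    for i in range(row - r_out, row + r_out + 1):
        for j in range(col - r_out, col + r_out + 1):
            inside_image = 0 <= i < h and 0 <= j < w   # assumed border handling
            inside_inner = abs(i - row) <= r_in and abs(j - col) <= r_in
            if inside_image and not inside_inner:
                samples.append(cube[i, j])
    return np.array(samples).T

def rx_score(y, X_s, eps=1e-6):
    """Eq. (1): Mahalanobis distance of y from the local background in X_s."""
    mu = X_s.mean(axis=1)
    cov = np.cov(X_s) + eps * np.eye(X_s.shape[0])     # assumed diagonal loading
    diff = y - mu
    return float(diff @ np.linalg.solve(cov, diff))

rng = np.random.default_rng(0)
cube = rng.normal(size=(15, 15, 4))        # toy scene: 15 x 15 pixels, d = 4 bands
cube[7, 7] += 10.0                         # implant one anomalous pixel
X_s = dual_window_samples(cube, 7, 7, w_in=3, w_out=7)
score_anomaly = rx_score(cube[7, 7], X_s)
score_background = rx_score(cube[3, 3], dual_window_samples(cube, 3, 3, 3, 7))
```

With $({w}_{\mathrm{in}},{w}_{\mathrm{out}})=(3,7)$ and a pixel away from the image border, `X_s` holds $s=49-9=40$ samples, matching the formula for $s$ above.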

A single pixel form of the RX algorithm is often approximated by the following equation:^{6}^{,}^{13}^{,}^{28}

## Eq. (1)

$$r(\mathbf{y})={(\mathbf{y}-{\mathit{\mu}}_{\text{local}})}^{T}{\mathbf{\Sigma}}_{\text{local}}^{-1}(\mathbf{y}-{\mathit{\mu}}_{\text{local}}),$$

where ${\mathit{\mu}}_{\text{local}}$ and ${\mathbf{\Sigma}}_{\text{local}}$ are the mean vector and covariance matrix of the local background, estimated from ${\mathbf{X}}_{s}$.

In Ref. 21, KRX was investigated by projecting the data into a high-dimensional feature space in which they become more separable. In the kernel-induced feature space, the mapping function $\mathrm{\Phi}$ maps the pixel $\mathbf{y}\to \mathrm{\Phi}(\mathbf{y})\in {\mathbb{R}}^{{d}^{\prime}\times 1}$ (${d}^{\prime}\gg d$ is the dimension of the kernel feature space) and $\mathbf{\Phi}=[\mathrm{\Phi}({\mathbf{x}}_{1}),\mathrm{\Phi}({\mathbf{x}}_{2}),\cdots ,\mathrm{\Phi}({\mathbf{x}}_{s})]\in {\mathbb{R}}^{{d}^{\prime}\times s}$. The corresponding output of KRX is represented as

## Eq. (2)

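$${r}_{\mathrm{\Phi}}(\mathbf{y})={[\mathrm{\Phi}(\mathbf{y})-{\mathit{\mu}}_{{\mathrm{\Phi}}_{\text{local}}}]}^{T}{\mathbf{\Sigma}}_{{\mathrm{\Phi}}_{\text{local}}}^{-1}[\mathrm{\Phi}(\mathbf{y})-{\mathit{\mu}}_{{\mathrm{\Phi}}_{\text{local}}}],$$

where ${\mathit{\mu}}_{{\mathrm{\Phi}}_{\text{local}}}$ and ${\mathbf{\Sigma}}_{{\mathrm{\Phi}}_{\text{local}}}$ are the mean vector and covariance matrix of the local background in the feature space.

### 2.2. Proposed Decision-Fusion Detector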

Adaptive anomaly detection is used to detect anomalies whose spectral signatures are different from the local background; the resulting detection performance therefore depends on how "local" is defined. In the dual-window implementation, the pixels between the inner and outer windows are considered the local background, so changing the dual-window sizes changes the detection performance. Note that the purpose of the inner window is to prevent the background from being contaminated by the central pixel when it is a target; thus, the inner window should be slightly larger than the target size. In a completely unknown environment, however, the target size is unknown as well. Inspired by multiclassifier fusion,^{29} this difficulty in choosing an appropriate window setting may be mitigated by detector fusion.

In the proposed decision-fusion approach, the detection outputs for a pixel $\mathbf{y}$ using $m$ detectors with $m$ different windows are expressed as $\{{r}_{i}(\mathbf{y}),i=1,2,\cdots ,m\}$, where ${r}_{i}(\mathbf{y})$ represents the $i$th output using the $i$th pair $({w}_{\mathrm{in}},{w}_{\mathrm{out}})$ via Eq. (1) or Eq. (2). The outputs over an entire image are normalized to the range [0, 1] and compared with a prescribed threshold $\eta $. A pixel is declared an anomaly if its output is larger than $\eta $. The number of times that the pixel $\mathbf{y}$ is declared an anomaly is then counted:

## Eq. (3)

$$N(\mathbf{y})=\{\text{Count}\mid {r}_{i}(\mathbf{y})-\eta >0,\quad i=1,2,\cdots ,m\}.$$

The final class-label decision follows a voting process expressed as

## Eq. (4)

$${D}^{\text{RX-Fusion}}(\mathbf{y})=\left\{\begin{array}{ll}1& \text{if }N(\mathbf{y})\ge t\\ 0& \text{if }N(\mathbf{y})<t\end{array}\right.,$$

where $t$ is the voting threshold, i.e., the minimum number of the $m$ detectors that must declare $\mathbf{y}$ an anomaly.

In MW-RX, for a pixel $\mathbf{y}$, after the RX outputs with multiple dual windows are obtained, the maximum value is taken^{27}

## Eq. (5)

$${r}^{\text{MW-RX}}(\mathbf{y})=\underset{1\le i\le m}{\mathrm{max}}\hspace{0.17em}{r}_{i}(\mathbf{y}).$$

## 3. Experimental Results

### 3.1. Hyperspectral Data

The first dataset we employed is a hyperspectral digital imagery collection experiment (HYDICE) image.^{30} This scene consists of $80\times 100$ pixels covering an urban area. The spatial resolution is approximately 1 m. After removal of the water vapor absorption bands, 175 bands spanning 0.4 to $2.5\,\mu \mathrm{m}$ remain. There are approximately 21 anomalous pixels, representing cars and roof. The scene and the ground-truth map of anomalies are shown in Fig. 1.

The second dataset was acquired by the HyMap airborne hyperspectral imaging sensor,^{31} which provides 126 spectral bands spanning the wavelength interval 0.4 to $2.5\,\mu \mathrm{m}$. The image, covering an area of Cooke City, Montana, was collected on July 4, 2006, with a spatial size of $200\times 800$ pixels. Each pixel has a ground resolution of approximately 3 m. Seven types of targets, including four fabric panel targets and three vehicle targets, were deployed in the region of interest. In our experiment, we crop a subimage of $100\times 300$ pixels that includes all of these targets (anomalies), as depicted in Fig. 2. Figure 3 further illustrates the spectral signatures of the seven targets, which are significantly different from the mean of the background.

### 3.2. Detection Performance

We investigate the effectiveness of the proposed RX-Fusion and KRX-Fusion. For KRX, the commonly used Gaussian radial basis function kernel is adopted.^{21} In this work, the kernel parameter is set to 50 for both datasets according to our experimental study. As for the windows $({w}_{\text{in}},{w}_{\text{out}})$, since anomalies are usually small, we use the general choices listed in Table 1, which include 12 pairs in total. Figure 4 first illustrates the performance with varying window sizes $({w}_{\text{in}},{w}_{\text{out}})$ using the HYDICE urban data. The receiver-operating-characteristic (ROC) curve is employed to quantitatively evaluate the detection ability. The results clearly show that the performance of the detector changes significantly with $({w}_{\text{in}},{w}_{\text{out}})$ and deteriorates if an inappropriate window is chosen, which motivates us to design a window-independent detector. The proposed RX-Fusion and KRX-Fusion, based on the decision-fusion strategy, adopt multiple windows simultaneously and produce the final decision map via a voting process.

## Table 1

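General choices for sizes of windows $({w}_{\text{in}},{w}_{\text{out}})$. Each row lists one ${w}_{\text{in}}$ and the three ${w}_{\text{out}}$ values paired with it.

${w}_{\text{in}}$ | ${w}_{\text{out}}$ | ${w}_{\text{out}}$ | ${w}_{\text{out}}$ |
---|---|---|---|
3 | 5 | 7 | 9 |
5 | 7 | 9 | 11 |
7 | 9 | 11 | 13 |
9 | 11 | 13 | 15 |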
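
The 12 pairs in Table 1 follow a regular pattern, and combining them with $s={w}_{\mathrm{out}}\times {w}_{\mathrm{out}}-{w}_{\mathrm{in}}\times {w}_{\mathrm{in}}$ from Section 2.1 shows how many background samples each pair provides relative to the number of bands. The short sketch below uses illustrative variable names and ignores image-border effects.

```python
# Window pairs of Table 1: each w_in is combined with w_out = w_in + 2, + 4, + 6.
pairs = [(w_in, w_in + k) for w_in in (3, 5, 7, 9) for k in (2, 4, 6)]

# Number of background samples in the outer region, s = w_out^2 - w_in^2.
samples = {pair: pair[1] ** 2 - pair[0] ** 2 for pair in pairs}

# Band counts quoted in Section 3.1 for the two scenes.
bands = {"HYDICE": 175, "HyMap": 126}
for name, d in bands.items():
    short = [p for p in pairs if samples[p] < d]
    print(f"{name}: {len(short)} of {len(pairs)} pairs give fewer samples than bands")
```

Under these assumptions the largest outer region, from the pair (9, 15), holds 144 samples, so almost every pair falls in the rank-deficient regime discussed in the Introduction.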

Figures 5 and 6 illustrate the area under the ROC curve (AUC) of RX, KRX, RX-Fusion, and KRX-Fusion. In Fig. 5(a), the best $({w}_{\text{in}},{w}_{\text{out}})$ for both RX and KRX is (7, 9); moreover, we observe that the AUC of RX and KRX is sensitive to the choice of window sizes, which is consistent with the behavior in Fig. 4. In Fig. 5(b), the optimal $t$ values (out of 12) for RX-Fusion and KRX-Fusion are 5 and 4, respectively. Note that when $t=6$, the performance of RX-Fusion and KRX-Fusion is very similar to their best, which is in turn close to the case with the best window settings as shown in Fig. 5(b). In Fig. 6, for the HyMap data, the best $({w}_{\text{in}},{w}_{\text{out}})$ for both RX and KRX is (7, 11), and the best $t$ values for RX-Fusion and KRX-Fusion are 9 and 8, respectively. In Fig. 6(b), if $t=6$, the performance of both RX-Fusion and KRX-Fusion is slightly worse than their best, but much better than the cases with inappropriate window sizes shown in Fig. 6(a).

Under the best parameters, Figs. 7 and 8 illustrate the ROC performance of the proposed RX-Fusion and KRX-Fusion compared with RX, KRX, MW-RX, and MW-KRX. For better visualization, we separate the cases of RX-Fusion and KRX-Fusion. The results show that the proposed RX-Fusion is always superior to RX and MW-RX, and the proposed KRX-Fusion outperforms KRX and MW-KRX. For the HYDICE urban data, MW-KRX performs better than KRX; however, this is not true for the HyMap data. To further investigate the detection performance on the HYDICE urban data, Fig. 9 illustrates the detection maps when the false-alarm probability ${P}_{f}$ is fixed to a small value (e.g., 0.005) and the corresponding maximum detection probability ${P}_{d}$ is reported. The proposed RX-Fusion and KRX-Fusion still perform the best, with the largest ${P}_{d}$, which is consistent with the results in Fig. 7.
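
The ROC quantities used in this section, namely the $({P}_{f},{P}_{d})$ curve, the AUC, and the largest ${P}_{d}$ at a fixed ${P}_{f}$, can be computed from a detector output and a ground-truth map along the following lines. This is a generic sketch with hypothetical function names; tied scores are treated as separate thresholds, which is a simplifying assumption.

```python
import numpy as np

def roc_curve(scores, truth):
    """Sweep the threshold over the sorted scores; return (P_f, P_d) arrays."""
    scores = np.asarray(scores, dtype=float).ravel()
    truth = np.asarray(truth, dtype=bool).ravel()
    order = np.argsort(-scores, kind="stable")      # highest score first
    hits = truth[order]
    p_d = np.concatenate(([0.0], np.cumsum(hits) / hits.sum()))
    p_f = np.concatenate(([0.0], np.cumsum(~hits) / (~hits).sum()))
    return p_f, p_d

def auc(p_f, p_d):
    """Trapezoidal area under the ROC curve."""
    return float(np.sum(np.diff(p_f) * (p_d[1:] + p_d[:-1]) / 2.0))

def pd_at_pf(p_f, p_d, max_pf):
    """Largest detection rate among operating points with P_f <= max_pf."""
    return float(p_d[p_f <= max_pf].max())

# Toy example: six pixels, two of which are true anomalies.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
truth = [1, 0, 1, 0, 0, 0]
p_f, p_d = roc_curve(scores, truth)
print(auc(p_f, p_d))             # 0.875
print(pd_at_pf(p_f, p_d, 0.0))   # 0.5
```

In the toy example, seven of the eight anomaly/background pairs are ranked correctly, which is what the AUC of 0.875 expresses.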

Table 2 further summarizes the AUC performance. From the AUC values shown in Figs. 5 and 6, we can see that although the performances of the suboptimal RX-Fusion and KRX-Fusion (i.e., $t=6$ when $m=12$) are slightly worse than those of the best RX and KRX (whose window settings are unknown in practice), respectively, they are much better than the worst and average performances of RX and KRX. In practice, this means that $t$ can be empirically set to 50% of the total number of detectors; in other words, if at least half of the detectors declare a pixel to be an anomaly, then it is an anomaly in the final decision.
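
The voting rule of Eqs. (3) and (4), with $t$ set to half the number of detectors as suggested above, can be sketched in a few lines, together with the per-pixel maximum of Eq. (5) for comparison. The function names and the toy score maps are hypothetical; any set of per-window RX or KRX output maps could be supplied instead.

```python
import numpy as np

def normalize(score_map):
    """Rescale one detector's output over the whole image to [0, 1]."""
    lo, hi = score_map.min(), score_map.max()
    return (score_map - lo) / (hi - lo) if hi > lo else np.zeros_like(score_map)

def vote_fusion(score_maps, eta, t):
    """Eqs. (3)-(4): count detectors exceeding eta, then flag pixels with N >= t."""
    votes = sum((normalize(r) > eta).astype(int) for r in score_maps)
    return votes, (votes >= t).astype(int)

def mw_max(score_maps):
    """Eq. (5): MW-RX keeps the per-pixel maximum over all windows."""
    return np.max(np.stack(score_maps), axis=0)

# Toy example with m = 4 hypothetical detector outputs on a 2 x 2 image.
maps = [np.array([[0.0, 1.0], [0.2, 0.9]]),
        np.array([[0.1, 0.8], [0.0, 1.0]]),
        np.array([[0.0, 1.0], [0.9, 0.1]]),
        np.array([[1.0, 0.0], [0.1, 0.2]])]
votes, decision = vote_fusion(maps, eta=0.5, t=len(maps) // 2)
print(votes.tolist())      # [[1, 3], [1, 2]]
print(decision.tolist())   # [[0, 1], [0, 1]]
```

In this toy case the maximum rule of Eq. (5) returns a high value at every pixel, because each pixel is scored highly by at least one window, whereas the vote suppresses the two pixels that only a single detector flags.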

## Table 2

Area under the ROC curve (AUC) for several anomaly detectors on the two experimental datasets.

Detector | HYDICE | HyMap |
---|---|---|
RX (best) | 0.9964 | 0.7304 |
RX (worst) | 0.9030 | 0.5857 |
RX (average) | 0.9512 | 0.6665 |
MW-RX | 0.9944 | 0.6243 |
RX-Fusion (best) | 0.9973 | 0.7343 |
RX-Fusion (suboptimal) | 0.9953 | 0.7024 |
KRX (best) | 0.9968 | 0.8694 |
KRX (worst) | 0.9079 | 0.5876 |
KRX (average) | 0.9516 | 0.7622 |
MW-KRX | 0.9974 | 0.7661 |
KRX-Fusion (best) | 0.9976 | 0.8738 |
KRX-Fusion (suboptimal) | 0.9959 | 0.8638 |

## 4. Conclusions

In this work, we proposed an effective decision-fusion strategy for dual-window-based anomaly detection in HSI. For each test pixel, the detection outputs of a detector with multiple windows were first obtained, and the final detection was then reached through a voting process. Experimental results on two hyperspectral datasets demonstrated that the proposed RX-Fusion/KRX-Fusion outperformed the existing RX, KRX, MW-RX, and MW-KRX. Although the final decision depends on a voting parameter, we find that 50% voting yields a suboptimal but close-to-optimal performance, which is significantly better than that of a single detector with poorly chosen window settings. The base detector works in the manner of a spatial convolution with a sliding dual window, which is suitable for parallel computing,^{32}^{,}^{33} because the output of one pixel is independent of the output of another. In the proposed decision-fusion framework, the multiple dual windows can also be processed simultaneously, which will be investigated in future work.

## Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant No. NSFC-61302164 and in part by the Fundamental Research Funds for the Central Universities under Grant No. YS-1404.

## References

## Biography

**Wei Li** received his PhD degree in electrical and computer engineering from Mississippi State University, Starkville, in 2012. Subsequently, he spent 1 year as a postdoctoral researcher at the University of California, Davis. Currently, he is with the College of Information Science and Technology at Beijing University of Chemical Technology, Beijing, China. His research interests include statistical pattern recognition, hyperspectral image analysis, and data compression.

**Qian Du** received her PhD degree in electrical engineering from the University of Maryland Baltimore County, Baltimore, Maryland, in 2000. Currently, she is the Bobby Shackouls professor with the Department of Electrical and Computer Engineering at Mississippi State University, Mississippi. Her research interests include hyperspectral remote sensing image analysis, pattern classification, data compression, and neural networks. She serves as an associate editor for the *Journal of Applied Remote Sensing*. She is a fellow of SPIE.