Regularized graph-embedded covariance discriminative learning for image set classification
Hengliang Tan, Ying Gao, Jiao Du, Shuo Yang
Open Access | 4 August 2020
Abstract

Riemannian manifolds have attracted an increasing amount of attention for visual classification tasks, especially for video or image set classification. Covariance matrices are the natural second-order statistics of image sets. Nonsingular covariance matrices, known as symmetric positive definite (SPD) matrices, lie on a non-Euclidean Riemannian manifold (the SPD manifold). Covariance discriminative learning (CDL) is an effective discriminative learning method that operates on the Riemannian manifold through the SPD kernel space. However, in practice, the discriminative learning of CDL often suffers from poor generalization and overfitting caused by a finite number of training samples and noise corruption. Hence, we propose to address these problems by introducing eigenspectrum regularization and a graph-embedded framework. Discriminative learning on the SPD manifold is generalized by the graph-embedded framework, which is combined with eigenspectrum regularization in the SPD kernel space. Three local Laplacian graphs of the graph-embedded framework and two eigenspectrum regularization models are incorporated into the proposed method. A comprehensive mathematical derivation of the proposed method is given using the "kernel trick." Experimental results on set-based face recognition and object categorization tasks demonstrate the effectiveness of the proposed method.

1.

Introduction

The development of intelligent video surveillance, social networks, and electronic commerce enables a probe image set to be matched against all gallery image sets, which becomes an image set classification task.1 Image sets can be extracted from videos or albums. Each probe image set and gallery image set contains multiple images that belong to the same class, which allows the extraction of considerably more discriminative information than is possible in the traditional single-image classification task.2 Image set classification has achieved widespread success in face recognition3–6 and object categorization.7–11

Recently, many studies have indicated that numerous visual features lie on a Riemannian manifold.12 The subspaces of image sets form the Grassmann manifold, the symmetric positive definite (SPD) matrices form the SPD manifold,13 and two-dimensional shapes lie on Kendall shape spaces.14 Using the Riemannian manifold to model image sets and build the corresponding classifiers has become popular in recent years.15 The subspace and the covariance matrix are two typical representations for modeling image sets on the Riemannian manifold: the subspaces of image sets form the Grassmann manifold,8 and the nonsingular covariance matrices form the SPD manifold.10 The linear subspace is a popular choice for modeling image sets due to its excellent accommodation of image variations. Hence, the Grassmann manifold formed by subspaces is widely used for image set classification. However, linear subspace-based modeling has the limitation that it incorporates only relatively weak information (such as the subspace angles) about the location and boundary of the samples in the input space.10 The second-order statistic, the (nonsingular) covariance matrix of an image set, forms the SPD manifold and characterizes the set structure more faithfully.10 Many studies10,12,16 have shown the effectiveness of the SPD manifold for image set classification and that the covariance descriptor is robust to noise and illumination variations.

Covariance discriminative learning (CDL)10 is one of the most representative methods that use the covariance descriptor for image set classification. The covariance descriptor provides a natural representation for an image set and makes no assumption about the set data distribution. Hence, it characterizes the set structure more faithfully, and the representation possesses stronger resistance to outliers.10 The SPD manifold formed by covariance matrices is mapped to a high-dimensional reproducing kernel Hilbert space (RKHS), where Euclidean geometry applies. Subsequently, linear discriminant analysis (LDA)17 is applied to perform discriminative learning with the "kernel trick," which is known as kernel discriminant analysis.18 CDL has achieved considerable results on set-based face recognition and object categorization tasks.

In this work, we focus on the discriminative learning problem of the SPD manifold in the mapped RKHS. Due to the conventional problems of linear discriminative learning, such as the singularity of the within-class scatter matrix and the instability of its inverse caused by the finite number of training samples,17 CDL may suffer from overfitting and poor generalization, since these problems may also occur during discriminative learning in the kernel space.19 To address the conventional problems of LDA, numerous approaches, such as Fisherface LDA,17 direct LDA,20 and null space LDA21 on the linear Euclidean space, have been proposed. For the conventional problems in the kernel space, kernel methods such as kernel Fisherface LDA,22 null space kernel LDA,19 and kernel direct LDA23 exist. However, these approaches usually discard a subspace (either the principal space or the null space) to circumvent the singularity before discriminant learning, which causes a loss of discriminative information.24 Although dual-subspace LDA25 considers the contributions of both subspaces, the associated average scaling factor may not be a suitable choice for information in the principal subspace. To address these problems, the eigenfeature regularization and extraction (ERE)24 and complete discriminant evaluation and feature extraction (CDEFE)26 approaches were proposed for the linear flat space and the nonlinear kernel Euclidean space, respectively. ERE retains the entire eigenspace of the within-class scatter matrix SW for discriminant analysis and regularizes it with an eigenspectrum regularization weighting function. The entire eigenspace is partitioned into three parts according to a median operation, and three different strategies according to the eigenspectrum of SW are devised for regularization.24 CDEFE tackles these problems in the kernel space by nonlinear mapping; it decomposes the kernel within-class variation matrix into principal and noise-dominated subspaces. A weighting function based on the ratios of successive eigenvalues of the eigenspectrum was proposed to circumvent the undue scaling of projection vectors.26 The prediction of eigenvalues for discriminative vectors27 combined the eigenspectrum regularization models of ERE and CDEFE. Recently, regularized locality preserving discriminant embedding28 and locality regularization embedding (LRE)29 were proposed; these methods generalized the eigenfeature extraction of ERE by the graph-embedded framework to better preserve data locality. An adaptive locality preserving regulation model was devised for eigenspectrum regularization. The experimental results have demonstrated the effectiveness of these eigenspectrum regularization techniques.29

Inspired by eigenspectrum regularization, in this work, we aim to address the conventional problems of CDL in discriminative learning by exploiting the eigenspectrum regularization with the graph-embedded framework in the RKHS, which is mapped from the SPD manifold. We refer to the proposed method as regularized graph-embedded covariance discriminative learning (RGCDL). Figure 1 shows the conceptual illustration of the proposed method. The main contributions of this paper are presented as follows.

  • 1. We circumvent the instability, overfitting, and poor generalization of CDL10 with a kernel eigenspectrum regularization architecture. The input elements on the high-dimensional SPD manifold are reduced to a lower-dimensional SPD manifold by principal component analysis (PCA).

  • 2. We incorporate the graph-embedded framework with three local Laplacian graphs into the Riemannian kernel eigenspectrum regularization architecture to better preserve data locality. We give a systematic derivation of the graph-embedded framework that incorporates the eigenspectrum regularization in the Riemannian kernel space.

  • 3. We evaluate the advantages of the proposed method with two eigenspectrum regularization models on face recognition and object categorization tasks. The experimental results show the stability of the extracted features, robustness to noise-corrupted data, and higher classification rates compared with numerous set-based classification methods.

Fig. 1

Conceptual illustration of the proposed RGCDL. Image sets of subjects A and B can be described by refined covariance matrices, and the lower-dimensional SPD manifold is formed. Then, the eigenspectrum regularization models of ERE and CDEFE are incorporated with the graph-embedded framework for discriminative learning on the mapped RKHS, where Euclidean geometry applies. The eigenspectrum regularization circumvents the instability, overfitting, or poor generalization of discriminative learning, and the graph-embedded framework retains the local properties and increases the discriminatory power between classes.


The rest of this paper is organized as follows. We present the works related to image set classification according to the image set representations in Sec. 2. The original CDL method and the architecture of eigenspectrum regularization are introduced in Sec. 3. Then, the RGCDL approach is presented in Sec. 4. Experimental evaluation and discussions are presented in Sec. 5. Finally, Sec. 6 concludes this paper.

2.

Related Work

In this paper, we aim to use the proposed method to solve the image set classification task. The major issues of image set classification concern how to represent an image set and how to measure the distance or similarity between two sets.5 Various techniques have been proposed to represent an image set, such as the statistical distribution,30,31 affine/convex hull model,4 sparse representation,5 subspace,7,32 and covariance matrix.10,12

The methods30,31 that model each image set by a statistical distribution are among the earliest approaches employed for image set classification. They measure the similarities between pairs of distributions of two sets and achieve considerable results. However, if the set data have no strong statistical correlations for parameter estimation, these methods often fail to work.5 The most representative affine/convex hull-based methods are the affine/convex hull-based image set distances (AHISD/CHISD);4 AHISD/CHISD represent images as points in a linear or affine feature space and compute the distance between the convex geometric regions spanned by the feature points. Hu et al.5 incorporated sparse representation to regularize the affine hull model. Zhu et al.6 employed the collaborative representation technique to utilize the discrimination information between gallery sets. The affine/convex hull approaches essentially aim to find the synthetic nearest points between image sets.11 However, these hull models usually cannot handle the complex appearance variations caused by multiple views and extreme illumination.

The subspace is a popular and effective approach for modeling image sets. The mutual subspace method (MSM)32 is one of the earliest classic subspace-based methods for image set classification. MSM models all image sets by linear subspaces, and the similarity between pairs of subspaces is measured by canonical correlation analysis (CCA).33 Fukui and Yamaguchi34 and Fukui and Maki35 projected the linear subspaces to a "difference subspace," which can extract the disparity between two subspaces. Kim et al.7 incorporated discriminative learning into subspace-based set classification according to canonical correlations (DCC). DCC attempts to obtain a linear transformation that maximizes the canonical correlations of within-class subspaces and minimizes the canonical correlations of between-class subspaces. Arandjelovic36 proposed extended CCA (ECCA), which extracts the most similar models of variability within two sets, and exploited a discriminative learning architecture to train a classifier (DECCA).

Subspaces can also be treated as points that lie on a special type of Riemannian manifold, known as the Grassmann manifold. The method in Ref. 3 represents an image set as multiple local linear subspaces and treats them as points on the Grassmann manifold; then, the manifold-to-manifold distance (MMD) is defined between the two manifolds of two image sets. Manifold discriminant analysis37 was proposed to learn an embedding space by maximizing the manifold margin of the MMD. The Grassmann manifold can also be mapped to an RKHS, where Euclidean geometry applies; Grassmann discriminant analysis (GDA)8 implements LDA on the mapped RKHS by the Grassmannian kernel. GDA was generalized to kernel GDA (KGDA) using Gaussian kernel principal subspaces.38 Graph-embedding Grassmann discriminant analysis (GGDA)9 is another counterpart to the GDA method; it exploits the graph-embedded framework to implement discriminant analysis on the mapped RKHS. Grassmann nearest points (GNP)11 finds the nearest Grassmann points on the mapped vector space using the affine hull. More recently, regularized Grassmann discriminant analysis (RGDA)2 was proposed to circumvent the conventional problems of LDA when the training sets are insufficient. However, as previously mentioned, the linear subspace-based methods have the limitation of using weak information to measure the similarity.10

Modeling visual features as covariance matrices for visual classification has become popular in recent years10,12,39 since the nonsingular covariance matrix (also known as an SPD matrix) forms a special Riemannian manifold, referred to as the SPD manifold.12 Earlier studies employed covariance matrices to characterize local regions within an image, which is named the region covariance.39 Different from the region covariance descriptor, CDL is the key method that models the whole image set by the covariance descriptor to address image set classification on the SPD manifold. Huang et al.40 proposed log-Euclidean metric learning to learn a tangent mapping from the original tangent space of the SPD manifold to a new discriminative space. Tan and Gao16 proposed a patch-based principal covariance discriminative learning (PPCDL) method, in which the image set is partitioned into several local maximum linear patches by a hierarchical divisive clustering method, the local patches are modeled by covariance matrices, and the final discriminative learning is similar to CDL. Discriminant analysis on the Riemannian manifold of Gaussian distributions (DARG)41 models the image set with a Gaussian mixture model (GMM) and derives a series of kernels for Gaussian discriminative learning on the SPD manifold. Symmetric positive definite manifold learning12 learns an orthonormal projection from the high-dimensional SPD manifold to a low-dimensional, more discriminative manifold.

3.

Preliminaries

In this section, we first review the theory of CDL10 and then present the architecture of eigenspectrum regularization according to LRE.29

3.1.

Covariance Discriminative Learning

CDL uses a natural methodology to characterize image sets by the covariance descriptor. Let $X = [x_1, x_2, \ldots, x_n]$ denote the data matrix of an image set with $n$ image vectors, where $x_i \in \mathbb{R}^D$ lies in the $D$-dimensional vector space. The covariance descriptor can be expressed as

Eq. (1)

B = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})^T,
where $\bar{x}$ denotes the mean of the image vectors in $X$. The covariance matrix $B$ represents one image set, and it is rather simple to derive and compute. It is worth noting that, due to the high dimensionality of visual features and the insufficient samples within a set, the covariance matrix of an image set is usually singular (when the number of image samples is less than the dimension of the vector space). A simple way to circumvent this problem is to introduce a small perturbation to the covariance matrix.10 This perturbation can be denoted as $B^* = B + \eta I$, where $I$ is the identity matrix and $\eta$ is a scaling parameter. Hence, the nonsingular covariance matrix becomes a $D \times D$ SPD matrix in $\mathrm{sym}_D^+$, which is an element of the Riemannian manifold. In the following, we still use $B$ to denote the nonsingular covariance matrix for simplicity. After modeling the image sets as multiple SPD matrices, CDL explores a Riemannian kernel that is induced by a Riemannian metric, such as the log-Euclidean distance (LED),42 to map $\mathrm{sym}_D^+$ to a Euclidean space. The LED Riemannian metric defines a true geodesic distance on the Riemannian manifold, as it is induced by a positive definite kernel,42 and the manifold structure can be preserved as much as possible. The LED metric is defined as

Eq. (2)

d_{\mathrm{LED}}(B_1, B_2) = \|\log(B_1) - \log(B_2)\|_F,
where $\|\cdot\|_F$ is the matrix Frobenius norm and $\log(\cdot)$ denotes the principal matrix logarithm operation. The eigendecomposition of an SPD matrix $B$ is given by $B = U \Sigma U^T$, from which the principal matrix logarithm of $B$ can be computed as

Eq. (3)

\log(B) = U \log(\Sigma) U^T,
where $\log(\Sigma)$ is easily calculated using the logarithms of the eigenvalues in the diagonal matrix $\Sigma$. CDL implements image set classification in an extrinsic manner by first mapping the Riemannian manifold to a Euclidean space. The mapping induced by the LED metric can be defined as $\phi: \mathcal{M} \rightarrow \mathcal{H}$, where $\mathcal{M}$ denotes the manifold spanned by the SPD matrices and the vector space $\mathcal{H}$ is the inner product space of the RKHS, which can be viewed as a Euclidean space. Subsequently, the kernel function induced by the LED metric, $k_{\log}: (\mathcal{M} \times \mathcal{M}) \rightarrow \mathbb{R}$, is used to define the inner product on the RKHS. For two $\mathrm{sym}_D^+$ matrices $B_1$ and $B_2$, the LED Riemannian kernel function can be formulated as

Eq. (4)

k_{\log}(B_1, B_2) = \mathrm{tr}[\log(B_1)\log(B_2)].

The kernel function $k_{\log}$ is shown to be an SPD kernel10,13 that obeys Mercer's theorem.43 Therefore, the manifold structure can be preserved by the LED Riemannian kernel.
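As a concrete reference for Eqs. (1)–(4), the following is a minimal NumPy sketch of the perturbed covariance descriptor, the principal matrix logarithm, the LED distance, and the LED kernel. The image-per-column data layout, the function names, and the default perturbation value $\eta = 10^{-3}$ are illustrative assumptions, not values prescribed by CDL.

```python
import numpy as np

def covariance_descriptor(X, eta=1e-3):
    """Model an image set X (D x n, one image per column) as the covariance
    descriptor of Eq. (1), perturbed by eta*I so that it is strictly SPD."""
    B = np.cov(X, rowvar=True, bias=False)   # (1/(n-1)) * sum_i (x_i - mean)(x_i - mean)^T
    return B + eta * np.eye(B.shape[0])

def logm_spd(B):
    """Principal matrix logarithm of an SPD matrix via eigendecomposition [Eq. (3)]."""
    w, U = np.linalg.eigh(B)
    return (U * np.log(w)) @ U.T

def led_distance(B1, B2):
    """Log-Euclidean distance of Eq. (2)."""
    return np.linalg.norm(logm_spd(B1) - logm_spd(B2), ord='fro')

def k_log(B1, B2):
    """LED Riemannian kernel k_log(B1, B2) = tr[log(B1) log(B2)] of Eq. (4)."""
    return np.trace(logm_spd(B1) @ logm_spd(B2))
```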

The explicit kernel feature mapping allows the application of any standard vector space learning algorithm. The discriminative learning of CDL is conducted by kernel LDA18 with the kernel trick. The mapping of the Riemannian manifold to a Euclidean space is defined by the function $\phi(\cdot)$. Therefore, if $L$ points of the specified Riemannian manifold are spanned by the $\mathrm{sym}_D^+$ matrices $\{B_1, B_2, \ldots, B_L\}$, the mapped feature points in the Euclidean space can be denoted as $\{\phi(B_1), \phi(B_2), \ldots, \phi(B_L)\}$. With the inner product $\langle \phi(B_i), \phi(B_j) \rangle = k_{\log}(B_i, B_j)$, CDL seeks to solve the following optimization:10

Eq. (5)

\alpha_{\mathrm{opt}} = \arg\max_{\alpha} \frac{\alpha^T K W K \alpha}{\alpha^T K K \alpha},
where $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_L]^T$, $K$ is the kernel Gram matrix with elements $K_{ij} = k_{\log}(B_i, B_j)$, and $W$ is the connection matrix with elements

Eq. (6)

W_{ij} = \begin{cases} \frac{1}{N_c}, & \text{if } B_i \in C_c \text{ and } B_j \in C_c \\ 0, & \text{otherwise,} \end{cases}
where $N_c$ is the number of sets in the $c$'th class, which we denote as $C_c$ in this paper. Here, $B_i \in C_c$ indicates that the label of $B_i$ belongs to class $C_c$. The optimal projection matrix is given by the largest $C-1$ eigenvectors ($C$ is the number of training classes) obtained by solving the eigenproblem $KWK\alpha = \lambda KK\alpha$, and is denoted as $A = [\alpha_1, \alpha_2, \ldots, \alpha_{C-1}]$. Finally, for a given testing $\mathrm{sym}_D^+$ matrix $B_{\mathrm{te}} \in \mathcal{M}$ in the input manifold space, the projected feature $z_{\mathrm{te}}$ in the new discriminant Euclidean subspace can be obtained by

Eq. (7)

z_{\mathrm{te}} = A^T K_{\mathrm{te}}, \quad K_{\mathrm{te}} = [k_{\log}(B_1, B_{\mathrm{te}}), \ldots, k_{\log}(B_L, B_{\mathrm{te}})]^T.
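For illustration, a possible NumPy/SciPy sketch of the CDL discriminative learning step [Eqs. (5)–(7)] is given below. The small ridge added to $KK$ for numerical stability and the helper names are our own assumptions; the original method simply solves the generalized eigenproblem above.

```python
import numpy as np
from scipy.linalg import eigh

def cdl_connection_matrix(labels):
    """Connection matrix W of Eq. (6): W_ij = 1/N_c for same-class pairs, 0 otherwise."""
    labels = np.asarray(labels)
    W = np.zeros((len(labels), len(labels)))
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        W[np.ix_(idx, idx)] = 1.0 / len(idx)
    return W

def cdl_train(K, labels, n_classes, ridge=1e-8):
    """Solve the CDL objective of Eq. (5) as the generalized eigenproblem
    K W K alpha = lambda K K alpha and keep the largest C-1 eigenvectors."""
    W = cdl_connection_matrix(labels)
    L = K.shape[0]
    evals, evecs = eigh(K @ W @ K, K @ K + ridge * np.eye(L))  # ascending eigenvalues
    return evecs[:, ::-1][:, :n_classes - 1]                   # projection matrix A, Eq. (7)

def cdl_project(A, K_te):
    """Project a test kernel vector K_te = [k_log(B_1, B_te), ..., k_log(B_L, B_te)]^T [Eq. (7)]."""
    return A.T @ K_te
```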

3.2.

Eigenspectrum Regularization Technique

Eigenspectrum regularization24,26,29 was originally proposed to address the conventional problems of LDA (the problems caused by the singularity of $S_W$ and the numerical instability of its inverse) in the linear Euclidean space. In this section, we introduce LRE29 as an instance since it is the prototype of the proposed method in this paper.

Consider $n$ samples of training data $X = [x_1, x_2, \ldots, x_n]$ with $\{x_i \in \mathbb{R}^D \,|\, i = 1, 2, \ldots, n\}$. In LRE, the intrinsic data structure is modeled to regularize the directions of the data locality $X L_{\mathrm{loc}} X^T$.29 The eigenspectrum and directions can be obtained by decomposing

Eq. (8)

\lambda = V^T X L_{\mathrm{loc}} X^T V,
where $L_{\mathrm{loc}}$ is the local Laplacian matrix, which manifests the manifold through local geometry preservation,29 and $\lambda$ is a diagonal matrix whose diagonal elements are the eigenvalues $[\lambda_1, \lambda_2, \ldots, \lambda_D]$ in descending order. The plot of the eigenvalues $\lambda_k$ against the index $k$ is referred to as the eigenspectrum. $V = [v_1, \ldots, v_D]$ contains the eigenvectors (directions) of the locality preserving matrix $X L_{\mathrm{loc}} X^T$ corresponding to $\lambda$. The locality preserving matrix $X L_{\mathrm{loc}} X^T$ has been shown to be exactly equal to the within-class scatter matrix when equal weights are placed on the edges of adjacent data pairs in $L_{\mathrm{loc}}$.28

LRE decomposes the entire eigenspace $V$ into two subspaces: (1) the disparity subspace $V_{\mathrm{disparity}} = [v_1, v_2, \ldots, v_q]$, which corresponds to lower locality preservation, and (2) the principal subspace $V_{\mathrm{principal}} = [v_{q+1}, v_{q+2}, \ldots, v_D]$ for higher locality preservation. LRE indicates that the first few eigenvectors of the eigenspace correspond to large eigenvalues that provide lower locality preserving capability, whereas the eigenvectors that correspond to smaller eigenvalues provide higher locality preserving capability. Hence, larger weights are imposed on the subspace with higher locality preservation, whereas smaller weights are assigned to the subspace with lower locality preservation. A method is devised that determines a "fence" to separate the disparity subspace from the principal subspace and then regularizes the two subspaces according to an adaptive eigenspectrum regularization model. The fence is defined by a split point on the eigenspectrum $\lambda_{\mathrm{disparity}} = \gamma(Q_3 + 1.5 \times \mathrm{IQR})$, where $Q_3$ is the third quartile of the eigenspectrum $\lambda$ (the value below which 75% of the eigenvalues fall) and $\gamma$ is a parameter for adaptively scaling the separating value. The interquartile range is defined as $\mathrm{IQR} = Q_3 - Q_1$, where $Q_1$ is the first quartile.

This adaptive eigenspectrum regularization model finds the $q$'th split eigenvalue that satisfies $\lambda_q = \max\{\lambda_i \,|\, \lambda_i \le \lambda_{\mathrm{disparity}}\}$. The piecewise regularization function of LRE is defined as

Eq. (9)

w_k^{\mathrm{LRE}} = \begin{cases} \lambda_k^{-1/2}, & 1 \le k \le q \\ \lambda_q^{-1/2}, & q < k \le D. \end{cases}

The regularization function is imposed on the corresponding eigenvectors to form a full-dimensional transformation matrix

Eq. (10)

\tilde{V} = [w_k^{\mathrm{LRE}} v_k]_{k=1}^{D}.

Then, LRE can obtain a more localized feature by transforming the original training data

Eq. (11)

\tilde{X} = \tilde{V}^T X.

Note that no dimensionality reduction occurs in this transformation; the information of the original training data is preserved as much as possible.

In the subsequent step of LRE, feature extraction and dimensional reduction are performed on the regularized and more compact data. To further preserve the within-locality and between-locality power, a similarity weight matrix $G^{\mathrm{LRE}}$ is utilized. Its elements are defined as

Eq. (12)

G_{ij}^{\mathrm{LRE}} = \begin{cases} \frac{1}{n_c} - \frac{1}{n}, & \text{if } x_i \in C_c \text{ and } x_j \in C_c \\ -\frac{1}{n}, & \text{otherwise,} \end{cases}
where $n_c$ is the number of samples in the $c$'th class. The within-locality graph edges are weighted with positive-valued coefficients that quantify the intraclass similarity, whereas the between-locality graph edges are weighted with negative-valued coefficients that characterize discriminative features among different class samples.29 The final objective function of LRE is defined as

Eq. (13)

U^* = \arg\max_{U^T U = I} U^T \tilde{X} G^{\mathrm{LRE}} \tilde{X}^T U.

This problem can be easily solved by converting it to a generalized eigenvalue problem $\tilde{X} G^{\mathrm{LRE}} \tilde{X}^T u_i = \varphi_i u_i$. By retaining the $d$ eigenvectors $U = [u_1, u_2, \ldots, u_d]$ ($d \le D$) that correspond to the largest $d$ eigenvalues, the projection matrix $Z^{\mathrm{LRE}} = \tilde{V} U$ is used for the final lower-dimensional eigenfeature extraction.
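To summarize the LRE regularization step numerically, a small NumPy sketch follows. The function name, the numerical floor on the eigenvalues, and the fallback when no eigenvalue falls below the fence are our own illustrative choices under the definitions of Eqs. (8)–(11).

```python
import numpy as np

def lre_weights(lam, gamma=1.0):
    """Adaptive LRE regularization weights of Eq. (9); lam holds the eigenvalues of
    X L_loc X^T [Eq. (8)] in descending order.  The fence gamma*(Q3 + 1.5*IQR)
    determines the q'th split eigenvalue."""
    q1, q3 = np.percentile(lam, [25, 75])
    fence = gamma * (q3 + 1.5 * (q3 - q1))
    below = lam[lam <= fence]
    lam_q = below.max() if below.size else lam[0]   # lambda_q = max{lambda_i <= fence}
    w = lam ** -0.5                                 # 1 <= k <= q
    w[lam < lam_q] = lam_q ** -0.5                  # q < k <= D: flatten to lambda_q^{-1/2}
    return w

# Usage sketch (X is D x n with one sample per column, L_loc a local Laplacian):
#   lam, V = np.linalg.eigh(X @ L_loc @ X.T)            # Eq. (8)
#   lam, V = lam[::-1], V[:, ::-1]                      # descending order
#   V_tilde = V * lre_weights(np.maximum(lam, 1e-12))   # Eq. (10)
#   X_tilde = V_tilde.T @ X                             # Eq. (11)
```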

4.

Proposed Method

In this section, the proposed RGCDL is presented. To incorporate the eigenspectrum regularization and graph-embedded framework with the SPD manifold in the kernel space, the algorithm of RGCDL is quite different from the original CDL algorithm. Generally, our RGCDL algorithm comprises two main steps. The first step is eigenspectrum regularization, and the second step is feature extraction and dimensional reduction.

4.1.

Representation of SPD Manifold

First, according to Harandi et al.12 and Tan and Gao,16 the computational cost of the Riemannian kernel with a high-dimensional SPD matrix is quite high. Several strategies are available to lower the dimensionality of the SPD matrix and reduce the computational cost of constructing the Riemannian kernel matrix.12 Here, we combine all the training data of the different training sets to collaboratively produce the dimensional reduction projection matrix by PCA.

Consider $L$ training image sets $\chi = \{X_i\}_{i=1}^{L}$, where each set contains $L_i$ images $X_i = [x_1, x_2, \ldots, x_{L_i}]$. We combine all images of all sets to build a sample data collection

Eq. (14)

\chi = \{x_j\}_{j=1}^{N}, \quad N = \sum_{i=1}^{L} L_i.

The dimensional reduction projection matrix can be obtained by decomposing the following sample covariance matrix:

Eq. (15)

\Pi = \frac{1}{N}\sum_{j=1}^{N}(x_j - \bar{x})(x_j - \bar{x})^T,
where $\bar{x}$ is the sample mean. We select the $d_1$ ($d_1 \ll D$) orthonormal eigenvectors that correspond to the $d_1$ largest eigenvalues of $\Pi$ to form the dimensional reduction projection matrix $\Gamma$. All images in each set are transformed to a low-dimensional feature space, and the $j$'th sample in the low-dimensional feature space is calculated as

Eq. (16)

y_j = \Gamma x_j.

This simple PCA that is applied to all training sets not only alleviates the problem of the high computational complexity of constructing the SPD kernel matrix but also better preserves the main variations in the set data to build the covariance matrices, which form the SPD manifold. This operation of refining the high-dimensional SPD matrices can also be viewed as a transformation from the high-dimensional manifold to a low-dimensional manifold.16
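The collaborative reduction of Eqs. (14)–(16) can be sketched in NumPy as below. The function name is hypothetical, and the explicit transpose in the projection is only a consequence of storing the eigenvectors of $\Pi$ as columns of $\Gamma$.

```python
import numpy as np

def collaborative_pca(image_sets, d1):
    """Pool all training images and learn a d1-dimensional PCA projection
    [Eqs. (14)-(16)]; each element of image_sets is a D x L_i matrix."""
    X = np.hstack(image_sets)                      # D x N sample collection, Eq. (14)
    x_bar = X.mean(axis=1, keepdims=True)
    Pi = (X - x_bar) @ (X - x_bar).T / X.shape[1]  # Eq. (15)
    evals, evecs = np.linalg.eigh(Pi)
    Gamma = evecs[:, ::-1][:, :d1]                 # d1 leading eigenvectors as columns
    # Eq. (16); written with a transpose because the eigenvectors are stored as columns.
    reduced_sets = [Gamma.T @ Xi for Xi in image_sets]
    return Gamma, reduced_sets
```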

4.2.

Eigenspectrum Regularization with SPD Manifold

The $i$'th dimension-reduced set can be represented as $Y_i = [y_1^i, y_2^i, \ldots, y_{L_i}^i]$. $Y_i$ can be modeled by the covariance descriptor [Eq. (1)] and represented as $B_i \in \mathbb{R}^{d_1 \times d_1}$. To ensure that $B_i$ is nonsingular so as to form the SPD manifold, a small perturbation is added to the covariance matrix, $B_i + \eta I$. Hence, the perturbed $B_i$ is an SPD matrix in $\mathrm{sym}_{d_1}^+$.

The $L$ image sets of $C$ classes can be denoted as a collection of $\mathrm{sym}_{d_1}^+$ matrices $\mathcal{B} = \{B_1, B_2, \ldots, B_L\}$ that form an SPD manifold. By defining the Riemannian mapping $\phi: \mathcal{M} \rightarrow \mathcal{H}$, we can obtain the samples $\Phi(\mathcal{B}) = [\phi(B_1), \phi(B_2), \ldots, \phi(B_L)]$ on the RKHS $\mathcal{H}$, which is homeomorphic to a Euclidean space.

To further preserve the local structure, we incorporate the graph-embedded framework into our proposed method. The local Laplacian matrix $L_{\mathrm{loc}}$ is utilized to preserve the locality information, whereas in the global Laplacian matrix all vertices are adjacent regardless of their class membership.29 In this step, we aim to obtain the eigenspectrum and directions of the local structure in the SPD Riemannian kernel space. This can be implemented by decomposing the locality preserving matrix $\Phi(\mathcal{B}) L_{\mathrm{loc}} \Phi(\mathcal{B})^T$ on the mapped space; we denote $\Phi(\mathcal{B})$ as $\Phi$ for simplicity. Then, we have

Eq. (17)

\lambda = V^T \Phi L_{\mathrm{loc}} \Phi^T V,
where $V$ constitutes the kernel eigenspace of $\Phi L_{\mathrm{loc}} \Phi^T$, and the eigenvalues in $\lambda$ define the kernel eigenspectrum. The local Laplacian matrix $L_{\mathrm{loc}}$ can be specified as different local Laplacian graphs. In this work, we employ the binary local Laplacian $L^{\mathrm{bin}}$, the intraclass local Laplacian $L^{\mathrm{class}}$, and the adjustable local Laplacian $L^{\mathrm{adj}}$ as instances. $L^{\mathrm{bin}}$ is a simple Laplacian matrix in which intraclass vertices are adjacent with an equal weight on each edge. $L^{\mathrm{class}}$ is the Laplacian graph that satisfies

Eq. (18)

L^{\mathrm{class}}_{ij} = \begin{cases} 1 - W_{ij}, & i = j \\ -W_{ij}, & i \ne j, \text{ if } B_i \in C_c \text{ and } B_j \in C_c \\ 0, & \text{otherwise,} \end{cases}
where $W_{ij}$ is the connection weight of the $i$'th and $j$'th sets, and it has the same definition as Eq. (6) in CDL.10 The locality preserving matrix $\Phi L^{\mathrm{class}} \Phi^T$ can be proved to be equal to the kernel within-class scatter matrix $S_W^{\Phi}$ when equal weights are placed on the edges of adjacent data pairs in $L^{\mathrm{class}}$.29

Unlike the edge weights in $L^{\mathrm{bin}}$ and $L^{\mathrm{class}}$, which have fixed values, the edge weights in $L^{\mathrm{adj}}$ are variables based on different similarity definitions, such as the heat kernel in locality preserving projections44 and the neighborhood reconstruction coefficients in neighborhood preserving embedding.45 $L^{\mathrm{adj}}$ is a Laplacian matrix that is computed by

Eq. (19)

L^{\mathrm{adj}} = D - W,
where $D$ is a diagonal matrix calculated by $D_{ii} = \sum_j W_{ij}$. In this paper, we compute the edge weights of $W$ in Eq. (19) based on the heat kernel, which is calculated by a Gaussian distribution. The edge weight $W_{ij}$ of $L^{\mathrm{adj}}$ can be computed as

Eq. (20)

W_{ij} = \begin{cases} \exp\left(-\frac{\|\phi(B_i) - \phi(B_j)\|^2}{\sigma}\right), & \text{if } B_i \in C_c \text{ and } B_j \in C_c \\ 0, & \text{otherwise,} \end{cases}
where $\sigma$ is the kernel width parameter. The squared Euclidean distance between the mapped features $\phi(B_i)$ and $\phi(B_j)$ can be easily transformed to $K_{ii} - 2K_{ij} + K_{jj}$, where $K_{ij}$ can be calculated by the Riemannian kernel, such as Eq. (4).
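For reference, the intraclass Laplacian of Eq. (18) and the heat-kernel Laplacian of Eqs. (19) and (20) could be built from the kernel Gram matrix as in the NumPy sketch below. The function names and the default kernel width $\sigma = 1$ are illustrative assumptions.

```python
import numpy as np

def laplacian_class(labels):
    """Intraclass Laplacian L_class = D - W with W_ij = 1/N_c for same-class pairs
    [Eqs. (6) and (18)]."""
    labels = np.asarray(labels)
    W = np.zeros((len(labels), len(labels)))
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        W[np.ix_(idx, idx)] = 1.0 / len(idx)
    return np.diag(W.sum(axis=1)) - W

def laplacian_adj(K, labels, sigma=1.0):
    """Adjustable Laplacian L_adj = D - W with heat-kernel weights on intraclass
    edges [Eqs. (19) and (20)]; squared RKHS distances come from K_ii - 2K_ij + K_jj."""
    labels = np.asarray(labels)
    diag = np.diag(K)
    dist2 = diag[:, None] - 2.0 * K + diag[None, :]
    W = np.exp(-dist2 / sigma)
    W[labels[:, None] != labels[None, :]] = 0.0    # only intraclass vertices are adjacent
    return np.diag(W.sum(axis=1)) - W
```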

As is known from linear algebra, the projection directions in $V$ in Eq. (17) can be represented as linear combinations of the mapped samples on the RKHS

Eq. (21)

V = \Phi \alpha.

By substituting Eq. (21) into Eq. (17), we obtain $\lambda = \alpha^T \Phi^T \Phi L_{\mathrm{loc}} \Phi^T \Phi \alpha$. Using the Riemannian kernel function [e.g., Eq. (4)] to build the kernel Gram matrix $K = \Phi^T \Phi$, Eq. (17) can be rewritten as

Eq. (22)

\lambda = \alpha^T K L_{\mathrm{loc}} K \alpha.

Equation (22) can be solved by the eigendecomposition of $K L_{\mathrm{loc}} K$ subject to $\alpha^T \alpha = 1$.

In the theory of eigenspectrum regularization, we need to regularize the whole feature space $V$ [see Eq. (21)] on the mapped space. Assume that the regularization can be generalized by a weighting function, which is defined as

Eq. (23)

w = [w_k]_{k=1}^{L},
where $L$ is the number of training sets. The full-dimensional feature space $V$ contains $L$ vectors $[v_1, v_2, \ldots, v_L]$ since the dimension of the matrix $K L_{\mathrm{loc}} K$ is $L \times L$. Hence, the regularized eigenspace can be computed as

Eq. (24)

\tilde{V} = [w_k v_k]_{k=1}^{L}.

According to Eq. (21), we have $v_k = \Phi \alpha_k$. By defining

Eq. (25)

\rho = [w_k \alpha_k]_{k=1}^{L},
the regularized eigenspace of Eq. (24) can be rewritten as

Eq. (26)

\tilde{V} = \Phi \rho.

The regularized eigenspace is known as a transformation matrix24 that can transform the original feature data to an intermediate feature vector space. It is worth noting that the transformation matrix $\tilde{V}$ is full dimensional (its coefficient matrix $\rho$ has size $L \times L$). Hence, the mapped data $\Phi(\mathcal{B})$ from the SPD manifold can be transformed to the new feature vector space $\tilde{\Phi}(\mathcal{B})$ with no dimensional reduction, which preserves as much information as possible. We denote $\tilde{\Phi}(\mathcal{B})$ as $\tilde{\Phi}$ for simplicity. The transformation is depicted as

Eq. (27)

\tilde{\Phi} = \tilde{V}^T \Phi.

Although $\Phi$ is implicitly defined, the transformed feature $\tilde{\Phi}$ can be explicitly expressed by the kernel trick. According to Eqs. (26) and (27), the transformed feature can be written as $\tilde{\Phi} = \rho^T \Phi^T \Phi$. As $K = \Phi^T \Phi$, Eq. (27) can be rewritten as

Eq. (28)

\tilde{\Phi} = \rho^T K.

According to the preceding derivation, by defining the regularized coefficient matrix $\rho$ [see Eq. (25)], the regularization of the eigenspace $V$ is turned into the regularization of the coefficient eigenspace $\alpha$. In other words, applying an eigenspectrum regularization model to the eigenspace $\alpha$ is equivalent to applying it to the eigenspace $V$.
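A minimal sketch of this kernel-space regularization step [Eqs. (22)–(28)] is given below. The function name, the numerical floor on the eigenvalues, and the generic weight-function argument (which could be the ERE or CDEFE model introduced next) are our own assumptions for illustration.

```python
import numpy as np

def regularized_kernel_transform(K, L_loc, weight_fn):
    """Eigenspectrum regularization in the Riemannian kernel space [Eqs. (22)-(28)]:
    decompose K L_loc K, weight the coefficient eigenspace alpha, and return rho and
    the full-dimensional transformed training feature Phi_tilde = rho^T K."""
    lam, alpha = np.linalg.eigh(K @ L_loc @ K)     # Eq. (22), alpha^T alpha = I
    lam, alpha = lam[::-1], alpha[:, ::-1]         # descending kernel eigenspectrum
    lam = np.maximum(lam, 1e-12)                   # guard against tiny negative values
    w = weight_fn(lam)                             # e.g., the ERE or CDEFE weights below
    rho = alpha * w                                # Eq. (25): rho = [w_k alpha_k]
    Phi_tilde = rho.T @ K                          # Eq. (28)
    return rho, Phi_tilde
```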

The selection of a suitable eigenspectrum regularization model is a critical aspect of the proposed method. A proper eigenspectrum regularization model ensures that the regularized eigenvalues are very close to the real population variances.46 The eigenspectrum regularization of LRE is an adaptive model that estimates the optimal parameter $\gamma$ using the training data. However, this process is usually time-consuming, and the performance decays quickly when the training data are insufficient. In this paper, we employ the data-independent eigenspectrum regularization models of ERE and CDEFE to regularize the eigenspace $\alpha$, which are more general and robust. The first model is the eigenspectrum regularization model of ERE.24 The heuristic underlying ERE's eigenspectrum regularization model is the median operation. The weighting function applied to the eigenspace $\alpha$ is defined as

Eq. (29)

w_k^{\mathrm{ERE}} = \begin{cases} \lambda_k^{-1/2}, & k < m_1 \\ \left(\frac{a}{k+b}\right)^{-1/2}, & m_1 \le k \le r \\ \left(\frac{a}{r+1+b}\right)^{-1/2}, & r < k \le L, \end{cases}
where $m_1$ is the index of the eigenvalue of $\lambda$ (in descending order) that satisfies

Eq. (30)

\lambda_{m_1} = \max\{\lambda_k \,|\, \lambda_k < [\lambda_{\mathrm{med}} + \mu(\lambda_{\mathrm{med}} - \lambda_r)]\},
where $\lambda_{\mathrm{med}}$ is the median value computed by $\mathrm{median}\{\lambda_k \,|\, k \le r\}$, $\mu$ is a constant with a recommended value of 1,24 and $r$ is the rank of $K L_{\mathrm{loc}} K$. The parameters $a$ and $b$ are calculated as

Eq. (31)

a = \frac{\lambda_1 \lambda_{m_1}(m_1 - 1)}{\lambda_1 - \lambda_{m_1}}, \quad b = \frac{m_1 \lambda_{m_1} - \lambda_1}{\lambda_1 - \lambda_{m_1}}.
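A possible NumPy sketch of the ERE weighting model [Eqs. (29)–(31)] is given below. It assumes a strictly decreasing eigenspectrum with the split index $m_1 > 1$; the function name and the 0-based indexing conventions in the comments are our own.

```python
import numpy as np

def ere_weights(lam, rank, mu=1.0):
    """ERE regularization weights of Eqs. (29)-(31); lam holds the eigenvalues of
    K L_loc K in descending order and rank is its rank r (array indices are 0-based)."""
    lam_med = np.median(lam[:rank])
    thresh = lam_med + mu * (lam_med - lam[rank - 1])      # Eq. (30)
    i = int(np.argmax(lam < thresh))                       # position of lambda_{m1}
    m1 = i + 1                                             # 1-based split point
    a = lam[0] * lam[i] * (m1 - 1) / (lam[0] - lam[i])     # Eq. (31)
    b = (m1 * lam[i] - lam[0]) / (lam[0] - lam[i])
    k = np.arange(1, len(lam) + 1, dtype=float)            # 1-based eigenvalue indices
    w = np.empty_like(lam)
    w[:i] = lam[:i] ** -0.5                                # k < m1
    w[i:rank] = (a / (k[i:rank] + b)) ** -0.5              # m1 <= k <= r
    w[rank:] = (a / (rank + 1 + b)) ** -0.5                # r < k <= L
    return w
```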

The second regularization model is taken from CDEFE.26 The eigenspectrum regularization model of CDEFE regularizes the eigenspace in a Gaussian kernel space, which may have a special effect on the proposed RGCDL in the Riemannian kernel space.

The second regularization model aims to find the minimum eigenratio in the eigenspectrum of $K L_{\mathrm{loc}} K$, which is formed by the eigenvalues [see Eq. (17)] in descending order. Let $\delta_k$ denote the ratio of two adjacent eigenvalues $\lambda_k$ and $\lambda_{k+1}$ in the eigenspectrum; we have

Eq. (32)

\delta_k = \lambda_k / \lambda_{k+1}.

The minimum eigenratio can be formulated as

Eq. (33)

\delta_s = \min\{\delta_k, 1 \le k < r\},
where $s$ is the index of the minimum eigenratio, and $r$ is the rank of the locality matrix $K L_{\mathrm{loc}} K$. The eigenspectrum $\lambda$ is split at the $m_2$'th eigenvalue, where $\lambda_s$ is defined as $\lambda_s = \max\{\lambda_k, k \ge m_2\}$. Thus, the final regularized weighting function can be defined as

Eq. (34)

w_k^{\mathrm{CDEFE}} = \begin{cases} \lambda_k^{-1/2}, & 1 \le k \le m_2 \\ \lambda_{m_2}^{-1/2}, & m_2 < k \le L. \end{cases}
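For comparison with the ERE sketch above, a possible NumPy sketch of the CDEFE weighting model [Eqs. (32)–(34)] follows. Reading the split point $m_2$ as the minimum-eigenratio index is our interpretation of the definitions above, and the function name is hypothetical.

```python
import numpy as np

def cdefe_weights(lam, rank):
    """CDEFE regularization weights of Eqs. (32)-(34); lam holds the eigenvalues of
    K L_loc K in descending order and rank is its rank r.  The spectrum is split at
    the minimum-eigenratio point (0-based index m2)."""
    ratios = lam[:rank - 1] / lam[1:rank]          # delta_k = lambda_k / lambda_{k+1}, Eq. (32)
    m2 = int(np.argmin(ratios))                    # minimum eigenratio, Eq. (33)
    w = np.full_like(lam, lam[m2] ** -0.5)         # m2 < k <= L
    w[:m2 + 1] = lam[:m2 + 1] ** -0.5              # 1 <= k <= m2
    return w
```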

4.3.

Feature Extraction and Dimensional Reduction

The new feature vectors $\tilde{\Phi}$ are compact and full dimensional; the eigenfeatures of $\tilde{\Phi}$ should be decorrelated and dimension reduced for classification. According to Jiang et al.,24 PCA can be exploited to extract the final discriminative eigenfeatures since it is less sensitive to different training databases. However, class affinity is not considered in Ref. 24, which may cause discriminative information to be missed. In this work, we employ a graph-embedded framework to extract the final discriminative features, which incorporates a similarity weight matrix to form the final scatter matrix. Although LRE extended the graph-embedded framework to eigenfeature extraction, our method is designed to address the problems in a Riemannian kernel space.

According to Eq. (13), the eigenfeature extraction and dimensional reduction of RGCDL can be achieved by solving the following eigendecomposition problem on the mapped space:

Eq. (35)

U^* = \arg\max_{U^T U = I} U^T \tilde{\Phi} G \tilde{\Phi}^T U,
where $G$ is the similarity weight matrix, and the affinity between set $X_i$ and set $X_j$ is defined as

Eq. (36)

G_{ij} = \begin{cases} \frac{1}{N_c} - \frac{1}{L}, & \text{if } X_i \in C_c \text{ and } X_j \in C_c \\ -\frac{1}{L}, & \text{otherwise,} \end{cases}
where $N_c$ is the number of sets in the $c$'th class. This similarity weight matrix makes the intraclass samples more compact and the interclass samples more separated. Clearly, the problem of Eq. (35) can be solved by decomposing the matrix $\tilde{\Phi} G \tilde{\Phi}^T$. The projection matrix $U$ consists of the eigenvectors that correspond to the eigenvalues in descending order. We retain the first $d_2$ eigenvectors $U = [u_1, u_2, \ldots, u_{d_2}]$, where $d_2 \le L$, as the final dimension of the extracted feature. Hence, the final regularized projection matrix of RGCDL can be defined as

Eq. (37)

Z = \tilde{V} U.

Obviously, $Z$ does not have an explicit expression since $\tilde{V} = \Phi \rho$ and $\Phi$ is the feature matrix produced by the Riemannian mapping, which is implicitly defined.

However, an explicit expression can be provided by the kernel trick when calculating with the test samples. For a given test nonsingular covariance matrix $B_{\mathrm{te}}$, which is an element of the SPD manifold, we use $\phi_{\mathrm{te}}$ to denote the test feature vector that is mapped by the Riemannian mapping. Subsequently, we can extract the discriminative feature $F$ by the transformation

Eq. (38)

F = Z^T \phi_{\mathrm{te}} = U^T \tilde{V}^T \phi_{\mathrm{te}}.
Substituting $\tilde{V}$ from Eq. (26) and calculating the kernel vector $K_{\mathrm{te}}$ by the Riemannian kernel function [e.g., Eq. (4)], the final extracted eigenfeature can be rewritten as

Eq. (39)

F = (\rho U)^T K_{\mathrm{te}}, \quad K_{\mathrm{te}} = [k_{\log}(B_1, B_{\mathrm{te}}), \ldots, k_{\log}(B_L, B_{\mathrm{te}})]^T.
Here, $F$ is constructed by feature vectors on the mapped space. Hence, various distance metrics and classification methods designed for Euclidean space, such as the nearest neighbor (NN) classifier, can be applied for classification.
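As a rough sketch of this final stage [Eqs. (35)–(39)], the NumPy code below builds the similarity matrix $G$, extracts the $d_2$ leading eigenvectors of $\tilde{\Phi} G \tilde{\Phi}^T = \rho^T K G K \rho$, and projects a test kernel vector. The function names and the explicit symmetrization against round-off are our own assumptions.

```python
import numpy as np

def similarity_graph(labels):
    """Similarity weight matrix G of Eq. (36)."""
    labels = np.asarray(labels)
    L = len(labels)
    G = np.full((L, L), -1.0 / L)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        G[np.ix_(idx, idx)] = 1.0 / len(idx) - 1.0 / L
    return G

def rgcdl_feature_extractor(K, rho, labels, d2):
    """Graph-embedded feature extraction of Eqs. (35)-(37): decompose
    Phi_tilde G Phi_tilde^T and keep the d2 leading eigenvectors."""
    Phi_tilde = rho.T @ K                          # Eq. (28)
    S = Phi_tilde @ similarity_graph(labels) @ Phi_tilde.T
    S = 0.5 * (S + S.T)                            # symmetrize against round-off
    _, U = np.linalg.eigh(S)
    return U[:, ::-1][:, :d2]                      # columns of U used in Eq. (37)

def rgcdl_extract(rho, U, K_te):
    """Extract the test feature F = (rho U)^T K_te of Eq. (39), with
    K_te = [k_log(B_1, B_te), ..., k_log(B_L, B_te)]^T."""
    return (rho @ U).T @ K_te
```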

4.4.

Complete RGCDL Algorithm

The steps of RGCDL algorithm are given in Algorithm 1.

Algorithm 1

RGCDL algorithm.

At the training stage:
1. Collaboratively reduce the $L$ training sets $\{X_1, X_2, \ldots, X_L\}$ by Eq. (16) to $\{Y_1, Y_2, \ldots, Y_L\}$, and model them as $\mathrm{sym}_{d_1}^+$ matrices by the covariance descriptor [Eq. (1)], denoted as $\{B_1, B_2, \ldots, B_L\}$.
2. Use the Riemannian mapping $\phi: \mathcal{M} \rightarrow \mathcal{H}$ to map the $\mathrm{sym}_{d_1}^+$ matrices $\{B_1, B_2, \ldots, B_L\}$ to the RKHS, and designate them $\{\phi_1, \phi_2, \ldots, \phi_L\}$.
3. Obtain the optimal coefficients $\alpha$ and the eigenspectrum $\lambda$ by solving the eigendecomposition problem of Eq. (22); switch the local Laplacian matrix $L_{\mathrm{loc}}$ among $L^{\mathrm{bin}}$, $L^{\mathrm{class}}$, and $L^{\mathrm{adj}}$, respectively.
4. Regularize the eigenspace $V$ using the weighting functions of Eqs. (29) and (34), respectively, and construct the full-dimensional transformation matrix $\tilde{V}$ using Eq. (24).
5. Transform the mapped feature $\Phi$ to an intermediate full-dimensional feature space using Eqs. (25) and (28).
6. Solve the eigendecomposition problem of Eq. (35), and compute the final projection matrix $Z$ by Eq. (37).
At the testing stage:
1. Given a test image set $X_{\mathrm{te}}$, project it to the lower-dimensional feature space and extract the covariance feature $B_{\mathrm{te}}$.
2. Map $B_{\mathrm{te}}$ to the RKHS by the Riemannian mapping, and designate it $\phi_{\mathrm{te}}$.
3. Project $\phi_{\mathrm{te}}$ with the final projection matrix $Z$ to extract the discriminative features by Eq. (39).
4. Measure the extracted features between the training and test sets, and classify the label with a classifier (e.g., NN).

5.

Experimental Results

Experiments were conducted on set-based face recognition and object categorization tasks. First, we compare the proposed RGCDL to the original CDL method when using different numbers of extracted features. Second, we show the advantages of RGCDL over the recent RGDA method. Last, we evaluate the recognition performance of our RGCDL, and compare it to numerous image set-based classification methods.

5.1.

Dataset and Parameter Settings

We employed the Extended Yale face database B (ExtYaleB)47 for the face recognition task and the RGB-D object database48 for the object categorization task. The ExtYaleB database is the extension of the Yale face database B; it contains 16,128 images of 28 human subjects with 64 illumination conditions and 9 poses for each subject. According to the 9 poses of each subject, we built 9 image sets (60 images per set) for each subject. We utilized a cascaded face detector49 to collect faces from each image frame. The captured faces were then converted to grayscale and resized to 20×20 pixels. Some example images are shown in Fig. 2. We selected 2 to 5 of the 9 image sets per subject for discriminative training (103 sets) and employed the remaining sets for testing (149 sets). The experiments were repeated 10 times by randomly choosing the reference sets for training and the test sets for probe.

Fig. 2

Captured face examples from the ExtYaleB dataset.


The RGB-D object database is a large-scale dataset of 300 common household objects that are organized into 51 categories (classes). Each category contains 3 to 14 objects. For each object, 3 video sequences were recorded with a camera mounted at different heights so that the object is viewed from different angles with respect to the horizon. The video sequences were captured by placing each object on a turntable for a whole rotation using a Kinect-style three-dimensional camera. More than 100 images were extracted from each object's video sequences; they involve RGB color channels and a depth channel. We removed the depth images in this study to ensure fair comparisons. We built one image set for each object, forming a total of 300 image sets (102 sets for training and 198 sets for testing). Grayscale images resized to 20×20 pixels were adopted for the RGB-D dataset. Some example objects are shown in Fig. 3. To obtain more general results, we also conducted 10 cross-validation experiments by randomly choosing different combinations of training and test sets. The NN classifier was applied in all evaluations to ensure fair comparisons.

Fig. 3

Some example objects from the RGB-D dataset.


5.2.

Stability of Extracted Features

In this section, we evaluated the stability of the extracted features from RGCDL. We show that by applying eigenspectrum regularization, the features extracted by RGCDL are more stable than those extracted by the original CDL method.

As described in Sec. 4, the proposed RGCDL aims to extract features from the whole regularized eigenfeature space. As the number of final extracted features increases [controlled by varying the $d_2$ dimensions of $U$ in Eq. (37)], higher performance can be achieved by our RGCDL, whereas the original CDL algorithm cannot retain this characteristic. To confirm this assumption and provide evidence, we employed the real data of the face and object datasets to conduct these experiments.

The collaborative dimensional reduction of each image set is set to 100 dimensions; that is, the dimension-reduced covariance matrix of each image set is 100×100. Owing to the covariance descriptor, the computational cost of constructing the Riemannian kernel matrices does not depend on the number of images within a set. The Riemannian kernel induced by the LED in Eq. (4) was employed for the RGCDL and CDL methods. We varied the final feature dimensions of RGCDL and CDL to perform a comprehensive comparison. The comparison results are shown in Figs. 4 and 5. Each figure plots the error rates against the number of final extracted features. The reported rates are the average results of 10 cross-validation experiments. The two eigenspectrum regularization models of ERE and CDEFE were evaluated for the proposed RGCDL.

Fig. 4

Error rates using different numbers of features on the ExtYaleB dataset.


Fig. 5

Error rates using different numbers of features on the RGB-D dataset.


As shown in Figs. 4 and 5, with increasing dimensions of the final extracted features, the proposed RGCDL methods with the two regularization models generally produce low error rates, whereas the original CDL degrades rapidly beyond dimension $C-1$. The dimension $C-1$ represents the optimal number of features in LDA.17 The degradation of CDL is caused by the incorrectly scaled null kernel space of the within-class scatter matrix,24 which causes overfitting and poor generalization. These results reveal that, with the eigenspectrum regularization models, the conventional problems (e.g., the singularity of the within-class scatter matrix) caused by limited training samples in CDL can be alleviated. Since the new feature space is properly scaled, the estimated eigenvalues obey the true variances of the population;24 hence, better generalization can be achieved. The final extracted features are learned from the regularized full-dimensional transformation matrix $\tilde{V}$ [Eq. (26)], which can preserve discriminative information as much as possible. As a result, with an increasing number of features, the recognition rates of RGCDL remain stable.

5.3.

Performance Evaluation against RGDA

RGDA is the preliminary work of this paper; however, RGDA was proposed to solve the overfitting and poor generalization problems of Grassmann discriminative learning, whereas our RGCDL solves these problems for CDL on the SPD manifold. Moreover, we further employed different local Laplacian graphs to analyze the locality preserving ability and improve the performance; the locality preservation is evaluated in Sec. 5.4. We found that the performance of the SPD manifold with the eigenspectrum regularization techniques is better than that of the Grassmann manifold. We conducted two experiments to evaluate the advantages of the proposed method. First, we compared the classification ability of RGDA and RGCDL when different dimensions of the extracted features were applied. As shown in Figs. 6 and 7, for both eigenspectrum regularization models on the two datasets, the error rates of RGCDL with different numbers of features are always lower than those of RGDA. Moreover, the error rate curves of RGCDL are smoother and steadier over different numbers of features, and RGCDL can achieve a lower error rate even with a low number of features, particularly on the ExtYaleB dataset. The RGDA method usually cannot achieve high performance using lower dimensions of features. This finding demonstrates that the SPD manifold formed by covariance matrices preserves discriminative information better than the Grassmann manifold formed by subspaces.

Fig. 6

Performance comparisons of RGCDL and RGDA on the ExtYaleB dataset.


Fig. 7

Performance comparisons of RGCDL and RGDA on the RGB-D dataset.


Subsequently, we constructed noisy set data to evaluate the robustness of the proposed method. Image sets may contain noisy data in real-world applications; for example, outliers from other categories or subjects may exist within sets, which may degrade the performance of classifiers. Here, we show that the SPD manifold-based RGCDL method is more robust than the Grassmann manifold-based RGDA method. We conducted experiments by systematically corrupting the training (gallery) sets or test (probe) sets. The corruption was implemented by adding images from other classes. The data with no noise are denoted as "clean," the data with noise in the gallery sets are denoted as "N_G," and the data with noise in the probe sets are denoted as "N_P." Experiments were evaluated on both face recognition and object categorization.

The average classification rates of several cross-validations with different noise-corrupted datasets are shown in Figs. 8 and 9. The classification rates of our RGCDL always outperform those of RGDA on clean and corrupted data. Especially on the gallery-corrupted data N_G, RGCDL-ERE and RGCDL-CDEFE exhibit clear advantages over RGDA-ERE and RGDA-CDEFE. Once again, these results demonstrate that the SPD manifold formed by the second-order statistic covariance matrices can account for noisy set data better than the Grassmann manifold formed by subspaces; this reveals the robustness of RGCDL when dealing with noisy set data.

Fig. 8

Evaluated performances on the noisy ExtYaleB dataset.


Fig. 9

Evaluated performances on the noisy RGB-D dataset.


5.4.

Performance Comparison to Other Set-Based Classification Methods

We further evaluated the proposed RGCDL compared with other set-based classification methods. Multiple image set-based classification methods were evaluated for comprehensive comparison. The compared methods include the subspace-based methods DCC7 and ECCA;36 the Grassmann manifold methods of GDA,8 KGDA,38 GGDA,9 MMD,3 GNP,11 and RGDA;2 the SPD Riemannian manifold methods of CDL,10 PPCDL,16 and DARG.41

The parameter settings of the different methods are as follows. The final feature dimension of GDA, KGDA, GGDA, and RGDA was set following the recommendation of Ref. 2, and only the projection kernel8 was employed for the Grassmannian mapping. The parameters of MMD (such as the nonlinearity score and the number of NNs of data points) were tuned to be optimal on our datasets using the code provided by the authors. For CDL and PPCDL, the final feature dimension was set to the recommended value $C-1$. The dimension of the input covariance matrices of DARG and PPCDL was set to 100×100, which is the same as for the proposed RGCDL for fairness. The Riemannian kernel induced by LED was applied for CDL, PPCDL, and our RGCDL. We chose kernel-based DARG and the well-performing MD + LED41 distance metric for evaluation. We fixed the dimension to 150 for DCC by preapplying PCA to the data.7 The number of canonical correlations of DCC and ECCA was set to 20, which is the same as the Grassmannian dimension of the GDA, KGDA, GGDA, and RGDA methods. For RGDA and our RGCDL, the eigenspectrum regularization models of ERE and CDEFE were applied for comparison. The three local Laplacian matrices $L^{\mathrm{bin}}$, $L^{\mathrm{class}}$, and $L^{\mathrm{adj}}$ were evaluated in the proposed RGCDL.

Experiments were evaluated on the ExtYaleB and RGB-D datasets. The experimental results are reported as the average classification rates and standard deviations over 10-fold cross-validations. As shown in Table 1, the proposed RGCDL with the regularization models of ERE and CDEFE achieves the best classification results among all methods. The SPD manifold approaches based on covariance matrices (CDL, PPCDL, DARG, and our RGCDL) usually achieve better performance than the other methods on ExtYaleB, which shows the better accommodative ability of the second-order covariance statistic for handling illumination-varying face recognition. The inferior performances of PPCDL and DARG on the RGB-D dataset may be caused by the conventional problems of discriminative learning and an improperly estimated GMM. Benefitting from the eigenspectrum regularization with the graph-embedded framework, the proposed RGCDL with the different models outperforms all other methods. In the evaluation of the different local Laplacian graphs, $L^{\mathrm{class}}$ achieves the best results with the ERE regularization model. The performance of the adjustable Laplacian matrix $L^{\mathrm{adj}}$ is also outstanding with the ERE model, and it achieves the best results with the CDEFE model. The adjustable Laplacian matrix $L^{\mathrm{adj}}$ performs stably with the different regularization models, which reveals its good locality preserving ability. Obviously, $L^{\mathrm{adj}}$ is not necessarily the best local Laplacian matrix for locality preservation; better affinity matrices can be designed according to suitable theories.

Table 1

Average classification rates and standard deviations (%) on ExtYaleB and RGB-D datasets.

Method              ExtYaleB   RGB-D
MMD                 58.9±4.1   54.4±2.8
KGDA                86.7±2.7   66.5±2.6
GGDA                73.4±4.1   61.9±2.6
GDA                 89.7±2.1   66.3±3.6
ECCA                68.2±2.5   56.3±2.7
DCC                 79.9±2.6   67.3±2.6
CDL                 97.4±1.2   69.7±3.4
GNP                 80.1±1.6   62.1±1.1
PPCDL               97.2±1.3   51.8±2.7
DARG                91.8±2.4   55.3±4.1
RGDA-ERE            91.7±2.4   65.2±3.0
RGDA-CDEFE          91.0±1.1   64.9±3.2
RGCDL-ERE-Lbin      98.5±1.3   71.9±3.3
RGCDL-ERE-Lclass    98.5±1.3   72.1±3.3
RGCDL-ERE-Ladj      98.5±1.3   72.0±2.2
RGCDL-CDEFE-Lbin    98.1±1.1   71.4±3.3
RGCDL-CDEFE-Lclass  98.3±1.1   71.1±2.2
RGCDL-CDEFE-Ladj    98.5±1.1   71.5±2.2
Note: The bold values denote the proposed methods and the highest classification rates.

6.

Conclusion

In this paper, we proposed a regularized graph-embedded CDL method, referred to as RGCDL. The eigenspectrum regularization and the graph-embedded framework are collaboratively employed to attenuate the overfitting and poor generalization problems of the original CDL method. A comprehensive mathematical derivation in the SPD manifold kernel space is given to show how these techniques are combined. The experimental results of evaluating different numbers of extracted features show that the proposed method maintains stable and low error rates throughout all dimensions of the extracted features. This result demonstrates the stability that eigenspectrum regularization brings to linear discriminative learning in the SPD manifold kernel space. The graph-embedded framework benefits the method by preserving compact within-class affinity relations and achieves higher performance. Compared with the more recent RGDA method, our RGCDL achieves higher and steadier performance when different numbers of features are employed. Moreover, our RGCDL is more robust than RGDA when the gallery or probe sets are corrupted by noise. According to the plentiful comparisons with other set-based classification methods, our RGCDL has shown considerable results. The local Laplacian matrix reflects the intraclass local structure; devising the similarity of intraclass vertex pairs to better preserve locality information is one of our future works.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 61701126, 61802148, and 61802079, the Research projects in Guangzhou University, China, under Grant No. RP2020123, and the Scientific Research Program of Guangzhou under Grant No. 201904010493.

References

1. 

H. Hu, “Face recognition with image sets using locally Grassmannian discriminant analysis,” IEEE Trans. Circuits Syst. Video Technol., 24 (9), 1461 –1474 (2014). https://doi.org/10.1109/TCSVT.2014.2309834 Google Scholar

2. 

H. Tan et al., “Eigenspectrum regularization on Grassmann discriminant analysis with image set classification,” IEEE Access, 7 150792 (2019). https://doi.org/10.1109/ACCESS.2019.2947548 Google Scholar

3. 

R. Wang et al., “Manifold–manifold distance and its application to face recognition with image sets,” IEEE Trans. Image Process., 21 (10), 4466 –4479 (2012). https://doi.org/10.1109/TIP.2012.2206039 IIPRE4 1057-7149 Google Scholar

4. 

H. Cevikalp and B. Triggs, “Face recognition based on image sets,” in IEEE Conf. Comput. Vision and Pattern Recognit., 2567 –2573 (2010). https://doi.org/10.1109/CVPR.2010.5539965 Google Scholar

5. 

Y. Q. Hu, A. S. Mian and R. Owens, “Face recognition using sparse approximated nearest points between image sets,” IEEE Trans. Pattern Anal. Mach. Intell., 34 (10), 1992 –2004 (2012). https://doi.org/10.1109/TPAMI.2011.283 ITPIDJ 0162-8828 Google Scholar

6. 

P. F. Zhu et al., “Image set-based collaborative representation for face recognition,” IEEE Trans. Inf. Forensics Secur. 9(7), 1120–1132 (2014). https://doi.org/10.1109/TIFS.2014.2324277

7. T. K. Kim, J. Kittler and R. Cipolla, “Discriminative learning and recognition of image set classes using canonical correlations,” IEEE Trans. Pattern Anal. Mach. Intell. 29(6), 1005–1018 (2007). https://doi.org/10.1109/TPAMI.2007.1037

8. J. Hamm and D. D. Lee, “Grassmann discriminant analysis: a unifying view on subspace-based learning,” in Int. Conf. Mach. Learn., 376–383 (2008).

9. M. T. Harandi et al., “Graph embedding discriminant analysis on Grassmannian manifolds for improved image set matching,” in IEEE Conf. Comput. Vision and Pattern Recognit., 2705–2712 (2011). https://doi.org/10.1109/CVPR.2011.5995564

10. R. Wang et al., “Covariance discriminative learning: a natural and efficient approach to image set classification,” in IEEE Conf. Comput. Vision and Pattern Recognit., 2496–2503 (2012). https://doi.org/10.1109/CVPR.2012.6247965

11. H. L. Tan et al., “Grassmann manifold for nearest points image set classification,” Pattern Recognit. Lett. 68, 190–196 (2015). https://doi.org/10.1016/j.patrec.2015.09.008

12. M. Harandi, M. Salzmann and R. Hartley, “Dimensionality reduction on SPD manifolds: the emergence of geometry-aware methods,” IEEE Trans. Pattern Anal. Mach. Intell. 40(1), 48–62 (2018). https://doi.org/10.1109/TPAMI.2017.2655048

13. S. Jayasumana et al., “Kernel methods on the Riemannian manifold of symmetric positive definite matrices,” in IEEE Conf. Comput. Vision and Pattern Recognit., 73–80 (2013). https://doi.org/10.1109/CVPR.2013.17

14. D. G. Kendall, “Shape manifolds, procrustean metrics, and complex projective spaces,” Bull. London Math. Soc. 16(2), 81–121 (1984). https://doi.org/10.1112/blms/16.2.81

15. R. Vemulapalli, J. K. Pillai and R. Chellappa, “Kernel learning for extrinsic classification of manifold features,” in IEEE Conf. Comput. Vision and Pattern Recognit., 1782–1789 (2013). https://doi.org/10.1109/CVPR.2013.233

16. H. Tan and Y. Gao, “Patch-based principal covariance discriminative learning for image set classification,” IEEE Access 5, 15001–15012 (2017). https://doi.org/10.1109/ACCESS.2017.2733718

17. P. N. Belhumeur, J. P. Hespanha and D. J. Kriegman, “Eigenfaces vs. Fisherfaces: recognition using class specific linear projection,” IEEE Trans. Pattern Anal. Mach. Intell. 19(7), 711–720 (1997). https://doi.org/10.1109/34.598228

18. G. Baudat and F. Anouar, “Generalized discriminant analysis using a kernel approach,” Neural Comput. 12(10), 2385–2404 (2000). https://doi.org/10.1162/089976600300014980

19. W. Liu et al., “Null space-based kernel Fisher discriminant analysis for face recognition,” in IEEE Int. Conf. Autom. Face and Gesture Recognit., 369–374 (2004). https://doi.org/10.1109/AFGR.2004.1301558

20. H. Yu and H. Yang, “A direct LDA algorithm for high-dimensional data—with application to face recognition,” Pattern Recognit. 34(10), 2067–2070 (2001). https://doi.org/10.1016/S0031-3203(00)00162-X

21. L. F. Chen et al., “A new LDA-based face recognition system which can solve the small sample size problem,” Pattern Recognit. 33(10), 1713–1726 (2000). https://doi.org/10.1016/S0031-3203(99)00139-9

22. M. H. Yang, “Kernel eigenfaces vs. kernel Fisherfaces: face recognition using kernel methods,” in IEEE Int. Conf. Autom. Face and Gesture Recognit. (2002). https://doi.org/10.1109/AFGR.2002.4527207

23. J. Lu, K. N. Plataniotis and A. N. Venetsanopoulos, “Face recognition using kernel direct discriminant analysis algorithms,” IEEE Trans. Neural Networks 14, 117–126 (2003). https://doi.org/10.1109/TNN.2002.806629

24. X. D. Jiang, B. Mandal and A. Kot, “Eigenfeature regularization and extraction in face recognition,” IEEE Trans. Pattern Anal. Mach. Intell. 30(3), 383–394 (2008). https://doi.org/10.1109/TPAMI.2007.70708

25. X. Wang and X. Tang, “Dual-space linear discriminant analysis for face recognition,” in IEEE Conf. Comput. Vision and Pattern Recognit. (2004). https://doi.org/10.1109/CVPR.2004.1315214

26. X. Jiang, B. Mandal and A. Kot, “Complete discriminant evaluation and feature extraction in kernel space for face recognition,” Mach. Vision Appl. 20(1), 35–46 (2009). https://doi.org/10.1007/s00138-007-0103-1

27. B. Mandal et al., “Prediction of eigenvalues and regularization of eigenfeatures for human face verification,” Pattern Recognit. Lett. 31(8), 717–724 (2010). https://doi.org/10.1016/j.patrec.2009.10.006

28. P. Y. Han, A. B. J. Teoh and F. S. Abas, “Regularized locality preserving discriminant embedding for face recognition,” Neurocomputing 77(1), 156–166 (2012). https://doi.org/10.1016/j.neucom.2011.09.007

29. Y. H. Pang, A. B. J. Teoh and F. S. Hiew, “Locality regularization embedding for face verification,” Pattern Recognit. 48(1), 86–102 (2015). https://doi.org/10.1016/j.patcog.2014.07.010

30. X. Liu and T. Cheng, “Video-based face recognition using adaptive hidden Markov models,” in IEEE Conf. Comput. Vision and Pattern Recognit., 340–345 (2003). https://doi.org/10.1109/CVPR.2003.1211373

31. M. Kim et al., “Face tracking and recognition with visual constraints in real-world videos,” in IEEE Conf. Comput. Vision and Pattern Recognit., 1787–1794 (2008). https://doi.org/10.1109/CVPR.2008.4587572

32. O. Yamaguchi, K. Fukui and K. I. Maeda, “Face recognition using temporal image sequence,” in IEEE Int. Conf. Autom. Face and Gesture Recognit., 318–323 (1998). https://doi.org/10.1109/AFGR.1998.670968

33. H. Hotelling, “Relations between two sets of variates,” Biometrika 28(3–4), 321–377 (1936). https://doi.org/10.1093/biomet/28.3-4.321

34. K. Fukui and O. Yamaguchi, “Face recognition using multi-viewpoint patterns for robot vision,” in Int. Symp. Rob. Res., 192–201 (2005).

35. K. Fukui and A. Maki, “Difference subspace and its generalization for subspace-based methods,” IEEE Trans. Pattern Anal. Mach. Intell. 37(11), 2164–2177 (2015). https://doi.org/10.1109/TPAMI.2015.2408358

36. O. Arandjelovic, “Discriminative extended canonical correlation analysis for pattern set matching,” Mach. Learn. 94(3), 353–370 (2014). https://doi.org/10.1007/s10994-013-5380-5

37. R. Wang and X. Chen, “Manifold discriminant analysis,” in IEEE Conf. Comput. Vision and Pattern Recognit., 429–436 (2009). https://doi.org/10.1109/CVPR.2009.5206850

38. T. S. Wang and P. F. Shi, “Kernel Grassmannian distances and discriminant analysis for face recognition from image sets,” Pattern Recognit. Lett. 30(13), 1161–1165 (2009). https://doi.org/10.1016/j.patrec.2009.06.002

39. O. Tuzel, F. Porikli and P. Meer, “Pedestrian detection via classification on Riemannian manifolds,” IEEE Trans. Pattern Anal. Mach. Intell. 30(10), 1713–1727 (2008). https://doi.org/10.1109/TPAMI.2008.75

40. Z. H. Huang et al., “Log-Euclidean metric learning on symmetric positive definite manifold with application to image set classification,” in Int. Conf. Mach. Learn., 720–729 (2015).

41. W. Wang et al., “Discriminant analysis on Riemannian manifold of Gaussian distributions for face recognition with image sets,” IEEE Trans. Image Process. 27, 151–163 (2018). https://doi.org/10.1109/TIP.2017.2746993

42. V. Arsigny et al., “Geometric means in a novel vector space structure on symmetric positive definite matrices,” SIAM J. Matrix Anal. Appl. 29(1), 328–347 (2007). https://doi.org/10.1137/050637996

43. B. Scholkopf and A. J. Smola, Learning with Kernels, MIT Press, Cambridge, Massachusetts (2002).

44. X. He and P. Niyogi, “Locality preserving projections,” in Adv. Neural Inf. Process. Syst., 234–241 (2003).

45. X. He et al., “Neighborhood preserving embedding,” in IEEE Int. Conf. Comput. Vision (2005). https://doi.org/10.1109/ICCV.2005.167

46. X. D. Jiang, “Linear subspace learning-based dimensionality reduction,” IEEE Signal Process. Mag. 28(2), 16–26 (2011). https://doi.org/10.1109/MSP.2010.939041

47. A. S. Georghiades, P. N. Belhumeur and D. J. Kriegman, “From few to many: illumination cone models for face recognition under variable lighting and pose,” IEEE Trans. Pattern Anal. Mach. Intell. 23(6), 643–660 (2001). https://doi.org/10.1109/34.927464

48. K. Lai et al., “A large-scale hierarchical multi-view RGB-D object dataset,” in IEEE Int. Conf. Rob. and Autom., 1817–1824 (2011). https://doi.org/10.1109/ICRA.2011.5980382

49. P. Viola and M. J. Jones, “Robust real-time face detection,” Int. J. Comput. Vision 57(2), 137–154 (2004). https://doi.org/10.1023/B:VISI.0000013087.49260.fb

Biography

Hengliang Tan received his BE degree from Foshan University, Foshan, China, in 2006, and his ME and PhD degrees from Sun Yat-sen University, Guangzhou, China, in 2011 and 2016, respectively. He joined the School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, in 2016. His current research interests include machine learning, pattern recognition, and manifold learning.

Ying Gao received his PhD from the South China University of Technology, Guangzhou, China, in 2002. He is currently a professor at the School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou. His main research interests include intelligent optimization algorithms, pattern recognition, and signal processing.

Jiao Du received her MS and PhD degrees from the Chongqing University of Posts and Telecommunications in 2013 and 2017, respectively. She is currently a lecturer at the School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, China. Her research interests include pattern recognition and image processing.

Shuo Yang received his master’s degree in software engineering from Dalian Jiaotong University, China, in 2013, and his doctorate in software engineering from the University of Macau in 2017. He is currently a lecturer at the School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, China. His research interests include semantic interoperability and semantic inference with artificial intelligence technology, mainly applied to e-commerce, e-marketplaces, and clinical applications.

CC BY: © The Authors. Published by SPIE under a Creative Commons Attribution 4.0 Unported License. Distribution or reproduction of this work in whole or in part requires full attribution of the original publication, including its DOI.
Hengliang Tan, Ying Gao, Jiao Du, and Shuo Yang "Regularized graph-embedded covariance discriminative learning for image set classification," Journal of Electronic Imaging 29(4), 043018 (4 August 2020). https://doi.org/10.1117/1.JEI.29.4.043018
Received: 19 April 2020; Accepted: 22 July 2020; Published: 4 August 2020