Robust multi-view locality preserving regression embedding

PeerJ Computer Science

Introduction

The continuous advancement of information technology has significantly enhanced the capabilities of data collection, transmission, storage, processing, and utilization. In theory, a higher numerical feature dimension within a given image sample allows for the inclusion of more valuable information for identification and classification (Braik et al., 2024; Li et al., 2024; Huang et al., 2023). However, practical applications reveal that an excessively high feature dimension introduces a range of challenges in data processing (Jain, Duin & Mao, 2000). These challenges include issues related to data storage and computational complexity, high redundancy among data features, and the occurrence of the Hughes phenomenon (Hughes, 1968). To address the drawbacks of high dimensionality, feature extraction technology offers a solution that reduces computational costs and improves the efficiency of machine learning. While multi-view feature extraction may somewhat lag behind deep learning algorithms in terms of performance, it provides notable advantages in interpretability and compatibility with various hardware types, including central processing units (CPUs), graphics processing units (GPUs), and digital signal processors (DSPs). These strengths contribute to its continued relevance as an important research focus. Therefore, advancing traditional multi-view feature extraction methods to more effectively extract discriminative features remains a significant pursuit.

The categorization of feature extraction methods into single-view feature extraction (Nie et al., 2023; Li et al., 2022; Zhang et al., 2022a) and multi-view feature extraction (Wei et al., 2023; Wong et al., 2022; Qiang et al., 2022) is primarily based on the type of data samples. In single-view feature extraction, principal component analysis (PCA) (Oh & Kwak, 2016) is a commonly used technique. PCA is an unsupervised approach that aims to find the subspace with the highest sample variance, making it suitable for various subsequent tasks. However, PCA is less effective when dealing with nonlinear data. As manifold learning techniques have evolved, various nonlinear feature extraction methods like locality preserving projections (LPP) (He & Niyogi, 2003), neighborhood preserving embedding (NPE) (He et al., 2005), and isometric projection (Cai, He & Han, 2007) have been introduced to address the limitations of linear methods like PCA. These methods aim to retain the original structures in the subspace but may require different prerequisites. For example, LPP relies on the upfront generation of a neighbor graph for the original data. NPE assumes that the linear reconstruction relationship of the original samples can be maintained between the samples and neighboring samples in the embedding space. In contrast, supervised feature extraction methods leverage sample labels to provide more discriminant information. Linear discriminant analysis (LDA) (Belhumeur, Hespanha & Kriegman, 1997) seeks to find a transformation that minimizes within-class scatter and maximizes between-class scatter. However, like PCA, LDA is inherently linear, which might not yield optimal results when samples within a class form separate clusters. To overcome this, researchers have introduced methods like local Fisher discriminant analysis (Sugiyama, 2007) and marginal Fisher analysis (Yan et al., 2007) to consider distinct local structures. Additionally, sparsity-preserving discriminant projections (Qiao, Chen & Tan, 2010) have been developed based on sparse reconstruction to maintain sparse reconstruction coefficients in the subspace. Despite the distinct motivations behind these algorithms, Yan et al. (2007) have presented a general framework known as graph embedding (GE) that unifies the above unsupervised and supervised methods within a common framework. In this framework, each algorithm is designed to preserve a specific structure within a dataset, improving feature extraction performance and providing a platform for developing new dimensionality reduction algorithms based on various graph structures.

Furthermore, multi-view data typically offers a wealth of complementary information compared to single-view data (Li et al., 2024; Feng & Wang, 2023; Liu et al., 2023). Multi-view feature extraction, which amalgamates relationships between different views to enhance model performance, has found extensive application across various domains (Zhang et al., 2022b; Liang et al., 2022; Zheng et al., 2021). Here, the most straightforward approach involves consolidating all features into an extended vector and then applying a single-view method for feature extraction. However, this method tends to be computationally expensive and does not effectively consider the information related to consistency and complementarity among multiple views. Canonical correlation analysis (CCA) (Hardoon, Szedmák & Shawe-Taylor, 2004) is a prominent unsupervised multi-view technique aimed at uncovering consistency information. It achieves this by optimizing two projection matrices to maximize the correlation between different representations of the same embedding sample from two distinct views. In the realm of single-view feature extraction techniques, which prioritize structure preservation, variations of CCA have been introduced. Locality-preserving CCA (LPCCA) (Sun & Chen, 2007), including its variant, a new LPCCA (Wang & Zhang, 2013), focuses on retaining local neighbor relationships during the computation of the projection matrices. It is essential to note that manual neighbor selection in these methods may introduce inaccuracies. To overcome this shortcoming, canonical sparse cross-view correlation analysis (Zu & Zhang, 2016) leverages sparse representation to unveil local geometric structures without manual intervention. Additionally, Rupnik & Shawe-Taylor (2010) introduced multi-view CCA (MCCA), an extension of CCA designed for handling data with more than two views. Based on this, discriminative MCCA (Gao et al., 2018) considers label information and falls under supervised multi-view feature extraction. Beyond CCA-based models, researchers have proposed the multi-view uncorrelated discriminant analysis method (MvDA) (Shu et al., 2019), drawing inspiration from single-view feature extraction techniques like LDA and PCA. MvDA strives to identify a shared subspace while maintaining intra-class compactness and inter-class separation. Wei et al. (2023) introduced an entropy-weighted non-negative matrix factorization (EWNMF) that assigns weights to each data attribute, placing emphasis on their importance.

While single-view feature extraction research has made significant progress over the past few decades, it has recently reached a plateau. The widespread availability of multi-view data across various domains and the remarkable performance of multi-view learning have shifted the focus of current research towards multi-view feature extraction. However, due to the specific characteristics of multi-view data, single-view feature extraction methods cannot be directly applied to multi-view data, which leaves the well-developed single-view feature extraction methods underutilized. Therefore, there is an urgent need for a comprehensive framework that extends single-view methods, especially the effective GE methods, into multi-view approaches. This extension should fully consider the unique attributes of multi-view data. In response to this need, we propose the multi-view regression embedding frameworks. Our frameworks extend single-view GE methods into a multi-view context, carefully considering the consistency and complementarity of multi-view data. Specifically, our frameworks apply GE to a shared embedding from multiple views, effectively capturing the consistency information among these views. This shared embedding is obtained through non-linear projections, derived by fitting linear projections of different views using regression techniques. The use of non-linear shared embedding helps prevent the loss of essential information that can occur with linear projection techniques. Furthermore, our frameworks introduce adaptive weights to the GEs and the regression techniques of different views, effectively leveraging the complementary information across multiple views. Additionally, by imposing distinct norm constraints on the projection matrices, our frameworks enhance their robustness to noisy data.

The main contributions of this article can be summarized as follows:

  • The multi-view regression embedding frameworks are proposed to extend effective single-view GE methods to multi-view feature extraction.

  • The frameworks comprehensively consider the consistency and complementarity of multi-view data, enhancing robustness through the incorporation of norm constraints.

  • The frameworks utilize nonlinear shared embedding to prevent the loss of crucial information that can occur with linear projections.

  • Various multi-view feature extraction models are constructed within the framework, and their performance is confirmed through real dataset evaluations.

The structure of this article is outlined as follows: ‘Related Work’ briefly describes the GE framework and two specific methods PCA and LDA under this framework. ‘Proposed Method’ presents the multi-view feature extraction framework and expounds on two extended frameworks. ‘Optimization strategy’ details the algorithm design of the feature extraction framework. ‘Experiments’ carries out experiments on three datasets to assess the proposed approach. ‘Conclusion’ concludes the article by presenting a summary and analysis of the entire work.

Related work

Single-view feature extraction problem: Given a training sample set $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{d \times n}$, where $n$ and $d$ are the numbers of samples and features, respectively. In the supervised case, labels for the samples are provided, and the label of $x_i$ is denoted $c_i$, $i = 1, 2, \ldots, n$. The goal of feature extraction is to find a projection matrix $P \in \mathbb{R}^{d \times m}$, where $m \ll d$. The embedding space $Y = [y_1, y_2, \ldots, y_n] \in \mathbb{R}^{m \times n}$ is given by $Y = P^T X$.

GE

The single-view feature extraction methods based on sample structures can be unified under the GE framework. In GE, a similarity matrix $W^S \in \mathbb{R}^{n \times n}$ is constructed using distinct structural information of the original samples, where $W^S_{ij}$ represents the similarity of samples $x_i$ and $x_j$. This framework entails learning the projection matrix $P$ by embedding the graph structures into a regression model. The GE framework can be summarized as follows:

$$\min_{P} G(Y) = \sum_{i,j=1}^{n} \left\|y_i - y_j\right\|_2^2 W^S_{ij}, \quad \text{s.t. } YHY^T = I \ \text{or} \ P^TP = I,$$
where $I$ is an identity matrix and $H$ represents the constraint matrix designed to prevent a trivial solution of the objective function. Therefore, the specific models in the GE framework are determined by the specific constraint and the matrices $W^S$ and $H$. The above objective function can be further transformed into:

$$G(Y) = \operatorname{tr}\!\left(Y L Y^T\right),$$
where $L$ is the Laplacian matrix of $W^S$.
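As a quick, self-contained check (a minimal Python sketch, not part of the article, whose experiments used MATLAB), the following builds the Laplacian $L = D - W$ from a similarity matrix and verifies that the pairwise objective above equals the trace form up to a constant factor of 2, which is typically absorbed into $W^S$:

```python
import numpy as np

def graph_laplacian(W):
    """Laplacian L = D - W of a symmetric similarity matrix W."""
    return np.diag(W.sum(axis=1)) - W

rng = np.random.default_rng(0)
n, m = 6, 2
W = rng.random((n, n)); W = (W + W.T) / 2        # symmetric similarities
Y = rng.standard_normal((m, n))                  # embedding, one column per sample
L = graph_laplacian(W)

pairwise = sum(W[i, j] * np.sum((Y[:, i] - Y[:, j]) ** 2)
               for i in range(n) for j in range(n))
print(np.isclose(pairwise, 2 * np.trace(Y @ L @ Y.T)))   # True
```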

PCA

PCA is an unsupervised GE model that aims to identify a set of orthogonal axes representing directions with the highest variance in the original data. It then projects the data onto the first m of these axes, selectively retaining the feature dimensions that contain the majority of the variance while disregarding dimensions with nearly zero variance, resulting in the reduction of data feature dimensionality. The constraint of PCA is given by:

$$P^T P = I.$$

The similarity matrix $W^S$ is defined as follows:

$$W^S_{ij} = \begin{cases} \dfrac{1}{n}, & \text{if } i \neq j, \\ 0, & \text{otherwise}. \end{cases}$$

Thus, the Laplacian matrix of $W^S$ is computed as follows:

$$L = I - \frac{1}{n}\mathbf{1}\mathbf{1}^T,$$
where $\mathbf{1}$ is an $n$-dimensional column vector of all ones.

LDA

LDA is a supervised GE model that aims to ensure that during the projection process, the data in the new subspace has the maximum inter-class distance and the minimum intra-class distance. This is done to enhance the separability of the data after projection, aiming to improve classification or clustering performance. The constraint of LDA is given by:

$$YHY^T = I,$$
where

$$H = I - \frac{1}{n}\mathbf{1}\mathbf{1}^T.$$

The similarity matrix $W^S$ is defined as follows:

$$W^S_{ij} = \begin{cases} \dfrac{1}{n_{c_i}}, & \text{if } c_i = c_j, \\ 0, & \text{otherwise}. \end{cases}$$

Thus, the Laplacian matrix of $W^S$ is computed as follows:

$$L = I - \sum_{c=1}^{C}\frac{1}{n_c}\mathbf{1}_c\mathbf{1}_c^T,$$

where $C$ is the number of classes, $n_c$ is the number of samples in the $c$-th class, and $\mathbf{1}_c$ is an $n$-dimensional column vector with $\mathbf{1}_c(i) = 1$ if $c_i = c$ and 0 otherwise.
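For concreteness, a short Python sketch (illustrative, not the authors' code) that builds the PCA and LDA Laplacian matrices exactly as defined above:

```python
import numpy as np

def pca_laplacian(n):
    """L = I - (1/n) 1 1^T, the centering matrix used by PCA in the GE framework."""
    one = np.ones((n, 1))
    return np.eye(n) - one @ one.T / n

def lda_laplacian(labels):
    """L = I - sum_c (1/n_c) 1_c 1_c^T, built from class indicator vectors 1_c."""
    labels = np.asarray(labels)
    L = np.eye(labels.size)
    for c in np.unique(labels):
        ind = (labels == c).astype(float).reshape(-1, 1)   # indicator vector 1_c
        L -= ind @ ind.T / ind.sum()                       # subtract (1/n_c) 1_c 1_c^T
    return L

print(pca_laplacian(4))
print(lda_laplacian([0, 0, 1, 1]))
```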

Proposed method

This section proposes three multi-view feature extraction frameworks based on regression embedding, namely multi-view regression embedding (MRE), multi-view locality preserving regression embedding (MLPRE), and robust multi-view locality preserving regression embedding (RMLPRE). Specific multi-view feature extraction models are devised by integrating traditional single-view PCA and LDA into these frameworks. An illustrative overview of the construction process of these frameworks is provided in Fig. 1.


Figure 1: Process of constructing MRE and expanding frameworks.

The multi-view feature extraction problem can be defined as follows: Let $v$ be the number of multi-view data sources, and consider multi-view data samples $X^{(i)} = [x_1^{(i)}, x_2^{(i)}, \ldots, x_n^{(i)}] \in \mathbb{R}^{d^{(i)} \times n}$. Here, $d^{(i)}$ represents the dimensionality of the feature space for the $i$-th view, and $n$ is the total number of data samples. The goal of multi-view feature extraction is to learn projection matrices $P^{(i)} \in \mathbb{R}^{d^{(i)} \times m}$, where $m \ll d^{(i)}$. These projection matrices aim to transform the original data into embedding representations $Y^{(i)} = P^{(i)T}X^{(i)} \in \mathbb{R}^{m \times n}$.

MRE

To comprehensively account for the consistency and complementarity information among views while extending the single-view GE framework to multi-view feature extraction, we apply GE by incorporating the structural information from distinct views into a shared embedding. Additionally, we introduce adaptive weights to the regression terms and the GEs of distinct views. The optimization problem is as follows:

$$\min_{\alpha, Y, P^{(i)}} \sum_{i=1}^{v} \alpha_i^r \left[ G(Y, X^{(i)}) + \frac{1}{2}\left\|P^{(i)T}X^{(i)} - Y\right\|_F^2 \right], \quad \text{s.t. } \alpha_i \geq 0,\ \sum_{i=1}^{v}\alpha_i = 1,\ YHY^T = I \ \text{or} \ P^{(i)T}P^{(i)} = I,\ i = 1,\ldots,v,$$
where

$$G(Y, X^{(i)}) = \operatorname{tr}\!\left(Y L^{(i)} Y^T\right), \quad i = 1,\ldots,v.$$

The parameter $r$ is a positive value, and $\alpha = [\alpha_1, \ldots, \alpha_v]$. $L^{(i)}$ represents the Laplacian matrix of the similarity matrix in the $i$-th view. In this context, the second term of the objective function employs regression techniques to fit linear projections of different views to the shared embedding $Y$. This fitting process yields fitting residuals $Y_r^{(i)}$, which have the following representation:

$$Y = P^{(i)T}X^{(i)} + Y_r^{(i)}, \quad i = 1, 2, \ldots, v.$$

This implies that the shared embedding Y is a nonlinear embedding due to this regression-based transformation. Therefore, this approach effectively prevents the loss of crucial information associated with linear projection methods.

To avoid over-fitting and prevent degenerate trivial solutions, we introduce a regularization term on the norm of the projection matrices, $\|P^{(i)T}\|_F^2$. Additionally, we introduce a structured regularization term $\|Y - \bar{Y}\|_F^2$ to maximize the overall divergence of the samples after projection, thereby retaining more complete information from each view. Here, $\bar{Y}$ is calculated as:

$$\bar{Y} = \frac{1}{n} Y \mathbf{1}\mathbf{1}^T.$$

The optimization problem is as follows:

$$\min_{\alpha, Y, P^{(i)}} \sum_{i=1}^{v} \alpha_i^r \left[ G(Y, X^{(i)}) + \frac{1}{2}\left\|P^{(i)T}X^{(i)} - Y\right\|_F^2 + \frac{\gamma}{2}\left\|P^{(i)T}\right\|_F^2 \right] - \left\|Y - \bar{Y}\right\|_F^2, \quad \text{s.t. } \alpha_i \geq 0,\ \sum_{i=1}^{v}\alpha_i = 1. \tag{14}$$

In Eq. (14), the constraint $YHY^T = I$ or $P^{(i)T}P^{(i)} = I$, $i = 1,\ldots,v$, has been replaced by the regularization term $\|P^{(i)T}\|_F^2$ in the objective function. Therefore, it can be omitted from the constraints.
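As a small numerical sanity check (a Python sketch under the reconstruction $\bar{Y} = \frac{1}{n}Y\mathbf{1}\mathbf{1}^T$), the divergence regularizer equals a trace form over the centering matrix, which is the form in which it reappears later in the optimization:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 8
Y = rng.standard_normal((m, n))

one = np.ones((n, 1))
Y_bar = Y @ one @ one.T / n                  # mean embedding replicated across columns
N = np.eye(n) - one @ one.T / n              # centering matrix

divergence = np.linalg.norm(Y - Y_bar, "fro") ** 2
trace_form = np.trace(Y @ N @ Y.T)           # equal because N is symmetric and idempotent
print(np.isclose(divergence, trace_form))    # True
```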

MLPRE

To further explore the consistency information, we leverage the shared k-nearest neighbor structure across all views to constrain the shared embedding. The optimization problem can be stated as follows:

$$\min_{\alpha_i, Y, P^{(i)}} \sum_{i=1}^{v} \alpha_i^r \left[ G(Y, X^{(i)}) + \frac{1}{2}\left\|P^{(i)T}X^{(i)} - Y\right\|_F^2 + \frac{\gamma}{2}\left\|P^{(i)T}\right\|_F^2 + \frac{\lambda}{2}\sum_{j,k=1}^{n} W^{LP(i)}_{jk}\left\|Y_j - Y_k\right\|_F^2 \right] - \left\|Y - \bar{Y}\right\|_F^2, \quad \text{s.t. } \alpha_i \geq 0,\ \sum_{i=1}^{v}\alpha_i = 1, \tag{15}$$
where

$$W^{LP(i)}_{jk} = \begin{cases} \exp\!\left(-\dfrac{\left\|x_j^{(i)} - x_k^{(i)}\right\|_2^2}{t}\right), & \text{if } x_j \text{ and } x_k \text{ are } k\text{-nearest neighbors in all views,} \\ 0, & \text{otherwise}. \end{cases}$$
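A possible construction of this shared-neighbor weight matrix is sketched below in Python; the function name, the symmetrization of the neighbor relation, and the choice of which view supplies the kernel distances are implementation choices of this sketch, not specifications from the article:

```python
import numpy as np

def shared_knn_weights(views, k=5, t=1.0, view_idx=0):
    """Heat-kernel weights on pairs that are k-nearest neighbors in every view.

    `views` is a list of (d_i x n) arrays; kernel distances are taken in view
    `view_idx`, mirroring W^{LP(i)} in the text.
    """
    n = views[0].shape[1]
    shared = np.ones((n, n), dtype=bool)
    for X in views:
        d2 = np.sum((X[:, :, None] - X[:, None, :]) ** 2, axis=0)   # pairwise squared distances
        order = np.argsort(d2, axis=1)[:, 1:k + 1]                  # k nearest neighbors, excluding self
        nn = np.zeros((n, n), dtype=bool)
        nn[np.repeat(np.arange(n), k), order.ravel()] = True
        shared &= nn | nn.T                                         # keep pairs that are neighbors in this view
    d2 = np.sum((views[view_idx][:, :, None] - views[view_idx][:, None, :]) ** 2, axis=0)
    W = np.where(shared, np.exp(-d2 / t), 0.0)
    np.fill_diagonal(W, 0.0)
    return W

views = [np.random.default_rng(i).standard_normal((5, 30)) for i in range(3)]
W_lp = shared_knn_weights(views, k=4, t=2.0)   # 30 x 30 shared-neighbor weight matrix
```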

Note that GE uses the graph structure under each view to constrain the shared embedding Y separately, whereas here the graph structure common to all views is used to constrain the shared embedding Y. This further facilitates the exploration of consistency information and weakens the detrimental effect of an inaccurate graph structure in a particular view caused by noise and redundant features.

RMLPRE

To enhance robustness against noisy data, we employ distinct norms for the projection matrices within the proposed framework. The resulting model is expressed as follows:

$$\min_{\alpha_i, Y, P^{(i)}} \sum_{i=1}^{v} \alpha_i^r \left[ G(Y, X^{(i)}) + \frac{1}{2}\left\|P^{(i)T}X^{(i)} - Y\right\|_F^2 + \frac{\gamma}{2}\left\|P^{(i)T}\right\|_{\beta} + \frac{\lambda}{2}\sum_{j,k=1}^{n} W^{LP(i)}_{jk}\left\|Y_j - Y_k\right\|_F^2 \right] - \left\|Y - \bar{Y}\right\|_F^2, \quad \text{s.t. } \alpha_i \geq 0,\ \sum_{i=1}^{v}\alpha_i = 1. \tag{17}$$

In Eq. (17), $\|\cdot\|_{\beta}$ denotes a particular matrix norm, for example the $L_1$-norm or the $L_{2,1}$-norm. Notably, research based on the $L_1$-norm and the $L_{2,1}$-norm has shown that these norms exhibit improved performance in the presence of outliers compared to F-norm-based methods. Moreover, $L_{2,1}$-norm-based techniques are often easier to solve than their $L_1$-norm counterparts.
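A small numerical illustration of the difference between the two norms (plain NumPy, under the common row convention; not tied to the article's code) shows why the $L_{2,1}$-norm encourages entire rows to vanish while the F-norm spreads small values:

```python
import numpy as np

def l21_norm(P):
    """L2,1 norm under the row convention: sum of the Euclidean norms of the rows."""
    return np.sum(np.linalg.norm(P, axis=1))

P = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [1.0, 0.0]])
print(l21_norm(P))                      # 5 + 0 + 1 = 6
print(np.linalg.norm(P, "fro") ** 2)    # squared Frobenius norm, for comparison: 26.0
```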

Framework application

In this section, we extend single-view PCA and LDA into six multi-view feature extraction models using the three frameworks proposed above, namely pcMRE, pcMLPRE, pcRMLPRE, daMRE, daMLPRE, and daRMLPRE.

To derive the optimization problems for pcMRE, pcMLPRE, and pcRMLPRE, the Laplacian matrix for each view is calculated as follows:

$$L^{(i)} = I - \frac{1}{n}\mathbf{1}\mathbf{1}^T, \quad i = 1, \ldots, v,$$
with the $L_{2,1}$-norm regularization applied to the objective function of pcRMLPRE. The optimization problems can be formulated as follows:

1) pcMRE:

$$\min_{\alpha_i, Y, P^{(i)}} \sum_{i=1}^{v} \alpha_i^r \left[ \operatorname{tr}\!\left(Y\left(I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^T\right)Y^T\right) + \frac{1}{2}\left\|P^{(i)T}X^{(i)} - Y\right\|_F^2 + \frac{\gamma}{2}\left\|P^{(i)T}\right\|_F^2 \right] - \left\|Y - \bar{Y}\right\|_F^2, \quad \text{s.t. } \alpha_i \geq 0,\ \sum_{i=1}^{v}\alpha_i = 1.$$

2) pcMLPRE:

$$\min_{\alpha_i, Y, P^{(i)}} \sum_{i=1}^{v} \alpha_i^r \left[ \operatorname{tr}\!\left(Y\left(I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^T\right)Y^T\right) + \frac{1}{2}\left\|P^{(i)T}X^{(i)} - Y\right\|_F^2 + \frac{\gamma}{2}\left\|P^{(i)T}\right\|_F^2 + \frac{\lambda}{2}\sum_{j,k=1}^{n} W^{LP(i)}_{jk}\left\|Y_j - Y_k\right\|_F^2 \right] - \left\|Y - \bar{Y}\right\|_F^2, \quad \text{s.t. } \alpha_i \geq 0,\ \sum_{i=1}^{v}\alpha_i = 1.$$

3) pcRMLPRE:

$$\min_{\alpha_i, Y, P^{(i)}} \sum_{i=1}^{v} \alpha_i^r \left[ \operatorname{tr}\!\left(Y\left(I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^T\right)Y^T\right) + \frac{1}{2}\left\|P^{(i)T}X^{(i)} - Y\right\|_F^2 + \frac{\gamma}{2}\left\|P^{(i)T}\right\|_{2,1} + \frac{\lambda}{2}\sum_{j,k=1}^{n} W^{LP(i)}_{jk}\left\|Y_j - Y_k\right\|_F^2 \right] - \left\|Y - \bar{Y}\right\|_F^2, \quad \text{s.t. } \alpha_i \geq 0,\ \sum_{i=1}^{v}\alpha_i = 1.$$

To derive the optimization problems for daMRE, daMLPRE and daRMLPRE, the Laplacian matrix for each view is calculated as follows:

$$L^{(i)} = I - \sum_{c=1}^{C}\frac{1}{n_c}\mathbf{1}_c\mathbf{1}_c^T, \quad i = 1, \ldots, v,$$
with the $L_{2,1}$-norm regularization applied to the objective function of daRMLPRE. The optimization problems can be formulated as follows:

1) daMRE:

$$\min_{\alpha_i, Y, P^{(i)}} \sum_{i=1}^{v} \alpha_i^r \left[ \operatorname{tr}\!\left(Y\left(I - \sum_{c=1}^{C}\tfrac{1}{n_c}\mathbf{1}_c\mathbf{1}_c^T\right)Y^T\right) + \frac{1}{2}\left\|P^{(i)T}X^{(i)} - Y\right\|_F^2 + \frac{\gamma}{2}\left\|P^{(i)T}\right\|_F^2 \right] - \left\|Y - \bar{Y}\right\|_F^2, \quad \text{s.t. } \alpha_i \geq 0,\ \sum_{i=1}^{v}\alpha_i = 1.$$

2) daMLPRE:

$$\min_{\alpha_i, Y, P^{(i)}} \sum_{i=1}^{v} \alpha_i^r \left[ \operatorname{tr}\!\left(Y\left(I - \sum_{c=1}^{C}\tfrac{1}{n_c}\mathbf{1}_c\mathbf{1}_c^T\right)Y^T\right) + \frac{1}{2}\left\|P^{(i)T}X^{(i)} - Y\right\|_F^2 + \frac{\gamma}{2}\left\|P^{(i)T}\right\|_F^2 + \frac{\lambda}{2}\sum_{j,k=1}^{n} W^{LP(i)}_{jk}\left\|Y_j - Y_k\right\|_F^2 \right] - \left\|Y - \bar{Y}\right\|_F^2, \quad \text{s.t. } \alpha_i \geq 0,\ \sum_{i=1}^{v}\alpha_i = 1.$$

3) daRMLPRE:

$$\min_{\alpha_i, Y, P^{(i)}} \sum_{i=1}^{v} \alpha_i^r \left[ \operatorname{tr}\!\left(Y\left(I - \sum_{c=1}^{C}\tfrac{1}{n_c}\mathbf{1}_c\mathbf{1}_c^T\right)Y^T\right) + \frac{1}{2}\left\|P^{(i)T}X^{(i)} - Y\right\|_F^2 + \frac{\gamma}{2}\left\|P^{(i)T}\right\|_{2,1} + \frac{\lambda}{2}\sum_{j,k=1}^{n} W^{LP(i)}_{jk}\left\|Y_j - Y_k\right\|_F^2 \right] - \left\|Y - \bar{Y}\right\|_F^2, \quad \text{s.t. } \alpha_i \geq 0,\ \sum_{i=1}^{v}\alpha_i = 1.$$

In summary, the integration of the Laplacian matrices $L^{(i)}$, $i = 1, 2, \ldots, v$, from PCA and LDA into our proposed frameworks allows for the development of tailored methods. Moreover, the Laplacian matrices from other single-view graph embedding methods can also be seamlessly incorporated into these frameworks. As a result, our frameworks effectively bridge the gap, expanding single-view graph embedding techniques into the domain of multi-view feature extraction and, in turn, propelling the progress of multi-view learning.

Optimization strategy

We designed a simplified iterative algorithm to replace the traditional alternating iterative algorithm to reduce the cost of each iteration.

Optimization of MRE and MLPRE

Let the objective function of Eq. (15) be $f(Y, P^{(i)}, \alpha_i)$. Its partial derivative with respect to $P^{(i)T}$ is:

$$\frac{\partial f}{\partial P^{(i)T}} = g\left(P^{(i)T}X^{(i)}X^{(i)T} + \gamma P^{(i)T} - YX^{(i)T}\right),$$
where $g$ is independent of $P^{(i)T}$. Setting the partial derivative to zero gives:

$$P^{(i)T} = YX^{(i)T}\left(X^{(i)}X^{(i)T} + \gamma I\right)^{-1}. \tag{29}$$
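This closed-form update is an ordinary ridge-regression solution; a minimal Python sketch with illustrative dimensions (not the article's implementation) is:

```python
import numpy as np

def ridge_projection(X, Y, gamma):
    """Closed-form P^{(i)T} = Y X^{(i)T} (X^{(i)} X^{(i)T} + gamma I)^{-1}  (Eq. 29)."""
    d = X.shape[0]
    return Y @ X.T @ np.linalg.inv(X @ X.T + gamma * np.eye(d))

rng = np.random.default_rng(2)
d, n, m = 20, 50, 5
X = rng.standard_normal((d, n))   # one view: d features x n samples
Y = rng.standard_normal((m, n))   # current shared embedding
Pt = ridge_projection(X, Y, gamma=1.0)
print(Pt.shape)                   # (5, 20), i.e., m x d
```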

Substituting Eq. (29) into Eq. (15), we have

$$\begin{aligned}
&\min_{\alpha_i, Y, P^{(i)}} \sum_{i=1}^{v}\alpha_i^r\left[G(Y,X^{(i)}) + \frac{1}{2}\left\|P^{(i)T}X^{(i)}-Y\right\|_F^2 + \frac{\gamma}{2}\left\|P^{(i)T}\right\|_F^2 + \frac{\lambda}{2}\sum_{j,k=1}^{n}W^{LP(i)}_{jk}\left\|Y_j-Y_k\right\|_F^2\right] - \left\|Y-\bar{Y}\right\|_F^2 \\
\Rightarrow\ &\min_{\alpha_i, Y} \sum_{i=1}^{v}\alpha_i^r\left[G(Y,X^{(i)}) + \operatorname{tr}\!\left(YM^{(i)}Y^T\right)\right] - \operatorname{tr}\!\left(YNY^T\right),
\end{aligned}$$
where

$$M^{(i)} = I - X^{(i)T}\left(X^{(i)}X^{(i)T}+\gamma I\right)^{-1}X^{(i)} + \frac{\lambda}{\alpha_i^r}\left(D^{(i)} - W^{LP(i)}\right) \tag{31}$$
and

$$N = I - \frac{1}{n}\mathbf{1}\mathbf{1}^T. \tag{32}$$

If $G(Y,X^{(i)})$ can be written as $\operatorname{tr}\!\left(YL^{(i)}Y^T\right)$, we have

$$\begin{aligned}
&\min_{\alpha_i, Y} \sum_{i=1}^{v}\alpha_i^r\left[G(Y,X^{(i)}) + \operatorname{tr}\!\left(YM^{(i)}Y^T\right)\right] - \operatorname{tr}\!\left(YNY^T\right) \\
\Rightarrow\ &\min_{\alpha_i, Y} \operatorname{tr}\!\left(Y\left(\sum_{i=1}^{v}\alpha_i^r\left(L^{(i)}+M^{(i)}\right)\right)Y^T\right) - \operatorname{tr}\!\left(YNY^T\right) \\
\Rightarrow\ &\min_{\alpha_i, Y} \operatorname{tr}\!\left(Y\left(\sum_{i=1}^{v}\alpha_i^r\left(L^{(i)}+M^{(i)}\right)\right)Y^T\right), \quad \text{s.t. } YNY^T = I.
\end{aligned} \tag{33}$$

Step 1: Updating $Y$ while fixing $\alpha_i$. By constructing the Lagrangian function and setting the partial derivative with respect to $Y^T$ to zero, the optimization problem Eq. (33) can be solved via the following generalized eigenvalue problem:

$$\left(\sum_{i=1}^{v}\alpha_i^r\left(L^{(i)}+M^{(i)}\right)\right)Y^T = \mu NY^T. \tag{34}$$

Step 2: Updating $\alpha_i$ while fixing $Y$. The optimal solution of problem Eq. (33) can be calculated as:

$$\alpha_i = \frac{\left[1/\operatorname{tr}\!\left(Y\left(L^{(i)}+M^{(i)}\right)Y^T\right)\right]^{1/(r-1)}}{\sum_{i=1}^{v}\left[1/\operatorname{tr}\!\left(Y\left(L^{(i)}+M^{(i)}\right)Y^T\right)\right]^{1/(r-1)}}. \tag{35}$$

We alternately update $Y$ and $\alpha_i$ until convergence, after which $P^{(i)T}$ can be obtained from Eq. (29). The complete procedures are described in Algorithm 1.

Algorithm 1 :
Algorithm of MRE and MLPRE.
Input: Given a set of multi-view datasets $X^{(i)} \in \mathbb{R}^{d^{(i)} \times n}$, the number of iterations T, the parameters r, λ, and γ, and the embedded space dimension m.
Initialization: $\alpha_i = 1/v$;
Compute $M^{(i)}$ by Eq. (31);
Compute $N$ by Eq. (32);
  for t=1:T do
    Compute Y according to Eq. (34);
    Update αi according to Eq. (35);
    Compute P(i) according to Eq. (29);
  end for
  Output: The contribution weight αi, and the projection matrix P(i).
DOI: 10.7717/peerj-cs.2619/table-12
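For readers who prefer code, a simplified Python sketch of Algorithm 1 is given below. It assumes the MRE setting (the λ graph term is omitted from $M^{(i)}$), adds a tiny ridge to $N$ so a standard generalized eigensolver can be used, and uses illustrative function and variable names; it is a sketch, not the authors' MATLAB implementation.

```python
import numpy as np
from scipy.linalg import eigh

def mre_iterations(Xs, Ls, m, r=2.0, gamma=1.0, T=10):
    """Simplified sketch of Algorithm 1 in the MRE setting (lambda term omitted).

    Xs: list of (d_i x n) views; Ls: list of per-view Laplacians (PCA or LDA).
    Returns the shared embedding Y, the view weights alpha, and P^{(i)T}.
    """
    v, n = len(Xs), Xs[0].shape[1]
    one = np.ones((n, 1))
    N = np.eye(n) - one @ one.T / n + 1e-6 * np.eye(n)   # small ridge keeps N positive definite
    alphas = np.full(v, 1.0 / v)
    # M^{(i)} = I - X^{(i)T}(X^{(i)}X^{(i)T} + gamma I)^{-1} X^{(i)}   (Eq. 31 without the graph term)
    Ms = [np.eye(n) - X.T @ np.linalg.inv(X @ X.T + gamma * np.eye(X.shape[0])) @ X for X in Xs]

    for _ in range(T):
        A = sum(a ** r * (L + M) for a, L, M in zip(alphas, Ls, Ms))
        # Step 1: generalized eigenproblem A y = mu N y; keep the m smallest eigenvectors
        _, V = eigh(A, N)
        Y = V[:, :m].T
        # Step 2: closed-form weight update (cf. Eq. 35)
        scores = np.array([np.trace(Y @ (L + M) @ Y.T) for L, M in zip(Ls, Ms)])
        inv_pow = (1.0 / scores) ** (1.0 / (r - 1.0))
        alphas = inv_pow / inv_pow.sum()

    # Recover the per-view linear projections from Eq. (29)
    Pts = [Y @ X.T @ np.linalg.inv(X @ X.T + gamma * np.eye(X.shape[0])) for X in Xs]
    return Y, alphas, Pts
```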

Optimization of RMLPRE

If $G(Y,X^{(i)})$ can be written as $\operatorname{tr}\!\left(YL^{(i)}Y^T\right)$, Eq. (17) can be further rewritten as follows:

$$\begin{aligned}
&\min_{\alpha_i, Y, P^{(i)}} \sum_{i=1}^{v}\alpha_i^r\left[G(Y,X^{(i)}) + \frac{1}{2}\left\|P^{(i)T}X^{(i)}-Y\right\|_F^2 + \frac{\gamma}{2}\left\|P^{(i)T}\right\|_{2,1} + \frac{\lambda}{2}\sum_{j,k=1}^{n}W^{LP(i)}_{jk}\left\|Y_j-Y_k\right\|_F^2\right] - \left\|Y-\bar{Y}\right\|_F^2 \\
\Rightarrow\ &\min_{\alpha_i, Y, P^{(i)}} \sum_{i=1}^{v}\alpha_i^r\left[\frac{1}{2}\left\|P^{(i)T}X^{(i)}-Y\right\|_F^2 + \frac{\gamma}{2}\left\|P^{(i)T}\right\|_{2,1} + \operatorname{tr}\!\left(Y\left[L^{(i)}+\frac{\lambda}{\alpha_i^r}\left(D^{(i)}-W^{LP(i)}\right)\right]Y^T\right)\right], \\
&\text{s.t. } \alpha_i \geq 0,\ \sum_{i=1}^{v}\alpha_i=1,\ YNY^T=I,
\end{aligned} \tag{36}$$
where $N = I - \frac{1}{n}\mathbf{1}\mathbf{1}^T$.

Step 1: Updating $P^{(i)}$ while fixing $\alpha_i$ and $Y$. Problem Eq. (36) can be solved with respect to $P^{(i)}$ by minimizing the following function:

$$f(P^{(i)}) = \alpha_i^r\left[\frac{1}{2}\left\|P^{(i)T}X^{(i)}-Y\right\|_F^2 + \frac{\gamma}{2}\left\|P^{(i)T}\right\|_{2,1}\right].$$

Taking the derivative of $f(P^{(i)})$ and setting it to zero yields

$$P^{(i)T} = YX^{(i)T}\left(X^{(i)}X^{(i)T} + \frac{1}{2}\gamma W^{(i)}\right)^{-1}, \tag{38}$$
where

$$W^{(i)} = \operatorname{diag}\!\left(\frac{1}{\left\|p^{(i)}_1\right\|_2}, \ldots, \frac{1}{\left\|p^{(i)}_{d_i}\right\|_2}\right), \tag{39}$$
where $p^{(i)}_j$ denotes the $j$-th row of $P^{(i)}$ (i.e., the $j$-th column of $P^{(i)T}$) and $d_i$ is the dimension of the $i$-th view.
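In practice, $W^{(i)}$ is the diagonal reweighting matrix familiar from iteratively reweighted $L_{2,1}$ minimization; a hedged Python sketch follows (the small `eps` safeguard and the row/column convention spelled out above are choices of this sketch):

```python
import numpy as np

def l21_reweight(Pt, eps=1e-8):
    """Diagonal W^{(i)} with entries 1 / ||p_j||_2, one per column of P^{(i)T}.

    `eps` guards against division by zero for all-zero columns (an addition of
    this sketch, not part of the article's formula).
    """
    col_norms = np.linalg.norm(Pt, axis=0)            # d_i column norms of P^{(i)T}
    return np.diag(1.0 / np.maximum(col_norms, eps))
```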

Step 2: Updating $Y$ while fixing $\alpha_i$ and $P^{(i)}$. Problem Eq. (36) is equivalent to

$$\min_{\alpha_i, Y, P^{(i)}} \sum_{i=1}^{v}\alpha_i^r\left[\frac{1}{2}\left\|P^{(i)T}X^{(i)}-Y\right\|_F^2 + \operatorname{tr}\!\left(Y\left[L^{(i)}+\frac{\lambda}{\alpha_i^r}\left(D^{(i)}-W^{LP(i)}\right)\right]Y^T\right)\right], \quad \text{s.t. } YNY^T=I. \tag{40}$$

Substituting Eq. (38) into Eq. (40), the optimization equation can be solved by the following generalized eigenvalue problem

$$\left[\sum_{i=1}^{v}\alpha_i^r G^{(i)} + \lambda\left(D-W^{LP}\right)\right]Y^T = \mu NY^T, \tag{41}$$
where
$$G^{(i)} = X^{(i)T}\left[B^{(i)-1}X^{(i)}X^{(i)T}B^{(i)-T} - B^{(i)-1} - B^{(i)-T}\right]X^{(i)} + I + L^{(i)}, \qquad B^{(i)} = X^{(i)}X^{(i)T}+\frac{1}{2}\gamma W^{(i)}.$$

Step 3: Updating $\alpha_i$ while fixing $P^{(i)}$ and $Y$. The optimal solution of problem Eq. (36) can be calculated as:

$$\alpha_i = \frac{\left[1/\operatorname{tr}\!\left(YG^{(i)}Y^T + P^{(i)T}W^{(i)}P^{(i)}\right)\right]^{1/(r-1)}}{\sum_{i=1}^{v}\left[1/\operatorname{tr}\!\left(YG^{(i)}Y^T + P^{(i)T}W^{(i)}P^{(i)}\right)\right]^{1/(r-1)}}. \tag{42}$$

We alternately update $P^{(i)T}$, $Y$, and $\alpha_i$ until convergence. The complete procedures are described in Algorithm 2.

Algorithm 2 :
Algorithm of RMLPRE.
Input: Given a set of multi-view datasets $X^{(i)} \in \mathbb{R}^{d^{(i)} \times n}$, the number of iterations T, the parameters r, λ, and γ, and the embedded space dimension m.
Initialization: $\alpha_i = 1/v$, $W^{(i)} \in \mathbb{R}^{d^{(i)} \times d^{(i)}}$;
Compute $N$ by Eq. (32);
  for t=1:T do
    Compute Y according to Eq. (41);
    Update P(i) according to Eq. (38);
    Update W(i) according to Eq. (39);
    Update αi according to Eq. (42);
  end for
Output: The contribution weight αi, and the projection matrix P(i).
DOI: 10.7717/peerj-cs.2619/table-13
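A compact Python sketch of the RMLPRE loop follows; it mirrors the update order of Algorithm 2 but simplifies details (illustrative names, a small ridge on $N$, ε-guards), so it should be read as a sketch rather than the authors' implementation.

```python
import numpy as np
from scipy.linalg import eigh

def rmlpre_iterations(Xs, Ls, W_lp, m, r=2.0, gamma=1.0, lam=1.0, T=10):
    """Simplified sketch of Algorithm 2: alternate Y, P^{(i)}, W^{(i)}, and alpha."""
    v, n = len(Xs), Xs[0].shape[1]
    one = np.ones((n, 1))
    N = np.eye(n) - one @ one.T / n + 1e-6 * np.eye(n)        # ridge keeps the eigensolver stable
    L_lp = np.diag(W_lp.sum(axis=1)) - W_lp                   # Laplacian of the shared k-NN graph
    alphas = np.full(v, 1.0 / v)
    Ws = [np.eye(X.shape[0]) for X in Xs]                     # W^{(i)}, initialized to the identity

    for _ in range(T):
        # Assemble G^{(i)} (cf. Eq. 41): ridge-fit residual terms plus the per-view graph term
        Gs = []
        for X, L, Wd in zip(Xs, Ls, Ws):
            C = X.T @ np.linalg.inv(X @ X.T + 0.5 * gamma * Wd) @ X
            Gs.append(C @ C.T - C - C.T + np.eye(n) + L)
        A = sum(a ** r * G for a, G in zip(alphas, Gs)) + lam * L_lp
        _, V = eigh(A, N)                                     # update the shared embedding Y (Eq. 41)
        Y = V[:, :m].T
        # Update P^{(i)T} (Eq. 38) and the reweighting matrices W^{(i)} (Eq. 39)
        Pts = [Y @ X.T @ np.linalg.inv(X @ X.T + 0.5 * gamma * Wd) for X, Wd in zip(Xs, Ws)]
        Ws = [np.diag(1.0 / np.maximum(np.linalg.norm(Pt, axis=0), 1e-8)) for Pt in Pts]
        # Update the view weights alpha (cf. Eq. 42)
        scores = np.array([np.trace(Y @ G @ Y.T + Pt @ Wd @ Pt.T)
                           for G, Pt, Wd in zip(Gs, Pts, Ws)])
        inv_pow = (1.0 / scores) ** (1.0 / (r - 1.0))
        alphas = inv_pow / inv_pow.sum()
    return Y, Pts, alphas
```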

Time complexity analysis

The time complexity of Algorithm 1 is mainly determined by the computational costs of its key equations. Let
$$d_{\max} = \max\left\{d^{(i)} \mid i = 1, 2, \ldots, v\right\}.$$

For Algorithm 1, the time complexities of Eqs. (31), (34), (35), and (29) are $O(nd_{\max}^2 + d_{\max}^3 + n^2 d_{\max})$, $O(n^3)$, $O(n^2 + mn^2 + nm^2)$, and $O(nd_{\max}^2 + d_{\max}^3 + mnd_{\max})$, respectively. Given that $m \ll d^{(i)}$, the main time complexity of Algorithm 1 is $O(d_{\max}^3 + nd_{\max}^2 + n^2 d_{\max} + n^3)$. Similarly, for Algorithm 2, the operations and their associated costs are comparable, resulting in a main time complexity of $O(d_{\max}^3 + nd_{\max}^2 + n^2 d_{\max} + n^3)$.

Experiments

To validate the effectiveness of our proposed frameworks, we compared the classification accuracy with traditional multi-view methods, including LPCCA (Sun & Chen, 2007), ALPCCA (Wang & Zhang, 2013), MCCA (Rupnik & Shawe-Taylor, 2010), DMCCA (Gao et al., 2018), MvDA (Shu et al., 2019), MvPLS (Cao et al., 2018), EWNMF (Wei et al., 2023), MUNPE (Jayashree, Shiva Prakash & Venugopal, 2024). Experiments were conducted on three real-world datasets using a Windows 10 desktop computer with a 2.5 GHz Intel Core i5-7300HQ CPU, 64 GB of RAM, and Matlab R2019b (64-bit).

Datasets description

Coil (https://cave.cs.columbia.edu/repository/COIL-20): The Coil Dataset (Nene, Nayar & Murase, 1996), originating from Columbia University, contains a diverse collection of 1,440 images, featuring 20 different objects. Each object is represented by a substantial set of 72 images, offering rich variability for analysis.

Orl (http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html): Hailing from the Olivetti Laboratory in Cambridge, England, the ORL Dataset (Samaria & Harter, 1994) comprises 400 images showcasing the faces of 40 distinct individuals. These images were captured under a range of conditions, including differing lighting, positions, and expressions.

Yale (https://vision.ucsd.edu/datasets/yale-face-database): The Yale Face Dataset contains 165 images of 15 people’s faces, with variations in lighting, expression, and posture.

Our multi-view graph embedding method is designed to process data in vector form. For example, for image data, we first preprocess it to extract numerical vector features of gray-scale intensity (GSI), local binary patterns (LBP), and histogram of oriented gradients (HOG), respectively. Then, the proposed method, as well as the comparison methods, is applied to extract features from these numerical vector representations. For details, please refer to Table 1. As shown in Fig. 2, the image features extracted using the three described techniques display significant differences, representing three distinct perspectives of the image data. Furthermore, we evaluated the robustness of our proposed methods by introducing salt-and-pepper noise with densities of 0.1 and 0.3 to each dataset.
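As an illustration of this preprocessing step, the following Python sketch (using scikit-image; the LBP/HOG parameter values are assumptions of this sketch, not the article's settings) extracts the three views from one grayscale image and optionally adds salt-and-pepper noise:

```python
import numpy as np
from skimage.feature import hog, local_binary_pattern
from skimage.util import random_noise

def three_view_features(gray_img, noise_amount=None):
    """Extract the GSI, LBP, and HOG views from one grayscale image in [0, 1]."""
    if noise_amount:                                     # optional salt-and-pepper corruption
        gray_img = random_noise(gray_img, mode="s&p", amount=noise_amount)
    gsi = gray_img.ravel()                               # gray-scale intensity view
    lbp = local_binary_pattern(gray_img, P=8, R=1, method="uniform").ravel()
    hog_feat = hog(gray_img, orientations=9,
                   pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    return gsi, lbp, hog_feat

# Stacking the per-image vectors column-wise for all n images yields the three
# view matrices X^{(1)}, X^{(2)}, X^{(3)}, each of size d_i x n.
```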

Table 1:
Experimental details about datasets.
Dataset Views Noise No. classes Samples Training samples Features
Coil 3 (GSI, LBP, HOG) None, 0.1, 0.3 20 1,440 1,152 4,096, 4,096, 1,764
Orl 3 (GSI, LBP, HOG) None, 0.1, 0.3 40 400 320 2,000, 2,000, 720
Yale 3 (GSI, LBP, HOG) None, 0.1, 0.3 15 165 132 2,000, 2,000, 720
DOI: 10.7717/peerj-cs.2619/table-1

Figure 2: Image features extracted using three techniques on the Coil, ORL and Yale Face datasets.

Experiments setup

In our experiments, we employed a random split of the data, allocating 80% of the samples for training and reserving the remaining 20% for testing. The number of training samples for each dataset is provided in Table 1. For multi-view datasets $\{X^{(1)},\ldots,X^{(v)}\}$, once we obtained the projection matrices $\{P^{(1)},\ldots,P^{(v)}\}$, we extracted embedding features for each view as follows:

$$Y^{(i)} = P^{(i)T}X^{(i)}, \quad i = 1, \ldots, v.$$

We applied a 1NN classifier for classification, and these experiments were repeated five times. The evaluation criteria were based on the average classification accuracies of the embedding representations. Optimal parameters were determined through a grid search with $\gamma, \lambda \in \{2^{-5}, 2^{-3}, 2^{-1}, 2, 2^{3}, 2^{5}\}$. The parameters of the comparison algorithms were set as specified in their original publications.
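A simplified Python sketch of this evaluation protocol is shown below; it assumes the per-view embeddings are already available for all samples and uses a placeholder `fit_model`, so it only illustrates the splitting, 1NN scoring, and grid search, not the full training pipeline:

```python
import numpy as np
from itertools import product
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def evaluate_view(Y_view, labels, seed):
    """80/20 split and 1NN accuracy on one view's embedding (columns = samples)."""
    Xtr, Xte, ytr, yte = train_test_split(Y_view.T, labels, test_size=0.2,
                                          random_state=seed, stratify=labels)
    clf = KNeighborsClassifier(n_neighbors=1).fit(Xtr, ytr)
    return clf.score(Xte, yte)

def grid_search(views, labels, fit_model, repeats=5):
    """Grid search over gamma and lambda; `fit_model` is a placeholder that
    returns per-view embeddings (a list of m x n arrays) for given parameters."""
    grid = [2 ** -5, 2 ** -3, 2 ** -1, 2, 2 ** 3, 2 ** 5]
    best = (-1.0, None)
    for gamma, lam in product(grid, grid):
        Ys = fit_model(views, gamma=gamma, lam=lam)
        acc = np.mean([evaluate_view(Y, labels, seed)
                       for Y in Ys for seed in range(repeats)])
        best = max(best, (acc, (gamma, lam)))
    return best
```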

Experiment results

On the real-world datasets

We first present the classification accuracy results across feature dimensions ranging from 10 to 90 for each dataset. The summarized results for all methods can be found in Tables 2–4, with the best performance highlighted in bold. Furthermore, we provide a visual representation of the classification accuracy of all methods under various reduced dimensions for each dataset in Fig. 3. Based on our experimental findings, several key observations are noted.

Table 2:
1NN classification accuracy (%) for Coil Dataset.
Bold entries indicate the best results.
Method / Dimension 10 20 30 40 50 60 70 80 90
PCA (Oh & Kwak, 2016) 9.03 9.14 9.18 9.20 9.17 9.19 9.20 9.20 9.21
LDA (Belhumeur, Hespanha & Kriegman, 1997) 48.53 49.42 49.52 49.63 49.66 49.67 49.68 49.69 49.69
ALPCCA (Wang & Zhang, 2013) 18.77 32.69 44.30 53.13 58.39 62.90 65.26 66.40 66.45
LPCCA (Sun & Chen, 2007) 41.81 44.13 45.82 46.04 46.47 46.82 47.18 47.45 47.38
MCCA (Rupnik & Shawe-Taylor, 2010) 46.04 59.38 65.39 69.65 72.03 73.78 75.21 76.21 77.04
DMCCA (Gao et al., 2018) 67.11 72.46 74.73 75.84 76.31 77.21 70.12 74.69 78.29
MvDA (Shu et al., 2019) 53.80 50.76 45.02 39.03 40.06 41.51 42.60 43.91 44.73
MvPLS (Cao et al., 2018) 44.06 44.89 45.13 45.08 44.98 44.80 44.65 44.64 44.56
EWNMF (Wei et al., 2023) 69.23 71.57 72.78 73.95 74.23 73.38 74.61 76.93 77.27
MUNPE (Jayashree, Shiva Prakash & Venugopal, 2024) 68.83 70.86 67.75 67.11 66.67 64.86 65.42 63.61 61.81
pcMRE 80.11 82.58 81.93 80.71 78.97 76.52 74.79 73.17 71.74
pcMLPRE 80.05 82.52 81.91 80.72 78.91 76.51 74.69 73.03 71.59
pcRMLPRE 78.83 80.94 80.40 80.03 79.34 78.64 78.13 78.08 78.44
daMRE 85.44 86.83 87.30 86.09 84.67 82.01 79.50 76.93 74.89
daMLPRE 85.94 86.84 87.21 86.41 85.37 84.08 83.65 83.41 83.23
daRMLPRE 84.46 86.53 84.12 81.73 80.32 79.65 78.96 78.44 78.45
DOI: 10.7717/peerj-cs.2619/table-2
Table 3:
1NN classification accuracy (%) for ORL dataset.
Bold entries indicate the best results.
Method / Dimension 10 20 30 40 50 60 70 80 90
PCA (Oh & Kwak, 2016) 8.80 9.38 9.43 9.43 9.50 9.45 9.50 9.50 9.53
LDA (Belhumeur, Hespanha & Kriegman, 1997) 28.38 29.13 29.38 29.63 29.63 29.63 29.63 29.75 29.75
ALPCCA (Wang & Zhang, 2013) 10.42 18.33 26.04 32.10 38.29 44.58 49.79 52.56 55.92
LPCCA (Sun & Chen, 2007) 23.38 35.94 41.42 43.08 45.40 48.15 50.67 53.65 56.25
MCCA (Rupnik & Shawe-Taylor, 2010) 15.63 27.50 38.75 45.50 50.83 57.67 61.92 66.17 68.92
DMCCA (Gao et al., 2018) 55.50 68.25 74.92 78.67 80.04 81.29 83.33 84.29 85.63
MvDA (Shu et al., 2019) 56.96 56.33 60.17 53.75 45.08 45.17 44.79 44.67 45.21
MvPLS (Cao et al., 2018) 41.58 44.54 45.71 46.08 46.21 46.33 46.04 46.08 46.13
EWNMF (Wei et al., 2023) 65.50 73.16 81.86 85.20 86.47 87.19 87.41 87.85 88.26
MUNPE (Jayashree, Shiva Prakash & Venugopal, 2024) 49.58 55.00 55.83 58.33 60.00 55.42 60.00 59.58 54.17
pcMRE 68.15 81.83 86.08 88.50 88.08 89.42 88.58 88.67 87.67
pcMLPRE 68.17 82.00 86.08 88.42 87.92 89.17 88.67 88.83 87.75
pcRMLPRE 72.58 83.25 85.08 86.83 87.67 89.33 89.42 88.58 87.75
daMRE 80.79 86.17 86.97 90.91 92.42 89.17 89.92 90.83 90.00
daMLPRE 80.83 86.25 86.92 90.83 91.75 88.58 88.33 89.75 89.42
daRMLPRE 80.17 87.42 89.33 94.17 92.08 90.42 87.75 86.25 84.92
DOI: 10.7717/peerj-cs.2619/table-3
Table 4:
1NN classification accuracy (%) for Yale Face Database.
Bold entries indicate the best results.
Method / Dimension 10 20 30 40 50 60 70 80 90
PCA (Oh & Kwak, 2016) 7.20 7.96 8.13 8.27 8.31 8.27 8.44 8.49 8.49
LDA (Belhumeur, Hespanha & Kriegman, 1997) 21.56 22.22 22.22 22.22 22.22 22.22 22.22 22.22 22.22
ALPCCA (Wang & Zhang, 2013) 31.52 50.78 60.33 65.59 69.11 71.37 72.26 73.85 73.70
LPCCA (Sun & Chen, 2007) 24.56 30.48 35.74 38.30 40.74 42.41 43.56 44.52 45.37
MCCA (Rupnik & Shawe-Taylor, 2010) 43.33 56.74 63.04 66.44 69.41 70.22 71.48 72.74 72.89
DMCCA (Gao et al., 2018) 80.30 87.63 92.74 94.67 89.56 80.30 73.85 75.04 76.37
MvDA (Shu et al., 2019) 36.30 38.81 39.04 40.30 40.30 40.44 40.59 40.59 40.74
MvPLS (Cao et al., 2018) 36.15 39.33 40.00 40.07 40.22 40.07 40.44 40.74 41.26
EWNMF (Wei et al., 2023) 63.33 74.64 79.47 82.64 83.75 83.96 84.10 84.61 84.97
MUNPE (Jayashree, Shiva Prakash & Venugopal, 2024) 35.56 38.52 39.26 41.48 37.04 39.26 33.33 35.46 37.78
pcMRE 75.46 82.87 89.93 93.63 92.00 89.48 88.44 86.96 84.30
pcMLPRE 75.85 82.96 89.78 93.48 92.59 89.93 89.48 87.85 85.78
pcRMLPRE 85.78 87.70 90.67 92.44 91.41 90.22 90.07 89.93 88.89
daMRE 94.81 96.15 92.59 95.70 98.52 98.07 97.48 96.74 96.44
daMLPRE 94.96 98.37 96.30 96.30 98.48 98.93 98.07 97.19 95.70
daRMLPRE 94.67 95.70 93.78 91.11 89.48 88.85 87.19 86.56 85.48
DOI: 10.7717/peerj-cs.2619/table-4
Figure 3: Classification accuracies under different reduced dimensions.
(A) Coil Dataset. (B) Orl dataset. (C) Yale Face Database.

The tables and figure illustrate that, in nearly all instances, our proposed methods achieve superior classification accuracy compared to the comparison methods. Specifically, among the supervised methods, the MLPRE-based methods typically attain the highest accuracy, often peaking around 50 dimensions. However, the classification accuracy curve does not always remain at a high level as the reduced dimension increases. This behavior is primarily due to the F-norm regularization term's inclination to retain more feature information; at higher dimensions, redundant information becomes more prominent and adversely affects model accuracy. In sum, when using the F-norm regularization term, it is advisable to reduce the dimension to around 50.

Among the unsupervised methods, the proposed RMLPRE-based methods usually achieve the highest classification accuracy and tend to be more stable than the MLPRE-based methods. This stability is attributed to the L2,1-norm regularization, which tends to select a small number of features close to zero, rendering it more robust. In addition, based on the classification results, methods that incorporate the shared structural information term generally outperform those that lack it; that is, the MLPRE-based methods generally outperform the MRE-based methods in terms of classification accuracy.

Considering the classification results, it is evident that different norm regularization terms have distinct applicable ranges. The L2,1-norm regularizer is more suitable for unsupervised methods, while the F-norm regularizer is better suited for supervised methods.

On the real-world datasets with added noise

Thereafter, our experiments extended to verify the robustness of the proposed methods. We initially present the classification accuracy results across feature dimensions from 10 to 90 for each dataset with added noise. The comprehensive results for all methods are detailed in Tables 5–10, with the optimal performance highlighted in bold. We also offer a visual representation of the classification accuracy of all methods at various reduced dimensions for each dataset in Figs. 4 and 5. Based on our experimental findings, several key observations emerge.

Table 5:
1NN classification accuracy (%) for Coil Dataset with 0.1 salt-pepper noise added.
Bold entries indicate the best results.
Method / Dimension 10 20 30 40 50 60 70 80 90
PCA (Oh & Kwak, 2016) 7.37 7.25 6.99 6.58 6.25 5.96 5.84 5.76 5.63
LDA (Belhumeur, Hespanha & Kriegman, 1997) 11.84 11.96 11.21 11.07 10.99 11.24 11.23 11.36 10.62
ALPCCA (Wang & Zhang, 2013) 18.88 27.61 34.79 40.40 44.12 44.60 44.32 45.36 46.83
LPCCA (Sun & Chen, 2007) 7.61 7.28 8.74 8.70 8.42 8.18 7.43 6.48 7.16
MCCA (Rupnik & Shawe-Taylor, 2010) 14.16 20.05 26.25 31.54 34.31 37.11 39.50 41.71 42.93
DMCCA (Gao et al., 2018) 52.21 64.44 67.16 69.72 71.78 73.09 29.92 29.90 30.01
MvDA (Shu et al., 2019) 69.93 35.51 31.98 32.14 31.90 31.97 31.81 31.98 31.99
MvPLS (Cao et al., 2018) 41.18 41.98 42.08 41.41 40.87 40.17 39.83 39.21 38.82
EWNMF (Wei et al., 2023) 36.72 45.75 49.75 55.76 57.92 58.26 59.73 61.29 62.74
MUNPE (Jayashree, Shiva Prakash & Venugopal, 2024) 47.28 50.36 50.33 49.72 47.39 47.97 46.97 46.33 45.58
pcMRE 61.93 67.62 65.77 62.76 61.20 57.62 55.69 54.92 52.43
pcMLPRE 66.18 70.71 69.82 68.36 66.67 64.88 63.91 62.67 63.56
pcRMLPRE 64.35 70.68 67.16 67.52 66.99 67.39 66.98 66.61 65.86
daMRE 70.91 76.29 81.16 77.74 75.25 72.38 70.03 68.93 68.51
daMLPRE 70.96 76.37 80.49 79.05 76.78 74.63 71.39 70.34 67.73
daRMLPRE 69.02 76.93 77.09 71.04 68.58 65.88 63.41 64.76 67.50
DOI: 10.7717/peerj-cs.2619/table-5
Table 6:
1NN classification accuracy (%) for ORL Dataset with 0.1 salt-pepper noise added.
Bold entries indicate the best results.
Method / Dimension 10 20 30 40 50 60 70 80 90
PCA (Oh & Kwak, 2016) 6.43 6.98 7.23 7.30 7.23 7.33 7.28 7.13 6.93
LDA (Belhumeur, Hespanha & Kriegman, 1997) 5.38 4.75 5.00 4.50 4.63 4.63 4.50 4.38 4.50
ALPCCA (Wang & Zhang, 2013) 8.10 12.96 17.15 20.79 24.00 26.54 28.13 29.31 30.73
LPCCA (Sun & Chen, 2007) 9.46 9.44 9.85 10.67 9.19 10.42 11.96 12.90 13.63
MCCA (Rupnik & Shawe-Taylor, 2010) 6.83 9.25 12.25 15.92 17.92 20.63 21.88 24.33 27.88
DMCCA (Gao et al., 2018) 24.42 37.63 45.25 51.88 50.13 50.21 50.67 55.75 58.38
MvDA (Shu et al., 2019) 61.83 76.67 83.46 31.67 30.71 30.71 29.38 28.88 29.21
MvPLS (Cao et al., 2018) 34.92 40.08 42.00 42.00 42.71 42.63 42.50 42.08 42.29
EWNMF (Wei et al., 2023) 45.48 53.92 56.76 58.94 62.34 61.88 64.66 65.18 65.79
MUNPE (Jayashree, Shiva Prakash & Venugopal, 2024) 22.08 27.50 33.75 25.00 30.00 28.33 30.42 31.25 31.25
pcMRE 46.58 60.33 63.25 65.25 66.25 66.75 66.42 66.58 67.92
pcMLPRE 60.42 68.83 70.58 69.42 68.92 68.00 66.92 66.75 66.00
pcRMLPRE 55.83 66.33 68.00 65.42 68.42 68.42 68.83 70.42 68.67
daMRE 68.42 70.56 74.83 81.03 85.67 83.42 81.75 78.08 76.50
daMLPRE 68.33 70.58 74.77 81.00 85.83 83.92 83.08 80.58 80.17
daRMLPRE 61.25 66.67 68.67 79.83 83.08 76.25 72.58 69.08 66.42
DOI: 10.7717/peerj-cs.2619/table-6
Table 7:
1NN classification accuracy (%) for Yale Face Database with 0.1 salt-pepper noise added.
Bold entries indicate the best results.
Method / Dimension 10 20 30 40 50 60 70 80 90
PCA (Oh & Kwak, 2016) 6.62 7.73 8.22 8.13 8.18 8.04 7.51 7.82 7.96
LDA (Belhumeur, Hespanha & Kriegman, 1997) 13.56 13.11 13.56 14.00 13.56 13.33 13.33 13.56 13.78
ALPCCA (Wang & Zhang, 2013) 21.33 34.93 40.89 45.41 48.59 49.11 51.56 51.22 52.30
LPCCA (Sun & Chen, 2007) 21.41 24.96 26.89 25.48 23.19 23.37 24.63 26.78 27.81
MCCA (Rupnik & Shawe-Taylor, 2010) 19.04 24.74 29.63 32.37 37.04 40.44 42.00 41.93 43.56
DMCCA (Gao et al., 2018) 58.74 62.89 70.52 75.33 27.26 28.67 27.63 27.11 28.00
MvDA (Shu et al., 2019) 85.41 33.11 32.89 32.59 32.22 31.85 32.00 31.56 32.22
MvPLS (Cao et al., 2018) 31.70 35.56 37.85 38.15 39.70 39.78 40.44 39.70 39.56
EWNMF (Wei et al., 2023) 42.86 51.49 57.94 60.09 65.12 65.63 71.83 75.93 78.19
MUNPE (Jayashree, Shiva Prakash & Venugopal, 2024) 22.22 31.85 25.93 31.11 33.33 34.81 34.81 37.78 33.33
pcMRE 56.30 68.30 70.22 68.44 66.37 68.59 70.22 74.22 77.78
pcMLPRE 66.52 78.22 78.22 79.56 77.93 76.59 74.96 72.89 66.67
pcRMLPRE 65.63 74.96 77.33 78.33 78.81 77.15 75.74 76.48 76.22
daMRE 82.01 91.26 90.07 90.22 88.44 85.78 82.96 78.00 73.85
daMLPRE 82.07 90.22 89.15 89.75 88.11 87.41 87.56 87.26 87.81
daRMLPRE 67.26 81.48 78.11 78.26 77.96 79.00 73.96 70.33 69.44
DOI: 10.7717/peerj-cs.2619/table-7
Table 8:
1NN classification accuracy (%) for Coil Dataset with 0.3 salt-pepper noise added.
Bold entries indicate the best results.
Method / Dimension 10 20 30 40 50 60 70 80 90
PCA (Oh & Kwak, 2016) 7.34 6.85 6.31 5.88 5.42 5.30 5.03 5.04 5.04
LDA (Belhumeur, Hespanha & Kriegman, 1997) 8.19 8.40 8.38 8.33 8.27 8.50 8.47 8.50 8.55
ALPCCA (Wang & Zhang, 2013) 14.41 19.98 23.02 24.15 24.43 24.07 24.93 24.60 25.26
LPCCA (Sun & Chen, 2007) 5.20 5.33 6.17 6.31 6.28 6.34 5.53 5.37 5.54
MCCA (Rupnik & Shawe-Taylor, 2010) 6.96 9.27 10.48 10.93 11.97 12.79 14.02 14.37 15.74
DMCCA (Gao et al., 2018) 28.03 37.82 39.11 41.05 44.04 45.94 22.07 21.83 22.07
MvDA (Shu et al., 2019) 51.23 28.76 27.31 27.24 26.90 26.67 26.59 26.50 26.59
MvPLS (Cao et al., 2018) 34.41 32.66 31.74 30.73 30.00 28.99 28.31 28.38 28.06
EWNMF (Wei et al., 2023) 23.86 31.85 32.05 32.17 33.42 33.62 33.71 34.14 34.27
MUNPE (Jayashree, Shiva Prakash & Venugopal, 2024) 27.33 31.11 30.47 30.25 30.31 29.53 28.89 29.06 28.31
pcMRE 39.94 38.87 40.15 41.25 38.95 41.74 41.11 39.33 39.01
pcMLPRE 44.59 44.84 45.28 44.39 43.43 41.24 39.20 37.46 35.87
pcRMLPRE 40.98 32.61 31.27 29.71 28.54 28.63 27.67 27.61 27.19
daMRE 47.86 52.77 66.46 63.26 57.53 52.38 51.13 50.27 50.49
daMLPRE 47.88 52.83 66.94 60.12 53.81 49.51 46.66 45.71 44.76
daRMLPRE 45.64 51.24 54.49 47.54 39.57 37.04 35.83 32.85 33.09
DOI: 10.7717/peerj-cs.2619/table-8
Table 9:
1NN classification accuracy (%) for ORL Dataset with 0.3 salt-pepper noise added.
Bold entries indicate the best results.
Method / Dimension 10 20 30 40 50 60 70 80 90
PCA (Oh & Kwak, 2016) 6.50 6.85 6.38 6.33 6.33 5.93 6.08 6.03 5.88
LDA (Belhumeur, Hespanha & Kriegman, 1997) 4.38 4.25 4.50 4.13 4.38 4.25 4.87 4.50 4.75
ALPCCA (Wang & Zhang, 2013) 4.38 5.48 5.92 6.65 7.10 7.06 7.27 6.77 7.04
LPCCA (Sun & Chen, 2007) 3.67 3.85 4.38 4.08 3.10 4.60 4.81 5.27 5.29
MCCA (Rupnik & Shawe-Taylor, 2010) 3.42 4.17 5.04 5.96 6.08 6.21 6.54 6.88 6.75
DMCCA (Gao et al., 2018) 11.17 16.21 18.58 21.50 19.21 17.50 18.92 20.75 22.25
MvDA (Shu et al., 2019) 29.08 39.29 50.42 19.29 17.83 17.17 16.21 16.38 16.21
MvPLS (Cao et al., 2018) 22.79 25.38 23.25 23.42 23.13 22.96 22.67 22.79 22.58
EWNMF (Wei et al., 2023) 6.67 9.72 11.43 12.05 12.43 13.85 12.94 14.36 14.67
MUNPE (Jayashree, Shiva Prakash & Venugopal, 2024) 10.42 9.17 10.42 8.75 9.58 9.17 7.08 8.75 10.42
pcMRE 14.42 15.50 15.08 15.67 18.00 19.08 19.67 22.33 23.42
pcMLPRE 16.75 18.67 18.67 20.08 20.42 19.17 21.08 19.92 20.92
pcRMLPRE 6.83 8.75 9.25 10.00 11.00 11.67 12.42 12.33 12.50
daMRE 37.17 36.50 42.67 53.39 53.50 49.17 47.33 47.08 46.08
daMLPRE 37.47 36.52 42.77 53.42 54.00 49.33 44.83 41.58 40.67
daRMLPRE 25.92 24.67 28.33 30.08 29.83 26.42 23.67 22.83 22.92
DOI: 10.7717/peerj-cs.2619/table-9
Table 10:
1NN classification accuracy (%) for Yale Face Database with 0.3 salt-pepper noise added.
Bold entries indicate the best results.
Method / Dimension 10 20 30 40 50 60 70 80 90
PCA (Oh & Kwak, 2016) 5.82 5.91 6.09 6.18 6.27 6.18 6.40 6.40 6.84
LDA (Belhumeur, Hespanha & Kriegman, 1997) 10.00 7.78 8.44 9.78 10.44 10.00 10.22 10.22 10.89
ALPCCA (Wang & Zhang, 2013) 10.11 12.59 14.07 16.00 16.81 17.41 18.11 17.41 17.56
LPCCA (Sun & Chen, 2007) 12.19 13.15 12.81 16.26 14.15 14.74 16.44 14.93 14.93
MCCA (Rupnik & Shawe-Taylor, 2010) 8.59 11.85 11.04 12.59 12.00 13.26 13.41 14.74 15.26
DMCCA (Gao et al., 2018) 25.41 25.93 29.70 32.52 14.37 13.85 13.93 14.59 13.48
MvDA (Shu et al., 2019) 48.81 18.96 17.93 18.37 17.78 17.70 17.56 18.15 18.15
MvPLS (Cao et al., 2018) 22.81 23.56 22.30 22.44 22.81 23.33 24.30 25.78 24.52
EWNMF (Wei et al., 2023) 19.46 21.75 22.30 21.36 22.96 23.33 24.54 24.75 25.03
MUNPE (Jayashree, Shiva Prakash & Venugopal, 2024) 14.07 8.15 8.15 8.89 10.37 7.41 5.93 8.89 8.89
pcMRE 20.59 30.22 31.41 30.52 28.30 32.30 33.63 36.44 35.11
pcMLPRE 30.67 30.22 27.56 26.22 27.26 27.11 26.52 25.63 25.04
pcRMLPRE 17.19 19.56 21.19 21.19 20.15 21.33 21.63 20.15 21.33
daMRE 48.62 70.22 62.52 57.04 51.85 56.30 51.70 48.74 48.85
daMLPRE 48.59 71.70 59.26 49.33 41.04 36.15 34.78 33.85 33.66
daRMLPRE 34.52 34.22 33.56 35.85 31.70 30.22 30.07 31.85 31.89
DOI: 10.7717/peerj-cs.2619/table-10
Figure 4: Classification accuracies under different reduced dimensions (0.1 salt-pepper noise).
(A) Coil Dataset. (B) ORL Dataset. (C) Yale Face Database.
Figure 5: Classification accuracies under different reduced dimensions (0.3 salt-pepper noise).
(A) Coil Dataset. (B) ORL Dataset. (C) Yale Face Database.

Regarding Tables 5–7 and Fig. 4, the following conclusions can be made:

The proposed methods continue to perform well even with small amounts of noise in the data. Methods that incorporate shared structural information terms (MLPRE-based methods) generally outperform those without these terms (MRE-based methods) across the board. The L2,1-norm regularization remains more suitable for unsupervised methods, while the F-norm regularizer remains better suited for supervised methods.

According to Tables 8–10 and Fig. 5, the following conclusions can be made:

With increasing levels of noise, the accuracy and stability of all methods decline, but our methods still show significant advantages. The L2,1-norm regularization methods (RMLPRE-based methods) exhibit significantly lower accuracy compared to the F-norm ones, suggesting that the L2,1-norm regularizer is less robust on high-noise data. Interestingly, methods incorporating the shared structural information term (MLPRE-based methods) start to perform worse than those without it (MRE-based methods). This is due to the severe inaccuracy of the shared k-nearest neighbor graph in a high-noise environment, which negatively affects the performance of the models.

Experimental results on both real and noisy datasets demonstrate that our framework effectively captures the consistency and complementarity of multi-view data, with robustness further enhanced by incorporating L2,1-norm constraints. Moreover, by employing a nonlinear shared embedding, the framework significantly alleviates the critical information loss commonly associated with linear projection methods. This approach enables better retention of discriminative information, ensuring stable performance even in noisy conditions and underscoring the advantages of our framework over traditional methods.

Ablation study

The effects of each component of the model framework on the feature extraction results are evaluated through ablation studies.

Table 11 presents the peak accuracy of the k-nearest neighbors algorithm following dimensionality reduction through different combinations of components within the model framework on datasets with varying noise levels. The results reveal that the norm term of the projection matrix exerts the most pronounced influence on the discriminative capability of the feature extraction outcomes, with the F-norm term showing a slight advantage over the L2,1-norm term. In the context of low-noise datasets, the shared k-nearest neighbor structure term has relatively little influence on the discriminative power of feature extraction. However, it significantly improves the accuracy of subsequent classification algorithms in noisy conditions.

Table 11:
Experimental results of ablation study.
Component terms: GE (PCA); $\|P^{(i)T}X^{(i)}-Y\|_F^2$; $\|P^{(i)T}\|_F^2$; $\|P^{(i)T}\|_{2,1}$; $\sum_{j,k} W^{LP(i)}_{jk}\|Y_j-Y_k\|_F^2$.
Accuracy (%) for the evaluated combinations of these terms:
Coil: 24.19, 82.58, 79.17, 24.76, 82.52, 80.94
Coil (0.1 salt-pepper noise): 5.38, 67.62, 62.72, 5.42, 70.71, 70.68
Coil (0.3 salt-pepper noise): 5.77, 41.74, 21.18, 6.96, 45.28, 40.98
DOI: 10.7717/peerj-cs.2619/table-11

By minimizing the norm of the projection matrix, we strive to obtain a “simplified” projection matrix, which preserves the structural integrity of the original data after dimensionality reduction, thus preventing structural imbalances caused by overfitting of the training dataset and improving the discriminant ability of feature extraction. Regarding the simplification of the projection matrix, the F-norm term is more inclined to select an abundance of features near zero, whereas the L2,1-norm term results in a sparser set of feature vectors, exhibiting lower sensitivity to outliers. The superiority of the F-norm term over the L2,1-norm term in this research is attributed to the linear projection fitting term’s inherent outlier rejection, which reduces the effectiveness of the L2,1-norm term. As a result, in the ablation studies, the L2,1-norm term failed to outperform the F-norm term in influencing feature extraction discriminability. The global shared k-nearest neighbor structure term is beneficial for preserving the spatial structure of the data. In low-noise datasets, where data is rich in information, the linear projection fitting term and the projection matrix norm term are sufficient to maintain the intrinsic structural relationships, rendering the global shared k-nearest neighbor structure term less impactful. Conversely, in high-noise scenarios where the original data information is heavily distorted, the global shared k-nearest neighbor structure term significantly augments the discriminative ability of feature extraction by complementing the intrinsic structural relationships.

Influence of parameters

To explore the impact of the parameters γ and λ on model performance, we conducted experiments by extracting the features to 50 dimensions and then classifying them. The curves depicting the average classification accuracy as a function of these parameters are shown in Fig. 6. Based on the figures, we draw the following conclusions:


Figure 6: The influence of the parameters γ and λ.

(A) pcMRE. (B) pcMLPRE. (C) pcRMLPRE. (D) daMRE. (E) daMLPRE. (F) daRMLPRE.

In the MRE framework, pcMRE appears to be relatively insensitive to the parameter γ, while daMRE is more sensitive to this parameter. The highest classification accuracy is typically achieved when γ=2. In the MLPRE framework, pcMLPRE is generally insensitive to the parameter γ. When γ is less than or equal to 2, pcMLPRE is insensitive to the parameter λ. However, when γ is greater than 2, pcMLPRE becomes sensitive to the parameter λ, and the average classification accuracy increases with an increase in this parameter. Conversely, daMLPRE is sensitive to the parameter γ, and the average classification accuracy decreases as this parameter increases. daMLPRE is also sensitive to the parameter λ, with the highest classification accuracy typically occurring when λ=2. In the RMLPRE framework, pcRMLPRE is sensitive to both parameters γ and λ. The classification accuracy tends to be higher when λ=0.5. Similarly, daRMLPRE is sensitive to both parameters γ and λ, with the highest classification accuracy typically occurring when λ=8.

In summary, the choice of parameters γ and λ significantly influences the classification accuracy of various models and frameworks. Fine-tuning these parameters allows us to identify the optimal configurations for achieving the best performance under different conditions. Therefore, careful parameter selection is a crucial consideration during the experimental process, as it can substantially enhance a model’s classification accuracy, better aligning it with the requirements of the downstream tasks.

Convergence analysis

We uniformly extract the features to 50 dimensions using our methods. Figure 7 shows the convergence curves of the proposed methods. As depicted in the figure, the objective function value for the MRE and MLPRE methods remains nearly constant after the third iteration. Similarly, for the RMLPRE method, it stabilizes after seven iterations. This demonstrates that our proposed methods converge rapidly, with the MRE and MLPRE methods converging faster than the RMLPRE method.


Figure 7: Convergence curves.

(A) pcMRE. (B) pcMLPRE. (C) pcRMLPRE. (D) daMRE. (E) daMLPRE. (F) daRMLPRE.

Conclusions

This article introduces three novel multi-view feature extraction frameworks based on regression embedding, extending single-view graph embedding models to the multi-view scenario. Our frameworks meticulously consider the consistency and complementarity of multi-view data, emphasizing robustness to noisy datasets. Furthermore, the utilization of non-linear shared embedding helps prevent the loss of essential information that can occur with linear projection techniques. Through numerical experiments, we verify the effectiveness and robustness of our frameworks on both real and noisy datasets. In particular, we analyze the applicability, advantages, and disadvantages of each framework, providing a solid theoretical and experimental basis for choosing the appropriate framework for specific tasks.

However, it is important to note that our frameworks only extend the single-view graph embedding methods, and they may not be applicable to some other excellent single-view methods. In our future work, we aim to develop a more general framework that acts as a bridge between single-view feature extraction and multi-view feature extraction.

Supplemental Information

This file includes raw data and code.

DOI: 10.7717/peerj-cs.2619/supp-1