Multi-view data visualisation via manifold learning
 Academic Editor
 Shibiao Wan
 Subject Areas
 Computational Biology, Algorithms and Analysis of Algorithms, Data Mining and Machine Learning, Data Science, Visual Analytics
 Keywords
 Multimodal data, Multi-view data, Data visualisation, Data clustering, Manifold learning
 Copyright
 © 2024 Rodosthenous et al.
 Licence
 This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
 Cite this article
 2024. Multi-view data visualisation via manifold learning. PeerJ Computer Science 10:e1993 https://doi.org/10.7717/peerjcs.1993
Abstract
Non-linear dimensionality reduction can be performed by manifold learning approaches, such as stochastic neighbour embedding (SNE), locally linear embedding (LLE) and isometric feature mapping (ISOMAP). These methods aim to produce two or three latent embeddings, primarily to visualise the data in intelligible representations. This manuscript proposes extensions of Student's t-distributed SNE (t-SNE), LLE and ISOMAP, for dimensionality reduction and visualisation of multi-view data. Multi-view data refer to multiple types of data generated from the same samples.
The proposed multi-view approaches provide more comprehensible projections of the samples compared to the ones obtained by visualising each data-view separately. Commonly, visualisation is used for identifying underlying patterns within the samples. By incorporating the obtained low-dimensional embeddings from the multi-view manifold approaches into the K-means clustering algorithm, it is shown that clusters of the samples are accurately identified. Through extensive comparisons of novel and existing multi-view manifold learning algorithms on real and synthetic data, the proposed multi-view extension of t-SNE, named multi-SNE, is found to have the best performance, quantified both qualitatively and quantitatively by assessing the clusterings obtained.
The applicability of multi-SNE is illustrated by its implementation in the newly developed and challenging multi-omics single-cell data. The aim is to visualise and identify cell heterogeneity and cell types in biological tissues relevant to health and disease. In this application, multi-SNE provides an improved performance over single-view manifold learning approaches and a promising solution for unified clustering of multi-omics single-cell data.
Introduction
Data visualisation is an important and useful component of exploratory data analysis, as it can reveal interesting patterns in the data and potential clusters of the observations. A common approach for visualising high-dimensional data (data with a higher number of features ( $p$) than samples ( $n$), i.e. $p\gg n$) is by reducing its dimensions. Linear dimensionality reduction methods, including principal component analysis (PCA) (Jolliffe & Cadima, 2016) and non-negative matrix factorization (NMF) (Garcia et al., 2018), assume linearity within data sets and as a result these methods often fail to produce reliable representations when linearity does not hold. Manifold learning, an active research area within machine learning, in contrast to the linear dimensionality reduction approaches does not rely on any linearity assumptions. By assuming that the dimensions of the data sets are artificially high, manifold learning methods aim to capture important information with minimal noise, in an induced low-dimensional embedding (Zheng & Xue, 2009). The generated low-dimensional embeddings can be used for data visualisation in the 2D or 3D spaces.
Manifold learning approaches used for dimensionality reduction and visualisation focus on preserving at least one of the characteristics of the data. For example, the stochastic neighbour embedding (SNE) preserves the probability distribution of the data (Hinton & Roweis, 2003). The locally linear embedding (LLE) proposed by Roweis & Saul (2000) is a neighbourhood-preserving method. The isometric feature mapping (ISOMAP) proposed by Tenenbaum, Silva & Langford (2000) is a quasi-isometric method based on multidimensional scaling (Kruskal, 1964). Spectral Embedding finds low-dimensional embeddings via spectral decomposition of the Laplacian matrix (Ng, Jordan & Weiss, 2001). The Local Tangent Space Alignment method proposed by Zhang & Zha (2004) learns the embedding by optimising local tangent spaces, which represent the local geometry of each neighbourhood, and uniform manifold approximation and projection (UMAP) preserves the global structure of the data by constructing a theoretical framework based on Riemannian geometry and algebraic topology (McInnes et al., 2018).
This manuscript focuses on data visualisation of multi-view data, which are regarded as different types of data sets that are generated on the same samples of a study. It is very common nowadays in many different fields to generate multiple data-views on the same samples. For example, multi-view imaging data describe distinct visual features such as local binary patterns (LBP) and histogram of oriented gradients (HOG) (Shen, Tao & Ma, 2013), while multi-omics data, e.g. proteomics, genomics, etc., in biomedical studies quantify different aspects of an organism's biological processes (Hasin, Seldin & Lusis, 2017). Through the collection of multi-view data, researchers are interested in better understanding the collected samples, including their visualisation, clustering and classification. Analysing the multi-view data simultaneously is not a straightforward task, as each data-view has its own distribution and variation pattern (Rodosthenous, Shahrezaei & Evangelou, 2020).
Several approaches have been proposed for the analysis of multi-view data. These include methods on clustering (Kumar, Rai & Hal, 2011; Liu et al., 2013; Sun et al., 2015; Ou et al., 2016; Ye et al., 2018; Ou et al., 2018; Wang & Allen, 2021), classification (Shu, Zhang & Tang, 2019), regression (Li, Liu & Chen, 2019) and dimensionality reduction (Rodosthenous, Shahrezaei & Evangelou, 2020; Sun, 2013; Zhao et al., 2018; Xu, Tao & Xu, 2015). Multi-view approaches have been reviewed and discussed in the overview articles of Xu, Tao & Xu (2013) and Zhao et al. (2017). Another field that has dealt with multi-view data is representation learning. In the multi-view representation learning survey of Li, Liu & Chen (2019), two main strategies were identified: alignment and fusion. Fusion methods include both graphical models and neural network models.
In this manuscript, we focus on the visualisation task through manifold learning. By visualising multi-view data collectively, the aim is to obtain a global overview of the data and identify patterns that would potentially have been missed if each data-view were visualised separately. Typically, multiple visualisations are produced, one from each data-view, or the features of the data-views are concatenated to produce a single visualisation. The former could provide misleading outcomes, with each data-view revealing different visualisations and patterns. The different statistical properties, physical interpretations, noise and heterogeneity between data-views suggest that concatenating features would often fail to achieve a reliable interpretation and visualisation of the data (Fu et al., 2008).
A number of multi-view visualisation approaches have been proposed in the literature, with some of these approaches based on the manifold approaches t-SNE and LLE. For example, Xie et al. (2011) proposed m-SNE, which combines the probability distributions produced by each data-view into a single distribution via a weight parameter. The algorithm then implements t-SNE on the combined distribution to obtain a single low-dimensional embedding. The proposed solution finds the optimal choice for both the low-dimensional embeddings and the weight parameter simultaneously. Similarly, Kanaan-Izquierdo (2017) proposed two alternative solutions based on t-SNE, named MV-tSNE1 and MV-tSNE2. MV-tSNE2 is similar to m-SNE, combining the probability distributions through expert opinion pooling.
Portions of this text were previously published as part of a preprint (https://doi.org/10.48550/arXiv.2101.06763). In parallel to our work, Canzar & Hoan Do (2021) proposed a multi-view extension of t-SNE, named j-SNE. Both multi-SNE and j-SNE first appeared as preprints in January 2021 (https://doi.org/10.1101/2021.01.10.426098). j-SNE produces low-dimensional embeddings through an iterative procedure that assigns each data-view a weight value, which is updated per iteration through regularisation.
In addition, Shen, Tao & Ma (2013) proposed multi-view locally linear embedding (m-LLE), an extension of LLE for effectively retrieving medical images. m-LLE produces a single low-dimensional embedding by integrating the embeddings from each data-view according to a weight parameter $c$, which refers to the contribution of each data-view. Similarly to m-SNE, the algorithm optimises both the weight parameter and the embeddings simultaneously. Zong et al. (2017) proposed MV-LLE, which minimises the cost function by assuming a consensus matrix across all data-views.
Building on the existing literature, we propose here alternative extensions of the manifold approaches t-SNE, LLE and ISOMAP for visualising multi-view data. The cost functions of our proposals differ from the existing ones, as they integrate the available information from the multi-view data iteratively. At each iteration, the proposed multi-SNE updates the low-dimensional embeddings by minimising the dissimilarity between their probability distribution and the distribution of each data-view; the total cost of this approach equals the weighted sum of those dissimilarities. Our proposed variation of LLE, multi-LLE, constructs the low-dimensional embeddings by utilising a consensus weight matrix, which is taken as the weighted sum of the weight matrices computed from each data-view. Lastly, the low-dimensional embeddings in the proposed multi-ISOMAP are constructed by using a consensus graph, for which the nodes represent the samples and the edge lengths are taken as the averaged distance between the samples in each data-view. m-ISOMAP is proposed as an alternative ISOMAP-based multi-view manifold learning algorithm. Similar to m-SNE and m-LLE, m-ISOMAP provides a weighted integration of the low-dimensional embeddings produced by implementing ISOMAP on each data-view separately.
As the field of multi-view data analysis is relatively new, the literature lacks comparative studies between multi-view manifold learning algorithms. This manuscript makes a novel contribution to the field by conducting extensive comparisons between the multi-view non-linear dimensionality reduction approaches proposed in this manuscript (multi-SNE, multi-LLE, multi-ISOMAP and m-ISOMAP) and other approaches proposed in the literature. These comparisons are conducted on both real and synthetic data that have been designed to capture different data characteristics. The aim of these comparisons is to identify the best-performing algorithms, discuss pitfalls of the approaches and guide users to the most appropriate solution for their data.
We illustrate that our proposals result in more robust solutions compared to the approaches proposed in the literature, including m-SNE, m-LLE and MV-tSNE. We further illustrate, through the visualisation of the low-dimensional embeddings produced by the proposed multi-view manifold learning algorithms, that if clusters exist within the samples, they can be successfully identified. We show that this can be achieved by applying the K-means algorithm to the low-dimensional embeddings of the data. K-means (MacQueen, 1967) was chosen to cluster the data points, as it is one of the most prominent partition clustering algorithms (Xu & Tian, 2015). A better clustering performance by K-means suggests a visually clearer separation of clusters. Through the conducted experiments, we show that the proposed multi-SNE approach recovers well-separated clusters of the data and has comparable performance to multi-view clustering algorithms that exist in the literature.
Materials and Methods
In this section, the proposed approaches for multi-view manifold learning are described. The section starts with an introduction of the notation used throughout this manuscript. The proposed multi-SNE, multi-LLE and multi-ISOMAP are described in Sections “Multi-SNE”, “Multi-LLE” and “Multi-ISOMAP”, respectively. The section ends with a description of the process for tuning the parameters of the algorithms.
Notation
Throughout this article, the following notation is used:
N: The number of samples.
$p$: The number of variables of the design matrix.
$X\in {\mathbb{R}}^{N\times p}$: A single-view data matrix, representing the original high-dimensional data used as input; ${\mathbf{x}}_{i}\in {\mathbb{R}}^{p}$ is the ${i}^{th}$ data point of X.
M: The number of data-views in a given data set; $m\in \{1,\cdots ,M\}$ represents an arbitrary data-view.
${X}^{(m)}\in {\mathbb{R}}^{N\times {p}_{m}}$: The ${m}^{th}$ data-view of multi-view data; ${\mathbf{x}}_{i}^{m}\in {\mathbb{R}}^{{p}_{m}}$ is the ${i}^{th}$ data point of ${X}^{(m)}$.
$Y\in {\mathbb{R}}^{N\times d}$: A low-dimensional embedding of the original data; ${\mathbf{y}}_{i}\in {\mathbb{R}}^{d}$ represents the ${i}^{th}$ data point of Y. In this manuscript, $d=2$, as the focus of the manuscript is on data visualisation.
Multi-SNE
SNE, proposed by Hinton & Roweis (2003), measures the probability distribution, P, of each data point ${\mathbf{x}}_{i}$ by looking at the similarities among its neighbours. For every sample $i$ in the data, $j$ is taken as its potential neighbour with probability ${p}_{ij}$, given by
(1) $${p}_{ij}=\frac{\mathrm{exp}(-{d}_{ij}^{2})}{{\sum}_{k\ne i}\mathrm{exp}(-{d}_{ik}^{2})},$$ where ${d}_{ij}^{2}=\frac{\|{\mathbf{x}}_{i}-{\mathbf{x}}_{j}\|^{2}}{2{\sigma}_{i}^{2}}$ represents the dissimilarity between points ${\mathbf{x}}_{i}$ and ${\mathbf{x}}_{j}$. The value of ${\sigma}_{i}$ is either set by hand or found by binary search (van der Maaten & Hinton, 2008). Based on this value, a probability distribution ${P}_{i}$ over all other data points, with fixed perplexity, is produced for sample $i$. Perplexity refers to the effective number of local neighbours and is defined as $Perp({P}_{i})={2}^{H({P}_{i})}$, where $H({P}_{i})=-{\sum}_{j}{p}_{ij}{\mathrm{log}}_{2}{p}_{ij}$ is the Shannon entropy of ${P}_{i}$. It increases monotonically with the variance ${\sigma}_{i}$ and typically takes values between $5$ and $50$.
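The binary search for ${\sigma}_{i}$ at a fixed perplexity can be sketched as below. This is a minimal NumPy sketch, not the published implementation; `conditional_probs` is a hypothetical helper name, and the routine simply matches the Shannon entropy of each row of P to the target perplexity.

```python
import numpy as np

def conditional_probs(X, perplexity=30.0, tol=1e-5, max_iter=200):
    """SNE conditional probabilities p_{j|i} of Eq. (1), with a binary
    search for each sigma_i so that Perp(P_i) = 2^H(P_i) hits the target."""
    n = X.shape[0]
    # squared Euclidean distances ||x_i - x_j||^2
    sq = np.sum(X**2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    P = np.zeros((n, n))
    target = np.log2(perplexity)
    for i in range(n):
        lo, hi = 1e-20, 1e20
        beta = 1.0  # beta = 1 / (2 sigma_i^2)
        for _ in range(max_iter):
            p = np.exp(-D[i] * beta)
            p[i] = 0.0                       # a point is not its own neighbour
            p /= p.sum()
            # Shannon entropy H(P_i) = -sum_j p log2 p
            H = -np.sum(p[p > 0] * np.log2(p[p > 0]))
            if abs(H - target) < tol:
                break
            if H > target:                   # distribution too flat: raise beta
                lo = beta
                beta = beta * 2.0 if hi == 1e20 else (beta + hi) / 2.0
            else:                            # too peaked: lower beta
                hi = beta
                beta = beta / 2.0 if lo == 1e-20 else (beta + lo) / 2.0
        P[i] = p
    return P
```

Each row of the returned matrix sums to one and has (approximately) the requested perplexity.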
In the same way, a probability distribution in the low-dimensional space, Y, is computed as follows:
(2) $${q}_{ij}=\frac{\mathrm{exp}(-\|{\mathbf{y}}_{i}-{\mathbf{y}}_{j}\|^{2})}{{\sum}_{k\ne i}\mathrm{exp}(-\|{\mathbf{y}}_{i}-{\mathbf{y}}_{k}\|^{2})},$$ which represents the probability of point $i$ selecting point $j$ as its neighbour.
The induced embedding output, ${\mathbf{y}}_{i}$, represented by the probability distribution Q, is obtained by minimising the Kullback-Leibler divergence (KL-divergence) $KL(P\|Q)$ between the two distributions P and Q (Kullback & Leibler, 1951). The aim is to minimise the cost function:
(3) $${C}_{SNE}={\sum}_{i}KL({P}_{i}\|{Q}_{i})={\sum}_{i}{\sum}_{j}{p}_{ij}\mathrm{log}\frac{{p}_{ij}}{{q}_{ij}}$$
Hinton & Roweis (2003) assumed a Gaussian distribution in computing the similarity between two points in both high- and low-dimensional spaces. van der Maaten & Hinton (2008) proposed a variant of SNE, called t-SNE, which uses a symmetric version of SNE and a Student t-distribution to compute the similarity between two points in the low-dimensional space Q, given by
(4) $${q}_{ij}=\frac{{(1+\|{\mathbf{y}}_{i}-{\mathbf{y}}_{j}\|^{2})}^{-1}}{{\sum}_{k\ne l}{(1+\|{\mathbf{y}}_{k}-{\mathbf{y}}_{l}\|^{2})}^{-1}}$$
t-SNE is often preferred, because it reduces the effect of the crowding problem (limited area to accommodate all data points and differentiate clusters) and it is easier to optimise, as it provides simpler gradients than SNE (van der Maaten & Hinton, 2008).
We propose multi-SNE, a multi-view manifold learning algorithm based on t-SNE. Our proposal computes the KL-divergence between the distribution of a single low-dimensional embedding and the distribution of each data-view separately, and minimises their weighted sum. An iterative algorithm is proposed, in which at each iteration the induced embedding is updated by minimising the cost function:
(5) $${C}_{multiSNE}={\sum}_{m}{\sum}_{i}{\sum}_{j}{w}^{m}{p}_{ij}^{m}\mathrm{log}\frac{{p}_{ij}^{m}}{{q}_{ij}},$$ where ${w}^{m}$ is the combination coefficient of the ${m}^{th}$ data-view. The vector $\mathbf{w}=({w}^{1},\cdots ,{w}^{M})$ acts as a weight vector that satisfies ${\sum}_{m}{w}^{m}=1$. In this study, equal weights on all data-views were considered, i.e. ${w}^{m}=\frac{1}{M},\phantom{\rule{1em}{0ex}}\mathrm{\forall}m=1,\cdots ,M$. The algorithm of the proposed multi-SNE approach is presented in Algorithm 1 of Section 1 in the Supplemental File.
An alternative multi-view extension of t-SNE, called m-SNE, was proposed by Xie et al. (2011). m-SNE applies t-SNE to a single distribution in the high-dimensional space, which is computed by combining the probability distributions of the data-views, given by ${p}_{ij}={\sum}_{m=1}^{M}{\beta}^{m}{p}_{ij}^{m}$. The coefficients (or weights) ${\beta}^{m}$ share the same role as ${w}^{m}$ in multi-SNE and, similarly, $\boldsymbol{\beta}=({\beta}^{1},\cdots ,{\beta}^{M})$ satisfies ${\sum}_{m}{\beta}^{m}=1$. This leads to a different cost function from the one in Eq. (5).
Kanaan-Izquierdo (2017) proposed a similar cost function for multi-view t-SNE, named MV-tSNE1, given as follows:
(6) $${C}_{MVtSNE1}=\sum _{m}\sum _{i}\sum _{j}{p}_{ij}^{m}\mathrm{log}\frac{{p}_{ij}^{m}}{{q}_{ij}}$$
Their proposal is a special case of multi-SNE, with ${w}^{m}=\frac{1}{M}$. Kanaan-Izquierdo (2017) did not pursue MV-tSNE1 any further; instead, they proceeded with an alternative solution, MV-tSNE2, which combines the probability distributions (similarly to m-SNE) through expert opinion pooling. A comparison between multi-SNE, m-SNE and MV-tSNE2 is presented in Fig. S2 of Section 4.1 in the Supplemental File. Based on two real data sets, multi-SNE and m-SNE outperformed MV-tSNE2, with the solution by multi-SNE producing the best separation among the clusters in both examples.
Multi-SNE avoids combining the probability distributions of all data-views together. Instead, the induced embeddings are updated by minimising the KL-divergence between every data-view's probability distribution and that of the low-dimensional representation we seek to obtain. In other words, this is achieved by computing the gradient of the cost with respect to each data-view and summing these gradients; the induced embedding is then updated by following the summed gradient.
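One update of this kind can be sketched as below. This is a sketch, not the authors' implementation: `multisne_gradient` is a hypothetical name, and the block applies the standard t-SNE gradient, $4{\sum}_{j}({p}_{ij}-{q}_{ij})({\mathbf{y}}_{i}-{\mathbf{y}}_{j})(1+\|{\mathbf{y}}_{i}-{\mathbf{y}}_{j}\|^{2})^{-1}$, to each data-view's probability matrix and returns the weighted sum, as in Eq. (5).

```python
import numpy as np

def multisne_gradient(P_views, Y, weights=None):
    """Weighted sum over data-views of the t-SNE gradient of the
    KL-divergence between P^(m) and the shared low-dimensional Q."""
    M = len(P_views)
    w = np.full(M, 1.0 / M) if weights is None else np.asarray(weights)
    # Student-t joint probabilities Q from the current embedding Y
    sq = np.sum(Y**2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * Y @ Y.T
    num = 1.0 / (1.0 + D)            # (1 + ||y_i - y_j||^2)^{-1}
    np.fill_diagonal(num, 0.0)
    Q = num / num.sum()
    grad = np.zeros_like(Y)
    for wm, P in zip(w, P_views):
        PQ = (P - Q) * num           # (p_ij - q_ij)(1 + ||y_i - y_j||^2)^{-1}
        # sum_j PQ_ij (y_i - y_j) written as a matrix product
        grad += wm * 4.0 * ((np.diag(PQ.sum(axis=1)) - PQ) @ Y)
    return grad
```

At an optimum where every $P^{(m)}$ equals $Q$, the returned gradient is exactly zero.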
Throughout this article, for all variations of t-SNE we have applied the PCA pre-training step proposed by van der Maaten & Hinton (2008), who discussed that reducing the dimensions of the input data through PCA reduces the computational time of t-SNE. In this article, the principal components retained captured at least $80\mathrm{\%}$ of the total variation (variance explained) in the original data. In addition, as multi-SNE is an iterative algorithm, we opted to run it for 1,000 iterations in all analyses conducted. Alternatively, a stopping rule could have been implemented to terminate the algorithm once no significant changes to the cost function were observed. Both options are available in the implementation of the multi-SNE algorithm.
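The PCA pre-training step can be sketched via the SVD; `pca_reduce` and its `var_kept` argument are illustrative names, with the 80% threshold taken from the text.

```python
import numpy as np

def pca_reduce(X, var_kept=0.80):
    """Keep the fewest principal components whose cumulative explained
    variance reaches `var_kept` (80% in this article)."""
    Xc = X - X.mean(axis=0)                       # centre the data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = (s**2) / np.sum(s**2)                 # explained-variance ratios
    k = int(np.searchsorted(np.cumsum(ratio), var_kept) + 1)
    return Xc @ Vt[:k].T                          # scores on the top k PCs
```

The reduced matrix is then passed to t-SNE (or its multi-view extensions) in place of the original data-view.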
Multi-LLE
LLE attempts to discover the non-linear structure of high-dimensional data, X, by computing low-dimensional and neighbourhood-preserving embeddings, Y (Saul & Roweis, 2001). The three main steps of the algorithm are:

1.
The set, denoted by ${\mathrm{\Gamma}}_{i}$, contains the K nearest neighbours of each data point ${\mathbf{x}}_{i},i=1,\cdots ,N$. The most common distance measure between the data points is the Euclidean distance. Other local metrics can also be used in identifying the nearest neighbours (Roweis & Saul, 2000).

2.
A weight matrix, W, is computed, which acts as a bridge between the high-dimensional space in X and the low-dimensional space in Y. Initially, W reconstructs X by minimising the cost function:
(7) $${\mathcal{E}}_{X}=\sum _{i}\|{\mathbf{x}}_{i}-\sum _{j}{W}_{ij}{\mathbf{x}}_{j}\|^{2},$$ where the weights ${W}_{ij}$ describe the contribution of the ${j}^{th}$ data point to the ${i}^{th}$ reconstruction. The optimal weights ${W}_{ij}$ are found by solving the least squares problem given in Eq. (7) subject to the constraints:
(a) ${W}_{ij}=0$, if $j\notin {\mathrm{\Gamma}}_{i}$, and
(b) ${\sum}_{j}{W}_{ij}=1$

3.
Once W is computed, the low-dimensional embedding ${\mathbf{y}}_{i}$ of each data point $i=1,\cdots ,N$ is obtained by minimising:
(8) $${\mathcal{E}}_{Y}=\sum _{i}\|{\mathbf{y}}_{i}-\sum _{j}{W}_{ij}{\mathbf{y}}_{j}\|^{2}$$
The solution to Eq. (8) is obtained by taking the bottom $d$ non-zero eigenvectors of the sparse $N\times N$ matrix $M=(I-W{)}^{T}(I-W)$ (Roweis & Saul, 2000).
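Steps 1 and 2 can be sketched as follows; `lle_weights` is a hypothetical helper that solves the constrained least squares problem of Eq. (7) neighbourhood by neighbourhood, and the trace-scaled regularisation term is a common numerical safeguard (an assumption of this sketch) for the case $K>p$.

```python
import numpy as np

def lle_weights(X, K=5, reg=1e-3):
    """Reconstruction weights W of Eq. (7): W_ij = 0 outside the K
    nearest neighbours, and each row sums to one."""
    n = X.shape[0]
    W = np.zeros((n, n))
    sq = np.sum(X**2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared distances
    for i in range(n):
        nbrs = np.argsort(D[i])[1:K + 1]            # K nearest neighbours of x_i
        Z = X[nbrs] - X[i]                          # shift neighbourhood to origin
        G = Z @ Z.T                                 # local Gram matrix
        G += reg * np.trace(G) * np.eye(K)          # regularise (needed if K > p)
        w = np.linalg.solve(G, np.ones(K))
        W[i, nbrs] = w / w.sum()                    # enforce sum-to-one constraint
    return W
```

The rows of the returned matrix satisfy constraints (a) and (b) above by construction.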
We propose multi-LLE, a multi-view extension of LLE, that computes the low-dimensional embeddings by using the consensus weight matrix:
(9) $$\hat{W}=\sum _{m}{\alpha}^{m}{W}^{m},$$ where ${\sum}_{m}{\alpha}^{m}=1$, and ${W}^{m}$ is the weight matrix for each data-view $m=1,\cdots ,M$. Thus, $\hat{Y}$ is obtained by solving:
$${\mathcal{E}}_{\hat{Y}}=\sum _{i}\|{\hat{\mathbf{y}}}_{i}-\sum _{j}{\hat{W}}_{ij}{\hat{\mathbf{y}}}_{j}\|^{2}$$
The multi-LLE algorithm is presented in Algorithm 2 of Section 1 in the Supplemental File.
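A minimal sketch of the consensus step follows; `multi_lle_embed` is a hypothetical name, and it assumes the per-view weight matrices have already been computed by standard LLE (each row-stochastic, as required by constraint (b)).

```python
import numpy as np

def multi_lle_embed(W_views, d=2, alphas=None):
    """multi-LLE: average the per-view weight matrices into the
    consensus W-hat of Eq. (9), then take the bottom d non-zero
    eigenvectors of M = (I - W)^T (I - W)."""
    n_views = len(W_views)
    a = np.full(n_views, 1.0 / n_views) if alphas is None else np.asarray(alphas)
    W_hat = sum(am * Wm for am, Wm in zip(a, W_views))
    n = W_hat.shape[0]
    I = np.eye(n)
    M_mat = (I - W_hat).T @ (I - W_hat)
    vals, vecs = np.linalg.eigh(M_mat)      # eigenvalues in ascending order
    # discard the bottom (constant) eigenvector, keep the next d
    return vecs[:, 1:d + 1]
```

Because each row of $\hat{W}$ sums to one, the constant vector is the zero-eigenvalue eigenvector and is discarded.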
Shen, Tao & Ma (2013) proposed m-LLE, an alternative multi-view extension of LLE. LLE is applied to each data-view separately and the resulting embeddings are combined, with their weighted average taken as the unified low-dimensional embedding. In other words, the weight matrices ${W}^{m}$ are computed and ${\mathcal{E}}_{{Y}^{m}}={\sum}_{i}\|{\mathbf{y}}_{i}^{m}-{\sum}_{j}{W}_{ij}^{m}{\mathbf{y}}_{j}^{m}\|^{2}$ is solved for each $m=1,\cdots ,M$ separately. The unified low-dimensional embedding $\hat{Y}$ is then computed as $\hat{Y}={\sum}_{m}{\beta}^{m}{Y}^{m}$, where ${\sum}_{m}{\beta}^{m}=1$.
An alternative multi-view LLE solution was proposed by Zong et al. (2017) to find a consensus manifold, which is then used for multi-view clustering via non-negative matrix factorization; we refer to this approach as MV-LLE. This solution minimises the cost function by assuming a consensus weight matrix across all data-views, as given in Eq. (9). The optimisation is solved by using the Entropic Mirror Descent Algorithm (EMDA) (Beck & Teboulle, 2003). In contrast to m-LLE and MV-LLE, multi-LLE combines the weight matrices obtained from each data-view, instead of the LLE embeddings. No comparisons were conducted between MV-LLE and the proposed multi-LLE, as the code of the MV-LLE algorithm is not publicly available.
Multi-ISOMAP
ISOMAP aims to discover a low-dimensional embedding of high-dimensional data by maintaining the geodesic distances between all points (Tenenbaum, Silva & Langford, 2000); it is often regarded as an extension of multidimensional scaling (MDS) (Kruskal, 1964). The ISOMAP algorithm comprises the following three steps:

Step 1.
A graph is defined. Let $G\sim (V,E)$ define a neighbourhood graph, with vertices V representing all data points. The edge length between any two vertices $i,j\in V$ is defined by the distance metric ${d}_{X}(i,j)$, measured by the Euclidean distance. If a vertex $j$ does not belong to the K nearest neighbours of $i$, then ${d}_{X}(i,j)=\mathrm{\infty}$. The parameter K is given as input, and it represents the connectedness of the graph G; as K increases, more vertices are connected.

Step 2.
The shortest paths between all pairs of points in G are computed. The shortest path between vertices $i,j\in V$ is defined by ${d}_{G}(i,j)$. Let ${D}_{G}\in {\mathbb{R}}^{V\times V}$ be a matrix containing the shortest paths between any vertices $i,j\in V$, defined by $({D}_{G}{)}_{ij}={d}_{G}(i,j)$.
The most efficient known algorithm to perform this task is Dijkstra's Algorithm (Dijkstra, 1959). In large graphs, an alternative approach to Dijkstra's Algorithm would be to initialise ${d}_{G}(i,j)={d}_{X}(i,j)$ and, for each intermediate vertex $k$, replace all entries by ${d}_{G}(i,j)=min\left\{{d}_{G}(i,j),\phantom{\rule{0.2em}{0ex}}{d}_{G}(i,k)+{d}_{G}(k,j)\right\}$ (the Floyd-Warshall recurrence).
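This relaxation scheme can be sketched in a few lines; `shortest_paths` is a hypothetical helper name, with unreachable pairs kept at infinity as in Step 1.

```python
import numpy as np

def shortest_paths(D):
    """All-pairs shortest paths on the neighbourhood graph:
    initialise d_G(i,j) = d_X(i,j), then relax through every
    intermediate vertex k (Floyd-Warshall)."""
    DG = D.astype(float).copy()
    n = DG.shape[0]
    for k in range(n):
        # d_G(i,j) = min(d_G(i,j), d_G(i,k) + d_G(k,j)) for all i, j
        DG = np.minimum(DG, DG[:, k:k + 1] + DG[k:k + 1, :])
    return DG
```

For example, on a three-node path graph the geodesic distance between the two end points becomes the sum of the two edge lengths.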

Step 3.
The low-dimensional embeddings are constructed. The ${i}^{th}$ component of the low-dimensional embedding is given by ${y}_{i}=\sqrt{{\lambda}_{p}}{u}_{p}^{i}$, where ${u}_{p}^{i}$ is the ${i}^{th}$ component of the ${p}^{th}$ eigenvector and ${\lambda}_{p}$ is the ${p}^{th}$ eigenvalue, in decreasing order, of the matrix $\tau ({D}_{G})$ (Tenenbaum, Silva & Langford, 2000). The operator $\tau $ is defined by $\tau (D)=-\frac{HSH}{2}$, where S is the matrix of squared distances defined by ${S}_{ij}={D}_{ij}^{2}$, and H is the centring matrix defined by ${H}_{ij}={\delta}_{ij}-\frac{1}{N}$. This is equivalent to applying classical MDS to ${D}_{G}$, leading to a low-dimensional embedding that best preserves the manifold's estimated intrinsic geometry.
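Step 3 can be sketched as below; `isomap_embed` is a hypothetical helper applying classical MDS to the geodesic distance matrix, with negative eigenvalues (which can arise numerically) clamped to zero as a safeguard assumed by this sketch.

```python
import numpy as np

def isomap_embed(DG, d=2):
    """Classical-MDS step of ISOMAP: apply tau(D) = -HSH/2 to the
    geodesic distances and scale the top d eigenvectors by the
    square roots of their eigenvalues."""
    n = DG.shape[0]
    S = DG**2                                     # S_ij = D_ij^2
    H = np.eye(n) - np.ones((n, n)) / n           # H_ij = delta_ij - 1/N
    T = -H @ S @ H / 2.0                          # tau(D_G)
    vals, vecs = np.linalg.eigh(T)
    idx = np.argsort(vals)[::-1][:d]              # eigenvalues, decreasing order
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))
```

On distances that are exactly Euclidean (e.g. points on a line), the embedding reproduces the input distances.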
Multi-ISOMAP is our proposal for adapting ISOMAP to multi-view data. Let ${G}_{m}\sim (V,{E}_{m})$ be a neighbourhood graph obtained from data-view ${X}^{(m)}$ as defined in the first step of ISOMAP. All neighbourhood graphs are then combined into a single graph, $\stackrel{~}{G}$; this combination is achieved by computing each edge length as the weighted average distance over the data-views, i.e. ${d}_{\stackrel{~}{G}}(i,j)={\sum}_{m}{w}_{m}{d}_{{G}_{m}}(i,j)$. Once the combined neighbourhood graph is computed, multi-ISOMAP follows steps 2 and 3 of ISOMAP described above. For simplicity, the weights throughout this article were set as ${w}_{m}=\frac{1}{M},\mathrm{\forall}m$. The multi-ISOMAP algorithm is presented in Algorithm 3 of Section 1 in the Supplemental File.
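The consensus-graph construction can be sketched as follows; `consensus_graph` is a hypothetical helper, and the choices to symmetrise each K-NN graph and to leave non-edges at infinity are assumptions of this sketch rather than details fixed by the text.

```python
import numpy as np

def consensus_graph(X_views, K=5, weights=None):
    """Step 1 of multi-ISOMAP: build a K-NN graph per data-view and
    average the edge lengths, d(i,j) = sum_m w_m d_Gm(i,j); pairs that
    are not neighbours stay at infinity."""
    M = len(X_views)
    w = np.full(M, 1.0 / M) if weights is None else np.asarray(weights)
    n = X_views[0].shape[0]
    DT = np.zeros((n, n))
    for wm, X in zip(w, X_views):
        sq = np.sum(X**2, axis=1)
        D = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))
        nbrs = np.argsort(D, axis=1)[:, 1:K + 1]          # K nearest neighbours
        edge = np.zeros((n, n), dtype=bool)
        edge[np.repeat(np.arange(n), K), nbrs.ravel()] = True
        edge |= edge.T                                    # symmetrise the graph
        DT += wm * np.where(edge, D, np.inf)              # inf marks non-edges
    np.fill_diagonal(DT, 0.0)
    return DT
```

The resulting matrix is then passed to the shortest-path and MDS steps exactly as in single-view ISOMAP.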
For completeness, we have additionally adapted ISOMAP for multi-view visualisation following the framework of both m-SNE and m-LLE. Following the same logic, m-ISOMAP combines the ISOMAP embeddings of each data-view by taking their weighted average as the unified low-dimensional embedding. In other words, the low-dimensional embedding $\hat{Y}$ is obtained by computing $\hat{Y}={\sum}_{m}{\beta}^{m}{Y}^{m}$, where ${\sum}_{m}{\beta}^{m}=1$.
Parameter tuning
The multi-view manifold learning algorithms were tested on real and synthetic data sets for which the samples can be separated into several clusters. The true clusters are known and were used to tune the parameters of the methods. To quantify the clustering performance, we used the following four extrinsic measures: (i) accuracy (ACC), (ii) normalised mutual information (NMI) (Vinh, Epps & Bailey, 2010), (iii) Rand index (RI) (Rand, 1971) and (iv) adjusted Rand index (ARI) (Hubert & Arabie, 1985). All measures take values in the range $\left[0,1\right]$, with $0$ expressing complete randomness and $1$ perfect separation between clusters (ARI can additionally take small negative values for clusterings worse than chance). The mathematical formulas of the four measures are presented in Section 2 of the Supplemental File.
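For illustration, RI and ARI can be computed directly from a pair of labellings; the function names are hypothetical, and the formulas follow Rand (1971) and Hubert & Arabie (1985).

```python
import numpy as np

def rand_index(labels_true, labels_pred):
    """Rand index: fraction of sample pairs on which the two
    clusterings agree (both same cluster, or both different)."""
    a, b = np.asarray(labels_true), np.asarray(labels_pred)
    same_t = a[:, None] == a[None, :]
    same_p = b[:, None] == b[None, :]
    iu = np.triu_indices(len(a), k=1)           # each unordered pair once
    return np.mean(same_t[iu] == same_p[iu])

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index: RI corrected for chance agreement."""
    a, b = np.asarray(labels_true), np.asarray(labels_pred)
    _, ca = np.unique(a, return_inverse=True)
    _, cb = np.unique(b, return_inverse=True)
    C = np.zeros((ca.max() + 1, cb.max() + 1))  # contingency table
    np.add.at(C, (ca, cb), 1)
    comb = lambda x: x * (x - 1) / 2.0          # "n choose 2", elementwise
    sum_ij = comb(C).sum()
    sum_a = comb(C.sum(axis=1)).sum()
    sum_b = comb(C.sum(axis=0)).sum()
    expected = sum_a * sum_b / comb(len(a))
    max_idx = (sum_a + sum_b) / 2.0
    return (sum_ij - expected) / (max_idx - expected)
```

Relabelling the clusters (e.g. swapping cluster names) leaves both measures unchanged.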
SNE, LLE and ISOMAP depend on parameters whose proper tuning leads to optimal results. LLE and ISOMAP depend on the number of nearest neighbours (NN). SNE depends on the perplexity ( $Perp$) parameter, which is directly related to the number of nearest neighbours. Similarly, the multi-view extensions of the three methods depend on the same parameters. The choice of the parameter can influence the visualisations and in some cases split the data into separate maps (van der Maaten & Hinton, 2012).
By assuming that the data samples belong to a number of clusters that we seek to identify, the performance of the algorithms was measured for a range of tuning parameter values from the set $\left\{2,10,20,50,80,100,200\right\}$. Note that for all algorithms, the parameter value cannot exceed the total number of samples in the data.
For all manifold learning approaches, the following procedure was implemented to tune the optimal parameters of each method per data set:

1.
The method was applied for all parameter values in the set $\left\{2,10,20,50,80,100,200\right\}$.

2.
The K-means algorithm was applied to the low-dimensional embeddings produced for each parameter value.

3.
The performance of the chosen method was evaluated quantitatively by computing ACC, NMI, RI and ARI for all tested parameter values.
The optimal parameter value was finally selected based on the evaluation measures. Section “Optimal Parameter Selection” explores how the different approaches are affected by their parameter values. For the other subsections of Section “Results”, the optimal parameter choice per approach was used for the comparison of the multi-view approaches. Section “Optimal Parameter Selection” presents the process of parameter tuning on the synthetic data analysed, and measures the performance of single-view and multi-view manifold learning algorithms. The same process was repeated for the real data analysed (Fig. S4 in Section 4.3 of the Supplemental File).
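The three-step tuning loop can be sketched generically; `tune_parameter` and its callable arguments are hypothetical placeholders for an embedding method, the K-means step and one of the evaluation measures.

```python
def tune_parameter(embed, cluster, score, param_grid):
    """Grid search over the tuning parameter: embed the data for each
    candidate value, cluster the embedding, score the clustering
    against the known labels, and return the best value."""
    scores = {k: score(cluster(embed(k))) for k in param_grid}
    best = max(scores, key=scores.get)     # highest evaluation measure wins
    return best, scores
```

For instance, with the grid used in the manuscript, `tune_parameter(embed, cluster, score, [2, 10, 20, 50, 80, 100, 200])` returns the parameter value whose embedding yields the best clustering score.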
Data
Data sets with different characteristics were analysed to explore and compare the proposed multi-view manifold learning algorithms under different scenarios (Table 1). The methods were evaluated on data sets that have different numbers of data-views, clusters and sample sizes. The generated synthetic data are considered homogeneous, as they were generated under the same conditions and distributions. Both high-dimensional ( $p\gg N$) and low-dimensional data sets were analysed. Through these comparisons, we wanted to investigate how the multi-view methods perform and how they compare with single-view methods.
Table 1: Data description.

Data set | Views (M) | Clusters ( $k$) | Features ( ${p}_{largest}$) | Samples (N) | Heterogeneous | High-dimensional
--- | --- | --- | --- | --- | --- | ---
Real: Cancer types | 3 | 3 | 22,503 | 253 | ✓ | ✓
Real: Caltech7 | 6 | 7 | 1,984 | 1,474 | ✓ | ✓
Real: Handwritten digits | 6 | 10 | 240 | 2,000 | ✓ | ✗
Synthetic: MMDS | 3 | 3 | 300 | 300 | ✗ | ✗
Synthetic: NDS | 4 | 3 | 400 | 300 | ✗ | ✓
Synthetic: MCS | 3 | 5 | 300 | 500 | ✗ | ✗
In this section, we describe the synthetic and real data sets analysed in the manuscript. Some of the real data sets analysed have previously been used in the literature for examining different multi-view algorithms, for example, data integration (Wang et al., 2014) and clustering (Ou et al., 2018).
Synthetic data
A motivational multi-view example was constructed to qualitatively evaluate the performance of multi-view manifold learning algorithms against their corresponding single-view algorithms. Its framework was designed specifically to produce distinct projections of the samples from each data-view. Additional synthetic data sets were generated to explore how the algorithms behave when the separation between the clusters exists, but is not as explicit as in the motivational example.
All synthetic data were generated using the following process. For the same set of samples, a specified number of data-views were generated, with each data-view capturing different information on the samples. Each data-view $m$ follows a multivariate normal distribution with mean vector $\boldsymbol{\mu}_{m}=({\mu}_{1},\cdots ,{\mu}_{{p}_{m}}{)}^{T}$ and covariance matrix ${\mathrm{\Sigma}}_{m}={I}_{{p}_{m}}$, where ${p}_{m}$ is the number of features in the ${m}^{th}$ data-view and ${I}_{{p}_{m}}$ is the ${p}_{m}\times {p}_{m}$ identity matrix. For each data-view, different $\boldsymbol{\mu}_{m}$ values were chosen to distinguish the clusters. Noise, $\epsilon $, following a multivariate normal distribution with mean ${\mu}_{\epsilon}$ and covariance matrix ${\mathrm{\Sigma}}_{\epsilon}={I}_{{p}_{m}}$, was added to increase randomness within a given data-view. The purpose of this additional variability is to assess whether the algorithms are able to capture information equally from all data-views and are not biased towards the data-view(s) with higher variability. Thus, noise $\epsilon $ was included only in selected data-views and omitted from the rest. Although this strategy is equivalent to sampling once using a larger variance, the extra noise explicitly distinguishes the highly variable data-views from the rest.
In other words, $X\sim MVN(\boldsymbol{\mu}_m,{\Sigma}_{m})+\epsilon$, where MVN denotes the multivariate normal distribution. Distinct polynomial functions (e.g. $h(x)={x}^{4}+3{x}^{2}+5$) were randomly generated for each cluster in all dataviews and applied to the relevant samples to express nonlinearity and to separate them into clusters (i.e. the same polynomial function was applied to all samples belonging to the same cluster). This last step ensures that linear dimensionality reduction methods (e.g. PCA) would not successfully cluster the data.
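The generation process above can be sketched in a few lines of numpy. This is a minimal illustration, not the exact code used in the manuscript: the dimensions, cluster means and polynomials below are arbitrary choices made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_dataview(labels, p, cluster_means, polys, add_noise=False):
    """Generate one dataview: MVN samples around a cluster-specific mean
    (identity covariance), a cluster-specific polynomial applied
    element-wise, plus optional extra Gaussian noise."""
    n = len(labels)
    X = np.empty((n, p))
    for c in np.unique(labels):
        idx = labels == c
        # identity covariance == independent unit-variance normals
        X[idx] = rng.normal(loc=cluster_means[c], scale=1.0, size=(idx.sum(), p))
        X[idx] = polys[c](X[idx])  # nonlinearity distinguishing the clusters
    if add_noise:                  # extra variability for selected views only
        X += rng.normal(loc=0.0, scale=1.0, size=(n, p))
    return X

labels = np.repeat([0, 1, 2], 100)  # n = 300 equally balanced samples
polys = {0: lambda x: x**2 + 1,
         1: lambda x: x**3 - x,
         2: lambda x: x**4 + 3 * x**2 + 5}
means = {0: 0.0, 1: 2.0, 2: 4.0}
view1 = make_dataview(labels, 100, means, polys)                   # clean view
view3 = make_dataview(labels, 100, means, polys, add_noise=True)   # noisy view
```

A dataview built this way is nonlinear within each cluster, so a linear method such as PCA cannot recover the cluster structure by design.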
The three synthetic data sets with their characteristics are described next.
Motivational multiview data scenario
Assume that the true underlying structure of the data separates the samples into three true clusters, as presented in Fig. 1. In this motivational multiview data scenario (MMDS), each synthetic dataview describes the samples differently, resulting in three distinct clusterings, none of which reflects the global underlying truth. In particular, the first view separates only cluster C from the others (View 1 in Fig. 1), the second view separates only cluster B (View 2) and the third view separates only cluster A (View 3). In this scenario, only the third dataview contained an extra noise parameter, $\epsilon$, giving it a higher variability than the other two dataviews.
Noisy dataview scenario
A synthetic data set consisting of four dataviews and three true underlying clusters was generated. The first three dataviews follow the same structure as MMDS, while the ${4}^{th}$ dataview is completely noisy, i.e. all its data points lie in a single cluster. The rationale for creating such a data set is to examine the effect of noisy dataviews on multiview visualisation and clustering. This data set was used to show that the multiview approaches can identify uninformative dataviews and discard them. For $n=300$ equally balanced data samples, each dataview contains ${p}_{m}=100$ features, $\forall m=1,2,3,4$. In summary, the noisy dataview scenario (NDS) adds a noisy dataview to the MMDS data set.
More clusters than dataviews scenario
This synthetic data set was generated similarly to MMDS, but with five true underlying clusters instead of three. The true underlying structure of each dataview is shown in Fig. 2. In this data set, ${p}_{m}=100$ features, $\forall m$, were generated on $n=500$ equally balanced data samples. Compared with MMDS, the more clusters than dataviews scenario (MCS) contains more clusters but the same number of dataviews. As in MMDS and NDS, only the third dataview contained an extra noise parameter, $\epsilon$, giving it a higher variability than the other two dataviews.
Real data
The three real data sets analysed in the study are described below.
Cancer types (http://compbio.cs.toronto.edu/SNF/SNF/Software.html)
This data set includes $65$ patients with breast cancer, $82$ with kidney cancer and $106$ with lung cancer. For each patient, three dataviews are available: (a) genomics ( ${p}_{1}=10,299$ genes), (b) epigenomics ( ${p}_{2}=22,503$ methylation sites) and (c) transcriptomics ( ${p}_{3}=302$ miRNA sequences). The aim is to cluster patients by their cancer type (Wang et al., 2014).
Caltech7 (https://Github.com/yeqinglee/mvdata)
Caltech101 contains pictures of objects belonging to 101 categories. This publicly available subset of Caltech101 contains seven classes. It consists of 1,474 objects on six dataviews: (a) Gabor ( ${p}_{1}=48$), (b) wavelet moments ( ${p}_{2}=40$), (c) CENTRIST ( ${p}_{3}=254$), (d) histogram of oriented gradients ( ${p}_{4}=1,984$), (e) GIST ( ${p}_{5}=512$), and (f) local binary patterns ( ${p}_{6}=928$) (Fei-Fei, Fergus & Perona, 2006).
Handwritten digits (https://archive.ics.uci.edu/ml/datasets/Multiple+Features)
This data set consists of features of handwritten numerals ( $0$–$9$) extracted from a collection of Dutch utility maps. Per class, $200$ patterns have been digitised from binary images (in total there are 2,000 patterns). These digits are represented in terms of six dataviews: (a) Fourier coefficients of the character shapes ( ${p}_{1}=76$), (b) profile correlations ( ${p}_{2}=216$), (c) Karhunen-Loève coefficients ( ${p}_{3}=64$), (d) pixel averages in 2 × 3 windows ( ${p}_{4}=240$), (e) Zernike moments ( ${p}_{5}=47$) and (f) morphological features ( ${p}_{6}=6$) (Dua & Graff, 2017).
The handwritten digits data set is characterised by having perfectly balanced data samples; each of the 10 clusters contains exactly $200$ numerals. On the other hand, caltech7 is an imbalanced data set with the first two clusters containing many more samples than the other clusters. The number of samples in each cluster is {A: 435, B: 798, C: 52, D: 34, E: 35, F: 64, G: 56}. The performance of the methods was explored on both the imbalanced caltech7 data set and a balanced version of the data, for which $50$ samples from clusters A and B were randomly selected.
Results
In this section, we illustrate the application and evaluation of the proposed multiview extensions of tSNE, LLE and ISOMAP on real and synthetic data. Comparisons between the multiview solutions and their respective singleview solutions are implemented. A trivial solution is to concatenate the features of all dataviews into a single large data set and apply a singleview manifold learning algorithm to it; such comparisons are also explored in this section. Since each dataview is likely to have a different variance, each dataview was normalised before concatenation to ensure the same variability across all dataviews. Normalisation was achieved by removing the mean and dividing by the standard deviation of each feature in all dataviews.
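The normalise-then-concatenate step can be sketched as follows; this is a minimal numpy version, with variable names of our choosing rather than the manuscript's code.

```python
import numpy as np

def normalise_and_concatenate(views):
    """Z-score each feature within each dataview (remove the mean, divide
    by the standard deviation), then concatenate the views column-wise
    into one wide matrix for singleview methods."""
    scaled = []
    for X in views:
        mu = X.mean(axis=0)
        sd = X.std(axis=0)
        sd[sd == 0] = 1.0  # guard against constant features
        scaled.append((X - mu) / sd)
    return np.hstack(scaled)

# two toy dataviews with very different scales, same 300 samples
rng = np.random.default_rng(1)
views = [rng.normal(size=(300, p)) * s for p, s in [(100, 1.0), (50, 10.0)]]
X_concat = normalise_and_concatenate(views)
```

After this step every feature has zero mean and unit variance, so no dataview dominates the concatenated matrix simply through a larger scale.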
In the following subsections we address the following questions:
1. Can multiview manifold learning approaches obtain better visualisations than singleview approaches? The performance of the multiview approaches in visualising the underlying structure of the data is illustrated. Section “Optimal Parameter Selection” further illustrates how the underlying structure is misrepresented when individual data sets or the concatenated data set are visualised.
2. Can the visualisations of multiview approaches be quantitatively evaluated using Kmeans? Section “Multiview Manifold Learning for Clustering” shows how the lowdimensional embeddings of the multiview approaches can be extracted and used as input features to the Kmeans clustering algorithm. We quantitatively evaluated the performance of the approaches in identifying underlying clusters and patterns within the data.
3. What is the effect of the parameter values on the multiview manifold learning approaches? As discussed, the proposed multiview manifold approaches depend on a parameter that requires tuning. Section “Optimal Parameter Selection” presents a series of experiments in which we investigated the effect that the parameter value has on each approach, both by exploring the visualisations produced and by evaluating the clusterings of the approaches for different parameter values.
4. Should we use all available dataviews? If some dataviews contain more noise than signal, should we discard them? These are two crucial questions for every researcher working with multiview data: are all dataviews necessary and beneficial to the final outcome? In Section “Optimal Number of Dataviews” we address these questions by analysing data sets that contain noisy dataviews. By investigating both the produced visualisations and the clusterings obtained with and without the noisy data, we discuss why it is not always beneficial to include all available dataviews.
The section ends by proposing alternative variations of the bestperforming approach, multiSNE. First, we propose a way of automatically computing the weights assigned to each dataview. In addition, we explore an alternative pretraining step for multiSNE, in which, instead of conducting PCA on each dataview, multiCCA is applied on the multiple dataviews to reduce their dimensions into a latent space of uncorrelated embeddings (Rodosthenous, Shahrezaei & Evangelou, 2020).
Comparison between singleview and multiview visualisations
Visualising multiview data can be trivially achieved either by looking at the visualisations produced by each dataview separately, or by concatenating all features into a long vector. tSNE, LLE and ISOMAP applied separately on each dataview of the MMDS data set capture the correct local underlying structure of the respective dataview (Fig. 3). However, by design, they cannot capture the global structure of the data. $\mathrm{S}\mathrm{N}{\mathrm{E}}_{concat}$, $\mathrm{L}\mathrm{L}{\mathrm{E}}_{concat}$ and $\mathrm{I}\mathrm{S}\mathrm{O}\mathrm{M}\mathrm{A}{\mathrm{P}}_{concat}$ denote the trivial solutions of concatenating the features of all dataviews before applying tSNE, LLE and ISOMAP, respectively. These trivial solutions mostly capture the structure of the third dataview, because that dataview has a higher variability between the clusters than the other two.
MultiSNE, multiLLE and multiISOMAP produced the best visualisations of all SNEbased, LLEbased and ISOMAPbased approaches, respectively. These solutions were able to clearly separate the three true clusters, with multiSNE showing the clearest separation between them. Even though mSNE separates the samples according to their corresponding clusters, this separation would not be recognisable if the true labels were unknown, as the clusters are not sufficiently separated. The visualisation by mLLE was similar to the ones produced by the singleview solutions on concatenated features, while mISOMAP bundles all samples into a single cluster.
In visualising the MMDS data set via both singleview and multiview approaches, multiSNE showed the most promising results (Fig. 3). We have shown that singleview analyses may lead to conflicting results, while multiview approaches are able to capture the true underlying structure of the synthetic MMDS data set.
Multiview manifold learning for clustering
It is very common in studies to utilise the visualisation of data to identify underlying patterns or clusters within the data samples. Here, it is illustrated how the multiview approaches can be used to identify such clusters. To quantify the visualisation of the data, we applied the Kmeans algorithm to the lowdimensional embeddings produced by the multiview manifold learning algorithms. If the twodimensional embeddings can separate the data points into their respective clusters with high accuracy via a clustering algorithm, then those clusters are expected to be qualitatively separated and visually apparent in two dimensions. For all examined data sets (synthetic and real), the number of clusters (ground truth) within the samples is known, which motivates the use of Kmeans over alternative clustering algorithms. The number of clusters was used as the input parameter, K, of the Kmeans algorithm, and by computing the clustering measures we evaluated whether the correct sample allocations were made.
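The quantitative evaluation described above can be sketched as follows, assuming `Y` is the twodimensional embedding returned by any of the manifold methods (here replaced by a toy well-separated embedding purely so the sketch runs end-to-end).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score

def evaluate_embedding(Y, true_labels, n_clusters, seed=0):
    """Cluster the low-dimensional embedding with K-means, where K is the
    known number of clusters, and score the result against the truth."""
    pred = KMeans(n_clusters=n_clusters, n_init=10,
                  random_state=seed).fit_predict(Y)
    return {"NMI": normalized_mutual_info_score(true_labels, pred),
            "ARI": adjusted_rand_score(true_labels, pred)}

# toy two-dimensional "embedding" with three well-separated blobs
rng = np.random.default_rng(2)
labels = np.repeat([0, 1, 2], 100)
centres = np.array([[0, 0], [5, 0], [0, 5]]).repeat(100, axis=0)
Y = centres + rng.normal(scale=0.1, size=(300, 2))
scores = evaluate_embedding(Y, labels, n_clusters=3)
```

NMI and ARI are invariant to the arbitrary labelling of the clusters K-means returns, which is why they (rather than raw label agreement) are the natural scores here.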
The proposed multiSNE, multiLLE, and multiISOMAP approaches were found to outperform their competitive multiview extensions (mSNE, mLLE, mISOMAP) as well as their concatenated versions ( $\mathrm{S}\mathrm{N}{\mathrm{E}}_{concat}$, $\mathrm{L}\mathrm{L}{\mathrm{E}}_{concat}$, $\mathrm{I}\mathrm{S}\mathrm{O}\mathrm{M}\mathrm{A}{\mathrm{P}}_{concat}$) (Tables 2 and 3). For the majority of the data sets the multiSNE approach was found to overall outperform all other approaches.
Table 2: Clustering comparison of the SNEbased, LLEbased and ISOMAPbased approaches on the real data sets.
Data Set  Algorithm  Accuracy  NMI  RI  ARI 

Handwritten digits  $\mathrm{S}\mathrm{N}{\mathrm{E}}_{concat}$ (Perp = 10)  0.717 (0.032)  0.663 (0.013)  0.838 (0.005)  0.568 (0.026) 
mSNE (Perp = 10)  0.776 (0.019)  0.763 (0.009)  0.938 (0.004)  0.669 (0.019)  
multiSNE* (Perp = 10)  0.882 (0.008)  0.900 (0.005)  0.969 (0.002)  0.823 (0.008)  
$\mathrm{L}\mathrm{L}{\mathrm{E}}_{concat}$ (NN = 10)  0.562  0.560  0.871  0.441  
mLLE (NN = 10)  0.632  0.612  0.896  0.503  
multiLLE (NN = 5)  0.614  0.645  0.897  0.524  
$\mathrm{I}\mathrm{S}\mathrm{O}\mathrm{M}\mathrm{A}{\mathrm{P}}_{concat}$ (NN = 20)  0.634  0.619  0.905  0.502  
mISOMAP (NN = 20)  0.636  0.628  0.898  0.477  
multiISOMAP (NN = 5)  0.658  0.631  0.909  0.518  
Caltech7  $\mathrm{S}\mathrm{N}{\mathrm{E}}_{concat}$ (Perp = 50)  0.470 (0.065)  0.323 (0.011)  0.698 (0.013)  0.290 (0.034) 
mSNE* (Perp = 10)  0.542 (0.013)  0.504 (0.029)  0.757 (0.010)  0.426 (0.023)  
multiSNE (Perp = 80)  0.506 (0.035)  0.506 (0.006)  0.754 (0.009)  0.428 (0.022)  
$\mathrm{L}\mathrm{L}{\mathrm{E}}_{concat}$ (NN = 100)  0.425  0.372  0.707  0.305  
mLLE (NN = 5)  0.561  0.348  0.718  0.356  
multiLLE (NN = 80)  0.638  0.490  0.732  0.419  
$\mathrm{I}\mathrm{S}\mathrm{O}\mathrm{M}\mathrm{A}{\mathrm{P}}_{concat}$ (NN = 20)  0.408  0.167  0.634  0.151  
mISOMAP (NN = 5)  0.416  0.306  0.686  0.261  
multiISOMAP (NN = 10)  0.519  0.355  0.728  0.369  
Caltech7 (balanced)  $\mathrm{S}\mathrm{N}{\mathrm{E}}_{concat}$ (Perp = 80)  0.492 (0.024)  0.326 (0.018)  0.687 (0.023)  0.325 (0.015) 
mSNE (Perp = 10)  0.581 (0.011)  0.444 (0.013)  0.838 (0.022)  0.342 (0.016)  
multiSNE* (Perp = 20)  0.749 (0.008)  0.686 (0.016)  0.905 (0.004)  0.619 (0.009)  
$\mathrm{L}\mathrm{L}{\mathrm{E}}_{concat}$ (NN = 20)  0.567  0.348  0.725  0.380  
mLLE (NN = 10)  0.403  0.169  0.617  0.139  
multiLLE (NN = 5)  0.622  0.454  0.710  0.391  
$\mathrm{I}\mathrm{S}\mathrm{O}\mathrm{M}\mathrm{A}{\mathrm{P}}_{concat}$ (NN = 5)  0.434  0.320  0.791  0.208  
mISOMAP (NN = 5)  0.455  0.299  0.797  0.224  
multiISOMAP (NN = 5)  0.548  0.368  0.810  0.267  
Cancer types  $\mathrm{S}\mathrm{N}{\mathrm{E}}_{concat}$ (Perp = 10)  0.625 (0.143)  0.363 (0.184)  0.301 (0.113)  0.687 (0.169) 
mSNE (Perp = 10)  0.923 (0.010)  0.839 (0.018)  0.876 (0.011)  0.922 (0.014)  
multiSNE* (Perp = 20)  0.964 (0.007)  0.866 (0.023)  0.902 (0.005)  0.956 (0.008)  
$\mathrm{L}\mathrm{L}{\mathrm{E}}_{concat}$ (NN = 10)  0.502  0.122  0.091  0.576  
mLLE (NN = 20)  0.637  0.253  0.235  0.647  
multiLLE (NN = 10)  0.850  0.567  0.614  0.826  
$\mathrm{I}\mathrm{S}\mathrm{O}\mathrm{M}\mathrm{A}{\mathrm{P}}_{concat}$ (NN=5)  0.384  0.015  0.009  0.556  
mISOMAP (NN = 10)  0.390  0.020  0.013  0.558  
multiISOMAP (NN = 50)  0.514  0.116  0.093  0.592 
Table 3: Clustering comparison of the SNEbased, LLEbased and ISOMAPbased approaches on the synthetic data sets (NDS and MCS); values are mean (standard deviation).
Data set  Algorithm  Accuracy  NMI  RI  ARI 

NDS  $\mathrm{S}\mathrm{N}{\mathrm{E}}_{concat}$ (Perp = 80)  0.747 (0.210)  0.628 (0.309)  0.817 (0.324)  0.598 (0.145) 
mSNE (Perp = 50)  0.650 (0.014)  0.748 (0.069)  0.766 (0.022)  0.629 (0.020)  
multiSNE* (Perp = 80)  0.989 (0.006)  0.951 (0.029)  0.969 (0.019)  0.987 (0.009)  
$\mathrm{L}\mathrm{L}{\mathrm{E}}_{concat}$ (NN = 5)  0.606 (0.276)  0.477 (0.357)  0.684 (0.359)  0.446 (0.218)  
mLLE (NN = 20)  0.685 (0.115)  0.555 (0.134)  0.768 (0.151)  0.528 (0.072)  
multiLLE (NN = 20)  0.937 (0.044)  0.768 (0.042)  0.922 (0.028)  0.823 (0.047)  
$\mathrm{I}\mathrm{S}\mathrm{O}\mathrm{M}\mathrm{A}{\mathrm{P}}_{concat}$ (NN = 100)  0.649 (0.212)  0.528 (0.265)  0.750 (0.286)  0.475 (0.133)  
mISOMAP (NN = 5)  0.610 (0.234)  0.453 (0.221)  0.760 (0.280)  0.386 (0.138)  
multiISOMAP (NN = 300)  0.778 (0.112)  0.788 (0.234)  0.867 (0.194)  0.730 (0.094)  
MCS  $\mathrm{S}\mathrm{N}{\mathrm{E}}_{concat}$ (Perp = 200)  0.421 (0.200)  0.215 (0.185)  0.711 (0.219)  0.173 (0.089) 
mSNE (Perp = 2)  0.641 (0.069)  0.670 (0.034)  0.854 (0.080)  0.575 (0.055)  
multiSNE* (Perp = 50)  0.919 (0.046)  0.862 (0.037)  0.942 (0.052)  0.819 (0.018)  
$\mathrm{L}\mathrm{L}{\mathrm{E}}_{concat}$ (NN = 50)  0.569 (0.117)  0.533 (0.117)  0.796 (0.123)  0.432 (0.051)  
mLLE (NN = 20)  0.540 (0.079)  0.627 (0.051)  0.819 (0.077)  0.487 (0.026)  
multiLLE (NN = 20)  0.798 (0.059)  0.647 (0.048)  0.872 (0.064)  0.607 (0.022)  
$\mathrm{I}\mathrm{S}\mathrm{O}\mathrm{M}\mathrm{A}{\mathrm{P}}_{concat}$ (NN = 150)  0.628 (0.149)  0.636 (0.139)  0.834 (0.167)  0.526 (0.071)  
mISOMAP (NN = 5)  0.686 (0.113)  0.660 (0.106)  0.841 (0.119)  0.565 (0.051)  
multiISOMAP (NN = 300)  0.717 (0.094)  0.630 (0.101)  0.852 (0.118)  0.570 (0.044) 
Figure 4 shows a comparison between the true clusters of the handwritten digits data set and the clusters identified by Kmeans. The clusters reflecting the digits 6 and 9 are clustered together, but all remaining clusters are well separated and agree with the truth.
MultiSNE applied on caltech7 produces a good visualisation, with clusters A and B being clearly separated from the rest (Fig. 5B). Clusters C and G are also wellseparated, but the remaining three clusters are bundled together. Applying Kmeans to that lowdimensional embedding does not capture the true structure of the data (Table 2). It provides a solution with all clusters being equally sized (Fig. 5A) and thus its quantitative evaluation is misleading. Motivated by this result, we further explored the performance of the proposed approaches on a balanced version of the caltech7 data set (generated as described in the Section “Real Data”).
Similarly to the visualisation of the original data set, the visualisation of the balanced caltech7 data set shows clusters A, B, C and G to be wellseparated, while the remaining are still bundled together (Figs. 5C and 5D).
This work has shown that the multiview approaches proposed in the manuscript generate lowdimensional embeddings that can be used as input features in a clustering algorithm (for example, Kmeans) to identify clusters that exist within a data set. We have illustrated that the proposed approaches outperform existing multiview approaches and that the visualisations produced by multiSNE are very close to the ground truth of the data sets.
Alternative clustering algorithms that do not require the number of clusters as input can be considered as well. For example, densitybased spatial clustering of applications with noise (DBSCAN) measures the density around each data point and does not require the true number of clusters as input (Ester et al., 1996). In situations where the true number of clusters is unknown, DBSCAN would be preferable over Kmeans. For completeness, DBSCAN was applied on two of the real data sets explored, with results similar to those obtained with Kmeans. The proposed multiSNE approach was again the bestperforming method for partitioning the data samples. The analysis using DBSCAN can be found in Figs. S9 and S10 of Section 4.8 in the Supplemental File.
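As an illustration of why DBSCAN needs no cluster count, a minimal scikit-learn sketch on toy data follows; the `eps` and `min_samples` values are choices for this toy example, not parameters taken from the manuscript's analysis.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# DBSCAN groups points by local density; only the neighbourhood radius
# (eps) and minimum neighbourhood size (min_samples) need tuning, and
# the number of clusters is discovered rather than supplied.
rng = np.random.default_rng(3)
Y = np.vstack([rng.normal(loc=c, scale=0.1, size=(100, 2))
               for c in [(0, 0), (5, 0), (0, 5)]])
pred = DBSCAN(eps=0.5, min_samples=5).fit_predict(Y)  # label -1 marks noise
n_found = len(set(pred) - {-1})
```

On this toy embedding the three dense blobs are recovered without ever specifying K, which is the property that makes DBSCAN attractive when the true cluster count is unknown.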
An important observation made was that caution needs to be taken when data sets with imbalanced clusters are analysed as the quantitative performance of the approaches on such data sets is not very robust.
Optimal parameter selection
SNE, LLE and ISOMAP each depend on a parameter that requires tuning. Even though the parameter is defined differently in each algorithm, in all three it relates to the number of neighbours considered. As described earlier, the optimal parameter was found by comparing the performance of the methods over a range of parameter values, $S=\left\{2,10,20,50,80,100,200\right\}$. In this section, the synthetic data sets NDS and MCS were analysed, because both separate the samples into known clusters by design, making evaluation via clustering measures appropriate.
To find the optimal parameter value, the performance of the algorithms was evaluated by applying Kmeans on the lowdimensional embeddings and comparing the resulting clusterings against the truth. Once the optimal parameter was found, we confirmed that the clusters were visually separated by manually inspecting the twodimensional embeddings. Since the data in NDS and MCS are perfectly balanced and were generated for clustering, this approach can effectively evaluate the data visualisations.
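The tuning procedure can be sketched as a grid search over the parameter set. Here `embed` is a hypothetical stand-in for any of the manifold methods (perplexity for SNE-based, nearest neighbours for LLE/ISOMAP-based), and the dummy embedding below exists only to make the sketch runnable.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

def select_parameter(embed, grid, views, labels, n_clusters, seed=0):
    """For each candidate parameter value, embed the data, cluster the
    embedding with K-means and score it with NMI against the truth;
    return the best value and the full score table."""
    scores = {}
    for param in grid:
        Y = embed(views, param)
        pred = KMeans(n_clusters=n_clusters, n_init=10,
                      random_state=seed).fit_predict(Y)
        scores[param] = normalized_mutual_info_score(labels, pred)
    best = max(scores, key=scores.get)
    return best, scores

# hypothetical embed function: only param == 10 yields a separable embedding
labels = np.repeat([0, 1], 50)
blobs = np.vstack([np.full((50, 2), 0.0), np.full((50, 2), 5.0)])
rng = np.random.default_rng(4)
def dummy_embed(views, param):
    return blobs if param == 10 else rng.normal(size=(100, 2))

best, scores = select_parameter(dummy_embed, [2, 10, 20], None, labels, 2)
```

After the best value is selected this way, the corresponding two-dimensional embedding is still inspected visually, as described above.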
On NDS, the singleview SNE, LLE and ISOMAP algorithms produced a misclustering error of $0.3$, meaning that a third of the samples were incorrectly clustered (Fig. 6B). This observation shows that the singleview methods capture the true local underlying structure of each synthetic dataview. The only exception in NDS is the fourth dataview, for which the error is closer to $0.6$, i.e. the clusters are assigned close to randomly (which follows the simulation design, as this dataview was designed to be random). After concatenating the features of all dataviews, the performance of the singleview approaches remains poor (Fig. 6A). The variance of the misclustering error for this solution is much greater, suggesting that singleview manifold learning algorithms on concatenated data are not robust and thus not reliable. Increasing the noise level (either by incorporating additional noisy dataviews, or by increasing the dimensions of the noisy dataview) in this synthetic data set had little effect on the overall performance of the multiview approaches (Table S2, Figs. S5, S6 in Section 4.4 of the Supplemental File).
On both NDS and MCS, multiLLE and multiSNE were found to be sensitive to the choice of their corresponding parameter value (Figs. 6A and 7). While multiLLE performed best when the number of nearest neighbours was low, multiSNE provided better results as perplexity increased. In contrast, multiISOMAP had the highest NMI value when the parameter was high.
Overall, ISOMAPbased multiview algorithms were found to be more sensitive to their tuning parameter (number of nearest neighbours). This observation is justified by the higher variance observed in the misclustering error on the optimal parameter value throughout the simulation runs (Fig. 6B). The performance of ISOMAPbased methods improved as the parameter value increased (Fig. 7). However, they were outperformed by multiLLE and multiSNE for both synthetic data sets.
Of the three manifold learning foundations, LLEbased approaches depend most on their parameter value to produce the optimal outcome. Specifically, their performance dropped when the parameter value was between $20$ and $100$ (Fig. 6A); when the number of nearest neighbours was greater than $100$, their performance started to improve. Of all LLEbased algorithms, the highest NMI and lowest misclustering error were obtained by multiLLE (Figs. 6 and 7). Our observations on the tuning parameters of LLEbased approaches agree with earlier studies (Karbauskaitė, Kurasova & Dzemyda, 2007; ValenciaAguirre et al., 2009). Both Karbauskaitė, Kurasova & Dzemyda (2007) and ValenciaAguirre et al. (2009) found that LLE performs best with a low number of nearest neighbours, and their conclusions reflect the performance of multiLLE, which also performed best at low values of the tuning parameter. Even though mSNE performed better than the singleview methods in terms of both clustering and error variability, multiSNE produced the best results (Figs. 6 and 7). In particular, multiSNE outperformed all algorithms presented in this study on both NDS and MCS. Even though it performed poorly for low perplexity values, its performance improved for $Perp\ge 20$. MultiSNE was also the algorithm with the lowest error variance, making it a robust and preferable solution.
The four implemented measures (accuracy, NMI, RI and ARI) use the true clusters of the samples to evaluate the clustering performance. In situations where the cluster allocation is unknown, alternative clustering evaluation measures can be used, such as the Silhouette score (Rousseeuw, 1987). In contrast to the other measures, the Silhouette score does not require the cluster allocation as input and is a widely used approach for identifying the best number of clusters and clustering allocation in an unsupervised setting.
Evaluating the clustering performance of the methods via the Silhouette score agrees with the other four evaluation measures, with multiSNE producing the highest value out of all multiview manifold learning solutions. The Silhouette score of all methods applied on the MCS data set can be found in Fig. S8 of Section 4.7 in the Supplemental File.
The same process of parameter tuning was implemented for the real data sets and their performance is presented in Fig. S4 of Section 4.3 in the Supplemental File. In contrast to the synthetic data, multiSNE on cancer types data performed the best at low perplexity values. For the remaining data sets, its performance was stable for all parameter values. With the exception of cancer types data, the performance of LLEbased solutions follows their behaviour on synthetic data.
Optimal number of dataviews
It is common to think that more information would lead to better results, and in theory that should be the case. However, in practice that is not always true (Kumar, Rai & Hal, 2011). Using the cancer types data set, we explored whether the visualisations and clusterings are improved if all or a subset of the dataviews are used. With three available dataviews, we implemented a multiview visualisation on three combinations of two dataviews and a single combination of three dataviews.
The genomics dataview provides a reasonably good separation of the three cancer types, whereas miRNA dataview fails in this task, as it provides a visualisation that reflects random noise (first column of plots in Fig. 8). This observation is validated quantitatively by evaluating the produced tSNE embeddings (Table S3 in Section 4.5 of the Supplemental File). Concatenating features from the different dataviews before implementing tSNE does not improve the final outcome of the algorithm, regardless of the dataview combination.
Overall, multiview manifold learning algorithms have improved the data visualisation to a great extent. When all three dataviews are considered, both multiSNE and mSNE provide a good separation of the clusters (Fig. 8). However, the true cancer types can be identified perfectly when the miRNA dataview is discarded. In other words, the optimal solution in this data set is obtained when only genomics and epigenomics dataviews are used. That is because miRNA dataview contains little information about the cancer types and adds random noise, which makes the task of separating the data points more difficult.
This observation was also noted between the visualisations of MMDS and NDS (Fig. 9). The only difference between the two synthetic data sets is the additional noisy dataview in NDS. Even though the multiSNE projection of NDS separates the samples into their corresponding clusters, the separation is not as clear as in the multiSNE projection of MMDS. In agreement with the exploration of the cancer types data set, it is favourable to discard noisy dataviews in the implementation of multiview manifold learning approaches.
It is not always beneficial to include all available dataviews in multiview manifold learning algorithms; some dataviews may contribute noise, resulting in a worse visualisation than if those dataviews were discarded entirely. When labels are unknown, a noisy dataview may be spotted in a singleview tSNE plot (all data points forming a single cluster), or identified, where possible, via quantification measures such as the signal-to-noise ratio.
MultiSNE variations
This section presents two alternative variations of multiSNE, including automatic weight adjustments and multiCCA as a pretraining step for reducing the dimensions of the input dataviews.
Automated weight adjustments
A simple weightupdating approach is proposed, based on the KLdivergence measure of each dataview. This approach guarantees that more weight is given to the dataviews producing lower KLdivergence measures and that no dataview is completely discarded by the algorithm.
Recall that $KL(P\|Q)\in [0,\infty)$, with $KL(P\|Q)=0$ if the two distributions, P and Q, are perfectly matched. Let $\mathbf{k}=({k}^{(1)},\cdots ,{k}^{(M)})$ be a vector, where ${k}^{(m)}=KL({P}^{(m)}\|Q),\forall m\in \{1,\cdots ,M\}$, and initialise the weight vector $\mathbf{w}=({w}^{(1)},\cdots ,{w}^{(M)})$ by ${w}^{(m)}=\frac{1}{M},\forall m$. To adjust the weights of each dataview, the following steps are performed at each iteration:
1. Normalise the KLdivergences by ${k}^{(m)}=\frac{{k}^{(m)}}{\sum_{i=1}^{M}{k}^{(i)}}$. This step ensures that ${k}^{(m)}\in [0,1],\forall m$, and that $\sum_{m}{k}^{(m)}=1$.
2. Set the weight of each dataview to ${w}^{(m)}=1-{k}^{(m)}$. This step ensures that the dataview with the lowest KLdivergence value receives the highest weight.
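The two-step update can be sketched in a few lines of numpy; this is a sketch of the weight update alone, with the surrounding multiSNE iteration omitted and the function name our own.

```python
import numpy as np

def update_weights(kl):
    """One weight-adjustment step: normalise the per-dataview
    KL-divergences to sum to one, then assign each dataview the weight
    1 - k, so the view with the lowest divergence gets the highest weight."""
    k = np.asarray(kl, dtype=float)
    k = k / k.sum()   # step 1: each k^(m) in [0, 1], summing to 1
    return 1.0 - k    # step 2: lowest divergence -> highest weight

# e.g. KL-divergence values for three dataviews (illustrative numbers)
w = update_weights([5.0, 3.0, 2.0])
```

For these illustrative inputs the normalised divergences are (0.5, 0.3, 0.2), giving weights (0.5, 0.7, 0.8): every weight stays strictly positive, so no dataview is ever discarded outright.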
Based on the analysis in Section “Optimal Number of Dataviews”, we know that cancer types and NDS data sets contain noisy dataviews and thus multiSNE performs better when they are entirely discarded. Here, we assume that this information is unknown and the proposed weightupdating approach is implemented on those two data sets to test if the weights are being adjusted correctly according to the noise level of each dataview.
The proposed weightadjustment process, which compares the KLdivergence produced between each dataview and the lowdimensional embeddings, distinguishes which dataviews contain the most noise, and the weight values are updated accordingly (Fig. 10B). In cancer types, transcriptomics (miRNA) receives the lowest weight, while genomics (Genes) is given the highest value. This weight adjustment agrees with the qualitative (tSNE plots) and quantitative (clustering) evaluations performed in Section “Optimal Number of Dataviews”. In NDS, ${X}^{(4)}$, which represents the noisy dataview, received the lowest weight, and the other dataviews had around the same weight value, as they all impact the final outcome equally.
The proposed weightadjustment process updates the weights at each iteration. For the first $100$ iterations, the weights do not change, as the algorithm adjusts to the produced lowdimensional embeddings (Fig. 10B). In NDS, the weights converge after $250$ iterations, while in cancer types they are still being updated even after $1,000$ iterations; however, the recorded changes are small and the weights can be said to have stabilised.
The lowdimensional embeddings produced on NDS with weight adjustments clearly separate the three clusters, an observation missed without the weightupdating approach (Fig. 10A); the result resembles the multiSNE plot of MMDS (i.e. without the noisy dataview) (Fig. 9). The automatic weightadjustment process identifies the informative dataviews by allocating them a higher weight value than the noisy dataviews. This observation held even when a data set contained more noisy than informative dataviews (Table S2, Figs. S5, S6 in Section 4.4 of the Supplemental File).
The embeddings produced by multiSNE with automated weight adjustments on cancer types do not separate the three clusters as clearly as multiSNE without the noisy dataview. However, a clearer separation is obtained than with multiSNE applied to the complete data set without weight adjustments.
The weights produced by this weightadjustment approach can indicate the importance of each dataview in the final lowerdimensional embedding. For example, dataviews with very low weights may be deemed uninformative, and a better visualisation may be produced if those dataviews are discarded. Beyond this ranking, the actual weight values assigned to each dataview carry no further meaning.
MultiCCA as pretraining
As mentioned earlier, van der Maaten & Hinton (2008) proposed the implementation of PCA as a pretraining step for tSNE to reduce the computational costs, provided that the fraction of variance explained by the principal components is high. In this article, pretraining via PCA was implemented in all variations of SNE. Alternative linear dimensionality reduction methods may be considered, especially for multiview data. In addition to reducing the dimensions of the original data, such methods can capture information between the dataviews. For example, canonical correlation analysis (CCA) captures relationships between the features of two dataviews by producing two latent lowdimensional embeddings (canonical vectors) that are maximally correlated between them (Hotelling, 1936; Rodosthenous, Shahrezaei & Evangelou, 2020). Rodosthenous, Shahrezaei & Evangelou (2020) demonstrated that multiCCA, an extension of CCA that analyses multiple (more than two) dataviews, would be preferable as it reduces overfitting.
This section demonstrates the application of multiCCA as pretraining in place of PCA. This alteration of the multiSNE algorithm was implemented on the handwritten digits data set. MultiCCA was applied on all dataviews, with six canonical vectors produced for each dataview (in this particular data set $\min({p}_{1},{p}_{2},{p}_{3},{p}_{4},{p}_{5},{p}_{6})=6$). The variation of multiCCA proposed by Witten & Tibshirani (2009) was used for the production of the canonical vectors, as it is computationally cheaper than the alternatives (Rodosthenous, Shahrezaei & Evangelou, 2020). Using these vectors as input features, multiSNE produced a qualitatively better visualisation than using the principal components as input features (Fig. 11). With an integrative algorithm as pretraining, all $10$ clusters are clearly separated, including the digits six and nine. Quantitatively, clustering via KMeans was evaluated with ACC = 0.914, NMI = 0.838, RI = 0.968, ARI = 0.824, suggesting that it performed better than the $10$dimensional embeddings produced by multiSNE with PCA as pretraining.
Comparison of multiSNE variations
Section “MultiSNE” introduced multiSNE, a multiview extension of tSNE. Sections “Automated weight adjustments” and “MultiCCA as pretraining” presented two variations of multiSNE: the former implements a weightadjustment process which at each iteration updates the weights allocated to each dataview, and the latter uses multiCCA instead of PCA as a pretraining step. In this section, multiSNE and its two variations are compared to assess whether the variations perform better than the initial proposal.
The implementation of the weightadjustment process improved the performance of multiSNE on all real data sets analysed (Table 4). Using multiCCA as a pretraining step produced inconsistent results: on some data sets this step boosted the clustering performance of multiSNE (e.g. handwritten digits), while on others it did not (e.g. cancer types). From this analysis, we conclude that adjusting the weights of each dataview always improves the performance of multiSNE. On the other hand, the choice of pretraining, via PCA or multiCCA, is not clearcut and depends on the data at hand.
Table 4: Clustering performance (ACC) of the multiSNE variations on each data set; standard deviations over repeated runs are shown in parentheses for the synthetic data sets (NDS, MCS).

| Variation | Handwritten digits | Cancer types | Caltech7 original | Caltech7 balanced | NDS | MCS |
|---|---|---|---|---|---|---|
| MultiSNE without weight adjustment | 0.822 | 0.964 | 0.506 | 0.733 | 0.989 (0.006) | 0.919 (0.046) |
| MultiSNE with weight adjustment | 0.883 | 0.994 | 0.543 | 0.742 | 0.999 (0.002) | 0.922 (0.019) |
| MultiCCA multiSNE without weight adjustment | 0.901 | 0.526 | 0.453 | 0.713 | 0.996 (0.002) | 0.993 (0.005) |
| MultiCCA multiSNE with weight adjustment | 0.914 | 0.562 | 0.463 | 0.754 | 0.996 (0.002) | 0.993 (0.005) |
Multiomics singlecell data analysis
An important area of active research where manifold learning approaches such as tSNE are commonly used as visualisation tools is singlecell sequencing (scRNAseq) and genomics. The last few years have seen rapid development of multiomics singlecell methods, where multiple omics measurements are obtained for the same cells, such as transcripts by scRNAseq and chromatin accessibility by a method known as scATACseq (Stuart et al., 2019). As recently discussed, the integration of this kind of multiview singlecell data poses unique and novel statistical challenges (Argelaguet et al., 2021). We therefore believe our proposed multiview methods will be very useful in producing an integrated visualisation of the cellular heterogeneity and cell types studied by multiomics singlecell methods across different tissues, in health and disease.
To illustrate the capability of multiSNE for multiomics singlecell data, we applied multiSNE on a representative data set of scRNAseq and ATACseq for human peripheral blood mononuclear cells (PBMC) (https://support.10xgenomics.com/singlecellmultiomeatacgex/datasets/1.0.0/pbmc_granulocyte_sorted_10k) (Fig. 12). MultiSNE produced more intelligible projections of the cells compared to mSNE and achieved higher evaluation scores (Fig. S1 in Section 3 of the Supplemental File).
To test the quality of the obtained multiview visualisation, we compared its performance against the multiview clustering approach proposed by Liu et al. (2013) on this singlecell data. A balanced subset of the data was used, consisting of two dataviews on 9,105 cells (scRNAseq and ATACseq, with 36,000 and 108,000 features, respectively). A detailed description of this data set, the preprocessing steps performed, and the projections of tSNE and multiSNE on the original data are provided in Fig. S1 of Section 3 in the Supplemental File. We found multiSNE to have the highest accuracy (and an NMI close to that of the approach by Liu et al. (2013)), as seen in Fig. 12. Qualitatively, the projections by tSNE on scRNAseq and by multiSNE are similar, but multiSNE separates the clusters better, especially between the CD4 and CD8 cell types (Figs. 12 and S1 in Section 3 of the Supplemental File). While ATACseq data is known to be noisier and less informative by itself, integrating the dataviews results in better overall separation of the different cell types in this data set. These results indicate the promise of multiSNE as a unified multiview visualisation and clustering approach for multiomics singlecell data.
Discussion
In this manuscript, we propose extensions of the wellknown manifold learning approaches tSNE, LLE, and ISOMAP for the visualisation of multiview data sets. These three approaches are widely used for visualising highdimensional and complex data sets by performing nonlinear dimensionality reduction. The increasing number of data sets produced for the same samples in different fields emphasises the need for approaches that produce expressive representations of the data. We have illustrated that visualising each data set separately from the rest is not ideal, as it does not reveal the underlying patterns within the samples. In contrast, the proposed multiview approaches can produce a single visualisation of the samples by integrating all available information from the multiple dataviews. Python and R (multiSNE only) implementations of the proposed solutions can be found via the links provided in Section 5 of the Supplemental File.
Multiview visualisation has been explored in the literature with a number of approaches proposed in recent years. In this work, we propose multiview visualisation approaches that extend the wellknown manifold approaches: tSNE, LLE, and ISOMAP. Through a comparative study of real and synthetic data, we have illustrated that the proposed approach, multiSNE, provides a better and more robust solution compared to the other tested approaches proposed in the manuscript (multiLLE and multiISOMAP) and the approaches proposed in the literature including mLLE, mSNE, MVtSNE2, jSNE, jUMAP (additional results in Figs. S2 and S3 of Sections 4.1 and 4.2, respectively, in the Supplemental File). Although multiSNE was computationally the most expensive multiview manifold learning algorithm (Table S4 in Section 4.6 of the Supplemental File), it was found to be the solution with superior performance, both qualitatively and quantitatively. All tSNE results presented in this manuscript were based on the original R implementation (https://cran.rproject.org/web/packages/tsne/) and verified by the original Python implementation (https://lvdmaaten.github.io/tsne/); multiSNE was based on the original tSNE implementation and not on any other variation of the algorithm. A future direction is the exploration of other variations of the tSNE algorithm for the implementation of multiSNE.
It is widely known that tSNE cannot deal with very highdimensional data; we therefore implemented the suggestion of van der Maaten & Hinton (2008) and first applied PCA to obtain a smaller data set to which tSNE is subsequently applied. A similar procedure was implemented for multiSNE. As part of our exploration of multiSNE, we proposed the use of multiCCA for obtaining the lowerdimensional data sets. The multiCCA step provided mixed results, suggesting the need for further exploration of an appropriate dimensionality reduction approach: whether multiview or singleview, and whether linear or nonlinear.
We have utilised the lowdimensional embeddings of the proposed algorithms as features in the Kmeans clustering algorithm, which we used (1) to quantify the visualisations produced, and (2) to select the optimal tuning parameters for the manifold learning approaches. By investigating synthetic and real multiview data sets, each with different characteristics, we concluded that multiSNE provides a more accurate and robust solution than any other singleview or multiview manifold learning algorithm we considered. Specifically, multiSNE produced the best visualisations of all data sets analysed in this article. MultiLLE provided the secondbest solution, while the multiview ISOMAP algorithms did not produce competitive visualisations. By exploring several data sets, we concluded that multiview manifold learning approaches can be effectively applied to heterogeneous and highdimensional data (i.e. $p\gg n$).
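The quantification step described above can be sketched as follows. The snippet is illustrative: the twodimensional "embedding" is synthetic, and the clusteringaccuracy helper (Hungarian matching of Kmeans labels to the true classes, a standard way of computing ACC) is our own assumption of how such scores are typically computed, not the authors' code.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score

def clustering_accuracy(labels_true, labels_pred):
    """ACC after optimally matching cluster labels to true classes
    via the Hungarian algorithm."""
    k = max(labels_true.max(), labels_pred.max()) + 1
    cost = np.zeros((k, k), dtype=int)
    for t, p in zip(labels_true, labels_pred):
        cost[t, p] += 1
    row, col = linear_sum_assignment(-cost)  # maximise matched counts
    return cost[row, col].sum() / len(labels_true)

# Synthetic stand-in for a lowdimensional embedding with three clusters
rng = np.random.default_rng(1)
embedding = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in (0, 3, 6)])
truth = np.repeat([0, 1, 2], 50)

# Score the embedding the way the manuscript scores multiSNE outputs
pred = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(embedding)
print(clustering_accuracy(truth, pred),
      normalized_mutual_info_score(truth, pred),
      adjusted_rand_score(truth, pred))
```

The same three scores (ACC, NMI, ARI) could then be compared across parameter values to select the best tuning, as done throughout the article.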
Through the conducted experiments, we have illustrated the effect of the tuning parameters on the performance of the methods. We have shown that SNEbased methods perform best when the perplexity lies in the range $\left[20,100\right]$, LLEbased algorithms should take a small number of nearest neighbours, in the range $\left[10,50\right]$, while the nearestneighbour parameter of ISOMAPbased algorithms should lie in the range $\left[100,N\right]$, where $N$ is the number of samples.
We believe that the best approach to selecting the tuning parameters is to explore a wide range of parameter values and assess the performance of the methods both qualitatively and quantitatively. If the produced visualisations vary substantially across a range of parameter values, the data might be too noisy and the projections misleading. In this case, it might be beneficial to inspect the weights obtained for each dataview and explore removing the noisiest dataviews (depending on the number of dataviews used and/or existing knowledge of noise in the data). Otherwise, if the produced visualisations vary only slightly between parameter values, the value with the best qualitative and quantitative performance can be selected. Since tSNE (and its extensions) are robust to perplexity (van der Maaten & Hinton, 2008), a strictly optimal parameter value is not necessary to produce meaningful visualisations and clusterings; nearidentical qualitative and quantitative performance can be observed across a range of values.
Cao & Wang (2017) proposed an automatic approach for selecting the perplexity parameter of tSNE. According to the authors, the tradeoff between the final KL divergence and the perplexity value can lead to good embeddings, and they proposed the following criterion:
(10) $$S(Perp)=2KL(P\|Q)+\mathrm{log}(n)\frac{Perp}{n}$$
This solution can be extended to automatically select the multiSNE perplexity, by modifying the criterion to:
(11) $$S(Perp)=2\sum _{m}KL({P}^{(m)}\|Q)+\mathrm{log}(n)\frac{Perp}{n}$$
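Given the final KL divergence of each dataview from multiSNE runs at a few candidate perplexities, the criterion in Eq. (11) can be evaluated directly. The KL values below are hypothetical placeholders for the outputs of actual multiSNE runs:

```python
import numpy as np

def multisne_perplexity_score(kl_per_view, perp, n):
    """Criterion of Eq. (11): S(Perp) = 2 * sum_m KL(P^(m) || Q) + log(n) * Perp / n.
    kl_per_view holds the final KL divergence of each dataview for a
    multiSNE run at perplexity `perp`; n is the number of samples."""
    return 2.0 * float(np.sum(kl_per_view)) + np.log(n) * perp / n

# Hypothetical final KL divergences (two dataviews) from runs
# at three candidate perplexity values, for n = 1000 samples
n = 1000
runs = {20: [1.9, 2.1], 50: [1.2, 1.4], 100: [1.1, 1.5]}
scores = {p: multisne_perplexity_score(kl, p, n) for p, kl in runs.items()}
best = min(scores, key=scores.get)
print(best)  # perplexity minimising S(Perp); here 50
```

The log(n)·Perp/n term penalises large perplexities, so the criterion rewards embeddings that achieve a low total KL divergence without an excessively large neighbourhood size.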
Our conclusions about the superiority of multiSNE are further supported by using the Silhouette score as an alternative approach for evaluating the clusterings and tuning the parameters of the methods. In contrast to the measures used throughout the article, the Silhouette score does not require the number of clusters in the data set to be known, illustrating the applicability of the multiSNE approach in unsupervised learning problems where the underlying clusters of the samples are unknown (Fig. S8 in Section 4.7 of the Supplemental File). Similarly, we have illustrated that alternative clustering algorithms can be used for clustering the samples: by inputting the produced multiSNE embeddings into the DBSCAN algorithm, we further showed how the clusters of the samples can be identified (Figs. S9 and S10 in Section 4.8 of the Supplemental File).
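A minimal sketch of this evaluation, on a synthetic stand-in for the lowdimensional embedding: the Silhouette score requires no knowledge of the number of clusters, and DBSCAN recovers the clusters without being told how many exist.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.metrics import silhouette_score

# Synthetic 2D "embedding" with three clusters (illustrative stand-in
# for the lowdimensional multiSNE output)
rng = np.random.default_rng(2)
emb = np.vstack([rng.normal(c, 0.2, size=(60, 2)) for c in (0, 2, 4)])

# Silhouette score: evaluates a clustering without ground truth;
# values near 1 indicate compact, well-separated clusters
km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(emb)
print(silhouette_score(emb, km_labels))

# DBSCAN: discovers the clusters without specifying their number
# (eps and min_samples are illustrative choices for this toy embedding)
db_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(emb)
print(len(set(db_labels) - {-1}))  # clusters found, excluding noise points
```

On a well-separated embedding both routes agree; on a noisy one, a low Silhouette score or a fragmented DBSCAN labelling would flag the visualisation as unreliable.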
As part of our work, we explored the application of multiSNE for the visualisation and clustering of multiomics singlecell data. We illustrated that the integration of the dataviews results in better overall separation of the different cell types in the analysed data set. Pursuing this area of research further, it would be interesting to compare the proposed multiSNE approach with deep generative models proposed for the analysis of singlecell data, including the approaches of Ashuach et al. (2023) and Li et al. (2022).
Multiview clustering is a topic that has gathered a lot of interest in recent years, with a number of approaches published in the literature, including those proposed by Kumar, Rai & Hal (2011), Liu et al. (2013), Sun et al. (2015), Ou et al. (2016) and Ou et al. (2018). The handwritten digits data set presented in this manuscript has been analysed by the aforementioned studies for multiview clustering. Table 5 shows the NMI and accuracy values of the clusterings performed by the multiview clustering algorithms (as reported in the corresponding articles), together with the NMI and accuracy values of Kmeans clustering applied to the multiSNE lowdimensional embeddings (from 2 to 10 dimensions). On handwritten digits, the multiSNE variation with multiCCA as pretraining and weight adjustments had the best performance (Table 4). This variation of multiSNE with Kmeans was compared against the multiview clustering algorithms and was found to be the most accurate, while pretraining with PCA produced the highest NMI (Table 5). Applying Kmeans to the lowdimensional embeddings of multiSNE thus successfully clusters the observations of the data (see Fig. S7 in Section 4.6 of the Supplemental File for a threedimensional visualisation via multiSNE).
Table 5: Multiview clustering on the handwritten digits data set. Entries in the multiSNE columns are reported as PCA/multiCCA pretraining, for embeddings of 2 to 10 dimensions.

| Metric | Kumar, Rai & Hal (2011) | Liu et al. (2013) | Sun et al. (2015) | Ou et al. (2016) | Ou et al. (2018) | MultiSNE 2D | MultiSNE 3D | MultiSNE 5D | MultiSNE 10D |
|---|---|---|---|---|---|---|---|---|---|
| NMI | 0.768 | 0.804 | 0.876 | 0.785 | 0.804 | 0.863/0.838 | 0.894/0.841 | 0.897/0.848 | 0.899/0.850 |
| ACC | – | 0.881 | – | 0.876 | 0.880 | 0.822/0.914 | 0.848/0.915 | 0.854/0.922 | 0.849/0.924 |
The increasing availability of multiview, highdimensional and heterogeneous data requires novel visualisation techniques that integrate these data into expressive and revealing representations. In this manuscript, new multiview manifold learning approaches were presented and their performance across real and synthetic data sets with different characteristics was explored. The multiSNE approach is proposed as a unified solution for robust visualisation and subsequent clustering of multiview data.