Privacy-preserving k-NN interpolation over two encrypted databases

View article
PeerJ Computer Science

Introduction

Due to its low cost, scalability and reliability, cloud computing has increased its reputation in both the business and scientific communities. In addition to the benefits, it introduces new concerns that need to be addressed carefully (Krutz & Vines, 2010). One of the emerging issues in cloud computing is extracting knowledge from sensitive data while protecting the privacy of data owners, which is called privacy-preserving data mining (Agrawal & Srikant, 2000; Vaidya & Clifton, 2004). A privacy-preserving data mining method aims to provide data privacy using either data perturbation or cryptographic methods. Data perturbation-based models struggle with data quality issues, i.e. the valuable statistical information might be dissolved. This may yield less accurate and less reliable results. On the other hand, cryptographic-based models achieves the privacy of data owners through the encryption of data before outsourcing it to the cloud. However, this presents challenges of performing required operations over the encrypted data.

In addition to these facts, collaboration among different cloud service providers may also help them to create more accurate and reliable results in a privacy-preserving data mining method, i.e. more clouds can discover more knowledge than they can uncover on their own when they combine their data (Demir & Tugrul, 2018). There are some studies that propose privacy-preserving solutions for horizontally-partitioned databases to increase the total number of data samples with the goal of creating more accurate data mining models (Inan et al., 2007). In some cases, vertically-partitioned database solutions can be preferred to increase the number of attributes for the same instances (Skillicorn & McConnell, 2008). Institutions such as hospitals operating in different parts of a country may prefer the first choice. On the other hand, institutions such as banks and insurance companies may aggregate their data using the second choice.

In this study, we will examine the k-NN interpolation method that preserves the confidentiality of two different databases stored by two different cloud service providers. k-NN, categorized as a lazy learner, is a non-parametric method used for classification, clustering and interpolation which utilizes the idea that neighboring objects possess or display similar characteristics. Complex interpolation methods such as Kriging involve advanced operations and thus pose a great challenge to cloud computing. In addition, the high time requirements of such methods make them unsuitable in some scenarios such as healthcare applications. On the contrary, the simplicity and interpretability of the k-NN method make it an efficient tool for query processing tasks.

Our contribution

In this article, we introduce an efficient secure two-party k-NN (STPkNN) interpolation protocol that enables two different data owners to outsource their databases together with the query processing service to the cloud, and allows a query owner to extract the interpolation of the k-nearest neighbors of a query point from the encrypted databases. Our protocol preserves the confidentiality of data, assures the privacy of user’s query point, and hides data access patterns.

The STPkNN protocol can be considered as an extension of the protocol SkNNm proposed in Elmehdwi, Samanthula & Jiang (2014), that enables a query owner to retrieve the k-nearest neighbors of a query point from a single encrypted database, to two-cloud settings. Briefly, the SkNNm protocol calculates the k-nearest neighbors in an iterative way by performing the following steps k times: (i) it finds the minimum of the Euclidean distances between the data records and the query point, (ii) it calculates the one of the nearest neighbors that corresponds to the index of the minimum distance, and excludes the corresponding distance from the Euclidean distances. On the other hand, in two-cloud settings, the clouds have to share their local minimums of the Euclidean distances to decide on the global minimum that corresponds the index of the nearest neighbor of two databases at the moment, and remove that record from further iterations. However, it is not trivial to achieve this without revealing which data record corresponds to global minimum to any cloud.

To this aim, we first propose two new security primitives, the Secure Transformation (ST) protocol and the Secure Bit-AND-OR (SBAOR) protocol that enable the clouds to decide on the global minimum and exclude it from the further calculations without revealing data access pattern to any cloud. We show that both protocols protect the confidentiality of the input values which will be in encrypted form, i.e. no information about the input values is leaked to any party during the protocols, and the output is only revealed to one of the parties in the protocols. Briefly, the ST protocol allows the servers to securely transform the encryption of a record under a public key to an encryption of same record under another public key. On the other hand, for given the encryptions of two bit vectors x and y, the SBAOR protocol enables the servers to securely compute the negation of the logical disjunction of all bitwise multiplications xi · yi in encrypted form without revealing the bit vectors to any party.

By employing the ST and SBAOR protocols together with the other existing security protocols, we build our main protocol STPkNN that enables a query owner (QO) to extract the interpolation of the k-nearest neighbors of a query point chosen by QO from two different databases outsourced to two different cloud service providers. In the protocol, data owners encrypt their data before outsourcing them to the cloud service providers, and they do not participate in the STPkNN protocol. Thus, no information about the data is leaked to the cloud service providers during the protocol. Besides, our protocol guarantees that any record from both databases or any intermediate result generated in the protocol is not leaked to the cloud service providers. Also, it hides the data access pattern from both data owners and cloud service providers, i.e. the protocol does not reveal the information of which data records were used to produce the interpolation of k-nearest neighbors to any cloud service provider. On the other hand, the STPkNN protocol outputs the interpolation of k-nearest neighbors only to the query owner, and the query owner gets no information other than the interpolation.

We also conduct various experiments on two real-world datasets from the UCI machine learning repository, the cervical cancer (risk factors) dataset and the default of credit card clients dataset, to show the practicability of our protocol in real world scenarios. The experimental evaluation presents that our protocol scales well for the large datasets.

Related works

Due to its usefulness in many application scenarios such as classification, similarity search, and collaborative filtering, the problem of computing the k-nearest neighbors of a query point has been gained a lot of attention in recent years. The early studies mostly focused on how to implement a secure k-NN method between data owner and clients without using cloud systems. Shaneck, Kim & Kumar (2009) proposed a privacy-preserving protocol that employs secure multiparty computation to compute k-NN in horizontally partitioned databases. Besides, they also showed how their protocol can be efficiently used in different application such as outlier detection, classification, and clustering problems. Moreover, Qi & Atallah (2008) proposed a provable secure protocol for the single-step k-NN search problem that enjoys linear computation and communication complexity. Vaidya & Clifton (2005) introduced a privacy-preserving algorithm that performs top-k queries in vertically partitioned data. Additionally, Kantarcoğlu & Clifton (2004) proposed a method that privately calculates the k-NN classification over horizontally partitioned data in the distributed database model. Note that all of the above methods require the data owners to perform the necessary calculations to generate the result, and to return it directly to the query users. However, in our model, the data is outsourced to the cloud in encrypted form instead of being kept by the data owners. All of the computation required to process k-NN queries are performed by the cloud.

The recent studies have mostly focused on solutions in cloud computing settings. Wong et al. (2009) proposed an asymmetric scalar-product-preserving encryption (ASPE) scheme that can be employed to construct a secure k-NN protocol. The protocol proposed in Wong et al. (2009) uses a distance comparison function instead of an exact distance calculation. However, the secret key in the protocol should be disclosed to the query users. Zhu, Huang & Takagi (2016) introduced a secure protocol that achieves k-NN query processing on encrypted data without totally revealing the data owner’s secret key to the query user. However, their scheme requires data owners to be involved in the encryption of query points. Hu et al. (2011) proposed a secure traversal framework that can used, together with privacy homomorphism, to achieve secure k-NN query processing protocol. Cheng et al. (2015) proposed a privacy-preserving protocol that employs an encrypted hierarchical index tree to perform k-NN queries over spatial data outsourced to cloud in encrypted form. All three protocols (Hu et al., 2011; Zhu, Huang & Takagi, 2016; Cheng et al., 2015) leak data access pattern to the cloud. On the other hand, Kesarwani et al. (2018) proposed a secure k-NN query processing protocol over encrypted data by utilizing a leveled fully homomorphic encryption scheme. Wu et al. (2019) introduced a privacy preserving k-NN classification scheme over the encrypted cloud database that is secure against known-plaintext attack. Besides, Lei et al. (2020) shed light on the connection between a secure k-NN query processing scheme and a secure range query scheme. Based on this connection, they utilize a secure range query scheme together with a data structure named as random Bloom filter to build a secure k-NN query processing scheme. All three protocols (Kesarwani et al., 2018; Wu et al., 2019; Lei et al., 2020) hide data access pattern as well as preserving the data privacy and query privacy. However, they require the decryption keys to be given the query users. However, in our model, the decryption keys are not shared with the query users.

On the other hand, Elmehdwi, Samanthula & Jiang (2014) tackled with the same problem using homomorphic encryption method. In addition to ensuring the confidentiality of data owners and clients, the protocol proposed in Elmehdwi, Samanthula & Jiang (2014) also achieves to hide data access patterns from the clouds. Moreover, Xu et al. (2017) proposed an efficient secure k-NN protocol which achieves sublinear computational complexity. Similar to Elmehdwi, Samanthula & Jiang (2014), their protocol also achieves hiding of data access patterns using garbled circuits to simulate Oblivious RAM. Furthermore, Guo & Sun (2020) adopted the data structure R-tree to build an efficient k-NN scheme that requires only two rounds of interactions between the client and cloud servers to generate the result. They also utilized the Merkle hash tree techniques to obtain a better k-NN scheme that is secure against even a malicious cloud servers. There are also some studies that engage in location-based query processing over encrypted geospatial data (Lei et al., 2019; Lian et al., 2020). Lian et al. (2020) proposed an efficient k-NN scheme by employing the Moore curves together with the AES encryption scheme, that ensures the spatial data and location privacy.

The aforementioned studies use k-NN methods for either classification or query search applications. Unlike previous solutions, Kalideen, Osmanoglu & Tugrul (2019) proposed an efficient solution for the problem of computing the interpolation of k-NN to a given point in cloud computing settings. However, their solution reveals the knowledge of which data records were used to produce the interpolation to the cloud servers, and leaking such information might not be desired in some application required the data security. Unlike the protocol presented in Kalideen, Osmanoglu & Tugrul (2019), our protocol assures the desired security features, i.e. it hides data access pattern.

Problem formulation

In this section, we will give more precise definition of the problem and its security requirements.

Secure two-party k-NN interpolation problem

In our system there are two data owners DO1 and DO2 holding two different spatial databases D1 and D2, respectively. Each database Du consists of n records d1(u),,dn(u) such that each record di(u) is an m-dimensional spatial vector, i.e. di(u)=di,1(u),,di,m(u) where u = 1,2. There are also two cloud pairs (CSP1(u), CSP2(u)) so that each one is associated with a public key-secret key pair (pku, sku) of a public key encryption scheme that is semantically secure (Goldwasser & Micali, 1982). As the most of the studies in this field, we also consider each pair of cloud service providers (CSP1(u), CSP2(u)) as two non-colluding cloud servers, i.e. CSP1(u) stores the database and performs most of the homomorphic operations; on the other hand, CSP2(u) keeps the secret key and helps CSP1(u) to perform the complex operations over the ciphertexts.

In our problem, we assume that each data owner DOu initially encrypts his database Du as Epku(Du) where Epku(Du) consists of the attribute-wise encryptions Epku(di,j(u)) for 1 ≤ in and 1 ≤ jm. Each DOu then outsources Epku(Du) together with the query processing service to CSP1(u). Note that the underlying public key encryption scheme should enable cloud servers to perform homomorphic operations over ciphertexts.

There is also an authorized query owner QO who wants to retrieve the interpolation of k-nearest neighbors of a query point Q from both databases D1 and D2 stored in CSP1(1) and CSP1(2), respectively. After QO requests the interpolation, the cloud service providers generate the result by performing required operations over the encrypted databases. This process should output the interpolation of k-nearest neighbors only to the query owner. The query owner should not learn any information other than the interpolation during this process. We denote such process as secure two-party k-nearest neighbors (STPkNN) protocol. We remark that STPkNN protocol should preserve the confidentiality of the records in the databases D1 and D2, and protect the privacy of the query point. Moreover, the protocol should hide data access patterns, i.e. it should not reveal the information of which data records were used to produce the interpolation of k-nearest neighbors to any data owner or any cloud service provider.

Example

In 2016, European Union adopted a new regulation on the protection of personal data, Regulation (EU) 2016/679 of the European Parliament and Of The Council (European-Parliament, 2016). The regulation states that ‘the protection of natural persons in relation to the processing of personal data is a fundamental right’. All of the personal health records that reveal information relating to the past, current or future physical or mental health status of the data subject are considered as personal sensitive data in the regulation. Therefore, the personal health records should be protected against unauthorized parties, i.e. only the one approved by the owner should be able to access to the data.

On the other hand, the processing of health data may be significant to advance research or healthcare practices. Consider a doctor who tries to determine whether a person has a particular hearth disease or not by analyzing the medical records of the person. In addition, the doctor may desire to compare the patient’s medical records with other patients’ presenting similar properties in order to improve diagnostic accuracy. In fact, this comparison enables the doctor to evaluate the validity of some tests, especially when the scores do not match the expected values. Consequently, the doctor can make an accurate diagnosis, if he is allowed to reach the data of other patients in the same region or across the country. Moreover, if the personal health records are stored in the cloud as encrypted in order not to violate the fundamental right of the owner of the records, it will be possible to perform reliable analysis on large datasets.

Let us clarify it with an example. Consider the subset of heart disease data set from UCI Machine Learning Repository depicted in Table 1. There are 10 different instances shown in the table, and each instance is associated with five attributes: ID (patient’s identity), trestbps (resting blood pressure in mm Hg), chol (serum cholesterol in mg/dl), thalach (maximum heart rate achieved), and oldpeak (ST depression induced by exercise relative to rest). Assume the data owner, which can be viewed as hospital in this context, encrypts these attributes, and outsources the encrypted database Epk(D) together with the future query processing to the cloud. Also, assume there is a doctor who wants to determine whether a specific patient carries risk for a particular hearth disease. Let the medical record of the patient be Q=150,250,145,3. The doctor, that will be the query owner in our context, asks the interpolation of k-nearest neighbors of Q from the cloud by providing the encryption Epk(Q) to the cloud. Then, the cloud determines the interpolation of k-nearest neighbors by searching the encrypted database Epk(D). For simplicity, let k be 3 for this example. As we observe here, the instances having IDs 1, 7, and 9 will be the 3 nearest neighbors to Q. So, the cloud returns the interpolation T=141.6,251.6,152.3,2.4 to the doctor that will benefit from T to make an accurate diagnosis. Consequently, necessary analysis can be carried out without revealing any sensitive information about both his patient and the other patients.

Table 1:
A subset heart disease data set.
ID trestbps chol thalach oldpeak
1 145 233 150 2.3
2 160 286 108 1.5
3 120 229 129 2.6
4 130 250 187 3.5
5 130 204 172 1.4
6 120 236 178 0.8
7 140 268 160 3.6
8 120 354 163 0.6
9 130 254 147 1.4
10 140 203 155 3.1
DOI: 10.7717/peerj-cs.965/table-1

Effect of collaboration on interpolation accuracy

Aguilar et al. (2005) stated that if interpolation models are developed with an insufficient amount of data, they will be less accurate and reliable. Namely, the collaboration between participants affects the accuracy of interpolation models. We here conduct a series of experiments to assess the impact of collaboration between participants on the accuracy of prediction in the interpolation methods. In our experiments, we employ two publicly available datasets from U.S. National Geochemical Survey Database that present sodium (Na) content of the soil in two states: Colorado and Wisconsin. Summary statistics of both data sets are presented in Table 2.

Table 2:
Summary statistics of data sets.
Colorado Wisconsin
Mean 0.999 0.773
Median 0.912 0.771
Minimum 0.063 0.007
Maximum 3.230 2.043
Standard Deviation 0.460 0.299
Skewness 0.912 −0.177
Kurtosis 4.034 3.131
DOI: 10.7717/peerj-cs.965/table-2

There are various performance evaluation metrics for interpolation methods. We here employed Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE), which are often chosen as evaluation metrics for numerical prediction. The small values of both MAE and RMSE indicate that models will produce results that are more accurate. MAE and RMSE values are calculated as follows;

MAE=[1ni=1n|p(xi,yi)a(xi,yi)|]

RMSE=[1ni=1n(p(xi,yi)a(xi,yi))2]1/2

where n is the total number of data points in the dataset, p({xi, yi}) and z({xi, yi}) are predicted and actual values at location ({xi, yi}), respectively. The effects of varying k values on MAE and RMSE values are shown in Fig. 1.

Effects of varying k values on MAE and RMSE for Colorado and Wisconsin data sets.

Figure 1: Effects of varying k values on MAE and RMSE for Colorado and Wisconsin data sets.

We assume that two data holders share all data points in the data set. Both data sets are randomly divided into two parts using sampling without replacement strategy, assuming each party has one of the pieces. In some situations, data holders may not have data in equal proportions. So, we have determined different sharing ratios considering the cases where there is no equal distribution. We specify the β value as the distribution ratio, which means that if one party holds β portion of the data, the other party will hold the remaining portion (1 − β). After several trials, the MAE and RMSE values obtained according to the various number of nearest neighbor counts are shown in the Tables 3 and 4, respectively.

Table 3:
Effects of collaboration for varying k and β (split ratio between parties) values on MAE.
β Wisconsin Colorado
k 25 50 75 100 25 50 75 100
1 0.183 0.169 0.163 0.155 0.319 0.298 0.290 0.283
5 0.156 0.147 0.146 0.145 0.258 0.241 0.235 0.232
10 0.158 0.145 0.143 0.139 0.258 0.242 0.234 0.227
15 0.161 0.147 0.142 0.140 0.259 0.245 0.236 0.232
30 0.167 0.154 0.149 0.143 0.269 0.252 0.244 0.238
50 0.181 0.161 0.154 0.150 0.286 0.262 0.252 0.246
DOI: 10.7717/peerj-cs.965/table-3
Table 4:
Effects of collaboration for varying k and β (split ratio between parties) values on RMSE.
β Wisconsin Colorado
k 25 50 75 100 25 50 75 100
1 0.254 0.243 0.241 0.233 0.443 0.421 0.408 0.402
5 0.208 0.199 0.199 0.198 0.361 0.334 0.327 0.321
10 0.212 0.196 0.193 0.188 0.358 0.337 0.325 0.317
15 0.215 0.198 0.193 0.189 0.358 0.341 0.329 0.323
30 0.224 0.206 0.200 0.195 0.368 0.348 0.338 0.331
50 0.238 0.214 0.207 0.202 0.385 0.360 0.347 0.340
DOI: 10.7717/peerj-cs.965/table-4

As seen from the results, the smallest MAE and RMSE values are observed when the 10-nearest neighbors are used for all points in each data set. The smallest MAE and RMSE are underlined in the tables. As seen from the Table 3, if only half of the data is available for creating a prediction model, there will be a deterioration in MAE values of 4.31% for the Wisconsin data set and 6.60% for the Colorado data set. It is also possible to observe similar aspects in Table 4 for each split ratio. As observed from the results, the data holder who has less amount of data always produces less accurate predictions. On the contrary, if there is a sufficient amount of data, the predictions generated by the model are more accurate and reliable.

Premilinaries

In this section, we will present the notations and the definitions of some primitives that will be used in our proposed protocols.

Notation

We here give the notations used in this paper.

  • n, the number of records in each database

  • m, the number of the attributes in each record

  • , the domain size (in bits) of the squared Euclidean distance

  • DOu, the uth data owner

  • Du, the uth database

  • di(u), the ith record of the database Du

  • QO, the query owner

  • Q, the query point

  • [x], the encryption of the individual bits of x

  • dminp, the pth closest record to Q

  • (pku, sku), the public key-secret key pair assigned to the uth cloud pair

  • (CSP1(u), CSP2(u)), the uth cloud pair, i.e. the former holds the encryption of the database Epku(Du) and the latter holds the corresponding secret key sku.

Homomorphic encryption

Homomorphic encryption is an encryption scheme that allows users to perform some mathematical operations on ciphertexts, such as addition and multiplication. This property enables to protect the confidentiality of the data, and makes the encryption scheme a very practical and useful tool in cloud computing, especially for the sensitive data. For that reason, homomorphic encryption schemes have been gaining a lot of attention in recent years. Within this direction, many homomorphic encryption schemes have been proposed (Goldwasser & Micali, 1982; Elgamal, 1984; Boneh, Goh & Nissim, 2005). In this study we use a well-known homomorphic encryption system, the Paillier scheme, to construct our protocols.

Let Epk(·) be the encryption function with the public key pk and Dsk(·) be the decryption function with the secret key sk. For any given two plaintexts a and b, the Paillier scheme satisfies the following properties:

  • Addition: Dsk(Epk(a + b)) = Dsk(Epk(a) * Epk(b) mod N2)

  • Multiplication: Dsk(Epk(a * b)) = Dsk(Epk(a)bmod N2)

Note that the Paillier encryption scheme is semantically secure (Paillier, 1999).

Basic security primitives

Here, we briefly explain a set of basic security protocols. In these protocols, it’s assumed that there exist two semi-honest parties P1 and P2 joining the protocols, and the Paillier’s secret key is known only to one of them. We will also introduce two new security primitives in “Construction” that will be employed together with the basic primitives given here as building blocks in forming our construction.

Secure multiplication (SM) protocol

Consider two parties P1 and P2 such that the former holds (Epk(x), Epk(y)) and the latter holds the secret key sk, where x and y are not known to both parties. The protocol outputs Epk(x * y) to P1. Note that the output Epk(x * y) is only known to P1, and no information about x and y is revealed to any party during the protocol.

Secure squared Euclidean distance (SSED) protocol

The protocol considers two parties P1 and P2 with the inputs (Epk(X), Epk(Y)) and the secret key sk, respectively, and outputs Epk(|XY|2) to P1, where X and Y are m dimensional vectors. In the protocol, the encryption of squared Euclidean distance Epk(|XY|2) is only known to P1.

Secure bit-decomposition (SBD) protocol

The protocol considers P1 with the input Epk(x) and P2 with the secret key sk, and outputs the encryptions of the bit-decomposition of x as [x]=Epk(x1),,Epk(x), where 0x2. Note that the encryptions of bit-decomposition [x] is known to only P1.

Secure minimum (SMIN) protocol

In the protocol, P1 with the inputs ([x], [y]) and P2 with the secret key sk securely compute the encryption of individual bits of minimum between x and y as [min(x,y)]. Note that the output [min(x,y)] is only known to P1, and no information about x and y is revealed to any party during the protocol.

Secure minimum out of n numbers (SMINn) protocol

In the protocol, P1 with the inputs ([x1],…,[xn]) and P2 with the secret key sk securely compute [min(x1,…,xn)], where [min(x1,…,xn)] is the encryption of the individual bits of min(x1,…,xn). Note that the output [min(x1,…,xn)] is only known to P1, and no information about xi for any i is revealed to any party during the protocol.

Secure Bit-OR (SBOR) protocol

Consider two parties P1 and P2 such that the former holds (Epk(a), Epk(b)) and the latter holds the secret key sk, where a and b are two bits. The protocol outputs Epk(ab) to P1. The output Epk(ab) is only known to P1, and no information about a and b is revealed to any party during the protocol.

Since we don’t aim to study the existing protocols given above, we simply consider the most efficient implementation of them which were presented in Elmehdwi, Samanthula & Jiang (2014) and Samanthula, Hu & Jiang (2013). However, the implementation of the SMINn protocol given in Elmehdwi, Samanthula & Jiang (2014) fails for some inputs, i.e. it generates an incorrect output if the size of the input is given as n = 8k + 1 for some kZ. Let me illuminate it with an example: assume the protocol takes nine inputs ([x1],…,[x9]). At the last step, the protocol applies the SMIN protocol to the intermediate values [x1] and [x7]) (the encryptions of the local minimums), and outputs the encryption of 0 since [x7] was set to the encryption of zero at some previous steps. Therefore, independent of the inputs, the protocol always outputs the encryption of zero as the final output when the size of the input is given as n = 8k + 1 for some kZ. Thus, we develop a new implementation of the SMINn protocol that simply works as follows:

  1. The server P1 initially executes the SMIN protocol together with P2 on [x1] and [x2] to get [R1] = [min(x1,x2)] as the encryption of the individual bits of min(x1,x2),

  2. it then iteratively runs the SMIN protocol together with P2 on [Ri−1] and [xi+1] to get [Ri] = [min(Ri−1,xi+1)] as the encryption of the individual bits of min(x1,x2,…,xi+1) for i = 2… (n − 1).

Note that the final output of the iterative steps will be [Rn−1] = [min(x1,…,xn)], which is the encryption of the individual bits of min(x1,…,xn).

Construction

In this section, we first introduce two new security primitives: the Secure bit-AND-OR (SBAOR) protocol and Secure Transformation (ST) protocol. We then give the security analysis of this protocols. By utilizing SBAOR and ST protocols together with the basic security primitives given in “Premilinaries”, we construct our main protocol. Furthermore, we also give the security analysis of the main protocol and discuss the computation complexity at the end of this section.

Secure bit-AND-OR (SBAOR) protocol

The SBAOR protocol allows the servers to securely compute the negation of the logical disjunction of all bitwise multiplications xi · yi in encrypted form without revealing the bit vectors to any party. In the main protocol, it will help the servers to separate the index of the current closest record to the query point from all other records of both databases by assigning the encryption of 1 to that particular index and the encryption of 0 to all other indices. In this way, the servers will be able to calculate the current closest record, and remove the corresponding index from further calculations.

The protocol considers two parties P1 and P2 such that the former holds ([x], [y]) and the latter holds the secret key sk, where [x] and [y] are the encryption of individual bits of x and y. The protocol enables the parties P1 and P2 to securely compute the encryption Epk(A¯) where A¯=1A and A=(x1y1)(x2y2)(xy). The output Epk(A¯) is only known to P1, and no information about x and y is revealed to any party during the protocol.

In the protocol, P1 and P2 first runs the SM protocol on the inputs Epk(xi) and Epk(yi) to calculate Epk(xi * yi) for i[] where xi and yi are the i-th bits of x and y, respectively. Note that each Epk(xi * yi) is only revealed to P1. The server P1 then calculates Epk(x1y1xy) as follows:

  • it initially executes the SBOR protocol together with P2 on Epk(x1 * y1) and Epk(x2 * y2) to get Epk(R1)=Epk(x1y1x2y2),

  • it then iteratively runs the SBOR protocol together with P2 on Epk(Ri−1) and Epk(xi+1 * yi+1) to get Epk(Ri)=Epk(Ri1xi+1yi+1) for i=21.

Note that the final output of the iterative steps will be Epk(R1)=Epk(x1y1xy). Finally, P1 applies the equation Epk(R1¯)=Epk(1)Epk(R1)N1 to compute the final output.

Algorithm 1:
SBAOR.
Input: ([x], [y]) from P1 and sk from P2
Output: Epk(xy¯) to P1
1. P1 and P2;
fori = 1 to do
Epk(xiyi)SM(Epk(xi),Epk(yi));
2. P1 and P2;
Epk(R1)SBOR(Epk(x1y1),Epk(x2y2);
fori = 2 to − 1 do
Epk(Ri)SBOR(Epk(Ri1),Epk(xi+1yi+1));
3. P1;
Epk(xy¯)Epk(1R1)Epk(1)×Epk(R1)N1;
DOI: 10.7717/peerj-cs.965/table-5

Security Analysis of SBAOR: At the beginning of the protocol, the servers P1 and P2 execute the Secure Multiplication protocol. As emphasized in “Premilinaries”, the output of the protocol is only revealed to the server P1, and no information about the plaintexts xi and yi is revealed to any party during this protocol. Later, the servers run the Secure Bit-OR (SBOR) Protocol on the inputs Epk(Ri) and Epk(xi+1 * yi+1). The SBOR protocol outputs the new Epk(Ri+1) only to the server P1, and no information about the plaintexts is revealed to any party during the protocol. At the final, the server P1 only applies some homomorphic operations on the encryption Epk(R1) computed at the previous step. Therefore, the SBAOR protocol protects the confidentiality of the data, i.e. no information about the contents of the encryptions is revealed to any party during the protocol.

Secure Transformation (ST) protocol

The ST Protocol enables the servers to transform the encryption of a record under a public key to the encryption of same record under another public key. In the main protocol, the servers employ ST Protocol to collect the encryptions of local minimums, that indicate the indexes of local closest records of both databases, under the same public key so that they can decide the minimum among them.

The protocol considers three parties P1(u1) with the input Epku1(t), P2(u1) with the secret key sku1, and P1(u2). The protocol simply aims to transform the encryption of a record t under the public key pku1 to the encryption of t under the public key pku2. Note that no information about t is revealed to any party during the protocol and the output Epku2(t) is only known to the party P1(u2).

Briefly, P1(u1) first masks Epku1(t) with the randomly chosen vector rZNm as μ=Epku1(t)Epku1(r), and sends μ to P2(u1) and Epku2(r) to P1(u2). After getting μ, P2(u1) first decrypts it as μ=Dsku1(μ), then encrypts μ′ with the public key pku2 as Epku2(μ), and finally sends the encryption to P1(u2). After receiving the encryption, the party P1(u2) first removes the randomness r′ from the encryption Epku2(μ) and gets the encryption Epku2(t) as Epku2(t)=Epku2(μr). From the homomorphic property of the underlying encryption scheme, Epku2(μr) can easily be calculated as Epku2(μ)Epku2(r)N1.

Algorithm 2:
ST.
Input: Epku1(t) from P 1(u1) and sku1 from P 2(u1)
Output: Epku2(t) to P 1(u2)
1. P 1(u1);
μEpku1(t)×Epku1(r); r R ZNm;
send μ to P 2(u1) and Epku2(r) to P 1(u2);
2. P 2(u1);
μDsku1(μ);
send Epku2(μ) to P 1(u2);
3. P 1(u2);
Epku2(t)Epku2(μr)Epku2(μ)×Epku2(r)N1;
DOI: 10.7717/peerj-cs.965/table-6

Security Analysis of ST: At the beginning of the protocol, the servers P1(u1) randomizes the encryption Epku1(t) with rZNm before sending it to the server P2(u1). So, the decryption computed by P2(u1) will be uniformly random in ZNm. Besides, P 1(u2) locally subtracts the encryption of the randomness r′ under the public key pku2 from the encryption sent by P2(u1) by performing some homomorphic operations. Thus, the protocol does not reveal any information about the record t to any party.

Main protocol

In this section, we will give the construction of our main protocol that enables a query owner to extract the interpolation of k-nearest neighbors for a query point of his choice as shown in Fig. 2. As we stated in the “Introduction”, our construction can be viewed as an extension of the protocol presented in Elmehdwi, Samanthula & Jiang (2014) that proposes an efficient solution of the k-nearest neighbor query problem over encrypted database outsourced to a single cloud.

The STPkNN protocol: for u = 1, 2; (1) each DOu uploads its data to the server CSP1(u); (2) each DOu gives its secret key to CSP2(u); (3) QO sends its query point Q in encrypted form to the servers CSP1(u); (4) CSP1(u) and CSP2(u) find the local nearest neighbours; (5) CSP1(1) and CSP1(2) decide on the global nearest neighbour among the local nearest neighbors (4 and 5 are repeated k times in the protocol); (6) the final prediction value is forwarded to QO.

Figure 2: The STPkNN protocol: for u = 1, 2; (1) each DOu uploads its data to the server CSP1(u); (2) each DOu gives its secret key to CSP2(u); (3) QO sends its query point Q in encrypted form to the servers CSP1(u); (4) CSP1(u) and CSP2(u) find the local nearest neighbours; (5) CSP1(1) and CSP1(2) decide on the global nearest neighbour among the local nearest neighbors (4 and 5 are repeated k times in the protocol); (6) the final prediction value is forwarded to QO.

We assume that each data owner DOu has a database Du that consists of n records d1(u),,dn(u) where each di(u) is m-dimensional vector that lies in [0,2]. We also assume that there exist two non-colluding semi-honest cloud service providers, CSP1(u) and CSP2(u) for each database Du where CSP1(u) is given the encryption of the database Du and CSP2(u) is given the corresponding secret key sku.

Initially, each DOu encrypts his database Du as Epku(di,j(u)) where 1≤ in and 1≤ jm. Each DOu then outsources the encryptions of the database, together with the future query service to the clouds, i.e. DOu gives Epku(di,j(u)) to CSP1(u) and his secret key sku to CSP2(u). When the query owner (QO) wants to retrieve the interpolation of the k-nearest neighbors for a query point Q, he produces two encryptions of his query point Q as Epk1(Q)=Epk1(q1),,Epk1(qm) and Epk2(Q)=Epk2(q1),,Epk2(qm) using the public keys of the data owners DO1 and DO2, respectively; and gives each encryption Epku(Q) to the corresponding cloud service provider CSP1(u).

After receiving the encryption Epku(Q), each CSP1(u) runs the SSED protocol together with the corresponding server CSP2(u) on the input (Epku(Q),Epku(di(u))) where Epku(di(u))=Epku(di,1(u)),,Epku(di,m(u)) for 1 ≤ in, and obtains the encryption of the squared Euclidean distance between Q and di(u) as Epku(ei(u)) where ei(u)=|Qdi(u)|2. From the SSED protocol, Epku(ei(u)) is revealed only to CSP1(u).

Opposite to the protocol proposed in Kalideen, Osmanoglu & Tugrul (2019), instead of sending the encryptions Epku(ei(u)) to the server CSP2(u) where 1 ≤ in, that reveals the information of which indexes being used to compute the interpolation to CSP2(u), each CSP1(u) securely runs the SBD protocol with the server CSP2(u) on the inputs Epku(ei(u)) to compute [ei(u)]=Epku(ei,1(u)),,Epku(ei,(u)), the encryptions of the individual bits of ei(u). Note that [ei(u)] is only revealed to CSP1(u).

After this stage, the servers produce the interpolation of the k-nearest neighbours of the query point Q in an iterative way. In each iteration:

  • Each pair of servers CSP1(u) and CSP2(u) securely calculate the encryptions of the individual bits of the minimum value [emin(u)] among [e1(u)],,[en(u)] by running the protocol SMINn. Note that [emin(u)] is only revealed to CSP1(u).

  • Each CSP1(u) then locally calculates the encryption of emin from [emin(u)] as

Epku(emin(u))=i=01Epku(emin,i(u))2i1=Epku(emin,1(u)×21++emin,(u)).

  • At this stage of the protocol, the servers CSP1(1) and CSP1(2) have Epk1(emin(1)) and Epk2(emin(2)) as the encryption of the minimum distances. Now, the servers apply the following steps to decide the minimum among emin(1) and emin(2):

  • – the servers CSP1(2) with the input Epk2(emin(2)), CSP2(2) with the secret key sk2, and CSP1(1) securely runs the ST protocol to compute the encryption of emin(2) under the public key pk1. Note that Epk1(emin(2)) is only known to CSP1(1).

  • – the server CSP1(1) now executes the SBD protocol with the server CSP2(1) on the inputs Epk1(emin(1)) and Epk1(emin(2)) to compute [emin(1)] and [emin(2)] where [emin(u)]=Epk1(emin,1(u)),,Epk1(emin,(u)). From the SBD protocol, [emin(1)] and [emin(2)] are only revealed to CSP1(1).

  • CSP1(1) then runs the SMIN protocol with CSP2(1) on the inputs [emin(1)] and [emin(2)], and gets [emin]=[min{emin(1),emin(2)}].

  • – after getting [emin], CSP1(1) locally calculates the encryption of emin as

Epk1(emin)=i=01Epk1(emin,i)2i1.

  • – the servers CSP1(1) with the input Epk1(emin), CSP2(1) with the secret key sk1, and CSP1(2) securely runs the ST protocol to compute the encryption of emin under the public key pk2. Note that Epk2(emin) is only known to CSP1(2).

  • After identifying the minimum emin among emin(1) and emin(2), each CSP1(u) locally computes the encryption of difference (eminei(u)) for each i as Epku(λi(u))=Epku(eminei(u))=Epk(emin)Epk(ei(u))N1.

  • Each CSP1(u) then randomizes Epku(λi(u)) as Epku(αi(u))=Epku(λi(u))ri(u)=Epku(λi(u)ri(u)) where ri(u) is a random number in ZN. It is a fact that only one is the encryption of zero among all 2n encryptions Epku(αi(u)) and all others are the encryptions of some random numbers where i = 1 … n and u = 1, 2.

  • Each CSP1(u) securely runs the SBD protocol with the server CSP2(u) on the inputs Epku(αi(u)) to compute [αi(u)]=Epku(αi,1(u)),,Epku(αi,(u)), the encryptions of the individual bits of αi(u). Note that [αi(u)] is only revealed to CSP1(u).

  • After getting the encryptions [αi(u)], each CSP1(u) runs the SBAOR protocol with the server CSP2(u) on [αi(u)] and [1] for each i where [1]=Epku(1),,Epku(1), and gets Epku(βi(u))=Epku(αi(u)1¯). Note that one of the encryptions among all Epku(βi(u)) is Epku(1) and the remaining encryptions are Epku(0) where i ∈ [n] and u = 1,2. Furthermore, if βj(v)=1, then dj(v) is the closest record to Q from both databases.

  • CSP1(u) securely runs the SM protocol with CSP2(u) on the inputs Epku(βi(u)) and Epku(di,j(u)) to compute βi,j(u)=Epku(βi(u)di,j(u)), for 1 ≤ in and 1 ≤ jm. Then, each CSP1(u) can now calculate the encryption of its candidate for the first closest record d1(u) as Epku(d1(u))=Epku(d1,1(u)),,Epku(d1,m(u)) where Epku(d1,j(u))=i=1nβi,j(u). As we stated before, since only one of the encryptions among all Epku(βi(u)) is Epku(1) and the remaining are Epku(0), one of the encryptions Epku(d1(u)) will be the encryption of zero and the other one will be the encryption of nonzero number that will be the first closest record.

  • CSP1(2) with the input Epk2(d1(2)), CSP2(2) with the secret key sk2, and CSP1(1) securely runs the ST protocol to compute the encryption of d1(2) under the public key pk1. Note that Epk1(d1(2)) is only known to CSP1(1).

  • CSP1(1) now can calculate the encryption of the first closest record as Epk1(dmin1)=Epk1(d1(1))Epk1(d1(2)). From the homomorphic property of the underlying encryption scheme, Epk1(d1(1))Epk1(d1(2))=Epk1(d1(1)+d1(2)), and since one of them is zero, Epk1(d1(1)+d1(2)) will be the encryption of the first closest record from both databases.

  • As the final step of the first iteration, the first closest records dmin1 should be excluded from the further iterations. To this aim, each CSP1(u) securely executes the SBOR protocol with CSP2(u) on the inputs βi(u) and Epku(ei,h(u)) where 1 ≤ in and 1h. As the output of the protocol, CSP1(u) gets the encryptions of renewed distances as Epku(ei,h(u))=Epku(βi(u)ei,h(u)). Observe that if βj(u)=Epku(1) for a particular j, the corresponding distance ej(u) will take the maximum value, i.e. [ej(u)]=Epku(1),,Epku(1). On the other hand, if βi(u)=Epku(0), the SBOR protocol will have no effect on ei(u).

Because our protocol outputs the interpolation of the k-nearest neighbors of the query point Q, the server CSP1(1) does not need to keep all the nearest records separately. Instead, it gradually builds the interpolation, i.e. after each iteration, CSP1(1) adds the current closest record Epk1(dminp) to the previous sum Epk1(Sp1)=Epk1(dmin1++dminp1) as Epk1(Sp1)Epk1(dminp), and gets the current sum Epk1(Sp)=Epk1(dmin1++dminP).

After k iterations, CSP1(1) will have the sum Epk1(Sk)=Epk1(dmin1++dmink) as the encryption of the sum of the k-nearest neighbors of the query point Q. CSP1(1) then computes the randomization of the encryptions as γj=Epk1(Sk,j)Epk1(rj) where rj are random numbers in ZN and 1 ≤ jm. CSP1(1) then sends γj to CSP2(1) and rj to the query owner. Upon receiving γj, CSP2(1) decrypts them as γj=Dsk1(γj) and sends the decryptions to the query owner. The query owner QO then computes the sum of k-nearest record as Sk,j=γjrj where 1 ≤ jm. As the final step, QO computes the interpolation of k-nearest neighbors of Q as Sk,1/k,,Sk,m/k.

Security analysis

In this section, we will give the security analysis of the protocol shown in Algorithm 3. As we emphasized above, the data owners encrypt their data before outsourcing them to the cloud. Since they use the Paillier encryption scheme which is semantically secure, the data is not leaked to any cloud service provider. On the other hand, at the first step of Algorithm 3, the query point Q is encrypted before given to the corresponding cloud service providers. Similarly, since the underlying encryption scheme (the Paillier cryptosystem) is semantically secure, the query point Q is not revealed to any data owner or any cloud service provider.

Algorithm 3:
STPkNN.
Input: Epk1(Du) from CSP1(u); sku from CSP2(u); Q from QO
Output:T, the interpolation of k-nearest neighbors of Q
1. QO;
 a) compute Epku(Q)=Epku(q1),,Epku(qm);
 b) send each Epku(Q) to the corresponding server CSP1(u);
2. CSP1(u) and CSP2(u);
 fori = 1 to ndo
   Epku(ei(u))SSED(Epku(Q),Epku(di(u)));
    [ei(u)]SBD(Epku(ei(u)))
3. forp = 1 to kdo
  a) CSP1(u) and CSP2(u);
   – [emin(u)]SMINn([e1(u)],,[en(u)]);
   – CSP1(u) computes Epku(emin(u))i=01Epku(emin,i(u))2i1;
  b) CSP1(1), CSP2(1), CSP1(2), and CSP2(2);
   – CSP1(2), CSP2(2), and CSP1(1) execute Epk1(emin(2))ST(Epk2(emin(2)));
   – CSP1(1) and CSP2(1) compute [emin(u)]SBD(Epk1(emin(u)));
   – CSP1(1) and CSP2(1) compute [emin]SMIN(Epk1([emin(1)],[emin(2)]));
   – CSP1(1) computes Epk1(emin)i=01Epk1(emin,i)2i1;
   – CSP1(1), CSP2(1), and CSP1(2) execute Epk2(emin)ST(Epk1(emin));
  c) CSP1(u) and CSP2(u);
   fori = 1 to ndo
     Epku(λi(u))Epku(ei(u)emin);
     Epku(αi(u))Epku(λi(u))ri(u), where ri(u) R ZN;
      [αi(u)]SBD(Epku(αi(u)));
     Epku(βi(u))SBAOR([αi(u)],[1]);
  d) CSP1(u) and CSP2(u);
   fori = 1 to n and j = 1 to mdo
        βi,j(u)SM(Epku(βi(u)),Epku(di,j(u)));
      Epku(dp,j(u))i=1nβi,j(u)
  e) CSP1(2), CSP2(2), and CSP1(1);
   – Epk1(dp(2))ST(Epk2(dp(2)));
  f) CSP1(1);
   – Epk1(dminp)Epk1(dp(1)+dp(2))Epk1(d1(1))Epk1(d1(2));
   – Epk1(Sp)Epk1(Sp1)Epk1(dminp);
  g) CSP1(u) and CSP2(u);
   fori = 1 to n and h = 1 to do
       Epku(ei,h(u))SBOR(Epku(βi(u)),Epku(ei,h(u)))
4. CSP1(1);
 forj = 1 to mdo
   – γjEpk1(Sk,j)×Epk1(rj), where rjRZN;
   – sends γj to CSP 2(1) and rj to QO
5. CSP2(1);
 forj = 1 to mdo
   – γjDsk1(Epk1(γj));
   – sends γj to QO
6. QO;
 a) forj = 1 to mdo
     Sk,jγjrj
 b) computes the interpolation as T=Sk,1/k,,Sk,j/k;
DOI: 10.7717/peerj-cs.965/table-7

At the second step of Algorithm 3, the servers CSP1(u) and CSP2(u) execute the protocols SSED and SBD. As stated in Elmehdwi, Samanthula & Jiang (2014), the outputs of the protocols will be in the encrypted format, and will only be revealed to the servers CSP1(u). Besides, no information about the plaintexts is revealed to any party during these protocols. At the step 3(a) of each iteration in Algorithm 3, the output of the protocol SMINn is only revealed to the servers CSP1(u). Besides, the SMINn protocol guarantees that the servers involved in the protocol do not know which records from both databases correspond to the current minimum distances. Similarly, the output of the SMIN protocol executed at the step 3(b) of Algorithm 3 is only revealed to the server CSP1(1). Also, the protocol does not reveal which record corresponds to the current global minimum.

The servers also run the ST protocol at the steps 3(b) and 3(e) of Algorithm 3 to transform the encryption of the current minimum distance under the public key pku1 to the encryption under the public key pku2. As we explained at the beginning of this section, the ST protocol protects the content of the encryption from all parties involved in the protocol. Furthermore, at the step 3(c), each server CSP1(u) runs the SBAOR protocol with CSP2(u) that outputs either the encryption of 1 just for the index corresponding to the current global minimum or the encryption of 0 for all the other indexes. Note that the SBAOR protocol uses the protocols SM and SBD as sub procedures, and it does not leak the index that corresponds to the current global minimum. Thus, data access patterns are protected from all the involved servers through the protocol, i.e. the servers do not know which data records used in producing the interpolation of k-nearest neighbors.

In conclusion, the STPkNN protocol preserves the confidentiality of the data, secures the privacy of user’s query point, and hides data access patterns.

Complexity analysis

In this section, we will discuss the computation complexity of our protocol. The servers perform n instantiations of SSED and SBD protocols at the second step of the protocol. Since the computation complexity of the SSED protocol proposed in Elmehdwi, Samanthula & Jiang (2014) is bounded by O(m) multiplications and O(m) exponentiations, and the computation complexity of the SBD protocol proposed in Samanthula, Hu & Jiang (2013) is bounded by O() multiplications and O() exponentiations, the computation complexity of this step is bounded by O(n(m+)) multiplications and O(n(m+)) exponentiations.

On the other hand, at the third step of our protocol, the servers perform the following operations O(k) times: a single instantiation of SMINn protocol, a single instantiation of SMIN, 2 instantiations of ST protocol, n instantiations of SBD and SBAOR protocols, n · m instantiations of SM protocol, and n instantiations of SBOR protocol. The computation complexity of the SMINn protocol presented in this paper is bounded by O(n) multiplications and O(n) exponentiations and the computation complexity of the SMIN protocol presented in Elmehdwi, Samanthula & Jiang (2014) is bounded by O() multiplications and O() exponentiations. Besides, the ST protocol proposed in this paper, the SM protocol presented in Elmehdwi, Samanthula & Jiang (2014), and the SBOR protocol presented in Elmehdwi, Samanthula & Jiang (2014) only contain a constant number of multiplications and a constant number of exponentiations. Also, as we emphasized above, the computation complexity of the SBD protocol is bounded by O() multiplications and O() exponentiations (Samanthula, Hu & Jiang, 2013). Moreover, since the SBAOR protocol proposed in this paper deploys instantiations of SM protocol and   1 instantiations of SBOR protocols as sub procedures, the computation complexity of the SBAOR protocol bounded by by O() multiplications and O() exponentiations. Thus, the computation complexity of the third step is bounded by O(kn(m+)) multiplications and exponentiations at total.

In addition, the servers perform only O(m) operations at the remaining steps of the protocol. Thus, the total computation complexity of our protocol is bounded by O(kn(m+)) multiplications and exponentiations.

Performance evaluation

In this section, we evaluated the performance of the proposed protocol STPkNN by carrying out a number of experiments under different parameter settings. We deployed Paillier cryptosystem (Paillier, 1999) for the encryption, and implemented the proposed protocols in Java. All the experiments were performed on a virtual Linux machine with an IntelR XeonR Two-CoreTM CPU 2.20 GHz processor and 4 GB RAM running Ubuntu 16.04 LTS. For the experiments, we utilized two real data sets from UCI machine learning repository (Dua & Graff, 2017); Heart Disease that consists of 600 data records such that each one contains 14 attributes concerning heart disease diagnosis, and Bank Marketing that contains 800 data records such that each one includes 15 attributes that helps to predict whether a new client will pay a term deposit. We first processed these data sets so that they contain only non-negative integer values. We then split each data set into two equal parts so that each one will be operated by a single cloud pair. Note that, for all the measurements, the experiment was repeated for multiple query points and the average time taken to execute a query was reflected to the table.

We first evaluated the computation cost of STPkNN on finance data set in minutes for varying the number of nearest neighbors (k) and the number of attributes (m). As shown in Fig. 3A, if we fix the number of attributes as m = 6, the running time of our protocol varies from 74.08 to 226.16 min for finance data set when k is changed from 5 to 15. Besides, for m = 12, the running time of our protocol varies from 78.85 to 239.21 min when k is changed from 5 to 15. So the running time of our protocol grows linear with k. Also, we observe that the computation cost of our protocol increases by nearly a factor of 1.06 when m is doubled.

Running time of STPkNN for varying k values on the (A) finance and the (B) health data set.

Figure 3: Running time of STPkNN for varying k values on the (A) finance and the (B) health data set.

Similarly, we also evaluated the computation cost of STPkNN on heart disease data set in minutes for varying the number of nearest neighbors (k) and the number of attributes (m). As shown in Fig. 3B, if we set the number of attributes as m = 6, the running time of our protocol varies from 55.89 to 168.26 min when k is changed from 5 to 15. Besides, for m = 12, the running time of our protocol varies from 59.56 to 178.63 min when k is changed from 5 to 15. Thus, it is easy to observe that our protocol scales linearly with k. On the other hand, the running time of our protocol increases by almost a factor of 1.34 when the number of data records (n) is changed from 300 to 400. Thus, the running time of our protocol grows linear with n.

Conclusions

In this study, we proposed a secure k-NN method that produces an interpolation of k-nearest neighbors to a query point over encrypted databases. We here claimed that instead of using one, employing two different databases in the protocol will yield more accurate and reliable interpolation value. We validated this claim by conducting experiments on publicly available real data sets. We also showed that our protocol preserves the confidentiality of data, assures the privacy of user’s query point, and hides data access patterns. We finally analyzed the performance of the proposed protocol through a number of experiments under different parameter settings. As a future study, we will examine and expand our work to apply other interpolation methods on encrypted data in distributed architecture. We will extend our protocol, that considers two encrypted databases stored in two different clouds, to multi-cloud settings.

Supplemental Information

Processed Finance Data.

DOI: 10.7717/peerj-cs.965/supp-3

Processed Health Data.

DOI: 10.7717/peerj-cs.965/supp-4
  Visitors   Views   Downloads