Secure trustless text processing of sensitive documents

School of Applied Mathematics, Fundação Getulio Vargas, Rio de Janeiro, Rio de Janeiro, Brazil
DOI
10.7287/peerj.preprints.2994v1
Subject Areas
Cryptography, Data Science, Natural Language and Speech
Keywords
Sensitive documents, Machine learning, Hash functions, Document classification
Copyright
© 2017 Coelho et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Coelho FC, Cuconato B. 2017. Secure trustless text processing of sensitive documents. PeerJ Preprints 5:e2994v1

Abstract

Scaling up the analysis of sensitive or confidential documents is frequently limited by the small number of individuals with the clearance needed to access them. Cryptographic protocols that are compatible with text processing methods can greatly improve this situation, allowing large corpora of confidential documents to be processed automatically by "untrusted" third parties. In this paper we propose a protocol that allows text analytics tasks to be outsourced securely without compromising the confidentiality of the documents. The method scales to large corpora, with time complexity linear in the size of the corpus.
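The abstract does not spell out the protocol, so the following is only a minimal sketch of the general idea, assuming (based on the "Hash functions" keyword, not on any detail stated here) that each token is replaced by a keyed hash before the corpus leaves the owner's hands. The function names `hash_token` and `obfuscate`, the whitespace tokenizer, the choice of HMAC-SHA256, and the sample documents are all hypothetical illustrations, not the authors' method.

```python
import hashlib
import hmac
from collections import Counter


def hash_token(token: str, key: bytes) -> str:
    """Map a token to an opaque hex digest using a keyed hash (HMAC-SHA256)."""
    return hmac.new(key, token.encode("utf-8"), hashlib.sha256).hexdigest()


def obfuscate(document: str, key: bytes) -> list[str]:
    """Replace every whitespace-separated token with its keyed hash.

    Each token is visited once, so the cost grows linearly with corpus size.
    """
    return [hash_token(tok, key) for tok in document.lower().split()]


# Hypothetical secret key, kept by the document owner and never shared.
key = b"secret key held only by the document owner"

# Hypothetical sample corpus.
docs = ["the merger closes friday", "the audit closes monday"]

# The owner sends only the hashed token sequences to the third party.
hashed = [obfuscate(d, key) for d in docs]

# The third party can still build bag-of-words features over opaque tokens,
# because equal words map to equal digests under the same key.
counts = [Counter(h) for h in hashed]
shared = set(hashed[0]) & set(hashed[1])
```

In this sketch the third party sees only digests, yet token identity is preserved across documents, so frequency-based methods such as document classification remain applicable. Note that a deterministic mapping of this kind still exposes token frequencies, so it should be read as an illustration of the idea rather than as a security analysis.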

Author Comment

Preprint manuscript submitted to a peer-reviewed journal.