Secure trustless text processing of sensitive documents
School of Applied Mathematics, Fundação Getulio Vargas, Rio de Janeiro, Rio de Janeiro, Brazil
- Published
- Accepted
- Subject Areas
- Cryptography, Data Science, Natural Language and Speech
- Keywords
- Sensitive documents, Machine learning, Hash functions, Document classification
- Copyright
- © 2017 Coelho et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- 2017. Secure trustless text processing of sensitive documents. PeerJ Preprints 5:e2994v1 https://doi.org/10.7287/peerj.preprints.2994v1
Abstract
Scaling up the analysis of sensitive or confidential documents is frequently limited by the small number of individuals with the clearance needed to access them. Cryptographic protocols compatible with text processing methods can greatly improve this situation, allowing the automated processing of large corpora of confidential documents by "untrusted" third parties. In this paper we propose a protocol that allows secure outsourcing of text analytics tasks without compromising the confidentiality of the documents. The method scales to large corpora, with time complexity linear in the size of the corpus.
Author Comment
Preprint of a manuscript submitted to a peer-reviewed journal.