12 Grand challenges in single-cell data science

Algorithms for Reproducible Bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
Department of Paediatric Oncology, Haematology and Immunology, Medical Faculty, Heinrich Heine University, University Hospital, Düsseldorf, Germany
Computational Biology of Infection Research Group, Helmholtz Centre for Infection Research, Braunschweig, Germany
Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, United States of America
Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
Bioinformatics and Cellular Genomics, St Vincent's Institute of Medical Research, Fitzroy, Australia
Melbourne Integrative Genomics, School of BioSciences / School of Mathematics & Statistics, Faculty of Science, University of Melbourne, Melbourne, Australia
Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland, United States of America
Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, United Kingdom
The Alan Turing Institute, British Library, London, United Kingdom
Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
Department of Statistics, University of British Columbia, Vancouver, Canada
Department of Molecular Oncology, BC Cancer Agency, Vancouver, Canada
Data Science Institute, University of British Columbia, Vancouver, Canada
Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands
Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands
Molecular Pathology Unit and Center for Cancer Research, Massachusetts General Hospital Research Institute, Charlestown, United States of America
Department of Pathology, Harvard Medical School, Boston, United States of America
Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America
Department of Computer Science, Georgia State University, Atlanta, United States of America
Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Barcelona, Spain
Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, Canada
Life Sciences and Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, The Netherlands
Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
Oncode Institute, Utrecht, The Netherlands
Quantitative biology, Hubrecht Institute, Utrecht, The Netherlands
Institute for Advanced Study, University of Amsterdam, Amsterdam, The Netherlands
Department of Surgery and Cancer, The Imperial Centre for Translational and Experimental Medicine, Imperial College London, London, United Kingdom
Centre for Molecular and Biomolecular Informatics, Radboud University Medical Center, Nijmegen, The Netherlands
European Research Institute for the Biology of Ageing, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
Biometris, Wageningen University & Research, Wageningen, The Netherlands
Department of Immunohematology and Blood Transfusion, Leiden University Medical Center, Leiden, The Netherlands
Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
PRB lab, Delft University of Technology, Delft, The Netherlands
Division of Image Processing, Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands
Computer Science & Engineering Department, University of Connecticut, Storrs, United States of America
Cancer Research UK Cambridge Institute, Li Ka Shing Centre, University of Cambridge, Cambridge, United Kingdom
Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom
European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, United Kingdom
Center for Bioinformatics, Saarland University, Saarbrücken, Germany
Max Planck Institute for Informatics, Saarbrücken, Germany
Institute of Pathology, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
Computational Molecular Design, Zuse Institute Berlin, Berlin, Germany
Mathematics department, Mount Saint Vincent, New York, United States of America
Helmholtz Institute for RNA-based Infection Research, Helmholtz-Center for Infection Research, Würzburg, Germany
Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
Institute of Computational Biology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany
Division of Drug Discovery and Safety, Leiden Academic Center for Drug Research (LACDR) / Leiden University, Leiden, The Netherlands
The Laboratory of Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, Russia
Department of Computer Science, Princeton University, Princeton, United States of America
Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, United States of America
DOI
10.7287/peerj.preprints.27885v2
Subject Areas
Bioinformatics, Computational Biology, Data Science, Scientific Computing and Simulation
Keywords
single-cell, data science, data integration, sequencing, transcriptomics, phylogenomics, tumour heterogeneity, cell atlas, sparse data, developmental trajectory
Copyright
© 2019 Laehnemann et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Laehnemann D, Köster J, Szczurek E, McCarthy DJ, Hicks SC, Robinson MD, Vallejos CA, Beerenwinkel N, Campbell KR, Mahfouz A, Pinello L, Skums P, Stamatakis A, Stephan-Otto Attolini C, Aparicio S, Baaijens J, Balvert M, de Barbanson B, Cappuccio A, Corleone G, Dutilh B, Florescu M, Guryev V, Holmer R, Jahn K, Jessurun Lobo T, Keizer EM, Khatri I, Kiełbasa SM, Korbel JO, Kozlov AM, Kuo T, Lelieveldt BP, Mandoiu II, Marioni JC, Marschall T, Mölder F, Niknejad A, Rączkowski Ł, Reinders M, de Ridder J, Saliba A, Somarakis A, Stegle O, Theis FJ, Yang H, Zelikovsky A, McHardy AC, Raphael BJ, Shah SP, Schönhuth A. 2019. 12 Grand challenges in single-cell data science. PeerJ Preprints 7:e27885v2

Abstract

The recent upswing of microfluidics and combinatorial indexing strategies, further enhanced by very low sequencing costs, has turned single cell sequencing into an empowering technology; analyzing thousands, or even millions, of cells per experimental run is becoming a routine assignment in laboratories worldwide. As a consequence, we are witnessing a data revolution in single cell biology. Although some issues are similar in spirit to those experienced in bulk sequencing, many of the emerging data science problems are unique to single cell analysis; together, they give rise to the new realm of 'Single Cell Data Science'.

Here, we outline twelve challenges that will be central in bringing this new field forward. For each challenge, we review the current state of the art and formulate open problems, with an emphasis on the research goals that motivate them.

This compendium is meant to serve as a guideline for established researchers, newcomers and students alike, highlighting interesting and rewarding problems in 'Single Cell Data Science' for the coming years.

Author Comment

Version 2 fixes one previously misspelled author name and adds a middle initial for another. Middle initials were also harmonised to the 'X.' spelling, and the spacing on the title page was optimised.