Sequencing data discovery with MetaSeek

Department of Marine Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
Independent Researcher, Durham, NC, United States
Department of Veterinary Medicine, University of California, Davis, Davis, CA, United States
DOI: 10.7287/peerj.preprints.27804v1
Subject Areas: Bioinformatics, Data Science
Keywords: data discovery, ngs, sequence analysis, interactive visualization, sequence data mining
Copyright: © 2019 Hoarfrost et al.
Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article: Hoarfrost A, Brown N, Brown CT, Arnosti C. 2019. Sequencing data discovery with MetaSeek. PeerJ Preprints 7:e27804v1

Abstract

Sequencing data resources have increased exponentially in recent years, as has interest in large-scale meta-analyses of integrated next-generation sequencing datasets. However, curation of integrated datasets that match a user’s particular research priorities is currently a time-intensive and imprecise task. MetaSeek is a sequencing data discovery tool that enables users to flexibly search and filter on any metadata field to quickly find the sequencing datasets that meet their needs. MetaSeek automatically scrapes metadata from all publicly available datasets in the Sequence Read Archive, cleans and parses messy, user-provided metadata into a structured, standard-compliant database, and predicts missing fields where possible. MetaSeek provides a web-based graphical user interface and interactive visualization dashboard, as well as a programmatic API to rapidly search, filter, visualize, save, share, and download matching sequencing metadata.

The MetaSeek online interface is available at https://www.metaseek.cloud/. The MetaSeek database can also be accessed via an API to programmatically search, filter, and download all metadata. MetaSeek source code, metadata scrapers, and documentation are available at https://github.com/MetaSeek-Sequencing-Data-Discovery/metaseek/.

Additional guides and tutorials are available in the same GitHub repository and on the MetaSeek website. MetaSeek is distributed under an MIT license.
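To make the idea of filtering on any metadata field concrete, the following is a minimal sketch in Python. It does not call the MetaSeek API or reproduce its schema: the records, the field names (`run_id`, `library_source`, `env_biome`, `avg_read_length`), the operator names, and the function `filter_records` are all hypothetical and chosen only for illustration. It shows one simple way a set of field-level rules could be applied to metadata records that have already been downloaded.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical metadata records. The field names are illustrative only
# and are NOT taken from the MetaSeek database schema.
RECORDS: List[Dict[str, Any]] = [
    {"run_id": "RUN001", "library_source": "metagenomic",
     "env_biome": "marine", "avg_read_length": 150},
    {"run_id": "RUN002", "library_source": "metagenomic",
     "env_biome": "soil", "avg_read_length": 100},
    {"run_id": "RUN003", "library_source": "transcriptomic",
     "env_biome": "marine sediment", "avg_read_length": 250},
    {"run_id": "RUN004", "library_source": "metagenomic",
     "env_biome": None, "avg_read_length": 300},
]

# Hypothetical comparison operators, keyed by a short name.
OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "eq": lambda value, target: value == target,
    "gte": lambda value, target: value >= target,
    "lte": lambda value, target: value <= target,
    "contains": lambda value, target: (
        isinstance(value, str) and target.lower() in value.lower()
    ),
}

Rule = Tuple[str, str, Any]  # (field name, operator name, target value)


def filter_records(records: List[Dict[str, Any]],
                   rules: List[Rule]) -> List[Dict[str, Any]]:
    """Return the records that satisfy every rule.

    A record whose field is missing or None fails any rule on that field.
    """
    def matches(record: Dict[str, Any]) -> bool:
        for field, op, target in rules:
            value = record.get(field)
            if value is None or not OPS[op](value, target):
                return False
        return True

    return [record for record in records if matches(record)]


if __name__ == "__main__":
    marine_metagenomes = filter_records(
        RECORDS,
        [("library_source", "eq", "metagenomic"),
         ("env_biome", "contains", "marine")],
    )
    print([record["run_id"] for record in marine_metagenomes])
```

Treating a missing value as a failed match is a design choice made for this sketch. A tool that predicts missing fields, as the abstract describes, could instead fill such values before filtering.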

Author Comment

This is a preprint submission to PeerJ Preprints.