Standardized representation of the LIDC annotations using DICOM

Brigham and Women's Hospital / Harvard Medical School, Boston, MA, United States
Enthought Inc, Austin, TX, USA
PixelMed Publishing, LLC, Bangor, PA, USA
University of Arkansas for Medical Sciences, Little Rock, AR, United States
Frederick National Laboratory for Cancer Research, Frederick, MD, USA
Isomics Inc, Cambridge, MA, USA
Dana Farber Cancer Institute / Harvard Medical School, Boston, MA, USA
Brigham and Women's Hospital / Harvard Medical School, Boston, MA, USA
Fraunhofer MEVIS, Bremen, Germany
Mathematics/Computer Science Faculty, University of Bremen, Bremen, Germany
DOI
10.7287/peerj.preprints.27378v2
Subject Areas
Bioinformatics, Oncology, Radiology and Medical Imaging
Keywords
data descriptor, cancer imaging, imaging informatics, DICOM, medical image computing, data sharing
Copyright
© 2019 Fedorov et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Fedorov A, Hancock M, Clunie D, Brochhausen M, Bona J, Kirby J, Freymann J, Pieper S, Aerts H, Kikinis R, Prior F. 2019. Standardized representation of the LIDC annotations using DICOM. PeerJ Preprints 7:e27378v2

Abstract

The Lung Image Database Consortium and Image Database Resource Initiative (LIDC) conducted a multi-site reader study that produced a comprehensive database of Computed Tomography (CT) scans of over 1000 subjects, annotated by multiple expert readers. The result is hosted in the LIDC-IDRI collection of The Cancer Imaging Archive (TCIA). The annotations that accompany the images of the collection are stored in a project-specific XML representation. This complicates their reuse, since no general-purpose tools are available to visualize or query those objects, and it makes harmonization with other data of a similar type non-trivial. To make the LIDC dataset more FAIR (Findable, Accessible, Interoperable, Reusable) for the research community, we converted the annotations into a standardized representation using the Digital Imaging and Communications in Medicine (DICOM) standard. This manuscript is intended to serve as a companion to the dataset to facilitate its reuse.

Author Comment

The manuscript describes a public DICOM dataset of the annotations and measurements of the lung Computed Tomography (CT) images collected by the LIDC-IDRI project. Compared to the initial version, this version provides more extensive usage instructions and is accompanied by a Jupyter Notebook that illustrates how to use the dataset. The underlying dataset has also been updated as follows:

* DICOM Segmentation objects no longer encode empty slices, which reduces object size

* The coded terms used to describe the nodule annotations now include fewer non-standard (99QIICR) codes

* The SegmentLabel attribute of the DICOM SEG objects is now populated with the nodule annotation name instead of the generic "Nodule", to improve readability for the user
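The updated SEG objects can be inspected programmatically. The following is a minimal sketch, not part of the published dataset tooling: it assumes each SEG file has already been loaded with pydicom (for example with `pydicom.dcmread(path)`, where `path` is a placeholder) and relies only on standard DICOM Segmentation attributes. It maps each SegmentNumber to its SegmentLabel, and counts how many frames are stored for each segment. Because empty slices are no longer encoded, that count should correspond to the number of slices a nodule annotation actually covers. The helper names `segment_labels` and `frames_per_segment` are hypothetical.

```python
from collections import Counter


def segment_labels(seg):
    """Return {SegmentNumber: SegmentLabel} for a loaded DICOM SEG dataset."""
    # SegmentSequence holds one item per segment, each with a number and a label.
    return {item.SegmentNumber: item.SegmentLabel for item in seg.SegmentSequence}


def frames_per_segment(seg):
    """Count the encoded frames that reference each segment."""
    counts = Counter()
    # Each frame's functional groups identify the single segment it belongs to.
    for frame in seg.PerFrameFunctionalGroupsSequence:
        ref = frame.SegmentIdentificationSequence[0].ReferencedSegmentNumber
        counts[ref] += 1
    return dict(counts)
```

For example, `segment_labels(pydicom.dcmread(path))` would return the nodule annotation names stored in one SEG file. The functions use plain attribute access, so they work with any object exposing these DICOM keywords.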