As an IT support technician at a scientific institution, I would add a bullet point to Rule 1, "Know what to expect", about anticipating how much data you expect to accrue, especially data that will need long-term storage. Although other points touch on this, lack of forward planning for data storage is becoming a huge problem in the academic sector and can quickly result in a messy array of unsuitable and unsafe storage media. The most common problem is that a grant application has not included sufficient funds to store data for the long term: professional storage is substantially more expensive per terabyte than external hard drives, for instance.
Another minor point: no definition of JSON is given.
Thank you for writing down the ten rules. I wholeheartedly agree with them. If all researchers stuck to these rules, we would have fewer problems on the IT and data management side...
However, in biomedical research we are frequently confronted with situations where clinical data on specific diseases have been collected for a long time, sometimes decades. Re-acquiring data from these patients is often not possible for various reasons. Of course, research produces new findings during this time, leading to new research questions and use cases that were not foreseeable when the data collection started. In addition, new technology brings new data sources, adding new possibilities.
The trend towards inter-institutional collaborative research adds to the problem: several datasets with different histories and data definitions now have to be harmonized to provide a larger cohort.
Consequently, we will have to live with projects violating rules 1 and 2, because in many circumstances it is not possible to acquire a fresh, well-defined data set for each new project. To attenuate the negative effects of imperfect data definitions, I think it is important to rely on discipline-specific standards, as you already mentioned. Those standards, at least for baseline data, will help to reuse data in the future.