Thank you for writing down the ten rules. I wholeheartedly agree with them. If all researchers stuck to these rules, we would have fewer problems on the IT and data management side...
However, in biomedical research we are frequently confronted with situations where clinical data on specific diseases have been collected over a long time, sometimes decades. Re-acquiring data from these patients is often not possible for various reasons. Of course, research produces new findings during this time, leading to new research questions and use cases that were not foreseeable when the data collection started. In addition, new technology brings new data sources and, with them, new possibilities.
The trend towards inter-institutional collaborative research adds to the problem: now several datasets with different histories and data definitions have to be harmonized to provide a larger cohort.
Consequently, we will have to live with projects violating rules 1 and 2, because in many circumstances it is not possible to acquire a fresh, well-defined data set for each new project. To attenuate the negative effects of imperfect data definitions, I think it is important to rely on discipline-specific standards, as you already mentioned. Those standards, at least for baseline data, will help make data reusable in the future.