Data Stewardship – Module 1: MOOC: 4.3 Documenting your data: The Readme file

Besides rich metadata, the Readme file is the other means by which data should be documented. So why do you need a Readme file when your dataset is already archived with rich, well-structured metadata? The main difference lies in purpose: metadata provides machine-readable information, while the Readme file is meant for human readers. However, the Readme file may still be searchable by text mining tools. So although you should view the Readme file primarily as a human-readable guide to your dataset, text mining can make it easier for people to find your dataset through searching.

When writing your Readme file, you can reuse information from your Data Management Plan, provided you have prepared that plan carefully.

In this video you will learn why the Readme file is crucial in order to avoid misinterpretation of the dataset.

Transcript of video "How to structure ...: Documenting your data - The Readme-file"

Lessons learned:

The Readme file complements the metadata documentation, and is a human readable introduction and explanation of what information the dataset holds.
Even the Readme file may be searchable through text mining tools.
Take the needs of an outsider as the starting point, and include in the Readme file all information needed to make sure anyone is able to understand and interpret your dataset correctly, both now and also many years from now.
You should start entering information into your Readme file early, and update the file as new information is obtained.
The Readme file should be in a preferred archival format: either plain text encoded in UTF-8, or PDF/A.
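
As a minimal sketch of the points above, the following Python snippet builds a Readme skeleton and saves it as plain text in UTF-8. The section headings, the function names `build_readme` and `write_readme`, and the file name `README.txt` are assumptions made for illustration; they are not prescribed by the course or by any particular archive, so check the requirements of your own repository.

```python
from pathlib import Path

# Hypothetical section headings, chosen for illustration only.
SECTIONS = [
    "Title",
    "Creator",
    "Date of collection",
    "Description",
    "Methods",
    "File overview",
    "Variables and units",
    "Licence",
]


def build_readme(info: dict[str, str]) -> str:
    """Return the Readme text; sections without content get a placeholder."""
    lines = []
    for section in SECTIONS:
        lines.append(section.upper())
        # The placeholder marks what still has to be filled in later.
        lines.append(info.get(section, "TODO: fill in"))
        lines.append("")  # blank line between sections
    return "\n".join(lines)


def write_readme(path: Path, text: str) -> None:
    """Write the text as plain UTF-8 with Unix line endings."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(text)


if __name__ == "__main__":
    draft = build_readme({"Title": "Example dataset", "Creator": "A. Researcher"})
    write_readme(Path("README.txt"), draft)
```

Starting from a skeleton with placeholders makes it easy to begin early and to fill in sections as new information becomes available.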

Food for thought
Think through your own PhD project and the data you have collected, or plan to collect. What do you see as essential to include and explain in a Readme file, to make sure your dataset is understood correctly by outsiders?

Examples of informative Readme files:

Data and code to replicate "A dynamic and hierarchical spatial occupancy model for interacting species"

Replication data for: An integrated CO2 unit for heating, cooling and DHW installed in a hotel-Data from the field

Last modified: Monday, 19 December 2022, 11:52