Documentation and metadata
To ensure that you understand your own data and that others can use them easily, you should add documentation and metadata (data about data) to the documents and datasets you create.
What are data documentation and metadata?
Data documentation provides all the information required to interpret, understand and use data. Good documentation is also essential for successful data preservation.
Metadata means data about data and appears in a standardised format. The bibliographic records for items in the Library catalogue are a good example of metadata. In the context of data management, metadata forms a subset of data documentation that explains the purpose, origin, time reference, geographic location, creator, access conditions and terms of use of a data collection.
Why should I document my data?
Digital data are machine-readable. However, the task of interpreting data falls to humans, and for this to work we must have sufficient contextual information. The importance of documenting your data during the collection and analysis stages of your research therefore cannot be overestimated.
Whilst collecting your data you might be able to remember what all your classification systems mean, but the chances are slim that this will still be the case in a few months' or a year's time. Sufficient documentation, explaining the codes and classifications you have used, will guard against this loss of knowledge.
Others may also want to examine your data for many reasons, such as:
- understanding your findings
- verifying your results
- reviewing your submitted publication
- replicating your results
- designing a similar study
- archiving your data for access and re-use.
Good documentation will ensure that all of the above are possible, regardless of what system or software others might be using.
When and how do I include documentation/metadata?
You should document your data from the very beginning of your research project. Information can then be added as the project progresses. Include procedures for documentation in your data plan. Documentation can be added to your data at various levels:
Variable level documentation
This documentation can be included within the data or document itself, as a header or at a specified location within a file. Examples of variable level documentation can include information on labels, codes, classifications, missing values, derivations and aggregations.
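As an illustration of variable level documentation, the sketch below keeps a small codebook alongside the data and uses it to decode stored values. The variable name, the codes and the missing-value conventions are all hypothetical and chosen only for the example; a real project would use its own.

```python
# Hypothetical codebook for one survey variable.
# The variable name, codes and missing-value codes are illustrative only.
CODEBOOK = {
    "employment_status": {
        "label": "Employment status at time of interview",
        "codes": {1: "Employed", 2: "Unemployed", 3: "Student"},
        "missing": {-9: "Refused", -8: "Don't know"},
    },
}


def decode(variable, value):
    """Return the documented meaning of a coded value."""
    entry = CODEBOOK[variable]
    if value in entry["missing"]:
        return "Missing: " + entry["missing"][value]
    # Flag any code that the codebook does not explain.
    return entry["codes"].get(value, "Undocumented code")


print(decode("employment_status", 2))
print(decode("employment_status", -9))
```

Keeping labels, codes and missing values in one place means a reader who meets the value -9 in the data can look up what it means without asking the original researcher.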
File or database level documentation
This type of documentation explains how all the files that make up the dataset relate to each other, what format they are in, whether particular files are intended to replace other files, etc. A readme.txt file is an established way of accounting for all the files and folders in a project.
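One possible way to keep a readme.txt in step with the project is to generate its text from a list of files and descriptions. The sketch below assumes a hypothetical project name and file names; the layout of the readme is only a suggestion, not an established template.

```python
def build_readme(project, files):
    """Build readme text from a mapping of file name to description."""
    lines = [f"Project: {project}", "", "Files:"]
    # List files alphabetically so the readme is easy to scan.
    for name in sorted(files):
        lines.append(f"  {name}: {files[name]}")
    return "\n".join(lines) + "\n"


# Hypothetical files; the descriptions record format and replacement notes.
files = {
    "survey_2023_v2.csv": "Cleaned responses (CSV); replaces survey_2023_v1.csv",
    "codebook.txt": "Variable labels, codes and missing-value definitions",
}

readme_text = build_readme("Example study", files)
print(readme_text)
```

The resulting text can then be saved next to the data, for example with `pathlib.Path("readme.txt").write_text(readme_text)`.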
Project level documentation
This documentation explains the aims of the study, what the research questions/hypotheses are, what methodologies are being used, what instruments and measures are being used, etc. This information is contained in separate files accompanying the data in order to provide context, explanation, or instructions on data use or reuse. Examples of project level documentation include: working papers, laboratory books, questionnaires, interview guides, final project reports and publications.
Metadata
This is structured information that has the potential for machine-to-machine interoperability. It can be used to identify and locate the data via a web browser or web-based catalogue. Metadata descriptors are usually structured according to an international standard and are crucial if you intend to share your data online. Researchers usually create metadata when completing a data centre's deposit form or metadata editor. Fields include: title, description, abstract, creator, geographic location, keywords.
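To show what a machine-readable record with these fields might look like, the sketch below stores them as a Python dictionary, checks that none is empty, and serialises the record as JSON. The field names follow the list above, but the exact spelling of the keys, the sample values and the choice of JSON are assumptions made for the example; a data centre will prescribe its own schema and format.

```python
import json

# Field names taken from the list above; the key spelling is an assumption.
REQUIRED_FIELDS = [
    "title", "description", "abstract",
    "creator", "geographic_location", "keywords",
]

# Hypothetical record; all values are placeholders.
record = {
    "title": "Example household survey",
    "description": "Interview responses from an example study.",
    "abstract": "A short summary of the study and its data.",
    "creator": "A. Researcher",
    "geographic_location": "Example region",
    "keywords": ["survey", "households"],
}


def missing_fields(rec):
    """Return the required fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not rec.get(field)]


print(missing_fields(record))
print(json.dumps(record, indent=2))
```

Checking for empty fields before deposit is a small safeguard: a record with no title or creator is much harder to find in a catalogue.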