CMDI best practice guide
Current effort: see the CLARIN website.
The Best Practices Guide is maintained by members of the CMDI task force and the curation task force. See their meeting notes for details on the process.
The most recent version of the guide can be found in the GitHub repository (drafts repository).
The guide has also been published as CE-2017-1076.
Old information
Goal of this document: to give good insight into the process of metadata modelling with CMDI.
Timeline:
- 16 May - 1 June: collecting suggestions and a skeleton structure on this Trac page
- 2 June: finalize skeleton and distribute writing of sections
- 2 June - 30 June: author a first version
- 1 July: decide if the current version is good enough for publication (if so: publish it at www.clarin.eu)
- 1 July - 31 August: next iteration
Who: Oddrun Ohren, Axel Herold, Marc Kemps-Snijders, Matej Durco, Dieter Van Uytvanck (coordination), all CMDI coordinators and task force members (contributions)
Some ideas for sections
Introduction to CMDI
A more detailed and elaborated version of http://media.dwds.de/clarin/userguide/text/metadata_CMDI.xhtml (Axel?: to avoid duplication, the user guide text can probably be shortened and linked to this introduction once it is available)
Subjects to be included:
- general introduction to CMDI:
  - why component metadata (as opposed to more "monolithic" schemes)
  - the CMDI "ecosystem" (Component Registry, data providers, VLO, ...)
- technical building blocks:
  - persistent identifiers (in the context of MdSelfLink, ResourceProxy)
  - vocabularies, ISOcat values (relation to high-quality content)
  - resource relations (might also go to section "Modelling do's and don'ts")
- real-world examples
How to choose profiles and components
- discussion of profile proliferation
- order of preference for profile development:
  - re-use existing profiles/components
  - adapt existing profiles/components
  - create your own components
  - create your own profiles
- reference to the CmdiRecommendedComponents (list needs to be double-checked and updated)
Lifecycle management
Oddrun and Marc have some ideas for this chapter. Possibly include versioning and the outcomes of the interviews with Dutch centres.
- lifecycle management of instances
- lifecycle management of components (once CMDI 1.2 is available)
Conversion and migration
Some information is available in the FAQ section: http://www.clarin.eu/faq-page/274
Modelling do's and don'ts
Information (including examples) about:
- normalisation
- recommended metadata fields/ISOcat categories (e.g. the fields used as facets in the VLO are a good start)
- hierarchies, inheritance
- metadata versus data
This section could re-use information from the granularity document
Quality check list
Cross-check with modelling do's and don'ts
[This list is obsoleted by Cmdi/QualityCriteria]
- is the CMDI file schema-valid?
- are the header fields complete?
- is there a unique MdSelfLink?
- is there an MdCollectionDisplayName?
- does it contain ResourceProxy elements?
- is there a link to a LandingPage when available?
- is there an indication of the mime type?
- is the file too big to be useful? (if so, consider modelling at a finer granularity, e.g. several smaller records)
- sparseness: what about files that hardly contain any information?
- what is the information entropy? (many very similar files might indicate suboptimal modelling)
- ...
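Several of the checks above can be decided from the files themselves and are easy to script. The following Python sketch assumes the CMDI 1.1 envelope (namespace http://www.clarin.eu/cmd/, with the mime type given as the mimetype attribute of ResourceType). The set of header fields treated as required and the sparseness threshold min_filled are hypothetical choices, and the sample record, its PID and its profile ID are made up. Schema validation is left out, since it needs an XSD-aware library and the profile schema.

```python
import xml.etree.ElementTree as ET

# Assumption: CMDI 1.1 envelope namespace (CMDI 1.2 uses a different one).
NS = {"cmd": "http://www.clarin.eu/cmd/"}

# Hypothetical selection of header fields treated as required.
REQUIRED_HEADER_FIELDS = ("MdSelfLink", "MdProfile", "MdCollectionDisplayName")


def _text(el):
    """Stripped text of an element, or '' if the element is missing or empty."""
    return (el.text or "").strip() if el is not None else ""


def check_record(xml_text, min_filled=3):
    """Return human-readable problems found in one CMDI record.

    min_filled is a hypothetical sparseness threshold: the minimum number
    of non-empty leaf elements expected under Components.
    """
    root = ET.fromstring(xml_text)
    header = root.find("cmd:Header", NS)
    if header is None:
        return ["missing Header"]

    problems = []
    for field in REQUIRED_HEADER_FIELDS:
        if not _text(header.find(f"cmd:{field}", NS)):
            problems.append(f"missing or empty {field}")

    proxies = root.findall(
        "cmd:Resources/cmd:ResourceProxyList/cmd:ResourceProxy", NS)
    if not proxies:
        problems.append("no ResourceProxy elements")

    types = [p.find("cmd:ResourceType", NS) for p in proxies]
    # Only a warning: the script cannot know whether a landing page exists.
    if proxies and "LandingPage" not in [_text(t) for t in types]:
        problems.append("no LandingPage link")

    for proxy, rtype in zip(proxies, types):
        # For proxies of type Resource, expect a mimetype attribute on ResourceType.
        if _text(rtype) == "Resource" and not rtype.get("mimetype"):
            problems.append(f"ResourceProxy {proxy.get('id')} has no mimetype")

    # Sparseness: count leaf elements under Components that carry text.
    components = root.find("cmd:Components", NS)
    filled = 0
    if components is not None:
        filled = sum(1 for el in components.iter()
                     if len(el) == 0 and _text(el))
    if filled < min_filled:
        problems.append(f"sparse record: only {filled} filled field(s)")
    return problems


def duplicate_self_links(records):
    """Return MdSelfLink values that occur in more than one record."""
    seen, duplicates = set(), set()
    for xml_text in records:
        link = _text(ET.fromstring(xml_text).find("cmd:Header/cmd:MdSelfLink", NS))
        if link and link in seen:
            duplicates.add(link)
        seen.add(link)
    return duplicates


# Made-up sample record; the PID and profile ID are placeholders.
SAMPLE = """<CMD xmlns="http://www.clarin.eu/cmd/" CMDVersion="1.1">
  <Header>
    <MdSelfLink>hdl:1234/abc</MdSelfLink>
    <MdProfile>clarin.eu:cr1:p_0000000000000</MdProfile>
  </Header>
  <Resources>
    <ResourceProxyList>
      <ResourceProxy id="r1">
        <ResourceType>Resource</ResourceType>
        <ResourceRef>http://example.org/a.wav</ResourceRef>
      </ResourceProxy>
    </ResourceProxyList>
  </Resources>
  <Components/>
</CMD>"""

if __name__ == "__main__":
    for problem in check_record(SAMPLE):
        print(problem)
```

On the sample, check_record reports the missing MdCollectionDisplayName, the missing landing page, the missing mime type of proxy r1 and an empty Components section. duplicate_self_links is meant to be run over a whole collection, since the uniqueness of an MdSelfLink cannot be judged from a single file.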