CMDI best practice guide
Goal of this document: to give good insight into the process of metadata modelling with CMDI.
Timeline:
- 16 May - 1 June: collecting suggestions and a skeleton structure on this trac page
- 2 June: finalize skeleton and distribute writing of sections
- 2 June - 30 June: author a first version
- 1 July: decide if the current version is good enough for publication (if so: publish it at www.clarin.eu)
- 1 July - 31 August: next iteration
Who: Oddrun Ohren, Axel Herold, Marc Kemps-Snijders, Matej Durco, Dieter Van Uytvanck (coordination), all CMDI coordinators and taskforce members (contributions)
Some ideas for sections
Introduction to CMDI
More detailed and elaborated version of http://media.dwds.de/clarin/userguide/text/metadata_CMDI.xhtml (Axel?: to avoid duplication the userguide text can probably be shortened and linked to this introduction once it is available)
subjects to be included:
- general introduction to CMDI:
- why component metadata (as opposed to more "monolithic" schemes)
- CMDI "eco system" (Component Registry, Data providers, VLO, ...)
- technical building blocks:
- persistent identifiers (in the context of MdSelfLink, ResourceProxy)
- vocabularies, ISOcat values (relation to high-quality content)
- resource relations (might also go to section "Modelling do's and don'ts")
- real-world examples
How to choose profiles and components
- discussion of profile proliferation
- preference for profile development:
- re-use existing profiles/components,
- adapt existing profiles/components,
- create your own components,
- create your own profiles
- reference to the CmdiRecommendedComponents (list needs to be double-checked and updated)
Lifecycle management
Oddrun and Marc have some ideas for this chapter. Possibly include versioning and outcomes of Dutch centre interviews.
- lifecycle management of instances
- lifecycle management of components (once CMDI-1.2 is available)
Conversion and migration
Some information is available in the FAQ section: http://www.clarin.eu/faq-page/274
Modelling do's and don'ts
information (including examples) about:
- normalisation
- recommended metadata fields/ISOcat categories (e.g. fields that will be used as facets in the VLO are a good start)
- hierarchies, inheritance
- metadata versus data
This section could re-use information from the granularity document
Quality check list
cross-check with modelling do's and don'ts
[This list is obsoleted by Cmdi/QualityCriteria]
- is the CMDI file schema-valid?
- are the header fields complete?
- is there a unique MdSelfLink?
- is there an MdCollectionDisplayName?
- does it contain ResourceProxy elements?
- is there a link to a LandingPage when available?
- is there an indication of the mime type?
- is the file too big to be useful? (> suggest higher granularity)
- sparseness: what about files that hardly contain any information?
- what is the information entropy? (> lots of very similar files might be an indication of suboptimal modelling)
- ...
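Some of the header and resource checks above could be automated. The following is only a minimal sketch, assuming CMDI 1.1 records in the namespace http://www.clarin.eu/cmd/ with the usual Header / Resources layout; the function names check_cmdi and duplicate_self_links and the sample record are hypothetical and not part of any CLARIN tool. Schema validation, file size, sparseness and entropy are deliberately left out.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: CMDI 1.1 namespace; other CMDI versions use different namespaces.
CMD_NS = {"cmd": "http://www.clarin.eu/cmd/"}


def check_cmdi(xml_text):
    """Return a dict mapping a few checklist items to True/False for one record."""
    root = ET.fromstring(xml_text)

    def text_of(path):
        # Stripped text of the first matching element, or "" if it is absent.
        node = root.find(path, CMD_NS)
        return (node.text or "").strip() if node is not None else ""

    proxies = root.findall(
        "cmd:Resources/cmd:ResourceProxyList/cmd:ResourceProxy", CMD_NS
    )
    types = [p.find("cmd:ResourceType", CMD_NS) for p in proxies]
    types = [t for t in types if t is not None]

    return {
        "has_self_link": bool(text_of("cmd:Header/cmd:MdSelfLink")),
        "has_collection_name": bool(
            text_of("cmd:Header/cmd:MdCollectionDisplayName")
        ),
        "has_resource_proxy": len(proxies) > 0,
        "has_landing_page": any(
            (t.text or "").strip() == "LandingPage" for t in types
        ),
        "has_mime_type": any(t.get("mimetype") for t in types),
    }


def duplicate_self_links(records):
    """Return the MdSelfLink values that occur in more than one record."""
    links = []
    for xml_text in records:
        node = ET.fromstring(xml_text).find("cmd:Header/cmd:MdSelfLink", CMD_NS)
        if node is not None and node.text:
            links.append(node.text.strip())
    return sorted(link for link, n in Counter(links).items() if n > 1)


# Hypothetical sample record, for illustration only.
SAMPLE = """<CMD xmlns="http://www.clarin.eu/cmd/" CMDVersion="1.1">
  <Header>
    <MdSelfLink>hdl:0000/example-record</MdSelfLink>
  </Header>
  <Resources>
    <ResourceProxyList>
      <ResourceProxy id="r1">
        <ResourceType mimetype="text/plain">Resource</ResourceType>
        <ResourceRef>http://example.org/data.txt</ResourceRef>
      </ResourceProxy>
    </ResourceProxyList>
  </Resources>
  <Components/>
</CMD>"""

report = check_cmdi(SAMPLE)
for item, ok in report.items():
    print(item, "ok" if ok else "MISSING")
```

Uniqueness of the MdSelfLink can only be judged across a whole collection, which is why it is a separate function taking a list of records rather than part of the per-record report.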