Version 12 (modified by matej.durco@oeaw.ac.at, 9 years ago)

CMDI best practice guide

Goal of this document: to give good insight into the process of metadata modelling with CMDI.

Timeline:

  • 16 May - 1 June: collecting suggestions and a skeleton structure on this trac page
  • 2 June: finalize skeleton and distribute writing of sections
  • 2 June - 30 June: author a first version
  • 1 July: decide if the current version is good enough for publication (if so: publish it at www.clarin.eu)
  • 1 July - 31 August: next iteration

Who: Oddrun Ohren, Axel Herold, Marc Kemps-Snijders, Matej Durco, Dieter Van Uytvanck (coordination), all CMDI coordinators and taskforce members (contributions)

Some ideas for sections

Introduction to CMDI

A more detailed and elaborated version of http://media.dwds.de/clarin/userguide/text/metadata_CMDI.xhtml (Axel?: to avoid duplication, the user guide text can probably be shortened and linked to this introduction once it is available)

subjects to be included:

  • general introduction to CMDI:
    • why component metadata (as opposed to more "monolithic" schemes)
    • CMDI "ecosystem" (Component Registry, data providers, VLO, ...)
  • technical building blocks:
    • persistent identifiers (in the context of MdSelfLink, ResourceProxy)
    • vocabularies, ISOcat values (relation to high-quality content)
    • resource relations (might also go to section "Modelling do's and don'ts")
  • real-world examples

How to choose profiles and components

  • discussion of profile proliferation
  • order of preference for profile development:
    1. re-use existing profiles/components,
    2. adapt existing profiles/components,
    3. create your own components,
    4. create your own profiles
  • reference to the CmdiRecommendedComponents (list needs to be double-checked and updated)

Lifecycle management

Oddrun and Marc have some ideas for this chapter. Possibly include versioning and outcomes of Dutch centre interviews.

  • lifecycle management of instances
  • lifecycle management of components (once CMDI-1.2 is available)

Conversion and migration

Some information is available in the FAQ section: http://www.clarin.eu/faq-page/274

Modelling do's and don'ts

information (including examples) about:

  • normalisation
  • recommended metadata fields/ISOcat categories (e.g. fields that will be used as facets in the VLO are a good start)
  • hierarchies, inheritance
  • metadata versus data

This section could re-use information from the granularity document

Quality check list

Cross-check with "Modelling do's and don'ts"

[This list is obsoleted by Cmdi/QualityCriteria]

  • is the CMDI file schema-valid?
  • are the header fields complete?
    • is there a unique MdSelfLink?
    • is there an MdCollectionDisplayName?
  • does it contain ResourceProxy elements?
    • is there a link to a LandingPage when available?
    • is there an indication of the mime type?
  • is the file too big to be useful? (> suggest higher granularity)
  • sparseness: what about files that hardly contain any information?
  • what is the information entropy? (> lots of very similar files might be an indication of suboptimal modelling)
  • ...
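
The header and ResourceProxy questions in the list above could be automated roughly as follows. This is only a minimal sketch, not an official validator: the element names MdSelfLink, MdCollectionDisplayName, ResourceProxy and LandingPage come from the list itself, while the ResourceType element, its mimetype attribute, the namespace and the handles in the sample record are assumptions made for illustration. Elements are matched by local name so the sketch does not depend on a particular CMDI namespace. Schema validity and the uniqueness of MdSelfLink across a collection are not covered here.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def find_all(root, name):
    """Return every element (root included) with the given local name."""
    return [el for el in root.iter() if local(el.tag) == name]


def check_cmdi(xml_text):
    """Return a dict mapping each checklist item to True or False."""
    root = ET.fromstring(xml_text)

    def has_text(name):
        # True if at least one element with this name has non-blank text.
        return any((el.text or "").strip() for el in find_all(root, name))

    # Assumption: each ResourceProxy has a ResourceType child whose text is
    # e.g. "Resource" or "LandingPage" and which may carry a mimetype attribute.
    types = find_all(root, "ResourceType")
    return {
        "has_self_link": has_text("MdSelfLink"),
        "has_collection_name": has_text("MdCollectionDisplayName"),
        "has_resource_proxy": len(find_all(root, "ResourceProxy")) > 0,
        "has_landing_page": any(
            (t.text or "").strip() == "LandingPage" for t in types
        ),
        "has_mime_type": any(t.get("mimetype") for t in types),
    }


# Hypothetical record: the namespace and the handles are made up.
SAMPLE = """<CMD xmlns="http://www.clarin.eu/cmd/">
  <Header>
    <MdSelfLink>hdl:0000/example-record</MdSelfLink>
  </Header>
  <Resources>
    <ResourceProxyList>
      <ResourceProxy id="r1">
        <ResourceType mimetype="text/plain">Resource</ResourceType>
        <ResourceRef>hdl:0000/example-resource</ResourceRef>
      </ResourceProxy>
    </ResourceProxyList>
  </Resources>
</CMD>"""

report = check_cmdi(SAMPLE)
for item, ok in report.items():
    print(f"{item}: {'ok' if ok else 'MISSING'}")
```

For the sample record, the sketch reports a missing MdCollectionDisplayName and a missing LandingPage link; the other three checks pass.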