
CMDI best practice guide

Current effort: see the CLARIN website.

The Best Practices Guide is maintained by members of the CMDI task force and curation task force. See their meeting notes for details on the process.

The most recent version of the guide can be found in the GitHub repository (drafts repository).

The guide has also been published as CE-2017-1076.

Old information

NOTE: The information below was previously (May - July 2014) gathered on this page and is not necessarily reflected in the current best practices guide!

Goal of this document: to give good insight into the process of metadata modelling with CMDI.

Timeline:

  • 16 May - 1 June: collecting suggestions and a skeleton structure on this trac page
  • 2 June: finalize skeleton and distribute writing of sections
  • 2 June - 30 June: author a first version
  • 1 July: decide if the current version is good enough for publication (if so: publish it at www.clarin.eu)
  • 1 July - 31 August: next iteration

Who: Oddrun Ohren, Axel Herold, Marc Kemps-Snijders, Matej Durco, Dieter Van Uytvanck (coordination), all CMDI coordinators and taskforce members (contributions)

Some ideas for sections

Introduction to CMDI

A more detailed version of http://media.dwds.de/clarin/userguide/text/metadata_CMDI.xhtml (Axel?: to avoid duplication, the userguide text can probably be shortened and linked to this introduction once it is available)

subjects to be included:

  • general introduction to CMDI:
    • why component metadata (as opposed to more "monolithic" schemes)
    • CMDI "ecosystem" (Component Registry, data providers, VLO, ...)
  • technical building blocks:
    • persistent identifiers (in the context of MdSelfLink, ResourceProxy)
    • vocabularies, ISOcat values (relation to high-quality content)
    • resource relations (might also go to section "Modelling do's and don'ts")
  • real-world examples
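To make the technical building blocks concrete, here is a minimal sketch that assembles the envelope of a CMDI record with Python's ElementTree, putting persistent identifiers in MdSelfLink and in the ResourceRef of each ResourceProxy. It assumes the CMDI 1.1 envelope namespace (http://www.clarin.eu/cmd/) and its header element order; the function build_record and every identifier and URL in the usage part are hypothetical placeholders. Components is left empty, so the output is a skeleton, not a schema-valid record for any profile.

```python
import xml.etree.ElementTree as ET

# Assumption: CMDI 1.1 envelope namespace (CMDI 1.2 uses a different one).
CMD_NS = "http://www.clarin.eu/cmd/"
ET.register_namespace("", CMD_NS)  # serialize as the default namespace


def q(tag):
    """Return the namespace-qualified name of a CMDI envelope element."""
    return f"{{{CMD_NS}}}{tag}"


def build_record(self_link, profile_id, collection_name, proxies):
    """Build a skeleton CMDI record (hypothetical helper).

    proxies: list of (proxy_id, resource_type, mimetype_or_None, pid_or_url)
    """
    root = ET.Element(q("CMD"), {"CMDVersion": "1.1"})

    # Header: the record's own PID goes into MdSelfLink.
    header = ET.SubElement(root, q("Header"))
    ET.SubElement(header, q("MdSelfLink")).text = self_link
    ET.SubElement(header, q("MdProfile")).text = profile_id
    ET.SubElement(header, q("MdCollectionDisplayName")).text = collection_name

    # Resources: one ResourceProxy per referenced resource or landing page.
    resources = ET.SubElement(root, q("Resources"))
    proxy_list = ET.SubElement(resources, q("ResourceProxyList"))
    for proxy_id, rtype, mimetype, ref in proxies:
        proxy = ET.SubElement(proxy_list, q("ResourceProxy"), {"id": proxy_id})
        type_el = ET.SubElement(proxy, q("ResourceType"))
        type_el.text = rtype
        if mimetype:
            type_el.set("mimetype", mimetype)
        ET.SubElement(proxy, q("ResourceRef")).text = ref
    ET.SubElement(resources, q("JournalFileProxyList"))
    ET.SubElement(resources, q("ResourceRelationList"))

    # The profile-specific component payload would go here; empty in this sketch.
    ET.SubElement(root, q("Components"))
    return root


if __name__ == "__main__":
    record = build_record(
        "hdl:0000/example-record",          # hypothetical PID
        "clarin.eu:cr1:p_example",          # hypothetical profile id
        "Example collection",
        [
            ("lp1", "LandingPage", "text/html", "https://example.org/landing"),
            ("r1", "Resource", "audio/x-wav", "hdl:0000/example-audio"),
        ],
    )
    print(ET.tostring(record, encoding="unicode"))
```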

How to choose profiles and components

  • discussion of profile proliferation
  • order of preference for profile development:
    1. re-use existing profiles/components,
    2. adapt existing profiles/components,
    3. create your own components,
    4. create your own profiles
  • reference to the CmdiRecommendedComponents (list needs to be double-checked and updated)

Lifecycle management

Oddrun and Marc have some ideas for this chapter. Possibly include versioning and the outcomes of the interviews with Dutch centres.

  • lifecycle management of instances
  • lifecycle management of components (once CMDI-1.2 is available)

Conversion and migration

Some information is available in the FAQ section: http://www.clarin.eu/faq-page/274

Modelling do's and don'ts

information (including examples) about:

  • normalisation
  • recommended metadata fields/ISOcat categories (e.g. fields that are used as facets in the VLO are a good start)
  • hierarchies, inheritance
  • metadata versus data

This section could re-use information from the granularity document

Quality check list

Cross-check with modelling do's and don'ts.

[This list is obsoleted by Cmdi/QualityCriteria]

  • is the CMDI file schema-valid?
  • are the header fields complete?
    • is there a unique MdSelfLink?
    • is there an MdCollectionDisplayName?
  • does it contain ResourceProxy elements?
    • is there a link to a LandingPage when available?
    • is there an indication of the mime type?
  • is the file too big to be useful? (if so, suggest higher granularity)
  • sparseness: what about files that hardly contain any information?
  • what is the information entropy? (lots of very similar files might indicate suboptimal modelling)
  • ...
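Several items on this list lend themselves to automated checking. The following is a minimal standard-library sketch of how a data provider might check one record, assuming CMDI 1.1 records (namespace http://www.clarin.eu/cmd/) and the ResourceType `mimetype` attribute. The function check_record and its thresholds max_bytes and min_filled_fields are hypothetical, not CLARIN-defined values; schema validation and entropy across a collection are deliberately left out.

```python
import xml.etree.ElementTree as ET

# Assumption: records use the CMDI 1.1 envelope namespace.
NS = {"cmd": "http://www.clarin.eu/cmd/"}


def check_record(xml_text, seen_self_links, max_bytes=1_000_000, min_filled_fields=3):
    """Return a list of problems found in one CMDI record (hypothetical helper).

    seen_self_links is shared across a collection so duplicate MdSelfLinks
    are detected. max_bytes and min_filled_fields are illustrative thresholds.
    Schema validity is NOT checked here.
    """
    problems = []
    if len(xml_text.encode("utf-8")) > max_bytes:
        problems.append("record larger than max_bytes; consider higher granularity")
    root = ET.fromstring(xml_text)

    # Header: a unique MdSelfLink and an MdCollectionDisplayName.
    self_link = root.findtext("cmd:Header/cmd:MdSelfLink", "", NS).strip()
    if not self_link:
        problems.append("missing MdSelfLink")
    elif self_link in seen_self_links:
        problems.append(f"duplicate MdSelfLink: {self_link}")
    else:
        seen_self_links.add(self_link)
    if not root.findtext("cmd:Header/cmd:MdCollectionDisplayName", "", NS).strip():
        problems.append("missing MdCollectionDisplayName")

    # ResourceProxy elements: present, with a mimetype, ideally a LandingPage.
    proxies = root.findall("cmd:Resources/cmd:ResourceProxyList/cmd:ResourceProxy", NS)
    if not proxies:
        problems.append("no ResourceProxy elements")
    kinds = set()
    for proxy in proxies:
        type_el = proxy.find("cmd:ResourceType", NS)
        kind = (type_el.text or "").strip() if type_el is not None else ""
        kinds.add(kind)
        if kind == "Resource" and not type_el.get("mimetype"):
            problems.append(f"proxy {proxy.get('id')}: Resource without mimetype")
    if proxies and "LandingPage" not in kinds:
        problems.append("no LandingPage proxy (check whether one is available)")

    # Sparseness: count leaf elements under Components that carry text.
    components = root.find("cmd:Components", NS)
    filled = 0
    if components is not None:
        filled = sum(1 for el in components.iter() if len(el) == 0 and (el.text or "").strip())
    if filled < min_filled_fields:
        problems.append(f"sparse record: only {filled} filled field(s)")
    return problems
```

Passing the same `seen_self_links` set to every call over a collection is what turns the per-record check into the "unique MdSelfLink" check; a fresh set per record would only detect missing links.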
Last modified on 10/26/18 12:08:28