
Version 9 (modified by Dieter Van Uytvanck, 10 years ago)


CMDI best practice guide

Goal of this document: to give good insight into the process of metadata modelling with CMDI.

Timeline:

  • 16 May - 1 June: collecting suggestions and a skeleton structure on this trac page
  • 2 June: finalize skeleton and distribute writing of sections
  • 2 June - 30 June: author a first version
  • 1 July: decide whether the current version is good enough for publication (if so, publish it at www.clarin.eu)
  • 1 July - 31 August: next iteration

Who: Oddrun Ohren, Axel Herold, Marc Kemps-Snijders, Matej Durco, Dieter Van Uytvanck (coordination), all CMDI coordinators and taskforce members (contributions)

Some ideas for sections

Introduction to CMDI

A more detailed and elaborate version of http://media.dwds.de/clarin/userguide/text/metadata_CMDI.xhtml

subjects to be included:

  • persistent identifiers (in the context of MdSelfLink, ResourceProxy)
  • vocabularies, ISOcat values (relation to high-quality content)
  • resource relations (might also go to section "Modelling do's and don'ts")
  • real-world examples

How to choose profiles and components

reference to the CmdiRecommendedComponents (list needs to be double-checked and updated)

Lifecycle management

Oddrun and Marc have some ideas for this chapter. Possibly include versioning and outcomes of Dutch centre interviews.

Conversion and migration

some information available in the FAQ section: http://www.clarin.eu/faq-page/274

Modelling do's and don'ts

information (including examples) about:

  • normalisation
  • hierarchies, inheritance
  • metadata versus data

This section could re-use information from the granularity document

Quality checklist

see also Taskforces/Curation

  • is the CMDI file schema-valid?
  • are the header fields complete?
    • is there a unique MdSelfLink?
    • is there an MdCollectionDisplayName?
  • does it contain ResourceProxy elements?
    • is there a link to a LandingPage when available?
    • is there an indication of the mime type?
  • is the file too big to be useful? (if so, suggest a higher granularity)
  • sparseness: what about files that hardly contain any information?
  • what is the information entropy? (many very similar files might indicate suboptimal modelling)
  • ...
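
Several of the questions above can be checked automatically. The following is a minimal Python sketch, not an official CLARIN tool: the function names (check_record, duplicate_self_links), the "empty leaf ratio" as a rough sparseness measure, and the sample record with its handle and example.org URLs are all assumptions made for illustration. Elements are matched by local name only, so the namespace is ignored. Schema validation is left out because it needs the profile's XSD and a validating parser.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local(tag):
    """Strip any '{namespace}' prefix so elements are matched by local name."""
    return tag.rsplit("}", 1)[-1]


def find_all(root, name):
    """Return every element (root included) whose local name equals `name`."""
    return [el for el in root.iter() if local(el.tag) == name]


def text_of(root, name):
    """Return the stripped text of the first element called `name`, or ''."""
    for el in find_all(root, name):
        return (el.text or "").strip()
    return ""


def check_record(xml_text):
    """Answer a few of the checklist questions for one CMDI record."""
    root = ET.fromstring(xml_text)
    proxies = find_all(root, "ResourceProxy")
    types = [el for p in proxies for el in p if local(el.tag) == "ResourceType"]
    # Leaf elements have no child elements; an empty leaf carries no text.
    leaves = [el for el in root.iter() if len(el) == 0]
    empty = [el for el in leaves if not (el.text or "").strip()]
    return {
        "has_self_link": bool(text_of(root, "MdSelfLink")),
        "has_collection_name": bool(text_of(root, "MdCollectionDisplayName")),
        "proxy_count": len(proxies),
        "has_landing_page": any(
            (t.text or "").strip() == "LandingPage" for t in types
        ),
        # Proxies of type "Resource" that lack a mimetype attribute.
        "resources_without_mimetype": sum(
            1 for t in types
            if (t.text or "").strip() == "Resource" and not t.get("mimetype")
        ),
        # Assumed sparseness measure: share of leaf elements without text.
        "empty_leaf_ratio": len(empty) / len(leaves) if leaves else 0.0,
    }


def duplicate_self_links(records):
    """Return the MdSelfLink values that occur in more than one record."""
    counts = Counter(text_of(ET.fromstring(r), "MdSelfLink") for r in records)
    return sorted(link for link, n in counts.items() if link and n > 1)


# Hypothetical record; the handle and URLs are made up for this example.
SAMPLE = """<CMD xmlns="http://www.clarin.eu/cmd/">
  <Header>
    <MdSelfLink>hdl:0000/example-record</MdSelfLink>
    <MdCollectionDisplayName>Example collection</MdCollectionDisplayName>
  </Header>
  <Resources>
    <ResourceProxyList>
      <ResourceProxy id="lp">
        <ResourceType>LandingPage</ResourceType>
        <ResourceRef>https://example.org/record</ResourceRef>
      </ResourceProxy>
      <ResourceProxy id="r1">
        <ResourceType mimetype="text/plain">Resource</ResourceType>
        <ResourceRef>https://example.org/record/file.txt</ResourceRef>
      </ResourceProxy>
    </ResourceProxyList>
  </Resources>
  <Components><Example><Title>Example title</Title><Description/></Example></Components>
</CMD>"""

report = check_record(SAMPLE)
for question, answer in report.items():
    print(f"{question}: {answer}")
```

Uniqueness of the MdSelfLink can only be judged across a whole collection, which is why duplicate_self_links takes a list of records rather than a single one. Any threshold for an acceptable empty leaf ratio would have to be agreed on separately; the sketch only reports the number.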