= CMDI best practice guide =

Current effort: see the [https://www.clarin.eu/content/cmdi-best-practice-guide CLARIN website].

The Best Practices Guide is maintained by members of the [[Taskforces/CMDI|CMDI task force]] and the [[Taskforces/Curation|curation task force]]. See their [[Taskforces/CMDI/Meetings|meeting notes]] for details on the process.

The most recent version of the guide can be found in the [https://github.com/clarin-eric/cmdi-best-practices/releases GitHub repository] ([https://github.com/cmdi-taskforce/cmdi-best-practices drafts repository]). The guide has also been published as [https://office.clarin.eu/v/CE-2017-1076-CMDI-Best-Practices.pdf CE-2017-1076].

== Old information ==

{{{#!div class="system-message"
'''NOTE''': The information below was gathered on this page between May and July 2014 and is not necessarily reflected in the current best practices guide!
}}}

Goal of this document: to give good insight into the process of metadata modelling with CMDI.

Timeline:
 * 16 May - 1 June: collect suggestions and a skeleton structure on this Trac page
 * 2 June: finalize the skeleton and distribute the writing of sections
 * 2 June - 30 June: author a first version
 * 1 July: decide whether the current version is good enough for publication (if so, publish it at www.clarin.eu)
 * 1 July - 31 August: next iteration

Who: Oddrun Ohren, Axel Herold, Marc Kemps-Snijders, Matej Durco, Dieter Van Uytvanck (coordination); all CMDI coordinators and task force members (contributions)

=== Some ideas for sections ===

==== Introduction to CMDI ====

A more detailed and elaborated version of http://media.dwds.de/clarin/userguide/text/metadata_CMDI.xhtml ([[herold|Axel]]: to avoid duplication, the user guide text can probably be shortened and linked to this introduction once it is available).

Subjects to be included:
 * general introduction to CMDI:
   * why component metadata (as opposed to more "monolithic" schemes)
   * the CMDI "eco system" (Component Registry, data providers, VLO, ...)
 * technical building blocks:
   * persistent identifiers (in the context of !MdSelfLink and !ResourceProxy)
   * vocabularies, ISOcat values (relation to high-quality content)
   * resource relations (might also go to the section "Modelling do's and don'ts")
 * real-world examples

==== How to choose profiles and components ====

 * discussion of profile proliferation
 * order of preference for profile development:
   1. re-use existing profiles/components
   2. adapt existing profiles/components
   3. create your own components
   4. create your own profiles
 * reference to CmdiRecommendedComponents (the list needs to be double-checked and updated)

==== Lifecycle management ====

Oddrun and Marc have some ideas for this chapter, possibly including versioning and the outcomes of the Dutch centre interviews.
 * lifecycle management of instances
 * lifecycle management of components (once CMDI 1.2 is available)

==== Conversion and migration ====

Some information is available in the FAQ section: http://www.clarin.eu/faq-page/274

==== Modelling do's and don'ts ====

Information (including examples) about:
 * normalisation
 * recommended metadata fields/ISOcat categories (e.g. the fields used as facets in the VLO are a good start)
 * hierarchies, inheritance
 * metadata versus data

This section could re-use information from the [http://www.clarin.eu/sites/default/files/AP3-007-CMDI_and_granularity.pdf granularity document].

==== Quality check list ====

Cross-check with the modelling do's and don'ts. [This list has been superseded by Cmdi/QualityCriteria]
 * is the CMDI file schema-valid?
 * are the header fields complete?
 * is there a unique !MdSelfLink?
 * is there an !MdCollectionDisplayName?
 * does it contain !ResourceProxy elements?
 * is there a link to a !LandingPage when one is available?
 * is there an indication of the MIME type?
 * is the file too big to be useful? (if so, suggest a higher granularity)
 * sparseness: what about files that hardly contain any information?
 * what is the information entropy? (lots of very similar files might indicate suboptimal modelling)
 * ...
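Several items on the quality check list are mechanical and could be automated. The Python sketch below (standard library only) illustrates a few of them: presence of !MdSelfLink and !MdCollectionDisplayName, presence of !ResourceProxy elements, a mime type on resource proxies, a crude sparseness count, and !MdSelfLink uniqueness across a set of records. It is a hypothetical illustration, not an existing CLARIN tool or part of the CMDI specification. Assumptions: records follow the CMDI envelope layout (Header, Resources/ResourceProxyList, Components); elements are matched by local name so the namespace version does not matter; the mime type is expected as a `mimetype` attribute on `ResourceType` for proxies of type `Resource`; the sparseness threshold is an arbitrary made-up value. Schema validation, landing-page availability and information entropy are not covered.

```python
"""Hypothetical sketch of some CMDI quality checks (not an official tool)."""
import xml.etree.ElementTree as ET
from collections import Counter

MIN_FILLED_FIELDS = 3  # hypothetical sparseness threshold, chosen arbitrarily

# Hypothetical sample record; identifiers and URLs are made up.
SAMPLE = """<CMD xmlns="http://www.clarin.eu/cmd/" CMDVersion="1.1">
  <Header>
    <MdSelfLink>hdl:1234/abc</MdSelfLink>
  </Header>
  <Resources>
    <ResourceProxyList>
      <ResourceProxy id="r1">
        <ResourceType>Resource</ResourceType>
        <ResourceRef>http://example.org/data.txt</ResourceRef>
      </ResourceProxy>
    </ResourceProxyList>
  </Resources>
  <Components><Text><Title>Example</Title></Text></Components>
</CMD>"""


def local(tag: str) -> str:
    # Strip a "{namespace}" prefix so the checks ignore the CMDI namespace.
    return tag.rsplit("}", 1)[-1]


def find_all(root: ET.Element, name: str) -> list:
    # All elements (root included) whose local name matches.
    return [el for el in root.iter() if local(el.tag) == name]


def text_of(root: ET.Element, name: str) -> str:
    # Stripped text of the first matching element, or "" if absent/empty.
    found = find_all(root, name)
    return (found[0].text or "").strip() if found else ""


def check_record(xml_text: str) -> list:
    """Return a list of problems found in a single CMDI record."""
    root = ET.fromstring(xml_text)
    problems = []
    if not text_of(root, "MdSelfLink"):
        problems.append("missing MdSelfLink")
    if not text_of(root, "MdCollectionDisplayName"):
        problems.append("missing MdCollectionDisplayName")
    proxies = find_all(root, "ResourceProxy")
    if not proxies:
        problems.append("no ResourceProxy elements")
    for proxy in proxies:
        for rtype in find_all(proxy, "ResourceType"):
            # Assumption: only proxies pointing to actual resources need a mime type.
            if (rtype.text or "").strip() == "Resource" and not rtype.get("mimetype"):
                problems.append(f"ResourceProxy {proxy.get('id')} has no mimetype")
    # Sparseness: count leaf elements below Components that carry a value.
    filled = sum(
        1
        for comp in find_all(root, "Components")
        for el in comp.iter()
        if len(el) == 0 and (el.text or "").strip()
    )
    if filled < MIN_FILLED_FIELDS:
        problems.append(f"sparse record: only {filled} filled field(s)")
    return problems


def duplicate_self_links(records: list) -> list:
    """MdSelfLink values that occur in more than one record."""
    counts = Counter(text_of(ET.fromstring(r), "MdSelfLink") for r in records)
    return [link for link, n in counts.items() if link and n > 1]


if __name__ == "__main__":
    print(check_record(SAMPLE))
    print(duplicate_self_links([SAMPLE, SAMPLE]))
```

For the hypothetical sample record, `check_record` reports the missing !MdCollectionDisplayName, the missing mime type on proxy `r1`, and a sparse record with only one filled field; passing the same record twice to `duplicate_self_links` flags `hdl:1234/abc` as a non-unique !MdSelfLink. Matching on local names keeps the sketch independent of the CMDI version, at the cost of not distinguishing envelope elements from identically named elements inside a profile.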