wiki:Cmdi/QualityCriteria

Version 1 (modified by matej.durco@oeaw.ac.at, 9 years ago)

started page, integrated criteria from CmdiBestPracticeGuide and MDQAS

This page serves to collect quality criteria for metadata records (and - by extension - profiles).

A number of such lists have been proposed; we aim to gather them here in one place.

Related work

  • CmdiBestPracticeGuide#Qualitycheck-list (criteria from there moved to this list)
  • MDQAS: an older draft specifying an MD quality assessment service (criteria from there moved to this list)
  • Curation Module: will operationalize the defined quality criteria and allow checking whether metadata records adhere to them
  • LREC 2014 paper by Thorsten Trippel et al.: Towards automatic quality assessment of component metadata
  • Technical report by Marc Kemps-Snijders (2014): Metadata Quality Assurance

Criteria

(Aspects marked with a question mark are subject to discussion)

Schema level

  • presence of "required" data categories
  • ratio of elements with data categories
  • size? (number of elements, components)
  • ?
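
As an illustration of the "ratio of elements with data categories" criterion, the following is a minimal sketch, not a definitive implementation. It assumes that a profile is available as a component specification in XML, that element definitions are tags named CMD_Element (or Element) without a namespace, and that the data category is given in a ConceptLink attribute. The function name and the sample profile are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: element definitions use one of these tag names.
ELEMENT_TAGS = ("CMD_Element", "Element")


def concept_link_ratio(profile_xml: str) -> float:
    """Return the share of element definitions with a non-empty ConceptLink."""
    root = ET.fromstring(profile_xml)
    elements = [e for e in root.iter() if e.tag in ELEMENT_TAGS]
    if not elements:
        return 0.0
    # An empty or whitespace-only ConceptLink counts as "no data category".
    linked = [e for e in elements if e.get("ConceptLink", "").strip()]
    return len(linked) / len(elements)


# Hypothetical profile: two of the four elements carry a data category.
SAMPLE_PROFILE = """
<CMD_ComponentSpec isProfile="true">
  <CMD_Component name="Example">
    <CMD_Element name="title" ConceptLink="http://example.org/concept/title"/>
    <CMD_Element name="note"/>
    <CMD_Element name="language" ConceptLink="http://example.org/concept/language"/>
    <CMD_Element name="comment" ConceptLink=""/>
  </CMD_Component>
</CMD_ComponentSpec>
"""

print(concept_link_ratio(SAMPLE_PROFILE))  # 0.5
```

The same traversal could also report the number of elements and components for the "size" aspect, should that criterion be adopted.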

Instance level

  • availability of the (reference to the) schema (i.e. the record references a schema and that schema can be retrieved)
  • validity of the record with respect to the schema
  • are the header fields complete?
  • does it contain ResourceProxy elements?
    • is there a link to a LandingPage, where available?
    • is there an indication of the mime type?
  • URL inspection: are the links resolvable, or are there dead links? (both links/handles in ResourceProxy and in the components/elements)
  • filled-in ratio / sparseness?
    • how many of the elements defined by the schema are actually populated with information?
    • what about files that hardly contain any information?
  • values conform to a controlled vocabulary (where applicable)
  • size? - e.g. overall size (measured in characters)
    • is the file too big to be useful? (-> suggest higher granularity)
  • what is the information entropy? (lots of very similar files might be an indication of a suboptimal modelling)
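
Two of the instance-level criteria above, the filled-in ratio and the information entropy, can be sketched as follows. This is only a rough illustration under stated assumptions: the set of element names defined by the schema is passed in by the caller (here the hypothetical SCHEMA_ELEMENTS), elements are matched by local name only, and entropy is computed as Shannon entropy over the values that one element takes across a collection of records. All function names and sample data are made up for the example.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts before tag names."""
    return tag.rsplit("}", 1)[-1]


def filled_in_ratio(record_xml: str, schema_elements: set) -> float:
    """Share of schema-defined elements with non-blank text in the record."""
    if not schema_elements:
        return 0.0
    root = ET.fromstring(record_xml)
    populated = {
        local_name(e.tag) for e in root.iter() if (e.text or "").strip()
    }
    return len(populated & set(schema_elements)) / len(schema_elements)


def shannon_entropy(values: list) -> float:
    """Shannon entropy (in bits) of the distribution of the given values."""
    if not values:
        return 0.0
    total = len(values)
    counts = Counter(values)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical record: only "title" carries actual content.
SAMPLE_RECORD = """
<CMD xmlns="http://www.clarin.eu/cmd/">
  <Components>
    <Example>
      <title>Sample corpus</title>
      <language>  </language>
      <note/>
    </Example>
  </Components>
</CMD>
"""
# Hypothetical list of element names defined by the schema.
SCHEMA_ELEMENTS = {"title", "language", "note", "comment"}

print(filled_in_ratio(SAMPLE_RECORD, SCHEMA_ELEMENTS))  # 0.25
print(shannon_entropy(["a", "a", "b", "b"]))            # 1.0
```

An entropy close to zero for most elements across a collection would mean that the records are nearly identical, which is the situation the last criterion flags as a possible sign of suboptimal modelling.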
