Cmdi/QualityCriteria – CLARIN Trac

This page collects quality criteria for metadata records (and, by extension, profiles).

A number of such lists have been proposed; we aim to gather them here in one place.

CmdiBestPracticeGuide#Qualitycheck-list (criteria from there moved to this list)
MDQAS: older draft specifying a metadata (MD) quality assessment service (criteria from there moved to this list)
Curation Module: will operationalize the defined quality criteria and allow checking whether metadata records adhere to them
LREC2014 paper by Thorsten Trippel et al.: Towards automatic quality assessment of component metadata
Technical report by Marc Kemps-Snijders (2014): Metadata Quality Assurance

(Aspects marked with a question mark are subject to discussion.)

strict:

availability of the (reference to) schema (i.e. both there is a reference to schema and the schema can be retrieved)
does the schema reside in the Component Registry?
validity of the record with respect to the schema
are the header fields complete?
- is there a unique MdSelfLink?
- is there an MdCollectionDisplayName?
- is there a MdProfile?
does it contain ResourceProxy elements?
- is there a link to a LandingPage, when available?
- is there an indication of the MIME type?
- is a ResourceProxy with @type=Resource or @type=Metadata available?
URL-inspection - broken links
- in MdSelfLink?
- in ResourceProxy?
- in the CMD_elements (less important)
filled-in ratio / sparseness?
- how many of the elements defined by schema are actually populated with information
- what about files that hardly contain any information?
- completeness concerning relative importance
- coverage of the VLO facets (instance level)
size - e.g. overall size (measured in MB)
- is the file too big to be useful? (VLO has a strict rule on maximum size; suggest higher granularity)
- maximum number of ResourceProxy elements per record
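
A few of the strict criteria above (header fields, ResourceProxy elements, filled-in ratio) could be checked roughly as in the following Python sketch. It is only an illustration under stated assumptions: the function names and the sample record are hypothetical, elements are matched by local name so that the namespace does not matter, uniqueness of MdSelfLink across a collection is not checked, and the ratio is computed only over leaf elements present in the instance, not over all elements defined by the schema.

```python
import xml.etree.ElementTree as ET

HEADER_FIELDS = ("MdSelfLink", "MdCollectionDisplayName", "MdProfile")

# Hypothetical record for illustration; namespace and identifiers are placeholders.
SAMPLE_RECORD = """<CMD xmlns="http://example.org/placeholder-namespace">
  <Header>
    <MdSelfLink>http://example.org/record/1</MdSelfLink>
    <MdProfile>placeholder-profile-id</MdProfile>
  </Header>
  <Resources>
    <ResourceProxyList>
      <ResourceProxy id="r1">
        <ResourceType mimetype="text/plain">Resource</ResourceType>
        <ResourceRef>http://example.org/data/1.txt</ResourceRef>
      </ResourceProxy>
    </ResourceProxyList>
  </Resources>
  <Components>
    <Demo><Title>Example</Title><Description></Description></Demo>
  </Components>
</CMD>"""


def local_name(tag):
    """Drop a leading '{namespace}' so checks work for any namespace."""
    return tag.rsplit("}", 1)[-1]


def find_all(root, name):
    """Return every element (root included) whose local name equals `name`."""
    return [el for el in root.iter() if local_name(el.tag) == name]


def has_text(element):
    """True if the element has non-whitespace text content."""
    return bool((element.text or "").strip())


def check_header_and_proxies(xml_text):
    """Report header completeness and basic ResourceProxy facts for one record."""
    root = ET.fromstring(xml_text)
    report = {}
    for field in HEADER_FIELDS:
        report[field] = any(has_text(el) for el in find_all(root, field))
    resource_types = find_all(root, "ResourceType")
    kinds = {(el.text or "").strip() for el in resource_types}
    report["resource_proxy_count"] = len(find_all(root, "ResourceProxy"))
    report["has_landing_page"] = "LandingPage" in kinds
    report["has_resource_or_metadata"] = bool(kinds & {"Resource", "Metadata"})
    report["with_mimetype"] = sum(1 for el in resource_types if el.get("mimetype"))
    return report


def leaf_fill_ratio(xml_text):
    """Share of leaf elements below <Components> that carry non-empty text."""
    root = ET.fromstring(xml_text)
    components = find_all(root, "Components")
    if not components:
        return 0.0
    leaves = [el for el in components[0].iter() if len(el) == 0]
    return sum(has_text(el) for el in leaves) / len(leaves)
```

Calling `check_header_and_proxies(SAMPLE_RECORD)` on the sample flags the missing MdCollectionDisplayName, and `leaf_fill_ratio(SAMPLE_RECORD)` counts one filled leaf out of two. A schema-aware ratio would additionally need the profile definition.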

soft:

values conform to a controlled vocabulary (where applicable)
number of elements
length of the strings in description fields
what is the information entropy? (lots of very similar files might be an indication of a suboptimal modelling)
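
Two of the soft criteria above, vocabulary conformance and information entropy, lend themselves to a small sketch. The version below is an assumption about how they might be measured, not an agreed definition: it computes the Shannon entropy (in bits) of the values one field takes across a set of records, and lists values that fall outside a given vocabulary. The function names and example values are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy in bits of the value distribution; 0.0 for no variation."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # p * log2(1/p) summed over all distinct values
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def outside_vocabulary(values, vocabulary):
    """Return the distinct values that are not part of the controlled vocabulary."""
    allowed = set(vocabulary)
    return sorted(set(values) - allowed)
```

A field whose value is identical in every record yields an entropy of 0, which might hint at the "lots of very similar files" case; what threshold counts as suspicious is left open here.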

problematic:

? publication before creation date
? editing quality: typos, expected format (date, capitalization, plural vs. singular)
the combination of values in related categorical fields is not consistent (if field A has value x, field B should have y or z)
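
The date-order and value-combination checks above could be sketched as follows. This is a minimal illustration assuming records are already flattened into dictionaries and dates are ISO `YYYY-MM-DD` strings; the field names, the rule format and the function names are all hypothetical.

```python
from datetime import date


def publication_before_creation(creation, publication):
    """True if the publication date precedes the creation date (ISO date strings)."""
    return date.fromisoformat(publication) < date.fromisoformat(creation)


def inconsistent_combinations(record, rules):
    """Apply rules of the form {(field_a, value_a): (field_b, allowed_values)}.

    Returns one tuple per violated rule:
    (field_a, value_a, field_b, actual value of field_b).
    """
    violations = []
    for (field_a, value_a), (field_b, allowed) in rules.items():
        if record.get(field_a) == value_a and record.get(field_b) not in allowed:
            violations.append((field_a, value_a, field_b, record.get(field_b)))
    return violations


# Hypothetical rule: if mediaType is "audio", format should be one of two values.
EXAMPLE_RULES = {("mediaType", "audio"): ("format", {"audio/wav", "audio/mpeg"})}
```

With `EXAMPLE_RULES`, a record such as `{"mediaType": "audio", "format": "text/plain"}` yields one violation, while records with another mediaType are not affected by the rule.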

interesting, but difficult:

accuracy - semantic distance between data and md?
conformance to expectation metrics?
coherence - different metadata fields describe the same object in the same way?
accessibility metrics (retrieval and cognitive)? check whether the data can be retrieved or whether a login (including shib-login) is required
timeliness metrics - how quality changes in time?

Last modified on 11/11/15 15:05:07

The Metadata Quality Assurance-final_MKS.docx (5.2 MB) - added by matej.durco@oeaw.ac.at 9 years ago. Marc Kemps-Snijders (2014): Metadata Quality Assurance, technical report