This page collects quality criteria for metadata records (and, by extension, profiles).
A number of such lists have been proposed; we aim to gather them here in one place.
Related work
- CmdiBestPracticeGuide#Qualitycheck-list (criteria from there moved to this list)
- MDQAS: an older draft specifying a metadata (MD) quality assessment service (criteria from there moved to this list)
- Curation Module: will operationalize the defined quality criteria and allow checking whether metadata records adhere to them
- LREC 2014 paper by Thorsten Trippel et al.: Towards automatic quality assessment of component metadata
- Technical report by Marc Kemps-Snijders (2014): Metadata Quality Assurance
Criteria
(Aspects marked with a question mark are subject to discussion)
Schema level
- ratio of elements with data categories
- presence of "required" data categories
- coverage of mapping to VLO facets
- public accessibility of the profile
- status of the concept (approved/candidate/deprecated/...)
- number of defined elements (identify a range: too many, too few)
- number of unique components used
- total number of components used
- number of references to distinct data categories defined externally
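Most of the countable schema-level criteria can be read off a profile's component specification. The following is a minimal Python sketch, assuming a CMDI 1.1-style specification in which components are `CMD_Component` nodes, elements are `CMD_Element` nodes, and the data category is given in a `ConceptLink` attribute. The function name `schema_metrics`, the sample profile, its component IDs and its example.org concept URLs are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class SchemaMetrics:
    n_elements: int             # number of defined elements
    ratio_with_concept: float   # share of elements that carry a data category
    n_components_total: int     # total number of components used
    n_components_unique: int    # number of distinct components used
    n_distinct_concepts: int    # number of distinct data categories referenced


def schema_metrics(spec_xml: str) -> SchemaMetrics:
    """Compute schema-level metrics from a CMDI 1.1-style component spec (assumed layout)."""
    root = ET.fromstring(spec_xml)
    elements = list(root.iter("CMD_Element"))
    components = list(root.iter("CMD_Component"))
    # ConceptLink holds the data category reference (assumed CMDI 1.1 attribute name)
    concepts = [e.get("ConceptLink") for e in elements if e.get("ConceptLink")]
    # Reused components share a ComponentId; inline ones fall back to their name
    unique_components = {c.get("ComponentId") or c.get("name") for c in components}
    return SchemaMetrics(
        n_elements=len(elements),
        ratio_with_concept=len(concepts) / len(elements) if elements else 0.0,
        n_components_total=len(components),
        n_components_unique=len(unique_components),
        n_distinct_concepts=len(set(concepts)),
    )


# Hypothetical sample profile; concept URLs and component IDs are placeholders.
SAMPLE_SPEC = """<CMD_ComponentSpec isProfile="true">
  <CMD_Component name="Session" CardinalityMin="1" CardinalityMax="1">
    <CMD_Element name="Title" ConceptLink="http://example.org/concept/title"/>
    <CMD_Element name="Date"/>
    <CMD_Component name="Actor" ComponentId="example:c_actor">
      <CMD_Element name="Name" ConceptLink="http://example.org/concept/name"/>
    </CMD_Component>
    <CMD_Component name="Actor" ComponentId="example:c_actor">
      <CMD_Element name="Name" ConceptLink="http://example.org/concept/name"/>
    </CMD_Component>
  </CMD_Component>
</CMD_ComponentSpec>"""

if __name__ == "__main__":
    print(schema_metrics(SAMPLE_SPEC))
```

For the sample, 3 of the 4 elements carry a data category, and the two `Actor` occurrences count as one unique component. This sketch does not set acceptable ranges for the number of elements, because the criteria above leave those open.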
Instance level
- availability of the schema (i.e. the record references a schema and that schema can be retrieved)
- validity of the record with respect to the schema
- are the header fields complete?
- is there a unique MdSelfLink?
- is there an MdCollectionDisplayName?
- does it contain ResourceProxy elements?
- is there a link to a LandingPage (when available)?
- is there an indication of the MIME type?
- URL inspection: broken links
- filled-in ratio / sparseness?
  - how many of the elements defined by the schema are actually populated with information?
  - what about files that hardly contain any information?
- completeness with respect to the relative importance of elements
- editing quality: typos, expected format (dates, capitalization, plural vs. singular)
- values conform to a controlled vocabulary (where applicable)
- publication date earlier than creation date
- the combination of values in related categorical fields is inconsistent (if field A has value x, field B should have y or z)
- size? e.g.
  - overall size (measured in characters)
  - is the file too big to be useful? (-> suggest higher granularity)
  - number of elements
  - length of the strings in description fields
- what is the information entropy? (many very similar files may indicate suboptimal modelling)
- accuracy: semantic distance between data and metadata?
- conformance-to-expectation metrics?
- coherence: do different metadata fields describe the same object in the same way?
- accessibility metrics (retrieval and cognitive)?
- timeliness metrics: how does quality change over time?
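The structural instance-level checks (header completeness, MdSelfLink, ResourceProxy, LandingPage, MIME type, filled-in ratio) could be approximated on a single record roughly as follows. This is a sketch only. It assumes the CMDI 1.1 record layout (`Header`, `Resources/ResourceProxyList/ResourceProxy` with a `ResourceType` carrying a `mimetype` attribute, `Components`). It treats the hypothetical `HEADER_FIELDS` tuple as the mandatory header fields. It also approximates the filled-in ratio by the share of non-empty leaf elements present in the record, not by all elements defined in the schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical choice of header fields treated as mandatory.
HEADER_FIELDS = ("MdCreator", "MdCreationDate", "MdSelfLink", "MdProfile",
                 "MdCollectionDisplayName")


def local(tag: str) -> str:
    """Drop a '{namespace}' prefix so the checks work with or without xmlns."""
    return tag.rsplit("}", 1)[-1]


def find_all(root: ET.Element, name: str) -> list:
    return [e for e in root.iter() if local(e.tag) == name]


def first(root: ET.Element, name: str):
    found = find_all(root, name)
    return found[0] if found else None


def text_of(elem) -> str:
    return (elem.text or "").strip() if elem is not None else ""


def record_checks(cmd_xml: str) -> dict:
    root = ET.fromstring(cmd_xml)
    header = first(root, "Header")
    header_values = {local(c.tag): text_of(c) for c in header} if header is not None else {}
    proxies = find_all(root, "ResourceProxy")
    types = [first(p, "ResourceType") for p in proxies]
    components = first(root, "Components")
    # Leaf elements below <Components>; empty leaves count as "not filled in"
    leaves = ([e for e in components.iter() if e is not components and len(e) == 0]
              if components is not None else [])
    filled = sum(1 for e in leaves if text_of(e))
    return {
        "missing_header_fields": [f for f in HEADER_FIELDS if not header_values.get(f)],
        "has_self_link": bool(header_values.get("MdSelfLink")),
        "n_resource_proxies": len(proxies),
        "has_landing_page": any(text_of(t) == "LandingPage" for t in types),
        "proxies_without_mimetype": sum(1 for t in types if t is None or not t.get("mimetype")),
        "filled_ratio": filled / len(leaves) if leaves else 0.0,
    }


# Hypothetical sample record; all URLs and values are placeholders.
SAMPLE_RECORD = """<CMD xmlns="http://www.clarin.eu/cmd/" CMDVersion="1.1">
  <Header>
    <MdCreator>jdoe</MdCreator>
    <MdCreationDate>2014-05-01</MdCreationDate>
    <MdSelfLink>http://example.org/md/record1</MdSelfLink>
    <MdProfile>example-profile-id</MdProfile>
  </Header>
  <Resources>
    <ResourceProxyList>
      <ResourceProxy id="lp1">
        <ResourceType mimetype="text/html">LandingPage</ResourceType>
        <ResourceRef>http://example.org/landing/1</ResourceRef>
      </ResourceProxy>
      <ResourceProxy id="r1">
        <ResourceType>Resource</ResourceType>
        <ResourceRef>http://example.org/data/1.wav</ResourceRef>
      </ResourceProxy>
    </ResourceProxyList>
  </Resources>
  <Components>
    <Session>
      <Title>Interview 1</Title>
      <Date></Date>
      <Description/>
      <Genre>interview</Genre>
    </Session>
  </Components>
</CMD>"""

if __name__ == "__main__":
    print(record_checks(SAMPLE_RECORD))
```

Schema validation, schema retrieval and broken-link inspection need network access and an XML Schema validator, so they are left out of this sketch.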
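The value-level criteria (controlled vocabularies, publication before creation date, inconsistent combinations of related categorical fields) can be phrased as simple rules over a record's field values. Below is a minimal sketch. It assumes the record has already been flattened into a dictionary mapping field names to string values. The field names `Modality`, `MediaType`, `CreationDate` and `PublicationDate`, the vocabulary, and the conditional rule are hypothetical examples, not part of any CMDI profile.

```python
from datetime import date

# Hypothetical controlled vocabulary for a hypothetical "Modality" field.
VOCABULARIES = {"Modality": {"spoken", "written", "signed"}}

# Hypothetical rule: if Modality is "spoken", MediaType should be "audio" or "video".
CONDITIONAL_RULES = [("Modality", "spoken", "MediaType", {"audio", "video"})]


def value_checks(record: dict) -> list:
    """Return human-readable problems found in a flattened record (field -> value)."""
    problems = []
    # Values conform to a controlled vocabulary (where one is defined)
    for field, allowed in VOCABULARIES.items():
        value = record.get(field)
        if value is not None and value not in allowed:
            problems.append(f"{field}: {value!r} not in controlled vocabulary")
    # Publication before creation date; also flags values that are not ISO 8601 dates
    created, published = record.get("CreationDate"), record.get("PublicationDate")
    if created and published:
        try:
            if date.fromisoformat(published) < date.fromisoformat(created):
                problems.append("PublicationDate precedes CreationDate")
        except ValueError:
            problems.append("CreationDate/PublicationDate is not an ISO 8601 date")
    # Inconsistent combination of related categorical fields
    for field_a, value_x, field_b, allowed_b in CONDITIONAL_RULES:
        if record.get(field_a) == value_x and record.get(field_b) not in allowed_b:
            problems.append(f"{field_a}={value_x} requires {field_b} in {sorted(allowed_b)}")
    return problems


if __name__ == "__main__":
    print(value_checks({"Modality": "spoken", "MediaType": "text",
                        "CreationDate": "2014-05-01", "PublicationDate": "2013-01-01"}))
```

Keeping vocabularies and rules as data, not code, would let curators extend them per profile without touching the checker.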
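Two of the criteria only make sense across a whole collection: whether each MdSelfLink is unique, and whether many records are nearly identical. One possible operationalization, sketched below, looks for repeated self links and computes the Shannon entropy of each field's values across records. A field with (near-)zero entropy carries the same value everywhere. The function names and the 0.1-bit threshold are assumptions for illustration. Per-field value entropy is only one of several ways to read the "information entropy" criterion.

```python
import math
from collections import Counter


def duplicate_self_links(self_links: list) -> set:
    """MdSelfLink values that occur in more than one record."""
    return {link for link, n in Counter(self_links).items() if n > 1}


def field_entropy(values: list) -> float:
    """Shannon entropy in bits, H = sum(p * log2(1/p)), of one field's values."""
    total = len(values)
    if total == 0:
        return 0.0
    return sum((n / total) * math.log2(total / n) for n in Counter(values).values())


def low_entropy_fields(records: list, threshold: float = 0.1) -> list:
    """Fields whose values barely vary across records (threshold is an assumption)."""
    fields = sorted({f for r in records for f in r})
    return [f for f in fields
            if field_entropy([r.get(f, "") for r in records]) < threshold]


if __name__ == "__main__":
    records = [{"Genre": "interview", "Title": "A"},
               {"Genre": "interview", "Title": "B"}]
    print(low_entropy_fields(records))
```

A constant field is not necessarily a modelling problem (for example, a collection-wide licence), so the output is best read as a list of fields to inspect, not as errors.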
Attachments (1)
The Metadata Quality Assurance-final_MKS.docx (5.2 MB)
Marc Kemps-Snijders (2014): Metadata Quality Assurance, technical report