VLO Taskforce Virtual meeting February 18, 2015
Overall Topic: Quality Assurance Methods for Metadata within the VLO
Participants: Peter Fankhauser, Hanna Hedeland, Susanne Haaf, Axel Herold, Thomas Eckart, Jens Stegmann, Matej Durco, Jörg Knappen
- (1) Report by Matej on the current work of CLARIN's Metadata Curation (MC) Taskforce
- Main goal: improve the presentation of data within the VLO (at the presentation level)
- "Organisation" facet: proposal for a mapping of terms onto one another in order to reduce the number of variants; this mapping has been implemented, but does not yet lead to 100% correct results
- Resources which should be noted/used for the MC issue:
- Marc Kemps-Snijders: Metadata quality assurance for CLARIN (Paper; will be circulated internally)
- Thorsten Trippel, Daan Broeder, Matej Durco, Oddrun Pauline Ohren: Towards automatic quality assessment of component metadata. http://www.lrec-conf.org/proceedings/lrec2014/summaries/1011.html (Paper)
- Oliver Schonefeld: CMDI validator (Tool) --> Would it be possible to apply this tool in our context, and at what cost?
- Jan Odijk: Discovering Resources in CLARIN: Problems and Suggestions for Solutions. http://dspace.library.uu.nl/bitstream/handle/1874/303788/Searching_with_the_VLO.pdf (Paper)
- CMDI-Taskforce: CMDI Best Practice Guide (Documentation; currently in preparation)
- Questions:
- What would be a feasible permanent QA workflow for the VLO (at every harvest) and how can we get to it?
- What can we (as MC and VLO taskforces) provide to support the implementation of QA mechanisms for the VLO (documentation?, lists of closed vocabularies?, mappings? ... )?
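The variant mapping proposed for the "Organisation" facet in item (1) could look roughly like the sketch below. The names `ORG_VARIANTS` and `normalize_org` and the example organisation strings are hypothetical and are not taken from the VLO code base. The sketch only illustrates mapping known variants onto one canonical form while passing unknown values through unchanged, which is one reason such a mapping cannot be expected to be 100% correct.

```python
# Hypothetical variant table: normalised key -> canonical display form.
# The organisation names are made up for illustration.
ORG_VARIANTS = {
    "ela": "Example Language Archive",
    "example language archive": "Example Language Archive",
    "example lang. archive": "Example Language Archive",
}


def _key(value: str) -> str:
    """Lower-case and collapse whitespace so trivial variants share one key."""
    return " ".join(value.lower().split())


def normalize_org(value: str) -> str:
    """Return the canonical name, or the trimmed input if no mapping is known."""
    return ORG_VARIANTS.get(_key(value), value.strip())
```

A closed vocabulary or mapping list maintained by the taskforces, as raised in the questions above, would play the role of `ORG_VARIANTS` here.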
- (2) Discussion: MC via Pre- vs. Post-Processing
- Preprocessing:
- problem: the incoming metadata might be changed significantly without the knowledge (and consent) of the data providers
- possible solution: curation reports sent to the data providers
- Postprocessing:
- could result in a hybrid strategy:
- changes to the metadata only affect the presentation level; PID leads to the original data
- in addition, data providers will be informed about problems with their metadata (MD) and about possible corrections, which they would have to perform themselves
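The hybrid strategy discussed under (2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the VLO implementation: the class `CuratedRecord`, the function `curate`, the shape of `mappings` (facet name -> {variant: canonical value}) and the example PID are all hypothetical. The point is that the harvested metadata stays untouched, curated values exist only for presentation, and every deviation ends up in a report for the data provider.

```python
from dataclasses import dataclass, field


@dataclass
class CuratedRecord:
    pid: str        # persistent identifier leading to the provider's original data
    original: dict  # metadata exactly as harvested; never modified
    display: dict = field(default_factory=dict)  # curated values, presentation level only
    report: list = field(default_factory=list)   # suggested corrections for the provider


def curate(pid: str, harvested: dict, mappings: dict) -> CuratedRecord:
    """Derive presentation-level values and a curation report from a harvested record."""
    record = CuratedRecord(pid=pid, original=dict(harvested))
    for facet, value in harvested.items():
        # Fall back to the harvested value when no mapping exists for it.
        curated = mappings.get(facet, {}).get(value, value)
        record.display[facet] = curated
        if curated != value:
            record.report.append(f"{pid}: {facet} '{value}' -> suggested '{curated}'")
    return record
```

For example, `curate("hdl:0000/example", {"organisation": "ELA"}, {"organisation": {"ELA": "Example Language Archive"}})` would leave `original` as harvested, show the canonical name in `display`, and add one entry to `report` that could be sent to the provider.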
- (3) Next steps:
- VLO-Taskforce finalizes the document on facet definitions and fillings, which is currently in preparation (by March 20, 2015)
- facets will be implemented according to the specifications in that document
- evaluation of the specifications based on the outcome
- consultation with the MC-Taskforce on the results
Last modified on 03/12/15 14:14:54