
VLO Taskforce virtual meeting, February 18, 2015

Overall Topic: Quality Assurance Methods for Metadata within the VLO

Participants: Peter Fankhauser, Hanna Hedeland, Susanne Haaf, Axel Herold, Thomas Eckart, Jens Stegmann, Matej Durco, Jörg Knappen

  • (1) Report by Matej on the current work of CLARIN's Metadata Curation (MC) Taskforce
    • Main goal: improve the presentation of data within the VLO (at the presentation level)
    • "Organisation" facet: proposal for a mapping of terms onto one another in order to reduce the number of variants; this mapping has been implemented, but does not yet produce fully correct results
    • Resources that should be noted/used for the MC issue:
      1. Marc Kemps-Snijders: Metadata quality assurance for CLARIN (Paper; will be circulated internally)
      2. Thorsten Trippel, Daan Broeder, Matej Durco, Oddrun Pauline Ohren: Towards automatic quality assessment of component metadata. http://www.lrec-conf.org/proceedings/lrec2014/summaries/1011.html (Paper)
      3. Oliver Schonefeld: CMDI validator (Tool) --> Would it be possible to apply this tool in our context, and at what cost?
      4. Jan Odijk: Discovering Resources in CLARIN: Problems and Suggestions for Solutions. http://dspace.library.uu.nl/bitstream/handle/1874/303788/Searching_with_the_VLO.pdf (Paper)
      5. CMDI-Taskforce: CMDI Best Practice Guide (Documentation; currently in preparation)
    • Questions:
      • What would be a feasible permanent QA workflow for the VLO (at every harvest) and how can we get to it?
      • What can we (as MC and VLO taskforces) provide to support the implementation of QA mechanisms for the VLO (documentation?, lists of closed vocabularies?, mappings? ... )?
  • (2) Discussion: MC via Pre- vs. Post-Processing
    • Preprocessing:
      • problem: the incoming metadata might be changed significantly without the knowledge (and consent) of the data providers
      • possible solution: curation reports sent to the data providers
    • Postprocessing:
      • could result in a hybrid strategy:
        1. changes to the metadata only affect the presentation level; PID leads to the original data
        2. in addition, data providers will be informed about problems with their metadata (MD) and about possible corrections, which they would have to perform themselves
  • (3) Next steps:
    1. VLO-Taskforce finalizes the document on facet definitions and fillings, which is currently in preparation (by March 20, 2015)
    2. facets will be implemented according to the specifications in that document
    3. evaluation of the specifications based on the outcome
    4. consultation with the MC-Taskforce on the results
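
The hybrid post-processing strategy discussed under (2), combined with the term mapping for the "Organisation" facet from (1), could be sketched roughly as follows. This is only an illustration under stated assumptions, not the VLO's actual implementation: the mapping entries, the names `ORGANISATION_MAP`, `curate_organisation` and `curation_report`, and the report format are all hypothetical. The original value is kept untouched, only the displayed value is normalised, and every change or unmapped value is collected for a report to the data provider.

```python
from dataclasses import dataclass

# Hypothetical variant -> canonical mapping; the entries are illustrative only.
ORGANISATION_MAP = {
    "example inst.": "Example Institute",
    "ex. institute": "Example Institute",
    "example institute": "Example Institute",
}


@dataclass
class CurationResult:
    original: str   # value as delivered by the data provider (never modified)
    display: str    # value shown at the presentation level
    mapped: bool    # True if the value was found in the mapping


def normalise_key(value: str) -> str:
    """Lower-case the value and collapse whitespace so variants compare equal."""
    return " ".join(value.casefold().split())


def curate_organisation(value: str, mapping=ORGANISATION_MAP) -> CurationResult:
    """Map a facet value onto its canonical form for display only."""
    canonical = mapping.get(normalise_key(value))
    if canonical is None:
        # Unknown value: show it unchanged and flag it for the report.
        return CurationResult(original=value, display=value, mapped=False)
    return CurationResult(original=value, display=canonical, mapped=True)


def curation_report(results) -> list:
    """Collect the changes and open problems to send to a data provider."""
    lines = []
    for r in results:
        if not r.mapped:
            lines.append(f"UNMAPPED: {r.original!r}")
        elif r.display != r.original:
            lines.append(f"NORMALISED: {r.original!r} -> {r.display!r}")
    return lines


if __name__ == "__main__":
    values = ["Example  Inst.", "Example Institute", "Unknown Org"]
    results = [curate_organisation(v) for v in values]
    for line in curation_report(results):
        print(line)
```

Because the original value stays in each result, the PID can continue to lead to the unmodified metadata, while the report gives data providers the information they need to correct their records themselves.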
Last modified on 03/12/15 14:14:54