
CMDI Taskforce & curation taskforce joint meetings at the CLARIN Annual Conference 2016

Documents

Agenda

Please add your agenda points or send them to tf-cmdi@lists.clarin.eu.

Tuesday 25 October (dinner)

Dinner, opportunity for (even more) informal discussions. We meet at 19:30 in front of the conference venue (see above for location details).

Restaurant: TBD (reasonable options: Alto Gusto, Le Maharaja, Sajna, Le Cèdre, La Grange)

Wednesday 26 October (joint taskforces meeting)

Google Doc

Progress update, planning for the upcoming period.

  1. CMDI 1.2 (Twan, Menzo)
    1. Brief status update
    2. Outlook
  2. VLO
    1. Brief status update
    2. The VLO's 'Format' facet and the use of MIME types (Hanna)
      • To improve the 'Format' facet, we probably first need some agreement on its content, i.e. the use of MIME types, in particular for file formats that do not have an unambiguous standard MIME type (e.g. application/tei+xml or application/xml). CLARIN-D has decided to try out an approach based on Content-Type syntax: a standard MIME type with additional parameters for disambiguation. See the "MIME format variants" page for more information and links to the mailing list discussions.
    3. Helpdesk responsibilities and workflows (Hanna)
      • Is our (and CLARIN-D's) idea of helpdesk responsibilities and workflows for reports and requests via the VLO OK with everyone involved?
    4. Priorities and planning (Twan)
      • Maintenance release (VLO 4.0.2)
      • Next minor release (VLO 4.1)
    5. Development process (Twan)
      • Code forks/synchronisation
      • Sustainable procedure for (near) future
  3. Facet mapping & normalisation (Matej, Davor)
    1. Brief status update w.r.t. curation module, related work
    2. Workflow for application of mapping in production
      • Short term
      • Long term (curation module)

Thursday 27 October (lunch)

We meet at the entrance of the conference centre at 13:05 and will walk to a restaurant for lunch (TBD).

Input from TF members, distribution of work

  1. CMDI 1.2
    1. Brief status update
    2. Outlook
    3. Distribution of work
  2. VLO
    1. The VLO's 'Format' facet and the use of MIME types
    2. Helpdesk responsibilities and workflows
    3. Outlook
    4. Distribution of work
  3. Facet mapping & normalisation
    1. Brief status update
    2. Outlook
    3. Distribution of work

Notes

Present:

  • Wednesday
    • Menzo Windhouwer, Twan Goosen, Thomas Eckart, Matej Durco, Marcin Oleksy, Hanna Hedeland, Gregoire Montcheuil, Danguole Kalinauskaite, Dimitrios Galanis, João Silva, Krista Liin, Davor Ostojic, Jan Odijk
  • Thursday
    • Menzo Windhouwer, Twan Goosen, Matej Durco, Marcin Oleksy, Hanna Hedeland, Pavel Stranak, Neeme Kahusk, Marcin Pol, Davor Ostojic

CMDI 1.2

  • CMDI 1.2 rolled out - supported in the production environment
  • CMDI toolkit finished - can validate, create and migrate CMD records
  • Spec is available online
  • Component Registry supports 1.2 (and still 1.1), but not all 1.2 features are implemented yet -> Twan is working on CR 2.2 with more complete coverage of 1.2 features
  • support for 1.2 in tools/infra:
    • Metadata curation module
    • COMEDI - not yet, but there is some commitment to supporting it (it could become the default/recommended CMDI editor instead of Arbil)
    • Some centres already provide 1.2 (so far only Leipzig is known; others in CLARIN-D have plans)
  • CLAVAS!
    • CMDI 1.2 supports the CLAVAS vocabularies (currently: Organisations, Language codes; new vocabularies could come in, but need commitment)
    • candidates for other vocabularies: resource type, media types (MIME), licences - see below
    • CLAVAS is based on OpenSKOS, provided by Meertens - a new version of OpenSKOS is coming soon
    • governance/versioning/curation of vocabularies is an open problem
    • how to edit the vocabularies (Google spreadsheets?)
    • all controlled vocabularies to be used in VLO should be in CLAVAS

VLO

  • Status
    • VLO 4.0 released this summer: CMDI 1.2 + UI
  • 'Format' facet and the use of MIME types (Hanna)
    • facet is fed from resource proxies (120 distinct values)
      • Also map from the MIME type component? E.g. in the case of zip files
    • Make an (open) vocabulary for media types
      • Menzo: yes, but it cannot be validated using the schema; it could, however, be part of the quality assessment procedure
  • Helpdesk responsibilities and workflows (Hanna)
    • Jan Odijk: quite satisfied, usually get an answer
    • Matej: receives reports, but does not always act on them
    • A ticket can be addressed to multiple people if they are in the queue (or by sending an e-mail)
    • Better define the workflow between Trac tickets and OTRS issues (and GitHub issues)
    • Hanna will come back to us with a new/updated approach
  • Priorities and planning
    • Milestones
    • Maintenance release (VLO 4.0.2) - update of mappings!
    • Next minor release (VLO 4.1)
      • Use new dynamic mapping retrieval workflow
        • Various forks of a (to be created) mapping repository
          • Domain experts and other curators can work on a fork and send pull requests to curation experts
          • Curation experts can send pull requests to (production/beta/alpha) VLO maintainers
          • VLO maintainers can maintain testing and production branches/forks
          • Use GitHub to publish and distribute mappings
        • Sync/transfer between CSV, CLAVAS, uniform mapping files (see below)
        • https://github.com/clarin-eric/VLO/issues/22
      • Multivalue selection in facets
      • Davor to create a ticket on the 4.1 milestone for the performance issue w.r.t. availability selection
      • Q1 2017??
  • Development process
    • Discuss in separate meeting
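The Content-Type-based approach to the 'Format' facet (a standard MIME type plus parameters for disambiguation) could be normalised roughly as below, so that equivalent spellings of the same variant collapse to one facet value. This is a minimal sketch under stated assumptions, not VLO code: the function names and the `variant` parameter in the example are hypothetical, since these notes do not give the parameter names agreed in CLARIN-D.

```python
def parse_content_type(value: str) -> tuple[str, dict[str, str]]:
    """Split a 'type/subtype; name=value' string into media type and parameters.

    Simplified: quoted parameter values containing ';' are not supported.
    """
    media_type, *raw_params = [part.strip() for part in value.split(";")]
    params: dict[str, str] = {}
    for raw in raw_params:
        name, sep, val = raw.partition("=")
        if not sep:
            continue  # skip empty or malformed parameters
        # Parameter names are case-insensitive; values are kept as given.
        params[name.strip().lower()] = val.strip().strip('"')
    # Type and subtype are case-insensitive, so lower-case them.
    return media_type.lower(), params


def format_facet_value(value: str) -> str:
    """Build a normalised 'Format' facet value from a resource proxy MIME type."""
    media_type, params = parse_content_type(value)
    if not params:
        return media_type
    # Sort parameters so that equivalent variants yield the same facet value.
    suffix = "; ".join(f"{name}={val}" for name, val in sorted(params.items()))
    return f"{media_type}; {suffix}"


if __name__ == "__main__":
    for raw in ["Application/XML", 'application/xml; Variant="tei"', "application/xml;variant=tei"]:
        print(format_facet_value(raw))
```

Run as a script, this prints `application/xml` followed by `application/xml; variant=tei` twice. Keeping the parameters in the facet value (rather than dropping them) is what would let the facet distinguish, say, TEI from other XML without inventing non-standard MIME types.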

Availability components: Pavel to look into existing components and report back.

Facet mapping & normalisation

  • Status
    • Curation module, Curation instance of VLO
      https://github.com/acdh-oeaw/vlo-curation
      • Facet coverage
        • E.g. resource type: mapping from the profile name gives a big improvement
        • Tentative mapping(s) -> how to agree on (precise) application of this approach?
    • Jan: for organisations, mapping alone is not enough because there are too many spelling variations.
      We would need to update the organisation vocabulary,
      but a lot could also be achieved with fuzzy search (string normalisation, edit distance).
      Reporting new values to curators would be helpful, possibly with automatic (pre)processing on top.
    • Workflow for application of mapping in production
      • Use GitHub (with forks) for curating/editing the vocabularies
        CSV files can be edited directly in GitHub
      • github:vlo-curation/maps/csv
      • github:vlo-curation/maps/profileName -> resourceType
  • Short term (VLO 4.1)
    • Resource type
      • CLAVAS vocabulary
      • Mapping from profile name -> resource type
      • People: Jan
    • Licence/availability
      • CLAVAS vocabulary
        • Licence URIs (instead of the handles that are assigned to concepts by OpenSKOS automatically)
      • Mapping files licence identifier (URI) -> licence name, licence URL
      • People: Legal Issues Committee, ask CLARINO people to curate/update (Oddrun?)
    • MIME type
      • Map ~120 types to descriptions (vocabularies)
        • Some names/descriptions already provided in “MIME type inventory” document
      • Map for cross-facet mapping (MIME type -> resource type)
      • People: Hanna (+ SCCTC editorial team of the interoperability document?)
  • Usage of CLAVAS
    • how to deal with uncontrolled ('wild') variants of concepts
      • hidden labels, for spelling errors etc.;
      • or separate ConceptSchemes? One with the official labels/concepts, one with all the misspellings, plus mapping relations between the concepts of the different concept schemes
  • Mapping definitions and translations between formats/modalities (csv, mapping xml, clavas)
    • How to get from CSV to SKOS?
    • How to get from SKOS to vlo-map? (fetch the whole concept scheme from CLAVAS and transform it to a map via XSLT)
    • Davor, Matej, Twan: Determine/implement transitions between CSV, CLAVAS, Mapping files
  • Long term (curation module)
    • ???
Last modified on 11/11/16 12:58:42

Attachments (1)