
Version 10 (modified by Twan Goosen, 8 years ago)

Pasted notes from Google Doc. TODO: clean up

= CMDI Taskforce & curation taskforce joint meetings at the CLARIN Annual Conference 2016 =

Documents

Agenda

Please add your agenda points or send them to tf-cmdi@lists.clarin.eu.

Tuesday 25 October (dinner)

Dinner, opportunity for (even more) informal discussions. We meet at 19:30 in front of the conference venue (see above for location details).

Restaurant: TBD (reasonable options: Alto Gusto, Le Maharaja, Sajna, Le Cèdre, La Grange)

Wednesday 26 October (joint taskforces meeting)

Google Doc

Progress update, planning for the upcoming period.

  1. CMDI 1.2 (Twan, Menzo)
    1. Brief status update
    2. Outlook
  2. VLO
    1. Brief status update
    2. The VLO's 'Format' facet and the use of MIME types (Hanna)
      • To improve the 'Format' facet, we probably first need some agreement on its content, i.e. the use of MIME types, in particular for file formats that don't have an unambiguous standard MIME type such as application/tei+xml or application/xml. In CLARIN-D there has been a decision to try out an approach based on Content-Type syntax: a standard MIME type with additional parameters for disambiguation, cf. MIME format variants for some info and links to the discussion on mailing lists.
    3. Helpdesk responsibilities and workflows (Hanna)
      • Is our (and CLARIN-D's) idea of helpdesk responsibilities and workflows for reports and requests via the VLO OK with everyone involved?
    4. Priorities and planning (Twan)
      • Maintenance release (VLO 4.0.2)
      • Next minor release (VLO 4.1)
    5. Development process (Twan)
      • Code forks/synchronisation
      • Sustainable procedure for (near) future
  3. Facet mapping & normalisation (Matej, Davor)
    1. Brief status update w.r.t. curation module, related work
    2. Workflow for application of mapping in production
      • Short term
      • Long term (curation module)

Thursday 27 October (lunch)

Input from TF members, distribution of work

  1. CMDI 1.2
    1. Brief status update
    2. Outlook
    3. Distribution of work
  2. VLO
    1. The VLO's 'Format' facet and the use of MIME types
    2. Helpdesk responsibilities and workflows
    3. Outlook
    4. Distribution of work
  3. Facet mapping & normalisation
    1. Brief status update
    2. Outlook
    3. Distribution of work

Notes

Wednesday 26 October

Joint TF meeting (raw notes: Google Doc)

CMDI 1.2

  • CMDI 1.2 rolled out: supported in the production environment
  • CMDI toolkit finished: validate, create, migrate CMD records
  • Spec is available online
  • CompRegistry supports 1.2 (and still 1.1); not all 1.2 features are implemented yet
    • Twan is working on CR 2.2 with more complete coverage of 1.2 features
  • Support for 1.2 in tools: ??

COMEDI? (could become the default/recommended CMDI editor instead of Arbil)

Centres partly already providing 1.2 (Leipzig)

CLAVAS!

  • CMDI 1.2 supports the CLAVAS vocabularies (currently: Organisations, Language codes; new vocabularies could come in, but need commitment)
  • Candidates for vocabularies: MIME types, licences (manager: legal issues committee?)
  • CLAVAS is based on OpenSKOS provided by Meertens; a new version of OpenSKOS is coming soon
  • Governance/versioning/curation of vocabularies is a problem (to solve)
    • How to edit the vocabularies (Google spreadsheets)
    • How to deal with wild variants of concepts
    • Hidden labels for spelling errors etc., or separate ConceptSchemes? One with official labels/concepts, one with all the misspellings, and mapping relations between the concepts of the different concept schemes
  • All controlled vocabularies to be used in VLO should be in CLAVAS

Best practices: a more practical guide to usage of CMDI

  • Current/previous effort: https://www.clarin.eu/content/cmdi-best-practice-guide
  • Issues: hierarchies, resource type
  • Input (non-exhaustive):
    • Existing best practices draft
    • CLARIN-D VLO-WG output https://trac.clarin.eu/wiki/VLO-Taskforce
    • Recommendations document
    • Granularity recommendations
    • FAQ
    • Paper by Thorsten Trippel and Claus Zinn ??
    • ...
  • Has to go in line/in sync with curation work and VLO development

VLO

Brief status update

  • 4.0: CMDI 1.2 + UI

The VLO's 'Format' facet and the use of MIME types (Hanna)

  • To improve the 'Format' facet, we probably first need some agreement on its content, i.e. the use of MIME types, in particular for file formats that don't have an unambiguous standard MIME type such as application/tei+xml or application/xml. In CLARIN-D there has been a decision to try out an approach based on Content-Type syntax: a standard MIME type with additional parameters for disambiguation, cf. MIME format variants for some info and links to the discussion on mailing lists.
  • VLO ‘format’ facet is fed from resource proxies (120 distinct values)
  • Also map from MIME type component? E.g. in case of zips
  • Make an (open) vocabulary for media types
  • Cannot be validated using schema but could be part of the quality assessment procedure

Helpdesk responsibilities and workflows (Hanna)

  • Is our (and CLARIN-D's) idea of helpdesk responsibilities and workflows for reports and requests via the VLO OK with everyone involved?
    • Jan: quite satisfied, usually get an answer
    • Matej: receive reports, do not always act on them
  • Ticket can be addressed to multiple people if they are in the queue (or by sending an e-mail)
  • Define workflow better between Trac tickets and OTRS issues (and GitHub issues)
  • Hanna will come back to us with a new/updated approach

Priorities and planning (Twan)

  • https://github.com/clarin-eric/VLO/milestones
  • Maintenance release (VLO 4.0.2): update of mappings!
    • Use GitHub to publish and distribute mappings
    • November?
    • Separate meeting in the next few weeks
  • Next minor release (VLO 4.1)
    • Use new dynamic mapping retrieval workflow
    • Various forks of a (to be created) mapping repository
      • Domain experts and other curators can work on a fork and send pull request to curation experts
      • Curation experts can send pull request to (production/beta/alpha) VLO maintainers
      • VLO maintainers can maintain testing and production branches/forks
    • Sync/transfer between CSV, CLAVAS, uniform mapping files (see below)
    • Q1 2017??

Development process (Twan)

  • Code forks/synchronisation
  • Sustainable procedure for (near) future
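As an illustration of the Content-Type approach discussed for the 'Format' facet, the following is a minimal sketch of how a value with disambiguating parameters could be split into a base MIME type and its parameters. The parameter name `format-variant` and the function name `parse_format` are assumptions made for this example only; they are not taken from the notes or from the CLARIN-D proposal. The parser is deliberately simple and does not handle semicolons inside quoted parameter values.

```python
def parse_format(value):
    """Split a Content-Type style value into (media_type, parameters).

    Simplified sketch: parameters are separated by ';' and written as
    name=value, optionally with double quotes around the value.
    """
    parts = [part.strip() for part in value.split(";")]
    media_type = parts[0].lower()  # MIME types are case-insensitive
    params = {}
    for part in parts[1:]:
        if not part:
            continue  # tolerate a trailing ';'
        name, _, val = part.partition("=")
        params[name.strip().lower()] = val.strip().strip('"')
    return media_type, params


# Hypothetical example value; 'format-variant' is an illustrative parameter name.
media_type, params = parse_format('application/xml; format-variant="TEI"')
print(media_type, params)
```

With a split like this, the facet could group records by the base type while still exposing the variant for disambiguation.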

Facet mapping, normalisation, curation module

Brief status update w.r.t. curation module, related work

  • Curation module, curation instance of VLO: https://github.com/acdh-oeaw/vlo-curation
  • Facet coverage
    • E.g. resource type: map from profile name, a big improvement
  • Tentative mapping(s) -> how to agree on (precise) application of this approach?
  • Jan: for organisations, mapping is not enough; fuzzy search is also required because there are too many spelling variations
    • So we would need to update the organisation vocabulary
    • But a lot could also be achieved with fuzzy search (string normalisation, edit distance)
  • Reporting of new values (to curators) would be helpful, possibly with automatic (pre)processing on top
  • [VLO] For certain facets, apply a fuzzy factor by default (for free text search; maybe also facets???)

Workflow for application of mapping in production

  • Use GitHub (with forks) for curating/editing the vocabularies
  • You can edit CSV in GitHub directly
    • github:vlo-curation/maps/csv
    • github:vlo-curation/maps/profileName -> resourceType
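The fuzzy matching idea mentioned above (string normalisation plus edit distance) could look roughly like the sketch below. This is not the curation module's implementation. The function names, the threshold of 2 edits, and the example vocabulary entries are assumptions chosen for illustration. Values that match nothing return `None`, which is where reporting of new values to curators could hook in.

```python
import re
import unicodedata


def normalise(text):
    """Lower-case, strip diacritics and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^\w\s]", " ", no_marks.casefold())
    return " ".join(cleaned.split())


def edit_distance(a, b):
    """Levenshtein distance, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def match_value(raw, vocabulary, max_distance=2):
    """Return the closest vocabulary label, or None if nothing is close enough."""
    target = normalise(raw)
    best, best_distance = None, max_distance + 1
    for label in vocabulary:
        distance = edit_distance(target, normalise(label))
        if distance < best_distance:
            best, best_distance = label, distance
    return best


# Illustrative vocabulary, not taken from CLAVAS.
vocabulary = ["Universität Leipzig", "Meertens Instituut"]
print(match_value("Universitat Leipzg", vocabulary))
print(match_value("Completely Different", vocabulary))
```

A fixed edit-distance threshold is the simplest choice; a threshold relative to string length may work better for short names.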

Short term (4.1)

  • Resource type
    • CLAVAS vocabulary
    • Map for profile name -> resource type mapping
    • People: Jan
  • Licence/availability
    • CLAVAS vocabulary
      • Licence URIs (instead of the handles that are assigned to concepts by OpenSKOS automatically)
    • Mapping files
      • Licence identifier (URI) -> licence name, licence URL
    • People: Legal Issues Committee, ask CLARINO people to curate/update (Oddrun?)
  • MIME type
    • Map ~120 types to descriptions (vocabularies)
      • Some names/descriptions already provided in “MIME type inventory” document
    • Map for cross-facet mapping (-> resource type)
    • Hanna (+SCCTC editorial team of interoperability document?)
  • How to get from CSV to SKOS?
  • How to get from SKOS to vlo-map? (fetch whole concept-scheme from CLAVAS and transform via XSL to a map)
  • Determine/implement transitions between CSV, CLAVAS, mapping files

Long term (curation module)
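For the short-term question of how to get from SKOS to a map, the notes propose an XSL transformation. As a rough illustration of the same idea, the sketch below uses Python's standard library instead of XSL to turn a SKOS concept scheme in RDF/XML into a variant-to-preferred-label map. The sample data, the `example.org` URI, and the plain-dict output are assumptions; the actual vlo-map file format and the CLAVAS export are not specified in these notes.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}

# Hypothetical sample; real data would be fetched from CLAVAS.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/licence/cc-by">
    <skos:prefLabel xml:lang="en">CC BY 4.0</skos:prefLabel>
    <skos:altLabel xml:lang="en">CC-BY</skos:altLabel>
    <skos:hiddenLabel xml:lang="en">Creative Comons Attribution</skos:hiddenLabel>
  </skos:Concept>
</rdf:RDF>"""


def skos_to_map(rdf_xml):
    """Map every altLabel and hiddenLabel to its concept's prefLabel."""
    root = ET.fromstring(rdf_xml)
    mapping = {}
    for concept in root.findall("skos:Concept", NS):
        pref = concept.find("skos:prefLabel", NS)
        if pref is None or not pref.text:
            continue  # skip concepts without a preferred label
        for tag in ("skos:altLabel", "skos:hiddenLabel"):
            for variant in concept.findall(tag, NS):
                if variant.text:
                    mapping[variant.text.strip()] = pref.text.strip()
    return mapping


print(skos_to_map(SAMPLE))
```

Hidden labels carry the misspellings here, in line with the "hidden labels for spelling errors" option discussed under CLAVAS.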
