CMDI Taskforce & curation taskforce joint meetings at the CLARIN Annual Conference 2016
- What?
- Meeting together with the curation taskforce at CAC2016
- Who?
- Members of the CMDI taskforce and curation taskforce, and other people involved/interested in CMDI & metadata quality
- When?
- On the basis of the Doodle poll:
- 25 October 2016: dinner
- 26 October: 9:15 - 11:15 (according to programme)
- 27 October: 13:00 - 14:30 (during lunch)
- Where?
- Centre de Congrès, Aix-en-Provence (for details, see Travel information on the conference page)
Documents
Agenda
Please add your agenda points or send them to tf-cmdi@lists.clarin.eu.
Tuesday 25 October (dinner)
Dinner, opportunity for (even more) informal discussions. We meet at 19:30 in front of the conference venue (see above for location details).
Restaurant: TBD (reasonable options: Alto Gusto, Le Maharaja, Sajna, Le Cèdre, La Grange)
Wednesday 26 October (joint taskforces meeting)
Progress update, planning for the upcoming period.
- CMDI 1.2 (Twan, Menzo)
- Brief status update
- Outlook
- VLO
- Brief status update
- The VLO's 'Format' facet and the use of MIME types (Hanna)
- To improve the 'Format' facet, we probably first need some agreement on its content, i.e. the use of MIME types, in particular for file formats that don't have an unambiguous standard MIME type, such as application/tei+xml or application/xml. In CLARIN-D there has been a decision to try out an approach based on Content-Type syntax: a standard MIME type with additional parameters for disambiguation; cf. MIME format variants for some info and links to the discussion on mailing lists.
- Helpdesk responsibilities and workflows (Hanna)
- Is our (and CLARIN-D's) idea of helpdesk responsibilities and workflows for reports and requests via the VLO OK with everyone involved?
- Priorities and planning (Twan)
- Maintenance release (VLO 4.0.2)
- Next minor release (VLO 4.1)
- Development process (Twan)
- Code forks/synchronisation
- Sustainable procedure for (near) future
- Facet mapping & normalisation (Matej, Davor)
- Brief status update w.r.t. curation module, related work
- Workflow for application of mapping in production
- Short term
- Long term (curation module)
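To make the Content-Type based approach from the 'Format' facet item above more concrete, here is a minimal sketch of how a consumer could split such a value into its standard MIME type and its disambiguating parameters. It uses only the Python standard library. The parameter name `format-variant` and the function name `split_media_type` are assumptions made for illustration; they are not the names agreed on in CLARIN-D.

```python
from email.message import Message


def split_media_type(value: str) -> tuple[str, dict[str, str]]:
    """Split a Content-Type style value into (base MIME type, parameters)."""
    msg = Message()
    msg["Content-Type"] = value
    # get_content_type() returns the lower-cased "type/subtype" part
    # (it falls back to "text/plain" for malformed values).
    base = msg.get_content_type()
    # get_params() returns [(base, ""), (name, value), ...]; skip the first pair.
    params = dict(msg.get_params()[1:])
    return base, params


# Hypothetical example value: a TEI document served as generic XML.
base, params = split_media_type('application/xml; format-variant="tei"')
print(base, params)
```

With a split like this, a facet could group records by the base type while still exposing the variant for finer filtering.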
Thursday 27 October (lunch)
We meet at the entrance of the conference centre at 13:05, and will walk to a restaurant where we can have lunch (TBD)
Input from TF members, distribution of work
- CMDI 1.2
- Brief status update
- Outlook
- Distribution of work
- VLO
- The VLO's 'Format' facet and the use of MIME types
- Helpdesk responsibilities and workflows
- Outlook
- Distribution of work
- Facet mapping & normalisation
- Brief status update
- Outlook
- Distribution of work
Notes
Present:
- Wednesday
- Menzo Windhouwer, Twan Goosen, Thomas Eckart, Matej Durco, Marcin Oleksy, Hanna Hedeland, Gregoire Montcheuil, Danguole Kalinauskaite, Dimitrios Galanis, João Silva, Krista Liin, Davor Ostojic, Jan Odijk
- Thursday
- Menzo Windhouwer, Twan Goosen, Matej Durco, Marcin Oleksy, Hanna Hedeland, Pavel Stranak, Neeme Kahusk, Marcin Pol, Davor Ostojic
CMDI 1.2
- CMDI1.2 rolled out - supported in the production environment
- CMDI toolkit finished - validate, create, migrate CMD records
- Spec is available online
- ComponentRegistry supports 1.2 (and still 1.1), not all 1.2 features are implemented yet. -> Twan working on CR 2.2 with more complete coverage of 1.2 features
- support for 1.2 in tools/infra:
- Metadata curation module
- COMEDI - not yet but there's some commitment towards support (could become the default/recommended CMDI editor instead of Arbil)
- Centres partly already providing 1.2 (for now only Leipzig known, but others (in CLARIN-D) have plans)
- CLAVAS!
- CMDI 1.2 supports the CLAVAS vocabularies (currently: Organisations, Language codes; new vocabularies could come in, but need commitment)
- candidates for other vocabularies: resource type, media types (MIME), licences - see below
- CLAVAS based on OpenSKOS provided by Meertens - new version of OpenSKOS coming soon.
- governance/versioning/curation of vocabularies is a problem (to solve)
- how to edit the vocabularies (google-spreadsheets)
- all controlled vocabularies to be used in VLO should be in CLAVAS
- Outlook
- Best practices - a more practical guide to usage of CMDI
- Current/previous effort: https://www.clarin.eu/content/cmdi-best-practice-guide
- Issues:
- Hierarchies
- resource type
- Input (non-exhaustive)
- Existing best practices draft
- CLARIN-D VLO-WG output https://trac.clarin.eu/wiki/VLO-Taskforce
- Recommendations document
- Granularity recommendations
- Metadata FAQ at clarin.eu
- Paper by Thorsten Trippel and Claus Zinn ??
- ...
- Has to go in line/in sync with curation work and VLO development
- Menzo organises a kick-off meeting for the best practices document
- First task: decide on structure and global content of the document
VLO
- Status
- This summer 4.0 released: CMDI 1.2 + UI
- 'Format' facet and the use of MIME types (Hanna)
- facet is fed from resource proxies (120 distinct values)
- Also map from mime type component? E.g. in case of zips
- Make an (open) vocabulary for media types
- Menzo: Yes; it cannot be validated using the schema, but it could be part of the quality assessment procedure
- Helpdesk responsibilities and workflows (Hanna)
- Jan Odijk: quite satisfied, usually get an answer
- Matej: receive reports, do not always act on them
- Ticket can be addressed to multiple people if they are in the queue (or by sending an e-mail)
- Define the workflow between Trac tickets and OTRS issues (and GitHub issues) better
- Hanna will come back to us with a new/updated approach
- Priorities and planning
- Milestones
- Maintenance release (VLO 4.0.2) - update of mappings!
- https://github.com/clarin-eric/VLO/issues/21
- November?
- Separate meeting in the next few weeks (Twan)
- Next minor release (VLO 4.1)
- Use new dynamic mapping retrieval workflow
- Various forks of a (to be created) mapping repository
- Domain experts and other curators can work on a fork and send pull request to curation experts
- Curation experts can send pull request to (production/beta/alpha) VLO maintainers
- VLO maintainers can maintain testing and production branches/forks
- Use GitHub to publish and distribute mappings
- Sync/transfer between CSV, CLAVAS, uniform mapping files (see below)
- https://github.com/clarin-eric/VLO/issues/22
- Multivalue selection in facets
- Davor to make a ticket on the 4.1 milestone for the performance issue w.r.t. availability selection
- Q1 2017??
- Development process
- Discuss in separate meeting
Availability components: Pavel to look into existing components and report.
Facet mapping & normalisation
- Status
- Curation module, Curation instance of VLO
https://github.com/acdh-oeaw/vlo-curation
- Facet coverage
- E.g. resource type: map from profile name - big improvement
- Tentative mapping(s) -> how to agree on (precise) application of this approach?
- Jan: for organisations, mapping is not enough; fuzzy search is also required, as there are too many spelling variations
So we would need to update the organisation vocabulary,
but a lot could also be achieved with fuzzy search (string normalisation, edit distance)
Reporting of new values (to curators) would be helpful - possibly with automatic (pre)processing on top
- [VLO] For certain facets, apply fuzzy factor by default (for free text search - maybe also facets???)
- https://github.com/clarin-eric/VLO/issues/26
- Workflow for application of mapping in production
- Use github (with forks) for curating/editing the vocabularies
You can edit csv in github directly - github:vlo-curation/maps/csv
- github:vlo-curation/maps/profileName -> resourceType
- Short term (VLO 4.1)
- Resource type
- CLAVAS vocabulary
- Map for profile name -> resource type mapping
- People: Jan
- Licence/availability
- CLAVAS vocabulary
- Licence URIs (instead of the handles that are assigned to concepts by OpenSKOS automatically)
- Mapping files licence identifier (URI) -> licence name, licence URL
- People: Legal Issues Committee, ask CLARINO people to curate/update (Oddrun?)
- MIME type
- Map ~120 types to descriptions (vocabularies)
- Some names/descriptions already provided in “MIME type inventory” document
- Map for cross-facet mapping (-> resource type)
- Hanna (+SCCTC editorial team of interoperability document?)
- Usage of CLAVAS
- how to deal with wild variants of concepts
- Hidden labels - for spelling errors etc;
- or separate ConceptSchemes? One with official labels/concepts, one with all the misspellings, and mapping relations between the concepts of the different concept schemes
- Mapping definitions and translations between formats/modalities (csv, mapping xml, clavas)
- How to get from CSV to SKOS?
- How to get from SKOS to vlo-map? (fetch whole concept-scheme from CLAVAS and transform via XSL to a map)
- Davor, Matej, Twan: Determine/implement transitions between CSV, CLAVAS, Mapping files
- Long term (curation module)
- ???
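As an illustration of the fuzzy matching that Jan suggested for organisation names (string normalisation followed by edit distance), the following is a minimal sketch in plain Python. It is not how the VLO or the curation module implements matching. The function names, the distance threshold of 2 and the sample vocabulary entries are assumptions chosen for illustration only.

```python
import unicodedata


def normalise(name: str) -> str:
    """Lower-case, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.lower())
    return " ".join(cleaned.split())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def best_match(value, vocabulary, max_distance=2):
    """Return the closest vocabulary entry, or None if nothing is close enough."""
    if not vocabulary:
        return None
    target = normalise(value)
    distance, entry = min(
        (edit_distance(target, normalise(candidate)), candidate)
        for candidate in vocabulary
    )
    return entry if distance <= max_distance else None


# Illustrative vocabulary entries only, not taken from CLAVAS.
vocabulary = ["Meertens Instituut", "Universität Leipzig"]
print(best_match("Meertens Institut", vocabulary))
print(best_match("Universitat  Leipzig.", vocabulary))
```

Values for which `best_match` returns `None` are natural candidates for the reporting of new values to curators mentioned above.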
Attachments (1)
- VLO dynamic mapping workflow.png (104.4 KB)