= CMDI Taskforce & curation taskforce joint meetings at the CLARIN Annual Conference 2016 =

 * What?
   * Meeting together with the curation taskforce at [https://www.clarin.eu/event/2016/clarin-annual-conference-2016-aix-en-provence-france CAC2016]
 * Who?
   * Members of the [[Taskforces/CMDI|CMDI taskforce]] and [[Taskforces/Curation|curation taskforce]], and other people involved/interested in CMDI & metadata quality
 * When?
   * On the basis of the [http://doodle.com/poll/78g7egwn58zv8kdv doodle]:
     * 25 October 2016: dinner
     * 26 October: 9:15 - 11:15 (according to the [https://www.clarin.eu/content/programme-clarin-annual-conference-2016 programme])
     * 27 October: 13:00 - 14:30 (during lunch)
 * Where?
   * [https://www.google.com/maps/place/Centre+de+Congr%C3%A8s/@43.525376,5.4527233,17z/data=!4m5!3m4!1s0x12c98d95273ef54b:0x62171dfe147e84e!8m2!3d43.525376!4d5.454912 Centre de Congrès, Aix-en-Provence] (for details, see Travel information on the [https://www.clarin.eu/event/2016/clarin-annual-conference-2016-aix-en-provence-france conference page])

== Documents ==

 * [[CMDI 1.2/Specification|CMDI 1.2 specification]]

== Agenda ==

Please add your agenda points or send them to tf-cmdi@lists.clarin.eu.

=== Tuesday 25 October (dinner) ===

Dinner, opportunity for (even more) informal discussions. We meet at 19:30 in front of the conference venue (see above for location details).
Restaurant: TBD (reasonable options: [http://www.altogusto.fr/ Alto Gusto], [http://www.maharaja-aix.com/ Le Maharaja], [http://www.sajna.fr/ Sajna], [http://www.restaurant-lecedre.fr/ Le Cèdre], [https://www.google.com/maps/place/La+Grange/@43.5267548,5.4486086,19z/data=!4m21!1m15!4m14!1m5!1m1!1s0x0:0x62171dfe147e84e!2m2!1d5.454912!2d43.525376!1m6!1m2!1s0x12c98d9800ceef5b:0xb6de30a2bfc99f09!2sLe+C%C3%A8dre,+10+Rue+Victor+Leydet,+13100+Aix-en-Provence,+Frankrijk!2m2!1d5.4459574!2d43.5271894!3e2!3m4!1s0x0:0x2e7a0ba1c26f3066!8m2!3d43.527121!4d5.4486944!6m1!1e1 La Grange])

=== Wednesday 26 October (joint taskforces meeting) ===

[https://docs.google.com/document/d/1Y2wb4jl32y9uUAo-Z8wlR0v-eLvdkYUYfl7ENtc-UGU/edit?usp=sharing Google Doc]

Progress update, planning for the upcoming period.

 1. CMDI 1.2 (Twan, Menzo)
   1. Brief status update
   1. Outlook
 1. VLO
   1. Brief status update
   1. The VLO's 'Format' facet and the use of MIME types (Hanna)
     - To improve the 'Format' facet, we probably first need some agreement on its content, i.e. the use of MIME types, in particular for file formats that don't have an unambiguous standard MIME type such as application/tei+xml or application/xml. In CLARIN-D there has been a decision to try out an approach based on Content-Type syntax: a standard MIME type with additional parameters for disambiguation, cf. [[MIME format variants|MIME format variants]] for some info and links to the discussion on mailing lists.
   1. Helpdesk responsibilities and workflows (Hanna)
     - Is our (and CLARIN-D's) idea of helpdesk responsibilities and workflows for reports and requests via the VLO OK with everyone involved?
   1. Priorities and planning (Twan)
     - Maintenance release (VLO 4.0.2)
     - Next minor release (VLO 4.1)
   1. Development process (Twan)
     - Code forks/synchronisation
     - Sustainable procedure for (near) future
 1. Facet mapping & normalisation (Matej, Davor)
   1. Brief status update w.r.t. curation module, related work
   1. Workflow for application of mapping in production
     - Short term
     - Long term (curation module)

=== Thursday 27 October (lunch) ===

'''We meet at the entrance of the conference centre at 13:05, and will walk to a restaurant where we can have lunch (TBD)'''

Input from TF members, distribution of work

 1. CMDI 1.2
   1. Brief status update
   1. Outlook
   1. Distribution of work
 1. VLO
   1. The VLO's 'Format' facet and the use of MIME types
   1. Helpdesk responsibilities and workflows
   1. Outlook
   1. Distribution of work
 1. Facet mapping & normalisation
   1. Brief status update
   1. Outlook
   1. Distribution of work

== Notes ==

Present:
 * Wednesday
   * Menzo Windhouwer, Twan Goosen, Thomas Eckart, Matej Durco, Marcin Oleksy, Hanna Hedeland, Gregoire Montcheuil, Danguole Kalinauskaite, Dimitrios Galanis, João Silva, Krista Liin, Davor Ostojic, Jan Odijk
 * Thursday
   * Menzo Windhouwer, Twan Goosen, Matej Durco, Marcin Oleksy, Hanna Hedeland, Pavel Stranak, Neeme Kahusk, Marcin Pol, Davor Ostojic

==== CMDI 1.2 ====

 * CMDI 1.2 rolled out - supported in the production environment
 * CMDI toolkit finished - validate, create, migrate CMD records
 * [https://www.clarin.eu/cmdi1.2-specification Spec] is available online
 * !ComponentRegistry supports 1.2 (and still 1.1), not all 1.2 features are implemented yet. -> Twan working on CR 2.2 with more complete coverage of 1.2 features
 * support for 1.2 in tools/infra:
   * Metadata curation module
   * COMEDI - not yet but there's some commitment towards support (could become the default/recommended CMDI editor instead of Arbil)
 * Centres partly already providing 1.2 (for now only Leipzig known, but others (in CLARIN-D) have plans)
 * CLAVAS!
   * CMDI 1.2 supports the CLAVAS vocabularies (currently: Organisations, Language codes; new vocabularies could come in, but need commitment)
   * candidates for other vocabularies: resource type, media types (MIME), licences - see below
   * CLAVAS based on OpenSKOS provided by Meertens - new version of OpenSKOS coming soon.
   * governance/versioning/curation of vocabularies is a problem (to solve)
   * how to edit the vocabularies (google-spreadsheets)
   * all controlled vocabularies to be used in VLO should be in CLAVAS
 - Outlook
   - Best practices - a more practical guide to usage of CMDI
     - Current/previous effort: https://www.clarin.eu/content/cmdi-best-practice-guide
     - Issues:
       - Hierarchies
       - resource type
     - Input (non-exhaustive)
       - [https://www.clarin.eu/content/cmdi-best-practice-guide Existing best practices draft]
       - CLARIN-D VLO-WG output [https://trac.clarin.eu/wiki/VLO-Taskforce]
       - Recommendations document
       - [http://www.clarin.eu/sites/default/files/AP3-007-CMDI_and_granularity.pdf Granularity recommendations]
       - [https://www.clarin.eu/faq-page/267 Metadata FAQ] at clarin.eu
       - [https://www.clarin.eu/sites/default/files/trippel-zinn-CLARIN2016_paper_15.pdf Paper] by Thorsten Trippel and Claus Zinn ??
       - ...
     - Has to go in line/in sync with curation work and VLO development
     - '''Menzo''' organises a kick-off meeting for the best practices document
       - First task: decide on structure and global content of the document

==== VLO ====

 - Status
   - This summer 4.0 released: CMDI 1.2 + UI
 - 'Format' facet and the use of MIME types (Hanna)
   - facet is fed from resource proxies (120 distinct values)
   - Also map from MIME type component? E.g. in case of zips
   - Make an (open) vocabulary for media types
     - Menzo: Yes; it cannot be validated using the schema, but could be part of the quality assessment procedure
 - Helpdesk responsibilities and workflows (Hanna)
   - ''Jan Odijk'': quite satisfied, usually get an answer
   - ''Matej'': receive reports, do not always act on them
   - Ticket can be addressed to multiple people if they are in the queue (or by sending an e-mail)
   - Define workflow better between trac-tickets and OTRS-issues (and github-issues)
   - '''Hanna''' will come back to us with a new/updated approach
 - Priorities and planning
   - [https://github.com/clarin-eric/VLO/milestones Milestones]
   - Maintenance release ([https://github.com/clarin-eric/VLO/milestone/2 VLO 4.0.2])
     - update of mappings!
       - https://github.com/clarin-eric/VLO/issues/21
     - November?
     - Separate meeting in the next few weeks ('''Twan''')
       - [http://doodle.com/poll/ivbrbvt44v97uegn Doodle]
   - Next minor release ([https://github.com/clarin-eric/VLO/milestone/1 VLO 4.1])
     - Use new dynamic mapping retrieval workflow
       - Various forks of a (to be created) mapping repository
         - Domain experts and other curators can work on a fork and send pull requests to curation experts
         - Curation experts can send pull requests to (production/beta/alpha) VLO maintainers
         - VLO maintainers can maintain testing and production branches/forks
       - Use GitHub to publish and distribute mappings
       - Sync/transfer between CSV, CLAVAS, uniform mapping files (see below)
       - https://github.com/clarin-eric/VLO/issues/22
     - Multivalue selection in facets
       - #794
     - '''Davor''' to make a ticket on the 4.1 milestone for the performance issue w.r.t. availability selection
     - Q1 2017??
 - Development process
   - Discuss in separate meeting

Availability components: '''Pavel''' to look into existing components and report.
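To make the 'Format' facet discussion more concrete, here is a minimal sketch of how a Content-Type style value (a standard MIME type plus disambiguating parameters) could be split and mapped to a label from an open vocabulary. The parameter name `format-variant`, the vocabulary entries and the function names are illustrative assumptions, not the agreed CLARIN-D syntax or any VLO code.

```python
def parse_format(value):
    """Split a Content-Type style value into (base MIME type, parameters).

    Naive sketch: splits on ';' and does not handle quoted semicolons.
    """
    parts = [p.strip() for p in value.split(";")]
    base = parts[0].lower()
    params = {}
    for part in parts[1:]:
        if "=" not in part:
            continue  # ignore malformed parameters in this sketch
        key, _, val = part.partition("=")
        params[key.strip().lower()] = val.strip().strip('"')
    return base, params


# Hypothetical open vocabulary: base MIME type -> human-readable label.
MEDIA_TYPE_LABELS = {
    "application/pdf": "PDF document",
    "text/plain": "Plain text",
    "application/xml": "XML document",
}


def facet_label(value, labels=MEDIA_TYPE_LABELS):
    """Return a facet label, falling back to the base type (open vocabulary)."""
    base, params = parse_format(value)
    label = labels.get(base, base)
    variant = params.get("format-variant")  # hypothetical parameter name
    return f"{label} ({variant})" if variant else label


if __name__ == "__main__":
    for raw in ["text/plain", 'Application/XML; format-variant="tei"', "application/zip"]:
        print(raw, "->", facet_label(raw))
```

Because the vocabulary is open, unknown types such as `application/zip` pass through unchanged; as noted above, a check like this would belong in the quality assessment procedure and not in schema validation.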
==== Facet mapping & normalisation ====

 - Status
   - [http://clarin.oeaw.ac.at/curate Curation module], [https://clarin.oeaw.ac.at/vlo/ Curation instance of VLO]\\https://github.com/acdh-oeaw/vlo-curation
   - Facet coverage
     - E.g. Resourcetype: map from profile name - big improvement
   - Tentative mapping(s) -> how to agree on (precise) application of this approach?
   - ''Jan'': for organisations, mapping is not enough; fuzzy search is also required, as there are too many spelling variations\\So we would need to update the organisation vocabulary,\\But a lot could also be achieved with fuzzy search (string normalisation, edit distance)\\Reporting of new values (to curators) would be helpful - possibly with automatic (pre)processing on top
     - [VLO] For certain facets, apply fuzzy factor by default (for free text search - maybe also facets???)
       - https://github.com/clarin-eric/VLO/issues/26
 - Workflow for application of mapping in production
   - Use github (with forks) for curating/editing the vocabularies\\You can edit csv in github directly
     - github:vlo-curation/maps/csv
     - github:vlo-curation/maps/profileName -> resourceType
   - Short term (VLO 4.1)
     - Resource type
       - CLAVAS vocabulary
       - Map for profile name -> resource type mapping
       - People: '''Jan'''
     - Licence/availability
       - CLAVAS vocabulary
         - Licence URIs (instead of the handles that are assigned to concepts by OpenSKOS automatically)
       - Mapping files licence identifier (URI) -> licence name, licence URL
       - People: Legal Issues Committee, ask CLARINO people to curate/update ('''Oddrun'''?)
     - MIME type
       - Map ~120 types to descriptions (vocabularies)
         - Some names/descriptions already provided in “MIME type inventory” document
       - Map for cross-facet mapping (-> resource type)
       - '''Hanna''' (+SCCTC editorial team of interoperability document?)
     * Usage of CLAVAS
       * how to deal with wild variants of concepts
         * Hidden labels - for spelling errors etc.;
         * or separate ConceptSchemes: one with official labels/concepts, one with all the misspellings, and mapping relations between the concepts of the different concept schemes
     - Mapping definitions and translations between formats/modalities (csv, mapping xml, clavas)
       - How to get from CSV to SKOS?
       - How to get from SKOS to vlo-map? (fetch whole concept-scheme from CLAVAS and transform via XSL to a map)
       - '''Davor, Matej, Twan''': Determine/implement transitions between CSV, CLAVAS, Mapping files
   - Long term (curation module)
     - ???
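As a rough illustration of the points above on CSV maps, string normalisation, edit distance and reporting of new values, the following sketch looks up a facet value in a CSV map and falls back to a fuzzy match. The CSV column names (`variant`, `normalised`), the sample rows, the threshold and all function names are assumptions for illustration; the actual layout of the vlo-curation maps is not specified here.

```python
import csv
import io
import unicodedata


def normalise(value):
    """Lower-case, strip accents, and collapse punctuation and whitespace."""
    decomposed = unicodedata.normalize("NFKD", value)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents.lower())
    return " ".join(cleaned.split())


def edit_distance(a, b):
    """Levenshtein distance, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def load_map(csv_text):
    """Read 'variant,normalised' rows (assumed layout) into a lookup dict."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {normalise(row["variant"]): row["normalised"] for row in reader}


def map_value(value, mapping, max_distance=2):
    """Exact lookup first, then the closest key within max_distance edits."""
    key = normalise(value)
    if key in mapping:
        return mapping[key]
    best = min(mapping, key=lambda k: edit_distance(key, k), default=None)
    if best is not None and edit_distance(key, best) <= max_distance:
        return mapping[best]
    return None  # unknown value: a candidate for reporting to curators


# Illustrative data only, not taken from a real map.
SAMPLE_CSV = """variant,normalised
Example Institute for Language Research,Example Institute for Language Research
EILR,Example Institute for Language Research
"""

if __name__ == "__main__":
    mapping = load_map(SAMPLE_CSV)
    for raw in ["example institut for language research", "Some Unknown Lab"]:
        print(raw, "->", map_value(raw, mapping))
```

Values that come back as `None` are the ones a curator would need to review; a threshold of 2 edits is an arbitrary choice and would need tuning against real organisation names.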