= Joint CMDI & Curation taskforces meeting 2018-10-08 at CLARIN 2018 =
----
 * What?
   * CMDI, best practices, curation
 * Who?
   * Members of the [[Taskforces/CMDI|CMDI]] or the [[Taskforces/Curation|Curation]] taskforces and other people involved/interested in CMDI, curation and related matters
 * When?
   * Monday, 8 October 2018, 11:00-13:00 CEST (see [https://www.clarin.eu/content/programme-clarin-annual-conference-2018 conference programme])
 * Where?
   * Lear Hall of the Galilei Congress Hotel (see [https://www.clarin.eu/event/2018/clarin-annual-conference-2018-pisa-italy conference page])
   * (Zoom: https://clarin.zoom.us/j/534115968)
     * We will try to enable remote participation

== Documents ==
 - [https://www.clarin.eu/content/cmdi-best-practices-guide Published CMDI Best practices]
 - Meeting notes:
   - [[Taskforces/CMDI/Meeting20170918|Last year's annual conference meeting]]
   - [[Taskforces/CMDI/Meetings|CMDI task force meetings]] (including best practices)
   - [[Taskforces/Curation#Meetings|Curation task force meetings]]

== Preparation ==

== Agenda ==
See the [https://docs.google.com/document/d/1NnXHha5dWlwjiDUKe4qrRaz3OKsFeL-EYkV_ToOiITI/edit?usp=sharing Google doc] for details and up-to-date information.

 0. Agenda
 0. Best practices
 0. CLARIN Concept Registry
 0. CLAVAS vocabularies
 0. Metadata curation

== Notes ==
Present: Menzo Windhouwer, Twan Goosen, Matej Durco, Ineke Schuurman, Neeme Kahusk, Hanna Hedeland, Thomas Eckart, Marcin Oleksy, Alexander König, Susanne Haaf, Oddrun Pauline Ohren, Francesca Frontini, Jussi Piitulainen, Mitchell Seaton, Jozef Misutka

=== Best practices ===
Current version (see https://www.clarin.eu/content/cmdi-best-practices-guide)
 * Current state
   * Overleaf: draft version
   * Version 1.1.1 published (via GitHub)
 * Recommended components/general info
   * Current state: still starting up initiative, looking for more input and collaborators (perhaps at bazaar)
   * There is some synergy with the resource families initiative
     * Talk to Jakob and Darja about their experience in developing the resource families and their ideas for recommended components
     * Note Twan: perhaps we can meet early December in Vienna on this?
   * Also ISO/DIS 24622-3 (CMDI standard part 3)
     * In the initial phase. Alignment might be desirable.
     * Update: we had a brief meeting with Maria Gavrilidou (and Thorsten, Daan and Penny) at the conference. She will present to the ISO people (?) in November but will keep it quite general. We can have a meeting not too long after that to synchronise.
   * Separate (sub)task force
 * Common use cases
   * Current status:
     * Being started up
     * Actual writing was detached from best practices guide
     * Next step after recommended components
   * Purpose changed over time; idea now is to gather information on what to document in addition to ‘general info’
   * Recommendations complementary to recommended components. Should refer to recommended components (depending on use case) but can/should also consist of prose descriptions
   * Resource families as input/use case?
     * Note Twan: perhaps we can meet early December in Vienna on this? (see ‘recommended components’ above)
   * Multiple use cases can run in parallel as ‘pilot’
     * A template is developed for domain experts based on these pilots
   * Organisational structure:
     * Coordinators (current TFs/best practices group)
     * Domain experts
       * Do not have to be involved in coordination, topics outside their domain or broader discussions

=== CLARIN Concept Registry ===
Coordinators used (metadata in) VLO to investigate concepts and facets
 * Generic vs specific concepts
   * Presentation Menzo: https://datashare.mpcdf.mpg.de/s/PnxhV7NKmzrMLwO
   * Identification of generic concepts on basis of VLO facets?
     * A lot of implicit context
     * Black/white listing of context (already in VLO)
       * Acceptable/rejectable context
       * Works although sometimes recursive context is needed to do it properly
     * More facets need to be evaluated
     * Expectation is that the number of concepts will stabilise
     * VLO needs to be adapted to evaluate broader (multiple levels of) context
     * Modellers will have to be educated on use of generic concepts
     * Perhaps multiple concept links on one item are needed
     * https://datashare.mpcdf.mpg.de/s/tn1tLvcSCUYG1oN
   * Specific approach
     * E.g. language definition - how broad a definition (e.g. also dialects, sociolects)
     * Facets in VLO do not have to directly correspond to a single concept
   * Generic vs specific:
     * Suggestion Matej: in VLO user should perhaps be able to choose between generic and specific?
 * Feedback on VLO tooltips
   * Resource families can serve as input/use cases
   * More or less open question: who is actually in charge of the tooltips?
 * What goes into the CCR, what in CLAVAS, what 'nowhere'?
   * https://legacy.gitbook.com/book/cmdi-taskforce/cmdi-best-practices/discussions/26
   * Conclusion/policy proposal (Menzo): instances do not go into CCR. There could be a concept scheme (vocabulary) in CLAVAS
     * Some (potential) vocabularies are made up of concepts (e.g. resource type; ‘book’ would be a concept, a specific title would not; ‘century’ would but ‘16th century’ wouldn’t)
     * Not always clear but in principle CCR’s judgement

=== CLAVAS vocabularies ===
Vocabularies
 * Ownership?
   * Bound to a person?
     * Should it even be allowed?
     * Perhaps… In any case there has to be a handover procedure
   * Delegation
     * Primary owner - likely the Curation task force
     * Actual work/discussion e.g. by legal issues committee
 * Would be important to improve human readability/explorability of CLAVAS (UI like CCR)
 * General procedure
   * Owner (TF) submits a proposal for a (version of a) vocabulary to SCCTC
   * SCCTC reviews and approves
   * TODO: suggest this workflow to SCCTC
 * Concrete vocabs
   * Media types
     * Already initiated
     * Requires use of schemes and collections
   * Licences
     * Already initiated
     * Effort on compiling licence vocabulary has to be revived
     * Further discussion required for a full vocabulary with detailed properties.
       * Simple version without complicated/controversial properties?
     * https://trac.clarin.eu/wiki/Taskforces/Curation/Meeting20170628
   * Resource type vocab
     * Review and finalise list drafted by curation TF
 * Option of custom/3rd party (open) vocabularies (EURAC/Alex case)?
   * If there are use cases, support for other vocab services could be considered
   * We would need a well-defined API so that others can run a compliant vocab service
   * Keeping the option open, but nothing decided

=== Metadata curation ===
 * Report on work/status (Matej)
   * GitHub based ‘curation through value mapping’ workflow
   * Work on resource type
     * Within facet
     * Cross-facet (from profile)
   * (Semi-)automatic curation (mapping suggestions)
     * Compare to current values (Resource type)
     * Similarity analysis (Organisation)
     * Decomposition/“deconcatenation” (Modality)
   * Link checking
   * Statistics in curation module
 * (Other) facets
   * Organisation
     * Prepared by Alex; we also have a CLAVAS vocab (work in progress)
     * https://trac.clarin.eu/ticket/1069
     * https://github.com/acdh-oeaw/VLO-mapping/blob/master/value-maps/organisation.csv
       * 991 values
   * Modality
     * Suggestion: Mapping based approach
     * https://trac.clarin.eu/ticket/1073
     * https://github.com/acdh-oeaw/VLO-mapping/blob/master/value-maps/modality.csv
       * 162 values
     * Can we improve coverage automatically?
       * Noise in other facets e.g. genre
       * Perhaps some profile to modality mapping (speech corpus?)
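
Illustration (added to the notes, not presented at the meeting): a minimal Python sketch of how value mapping, “deconcatenation” and similarity-based mapping suggestions could fit together, and how coverage of a value map might be measured. Everything in it is an assumption made for illustration: the `VALUE_MAP` entries, the separator characters, the similarity cutoff and all function names are hypothetical and do not reflect the format of the VLO-mapping CSV files or the behaviour of the curation module.

```python
import difflib
import re

# Hypothetical value map: raw metadata value -> normalised facet value.
# Illustrative entries only; not taken from the VLO-mapping repository.
VALUE_MAP = {
    "spoken": "Spoken",
    "speech": "Spoken",
    "written": "Written",
    "text": "Written",
    "video": "Video",
}


def deconcatenate(raw):
    """Split a concatenated value such as 'Speech; video' into lower-cased parts.

    The separators (';', ',', '/') are an assumption for this sketch.
    """
    parts = re.split(r"[;,/]", raw.lower())
    return [part.strip() for part in parts if part.strip()]


def curate(raw, value_map=VALUE_MAP):
    """Return (mapped values, unmapped parts) for one raw facet value."""
    mapped, unmapped = [], []
    for part in deconcatenate(raw):
        if part in value_map:
            target = value_map[part]
            if target not in mapped:
                mapped.append(target)
        else:
            unmapped.append(part)
    return mapped, unmapped


def suggest(part, value_map=VALUE_MAP, cutoff=0.8):
    """Suggest existing map keys that are similar to an unmapped part."""
    return difflib.get_close_matches(part, list(value_map), n=3, cutoff=cutoff)


def coverage(raw_values, value_map=VALUE_MAP):
    """Fraction of raw values whose parts are all covered by the value map."""
    if not raw_values:
        return 0.0
    fully_mapped = 0
    for raw in raw_values:
        mapped, unmapped = curate(raw, value_map)
        if mapped and not unmapped:
            fully_mapped += 1
    return fully_mapped / len(raw_values)


if __name__ == "__main__":
    samples = ["Speech; video", "written, spokn", "text"]
    for raw in samples:
        mapped, unmapped = curate(raw)
        suggestions = {part: suggest(part) for part in unmapped}
        print(raw, "->", mapped, "| suggestions:", suggestions)
    print("coverage:", coverage(samples))
```

In such a setup the suggestions would only be proposals for a human curator to accept or reject; how much interpretation a mapping may apply remains an open point.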
== Post-meeting: hands-on curation session ==
https://github.com/acdh-oeaw/VLO-mapping
 * Should be applied in: https://vlo.acdh.oeaw.ac.at/
 * [https://github.com/acdh-oeaw/VLO-mapping/blob/master/value-maps/resourceClass_tf-extended.csv Resource type map] (already in use)
 * [https://github.com/acdh-oeaw/VLO-mapping/blob/master/value-maps/modality.csv Modality map]

=== Modality facet ===
 * Trac issue #1073
 * [https://vlo.clarin.eu/values/modality Live VLO value selection]
 * Previous work by CLARIN-D: [[VLO-Taskforce/RecommendationsForFacets#Modality|recommendations for facet values]]
 * [https://github.com/acdh-oeaw/VLO-mapping/blob/modality-handson/value-maps/modality.csv modality-handson branch]
 * [https://github.com/acdh-oeaw/VLO-mapping/compare/master...acdh-oeaw:modality-handson#diff-450f33a704b4f7ec439794e3600cde80 Diff to master]

=== Outcome ===
(Preliminary)
 * We have jointly created a [https://github.com/acdh-oeaw/VLO-mapping/blob/42a103b4f09d25de0d7eb7034fc57d46114aec8b/value-maps/modality.csv first version] of a values map
 * Further discussion about degree of interpretation is required
 * Next steps: ??