Changes between Version 9 and Version 10 of Taskforces/CMDI/Meeting20161025-28


Timestamp:
10/27/16 06:38:55 (8 years ago)
Author:
Twan Goosen
Comment:

Pasted notes from Google Doc. TODO: clean up

  • Taskforces/CMDI/Meeting20161025-28

    v9 v10  
    1 = CMDI Taskforce & curation taskforce joint meetings at the CLARIN Annual Conference 2016 =
     1rom = CMDI Taskforce & curation taskforce joint meetings at the CLARIN Annual Conference 2016 =
    22
    33* What?
     
    1515== Documents ==
    1616* [[CMDI 1.2/Specification|CMDI 1.2 specification]]
    17 * ...
    1817
    1918== Agenda ==
     
    2726
    2827=== Wednesday 26 October (joint taskforces meeting) ===
     28[https://docs.google.com/document/d/1Y2wb4jl32y9uUAo-Z8wlR0v-eLvdkYUYfl7ENtc-UGU/edit?usp=sharing Google Doc]
     29
    2930Progress update, planning for the upcoming period.
    3031
     
    7071== Notes ==
    7172
     73=== Wednesday 26 October ===
     74
     75Joint TF meeting (raw notes: [https://docs.google.com/document/d/1Y2wb4jl32y9uUAo-Z8wlR0v-eLvdkYUYfl7ENtc-UGU/edit?usp=sharing Google Doc])
     76
     77==== CMDI 1.2 ====
     78CMDI 1.2 rolled out - supported in the production environment
     79CMDI toolkit finished - validate, create, migrate CMD records
     80Spec is available online
     81CompRegistry supports 1.2 (and still 1.1), but not all 1.2 features are implemented yet -> Twan working on CR 2.2 with more complete coverage of 1.2 features
     82support for 1.2 in tools: ??
     83 COMEDI? (could become the default/recommended CMDI editor instead of Arbil)
     84Centres partly already providing 1.2 (Leipzig)
     85
     86CLAVAS!
     87CMDI 1.2 supports the CLAVAS vocabularies (currently: Organisations, Language codes; new vocabularies could come in, but need commitment)
     88candidates for vocabularies: MIME types, licences (manager: legal issues committee?)
     89CLAVAS based on OpenSKOS provided by Meertens - new version of OpenSKOS coming soon.
     90governance/versioning/curation of vocabularies is a problem (to solve)
     91how to edit the vocabularies (google-spreadsheets)
     92how to deal with wild variants of concepts
     93Hidden labels - for spelling errors etc;
     94or separate ConceptSchemes: one with official labels/concepts, one with all the misspellings, plus mapping relations between the concepts of the different concept schemes
     95all controlled vocabularies to be used in VLO should be in CLAVAS
     96
     97Best practices - a more practical guide to usage of CMDI
     98Current/previous effort: https://www.clarin.eu/content/cmdi-best-practice-guide
     99Issues:
     100Hierarchies
     101resource type
     102Input (non-exhaustive)
     103Existing best practices draft
     104CLARIN-D VLO-WG output https://trac.clarin.eu/wiki/VLO-Taskforce
     105Recommendations document
     106Granularity recommendations
     107FAQ
     108Paper by Thorsten Trippel and Claus Zinn ??
    72109...
     110Has to go in line/in sync with curation work and VLO development
     111
     112==== VLO ====
     113Brief status update
     1144.0: CMDI 1.2 + UI
     115The VLO's 'Format' facet and the use of MIME types (Hanna)
     116To improve the 'Format' facet, we probably first need some agreement on its content, i.e. the use of MIME types, in particular for file formats that don't have an unambiguous standard MIME type such as application/tei+xml or application/xml. In CLARIN-D there has been a decision to try out an approach based on Content-Type syntax: a standard MIME type with additional parameters for disambiguation, cf. MIME format variants for some info and links to the discussion on mailing lists.
     117VLO ‘format’ facet is fed from resource proxies (120 distinct values)
     118Also map from mime type component? E.g. in case of zips
     119Make an (open) vocabulary for media types
     120Cannot be validated using schema but could be part of the quality assessment procedure
     121Helpdesk responsibilities and workflows (Hanna)
     122Is our (and CLARIN-D's) idea of helpdesk responsibilities and workflows for reports and requests via the VLO OK with everyone involved?
     123Jan: quite satisfied, usually get an answer
     124Matej: receive reports, do not always act on them
     125Ticket can be addressed to multiple people if they are in the queue (or by sending an e-mail)
     126Define Workflow better between trac-tickets and OTRS-issues (and github-issues)
     127Hanna will come back to us with a new/updated approach
     128Priorities and planning (Twan)
     129https://github.com/clarin-eric/VLO/milestones
     130Maintenance release (VLO 4.0.2) - update of mappings!
     131Use GitHub to publish and distribute mappings
     132November?
     133Separate meeting in the next few weeks
     134Next minor release (VLO 4.1)
     135Use new dynamic mapping retrieval workflow
     136Various forks of a (to be created) mapping repository
     137Domain experts and other curators can work on a fork and send pull request to curation experts
     138Curation experts can send pull request to (production/beta/alpha) VLO maintainers
     139VLO maintainers can maintain testing and production branches/forks
     140Sync/transfer between CSV, CLAVAS, uniform mapping files (see below)
     141Q1 2017??
     142Development process (Twan)
     143Code forks/synchronisation
     144Sustainable procedure for (near) future
     145
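The Content-Type based approach for the 'Format' facet discussed above can be illustrated with a minimal sketch. It splits a value into a base MIME type and its parameters, then looks up a facet label. The parameter name `format-variant`, the `labels` table and both function names are hypothetical, chosen only for illustration; the actual parameter names would come from the CLARIN-D discussion. Quoted parameter values containing `;` are not handled.

```python
def parse_media_type(value):
    """Split a Content-Type style string into (base type, parameters)."""
    base, *raw_params = [part.strip() for part in value.split(";")]
    params = {}
    for raw in raw_params:
        if not raw:
            continue  # tolerate a trailing ';'
        name, _, val = raw.partition("=")
        params[name.strip().lower()] = val.strip().strip('"')
    return base.lower(), params


def facet_value(value, labels):
    """Pick a facet label: variant-specific first, then base type, then raw type."""
    base, params = parse_media_type(value)
    variant = params.get("format-variant")  # hypothetical parameter name
    return labels.get((base, variant)) or labels.get((base, None)) or base


# Hypothetical label table, keyed by (base type, variant)
labels = {
    ("application/xml", "tei"): "TEI XML",
    ("application/xml", None): "XML",
}
print(facet_value("application/xml; format-variant=tei", labels))
print(facet_value("audio/x-wav", labels))
```

Values that fall through to the raw base type are the ones a quality assessment step could report, since they cannot be caught by schema validation.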
     146==== Facet mapping, normalisation, curation module ====
     147
     148Brief status update w.r.t. curation module, related work
     149Curation module, Curation instance of VLO
     150https://github.com/acdh-oeaw/vlo-curation
     151Facet coverage
     152E.g. resource type: map from profile name - big improvement
     153Tentative mapping(s) -> how to agree on (precise) application of this approach?
     154Jan: for organisations, mapping is not enough; fuzzy search is also required, as there are too many spelling variations
     155So we would need to update the organisation vocabulary,
     156But a lot could also be achieved with fuzzy search (string normalisation, edit distance)
     157Reporting of new values (to curators) would be helpful - possibly with automatic (pre)processing on top
     158[VLO] For certain facets, apply fuzzy factor by default (for free text search - maybe also facets???)
     159Workflow for application of mapping in production
     160Use github (with forks) for curating/editing the vocabularies
     161You can edit csv in github directly
     162github:vlo-curation/maps/csv
     163github:vlo-curation/maps/profileName -> resourceType
     164
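The fuzzy matching suggested above for organisation names (string normalisation plus edit distance) could look roughly like the following sketch. It is not the curation module's implementation; the function names and the `max_distance` threshold are assumptions made for illustration.

```python
import unicodedata


def normalise(name):
    """Strip diacritics, fold case and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(stripped.casefold().split())


def edit_distance(a, b):
    """Levenshtein distance, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def match_organisation(value, vocabulary, max_distance=2):
    """Return the closest vocabulary label, or None if nothing is close enough."""
    if not vocabulary:
        return None
    target = normalise(value)
    best = min(vocabulary, key=lambda label: edit_distance(target, normalise(label)))
    if edit_distance(target, normalise(best)) <= max_distance:
        return best
    return None  # unmatched value: a candidate for reporting to curators


vocabulary = ["Universität Leipzig", "Meertens Instituut"]
print(match_organisation("Universitat Leipzg", vocabulary))
```

A `None` result marks a value that matches nothing in the vocabulary, which fits the idea of reporting new values to curators.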
     165
     166Short term (4.1)
     167Resource type
     168CLAVAS vocabulary
     169Map for profile name -> resource type mapping
     170People: Jan
     171Licence/availability
     172CLAVAS vocabulary
     173Licence URIs (instead of the handles that are assigned to concepts by OpenSKOS automatically)
     174Mapping files licence identifier (URI) -> licence name, licence URL
     175People: Legal Issues Committee, ask CLARINO people to curate/update (Oddrun?)
     176MIME type
     177Map ~120 types to descriptions (vocabularies)
     178Some names/descriptions already provided in “MIME type inventory” document
     179Map for cross-facet mapping (-> resource type)
     180Hanna (+SCCTC editorial team of interoperability document?)
     181How to get from CSV to SKOS?
     182How to get from SKOS to vlo-map? (fetch whole concept-scheme from CLAVAS and transform via XSL to a map)
     183Determine/implement transitions between CSV, CLAVAS, Mapping files
     184Long term (curation module)
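The CSV -> SKOS -> vlo-map transitions listed under the short-term items can be sketched as below. The notes propose XSL for the SKOS-to-map step; this sketch uses Python for both steps instead, purely as an illustration. The CSV column names (`value`, `normalised`), the base URI and the function names are assumptions, and the use of `skos:hiddenLabel` for variants follows the hidden-labels idea from the CLAVAS discussion rather than any agreed format.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
ET.register_namespace("rdf", RDF)
ET.register_namespace("skos", SKOS)


def csv_to_skos(csv_text, base_uri):
    """Group CSV rows by normalised value and emit one skos:Concept per group."""
    variants = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        variants[row["normalised"]].append(row["value"])
    root = ET.Element(f"{{{RDF}}}RDF")
    for index, (label, values) in enumerate(variants.items(), start=1):
        concept = ET.SubElement(root, f"{{{SKOS}}}Concept",
                                {f"{{{RDF}}}about": f"{base_uri}{index}"})
        ET.SubElement(concept, f"{{{SKOS}}}prefLabel").text = label
        for value in values:
            if value != label:  # variants become hidden labels
                ET.SubElement(concept, f"{{{SKOS}}}hiddenLabel").text = value
    return ET.tostring(root, encoding="unicode")


def skos_to_map(skos_xml):
    """Flatten a concept scheme into a {variant: preferred label} dictionary."""
    mapping = {}
    for concept in ET.fromstring(skos_xml).iter(f"{{{SKOS}}}Concept"):
        pref = concept.findtext(f"{{{SKOS}}}prefLabel")
        mapping[pref] = pref
        for hidden in concept.findall(f"{{{SKOS}}}hiddenLabel"):
            mapping[hidden.text] = pref
    return mapping


# Hypothetical curation data
csv_text = "value,normalised\nPDF,application/pdf\napplication/pdf,application/pdf\n"
skos_xml = csv_to_skos(csv_text, "http://example.org/concept/")
print(skos_to_map(skos_xml))
```

Keeping the two steps separate means the SKOS output could be loaded into CLAVAS and the map could later be regenerated from whatever CLAVAS returns.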