Changes between Version 10 and Version 11 of Taskforces/CMDI/Meeting20161025-28


Timestamp:
10/27/16 09:35:22 (8 years ago)
Author:
Twan Goosen
Comment:

editing notes

  • Taskforces/CMDI/Meeting20161025-28

    v10 v11  
    5353=== Thursday 27 October (lunch) ===
    5454
     55'''We meet at the entrance of the conference centre at 13:05, and will walk to a restaurant where we can have lunch (TBD)'''
     56
    5557Input from TF members, distribution of work
    5658
     
    7678
    7779==== CMDI 1.2 ====
    78 CMDI1.2 rolled out - supported in the production environment
    79 CMDI toolkit finished - validate, create, migrate CMD records
    80 Spec is available online
    81 CompRegistry supports 1.2 (and still 1.1), not all 1.2 features are implemented yet. -> Twan working on CR 2.2 with more complete coverage of 1.2 features
    82 support for 1.2 in tools: ??
    83  COMEDI? (could become the default/recommended CMDI editor instead of Arbil)
    84 Centres partly already providing 1.2 (Leipzig)
     80* CMDI1.2 rolled out - supported in the production environment
     81* CMDI toolkit finished - validate, create, migrate CMD records
     82* Spec is available online
     83* !ComponentRegistry supports 1.2 (and still 1.1), not all 1.2 features are implemented yet. -> Twan working on CR 2.2 with more complete coverage of 1.2 features
     84* support for 1.2 in tools/infra:
     85 * Metadata curation module
     86 * COMEDI - not yet but there's some commitment towards support (could become the default/recommended CMDI editor instead of Arbil)
     87 * Centres partly already providing 1.2 (for now only Leipzig known, but others (in CLARIN-D) have plans)
    8588
    86 CLAVAS!
    87 CMDI 1.2 supports the CLAVAS vocabularies (currently: Organisations, Language codes; new vocabularies could come in, but need commitment)
    88 candidates for vocabularies: mime-types,  licences (manager: legal issues committee?)
    89 CLAVAS based on OpenSKOS provided by Meertens - new version of OpenSKOS coming soon.
    90 governance/versioning/curation of vocabularies is a problem (to solve)
    91 how to edit the vocabularies (google-spreadsheets)
    92 how to deal with wild variants of concepts
    93 Hidden labels - for spelling errors etc;
    94 or separate ConceptSchemes: one with official labels/concepts, one with all the misspellings, and mapping relations between the concepts of the different concept schemes
    95 all controlled vocabularies to be used in VLO should be in CLAVAS
     89* CLAVAS!
     90 * CMDI 1.2 supports the CLAVAS vocabularies (currently: Organisations, Language codes; new vocabularies could come in, but need commitment)
     91 * candidates for other vocabularies: media types (MIME),  licences (manager: legal issues committee?)
     92 * CLAVAS based on OpenSKOS provided by Meertens - new version of OpenSKOS coming soon.
     93 * governance/versioning/curation of vocabularies is a problem (to solve)
     94 * how to edit the vocabularies (google-spreadsheets)
     95 * how to deal with wild variants of concepts
     96  * Hidden labels - for spelling errors etc;
     97  * or separate ConceptSchemes: one with official labels/concepts, one with all the misspellings, and mapping relations between the concepts of the different concept schemes
     98 * all controlled vocabularies to be used in VLO should be in CLAVAS
    9699
    97 Best practices -  a more practical guide to usage of CMDI
    98 Current/previous effort: https://www.clarin.eu/content/cmdi-best-practice-guide
    99 Issues:
    100 Hierarchies
    101 resource type
    102 Input (non-exhaustive)
    103 Existing best practices draft
    104 CLARIN-D VLO-WG output https://trac.clarin.eu/wiki/VLO-Taskforce
    105 Recommendations document
    106 Granularity recommendations
    107 FAQ
    108 Paper by Thorsten Trippel and Claus Zinn ??
    109 ...
    110 Has to go inline/in sync with curation work and VLO development
     100- Outlook
     101
     102    - Best practices -  a more practical guide to usage of CMDI
     103
     104        - Current/previous effort: https://www.clarin.eu/content/cmdi-best-practice-guide
     105
     106        - Issues:
     107
     108            - Hierarchies
     109
     110            - resource type
     111
     112        - Input (non-exhaustive)
     113
     114            - Existing best practices draft
     115
     116            - CLARIN-D VLO-WG output https://trac.clarin.eu/wiki/VLO-Taskforce
     117
     118                - Recommendations document
     119
     120            - Granularity recommendations
     121
     122            - FAQ
     123
     124            - Paper by Thorsten Trippel and Claus Zinn ??
     125
     126            - ...
     127
     128        - Has to go inline/in sync with curation work and VLO development
    111129
    112130==== VLO ====
    113 Brief status update
    114 4.0: CMDI 1.2 + UI
    115 The VLO's 'Format' facet and the use of MIME types (Hanna)
    116 To improve the 'Format' facet, we probably first need some agreement on its content, i.e. the use of MIME types, in particular for file formats that don't have an unambiguous standard MIME type such as application/tei+xml or application/xml. In CLARIN-D there has been a decision to try out an approach based on Content-Type syntax: a standard MIME type with additional parameters for disambiguation, cf. MIME format variants for some info and links to the discussion on mailing lists.
    117 VLO ‘format’ facet is fed from resource proxies (120 distinct values)
    118 Also map from mime type component? E.g. in case of zips
    119 Make an (open) vocabulary for media types
    120 Cannot be validated using schema but could be part of the quality assessment procedure
    121 Helpdesk responsibilities and workflows (Hanna)
    122 Is our (and CLARIN-D's) idea of helpdesk responsibilities and workflows for reports and requests via the VLO OK with everyone involved?
    123 Jan: quite satisfied, usually get an answer
    124 Matej: receive reports, do not always act on them
    125 Ticket can be addressed to multiple people if they are in the queue (or by sending an e-mail)
    126 Define Workflow better between trac-tickets and OTRS-issues (and github-issues)
    127 Hanna will come back to us with a new/updated approach
    128 Priorities and planning (Twan)
    129 https://github.com/clarin-eric/VLO/milestones
    130 Maintenance release (VLO 4.0.2) - update of mappings!
    131 Use GitHub to publish and distribute mappings
    132 November?
    133 Separate meeting in the next few weeks
    134 Next minor release (VLO 4.1)
    135 Use new dynamic mapping retrieval workflow
    136 Various forks of a (to be created) mapping repository
    137 Domain experts and other curators can work on a fork and send pull request to curation experts
    138 Curation experts can send pull request to (production/beta/alpha) VLO maintainers
    139 VLO maintainers can maintain testing and production branches/forks
    140 Sync/transfer between CSV, CLAVAS, uniform mapping files (see below)
    141 Q1 2017??
    142 Development process (Twan)
    143 Code forks/synchronisation
    144 Sustainable procedure for (near) future
    145131
    146 ==== Facet mapping, normalisation, curation module ====
     132    - Status
     133        - This summer 4.0 released: CMDI 1.2 + UI
     134    - 'Format' facet and the use of MIME types (Hanna)
     135        - facet is fed from resource proxies (120 distinct values)
     136            - Also map from mime type component? E.g. in case of zips
     137        - Make an (open) vocabulary for media types
     138            - Menzo: Yes, but cannot be validated using schema but could be part of the quality assessment procedure
     139    - Helpdesk responsibilities and workflows (Hanna)
     140        - ''Jan Odijk'': quite satisfied, usually get an answer
     141        - ''Matej'': receive reports, do not always act on them
     142        - Ticket can be addressed to multiple people if they are in the queue (or by sending an e-mail)
     143        - Define Workflow better between trac-tickets and OTRS-issues (and github-issues)
     144        - '''Hanna''' will come back to us with a new/updated approach
     145    - Priorities and planning
     146        - [https://github.com/clarin-eric/VLO/milestones]
     147        - Maintenance release (VLO 4.0.2) - update of mappings!
     148            - Use GitHub to publish and distribute mappings
     149                - November?
     150                - Separate meeting in the next few weeks ('''Twan''')
     151        - Next minor release (VLO 4.1)
     152            - Use new dynamic mapping retrieval workflow
     153                - Various forks of a (to be created) mapping repository
     154                    - Domain experts and other curators can work on a fork and send pull request to curation experts
     155                    - Curation experts can send pull request to (production/beta/alpha) VLO maintainers
     156                    - VLO maintainers can maintain testing and production branches/forks
     157                - Sync/transfer between CSV, CLAVAS, uniform mapping files (see below)
     158            - Q1 2017??
     159    - Development process
     160        - Discuss in separate meeting
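
The 'Format' facet notes mention disambiguating ambiguous MIME types via Content-Type syntax, i.e. a standard media type plus parameters. A minimal sketch of how such values could be split and mapped to facet labels follows. The parameter name `format-variant` and the label table are assumptions made for illustration only; they are not taken from the CLARIN-D proposal or from the VLO code.

```python
# Hypothetical parameter name; the actual CLARIN-D convention may differ.
VARIANT_PARAM = "format-variant"

# Illustrative label table, not an agreed vocabulary.
FORMAT_LABELS = {
    ("application/xml", "tei"): "TEI (XML)",
    ("application/xml", None): "XML",
    ("application/zip", None): "ZIP archive",
}


def parse_media_type(value):
    """Split a Content-Type style value into (media type, parameters)."""
    base, *raw_params = [part.strip() for part in value.split(";")]
    params = {}
    for raw in raw_params:
        if not raw:
            continue
        name, _, val = raw.partition("=")
        params[name.strip().lower()] = val.strip().strip('"')
    return base.lower(), params


def facet_label(value):
    """Return a label for the 'Format' facet, falling back to the bare media type."""
    base, params = parse_media_type(value)
    variant = params.get(VARIANT_PARAM)
    if variant is not None:
        variant = variant.lower()
    return (
        FORMAT_LABELS.get((base, variant))
        or FORMAT_LABELS.get((base, None))
        or base
    )
```

Values without a known variant fall back to the label of the plain media type, and unknown media types are passed through unchanged so that they can still be reported to curators.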
    147161
    148 Brief status update w.r.t. curation module, related work
    149 Curation module, Curation instance of VLO
    150 https://github.com/acdh-oeaw/vlo-curation
    151 Facet coverage
    152 E.g Resourcetype: map from profile name - big improvement
    153 Tentative mapping(s) -> how to agree on (precise) application of this approach?
    154 Jan: for organisations, mapping is not enough, also fuzzy search required, too many spelling variations
    155 So we would need to update the organisation vocabulary,
    156 But a lot could be achieved also with fuzzy search (string normalisation, editing distance)
    157 Reporting of new values (to curators) would be helpful - possibly with automatic (pre)processing on top
    158 [VLO] For certain facets, apply fuzzy factor by default (for free text search - maybe also facets???)
    159 Workflow for application of mapping in production
    160 Use github (with forks) for curating/editing the vocabularies
     162==== Facet mapping & normalisation ====
     163
     164    - Status
     165    - https://github.com/acdh-oeaw/vlo-curation
     166        - Facet coverage
     167            - E.g Resourcetype: map from profile name - big improvement
     168            - Tentative mapping(s) -> how to agree on (precise) application of this approach?
     169    - ''Jan'': for organisations, mapping is not enough, also fuzzy search required, too many spelling variations
     170    - So we would need to update the organisation vocabulary,
     171    - But a lot could be achieved also with fuzzy search (string normalisation, editing distance)
     172    - Reporting of new values (to curators) would be helpful - possibly with automatic (pre)processing on top
     173    - [VLO] For certain facets, apply fuzzy factor by default (for free text search - maybe also facets???)
     174    - Workflow for application of mapping in production
     175        - Use github (with forks) for curating/editing the vocabularies
    161176You can edit csv in github directly
    162 github:vlo-curation/maps/csv
    163 github:vlo-curation/maps/profileName -> resourceType
    164 
    165 
    166 Short term (4.1)
    167 Resource type
    168 CLAVAS vocabulary
    169 Map for profile name -> resource type mapping
    170 People: Jan
    171 Licence/availability
    172 CLAVAS vocabulary
    173 Licence URIs (instead of the handles that are assigned to concepts by OpenSKOS automatically)
    174 Mapping files licence identifier (URI) -> licence name, licence URL
    175 People: Legal Issues Committee, ask CLARINO people to curate/update (Oddrun?)
    176 MIME type
    177 Map ~120 types to descriptions (vocabularies)
    178 Some names/descriptions already provided in “MIME type inventory” document
    179 Map for cross-facet mapping (-> resource type)
    180 Hanna (+SCCTC editorial team of interoperability document?)
    181 How to get from CSV to SKOS?
    182 How to get from SKOS to vlo-map? (fetch whole concept-scheme from CLAVAS and transform via XSL to a map)
    183 Determine/implement transitions between CSV, CLAVAS, Mapping files
    184 Long term (curation module)
     177        - github:vlo-curation/maps/csv
     178        - github:vlo-curation/maps/profileName -> resourceType
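
The fuzzy matching idea raised for organisation names (string normalisation plus edit distance) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the threshold of 2 and the sample vocabulary are hypothetical and do not describe the curation module or the VLO implementation.

```python
import unicodedata


def normalise(name):
    """Lower-case, strip diacritics and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in stripped.casefold())
    return " ".join(cleaned.split())


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def match_organisation(raw, vocabulary, max_distance=2):
    """Return the closest vocabulary label, or None if nothing is close enough."""
    key = normalise(raw)
    best = None
    for label in vocabulary:
        distance = edit_distance(key, normalise(label))
        if distance <= max_distance and (best is None or distance < best[1]):
            best = (label, distance)
    return best[0] if best else None


# Sample values for illustration only.
vocabulary = ["Meertens Instituut", "Universität Leipzig"]
print(match_organisation("meertens  institut", vocabulary))
print(match_organisation("Some Unknown Centre", vocabulary))
```

A value for which `match_organisation` returns `None` is a candidate for the "reporting of new values to curators" step rather than for automatic mapping.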
     179- Short term (VLO 4.1)
     180    - Resource type
     181        - CLAVAS vocabulary
     182        - Map for profile name -> resource type mapping
     183        - People: '''Jan'''
     184    - Licence/availability
     185        - CLAVAS vocabulary
     186            - Licence URIs (instead of the handles that are assigned to concepts by OpenSKOS automatically)
     187        - Mapping files licence identifier (URI) -> licence name, licence URL
     188        - People: Legal Issues Committee, ask CLARINO people to curate/update ('''Oddrun'''?)
     189    - MIME type
     190        - Map ~120 types to descriptions (vocabularies)
     191            - Some names/descriptions already provided in “MIME type inventory” document
     192        - Map for cross-facet mapping (-> resource type)
     193        - '''Hanna''' (+SCCTC editorial team of interoperability document?)
     194- Mapping definitions and translations between formats/modalities (csv, mapping xml, clavas)
     195 - How to get from CSV to SKOS?
     196 - How to get from SKOS to vlo-map? (fetch whole concept-scheme from CLAVAS and transform via XSL to a map)
     197 - '''Davor, Matej, Twan''': Determine/implement transitions between CSV, CLAVAS, Mapping files
     198- Long term (curation module)
     199 - ???
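
One possible shape for the CSV to SKOS transition discussed above is sketched below. It reads a two-column mapping (profile name -> resource type) and emits one `skos:Concept` per target value, with the source values attached as hidden labels. The column names `source` and `target`, the sample rows and the scheme URI are assumptions for illustration; the actual layout of the vlo-curation CSV files and the CLAVAS/OpenSKOS import format may differ.

```python
import csv
import io
import xml.etree.ElementTree as ET
from urllib.parse import quote

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"


def read_mapping(csv_text):
    """Read 'source,target' rows into a dict (column names are hypothetical)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["source"].strip(): row["target"].strip() for row in reader}


def mapping_to_skos(mapping, scheme_uri):
    """Serialise the mapping as RDF/XML: one concept per target value."""
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("skos", SKOS)
    root = ET.Element(f"{{{RDF}}}RDF")

    # Group source values by the target value they map to.
    by_target = {}
    for source, target in mapping.items():
        by_target.setdefault(target, []).append(source)

    for target in sorted(by_target):
        concept = ET.SubElement(
            root, f"{{{SKOS}}}Concept",
            {f"{{{RDF}}}about": f"{scheme_uri}/{quote(target)}"})
        ET.SubElement(concept, f"{{{SKOS}}}inScheme",
                      {f"{{{RDF}}}resource": scheme_uri})
        ET.SubElement(concept, f"{{{SKOS}}}prefLabel").text = target
        # Variants become hidden labels, as suggested for spelling variants.
        for source in sorted(by_target[target]):
            ET.SubElement(concept, f"{{{SKOS}}}hiddenLabel").text = source
    return ET.tostring(root, encoding="unicode")


# Illustrative rows only; not taken from the vlo-curation repository.
SAMPLE_CSV = """source,target
LexicalResourceProfile,lexicalResource
TextCorpusProfile,corpus
SpeechCorpusProfile,corpus
"""

mapping = read_mapping(SAMPLE_CSV)
skos_xml = mapping_to_skos(mapping, "http://example.org/resource-type")
```

The reverse direction (SKOS to a VLO map) would, according to the notes, fetch the whole concept scheme from CLAVAS and transform it via XSL; the dict returned by `read_mapping` only stands in for such a map here.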