= Virtual meeting on VLO development planning 2020-01-29 =

 * What?
  * VLO progress and planning meeting
 * Who?
  * Can, Dieter, Matej, Tariq, Thomas, Twan
 * When?
  * 29 January 2020 10:00 - 11:00 CET (on the basis of [https://doodle.com/poll/6er6hqh8rfi72pk2 doodle])
 * Where?
  * [https://clarin.zoom.us/j/270953863 Zoom]

== Documents ==
 * Milestones (!GitHub)
  * [https://github.com/clarin-eric/VLO/milestone/8 VLO 4.9 milestone]
  * [https://github.com/clarin-eric/VLO/milestone/13 VLO 5.0 milestone]
  * [https://github.com/clarin-eric/VLO/labels/eosc-hub EOSC-hub related issues]
 * [[20191211#Notes|Notes from last meeting]]

Other
 * [https://docs.google.com/document/d/1l-ucmGbEd59LzxuAhpXv-Te8mte8Z9O3PbfNF5ilh00/edit?usp=sharing VLO 5.0 design proposals]
 * [https://office.clarin.eu/v/CE-2018-1175-EOSC-hub-task71-roadmap.pdf EOSC-hub roadmap]

== Agenda ==
=== VLO ===
==== VLO 4.9 ====
New features/enhancements previously considered for 4.9
 * [https://github.com/clarin-eric/VLO/issues/209 #209]: Include metadata as Structured data for datasets
  * See item 5 of [https://office.clarin.eu/v/CE-2019-1545-SCCTC_Minutes_7_November_2019-Approved.pdf Centre Meeting of 7 November 2019] and [https://office.clarin.eu/v/CE-2019-1544-Embedded_metadata_for_landing_pages.docx CE-2019-1544 'Embedded metadata for landing pages']
 * [https://github.com/clarin-eric/VLO/issues/36 #36]: Conditional facets
  * Pilot cases:
   * hide `multilingual` until at least one value has been selected for `language`
   * hide `genre` until there are fewer than N (=50?) values OR a `collection` value has been selected
   * ...
 * [https://github.com/clarin-eric/VLO/issues/115 #115]: Temporal coverage
  * (...) a pragmatic approach might be to extract all temporal information and store only the extrema for the whole instance. This would be robust regarding alignment problems (...), but would also create a larger timespan out of multiple discontinuous timespans.
 * [https://github.com/clarin-eric/VLO/issues/189 #189]: Display original values
 * [https://github.com/clarin-eric/VLO/issues/140 #140]: Allow for mapping based on a profile's concept link
 * [https://github.com/clarin-eric/VLO/issues/169 #169]: Migrating uniform maps and other post-processors to value mapping
 * [https://github.com/clarin-eric/VLO/issues/240 #240]: Show more precise but user friendly media type for resources
 * [https://github.com/clarin-eric/VLO/issues/141 #141]: Use alternative labels from vocabularies and maps as synonyms in Solr
 * Improved VCR integration
  * depends on VCR use cases...
 * Improved LRS integration (information exchange, preflight?)
 * Try to improve Solr performance + investigate possibilities for scaling up

==== VLO 5.0 ====
Rethink/redesign/reimplement UI
 * [https://github.com/clarin-eric/VLO/milestone/13 VLO 5.0 milestone]
 * [https://docs.google.com/document/d/1l-ucmGbEd59LzxuAhpXv-Te8mte8Z9O3PbfNF5ilh00/edit?usp=sharing VLO 5.0 design proposals]

=== Curation module & link checking ===
 * Development, release and operation of Apache Storm based link checker
 * Hosting/operation moved to CLARIN central infra?
 * Possible integration of the CMDI instance validator (https://github.com/clarin-eric/cmdi-instance-validator) in the curation module?

=== Planning ===
 * Releases & deployments
  * From CLARIN internal roadmap:
   * 4.9 ideally before EOSC-hub week (release end of April/early May)
   * 5.0 design complete ~end of summer, release around the end of 2020
 * Next meeting

== Notes ==
=== 4.9 issues ===
==== Issues to include in milestone with highest priority ====
 * [https://github.com/clarin-eric/VLO/issues/209 #209 "Include metadata as Structured data for datasets"]
  * Should be included but requires investigation:
   * Conditions for showing?
    * There has to be a landing page, or another reference to point to as the original resource (which may or may not serve structured data)
    * We want to be able to explicitly exclude specific collections
   * Minimal information required
    * Based on the requirements of the Dataset profile (see Google's Dataset structured data documentation)
   * Also for non-CLARIN collections?
    * Europeana case: check whether they already do this and/or would be OK with us doing this
 * [https://github.com/clarin-eric/VLO/issues/36 #36: Conditional facets]
  * 'Basic' and clear cases like language -> multilingual could already be included in 4.9
  * Some cases, like the genre example, are more 'aggressive' and can be opaque to users. Take these up in the investigation for the redesign (VLO 5.0)
 * [https://github.com/clarin-eric/VLO/issues/115 #115: Temporal coverage]
  * Round to year and map to extrema (earliest and latest year mentioned). ~19k records have temporal information, plus many from Europeana.
  * Also introduces some UI opportunities/challenges: new type of facet value selector; possibly a date histogram could be shown as well.

==== Issues for 4.9 milestone with lower priority ====
 * [https://github.com/clarin-eric/VLO/issues/140 #140: Allow for mapping based on a profile's concept link]
 * [https://github.com/clarin-eric/VLO/issues/169 #169: Migrating uniform maps and other post-processors to value mapping]
 * [https://github.com/clarin-eric/VLO/issues/189 #189: Show original values]
  * maybe better for 5.0, rethink UI aspects
  * risk of sacrificing transparency
  * maybe do user testing
  * ask Martin to make the case for this?
  * What does Europeana do?
  * Need to distinguish between types of mapping
 * [https://github.com/clarin-eric/VLO/issues/240 #240: Show more precise but user friendly media type for resources]
  * Need to find or construct this vocabulary
  * Would also be useful for switchboard and VCR
  * Also value mapping to join types?
   * Would require a separate mapping into a separate field
 * Solr tuning
  * Make an issue + try out things before 4.9
   * [https://github.com/clarin-eric/VLO/issues/287 #287: Analysis of bottlenecks and performance tweaks]
  * Prioritise understanding the bottlenecks
   * Profiling

==== Issues to be excluded from 4.9 milestone ====
 * VCR integration
  * Probably not right now. Check with Willem
  * Existing VCR integration could be taken out?
 * LRS integration
  * 'Widget' and preflight will not be implemented and available before the 4.9 release
 * [https://github.com/clarin-eric/VLO/issues/141 #141: Use alternative labels from vocabularies and maps as synonyms in Solr]
  * Possibly close permanently

=== VLO 5.0 ===
More ideas are to be gathered in the design document; more concrete ideas that can be implemented should have started to crystallise by around the summer. Any input is welcome.

=== Curation module & link checking ===
Link checker
 * Strong suspicion as to the source of the performance issues ('''update''': source confirmed and path to resolution identified)
 * New, testable version of the 'stormy checker' should be available in about 2 weeks
  * To be checked with a development version of the VLO -> [https://github.com/clarin-eric/VLO/issues/285 #285]

Curation module
 * Next steps:
  1. Improve the situation with the '0' status code
   * Can plans to replace the status code as success indicator with an explicit status field (`ok`/`broken`/`undetermined`); the status code field can then become nullable
   * Reported statistics to be updated; probably two levels will be introduced: the first breaks down by overall status, the second gives details on status codes, response times, etc. (similar to what is available now)
  1. Later: integration of the CMDI instance validator into the curation module

=== Planning ===
 * Status & planning meeting for 4.9 development and roadmap: early March
  * [https://doodle.com/poll/gyqfrqrqnpn8nc29 Doodle]
  * chosen date: [[../20200311|11 March 2020]]
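The two pilot rules for conditional facets ([https://github.com/clarin-eric/VLO/issues/36 #36]) amount to simple visibility predicates over the current selection. The following is a minimal sketch, not VLO code: the function `visible_facets`, its input shapes and the constant `GENRE_MAX_VALUES = 50` (the agenda's tentative "N(=50?)") are assumptions made for illustration.

```python
from typing import Dict, List, Set

# Hypothetical threshold: the agenda only gives a tentative "N(=50?)".
GENRE_MAX_VALUES = 50


def visible_facets(selections: Dict[str, List[str]],
                   value_counts: Dict[str, int]) -> Set[str]:
    """Return the names of the facets to display.

    selections:   facet name -> values currently selected by the user
    value_counts: facet name -> number of distinct values in the current results
    """
    visible = set(value_counts)

    # Pilot rule 1: hide `multilingual` until a `language` value is selected.
    if not selections.get("language"):
        visible.discard("multilingual")

    # Pilot rule 2: hide `genre` unless it has fewer than N values
    # OR a `collection` value has been selected.
    few_genres = value_counts.get("genre", 0) < GENRE_MAX_VALUES
    if not (few_genres or selections.get("collection")):
        visible.discard("genre")

    return visible


if __name__ == "__main__":
    counts = {"language": 300, "multilingual": 2, "genre": 120, "collection": 40}
    print(sorted(visible_facets({}, counts)))
    # ['collection', 'language']
    print(sorted(visible_facets({"language": ["nld"], "collection": ["X"]}, counts)))
    # ['collection', 'genre', 'language', 'multilingual']
```

Keeping each rule as a separate, explicit condition makes it easy to ship only the 'basic' language -> multilingual rule in 4.9 and leave the more aggressive genre rule for the 5.0 redesign.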
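The approach discussed for temporal coverage ([https://github.com/clarin-eric/VLO/issues/115 #115]), rounding to the year and keeping only the earliest and latest year mentioned, can be sketched as follows. The regular expression and the function name `temporal_extrema` are hypothetical; real metadata values would need more robust date parsing than matching four-digit numbers.

```python
import re
from typing import Iterable, Optional, Tuple

# Assumption: a "year" is any run of exactly four digits not embedded in a
# longer number. Illustrative only; real metadata values are messier.
YEAR_PATTERN = re.compile(r"(?<!\d)\d{4}(?!\d)")


def temporal_extrema(values: Iterable[str]) -> Optional[Tuple[int, int]]:
    """Reduce all temporal values of one record to (earliest, latest) year.

    Dates are effectively rounded to the year by keeping only the year part.
    Returns None if no year can be found.
    """
    years = [int(y) for v in values for y in YEAR_PATTERN.findall(v)]
    if not years:
        return None
    return min(years), max(years)


if __name__ == "__main__":
    record = ["1950-03-01", "1962/1965", "ca. 1800s"]
    print(temporal_extrema(record))  # (1800, 1965)
    print(temporal_extrema(["unknown"]))  # None
```

In the example, the separate spans "1800s" and "1950 to 1965" collapse into the single span 1800 to 1965, which is exactly the trade-off noted in the agenda item: robust against alignment problems, but discontinuous timespans become one larger span.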
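The planned curation module change, replacing the status code as success indicator with an explicit `ok`/`broken`/`undetermined` field, a nullable status code and two-level statistics, could look roughly like this. All names (`LinkStatus`, `CheckResult`, `classify`, `make_result`, `two_level_stats`) and the mapping from HTTP codes to statuses are hypothetical assumptions, not the curation module's actual implementation; response times are left out for brevity.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, Optional, Tuple


class LinkStatus(Enum):
    OK = "ok"
    BROKEN = "broken"
    UNDETERMINED = "undetermined"


@dataclass
class CheckResult:
    url: str
    status: LinkStatus
    status_code: Optional[int]  # nullable: None when no HTTP response was received


def classify(status_code: Optional[int]) -> LinkStatus:
    """Hypothetical mapping from an HTTP status code to the explicit status.

    Assumption: a missing code or '0' (no response) is 'undetermined'
    rather than a failure; 2xx/3xx count as ok, 4xx/5xx as broken.
    """
    if not status_code:
        return LinkStatus.UNDETERMINED
    if 200 <= status_code < 400:
        return LinkStatus.OK
    if 400 <= status_code < 600:
        return LinkStatus.BROKEN
    return LinkStatus.UNDETERMINED


def make_result(url: str, status_code: Optional[int]) -> CheckResult:
    # A '0' code is stored as None instead of as a fake status code.
    return CheckResult(url, classify(status_code), status_code or None)


def two_level_stats(
    results: Iterable[CheckResult],
) -> Tuple[Counter, Dict[str, Counter]]:
    """Level 1: count per overall status; level 2: status codes per status."""
    items = list(results)
    overview = Counter(r.status.value for r in items)
    details: Dict[str, Counter] = {}
    for r in items:
        details.setdefault(r.status.value, Counter())[r.status_code] += 1
    return overview, details


if __name__ == "__main__":
    checked = [make_result("https://example.org/a", 200),
               make_result("https://example.org/b", 404),
               make_result("https://example.org/c", 0),
               make_result("https://example.org/d", 200)]
    overview, details = two_level_stats(checked)
    print(dict(overview))  # {'ok': 2, 'broken': 1, 'undetermined': 1}
    print(dict(details["undetermined"]))  # {None: 1}
```

Separating the overall status from the raw code means a '0' no longer has to masquerade as a status code, and the first statistics level can be computed without knowing anything about HTTP.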