
Virtual meeting on VLO planning 2018-09-04

  • What?
    • VLO planning meeting
  • Who?
    • Wolfgang, Thomas, Dieter, Matej, Menzo, Twan (, Can?)
  • When?
    • 4 September 2018, 10:00 - 12:00 CET (based on the Doodle poll)
  • Where?

Documents

Agenda

  • Current state
    • VLO 4.5
    • EOSC-hub
  • VLO 4.6
  • Curation
    • Curation module
      • Link checking
      • Value mapping
    • Curation workflow
      • Updating mappings based on VLO values - tooling (see #1056)
  • Beyond 4.6
  • Planning

Notes

Current state

4.5.0 has been running stably on the new host since mid-July. The Docker image and Docker Compose projects have recently been, and are still being, developed and restructured.

VLO 4.6

VLO 4.6 milestone reviewed and prioritised.

Critical tasks:

  • De-duplication (#13 (Solr), #195 (front end))
    • Front end and back end aspects can be developed more or less separately
    • At least for testing/exploration, also allow grouping by fields other than the custom 'similarity signature'
  • Decoupling of importer logic from importer application 'shell', specifically to make this available to the curation module (#181, also #50)
  • Landing page prominence (#153)
    • Also in search results (part of #164)
    • Some related issues:
      • #173: Resource PID prominence, actionability
      • #184: Record PID prominence
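
As an illustration of the 'grouping by other fields' point under de-duplication, the following is a minimal sketch that builds Solr result-grouping parameters for an arbitrary field. It assumes that Solr's standard result grouping (`group`, `group.field`, `group.limit`, `group.ngroups`) would be used for exploration; these notes do not state which Solr mechanism VLO actually uses, and the field names `_signature` and `collection` are hypothetical.

```python
from urllib.parse import urlencode


def grouped_query_params(group_field, query="*:*", rows=10, members_per_group=3):
    """Build Solr request parameters that group results by `group_field`.

    Assumption: standard Solr result grouping is used; this is not
    necessarily how VLO implements de-duplication.
    """
    return {
        "q": query,
        "rows": rows,                      # number of groups per page
        "group": "true",                   # switch result grouping on
        "group.field": group_field,        # field whose value defines a group
        "group.limit": members_per_group,  # documents returned per group
        "group.ngroups": "true",           # also report the number of groups
    }


# Hypothetical field names, for illustration only.
for field in ("_signature", "collection"):
    print(urlencode(grouped_query_params(field)))
```

Keeping the grouping field a parameter makes it possible to compare the custom signature against simpler candidates without changing the front end.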

Additionally, there are some minor, non-blocking tasks, including Mopinion-based reporting/feedback, small bug fixes and UI/UX improvements (see milestone).

Curation

Link checker

A first complete link checking run has been completed.

MPI-PL have reported high server loads due to 'aggressive' polling by the link checker (see the comments on #1052). A delay is being introduced and another full link checking run will be carried out.
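
The kind of delay mentioned here can be sketched as a per-host throttle that enforces a minimum interval between two requests to the same server. This is only an illustration: the class name `HostThrottle`, the interval value and the per-host granularity are assumptions, not a description of the actual link checker.

```python
import time
from urllib.parse import urlsplit


class HostThrottle:
    """Enforce a minimum interval between requests to the same host (hypothetical helper)."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock          # injectable for testing
        self._sleep = sleep          # injectable for testing
        self._last_request = {}      # host -> time of the most recent request

    def wait(self, url):
        """Block until the host of `url` may be contacted again.

        Returns the number of seconds that were waited.
        """
        host = urlsplit(url).netloc
        now = self._clock()
        last = self._last_request.get(host)
        waited = 0.0
        if last is not None:
            waited = max(0.0, self.min_interval_s - (now - last))
            if waited > 0:
                self._sleep(waited)
        # Record when the request is actually allowed to go out.
        self._last_request[host] = now + waited
        return waited
```

A checker would call `throttle.wait(url)` immediately before each request. Requests to different hosts do not delay each other, so overall throughput suffers less than with a single global delay.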

The results and performance of the first few test runs will be analysed and reported on (informally) at CLARIN 2018. At a later stage, the intention is to implement support for parsing and respecting robots.txt directives (note: a third-party Java library exists for this). Exact parameters, such as the upper bound for acceptable request intervals, still have to be determined.
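
To make the robots.txt plan more concrete, here is a minimal sketch using Python's standard `urllib.robotparser` (not the Java library referred to above). The user agent string, the sample rules and the policy of capping a requested `Crawl-delay` at an upper bound are all assumptions; what should happen when a site asks for a longer interval than is acceptable remains an open decision.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "vlo-link-checker"  # hypothetical user agent string

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 5
Disallow: /private/
"""


def load_rules(robots_txt):
    """Parse robots.txt content that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def request_interval(parser, default_s, max_s):
    """Return the interval to use between requests, in seconds.

    Uses the site's Crawl-delay when present, capped at `max_s`
    (assumed policy); falls back to `default_s` otherwise.
    """
    delay = parser.crawl_delay(USER_AGENT)
    if delay is None:
        return default_s
    return min(float(delay), max_s)


rules = load_rules(SAMPLE_ROBOTS_TXT)
print(rules.can_fetch(USER_AGENT, "https://repo.example/private/x"))
print(request_interval(rules, default_s=1.0, max_s=3.0))
```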

Some improvements will be implemented with respect to the presentation of the results, both overall and per collection. Aggregated counts per 'response category' (probably based on response code, but also covering timeouts and other response issues) will be provided. Sample URLs for large collections (>50 records) should be selected to be informative and representative, and should primarily give a sample of the problematic cases. In the long term a view on the entire URL set should be available. Currently the full report XML is available for closer inspection.
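
The aggregation and sampling described above could look roughly like the sketch below. The category names, the input shape `(url, status, timed_out)` and the selection strategy (round-robin over problematic categories first, working links only as filler) are assumptions made for illustration; the notes leave the exact categories open.

```python
from collections import Counter


def categorise(status, timed_out=False):
    """Map one link check result to a coarse response category (assumed names)."""
    if timed_out:
        return "timeout"
    if status is None:
        return "no response"
    if 200 <= status < 300:
        return "ok"
    if 300 <= status < 400:
        return "redirect"
    if 400 <= status < 500:
        return "client error"
    if 500 <= status < 600:
        return "server error"
    return "other"


def summarise(results):
    """Count results per category; `results` holds (url, status, timed_out) tuples."""
    return Counter(categorise(status, timed_out) for _url, status, timed_out in results)


def sample_urls(results, size):
    """Pick up to `size` URLs, giving priority to problematic cases."""
    by_category = {}
    for url, status, timed_out in results:
        by_category.setdefault(categorise(status, timed_out), []).append(url)
    ok_urls = by_category.pop("ok", [])
    problem_lists = [by_category[name] for name in sorted(by_category)]
    sample = []
    # Round-robin so that every problematic category is represented.
    for rank in range(max((len(urls) for urls in problem_lists), default=0)):
        for urls in problem_lists:
            if rank < len(urls):
                sample.append(urls[rank])
    # Working links only fill whatever room is left.
    sample.extend(ok_urls)
    return sample[:size]
```

Applied per collection, `summarise` yields the aggregated counts and `sample_urls` a short list for display, while the full report remains the place for closer inspection.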

Beyond 4.6

Skipped in the interest of time.

Planning

  • First 4.6 alpha available by 2018-10-08 at the latest
  • Aiming for 4.6.0 release 2018-11-10
  • Next joint meeting: at CLARIN 2018
  • Wolfgang will visit the CLARIN office in Nijmegen on 2018-09-07 to work on #181
Last modified on 09/04/18 15:20:32