Opened 6 years ago

Last modified 6 years ago

#1056 accepted feature

Develop util-tool to update mapping files with new values encountered in VLO

Reported by: matej.durco@oeaw.ac.at Owned by: wolfgang.sauer@oeaw.ac.at
Priority: major Milestone:
Component: MetadataCuration Version:
Keywords: Cc: wolfgang.sauer@oeaw.ac.at, Twan Goosen, haaf@bbaw.de, Menzo Windhouwer, Dieter Van Uytvanck

Description

To curate the normalisation maps easily and continuously, we need an automated way to check them against the values encountered in the VLO.

For this a simple tool will be developed that will be part of the VLO-mapping code repo.
It will be run manually by a curator and will check a given mapping file against the values in the corresponding facet in the VLO (via a query to the Solr endpoint) and indicate the unseen values, i.e. those not yet present in the mapping file.

It could add these unseen values directly at the (alphabetically) right place in the mapping file, making them nicely visible through git diff and leaving it to the curator to act upon this new information, ideally by proposing new mappings for these values.
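The check described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the tool itself: the class `UnseenValues` and its methods are hypothetical, the mapping file is reduced to a plain set of source values (the real CSV layout is not described in this ticket), and "alphabetical" is taken to mean Java's natural string ordering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not taken from the difftool repository.
final class UnseenValues {

    // Values that occur in the VLO facet but are not yet listed in the mapping file.
    static TreeSet<String> unseen(Set<String> mappedValues, Set<String> facetValues) {
        TreeSet<String> result = new TreeSet<>(facetValues);
        result.removeAll(mappedValues);
        return result;
    }

    // Known and unseen values merged in sorted order, so that each new value
    // shows up as a single added line in a git diff of the mapping file.
    static List<String> mergeSorted(Set<String> mappedValues, Set<String> facetValues) {
        TreeSet<String> all = new TreeSet<>(mappedValues);
        all.addAll(facetValues);
        return new ArrayList<>(all);
    }

    public static void main(String[] args) {
        // Made-up sample values, only to exercise the two methods.
        Set<String> mapped = Set.of("Dutch", "English", "German");
        Set<String> facet = Set.of("English", "Englisch", "nld");
        System.out.println("unseen: " + unseen(mapped, facet));
        System.out.println("merged: " + mergeSorted(mapped, facet));
    }
}
```

Natural ordering sorts upper-case before lower-case letters; if the mapping files are kept in a different order, the same comparator would have to be used here so that the diff stays minimal.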

Attachments (1)

difftool.properties (280 bytes) - added by wolfgang.sauer@oeaw.ac.at 6 years ago.


Change History (8)

comment:1 Changed 6 years ago by matej.durco@oeaw.ac.at

Status: new → assigned

initial version is on github at https://github.com/acdh-oeaw/valuemappings-difftool

Execution is
java -jar <version with dependencies> <csv filename>

An internet connection is required for execution, since the program downloads the facet values from Solr on alpha-vlo.
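For orientation, a request for all values of one facet could be built roughly like this. It is a sketch using standard Solr faceting parameters (`facet`, `facet.field`, `facet.limit`, `facet.mincount`, `rows`, `wt`); the class `FacetQuery`, the host name and the facet field name are placeholders, and the query the tool actually sends may differ.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not taken from the difftool repository.
final class FacetQuery {

    // Builds a select URL that returns no documents (rows=0) but every
    // value of the given facet field that occurs at least once.
    static String facetUrl(String solrSelectUrl, String facetField) {
        return solrSelectUrl
                + "?q=" + URLEncoder.encode("*:*", StandardCharsets.UTF_8)
                + "&rows=0&facet=true&facet.limit=-1&facet.mincount=1&wt=json"
                + "&facet.field=" + URLEncoder.encode(facetField, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Placeholder endpoint and field name, not the real alpha-vlo values.
        System.out.println(facetUrl("https://solr.example.org/solr/vlo-index/select", "languageCode"));
    }
}
```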

README follows soon.

comment:2 Changed 6 years ago by wolfgang.sauer@oeaw.ac.at

release 1.0 is on github. Please see the project readme for further instructions, since the program call changed for use of an external configuration file.

I'm attaching a configuration file with current solr_url to the ticket.
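Reading such a configuration file could look roughly like the sketch below. It assumes a standard Java properties file with a key literally named `solr_url`; the exact key names in difftool.properties are not shown in this ticket, and the class `DiffToolConfig` and the sample URL are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for illustration; not taken from the difftool repository.
final class DiffToolConfig {

    // Reads the Solr endpoint from a properties source and fails early if it is missing.
    static String solrUrl(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String url = props.getProperty("solr_url"); // assumed key name
        if (url == null || url.isBlank()) {
            throw new IllegalArgumentException("solr_url is missing from the configuration");
        }
        return url.trim();
    }

    public static void main(String[] args) {
        // Inline sample instead of a real file, with a placeholder URL.
        Reader sample = new StringReader("solr_url=https://solr.example.org/solr/select\n");
        System.out.println(solrUrl(sample));
    }
}
```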

Changed 6 years ago by wolfgang.sauer@oeaw.ac.at

Attachment: difftool.properties added

comment:3 Changed 6 years ago by wolfgang.sauer@oeaw.ac.at

Status: assigned → accepted

comment:4 Changed 6 years ago by Twan Goosen

Cc: Menzo.Windhouwer@mpi.nl Dieter Van Uytvanck added; menzo removed

EDIT: This comment does NOT relate directly to the present issue. See #1052 for a more relevant ticket.


Just a note here to mention that in our opinion it would be very valuable to have a more or less formal definition of the intended behaviour of the URL checking tool and its contribution to the metadata scoring. We have some real-world use cases, especially regarding resource (un)availability that need thinking and probably some decision making. Our concern is that without such a specification we might end up with ad-hoc solutions and/or not so well defined behaviour and as a consequence confused metadata providers.

What would be a good place for this?

To start, we can discuss things a bit on Slack - I propose setting up a channel dedicated to the curation module and related tools.

Last edited 6 years ago by Twan Goosen

comment:5 Changed 6 years ago by Twan Goosen

Cc: Menzo Windhouwer added; Menzo.Windhouwer@mpi.nl removed

comment:6 Changed 6 years ago by matej.durco@oeaw.ac.at

I very much support writing down a specification, as well as having a dedicated channel.
However this ticket is about diffing the normalisation tables.
In my understanding URL-checking is a separate feature, for which the nearest ticket I could find is #1052.

Nevertheless, both the diff tool and the URL checking would deserve a specification.

Wolfgang, could you start to work on that?

I would propose starting a Google Doc for each to flesh it out, and then integrating it as documentation into the corresponding code repos. Any objections?

comment:7 in reply to:  6 Changed 6 years ago by Twan Goosen

Replying to matej.durco@…:

However this ticket is about diffing the normalisation tables.
In my understanding URL-checking is a separate feature, for which the nearest ticket I could find is #1052.

Aaargh, wrong tool, wrong ticket, my bad. Thanks for following up despite that! I will add a note to the comment above saying that it was wrongly posted here.
