Opened 6 years ago
Last modified 6 years ago
#1056 accepted feature
Develop util-tool to update mapping files with new values encountered in VLO
Reported by: | matej.durco@oeaw.ac.at | Owned by: | wolfgang.sauer@oeaw.ac.at |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | MetadataCuration | Version: | |
Keywords: | Cc: | wolfgang.sauer@oeaw.ac.at, Twan Goosen, haaf@bbaw.de, Menzo Windhouwer, Dieter Van Uytvanck |
Description
To curate the normalisation maps easily and continuously, we need an automated way to check them against the values encountered in the VLO.
For this, a simple tool will be developed as part of the VLO-mapping code repo.
It will be run manually by a curator. It will check a given mapping file against the values of the corresponding facet in the VLO (via a query to the Solr endpoint) and indicate the unseen values.
It could add these unseen values directly at the (alphabetically) right place in the mapping file, making them clearly visible through git diff and leaving it to the curator to act upon this new information, i.e. ideally to propose new mappings for these values.
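The core check described above can be sketched roughly as follows. This is only an illustration of the idea, not the tool's actual code: the class name `MappingDiffSketch`, the method names and the sample values are all hypothetical, and parsing the mapping file and querying Solr are left out, so the sketch works on plain string collections.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch: names and sample data are assumptions, not taken from the tool.
class MappingDiffSketch {

    /** Facet values that the mapping file does not list yet, in sorted order. */
    static List<String> findUnseen(Collection<String> mappedValues, Collection<String> facetValues) {
        TreeSet<String> unseen = new TreeSet<>(facetValues);
        unseen.removeAll(mappedValues);
        return new ArrayList<>(unseen);
    }

    /** Old and new values in one sorted list, so new ones land at their alphabetical place. */
    static List<String> mergeSorted(Collection<String> mappedValues, Collection<String> unseen) {
        TreeSet<String> merged = new TreeSet<>(mappedValues);
        merged.addAll(unseen);
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        // Values already present in the mapping file (made-up sample data).
        List<String> mapped = List.of("Dutch", "English", "German");
        // Values returned for the corresponding facet (made-up sample data).
        List<String> facet = List.of("English", "Slovene", "Czech", "German");

        List<String> unseen = findUnseen(mapped, facet);
        System.out.println("unseen: " + unseen);
        System.out.println("merged: " + mergeSorted(mapped, unseen));
    }
}
```

Note that `TreeSet` uses the natural, case-sensitive ordering of `String`; if the mapping files are sorted differently, a matching `Comparator` would be needed to keep the git diff small.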
Attachments (1)
Change History (8)
comment:1 Changed 6 years ago by
Status: | new → assigned |
---|
comment:2 Changed 6 years ago by
Release 1.0 is on GitHub. Please see the project README for further instructions, since the program call has changed to use an external configuration file.
I'm attaching a configuration file with the current solr_url to the ticket.
Changed 6 years ago by
Attachment: | difftool.properties added |
---|
comment:3 Changed 6 years ago by
Status: | assigned → accepted |
---|
comment:4 Changed 6 years ago by
Cc: | Menzo.Windhouwer@mpi.nl Dieter Van Uytvanck added; menzo removed |
---|
EDIT: This comment does NOT relate directly to the present issue. See #1052 for a more relevant ticket.
Just a note here to mention that in our opinion it would be very valuable to have a more or less formal definition of the intended behaviour of the URL checking tool and its contribution to the metadata scoring. We have some real-world use cases, especially regarding resource (un)availability that need thinking and probably some decision making. Our concern is that without such a specification we might end up with ad-hoc solutions and/or not so well defined behaviour and as a consequence confused metadata providers.
What would be a good place for this?
To start, we can discuss things a bit on Slack - I propose setting up a channel dedicated to the curation module and related tools.
comment:5 Changed 6 years ago by
Cc: | Menzo Windhouwer added; Menzo.Windhouwer@mpi.nl removed |
---|
comment:6 follow-up: 7 Changed 6 years ago by
I very much support writing down a specification, as well as having a dedicated channel.
However this ticket is about diffing the normalisation tables.
In my understanding URL-checking is a separate feature, for which the nearest ticket I could find is #1052.
Nevertheless, both the diff-tool and the URL-checking deserve a specification.
Wolfgang, could you start to work on that?
I would propose starting a Google Doc for each to flesh it out, and then integrating it as documentation into the corresponding code repos. Any objections?
comment:7 Changed 6 years ago by
Replying to matej.durco@…:
However this ticket is about diffing the normalisation tables.
In my understanding URL-checking is a separate feature, for which the nearest ticket I could find is #1052.
Aaargh, wrong tool, wrong ticket, my bad. Thanks for following up despite that! I will add a note to the comment above saying that it was wrongly posted here.
The initial version is on GitHub at https://github.com/acdh-oeaw/valuemappings-difftool
It is executed with:
java -jar <version with dependencies> <csv filename>
An internet connection is necessary for execution, since the program downloads facet values from Solr on alpha-vlo.
README follows soon.
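For readers unfamiliar with how facet values can be fetched from Solr, here is a minimal sketch of building such a request URL. It is not taken from the tool: the class and method names, the base URL `https://example.org/solr/collection` and the field name `languageCode` are placeholders. Only the query parameters (`rows`, `facet`, `facet.field`, `facet.limit`, `facet.mincount`, `wt`) are standard Solr faceting parameters.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: base URL and field name are placeholders, not the real endpoint.
class FacetQuerySketch {

    /** Builds a select URL that returns no documents, only all values of one facet field. */
    static String facetQueryUrl(String solrBaseUrl, String facetField) {
        String query = URLEncoder.encode("*:*", StandardCharsets.UTF_8);
        String field = URLEncoder.encode(facetField, StandardCharsets.UTF_8);
        return solrBaseUrl + "/select?q=" + query
                + "&rows=0"              // no documents needed, only facet counts
                + "&facet=true"
                + "&facet.field=" + field
                + "&facet.limit=-1"      // no upper limit on the number of facet values
                + "&facet.mincount=1"    // skip values that occur in no record
                + "&wt=json";
    }

    public static void main(String[] args) {
        System.out.println(facetQueryUrl("https://example.org/solr/collection", "languageCode"));
    }
}
```

The actual endpoint would come from the `solr_url` entry in the configuration file rather than being hard-coded.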