Opened 7 years ago
Last modified 6 years ago
#1042 new task
Fill up the value mapping for resourceClass
| Reported by: | matej.durco@oeaw.ac.at | Owned by: | matej.durco@oeaw.ac.at |
|---|---|---|---|
| Priority: | major | Milestone: | |
| Component: | MetadataCuration | Version: | |
| Keywords: | | Cc: | haaf@bbaw.de, hanna.hedeland@uni-hamburg.de, go.sugimoto@oeaw.ac.at, matej.durco@oeaw.ac.at, Twan Goosen, Menzo Windhouwer |
Description
As started and agreed at the Vienna meeting, we want to finalize the mapping file for the resourceClass facet.
After we successfully removed the clutter from OTA & CoCoON, we are left with 132 values in the list.
I created a new list based on the 132 values still encountered in the VLO and merged it with the manually curated list (resourceclass.csv) created during the Vienna meeting. I also added contextual information for each value, in TF- prefixed columns that are ignored by further processing: how often the value is used, how many collections contribute it, and the top three contributing collections. Be aware that this information is a snapshot drawn from the VLO at some point and will become outdated.
This new file is found under:
https://github.com/acdh-oeaw/VLO-mapping/blob/master/value-maps/resourceClass_tf-extended.csv
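The contextual figures described above (usage count, number of contributing collections, top three collections) could be derived roughly as follows. This is only a sketch: the input shape (one `(value, collection)` pair per record) and the column names `TF-count`, `TF-collections` and `TF-top3` are assumptions made for illustration, not the names used in the actual file.

```python
from collections import Counter, defaultdict


def tf_context(records):
    """Summarize usage per facet value.

    records: iterable of (value, collection) pairs, one per metadata record
    (hypothetical input shape).
    """
    per_value = defaultdict(Counter)
    for value, collection in records:
        per_value[value][collection] += 1

    rows = {}
    for value, by_collection in per_value.items():
        rows[value] = {
            # how often the value is used
            "TF-count": sum(by_collection.values()),
            # how many collections contribute this value
            "TF-collections": len(by_collection),
            # the top three contributing collections, most frequent first
            "TF-top3": [name for name, _ in by_collection.most_common(3)],
        }
    return rows
```

Since the numbers are a snapshot, such a summary would need to be regenerated whenever the VLO content changes.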
As we agreed, we will split the file so that each curator works on a different slice of the list. You would still apply the whole fork-branch-pull-merge loop, as Twan showed us. But I have also invited you to the repo, so you can spare yourself the forking: just clone the acdh-oeaw/VLO-mapping repo, create a branch there, and work in that one.
For various reasons I propose that, for now, just the three of us work on this:
Hanna: 2-50 (many values are already mapped)
Matej: 51-90 (most sparse)
Susanne: 91-133 (also quite a few values filled already)
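To pull out one curator's slice for review, something like the following could be used. It is a sketch under the assumption that the numbers above are 1-based file line numbers with the header on line 1 (132 values plus header gives 133 lines) and that no field contains an embedded line break; the function name is made up for this example.

```python
import csv
import io

# Slice boundaries as proposed in this ticket (inclusive line numbers).
SLICES = {"Hanna": (2, 50), "Matej": (51, 90), "Susanne": (91, 133)}


def rows_for(curator, csv_text):
    """Return (header, rows) for the slice assigned to the given curator."""
    first, last = SLICES[curator]
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Start counting at 2 because the header occupies line 1.
    body = [
        row
        for line_no, row in enumerate(reader, start=2)
        if first <= line_no <= last
    ]
    return header, body
```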
It would be great if you could fill this out by Monday, 19.2., so that we can apply it before the telco meeting on 21.2.
Change History (5)
comment:1 Changed 7 years ago by
comment:3 Changed 7 years ago by
As we discussed in Vienna and as Hanna reminded us, I now added two special columns to the mapping file:
TF-applicability
- for noting if the mapping is applicable in general or just in the context of currently encountered records. Correspondingly proposing two possible values for this field: general and specific
TF-note
- free text field for any general remarks the curators would like to leave for future generations
As specified by the application that converts this file into the value-map.xml consumed by the vlo-importer (https://github.com/clarin-eric/VLO-mapping-creator), the TF- prefixed fields will be disregarded during processing.
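For anyone who wants to preview the file as the converter would see it, the effect of ignoring the TF- prefixed columns can be approximated as below. This is not the VLO-mapping-creator code, just a sketch of the same idea; the function name and the non-TF column names in the usage example are hypothetical.

```python
import csv
import io


def strip_tf_columns(csv_text, prefix="TF-"):
    """Return the CSV text with every column whose name starts with prefix removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [name for name in reader.fieldnames if not name.startswith(prefix)]
    out = io.StringIO()
    # extrasaction="ignore" silently drops the TF- keys present in each row.
    writer = csv.DictWriter(
        out, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(reader)
    return out.getvalue()
```

For example, a file with the columns `value,normalized,TF-note,TF-applicability` would come out with only `value,normalized`.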
The updated file is available in the master branch of acdh-oeaw/VLO-mapping:
https://github.com/acdh-oeaw/VLO-mapping/blob/master/value-maps/resourceClass_tf-extended.csv
Please fork & branch to work on your slice and create a pull request once ready.
comment:4 Changed 6 years ago by
comment:5 Changed 6 years ago by
After this round of changes has been applied, we are at:
count-uncovered-records="45508"
count-facet-values="56"
So it's getting better.
To introduce continuous curation of the mapping files, we will rely on a utility tool that checks for new values. See #1056.
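The core of such a check is a set difference between the values encountered in the records and the values the mapping already covers. The sketch below only illustrates that idea; it is not the tool tracked in #1056, and the function name is made up.

```python
def uncovered_values(encountered, mapped):
    """Return facet values seen in the records that the mapping does not cover."""
    return sorted(set(encountered) - set(mapped))
```

Running this regularly against a fresh export of the facet values would surface new entries that still need a mapping.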
I'm fine with that, but did we decide on how to map? That is, do we map to clean up right now even if individual mappings don't make sense, do we create a more reliable mapping, or should we add the TF-applicability column and specify this for each item?