Opened 6 years ago

Last modified 6 years ago

#1042 new task

Fill up the value mapping for resourceClass

Reported by: matej.durco@oeaw.ac.at Owned by: matej.durco@oeaw.ac.at
Priority: major Milestone:
Component: MetadataCuration Version:
Keywords: Cc: haaf@bbaw.de, hanna.hedeland@uni-hamburg.de, go.sugimoto@oeaw.ac.at, matej.durco@oeaw.ac.at, Twan Goosen, Menzo Windhouwer

Description

As started and agreed at the Vienna meeting, we want to finalize the mapping file for the resourceClass facet.
After we successfully removed the clutter from OTA & CoCoON, we are left with 132 values in the list.
I created a new list based on the 132 values still encountered in the VLO, merged it with the manually curated list (resourceclass.csv) created during the Vienna meeting, and added contextual information for each value in TF- prefixed columns, which are ignored by further processing: how often the value is used, how many collections contribute it, and the top three contributing collections. Be aware that this information is a snapshot drawn from the VLO at some point and will become outdated.

This new file is found under:
https://github.com/acdh-oeaw/VLO-mapping/blob/master/value-maps/resourceClass_tf-extended.csv

As agreed, we will split the file so that each curator works on a different slice of the list. You would still apply the whole fork-branch-pull-merge loop, as Twan showed us. However, I have also invited you to the repo, so you can skip the forking: just clone the acdh-oeaw/VLO-mapping repo, create a branch there and work in that one.

For various reasons I propose that, for now, just the three of us work on this:
Hanna: 2-50 (many values are already mapped)
Matej: 51-90 (most sparse)
Susanne: 91-133 (also quite a few values filled already)
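
As a quick sanity check on the split above, here is a minimal Python sketch. It assumes the ranges are line numbers in the CSV file and that line 1 is the header row; neither is stated explicitly in this ticket. The names SLICES, curator_for_line and slice_sizes are made up for illustration.

```python
# Proposed slices as (first_line, last_line), inclusive.
# Assumption: these are file line numbers and line 1 is the CSV header.
SLICES = {
    "Hanna": (2, 50),
    "Matej": (51, 90),
    "Susanne": (91, 133),
}


def curator_for_line(line_no):
    """Return the curator responsible for a given file line, or None."""
    for name, (first, last) in SLICES.items():
        if first <= line_no <= last:
            return name
    return None


def slice_sizes():
    """Return the number of rows assigned to each curator."""
    return {name: last - first + 1 for name, (first, last) in SLICES.items()}


if __name__ == "__main__":
    sizes = slice_sizes()
    print(sizes)
    print("total rows:", sum(sizes.values()))
```

Under these assumptions the three slices add up to the 132 values mentioned in the description, with no gaps or overlaps.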

It would be great if you could fill this out by Monday, 19.2., so that we can apply it before the telco meeting on 21.2.

Change History (5)

comment:1 Changed 6 years ago by hanna.hedeland@uni-hamburg.de

I'm fine with that, but did we decide on how to map? That is, do we map to clean up right now even if individual mappings don't make sense, create a more reliable mapping, or add the TF-applicability column and specify this for each item?

comment:2 Changed 6 years ago by haaf@bbaw.de

Thanks for the preps, Matej. I'll work on this as you proposed.

comment:3 Changed 6 years ago by matej.durco@oeaw.ac.at

As we discussed in Vienna and as Hanna reminded us, I have now added two special columns to the mapping file:

  • TF-applicability - for noting whether the mapping is applicable in general or only in the context of the currently encountered records. I propose two possible values for this field: general and specific
  • TF-note - free text field for any general remarks the curators would like to leave for future generations

As specified by the application that converts this file into the value-map.xml consumed by the vlo-importer (https://github.com/clarin-eric/VLO-mapping-creator), the TF- prefixed fields are disregarded during processing.
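
To illustrate what disregarding the TF- prefixed fields amounts to, here is a minimal Python sketch. It is not the VLO-mapping-creator code. The column names sourceValue and targetValue, the sample rows and the comma delimiter are assumptions made for the example; only the TF-applicability and TF-note columns come from this ticket.

```python
import csv
import io


def drop_tf_columns(csv_text, delimiter=","):
    """Parse CSV text and return rows without columns whose header starts with 'TF-'."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    # fieldnames is None for empty input, so fall back to an empty list.
    kept = [name for name in (reader.fieldnames or []) if not name.startswith("TF-")]
    return [{name: row[name] for name in kept} for row in reader]


# Hypothetical sample; the real file may use other headers and delimiters.
SAMPLE = (
    "sourceValue,targetValue,TF-applicability,TF-note\n"
    "Text,text,general,\n"
    "newspaper page,text,specific,seen in one collection only\n"
)

if __name__ == "__main__":
    for row in drop_tf_columns(SAMPLE):
        print(row)
```

Filtering on the header prefix means curators can add further TF- columns later without touching the processing step.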

The updated file is available in the master branch of acdh-oeaw/VLO-mapping:
https://github.com/acdh-oeaw/VLO-mapping/blob/master/value-maps/resourceClass_tf-extended.csv

Please fork & branch to work on your slice and create a pull request once ready.

comment:4 Changed 6 years ago by Twan Goosen

Pull request #23 includes some fixes to the map: e4bf55 (syntax: comma to semicolon) and ccbe43 (value mappings for Europeana Newspapers).

comment:5 Changed 6 years ago by matej.durco@oeaw.ac.at

After this round of changes has been applied, we are at:

count-uncovered-records="45508"
count-facet-values="56"

So it's getting better.
To establish continuous curation of the mapping files, we will rely on a utility tool that checks for new values. See #1056.
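
A check for new values could look roughly like the following Python sketch. This is not the tool tracked in #1056, and the ticket does not state how the counts above were computed. The function name find_uncovered, the input shapes and the sample data are all assumptions for illustration.

```python
def find_uncovered(value_counts, mapped_values):
    """Report facet values that have no entry in the mapping.

    value_counts: dict of facet value -> number of records using it.
    mapped_values: set of source values already present in the mapping file.
    """
    missing = {v: n for v, n in value_counts.items() if v not in mapped_values}
    return {
        "uncovered_values": sorted(missing),
        "uncovered_value_count": len(missing),
        "uncovered_record_count": sum(missing.values()),
    }


if __name__ == "__main__":
    # Made-up numbers, not taken from the VLO.
    counts = {"Text": 1200, "audio recording": 35, "Korpus": 7}
    mapped = {"Text", "audio recording"}
    print(find_uncovered(counts, mapped))
```

Run regularly against a fresh export of facet values, a report like this would show which new values still need a mapping and how many records they affect.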
