Opened 6 years ago

Last modified 6 years ago

#1073 new task

normalize modality facet

Reported by: matej.durco@oeaw.ac.at Owned by: matej.durco@oeaw.ac.at
Priority: major Milestone:
Component: MetadataCuration Version:
Keywords: Cc: haaf@bbaw.de, go.sugimoto@oeaw.ac.at, hanna.hedeland@uni-hamburg.de, matej.durco@oeaw.ac.at, menzo.windhouwer@di.huc.knaw.nl, Caspar Jordan, Twan Goosen

Description

We agreed that the next facet to try cleaning up would be modality.

It has only around 160 different values, so that sounds feasible.

On the downside, it has very poor coverage: only some 255,857 records have a modality facet, while 1,384,746 don't. Here again, cross-facet mapping might help.
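To put that coverage in proportion, the two counts above can simply be combined. This is only arithmetic on the figures quoted in this ticket, not a fresh query of the VLO index.

```python
# Record counts quoted in the ticket description.
with_modality = 255_857
without_modality = 1_384_746

total = with_modality + without_modality
coverage = with_modality / total

# Prints: 1,640,603 records, 15.6% with a modality value
print(f"{total:,} records, {coverage:.1%} with a modality value")
```

So roughly one record in six carries any modality information at all.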

Another question is that of the vocabulary we want to normalize against.
This will probably be even more difficult to agree on than in the case of resourceType.
There is some previous work on this in the context of the CLARIN-D VLO taskforce [Susanne, can you link to it from here?]

Also, around 50 of the values are just concatenations with "," or ";", so as a first step these can simply be parsed correctly as multiple values.
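A minimal sketch of that first step, assuming "," and ";" are the only separators and that no legitimate single value contains either character (any such value would be split wrongly and would need to be checked by hand). The function name `split_multi_value` is hypothetical, not part of the VLO code base.

```python
import re

# Separators observed in concatenated modality values: "," and ";".
_SEPARATORS = re.compile(r"[,;]")


def split_multi_value(raw: str) -> list[str]:
    """Split a concatenated facet value into its individual values.

    Whitespace around each fragment is stripped, and empty fragments
    (e.g. from ";;" or a trailing separator) are dropped.
    """
    return [part.strip() for part in _SEPARATORS.split(raw) if part.strip()]


# Example with a value that appears later in this ticket:
# split_multi_value("activity;speech") gives ["activity", "speech"]
```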

An empty normalisation map has been created:
https://github.com/acdh-oeaw/VLO-mapping/blob/master/value-maps/modality.csv
It uses the usual format: all the values encountered in the VLO (strictly speaking in the alpha-VLO Solr index, but that should be much the same) are listed, together with their frequency and their distribution across collections.
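Once the curation team has filled in proposals, a map like this could be applied roughly as follows. This is a minimal sketch under a hypothetical layout: it assumes the first column holds the encountered value and the second the proposed normalized value (as in the "canción","song" pairs discussed in the comments), and it ignores any further columns such as frequency or collection distribution. The actual column layout of modality.csv may differ, and a header row, if present, would have to be skipped first. The helper names are illustrative only.

```python
import csv
import io


def load_value_map(csv_text: str) -> dict[str, str]:
    """Build an encountered-value -> normalized-value mapping from CSV text.

    Hypothetical layout: column 1 = encountered value, column 2 = proposed
    normalized value; extra columns are ignored. Rows without a proposal yet
    are skipped, so a partially filled map is still usable.
    """
    mapping: dict[str, str] = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) >= 2 and row[1].strip():
            mapping[row[0]] = row[1].strip()
    return mapping


def normalize(value: str, mapping: dict[str, str]) -> str:
    """Return the normalized value, or the original value if it is unmapped."""
    return mapping.get(value, value)


if __name__ == "__main__":
    example = '"canción","song"\n"Speech",""\n'
    value_map = load_value_map(example)
    print(normalize("canción", value_map))  # song
    print(normalize("Speech", value_map))   # Speech (no proposal yet)
```

Falling back to the original value keeps unmapped entries visible instead of silently dropping them, which matters while the map is still incomplete.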

The next step is again to distribute the work among the curation team, who will propose normalized values for the encountered ones.

Change History (3)

comment:1 Changed 6 years ago by matej.durco@oeaw.ac.at

First attempt, made in an intensive hands-on session:
https://github.com/acdh-oeaw/VLO-mapping/blob/modality-handson/value-maps/modality.csv

The proposed normalized values still need a discussion.
I will collect them into a vocabulary and share it with potential stakeholders.

comment:2 Changed 6 years ago by matej.durco@oeaw.ac.at

Proposal for a vocabulary by the CLARIN-D VLO taskforce:
https://trac.clarin.eu/wiki/VLO-Taskforce/RecommendationsForFacets#Modality

Last edited 6 years ago by matej.durco@oeaw.ac.at

comment:3 Changed 6 years ago by Twan Goosen

We have to agree on the degree of interpretation we apply and, relatedly, on whether we want to create a vocabulary and try to map everything to it (as we did with resource type).

My suggestion would be, at least for now, to focus on removing noise first. Creating a vocabulary would not only require quite a deep understanding of modality; there may also be multiple models and understandings of the concept, which can be hard to merge.

I'm also not sure how much such a rigorous interpretative mapping would improve the discovery process. First, we should at least try to figure out whether we actually have a recall problem.

Examples:

Fine (IMO):

  • "canción","song"

Don't do (yet), IMO:

  • "canción","music"
  • "cooking technique","activity;speech"