Opened 9 years ago

Closed 8 years ago

#813 closed enhancement (fixed)

Improved mapping to 'license' field

Reported by: Twan Goosen
Owned by: davor.ostojic@oeaw.ac.at
Priority: major
Milestone: VLO-3.4
Component: VLO importer
Version:
Keywords:
Cc: Twan Goosen

Description (last modified by Twan Goosen)

(Partly based on this comment on #393 by Matej)

As proposed in a recent video conference, the license field should store a unique and unambiguous identifier of a specific licence (e.g. GPLv3 or CC BY-NC-SA 2.0) if and only if such information is available. Other licence- and access-related information can be stored in the accessInfo field (to be added).

Some notes on how the license field is mapped in the current version of the VLO:

  • There is considerable noise in the values of the license field of the VLO.
    • For example, in one record the values 'CLARIN_ACA-NC', 'academic-nonCommercialUse', 'attribution' and 'noRedistribution' are all mapped to this one field. The first value is the actual licence, the second is the licence category and the last two are actually 'restrictions of use'. This is a mapping issue.
    • A different kind of case is this record, where 'web service' is given as a licence, which it is not. This is a metadata quality issue (out of scope here).
  • As noted by Matej, the OLAC fields accessRights and rights could (sometimes?) be mapped to license values. This needs to be investigated. He also notes:
    • "On the other hand, there is distributionType, rightsHolder in the facet-mapping, which I don't see show up anywhere. What is the status/idea of it (relative to availability/license)?"
  • Licence information in a metadata record may relate to an individual resource, but the mapping does not take this into account. The VLO should distinguish between information that applies to the entire record and information that relates to one or more referenced resources. The current single-field approach may not suffice to represent this in the Solr index. The best option may be to have both: a global licence field and a field that allows for a licence per linked resource.
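A minimal Python sketch of the separation described above, assuming hypothetical lookup tables (LICENCES, CATEGORIES, RESTRICTIONS) that would in practice come from a curated vocabulary rather than hard-coded sets. It sorts raw values into licence, category and restriction buckets, and builds an index document with a record-level `license` field plus a per-resource field. The field name `resourceLicense` and the `ref|licence` encoding are assumptions for illustration only, not existing VLO or Solr schema fields.

```python
from typing import Dict, Iterable, List

# Hypothetical vocabularies; a real importer would load these from a
# curated mapping instead of hard-coding them.
LICENCES = {"CLARIN_ACA-NC"}
CATEGORIES = {"academic-nonCommercialUse"}
RESTRICTIONS = {"attribution", "noRedistribution"}


def split_licence_values(values: Iterable[str]) -> Dict[str, List[str]]:
    """Sort raw values into licence, category, restriction and unknown buckets."""
    buckets: Dict[str, List[str]] = {
        "license": [], "category": [], "restriction": [], "unknown": []}
    for raw in values:
        value = raw.strip()
        if value in LICENCES:
            buckets["license"].append(value)
        elif value in CATEGORIES:
            buckets["category"].append(value)
        elif value in RESTRICTIONS:
            buckets["restriction"].append(value)
        else:
            buckets["unknown"].append(value)  # left for manual curation
    return buckets


def build_index_doc(record_licences: Iterable[str],
                    resource_licences: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Build a document with a record-level licence field and a per-resource one.

    'resourceLicense' and the 'ref|licence' encoding are hypothetical.
    """
    return {
        # Licences that apply to the whole record, de-duplicated and sorted.
        "license": sorted(set(record_licences)),
        # One entry per (resource reference, licence) pair.
        "resourceLicense": [
            f"{ref}|{lic}"
            for ref, lics in sorted(resource_licences.items())
            for lic in lics
        ],
    }
```

Applied to the four values of the example record, only 'CLARIN_ACA-NC' would end up in the license bucket; the category and restrictions could then go to the planned accessInfo field instead.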

Change History (7)

comment:1 Changed 9 years ago by DefaultCC Plugin

Cc: Twan Goosen added

comment:2 Changed 9 years ago by Twan Goosen

Owner: changed from teckart@informatik.uni-leipzig.de to davor.ostojic@oeaw.ac.at
Status: new → assigned

comment:3 Changed 9 years ago by Twan Goosen

I accidentally posted this comment to #812 instead of here:

Note: at some point we probably also want to start mapping licenses by URI, e.g. http://creativecommons.org/licenses/by/3.0/ as used in https://repo.clarino.uib.no/oai/cite?metadataPrefix=cmdi&handle=11509/98. What other license URIs are we aware of?

Or is this an idiosyncratic approach?


Reply from Thomas:

There are quite a lot of these values for the (potentially relevant) concept "http://purl.org/dc/terms/license" (not used right now for the license or availability facet, probably because of the missing mapping). These values are almost exclusively links to licence texts (like "http://sldr.org/licence_v1/en", "CSLU Agreement: https://catalog.ldc.upenn.edu/l [...]"), often in combination with the license name.
For the other concepts there are hardly any such values, apart from some CC licenses.
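To illustrate what mapping by URI could look like, a rough Python sketch that extracts URIs from a free-text value (such as the dcterms:license values above, which often combine a licence name with a link) and looks them up in a small table. The table URI_TO_LICENCE and the normalisation rules (http/https and a missing trailing slash treated as equivalent) are assumptions; only two Creative Commons URIs are listed.

```python
import re
from typing import Optional

# Hypothetical lookup table from licence URIs to identifiers; a real table
# would need to be curated.
URI_TO_LICENCE = {
    "http://creativecommons.org/licenses/by/3.0/": "CC BY 3.0",
    "http://creativecommons.org/licenses/by-nc-sa/2.0/": "CC BY-NC-SA 2.0",
}

# Any http(s) URI up to the next whitespace character.
URI_PATTERN = re.compile(r"https?://\S+")


def normalise_uri(uri: str) -> str:
    """Normalise a URI so that trivial variants hit the same entry (assumed rules)."""
    uri = uri.rstrip(".,;)")                    # drop trailing punctuation from free text
    uri = re.sub(r"^https://", "http://", uri)  # treat https and http as equivalent
    if not uri.endswith("/"):
        uri += "/"                              # treat a missing trailing slash as equivalent
    return uri


def licence_from_value(value: str) -> Optional[str]:
    """Return the identifier for the first known licence URI in value, else None."""
    for match in URI_PATTERN.finditer(value):
        licence = URI_TO_LICENCE.get(normalise_uri(match.group(0)))
        if licence is not None:
            return licence
    return None
```

Values whose URIs are not in the table (such as the sldr.org link) return None and could be reported for curation rather than mapped.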

comment:4 Changed 9 years ago by Twan Goosen

The URI of an additional concept (licenceURL) for the license facet was added in 01fdf030.

Last edited 9 years ago by Twan Goosen

comment:5 Changed 8 years ago by Twan Goosen

Note: relevant findings and decisions from the Curation TF can be found on the page Taskforces/Curation/ValueNormalization/License

comment:6 Changed 8 years ago by Twan Goosen

Description: modified

comment:7 Changed 8 years ago by Twan Goosen

Resolution: fixed
Status: assigned → closed