
Version 30 (modified by matej.durco@oeaw.ac.at, 8 years ago)

Adding SMC snapshot on license/licence type

License/Availability

Currently, VLO features two facets: License and Availability.

There seems to be a lot of confusion about what the two facets should contain. Cleaning up and consolidating this information in VLO is scheduled for the upcoming VLO-3.4 release. See also issues #812, #813, #814.

Facet/Concept Definitions

The two facets and their descriptions/definitions, as stated in the facetConcepts.xml mapping file:

| facet | description | definition |
|---|---|---|
| availability | The usage conditions for the resource or tool | A rough description of the conditions under which the resource or tool can be used |
| license | The licensing conditions for the resource or tool | The name of the license or a very brief description of the licensing conditions under which the resource or tool can be used |

| prefLabel (CCR) | Isocat | used for facet: Availability | used for facet: Licence | comment in facetConcepts.xml | CCR definition |
|---|---|---|---|---|---|
| license | DC-2457 | yes | yes | A description of the licensing conditions | A description of the licensing conditions under which the resource can be used. (source: CLARIN) |
| availability | DC-2453 | yes | yes | A description of the terms of availability of the resource in simple words | A description of the terms of availability of the resource in simple words. (source: CLARIN) |
| licence type | DC-3800 | yes | yes | licenceType: Indication of the type of a copyright licence | Indication of the type of a copyright licence. (source: NaLiDa) |
| rights | DC-6846 | (yes) - only isocat link | (yes) - only isocat link | DASISH from DC Rights | Any rights information for this resource. (source: DASISH) |
| license type | DC-5439 | yes | no (commented out) | The DC is used in the distributionType facet | A rights-based classification of language resources and tools, indicating the scope of the target audience (source: CLARIN Deliverable: D7S-2.1, "A report including Model Licensing Templates and Authorization and Authentication Scheme") |
| licence URL | DC-6586 | no | yes? (according to email from Menzo, 2015-11-18, but not visible in trunk yet) | | URL of a licence (see DC-2457), representing a web location at which the licence text is available. |
| dcterms:license | | no (commented out) | no (commented out) | | A legal document giving official permission to do something with the resource. (source: dcmi) |

Snapshot from SMC-browser showing which profiles use which of the two concepts, license type vs. licence type (image "smc_licenseType_licenceType_profile_dc_groups_2016-01-21.png", not attached to this page).

Normalization of values

There is already a normalisation map used in production (committed 2015-04-23), but new values have appeared that are not mapped yet. A gsheet collects the already existing mappings (see normalisation map above) together with the newly encountered values that are not yet normalized.
Values come from elements annotated with concepts linked to one of the two facets License/Availability.

See also the license categories as proposed by the Legal Issues Committee.

We propose to map the values to the license categories in a decomposed fashion. For example, the license CC-BY-SA would become ["PUB", "BY", "SA"].

Allowing multiple values for the facet in each record, in combination with the (already implemented) multi-select feature in VLO, should cover all use cases and be more ergonomic. For example, a user interested only in the Non-Commercial clause needs to select a single facet value and does not have to search for all the combinations that contain NC.
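
The decomposed mapping can be illustrated with a minimal Python sketch. It is not the VLO implementation: the function names `decompose` and `select`, the sample records, and the rule that every Creative Commons style name starts with "PUB" are assumptions generalised from the single CC-BY-SA example above.

```python
def decompose(license_name):
    """Split a Creative Commons style name such as 'CC-BY-SA' into facet values.

    Assumption: every 'CC-...' license is publicly available, so the list
    starts with 'PUB' (generalised from the CC-BY-SA example in the text).
    """
    parts = license_name.strip().upper().split("-")
    if parts[0] != "CC" or len(parts) < 2:
        raise ValueError(f"not a CC-style license name: {license_name!r}")
    return ["PUB"] + parts[1:]


def select(records, facet_value):
    """Return the ids of all records whose decomposed license contains facet_value."""
    return [rid for rid, lic in records.items() if facet_value in decompose(lic)]


# Hypothetical records, for illustration only.
records = {"r1": "CC-BY-SA", "r2": "CC-BY-NC", "r3": "CC-BY-NC-SA"}

print(decompose("CC-BY-SA"))
print(select(records, "NC"))
```

With decomposed values, a single selection of "NC" finds both r2 and r3, without the user having to enumerate every license name that contains the clause.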

Missing values

Currently, all the mapped concepts together cover only around 60,000 records! See: records with a missing value for availability.

Here we have 3 possible situations:

  1. Profile does not have any information about licensing/availability (worst case)
  2. Profile has information about L/A, but is not linked to a concept, or the concept is not in the facet mapping
  3. Profile is well defined, with linking to one of the concepts in the facet mapping, but the information is simply not filled in the record.
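
The three situations can be told apart mechanically, as in the following rough sketch. The data model is hypothetical: a profile is represented as a dict from licensing/availability element names to their linked concept id (or None), and `FACET_CONCEPTS` holds only a few ids taken from the table above for illustration. How the licensing/availability elements of a profile are identified in the first place is not covered here.

```python
# Illustrative subset of concepts from the facet mapping (ids taken from the table above).
FACET_CONCEPTS = {"DC-2457", "DC-2453", "DC-3800"}


def classify(profile_elements, record_values):
    """Return 1, 2 or 3 for the situations listed above, or 0 if a value is present.

    profile_elements: dict, element name -> linked concept id or None
                      (hypothetical: only elements carrying L/A information)
    record_values:    dict, element name -> value found in the record
    """
    if not profile_elements:
        return 1  # profile has no licensing/availability information at all
    mapped = [name for name, concept in profile_elements.items()
              if concept in FACET_CONCEPTS]
    if not mapped:
        return 2  # information exists, but no concept link covered by the facet mapping
    if not any(record_values.get(name) for name in mapped):
        return 3  # profile is fine, the record simply leaves the value empty
    return 0      # a facet value is available


print(classify({}, {}))
print(classify({"Licence": None}, {"Licence": "CC-BY"}))
print(classify({"Licence": "DC-2457"}, {}))
```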

Overview of profile/facet coverage, with special consideration of the availability and license facets. The individual concepts contributing to the facets are also plotted (see the c-* columns).
