License/Availability

There seems to be considerable confusion about what the two facets should contain. Cleaning up and consolidating this information in the VLO is scheduled for the upcoming VLO-3.4 release. See also issues #812, #813 and #814 and the notes from the meeting at CAC2015.

Facet/Concept Definitions

The three facets and their descriptions/definitions, as stated in the facetConcepts.xml mapping file:

| facet | description | definition |
|---|---|---|
| availability | The usage conditions for the resource or tool | A rough description of the conditions under which the resource or tool can be used |
| license | The licensing conditions for the resource or tool | The name of the license or a very brief description of the licensing conditions under which the resource or tool can be used |
| accessInfo | TODO... | TODO... |

| prefLabel (CCR) | Isocat | used for facet: Availability | used for facet: Licence | used for facet: accessInfo | comment in facetConcepts.xml | CCR definition |
|---|---|---|---|---|---|---|
| availability | DC-2453 | yes | yes | yes | A description of the terms of availability of the resource in simple words | A description of the terms of availability of the resource in simple words. (source: CLARIN) |
| license | DC-2457 | yes | yes | no | A description of the licensing conditions | A description of the licensing conditions under which the resource can be used. (source: CLARIN) |
| licence type | DC-3800 | yes | yes | yes | licenceType: Indication of the type of a copyright licence | Indication of the type of a copyright licence. (source: NaLiDa) |
| license type | DC-5439 | yes | no | no (commented out) | The DC is used in the distributionType facet | A rights-based classification of language resources and tools, indicating the scope of the target audience (source: CLARIN Deliverable: D7S-2.1, "A report including Model Licensing Templates and Authorization and Authentication Scheme") |
| licence URL | DC-6586 | yes | yes | no | URL of a licence (see DC-2457), representing a web location at which the licence text is available. | URL of a licence (see DC-2457), representing a web location at which the licence text is available. |
| rights | DC-6846 | (yes) - only isocat link | no | yes | DASISH from DC Rights | Any rights information for this resource. (source: DASISH) |
| dcterms:license | | no (commented out) | no | no (commented out) | | A legal document giving official permission to do something with the resource. (source: dcmi) |

Snapshot from SMC-browser showing which profiles use which of the two concepts: license type vs. licence type:

SMC-snapshot showing profiles using licenseType vs. licenceType [PNG]

Normalization of values

A normalisation map is already used in production (committed 19/02/2016), but some new values are not mapped yet. The normalisation map is also available as a gsheet containing the existing mappings (see normalisation map above) plus the newly encountered values that are not yet normalized.
The values come from elements annotated with concepts linked to one of the two facets, License or Availability.

See also the license categories as proposed by the Legal Issues Committee.

We propose to map the values to the license categories in a decomposed fashion, i.e. the license CC-BY-SA would become ["PUB", "BY", "SA"].
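A minimal sketch of such a decomposition, for illustration only. The function name `decompose_license` and the clause set are hypothetical. Only the result for CC-BY-SA is taken from the proposal above. Treating every Creative Commons label as "PUB" and silently dropping version suffixes are assumptions.

```python
# Hypothetical helper: split a Creative Commons style label into category tags.
# Only the CC-BY-SA -> ["PUB", "BY", "SA"] example comes from the proposal;
# the clause set and the handling of other labels are assumptions.
CC_CLAUSES = {"BY", "SA", "NC", "ND"}


def decompose_license(value: str) -> list:
    """Return the decomposed tags for a CC label, or [] if it is not recognised."""
    tokens = value.strip().upper().replace("_", "-").split("-")
    if tokens[0] != "CC":
        return []  # not a CC label; would need its own mapping
    tags = ["PUB"]  # assumption: CC licences fall into the public category
    for token in tokens[1:]:
        # Keep known clauses only; version numbers such as "4.0" are skipped.
        if token in CC_CLAUSES and token not in tags:
            tags.append(token)
    return tags


print(decompose_license("CC-BY-SA"))
```

Labels that are not recognised return an empty list here, so they can be routed to manual mapping instead of being guessed.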

Allowing multiple values for the facet in each record, in combination with the (already implemented) multi-select feature in VLO, should cover all use cases and be more ergonomic: for example, a user interested only in the Non-Commercial clause needs to select just one facet value and does not have to search for all the combinations that contain NC.
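The effect of multi-valued facets on selection can be sketched as follows. The record structure, the ids and the function `select` are hypothetical. Requiring every selected value to be present (AND) is an assumption, since the text does not say how VLO combines several selected values. For a single selected value, as in the NC example, the choice makes no difference.

```python
# Hypothetical records carrying decomposed licence values as a multi-valued facet.
records = [
    {"id": "r1", "license": ["PUB", "BY", "SA"]},
    {"id": "r2", "license": ["PUB", "BY", "NC"]},
    {"id": "r3", "license": ["PUB", "BY", "NC", "ND"]},
    {"id": "r4", "license": ["ACA"]},
]


def select(records, wanted):
    """Return the ids of records whose facet values include every wanted value."""
    wanted = set(wanted)
    return [r["id"] for r in records if wanted <= set(r["license"])]


# One selected value finds every combination that contains NC.
print(select(records, ["NC"]))
```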

Missing values

Currently, all the mapped concepts together cover only around 60,000 records! See the records with a missing value for availability.

Here we have 3 possible situations:

  1. Profile does not have any information about licensing/availability (worst case)
  2. Profile has information about L/A, but is not linked to a concept, or the concept is not in the facet mapping
  3. Profile is well defined, with linking to one of the concepts in the facet mapping, but the information is simply not filled in the record.
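The three situations can be told apart with a check along the following lines. This is a sketch under stated assumptions: the function `classify` and both input dictionaries are hypothetical, and `FACET_CONCEPTS` lists only a few of the concept ids from the table above for illustration.

```python
# Illustrative subset of concepts in the facet mapping (ids taken from the table above).
FACET_CONCEPTS = {"DC-2453", "DC-2457", "DC-3800", "DC-6586"}


def classify(profile_elements, record_values):
    """Return 1, 2 or 3 for the situations listed above, or 0 if a value is present.

    profile_elements: element name -> linked concept id (or None) for the
                      profile's licensing/availability elements (hypothetical).
    record_values:    element name -> value found in the record (hypothetical).
    """
    if not profile_elements:
        return 1  # profile has no licensing/availability information at all
    mapped = [e for e, c in profile_elements.items() if c in FACET_CONCEPTS]
    if not mapped:
        return 2  # information exists but is not linked to a mapped concept
    if not any((record_values.get(e) or "").strip() for e in mapped):
        return 3  # profile is fine, the record simply has no value
    return 0


print(classify({"Licence": "DC-2457"}, {}))
```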

Overview of profile/facet coverage, with special consideration of the availability and licensing facets. The individual concepts contributing to each facet are also plotted (see the c-* columns).

What is done for VLO 3.4 release

  • facetConcepts.xml has been restructured and has a new facet called accessInfo.
  • A new concept, licence URL, has been added to the availability and license facets.
  • The normalisation map has been updated with new values; CLIC did the mapping for the new values.
  • Another normalisation map, LicenseURIMap.xml, has been introduced; it contains mappings from license values to license URIs.
  • In the VLO GUI, the Availability facet is separated from the others. It allows multi-selection of four values corresponding to the CLIC recommendations: PUB, ACA, RES and UNSPECIFIED. These values are presented as checkboxes, each followed by an icon, with a description in a tooltip. Deselecting all of them returns the records in the OTHER category. In the result list, each record shows the icons for the licenses under which the CMD record is published. Selecting a record shows detailed license information in a separate section, including an explanation, links to the licenses when available, the original values from the CMD record and a disclaimer.
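
The checkbox behaviour described in the last item could be modelled roughly as below. This is not the VLO implementation. The function, the record layout and the ids are hypothetical, and matching any of the ticked values (OR) is an assumption. The empty-selection case follows the description above literally.

```python
# The four checkbox values named above; OTHER is what remains when none is ticked.
CHECKBOX_VALUES = ("PUB", "ACA", "RES", "UNSPECIFIED")


def filter_by_availability(records, selected):
    """Return ids of records matching any ticked value, or OTHER if none is ticked."""
    chosen = set(selected) & set(CHECKBOX_VALUES)
    if not chosen:
        # No checkbox ticked: show the records in the OTHER category.
        return [r["id"] for r in records if "OTHER" in r["availability"]]
    return [r["id"] for r in records if chosen & set(r["availability"])]


# Hypothetical sample data.
sample = [
    {"id": "a", "availability": ["PUB"]},
    {"id": "b", "availability": ["ACA", "RES"]},
    {"id": "c", "availability": ["OTHER"]},
    {"id": "d", "availability": ["UNSPECIFIED"]},
]

print(filter_by_availability(sample, ["PUB", "RES"]))
print(filter_by_availability(sample, []))
```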
Last modified on 03/21/16 10:49:29