This page is a subpage of [[CMDI 1.2]].

= External vocabularies in CMDI 1.2: Executive summary =

This page provides an executive summary of the issue and the proposed solution, which are fully described in [[CMDI 1.2/Vocabularies]].

== Issue description ==

The issue concerns the use of external vocabularies as value domains for CMDI elements and CMDI attributes (i.e. Attribute elements), and more specifically how to achieve this with the [http://openskos.org OpenSKOS]-based [https://openskos.meertens.knaw.nl/web-report/documentation/clavas-overview.html CLAVAS] vocabulary service. This has been described and explored on the page [[CmdiClavasIntegration]]. The issue description below is based on the one found on that page and on subsequent discussions within the CMDI Taskforce.

== CLAVAS ==

[http://openskos.org OpenSKOS] is a vocabulary service through which users may publish, manage and use SKOS-ified vocabulary data. The data can be accessed through a publicly available RESTful API. At this point only basic SKOS is supported. Some Dublin Core elements are also included, but they are not indexed.

CLAVAS is CLARIN-NL's application of OpenSKOS. It is hosted by the Meertens Institute; the contact person is [[henniebrugman|Hennie Brugman]]. The vocabularies will be available through a set of web services hosted at Meertens, and each vocabulary maps to a 'concept scheme' in OpenSKOS. More information is available in the [attachment:wiki:CmdiClavasIntegration:CLAVAS-Workplan-1.1.docx CLAVAS work plan document].

For the integration of CLAVAS with [[CmdiIndex|CMDI]], the proposed workflow is that metadata modelers can associate a vocabulary (identified by its URI) with an element in their components and profiles. The metadata creator will then be able to pick values from the specified vocabulary or, in the case of open vocabularies, still choose a custom value that does not appear in the vocabulary.
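The workflow above can be illustrated with a minimal Python sketch. It is not part of any CMDI tool: `VocabularyBinding`, `suggest` and `accepts` are hypothetical names, and the vocabulary values are held locally instead of being retrieved from the CLAVAS API. It assumes only what is stated above: a vocabulary is identified by its URI, and an open vocabulary also admits custom values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VocabularyBinding:
    """Hypothetical association of a vocabulary with a CMDI element.

    `values` stands in for the items an editor would retrieve from the
    vocabulary service; here they are simply listed locally.
    """

    uri: str                  # identifies the vocabulary
    values: tuple[str, ...]   # vocabulary items offered to the metadata creator
    is_open: bool = False     # open vocabularies also admit custom values

    def suggest(self, prefix: str) -> list[str]:
        """Autocomplete-style lookup: items starting with `prefix` (case-insensitive)."""
        p = prefix.casefold()
        return [v for v in self.values if v.casefold().startswith(p)]

    def accepts(self, value: str) -> bool:
        """Closed: only vocabulary items are valid. Open: any value is allowed."""
        return self.is_open or value in self.values


# Hypothetical sample values; the URI is the CLAVAS language scheme given under "Available data".
languages = VocabularyBinding(
    uri="http://openskos.meertens.knaw.nl/iso-639-3",
    values=("Dutch", "French", "German"),
    is_open=True,
)
closed_languages = VocabularyBinding(uri=languages.uri, values=languages.values)

print(languages.suggest("fr"))              # ['French']
print(languages.accepts("Klingon"))         # True: custom value in an open vocabulary
print(closed_languages.accepts("Klingon"))  # False: not an item of the closed vocabulary
```

In this sketch openness is a flag on the binding; in the schema changes proposed for CMDI 1.2, it instead follows from whether the vocabulary items are enumerated in the element itself.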
Editors like [http://tla.mpi.nl/tools/arbil Arbil] need to be extended to access the CLAVAS public API, in particular its concept search ("find concepts") and autocomplete functions, for retrieving potential values from vocabularies.

==== Available data ====

 * '''Languages:'''
   * ConceptScheme URI: [http://openskos.meertens.knaw.nl/iso-639-3]
   * OAI set: meertens:ISO-639-3
 * '''Organisations:'''
   * ConceptScheme URI: [http://openskos.meertens.knaw.nl/Organisations]
   * OAI set: meertens:VLO-orgs
 * '''All public [http://www.isocat.org ISOcat] categories (only simple ones?)'''
   * ConceptScheme URIs: many different ones, each corresponding to a closed data category whose conceptual domain includes the simple data categories
   * OAI set: meertens:isocat

== Solution description (proposed) ==

This section describes the solution as proposed at the Taskforce meeting in Utrecht on 21.2.2014.

There are two main ways of using the OpenSKOS vocabularies in CMDI:
 * Importing vocabularies as closed value domains for CMD_Element or Attribute. Since the vocabulary items are enumerated explicitly as a choice list in the elements in question, validation is possible.
 * Using one OpenSKOS vocabulary, or a combination of several, for dynamic lookup and retrieval of values for a CMDI element or Attribute. Here a non-exclusive (open) use of items from the vocabulary must be assumed, as validation against such external vocabularies is not practicable.

=== Schema changes ===

The following changes to the General Component Schema accommodate vocabulary use for both CMD_Element and Attribute:
 * New element <Vocabulary> in <ValueScheme>
   * <Vocabulary> may have an <enumeration> element. If so, we have an internal, closed vocabulary (imported or locally specified). If not, the Vocabulary is to be considered external and used as a lookup mechanism.
   * Attributes for <Vocabulary>:
     * @URI
     * @ValueProperty (which field of the vocabulary items to return, typically prefLabel)
     * @ValueLanguage (preferred language of the item field value)

==== General Component Schema changes ====

{{{
Specification of a regular expression the element should comply with.
Specification of an open or closed vocabulary
A list of the allowed values of a controlled vocabulary.
}}}

==== CMD_Element example ====

{{{
Dutch
French
....
}}}

=== Impact on tools ===

 * Metadata editors must facilitate vocabulary lookup. Arbil, as the most generic editor, should be prioritized.
 * The Component Registry must facilitate the import of vocabularies, and its interface for specifying value domains for elements and Attributes must be updated.
 * Discovery services (the VLO, among others) could assist users by means of vocabularies, e.g. through vocabulary-based browsing.

=== Comments/concerns ===

The proposed solution allows a certain degree of misuse, so it is vital to describe and argue for good practices before bad practice proliferates. The main concern relates to the possibility of ''importing vocabularies as controlled value ranges for CMD_Element and Attribute''.

==== Avoiding multiplication of large vocabularies in CR ====

Since imported vocabularies are to be part of elements, and elements are not reusable, great care must be taken to ensure that large enumeration lists are not duplicated across components in the Component Registry (CR). One way of achieving this is:
 1. to consider which vocabularies are likely to be relevant in many profiles, and
 2. for each concept property that is relevant as ValueProperty for some element in the CR, to define a component in the CR that contains only one element, and to import the property values of the vocabulary concepts as its closed value domain.
 * Example: The component [http://catalog.clarin.eu/ds/ComponentRegistry?item=clarin.eu:cr1:c_1271859438110 iso-language-639-3] contains only one element, iso-639-3-code, which takes its values from a controlled vocabulary of language codes.
   (With the proposed 1.2 model, and given the CLAVAS vocabulary of languages, ValueProperty would have been set to "notation".) Some modelers may prefer to store the ''language names'' instead of, or in addition to, the ''codes''. To make sure the names can be reused independently of the language codes, another component containing a language name element (with ValueProperty=prefLabel) should be defined. Note: to obtain the same effect for Attributes, they too will have to be wrapped separately in a component.
 * Example: Consider the component [http://catalog.clarin.eu/ds/ComponentRegistry?item=clarin.eu:cr1:c_1349361150621 OLAC-DcmiTerms]. Here all elements have a similar DcmiType attribute, with the entire value list enumerated for each occurrence. To avoid this in the proposed 1.2 model, elements and attributes would have to be wrapped in separate components throughout.

==== Importing partial vocabularies hampers reuse ====

The proposed model does not force the modeller to import only entire vocabularies; it is also possible to import a subset of a larger vocabulary. For example, in a specific ''language'' element, the component creator may choose to import only the languages relevant to his/her user community. Such practice should be discouraged, as it renders the component unusable for anyone who needs access to more or other languages, even though the component might otherwise be perfectly suitable. However, if the Component Registry does not support partial imports that retain the external vocabulary reference, the number of such occurrences should be drastically limited.
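The practices discussed in the last two sections, one single-element component per ValueProperty (codes via "notation", names via prefLabel) and the avoidance of partial imports, can be sketched in Python. This is an illustration under stated assumptions, not Component Registry code: the concept data is a hypothetical, simplified stand-in for a CLAVAS vocabulary, and `import_enumeration` and `is_partial_import` are hypothetical helpers. A check like `is_partial_import` is one way a registry might detect a subset before it is stored together with the external vocabulary reference.

```python
from collections.abc import Mapping
from typing import Optional

# Hypothetical, simplified stand-in for the concepts of a language vocabulary:
# "notation" is a plain string and "prefLabel" maps language tags to labels.
# This is NOT the actual OpenSKOS/CLAVAS response format.
CONCEPTS = [
    {"notation": "nld", "prefLabel": {"en": "Dutch", "nl": "Nederlands"}},
    {"notation": "fra", "prefLabel": {"en": "French", "nl": "Frans"}},
    {"notation": "deu", "prefLabel": {"en": "German", "nl": "Duits"}},
]


def import_enumeration(concepts: list[dict], value_property: str,
                       value_language: Optional[str] = None) -> list[str]:
    """Build a closed value domain from one property of every concept.

    A plain-string property (e.g. "notation") is used as-is. For a
    language-tagged property (e.g. "prefLabel") the value in `value_language`
    is taken; concepts without such a value are skipped. Duplicates are dropped.
    """
    values: list[str] = []
    for concept in concepts:
        raw = concept.get(value_property)
        if isinstance(raw, Mapping):
            raw = raw.get(value_language) if value_language else None
        if isinstance(raw, str) and raw not in values:
            values.append(raw)
    return values


def is_partial_import(enumeration: list[str], concepts: list[dict],
                      value_property: str,
                      value_language: Optional[str] = None) -> bool:
    """True if `enumeration` holds only a proper subset of the vocabulary's values."""
    full = set(import_enumeration(concepts, value_property, value_language))
    return set(enumeration) < full


codes = import_enumeration(CONCEPTS, "notation")          # language code component
names = import_enumeration(CONCEPTS, "prefLabel", "en")   # language name component
print(codes)                                               # ['nld', 'fra', 'deu']
print(names)                                               # ['Dutch', 'French', 'German']
print(is_partial_import(["nld", "fra"], CONCEPTS, "notation"))  # True
print(is_partial_import(codes, CONCEPTS, "notation"))          # False
```

Keeping codes and names as two separate enumerations mirrors the recommendation to define one component per ValueProperty, so that either can be reused without the other.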