
It has been proposed to include CLAVAS support in CMDI 1.2; see CMDI 1.2/Vocabularies for more information.

Integration of CLAVAS into CMDI

Contents

  1. Integration of CLAVAS into CMDI
    1. Introduction
    2. Overview
      1. Status of CLAVAS
        1. Public API
        2. Management interface and editor
        3. Available data
      2. Specifying vocabularies in CMDI specifications, schemata and instances
        1. Specifying vocabularies in CMDI instances
          1. Open vocabularies
          2. Closed vocabularies
        2. Specifying vocabularies in CMDI component specifications
          1. Open vocabularies
          2. Closed vocabularies
        3. Specifying vocabularies in CMDI profile XSDs
          1. Open vocabularies
          2. Closed vocabularies
      3. CLAVAS vocabulary sources
      4. Retrieving vocabularies
    3. Related tickets

Introduction

CLAVAS is a CLARIN-NL project that provides a vocabulary service. It is hosted by the Meertens Institute; the contact person is Hennie Brugman. The vocabularies will be available through a set of web services hosted at the Meertens Institute. CLAVAS is based on OpenSKOS, in which vocabularies map to 'concept schemes'. More information is available in the CLAVAS work plan document.

This page covers the integration of CLAVAS with CMDI. The idea, roughly speaking, is that metadata modelers can associate a vocabulary (identified by its URI) with an element in their components and profiles. The metadata creator will then be able to pick values from the specified vocabulary or, in the case of open vocabularies, still choose a custom value that does not appear in the vocabulary. Editors such as Arbil need to be extended to access the CLAVAS public API in order to retrieve candidate values from vocabularies.

Overview

Status of CLAVAS

Public API

A public API that allows one to search for collections and concepts within a scheme (vocabulary) is available.

The autocomplete API can be used to retrieve all items of a given vocabulary, or a subset of them, in JSON or RDF, by passing the URI of the vocabulary as a parameter.

Eventually a CLAVAS-specific (as opposed to openskos.org) service will become available.

Management interface and editor

A concept scheme editor is available but not publicly accessible.

Available data

There are working examples that suffice for use during development, e.g. the languages vocabulary. (URL!)

A vocabulary of institutes will become available shortly.

Specifying vocabularies in CMDI specifications, schemata and instances

Specifying vocabularies in CMDI instances

Open vocabularies

Each concept within a vocabulary is identified by an OpenSKOS-specific URI and optionally has a reference to a 'source' URI (e.g. in ISOCat). For fields linked to an open vocabulary, we want to store one of these URIs as an attribute in CMDI metadata instances, e.g.:

<language cmd:ValueConceptLink="http://cdb.iso.org/lg/CDB-00138580-001">Dutch</language>

Note the cmd prefix on the attribute: it disambiguates the attribute and prevents clashes with potential custom attributes of the same name.

The URI in the example above is such a source URI; source URIs generally come from the primary data source, e.g. ISOCat. Arbil will use a deterministic fallback mechanism: it chooses the source URI if one is available, and otherwise the CLAVAS (OpenSKOS) URI.
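The fallback rule can be sketched in a few lines of Python. This is an illustration, not Arbil code: the function names are hypothetical, and the CMDI namespace used for the cmd prefix (http://www.clarin.eu/cmd/, the CMDI 1.1 namespace) is an assumption, since the 1.2 namespace may differ.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumption: namespace bound to the cmd prefix (CMDI 1.1); CMDI 1.2 may differ.
CMD_NS = "http://www.clarin.eu/cmd/"


def concept_link(openskos_uri: str, source_uri: Optional[str] = None) -> str:
    """Deterministic fallback: prefer the source URI (e.g. ISOCat), else the OpenSKOS URI."""
    return source_uri or openskos_uri


def value_element(tag: str, label: str, openskos_uri: str,
                  source_uri: Optional[str] = None) -> ET.Element:
    """Build an instance element such as <language cmd:ValueConceptLink="...">Dutch</language>."""
    elem = ET.Element(tag)
    elem.set(f"{{{CMD_NS}}}ValueConceptLink", concept_link(openskos_uri, source_uri))
    elem.text = label
    return elem
```

Because the choice depends only on whether a source URI is present, the same concept always yields the same stored URI, which keeps instances created by different editors comparable.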

The text content of the element should come from one of the child elements of the concept definition. Typically this will be the preferred label of the vocabulary item as returned by the API, but it could also come from another element (e.g. to choose between an item's full name and its code). Which property to use should be determined in the component specification (possibly as part of the vocabulary URI).
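A minimal sketch of that selection, assuming a concept has already been reduced to a mapping from property names (e.g. prefLabel, or skos:notation for a code) to values. The function name is hypothetical, and falling back to prefLabel when the requested property is missing is an assumption; this page does not specify that case.

```python
from typing import Mapping

DEFAULT_VALUE_PROPERTY = "prefLabel"


def text_content(concept: Mapping[str, str],
                 value_property: str = DEFAULT_VALUE_PROPERTY) -> str:
    """Pick the element's text content from the configured concept property.

    Assumption: if the concept lacks the requested property, use prefLabel.
    """
    if value_property in concept:
        return concept[value_property]
    return concept[DEFAULT_VALUE_PROPERTY]
```

For example, with a concept {"prefLabel": "Dutch", "notation": "nld"}, a component configured with the value property "notation" would store "nld" as text content, while the default would store "Dutch".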

Closed vocabularies

Closed vocabularies will be no different from standard CMDI closed vocabularies on the instance level. All required information will be available from the schema due to the vocabulary import (see below).

Specifying vocabularies in CMDI component specifications

Taking the above into account, an element specification in a CMDI component or profile could look as follows.

Open vocabularies

Open vocabularies are specified using attributes on CMD_Element:

  • ValueScheme must be string. To enforce this, a Schematron rule will be added to the general component schema.
  • The vocabulary URI (URL or URN) is specified in the Vocabulary attribute.
  • The value property can optionally be specified (default: prefLabel), either as a parameter on the URI (first example) or as a separate attribute VocabValueProperty (second example). TODO: DECIDE
    • Use case: the language vocabulary, which provides multiple versions of ISO-639 per item
    • It would be nice if the 'label' selection could be passed on to the API, so that the selection happens server side and the result is returned in a specially marked element or attribute (making processing of the response more uniform)

Example:

<CMD_Element 
    name="Institution"
    CardinalityMax="1" 
    CardinalityMin="1" 
    ValueScheme="string"
    Vocabulary="http://openskos.org/institutions?label=name"
/>

OR

<CMD_Element 
    name="Institution"
    CardinalityMax="1" 
    CardinalityMin="1" 
    ValueScheme="string"
    Vocabulary="http://openskos.org/institutions"
    VocabValueProperty="name"
/>
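Since the choice between the two variants is still open, a client could accept both. The Python sketch below takes the attributes of a CMD_Element and returns the vocabulary URI, with any label parameter stripped, together with the value property. The function name is hypothetical, and letting VocabValueProperty take precedence over a label parameter is an assumption.

```python
from typing import Mapping, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

DEFAULT_VALUE_PROPERTY = "prefLabel"


def vocabulary_spec(attrs: Mapping[str, str]) -> Tuple[str, str]:
    """Return (vocabulary URI, value property) for either proposed variant.

    Variant 1: Vocabulary="...?label=name"
    Variant 2: Vocabulary="..." VocabValueProperty="name"
    """
    parts = urlsplit(attrs["Vocabulary"])
    params = parse_qsl(parts.query)
    label = next((v for k, v in params if k == "label"), None)
    # Drop the 'label' parameter so the remaining URI identifies the vocabulary itself.
    rest = urlencode([(k, v) for k, v in params if k != "label"])
    uri = urlunsplit((parts.scheme, parts.netloc, parts.path, rest, parts.fragment))
    # Assumption: an explicit VocabValueProperty wins over a label parameter.
    prop = attrs.get("VocabValueProperty") or label or DEFAULT_VALUE_PROPERTY
    return uri, prop
```

Both examples above then yield ("http://openskos.org/institutions", "name"). The attributes can be taken directly from a parsed element, e.g. vocabulary_spec(xml.etree.ElementTree.fromstring(spec).attrib).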

Closed vocabularies

Closed vocabularies will be 'imported' into the component at design time, resulting in an internalized 'snapshot copy' of the vocabulary as it was when the component was created. The ComponentRegistry will be extended to support this. The vocabulary URI will be stored in the component specification and transferred to the schema, so that editors can query the API for additional information. This is optional, however, as all information, including the item URIs, will be available from the schema.

We will add the vocabulary URI as an attribute to the element and reuse the existing ConceptLink attribute on the enumeration items to store the identifiers of individual vocabulary items.

<CMD_Element 
    name="Language"
    CardinalityMax="1" 
    CardinalityMin="1" 
    Vocabulary="http://openskos.org/api/languages?label=iso-639-3">
    <ValueScheme>
      <enumeration>
         <item ConceptLink="http://cdb.iso.org/lg/CDB-00138580-001">Dutch</item>
         <item ConceptLink="http://cdb.iso.org/lg/CDB-00138512-001">French</item>
      </enumeration>
    </ValueScheme>
</CMD_Element>

The text content of each item comes from the selected label; ConceptLink holds the URI of the item in the vocabulary. There is probably no need for AppInfo (a separate display label). Note that there is currently no way to represent multilingual vocabularies, so the language will have to be specified in the vocabulary URI, with a fallback to the default language of the vocabulary.
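The design-time import could produce such a snapshot from (concept URI, label) pairs retrieved from OpenSKOS. The Python sketch below only builds the element; it is not ComponentRegistry code, the function name is hypothetical, and the cardinalities are fixed to mirror the example above.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple


def closed_vocabulary_element(name: str, vocabulary_uri: str,
                              items: Iterable[Tuple[str, str]]) -> ET.Element:
    """Build a CMD_Element holding a snapshot copy of a closed vocabulary.

    `items` are (concept URI, label) pairs as retrieved at design time.
    """
    elem = ET.Element("CMD_Element", {
        "name": name,
        "CardinalityMax": "1",  # fixed here to mirror the example above
        "CardinalityMin": "1",
        "Vocabulary": vocabulary_uri,
    })
    enumeration = ET.SubElement(ET.SubElement(elem, "ValueScheme"), "enumeration")
    for uri, label in items:
        # ConceptLink keeps the identity of the vocabulary item in the snapshot.
        item = ET.SubElement(enumeration, "item", {"ConceptLink": uri})
        item.text = label
    return elem
```

Calling it with the Dutch and French concepts reproduces the structure of the example above; ET.tostring(elem, encoding="unicode") serializes it.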

Specifying vocabularies in CMDI profile XSDs

The values of the vocabulary-related attributes could go straight into the generated profile XSD, much like the "datcat" attributes, and be read from the schema by client applications.

Open vocabularies

Example, assuming the variant with separate attributes for the vocabulary URI and the value property:

<xs:element 
  name="Institute"  
  ann:displaypriority="1"
  dcr:datcat="http://www.isocat.org/datcat/DC-3785" 
  cmd:Vocabulary="http://openskos.org/institutions" 
  cmd:VocabValueProperty="name">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute ref="xml:lang"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>
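A client application could read these attributes with a standard XML parser. The Python sketch below uses ElementTree. It assumes that the cmd prefix is bound to http://www.clarin.eu/cmd/ (this binding is not fixed on this page) and that a missing VocabValueProperty means prefLabel. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

XS_NS = "http://www.w3.org/2001/XMLSchema"
# Assumption: namespace bound to the cmd prefix in generated profile XSDs.
CMD_NS = "http://www.clarin.eu/cmd/"


def vocabulary_elements(xsd_text: str) -> dict:
    """Map xs:element names to (Vocabulary, VocabValueProperty) for open vocabularies."""
    root = ET.fromstring(xsd_text)
    result = {}
    for el in root.iter(f"{{{XS_NS}}}element"):
        vocab = el.get(f"{{{CMD_NS}}}Vocabulary")
        if vocab is not None:
            # Assumption: default to prefLabel when no value property is given.
            prop = el.get(f"{{{CMD_NS}}}VocabValueProperty", "prefLabel")
            result[el.get("name")] = (vocab, prop)
    return result
```

For the example above, this would yield {"Institute": ("http://openskos.org/institutions", "name")}. Elements without a cmd:Vocabulary attribute are skipped.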

Closed vocabularies

Example:

<xs:simpleType 
  name="simpletype-iso-639-3-code-clarin.eu.cr1.c_123456789" 
  cmd:Vocabulary="http://openskos.org/api/languages?label=iso-639-3">
  <xs:restriction base="xs:string">
    <xs:enumeration value="Dutch" dcr:datcat="http://cdb.iso.org/lg/CDB-00138580-001" />
    <xs:enumeration value="French" dcr:datcat="http://cdb.iso.org/lg/CDB-00138512-001" />
  </xs:restriction>
</xs:simpleType>
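An editor could recover both the vocabulary URI and the value-to-concept mapping from such a type without contacting the API. The sketch below assumes namespace bindings for the cmd and dcr prefixes: http://www.clarin.eu/cmd/ and http://www.isocat.org/ns/dcr. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

XS_NS = "http://www.w3.org/2001/XMLSchema"
# Assumptions: namespaces bound to the cmd and dcr prefixes.
CMD_NS = "http://www.clarin.eu/cmd/"
DCR_NS = "http://www.isocat.org/ns/dcr"


def closed_vocabulary(xsd_text: str, type_name: str):
    """Return (vocabulary URI, {enumeration value: concept link}) for a named xs:simpleType."""
    root = ET.fromstring(xsd_text)
    for st in root.iter(f"{{{XS_NS}}}simpleType"):
        if st.get("name") == type_name:
            links = {
                e.get("value"): e.get(f"{{{DCR_NS}}}datcat")
                for e in st.iter(f"{{{XS_NS}}}enumeration")
            }
            return st.get(f"{{{CMD_NS}}}Vocabulary"), links
    raise KeyError(type_name)
```

An editor could use the returned mapping to show each allowed value together with its concept link.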

CLAVAS vocabulary sources

Vocabularies from ISOCat will be provided to OpenSKOS (the exact details of how this will be done still have to be worked out). Arbil and the ComponentRegistry will query these vocabularies through OpenSKOS.

Retrieving vocabularies

An example provided by Hennie Brugman:

https://openskos.meertens.knaw.nl/api/find-concepts?q=inScheme:http*Organisations&format=rdf&rows=3000

Pagination of results is supported as in Solr, with 'start' and 'rows' parameters.
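A client could page through a large vocabulary by stepping start by rows. The Python sketch below only builds the request URLs. The endpoint comes from the example above. The page size of 1000 is an arbitrary choice, and taking the total from openskos:numFound in a first response is an assumption about how a client would learn it. The function name is hypothetical.

```python
from typing import Iterator
from urllib.parse import urlencode

# Endpoint from the example request above.
FIND_CONCEPTS = "https://openskos.meertens.knaw.nl/api/find-concepts"


def page_urls(query: str, total: int, rows: int = 1000, fmt: str = "rdf") -> Iterator[str]:
    """Yield find-concepts URLs whose pages together cover `total` results.

    Uses Solr-style 'start' and 'rows' parameters; `total` would typically
    come from openskos:numFound in a first response.
    """
    for start in range(0, total, rows):
        params = {"q": query, "format": fmt, "rows": rows, "start": start}
        yield f"{FIND_CONCEPTS}?{urlencode(params)}"
```

With 2504 results and pages of 1000, this yields three requests starting at 0, 1000 and 2000.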

And an OAI-PMH variant: https://openskos.meertens.knaw.nl/oai-pmh?verb=ListRecords&set=meertens:VLO-orgs&metadataPrefix=oai_dc

Partial example output of the first request:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
        xmlns:skos="http://www.w3.org/2004/02/skos/core#"
        xmlns:dcterms="http://purl.org/dc/terms/"       
        xmlns:openskos="http://openskos.org/xmlns#"
        openskos:numFound="2504"
        openskos:start="0"
        openskos:maxScore="1.4165695" >
  <rdf:Description rdf:about="http://openskos.meertens.knaw.nl/Organisations/c39056d3-bc5f-4c22-9381-3b9b9d7b38ef">
        <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
        <skos:prefLabel xml:lang="en">Jensco Limited</skos:prefLabel>
        <skos:altLabel xml:lang="en">Jensco Ltd.</skos:altLabel>
        <skos:inScheme rdf:resource="http://openskos.meertens.knaw.nl/Organisations"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://openskos.meertens.knaw.nl/Organisations/1e824da3-ef2e-406d-bde5-8492b392172a">
        <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
        <skos:prefLabel xml:lang="en">Metsniereba</skos:prefLabel>
        <skos:inScheme rdf:resource="http://openskos.meertens.knaw.nl/Organisations"/>
  </rdf:Description>
  ...
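A client could turn such a response into (concept URI, label) pairs, for instance to fill an editor's value list or a closed-vocabulary snapshot. The Python sketch below uses ElementTree. The function name is hypothetical, and preferring the English prefLabel (else the first prefLabel found) is an assumed policy, not something the API prescribes.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "openskos": "http://openskos.org/xmlns#",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def parse_concepts(rdf_text: str, lang: str = "en"):
    """Return (numFound, [(concept URI, prefLabel)]) from a find-concepts RDF response."""
    root = ET.fromstring(rdf_text)
    num_found = int(root.get(f"{{{NS['openskos']}}}numFound", "0"))
    concepts = []
    for desc in root.findall("rdf:Description", NS):
        uri = desc.get(f"{{{NS['rdf']}}}about")
        label = None
        for pref in desc.findall("skos:prefLabel", NS):
            # Assumed policy: a label in `lang` wins; otherwise keep the first one found.
            if label is None or pref.get(XML_LANG) == lang:
                label = pref.text
        concepts.append((uri, label))
    return num_found, concepts
```

The numFound value can drive the pagination described above.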

The search can be fine-tuned by adding more query parameters, for example (also provided by Hennie Brugman):

http://editor.openskos.org/api/find-concepts?q=prefLabelAutocomplete:dood%20inScheme:"http://data.beeldengeluid.nl/gtaa/Onderwerpen"
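For editor autocompletion, the typed prefix and the scheme URI could be combined into such a query. The Python sketch below percent-encodes the whole q parameter, including the quotes around the scheme URI. The field names come from the example above; the function name is hypothetical.

```python
from urllib.parse import quote, urlencode


def autocomplete_url(base: str, prefix: str, scheme_uri: str) -> str:
    """Build a find-concepts URL matching prefLabels that start with `prefix` in one scheme."""
    q = f'prefLabelAutocomplete:{prefix} inScheme:"{scheme_uri}"'
    # quote_via=quote encodes spaces as %20, as in the example URL above.
    return f"{base}/api/find-concepts?" + urlencode({"q": q}, quote_via=quote)
```

Called with the base http://editor.openskos.org, the prefix "dood" and the scheme from the example, this produces the same query in fully percent-encoded form.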

Related tickets

#369 (component: ComponentSchema, milestone: CMDI 1.2): Add support for open and closed vocabularies in component specifications and instances
#370 (component: ComponentRegistry): Add support for OpenSKOS vocabularies

Last modified on 02/20/14 08:13:50
