
Version 6 (modified by twagoo, 11 years ago)

updated xml examples for instances and comp specification

Integration of CLAVAS into CMDI

Contents

  1. Integration of CLAVAS into CMDI
    1. Introduction
    2. Overview
      1. Status of CLAVAS
        1. Public API
        2. Management interface and editor
        3. Available data
      2. Specifying vocabularies in CMDI specifications, schemata and instances
        1. Specifying vocabularies in CMDI instances
        2. Specifying vocabularies in CMDI component specifications
        3. Specifying vocabularies in CMDI profile XSD's
      3. CLAVAS vocabulary sources
    3. Open issues
    4. Roadmap
      1. Phase 1
      2. Phase 2

Introduction

CLAVAS is a CLARIN-NL project that provides a vocabulary service. It is hosted by the Meertens Institute; the contact person is Hennie Brugman. The vocabularies will be available through a set of web services hosted at the Meertens Institute. CLAVAS is based on OpenSKOS, in which vocabularies map to 'concept schemes'. More information is available in the CLAVAS work plan document.

This page covers integration of CLAVAS with CMDI. The idea, roughly speaking, is that metadata modelers can associate a vocabulary (identified by its URI) with an element in their components and profiles. The metadata creator will then be able to pick values from the specified vocabulary or (in the case of open vocabularies) still choose to use a custom value that does not appear in the vocabulary. Editors like Arbil need to be extended to access the CLAVAS public API for retrieving potential values from vocabularies.

Overview

Status of CLAVAS

Public API

A public API that allows one to search for collections and concepts within a scheme (vocabulary) is available.
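
As an illustration, a client could build a search request restricted to one concept scheme as below. This is a minimal sketch only: the base URL `http://openskos.org/api`, the `find-concepts` endpoint, the `q` and `format` parameters and the Lucene-style `prefLabel:`/`inScheme:` query syntax are assumptions modelled on OpenSKOS, and the helper name `find_concepts_url` is hypothetical. Check the actual API documentation before relying on any of them.

```python
from urllib.parse import urlencode

# Assumed base URL; the real CLAVAS/OpenSKOS endpoint may differ.
API_BASE = "http://openskos.org/api"


def find_concepts_url(scheme_uri: str, text: str, fmt: str = "json") -> str:
    """Build a search URL for concepts whose preferred label starts with `text`,
    restricted to one concept scheme (vocabulary).

    The endpoint name and query syntax are assumptions, not the documented API.
    """
    query = f'prefLabel:{text}* AND inScheme:"{scheme_uri}"'
    # urlencode percent-encodes ':', '"', '*' and spaces in the query value.
    return f"{API_BASE}/find-concepts?" + urlencode({"q": query, "format": fmt})
```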

The autocomplete API returns all items of a vocabulary, or a subset of them, in JSON or RDF; the vocabulary is identified by passing its URI as a parameter.
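
A minimal Python sketch of a client for this call follows. The endpoint `http://openskos.org/api/autocomplete`, the parameter names `conceptScheme`, `q` and `format`, and the response shape (a JSON array of label strings) are all assumptions made for illustration, not the documented CLAVAS API; the helper names `autocomplete_url` and `parse_labels` are hypothetical.

```python
import json
from typing import List
from urllib.parse import urlencode

# Hypothetical endpoint; consult the OpenSKOS/CLAVAS documentation for the real one.
AUTOCOMPLETE_BASE = "http://openskos.org/api/autocomplete"


def autocomplete_url(vocabulary_uri: str, prefix: str = "", fmt: str = "json") -> str:
    """URL for all items (empty prefix) or a subset (non-empty prefix) of a vocabulary.

    Parameter names are assumptions.
    """
    params = {"conceptScheme": vocabulary_uri, "format": fmt}
    if prefix:
        params["q"] = prefix
    return f"{AUTOCOMPLETE_BASE}?{urlencode(params)}"


def parse_labels(body: str) -> List[str]:
    """Parse a JSON response assumed to be an array of label strings."""
    data = json.loads(body)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of labels")
    # Drop duplicates while keeping the order returned by the server.
    return list(dict.fromkeys(str(item) for item in data))
```

An empty prefix requests the whole vocabulary; a prefix such as `Du` requests a subset, which suits type-ahead fields in an editor.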

Eventually a CLAVAS-specific (as opposed to openskos.org) service will become available.

Management interface and editor

A concept scheme editor is available but not publicly accessible.

Available data

There are working examples that suffice for use during development, e.g. the languages vocabulary. (URL!)

A vocabulary of institutes will become available shortly.

Specifying vocabularies in CMDI specifications, schemata and instances

Specifying vocabularies in CMDI instances

Each concept within a vocabulary is identified by an OpenSKOS-specific URI and optionally has a reference to a 'source' URI (e.g. ISOCat). For fields linked to open vocabularies, we want to store one of these URIs as an attribute in CMDI metadata instances, e.g.:

<language cmd:VocabItem="http://cdb.iso.org/lg/CDB-00138580-001">Dutch</language>

Note the cmd prefix, which disambiguates the attribute and prevents clashes with custom attributes of the same name.

Each vocabulary item has a CLAVAS URI and optionally a 'source URI' (which will generally come from the primary data source, e.g. ISOCat), as in the example above. Arbil will use a deterministic fall-back mechanism: it chooses the source URI if available, otherwise the CLAVAS URI.
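
The fall-back rule and the resulting instance attribute can be sketched with Python's standard `xml.etree.ElementTree`. The namespace bound to the `cmd` prefix is assumed to be the CMDI 1.1 namespace `http://www.clarin.eu/cmd/`, and the helper names `pick_vocab_item_uri` and `vocab_element` are hypothetical; Arbil's actual implementation may differ.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed: the namespace behind the 'cmd' prefix (CMDI 1.1).
CMD_NS = "http://www.clarin.eu/cmd/"
ET.register_namespace("cmd", CMD_NS)


def pick_vocab_item_uri(clavas_uri: str, source_uri: Optional[str]) -> str:
    """Deterministic fall-back: prefer the source URI, else the CLAVAS URI."""
    return source_uri if source_uri else clavas_uri


def vocab_element(tag: str, label: str, clavas_uri: str,
                  source_uri: Optional[str] = None) -> ET.Element:
    """Create an element such as <language cmd:VocabItem="...">Dutch</language>."""
    uri = pick_vocab_item_uri(clavas_uri, source_uri)
    elem = ET.Element(tag, {f"{{{CMD_NS}}}VocabItem": uri})
    elem.text = label
    return elem
```

For example, `vocab_element("language", "Dutch", clavas_uri, "http://cdb.iso.org/lg/CDB-00138580-001")`, serialised with `ET.tostring(..., encoding="unicode")`, gives an element like the instance example above; `clavas_uri` stands for the item's CLAVAS URI, which this page does not specify. Because the rule only checks whether a source URI is present, the same concept always yields the same attribute value.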

The value that serves as the text content should come from one of the child elements of the concept definition. Typically this will be the preferred label as specified in the vocabulary item returned from the API, but it could also come from another element (e.g. a notation element that contains a language code). Which path to use should be determined in the component specification (possibly as part of the vocabulary URI).
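
Selecting the text content by path could look like the following sketch, which reads a concept serialised as SKOS in RDF/XML. It assumes the API returns an `rdf:RDF` root with an `rdf:Description` element per concept; the default path `skos:prefLabel` covers the preferred-label case, and passing `skos:notation` covers the language-code case. The function name `concept_value` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def concept_value(rdf_xml: str, path: str = "skos:prefLabel",
                  lang: Optional[str] = None) -> Optional[str]:
    """Return the text of the first element matching `path` in the first concept."""
    root = ET.fromstring(rdf_xml)
    # Assumed response layout: rdf:RDF > rdf:Description (one per concept).
    concept = root.find("rdf:Description", NS)
    if concept is None:
        return None
    for child in concept.findall(path, NS):
        # Without a requested language, the first match wins.
        if lang is None or child.get(XML_LANG) == lang:
            return child.text
    return None
```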

Closed vocabularies will be no different from standard CMDI closed vocabularies on the instance level. All required information will be available from the schema (see below).

Specifying vocabularies in CMDI component specifications

Taking the above into account, an element specification in a CMDI component or profile could look something like:

Open vocabularies

Specified using attributes on CMD_Element

  • ValueScheme has to be string. To enforce this, we add a Schematron rule to the general component schema.
  • The vocabulary URI (a URL or URN) is specified in the Vocabulary attribute.
  • The value field is optionally specified, either as a parameter on the URI or as a separate attribute (TODO: DECIDE). It would be nice if we could pass the 'label' selection on to the API so that a pre-selection can happen server side (making the processing of the response more uniform).
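
The rule in the first bullet is meant to be a Schematron rule; purely to illustrate its logic, the same check is approximated below in Python rather than Schematron. It assumes that closed vocabularies can be recognised by a nested ValueScheme element (as in the closed-vocabulary example on this page) and skips those; the function name `check_open_vocabularies` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List


def check_open_vocabularies(component_xml: str) -> List[str]:
    """Report CMD_Elements that reference a vocabulary through the Vocabulary
    attribute without ValueScheme="string".

    Elements with a nested <ValueScheme> (closed vocabularies) are skipped.
    """
    errors = []
    root = ET.fromstring(component_xml)
    for elem in root.iter("CMD_Element"):
        if elem.get("Vocabulary") is None:
            continue  # no vocabulary reference, rule does not apply
        if elem.find("ValueScheme") is not None:
            continue  # closed vocabulary, enumerated in the specification
        if elem.get("ValueScheme") != "string":
            errors.append(f"{elem.get('name')}: ValueScheme must be 'string'")
    return errors
```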

Example:

<CMD_Element 
    name="Institution"
    CardinalityMax="1" 
    CardinalityMin="1" 
    ValueScheme="string"
    Vocabulary="http://openskos.org/institutions?label=name"
/>

OR

<CMD_Element 
    name="Institution"
    CardinalityMax="1" 
    CardinalityMin="1" 
    ValueScheme="string"
    Vocabulary="http://openskos.org/institutions"
    VocabularyLabel="name"
/>
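
Since the choice between the two variants is still open (see the TODO above), a client could accept both. The sketch below, with hypothetical names, strips a `label` parameter from the Vocabulary URI and lets a VocabularyLabel attribute take precedence if both are present; that precedence is an assumption, not a decision recorded on this page. A label of `None` means the default (the preferred label) applies.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


class VocabularyRef(NamedTuple):
    uri: str              # vocabulary URI without the label parameter
    label: Optional[str]  # value field to use; None means the default


def read_vocabulary_ref(element_xml: str) -> Optional[VocabularyRef]:
    """Read either variant of the open-vocabulary CMD_Element specification."""
    elem = ET.fromstring(element_xml)
    vocab = elem.get("Vocabulary")
    if vocab is None:
        return None
    parts = urlsplit(vocab)
    query = parse_qs(parts.query)
    # Variant 1: '?label=...' on the URI.
    label = query.pop("label", [None])[0]
    # Variant 2: a separate VocabularyLabel attribute (assumed to win).
    label = elem.get("VocabularyLabel", label)
    base = urlunsplit(parts._replace(query=urlencode(query, doseq=True)))
    return VocabularyRef(base, label)
```
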

Closed vocabularies

<CMD_Element 
    name="Language"
    CardinalityMax="1" 
    CardinalityMin="1" 
    Vocabulary="http://openskos.org/api/languages?label=iso-639-3">
    <ValueScheme>
      <enumeration>
         <item AppInfo="Dutch" VocabItem="http://cdb.iso.org/lg/CDB-00138580-001">nld</item>
         <item AppInfo="French" VocabItem="http://cdb.iso.org/lg/CDB-00138512-001">fra</item>
      </enumeration>
    </ValueScheme>
</CMD_Element>
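
For closed vocabularies, a client can read the enumeration from the specification and keep the mapping from each value to its label and concept URI, so that instances need to contain only the value. This is a minimal sketch with hypothetical names; it assumes items appear exactly under ValueScheme/enumeration/item as in the example above.

```python
import xml.etree.ElementTree as ET
from typing import Dict, NamedTuple, Optional


class ClosedItem(NamedTuple):
    value: str                 # value written into instances, e.g. "fra"
    app_info: Optional[str]    # human-readable label shown by editors
    vocab_item: Optional[str]  # concept URI recorded in the specification


def read_closed_vocabulary(element_xml: str) -> Dict[str, ClosedItem]:
    """Map each enumerated value to its AppInfo label and VocabItem URI."""
    elem = ET.fromstring(element_xml)
    items = {}
    for item in elem.iterfind("ValueScheme/enumeration/item"):
        value = (item.text or "").strip()
        items[value] = ClosedItem(value, item.get("AppInfo"), item.get("VocabItem"))
    return items
```

A value read from an instance, e.g. `fra`, can then be validated with `value in items` and displayed via `items[value].app_info`.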

Specifying vocabularies in CMDI profile XSD's

The values of the vocabulary-related attributes could go straight into the generated profile XSD, much like the "datcat" attributes, and client applications could read them from the schema in the same way. We can put them
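
Client applications could then pick these annotations up while reading the schema, as they do for data categories. The sketch below collects them per xs:element. The `dcr:datcat` attribute in the `http://www.isocat.org/ns/dcr` namespace is assumed to match the datcat annotations in current CMDI XSDs, whereas the name and namespace of the vocabulary attribute (`cmd:Vocabulary` here) are hypothetical placeholders, since this page does not fix them.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

XS = "http://www.w3.org/2001/XMLSchema"
DCR_DATCAT = "{http://www.isocat.org/ns/dcr}datcat"
# Hypothetical: qualified name of the vocabulary attribute in generated XSDs.
VOCABULARY = "{http://www.clarin.eu/cmd/}Vocabulary"


def schema_annotations(xsd: str) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map element name -> (datcat URI, vocabulary URI) for annotated xs:elements."""
    result = {}
    for el in ET.fromstring(xsd).iter(f"{{{XS}}}element"):
        datcat, vocab = el.get(DCR_DATCAT), el.get(VOCABULARY)
        # Only keep named elements that carry at least one annotation.
        if el.get("name") and (datcat or vocab):
            result[el.get("name")] = (datcat, vocab)
    return result
```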

CLAVAS vocabulary sources

Vocabularies from ISOCat will be provided to OpenSKOS (the exact details of how this will be done still have to be worked out); Arbil and ComponentRegistry will query these vocabularies through OpenSKOS.

Open issues

  • PIDs for vocabularies (yes please) and vocabulary items (maybe; possibly using part identifiers)
  • ...

Roadmap

Implementation can be divided into at least two phases:

Phase 1

Target milestone: late December 2012

  • Adapt the CMDI schema to allow vocabularies to be referenced from elements. In this phase, support only open vocabularies and values taken from prefLabel
  • Create a CMDI profile that references vocabularies
  • Implement a client in Arbil that consumes the OpenSKOS API for specific vocabularies (initially without caching, i.e. online availability only)
  • Create UI component(s) in Arbil that make use of these calls to present the vocabulary items to the user
  • Extend the reading of profile specifications and the reading/writing of CMDI instances to support vocabulary references

Phase 2

Target milestone: ?

  • Allow closed vocabularies
  • Vocabulary caching in Arbil
  • ...to be further specified...
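
For the caching item, one possible approach is a time-limited in-memory cache keyed by vocabulary URI that falls back to a stale copy when the service is unreachable. This is a minimal Python sketch under stated assumptions (hypothetical class name, one-hour default expiry, network failures surfacing as `OSError`, of which `urllib.error.URLError` is a subclass); Arbil would need an equivalent in its own implementation language.

```python
import time
from typing import Callable, Dict, List, Tuple

Fetcher = Callable[[str], List[str]]


class VocabularyCache:
    """In-memory cache of vocabulary items keyed by vocabulary URI.

    Entries expire after `ttl` seconds; `fetch` performs the (online) API call.
    """

    def __init__(self, fetch: Fetcher, ttl: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch, self._ttl, self._clock = fetch, ttl, clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def items(self, vocabulary_uri: str) -> List[str]:
        now = self._clock()
        hit = self._entries.get(vocabulary_uri)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh cached copy
        try:
            values = self._fetch(vocabulary_uri)
        except OSError:
            if hit is not None:
                return hit[1]  # offline: fall back to the stale copy
            raise
        self._entries[vocabulary_uri] = (now, values)
        return values
```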
