Version 5 (modified by twagoo, 11 years ago)

Integration of CLAVAS into CMDI

Contents

  1. Integration of CLAVAS into CMDI
    1. Introduction
    2. Overview
      1. Status of CLAVAS
        1. Public API
        2. Management interface and editor
        3. Available data
      2. Specifying vocabularies in CMDI components
      3. CLAVAS vocabulary sources
    3. Open issues
    4. Roadmap
      1. Phase 1
      2. Phase 2

Introduction

CLAVAS is a CLARIN-NL project that develops a vocabulary service. It is hosted by the Meertens Institute; the contact person is Hennie Brugman. The vocabularies will be available through a set of web services hosted at Meertens. CLAVAS is based on OpenSKOS, where each vocabulary maps to a 'concept scheme'. More information is available in the CLAVAS work plan document.

This page covers the integration of CLAVAS with CMDI. Roughly speaking, the idea is that metadata modelers can associate a vocabulary (identified by its URI) with an element in their components and profiles. The metadata creator can then pick values from the specified vocabulary or, in the case of open vocabularies, still enter a custom value that does not appear in the vocabulary. Editors like Arbil need to be extended to retrieve candidate values from vocabularies through the CLAVAS public API.

Overview

Status of CLAVAS

Public API

A public API is available for searching collections and concepts within a scheme (vocabulary).

The autocomplete API returns all items of a vocabulary, or a subset of them, in JSON or RDF. The vocabulary is identified by passing its URI as a parameter.
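A request for the items of one vocabulary could be built along the lines of the sketch below. The parameter names `scheme` and `format` and the example base URL are placeholders chosen for illustration; they are not taken from the OpenSKOS API documentation and would need to be replaced with the real ones. The base URL is passed in by the caller because the location of the service may still change.

```python
from urllib.parse import urlencode

# Response formats mentioned on this page.
SUPPORTED_FORMATS = ("json", "rdf")


def build_vocabulary_query(base_url: str, vocabulary_uri: str, fmt: str = "json") -> str:
    """Build a request URL asking for the items of one vocabulary.

    The parameter names 'scheme' and 'format' are hypothetical.
    """
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    # urlencode percent-encodes the vocabulary URI so it can travel as a parameter.
    query = urlencode({"scheme": vocabulary_uri, "format": fmt})
    return f"{base_url}?{query}"


# Hypothetical endpoint; the vocabulary URI is the one used in the element example on this page.
url = build_vocabulary_query(
    "http://example.org/api/autocomplete",
    "http://openskos.org/api/institutions",
)
```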

Eventually a CLAVAS-specific (as opposed to openskos.org) service will become available.

Management interface and editor

A concept scheme editor is available but not publicly accessible.

Available data

There are working examples that suffice for use during development, e.g. the languages vocabulary. (URL to be added)

A vocabulary of institutes will become available shortly.

Specifying vocabularies in CMDI components

Each concept within a vocabulary is identified by an OpenSKOS-specific URI and optionally has a reference to a 'source' URI (e.g. ISOCat). For CLAVAS-enabled fields in CMDI metadata instances we want to store one of these URIs as an attribute, e.g.:

<language clavas:id="http://cdb.iso.org/lg/CDB-00138580-001">Dutch</language>

Each vocabulary item has a CLAVAS URI and optionally a 'source URI' (which will generally come from the primary data source, e.g. ISOCat), as in the example above. Arbil will apply a deterministic fallback: it stores the source URI if one is available, otherwise the CLAVAS URI.
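The choice between the two URIs can be sketched as follows. This is a minimal illustration, not Arbil code; the function name and the example CLAVAS URI are made up, and treating an empty or blank source URI as "not available" is an assumption.

```python
from typing import Optional


def choose_concept_uri(clavas_uri: str, source_uri: Optional[str] = None) -> str:
    """Return the source URI if one is available, otherwise the CLAVAS URI."""
    # Assumption: an empty or whitespace-only source URI counts as missing.
    if source_uri and source_uri.strip():
        return source_uri.strip()
    return clavas_uri


# Hypothetical CLAVAS URI; the source URI is the one from the example on this page.
with_source = choose_concept_uri(
    "http://example.org/clavas/concept/1",
    "http://cdb.iso.org/lg/CDB-00138580-001",
)
without_source = choose_concept_uri("http://example.org/clavas/concept/1")
```

Because the rule depends only on the two URIs, the same concept always yields the same stored identifier, which is what makes the fallback deterministic.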

The value that serves as the text content should come from one of the child elements of the concept definition. Typically this will be the preferred label as specified in the vocabulary item returned from the API, but it could also come from another element (e.g. a notation element that holds a language code). Which element to use should be part of the component specification.
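As a rough sketch of this selection, assume a vocabulary item has already been parsed into a dictionary keyed by element name. That representation, the function name, and the sample item are assumptions made for illustration; the actual API response may be structured differently.

```python
def text_value(concept: dict, value_element: str = "prefLabel") -> str:
    """Pick the text content for a CMDI element from a parsed vocabulary item."""
    value = concept.get(value_element)
    # Assumption: an element may occur more than once; take the first occurrence.
    if isinstance(value, list):
        value = value[0] if value else None
    if value is None:
        raise KeyError(f"concept has no {value_element!r} element")
    return str(value)


# Illustrative item: a preferred label plus a notation holding a language code.
dutch = {"prefLabel": "Dutch", "notation": "nld"}

label = text_value(dutch)             # preferred label, the typical case
code = text_value(dutch, "notation")  # alternative element named in the specification
```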

Vocabularies can be designated as either closed or open. Arbil will present them accordingly but will always allow arbitrary input (as it does now). We have to accept that values cannot always be validated against closed vocabularies coming from an external service (in contrast to the closed controlled vocabularies defined in the schema, which we already have). Applications like Arbil and harvesters could add validation steps on top of schema validation to check the values of elements that have a vocabulary reference. Schematron could possibly be used to implement this.
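Such an additional validation step might look like the sketch below. It is one possible interpretation, not a specified behaviour: the three outcomes, and in particular reporting a value as "unverified" when the external vocabulary could not be retrieved, are assumptions, as are all names used.

```python
from enum import Enum
from typing import Iterable, Optional


class Check(Enum):
    VALID = "valid"
    INVALID = "invalid"
    UNVERIFIED = "unverified"  # vocabulary unavailable, so no verdict is possible


def check_value(value: str, vocabulary_type: str, items: Optional[Iterable[str]]) -> Check:
    """Check an element value against its referenced vocabulary.

    `items` holds the allowed values, or None if the external service
    could not be reached.
    """
    if vocabulary_type not in ("open", "closed"):
        raise ValueError(f"unknown vocabulary type: {vocabulary_type!r}")
    if vocabulary_type == "open":
        # Open vocabularies accept custom values, so any value passes.
        return Check.VALID
    if items is None:
        return Check.UNVERIFIED
    return Check.VALID if value in set(items) else Check.INVALID


languages = ["Dutch", "English"]
in_closed = check_value("Dutch", "closed", languages)
not_in_closed = check_value("Frisian", "closed", languages)
custom_in_open = check_value("Frisian", "open", languages)
service_down = check_value("Dutch", "closed", None)
```

Keeping "unverified" separate from "invalid" lets an editor warn the user without rejecting a record merely because the service was unreachable.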

Taking these things into account, an element specification in a CMDI component or profile could look something like:

<CMD_Element 
    name="Institution"
    CardinalityMax="1" 
    CardinalityMin="1" 
    clavas:vocabulary="http://openskos.org/api/institutions"
    clavas:type="open"
    clavas:valueElement="prefLabel"
/>

The "clavas" attributes could go straight into the generated profile XSD, much like the "datcat" attributes, and be read from the schema by client applications.
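To illustrate how a client might read these attributes, the sketch below parses the element specification shown above rather than a generated XSD. The namespace URI bound to the `clavas` prefix is a placeholder, since no namespace has been defined on this page, and the defaults of `open` and `prefLabel` are assumptions.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace; the real URI for the "clavas" prefix is not yet defined.
CLAVAS_NS = "http://example.org/clavas"

SPEC = """<CMD_Element xmlns:clavas="http://example.org/clavas"
    name="Institution"
    CardinalityMax="1"
    CardinalityMin="1"
    clavas:vocabulary="http://openskos.org/api/institutions"
    clavas:type="open"
    clavas:valueElement="prefLabel"
/>"""


def read_vocabulary_reference(element: ET.Element):
    """Return the vocabulary reference of an element, or None if it has none."""

    def attr(name, default=None):
        # ElementTree addresses namespaced attributes as "{namespace}localname".
        return element.get(f"{{{CLAVAS_NS}}}{name}", default)

    vocabulary = attr("vocabulary")
    if vocabulary is None:
        return None
    return {
        "vocabulary": vocabulary,
        "type": attr("type", "open"),                      # assumed default
        "valueElement": attr("valueElement", "prefLabel"),  # assumed default
    }


reference = read_vocabulary_reference(ET.fromstring(SPEC))
```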

CLAVAS vocabulary sources

Vocabularies from ISOCat will be provided to OpenSKOS (the exact details of how this will be done still have to be worked out). Arbil and ComponentRegistry will query these vocabularies through OpenSKOS.

Open issues

  • PIDs for vocabularies (yes please) and vocabulary items (maybe... possibly use part identifiers)
  • ...

Roadmap

Implementation can be divided into at least two phases:

Phase 1

Target milestone: late December 2012

  • Adapt the CMDI schema to allow vocabularies to be referenced from elements. In this phase, support only open vocabularies and values coming from prefLabel
  • Create a CMDI profile that references vocabularies
  • Implement a client in Arbil that consumes the OpenSKOS API for specific vocabularies (initially we skip caching, i.e. vocabularies are available online only)
  • Create UI component(s) in Arbil that make use of these calls to present the vocabulary items to the user
  • Extend reading of profile specification and reading/writing of CMDI instances to support vocabulary references

Phase 2

Target milestone: ?

  • Allow closed vocabularies
  • Vocabulary caching in Arbil
  • ...to be further specified...
