CLAVAS support has been proposed for inclusion in CMDI 1.2; see CMDI 1.2/Vocabularies for more information.
Integration of CLAVAS into CMDI
Introduction
CLAVAS is a CLARIN-NL project that provides a vocabulary service. It is hosted by the Meertens Institute; the contact person is Hennie Brugman. The vocabularies will be available through a set of web services hosted at Meertens. CLAVAS is based on OpenSKOS, and each vocabulary maps to a 'concept scheme' in OpenSKOS. More information is available in the CLAVAS work plan document.
This page covers integration of CLAVAS with CMDI. The idea, roughly speaking, is that metadata modelers can associate a vocabulary (identified by its URI) with an element in their components and profiles. The metadata creator will then be able to pick values from the specified vocabulary or (in the case of open vocabularies) still choose to use a custom value that does not appear in the vocabulary. Editors like Arbil need to be extended to access the CLAVAS public API for retrieving potential values from vocabularies.
Overview
Status of CLAVAS
Public API
A public API that allows one to search for collections and concepts within a scheme (vocabulary) is available.
The autocomplete API can be used to get all items in an identified vocabulary or a subset in JSON or RDF by providing the URI of a vocabulary as a parameter.
Eventually a CLAVAS-specific (as opposed to openskos.org) service will become available.
Management interface and editor
A concept scheme editor is available but not publicly accessible.
Available data
There are working examples that suffice for use during development, e.g. the languages vocabulary. (URL!)
A vocabulary of institutes will become available shortly.
Specifying vocabularies in CMDI specifications, schemata and instances
Specifying vocabularies in CMDI instances
Open vocabularies
Each concept within a vocabulary is identified by an OpenSKOS specific URI and optionally has a reference to a 'source' URI (e.g. ISOCat). For fields that link to vocabularies as open vocabularies, we want to store one of these URIs as an attribute in CMDI metadata instances, e.g.:
<language cmd:ValueConceptLink="http://cdb.iso.org/lg/CDB-00138580-001">Dutch</language>
Notice the cmd-prefix to disambiguate and prevent clashes with potential custom attributes of this name.
Each item in the vocabulary has an OpenSKOS URI and optionally a 'source URI' (which generally will come from the primary data source, e.g. ISOCat), as in the example above. There will be a deterministic fall-back mechanism in Arbil that chooses the source URI if available, otherwise the CLAVAS URI.
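The fall-back rule described above can be sketched in a few lines of Python. This is only an illustration of the rule, not Arbil's actual code; the function name and the CLAVAS URI used in the example are hypothetical.

```python
from typing import Optional


def choose_concept_uri(clavas_uri: str, source_uri: Optional[str] = None) -> str:
    """Return the source URI when it is present and non-empty, else the CLAVAS URI."""
    if source_uri and source_uri.strip():
        return source_uri.strip()
    return clavas_uri


# Hypothetical concept: the CLAVAS URI below is made up for illustration.
with_source = choose_concept_uri(
    "http://openskos.example/languages/nld",
    "http://cdb.iso.org/lg/CDB-00138580-001",
)
without_source = choose_concept_uri("http://openskos.example/languages/nld")
```

Because the choice depends only on the two URIs of the concept, the same concept always yields the same stored value, which is what makes the mechanism deterministic.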
The value that serves as the text content should come from one of the child elements of the concept definition. Typically this will be the preferred label as specified in the vocabulary item returned from the API but it could also come from another element (e.g. to choose between item full name and item code). Which path to use should be determined in the component specification (possibly part of the vocabulary URI).
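As a rough sketch of how an editor might pick the text content, the Python function below looks up a configurable property of a concept and uses the preferred label otherwise. The dictionary layout, the "notation" key and the fall-back to prefLabel for a missing property are assumptions made for illustration; the text above does not fix any of them.

```python
def text_content(concept: dict, value_property: str = "prefLabel") -> str:
    """Return the concept property to use as element text.

    Falls back to the preferred label when the requested property is
    missing or empty (an assumed behaviour, not specified above).
    """
    value = concept.get(value_property)
    if value:
        return value
    return concept["prefLabel"]


# Hypothetical concept with a full name and a code.
dutch = {"prefLabel": "Dutch", "notation": "nld"}
full_name = text_content(dutch)           # uses the default property
code = text_content(dutch, "notation")    # uses the code instead
```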
Closed vocabularies
Closed vocabularies will be no different from standard CMDI closed vocabularies on the instance level. All required information will be available from the schema due to the vocabulary import (see below).
Specifying vocabularies in CMDI component specifications
Taking the above into account, an element specification in a CMDI component or profile could look something like:
Open vocabularies
Specified using attributes on CMD_Element
- ValueScheme has to be string. For this we add a schematron rule to the general component schema.
- Vocabulary URI (can be URL or URN) gets specified in Vocabulary attribute
- Value property field optionally specified (default=prefLabel), either as a parameter on the URI or as a separate attribute VocabValueProperty (second example). TODO: DECIDE
- Use case is language vocabulary that provides versions of ISO-639 per item
- It would be nice if we could pass the 'label' selection on to the API so that a pre-selection can happen server side, returning it in a specially marked element or attribute (making the processing of the response more uniform)
Example:
<CMD_Element name="Institution" CardinalityMax="1" CardinalityMin="1" ValueScheme="string" Vocabulary="http://openskos.org/institutions?label=name" />
OR
<CMD_Element name="Institution" CardinalityMax="1" CardinalityMin="1" ValueScheme="string" Vocabulary="http://openskos.org/institutions" VocabValueProperty="name" />
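Whichever notation is chosen, a client can normalise both to the same pair of vocabulary URI and value property. The Python sketch below shows one possible way to do this; the function name is hypothetical, and dropping the entire query string from the URI is a simplification that assumes "label" is the only parameter.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlsplit, urlunsplit

DEFAULT_PROPERTY = "prefLabel"  # default named in the list above


def resolve_vocabulary(element_xml: str):
    """Return (vocabulary_uri, value_property) for a CMD_Element snippet."""
    element = ET.fromstring(element_xml)
    vocabulary = element.get("Vocabulary")
    if vocabulary is None:
        return None, None
    parts = urlsplit(vocabulary)
    label_values = parse_qs(parts.query).get("label", [])
    # Simplification: drop the whole query so both notations give the same URI.
    base_uri = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    explicit = element.get("VocabValueProperty")
    if explicit:
        return base_uri, explicit
    return base_uri, (label_values[0] if label_values else DEFAULT_PROPERTY)


uri_style = resolve_vocabulary(
    '<CMD_Element name="Institution" CardinalityMax="1" CardinalityMin="1" '
    'ValueScheme="string" Vocabulary="http://openskos.org/institutions?label=name" />'
)
attr_style = resolve_vocabulary(
    '<CMD_Element name="Institution" CardinalityMax="1" CardinalityMin="1" '
    'ValueScheme="string" Vocabulary="http://openskos.org/institutions" '
    'VocabValueProperty="name" />'
)
```

Both calls yield the same result, which suggests the decision between the two notations is mostly a matter of readability and of whether the vocabulary URI should stay usable as a plain identifier.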
Closed vocabularies
Closed vocabularies will be 'imported' into the component at design time, resulting in an internalized 'snapshot copy' of the vocabulary at the time of creation. The ComponentRegistry will be extended with functionality to allow this. The vocabulary URI will be stored in the component specification and transferred to the schema so that editors can query the API for additional information; this is optional, as all information, including the item URIs, will be available from the schema.
We will add the vocabulary URI as an attribute to the element and re-use the existing ConceptLink attribute on the enumeration items to store the identifier of individual vocabulary items.
<CMD_Element name="Language" CardinalityMax="1" CardinalityMin="1" Vocabulary="http://openskos.org/api/languages?label=iso-639-3">
  <ValueScheme>
    <enumeration>
      <item ConceptLink="http://cdb.iso.org/lg/CDB-00138580-001">Dutch</item>
      <item ConceptLink="http://cdb.iso.org/lg/CDB-00138512-001">French</item>
    </enumeration>
  </ValueScheme>
</CMD_Element>
Text content comes from the selected label. ConceptLink has the URI for each item in the vocabulary. There is probably no need for AppInfo (separate display label). Note that there is currently no way to represent multilingual vocabularies, so the language will have to be specified in the vocabulary URI, with a fallback to the default language of the vocabulary.
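The import step could look roughly like the following Python sketch, which turns a list of (item URI, label) pairs into a CMD_Element with an internalized enumeration. It is not ComponentRegistry code; the function name and the fixed cardinalities are assumptions chosen to mirror the example above.

```python
import xml.etree.ElementTree as ET


def build_closed_element(name, vocabulary_uri, items):
    """Build a CMD_Element whose enumeration is a snapshot of `items`.

    `items` is an iterable of (concept_uri, label) pairs, e.g. as
    retrieved from the vocabulary service at design time.
    """
    element = ET.Element("CMD_Element", {
        "name": name,
        "CardinalityMax": "1",
        "CardinalityMin": "1",
        "Vocabulary": vocabulary_uri,
    })
    value_scheme = ET.SubElement(element, "ValueScheme")
    enumeration = ET.SubElement(value_scheme, "enumeration")
    for concept_uri, label in items:
        item = ET.SubElement(enumeration, "item", {"ConceptLink": concept_uri})
        item.text = label
    return element


snapshot = build_closed_element(
    "Language",
    "http://openskos.org/api/languages?label=iso-639-3",
    [
        ("http://cdb.iso.org/lg/CDB-00138580-001", "Dutch"),
        ("http://cdb.iso.org/lg/CDB-00138512-001", "French"),
    ],
)
xml_text = ET.tostring(snapshot, encoding="unicode")
```

Since the snapshot is taken once, later changes to the vocabulary do not affect the component until it is re-imported.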
Specifying vocabularies in CMDI profile XSDs
The values of the vocabulary-related attributes could go straight into the generated profile XSD, much like the "datcat" attributes, and be read from the schema by client applications.
Open vocabularies
Example, assuming the solution with separate attributes for vocabulary id and label specifiers:
<xs:element name="Institute" ann:displaypriority="1" dcr:datcat="http://www.isocat.org/datcat/DC-3785" cmd:Vocabulary="http://openskos.org/institutions" cmd:VocabValueProperty="name">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute ref="xml:lang"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>
Closed vocabularies
Example:
<xs:simpleType name="simpletype-iso-639-3-code-clarin.eu.cr1.c_123456789" cmd:Vocabulary="http://openskos.org/api/languages?label=iso-639-3">
  <xs:restriction base="xs:string">
    <xs:enumeration value="Dutch" dcr:datcat="http://cdb.iso.org/lg/CDB-00138580-001" />
    <xs:enumeration value="French" dcr:datcat="http://cdb.iso.org/lg/CDB-00138512-001" />
  </xs:restriction>
</xs:simpleType>
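A client application could read these attributes back from a profile XSD along the lines of the Python sketch below. The namespace URIs bound to the cmd and dcr prefixes are assumptions (the examples on this page do not declare them), and the sample schema is a shortened, hypothetical stand-in for a generated profile.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
CMD = "http://www.clarin.eu/cmd/"      # assumed namespace for the cmd prefix
DCR = "http://www.isocat.org/ns/dcr"   # assumed namespace for the dcr prefix

# Hypothetical, shortened profile schema for illustration only.
SAMPLE_XSD = f"""
<xs:schema xmlns:xs="{XS}" xmlns:cmd="{CMD}" xmlns:dcr="{DCR}">
  <xs:simpleType name="simpletype-iso-639-3-code"
                 cmd:Vocabulary="http://openskos.org/api/languages?label=iso-639-3">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Dutch" dcr:datcat="http://cdb.iso.org/lg/CDB-00138580-001"/>
      <xs:enumeration value="French" dcr:datcat="http://cdb.iso.org/lg/CDB-00138512-001"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
"""


def read_closed_vocabularies(xsd_text):
    """Map each vocabulary URI to a {value: item URI} dict found in the schema."""
    root = ET.fromstring(xsd_text)
    result = {}
    for simple_type in root.iter(f"{{{XS}}}simpleType"):
        vocabulary = simple_type.get(f"{{{CMD}}}Vocabulary")
        if vocabulary is None:
            continue  # simple type without a linked vocabulary
        result[vocabulary] = {
            enum.get("value"): enum.get(f"{{{DCR}}}datcat")
            for enum in simple_type.iter(f"{{{XS}}}enumeration")
        }
    return result


vocabularies = read_closed_vocabularies(SAMPLE_XSD)
```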
CLAVAS vocabulary sources
Vocabularies from ISOCat will be provided to OpenSKOS (the exact details of how this will be done still have to be worked out). Arbil and the ComponentRegistry will query these vocabularies through OpenSKOS.
Retrieving vocabularies
An example provided by Hennie Brugman:
Pagination of results is supported as in Solr, with 'start' and 'rows' parameters.
And an OAI-PMH variant: https://openskos.meertens.knaw.nl/oai-pmh?verb=ListRecords&set=meertens:VLO-orgs&metadataPrefix=oai_dc
Partial example output of the former request:
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:openskos="http://openskos.org/xmlns#"
         openskos:numFound="2504" openskos:start="0" openskos:maxScore="1.4165695">
  <rdf:Description rdf:about="http://openskos.meertens.knaw.nl/Organisations/c39056d3-bc5f-4c22-9381-3b9b9d7b38ef">
    <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
    <skos:prefLabel xml:lang="en">Jensco Limited</skos:prefLabel>
    <skos:altLabel xml:lang="en">Jensco Ltd.</skos:altLabel>
    <skos:inScheme rdf:resource="http://openskos.meertens.knaw.nl/Organisations"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://openskos.meertens.knaw.nl/Organisations/1e824da3-ef2e-406d-bde5-8492b392172a">
    <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
    <skos:prefLabel xml:lang="en">Metsniereba</skos:prefLabel>
    <skos:inScheme rdf:resource="http://openskos.meertens.knaw.nl/Organisations"/>
  </rdf:Description>
  ...
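A response of this shape can be read with a standard XML parser. The Python sketch below extracts the concept URI and labels from each rdf:Description, reads the openskos:numFound attribute, and derives the 'start' offsets needed to page through all results. The function names and the page size are hypothetical; the sample is a completed (closed) copy of the partial output shown above.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "openskos": "http://openskos.org/xmlns#",
}

SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#"
    xmlns:openskos="http://openskos.org/xmlns#"
    openskos:numFound="2504" openskos:start="0">
  <rdf:Description rdf:about="http://openskos.meertens.knaw.nl/Organisations/c39056d3-bc5f-4c22-9381-3b9b9d7b38ef">
    <skos:prefLabel xml:lang="en">Jensco Limited</skos:prefLabel>
    <skos:altLabel xml:lang="en">Jensco Ltd.</skos:altLabel>
  </rdf:Description>
  <rdf:Description rdf:about="http://openskos.meertens.knaw.nl/Organisations/1e824da3-ef2e-406d-bde5-8492b392172a">
    <skos:prefLabel xml:lang="en">Metsniereba</skos:prefLabel>
  </rdf:Description>
</rdf:RDF>"""


def parse_concepts(rdf_text):
    """Return (total number of hits, list of concept dicts) from a response."""
    root = ET.fromstring(rdf_text)
    about_key = f"{{{NS['rdf']}}}about"
    concepts = []
    for description in root.findall("rdf:Description", NS):
        pref = description.find("skos:prefLabel", NS)
        concepts.append({
            "uri": description.get(about_key),
            "prefLabel": pref.text if pref is not None else None,
            "altLabels": [alt.text for alt in description.findall("skos:altLabel", NS)],
        })
    total = int(root.get(f"{{{NS['openskos']}}}numFound", "0"))
    return total, concepts


def page_starts(total, rows):
    """Values for the 'start' parameter needed to fetch `total` hits, `rows` at a time."""
    return list(range(0, total, rows))


total, concepts = parse_concepts(SAMPLE)
starts = page_starts(total, 1000)  # page size of 1000 is an arbitrary choice
```

An editor would typically keep only the URI and the chosen label per concept and request further pages lazily, since a vocabulary such as this one contains thousands of items.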
The search can be fine-tuned further by adding query parameters, for example (also provided by Hennie):
Related tickets
Attachments (1)
- CLAVAS-Workplan-1.1.docx (224.3 KB): CLAVAS workplan version 1.1