wiki:CMDI 1.2/SpecificationDraft1

NOTE: This page is a copy of a deprecated version of the CMDI 1.2 specification draft (revision 13) and is provided for reference only.

For up-to-date information, please consult the current version and do not make any changes to this version.

Component Metadata Infrastructure (CMDI) 1.2 [DRAFT]

Introduction

The goal of the Component Metadata Infrastructure (CMDI) specification...

Terminology

The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in RFC2119.

Glossary

CLARIN-FCS, FCS
CLARIN federated content search, an interface specification to allow searching within resource content of repositories.
PID
A persistent identifier is a long-lasting reference to a digital object.
attribute
synonym of XML attribute
bundle
collection in which the resources are tied together, have the same origin and are distributed together

a media file and its annotation created and distributed by the same person

CCSL, CMDI Component Specification Language
XML based language for describing components according to the CMDI model
CMDI, Component Metadata Infrastructure
Metadata description framework consisting of the CMDI model and infrastructure
collection
set of resources described by common metadata and distributed as a unit, i.e. referenced by a single persistent identifier
component
list of metadata elements and other components, in which every data element corresponds to a metadata category. Together they describe an aspect of a resource, e.g. the name, language or other metadata properties of an LRT
data category, datcat

result of the specification of a given data field

  1. a type of data field, such as /definition/.
  2. ISO 12620:2009 provides for the creation of an inventory of data categories.
data category registry
set of data categories to be used as a reference for the definition of linguistic annotation schemes or any other formats in the domain of language resources
DCR, Data Category Registry, ISO TC37 Data Category Registry
Data category registry used for ISO Technical Committee 37.

The DCR is available at http://www.isocat.org.

DCS, Data Category Selection
set of data categories selected from the DCR
data category selection
set of attributes used to fully describe a given data element concept
data stream
constituent of a digital object

individual files in a digital object

digital object, DO
resource in a repository, stored in one repository container, that can be addressed by an identifier. A digital object can be seen as a generalization of a directory in a file system containing one or more files, which are the data stream(s). Digital objects can also exist in databases, hence the comparison to directory and file structures falls short.
element
synonym of XML element
information unit, IU
elementary piece of information attached to a level of the metamodel
mimetype
type of a file as defined by IETF RFC 2045 and IETF RFC 2046 and registered with IANA
namespace
synonym of XML namespace
persistent identifier, PID
unique Uniform Resource Identifier (URI) that assures permanent access for a digital object by providing access to it independently of its physical location or current ownership
profile
component that can be translated into a schema for metadata for a specific type of resource
proxy
placeholder for external data; a proxy provides a standard way of addressing otherwise unreachable resources
registry
central directory designed for the persistent provision of negotiated information that can reliably be accessed
repository
computer system for long-term storage of resources
resource
entity that can be referenced by a URI
resource type
classification of a resource
UML, Unified Modelling Language
language for specifying, visualizing, constructing and documenting the artifacts of software systems
URI, Uniform Resource Identifier
identifier for locating resources on the internet
value
property of an attribute
virtual collection
collection in which the individual resources are loosely combined in a registry and do not necessarily exist in one digital object.

Normative References

RFC2119
Key words for use in RFCs to Indicate Requirement Levels, IETF RFC 2119, March 1997,
http://www.ietf.org/rfc/rfc2119.txt
XML-Namespaces
Namespaces in XML 1.0 (Third Edition), W3C, 8 December 2009,
http://www.w3.org/TR/2009/REC-xml-names-20091208/

Non-Normative References

RFC3023
XML Media Types, IETF RFC 3023, January 2001,
http://www.ietf.org/rfc/rfc3023.txt

Typographic and XML Namespace conventions

The following typographic conventions for XML fragments will be used throughout this specification:

  • <prefix:Element>
    An XML element with the Generic Identifier Element that is bound to an XML namespace denoted by the prefix prefix.
  • @attr
    An XML attribute with the name attr
  • string
    The literal string must be used either as element content or attribute value.

The following XML namespace names and prefixes are used throughout this specification. The column "Recommended Syntax" indicates which syntax variant SHOULD be used by the Endpoint to serialize the XML response.

Prefix | Namespace Name       | Comment       | Recommended Syntax
cmd    | http://clarin.eu/cmd | CMDI instance | prefixed

CMDI Component Metadata Model

CMDI Component and Profile Specification Level

Structure of CMDI-files

Annex B contains an example of a CMDI file, showing the overall structure of a metadata instance serialization. This structure is described here, and a metadata instance that is compliant with this standard must follow it. Structurally, a CMDI instance consists of three sections: the header, the resources and the components. The header and the resources section are statically defined and remain constant when an evaluative metadata schema is generated. They are described here because they are required for creating the schema from the component specification.
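The three-section layout can be checked mechanically. The following is a minimal Python sketch using only the standard library. It assumes the namespace URI of the Annex B example (`http://www.clarin.eu/cmd/`), which differs from the one in the namespace table, so the namespace is a parameter. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the CMD namespace as used in the Annex B example instance.
# The namespace table lists http://clarin.eu/cmd instead, so callers may override it.
CMD_NS = "http://www.clarin.eu/cmd/"

# The three top-level sections, in the order described above.
EXPECTED_SECTIONS = ["Header", "Resources", "Components"]


def section_names(xml_text: str, ns: str = CMD_NS) -> list[str]:
    """Return the local names of the root's children, requiring namespace ns."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{ns}}}CMD":
        raise ValueError(f"unexpected root element {root.tag!r}")
    names = []
    for child in root:
        if not child.tag.startswith(f"{{{ns}}}"):
            raise ValueError(f"child {child.tag!r} is not in namespace {ns}")
        # Strip the "{ns}" prefix that ElementTree puts in front of the local name.
        names.append(child.tag[len(ns) + 2:])
    return names


def has_cmdi_structure(xml_text: str, ns: str = CMD_NS) -> bool:
    """True if the root holds exactly Header, Resources and Components, in that order."""
    return section_names(xml_text, ns) == EXPECTED_SECTIONS
```

A real validator would instead use the schema generated from the profile. This check only covers the fixed outer layout.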

The header

The Header element is a container element intended to provide information on the metadata file as such, not on the resource described by the metadata file. To make this explicit and human readable, the names of the data categories contained in the header are prefixed with Md for Metadata. The following elements are part of the header; unless marked as mandatory, they are optional:

  • MdCreator (optional): Name of the person who created the metadata file. This is defined as a string.
  • MdCreationDate (optional): Date of the creation of the metadata file. This is defined as the data type date, i.e. the date is specified in the form yyyy-mm-dd (four digits for the year, followed by a dash, followed by two digits for the month, followed by a dash, followed by two digits for the day of the month).

  • MdSelfLink (optional): Persistent identifier for the metadata file (see PISA) in the form of a URI
  • MdProfile (mandatory): Persistent identifier of the profile used to create this metadata file. This information is partially implied by the value of the xsi:schemaLocation attribute of the root element, but the profile identifier may refer to the complete description of the profile, such as the CCSL.
  • MdCollectionDisplayName (mandatory): The name of a collection as it is supposed to be displayed by an application. This element is needed because metadata is often shared and institutions display the names of their collections in applications.
  • MdRevisionGrp (optional): Group for storing metadata revisions, if any, with at least one and possibly many MdRevision child elements. Each MdRevision contains the name of the editor (element by, with string content), the date of editing (element date, of type xs:date) and a verbose note on the revision (element note, of type string).
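The header rules above can be sketched as a small checker. This Python version reports missing mandatory elements and enforces the strict yyyy-mm-dd shape of MdCreationDate. It assumes the Annex B namespace and uses a hypothetical function name. It is an illustration, not a replacement for schema validation.

```python
import datetime
import re
import xml.etree.ElementTree as ET

# Assumption: the namespace of the Annex B example instance.
NS = "{http://www.clarin.eu/cmd/}"

# Header children marked as mandatory in the list above.
MANDATORY = ("MdProfile", "MdCollectionDisplayName")


def header_problems(header: ET.Element) -> list[str]:
    """Collect human-readable problems found in a CMDI Header element."""
    problems = []
    for name in MANDATORY:
        el = header.find(NS + name)
        if el is None or not (el.text or "").strip():
            problems.append(f"missing mandatory {name}")
    created = header.find(NS + "MdCreationDate")
    if created is not None:
        text = (created.text or "").strip()
        # Strict yyyy-mm-dd shape first, then a calendar check (rejects 2011-02-30).
        try:
            ok = re.fullmatch(r"\d{4}-\d{2}-\d{2}", text) is not None
            if ok:
                datetime.date.fromisoformat(text)
        except ValueError:
            ok = False
        if not ok:
            problems.append("MdCreationDate is not a yyyy-mm-dd date")
    return problems
```

The regular expression is applied before `fromisoformat` because newer Python versions also accept other ISO 8601 forms, such as `20110331`, that do not match the form described above.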

It is recommended to fill in all possible fields here. These fields structure the data and make information available, providing some background for the users of the metadata.

Potential problems, intentionally left open, concern how to deal with changed metadata files: should MdCreator and MdCreationDate be adjusted? If so, how persistent is the MdSelfLink? As the metadata is created during the archiving stage of a resource, potential updates are currently not dealt with.

The resources section

The resources section of a metadata file lists all information relevant for the individual resource, but does not describe the resource as such; the description is part of the components. The resources section provides:

  • the location of the resource, or of its parts if it consists of more than one;
  • provenance information on the resource;
  • information on the relations between the parts of the resource, if applicable;
  • information on a larger body the resource is part of, also if applicable.

The Resource proxy list

The resource proxy list defines placeholders internal to the metadata file, called proxies, for each part of a resource. For example, if a resource consists of one specific file, this file is referenced in the ResourceRef element, which holds the PID of this file in the form of a URI. As resources can be composed of other resources, which are identified by their metadata, the ResourceType element specifies whether the PID refers to metadata (another metadata file) or to a resource such as a binary file or data. To further specify the type, ResourceType takes mimetype as an attribute, whose value specifies the mimetype of the referenced resource. Providing the mimetype is optional. Resources can consist of more than one data stream or file, hence the ResourceProxyList may contain more than one ResourceProxy. To be able to refer to each of these parts individually, each ResourceProxy receives an id attribute for internal reference within the metadata file.
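A minimal Python sketch of reading a ResourceProxyList into a lookup table keyed by proxy id. Because the ids serve internal references, duplicates are rejected. The namespace is the Annex B one. The `Proxy` class, the `read_proxies` name and "Metadata" as the second ResourceType value are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

NS = "{http://www.clarin.eu/cmd/}"  # assumption: namespace of the Annex B example


@dataclass
class Proxy:
    kind: str                # text of ResourceType, e.g. "Resource" or (assumed) "Metadata"
    mimetype: Optional[str]  # optional mimetype attribute of ResourceType
    ref: str                 # URI, ideally a PID, taken from ResourceRef


def read_proxies(proxy_list: ET.Element) -> dict[str, Proxy]:
    """Map each ResourceProxy id to its type, mimetype and reference."""
    proxies: dict[str, Proxy] = {}
    for p in proxy_list.findall(NS + "ResourceProxy"):
        pid = p.get("id")
        if pid is None or pid in proxies:
            raise ValueError(f"missing or duplicate proxy id: {pid!r}")
        rtype = p.find(NS + "ResourceType")
        ref = p.find(NS + "ResourceRef")
        if rtype is None or ref is None:
            raise ValueError(f"proxy {pid!r} lacks ResourceType or ResourceRef")
        proxies[pid] = Proxy((rtype.text or "").strip(),
                             rtype.get("mimetype"),
                             (ref.text or "").strip())
    return proxies
```

Keying by id lets other parts of the resources section, such as resource relations, resolve their `ref` pointers with a single dictionary lookup.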

The Journal File Proxy List

For many resources that are developed over a longer period of time, changes and updates are frequent. Provenance data is not part of the CMDI model, but it can be stored outside the metadata file in a suitable form. Provenance metadata is referred to as JournalFile in CMDI documents. The JournalFileProxyList contains the list of all JournalFiles for a resource. Each JournalFileRef holds the URI referencing a JournalFile that contains the provenance data.

The Resource Relation List

If a resource consists of more than one file, the resource files do not exist independently of each other; for example, audio files and their transcriptions are related to each other. The ResourceProxyList only lists these files; the ResourceRelationList makes the relations between pairs of files explicit. For this purpose, each ResourceRelation contains a triple of elements defining a directed relation. It runs from a first resource, the Source, to a second resource, the Target, each referenced by a ref pointer to the id of a ResourceProxy. The relation between the two is given as a string in the RelationType element, whose values are relations defined in a data category registry. The identifier of the relation type is given in the dcr:datcat attribute.
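Because Source and Target point to ResourceProxy ids, an obvious consistency check is that every ref resolves. The following Python sketch takes the Resources element and returns the unresolved endpoints. The function name is hypothetical and the namespace is assumed to be the Annex B one.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.clarin.eu/cmd/}"  # assumption: namespace of the Annex B example


def dangling_relation_refs(resources: ET.Element) -> list[str]:
    """Return Source/Target refs in any ResourceRelation that match no ResourceProxy id."""
    ids = {p.get("id") for p in resources.iter(NS + "ResourceProxy")}
    dangling = []
    for rel in resources.iter(NS + "ResourceRelation"):
        # The relation is directed: Source first, then Target.
        for end in ("Source", "Target"):
            el = rel.find(NS + end)
            ref = None if el is None else el.get("ref")
            if ref not in ids:
                dangling.append(f"{end}={ref}")
    return dangling
```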

The Is-Part-of List

Resources that are defined in bundles are listed under ResourceProxy. The individual parts can be seen as independent resources as well, such as a subcorpus that can also be distributed on its own. A resource may be part of, or may have been created as part of, a larger unit. To indicate this, the IsPartOfList refers to one or more larger units by giving the PID of each larger unit in an IsPartOf element.

Potential problem: it is (perhaps intentionally) unclear what the PID points to: the resource (e.g. a landing page) or the metadata (e.g. a CMDI file in a repository).

The components

The components are the content section of the CMDI files to be processed by users. The structure of the components varies according to the intended use. In general, the components list data categories from a data category registry in order. They also provide the cardinality of these data categories and possibly a controlled vocabulary. Components vary widely, hence a general mechanism for describing them is more adequate than providing individual examples. This general mechanism is the CMDI Component Specification Language (CCSL). For the component metadata infrastructure, the header and the components are described separately; in practice it is possible to keep them separate until the concrete schema is generated. The instances contain the header section and the component part. The specification language used for describing the components is described in the following section.

The CMDI Component Specification Language

The CMDI Component Specification Language (CCSL) is designed to describe the variable, component-specific part of the CMDI schema. In a CCSL file the metadata elements are defined and grouped, and other components are referenced. Figure 1 shows the relation of the individual elements of the CCSL.

Figure 1: Schematic architecture of the CMDI Component Specification Language

Instances of the component specification language contain two parts, namely a header section and the component description.

CCSL header

The CCSL header provides simple data warehousing information on the component description. This consists of an identifier for the component description, which must be unique and should be persistent (see also ISO 24619:2011), a name for the component, and a prose description of the component.

Component definition

Components are defined as a sequence of elements, which can be followed by other components, as components can be embedded in other components. Additionally, components can take any number of attributes. These attributes and their possible values are also specified in the component description.

Element definition

Elements are the part of metadata instances containing the content, i.e. the field descriptors. When elements are introduced, their content model is also specified as a value scheme. This can be a simple XML data type, a specific pattern or a closed vocabulary.
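The two non-trivial value schemes, a pattern or a closed vocabulary, can be illustrated with a small Python check. The "exactly one of" rule mirrors the `xs:choice` in `ValueScheme_type` of the normative schema. Python's `re` syntax only approximates XSD regular expressions (XSD has, for example, `\i` and `\c` classes), so this is a sketch, not a conformant pattern engine. The function name is hypothetical.

```python
import re
from typing import Optional, Sequence


def value_allowed(value: str, pattern: Optional[str] = None,
                  vocabulary: Optional[Sequence[str]] = None) -> bool:
    """Check a value against a pattern or a closed vocabulary (exactly one is expected).

    Note: Python regex syntax only approximates XSD pattern syntax.
    """
    if (pattern is None) == (vocabulary is None):
        raise ValueError("give exactly one of pattern or vocabulary")
    if pattern is not None:
        # XSD patterns must match the whole value, hence fullmatch rather than search.
        return re.fullmatch(pattern, value) is not None
    # Closed vocabulary: exact, case-sensitive membership.
    return value in vocabulary
```

For instance, `value_allowed("Corpus", vocabulary=["Lexicon", "Corpus", "Tool"])` accepts the value, while `"corpus"` is rejected because vocabulary items are compared case-sensitively.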

Cardinality of elements and components

For practical reasons, the cardinality of components and elements is specified according to the needs of the metadata instance. Both elements and components can be specified as occurring a specific number of times. It is possible to provide a lower and an upper bound for each, though the upper bound must be greater than or equal to the lower bound. The cardinality can be any non-negative integer or unbounded.
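A Python sketch of parsing and checking a CardinalityMin/CardinalityMax pair as described above, with `None` standing for unbounded. Rejecting an unbounded lower bound is an assumption of this sketch: the normative schema's `cardinality_type` does not forbid it, but `minOccurs="unbounded"` has no meaning in XML Schema. The function names are hypothetical.

```python
from typing import Optional, Union

UNBOUNDED = "unbounded"
Bound = Union[int, str]  # a non-negative int or the literal "unbounded"


def parse_bound(text: str) -> Bound:
    """Parse a single CardinalityMin/CardinalityMax value."""
    if text == UNBOUNDED:
        return UNBOUNDED
    n = int(text)
    if n < 0:
        raise ValueError(f"negative cardinality: {text}")
    return n


def check_cardinality(min_text: str, max_text: str) -> tuple[int, Optional[int]]:
    """Return (min, max) with max=None for unbounded, enforcing max >= min."""
    lo, hi = parse_bound(min_text), parse_bound(max_text)
    if lo == UNBOUNDED:
        # Assumption: an unbounded lower bound is treated as an error.
        raise ValueError("lower bound cannot be unbounded")
    if hi != UNBOUNDED and hi < lo:
        raise ValueError(f"upper bound {hi} is smaller than lower bound {lo}")
    return lo, (None if hi == UNBOUNDED else hi)
```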

Describing multilingual content

To describe multilingual content, elements are specified with a boolean attribute for multilinguality (Multilingual). For elements that are specified as multilingual, conformant applications must adjust the cardinality so that such an element can be used in many languages (i.e. the upper bound of the cardinality is unbounded). They must also allow the language of the element content to be specified by an appropriate attribute (i.e. xml:lang).
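The adjustment required for multilingual elements can be expressed as a pure function over a simplified element description. The `ElementSpec` record below is a hypothetical, reduced model of a CCSL element, not a structure defined by this specification. An upper bound of `None` means unbounded.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ElementSpec:
    """Hypothetical, simplified view of a CCSL element for this sketch."""
    name: str
    card_min: int
    card_max: Optional[int]       # None stands for "unbounded"
    multilingual: bool = False
    lang_attribute: bool = False  # whether xml:lang is allowed in instances


def apply_multilinguality(spec: ElementSpec) -> ElementSpec:
    """Make a multilingual element unbounded and allow xml:lang; leave others unchanged."""
    if not spec.multilingual:
        return spec
    return replace(spec, card_max=None, lang_attribute=True)
```

In the General Information example, the element Version is declared with `Multilingual="true"` and `CardinalityMax="1"`. Under this reading, the generated schema would nevertheless allow it to repeat, once per language.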

Attributes for elements and components

Besides the cardinality, the specifications of components and elements share the name and concept link attributes. The name attribute is required and specifies the name of the element in the instance. The concept link should be used to provide an external definition of the concept behind the element or component.

For elements where a concept link cannot be provided, documentation may be given in prose in another attribute of the element (Documentation). It is, however, preferred to provide a concept link referencing a data category registry as defined in ISO 12620:2009.

For implementation purposes there is an optional attribute SupersetLabel. When set, it indicates that an enabled application should use the content of this element to identify a superset of elements. The value of this attribute is a numeric value used as a rank. An enabled application uses the rank only when multiple such indicators are set, to determine which one takes priority. The highest priority is given to the element with rank 1; if the same rank is used multiple times, the first element in document order receives the higher priority.
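The priority rule can be sketched in Python as follows. The sketch assumes that a lower rank number always means higher priority (rank 1 beats rank 2, and so on), which extends the stated "rank 1 is highest" rule. The function name and the (name, rank) input shape are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def label_element(candidates: Sequence[Tuple[str, Optional[int]]]) -> Optional[str]:
    """Pick the element whose SupersetLabel takes priority.

    candidates: (element name, SupersetLabel rank or None) in document order.
    Assumption: lower rank numbers mean higher priority; ties go to the earlier element.
    """
    ranked = [(rank, pos, name)
              for pos, (name, rank) in enumerate(candidates)
              if rank is not None]
    if not ranked:
        return None
    # Tuples compare by rank first, then by document position.
    return min(ranked)[2]
```

Including the document position in the tuple implements the tie-break without a custom comparator.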

For components, the component ID is provided as an attribute (ComponentId). This is required when a component is used that is not specified internally but only referenced by this identifier. When a component specification includes another component specification internally, the component identifier is optional.

Transformation of CCSL into a schema

An application conforming to this standard must process the component specification language together with the static portions of the header and resources sections. It must provide an evaluative schema for the assessment of metadata instances. Various schema languages could be used, including XML Schema and RELAX NG. This standard specifies how an application creating a schema is to interpret the different parts of the component specification. The intended serialization of metadata instances is valid (and well-formed) XML, which must be provided by an enabled application. Other, equivalent serializations, for example as JSON objects, may be provided in addition.

Interpretation of hierarchies of the CCSL

Components are to be realized as container elements in the XML serialization, containing elements and components as specified. The name of a component or element is given by the respective name attribute in the CCSL. As XML is case sensitive, the case of the name attribute value is to be retained. The content model of an element is provided by the value scheme, i.e. a closed vocabulary, a regular-expression-like pattern or a data type.
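One possible mapping of a simple-typed CCSL element onto an XML Schema element declaration is sketched below in Python. The sketch makes several assumptions. Missing cardinalities default to "1", as in XML Schema. A missing ValueScheme defaults to `string`. The output schema binds the prefix `xs`. Pattern and enumeration value schemes, multilinguality and concept links are left out. It is not the CMDI toolkit's actual transformation.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"


def ccsl_element_to_xsd(el: ET.Element) -> ET.Element:
    """Translate a simple-typed CCSL CMD_Element into an xs:element declaration."""
    name = el.get("name")
    if name is None:
        raise ValueError("CMD_Element requires a name attribute")
    return ET.Element(f"{{{XS}}}element", {
        "name": name,  # copied verbatim: XML is case sensitive
        # Assumption: absent cardinalities default to 1, as in XML Schema.
        "minOccurs": el.get("CardinalityMin", "1"),
        "maxOccurs": el.get("CardinalityMax", "1"),
        # Assumption: the target schema binds the prefix xs to XS.
        "type": "xs:" + el.get("ValueScheme", "string"),
    })
```

The CCSL value `unbounded` can be copied straight into `maxOccurs`, because XML Schema uses the same literal.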

Interpretation of the order of elements

The specification of the elements provides the sequence of elements and components. The order of elements is generally fixed, which allows the cardinality of elements to be specified. For components that contain both elements and components, the elements have to be specified before the (sub-)components.
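The ordering rule can be checked on a CCSL component with a few lines of Python. The allowed order below follows the `group` sequence of the normative schema (AttributeList, then CMD_Element, then CMD_Component). The function name is hypothetical, and CCSL files are assumed to carry no namespace, as in the schema.

```python
import xml.etree.ElementTree as ET

# Child order inside a CCSL CMD_Component, following the "group" sequence
# of the normative schema: AttributeList?, CMD_Element*, CMD_Component*.
ORDER = {"AttributeList": 0, "CMD_Element": 1, "CMD_Component": 2}


def children_in_order(component: ET.Element) -> bool:
    """True if the children never step back in ORDER, e.g. no element after a sub-component."""
    ranks = [ORDER[child.tag] for child in component if child.tag in ORDER]
    return ranks == sorted(ranks)
```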

Interpretation of attributes

The CCSL allows the specification of attributes of elements and components. The AttributeList element of the CCSL provides the mechanism to define attributes with appropriate value schemes. An enabled application must interpret the attributes specified in an attribute list so that the parent element or component allows an attribute with exactly that name and the content model specified by the CCSL. For semantic interoperability, the CCSL provides a concept link to the external definition and description of the semantics of the attribute. The content model is provided either by the type or by the value scheme (i.e. a closed vocabulary or a regular-expression-like pattern).
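A Python sketch of turning one CCSL Attribute into an `xs:attribute` declaration. It follows the choice between Type and ValueScheme in `AttributeList_type`. It assumes the prefix `xs` is bound in the generated schema and expresses pattern and enumeration schemes as an anonymous restriction of `xs:string`. ConceptLink is ignored here. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def ccsl_attribute_to_xsd(attr: ET.Element) -> ET.Element:
    """Build an xs:attribute from a CCSL Attribute (Name plus Type or ValueScheme)."""
    name = attr.findtext("Name")
    if name is None:
        raise ValueError("Attribute requires a Name")
    decl = ET.Element(XS + "attribute", {"name": name.strip()})
    simple_type = attr.findtext("Type")
    if simple_type is not None:
        decl.set("type", "xs:" + simple_type.strip())  # assumes prefix xs is bound
        return decl
    # Otherwise a ValueScheme: an anonymous simpleType restricting xs:string.
    restriction = ET.SubElement(ET.SubElement(decl, XS + "simpleType"),
                                XS + "restriction", {"base": "xs:string"})
    pattern = attr.findtext("ValueScheme/pattern")
    if pattern is not None:
        ET.SubElement(restriction, XS + "pattern", {"value": pattern})
    for item in attr.iterfind("ValueScheme/enumeration/item"):
        ET.SubElement(restriction, XS + "enumeration", {"value": item.text or ""})
    return decl
```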

Appendices

Normative Appendix

XML schema of the CMDI component specification language

This annex comprises the specification of the CCSL format using W3C XML Schema syntax. This schema shall be used as a reference to check the conformity of any data represented in CCSL, as long as it does not contain any additional markup module. In any other case, the schema shall be modified to incorporate the definitions of the namespaces associated with the external markup to be used. The schema was developed within CLARIN-NL and is included for reference.

<xs:schema
   xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xs:import
   namespace="http://www.w3.org/XML/1998/namespace"
   schemaLocation="http://www.w3.org/2005/08/xml.xsd"/>
 <xs:element name="CMD_ComponentSpec">
  <xs:complexType>
   <xs:sequence>
    <xs:element name="Header">
     <xs:complexType>
      <xs:sequence>
       <xs:element name="ID" type="xs:anyURI" minOccurs="0"/>
       <xs:element name="Name" type="xs:string" minOccurs="0"/>
       <xs:element name="Description" type="xs:string" minOccurs="0"/>
      </xs:sequence>
     </xs:complexType>
    </xs:element>
    <xs:element name="CMD_Component" type="CMD_Component_type" maxOccurs="unbounded">
     <xs:annotation>
      <xs:documentation>At the root level there should
             always be a Component.</xs:documentation>
     </xs:annotation>
    </xs:element>
   </xs:sequence>
   <xs:attribute name="isProfile" type="xs:boolean" use="required"/>
  </xs:complexType>
 </xs:element>
 <xs:group name="group">
  <xs:sequence>
   <xs:element
     name="AttributeList"
     type="AttributeList_type"
     minOccurs="0"
     maxOccurs="1"/>
   <xs:element
     name="CMD_Element"
     type="CMD_Element_type"
     minOccurs="0"
     maxOccurs="unbounded"/>
   <xs:element
     name="CMD_Component"
     type="CMD_Component_type"
     minOccurs="0"
     maxOccurs="unbounded"/>
  </xs:sequence>
 </xs:group>
 <xs:complexType name="CMD_Element_type">
  <xs:sequence>
   <xs:element
     name="AttributeList"
     type="AttributeList_type"
     minOccurs="0"
     maxOccurs="1">
    <xs:annotation>
     <xs:documentation>The AttributeList child of an element
           contains a set of XML attributes for that
           element.</xs:documentation>
    </xs:annotation>
   </xs:element>
   <xs:element
     minOccurs="0"
     maxOccurs="1"
     name="ValueScheme"
     type="ValueScheme_type">
    <xs:annotation>
     <xs:documentation>When an element is linked to a regular
           expression or a controlled vocabulary, the
           ValueScheme sub-element contains more information
           about this.</xs:documentation>
    </xs:annotation>
   </xs:element>
  </xs:sequence>
  <xs:attributeGroup ref="clarin_element_attributes"/>
 </xs:complexType>
 <xs:complexType name="ValueScheme_type">
  <xs:choice>
   <xs:element name="pattern" type="xs:string" maxOccurs="1">
    <xs:annotation>
     <xs:documentation>Specification of a regular expression
           the element should comply with.</xs:documentation>
    </xs:annotation>
   </xs:element>
   <xs:element name="enumeration" type="enumeration_type">
    <xs:annotation>
     <xs:documentation>A list of the allowed values of a
           controlled vocabulary.</xs:documentation>
    </xs:annotation>
   </xs:element>
  </xs:choice>
 </xs:complexType>
 <xs:complexType name="AttributeList_type">
  <xs:sequence>
   <xs:element name="Attribute" minOccurs="1" maxOccurs="unbounded">
    <xs:complexType>
     <xs:sequence>
      <xs:element name="Name" type="xs:string">
       <xs:annotation>
        <xs:documentation>The name of the
                 attribute.</xs:documentation>
       </xs:annotation>
      </xs:element>
      <xs:element name="ConceptLink" type="xs:anyURI" minOccurs="0">
       <xs:annotation>
        <xs:documentation>A link to the ISOcat data
                 category registry (or any other concept
                 registry).</xs:documentation>
       </xs:annotation>
      </xs:element>
      <xs:choice>
       <xs:element name="Type" type="allowed_attributetypes_type">
        <xs:annotation>
         <xs:documentation>For the use of simple XML types
                   as the type of the attribute.</xs:documentation>
        </xs:annotation>
       </xs:element>
       <xs:element name="ValueScheme" type="ValueScheme_type">
        <xs:annotation>
         <xs:documentation>For the use of a regular
                   expression or a controlled vocabulary as the type
                   of the attribute.</xs:documentation>
        </xs:annotation>
       </xs:element>
      </xs:choice>
     </xs:sequence>
    </xs:complexType>
   </xs:element>
  </xs:sequence>
 </xs:complexType>
 <xs:complexType name="CMD_Component_type">
  <xs:group ref="group" minOccurs="0"/>
  <xs:attributeGroup ref="clarin_component_attributes"/>
 </xs:complexType>
 <xs:attributeGroup name="clarin_element_attributes">
  <xs:attribute name="name" type="xs:Name" use="required">
   <xs:annotation>
    <xs:documentation>The name of the
         element.</xs:documentation>
   </xs:annotation>
  </xs:attribute>
  <xs:attribute name="ConceptLink" type="xs:anyURI">
   <xs:annotation>
    <xs:documentation>A link to the ISOcat data category
         registry (or any other concept
         registry).</xs:documentation>
   </xs:annotation>
  </xs:attribute>
  <xs:attribute name="ValueScheme" type="allowed_attributetypes_type">
   <xs:annotation>
    <xs:documentation>Used to specify that an element has a
         simple XML type (string, integer,
         etc)</xs:documentation>
   </xs:annotation>
  </xs:attribute>
  <xs:attribute name="CardinalityMin" type="cardinality_type">
   <xs:annotation>
    <xs:documentation>Minimal number of
         occurrences.</xs:documentation>
   </xs:annotation>
  </xs:attribute>
  <xs:attribute name="CardinalityMax" type="cardinality_type">
   <xs:annotation>
    <xs:documentation>Maximal number of
         occurrences.</xs:documentation>
   </xs:annotation>
  </xs:attribute>
  <xs:attribute name="Documentation" type="xs:string">
   <xs:annotation>
    <xs:documentation>Some information an application (eg Arbil)
         can display to give guidance to the user when entering
         metadata.</xs:documentation>
   </xs:annotation>
  </xs:attribute>
  <xs:attribute name="SupersetLabel" type="xs:integer">
   <xs:annotation>
    <xs:documentation>The element with the highest priority will
         be displayed as the label for a metadata file (eg in
         Arbil)</xs:documentation>
   </xs:annotation>
  </xs:attribute>
  <xs:attribute name="Multilingual" type="xs:boolean">
   <xs:annotation>
    <xs:documentation>Indicates that this element can have
         values in multiple languages (and thus is repeatable).
         This will result in the possibility of using the
         xml:lang attribute in the metadata instances that are
         created.</xs:documentation>
   </xs:annotation>
  </xs:attribute>
 </xs:attributeGroup>
 <xs:attributeGroup name="clarin_component_attributes">
  <xs:attribute name="name" type="xs:Name"/>
  <xs:attribute name="ComponentId" type="xs:anyURI">
   <xs:annotation>
    <xs:documentation>Indicates that a component (using its
         unique ComponentId issued by the ComponentRegistry)
         should be included.</xs:documentation>
   </xs:annotation>
  </xs:attribute>
  <xs:attribute name="ConceptLink" type="xs:anyURI">
   <xs:annotation>
    <xs:documentation>A link to the ISOcat data category
         registry (or any other concept registry). Currently not
         used.</xs:documentation>
   </xs:annotation>
  </xs:attribute>
  <xs:attribute name="filename" type="xs:anyURI">
   <xs:annotation>
    <xs:documentation>Outdated way of including an external
         component. Here for backward compatibility with the
         XML-cmdi-toolkit.</xs:documentation>
   </xs:annotation>
  </xs:attribute>
  <xs:attribute name="CardinalityMin" type="cardinality_type"/>
  <xs:attribute name="CardinalityMax" type="cardinality_type"/>
  <xs:attribute ref="xml:base"/>
 </xs:attributeGroup>
 <xs:simpleType name="cardinality_type">
  <xs:annotation>
   <xs:documentation>cardinality for elements and
       components</xs:documentation>
  </xs:annotation>
  <xs:union>
   <xs:simpleType>
    <xs:list itemType="xs:nonNegativeInteger"/>
   </xs:simpleType>
   <xs:simpleType>
    <xs:restriction base="xs:string">
     <xs:enumeration value="unbounded"/>
    </xs:restriction>
   </xs:simpleType>
  </xs:union>
 </xs:simpleType>
 <xs:simpleType name="allowed_attributetypes_type">
  <xs:annotation>
   <xs:documentation>Subset of XSD types that are allowed as CMD
       type</xs:documentation>
  </xs:annotation>
  <xs:restriction base="xs:token">
   <xs:enumeration value="boolean"/>
   <xs:enumeration value="decimal"/>
   <xs:enumeration value="float"/>
   <xs:enumeration value="string"/>
   <xs:enumeration value="anyURI"/>
   <xs:enumeration value="date"/>
   <xs:enumeration value="gDay"/>
   <xs:enumeration value="gMonth"/>
   <xs:enumeration value="gYear"/>
   <xs:enumeration value="time"/>
   <xs:enumeration value="dateTime"/>
  </xs:restriction>
 </xs:simpleType>
 <xs:complexType name="enumeration_type">
  <xs:annotation>
   <xs:documentation>controlled vocabularies</xs:documentation>
  </xs:annotation>
  <xs:choice minOccurs="0" maxOccurs="unbounded">
   <xs:element name="item" type="item_type">
    <xs:annotation>
     <xs:documentation>An item from a controlled
           vocabulary.</xs:documentation>
    </xs:annotation>
   </xs:element>
   <xs:element name="appinfo" type="xs:string">
    <xs:annotation>
     <xs:documentation>End-user guidance about the value of
           the controlled vocabulary as a whole. Currently not
           used.</xs:documentation>
    </xs:annotation>
   </xs:element>
  </xs:choice>
 </xs:complexType>
 <xs:complexType name="item_type">
  <xs:simpleContent>
   <xs:extension base="xs:string">
    <xs:attribute type="xs:anyURI" name="ConceptLink">
     <xs:annotation>
      <xs:documentation>A link to the ISOcat data category
             registry (or any other concept registry) related
              to this controlled vocabulary
             item.</xs:documentation>
     </xs:annotation>
    </xs:attribute>
    <xs:attribute type="xs:string" name="AppInfo">
     <xs:annotation>
      <xs:documentation>End-user guidance about the value
             of this controlled vocabulary
             item.</xs:documentation>
     </xs:annotation>
    </xs:attribute>
   </xs:extension>
  </xs:simpleContent>
 </xs:complexType>
</xs:schema>

Non-normative Appendix

Example CMDI instance

The following shows an example CMDI instance without the components.
<CMD xmlns="http://www.clarin.eu/cmd/"
  CMDVersion="1.2"
  xsi:schemaLocation="http://www.clarin.eu/cmd/ http://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/profiles/clarin.eu:cr1:p_1290431694580/xsd"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns:dcr="http://www.isocat.org/ns/dcr">
 <Header>
  <MdCreator>Reinhild Barkey</MdCreator>
  <MdCreationDate>2011-03-31</MdCreationDate>
  <MdSelfLink>http://hdl.handle.net/XXXX/XXXXXXXXXXXX</MdSelfLink>
  <MdProfile>clarin.eu:cr1:p_1290431694580</MdProfile>
  <MdCollectionDisplayName>Tübingen Language Resource
     Repository</MdCollectionDisplayName>
  <MdRevisionGrp>
   <MdRevision>
    <by>Thorsten Trippel</by>
    <date>2012-01-24</date>
    <note>Fixed encoding, added </note>
   </MdRevision>
   <MdRevision>
    <by>Thorsten Trippel</by>
    <date>2012-01-24</date>
    <note>Urgently needed example for second revision
         inserted</note>
   </MdRevision>
  </MdRevisionGrp>
 </Header>
 <Resources>
  <ResourceProxyList>
   <ResourceProxy id="resourceno1">
    <ResourceType mimetype="application/xml">Resource</ResourceType>
    <ResourceRef>http://hdl.handle.net/THERESOURCEPID1</ResourceRef>
   </ResourceProxy>
   <ResourceProxy id="resourceno2">
    <ResourceType mimetype="application/xml">Resource</ResourceType>
    <ResourceRef>http://hdl.handle.net/THERESOURCEPID2</ResourceRef>
   </ResourceProxy>
  </ResourceProxyList>
  <JournalFileProxyList>
   <JournalFileProxy>
    <JournalFileRef>http://hdl.handle.net/ThePIDtoPROVENANCEfile</JournalFileRef>
   </JournalFileProxy>
  </JournalFileProxyList>
  <ResourceRelationList>
   <ResourceRelation>
    <RelationType
      dcr:datcat="http://www.isocat.org/datcat/DC-4009"> annotates </RelationType>
    <Source ref="resourceno1"/>
    <Target ref="resourceno2"/>
   </ResourceRelation>
  </ResourceRelationList>
  <IsPartOfList>
   <IsPartOf>http://hdl.handle.net/SomeOtherBiggerResourceThisIsPartOf</IsPartOf>
   <IsPartOf>http://hdl.handle.net/SomeOtherEvenBiggerResourceThisIsPartOf</IsPartOf>
  </IsPartOfList>
 </Resources>
 <Components> ... </Components>
</CMD>

Example instance of a component specification

General Information component specification

This section provides an example description of a component using the CCSL.

<CMD_ComponentSpec
  isProfile="false"
  xsi:schemaLocation="http://www.clarin.eu/cmd http://www.clarin.eu/cmd/general-component-schema.xsd"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <Header>
  <ID>clarin.eu:cr1:c_1290431694495</ID>
  <Name>GeneralInfo</Name>
  <Description>Component contains general information about the
     resource, e.g. its name, title, the time coverage of the
     data, etc.</Description>
 </Header>
 <CMD_Component CardinalityMax="1" CardinalityMin="1" name="GeneralInfo">
  <CMD_Element
    Multilingual="true"
    CardinalityMax="unbounded"
    CardinalityMin="0"
    ValueScheme="string"
    ConceptLink="http://www.isocat.org/datcat/DC-2544"
    name="ResourceName"/>
  <CMD_Element
    Multilingual="true"
    CardinalityMax="unbounded"
    CardinalityMin="0"
    ValueScheme="string"
    ConceptLink="http://www.isocat.org/datcat/DC-2545"
    name="ResourceTitle"/>
  <CMD_Element
    SupersetLabel="1"
    CardinalityMax="unbounded"
    CardinalityMin="1"
    ConceptLink="http://www.isocat.org/datcat/DC-3806"
    name="ResourceClass">
   <ValueScheme>
    <enumeration>
     <item AppInfo="" ConceptLink="">Lexicon</item>
     <item AppInfo="" ConceptLink="">Corpus</item>
     <item AppInfo="" ConceptLink="">Tool</item>
     <item AppInfo="" ConceptLink="">Grammar</item>
     <item AppInfo="" ConceptLink="">Fieldwork Material</item>
     <item AppInfo="" ConceptLink="">Experimental Data</item>
     <item AppInfo="" ConceptLink="">Survey Data</item>
     <item AppInfo="" ConceptLink="">Test Data</item>
     <item AppInfo="" ConceptLink="">Toolchain</item>
     <item AppInfo="" ConceptLink="">ResourceBundle</item>
    </enumeration>
   </ValueScheme>
  </CMD_Element>
  <CMD_Element
    Multilingual="false"
    CardinalityMax="1"
    CardinalityMin="0"
    ValueScheme="string"
    ConceptLink="http://www.isocat.org/datcat/DC-2573"
    name="PID"/>
  <CMD_Element
    Multilingual="true"
    CardinalityMax="1"
    CardinalityMin="0"
    ValueScheme="string"
    ConceptLink="http://www.isocat.org/datcat/DC-2547"
    name="Version"/>
  <CMD_Element
    CardinalityMax="1"
    CardinalityMin="0"
    ConceptLink="http://www.isocat.org/datcat/DC-3818"
    name="LifeCycleStatus">
   <ValueScheme>
    <enumeration>
     <item AppInfo="" ConceptLink="">planned</item>
     <item AppInfo="" ConceptLink="">development</item>
     <item AppInfo="" ConceptLink="">released</item>
     <item AppInfo="" ConceptLink="">production</item>
     <item AppInfo="" ConceptLink="">withdrawn</item>
     <item AppInfo="" ConceptLink="">retired</item>
     <item AppInfo="" ConceptLink="">superseded</item>
     <item AppInfo="" ConceptLink="">unknown</item>
     <item AppInfo="" ConceptLink="">archived</item>
     <item AppInfo="" ConceptLink="">published</item>
    </enumeration>
   </ValueScheme>
  </CMD_Element>
  <CMD_Element
    CardinalityMax="1"
    CardinalityMin="0"
    ValueScheme="gYear"
    ConceptLink="http://www.isocat.org/datcat/DC-2539"
    name="StartYear"/>
  <CMD_Element
    CardinalityMax="1"
    CardinalityMin="0"
    ValueScheme="gYear"
    ConceptLink="http://www.isocat.org/datcat/DC-2509"
    name="CompletionYear"/>
  <CMD_Element
    Multilingual="false"
    CardinalityMax="1"
    CardinalityMin="0"
    ValueScheme="string"
    ConceptLink="http://www.isocat.org/datcat/DC-2538"
    name="PublicationDate"/>
  <CMD_Element
    Multilingual="false"
    CardinalityMax="1"
    CardinalityMin="0"
    ValueScheme="string"
    ConceptLink="http://www.isocat.org/datcat/DC-2526"
    name="LastUpdate"/>
  <CMD_Element
    Multilingual="true"
    CardinalityMax="1"
    CardinalityMin="0"
    ValueScheme="string"
    ConceptLink="http://www.isocat.org/datcat/DC-2502"
    name="TimeCoverage"/>
  <CMD_Element
    Multilingual="true"
    CardinalityMax="unbounded"
    CardinalityMin="0"
    ValueScheme="string"
    ConceptLink="http://www.isocat.org/datcat/DC-2956"
    name="LegalOwner"/>
  <CMD_Component
    CardinalityMax="1"
    CardinalityMin="0"
    ComponentId="clarin.eu:cr1:c_1290431694494"
    name="Location">
   <CMD_Element
     Multilingual="true"
     SupersetLabel="1"
     CardinalityMax="1"
     CardinalityMin="0"
     ValueScheme="string"
     ConceptLink="http://www.isocat.org/datcat/DC-2505"
     name="Address"/>
   <CMD_Element
     Multilingual="true"
     CardinalityMax="1"
     CardinalityMin="0"
     ValueScheme="string"
     ConceptLink="http://www.isocat.org/datcat/DC-3814"
     name="Region"/>
   <CMD_Element
     Multilingual="true"
     CardinalityMax="1"
     CardinalityMin="0"
     ValueScheme="string"
     ConceptLink="http://www.isocat.org/datcat/DC-3791"
     name="ContinentName"/>
   <CMD_Component
     CardinalityMax="1"
     CardinalityMin="1"
     ComponentId="clarin.eu:cr1:c_1290431694493"
     name="Country">
    <CMD_Element
      Multilingual="true"
      CardinalityMax="1"
      CardinalityMin="1"
      ValueScheme="string"
      ConceptLink="http://www.isocat.org/datcat/DC-3792"
      name="CountryName"/>
    <CMD_Element
      SupersetLabel="1"
      CardinalityMax="1"
      CardinalityMin="1"
      ConceptLink="http://www.isocat.org/datcat/DC-2092"
      name="CountryCoding">
     <ValueScheme>
      <enumeration>
       <item AppInfo="Andorra" ConceptLink="">AD</item>
       <item AppInfo="United Arab Emirates" ConceptLink="">AE</item>
       <item AppInfo="Afghanistan" ConceptLink="">AF</item>
       <item AppInfo="Antigua and Barbuda" ConceptLink="">AG</item>
       <item AppInfo="Anguilla" ConceptLink="">AI</item>
       <item AppInfo="Albania" ConceptLink="">AL</item>
       <item AppInfo="Armenia" ConceptLink="">AM</item>
       <item AppInfo="Netherlands Antilles" ConceptLink="">AN</item>
      </enumeration>
     </ValueScheme>
    </CMD_Element>
   </CMD_Component>
  </CMD_Component>
  <CMD_Component
    CardinalityMax="1"
    CardinalityMin="0"
    ComponentId="clarin.eu:cr1:c_1290431694486"
    name="Descriptions">
   <CMD_Element
     Multilingual="true"
     SupersetLabel="1"
     CardinalityMax="unbounded"
     CardinalityMin="1"
     ValueScheme="string"
     ConceptLink="http://www.isocat.org/datcat/DC-2520"
     name="Description">
    <AttributeList>
     <Attribute>
      <Name>type</Name>
      <ValueScheme>
       <enumeration>
        <item AppInfo="" ConceptLink="">short</item>
        <item AppInfo="" ConceptLink="">long</item>
       </enumeration>
      </ValueScheme>
     </Attribute>
    </AttributeList>
   </CMD_Element>
  </CMD_Component>
 </CMD_Component>
</CMD_ComponentSpec>
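In this specification every CMD_Element declares its cardinality and sits inside CMD_Component elements. A specification like the one above can therefore be flattened into a list of element paths. The following Python sketch shows one way to do this, again using only xml.etree.ElementTree. The function name element_cardinalities and the abbreviated sample specification are hypothetical. The sketch assumes a cardinality of 1 when an attribute is absent, mirroring XML Schema's minOccurs/maxOccurs defaults. It also ignores attribute definitions and value schemes.

```python
import xml.etree.ElementTree as ET


def element_cardinalities(spec_xml: str) -> list[tuple[str, str, str]]:
    """List (path, CardinalityMin, CardinalityMax) for every CMD_Element in a CCSL spec."""
    root = ET.fromstring(spec_xml)
    rows = []

    def walk(node, path):
        for child in node:
            name = child.get("name")
            if child.tag == "CMD_Component":
                # Descend into nested components, extending the path.
                walk(child, f"{path}/{name}")
            elif child.tag == "CMD_Element":
                # Assumption: a missing cardinality attribute means 1.
                rows.append((
                    f"{path}/{name}",
                    child.get("CardinalityMin", "1"),
                    child.get("CardinalityMax", "1"),
                ))
            # Other children (Header, AttributeList, ValueScheme) are skipped.

    walk(root, "")
    return rows


# Hypothetical, abbreviated version of the GeneralInfo specification.
SPEC = """<CMD_ComponentSpec isProfile="false">
<Header><Name>GeneralInfo</Name></Header>
<CMD_Component CardinalityMax="1" CardinalityMin="1" name="GeneralInfo">
<CMD_Element CardinalityMax="unbounded" CardinalityMin="0" name="ResourceName"/>
<CMD_Component CardinalityMax="1" CardinalityMin="0" name="Location">
<CMD_Component CardinalityMax="1" CardinalityMin="1" name="Country">
<CMD_Element CardinalityMax="1" CardinalityMin="1" name="CountryName"/>
</CMD_Component>
</CMD_Component>
</CMD_Component>
</CMD_ComponentSpec>"""

for path, cmin, cmax in element_cardinalities(SPEC):
    print(f"{path} [{cmin}..{cmax}]")
# prints:
# /GeneralInfo/ResourceName [0..unbounded]
# /GeneralInfo/Location/Country/CountryName [1..1]
```

The reported cardinality is the element's own. An element inside an optional component, such as CountryName inside Location, is effectively optional in an instance even though its own minimum is 1.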

Bibliography

IETF RFC 2045, Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies

IETF RFC 2046, Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types

IETF RFC 5646, Tags for Identifying Languages

ISO 639‐1, Codes for the representation of names of languages — Part 1: Alpha-2 code

ISO 639‐3, Codes for the representation of names of languages -- Part 3: Alpha-3 code for comprehensive coverage of languages

ISO 3166‐1, Codes for the representation of names of countries and their subdivisions — Part 1: Country codes

ISO 8601, Data elements and interchange formats — Information interchange — Representation of dates and times

ISO/IEC 10646‐1, Information technology — Universal Multiple-Octet Coded Character Set (UCS) — Part 1: Architecture and Basic Multilingual Plane

XML Schema Part 2: Datatypes, Biron, P.V. and Malhotra, A. (eds.), W3C Recommendation 02 May 2001, available at <http://www.w3.org/TR/xmlschema-2/>

Last modified on 06/01/15 09:50:20