{{{#!div class="system-message" '''NOTE''': This page is currently under development and should be considered a draft. If you wish to contribute, please contact the authors. }}} {{{#!comment OLD #!div class="notice system-message" This document ([[.@90|version 90]]) is currently under review at [https://docs.google.com/document/d/1nP9GJsPsDoKfN3PHEsdaVzIT_ESmFwB1Nnuwf3jxWt4/edit?usp=sharing docs.google.com] }}} {{{#!div class="notice system-message" Notes from meetings concerning the CMDI specification can be found [[Taskforces/CMDI|here]] }}} = Component Metadata Infrastructure (CMDI) 1.2 [DRAFT] = [[PageOutline(1-5)]] == Introduction == Many researchers, from the humanities and other domains, have a strong need to study resources in close detail. Nowadays more and more of these resources are available online. To be able to find these resources, they are described with metadata. These metadata records are collected and made available via central catalogues. Often, resource providers want to include specific properties of a resource in their metadata. The purpose of catalogues tends to be more generic and address a broader target audience. It is hard to strike the balance between these two ends of the spectrum with one metadata schema, and mismatches can negatively impact the quality of metadata provided. The goal of the Component Metadata Infrastructure (CMDI) is to provide a flexible mechanism to build resource specific metadata schemas out of shared components and semantics. In CMDI the metadata lifecycle starts with the need of a metadata modeller to create a dedicated metadata profile for a specific type of resources. The modeller can browse and search a registry for components and profiles that are suitable or come close to meet her requirements. A component groups together metadata elements that belong together and can potentially be reused in a different context. Components can also group other components. 
The CLARIN Component Registry already contains many of these general components. These can be reused as they are or adapted, i.e. by adding or removing metadata elements and/or components. Completely new components can also be created to model the unique aspects of the resources under consideration. All the needed components are combined into one profile specific to the type of resources. Components, elements and values in this profile are linked to a semantic description (a concept) to make their meaning explicit. In the end, metadata creators can create records for specific resources that comply with the profile relevant for the resource type, and these records can be provided to local and global catalogues. This lifecycle requires many systems, which together form the infrastructure, to cooperate well. To enable this level of cooperation, this specification provides in-depth descriptions and definitions of what CMDI records, components and their representations in XML look like.

=== History ===
CMDI has been developed in the context of the European CLARIN infrastructure. Already in its preparatory phase, which started in 2007, the infrastructure felt the need for flexibility in the metadata domain as it was confronted with many types of resources that had to be accurately described. For version 1.0 the CMDI toolbox was created, which consists of the XML schemas and XSLT stylesheets to validate and transform components, profiles and records. Version 1.1 included some small changes and has seen small, incremental, backward-compatible advances since 2011. This version has been in use throughout CLARIN’s construction phase. CMDI has also seen a growing number of tools and infrastructure systems that deal with its records and components and rely on its shared syntax and semantics. This specification describes version 1.2. This new version adds some functionality and also fixes some issues. These changes are highlighted in CE-2014-0318.
The transition from 1.1 to 1.2 is supported by version 1.2 of the CMDI toolkit. === Terminology === The key words `MUST`, `MUST NOT`, `REQUIRED`, `SHALL`, `SHALL NOT`, `SHOULD`, `SHOULD NOT`, `RECOMMENDED`, `MAY`, and `OPTIONAL` in this document are to be interpreted as described in [#REF_RFC_2119 RFC2119]. === Glossary === {{{#!div class="notice system-message" Work in progress. Responsible for this section: Thorsten & Twan }}} {{{#!div class="system-message" Please do not edit here, but use the [https://docs.google.com/document/d/14yrkJwg2lxf5GGkkA-wMjgByHSlQYWiSt--Lvn0biyo/edit?usp=sharing Google Docs version]! }}} * CMD model, Component Metadata model * The component based metadata model described in the present specification * CMDI, Component Metadata Infrastructure * Metadata description framework consisting of the __CMD model__ and infrastructure * CCSL, CMDI Component Specification Language * __XML__ based language for describing components according to the __CMD__ model * CLARIN * The infrastructure governed by the CLARIN ERIC * [http://www.clarin.eu] * resource, language resource * A (digitally) accessible entity that can be described in terms of its content and technical properties, referenced by a __Uniform Resource Identifier__ * digital object * __Resource__ in a repository stored in one repository container that can be addressed by an identifier; a digital object can be seen as a generalization of a directory in a file system containing one or more files which are the data stream(s). Digital objects can exist in databases, hence the comparison to directory and file structures falls short. * metadata * A description of a __resource__, usually given as a set of properties in the form of attribute-value pairs. This description may contain information about the resource, aspects or parts of the resource and/or artefacts and actors connected to the resource. 
* persistent identifier, PID * Unique __Uniform Resource Identifier__ that assures permanent access for a digital object by providing access to it independently of its physical location or current ownership * concept * An abstract or generic idea generalized from particular instances (source: [http://www.merriam-webster.com/dictionary/concept Merriam-Webster]) * semantic registry * A list/directory/system maintaining (authoritative) definitions of terms, __concepts__ or data categories. These registries should also provide __persistent identifiers__ for their entries. * concept link * A reference from a __CMD profile__, __CMD component__, __CMD element__, __CMD attribute__ or a value in a __controlled vocabulary__ to an entry in a __semantic registry__ via its __persistent identifier__. * CLARIN Concept Registry * The __semantic registry__ maintaining __concepts__ used/central to the CLARIN infrastructure * [http://clarin.eu/ccr] * XML * Markup language standard as described by W3C recommendation http://www.w3.org/TR/xml/ * XML document * ... 
* XML element * A constituent of an __XML document__ as defined in W3C recommendation [http://www.w3.org/TR/xml/] (distinct from a __CMD element__) * XML schema datatype * A predefined set of permissible content within a section of an XML document as described in [http://www.w3.org/TR/xmlschema-2/] * XML container element * An __XML element__ that has one or more XML elements as its descendants * XML attribute * A property of an __XML element __as defined in W3C recommendation http://www.w3.org/TR/xml/ (distinct from a __CMD attribute__) * Uniform Resource Identifier, URI * An identifier for __resources__ as described in [http://tools.ietf.org/html/rfc3986 RFC3986] * namespace * An __XML__ namespace as described in [http://www.w3.org/TR/xml-names/] * CMD instance, metadata instance, CMDI file, metadata record, CMD record * A file that conforms to the general CMDI instance structure as described in this specification, and at the __instance payload__ level follows the specific structure defined by the __CMD specification__ it relates to * Instance header * The section of a __metadata instance__ marked as ‘header’, providing information on that metadata instance as such, not the __resource__ that is described by the metadata file * Resource proxy, CMD resource reference * A representation of a __resource__ within a __metadata instance__ containing a __Uniform Resource Identifier__ as a reference to the resource itself and a specification of its type (one of: Resource, Metadata, !SearchPage, !SearchService, !LandingPage) * Resource proxy reference * A reference from any point within the __CMD instance payload__ to any of the __resource proxies__ * CMD instance payload(?) * The section of a __metadata instance__ that follows the structure defined by the profile it references and contains the description of the __resources__ to which that metadata instance relates * CMD instance envelope (?) 
   * [todo]
 * CMD specification, component specification/definition, profile specification/definition
   * The implementation of a __CMD component__ or __CMD profile__ by means of the __CCSL__
 * Specification header, component header, profile header
   * The section of a __CMD specification__ marked as ‘header’, providing information on that specification as such that is not part of the defined structure
 * CMD component, component
   * A reusable, structured template for the description of (an aspect of) a __resource__, defined by means of a __CMD specification__ document with the potential of embedding other components by reference
 * CMD profile, profile definition, profile
   * A __CMD component__ that is used to describe a class of resources and is not embedded into other components, and therefore provides the complete structure for an __instance payload__
 * CMD element, element definition
   * A unit of a CMD component that describes the level of the __metadata instance__ that can carry atomic values constrained by a __value scheme__, and does not contain further levels except for that of the __CMD attribute__
 * CMD attribute
   * A unit of a CMD element that describes the level at which properties of a __CMD element__ can be provided by means of atomic values constrained by a __value scheme__.
* value scheme * A set of constraints governing the range of  values allowed for a specific __CMD element__ or __CMD attribute__ in a __metadata instance__, expressed in terms of an __XML schema datatype__, __controlled vocabulary__, or __regular expression__ * controlled vocabulary, closed/open vocabulary * A set of values that can be used either to constrain the set of permissible values or to provide suggestions for applicable values in a given context * regular expression * An expression that constrains the set of permissible values,as described in  XML Schema Regular Expressions [http://www.w3.org/TR/xmlschema-2/#regexs] * CMD profile schema * A schema definition by which the correctness of a __CMD instance__ with respect to the __CMD profile__ it pertains to can be evaluated. May be expressed as __XML Schema__ but also in another XML schema language. * XML element declaration * todo e.g. xs:element * XML attribute declaration * todo e.g. xs:attribute === Normative References === RFC2119[=#REF_RFC_2119]:: Key words for use in RFCs to Indicate Requirement Levels, IETF RFC 2119, March 1997, \\ [http://www.ietf.org/rfc/rfc2119.txt] XML-Namespaces[=#REF_XML_Namespaces]:: Namespaces in XML 1.0 (Third Edition), W3C, 8 December 2009, \\ [http://www.w3.org/TR/2009/REC-xml-names-20091208/] {{{#!comment TODO: add references }}} === Non-Normative References === RFC3023[=#REF_RFC_3023]:: XML Media Types, IETF RFC 3023, January 2001, \\ [http://www.ietf.org/rfc/rfc3023.txt] {{{#!comment TODO: add references }}} === Typographic and XML Namespace conventions === The following typographic conventions for XML fragments will be used throughout this specification: * `` \\ An XML element with the Generic Identifier ''Element'' that is bound to an XML namespace denoted by the prefix ''prefix''. 
* `@attr` \\ An XML attribute with the name ''attr'' {{{#!comment * `@prefix:attr` \\ An XML attribute with the name ''attr'' that is bound to an XML namespaces denoted by the prefix ''prefix''. }}} * `string` \\ The literal ''string'' must be used either as element content or attribute value. The following XML namespace names and prefixes are used throughout this specification. The column "Recommended Syntax" indicates which syntax variant `SHOULD` be used by the toolkit and other creators of CMDI related documents. ||=Prefix =||=Namespace Name =||=Comment =||=Recommended Syntax =|| || `cmd` || `http://www.clarin.eu/cmd/1` || CMDI instance (general/envelope) || prefixed || || `cmdp` || `http://www.clarin.eu/cmd/1/profiles/{profileId}` || CMDI payload (profile specific) || prefixed || || `cue` || `http://www.clarin.eu/cmd/cues/1` || Cues for tools || prefixed || || `xs` || `http://www.w3.org/2001/XMLSchema` || XML Schema || prefixed || {{{#!comment TODO: Add payload, envelope namespace and namespace for specification }}} {{{#!comment TODO: Decide whether we want to have the following intro subsections == CMDI Component Metadata Model == == CMDI Component and Profile Specification Level == }}} = Structure of CMDI files = {{{#!div class="notice system-message" Responsible for this section: Oddrun }}} {{{#!comment TODO: UML diagram of CMDI record components }}} A CMDI file contains the actual metadata of one specific resource (hereafter referred to as the ''described resource''), and might also be referred to as a CMD record. All CMDI files have the same structure at the top level. At a lower level, parts of its structure are defined by the CMD profile upon which it is based. == The main structure == A CMDI file has the root element `` with 4 subelements: * The `
` element, containing certain administrative information about the CMDI file, i.e. metadata about the file itself
 * The `` element, listing resource proxies and their interrelations by means of the following subelements (in this order):
   * ``, containing a list of `` elements, each referencing a file contained in or closely related to the described resource
   * ``, containing a list of `` elements, each referencing a file (“journal file”) containing provenance information about the described resource
   * ``, containing a list of `` elements, each representing a relationship between 2 resource files (as listed in the ``)
 * The `` element, containing a list of `` elements, each referencing a larger external resource of which the described resource (as a whole) forms a part
 * The `` element, containing one subelement corresponding to, and in turn structured according to, the CMD profile applied. Here the "real" metadata of the resource are to be found.

The first three elements (`
`, `` and ``) constitute the ''CMD instance envelope'' and reside in the `cmd` namespace. The ''CMD instance payload'' contains the `` element, whose (profile-specific) substructure resides in the profile-specific namespace (prefix `cmdp`) or the cues-for-tools namespace (prefix `cue`). A detailed specification of the above-mentioned parts of a CMD instance is given in the next four sections. In addition, foreign attributes (XML attributes in namespaces other than those defined in the [Typographic and XML Namespace conventions]) `MAY` occur anywhere in `
`, `` and `` elements and on the `` element (but not on any of its children).

=== Examples ===
{{{
#!xml
... ... ... ... ...
}}}

== The `
` element== {{{#!comment [TODO CMDI 1.2]: Include support for attributes as local extensions Accepted proposal by Twan & Menzo (2014-11-20 by e-mail to the members of the CMDI taskforce): * Allow attributes of a foreign namespace (i.e. not the general CMDI namespace) anywhere in the Header, Resources sections and on the 'Component' element (but not on any of its children) of a CMDI record, i.e. a profile instance * This is achieved by modifying the component specification to XSD stylesheet * There will be no consequence for existing components, profiles or records * The change will be part of both CMDI 1.1 and CMDI 1.2 * The first application of this extension will be at TLA with the 'lat:localURI' attribute on ResourceRef elements Additional formal constraints proposed by Oliver: These local attributes a) MUST be ignored by tools that don't understand them (or don't want to deal with them) and therefore MAY be removed during processing b) the Namespace-Name MUST NOT contain fragments of official CLARIN namespaces, i.e. don't start with the "http://www.clarin.eu", "http://clarin.eu", etc. }}} The header of a CMDI file mainly contains administrative information about the metadata, that is metadata about the CMDI file itself. The following elements may be included in the order indicated: ||||=Name=||=Value type=||=Occurrences=||=Description=|| ||||`
`||`xs:complexType`||1||Encapsulates core administrative data about the CMDI file||
|| ||``||`xs:string`||0 to unbounded||Denotes the creator of this metadata file||
|| ||``||`xs:date`||0 or 1||The date this metadata file was created||
|| ||``||`xs:anyURI`||0 or 1||A reference to this metadata file in its home repository, in the form of a PID (preferred) or a URL||
|| ||``||`xs:anyURI`||1||The CMDI profile upon which this metadata file is based, given by its identifier in the Component Registry, e.g. clarin.eu:cr1:p_1407745711925||
|| ||``||`xs:string`||0 or 1||The collection to which the described resource belongs, given as a human-readable name. In VLO this name will be assigned to the Collection facet.||

=== Examples ===
{{{
#!xml
John Doe 2012-04-17 hdl:1234/567890 clarin.eu:cr1:p_1311927752306 CLARIN-NL web services
}}}

== The `` element ==
This section of the CMDI file enumerates
 * files which are parts of or closely related to the described resource (`` and ``)
 * possible relations between pairs of these files (``)
 * any external resources of which the described resource is a part (``)

=== The list of resource proxies ===
`` contains a sequence of zero or more occurrences of ``, each of which represents a file/part of the described resource.
||||||||=Name=||=Value type=||=Occurrences=||=Description=|| ||||||||`` ||`xs:complexType`||1||Contains a list of resource proxies (see below)|| || ||||||`` ||`xs:complexType`||0 to unbounded||Represents a file which is a part of or closely related to the described resource|| || || ||||`@id`||`xs:ID`||1||Local identifier for the parent ``, unique within this CMDI file|| || || ||||`` ||Value from controlled set (`cmd:Resourcetype_simple`): `Resource`,`Metadata`,`LandingPage`,`SearchService`,`SearchPage`||1||The type of the file represented by this ``|| || || || ||`@mimeType`||`xs:string`||0 or 1||The media type of the file|| || || ||||``||`xs:anyURI`||1||A reference to the file represented by this ``, in the form of a Clarin compliant PID or a regular URL|| === The list of journal files === `` contains a sequence of zero or more occurrences of ``, each of which representing a file containing provenance information about the described resource. ||||||=Name=||=Value type=||=Occurrences=||=Description=|| ||||||`` ||`xs:complexType`||1||Contains a list of journal file proxies (see below)|| || ||||`` ||`xs:complexType`||0 to unbounded||Represents a file containing provenance information about the described resource|| || || ||``||`xs:anyURI`||1||A reference to the file represented by this ``, in the form of a Clarin compliant PID or a regular URL|| === The list of relations between resource files === `` contains a sequence of zero or more occurrences of ``, each of which representing a relation between any pair of ``. 
||||||||||=Name=||=Value type=||=Occurrences=||=Description=|| ||||||||||``||`xs:complexType`||1||A representation of a relation between 2 resource proxies listed in ``|| || ||||||||``||`xs:complexType`||0 to unbounded||A representation of a relation between 2 resource proxies listed in ``|| || || ||||||``||`xs:string`||1||The type of the relation represented by its parent ``|| || || || ||||`@ConceptLink`||`xs:anyURI`||0 or 1||A reference to some concept registry (Clarin Concept Registry by default), indicating the semantics of ``|| || || ||||||``||`xs:complexType`||2||References one of the resource proxies participating in the relationship|| || || || ||||`@ref`||`xs:IDREF`||1||A reference to the `` with id=ref (the `` represented by its parent `` element)|| || || || ||||``||`xs:string`||0 or 1||Indicates the role its parent Resource plays in the relationship|| || || || || ||`@ConceptLink`||`xs:anyURI`||0 or 1||A reference to some concept registry (Clarin Concept Registry by default), indicating the semantics of ``|| === The !IsPartOf List === `` contains a sequence of zero or more occurrences of ``, each representing an external resource of which the described resource constitutes a part. ||||=Name=||=Value type=||=Occurrences=||=Description=|| ||||`` ||`xs:complexType`||0 or 1||Contains a list of ``(see below)|| || ||`` ||`xs:anyURI`||0 to unbounded||A reference to an external resource of which the described resource is a part, in the form of a Clarin compliant PID or a regular URL|| === Examples === {{{#!comment TODO: richer example }}} {{{ #!xml Resource hdl:1839/00-SERV-0000-0000-0009-D }}} == The components == This part of the CMDI file forms what may be referred to as «real» metadata about the described resource. Both content and structure are completely defined by the CMD Profile referenced by the XML element`` in `
`.

||||||||=Name=||=Value type=||=Occurrences=||=Description=||
||||||||`` ||`xs:complexType`||1||Contains 1 occurrence of an XML element named according to the selected CMD profile||
|| ||||||`<{CMDProfile}>` ||`xs:complexType`||1||The XML element housing all the metadata about the described resource, complying with the {CMDProfile} schema ||
|| || ||||`<{CMDElement}>`||Subset of XSD datatypes||0 to unbounded||Atomic piece of information about the described resource||
|| || || ||`@xml:lang`||`xs:string`||0 or 1||Indicates the language of the {CMDElement} content, by a 2 letter language tag from ISO 639||
|| || || ||`@ConceptLink`||`xs:anyURI`||0 or 1||Reference to a concept in an external vocabulary. Used when the value of {CMDElement} is selected from a controlled vocabulary||
|| || ||||`<{CMDComponent}>`||`xs:complexType`||0 to unbounded||A chunk of information about the described resource, recursively composed of CMD Elements and other CMD Components||
|| || || ||`@ComponentId`||`xs:anyURI`||0 or 1||Reference to the specification of {CMDComponent} in the Component Registry. If not present, {CMDComponent} is defined locally within its parent CMD component ||

= The CMDI Component Specification Language (CCSL) =
{{{#!div class="notice system-message"
Responsible for this section: Thomas
}}}

The CMDI Component Specification Language (CCSL) is used to describe a CMD component or CMD profile. Hence, a CCSL document provides the structure for describing an aspect of a resource or (in the case of a profile specification) the complete structure of the CMD instance. It is also the basis for the generation of the XML schema file that is used to validate a CMD instance (see section ''Transformation of CCSL into a CMD profile schema definition'' for details). A CCSL document `MUST` contain a CCSL header and the actual CMD component description. Its root element `MUST` contain an XML attribute ''isProfile'' to indicate whether the document specifies a CMD profile or a CMD component.
{{{#!comment TODO: UML diagram of CCSL "Figure XY show the relation of the individual elements of the CCSL." }}} ||||= Name =||= Valuetype =||= Occurrences =||= Description =|| |||| `` || `xs:complexType` || 1 || Root element of the CCSL document || || || `
` || `xs:complexType` || 1 || Header of the component specification || || || `` || `xs:complexType` || 1 || Definition of the component's structure || || || `@isProfile` || `xs:boolean` || 1 || Indication about the component's status as a profile || === Examples === {{{ #!xml
...
... }}} == CCSL header == The CCSL header provides information relevant to identify and describe the component. This part includes a persistent identifier, the name, the description of the component and information about the status of the specification. The header `MUST` contain an element indicating the component's status in its lifecycle (using the three lifecycles ''development'', ''production'', or ''deprecated'') and `MAY` contain the element ''statusComment'' to contain information about the reason for the current status. In the case of a deprecated specification that was succeeded by a new specification, the identifier of the direct successor `MAY` be stored in the element ''Successor''. ||||= Name =||= Valuetype =||= Occurrences =||= Description =|| |||| `
` || `xs:complexType` || || Descriptive information about the component || || || `` || `xs:anyURI` || 0 or 1 || ID of the component specification || || || `` || `xs:string` || 0 or 1 || Name of the component || || || `` || `xs:string` || 0 or 1 || Description of the component || || || `` || `xs:string` ("development", "production", "deprecated") || 1 || Status in lifecycle || || || `` || `xs:string` || 0 or 1 || Comment about the status || || || `` || `xs:anyURI` || 0 or 1 || ID of successor component, if available || === Examples === {{{ #!xml
clarin.eu:cr1:p_1311927752306 ToolService Description of a tool and/or service(s) (adapted from the AnnotationTool profile) production
}}} {{{ #!xml
clarin.eu:cr1:p_1311927752306 ToolService Description of a tool and/or service(s) (adapted from the AnnotationTool profile) deprecated clarin.eu:cr1:p_1234567890
}}}

== CMD component definition ==
Components are defined as a sequence of elements which `MAY` be followed by other components. The latter is allowed because components may be embedded in other components. If an already defined CMD component (i.e. a CMD component with its own identifier) should be referenced, `@ComponentId` `MUST` be used to indicate its identifier. If this is not the case, the specification of a CMD component `MAY` contain the name of the component, the component's identifier, a concept link, and information about the allowed cardinality of the component. Furthermore, documentation texts and further CMD attributes `MAY` be specified.

||||= Name =||= Valuetype =||= Occurrences =||= Description =||
|||| `` || `xs:complexType` || || Root element of every CMD component definition ||
|| || `@name` || `xs:Name` || 0 or 1 || Name of the component ||
|| || `@ComponentId` || `xs:anyURI` || 0 or 1 || Identifier of the component ||
|| || `@ConceptLink` || `xs:anyURI` || 0 or 1 || Concept link ||
|| || `@CardinalityMin` || `xs:nonNegativeInteger` || 0 or 1 || Minimum number of times this component has to occur ||
|| || `@CardinalityMax` || `xs:nonNegativeInteger` or “unbounded” || 0 or 1 || Maximum number of times this component may occur ||
|| || `Documentation` || `xs:string` || 0 to unbounded || Documentation about the purpose of the component ||
|| || `AttributeList` || `xs:complexType` || 0 or 1 || Additional attributes specified by the component creator ||

=== Examples ===
{{{
#!xml
... ... ...
}}}

== CMD element definition ==
CMD elements are a template for storing atomic values constrained by a value scheme in a CMD instance. The CCSL specification of a CMD element `MUST` contain the name of the element and `MAY` contain a concept link, the value scheme, and information about the allowed cardinality of the element.
Furthermore it `MAY` be indicated if the element may have different instance values in multiple languages, and hence an unlimited upper cardinality bound. Besides standard XML schema datatypes the value of a CMD element `MAY` be constrained by using regular expressions or vocabularies. The latter can be specified by giving the complete list of allowed values or by stating the URI of an external vocabulary (for details see ''Value restrictions for elements and attributes''). If the instance's content of the element can be derived from other values, the element `AutoValue` `MAY` be used to give indication about the derivation function. The CCSL does not prescribe or suggest a specific set of derivation functions. ||||= Name =||= Valuetype =||= Occurrences =||= Description =|| |||| `` || `xs:complexType` || || Root element of every CMD element definition || || || `@name` || `xs:Name` || 1 || Name of the element || || || `@ConceptLink` || `xs:anyURI` || 0 or 1 || Concept link || || || `@ValueScheme` || Subset of XSD datatypes || 0 or 1 || Allowed data type if simple XML type is used || || || `@CardinalityMin` || `xs:nonNegativeInteger` or "unbounded" || 0 or 1 || Minimum number of times this element has to occur || || || `@CardinalityMax` || `xs:nonNegativeInteger` or "unbounded" || 0 or 1 || Maximum number of times this element may occur || || `@Multilingual` || `xs:boolean` || 0 or 1 || Indication that the element can have values in multiple languages || || || `` || `xs:string` || 0 to unbounded || Documentation about the purpose of the element || || || `` || `xs:complexType` || 0 or 1 || Additional attributes specified by the component creator || || || `` || `xs:complexType` || 0 or 1 || Value restrictions based on a regular expression or a specified vocabulary. See ''Value restrictions for elements and attributes'' for details. 
|| || || `` || `xs:string` || 0 to unbounded || Derivation rules for the element's content || === Examples === {{{ #!xml The name of the web service or set of web services. }}} == CMD attribute definition == Both the CMD element and component description allow the specification of additional CMD attributes. Every CMD attribute definition `MUST` contain a `@name` attribute and `MAY` contain other attributes or elements for a more detailled description. ||||= Name =||= Valuetype =||= Occurrences =||= Description =|| |||| `` || `xs:complexType` || || Root element of every CMD attribute definition || || || `@name` || `xs:Name` || 1 || Name of the attribute || || || `@ConceptLink` || `xs:anyURI` || 0 or 1 || Concept link || || || `@ValueScheme` || Subset of XSD datatypes || 0 or 1 || Allowed data type if simple XML type is used || || || `@Required` || `xs:boolean` || 0 or 1 || Indication if attribute is required || || || `` || `xs:string` || 0 to unbounded || Documentation about the purpose of the attribute || || || `` || `xs:complexType` || 0 or 1 ||Value restrictions based on a regular expression or a specified vocabulary. See ''Value restrictions for elements and attributes'' for details. || || || `` || `xs:string` || 0 to unbounded || Derivation rules for the attribute's content || === Examples === {{{ #!xml ... }}} == Value restrictions for elements and attributes == Apart from standard XML schema datatypes the content of a CMD element or attribute instance can be restricted by two means. The `` element `MAY` contain either an XML element `` with the specification of a regular expression the element/attribute should comply with, or the definition of a controlled vocabulary of allowed values. CMDI 1.2 supports two approaches to describe such a vocabulary: * specifying all allowed values with `OPTIONAL` attributes for every value to include a concept link and a description of the specific value, or * referring to an external vocabulary via a URI specified in `@URI`. 
`OPTIONAL` XML attributes `@ValueProperty` and `@ValueLanguage` `MAY` be used to give more information about preferred label and language in the chosen vocabulary. {{{#!comment TODO: Refer to XSD }}} ||||||||||= Name =||= Valuetype =||= Occurrences =||= Description =|| |||||||||| `` || `xs:complexType` || || Specification of the value scheme of an element or attribute. || || |||||||| `` || `xs:string` || 0 or 1 || Specification of a regular expression the element/attribute should comply with. || || |||||||| `` || `xs:complexType` || 0 or 1 || Specification of a CMD vocabulary || || || |||||| `` || `xs:complexType` || 0 or 1 || Enumeration of items from a controlled vocabulary || || || || |||| `` || `xs:string` || 0 to unbounded || An item from a controlled vocabulary || || || || || || `@ConceptLink` || `xs:anyURI` || 0 or 1 || Concept link of item value || || || || || || `@AppInfo` || `xs:string` || 0 or 1 || End-user guidance about the value of this controlled vocabulary item. || || || || |||| `` || `xs:string` || 0 to unbounded || End-user guidance about the value of the controlled vocabulary as a whole. Currently not used. || || || |||||| `@URI` || `xs:anyURI` || 0 or 1 || URI of an external vocabulary || || || |||||| `@ValueProperty` || `xs:string` || 0 or 1 || preferred label in the external vocabulary || || || |||||| `@ValueLanguage` || `xs:language` || 0 or 1 || preferred language in the external vocabulary || === Examples === {{{ #!xml aaa aab aac aad aae aaf ... }}} {{{ #!xml ((\\p{L}|\\p{N}|\\p{P}|\\p{S})+|\\s)+ }}} == Cues attributes == All CMD attribute, element, and component specifications may contain additional attributes with the namespace “http://www.clarin.eu/cmd/cues/1”. These `MAY` be used to give information about how the payload contained in the respective part of the CMD instance should be presented. Cues are grouped in component specific styles. Different styles for the same CMD component `MAY` be developed. 
The CCSL does not prescribe or suggest a specific set of cue attributes.

=== Examples ===
{{{
#!xml
...
}}}

= Transformation of CCSL into a CMD profile schema definition =
A CMD instance document that is serialised as XML according to this specification `SHOULD` contain a reference to the location of a CMD profile schema. The infrastructure `MUST` provide a mechanism to derive such a schema for any specific CMD profile on the basis of its definition and that of the CMD components that it references. This section specifies how different aspects of a CMD specification should be transformed into elements of a schema definition. The primary schema language targeted is XML Schema, although the infrastructure `MAY` provide support for other schema languages, such as DDML or Relax NG.

CMD profile schemas `SHOULD NOT` (`MUST NOT`?) be derived from CMD specifications that are not CMD profiles.

The transformation as described here is assumed to take place on the fully expanded CMD profile, i.e. a version of the specification in which all referenced (non-inline) CMD component definitions are resolved and substituted, recursively, by their full definitions.

== General properties of the CMD profile schema definition ==
A CMD profile schema `MUST` be a single document {or set of linked documents with a single entry point}(?) that allows for the evaluation of a CMD instance on all levels of description defined in one specific CMD profile. The schema `MUST` require the presence of a CMD instance envelope as described in section "Structure of CMDI-files". The value of the `` header item in the CMD instance envelope `MUST` only be valid if it is equal to the profile id as specified in the associated CMD profile. The CMD profile schema `SHOULD` include, as a matter of annotation, a copy of (a subset of) the information contained in the `Header` section of the CMD profile from which it is derived.
The transformation `MAY` make use of embedded component identifiers in the CMD component definition to derive (complex) types that can be reused throughout the schema definition.

The schema `MUST` declare a profile specific payload namespace in addition to the fixed, global namespaces that are used (in particular `cmd` and `cue`). This namespace, with `RECOMMENDED` prefix `cmdp`, `MUST` have the following format: `http://www.clarin.eu/cmd/1/profiles/{profileId}`, where `{profileId}` refers to the identifier of the profile from which the schema is derived in the Component Registry. All XML elements and XML attributes derived from CMD components, CMD elements and CMD attributes `MUST` be declared in this namespace.

== Interpretation of CMD component definitions in the CCSL ==
{{{#!div class="notice system-message"
Responsible for this section: Twan
}}}

CMD components, which are represented as `` XML elements in the CCSL, `MUST` be realised as XML element declarations with the following property mapping:

||= Property =||= XML schema attribute =||= Derived from =||= Use =||
|| Name of the XML element || `@name` || `@name` || `REQUIRED` ||
|| Minimal number of occurrences || `@minOccurs` || `@CardinalityMin`, or '1' if XML attribute not present || `REQUIRED` ^[#ioccditc-note1 1]^ ||
|| Maximal number of occurrences || `@maxOccurs` || `@CardinalityMax`, or '1' if XML attribute not present || `REQUIRED` ^[#ioccditc-note1 1]^ ||
|| Concept link || `@dcr:datcat` || `@ConceptLink` || `OPTIONAL` ||
|| Component id || `@cmd:ComponentId` || `@ComponentId` || `OPTIONAL` ||

^[=#ioccditc-note1 1]^The implementation may rely on the default evaluation of the schema language if it matches these requirements, as is the case with XML Schema, and therefore omit explicit declaration of these properties.

An optional XML attribute `@cmd:ref` of type ''xs:IDREFS'' `MUST` be allowed on the XML container element derived from any CMD component.
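The namespace format and the component property mapping above can be illustrated with a minimal Python sketch. It assumes the CCSL attributes of a component have already been read into a plain dictionary; the function names and the profile identifier used in the usage note are hypothetical and only serve the illustration.

```python
# Sketch only: function names and the dict-based input are assumptions,
# not part of the specification. Attribute names and the namespace
# format are taken from the text above.
PROFILE_NS_TEMPLATE = "http://www.clarin.eu/cmd/1/profiles/{profileId}"


def profile_namespace(profile_id):
    """Build the profile specific payload namespace for a profile id."""
    return PROFILE_NS_TEMPLATE.format(profileId=profile_id)


def component_declaration(ccsl_attrs):
    """Map the CCSL attributes of a CMD component to schema attributes."""
    decl = {
        "name": ccsl_attrs["name"],  # REQUIRED
        # Both cardinalities default to '1' when the attribute is absent.
        "minOccurs": ccsl_attrs.get("CardinalityMin", "1"),
        "maxOccurs": ccsl_attrs.get("CardinalityMax", "1"),
    }
    # OPTIONAL properties are only emitted when present in the CCSL.
    if "ConceptLink" in ccsl_attrs:
        decl["dcr:datcat"] = ccsl_attrs["ConceptLink"]
    if "ComponentId" in ccsl_attrs:
        decl["cmd:ComponentId"] = ccsl_attrs["ComponentId"]
    return decl
```

For a made-up profile identifier `p_123`, `profile_namespace("p_123")` yields `http://www.clarin.eu/cmd/1/profiles/p_123`, and a component with only a `name` is mapped to a declaration with both occurrence bounds set to '1'.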
`` XML elements contained in CMD components `SHOULD` be transformed into documentation elements embedded in the XML element declaration. In these, the content language information contained in the `@xml:lang` XML attribute `SHOULD` be preserved.

XML attributes of CMD components in the 'cue' namespace `SHOULD` be copied into the XML element declaration, in which case the XML attribute name, namespace and value `SHOULD` be preserved.

=== Document structure prescribed by the schema ===
The first CMD component defined in the CMD profile (the "root component") `MUST` be mapped as the mandatory, only direct descendant of the `` XML element of the CMD instance envelope.

CMD components that are defined as direct descendants of another CMD component `MUST` be mapped as direct descendants of the XML element declaration to which that parent component is transformed. The XML elements derived from CMD components `MUST` be required to appear in the metadata instance in the same order as defined in the CMD specification, with the first of them appearing after the last XML element derived from a CMD element at the same level, if present. These descendant CMD components `MUST` also be mapped to XML element declarations recursively as described in this specification.

CMD elements `MUST` be mapped as direct descendants of the XML element declaration derived from the CMD component of which they are direct descendants, and `MUST` be required to be included in the same order as defined in the CMD specification.

CMD attributes that are defined in the CCSL within `` XML elements within an `` XML element that is a direct descendant of a CMD component `MUST` be mapped to XML attribute definitions on the XML container element to which that CMD component is transformed.
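The ordering rules of this section can be sketched as a short recursive function. The nested dictionary model (keys `name`, `elements`, `components`) is a hypothetical in-memory representation of an expanded CMD profile, chosen for the illustration; it is not a format defined by this specification.

```python
# Sketch only: the dict keys 'name', 'elements' and 'components' are an
# assumed in-memory model of an expanded CMD profile.
def outline(component, depth=0):
    """List the XML element names in the order the schema would require.

    At each level, elements derived from CMD elements come first (in
    definition order), followed by elements derived from child CMD
    components (in definition order), which are expanded recursively.
    """
    indent = "  "
    lines = [indent * depth + component["name"]]
    for element in component.get("elements", []):
        lines.append(indent * (depth + 1) + element["name"])
    for child in component.get("components", []):
        lines.extend(outline(child, depth + 1))
    return lines


# Made-up root component with one CMD element and one child component.
root = {
    "name": "Example",
    "elements": [{"name": "Title"}],
    "components": [{"name": "Contact", "elements": [{"name": "Email"}]}],
}
```

Calling `outline(root)` returns the root name first, then `Title`, then `Contact` with its nested `Email`, mirroring the rule that child components follow the last CMD element at the same level.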
== Interpretation of CMD element definitions in the CCSL ==
CMD elements, represented as `` XML elements in the CCSL, `MUST` be realised as XML element declarations with the following property mapping:

||= Property =||= XML schema attribute =||= Derived from =||= Use =||
|| Name of the XML element || `@name` || `@name` || `REQUIRED` ||
|| Minimal number of occurrences || `@minOccurs` || `@CardinalityMin`, or '1' if XML attribute not present || `REQUIRED` ^[#ioceditc-note1 1]^ ||
|| Maximal number of occurrences || `@maxOccurs` || `@CardinalityMax` '''unless''' `@Multilingual` is true,\\in which case `MUST` be 'unbounded',\\or '1' if neither XML attribute is present || `REQUIRED` ^[#ioceditc-note1 1]^ ||
|| Type of the XML element || `@type` || See section 'Content model' || ||
|| Concept link || `@dcr:datcat` || `@ConceptLink` || `OPTIONAL` ||
|| Auto value instruction || `@cmd:AutoValue` || `@AutoValue` || `OPTIONAL` ||

^[=#ioceditc-note1 1]^The implementation may rely on the default evaluation of the schema language if it matches these requirements, as is the case with XML Schema, and therefore omit explicit declaration of these properties.

`` XML elements contained in CMD elements `SHOULD` be transformed into documentation elements embedded in the XML element declaration. In these, the content language information contained in the `@xml:lang` XML attribute `SHOULD` be preserved.

XML attributes of CMD elements in the 'cue' namespace `SHOULD` be copied into the XML element declaration, in which case the XML attribute name, namespace and value `SHOULD` be preserved.

An optional XML attribute `@cmd:ValueConceptLink` of type ''xs:anyURI'' `MUST` be allowed on the XML element derived from a CMD element that has a vocabulary with XML attribute `@URI` defined (see section "Content model for CMD elements and CMD attributes in the schema definition").

The derivation of a content model for the XML element declaration on the basis of a CMD element is described below.
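A minimal Python sketch of the element mapping follows. It assumes that the `@Multilingual` override applies to `@maxOccurs`, since XML Schema only permits the value 'unbounded' there, and that the CCSL attributes are available as a plain dictionary of strings. The function name and the `allowsValueConceptLink` key are hypothetical.

```python
# Sketch only: function name, dict-based input and the
# 'allowsValueConceptLink' key are assumptions for illustration.
def element_declaration(ccsl_attrs, vocabulary_uri=None):
    """Map the CCSL attributes of a CMD element to schema attributes."""
    # xs:boolean accepts both 'true' and '1' as true values.
    multilingual = ccsl_attrs.get("Multilingual", "false") in ("true", "1")
    decl = {
        "name": ccsl_attrs["name"],
        "minOccurs": ccsl_attrs.get("CardinalityMin", "1"),
        # Assumed reading: a multilingual element may repeat without limit.
        "maxOccurs": "unbounded" if multilingual
        else ccsl_attrs.get("CardinalityMax", "1"),
    }
    if "ConceptLink" in ccsl_attrs:
        decl["dcr:datcat"] = ccsl_attrs["ConceptLink"]
    if "AutoValue" in ccsl_attrs:
        decl["cmd:AutoValue"] = ccsl_attrs["AutoValue"]
    # @cmd:ValueConceptLink is allowed only when the element refers to
    # an external vocabulary through @URI.
    decl["allowsValueConceptLink"] = vocabulary_uri is not None
    return decl
```

Under these assumptions, an element with `Multilingual` set to `true` and `CardinalityMax` set to `1` still receives `maxOccurs` 'unbounded'.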
== Interpretation of CMD attribute definitions in the CCSL ==
CMD attributes, represented as `` XML elements in the CCSL, `MUST` be realised as XML attribute declarations with the following property mapping:

||= Property =||= XML schema attribute =||= Derived from =||= Use =||
|| Name of the XML attribute || `@name` || `@name` || `REQUIRED` ||
|| Use of the XML attribute || `@use` || 'required' if and only if `@Required` is present and equals true, otherwise 'optional' || `REQUIRED` ^[#iocaditc-note1 1]^ ||
|| Type of the XML attribute || `@type` |||| See section 'Content model' ||
|| Concept link || `@dcr:datcat` || `@ConceptLink` || `OPTIONAL` ||
|| Auto value instruction || `@cmd:AutoValue` || `@AutoValue` || `OPTIONAL` ||

^[=#iocaditc-note1 1]^The implementation may rely on the default evaluation of the schema language if it matches these requirements, as is the case with XML Schema, and therefore omit explicit declaration of these properties.

`` XML elements contained in CMD attributes `SHOULD` be transformed into documentation elements embedded in the XML attribute declaration. In these, the content language information contained in the `@xml:lang` XML attribute `SHOULD` be preserved.

XML attributes of CMD attributes in the 'cue' namespace `SHOULD` be copied into the XML attribute declaration, in which case the XML attribute name, namespace and value `SHOULD` be preserved.

The derivation of a content model for the XML attribute declaration on the basis of a CMD attribute is described below.

== Content model for CMD elements and CMD attributes in the schema definition ==
If a CMD element or CMD attribute in the CCSL has a `@ValueScheme` XML attribute, its value `MUST` be interpreted as the name of the XML Schema datatype (declared in the `@type` attribute of the XML element or attribute declaration in XML Schema) that defines the allowed value range of the XML element/attribute derived from the CMD element/attribute.
'''Otherwise''', if a CMD element or CMD attribute in the CCSL has a descendant XML element `` that contains an XML element ``, then its text value `MUST` be interpreted as the XML Schema regular expression that defines the allowed value range of the XML element/attribute derived from this CMD element/attribute.

'''Otherwise''', if a CMD element or CMD attribute in the CCSL has a descendant XML element `` that contains an XML element ``:
 * The XML attribute `@URI` of the XML element ``, if present, `SHOULD` be transformed into an attribute `cmd:Vocabulary`^[#cm-note1 1]^ of the same value on the XML element or attribute declaration in the schema.
 * The XML attributes `@ValueProperty` and `@ValueLanguage` of the XML element `` `SHOULD` be transformed into XML attributes^[#cm-note1 1]^ in the 'cmd:' namespace on the XML element declaration in the case of a CMD element, or on the XML attribute declaration in the case of a CMD attribute.
 * The XML elements `` that are descendants of `` contained in `` `MUST` be transformed into an enumeration based restriction with values taken from the text content of the `` XML elements. Each enumeration item in the schema `SHOULD` be annotated with the value of the XML attribute `@ConceptLink` by means of an XML attribute `@dcr:datcat`^[#cm-note1 1]^ and with the value of the XML attribute `@AppInfo` by means of an attribute `@cmd:label`.
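The precedence of the three cases above (datatype, then regular expression, then enumeration) can be sketched in Python as follows. The function name, its parameters and the shape of the returned dictionaries are assumptions made for illustration; enumeration items are modelled as dictionaries with a `value` key and optional `ConceptLink` and `AppInfo` keys.

```python
# Sketch only: names, parameters and return shapes are hypothetical.
def content_model(value_scheme=None, pattern=None, items=None):
    """Pick the content model for a CMD element or attribute.

    The first applicable rule wins: an explicit datatype, then a regular
    expression, then an enumeration of vocabulary items.
    """
    if value_scheme is not None:
        return {"kind": "datatype", "type": value_scheme}
    if pattern is not None:
        return {"kind": "pattern", "value": pattern}
    if items is not None:
        enumeration = []
        for item in items:
            entry = {"value": item["value"]}
            # Annotations are only carried over when present.
            if "ConceptLink" in item:
                entry["dcr:datcat"] = item["ConceptLink"]
            if "AppInfo" in item:
                entry["cmd:label"] = item["AppInfo"]
            enumeration.append(entry)
        return {"kind": "enumeration", "items": enumeration}
    # None of the three rules applies; this case is outside the sketch.
    return None
```

For instance, when both a datatype and a pattern are supplied, the datatype is used and the pattern is ignored, which mirrors the "Otherwise" chain in the text.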
^[=#cm-note1 1]^The attributes `@cmd:Vocabulary`, `@cmd:ValueProperty`, `@cmd:ValueLanguage` and `@dcr:datcat` should be present in the schema only; they should not be declared as attributes allowed in the CMD instance.

= Appendices =
{{{#!comment
ISO spec has copy of general component schema and instance XML example, removed here
}}}

= Bibliography =
IETF RFC 2045, Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies

IETF RFC 2046, Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types

IETF RFC 5646, Tags for Identifying Languages

ISO 639‐1, Codes for the representation of names of languages — Part 1: Alpha-2 code

ISO 639‐3, Codes for the representation of names of languages -- Part 3: Alpha-3 code for comprehensive coverage of languages

ISO 3166‐1, Codes for the representation of names of countries and their subdivisions — Part 1: Country codes

ISO 8601, Data elements and interchange formats — Information interchange — Representation of dates and times

ISO/IEC 10646‐1, Information technology — Universal Multiple-Octet Coded Character Set (UCS) — Part 1: Architecture and Basic Multilingual Plane

XML Schema Part 2: Datatypes, Biron, P.V. and Malhotra, A. (eds.), W3C Recommendation 02 May 2001, available at