Component Metadata Infrastructure (CMDI) 1.2 [DRAFT]
Introduction
The goal of the Component Metadata Infrastructure (CMDI) specification...
TODO
History
TODO
Terminology
The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in RFC2119.
Glossary
- CMD model, Component Metadata model
- The component based metadata model described in the present specification
- CMDI, Component Metadata Infrastructure
- Metadata description framework consisting of the CMD model and infrastructure
- CCSL, CMDI Component Specification Language
- XML based language for describing components according to the CMD model
- CLARIN
- The infrastructure governed by the CLARIN ERIC
- http://www.clarin.eu
- resource, language resource
- A (digitally) accessible entity that can be described in terms of its content and technical properties, referenced by a Uniform Resource Identifier
- digital object
- Resource in a repository stored in one repository container that can be addressed by an identifier; a digital object can be seen as a generalization of a directory in a file system containing one or more files which are the data stream(s). Digital objects can exist in databases, hence the comparison to directory and file structures falls short.
- metadata
- A description of a resource, usually given as a set of properties in the form of attribute-value pairs. This description may contain information about the resource, aspects or parts of the resource and/or artefacts and actors connected to the resource.
- persistent identifier, PID
- Unique Uniform Resource Identifier that assures permanent access for a digital object by providing access to it independently of its physical location or current ownership
- concept
- An abstract or generic idea generalized from particular instances (source: Merriam-Webster)
- semantic registry
- A list/directory/system maintaining (authoritative) definitions of terms, concepts or data categories. These registries should also provide persistent identifiers for their entries.
- concept link
- A reference from a CMD profile, CMD component, CMD element, CMD attribute or a value in a controlled vocabulary to an entry in a semantic registry via its persistent identifier.
- CLARIN Concept Registry
- The semantic registry maintaining concepts used/central to the CLARIN infrastructure
- http://clarin.eu/ccr
- XML
- Markup language standard as described by W3C recommendation http://www.w3.org/TR/xml/
- XML document
- ...
- XML element
- A constituent of an XML document as defined in W3C recommendation http://www.w3.org/TR/xml/ (distinct from a CMD element)
- XML schema datatype
- A predefined set of permissible content within a section of an XML document as described in http://www.w3.org/TR/xmlschema-2/
- XML container element
- An XML element that has one or more XML elements as its descendants
- XML attribute
- A property of an XML element as defined in W3C recommendation http://www.w3.org/TR/xml/ (distinct from a CMD attribute)
- Uniform Resource Identifier, URI
- An identifier for resources as described in RFC3986
- namespace
- An XML namespace as described in http://www.w3.org/TR/xml-names/
- CMD instance, metadata instance, CMDI file, metadata record, CMD record
- A file that conforms to the general CMDI instance structure as described in this specification, and at the instance payload level follows the specific structure defined by the CMD specification it relates to
- Instance header
- The section of a metadata instance marked as ‘header’, providing information on that metadata instance as such, not the resource that is described by the metadata file
- Resource proxy, CMD resource reference
- A representation of a resource within a metadata instance containing a Uniform Resource Identifier as a reference to the resource itself and a specification of its type (one of: Resource, Metadata, SearchPage, SearchService, LandingPage)
- Resource proxy reference
- A reference from any point within the instance payload to any of the resource proxies
- Instance payload(?)
- The section of a metadata instance that follows the structure defined by the profile it references and contains the description of the resources to which that metadata instance relates
- CMD specification, component specification/definition, profile specification/definition
- The implementation of a CMD component or CMD profile by means of the CCSL
- Specification header, component header, profile header
- The section of a CMD specification marked as ‘header’, providing information on that specification as such that is not part of the defined structure
- CMD component, component
- A reusable, structured template for the description of (an aspect of) a resource, defined by means of a CMD specification document with the potential of embedding other components by reference
- CMD profile, profile definition, profile
- A CMD component that is used to describe a class of resources and is not embedded into other components, and therefore provides the complete structure for an instance payload
- CMD element, element definition
- A unit of a CMD component that describes the level of the metadata instance that can carry atomic values constrained by a value scheme, and does not contain further levels except for that of the CMD attribute
- CMD attribute
- A unit of a CMD element that describes the level at which properties of a CMD element can be provided by means of value scheme constrained atomic values.
- value scheme
- A set of constraints governing the range of values allowed for a specific CMD element or CMD attribute in a metadata instance, expressed in terms of an XML schema datatype, controlled vocabulary, or regular expression
- controlled vocabulary, closed/open vocabulary
- A set of values that can be used either to constrain the set of permissible values or to provide suggestions for applicable values in a given context
- regular expression
- An expression that constrains the set of permissible values, as described in XML Schema Regular Expressions http://www.w3.org/TR/xmlschema-2/#regexs
Normative References
- RFC2119
-
Key words for use in RFCs to Indicate Requirement Levels, IETF RFC 2119, March 1997,
http://www.ietf.org/rfc/rfc2119.txt
- XML-Namespaces
-
Namespaces in XML 1.0 (Third Edition), W3C, 8 December 2009,
http://www.w3.org/TR/2009/REC-xml-names-20091208/
Non-Normative References
- RFC3023
-
XML Media Types, IETF RFC 3023, January 2001,
http://www.ietf.org/rfc/rfc3023.txt
Typographic and XML Namespace conventions
The following typographic conventions for XML fragments will be used throughout this specification:
<prefix:Element>
An XML element with the Generic Identifier Element that is bound to an XML namespace denoted by the prefix prefix.
@attr
An XML attribute with the name attr.
string
The literal string must be used either as element content or attribute value.
The following XML namespace names and prefixes are used throughout this specification. The column "Recommended Syntax" indicates which syntax variant SHOULD be used by the Endpoint to serialize the XML response.
Prefix | Namespace Name | Comment | Recommended Syntax |
---|---|---|---|
cmd | http://clarin.eu/cmd | CMDI instance | prefixed |
TODO: update namespaces
Structure of CMDI-files
A CMDI file contains the actual metadata of one specific resource (hereafter referred to as the described resource), and might also be referred to as a CMDI record. All CMDI files have the same structure at the top level. At a lower level, parts of its structure are defined by the CMDI profile upon which it is based.
The main structure
A CMDI file has the root element CMD with 4 subelements:
- The Header element, containing certain administrative information about the CMDI file, i.e. metadata about the file itself
- The Resources element, listing resource proxies and their interrelations by means of the following subelements:
  - ResourceProxyList, containing a list of ResourceProxy elements, each referencing a file contained in or closely related to the described resource
  - JournalFileProxyList, containing a list of JournalFileProxy elements, each referencing a file (“journal file”) containing provenance information about the described resource
  - ResourceRelationList, containing a list of ResourceRelation elements, each representing a relationship between 2 resource files (as listed in the ResourceProxyList)
- IsPartOfList, containing a list of IsPartOf elements, each referencing a larger external resource of which the described resource (as a whole) forms a part
- Components, containing one subelement that corresponds to, and is in turn structured according to, the CMDI profile applied.
The profile substructure exists in the profile-specific namespace; everything else is in the cmd namespace.
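As a rough illustration of the envelope described above, the following Python sketch builds an empty CMD skeleton with the standard library and lists its top-level children. The namespace URI is the one from the prefix table in this draft (which is still marked for update), the order of the four subelements is an assumption, and the helper names `build_envelope` and `local_names` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the prefix table of this draft; it is flagged there
# as still to be updated, so treat it as a placeholder.
CMD_NS = "http://clarin.eu/cmd"


def build_envelope() -> ET.Element:
    """Build an empty CMD envelope (hypothetical helper, illustration only)."""
    ET.register_namespace("cmd", CMD_NS)
    root = ET.Element(f"{{{CMD_NS}}}CMD")
    ET.SubElement(root, f"{{{CMD_NS}}}Header")
    resources = ET.SubElement(root, f"{{{CMD_NS}}}Resources")
    for name in ("ResourceProxyList", "JournalFileProxyList", "ResourceRelationList"):
        ET.SubElement(resources, f"{{{CMD_NS}}}{name}")
    # Assumption: IsPartOfList sits directly under CMD, following the note in
    # this draft that it is no longer part of the Resources section.
    ET.SubElement(root, f"{{{CMD_NS}}}IsPartOfList")
    # The profile-specific payload would be placed inside Components,
    # in the namespace of the profile.
    ET.SubElement(root, f"{{{CMD_NS}}}Components")
    return root


def local_names(parent: ET.Element) -> list[str]:
    """Return the namespace-free names of the direct children of parent."""
    return [child.tag.split("}", 1)[1] for child in parent]


envelope = build_envelope()
print(local_names(envelope))
```

The sketch only shows the fixed outer structure; everything below Components depends on the profile and is left out on purpose.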
<About local attributes here>
In the following the main parts are described in detail
The header
Name | MdCreator |
Description | Denotes the creator of this metadata file |
Value type | A string |
Occurrences | 0 to unbounded |
Attributes |
State purpose of header List elements in a table, giving name, "definition", type, cardinality for each
The resources section
The Resource proxy list
State purpose of Resource Proxy list (and which files should be listed here) Specify in detail how resource proxies are represented:
- all possible elements and attributes with definition, type, cardinality/obligation
The Journal File Proxy List
State purpose of Journal File Proxy list (and which files should be listed here) Specify in detail how resource proxies are represented:
- all possible elements and attributes with definition, type, cardinality/obligation
The Resource Relation List
State purpose of Resource Relation List (representing binary relations between resource (proxies) and/or other resources) Specify in detail how resource relations are represented:
- all possible elements and attributes with definition, type, cardinality/obligation
The Is-Part-of List
State purpose of Is-Part-of List (representing external resources that the described resource is a part of) (NOTE: IsPartOfList no longer in Resources section) Specify in detail how an Is-part-of relation is represented:
- all possible elements and attributes with definition, type, cardinality/obligation
The components
State purpose of components section, and its dependency upon profile (as given in header: MdProfile)
The CMDI Component Specification Language
The CMDI Component Specification Language (CCSL) is used to describe a CMD component or CMD profile. Hence, a CCSL document provides the structure of an aspect of a resource or (in the case of a profile specification) the complete structure of the instance's payload. It is also the basis for the generation of the XML schema file that is used to validate a CMD instance (see section Transformation of CCSL into a CMD profile schema for details). A CCSL document consists of two sections, the CCSL header and the actual CMD component description. Its root element must contain an XML attribute isProfile to indicate whether the document specifies a CMD profile or a CMD component. Figure XY shows the relation of the individual elements of the CCSL.
CCSL header
The CCSL header provides the information needed to identify and describe the component. It includes a persistent identifier, the name, and a description of the component. The header also carries information about the status of the specification: a mandatory element Status indicating the component's state in its lifecycle (one of development, production, or deprecated) and an optional element StatusComment giving the reason for the current status. If a deprecated specification has been succeeded by a new specification, the identifier of the direct successor should be stored in the element Successor. The following table summarises the items allowed in the component header.
Name | Value type | Description |
---|---|---|
ID | xs:anyURI | ID of the component specification |
Name | xs:string | Name of the component |
Description | xs:string | Description of the component |
Status | xs:string ("development", "production", "deprecated") | Status in lifecycle |
StatusComment | xs:string | Comment about the status |
Successor | xs:anyURI | ID of successor component, if available |
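The status rules for the header can be sketched as a small check over a plain dictionary whose keys mirror the table above. This is only an illustration: the function `check_header` and its messages are hypothetical, and the rule that a Successor is only expected on deprecated specifications is a stricter reading than the text itself states.

```python
# Lifecycle values listed for Status in the header table.
ALLOWED_STATUS = {"development", "production", "deprecated"}


def check_header(header: dict) -> list:
    """Return a list of problems found in a CCSL header given as a dict.

    Hypothetical helper, not part of CMDI. Only the status-related rules
    are covered.
    """
    problems = []
    status = header.get("Status")
    if status is None:
        problems.append("Status is missing")
    elif status not in ALLOWED_STATUS:
        problems.append(f"Status has unknown value: {status}")
    # Assumption: a Successor only makes sense for a deprecated specification.
    if header.get("Successor") and status != "deprecated":
        problems.append("Successor given for a non-deprecated specification")
    return problems


print(check_header({"Name": "Actor", "Status": "production"}))
```

An empty result means none of the checked rules were violated; it does not mean the header is complete.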
CMD Component definition
Components are defined as a sequence of elements which may be followed by other components. The latter is allowed because components may be embedded in other components. The specification of a CMD component contains the name of the component, the component's identifier, an optional concept link, and information about the allowed cardinality of the component. Furthermore, documentation texts and further CMD attributes may be specified. The following table contains a summary of allowed specifications for a CMD component.
Name | Element/Attribute | Value type | Description |
---|---|---|---|
name | Attribute | xs:Name | Name of the component |
ComponentId | Attribute | xs:anyURI | Identifier of the component |
ConceptLink | Attribute | xs:anyURI | Concept link |
CardinalityMin | Attribute | xs:nonNegativeInteger | Minimum number of times this component has to occur |
CardinalityMax | Attribute | xs:nonNegativeInteger or “unbounded” | Maximum number of times this component may occur |
Documentation | Element | xs:string | Documentation about the purpose of the component |
AttributeList | Element | xs:complexType | Additional attributes specified by the component creator |
CMD element definition
CMD elements are templates for storing atomic values constrained by a value scheme in a CMD instance. All relevant information and restrictions for such an element are contained in the CMD element definition. Most of this information is stored in XML attributes. This includes the mandatory name of the element, an optional concept link, the value scheme, and information about the allowed cardinality of the element. Furthermore, it can be indicated whether the element may have different instance values in multiple languages, and hence an unlimited upper cardinality bound. Besides standard XML schema datatypes, the value of a CMD element can be constrained by using regular expressions or vocabularies. The latter can be specified by giving the complete list of allowed values or by stating the URI of an external vocabulary (for details see Value restrictions for elements and attributes). If the instance's content of the element can be derived from other values, the element AutoValue may be used to indicate the derivation function. The CCSL does not prescribe or suggest a specific set of derivation functions. The following table contains a summary of allowed specifications for a CMD element.
Name | Element/Attribute | Value type | Description |
---|---|---|---|
name | Attribute | xs:Name | Name of the element |
ConceptLink | Attribute | xs:anyURI | Concept link |
ValueScheme | Attribute | Subset of XSD datatypes | Allowed data type if simple XML type is used |
CardinalityMin | Attribute | xs:nonNegativeInteger or "unbounded" | Minimum number of times this element has to occur |
CardinalityMax | Attribute | xs:nonNegativeInteger or "unbounded" | Maximum number of times this element may occur |
Multilingual | Attribute | xs:boolean | Indication that the element can have values in multiple languages |
Documentation | Element | xs:string | Documentation about the purpose of the element |
AttributeList | Element | xs:complexType | Additional attributes specified by the component creator |
ValueScheme | Element | xs:complexType | Value restrictions based on a regular expression or a specified vocabulary |
AutoValue | Element | xs:string | Derivation rules for the element's content |
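To make the table more concrete, the sketch below reads one made-up CMD element definition with Python's standard library. The XML attribute and child element names follow the table; the fragment itself, its values (including the form of the ValueScheme value and the concept link URI) and the dictionary keys are invented for illustration.

```python
import xml.etree.ElementTree as ET

# A made-up CMD element definition. Attribute names follow the table above;
# all values, including the concept link, are invented.
CCSL_FRAGMENT = """
<Element name="title" ValueScheme="string" CardinalityMin="1"
         CardinalityMax="unbounded" Multilingual="true"
         ConceptLink="http://example.org/concept/title">
  <Documentation>Title of the described resource</Documentation>
</Element>
"""

element = ET.fromstring(CCSL_FRAGMENT)

# Collect the properties of the definition in a plain dict.
definition = {
    "name": element.get("name"),
    "concept_link": element.get("ConceptLink"),
    "value_scheme": element.get("ValueScheme"),
    "cardinality_min": element.get("CardinalityMin"),
    "cardinality_max": element.get("CardinalityMax"),
    # xs:boolean accepts both "true" and "1" as true.
    "multilingual": element.get("Multilingual") in ("true", "1"),
    "documentation": element.findtext("Documentation"),
}
print(definition)
```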
CMD attribute definition
Both the CMD element and component description allow the specification of additional CMD attributes. Every CMD attribute is specified using similar attributes and elements as for CMD elements. The following table contains a summary of allowed specifications for a CMD attribute.
Name | Element/Attribute | Value type | Description |
---|---|---|---|
name | Attribute | xs:Name | Name of the attribute |
ConceptLink | Attribute | xs:anyURI | Concept link |
ValueScheme | Attribute | Subset of XSD datatypes | Allowed data type if simple XML type is used |
Required | Attribute | xs:boolean | Indication if attribute is required |
Documentation | Element | xs:string | Documentation about the purpose of the attribute |
ValueScheme | Element | xs:complexType | Value restrictions based on a regular expression or a specified vocabulary |
AutoValue | Element | xs:string | Derivation rules for the attribute's content |
Value restrictions for elements and attributes
Apart from standard XML schema datatypes the content of a CMD element or attribute instance can be restricted by two means. The ValueScheme element may contain either an XML element pattern with the specification of a regular expression the element should comply with, or the definition of a vocabulary of allowed values. CMDI 1.2 supports two approaches to describe such a vocabulary:
- specifying all allowed values with optional attributes for every value to include a concept link and a description of the specific value, or
- referring to an external vocabulary via a URI specified in the attribute URI. Optional XML attributes ValueProperty and ValueLanguage may be used to give more information about preferred label and language in the chosen vocabulary.
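The two kinds of restriction can be illustrated with a small value check. This is a sketch under stated assumptions: `value_allowed` is a hypothetical helper, Python's `re` syntax stands in for XML Schema regular expressions (the two dialects differ in details), and an external vocabulary referenced by URI is not resolved here, only an explicit list of values is handled.

```python
import re
from typing import Iterable, Optional


def value_allowed(value: str,
                  pattern: Optional[str] = None,
                  enumeration: Optional[Iterable[str]] = None) -> bool:
    """Check a value against a pattern and/or a closed list of items.

    Hypothetical helper. XML Schema patterns must match the whole value,
    which is approximated here with re.fullmatch.
    """
    if pattern is not None and re.fullmatch(pattern, value) is None:
        return False
    if enumeration is not None and value not in set(enumeration):
        return False
    # No restriction given, or all given restrictions satisfied.
    return True


print(value_allowed("2014", pattern=r"[0-9]{4}"))
print(value_allowed("text", enumeration=["audio", "video"]))
```

A definition would normally carry only one of the two restrictions; the helper accepts both merely to keep the signature small.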
Cues attributes
All CMD attribute, element, and component specifications may contain additional attributes with the namespace “http://www.clarin.eu/cmdi/cues/display/1.0”. These may be used to give information about how the payload contained in CMD instances should be presented. Different styles for the same CMD component may be developed. The CCSL does not prescribe or suggest a specific set of these cue attributes.
Transformation of CCSL into a CMD profile schema
- A CMD instance document that is serialised as XML according to this specification SHOULD reference the location of a CMD profile schema. The infrastructure MUST provide a mechanism to derive such a schema for any specific CMD profile on the basis of its definition and that of the CMD components that it references. This section specifies how different aspects of a CMD specification should be transformed into elements of a schema definition. The primary schema language targeted is XML Schema, although the infrastructure MAY provide support for other schema languages, such as DDML or Relax NG.
- CMD profile schemas SHOULD NOT (MUST NOT?) be derived from CMD specifications that are not CMD profiles.
- The transformation as described here is assumed to take place on the fully expanded CMD profile, i.e. a version of the specification in which all referenced (non-inline) CMD component definitions have been resolved and substituted, recursively, by their full definitions.
- Global schema properties
- A CMD profile schema MUST be a single document [or set of linked documents with a single entry point](?) that allows for the evaluation of a CMD instance on all levels of description defined in one specific CMD profile.
- The CMD profile schema MUST use the following namespaces:
- {cmd} http://www.clarin.eu/cmd/
- targeted namespace
- {ann} http://www.clarin.eu
- for annotation and documentation purposes that are outside the scope of instance validation
- {dcr} http://www.isocat.org/ns/dcr
- for embedded semantic annotation
- {cue} http://www.clarin.eu/cmdi/cues/display/1.0
- for display cues
- The CMD profile schema MAY include, as a matter of annotation, a copy of (a subset of) the information contained in the Header section of the CMD profile from which it is defined.
- The CMD profile schema MUST require the presence of a CMD instance envelope as described in section [CMDI Instance/The main structure]. The value of the 'MdProfile' header item MUST only be valid if it is equal to the profile id as specified in the associated CMD profile.
- ☐Transformation MAY make use of component ids to derive (complex) types that can be reused in the schema definition.
- Interpretation of CMD component definitions in the CCSL
- CMD Components, represented as "Component" XML elements in the CCSL, MUST be realised as XML element declarations with the following property mapping:
- MANDATORY: Name of the XML element: @name
- MANDATORY: Minimal number of occurrences: @CardinalityMin, MUST be evaluated as '1' if this XML attribute is missing
- MANDATORY: Maximal number of occurrences: @CardinalityMax unless @Multilingual is true, in which case MUST be 'unbounded', otherwise MUST be evaluated as '1' if @CardinalityMax is not present
- OPTIONAL: Concept link by means of an XML attribute "dcr:datcat" on the XML element within the schema definition: @ConceptLink
- OPTIONAL: Component id by means of an XML attribute "cmd:ComponentId" on the XML element within the schema definition: @ComponentId
- The first CMD component defined in the CMD profile (the 'root component') MUST be mapped as the mandatory, only direct descendant of the "Components" XML element of the CMD instance envelope.
- CMD components that are defined as direct descendants of another CMD component MUST be mapped as direct descendants of the XML element declaration to which it is transformed and MUST be required to be included in the same order as defined in the CMD specification, the first of the resulting XML elements appearing after the last XML element derived from a CMD element at the same level, if present. These CMD Components MUST be mapped to XML element declarations recursively as described in this specification.
- An optional XML Attribute "cmd:ref" of type "xs:IDREFS" MUST be allowed on the XML container element derived from any CMD component.
- "Documentation" XML elements contained in CMD Components MAY be transformed into documentation XML elements embedded in the XML element declaration, in which case the content language information contained in the "xml:lang" XML attribute SHOULD be preserved.
- ☐display cues (all attributes in cue namespace should be copied)
- Interpretation of CMD element definitions in the CCSL
- CMD elements, represented as "Element" XML elements in the CCSL, MUST be realised as XML element declarations with the following property mapping:
- MANDATORY: Name of the XML element: @name
- MANDATORY: Minimal number of occurrences: @CardinalityMin, MUST be evaluated as '1' if this XML attribute is missing
- MANDATORY: Maximal number of occurrences: @CardinalityMax unless @Multilingual is true, in which case MUST be 'unbounded', otherwise MUST be evaluated as '1' if @CardinalityMax is not present
- OPTIONAL: Concept link by means of an XML attribute "dcr:datcat" on the XML element within the schema definition: @ConceptLink
- ☐AutoValue
- CMD elements MUST be mapped as direct descendants of the XML element declaration derived from the CMD component of which they are direct descendants, and MUST be required to be included in the same order as defined in the CMD specification.
- The derivation of a content model for the XML element declaration on basis of a CMD element is described below.
- "Documentation" XML elements contained in CMD elements MAY be transformed into documentation XML elements embedded in the XML element declaration, in which case the content language information contained in the "xml:lang" XML attribute SHOULD be preserved.
- ☐display cues (all attributes in cue namespace should be copied)
- Interpretation of CMD attribute definitions in the CCSL
- CMD attributes, represented as "Attribute" XML elements in the CCSL, MUST be realised as XML attribute declarations with the following property mapping:
- MANDATORY: Name of the XML attribute: @name
- MANDATORY: Use of the XML attribute: MUST be required if and only if @Required is present and equals true, otherwise MUST evaluate to optional
- OPTIONAL: Concept link by means of an XML attribute "dcr:datcat" on the XML element within the schema definition: @ConceptLink
- ☐AutoValue
- CMD attributes that are defined in the CCSL within "Attribute" XML elements within an "AttributeList" XML element that is a direct descendant of a CMD Component MUST be mapped to XML attribute definitions on the XML container element to which it is transformed.
- The derivation of a content model for the XML attribute declaration on basis of a CMD attribute is described below.
- "Documentation" XML elements contained in CMD attributes MAY be transformed into documentation XML elements embedded in the XML attribute declaration, in which case the content language information contained in the "xml:lang" XML attribute SHOULD be preserved.
- ☐display cues (all attributes in cue namespace should be copied)
- Content model
- If a CMD element or CMD attribute in the CCSL has a "ValueScheme" XML attribute, its value MUST be interpreted as the name of the XML Schema Datatype that defines the allowed value range of the XML element derived from the CMD element or XML attribute derived from the CMD attribute.
- If a CMD element or CMD attribute in the CCSL has a descendant XML element "ValueScheme" that contains an XML element "pattern", then its text value MUST be interpreted as the XML Schema regular expression that defines the allowed value range of the XML element derived from this CMD element or XML attribute derived from the CMD attribute.
- If a CMD element or CMD attribute in the CCSL has a descendant XML element "ValueScheme" that contains an XML element "Vocabulary":
- The XML attributes "ValueProperty" and "ValueLanguage" of the XML element "Vocabulary" MAY be transformed into XML attributes in the "ann" namespace on the XML element declaration in the case of a CMD element or XML attribute declaration in the case of a CMD attribute.
- The XML attribute "URI" of the XML element "Vocabulary", if present, MUST be transformed into an attribute "cmd:ValueConceptLink" of the same value on the XML element declaration in the case of a CMD element or XML attribute declaration in the case of a CMD attribute.
- The XML elements "item" that are descendants of the XML element "enumeration" contained in the XML element "Vocabulary" MUST be transformed into an enumeration based restriction with values taken from the text content of the "item" XML elements. Each enumeration item in the schema MAY be annotated with the value from the XML attribute "ConceptLink" by means of an XML attribute "dcr:datcat" and with the value of the XML attribute "AppInfo" by means of an attribute in the "ann" namespace.
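The occurrence and use rules from the property mappings above can be sketched as two small functions. The sketch assumes XML Schema as the target, so the results are meant as values for the `minOccurs`, `maxOccurs` and `use` attributes of XML Schema declarations; the function names are hypothetical and the inputs are the raw attribute strings from the CCSL, or None when an attribute is absent.

```python
from typing import Optional, Tuple


def occurrence_bounds(cardinality_min: Optional[str],
                      cardinality_max: Optional[str],
                      multilingual: bool = False) -> Tuple[str, str]:
    """Derive (minOccurs, maxOccurs) from the CCSL cardinality attributes."""
    # A missing @CardinalityMin is evaluated as '1'.
    min_occurs = cardinality_min if cardinality_min is not None else "1"
    if multilingual:
        # @Multilingual = true overrides any @CardinalityMax.
        max_occurs = "unbounded"
    elif cardinality_max is not None:
        max_occurs = cardinality_max
    else:
        # A missing @CardinalityMax is evaluated as '1'.
        max_occurs = "1"
    return min_occurs, max_occurs


def attribute_use(required: Optional[str]) -> str:
    """Derive the use of an attribute declaration from @Required."""
    # xs:boolean accepts both "true" and "1" as true.
    return "required" if required in ("true", "1") else "optional"


print(occurrence_bounds(None, None))
print(attribute_use("true"))
```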
Appendices
Bibliography
IETF RFC 2045, Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies
IETF RFC 2046, Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types
IETF RFC 5646, Tags for Identifying Languages
ISO 639‐1, Codes for the representation of names of languages — Part 1: Alpha-2 code
ISO 639‐3, Codes for the representation of names of languages -- Part 3: Alpha-3 code for comprehensive coverage of languages
ISO 3166‐1, Codes for the representation of names of countries and their subdivisions — Part 1: Country codes
ISO 8601, Data elements and interchange formats — Information interchange — Representation of dates and times
ISO/IEC 10646‐1, Information technology — Universal Multiple-Octet Coded Character Set (UCS) — Part 1: Architecture and Basic Multilingual Plane
XML Schema Part 2: Datatypes, Biron, P.V. and Malhotra, A. (eds.), W3C Recommendation 02 May 2001, available at <http://www.w3.org/TR/xmlschema-2/>