{{{#!div class="system-message"
'''NOTE''': This page is currently under development and should be considered a draft. If you wish to contribute, please contact the authors.
}}}

{{{#!div class="notice system-message"
Notes from a recent meeting concerning the CMDI specification can be found [[Taskforces/CMDI/Meeting20140730|here]]
}}}

= Component Metadata Infrastructure (CMDI) 1.2 [DRAFT] =

[[PageOutline(1-5)]]

== Introduction ==

The goal of the ''Component Metadata Infrastructure (CMDI)'' specification... '''TODO'''

=== History ===

'''TODO'''

{{{#!comment
Not sure if this is the right place for this section
}}}

=== Terminology ===

The key words `MUST`, `MUST NOT`, `REQUIRED`, `SHALL`, `SHALL NOT`, `SHOULD`, `SHOULD NOT`, `RECOMMENDED`, `MAY`, and `OPTIONAL` in this document are to be interpreted as described in [#REF_RFC_2119 RFC2119].

=== Glossary ===

{{{#!div class="notice system-message"
Work in progress. Responsible for this section: Thorsten & Twan
}}}

{{{#!div class="system-message"
Please do not edit here, but use the [https://docs.google.com/document/d/14yrkJwg2lxf5GGkkA-wMjgByHSlQYWiSt--Lvn0biyo/edit?usp=sharing Google Docs version]!
}}}

 * CMD model, Component Metadata model
   * The component based metadata model described in the present specification
 * CMDI, Component Metadata Infrastructure
   * Metadata description framework consisting of the __CMD model__ and infrastructure
 * CCSL, CMDI Component Specification Language
   * __XML__ based language for describing components according to the __CMD__ model
 * CLARIN
   * The infrastructure governed by the CLARIN ERIC
   * [http://www.clarin.eu]
 * resource, language resource
   * A (digitally) accessible entity that can be described in terms of its content and technical properties, referenced by a __Uniform Resource Identifier__
 * digital object
   * __Resource__ in a repository stored in one repository container that can be addressed by an identifier; a digital object can be seen as a generalization of a directory in a file system containing one or more files which are the data stream(s). Digital objects can exist in databases, hence the comparison to directory and file structures falls short.
 * metadata
   * A description of a __resource__, usually given as a set of properties in the form of attribute-value pairs. This description may contain information about the resource, aspects or parts of the resource and/or artefacts and actors connected to the resource.
 * persistent identifier, PID
   * Unique __Uniform Resource Identifier__ that assures permanent access for a digital object by providing access to it independently of its physical location or current ownership
 * concept
   * An abstract or generic idea generalized from particular instances (source: [http://www.merriam-webster.com/dictionary/concept Merriam-Webster])
 * semantic registry
   * A list/directory/system maintaining (authoritative) definitions of terms, __concepts__ or data categories. These registries should also provide __persistent identifiers__ for their entries.
 * concept link
   * A reference from a __CMD profile__, __CMD component__, __CMD element__, __CMD attribute__ or a value in a __controlled vocabulary__ to an entry in a __semantic registry__ via its __persistent identifier__.
 * CLARIN Concept Registry
   * The __semantic registry__ maintaining __concepts__ used in or central to the CLARIN infrastructure
   * [http://clarin.eu/ccr]
 * XML
   * Markup language standard as described by W3C recommendation [http://www.w3.org/TR/xml/]
 * XML document
   * ...
 * XML element
   * A constituent of an __XML document__ as defined in W3C recommendation [http://www.w3.org/TR/xml/] (distinct from a __CMD element__)
 * XML schema datatype
   * A predefined set of permissible content within a section of an XML document as described in [http://www.w3.org/TR/xmlschema-2/]
 * XML container element
   * An __XML element__ that has one or more XML elements as its descendants
 * XML attribute
   * A property of an __XML element__ as defined in W3C recommendation [http://www.w3.org/TR/xml/] (distinct from a __CMD attribute__)
 * Uniform Resource Identifier, URI
   * An identifier for __resources__ as described in [http://tools.ietf.org/html/rfc3986 RFC3986]
 * namespace
   * An __XML__ namespace as described in [http://www.w3.org/TR/xml-names/]
 * CMD instance, metadata instance, CMDI file, metadata record, CMD record
   * A file that conforms to the general CMDI instance structure as described in this specification, and at the __instance payload__ level follows the specific structure defined by the __CMD specification__ it relates to
 * Instance header
   * The section of a __metadata instance__ marked as ‘header’, providing information on that metadata instance as such, not the __resource__ that is described by the metadata file
 * Resource proxy, CMD resource reference
   * A representation of a __resource__ within a __metadata instance__ containing a __Uniform Resource Identifier__ as a reference to the resource itself and a specification of its type (one of: Resource, Metadata, !SearchPage,
     !SearchService, !LandingPage)
 * Resource proxy reference
   * A reference from any point within the __instance payload__ to any of the __resource proxies__
 * Instance payload(?)
   * The section of a __metadata instance__ that follows the structure defined by the profile it references and contains the description of the __resources__ to which that metadata instance relates
 * CMD specification, component specification/definition, profile specification/definition
   * The implementation of a __CMD component__ or __CMD profile__ by means of the __CCSL__
 * Specification header, component header, profile header
   * The section of a __CMD specification__ marked as ‘header’, providing information on that specification as such that is not part of the defined structure
 * CMD component, component
   * A reusable, structured template for the description of (an aspect of) a __resource__, defined by means of a __CMD specification__ document with the potential of embedding other components by reference
 * CMD profile, profile definition, profile
   * A __CMD component__ that is used to describe a class of resources and is not embedded into other components, and therefore provides the complete structure for an __instance payload__
 * CMD element, element definition
   * A unit of a CMD component that describes the level of the __metadata instance__ that can carry atomic values constrained by a __value scheme__, and does not contain further levels except for that of the __CMD attribute__
 * CMD attribute
   * A unit of a CMD element that describes the level at which properties of a __CMD element__ can be provided by means of atomic values constrained by a __value scheme__.
 * value scheme
   * A set of constraints governing the range of values allowed for a specific __CMD element__ or __CMD attribute__ in a __metadata instance__, expressed in terms of an __XML schema datatype__, __controlled vocabulary__, or __regular expression__
 * controlled vocabulary, closed/open vocabulary
   * A set of values that can be used either to constrain the set of permissible values or to provide suggestions for applicable values in a given context
 * regular expression
   * An expression that constrains the set of permissible values, as described in XML Schema Regular Expressions [http://www.w3.org/TR/xmlschema-2/#regexs]

=== Normative References ===

 RFC2119[=#REF_RFC_2119]::
   Key words for use in RFCs to Indicate Requirement Levels, IETF RFC 2119, March 1997, \\
   [http://www.ietf.org/rfc/rfc2119.txt]
 XML-Namespaces[=#REF_XML_Namespaces]::
   Namespaces in XML 1.0 (Third Edition), W3C, 8 December 2009, \\
   [http://www.w3.org/TR/2009/REC-xml-names-20091208/]

{{{#!comment
TODO: add references
}}}

=== Non-Normative References ===

 RFC3023[=#REF_RFC_3023]::
   XML Media Types, IETF RFC 3023, January 2001, \\
   [http://www.ietf.org/rfc/rfc3023.txt]

{{{#!comment
TODO: add references
}}}

=== Typographic and XML Namespace conventions ===

The following typographic conventions for XML fragments will be used throughout this specification:

 * `<prefix:Element>` \\
   An XML element with the Generic Identifier ''Element'' that is bound to an XML namespace denoted by the prefix ''prefix''.
 * `@attr` \\
   An XML attribute with the name ''attr''
{{{#!comment
 * `@prefix:attr` \\
   An XML attribute with the name ''attr'' that is bound to an XML namespace denoted by the prefix ''prefix''.
}}}
 * `string` \\
   The literal ''string'' must be used either as element content or attribute value.

The following XML namespace names and prefixes are used throughout this specification. The column "Recommended Syntax" indicates which syntax variant `SHOULD` be used by the Endpoint to serialize the XML response.
||=Prefix =||=Namespace Name =||=Comment =||=Recommended Syntax =||
|| `cmd` || `http://clarin.eu/cmd` || CMDI instance || prefixed ||

'''TODO''': update namespaces

{{{#!comment
TODO: Add payload, envelope namespace and namespace for specification
}}}

{{{#!comment
TODO: Decide whether we want to have the following intro subsections
== CMDI Component Metadata Model ==
== CMDI Component and Profile Specification Level ==
}}}

= Structure of CMDI-files =

{{{#!div class="notice system-message"
Responsible for this section: Oddrun
}}}

{{{#!comment
TODO: UML diagram of CMDI record components
}}}

A CMDI file contains the actual metadata of one specific resource (hereafter referred to as the described resource), and might also be referred to as a CMDI record. All CMDI files have the same structure at the top level. At a lower level, parts of the structure are defined by the CMDI profile upon which the file is based.

== The main structure ==

A CMDI file has the root element CMD with four subelements:

 * The Header element, containing certain administrative information about the CMDI file, i.e. metadata about the file itself
 * The Resources element, listing resource proxies and their interrelations by means of the following subelements:
   * ResourceProxyList, containing a list of ResourceProxy elements, each referencing a file contained in or closely related to the described resource
   * JournalFileProxyList, containing a list of JournalFileProxy elements, each referencing a file (“journal file”) containing provenance information about the described resource
   * ResourceRelationList, containing a list of ResourceRelation elements, each representing a relationship between two resource files (as listed in the ResourceProxyList)
 * The IsPartOf list, containing a list of IsPartOf elements, each referencing a larger external resource of which the described resource (as a whole) forms a part
 * The Components element, containing one subelement that corresponds to the CMDI profile applied and is in turn structured according to that profile.
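
The top-level layout listed above can be sketched with a small, non-normative Python example. It is only an illustration under stated assumptions: the `cmd` namespace is the provisional one from the namespace table (still marked for update), the `IsPartOfList` element is placed directly under `CMD`, and the profile namespace and the `ExampleProfile` element are hypothetical placeholders that are not defined by this specification.

```python
import xml.etree.ElementTree as ET

# Provisional namespace from the draft's namespace table ("TODO: update namespaces").
CMD_NS = "http://clarin.eu/cmd"
# Hypothetical profile-specific namespace, used for illustration only.
PROFILE_NS = "http://example.org/cmd-profile/example"

# Skeleton record: the sections are left empty because their content
# is not yet specified in this draft.
SKELETON = f"""\
<cmd:CMD xmlns:cmd="{CMD_NS}" xmlns:p="{PROFILE_NS}">
  <cmd:Header/>
  <cmd:Resources>
    <cmd:ResourceProxyList/>
    <cmd:JournalFileProxyList/>
    <cmd:ResourceRelationList/>
  </cmd:Resources>
  <cmd:IsPartOfList/>
  <cmd:Components>
    <p:ExampleProfile/>
  </cmd:Components>
</cmd:CMD>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def top_level_sections(xml_text):
    """Return the local names of the direct children of the root element, in order."""
    root = ET.fromstring(xml_text)
    return [local_name(child.tag) for child in root]


def profile_root(xml_text):
    """Return the single profile-specific element inside the Components section."""
    root = ET.fromstring(xml_text)
    components = root.find("cmd:Components", {"cmd": CMD_NS})
    if components is None:
        raise ValueError("record has no Components section")
    children = list(components)
    if len(children) != 1:
        raise ValueError("Components must contain exactly one profile element")
    return children[0]


print(top_level_sections(SKELETON))
print(profile_root(SKELETON).tag)
```

The helper functions only inspect the shape of a record. Checking a real record would require the schema generated from its profile, which is outside the scope of this sketch.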

The profile substructure exists in the profile-specific namespace; everything else is in the `cmd` namespace. In the following, the main parts are described in detail.

== The header ==

{{{#!comment
[TODO CMDI 1.2]: Include support for attributes as local extensions

Accepted proposal by Twan & Menzo (2014-11-20 by e-mail to the members of the CMDI taskforce):
 * Allow attributes of a foreign namespace (i.e. not the general CMDI namespace) anywhere in the Header, Resources sections and on the 'Component' element (but not on any of its children) of a CMDI record, i.e. a profile instance
 * This is achieved by modifying the component specification to XSD stylesheet
 * There will be no consequence for existing components, profiles or records
 * The change will be part of both CMDI 1.1 and CMDI 1.2
 * The first application of this extension will be at TLA with the 'lat:localURI' attribute on ResourceRef elements

Additional formal constraints proposed by Oliver: These local attributes
 a) MUST be ignored by tools that don't understand them (or don't want to deal with them) and therefore MAY be removed during processing
 b) the Namespace-Name MUST NOT contain fragments of official CLARIN namespaces, i.e. don't start with the "http://www.clarin.eu", "http://clarin.eu", etc.
}}}

||Name||MdCreator||
||Description||Denotes the creator of this metadata file||
||Value type||A string||
||Occurrences||0 to unbounded||
||Attributes|| ||

State purpose of header

List elements in a table, giving name, "definition", type, cardinality for each

== The resources section ==

=== The Resource proxy list ===

State purpose of Resource Proxy list (and which files should be listed here)

Specify in detail how resource proxies are represented:
 * all possible elements and attributes with definition, type, cardinality/obligation

=== The Journal File Proxy List ===

State purpose of Journal File Proxy list (and which files should be listed here)

Specify in detail how journal file proxies are represented:
 * all possible elements and attributes with definition, type, cardinality/obligation

=== The Resource Relation List ===

State purpose of Resource Relation List (representing binary relations between resources (proxies) and/or other resources)

Specify in detail how resource relations are represented:
 * all possible elements and attributes with definition, type, cardinality/obligation

== The Is-Part-of List ==

State purpose of Is-Part-of List (representing external resources that the described resource is a part of)

('''NOTE:''' IsPartOfList no longer in Resources section)

Specify in detail how an Is-part-of relation is represented:
 * all possible elements and attributes with definition, type, cardinality/obligation

== The components ==

State purpose of components section, and its dependency upon profile (as given in header: MdProfile)

= The CMDI Component Specification Language =

{{{#!div class="notice system-message"
Responsible for this section: Thomas
}}}

{{{#!comment
TODO: UML diagram of CCSL
}}}

== CCSL header ==

== Component definition ==

== Element definition ==

== Cardinality of elements and components ==

== Describing multilingual content ==

== Attributes for elements and components ==

= Transformation of CCSL into a schema =

{{{#!div class="notice
system-message"
Responsible for this section: Twan
}}}

== Interpretation of hierarchies of the CCSL ==

== Interpretation of the order of elements ==

== Interpretation of attributes ==

= Appendices =

{{{#!comment
ISO spec has copy of general component schema and instance XML example, removed here
}}}

= Bibliography =

IETF RFC 2045, Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies

IETF RFC 2046, Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types

IETF RFC 5646, Tags for Identifying Languages

ISO 639‐1, Codes for the representation of names of languages — Part 1: Alpha-2 code

ISO 639‐3, Codes for the representation of names of languages -- Part 3: Alpha-3 code for comprehensive coverage of languages

ISO 3166‐1, Codes for the representation of names of countries and their subdivisions — Part 1: Country codes

ISO 8601, Data elements and interchange formats — Information interchange — Representation of dates and times

ISO/IEC 10646‐1, Information technology — Universal Multiple-Octet Coded Character Set (UCS) — Part 1: Architecture and Basic Multilingual Plane

XML Schema Part 2: Datatypes, Biron, P.V. and Malhotra, A. (eds.), W3C Recommendation 02 May 2001, available at