Version 23 (modified by Twan Goosen, 9 years ago)

dumped first notes for "Transformation of CCSL into a CMD profile schema" section

NOTE: This page is currently under development and should be considered a draft. If you wish to contribute, please contact the authors.

Notes from a recent meeting concerning the CMDI specification can be found here

Component Metadata Infrastructure (CMDI) 1.2 [DRAFT]

Introduction

The goal of the Component Metadata Infrastructure (CMDI) specification...

TODO

History

TODO

Terminology

The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in RFC2119.

Glossary

Work in progress. Responsible for this section: Thorsten & Twan

Please do not edit here, but use the Google Docs version!

  • CMD model, Component Metadata model
    • The component based metadata model described in the present specification
  • CMDI, Component Metadata Infrastructure
    • Metadata description framework consisting of the CMD model and infrastructure
  • CCSL, CMDI Component Specification Language
    • XML based language for describing components according to the CMD model
  • CLARIN
  • resource, language resource
    • A (digitally) accessible entity that can be described in terms of its content and technical properties, referenced by a Uniform Resource Identifier
  • digital object
    • A resource stored in a single repository container that can be addressed by an identifier. A digital object can be seen as a generalization of a directory in a file system containing one or more files, which are the data stream(s). Because digital objects can also exist in databases, the comparison to directory and file structures is only approximate.
  • metadata
    • A description of a resource, usually given as a set of properties in the form of attribute-value pairs. This description may contain information about the resource, aspects or parts of the resource and/or artefacts and actors connected to the resource.
  • persistent identifier, PID
    • Unique Uniform Resource Identifier that assures permanent access for a digital object by providing access to it independently of its physical location or current ownership
  • concept
    • An abstract or generic idea generalized from particular instances (source: Merriam-Webster)
  • semantic registry
    • A list/directory/system maintaining (authoritative) definitions of terms, concepts or data categories. These registries should also provide persistent identifiers for their entries.
  • concept link
    • A reference from a CMD profile, CMD component, CMD element, CMD attribute or a value in a controlled vocabulary to an entry in a semantic registry via its persistent identifier.
  • CLARIN Concept Registry
  • CMD instance, metadata instance, CMDI file, metadata record, CMD record
    • A file that conforms to the general CMDI instance structure as described in this specification, and at the instance payload level follows the specific structure defined by the CMD specification it relates to
  • Instance header
    • The section of a metadata instance marked as ‘header’, providing information on that metadata instance as such, not the resource that is described by the metadata file
  • Resource proxy, CMD resource reference
    • A representation of a resource within a metadata instance containing a Uniform Resource Identifier as a reference to the resource itself and a specification of its type (one of: Resource, Metadata, SearchPage, SearchService, LandingPage)
  • Resource proxy reference
    • A reference from any point within the instance payload to any of the resource proxies
  • Instance payload(?)
    • The section of a metadata instance that follows the structure defined by the profile it references and contains the description of the resources to which that metadata instance relates
  • CMD specification, component specification/definition, profile specification/definition
    • The implementation of a CMD component or CMD profile by means of the CCSL
  • Specification header, component header, profile header
    • The section of a CMD specification marked as ‘header’, providing information on that specification as such that is not part of the defined structure
  • CMD component, component
    • A reusable, structured template for the description of (an aspect of) a resource, defined by means of a CMD specification document, with the potential of embedding other components by reference
  • CMD profile, profile definition, profile
    • A CMD component that is used to describe a class of resources and is not embedded into other components, and therefore provides the complete structure for an instance payload
  • CMD element, element definition
    • A unit of a CMD component that describes the level of the metadata instance that can carry atomic values constrained by a value scheme, and does not contain further levels except for that of the CMD attribute
  • CMD attribute
    • A unit of a CMD element that describes the level at which properties of a CMD element can be provided by means of value scheme constrained atomic values.
  • value scheme
    • A set of constraints governing the range of values allowed for a specific CMD element or CMD attribute in a metadata instance, expressed in terms of an XML schema datatype, controlled vocabulary, or regular expression
  • controlled vocabulary, closed/open vocabulary
    • A set of values that can be used either to constrain the set of permissible values or to provide suggestions for applicable values in a given context
  • regular expression
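
To illustrate the "value scheme" entry above, the following minimal Python sketch checks a value against the three kinds of constraint named there (datatype, controlled vocabulary, regular expression). The class `ValueScheme`, its field names and the tiny datatype table are hypothetical and not defined by CMDI; real XML Schema datatype validation is considerably more involved.

```python
import re
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical, heavily simplified stand-ins for a few XML Schema datatypes.
_DATATYPE_PATTERNS = {
    "string": r".*",
    "boolean": r"true|false|1|0",
    "integer": r"[+-]?\d+",
}


@dataclass
class ValueScheme:
    """Illustrative value scheme; one of the three constraints is expected to be set."""
    datatype: Optional[str] = None
    vocabulary: Optional[Sequence[str]] = None
    closed: bool = True  # assumption: an open vocabulary only suggests values
    pattern: Optional[str] = None

    def allows(self, value: str) -> bool:
        if self.vocabulary is not None:
            # Closed vocabulary: value must be listed. Open: anything passes.
            return value in self.vocabulary or not self.closed
        if self.pattern is not None:
            return re.fullmatch(self.pattern, value) is not None
        if self.datatype is not None:
            rule = _DATATYPE_PATTERNS[self.datatype]
            return re.fullmatch(rule, value, flags=re.DOTALL) is not None
        return True  # no constraint given


# Usage: the vocabulary here reuses the resource proxy types from the glossary.
proxy_types = ValueScheme(
    vocabulary=["Resource", "Metadata", "SearchPage", "SearchService", "LandingPage"]
)
print(proxy_types.allows("LandingPage"))
```

Whether a vocabulary is closed or open is modelled here as a single flag; the specification may choose a different representation.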

Normative References

RFC2119
Key words for use in RFCs to Indicate Requirement Levels, IETF RFC 2119, March 1997,
http://www.ietf.org/rfc/rfc2119.txt
XML-Namespaces
Namespaces in XML 1.0 (Third Edition), W3C, 8 December 2009,
http://www.w3.org/TR/2009/REC-xml-names-20091208/

Non-Normative References

RFC3023
XML Media Types, IETF RFC 3023, January 2001,
http://www.ietf.org/rfc/rfc3023.txt

Typographic and XML Namespace conventions

The following typographic conventions for XML fragments will be used throughout this specification:

  • <prefix:Element>
    An XML element with the Generic Identifier Element that is bound to an XML namespace denoted by the prefix prefix.
  • @attr
    An XML attribute with the name attr
  • string
    The literal string must be used either as element content or attribute value.

The following XML namespace names and prefixes are used throughout this specification. The column "Recommended Syntax" indicates which syntax variant SHOULD be used by the Endpoint to serialize the XML response.

Prefix | Namespace Name       | Comment       | Recommended Syntax
cmd    | http://clarin.eu/cmd | CMDI instance | prefixed

TODO: update namespaces

Structure of CMDI-files

Responsible for this section: Oddrun

A CMDI file contains the actual metadata of one specific resource (hereafter referred to as the described resource), and might also be referred to as a CMDI record. All CMDI files have the same structure at the top level. At a lower level, parts of its structure are defined by the CMDI profile upon which it is based.

The main structure

A CMDI file has the root element CMD with four subelements:

  • The Header element, containing certain administrative information about the CMDI file, i.e. metadata about the file itself
  • The Resources element, listing resource proxies and their interrelations, by the following subelements
  • The IsPartOfList element, containing a list of IsPartOf elements, each referencing a larger external resource of which the described resource (as a whole) forms a part
  • The Components element, containing one subelement that corresponds to, and is in turn structured according to, the CMDI profile applied.

The profile substructure exists in the profile-specific namespace; everything else is in the cmd namespace.
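
The top-level structure described above can be checked mechanically. The sketch below is a minimal illustration, not a normative validator: the namespace URI is the one listed in the namespace table of this draft (marked as still to be updated), the element name `IsPartOfList` and the assumption that all four subelements are present are taken from the list above, and the payload namespace `http://example.org/hypothetical-profile` is invented for the example.

```python
import xml.etree.ElementTree as ET
from typing import List

CMD_NS = "http://clarin.eu/cmd"  # from the namespace table in this draft; may change
EXPECTED_CHILDREN = ["Header", "Resources", "IsPartOfList", "Components"]

# Hypothetical instance skeleton; the payload element and its namespace are made up.
SAMPLE = """<cmd:CMD xmlns:cmd="http://clarin.eu/cmd">
  <cmd:Header/>
  <cmd:Resources/>
  <cmd:IsPartOfList/>
  <cmd:Components>
    <p:ExampleProfile xmlns:p="http://example.org/hypothetical-profile"/>
  </cmd:Components>
</cmd:CMD>"""


def top_level_problems(xml_text: str) -> List[str]:
    """Return a list of deviations from the four-part top-level structure."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != f"{{{CMD_NS}}}CMD":
        problems.append(f"unexpected root element: {root.tag}")
    found = [child.tag for child in root]
    for name in EXPECTED_CHILDREN:
        if f"{{{CMD_NS}}}{name}" not in found:
            problems.append(f"missing subelement: {name}")
    return problems


print(top_level_problems(SAMPLE))
```

The check only looks at the direct children of the root; the profile-specific substructure inside Components would be validated against the CMD profile schema instead.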

<About local attributes here>

In the following, the main parts are described in detail.

The header

Name        | MdCreator
Description | Denotes the creator of this metadata file
Value type  | A string
Occurrences | 0 to unbounded
Attributes  |

State purpose of header. List elements in a table, giving name, "definition", type, cardinality for each.

The resources section

The Resource proxy list

State purpose of Resource Proxy list (and which files should be listed here). Specify in detail how resource proxies are represented:

  • all possible elements and attributes with definition, type, cardinality/obligation

The Journal File Proxy List

State purpose of Journal File Proxy list (and which files should be listed here). Specify in detail how resource proxies are represented:

  • all possible elements and attributes with definition, type, cardinality/obligation

The Resource Relation List

State purpose of Resource Relation List (representing binary relations between resources (proxies) and/or other resources). Specify in detail how resource relations are represented:

  • all possible elements and attributes with definition, type, cardinality/obligation

The Is-Part-of List

State purpose of Is-Part-of List (representing external resources that the described resource is a part of). (NOTE: IsPartOfList no longer in Resources section.) Specify in detail how an Is-part-of relation is represented:

  • all possible elements and attributes with definition, type, cardinality/obligation

The components

State purpose of components section, and its dependency upon profile (as given in header: MdProfile).

The CMDI Component Specification Language

Responsible for this section: Thomas

CCSL header

Component definition

Element definition

Cardinality of elements and components

Describing multilingual content

Attributes for elements and components

Transformation of CCSL into a CMD profile schema

Responsible for this section: Twan

A CMD instance document that is serialised as XML according to this specification SHOULD reference the location of a CMD profile schema. The infrastructure MUST provide a mechanism to derive such a schema for any specific CMD profile on the basis of its definition and that of the CMD components that it references. This section specifies how different aspects of a CMD specification should be transformed into elements of a schema document. The primary schema language targeted is XML Schema, although the infrastructure MAY provide support for other schema languages, such as DDML or RELAX NG.

  • CMD profile schemas SHOULD NOT (MUST NOT?) be derived from CMD specifications that are not CMD profiles.
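
As a rough illustration of the derivation described above, the following Python sketch turns a simplified in-memory model of a CMD profile into an XML Schema skeleton. The classes `ElementDef` and `ComponentDef`, their fields, and the function names are hypothetical and do not reflect the actual CCSL vocabulary; cardinality of nested components, attributes, concept links and namespaces are left out. The sketch also adopts the stricter reading of the rule above and refuses to derive a schema from a specification that is not a profile.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List

XS = "http://www.w3.org/2001/XMLSchema"


@dataclass
class ElementDef:  # hypothetical stand-in for a CMD element definition
    name: str
    datatype: str = "string"  # name of an XML Schema built-in datatype
    min_occurs: str = "1"
    max_occurs: str = "1"


@dataclass
class ComponentDef:  # hypothetical stand-in for a CMD component definition
    name: str
    is_profile: bool = False
    elements: List[ElementDef] = field(default_factory=list)
    components: List["ComponentDef"] = field(default_factory=list)


def _component_to_xs(parent: ET.Element, comp: ComponentDef) -> None:
    """Emit xs:element > xs:complexType > xs:sequence for one component."""
    el = ET.SubElement(parent, f"{{{XS}}}element", name=comp.name)
    ctype = ET.SubElement(el, f"{{{XS}}}complexType")
    seq = ET.SubElement(ctype, f"{{{XS}}}sequence")
    for e in comp.elements:
        ET.SubElement(seq, f"{{{XS}}}element", name=e.name,
                      type=f"xs:{e.datatype}",
                      minOccurs=e.min_occurs, maxOccurs=e.max_occurs)
    for child in comp.components:  # nested components are expanded in place
        _component_to_xs(seq, child)


def derive_profile_schema(spec: ComponentDef) -> ET.Element:
    if not spec.is_profile:
        raise ValueError("schemas are only derived from CMD profiles")
    ET.register_namespace("xs", XS)  # so serialisation uses the xs: prefix
    schema = ET.Element(f"{{{XS}}}schema")
    _component_to_xs(schema, spec)
    return schema


profile = ComponentDef("Example", is_profile=True, elements=[
    ElementDef("Title"),
    ElementDef("Keyword", min_occurs="0", max_occurs="unbounded"),
])
print(ET.tostring(derive_profile_schema(profile), encoding="unicode"))
```

A real transformation would more likely be expressed as an XSLT stylesheet over the CCSL document; the Python model is used here only to keep the example self-contained.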

Global schema properties

  • Linked components should be included, expanded
  • A CMD profile schema MUST be a single document [or set of linked documents with a single entry point](?) that allows for the evaluation of a CMD instance on all levels of description defined in one specific CMD profile.
  • The CMD profile schema MAY include, as a matter of annotation, a copy of (a subset of) the header information contained in the CMD profile from which it is defined.
  • The CMD profile schema MUST use the following namespaces:
  • targeted namespace
  • for annotation and documentation purposes that are outside the scope of instance validation
  • for embedded semantic annotation

Interpretation of CMD header

Interpretation of CMD component definitions in the CCSL

  • Interpretation of hierarchies in the CCSL
  • concept links
  • order of children
  • elements -> see elements
  • attributes -> see attributes

Interpretation of CMD element definitions in the CCSL

  • content model (value scheme)
  • concept links
  • order of children
  • attributes -> see attributes

Interpretation of CMD attribute definitions in the CCSL

  • content model (value scheme) - same as element??
  • concept links

Appendices

Bibliography

IETF RFC 2045, Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies

IETF RFC 2046, Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types

IETF RFC 5646, Tags for Identifying Languages

ISO 639‐1, Codes for the representation of names of languages — Part 1: Alpha-2 code

ISO 639‐3, Codes for the representation of names of languages -- Part 3: Alpha-3 code for comprehensive coverage of languages

ISO 3166‐1, Codes for the representation of names of countries and their subdivisions — Part 1: Country codes

ISO 8601, Data elements and interchange formats — Information interchange — Representation of dates and times

ISO/IEC 10646‐1, Information technology — Universal Multiple-Octet Coded Character Set (UCS) — Part 1: Architecture and Basic Multilingual Plane

XML Schema Part 2: Datatypes, Biron, P.V. and Malhotra, A. (eds.), W3C Recommendation 02 May 2001, available at <http://www.w3.org/TR/xmlschema-2/>
