Changes between Version 15 and Version 16 of CMDI 1.2/Specification


Timestamp:
06/03/15 12:38:15
Author:
Twan Goosen
Comment:

stripped content

  • CMDI 1.2/Specification

    v15 v16  
    159159}}}
    160160
     161{{{#!comment
     162TODO: Decide on the following intro subsections
    161163== CMDI Component Metadata Model ==
    162 {{{#!comment
    163 TODO: Write section
    164 }}}
    165 
    166164== CMDI Component and Profile Specification Level ==
    167 {{{#!comment
    168 TODO: Write section
    169 }}}
    170 
     165}}}
    171166
    172167= Structure of CMDI-files =
    173 An example of a CMDI file, showing the overall structure of a metadata instance serialization, can be found in Annex B. This section describes the structure of such an instance; a metadata instance that is compliant with this standard must follow it. Structurally, a CMDI instance consists of three sections: the header, the resources and the components. The header and resources sections are statically defined and remain constant when an evaluative metadata schema is generated. They are described here because they are required for creating the schema from the component specification.
     168{{{#!div class="notice system-message"
     169Responsible for this section: Oddrun
     170}}}
     171
    174172==      The header ==
    175 The header element is a container element intended to provide information on the metadata file as such, not on the resource described by the metadata file. To make this explicit and human-readable, the data categories contained in the header are prefixed with "Md" (for "metadata"). The following elements are part of the header; unless marked as mandatory, they are optional:
    176 •       !MdCreator (optional): Name of the person who created the metadata file. This is defined as a string.
    177 •       !MdCreationDate (optional): Date of creation of the metadata file. This is defined as the data type date, i.e. the date is specified in the form yyyy-mm-dd (four digits for the year, followed
    178 by a dash, two digits for the month, another dash and two digits for the day of the month).
    179 •       !MdSelfLink (optional): Persistent identifier for the metadata file (see PISA) in the form of a URI.
    180 •       !MdProfile (mandatory): Persistent identifier of the profile used to create this metadata file. This information is partially implied by the value of the schemaLocation attribute of the root element, but the profile identifier may refer to the complete description of the profile, such as its specification in CCSL.
    181 •       !MdCollectionDisplayName (mandatory): The name of a collection as it is supposed to be displayed by an application. This element is provided because metadata is often shared and institutions display the names of their collections in applications.
    182 •       !MdRevisionGrp (optional): The group for storing metadata revisions, if any. It contains one or more !MdRevision child elements, each holding the name of the editor (element `by`, string content), the date of the edit (element `date`, of type xs:date) and a verbose note on the revision (element `note`, of type string).
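As a rough illustration of the header rules above, the following Python sketch builds a `Header` element with the standard library's `xml.etree.ElementTree`. It rejects a missing mandatory element and a creation date that is not in the plain yyyy-mm-dd form. `build_header` is a hypothetical helper, not part of any CMDI tool. Namespaces and `MdRevisionGrp` are omitted for brevity, and the mandatory flags simply follow the list above.

{{{#!python
import datetime
import re
import xml.etree.ElementTree as ET

# Header elements in the order listed above; True marks a mandatory element.
HEADER_FIELDS = [
    ("MdCreator", False),
    ("MdCreationDate", False),
    ("MdSelfLink", False),
    ("MdProfile", True),
    ("MdCollectionDisplayName", True),
]


def build_header(values):
    """Build a namespace-free Header element from a dict of field values."""
    header = ET.Element("Header")
    for name, mandatory in HEADER_FIELDS:
        value = values.get(name)
        if value is None:
            if mandatory:
                raise ValueError(f"missing mandatory header element {name}")
            continue  # optional elements are simply left out
        if name == "MdCreationDate":
            # Require the yyyy-mm-dd shape, then reject impossible dates.
            if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
                raise ValueError(f"MdCreationDate not in yyyy-mm-dd form: {value}")
            datetime.date.fromisoformat(value)
        ET.SubElement(header, name).text = value
    return header
}}}

Elements are emitted in the listed order regardless of the order of the dictionary keys, which keeps the serialization stable.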
    183 
    184 It is recommended to fill in all applicable header fields. They structure the information about the metadata file and make it available, providing background for the users of the metadata.
    185 
    186 A potential problem, intentionally left open, is how to deal with changed metadata files: should !MdCreator and !MdCreationDate be adjusted? If so, how persistent is the !MdSelfLink? As the metadata is created during the archiving stage of a resource, potential updates are currently not dealt with.
    187 
    188 {{{#!comment
    189 TODO: Include support for attributes as local extensions
     173
     174{{{#!comment
     175[TODO CMDI 1.2]: Include support for attributes as local extensions
    190176
    191177Accepted proposal by Twan & Menzo (2014-11-20 by e-mail to the members of the CMDI taskforce):
     
    206192    "http://clarin.eu", etc.
    207193}}}
     194
    208195==      The resources section ==
    209 The resources section of a metadata file lists all information relevant to the individual resource but does not describe the resource as such; the description is part of the components. The resources section provides the location of the resource (or of its parts, if it consists of more than one), provenance information on the resource, information on the relations between the parts of the resource (if applicable) and information on a larger body the resource is part of (also if applicable).
     196
    210197===     The Resource proxy list ===
    211 The resource proxy list defines placeholders internal to the metadata file, called proxies, for each part of a resource. For example, if a resource consists of one specific file, this file is referenced in the !ResourceRef element, which holds the PID of this file in the form of a URI. As resources can be composed of other resources, which are identified by their metadata, the !ResourceType element specifies whether the PID refers to metadata (another metadata file) or to a resource such as a binary file or data. To further specify the type, !ResourceType takes `mimetype` as an attribute, whose value specifies the MIME type of the referenced resource. Providing the MIME type is optional.
    212 Resources can consist of more than one data stream or file; hence the !ResourceProxyList may contain more than one !ResourceProxy. To make each of these parts individually referable, each !ResourceProxy receives an `id` attribute for internal reference within the metadata file.
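The following Python sketch shows how proxies might be assembled with `xml.etree.ElementTree`. Every `ResourceProxy` gets a unique `id`. `ResourceType` records whether the PID points to metadata or to a resource, and `mimetype` is set only when it is known. Several details are assumptions for this illustration: the function name, the tuple layout, the restriction to the two type values `Metadata` and `Resource`, and the example PIDs. Namespaces are omitted.

{{{#!python
import xml.etree.ElementTree as ET

# Assumed type values covering the two cases described above.
RESOURCE_TYPES = {"Metadata", "Resource"}


def build_proxy_list(parts):
    """Build a ResourceProxyList from (proxy_id, resource_type, pid_uri, mimetype) tuples.

    mimetype may be None, since providing it is optional.
    """
    proxy_list = ET.Element("ResourceProxyList")
    seen = set()
    for proxy_id, rtype, pid, mimetype in parts:
        if proxy_id in seen:
            # ids serve as internal references, so they must be unique in the file
            raise ValueError(f"duplicate proxy id {proxy_id}")
        if rtype not in RESOURCE_TYPES:
            raise ValueError(f"unexpected resource type {rtype}")
        seen.add(proxy_id)
        proxy = ET.SubElement(proxy_list, "ResourceProxy", id=proxy_id)
        rtype_el = ET.SubElement(proxy, "ResourceType")
        rtype_el.text = rtype
        if mimetype is not None:
            rtype_el.set("mimetype", mimetype)
        ET.SubElement(proxy, "ResourceRef").text = pid
    return proxy_list


# Usage with hypothetical PIDs: an audio file and a related metadata file.
proxies = build_proxy_list([
    ("p1", "Resource", "hdl:1234/audio", "audio/x-wav"),
    ("p2", "Metadata", "hdl:1234/md", None),
])
}}}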
     198
    213199===     The Journal File Proxy List ===
    214 Resources that are developed over a longer period of time are frequently changed and updated. Provenance data is not part of the CMDI model, but it can be stored outside the metadata file in a suitable form. In CMDI documents, provenance metadata is referred to as a !JournalFile. The !JournalFileProxyList contains the list of all !JournalFiles for a resource; each !JournalFileRef holds the URI referencing the !JournalFile that contains the provenance data.
     200
    215201===     The Resource Relation List ===
    216 If a resource consists of more than one file, these files do not exist independently of each other; for example, audio files and their transcriptions are related. The !ResourceProxyList only lists these files; the !ResourceRelationList makes the relation between pairs of files explicit. For this purpose, each !ResourceRelation contains a triple of elements defining a directed relation from a first resource (the source) to a second resource (the target), each referenced by a `ref` pointer to the `id` of a !ResourceProxy. The relation between the two is given as a string in the !RelationType element, using relations defined in a data category registry. The identifier of the relation type is given as dcr:datcat.
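To make the `ref`-to-`id` linkage concrete, here is a hypothetical Python check that reports relations whose source or target does not point at a declared proxy. The class, field and function names are illustrative only, as is the relation name `transcriptionOf`. In a real file, the relation type would come from the data category registry.

{{{#!python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceRelation:
    relation_type: str  # relation name, e.g. taken from a data category registry
    source_ref: str     # must equal the id of a ResourceProxy
    target_ref: str     # must equal the id of a ResourceProxy


def dangling_relations(proxy_ids, relations):
    """Return the relations whose source or target matches no proxy id."""
    known = set(proxy_ids)
    return [r for r in relations
            if r.source_ref not in known or r.target_ref not in known]


# A transcription (p2) of an audio file (p1); p3 was never declared as a proxy.
relations = [
    ResourceRelation("transcriptionOf", "p2", "p1"),
    ResourceRelation("transcriptionOf", "p3", "p1"),
]
problems = dangling_relations(["p1", "p2"], relations)
}}}

Keeping source and target as separate fields preserves the direction of the relation, which a plain pair of ids in a set would lose.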
     202
    217203===     The Is-Part-of List ===
    218 Resources that are defined in bundles are listed under !ResourceProxy. The individual parts can also be seen as independent resources, such as a subcorpus that can be distributed on its own. To indicate that a resource is part of, or was created as part of, a larger unit, the !IsPartOfList refers to one or more larger units by giving their PIDs in !IsPartOf elements.
    219 
    220 Potential problem: it is (perhaps intentionally) unclear what the PID points to: the resource (e.g. a landing page) or its metadata (e.g. a CMDI file in a repository).
     204
    221205==      The components ==
    222 The components are the content section of CMDI files, to be processed by users. Their structure varies according to the intended use. In general, the components list data categories from a data category registry in order and provide the cardinality of these data categories and possibly a controlled vocabulary.
    223 Components vary widely, so a general mechanism for describing them is more adequate than individual examples. This general mechanism is the CMDI Component Specification Language (CCSL).
    224 In the component metadata infrastructure, the header and the components are described separately, and in practice they can be kept separate until the concrete schema is generated. The instances contain both the header section and the components. The components are described with a specification language, presented in the following section.
     206
    225207=       The CMDI Component Specification Language =
    226 
    227 The CMDI Component Specification Language (CCSL) is designed to describe the variable, component-specific part of the CMDI schema. In a CCSL file, metadata elements are defined and grouped, and other components are referenced. Figure 1 shows the relations between the individual elements of the CCSL.
    228  
    229 Figure 1 — Schematic architecture of the CMDI Component Specification Language
    230 Instances of the component specification language contain two parts, namely a header section and the component description.
     208{{{#!div class="notice system-message"
     209Responsible for this section: Thomas
     210}}}
    231211
    232212==      CCSL header ==
    233 The CCSL header provides simple data warehousing information on the component description: an identifier for the component description, which must be unique and should be persistent (see also ISO 24619:2011); a name for the component; and a prose description of the component.
    234213
    235214==      Component definition ==
    236 Components are defined as a sequence of elements, which can be followed by other components, since components can be embedded in other components. Additionally, components can take any number of attributes; these attributes and their possible values are also specified in the component description.
    237215
    238216==      Element definition ==
    239 Elements are the parts of metadata instances that contain the content, i.e. the field descriptors. When an element is introduced, its content model is also specified as a value scheme, which can be either a specific pattern or a closed vocabulary.
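A minimal Python sketch of checking element content against the two kinds of value scheme follows. It assumes a pattern must match the whole value, as XML Schema patterns do, and emulates that with `re.fullmatch`. Python's `re` dialect is not identical to the XML Schema regular-expression dialect, so the sketch only holds for simple patterns. The function name and the example vocabulary are hypothetical.

{{{#!python
import re


def matches_value_scheme(value, pattern=None, vocabulary=None):
    """Check element content against a value scheme.

    Exactly one of `pattern` (a regular expression that must match the
    whole value) or `vocabulary` (a closed set of allowed values) is expected.
    """
    if (pattern is None) == (vocabulary is None):
        raise ValueError("give exactly one of pattern or vocabulary")
    if pattern is not None:
        return re.fullmatch(pattern, value) is not None
    return value in vocabulary


# A four-digit year pattern and a small closed vocabulary (both illustrative).
year_ok = matches_value_scheme("2015", pattern=r"[0-9]{4}")
modality_ok = matches_value_scheme("spoken", vocabulary={"spoken", "written"})
}}}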
    240217
    241218==      Cardinality of elements and components ==
    242219
    243 For practical reasons, the cardinality of components and elements is specified according to the needs of the metadata instance. Both elements and components can be specified as occurring a specific number of times. A lower and an upper bound can be provided for each, where the upper bound must be greater than or equal to the lower bound.
    244 A bound can be 0 or any positive integer; the upper bound can also be unbounded.
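The bound rules above map naturally onto XML Schema `minOccurs`/`maxOccurs`. The sketch below assumes that mapping, since the standard leaves the schema language open. It uses the string `unbounded` for an unlimited upper bound, as XML Schema does. `to_occurs` is a hypothetical name.

{{{#!python
UNBOUNDED = "unbounded"


def to_occurs(lower, upper):
    """Validate a cardinality and return (minOccurs, maxOccurs) as strings."""
    if not isinstance(lower, int) or lower < 0:
        raise ValueError("lower bound must be 0 or a positive integer")
    if upper == UNBOUNDED:
        return str(lower), UNBOUNDED
    if not isinstance(upper, int) or upper < 0:
        raise ValueError("upper bound must be 0, a positive integer or 'unbounded'")
    if upper < lower:
        raise ValueError("upper bound must be greater than or equal to the lower bound")
    return str(lower), str(upper)
}}}

For example, `to_occurs(0, "unbounded")` yields `("0", "unbounded")`, while `to_occurs(2, 1)` is rejected.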
    245 
    246220==      Describing multilingual content ==
    247 To describe multilingual content, elements can be specified with a boolean attribute for multilinguality. For elements specified as multilingual, conformant applications must adjust the cardinality so that the element can be repeated for many languages (i.e. the upper bound of the cardinality is unbounded) and must allow the language of the element content to be specified by an appropriate attribute (i.e. xml:lang).
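One way an application might realize this rule in XML Schema is sketched below with `xml.etree.ElementTree`. For a multilingual element, the upper bound is forced to `unbounded` and an `xml:lang` attribute is allowed on the string content. This is an assumed XSD rendering, not the normative one. A real schema would also need an `xs:import` of the XML namespace schema for `ref="xml:lang"` to resolve. `element_decl` and the element names are hypothetical.

{{{#!python
import xml.etree.ElementTree as ET

XS_URI = "http://www.w3.org/2001/XMLSchema"
XS = "{" + XS_URI + "}"
ET.register_namespace("xs", XS_URI)  # serialize with the conventional xs: prefix


def element_decl(name, lower, upper, multilingual=False):
    """Sketch of an xs:element declaration for a CCSL element with string content."""
    if multilingual:
        upper = "unbounded"  # the element may be repeated once per language
    decl = ET.Element(XS + "element", name=name,
                      minOccurs=str(lower), maxOccurs=str(upper))
    if not multilingual:
        decl.set("type", "xs:string")
        return decl
    # String content extended with an xml:lang attribute.
    ctype = ET.SubElement(decl, XS + "complexType")
    content = ET.SubElement(ctype, XS + "simpleContent")
    ext = ET.SubElement(content, XS + "extension", base="xs:string")
    ET.SubElement(ext, XS + "attribute", ref="xml:lang")
    return decl


title = element_decl("Title", 1, 1, multilingual=True)
year = element_decl("Year", 0, 1)
}}}

The declared `upper` of 1 for `Title` is overridden, mirroring the requirement that conformant applications adjust the cardinality themselves.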
    248221
    249222==      Attributes for elements and components ==
    250223
    251 Besides the cardinality, the specifications of components and elements share two attributes: name and concept link. The name attribute is required and specifies the name of the element or component in the instance, while the concept link should be used to provide an external definition of the concept behind the element or component.
    252 
    253 For elements where a concept link cannot be provided, documentation may be given in prose in another element attribute. It is, however, preferred to provide a concept link to a data category registry as defined in ISO 12620:2009.
    254 For implementation purposes there is an optional attribute !SupersetLabel which, when set, indicates that an enabled application should use the content of this element to identify a superset of elements. Its value is numeric and serves as a rank. An enabled application uses the rank only when several such indicators are set, to decide which one takes priority: the element with rank 1 receives the highest priority, and if the same rank is used more than once, the first element in document order takes precedence.
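The priority rule can be read as "smallest rank wins; the earlier element wins a tie". The Python sketch below implements that reading. The function name and the element names are hypothetical, and elements without the attribute are given a rank of `None`.

{{{#!python
def pick_superset_label(elements):
    """Pick the element whose content should be used.

    elements: (name, rank_or_None) pairs in document order.
    Lower ranks take priority (rank 1 is highest); on equal ranks the element
    that comes first in document order wins. Returns None if no rank is set.
    """
    ranked = [(rank, position, name)
              for position, (name, rank) in enumerate(elements)
              if rank is not None]
    if not ranked:
        return None
    # Tuples compare by rank first, then by document position.
    return min(ranked)[2]


chosen = pick_superset_label([("Title", 2), ("Name", 1), ("Label", 1), ("Notes", None)])
}}}

Here `chosen` is `"Name"`: it shares rank 1 with `Label` but precedes it in document order.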
    255 
    256 For components, the component ID is provided as an attribute. It is required when a component is not specified inline but only referenced by this identifier. When a component specification includes another component specification inline, the component identifier is optional.
    257 
    258 
    259 
    260224=       Transformation of CCSL into a schema =
    261 
    262 An application conforming to this standard must process the component specification together with the static portions of the header and resources sections and provide an evaluative schema for assessing metadata instances. Various schema languages could be used, including XML Schema and RELAX NG. This standard specifies how an application creating a schema must interpret the different parts of the component specification. The intended serialization of metadata instances is valid (and hence well-formed) XML, which an enabled application must provide. Equivalent serializations, for example as JSON objects, may be provided in addition.
     225{{{#!div class="notice system-message"
     226Responsible for this section: Twan
     227}}}
     228
     229
    263230==      Interpretation of hierarchies of the CCSL ==
    264 Components are realized as container elements in the XML serialization, containing elements and components as specified. The name of each component or element is taken from the respective name attribute in the CCSL. As XML is case-sensitive, the case of the name attribute must be retained.
    265 The content model of an element is given by its value scheme, i.e. a closed vocabulary, a regular-expression pattern or a data type.
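A small Python sketch of the hierarchy rule follows. It renders a component as a container element whose children are its elements and nested components, using each name verbatim. `Component` is a hypothetical in-memory stand-in for a parsed CCSL component, not a CCSL data structure, and value schemes are ignored, so the output is an empty skeleton.

{{{#!python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical in-memory form of a CCSL component."""
    name: str
    elements: list = field(default_factory=list)    # element names
    components: list = field(default_factory=list)  # nested Component objects


def skeleton(component, parent=None):
    """Render a component as an (empty) XML container element, recursively."""
    node = (ET.Element(component.name) if parent is None
            else ET.SubElement(parent, component.name))
    for name in component.elements:
        ET.SubElement(node, name)  # name used verbatim: XML is case-sensitive
    for sub in component.components:
        skeleton(sub, node)  # nested components become nested containers
    return node


recording = Component("Recording", elements=["recordingDate"],
                      components=[Component("Speaker", elements=["Name"])])
xml_text = ET.tostring(skeleton(recording), encoding="unicode")
}}}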
     231
    266232==      Interpretation of the order of elements ==
    267 The specification provides the sequence of elements and components. In general, the order of elements is fixed, which allows the cardinality of each element to be specified. For components that contain both elements and components, the elements must be specified before the (sub-)components.
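The ordering constraint can be checked with a single pass over a component's children. In the Python sketch below, the children are assumed to be given as hypothetical `(kind, name)` pairs in specification order. The check fails as soon as an element appears after a (sub-)component.

{{{#!python
def elements_before_components(children):
    """Return True if no element follows a (sub-)component.

    children: (kind, name) pairs in specification order,
    where kind is "element" or "component".
    """
    seen_component = False
    for kind, _name in children:
        if kind == "component":
            seen_component = True
        elif seen_component:
            return False  # an element after a component violates the order
    return True


ok = elements_before_components(
    [("element", "Name"), ("element", "Age"), ("component", "Address")])
}}}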
     233
    268234==      Interpretation of attributes ==
    269 The CCSL allows the specification of attributes of elements and components. The !AttributeList element of the CCSL provides the mechanism to define attributes with appropriate value schemes. An enabled application must interpret the attributes specified in an attribute list so that the parent element or component allows an attribute with exactly that name and with the content model specified in the CCSL. For semantic interoperability, the CCSL provides a concept link to an external definition and description of the semantics of the attribute. The content model is given either by a type or by a value scheme (i.e. a closed vocabulary or a regular-expression pattern).
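The sketch below shows one possible XML Schema rendering of an attribute definition, again with `xml.etree.ElementTree`. A closed vocabulary becomes `xs:enumeration` facets and a pattern becomes an `xs:pattern` facet. Some choices are assumptions not taken from the standard: the function name, the attribute names and the fallback to `xs:string` when no value scheme is given.

{{{#!python
import xml.etree.ElementTree as ET

XS_URI = "http://www.w3.org/2001/XMLSchema"
XS = "{" + XS_URI + "}"
ET.register_namespace("xs", XS_URI)


def attribute_decl(name, vocabulary=None, pattern=None):
    """Sketch of an xs:attribute whose content model is a closed vocabulary or a pattern."""
    if vocabulary is not None and pattern is not None:
        raise ValueError("give a closed vocabulary or a pattern, not both")
    attr = ET.Element(XS + "attribute", name=name)
    if vocabulary is None and pattern is None:
        attr.set("type", "xs:string")  # assumption: no value scheme means plain string
        return attr
    simple = ET.SubElement(attr, XS + "simpleType")
    restriction = ET.SubElement(simple, XS + "restriction", base="xs:string")
    if pattern is not None:
        ET.SubElement(restriction, XS + "pattern", value=pattern)
    else:
        for item in vocabulary:
            ET.SubElement(restriction, XS + "enumeration", value=item)
    return attr


medium = attribute_decl("medium", vocabulary=["audio", "video"])
code = attribute_decl("code", pattern="[A-Z]{3}")
}}}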
    270 
    271235
    272236= Appendices =
    273237
    274 {Removed copy of general component schema and instance XML example}
     238{{{#!comment
     239ISO spec has copy of general component schema and instance XML example, removed here
     240}}}
    275241
    276242= Bibliography =