Changes between Version 15 and Version 16 of CMDI 1.2/Specification
- Timestamp:
- 06/03/15 12:38:15 (9 years ago)
CMDI 1.2/Specification
v15 v16
159 159 }}}
160 160
161 {{{#!comment
162 TODO: Decide on the following intro subsections
161 163 == CMDI Component Metadata Model ==
162 {{{#!comment
163 TODO: Write section
164 }}}
165
166 164 == CMDI Component and Profile Specification Level ==
167 {{{#!comment
168 TODO: Write section
169 }}}
170
165 }}}
171 166
172 167 = Structure of CMDI-files =
173 An example of a CMDI file can be found in Annex B, showing the overall structure of a metadata instance serialization. The structure of such an instance is described here. A metadata instance that is compliant with this standard must follow this structure. Structurally, a CMDI instance consists of three sections: the header, the resources and the components. The header and the resource section are statically defined and remain constant in the generation of an evaluative metadata schema. They are described here as they are required for creating the schema from the component specification.
168 {{{#!div class="notice system-message"
169 Responsible for this section: Oddrun
170 }}}
171
174 172 == The header ==
175 The header-element is a container element intended to provide information on the metadata file as such, not the resource that is described by the metadata file. To make this more explicit and human readable, the data categories contained in the header are prefixed by Md for Metadata. The following elements are part of the header; each is marked as optional or mandatory:
176 • !MdCreator (optional): Name of the person who created the metadata file. This is defined as a string.
177 • !MdCreationDate (optional): Date of the creation of the metadata file. This is defined as the data type date, i.e.
the date is specified in the form yyyy-mm-dd (four digits for the year, followed
178 by a dash, followed by two digits for the month, followed by a dash, followed by two digits for the day of the month).
179 • !MdSelfLink (optional): Persistent identifier for the metadata file (see PISA) in the form of a URI.
180 • !MdProfile (mandatory): Persistent identifier of the profile used to create this metadata file. This information is partially implied by the value of the schemaLocation attribute of the root element, but the profile identifier may refer to the complete description of the profile such as the CCSL.
181 • !MdCollectionDisplayName (mandatory): The name for a collection as it is supposed to be displayed by an application. This element is used because metadata is often shared and institutions display the names of the collections in applications.
182 • !MdRevisionGrp (optional): The group for storing metadata revisions, if any, with at least one but possibly many !MdRevision child elements, each containing the name of the editor (element by with string content), the date of editing (element date of type xs:date) and a verbose note on the revision (element note of type string).
183
184 It is always recommended to fill in all possible fields here. The idea for these fields is to structure the data and make information available, providing some background for the users of the metadata.
185
186 Potential problems, intentionally left vague, concern how to deal with changed metadata files: should the !MdCreator and !MdCreationDate be adjusted? If yes, how persistent is the !MdSelfLink? As the metadata is created during the archiving state of a resource, potential updates are currently not dealt with.
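Editorial illustration, not part of either wiki revision: the header rules listed above can be sketched as a small check. This is a minimal sketch under assumptions. The element names are taken from the list above, but the enclosing `Header` tag, the absence of XML namespaces, and the sample values are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Element names come from the header list above. The <Header> wrapper and the
# lack of namespaces are assumptions made only for this sketch.
MANDATORY = ("MdProfile", "MdCollectionDisplayName")
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")  # yyyy-mm-dd, shape only


def check_header(header_xml):
    """Return a list of problems found in a header fragment (empty if none)."""
    header = ET.fromstring(header_xml)
    problems = []
    for name in MANDATORY:
        element = header.find(name)
        if element is None or not (element.text or "").strip():
            problems.append("missing mandatory element: " + name)
    created = header.find("MdCreationDate")
    if created is not None:
        if not DATE_PATTERN.fullmatch((created.text or "").strip()):
            problems.append("MdCreationDate is not in yyyy-mm-dd form")
    return problems


# Hypothetical sample values, chosen only for illustration.
EXAMPLE = """<Header>
  <MdCreator>Jane Doe</MdCreator>
  <MdCreationDate>2015-06-03</MdCreationDate>
  <MdProfile>hypothetical-profile-id</MdProfile>
  <MdCollectionDisplayName>Example collection</MdCollectionDisplayName>
</Header>"""

print(check_header(EXAMPLE))
```

The pattern only checks the shape of the date, not whether the month and day are valid calendar values; a schema-level xs:date check would be stricter.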
187
188 {{{#!comment
189 TODO: Include support for attributes as local extensions
173
174 {{{#!comment
175 [TODO CMDI 1.2]: Include support for attributes as local extensions
190 176
191 177 Accepted proposal by Twan & Menzo (2014-11-20 by e-mail to the members of the CMDI taskforce):
… …
206 192 "http://clarin.eu", etc.
207 193 }}}
194
208 195 == The resources section ==
209 The resources section in a metadata file lists all information relevant for the individual resource, but does not describe the resource as such. The description is part of the components; the resource section provides the location of the resource or of its parts if it consists of more than one, provenance information on the resource, information on the relation between the parts of the resource, if applicable, and information on a greater body the resource is part of, also if applicable.
196
210 197 === The Resource proxy list ===
211 The resource proxy list defines metadata file internal placeholders, called proxies, for each part of a resource. For example, if a resource consists of one specific file, this file is referenced in the !ResourceRef element, which holds the PID of this file in the form of a URI. As resources can be composed of other resources, which are identified by their metadata, the !ResourceType-element specifies whether the PID refers to metadata (another metadata file) or to a resource such as a binary file or data. To further specify the type, ResourceType takes mimetype as an attribute, with the value specifying the mimetype of the referenced resource. Providing the mimetype is optional.
212 !Resources can consist of more than one data stream or file, hence the !ResourceProxyList may contain more than one !ResourceProxy. To be able to refer to each of these parts individually, each !ResourceProxy receives an id-attribute for internal reference within the metadata file.
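Editorial illustration, not part of either wiki revision: the proxy mechanism described above can be sketched as a lookup table keyed by the id-attribute. This is a sketch under assumptions. The element and attribute names follow the paragraph above, while the un-namespaced XML, the type values `Resource` and `Metadata`, and the PIDs are hypothetical.

```python
import xml.etree.ElementTree as ET


def index_proxies(proxy_list_xml):
    """Map each ResourceProxy id to its type, optional mimetype and reference."""
    root = ET.fromstring(proxy_list_xml)
    proxies = {}
    for proxy in root.findall("ResourceProxy"):
        rtype = proxy.find("ResourceType")
        ref = proxy.find("ResourceRef")
        proxies[proxy.get("id")] = {
            "type": rtype.text if rtype is not None else None,
            # The mimetype attribute is optional, so this may be None.
            "mimetype": rtype.get("mimetype") if rtype is not None else None,
            "ref": ref.text if ref is not None else None,
        }
    return proxies


# Hypothetical PIDs and type values, chosen only for illustration.
EXAMPLE = """<ResourceProxyList>
  <ResourceProxy id="p1">
    <ResourceType mimetype="audio/x-wav">Resource</ResourceType>
    <ResourceRef>hdl:0000/hypothetical-audio</ResourceRef>
  </ResourceProxy>
  <ResourceProxy id="p2">
    <ResourceType>Metadata</ResourceType>
    <ResourceRef>hdl:0000/hypothetical-metadata</ResourceRef>
  </ResourceProxy>
</ResourceProxyList>"""

for proxy_id, info in index_proxies(EXAMPLE).items():
    print(proxy_id, info)
```

Keying the table by id mirrors the stated purpose of the id-attribute: other parts of the metadata file can then refer to an individual proxy by that id.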
198
213 199 === The Journal File Proxy List ===
214 For many resources that are developed over a longer period of time, changes and updates are frequent. Provenance data is not part of the CMDI-model, but it is possible to store provenance data outside of the metadata file in sensible forms. Provenance metadata is referred to as !JournalFile in CMDI documents. The !JournalFileProxyList contains the list of all !JournalFiles for a resource; the !JournalFileRef holds the URI as a reference to the !JournalFile containing the provenance data.
200
215 201 === The Resource Relation List ===
216 Resource files do not exist independently of each other if a resource consists of more than one file. For example, audio files and transcriptions are related to each other. The !ResourceProxyList only lists these files; the !ResourceRelationList makes the relation between pairs of files explicit. For this purpose the ResourceRelation contains a triple of elements defining a directed relation between a first resource, the source, which is referenced by a ref-pointer to an id from the !ResourceProxys, and a second resource, the target, referenced in the same way. The relation between the two is given as a string in the RelationType-element, with relations defined in a data category registry. The identifier of the Relation Type is given as dcr:datcat.
202
217 203 === The Is-Part-of List ===
218 Resources that are defined in bundles are listed under !ResourceProxy. The individual parts can be seen as independent resources as well, such as a subcorpus that can also be distributed on its own. To point out that a resource is part of a larger unit or was created as part of a larger unit, the !IsPartOfList is introduced; it refers to one or more larger units by giving the PID of each larger unit in an !IsPartOf-element.
219
220 Potential problem: it is (maybe intentionally) unclear what the PID points to: the resource (e.g. a landing page) or the MD (e.g. a CMDI in a repo).
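Editorial illustration, not part of either wiki revision: the Resource Relation List paragraph above describes directed relations whose source and target point at proxy ids. The sketch below models each relation as a plain (source, relation type, target) tuple and reports relations that point at ids missing from the proxy list. The tuple layout, the ids and the relation name are assumptions made for this sketch; the text above does not fix a serialization.

```python
def dangling_relations(proxy_ids, relations):
    """Return the relations whose source or target is not a known proxy id."""
    known = set(proxy_ids)
    return [
        (source, relation_type, target)
        for source, relation_type, target in relations
        if source not in known or target not in known
    ]


# Hypothetical proxy ids and relation name, chosen only for illustration.
PROXY_IDS = ["audio1", "transcript1"]
RELATIONS = [
    ("transcript1", "transcribes", "audio1"),
    ("transcript1", "transcribes", "audio2"),  # audio2 is not a listed proxy
]

print(dangling_relations(PROXY_IDS, RELATIONS))
```

Keeping source and target as separate positions preserves the direction of the relation, which a set of two ids would lose.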
204
221 205 == The components ==
222 The components are the content section of the CMDI-files to be processed by users. The structure of the components varies according to the intended use. In general, the components list the data categories from a data category registry in order, and provide the cardinality of these data categories and possibly a controlled vocabulary.
223 Components are very varied and hence a general mechanism for describing them is more adequate than providing individual examples. The general mechanism for describing the components is the CMDI Component Specification Language (CCSL).
224 For the component metadata infrastructure the header and the components are described separately. In practice it is possible to keep them separate until the concrete schema is being generated. The instances contain the header section and the component part. For the description of the components a specification language is used, described in the following section.
206
225 207 = The CMDI Component Specification Language =
226
227 The CMDI Component Specification Language (CCSL) is designed to describe the variable, component-specific part of the CMDI schema. In a CCSL file the metadata elements are defined and grouped and other components are referenced. Figure 1 shows the relation of the individual elements of the CCSL.
228
229 Figure 1 — Schematic architecture of the CMDI Component Specification Language
230 Instances of the component specification language contain two parts, namely a header section and the component description.
208 {{{#!div class="notice system-message"
209 Responsible for this section: Thomas
210 }}}
231 211
232 212 == CCSL header ==
233 The CCSL header provides simple data warehousing information on the component description, namely an identifier for the component description, which must be unique and should be persistent (see also ISO 24619:2011), a name for the component, and a prose description of the component.
234 213
235 214 == Component definition ==
236 Components are defined as a sequence of elements, which can be followed by other components, as components can be embedded in other components. Additionally, components can take any number of attributes. These attributes and their possible values are also specified in the component description.
237 215
238 216 == Element definition ==
239 Elements are the part of metadata instances containing the content, i.e., the field descriptors. When introducing elements, the content model is also specified, i.e. a value scheme, which can be either a specific pattern or a closed vocabulary.
240 217
241 218 == Cardinality of elements and components ==
242 219
243 For practical considerations the cardinality of components and elements is specified according to the needs in the metadata instance. Both elements and components can be specified as occurring a specific number of times. It is possible to provide a lower and an upper bound for each, though the upper bound must be greater than or equal to the lower bound.
244 The cardinality can be any positive integer, 0, or unbounded.
245
246 220 == Describing multilingual content ==
247 To describe multilingual content, elements are specified with a boolean attribute for multilinguality. For elements that are specified as multilingual, conformant applications must adjust the cardinality so that such an element can be used in many languages (i.e.
upper bound of the cardinality is unlimited) and allow the specification of the language of the element content by an appropriate attribute (i.e. xml:lang).
248 221
249 222 == Attributes for elements and components ==
250 223
251 Besides the specification of the cardinality, the specifications of components and elements share the attributes name and concept link. The name attribute is required to specify the name of the element in the instance, while the concept link should be used to provide an external definition of the concept behind the element or component.
252
253 For those elements where a concept link cannot be provided, the documentation may be provided in prose as part of another element-attribute. It is however preferred to provide a concept link with reference to a data category registry as defined in ISO 12620:2009.
254 For implementation purposes there is an optional attribute SupersetLabel that, when set, indicates that the content of this element should be used to identify a superset of elements by an enabled application. The value of this attribute is a numeric value used as a rank. An enabled application uses the rank only when multiple indicators to identify subsets are set, indicating which one takes priority. The highest priority is then given to the element with the rank 1; should the same rank be used multiple times, the first one in document order will receive a higher priority.
255
256 For components, the component ID is provided as an attribute. This is required when a component is used that is not specified internally but only referenced by this identifier.
In the case where a component specification includes another component specification internally, the component identifier is optional.
257
258
259
260 224 = Transformation of CCSL into a schema =
261
262 An application conforming to this standard must process the component specification language together with the static portions of header and resource section and provide an evaluative schema for assessment of metadata instances. Various schema languages could be used, including XML Schema and RELAX NG. This standard specifies how the different parts of the component specification are to be interpreted by an application creating a schema. The intended serialization of the metadata instances is valid (and well-formed) XML, which must be provided by an enabled application. Other equivalent serializations, for example as JSON objects, may be provided in addition.
225 {{{#!div class="notice system-message"
226 Responsible for this section: Twan
227 }}}
228
229
263 230 == Interpretation of hierarchies of the CCSL ==
264 Components are to be realized as container elements in the XML serialization, containing elements and components as specified. The name of each component or element is the one specified in the CCSL by the respective name attribute. As XML is case sensitive, the case of the name attribute is to be retained.
265 The content model of an element is provided by the value scheme, i.e. a closed vocabulary, a regular-expression-like pattern or a data type.
231
266 232 == Interpretation of the order of elements ==
267 The specification of the elements provides the sequence of elements and components. The order of elements is fixed in general to allow for the specification of the cardinality of elements. For components that contain both elements and components, the elements have to be specified before the (sub-)components.
233
268 234 == Interpretation of attributes ==
269 The CCSL allows the specification of the attributes of elements and components. The !AttributeList element of the CCSL provides the mechanism to define attributes with appropriate value schemes. An enabled application must interpret the attributes specified in an attribute list so that the parent element or component allows the attribute with exactly that name and the content model as specified by the CCSL. For semantic interoperability the CCSL provides a concept link to the external definition and description of the semantics of the attribute. The content model is provided either by the type or by the value scheme (i.e. a closed vocabulary or a regular-expression-like pattern).
270
271 235
272 236 = Appendices =
273 237
274 {Removed copy of general component schema and instance XML example}
238 {{{#!comment
239 ISO spec has copy of general component schema and instance XML example, removed here
240 }}}
275 241
276 242 = Bibliography =