
Metadata Components, Schemas and Specification formats

CLARIN Metadata Component

A metadata component is a specification of a structured set of metadata elements. In its primary form, this specification describes a data structure (see Figure 1).

Figure 1 A CLARIN metadata component data structure

Every component in the registry has a unique identifier that also identifies the component registry instance [syntax to be decided]. The component registry also keeps some administrative data about each component that is not included in the component itself. A metadata element is an atomic piece of metadata about a resource, i.e. it cannot be subdivided into other metadata. A metadata element definition specifies attributes such as its name, cardinality, value scheme, and a “ConceptLink”, i.e. a link to a data category in an accepted registry. An actual instantiation of an element definition also contains the element's value, but it does not have to contain the cardinality and ConceptLink, since these can be obtained from the definition.
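To illustrate the difference between a definition and an instantiation, here are two fragments reusing the FirstName element from the profile example on this page (with a cardinality added for illustration). The instance-side markup, a plain element named after its definition, is an assumption made for this sketch, not the normative CMD instance syntax.

```xml
<!-- Definition: carries name, value scheme, cardinality and ConceptLink -->
<CMD_Element name="FirstName" ValueScheme="string"
    CardinalityMin="1" CardinalityMax="1"
    ConceptLink="http://www.isocat.org/datcat/DC-1766"/>

<!-- Instantiation (assumed syntax): carries only the value; cardinality
     and ConceptLink are looked up in the definition above -->
<FirstName>Anna</FirstName>
```

Keeping cardinality and ConceptLink out of the instance means they are stated once, in the registry, rather than repeated in every metadata description.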

Like the metadata element definition, the component definition specifies attributes such as cardinality and ConceptLink that need not be present in its actual instantiations. Every component can be transformed and exported into a component specification file (see the section on the specification and exchange format for Components later on this page).

An aggregation of components can serve as a specification of metadata descriptions for a specific class of resources and a specific purpose. Such an aggregation, together with cardinality specifications and some administrative data, forms a metadata profile, which can be registered in the component registry as a special component. A profile conforms to the CLARIN component metadata model described in Appendix A. It can be transformed and exported into an XML schema that constrains the creation of metadata description files.

The Metadata Component Model

This appendix treats an instantiated metadata description as the primary data. Schemas and XML files are regarded as products of a necessary transformation.

The UML diagram in Figure 2 is a class diagram of the data structures needed to implement a CMD (CLARIN Metadata Description). The central class is "MetadataComponent", which contains a list of metadata elements but can also recursively contain other "MetadataComponent" instances. A MetadataComponent describes one aspect (facet) of a language resource (LR); it therefore also needs a link to a concept in a concept registry that anchors the semantics of this facet.

Figure 2 Data structure for a CLARIN metadata description

Every instance of the CMD data structures can be serialized as an XML metadata description, constrained by an XML schema that fixes its structure so that only the values of the metadata elements can vary. Two factories take a CMD as input and create (1) a metadata description that uses default values for all metadata elements and (2) a metadata XML schema file that narrowly constrains such a description. In addition, there is a (very) general XML schema that constrains the whole metadata component model. This general schema only specifies the necessary administrative elements and attributes, as well as the recursive structure of the metadata components.
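The recursive structure that the general schema has to express can be sketched in XML Schema with a named complex type that refers to itself. The sketch below is a hypothetical simplification, not the actual general CMD schema: it models only the CMD_Component/CMD_Element nesting and the attributes used in the examples on this page, and it leaves the content of ValueScheme open.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <!-- Hypothetical simplification of the general schema -->
    <xs:element name="CMD_ComponentSpec">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="CMD_Component" type="ComponentType"
                    maxOccurs="unbounded"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>

    <!-- A component holds elements and, recursively, further components -->
    <xs:complexType name="ComponentType">
        <xs:choice minOccurs="0" maxOccurs="unbounded">
            <xs:element name="CMD_Element" type="ElementType"/>
            <xs:element name="CMD_Component" type="ComponentType"/>
        </xs:choice>
        <xs:attribute name="name" type="xs:string" use="required"/>
        <xs:attribute name="ConceptLink" type="xs:anyURI"/>
        <xs:attribute name="CardinalityMin" type="xs:string"/>
        <xs:attribute name="CardinalityMax" type="xs:string"/>
    </xs:complexType>

    <!-- Element definition; ValueScheme may be an attribute or a child -->
    <xs:complexType name="ElementType">
        <xs:sequence>
            <xs:element name="ValueScheme" minOccurs="0">
                <xs:complexType>
                    <!-- pattern/enumeration content is not checked here -->
                    <xs:sequence>
                        <xs:any processContents="lax" minOccurs="0"
                            maxOccurs="unbounded"/>
                    </xs:sequence>
                </xs:complexType>
            </xs:element>
        </xs:sequence>
        <xs:attribute name="name" type="xs:string" use="required"/>
        <xs:attribute name="ValueScheme" type="xs:string"/>
        <xs:attribute name="ConceptLink" type="xs:anyURI"/>
        <xs:attribute name="CardinalityMin" type="xs:string"/>
        <xs:attribute name="CardinalityMax" type="xs:string"/>
    </xs:complexType>
</xs:schema>
```

The self-reference in ComponentType is what allows arbitrarily deep nesting of components. CardinalityMax is typed as a string here so that a value such as "unbounded" is accepted.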

The CLARIN metadata model does not prescribe any mandatory components. However, some predefined components are recommended in specific cases:

  1. An Access metadata component that specifies how to apply for access to a resource. It is strongly recommended for all resources that are not freely available.
  2. A Technical metadata component that allows interaction with the CLARIN workflow mechanisms.

An example of a CMD description file can be found in the repository at metadata/trunk/toolkit/example/example-md-instance.xml.

Specification and Exchange Format for Components

In the start phase, while the component editor is not yet available, it is useful to have a component specification format that people working on CMDI can use to create components. Creating XML schemas that constrain other schemas is problematic: such schemas have a very broad scope, both syntactically and semantically, and are therefore hard to interpret. It is thus best to use a very simple XML format that users can easily understand and for which we can define an XML Schema that constrains such documents sufficiently tightly.

As an example, a (very) simple Actor component containing a subcomponent “Language” is specified in the repository file metadata/trunk/toolkit/example/example-component-actor.xml.
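Since the example file itself is not reproduced on this page, the following is a hedged sketch of how an Actor component with a nested “Language” subcomponent could be written in the same specification format as the profile example further down. The LanguageName element and the cardinality of Language are hypothetical, and DC-000 is a placeholder data category, as in the profile example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<CMD_ComponentSpec>
    <CMD_Component ConceptLink="http://www.isocat.org/datcat/CMD-000" name="Actor">
        <CMD_Element name="FirstName" ValueScheme="string"
            ConceptLink="http://www.isocat.org/datcat/DC-1766"/>

        <!-- Subcomponent nested directly inside its parent component -->
        <CMD_Component ConceptLink="http://www.isocat.org/datcat/CMD-000"
            name="Language" CardinalityMin="0" CardinalityMax="unbounded">
            <!-- Hypothetical element; DC-000 is a placeholder data category -->
            <CMD_Element name="LanguageName" ValueScheme="string"
                ConceptLink="http://www.isocat.org/datcat/DC-000"/>
        </CMD_Component>
    </CMD_Component>
</CMD_ComponentSpec>
```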

The schema for this representation format is the Metadata Specification schema.

CLARIN Metadata Profiles and Schemas

The following example shows a profile specification that combines an Actor component with a TextTMD component (technical metadata for texts) and a PhotoTMD component (technical metadata for photos). Its format hardly differs from that of a component specification.

<?xml version="1.0" encoding="UTF-8"?>
<CMD_ComponentSpec xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="file:/mnt/d/sync/doc/clarin/wp2/registry/cmd/cmd-schema.xsd">

    <CMD_Component ConceptLink="http://www.isocat.org/datcat/CMD-000" name="Actor">
        <!-- inline element definitions -->
        <CMD_Element name="FirstName" ValueScheme="string"
            ConceptLink="http://www.isocat.org/datcat/DC-1766"/>

        <CMD_Element name="ActorAge" ConceptLink="http://www.isocat.org/datcat/DC-1812"
            CardinalityMin="0" CardinalityMax="1">
            <ValueScheme>
                <!-- assumed age format: years, optionally ";months" and ".days", e.g. 12;3.5 -->
                <pattern>[0-9]+(;[0-9]+(\.[0-9]+)?)?</pattern>
            </ValueScheme>
        </CMD_Element>

        <CMD_Element name="ActorSex" ConceptLink="http://www.isocat.org/datcat/DC-1789"
            CardinalityMin="0" CardinalityMax="1">
            <ValueScheme>
                <enumeration>
                    <item ConceptLink="http://www.isocat.org/datcat/DC-000">male</item>
                    <item ConceptLink="http://www.isocat.org/datcat/DC-000">female</item>
                </enumeration>
            </ValueScheme>
        </CMD_Element>
        

        <!-- use element defined elsewhere -->
        <!--<CMD_Element ref="ActorLanguage"/>-->

    </CMD_Component>

    <CMD_Component ConceptLink="http://www.isocat.org/datcat/CMD-000" name="TextTMD">
        <CMD_Element name="Format" ConceptLink="http://www.isocat.org/datcat/DC-1758"
            CardinalityMin="0" CardinalityMax="1">
            <ValueScheme>
                <enumeration>
                    <item ConceptLink="http://www.isocat.org/datcat/DC-000">text/plain</item>
                    <item ConceptLink="http://www.isocat.org/datcat/DC-000">text/html</item>
                </enumeration>
            </ValueScheme>
        </CMD_Element>
    </CMD_Component>

    <CMD_Component ConceptLink="http://www.isocat.org/datcat/CMD-000" name="PhotoTMD">
        <CMD_Element name="Format" ConceptLink="http://www.isocat.org/datcat/DC-1758"
            CardinalityMin="0" CardinalityMax="1">
            <ValueScheme>
                <enumeration>
                    <item ConceptLink="http://www.isocat.org/datcat/DC-000">image/jpeg</item>
                    <item ConceptLink="http://www.isocat.org/datcat/DC-000">image/png</item>
                </enumeration>
            </ValueScheme>
        </CMD_Element>
    </CMD_Component>

</CMD_ComponentSpec>

This profile can be transformed into a profile metadata schema:

http://trac.clarin.eu/browser/metadata/trunk/toolkit/example/example-md-schema.xsd
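The linked example-md-schema.xsd is the actual result of this transformation. As a rough, hypothetical sketch of what the schema factory has to do, the ActorSex and TextTMD Format definitions above could map onto local element declarations like the following (inside the complex types generated for Actor and TextTMD), with CardinalityMin/CardinalityMax becoming minOccurs/maxOccurs and an enumeration value scheme becoming an xs:enumeration restriction. The exact layout the factory emits is an assumption; the xs prefix is taken to be bound to the XML Schema namespace on the enclosing xs:schema element.

```xml
<!-- Hypothetical mapping; the real generated schema may differ -->
<xs:element name="ActorSex" minOccurs="0" maxOccurs="1">
    <xs:simpleType>
        <xs:restriction base="xs:string">
            <xs:enumeration value="male"/>
            <xs:enumeration value="female"/>
        </xs:restriction>
    </xs:simpleType>
</xs:element>

<xs:element name="Format" minOccurs="0" maxOccurs="1">
    <xs:simpleType>
        <xs:restriction base="xs:string">
            <xs:enumeration value="text/plain"/>
            <xs:enumeration value="text/html"/>
        </xs:restriction>
    </xs:simpleType>
</xs:element>
```

Because the value sets are closed enumerations, a metadata description can only vary in which of the listed values it uses, which matches the narrow constraint described in Appendix A.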

Last modified on 11/18/09 16:09:41
