= Metadata Components, Schemas and Specification formats =

== CLARIN Metadata Component ==

A metadata component is a specification of a structured set of metadata elements. In its primary form, this specification describes a data structure (see Figure 1).

[[Image(Metadata Component.jpg)]]

Figure 1: A CLARIN metadata component data structure

Every component in the registry has a unique identifier that also identifies the component registry instance [syntax to be decided]. The component registry also keeps some administrative data about the component that is not included in the component itself.

A metadata element is atomic metadata for a resource, i.e. it cannot be subdivided into other metadata. A metadata element definition specifies attributes such as its name, cardinality, value scheme and a {{{ConceptLink}}}: a link to a data category in an accepted registry. An actual instance of an element definition also contains its value, but it does not have to contain the cardinality and the {{{ConceptLink}}}, since these can be obtained from its definition. Like a metadata element definition, a component definition specifies attributes such as cardinality and {{{ConceptLink}}} that need not be present in its instances.

Every component can be transformed and exported into a component specification file (see the section on the specification and exchange format for components later on this page).

An aggregation of components can serve as a specification of metadata descriptions for a specific class of resources and a specific purpose. Such an aggregation, together with cardinality specifications and some administrative data, is a metadata profile; it can be registered in the component registry as a special component. A profile conforms to the CLARIN component metadata model described in Appendix A. The profile can be transformed and exported into an XML schema that constrains the creation of metadata description files.
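The split between definitions and instances can be illustrated with a minimal Python sketch. It is a hypothetical model, not the registry's implementation: the class names ({{{ElementDef}}}, {{{ComponentDef}}}, {{{ElementInstance}}}), their fields, the use of {{{None}}} for an unbounded maximum and the {{{example.org}}} concept links are all assumptions made for illustration. An instance stores only its value and a reference to its definition; cardinality and {{{ConceptLink}}} are looked up on the definition.

{{{#!python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical model of CLARIN component definitions; names are illustrative only.


@dataclass
class ElementDef:
    """Definition of an atomic metadata element."""
    name: str
    value_scheme: str            # e.g. "string", an enumeration or a pattern
    concept_link: str            # link to a data category in an accepted registry
    card_min: int = 1
    card_max: Optional[int] = 1  # None stands for "unbounded" (an assumption)


@dataclass
class ComponentDef:
    """Definition of a component: metadata elements plus nested subcomponents."""
    name: str
    concept_link: str
    elements: List[ElementDef] = field(default_factory=list)
    components: List["ComponentDef"] = field(default_factory=list)
    card_min: int = 1
    card_max: Optional[int] = 1


@dataclass
class ElementInstance:
    """An instantiated element: only the value is stored locally."""
    definition: ElementDef
    value: str

    @property
    def concept_link(self) -> str:
        # Not stored on the instance; obtained from the definition.
        return self.definition.concept_link

    @property
    def cardinality(self) -> tuple:
        # Likewise taken from the definition.
        return (self.definition.card_min, self.definition.card_max)


def cardinality_ok(card_min: int, card_max: Optional[int], count: int) -> bool:
    """Check an occurrence count against a definition's cardinality."""
    return count >= card_min and (card_max is None or count <= card_max)


# Usage: an Actor component with a Language subcomponent (hypothetical concept links).
language = ComponentDef(
    "Language", "http://example.org/concept/language",
    elements=[ElementDef("LanguageName", "string", "http://example.org/concept/languageName")],
    card_min=0, card_max=None,
)
actor = ComponentDef(
    "Actor", "http://example.org/concept/actor",
    elements=[ElementDef("Name", "string", "http://example.org/concept/name")],
    components=[language],
)
name = ElementInstance(actor.elements[0], "Jane Doe")
print(name.concept_link)                                        # http://example.org/concept/name
print(cardinality_ok(language.card_min, language.card_max, 3))  # True
}}}

Keeping cardinality and {{{ConceptLink}}} only on the definition avoids repeating them in every description, which mirrors the rule stated above for element and component instances.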
== The Metadata Component Model ==

This appendix takes an instantiated metadata description as the primary data; schemas and XML files are treated as necessary products of transformation. The UML class diagram in Figure 2 shows the data structures needed to implement a CMD (CLARIN Metadata Description). The central class is "MetadataComponent", which contains a list of metadata elements and can also recursively contain other "MetadataComponent" instances. A MetadataComponent describes one aspect (facet) of a language resource (LR); it therefore also needs a link to a concept in a concept registry that anchors the facet's semantics.

[[Image(Metadata Description.jpg)]]

Figure 2: Data structure for a CLARIN metadata description

Every instance of the CMD data structures can be serialized as an XML metadata description. That description is constrained by an XML schema in such a way that only the values of the metadata elements can vary. Two factories take a CMD as input and create (1) a metadata description that uses default values for all metadata elements, and (2) a metadata XML schema file that narrowly constrains the metadata description.

In addition, there is a (very) general XML schema that constrains the whole metadata component model. This general schema only specifies the necessary administrative elements and attributes, as well as the recursive structure of the metadata components.

The CLARIN metadata model has no mandatory components. However, some predefined components are recommended in specific cases:
 1. An access metadata component that specifies how to apply for access to a resource. It is strongly recommended for all resources that are not freely accessible.
 2. A technical metadata component that allows interaction with the CLARIN workflow mechanisms.
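The two factories can be illustrated with a minimal Python sketch built on the standard library's {{{xml.etree.ElementTree}}}. It is not the CLARIN toolkit's implementation: the classes {{{MetadataElement}}} and {{{MetadataComponent}}} (named after the class in Figure 2), their fields, the default values and the use of {{{xs:string}}} for every element are assumptions. Factory (1) fills every element with its default value; factory (2) emits one {{{xs:sequence}}} per component, so the structure is fixed and only element values can vary. Element-level cardinalities are left out for brevity.

{{{#!python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional

XS = "http://www.w3.org/2001/XMLSchema"
XS_TAG = "{" + XS + "}"          # ElementTree's {namespace}localname notation
ET.register_namespace("xs", XS)  # serialize the schema with the usual "xs" prefix


@dataclass
class MetadataElement:
    name: str
    default: str = ""  # hypothetical default value used by factory (1)


@dataclass
class MetadataComponent:
    name: str
    elements: List[MetadataElement] = field(default_factory=list)
    components: List["MetadataComponent"] = field(default_factory=list)
    card_min: int = 1
    card_max: Optional[int] = 1  # None stands for "unbounded"


def make_default_description(comp: MetadataComponent) -> ET.Element:
    """Factory (1): a metadata description that uses default values everywhere."""
    node = ET.Element(comp.name)
    for el in comp.elements:
        ET.SubElement(node, el.name).text = el.default
    for sub in comp.components:
        node.append(make_default_description(sub))
    return node


def _declare(comp: MetadataComponent, nested: bool) -> ET.Element:
    attrs = {"name": comp.name}
    if nested:  # minOccurs/maxOccurs are only allowed on local declarations
        attrs["minOccurs"] = str(comp.card_min)
        attrs["maxOccurs"] = "unbounded" if comp.card_max is None else str(comp.card_max)
    decl = ET.Element(XS_TAG + "element", attrs)
    seq = ET.SubElement(ET.SubElement(decl, XS_TAG + "complexType"), XS_TAG + "sequence")
    for el in comp.elements:  # element cardinalities omitted for brevity
        ET.SubElement(seq, XS_TAG + "element", {"name": el.name, "type": "xs:string"})
    for sub in comp.components:
        seq.append(_declare(sub, nested=True))
    return decl


def make_schema(comp: MetadataComponent) -> ET.Element:
    """Factory (2): an XML schema fixing the structure, so only values can vary."""
    schema = ET.Element(XS_TAG + "schema")
    schema.append(_declare(comp, nested=False))
    return schema


actor = MetadataComponent(
    "Actor",
    elements=[MetadataElement("Name", "unknown")],
    components=[MetadataComponent("Language", [MetadataElement("LanguageName", "und")],
                                  card_min=0, card_max=None)],
)
print(ET.tostring(make_default_description(actor), encoding="unicode"))
# <Actor><Name>unknown</Name><Language><LanguageName>und</LanguageName></Language></Actor>
print(ET.tostring(make_schema(actor), encoding="unicode"))
}}}

Both factories recurse over the same component tree, which reflects the recursive structure of "MetadataComponent" in Figure 2.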
An example of a CMD description file:

[[Include(source:metadata/trunk/toolkit/example/example-md-instance.xml)]]

== Specification and Exchange Format for Components ==

In the initial phase, while the component editor is not yet available, it is useful to have a component specification format that people working on CMDI can use to create components. XML schemas that constrain other schemas are problematic: XML Schema has a very broad scope, both syntactically and semantically, which makes such meta-schemas hard to write and to interpret. It is therefore best to use a very simple XML format that users can easily understand and for which we can define an XML Schema that constrains documents sufficiently tightly.

As an example, here is a (very) simple Actor component containing a subcomponent “Language”:

[[Include(source:metadata/trunk/toolkit/example/example-component-actor.xml)]]

The schema for this representation format is the [source:/metadata/trunk/toolkit/general-component-schema.xsd Metadata Specification schema].

== CLARIN Metadata Profiles and Schemas ==

The following is an example of a profile specification that combines an Actor component with a TextTMD component (technical metadata for texts) and a technical metadata component for photos. It differs little from a component specification.

{{{
([0-9]+)*(;[0-9]+)*(.[0-9]+) male female text/plain text/html image/jpeg image/png
}}}

This profile can be transformed into a profile metadata schema: http://trac.clarin.eu/browser/metadata/trunk/toolkit/example/example-md-schema.xsd
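Because the specification format is plain XML with a small, recursive vocabulary, it can be processed with standard tooling. The following minimal Python sketch walks a component specification and lists the path and value scheme of every metadata element. The sample specification, its tag and attribute names ({{{CMD_ComponentSpec}}}, {{{CMD_Component}}}, {{{CMD_Element}}}, {{{ValueScheme}}}, cardinality attributes) and the {{{isProfile}}} flag are assumptions loosely modelled on CMDI-style specifications, not a copy of example-component-actor.xml; the linked Metadata Specification schema defines the actual format.

{{{#!python
import xml.etree.ElementTree as ET

# Hypothetical component specification; tag and attribute names are assumptions
# for illustration and are not taken from example-component-actor.xml.
SPEC = """
<CMD_ComponentSpec isProfile="false">
  <CMD_Component name="Actor" CardinalityMin="1" CardinalityMax="1">
    <CMD_Element name="Name" ValueScheme="string"/>
    <CMD_Component name="Language" CardinalityMin="0" CardinalityMax="unbounded">
      <CMD_Element name="LanguageName" ValueScheme="string"/>
    </CMD_Component>
  </CMD_Component>
</CMD_ComponentSpec>
"""


def is_profile(spec_xml: str) -> bool:
    """Assumes a profile differs from a component spec mainly by this flag."""
    return ET.fromstring(spec_xml.strip()).get("isProfile") == "true"


def element_paths(spec_xml: str) -> list:
    """List (path, value scheme) for every metadata element, walking subcomponents."""
    root = ET.fromstring(spec_xml.strip())
    paths = []

    def walk(comp: ET.Element, prefix: str) -> None:
        path = prefix + "/" + comp.get("name")
        for child in comp:
            if child.tag == "CMD_Element":
                paths.append((path + "/" + child.get("name"), child.get("ValueScheme")))
            elif child.tag == "CMD_Component":
                walk(child, path)  # recurse into the subcomponent

    for top in root.findall("CMD_Component"):
        walk(top, "")
    return paths


print(is_profile(SPEC))  # False
for path, scheme in element_paths(SPEC):
    print(path, scheme)
# /Actor/Name string
# /Actor/Language/LanguageName string
}}}

Treating a profile as the same document type with a distinguishing flag is one way to reflect the observation above that a profile specification differs little from a component specification.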