wiki:CMDI 1.2/Cues/Derived values

This page is a subpage of CMDI 1.2

Insertion of derived values

An executive summary is available at CMDI 1.2/Cues/Derived values/Summary

The issue

Most metadata needs to be created manually but in some cases parts of it can be derived: either from the described resource (e.g. a lot of technical metadata: file size, encoding, bit rate, ...) or from other metadata values (e.g. language code -> language name), sometimes in combination with external data (age can be derived from the combination of date of birth (metadata) and the current date (external information)).

The actual derivation of values has to be carried out by the tool (i.e. the editor), but the logic will be largely profile- or component-specific. Therefore a unified method of specifying relations between metadata fields and resources, other metadata fields or 'environmental' values is needed, not unlike the display rules proposed for extended display information. An obvious way of representing these rules would be something like RDF triples, indicating that the value of (element/resource/...) X provides a value for element Y if transformed in way T. The editor would then interpret these rules and apply them where applicable.
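As a rough illustration of the triple idea, the following Python sketch models a rule as (source X, transform T, target Y) and lets a very small interpreter apply it to a flat record. The names DerivationRule, apply_rules and LANGUAGE_NAMES are assumptions made for this example only; nothing here is part of a CMDI specification.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical lookup table; a real editor would consult an external code list.
LANGUAGE_NAMES = {"nld": "Dutch", "deu": "German", "eng": "English"}


@dataclass(frozen=True)
class DerivationRule:
    """'The value of X provides a value for Y if transformed in way T.'"""
    source: str                      # X: name of the field to read
    transform: Callable[[str], str]  # T: how X's value becomes Y's value
    target: str                      # Y: name of the field to fill


def apply_rules(record: Dict[str, str], rules: List[DerivationRule]) -> Dict[str, str]:
    """Return a copy of record with every applicable rule applied."""
    result = dict(record)
    for rule in rules:
        if rule.source in result:    # skip rules whose source field is absent
            result[rule.target] = rule.transform(result[rule.source])
    return result


RULES = [
    DerivationRule("LanguageCode", lambda code: LANGUAGE_NAMES.get(code, ""), "LanguageName"),
]

derived = apply_rules({"LanguageCode": "nld"}, RULES)
```

Keeping the transform separate from the source and target is what would allow the same T to be reused by rules in different components.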

Examples of rules that might be supported:

  • CreationDate gets populated with the current date (format yyyy-mm-dd)
  • FileSize gets populated with the file size in bytes of the referenced resource
  • LanguageName gets filled in based on the value of LanguageCode and an external lookup table
  • The value of Actor.Age becomes the difference between the current date and the value of Actor.DateOfBirth in years (floored)
  • ....

These examples illustrate the possibilities but also the potential complexity of such a rule based system.
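Even the seemingly simple Actor.Age rule hides a decision about birthdays that have not yet occurred in the current year. The sketch below shows one possible reading of "difference in years (floored)" together with the CreationDate format; the function names are hypothetical, and the current date is passed in explicitly so that the result is reproducible.

```python
from datetime import date


def age_in_years(date_of_birth: date, today: date) -> int:
    """Whole years between date_of_birth and today, rounded down."""
    years = today.year - date_of_birth.year
    # Subtract one year if this year's birthday has not been reached yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


def creation_date(today: date) -> str:
    """Current date in the yyyy-mm-dd format of the CreationDate example."""
    return today.isoformat()
```

For instance, a date of birth of 1980-06-15 gives an age of 33 on 2014-06-14 and 34 one day later.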

Open questions (cf. the open questions regarding extended display information):

  • Where to store this information? (In the profile/schema, in a separate file linked from the profile/schema?)
  • Can we achieve a way of reusing existing rules in different contexts?
  • How will this information be generated and by whom? (Part of the component registry, a separate editor, or a separate registry?)

First solution

The idea is to specify a generic solution for value derivation in CMDI 1.2 without specific restrictions on allowed functions or constants. This is particularly important as there has been no broader evaluation of the functionality needed (and the above-mentioned examples are probably just a start). Furthermore, additional extensions to the infrastructure that are needed to support this functionality can be developed at a later time (as long as they are not directly related to the CMDI 1.2 specification process).

The CMD general component schema is extended with two additional (optional) attributes (AutoValueProcedure and AutoValueParameters) for the elements and attributes specification. AutoValueProcedure contains the URI of an external specification of a procedure or constant (like "getLanguageNameForISO639", "numericAddition", "filesize", "currentDate" etc.). AutoValueParameters contains the arguments for these procedures as a list of XPath expressions (referencing other elements in the same file).

Pros

  • By referencing external procedures the functionality can be easily extended
  • Procedures can be implemented as services (which hide their complexity)

Cons

  • needs additional components in the infrastructure, namely a "function registry" in which the semantics of all operations/constants are described and all allowed arguments are defined (semantics + allowed datatypes).
  • the proposed solution does not guarantee that referenced nodes exist (a component may be used in various contexts, or a rule may refer to optional elements). This is especially important for the derivation of values that depend on component-external elements. In this case the editor should notify the user of the problem.
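To make the "function registry" idea more concrete, here is a minimal sketch of what a registry entry and an argument check might look like. Everything in it is an assumption for illustration: the class and function names, the use of XSD datatype names as strings, and the parameter and result types given for datediff (the URI itself is taken from the example further down this page).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ProcedureDescription:
    uri: str                          # identifier used in AutoValueProcedure
    semantics: str                    # human-readable description
    parameter_types: Tuple[str, ...]  # allowed datatype for each argument
    result_type: str


REGISTRY: Dict[str, ProcedureDescription] = {}


def register(description: ProcedureDescription) -> None:
    REGISTRY[description.uri] = description


def check_call(uri: str, argument_types: List[str]) -> List[str]:
    """Return a list of problems; an empty list means the call looks valid."""
    description = REGISTRY.get(uri)
    if description is None:
        return [f"unknown procedure: {uri}"]
    expected = description.parameter_types
    if len(argument_types) != len(expected):
        return [f"expected {len(expected)} arguments, got {len(argument_types)}"]
    return [
        f"argument {i}: expected {want}, got {got}"
        for i, (want, got) in enumerate(zip(expected, argument_types), start=1)
        if want != got
    ]


DATEDIFF_URI = "http://www.clarin.eu/cmdi/autovalue/procedure/datediff"
register(ProcedureDescription(
    uri=DATEDIFF_URI,
    semantics="Difference between two dates (assumed for this example)",
    parameter_types=("xs:date", "xs:date"),
    result_type="xs:duration",
))
```

A component editor could run such a check when a rule is authored, so that type mismatches surface before any instance is created.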

Centre impact

  • Affected tools
    • Infrastructure
      • Component Registry
    • Editors
    • Viewers
  • Impact on instances

Implementation examples

Derivation rules will be represented on the model level only (component specification -> XSD), not on the instance level.

Implementation in general component schema

The specification of CMDI elements and attributes is extended by two optional attributes: AutoValueProcedure (xs:anyURI) and AutoValueParameters (xs:string).

<!-- list of all attributes that can be bound to a cl_el -->
    <xs:attributeGroup name="clarin_element_attributes">
        <xs:attribute name="name" type="xs:Name" use="required">
            <xs:annotation>
                <xs:documentation>The name of the element.</xs:documentation>
            </xs:annotation>
        </xs:attribute>
        ...
        <xs:attribute name="AutoValueProcedure" type="xs:anyURI">
            <xs:annotation>
                <xs:documentation>The URI of a procedure that is used to derive the content of this element based on external information.</xs:documentation>
            </xs:annotation>
        </xs:attribute>
        <xs:attribute name="AutoValueParameters" type="xs:string">
            <xs:annotation>
                <xs:documentation>A list of XPath expressions that is used as arguments for the procedure specified in AutoValueProcedure.</xs:documentation>
            </xs:annotation>
        </xs:attribute>
    </xs:attributeGroup>

The same applies to attribute specifications.

Implementation on component model level

Example:

<CMD_Element name="Duration" AutoValueProcedure="http://www.clarin.eu/cmdi/autovalue/procedure/datediff" AutoValueParameters="../StartRangeDate ../EndRangeDate"/>
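How an editor might act on such a declaration can be sketched as follows. This is an illustration under several assumptions: datediff is taken to return the number of days between two yyyy-mm-dd dates, the instance document (a Session element with three children) is invented, and only parameter paths of the form ../Name are handled. The sketch returns None when a referenced node is missing, which is the point at which an editor would notify the user.

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import Callable, Dict, Optional

DATEDIFF = "http://www.clarin.eu/cmdi/autovalue/procedure/datediff"


def datediff(start: str, end: str) -> str:
    """Assumed semantics: number of days between two yyyy-mm-dd dates."""
    return str((date.fromisoformat(end) - date.fromisoformat(start)).days)


# Hypothetical mapping from procedure URIs to local implementations.
PROCEDURES: Dict[str, Callable[..., str]] = {DATEDIFF: datediff}


def derive(root: ET.Element, target: ET.Element,
           procedure_uri: str, parameters: str) -> Optional[str]:
    """Evaluate the procedure for target; None if an argument node is missing."""
    # ElementTree elements do not know their parent, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    arguments = []
    for path in parameters.split():
        # Simplification: only paths of the form ../Name are supported here.
        if not path.startswith("../") or target not in parents:
            return None
        node = parents[target].find(path[3:])
        if node is None or node.text is None:
            return None  # an editor would warn the user at this point
        arguments.append(node.text)
    return PROCEDURES[procedure_uri](*arguments)


INSTANCE = """<Session>
  <StartRangeDate>2014-04-01</StartRangeDate>
  <EndRangeDate>2014-04-07</EndRangeDate>
  <Duration/>
</Session>"""

root = ET.fromstring(INSTANCE)
duration = root.find("Duration")
duration.text = derive(root, duration, DATEDIFF, "../StartRangeDate ../EndRangeDate")
```

A production editor would use a full XPath engine instead of the ../Name shortcut, and would resolve the procedure through whatever registry or service the URI points to.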

Open questions

  • Using new namespace?

Second solution

A simpler solution would be to define a fixed set of functions (numeric operations plus some frequently used functions, such as replacing language codes with language names) and keywords (like $filesize, $current_time etc.).

The CMD general component schema is extended with one additional (optional) attribute (AutoValue) for the elements and attributes specification. This attribute contains the function that is used to generate the content of the element/attribute and is based on these predefined functions and keywords. Content of other elements that is needed to generate the value is addressed by using XPath expressions.

Pros

  • simpler than the first solution (reduces implementation effort)
  • no need to build new infrastructure components

Cons

  • may lack expressiveness (missing functions may become clear at some later point)
  • adding new functions needs changes in all editors (may occur regularly)
  • the proposed solution does not guarantee that referenced nodes exist (a component may be used in various contexts, or a rule may refer to optional elements). This is especially important for the derivation of values that depend on component-external elements. In this case the editor should notify the user of the problem.

Centre impact

  • Affected tools
    • Infrastructure
      • Component Registry
    • Editors
    • Viewers
  • Impact on instances

Implementation examples

The specification of CMDI elements and attributes is extended by one optional attribute: AutoValue (xs:string).

<!-- list of all attributes that can be bound to a cl_el -->
    <xs:attributeGroup name="clarin_element_attributes">
        <xs:attribute name="name" type="xs:Name" use="required">
            <xs:annotation>
                <xs:documentation>The name of the element.</xs:documentation>
            </xs:annotation>
        </xs:attribute>
        ...
        <xs:attribute name="AutoValue" type="xs:string">
            <xs:annotation>
                <xs:documentation>A function that is used to derive the content of this element based on external information.</xs:documentation>
            </xs:annotation>
        </xs:attribute>
    </xs:attributeGroup>

The same applies to attribute specifications.

Implementation on component model level

Example:

<CMD_Element name="AgeOfFile" AutoValue="$CurrentDate-date({../CreationDate})"/>
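The expression syntax is not defined further on this page, so the following is only a sketch of how an editor might pick such an expression apart. It assumes that $name marks a keyword, that {path} marks an XPath reference, and that the subtraction yields a number of days; the helper names and the lookup callback are hypothetical, and only this single expression shape is evaluated.

```python
import re
from datetime import date
from typing import Callable, List

REFERENCE = re.compile(r"\{([^}]*)\}")  # {xpath} references to other elements
KEYWORD = re.compile(r"\$(\w+)")         # $keyword constants


def references(expression: str) -> List[str]:
    """XPath expressions the AutoValue expression depends on."""
    return REFERENCE.findall(expression)


def keywords(expression: str) -> List[str]:
    """Predefined keywords used by the AutoValue expression."""
    return KEYWORD.findall(expression)


def evaluate_age_of_file(expression: str,
                         lookup: Callable[[str], str],
                         today: date) -> int:
    """Evaluate expressions of the single form '$CurrentDate-date({path})'.

    lookup resolves an XPath reference to the text of the referenced element.
    The result is a number of days, which is an assumption of this sketch.
    """
    match = re.fullmatch(r"\$CurrentDate-date\(\{([^}]*)\}\)", expression)
    if match is None:
        raise ValueError(f"unsupported expression: {expression}")
    created = date.fromisoformat(lookup(match.group(1)))
    return (today - created).days


EXPRESSION = "$CurrentDate-date({../CreationDate})"
```

With a CreationDate of 2014-04-01 and a current date of 2014-04-07, this evaluation yields 6. A real implementation would need a proper grammar once more than a handful of functions are supported.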

Third solution (supported in CMDI 1.2)

(Based on solution 2) The CMD general component schema is extended with one additional (optional) attribute (AutoValue) for the elements and attributes specification. This attribute contains the function that is used to generate the content of the element/attribute.

The supported attribute only provides a kind of "hook" to extend CMDI components with a derivation functionality. There will be no concrete specification of supported functions, syntax or of the mechanism to reference content of other elements. The specific implementation is up to the community and not part of the CMDI 1.2 specification process.

Pros

  • allows derivation functionality to be added to CMDI 1.2 components without fixing its functionality or expressiveness
  • no need for immediate changes in the CMDI infrastructure

Cons

  • as derivation functionality is based on components, this solution does not guarantee that referenced nodes exist (a component may be used in various contexts, or a rule may refer to optional elements). This is especially important for the derivation of values that depend on component-external elements. In this case the editor should notify the user of the problem.

Centre impact

There is no need for immediate changes in the infrastructure. When there is an agreement on the concrete implementation the following tools should support it:

  • Affected tools
    • Infrastructure
      • Component Registry
    • Editors
    • Viewers
  • Impact on instances

Implementation examples

The specification of CMDI elements and attributes is extended by one optional attribute: AutoValue (xs:string).

<!-- list of all attributes that can be bound to a cl_el -->
    <xs:attributeGroup name="clarin_element_attributes">
        <xs:attribute name="name" type="xs:Name" use="required">
            <xs:annotation>
                <xs:documentation>The name of the element.</xs:documentation>
            </xs:annotation>
        </xs:attribute>
        ...
        <xs:attribute name="AutoValue" type="xs:string">
            <xs:annotation>
                <xs:documentation>A function that is used to derive the content of this element based on external information.</xs:documentation>
            </xs:annotation>
        </xs:attribute>
    </xs:attributeGroup>

The same applies to attribute specifications.

Implementation on component model level

Example (just for illustration purposes, used syntax and functions are not part of the specification):

<CMD_Element name="Size" AutoValue="filesize"/>
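Because the attribute is only a hook, an editor is free to decide which values it understands. The sketch below shows one possible approach: a table of handlers keyed by the AutoValue string, with unknown values simply ignored. The handler table, the function name and the enclosing CMD_Component in the sample specification are assumptions for illustration, not part of CMDI 1.2.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from typing import Callable, Dict

# Hypothetical handlers an editor might choose to support; CMDI 1.2 defines none.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "filesize": lambda resource_path: str(os.path.getsize(resource_path)),
}


def auto_values(component_spec: str, resource_path: str) -> Dict[str, str]:
    """Map element names to derived values for every AutoValue the editor understands."""
    values = {}
    for element in ET.fromstring(component_spec).iter("CMD_Element"):
        hook = element.get("AutoValue")
        handler = HANDLERS.get(hook) if hook is not None else None
        if handler is not None:  # unknown hooks are left for manual entry
            values[element.get("name")] = handler(resource_path)
    return values


SPEC = """<CMD_Component name="File">
  <CMD_Element name="Size" AutoValue="filesize"/>
  <CMD_Element name="Checksum" AutoValue="someUnsupportedFunction"/>
</CMD_Component>"""

# Create a five-byte resource to derive a size from, then clean it up.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"hello")
derived = auto_values(SPEC, handle.name)
os.remove(handle.name)
```

Ignoring unsupported values keeps components usable in editors that implement only part of whatever convention the community eventually agrees on.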

Tickets

Tickets in the CMDI 1.2 milestone with the keyword derivedvalues:

Ticket Summary Owner Component Priority Status
No tickets found

Discussion

Oddrun: Good idea, but I think this is very different from the extended display information discussed on another page. While the latter has to do with information targeted at the visualisation of metadata, we are here concerned with alternative methods by which values of metadata fields may be decided, in particular methods involving operations on other metadata fields and on the resources themselves. In other words, we are here talking about standard metadata. In my mind, derivation rules (calculation expressions) come in the same class as specifications of data type, restrictions (e.g. closed vocab) and syntax patterns (e.g. patterns for earth coordinates, points in time, etc.), which are all specified in the components. Hence the derivation rules should also be specified in the components. For this we will need a simple rule/expression language, involving operands (metadata fields (not including other derived fields?), external resources and some fixed expressions (e.g. DateOfToday)) and operators (at least arithmetic operators for numbers, concatenation and substring for texts, conversion between text and numbers, access methods for external files, and possibly also a way of expressing conditions) to be performed on the operands. For this to function well, the metadata editor must be able to discover when any updated field is involved in some derivation rules, and either perform automatic synchronization of the derived fields or alert the user accordingly.

One last word: This sounds like a very nice feature which makes it possible to create richer metadata without burdening the metadata creator. However, it has the potential of making the metadata handling quite complex, so showing restraint in using it should be part of the metadata modeller's best practice.
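The requirement that an editor discover which derived fields are affected by an update can be sketched as a simple dependency lookup. The rule table below is hypothetical and reuses field names from the examples on this page; the lookup is transitive, which covers the open question of derived fields that depend on other derived fields.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical rules: derived field -> the fields its expression reads.
RULE_INPUTS: Dict[str, List[str]] = {
    "Actor.Age": ["Actor.DateOfBirth"],
    "LanguageName": ["LanguageCode"],
    "Duration": ["StartRangeDate", "EndRangeDate"],
}


def build_dependents(rule_inputs: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Invert the rule table: input field -> derived fields that read it."""
    dependents: Dict[str, Set[str]] = defaultdict(set)
    for derived, inputs in rule_inputs.items():
        for field in inputs:
            dependents[field].add(derived)
    return dependents


def affected_by(updated_field: str, dependents: Dict[str, Set[str]]) -> Set[str]:
    """Derived fields to recompute (or flag) after updated_field changes."""
    pending = [updated_field]
    seen: Set[str] = set()
    while pending:
        for derived in dependents.get(pending.pop(), ()):
            if derived not in seen:  # also guards against cyclic rules
                seen.add(derived)
                pending.append(derived)
    return seen


DEPENDENTS = build_dependents(RULE_INPUTS)
```

Whether the editor then recomputes the affected fields silently or alerts the user is a policy decision that sits outside this lookup.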

Discuss the topic in general below this point

Last modified on 04/07/14 09:15:20