Version 7 (modified by oddrun.ohren@nb.no, 10 years ago)

This page is a subpage of CMDI 1.2

Insertion of derived values

The issue

Most metadata has to be created manually, but in some cases parts of it can be derived: either from the described resource (e.g. much of the technical metadata, such as file size, encoding and bit rate) or from other metadata values (e.g. language code -> language name), sometimes in combination with external data (e.g. age can be derived from the date of birth, which is metadata, and the current date, which is external information).

The actual derivation of values obviously has to be carried out by the tool (i.e. the editor), but the logic will be largely profile- or component-specific. Therefore a unified method is needed for specifying relations between metadata fields on the one hand and resources, other metadata fields or 'environmental' values on the other, not unlike the display rules proposed for extended display information. An obvious way of representing these rules would be something like RDF triples, indicating that the value of (element/resource/...) X provides a value for element Y if transformed in way T. The editor would then interpret these rules and apply them where applicable.
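The "X provides a value for Y if transformed in way T" idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the DerivationRule class, the apply_rules function, the flat dictionary standing in for a metadata record, and the tiny language lookup table are hypothetical and not part of CMDI or of any existing editor.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class DerivationRule:
    """Hypothetical rule: the value of `source` (X) provides `target` (Y) via `transform` (T)."""
    source: str
    target: str
    transform: Callable[[str], str]


def apply_rules(record: Dict[str, str], rules: List[DerivationRule]) -> Dict[str, str]:
    """Return a copy of the record with every rule applied whose source field is present."""
    result = dict(record)
    for rule in rules:
        if rule.source in result:
            result[rule.target] = rule.transform(result[rule.source])
    return result


# Assumed stand-in for an external lookup table (ISO 639-3 code -> English name).
LANGUAGE_NAMES = {"nld": "Dutch", "nob": "Norwegian Bokmål"}

rules = [
    DerivationRule("LanguageCode", "LanguageName",
                   lambda code: LANGUAGE_NAMES.get(code, "")),
]

record = apply_rules({"LanguageCode": "nld"}, rules)
print(record)
```

In a real setting the transform would have to be expressed declaratively (so that it can be stored with a profile or component) rather than as a Python function; the sketch only shows how an editor might interpret such rules.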

Examples of rules that might be supported:

  • CreationDate gets populated with the current date (format yyyy-mm-dd)
  • FileSize gets populated with the file size in bytes of the referenced resource
  • LanguageName gets filled in based on the value of LanguageCode and an external lookup table
  • The value of Actor.Age becomes the difference between the current date and the value of Actor.DateOfBirth in years (floored)
  • ....

These examples illustrate the possibilities but also the potential complexity of such a rule-based system.
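To make the example rules more concrete, here is a rough sketch of three of the derivations as plain functions. The function names and signatures are hypothetical; the current date is passed in as a parameter rather than read from the system clock, so that the 'environmental' value is explicit.

```python
import os
from datetime import date


def creation_date(today: date) -> str:
    """CreationDate: the current date formatted as yyyy-mm-dd."""
    return today.isoformat()


def file_size(path: str) -> int:
    """FileSize: size in bytes of the referenced resource (assumed to be a local file)."""
    return os.path.getsize(path)


def age_in_years(date_of_birth: date, today: date) -> int:
    """Actor.Age: whole years between DateOfBirth and the current date (floored)."""
    years = today.year - date_of_birth.year
    # Subtract one year if this year's birthday has not been reached yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


print(creation_date(date(2013, 1, 5)))
print(age_in_years(date(1980, 6, 15), date(2013, 6, 14)))
```

Even the age rule needs a decision about what "floored" means around the birthday, which hints at the kind of detail a rule language would have to pin down.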

Open questions (cf the open questions regarding extended display information):

  • Where to store this information? (In the profile/schema, in a separate file linked from the profile/schema?)
  • Can we achieve a way of reusing existing rules in different contexts?
  • How will this information be generated and by whom? (As part of the component registry, in a separate editor, or in a separate registry?)

Proposed solutions

No concrete proposals, open for discussion.

Tickets

Tickets in the CMDI 1.2 milestone with the keyword derivedvalues:

No tickets found

Discussion

Oddrun: Good idea, but I think this is very different from the extended display information discussed on another page. While the latter concerns information targeted at the visualisation of metadata, we are here concerned with alternative methods by which the values of metadata fields may be determined, in particular methods involving operations on other metadata fields and on the resources themselves. In other words, we are here talking about standard metadata. In my mind, derivation rules (calculation expressions) belong in the same class as specifications of data type, restrictions (e.g. closed vocabularies) and syntax patterns (e.g. patterns for earth coordinates, points in time, etc.), which are all specified in the components. Hence the derivation rules should also be specified in the components. For this we will need a simple rule/expression language, involving operands (metadata fields (not including other derived fields?), external resources and some fixed expressions (e.g. DateOfToday?)) and operators to be performed on the operands (at least arithmetic operators for numbers, concatenation and substring for texts, conversion between text and numbers, an access method for external files, and possibly also a way of expressing conditions). For this to function well, the metadata editor must be able to discover when an updated field is involved in any derivation rules, and either synchronise the derived fields automatically or alert the user accordingly.
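The last requirement, discovering which derived fields are affected when a field is updated, could look roughly like the sketch below. It is only an illustration under stated assumptions: the RULES table, the field names FirstName, LastName, FullName and Initial, and the functions affected_targets and update_field are all hypothetical, and derived fields are not allowed as operands of other rules (one of the options raised above).

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical rule table: derived field -> (operand fields, expression over their values).
Rule = Tuple[List[str], Callable[..., str]]

RULES: Dict[str, Rule] = {
    "FullName": (["FirstName", "LastName"], lambda first, last: first + " " + last),  # concatenation
    "Initial": (["FirstName"], lambda first: first[:1]),                              # substring
}


def affected_targets(changed_field: str) -> List[str]:
    """List the derived fields whose rule uses the changed field as an operand."""
    return [target for target, (operands, _) in RULES.items() if changed_field in operands]


def update_field(record: Dict[str, str], field: str, value: str) -> List[str]:
    """Store the new value, recompute dependent derived fields, and return their names."""
    record[field] = value
    refreshed = []
    for target in affected_targets(field):
        operands, expression = RULES[target]
        # Only recompute when every operand of the rule has a value.
        if all(name in record for name in operands):
            record[target] = expression(*(record[name] for name in operands))
            refreshed.append(target)
    return refreshed


record = {"FirstName": "Ada", "LastName": "Lovelace"}
print(update_field(record, "FirstName", "Augusta"))
print(record)
```

An editor that prefers alerting the user over automatic synchronisation could use the list returned by affected_targets to flag the fields instead of overwriting them.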

One last word: This sounds like a very nice feature which makes it possible to create richer metadata without burdening the metadata creator. However, it has the potential of making the metadata handling quite complex, so showing restraint in using it should be part of the metadata modeller's best practice.

Discuss the topic in general below this point