wiki:CMDI 1.2/Header/MdType

This page is a subpage of CMDI 1.2.

MdType instance header element

The issue

Originally described in CmdiCollectionsIdentification.

Because CMD records are created at different levels of granularity, it would be useful to be able to select CMD records of a specific granularity, e.g., collection level or item level.

Discussion

Florian: Why restrict this to a two-level granularity, 'collection' and 'part_of_collection'? Most of us agree that relations between MD instances are necessary and should be made possible (see, for instance, the discussions in the VLO task force group). The very first relations that all MD providers using hierarchical MD structures will apply are 'is_part_of' and 'has_part'. Once these relations are encoded, you don't need any rigid two-level granularity as proposed here: the mere presence of an 'is_part_of' relation indicates that the MD instance cannot be the root of a resource. In fact, both relations are already used in CMDI 1.1, namely the header element 'isPartOf' (= relation 'is_part_of') and the ResourceProxyLink of type 'Metadata' (= relation 'has_part'). To my knowledge, unrestricted hierarchies have been used even in IMDI. In the current CMDI, at least the centers BAS and IDS use relations like these. Relations will make MdType obsolete.

Menzo: Using these relations, we could distinguish three levels: roots, inner nodes and leaves. It might be problematic to automatically mark all roots and inner nodes as collections. The solutions below would allow each center to determine which nodes should count as collections.

Axel: Maybe we should think about what is more helpful for _the users_: knowing that a resource is part of a bigger collection, or knowing how the data provider describes/views the resource. Florian seems to be a proponent of the user's view and I tend to agree. @Menzo, under this view, whether the part/collection distinction is appropriate would have to be judged in the light of the user's research question rather than by the centre's general decision.
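To illustrate Florian's and Menzo's points, the following Python sketch derives the three levels (roots, inner nodes, leaves) from the 'is_part_of' relation alone. It assumes the relations have already been extracted from the records (e.g. from their isPartOf elements) into a dictionary mapping each record id to the ids of the records it is part of, and that 'has_part' (the Metadata ResourceProxyLinks) is simply the inverse of that relation. The input format and the record ids are hypothetical.

```python
from collections import defaultdict


def classify_records(is_part_of: dict[str, list[str]]) -> dict[str, str]:
    """Classify every record as 'root', 'inner' or 'leaf'.

    is_part_of maps a record id to the ids of the records it is part of.
    Every record must appear as a key, even if it has no parents.
    """
    # Build the inverse relation ('has_part'): parent id -> ids of its parts.
    has_part = defaultdict(set)
    for child, parents in is_part_of.items():
        for parent in parents:
            has_part[parent].add(child)

    levels = {}
    for record, parents in is_part_of.items():
        if not parents:
            levels[record] = "root"   # part of nothing (a standalone record is also a root)
        elif record in has_part:
            levels[record] = "inner"  # part of something and has parts itself
        else:
            levels[record] = "leaf"   # part of something, has no parts
    return levels


records = {
    "corpus": [],
    "subcorpus-a": ["corpus"],
    "text-1": ["subcorpus-a"],
}
print(classify_records(records))
# {'corpus': 'root', 'subcorpus-a': 'inner', 'text-1': 'leaf'}
```

The levels fall out of the relations mechanically; deciding which of them counts as a "collection" remains a separate, centre-specific choice.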

Proposed solutions

First solution: MdType header element

A new header element that indicates the type.
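A minimal sketch of what reading such an element could look like, using Python's standard-library ElementTree. The element name MdType comes from this page; its position directly inside Header, the use of the CMDI 1.1 envelope namespace http://www.clarin.eu/cmd/ and the two allowed values (taken from the 'collection' / 'part_of_collection' split mentioned in the discussion) are assumptions for illustration, not an agreed specification.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumption: CMDI 1.1 envelope namespace; MdType would sit directly in Header.
NS = {"cmd": "http://www.clarin.eu/cmd/"}
# Hypothetical closed vocabulary based on the two values discussed on this page.
MDTYPE_VALUES = {"collection", "part_of_collection"}


def read_mdtype(record_xml: str) -> Optional[str]:
    """Return the MdType value of a CMD record, or None if the header lacks it."""
    root = ET.fromstring(record_xml)
    elem = root.find("cmd:Header/cmd:MdType", NS)
    if elem is None:
        return None
    value = (elem.text or "").strip()
    if value not in MDTYPE_VALUES:
        # With a fixed value set, every new type requires a schema change.
        raise ValueError(f"unknown MdType value: {value!r}")
    return value


record = """<CMD xmlns="http://www.clarin.eu/cmd/">
  <Header>
    <MdCreator>Example Centre</MdCreator>
    <MdType>collection</MdType>
  </Header>
</CMD>"""
print(read_mdtype(record))  # collection
```

The hard-coded value set is what makes this approach easy, and also what makes adding a further type (such as 'archive') costly for the schema and for every consuming tool.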

Pros

Easy

Cons

There can be a tendency to keep on extending the header instead of using CMDI's flexibility.

Centre impact

  • Affected tools
  • Impact on instances

Implementation examples

  • Implementation on model level
  • Implementation on instance level

Discussion

Axel: This would work if we agreed on a fixed set of types, ideally with just two values (collection, part_of_some_bigger_collection). But as always, there is no sharp line to be drawn even between those two values. E.g., for the Deutsches Textarchiv (a collection of digitized complete printed books), one could argue that each part is also a complete and self-contained resource in its own right; it just happens to have been arbitrarily chosen for inclusion in the Deutsches Textarchiv. Maybe we would need to introduce a third type, 'archive', for cases like these?

Second solution: the one and only collection profile

One collection profile to be used by all.

Pros

Easy

Cons

Inflexible

Centre impact

  • Affected tools
  • Impact on instances

Implementation examples

  • Implementation on model level
  • Implementation on instance level

Discussion

Axel: We shouldn't go this route. It's so inflexible that it is hard to imagine such a catch-all profile ever stabilizing. We would end up with a whole family of profiles anyway once profile versioning is in place.

Third solution: a mandatory collection component for collection profiles

Any collection level profile should contain a specific CLARIN collection component.
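A rough sketch of the schema-level check mentioned under Pros, assuming the CMDI 1.1 component specification layout, in which a profile includes a registered component through a CMD_Component element carrying a ComponentId attribute. The component identifier below is hypothetical; no such CLARIN collection component has been defined on this page.

```python
import xml.etree.ElementTree as ET

# Hypothetical identifier of the shared CLARIN collection component.
COLLECTION_COMPONENT_ID = "clarin.eu:cr1:c_collection_example"


def includes_collection_component(profile_spec_xml: str) -> bool:
    """True if the profile includes the collection component at any depth,
    regardless of its cardinality (i.e. also when it is optional)."""
    spec = ET.fromstring(profile_spec_xml)
    return any(
        comp.get("ComponentId") == COLLECTION_COMPONENT_ID
        for comp in spec.iter("CMD_Component")
    )


spec = """<CMD_ComponentSpec isProfile="true">
  <Header><Name>MyCorpusProfile</Name></Header>
  <CMD_Component name="MyCorpus" CardinalityMin="1" CardinalityMax="1">
    <CMD_Component ComponentId="clarin.eu:cr1:c_collection_example"
                   CardinalityMin="0" CardinalityMax="1"/>
  </CMD_Component>
</CMD_ComponentSpec>"""
print(includes_collection_component(spec))  # True
```

Because the check looks only at the profile, an optional inclusion is enough to mark the profile as collection level, even in instances that leave the component empty.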

Pros

Easy, may have low impact if the component is optional (but can be detected in the schema)

Cons

What should be in the component?

Centre impact

  • Affected tools
  • Impact on instances

Implementation examples

  • Implementation on model level
  • Implementation on instance level

Discussion

Axel: If we include life-cycle management in CMDI 1.2, any change to such a component would leave many profiles using a then-deprecated component. Also, much of the metadata that you would want to include here already appears in other components, so this could easily lead to duplicated information and would generally bloat the profiles.

Fourth solution: the profile root uses a data category from a collection relation set

The profile root should use one of the data categories from a specific collection relation set in RELcat.
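A sketch of how a tool such as the VLO might apply this rule, assuming the CMDI 1.1 component specification layout, in which the profile's root CMD_Component carries a ConceptLink attribute. The contents of the collection relation set are hypothetical placeholders; a real implementation would obtain them from RELcat, whose interface is not shown here.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical contents of a RELcat "collection" relation set: the data
# categories (concept links) that mark a profile as collection level.
COLLECTION_CONCEPTS = {
    "http://example.org/datcat/collection",
    "http://example.org/datcat/corpus",
}


def root_concept(profile_spec_xml: str) -> Optional[str]:
    """Return the ConceptLink of the profile's root component, if any."""
    spec = ET.fromstring(profile_spec_xml)
    root_component = spec.find("CMD_Component")  # first direct child = profile root
    if root_component is None:
        return None
    return root_component.get("ConceptLink")


def is_collection_profile(profile_spec_xml: str) -> bool:
    """True if the profile root uses a data category from the collection set."""
    return root_concept(profile_spec_xml) in COLLECTION_CONCEPTS


spec = """<CMD_ComponentSpec isProfile="true">
  <CMD_Component name="Corpus" ConceptLink="http://example.org/datcat/corpus"/>
</CMD_ComponentSpec>"""
print(is_collection_profile(spec))  # True
```

Because the list of collection concepts lives outside the CMDI schema, extending it would not require a new CMDI version. The check cannot tell whether a data category was chosen deliberately, which is the risk listed under Cons.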

Pros

Easy, low impact

Cons

The data category might be used unintentionally.

Centre impact

  • Affected tools
  • Impact on instances

Implementation examples

  • Implementation on model level
  • Implementation on instance level

Discussion

Axel: This would be backwards compatible with CMDI 1.1, which is generally a plus. Considering Florian's point raised at the top of this page, this could be seen as a supplement to the resource relation modelling. Using a suitable DatCat for the profile root could encode the data provider's classification of the resource. Another nice aspect of this solution is that we wouldn't have to introduce yet another version of CMDI once we find that a fixed set of Header/MdType values needs to be extended.

I strongly favour this solution. The general recommendation to metadata modelers (and applications like the VLO) should be to use the profile root's datcat as the data provider's canonical resource type classification. The resource relation modelling should be understood as a description of the resource's internal structure, not as a classification. Typically, the classification will depend on that description, but this is not necessarily the case.

Fifth solution: collection level instances are harvested from a specific OAI-PMH set

As is already done for web services, collection-level CMD records can be requested explicitly by harvesting a center-specific OAI-PMH set.
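For reference, selective harvesting by set is standard OAI-PMH: the harvester adds a set argument to a ListRecords request. The sketch below only builds such a request URL; the endpoint, the set name collections and the metadataPrefix cmdi are hypothetical values that depend on each centre's configuration.

```python
from urllib.parse import urlencode


def list_records_url(base_url: str, set_spec: str, metadata_prefix: str = "cmdi") -> str:
    """Build an OAI-PMH ListRecords request restricted to one set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix, "set": set_spec}
    return f"{base_url}?{urlencode(params)}"


print(list_records_url("https://repository.example.org/oai", "collections"))
# https://repository.example.org/oai?verb=ListRecords&metadataPrefix=cmdi&set=collections
```

Note that the set membership is visible only in the harvesting request, not in the harvested records themselves.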

Pros

No need to touch CMD profiles or instances

Cons

VLO currently doesn't use endpoint information, i.e., the facet mapping can't select based on the OAI-PMH endpoint or set.

Centre impact

  • Affected tools
  • Impact on instances

Implementation examples

  • Implementation on model level
  • Implementation on instance level

Discussion

Axel: No, this is a dirty hack, i.e. a tailor-made solution for VLO harvesting outside the CMDI framework proper. Collection information must be expressed within the CMDI framework so that it is available not only at harvest time but also within the MD instances.

Tickets

Tickets in the CMDI 1.2 milestone with the keyword mdtype:

No tickets found.

Discussion

Discuss the topic in general below this point.

Last modified on 02/18/14 06:27:02