Version 26 (modified by oddrun.ohren@nb.no, 11 years ago)

This page is a subpage of CMDI 1.2

Specification of resource relations

The issue

CMDI 1.1 has an optional element /CMD/Resources/ResourceRelationList that can look something like the following:

<Resources>
  <!-- other children of Resources omitted -->
  <ResourceRelationList>
    <ResourceRelation>
      <RelationType>describes</RelationType>
      <Res1 ref="a_text"/>
      <Res2 ref="a_photo"/>
    </ResourceRelation>
  </ResourceRelationList>
</Resources>

This is a relatively little-used feature, and it has even been argued that it could be removed. However, it is being used in practice and has sensible use cases (at least theoretically). The problems with this implementation are the lack of clear semantics, a forced but implicit relational direction (e.g. Res1 describes Res2) and inelegant naming (Res1, Res2).

Proposed solution

Investigations and discussion

A query against all harvestable metadata in early February 2012 returned only 32 valid occurrences of ResourceRelation. On the other hand, an investigation into metadata from Clarin-D revealed a plethora of ways to express relations between resources, none of which used the ResourceRelation mechanism. Clearly, tools that exploit the metadata will have a hard time interpreting relationships between resources, irrespective of what is done to ResourceRelationList.

During discussion, several solutions have been proposed:

  • Declare source and target resources explicitly as well as connect the relation type to a concept registry, and thereby make the semantics clearer for binary relations
  • Generalize the above by allowing the modeller to specify the resources’ roles in the relationship, instead of just source and target.
  • Remove the ResourceRelationList altogether, on account of the almost total lack of usage, and the richness of other ways to express relationships in CMDI

From the exchange on the wiki it is clear that discussions on this topic will have to go on beyond the deadline for CMDI 1.2. Hence, it is felt that no drastic change should be performed in CMDI 1.2. The proposed solution merely attempts to clarify the semantics of the current specification, all the while keeping the door open for expressivity extension at a later date.

Proposal

The solution suggested for CMDI 1.2 can be illustrated by the following example:

<ResourceRelationList>
  <ResourceRelation>
    <RelationType dcr:datcat="http://www.isocat.org/datcat/DC-2318">annotates</RelationType>
    <Resource role="annotation" dcr:roledatcat="http://www.isocat.org/datcat/DC-4009" ref="rp1"/>
    <Resource role="annotated" dcr:roledatcat="http://www.isocat.org/datcat/DC-2656" ref="rp2"/>
  </ResourceRelation>
</ResourceRelationList>

On the Resource element, @role and @ref should be mandatory and dcr:roledatcat optional; the Resource element itself should have minOccurs=2 and maxOccurs=2. (An alternative way of specifying this would be to define Roles as sub-elements of Resource. The choice between the two should follow the decision on general schema consistency.)
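
As a rough illustration of these cardinalities, the constraints could be written in XSD along the following lines. This is only a sketch under assumptions not settled on this page: no target namespace is declared, ResourceRelation is taken to be optional and repeatable, @ref is typed as xs:IDREF on the assumption that it points at a ResourceProxy id, and the dcr: attributes are admitted through a lax attribute wildcard rather than being declared individually.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="ResourceRelationList">
    <xs:complexType>
      <xs:sequence>
        <!-- assumption: zero or more relations per list -->
        <xs:element name="ResourceRelation" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="RelationType">
                <xs:complexType>
                  <xs:simpleContent>
                    <xs:extension base="xs:string">
                      <!-- lets dcr:datcat through without declaring it -->
                      <xs:anyAttribute namespace="##other" processContents="lax"/>
                    </xs:extension>
                  </xs:simpleContent>
                </xs:complexType>
              </xs:element>
              <!-- exactly two participants: binary relations only -->
              <xs:element name="Resource" minOccurs="2" maxOccurs="2">
                <xs:complexType>
                  <xs:attribute name="role" type="xs:string" use="required"/>
                  <!-- assumption: ref points at a ResourceProxy id -->
                  <xs:attribute name="ref" type="xs:IDREF" use="required"/>
                  <!-- lets the optional dcr:roledatcat through -->
                  <xs:anyAttribute namespace="##other" processContents="lax"/>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Raising maxOccurs on Resource to unbounded would be the only change needed for the later extension mentioned in the consequences of the proposal.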

Consequences of proposal:

  • Relationships are constrained to binary relations for the time being. (This may be alleviated by extending the cardinality of Resource to 2:unbounded at a later date. If so, it is crucial to make clear what n Resource elements are allowed to express, e.g.
    • an n-ary relation
    • n-1 binary relations (1:n) (e.g. 1 source and n-1 targets)
    • or both. If so, how to distinguish between them.)
  • No forced direction of relationships. Even so, any metadata creator is free to limit the resource roles to source and target, thereby retaining a strict from/to relationship specification
  • Semantic marking of both relation type and resource roles.
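
To make the ambiguity around a 2:unbounded cardinality concrete, consider what a record might look like if it were allowed. The snippet is hypothetical: the proxy ids rp1 to rp3 are made up and the optional dcr: attributes are left out. Nothing in it tells a tool whether it expresses one ternary relation or two binary annotates relations that share the same annotated resource.

```xml
<ResourceRelationList>
  <ResourceRelation>
    <RelationType>annotates</RelationType>
    <!-- hypothetical ids; three participants are not valid
         under the binary-only proposal above -->
    <Resource role="annotation" ref="rp1"/>
    <Resource role="annotation" ref="rp2"/>
    <Resource role="annotated" ref="rp3"/>
  </ResourceRelation>
</ResourceRelationList>
```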

Comment

Although outside the scope of the actual CMDI model, it should be pointed out that the metadata investigations performed as part of this work indicate an urgent need for proper guidance on expressing relationships in CMDI. Good examples, training and documentation of best practice are all important elements in such guidance.


Old stuff

Proposed solutions

First solution

In an e-mail discussion (17 January 2012), Dieter proposed the following:

- about the <ResourceRelationList>, with the input from Torsten and
Menzo I would like to propose a structure like:

<ResourceRelationList>
  <ResourceRelation>
    <RelationType dcr:datcat="http://www.isocat.org/datcat/DC-4009">
     annotates
    </RelationType>
    <Source ref="rp1"/>
    <Target ref="rp2"/>
  </ResourceRelation>
</ResourceRelationList>

This has:

-- a machine-readable relationtype (a datcat) while maintaining a prose
text description possibility

-- a clearly directed graph nature for the relation (source/target)

For symmetric relations that means that if there is a bidirectional
relation 2 ResourceRelations need to be specified (A -> B and B -> A)

We could make this change as no one is currently using ResourceRelationList.

Pros

  • It adds semantic grounding of the relation

Cons

  • Forces direction on the relation
  • Limits the number of resources taking part in the relation to two.

Centre impact

Tools that generate/process resource relation lists will need to be adapted

Discussion

Discuss this solution proposal in this section

Second solution

A more flexible solution was discussed recently. It would have for example:

<ResourceRelationList>
  <ResourceRelation>
    <RelationType dcr:datcat="http://www.isocat.org/datcat/DC-2318">annotates</RelationType>
    <Resource ref="rp1">
      <Role dcr:datcat="http://www.isocat.org/datcat/DC-4009">annotation</Role> 
    </Resource>
    <Resource ref="rp2">
      <Role dcr:datcat="http://www.isocat.org/datcat/DC-2656">annotated</Role> 
    </Resource>
  </ResourceRelation>
</ResourceRelationList>

The dcr:datcat attributes should probably be optional.

The maximal number of <Resource> elements should be unbounded, allowing for relations between any number of resources. (EXAMPLE? USE CASE?)

Pros

  • Semantic marking of both relation type and roles of resources
  • Option to have more than two resources involved in a relation
  • No forced relational direction

Cons

  • More verbose
  • More processing of datacategories
  • No default direction (while most cases will be covered by subject-object in that order)

Centre impact

Tools that generate/process resource relation lists will need to be adapted

Discussion

Discuss this solution proposal in this section

Tickets

Tickets in the CMDI 1.2 milestone with the keyword resourcerelationlist:

No tickets found.

Discussion

Florian (BAS): The first proposal makes a lot of sense. But if such a general relation mechanism is implemented, we should also consider removing the special relation 'isPartOf' (see separate issue) and treating it like any other relation. The second proposal is not very lucid to me. Can anybody add a practical use case where this is necessary?

Oliver (IDS): The second version is a generalization of the first one. For some relations, it might lead to a more compact representation of 1:N relations, e.g. you have 5 different annotations of a text file (e.g. different tools create POS annotations). With the first version, you'll need 5 ResourceRelation elements, in the latter case only one. However, I think the proposed XML serialization for solution 2 could be made better, e.g.:

<ResourceRelationList>
  <ResourceRelation>
    <RelationType dcr:datcat="http://www.isocat.org/datcat/DC-2318">annotates</RelationType>
    <Resource role="source" dcr:datcat="http://www.isocat.org/datcat/DC-4009" ref="rp1"/>
    <Resource role="target" dcr:datcat="http://www.isocat.org/datcat/DC-2656" ref="rp2"/>
  </ResourceRelation>
</ResourceRelationList>

where the mandatory @ref references a ResourceProxy and @dcr:datcat is optional. We need to discuss whether we want to make @role mandatory or not. I don't have a strong feeling in either direction. However, proper XSD magic can be used to ensure that only one source role and at least one target role exist (XSD 1.1 assertions). Theoretically, one could also model N:M relations with this mechanism (by providing more source roles), and we need to discuss whether we want to allow this. If we decide on this general relation mechanism, I agree with Florian that we should get rid of the IsPartOfList.
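
For illustration, the XSD 1.1 assertions mentioned here might look roughly as follows. This is a sketch, not an agreed schema: it assumes no target namespace, leaves @role optional (that question is still open), types @ref loosely as xs:string, admits the dcr: attributes through a lax wildcard, and needs an XSD 1.1 processor because xs:assert is not part of XSD 1.0.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="ResourceRelation">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="RelationType">
          <xs:complexType>
            <xs:simpleContent>
              <xs:extension base="xs:string">
                <!-- lets dcr:datcat through without declaring it -->
                <xs:anyAttribute namespace="##other" processContents="lax"/>
              </xs:extension>
            </xs:simpleContent>
          </xs:complexType>
        </xs:element>
        <!-- two or more participants, to allow 1:N relations -->
        <xs:element name="Resource" minOccurs="2" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute name="role" type="xs:string"/>
            <xs:attribute name="ref" type="xs:string" use="required"/>
            <xs:anyAttribute namespace="##other" processContents="lax"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <!-- exactly one source ... -->
      <xs:assert test="count(Resource[@role = 'source']) eq 1"/>
      <!-- ... and at least one target -->
      <xs:assert test="count(Resource[@role = 'target']) ge 1"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Allowing N:M relations would amount to relaxing the first assertion from `eq 1` to `ge 1`.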

Twan: Thanks Oliver, I like your improved representation and constraint proposals. Especially if we want to broaden the use of 'resource relations', I think we must build in this kind of flexibility (including N:M relations, what would be the downside?). On that note, using this to represent IsPartOf relations we probably want to
(1) Rename ResourceRelationList to RelationList and move it out of Resources
(2) Provide ways of referring to the document itself AND other documents that are not resources in the document (i.e. a way to express "this is part of collection Y"). For example:

<RelationList>
  <ResourceRelation>
    <!-- omitted details -->
  </ResourceRelation>
  <MetadataRelation>
    <RelationType dcr:datcat="http://www.isocat.org/datcat/DC-1234">partOf</RelationType>
      <MetadataDocument role="part" dcr:datcat="http://www.isocat.org/datcat/DC-2345"/> <!-- No ref could denote 'this document' -->
      <MetadataDocument role="container" dcr:datcat="http://www.isocat.org/datcat/DC-3456" ref="../mycollection.cmdi"/>
  </MetadataRelation>
</RelationList>

which adds a lot of power to the (resource) relation list but of course also complexity and another level of indirection. Is that roughly what you had in mind, Florian and Oliver? If so, the question is: is it worth the additional hassle, or should the 'part of' relation for metadata documents keep a special status?

Oddrun:

Comment on Oliver's and Twan's suggestions: I agree with the improved generalized version of ResourceRelation. However, I tend to think that the IsPartOfList should keep a special status. After all, the ResourceProxyList is in effect a PARTS list, giving the downlinks in the hierarchy a special status. So why not also the uplinks? That way, the hierarchical (or DAG-like) resource structure can be clearly and explicitly expressed, separately from other relationships the resource as a whole or its individual parts may engage in.

Twan, I am not sure of the necessity of having a separate MetadataRelation element, unless you want to distinguish between

  • relations between metadata files as resources in their own right, and
  • relations between the resources represented by the metadata files.

In your example, my feeling is that the relation expressed is to hold between the resources, not the metadata.

Now the PARTS list (ResourceProxyList), the IsPartOfList and the ResourceRelationList combined provide all the structural information about the described resource that the owner wishes to express, and should perhaps be wrapped together. If Resources doesn’t suit, we might rename it to ResourceSpec, like this:

<ResourceSpec>
    <ResourceProxyList>
        <!-- this is in effect a PARTS list, i.e. the downlinks in the hierarchical structure --> 
        <ResourceProxy id="rp1"/> 
        <ResourceProxy id="rp2"/>
        <ResourceProxy id="rp3"/>
    </ResourceProxyList>
    <IsPartOfList>
        <!-- the uplinks in the hierarchical structure, from THIS resource as a whole -->
        <IsPartOf>http://infra.clarin.eu/example/mycollection1.cmdi</IsPartOf>
        <IsPartOf>http://infra.clarin.eu/example/mycollection2.cmdi</IsPartOf>
    </IsPartOfList>
    <ResourceRelationList>
        <!-- internal relations between resources listed in ResourceProxyList -->
        <!-- relations between resources listed in ResourceProxyList and other resources -->
        <!-- relations (excluding isPartOf as expressed by the IsPartOfList) between THIS resource as a whole and other resources -->
        <ResourceRelation>
            <RelationType dcr:datcat="http://www.isocat.org/datcat/DC-2318">annotates</RelationType>
            <Resource role="source" dcr:datcat="http://www.isocat.org/datcat/DC-4009" ref="rp1"/>
            <Resource role="target" dcr:datcat="http://www.isocat.org/datcat/DC-2656" ref="rp2"/>
        </ResourceRelation>
        <ResourceRelation>
            <RelationType dcr:datcat="http://www.isocat.org/datcat/DC-xxx1">partOf</RelationType>
            <Resource role="part" dcr:datcat="http://www.isocat.org/datcat/DC-yyy1" ref="rp3"/>
            <Resource role="container" dcr:datcat="http://www.isocat.org/datcat/DC-zzz1" ref="../anotherCollection.cmdi"/>
        </ResourceRelation>
        <ResourceRelation>
            <RelationType dcr:datcat="http://www.isocat.org/datcat/DC-xxx2">toolsUsed</RelationType>
            <Resource role="part" dcr:datcat="http://www.isocat.org/datcat/DC-yyy2"/> <!-- no ref denotes the resource described by THIS document -->
            <Resource role="container" dcr:datcat="http://www.isocat.org/datcat/DC-zzz2" ref="../someAnnotatorTool.cmdi"/>
        </ResourceRelation>
    </ResourceRelationList>
</ResourceSpec>

I realise this is very much like before, but with your improved relationship version. However, with a clear semantics, I think it is a good format.

More comments on relationships: We need to be clear when we are talking about an n-ary relation with n>2 as opposed to a set of several binary relations. We also need to be clear on the semantics of the ResourceRelation element: Does one ResourceRelation element express one relationship only, or may it sometimes express several relationships as suggested by Oliver?

  • If we constrain ResourceRelation to represent one relationship, and go for solution 2, it is possible to express relationships of higher dimensions than 2. That is, each resource listed in the ResourceRelation participates in the same relation; for example, any ResourceRelation with 3 resources represents a ternary relation.
  • If we allow one ResourceRelation to represent more than one relationship, I think in effect we limit the expressive power to binary relations. Oliver's example with 5 annotations of the same resource expressed as one ResourceRelation would then represent 5 binary relations.

I think the first bullet (one ResourceRelation = one relation) gives the most generic and extendible solution. Then we may or may not limit ourselves to binary relations, and it is easy to extend to higher dimensions later, if appropriate.
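
Under the first reading, Oliver's case of several annotations of the same text would be spelled out as one ResourceRelation per annotation. A sketch with two annotations, using made-up proxy ids and leaving out the optional dcr:datcat attributes:

```xml
<ResourceRelationList>
  <!-- one ResourceRelation per binary relation; ids are hypothetical -->
  <ResourceRelation>
    <RelationType>annotates</RelationType>
    <Resource role="source" ref="rp1"/>
    <Resource role="target" ref="rp3"/>
  </ResourceRelation>
  <ResourceRelation>
    <RelationType>annotates</RelationType>
    <Resource role="source" ref="rp2"/>
    <Resource role="target" ref="rp3"/>
  </ResourceRelation>
</ResourceRelationList>
```

This is more verbose than a single element with several sources, but each element then has exactly one meaning for a tool to interpret.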

Using datcats for relationships and roles sounds like a good idea, but we should take care how we use them. The examples in the original text above show the difficulties, for instance:

  • DC-4009 is used to represent the relationship annotates, but is defined in IsoCat as "The application of a scheme to texts...", that is, an operation/action, not a relation.

How strict should we be in applying datcats to relationships? Is it sufficient to select datcats conveying the general idea of the relation, or must the datcat be explicitly defined as a relation (as in the other example using DC-2318)?

Oliver: A slightly blurry comment on Oddrun's last suggestion: I wonder, what if we remove any "relation semantics" from ResourceProxyList and ditch the isPartOf stuff? CMDI could define ResourceProxyList just as an "inventory" of entities we like to talk about, and all relations, including more elementary relations like "hasPart" and "isPartOf", would be explicitly expressed within the ResourceRelationList. Or are we going to become too abstract this way?

Another observation: [putting on my center-agnostic hat] at least one center defines relations between Resources by using ResourceProxies and the @href attribute within the component section of the CMDI instance, e.g. by putting an @href on isPart or source elements (OLAC DCMI-terms profile). Relations suddenly appear within the Components section. What do you think about this approach (especially considering the metadata exploitation point of view)?

Oddrun: I partly agree. Oliver here touches on a key characteristic of CMDI: while its flexibility allows us to configure our metadata according to any specific needs, it is equally the case that one and the same phenomenon may be expressed in many different ways, which is sometimes confusing. Relationships between resources are such a case. Any metadata modeller may define components or elements with the specific purpose of handling relationships. OLAC DCMI-terms is one example; another is the ResourceRelationInfo component included in all 4 resourceInfo profiles from METASHARE. Then, in addition, CMDI offers the integrated relationship elements in the Resources section, some "hard coded"/defined (isPartOf), others to be defined by the modeller or metadata creator. For the hard-coded ones, the advantage of storing them outside components is the possibility of imposing a shared semantics, making exploitation easier. However, this does not apply to the ResourceRelation element, which increasingly (after its generalisation) looks like a regular component. Maybe it would be better to define ResourceRelation as a recommended component in CR, and exclude it from the Resources element?


Peter: VLO Taskforce contains a summary of how some CLARIN-D centers currently represent relationships. None of the examples collected so far uses the element ResourceRelationships. Instead, the generic (binary) relationships expressed by ResourceProxy are often further described in the component part of a CMDI record.


Twan: This document by Thorsten Trippel from 2012 describes the various non-component header sections of CMDI and expresses the originally intended usage of Resource Relation as well as the Part Of List (emphasis mine):

The Resource Relation List

Resource files do not exist independent of each other if a resource consists of more than one file. For example, audio files and transcriptions are related to each other. The ResourceProxyList only lists these files, the ResourceRelationList makes the relation between pairs of files explicit. For this purpose the ResourceRelation contains a triple of elements defining a directed relation between a Resource 1, which is referenced by a ref-pointer to an id from the ResourceProxy, and a Resource 2 respectively. The relation between the two is given as a string in the RelationType element.

The Is-Part-of List

Resources that are defined in bundles are listed under ResourceProxy. The individual parts can be seen as independent resources as well, such as a subcorpus that can also be distributed on its own. To point out that a resource is part of a larger unit or created as part of a larger unit, the IsPartOfList is introduced referring to one or more larger units by giving the PID of the larger units with the IsPartOf element.


Oddrun:

Ad relations between metadata documents: I will not claim there is never a need to express relations between metadata documents. E.g. the metadata of a resource may be subdivided into several documents in a way that does not correspond to the resource composition (is such a situation compliant with CMDI at all?). In such cases, the metadata files will need to be related to each other. And it is certainly necessary to have a clear semantics as to which object types (resource or metadata) are connected by a relation statement. However, I do not think the need is great enough to warrant muddying the waters in the non-component part of CMDI by giving the opportunity to talk about metadata files. That is, whenever a file with ResourceType “Metadata” is listed in ResourceProxyList, it should represent the resource it describes, not the metadata as a resource in its own right. (It is of course conceivable to allow giving metadata files the ResourceType “Resource” to indicate that the file in question is to be regarded as a resource in itself, but this would violate the principle of ResourceProxyList only containing the parts of the described resource.)

Likewise, when a metadata file CMDI-x appears in the IsPartOfList, it must be taken to mean that the currently described resource is part of the resource described by CMDI-x. Any information about metadata as such (including relations between metadata files), apart from the administrative info in the header, can and should be handled by dedicated components, IMHO at least.

Ad relations in the non-component part of a CMDI file: It sounds so incredibly plausible and simple in Thorsten’s words from the cited paper above. But the examples from Clarin-D show that each and every center has its own way of dealing with resource relations. Apparently, quite a representational diversity can be created within the CMDI framework, totally without the help of ResourceRelation... Hence, do we really need ResourceRelation? On the other hand, one might argue that since it is rarely used, removing it doesn’t really simplify things... It would be nice to see some real-world examples of its use, though. However, even though Clarin-D centres express relationships in manifold ways, there is a pattern when it comes to representing the compositional structure of resources, namely that the resource part-of hierarchy is represented by ResourceProxyList and IsPartOfList. If practiced universally, aggregators like VLO and others could use this to good effect in their portals. True, hasParts/IsPartOf is just one relation, but a very important one, defining the compositional structure of a resource.

To sum up, I think my position would be:

Oddrun: Some points to consider for the final discussion:

  • It appears that ResourceRelation is very seldom used. However, it would be nice to be able to see some of the few examples that do exist. Any ideas how to find such examples?
  • Consider the advantages and disadvantages of ResourceRelation in CMDI:
    • Expressivity: Does ResourceRelation offer expressivity that cannot be provided by components/elements? (Are there relations between resources that cannot be expressed as components/elements? Is there any difference in this respect between resources with their own CMDI file and those without one, i.e. those only listed in the ResourceProxyList?)
    • Standardizing effect: Does ResourceRelation make it easier for tools to utilize (e.g. traverse) relations?
    • ...any other suggestions?