wiki:VLO-Taskforce/Relations

Version 22 (modified by fankhauser@ids-mannheim.de, 10 years ago)


VLO Facets

  • List of Facets (attachment facets.ods)

Relationships

The CMDI metadata framework provides a generic mechanism for representing (directional) relationships between objects: <ResourceProxy>. A <ResourceProxy> specifies the <ResourceType> of the target (such as "LandingPage", "Resource", or "Metadata") and the target itself in <ResourceRef>, which usually contains a URL and sometimes a PID (a special case of a URL). Each <ResourceProxy> carries an id attribute, which can be (and is) used to attach further information in the component part of a CMDI record, such as a descriptive, human-readable anchor text and the semantics of the relationship (partOf, versionOf, source, ...). Finally, for some kinds of relationships there are dedicated elements in the <Resources> section of a CMDI record, such as <IsPartOfList>.

The CLARIN-D centers have creatively explored this large, combinatorial design space for representing relationships. As a consequence, the VLO is basically agnostic with respect to relationships.
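To make the id/ref mechanism concrete, here is a minimal Python sketch of what a relation-aware indexer would have to do first, assuming a simplified, namespace-free record. The CMDI element names are real, but the ids p1/p2, the <Recording> component and the EXAMPLE handles are hypothetical. The sketch indexes every <ResourceProxy> by its id and resolves a component's ref attribute, treating it as a whitespace-separated IDREFS list.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified record: real CMDI records use the CMDI namespace
# and a full envelope (Header, Resources, Components).
RECORD = """<CMD>
  <Resources>
    <ResourceProxyList>
      <ResourceProxy id="p1">
        <ResourceType mimetype="text/xml">Metadata</ResourceType>
        <ResourceRef>http://hdl.handle.net/EXAMPLE/part-1</ResourceRef>
      </ResourceProxy>
      <ResourceProxy id="p2">
        <ResourceType mimetype="audio/wav">Resource</ResourceType>
        <ResourceRef>http://hdl.handle.net/EXAMPLE/audio-1</ResourceRef>
      </ResourceProxy>
    </ResourceProxyList>
  </Resources>
  <Components>
    <Recording ref="p2 p1"><Title>Example</Title></Recording>
  </Components>
</CMD>"""


def local(tag):
    """Strip a '{namespace}' prefix, so the same code works on namespaced records."""
    return tag.rsplit("}", 1)[-1]


def proxy_index(root):
    """Map each ResourceProxy id to (ResourceType, mimetype, ResourceRef)."""
    index = {}
    for el in root.iter():
        if local(el.tag) != "ResourceProxy":
            continue
        rtype = mimetype = ref = None
        for child in el:
            if local(child.tag) == "ResourceType":
                rtype = (child.text or "").strip()
                mimetype = child.get("mimetype")
            elif local(child.tag) == "ResourceRef":
                ref = (child.text or "").strip()
        index[el.get("id")] = (rtype, mimetype, ref)
    return index


def resolve_refs(root, index):
    """Yield (component, proxy id, proxy info) for every id in a ref attribute.

    Ids without a matching ResourceProxy yield None as proxy info.
    """
    for el in root.iter():
        for pid in (el.get("ref") or "").split():
            yield local(el.tag), pid, index.get(pid)


root = ET.fromstring(RECORD)
for component, pid, info in resolve_refs(root, proxy_index(root)):
    print(component, pid, info)
```

On a real, namespaced record the same lookup applies unchanged, since local() makes all tag comparisons namespace-agnostic.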

In the following, some of the existing representations are analyzed, with the explicit goal of narrowing the design space and making some kinds of relationships more useful for the VLO. For more information on the status of relationships in the discussion of CMDI 1.2, see also https://docs.google.com/spreadsheet/ccc?key=0Avyg_78eBoF4dFUxR2VpR01XRFEzSUVUb2tXcFduSXc&usp=sharing#gid=0

HZSK

HZSK represents all resources of a corpus in one CMDI record. Thus, the target of a relationship (<ResourceRef>) always has the type (<ResourceType>) Resource. The individual resources are further specified in the component part of the CMDI record, where each component refers to its <ResourceProxy> via the proxy's id attribute (given in the component's ref attribute).

Example:

http://virt-fedora.multilingua.uni-hamburg.de/drupal/fedora/repository/cmdi:demo/cmdi/metadata.xml

<ResourceProxy id="ACDIMB865D8-D9BA-7F9B-E652-D00D960850B4">
  <ResourceType mimetype="text/xml">Resource</ResourceType>
  <ResourceRef>http://hdl.handle.net/11858/00-248C-0000-000E-0181-F</ResourceRef>
</ResourceProxy>
...
<HZSKTranscription ComponentId="clarin.eu:cr1:c_1345561703658" ref="ID8392DD18-04C3-9DC7-A7F5-2FA8A3639EA4">
  <Name>Rudi Völler Wutausbruch</Name>
  <TranscriptionConvention>HIAT (simplified)</TranscriptionConvention>
  ...
</HZSKTranscription>
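Because HZSK keeps everything in one record, a simple sanity check is that every ref in the component part resolves to a <ResourceProxy> of type Resource in the same record. A minimal sketch, assuming a namespace-free copy of only the two elements shown above, wrapped in a made-up <CMD> root. In this excerpt the shown proxy id and the transcription's ref differ, so the check reports the ref as unresolved; presumably the matching proxy is among the parts elided with "...", in which case the check on the full record would come back empty.

```python
import xml.etree.ElementTree as ET

# Only the two elements shown in the HZSK excerpt, wrapped in a made-up root.
EXCERPT = """<CMD>
  <ResourceProxy id="ACDIMB865D8-D9BA-7F9B-E652-D00D960850B4">
    <ResourceType mimetype="text/xml">Resource</ResourceType>
    <ResourceRef>http://hdl.handle.net/11858/00-248C-0000-000E-0181-F</ResourceRef>
  </ResourceProxy>
  <HZSKTranscription ComponentId="clarin.eu:cr1:c_1345561703658"
                     ref="ID8392DD18-04C3-9DC7-A7F5-2FA8A3639EA4">
    <Name>Rudi Völler Wutausbruch</Name>
  </HZSKTranscription>
</CMD>"""


def local(tag):
    """Strip a '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def proxy_types(root):
    """Set of ResourceType values used in the record (for HZSK: only 'Resource')."""
    return {(el.text or "").strip() for el in root.iter() if local(el.tag) == "ResourceType"}


def dangling_refs(root):
    """List (component, id) pairs whose ref id matches no ResourceProxy id."""
    proxy_ids = {el.get("id") for el in root.iter() if local(el.tag) == "ResourceProxy"}
    return [
        (local(el.tag), rid)
        for el in root.iter()
        for rid in (el.get("ref") or "").split()
        if rid not in proxy_ids
    ]


root = ET.fromstring(EXCERPT)
print(proxy_types(root))    # {'Resource'}
print(dangling_refs(root))  # [('HZSKTranscription', 'ID8392DD18-04C3-9DC7-A7F5-2FA8A3639EA4')]
```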

BAS

BAS represents the resources of a corpus by several CMDI records and employs a variety of approaches to represent relationships:

  1. The relationship between the CMDI record for a corpus and the records for its parts is specified explicitly in one direction (from part to corpus) by means of CMDI's built-in <IsPartOf> element.
  2. Relationships with a target of type Resource are further specified in the component part of the CMDI record by referring to the <ResourceProxy> via its id attribute.
  3. Relationships between individual components, such as from <media-file> to <media-session-actor>, are likewise represented by referring to the target's id attribute.
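In the ZIPTEL example that follows, the corpus record points down to its part through a <ResourceProxy> of type Metadata, while the part points back up through <IsPartOf>. A minimal sketch of a consistency check between the two directions, assuming trimmed, namespace-free copies of both records and assuming that a record is identified by the URL it is published under. That assumption is fragile: the harvested copy of the part record lives at a different URL, so a real check would better key records by a PID (for instance the header's MdSelfLink, where present).

```python
import xml.etree.ElementTree as ET

BASE = "https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/ZIPTEL/"
CORPUS_URL = BASE + "ZIPTEL.2.cmdi.xml"
PART_URL = BASE + "0001.2.cmdi.xml"

# Trimmed copies of the two records from the example, keyed by their URL
# (assumption: the URL serves as the record's identity).
RECORDS = {
    CORPUS_URL: f"""<CMD><ResourceProxy id="c_0000000001">
      <ResourceType mimetype="text/xml">Metadata</ResourceType>
      <ResourceRef>{PART_URL}</ResourceRef></ResourceProxy></CMD>""",
    PART_URL: f"""<CMD><IsPartOfList>
      <IsPartOf>{CORPUS_URL}</IsPartOf></IsPartOfList></CMD>""",
}


def local(tag):
    """Strip a '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def text_of(el, name):
    """Stripped text of the first direct child with the given local name, or None."""
    for child in el:
        if local(child.tag) == name:
            return (child.text or "").strip()
    return None


def child_links(root):
    """Downward links: targets of ResourceProxies of type Metadata."""
    return {text_of(el, "ResourceRef") for el in root.iter()
            if local(el.tag) == "ResourceProxy" and text_of(el, "ResourceType") == "Metadata"}


def parent_links(root):
    """Upward links: targets of IsPartOf elements."""
    return {(el.text or "").strip() for el in root.iter() if local(el.tag) == "IsPartOf"}


def one_directional(records):
    """(parent, child) pairs stated only downward or only upward."""
    trees = {url: ET.fromstring(xml) for url, xml in records.items()}
    down = {(p, c) for p, tree in trees.items() for c in child_links(tree)}
    up = {(p, c) for c, tree in trees.items() for p in parent_links(tree)}
    return {"missing_IsPartOf": down - up, "missing_proxy": up - down}


print(one_directional(RECORDS))  # both sets are empty for the ZIPTEL pair
```

Any Metadata proxy whose target record is not passed in shows up under missing_IsPartOf, so the input should cover the corpus record and all of its parts.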

Example:

https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/ZIPTEL/ZIPTEL.2.cmdi.xml

<ResourceProxy id="c_0000000001">
  <ResourceType mimetype="text/xml">Metadata</ResourceType>
  <ResourceRef>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/ZIPTEL/0001.2.cmdi.xml</ResourceRef>
</ResourceProxy>

http://catalog.clarin.eu/oai-harvester/cmdi-providers/harvested/results/cmdi/Bayerisches_Archiv_f_r_Sprachsignale/oai_BAS_repo_Corpora_ZIPTEL_0001.xml (harvested copy) or https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/ZIPTEL/0001.2.cmdi.xml (original)

<ResourceProxy id="r_0000000001">
  <ResourceType mimetype="audio/raw">Resource</ResourceType>
  <ResourceRef>https://clarin.phonetik.uni-muenchen.de/BASRepository/Corpora/ZIPTEL/0001/z10001z2.dea</ResourceRef>
</ResourceProxy>
...
<IsPartOfList>
  <IsPartOf>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/ZIPTEL/ZIPTEL.2.cmdi.xml</IsPartOf>
</IsPartOfList>
...
<media-file actor-ref="s_0000000001" ref="r_0000000001">
  <Type>audio</Type>
  <Quality>3</Quality>
  <RecordingConditions>un-supervised answering of a question prompted via telephone</RecordingConditions>
  ...
</media-file>
...
<media-session-actor id="s_0000000001">
  <Role>question answering</Role> 
  <Name>unspecified</Name>
  <FullName>unspecified</FullName>
  <Code>0001</Code>
  ...
</media-session-actor>
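In the part record, <media-file> uses ref to point into the ResourceProxyList and actor-ref to point to another component; both target an id attribute, so a single index over all id attributes resolves both kinds of reference. A minimal sketch, assuming a namespace-free excerpt of the elements shown above (elided parts dropped); it does not verify that a target has the expected element type.

```python
import xml.etree.ElementTree as ET

# Excerpt of the ZIPTEL part record shown above, without namespaces and "..." parts.
PART = """<CMD>
  <ResourceProxy id="r_0000000001">
    <ResourceType mimetype="audio/raw">Resource</ResourceType>
    <ResourceRef>https://clarin.phonetik.uni-muenchen.de/BASRepository/Corpora/ZIPTEL/0001/z10001z2.dea</ResourceRef>
  </ResourceProxy>
  <media-file actor-ref="s_0000000001" ref="r_0000000001">
    <Type>audio</Type>
    <Quality>3</Quality>
  </media-file>
  <media-session-actor id="s_0000000001">
    <Role>question answering</Role>
    <Code>0001</Code>
  </media-session-actor>
</CMD>"""


def ids_index(root):
    """Index every element that carries an id attribute, proxies and components alike."""
    return {el.get("id"): el for el in root.iter() if el.get("id") is not None}


def media_links(root):
    """Resolve each media-file's ref (to a proxy) and actor-ref (to a component)."""
    ids = ids_index(root)
    links = []
    for media in root.iter("media-file"):  # plain tag match works because there is no namespace
        proxy = ids[media.get("ref")]
        actor = ids[media.get("actor-ref")]
        links.append({
            "url": proxy.findtext("ResourceRef"),
            "actor_code": actor.findtext("Code"),
            "role": actor.findtext("Role"),
        })
    return links


for link in media_links(ET.fromstring(PART)):
    print(link)
```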

IDS-Mannheim

IDS-Mannheim represents resources by several CMDI records and employs a variety of approaches to represent relationships:

  1. The first version of the historical newspaper corpus MKHZ represents relationships by <ResourceProxy> as well as by OLAC-Dcmi-Terms elements such as <hasPart>, both of which point to a PID (see for example http://repos.ids-mannheim.de/fedora/objects/clarind-ids:mkhz.000000/datastreams/CMDI/content).
  2. The second version of the historical newspaper corpus represents relationships by <ResourceProxy> and further specifies the semantics (and anchor text) of each relationship in the component part of the CMDI record (see for example http://repos.ids-mannheim.de/fedora/objects/clarin-ids:mkhz1.00000/datastreams/CMDI/content). The underlying conceptual model is depicted in the figure below.
  3. Relationships in the corpora of spoken language are represented by <ResourceProxy> elements only, and partOf relationships are further specified by means of CMDI's built-in <IsPartOf> element (see for example http://repos.ids-mannheim.de/fedora/objects/clarin-ids:folk.FOLK_S_00248.cmdi/datastreams/CMDI/content).

[Figure: conceptual model of the MKHZ representation]
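In the first MKHZ version the same part relation is stated twice, once as a <ResourceProxy> and once as an OLAC-DcmiTerms <hasPart>, both pointing to a PID. Redundant statements can drift apart, so a crude cross-check compares the two sets of PIDs after normalizing the handle notation. A minimal sketch; the record layout, the EXAMPLE handles, and the assumption that every ResourceRef in such a record denotes a part are hypothetical, and a real record may contain ResourceRefs for other relations that this check would list under only_ResourceRef.

```python
import xml.etree.ElementTree as ET

# Hypothetical record in the style of MKHZ version 1 (layout and PIDs invented).
RECORD = """<CMD>
  <ResourceProxy id="p1">
    <ResourceType>Metadata</ResourceType>
    <ResourceRef>http://hdl.handle.net/EXAMPLE/issue-1</ResourceRef>
  </ResourceProxy>
  <OLAC-DcmiTerms>
    <hasPart>hdl:EXAMPLE/issue-1</hasPart>
    <hasPart>hdl:EXAMPLE/issue-2</hasPart>
  </OLAC-DcmiTerms>
</CMD>"""

HANDLE_PREFIXES = ("hdl:", "http://hdl.handle.net/", "https://hdl.handle.net/")


def local(tag):
    """Strip a '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def normalize_pid(value):
    """Reduce the common handle notations to the bare handle 'prefix/suffix'."""
    value = value.strip()
    for prefix in HANDLE_PREFIXES:
        if value.startswith(prefix):
            return value[len(prefix):]
    return value


def pids(root, name):
    """Normalized text of all elements with the given local name."""
    return {normalize_pid(el.text or "") for el in root.iter() if local(el.tag) == name}


def compare_redundant(root):
    """PIDs stated only as hasPart, and only as ResourceRef."""
    refs, parts = pids(root, "ResourceRef"), pids(root, "hasPart")
    return {"only_hasPart": parts - refs, "only_ResourceRef": refs - parts}


print(compare_redundant(ET.fromstring(RECORD)))
# {'only_hasPart': {'EXAMPLE/issue-2'}, 'only_ResourceRef': set()}
```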

Leipzig

Leipzig's approach to representing relationships is similar to option 2 of IDS (and BAS). The differences are as follows:

  1. For each relationship there is a separate component, which already has a built-in ref attribute of type IDREFS.
  2. The description of a relationship is structured (e.g. <Id>, <Name>, <Description>) rather than a simple anchor text.
  3. Inverse relationships seem not to be represented explicitly.
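Since Leipzig records do not seem to state inverse relationships explicitly, an indexer that wants to navigate in both directions would have to derive them. A minimal sketch, assuming relations have already been extracted as (subject, relation, object) triples with Dublin Core-style names; names found in records (e.g. partOf, versionOf) would first have to be mapped to these. The inverse table is borrowed from the paired Dublin Core terms and is an assumption, not something the Leipzig profile defines; relations without a known inverse, such as source, yield no inverse triple. The record identifiers are hypothetical.

```python
# Inverse pairs taken from Dublin Core terms (assumed vocabulary).
INVERSE = {
    "isPartOf": "hasPart",
    "isVersionOf": "hasVersion",
    "isFormatOf": "hasFormat",
    "isReferencedBy": "references",
    "isReplacedBy": "replaces",
    "isRequiredBy": "requires",
}
# Make the table symmetric so either direction can be inverted.
INVERSE.update({inv: rel for rel, inv in list(INVERSE.items())})


def with_inverses(triples):
    """Return the triples plus one inverse triple for each relation with a known inverse."""
    result = set(triples)
    for subject, relation, obj in triples:
        inverse = INVERSE.get(relation)
        if inverse is not None:
            result.add((obj, inverse, subject))
    return result


# Hypothetical record identifiers.
facts = {
    ("rec:part-1", "isPartOf", "rec:corpus"),
    ("rec:corpus", "source", "rec:provider-1"),
}
for triple in sorted(with_inverses(facts)):
    print(triple)
```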

Example:

<ResourceProxy id="ulei-11858-00-229C-0000-0001-B06F-3-component-dataprovider-1">
  <ResourceType mimetype="text/xml">Metadata</ResourceType>
  <ResourceRef>http://hdl.handle.net/11858/00-229C-0000-0001-B06F-3@type=dataprovider&amp;id=1</ResourceRef>
</ResourceProxy>
...
<LCC_DataProvider ComponentId="clarin.eu:cr1:c_1381926654509" ref="ulei-11858-00-229C-0000-0001-B06F-3-component-dataprovider-1">
  <Id>1</Id>
  <Name>LCC data provider "www.shortnews.de" in resource with identifier 11858/00-229C-0000-0001-B06F-3</Name>
  <Description xml:lang="eng">Data provider of the Leipzig Corpora Collection: www.shortnews.de</Description>
</LCC_DataProvider>
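In the Leipzig example the <ResourceRef> is a handle followed by an @-suffix (type=dataprovider, id=1) that appears to select one part of the resource, mirroring the component's <Id>1</Id>. A minimal sketch of splitting such a reference and cross-checking it against the component, assuming the suffix always uses query-string syntax; that is inferred from this single example, not from a Handle specification. Inside an XML file the & has to be written as &amp;, which the parser turns back into &.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs

# Namespace-free copy of the Leipzig example ("&" escaped as "&amp;").
RECORD = """<CMD>
  <ResourceProxy id="ulei-11858-00-229C-0000-0001-B06F-3-component-dataprovider-1">
    <ResourceType mimetype="text/xml">Metadata</ResourceType>
    <ResourceRef>http://hdl.handle.net/11858/00-229C-0000-0001-B06F-3@type=dataprovider&amp;id=1</ResourceRef>
  </ResourceProxy>
  <LCC_DataProvider ref="ulei-11858-00-229C-0000-0001-B06F-3-component-dataprovider-1">
    <Id>1</Id>
  </LCC_DataProvider>
</CMD>"""


def split_ref(ref):
    """Split 'base@k1=v1&k2=v2' into the base URL and a dict of suffix parameters."""
    base, sep, suffix = ref.partition("@")
    params = {key: values[0] for key, values in parse_qs(suffix).items()} if sep else {}
    return base, params


root = ET.fromstring(RECORD)
proxy = root.find("ResourceProxy")
component = root.find("LCC_DataProvider")
base, params = split_ref(proxy.findtext("ResourceRef"))
print(base)    # http://hdl.handle.net/11858/00-229C-0000-0001-B06F-3
print(params)  # {'type': 'dataprovider', 'id': '1'}
# Cross-check the suffix against the structured description in the component.
print(params.get("id") == component.findtext("Id"))  # True
```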

Summary on Representation of Relationships

TBD, in the form of a table of design options.

Example Profiles

Background Information

Attachments (7)
