Changeset 3883 for CMDI-Interoperability


Timestamp:
10/24/13 12:27:38 (11 years ago)
Author:
vronk
Message:

minor

File:
1 edited

  • CMDI-Interoperability/CMD2RDF/trunk/docs/papers/2014-LREC/CMD2RDF.tex

    r3872 r3883  
    8888%
    8989
    90 The basic building blocks of CMDI are components. Components are used to group elements and attributes, which can take values, and also other components (see Figure \ref{fig:CMDM}). Components are stored in the Component Registry (CR), where they can be reused by other modellers. Thus a metadata modeller selects or creates components and combines them into a profile targeted at a specific resource type, a collection of resources or a project, tool or service. A profile is a blueprint of a schema for a metadata record. CLARIN centres offer these CMD records describing their resources to the joint metadata domain. There are a number of generic tools which operate on all the CMD records in this domain, e.g., the Virtual Language Observatory\furl{http://www.clarin.eu/vlo/}. These tools have to deal with the variety of CMD profiles. They can do so by operating on a semantic level, as components, elements and values can all be annotated with links to concepts in various registries. Currently used concept registries are the Dublin Core metadata %elements and
     90The basic building blocks of CMDI are components. Components are used to group elements and attributes, which can take values, and also other components (see Figure \ref{fig:CMDM}). Components are stored in the Component Registry (CR), where they can be reused by other modellers. Thus a metadata modeller selects or creates components and combines them into a profile targeted at a specific resource type, a collection of resources or a project, tool or service. A profile serves as blueprint for a schema for metadata records. CLARIN centres offer these CMD records describing their resources to the joint metadata domain. There are a number of generic tools which operate on all the CMD records in this domain, e.g., the Virtual Language Observatory\furl{http://www.clarin.eu/vlo/}. These tools have to deal with the variety of CMD profiles. They can do so by operating on a semantic level, as components, elements and values can all be annotated with links to concepts in various registries. Currently used concept registries are the Dublin Core metadata %elements and
    9191terms and the ISOcat Data Category Registry. These concept links allow profiles, while being diverse in structure, to share semantics. Generic tools can use this semantic linkage to overcome differences in terminology and also in structure.
    9292
     
    130130%
    131131The main added value of LOD\cite{TimBL2006} is the interconnecting of disparate datasets.
    132 In the broader context of LOD, there is meanwhile an Open Knowledge Foundation’s Working Group on Open Data in Linguistics, that represents an obvious pool of candidate
     132In the broader context of LOD there is meanwhile an Open Knowledge Foundation’s Working Group on Linked Data in Linguistics, that represents an obvious pool of candidate
    133133datasets to link the CMD data with\footnote{\url{http://linguistics.okfn.org/resources/llod/}}.
    134134Within these \xne{lexvo} seems most promising starting point, as it features URIs like \url{http://lexvo.org/id/term/eng/}, i.e. based on the ISO-639-3 language identifiers which are also used in CMD records.
     
    137137Step by step we will link other categories like countries, geographica, organisations, etc.
    138138to some of the central nodes of the LOD cloud \cite{Cyganiak2010}, like \xne{dbpedia}, \xne{Yago} or \xne{geonames},
    139 but also domain-specific semantic resource like the ontology for language technology \xne{LT-World} \cite{Joerg2010} developed at DFKI.
     139but also to domain-specific semantic resource like the ontology for language technology \xne{LT-World} \cite{Joerg2010} developed at DFKI.
    140140
    141141\section{CMD to RDF}
     
    143143In the following a RDF encoding is proposed for all levels of the CMD data domain:
    144144\begin{itemize}
    145 \item the CMD meta model,
    146 \item the profile definitions,
    147 \item the administrative and structural information of CMD records, and
    148 \item the individual values in the fields of the CMD records.
     145\item CMD meta model,
     146\item profile and component definitions,
     147\item administrative and structural information of CMD records and
     148\item individual values in the fields of the CMD records.
    149149\end{itemize}
    150150
    151151\subsection{CMD specification}\label{sec:CMDM}
    152152
    153 The main entity of the meta model is the CMD component modelled as a \code{rdfs:Class}. A CMD profile is basically a CMD component with some extra features, implying a specialization relation. It may seem natural to translate a CMD element to a RDF property (as it holds the literal value), but given its complexity (e.g., attributes\footnote{Due to space considerations the abstract will not further discuss attributes.}, relation to the containing component)  it too has to be a \code{rdfs:Class}. The actual literal value is a property of given element of type \code{cmdm:ElementValue}. For values that can be mapped to entities defined in external semantic resources, the references to these entities are expressed in parallel object properties of type \code{cmdm:hasElementEntity} (outbound links). The containment relation between components and elements is expressed with a dedicated property \code{cmdm:contains}.
     153The main entity of the meta model is the CMD component modelled as a \code{rdfs:Class}. A CMD profile is basically a CMD component with some extra features, implying a specialization relation. It may seem natural to translate a CMD element to a RDF property (as it holds the literal value), but given its complexity (e.g., attributes\footnote{Due to space considerations the abstract will not further discuss attributes.}, relation to the containing component)  it too has to be expressed as \code{rdfs:Class}. The actual literal value is a property of given element of type \code{cmdm:ElementValue}. For values that can be mapped to entities defined in external semantic resources, the references to these entities are expressed in parallel object properties of type \code{cmdm:hasElementEntity} (constituting outbound links). The containment relation between components and elements is expressed with a dedicated property \code{cmdm:contains}.
    154154
    155155\label{table:rdf-spec}
     
    197197\end{example3}
    198198
    199 \noindent
     199
     200\subsection{CMD profile and component definitions}
    200201This top-level classes and properties are subsequently used for modelling the actual profiles, components and elements as they are defined in the CR.
    201 For stand-alone components, the IRI is the exact path into the CR to get the RDF representation for the profile/component\furl{http://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/components/clarin.eu:cr1:c\_1271859438125/rdf} . For ``inner'' components (that are defined as part of another component) and elements the identifier is a concatenation of the nearest ancestor stand-alone component's IRI and a dot-path to given component/element (e.g., Actor:\\ \code{cr:clarin.eu:cr1:c\_1271859438197/rdf\#Actor.Actor\_Languages.Actor\_Language}\footnote{For the sake of readability, in the examples we will collapse the component IRIs, refering to them just by their name, prefixed with \code{cmd:}.}.)
     202For stand-alone components, the IRI is the exact path into the CR to get the RDF representation for the profile/component\furl{http://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/components/clarin.eu:cr1:c\_1271859438125/rdf}. For ``inner'' components (that are defined as part of another component) and elements the identifier is a concatenation of the nearest ancestor stand-alone component's IRI and the dot-path to given component/element (e.g., Actor:\\ \code{cr:clarin.eu:cr1:c\_1271859438197/rdf\#Actor.Actor\_Languages.Actor\_Language}\footnote{For the sake of readability, in the examples we will collapse the component IRIs, refering to them just by their name, prefixed with \code{cmd:}})
    202203
    203204\begin{example3}
     
    209210
    210211\subsubsection{Data Categories}
    211 The primary concept registry in use by CMDI for its concept links is ISOcat. The recommended approach to link to the data categories via an annotation property \cite{Windhouwer2012_LDL}.
     212The primary concept registry in use by CMDI for its concept links is ISOcat. The recommended approach to link to the data categories is via an annotation property \cite{Windhouwer2012_LDL}.
    212213
    213214\begin{example3}
     
    246247%%%%%%%%%%%%%%%%%%%%%
    247248\subsection{CMD instances}
    248 In the next step, we want to express the individual CMD instances, the metadata records.
     249In the next step, we want to express in RDF the individual CMD instances, the metadata records.
    249250
    250251We provide a generic top level class for all resources (including metadata records), the \code{cmdm:Resource} class and the  \code{cmdm:hasMimeType} predicate to type the resources.
     
    261262If identifiers are present for both resource and metadata, \end{comment}
    262263The PID of a Language Resource ( \code{<lr1>} ) is used as the IRI for the described resource in the RDF representation.
    263 The relationship between the resource and the metadata record can be expressed as an annotation using the \xne{OpenAnnotation} vocabulary\furl{http://openannotation.org/spec/core/core.html\#Motivations}.
     264The relationship between the resource and the metadata record can be expressed as an annotation using the \xne{OpenAnnotation} vocabulary\furl{http://openannotation.org/spec/core/core.html}.
    264265(Note, that one MD record can describe multiple resources. This can be also easily accommodated in OpenAnnotation.)
    265266
     
    276277
    277278\begin{example3}
    278 <lr1.cmd> & dcterms:identifier  & <lr1.cmd> ;  \\
     279\_:topComponent1 & dcterms:identifier  & <lr1.cmd> ;  \\
    279280 & dcterms:creator  & \var{\{cmd:MdCreator\}} ;  \\
    280281 & dcterms:publisher  & <http://clarin.eu> ; \\
     
    427428\section{Implementation}
    428429
    429 The transformation of profiles and instances into RDF/XML is accomplished by a set of XSL-stylesheets, that are currently being tested on a sample dataset. Once ready they will be integrated into the CMDI core infrastructure, e.g., the CR.
     430The transformation of profiles and instances into RDF/XML is accomplished by a set of XSL-stylesheets, that are currently being tested on a sample dataset. Once ready, they will be integrated into the CMDI core infrastructure, e.g., the CR.
    430431%And in the near future, a test on the instances in the complete CLARIN joint metadata domain will be performed.
    431432Once the linked data is available it has to be stored and published in a RDF triple store. We will elaborate on this aspect further in the final paper.
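
Illustration of the identifier scheme described in the paragraph at r3883 line 202 (stand-alone component IRI, plus a dot-path for inner components and elements). This is only a minimal sketch, not the project's XSL implementation. It assumes that the `cr:` prefix in the paper's example abbreviates the Component Registry base URL shown in the footnote, and that the dot-path is attached as a fragment after `/rdf`, as the example suggests. The names `CR_BASE`, `standalone_iri` and `inner_iri` are hypothetical.

```python
from typing import Sequence

# Assumption: the "cr:" prefix expands to this Component Registry base URL.
CR_BASE = "http://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/components/"


def standalone_iri(component_id: str) -> str:
    """IRI of a stand-alone component: the path into the CR yielding its RDF."""
    return f"{CR_BASE}{component_id}/rdf"


def inner_iri(ancestor_id: str, path: Sequence[str]) -> str:
    """IRI of an inner component or element.

    Concatenates the IRI of the nearest stand-alone ancestor with the
    dot-path from that ancestor down to the component/element.
    """
    if not path:
        raise ValueError("dot-path must contain at least one name")
    return standalone_iri(ancestor_id) + "#" + ".".join(path)


if __name__ == "__main__":
    print(inner_iri("clarin.eu:cr1:c_1271859438197",
                    ["Actor", "Actor_Languages", "Actor_Language"]))
```

Keeping the stand-alone part and the dot-path separate until the final join mirrors the wording of the paragraph: the part before `#` is resolvable in the CR, while the fragment only locates the nested definition.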