Changeset 3883 for CMDI-Interoperability
- Timestamp: 10/24/13 12:27:38
- File: 1 edited
CMDI-Interoperability/CMD2RDF/trunk/docs/papers/2014-LREC/CMD2RDF.tex
r3872 r3883
88 88 %
89 89
90 The basic building blocks of CMDI are components. Components are used to group elements and attributes, which can take values, and also other components (see Figure \ref{fig:CMDM}). Components are stored in the Component Registry (CR), where they can be reused by other modellers. Thus a metadata modeller selects or creates components and combines them into a profile targeted at a specific resource type, a collection of resources or a project, tool or service. A profile is a blueprint of a schema for a metadata record. CLARIN centres offer these CMD records describing their resources to the joint metadata domain. There are a number of generic tools which operate on all the CMD records in this domain, e.g., the Virtual Language Observatory\furl{http://www.clarin.eu/vlo/}. These tools have to deal with the variety of CMD profiles. They can do so by operating on a semantic level, as components, elements and values can all be annotated with links to concepts in various registries. Currently used concept registries are the Dublin Core metadata %elements and
90 The basic building blocks of CMDI are components. Components are used to group elements and attributes, which can take values, and also other components (see Figure \ref{fig:CMDM}). Components are stored in the Component Registry (CR), where they can be reused by other modellers. Thus a metadata modeller selects or creates components and combines them into a profile targeted at a specific resource type, a collection of resources or a project, tool or service. A profile serves as blueprint for a schema for metadata records. CLARIN centres offer these CMD records describing their resources to the joint metadata domain. There are a number of generic tools which operate on all the CMD records in this domain, e.g., the Virtual Language Observatory\furl{http://www.clarin.eu/vlo/}. These tools have to deal with the variety of CMD profiles. They can do so by operating on a semantic level, as components, elements and values can all be annotated with links to concepts in various registries. Currently used concept registries are the Dublin Core metadata %elements and
91 91 terms and the ISOcat Data Category Registry. These concept links allow profiles, while being diverse in structure, to share semantics. Generic tools can use this semantic linkage to overcome differences in terminology and also in structure.
92 92
… …
130 130 %
131 131 The main added value of LOD\cite{TimBL2006} is the interconnecting of disparate datasets.
132 In the broader context of LOD , there is meanwhile an Open Knowledge Foundation's Working Group on OpenData in Linguistics, that represents an obvious pool of candidate
132 In the broader context of LOD there is meanwhile an Open Knowledge Foundation's Working Group on Linked Data in Linguistics, that represents an obvious pool of candidate
133 133 datasets to link the CMD data with\footnote{\url{http://linguistics.okfn.org/resources/llod/}}.
134 134 Within these \xne{lexvo} seems most promising starting point, as it features URIs like \url{http://lexvo.org/id/term/eng/}, i.e. based on the ISO-639-3 language identifiers which are also used in CMD records.
… …
137 137 Step by step we will link other categories like countries, geographica, organisations, etc.
138 138 to some of the central nodes of the LOD cloud \cite{Cyganiak2010}, like \xne{dbpedia}, \xne{Yago} or \xne{geonames},
139 but also domain-specific semantic resource like the ontology for language technology \xne{LT-World} \cite{Joerg2010} developed at DFKI.
139 but also to domain-specific semantic resource like the ontology for language technology \xne{LT-World} \cite{Joerg2010} developed at DFKI.
140 140
141 141 \section{CMD to RDF}
… …
143 143 In the following a RDF encoding is proposed for all levels of the CMD data domain:
144 144 \begin{itemize}
145 \item the CMD meta model,
146 \item the profile definitions,
147 \item the administrative and structural information of CMD records, and
148 \item the individual values in the fields of the CMD records.
145 \item CMD meta model,
146 \item profile and component definitions,
147 \item administrative and structural information of CMD records and
148 \item individual values in the fields of the CMD records.
149 149 \end{itemize}
150 150
151 151 \subsection{CMD specification}\label{sec:CMDM}
152 152
153 The main entity of the meta model is the CMD component modelled as a \code{rdfs:Class}. A CMD profile is basically a CMD component with some extra features, implying a specialization relation. It may seem natural to translate a CMD element to a RDF property (as it holds the literal value), but given its complexity (e.g., attributes\footnote{Due to space considerations the abstract will not further discuss attributes.}, relation to the containing component) it too has to be a \code{rdfs:Class}. The actual literal value is a property of given element of type \code{cmdm:ElementValue}. For values that can be mapped to entities defined in external semantic resources, the references to these entities are expressed in parallel object properties of type \code{cmdm:hasElementEntity} (outbound links). The containment relation between components and elements is expressed with a dedicated property \code{cmdm:contains}.
153 The main entity of the meta model is the CMD component modelled as a \code{rdfs:Class}. A CMD profile is basically a CMD component with some extra features, implying a specialization relation. It may seem natural to translate a CMD element to a RDF property (as it holds the literal value), but given its complexity (e.g., attributes\footnote{Due to space considerations the abstract will not further discuss attributes.}, relation to the containing component) it too has to be expressed as \code{rdfs:Class}. The actual literal value is a property of given element of type \code{cmdm:ElementValue}. For values that can be mapped to entities defined in external semantic resources, the references to these entities are expressed in parallel object properties of type \code{cmdm:hasElementEntity} (constituting outbound links). The containment relation between components and elements is expressed with a dedicated property \code{cmdm:contains}.
154 154
155 155 \label{table:rdf-spec}
… …
197 197 \end{example3}
198 198
199 \noindent
199 200 \subsection{CMD profile and component definitions}
200 201 This top-level classes and properties are subsequently used for modelling the actual profiles, components and elements as they are defined in the CR.
201 For stand-alone components, the IRI is the exact path into the CR to get the RDF representation for the profile/component\furl{http://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/components/clarin.eu:cr1:c\_1271859438125/rdf} . For ``inner'' components (that are defined as part of another component) and elements the identifier is a concatenation of the nearest ancestor stand-alone component's IRI and a dot-path to given component/element (e.g., Actor:\\ \code{cr:clarin.eu:cr1:c\_1271859438197/rdf\#Actor.Actor\_Languages.Actor\_Language}\footnote{For the sake of readability, in the examples we will collapse the component IRIs, refering to them just by their name, prefixed with \code{cmd:}.}.)
202 For stand-alone components, the IRI is the exact path into the CR to get the RDF representation for the profile/component\furl{http://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/components/clarin.eu:cr1:c\_1271859438125/rdf}. For ``inner'' components (that are defined as part of another component) and elements the identifier is a concatenation of the nearest ancestor stand-alone component's IRI and the dot-path to given component/element (e.g., Actor:\\ \code{cr:clarin.eu:cr1:c\_1271859438197/rdf\#Actor.Actor\_Languages.Actor\_Language}\footnote{For the sake of readability, in the examples we will collapse the component IRIs, refering to them just by their name, prefixed with \code{cmd:}})
202 203
203 204 \begin{example3}
… …
209 210
210 211 \subsubsection{Data Categories}
211 The primary concept registry in use by CMDI for its concept links is ISOcat. The recommended approach to link to the data categories via an annotation property \cite{Windhouwer2012_LDL}.
212 The primary concept registry in use by CMDI for its concept links is ISOcat. The recommended approach to link to the data categories is via an annotation property \cite{Windhouwer2012_LDL}.
212 213
213 214 \begin{example3}
… …
246 247 %%%%%%%%%%%%%%%%%%%%%
247 248 \subsection{CMD instances}
248 In the next step, we want to express the individual CMD instances, the metadata records.
249 In the next step, we want to express in RDF the individual CMD instances, the metadata records.
249 250
250 251 We provide a generic top level class for all resources (including metadata records), the \code{cmdm:Resource} class and the \code{cmdm:hasMimeType} predicate to type the resources.
… …
261 262 If identifiers are present for both resource and metadata, \end{comment}
262 263 The PID of a Language Resource ( \code{<lr1>} ) is used as the IRI for the described resource in the RDF representation.
263 The relationship between the resource and the metadata record can be expressed as an annotation using the \xne{OpenAnnotation} vocabulary\furl{http://openannotation.org/spec/core/core.html \#Motivations}.
264 The relationship between the resource and the metadata record can be expressed as an annotation using the \xne{OpenAnnotation} vocabulary\furl{http://openannotation.org/spec/core/core.html}.
264 265 (Note, that one MD record can describe multiple resources. This can be also easily accommodated in OpenAnnotation.)
265 266
… …
276 277
277 278 \begin{example3}
278 <lr1.cmd>& dcterms:identifier & <lr1.cmd> ; \\
279 \_:topComponent1 & dcterms:identifier & <lr1.cmd> ; \\
279 280 & dcterms:creator & \var{\{cmd:MdCreator\}} ; \\
280 281 & dcterms:publisher & <http://clarin.eu> ; \\
… …
427 428 \section{Implementation}
428 429
429 The transformation of profiles and instances into RDF/XML is accomplished by a set of XSL-stylesheets, that are currently being tested on a sample dataset. Once ready they will be integrated into the CMDI core infrastructure, e.g., the CR.
430 The transformation of profiles and instances into RDF/XML is accomplished by a set of XSL-stylesheets, that are currently being tested on a sample dataset. Once ready, they will be integrated into the CMDI core infrastructure, e.g., the CR.
430 431 %And in the near future, a test on the instances in the complete CLARIN joint metadata domain will be performed.
431 432 Once the linked data is available it has to be stored and published in a RDF triple store. We will elaborate on this aspect further in the final paper.
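
Reading aid for the identifier scheme touched at line 202 of the new revision: the text describes the IRI of an ``inner'' component or element as the nearest stand-alone ancestor's IRI followed by a dot-path. The following is a minimal Python sketch of that concatenation, not the project's XSL implementation. The function names `standalone_iri` and `inner_iri` are hypothetical, and it is an assumption that the `cr:` prefix in the paper's example expands to the Component Registry base URL shown in the footnote.

```python
from typing import Sequence

# Assumption: the "cr:" prefix used in the paper's example stands for this
# Component Registry base URL (taken from the footnote URL in the diff).
CR_BASE = "http://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/components/"


def standalone_iri(component_id: str) -> str:
    """IRI of a stand-alone profile/component: the path into the CR ending in /rdf."""
    return f"{CR_BASE}{component_id}/rdf"


def inner_iri(ancestor_id: str, path: Sequence[str]) -> str:
    """IRI of an inner component/element.

    Concatenates the nearest stand-alone ancestor's IRI with a '#' and the
    dot-joined path of names leading to the component/element.
    """
    if not path:
        raise ValueError("path must name at least one component or element")
    return f"{standalone_iri(ancestor_id)}#{'.'.join(path)}"


if __name__ == "__main__":
    # Mirrors the Actor example quoted in the changed paragraph.
    print(inner_iri("clarin.eu:cr1:c_1271859438197",
                    ["Actor", "Actor_Languages", "Actor_Language"]))
```

Keeping the dot-path as a sequence of names rather than a pre-joined string makes it harder to introduce a stray separator by accident; whether component names may themselves contain dots is not addressed in the text and is not handled here.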