Changeset 4829 for CMDI-Interoperability
- Timestamp: 03/26/14 08:27:31
- File: 1 edited
CMDI-Interoperability/CMD2RDF/trunk/docs/papers/2014-LREC/CMD2RDF.tex
r4463 r4829
71 71 matej.durco@oeaw.ac.at, menzo.windhouwer@dans.knaw.nl\\}
72 72
- 73 \abstract{In the european CLARIN infrastructure a growing number of resources are described with Component Metadata. In this paper we
+ 73 \abstract{In the European CLARIN infrastructure a growing number of resources are described with Component Metadata. In this paper we
74 74 describe a transformation to make this metadata available as linked data. After this first step it becomes possible to connect the CLARIN Component Metadata with other valuable knowledge sources in the Linked Data Cloud. \\ \newline \Keywords{Linked Open Data, RDF, metadata}}
75 75
… …
81 81 \section{Motivation}
82 82 %
- 83 Although semantic interoperability has been one of the main motivations for CLARIN's Component Metadata Infrastructure (CMDI) \cite{Broeder+2010} \furl{http://www.clarin.eu/cmdi/},until now there has been no work on the obvious -- bringing CMDI to the Semantic Web. We believe that providing the CLARIN CMD records as Linked Open Data (LOD) interlinked with external semantic resources, will open up new dimensions of processing and exploring of the CMD data by employing the power of semantic technologies. In this paper, we lay out how individual parts of the CMD data domain can be expressed in RDF and made ready to be interlinked with existing external semantic resources (ontologies, taxonomies, knowledge bases, vocabularies).
+ 83 Although semantic interoperability has been one of the main motivations for CLARIN's Component Metadata Infrastructure (CMDI) \cite{Broeder+2010},\furl{http://www.clarin.eu/cmdi/} until now there has been no work on the obvious -- bringing CMDI to the Semantic Web. We believe that providing the CLARIN CMD records as Linked Open Data (LOD) interlinked with external semantic resources, will open up new dimensions of processing and exploring of the CMD data by employing the power of semantic technologies. In this paper, we lay out how individual parts of the CMD data domain can be expressed in RDF and made ready to be interlinked with existing external semantic resources (ontologies, taxonomies, knowledge bases, vocabularies).
84 84 %This conversion lays a foundation / is groundwork for providing the original dataset as a \emph{Linked Open Data} nucleus within the \emph{Web of Data}\cite{TimBL2006} as well as for real semantic (ontology-driven) search and exploration of the data.
85 85
… …
88 88 %
89 89
- 90 The basic building blocks of CMDI are components. Components are used to group elements and attributes, which can take values, and also other components (see Figure \ref{fig:CMDM}). Components are stored in the Component Registry (CR), where they can be reused by other modellers. Thus a metadata modeller selects or creates components and combines them into a profile targeted at a specific resource type, a collection of resources or a project, tool or service. A profile serves as blueprint for a schema for metadata records. CLARIN centres offer these CMD records describing their resources to the joint metadata domain. There are a number of generic tools which operate on all the CMD records in this domain, e.g., the Virtual Language Observatory \furl{http://www.clarin.eu/vlo/}.These tools have to deal with the variety of CMD profiles. They can do so by operating on a semantic level, as components, elements and values can all be annotated with links to concepts in various registries. Currently used concept registries are the Dublin Core metadata %elements and
+ 90 The basic building blocks of CMDI are components. Components are used to group elements and attributes, which can take values, and also other components (see Figure \ref{fig:CMDM}). Components are stored in the Component Registry (CR), where they can be reused by other modellers. Thus a metadata modeller selects or creates components and combines them into a profile targeted at a specific resource type, a collection of resources or a project, tool or service. A profile serves as blueprint for a schema for metadata records. CLARIN centres offer these CMD records describing their resources to the joint metadata domain. There are a number of generic tools which operate on all the CMD records in this domain, e.g., the Virtual Language Observatory.\furl{http://www.clarin.eu/vlo/} These tools have to deal with the variety of CMD profiles. They can do so by operating on a semantic level, as components, elements and values can all be annotated with links to concepts in various registries. Currently used concept registries are the Dublin Core metadata %elements and
91 91 terms and the ISOcat Data Category Registry. These concept links allow profiles, while being diverse in structure, to share semantics. Generic tools can use this semantic linkage to overcome differences in terminology and also in structure.
92 92
… …
130 130 In the following a RDF encoding is proposed for all levels of the CMD data domain:
131 131 \begin{itemize}
- 132 \item CMD meta model ,
+ 132 \item CMD meta model (see Figure \ref{fig:CMDM}),
133 133 \item profile and component definitions,
134 134 \item administrative and structural information of CMD records and
… …
138 138 \subsection{CMD specification}\label{sec:CMDM}
139 139
- 140 The main entity of the meta model is the CMD component modelled as a \code{rdfs:Class} . A CMD profile is basically a CMD component with some extra features, implying a specialization relation. It may seem natural to translate a CMD element to a RDF property (as it holds the literal value), but given its complexity (e.g., attributes\footnote{Due to space considerations we will not further discuss attributes.},relation to the containing component) it too has to be expressed as \code{rdfs:Class}. The actual literal value is a property of given element of type \code{cmdm:ElementValue}. For values that can be mapped to entities defined in external semantic resources, the references to these entities are expressed in parallel object properties of type \code{cmdm:hasElementEntity} (constituting outbound links). The containment relation between components and elements is expressed with a dedicated property \code{cmdm:contains}.
+ 140 The main entity of the meta model is the CMD component modelled as a \code{rdfs:Class} (see Figure \ref{fig:CMDM-RDF}). A CMD profile is basically a CMD component with some extra features, implying a specialization relation. It may seem natural to translate a CMD element to a RDF property (as it holds the literal value), but given its complexity (e.g., attributes,\footnote{Although the encoding has been done, due to space considerations, we will not further discuss attributes.} relation to the containing component) it too has to be expressed as \code{rdfs:Class}. The actual literal value is a property of given element of type \code{cmdm:ElementValue}. For values that can be mapped to entities defined in external semantic resources, the references to these entities are expressed in parallel object properties of type \code{cmdm:hasElementEntity} (constituting outbound links). The containment relation between components and elements is expressed with a dedicated property \code{cmdm:contains}.
141 141
142 142 \begin{figure*}
… …
187 187 \end{center}
188 188 \caption{The CMD meta model in RDF}
- 189 \label{fig: final-example}
+ 189 \label{fig:CMDM-RDF}
190 190 \end{figure*}
191 191
192 192 \subsection{CMD profile and component definitions}
- 193 Th istop-level classes and properties are subsequently used for modelling the actual profiles, components and elements as they are defined in the CR.
- 194 For stand-alone components, the IRI is the (future) path into the CR to get the RDF representation for the profile/component \furl{http://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/components/clarin.eu:cr1:c\_1271859438125/rdf}.For ``inner'' components (that are defined as part of another component) and elements the identifier is a concatenation of the nearest ancestor stand-alone component's IRI and the dot-path to given component/element (e.g., Actor:\\ \code{cr:clarin.eu:cr1:c\_1271859438197/rdf \#Actor.Actor\_Languages.Actor\_Language}\footnote{For the sake of readability, in the examples we will collapse the component IRIs, refering to them just by their name, prefixed with \code{cmd:}})
+ 193 These top-level classes and properties are subsequently used for modelling the actual profiles, components and elements as they are defined in the CR.
+ 194 For stand-alone components, the IRI is the (future) path into the CR to get the RDF representation for the profile/component.\furl{http://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/components/clarin.eu:cr1:c_1271859438125/rdf} For ``inner'' components (that are defined as part of another component) and elements the identifier is a concatenation of the nearest ancestor stand-alone component's IRI and the dot-path to given component/element (e.g., Actor:\\ \code{cr:clarin.eu:cr1:c\_1271859438197/rdf \#Actor.Actor\_Languages.Actor\_Language}\footnote{For the sake of readability, in the examples we will collapse the component IRIs, refering to them just by their name, prefixed with \code{cmd:}})
195 195
196 196 \begin{example2}
… …
259 259 If identifiers are present for both resource and metadata, \end{comment}
260 260 The PID of a Language Resource ( \code{<lr1>} ) is used as the IRI for the described resource in the RDF representation.
- 261 The relationship between the resource and the metadata record can be expressed as an annotation using the \xne{OpenAnnotation} vocabulary \furl{http://openannotation.org/spec/core/core.html}.
+ 261 The relationship between the resource and the metadata record can be expressed as an annotation using the \xne{OpenAnnotation} vocabulary.\furl{http://openannotation.org/spec/core/core.html}
262 262 (Note, that one MD record can describe multiple resources. This can be also easily accommodated in OpenAnnotation.)
263 263
… …
284 284 \subsubsection{Collection hierarchy} % ( Resource Proxy - IsPartOf)}
285 285
- 286 In CMD, there are dedicated generic elements -- the \code{cmd:ResourceProxyList} structure -- used to express both the collection hierarchy and to point to resource(s) described by the CMD record. The collection hierarchy can be modelled as an \xne{OAI-ORE Aggregation} \furl{http://www.openarchives.org/ore/1.0/primer#Foundations}.(The links to resources are handled by \code{oa:hasTarget}.)
+ 286 In CMD, there are dedicated generic elements -- the \code{cmd:ResourceProxyList} structure -- used to express both the collection hierarchy and to point to resource(s) described by the CMD record. The collection hierarchy can be modelled as an \xne{OAI-ORE Aggregation}.\furl{http://www.openarchives.org/ore/1.0/primer#Foundations} (The links to resources are handled by \code{oa:hasTarget}.)
287 287 :
288 288
… …
333 333 cmd:Person.Organisation & a & cmdm:Element . \\
334 334 cmd:hasPerson.OrganisationElementValue \\
- 335 & rdfs:subProper yOf & cmdm:hasElementValue ; \\
+ 335 & rdfs:subPropertyOf & cmdm:hasElementValue ; \\
336 336 & rdfs:domain & cmd:Person.Organisation ; \\
337 337 & rdfs:range & xs:string . \\
338 338 cmd:hasPerson.OrganisationElementEntity \\
- 339 & rdfs:subProper yOf & cmdm:hasElementEntity ; \\
+ 339 & rdfs:subPropertyOf & cmdm:hasElementEntity ; \\
340 340 & rdfs:domain & cmd:Person.Organisation ; \\
341 341 & rdfs:range & cmd:Person.OrganisationElementEntity .\\
… …
350 350 & \multicolumn{2}{l}{ cmd:hasPerson.OrganisationElementEntity \quad \textless http://www.mpi.nl/\textgreater . }\\
351 351
- 352 \textless http://www.mpi.nl/\textgreater & a & cmd:OrganisationElementEn ity .
+ 352 \textless http://www.mpi.nl/\textgreater & a & cmd:OrganisationElementEntity .
353 353 \end{example3}
354 354 \end{center}
… …
445 445
446 446 In the broader context of LOD Cloud there is the Open Knowledge Foundation's Working Group on Linked Data in Linguistics, that represents an obvious pool of candidate
- 447 datasets to link the CMD data with \footnote{\url{http://linguistics.okfn.org/resources/llod/}}.Within these \xne{lexvo} seems a most promising starting point, as it features URIs like \url{http://lexvo.org/id/term/eng/}, i.e. based on the ISO-639-3 language identifiers which are also used in CMD records.
+ 447 datasets to link the CMD data with.\furl{http://linguistics.okfn.org/resources/llod/} Within these \xne{lexvo} seems a most promising starting point, as it features URIs like \url{http://lexvo.org/id/term/eng/}, i.e. based on the ISO-639-3 language identifiers which are also used in CMD records.
448 448 \xne{lexvo} also seems suitable as it is already linked with a number of other LOD linguistic datasets like \xne{WALS}, \xne{lingvoj} and \xne{Glottolog}.
449 449 Of course, language is just one dimension to use for mapping.