Changeset 3872 for CMDI-Interoperability
Timestamp: 10/24/13 10:48:52 (11 years ago)
File: 1 edited
--- CMDI-Interoperability/CMD2RDF/trunk/docs/papers/2014-LREC/CMD2RDF.tex (revision 3871)
+++ CMDI-Interoperability/CMD2RDF/trunk/docs/papers/2014-LREC/CMD2RDF.tex (revision 3872)
@@ -81,5 +81,5 @@
 \section{Motivation}
 %
-Although semantic interoperability has been one of the main motivations for CLARIN Component Metadata Infrastructure, until now there has been no work on the obvious -- bringing CMDI to Semantic Web. We believe that providing the whole of CMD data as Linked Open Data interlinked with external semantic resources, will open up new dimensions of processing and exploring of the CMD data by employing the power of semantic technologies. In this paper, we lay out how individual parts of the CMD framework can be expressed in RDF and interlinked with existing external semantic resources (ontologies, knowledge bases, vocabularies).
+Although semantic interoperability has been one of the main motivations for CLARIN Component Metadata Infrastructure (CMDI), until now there has been no work on the obvious -- bringing CMDI to Semantic Web. We believe that providing the whole of CMD data as Linked Open Data (LOD) interlinked with external semantic resources, will open up new dimensions of processing and exploring of the CMD data by employing the power of semantic technologies. In this paper, we lay out how individual parts of the CMD framework can be expressed in RDF and interlinked with existing external semantic resources (ontologies, knowledge bases, vocabularies).
 %This conversion lays a foundation / is groundwork for providing the original dataset as a \emph{Linked Open Data} nucleus within the \emph{Web of Data}\cite{TimBL2006} as well as for real semantic (ontology-driven) search and exploration of the data.
 
@@ -88,5 +88,5 @@
 %
 
-The basic building blocks of CMDI are components. Components are used to group elements and attributes, which can take values, and also other components. Components are stored in the Component Registry (CR), where they can be reused by other modellers. Thus a metadata modeller selects or creates components and combines them into a profile targeted at a specific resource type, a collection of resources or a project, tool or service. A profile is a blueprint of a schema for a metadata record. CLARIN centres offer these CMD records describing their resources to the joint metadata domain. There are a number of generic tools which operate on all the CMD records in this domain, e.g., the Virtual Language Observatory. These tools have to deal with the variety of CMD profiles. They can do so by operating on a semantic level, as components, elements and values can all be annotated with links to concepts in various registries. Currently used concept registries are the Dublin Core metadata %elements and
+The basic building blocks of CMDI are components. Components are used to group elements and attributes, which can take values, and also other components (see Figure \ref{fig:CMDM}). Components are stored in the Component Registry (CR), where they can be reused by other modellers. Thus a metadata modeller selects or creates components and combines them into a profile targeted at a specific resource type, a collection of resources or a project, tool or service. A profile is a blueprint of a schema for a metadata record. CLARIN centres offer these CMD records describing their resources to the joint metadata domain. There are a number of generic tools which operate on all the CMD records in this domain, e.g., the Virtual Language Observatory\furl{http://www.clarin.eu/vlo/}. These tools have to deal with the variety of CMD profiles. They can do so by operating on a semantic level, as components, elements and values can all be annotated with links to concepts in various registries. Currently used concept registries are the Dublin Core metadata %elements and
 terms and the ISOcat Data Category Registry. These concept links allow profiles, while being diverse in structure, to share semantics. Generic tools can use this semantic linkage to overcome differences in terminology and also in structure.
 
@@ -122,5 +122,6 @@
 16 of the providers offer CMDI records, the other 53 provide around 140.000 OLAC/DC records, that are converted into the corresponding CMD profile.
 %Next to these 81.226 original OLAC records, there a few providers offering their OLAC or DCMI-terms records already converted into CMDI, thus all in all OLAC, DCMI-terms records amount to 139.152.
-On the other hand, some of the comparatively few providers of `native' CMD records expose multiple profiles (e.g. Meertens Institute uses 12 different ones), so that overall instance data for more than 60 profiles is present.
+%On the other hand, some
+Some of the comparatively few providers of `native' CMD records expose multiple profiles (e.g. Meertens Institute uses 12 different ones), so that overall instance data for more than 60 profiles is present.
 %So we encounter both situations: one profile being used by many providers and one provider using many profiles.
 
@@ -132,5 +133,5 @@
 datasets to link the CMD data with\footnote{\url{http://linguistics.okfn.org/resources/llod/}}.
 Within these \xne{lexvo} seems most promising starting point, as it features URIs like \url{http://lexvo.org/id/term/eng/}, i.e. based on the ISO-639-3 language identifiers which are also used in CMD records.
-\xne{lexvo} also seems suitable as it is already linked with a number of LDL datasets among others \xne{WALS}, \xne{lingvoj},\xne{Glottolog}.
+\xne{lexvo} also seems suitable as it is already linked with a number of other LOD linguistic datasets like \xne{WALS}, \xne{lingvoj} and \xne{Glottolog}.
 Of course, language is just one dimension to use for mapping.
 Step by step we will link other categories like countries, geographica, organisations, etc.
@@ -142,13 +143,13 @@
 In the following a RDF encoding is proposed for all levels of the CMD data domain:
 \begin{itemize}
-\item CMD meta model
-\item profile definitions
-\item the administrative and structural information of CMD records
-\item individual values in the fields of the CMD records
+\item the CMD meta model,
+\item the profile definitions,
+\item the administrative and structural information of CMD records, and
+\item the individual values in the fields of the CMD records.
 \end{itemize}
 
 \subsection{CMD specification}\label{sec:CMDM}
 
-The main entity of the meta model is the CMD component modelled as a \code{rdfs:Class}. A CMD profile is basically a CMD component with some extra features, implying a specialization relation. It may seem natural to translate a CMD element to a RDF property (as it holds the literal value), but given its complexity (e.g. attributes\footnote{Due to space considerations the abstract will not further discuss attributes.}, relation to the containing component) it too has to be a \code{rdfs:Class}. The actual literal value is a property of given element of type \code{cmdm:ElementValue}. For values that can be mapped to entities defined in external semantic resources, the references to these entities are expressed in parallel object properties of type \code{cmdm:hasElementEntity} (outbound links). The containment relation between components and elements is expressed with a dedicated property \code{cmdm:contains}.
+The main entity of the meta model is the CMD component modelled as a \code{rdfs:Class}. A CMD profile is basically a CMD component with some extra features, implying a specialization relation. It may seem natural to translate a CMD element to a RDF property (as it holds the literal value), but given its complexity (e.g., attributes\footnote{Due to space considerations the abstract will not further discuss attributes.}, relation to the containing component) it too has to be a \code{rdfs:Class}. The actual literal value is a property of given element of type \code{cmdm:ElementValue}. For values that can be mapped to entities defined in external semantic resources, the references to these entities are expressed in parallel object properties of type \code{cmdm:hasElementEntity} (outbound links). The containment relation between components and elements is expressed with a dedicated property \code{cmdm:contains}.
 
 \label{table:rdf-spec}
@@ -197,5 +198,5 @@
 
 \noindent
-This top-level are subsequently used for modelling the actual profiles, components and elements as they are defined in the CR.
+This top-level classes and properties are subsequently used for modelling the actual profiles, components and elements as they are defined in the CR.
 For stand-alone components, the IRI is the exact path into the CR to get the RDF representation for the profile/component\furl{http://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/components/clarin.eu:cr1:c\_1271859438125/rdf} . For ``inner'' components (that are defined as part of another component) and elements the identifier is a concatenation of the nearest ancestor stand-alone component's IRI and a dot-path to given component/element (e.g., Actor:\\ \code{cr:clarin.eu:cr1:c\_1271859438197/rdf\#Actor.Actor\_Languages.Actor\_Language}\footnote{For the sake of readability, in the examples we will collapse the component IRIs, refering to them just by their name, prefixed with \code{cmd:}.}.)
 
@@ -272,5 +273,5 @@
 \subsubsection{Provenance}
 
-The information from CMD record \code{cmd:Header} represents the provenance information about the modelled data:
+The information from the CMD record \code{cmd:Header} represents the provenance information about the modelled data.
 
 \begin{example3}
@@ -283,5 +284,5 @@
 \subsubsection{Collection hierarchy} % ( Resource Proxy â IsPartOf)}
 
-In CMD, there are dedicated generic elements -- the \code{cmd:ResourceProxyList} structure -- used to express both the collection hierarchy and to point to resource(s) described by the CMD record. The collection hierarchy can be modelled as \xne{OAI-ORE Aggregation}\furl{http://www.openarchives.org/ore/1.0/primer\#Foundations}(The links to resources are handled by \code{oa:hasTarget}.)
+In CMD, there are dedicated generic elements -- the \code{cmd:ResourceProxyList} structure -- used to express both the collection hierarchy and to point to resource(s) described by the CMD record. The collection hierarchy can be modelled as an \xne{OAI-ORE Aggregation}\furl{http://www.openarchives.org/ore/1.0/primer\#Foundations}. (The links to resources are handled by \code{oa:hasTarget}.)
 :
 
@@ -326,6 +327,6 @@
 
 \subsubsection{Elements, Fields, Values}
-Finally, we want to integrate also the actual field values in the CMD records into the ontology.
-As explained before, CMD elements have to be typed as \code{rdfs:Class}, the actual value expressed as \code{cmds:ElementValue} property and the corresponding data category expressed as annotationproperty.
+Finally, we want to integrate also the actual field values in the CMD records into the linked data.
+As explained before, CMD elements have to be typed as \code{rdfs:Class}, the actual value expressed as \code{cmds:ElementValue}, and they are related bua a \code{cmdm:hasElementValue} property.
 
 While generating triples with literal values seems straightforward, the more challenging but also more valuable aspect is to generate object property triples (predicate \code{cmdm:hasElementEntity}) with the literal values mapped to semantic entities. Following example shows the whole chain of statements from metamodel to literal value and corresponding semantic entity.
@@ -344,4 +345,6 @@
 & rdfs:domain & cmd:Organisation ; \\
 & rdfs:range & cmd:OrganisationElementEntity .\\
+cmd:OrganisationElementEntity \\
+& a & cmdm:Entity . \\
 \\
 \multicolumn{3}{l}{\# person (mentioned in a MD record) has an affiliation (cmd:Person/cmd:Organisation) } \\
@@ -350,7 +353,7 @@
 \_:org & a & cmd:Person.Organisation ; \\
 & \multicolumn{2}{l}{cmd:hasOrganisationElementValue \quad 'MPI'\^{}\^{}xs:string ;} \\
-& \multicolumn{2}{l}{ cmd:hasOrganisationElementEntity \quad <http://mpi.nl> . }\\
-
-<http://mpi.nl> & a & cmd:OrganisationElementEnity .
+& \multicolumn{2}{l}{ cmd:hasOrganisationElementEntity \quad <http://www.mpi.nl/> . }\\
+
+<http://www.mpi.nl/> & a & cmd:OrganisationElementEnity .
 \end{example3}
 
@@ -434,5 +437,5 @@
 In this abstract, we sketched the work on encoding of the whole of the CMD data domain in RDF, with special focus on the core model -- the general component schema. In the full paper we will also elaborate on the task of mapping element values to semantic entities. Additionally, some technical considerations will be discussed regarding exposing this dataset as Linked Open Data.
 
-With this new enhanced dataset, the groundwork is laid for a full-blown \emph{semantic search}, i.e. the possibility of exploring the dataset indirectly using external semantic resources (like vocabularies of organizations or taxonomies of resource types) to which the CMD data will then be linked to.
+With this new enhanced dataset, the groundwork is laid for a full-blown \emph{semantic search}, i.e., the possibility of exploring the dataset indirectly using external semantic resources (like vocabularies of organizations or taxonomies of resource types) to which the CMD data will then be linked.
 
 \bibliographystyle{splncs}
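Reviewer note: the paragraph on stand-alone and inner components in this diff describes the identifier scheme only in words (the stand-alone component's IRI is its Component Registry path; an inner component or element gets the nearest stand-alone ancestor's IRI plus a dot-path). The Python sketch below is one possible reading of that rule and is not part of the changeset. The function names and the `CR_BASE` constant are hypothetical; the base URL is copied from the registry link quoted in the paper, and the `#` separator follows the `Actor` example given there.

```python
from typing import Sequence

# Assumption: base of the Component Registry REST interface, copied from the
# URL quoted in the paper text; not verified against the live service.
CR_BASE = "http://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/components/"


def standalone_iri(component_id: str) -> str:
    """IRI of a stand-alone profile/component: the registry path ending in /rdf."""
    return f"{CR_BASE}{component_id}/rdf"


def inner_iri(ancestor_id: str, dot_path: Sequence[str]) -> str:
    """IRI of an inner component or element.

    Concatenates the nearest stand-alone ancestor's IRI with a dot-separated
    path of names, joined by '#', as in the paper's Actor example.
    """
    if not dot_path:
        raise ValueError("dot_path must name at least one component or element")
    return f"{standalone_iri(ancestor_id)}#{'.'.join(dot_path)}"


if __name__ == "__main__":
    # Hypothetical usage mirroring the example identifier from the paper.
    print(inner_iri("clarin.eu:cr1:c_1271859438197",
                    ["Actor", "Actor_Languages", "Actor_Language"]))
```

Keeping the dot-path as a sequence of names rather than a pre-joined string makes it harder to introduce stray separators when the path is assembled while walking a profile definition.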