Changeset 4833 for CMDI-Interoperability
- Timestamp: 03/28/14 09:39:11
- Location: CMDI-Interoperability/CMD2RDF/trunk/docs/papers/2014-LREC
- Files: 2 edited
CMDI-Interoperability/CMD2RDF/trunk/docs/papers/2014-LREC/CMD2RDF.bib
r4463 r4833
619 619   }
620 620
    621 + @incollection{Zinn+2012,
    622 + year={2012},
    623 + isbn={978-3-642-30283-1},
    624 + booktitle={The Semantic Web: Research and Applications},
    625 + volume={7295},
    626 + series={Lecture Notes in Computer Science},
    627 + editor={Simperl, Elena and Cimiano, Philipp and Polleres, Axel and Corcho, Oscar and Presutti, Valentina},
    628 + doi={10.1007/978-3-642-30284-8_26},
    629 + title={The ISOcat Registry Reloaded},
    630 + url={http://dx.doi.org/10.1007/978-3-642-30284-8_26},
    631 + publisher={Springer Berlin Heidelberg},
    632 + author={Zinn, Claus and Hoppermann, Christina and Trippel, Thorsten},
    633 + pages={285-299}
    634 + }
    635 +
621 636   @comment{jabref-meta: selector_publisher:}
622 637
CMDI-Interoperability/CMD2RDF/trunk/docs/papers/2014-LREC/CMD2RDF.tex
r4829 r4833
 72  72
 73  73   \abstract{In the European CLARIN infrastructure a growing number of resources are described with Component Metadata. In this paper we
 74     - describe a transformation to make this metadata available as linked data. After this first step it becomes possible to connect the CLARIN Component Metadata with other valuable knowledge sources in the Linked Data Cloud. \\ \newline \Keywords{Linked Open Data, RDF, metadata}}
     74 + describe a transformation to make this metadata available as linked data. After this first step it becomes possible to connect the CLARIN Component Metadata with other valuable knowledge sources in the Linked Data Cloud. \\ \newline \Keywords{Linked Open Data, RDF, component metadata}}
 75  75
 76  76   %
… …
107 107
108 108   \subsubsection{CMD Profiles }
109     - Currently 146 public profiles and 857 components are defined in the CR.
    109 + Currently\footnote{All numbers are as of 2014-03.} 153 public profiles and 859 components are defined in the CR.
110 110   Next to the `native' ones a number of profiles have been created that implement existing metadata formats, like OLAC/DCMI-terms, TEI Header or the META-SHARE schema.
111 111   %The resulting profiles proof the flexibility/expressi\-vi\-ty of the CMD metamodel.
112     - The individual profiles differ also very much in their structure -- next to simple flat profiles
    112 + The individual profiles also differ very much in their structure -- next to simple flat profiles
113 113   %with just one level of components or elements with 5 to 20 fields (\textit{dublincore}, \textit{collection}, the set of \textit{Bamdes}-profiles)
114 114   there are complex ones with up to 10 levels %(\textit{ExperimentProfile}, profiles for describing Web Services)
… …
119 119
120 120   The main CLARIN OAI-PMH harvester\footnote{\url{http://catalog.clarin.eu/oai-harvester/}}
121     - regularly collects records from the -- currently 58 -- providers, all in all over 600.000 records.
122     - Some 20 of the providers offer CMDI records, the rest provides around 140.000 OLAC/DC records, that are converted into the corresponding CMD profile.
    121 + regularly collects records from the -- currently 56 -- providers, all in all over 600.000 records.
    122 + Some 20 of the providers offer CMDI records, the rest provides around 44.000 OLAC/DC records, that are converted into the corresponding CMD profile.
123 123   %Next to these 81.226 original OLAC records, there a few providers offering their OLAC or DCMI-terms records already converted into CMDI, thus all in all OLAC, DCMI-terms records amount to 139.152.
124 124   %On the other hand, some
125     - Some of the comparatively few providers of `native' CMD records expose multiple profiles (e.g. Meertens Institute uses 12 different ones), so that overall instance data for more than 60 profiles is present.
    125 + Some of the providers of `native' CMD records expose multiple profiles (e.g. Meertens Institute uses 12 different ones), so that overall instance data for more than 60 profiles is present.
126 126   %So we encounter both situations: one profile being used by many providers and one provider using many profiles.
127 127
… …
138 138   \subsection{CMD specification}\label{sec:CMDM}
139 139
140     - The main entity of the meta model is the CMD component modelled as a \code{rdfs:Class} (see Figure \ref{fig:CMDM-RDF}). A CMD profile is basically a CMD component with some extra features, implying a specialization relation. It may seem natural to translate a CMD element to a RDF property (as it holds the literal value), but given its complexity (e.g., attributes,\footnote{Although the encoding has been done, due to space considerations, we will not further discuss attributes.} relation to the containing component) it too has to be expressed as \code{rdfs:Class}. The actual literal value is a property of given element of type \code{cmdm:ElementValue}. For values that can be mapped to entities defined in external semantic resources, the references to these entities are expressed in parallel object properties of type \code{cmdm:hasElementEntity} (constituting outbound links). The containment relation between components and elements is expressed with a dedicated property \code{cmdm:contains}.
    140 + The main entity of the meta model is the CMD component modelled as a \code{rdfs:Class} (see Figure \ref{fig:CMDM-RDF}). A CMD profile is basically a CMD component with some extra features, implying a specialization relation. It may seem natural to translate a CMD element to a RDF property (as it holds the literal value), but given its complexity, i.e., attributes,\footnote{Although the modelling work has been done, due to space considerations, we will not further discuss attributes.} it too has to be expressed as a \code{rdfs:Class}. The actual literal value is a property of given element of type \code{cmdm:hasElementValue}. For values that can be mapped to entities defined in external semantic resources, the references to these entities are expressed in parallel object properties of type \code{cmdm:hasElementEntity} (constituting outbound links). The containment relation between components and elements is expressed with a dedicated property \code{cmdm:contains}.
141 141
142 142   \begin{figure*}
… …
192 192   \subsection{CMD profile and component definitions}
193 193   These top-level classes and properties are subsequently used for modelling the actual profiles, components and elements as they are defined in the CR.
194     - For stand-alone components, the IRI is the (future) path into the CR to get the RDF representation for the profile/component.\furl{http://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/components/clarin.eu:cr1:c_1271859438125/rdf} For ``inner'' components (that are defined as part of another component) and elements the identifier is a concatenation of the nearest ancestor stand-alone component's IRI and the dot-path to given component/element (e.g., Actor:\\ \code{cr:clarin.eu:cr1:c\_1271859438197/rdf \#Actor.Actor\_Languages.Actor\_Language}\footnote{For the sake of readability, in the examples we will collapse the component IRIs, refering to them just by their name, prefixed with \code{cmd:}})
    194 + For stand-alone components, the IRI is the (future) path into the CR to get the RDFS representation for the profile/component.\furl{http://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/components/clarin.eu:cr1:c_1271859438125/rdf} For ``inner'' components (that are defined as part of another component) and elements the identifier is a concatenation of the nearest ancestor stand-alone component's IRI and the dot-path to given component/element (e.g., Actor:\\ \code{cr:clarin.eu:cr1:c\_1271859438197/rdf \#Actor.Actor\_Languages.Actor\_Language}\footnote{For the sake of readability, in the examples we will collapse the component IRIs, refering to them just by their name, prefixed with \code{cmd:}})
195 195
196 196   \begin{example2}
… …
204 204
205 205   \subsubsection{Data Categories}
206     - The primary concept registry in use by CMDI for its concept links is ISOcat. The recommended approach to link to the data categories is via an annotation property \cite{Windhouwer2012_LDL}.
    206 + The primary concept registry in use by CMDI is ISOcat. The recommended approach to link to the data categories is via an annotation property \cite{Windhouwer2012_LDL}.
207 207
208 208   \begin{example2}
… …
221 221   \end{example2}
222 222
    223 + Lateron, this information can be used, e.g., in combination with ontological relationships for these data categories available in the RELcat Relation Registry \cite{WINDHOUWER12.954}, to map to other vocabularies.
    224 +
223 225   %\subsection{RELcat - Ontological relations}
224 226   % \commentx{for now we could probably skip all of relcat (although it is the future of semantic mapping ;) - we spare something for the next paper.}
… …
289 291   \begin{example3}
290 292   \textless lr0.cmd \textgreater & a & ore:ResourceMap . \\
291     - \textless lr0.cmd\textgreater & ore:describes & \textless lr0.agg\textgreater . \\
292     - \textless lr0.agg\textgreater & a & ore:Aggregation ; \\
    293 + \textless lr0.cmd\textgreater & ore:describes & \_:agg0 . \\
    294 + \_:agg0 & a & ore:Aggregation ; \\
293 295   & ore:aggregates & \textless lr1.cmd\textgreater, \textless lr2.cmd\textgreater . \\
294 296   \end{example3}
… …
308 310
309 311   \subsubsection{Components -- nested structures}
310     - For expressing the tree structure of the CMD records, i.e. the containment relation between the components a dedicated property \code{cmd:contains} is used:
    312 + For expressing the tree structure of the CMD records, i.e., the containment relation between the components, a dedicated property \code{cmd:contains} is used:
311 313
312 314   \begin{example3}
… …
359 361
360 362   \subsubsection{Elements, Fields, Values}\label{sec:values}
361     - Finally, we want to integrate also the actual field values in the CMD records into the linked data.
362     - As explained before, CMD elements have to be typed as \code{rdfs:Class}, the actual value expressed as \code{cmds:ElementValue}, and they are related by a \code{cmdm:hasElementValue} property.
    363 + Finally, we want to integrate also the actual fvalues in the CMD records into the linked data.
    364 + As explained before, CMD elements have to be typed as \code{rdfs:Class}, the actual value expressed as \code{cmdm:ElementValue}, and they are related by a \code{cmdm:hasElementValue} property.
363 365
364 366   While generating triples with literal values seems straightforward, the more challenging but also more valuable aspect is to generate object property triples (predicate \code{cmdm:hasElementEntity}) with the literal values mapped to semantic entities. The example in Figure \ref{fig:final-example} shows the whole chain of statements from metamodel to literal value and corresponding semantic entity.
… …
442 444   The main added value of LOD \cite{TimBL2006} is the interconnecting of disparate datasets in the so called LOD cloud \cite{Cyganiak2010}.
443 445
444     - The actual mapping process from CMDI values (see Section \ref{sec:values}) to entities is a complex and challenging task. The main idea is to find entities in selected reference datasets (controlled vocabularies, ontologies) corresponding to the literal values in the metadata records. The obtained entity identifiers are further used to generate new RDF triples, representing outbound links.
445     -
446     - In the broader context of LOD Cloud there is the Open Knowledge Foundation's Working Group on Linked Data in Linguistics, that represents an obvious pool of candidate
    446 + The actual mapping process from CMDI values (see Section \ref{sec:values}) to entities is a complex and challenging task. The main idea is to find entities in selected reference datasets (controlled vocabularies, ontologies) corresponding to the literal values in the metadata records. The obtained entity identifiers are further used to generate new RDF triples, representing outbound links. Within CMDI the SKOS-based vocabulary service CLAVAS,\furl{https://openskos.meertens.knaw.nl/} which will be supported in the upcoming new version of CMDI, can be used as a starting point, e.g., for organisations. In the broader context of LOD Cloud there is the Open Knowledge Foundation's Working Group on Linked Data in Linguistics, that represents an obvious pool of candidate
447 447   datasets to link the CMD data with.\furl{http://linguistics.okfn.org/resources/llod/} Within these \xne{lexvo} seems a most promising starting point, as it features URIs like \url{http://lexvo.org/id/term/eng/}, i.e. based on the ISO-639-3 language identifiers which are also used in CMD records.
448 448   \xne{lexvo} also seems suitable as it is already linked with a number of other LOD linguistic datasets like \xne{WALS}, \xne{lingvoj} and \xne{Glottolog}.
… …
452 452   but also to domain-specific semantic resource like the ontology for language technology \xne{LT-World} \cite{Joerg2010} developed at DFKI.
453 453
    454 + Next to entities also predicates can be shared across datasets. The CMD Infrastructure already provides facilities in the form of ISOcat and RELcat. RELcat, for example, has already sets to relate data categories to Dublin Core terms. This can be extended with the ontology for metadata concepts described in \cite{Zinn+2012}, which does not provide common predicates but would allow to do more generic or specific searches.
    455 +
454 456   \section{Conclusions}
455 457   In this paper, we sketched the work on encoding of the whole of the CMD data domain in RDF, with special focus on the core model -- the general component schema. In the future we will extend this with mapping element values to semantic entities.