Changeset 3847 for CMDI-Interoperability


Timestamp:
10/23/13 08:09:30
Author:
mwindhouwer
Message:

M 2014-LREC/CMD2RDF.tex

  • some more work, but still too long ...
File:
1 edited

  • CMDI-Interoperability/CMD2RDF/trunk/docs/papers/2014-LREC/CMD2RDF.tex

    r3844 r3847  
    112112%
    113113The main added value of LOD\cite{TimBL2006} is the interconnecting of disparate datasets.
    114 In the broader context of LOD, there is meanwhile a subgroup/subcommunity specializing
    115 in linking linguistic data \cite{ldl2012}, that renders an obvious pool of candidate
     114In the broader context of LOD, there is meanwhile the Open Knowledge Foundation’s Working Group on Open Data in Linguistics, which provides an obvious pool of candidate
    116115datasets to link the CMD data with\footnote{\url{http://linguistics.okfn.org/resources/llod/}}.
    117116Within these, \xne{lexvo} seems the most promising starting point, as it features URIs like \url{http://lexvo.org/id/term/eng/}, i.e. for the ISO-639-3 language identifiers which are also used in CMD records.
     
    132131\end{itemize}
    133132
    134 \subsection{CMD specification}
     133\subsection{CMD specification}\label{sec:CMDM}
    135134
    136135The main entity of the meta model is the CMD component, modelled as an \code{rdfs:Class}. A CMD profile is basically a CMD component with some extra features, implying a specialization relation. It may seem natural to translate a CMD element to an RDF property (as it holds the literal value), but given its complexity (e.g. attributes\footnote{Due to space considerations the remainder of the paper will not discuss attributes.}, relation to the containing component) it too has to be an \code{rdfs:Class}. The actual literal value is a property of the given element of type \code{cmdm:ElementValue}. For values that can be mapped to entities defined in external vocabularies or semantic resources, the references to these entities are expressed in parallel properties of type \code{cmdm:hasElementEntity}. The containment relation between components and elements is expressed with a dedicated property \code{cmdm:contains}.
     
    191190\end{example3}
    192191
    193 \subsection{Data Categories}
     192\subsubsection{Data Categories}
    194193One of the semantic registries in use by CMDI for its concept links is ISOcat. In \cite{Windhouwer2012_LDL} it is proposed to link to the data categories via an annotation property.
    195194
     
    305304\end{example3}
    306305
    307 \subsection{Elements, Fields, Values}
     306\subsubsection{Elements, Fields, Values}
    308307Finally, we also want to integrate the actual field values in the CMD records into the ontology.
    309308As explained before, CMD elements have to be typed as \code{rdfs:Class}, the actual value expressed as a \code{cmds:ElementValue} property and the corresponding data category expressed as an annotation property.
     
    404403\section{Implementation}
    405404
    406 The transformation of profiles and instances into RDF/XML is accomplished by a set of XSL-stylesheets, that are currently being tested on a sample dataset. In the near future, a test with the whole CMD dataset will be performed.
    407 
    408 Once the data is available it has to be stored and published in a RDF triple store. The most promising solution seems to be \xne{Virtuoso}, an integrated feature-rich hybrid data store, able to deal with different types of data (``Universal Data Store''). \cite{Haslhofer2011europeana}
     405The transformation of profiles and instances into RDF/XML is accomplished by a set of XSL stylesheets that are currently being tested on a sample dataset. The mappings described for the CMD specification (see section \ref{sec:CMDM}) have to be integrated into the CMDI core infrastructure, e.g., the CR. In the near future, a test on the instances in the complete CLARIN joint metadata domain will be performed.
     406
     407Once the linked data is available it has to be stored and published in an RDF triple store. The most promising solution seems to be \xne{Virtuoso}, an integrated feature-rich hybrid data store, able to deal with different types of data (``Universal Data Store'') \cite{Haslhofer2011europeana}.
    409408
    410409% Although the distributed nature of the data is one of the defining features of LOD and theoretically one should be able to follow the data by dereferenceable URIs, in practice it is mostly necessary, for performance reasons, to pool linked datasets from different sources that shall be queried together into one data store. This implies that the data to be kept by the data store will be considerably larger than ``just'' the original dataset.
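
Reviewer note (not part of the changeset): the meta model described in the "CMD specification" hunk can be read as a handful of RDF triples. The following is only a minimal sketch of that reading. The namespace URI for cmdm and the class names cmdm:Component, cmdm:Profile and cmdm:Element are assumptions made for illustration; the changeset itself names only cmdm:contains, cmdm:ElementValue and cmdm:hasElementEntity. The serializer is a simplified N-Triples writer for URI-only triples, not a full implementation.

```python
# Standard RDF and RDFS namespaces.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
# Placeholder namespace: the real cmdm URI is not given in the changeset.
CMDM = "http://example.org/cmdm#"


def meta_model_triples():
    """Return (subject, predicate, object) URI triples for the sketched meta model.

    Component, Profile and Element are hypothetical local names;
    only cmdm:contains appears in the paper text of this changeset.
    """
    return [
        # The CMD component is modelled as an rdfs:Class.
        (CMDM + "Component", RDF + "type", RDFS + "Class"),
        # A profile is a component with extra features: a specialization.
        (CMDM + "Profile", RDFS + "subClassOf", CMDM + "Component"),
        # Elements are too complex for a plain property, so also a class.
        (CMDM + "Element", RDF + "type", RDFS + "Class"),
        # Containment between components and elements is a dedicated property.
        (CMDM + "contains", RDF + "type", RDF + "Property"),
    ]


def to_ntriples(triples):
    """Serialize URI-only triples, one N-Triples statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


print(to_ntriples(meta_model_triples()))
```

Plain tuples keep the sketch free of dependencies; in practice an RDF library or the XSL stylesheets mentioned in the Implementation section would produce the actual RDF/XML.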