Changeset 3844


Timestamp:
10/22/13 20:54:45 (11 years ago)
Author:
mwindhouwer
Message:

M 2014-LREC/CMD2RDF.tex

  • changes here and there
File:
1 edited

  • CMDI-Interoperability/CMD2RDF/trunk/docs/papers/2014-LREC/CMD2RDF.tex

    r3843 r3844  
    134134\subsection{CMD specification}
    135135
    136 The main entity of the meta model is the CMD component modelled as A \code{rdfs:Class}. A CMD profile is basically a CMD component with some extra features, implying a specialization relation. It may seem natural to translate a CMD element to a RDF property (as it holds the literal value), but given its complexity (e.g. attributes, relation to the containing component)  it too has to be a \code{rdfs:Class}. The actual literal value is a property of given element of type \code{cmdm:ElementValue}. For values that can be mapped to entities defined in external vocabularies/ semantic resources, the references to these entities are expressed in parallel properties of type \code{cmdm:hasElementEntity}. The attributes are modelled analogously with \code{cmdm:Attribute, cmdm:hasAttributeValue, cmdm:hasAttributeEntity}.
    137 
    138 The containment relation between components and elements is expressed with a dedicated property \code{cmdm:contains}, attributes of individual components and elements are bound with \code{cmdm:containsAttribute}.
     136The main entity of the meta model is the CMD component modelled as a \code{rdfs:Class}. A CMD profile is basically a CMD component with some extra features, implying a specialization relation. It may seem natural to translate a CMD element to an RDF property (as it holds the literal value), but given its complexity (e.g. attributes\footnote{Due to space considerations the remainder of the paper will not discuss attributes.}, relation to the containing component) it too has to be a \code{rdfs:Class}. The actual literal value is a property of the given element of type \code{cmdm:ElementValue}. For values that can be mapped to entities defined in external vocabularies/semantic resources, the references to these entities are expressed in parallel properties of type \code{cmdm:hasElementEntity}. The containment relation between components and elements is expressed with a dedicated property \code{cmdm:contains}.
    139137
    140138\label{table:rdf-spec}
     
    171169              & rdfs:domain & :Element ;  \\
    172170              & rdfs:range & :Entity .   \\
    173  \\
    174 \multicolumn{3}{l}{\# analogue for attributes ...}  \\
     171% \\
     172%\multicolumn{3}{l}{\# analogue for attributes ...}  \\
    175173%cmdm:hasAttributeValue & a & rdf:Property ;  \\
    176174%              & rdfs:domain & cmdm:Attribute ;  \\
     
    183181
    184182\noindent
    185 This entities are used for modelling the actual profiles, components and elements as they are defined in the Component Registry.
     183These entities are used for modelling the actual profiles, components and elements as they are defined in the CR.
    186184For stand-alone/top components, the IRI\furl{http://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/components/clarin.eu:cr1:c\_1271859438125/rdf} is the exact path into the CR to get the RDF representation for the profile/component. For ``inner'' components (that are defined as part of another component) and elements the identifier is a concatenation of the parent top component IRI and dot-path to given component/element (Actor: \code{cr:clarin.eu:cr1:c\_1271859438197/rdf\#Actor.Actor\_Languages.Actor\_Language}).\footnote{For the sake of readability, we will collapse the component IRIs, refer to them just by their name, prefixed with \code{cmd:}.}
    187185
     
    190188 & rdfs:label & "collection"; \\
    191189 & dcterms:identifier & cr:clarin.eu:cr1:p\_1345561703620. \\
    192 cmd:Actor  \\
    193 & a &cmdm:Component. \\
    194 \end{example3}
    195 
    196 \commentx{Menzo: we need more context for inner components. In the example LanguageName looks well defined, but take a Component/Element like Title. Is it the title of a book or the title of a person. Only when the semantics are clear, e.g., with a dcr:datcat, one can ignore the context and collapse all Components/Elements to a single RDF class/property.}
     190cmd:Actor  & a &cmdm:Component. \\
     191\end{example3}
    197192
    198193\subsection{Data Categories}
    199 Windhouwer \cite{Windhouwer2012_LDL} proposes to use the data categories as annotation properties
    200 so as to avoid too strong semantic implications.
     194One of the semantic registries in use by CMDI for its concept links is ISOcat. Windhouwer \cite{Windhouwer2012_LDL} proposes to link to the data categories via an annotation property.
    201195
    202196\begin{example3}
     
    207201\end{example3}
    208202
    209 That implies that the \code{@ConceptLink} attribute on CMD elements and components as used in the CMD profiles to reference the data category would be modelled as:
     203That implies that the \code{@ConceptLink} attribute on CMD elements and components as used in the CMD profiles to reference the data category can be modelled as:
    210204
    211205\begin{example3}
    212206cmd:LanguageName & dcr:datcat & isocat:DC-2484. \\
    213207\end{example3}
    214 
    215208
    216209%\subsection{RELcat - Ontological relations}
    217210% \commentx{for now we could probably skip all of relcat (although it is the future of semantic mapping ;) - we spare something for the next paper.}
    218211
    219 As described in \ref{CMDI}, relations between data categories are not stored directly in the \xne{ISOcat} DCR, but rather in a dedicated module the Relation Registry \xne{RELcat} directly as RDF triples \cite{SchuurmanWindhouwer2011} with dedicated predicates (like \code{rel:*}). In the final paper, we will provide more details on the role of this important building block in the endeavour.
     212Relations between data categories are not stored directly in the \xne{ISOcat} DCR, but rather in the dedicated Relation Registry \xne{RELcat} as RDF triples \cite{WINDHOUWER12.954} with dedicated predicates based on an extensible taxonomy of relation types. In the final paper, we will provide more details on the role of this important building block in the endeavour.
    220213
    221214\begin{comment}
     
    238231In the next step, we want to express the individual CMD instances, the metadata records.
    239232
    240 We provide a generic top level class for all resources (including metadata records), the \code{cmdm:Resource},
    241 with \code{cmdm:hasResourceType} and  \code{cmdm:hasMimeType} predicates to type the resources.
     233We provide a generic top level class for all resources (including metadata records), the \code{cmdm:Resource} class and the  \code{cmdm:hasMimeType} predicate to type the resources.
    242234
    243235\begin{example3}
    244236<lr1> & a  & cmdm:Resource; \\
    245 & cmdm:hasResourceType & cmdm:Resource. \\
     237& cmdm:hasMimeType & "audio/wav". \\
    246238\end{example3}
    247239
    248240\subsubsection {Resource Identifier}
    249241
     242\begin{comment}
    250243It seems natural to use the PID of a Language Resource ( \code{<lr1>} ) as the resource identifier for the subject in the RDF representation. While this seems semantically sound, not every resource has to have a PID. (This is especially the case for ``virtual'' resources like collections, that are solely defined by their constituents and don't have any data on their own.) As a fall-back the PID of the MD record ( \code{<lr1.cmd>}  from \code{cmd:MdSelfLink} element) could be used as the resource identifier.
    251 If identifiers are present for both resource and metadata, the relationship between the resource and the metadata record can be expressed as an annotation using the \xne{OpenAnnotation} vocabulary\furl{http://openannotation.org/spec/core/core.html\#Motivations}.
    252 (Note, that one MD record can describe multiple resources. This can be also easily accommodated in OpenAnnotation).
     244If identifiers are present for both resource and metadata, \end{comment}
     245The relationship between the resource and the metadata record can be expressed as an annotation using the \xne{OpenAnnotation} vocabulary\furl{http://openannotation.org/spec/core/core.html\#Motivations}.
     246(Note that one MD record can describe multiple resources. This can also be easily accommodated in OpenAnnotation.)
    253247
    254248\begin{example3}
     
    261255\subsubsection{Provenance}
    262256
    263 The information from \code{cmd:Header} represents the provenance information about the modelled data:
     257The information from CMD record \code{cmd:Header} represents the provenance information about the modelled data:
    264258
    265259\begin{example3}
     
    271265
    272266\subsubsection{Hierarchy ( Resource Proxy – IsPartOf)}
    273 In CMD, the \code{cmd:ResourceProxyList} structure is used to express both collection hierarchy and point to resource(s) described by the MD record. This can be modelled as \xne{OAI-ORE Aggregation}\furl{http://www.openarchives.org/ore/1.0/primer\#Foundations}
     267In CMD, the \code{cmd:ResourceProxyList} structure is used to express both the collection hierarchy and point to resource(s) described by the CMD record. This can be modelled as \xne{OAI-ORE Aggregation}\furl{http://www.openarchives.org/ore/1.0/primer\#Foundations}
    274268:
    275269
     
    280274& ore:aggregates  & <lr1.cmd>, <lr2.cmd> . \\
    281275\end{example3}
    282 
    283 \commentx{Matej: Should both collection hierarchy and resource-pointers (collection and resource MD records) be encoded as ore:Aggregation?}
    284276
    285277\begin{comment}
     
    320312
    321313The actual mapping process from values to entities is a complex challenging task and will be tackled in more detail in the full paper. The main idea is to find entities in selected reference datasets (controlled vocabularies, ontologies) corresponding to the literal values in the metadata records. The obtained entity identifiers are further used to generate new RDF triples, representing outbound links.
    322 
    323314
    324315\begin{example3}
     
    344335\end{example3}
    345336
    346 
    347337\begin{comment}
    348338%%%%%%%%%%%%%%%%%
     
    420410% Although the distributed nature of the data is one of the defining features of LOD and theoretically one should be able to follow the data by dereferencable URIs, in practice it is mostly necessary to pool into one data store linked datasets from different sources that shall be queried together due to performance reasons. This implies that the data to be kept by the data store will be decisively larger, than ``just'' the original dataset.
    421411
    422 
    423412\section{Conclusions and Future Work}
    424413In this abstract, we sketched the work on encoding of the whole of the CMD data domain in RDF, with special focus on the core model -- the general component schema. In the full paper we will also elaborate on the task of mapping element values to semantic entities. Additionally, some technical considerations will be discussed regarding exposing this dataset as Linked Open Data.
     
    426415With this new enhanced dataset, the groundwork is laid for a full-blown \emph{semantic search}, i.e. the possibility of exploring the dataset indirectly using external semantic resources (like vocabularies of organizations or taxonomies of resource types) to which the CMD data will then be linked to.
    427416
    428 
    429417\bibliographystyle{splncs}
    430418\bibliography{CMD2RDF}