
Changeset 3844

Timestamp:

10/22/13 20:54:45 (11 years ago)

Author:

mwindhouwer

Message:

M 2014-LREC/CMD2RDF.tex

changes here and there

File:

1 edited

CMDI-Interoperability/CMD2RDF/trunk/docs/papers/2014-LREC/CMD2RDF.tex (modified) (13 diffs)

Legend:

Unmodified
Added
Removed

CMDI-Interoperability/CMD2RDF/trunk/docs/papers/2014-LREC/CMD2RDF.tex

-                      r3843
+                      r3844
 \subsection{CMD specification}
+The main entity of the meta model is the CMD component, modelled as a \code{rdfs:Class}. A CMD profile is basically a CMD component with some extra features, implying a specialization relation. It may seem natural to translate a CMD element to an RDF property (as it holds the literal value), but given its complexity (e.g. attributes, relation to the containing component) it too has to be a \code{rdfs:Class}. The actual literal value is a property of the given element of type \code{cmdm:ElementValue}. For values that can be mapped to entities defined in external vocabularies/semantic resources, the references to these entities are expressed in parallel properties of type \code{cmdm:hasElementEntity}. The attributes are modelled analogously with \code{cmdm:Attribute, cmdm:hasAttributeValue, cmdm:hasAttributeEntity}.
+The containment relation between components and elements is expressed with a dedicated property \code{cmdm:contains}; attributes of individual components and elements are bound with \code{cmdm:containsAttribute}.
+The main entity of the meta model is the CMD component, modelled as a \code{rdfs:Class}. A CMD profile is basically a CMD component with some extra features, implying a specialization relation. It may seem natural to translate a CMD element to an RDF property (as it holds the literal value), but given its complexity (e.g. attributes\footnote{Due to space considerations the remainder of the paper will not discuss attributes.}, relation to the containing component) it too has to be a \code{rdfs:Class}. The actual literal value is a property of the given element of type \code{cmdm:ElementValue}. For values that can be mapped to entities defined in external vocabularies/semantic resources, the references to these entities are expressed in parallel properties of type \code{cmdm:hasElementEntity}. The containment relation between components and elements is expressed with a dedicated property \code{cmdm:contains}.
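The meta-model described in the paragraph above can be sketched as plain (subject, predicate, object) triples. This is a minimal illustration, not the project's implementation: the cmdm namespace IRI is a placeholder, and the names `cmdm:Profile` and `cmdm:hasElementValue` are hypothetical (the text only names `cmdm:Component`, `cmdm:Element`, `cmdm:hasElementEntity`, `cmdm:contains`, and the range `:Entity` appears in the table excerpt).

```python
# Sketch of the core CMD meta-model as plain (subject, predicate, object) triples.
# CMDM_NS is a placeholder: the actual cmdm namespace IRI is not given in the text.
# cmdm:Profile and cmdm:hasElementValue are hypothetical names.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
CMDM_NS = "http://example.org/cmdm#"


def cmdm(name: str) -> str:
    """Expand a local name in the (placeholder) cmdm namespace."""
    return CMDM_NS + name


def meta_model() -> set:
    """Return the core classes and properties of the meta-model as a set of triples."""
    triples = set()
    # Components, profiles, elements (and linked entities) are all classes.
    for cls in ("Component", "Profile", "Element", "Entity"):
        triples.add((cmdm(cls), RDF + "type", RDFS + "Class"))
    # A profile is a component with extra features: a specialization relation.
    triples.add((cmdm("Profile"), RDFS + "subClassOf", cmdm("Component")))
    # Properties as (local name, domain, range); None means no declared range.
    properties = [
        ("hasElementValue", cmdm("Element"), RDFS + "Literal"),  # literal value
        ("hasElementEntity", cmdm("Element"), cmdm("Entity")),   # external entity
        ("contains", cmdm("Component"), None),  # target: component or element
    ]
    for name, domain, range_ in properties:
        prop = cmdm(name)
        triples.add((prop, RDF + "type", RDF + "Property"))
        triples.add((prop, RDFS + "domain", domain))
        if range_ is not None:
            triples.add((prop, RDFS + "range", range_))
    return triples


def to_ntriples(triples) -> str:
    """Serialize IRI-only triples as sorted N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(triples))


if __name__ == "__main__":
    print(to_ntriples(meta_model()))
```

Leaving the range of `cmdm:contains` undeclared reflects that it may point either to a component or to an element, which have no common superclass in this sketch.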
 \label{table:rdf-spec}
 …
               & rdfs:domain & :Element ;  \\
               & rdfs:range & :Entity .   \\
  \\
 \multicolumn{3}{l}{\# analogue for attributes ...}  \\
+% \\
+%\multicolumn{3}{l}{\# analogue for attributes ...}  \\
 %cmdm:hasAttributeValue & a & rdf:Property ;  \\
 %              & rdfs:domain & cmdm:Attribute ;  \\
 …
 \noindent
 These entities are used for modelling the actual profiles, components and elements as they are defined in the Component Registry.
+These entities are used for modelling the actual profiles, components and elements as they are defined in the CR.
 For stand-alone/top components, the IRI\furl{http://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/components/clarin.eu:cr1:c\_1271859438125/rdf} is the exact path into the CR to get the RDF representation of the profile/component. For ``inner'' components (those defined as part of another component) and for elements, the identifier is the concatenation of the IRI of the enclosing top component and the dot-path to the given component/element (e.g. within the \code{Actor} component: \code{cr:clarin.eu:cr1:c\_1271859438197/rdf\#Actor.Actor\_Languages.Actor\_Language}).\footnote{For the sake of readability, we will collapse the component IRIs and refer to them just by their name, prefixed with \code{cmd:}.}
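The identifier scheme for top and inner components can be sketched as follows, assuming the `cr:` prefix expands to the Component Registry path shown in the footnote URL. The nested-dictionary tree is a hypothetical in-memory form of a component specification, not a Component Registry API.

```python
# IRI scheme for CMD components and elements, following the Component Registry
# paths given in the text. The nested {name: children} tree is a hypothetical
# in-memory representation, not a Component Registry API.
CR_BASE = "http://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/components/"


def top_iri(component_id: str) -> str:
    """IRI of a stand-alone/top component: the CR path to its RDF representation."""
    return f"{CR_BASE}{component_id}/rdf"


def inner_iri(top_component_id: str, path: list) -> str:
    """IRI of an inner component or element: top IRI, '#', then the dot-path."""
    return f"{top_iri(top_component_id)}#{'.'.join(path)}"


def all_iris(top_component_id: str, tree: dict) -> dict:
    """Map every dot-path in a nested {name: children} tree to its IRI."""
    result = {}

    def walk(node: dict, path: list) -> None:
        for name, children in node.items():
            current = path + [name]
            result[".".join(current)] = inner_iri(top_component_id, current)
            walk(children, current)  # descend into nested components/elements

    walk(tree, [])
    return result


if __name__ == "__main__":
    tree = {"Actor": {"Actor_Languages": {"Actor_Language": {}}}}
    for dot_path, iri in all_iris("clarin.eu:cr1:c_1271859438197", tree).items():
        print(dot_path, iri)
```

Keeping the full dot-path in the fragment preserves the context of an inner component or element, which the shorter \code{cmd:} notation drops.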
 …
  & rdfs:label & "collection"; \\
  & dcterms:identifier & cr:clarin.eu:cr1:p\_1345561703620. \\
+cmd:Actor  \\
+& a &cmdm:Component. \\
+\end{example3}
+\commentx{Menzo: we need more context for inner components. In the example, LanguageName looks well defined, but take a Component/Element like Title. Is it the title of a book or the title of a person? Only when the semantics are clear, e.g., with a dcr:datcat, can one ignore the context and collapse all Components/Elements to a single RDF class/property.}
+cmd:Actor  & a &cmdm:Component. \\
+\end{example3}
 \subsection{Data Categories}
+Windhouwer \cite{Windhouwer2012_LDL} proposes to use the data categories as annotation properties
+so as to avoid overly strong semantic implications.
+One of the semantic registries in use by CMDI for its concept links is ISOcat. Windhouwer \cite{Windhouwer2012_LDL} proposes to link to the data categories via an annotation property.
 \begin{example3}
 …
 \end{example3}
 That implies that the \code{@ConceptLink} attribute on CMD elements and components as used in the CMD profiles to reference the data category would be modelled as:
+That implies that the \code{@ConceptLink} attribute on CMD elements and components as used in the CMD profiles to reference the data category can be modelled as:
 \begin{example3}
 cmd:LanguageName & dcr:datcat & isocat:DC-2484. \\
 \end{example3}
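Deriving these triples from a component specification can be sketched by scanning for \code{@ConceptLink} attributes. This is a minimal illustration only: the XML fragment is a simplified, namespace-free stand-in for a CMD component specification, and the \code{isocat:} prefix mapping is an assumption; the \code{cmd:} and \code{dcr:} prefixes are assumed to be declared elsewhere.

```python
import xml.etree.ElementTree as ET

# Assumed prefix mapping; the paper does not spell out the namespace IRIs.
PREFIXES = {
    "isocat": "http://www.isocat.org/datcat/",
}

# Simplified, hypothetical component specification (no XML namespaces).
SPEC = """<CMD_ComponentSpec isProfile="false">
  <CMD_Component name="Language">
    <CMD_Element name="LanguageName" ConceptLink="http://www.isocat.org/datcat/DC-2484"/>
    <CMD_Element name="Note"/>
  </CMD_Component>
</CMD_ComponentSpec>"""


def qname(iri: str) -> str:
    """Abbreviate an IRI with a known prefix, else emit it in angle brackets."""
    for prefix, ns in PREFIXES.items():
        if iri.startswith(ns):
            return f"{prefix}:{iri[len(ns):]}"
    return f"<{iri}>"


def concept_link_triples(spec_xml: str) -> list:
    """Emit one dcr:datcat triple per component/element carrying a ConceptLink."""
    root = ET.fromstring(spec_xml)
    lines = []
    for node in root.iter():  # iter() also visits the root itself
        link = node.get("ConceptLink")
        if node.tag in ("CMD_Component", "CMD_Element") and link:
            # Uses the collapsed cmd:<name> notation from the paper's footnote.
            lines.append(f"cmd:{node.get('name')} dcr:datcat {qname(link)} .")
    return lines


if __name__ == "__main__":
    print("\n".join(concept_link_triples(SPEC)))
```

Elements without a concept link (here \code{Note}) produce no triple, since there is no data category to annotate them with.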
 %\subsection{RELcat - Ontological relations}
 % \commentx{for now we could probably skip all of relcat (although it is the future of semantic mapping ;) - we spare something for the next paper.}
 As described in \ref{CMDI}, relations between data categories are not stored directly in the \xne{ISOcat} DCR, but rather in a dedicated module, the Relation Registry \xne{RELcat}, as RDF triples \cite{SchuurmanWindhouwer2011} with dedicated predicates (like \code{rel:*}). In the final paper, we will provide more details on the role of this important building block in the endeavour.
+Relations between data categories are not stored directly in the \xne{ISOcat} DCR, but rather in the dedicated Relation Registry \xne{RELcat} as RDF triples \cite{WINDHOUWER12.954} with dedicated predicates based on an extensible taxonomy of relation types. In the final paper, we will provide more details on the role of this important building block in the endeavour.
 \begin{comment}
 …
 In the next step, we want to express the individual CMD instances, the metadata records.
+We provide a generic top-level class for all resources (including metadata records), the \code{cmdm:Resource},
+with \code{cmdm:hasResourceType} and \code{cmdm:hasMimeType} predicates to type the resources.
+We provide a generic top-level class for all resources (including metadata records), the \code{cmdm:Resource} class, and the \code{cmdm:hasMimeType} predicate to type the resources.
 \begin{example3}
 <lr1> & a  & cmdm:Resource; \\
 & cmdm:hasResourceType & cmdm:Resource; \\
+& cmdm:hasMimeType & "audio/wav". \\
 \end{example3}
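In Turtle, the predicate-object pairs describing one subject are separated by \code{;} and the statement is closed by a single \code{.}; a small generator makes this explicit. This is a hedged sketch: only \code{cmdm:Resource}, \code{cmdm:hasResourceType} and \code{cmdm:hasMimeType} come from the text, the function name is hypothetical, and the \code{cmdm:} prefix is assumed to be declared elsewhere.

```python
import json
from typing import Optional


def resource_turtle(subject: str,
                    resource_type: Optional[str] = None,
                    mime_type: Optional[str] = None) -> str:
    """Describe one resource in Turtle: ';' separates predicates, '.' ends it."""
    pairs = [("a", "cmdm:Resource")]
    if resource_type is not None:
        pairs.append(("cmdm:hasResourceType", resource_type))
    if mime_type is not None:
        # json.dumps yields a double-quoted string whose escapes Turtle also accepts.
        pairs.append(("cmdm:hasMimeType", json.dumps(mime_type)))
    body = " ;\n    ".join(f"{p} {o}" for p, o in pairs)
    return f"{subject} {body} ."


if __name__ == "__main__":
    print(resource_turtle("<lr1>", mime_type="audio/wav"))
```

Generating the separators rather than writing them by hand avoids the common slip of closing the statement with \code{.} before the last predicate.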
 \subsubsection {Resource Identifier}
+\begin{comment}
 It seems natural to use the PID of a Language Resource (\code{<lr1>}) as the resource identifier for the subject in the RDF representation. While this seems semantically sound, not every resource has to have a PID. (This is especially the case for ``virtual'' resources like collections, which are solely defined by their constituents and do not have any data of their own.) As a fall-back, the PID of the MD record (\code{<lr1.cmd>} from the \code{cmd:MdSelfLink} element) could be used as the resource identifier.
+If identifiers are present for both resource and metadata, the relationship between the resource and the metadata record can be expressed as an annotation using the \xne{OpenAnnotation} vocabulary\furl{http://openannotation.org/spec/core/core.html\#Motivations}.
+(Note that one MD record can describe multiple resources; this can also easily be accommodated in OpenAnnotation.)
+If identifiers are present for both resource and metadata, \end{comment}
+The relationship between the resource and the metadata record can be expressed as an annotation using the \xne{OpenAnnotation} vocabulary\furl{http://openannotation.org/spec/core/core.html\#Motivations}.
+(Note that one MD record can describe multiple resources; this can also easily be accommodated in OpenAnnotation.)
 \begin{example3}
 …
 \subsubsection{Provenance}
 The information from \code{cmd:Header} represents the provenance information about the modelled data:
+The information from the \code{cmd:Header} of a CMD record represents the provenance information about the modelled data:
 \begin{example3}
 …
 \subsubsection{Hierarchy (Resource Proxy $\rightarrow$ IsPartOf)}
 In CMD, the \code{cmd:ResourceProxyList} structure is used both to express the collection hierarchy and to point to the resource(s) described by the MD record. This can be modelled as \xne{OAI-ORE Aggregation}\furl{http://www.openarchives.org/ore/1.0/primer\#Foundations}
+In CMD, the \code{cmd:ResourceProxyList} structure is used both to express the collection hierarchy and to point to the resource(s) described by the CMD record. This can be modelled as \xne{OAI-ORE Aggregation}\furl{http://www.openarchives.org/ore/1.0/primer\#Foundations}
+:
 …
 & ore:aggregates  & <lr1.cmd>, <lr2.cmd> . \\
 \end{example3}
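Turning a \code{cmd:ResourceProxyList} into such an aggregation can be sketched as below. This is an illustration under stated assumptions: the record is a simplified fragment without the CMDI XML namespace (real records are namespaced, so the tag lookups would need the namespace prefix), the aggregation subject is supplied by the caller because it is not fixed by the text, and every proxy is aggregated regardless of its resource type.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical CMD record fragment (CMDI namespace omitted).
RECORD = """<CMD><Resources><ResourceProxyList>
  <ResourceProxy id="p1"><ResourceType>Metadata</ResourceType><ResourceRef>lr1.cmd</ResourceRef></ResourceProxy>
  <ResourceProxy id="p2"><ResourceType>Metadata</ResourceType><ResourceRef>lr2.cmd</ResourceRef></ResourceProxy>
</ResourceProxyList></Resources></CMD>"""


def aggregation_turtle(subject: str, record_xml: str) -> str:
    """Model all ResourceProxy references of a record as one ore:Aggregation."""
    root = ET.fromstring(record_xml)
    refs = []
    for proxy in root.iter("ResourceProxy"):
        ref = (proxy.findtext("ResourceRef") or "").strip()
        if ref:  # skip proxies without a usable reference
            refs.append(f"<{ref}>")
    if not refs:
        return f"{subject} a ore:Aggregation ."
    return f"{subject} a ore:Aggregation ;\n    ore:aggregates {', '.join(refs)} ."


if __name__ == "__main__":
    print(aggregation_turtle("<coll1>", RECORD))
```

Separating metadata proxies from resource proxies (e.g. by filtering on \code{ResourceType}) would be a straightforward variation if both should not end up in the same aggregation.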
-\commentx{Matej: Should both collection hierarchy and resource-pointers (collection and resource MD records) be encoded as ore:Aggregation?}
 \begin{comment}
 …
 The actual mapping process from values to entities is a complex and challenging task and will be tackled in more detail in the full paper. The main idea is to find entities in selected reference datasets (controlled vocabularies, ontologies) corresponding to the literal values in the metadata records. The obtained entity identifiers are then used to generate new RDF triples, representing outbound links.
 \begin{example3}
 …
 \end{example3}
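The basic idea of the value-to-entity mapping can be illustrated with a naive dictionary lookup. This is only a sketch of the principle, not the mapping procedure of the paper: the vocabulary, its entity IRIs (under \code{example.org}) and the normalization rules are hypothetical, and real reference datasets would require fuzzier matching and disambiguation.

```python
import unicodedata

# Hypothetical reference vocabulary: normalized label -> entity IRI (placeholders).
VOCAB = {
    "max planck institute for psycholinguistics": "http://example.org/org/MPI-PL",
    "university of vienna": "http://example.org/org/UniVie",
}


def normalize(label: str) -> str:
    """Unicode-normalize, case-fold and collapse whitespace in a literal value."""
    label = unicodedata.normalize("NFKC", label)
    return " ".join(label.casefold().split())


def entity_triples(element_node: str, value: str, vocab: dict = VOCAB) -> list:
    """Return a cmdm:hasElementEntity triple if the literal matches a known entity."""
    iri = vocab.get(normalize(value))
    if iri is None:
        return []  # no match: keep only the literal value
    return [f"{element_node} cmdm:hasElementEntity <{iri}> ."]


if __name__ == "__main__":
    print(entity_triples("_:e1", "  Max Planck Institute for  Psycholinguistics "))
```

The generated triple is added alongside, not instead of, the literal value, mirroring the parallel \code{cmdm:hasElementEntity} properties of the meta-model.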
 \begin{comment}
 %%%%%%%%%%%%%%%%%
 …
 % Although the distributed nature of the data is one of the defining features of LOD and theoretically one should be able to follow the data by dereferenceable URIs, in practice, for performance reasons, linked datasets from different sources that are to be queried together usually have to be pooled into one data store. This implies that the data store will have to hold considerably more data than ``just'' the original dataset.
 \section{Conclusions and Future Work}
 In this abstract, we sketched our work on encoding the whole of the CMD data domain in RDF, with special focus on the core model -- the general component schema. In the full paper we will also elaborate on the task of mapping element values to semantic entities. Additionally, some technical considerations regarding exposing this dataset as Linked Open Data will be discussed.
 …
 With this new enhanced dataset, the groundwork is laid for a full-blown \emph{semantic search}, i.e. the possibility of exploring the dataset indirectly using external semantic resources (like vocabularies of organizations or taxonomies of resource types) to which the CMD data will then be linked.
 \bibliographystyle{splncs}
 \bibliography{CMD2RDF}
