Changeset 3844
- Timestamp: 10/22/13 20:54:45
- File: 1 edited
--- CMDI-Interoperability/CMD2RDF/trunk/docs/papers/2014-LREC/CMD2RDF.tex (r3843)
+++ CMDI-Interoperability/CMD2RDF/trunk/docs/papers/2014-LREC/CMD2RDF.tex (r3844)
@@ -134,7 +134,5 @@
 \subsection{CMD specification}
 
-The main entity of the meta model is the CMD component modelled as A \code{rdfs:Class}. A CMD profile is basically a CMD component with some extra features, implying a specialization relation. It may seem natural to translate a CMD element to a RDF property (as it holds the literal value), but given its complexity (e.g. attributes, relation to the containing component) it too has to be a \code{rdfs:Class}. The actual literal value is a property of given element of type \code{cmdm:ElementValue}. For values that can be mapped to entities defined in external vocabularies/ semantic resources, the references to these entities are expressed in parallel properties of type \code{cmdm:hasElementEntity}. The attributes are modelled analogously with \code{cmdm:Attribute, cmdm:hasAttributeValue, cmdm:hasAttributeEntity}.
-
-The containment relation between components and elements is expressed with a dedicated property \code{cmdm:contains}, attributes of individual components and elements are bound with \code{cmdm:containsAttribute}.
+The main entity of the meta model is the CMD component modelled as A \code{rdfs:Class}. A CMD profile is basically a CMD component with some extra features, implying a specialization relation. It may seem natural to translate a CMD element to a RDF property (as it holds the literal value), but given its complexity (e.g. attributes\footnote{Due to space considerations the remainder of the paper will not discuss attributes.}, relation to the containing component) it to has to be a \code{rdfs:Class}. The actual literal value is a property of given element of type \code{cmdm:ElementValue}. For values that can be mapped to entities defined in external vocabularies/ semantic resources, the references to these entities are expressed in parallel properties of type \code{cmdm:hasElementEntity}. The containment relation between components and elements is expressed with a dedicated property \code{cmdm:contains}.
 
 \label{table:rdf-spec}
@@ -171,6 +169,6 @@
 & rdfs:domain & :Element ; \\
 & rdfs:range & :Entity . \\
-\\
-\multicolumn{3}{l}{\# analogue for attributes ...} \\
+% \\
+%\multicolumn{3}{l}{\# analogue for attributes ...} \\
 %cmdm:hasAttributeValue & a & rdf:Property ; \\
 % & rdfs:domain & cmdm:Attribute ; \\
@@ -183,5 +181,5 @@
 
 \noindent
-This entities are used for modelling the actual profiles, components and elements as they are defined in the Component Registry.
+This entities are used for modelling the actual profiles, components and elements as they are defined in the CR.
 For stand-alone/top components, the IRI\furl{http://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/components/clarin.eu:cr1:c\_1271859438125/rdf} is the exact path into the CR to get the RDF representation for the profile/component. For ``inner'' components (that are defined as part of another component) and elements the identifier is a concatenation of the parent top component IRI and dot-path to given component/element (Actor: \code{cr:clarin.eu:cr1:c\_1271859438197/rdf\#Actor.Actor\_Languages.Actor\_Language}).\footnote{For the sake of readability, we will collapse the component IRIs, refer to them just by their name, prefixed with \code{cmd:}.}
 
@@ -190,13 +188,9 @@
 & rdfs:label & "collection"; \\
 & dcterms:identifier & cr:clarin.eu:cr1:p\_1345561703620. \\
-cmd:Actor \\
-& a &cmdm:Component. \\
-\end{example3}
-
-\commentx{Menzo: we need more context for inner components. In the example LanguageName looks well defined, but take a Component/Element like Title. Is it the title of a book or the title of a person. Only when the semantics are clear, e.g., with a dcr:datcat, one can ignore the context and collapse all Components/Elements to a single RDF class/property.}
+cmd:Actor & a &cmdm:Component. \\
+\end{example3}
 
 \subsection{Data Categories}
-Windhouwer \cite{Windhouwer2012_LDL} proposes to use the data categories as annotation properties
-so as to avoid too strong semantic implications.
+One of the semantic registries in use by CMDI for its concept links is ISOcat. In \cite{Windhouwer2012_LDL} proposes to link to the data categories via an annotation property.
 
 \begin{example3}
@@ -207,15 +201,14 @@
 \end{example3}
 
-That implies that the \code{@ConceptLink} attribute on CMD elements and components as used in the CMD profiles to reference the data category would be modelled as:
+That implies that the \code{@ConceptLink} attribute on CMD elements and components as used in the CMD profiles to reference the data category can be modelled as:
 
 \begin{example3}
 cmd:LanguageName & dcr:datcat & isocat:DC-2484. \\
 \end{example3}
-
 
 %\subsection{RELcat - Ontological relations}
 % \commentx{for now we could probably skip all of relcat (although it is the future of semantic mapping ;) - we spare something for the next paper.}
 
-As described in \ref{CMDI}, relations between data categories are not stored directly in the \xne{ISOcat} DCR, but rather in a dedicated module the Relation Registry \xne{RELcat} directly as RDF triples \cite{SchuurmanWindhouwer2011} with dedicated predicates (like \code{rel:*}). In the final paper, we will provide more details on the role of this important building block in the endeavour.
+Relations between data categories are not stored directly in the \xne{ISOcat} DCR, but rather in the dedicated Relation Registry \xne{RELcat} as RDF triples \cite{WINDHOUWER12.954} with dedicated predicates based on an extensible taxonomy of relation types. In the final paper, we will provide more details on the role of this important building block in the endeavour.
 
 \begin{comment}
@@ -238,17 +231,18 @@
 In the next step, we want to express the individual CMD instances, the metadata records.
 
-We provide a generic top level class for all resources (including metadata records), the \code{cmdm:Resource},
-with \code{cmdm:hasResourceType} and \code{cmdm:hasMimeType} predicates to type the resources.
+We provide a generic top level class for all resources (including metadata records), the \code{cmdm:Resource} class and the \code{cmdm:hasMimeType} predicate to type the resources.
 
 \begin{example3}
 <lr1> & a & cmdm:Resource; \\
-& cmdm:hasResourceType & cmdm:Resource. \\
+& cmdm:hasMimeType & "audio/wav". \\
 \end{example3}
 
 \subsubsection {Resource Identifier}
 
+\begin{comment}
 It seems natural to use the PID of a Language Resource ( \code{<lr1>} ) as the resource identifier for the subject in the RDF representation. While this seems semantically sound, not every resource has to have a PID. (This is especially the case for ``virtual'' resources like collections, that are solely defined by their constituents and don't have any data on their own.) As a fall-back the PID of the MD record ( \code{<lr1.cmd>} from \code{cmd:MdSelfLink} element) could be used as the resource identifier.
-If identifiers are present for both resource and metadata, the relationship between the resource and the metadata record can be expressed as an annotation using the \xne{OpenAnnotation} vocabulary\furl{http://openannotation.org/spec/core/core.html\#Motivations}.
-(Note, that one MD record can describe multiple resources. This can be also easily accommodated in OpenAnnotation).
+If identifiers are present for both resource and metadata, \end{comment}
+The relationship between the resource and the metadata record can be expressed as an annotation using the \xne{OpenAnnotation} vocabulary\furl{http://openannotation.org/spec/core/core.html\#Motivations}.
+(Note, that one MD record can describe multiple resources. This can be also easily accommodated in OpenAnnotation.)
 
 \begin{example3}
@@ -261,5 +255,5 @@
 \subsubsection{Provenance}
 
-The information from \code{cmd:Header} represents the provenance information about the modelled data:
+The information from CMD record \code{cmd:Header} represents the provenance information about the modelled data:
 
 \begin{example3}
@@ -271,5 +265,5 @@
 
 \subsubsection{Hierarchy ( Resource Proxy – IsPartOf)}
-In CMD, the \code{cmd:ResourceProxyList} structure is used to express both collection hierarchy and point to resource(s) described by the MD record. This can be modelled as \xne{OAI-ORE Aggregation}\furl{http://www.openarchives.org/ore/1.0/primer\#Foundations}
+In CMD, the \code{cmd:ResourceProxyList} structure is used to express both the collection hierarchy and point to resource(s) described by the CMD record. This can be modelled as \xne{OAI-ORE Aggregation}\furl{http://www.openarchives.org/ore/1.0/primer\#Foundations}
 :
 
@@ -280,6 +274,4 @@
 & ore:aggregates & <lr1.cmd>, <lr2.cmd> . \\
 \end{example3}
-
-\commentx{Matej: Should both collection hierarchy and resource-pointers (collection and resource MD records) be encoded as ore:Aggregation?}
 
 \begin{comment}
@@ -320,5 +312,4 @@
 
 The actual mapping process from values to entities is a complex challenging task and will be tackled in more detail in the full paper. The main idea is to find entities in selected reference datasets (controlled vocabularies, ontologies) corresponding to the literal values in the metadata records. The obtained entity identifiers are further used to generate new RDF triples, representing outbound links.
-
 
 \begin{example3}
@@ -344,5 +335,4 @@
 \end{example3}
 
-
 \begin{comment}
 %%%%%%%%%%%%%%%%%
@@ -420,5 +410,4 @@
 % Although the distributed nature of the data is one of the defining features of LOD and theoretically one should be able to follow the data by dereferencable URIs, in practice it is mostly necessary to pool into one data store linked datasets from different sources that shall be queried together due to performance reasons. This implies that the data to be kept by the data store will be decisively larger, than ``just'' the original dataset.
 
-
 \section{Conclusions and Future Work}
 In this abstract, we sketched the work on encoding of the whole of the CMD data domain in RDF, with special focus on the core model -- the general component schema. In the full paper we will also elaborate on the task of mapping element values to semantic entities. Additionally, some technical considerations will be discussed regarding exposing this dataset as Linked Open Data.
@@ -426,5 +415,4 @@
 With this new enhanced dataset, the groundwork is laid for a full-blown \emph{semantic search}, i.e. the possibility of exploring the dataset indirectly using external semantic resources (like vocabularies of organizations or taxonomies of resource types) to which the CMD data will then be linked to.
 
-
 \bibliographystyle{splncs}
 \bibliography{CMD2RDF}