Changeset 4460 for CMDI-Interoperability


Timestamp:
02/06/14 11:02:56 (10 years ago)
Author:
vronk
Message:

updated numbers on CMD data and corrected a few typos

Location:
CMDI-Interoperability/CMD2RDF/trunk/docs/papers/2014-LREC
Files:
2 edited

  • CMDI-Interoperability/CMD2RDF/trunk/docs/papers/2014-LREC/CMD2RDF.bib

    r4451 r4460  
    193193}
    194194
     195@STANDARD{ISODIS24622-1_2013,
     196  title = {Language resource management -- Component Metadata Infrastructure
     197        -- Part 1: The Component Metadata Model (CMDI-1)},
     198  organization = {ISO},
     199  author = {{ISO/DIS 24622-1}},
     200  type = {International Standard},
     201  number = {24622-1},
     202  address = {Geneva, Switzerland},
     203  year = {2013},
     204  url = {http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=37336},
     205  owner = {m},
     206  publisher = {ISO},
     207  timestamp = {2014.02.05}
     208}
     209
    195210@STANDARD{ISO12620:2009,
    196211  title = {Specification of data categories and management of a Data Category
     
    215230
    216231@INPROCEEDINGS{Joerg2010,
    217   author = {Brigitte J\'{o}rg and Hans Uszkoreit and Alastair Burt},
     232  author = {Brigitte J\"{o}rg and Hans Uszkoreit and Alastair Burt},
    218233  title = {LT World: Ontology and Reference Information Portal},
    219234  booktitle = {LREC},
     
    389404
    390405@INPROCEEDINGS{SchuurmanWindhouwer2011,
    391   author = {I. Schuurman and M.A. Windhouwer.},
     406  author = {Ineke Schuurman and Menzo Windhouwer},
    392407  title = {Explicit Semantics for Enriched Documents. What Do ISOcat, RELcat
    393408        and SCHEMAcat Have To Offer?},
     
    604619}
    605620
    606 @STANDARD{ISODIS24622-1_2013,
    607   title = {Language resource management -- Component Metadata Infrastructure -- Part 1: The Component Metadata Model (CMDI-1)},
    608   organization = {ISO},
    609   author = {{ISO/DIS 24622-1}},
    610   type = {International Standard},
    611   number = {24622-1},
    612   address = {Geneva, Switzerland},
    613   year = {2013},
    614   url = {http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=37336},
    615   abstract = {},
    616   owner = {m},
    617   publisher = {ISO},
    618   timestamp = {2014.02.05}
    619 }
    620 
    621 
    622621@comment{jabref-meta: selector_publisher:}
    623622
  • CMDI-Interoperability/CMD2RDF/trunk/docs/papers/2014-LREC/CMD2RDF.tex

    r4451 r4460  
    6565\title{From CLARIN Component Metadata to Linked Open Data}
    6666
    67 \name{Matej Durco, Menzo Windhouwer}
     67\name{Matej \v{D}ur\v{c}o, Menzo Windhouwer}
    6868
    6969\address{ Institute for Corpus Linguistics and Text Technology (ICLTT), The Language Archive - DANS \\
    7070               Vienna, Austria, The Hague, The Netherlands \\
    71                matej.durco@assoc.oeaw.ac.at, menzo.windhouwer@dans.knaw.nl\\}
     71               matej.durco@oeaw.ac.at, menzo.windhouwer@dans.knaw.nl\\}
    7272
    7373\abstract{In the european CLARIN infrastructure a growing number of resources are described with Component Metadata. In this paper we
     
    8181\section{Motivation}
    8282%
    83 Although semantic interoperability has been one of the main motivations for CLARIN's Component Metadata Infrastructure (CMDI), until now there has been no work on the obvious -- bringing CMDI to the Semantic Web. We believe that providing the CLARIN CMD records as Linked Open Data (LOD) interlinked with external semantic resources, will open up new dimensions of processing and exploring of the CMD data by employing the power of semantic technologies. In this paper, we lay out how individual parts of the CMD Infrastructure can be expressed in RDF and made ready to be interlinked with existing external semantic resources (ontologies, knowledge bases, vocabularies).
     83Although semantic interoperability has been one of the main motivations for CLARIN's Component Metadata Infrastructure (CMDI), until now there has been no work on the obvious -- bringing CMDI to the Semantic Web. We believe that providing the CLARIN CMD records as Linked Open Data (LOD) interlinked with external semantic resources, will open up new dimensions of processing and exploring of the CMD data by employing the power of semantic technologies. In this paper, we lay out how individual parts of the CMD data domain can be expressed in RDF and made ready to be interlinked with existing external semantic resources (ontologies, taxonomies, knowledge bases, vocabularies).
    8484%This conversion lays a foundation / is groundwork for providing the original dataset as a \emph{Linked Open Data} nucleus within the \emph{Web of Data}\cite{TimBL2006} as well as for real semantic (ontology-driven) search and exploration of the data.
    8585
     
    107107
    108108\subsubsection{CMD Profiles }
    109 Currently 133 public profiles and 772 components are defined in the CR.
     109Currently 146 public profiles and 857 components are defined in the CR.
    110110Next to the `native' ones a number of profiles have been created that implement existing metadata formats, like OLAC/DCMI-terms, TEI Header or the META-SHARE schema.
    111111%The resulting profiles proof the flexibility/expressi\-vi\-ty of the CMD metamodel.
     
    119119
    120120The main CLARIN OAI-PMH harvester\footnote{\url{http://catalog.clarin.eu/oai-harvester/}}
    121 regularly collects records from the -- currently 69 -- providers, all in all over 550.000 records.
    122 16 of the providers offer CMDI records, the other 53 provide around 140.000 OLAC/DC records, that are converted into the corresponding CMD profile.
     121regularly collects records from the -- currently 58 -- providers, all in all over 600.000 records.
     122Some 20 of the providers offer CMDI records, the rest provides around 140.000 OLAC/DC records, that are converted into the corresponding CMD profile.
    123123%Next to these 81.226 original OLAC records, there a few providers offering their OLAC or DCMI-terms records already converted into CMDI, thus all in all OLAC, DCMI-terms records amount to 139.152.
    124124%On the other hand, some
     
    312312\begin{example3}
    313313\_:actor1  & a & cmd:Actor . \\
    314 \_:actor1lang1  & a & cmd:Actor. \\
    315  &  & Actor\_Language . \\
     314\_:actor1lang1  & a  & cmd:Actor.Language . \\
    316315\_:actor1  & cmd:contains & \_:actor1lang1 . \\
    317316\end{example3}
     
    349348\_:org & a & cmd:Person.Organisation ; \\
    350349        & \multicolumn{2}{l}{cmd:hasPerson.OrganisationElementValue \quad 'MPI'\^{}\^{}xs:string ;} \\
    351         & \multicolumn{2}{l}{ cmd:hasPerson.OrganisationElementEntity  \quad <http://www.mpi.nl/> . }\\
    352 
    353 <http://www.mpi.nl/> & a  & cmd:OrganisationElementEnity .
     350        & \multicolumn{2}{l}{ cmd:hasPerson.OrganisationElementEntity  \quad \textless http://www.mpi.nl/\textgreater . }\\
     351
     352\textless http://www.mpi.nl/\textgreater & a  & cmd:OrganisationElementEnity .
    354353\end{example3}
    355354\end{center}
     
    370369\section{Mapping field values to semantic entities}
    371370\label{sec:values2entities}
    372 
    373 \commentx{this is probably definitely too much for one abstract  - so we could just anounce the need for this mapping process.}
    374371
    375372This task is a prerequisite to be able to express also the CMD instance data in RDF. The main idea is to find entities in selected reference datasets (controlled vocabularies, ontologies) matching the literal values in the metadata records. The obtained entity identifiers are further used to generate new RDF triples, representing outbound links.
     
    452449Of course, language is just one dimension to use for mapping.
    453450Step by step we will link other categories like countries, geographica, organisations, etc.
    454 to some of the central nodes of the LOD cloud , like \xne{dbpedia}, \xne{Yago} or \xne{geonames},
     451to some of the central nodes of the LOD cloud, like \xne{dbpedia}, \xne{Yago} or \xne{geonames},
    455452but also to domain-specific semantic resource like the ontology for language technology \xne{LT-World} \cite{Joerg2010} developed at DFKI.
    456453