Changeset 3680 for SMC4LRT


Timestamp:
10/04/13 22:47:37 (11 years ago)
Author:
vronk
Message:

adding Schema Matching info and application

Location:
SMC4LRT
Files:
9 edited

Legend:

Unmodified
Added
Removed
  • SMC4LRT/Outline.tex

    r3671 r3680  
    8181
    8282\input{chapters/Introduction}
    83 \end{comment}
     83
    8484\input{chapters/Literature}
    8585
    8686\input{chapters/Definitions}
    87 
     87\end{comment}
    8888\input{chapters/Data}
    8989
     
    9999
    100100\input{chapters/Conclusion}
     101
    101102\end{comment}
    102103
     
    104105
    105106\bibliographystyle{ieeetr}
    106 \bibliography{../../2bib/lingua,../../2bib/ontolingua,../../2bib/smc4lrt,../../2bib/semweb,../../2bib/distributed_systems,../../2bib/own, ../../2bib/diglib,../../2bib/it-misc}
     107\bibliography{../../2bib/lingua,../../2bib/ontolingua,../../2bib/smc4lrt,../../2bib/semweb,../../2bib/distributed_systems,../../2bib/own,../../2bib/diglib,../../2bib/it-misc}
    107108
    108109\appendix
  • SMC4LRT/chapters/Data.tex

    r3671 r3680  
    132132
    133133\section{Other Metadata Formats and Collections }
     134\label{sec:lrt-md-catalogs}
    134135
    135136
     
    139140
    140141
    141 \subsection{Dublin Core metadata terms + OLAC}
    142 Since 1995
    143 Maintained Dublin Core Metadata Initiative
    144 DC, OLAC
    145 
    146 "Dublin" refers to Dublin, Ohio, USA where the work originated during the 1995 invitational OCLC/NCSA Metadata Workshop,[8] hosted by the Online Computer Library Center (OCLC), a library consortium based in Dublin, and the National Center for Supercomputing Applications (NCSA).
    147 
    148 comes in two version: 15 core elements  and 55 qualified terms ?
    149 
    150 \begin{quotation}
    151 Early Dublin Core workshops popularized the idea of "core metadata" for simple and generic resource descriptions. The fifteen-element "Dublin Core" achieved wide dissemination as part of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and has been ratified as IETF RFC 5013, ANSI/NISO Standard Z39.85-2007, and ISO Standard 15836:2009.
    152 \end{quotation}
    153 
    154 
    155 
    156 Given its simplicity it is used as the common denominator in many applications, among others it is the base format in the OAI-PMH protocol.
    157 
    158 It is required/expected as the base
    159 openarchives register: \url{http://www.openarchives.org/Register/BrowseSites}
    160  2006 OAI-repositories
    161 
    162 DublinCore Resource Types\furl{http://dublincore.org/documents/resource-typelist/}
    163 
    164 DublinCore to RDF mapping\furl{http://dublincore.org/documents/dcq-rdf-xml/}
    165 
     142\subsection{Dublin Core metadata terms}
     143The work on this metadata format started in 1995 at the Metadata Workshop\furl{http://dublincore.org/workshops/dc1/} organized by OCLC/NCSA in Dublin, Ohio, USA. Nowadays it is maintained by the Dublin Core Metadata Initiative.
     144
     145It is a fixed set of terms for a basic generic description of a range of resources (both virtual and physical), coming in two versions\furl{http://dublincore.org/documents/dcmi-terms/}:
     146\begin{description}
     147\item[Dublin Core Metadata Element Set (DCMES) ] \code{/elements/1.1/}
     148the original set of 15 terms, standardized as IETF RFC 5013, ISO Standard 15836-2009 and NISO Standard Z39.85-2007
     149\item[Dublin Core metadata terms ] \code{/terms/}
     150the extended `Qualified' set of 55 terms, extending the original 15 (replicating them in the new namespace for consistency)
     151\end{description}
     152
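For illustration, a minimal record using the original 15-element set, in the XML serialization employed by OAI-PMH, could look as follows (an illustrative sketch, mirroring the OLAC sample in listing \ref{lst:sampleolac}):

\lstset{language=XML}
\begin{lstlisting}[label=lst:sampledc, caption=Sample Dublin Core record (illustrative)]
<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:creator>Bloomfield, Leonard</dc:creator>
  <dc:date>1933</dc:date>
  <dc:title>Language</dc:title>
  <dc:publisher>New York: Holt</dc:publisher>
</oai_dc:dc>
\end{lstlisting}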
     153Today, the Dublin Core metadata terms are very widely used. Thanks to their simplicity they serve as the common denominator in many applications: content management systems integrate Dublin Core in the \code{meta} tags of served pages (\code{<meta name="DC.Publisher" content="publisher-name" >}), and it is the default minimal description in content repositories (Fedora Commons, DSpace). It is also the obligatory base format in the OAI-PMH protocol. The OpenArchives register\furl{http://www.openarchives.org/Register/BrowseSites} lists more than 2100 data providers.
     154
     155There are multiple possible serializations; in particular, a mapping to RDF is specified\furl{http://dublincore.org/documents/dcq-rdf-xml/}.
     156Worth noting is Dublin Core's take on classification of resources\furl{http://dublincore.org/documents/resource-typelist/}.
     157
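A flavour of this RDF mapping (an illustrative sketch; the resource URI is hypothetical):

\lstset{language=XML}
\begin{lstlisting}[label=lst:sampledcrdf, caption=Dublin Core description in RDF/XML (illustrative)]
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/resource/1">
    <dc:creator>Bloomfield, Leonard</dc:creator>
    <dc:title>Language</dc:title>
  </rdf:Description>
</rdf:RDF>
\end{lstlisting}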
     158The simplicity of the format is also its main drawback when considered as a metadata format for the research communities: it is too general to capture all the specific details with which individual research groups need to describe their different kinds of resources.
     159
     160\subsection{OLAC}
    166161\label{def:OLAC}
    167162
    168 \xne{OLAC Metadata}\furl{http://www.language-archives.org/}format\cite{Bird2001},OLAC \cite{Simons2003OLAC} is a more specialized version of the \xne{Dublin Core metadata terms}, adapted to the needs of the linguistic community:
     163\xne{OLAC Metadata}\furl{http://www.language-archives.org/} format \cite{Bird2001} is an application profile \cite{heery2000application} of the \xne{Dublin Core metadata terms}, adapted to the needs of the linguistic community. It is developed and maintained by the \xne{Open Language Archives Community}, providing a common platform and an infrastructure for ``creating a worldwide virtual library of language resources'' \cite{Simons2003OLAC}.
     164
     165The OLAC schema\furl{http://www.language-archives.org/OLAC/1.1/olac.xsd} extends the dcterms schema mainly by adding attributes with controlled vocabularies for domain-specific semantic annotation (\code{linguistic-field, linguistic-type, language, role, discourse-type}).
    169166
    170167\begin{quotation}
     
    172169\end{quotation}
    173170
    174 The \xne{OLAC Metadata} is the set of metadata elements archives participating in have agreed to use for describing language resources.
    175 
    176 \todoin{check http://www.language-archives.org/OLAC/metadata.html}
    177 
    178  OLAC Archives contain over 100,000 records, covering resources in half of the world's living languages. More statistics on coverage.
    179 http://www.language-archives.org/
    180 
    181 Most of the OLAC records are integrated into CMDI (cf. \ref{tab:cmd-profiles}, \ref{reports:OLAC})
    182 
     171\lstset{language=XML}
     172\begin{lstlisting}[label=lst:sampleolac, caption=Sample OLAC record]
     173<olac:olac>
     174   <creator>Bloomfield, Leonard</creator>
     175   <date>1933</date>
     176   <title>Language</title>
     177   <publisher>New York: Holt</publisher>
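   <!-- illustrative addition (not part of the original sample):
        domain-specific annotation via OLAC's controlled-vocabulary attributes -->
   <subject xsi:type="olac:linguistic-field" olac:code="semantics"/>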
     178</olac:olac>
     179\end{lstlisting}
     180
     181OLAC provides a ``search over 100,000 records collected from 44 archives\furl{http://www.language-archives.org/archives}, covering resources in half of the world's living languages''.
     182
     183Note that the OLAC archives are being harvested by the CLARIN harvester and the OLAC records are part of the CMDI joint metadata domain (cf. \ref{tab:cmd-profiles}, \ref{reports:OLAC}).
    183184
    184185\subsection{TEI / teiHeader}
     
    186187
    187188\begin{quotation}
    188 The Text Encoding Initiative (TEI) is a consortium which collectively develops and maintains a standard for the representation of texts in digital form. 
     189The Text Encoding Initiative (TEI) is a consortium which collectively develops and maintains a standard for the representation of texts in digital form.\furl{http://www.tei-c.org/} \dots Its chief deliverable is a set of Guidelines which specify encoding methods for machine-readable texts, chiefly in the humanities, social sciences and linguistics \dots [In addition,] the Consortium provides a variety of TEI-related resources, training events and software. [abridged]
    189190\end{quotation}
    190 
    191 \url{http://www.tei-c.org/}
    192 
    193 TEI is a de-facto standard for encoding any kind of digital textual resources being developed by a large community since 1994. It defines a set of elements to annotate individual aspects of the text being encoded. For the purposes of text description, metadata encoding the complex top-level element \code{teiHeader} is foreseen. TEI is not prescriptive, but rather descriptive, it does not provide just one fixed schema, but allows for a certain flexibility wrt to elements used and inner structure, allowing to generate custom schemas adopted to projects' needs.
    194 
    195 Thus there is also not just one fixed \xne{teiHeader}.
    196 
    197  TEI/teiHeader/ODD,
    198 
    199 
     196
     197TEI is a de-facto standard for encoding any kind of digital textual resources, being developed by a large community since 1994. It defines a set of elements to annotate individual aspects of the text being encoded. For the purposes of text description, i.e. metadata encoding (of main concern for us), the complex top-level element \code{teiHeader} is foreseen. TEI is not prescriptive, but rather descriptive: it does not provide just one fixed schema, but allows for a certain flexibility with respect to the elements used and the inner structure, allowing to generate custom schemas adapted to projects' needs. Thus there is also not just one fixed \code{teiHeader}.
     198
     199Some of the data collections encoded in TEI are the corpora of the DWDS\furl{http://www.dwds.de}, the Deutsches Textarchiv\furl{http://www.dwds.de/dta} \cite{Geyken2011deutsches} and the Oxford Text Archive\furl{http://ota.oucs.ox.ac.uk/}.
     200
     201There has been intense cooperation between the TEI and CMDI communities on the issue of interoperability, and multiple efforts to express the teiHeader in CMDI were undertaken (cf. \ref{results:tei}) as a starting point for integrating TEI-based data into the CLARIN infrastructure.
    200202
    201203\subsection{ISLE/IMDI}
     
    204206http://www.mpi.nl/imdi/
    205207
     208\begin{quotation}
    206209The ISLE Meta Data Initiative (IMDI) is a proposed metadata standard to describe multi-media and multi-modal language resources. The standard provides interoperability for browsable and searchable corpus structures and resource descriptions with help of specific tools.
     210\end{quotation}
     211
     212
     213\subsection{LAT, TLA}
     214Language Archiving Technology, now The Language Archive -- provided by the Max Planck Institute for Psycholinguistics\footnote{\url{http://www.mpi.nl/research/research-projects/language-archiving-technology}}
     215
    207216
    208217Predecessor of CMDI
     
    213222
    214223Metadata Object Description Schema (MODS) is a schema for a bibliographic element set that may be used for a variety of purposes, particularly for library applications.
     224
     225
    215226
    216227\subsection{ESE, Europeana Data Model - EDM}
     
    245256
    246257
     258
     259\subsection{META-NET}
     260
     261
     262
     263\begin{quotation}
     264META-SHARE is an open, integrated, secure and interoperable sharing and exchange facility for LRs (datasets and tools) for the Human Language Technologies domain and other applicative domains where language plays a critical role.
     265
     266META-SHARE is implemented in the framework of the META-NET Network of Excellence. It is designed as a network of distributed repositories of LRs, including language data and basic language processing tools (e.g., morphological analysers, PoS taggers, speech recognisers, etc.).
     267
     268\end{quotation}
     269
     270The distributed network of repositories consists of a number of member repositories, each offering its own subset of resources.
     271
     272A few\footnote{7 as of 2013-07} of the member repositories play the role of managing nodes providing ``a core set of services critical to the whole of the META-SHARE network''\cite{Piperidis2012meta}, especially collecting the resource descriptions from other members and exposing the aggregated information to the users.
     273The whole network offers approximately 2,000 resources (the numbers differ even across individual managing nodes).
     274
     275
     276MetaShare ontology\furl{http://metashare.ilsp.gr/portal/knowledgebase/TheMetaShareOntology}
     277
     278\subsection{ELRA}
     279
     280European Language Resources Association\furl{http://elra.info}
     281
     282\begin{quotation}
     283ELRA's missions are to promote language resources for the Human Language Technology (HLT) sector, and to evaluate language engineering technologies.
     284\end{quotation}
     285
     286ELDA -- Evaluations and Language resources Distribution Agency\furl{http://www.elda.org/} -- is ELRA's operational body, set up to identify, classify, collect, validate and produce the language resources which may be needed by the HLT -- Human Language Technology -- community. Besides, ELDA is involved in HLT evaluation campaigns.
     290
     291ELDA handles the practical and legal issues related to the distribution of language resources, provides legal advice in the field of HLT, and drafts and concludes distribution agreements on behalf of ELRA.
     292
     293ELRA maintains two catalogs: the \xne{ELRA Catalogue}\furl{http://catalog.elra.info/} of resources distributed by ELRA, and the \xne{Universal Catalogue}, a repository comprising information regarding Language Resources (LRs) identified all over the world.
     300
     301
    247302\subsection{Other}
    248303
     
    262317\item Persons - GND, VIAF
    263318\item Organizations - GND, VIAF
    264 \item Schlagwörter/Subjects - GND, LCSH
     319\item Schlagw\"{o}rter/Subjects - GND, LCSH
    265320\item Resource Typology -
    266321\end{itemize}
     
    274329Other related relevant activities and initiatives
    275330
     331See also the W3C WebSchemas wiki on controlled property values\furl{http://www.w3.org/wiki/WebSchemas/ExternalEnumerations#Controlled_property_values}.
     332
    276333A broader collection of related initiatives can be found at the German National Library website:
    277334\furl{http://www.dnb.de/DE/Standardisierung/LinksAFS/linksafs_node.html}
     
    281338http://metadaten-twr.org/ - Technology Watch Report: Standards in Metadata and Interoperability (last entry from 2011)
    282339At MPDL, within the escidoc publication platform there seems to be (work  on) a service (since 2009 !) for controlled vocabularies: \furl{http://colab.mpdl.mpg.de/mediawiki/Control_of_Named_Entities}
    283 Entity Authority Tool Set - a web application for recording, editing, using and displaying authority information about entities – developed at the New Zealand Electronic Text Centre (NZETC).
     340Entity Authority Tool Set - a web application for recording, editing, using and displaying authority information about entities -- developed at the New Zealand Electronic Text Centre (NZETC).
    284341http://eats.readthedocs.org/en/latest/
    285342
     
    303360
    304361\subsubsection{LT-World}
    305 Regarding existing domain-specific semantic resources \texttt{LT-World}\footnote{\url{http://www.lt-world.org/}},  the ontology-based portal covering primarily Language Technology being developed at DFKI\footnote{\textit{Deutsches Forschungszentrum fÃŒr KÃŒnstliche Intelligenz} - \url{http://www.dfki.de}},  is a prominent resource providing information about the entities (Institutions, Persons, Projects, Tools, etc.) in this field of study. \cite{Joerg2010}
    306 
    307 
    308 
    309 \section{LRT Metadata Catalogs/Collections}
    310 \label{sec:lrt-md-catalogs}
    311 \todoin{Overview of catalogs, name, since, \#providers, \#resources}
    312 
    313 \todoin{[DFKI/LT-World]  - collection or ontology}
    314 
    315 \subsection{CMDI}
    316 collections, profiles/Terms, ResourceTypes!
    317 
    318 \subsection{OLAC}
    319 
    320 \subsection{LAT, TLA}
    321 Language Archiving Technology, now The Language Archive - provided by Max Planck Insitute for Psycholinguistics \footnote{\url{http://www.mpi.nl/research/research-projects/language-archiving-technology}}
    322 
    323 \subsection{META-NET}
    324 
    325 
    326 
    327 \begin{quotation}
    328 META-SHARE is an open, integrated, secure and interoperable sharing and exchange facility for LRs (datasets and tools) for the Human Language Technologies domain and other applicative domains where language plays a critical role.
    329 
    330 META-SHARE is implemented in the framework of the META-NET Network of Excellence. It is designed as a network of distributed repositories of LRs, including language data and basic language processing tools (e.g., morphological analysers, PoS taggers, speech recognisers, etc.).
    331 
    332 \end{quotation}
    333 
    334 The distributed networks of repositories consists of a number of member repositories, that offer their own subset of resource.
    335 
    336 A few\footnote{7 as of 2013-07} of the members repositories play the role of managing nodes providing ``a core set of services critical to the whole of the META-SHARE network''\cite{Piperidis2012meta}, especially collecting the resource descriptions from other members and exposing the aggregated information to the users.
    337 The whole network offers approximately 2.000 resources (the numbers differ even across individual managing nodes).
    338 
    339 
    340 MetaShare ontology\furl{http://metashare.ilsp.gr/portal/knowledgebase/TheMetaShareOntology}
    341 
    342 
    343 
    344 \subsection{ELRA}
    345 
    346 European Language Resources Association
    347 
    348 \furl{http://elra.info}
    349 
    350 
    351 ELRA’s missions are to promote language resources for the Human Language Technology (HLT) sector, and to evaluate language engineering technologies. To achieve these two major missions, we offer a range of services, listed below and described in the "Services around Language Resources" section:
    352 
    353 
    354 http://www.elda.org/
    355 Evaluations and Language resources Distribution Agency
    356 
    357 ELDA - Evaluations and Language resources Distribution Agency – is ELRA’s operational body, set up to identify, classify, collect, validate and produce the language resources which may be needed by the HLT – Human Language Technology – community. Besides, ELDA is involved in HLT evaluation campaigns.
    358 
    359 ELDA handles the practical and legal issues related to the distribution of language resources, provides legal advice in the field of HLT, and drafts and concludes distribution agreements on behalf of ELRA.
    360 
    361 ELRA Catalog
    362 
    363 http://catalog.elra.info/
    364 
    365 
    366 Universal Catalog+
    367  Universal Catalogue is a repository comprising information regarding Language Resources (LRs) identified all over the world.
    368 
    369 
    370 \subsection{Other}
     362Regarding existing domain-specific semantic resources, \texttt{LT-World}\footnote{\url{http://www.lt-world.org/}}, the ontology-based portal covering primarily Language Technology being developed at DFKI\footnote{Deutsches Forschungszentrum f\"{u}r K\"{u}nstliche Intelligenz, \url{http://www.dfki.de}}, is a prominent resource providing information about the entities (Institutions, Persons, Projects, Tools, etc.) in this field of study \cite{Joerg2010}.
     363
     364
    371365
    372366
     
    394388
    395389
     390VoID (``Vocabulary of Interlinked Datasets'') is an RDF-based schema to describe linked datasets\furl{http://semanticweb.org/wiki/VoID}.
     391
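A minimal VoID description of a dataset could look as follows (an illustrative sketch; all URIs are hypothetical):

\lstset{language=XML}
\begin{lstlisting}[label=lst:samplevoid, caption=Sample VoID dataset description (illustrative)]
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:dcterms="http://purl.org/dc/terms/">
  <void:Dataset rdf:about="http://example.org/dataset/cmd">
    <dcterms:title>CMD metadata as linked data</dcterms:title>
    <void:sparqlEndpoint rdf:resource="http://example.org/sparql"/>
  </void:Dataset>
</rdf:RDF>
\end{lstlisting}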
     392The German National Library also publishes its data as RDF\furl{http://www.dnb.de/rdf}.
     393
     394
     395OCLC began adding linked data to WorldCat by appending Schema.org descriptive mark-up to WorldCat.org pages, thereby making the entire WorldCat cataloging collection publicly available -- with library extensions -- for use by developers, intelligent web crawlers and search partners such as Bing, Google, Yahoo! and Yandex.
     403
     404
     405\subsection{schema.org}
     406
     407The schema.org vocabulary\furl{http://schema.org/docs/datamodel.html} can be embedded into web pages either as microdata or as RDFa Lite (Resource Description Framework in attributes)\furl{http://www.w3.org/TR/rdfa-lite/}.
     412
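For illustration, a resource description embedded into a web page via RDFa Lite could look as follows (an illustrative sketch; \code{name}, \code{author} and \code{datePublished} are actual schema.org terms, the annotated content is arbitrary):

\lstset{language=HTML}
\begin{lstlisting}[label=lst:sampleschemaorg, caption=schema.org annotation in RDFa Lite (illustrative)]
<div vocab="http://schema.org/" typeof="Book">
  <span property="name">Language</span> by
  <span property="author">Leonard Bloomfield</span>
  (<span property="datePublished">1933</span>)
</div>
\end{lstlisting}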
    396413
    397414\section{Summary}
  • SMC4LRT/chapters/Definitions.tex

    r3671 r3680  
    5252dcterms: & http://purl.org/dc/terms \\
    5353oa: & http://www.w3.org/ns/oa\# \\
     54olac: & http://www.language-archives.org/OLAC/1.1/ \\
    5455ore: & http://www.openarchives.org/ore/terms/ \\
    5556cr: & http://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/ \\
  • SMC4LRT/chapters/Design_SMCinstance.tex

    r3671 r3680  
    306306\end{enumerate}
    307307
     308This task is basically an application of the ontology mapping method.
     309
     310We do not try to achieve a complete ontology alignment, we just want to find
     311semantically equivalent concepts from other ontologies for our ``anonymous'' concepts.
     312This is essentially just another phrasing of the definition of the ontology mapping function as given by \cite{EhrigSure2004, amrouch2012survey}:
     313``for each concept (node) in ontology A [tries to] find a corresponding concept
     314(node), which has the same or similar semantics, in ontology B and vice versa''.
     315
     316The first two points in the above enumeration represent the steps necessary to be able to apply the ontology mapping.
     317The identification of appropriate vocabularies is discussed in the next subsection. In the operationalization, the identified vocabularies could be treated as one aggregated ontology to map all entities against. For the sake of higher precision, it may be sensible to perform the task separately for individual concepts, i.e. organisations, persons etc. and in every run consider only relevant vocabularies.
     318
     319
     320The transformation of the data has been partly described in the previous section:
     321it can be trivially and automatically converted into RDF triples as:
     322
     323\begin{example3}
     324<lr1> & cmd:Organisation & "MPI" \\
     325\end{example3}
     326
     327However, for the needs of the mapping task we propose to reduce and rewrite the data to retrieve distinct (concept, value) pairs:
     328
     329\begin{example3}
     330\_:1 & a & cmd:Organisation;\\
     331   & skos:altLabel & "MPI";
     332\end{example3}
     333
     334The \var{lookup} function is a customized version of the \var{map} function that operates on these information pairs (concept, label).
     335
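A possible formalization of \var{lookup} (a sketch following the notation of the \var{map} function in \ref{lit:schema-matching}; \var{C} denoting the set of concepts, \var{L} the set of labels and \var{voc(c)} the entities of the vocabularies relevant for concept \var{c} -- notation assumed here):

\begin{defcap}[!ht]
\caption{\var{lookup} function over (concept, label) pairs (sketch)}
\begin{align*}
& lookup \ : C \times L \rightarrow 2^{E} \\
& lookup(c,l) = \{ e \in voc(c) : sim(l, label(e)) \ > \ t \}
\end{align*}
\end{defcap}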
     336The two steps \var{lookup} and \var{assess} correspond exactly to the two steps of \cite{jimenez2012large} in their system \xne{LogMap2}: 1) computation of mapping candidates (maximize recall) and 2) assessment of the candidates (maximize precision).
     337
     338
    308339\begin{figure*}[!ht]
    309340\includegraphics[width=1\textwidth]{images/SMC_CMD2LOD}
     
    373404
    374405
     406
    375407%%%%%%%%%%%%%%%%%%%%%
    376408\section{SMC LOD - Semantic Web Application}
     
    378410
    379411
    380 
    381 \cite{Europeana RDF Store Report}
    382 
    383 Technical aspects (RDF-store?): Virtuoso
    384 
    385 \todocode{install Jena +  fuseki}\furl{http://jena.apache.org}\furl{http://jena.apache.org/documentation/serving_data/index.html}\furl{http://csarven.ca/how-to-create-a-linked-data-site}
    386 
    387 \todocode{install older python (2.5?) to be able to install dot2tex - transforming dot files to nicer pgf formatted graphs}\furl{http://dot2tex.googlecode.com/files/dot2tex-2.8.7.zip}\furl{file:/C:/Users/m/2kb/tex/dot2tex-2.8.7/}
    388 
    389 \todocode{check install siren}\furl{http://siren.sindice.com/}
    390 
    391 
    392 \todocode{check/install: raptor for generating dot out of rdf}\furl{http://librdf.org/raptor/}
    393 
    394 \todocode{check/install: Linked Data browser: LoD p. 81; Haystack}\furl{http://en.wikipedia.org/wiki/Haystack_(PIM)}
    395 
    396  / interface (ontology browser?)
    397 
    398 semantic search component in the Linked Media Framework
    399 \todocode{!!! check install LMF - kiwi - SemanticSearch !!!}\furl{http://code.google.com/p/kiwi/wiki/SemanticSearch}
    400 
    401 \todoin{check SARQ}\furl{http://github.com/castagna/SARQ}
    402 
    403 
    404 \section {Full semantic search - concept-based + ontology-driven ?}
    405 \label{semantic-search}
    406 
    407412With the new enhanced dataset, as detailed in section \ref{sec:cmd2rdf}, the groundwork is laid for the full-blown semantic search as proposed in the original goals, i.e. the possibility for ontology-driven or at least `semantic resources assisted' exploration of the dataset.
    408413
     
    415420rechercheisidore, dbpedia, ...
    416421
     422
     423Relevant experience with RDF stores is documented in the \xne{Europeana RDF Store Report}.
     424
     425One technical aspect is the choice of an RDF store, \xne{Virtuoso} being one candidate.
     426
     427
     428A semantic search component is provided in the Linked Media Framework.
     429
     430\todoin{check SARQ}\furl{http://github.com/castagna/SARQ}
     431
     432
     433%\section {Full semantic search - concept-based + ontology-driven ?}
     434%\label{semantic-search}
     435
     436
    417437\section{Summary}
     438
     439%The task can be also seen as building bridge between XML resources and semantic resources expressed in RDF, OWL.
     440
     441The process of expressing the whole of the data as one semantic resource can also be understood as a schema or ontology merging task, with data categories being the primary mapping elements.
     442
     443
    418444In this chapter, an expression of the whole of the CMD data domain into RDF was proposed, with special focus on the way how to translate the string values in metadata fields to corresponding semantic entities. Additionally, some technical considerations were discussed regarding exposing this dataset as Linked Open Data and the implications for real semantic ontology-based data exploration.
    419445
  • SMC4LRT/chapters/Design_SMCschema.tex

    r3665 r3680  
    119119\caption{Attributes of \code{Term} when encoding CMD entity}
    120120\label{table:terms-attributes-cmd}
    121  \begin{tabularx}{1\textwidth}{ l | X | X }
     121\begin{tabularx}{1\textwidth}{ l | X | X }
     122 %\begin{tabu}{1\textwidth}{ l | l | l }
    122123  attribute & allowed values & sample value\\
    123124\hline
     
    689690Also such a visualization could feature direct search links from individual nodes into the dataset, i.e.  from a profile node a link could lead into a search interface listing metadata records of given profile.
    690691
     692
     693%%%%%%%%%%%%%%%%%%%%%%%%%
     694\section{Application of Schema Matching techniques in SMC}
     695\label{sec:schema-matching-app}
     696
     697Even though the described module is about ``semantic mapping'', until now we did not directly make use of the traditional ontology/schema mapping/alignment methods and tools as summarized in \ref{lit:schema-matching}. This is due
     698to the fact that in this work we can harness the mechanisms of the semantic interoperability layer built into the core of the CMD Infrastructure, which integrates the task of identifying semantic correspondences directly into the process of schema creation,
     699largely obviating the need for a posteriori complex schema matching/mapping techniques.
     700Put in terms of the schema matching methodology, the system relies on explicitly set concept equivalences as the basis for mapping between schema entities. By referencing a data category in a CMD element, the modeller binds this element to a concept, making two elements linked to the same data category trivially equivalent.
     701
     702However, this only holds for schemas already created within the CMD framework (and even for these only to a certain degree, as will be explained later). Given the growing universe of definitions (data categories and components) in the CMD framework, the metadata modeller could very well profit from applying schema mapping techniques as a pre-processing step in the task of integrating existing external schemas into the infrastructure. (User involvement is identified by \cite{shvaiko2012ontology} as one of the promising future challenges to ontology matching.) Already now, we witness a growing proliferation of components in the Component Registry and of data categories in the Data Category Registry.
     703
     704Let us restate the problem of integrating existing external schemas as an application of the \var{schema matching} method:
     705The data modeller starts off with an existing schema \var{$S_{x}$}. The system accommodates a set of schemas\footnote{We talk of schemas even though the creation (and also remodelling) takes place in the Component Registry by creating CMD profiles and components, because every profile has an unambiguous expression in XML Schema.} \var{$S_{1..n}$}.
     706It is very improbable that there is a \var{$S_{y} \in S_{1..n}$} that fully matches \var{$S_{x}$}.
     707Given the heterogeneity of the schemas present in the field of research, full alignments are not achievable at all.
     708However, thanks to the compositional nature of the CMD data model, the data modeller can reuse just parts of any of the schemas -- the
     709components \var{c}. Thus the task is to find for every entity $e_{x} \in S_{x}$ the set of semantically equivalent candidate components $\{c_{y}\}$, which corresponds to the definition of the mapping function for single entities as given in \cite{EhrigSure2004}.
     710Given that the modeller does not have to reuse the components as they are, but can use existing components as a base to create her own, she is helped even by candidates that are not equivalent. Thus we can further relax the task and allow candidates that are just similar to a certain degree, which can be operationalized as a threshold $t$ on the output of the \var{similarity} function.
     711Being only a pre-processing step meant to provide suggestions to the human modeller implies that recall is of higher importance than precision.
     712
     713Another requirement is that the matching entities should be maximal regarding the compositional tree:
     714
     715\begin{defcap}[!ht]
     716\caption{Maximality requirement on matching candidates}
     717\begin{align*}
     718& map(e_{x1})  \rightarrow c_{y1} \\
     719& map(e_{x2})  \rightarrow c_{y2} \\
     720& candidates := \{(e_{xa},c_{ya}) : \forall b \neq a : c_{ya} \notin descendants(c_{yb}) \}
     721\end{align*}
     722\end{defcap}
     723
     724Next to the usual features and measures that can be applied, like label equality, string similarity or structural equality,
     725the mapping function could be enriched with \emph{extensional} features based on the concept clusters as delivered by the crosswalk service (cf. \ref{sec:cx}).
     726
     727It would be worthwhile to test in how far the \var{smcIndex} paths as defined in \ref{def:smcIndex} could be used as a feature,
     728e.g. taking the longest matching subpath as a measure; see the sketch below.
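One conceivable operationalization of such a path-based feature (a sketch; \var{lcs} denoting the longest common subpath of two paths and \var{|p|} the length of a path -- notation assumed here):

\begin{defcap}[!ht]
\caption{path-based \var{similarity} measure over \var{smcIndex} paths (sketch)}
\begin{align*}
& sim_{path}(p_{1},p_{2}) = \frac{|lcs(p_{1},p_{2})|}{max(|p_{1}|,|p_{2}|)}
\end{align*}
\end{defcap}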
     729
     730
     731Although we exemplified on the case of integration of an external schema, the described approach could also be applied to the schemas already integrated in the system. Although there is already a high baseline thanks to the mechanisms of reuse of components and data categories, there certainly still exist semantic proximities that are not explicitly expressed by these mechanisms. This deficiency is rooted in the collaborative creation of the CMD components and profiles, where individual modellers overlooked, deliberately ignored or only partially reused existing components or profiles. This can be seen in the case of multiple teiHeader profiles, which, though they are modelling the same existing metadata format, are completely disconnected in terms of component and data category reuse (cf. \ref{results:tei}).
     732
     733Note that in the case of reuse of components, in the normal scenario, the semantic equivalence is ensured even though the new component (and all its subcomponents) is a copy of the old one with a new identity, because the references to data categories are copied as well; thus by default the new component shares all data categories with the original one, and the modeller has to deliberately change them if required. But even with reuse of components, scenarios are conceivable in which the semantic linking gets broken, or is not established, even though semantic equivalence prevails.
     734
     735The question is what to do with the new correspondences that would possibly be determined when, as proposed, we apply schema matching to the already integrated schemas. One possibility is to add a data category reference, if one element of the pair is still missing one.
     736However, if both are already linked to a data category, the pair of data categories could be added to the relation set in the Relation Registry (cf. \ref{def:rr}).
     737 
     738Once all the equivalences (and other relations) between the profiles/schemas are found, similarity ratios can be determined.
     739These new similarity ratios could be applied as alternative weights in the just-profiles graph (cf. \ref{sec:smc-cloud}).
     740
     741In contrast to the task described here, which -- restricted to matching XML schemas -- can be seen as staying in the ``XML world'',
     742another aspect of this work is clearly situated in the Semantic Web world and requires the application of ontology matching methods: the mapping of field values to semantic entities described in \ref{sec:values2entities}.
     743
     744
     745%This approach of integrating prerequisites for semantic interoperability directly into the process of metadata creation is fundamentally different from the traditional methods of schema matching that try to establish pairwise alignments between already existing schemas -- be it algorithm-based or by means of explicit manually defined crosswalks\cite{Shvaiko2005}.
     746
     747
    691748\section{Summary}
    692749In this core chapter, we laid out a design for a system dealing with concept-based crosswalks on the schema level.
    693750The system consists of three main parts: the crosswalk service, the query expansion module and \xne{SMC Browser} -- a tool for visualizing and exploring the schemas and the corresponding crosswalks.
    694 
     751In addition, we elaborated on the application of schema matching methods to infer mappings between schemas.
     752
  • SMC4LRT/chapters/Literature.tex

    r3671 r3680  
    4848Most recently, with \xne{Europeana Cloud}\furl{http://pro.europeana.eu/web/europeana-cloud} (2013 to 2015) a successor project to \xne{Europeana} was established: a Best Practice Network, coordinated by The European Library, designed to establish a cloud-based system for Europeana and its aggregators, providing new content, new metadata, a new linked storage system, new tools and services for researchers and a new platform -- Europeana Research.
    4949
    50 A number of catalogs and formats are further described in the section \ref{sec:other-md-catalogs}
    51 
    52 
    53 \section{Schema / Ontology Mapping/Matching}
    54 
    55 Schema or ontology matching provides the methodical foundation for the problem at hand the \emph{semantic mapping}.
    56 As Shvaiko\cite{shvaiko2012ontology} states ``a solution to the semantic heterogeneity problem. It finds correspondences between semantically related entities of ontologies.''
    57 
    58 One starting point for the plethora of work in the field of \emph{schema and ontology mapping} techniques and technology
    59 is the overview of the field by Kalfoglou \cite{Kalfoglou2003}.
    60 Shvaiko and Euzenat provide a summary of the key challenges\cite{Shvaiko2008} as well as a comprehensive survey of approaches for schema and ontology matching based on a proposed new classification of schema-based matching techniques\cite{Shvaiko2005_classification}.
    61 Noy \cite{Noy2005_ontologyalignment,Noy2004_semanticintegration}
    62 
    63 and more recently \cite{shvaiko2012ontology}(2012!) and \cite{amrouch2012survey} provide surveys of the methods and systems in the field.
    64 
    65 \paragraph{Methods}
    66 Semantic and extensional methods are still rarely
    67 employed by the matching systems. In fact, most of
    68 the approaches are quite often based only on
    69 terminological and structural methods
    70 
    71 classify, review, and experimentally compare major methods of element similarity measures and their combinations.\cite{Algergawy2010}
     50The related catalogs and formats are further described in section \ref{sec:other-md-catalogs}.
     51
     52
     53\section{Existing crosswalks (services)}
     54
     55Crosswalks, as lists of equivalent fields from two schemas, have been around for a long time in the world of enterprise systems (e.g. to bridge to legacy systems) and also in libraries, e.g. the \emph{MARC to Dublin Core Crosswalk}\furl{http://loc.gov/marc/marc2dc.html}.
     56
     57\cite{Day2002crosswalks} lists a number of mappings between metadata formats,
     58mostly within the Dublin Core and MARC families of formats,
     59e.g. the MARC to Dublin Core mapping\furl{http://www.loc.gov/marc/dccross.html}.
     60These are, however, static crosswalk repositories.
     66
     67
     68OCLC launched \xne{Metadata Schema Transformation Services}\furl{http://www.oclc.org/research/activities/schematrans.html?urlm=160118},
     69in particular the \xne{Crosswalk Web Service}\furl{http://www.oclc.org/developer/services/metadata-crosswalk-service}\furl{http://www.oclc.org/research/activities/xwalk.html}:
     71
     72\begin{quotation}
     73a self-contained crosswalk utility that can be called by any application that must translate metadata records. In our implementation, the translation logic is executed by a dedicated XML application called the Semantic Equivalence Expression Language, or Seel, a language specification and a corresponding interpreter that transcribes the information in a crosswalk into an executable format.
     74\end{quotation}
     75
     76The Crosswalk Web Service is now a production system that has been incorporated into a number of OCLC products and services.
     77
     78However, the demo service is not available anymore\furl{http://errol.oclc.org/schemaTrans.oclc.org.search}.
     79
     80
     81
     82The offered formats, however, concentrate on the formats relevant for the LIS community.
     84
     85For this service, a metadata format is defined as a triple of:
     86
     87\begin{itemize}
     88\item Standard -- the metadata standard of the record (e.g. MARC, DC, MODS, etc.)
     89\item Structure -- the structure of how the metadata is expressed in the record (e.g. XML, RDF, ISO 2709, etc.)
     90\item Encoding -- the character encoding of the metadata (e.g. MARC8, UTF-8, Windows 1251, etc.)
     91\end{itemize}
     92The Crosswalk Web Service offers an interface with 4 methods:
     93
     94\begin{itemize}
     95\item \code{translate(...)} -- translates the records
     96\item \code{getSupportedSourceRecordFormats()} -- returns a list of formats that are supported as input formats
     97\item \code{getSupportedTargetRecordFormats()} -- returns a list of formats that the input formats can be translated to
     98\item \code{getSupportedJavaEncodings()} -- returns the list of character encodings that Java supports (some formats support all of them)
     99\end{itemize}
     100
     101
     102\section{Schema/Ontology Mapping/Matching}
     103\label{lit:schema-matching}
     104
     105As Shvaiko\cite{shvaiko2012ontology} states ``\emph{Ontology matching} is a solution to the semantic heterogeneity problem. It finds correspondences between semantically related entities of ontologies.''
     106As such, it provides a very suitable methodical foundation for the problem at hand -- the \emph{semantic mapping}. (In sections \ref{sec:schema-matching-app} and \ref{sec:values2entities} we elaborate on the possible ways to apply these methods to the described problem.)
     107
     108There is a plethora of work on methods and technology in the field of \emph{schema and ontology matching} as witnessed by a sizable number of publications providing overviews, surveys and classifications of existing work \cite{Kalfoglou2003, Shvaiko2008, Noy2005_ontologyalignment, Noy2004_semanticintegration, Shvaiko2005_classification} and most recently \cite{shvaiko2012ontology, amrouch2012survey}.
     109
     110%Shvaiko and Euzenat provide a summary of the key challenges\cite{Shvaiko2008} as well as a comprehensive survey of approaches for schema and ontology matching based on a proposed new classification of schema-based matching techniques\cite{}.
     111
     112Shvaiko and Euzenat also run the web page \url{http://www.ontologymatching.org/} dedicated to this topic and the related OAEI\footnote{Ontology Alignment Evaluation Initiative -- \url{http://oaei.ontologymatching.org/}}, an ongoing effort to evaluate alignment tools based on various alignment tasks from different domains.
     113
     114Interestingly, \cite{shvaiko2012ontology} somewhat self-critically asks if after years of research ``the field of ontology matching [is] still making progress?''.
     115
     116\subsubsection{Method}
     117
     118There are slight differences in the use of the terms between \cite{EhrigSure2004, Ehrig2006}, \cite{Euzenat2007} and \cite{amrouch2012survey}; especially, one has to be aware whether in a given context the term denotes the task in general, the process, the actual operation/function or the result of the function.
     119
     120\cite{Euzenat2007} formalizes the problem as ``ontology matching operation'':
     121
     122\begin{quotation}
     123The matching operation determines an alignment A' for a
     124pair of ontologies O1 and O2. Hence, given a pair of
     125ontologies (which can be very simple and contain one entity
     126each), the matching task is that of finding an alignment
     127between these ontologies. [\dots]
     128\end{quotation}
     129
     130But basically the different authors broadly agree on the definition of \var{ontology alignment}: in the meaning of \concept{task} it is ``to identify relations between individual elements of multiple ontologies'', in the meaning of \concept{result} it is ``a set of correspondences between entities belonging to the matched ontologies''.
     131
     132More formally, \cite{Ehrig2006} formulates ontology alignment as ``a partial function based on the set \var{E} of all entities $e \in E$ and based on the set of possible ontologies \var{O}. [\dots] Once an alignment is established we say entity \var{e} is aligned with entity \var{f} when $align(e) = f$.'' Also, ``alignment is a one-to-one equality relation'' (although this is relativized further in the work, and also in \cite{EhrigSure2004}).
     133 
     134\begin{defcap}[!ht]
     135\caption{\var{align} function}
     136\begin{align*}
     137& align \ : E \times O \times O \rightarrow E
     138\end{align*}
     139\end{defcap}
     137
     138\cite{EhrigSure2004} and \cite{amrouch2012survey} instead introduce \var{ontology mapping} when applying the task to individual entities, in the meaning of a function that ``for each concept (node) in ontology A [tries to] find a corresponding concept
     139(node), which has the same or similar semantics, in ontology B and vice versa''. In the meaning of result it is a ``formal expression describing a semantic relationship between two (or more) concepts belonging to two (or more) different ontologies''.
     140
     141\cite{EhrigSure2004} further specify the mapping function as based on a similarity function, that for a pair of entities from two (or more) ontologies computes a ratio indicating the semantic proximity of the two entities.
     142
     143\begin{defcap}[!ht]
     144\caption{\var{map} function for single entities and underlying \var{similarity} function }
     145\begin{align*}
     146& map \ : O_{i1}  \rightarrow O_{i2} \\
     147& map( e_{i_{1}j_{1}}) = e_{i_{2}j_{2}}\text{, if } sim(e_{i_{1}j_{1}},e_{i_{2}j_{2}}) \ > \ t  \text{ with } t \text{ being the threshold} \\
     148& sim \ : E \times E \times O \times O \rightarrow [0,1]
     149\end{align*}
     150\end{defcap}
     151
     152This elegant abstraction introduced with the \var{similarity} function provides a general model that can accommodate a broad range of comparison relationships and corresponding similarity measures. And here, again, we encounter a broad range of possible approaches.
     153
     154\cite{ehrig2004qom} lists a number of basic features and corresponding similarity measures:
     155Starting from primitive data types, next to value equality, string similarity, edit distance or, in general, relative distance can be computed.
     156For concepts, next to the directly applicable unambiguous \code{sameAs} statements, label similarity can be determined (again either as string similarity, but also broadened by employing external taxonomies and other semantic resources like WordNet -- \emph{extensional} methods), as well as equal (shared) class instances, shared superclasses, subclasses and properties.
     157
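For labels, one common instantiation is a similarity normalized from the edit distance (a sketch; \var{ed} denoting the edit distance and \var{|s|} the length of a string -- notation assumed here):

\begin{defcap}[!ht]
\caption{string \var{similarity} based on edit distance (sketch)}
\begin{align*}
& sim_{string}(s_{1},s_{2}) = 1 - \frac{ed(s_{1},s_{2})}{max(|s_{1}|,|s_{2}|)}
\end{align*}
\end{defcap}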
     158\cite{Shvaiko2005_classification} distinguishes element-level (terminological) from structure-level (structural) matching techniques,
     159the latter based on background knowledge: subclass--superclass relationships, domains and ranges of properties, or analysis of the graph structure of the ontology.
     160For properties, the degree of equality of super- and subproperties and overlapping domains and/or ranges can be considered.
     165Additionally to these measures applicable on individual ontology items, there are approaches (like the \var{Similarity Flooding algorithm} \cite{melnik2002similarity}) to propagate computed similarities across the graph defined by relations between entities (primarily subsumption hierarchy).
     166
     167\cite{Algergawy2010} classifies, reviews, and experimentally compares major methods of element similarity measures and their combinations. \cite{shvaiko2012ontology}, comparing a number of recent systems, finds that ``semantic and extensional methods are still rarely employed. In fact, most of the approaches are quite often based only on terminological and structural methods.''
     168
     169\cite{Ehrig2006} employs this \var{similarity} function over single entities to derive the notion of \var{ontology similarity} as ``based on similarity of pairs of single entities from the different ontologies''. This is operationalized as some kind of aggregating function\cite{ehrig2004qom} that combines all similarity measures (mostly modulated by custom weighting) computed for pairs of single entities into one value (from the \var{[0,1]} range) expressing the similarity ratio of the two ontologies being compared. (The employment of weights allows applying machine learning approaches for optimization of the results.)
     170
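Such an aggregation could take the following shape (a sketch; $w_{k}$ denoting the weight assigned to the individual similarity measure $sim_{k}$ -- notation assumed here):

\begin{defcap}[!ht]
\caption{aggregated \var{similarity} with weights (sketch)}
\begin{align*}
& sim_{agg}(e,f) = \sum_{k} w_{k} \cdot sim_{k}(e,f) \text{, with } \sum_{k} w_{k} = 1 \\
& align(e) = f \text{, if } sim_{agg}(e,f) \ > \ t
\end{align*}
\end{defcap}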
     171Thus, \var{ontology similarity} is a much weaker assertion than \var{ontology alignment}; in fact, the computed similarity is interpreted to assert ontology alignment: an aggregated similarity above a defined threshold indicates an alignment.
     172
     173
     174As to the alignment process, \cite{Ehrig2006} distinguishes the following steps:
     175\begin{enumerate}
     176\item Feature Engineering
     177\item Search Step Selection
     178\item Similarity Assessment
     179\item Interpretation
     180\item Iteration
     181\end{enumerate}
     182
     183In contrast, \cite{jimenez2012large} in their system \xne{LogMap2} reduce the process to just two steps: computation of mapping candidates (maximise recall) and assessment of the candidates (maximize precision), which however correspond to steps 2 and 3 of the above procedure; in fact, the other steps are implicitly present in the described system.
     184
    72185
    73186\subsubsection{Systems}
    74 A number of existing systems for schema/ontology matching/alignment is mentioned in this overview publications:
    75 
    76 The majority of tools for ontology mapping use some sort of structural or
    77 definitional information to discover new mappings. This information includes
    78 such elements as subclass–superclass relationships, domains and ranges of
    79 properties, analysis of the graph structure of the ontology, and so on. Some of
    80 the tools in this category include
    81 
    82 IF-Map\cite{kalfoglou2003if}
    83 
    84 QOM\cite{ehrig2004qom},
    85 
    86 Similarity Flooding\cite{melnik}
    87 
    88 the Prompt tools \cite{Noy2003_theprompt} integrating with Protege
    89 
    90 
    91 \xne{COMA++} \cite{Aumueller2005}  composite approach to combine different match algorithms, user interaction via graphical interface , supports W3C XML Schema and OWL.
    92 
    93 \xne{FOAM}\cite{EhrigSure2005}
    94 
    95 
    96 Ontology matching system \xne{LogMap 2} \cite{jimenez2012large} supports user interaction and implements scalable reasoning and diagnosis algorithms, which minimise any logical inconsistencies introduced by the matching process.
    97 The process is divided into two main logical phases: computation of mapping candidates (maximise recall) and assessment of the candidates (maximize precision).
    98 
    99 
    100         s which are at the core of the mapping task.
    101 
    102 
    103 On the dedicated platform OAEI\footnote{Ontology Alignment Evalution Intiative - \url{http://oaei.ontologymatching.org/}} an ongoing effort is being carried out and documented comparing various alignment methods applied on different domains.
    104 
    105 
    106 One more specific recent inspirational work is that of Noah et. al \cite{Noah2010} developing a semantic digital library for an academic institution. The scope is limited to document collections, but nevertheless many aspects seem very relevant for this work, like operating on document metadata, ontology population or sophisticated querying and searching.
    107 
    108 Matching is laborious and error-prone process, and once ontology
    109 mappings are discovered, i
    110 
    111 \subsection{MOVEOUT: Application of Schema Matching on the CMD domain}
    112 Notice, that this the semantic interoperability layer built into the core of the CMD Infrastructure, integrates the
    113 task of identifying semantic correspondences directly into the process of schema creation,
    114 largely removing the need for complex schema matching/mapping techniques in the post-processing.
    115 However this is only holds for schemas already created within the CMD framework,
    116 Given the growing universe of definitions (data categories and components) in the CMD framework the metadata modeller could very well profit from applying schema mapping techniques as pre-processing step in the task of integrating existing external schemas into the infrastructure. User involvement is identified by \cite{shvaiko2012ontology} as one of promising future challenges to ontology matching
    117 
    118 Such a procedure pays tribute to the fact, that the mapping techniques are mostly error-prone and can deliver reliable 1:1 alignments only in trivial cases. This lies in the nature of the problem, given the heterogenity of the schemas present in the data collection, full alignments are not achievable at all, only parts of individual schemas actually semantically correspond.
    119 
    120 Once all the equivalencies (and other relations) between the profiles/schemas were found, simliarity ratios can be determined.
    121 
    122 The task can be also seen as building bridge between XML resources and semantic resources expressed in RDF, OWL.
    123 This speaks for a tool like COMA++ supporting both W3C standards:  XML Schema and OWL.
    124 Concentration on existing systems with user interface?
    125 
    126 The process of expressing the whole of the data as one semantic resource, can be also understood as schema or ontology merging task. Data categories being the primary mapping elements
    127 
    128 
    129 In the end
    130 It is also not the goal to merge
    131 
    132 Being only a pre-processing step meant to provide suggestions to the human modeller implies higher importance to recall than to precision.
    133 
    134 
    135 
    136 infrastructure un
    137 
    138 This approach of integrating prerequisites for semantic interoperability directly into the process of metadata creation is fundamentally different from the traditional methods of schema matching that try to establish pairwise alignments between already existing schemas -- be it algorithm-based or by means of explicit manually defined crosswalks\cite{Shvaiko2005}.
    139 
    140 
    141 Application of ontology/schema matching/mapping techniques
    142 is reduced or outsourced
    143 
    144 
    145 \subsection{Existing Crosswalk services}
    146 
    147 
    148 \url{http://www.oclc.org/developer/services/metadata-crosswalk-service}
    149 
    150 
    151 VoID "Vocabulary of Interlinked Datasets") is an RDF based schema to describe linked datasets\furl{http://semanticweb.org/wiki/VoID}
    152 
    153 http://www.dnb.de/rdf
    154 
    155 
    156 the entire WorldCat cataloging collection made publicly
    157 available using Schema.org mark-up with library extensions for use by developers and
    158 search partners such as Bing, Google, Yahoo! and Yandex
    159 
    160 OCLC begins adding linked data to WorldCat by appending
    161 Schema.org descriptive mark-up to WorldCat.org pages, thereby
    162 making OCLC member library data available for use by intelligent
    163 Web crawlers such as Google and Bing
     187A number of existing systems for schema/ontology matching/alignment are collected in the above-mentioned overview publications:
     188
     189IF-Map \cite{kalfoglou2003if}, QOM \cite{ehrig2004qom}, \xne{FOAM} \cite{EhrigSure2005}, Similarity Flooding (SF) \cite{melnik}, S-Match \cite{Giunchiglia2007_semanticmatching}, the Prompt tools \cite{Noy2003_theprompt} integrating with Protégé or \xne{COMA++} \cite{Aumueller2005}, \xne{Chimaera}. Additionally, \cite{shvaiko2012ontology} lists and evaluates some more recent contributions: \xne{SAMBO, Falcon, RiMOM, ASMOV, Anchor-Flood, AgreementMaker}.
     190
     191All of the tools use multiple methods as described in the previous section, exploiting both element-level and structural features and applying some kind of composition or aggregation of the computed atomic measures to arrive at an alignment assertion.
     192
     193Next to OWL as the input format supported by all the systems, some also accept XML Schemas (\xne{COMA++, SF, Cupid, SMatch});
     194some provide a GUI (\xne{COMA++, Chimaera, PROMPT, SAMBO, AgreementMaker}).
     195
     196Scalability is one factor to be considered, given that in a baseline scenario (before considering efficiency optimisations in candidate generation) the space of possible candidate mappings is the Cartesian product of the entities of the two ontologies being aligned. The authors of the (refurbished) ontology matching system \xne{LogMap 2} \cite{jimenez2012large} hold that it implements scalable reasoning and diagnosis algorithms, performant enough to be integrated with the provided user interaction.
     197
     198
    164199
    165200%%%%%%%%%%%%%%%%%%%%%%%%%%5
     
    168203The Linked Data paradigm\cite{TimBL2006} for publishing data on the web is increasingly being taken up by data providers across many disciplines \cite{bizer2009linked}. \cite{HeathBizer2011} gives a comprehensive overview of the principles of Linked Data with practical examples and current applications.
    169204
    170 \subsection{Semantic Web - Technical solutions / Server applications}
     205\subsubsection{Semantic Web - Technical solutions / Server applications}
     206
    171207
    172208The provision of the produced semantic resources on the web requires technical solutions to store the RDF triples, query them efficiently
     
    183219Another solution worth examining is the \xne{Linked Media Framework}\furl{http://code.google.com/p/lmf/} -- ``easy-to-setup server application that bundles together three Apache open source projects to offer some advanced services for linked media management'': publishing legacy data as linked data, semantic search by enriching data with content from the Linked Data Cloud, using SKOS thesaurus for information extraction.
    184220
     221One more specific work is that of Noah et al. \cite{Noah2010}, developing a semantic digital library for an academic institution. The scope is limited to document collections, but nevertheless many aspects seem very relevant for this work, like operating on document metadata, ontology population or sophisticated querying and searching.
     222
     223
    185224\begin{comment}
    186225LDpath\furl{http://code.google.com/p/ldpath/}
     
    192231\end{comment}
    193232
    194 \subsection{Ontology Visualization}
     233\subsubsection{Ontology Visualization}
    195234
    196235Landscape, Treemap, SOM
     
    199238
    200239
     240%%%%%%%%%%%%%%%%
    201241\section{Language and Ontologies}
    202242
     
    245285and Specification of Data Categories, Data Category Registry (ISO 12620:2009 \cite{ISO12620:2009})
    246286
    247 Lexical Markup Framework LMF \cite{Francopoulo2006LMF, ISO24613:2008} defines a metamodel for representing data in lexical databases used with monolingual and multilingual computer applications.
     287Lexical Markup Framework LMF \cite{Francopoulo2006LMF, ISO24613:2008} defines a metamodel for representing data in lexical databases used with monolingual and multilingual computer applications; reportedly it also provides an RDF serialization (to be verified).
    248288
    249289An overview of current developments in application of the linked data paradigm for linguistic data collections was given at the  workshop Linked Data in Linguistics\furl{http://ldl2012.lod2.eu/} 2012 \cite{ldl2012}.
  • SMC4LRT/chapters/Results.tex

    r3671 r3680  
    178178
    179179\subsubsection{teiHeader}
    180 
     180\label{results:tei}
    181181TEI is a de-facto standard for encoding any kind of textual resources. It defines a set of elements to annotate individual aspects of the text being encoded. For the purposes of text description / metadata the complex element \code{teiHeader} is foreseen.
    182182TEI does not provide just one fixed schema, but allows for a certain flexibility with respect to the elements used and the inner structure, allowing to generate custom schemas adapted to projects' needs (cf. \ref{def:tei}).
     
    281281%%%%%%%%%%%%%%%%%%%%%%%
    282282\subsection{SMC cloud}
     283\label{sec:smc-cloud}
    283284As a latest, still experimental, addition, SMC browser provides a special type of graph, that displays only profiles. The links between them reflect the reuse of components and data categories (i.e. how many components or data categories do the linked pairs of profiles share), indicating the degree of similarity or semantic proximity. Figure \ref{fig:SMC_cloud} depicts one possible output of the graph
    284285covering a large part of the defined profiles. It shows nicely the clusters of strongly related profiles in contrast to the greater distances between more loosely connected profiles.
  • SMC4LRT/utils.tex

    r3666 r3680  
    1919\usepackage{framed}
    2020\usepackage{amsmath}
    21 %\usepackage{tabularx}
     21\usepackage{tabularx}
    2222\usepackage{tabu}
    2323       