source: SMC4LRT/chapters/Literature.tex @ 3551

Last change on this file since 3551 was 3551, checked in by vronk, 11 years ago

intermediate version - ongoing work on introduction

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{State of the Art}
\label{ch:lit}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

This work is guided by two main dimensions: the \textbf{data}, broadly speaking Language Resources and Technology (LRT), and the \textbf{method}, schema matching and Semantic Web technologies. This division is reflected in the structure of this chapter.

\section{(Infrastructure for) Language Resources and Technology}
In recent years, multiple large-scale initiatives have been launched to combat the fragmented nature of the language resources landscape in general and the metadata interoperability problems in particular.

The CLARIN project provides a valuable source of information on the normative resources in the domain in its current deliverable on \textit{Interoperability and Standards} \cite{CLARIN_D5.C-3}. Besides covering ontologies as one type of resource, this document offers an exhaustive collection of references to standards, vocabularies and other normative and standardization work in the field of Language Resources and Technology.

Regarding existing domain-specific semantic resources, \texttt{LT-World}\footnote{\url{http://www.lt-world.org/}}, the ontology-based portal covering primarily Language Technology that is being developed at DFKI\footnote{\textit{Deutsches Forschungszentrum f\"ur K\"unstliche Intelligenz} - \url{http://www.dfki.de}}, is a prominent resource providing information about the entities (institutions, persons, projects, tools, etc.) in this field of study \cite{Joerg2010}.
Chapter~\ref{ch:data} examines the field of LRT in more detail.


\subsection{Metadata}
A comprehensive architecture for harmonized handling of metadata, the Component Metadata Infrastructure (CMDI)\footnote{\url{http://www.clarin.eu/cmdi}} \cite{Broeder2011}, is being implemented within the CLARIN project\footnote{\url{http://clarin.eu}}. This service-oriented architecture consists of a number of interacting software modules and allows metadata creation and provision based on a flexible meta model, the \emph{Component Metadata Framework}. The framework facilitates the creation of customized metadata schemas, acknowledging that no single metadata schema can cover the large variety of language resources and usage scenarios. At the same time, it provides well-defined methods to ground the semantic interpretation of these schemas in a community-wide controlled vocabulary, the data category registry \cite{Kemps-Snijders+2009,Broeder2010}.

Individual components of this infrastructure are described in more detail in Chapter~\ref{ch:infra}.

A number of solutions have evolved in recent years.
The first to undertake standardization efforts for the exchange of catalog information were digital libraries.

Relevant building blocks here are Z39.50 as the base protocol, WorldCat, and mapping/configuration files.
These catalogs are further described in Section~\ref{sec:other-md-catalogs}.

In recent years, the evolving research infrastructures have all identified a common, harmonized search as a crucial component of the system and have come up with a number of solutions. However, these are often limited to collecting metadata, reducing it to Dublin Core and offering a Lucene/Solr-based faceted search.
These catalogs are further described in Section~\ref{sec:lrt-md-catalogs}.

Riley and Becker \cite{Riley2010seeing} arrange the overwhelming number of existing metadata standards into a systematic, comprehensive overview, analyzing the use of standards along four aspects: community, domain, function, and purpose.

\subsection{Content Repositories}
Metadata is only one aspect of the availability of resources. It is the first step to announce and describe the resources. However, it is of little value if the resources themselves are not equally well accessible. Thus, another pillar of the CLARIN infrastructure is formed by content repositories, i.e. centres that ensure the availability of resources.
In the following, a few well-established repositories are listed, as well as some of the new repositories being set up in the context of CLARIN.

\begin{description}
\item[PHAIDRA] Permanent Hosting, Archiving and Indexing of Digital Resources and Assets, provided by Vienna University\footnote{\url{https://phaidra.univie.ac.at/}}
\item[eSciDoc] provided by MPG and FIZ Karlsruhe\footnote{\url{https://www.escidoc.org/}}
\item[TextGrid] \todocode{install: TextGrid2 - check: TG-search}\furl{http://textgrid.de}
\item[DRIVER] pan-European infrastructure of Digital Repositories\footnote{\url{http://www.driver-repository.eu/}}
\item[OpenAIRE] Open Access Infrastructure for Research in Europe\footnote{\url{http://www.openaire.eu/}}
\end{description}

\subsection{Content/Corpus Search}
Corpus search systems:
\begin{description}
\item[DDC] - text-corpus
\item[manatee] - text-corpus
\item[CQP] - text-corpus
\item[TROVA] - MM annotated resources
\item[ELAN] - MM annotated resources (editor + search)
\end{description}

\subsection{Federated Search}
\todoask{How to relate Federated Search to SMC? }


\section{Semantic Web}

\todoin{cite TimBL}

\begin{description}
\item[RDF/OWL]
\item[SKOS]
\end{description}


\subsection{Linked Open Data}
As described previously, one outcome of this work will be a dataset expressed in RDF and interlinked with other semantic resources.
This is very much in line with the broad \textit{Linked Open Data} effort as proposed by Berners-Lee \cite{TimBL2006} and being pursued across many disciplines. (This topic is also supported by the EU Commission within FP7.\footnote{\url{http://cordis.europa.eu/fetch?CALLER=PROJ\_ICT&ACTION=D&CAT=PROJ&RCN=95562}}) A recent comprehensive overview of the principles of Linked Data and its current applications is the book by Heath and Bizer \cite{HeathBizer2011}, which shall serve as a practical guide for this specific task.

Formats:
Turtle \furl{http://www.w3.org/TeamSubmission/turtle/\#sec-grammar-comments}
RDFa\furl{http://en.wikipedia.org/wiki/RDFa}
EDM\furl{http://europeana.ontotext.com/resource/edm/hasType?role=all}


\todocite{http://ldl2012.lod2.eu/program/proceedings}
\todoin{check LDpath}\furl{http://code.google.com/p/ldpath/}


\subsection{Schema / Ontology Mapping}
As the main contribution shall be the application of \emph{ontology mapping} techniques and technology, a comprehensive overview of this field and its current developments is paramount. There seems to be a plethora of work on the topic, and the difficult task will be to single out the relevant contributions. The starting point for the investigation will be the overview of the field by Kalfoglou \cite{Kalfoglou2003} and a more recent summary of the key challenges by Shvaiko and Euzenat \cite{Shvaiko2008}.

In their rather theoretical work, Ehrig and Sure \cite{EhrigSure2004} elaborate on the various similarity measures that are at the core of the mapping task. The OAEI\footnote{Ontology Alignment Evaluation Initiative - \url{http://oaei.ontologymatching.org/}} is a dedicated platform on which an ongoing effort to compare various alignment methods applied to different domains is carried out and documented.
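
As a minimal illustration of how such similarity measures can be combined (a generic sketch, not necessarily the exact formulation used in \cite{EhrigSure2004}), assume two entities $e_1$ and $e_2$ from the ontologies to be mapped, $n$ individual measures $\mathit{sim}_k(e_1,e_2) \in [0,1]$ (e.g. label similarity or structural similarity) and hypothetical weights $w_k \ge 0$ with $\sum_{k=1}^{n} w_k = 1$. An aggregated similarity can then be written as
\[
\mathit{sim}(e_1,e_2) = \sum_{k=1}^{n} w_k \, \mathit{sim}_k(e_1,e_2),
\]
and a correspondence between $e_1$ and $e_2$ is proposed if $\mathit{sim}(e_1,e_2) \ge t$ for some threshold $t$. For example, with an assumed label similarity of $0.9$ weighted by $0.6$ and a structural similarity of $0.5$ weighted by $0.4$, the aggregated value is $0.6 \cdot 0.9 + 0.4 \cdot 0.5 = 0.74$, which would be accepted for an assumed threshold of $t = 0.7$. The weights, the threshold and the choice of individual measures are illustrative assumptions only.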

One more specific, recent and inspiring work is that of Noah et al. \cite{Noah2010}, who developed a semantic digital library for an academic institution. The scope is limited to document collections, but many aspects nevertheless seem very relevant for this work, such as operating on document metadata, ontology population, and sophisticated querying and searching.

\todoin{check if relevant: http://schema.org/}

\subsection{Existing Crosswalk Services}

\url{http://www.oclc.org/developer/services/metadata-crosswalk-service}

\url{http://semanticweb.org/wiki/VoID}
\url{http://www.dnb.de/rdf}

\subsection{Ontology Visualization}


\subsection{Linguistic Ontologies}

A special case are linguistic ontologies such as ISOcat, GOLD and WALS.info, i.e.
ontologies conceptualizing the linguistic domain.

They are special in that (``ontologized'') lexicons refer to them to describe linguistic properties of the lexical entries, as opposed to linking to domain ontologies to anchor senses/meanings.
Lexicalized ontologies: LingInfo, lemon: LMF + ISOcat/GOLD + domain ontology

a) as domain ontologies, describing aspects of the resources\\
b) as linguistic ontologies, enriching the lexicalization of concepts

Ontology and Lexicon \cite{Hirst2009}

LingInfo/lemon \cite{Buitelaar2009}

We should not need linguistic ontologies (LingInfo, lemon). They are primarily relevant for the task of ontology population from texts (ontology learning, ontology-based semantic annotation of text), where entities can be encountered in various word forms in the context of the text.
In contrast, we are dealing with highly structured data in which entities are referenced in their nominal(?) form.



\section{Summary}
This chapter first outlined current developments regarding the infrastructures for Language Resources and Technology and then gave an overview of the state of the art regarding the methods to be applied in this work: Semantic Web technologies, ontology mapping and ontology visualization.