%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{State of the Art}
\label{ch:lit}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

This work is guided by two main dimensions: the \textbf{data}, broadly speaking Language Resources and Technology (LRT), and the \textbf{method}, namely schema matching and Semantic Web technologies. This division is reflected in the structure of this chapter.

\section{(Infrastructure for) Language Resources and Technology}
In recent years, multiple large-scale initiatives have been launched to address the fragmented nature of the language resources landscape in general and the metadata interoperability problems in particular.

The CLARIN project also provides a valuable source of information on the normative resources of the domain in its deliverable on \textit{Interoperability and Standards} \cite{CLARIN_D5.C-3}. Besides covering ontologies as one type of resource, this document offers an exhaustive collection of references to standards, vocabularies, and other normative and standardization work in the field of Language Resources and Technology.

Regarding existing domain-specific semantic resources, \texttt{LT-World}\footnote{\url{http://www.lt-world.org/}}, an ontology-based portal developed at DFKI\footnote{\textit{Deutsches Forschungszentrum für Künstliche Intelligenz} -- \url{http://www.dfki.de}} that primarily covers Language Technology, is a prominent resource providing information about the entities of this field of study (institutions, persons, projects, tools, etc.) \cite{Joerg2010}.
Chapter \ref{ch:data} examines the field of LRT in more detail.

\subsection{Metadata}
A comprehensive architecture for the harmonized handling of metadata, the Component Metadata Infrastructure (CMDI)\furl{http://www.clarin.eu/cmdi} \cite{Broeder2011}, is being implemented within the CLARIN project\footnote{\url{http://clarin.eu}}. This service-oriented architecture, consisting of a number of interacting software modules, supports the creation and provision of metadata based on a flexible meta model, the \emph{Component Metadata Framework}. The framework facilitates the creation of customized metadata schemas, acknowledging that no single metadata schema can cover the large variety of language resources and usage scenarios. At the same time, it is equipped with well-defined methods to ground the semantic interpretation of these schemas in a community-wide controlled vocabulary, the data category registry \cite{Kemps-Snijders+2009,Broeder2010}.

Individual components of this infrastructure are described in more detail in Chapter \ref{ch:infra}.
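The grounding of schema elements in the data category registry can be illustrated with a minimal Python sketch, assuming two hypothetical CMD profiles in which every element carries a concept link to a data category. The profile contents and the \texttt{dcr:} identifiers are invented placeholders, not actual registry entries. Elements with different names but the same data category are treated as semantically equivalent.

```python
from collections import defaultdict

# Two hypothetical CMD profiles. Each maps an element name to the data
# category its concept link points to. The "dcr:" identifiers are
# invented placeholders, not real registry entries.
PROFILE_A = {
    "Language": "dcr:languageName",
    "Title": "dcr:resourceTitle",
    "Creator": "dcr:author",
}
PROFILE_B = {
    "LanguageName": "dcr:languageName",
    "ResourceName": "dcr:resourceTitle",
    "Annotator": "dcr:annotator",
}


def equivalent_elements(a, b):
    """Return sorted (element in a, element in b) pairs sharing a data category."""
    # Index profile b by data category so each lookup is a dict access.
    by_concept = defaultdict(list)
    for element, concept in b.items():
        by_concept[concept].append(element)
    return sorted(
        (element_a, element_b)
        for element_a, concept in a.items()
        for element_b in by_concept.get(concept, [])
    )


if __name__ == "__main__":
    for pair in equivalent_elements(PROFILE_A, PROFILE_B):
        print(pair)
```

Under these assumptions, \texttt{Language} and \texttt{LanguageName} are related despite their different names, while \texttt{Creator} and \texttt{Annotator} remain unmatched because they point to different (hypothetical) data categories.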

A number of solutions for harmonized metadata search have evolved in recent years.
The first to undertake standardization efforts for the exchange of catalog information were digital libraries, building on Z39.50 as the base protocol, on union catalogs such as WorldCat, and on mapping/configuration files.
These catalogs are described in more detail in Section \ref{sec:other-md-catalogs}.

In recent years, the emerging research infrastructures have all identified a common, harmonized search as a crucial component of their systems and have come up with a number of solutions. These, however, are often limited to collecting metadata, reducing it to Dublin Core, and offering a faceted search based on Lucene/Solr.
These catalogs are described in more detail in Section \ref{sec:lrt-md-catalogs}.
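The loss of information in such a reduction can be made concrete with a minimal Python sketch: harvested records are mapped to Dublin Core via a hypothetical field mapping, and the values of one element are counted as a faceted search interface would display them. All field names and records are invented for illustration; an actual catalog would delegate indexing and faceting to Lucene/Solr.

```python
from collections import Counter

# Hypothetical mapping from source-specific field names to Dublin Core elements.
TO_DUBLIN_CORE = {
    "Title": "title",
    "ResourceName": "title",
    "Language": "language",
    "LanguageName": "language",
    "Creator": "creator",
}


def to_dublin_core(record):
    """Reduce a record to Dublin Core; fields without a mapping are dropped."""
    dc = {}
    for field, value in record.items():
        element = TO_DUBLIN_CORE.get(field)
        if element is not None:
            dc.setdefault(element, []).append(value)
    return dc


def facet_counts(records, element):
    """Count the values of one Dublin Core element across reduced records."""
    counts = Counter()
    for record in records:
        counts.update(record.get(element, []))
    return counts


if __name__ == "__main__":
    harvested = [
        {"Title": "Corpus A", "Language": "German", "SamplingRate": "44.1 kHz"},
        {"ResourceName": "Lexicon B", "LanguageName": "German"},
        {"Title": "Corpus C", "Language": "Dutch"},
    ]
    reduced = [to_dublin_core(r) for r in harvested]
    print(reduced[0])  # the SamplingRate field is lost in the reduction
    print(facet_counts(reduced, "language"))
```

Here the first record loses its \texttt{SamplingRate} field because the mapping has no Dublin Core target for it, which illustrates why such a reduction limits the precision of search.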

Riley and Becker \cite{Riley2010seeing} organize the overwhelming number of existing metadata standards into a systematic, comprehensive overview, analyzing the use of standards from four aspects: community, domain, function, and purpose.

\subsection{Content Repositories}
Metadata is only one aspect of the availability of resources: it is the first step to announce and describe them, but it is of little value if the resources themselves are not equally accessible. Therefore, content repositories, i.e., centres that ensure the availability of resources, form another pillar of the CLARIN infrastructure.
The following list describes a few well-established repositories as well as some of the new repositories being set up in the context of CLARIN.

\begin{description}
\item[PHAIDRA] Permanent Hosting, Archiving and Indexing of Digital Resources and Assets, provided by the University of Vienna\footnote{\url{https://phaidra.univie.ac.at/}}
\item[eSciDoc] provided by the Max Planck Society (MPG) and FIZ Karlsruhe\footnote{\url{https://www.escidoc.org/}}
\item[TextGrid] \todocode{install: TextGrid2 - check: TG-search}\furl{http://textgrid.de}
\item[DRIVER] pan-European infrastructure of digital repositories\footnote{\url{http://www.driver-repository.eu/}}
\item[OpenAIRE] Open Access Infrastructure for Research in Europe\footnote{\url{http://www.openaire.eu/}}
\end{description}

\subsection{Content/Corpus Search}
Existing corpus search systems include:
\begin{description}
\item[DDC] text corpora
\item[manatee] text corpora
\item[CQP] text corpora
\item[TROVA] annotated multimedia resources
\item[ELAN] annotated multimedia resources (annotation editor and search)
\end{description}

\subsection{Federated Search}
\todoask{How to relate Federated Search to SMC? }

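\section{Semantic Web}

\todoin{cite TimBL}

\begin{description}
\item[RDF/OWL] The Resource Description Framework (RDF) is the W3C data model representing information as subject--predicate--object triples; the Web Ontology Language (OWL) builds on RDF to express ontologies with formal semantics.
\item[SKOS] The Simple Knowledge Organization System, a W3C vocabulary expressed in RDF for representing thesauri, classification schemes, and other controlled vocabularies.
\end{description}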
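The RDF data model and the SKOS vocabulary can be illustrated with a minimal Python sketch that represents statements as (subject, predicate, object) triples and computes all concepts reachable via \texttt{skos:broader}. In SKOS, \texttt{skos:broader} itself is not declared transitive; the closure computed here corresponds to its transitive super-property \texttt{skos:broaderTransitive}. The \texttt{ex:} concepts are hypothetical, and a real application would use an RDF library rather than plain tuples.

```python
# RDF statements as (subject, predicate, object) triples using prefixed names.
# The "ex:" concept scheme is a hypothetical example.
BROADER = "skos:broader"

TRIPLES = {
    ("ex:spokenCorpus", BROADER, "ex:corpus"),
    ("ex:corpus", BROADER, "ex:languageResource"),
    ("ex:lexicon", BROADER, "ex:languageResource"),
    ("ex:corpus", "skos:prefLabel", '"corpus"@en'),
}


def broader_transitive(concept, triples):
    """Return all concepts reachable from `concept` by following skos:broader."""
    reached, stack = set(), [concept]
    while stack:
        current = stack.pop()
        for subject, predicate, obj in triples:
            # Follow only skos:broader edges; the `reached` check avoids cycles.
            if subject == current and predicate == BROADER and obj not in reached:
                reached.add(obj)
                stack.append(obj)
    return reached


if __name__ == "__main__":
    print(sorted(broader_transitive("ex:spokenCorpus", TRIPLES)))
```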

\subsection{Linked Open Data}
As described previously, one outcome of this work will be a dataset expressed in RDF and interlinked with other semantic resources.
This is very much in line with the broad \textit{Linked Open Data} effort proposed by Berners-Lee \cite{TimBL2006} and pursued across many disciplines. (The topic is also supported by the European Commission within FP7.\footnote{\url{http://cordis.europa.eu/fetch?CALLER=PROJ\_ICT&ACTION=D&CAT=PROJ&RCN=95562}}) A recent comprehensive overview of the principles of Linked Data and its current applications is the book by Heath and Bizer \cite{HeathBizer2011}, which will serve as a practical guide for this task.
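Interlinking a dataset with other semantic resources typically means emitting links such as \texttt{owl:sameAs} between entities that denote the same thing. The following minimal Python sketch, assuming hypothetical \texttt{example.org} and \texttt{example.net} IRIs and a deliberately naive criterion (equality of normalized labels), produces such links in N-Triples syntax; it is an illustration, not a proposed linking method.

```python
import unicodedata

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalize(label):
    """Case-fold, strip accents and surrounding whitespace from a label."""
    decomposed = unicodedata.normalize("NFKD", label)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return without_marks.casefold().strip()


def same_as_links(local, external):
    """Yield N-Triples owl:sameAs statements for entities with equal normalized labels.

    Both arguments map an IRI to its label (hypothetical data).
    """
    index = {}
    for iri, label in external.items():
        index.setdefault(normalize(label), []).append(iri)
    for iri, label in sorted(local.items()):
        for target in sorted(index.get(normalize(label), [])):
            yield f"<{iri}> <{OWL_SAME_AS}> <{target}> ."


if __name__ == "__main__":
    local = {
        "http://example.org/institution/1": "Universität Wien",
        "http://example.org/institution/2": "Unknown Lab",
    }
    external = {"http://example.net/org/42": "universitat wien"}
    for line in same_as_links(local, external):
        print(line)
```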
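
Relevant formats and models include:
\begin{description}
\item[Turtle] Terse RDF Triple Language, a compact textual syntax for RDF\furl{http://www.w3.org/TeamSubmission/turtle/\#sec-grammar-comments}
\item[RDFa] RDF in attributes, for embedding RDF statements in (X)HTML\furl{http://en.wikipedia.org/wiki/RDFa}
\item[EDM] Europeana Data Model\furl{http://europeana.ontotext.com/resource/edm/hasType?role=all}
\end{description}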
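To illustrate how Turtle abbreviates RDF compared to a plain triple listing, the following minimal Python sketch serializes a few triples with \texttt{@prefix} declarations and groups statements about the same subject with \texttt{;}. The \texttt{ex:} namespace and the data are hypothetical; only the Dublin Core terms and OWL namespace IRIs are real. The sketch handles plain string literals and IRIs only (no language tags, datatypes, or blank nodes), so a proper RDF library remains preferable in practice.

```python
import re
from collections import defaultdict

# Namespace prefixes; "ex:" is a hypothetical namespace for this example.
PREFIXES = {
    "dcterms": "http://purl.org/dc/terms/",
    "owl": "http://www.w3.org/2002/07/owl#",
    "ex": "http://example.org/",
}
# Conservative subset of Turtle local names, so no escaping is needed.
SIMPLE_LOCAL_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")


class IRI(str):
    """Marks an object value as an IRI rather than a string literal."""


def compact(iri):
    """Abbreviate an IRI with a known prefix, else write it in angle brackets."""
    for prefix, namespace in PREFIXES.items():
        local = iri[len(namespace):]
        if iri.startswith(namespace) and SIMPLE_LOCAL_NAME.fullmatch(local):
            return f"{prefix}:{local}"
    return f"<{iri}>"


def term(value):
    """Render an object as a compacted IRI or an escaped string literal."""
    if isinstance(value, IRI):
        return compact(value)
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_turtle(triples):
    """Serialize (subject, predicate, object) triples, grouped by subject."""
    lines = [f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items()]
    by_subject = defaultdict(list)
    for subject, predicate, obj in triples:
        by_subject[subject].append(f"{compact(predicate)} {term(obj)}")
    for subject, statements in by_subject.items():
        lines.append("")
        lines.append(f"{compact(subject)} " + " ;\n    ".join(statements) + " .")
    return "\n".join(lines)


if __name__ == "__main__":
    ex, dc, owl = PREFIXES["ex"], PREFIXES["dcterms"], PREFIXES["owl"]
    print(to_turtle([
        (ex + "corpus1", dc + "title", "Spoken Corpus"),
        (ex + "corpus1", owl + "sameAs", IRI("http://example.net/resource/7")),
    ]))
```

Grouping by subject is what produces the \texttt{;}-separated predicate list; an N-Triples serialization of the same data would instead repeat the full subject IRI on every line.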

\todocite{http://ldl2012.lod2.eu/program/proceedings}
\todoin{check LDpath}\furl{http://code.google.com/p/ldpath/}

\subsection{Schema / Ontology Mapping}
Since the main contribution of this work will be the application of \emph{ontology mapping} techniques and technology, a comprehensive overview of this field and its current developments is essential. There is a plethora of work on the topic, and the difficult task will be to identify the relevant contributions. The starting point for the investigation will be the overview of the field by Kalfoglou \cite{Kalfoglou2003} and a more recent summary of the key challenges by Shvaiko and Euzenat \cite{Shvaiko2008}.

In their rather theoretical work, Ehrig and Sure \cite{EhrigSure2004} elaborate on the various similarity measures that are at the core of the mapping task. The dedicated Ontology Alignment Evaluation Initiative (OAEI)\footnote{\url{http://oaei.ontologymatching.org/}} carries out and documents an ongoing effort to compare various alignment methods applied to different domains.
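To make the notion of a similarity measure concrete, the following minimal Python sketch implements one simple, purely string-based measure, a normalized Levenshtein similarity between labels, and derives candidate correspondences by keeping, for each source label, the most similar target label above a threshold. This is a generic technique chosen for illustration, not the specific measures discussed by Ehrig and Sure; the labels and the threshold of 0.7 are assumptions.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Normalized similarity in [0, 1]; 1.0 means identical (case-insensitive)."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def align(source, target, threshold=0.7):
    """Map each source label to its most similar target label above the threshold."""
    alignment = {}
    for label in source:
        best = max(target, key=lambda candidate: similarity(label, candidate))
        if similarity(label, best) >= threshold:
            alignment[label] = best
    return alignment


if __name__ == "__main__":
    print(align(["Title", "Creator", "Langauge", "Subject"],
                ["title", "creator", "language", "date"]))
```

The misspelled \texttt{Langauge} is still matched (similarity 0.75), while \texttt{Subject} finds no target above the threshold. A practical matcher would combine several measures (e.g.\ structural or instance-based ones) and resolve conflicts globally rather than greedily per label.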

A more specific, recent, and inspiring work is that of Noah et al.\ \cite{Noah2010}, who develop a semantic digital library for an academic institution. Its scope is limited to document collections; nevertheless, many aspects appear highly relevant for this work, such as operating on document metadata, ontology population, and sophisticated querying and searching.

\todoin{check if relevant: http://schema.org/}

\subsection{Existing Crosswalk Services}

\begin{description}
\item[OCLC] Metadata Crosswalk Service: \url{http://www.oclc.org/developer/services/metadata-crosswalk-service}
\item[VoID] Vocabulary of Interlinked Datasets: \url{http://semanticweb.org/wiki/VoID}
\item[DNB] Linked Data service of the Deutsche Nationalbibliothek: \url{http://www.dnb.de/rdf}
\end{description}

\subsection{Ontology Visualization}

Candidate visualization techniques include landscape visualizations, treemaps, and self-organizing maps (SOM).
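As an illustration of the treemap idea, the following minimal Python sketch computes the classic slice-and-dice layout: each class of a hierarchy receives a rectangle, and its subclasses split that rectangle in proportion to their weights (here hypothetical instance counts), alternating between horizontal and vertical splits at each level. The class hierarchy and the weights are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A class in a (hypothetical) hierarchy with a weight, e.g. an instance count."""
    label: str
    weight: float
    children: list = field(default_factory=list)


def slice_and_dice(node, x, y, width, height, depth=0, out=None):
    """Return (label, x, y, width, height) rectangles for node and its descendants."""
    if out is None:
        out = []
    out.append((node.label, x, y, width, height))
    total = sum(child.weight for child in node.children)
    offset = 0.0
    for child in node.children:
        share = child.weight / total if total else 0.0
        if depth % 2 == 0:  # split horizontally at even depths
            slice_and_dice(child, x + offset * width, y, share * width, height, depth + 1, out)
        else:  # split vertically at odd depths
            slice_and_dice(child, x, y + offset * height, width, share * height, depth + 1, out)
        offset += share
    return out


if __name__ == "__main__":
    hierarchy = Node("LanguageResource", 4, [
        Node("Corpus", 3, [Node("SpokenCorpus", 2), Node("WrittenCorpus", 2)]),
        Node("Lexicon", 1),
    ])
    for rectangle in slice_and_dice(hierarchy, 0, 0, 100, 60):
        print(rectangle)
```

Slice-and-dice keeps sibling order stable but can produce very thin rectangles for deep or unbalanced hierarchies, which is one reason squarified variants are often preferred.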

\todoin{check Ontology Mapping and Alignement / saiks/Ontology4 4auf1.pdf}

\subsection{Linguistic Ontologies}

A special case are linguistic ontologies, i.e., ontologies conceptualizing the linguistic domain, such as ISOcat, GOLD, or WALS.info.

They are special in that (``ontologized'') lexicons refer to them to describe the linguistic properties of lexical entries, as opposed to linking to domain ontologies to anchor senses or meanings.
Examples of lexicalized ontologies are LingInfo and lemon; the latter combines LMF, ISOcat/GOLD, and a domain ontology.
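The distinction between the two kinds of reference can be sketched as a small Python data structure: a lexical entry points to a linguistic ontology for its grammatical properties and, via its senses, to a domain ontology for its meaning. The field names and all IRIs are hypothetical and only loosely modeled on the lexicon/ontology split described above; they do not reproduce the actual lemon or LingInfo vocabularies.

```python
from dataclasses import dataclass, field

# Hypothetical namespaces standing in for a linguistic ontology
# (e.g. a data category registry or GOLD) and for a domain ontology.
LING = "http://example.org/linguistic#"
DOMAIN = "http://example.org/domain#"


@dataclass
class Sense:
    reference: str  # IRI of the domain ontology entity this sense denotes


@dataclass
class LexicalEntry:
    written_form: str
    part_of_speech: str  # IRI in the linguistic ontology
    senses: list = field(default_factory=list)

    def linguistic_links(self):
        """IRIs describing linguistic properties of the entry itself."""
        return [self.part_of_speech]

    def domain_links(self):
        """IRIs anchoring the meanings of the entry in the domain ontology."""
        return [sense.reference for sense in self.senses]


entry = LexicalEntry(
    written_form="corpus",
    part_of_speech=LING + "Noun",
    senses=[Sense(reference=DOMAIN + "Corpus")],
)
```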

Linguistic ontologies can thus serve two purposes:\\
a) as domain ontologies, describing aspects of the resources;\\
b) as linguistic ontologies, enriching the lexicalization of concepts.

On the relation between ontology and lexicon, see \cite{Hirst2009}; on LingInfo and lemon, see \cite{Buitelaar2009}.

However, LingInfo and lemon should not be needed in this work. They are primarily relevant for ontology population from text (ontology learning, ontology-based semantic annotation of text), where entities can be encountered in various word forms depending on the context. In contrast, this work deals with highly structured data, in which entities are referenced in their nominal form.

\section{Summary}
This chapter reviewed, on the one hand, current developments regarding infrastructures for Language Resources and Technology and, on the other hand, the state of the art of the methods to be applied in this work: Semantic Web technologies, ontology mapping, and ontology visualization.