%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{State of the Art}
\label{ch:lit}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

In this chapter we give a short overview of the development of large research infrastructures (with a focus on those for language resources and technology), then examine in more detail the host of work (methods and systems) on schema/ontology matching, and review Semantic Web principles and technologies.

Note, though, that substantial parts of the state-of-the-art coverage are outsourced into separate chapters: a broad analysis of the data is provided in Chapter \ref{ch:data} and a detailed description of the underlying infrastructure is found in Chapter \ref{ch:infra}.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Research Infrastructures (for Language Resources and Technology)}
In recent years, multiple large-scale initiatives have set out to combat the fragmented nature of the language resources landscape in general and the metadata interoperability problems in particular.

The \xne{EAGLES/ISLE Meta Data Initiative} (IMDI) \cite{wittenburg2000eagles}, running from 2000 to 2003, proposed a standard for metadata descriptions of multi-media/multi-modal language resources, aiming at easing access to language resources and thus increasing their reusability.

\xne{FLaReNet}\furl{http://www.flarenet.eu/} -- Fostering Language Resources Network -- running from 2007 to 2010, concentrated rather on ``community and consensus building'', developing a common vision and mapping the field of LRT via a survey.

\xne{CLARIN} -- Common Language Resources and Technology Infrastructure -- a large research infrastructure providing scholars in the humanities and social sciences with sustainable access to digital language data, and especially its technical core, the Component Metadata Infrastructure (CMDI) -- a comprehensive architecture for harmonized handling of metadata \cite{Broeder2011} --
form the primary context of this work; therefore this underlying infrastructure is described in detail in Chapter \ref{ch:infra}.
Both above-mentioned projects can be seen as predecessors of CLARIN, the IMDI metadata model being one starting point for the development of CMDI.

More of a sister project is the initiative \xne{DARIAH} -- Digital Research Infrastructure for the Arts and Humanities\furl{http://dariah.eu}. It has a broader scope, but has many personal ties to CLARIN as well as similar problems and similar solutions. Therefore there are efforts to intensify the cooperation between these two research infrastructures for the digital humanities.

\xne{META-SHARE}\furl{http://meta-share.eu} is another multinational project aiming to build an infrastructure for language resources \cite{Piperidis2012meta}, however focusing more on the Human Language Technologies domain:

\begin{quotation}
META-NET is designing and implementing META-SHARE, a sustainable network of repositories of language data, tools and related web services documented with high-quality metadata, aggregated in central inventories allowing for uniform search and access to resources. Data and tools can be both open and with restricted access rights, free and for-a-fee.
\end{quotation}

See \ref{def:META-SHARE} for more details about META-SHARE's catalog and metadata format.


\subsubsection{Digital Libraries}

In a broader view we should also regard the activities in the world of libraries.
Having started already in the 1970s with connecting, exchanging and harmonizing their bibliographic catalogs, libraries certainly have a long tradition, a wealth of experience and stable solutions.

Mainly driven by national libraries, ever bigger aggregations of bibliographic data are being set up,
the biggest one being \xne{Worldcat}\furl{http://www.worldcat.org/} (totalling 273.7 million records \cite{OCLCAnnualReport2012}),
powered by OCLC, a cooperative of over 72,000 libraries worldwide.

In Europe, more recent initiatives have pursued similar goals:
\xne{The European Library}\furl{http://www.theeuropeanlibrary.org/tel4/} offers a search interface over more than 18 million digital items and almost 120 million bibliographic records from 48 national libraries and leading European research libraries.

\xne{Europeana}\furl{http://www.europeana.eu/} \cite{purday2009think} has an even broader scope, serving as meta-aggregator and portal for European digitised works, encompassing material not just from libraries, but also from museums, archives and all other kinds of collections (in fact, The European Library is the \emph{library aggregator} for Europeana). The auxiliary project \xne{EuropeanaConnect}\furl{http://www.europeanaconnect.eu/} (2009--2011) delivered the core technical components for Europeana as well as further services reusable in other contexts, e.g. the spatio-temporal browser \xne{GeoTemCo}\furl{https://github.com/stjaenicke/GeoTemCo} \cite{janicke2013geotemco}.

Most recently, with \xne{Europeana Cloud}\furl{http://pro.europeana.eu/web/europeana-cloud} (2013 to 2015) a successor to \xne{Europeana} was established: a Best Practice Network, coordinated by The European Library, designed to establish a cloud-based system for Europeana and its aggregators, providing new content, new metadata, a new linked storage system, new tools and services for researchers and a new platform -- Europeana Research.

The related catalogs and formats are described in Section \ref{sec:other-md-catalogs}.


\section{Existing crosswalks (services)}

Crosswalks, as lists of equivalent fields from two schemas, have been around for a long time -- in the world of enterprise systems, e.g. to bridge to legacy systems, and also in libraries, e.g. the \emph{MARC to Dublin Core Crosswalk}\furl{http://loc.gov/marc/marc2dc.html}.

\cite{Day2002crosswalks} lists a number of mappings between metadata formats, mostly between Dublin Core and the MARC family of formats, e.g. the Dublin Core to MARC crosswalk\furl{http://www.loc.gov/marc/dccross.html}. Such lists effectively constitute a static metadata crosswalk repository.

OCLC launched the \xne{Metadata Schema Transformation Services}\furl{http://www.oclc.org/research/activities/schematrans.html?urlm=160118}, in particular the \xne{Crosswalk Web Service}\furl{http://www.oclc.org/developer/services/metadata-crosswalk-service} (see also \url{http://www.oclc.org/research/activities/xwalk.html}), described as:

\begin{quotation}
[A] self-contained crosswalk utility that can be called by any application that must translate metadata records. In our implementation, the translation logic is executed by a dedicated XML application called the Semantic Equivalence Expression Language, or Seel, a language specification and a corresponding interpreter that transcribes the information in a crosswalk into an executable format.
\end{quotation}

The Crosswalk Web Service is now a production system that has been incorporated into a number of OCLC products and services.
However, the demo service is no longer available.\furl{http://errol.oclc.org/schemaTrans.oclc.org.search}

The offered formats, however, concentrate on those relevant for the library and information science community.

For this service, a metadata format is defined as a triple of:
\begin{itemize}
\item \emph{Standard} -- the metadata standard of the record (e.g. MARC, DC, MODS, etc.)
\item \emph{Structure} -- the structure in which the metadata is expressed in the record (e.g. XML, RDF, ISO 2709, etc.)
\item \emph{Encoding} -- the character encoding of the metadata (e.g. MARC8, UTF-8, Windows 1251, etc.)
\end{itemize}

As to the offered interface, the Crosswalk Web Service has four methods:
\begin{itemize}
\item \code{translate(...)} -- translates the records (see the documentation for more information).
\item \code{getSupportedSourceRecordFormats()} -- returns a list of formats that are supported as input formats.
\item \code{getSupportedTargetRecordFormats()} -- returns a list of formats that the input formats can be translated to.
\item \code{getSupportedJavaEncodings()} -- returns the list of character encodings that Java supports, as some formats support all of them.
\end{itemize}

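The interface described above can be mirrored in a minimal sketch. The class below is a hypothetical in-memory illustration (with Python naming conventions), not OCLC's actual implementation; the crosswalk table and field names are invented for the example.

```python
# Hypothetical sketch of a crosswalk service with the methods described
# above; the field mappings below are illustrative, not OCLC's.

CROSSWALKS = {
    # (source format, target format) -> {source field: target field}
    ("MARC", "DC"): {"245a": "title", "100a": "creator", "260c": "date"},
}

class CrosswalkService:
    def get_supported_source_record_formats(self):
        return sorted({src for src, _ in CROSSWALKS})

    def get_supported_target_record_formats(self):
        return sorted({tgt for _, tgt in CROSSWALKS})

    def translate(self, record, source_format, target_format):
        """Apply the crosswalk's field equivalences to one record."""
        mapping = CROSSWALKS[(source_format, target_format)]
        return {mapping[f]: v for f, v in record.items() if f in mapping}

service = CrosswalkService()
dc = service.translate({"245a": "Moby Dick", "100a": "Melville, H."},
                       "MARC", "DC")
```

Fields without an equivalence in the crosswalk are silently dropped, which is the typical lossy behaviour of such mappings.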
\section{Schema/Ontology Mapping/Matching}
\label{lit:schema-matching}

As Shvaiko \cite{shvaiko2012ontology} states, ``\emph{Ontology matching} is a solution to the semantic heterogeneity problem. It finds correspondences between semantically related entities of ontologies.''
As such, it provides a very suitable methodical foundation for the problem at hand -- the \emph{semantic mapping}. (In Sections \ref{sec:schema-matching-app} and \ref{sec:values2entities} we elaborate on the possible ways to apply these methods to the described problem.)

There is a plethora of work on methods and technology in the field of \emph{schema and ontology matching}, as witnessed by a sizable number of publications providing overviews, surveys and classifications of existing work \cite{Kalfoglou2003, Shvaiko2008, Noy2005_ontologyalignment, Noy2004_semanticintegration, Shvaiko2005_classification} and most recently \cite{shvaiko2012ontology, amrouch2012survey}.

%Shvaiko and Euzenat provide a summary of the key challenges\cite{Shvaiko2008} as well as a comprehensive survey of approaches for schema and ontology matching based on a proposed new classification of schema-based matching techniques\cite{}.

Shvaiko and Euzenat also run the web page \url{http://www.ontologymatching.org/} dedicated to this topic and the related OAEI\footnote{Ontology Alignment Evaluation Initiative -- \url{http://oaei.ontologymatching.org/}}, an ongoing effort to evaluate alignment tools based on various alignment tasks from different domains.

Interestingly, \cite{shvaiko2012ontology} somewhat self-critically asks whether, after years of research, ``the field of ontology matching [is] still making progress''.

\subsubsection{Method}

There are slight differences in the use of the terms between \cite{EhrigSure2004, Ehrig2006}, \cite{Euzenat2007} and \cite{amrouch2012survey}; especially, one has to be aware whether in a given context a term denotes the task in general, the process, the actual operation/function or the result of the function.

\cite{Euzenat2007} formalizes the problem as the ``ontology matching operation'':

\begin{quotation}
The matching operation determines an alignment A' for a pair of ontologies O1 and O2. Hence, given a pair of ontologies (which can be very simple and contain one entity each), the matching task is that of finding an alignment between these ontologies. [\dots]
\end{quotation}

But basically the different authors broadly agree on the definition of \var{ontology alignment}: in the meaning of \concept{task} it is ``to identify relations between individual elements of multiple ontologies'', in the meaning of \concept{result} it is ``a set of correspondences between entities belonging to the matched ontologies''.

More formally, \cite{Ehrig2006} formulates ontology alignment as ``a partial function based on the set \var{E} of all entities $e \in E$ and based on the set of possible ontologies \var{O}. [\dots] Once an alignment is established we say entity \var{e} is aligned with entity \var{f} when \var{align(e) \ = \ f}.'' Also, ``alignment is a one-to-one equality relation'' (although this is relativized further in the work, and also in \cite{EhrigSure2004}).

\begin{definition}{\var{align} function}
align \ : E \times O \times O \rightarrow E
\end{definition}

\cite{EhrigSure2004} and \cite{amrouch2012survey} instead introduce \var{ontology mapping} when applying the task to individual entities: in the meaning of a function it ``for each concept (node) in ontology A [tries to] find a corresponding concept (node), which has the same or similar semantics, in ontology B and vice versa''. In the meaning of a result it is a ``formal expression describing a semantic relationship between two (or more) concepts belonging to two (or more) different ontologies''.

\cite{EhrigSure2004} further specify the mapping function as based on a similarity function that, for a pair of entities from two (or more) ontologies, computes a ratio indicating the semantic proximity of the two entities.

\begin{defcap}[!ht]
\caption{\var{map} function for single entities and underlying \var{similarity} function}
\begin{align*}
& map \ : O_{i1} \rightarrow O_{i2} \\
& map( e_{i_{1}j_{1}}) = e_{i_{2}j_{2}}\text{, if } sim(e_{i_{1}j_{1}},e_{i_{2}j_{2}}) \ > \ t \text{ with } t \text{ being the threshold} \\
& sim \ : E \times E \times O \times O \rightarrow [0,1]
\end{align*}
\end{defcap}

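A direct reading of the \var{map}/\var{sim} scheme above can be sketched as follows; the token-overlap similarity and the threshold value are toy choices for illustration, not part of the cited formalization.

```python
# Sketch of the scheme above: map(e1) = e2 iff sim(e1, e2) > t.
# The label-overlap similarity (Jaccard) and threshold are toy choices.

def sim(e1, e2):
    """Similarity in [0,1]: here simple token overlap of entity labels."""
    t1, t2 = set(e1.lower().split()), set(e2.lower().split())
    return len(t1 & t2) / len(t1 | t2)

def map_entity(e1, target_entities, t=0.5):
    """Return the best-matching target entity if sim exceeds threshold t,
    otherwise None (map is a partial function)."""
    best = max(target_entities, key=lambda e2: sim(e1, e2))
    return best if sim(e1, best) > t else None

match = map_entity("resource title", ["title of resource", "creator name"])
```

Note that the threshold makes \var{map} partial: entities without a sufficiently similar counterpart remain unmapped.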
This elegant abstraction introduced with the \var{similarity} function provides a general model that can accommodate a broad range of comparison relationships and corresponding similarity measures. And here, again, we encounter a broad range of possible approaches.

\cite{ehrig2004qom} lists a number of basic features and corresponding similarity measures:
Starting from primitive data types, next to value equality, string similarity, edit distance or, in general, relative distance can be computed.
For concepts, next to the directly applicable unambiguous \code{sameAs} statements, label similarity can be determined (again as string similarity, but also broadened by employing external taxonomies and other semantic resources like WordNet -- \emph{extensional} methods), as well as equal (shared) class instances, shared superclasses, subclasses and properties.

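Two of the basic measures just mentioned can be sketched with the Python standard library: exact value equality is trivial, and string similarity can be approximated with \code{difflib}'s ratio, a normalized measure in $[0,1]$ based on matching subsequences (one of many possible choices, comparable in spirit to edit distance).

```python
# String similarity via difflib: ratio() returns 2*M/T where M is the
# number of matched characters and T the total length of both strings.
from difflib import SequenceMatcher

def string_similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

identical = string_similarity("Title", "title")            # 1.0
close = round(string_similarity("organisation", "organization"), 2)
```

For production matchers, dedicated edit-distance or n-gram measures would typically replace this quick approximation.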
\cite{Shvaiko2005_classification} distinguishes element-level (terminological) from structure-level (structural) techniques; further approaches draw on background knowledge, exploiting subclass--superclass relationships, domains and ranges of properties, or the analysis of the graph structure of the ontology.

For properties, the degree of equality of super- and subproperties and overlapping domains and/or ranges can be considered.
In addition to these measures applicable to individual ontology items, there are approaches (like the \var{Similarity Flooding} algorithm \cite{melnik2002similarity}) to propagate computed similarities across the graph defined by relations between entities (primarily the subsumption hierarchy).

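The propagation idea can be illustrated with a much simplified sketch (not the actual Similarity Flooding algorithm): each entity pair's score repeatedly absorbs a fraction of the score of the pair formed by the entities' parents, so that structurally consistent matches are reinforced. The graph, coefficient and initial scores are toy values.

```python
# Simplified sketch of similarity propagation in the spirit of
# Similarity Flooding: each pair's score absorbs a fraction (alpha) of
# the score of the corresponding parent pair, then scores are normalized.

def propagate(pairs, parents1, parents2, rounds=10, alpha=0.5):
    """pairs: {(e1, e2): initial similarity}; parentsN: child -> parent."""
    scores = dict(pairs)
    for _ in range(rounds):
        new = {}
        for (a, b), s in scores.items():
            bonus = scores.get((parents1.get(a), parents2.get(b)), 0.0)
            new[(a, b)] = s + alpha * bonus
        top = max(new.values()) or 1.0
        scores = {p: v / top for p, v in new.items()}  # keep scores in [0,1]
    return scores

parents1 = {"Book/title": "Book"}
parents2 = {"Resource/name": "Resource"}
initial = {("Book", "Resource"): 1.0,
           ("Book/title", "Resource/name"): 0.4,
           ("Book/title", "Resource"): 0.4}
result = propagate(initial, parents1, parents2)
```

Starting from equal initial scores, the pair whose parents also match (\code{Book/title} vs.\ \code{Resource/name}) ends up ranked above the structurally inconsistent alternative.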
\cite{Algergawy2010} classifies, reviews, and experimentally compares major methods of element similarity measures and their combinations. \cite{shvaiko2012ontology}, comparing a number of recent systems, finds that ``semantic and extensional methods are still rarely employed. In fact, most of the approaches are quite often based only on terminological and structural methods.''

\cite{Ehrig2006} employs this \var{similarity} function over single entities to derive the notion of \var{ontology similarity} as ``based on similarity of pairs of single entities from the different ontologies''. This is operationalized as a kind of aggregating function \cite{ehrig2004qom} that combines all similarity measures (mostly modulated by custom weighting) computed for pairs of single entities into one value (from the \var{[0,1]} range) expressing the similarity ratio of the two ontologies being compared. (The use of weights allows machine learning approaches to be applied to optimize the results.)

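The aggregation step can be sketched as a weighted average of the individual measures; the measure names, weights and threshold below are illustrative choices, not values from the cited work.

```python
# Sketch of the aggregation step: several similarity measures for one
# entity pair are combined by custom weights into a single value in
# [0,1]; weights and threshold are illustrative choices.

def aggregate(measures, weights):
    """Weighted average of measure values; both dicts keyed by measure name."""
    total = sum(weights.values())
    return sum(weights[m] * v for m, v in measures.items()) / total

measures = {"label": 0.9, "structure": 0.6, "instances": 0.3}
weights  = {"label": 2.0, "structure": 1.0, "instances": 1.0}
combined = aggregate(measures, weights)
aligned = combined > 0.6   # threshold t asserts an alignment
```

Because the weights are free parameters, they are a natural target for the machine-learning optimization mentioned above.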
Thus, \var{ontology similarity} is a much weaker assertion than \var{ontology alignment}; in fact, the computed similarity is interpreted to assert ontology alignment: an aggregated similarity above a defined threshold indicates an alignment.


As to the alignment process, \cite{Ehrig2006} distinguishes the following steps:
\begin{enumerate}
\item Feature Engineering
\item Search Step Selection
\item Similarity Assessment
\item Interpretation
\item Iteration
\end{enumerate}

In contrast, \cite{jimenez2012large} in their system \xne{LogMap2} reduce the process to just two steps: computation of mapping candidates (maximizing recall) and assessment of the candidates (maximizing precision); these, however, correspond to steps 2 and 3 of the above procedure, and in fact the other steps are implicitly present in the described system.


\subsubsection{Systems}
A number of existing systems for schema/ontology matching/alignment are collected in the above-mentioned overview publications:

IF-Map \cite{kalfoglou2003if}, QOM \cite{ehrig2004qom}, \xne{FOAM} \cite{EhrigSure2005}, Similarity Flooding (SF) \cite{melnik2002similarity}, S-Match \cite{Giunchiglia2007_semanticmatching}, the Prompt tools \cite{Noy2003_theprompt} integrating with Protégé, \xne{COMA++} \cite{Aumueller2005} and \xne{Chimaera}. Additionally, \cite{shvaiko2012ontology} lists and evaluates some more recent contributions: \xne{SAMBO, Falcon, RiMOM, ASMOV, Anchor-Flood, AgreementMaker}.

All of the tools use multiple methods as described in the previous section, exploiting both element and structural features and applying some kind of composition or aggregation of the computed atomic measures to arrive at an alignment assertion.

Next to OWL as input format, supported by all of the systems, some also accept XML Schemas (\xne{COMA++, SF, Cupid, SMatch}), and some provide a GUI (\xne{COMA++, Chimaera, PROMPT, SAMBO, AgreementMaker}).

Scalability is one factor to be considered, given that in a baseline scenario (before considering efficiency optimisations in candidate generation) the space of possible candidate mappings is the cartesian product of the entities of the two ontologies being aligned. The authors of the (refurbished) ontology matching system \xne{LogMap 2} \cite{jimenez2012large} hold that it implements scalable reasoning and diagnosis algorithms performant enough to be integrated with the provided user interaction.


%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Semantic Web -- Linked Open Data}

The Linked Data paradigm \cite{TimBL2006} for publishing data on the web is increasingly being taken up by data providers across many disciplines \cite{bizer2009linked}. \cite{HeathBizer2011} gives a comprehensive overview of the principles of Linked Data with practical examples and current applications.

\subsubsection{Semantic Web -- Technical solutions / Server applications}

The provision of the produced semantic resources on the web requires technical solutions to store the RDF triples, query them efficiently and, ideally, expose them via a web interface to the users.

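To make this storage-and-query contract concrete, the following toy in-memory triple store sketches what any RDF store fundamentally provides: storing subject--predicate--object triples and matching them against patterns with wildcards (real stores add indexing, SPARQL support, inference and persistence; the namespace prefixes below are illustrative).

```python
# Toy in-memory triple store illustrating the basic contract of RDF
# stores: hold (subject, predicate, object) triples and match them
# against patterns, where None acts as a wildcard.

class TripleStore:
    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def match(self, s=None, p=None, o=None):
        """Return all triples matching the (s, p, o) pattern."""
        return [t for t in self.triples
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]

store = TripleStore()
store.add("ex:res1", "dc:title", "Moby Dick")
store.add("ex:res1", "dc:creator", "Melville, H.")
titles = store.match(p="dc:title")
```

Production stores answer exactly such pattern queries (the building blocks of SPARQL), but over billions of triples, which is why the indexing strategies evaluated below matter.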
Meanwhile a number of RDF triple store solutions relying on native, DBMS-backed or hybrid persistence layers are available -- open-source solutions like \xne{Jena, Sesame} or \xne{BigData} as well as a number of commercial solutions such as \xne{AllegroGraph, OWLIM, Virtuoso}.

A qualitative and quantitative study \cite{Haslhofer2011europeana} in the context of Europeana evaluated a number of RDF stores (using the whole Europeana EDM data set of 382,629,063 triples as data load) and came to the conclusion that ``certain RDF stores, notably OpenLink Virtuoso and 4Store'', can handle the large test dataset.

\xne{OpenLink Virtuoso Universal Server}\furl{http://virtuoso.openlinksw.com} is a hybrid storage solution for a range of data models, including relational data, RDF, XML and free text documents \cite{Erling2009Virtuoso, Haslhofer2011europeana}.
Virtuoso is used to host many important Linked Data sets (e.g., DBpedia\furl{http://dbpedia.org} \cite{auer2007dbpedia}).
Virtuoso is offered under both commercial and open-source license models.

Another solution worth examining is the \xne{Linked Media Framework}\furl{http://code.google.com/p/lmf/} -- an ``easy-to-setup server application that bundles together three Apache open source projects to offer some advanced services for linked media management'': publishing legacy data as linked data, semantic search by enriching data with content from the Linked Data Cloud, and using SKOS thesauri for information extraction.

One more specific work is that of Noah et al. \cite{Noah2010}, developing a semantic digital library for an academic institution. The scope is limited to document collections, but nevertheless many aspects seem very relevant to this work, like operating on document metadata, ontology population and sophisticated querying and searching.


\begin{comment}
LDpath\furl{http://code.google.com/p/ldpath/}
``a simple path-based query language similar to XPath or SPARQL Property Paths that is particularly well-suited for querying and retrieving resources from the Linked Data Cloud by following RDF links between resources and servers.''

Linked Data browser

Haystack\furl{http://en.wikipedia.org/wiki/Haystack_(PIM)}
\end{comment}

\subsubsection{Ontology Visualization}

Common approaches to visualizing ontologies include landscape views, treemaps and self-organizing maps (SOM).

\todoin{check Ontology Mapping and Alignement / saiks/Ontology4 4auf1.pdf}


%%%%%%%%%%%%%%%%
\section{Language and Ontologies}

There are two different kinds of links between language or linguistics and ontologies: a) `linguistic ontologies' -- domain ontologies conceptualizing the linguistic domain, capturing aspects of linguistic resources; b) `lexicalized' ontologies, where ontology entities are enriched with linguistic, lexical information.

\subsubsection{Linguistic ontologies}

One prominent instance of a linguistic ontology is the \xne{General Ontology for Linguistic Description} or GOLD \cite{Farrar2003}\furl{http://linguistics-ontology.org},
which ``gives a formalized account of the most basic categories and relations (the `atoms') used in the scientific description of human language'', attempting to codify the general knowledge of the field. The motivation is to ``facilitate automated reasoning over linguistic data and help establish the basic concepts through which intelligent search can be carried out''.

In line with the aspiration ``to be compatible with the general goals of the Semantic Web'', the dataset is provided via a web application as well as a dump in OWL format\furl{http://linguistics-ontology.org/gold-2010.owl} \cite{GOLD2010}.


Founded in 1934, SIL International\furl{http://www.sil.org/about-sil} (originally known as the Summer Institute of Linguistics, Inc.) is a leader in the identification and documentation of the world's languages. Results of this research are published in the Ethnologue: Languages of the World\furl{http://www.ethnologue.com/} \cite{grimes2000ethnologue}, a comprehensive catalog of the world's nearly 7,000 living languages. SIL also maintains the Language \& Culture Archives\furl{http://www.sil.org/resources/language-culture-archives}, a large collection of all kinds of resources in the ethnolinguistic domain.

The World Atlas of Language Structures (WALS)\furl{http://WALS.info} \cite{wals2011} is ``a large database of structural (phonological, grammatical, lexical) properties of languages gathered from descriptive materials (such as reference grammars)''. First published in 2005, the current online version (2011) provides a compendium of detailed expert definitions of individual linguistic features, accompanied by a sophisticated web interface integrating the information on linguistic features with their occurrence in the world's languages and their geographical distribution.

Simons \cite{Simons2003developing} developed a Semantic Interpretation Language (SIL) that is used to define the meaning of the elements and attributes in an XML markup schema in terms of abstract concepts defined in a formal semantic schema.
Extending this work, Simons et al. \cite{Simons2004semantics} propose a method for mapping linguistic descriptions in plain XML into semantically rich RDF/OWL, employing the GOLD ontology as the target semantic schema.

These ontologies can be used by (``ontologized'') lexicons, which refer to them to describe linguistic properties of the lexical entries, as opposed to linking to domain ontologies to anchor senses/meanings.


The work on the Semantic Interpretation Language as well as the GOLD ontology can be seen as a conceptual predecessor of the Data Category Registry, an ISO-standardized procedure for defining and standardizing ``widely accepted linguistic concepts'' that is at the core of CLARIN's metadata infrastructure (cf. \ref{def:DCR}).
Although this registry is not exactly an ontology in the common sense and (by design) does not contain any relations between concepts,
its central entities are concepts and not lexical items; thus it can be seen as a proto-ontology.
Another indication of this heritage is the fact that the concepts of the GOLD ontology were migrated into ISOcat (495 items) in 2010.

Notice that although this work is concerned with language resources, it operates primarily on the metadata level; thus the overlap with linguistic ontologies codifying the terminology of the discipline of linguistics is rather marginal (perhaps on the level of the description of specific linguistic aspects of given resources).

\subsubsection{Lexicalised ontologies, ``ontologized'' lexicons}


The other type of relation between ontologies and linguistics or language are lexicalised ontologies. Hirst \cite{Hirst2009} elaborates on the differences between ontology and lexicon and the possibility of reusing lexicons for the development of ontologies.

In a number of works, Buitelaar, McCrae et al. \cite{Buitelaar2009, buitelaar2010ontology, McCrae2010c, buitelaar2011ontology, Mccrae2012interchanging} argue for ``associating linguistic information with ontologies'' or ``ontology lexicalisation'' and draw attention to lexical and linguistic issues in knowledge representation in general. This basic idea lies behind the series of proposed models \xne{LingInfo}, \xne{LexOnto}, \xne{LexInfo} and, most recently, \xne{lemon}, aimed at allowing complex lexical information for such ontologies and at describing the relationship between the lexicon and the ontology.
The most recent in this line, \xne{lemon} or \xne{lexicon model for ontologies}, defines ``a formal model for the proper representation of the continuum between: i) ontology semantics; ii) terminology that is used to convey this in natural language; and iii) linguistic information on these terms and their constituent lexical units''. In essence, it enables the creation of a lexicon for a given ontology; adopting the principle of ``semantics by reference'', no complex semantic information needs to be stated in the lexicon, resulting in a clear separation of the lexical layer and the ontological layer.

Lemon builds on existing work, notably the LexInfo and LIR ontology-lexicon models, and in particular on global standards: the W3C standard SKOS (Simple Knowledge Organization System) \cite{SKOS2009} and the ISO standards Lexical Markup Framework (ISO 24613:2008 \cite{ISO24613:2008}) and Specification of Data Categories, Data Category Registry (ISO 12620:2009 \cite{ISO12620:2009}).

The Lexical Markup Framework (LMF) \cite{Francopoulo2006LMF, ISO24613:2008} defines a metamodel for representing data in lexical databases used with monolingual and multilingual computer applications; whether it provides an RDF serialization remains to be verified.

An overview of current developments in the application of the linked data paradigm to linguistic data collections was given at the workshop Linked Data in Linguistics\furl{http://ldl2012.lod2.eu/} 2012 \cite{ldl2012}.


The primary motivation for lexicon models like \xne{lemon} are the tasks of ontology-based information extraction and ontology learning and population from text, where entities are often referred to by non-nominal word forms and with ambiguous semantics. Given that the discussed collection contains mainly highly structured data referencing entities in their nominal form, such models are not directly relevant for this work.


\section{Summary}
This chapter reviewed, on the one hand, recent developments regarding the infrastructures for Language Resources and Technology and, on the other, gave an overview of the state of the art regarding the methods to be applied in this work: Semantic Web technologies, ontology mapping and ontology visualization.