1 | % !TEX TS-program = pdflatex |
---|
2 | % !TEX encoding = UTF-8 Unicode |
---|
3 | |
---|
4 | % This is a simple template for a LaTeX document using the "article" class. |
---|
5 | % See "book", "report", "letter" for other types of document. |
---|
6 | |
---|
7 | \documentclass[11pt]{article} % use larger type; default would be 10pt |
---|
8 | |
---|
9 | \usepackage[utf8]{inputenc} % set input encoding (not needed with XeLaTeX) |
---|
10 | |
---|
11 | \usepackage{url} |
---|
12 | %\usepackage{svn-multi} |
---|
13 | |
---|
14 | % Subversion Information |
---|
15 | %\svnidlong |
---|
16 | %{$HeadURL: $} |
---|
17 | %{$LastChangedDate: $} |
---|
18 | %{$LastChangedRevision: $} |
---|
19 | %{$LastChangedBy: $} |
---|
20 | %\svnid{$Id$} |
---|
21 | |
---|
22 | |
---|
23 | %%% Examples of Article customizations |
---|
24 | % These packages are optional, depending whether you want the features they provide. |
---|
25 | % See the LaTeX Companion or other references for full information. |
---|
26 | |
---|
27 | %%% PAGE DIMENSIONS |
---|
28 | \usepackage{geometry} % to change the page dimensions |
---|
29 | \geometry{a4paper} % or letterpaper (US) or a5paper or.... |
---|
30 | %\geometry{margin=1cm} % for example, change the margins to 1 cm all round |
---|
31 | \topmargin=-0.5in |
---|
32 | \textheight=700pt |
---|
33 | % \geometry{landscape} % set up the page for landscape |
---|
34 | % read geometry.pdf for detailed page layout information |
---|
35 | |
---|
36 | \usepackage{graphicx} % support the \includegraphics command and options |
---|
37 | |
---|
38 | % \usepackage[parfill]{parskip} % Activate to begin paragraphs with an empty line rather than an indent |
---|
39 | |
---|
40 | %%% PACKAGES |
---|
41 | \usepackage{booktabs} % for much better looking tables |
---|
42 | \usepackage{array} % for better arrays (eg matrices) in maths |
---|
43 | \usepackage{paralist} % very flexible & customisable lists (eg. enumerate/itemize, etc.) |
---|
44 | \usepackage{verbatim} % adds environment for commenting out blocks of text & for better verbatim |
---|
45 | %\usepackage{subfig} % make it possible to include more than one captioned figure/table in a single float |
---|
46 | % These packages are all incorporated in the memoir class to one degree or another... |
---|
47 | |
---|
48 | %%% HEADERS & FOOTERS |
---|
49 | \usepackage{fancyhdr} % This should be set AFTER setting up the page geometry |
---|
50 | \pagestyle{plain} % options: empty , plain , fancy |
---|
51 | \renewcommand{\headrulewidth}{0pt} % customise the layout... |
---|
52 | \lhead{}\chead{}\rhead{} |
---|
53 | \lfoot{}\cfoot{\thepage}\rfoot{} |
---|
54 | |
---|
55 | %%% SECTION TITLE APPEARANCE |
---|
56 | \usepackage{sectsty} |
---|
57 | \allsectionsfont{\sffamily\mdseries\upshape} % (See the fntguide.pdf for font help) |
---|
58 | % (This matches ConTeXt defaults) |
---|
59 | |
---|
60 | %%% ToC (table of contents) APPEARANCE |
---|
61 | \usepackage[nottoc,notlof,notlot]{tocbibind} % Put the bibliography in the ToC |
---|
62 | %\usepackage[titles,subfigure]{tocloft} % Alter the style of the Table of Contents |
---|
63 | %\renewcommand{\cftsecfont}{\rmfamily\mdseries\upshape} |
---|
64 | %\renewcommand{\cftsecpagefont}{\rmfamily\mdseries\upshape} % No bold! |
---|
65 | |
---|
66 | %%% END Article customizations |
---|
67 | |
---|
68 | %%% The "real" document content comes below... |
---|
69 | |
---|
70 | \title{SMC4LRT - \\ Semantic Mapping Component for Language Resources and Technology} |
---|
71 | \author{Durco Matej, 0005416} |
---|
72 | %\date{} % Activate to display a given date or no date (if empty), |
---|
73 | % otherwise the current date is printed |
---|
74 | |
---|
75 | \begin{document} |
---|
76 | \maketitle |
---|
77 | |
---|
78 | \section{Main Goal} |
---|
79 | |
---|
80 | We propose a component that enhances search over a large, heterogeneous collection of metadata descriptions of Language Resources and Technology (LRT). By applying semantic web technology, the component shall give the user both better recall, through query expansion based on related categories/concepts, and new means of exploring the dataset/knowledge base via ontology-driven browsing. |
---|
81 | |
---|
82 | A trivial example of concept-based query expansion: |
---|
83 | Given the user query \texttt{Actor.Name = Sue}, and knowing that \texttt{Actor} is equivalent or similar to \texttt{Person} and that \texttt{Name} is a synonym of \texttt{FullName}, the expanded query could look like: |
---|
84 | \texttt{Actor.Name = Sue OR Actor.FullName = Sue OR Person.Name = Sue OR Person.FullName = Sue} |
---|
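The expansion in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table \texttt{EQUIVALENTS} and the function \texttt{expand\_query} are hypothetical names, and the equivalences are hard-coded from the example rather than read from any registry.

```python
from itertools import product

# Hypothetical equivalence table, hard-coded from the example in the text.
# In the proposed component these relations would come from the mapping.
EQUIVALENTS = {
    "Actor": ["Actor", "Person"],
    "Name": ["Name", "FullName"],
}


def expand_query(path, value, equivalents=EQUIVALENTS):
    """Expand 'Concept.Field = value' into an OR over all equivalent paths."""
    parts = path.split(".")
    # Each path segment is replaced by its alternatives (itself if unknown).
    alternatives = [equivalents.get(part, [part]) for part in parts]
    clauses = [
        f"{'.'.join(combo)} = {value}" for combo in product(*alternatives)
    ]
    return " OR ".join(clauses)


if __name__ == "__main__":
    print(expand_query("Actor.Name", "Sue"))
```

For the query \texttt{Actor.Name = Sue} the sketch yields the four clauses listed above, joined by \texttt{OR}. A segment without known equivalents is left unchanged.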
85 | |
---|
86 | Another example concerns instance mapping: a user looking for all resources produced by or linked to a given institution does not have to guess or care about the various spellings of the institution's name used in the resource descriptions, but can instead browse a controlled vocabulary of institutions and see all the resources of a given institution. While this could be achieved by simply normalizing the literal values (and that certainly has to be one processing step), linking to an ontology also enables the user to continue browsing the ontology, to find institutions that are related to the original one by being concerned with similar topics, and to retrieve the union of resources for the resulting cluster. Thus, in general, the user is enabled to work with the data based on information that is not present in the original dataset. |
---|
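The normalization step mentioned above could look roughly as follows. This is a minimal sketch, assuming a controlled vocabulary given as a dictionary from entity identifiers to known spellings; the identifier \texttt{inst:example} and all function names are hypothetical, and real data would likely need fuzzier matching than exact lookup on a normalized key.

```python
import re
import unicodedata


def normalize(literal):
    """Lower-case, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", literal)
    # Drop combining marks left over from decomposing accented characters.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def build_index(vocabulary):
    """Map every normalized spelling to its entity identifier."""
    return {
        normalize(spelling): entity
        for entity, spellings in vocabulary.items()
        for spelling in spellings
    }


def map_literal(literal, index):
    """Return the entity for a literal value, or None if it is unknown."""
    return index.get(normalize(literal))


if __name__ == "__main__":
    # Hypothetical vocabulary entry, for illustration only.
    index = build_index(
        {"inst:example": ["University of Example", "Univ. of Example"]}
    )
    print(map_literal("univ of  EXAMPLE", index))
```

Once literals are mapped to entity identifiers, the identifiers can serve as the link targets into an ontology, which is what the browsing scenario above relies on.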
87 | |
---|
88 | All these scenarios require a preprocessing step that produces the underlying linkage, both between categories/concepts and between instances (mapping literal values to entities). We refer to this task as semantic mapping; it shall be accomplished by the corresponding ``Semantic Mapping Component''. In this work the focus lies on the process/method, i.e. on the specification and (prototypical) implementation of the component, rather than on establishing a final, complete mapping. Although a tentative/naive alignment on a subset of the data will be proposed, it will mainly be used for evaluation and shall serve as a basis for discussion with domain experts, aiming at creating the actual sensible mappings usable for real tasks. |
---|
89 | |
---|
90 | In fact, due to the great diversity of resources and research tasks, such a ``final'' complete mapping/alignment does not seem achievable at all. Therefore the focus shall also be on ``soft'', dynamic mapping: investigating methods that enable users to adapt the mapping, or to apply different mappings, with respect to their current task or research question, |
---|
91 | essentially being able to actively manipulate the recall/precision ratio of their searches. This entails examining how users interact with the relevant information, how it is visualized in the user interface, and how the user can act upon it. |
---|
92 | |
---|
93 | \section{Method} |
---|
94 | We start by examining the existing data and describing the evolving infrastructure in which the components are to be embedded. |
---|
95 | Then we formulate the task/function of Semantic Search on the level of concepts and of individuals, |
---|
96 | together with the underlying Semantic Mapping and the requirements within the defined context, |
---|
97 | followed by a design proposal for an appropriate component fitting within the infrastructure, |
---|
98 | with a special focus on the feasibility of employing ontology mapping and alignment techniques and tools for the creation of mappings. |
---|
99 | |
---|
100 | In a prototype we want to deliver a proof of concept, |
---|
101 | combined with an evaluation to verify the claims of fitness for purpose. |
---|
102 | This evaluation is twofold: first, it shall verify the ability of the system to support dynamic mapping, based on a set of test queries, |
---|
103 | and second, it shall assess the usability of the UI controls. |
---|
104 | |
---|
105 | |
---|
106 | +? Identify hooks into LOD? |
---|
107 | |
---|
108 | |
---|
109 | a) define/use semantic relations between categories (RelationRegistry) |
---|
110 | b) employ ontological resources to enhance search in the dataset (SemanticSearch) |
---|
111 | c) specify translation instructions for expressing the dataset in RDF (LinkedData) |
---|
112 | |
---|
113 | |
---|
114 | \section{Expected Results} |
---|
115 | |
---|
116 | The main result of this work will be a specification of a pair of components: the Semantic Search and the underlying Semantic Mapping. This specification will be supported by a proof-of-concept implementation of these components and by an evaluation that compares traditional search and semantic search when querying the dataset. |
---|
117 | |
---|
118 | One important by-product of the work will be the original dataset expressed as RDF with links into existing datasets/ontologies/knowledge bases, forming the basis for another nucleus of Linked Open Data. |
---|
119 | |
---|
120 | \begin{itemize} |
---|
121 | \item [Specification] definition of a mapping mechanism |
---|
122 | \item [Prototype] proof of concept implementation |
---|
123 | \item [Evaluation] results of querying the dataset, comparing traditional search and semantic search |
---|
124 | \item [LinkedData] translation of the source dataset to an RDF-based format with links into existing datasets/ontologies/knowledge bases |
---|
125 | |
---|
126 | \end{itemize} |
---|
127 | |
---|
128 | |
---|
129 | \section{State of the Art} |
---|
130 | |
---|
131 | One closely related current work is the VLO (Virtual Language Observatory)\footnote{\url{http://www.clarin.eu/vlo/}} \cite{VanUytvanck2010}, which is being developed within the CLARIN project. This application operates on the same collection of data as is discussed in this work; however, led by the guiding principle of simplifying the search for the user, it employs a faceted search, manually mapping the appropriate metadata fields from the different schemas/profiles to 10 fixed facets. |
---|
132 | |
---|
133 | From the knowledge management and semantic web point of view, LT-World\footnote{\url{http://www.lt-world.org/}} \cite{Joerg2010}, |
---|
134 | the ontology-based portal for Language Resources and Technology being developed at DFKI\footnote{Deutsches Forschungszentrum für Künstliche Intelligenz}, is a prominent resource providing information about the entities (Institutions, Persons, Projects, Tools, etc.) in this field of study. |
---|
135 | |
---|
136 | Linked Open Data |
---|
137 | |
---|
138 | VAS - Catch Plus |
---|
139 | |
---|
140 | OAEI |
---|
141 | |
---|
142 | The starting point for the investigation of work on ontology mapping is the overview of the field by Kalfoglou \cite{Kalfoglou2003}. |
---|
143 | |
---|
144 | \section{Keywords} |
---|
145 | |
---|
146 | Metadata interoperability, Ontology Mapping, Schema mapping, Crosswalk, Similarity measures, LinkedData |
---|
147 | Fuzzy Search, Visual Search? |
---|
148 | schema-based ontology alignment |
---|
149 | |
---|
150 | Language Resources and Technology, LRT/NLP/HLT |
---|
151 | |
---|
152 | Ontology Visualization |
---|
153 | |
---|
154 | Federated Search, Distributed Content Search |
---|
155 | (ILS - Integrated Library Systems) |
---|
156 | |
---|
157 | |
---|
158 | \bibliographystyle{plain} |
---|
159 | \bibliography{../../../2bib/lingua,../../../2bib/ontolingua,../../../2bib/semweb} |
---|
160 | |
---|
161 | |
---|
162 | \end{document} |
---|