% source: SMC4LRT/Expose.tex @ 1196
% Last change on this file since 1196 was 1196, checked in by vronk, 13 years ago
% (initial import of Expose)

% !TEX TS-program = pdflatex
% !TEX encoding = UTF-8 Unicode

% This is a simple template for a LaTeX document using the "article" class.
% See "book", "report", "letter" for other types of document.

\documentclass[11pt]{article} % use larger type; default would be 10pt

\usepackage[utf8]{inputenc} % set input encoding (not needed with XeLaTeX)

\usepackage{url}
%\usepackage{svn-multi}

% Subversion Information
%\svnidlong
%{$HeadURL: $}
%{$LastChangedDate: $}
%{$LastChangedRevision: $}
%{$LastChangedBy: $}
%\svnid{$Id$}


%%% Examples of Article customizations
% These packages are optional, depending whether you want the features they provide.
% See the LaTeX Companion or other references for full information.

%%% PAGE DIMENSIONS
\usepackage{geometry} % to change the page dimensions
\geometry{a4paper} % or letterpaper (US) or a5paper or....
%\geometry{margin=1cm} % for example, change the margins to 1cm all round
\topmargin=-0.5in
\textheight=700pt
% \geometry{landscape} % set up the page for landscape
%   read geometry.pdf for detailed page layout information

\usepackage{graphicx} % support the \includegraphics command and options

% \usepackage[parfill]{parskip} % Activate to begin paragraphs with an empty line rather than an indent

%%% PACKAGES
\usepackage{booktabs} % for much better looking tables
\usepackage{array} % for better arrays (eg matrices) in maths
\usepackage{paralist} % very flexible & customisable lists (eg. enumerate/itemize, etc.)
\usepackage{verbatim} % adds environment for commenting out blocks of text & for better verbatim
%\usepackage{subfig} % make it possible to include more than one captioned figure/table in a single float
% These packages are all incorporated in the memoir class to one degree or another...

%%% HEADERS & FOOTERS
\usepackage{fancyhdr} % This should be set AFTER setting up the page geometry
\pagestyle{plain} % options: empty , plain , fancy
\renewcommand{\headrulewidth}{0pt} % customise the layout...
\lhead{}\chead{}\rhead{}
\lfoot{}\cfoot{\thepage}\rfoot{}

%%% SECTION TITLE APPEARANCE
\usepackage{sectsty}
\allsectionsfont{\sffamily\mdseries\upshape} % (See the fntguide.pdf for font help)
% (This matches ConTeXt defaults)

%%% ToC (table of contents) APPEARANCE
\usepackage[nottoc,notlof,notlot]{tocbibind} % Put the bibliography in the ToC
%\usepackage[titles,subfigure]{tocloft} % Alter the style of the Table of Contents
%\renewcommand{\cftsecfont}{\rmfamily\mdseries\upshape}
%\renewcommand{\cftsecpagefont}{\rmfamily\mdseries\upshape} % No bold!

%%% END Article customizations

%%% The "real" document content comes below...

\title{SMC4LRT - \\ Semantic Mapping Component for Language Resources and Technology}
\author{Durco Matej, 0005416}
%\date{} % Activate to display a given date or no date (if empty),
         % otherwise the current date is printed

\begin{document}
\maketitle

\section{Main Goal}

We propose a component that shall enhance search functionality over a large, heterogeneous collection of metadata descriptions of Language Resources and Technology (LRT). By applying Semantic Web technologies, the component shall give the user both better recall, through query expansion based on related categories/concepts, and new means of exploring the dataset/knowledge base via ontology-driven browsing.

A trivial example of concept-based query expansion:
given the user query \texttt{Actor.Name = Sue}, and knowing that \texttt{Actor} is equivalent or similar to \texttt{Person} and that \texttt{Name} is a synonym of \texttt{FullName}, the expanded query could look like:
\texttt{Actor.Name = Sue OR Actor.FullName = Sue OR Person.Name = Sue OR Person.FullName = Sue}

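The expansion in this example can be read as the Cartesian product of the equivalence sets of the individual path steps. The following Python sketch illustrates this under stated assumptions: the table \texttt{EQUIVALENTS} and the function \texttt{expand\_query} are hypothetical names, and the equivalences are hard-coded here, whereas in the proposed component they would be supplied by the Semantic Mapping.

\begin{verbatim}
from itertools import product

# Hypothetical equivalence sets; in the proposed component they would
# be delivered by the Semantic Mapping instead of being hard-coded.
EQUIVALENTS = {
    "Actor": ["Actor", "Person"],
    "Name": ["Name", "FullName"],
}


def expand_query(path: str, value: str) -> str:
    """Expand 'A.B = value' into a disjunction over equivalent paths."""
    steps = path.split(".")
    # Equivalents of each step; an unknown step stands for itself.
    alternatives = [EQUIVALENTS.get(step, [step]) for step in steps]
    # The Cartesian product yields every combination of equivalents.
    clauses = [".".join(combo) + " = " + value
               for combo in product(*alternatives)]
    return " OR ".join(clauses)


if __name__ == "__main__":
    print(expand_query("Actor.Name", "Sue"))
\end{verbatim}

Applied to \texttt{Actor.Name = Sue}, the sketch prints the four-clause disjunction of the example, in the same order.
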
Another example concerns instance mapping: a user looking for all resources produced by or linked to a given institution does not have to guess or care about the various spellings of the institution's name used in the resource descriptions, but can instead browse a controlled vocabulary of institutions and see all the resources of a given institution. While this could be achieved by simply normalizing the literal values (and this definitely has to be one processing step), linking to an ontology additionally enables the user to continue browsing the ontology, to find institutions that are related to the original one (e.g.\ by being concerned with similar topics), and to retrieve the union of resources for the resulting cluster. In general, the user is thus enabled to work with the data based on information that is not present in the original dataset.

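The two steps of this scenario, resolving normalized literal values to entities of a controlled vocabulary and then widening the selection along an ontology relation, can be sketched in Python as follows. All entity identifiers, institution names, resource ids and the relation table \texttt{RELATED} are invented for illustration, and the normalization (case folding, dropping punctuation, collapsing whitespace) is just one simple assumed choice.

\begin{verbatim}
import re
from collections import defaultdict

# Hypothetical controlled vocabulary: normalized spelling -> entity id.
VOCABULARY = {
    "example institute of phonetics": "inst:EIP",
    "sample speech lab": "inst:SSL",
}

# Hypothetical ontology relation: institutions with similar topics.
RELATED = {
    "inst:EIP": {"inst:SSL"},
    "inst:SSL": {"inst:EIP"},
}


def normalize(literal: str) -> str:
    """Case-fold, replace punctuation by spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", literal.lower())
    return " ".join(text.split())


def index_resources(records):
    """Map each entity id to the resources whose literal resolves to it."""
    index = defaultdict(set)
    for resource_id, institution_literal in records:
        entity = VOCABULARY.get(normalize(institution_literal))
        if entity is not None:
            index[entity].add(resource_id)
    return index


def resources_of_cluster(index, entity):
    """Union of the resources of an entity and of its related entities."""
    cluster = {entity} | RELATED.get(entity, set())
    return set().union(*(index.get(e, set()) for e in cluster))


if __name__ == "__main__":
    records = [
        ("r1", "Example Institute of Phonetics"),
        ("r2", "EXAMPLE  institute of phonetics."),
        ("r3", "Sample Speech Lab"),
    ]
    idx = index_resources(records)
    print(sorted(idx["inst:EIP"]))
    print(sorted(resources_of_cluster(idx, "inst:EIP")))
\end{verbatim}

With these sample records both spellings of the first institution resolve to the same entity (resources r1 and r2), and following the hypothetical relation adds r3.
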
All these scenarios require a preprocessing step that produces the underlying linkage, both between categories/concepts and between instances (mapping literal values to entities). We refer to this task as \emph{semantic mapping}, to be accomplished by a corresponding ``Semantic Mapping Component''. In this work the focus lies on the process/method, i.e.\ on the specification and (prototypical) implementation of the component, rather than on establishing a final, complete mapping. Although a tentative/naive alignment of a subset of the data will be proposed, it will mainly be used for evaluation and shall serve as a basis for discussion with domain experts, with the aim of creating actual, sensible mappings usable for real tasks.

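What a tentative/naive alignment of categories might look like is sketched below in Python, using only the string similarity of category names (computed with the standard library's \texttt{difflib.SequenceMatcher}) as the similarity measure. This deliberately simple technique is an assumption chosen for illustration, not the method this work will propose; the category names and the threshold of 0.8 are hypothetical. The output is a list of candidate correspondences for review by domain experts, not a final mapping.

\begin{verbatim}
import re
from difflib import SequenceMatcher
from itertools import product


def tokens(name: str) -> str:
    """Split CamelCase/snake_case names: 'FullName' -> 'full name'."""
    words = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)
    return " ".join(w.lower() for w in words)


def candidate_alignments(source, target, threshold=0.8):
    """(source, target, score) pairs whose name similarity >= threshold."""
    candidates = []
    for s, t in product(source, target):
        score = SequenceMatcher(None, tokens(s), tokens(t)).ratio()
        if score >= threshold:
            candidates.append((s, t, round(score, 2)))
    # Best candidates first, to be reviewed by domain experts.
    return sorted(candidates, key=lambda c: -c[2])


if __name__ == "__main__":
    print(candidate_alignments(["FullName", "ActorName"],
                               ["full_name", "actorName", "Genre"]))
\end{verbatim}

For this input only the pairs (\texttt{FullName}, \texttt{full\_name}) and (\texttt{ActorName}, \texttt{actorName}) reach the threshold, both with score 1.0; pairs such as \texttt{FullName} and \texttt{actorName}, which share only the word ``name'', fall well below it.
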
In fact, due to the great diversity of resources and research tasks, such a ``final'', complete mapping/alignment does not seem achievable at all. Therefore the focus shall also be on ``soft'', dynamic mapping: investigating methods that enable users to adapt the mapping, or to apply different mappings, with respect to their current task or research question,
essentially allowing them to actively manipulate the recall/precision ratio of their searches. This entails examining the user interaction with, and the visualization of, the relevant information in the user interface, so that the user can act upon it.

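One possible reading of such a ``soft'' mapping, given here only as an illustrative assumption, is to attach a confidence weight to every relation between categories and to let the user choose a threshold: lowering it admits more related categories into the expansion (more recall), raising it admits fewer (more precision). In the Python sketch below the weights, the table \texttt{RELATIONS} and the function \texttt{expansion} are hypothetical.

\begin{verbatim}
# Hypothetical weighted relations between categories (weights in [0, 1]).
RELATIONS = {
    "Actor": {"Person": 0.9, "Speaker": 0.7, "Organisation": 0.3},
    "Name": {"FullName": 0.95, "Title": 0.2},
}


def expansion(category: str, threshold: float) -> list[str]:
    """Categories searched for `category` at the user-chosen threshold.

    The category itself is always included; related categories are
    added only if their weight reaches the threshold, strongest first.
    """
    related = RELATIONS.get(category, {})
    ranked = sorted(related.items(), key=lambda kv: -kv[1])
    return [category] + [c for c, w in ranked if w >= threshold]


if __name__ == "__main__":
    for t in (1.0, 0.8, 0.5, 0.0):
        print(t, expansion("Actor", t))
\end{verbatim}

Exposing the threshold as a single slider in the user interface would be one way of letting users trade precision for recall per query.
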
\section{Method}
We start by examining the existing data and describing the evolving infrastructure in which the components are to be embedded.
Then we formulate the task/function of Semantic Search on the level of concepts and of individuals,
the underlying Semantic Mapping, and the requirements within the defined context.
This is followed by a design proposal for an appropriate component fitting within the infrastructure,
with special focus on the feasibility of employing ontology mapping and alignment techniques and tools for the creation of mappings.

In a prototype we want to deliver a proof of concept,
combined with an evaluation to verify the claimed fitness for purpose.
This evaluation is twofold: firstly, it shall verify the ability of the system to support dynamic mapping, based on a set of test queries;
secondly, it shall assess the usability of the UI controls.

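For the first part of the evaluation, one conventional way (assumed here, not prescribed by this proposal) to compare traditional and semantic search on a test query is to compute precision and recall of each result set against a set of resources judged relevant. A minimal Python sketch follows; the resource ids and relevance judgements are invented.

\begin{verbatim}
def precision_recall(retrieved: set, relevant: set) -> tuple[float, float]:
    """precision = hits/|retrieved|, recall = hits/|relevant|."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Invented relevance judgements for a single test query.
    relevant = {"r1", "r2", "r3", "r4"}
    traditional = {"r1", "r2"}            # exact field match only
    semantic = {"r1", "r2", "r3", "r5"}   # expanded query
    print("traditional", precision_recall(traditional, relevant))
    print("semantic", precision_recall(semantic, relevant))
\end{verbatim}

With these invented numbers the expanded query trades precision (1.0 down to 0.75) for recall (0.5 up to 0.75), which is the kind of trade-off the user should be able to control.
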

Open question: can hooks into Linked Open Data (LOD) be identified?

\begin{itemize}
\item[a)] define/use semantic relations between categories (RelationRegistry)
\item[b)] employ ontological resources to enhance search in the dataset (SemanticSearch)
\item[c)] specify translation instructions for expressing the dataset in RDF (LinkedData)
\end{itemize}

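To make item c) more concrete, the following Python sketch serializes one flat metadata record as RDF in the N-Triples syntax, replacing a literal institution name by an entity URI wherever such a link is known. The \texttt{example.org} namespaces, the property names and the link table are placeholders invented for illustration; the actual translation instructions are a result this work still has to specify.

\begin{verbatim}
# Placeholder namespaces; real ones are to be fixed by the specification.
RES = "http://example.org/resource/"
PROP = "http://example.org/property/"

# Hypothetical result of instance mapping: literal -> entity URI.
ENTITY_LINKS = {
    "Example Institute of Phonetics": "http://example.org/entity/EIP",
}


def escape(literal: str) -> str:
    """Escape backslash, quote and line breaks for an N-Triples literal."""
    return (literal.replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))


def record_to_ntriples(record_id: str, fields: dict) -> list[str]:
    """One triple per field; record ids and keys are assumed IRI-safe."""
    subject = f"<{RES}{record_id}>"
    triples = []
    for key, value in fields.items():
        predicate = f"<{PROP}{key}>"
        if value in ENTITY_LINKS:
            # Link to the entity instead of repeating the literal.
            triples.append(f"{subject} {predicate} <{ENTITY_LINKS[value]}> .")
        else:
            triples.append(f'{subject} {predicate} "{escape(value)}" .')
    return triples


if __name__ == "__main__":
    record = {"title": "Spoken corpus",
              "institution": "Example Institute of Phonetics"}
    for line in record_to_ntriples("r1", record):
        print(line)
\end{verbatim}
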

\section{Expected Results}

The main result of this work will be the specification of a pair of components: the Semantic Search and the underlying Semantic Mapping. This specification will be supported by a proof-of-concept implementation of these components and by an evaluation of querying the dataset, comparing traditional search with semantic search.

One important by-product of the work will be the original dataset expressed as RDF, with links into existing datasets/ontologies/knowledge bases, forming the basis for another nucleus of Linked Open Data.

\begin{itemize}
\item [Specification] definition of a mapping mechanism
\item [Prototype] proof-of-concept implementation
\item [Evaluation] evaluation results of querying the dataset, comparing traditional search with semantic search
\item [LinkedData] translation of the source dataset into an RDF-based format with links into existing datasets/ontologies/knowledge bases
\end{itemize}


\section{State of the Art}

One closely related current work is the VLO (Virtual Language Observatory)\footnote{\url{http://www.clarin.eu/vlo/}} \cite{VanUytvanck2010}, being developed within the CLARIN project. This application operates on the same collection of data as the one discussed in this work; however, led by the guiding principle of simplifying search for the user, it employs faceted search, manually mapping the appropriate metadata fields from the different schemas/profiles onto 10 fixed facets.

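To illustrate the contrast with the dynamic mapping proposed here, a manual field-to-facet mapping of the kind described above amounts to a fixed lookup table from profile-specific field paths to facets, over which facet values are counted. The Python sketch below is not the VLO's implementation; the profile names, field paths and the two facets shown are hypothetical.

\begin{verbatim}
from collections import Counter

# Hypothetical, manually curated mapping: (profile, field path) -> facet.
FACET_MAPPING = {
    ("ProfileA", "Actor.Language"): "language",
    ("ProfileB", "Session.Content.Language"): "language",
    ("ProfileA", "Resource.Genre"): "genre",
}


def facet_counts(records, facet):
    """Count facet values over records (profile, {field path: value})."""
    counts = Counter()
    for profile, fields in records:
        for path, value in fields.items():
            # Only fields listed in the fixed table contribute to a facet.
            if FACET_MAPPING.get((profile, path)) == facet:
                counts[value] += 1
    return counts
\end{verbatim}

In such a scheme a field that is not listed in the table, e.g.\ a language field of a newly added profile, does not contribute to any facet until the table is edited by hand.
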
From the knowledge management and Semantic Web point of view, LT-World\footnote{\url{http://www.lt-world.org/}} \cite{Joerg2010},
the ontology-based portal for Language Resources and Technology being developed at DFKI\footnote{Deutsches Forschungszentrum für Künstliche Intelligenz}, is a prominent resource providing information about the entities (institutions, persons, projects, tools, etc.) in this field of study.

Linked Open Data

VAS - Catch Plus

OAEI (Ontology Alignment Evaluation Initiative)

The starting point for the investigation of work on ontology mapping is the overview of the field by Kalfoglou \cite{Kalfoglou2003}.

\section{Keywords}

Metadata interoperability, Ontology Mapping, Schema mapping, Crosswalk, Similarity measures, LinkedData,
Fuzzy Search, Visual Search?,
schema-based ontology alignment

Language Resources and Technology, LRT/NLP/HLT

Ontology Visualization

Federated Search, Distributed Content Search
(ILS - Integrated Library Systems)


\bibliographystyle{plain}
\bibliography{../../../2bib/lingua,../../../2bib/ontolingua,../../../2bib/semweb}


\end{document}