source: SMC4LRT/Outline.tex @ 2669

Last change on this file since 2669 was 2669, checked in by vronk, 11 years ago

sections in separate files

% !TEX TS-program = pdflatex
% !TEX encoding = UTF-8 Unicode

% This is a simple template for a LaTeX document using the "article" class.
% See "book", "report", "letter" for other types of document.

\documentclass[11pt]{article} % use larger type; default would be 10pt

\usepackage[utf8]{inputenc} % set input encoding (not needed with XeLaTeX)

\usepackage{url}
%\usepackage{svn-multi}

% Subversion Information
%\svnidlong
%{$HeadURL: $}
%{$LastChangedDate: $}
%{$LastChangedRevision: $}
%{$LastChangedBy: $}
%\svnid{$Id$}


%%% Examples of Article customizations
% These packages are optional, depending on whether you want the features they provide.
% See the LaTeX Companion or other references for full information.

%%% PAGE DIMENSIONS
\usepackage{geometry} % to change the page dimensions
\geometry{a4paper} % or letterpaper (US) or a5paper or....
%\geometry{margin=1cm} % for example, change the margins to 1cm all round
\topmargin=-0.5in
\textheight=700pt
% \geometry{landscape} % set up the page for landscape
%   read geometry.pdf for detailed page layout information

\usepackage{graphicx} % support the \includegraphics command and options

% \usepackage[parfill]{parskip} % Activate to begin paragraphs with an empty line rather than an indent

%%% PACKAGES
\usepackage{booktabs} % for much better looking tables
\usepackage{array} % for better arrays (eg matrices) in maths
\usepackage{paralist} % very flexible & customisable lists (eg. enumerate/itemize, etc.)
\usepackage{verbatim} % adds environment for commenting out blocks of text & for better verbatim
%\usepackage{subfig} % make it possible to include more than one captioned figure/table in a single float
% These packages are all incorporated in the memoir class to one degree or another...

%%% HEADERS & FOOTERS
\usepackage{fancyhdr} % This should be set AFTER setting up the page geometry
\pagestyle{plain} % options: empty , plain , fancy
\renewcommand{\headrulewidth}{0pt} % customise the layout...
\lhead{}\chead{}\rhead{}
\lfoot{}\cfoot{\thepage}\rfoot{}

%%% SECTION TITLE APPEARANCE
\usepackage{sectsty}
\allsectionsfont{\sffamily\mdseries\upshape} % (See the fntguide.pdf for font help)
% (This matches ConTeXt defaults)

%%% ToC (table of contents) APPEARANCE
\usepackage[nottoc,notlof,notlot]{tocbibind} % Put the bibliography in the ToC
%\usepackage[titles,subfigure]{tocloft} % Alter the style of the Table of Contents
%\renewcommand{\cftsecfont}{\rmfamily\mdseries\upshape}
%\renewcommand{\cftsecpagefont}{\rmfamily\mdseries\upshape} % No bold!

%%% END Article customizations

%%% The "real" document content comes below...

\title{SMC4LRT - Master Outline}
\author{Matej Durco}
%\date{} % Activate to display a given date or no date (if empty),
         % otherwise the current date is printed

\begin{document}
\maketitle

\tableofcontents

\include{Introduction}


\include{Literature}


\section{Definitions}
We want to clarify or lay down a few terms and definitions, i.e.\ to explain our understanding of them.

\begin{description}
\item[Concept] a sense or idea; the underlying philosophical problem need not be discussed here. For our purposes we say: the basic ``entity'' in an ontology(?), that of which an ontology is built.
\item[Ontology] ``an explicit specification of a conceptualization'' [cite!], but for us mainly a collection of concepts, as opposed to a lexicon, which is a collection of words.
\item[Word] a lexical unit, a word in a language; something that has a surface realization (writtenForm) and is a carrier of sense. Thus a relation holds: hasSense(Word, Concept).
\item[Lexicon] a collection of words, a (lexical) vocabulary
\item[Vocabulary] an index providing a mapping from Word (string) to Concept (URI)
\item[(Data)Category] (almost) the same as Concept; things like ``Topic'', ``Genre'', ``Organization'', ``ResourceType'' are instantiations of Category
\item[ConceptualDomain] the class of entities a Concept/Category denotes. For Organization it would be all (existing) organizations; CD(ResourceType) = \{Corpus, Lexicon, Document, Image, Video, \ldots\}. Entities of the domain can themselves be Categories (ResourceType: Image), but they can also be individuals (Organization: University of Vienna).
\item[Entity]
\item[Resource] informational resource; in the context of the CLARIN project mainly Language Resources (Corpus, Lexicon, Multimedia)
\item[Metadata Description] description of some properties of a resource; MD-Record
\item[Schema] - CMD-Profile
\item[Annotation]


\end{description}
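
To make the relations between these terms more tangible, the following is a minimal Python sketch of Word, Concept and Vocabulary as defined above. All class and method names, as well as the example URI, are illustrative assumptions and not part of any existing component; in particular, the sketch assumes a single sense per written form, which the definitions do not require.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Concept:
    """A concept, identified by a URI (hypothetical model)."""
    uri: str


@dataclass(frozen=True)
class Word:
    """A lexical unit with a surface realization (writtenForm)."""
    written_form: str


class Vocabulary:
    """An index mapping Word (string) to Concept (URI).

    Simplifying assumption: each written form has exactly one sense.
    """

    def __init__(self) -> None:
        self._index: Dict[str, Concept] = {}

    def add_sense(self, word: Word, concept: Concept) -> None:
        # Records the relation hasSense(Word, Concept).
        self._index[word.written_form] = concept

    def concept_for(self, written_form: str) -> Optional[Concept]:
        # Returns None when the string is not in the vocabulary.
        return self._index.get(written_form)


vocabulary = Vocabulary()
vocabulary.add_sense(Word("University of Vienna"),
                     Concept("http://example.org/org/univie"))  # placeholder URI
```

A Lexicon would then simply be the set of written forms known to the index, while the ontology side is represented only by the concept URIs.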

\section{Analysis}

\subsection{Data landscape}

Describe the situation regarding the datasets and formats.

collections, profiles/Terms, ResourceTypes!

DC, OLAC,
ISLE/IMDI, CHILDES, TEI, EAF!
(CES/XCES)

\subsection{Infrastructure}

CMDI \cite{Broeder2010}


\subsection{Ontologies, Controlled Vocabularies, Knowledge Organization Systems}


\subsubsection{Classification Schemes, Taxonomies}
LCSH, DDC


\subsubsection{Other Controlled Vocabularies}
Tagsets: STTS

Language codes: ISO 639-1

\subsubsection{Domain Ontologies, Vocabularies}
Organization-Lists
LT-World !?


\subsection{Use Cases}

\begin{itemize}

\item MD Search employing Semantic Mapping
\item MD Search employing Fuzzy Search
\item Content Search
\item Combined Metadata and Content Search
\item Visualization of the Results: charts on facets/dimensions

\item Create and publish a Virtual Collection based on a complex search (intensional/extensional)
\item Create an ad-hoc corpus
\end{itemize}

A trivial example of concept-based query expansion:
confronted with the user query \texttt{Actor.Name = Sue}, and knowing that \texttt{Actor} is equivalent or similar to \texttt{Person} and that \texttt{Name} is a synonym of \texttt{FullName}, the expanded query could look like:
\texttt{Actor.Name = Sue OR Actor.FullName = Sue OR Person.Name = Sue OR Person.FullName = Sue}
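
The expansion above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the equivalence table is hard-coded here, whereas in practice such relations would have to come from some registry of semantic relations, and the names \texttt{EQUIVALENTS} and \texttt{expand\_query} are hypothetical.

```python
from itertools import product

# Hypothetical equivalence table (assumption): each term maps to the
# terms considered equivalent or similar to it.
EQUIVALENTS = {
    "Actor": ["Person"],
    "Name": ["FullName"],
}


def expand_query(path: str, value: str) -> str:
    """Expand 'A.B = value' into a disjunction over all equivalent paths."""
    alternatives = []
    for term in path.split("."):
        # Each term stands for itself plus its known equivalents.
        alternatives.append([term] + EQUIVALENTS.get(term, []))
    # The cartesian product yields every combination of alternative terms.
    clauses = [f"{'.'.join(combo)} = {value}" for combo in product(*alternatives)]
    return " OR ".join(clauses)


print(expand_query("Actor.Name", "Sue"))
# prints: Actor.Name = Sue OR Actor.FullName = Sue OR Person.Name = Sue OR Person.FullName = Sue
```

Note that the number of clauses grows multiplicatively with the number of equivalents per term, so a real service would probably need to limit or rank the expansions.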

Another example concerns instance mapping: a user looking for all resources produced by or linked to a given institution does not have to guess or care about the various spellings of the institution's name used in the descriptions of the resources, but can instead browse a controlled vocabulary of institutions and see all the resources of a given institution. While this could be achieved by simply normalizing the literal values (and that definitely has to be one processing step), linking to an ontology also enables the user to continue browsing the ontology, to find institutions that are related to the original institution by being concerned with similar topics, and to retrieve the union of resources for the resulting cluster. Thus, in general, the user is enabled to work with the data based on information that is not present in the original dataset.

\section{Semantic Mapping}


\subsection{Profiles to Data Categories}
CMD:Profile.Comp.Elem $\rightarrow$ DatCat


\subsection{Semantic Relations between (Data)Categories}

Relation Registry

!check DCR-RR/Odijk2010 -follow up
!Cf. Erhard Hinrichs 2009


\subsection{Mapping from Strings to Entities}

Based on the textual values in the metadata descriptions, find matching entities in selected ontologies.

Identify related ontologies:
LT-World \cite{Joerg2010}

Task:
\begin{enumerate}
\item express MDRecords in RDF
\item identify related ontologies/vocabularies (category $\rightarrow$ vocabulary)
\item implement (or reuse) a lookup/mapping function (Vocabulary Alignment Service? CATCH-PLUS?)

\fbox{function lookup: Category $\times$ String $\rightarrow$ ConceptualDomain}

Normally this would be served by dedicated controlled vocabularies, but some string-normalizing preprocessing etc.\ should also be expected.
\end{enumerate}
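
A minimal sketch of such a lookup function, including the string-normalizing preprocessing, might look as follows in Python. The vocabulary content, the placeholder URIs and the function names are assumptions made purely for illustration; a real implementation would query a dedicated vocabulary service instead of an in-memory dictionary.

```python
import unicodedata
from typing import Optional

# Hypothetical controlled vocabularies, keyed by category (assumption).
# Keys are normalized spellings, values are placeholder entity URIs.
VOCABULARIES = {
    "Organization": {
        "university of vienna": "http://example.org/org/univie",
        "universitat wien": "http://example.org/org/univie",
    },
}


def normalize(literal: str) -> str:
    """Lower-case, strip diacritics and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", literal)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_marks.lower().split())


def lookup(category: str, literal: str) -> Optional[str]:
    """lookup: Category x String -> entity of the ConceptualDomain.

    Returns None if the category has no vocabulary or the
    normalized literal is not found in it.
    """
    return VOCABULARIES.get(category, {}).get(normalize(literal))
```

With this table, both \texttt{lookup("Organization", "Universität  Wien")} and \texttt{lookup("Organization", "University of VIENNA")} resolve to the same placeholder URI, which is the effect the instance mapping is meant to achieve; exact matching after normalization is of course only the simplest possible strategy.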



\subsection{Semantic Search}

The main purpose of the undertaking described in the preceding subsections (mapping of concepts and entities) is to enhance the search capabilities of the MDService serving the metadata and resource data, namely by employing ontological resources.
Mainly, this enhancement shall mean that the user can access the data indirectly by browsing one or multiple ontologies
with which the data will then be linked. These could be, for example, ontologies of Organizations and Projects.

In this section we want to explore how this shall be accomplished, i.e.\ how to bring the enhanced capabilities to the user.
A crucial aspect is the question of how to deal with the even greater amount of information in a user-friendly way, i.e.\ how to avoid overwhelming, intimidating or frustrating the user.

``Semi-transparently'' means that the semantic mapping shall primarily integrate seamlessly into the interaction with the service, but shall ``explain'' itself on demand, i.e.\ offer enough information for the user to understand its role and to be able to manipulate it easily.

?
Facets
Controlled Vocabularies
Synonym Expansion (via TermExtraction(ContentSet))

\subsection{Linked Data - Express dataset in RDF}

Partly as a by-product of the entity-mapping effort we will get the metadata descriptions rendered in RDF, linked with \ldots
So theoretically we then only need to provide them ``on the web'' to make them a nucleus of the LinkedData cloud.

Practically this won't be that straightforward, as the mapping to entities will be a lot of work.
But once that is solved, or for the subsets for which it is solved, publishing the data on the ``Semantic Web'' should be easy.

Technical aspects (RDF-store?) / interface (ontology browser?)

Defining the mapping:
\begin{enumerate}
\item convert to RDF;
translate: MDRecord $\rightarrow$ [\#mdrecord \#property literal]
\item map: \#mdrecord \#property literal $\rightarrow$ [\#mdrecord \#property \#entity]
\end{enumerate}
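
The two steps can be illustrated with plain Python tuples standing in for RDF triples. This is a sketch under assumptions, not a proposal for the actual implementation: no RDF library is used, the identifiers such as \texttt{\#rec1} and \texttt{\#UniversityOfVienna} are made up, and the entity index is assumed to be the result of a prior lookup step.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def record_to_triples(record_id: str, record: Dict[str, str]) -> List[Triple]:
    """Step 1: MDRecord -> [#mdrecord #property literal]."""
    subject = f"#{record_id}"
    return [(subject, f"#{prop}", literal) for prop, literal in record.items()]


def map_entities(
    triples: List[Triple],
    entity_index: Dict[Tuple[str, str], str],
) -> List[Triple]:
    """Step 2: replace literals by entities wherever a mapping is known.

    entity_index maps (property, literal) to an entity identifier;
    literals without a known mapping are left unchanged.
    """
    return [
        (subj, prop, entity_index.get((prop, obj), obj))
        for subj, prop, obj in triples
    ]


# Hypothetical example record and entity index (assumptions).
record = {"organization": "Univ. of Vienna", "title": "Some corpus"}
index = {("#organization", "Univ. of Vienna"): "#UniversityOfVienna"}

literal_triples = record_to_triples("rec1", record)
entity_triples = map_entities(literal_triples, index)
```

Keeping unmapped literals in place reflects the remark above that the data can be published for those subsets where the mapping is already solved.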

\subsection{Content/Annotation}
AF + DCR + RR


\subsection{Visualization}
Landscape, Treemap, SOM

Ontology Mapping and Alignment / saiks/Ontology4 4auf1.pdf



\include{System}

\include{Evaluation}


\section{Conclusions and Future Work}

\section{Questions, Remarks}

\begin{itemize}
\item How does this relate to federated search?
\item ontological vs.\ semasiological (semantic features: categorial/archisemes, differentiating, specifying)
\item ``controlled vocabularies''
\end{itemize}


\bibliographystyle{ieee}
\bibliography{../../../2bib/lingua,../../../2bib/ontolingua}


\end{document}