% !TEX TS-program = pdflatex
% !TEX encoding = UTF-8 Unicode

% This is a simple template for a LaTeX document using the "article" class.
% See "book", "report", "letter" for other types of document.

\documentclass[11pt]{article} % use larger type; default would be 10pt

\usepackage[utf8]{inputenc} % set input encoding (not needed with XeLaTeX)

\usepackage{url}
%\usepackage{svn-multi}

% Subversion Information
%\svnidlong
%{$HeadURL: $}
%{$LastChangedDate: $}
%{$LastChangedRevision: $}
%{$LastChangedBy: $}
%\svnid{$Id$}

%%% Examples of Article customizations
% These packages are optional, depending on whether you want the features they provide.
% See the LaTeX Companion or other references for full information.

%%% PAGE DIMENSIONS
\usepackage{geometry} % to change the page dimensions
\geometry{a4paper} % or letterpaper (US) or a5paper or....
%\geometry{margin=1cm} % for example, change the margins to 1cm all round
\topmargin=-0.5in
\textheight=700pt
% \geometry{landscape} % set up the page for landscape
% read geometry.pdf for detailed page layout information

\usepackage{graphicx} % support the \includegraphics command and options

% \usepackage[parfill]{parskip} % Activate to begin paragraphs with an empty line rather than an indent

%%% PACKAGES
\usepackage{booktabs} % for much better looking tables
\usepackage{array} % for better arrays (eg matrices) in maths
\usepackage{paralist} % very flexible & customisable lists (eg. enumerate/itemize, etc.)
\usepackage{verbatim} % adds environment for commenting out blocks of text & for better verbatim
%\usepackage{subfig} % make it possible to include more than one captioned figure/table in a single float
% These packages are all incorporated in the memoir class to one degree or another...

%%% HEADERS & FOOTERS
\usepackage{fancyhdr} % This should be set AFTER setting up the page geometry
\pagestyle{plain} % options: empty , plain , fancy
\renewcommand{\headrulewidth}{0pt} % customise the layout...
\lhead{}\chead{}\rhead{}
\lfoot{}\cfoot{\thepage}\rfoot{}

%%% SECTION TITLE APPEARANCE
\usepackage{sectsty}
\allsectionsfont{\sffamily\mdseries\upshape} % (See the fntguide.pdf for font help)
% (This matches ConTeXt defaults)

%%% ToC (table of contents) APPEARANCE
\usepackage[nottoc,notlof,notlot]{tocbibind} % Put the bibliography in the ToC
%\usepackage[titles,subfigure]{tocloft} % Alter the style of the Table of Contents
%\renewcommand{\cftsecfont}{\rmfamily\mdseries\upshape}
%\renewcommand{\cftsecpagefont}{\rmfamily\mdseries\upshape} % No bold!

%%% END Article customizations

%%% The "real" document content comes below...

\title{SMC4LRT - Master Outline}
\author{Matej Durco}
%\date{} % Activate to display a given date or no date (if empty),
         % otherwise the current date is printed

\begin{document}
\maketitle

\tableofcontents

\include{Introduction}


\include{Literature}


\section{Definitions}
In this section we clarify a few terms and lay down how we understand them in this work.

\begin{description}
\item[Concept] sense, idea; what a concept ultimately is remains a philosophical problem that we do not need to discuss here. For our purposes, a concept is the basic ``entity'' of an ontology, i.e.\ that of which an ontology is built.
\item[Ontology] ``an explicit specification of a conceptualization'' [cite!]; for us, however, mainly a collection of concepts, as opposed to a lexicon, which is a collection of words.
\item[Word] a lexical unit, i.e.\ a word in a language: something that has a surface realization (writtenForm) and is a carrier of sense. Thus the relation hasSense(Word, Concept) holds.
\item[Lexicon] a collection of words, a (lexical) vocabulary
\item[Vocabulary] an index providing a mapping from Word (string) to Concept (URI)
\item[(Data)Category] (almost) the same as Concept; things like ``Topic'', ``Genre'', ``Organization'', ``ResourceType'' are instances of Category
\item[ConceptualDomain] the class of entities a Concept/Category denotes. For Organization it would be all (existing) organizations; CD(ResourceType) = \{Corpus, Lexicon, Document, Image, Video, \ldots\}. Entities of the domain can themselves be categories (ResourceType: Image), but they can also be individuals (Organization: University of Vienna).
\item[Entity]
\item[Resource] an informational resource; in the context of the CLARIN project mainly Language Resources (Corpus, Lexicon, Multimedia)
\item[Metadata Description] a description of some properties of a resource; also called MD record
\item[Schema] in the CMD framework, a CMD profile
\item[Annotation]


\end{description}

\section{Analysis}

\subsection{Data landscape}

Describe the situation regarding the datasets and formats:

collections, profiles/terms, ResourceTypes!

DC, OLAC,
ISLE/IMDI, CHILDES, TEI, EAF!
(CES/XCES)

\subsection{Infrastructure}

CMDI \cite{Broeder2010}


\subsection{Ontologies, Controlled Vocabularies, Knowledge Organization Systems}


\subsubsection{Classification Schemes, Taxonomies}
LCSH, DDC


\subsubsection{Other Controlled Vocabularies}
Tagsets: STTS; language codes: ISO 639-1.

\subsubsection{Domain Ontologies, Vocabularies}
Organization lists; LT-World (?)

\subsection{Use Cases}

\begin{itemize}

\item MD Search employing Semantic Mapping
\item MD Search employing Fuzzy Search
\item Content Search
\item Combined Metadata and Content Search
\item Visualization of the Results: charts on facets/dimensions

\item Create and publish a Virtual Collection based on a complex search (intensional/extensional)
\item Create an ad-hoc corpus
\end{itemize}

A trivial example of concept-based query expansion:
confronted with the user query \texttt{Actor.Name = Sue}, and knowing that \texttt{Actor} is equivalent or similar to \texttt{Person} and that \texttt{Name} is a synonym of \texttt{FullName}, the expanded query could look like this:
\begin{quote}
\texttt{Actor.Name = Sue OR Actor.FullName = Sue OR Person.Name = Sue OR Person.FullName = Sue}
\end{quote}
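The expansion above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of the planned system: the equivalence table \texttt{EQUIVALENT} is hypothetical and hard-coded here, whereas in practice it would be derived from the semantic mapping (data categories and the relations between them); the function name \texttt{expand\_query} is likewise made up for illustration. Every step of the path is replaced by each of its equivalents, and the Cartesian product of the alternatives yields the disjunction.

\begin{verbatim}
from itertools import product

# Hypothetical equivalence table; in practice it would be derived from
# the semantic mapping (data categories and relations between them).
EQUIVALENT = {
    "Actor": ["Actor", "Person"],
    "Name": ["Name", "FullName"],
}


def expand_query(path, value):
    """Expand 'A.B = v' into a disjunction over equivalent paths."""
    steps = path.split(".")
    # Each path step is replaced by its list of equivalents (or itself).
    alternatives = [EQUIVALENT.get(step, [step]) for step in steps]
    clauses = [".".join(combo) + " = " + value
               for combo in product(*alternatives)]
    return " OR ".join(clauses)


print(expand_query("Actor.Name", "Sue"))
# prints (on one line): Actor.Name = Sue OR Actor.FullName = Sue OR
#                       Person.Name = Sue OR Person.FullName = Sue
\end{verbatim}

The number of clauses grows multiplicatively with the number of path steps that have equivalents, which is one reason why such an expansion should remain visible to and controllable by the user.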

Another example concerns instance mapping: a user looking for all resources produced by or linked to a given institution does not have to guess or care about the various spellings of the institution's name used in the resource descriptions, but can instead browse a controlled vocabulary of institutions and see all resources of a given institution. While this could be achieved by simply normalizing the literal values (and this definitely has to be one processing step), linking to an ontology enables the user to continue browsing the ontology, find institutions related to the original one (e.g.\ by being concerned with similar topics), and retrieve the union of resources for the resulting cluster. In general, the user is thus able to work with the data based on information that is not present in the original dataset.
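The cluster-based retrieval described above could look roughly as follows. The Python sketch assumes, purely for illustration, that resources have already been linked to institution identifiers and that the ontology records a set of topics per institution; all identifiers and topics below (\texttt{ex:UniversityOfVienna}, \texttt{ex:InstituteB}, \ldots) are hypothetical, and ``related'' is simplified to ``sharing at least one topic''.

\begin{verbatim}
# Hypothetical ontology fragment: institution -> topics it deals with.
TOPICS = {
    "ex:UniversityOfVienna": {"corpus linguistics", "lexicography"},
    "ex:InstituteB": {"lexicography", "digital editions"},
    "ex:InstituteC": {"language documentation"},
}

# Hypothetical result of instance mapping: resource -> institution.
RESOURCES = {
    "res:1": "ex:UniversityOfVienna",
    "res:2": "ex:InstituteB",
    "res:3": "ex:InstituteC",
    "res:4": "ex:UniversityOfVienna",
}


def related_institutions(institution):
    """The institution itself plus all institutions sharing a topic."""
    topics = TOPICS.get(institution, set())
    cluster = {institution}
    for other, other_topics in TOPICS.items():
        if other_topics & topics:
            cluster.add(other)
    return cluster


def resources_of_cluster(institution):
    """Union of the resources linked to any institution in the cluster."""
    cluster = related_institutions(institution)
    return sorted(r for r, owner in RESOURCES.items() if owner in cluster)


print(resources_of_cluster("ex:UniversityOfVienna"))
# ['res:1', 'res:2', 'res:4']
\end{verbatim}

A real ontology would offer richer relations than shared topics; the point of the sketch is only that the cluster, and hence the result set, comes from information outside the original metadata.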

\section{Semantic Mapping}


\subsection{Profiles to Data Categories}
CMD: Profile.Component.Element $\rightarrow$ DatCat, i.e.\ each element of a CMD profile is linked to a data category.
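Once elements are linked to data categories, elements from different profiles can be grouped by the category they share; this is what allows a query against an element of one profile to be carried over to the others. The following Python sketch shows the inversion of such a mapping; the element paths and data category identifiers are invented for illustration and do not correspond to actual CMD profiles or registry entries.

\begin{verbatim}
from collections import defaultdict

# Hypothetical mapping:
# CMD element path (Profile.Component.Element) -> data category.
ELEMENT_TO_DATCAT = {
    "ProfileA.Actor.Name": "datcat:name",
    "ProfileB.Person.FullName": "datcat:name",
    "ProfileA.Actor.Role": "datcat:role",
    "ProfileB.Project.Title": "datcat:title",
}


def elements_by_datcat(mapping):
    """Invert the mapping: data category -> sorted element paths."""
    groups = defaultdict(list)
    for path, datcat in mapping.items():
        groups[datcat].append(path)
    return {datcat: sorted(paths) for datcat, paths in groups.items()}


groups = elements_by_datcat(ELEMENT_TO_DATCAT)
print(groups["datcat:name"])
# ['ProfileA.Actor.Name', 'ProfileB.Person.FullName']
\end{verbatim}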


\subsection{Semantic Relations between (Data)Categories}

Relation Registry (RR)

!check DCR-RR/Odijk2010 (follow up)
!Cf. Erhard Hinrichs 2009
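If the relation registry records equivalence (sameAs-like) relations between data categories, sets of interchangeable categories can be derived by computing equivalence classes, for instance with a union-find structure as sketched below in Python. This is an assumption about how the registry could be used, not a description of its actual interface; the relation pairs are hypothetical, and non-symmetric relations (e.g.\ broader/narrower) would need a different treatment.

\begin{verbatim}
def equivalence_classes(pairs):
    """Group items connected by (symmetric, transitive) equivalences."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in pairs:
        union(a, b)

    classes = {}
    for item in list(parent):
        classes.setdefault(find(item), set()).add(item)
    return sorted(sorted(members) for members in classes.values())


# Hypothetical equivalences as a relation registry might record them.
SAME_AS = [("dc:A", "dc:B"), ("dc:B", "dc:C"), ("dc:D", "dc:E")]
print(equivalence_classes(SAME_AS))
# [['dc:A', 'dc:B', 'dc:C'], ['dc:D', 'dc:E']]
\end{verbatim}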


\subsection{Mapping from Strings to Entities}

Based on the textual values in the metadata descriptions, find matching entities in selected ontologies.

Identify related ontologies:
LT-World \cite{Joerg2010}

Tasks:
\begin{enumerate}
\item express MD records in RDF
\item identify related ontologies/vocabularies (category $\rightarrow$ vocabulary)
\item implement (or reuse) a lookup/mapping function (Vocabulary Alignment Service? CATCH-PLUS?)

\fbox{function lookup: Category $\times$ String $\rightarrow$ ConceptualDomain}

Normally this would be served by dedicated controlled vocabularies, but some string-normalizing preprocessing etc.\ is also to be expected.
\end{enumerate}
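A minimal Python sketch of the lookup function, assuming each category is served by a controlled vocabulary that maps normalized labels to entity URIs. The vocabularies, the \texttt{example.org} URIs and the particular normalization (lower-casing, removing diacritics and punctuation, collapsing whitespace) are illustrative assumptions, not the behaviour of any existing service.

\begin{verbatim}
import re
import unicodedata

# Hypothetical controlled vocabularies:
# category -> {normalized label -> entity URI}.
VOCABULARIES = {
    "Organization": {
        "university of vienna": "http://example.org/org/univie",
        "universitat wien": "http://example.org/org/univie",
    },
    "ResourceType": {
        "corpus": "http://example.org/type/Corpus",
        "lexicon": "http://example.org/type/Lexicon",
    },
}


def normalize(label):
    """Lower-case, strip diacritics and punctuation, collapse spaces."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_marks = "".join(c for c in decomposed
                       if not unicodedata.combining(c))
    cleaned = re.sub(r"[^\w\s]", " ", no_marks.lower())
    return " ".join(cleaned.split())


def lookup(category, value):
    """Category x String -> entity of the domain (None if unknown)."""
    vocabulary = VOCABULARIES.get(category, {})
    return vocabulary.get(normalize(value))


print(lookup("Organization", "  University of Vienna. "))
# http://example.org/org/univie
\end{verbatim}

With this normalization, for instance, ``Universit\"at Wien'' and ``universitat wien'' resolve to the same entity; values that no vocabulary covers yield \texttt{None} and would remain plain literals.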



\subsection{Semantic Search}

The main purpose of the undertaking described in the previous subsections (mapping of concepts and of entities) is to enhance the search capabilities of the MDService serving the metadata/resource data, namely by employing ontological resources.
Mainly, this enhancement means that the user can access the data indirectly by browsing one or more ontologies with which the data will be linked, for example ontologies of organizations and projects.

In this section we want to explore how this can be accomplished, i.e.\ how to bring the enhanced capabilities to the user.
A crucial aspect is how to deal with the even greater amount of information in a user-friendly way, i.e.\ how to avoid overwhelming, intimidating or frustrating the user.

The semantic mapping should therefore work semi-transparently: primarily, it should integrate seamlessly into the interaction with the service, but on demand it should ``explain'' itself, offering enough information for the user to understand its role and to manipulate it easily.

Open questions:
\begin{itemize}
\item Facets
\item Controlled Vocabularies
\item Synonym Expansion (via TermExtraction(ContentSet))
\end{itemize}
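As one possible reading of the ``Facets'' point: if facets are defined on data categories rather than on profile-specific elements, results coming from different profiles can be counted together. The Python sketch below assumes that records are given as lists of (element path, value) pairs and that an element-to-data-category mapping is available; the paths and category names are hypothetical.

\begin{verbatim}
from collections import Counter

# Hypothetical mapping from profile-specific elements to data categories.
ELEMENT_TO_DATCAT = {
    "ProfileA.Resource.Type": "datcat:resourceType",
    "ProfileB.Res.ResourceType": "datcat:resourceType",
    "ProfileB.Res.Title": "datcat:title",
}


def facet_counts(records, element_to_datcat, facet):
    """Count the values of one facet (a data category) in a result set."""
    counts = Counter()
    for record in records:
        for path, value in record:
            if element_to_datcat.get(path) == facet:
                counts[value] += 1
    return counts


results = [
    [("ProfileA.Resource.Type", "Corpus")],
    [("ProfileB.Res.ResourceType", "Corpus"), ("ProfileB.Res.Title", "X")],
    [("ProfileA.Resource.Type", "Lexicon")],
]
counts = facet_counts(results, ELEMENT_TO_DATCAT, "datcat:resourceType")
print(counts.most_common())
# [('Corpus', 2), ('Lexicon', 1)]
\end{verbatim}

Such counts are also the kind of input the ``charts on facets/dimensions'' use case would need.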

\subsection{Linked Data: Expressing the Dataset in RDF}

Partly as a by-product of the entity-mapping effort, we will get the metadata descriptions rendered in RDF and linked with the entities of the selected ontologies.
In theory, we would then only need to publish them ``on the web'' to make them a nucleus of the Linked Data cloud.

In practice, this will not be that straightforward, as the mapping to entities will require a considerable amount of work.
But once that is solved (or for those subsets for which it is solved), publishing the data on the Semantic Web should be easy.

Technical aspects (RDF store?) and interface (ontology browser?)

Defining the mapping:
\begin{enumerate}
\item convert to RDF, i.e.\ translate: MDRecord $\rightarrow$ [\#mdrecord \#property literal]
\item map: [\#mdrecord \#property literal] $\rightarrow$ [\#mdrecord \#property \#entity]
\end{enumerate}
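The two steps can be sketched in Python with plain tuples standing in for RDF triples (an actual implementation would rather use an RDF library and proper URIs). The record identifier, the property names and the dictionary used as lookup function are hypothetical; properties are passed as a list of pairs because a metadata record may repeat a property.

\begin{verbatim}
def record_to_triples(record_id, fields):
    """Step 1: MDRecord -> [(#mdrecord, #property, literal)]."""
    subject = "#" + record_id
    return [(subject, "#" + prop, literal) for prop, literal in fields]


def map_literals(triples, lookup):
    """Step 2: replace literals by entities where the lookup succeeds."""
    mapped = []
    for subject, prop, literal in triples:
        entity = lookup(prop, literal)
        obj = entity if entity is not None else literal
        mapped.append((subject, prop, obj))
    return mapped


# Hypothetical lookup table: (property, literal) -> entity URI.
ENTITIES = {
    ("#organization", "University of Vienna"):
        "http://example.org/org/univie",
}

triples = record_to_triples(
    "mdrecord1",
    [("organization", "University of Vienna"), ("title", "Sample corpus")],
)
for triple in map_literals(triples, lambda p, o: ENTITIES.get((p, o))):
    print(triple)
# ('#mdrecord1', '#organization', 'http://example.org/org/univie')
# ('#mdrecord1', '#title', 'Sample corpus')
\end{verbatim}

Unmapped literals are kept as they are, so the data could be published even where the entity mapping is only partially solved.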

\subsection{Content/Annotation}
AF + DCR + RR


\subsection{Visualization}
Landscape, Treemap, SOM

Ontology Mapping and Alignment / saiks/Ontology4 4auf1.pdf



\include{System}

\include{Evaluation}


\section{Conclusions and Future Work}

\section{Questions, Remarks}

\begin{itemize}
\item How does this relate to federated search?
\item ontological vs.\ semasiological approach (semantic features: categorial features/archisemes, differentiating features, specifying features)
\item ``controlled vocabularies''
\end{itemize}


\bibliographystyle{ieee}
\bibliography{../../../2bib/lingua,../../../2bib/ontolingua}


\end{document}