- Timestamp:
- 04/01/11 21:55:53 (13 years ago)
- File:
-
- 1 edited
SMC4LRT/Outline.tex
r1186 r1188 10 10 11 11 \usepackage{url} 12 %\usepackage{svn-multi} 13 14 % Subversion Information 15 %\svnidlong 16 %{$HeadURL: $} 17 %{$LastChangedDate: $} 18 %{$LastChangedRevision: $} 19 %{$LastChangedBy: $} 20 %\svnid{$Id$} 21 12 22 13 23 %%% Examples of Article customizations … … 73 83 74 84 \subsection{Main Goal} 85 86 87 a) define/use semantic relations between categories (RelationRegistry) 88 b) employ ontological resources to enhance search in the dataset (SemanticSearch) 89 c) specify translation instructions for expressing the dataset in RDF (LinkedData) 90 75 91 Propose a semantic mapping component for Language Resources and Technology within the context of a federated infrastructure (being constructed in the project CLARIN). 76 92 Due to the great diversity of resources and research tasks a full alignment is not achievable. Rather, the focus shall be on "soft", dynamic mapping, investigating the possibilities/methods to enable users to control the mapping with respect to their current task, 77 93 essentially being able to actively manipulate the recall/precision ratio of their searches. This entails examining user interaction with, and visualization of, the relevant information in the user interface, and enabling the user to act upon it. 94 95 Example 78 96 79 97 … … 87 105 and secondly the usability of the UI controls. 88 106 107 +? Identify hooks into LOD? 
108 89 109 \subsection{Expected Results} 90 110 91 111 \begin{itemize} 92 93 \item Specification 94 \item Prototype 95 \item Evaluation 112 \item [Specification] definition of a mapping mechanism 113 \item [Prototype] proof-of-concept implementation 114 \item [Evaluation] evaluation results of querying the dataset, comparing traditional search and semantic search 115 \item [LinkedData] translation of the source dataset to an RDF-based format with links into existing datasets/ontologies/knowledge bases 96 116 97 117 \end{itemize} … … 101 121 102 122 \begin{itemize} 103 \item OLAC - Open Language Archives Community \url{http://www.language-archives.org/} 104 \item VLO - Virtual Language Observatory \url{http://www.clarin.eu/vlo/} 105 \item nature.com OpenSearch: A Case Study in OpenSearch and SRU Integration \cite{hammond2010} 106 \item Kowalski (2011): Information Retrieval \cite{kowalski2011} 123 \item VLO - Virtual Language Observatory \url{http://www.clarin.eu/vlo/}, \cite{VanUytvanck2010} 124 \item Ontology and Lexicon \cite{Hirst2009} 125 \item LingInfo/Lemon \cite{Buitelaar2009} 107 126 \end{itemize} 108 127 109 128 \subsection{Keywords} 110 129 111 Information retrieval, Information Discovery, IR-Systems 112 ILS - Integrated Library Systems 113 114 Metadata interoperability - schema mapping / semantic mapping repository, crosswalk, 115 116 Distributed content search, federated search 117 118 Fuzzy Search / Similarity measures 119 schema mapping / semantic mapping repository profiling? 120 121 Visualization (Treemap, SOM) 122 123 \section{Context} 130 Metadata interoperability, Ontology Mapping, Schema mapping, Crosswalk, Similarity measures, LinkedData 131 Fuzzy Search, Visual Search? 
132 133 Language Resources and Technology, LRT/NLP/HLT 134 135 Ontology Visualization 136 137 Federated Search, Distributed Content Search 138 (ILS - Integrated Library Systems) 139 140 141 \section{Related Work} 124 142 125 143 \subsection{Language Resources and Technology} … … 129 147 Need some numbers about the disparity in the field: number of institutes, resources, formats. 130 148 131 This situation has been identified by the community and multiple standardization initiatives had been conducted. This process seems to have gained a new momentum thanks to large Research Infrastructure Programmes introduced by European Commission, aimed at fostering Research communities developing large-scale pan-european common infrastructures. One key player in this development is the project CLARIN 132 133 \subsection{CLARIN} 149 This situation has been identified by the community, and multiple standardization initiatives have been undertaken. This process seems to have gained new momentum thanks to large Research Infrastructure Programmes introduced by the European Commission, aimed at fostering research communities developing large-scale pan-European common infrastructures. One key player in this development is the project CLARIN. 150 151 \subsubsection{CLARIN} 134 152 135 153 CLARIN - Common Language Resource and Technology Infrastructure - constituted by over 180 members from around 38 countries. 
The mission of this project is … … 142 160 CLARIN/NLP for SSH 143 161 144 \subsection{Standards} 145 146 \begin{description} 147 \item[ISO12620] 148 \item[Z39.50/SRU/SRW/CQL] LoC 162 \subsubsection{Standards} 163 164 \begin{description} 165 \item[ISO12620] Data Category Registry 166 \item[LAF] Linguistic Annotation Framework 149 167 \item[CMDI] - (DC, OLAC, IMDI, TEI) 150 \item[RDF/OWL] 151 \end{description} 152 153 \subsection{MD Catalogues} 154 155 \subsubsection{NLP} 168 \end{description} 169 170 \subsubsection{NLP MD Catalogues} 156 171 157 172 \begin{description} … … 164 179 \end{description} 165 180 181 \subsection{Ontologies} 182 183 \subsubsection{Word, Sense, Concept} 184 185 Lexicon vs. Ontology 186 A lexicon is a linguistic object, an ontology is not \cite{Hirst2009}. We don't need to be that strict, but it shall be a guiding principle in this work to consider things (Datasets, Vocabularies, Resources) also along this dichotomy/polarity: Conceptual vs. Lexical. 187 And while every ontology has to have a lexical representation (canonically: rdfs:label, rdfs:comment, skos:*label), if we do not force the observed objects into a binary classification but instead consider a spectrum of bias, we should be able to locate them along this spectrum. 188 So the main focus of a typical ontology is on the concepts ("conceptualization"), which are primarily language-independent. 189 190 A special case are Linguistic Ontologies: isocat, GOLD, WALS.info 191 ontologies conceptualizing the linguistic domain 192 193 They are special in that ("ontologized") lexicons refer to them to describe linguistic properties of the Lexical Entries, as opposed to linking to Domain Ontologies to anchor Senses/Meanings. 
194 Lexicalized Ontologies: LingInfo, lemon: LMF + isocat/GOLD + Domain Ontology 195 196 Another special case are Controlled Vocabularies or Taxonomies/Classification Systems, let alone folksonomies, in that they identify terms with concepts/meanings, i.e. there is no explicit mapping between the linguistic representation and the concept; rather, the term is the implicit carrier of the meaning/concept. 197 So, for example, in the LCSH the surface realization of each subject heading at the same time identifies the Concept. 198 199 controlled vocabularies? 200 201 202 \subsubsection{Semantic Web - Linked Data} 203 204 \begin{description} 205 \item[RDF/OWL] 206 \item[SKOS] 207 \end{description} 208 209 \subsubsection{OntologyMapping} 210 211 212 \subsection{Visualization} 213 214 215 \subsection{FederatedSearch} 216 217 \subsubsection{Standards} 218 219 \begin{description} 220 \item[Z39.50/SRU/SRW/CQL] LoC 221 \item[OAI-PMH] 222 \end{description} 223 224 166 225 \subsubsection{(Digital) Libraries} 167 226 168 Digital Libraries 169 227 170 228 General (Libraries, Federations): … … 174 232 world's biggest Library Federation 175 233 \item[LoC] Library of Congress \url{http://www.loc.gov} 176 \item[EU-Lib] European Library \url{http://www.theeuropeanlibrary.org/portal/organisation/handbook/accessing-collections_en.htm} 234 \item[EU-Lib] European Library \url{http://www.theeuropeanlibrary.org/portal/organisation/handbook/accessing-collections\_en.htm} 177 235 \item[europeana] virtual European library - cross-domain portal \url{http://www.europeana.eu/portal/} 178 236 \end{description} … … 187 245 \end{description} 188 246 189 190 \subsection{Technologies} 191 247 192 248 \subsubsection{(MD)search frameworks:} … … 208 264 \end{description} 209 265 266 \subsection{Summary} 267 268 \section{Definitions} 269 We want to clarify or lay down a few terms and definitions, i.e. explain our understanding of them. 270 271 \begin{description} 272 \item[Concept] sense, idea, philosophical problem, 
which we don't need to discuss here. For our purposes we say: the basic "entity" in an ontology(?), that of which an ontology is built 273 \item[Ontology] "an explicit specification of a conceptualization" [cite!], but for us mainly a collection of concepts, as opposed to a lexicon, which is a collection of words. 274 \item[Word] a lexical unit, a word in a language, something that has a surface realization (writtenForm) and is a carrier of sense. So a relation holds: hasSense(Word, Concept) 275 \item[Lexicon] a collection of words, a (lexical) vocabulary 276 \item[Vocabulary] an index providing a mapping from Word (string) to Concept (URI) 277 \item[(Data)Category] (almost) the same as Concept; things like "Topic", "Genre", "Organization", "ResourceType" are instantiations of Category 278 \item[ConceptualDomain] the class of entities a Concept/Category denotes. For Organization it would be all (existing) organizations, CD(ResourceType)=\{Corpus, Lexicon, Document, Image, Video, ...\}. Entities of the domain can themselves be Categories (ResourceType:Image), but they can also be individuals (Organization: University of Vienna) 279 \item[Entity] 280 \item[Resource] informational resource, in the context of the CLARIN project mainly Language Resources (Corpus, Lexicon, Multimedia) 281 \item[Metadata Description] description of some properties of a resource. MD-Record 282 \item[Schema] - CMD-Profile 283 \item[Annotation] 284 285 286 \end{description} 210 287 211 288 \section{Analysis} … … 215 292 Describe situation regarding the datasets and formats 216 293 217 collections, profiles/Terms 294 collections, profiles/Terms, ResourceTypes! 
218 295 219 296 DC, OLAC, … … 223 300 \subsection{Infrastructure} 224 301 225 CMDI 302 CMDI \cite{Broeder2010} 303 304 305 \subsection{Ontologies, Controlled Vocabularies, Knowledge Organization Systems} 306 307 308 \subsubsection{Classification Schemes, Taxonomies} 309 LCSH, DDC 310 311 312 \subsubsection{Other controlled Vocabularies} 313 Tagsets: STTS 314 Language codes ISO-639-1 315 316 \subsubsection{Domain Ontologies, Vocabularies} 317 Organization-Lists 318 LT-World !? 319 226 320 227 321 \subsection{Use Cases} … … 242 336 243 337 338 \subsection{Profiles to Data Categories} 339 CMD:Profile.Comp.Elem -> DatCat 340 341 342 \subsection{Semantic Relations between (Data)Categories} 343 344 Relation Registry 345 346 !check DCR-RR/Odijk2010 -follow up 347 !Cf. Erhard Hinrichs 2009 348 349 350 \subsection{Mapping from strings to Entities} 351 352 Based on the textual values in the metadata descriptions, find matching entities in selected ontologies. 353 354 Identify related ontologies: 355 LT-World \cite{Joerg2010} 356 357 task: 358 \begin{enumerate} 359 \item express MDRecords in RDF 360 \item identify related ontologies/vocabularies (category -> vocabulary) 361 \item implement (reuse) a lookup/mapping function (Vocabulary Alignment Service? CATCH-PLUS?) 362 363 \fbox{ function lookup: Category x String -> ConceptualDomain} 364 365 Normally this would be served by dedicated controlled vocabularies, but also expect some string-normalizing preprocessing etc. 366 \end{enumerate} 367 368 369 370 \subsection{Semantic Search} 371 372 The main purpose of the undertaking described in the previous two sections (mapping of concepts and entities) is to enhance the search capabilities of the MDService serving the Metadata/Resources data, namely by employing ontological resources. 373 Mainly this enhancement shall mean that the user can access the data indirectly by browsing one or multiple ontologies, 374 with which the data will then be linked. 
These could be, for example, ontologies of Organizations and Projects. 375 376 In this section we want to explore how this shall be accomplished, i.e. how to bring the enhanced capabilities to the user. 377 A crucial aspect is the question of how to deal with the even greater amount of information in a user-friendly way, i.e. how to avoid overwhelming, intimidating or frustrating the user. 244 378 245 379 Semi-transparently means that the semantic mapping shall primarily integrate seamlessly into the interaction with the service, but it shall "explain" itself - offer enough information - on demand, so that the user can understand its role and also manipulate it easily. 246 380 247 381 ? 248 382 Facets 249 383 Controlled Vocabularies 250 384 Synonym Expansion (via TermExtraction(ContentSet)) 251 385 252 Defining the Mapping - SKOS, Owl? 253 254 Distinction between Metadata and content blurry 255 256 \subsection{Metadata} 257 CMD:Profile.Comp.Elem -> DatCat 258 386 \subsection{Linked Data - Express dataset in RDF} 387 388 Partly as a by-product of the entity-mapping effort we will get the metadata descriptions rendered in RDF, linked with 389 So theoretically we then only need to provide them "on the web" to make them a nucleus of the LinkedData-Cloud. 390 391 Practically this won't be that straightforward, as the mapping to entities will be a great deal of work. 392 But once that is solved, or for the subsets for which it is solved, the publication of that data on the "SemanticWeb" should be easy. 393 394 Technical aspects (RDF-store?) / interface (ontology browser?) 
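As a purely illustrative sketch of the rendering described above (the record identifier, the property name and the entity identifier are hypothetical and not taken from the dataset), a single metadata value would first appear as a statement with a literal and, where a matching entity can be found, as a statement linking to that entity:

\begin{verbatim}
literal statement:  #mdrecord1  #organisation  "University of Vienna"
entity statement:   #mdrecord1  #organisation  #UniversityOfVienna
\end{verbatim}

Records whose values cannot be matched would simply keep the literal statement.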
395 396 defining the Mapping: 397 \begin{enumerate} 398 \item convert to RDF 399 translate: MDRecord -> [\#mdrecord \#property literal] 400 \item map: \#mdrecord \#property literal -> [\#mdrecord \#property \#entity] 401 \end{enumerate} 259 402 260 403 \subsection{Content/Annotation} 261 404 AF + DCR + RR 262 405 406 263 407 \subsection{Visualization} 264 408 Landscape, Treemap, SOM … … 266 410 Ontology Mapping and Alignement / saiks/Ontology4 4auf1.pdf 267 411 412 268 413 \section{System Design} 269 414 SOA … … 271 416 \subsection{Architecture} 272 417 273 Makes use of multiple Components of the established infrastructure (CLARIN): 418 Makes use of multiple Components of the established infrastructure (CLARIN) \cite{Varadi2008}, \cite{Broeder2010}: 274 419 275 420 \begin{itemize} … … 309 454 \subsection{Sample Queries} 310 455 456 candidate Categories: 457 ResourceType, Format 458 Genre, Topic 459 Project, Institution, Person, Publisher 460 311 461 \subsection{Usability} 312 462 313 \section{Literature} 314 315 \subsection{Standards} 316 317 ISO 12620 - Data Category Registry 318 319 LoC - SRU / CQL 320 321 OAI-PMH 322 323 LAF - Linguistic Annotation Framework 324 325 \subsection{Books, Papers} 326 A Formal Framework for Linguistic Annotation Steven Bird and Mark Liberman 2000 327 328 Gerald Kowalski (2011): Information Retrieval Architecture and Algorithms 329 330 Chowdhury, Gobinda G. : Introduction to modern information retrieval. - London : Facet Publ., 2010 (dry) 331 332 \subsection{DigLibs} 333 334 http://publik.tuwien.ac.at/searchdb.php 335 336 ISIWebOfKnowledge (/ + http://scientific.thomsonwebplus.com) 337 338 \subsection{Journals, Conferences} 339 340 ACM 341 http://www.sigir.org/ 342 343 ECIR 344 345 Cambridge Journals 346 347 348 349 350 \bibliographystyle{plain} 351 \bibliography{../lit/ir} 463 \section{Conclusions and Future Work} 464 465 \section{Questions, Remarks} 466 467 \begin{itemize} 468 \item How does this relate to federated search? 
469 \item ontological vs. semasiological (semantic features: categorial/archisemes, differentiating, specifying) 470 \end{itemize} 471 472 473 \bibliographystyle{ieee} 474 \bibliography{../../../2bib/lingua,../../../2bib/ontolingua} 475 352 476 353 477 \end{document}