Changeset 2669

Timestamp:

03/09/13 21:34:43 (11 years ago)

Author:

vronk

Message:

sections in separate files

Location:

SMC4LRT

Files:

4 added
1 edited

Evaluation.tex (added)
Introduction.tex (added)
Literature.tex (added)
Outline.tex (modified) (2 diffs)
System.tex (added)

SMC4LRT/Outline.tex

-                      r1205
+                      r2669
 \tableofcontents
+\section{Introduction}
+Title: Semantic Mapping (Component) for Language Resources
+\subsection{Main Goal}
+We propose a component that shall enhance search functionality over a large heterogeneous collection of metadata descriptions of Language Resources and Technology (LRT). By applying semantic web technology the user shall be given both better recall through query expansion based on related categories/concepts and new means of exploring the dataset/knowledge-base via ontology-driven browsing.
+A trivial example for a concept-based query expansion:
+Confronted with the user query \texttt{Actor.Name = Sue}, and knowing that \texttt{Actor} is equivalent or similar to \texttt{Person} and that \texttt{Name} is a synonym of \texttt{FullName}, the expanded query could look like:
+\texttt{Actor.Name = Sue OR Actor.FullName = Sue OR Person.Name = Sue OR Person.FullName = Sue}
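The concept-based query expansion in the example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table `EQUIVALENTS` and the function `expand_query` are hypothetical names, and the equivalences are hard-coded here, whereas in the proposed system they would be supplied by the semantic mapping.

```python
from itertools import product

# Hypothetical equivalence table (assumption): each category or field name
# maps to the list of names considered equivalent or similar to it.
EQUIVALENTS = {
    "Actor": ["Actor", "Person"],
    "Name": ["Name", "FullName"],
}


def expand_query(path: str, value: str) -> str:
    """Expand 'Category.Field = value' into a disjunction over all
    combinations of equivalent names for each path segment."""
    segments = path.split(".")
    # Segments without a known equivalent stand only for themselves.
    alternatives = [EQUIVALENTS.get(seg, [seg]) for seg in segments]
    clauses = [
        f"{'.'.join(combo)} = {value}" for combo in product(*alternatives)
    ]
    return " OR ".join(clauses)


print(expand_query("Actor.Name", "Sue"))
```

With the table above, the call reproduces the four-clause disjunction from the example; a path with no known equivalents is returned unchanged as a single clause.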
+Another example concerns instance mapping: a user looking for all resources produced by or linked to a given institution does not have to guess or care about the various spellings of the institution's name used in the resource descriptions, but can instead browse a controlled vocabulary of institutions and see all the resources of a given institution. While this could be achieved by simply normalizing the literal values (and that definitely has to be one processing step), linking to an ontology also enables the user to continue browsing the ontology to find institutions related to the original one through being concerned with similar topics, and to retrieve the union of resources for the resulting cluster. Thus, in general, the user is enabled to work with the data based on information that is not present in the original dataset.
+All these scenarios require a preprocessing step that produces the underlying linkage, both between categories/concepts and between instances (mapping literal values to entities). We refer to this task as semantic mapping, which shall be accomplished by a corresponding ``Semantic Mapping Component''. In this work the focus lies on the process/method, i.e. on the specification and (prototypical) implementation of the component, rather than on trying to establish some final/accomplished mapping. Although a tentative/naive alignment on a subset of the data will be proposed, it will mainly be used for evaluation and shall serve as a basis for discussion with domain experts, aiming at creating the actual sensible mappings usable for real tasks.
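The literal-value normalization step mentioned above (mapping differently spelled institution names to one entry of a controlled vocabulary) might look roughly as follows. This is a minimal sketch, not the component's actual method: the vocabulary `INSTITUTIONS`, its spelling variants, and the helpers `normalize`, `build_index` and `resolve` are all hypothetical and serve illustration only.

```python
import re

# Hypothetical controlled vocabulary (assumption): canonical label mapped
# to spelling variants as they might occur in metadata records.
INSTITUTIONS = {
    "Max Planck Institute for Psycholinguistics": [
        "MPI for Psycholinguistics",
        "Max-Planck-Institut für Psycholinguistik",
        "MPI Nijmegen",
    ],
}


def normalize(literal: str) -> str:
    """Collapse punctuation and whitespace runs and ignore letter case."""
    return re.sub(r"[\W_]+", " ", literal).strip().casefold()


def build_index(vocab: dict) -> dict:
    """Map every normalized spelling to its canonical label."""
    index = {}
    for canonical, variants in vocab.items():
        for spelling in [canonical, *variants]:
            index[normalize(spelling)] = canonical
    return index


def resolve(literal: str, index: dict):
    """Return the canonical label for a literal, or None if unknown."""
    return index.get(normalize(literal))


index = build_index(INSTITUTIONS)
print(resolve("  mpi NIJMEGEN ", index))
```

Exact matching on normalized strings only covers spellings already listed in the vocabulary; anything beyond that (fuzzy matching, linking to ontology entities) is outside this sketch.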
+In fact, due to the great diversity of resources and research tasks, such a ``final'' complete mapping/alignment does not seem achievable at all. Therefore the focus shall also be on ``soft'', dynamic mapping, investigating the possibilities/methods to enable users to adapt the mapping or apply different mappings with respect to their current task or research question,
+essentially being able to actively manipulate the recall/precision ratio of their searches. This entails examining user interaction with, and visualization of, the relevant information in the user interface, and enabling the user to act upon it.
+\subsection{Method}
+We start by examining the existing data and describing the evolving infrastructure in which the components are to be embedded.
+Then we formulate the task/function of Semantic Search on the concept level and on the level of individuals,
+as well as the underlying Semantic Mapping and the requirements within the defined context,
+followed by a design proposal for an appropriate component fitting within the infrastructure,
+with a particular focus on the feasibility of employing ontology mapping and alignment techniques and tools for the creation of mappings.
+In a prototype we want to deliver a proof of concept,
+combined with an evaluation to verify the claims of fitness for purpose.
+This evaluation is twofold: it shall verify, first, the ability of the system to support dynamic mapping based on a set of test queries
+and, secondly, the usability of the UI controls.
++? Identify hooks into LOD?
+a) define/use semantic relations between categories (RelationRegistry)
+b) employ ontological resources to enhance search in the dataset (SemanticSearch)
+c) specify translation instructions for expressing the dataset in RDF (LinkedData)
+\subsection{Expected Results}
+The main result of this work will be a specification of a pair of components: the Semantic Search and the underlying Semantic Mapping. These propositions will be supported by a proof-of-concept implementation of the components and by an evaluation of querying the dataset that compares traditional search and semantic search.
+One important by-product of the work will be the original dataset expressed as RDF with links into existing datasets/ontologies/knowledgebases, building a base for another nucleus of Linked Open Data.
+\begin{itemize}
+\item [Specification] definition of a mapping mechanism
+\item [Prototype] proof of concept implementation
+\item [Evaluation] evaluation results of querying the dataset comparing traditional search and semantic search
+\item [LinkedData] translation of the source dataset to RDF-based format with links into existing datasets/ontologies/knowledgebases
+\end{itemize}
+\subsection{State of the Art}
+\begin{itemize}
+\item VLO - Virtual Language Observatory  \url{http://www.clarin.eu/vlo/}, \cite{VanUytvanck2010}
+\item LT-World ontology-based \url{http://www.lt-world.org/}, \cite{Joerg2010}
+\item VAS - Catch Plus
+\item OAEI
+\end{itemize}
+\subsection{Keywords}
+Metadata interoperability, Ontology Mapping, Schema mapping, Crosswalk, Similarity measures, LinkedData
+Fuzzy Search, Visual Search?
+Language Resources and Technology, LRT/NLP/HLT
+Ontology Visualization
+Federated Search, Distributed Content Search
+(ILS - Integrated Library Systems)
+\section{Related Work}
+\subsection{Language Resources and Technology}
+While the Digital Libraries community has by and large already consolidated and big federated networks of digital library repositories have been set up, the landscape in the field of Language Resources and Technology is still scattered, even though it can meanwhile look back on a decade of standardization efforts. One main reason seems to be the complexity and diversity of the metadata associated with the resources, stemming on the one hand from the wide range of resource types, and additionally complicated by the dependence on different schools of thought.
+Need some number about the disparity in the field, number of institutes, resources, formats.
+This situation has been recognized by the community and multiple standardization initiatives have been undertaken. The process seems to have gained new momentum thanks to the large Research Infrastructure programmes introduced by the European Commission, aimed at fostering research communities that develop large-scale pan-European common infrastructures. One key player in this development is the CLARIN project.
+\subsubsection{CLARIN}
+CLARIN (Common Language Resource and Technology Infrastructure) is constituted by over 180 members from around 38 countries. The mission of this project is to
+    create a research infrastructure that makes language resources and technologies (LRT) available to scholars of all disciplines, especially SSH; it is a large-scale pan-European collaborative effort to create, coordinate and make language resources and technology available and readily usable.
+This shall be accomplished by setting up a federated network of centers (with federated identity management), but mainly by providing resources and services in an agreed-upon, consistent and standardized manner. The foundation for this goal shall be the Component Metadata Infrastructure (CMDI), a model that caters for flexible metadata profiles, allowing existing schemas to be accommodated.
+Being embedded in the CLARIN project sets the context for this work: Language Resources and HLT (Human Language Technology, also known as NLP, Natural Language Processing), with SSH (Social Sciences and Humanities) as the primary target user group of CLARIN.
+CLARIN/NLP for SSH
+\subsubsection{Standards}
+\begin{description}
+\item[ISO12620] Data Category Registry
+\item[LAF] Linguistic Annotation Framework
+\item[CMDI] - (DC, OLAC, IMDI, TEI)
+\end{description}
+\subsubsection{NLP MD Catalogues}
+\begin{description}
+\item[LAT, TLA] - Language Archiving Technology, now The Language Archive - provided by Max Planck Institute for Psycholinguistics \url{http://www.mpi.nl/research/research-projects/language-archiving-technology}
+\item[OTA LR] Archiving Service provided by Oxford Text Archive \url{http://ota.oucs.ox.ac.uk/}
+\item[OLAC]
+\item[ELRA]
+\item[LDC]
+\item[DFKI/LT-World]
+\end{description}
+\subsection{Ontologies}
+\subsubsection{Word, Sense, Concept}
+Lexicon vs. Ontology
+A lexicon is a linguistic object; an ontology is not \cite{Hirst2009}. We do not need to be that strict, but it shall be a guiding principle in this work to also consider things (datasets, vocabularies, resources) along this dichotomy/polarity: conceptual vs. lexical.
+And while every ontology has to have a lexical representation (canonically: rdfs:label, rdfs:comment, skos:*label), if we do not try to force the observed objects into a binary classification but rather consider a spectrum of bias, we should be able to locate them along this spectrum.
+So the main focus of a typical ontology is on the concepts (``conceptualization''), which are primarily language-independent.
+A special case are Linguistic Ontologies (ISOcat, GOLD, WALS.info):
+ontologies conceptualizing the linguistic domain.
+They are special in that (``ontologized'') lexicons refer to them to describe linguistic properties of the lexical entries, as opposed to linking to domain ontologies to anchor senses/meanings.
+Lexicalized Ontologies: LingInfo, lemon: LMF +  isocat/GOLD +  Domain Ontology
+a) as domain ontologies, describing aspects of the Resources\\
+b) as linguistic ontologies enriching the Lexicalization of Concepts
+Ontology and Lexicon \cite{Hirst2009}
+LingInfo/Lemon \cite{Buitelaar2009}
+We should not need linguistic ontologies (LingInfo, lemon); they are primarily relevant to the task of ontology population from texts, where entities can be encountered in various word forms in the context of the text
+(Ontology Learning, Ontology-based Semantic Annotation of Text).
+We, in contrast, are dealing with highly structured data in which entities are referenced in their nominal(?) form.
+Another special case are Controlled Vocabularies or Taxonomies/Classification Systems, let alone folksonomies, in that they identify terms with concepts/meanings, i.e. there is no explicit mapping between the language representation and the concept; rather, the term is the implicit carrier of the meaning/concept.
+So, for example, in the LCSH the surface realization of each subject heading at the same time identifies the concept.
+controlled vocabularies?
+\subsubsection{Semantic Web - Linked Data}
+\begin{description}
+\item[RDF/OWL]
+\item[SKOS]
+\end{description}
+\subsubsection{OntologyMapping}
+\subsection{Visualization}
+\subsection{FederatedSearch}
+\subsubsection{Standards}
+\begin{description}
+\item[Z39.50/SRU/SRW/CQL] LoC
+\item[OAI-PMH]
+\end{description}
+\subsubsection{(Digital) Libraries}
+General (Libraries, Federations):
+\begin{description}
+\item[OCLC] \url{http://www.oclc.org}
+    world's biggest Library Federation
+\item[LoC] Library of Congress \url{http://www.loc.gov}
+\item[EU-Lib] European Library \url{http://www.theeuropeanlibrary.org/portal/organisation/handbook/accessing-collections\_en.htm}
+\item[europeana] virtual European library - cross-domain portal \url{http://www.europeana.eu/portal/}
+\end{description}
+\subsubsection{Content Repositories}
+\begin{description}
+\item[PHAIDRA] Permanent Hosting, Archiving and Indexing of Digital Resources and Assets, provided by Vienna University \url{https://phaidra.univie.ac.at/}
+\item[eSciDoc]  provided by MPG + FIZ Karlsruhe \url{https://www.escidoc.org/}
+\item[DRIVER] pan-European infrastructure of Digital Repositories \url{http://www.driver-repository.eu/}
+\item[OpenAIRE] - Open Access Infrastructure for Research in Europe \url{http://www.openaire.eu/}
+\end{description}
+\subsubsection{(MD)search frameworks:}
+\begin{description}
+\item[Zebra/Z39.50] JZKit
+\item[Lucene/Solr]
+\item[eXist] - xml DB
+\end{description}
+\subsubsection{Content/Corpus Search}
+Corpus Search Systems
+\begin{description}
+\item[DDC]  - text-corpus
+\item[manatee] - text-corpus
+\item[CQP] - text-corpus
+\item[TROVA] - MM annotated resources
+\item[ELAN] - MM annotated resources (editor + search)
+\end{description}
+\subsection{Summary}
+\include{Introduction}
+\include{Literature}
 \section{Definitions}
 We want to clarify or lay dowhn a few terms and definition, ie explanation of our understanding
+We want to clarify or lay down a few terms and definition, ie explanation of our understanding
 \begin{description}
 …
+\section{System Design}
+SOA
+\subsection{Architecture}
+Makes use of multiple components of the established infrastructure (CLARIN) \cite{Varadi2008}, \cite{Broeder2010}:
+\begin{itemize}
+\item Data Category Registry
+\item Relation Registry
+\item Component Registry
+\item Vocabulary Alignment Service
+\end{itemize}
+merging the pieces of information provided by those,
+offering them semi-transparently to the user (or application) on the consumption side.
+\subsection{CMDI}
+MDBrowser
+MDService
+\subsection{Query Language}
+CQL?
+\subsection{User Interface}
+\subsubsection{Query Input}
+\subsubsection{Columns}
+\subsubsection{Summaries}
+\subsubsection{Differential Views}
+Visualize the impact of a given mapping in terms of the covered dataset (number of matched records).
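The figure behind such a differential view, i.e. how many records a query matches with and without a given mapping, could be computed along these lines. This is a rough sketch under assumptions: records are modelled as flat dictionaries, the mapping is a plain field-to-fields dictionary, and the names `matches` and `coverage` are hypothetical.

```python
def matches(record: dict, fields: list, value: str) -> bool:
    """True if any of the given fields in the record equals the value."""
    return any(record.get(field) == value for field in fields)


def coverage(records: list, field: str, value: str, mapping: dict) -> dict:
    """Count matched records without and with the mapping applied."""
    base = sum(matches(r, [field], value) for r in records)
    # The mapping (assumption) lists additional fields treated as equivalent.
    expanded_fields = [field, *mapping.get(field, [])]
    expanded = sum(matches(r, expanded_fields, value) for r in records)
    return {
        "without_mapping": base,
        "with_mapping": expanded,
        "gain": expanded - base,
    }


# Toy records and mapping, made up for illustration only.
records = [
    {"Actor.Name": "Sue"},
    {"Person.FullName": "Sue"},
    {"Actor.Name": "Bob"},
]
mapping = {"Actor.Name": ["Person.FullName"]}
print(coverage(records, "Actor.Name", "Sue", mapping))
```

The difference between the two counts is what a user interface could display next to each candidate mapping, so that its effect on recall is visible before it is applied.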
+\section{Evaluation}
+\subsection{Research Questions }
+\subsection{Sample Queries}
+candidate Categories:
+ResourceType, Format
+Genre, Topic
+Project, Institution, Person, Publisher
+\subsection{Usability}
+\section{Conclusions and Futur Work}
+\include{System}
+\include{Evaluation}
+\section{Conclusions and Future Work}
 \section{Questions, Remarks}
