Changeset 1200 for SMC4LRT

Timestamp:

04/12/11 12:05:18 (13 years ago)

Author:

vronk

Message:

Location:

SMC4LRT

Files:

2 edited

Expose.tex (modified) (4 diffs)
Outline.tex (modified) (5 diffs)

SMC4LRT/Expose.tex

-                      r1196
+                      r1200
 \section{Main Goal}
 We propose a component that shall enhance search functionality over a large heterogeneous collection of metadata descriptions of Language Resources and Technology (LRT). By applying semantic web technology the user shall be given both better recall through query expansion based on related categories/concepts and new means of exploring the dataset/knowledge-base via ontology-driven browsing.
+This work proposes a component that shall enhance search functionality over a large heterogeneous collection of metadata descriptions of Language Resources and Technology (LRT). By applying semantic web technology, the user shall be given both better recall through \emph{query expansion} based on related categories and concepts, and new means of \emph{exploring the dataset} via ontology-driven browsing.
 A trivial example for a concept-based query expansion:
 Confronted with a user query \texttt{Actor.Name = Sue}, and knowing that \texttt{Actor} is equivalent or similar to \texttt{Person} and that \texttt{Name} is a synonym of \texttt{FullName}, the expanded query could look like:
+\texttt{Actor.Name = Sue OR Actor.FullName = Sue OR Person.Name =  Sue OR Person.FullName= is Sue}
+\begin{quote}
+\texttt{Actor.Name = Sue OR Actor.FullName = Sue OR \\ Person.Name =  Sue OR Person.FullName = Sue}
+\end{quote}
 Another example concerning instance mapping: the user looking for all resource produced by or linked to a given institution, does not have to guess or care for various spellings of the name of the institution used in the description of the resources, but rather can browse through a controlled vocabulary of institutions and see all the resources of given institution. While this could be achieved by simple normalizing of the literal-values (and indeed that definitely has to be one processing step), the linking to an ontology, enables to user to also continue browsing the ontology to find institutions that are related to the original institution by means of being concerned with similar topics and retrieve a union of resources for such resulting cluster. Thus in general the user is enabled to work with the data based on information that is not present in the original dataset.
+An example concerning instance mapping: starting from a list of topics, the user can browse the ontology to find institutions concerned with those topics and retrieve the union of resources for the resulting cluster. Thus, in general, the user can work with the data based on information that is not present in the original dataset, but rather in external, linked-in semantic resources.
 All these scenarios require a preprocessing step, that would produce the underlying linkage, both between categories/concepts and between instances (mapping literal values to entities). We refer to this task as semantic mapping, that shall be accomplished by coresponding "Semantic Mapping Component". In this work the focus lies on the process/method, i.e. on the specification and (prototypical) implementation of the component rather than trying to establish some final/accomplished mapping. Although a tentative/naive alignement on a subset of the data will be proposed, this will be mainly used for evaluation and shall serve as basis for discussion with domain experts aiming at creating the actual sensible mappings usable for real tasks.
+Such semantic search functionality requires a preprocessing step that produces the underlying linkage, both between categories/concepts and between instances (mapping literal values to entities). We refer to this task as \emph{semantic mapping}, which shall be accomplished by a corresponding \textsc{Semantic Mapping Component}. In this work the focus lies on the process/method, operationalized in the specification and (prototypical) implementation of the component, rather than on trying to establish some final, accomplished mapping. Although a tentative, naive alignment on a subset of the data will be proposed, it will be used mainly for evaluation and shall serve as a basis for discussion with domain experts, aiming at creating the actual sensible mappings usable for real tasks.
 Actually, due to the great diversity of resources and research tasks, such a "final" complete mapping/alignment does not seem achievable at all. Therefore the focus shall also be on "soft", dynamic mapping, investigating the possibilities/methods to enable the users to adapt the mapping or apply different mappings with respect to their current task or research question,
 …
 \section{Method}
 We start with examining the existing data and describing the evolving Infrastructure in which the components are to be embedded.
+Then we formulate the task/function of Semantic Search on concept and on individuals level
+and the underlying Semantic Mapping and the requirements within the defined context,
+followed by a design proposal for an appropriate component fitting within the infrastructure.
+especially with focus on the feasibility of employing ontology mapping and alignement techniques and tools for the creation of mappings.
+Then we formulate the task/function of \emph{Semantic Search}, distinguishing between the concept level (using semantic relations between concepts or categories for better retrieval) and the level of individuals (allowing the user to navigate to resources via semantic resources such as ontologies and vocabularies).
+Subsequently we introduce the underlying \emph{Semantic Mapping Component} and the requirements within the defined context,
+followed by a design proposal for an appropriate component building upon the existing pieces of the infrastructure.
+A special focus will be put on examining the feasibility of employing ontology mapping and alignment techniques and tools for the creation of the mappings.
+Based on a survey of existing semantic resources (ontologies, vocabularies), we identify an initial set of relevant
+ones, which will be used in the exercise of mapping the literal values in the metadata descriptions to externally defined entities, essentially interrelating the dataset with external resources and entities ("Linked Data"). A necessary preprocessing step is the task of expressing the dataset in RDF.
 In a prototype we want to deliver a proof of the concept,
 …
++? Identify hooks into LOD?
+\section{Expected Results}
+The main contribution of this work will be the putting together of existing pieces (resources and methods), primarily applying ontology mapping methods to a domain-specific data collection.
+Thus the main result of this work will be a specification of a pair of components: the Semantic Search and the underlying Semantic Mapping. These propositions will be supported by a proof-of-concept implementation of these components and an evaluation that compares traditional search and semantic search when querying the dataset.
+a) define/use semantic relations between categories (RelationRegistry)
+b) employ ontological resources to enhance search in the dataset (SemanticSearch)
+c) specify a translation instructions for expressing dataset in rdf  (LinkedData)
+\section{Expected Results}
+The main result of this work will be a specification of the pair of components the Semantic Search and the underlying Semantic Mapping. These propositions will be supported by a proof-of-concept implementation of these components and an evaluation of querying the dataset comparing traditional search and semantic search.
+One important by-product of the work will be the original dataset expressed as RDF with links into existing datasets/ontologies/knowledgebases, building a base for another nucleus of Linked Open Data.
+\begin{itemize}
+\item [Specification] definition of a mapping mechanism
+\item [Prototype] proof of concept implementation
+\item [Evaluation] evaluation results of querying the dataset comparing traditional search and semantic search
+\item [LinkedData] translation of the source dataset to RDF-based format with links into existing datasets/ontologies/knowledgebases
+\end{itemize}
+One important by-product of the work will be the original dataset expressed as RDF with links into existing external datasets/ontologies/knowledgebases, effectively setting up a base for weaving this dataset into the Linked Open Data\footnote{\url{http://linkeddata.org/}}. This issue is not only high on the agenda across many disciplines, but is also recognized as a key matter and accordingly supported by the European Commission within the current FP7\footnote{\url{http://cordis.europa.eu/fetch?CALLER=PROJ\_ICT&ACTION=D&CAT=PROJ&RCN=95562}}.
 \section{State of the Art}
+One current tightly related work is the VLO - Virtual Language Observatory  \footnote{\url{http://www.clarin.eu/vlo/}} \cite{VanUytvanck2010}, being developed within the CLARIN-project. This application operates on the same collection of data as is discussed in this work, however lead by the guiding principle to simplify the search for the user it employs a faceted search, mapping manually the appropriate metadata-fields from the different schemas/profiles to 10 fixed facets.
+The most closely related current work is the \texttt{VLO - Virtual Language Observatory}\footnote{\url{http://www.clarin.eu/vlo/}} \cite{VanUytvanck2010}, being developed within the CLARIN project. This application operates on the same collection of data as is discussed in this work; however, led by the guiding principle of simplifying the search for the user, it employs a faceted search, manually mapping the appropriate metadata fields from the different schemas/profiles to 10 fixed facets.
+From the knowledge management and semantic web point of view, LT-World, \footnote{\url{http://www.lt-world.org/}} \cite{Joerg2010}
+the ontology-based portal for Language Resources and technology being developed at DFKI \footnote{Deutsches Forschungszentrum für Künstliche Intelligenz} is a prominent resource providing the information about the entities (Institutions, Persons, Project, Tools, etc.) in this field of study.
+Also within the context of CLARIN, integral parts of the infrastructure are the \texttt{Component Registry} and \texttt{ISOcat}\footnote{\url{http://www.isocat.org/}}, the ISO-standardized Data Category Registry \cite{Broeder2010,ISO12620:2009}. In the context of CLARIN there has also been work on the so-called \texttt{Relation Registry}, now available as a prototype, meant for defining relations between data categories. All these components are running services that this work directly builds upon.
+Linked Open Data
+Another related initiative is that of a \texttt{Vocabulary Alignement Service} being developed and run within the Dutch program CATCH\footnote{\textit{Continuous Access To Cultural Heritage} - \url{http://www.catchplus.nl/en/}}. There are plans to reuse or enhance the service for the needs of the CLARIN project.
+VAS - Catch Plus
+The CLARIN project also delivers a valuable source of information on the normative resources in the domain in its current deliverable \textit{Interoperability and Standards} \cite{CLARIN_D5.C-3}. This document is relevant in two ways: first, it covers ontologies as a type of resource; second, and more importantly, it offers an exhaustive collection of references to standards, vocabularies and other normative/standardization work in the field of Language Resources and Technology.
+OAEI
+From the point of view of existing semantic resources, \texttt{LT-World}\footnote{\url{http://www.lt-world.org/}} \cite{Joerg2010}, the ontology-based portal covering primarily Language Technology, being developed at DFKI\footnote{\textit{Deutsches Forschungszentrum für Künstliche Intelligenz} - \url{http://www.dfki.de}}, is a prominent resource providing information about the entities (Institutions, Persons, Projects, Tools, etc.) in this field of study. The underlying ontology is being restructured and cooperation has been initiated.
+The starting point for the investigation of work regarding ontology mapping is the overview of the field by Kalfouglou \cite{Kalfoglou2003}.
+As the main contribution shall be the application of ontology mapping techniques and technology, a comprehensive overview of this field and its current developments is called for. There is a plethora of work on the topic of ontology mapping, and the task will be to sort out the relevant work. The starting points for the investigation of work regarding ontology mapping are the overview of the field by Kalfoglou \cite{Kalfoglou2003}, a more recent work by Shvaiko and Euzenat \cite{Shvaiko2008}, and the dedicated platform OAEI\footnote{Ontology Alignment Evaluation Initiative - \url{http://oaei.ontologymatching.org/}}.
+One recent inspiring work is that of Noah et al. \cite{Noah2010}, developing a semantic digital library for an academic institution. The scope is limited to document collections, but nevertheless many aspects seem very relevant for this work, like operating on document metadata, ontology population or sophisticated querying and searching.
+As described previously, one outcome of the work will be the dataset expressed as RDF, interlinked with other semantic resources, which follows the broad Linked Open Data effort as proposed by Berners-Lee \cite{TimBL2006}. A current comprehensive overview of the principles of Linked Data and current applications is the work by Heath and Bizer \cite{HeathBizer2011}, which shall serve as a practical guide for this specific task.
 \section{Keywords}
 Metadata interoperability, Ontology Mapping, Schema mapping, Crosswalk, Similarity measures, LinkedData
 Fuzzy Search, Visual Search?
 …
 \bibliographystyle{plain}
+\bibliographystyle{acm}
 \bibliography{../../../2bib/lingua,../../../2bib/ontolingua,../../../2bib/semweb}

SMC4LRT/Outline.tex

-                      r1188
+                      r1200
 \subsection{Main Goal}
+a) define/use semantic relations between categories (RelationRegistry)
+b) employ ontological resources to enhance search in the dataset (SemanticSearch)
+c) specify a translation instructions for expressing dataset in rdf  (LinkedData)
+Propose a semantic mapping component for Language Resources and Technology within the context of a federated infrastructure (being constructed in the project CLARIN).
+Due to the great diversity of resources and research tasks a full alignement is not achievable. Rather the focus shall be on "soft", dynamic mapping, investigating the possibilities/methods to enable the users to control the mapping with respect to their current task,
+We propose a component that shall enhance search functionality over a large heterogeneous collection of metadata descriptions of Language Resources and Technology (LRT). By applying semantic web technology the user shall be given both better recall through query expansion based on related categories/concepts and new means of exploring the dataset/knowledge-base via ontology-driven browsing.
+A trivial example for a concept-based query expansion:
+Confronted with a user query: \texttt{Actor.Name = Sue} and knowing that \texttt{Actor} is equivalent or similar to \texttt{Person} and \texttt{Name} is synonym to \texttt{FullName} the expanded query could look like:
+\texttt{Actor.Name = Sue OR Actor.FullName = Sue OR Person.Name =  Sue OR Person.FullName= is Sue}
+Another example concerning instance mapping: the user looking for all resource produced by or linked to a given institution, does not have to guess or care for various spellings of the name of the institution used in the description of the resources, but rather can browse through a controlled vocabulary of institutions and see all the resources of given institution. While this could be achieved by simple normalizing of the literal-values (and indeed that definitely has to be one processing step), the linking to an ontology, enables to user to also continue browsing the ontology to find institutions that are related to the original institution by means of being concerned with similar topics and retrieve a union of resources for such resulting cluster. Thus in general the user is enabled to work with the data based on information that is not present in the original dataset.
+All these scenarios require a preprocessing step, that would produce the underlying linkage, both between categories/concepts and between instances (mapping literal values to entities). We refer to this task as semantic mapping, that shall be accomplished by coresponding "Semantic Mapping Component". In this work the focus lies on the process/method, i.e. on the specification and (prototypical) implementation of the component rather than trying to establish some final/accomplished mapping. Although a tentative/naive alignement on a subset of the data will be proposed, this will be mainly used for evaluation and shall serve as basis for discussion with domain experts aiming at creating the actual sensible mappings usable for real tasks.
+Actually due to the great diversity of resources and research tasks  such a "final" complete mapping/alignement does not seem achievable at all. Therefore also the focus shall be on "soft", dynamic mapping, investigating the possibilities/methods to enable the users to adapt the mapping or apply different mapping with respect to their current task or research question,
 essentially being able to actively manipulate the recall/precision ratio of their searches. This entails the examination of user interaction with and visualization of the relevant information in the user interface and enabling the user to act upon it.
-Example
 \subsection{Method}
+We start with examining the existing Data and describing the evolving Infrastructure.
+Then we formulate the task/function of Semantic Mapping and the requirements within the defined context,
+We start with examining the existing Data and describing the evolving Infrastructure in which the components are to be embedded.
+Then we formulate the task/function of Semantic Search on concept and on individuals level
+and the underlying Semantic Mapping and the requirements within the defined context,
 followed by a design proposal for an appropriate component fitting within the infrastructure.
+especially with focus on the feasibility of employing ontology mapping and alignement techniques and tools for the creation of mappings.
 In a prototype we want to deliver a proof of the concept,
 combined with an evaluation to verify the claims of fitness for the purpose.
 …
 and secondly the usability of the ui-controls.
 +? Identify hooks into LOD?
+a) define/use semantic relations between categories (RelationRegistry)
+b) employ ontological resources to enhance search in the dataset (SemanticSearch)
+c) specify a translation instructions for expressing dataset in rdf  (LinkedData)
 \subsection{Expected Results}
+The main result of this work will be a specification of the pair of components the Semantic Search and the underlying Semantic Mapping. These propositions will be supported by a proof-of-concept implementation of these components and an evaluation of querying the dataset comparing traditional search and semantic search.
+One important by-product of the work will be the original dataset expressed as RDF with links into existing datasets/ontologies/knowledgebases, building a base for another nucleus of Linked Open Data.
 \begin{itemize}
 …
 \begin{itemize}
 \item VLO - Virtual Language Observatory  \url{http://www.clarin.eu/vlo/}, \cite{VanUytvanck2010}
+\item Ontology and Lexicon \cite{Hirst2009}
+\item LingInfo/Lemon \cite{Buitelaar2009}
+\item LT-World ontology-based \url{http://www.lt-world.org/}, \cite{Joerg2010}
+\item VAS - Catch Plus
+\item OAEI
 \end{itemize}
 …
 controlled vocabularies?
+Ontology and Lexicon \cite{Hirst2009}
+LingInfo/Lemon \cite{Buitelaar2009}
 …
 \section{Questions, Remarks}
 \begin{itemized}
+\begin{itemize}
 \item How does this relate to federated search?
 \item ontological vs. semasiological (semantic features: categorial/archisemes, differentiating, specifying)
+\end{itemized}
+\item "controlled vocabularies"
+\end{itemize}
