Changeset 1200 for SMC4LRT


Timestamp:
04/12/11 12:05:18 (13 years ago)
Author:
vronk
Message:
 
Location:
SMC4LRT
Files:
2 edited

  • SMC4LRT/Expose.tex

    r1196 r1200  
    7878\section{Main Goal}
    7979
    80 We propose a component that shall enhance search functionality over a large heterogeneous collection of metadata descriptions of Language Resources and Technology (LRT). By applying semantic web technology the user shall be given both better recall through query expansion based on related categories/concepts and new means of exploring the dataset/knowledge-base via ontology-driven browsing.
     80This work proposes a component that shall enhance search functionality over a large heterogeneous collection of metadata descriptions of Language Resources and Technology (LRT). By applying semantic web technology, the user shall be given both better recall through \emph{query expansion} based on related categories and concepts, and new means of \emph{exploring the dataset} via ontology-driven browsing.
    8181
    8282A trivial example for a concept-based query expansion:
    8383Confronted with a user query \texttt{Actor.Name = Sue}, and knowing that \texttt{Actor} is equivalent or similar to \texttt{Person} and that \texttt{Name} is a synonym of \texttt{FullName}, the expanded query could look like:
    84 \texttt{Actor.Name = Sue OR Actor.FullName = Sue OR Person.Name =  Sue OR Person.FullName= is Sue}
     84\begin{quote}
     85\texttt{Actor.Name = Sue OR Actor.FullName = Sue OR \\ Person.Name =  Sue OR Person.FullName = Sue}
     86\end{quote}
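The expansion above can be sketched in a few lines of Python. This is only an illustrative sketch, not the proposed component: the table \texttt{EQUIVALENTS} and the function \texttt{expand\_query} are hypothetical names, and the equivalences are hard-coded here, whereas in the proposal they would be supplied by the semantic mapping.

```python
from itertools import product

# Hypothetical equivalence table (an assumption for illustration only);
# in the proposal such relations would come from the semantic mapping.
EQUIVALENTS = {
    "Actor": ["Actor", "Person"],
    "Name": ["Name", "FullName"],
}


def expand_query(path, value):
    """Expand 'Category.Field = value' into a disjunction over equivalent names."""
    parts = path.split(".")
    # Each path segment is replaced by its known equivalents, or kept as-is.
    alternatives = [EQUIVALENTS.get(part, [part]) for part in parts]
    # Every combination of alternatives yields one clause of the disjunction.
    clauses = [
        ".".join(combo) + " = " + value for combo in product(*alternatives)
    ]
    return " OR ".join(clauses)


print(expand_query("Actor.Name", "Sue"))
```

Segments without a known equivalent are passed through unchanged, so a query on an unmapped category simply stays as it was.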
    8587
    86 Another example concerning instance mapping: the user looking for all resource produced by or linked to a given institution, does not have to guess or care for various spellings of the name of the institution used in the description of the resources, but rather can browse through a controlled vocabulary of institutions and see all the resources of given institution. While this could be achieved by simple normalizing of the literal-values (and indeed that definitely has to be one processing step), the linking to an ontology, enables to user to also continue browsing the ontology to find institutions that are related to the original institution by means of being concerned with similar topics and retrieve a union of resources for such resulting cluster. Thus in general the user is enabled to work with the data based on information that is not present in the original dataset.
     88An example concerning instance mapping: Starting from a list of topics, the user can browse the ontology to find institutions concerned with those topics and retrieve the union of resources for the resulting cluster. Thus, in general, the user is enabled to work with the data based on information that is not present in the original dataset, but rather in external, linked-in semantic resources.
    8789
    88 All these scenarios require a preprocessing step, that would produce the underlying linkage, both between categories/concepts and between instances (mapping literal values to entities). We refer to this task as semantic mapping, that shall be accomplished by coresponding "Semantic Mapping Component". In this work the focus lies on the process/method, i.e. on the specification and (prototypical) implementation of the component rather than trying to establish some final/accomplished mapping. Although a tentative/naive alignement on a subset of the data will be proposed, this will be mainly used for evaluation and shall serve as basis for discussion with domain experts aiming at creating the actual sensible mappings usable for real tasks.
     90Such semantic search functionality requires a preprocessing step that produces the underlying linkage, both between categories/concepts and between instances (mapping literal values to entities). We refer to this task as \emph{semantic mapping}; it shall be accomplished by the corresponding \textsc{Semantic Mapping Component}. In this work the focus lies on the process/method, operationalized in the specification and (prototypical) implementation of the component, rather than on trying to establish some final, accomplished mapping. Although a tentative, naive alignment on a subset of the data will be proposed, it will mainly be used for evaluation and shall serve as a basis for discussion with domain experts, aiming at creating the actual sensible mappings usable for real tasks.
    8991
    9092Actually, due to the great diversity of resources and research tasks, such a "final" complete mapping/alignment does not seem achievable at all. Therefore the focus shall also be on "soft", dynamic mapping, investigating the possibilities/methods to enable the users to adapt the mapping or apply different mappings with respect to their current task or research question,
     
    9395\section{Method}
    9496We start with examining the existing data and describing the evolving Infrastructure in which the components are to be embedded.
    95 Then we formulate the task/function of Semantic Search on concept and on individuals level
    96 and the underlying Semantic Mapping and the requirements within the defined context,
    97 followed by a design proposal for an appropriate component fitting within the infrastructure.
    98 especially with focus on the feasibility of employing ontology mapping and alignement techniques and tools for the creation of mappings.
     97Then we formulate the task/function of \emph{Semantic Search}, distinguishing between the concept level (using semantic relations between concepts or categories for better retrieval) and the individuals level (allowing the user to navigate to resources via semantic resources such as ontologies and vocabularies).
     98Subsequently we introduce the underlying \emph{Semantic Mapping Component} and the requirements within the defined context,
     99followed by a design proposal for an appropriate component building upon the existing pieces of the infrastructure.
     100A special focus will be put on examining the feasibility of employing ontology mapping and alignment techniques and tools for the creation of the mappings.
     101
     102Based on a survey of existing semantic resources (ontologies, vocabularies), we identify an initial set of relevant
     103ones, which will be used in the exercise of mapping the literal values in the metadata descriptions to the externally defined entities, essentially interrelating the dataset with external resources and entities ("Linked Data"). A necessary preprocessing step is the task of expressing the dataset in RDF.
    99104
    100105In a prototype we want to deliver a proof of the concept,
     
    104109
    105110
    106 +? Identify hooks into LOD?
     111\section{Expected Results}
     112The main contribution of this work will be the combination of existing pieces (resources and methods), primarily the application of ontology mapping methods to a domain-specific data collection.
     113
     114Thus the main result of this work will be a specification of the pair of components: the Semantic Search and the underlying Semantic Mapping. These propositions will be supported by a proof-of-concept implementation of these components and an evaluation of querying the dataset, comparing traditional search and semantic search.
    107115
    108116
    109 a) define/use semantic relations between categories (RelationRegistry)
    110 b) employ ontological resources to enhance search in the dataset (SemanticSearch)
    111 c) specify a translation instructions for expressing dataset in rdf  (LinkedData)
    112 
    113 
    114 \section{Expected Results}
    115 
    116 The main result of this work will be a specification of the pair of components the Semantic Search and the underlying Semantic Mapping. These propositions will be supported by a proof-of-concept implementation of these components and an evaluation of querying the dataset comparing traditional search and semantic search.
    117 
    118 One important by-product of the work will be the original dataset expressed as RDF with links into existing datasets/ontologies/knowledgebases, building a base for another nucleus of Linked Open Data.
    119 
    120 \begin{itemize}
    121 \item [Specification] definition of a mapping mechanism
    122 \item [Prototype] proof of concept implementation
    123 \item [Evaluation] evaluation results of querying the dataset comparing traditional search and semantic search
    124 \item [LinkedData] translation of the source dataset to RDF-based format with links into existing datasets/ontologies/knowledgebases
    125 
    126 \end{itemize}
     117One important by-product of the work will be the original dataset expressed as RDF with links into existing external datasets/ontologies/knowledgebases, effectively setting up a base for weaving this dataset into the Linked Open Data\footnote{\url{http://linkeddata.org/}}. This issue is not only high on the agenda across many disciplines but is also recognized as a key matter, and accordingly supported, by the European Commission within the current FP7\footnote{\url{http://cordis.europa.eu/fetch?CALLER=PROJ\_ICT&ACTION=D&CAT=PROJ&RCN=95562}}.
    127118
    128119
    129120\section{State of the Art}
    130121
    131 One current tightly related work is the VLO - Virtual Language Observatory  \footnote{\url{http://www.clarin.eu/vlo/}} \cite{VanUytvanck2010}, being developed within the CLARIN-project. This application operates on the same collection of data as is discussed in this work, however lead by the guiding principle to simplify the search for the user it employs a faceted search, mapping manually the appropriate metadata-fields from the different schemas/profiles to 10 fixed facets.
     122The most closely related current work is the \texttt{VLO - Virtual Language Observatory}\footnote{\url{http://www.clarin.eu/vlo/}} \cite{VanUytvanck2010}, being developed within the CLARIN project. This application operates on the same collection of data as is discussed in this work; however, led by the guiding principle of simplifying the search for the user, it employs a faceted search, manually mapping the appropriate metadata fields from the different schemas/profiles to 10 fixed facets.
    132123
    133 From the knowledge management and semantic web point of view, LT-World, \footnote{\url{http://www.lt-world.org/}} \cite{Joerg2010}
    134 the ontology-based portal for Language Resources and technology being developed at DFKI \footnote{Deutsches Forschungszentrum für Künstliche Intelligenz} is a prominent resource providing the information about the entities (Institutions, Persons, Project, Tools, etc.) in this field of study.
     124Also within the context of CLARIN, integral parts of the infrastructure are the \texttt{Component Registry} and \texttt{ISOcat}\footnote{\url{http://www.isocat.org/}}, the ISO-standardized Data Category Registry \cite{Broeder2010,ISO12620:2009}. In the context of CLARIN there has also been work on, and there is now a prototype of, the so-called \texttt{Relation Registry}, meant for defining relations between data categories. All these components are running services that this work directly builds upon.
    135125
    136 Linked Open Data
     126Another related initiative is that of a \texttt{Vocabulary Alignment Service} being developed and run within the Dutch program CATCH\footnote{\textit{Continuous Access To Cultural Heritage} - \url{http://www.catchplus.nl/en/}}. There are plans to reuse or enhance the service for the needs of the CLARIN project.
    137127
    138 VAS - Catch Plus
     128The CLARIN project also delivers a valuable source of information on the normative resources in the domain in its current deliverable: \textit{Interoperability and Standards} \cite{CLARIN_D5.C-3}. This document is relevant in two ways - first it covers ontologies as a type of Resources, but mainly it offers an exhaustive collection of references to standards, vocabularies and other normative/standardization work in the field of Language Resources and Technology.
    139129
    140 OAEI
     130From the point of view of existing semantic resources, \texttt{LT-World}\footnote{\url{http://www.lt-world.org/}} \cite{Joerg2010}, the ontology-based portal covering primarily Language Technology being developed at DFKI\footnote{\textit{Deutsches Forschungszentrum für Künstliche Intelligenz} - \url{http://www.dfki.de}}, is a prominent resource providing information about the entities (Institutions, Persons, Projects, Tools, etc.) in this field of study. The underlying ontology is being restructured and a cooperation has been initiated.
    141131
    142 The starting point for the investigation of work regarding ontology mapping is the overview of the field by Kalfouglou \cite{Kalfoglou2003}.
     132As the main contribution shall be the application of ontology mapping techniques and technology, a comprehensive overview of this field and its current developments is indicated. There seems to be a plethora of work on the topic of ontology mapping, and the task will be to sort out the relevant work. The starting points for the investigation of work regarding ontology mapping are the overview of the field by Kalfoglou \cite{Kalfoglou2003}, a more recent work by Shvaiko and Euzenat \cite{Shvaiko2008} and the dedicated platform OAEI\footnote{Ontology Alignment Evaluation Initiative - \url{http://oaei.ontologymatching.org/}}.
     133
     134One recent inspiring work is that of Noah et al. \cite{Noah2010}, developing a semantic digital library for academic institutions. The scope is limited to document collections, but nevertheless many aspects seem very relevant for this work, like operating on document metadata, ontology population or sophisticated querying and searching.
     135
     136As described previously, one outcome of the work will be the dataset expressed as RDF, interlinked with other semantic resources, which follows the broad Linked Open Data effort as proposed by Berners-Lee \cite{TimBL2006}. A current comprehensive overview of the principles of Linked Data and current applications is the work by Heath and Bizer \cite{HeathBizer2011}, which shall serve as a practical guide for this specific task.
     137
     138
    143139
    144140\section{Keywords}
    145 
    146141Metadata interoperability, Ontology Mapping, Schema mapping, Crosswalk, Similarity measures, LinkedData
    147142Fuzzy Search, Visual Search?
     
    156151
    157152
    158 \bibliographystyle{plain}
     153\bibliographystyle{acm}
    159154\bibliography{../../../2bib/lingua,../../../2bib/ontolingua,../../../2bib/semweb}
    160155
  • SMC4LRT/Outline.tex

    r1188 r1200  
    8484\subsection{Main Goal}
    8585
    86 
    87 a) define/use semantic relations between categories (RelationRegistry)
    88 b) employ ontological resources to enhance search in the dataset (SemanticSearch)
    89 c) specify a translation instructions for expressing dataset in rdf  (LinkedData)
    90 
    91 Propose a semantic mapping component for Language Resources and Technology within the context of a federated infrastructure (being constructed in the project CLARIN).
    92 Due to the great diversity of resources and research tasks a full alignement is not achievable. Rather the focus shall be on "soft", dynamic mapping, investigating the possibilities/methods to enable the users to control the mapping with respect to their current task,
     86We propose a component that shall enhance search functionality over a large heterogeneous collection of metadata descriptions of Language Resources and Technology (LRT). By applying semantic web technology the user shall be given both better recall through query expansion based on related categories/concepts and new means of exploring the dataset/knowledge-base via ontology-driven browsing.
     87
     88A trivial example for a concept-based query expansion:
     89Confronted with a user query \texttt{Actor.Name = Sue}, and knowing that \texttt{Actor} is equivalent or similar to \texttt{Person} and that \texttt{Name} is a synonym of \texttt{FullName}, the expanded query could look like:
     90\texttt{Actor.Name = Sue OR Actor.FullName = Sue OR Person.Name = Sue OR Person.FullName = Sue}
     91
     92Another example concerning instance mapping: the user looking for all resources produced by or linked to a given institution does not have to guess or care about the various spellings of the name of the institution used in the descriptions of the resources, but rather can browse through a controlled vocabulary of institutions and see all the resources of a given institution. While this could be achieved by simply normalizing the literal values (and indeed that definitely has to be one processing step), the linking to an ontology enables the user to also continue browsing the ontology to find institutions that are related to the original institution by being concerned with similar topics, and to retrieve the union of resources for the resulting cluster. Thus, in general, the user is enabled to work with the data based on information that is not present in the original dataset.
     93
     94All these scenarios require a preprocessing step that produces the underlying linkage, both between categories/concepts and between instances (mapping literal values to entities). We refer to this task as semantic mapping; it shall be accomplished by the corresponding "Semantic Mapping Component". In this work the focus lies on the process/method, i.e. on the specification and (prototypical) implementation of the component, rather than on trying to establish some final/accomplished mapping. Although a tentative/naive alignment on a subset of the data will be proposed, it will mainly be used for evaluation and shall serve as a basis for discussion with domain experts, aiming at creating the actual sensible mappings usable for real tasks.
     95
     96Actually, due to the great diversity of resources and research tasks, such a "final" complete mapping/alignment does not seem achievable at all. Therefore the focus shall also be on "soft", dynamic mapping, investigating the possibilities/methods to enable the users to adapt the mapping or apply different mappings with respect to their current task or research question,
    9397essentially being able to actively manipulate the recall/precision ratio of their searches. This entails the examination of user interaction with and visualization of the relevant information in the user interface and enabling the user to act upon it.
    9498
    95 Example
    96 
    97 
    9899\subsection{Method}
    99 We start with examining the existing Data and describing the evolving Infrastructure.
    100 Then we formulate the task/function of Semantic Mapping and the requirements within the defined context,
     100We start with examining the existing Data and describing the evolving Infrastructure in which the components are to be embedded.
     101Then we formulate the task/function of Semantic Search on concept and on individuals level
     102and the underlying Semantic Mapping and the requirements within the defined context,
    101103followed by a design proposal for an appropriate component fitting within the infrastructure,
     104with a special focus on the feasibility of employing ontology mapping and alignment techniques and tools for the creation of mappings.
     105
    102106In a prototype we want to deliver a proof of the concept,
    103107combined with an evaluation to verify the claims of fitness for the purpose.
     
    105109and secondly the usability of the ui-controls.
    106110
     111
    107112+? Identify hooks into LOD?
    108113
     114
     115a) define/use semantic relations between categories (RelationRegistry)
     116b) employ ontological resources to enhance search in the dataset (SemanticSearch)
     117c) specify a translation instructions for expressing dataset in rdf  (LinkedData)
     118
     119
    109120\subsection{Expected Results}
     121
     122The main result of this work will be a specification of the pair of components: the Semantic Search and the underlying Semantic Mapping. These propositions will be supported by a proof-of-concept implementation of these components and an evaluation of querying the dataset, comparing traditional search and semantic search.
     123
     124One important by-product of the work will be the original dataset expressed as RDF with links into existing datasets/ontologies/knowledgebases, building a base for another nucleus of Linked Open Data.
    110125
    111126\begin{itemize}
     
    122137\begin{itemize}
    123138\item VLO - Virtual Language Observatory  \url{http://www.clarin.eu/vlo/}, \cite{VanUytvanck2010}
    124 \item Ontology and Lexicon \cite{Hirst2009}
    125 \item LingInfo/Lemon \cite{Buitelaar2009}
     139\item LT-World ontology-based \url{http://www.lt-world.org/}, \cite{Joerg2010}
     140\item VAS - Catch Plus
     141\item OAEI
    126142\end{itemize}
    127143
     
    198214
    199215controlled vocabularies?
     216
     217Ontology and Lexicon \cite{Hirst2009}
     218
     219LingInfo/Lemon \cite{Buitelaar2009}
    200220
    201221
     
    465485\section{Questions, Remarks}
    466486
    467 \begin{itemized}
     487\begin{itemize}
    468488\item How does this relate to federated search?
    469489\item ontological vs. semasiological (semantic features: categorial/archisemes, differentiating, specifying)
    470 \end{itemized}
     490\item "controlled vocabularies"
     491\end{itemize}
    471492
    472493