- Timestamp:
- 04/12/11 12:05:18 (13 years ago)
- Location:
- SMC4LRT
- Files:
-
- 2 edited
SMC4LRT/Expose.tex
r1196 r1200 78 78 \section{Main Goal} 79 79 80 We propose a component that shall enhance search functionality over a large heterogeneous collection of metadata descriptions of Language Resources and Technology (LRT). By applying semantic web technology the user shall be given both better recall through query expansion based on related categories/concepts and new means of exploring the dataset/knowledge-basevia ontology-driven browsing.80 This work proposes a component that shall enhance search functionality over a large heterogeneous collection of metadata descriptions of Language Resources and Technology (LRT). By applying semantic web technology the user shall be given both better recall through \emph{query expansion} based on related categories, concepts and new means of \emph{exploring the dataset} via ontology-driven browsing. 81 81 82 82 A trivial example for a concept-based query expansion: 83 83 Confronted with a user query: \texttt{Actor.Name = Sue} and knowing that \texttt{Actor} is equivalent or similar to \texttt{Person} and \texttt{Name} is synonym to \texttt{FullName} the expanded query could look like: 84 \texttt{Actor.Name = Sue OR Actor.FullName = Sue OR Person.Name = Sue OR Person.FullName= is Sue} 84 \begin{quote} 85 \texttt{Actor.Name = Sue OR Actor.FullName = Sue OR \\ Person.Name = Sue OR Person.FullName = Sue} 86 \end{quote} 85 87 86 An other example concerning instance mapping: the user looking for all resource produced by or linked to a given institution, does not have to guess or care for various spellings of the name of the institution used in the description of the resources, but rather can browse through a controlled vocabulary of institutions and see all the resources of given institution. 
While this could be achieved by simple normalizing of the literal-values (and indeed that definitely has to be one processing step), the linking to an ontology, enables to user to also continue browsing the ontology to find institutions that are related to the original institution by means of being concerned with similar topics and retrieve a union of resources for such resulting cluster. Thus in general the user is enabled to work with the data based on information that is not present in the original dataset.88 An example concerning instance mapping: Starting from a list of topics the user can browse the ontology to find institutions concerned with those topics and retrieve a union of resources for such resulting cluster. Thus in general the user is enabled to work with the data based on information that is not present in the original dataset, but rather in external linked-in semantic resources. 87 89 88 All these scenarios require a preprocessing step, that would produce the underlying linkage, both between categories/concepts and between instances (mapping literal values to entities). We refer to this task as semantic mapping, that shall be accomplished by coresponding "Semantic Mapping Component". In this work the focus lies on the process/method, i.e. on the specification and (prototypical) implementation of the component rather than trying to establish some final/accomplished mapping. Although a tentative/naive alignement on a subset of the data will be proposed, this will be mainly used for evaluation and shall serve as basis for discussion with domain experts aiming at creating the actual sensible mappings usable for real tasks.90 Such semantic search functionality requires a preprocessing step, that produces the underlying linkage, both between categories/concepts and between instances (mapping literal values to entities). We refer to this task as \emph{semantic mapping}, that shall be accomplished by coresponding \textsc{Semantic Mapping Component}. 
In this work the focus lies on the process/method operationalized in the specification and (prototypical) implementation of the component rather than trying to establish some final, accomplished mapping. Although a tentative, naive alignement on a subset of the data will be proposed, this will be mainly used for evaluation and shall serve as basis for discussion with domain experts aiming at creating the actual sensible mappings usable for real tasks. 89 91 90 92 Actually due to the great diversity of resources and research tasks such a "final" complete mapping/alignement does not seem achievable at all. Therefore also the focus shall be on "soft", dynamic mapping, investigating the possibilities/methods to enable the users to adapt the mapping or apply different mappings with respect to their current task or research question, … … 93 95 \section{Method} 94 96 We start with examining the existing data and describing the evolving Infrastructure in which the components are to be embedded. 95 Then we formulate the task/function of Semantic Search on concept and on individuals level 96 and the underlying Semantic Mapping and the requirements within the defined context, 97 followed by a design proposal for an appropriate component fitting within the infrastructure. 98 especially with focus on the feasibility of employing ontology mapping and alignement techniques and tools for the creation of mappings. 97 Then we formulate the task/function of \emph{Semantic Search} distinguishing between the concept level - using semantic relations between concepts or categories for better retrieval and the individuals level - allowing the user to navigate to resources via semantic resources (ontologies, vocabularies). 98 Subsequently we introduce the underlying \emph{Semantic Mapping Component} and the requirements within the defined context, 99 followed by a design proposal for an appropriate component building upon the existing pieces of the infrastructure. 
100 A special focus will be put on the examination of the feasibility of employing ontology mapping and alignement techniques and tools for the creation of the mappings. 101 102 Based on a survey of existing semantic resources (ontologies, vocabularies), we identify an intial set of relevant 103 ones, which will be used in the exercise of mapping the literal values in the metadata descriptions to the externally defined entities, essentially interrelating the dataset with external resources and entities. ("Linked Data"). A necessary preprocessing step is the task of expressing the dataset in RDF. 99 104 100 105 In a prototype we want to deliver a proof of the concept, … … 104 109 105 110 106 +? Identify hooks into LOD? 111 \section{Expected Results} 112 The main contribution of this work will be the putting together of existing pieces (resources and methods), primarily applying the ontology mapping methods to a specific domain-specific data collection. 113 114 Thus the main result of this work will be a specification of the pair of components the Semantic Search and the underlying Semantic Mapping. These propositions will be supported by a proof-of-concept implementation of these components and an evaluation of querying the dataset comparing traditional search and semantic search. 107 115 108 116 109 a) define/use semantic relations between categories (RelationRegistry) 110 b) employ ontological resources to enhance search in the dataset (SemanticSearch) 111 c) specify a translation instructions for expressing dataset in rdf (LinkedData) 112 113 114 \section{Expected Results} 115 116 The main result of this work will be a specification of the pair of components the Semantic Search and the underlying Semantic Mapping. These propositions will be supported by a proof-of-concept implementation of these components and an evaluation of querying the dataset comparing traditional search and semantic search. 
117 118 One important by-product of the work will be the original dataset expressed as RDF with links into existing datasets/ontologies/knowledgebases, building a base for another nucleus of Linked Open Data. 119 120 \begin{itemize} 121 \item [Specification] definition of a mapping mechanism 122 \item [Prototype] proof of concept implementation 123 \item [Evaluation] evaluation results of querying the dataset comparing traditional search and semantic search 124 \item [LinkedData] translation of the source dataset to RDF-based format with links into existing datasets/ontologies/knowledgebases 125 126 \end{itemize} 117 One important by-product of the work will be the original dataset expressed as RDF with links into existing external datasets/ontologies/knowledgebases, effectively setting up a base for veawing this dataset into the Linked Open Data \footnote{\url{http://linkeddata.org/}}. This issue is not only top-agenda pursued across many discplines but also recognized as key matter and accordingly supported also by European Commission within the current FP7 \footnote{\url{http://cordis.europa.eu/fetch?CALLER=PROJ\_ICT&ACTION=D&CAT=PROJ&RCN=95562}}. 127 118 128 119 129 120 \section{State of the Art} 130 121 131 One current tightly related work is the VLO - Virtual Language Observatory \footnote{\url{http://www.clarin.eu/vlo/}} \cite{VanUytvanck2010}, being developed within the CLARIN-project. This application operates on the same collection of data as is discussed in this work, however lead by the guiding principle to simplify the search for the user it employs a faceted search, mapping manually the appropriate metadata-fields from the different schemas/profiles to 10 fixed facets. 122 The most tightly related current work is the \texttt{VLO - Virtual Language Observatory} \footnote{\url{http://www.clarin.eu/vlo/}} \cite{VanUytvanck2010}, being developed within the CLARIN-project. 
This application operates on the same collection of data as is discussed in this work, however lead by the guiding principle to simplify the search for the user it employs a faceted search, mapping manually the appropriate metadata-fields from the different schemas/profiles to 10 fixed facets. 132 123 133 From the knowledge management and semantic web point of view, LT-World, \footnote{\url{http://www.lt-world.org/}} \cite{Joerg2010} 134 the ontology-based portal for Language Resources and technology being developed at DFKI \footnote{Deutsches Forschungszentrum für Künstliche Intelligenz} is a prominent resource providing the information about the entities (Institutions, Persons, Project, Tools, etc.) in this field of study. 124 Also from the context of CLARIN the integral part of the infrastructure is the \texttt{Component Registry} and \texttt{ISOcat}\footnote{\url{http://www.isocat.org/}}, the ISO-standardized Data Category Registry \cite{Broeder2010,ISO12620:2009}. In the context of CLARIN there also has been work on and now a prototype of the so called \texttt{Relation Registry}, meant for defining relations between data categories. All these components are running services, that this work directly builds upon. 135 125 136 Linked Open Data 126 Another related intiative is that of a \texttt{Vocabulary Alignement Service} being developed and run within the Dutch program CATCH\footnote{\textit{Continuous Access To Cultural Heritage} - \url{http://www.catchplus.nl/en/}}. There are plans to reuse or enhance the service for the needs of CLARIN project. 137 127 138 VAS - Catch Plus 128 The CLARIN project also delivers a valuable source of information on the normative resources in the domain in its current deliverable: \textit{Interoperability and Standards} \cite{CLARIN_D5.C-3}.
This document is relevant in two ways - first it covers ontologies as a type of Resources, but mainly it offers an exhaustive collection of references to standards, vocabularies and other normative/standardization work in the field of Language Resources and Technology. 139 129 140 OAEI 130 From the point of view of existing semantic resource \texttt{LT-World}, \footnote{\url{http://www.lt-world.org/}} \cite{Joerg2010} the ontology-based portal covering primarily Language Technology being developed at DFKI \footnote{\textit{Deutsches Forschungszentrum für Künstliche Intelligenz} - \url{http://www.dfki.de}} is a prominent resource providing the information about the entities (Institutions, Persons, Project, Tools, etc.) in this field of study. The underlying ontology is being restructured and cooperation was initialized. 141 131 142 The starting point for the investigation of work regarding ontology mapping is the overview of the field by Kalfouglou \cite{Kalfoglou2003}. 132 As the main contribution shall be the applying of ontology mapping techniques and technology, an comprehensive overview of this field and current developments is indicated. There seems to be a plethora of work on the topic of ontology mapping and the task will be to sort out the relevant work. The starting point for the investigation of work regarding ontology mapping is the overview of the field by Kalfouglou \cite{Kalfoglou2003}, a more recent work by Shvaiko and Euzenat \cite{Shvaiko2008} and the dedicated platform OAEI\footnote{Ontology Alignment Evalution Intiative - \url{http://oaei.ontologymatching.org/}}. 133 134 One recent inspirative work is that of Noah et. al \cite{Noah2010} developing a semantic digital library for academic institution. The scope is limited to document collections, but nevertheless many aspects seem very relevant for this work, like operating on document metadata, ontology population or sophisticated quering and searching.
135 136 As described previously one outcome of the work will be the dataset expressed as RDF, interlinked with other semantic resources, which is follow the broad Linked Open Data effort as proposed by Berners-Lee \cite{TimBL2006}. A current comprehensive overview of the principles of Linked Data and current applications is the work by Heath and Bizer \cite{HeathBizer2011}, that shall serve as a practical guide for this specific task. 137 138 143 139 144 140 \section{Keywords} 145 146 141 Metadata interoperability, Ontology Mapping, Schema mapping, Crosswalk, Similarity measures, LinkedData 147 142 Fuzzy Search, Visual Search? … … 156 151 157 152 158 \bibliographystyle{ plain}153 \bibliographystyle{acm} 159 154 \bibliography{../../../2bib/lingua,../../../2bib/ontolingua,../../../2bib/semweb} 160 155 -
SMC4LRT/Outline.tex
r1188 r1200 84 84 \subsection{Main Goal} 85 85 86 87 a) define/use semantic relations between categories (RelationRegistry) 88 b) employ ontological resources to enhance search in the dataset (SemanticSearch) 89 c) specify a translation instructions for expressing dataset in rdf (LinkedData) 90 91 Propose a semantic mapping component for Language Resources and Technology within the context of a federated infrastructure (being constructed in the project CLARIN). 92 Due to the great diversity of resources and research tasks a full alignement is not achievable. Rather the focus shall be on "soft", dynamic mapping, investigating the possibilities/methods to enable the users to control the mapping with respect to their current task, 86 We propose a component that shall enhance search functionality over a large heterogeneous collection of metadata descriptions of Language Resources and Technology (LRT). By applying semantic web technology the user shall be given both better recall through query expansion based on related categories/concepts and new means of exploring the dataset/knowledge-base via ontology-driven browsing. 87 88 A trivial example for a concept-based query expansion: 89 Confronted with a user query: \texttt{Actor.Name = Sue} and knowing that \texttt{Actor} is equivalent or similar to \texttt{Person} and \texttt{Name} is synonym to \texttt{FullName} the expanded query could look like: 90 \texttt{Actor.Name = Sue OR Actor.FullName = Sue OR Person.Name = Sue OR Person.FullName= is Sue} 91 92 Another example concerning instance mapping: the user looking for all resource produced by or linked to a given institution, does not have to guess or care for various spellings of the name of the institution used in the description of the resources, but rather can browse through a controlled vocabulary of institutions and see all the resources of given institution. 
While this could be achieved by simple normalizing of the literal-values (and indeed that definitely has to be one processing step), the linking to an ontology, enables to user to also continue browsing the ontology to find institutions that are related to the original institution by means of being concerned with similar topics and retrieve a union of resources for such resulting cluster. Thus in general the user is enabled to work with the data based on information that is not present in the original dataset. 93 94 All these scenarios require a preprocessing step, that would produce the underlying linkage, both between categories/concepts and between instances (mapping literal values to entities). We refer to this task as semantic mapping, that shall be accomplished by coresponding "Semantic Mapping Component". In this work the focus lies on the process/method, i.e. on the specification and (prototypical) implementation of the component rather than trying to establish some final/accomplished mapping. Although a tentative/naive alignement on a subset of the data will be proposed, this will be mainly used for evaluation and shall serve as basis for discussion with domain experts aiming at creating the actual sensible mappings usable for real tasks. 95 96 Actually due to the great diversity of resources and research tasks such a "final" complete mapping/alignement does not seem achievable at all. Therefore also the focus shall be on "soft", dynamic mapping, investigating the possibilities/methods to enable the users to adapt the mapping or apply different mapping with respect to their current task or research question, 93 97 essentially being able to actively manipulate the recall/precision ratio of their searches. This entails the examination of user interaction with and visualization of the relevant information in the user interface and enabling the user to act upon it. 
94 98 95 Example96 97 98 99 \subsection{Method} 99 We start with examining the existing Data and describing the evolving Infrastructure. 100 Then we formulate the task/function of Semantic Mapping and the requirements within the defined context, 100 We start with examining the existing Data and describing the evolving Infrastructure in which the components are to be embedded. 101 Then we formulate the task/function of Semantic Search on concept and on individuals level 102 and the underlying Semantic Mapping and the requirements within the defined context, 101 103 followed by a design proposal for an appropriate component fitting within the infrastructure. 104 especially with focus on the feasibility of employing ontology mapping and alignement techniques and tools for the creation of mappings. 105 102 106 In a prototype we want to deliver a proof of the concept, 103 107 combined with an evaluation to verify the claims of fitness for the purpose. … … 105 109 and secondly the usability of the ui-controls. 106 110 111 107 112 +? Identify hooks into LOD? 108 113 114 115 a) define/use semantic relations between categories (RelationRegistry) 116 b) employ ontological resources to enhance search in the dataset (SemanticSearch) 117 c) specify a translation instructions for expressing dataset in rdf (LinkedData) 118 119 109 120 \subsection{Expected Results} 121 122 The main result of this work will be a specification of the pair of components the Semantic Search and the underlying Semantic Mapping. This propositions will be supported by a proof-of-concept implementation of these components and an evaluation of querying the dataset comparing traditional search and semantic search. 123 124 One important by-product of the work will be the original dataset expressed as RDF with links into existing datasets/ontologies/knowledgebases, building a base for another nucleus of Linked Open Data. 
110 125 111 126 \begin{itemize} … … 122 137 \begin{itemize} 123 138 \item VLO - Virtual Language Observatory \url{http://www.clarin.eu/vlo/}, \cite{VanUytvanck2010} 124 \item Ontology and Lexicon \cite{Hirst2009} 125 \item LingInfo/Lemon \cite{Buitelaar2009} 139 \item LT-World ontology-based \url{http://www.lt-world.org/}, \cite{Joerg2010} 140 \item VAS - Catch Plus 141 \item OAEI 126 142 \end{itemize} 127 143 … … 198 214 199 215 controlled vocabularies? 216 217 Ontology and Lexicon \cite{Hirst2009} 218 219 LingInfo/Lemon \cite{Buitelaar2009} 200 220 201 221 … … 465 485 \section{Questions, Remarks} 466 486 467 \begin{itemize d}487 \begin{itemize} 468 488 \item How does this relate to federated search? 469 489 \item ontologicky vs. semaziologicky (Semanticke priznaky: kategoriálne/archysémy, difernciacne, specifikacne) 470 \end{itemized} 490 \item "controlled vocabularies" 491 \end{itemize} 471 492 472 493
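Illustration of the concept-based query expansion described under "Main Goal" in both files. This is a minimal sketch, not part of the changeset: the equivalence table and the function name expand_query are hypothetical and only mirror the Actor/Person and Name/FullName example from the text. An actual component would presumably draw such relations from a service like the Relation Registry.

```python
from itertools import product

# Hypothetical equivalence table (illustrative data only): each term maps to
# the list of terms treated as equivalent or synonymous, including itself.
EQUIVALENTS = {
    "Actor": ["Actor", "Person"],
    "Name": ["Name", "FullName"],
}


def expand_query(path, value, equivalents=EQUIVALENTS):
    """Expand 'A.B = value' into an OR over all equivalent paths."""
    parts = path.split(".")
    # For every path component, take its known alternatives,
    # or just the component itself if no relation is recorded.
    alternatives = [equivalents.get(part, [part]) for part in parts]
    clauses = [
        f"{'.'.join(combo)} = {value}" for combo in product(*alternatives)
    ]
    return " OR ".join(clauses)


print(expand_query("Actor.Name", "Sue"))
# Actor.Name = Sue OR Actor.FullName = Sue OR Person.Name = Sue OR Person.FullName = Sue
```

Taking the Cartesian product of the alternatives for each path component reproduces the four-clause expansion given in the text. A term without recorded relations passes through unchanged, so the query degrades to the original one.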