Changeset 3204
- Timestamp:
- 07/28/13 10:45:45 (11 years ago)
- Location:
- SMC4LRT
- Files:
- 5 edited
- 3 moved
SMC4LRT/Outline.tex
r2703 r3204
  95 95 \input{chapters/Introduction}
  96 96
+ 97 \begin{comment}
  97 98 \input{chapters/Literature}
  98 99
  … …
  103 104 \input{chapters/Infrastructure}
  104 105
- 105 \input{chapters/SMC}
+ 106 \input{chapters/Design}
  106 107
- 107 \input{chapters/System}
+ 108 \input{chapters/Implementation}
  108 109
- 109 \input{chapters/Evaluation}
+ 110 \input{chapters/CMD2RDF}
  110 111
+ 112 \input{chapters/Results}
  111 113
- 112
- 113 \chapter{Conclusions and Future Work}
- 114
- 115 Further work is needed on more complex types of response (similarity ratio, relation types) and also on the interaction with Metadata Service to find the optimal way of providing the features of semantic mapping and query expansion as semantic search within the search user-interface.
- 116
+ 114 \input{chapters/Conclusion}
+ 115 \end{comment}
  117 116
  118 117
SMC4LRT/chapters/CMD2RDF.tex
r3139 r3204
  210 210 A major (if not the main) motivation for the CMD to RDF mapping is the wish to have better control over and better quality of values in metadata fields with constrained value domain like organization or resource type. As the allowed values for these fields often cannot be explicitly enumerated, it is not possible to restrict them by means of an XML schema. This leads to inconsistent use of labels for referring to entities. (As the instance data shows, some organizations are referred to by more than 20 different labels.)
  211 211 Thus, one goal of this work is to map (string) values in selected fields to entities defined in corresponding vocabularies. The main provider of relevant vocabularies is ISOcat and CLAVAS – a service for managing and providing vocabularies in SKOS format. Closed and corresponding simple data categories are already being exported from ISOcat in SKOS format and imported into CLAVAS/OpenSKOS and also other relevant vocabularies shall be ingested into this system, so that for our purposes we can assume OpenSKOS as the one source of vocabularies.
- 212 Data in OpenSKOS is modeled purely in SKOS, so there is no more specific typing of the entities in the vocabularies, but rather all the entities are skos:Concepts:
+ 212 Data in OpenSKOS is modelled purely in SKOS, so there is no more specific typing of the entities in the vocabularies, but rather all the entities are \xne{skos:Concepts}:
  213 213
  214 214 \begin{example}
SMC4LRT/chapters/Conclusion.tex
r3139 r3204
  1 1
  2 2 \chapter{Conclusions and Future Work}
+ 3 \label{ch:conclusions}
  3 4
  4 5 Further work is needed on more complex types of response (similarity ratio, relation types) and also on the interaction with Metadata Service to find the optimal way of providing the features of semantic mapping and query expansion as semantic search within the search user-interface.
SMC4LRT/chapters/Design.tex
r3140 r3204
  1 1
- 2 \chapter{Semantic Mapping Component}
- 3
+ 2 \chapter{Semantic Mapping Component - Design}
+ 3 \label{ch:design}
  4 4
  5 5 \section{Data Model?}
SMC4LRT/chapters/Implementation.tex
r3140 r3204
  2 2 \chapter{Implementation}
  3 3 %%%%%%%%%%%%%%%%%%%%%
- 4
+ 4 \label{ch:implementation}
  5 5
  6 6
SMC4LRT/chapters/Infrastructure.tex
r3140 r3204
  36 36
  37 37 \subsection{CMDI - DCR/CR/RR}
- 38 \label{dcr}
+ 38 \label{def:cmdi}
+ 39 \label{def:dcr}
  39 40
  40 41 The \emph{Data Category Registry} (DCR) is a central registry that enables the community to collectively define and maintain a set of relevant linguistic data categories. The resulting commonly agreed controlled vocabulary is the cornerstone for grounding the semantic interpretation within the CMD framework.
SMC4LRT/chapters/Introduction.tex
r3140 r3204
  4 4 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  5 5
- 6
- 7 \begin{itemize}
- 8 \item motivation
- 9 \item problem statement (which problem should be solved?)
- 10 \item aim of the work
- 11 \item methodological approach
- 12 \item structure of the work
- 13 \end{itemize}
- 14
- 15
  16 6 \todocode{install older python (2.5?) to be able to install dot2tex - transforming dot files to nicer pgf formatted graphs}\furl{http://dot2tex.googlecode.com/files/dot2tex-2.8.7.zip}\furl{file:/C:/Users/m/2kb/tex/dot2tex-2.8.7/}
  17 7
  18 8
- 19 \subsection{Problem statement}
+ 9 \section{Motivation / problem statement}
  20 10
- 21 While in the Digital Libraries community a consolidation generally already happened and big federated networks of digital libary repository are set up, in the field of Language Resource and Technology the landscape is still scattered, although meanwhile looking back at a decade of standardizing efforts. One main reason seems to be the complexity and diversity of the metadata associated with the resources, stemming for one from the wide range of resource types additionally complicated by influence from different schools of thought. (cf. \ref{ch:data})
+ 11 While in the Digital Libraries community a consolidation generally already happened and global federated networks of digital library repositories are set up, in the field of Language Resource and Technology the landscape is still scattered, although meanwhile looking back at a decade of standardizing efforts. One main reason seems to be the complexity and diversity of the metadata associated with the resources, stemming for one from the wide range of resource types combined with project-specific needs. (See chapter \ref{ch:data} for analysis of the data disparity in the domain)
  22 12
- 23 \todoin{Need some number about the disparity in the field, number of institutes, resources, formats.}
- 24
- 25 This situation has been identified by the community and multiple standardization initiatives had been conducted/undertaken. This process has gained a new momentum thanks to large Research Infrastructure Programmes introduced by European Commission, aimed at fostering Research communities developing large-scale pan-european common infrastructures. One key player in this development is the project CLARIN.
+ 13 This situation has been identified by the community and multiple standardization initiatives had been conducted/undertaken. The process has gained a new momentum thanks to large research infrastructure programmes introduced by the European Commission, aimed at supporting research communities developing common large-scale pan-european infrastructures. One key player in this development is the project CLARIN (see section \ref{def:CLARIN}). The main objective of this initiative is to make language resources and technologies more easily available to scholars, by providing a common infrastructure. One core pillar of this infrastructure is a common framework for resource descriptions (metadata) the Component Metadata Framework (cf. \ref{def:cmdi}). This work discusses the component within the Component Metadata Infrastructure concerned with /dedicated to overcoming the heterogenity of the resource descriptions.
  26 14
  27 15
- 28 \subsection{Main Goal}
+ 16 \section{Main Goal}
  29 17
- 30 This work proposes a component that shall enhance search functionality over a \emph{large heterogeneous collection of metadata descriptions} of Language Resources and Technology (LRT). By applying semantic web technology the user shall be given both better recall through \emph{query expansion} based on related categories/concepts and new means of \emph{exploring the dataset} via ontology-driven browsing.
+ 18 The primary goal of this work is to enhance search functionality over a \emph{large heterogeneous collection of metadata descriptions} of Language Resources and Technology (LRT). By applying semantic web technology the user shall be given both better recall through \emph{query expansion} based on related categories/concepts and new means of \emph{exploring the dataset} via ontology-driven browsing.
+ 19
+ 20 This main goal can be broken down along the three meanings of the term ``mapping'':
+ 21
+ 22 \begin{description}
+ 23 \item[crosswalk] establish links between crosswalks between metadata formats
+ 24 \item[interpret] translate string labels in field values to semantic entities
+ 25 \item[visualize] provide a visualization of the data.
+ 26 \end{description}
  31 27
  32 28 Alternatively/ that allows query expansion by providing mappings between search indexes. This enables semantic search, ultimately increasing the recall when searching in metadata collections. The module builds on the Data Category Registry and Component Metadata Framework that are part of CMDI.
  … …
  44 40 In fact, due to the great diversity of resources and research tasks, a ``final'' complete alignment does not seem achievable at all. Therefore also the focus shall be on ``soft'' dynamic mapping, i.e. to enable the users to adapt the mapping or apply different mappings depending on their current task or research question essentially being able to actively manipulate the recall/precision ratio of the search results. This entails an examination of user interaction with and visualization of the relevant additional information in the user search interface. However this would open doors to a whole new (to this work) field of usability engineering and can be treated here only marginally.
  45 41
- 46 \subsection{Method}
+ 42 \section{Method}
  47 43 We start with examining the existing data and describing the evolving infrastructure in which the components are to be embedded. Then we formulate the function of \textbf{Semantic Search} distinguishing between the concept level -- using semantic relations between concepts or categories for better retrieval -- and the instances level -- allowing the user to explore the primary data collection via semantic resources (ontologies, vocabularies).
  48 44
  … …
  62 58 \end{itemize}
  63 59
- 64 \subsection{Expected Results}
+ 60 \section{Expected Results}
  65 61 The primary concern of this work is the integrative effort, i.e. putting together existing pieces (resources, components and methods) especially the application of techniques from ontology mapping to the domain-specific data collection (the domain of LRT). Thus the main result of this work will be the \emph{specification} of the two components \texttt{Semantic Search} and the underlying \texttt{Semantic Mapping}.
  66 62 This theoretical part will be accompanied by a proof-of-concept \emph{implementation} of the components and the results and findings of the \emph{evaluation}.
  … …
  76 72 \end{description}
  77 73
+ 74 \section{Structure of the work}
+ 75 The work starts with examining the state of the art work in the two fields language resources and technology and semantic web technologies in chapter \ref{ch:lit}. In chapter \ref{cd:data} we analyze the situation in the data domain of LRT metadata and
+ 76 in chapter \ref{ch:infra} we discuss the individual software components /modules /services of the infrastructure underlying this work.
  78 77
- 79 \subsection{Keywords}
+ 78 The main part of the work is found in chapters \ref{ch:design}, \ref{ch:cmd2rdf} and \ref{ch:implementation}
  80 79
- 81 Metadata interoperability, Ontology Mapping, Schema mapping, Crosswalk, Similarity measures, LinkedData
+ 80 The findings and results are discussed in chapter \ref{ch:results}. Finally, in chapter \ref{ch:conclusions} we summarize the findings of the work and lay out where it could develop in the future.
+ 81
+ 82
+ 83 \section{Keywords}
+ 84
+ 85 Metadata interoperability, Ontology Mapping, Schema mapping, Crosswalk, LinkedData
  82 86 Fuzzy Search, Visual Search?
  83 87
SMC4LRT/chapters/Results.tex
r3140 r3204
- 1 \chapter{Evaluation}
+ 1 \chapter{Results}
+ 2 \label{ch:results}
  2 3
  3 4 \section{Use Cases}