Changeset 3204


Timestamp:
07/28/13 10:45:45
Author:
vronk
Message:

restructuring the chapters
added chapter Structure of the work in the Introduction

Location:
SMC4LRT
Files:
5 edited
3 moved

  • SMC4LRT/Outline.tex

    r2703 r3204  
    9595\input{chapters/Introduction}
    9696
     97\begin{comment}
    9798\input{chapters/Literature}
    9899
     
    103104\input{chapters/Infrastructure}
    104105
    105 \input{chapters/SMC}
     106\input{chapters/Design}
    106107
    107 \input{chapters/System}
     108\input{chapters/Implementation}
    108109
    109 \input{chapters/Evaluation}
     110\input{chapters/CMD2RDF}
    110111
     112\input{chapters/Results}
    111113
    112 
    113 \chapter{Conclusions and Future Work}
    114 
    115 Further work is needed on more complex types of response (similarity ratio, relation types) and also on the interaction with Metadata Service to find the optimal way of providing the features of semantic mapping and query expansion as semantic search within the search user-interface.
    116 
     114\input{chapters/Conclusion}
     115\end{comment}
    117116
    118117
  • SMC4LRT/chapters/CMD2RDF.tex

    r3139 r3204  
    210210A major (if not the main) motivation for the CMD to RDF mapping is the wish to have better control over  and better quality of values in metadata fields with constrained value domain like organization or resource type. As the allowed values for these fields often cannot be explicitly enumerated, it is not possible to restrict them by means of an XML schema. This leads to inconsistent use of labels for referring to entities. (As the instance data shows, some organizations are referred to by more than 20 different labels.)
    211211Thus, one goal of this work is to map (string) values in selected fields to entities defined in corresponding vocabularies. The main providers of relevant vocabularies are ISOcat and CLAVAS, a service for managing and providing vocabularies in SKOS format. Closed and corresponding simple data categories are already being exported from ISOcat in SKOS format and imported into CLAVAS/OpenSKOS. Other relevant vocabularies shall also be ingested into this system, so that for our purposes we can assume OpenSKOS as the one source of vocabularies.
    212 Data in OpenSKOS is modeled purely in SKOS, so there is no more specific typing of the entities in the vocabularies, but rather all the entities are skos:Concepts:
     212Data in OpenSKOS is modelled purely in SKOS, so there is no more specific typing of the entities in the vocabularies, but rather all the entities are \xne{skos:Concepts}:
    213213
    214214\begin{example}
  • SMC4LRT/chapters/Conclusion.tex

    r3139 r3204  
    11
    22\chapter{Conclusions and Future Work}
     3\label{ch:conclusions}
    34
    45Further work is needed on more complex types of response (similarity ratio, relation types) and also on the interaction with Metadata Service to find the optimal way of providing the features of semantic mapping and query expansion as semantic search within the search user-interface.
  • SMC4LRT/chapters/Design.tex

    r3140 r3204  
    11
    2 \chapter{Semantic Mapping Component}
    3 
     2\chapter{Semantic Mapping Component - Design}
     3\label{ch:design}
    44
    55\section{Data Model?}
  • SMC4LRT/chapters/Implementation.tex

    r3140 r3204  
    22\chapter{Implementation}
    33%%%%%%%%%%%%%%%%%%%%%
    4 
     4\label{ch:implementation}
    55
    66
  • SMC4LRT/chapters/Infrastructure.tex

    r3140 r3204  
    3636
    3737\subsection{CMDI - DCR/CR/RR}
    38 \label{dcr}
     38\label{def:cmdi}
     39\label{def:dcr}
    3940
    4041The \emph{Data Category Registry} (DCR) is a central registry that enables the community to collectively define and maintain a set of relevant linguistic data categories. The resulting commonly agreed controlled vocabulary is the cornerstone for grounding the semantic interpretation within the CMD framework.
  • SMC4LRT/chapters/Introduction.tex

    r3140 r3204  
    44%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    55
    6 
    7         \begin{itemize}
    8                 \item motivation
    9                 \item problem statement (which problem should be solved?)
    10                 \item aim of the work
    11                 \item methodological approach
    12                 \item structure of the work
    13         \end{itemize}
    14 
    15 
    166\todocode{install older python (2.5?) to be able to install dot2tex - transforming dot files to nicer pgf formatted graphs}\furl{http://dot2tex.googlecode.com/files/dot2tex-2.8.7.zip}\furl{file:/C:/Users/m/2kb/tex/dot2tex-2.8.7/}
    177
    188
    19 \subsection{Problem statement}
     9\section{Motivation / problem statement}
    2010
    21 While in the Digital Libraries community a consolidation generally already happened and big federated networks of digital libary repository are set up, in the field of Language Resource and Technology the landscape is still scattered, although meanwhile looking back at a decade of standardizing efforts. One main reason seems to be the complexity and diversity of the metadata associated with the resources, stemming for one from the wide range of resource types additionally complicated by influence from different schools of thought. (cf. \ref{ch:data})
     11While the Digital Libraries community has largely consolidated and set up global federated networks of digital library repositories, the landscape in the field of Language Resources and Technology is still scattered, despite a decade of standardization efforts. One main reason seems to be the complexity and diversity of the metadata associated with the resources, stemming in part from the wide range of resource types combined with project-specific needs. (See chapter \ref{ch:data} for an analysis of the data disparity in the domain.)
    2212
    23 \todoin{Need some number about the disparity in the field, number of institutes, resources, formats.}
    24 
    25 This situation has been identified by the community and multiple standardization initiatives had been conducted/undertaken. This process has gained a new momentum thanks to large Research Infrastructure Programmes introduced by European Commission, aimed at fostering Research communities developing large-scale pan-european common infrastructures. One key player in this development is the project CLARIN.
     13This situation has been identified by the community and multiple standardization initiatives have been undertaken. The process has gained new momentum thanks to large research infrastructure programmes introduced by the European Commission, aimed at supporting research communities developing common large-scale pan-European infrastructures. One key player in this development is the project CLARIN (see section \ref{def:CLARIN}). The main objective of this initiative is to make language resources and technologies more easily available to scholars by providing a common infrastructure. One core pillar of this infrastructure is a common framework for resource descriptions (metadata), the Component Metadata Framework (cf. \ref{def:cmdi}). This work discusses the component within the Component Metadata Infrastructure dedicated to overcoming the heterogeneity of the resource descriptions.
    2614
    2715
    28 \subsection{Main Goal}
     16\section{Main Goal}
    2917
    30 This work proposes a component that shall enhance search functionality over a \emph{large heterogeneous collection of metadata descriptions} of Language Resources and Technology (LRT). By applying semantic web technology the user shall be given both better recall through \emph{query expansion} based on related categories/concepts and new means of \emph{exploring the dataset} via ontology-driven browsing.
     18The primary goal of this work is to enhance search functionality over a \emph{large heterogeneous collection of metadata descriptions} of Language Resources and Technology (LRT). By applying semantic web technology the user shall be given both better recall through \emph{query expansion} based on related categories/concepts and new means of \emph{exploring the dataset} via ontology-driven browsing.
     19
     20This main goal can be broken down along the three meanings of the term ``mapping'':
     21
     22\begin{description}
     23\item[crosswalk] establish crosswalks (links) between metadata formats
     24\item[interpret] translate string labels in field values to semantic entities
     25\item[visualize] provide a visualization of the data.
     26\end{description}
    3127
    3228Alternatively/  that allows query expansion by providing mappings between search indexes. This enables semantic search, ultimately increasing the recall when searching in metadata collections. The module builds on the Data Category Registry and Component Metadata Framework that are part of CMDI.
     
    4440In fact, due to the great diversity of resources and research tasks, a ``final'' complete alignment does not seem achievable at all. Therefore also the focus shall be on ``soft'' dynamic mapping, i.e. to enable the users to adapt the mapping or apply different mappings depending on their current task or research question essentially being able to actively manipulate the recall/precision ratio of the search results. This entails an examination of user interaction with and visualization of the relevant additional information in the user search interface. However this would open doors to a whole new (to this work) field of usability engineering and can be treated here only marginally.
    4541
    46 \subsection{Method}
     42\section{Method}
    4743We start with examining the existing data and describing the evolving infrastructure in which the components are to be embedded. Then we formulate the function of \textbf{Semantic Search} distinguishing between the concept level -- using semantic relations between concepts or categories for better retrieval -- and the instances level -- allowing the user to explore the primary data collection via semantic resources (ontologies, vocabularies).
    4844
     
    6258\end{itemize}
    6359
    64 \subsection{Expected Results}
     60\section{Expected Results}
    6561The primary concern of this work is the integrative effort, i.e. putting together existing pieces (resources, components and methods) especially the application of techniques from ontology mapping to the domain-specific data collection (the domain of LRT). Thus the main result of this work will be the \emph{specification} of the two components \texttt{Semantic Search} and the underlying \texttt{Semantic Mapping}.
    6662This theoretical part will be accompanied by a proof-of-concept \emph{implementation} of the components and the results and findings of the \emph{evaluation}.
     
    7672\end{description}
    7773
     74\section{Structure of the work}
     75The work starts by examining the state of the art in the two fields, language resources and technology and semantic web technologies, in chapter \ref{ch:lit}. In chapter \ref{ch:data} we analyze the situation in the data domain of LRT metadata and
     76in chapter \ref{ch:infra} we discuss the individual software components, modules and services of the infrastructure underlying this work.
    7877
    79 \subsection{Keywords}
     78The main part of the work is found in chapters \ref{ch:design}, \ref{ch:cmd2rdf} and \ref{ch:implementation}.
    8079
    81 Metadata interoperability, Ontology Mapping, Schema mapping, Crosswalk, Similarity measures, LinkedData
     80The findings and results are discussed in chapter \ref{ch:results}. Finally, in chapter \ref{ch:conclusions} we summarize the findings of the work and lay out where it could develop in the future.
     81
     82
     83\section{Keywords}
     84
     85Metadata interoperability, Ontology Mapping, Schema mapping, Crosswalk, LinkedData
    8286Fuzzy Search, Visual Search?
    8387
  • SMC4LRT/chapters/Results.tex

    r3140 r3204  
    1 \chapter{Evaluation}
     1\chapter{Results}
     2\label{ch:results}
    23
    34\section{Use Cases}