Timestamp:
12/01/13 19:04:51 (11 years ago)
Author:
vronk
Message:

minor orthographic corrections

File:
1 edited

  • SMC4LRT/chapters/Introduction.tex

    r3776 r4117  
      4   4  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
      5   5
      6      \section{Motivation / problem statement}
          6  \section{Motivation / Problem Statement}
      7   7
      8   8  While in the Digital Libraries community a consolidation already took place and global federated networks of digital library repositories are set up, in the field of Language Resource and Technology the landscape is still scattered, although meanwhile looking back at a decade of standardization and integration efforts. One main reason seems to be the complexity and diversity of the metadata associated with the resources, stemming from the wide range of resource types combined with project-specific needs. (Chapter \ref{ch:data} analyses the disparity in the data domain.)
      9   9
     10      This situation has been identified by the community and numerous standardization initiatives had been undertaken. The process has gained a new momentum thanks to large framework programmes introduced by the European Commission aimed at fostering the development of common large-scale international research infrastructures. One key player in this development is the project CLARIN (see section \ref{def:CLARIN}). The main objective of this initiative is to make language resources and technologies (LRT) more easily available to scholars by means of a common harmonized architecture. One core pillar of this architecture is the \emph{Component Metadata Infrastructure} (cf. \ref{def:CMDI}) -- a distributed system consisting of multiple interconnected modules aimed at creating and providing metadata for LRT in a coherent harmonized way.
         10  This situation has been identified by the community and numerous standardization initiatives had been undertaken. The process has gained a new momentum thanks to large framework programmes introduced by the European Commission aimed at fostering the development of common large-scale international research infrastructures. One key player in this development is the project CLARIN (see section \ref{def:CLARIN}). The main objective of this initiative is to make language resources and technologies (LRT) more easily available to scholars by means of a common harmonized architecture. One core pillar of this architecture is the \emph{Component Metadata Infrastructure} (CMDI, cf. \ref{def:CMDI}) -- a distributed system consisting of multiple interconnected modules aimed at creating and providing metadata for LRT in a coherent harmonized way.
     11  11
     12      This work discusses one module within the Component Metadata Infrastructure -- the \emph{Semantic Mapping Component} -- dedicated to overcome or at least ease the semantic interoperability problem stemming from the heterogeneity of the resource descriptions, without the reductionist approach of imposing one common description schema for all resources.
         12  This work discusses one module within the Component Metadata Infrastructure -- the \emph{Semantic Mapping Component} (SMC) -- dedicated to overcome or at least ease the semantic interoperability problem stemming from the heterogeneity of the resource descriptions, without the reductionist approach of imposing one common description schema for all resources.
     13  13
     14  14  \section{Main Goal}

     40  40  Semantic interoperability has been one of the main concerns addressed by the CMDI and appropriate provisions were weaved into the underlying meta-model as well as all the modules of the infrastructure. The task of the crosswalk service -- the primary part of the SMC module -- is to collect the relevant information maintained in the registries of the infrastructure and process it to generate mappings, i.e. \emph{crosswalks} between fields in heterogeneous metadata schemas that can serve as basis for concept-based search.
     41  41
     42      Thus, the goal is not primarily to produce the crosswalks but rather to develop the service serving existing ones.
         42  Thus, the goal is not primarily to define new crosswalks but rather to develop a service serving existing ones.
     43  43
     44  44  \subsubsection*{Concept-based query expansion}

     48  48  \paragraph{Example}
     49  49  Confronted with a user query searching in the notorious \concept{dublincore:title} the query has to be \emph{expanded} to
     50      all the semantically near fields (\emph{concept cluster}), that are however labelled (or even structured) differently in other schemas like:
         50  all the semantically near fields (\emph{concept cluster}) that are however, labelled (or even structured) differently in other schemas like:
     51  51
     52  52  \begin{quote}

     54  54  \end{quote}
     55  55
     56      The expansion cannot be solved by simple string matching, as there are other fields labeled with the same (sub)strings but with different semantics, that shouldn't be considered:
         56  The expansion cannot be solved by simple string matching, as there are other fields labelled with the same (sub)strings but with different semantics that shouldn't be considered:
     57  57
     58  58  \begin{quote}

     62  62  \subsubsection*{Semantic interpretation}
     63  63
     64      The problem of different labels for semantically similar or even identical entities is even more so virulent on the level of individual values in the fields of the instance data. A number of metadata fields (like \concept{organization} or \concept{resource type})  have a constrained value domain that yet cannot be explicitly exhaustively enumerated. This leads to a chronically inconsistent use of labels for referring to entities. (As the instance data shows, some organizations are referred to by more than 20 different labels.) Thus, one goal of this work is to propose a mechanism to map (string) values in selected fields to entities defined in corresponding vocabularies.
         64  The problem of different labels for semantically similar or even identical entities is even more so virulent on the level of individual values in the fields of the instance data. A number of metadata fields (like \concept{organization} or \concept{resource type})  have a constrained value domain that yet cannot be explicitly exhaustively enumerated. This leads to a chronically inconsistent use of labels for referring to entities. (As the evidence in the metadata records collected within CMDI shows, some organizations are referred to by more than 20 different labels.) Thus, one goal of this work is to propose a mechanism to map (string) values in selected fields to entities defined in corresponding vocabularies.
     65  65
     66  66  \subsubsection*{Ontology-driven data exploration}

     75  75
     76  76  \section{Method}
     77      We start with examining the existing data and with the description of the existing infrastructure in which this work is embedded.
         77  We start with examining the existing data and with the description of the existing infrastructure, in which this work is embedded.
     78  78
     79  79  Building on this groundwork, in accordance with the first subgoal, we lay out the design of the service for handling crosswalks and concept-based query expansion. We describe the workflow, the central methods and the role of the module relative to other parts of the infrastructure.

     90  90  Once the dataset is expressed in RDF, it can be exposed via a semantic web application and published as another nucleus of \emph{Linked Open Data} in the global \emph{Web Of Data}.
     91  91
     92      A separate evaluation of the usability of the proposed semantic search solution is indicated, examining the user interaction with and display of the relevant additional information in the user search interface, however this issue can only be tackled marginally and will have to be outsourced into future work.
         92  A separate evaluation of the usability of the proposed semantic search solution is indicated, examining the user interaction with and display of the relevant additional information in the user search interface, however, this issue can only be tackled marginally and will have to be outsourced into future work.
     93  93
     94  94  \section{Expected Results}

    108 108  \end{description}
    109 109
    110      \section{Structure of the work}
        110  \section{Structure of the Work}
    111 111  The work starts with examining the state of the art work in the two fields  language resources and technology and semantic web technologies in chapter \ref{ch:lit}. In chapter \ref{ch:data} we analyze the situation in the data domain of LRT metadata and in chapter \ref{ch:infra} we discuss the individual software components of the infrastructure underlying this work.
    112 112

    116 116  The results are discussed in chapter \ref{ch:results}. Finally, in chapter \ref{ch:conclusions} we summarize the findings of the work and lay out where it could develop in the future.
    117 117
    118      The auxiliary material accompanying the work is found in the appendix. After the administrative chapter \ref{ch:def} explaining the abbreviations and formatting conventions used throughout this work, full specifications of the used data models (\ref{ch:data-model-ref} and data samples (\ref{ch:cmd-sample}) are listed for reference, as well as the developer and user documentation for the technical solution of this work, the SMC module (\ref{ch:smc-docs}).
        118  The auxiliary material accompanying the work is found in the appendix. After the administrative chapter \ref{ch:def} explaining the abbreviations and formatting conventions used throughout this work, full specifications of the used data models (\ref{ch:data-model-ref}) and data samples (\ref{ch:cmd-sample}) are listed for reference, as well as the developer and user documentation for the technical solution of this work, the SMC module (\ref{ch:smc-docs}).
    119 119
    120 120