Changeset 3553 for SMC4LRT


Timestamp:
09/12/13 13:08:29 (11 years ago)
Author:
vronk
Message:

finishing introduction, adding citation, smaller reformulations

Location:
SMC4LRT/chapters
Files:
6 edited

  • SMC4LRT/chapters/Data.tex

    r3551 r3553  
    4444
    4545
    46 \todoin{probably more numbers about CMD records (collections, used profiles, ...) (in historical perspective?)}
    47 
    48 On the instance level, in the harvested data 60 distinct profiles can be found.
     46\todoin{ add historical perspective on data - list overall}
    4947
    5048The main CLARIN OAI-PMH harvester\footnote{\url{http://catalog.clarin.eu/oai-harvester/}}
     
    8583\end{table}
    8684
     85\begin{table}
     86\caption{Top 20 collections, with the respective number of records}
     87\begin{center}
     88  \begin{tabular}{ r | l }
     89\# records & collection \\
     90    \hline
     91243.129 & Meertens collection: Liederenbank \\
     9246.658 & DK-CLARIN Repository \\
     9346.156 & Nederlands Instituut voor Beeld en Geluid Academia collectie \\
     9429.266 & childes \\
     9524.583 & DoBeS archive \\
     9623.185 & Language and Cognition \\
     9714.593 & talkbank \\
     9814.363 & Acquisition \\
     9914.320 & Institut für Deutsche Sprache, CLARIN-D Zentrum, Mannheim \\
     10012.893 & MPI CGN \\
     10110.628 & Bavarian Archive for Speech Signals (BAS) \\
     1027.964 & Pacific And Regional Archive for Digital Sources in Endangered Cultures (PARADISEC) \\
     1037.348 & WALS RefDB \\
     1045.689 & Lund Corpora \\
     1054.640 & Oxford Text Archive \\
     1064.492 & Leipzig Corpora Collection \\
     1073.539 & Institut für Deutsche Sprache, CLARIN-D Zentrum, Mannheim \\
     1083.280 & A Digital Archive of Research Papers in Computational Linguistics \\
     1093.147 & CLARIN NL \\
     1103.081 & MPI für Bildungsforschung \\   
     111\hline
     112  \end{tabular}
     113\end{center}
     114\end{table}
     115
    87116We can also observe a large disparity in the number of records between individual providers and profiles. Almost half of all records are provided by the Meertens Institute (\textit{Liederenbank} and \textit{Soundbites} collections), another 25\% by MPI for Psycholinguistics (\textit{corpus} + \textit{Session} records from \textit{The Language Archive}). On the other hand, there are 25 profiles that have fewer than 10 instances. This can be owing both to the state of the respective project (resources and records still being prepared) and to the modelled granularity level (collection vs. individual resource).
    88117
  • SMC4LRT/chapters/Definitions.tex

    r3551 r3553  
    2424\item[LINDAT] Czech national infrastructure for LRT\furl{http://lindat.ufal.cuni.cz}
    2525\item[OLAC] \textit{Open Language Archive Community}\furl{http://www.language-archives.org/}\ref{def:OLAC}
    26 \item[PID] persistent identifier \todocite{PID}
    27 \item[PURL] persistent uniform resource locator \todocite{PURL}
    28 \item[RDF] \textit{Resource Description Framework} \todocite{RDF}
     26\item[PID] persistent identifier \cite{CLARIN2009_PID}
     27\item[PURL] persistent uniform resource locator \cite{PURL1995}
     28\item[RDF] \textit{Resource Description Framework} \cite{RDF2004}
    2929\item[RR] Relation Registry\ref{def:rr} 
    3030\item[TEI] \textit{Text Encoding Initiative}
     
    3737\begin{description}
    3838\item[Concept]  Basic ``entity'' in an ontology? that of which an ontology is built
    39 \item[Ontology]  ``an explicit specification of a conceptualization'' \todocite {Ontology!}, but for us mainly a collection of concepts as opposed to lexicon, which is a collection of words.
     39\item[Ontology]  \quote{formal, explicit specification of a shared conceptualisation} \cite{Gruber1993}, but for us mainly a collection of concepts as opposed to lexicon, which is a collection of words.
    4040\item[Word]  a lexical unit, a word in a language, something that has a surface realization (writtenForm) and is a carrier of sense. so a relation holds: hasSense(Word, Concept)
    4141\item[Lexicon]  a collection of words, a (lexical) vocabulary
  • SMC4LRT/chapters/Design_SMCinstance.tex

    r3551 r3553  
    306306
    307307\subsection{RELcat - Ontological relations}
    308 Information in RELcat is already stored in RDF (Schuurman, Windhouwer 2011).  One relation from the example relation set for CMDI :
    309 isocat:DC-2538 rel:sameAs dce:date.
    310 Should we generate the redundant triples based on the relations defined between data categories?  I.e.  if there is a relation
     308Information in RELcat is already stored in RDF \cite{SchuurmanWindhouwer2011}.  One relation from the example relation set for CMDI :
    311309
    312310\begin{example}
     
    314312\end{example}
    315313
    316 and a resource has value
     314Should we generate the redundant triples based on the relations defined between data categories?  I.e.  if there is a relation and a resource has value:
    317315
    318316\begin{example}
     
    324322\begin{example}
    325323<lr1> dct:date 2012^^xs:year
    326 ?
    327 \end{example}
     324\end{example}
     325
     326?
    328327
    329328What about other relations we may want to express? (Do we need them and if yes, where to put them? – still in RR?) Examples:
     
    345344\todocode{install older python (2.5?) to be able to install dot2tex - transforming dot files to nicer pgf formatted graphs}\furl{http://dot2tex.googlecode.com/files/dot2tex-2.8.7.zip}\furl{file:/C:/Users/m/2kb/tex/dot2tex-2.8.7/}
    346345
    347 
    348346\todocode{check install siren}\furl{http://siren.sindice.com/}
    349 \todocode{check install Virtuoso}\furl{http://ods.openlinksw.com/wiki/ODS/}
    350 \todocode{check install Neo4J}
    351 \todocode{check install ontology browser}
    352347
    353348semantic search component in the Linked Media Framework
     
    356351\todoin{check SARQ}\furl{http://github.com/castagna/SARQ}
    357352
    358 \todocode{Load data: relcat, clavas, olac-and-dc-providers cmd, lt-world?}
    359353
    360354\section {Full semantic search - concept-based + ontology-driven ?}
  • SMC4LRT/chapters/Design_SMCschema.tex

    r3551 r3553  
    33\label{ch:design}
    44
    5 In this chapter, we lay out the functioning of the semantic mapping on schema level, the task the Semantic Mapping Component was originally conceived for within the larger CMD Infrastructure (cf. \ref{def:CMDI}).
    6 Semantic interoperability was one of the main concerns addressed by the CMDI and is weaved in tightly in all modules of the infrastructure. The task of the SMC module is to collect information maintained in the registries of the infrastructure and process it to generate mappings, i.e. \emph{crosswalks} between fields in heterogeneous metadata formats. This information serves as basis for the concept-based search.
     5In this chapter, we define the part of the proposed system pertaining to the schema level: the concept-based crosswalk and search functionality -- the tasks that the Semantic Mapping Component was originally conceived for within the larger CMD Infrastructure (cf. \ref{def:CMDI}) -- and, additionally,  the aspect of visualization of schema-level (model) data.
    76
    87We start by drawing a global view on the system, introducing its individual components and the dependencies among them.
    9 In the next section, the internal data model is presented and explained. In section \ref{sec:cx} the design of the actual main service for resolving crosswalks is described, divided into the interface specification and actual implementation. In section \ref{def:concept_search} we elaborate on a search functionality that builds upon the aforementioned service in terms of appropriate query language, a search engine to integrate the search in and the peculiarities of the user interface that could support this enhanced search possibilities. Finally, in section \ref{smc-browser} a advanced interactive user interface for exploring the CMD data domain is proposed.
     8In the next section, the internal data model is presented and explained. In section \ref{def:cx} the design of the actual main service for resolving crosswalks is described, divided into the interface specification and actual implementation. In section \ref{def:qx} we elaborate on a search functionality that builds upon the aforementioned service in terms of appropriate query language, a search engine to integrate the search in and the peculiarities of the user interface that could support these enhanced search possibilities. Finally, in section \ref{smc-browser} an advanced interactive user interface for exploring the CMD data domain is proposed.
    109
    1110\section{System Architecture}
     
    4241\end{note}
    4342
    44 \section{Crosswalk service}
    45 \label{sec:cx}
    46 Crosswalk service offers the functionality, that was understood under the term \textit{Semantic Mapping} as conceived in the original plans of the Component Metadata Infrastructure. It allows to translate between search indexes. In particular it expresses data category based indexes as equivalent paths to fields in the CMD profiles. This way it builds the base for query expansion enhancing the recall, when searching in the heterogeneous data collection of the joint CLARIN metadata domain.
    47 
     43\section{cx -- crosswalk service}
     44\label{def:cx}
     45
     46The crosswalk service offers the functionality that was understood under the term \textit{Semantic Mapping} as conceived in the original plans of the Component Metadata Infrastructure. Semantic interoperability has been one of the main concerns addressed by the CMDI and appropriate provisions were woven into the underlying meta-model as well as all the modules of the infrastructure.
     47The task of the crosswalk service is to collect the relevant information maintained in the registries of the infrastructure and process it to generate mappings, i.e. \emph{crosswalks} between fields in heterogeneous metadata schemas, building the base for concept-based search in the heterogeneous data collection of the joint CLARIN metadata domain (cf. \ref{def:qx}).
     48
     49The core means for semantic interoperability in CMDI are the \emph{data categories} (cf. \ref{def:DCR}), well-defined atomic concepts, that are supposed to be referenced in schemata annotating fields to unambiguously indicate their intended semantics. Drawing upon this system, the crosswalks are not generated directly between the fields of individual schemata by some matching algorithm, but rather the data categories are used as bridges for translation. This results in clusters of semantically equivalent metadata fields (with data categories serving as pivotal points), rather than in a collection of pair-wise equivalencies between the fields.
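The pivot idea described above can be illustrated with a minimal Python sketch. It is not the crosswalk service itself: the profile names, field paths and the `ex:` data categories are hypothetical, and only `isocat:DC-2538` is taken from the text. The sketch groups fields by the data category they reference and answers a crosswalk request by returning the other members of the same cluster.

```python
from collections import defaultdict

# Hypothetical annotations: (profile, field path) -> data category.
# Only isocat:DC-2538 appears in the text; all other names are made up.
FIELD_TO_DATCAT = {
    ("ProfileA", "Session.Date"): "isocat:DC-2538",
    ("ProfileB", "Resource.CreationDate"): "isocat:DC-2538",
    ("ProfileA", "Session.Title"): "ex:title",
    ("ProfileC", "Collection.Name"): "ex:title",
    ("ProfileC", "Collection.Size"): "ex:size",
}


def cluster_by_datcat(field_to_datcat):
    """Group fields into clusters keyed by the data category they reference."""
    clusters = defaultdict(set)
    for field, datcat in field_to_datcat.items():
        clusters[datcat].add(field)
    return dict(clusters)


def crosswalk(field, field_to_datcat):
    """Return all other fields that share the data category of `field`."""
    datcat = field_to_datcat[field]
    cluster = cluster_by_datcat(field_to_datcat)[datcat]
    return sorted(cluster - {field})


clusters = cluster_by_datcat(FIELD_TO_DATCAT)
print(crosswalk(("ProfileA", "Session.Date"), FIELD_TO_DATCAT))
```

With this layout, n fields that share a data category need n annotations, whereas a pair-wise mapping between the same fields would need n(n-1)/2 entries.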
    4850
    4951\subsection{smcIndex}\label{indexes}
     
    175177
    176178
    177 \section{Concept-based search}
    178 \label{def:concept_search}
     179\section{qx -- concept-based search}
     180\label{def:qx}
    179181To recall, the main goal of this work is to enhance the search capabilities of the search engines serving the metadata.
    180182In this section we want to explore, how this shall be accomplished, i.e. how to bring the enhanced capabilities to the user.
  • SMC4LRT/chapters/Infrastructure.tex

    r3551 r3553  
    99
    1010\begin{quotation}
    11     create a research infrastructure that makes language resources and technologies (LRT) available to scholars of all disciplines, especially SSH large-scale pan-European collaborative effort to create, coordinate and make language resources and technology available and readily useable.
     11\dots create a research infrastructure that makes language resources and technologies (LRT) available to scholars of all disciplines, especially SSH large-scale pan-European collaborative effort to create, coordinate and make language resources and technology available and readily usable.
    1212\end{quotation}
    1313
    1414The infrastructure foresees a federated network of centers (with federated identity management) but mainly providing resources and services in an agreed upon / coherent / uniform / consistent /standardized manner. The foundation for this goal shall be the Common or Component Metadata infrastructure, a model that caters for flexible metadata profiles, allowing to accommodate existing schemas.
    1515
    16 
    17 
    18 As stated before, the SMC is part of CMDI and depends on multiple modules of the infrastructure. Before we describe the interaction itself in chapter \ref{ch:design}, we introduce in short these modules and the data they provide:
     16As stated before, the SMC is part of CMDI and depends on multiple modules on the production side of the infrastructure. Before we describe the interaction itself in chapter \ref{ch:design}, we introduce in short these modules and the data they provide:
    1917
    2018\begin{itemize}
    2119\item Data Category Registry
    2220\item Relation Registry
    23 \item Schema Registry (SCHEMAcat\furl{http://lux13.mpi.nl/schemacat/site/index.html})
    2421\item Component Registry
    2522\item Vocabulary Alignment Service (OpenSKOS)
     23\item Schema Registry (SCHEMAcat\furl{http://lux13.mpi.nl/schemacat/site/index.html})
    2624\item SchemaParser
    2725\end{itemize}
    2826
    29 ?MDBrowser
    30 ?MDService
     27On the other hand, SMC shall serve the modules on the exploitation side of the infrastructure, i.e. search services used by end users. These are briefly introduced in \ref{cmdi_exploitation}.
    3128
    3229\begin{figure*}[!ht]
     
    3633
    3734
    38 \subsection{CMDI - DCR/CR/RR}
     35\subsection{CMDI registries: DCR, CR, RR}
    3936\label{def:CMD}
    4037\label{def:DCR}
    4138
     39
     40
    4241The \emph{Data Category Registry} (DCR) is a central registry that enables the community to collectively define and maintain a set of relevant linguistic data categories. The resulting commonly agreed controlled vocabulary is the cornerstone for grounding the semantic interpretation within the CMD framework.
    4342The data model and the procedures of the DCR are defined by the ISO standard \cite{ISO12620:2009} and implemented in \emph{ISOcat}\footnote{\url{http://www.isocat.org/}}.
    44 %Next to a web interface for users to browse and manage the data categories, DCR provides a REST-style webservice allowing applications to access the information (provided in Data Category Interchange Format - DCIF). The data categories are assigned a persistent identifier, making them globally and permanently referenceable.
    45 
    46 The \emph{Component Metadata Framework} (CMD) is built on top of the DCR and complements it. While the DCR defines the atomic concepts, within CMD the metadata schemas can be constructed out of reusable components - collections of metadata fields. The components can contain other components, and they can be reused in multiple profiles as long as each field ``refers via a PID to exactly one data category in the ISO DCR, thus indicating unambiguously how the content of the field in a metadata description should be interpreted'' \cite{Broeder+2010}. This allows to trivially infer equivalencies between metadata fields in different CMD-based schemas. While the primary registry used in CMD is the ISOcat DCR, other authoritative sources for data categories (``trusted registries'') are accepted, especially Dublin Core Metadata Initiative \cite{DCMI:2005}.
    47 % \emph{Component Registry} implements the Component Data Model and allows to define, maintain and publish CMD-components and -profiles.
     43Next to a web interface for users to browse and manage the data categories, DCR provides a REST-style webservice allowing applications to access the information (provided in Data Category Interchange Format - DCIF). The data categories are assigned a persistent identifier, making them globally and permanently referenceable.
     44
     45The \emph{Component Metadata Framework} (CMD) is built on top of the DCR and complements it. While the DCR defines the atomic concepts, within CMD the metadata schemas can be constructed out of reusable components - collections of metadata fields. The components can contain other components, and they can be reused in multiple profiles as long as each field ``refers via a PID to exactly one data category in the ISO DCR, thus indicating unambiguously how the content of the field in a metadata description should be interpreted'' \cite{Broeder+2010}. This allows to trivially infer equivalences between metadata fields in different CMD-based schemata. While the primary registry used in CMD is the ISOcat DCR, other authoritative sources for data categories (``trusted registries'') are accepted, especially Dublin Core Metadata Initiative \cite{DCMI:2005}.
     46
     47\emph{Component Registry} implements the Component Data Model and allows to define, maintain and publish CMD-components and -profiles.
    4848
    4949
     
    5656However there needs to be an additional means to capture information about relations between data categories.
    5757This information was deliberately not included in the DCR, because relations often depend on the context in which they are used, making global agreement unfeasible. CMDI proposes a separate module -- the \emph{Relation Registry}\label{def:rr} (RR) \cite{Kemps-Snijders+2008} --, where arbitrary relations between data categories can be stored and maintained. We expect that the RR should be under control of the metadata user whereas the DCR is under control of the metadata modeler.
    58 % These relations don't need to pass a standardization process, but rather separate research teams may define their own sets of relations according to the specific needs of the project. That is not to say that every researcher has to create her own set of relations -- some basic recommended sets will be defined right from the start. But new -- even contradictory -- ones can be created when needed.
     58 These relations don't need to pass a standardization process, but rather separate research teams may define their own sets of relations according to the specific needs of the project. That is not to say that every researcher has to create her own set of relations -- some basic recommended sets will be defined right from the start. But new -- even contradictory -- ones can be created when needed.
    5959
    6060There is a prototypical implementation of such a relation registry called \emph{RELcat} being developed at MPI, Nijmegen \cite{Windhouwer2011,SchuurmanWindhouwer2011}, that already hosts a few relation sets. There is no user interface to it yet, but it is accessible as a REST-webservice\footnote{sample relation set: \url{http://lux13.mpi.nl/relcat/rest/set/cmdi}}.
    6161This implementation stores the individual relations as RDF-triples
    62 \begin{eqnarray*}
    63 <subjectDatacat, relationPredicate, objectDatcat>
    64 \end{eqnarray*}
     62\begin{example}
     63<subjectDatcat, relationPredicate, objectDatcat>
     64\end{example}
    6565allowing typed relations, like equivalency (\texttt{rel:sameAs}) and subsumption (\texttt{rel:subClassOf}). The relations are grouped into relation sets that can be used independently.
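As a rough illustration of how such typed triples could be consumed, the following Python sketch models one relation set as a plain set of tuples. It does not use the RELcat API. The relation `isocat:DC-2538 rel:sameAs dce:date` is the sample relation given for the CMDI relation set; the other triples and all function names are hypothetical. Treating `rel:sameAs` as symmetric and transitive is an assumption of the sketch.

```python
# A relation set as a plain set of (subject, predicate, object) triples.
# The first triple is the sample CMDI relation; the others are hypothetical.
CMDI_SET = {
    ("isocat:DC-2538", "rel:sameAs", "dce:date"),
    ("ex:creationDate", "rel:subClassOf", "dce:date"),
    ("ex:title", "rel:sameAs", "dce:title"),
}


def equivalents(datcat, relation_set):
    """Return `datcat` together with everything linked to it via rel:sameAs.

    Assumption: rel:sameAs is treated as symmetric and transitive.
    """
    seen = {datcat}
    frontier = [datcat]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in relation_set:
            if predicate != "rel:sameAs":
                continue
            # Follow the relation in both directions.
            for a, b in ((subject, obj), (obj, subject)):
                if a == current and b not in seen:
                    seen.add(b)
                    frontier.append(b)
    return seen


def narrower(datcat, relation_set):
    """Return the direct subclasses of `datcat` (one rel:subClassOf step)."""
    return {s for s, p, o in relation_set if p == "rel:subClassOf" and o == datcat}


print(sorted(equivalents("dce:date", CMDI_SET)))
print(sorted(narrower("dce:date", CMDI_SET)))
```

Because each relation set is an independent collection of triples, the same functions can be applied to a different set without touching the data categories themselves.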
    6666
     
    6868!Cf. Erhard Hinrichs 2009
    6969
    70 \todoin{Describe SCHEMAcat}
     70SCHEMAcat is a registry for schemata of all kinds (not just XML-based) semantically annotated with data categories.
     71
     72RELcat and SCHEMAcat will provide the means to harvest and specify this information in the form of relationships and allow
     73(search) algorithms to traverse the semantic graph thus made explicit\cite{Schuurman2011_SCHEMAcat}.
    7174
    7275\noindent
     
    116119Following are those to be handled in the short term, in order of urgency/relevance/priority:
    117120\begin{itemize}
    118 \item the list of language codes\todoin{url: ISO-639}
     121\item the list of language codes\cite{ISO639}
    119122\item country codes
    120123\item organization names for the domain of language resources
     
    143146
    144147\subsubsection{Export DCR to SKOS}
    145 \todocite{Menzo2013-03-12 mail}
     148\cite{Menzo2013mail}
    146149
    147150
     
    155158field/element/attribute, complex DCs in ISOcat are the users of such
    156159vocabularies and simple DCs the DCR equivalence of values in such a
    157 vocabulary.\todocite{Menzo}
     160vocabulary.\cite{Menzo2013mail}
    158161
    159162Another aspect is, that a simple DC can be in valuedomains of multiple closed DCs.
     
    214217\end{lstlisting}
    215218
    216 A current proposal by Windhouwer\todocite{Menzo2013-03-12 mail} for integration with CLAVAS foresees following extension:
     219A current proposal by Windhouwer\cite{Menzo2013mail} for integration with CLAVAS foresees following extension:
    217220
    218221\begin{lstlisting}
     
    272275It needs to be yet defined how the information about the vocabulary can be translated into a valid schema representation.
    273276One brute-force approach would be to explicitly enumerate all the values from the vocabulary. This is being currently done
    274 within the CMD-framework with the language-codes\todocite{cmd-component ISO-639}. However there is clearly a limit to this approach both in terms of size of the vocabulary (ISO-639 contains 7.679 items (language codes)  adding some 2MB to each schema referencing it) and its stability/change rate --- ISO-639 is a standard with a fixed list, however most other vocabularies are more volatile (think organization).
     277within the CMD-framework with the language-codes\furl{http://catalog.clarin.eu/ds/ComponentRegistry/?item=clarin.eu:cr1:c_1271859438110}. However there is clearly a limit to this approach both in terms of size of the vocabulary (ISO-639 contains 7.679 items (language codes)  adding some 2MB to each schema referencing it) and its stability/change rate --- ISO-639 is a standard with a fixed list, however most other vocabularies are more volatile (think organization). And even this supposedly fixed list undergoes regular changes -- it is being updated semi-annually, with entries being added, deleted, merged and split.\furl{http://www-01.sil.org/iso639-3/changes.asp}
    275278
    276279Most of these vocabularies also cannot be seen as closed-constrained, i.e. the provided list offers a recommended orthography variant for a given entity, still allowing other values for a given field rather than restricting the values to only the items from the vocabulary (think organizations).
     
    278281So this has to be solved in a ``soft'' way. Most schema languages allow annotating the schema.
    279282This is already used with DCR, adding the \code{@dcr:datcat} into schema elements.
    280 Also CMDI (ComponentRegistry when generating schemas) puts information in <xs:appinfo/>.
     283Also CMDI (ComponentRegistry when generating schemas) puts information in \code{<xs:appinfo/>}.
    281284
    282285Tools like Arbil can get access to these annotations, e.g., a reference to a CLAVAS vocabulary, and act upon
     
    315318
    316319\subsection{CMDI - Exploitation side}
     320\label{cmdi_exploitation}
    317321Metadata complying with the CMD-framework is being created by a growing number of institutions by various means: automatic transformation from legacy data, or authoring of new metadata records with the help of one of the Metadata-Editors (TODO: cite: Arbil, NALIDA, ). The CMD-Infrastructure requires the content providers to publish their metadata via the OAI-PMH protocol and announce the OAI-PMH endpoints. These are being harvested daily by a dedicated CLARIN harvester\footnote{\url{http://catalog.clarin.eu/oai-harvester/}}. The harvested data is validated against the schemas\todoin{What about Normalization?} and made available in packaged datasets. These are being fetched by the exploitation side components, which index the metadata records and make them available for searching and browsing.
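To make the harvesting step more concrete, here is a small Python sketch of how a harvester might build OAI-PMH `ListRecords` requests. It is not the CLARIN harvester: the endpoint URL is a placeholder and the default metadata prefix `cmdi` is an assumption. The sketch only constructs URLs and performs no network access. It follows the OAI-PMH rule that a `resumptionToken` is an exclusive argument, so follow-up requests carry only the verb and the token.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="cmdi", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    `metadata_prefix="cmdi"` is an assumed default, not taken from the text.
    When a resumption token is given, it replaces all other arguments.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint announced by a content provider.
first_request = list_records_url("http://example.org/oai")
next_request = list_records_url("http://example.org/oai", resumption_token="abc")
print(first_request)
print(next_request)
```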
    318322
    319323\begin{figure*}[!ht]
    320 \includegraphics[width=1\textwidth]{images/CMDingestion_woVAS}
     324\includegraphics[width=0.8\textwidth]{images/CMDingestion_woVAS}
    321325\caption{Within CMDI, metadata is harvested from content providers via OAI-PMH and made available to consumers/users by exploitation side components}
    322326\end{figure*}
  • SMC4LRT/chapters/Introduction.tex

    r3551 r3553  
    8585In an evaluation phase, we apply a set of  test queries and compare a traditional search with a semantically expanded query in terms of recall/precision measures.
    8686
    87 In this work the focus lies on the method itself -- expressed in the specification and operationalized in the (prototypical) implementation of the module -- rather than trying to establish final, accomplished alignment of the schemas. Although a tentative mapping on a subset of the data will be proposed, this will be mainly used for evaluation and shall serve as basis for discussion with domain experts aimed at creating further, more comprehensive mappings.
     87In this work, the focus lies on the actual method to generate and apply the crosswalks -- expressed in the specification and operationalized in the (prototypical) implementation of the service -- rather than trying to establish final, accomplished crosswalks between the schemas. In fact, given the great diversity of resources and research tasks, a ``final'' complete alignment does not seem achievable at all. Therefore, the focus shall also be on \emph{dynamic mapping}, i.e. enabling the users to directly manipulate the level of use of the crosswalks, or even to apply custom crosswalks, depending on their current task or research question. This way they can actively influence the recall/precision ratio of the search results and, essentially, modulate the semantic search space.
    8888
    89 In fact, due to the great diversity of resources and research tasks, a ``final'' complete alignment does not seem achievable at all. Therefore also the focus shall be on ``soft'' dynamic mapping, i.e. to enable the users to adapt the mapping or apply different mappings depending on their current task or research question essentially being able to actively influence the recall/precision ratio of the search results.
    90 
    91 \begin{note}
    92 A special focus will be put on the examination of the feasibility of employing ontology mapping and alignment techniques and tools for the creation of the mappings.
    93 
    94 especially the application of techniques from ontology mapping to the domain-specific data collection (the domain of LRT).
    95 \end{note}
    9689
    9790Serving the second subgoal, semantic interpretation on the instance level, we will propose the expression of all of the domain data (from meta-model specification to instances) in RDF, linking to corresponding entities in appropriate external
    9891semantic resources (controlled vocabularies, ontologies).
    99 Once the dataset is expressed in RDF, it can be exposed via a semantic web application and publicized as another nucleus of \emph{Linked Open Data} in the global \emph{Web Of Data}.
     92Once the dataset is expressed in RDF, it can be exposed via a semantic web application and published as another nucleus of \emph{Linked Open Data} in the global \emph{Web Of Data}.
    10093
    101 A separate usability evaluation of the semantic search is indicated examining the user interaction with and display of the relevant additional information in the user search interface, however this issue can only be tackled marginally and will have to be outsourced into future work.
     94A separate usability evaluation of the semantic search is indicated, examining the user interaction with and display of the relevant additional information in the user search interface; however, this issue can only be tackled marginally and will have to be deferred to future work.
    10295
    10396\section{Expected Results}
    10497
    105 The main result of this work will be the \emph{specification} of the two modules \xne{concept-based search} and the underlying \texttt{crosswalk service}.
     98The main result of this work will be the \emph{specification} of the two modules \xne{concept-based search} and the underlying \xne{crosswalk service}.
    10699This theoretical part will be accompanied by a proof-of-concept \emph{implementation} of the components
    107100and the results and findings of the \emph{evaluation}.
    108101
    109 Another result of the work will be the original dataset expressed as RDF interlinked with existing external resources (ontologies, knowledge bases, vocabularies), effectively laying a foundation for providing this dataset as \emph{Linked Open Data}\furl{http://linkeddata.org/} in the \emph{Web of Data}.
     102Another result of the work will be the original dataset expressed as RDF interlinked with existing external resources (ontologies, knowledge bases, vocabularies), effectively laying a foundation for providing this dataset as \emph{Linked Open Data}\furl{http://linkeddata.org/}.
    110103
    111104\begin{description}
    112 \item [Crosswalk service] specification and proof of basic implementation of the module
    113 \item [Concept-based search] design of the query expansion and integration with search engines
    114 \item [Visualization] design of an application for interactive exploration of the concerned dataset
    115 \item [Evaluation] evaluation results of querying the dataset comparing traditional search and semantic search
     105\item [Crosswalk service] specification and a basic implementation of the service
     106\item [Concept-based search] design of the query expansion and prototypical integration with search engines
     107\item [Visualization tool] design of an application for interactive exploration of the concerned dataset
     108\item [Evaluation] evaluation results of querying the dataset comparing simple search and semantic search
    116109\item [LinkedData] translation of the source dataset to RDF-based format with links into existing datasets, ontologies, knowledge bases
    117110\end{description}
     
    120113The work starts with examining the state of the art work in the two fields  language resources and technology and semantic web technologies in chapter \ref{ch:lit}, followed by administrative chapter \ref{ch:def} explaining the terms and abbreviations used in the work.
    121114
    122 In chapter \ref{ch:data} we analyze the situation in the data domain of LRT metadata and in chapter \ref{ch:infra} we discuss the individual software components /modules /services of the infrastructure underlying this work.
     115In chapter \ref{ch:data} we analyze the situation in the data domain of LRT metadata and in chapter \ref{ch:infra} we discuss the individual software components of the infrastructure underlying this work.
    123116
    124 The main part of the work is found in chapters \ref{ch:design} and \ref{ch:design-instance} laying out the design of the software module, the proposal how to modell the data in RDF respectively.
     117The main part of the work is found in chapters \ref{ch:design} and \ref{ch:design-instance} laying out the design of the software module and a proposal how to model the data in RDF respectively.
    125118
    126119The evaluation and the results are discussed in chapter \ref{ch:results}. Finally, in chapter \ref{ch:conclusions} we summarize the findings of the work and lay out where it could develop in the future.
     
    128121\section{Keywords}
    129122
    130 Metadata interoperability, Ontology Mapping, Schema mapping, Crosswalk, LinkedData
    131 Fuzzy Search, Visual Search?
    132 
    133 Language Resources and Technology, LRT/NLP/HLT
    134 
    135 Ontology Visualization
     123semantic interoperability -- crosswalks -- schema mapping -- metadata -- language resources and technology -- linked data -- visualization