Changeset 3553 for SMC4LRT

Timestamp:

09/12/13 13:08:29 (11 years ago)

Author:

vronk

Message:

finishing introduction, adding citation, smaller reformulations

Location:

SMC4LRT/chapters

Files:

6 edited

Data.tex (modified) (2 diffs)
Definitions.tex (modified) (2 diffs)
Design_SMCinstance.tex (modified) (5 diffs)
Design_SMCschema.tex (modified) (3 diffs)
Infrastructure.tex (modified) (11 diffs)
Introduction.tex (modified) (3 diffs)

SMC4LRT/chapters/Data.tex

-                      r3551
+                      r3553
+\todoin{probably more numbers about CMD records (collections, used profiles, ...) (in historical perspective?)}
+On the instance level, in the harvested data 60 distinct profiles can be found.
+\todoin{ add historical perspective on data - list overall}
 The main CLARIN OAI-PMH harvester\footnote{\url{http://catalog.clarin.eu/oai-harvester/}}
 …
 \end{table}
+\begin{table}
+\caption{Top 20 collections, with the respective number of records}
+\begin{center}
+  \begin{tabular}{ r | l }
+\# records & collection \\
+    \hline
+.129 & Meertens collection: Liederenbank \\
+.658 & DK-CLARIN Repository \\
+.156 & Nederlands Instituut voor Beeld en Geluid Academia collectie \\
+.266 & childes \\
+.583 & DoBeS archive \\
+.185 & Language and Cognition \\
+.593 & talkbank \\
+.363 & Acquisition \\
+.320 & Institut für Deutsche Sprache, CLARIN-D Zentrum, Mannheim \\
+.893 & MPI CGN \\
+.628 & Bavarian Archive for Speech Signals (BAS) \\
+.964 & Pacific And Regional Archive for Digital Sources in Endangered Cultures (PARADISEC) \\
+.348 & WALS RefDB \\
+.689 & Lund Corpora \\
+.640 & Oxford Text Archive \\
+.492 & Leipzig Corpora Collection \\
+.539 & Institut für Deutsche Sprache, CLARIN-D Zentrum, Mannheim \\
+.280 & A Digital Archive of Research Papers in Computational Linguistics \\
+.147 & CLARIN NL \\
+.081 & MPI für Bildungsforschung \\
+\hline
+  \end{tabular}
+\end{center}
+\end{table}
 We can also observe a large disparity in the number of records between individual providers and profiles. Almost half of all records are provided by the Meertens Institute (\textit{Liederenbank} and \textit{Soundbites} collections), another 25\% by MPI for Psycholinguistics (\textit{corpus} + \textit{Session} records from \textit{The Language Archive}). On the other hand, there are 25 profiles that have fewer than 10 instances. This may be due both to the state of the respective project (resources and records still being prepared) and to the modelled granularity level (collection vs. individual resource).

SMC4LRT/chapters/Definitions.tex

-                      r3551
+                      r3553
 \item[LINDAT] Czech national infrastructure for LRT\furl{http://lindat.ufal.cuni.cz}
 \item[OLAC] \textit{Open Language Archive Community}\furl{http://www.language-archives.org/}\ref{def:OLAC}
 \item[PID] persistend identifier \todocite{PID}
 \item[PURL] persistent uniform resource locator \todocite{PURL}
 \item[RDF] \textit{Resource Description Framework} \todocite{RDF}
+\item[PID] persistent identifier \cite{CLARIN2009_PID}
+\item[PURL] persistent uniform resource locator \cite{PURL1995}
+\item[RDF] \textit{Resource Description Framework} \cite{RDF2004}
 \item[RR] Relation Registry\ref{def:rr}
 \item[TEI] \textit{Text Encoding Initiative}
 …
 \begin{description}
 \item[Concept]  Basic ``entity'' in an ontology? that of which an ontology is built
 \item[Ontology]  ``an explicit specification of a conceptualization'' \todocite {Ontology!}, but for us mainly a collection of concepts as opposed to lexicon, which is a collection of words.
+\item[Ontology]  \quote{formal, explicit specification of a shared conceptualisation} \cite{Gruber1993}, but for us mainly a collection of concepts as opposed to lexicon, which is a collection of words.
 \item[Word]  a lexical unit, a word in a language, something that has a surface realization (writtenForm) and is a carrier of sense. so a relation holds: hasSense(Word, Concept)
 \item[Lexicon]  a collection of words, a (lexical) vocabulary

SMC4LRT/chapters/Design_SMCinstance.tex

-                      r3551
+                      r3553
 \subsection{RELcat - Ontological relations}
+Information in RELcat is already stored in RDF (Schuurman, Windhouwer 2011).  One relation from the example relation set for CMDI :
+isocat:DC-2538 rel:sameAs dce:date.
+Should we generate the redundant triples based on the relations defined between data categories?  I.e.  if there is a relation
+Information in RELcat is already stored in RDF \cite{SchuurmanWindhouwer2011}.  One relation from the example relation set for CMDI :
 \begin{example}
 …
 \end{example}
+and a resource has value
+Should we generate the redundant triples based on the relations defined between data categories?  I.e.  if there is a relation and a resource has value:
 \begin{example}
 …
 \begin{example}
 <lr1> dct:date 2012^^xs:year
+?
+\end{example}
+\end{example}
+?
 What about other relations we may want to express? (Do we need them and if yes, where to put them? -- still in RR?) Examples:
 …
 \todocode{install older python (2.5?) to be able to install dot2tex - transforming dot files to nicer pgf formatted graphs}\furl{http://dot2tex.googlecode.com/files/dot2tex-2.8.7.zip}\furl{file:/C:/Users/m/2kb/tex/dot2tex-2.8.7/}
 \todocode{check install siren}\furl{http://siren.sindice.com/}
-\todocode{check install Virtuoso}\furl{http://ods.openlinksw.com/wiki/ODS/}
-\todocode{check install Neo4J}
-\todocode{check install ontology browser}
 semantic search component in the Linked Media Framework
 …
 \todoin{check SARQ}\furl{http://github.com/castagna/SARQ}
-\todocode{Load data: relcat, clavas, olac-and-dc-providers cmd, lt-world?}
 \section {Full semantic search - concept-based + ontology-driven ?}

SMC4LRT/chapters/Design_SMCschema.tex

-                      r3551
+                      r3553
 \label{ch:design}
+In this chapter, we lay out the functioning of the semantic mapping on schema level, the task the Semantic Mapping Component was originally conceived for within the larger CMD Infrastructure (cf. \ref{def:CMDI}).
+Semantic interoperability was one of the main concerns addressed by the CMDI and is weaved in tightly in all modules of the infrastructure. The task of the SMC module is to collect information maintained in the registries of the infrastructure and process it to generate mappings, i.e. \emph{crosswalks} between fields in heterogeneous metadata formats. This information serves as basis for the concept-based search.
+In this chapter, we define the part of the proposed system pertaining to the schema level: the concept-based crosswalk and search functionality -- the tasks that the Semantic Mapping Component was originally conceived for within the larger CMD Infrastructure (cf. \ref{def:CMDI}) -- and, additionally,  the aspect of visualization of schema-level (model) data.
 We start by drawing a global view on the system, introducing its individual components and the dependencies among them.
 In the next section, the internal data model is presented and explained. In section \ref{sec:cx} the design of the actual main service for resolving crosswalks is described, divided into the interface specification and actual implementation. In section \ref{def:concept_search} we elaborate on a search functionality that builds upon the aforementioned service in terms of appropriate query language, a search engine to integrate the search in and the peculiarities of the user interface that could support this enhanced search possibilities. Finally, in section \ref{smc-browser} a advanced interactive user interface for exploring the CMD data domain is proposed.
+In the next section, the internal data model is presented and explained. In section \ref{sec:cx} the design of the actual main service for resolving crosswalks is described, divided into the interface specification and actual implementation. In section \ref{def:concept_search} we elaborate on a search functionality that builds upon the aforementioned service in terms of appropriate query language, a search engine to integrate the search in and the peculiarities of the user interface that could support these enhanced search possibilities. Finally, in section \ref{smc-browser} an advanced interactive user interface for exploring the CMD data domain is proposed.
 \section{System Architecture}
 …
 \end{note}
+\section{Crosswalk service}
+\label{sec:cx}
+Crosswalk service offers the functionality, that was understood under the term \textit{Semantic Mapping} as conceived in the original plans of the Component Metadata Infrastructure. It allows to translate between search indexes. In particular it expresses data category based indexes as equivalent paths to fields in the CMD profiles. This way it builds the base for query expansion enhancing the recall, when searching in the heterogeneous data collection of the joint CLARIN metadata domain.
+\section{cx -- crosswalk service}
+\label{def:cx}
+The crosswalk service offers the functionality that was understood under the term \textit{Semantic Mapping} as conceived in the original plans of the Component Metadata Infrastructure. Semantic interoperability has been one of the main concerns addressed by the CMDI and appropriate provisions were woven into the underlying meta-model as well as all the modules of the infrastructure.
+The task of the crosswalk service is to collect the relevant information maintained in the registries of the infrastructure and process it to generate mappings, i.e. \emph{crosswalks} between fields in heterogeneous metadata schemas, building the base for concept-based search in the heterogeneous data collection of the joint CLARIN metadata domain (cf. \ref{def:qx}).
+The core means for semantic interoperability in CMDI are the \emph{data categories} (cf. \ref{def:DCR}), well-defined atomic concepts, that are supposed to be referenced in schemata annotating fields to unambiguously indicate their intended semantics. Drawing upon this system, the crosswalks are not generated directly between the fields of individual schemata by some matching algorithm, but rather the data categories are used as bridges for translation. This results in clusters of semantically equivalent metadata fields (with data categories serving as pivotal points), rather than in a collection of pair-wise equivalencies between the fields.
 \subsection{smcIndex}\label{indexes}
 …
 \section{Concept-based search}
 \label{def:concept_search}
+\section{qx -- concept-based search}
+\label{def:qx}
 To recall, the main goal of this work is to enhance the search capabilities of the search engines serving the metadata.
 In this section we want to explore, how this shall be accomplished, i.e. how to bring the enhanced capabilities to the user.

SMC4LRT/chapters/Infrastructure.tex

-                      r3551
+                      r3553
 \begin{quotation}
     create a research infrastructure that makes language resources and technologies (LRT) available to scholars of all disciplines, especially SSH large-scale pan-European collaborative effort to create, coordinate and make language resources and technology available and readily useable.
+\dots create a research infrastructure that makes language resources and technologies (LRT) available to scholars of all disciplines, especially SSH large-scale pan-European collaborative effort to create, coordinate and make language resources and technology available and readily usable.
 \end{quotation}
 The infrastructure foresees a federated network of centers (with federated identity management) but mainly providing resources and services in an agreed upon / coherent / uniform / consistent /standardized manner. The foundation for this goal shall be the Common or Component Metadata infrastructure, a model that caters for flexible metadata profiles, allowing to accommodate existing schemas.
+As stated before, the SMC is part of CMDI and depends on multiple modules of the infrastructure. Before we describe the interaction itself in chapter \ref{ch:design}, we introduce in short these modules and the data they provide:
+As stated before, the SMC is part of CMDI and depends on multiple modules on the production side of the infrastructure. Before we describe the interaction itself in chapter \ref{ch:design}, we introduce in short these modules and the data they provide:
 \begin{itemize}
 \item Data Category Registry
 \item Relation Registry
-\item Schema Registry (SCHEMAcat\furl{http://lux13.mpi.nl/schemacat/site/index.html})
 \item Component Registry
 \item Vocabulary Alignment Service (OpenSKOS)
+\item Schema Registry (SCHEMAcat\furl{http://lux13.mpi.nl/schemacat/site/index.html})
 \item SchemaParser
 \end{itemize}
+?MDBrowser
+?MDService
+On the other hand, SMC shall serve the modules on the exploitation side of the infrastructure, i.e. search services used by end users. These are briefly introduced in \ref{cmdi_exploitation}.
 \begin{figure*}[!ht]
 …
 \subsection{CMDI - DCR/CR/RR}
+\subsection{CMDI registries: DCR, CR, RR}
 \label{def:CMD}
 \label{def:DCR}
 The \emph{Data Category Registry} (DCR) is a central registry that enables the community to collectively define and maintain a set of relevant linguistic data categories. The resulting commonly agreed controlled vocabulary is the cornerstone for grounding the semantic interpretation within the CMD framework.
 The data model and the procedures of the DCR are defined by the ISO standard \cite{ISO12620:2009}, and the DCR is implemented in \emph{ISOcat}\footnote{\url{http://www.isocat.org/}}.
+%Next to a web interface for users to browse and manage the data categories, DCR provides a REST-style webservice allowing applications to access the information (provided in Data Category Interchange Format - DCIF). The data categories are assigned a persistent identifier, making them globally and permanently referenceable.
+The \emph{Component Metadata Framework} (CMD) is built on top of the DCR and complements it. While the DCR defines the atomic concepts, within CMD the metadata schemas can be constructed out of reusable components - collections of metadata fields. The components can contain other components, and they can be reused in multiple profiles as long as each field ``refers via a PID to exactly one data category in the ISO DCR, thus indicating unambiguously how the content of the field in a metadata description should be interpreted'' \cite{Broeder+2010}. This allows to trivially infer equivalencies between metadata fields in different CMD-based schemas. While the primary registry used in CMD is the ISOcat DCR, other authoritative sources for data categories (``trusted registries'') are accepted, especially Dublin Core Metadata Initiative \cite{DCMI:2005}.
+% \emph{Component Registry} implements the Component Data Model and allows to define, maintain and publish CMD-components and -profiles.
+Next to a web interface for users to browse and manage the data categories, DCR provides a REST-style webservice allowing applications to access the information (provided in Data Category Interchange Format - DCIF). The data categories are assigned a persistent identifier, making them globally and permanently referenceable.
+The \emph{Component Metadata Framework} (CMD) is built on top of the DCR and complements it. While the DCR defines the atomic concepts, within CMD the metadata schemas can be constructed out of reusable components - collections of metadata fields. The components can contain other components, and they can be reused in multiple profiles as long as each field ``refers via a PID to exactly one data category in the ISO DCR, thus indicating unambiguously how the content of the field in a metadata description should be interpreted'' \cite{Broeder+2010}. This allows to trivially infer equivalences between metadata fields in different CMD-based schemata. While the primary registry used in CMD is the ISOcat DCR, other authoritative sources for data categories (``trusted registries'') are accepted, especially Dublin Core Metadata Initiative \cite{DCMI:2005}.
+\emph{Component Registry} implements the Component Data Model and allows to define, maintain and publish CMD-components and -profiles.
 …
 However there needs to be an additional means to capture information about relations between data categories.
 This information was deliberately not included in the DCR, because relations often depend on the context in which they are used, making global agreement unfeasible. CMDI proposes a separate module -- the \emph{Relation Registry}\label{def:rr} (RR) \cite{Kemps-Snijders+2008} --, where arbitrary relations between data categories can be stored and maintained. We expect that the RR should be under control of the metadata user whereas the DCR is under control of the metadata modeler.
 % These relations don't need to pass a standardization process, but rather separate research teams may define their own sets of relations according to the specific needs of the project. That is not to say that every researcher has to create her own set of relations -- some basic recommended sets will be defined right from the start. But new -- even contradictory -- ones can be created when needed.
+ These relations don't need to pass a standardization process, but rather separate research teams may define their own sets of relations according to the specific needs of the project. That is not to say that every researcher has to create her own set of relations -- some basic recommended sets will be defined right from the start. But new -- even contradictory -- ones can be created when needed.
 There is a prototypical implementation of such a relation registry called \emph{RELcat} being developed at MPI, Nijmegen \cite{Windhouwer2011,SchuurmanWindhouwer2011}, that already hosts a few relation sets. There is no user interface to it yet, but it is accessible as a REST-webservice\footnote{sample relation set: \url{http://lux13.mpi.nl/relcat/rest/set/cmdi}}.
 This implementation stores the individual relations as RDF-triples
 \begin{eqnarray*}
 <subjectDatacat, relationPredicate, objectDatcat>
 \end{eqnarray*}
+\begin{example}
+<subjectDatcat, relationPredicate, objectDatcat>
+\end{example}
 allowing typed relations, like equivalency (\texttt{rel:sameAs}) and subsumption (\texttt{rel:subClassOf}). The relations are grouped into relation sets that can be used independently.
 …
 !Cf. Erhard Hinrichs 2009
+\todoin{Describe SCHEMAcat}
+SCHEMAcat is a registry for schemata of all kinds (not just XML-based) semantically annotated with data categories.
+RELcat and SCHEMAcat will provide the means to harvest and specify this information in the form of relationships and allow
+(search) algorithms to traverse the semantic graph thus made explicit\cite{Schuurman2011_SCHEMAcat}.
 \noindent
 …
 Following are those to be handled in short-term, in order of urgency/relevance/priority:
 \begin{itemize}
 \item the list of language codes\todoin{url: ISO-639}
+\item the list of language codes\cite{ISO639}
 \item country codes
 \item organization names for the domain of language resources
 …
 \subsubsection{Export DCR to SKOS}
 \todocite{Menzo2013-03-12 mail}
+\cite{Menzo2013mail}
 …
 field/element/attribute, complex DCs in ISOcat are the users of such
 vocabularies and simple DCs the DCR equivalence of values in such a
 vocabulary.\todocite{Menzo}
+vocabulary.\cite{Menzo2013mail}
 Another aspect is, that a simple DC can be in valuedomains of multiple closed DCs.
 …
 \end{lstlisting}
 A current proposal by Windhouwer\todocite{Menzo2013-03-12 mail} for integration with CLAVAS foresees following extension:
+A current proposal by Windhouwer\cite{Menzo2013mail} for integration with CLAVAS foresees following extension:
 \begin{lstlisting}
 …
 It needs to be yet defined how the information about the vocabulary can be translated into a valid schema representation.
 One brute-force approach would be to explicitly enumerate all the values from the vocabulary. This is being currently done
 within the CMD-framework with the language-codes\todocite{cmd-component ISO-639}. However there is clearly a limit to this approach both in terms of size of the vocabulary (ISO-639 contains 7.679 items (language codes)  adding some 2MB to each schema referencing it) and its stability/change rate --- ISO-639 is a standard with a fixed list, however most other vocabularies are more volatile (think organization).
+within the CMD-framework with the language-codes\furl{http://catalog.clarin.eu/ds/ComponentRegistry/?item=clarin.eu:cr1:c_1271859438110}. However there is clearly a limit to this approach both in terms of size of the vocabulary (ISO-639 contains 7.679 items (language codes)  adding some 2MB to each schema referencing it) and its stability/change rate --- ISO-639 is a standard with a fixed list, however most other vocabularies are more volatile (think organization). And even this supposedly fixed list undergoes regular changes -- it is being updated semi-annually, with entries being added, deleted, merged and split.\furl{http://www-01.sil.org/iso639-3/changes.asp}
 Most of these vocabularies also cannot be seen as closed-constrained, i.e. the list that is provided, provides a recommended orthography variant for a given entity, still allowing other values for given field rather than restricting the values to only the items from the vocabulary (think organizations).
 …
 So this has to be solved in ``soft'' way. Most schema languages allow to annotate the schema.
 This is already used with DCR, adding the \code{@dcr:datcat} into schema elements.
 Also CMDI (ComponentRegistry when generating schemas) puts information in <xs:appinfo/>.
+Also CMDI (ComponentRegistry when generating schemas) puts information in \code{<xs:appinfo/>}.
 Tools like Arbil can get access to these annotations, e.g., a reference to a CLAVAS vocabulary, and act upon
 …
 \subsection{CMDI - Exploitation side}
+\label{cmdi_exploitation}
 Metadata complying with the CMD-framework is being created by a growing number of institutions by various means: automatic transformation from legacy data, or authoring of new metadata records with the help of one of the Metadata-Editors (TODO: cite: Arbil, NALIDA, ). The CMD-Infrastructure requires the content providers to publish their metadata via the OAI-PMH protocol and announce the OAI-PMH endpoints.  These are being harvested daily by a dedicated CLARIN harvester\footnote{\url{http://catalog.clarin.eu/oai-harvester/}}. The harvested data is validated against the schemas \todoin{What about Normalization?} and made available in packaged datasets. These are being fetched by the exploitation side components, that index the metadata records and make them available for searching and browsing.
 \begin{figure*}[!ht]
 \includegraphics[width=1\textwidth]{images/CMDingestion_woVAS}
+\includegraphics[width=0.8\textwidth]{images/CMDingestion_woVAS}
 \caption{Within CMDI, metadata is harvested from content providers via OAI-PMH and made available to consumers/users by exploitation side components}
 \end{figure*}

SMC4LRT/chapters/Introduction.tex

-                      r3551
+                      r3553
 In an evaluation phase, we apply a set of  test queries and compare a traditional search with a semantically expanded query in terms of recall/precision measures.
 In this work the focus lies on the method itself -- expressed in the specification and operationalized in the (prototypical) implementation of the module -- rather than trying to establish final, accomplished alignment of the schemas. Although a tentative mapping on a subset of the data will be proposed, this will be mainly used for evaluation and shall serve as basis for discussion with domain experts aimed at creating further, more comprehensive mappings.
+In this work, the focus lies on the actual method to generate and apply the crosswalks -- expressed in the specification and operationalized in the (prototypical) implementation of the service -- rather than trying to establish final, accomplished crosswalks between the schemas. In fact, given the great diversity of resources and research tasks, a ``final'' complete alignment does not seem achievable at all. Therefore also the focus shall be on \emph{dynamic mapping}, i.e. to enable the users to directly manipulate the level of use of the crosswalks or even apply custom crosswalks depending on their current task or research question being able to actively influence the recall/precision ratio of the search results, and essentially to modulate the semantic search space.
-In fact, due to the great diversity of resources and research tasks, a ``final'' complete alignment does not seem achievable at all. Therefore also the focus shall be on ``soft'' dynamic mapping, i.e. to enable the users to adapt the mapping or apply different mappings depending on their current task or research question essentially being able to actively influence the recall/precision ratio of the search results.
-\begin{note}
-A special focus will be put on the examination of the feasibility of employing ontology mapping and alignment techniques and tools for the creation of the mappings.
-especially the application of techniques from ontology mapping to the domain-specific data collection (the domain of LRT).
-\end{note}
 Serving the second subgoal, semantic interpretation on the instance level, we will propose the expression of all of the domain data (from meta-model specification to instances) in RDF, linking to corresponding entities in appropriate external
 semantic resources (controlled vocabularies, ontologies).
 Once the dataset is expressed in RDF, it can be exposed via a semantic web application and publicized as another nucleus of \emph{Linked Open Data} in the global \emph{Web Of Data}.
+Once the dataset is expressed in RDF, it can be exposed via a semantic web application and published as another nucleus of \emph{Linked Open Data} in the global \emph{Web Of Data}.
 A separate usability evaluation of the semantic search is indicated examining the user interaction with and display of the relevant additional information in the user search interface, however this issue can only be tackled marginally and will have to be outsourced into future work.
+A separate usability evaluation of the semantic search is indicated, examining the user interaction with and display of the relevant additional information in the user search interface, however this issue can only be tackled marginally and will have to be outsourced into future work.
 \section{Expected Results}
 The main result of this work will be the \emph{specification} of the two modules \xne{concept-based search} and the underlying \texttt{crosswalk service}.
+The main result of this work will be the \emph{specification} of the two modules \xne{concept-based search} and the underlying \xne{crosswalk service}.
 This theoretical part will be accompanied by a proof-of-concept \emph{implementation} of the components
 and the results and findings of the \emph{evaluation}.
 Another result of the work will be the original dataset expressed as RDF interlinked with existing external resources (ontologies, knowledge bases, vocabularies), effectively laying a foundation for providing this dataset as \emph{Linked Open Data}\furl{http://linkeddata.org/} in the \emph{Web of Data}.
+Another result of the work will be the original dataset expressed as RDF interlinked with existing external resources (ontologies, knowledge bases, vocabularies), effectively laying a foundation for providing this dataset as \emph{Linked Open Data}\furl{http://linkeddata.org/}.
 \begin{description}
 \item [Crosswalk service] specification and proof of basic implementation of the module
 \item [Concept-based search] design of the query expansion and integration with search engines
 \item [Visualization] design of an application for interactive exploration of the concerned dataset
 \item [Evaluation] evaluation results of querying the dataset comparing traditional search and semantic search
+\item [Crosswalk service] specification and a basic implementation of the service
+\item [Concept-based search] design of the query expansion and prototypical integration with search engines
+\item [Visualization tool] design of an application for interactive exploration of the concerned dataset
+\item [Evaluation] evaluation results of querying the dataset comparing simple search and semantic search
 \item [LinkedData] translation of the source dataset to RDF-based format with links into existing datasets, ontologies, knowledge bases
 \end{description}
 …
 The work starts with examining the state of the art work in the two fields  language resources and technology and semantic web technologies in chapter \ref{ch:lit}, followed by administrative chapter \ref{ch:def} explaining the terms and abbreviations used in the work.
 In chapter \ref{ch:data} we analyze the situation in the data domain of LRT metadata and in chapter \ref{ch:infra} we discuss the individual software components /modules /services of the infrastructure underlying this work.
+In chapter \ref{ch:data} we analyze the situation in the data domain of LRT metadata and in chapter \ref{ch:infra} we discuss the individual software components of the infrastructure underlying this work.
 The main part of the work is found in chapters \ref{ch:design} and \ref{ch:design-instance} laying out the design of the software module, the proposal how to modell the data in RDF respectively.
+The main part of the work is found in chapters \ref{ch:design} and \ref{ch:design-instance} laying out the design of the software module and a proposal how to model the data in RDF respectively.
 The evaluation and the results are discussed in chapter \ref{ch:results}. Finally, in chapter \ref{ch:conclusions} we summarize the findings of the work and lay out where it could develop in the future.
 …
 \section{Keywords}
+Metadata interoperability, Ontology Mapping, Schema mapping, Crosswalk, LinkedData
+Fuzzy Search, Visual Search?
+Language Resources and Technology, LRT/NLP/HLT
+Ontology Visualization
+semantic interoperability -- crosswalks -- schema mapping -- metadata -- language resources and technology -- linked data -- visualization
