- Timestamp:
- 09/12/13 13:08:29
- Location:
- SMC4LRT/chapters
- Files:
-
- 6 edited
SMC4LRT/chapters/Data.tex
r3551 r3553 44 44 45 45 46 \todoin{probably more numbers about CMD records (collections, used profiles, ...) (in historical perspective?)} 47 48 On the instance level, in the harvested data 60 distinct profiles can be found. 46 \todoin{ add historical perspective on data - list overall} 49 47 50 48 The main CLARIN OAI-PMH harvester\footnote{\url{http://catalog.clarin.eu/oai-harvester/}} … … 85 83 \end{table} 86 84 85 \begin{table} 86 \caption{Top 20 collections, with the respective number of records} 87 \begin{center} 88 \begin{tabular}{ r | l } 89 \# records & collection \\ 90 \hline 91 243.129 & Meertens collection: Liederenbank \\ 92 46.658 & DK-CLARIN Repository \\ 93 46.156 & Nederlands Instituut voor Beeld en Geluid Academia collectie \\ 94 29.266 & childes \\ 95 24.583 & DoBeS archive \\ 96 23.185 & Language and Cognition \\ 97 14.593 & talkbank \\ 98 14.363 & Acquisition \\ 99 14.320 & Institut für Deutsche Sprache, CLARIN-D Zentrum, Mannheim \\ 100 12.893 & MPI CGN \\ 101 10.628 & Bavarian Archive for Speech Signals (BAS) \\ 102 7.964 & Pacific And Regional Archive for Digital Sources in Endangered Cultures (PARADISEC) \\ 103 7.348 & WALS RefDB \\ 104 5.689 & Lund Corpora \\ 105 4.640 & Oxford Text Archive \\ 106 4.492 & Leipzig Corpora Collection \\ 107 3.539 & Institut für Deutsche Sprache, CLARIN-D Zentrum, Mannheim \\ 108 3.280 & A Digital Archive of Research Papers in Computational Linguistics \\ 109 3.147 & CLARIN NL \\ 110 3.081 & MPI für Bildungsforschung \\ 111 \hline 112 \end{tabular} 113 \end{center} 114 \end{table} 115 87 116 We can also observe a large disparity on the amount of records between individual providers and profiles. Almost half of all records is provided by the Meertens Institute (\textit{Liederenbank} and \textit{Soundbites} collections), another 25\% by MPI for Psycholinguistics (\textit{corpus} + \textit{Session} records from the \textit{The Language Archive}). 
On the other hand there are 25 profiles that have less than 10 instances. This can be owing both to the state of the respective project (resources and records still being prepared) and the modelled granularity level (collection vs. individual resource). 88 117 -
SMC4LRT/chapters/Definitions.tex
r3551 r3553 24 24 \item[LINDAT] Czech national infrastructure for LRT\furl{http://lindat.ufal.cuni.cz} 25 25 \item[OLAC] \textit{Open Language Archive Community}\furl{http://www.language-archives.org/}\ref{def:OLAC} 26 \item[PID] persistent identifier \todocite{PID} 27 \item[PURL] persistent uniform resource locator \todocite{PURL} 28 \item[RDF] \textit{Resource Description Framework} \todocite{RDF} 26 \item[PID] persistent identifier \cite{CLARIN2009_PID} 27 \item[PURL] persistent uniform resource locator \cite{PURL1995} 28 \item[RDF] \textit{Resource Description Framework} \cite{RDF2004} 29 29 \item[RR] Relation Registry\ref{def:rr} 30 30 \item[TEI] \textit{Text Encoding Initiative} … … 37 37 \begin{description} 38 38 \item[Concept] Basic "entity" in an ontology? that of what an ontology is built 39 \item[Ontology] ``an explicit specification of a conceptualization'' \todocite {Ontology!}, but for us mainly a collection of concepts as opposed to lexicon, which is a collection of words. 39 \item[Ontology] \quote{formal, explicit specification of a shared conceptualisation} \cite{Gruber1993}, but for us mainly a collection of concepts as opposed to lexicon, which is a collection of words. 40 40 \item[Word] a lexical unit, a word in a language, something that has a surface realization (writtenForm) and is a carrier of sense. so a relation holds: hasSense(Word, Concept) 41 41 \item[Lexicon] a collection of words, a (lexical) vocabulary -
SMC4LRT/chapters/Design_SMCinstance.tex
r3551 r3553 306 306 307 307 \subsection{RELcat - Ontological relations} 308 Information in RELcat is already stored in RDF (Schuurman, Windhouwer 2011). One relation from the example relation set for CMDI : 309 isocat:DC-2538 rel:sameAs dce:date. 310 Should we generate the redundant triples based on the relations defined between data categories? I.e. if there is a relation 308 Information in RELcat is already stored in RDF \cite{SchuurmanWindhouwer2011}. One relation from the example relation set for CMDI : 311 309 312 310 \begin{example} … … 314 312 \end{example} 315 313 316 and a resource has value 314 Should we generate the redundant triples based on the relations defined between data categories? I.e. if there is a relation and a resource has value: 317 315 318 316 \begin{example} … … 324 322 \begin{example} 325 323 <lr1> dct:date 2012^^xs:year 326 ? 327 \end{example} 324 \end{example} 325 326 ? 328 327 329 328 What about other relations we may want to express? (Do we need them and if yes, where to put them? -- still in RR?) Examples: … … 345 344 \todocode{install older python (2.5?) to be able to install dot2tex - transforming dot files to nicer pgf formatted graphs}\furl{http://dot2tex.googlecode.com/files/dot2tex-2.8.7.zip}\furl{file:/C:/Users/m/2kb/tex/dot2tex-2.8.7/} 346 345 347 348 346 \todocode{check install siren}\furl{http://siren.sindice.com/} 349 \todocode{check install Virtuoso}\furl{http://ods.openlinksw.com/wiki/ODS/} 350 \todocode{check install Neo4J} 351 \todocode{check install ontology browser} 352 347 353 348 semantic search component in the Linked Media Framework … … 356 351 \todoin{check SARQ}\furl{http://github.com/castagna/SARQ} 357 352 358 \todocode{Load data: relcat, clavas, olac-and-dc-providers cmd, lt-world?} 359 353 360 354 \section {Full semantic search - concept-based + ontology-driven ?} -
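The question raised in the RELcat subsection of this diff, whether redundant triples should be generated from the relations between data categories, can be illustrated with a minimal sketch. It is not part of the changeset and rests on assumptions: triples are modelled as plain Python tuples, the helper name `materialize_same_as` is hypothetical, `rel:sameAs` is treated as symmetric, and only a single inference step is applied (no transitive closure). The identifiers are taken from the example in the text.

```python
def materialize_same_as(relations, data):
    """Return the data triples plus the triples implied by rel:sameAs.

    relations: iterable of (datcat, predicate, datcat) triples, as in RELcat.
    data: iterable of (resource, property, value) triples.
    """
    # Collect equivalent properties in both directions (assumed symmetric).
    equivalents = {}
    for subj, pred, obj in relations:
        if pred == "rel:sameAs":
            equivalents.setdefault(subj, set()).add(obj)
            equivalents.setdefault(obj, set()).add(subj)

    # Copy every data triple to each equivalent property (one step only).
    result = set(data)
    for subj, pred, obj in data:
        for other in equivalents.get(pred, ()):
            result.add((subj, other, obj))
    return result


relations = [("isocat:DC-2538", "rel:sameAs", "dce:date")]
data = [("<lr1>", "isocat:DC-2538", "2012^^xs:year")]
expanded = materialize_same_as(relations, data)
```

Materializing the triples up front trades storage for simpler queries; the alternative would be to resolve the equivalences at query time.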
SMC4LRT/chapters/Design_SMCschema.tex
r3551 r3553 3 3 \label{ch:design} 4 4 5 In this chapter, we lay out the functioning of the semantic mapping on schema level, the task the Semantic Mapping Component was originally conceived for within the larger CMD Infrastructure (cf. \ref{def:CMDI}). 6 Semantic interoperability was one of the main concerns addressed by the CMDI and is weaved in tightly in all modules of the infrastructure. The task of the SMC module is to collect information maintained in the registries of the infrastructure and process it to generate mappings, i.e. \emph{crosswalks} between fields in heterogeneous metadata formats. This information serves as basis for the concept-based search. 5 In this chapter, we define the part of the proposed system pertaining to the schema level: the concept-based crosswalk and search functionality -- the tasks that the Semantic Mapping Component was originally conceived for within the larger CMD Infrastructure (cf. \ref{def:CMDI}) -- and, additionally, the aspect of visualization of schema-level (model) data. 7 6 8 7 We start by drawing a global view on the system, introducing its individual components and the dependencies among them. 9 In the next section, the internal data model is presented and explained. In section \ref{sec:cx} the design of the actual main service for resolving crosswalks is described, divided into the interface specification and actual implementation. In section \ref{def:concept_search} we elaborate on a search functionality that builds upon the aforementioned service in terms of appropriate query language, a search engine to integrate the search in and the peculiarities of the user interface that could support this enhanced search possibilities. Finally, in section \ref{smc-browser} a advanced interactive user interface for exploring the CMD data domain is proposed.8 In the next section, the internal data model is presented and explained. 
In section \ref{sec:cx} the design of the actual main service for resolving crosswalks is described, divided into the interface specification and actual implementation. In section \ref{def:concept_search} we elaborate on a search functionality that builds upon the aforementioned service in terms of appropriate query language, a search engine to integrate the search in and the peculiarities of the user interface that could support this enhanced search possibilities. Finally, in section \ref{smc-browser} an advanced interactive user interface for exploring the CMD data domain is proposed. 10 9 11 10 \section{System Architecture} … … 42 41 \end{note} 43 42 44 \section{Crosswalk service} 45 \label{sec:cx} 46 Crosswalk service offers the functionality, that was understood under the term \textit{Semantic Mapping} as conceived in the original plans of the Component Metadata Infrastructure. It allows to translate between search indexes. In particular it expresses data category based indexes as equivalent paths to fields in the CMD profiles. This way it builds the base for query expansion enhancing the recall, when searching in the heterogeneous data collection of the joint CLARIN metadata domain. 47 43 \section{cx -- crosswalk service} 44 \label{def:cx} 45 46 The crosswalk service offers the functionality, that was understood under the term \textit{Semantic Mapping} as conceived in the original plans of the Component Metadata Infrastructure. Semantic interoperability has been one of the main concerns addressed by the CMDI and appropriate provisions were weaved into the underlying meta-model as well as all the modules of the infrastructure. 47 The task of the crosswalk service is to collect the relevant information maintained in the registries of the infrastructure and process it to generate mappings, i.e. 
\emph{crosswalks} between fields in heterogeneous metadata schemas, building the base for concept-based search in the heterogeneous data collection of the joint CLARIN metadata domain. (cf. \ref{def:qx}). 48 49 The core means for semantic interoperability in CMDI are the \emph{data categories} (cf. \ref{def:DCR}), well-defined atomic concepts, that are supposed to be referenced in schemata annotating fields to unambiguously indicate their intended semantics. Drawing upon this system, the crosswalks are not generated directly between the fields of individual schemata by some matching algorithm, but rather the data categories are used as bridges for translation. This results in clusters of semantically equivalent metadata fields (with data categories serving as pivotal points), rather than in a collection of pair-wise equivalencies between the fields. 48 50 49 51 \subsection{smcIndex}\label{indexes} … … 175 177 176 178 177 \section{ Concept-based search}178 \label{def: concept_search}179 \section{qx -- concept-based search} 180 \label{def:qx} 179 181 To recall, the main goal of this work is to enhance the search capabilities of the search engines serving the metadata. 180 182 In this section we want to explore, how this shall be accomplished, i.e. how to bring the enhanced capabilities to the user. -
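The pivot idea described for the crosswalk service in this diff (data categories as bridges, yielding clusters of equivalent fields instead of pair-wise matches) can be sketched as follows. This is an illustration only, not the actual implementation: the function names, the field paths and the data category identifiers are all hypothetical.

```python
from collections import defaultdict


def build_clusters(annotations):
    """Group metadata fields by the data category they reference."""
    clusters = defaultdict(set)
    for field, datcat in annotations:
        clusters[datcat].add(field)
    return clusters


def crosswalk(field, annotations):
    """Return all other fields sharing a data category with `field`."""
    result = set()
    for fields in build_clusters(annotations).values():
        if field in fields:
            result |= fields
    return sorted(result - {field})


# Hypothetical (field path, data category) annotations from three profiles.
annotations = [
    ("profileA.Language.Name", "datcat:languageName"),
    ("profileB.Lang", "datcat:languageName"),
    ("profileC.Title", "datcat:title"),
]
equivalent_fields = crosswalk("profileA.Language.Name", annotations)
```

Because every field points at a pivot, adding a new schema needs only its own annotations and no pair-wise mapping against each existing schema.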
SMC4LRT/chapters/Infrastructure.tex
r3551 r3553 9 9 10 10 \begin{quotation} 11 create a research infrastructure that makes language resources and technologies (LRT) available to scholars of all disciplines, especially SSH large-scale pan-European collaborative effort to create, coordinate and make language resources and technology available and readily useable.11 \dots create a research infrastructure that makes language resources and technologies (LRT) available to scholars of all disciplines, especially SSH large-scale pan-European collaborative effort to create, coordinate and make language resources and technology available and readily usable. 12 12 \end{quotation} 13 13 14 14 The infrastructure foresees a federated network of centers (with federated identity management) but mainly providing resources and services in an agreed upon / coherent / uniform / consistent /standardized manner. The foundation for this goal shall be the Common or Component Metadata infrastructure, a model that caters for flexible metadata profiles, allowing to accommodate existing schemas. 15 15 16 17 18 As stated before, the SMC is part of CMDI and depends on multiple modules of the infrastructure. Before we describe the interaction itself in chapter \ref{ch:design}, we introduce in short these modules and the data they provide: 16 As stated before, the SMC is part of CMDI and depends on multiple modules on the production side of the infrastructure. 
Before we describe the interaction itself in chapter \ref{ch:design}, we introduce in short these modules and the data they provide: 19 17 20 18 \begin{itemize} 21 19 \item Data Category Registry 22 20 \item Relation Registry 23 \item Schema Registry (SCHEMAcat\furl{http://lux13.mpi.nl/schemacat/site/index.html})24 21 \item Component Registry 25 22 \item Vocabulary Alignement Service (OpenSKOS) 23 \item Schema Registry (SCHEMAcat\furl{http://lux13.mpi.nl/schemacat/site/index.html}) 26 24 \item SchemaParser 27 25 \end{itemize} 28 26 29 ?MDBrowser 30 ?MDService 27 On the other hand, SMC shall serve the modules on the exploitation side of the infrastructure, i.e. search services used by end users. These are briefly introduced in \label{cmdi_exploitation}. 31 28 32 29 \begin{figure*}[!ht] … … 36 33 37 34 38 \subsection{CMDI - DCR/CR/RR}35 \subsection{CMDI registries: DCR, CR, RR} 39 36 \label{def:CMD} 40 37 \label{def:DCR} 41 38 39 40 42 41 The \emph{Data Category Registry} (DCR) is a central registry that enables the community to collectively define and maintain a set of relevant linguistic data categories. The resulting commonly agreed controlled vocabulary is the cornerstone for grounding the semantic interpretation within the CMD framework. 43 42 The data model and the procedures of the DCR are defined by the ISO standard \cite{ISO12620:2009}, and is implemented in \emph{ISOcat}\footnote{\url{http://www.isocat.org/}}. 44 %Next to a web interface for users to browse and manage the data categories, DCR provides a REST-style webservice allowing applications to access the information (provided in Data Category Interchange Format - DCIF). The data categories are assigned a persistent identifier, making them globally and permanently referenceable. 45 46 The \emph{Component Metadata Framework} (CMD) is built on top of the DCR and complements it. 
While the DCR defines the atomic concepts, within CMD the metadata schemas can be constructed out of reusable components - collections of metadata fields. The components can contain other components, and they can be reused in multiple profiles as long as each field ``refers via a PID to exactly one data category in the ISO DCR, thus indicating unambiguously how the content of the field in a metadata description should be interpreted'' \cite{Broeder+2010}. This allows to trivially infer equivalencies between metadata fields in different CMD-based schemas. While the primary registry used in CMD is the ISOcat DCR, other authoritative sources for data categories (``trusted registries'') are accepted, especially Dublin Core Metadata Initiative \cite{DCMI:2005}. 47 % \emph{Component Registry} implements the Component Data Model and allows to define, maintain and publish CMD-components and -profiles. 43 Next to a web interface for users to browse and manage the data categories, DCR provides a REST-style webservice allowing applications to access the information (provided in Data Category Interchange Format - DCIF). The data categories are assigned a persistent identifier, making them globally and permanently referenceable. 44 45 The \emph{Component Metadata Framework} (CMD) is built on top of the DCR and complements it. While the DCR defines the atomic concepts, within CMD the metadata schemas can be constructed out of reusable components - collections of metadata fields. The components can contain other components, and they can be reused in multiple profiles as long as each field ``refers via a PID to exactly one data category in the ISO DCR, thus indicating unambiguously how the content of the field in a metadata description should be interpreted'' \cite{Broeder+2010}. This allows to trivially infer equivalences between metadata fields in different CMD-based schemata. 
While the primary registry used in CMD is the ISOcat DCR, other authoritative sources for data categories (``trusted registries'') are accepted, especially Dublin Core Metadata Initiative \cite{DCMI:2005}. 46 47 \emph{Component Registry} implements the Component Data Model and allows to define, maintain and publish CMD-components and -profiles. 48 48 49 49 … … 56 56 However there needs to be an additional means to capture information about relations between data categories. 57 57 This information was deliberately not included in the DCR, because relations often depend on the context in which they are used, making global agreement unfeasible. CMDI proposes a separate module -- the \emph{Relation Registry}\label{def:rr} (RR) \cite{Kemps-Snijders+2008} --, where arbitrary relations between data categories can be stored and maintained. We expect that the RR should be under control of the metadata user whereas the DCR is under control of the metadata modeler. 58 %These relations don't need to pass a standardization process, but rather separate research teams may define their own sets of relations according to the specific needs of the project. That is not to say that every researcher has to create her own set of relations -- some basic recommended sets will be defined right from the start. But new -- even contradictory -- ones can be created when needed.58 These relations don't need to pass a standardization process, but rather separate research teams may define their own sets of relations according to the specific needs of the project. That is not to say that every researcher has to create her own set of relations -- some basic recommended sets will be defined right from the start. But new -- even contradictory -- ones can be created when needed. 59 59 60 60 There is a prototypical implementation of such a relation registry called \emph{RELcat} being developed at MPI, Nijmegen. \cite{Windhouwer2011,SchuurmanWindhouwer2011}, that already hosts a few relation sets. 
There is no user interface to it yet, but it is accessible as a REST-webservice\footnote{sample relation set: \url{http://lux13.mpi.nl/relcat/rest/set/cmdi}}. 61 61 This implementation stores the individual relations as RDF-triples 62 \begin{eqnarray*} 63 <subjectDatacat, relationPredicate, objectDatcat> 64 \end{eqnarray*} 62 \begin{example} 63 <subjectDatcat, relationPredicate, objectDatcat> 64 \end{example} 65 65 allowing typed relations, like equivalency (\texttt{rel:sameAs}) and subsumption (\texttt{rel:subClassOf}). The relations are grouped into relation sets that can be used independently. 66 66 … … 68 68 !Cf. Erhard Hinrichs 2009 69 69 70 \todoin{Describe SCHEMAcat} 70 SCHEMAcat is a registry for schemata of all kinds (not just XML-based) semantically annotated with data categories. 71 72 RELcat and SCHEMAcat will provide the means to harvest and specify this information in the form of relationships and allow 73 (search) algorithms to traverse the semantic graph thus made explicit\cite{Schuurman2011_SCHEMAcat}. 71 74 72 75 \noindent … … 116 119 Following are those to be handled in short-term, in order of urgency/relevance/priority: 117 120 \begin{itemize} 118 \item the list of language codes\todoin{url: ISO-639} 121 \item the list of language codes\cite{ISO639} 119 122 \item country codes 120 123 \item organization names for the domain of language resources … … 143 146 144 147 \subsubsection{Export DCR to SKOS} 145 \todocite{Menzo2013-03-12mail} 148 \cite{Menzo2013mail} 146 149 147 150 … … 155 158 field/element/attribute, complex DCs in ISOcat are the users of such 156 159 vocabularies and simple DCs the DCR equivalence of values in such a 157 vocabulary.\todocite{Menzo} 160 vocabulary.\cite{Menzo2013mail} 158 161 159 162 Another aspect is, that a simple DC can be in valuedomains of multiple closed DCs. 
… … 214 217 \end{lstlisting} 215 218 216 A current proposal by Windhouwer\todocite{Menzo2013-03-12mail} for integration with CLAVAS foresees following extension: 219 A current proposal by Windhouwer\cite{Menzo2013mail} for integration with CLAVAS foresees following extension: 217 220 218 221 \begin{lstlisting} … … 272 275 It needs to be yet defined how the information about the vocabulary can be translated into a valid schema representation. 273 276 One brute-force approach would be to explicitly enumerate all the values from the vocabulary. This is being currently done 274 within the CMD-framework with the language-codes\todocite{cmd-component ISO-639}. However there is clearly a limit to this approach both in terms of size of the vocabulary (ISO-639 contains 7.679 items (language codes) adding some 2MB to each schema referencing it) and its stability/change rate --- ISO-639 is a standard with a fixed list, however most other vocabularies are more volatile (think organization). 277 within the CMD-framework with the language-codes\furl{http://catalog.clarin.eu/ds/ComponentRegistry/?item=clarin.eu:cr1:c_1271859438110}. However there is clearly a limit to this approach both in terms of size of the vocabulary (ISO-639 contains 7.679 items (language codes) adding some 2MB to each schema referencing it) and its stability/change rate --- ISO-639 is a standard with a fixed list, however most other vocabularies are more volatile (think organization). And even this supposedly fixed list undergoes regular changes -- it is being updated semi-annually, with entries being added, deleted, merged and split.\furl{http://www-01.sil.org/iso639-3/changes.asp} 275 278 276 279 Most of these vocabularies also cannot be seen as closed-constrained, i.e. the list that is provided, provides a recommended orthography variant for a given entity, still allowing other values for given field rather than restricting the values to only the items from the vocabulary (think organizations). 
… … 278 281 So this has to be solved in ``soft'' way. Most schema languages allow to annotate the schema. 279 282 This is already used with DCR, adding the \code{@dcr:datcat} into schema elements. 280 Also CMDI (ComponentRegistry when generating schemas) puts information in <xs:appinfo/>.283 Also CMDI (ComponentRegistry when generating schemas) puts information in \code{<xs:appinfo/>}. 281 284 282 285 Tools like Arbil can get access to these annotations, e.g., a reference to a CLAVAS vocabulary, and act upon … … 315 318 316 319 \subsection{CMDI - Exploitation side} 320 \label{cmdi_exploitation} 317 321 Metadata complying to the CMD-framework is being created by a growing number of institutions by various means, automatic transformation from legacy data, authoring of new metadata records with the help of one of the Metadata-Editors (TODO: cite: Arbil, NALIDA, ). The CMD-Infrastructure requires the content providers to publish their metadata via the OAI-PMH protocol and announce the OAI-PMH endpoints. These are being harvested daily by a dedicated CLARIN harvester\footnote{\url{http://catalog.clarin.eu/oai-harvester/}}. The harvested data is validated against the schemas \todoin{What about Normalization?}. and made available in packaged datasets. These are being fetched by the exploitations side components, that index the metadata records and make them available for searching and browsing. 318 322 319 323 \begin{figure*}[!ht] 320 \includegraphics[width= 1\textwidth]{images/CMDingestion_woVAS}324 \includegraphics[width=0.8\textwidth]{images/CMDingestion_woVAS} 321 325 \caption{Within CMDI, metadata is harvested from content providers via OAI-PMH and made available to consumers/users by exploitation side components} 322 326 \end{figure*} -
SMC4LRT/chapters/Introduction.tex
r3551 r3553 85 85 In an evaluation phase, we apply a set of test queries and compare a traditional search with a semantically expanded query in terms of recall/precision measures. 86 86 87 In this work the focus lies on the method itself -- expressed in the specification and operationalized in the (prototypical) implementation of the module -- rather than trying to establish final, accomplished alignment of the schemas. Although a tentative mapping on a subset of the data will be proposed, this will be mainly used for evaluation and shall serve as basis for discussion with domain experts aimed at creating further, more comprehensive mappings.87 In this work, the focus lies on the actual method to generate and apply the crosswalks -- expressed in the specification and operationalized in the (prototypical) implementation of the service -- rather than trying to establish final, accomplished crosswalks between the schemas. In fact, given the great diversity of resources and research tasks, a ``final'' complete alignment does not seem achievable at all. Therefore also the focus shall be on \emph{dynamic mapping}, i.e. to enable the users to directly manipulate the level of use of the crosswalks or even apply custom crosswalks depending on their current task or research question being able to actively influence the recall/precision ratio of the search results, and essentially to modulate the semantic search space. 88 88 89 In fact, due to the great diversity of resources and research tasks, a ``final'' complete alignment does not seem achievable at all. Therefore also the focus shall be on ``soft'' dynamic mapping, i.e. 
to enable the users to adapt the mapping or apply different mappings depending on their current task or research question essentially being able to actively influence the recall/precision ratio of the search results. 90 91 \begin{note} 92 A special focus will be put on the examination of the feasibility of employing ontology mapping and alignment techniques and tools for the creation of the mappings. 93 94 especially the application of techniques from ontology mapping to the domain-specific data collection (the domain of LRT). 95 \end{note} 96 89 97 90 Serving the second subgoal, semantic interpretation on the instance level, we will propose the expression of all of the domain data (from meta-model specification to instances) in RDF, linking to corresponding entities in appropriate external 98 91 semantic resources (controlled vocabularies, ontologies). 99 Once the dataset is expressed in RDF, it can be exposed via a semantic web application and publicized as another nucleus of \emph{Linked Open Data} in the global \emph{Web Of Data}. 92 Once the dataset is expressed in RDF, it can be exposed via a semantic web application and published as another nucleus of \emph{Linked Open Data} in the global \emph{Web Of Data}. 100 93 101 A separate usability evaluation of the semantic search is indicated examining the user interaction with and display of the relevant additional information in the user search interface, however this issue can only be tackled marginally and will have to be outsourced into future work. 94 A separate usability evaluation of the semantic search is indicated, examining the user interaction with and display of the relevant additional information in the user search interface, however this issue can only be tackled marginally and will have to be outsourced into future work. 
102 95 103 96 \section{Expected Results} 104 97 105 The main result of this work will be the \emph{specification} of the two modules \xne{concept-based search} and the underlying \texttt{crosswalk service}. 98 The main result of this work will be the \emph{specification} of the two modules \xne{concept-based search} and the underlying \xne{crosswalk service}. 106 99 This theoretical part will be accompanied by a proof-of-concept \emph{implementation} of the components 107 100 and the results and findings of the \emph{evaluation}. 108 101 109 Another result of the work will be the original dataset expressed as RDF interlinked with existing external resources (ontologies, knowledge bases, vocabularies), effectively laying a foundation for providing this dataset as \emph{Linked Open Data}\furl{http://linkeddata.org/} in the \emph{Web of Data}. 102 Another result of the work will be the original dataset expressed as RDF interlinked with existing external resources (ontologies, knowledge bases, vocabularies), effectively laying a foundation for providing this dataset as \emph{Linked Open Data}\furl{http://linkeddata.org/}. 
110 103 111 104 \begin{description} 112 \item [Crosswalk service] specification and proof of basic implementation of the module 113 \item [Concept-based search] design of the query expansion and integration with search engines 114 \item [Visualization] design of an application for interactive exploration of the concerned dataset 115 \item [Evaluation] evaluation results of querying the dataset comparing traditional search and semantic search 105 \item [Crosswalk service] specification and a basic implementation of the service 106 \item [Concept-based search] design of the query expansion and prototypical integration with search engines 107 \item [Visualization tool] design of an application for interactive exploration of the concerned dataset 108 \item [Evaluation] evaluation results of querying the dataset comparing simple search and semantic search 116 109 \item [LinkedData] translation of the source dataset to RDF-based format with links into existing datasets, ontologies, knowledge bases 117 110 \end{description} … … 120 113 The work starts with examining the state of the art work in the two fields language resources and technology and semantic web technologies in chapter \ref{ch:lit}, followed by administrative chapter \ref{ch:def} explaining the terms and abbreviations used in the work. 121 114 122 In chapter \ref{ch:data} we analyze the situation in the data domain of LRT metadata and in chapter \ref{ch:infra} we discuss the individual software components/modules/services of the infrastructure underlying this work. 115 In chapter \ref{ch:data} we analyze the situation in the data domain of LRT metadata and in chapter \ref{ch:infra} we discuss the individual software components of the infrastructure underlying this work. 
123 116 124 The main part of the work is found in chapters \ref{ch:design} and \ref{ch:design-instance} laying out the design of the software module , the proposal how to modell the data in RDF respectively.117 The main part of the work is found in chapters \ref{ch:design} and \ref{ch:design-instance} laying out the design of the software module and a proposal how to model the data in RDF respectively. 125 118 126 119 The evaluation and the results are discussed in chapter \ref{ch:results}. Finally, in chapter \ref{ch:conclusions} we summarize the findings of the work and lay out where it could develop in the future. … … 128 121 \section{Keywords} 129 122 130 Metadata interoperability, Ontology Mapping, Schema mapping, Crosswalk, LinkedData 131 Fuzzy Search, Visual Search? 132 133 Language Resources and Technology, LRT/NLP/HLT 134 135 Ontology Visualization 123 semantic interoperability -- crosswalks -- schema mapping -- metadata -- language resources and technology -- linked data -- visualization