Changeset 2696 for SMC4LRT


Timestamp:
03/13/13 18:06:34 (11 years ago)
Author:
vronk
Message:

added notes from Menzo about DCR CLAVAS interaction

Location:
SMC4LRT/chapters
Files:
2 edited

  • SMC4LRT/chapters/Infrastructure.tex

    r2672 r2696  
    1 \chapter{Underlying infrastructure}\label{ch:components}
    2 
    3 As stated before, the proposed module is part of CMDI and depends on multiple modules of the infrastructure. Before we describe the interaction itself in chapter \ref{method}, we introduce in short these modules and the data they provide:
     1\chapter{Underlying infrastructure}
     2\label{ch:components}
     3
     4
     5\section{CLARIN / CMDI}
     6
     7CLARIN -- Common Language Resources and Technology Infrastructure -- is constituted by over 180 members from around 38 countries. The mission of this project is to
     8
     9\begin{quotation}
     10    create a research infrastructure that makes language resources and technologies (LRT) available to scholars of all disciplines, especially SSH large-scale pan-European collaborative effort to create, coordinate and make language resources and technology available and readily useable.
     11\end{quotation}
     12
     13The infrastructure foresees a federated network of centers (with federated identity management) that provide resources and services in an agreed-upon, consistent and standardized manner. The foundation for this goal shall be the Component Metadata Infrastructure (CMDI), a model that caters for flexible metadata profiles and thereby allows existing schemas to be accommodated.
     14
     15
     16
     17As stated before, the SMC is part of CMDI and depends on multiple modules of the infrastructure. Before we describe the interaction itself in chapter \ref{method}, we briefly introduce these modules and the data they provide:
    418
    519\begin{itemize}
     
    2034\end{figure*}
    2135
    22 \subsection{CMDI - Production side}
     36\subsection{CMDI - DCR/CR/RR}
     37\label{dcr}
    2338
    2439The \emph{Data Category Registry} (DCR) is a central registry that enables the community to collectively define and maintain a set of relevant linguistic data categories. The resulting commonly agreed controlled vocabulary is the cornerstone for grounding the semantic interpretation within the CMD framework.
     
    2641%Next to a web interface for users to browse and manage the data categories, DCR provides a REST-style webservice allowing applications to access the information (provided in Data Category Interchange Format - DCIF). The data categories are assigned a persistent identifier, making them globally and permanently referenceable.
    2742
    28 The \emph{Component Metadata Framework} (CMD) is built on top of the DCR and complements it. While the DCR defines the atomic concepts, within CMD the metadata schemas can be constructed out of reusable components - collections of metadata fields. The components can contain other components, and they can be reused in multiple profiles as long as each field "refers via a PID to exactly one data category in the ISO DCR, thus indicating unambiguously how the content of the field in a metadata description should be interpreted" \cite{Broeder+2010}. This allows to trivially infer equivalencies between metadata fields in different CMD-based schemas. While the primary registry used in CMD is the ISOcat DCR, other authoritative sources for data categories ("trusted registries") are accepted, especially Dublin Core Metadata Initiative \cite{DCMI:2005}.
     43The \emph{Component Metadata Framework} (CMD) is built on top of the DCR and complements it. While the DCR defines the atomic concepts, within CMD the metadata schemas can be constructed out of reusable components - collections of metadata fields. The components can contain other components, and they can be reused in multiple profiles as long as each field ``refers via a PID to exactly one data category in the ISO DCR, thus indicating unambiguously how the content of the field in a metadata description should be interpreted'' \cite{Broeder+2010}. This makes it trivial to infer equivalencies between metadata fields in different CMD-based schemas. While the primary registry used in CMD is the ISOcat DCR, other authoritative sources for data categories (``trusted registries'') are accepted, especially the Dublin Core Metadata Initiative \cite{DCMI:2005}.
    2944% \emph{Component Registry} implements the Component Data Model and allows to define, maintain and publish CMD-components and -profiles.
    3045
     
    4459!Cf. Erhard Hinrichs 2009
    4560
    46 And a last relevant intiative to mention is that of a \texttt{Vocabulary Alignment Service} being developed and run within the Dutch program CATCH\footnote{\textit{Continuous Access To Cultural Heritage} - \url{http://www.catchplus.nl/en/}}, which serves as a neutral manager and provider of controlled vocabularies. There are plans to reuse or enhance this service for the needs of the CLARIN project.
    47 
    4861\noindent
    4962All these components are running services, that this work shall directly build upon.
     
    5366
    5467Consequently, the infrastructure also foresees a dedicated module, \emph{Semantic Mapping}, that exploits this novel mechanism to deliver correspondences between different metadata schemas. The details of its functioning and its interaction with the aforementioned modules is described in the following chapter \ref{method}.
     68
     69\subsection{Vocabulary Service / Reference Data Registry}
     70
     71\subsubsection{Motivation \& related activities in the community}
     72The urgent need for reliable community-shared registry services for concepts, controlled vocabularies and reference data for both the LRT and Digital Humanities community has been discussed on many occasions in various contexts. Applications and tasks requiring or profiting from this kind of service comprise Data-Enrichment / Annotation, Metadata Generation, Curation, Data Analysis, etc. As there is a substantial overlap in the vocabularies relevant for the various communities and even more so a high potential for reusability on the technical level, there is a strong case for tight cooperation between different initiatives.
     73
     74In the context of the CLARIN initiative, one activity to tackle this issue -- mainly driven by CLARIN-NL -- is the project/taskforce \emph{CLAVAS - Vocabulary Alignment Service for CLARIN}, where the plan is to reuse and enhance, for CLARIN needs, the SKOS-based vocabulary repository and editor OpenSKOS\furl{http://openskos.org}, developed and run within the Dutch program CATCHplus\footnote{\textit{Continuous Access To Cultural Heritage} - \url{http://www.catchplus.nl/en/}}. See below for a more detailed description of this system. As of spring 2013, the Standing Committee on CLARIN Technical Centers (SCCTC) adopted the issue of Controlled Vocabularies and Concept Registries as one of the infrastructural (A-center) services to be dealt with.
     75
     76In parallel, within the sister ESFRI project DARIAH, a taskforce with the same goal has been set up: \emph{Service for Reference Data and Controlled Vocabularies}. This taskforce was introduced at the 2nd VCC Meeting in Vienna in November 2012. It is conceived as a collaborative endeavor between VCC1/Task 5: Data federation and interoperability and VCC3/Task 3: Reference Data Registries (and external partners). The main goal is to \emph{establish a service providing controlled vocabularies and reference data} for the DARIAH (and CLARIN) community.
     77
     78Regarding the responsibilities of the DARIAH working groups:
     79VCC3/Task 3 identifies and recommends vocabularies relevant for the community. VCC1/Task 5 provides basic, generic services relevant for the whole community. In particular, the Schema Registry, which allows mappings between different schemas to be expressed, seems to be one starting point. In accordance with the VCC1 strategy, the taskforce concentrates on pooling existing resources and implements only the ``glue'' necessary to put the pieces together (data conversion, service wrappers, etc.).
     80
     81Thus there is momentum and high potential for a collaborative approach in at least these two big initiatives, CLARIN and DARIAH, which serve a widespread and diverse community.
     82
     83\subsubsection{Abstract service description}
     84The service itself is primarily meant to serve other applications rather than to be used directly by end users, but a basic user interface is still necessary for administration etc. By using global semantic identifiers instead of strings, such a service enables the harmonization of metadata descriptions and annotations and is an indispensable step towards semantic data and \xne{LOD}.
     85Besides providing vocabularies, the service should also hold and expose equivalencies (and other relationships) between concepts from different vocabularies (concept schemes). These relationships come primarily from existing mappings, but can (and hopefully will) subsequently be created manually for specific subsets on demand in a community process. An example of equivalencies from Wikipedia\footnote{\href{http://de.wikipedia.org/wiki/Johann_Wolfgang_von_Goethe}{page for J. W. Goethe}}:
     86\begin{verbatim}
     87GND: 118540238 | LCCN: n79003362 | NDL: 00441109 | VIAF: 24602065 | Wikipedia-Personensuche
     88\end{verbatim}
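A minimal sketch, in Python, of how a client application might turn such a line of equivalent identifiers into a lookup table keyed by authority file. The function name \code{parse\_equivalences} and the decision to skip entries that carry no identifier are assumptions made for illustration; they are not part of any CLARIN, DARIAH or OpenSKOS interface.

```python
def parse_equivalences(line):
    """Split 'AUTH: id | AUTH: id | ...' into a dict {authority: identifier}."""
    result = {}
    for part in line.split("|"):
        part = part.strip()
        if ":" not in part:
            # Entries without an identifier (e.g. a bare link label) are skipped.
            continue
        authority, identifier = part.split(":", 1)
        result[authority.strip()] = identifier.strip()
    return result


# The identifier line quoted above for J. W. Goethe.
goethe = parse_equivalences(
    "GND: 118540238 | LCCN: n79003362 | NDL: 00441109 | "
    "VIAF: 24602065 | Wikipedia-Personensuche"
)
```

Each key/value pair in the resulting table identifies the same entity in a different authority file, which is the kind of equivalence the service is meant to expose.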
     89
     90\subsubsection{Vocabulary Service - CLAVAS}
     91As described in the previous section (\ref{dcr}), a solid pillar for defining and maintaining data categories is the ISOcat data category registry. However, while ISOcat has been in productive use for some time, it is -- by design -- not usable for all kinds of reference data. In general, it is well suited for defining concepts/data categories (with closed or open concept domains), but its complex data model and standardization workflow do not lend themselves well to maintaining ``semi-closed'' concept domains, i.e. controlled vocabularies such as lists of entities (e.g. organizations or authors). In such cases, the concept domain is not closed (new entities need to be added), but it is also not open (not every string is a valid entity). Besides, the domain may be very large (millions of entities) and has to be presumed to be changing (especially by new entities being added).
     92
     93This shortcoming leads to a need for an additional registry/repository service for this kind of data (controlled vocabularies). Within the CLARIN project, it is mainly the above-mentioned taskforce \emph{CLAVAS} that is concerned with this challenge.
     94The foundation is the vocabulary repository and editor OpenSKOS\furl{http://openskos.org}.
     95
     96This repository can serve as a project independent manager and provider of controlled vocabularies.
     97One important feature of the OpenSKOS system is its distributed nature. It allows individual instances to synchronize the maintained vocabularies among each other via the OAI-PMH protocol. This yields a reliable, redundant system, as multiple instances provide identical synchronized data, while the primary responsibility for individual vocabularies can lie with different instances/organizations according to their specialization or field of expertise.
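A minimal sketch, in Python, of the kind of OAI-PMH request one instance could send to another in order to fetch records changed since the last synchronization. The arguments \code{verb}, \code{metadataPrefix}, \code{from} and \code{set} are defined by the OAI-PMH protocol. The base URL is hypothetical, and which metadata formats an OpenSKOS instance actually offers is not stated here, so \code{oai\_dc} (the format every OAI-PMH repository has to support) is used as a placeholder.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL for selective harvesting."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Only records created or changed since this date are returned.
        params["from"] = from_date
    if set_spec:
        # Optionally restrict the harvest to one set of records.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint of a partner instance.
url = list_records_url("http://openskos.example.org/oai-pmh", from_date="2013-03-01")
```

Repeating such a request periodically with an updated \code{from} date is one simple way to keep a mirror of a vocabulary current.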
     98
     99Currently, the Meertens Institute\furl{http://meertens.knaw.nl/} of the Dutch Royal Academy of Sciences (KNAW) and the Netherlands Institute for Sound and Vision\furl{http://www.beeldengeluid.nl/} are each running an instance of OpenSKOS.
     100As the work on this vocabulary repository started in the context of a cultural heritage program, it originally served vocabularies not directly relevant for the LRT community, such as \emph{GTAA - Gemeenschappelijke Thesaurus Audiovisuele Archieven} or \emph{AAT - Art \& Architecture Thesaurus}\furl{http://openskos.org/api/collections}. As part of the adaptation to the needs of CLARIN and the LRT community, data categories from \xne{ISOcat} have been converted into SKOS format and ingested into the system.
     101\xne{CLARIN Centre Vienna} is also running a prototypical instance of the OpenSKOS system with ISOcat data.
     102
     103A plan has been adopted to support further vocabularies relevant for the community.
     104The following are to be handled in the short term, in order of priority:
     105\begin{itemize}
     106\item the list of language codes\todo{url: ISO-639}
     107\item country codes
     108\item organization names for the domain of language resources
     109\end{itemize}
     110
     111See \ref{refdata} for a more complete list of required reference data together with candidate existing vocabularies
     112and \ref{dcr-skos} for discussion on mapping the information about data categories from ISOcat to \xne{SKOS}.
     113
     114\subsection{Interaction between DCR, VAS and client applications}
     115
     116
     117In my view you do that in ISOcat by binding the constrained DC to the
     118CLAVAS vocabulary, e.g., the constrained domain of /language ID/ (DC-2482)
     119could look as follows:
     120
     121I think there is no need to express the relationship between this constrained DC
     122and the vocabulary in CLAVAS itself. Many DCs (or any other application
     123using CLAVAS) can refer to the same CLAVAS vocabulary.
     124
     125
     126See above for my reasoning. I don't think this information needs to be in
     127CLAVAS.
     128I do think that ISOcat, CLAVAS, RELcat, an actual language
     129resource all provide a part of the semantic network.
     130
     131And if you can express these all in RDF, which we can for almost all of them (maybe
     132except the actual language resource ... unless it has a schema adorned
     133with ISOcat DC references ... < insert a SCHEMAcat plug ;-) >, but for
     134metadata we have that in the CMDI profiles ...) you could load all the
     135relevant parts in a triple store and do your SPARQL/reasoning on it. Well
     136that's where I'm ultimately heading with all these registries related to
     137semantic interoperability ... I hope ;-)
     138
     139
     140Maybe I should add to this that I clearly see ISOcat as a user of CLAVAS,
     141i.e., for constrained DCs.
     142
     143However, ISOcat as a provider of vocabularies
     144is less clear to me. Many of the value domains are small and CLAVAS is
     145overkill.
     146
     147Where the value domains are big (ISO 639-3) or can only be
     148partially enumerated (organization names) ISOcat can't/shouldn't contain
     149the value domains but just refer to CLAVAS, i.e., ISOcat wouldn't be a
     150provider.
     151Still there are some closed DCs which might be good vocabulary
     152providers, e.g., /linguistic subject/ (DC-2527/), and still also need to
     153stay in ISOcat. I think at some point we should create a smaller set of
     154metadata DCs to be harvested by CLAVAS. Hennie and I discussed this also
     155somewhere last year ... I'll be at the Meertens on Thursday, maybe we can
     156talk it over once more.
     157
     158
     159>>
     160
     161I guess the discussion is about two different things:
     162- how to specify that the range of some metadata property consists of Concepts from a specific ConceptScheme
     163-> this can not be done in SKOS, but external schema definitions could refer to the URI of some (CLAVAS/OpenSKOS) ConceptScheme
     164- how to specify relations between Concepts that are in different ConceptSchemes
     165-> this can be done in SKOS using skos:exactMatch, skos:closeMatch, skos:broadMatch, skos:narrowMatch and skos:relatedMatch. OpenSKOS supports adding and searching these properties already, and the OpenSKOS editor also already has support for it.
     166
     167> - define them in a new clavas namespace and add the properties as a specialization to OpenSKOS, you consider them part of the vocabulary definition then
     168> --> is a bit against the OpenSKOS 'philosophy' that OpenSKOS is a platform for SKOS, by definition.
     169> - add them to your metadata schema or profile, you consider them as constraints on vocabulary usage for a given metadata field
     170> --> this would be my preference
     171> - add them to a definition in ISOcat, and let your metadata schema refer to ISOcat instead of OpenSKOS. ISOcat extends the OpenSKOS definition then.
     172> --> leads to mixing of ISOcat and OpenSKOS, in semantic and technical ways. Not my preference.
     173
     174In what I propose ISOcat constrained DCs can refer to a CLAVAS vocabulary as a way to constrain (we stretch this a bit if a vocabulary is 'open', e.g., like organization names where it provides the preferred spelling of known organizations but you still have to be able to add new organization names). In ISOcat such constraints have the same status as, for example, the data type, which is that ISOcat just provides hints; it has no way to enforce this. Look at CMDI where the CMDI elements refer to an ISOcat DC via a concept link but they may have a completely different data type. In an ideal world the Component Editor would take over the data type and the CLAVAS vocabulary from the linked DC specification. This way the reference to the CLAVAS vocabulary ends up in the CMD component/profile specification and the derived XSD, and can be used by tools that support CLAVAS, e.g., Arbil (well, it's in the planning).
     175
     176So although ISOcat refers to CLAVAS as a hint, the metadata schema is the final one that has the real CLAVAS vocabulary reference, i.e., no reference to CLAVAS via ISOcat. Hennie, I think that still meets your preference and prevents unwanted mixing.
     177
    55178
    56179
     
    72195and \emph{Metadata Service} that provides search access to this body of data. As such, Metadata Service is the primary application to use Semantic Mapping, to optionally expand user queries before issuing a search in the Metadata Repository. \cite{Durco2011}
    73196
     197
     198\section{Content Repositories}
     199Metadata is only one aspect of the availability of resources. It is the first step to announce and describe the resources. However it is of little value, if the resources themselves are not equally well accessible. Thus another pillar of the CLARIN infrastructure are Content Repositories - centres to ensure availability of resources.
     200
     201The requirements for these repositories: PIDs, CMD, OAI-PMH
     202\todo{cite: center-B paper}
     203
     204\section{Distributed system - federated search}
     205
     206Metadata -> harvesting via OAI-PMH
     207but Content search has to be really distributed.
     208
     209?
     210\begin{description}
     211\item[Z39.50/SRU/SRW/CQL] LoC
     212\item[OAI-PMH]
     213\end{description}
  • SMC4LRT/chapters/SMC.tex

    r2672 r2696  
     1
    12\chapter{Semantic Mapping Component}
    23
    34
    4 
    5 \section{?? DataModel}
     5\section{Data Model}
    66
    77Terms ?
     
    1111
    1212
    13 \section{Semantic Mapping on concept level}
    14 
    15 merging the pieces of information provided by those,
    16 offering them semi-transaprently to the user (or application) on the consumption side.
    17 
    18 a module of the Component Metadata Infrastructure performing semantic mapping on search indexes. This  builds the base for query expansion to facilitate semantic search and enhance recall when querying the Metadata Repository.
     13\subsection{CMD namespace}
     14Describe the CMD-format?
     15
     16
     17\subsection{DCR in SKOS}
     18\label{dcr-skos}
     19Describe the mapping from DCR into SKOS
     20
     21DCR recognizes following types of data categories:
     22simple, complex: closed, open, constrained, (container)?
     23
     24\begin{figure*}[!ht]
     25\begin{center}
     26\includegraphics[width=0.7\textwidth]{images/dc_types}
     27\end{center}
     28\caption{Data Category types}
     29\end{figure*}
     30\todo{cite:  ISOcat introduction at CLARIN-NL Workshop}
     31
     32The export to CLAVAS-SKOS only considers closed and simple DCs from the metadata profile.
     33A closed DC maps to a concept scheme and a simple DC to a SKOS concept in such a concept scheme.
     34However, it has yet to be assessed how useful this approach is. In the metadata profile
     35there are many closed DCs with small value domains. How useful are those
     36in CLAVAS?
     37Originally, the vocabulary repository has been conceived to manage rather large and complex value domains,
     38that do not fit easily in the DCR data-model.
     39Therefore a threshold seems sensible, where only value domains with more
     40than 20, 50 or 100 values are exported.
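A minimal sketch, in Python, of the export rule described above: a closed DC becomes a concept scheme, its simple DCs become the concepts of that scheme, and value domains at or below the threshold are left out. The input structure (dictionaries with the keys \code{id}, \code{type} and \code{values}) and the sample identifiers are invented for illustration and do not reflect the actual DCIF export.

```python
def export_closed_dcs(data_categories, threshold=20):
    """Map each sufficiently large closed DC to a concept scheme.

    Returns {closed DC id: [simple DC ids acting as concepts]}.
    """
    schemes = {}
    for dc in data_categories:
        if dc["type"] != "closed":
            # Open and constrained DCs contribute nothing to a vocabulary.
            continue
        values = dc.get("values", [])
        if len(values) <= threshold:
            # Small value domains stay in the DCR only.
            continue
        schemes[dc["id"]] = list(values)
    return schemes


# Hypothetical sample data; the identifiers are placeholders.
sample = [
    {"id": "DC-A", "type": "closed", "values": ["v1", "v2", "v3"]},
    {"id": "DC-B", "type": "closed", "values": ["v1"]},
    {"id": "DC-C", "type": "open"},
]
exported = export_closed_dcs(sample, threshold=2)
```

Keeping the threshold a parameter makes it easy to compare how many concept schemes would result for 20, 50 or 100 values before settling on one.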
     41
     42Open or constrained DCs are not exported as they don't provide anything to a vocabulary. \todo{cite: Menzo2013-03-12 mail}
     43However, they can become users of a CLAVAS vocabulary. Actually, providing vocabularies for constrained but large and complex conceptual domains is the main motivation for the vocabulary repository.
     44
     45Currently (before integration of VAS and DCR), the only possibility to constrain the value domain of a data category
     46is by the means that XML Schema provides, such as a regular expression. So for the data category \concept{languageID DC-2482}
     47the rule looks like:
     48\lstset{language=XML}
     49\begin{lstlisting}
     50        <dcif:conceptualDomain type="constrained">
     51                <dcif:dataType>string</dcif:dataType>
     52                <dcif:ruleType>XML Schema regular expression</dcif:ruleType>
     53                <dcif:rule>[a-z]{3}</dcif:rule>
     54        </dcif:conceptualDomain>
     55\end{lstlisting}
     56
     57A current proposal by Windhouwer\todo{cite: Menzo2013-03-12 mail} for integration with CLAVAS foresees the following extension:
     58
     59\begin{lstlisting}
     60        <clavas:vocabulary href="http://my.openskos.org/vocab/ISO-639" type="closed"/>
     61\end{lstlisting}
     62
     63\code{@href} points to the vocabulary. Actually a PID should be used in the context
     64of ISOcat, but it is not clear how persistent the vocabularies are. This may pose a problem, as part of the DC specification may now have a different persistency than the core.
     65
     66\code{@type} could be \code{closed} or \code{open}. \code{closed}: only values in the vocabulary are
     67valid. \code{open}: the values in the vocabulary are hints/preferred values. Basically the DC itself is then open.
     68
     69This would yield a definition of the conceptualDomain for the data category as follows:
     70 
     71\lstset{language=XML}
     72\begin{lstlisting}
     73  <dcif:conceptualDomain type="constrained">
     74     <dcif:dataType>string</dcif:dataType>
     75     <dcif:ruleType>XML Schema regular expression</dcif:ruleType>
     76     <dcif:rule>[a-z]{3}</dcif:rule>
     77  </dcif:conceptualDomain>
     78  <dcif:conceptualDomain type="constrained">
     79     <dcif:dataType>string</dcif:dataType>
     80     <dcif:ruleType>CLAVAS vocabulary</dcif:ruleType>
     81      <dcif:rule>
     82         <clavas:vocabulary href="http://my.openskos.org/vocab/ISO-639" type="closed"/>
     83      </dcif:rule>
     84  </dcif:conceptualDomain>
     85\end{lstlisting}
     86
     87I.e. the new rule pointing to the vocabulary would be \emph{added}, so that tools that don't support CLAVAS
     88lookup but are capable of XSD/RNG validation, can still use the regular expression based definition.
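A minimal sketch, in Python, of how a client tool might apply the two rules: if it has the referenced vocabulary at hand, a \code{closed} vocabulary decides validity; otherwise the tool falls back to the regular expression. The \code{dcif} namespace URI is assumed, the \code{clavas} namespace URI, the wrapper element \code{spec} and the function names are hypothetical, and treating an \code{open} vocabulary as a mere hint that defers to the regular expression is one possible reading of the proposal.

```python
import re
import xml.etree.ElementTree as ET

DCIF_NS = "http://www.isocat.org/ns/dcif"   # assumed namespace URI
CLAVAS_NS = "http://example.org/ns/clavas"  # hypothetical namespace URI

# The two conceptual domains from the listing, wrapped in a hypothetical root.
SPEC = """
<spec xmlns:dcif="http://www.isocat.org/ns/dcif"
      xmlns:clavas="http://example.org/ns/clavas">
  <dcif:conceptualDomain type="constrained">
    <dcif:dataType>string</dcif:dataType>
    <dcif:ruleType>XML Schema regular expression</dcif:ruleType>
    <dcif:rule>[a-z]{3}</dcif:rule>
  </dcif:conceptualDomain>
  <dcif:conceptualDomain type="constrained">
    <dcif:dataType>string</dcif:dataType>
    <dcif:ruleType>CLAVAS vocabulary</dcif:ruleType>
    <dcif:rule>
      <clavas:vocabulary href="http://my.openskos.org/vocab/ISO-639" type="closed"/>
    </dcif:rule>
  </dcif:conceptualDomain>
</spec>
"""


def read_rules(xml_text):
    """Collect the regular expression and the vocabulary reference, if present."""
    rules = {}
    root = ET.fromstring(xml_text)
    for domain in root.iter("{%s}conceptualDomain" % DCIF_NS):
        rule = domain.find("{%s}rule" % DCIF_NS)
        if rule is None:
            continue
        vocab = rule.find("{%s}vocabulary" % CLAVAS_NS)
        if vocab is not None:
            rules["vocabulary"] = (vocab.get("href"), vocab.get("type"))
        else:
            rules["regex"] = (rule.text or "").strip()
    return rules


def is_valid(value, rules, vocabularies=None):
    """Validate against the vocabulary when available, else against the regex."""
    if vocabularies is not None and "vocabulary" in rules:
        href, vocab_type = rules["vocabulary"]
        known = vocabularies.get(href)
        if known is not None and vocab_type == "closed":
            return value in known
        # 'open' vocabularies only give preferred values: fall through.
    pattern = rules.get("regex")
    return pattern is None or re.fullmatch(pattern, value) is not None


rules = read_rules(SPEC)
# Tiny stand-in for the vocabulary content a CLAVAS-aware tool would look up.
vocabularies = {"http://my.openskos.org/vocab/ISO-639": {"deu", "eng", "nld"}}
```

A tool without CLAVAS support calls \code{is\_valid(value, rules)} and accepts any three lowercase letters, whereas a CLAVAS-aware tool passes \code{vocabularies} and accepts only values that are listed.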
     89
     90 
     91\begin{note}
     92
     93\noindent
     94something similar for the link to an EBNF grammar in SCHEMAcat:
     95
     96%\begin{lstlisting}
     97\begin{verbatim}
     98      <scr:valueSchema
     99               xmlns:scr="http://www.isocat.org/ns/scr"
     100               pid="http://hdl.handle.net/1839/00-SCHM-0000-0000-004A-A"
     101               type="ISO 14977:1996 EBNF"/>
     102\end{verbatim}
     103%\end{lstlisting}
     104\end{note}
     105
    19106
    20107
     
    28115
    29116We distinguish two types of smcIndexes: (i) \emph{dcrIndex} referring to data categories and (ii) \emph{cmdIndex} denoting a specific
    30 "CMD-entity", i.e. a metadata field, component or whole profile defined within CMD. The \textit{cmdIndex} can be interpreted as a XPath into the instances of CMD-profiles. In contrast to it, the \textit{dcrIndexes} are generally not directly applicable on existing data, but can be understood as abstract indexes referring to well-defined concepts -- the data categories -- and for actual search they need to be resolved to the metadata fields they are referred by. In return one can expect to match more metadata fields from multiple profiles, all referring to the same data category.
     117``CMD-entity'', i.e. a metadata field, component or whole profile defined within CMD. The \textit{cmdIndex} can be interpreted as an XPath into the instances of CMD-profiles. In contrast, the \textit{dcrIndexes} are generally not directly applicable to existing data, but can be understood as abstract indexes referring to well-defined concepts -- the data categories -- and for actual search they need to be resolved to the metadata fields that refer to them. In return one can expect to match more metadata fields from multiple profiles, all referring to the same data category.
    31118
    32119These two types of smcIndex also follow different construction patterns:
     
    63150
    64151
    65 \subsection{Function}\label{method}
     152\subsection{Query language}
     153CQL?
     154
     155
     156\section{Semantic Mapping on concept level}
     157
     158merging the pieces of information provided by those,
     159offering them semi-transparently to the user (or application) on the consumption side.
     160
     161a module of the Component Metadata Infrastructure performing semantic mapping on search indexes. This  builds the base for query expansion to facilitate semantic search and enhance recall when querying the Metadata Repository.
     162
     163
    66164In this section, we describe the actual task of the proposed application -- \textbf{mapping indexes to indexes} -- in abstract terms. The returned mappings can be used by other applications to expand or translate the original user query, to match elements in other schemas.
    67165\footnote{Though tightly related, mapping of terms and query expansion are to be seen as two separate functions.}
    68166% \footnote{This primary usage of SMC for work with user-created query strings explains the need for human-readability of the indices.}
    69167
    70 
    71 \subsubsection*{Initialization}
    72 
    73 First there is an initialization phase, in which the application fetches the information from the source modules (cf. \ref{components}). All profiles and components from the Component Registry are read and all the URIs to data categories are extracted to construct an inverted map of data categories:
    74 \newline
    75 
    76 \textit{datcatURI $\mapsto$ profile.component.element[]}
    77 \newline
    78 
    79 The collected data categories are enriched with information from corresponding registries (DCRs), adding the verbose identifier, the description and available translations into other working languages. %, usable as base for multi-lingual search user-interface.
    80 
    81 Finally relation sets defined in the Relation Registry are fetched and matched with the data categories in the map to create sets of semantically equivalent (or otherwise related) data categories.
    82 
    83 \subsubsection*{Operation}
    84168In the operation mode, the application accepts any index (\textit{smcIndex}, cf. \ref{indexes}) and returns a list of corresponding indexes (or only the input index, if no correspondences were found):
    85169\newline
     
    117201\newline
    118202
    119 (3) \emph{container data categories} -- further expansions will be possible once the container data categories \cite{SchuurmanWindhouwer2011} will be used. Currently only fields (leaf nodes) in metadata descriptions are linked to data categories. However, at times, there is a need to conceptually bind also the components, meaning that besides the "atomic" data category for \texttt{actorName, there would be also a data category for the complex concept \texttt{Actor}.}
     203(3) \emph{container data categories} -- further expansions will be possible once the container data categories \cite{SchuurmanWindhouwer2011} are in use. Currently only fields (leaf nodes) in metadata descriptions are linked to data categories. However, at times, there is a need to conceptually bind the components as well, meaning that besides the ``atomic'' data category for \texttt{actorName}, there would also be a data category for the complex concept \texttt{Actor}.
    120204Having concept links also on components will require a compositional approach to the task of semantic mapping, resulting in:
    121205\newline
     
    141225\subsection{Mapping from strings to Entities}
    142226
    143 Based on the textual values in the Metadata-descriptions find matching entities in selected Ontologies.
     227Find matching entities in selected Ontologies based on the textual values in the metadata records.
     228
    144229
    145230Identify related ontologies:
     
    162247
    163248
    164 \subsection{Semantic Search}
    165 
    166 Main purpose for the undertaking described in previous two chapters (mapping of concepts and entities) is to enhance the search capabilities of the MDService serving the Metadata/Resources-data. Namely to enhance it by employing ontological resources.
    167 Mainly this enhancement shall mean, that the user can access the data indirectly by browsing one or multiple  ontologies,
    168 with which the data will then be linked. These could be for example ontologies of Organizations and Projects.
    169 
    170 In this section we want to explore, how this shall be accomplished, ie how to bring the enhanced capabilities to the user.
    171 Crucial aspect is the question how to deal with the even greater amount of information in a user-friendly way, ie how to prevent overwhelming, intimidating or frustrating the user.
    172 
    173 Semi-transparently means, that primarily the semantic mapping shall integrate seamlessly in the interaction with the service, but it shall "explain" - offer enough information - on demand, for the user to understand its role and also being able manipulate easily.
    174 
    175 ?
    176 Facets
    177 Controlled Vocabularies
    178 Synonym Expansion (via TermExtraction(ContentSet))
    179 
    180 \section{Linked Data - Express dataset in RDF}
     249\subsection{Linked Data - Express dataset in RDF}
    181250
    182251Partly as by-product of the entities-mapping effort we will get the metadata-description rendered in RDF, linked with
    183 So theoretically we then only need to provide them "on the web", to make them a nucleus of the LinkedData-Cloud.
    184 
    185 Practically this won't be that straight-forward as the mapping to entities will be a hell of a work.
    186 But once that is solved, or for the subsets that it is solved, the publication of that data on the "SemanticWeb" should be easy.
     252So theoretically we then only need to provide them ``on the web'', to make them a nucleus of the LinkedData-Cloud.
     253
    187254
    188255Technical aspects (RDF-store?) / interface (ontology browser?)
     
    201268\end{figure*}
    202269
    203 \subsection{Content/Annotation}
     270
     271\section{Semantic Search}
     272
     273The main purpose of the undertaking described in the previous two chapters (mapping of concepts and entities) is to enhance the search capabilities of the MDService serving the Metadata/Resources-data, namely by employing ontological resources.
     274Mainly this enhancement shall mean, that the user can access the data indirectly by browsing one or multiple  ontologies,
     275with which the data will then be linked. These could be for example ontologies of Organizations and Projects.
     276
     277In this section we want to explore how this shall be accomplished, i.e. how to bring the enhanced capabilities to the user.
     278A crucial aspect is the question of how to deal with the even greater amount of information in a user-friendly way, i.e. how to avoid overwhelming, intimidating or frustrating the user.
     279
     280Semi-transparently means that the semantic mapping shall primarily integrate seamlessly into the interaction with the service, but that it shall ``explain'' itself on demand, i.e. offer enough information for the user to understand its role and to be able to manipulate it easily.
     281
     282?
     283Facets
     284Controlled Vocabularies
     285Synonym Expansion (via TermExtraction(ContentSet))
     286
     287\subsection{Query Expansion}
     288
     289
     290\section{Semantic Mapping in Metadata vs. Content/Annotation}
    204291AF + DCR + RR
    205292
    206293
    207 \subsection{Visualization}
    208 Landscape, Treemap, SOM
    209 
    210 Ontology Mapping and Alignement / saiks/Ontology4 4auf1.pdf
    211 
     294
     295