Timestamp:
03/13/13 18:06:34 (11 years ago)
Author:
vronk
Message:

added notes from Menzo about DCR CLAVAS interaction

File:
1 edited

  • SMC4LRT/chapters/SMC.tex

    r2672 r2696  
     1
    12\chapter{Semantic Mapping Component}
    23
    34
    4 
    5 \section{?? DataModel}
     5\section{Data Model}
    66
    77Terms ?
     
    1111
    1212
    13 \section{Semantic Mapping on concept level}
    14 
    15 merging the pieces of information provided by those,
    16 offering them semi-transaprently to the user (or application) on the consumption side.
    17 
    18 a module of the Component Metadata Infrastructure performing semantic mapping on search indexes. This  builds the base for query expansion to facilitate semantic search and enhance recall when querying the Metadata Repository.
     13\subsection{CMD namespace}
     14Describe the CMD-format?
     15
     16
     17\subsection{DCR in SKOS}
     18\label{dcr-skos}
     19Describe the mapping from DCR into SKOS
     20
     21DCR recognizes following types of data categories:
     22simple, complex: closed, open, constrained, (container)?
     23
     24\begin{figure*}[!ht]
     25\begin{center}
     26\includegraphics[width=0.7\textwidth]{images/dc_types}
     27\end{center}
     28\caption{Data Category types}
     29\end{figure*}
     30\todo{cite:  ISOcat introduction at CLARIN-NL Workshop}
     31
     32The export to CLAVAS-SKOS only considers closed and simple DCs from the metadata profile.
     33A closed DC maps to a concept scheme and a simple DC to a SKOS concept in such a concept scheme.
     34However, it still needs to be assessed how useful this approach is. In the metadata profile
     35there are many closed DCs with small value domains. How useful are those
     36in CLAVAS?
     37Originally, the vocabulary repository was conceived to manage rather large and complex value domains
     38that do not fit easily into the DCR data model.
     39Therefore a threshold seems sensible, where only value domains with more
     40than 20, 50 or 100 values are exported.
     41
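A minimal sketch of what such an export could look like in SKOS (RDF/XML) follows. It only illustrates the mapping described above: the URIs and labels are placeholders chosen here, not actual ISOcat or CLAVAS identifiers, and a real export may carry further properties.

\begin{lstlisting}
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <!-- closed DC: becomes the concept scheme (placeholder URI) -->
  <skos:ConceptScheme rdf:about="http://example.org/dc/closed-dc">
    <skos:prefLabel xml:lang="en">exampleClosedDC</skos:prefLabel>
  </skos:ConceptScheme>
  <!-- simple DC: one value of the closed DC's value domain (placeholder URI) -->
  <skos:Concept rdf:about="http://example.org/dc/simple-dc">
    <skos:prefLabel xml:lang="en">exampleValue</skos:prefLabel>
    <skos:inScheme rdf:resource="http://example.org/dc/closed-dc"/>
  </skos:Concept>
</rdf:RDF>
\end{lstlisting}
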
     42Open or constrained DCs are not exported as they don't provide anything to a vocabulary. \todo{cite: Menzo2013-03-12 mail}
     43However, they can become users of a CLAVAS vocabulary. Actually, providing vocabularies for constrained but large and complex conceptual domains is the main motivation for the vocabulary repository.
     44
     45Currently (before the integration of VAS and DCR), the only way to constrain the value domain of a data category
     46is by the means that XML Schema provides, such as a regular expression. For the data category \concept{languageID DC-2482},
     47the rule looks like this:
     48\lstset{language=XML}
     49\begin{lstlisting}
     50        <dcif:conceptualDomain type="constrained">
     51                <dcif:dataType>string</dcif:dataType>
     52                <dcif:ruleType>XML Schema regular expression</dcif:ruleType>
     53                <dcif:rule>[a-z]{3}</dcif:rule>
     54        </dcif:conceptualDomain>
     55\end{lstlisting}
     56
     57A current proposal by Windhouwer\todo{cite: Menzo2013-03-12 mail} for integration with CLAVAS foresees the following extension:
     58
     59\begin{lstlisting}
     60        <clavas:vocabulary href="http://my.openskos.org/vocab/ISO-639" type="closed"/>
     61\end{lstlisting}
     62
     63\code{@href} points to the vocabulary. Actually a PID should be used in the context
     64of ISOcat, but it is not clear how persistent the vocabularies are. This may pose a problem, as part of the DC specification may then have a different persistency than the core.
     65
     66\code{@type} could be \code{closed} or \code{open}. \code{closed}: only values in the vocabulary are
     67valid. \code{open}: the values in the vocabulary are hints/preferred values. Basically the DC itself is then open.
     68
     69This would yield a definition of the conceptualDomain for the data category as follows:
     70 
     71\lstset{language=XML}
     72\begin{lstlisting}
     73  <dcif:conceptualDomain type="constrained">
     74     <dcif:dataType>string</dcif:dataType>
     75     <dcif:ruleType>XML Schema regular expression</dcif:ruleType>
     76     <dcif:rule>[a-z]{3}</dcif:rule>
     77  </dcif:conceptualDomain>
     78  <dcif:conceptualDomain type="constrained">
     79     <dcif:dataType>string</dcif:dataType>
     80     <dcif:ruleType>CLAVAS vocabulary</dcif:ruleType>
     81      <dcif:rule>
     82         <clavas:vocabulary href="http://my.openskos.org/vocab/ISO-639" type="closed"/>
     83      </dcif:rule>
     84  </dcif:conceptualDomain>
     85\end{lstlisting}
     86
     87That is, the new rule pointing to the vocabulary would be \emph{added}, so that tools that do not support CLAVAS
     88lookup but are capable of XSD/RNG validation can still use the regular-expression-based definition.
     89
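For illustration, a tool limited to XSD validation could derive a simple type like the following from the regular-expression rule. This is only a sketch: the type name \code{languageID} is an assumption made here and is not prescribed by the data category specification.

\begin{lstlisting}
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- hypothetical type name; the pattern is taken from the dcif:rule above -->
  <xs:simpleType name="languageID">
    <xs:restriction base="xs:string">
      <xs:pattern value="[a-z]{3}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
\end{lstlisting}
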
     90 
     91\begin{note}
     92
     93\noindent
     94something similar for the link to an EBNF grammar in SCHEMAcat:
     95
     96%\begin{lstlisting}
     97\begin{verbatim}
     98      <scr:valueSchema
     99               xmlns:scr="http://www.isocat.org/ns/scr"
     100               pid="http://hdl.handle.net/1839/00-SCHM-0000-0000-004A-A"
     101               type="ISO 14977:1996 EBNF"/>
     102\end{verbatim}
     103%\end{lstlisting}
     104\end{note}
     105
    19106
    20107
     
    28115
    29116We distinguish two types of smcIndexes: (i) \emph{dcrIndex} referring to data categories and (ii) \emph{cmdIndex} denoting a specific
    30 "CMD-entity", i.e. a metadata field, component or whole profile defined within CMD. The \textit{cmdIndex} can be interpreted as a XPath into the instances of CMD-profiles. In contrast to it, the \textit{dcrIndexes} are generally not directly applicable on existing data, but can be understood as abstract indexes referring to well-defined concepts -- the data categories -- and for actual search they need to be resolved to the metadata fields they are referred by. In return one can expect to match more metadata fields from multiple profiles, all referring to the same data category.
     117``CMD-entity'', i.e. a metadata field, component or whole profile defined within CMD. The \textit{cmdIndex} can be interpreted as an XPath into the instances of CMD-profiles. In contrast, the \textit{dcrIndexes} are generally not directly applicable to existing data, but can be understood as abstract indexes referring to well-defined concepts -- the data categories -- and for actual search they need to be resolved to the metadata fields that refer to them. In return, one can expect to match more metadata fields from multiple profiles, all referring to the same data category.
    31118
    32119These two types of smcIndex also follow different construction patterns:
     
    63150
    64151
    65 \subsection{Function}\label{method}
     152\subsection{Query language}
     153CQL?
     154
     155
     156\section{Semantic Mapping on concept level}
     157
     158merging the pieces of information provided by those,
     159offering them semi-transparently to the user (or application) on the consumption side.
     160
     161a module of the Component Metadata Infrastructure performing semantic mapping on search indexes. This builds the base for query expansion to facilitate semantic search and enhance recall when querying the Metadata Repository.
     162
     163
    66164In this section, we describe the actual task of the proposed application -- \textbf{mapping indexes to indexes} -- in abstract terms. The returned mappings can be used by other applications to expand or translate the original user query, to match elements in other schemas.
    67165\footnote{Though tightly related, mapping of terms and query expansion are to be seen as two separate functions.}
    68166% \footnote{This primary usage of SMC for work with user-created query strings explains the need for human-readability of the indices.}
    69167
    70 
    71 \subsubsection*{Initialization}
    72 
    73 First there is an initialization phase, in which the application fetches the information from the source modules (cf. \ref{components}). All profiles and components from the Component Registry are read and all the URIs to data categories are extracted to construct an inverted map of data categories:
    74 \newline
    75 
    76 \textit{datcatURI $\mapsto$ profile.component.element[]}
    77 \newline
    78 
    79 The collected data categories are enriched with information from corresponding registries (DCRs), adding the verbose identifier, the description and available translations into other working languages. %, usable as base for multi-lingual search user-interface.
    80 
    81 Finally relation sets defined in the Relation Registry are fetched and matched with the data categories in the map to create sets of semantically equivalent (or otherwise related) data categories.
    82 
    83 \subsubsection*{Operation}
    84168In the operation mode, the application accepts any index (\textit{smcIndex}, cf. \ref{indexes}) and returns a list of corresponding indexes (or only the input index, if no correspondences were found):
    85169\newline
     
    117201\newline
    118202
    119 (3) \emph{container data categories} -- further expansions will be possible once the container data categories \cite{SchuurmanWindhouwer2011} will be used. Currently only fields (leaf nodes) in metadata descriptions are linked to data categories. However, at times, there is a need to conceptually bind also the components, meaning that besides the "atomic" data category for \texttt{actorName, there would be also a data category for the complex concept \texttt{Actor}.}
     203(3) \emph{container data categories} -- further expansions will be possible once the container data categories \cite{SchuurmanWindhouwer2011} are used. Currently only fields (leaf nodes) in metadata descriptions are linked to data categories. However, at times, there is a need to conceptually bind the components as well, meaning that besides the ``atomic'' data category for \texttt{actorName}, there would also be a data category for the complex concept \texttt{Actor}.
    120204Having concept links also on components will require a compositional approach to the task of semantic mapping, resulting in:
    121205\newline
     
    141225\subsection{Mapping from strings to Entities}
    142226
    143 Based on the textual values in the Metadata-descriptions find matching entities in selected Ontologies.
     227Find matching entities in selected Ontologies based on the textual values in the metadata records.
     228
    144229
    145230Identify related ontologies:
     
    162247
    163248
    164 \subsection{Semantic Search}
    165 
    166 Main purpose for the undertaking described in previous two chapters (mapping of concepts and entities) is to enhance the search capabilities of the MDService serving the Metadata/Resources-data. Namely to enhance it by employing ontological resources.
    167 Mainly this enhancement shall mean, that the user can access the data indirectly by browsing one or multiple  ontologies,
    168 with which the data will then be linked. These could be for example ontologies of Organizations and Projects.
    169 
    170 In this section we want to explore, how this shall be accomplished, ie how to bring the enhanced capabilities to the user.
    171 Crucial aspect is the question how to deal with the even greater amount of information in a user-friendly way, ie how to prevent overwhelming, intimidating or frustrating the user.
    172 
    173 Semi-transparently means, that primarily the semantic mapping shall integrate seamlessly in the interaction with the service, but it shall "explain" - offer enough information - on demand, for the user to understand its role and also being able manipulate easily.
    174 
    175 ?
    176 Facets
    177 Controlled Vocabularies
    178 Synonym Expansion (via TermExtraction(ContentSet))
    179 
    180 \section{Linked Data - Express dataset in RDF}
     249\subsection{Linked Data - Express dataset in RDF}
    181250
    182251Partly as by-product of the entities-mapping effort we will get the metadata-description rendered in RDF, linked with
    183 So theoretically we then only need to provide them "on the web", to make them a nucleus of the LinkedData-Cloud.
    184 
    185 Practically this won't be that straight-forward as the mapping to entities will be a hell of a work.
    186 But once that is solved, or for the subsets that it is solved, the publication of that data on the "SemanticWeb" should be easy.
     252So theoretically we then only need to provide them ``on the web'', to make them a nucleus of the LinkedData-Cloud.
     253
    187254
    188255Technical aspects (RDF-store?) / interface (ontology browser?)
     
    201268\end{figure*}
    202269
    203 \subsection{Content/Annotation}
     270
     271\section{Semantic Search}
     272
     273The main purpose of the undertaking described in the previous two chapters (mapping of concepts and entities) is to enhance the search capabilities of the MDService serving the Metadata/Resources-data, namely by employing ontological resources.
     274Mainly, this enhancement means that the user can access the data indirectly by browsing one or multiple ontologies
     275with which the data will then be linked. These could be, for example, ontologies of Organizations and Projects.
     276
     277In this section we explore how this can be accomplished, i.e. how to bring the enhanced capabilities to the user.
     278A crucial aspect is the question of how to deal with the even greater amount of information in a user-friendly way, i.e. how to avoid overwhelming, intimidating or frustrating the user.
     279
     280Semi-transparently means that the semantic mapping shall primarily integrate seamlessly into the interaction with the service, but it shall ``explain'' itself on demand, i.e. offer enough information for the user to understand its role and to manipulate it easily.
     281
     282?
     283Facets
     284Controlled Vocabularies
     285Synonym Expansion (via TermExtraction(ContentSet))
     286
     287\subsection{Query Expansion}
     288
     289
     290\section{Semantic Mapping in Metadata vs. Content/Annotation}
    204291AF + DCR + RR
    205292
    206293
    207 \subsection{Visualization}
    208 Landscape, Treemap, SOM
    209 
    210 Ontology Mapping and Alignement / saiks/Ontology4 4auf1.pdf
    211 
     294
     295