- Timestamp: 10/04/13 22:47:37 (11 years ago)
- Location: SMC4LRT
- Files: 9 edited

Legend:
- Unmodified
- Added
- Removed
SMC4LRT/Outline.tex
r3671 r3680 81 81 82 82 \input{chapters/Introduction} 83 \end{comment} 83 84 84 \input{chapters/Literature} 85 85 86 86 \input{chapters/Definitions} 87 87 \end{comment} 88 88 \input{chapters/Data} 89 89 … … 99 99 100 100 \input{chapters/Conclusion} 101 101 102 \end{comment} 102 103 … … 104 105 105 106 \bibliographystyle{ieeetr} 106 \bibliography{../../2bib/lingua,../../2bib/ontolingua,../../2bib/smc4lrt,../../2bib/semweb,../../2bib/distributed_systems,../../2bib/own, 107 \bibliography{../../2bib/lingua,../../2bib/ontolingua,../../2bib/smc4lrt,../../2bib/semweb,../../2bib/distributed_systems,../../2bib/own,../../2bib/diglib,../../2bib/it-misc} 107 108 108 109 \appendix -
SMC4LRT/chapters/Data.tex
r3671 r3680 132 132 133 133 \section{Other Metadata Formats and Collections}
134 \label{sec:lrt-md-catalogs}
134 135 135 136 … … 139 140 140 141
141 \subsection{Dublin Core metadata terms + OLAC}
142 Since 1995
143 Maintained Dublin Core Metadata Initiative
144 DC, OLAC
145 146 "Dublin" refers to Dublin, Ohio, USA where the work originated during the 1995 invitational OCLC/NCSA Metadata Workshop,[8] hosted by the Online Computer Library Center (OCLC), a library consortium based in Dublin, and the National Center for Supercomputing Applications (NCSA).
147 148 comes in two version: 15 core elements and 55 qualified terms ?
149 150 \begin{quotation}
151 Early Dublin Core workshops popularized the idea of "core metadata" for simple and generic resource descriptions. The fifteen-element "Dublin Core" achieved wide dissemination as part of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and has been ratified as IETF RFC 5013, ANSI/NISO Standard Z39.85-2007, and ISO Standard 15836:2009.
152 \end{quotation}
153 154 155 156 Given its simplicity it is used as the common denominator in many applications, among others it is the base format in the OAI-PMH protocol.
157 158 It is required/expected as the base
159 openarchives register: \url{http://www.openarchives.org/Register/BrowseSites}
160 2006 OAI-repositories
161 162 DublinCore Resource Types\furl{http://dublincore.org/documents/resource-typelist/}
163 164 DublinCore to RDF mapping\furl{http://dublincore.org/documents/dcq-rdf-xml/}
165 142 \subsection{Dublin Core metadata terms}
143 The work on this metadata format started in 1995 at the Metadata Workshop\furl{http://dublincore.org/workshops/dc1/} organized by OCLC/NCSA in Dublin, Ohio, USA. It is nowadays maintained by the Dublin Core Metadata Initiative.
144 145 It is a fixed set of terms for a basic generic description of a range of resources (both virtual and physical), coming in two versions\furl{http://dublincore.org/documents/dcmi-terms/}:
146 \begin{description}
147 \item[Dublin Core Metadata Element Set (DCMES)] \code{/elements/1.1/}
148 the original set of 15 terms, standardized as IETF RFC 5013, ISO Standard 15836-2009 and NISO Standard Z39.85-2007
149 \item[Dublin Core metadata terms] \code{/terms/}
150 the extended `Qualified' set of 55 terms, extending the original 15 ones (replicating them in the new namespace for consistency)
151 \end{description}
152 153 Today, the Dublin Core metadata terms are very widely spread. Thanks to their simplicity they are used as the common denominator in many applications: content management systems integrate Dublin Core in the \code{meta} tags of served pages (\code{<meta name="DC.Publisher" content="publisher-name" >}), and it is the default minimal description in content repositories (Fedora Commons, DSpace). It is also the obligatory base format in the OAI-PMH protocol. The OpenArchives register\furl{http://www.openarchives.org/Register/BrowseSites} lists more than 2100 data providers.
154 155 There are multiple possible serializations; in particular, a mapping to RDF is specified\furl{http://dublincore.org/documents/dcq-rdf-xml/}.
156 Worth noting is Dublin Core's take on the classification of resources\furl{http://dublincore.org/documents/resource-typelist/}.
157 158 The simplicity of the format is also its main drawback when it is considered as a metadata format for the research communities: it is too general to capture all the specific details with which individual research groups need to describe their different kinds of resources.
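The RDF mapping just mentioned can be illustrated with a minimal sketch in RDF/XML; the resource URI and the chosen values are invented for illustration:

\lstset{language=XML}
\begin{lstlisting}[label=lst:sampledcrdf, caption=Illustrative Dublin Core description in RDF/XML]
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/resource/1">
    <dc:title>Language</dc:title>
    <dc:creator>Bloomfield, Leonard</dc:creator>
    <dc:date>1933</dc:date>
    <dc:publisher>New York: Holt</dc:publisher>
  </rdf:Description>
</rdf:RDF>
\end{lstlisting}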
159 160 \subsection{OLAC}
166 161 \label{def:OLAC}
167 162 168 \xne{OLAC Metadata}\furl{http://www.language-archives.org/}format\cite{Bird2001},OLAC \cite{Simons2003OLAC} is a more specialized version of the \xne{Dublin Core metadata terms}, adapted to the needs of the linguistic community:
163 The \xne{OLAC Metadata}\furl{http://www.language-archives.org/} format \cite{Bird2001} is an application profile \cite{heery2000application} of the \xne{Dublin Core metadata terms}, adapted to the needs of the linguistic community. It is developed and maintained by the \xne{Open Language Archives Community}, which provides a common platform and an infrastructure for ``creating a worldwide virtual library of language resources'' \cite{Simons2003OLAC}.
164 165 The OLAC schema\furl{http://www.language-archives.org/OLAC/1.1/olac.xsd} extends the dcterms schema mainly by adding attributes with controlled vocabularies for domain-specific semantic annotation (\code{linguistic-field, linguistic-type, language, role, discourse-type}).
\ref{tab:cmd-profiles}, \ref{reports:OLAC})
182 171 \lstset{language=XML}
172 \begin{lstlisting}[label=lst:sampleolac, caption=Sample OLAC record]
173 <olac:olac>
174 <creator>Bloomfield, Leonard</creator>
175 <date>1933</date>
176 <title>Language</title>
177 <publisher>New York: Holt</publisher>
178 </olac:olac>
179 \end{lstlisting}
180 181 OLAC provides a ``search over 100,000 records collected from 44 archives\furl{http://www.language-archives.org/archives}, covering resources in half of the world's living languages''.
182 183 Note that OLAC archives are being harvested by the CLARIN harvester and OLAC records are part of the CMDI joint metadata domain (cf. \ref{tab:cmd-profiles}, \ref{reports:OLAC}).
183 184 184 185 \subsection{TEI / teiHeader}
… … 186 187 187 188 \begin{quotation}
188 The Text Encoding Initiative (TEI) is a consortium which collectively develops and maintains a standard for the representation of texts in digital form.
189 The Text Encoding Initiative (TEI) is a consortium which collectively develops and maintains a standard for the representation of texts in digital form.\furl{http://www.tei-c.org/}
189 190 \end{quotation}
190 191 \url{http://www.tei-c.org/}
192 193 TEI is a de-facto standard for encoding any kind of digital textual resources being developed by a large community since 1994. It defines a set of elements to annotate individual aspects of the text being encoded. For the purposes of text description, metadata encoding the complex top-level element \code{teiHeader} is foreseen. TEI is not prescriptive, but rather descriptive, it does not provide just one fixed schema, but allows for a certain flexibility wrt to elements used and inner structure, allowing to generate custom schemas adopted to projects' needs.
194 195 Thus there is also not just one fixed \xne{teiHeader}.
196 197 TEI/teiHeader/ODD,
198 199 191 encoding methods for machine-readable texts, chiefly in the humanities, social sciences and linguistics.
192 193 \begin{quotation}
194 The Text Encoding Initiative (TEI) is a consortium which collectively develops and maintains a standard for the representation of texts in digital form \dots [Next to] its chief deliverable is a set of Guidelines which specify encoding methods for machine-readable texts, chiefly in the humanities, social sciences and linguistics, \dots the Consortium provides a variety of TEI-related resources, training events and software. [abridged]
195 \end{quotation}
196 197 TEI is a de-facto standard for encoding any kind of digital textual resources, developed by a large community since 1994. It defines a set of elements to annotate individual aspects of the text being encoded. For the purposes of text description, i.e. metadata encoding (of main concern for us), the complex top-level element \code{teiHeader} is foreseen. TEI is not prescriptive, but rather descriptive: it does not provide just one fixed schema, but allows for a certain flexibility with respect to the elements used and the inner structure, making it possible to generate custom schemas adapted to projects' needs. Thus there is also not just one fixed \code{teiHeader}.
198 199 Some of the data collections encoded in TEI are the corpora of the DWDS\furl{http://www.dwds.de}, the Deutsches Textarchiv\furl{http://www.dwds.de/dta} \cite{Geyken2011deutsches} and the Oxford Text Archive\furl{http://ota.oucs.ox.ac.uk/}.
200 201 There has been an intense cooperation between the TEI and CMDI communities on the issue of interoperability, and multiple efforts to express the teiHeader in CMDI were undertaken (cf. \ref{results:tei}) as a starting point for integrating TEI-based data into the CLARIN infrastructure.
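To give an impression of the common skeleton shared by all teiHeader variants, a minimal header could look as follows (a hand-constructed illustration with invented values, not a record from any of the mentioned collections):

\lstset{language=XML}
\begin{lstlisting}[label=lst:sampleteiheader, caption=Minimal illustrative teiHeader]
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>Language</title>
      <author>Bloomfield, Leonard</author>
    </titleStmt>
    <publicationStmt>
      <publisher>New York: Holt</publisher>
      <date>1933</date>
    </publicationStmt>
    <sourceDesc>
      <p>Digitized from the print edition.</p>
    </sourceDesc>
  </fileDesc>
</teiHeader>
\end{lstlisting}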
The standard provides interoperability for browsable and searchable corpus structures and resource descriptions with help of specific tools.
210 \end{quotation}
211 212 213 \subsection{LAT, TLA}
214 Language Archiving Technology, now The Language Archive -- provided by the Max Planck Institute for Psycholinguistics\footnote{\url{http://www.mpi.nl/research/research-projects/language-archiving-technology}}
215 216 207 217 208 218 Predecessor of CMDI
… … 213 222 214 223 Metadata Object Description Schema -- a schema for a bibliographic element set that may be used for a variety of purposes, and particularly for library applications.
224 225 215 226 216 227 \subsection{ESE, Europeana Data Model - EDM}
… … 245 256 246 257 258 259 \subsection{META-NET}
260 261 262 263 \begin{quotation}
264 META-SHARE is an open, integrated, secure and interoperable sharing and exchange facility for LRs (datasets and tools) for the Human Language Technologies domain and other applicative domains where language plays a critical role.
265 266 META-SHARE is implemented in the framework of the META-NET Network of Excellence. It is designed as a network of distributed repositories of LRs, including language data and basic language processing tools (e.g., morphological analysers, PoS taggers, speech recognisers, etc.).
267 268 \end{quotation}
269 270 The distributed network of repositories consists of a number of member repositories that offer their own subset of resources.
271 272 A few\footnote{7 as of 2013-07} of the member repositories play the role of managing nodes, providing ``a core set of services critical to the whole of the META-SHARE network''\cite{Piperidis2012meta}, especially collecting the resource descriptions from other members and exposing the aggregated information to the users.
273 The whole network offers approximately 2,000 resources (the numbers differ even across individual managing nodes).
274 275 276 MetaShare ontology\furl{http://metashare.ilsp.gr/portal/knowledgebase/TheMetaShareOntology}
277 278 \subsection{ELRA}
279 280 European Language Resources Association\furl{http://elra.info}
281 282 \begin{quotation}
283 ELRA's missions are to promote language resources for the Human Language Technology (HLT) sector, and to evaluate language engineering technologies. To achieve these two major missions, we offer a range of services, listed below and described in the ``Services around Language Resources'' section.
284 \end{quotation}
285 286 http://www.elda.org/
287 Evaluations and Language resources Distribution Agency
288 289 ELDA -- the Evaluations and Language resources Distribution Agency -- is ELRA's operational body, set up to identify, classify, collect, validate and produce the language resources which may be needed by the HLT -- Human Language Technology -- community. Besides, ELDA is involved in HLT evaluation campaigns.
290 291 ELDA handles the practical and legal issues related to the distribution of language resources, provides legal advice in the field of HLT, and drafts and concludes distribution agreements on behalf of ELRA.
292 293 ELRA Catalog
294 295 http://catalog.elra.info/
296 297 298 Universal Catalogue
299 The Universal Catalogue is a repository comprising information regarding Language Resources (LRs) identified all over the world.
300 301 247 302 \subsection{Other}
248 303 … … 262 317 \item Persons - GND, VIAF
263 318 \item Organizations - GND, VIAF
264 \item Schlagwörter/Subjects - GND, LCSH
319 \item Schlagw\"{o}rter/Subjects - GND, LCSH
265 320 \item Resource Typology -
266 321 \end{itemize}
… … 274 329 Other related relevant activities and initiatives
275 330 331 http://www.w3.org/wiki/WebSchemas/ExternalEnumerations#Controlled_property_values
332 276 333 A broader collection of related initiatives can be found at the German National Library website:
277 334 \furl{http://www.dnb.de/DE/Standardisierung/LinksAFS/linksafs_node.html}
… … 281 338 http://metadaten-twr.org/ - Technology Watch Report: Standards in Metadata and Interoperability (last entry from 2011)
282 339 At MPDL, within the escidoc publication platform there seems to be (work on) a service (since 2009!) for controlled vocabularies: \furl{http://colab.mpdl.mpg.de/mediawiki/Control_of_Named_Entities}
283 Entity Authority Tool Set - a web application for recording, editing, using and displaying authority information about entities -- developed at the New Zealand Electronic Text Centre (NZETC).
340 Entity Authority Tool Set - a web application for recording, editing, using and displaying authority information about entities -- developed at the New Zealand Electronic Text Centre (NZETC).
284 341 http://eats.readthedocs.org/en/latest/
285 342 … … 303 360 304 361 \subsubsection{LT-World}
305 Regarding existing domain-specific semantic resources \texttt{LT-World}\footnote{\url{http://www.lt-world.org/}}, the ontology-based portal covering primarily Language Technology being developed at DFKI\footnote{\textit{Deutsches Forschungszentrum für Künstliche Intelligenz} - \url{http://www.dfki.de}}, is a prominent resource providing information about the entities (Institutions, Persons, Projects, Tools, etc.) in this field of study.
\cite{Joerg2010} 306 307 308 309 \section{LRT Metadata Catalogs/Collections} 310 \label{sec:lrt-md-catalogs} 311 \todoin{Overview of catalogs, name, since, \#providers, \#resources} 312 313 \todoin{[DFKI/LT-World] - collection or ontology} 314 315 \subsection{CMDI} 316 collections, profiles/Terms, ResourceTypes! 317 318 \subsection{OLAC} 319 320 \subsection{LAT, TLA} 321 Language Archiving Technology, now The Language Archive - provided by Max Planck Insitute for Psycholinguistics \footnote{\url{http://www.mpi.nl/research/research-projects/language-archiving-technology}} 322 323 \subsection{META-NET} 324 325 326 327 \begin{quotation} 328 META-SHARE is an open, integrated, secure and interoperable sharing and exchange facility for LRs (datasets and tools) for the Human Language Technologies domain and other applicative domains where language plays a critical role. 329 330 META-SHARE is implemented in the framework of the META-NET Network of Excellence. It is designed as a network of distributed repositories of LRs, including language data and basic language processing tools (e.g., morphological analysers, PoS taggers, speech recognisers, etc.). 331 332 \end{quotation} 333 334 The distributed networks of repositories consists of a number of member repositories, that offer their own subset of resource. 335 336 A few\footnote{7 as of 2013-07} of the members repositories play the role of managing nodes providing ``a core set of services critical to the whole of the META-SHARE network''\cite{Piperidis2012meta}, especially collecting the resource descriptions from other members and exposing the aggregated information to the users. 337 The whole network offers approximately 2.000 resources (the numbers differ even across individual managing nodes). 
338 339 340 MetaShare ontology\furl{http://metashare.ilsp.gr/portal/knowledgebase/TheMetaShareOntology}
341 342 343 344 \subsection{ELRA}
345 346 European Language Resources Association
347 348 \furl{http://elra.info}
349 350 351 ELRA's missions are to promote language resources for the Human Language Technology (HLT) sector, and to evaluate language engineering technologies. To achieve these two major missions, we offer a range of services, listed below and described in the "Services around Language Resources" section:
352 353 354 http://www.elda.org/
355 Evaluations and Language resources Distribution Agency
356 357 ELDA - Evaluations and Language resources Distribution Agency -- is ELRA's operational body, set up to identify, classify, collect, validate and produce the language resources which may be needed by the HLT -- Human Language Technology -- community. Besides, ELDA is involved in HLT evaluation campaigns.
358 359 ELDA handles the practical and legal issues related to the distribution of language resources, provides legal advice in the field of HLT, and drafts and concludes distribution agreements on behalf of ELRA.
360 361 ELRA Catalog
362 363 http://catalog.elra.info/
364 365 366 Universal Catalog+
367 Universal Catalogue is a repository comprising information regarding Language Resources (LRs) identified all over the world.
368 369 370 \subsection{Other}
362 Regarding existing domain-specific semantic resources \texttt{LT-World}\footnote{\url{http://www.lt-world.org/}}, the ontology-based portal covering primarily Language Technology being developed at DFKI\footnote{Deutsches Forschungszentrum für Künstliche Intelligenz, \url{http://www.dfki.de}}, is a prominent resource providing information about the entities (Institutions, Persons, Projects, Tools, etc.) in this field of study.
\cite{Joerg2010}
363 364 371 365 372 366 … … 394 388 395 389 390 VoID (``Vocabulary of Interlinked Datasets'') is an RDF-based schema to describe linked datasets\furl{http://semanticweb.org/wiki/VoID}.
391 392 http://www.dnb.de/rdf
393 394 395 The entire WorldCat cataloging collection was made publicly available using Schema.org mark-up with library extensions, for use by developers and search partners such as Bing, Google, Yahoo! and Yandex.
398 399 OCLC began adding linked data to WorldCat by appending Schema.org descriptive mark-up to WorldCat.org pages, thereby making OCLC member library data available for use by intelligent Web crawlers such as Google and Bing.
403 404 405 \subsection{schema.org}
406 407 http://schema.org/docs/datamodel.html
408 409 schema.org descriptions can be embedded into web pages as microdata or via RDFa Lite (Resource Description Framework in Attributes)\furl{http://www.w3.org/TR/rdfa-lite/}.
412 396 414 397 415 \section{Summary} -
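As an illustration of such embedded mark-up, a schema.org description of a book expressed in RDFa Lite might look as follows (values invented for illustration):

\lstset{language=HTML}
\begin{lstlisting}[label=lst:samplerdfa, caption=Illustrative schema.org description in RDFa Lite]
<div vocab="http://schema.org/" typeof="Book">
  <span property="name">Language</span> by
  <span property="author">Leonard Bloomfield</span>
  (<span property="publisher">Holt</span>,
  <span property="datePublished">1933</span>)
</div>
\end{lstlisting}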
SMC4LRT/chapters/Definitions.tex
r3671 r3680 52 52 dcterms: & http://purl.org/dc/terms \\ 53 53 oa: & http://www.w3.org/ns/oa\# \\ 54 olac: & http://www.language-archives.org/OLAC/1.1/ \\ 54 55 ore: & http://www.openarchives.org/ore/terms/ \\ 55 56 cr: & http://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/ \\ -
SMC4LRT/chapters/Design_SMCinstance.tex
r3671 r3680 306 306 \end{enumerate} 307 307 308 This task is basically an application of ontology mapping method. 309 310 We don't try to achieve complete ontology alignment, we just want to find 311 for our ``anonymous'' concepts semantically equivalent concepts from other ontologies. 312 This is very near just other phrasing for the definition of ontology mapping function as given by \cite{EhrigSure2004, amrouch2012survey}: 313 ``for each concept (node) in ontology A [tries to] find a corresponding concept 314 (node), which has the same or similar semantics, in ontology B and vice verse''. 315 316 The first two points in the above enumeration represent the steps necessary to be able to apply the ontology mapping. 317 The identification of appropriate vocabularies is discussed in the next subsection. In the operationalization, the identified vocabularies could be treated as one aggregated ontology to map all entities against. For the sake of higher precision, it may be sensible to perform the task separately for individual concepts, i.e. organisations, persons etc. and in every run consider only relevant vocabularies. 318 319 320 The transformation of the data has been partly described in previous section: 321 It can be trivially automatically converted into RDF triples as : 322 323 \begin{example3} 324 <lr1> & cmd:Organisation & "MPI" \\ 325 \end{example3} 326 327 However for the needs of the mapping task we propose to reduce and rewrite to retrieve distinct concept , value pairs: 328 329 \begin{example3} 330 \_:1 & a & cmd:Organisation;\\ 331 & skos:altLabel & "MPI"; 332 \end{example3} 333 334 \var{lookup} function is a customized version of the \var(map) function, that operates on this information pairs (concept, label). 
335 336 The two steps \var{lookup} and \var{assess} correspond exactly to the two steps in \cite{jimenez2012large} in their system \xne{LogMap2}: 1) computation of mapping candidates (maximise recall) and b) assessment of the candidates (maximize precision) 337 338 308 339 \begin{figure*}[!ht] 309 340 \includegraphics[width=1\textwidth]{images/SMC_CMD2LOD} … … 373 404 374 405 406 375 407 %%%%%%%%%%%%%%%%%%%%% 376 408 \section{SMC LOD - Semantic Web Application} … … 378 410 379 411 380 381 \cite{Europeana RDF Store Report}382 383 Technical aspects (RDF-store?): Virtuoso384 385 \todocode{install Jena + fuseki}\furl{http://jena.apache.org}\furl{http://jena.apache.org/documentation/serving_data/index.html}\furl{http://csarven.ca/how-to-create-a-linked-data-site}386 387 \todocode{install older python (2.5?) to be able to install dot2tex - transforming dot files to nicer pgf formatted graphs}\furl{http://dot2tex.googlecode.com/files/dot2tex-2.8.7.zip}\furl{file:/C:/Users/m/2kb/tex/dot2tex-2.8.7/}388 389 \todocode{check install siren}\furl{http://siren.sindice.com/}390 391 392 \todocode{check/install: raptor for generating dot out of rdf}\furl{http://librdf.org/raptor/}393 394 \todocode{check/install: Linked Data browser: LoD p. 81; Haystack}\furl{http://en.wikipedia.org/wiki/Haystack_(PIM)}395 396 / interface (ontology browser?)397 398 semantic search component in the Linked Media Framework399 \todocode{!!! check install LMF - kiwi - SemanticSearch !!!}\furl{http://code.google.com/p/kiwi/wiki/SemanticSearch}400 401 \todoin{check SARQ}\furl{http://github.com/castagna/SARQ}402 403 404 \section {Full semantic search - concept-based + ontology-driven ?}405 \label{semantic-search}406 407 412 With the new enhanced dataset, as detailed in section \ref{sec:cmd2rdf}, the groundwork is laid for the full-blown semantic search as proposed in the original goals, i.e. the possibility for ontology-driven or at least `semantic resources assisted' exploration of the dataset. 
408 413 … … 415 420 rechercheisidore, dbpedia, ... 416 421 422 423 \cite{Europeana RDF Store Report} 424 425 Technical aspects (RDF-store?): Virtuoso 426 427 428 semantic search component in the Linked Media Framework 429 430 \todoin{check SARQ}\furl{http://github.com/castagna/SARQ} 431 432 433 %\section {Full semantic search - concept-based + ontology-driven ?} 434 %\label{semantic-search} 435 436 417 437 \section{Summary} 438 439 %The task can be also seen as building bridge between XML resources and semantic resources expressed in RDF, OWL. 440 441 The process of expressing the whole of the data as one semantic resource, can be also understood as schema or ontology merging task. Data categories being the primary mapping elements 442 443 418 444 In this chapter, an expression of the whole of the CMD data domain into RDF was proposed, with special focus on the way how to translate the string values in metadata fields to corresponding semantic entities. Additionally, some technical considerations were discussed regarding exposing this dataset as Linked Open Data and the implications for real semantic ontology-based data exploration. 419 445 -
SMC4LRT/chapters/Design_SMCschema.tex
r3665 r3680 119 119 \caption{Attributes of \code{Term} when encoding CMD entity}
120 120 \label{table:terms-attributes-cmd}
121 \begin{tabularx}{1\textwidth}{ l | X | X }
121 \begin{tabularx}{1\textwidth}{ l | X | X }
122 %\begin{tabu}{1\textwidth}{ l | l | l }
122 123 attribute & allowed values & sample value\\
123 124 \hline
… … 689 690 Also, such a visualization could feature direct search links from individual nodes into the dataset, i.e. from a profile node a link could lead to a search interface listing the metadata records of the given profile.
690 691 692 693 %%%%%%%%%%%%%%%%%%%%%%%%%
694 \section{Application of Schema Matching techniques in SMC}
695 \label{sec:schema-matching-app}
696 697 Even though the described module is about ``semantic mapping'', until now we did not directly make use of the traditional ontology/schema mapping/alignment methods and tools as summarized in \ref{lit:schema-matching}. This is due to the fact that in this work we can harness the mechanisms of the semantic interoperability layer built into the core of the CMD Infrastructure, which integrates the task of identifying semantic correspondences directly into the process of schema creation, largely obviating the need for complex a posteriori schema matching/mapping techniques.
700 Or, put in terms of schema matching methodology, the system relies on explicitly set concept equivalences as the base for mapping between schema entities. By referencing a data category in a CMD element, the modeller binds this element to a concept, making two elements linked to the same data category trivially equivalent.
701 702 However, this only holds for schemas already created within the CMD framework (and even for these only to a certain degree, as will be explained later).
Given the growing universe of definitions (data categories and components) in the CMD framework, the metadata modeller could very well profit from applying schema mapping techniques as a pre-processing step in the task of integrating existing external schemas into the infrastructure. (User involvement is identified by \cite{shvaiko2012ontology} as one of the promising future challenges to ontology matching.) Already now, we witness a growing proliferation of components in the Component Registry and of data categories in the Data Category Registry.
703 704 Let us restate the problem of integrating existing external schemas as an application of the \var{schema matching} method:
705 The data modeller starts off with an existing schema \var{$S_{x}$}. The system accommodates a set of schemas\footnote{We talk of schemas even though the creation (and also remodelling) takes place in the component registry by creating CMD profiles and components, because every profile has an unambiguous expression in XML Schema.} \var{$S_{1..n}$}.
706 It is very improbable that there is an \var{$S_{y} \in S_{1..n}$} that fully matches \var{$S_{x}$}.
707 Given the heterogeneity of the schemas present in the field of research, full alignments are not achievable at all.
708 However, thanks to the compositional nature of the CMD data model, the data modeller can reuse just parts of any of the schemas -- the
709 components \var{c}. Thus the task is to find for every entity $e_{x} \in S_{x}$ the set of semantically equivalent candidate components $\{c_{y}\}$, which corresponds to the definition of the mapping function for single entities as defined in \cite{EhrigSure2004}.
710 Given that the modeller does not have to reuse the components as they are, but can use existing components as a base to create her own, she is helped even by candidates that are not equivalent. Thus we can further relax the task and allow candidates that are just similar to a certain degree, which can be operationalized as a threshold $t$ on the output of the \var{similarity} function.
711 Being only a pre-processing step meant to provide suggestions to the human modeller implies that recall is more important than precision.
712 713 Another requirement is that the matching entities should be maximal with regard to the compositional tree:
714 715 \begin{defcap}[!ht]
716 \caption{}
717 \begin{align*}
718 & map(e_{x1}) \rightarrow c_{y1} \\
719 & map(e_{x2}) \rightarrow c_{y2} \\
720 & candidates := \{(e_{xa},c_{ya}) : c_{ya} \notin descendants(c_{yb}) \}
721 \end{align*}
722 \end{defcap}
723 724 Next to the usual features and measures that can be applied, like label equality, string similarity and structural equality,
725 the mapping function could be enriched with \emph{extensional} features based on the concept clusters as delivered by the crosswalk service (\ref{sec:cx}).
726 727 It would be worthwhile to test in how far the \var{smcIndex} paths as defined in \ref{def:smcIndex} could be used as a feature, e.g. the longest matching subpath.
729 730 731 Although we exemplified this on the case of integrating an external schema, the described approach could also be applied to the schemas already integrated in the system. Although there is already a high baseline thanks to the mechanisms of reuse of components and data categories, there certainly still exist semantic proximities that are not explicitly expressed by these mechanisms. This deficiency is rooted in the collaborative creation of the CMD components and profiles, where individual modellers overlooked, deliberately ignored or only partially reused existing components or profiles.
This can be seen in the case of multiple teiHeader profiles which, though modelling the same existing metadata format, are completely disconnected in terms of component and data category reuse (cf. \ref{results:tei}).
732 733 Note that in the case of reuse of components, in the normal scenario the semantic equivalency is ensured even though the new component (and all its subcomponents) is a copy of the old one with a new identity, because the references to data categories are copied as well; thus, by default, the new component shares all data categories with the original one, and the modeller has to deliberately change them if required. But even with reuse of components, scenarios are thinkable in which the semantic linking gets broken or is not established, even though semantic equivalency prevails.
734 735 The question is what to do with the new correspondences that would possibly be determined when, as proposed, we apply schema matching to the integrated schemas. One possibility is to add a data category, if one of the pair is still missing one.
736 However, if both are already linked to a data category, the data category pair could be added to the relation set in the Relation Registry (cf. \ref{def:rr}).
737 738 Once all the equivalencies (and other relations) between the profiles/schemas are found, similarity ratios can be determined.
739 These new similarity ratios could be applied as alternative weights in the just-profiles graph (\ref{sec:smc-cloud}).
740 741 In contrast to the task described here, which -- restricted to matching XML schemas -- can be seen as staying in the ``XML world'',
742 another aspect within this work is clearly situated in the Semantic Web world and requires the application of ontology matching methods: the mapping of field values to semantic entities described in \ref{sec:values2entities}.
743
744
745 %This approach of integrating prerequisites for semantic interoperability directly into the process of metadata creation is fundamentally different from the traditional methods of schema matching that try to establish pairwise alignments between already existing schemas -- be it algorithm-based or by means of explicit manually defined crosswalks\cite{Shvaiko2005}.
746
747
691 748 \section{Summary}
692 749 In this core chapter, we laid out a design for a system dealing with concept-based crosswalks on the schema level.
693 750 The system consists of three main parts: the crosswalk service, the query expansion module and \xne{SMC Browser} -- a tool for visualizing and exploring the schemas and the corresponding crosswalks.
694 751 In addition, we elaborated on the application of schema matching methods to infer mappings between schemas.
752
-
SMC4LRT/chapters/Literature.tex
r3671 r3680
48 48 Most recently, with \xne{Europeana Cloud}\furl{http://pro.europeana.eu/web/europeana-cloud} (2013 to 2015) a successor to \xne{Europeana} was established, a Best Practice Network, coordinated by The European Library, designed to establish a cloud-based system for Europeana and its aggregators, providing new content, new metadata, a new linked storage system, new tools and services for researchers and a new platform -- Europeana Research.
49 49
50 A number of catalogs and formats are further described in section \ref{sec:other-md-catalogs}.
51
52
53 \section{Schema / Ontology Mapping/Matching}
54
55 Schema or ontology matching provides the methodical foundation for the problem at hand, the \emph{semantic mapping}.
56 As Shvaiko\cite{shvaiko2012ontology} states, it is ``a solution to the semantic heterogeneity problem. It finds correspondences between semantically related entities of ontologies.''
57
58 One starting point for the plethora of work in the field of \emph{schema and ontology mapping} techniques and technology
59 is the overview of the field by Kalfoglou \cite{Kalfoglou2003}.
60 Shvaiko and Euzenat provide a summary of the key challenges\cite{Shvaiko2008} as well as a comprehensive survey of approaches for schema and ontology matching based on a proposed new classification of schema-based matching techniques\cite{Shvaiko2005_classification}.
61 Noy \cite{Noy2005_ontologyalignment,Noy2004_semanticintegration}
62
63 and more recently \cite{shvaiko2012ontology} (2012!) and \cite{amrouch2012survey} provide surveys of the methods and systems in the field.
64
65 \paragraph{Methods}
66 Semantic and extensional methods are still rarely
67 employed by the matching systems.
In fact, most of
68 the approaches are quite often based only on
69 terminological and structural methods.
70
71 \cite{Algergawy2010} classify, review, and experimentally compare major methods of element similarity measures and their combinations.
50 The related catalogs and formats are described in section \ref{sec:other-md-catalogs}.
51
52
53 \section{Existing crosswalks (services)}
54
55 Crosswalks as lists of equivalent fields from two schemas have been around for a long time, in the world of enterprise systems, e.g. to bridge to legacy systems, and also in libraries, e.g. the \emph{MARC to Dublin Core Crosswalk}\furl{http://loc.gov/marc/marc2dc.html}.
56
57 \cite{Day2002crosswalks} lists a number of mappings between metadata formats,
58
59 mostly Dublin Core and the MARC family of formats:
60
61 http://www.loc.gov/marc/dccross.html
62
63
64 a static
65 metadata crosswalk repository
66
67
68 OCLC launched \xne{Metadata Schema Transformation Services}\furl{http://www.oclc.org/research/activities/schematrans.html?urlm=160118},
69 in particular the \xne{Crosswalk Web Service}\furl{http://www.oclc.org/developer/services/metadata-crosswalk-service}:
70 http://www.oclc.org/research/activities/xwalk.html
71
72 \begin{quotation}
73 a self-contained crosswalk utility that can be called by any application that must translate metadata records. In our implementation, the translation logic is executed by a dedicated XML application called the Semantic Equivalence Expression Language, or Seel, a language specification and a corresponding interpreter that transcribes the information in a crosswalk into an executable format.
74 \end{quotation}
75
76 The Crosswalk Web Service is now a production system that has been incorporated into a number of OCLC products and services.
77
78 However, the demo service is not available\furl{http://errol.oclc.org/schemaTrans.oclc.org.search}.
79
80
81
82 Offered formats?
83 These, however, concentrate on the formats available for the LIS community and are ??
84
85 For this service, a metadata format is defined as a triple of:
86
87 Standard -- the metadata standard of the record (e.g. MARC, DC, MODS, etc ...)
88 Structure -- the structure of how the metadata is expressed in the record (e.g. XML, RDF, ISO 2709, etc ...)
89 Encoding -- the character encoding of the metadata (e.g. MARC8, UTF-8, Windows 1251, etc ...)
90
91
92 Offered interface!?
93 The Crosswalk Web Service has 4 methods:
94
95 translate(...) -- This method translates the records. See the documentation for more information.
96 getSupportedSourceRecordFormats() -- This method returns a list of formats that are supported as input formats.
97 getSupportedTargetRecordFormats() -- This method returns a list of formats that the input formats can be translated to.
98 getSupportedJavaEncodings() -- Some formats will support all of the character encodings that Java supports. This function returns the list of encodings that Java supports.
99
100
101
102 \section{Schema/Ontology Mapping/Matching}
103 \label{lit:schema-matching}
104
105 As Shvaiko\cite{shvaiko2012ontology} states, ``\emph{Ontology matching} is a solution to the semantic heterogeneity problem. It finds correspondences between semantically related entities of ontologies.''
106 As such, it provides a very suitable methodical foundation for the problem at hand -- the \emph{semantic mapping}. (In sections \ref{sec:schema-matching-app} and \ref{sec:values2entities} we elaborate on the possible ways to apply these methods to the described problem.)
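The format triple and the four service methods described above could be modelled along the following lines. This is a hypothetical sketch: the class and the in-memory mapping table are illustrative, while the actual OCLC service was a remote web service with its own signatures:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MetadataFormat:
    """The (standard, structure, encoding) triple identifying a format."""
    standard: str   # e.g. "MARC", "DC", "MODS"
    structure: str  # e.g. "XML", "RDF", "ISO 2709"
    encoding: str   # e.g. "MARC8", "UTF-8", "Windows 1251"

class CrosswalkService:
    """Hypothetical client mirroring the four documented methods."""
    def __init__(self, mappings):
        # mappings: {(source_format, target_format): translation_callable}
        self.mappings = mappings

    def get_supported_source_record_formats(self):
        return sorted({src for src, _ in self.mappings}, key=str)

    def get_supported_target_record_formats(self):
        return sorted({tgt for _, tgt in self.mappings}, key=str)

    def translate(self, record, source, target):
        return self.mappings[(source, target)](record)

marc = MetadataFormat("MARC", "XML", "UTF-8")
dc = MetadataFormat("DC", "XML", "UTF-8")
# toy crosswalk: MARC field 245$a maps to the DC title element
svc = CrosswalkService({(marc, dc): lambda rec: {"title": rec["245a"]}})
print(svc.translate({"245a": "Moby Dick"}, marc, dc))  # {'title': 'Moby Dick'}
```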
107
108 There is a plethora of work on methods and technology in the field of \emph{schema and ontology matching}, as witnessed by a sizable number of publications providing overviews, surveys and classifications of existing work \cite{Kalfoglou2003, Shvaiko2008, Noy2005_ontologyalignment, Noy2004_semanticintegration, Shvaiko2005_classification} and most recently \cite{shvaiko2012ontology, amrouch2012survey}.
109
110 %Shvaiko and Euzenat provide a summary of the key challenges\cite{Shvaiko2008} as well as a comprehensive survey of approaches for schema and ontology matching based on a proposed new classification of schema-based matching techniques\cite{}.
111
112 Shvaiko and Euzenat also run the web page \url{http://www.ontologymatching.org/} dedicated to this topic and the related OAEI\footnote{Ontology Alignment Evaluation Initiative - \url{http://oaei.ontologymatching.org/}}, an ongoing effort to evaluate alignment tools based on various alignment tasks from different domains.
113
114 Interestingly, \cite{shvaiko2012ontology} somewhat self-critically asks if after years of research ``the field of ontology matching [is] still making progress?''.
115
116 \subsubsection{Method}
117
118 There are slight differences in the use of the terms between \cite{EhrigSure2004, Ehrig2006}, \cite{Euzenat2007} and \cite{amrouch2012survey}; especially, one has to be aware whether in a given context the term denotes the task in general, the process, the actual operation/function or the result of the function.
119
120 \cite{Euzenat2007} formalizes the problem as the ``ontology matching operation'':
121
122 \begin{quotation}
123 The matching operation determines an alignment A' for a
124 pair of ontologies O1 and O2. Hence, given a pair of
125 ontologies (which can be very simple and contain one entity
126 each), the matching task is that of finding an alignment
127 between these ontologies.
[\dots]
128 \end{quotation}
129
130 But basically the different authors broadly agree on the definition of \var{ontology alignment}: in the meaning as \concept{task} it is ``to identify relations between individual elements of multiple ontologies'', in the meaning as \concept{result} it is ``a set of correspondences between entities belonging to the matched ontologies''.
131
132 More formally, \cite{Ehrig2006} formulates ontology alignment as ``a partial function based on the set \var{E} of all entities $e \in E$ and based on the set of possible ontologies \var{O}. [\dots] Once an alignment is established we say entity \var{e} is aligned with entity \var{f} when \var{align(e) \ = \ f}.'' Also, ``alignment is a one-to-one equality relation.'' (although this is relativized further in the work, and also in \cite{EhrigSure2004})
133
134 \begin{definition}{\var{align} function}
135 $align \ : \ E \times O \times O \rightarrow E$
136 \end{definition}
137
138 \cite{EhrigSure2004} and \cite{amrouch2012survey} instead introduce \var{ontology mapping} when applying the task to individual entities, in the meaning as a function that ``for each concept (node) in ontology A [tries to] find a corresponding concept
139 (node), which has the same or similar semantics, in ontology B and vice verse''. In the meaning as result it is a ``formal expression describing a semantic relationship between two (or more) concepts belonging to two (or more) different ontologies''.
140
141 \cite{EhrigSure2004} further specify the mapping function as based on a similarity function that, for a pair of entities from two (or more) ontologies, computes a ratio indicating the semantic proximity of the two entities.
142
143 \begin{defcap}[!ht]
144 \caption{\var{map} function for single entities and underlying \var{similarity} function}
145 \begin{align*}
146 & map \ : O_{i1} \rightarrow O_{i2} \\
147 & map( e_{i_{1}j_{1}}) = e_{i_{2}j_{2}}\text{, if } sim(e_{i_{1}j_{1}},e_{i_{2}j_{2}}) \ > \ t \text{ with } t \text{ being the threshold} \\
148 & sim \ : E \times E \times O \times O \rightarrow [0,1]
149 \end{align*}
150 \end{defcap}
151
152 This elegant abstraction introduced with the \var{similarity} function provides a general model that can accommodate a broad range of comparison relationships and corresponding similarity measures. And here, again, we encounter a broad range of possible approaches.
153
154 \cite{ehrig2004qom} lists a number of basic features and corresponding similarity measures:
155 Starting from primitive data types, next to value equality, string similarity, edit distance or in general relative distance can be computed.
156 For concepts, next to the directly applicable unambiguous \code{sameAs} statements, label similarity can be determined (again either as string similarity, but also broadened by employing external taxonomies and other semantic resources like WordNet -- \emph{extensional} methods), as well as equal (shared) class instances, shared superclasses, subclasses and properties.
157
158 Element-level (terminological) vs structure-level (structural) \cite{Shvaiko2005_classification}
159
160 based on background knowledge...
161
162 subclass--superclass relationships, domains and ranges of properties, analysis of the graph structure of the ontology.
163
164 For properties, the degree of equality of the super- and subproperties, overlapping domain and/or range.
165 In addition to these measures applicable to individual ontology items, there are approaches (like the \var{Similarity Flooding algorithm} \cite{melnik2002similarity}) to propagate computed similarities across the graph defined by relations between entities (primarily the subsumption hierarchy).
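The threshold-based \var{map} function over a string-based label similarity can be illustrated with a few lines of Python. This is a minimal sketch, using the standard-library \code{difflib} ratio as a stand-in for a proper edit-distance measure; function names and the threshold value are illustrative:

```python
from difflib import SequenceMatcher

def label_sim(a: str, b: str) -> float:
    """String similarity of two entity labels, normalized to [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def map_entity(e, candidates, t=0.8):
    """map(e) = best candidate f with sim(e, f) > t, else no mapping
    (cf. the threshold-based map function defined above)."""
    best = max(candidates, key=lambda f: label_sim(e, f))
    return best if label_sim(e, best) > t else None

print(label_sim("Organisation", "Organization"))          # high, ~0.92
print(map_entity("creator", ["author", "creators", "date"]))  # 'creators'
```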
166
167 \cite{Algergawy2010} classifies, reviews, and experimentally compares major methods of element similarity measures and their combinations. \cite{shvaiko2012ontology}, comparing a number of recent systems, finds that ``semantic and extensional methods are still rarely employed. In fact, most of the approaches are quite often based only on terminological and structural methods.''
168
169 \cite{Ehrig2006} employs this \var{similarity} function over single entities to derive the notion of \var{ontology similarity} as ``based on similarity of pairs of single entities from the different ontologies''. This is operationalized as some kind of aggregating function\cite{ehrig2004qom} that combines all similarity measures (mostly modulated by custom weighting) computed for pairs of single entities into one value (from the \var{[0,1]} range) expressing the similarity ratio of the two ontologies being compared. (The employment of weights allows applying machine learning approaches for optimization of the results.)
170
171 Thus, \var{ontology similarity} is a much weaker assertion than \var{ontology alignment}; in fact, the computed similarity is interpreted to assert ontology alignment: an aggregated similarity above a defined threshold indicates an alignment.
172
173
174 As to the alignment process, \cite{Ehrig2006} distinguishes the following steps:
175 \begin{enumerate}
176 \item Feature Engineering
177 \item Search Step Selection
178 \item Similarity Assessment
179 \item Interpretation
180 \item Iteration
181 \end{enumerate}
182
183 In contrast, \cite{jimenez2012large} in their system \xne{LogMap2} reduce the process to just two steps: computation of mapping candidates (maximise recall) and assessment of the candidates (maximise precision), which however correspond to steps 2 and 3 of the above procedure, and in fact the other steps are implicitly present in the described system.
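The weighted aggregation of individual similarity measures into one overall ratio, as described above, could look as follows. The feature names and weight values are arbitrary illustrative choices, not taken from any of the cited systems:

```python
def aggregate(measures, weights):
    """Combine per-feature similarity values (each in [0, 1]) into one
    overall similarity ratio in [0, 1] via a normalized weighted average."""
    total = sum(weights.values())
    return sum(weights[k] * measures[k] for k in measures) / total

# illustrative per-feature similarities for one pair of entities
measures = {"label": 0.9, "structure": 0.6, "instances": 0.3}
weights  = {"label": 0.5, "structure": 0.3, "instances": 0.2}

score = aggregate(measures, weights)
print(round(score, 3))   # 0.69
aligned = score > 0.65   # the interpretation step: threshold -> alignment
```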
184
72 185
73 186 \subsubsection{Systems}
74 A number of existing systems for schema/ontology matching/alignment are mentioned in these overview publications:
75
76 The majority of tools for ontology mapping use some sort of structural or
77 definitional information to discover new mappings. This information includes
78 such elements as subclass--superclass relationships, domains and ranges of
79 properties, analysis of the graph structure of the ontology, and so on. Some of
80 the tools in this category include
81
82 IF-Map\cite{kalfoglou2003if}
83
84 QOM\cite{ehrig2004qom},
85
86 Similarity Flooding\cite{melnik}
87
88 the Prompt tools \cite{Noy2003_theprompt} integrating with Protégé
89
90
91 \xne{COMA++} \cite{Aumueller2005} -- a composite approach to combine different match algorithms, user interaction via a graphical interface, supports W3C XML Schema and OWL.
92
93 \xne{FOAM}\cite{EhrigSure2005}
94
95
96 The ontology matching system \xne{LogMap 2} \cite{jimenez2012large} supports user interaction and implements scalable reasoning and diagnosis algorithms, which minimise any logical inconsistencies introduced by the matching process.
97 The process is divided into two main logical phases: computation of mapping candidates (maximise recall) and assessment of the candidates (maximise precision).
98
99
100 s which are at the core of the mapping task.
101
102
103 On the dedicated platform OAEI\footnote{Ontology Alignment Evaluation Initiative - \url{http://oaei.ontologymatching.org/}} an ongoing effort is being carried out and documented comparing various alignment methods applied on different domains.
104
105
106 One more specific recent inspirational work is that of Noah et al. \cite{Noah2010} developing a semantic digital library for an academic institution. The scope is limited to document collections, but nevertheless many aspects seem very relevant for this work, like operating on document metadata, ontology population or sophisticated querying and searching.
107
108 Matching is a laborious and error-prone process, and once ontology
109 mappings are discovered, i
110
111 \subsection{MOVEOUT: Application of Schema Matching on the CMD domain}
112 Notice that the semantic interoperability layer built into the core of the CMD Infrastructure integrates the
113 task of identifying semantic correspondences directly into the process of schema creation,
114 largely removing the need for complex schema matching/mapping techniques in the post-processing.
115 However, this only holds for schemas already created within the CMD framework.
116 Given the growing universe of definitions (data categories and components) in the CMD framework, the metadata modeller could very well profit from applying schema mapping techniques as a pre-processing step in the task of integrating existing external schemas into the infrastructure. User involvement is identified by \cite{shvaiko2012ontology} as one of the promising future challenges to ontology matching.
117
118 Such a procedure pays tribute to the fact that the mapping techniques are mostly error-prone and can deliver reliable 1:1 alignments only in trivial cases. This lies in the nature of the problem: given the heterogeneity of the schemas present in the data collection, full alignments are not achievable at all; only parts of individual schemas actually semantically correspond.
119
120 Once all the equivalencies (and other relations) between the profiles/schemas are found, similarity ratios can be determined.
121
122 The task can also be seen as building a bridge between XML resources and semantic resources expressed in RDF, OWL.
123 This speaks for a tool like COMA++ supporting both W3C standards: XML Schema and OWL.
124 Concentration on existing systems with user interface?
125
126 The process of expressing the whole of the data as one semantic resource can also be understood as a schema or ontology merging task.
Data categories being the primary mapping elements
127
128
129 In the end
130 It is also not the goal to merge
131
132 Being only a pre-processing step meant to provide suggestions to the human modeller implies a higher importance of recall than of precision.
133
134
135
136 infrastructure un
137
138 This approach of integrating prerequisites for semantic interoperability directly into the process of metadata creation is fundamentally different from the traditional methods of schema matching that try to establish pairwise alignments between already existing schemas -- be it algorithm-based or by means of explicit manually defined crosswalks\cite{Shvaiko2005}.
139
140
141 Application of ontology/schema matching/mapping techniques
142 is reduced or outsourced
143
144
145 \subsection{Existing Crosswalk services}
146
147
148 \url{http://www.oclc.org/developer/services/metadata-crosswalk-service}
149
150
151 VoID ("Vocabulary of Interlinked Datasets") is an RDF-based schema to describe linked datasets\furl{http://semanticweb.org/wiki/VoID}
152
153 http://www.dnb.de/rdf
154
155
156 the entire WorldCat cataloging collection made publicly
157 available using Schema.org mark-up with library extensions for use by developers and
158 search partners such as Bing, Google, Yahoo! and Yandex
159
160 OCLC begins adding linked data to WorldCat by appending
161 Schema.org descriptive mark-up to WorldCat.org pages, thereby
162 making OCLC member library data available for use by intelligent
163 Web crawlers such as Google and Bing
187 A number of existing systems for schema/ontology matching/alignment are collected in the above-mentioned overview publications:
188
189 IF-Map \cite{kalfoglou2003if}, QOM \cite{ehrig2004qom}, \xne{FOAM} \cite{EhrigSure2005}, Similarity Flooding (SF) \cite{melnik}, S-Match \cite{Giunchiglia2007_semanticmatching}, the Prompt tools \cite{Noy2003_theprompt} integrating with Protégé or \xne{COMA++} \cite{Aumueller2005}, \xne{Chimaera}.
Additionally, \cite{shvaiko2012ontology} lists and evaluates some more recent contributions: \xne{SAMBO, Falcon, RiMOM, ASMOV, Anchor-Flood, AgreementMaker}.
190
191
192 All of the tools use multiple methods as described in the previous section, exploiting both element-level as well as structural features and applying some kind of composition or aggregation of the computed atomic measures to arrive at an alignment assertion.
193
194 Next to OWL as the input format supported by all the systems, some also accept XML Schemas (\xne{COMA++, SF, Cupid, SMatch}),
195 and some provide a GUI (\xne{COMA++, Chimaera, PROMPT, SAMBO, AgreementMaker}).
196
197 Scalability is one factor to be considered, given that in a baseline scenario (before considering efficiency optimisations in candidate generation) the space of possible candidate mappings is the Cartesian product of entities from the two ontologies being aligned. The authors of the (refurbished) ontology matching system \xne{LogMap 2} \cite{jimenez2012large} hold that it implements scalable reasoning and diagnosis algorithms performant enough to be integrated with the provided user interaction.
198
199
200
164 201
165 202 %%%%%%%%%%%%%%%%%%%%%%%%%%5
…
168 203 The Linked Data paradigm\cite{TimBL2006} for publishing data on the web is increasingly being taken up by data providers across many disciplines \cite{bizer2009linked}. \cite{HeathBizer2011} gives a comprehensive overview of the principles of Linked Data with practical examples and current applications.
169 204 170 \subsection{Semantic Web - Technical solutions / Server applications} 205 \subsubsection{Semantic Web - Technical solutions / Server applications} 206 171 207 172 208 The provision of the produced semantic resources on the web requires technical solutions to store the RDF triples, query them efficiently … … 183 219 Another solution worth examining is the \xne{Linked Media Framework}\furl{http://code.google.com/p/lmf/} -- ``easy-to-setup server application that bundles together three Apache open source projects to offer some advanced services for linked media management'': publishing legacy data as linked data, semantic search by enriching data with content from the Linked Data Cloud, using SKOS thesaurus for information extraction. 184 220 221 One more specific work is that of Noah et. al \cite{Noah2010} developing a semantic digital library for an academic institution. The scope is limited to document collections, but nevertheless many aspects seem very relevant for this work, like operating on document metadata, ontology population or sophisticated querying and searching. 222 223 185 224 \begin{comment} 186 225 LDpath\furl{http://code.google.com/p/ldpath/} … … 192 231 \end{comment} 193 232 194 \subs ection{Ontology Visualization}233 \subsubsection{Ontology Visualization} 195 234 196 235 Landscape, Treemap, SOM … … 199 238 200 239 240 %%%%%%%%%%%%%%%% 201 241 \section{Language and Ontologies} 202 242 … … 245 285 and Specification of Data Categories, Data Category Registry (ISO 12620:2009 \cite{ISO12620:2009}) 246 286 247 Lexical Markup Framework LMF \cite{Francopoulo2006LMF, ISO24613:2008} defines a metamodel for representing data in lexical databases used with monolingual and multilingual computer applications .287 Lexical Markup Framework LMF \cite{Francopoulo2006LMF, ISO24613:2008} defines a metamodel for representing data in lexical databases used with monolingual and multilingual computer applications, provides a RDF serialization (?!?!). 
248 288
249 289 An overview of current developments in the application of the linked data paradigm for linguistic data collections was given at the workshop Linked Data in Linguistics\furl{http://ldl2012.lod2.eu/} 2012 \cite{ldl2012}. -
SMC4LRT/chapters/Results.tex
r3671 r3680
178 178
179 179 \subsubsection{teiHeader}
180 180 \label{results:tei}
181 181 TEI is a de-facto standard for encoding any kind of textual resource. It defines a set of elements to annotate individual aspects of the text being encoded. For the purposes of text description / metadata, the complex element \code{teiHeader} is foreseen.
182 182 TEI does not provide just one fixed schema, but allows for a certain flexibility with regard to the elements used and the inner structure, allowing custom schemas adapted to projects' needs to be generated. \ref{def:tei}.
…
281 281 %%%%%%%%%%%%%%%%%%%%%%%
282 282 \subsection{SMC cloud}
283 \label{sec:smc-cloud}
283 284 As the latest, still experimental, addition, SMC browser provides a special type of graph that displays only profiles. The links between them reflect the reuse of components and data categories (i.e. how many components or data categories the linked pairs of profiles share), indicating the degree of similarity or semantic proximity. Figure \ref{fig:SMC_cloud} depicts one possible output of the graph
SMC4LRT/utils.tex
r3666 r3680
19 19 \usepackage{framed}
20 20 \usepackage{amsmath}
21 %\usepackage{tabularx}
21 \usepackage{tabularx}
22 22 \usepackage{tabu}
23 23