Changeset 3671 for SMC4LRT


Timestamp:
10/03/13 22:36:46
Author:
vronk
Message:
 
Location:
SMC4LRT
Files:
7 edited

  • SMC4LRT/Outline.tex

    r3666 r3671  
    8181
    8282\input{chapters/Introduction}
     83\end{comment}
     84\input{chapters/Literature}
    8385
    84 \input{chapters/Literature}
    85 \end{comment}
     86\input{chapters/Definitions}
     87
     88\input{chapters/Data}
     89
    8690\begin{comment}
    87 \input{chapters/Definitions}
    88 \input{chapters/Data}
    8991
    9092\input{chapters/Infrastructure}
     
    9496
    9597\input{chapters/Design_SMCinstance}
    96 \end{comment}
    9798\input{chapters/Results}
    9899
    99100\input{chapters/Conclusion}
     101\end{comment}
    100102
    101103
    102104
    103105\bibliographystyle{ieeetr}
    104 \bibliography{../../2bib/lingua,../../2bib/ontolingua,../../2bib/smc4lrt,../../2bib/semweb,../../2bib/distributed_systems,../../2bib/own}
     106\bibliography{../../2bib/lingua,../../2bib/ontolingua,../../2bib/smc4lrt,../../2bib/semweb,../../2bib/distributed_systems,../../2bib/own, ../../2bib/diglib,../../2bib/it-misc}
    105107
    106108\appendix
    107109
    108 \input{chapters/appendix}
     110%\input{chapters/appendix}
    109111
    110112
  • SMC4LRT/chapters/Data.tex

    r3665 r3671  
    22\chapter{Analysis of the data landscape}
    33\label{ch:data}
    4 This section gives an overview of existing standards and formats for metadata and content annotations in the field of Language Resources and Technology together with a description of their characteristics and their respective usage in the projects and initiatives.
    5 
    6 
    7 \section{Metadata Formats}
    8 
    9 
    10 \subsection{Component Metadata Framework}
     4This section gives an overview of existing standards and formats for metadata in the field of Language Resources and Technology together with a description of their characteristics and their respective usage in the initiatives and data collections. Special attention is paid to the Component Metadata Framework representing the base data model for the infrastructure this work is part of.
     5
     6
     7\section{Component Metadata Framework}
    118\label{def:CMD}
    129
     
    1613indicating unambiguously how the content of the field in a metadata description should be interpreted'' \cite{Broeder+2010}.
    1714
    18 This approach of integrating prerequisites for semantic interoperability directly into the process of metadata creation is fundamentally different from the traditional methods of schema matching that try to establish pairwise alignments between already existing schemas -- be it algorithm-based or by means of explicit manually defined crosswalks\cite{Shvaiko2005}.
     15%This approach of integrating prerequisites for semantic interoperability directly into the process of metadata creation is fundamentally different from the traditional methods of schema matching that try to establish pairwise alignments between already existing schemas -- be it algorithm-based or by means of explicit manually defined crosswalks\cite{Shvaiko2005}.
    1916
    2017While the primary registry for data categories used in CMD is the \xne{ISOcat} Data Category Registry (cf. \ref{def:DCR}), other authoritative sources are accepted (so-called ``trusted registries''), especially the set of terms maintained by the Dublin Core Metadata Initiative \cite{DCMI:2005}.
     
    2421
    2522
    26 \subsubsection{CMD Profiles }
     23\subsection{CMD Profiles }
    2724In the CR 124\footnote{All numbers are as of 2013-06 if not stated otherwise} public Profiles and 696 Components are defined. Table \ref{table:dev_profiles} shows the development of the CR and DCR population over time.
    2825
     
    5552
    5653
    57 \subsubsection{Instance Data}
    58 
     54\subsection{Instance Data}
    5955
    6056%\todoin{ add historical perspective on data - list overall}
     
    6763
    6864\begin{table}
    69 \caption{Top 20 profiles, with the respective number of records}
     65\caption{Top 20 CMD profiles, with the respective number of records}
     66\label{tab:cmd-profiles}
    7067\begin{center}
    7168  \begin{tabular}{ r l }
     
    9996
    10097\begin{table}
    101 \caption{Top 20 collections, with the respective number of records}
     98\caption{Top 20 CMD collections, with the respective number of records}
    10299\begin{center}
    103100  \begin{tabular}{ r l }
     
    133130
    134131
    135 \subsection{Dublin Core + OLAC}
    136 
     132
     133\section{Other Metadata Formats and Collections }
     134
     135
     136Riley and Becker \cite{Riley2010seeing} put the overwhelming number of existing metadata standards into a systematic, comprehensive overview analyzing the use of standards from four aspects: community, domain, function, and purpose. Despite its aspiration to comprehensiveness, it leaves out some of the formats relevant in the context of this work: IMDI, EDM, ESE, TEI?
     137
     138The CLARIN deliverable \textit{Interoperability and Standards} \cite{CLARIN_D5.C-3} provides an overview of standards, vocabularies and other normative/standardization work in the field of Language Resources and Technology.
     139
     140
     141\subsection{Dublin Core metadata terms + OLAC}
     142Since 1995
     143Maintained Dublin Core Metadata Initiative
    137144DC, OLAC
    138145
     146"Dublin" refers to Dublin, Ohio, USA where the work originated during the 1995 invitational OCLC/NCSA Metadata Workshop,[8] hosted by the Online Computer Library Center (OCLC), a library consortium based in Dublin, and the National Center for Supercomputing Applications (NCSA).
     147
     148comes in two versions: 15 core elements and 55 qualified terms ?
     149
     150\begin{quotation}
     151Early Dublin Core workshops popularized the idea of "core metadata" for simple and generic resource descriptions. The fifteen-element "Dublin Core" achieved wide dissemination as part of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and has been ratified as IETF RFC 5013, ANSI/NISO Standard Z39.85-2007, and ISO Standard 15836:2009.
     152\end{quotation}
     153
     154
     155
     156Given its simplicity it is used as the common denominator in many applications, among others it is the base format in the OAI-PMH protocol.
     157
     158It is required/expected as the base
    139159openarchives register: \url{http://www.openarchives.org/Register/BrowseSites}
    140160 2006 OAI-repositories
     
    145165
    146166\label{def:OLAC}
    147 A more specific version of the dublincore terms, adapted to the needs of the linguistic community is the
    148 OLAC\furl{http://www.language-archives.org/}format\cite{Bird2001}
    149 
    150 OLAC \cite{Simons2003OLAC}.
     167
     168The \xne{OLAC Metadata}\furl{http://www.language-archives.org/} format \cite{Bird2001,Simons2003OLAC} is a more specialized version of the \xne{Dublin Core metadata terms}, adapted to the needs of the linguistic community:
     169
     170\begin{quotation}
     171 Uniform description across archives is ensured by limiting the values of certain metadata elements to the use of terms from agreed-upon controlled vocabularies. [\dots] OLAC adds encoding schemes that are designed specifically for describing language resources, such as subject language and linguistic data type.
     172\end{quotation}
     173
     174The \xne{OLAC Metadata} set is the set of metadata elements that participating archives have agreed to use for describing language resources.
    151175
    152176\todoin{check http://www.language-archives.org/OLAC/metadata.html}
    153177
     178 OLAC Archives contain over 100,000 records, covering resources in half of the world's living languages. More statistics on coverage.
     179http://www.language-archives.org/
     180
     181Most of the OLAC records are integrated into CMDI (cf. \ref{tab:cmd-profiles}, \ref{reports:OLAC})
     182
     183
     184\subsection{TEI / teiHeader}
     185\label{def:tei}
     186
    154187\begin{quotation}
    155 The OLAC metadata set is the set of metadata elements that participating archives have agreed to use for describing language resources. Uniform description across archives is ensured by limiting the values of certain metadata elements to the use of terms from agreed-upon controlled vocabularies. The OLAC metadata set is equally applicable whether the resources are available online or not. The metadata set consists of the fifteen elements of the Dublin Core Metadata Set, plus the refinements and encoding schemes of the DCMI Metadata Terms—a widely accepted standard for describing resources of all types. To this general standard, OLAC adds encoding schemes that are designed specifically for describing language resources, such as subject language and linguistic data type. The OLAC Metadata Usage Guidelines describe (with examples) all the elements, refinements, and encoding schemes that may be used in OLAC metadata descriptions. The OLAC Metadata standard defines the XML format that is used for the interchange of metadata descriptions among participating archives.
     188The Text Encoding Initiative (TEI) is a consortium which collectively develops and maintains a standard for the representation of texts in digital form.
    156189\end{quotation}
    157190
    158 
    159 
    160 
    161 \subsection{TEI / teiHeader}
    162 \label{tei}
     191\url{http://www.tei-c.org/}
     192
     193TEI is a de-facto standard for encoding any kind of digital textual resources, developed by a large community since 1994. It defines a set of elements to annotate individual aspects of the text being encoded. For the purposes of text description (metadata encoding), the complex top-level element \code{teiHeader} is foreseen. TEI is descriptive rather than prescriptive: it does not provide just one fixed schema, but allows for a certain flexibility with respect to the elements used and their inner structure, making it possible to generate custom schemas adapted to projects' needs.
     194
     195Thus there is also not just one fixed \xne{teiHeader}.
    163196
    164197 TEI/teiHeader/ODD,
    165198
    166199
     200
    167201\subsection{ISLE/IMDI}
    168  
     202
     203IMDI = ISLE Metadata 
     204http://www.mpi.nl/imdi/
     205
     206The ISLE Meta Data Initiative (IMDI) is a proposed metadata standard to describe multi-media and multi-modal language resources. The standard provides interoperability for browsable and searchable corpus structures and resource descriptions with help of specific tools.
     207
     208Predecessor of CMDI
     209
    169210\subsection{MODS/METS}
    170211
    171 \subsection{Europeana Data Model - EDM}
     212Metadata Encoding and Transmission Standard - an XML schema for encoding descriptive, administrative, and structural metadata regarding objects within a digital library
     213
     214Metadata Object Description Schema - is a schema for a bibliographic element set that may be used for a variety of purposes, and particularly for library applications.
     215
     216\subsection{ESE, Europeana Data Model - EDM}
     217
     218ESE Europeana Semantic Elements-
     219
     220EDM\furl{http://europeana.ontotext.com/resource/edm/hasType?role=all} \cite{doerr2010europeana}
     221
     222
     223The Linked Data approach will play a major role in the European Digital Library (
     224http://europeana.eu
     225)
     226and solutions that can handle data expressed in the newly created, RDF-based
     227Europeana Data Model
     228(EDM)
     229are currently being investigated. This report summarizes the results of a study we performed on existing
     230RDF stores, in the context of Europeana and encompasses the following contributions
     231
     232
     233data.europeana.eu: The Europeana Linked Open Data Pilot\cite{haslhofer2011data}
    172234
    173235\subsection{META-SHARE}
    174 META-SHARE is another multinational project aiming to build an infrastructure for language resource\cite{Piperidis2012meta}, however focusing more on Human Language Technologies domain.\furl{http://meta-share.eu}
    175 
    176 \begin{quotation}
    177 META-NET is designing and implementing META-SHARE, a sustainable network of repositories of language data, tools and related web services documented with high-quality metadata, aggregated in central inventories allowing for uniform search and access to resources. Data and tools can be both open and with restricted access rights, free and for-a-fee. META-SHARE targets existing but also new and emerging language data, tools and systems required for building and evaluating new technologies, products and services.
    178 \end{quotation}
    179 
    180 \begin{quotation}
    181 META-SHARE is an open, integrated, secure and interoperable sharing and exchange facility for LRs (datasets and tools) for the Human Language Technologies domain and other applicative domains where language plays a critical role.
    182 
    183 META-SHARE is implemented in the framework of the META-NET Network of Excellence. It is designed as a network of distributed repositories of LRs, including language data and basic language processing tools (e.g., morphological analysers, PoS taggers, speech recognisers, etc.).
    184 
    185 \end{quotation}
    186 
    187 The distributed networks of repositories consists of a number of member repositories, that offer their own subset of resource.
    188 
    189 A few\footnote{7 as of 2013-07} of the members repositories play the role of managing nodes providing ``a core set of services critical to the whole of the META-SHARE network''\cite{Piperidis2012meta}, especially collecting the resource descriptions from other members and exposing the aggregated information to the users.
    190 The whole network offers approximately 2.000 resources (the numbers differ even across individual managing nodes).
    191 
     236\label{def:META-SHARE}
     237Within the project META-SHARE format
     238
     239META-SHARE created a new metadata model \cite{Gavrilidou2012meta}. Although inspired by the Component Metadata, META-SHARE metadata imposes a single large schema for all resource types with a minimal core subset of obligatory metadata elements and with many optional components.
     240%In cooperation between metadata teams from CLARIN and META-SHARE
     241
     242The original META-SHARE schema actually accommodates four models for different resource types. Consequently, the model has been expressed as four CMD profiles, each for a distinct resource type, with all four sharing most of the components, as can be seen in figure \ref{fig:resource_info_5}. The biggest single profile is currently the remodelled maximum schema from the META-SHARE project for describing corpora, with 117 distinct components and 337 elements. When expanded, this translates to 419 components and 1587 elements. However, many of the components and elements are optional (and conditional), thus a specific instance will never use all the possible elements.
    192243
    193244MetaShare ontology\furl{http://metashare.ilsp.gr/portal/knowledgebase/TheMetaShareOntology}
     
    197248
    198249OAI-ORE - is this a schema?
    199 
    200 
    201 
    202 \section{Content/Annotation Formats}
    203 
    204 CHILDES, TEI, EAF!
    205 (CES/XCES)
    206 Open Annotation Collaboration (OAC)\footnote{\url{http://openannotation.org/}}
    207 
    208 [LAF] Linguistic Annotation Framework
    209250
    210251
     
    235276A broader collection of related initiatives can be found at the German National Library website:
    236277\furl{http://www.dnb.de/DE/Standardisierung/LinksAFS/linksafs_node.html}
    237 FRBR - Functional Requirements for Bibliographic Records
     278FRBR - Functional Requirements for Bibliographic Records  2002 \cite{FRBR1998}
     279
    238280RDA - Resource Description and Access
    239281http://metadaten-twr.org/ - Technology Watch Report: Standards in Metadata and Interoperability (last entry from 2011)
     
    252294
    253295\subsection{Other controlled Vocabularies}
    254 Tagsets: STTS
     296
    255297Language codes ISO-639-1
    256298
     
    260302
    261303
     304\subsubsection{LT-World}
     305Regarding existing domain-specific semantic resources \texttt{LT-World}\footnote{\url{http://www.lt-world.org/}},  the ontology-based portal covering primarily Language Technology being developed at DFKI\footnote{\textit{Deutsches Forschungszentrum für Künstliche Intelligenz} - \url{http://www.dfki.de}},  is a prominent resource providing information about the entities (Institutions, Persons, Projects, Tools, etc.) in this field of study. \cite{Joerg2010}
     306
     307
    262308
    263309\section{LRT Metadata Catalogs/Collections}
     
    278324
    279325
     326
     327\begin{quotation}
     328META-SHARE is an open, integrated, secure and interoperable sharing and exchange facility for LRs (datasets and tools) for the Human Language Technologies domain and other applicative domains where language plays a critical role.
     329
     330META-SHARE is implemented in the framework of the META-NET Network of Excellence. It is designed as a network of distributed repositories of LRs, including language data and basic language processing tools (e.g., morphological analysers, PoS taggers, speech recognisers, etc.).
     331
     332\end{quotation}
     333
     334The distributed network of repositories consists of a number of member repositories that each offer their own subset of resources.
     335
     336A few\footnote{7 as of 2013-07} of the member repositories play the role of managing nodes providing ``a core set of services critical to the whole of the META-SHARE network''\cite{Piperidis2012meta}, especially collecting the resource descriptions from other members and exposing the aggregated information to the users.
     337The whole network offers approximately 2,000 resources (the numbers differ even across individual managing nodes).
     338
     339
     340MetaShare ontology\furl{http://metashare.ilsp.gr/portal/knowledgebase/TheMetaShareOntology}
     341
     342
     343
    280344\subsection{ELRA}
    281345
     346European Language Resources Association
     347
     348\furl{http://elra.info}
     349
     350
     351ELRA’s missions are to promote language resources for the Human Language Technology (HLT) sector, and to evaluate language engineering technologies. To achieve these two major missions, we offer a range of services, listed below and described in the "Services around Language Resources" section:
     352
     353
     354http://www.elda.org/
     355Evaluations and Language resources Distribution Agency
     356
     357ELDA - Evaluations and Language resources Distribution Agency – is ELRA’s operational body, set up to identify, classify, collect, validate and produce the language resources which may be needed by the HLT – Human Language Technology – community. Besides, ELDA is involved in HLT evaluation campaigns.
     358
     359ELDA handles the practical and legal issues related to the distribution of language resources, provides legal advice in the field of HLT, and drafts and concludes distribution agreements on behalf of ELRA.
     360
     361ELRA Catalog
     362
     363http://catalog.elra.info/
     364
     365
     366Universal Catalog+
     367 Universal Catalogue is a repository comprising information regarding Language Resources (LRs) identified all over the world.
     368
     369
    282370\subsection{Other}
    283371
    284372
    285373\begin{description}
    286 \item[LDC]  Linguistic Data Consortium 
     374\item[LDC]  Linguistic Data Consortium\furl{http://www.ldc.upenn.edu/}
    287375\item[OTA LR] Archiving Service provided by Oxford Text Archive \url{http://ota.oucs.ox.ac.uk/}
    288376\end{description}
     
    309397\section{Summary}
    310398
    311 In this chapter, we gave an overview of the existing formats and dataset in the broad context of Language Resources and Technology
    312 
     399In this chapter, we gave an overview of the existing formats and datasets in the broad context of Language Resources and Technology
     400
  • SMC4LRT/chapters/Definitions.tex

    r3665 r3671  
    4848dcr:& http://isocat.org/ns/dcr.rdf\#  \\
    4949cmd: & http://clarin.eu/cmd/1.0\# \\
    50 cmds:    & ? \\
     50cmds: & https://infra.clarin.eu/cmd/general-component-schema.xsd \\
    5151dce: & http://purl.org/dc/elements/1.1/ \\
    5252dcterms: & http://purl.org/dc/terms \\
  • SMC4LRT/chapters/Design_SMCinstance.tex

    r3665 r3671  
    377377\label{sec:lod}
    378378
    379 \todoin{read: Europeana RDF Store Report}
     379
     380
     381\cite{Europeana RDF Store Report}
    380382
    381383Technical aspects (RDF-store?): Virtuoso
  • SMC4LRT/chapters/Infrastructure.tex

    r3665 r3671  
    359359%%%%%%%%%%%%%%%%%
    360360\section{Other aspects of the infrastructure}
    361 While this work concentrates solely on the metadata, it needs to be recognized, that it is only aspect of the infrastructure and its actual purpose the availability of resources. Metadata is a necessary first step to announce and describe the resources. However it is of little value, if the resources themselves are not accessible.
    362 
    363 Consequently, another pillar of the CLARIN infrastructure are the centres\furl{http://www.clarin.eu/node/3812}:
     361While this work concentrates solely on the metadata, it needs to be recognized that metadata is only one aspect of the infrastructure, whose actual purpose is the availability of resources. Metadata is a necessary first step to announce and describe the resources. However, it is of little value if the resources themselves are not accessible. Consequently, another pillar of the CLARIN infrastructure are the centres\furl{http://www.clarin.eu/node/3812}:
     362
    364363\begin{quotation}
    365364CLARIN's distributed network is made out of centres. These units, often a university or an academic institute, offer the scientific community access to services on a sustainable basis.
     
    369368CLARIN also maintains a central registry, the \xne{Centre Registry}\furl{https://centerregistry-clarin.esc.rzg.mpg.de/}, maintaining structured information about every centre, meant as primary entry point into the CLARIN network of centres.
    370369
    371 One core service of such centres are the content repositories, systems meant for long-term preservation and publication of research data and resources.
     370One core service of such centres are the content repositories, systems meant for long-term preservation and publication of research data and resources. A number of centres have been identified that provide Depositing Services\furl{http://clarin.eu/3773}, i.e. allow third-party researchers (not just the home users) to store research data.
     371
     372In the following a few further well established repositories are mentioned.
     373
     374\begin{description}
     375\item[PHAIDRA] Permanent Hosting, Archiving and Indexing of Digital Resources and Assets, provided by Vienna University \footnote{\url{https://phaidra.univie.ac.at/}}
     376\item[eSciDoc]  provided by MPG + FIZ Karlsruhe \footnote{\url{https://www.escidoc.org/}}
     377\item[TextGrid] \furl{http://textgrid.de}
     378\item[DRIVER] pan-European infrastructure of Digital Repositories \footnote{\url{http://www.driver-repository.eu/}}
     379\item[OpenAIRE] - Open Access Infrastructure for Research in Europe \footnote{\url{http://www.openaire.eu/}}
     380\end{description}
    372381
    373382
     
    376385\includegraphics[width=0.7\textwidth]{images/FCS_components.png}
    377386\end{center}
    378 \caption{components of the Federated Content Search}
     387\caption{Components of the Federated Content Search and their interdependencies}
    379388\label{fig:fcs}
    380389\end{figure*}
  • SMC4LRT/chapters/Literature.tex

    r3638 r3671  
    44%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    55
    6 This work is guided by two main dimensions: the \textbf{data} -- in broad, Language Resource and Technology  -- and the \textbf{method} -- Schema matching and Semantic Web technologies. This division is reflected in the following chapter:
    7 
    8 \section{(Infrastructure for) Language Resources and Technology}
    9 In recent years, multiple large-scale initiatives have been set out to combat the fragmented nature of the language resources landscape in general and the metadata interoperability problems in particular.
    10 
    11 The CLARIN project also delivers a valuable source of information on the normative resources in the domain in its current deliverable on \textit{Interoperability and Standards} \cite{CLARIN_D5.C-3}. Next to covering ontologies as one type of resources this document offers an exhaustive collection of references to standards, vocabularies and other normative/standardization work in the field of Language Resources and Technology.
    12 
    13 Regarding existing domain-specific semantic resources \texttt{LT-World}\footnote{\url{http://www.lt-world.org/}},  the ontology-based portal covering primarily Language Technology being developed at DFKI\footnote{\textit{Deutsches Forschungszentrum für Künstliche Intelligenz} - \url{http://www.dfki.de}},  is a prominent resource providing information about the entities (Institutions, Persons, Projects, Tools, etc.) in this field of study. \cite{Joerg2010}
    14 Chapter \ref{ch:data} examines the field of LRT in more detail.
    15 
    16 
    17 \subsection{Metadata}
    18 A comprehensive architecture for harmonized handling of metadata -- the Component Metadata Infrastructure (CMDI)\furl{http://www.clarin.eu/cmdi} \cite{Broeder2011} -- is being implemented within the CLARIN project\footnote{\url{http://clarin.eu}}. This service-oriented architecture consisting of a number of interacting software modules allows metadata creation and provision based on a flexible meta model, the \emph{Component Metadata Framework}, that facilitates creation of customized metadata schemas -- acknowledging that no one metadata schema can cover the large variety of language resources and usage scenarios -- however at the same time equipped with well-defined methods to ground their semantic interpretation in a community-wide controlled vocabulary -- the data category registry \cite{Kemps-Snijders+2009,Broeder2010}.
    19 
    20 Individual components of this infrastructure will be described in more detail in the section \ref{ch:infra}.
    21 
    22 A number of solution evolved in the recent years.
    23 The first to undertake standardization efforts for the exchange of catalog information were digital libraries.
    24 
    25 Z39.50 as base protocol, Worldcat, mapping/configuration files.
    26 These catalogs are further described in the section \ref{sec:other-md-catalogs}
    27 
    28 In the recent years the evolving research infrastructures all identified a common/harmonized search as a crucial component of the system and came up with a number of solutions, however often reduced to collecting metadata, reducing to dublincore
    29 and offering a lucene/solr based facetted search.
    30 These catalogs are further described in the section \ref{sec:lrt-md-catalogs}.
    31 
    32 Riley and Becker \cite{Riley2010seeing} put the overwhelming amount of existing metadata standards into a systematic comprehensive overview analyzing the use of standards from four aspects: community, domain, function, and purpose.
    33 
    34 \subsection{Content Repositories}
    35 Metadata is only one aspect of the availability of resources. It is the first step to announce and describe the resources. However it is of little value, if the resources themselves are not equally well accessible. Thus another pillar of the CLARIN infrastructure are Content Repositories - centres to ensure availability of resources.
    36 In the following a few well established repositories are mentioned and described, as well as some of the new repositories being set up in the context of CLARIN.
    37 
    38 \begin{description}
    39 \item[PHAIDRA] Permanent Hosting, Archiving and Indexing of Digital Resources and Assets, provided by Vienna University \footnote{\url{https://phaidra.univie.ac.at/}}
    40 \item[eSciDoc]  provided by MPG + FIZ Karlsruhe \footnote{\url{https://www.escidoc.org/}}
    41 \item[TextGrid] \todocode{install: TextGrid2 - check: TG-search}\furl{http:/textgrid.de}
    42 \item[DRIVER] pan-European infrastructure of Digital Repositories \footnote{\url{http://www.driver-repository.eu/}}
    43 \item[OpenAIRE] - Open Acces Infrastructure for Research in Europe \footnote{\url{http://www.openaire.eu/}}
    44 \end{description}
    45 
    46 \subsection{Content/Corpus Search}
    47 Corpus Search Systems
    48 \begin{description}
    49 \item[DDC]  - text-corpus
    50 \item[manatee] - text-corpus
    51 \item[CQP] - text-corps
    52 \item[TROVA] - MM annotated resources
    53 \item[ELAN] - MM annotated resources (editor + search)
    54 \end{description}
    55 
    56 \subsection{FederatedSearch}
    57 \todoask{How to relate Federated Search to SMC? }
    58 
    59 
    60 \section{Semantic Web}
    61 
    62 \todoin{cite TimBL}
    63 
    64 \begin{description}
    65 \item[RDF/OWL]
    66 \item[SKOS]
    67 \end{description}
    68 
    69 
    70 \subsection{Linked Open Data}
    71 As described previously, one outcome of the work will be the dataset expressed in RDF interlinked with other semantic resources.
    72 This is very much in line with the broad \textit{Linked Open Data} effort as proposed by Berners-Lee \cite{TimBL2006} and being pursuit across many discplines. (This topic is supported also by the EU Commission within the FP7.\footnote{\url{http://cordis.europa.eu/fetch?CALLER=PROJ\_ICT&ACTION=D&CAT=PROJ&RCN=95562}}) A very recent comprehensive overview of the principles of Linked Data and current applications is the book by Heath and Bizer \cite{HeathBizer2011}, that shall serve as a practical guide for this specific task.
    73 
    74 Formate:
    75 Turtle \furl{http://www.w3.org/TeamSubmission/turtle/\#sec-grammar-comments}
    76 RDFa\furl{http://en.wikipedia.org/wiki/RDFa}
    77 EDM\furl{http://europeana.ontotext.com/resource/edm/hasType?role=all}
    78 
    79 
    80 \todocite{http://ldl2012.lod2.eu/program/proceedings}
    81 \todoin{check LDpath}\furl{http://code.google.com/p/ldpath/}
    82 
    83 
    84 \subsection{Schema / Ontology Mapping}
    85 As the main contribution shall be the application of \emph{ontology mapping} techniques and technology, a comprehensive overview of this field and current developments is paramount. There seems to be a plethora of work on the topic and the difficult task will be to sort out the relevant contributions. The starting point for the investigation will be the overview of the field by Kalfoglou \cite{Kalfoglou2003} and a more recent summary of the key challenges by Shvaiko and Euzenat \cite{Shvaiko2008}.
    86 
    87 In their rather theoretical work Ehrig and Sure \cite{EhrigSure2004} elaborate on the various similarity measures which are at the core of the mapping task. On the dedicated platform OAEI\footnote{Ontology Alignment Evalution Intiative - \url{http://oaei.ontologymatching.org/}} an ongoing effort is being carried out and documented comparing various alignment methods applied on different domains.
     6In this chapter we give a short overview of the development of large research infrastructures (with a focus on those for language resources and technology), then we examine in more detail the host of work (methods and systems) on schema/ontology matching
     7and review Semantic Web principles and technologies.
     8
     9Note though that substantial parts of the state-of-the-art coverage are moved into separate chapters: a broad analysis of the data is provided in chapter \ref{ch:data} and a detailed description of the underlying infrastructure is found in chapter \ref{ch:infra}.
     10
     11%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     12\section{Research Infrastructures (for Language Resources and Technology)}
     13In recent years, multiple large-scale initiatives have set out to combat the fragmented nature of the language resources landscape in general and the metadata interoperability problems in particular.
     14
     15\xne{EAGLES/ISLE Meta Data Initiative} (IMDI) \cite{wittenburg2000eagles}, running from 2000 to 2003, proposed a standard for metadata descriptions of Multi-Media/Multi-Modal Language Resources, aiming to ease access to language resources and thus increase their reusability.
     16
     17\xne{FLaReNet}\furl{http://www.flarenet.eu/} -- Fostering Language Resources Network -- running from 2007 to 2010, concentrated rather on ``community and consensus building'', developing a common vision and mapping the field of LRT via a survey.
     18
     19\xne{CLARIN} -- Common Language Resources and Technology Infrastructure -- is a large research infrastructure providing sustainable access for scholars in the humanities and social sciences to digital language data. CLARIN, and especially its technical core, the Component Metadata Infrastructure (CMDI) -- a comprehensive architecture for harmonized handling of metadata \cite{Broeder2011} --,
     20is the primary context of this work; the description of this underlying infrastructure is therefore detailed in the separate chapter \ref{ch:infra}.
     21Both above-mentioned projects can be seen as predecessors to CLARIN, the IMDI metadata model being one starting point for the development of CMDI.
     22
     23More of a sister project is the initiative \xne{DARIAH} -- Digital Research Infrastructure for the Arts and Humanities\furl{http://dariah.eu}. It has a broader scope, but has many personal ties to CLARIN and faces similar problems with similar solutions. Therefore there are efforts to intensify the cooperation between these two research infrastructures for digital humanities.
     24
     25\xne{META-SHARE} is another multinational project aiming to build an infrastructure for language resources \cite{Piperidis2012meta}, however focusing more on the Human Language Technologies domain.\furl{http://meta-share.eu}
     26
     27\begin{quotation}
     28META-NET is designing and implementing META-SHARE, a sustainable network of repositories of language data, tools and related web services documented with high-quality metadata, aggregated in central inventories allowing for uniform search and access to resources. Data and tools can be both open and with restricted access rights, free and for-a-fee.
     29\end{quotation}
     30
     31See \ref{def:META-SHARE} for more details about META-SHARE's catalog and metadata format.
     32
     33
     34\subsubsection{Digital Libraries}
     35
     36In a broader view we should also regard the activities in the world of libraries.
     37Having started as early as the 1970s with connecting, exchanging and harmonizing their bibliographic catalogs, libraries certainly have a long tradition, a wealth of experience and stable solutions.
     38
     39Mainly driven by national libraries, ever larger aggregations of bibliographic data are being set up.
     40 The biggest one is \xne{WorldCat}\furl{http://www.worldcat.org/} (totalling 273.7 million records \cite{OCLCAnnualReport2012}),
     41powered by OCLC, a cooperative of over 72,000 libraries worldwide.
     42
     43In Europe, more recent initiatives have pursued similar goals:
     44\xne{The European Library}\furl{http://www.theeuropeanlibrary.org/tel4/} offers a search interface over more than 18 million digital items and almost 120 million bibliographic records from 48 National Libraries and leading European Research Libraries.
     45
     46\xne{Europeana}\furl{http://www.europeana.eu/} \cite{purday2009think} has even broader scope, serving as meta-aggregator and portal for European digitised works, encompassing material not just from libraries, but also museums, archives and all other kinds of collections (In fact, The European Library is the \emph{library aggregator} for Europeana). The auxiliary project \xne{EuropeanaConnect}\furl{http://www.europeanaconnect.eu/} (2009-2011) delivered the core technical components for Europeana as well as further services reusable in other contexts, e.g. the spatio-temporal browser \xne{GeoTemCo}\furl{https://github.com/stjaenicke/GeoTemCo} \cite{janicke2013geotemco}.
     47
     48Most recently, with \xne{Europeana Cloud}\furl{http://pro.europeana.eu/web/europeana-cloud} (2013 to 2015) a successor project to \xne{Europeana} was established: a Best Practice Network, coordinated by The European Library, designed to establish a cloud-based system for Europeana and its aggregators, providing new content, new metadata, a new linked storage system, new tools and services for researchers and a new platform, Europeana Research.
     49
     50A number of catalogs and formats are further described in section \ref{sec:other-md-catalogs}.
     51
     52
     53\section{Schema / Ontology Mapping/Matching}
     54
     55Schema or ontology matching provides the methodical foundation for the problem at hand, \emph{semantic mapping}.
     56As Shvaiko \cite{shvaiko2012ontology} states, it is ``a solution to the semantic heterogeneity problem. It finds correspondences between semantically related entities of ontologies.''
     57
     58One starting point for the plethora of work in the field of \emph{schema and ontology mapping} techniques and technology
     59is the overview of the field by Kalfoglou \cite{Kalfoglou2003}.
     60Shvaiko and Euzenat provide a summary of the key challenges\cite{Shvaiko2008} as well as a comprehensive survey of approaches for schema and ontology matching based on a proposed new classification of schema-based matching techniques\cite{Shvaiko2005_classification}.
     61Noy \cite{Noy2005_ontologyalignment,Noy2004_semanticintegration}
     62and, more recently, \cite{shvaiko2012ontology}
     63and \cite{amrouch2012survey} provide surveys of the methods and systems in the field.
     64
     65\paragraph{Methods}
     66Semantic and extensional methods are still rarely
     67employed by the matching systems. In fact, most of
     68the approaches are based only on
     69terminological and structural methods.
     70
     71\cite{Algergawy2010} classify, review, and experimentally compare the major element similarity measures and their combinations.
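To make the notion of a terminological method more concrete, the following is a minimal Python sketch that combines two simple name-based similarity measures for schema element names. It is an illustration only and not taken from any of the cited systems; the function names, the equal weighting and the example element names are assumptions.

```python
from difflib import SequenceMatcher


def tokens(name):
    """Split an element name such as 'resourceTitle' or 'resource_title' into lowercase tokens."""
    out, current = [], ""
    for ch in name:
        if ch in "_-. ":
            # Separator characters end the current token.
            if current:
                out.append(current.lower())
            current = ""
        elif ch.isupper() and current and not current[-1].isupper():
            # A lower-to-upper transition (camelCase) starts a new token.
            out.append(current.lower())
            current = ch
        else:
            current += ch
    if current:
        out.append(current.lower())
    return out


def string_similarity(a, b):
    """Character-level similarity in [0, 1] based on difflib's matching blocks."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_similarity(a, b):
    """Jaccard overlap of the token sets of two element names."""
    ta, tb = set(tokens(a)), set(tokens(b))
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def combined_similarity(a, b, weight=0.5):
    """Weighted average of the two terminological measures (the weight is an assumption)."""
    return weight * string_similarity(a, b) + (1 - weight) * token_similarity(a, b)
```

A call such as `combined_similarity("resourceTitle", "resource_title")` scores close to 1, since both names yield the same tokens. Structural, extensional or semantic evidence would have to be added as further measures and is not covered by this sketch.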
     72
     73\subsubsection{Systems}
     74A number of existing systems for schema/ontology matching/alignment are mentioned in these overview publications:
     75
     76The majority of tools for ontology mapping use some sort of structural or
     77definitional information to discover new mappings. This information includes
     78such elements as subclass–superclass relationships, domains and ranges of
     79properties, analysis of the graph structure of the ontology, and so on. Some of
     80the tools in this category include
     81
     82IF-Map\cite{kalfoglou2003if}
     83
     84QOM\cite{ehrig2004qom},
     85
     86Similarity Flooding\cite{melnik}
     87
     88the Prompt tools \cite{Noy2003_theprompt} integrating with Protege
     89
     90
     91\xne{COMA++} \cite{Aumueller2005} takes a composite approach that combines different match algorithms, offers user interaction via a graphical interface, and supports W3C XML Schema and OWL.
     92
     93\xne{FOAM}\cite{EhrigSure2005}
     94
     95
     96The ontology matching system \xne{LogMap 2} \cite{jimenez2012large} supports user interaction and implements scalable reasoning and diagnosis algorithms, which minimise any logical inconsistencies introduced by the matching process.
     97The process is divided into two main logical phases: computation of mapping candidates (maximise recall) and assessment of the candidates (maximise precision).
     98
     99
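     100In their rather theoretical work Ehrig and Sure \cite{EhrigSure2004} elaborate on the various similarity measures which are at the core of the mapping task.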
     101
     102
     103On the dedicated platform OAEI\footnote{Ontology Alignment Evaluation Initiative - \url{http://oaei.ontologymatching.org/}} an ongoing effort is being carried out and documented, comparing various alignment methods applied to different domains.
     104
    88105
    89106One more specific recent inspirational work is that of Noah et al. \cite{Noah2010}, developing a semantic digital library for an academic institution. The scope is limited to document collections, but nevertheless many aspects seem very relevant for this work, like operating on document metadata, ontology population or sophisticated querying and searching.
    90107
    91 \todoin{check if relevant: http://schema.org/}
     108Matching is a laborious and error-prone process, and once ontology
     109mappings are discovered, i
     110
     111\subsection{MOVEOUT: Application of Schema Matching on the CMD domain}
     112Note that the semantic interoperability layer built into the core of the CMD Infrastructure integrates the
     113task of identifying semantic correspondences directly into the process of schema creation,
     114largely removing the need for complex schema matching/mapping techniques in post-processing.
     115However, this only holds for schemas already created within the CMD framework.
     116Given the growing universe of definitions (data categories and components) in the CMD framework, the metadata modeller could very well profit from applying schema mapping techniques as a pre-processing step in the task of integrating existing external schemas into the infrastructure. User involvement is identified by \cite{shvaiko2012ontology} as one of the promising future challenges to ontology matching.
     117
     118Such a procedure acknowledges the fact that the mapping techniques are mostly error-prone and can deliver reliable 1:1 alignments only in trivial cases. This lies in the nature of the problem: given the heterogeneity of the schemas present in the data collection, full alignments are not achievable at all, since only parts of individual schemas actually correspond semantically.
     119
     120Once all the equivalences (and other relations) between the profiles/schemas have been found, similarity ratios can be determined.
     121
     122The task can also be seen as building a bridge between XML resources and semantic resources expressed in RDF or OWL.
     123This speaks for a tool like COMA++ supporting both W3C standards:  XML Schema and OWL.
     124Concentration on existing systems with user interface?
     125
     126The process of expressing the whole of the data as one semantic resource can also be understood as a schema or ontology merging task, with data categories being the primary mapping elements.
     127
     128
     129In the end
     130It is also not the goal to merge
     131
     132Since this is only a pre-processing step meant to provide suggestions to the human modeller, recall is more important than precision.
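The trade-off between recall and precision can be illustrated with a small Python sketch. All element names, similarity scores and the reference alignment below are hypothetical and serve only to show how lowering the acceptance threshold for suggestions raises recall at the cost of precision.

```python
def precision_recall(scored_pairs, reference, threshold):
    """Return (precision, recall) of all pairs scoring at or above the threshold."""
    suggested = {pair for pair, score in scored_pairs.items() if score >= threshold}
    correct = suggested & reference
    precision = len(correct) / len(suggested) if suggested else 1.0
    recall = len(correct) / len(reference) if reference else 1.0
    return precision, recall


# Hypothetical similarity scores between elements of an external schema
# and existing definitions in the CMD framework.
scored_pairs = {
    ("title", "resourceTitle"): 0.9,
    ("creator", "author"): 0.4,
    ("lang", "languageName"): 0.5,
    ("date", "dataType"): 0.45,
    ("format", "title"): 0.1,
}

# Hypothetical alignment as it would be confirmed by a human modeller.
reference = {
    ("title", "resourceTitle"),
    ("creator", "author"),
    ("lang", "languageName"),
}

strict = precision_recall(scored_pairs, reference, threshold=0.8)
lenient = precision_recall(scored_pairs, reference, threshold=0.4)
print("strict  (precision, recall):", strict)
print("lenient (precision, recall):", lenient)
```

With the strict threshold only one of the three correct correspondences is suggested, whereas the lenient threshold suggests all three together with one false candidate that the modeller can simply reject.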
     133
     134
     135
     136infrastructure un
     137
     138This approach of integrating prerequisites for semantic interoperability directly into the process of metadata creation is fundamentally different from the traditional methods of schema matching that try to establish pairwise alignments between already existing schemas -- be it algorithm-based or by means of explicit manually defined crosswalks\cite{Shvaiko2005}.
     139
     140
     141Application of ontology/schema matching/mapping techniques
     142is reduced or outsourced
     143
    92144
    93145\subsection{Existing Crosswalk services}
    94146
     147
    95148\url{http://www.oclc.org/developer/services/metadata-crosswalk-service}
    96149
    97 http://semanticweb.org/wiki/VoID
     150
     151VoID (``Vocabulary of Interlinked Datasets'') is an RDF-based schema to describe linked datasets.\furl{http://semanticweb.org/wiki/VoID}
     152
    98153http://www.dnb.de/rdf
    99154
     155
     156the entire WorldCat cataloging collection made publicly
     157available using Schema.org mark-up with library extensions for use by developers and
     158search partners such as Bing, Google, Yahoo! and Yandex
     159
     160OCLC begins adding linked data to WorldCat by appending
     161Schema.org descriptive mark-up to WorldCat.org pages, thereby
     162making OCLC member library data available for use by intelligent
     163Web crawlers such as Google and Bing
     164
     165%%%%%%%%%%%%%%%%%%%%%%%%%%5
     166\section{Semantic Web -- Linked Open Data}
     167
     168The Linked Data paradigm \cite{TimBL2006} for publishing data on the web is increasingly being taken up by data providers across many disciplines \cite{bizer2009linked}. \cite{HeathBizer2011} gives a comprehensive overview of the principles of Linked Data with practical examples and current applications.
     169
     170\subsection{Semantic Web - Technical solutions / Server applications}
     171
     172The provision of the produced semantic resources on the web requires technical solutions to store the RDF triples, query them efficiently
     173and ideally expose them via a web interface to the users.
     174
     175Meanwhile a number of RDF triple store solutions relying on native, DBMS-backed or hybrid persistence layers are available: open-source solutions like \xne{Jena, Sesame} or \xne{BigData} as well as a number of commercial solutions such as \xne{AllegroGraph, OWLIM, Virtuoso}.
     176
     177A qualitative and quantitative study\cite{Haslhofer2011europeana}   in the context of Europeana evaluated a number of RDF stores (using the whole Europeana EDM data set = 382,629,063 triples as data load) and came to the conclusion, that ``certain RDF stores, notably OpenLink Virtuoso and 4Store'' can handle the large test dataset.
     178
     179\xne{OpenLink Virtuoso Universal Server}\furl{http://virtuoso.openlinksw.com} is a hybrid storage solution for a range of data models, including relational data, RDF, XML and free text documents \cite{Erling2009Virtuoso, Haslhofer2011europeana}.
     180Virtuoso is used to host many important Linked Data sets (e.g., DBpedia\furl{http://dbpedia.org} \cite{auer2007dbpedia}).
     181Virtuoso is offered under both commercial and open-source license models.
     182
     183Another solution worth examining is the \xne{Linked Media Framework}\furl{http://code.google.com/p/lmf/} -- ``easy-to-setup server application that bundles together three Apache open source projects to offer some advanced services for linked media management'': publishing legacy data as linked data, semantic search by enriching data with content from the Linked Data Cloud, using SKOS thesaurus for information extraction.
     184
     185\begin{comment}
     186LDpath\furl{http://code.google.com/p/ldpath/}
     187`` a simple path-based query language similar to XPath or SPARQL Property Paths that is particularly well-suited for querying and retrieving resources from the Linked Data Cloud by following RDF links between resources and servers. ''
     188
     189Linked Data browser
     190
     191Haystack\furl{http://en.wikipedia.org/wiki/Haystack_(PIM)}
     192\end{comment}
     193
    100194\subsection{Ontology Visualization}
    101195
     
    105199
    106200
    107 \subsection{Linguistic Ontologies}
    108 
    109 A special case are Linguistic Ontologies: isocat, GOLD, WALS.info
    110 ontologies conceptualizing the linguistic domain
    111 
    112 They are special in that (``ontologized'') Lexicons refer to them to describe linguistic properties of the Lexical Entries, as opposed to linking to Domain Ontologies to anchor Senses/Meanings.
    113 Lexicalized Ontologies: LingInfo, lemon: LMF +  isocat/GOLD +  Domain Ontology
    114 
    115 a) as domain ontologies, describing aspects of the Resources\\
    116 b) as linguistic ontologies enriching the Lexicalization of Concepts
    117 
    118 Ontology and Lexicon \cite{Hirst2009}
    119 
    120 LingInfo/Lemon \cite{Buitelaar2009}
    121 
    122 We shouldn't need linguistic ontologies (LingInfo, LEmon), they are primarily relevant in the task of ontology population from texts, where the entities can be encountered in various word-forms in the context of the text.
    123 (Ontology Learning, Ontology-based Semantic Annotation of Text)
    124 And we are dealing with highly structured data with referenced in their nominal(?) form.
    125 
     201\section{Language and Ontologies}
     202
     203There are two different kinds of relations between language or linguistics and ontologies: a) `linguistic ontologies', i.e. domain ontologies conceptualizing the linguistic domain and capturing aspects of linguistic resources; b) `lexicalized' ontologies, where ontology entities are enriched with linguistic, lexical information.
     204
     205\subsubsection{Linguistic ontologies}
     206
     207One prominent instance of a linguistic ontology is the \xne{General Ontology for Linguistic Description} or GOLD \cite{Farrar2003}\furl{http://linguistics-ontology.org},
     208that ``gives a formalized account of the most basic categories and relations (the `atoms') used in the scientific description of human language, attempting to codify the general knowledge of the field''. The motivation is to ``facilitate automated reasoning over linguistic data and help establish the basic concepts through which intelligent search can be carried out''.
     209
     210In line with the aspiration ``to be compatible with the general goals of the Semantic Web'', the dataset is provided via a web application as well as a dump in OWL format\furl{http://linguistics-ontology.org/gold-2010.owl} \cite{GOLD2010}.
     211
     212
     213Founded in 1934, SIL International\furl{http://www.sil.org/about-sil} (originally known as the Summer Institute of Linguistics, Inc) is a leader in the identification and documentation of the world's languages. Results of this research are published in Ethnologue: Languages of the World\furl{http://www.ethnologue.com/} \cite{grimes2000ethnologue}, a comprehensive catalog of the world's nearly 7,000 living languages. SIL also maintains the Language \& Culture Archives, a large collection of all kinds of resources in the ethnolinguistic domain.\furl{http://www.sil.org/resources/language-culture-archives}
     214
     215The World Atlas of Language Structures (WALS)\furl{http://WALS.info} \cite{wals2011}
     216is ``a large database of structural (phonological, grammatical, lexical) properties of languages gathered from descriptive materials (such as reference grammars)''. WALS first appeared in 2005; the current online version, published in 2011, provides a compendium of detailed expert definitions of individual linguistic features, accompanied by a sophisticated web interface integrating the information on linguistic features with their occurrence in the world's languages and their geographical distribution.
     217
     218Simons \cite{Simons2003developing} developed a Semantic Interpretation Language (SIL) that is used to define the meaning of the elements and attributes in an XML markup schema in terms of abstract concepts defined in a formal semantic schema.
     219Extending this work, Simons et al. \cite{Simons2004semantics} propose a method for mapping linguistic descriptions in plain XML into semantically rich RDF/OWL, employing the GOLD ontology as the target semantic schema.
     220
     221These ontologies can be used by (``ontologized'') lexicons, which refer to them to describe linguistic properties of the lexical entries, as opposed to linking to domain ontologies to anchor senses/meanings.
     222
     223
     224Work on the Semantic Interpretation Language as well as the GOLD ontology can be seen as a conceptual predecessor of the Data Category Registry, an ISO-standardized procedure for defining and standardizing ``widely accepted linguistic concepts'' that is at the core of CLARIN's metadata infrastructure (cf. \ref{def:DCR}).
     225It is not exactly an ontology in the common sense:
     226(by design) this registry does not contain any relations between concepts. However,
     227the central entities are concepts and not lexical items, thus it can be seen as a proto-ontology.
     228Another indication of the heritage is the fact that concepts of the GOLD ontology were migrated into ISOcat (495 items) in 2010.
     229
     230Note that although this work is concerned with language resources, it operates primarily on the metadata level, thus the overlap with linguistic ontologies codifying the terminology of the discipline of linguistics is rather marginal (perhaps on the level of description of specific linguistic aspects of given resources).
     231
     232\subsubsection{Lexicalised ontologies, ``ontologized'' lexicons}
     233
     234
     235The other type of relation between ontologies and linguistics or language are lexicalised ontologies. Hirst \cite{Hirst2009} elaborates on the differences between ontology and lexicon and the possibility to reuse lexicons for development of ontologies.
     236
     237In a number of works Buitelaar, McCrae et al. \cite{Buitelaar2009, buitelaar2010ontology, McCrae2010c, buitelaar2011ontology, Mccrae2012interchanging} argue for ``associating linguistic information with ontologies'' or ``ontology lexicalisation'' and draw attention to lexical and linguistic issues in knowledge representation in general. This basic idea lies behind the series of proposed models \xne{LingInfo}, \xne{LexOnto}, \xne{LexInfo} and, most recently, \xne{lemon}, aimed at allowing complex lexical information for such ontologies and at describing the relationship between the lexicon and the ontology.
     238The most recent in this line, \xne{lemon} or \xne{lexicon model for ontologies}, defines ``a formal model for the proper representation of the continuum between: i) ontology semantics; ii) terminology that is used to convey this in natural
     239language; and iii) linguistic information on these terms and their constituent lexical units'', in essence enabling the creation of a lexicon for a given ontology. It adopts the principle of ``semantics by reference'': no complex semantic
     240information needs to be stated in the lexicon, which results in
     241a clear separation of the lexical layer and the ontological layer.
     242
     243Lemon builds on existing work, among others the LexInfo and LIR ontology-lexicon models,
     244and in particular on global standards: the W3C standard SKOS (Simple Knowledge Organization System) \cite{SKOS2009} and the ISO standards Lexical Markup Framework (ISO 24613:2008 \cite{ISO24613:2008})
     245and Specification of Data Categories, Data Category Registry (ISO 12620:2009 \cite{ISO12620:2009}).
     246
     247Lexical Markup Framework LMF \cite{Francopoulo2006LMF, ISO24613:2008} defines a metamodel for representing data in lexical databases used with monolingual and multilingual computer applications.
     248
     249An overview of current developments in application of the linked data paradigm for linguistic data collections was given at the  workshop Linked Data in Linguistics\furl{http://ldl2012.lod2.eu/} 2012 \cite{ldl2012}.
     250
     251
     252The primary motivation for linguistic ontologies like \xne{lemon} are tasks such as ontology-based information extraction and ontology learning and population from text, where the entities are often referred to by non-nominal word forms and with ambiguous semantics. Given that the discussed collection contains mainly highly structured data referencing entities in their nominal form, linguistic ontologies are not directly relevant for this work.
    126253
    127254
  • SMC4LRT/chapters/Results.tex

    r3665 r3671  
    117117
    118118\subsubsection{dublincore / OLAC}
    119 
     119\label{reports:OLAC}
    120120Very widely used (because) simple format
    121 \ref{info:olac-records}
     121\ref{def:OLAC}
     122%\ref{info:olac-records}
    122123
    123124Here the problem of proliferation seems especially virulent. Table \ref{table:dcterms-profiles} lists all the profiles modelling dcterms.
     
    179180
    180181TEI is a de-facto standard for encoding any kind of textual resources. It defines a set of elements to annotate individual aspects of the text being encoded. For the purposes of text description / metadata the complex element \code{teiHeader} is foreseen.
    181 TEI does not provide just one fixed schema, but allows for a certain flexibility with respect to the elements used and the inner structure, making it possible to generate custom schemas adapted to projects' needs.
     182TEI does not provide just one fixed schema, but allows for a certain flexibility with respect to the elements used and the inner structure, making it possible to generate custom schemas adapted to projects' needs (cf. \ref{def:tei}).
    182183Thus there is also not just one fixed \xne{teiHeader}.
    183184