% source: SMC4LRT/chapters/Data.tex @ 2703
% Last change on this file since 2703 was 2703, checked in by vronk, 11 years ago
% adding figures, big reorganization of the content structure;
% some new text on Controlled Vocabularies, Underlying Infrastructure.

\chapter{Analysis of the data landscape}
\label{ch:data}
This chapter gives an overview of existing standards and formats for metadata and content annotation in the field of Language Resources and Technology (LRT), describes their characteristics, and outlines their use in the relevant projects and initiatives.

\section{Metadata Formats}

\subsection{CMD-Framework}

\begin{center}
  \begin{tabular}{ l | r }
    \hline
    created & 2013-01-26 \\ \hline
    Profiles & 87 \\
    Components & 2904 \\
    distinct Components & 542 \\
    Elements & 5754 \\
    distinct Elements & 1505 \\
    distinct DatCats & 436 \\
    Elements with DatCats & 1183 \\
    Elements without DatCats & 323 \\
    ratio of elements without DatCats & 21.46 \% \\
    available Concepts & 893 \\
    used Concepts & 474 \\
    blind Concepts (not in public ISOcat) & 190 \\
    Concepts not used in CMD & 539 \\
    \hline
  \end{tabular}
\end{center}
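
As a quick consistency check on the table above, the reported share of elements without data categories appears to be taken relative to the number of distinct elements. This denominator is an assumption, as the table does not state it:
\[
  \frac{323}{1505} \approx 0.2146 = 21.46\,\%
\]
Note that the counts of elements with and without DatCats (1183 and 323) sum to 1506 rather than 1505, so one of the figures may be off by one.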

\todoin{Collect numbers about CMD-Framework (profiles, datcats) + historical development}

\todoin{Collect numbers about CMD records (collections, used profiles, ...) in historical perspective}

\subsection{Dublin Core + OLAC}

DC, OLAC

Dublin Core Resource Types\furl{http://dublincore.org/documents/resource-typelist/}

\subsection{TEI / teiHeader}
TEI/teiHeader/ODD

\subsection{ISLE/IMDI}

\subsection{MODS/METS}

\subsection{Europeana Data Model (EDM)}

\subsection{Other}

OAI-ORE - is this a schema?

\section{Content/Annotation Formats}

CHILDES, TEI, EAF!
(CES/XCES)
Open Annotation Collaboration (OAC)\footnote{\url{http://openannotation.org/}}

[LAF] Linguistic Annotation Framework

\section{Ontologies, Controlled Vocabularies, Reference Data, Authority Files}
\label{refdata}

Based on popular demand, the work on reference data for the SSH community should cover at least the following dimensions (each listed with tentative candidates among existing vocabularies):

\begin{itemize}
\item Data Categories / Concepts - ISOcat
\item Languages - ISO 639
\item Countries - country codes
\item Persons - GND, VIAF
\item Organizations - GND, VIAF
\item Subject headings (Schlagwörter) - GND, LCSH
\item Resource Typology -
\end{itemize}

\begin{description}
\item[AAT] Art \& Architecture Thesaurus
\item[GND] Gemeinsame Normdatei (Integrated Authority File)
\item[GTAA] Gemeenschappelijke Thesaurus Audiovisuele Archieven (Common Thesaurus [for] Audiovisual Archives)
\item[VIAF] Virtual International Authority File
\end{description}

Other relevant activities and initiatives:

\begin{itemize}
\item A broader collection of related initiatives can be found at the German National Library website:
\furl{http://www.dnb.de/DE/Standardisierung/LinksAFS/linksafs_node.html}
\item FRBR - Functional Requirements for Bibliographic Records
\item RDA - Resource Description and Access
\item Technology Watch Report: Standards in Metadata and Interoperability (last entry from 2011), \url{http://metadaten-twr.org/}
\item At MPDL, within the eSciDoc publication platform, there appears to be (work on) a service for controlled vocabularies, apparently since 2009: \furl{http://colab.mpdl.mpg.de/mediawiki/Control_of_Named_Entities}
\item Entity Authority Tool Set (EATS) - a web application for recording, editing, using and displaying authority information about entities, developed at the New Zealand Electronic Text Centre (NZETC): \url{http://eats.readthedocs.org/en/latest/}
\end{itemize}

\subsection{ISOcat - Data Category Registry}

ISO 12620

\subsection{Classification Schemes, Taxonomies}
LCSH, DDC

\subsection{Other Controlled Vocabularies}
Tagsets: STTS

Language codes: ISO 639-1

\subsection{Domain Ontologies, Vocabularies}
Organization lists

LT-World !?

\section{LRT Metadata Catalogs/Collections}

\todoin{Overview of catalogs, name, since, \#providers, \#resources}

\todoin{[DFKI/LT-World]  - collection or ontology}

\subsection{CMDI}
collections, profiles/Terms, ResourceTypes!

\subsection{OLAC}

\subsection{LAT, TLA}
Language Archiving Technology, now The Language Archive, provided by the Max Planck Institute for Psycholinguistics\footnote{\url{http://www.mpi.nl/research/research-projects/language-archiving-technology}}

\subsection{META-NET}

\subsection{ELRA}

\subsection{Other}

\begin{description}
\item[LDC]  Linguistic Data Consortium
\item[OTA LR] Archiving Service provided by Oxford Text Archive \url{http://ota.oucs.ox.ac.uk/}
\end{description}

\section{Other Metadata Catalogs/Collections}

\subsubsection{(Digital) Libraries}

General (Libraries, Federations):

\begin{description}
\item[OCLC] \url{http://www.oclc.org}
    the world's biggest library federation
\item[LoC] Library of Congress \url{http://www.loc.gov}
\item[EU-Lib] European Library \url{http://www.theeuropeanlibrary.org/portal/organisation/handbook/accessing-collections_en.htm}
\item[europeana] virtual European library, a cross-domain portal \url{http://www.europeana.eu/portal/}
\end{description}

\section{Summary}

In this chapter, we gave an overview of the existing formats and datasets in the broad context of Language Resources and Technology.
