\chapter{Analysis of the data landscape}
\label{ch:data}
This chapter gives an overview of existing standards and formats for metadata and content annotation in the field of Language Resources and Technology (LRT). It describes their characteristics and their usage in relevant projects and initiatives.


\section{Metadata Formats}


\subsection{CMD-Framework}

\begin{center}
\begin{tabular}{ l | r }
\hline
Created & 2013-01-26 \\ \hline
Profiles & 87 \\
Components & 2904 \\
Distinct components & 542 \\
Elements & 5754 \\
Distinct elements & 1505 \\
Distinct DatCats & 436 \\
Elements with DatCats & 1183 \\
Elements without DatCats & 323 \\
Ratio of elements without DatCats & 21.46\,\% \\
Available concepts & 893 \\
Used concepts & 474 \\
Blind concepts (not in public ISOcat) & 190 \\
Concepts not used in CMD & 539 \\
\hline
\end{tabular}
\end{center}
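The ratio of elements without DatCats appears to be computed relative to the number of distinct elements rather than to all element occurrences. Assuming this reading, it follows directly from the corresponding rows of the table:
\[
  \frac{323}{1505} \approx 0.2146 = 21.46\,\% .
\]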

\todoin{Collect numbers about the CMD-Framework (profiles, DatCats) and its historical development}

\todoin{Collect numbers about CMD records (collections, used profiles, ...) in historical perspective}


\subsection{Dublin Core + OLAC}

Dublin Core (DC), OLAC (Open Language Archives Community)

Dublin Core Resource Types\furl{http://dublincore.org/documents/resource-typelist/}

\subsection{TEI / teiHeader}
TEI/teiHeader/ODD

\subsection{ISLE/IMDI}

\subsection{MODS/METS}

\subsection{Europeana Data Model (EDM)}


\subsection{Other}

OAI-ORE (Object Reuse and Exchange): is this a schema?



\section{Content/Annotation Formats}

\begin{description}
\item[CHILDES] Child Language Data Exchange System
\item[TEI] Text Encoding Initiative
\item[EAF] ELAN Annotation Format
\item[CES/XCES] Corpus Encoding Standard (CES) and its XML version (XCES)
\item[OAC] Open Annotation Collaboration\footnote{\url{http://openannotation.org/}}
\item[LAF] Linguistic Annotation Framework
\end{description}



\section{Ontologies, Controlled Vocabularies, Reference Data, Authority Files}
\label{refdata}

Based on popular demand, the work on reference data for the social sciences and humanities (SSH) community should cover at least the following dimensions (listed with tentative names of corresponding existing vocabularies):

\begin{itemize}
\item Data Categories / Concepts - ISOcat
\item Languages - ISO 639
\item Countries - country codes
\item Persons - GND, VIAF
\item Organizations - GND, VIAF
\item Subjects - GND, LCSH
\item Resource Typology -
\end{itemize}

\begin{description}
\item[AAT] Art \& Architecture Thesaurus (Getty Research Institute)
\item[GND] Gemeinsame Normdatei (Integrated Authority File)
\item[GTAA] Gemeenschappelijke Thesaurus Audiovisuele Archieven (Common Thesaurus [for] Audiovisual Archives)
\item[VIAF] Virtual International Authority File
\end{description}

Further related activities and initiatives are listed below; a broader collection can be found on the website of the German National Library (DNB).\furl{http://www.dnb.de/DE/Standardisierung/LinksAFS/linksafs_node.html}

\begin{description}
\item[FRBR] Functional Requirements for Bibliographic Records
\item[RDA] Resource Description and Access
\item[Technology Watch Report] Standards in Metadata and Interoperability (last entry from 2011) \url{http://metadaten-twr.org/}
\item[MPDL] Within the eSciDoc publication platform, the Max Planck Digital Library (MPDL) appears to have been working on a service for controlled vocabularies since 2009.\furl{http://colab.mpdl.mpg.de/mediawiki/Control_of_Named_Entities}
\item[EATS] Entity Authority Tool Set, a web application for recording, editing, using and displaying authority information about entities, developed at the New Zealand Electronic Text Centre (NZETC) \url{http://eats.readthedocs.org/en/latest/}
\end{description}


\subsection{ISOcat - Data Category Registry}

ISOcat implements the Data Category Registry (DCR) specified by ISO 12620.

\subsection{Classification Schemes, Taxonomies}
LCSH (Library of Congress Subject Headings), DDC (Dewey Decimal Classification)


\subsection{Other Controlled Vocabularies}
\begin{itemize}
\item Tagsets: STTS (Stuttgart-Tübingen-TagSet)
\item Language codes: ISO 639-1
\end{itemize}

\subsection{Domain Ontologies, Vocabularies}
\begin{itemize}
\item Organization lists
\item LT-World (?)
\end{itemize}



\section{LRT Metadata Catalogs/Collections}

\todoin{Overview of catalogs: name, available since, \#providers, \#resources}

\todoin{[DFKI/LT-World] - collection or ontology?}

\subsection{CMDI}
collections, profiles/Terms, ResourceTypes!

\subsection{OLAC}

\subsection{LAT, TLA}
Language Archiving Technology (LAT), now The Language Archive (TLA), is provided by the Max Planck Institute for Psycholinguistics.\footnote{\url{http://www.mpi.nl/research/research-projects/language-archiving-technology}}

\subsection{META-NET}


\subsection{ELRA}

\subsection{Other}


\begin{description}
\item[LDC] Linguistic Data Consortium
\item[OTA LR] Archiving service provided by the Oxford Text Archive \url{http://ota.oucs.ox.ac.uk/}
\end{description}

\section{Other Metadata Catalogs/Collections}

\subsection{(Digital) Libraries}


General (libraries, federations):

\begin{description}
\item[OCLC] Online Computer Library Center, the world's largest library federation \url{http://www.oclc.org}
\item[LoC] Library of Congress \url{http://www.loc.gov}
\item[EU-Lib] The European Library \url{http://www.theeuropeanlibrary.org/portal/organisation/handbook/accessing-collections_en.htm}
\item[Europeana] Virtual European library, a cross-domain portal \url{http://www.europeana.eu/portal/}
\end{description}




\section{Summary}

In this chapter, we gave an overview of existing formats and datasets in the broad context of Language Resources and Technology.
