[[PageOutline(1-3)]]
= Advanced FCS Data View =
Add your examples and proposals for the Advanced FCS DataView here.
== Proposals ==
=== Proposal 1 ===
* Who: Oliver (IDS)
* What: Initial ''Proposal 2''. Ad-Hoc Stand-Off representation of multiple annotation layers.
* Code:
{{{
#!xml
}}}
* Issues/Discussion:
* a lot
=== Proposal for transcriptions of spoken language ===
* Who: Hanna (HZSK) trying to squeeze transcription data into Olli's dataview proposal (here 1, should work for 2 too?)
* What: A first draft of a small excerpt from a transcription following the HIAT system, which is widely used for discourse transcription (at least in Germany), encoded in the format of Olli's DataView Proposal 1. It represents the transcription of the conversation, i.e. a text, not the conversation itself, and it has been linearized although it was originally created in musical score notation. A common issue not represented here (one that also applies to diachronic corpora) is the use of deviations from standard orthography ("i wanna kinda...").
* Code:
{{{
#!xml
Als
o
…
Sch
ön
.
Auf
W
iedersehen
.
Ne
?
Äh
…
((0,3s))
Öh
˙
}}}
* Issues/Discussion:
* a (normalized?) word/token layer?
* sometimes an additional attribute on the layer element would be useful to describe what is in the value-info (e.g. the name of the original tagset for the pos layer)
* everything concerning spoken language...
=== Proposal 2 ===
* Who: ljo (Språkbanken)
* What: Multi-layered, anchor-based annotations that allow parallel annotations of the same kind/type
* Code: To be published
{{{
#!xml
}}}
* Issues/Discussion:
* This can look a bit crowded, so I don't expect anyone to get too excited about it, but bits and pieces might spur some discussion at least. We use it for text corpora, and for persistence anchors are only ever added, never removed; this differs somewhat from our DataView perspective, which is for transport only and quite short-lived.
=== Proposal 3 ===
* Who: Peter Beinema (MPI)
* What: An attempt to map the EAF description of CGN utterance nl000783.1 to a DataView example
* Code:
{{{
#!xml
LINGUISTIC_TYPE_REF="Words" PARTICIPANT="N01999">
t
da
's
de
enige
echte
hoop
voor
ons
mensen
t
da
's
de
enige
echte
hoop
voor
ons
mensen
}}}
=== Proposal 4 ===
* Who: Oliver (IDS)
* What: Stand-off representation of the MPI example from Proposal 3
* Code:
{{{
#!xml
fcs>
t
da
's
de
enige
echte
hoop
voor
ons
mensen
word
word
word
word
word
word
word
word
word
word
X
PRON
VERB
DET
DET
ADJ
NOUN
ADP
PRON
NOUN
_
dat
zijn
de
enig
echt
hoop
voor
ons
mens
t@
dAz
dAz
d@
en@G@
Ext@
hop
for
Ons
mEns@
}}}
* Discussion:
* No preferred/primary layer, all layers are equal
* Each layer element has a {{{@type}}}, which denotes the type of the layer (closed controlled vocabulary)
* If more than one layer of a given type is available, the Aggregator would initially display the first one.
* Each layer element has an {{{@id}}}, which uniquely identifies the layer within the XML snippet; URIs are used to foster uniqueness
* Each span element has {{{@start}}} and {{{@end}}} offsets, which establish its relations to the spans in other layers
* The actual "content" is stored as PCDATA in the span element. Additional data, e.g. the original tags in the case of POS, can be supplied in the {{{@alt-value}}} attribute. Information about what it contains must be given in {{{@alt-value-info}}} on the layer element.
* Optionally a span may carry {{{@time-start}}} and {{{@time-end}}} attributes; the value type is denoted by {{{@time-unit}}} on {{{}}}. Additionally, a reference to an audio file can be supplied by {{{@audio-file-ref}}} on {{{}}}; this can be used to generate links for playback.
* Issues:
* Quite a lot of redundancy (especially when the segments are more or less the same across layers)
* Time info adds even more redundancy. Furthermore, it is unclear how this information is supposed to be combined into a direct playback link.
* Possible solution: endpoints need to provide a complete and valid playback link?
* "Marker layers" (e.g. "words") only mark intervals but contain useless PCDATA
* Possible solution: allow empty PCDATA (or, better, empty span elements) in that case?
* What is the Aggregator supposed to display if it wants or needs to display offsets? The artificial character offsets make no real sense in the spoken-data example; human-readable start/end times (e.g. 00:01:12.22/00:01:23.42) would probably be nicer.
* Possible solution: add a display label?
* Hit markers (highlights) are missing
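To illustrate how a consumer such as the Aggregator might use the stand-off structure of Proposal 4, here is a minimal Python sketch. It keeps the first layer of each {{{@type}}} (the display rule in the discussion above), joins spans across layers purely by their {{{@start}}}/{{{@end}}} offsets so that no layer is treated as primary, and tolerates empty spans. The element names {{{Layers}}}, {{{Layer}}} and {{{Span}}}, the example.org layer URIs and the character offsets are hypothetical placeholders, since the original markup of the example is not preserved on this page; the token, POS and lemma values are taken from the example, and the second pos layer is made up only to exercise the first-layer rule.

{{{
#!python
import xml.etree.ElementTree as ET

# Hypothetical snippet: element names, layer URIs and offsets are placeholders;
# @type, @id, @start and @end are the attributes discussed for Proposal 4.
SNIPPET = """
<Layers>
  <Layer type="word" id="http://example.org/layer/word">
    <Span start="0" end="2">da</Span>
    <Span start="3" end="5">'s</Span>
    <Span start="6" end="8">de</Span>
  </Layer>
  <Layer type="pos" id="http://example.org/layer/pos">
    <Span start="0" end="2">PRON</Span>
    <Span start="3" end="5">VERB</Span>
    <Span start="6" end="8">DET</Span>
  </Layer>
  <Layer type="lemma" id="http://example.org/layer/lemma">
    <Span start="0" end="2">dat</Span>
    <Span start="3" end="5">zijn</Span>
    <Span start="6" end="8">de</Span>
  </Layer>
  <!-- Made-up second pos layer, only here to exercise the first-layer rule. -->
  <Layer type="pos" id="http://example.org/layer/pos-alt">
    <Span start="0" end="2">X</Span>
  </Layer>
</Layers>
"""


def first_layer_per_type(root):
    """Keep only the first layer of each @type, in document order."""
    chosen = {}
    for layer in root.iter("Layer"):
        chosen.setdefault(layer.get("type"), layer)
    return chosen


def spans_by_offset(layer):
    """Map (start, end) offsets to the span's PCDATA; empty spans map to ""."""
    return {
        (int(span.get("start")), int(span.get("end"))): span.text or ""
        for span in layer.iter("Span")
    }


def align(root):
    """Build one row per distinct offset pair; no layer is treated as primary."""
    layers = first_layer_per_type(root)
    tables = {t: spans_by_offset(layer) for t, layer in layers.items()}
    offsets = sorted(set().union(*tables.values()))
    return [{t: table.get(o, "") for t, table in tables.items()} for o in offsets]


if __name__ == "__main__":
    for row in align(ET.fromstring(SNIPPET.strip())):
        print(row)
}}}

Joining on exact offset pairs is the simplest possible choice; it also makes the redundancy issue visible, since every layer repeats the same offsets.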
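For the display-label question in the issues of Proposal 4, a minimal Python sketch of how a client could turn {{{@time-start}}}/{{{@time-end}}} values into labels in the 00:01:12.22/00:01:23.42 style. The unit names "s" and "ms" and all function names are assumptions; the proposal only states that {{{@time-unit}}} denotes the value type.

{{{
#!python
from decimal import ROUND_HALF_UP, Decimal

# Hypothetical @time-unit values; the proposal does not fix a vocabulary.
UNIT_TO_MS = {"ms": Decimal(1), "s": Decimal(1000)}


def to_milliseconds(value, unit):
    """Convert a @time-start/@time-end attribute string to whole milliseconds."""
    try:
        factor = UNIT_TO_MS[unit]
    except KeyError:
        raise ValueError(f"unsupported time unit: {unit!r}") from None
    scaled = Decimal(value) * factor
    return int(scaled.to_integral_value(rounding=ROUND_HALF_UP))


def format_timestamp(ms):
    """Format milliseconds as HH:MM:SS.hh (hundredths are truncated)."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis // 10:02d}"


def display_label(time_start, time_end, unit):
    """Build a human-readable start/end label from two time attributes."""
    start = format_timestamp(to_milliseconds(time_start, unit))
    end = format_timestamp(to_milliseconds(time_end, unit))
    return f"{start}/{end}"


if __name__ == "__main__":
    print(display_label("72.22", "83.42", "s"))  # prints 00:01:12.22/00:01:23.42
}}}

Decimal is used instead of float so that second values such as "72.22" convert to milliseconds without binary rounding error. Whether endpoints or the Aggregator should produce such labels remains the open question raised above.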
=== Proposal N ===
* Who: Name (Center)
* What: short description
* Code:
{{{
#!xml
}}}
* Issues/Discussion:
* ...
== General Discussion ==
...