[[PageOutline(1-3)]]

= Advanced FCS Data View =

Add your examples and proposals for the Advanced FCS DataView here.

== Proposals ==

=== Proposal 1 ===
 * Who: Oliver (IDS)
 * What: Initial ''Proposal 2''. Ad-hoc stand-off representation of multiple annotation layers.
 * Code:
 {{{
 #!xml
 }}}
 * Issues/Discussion:
   * a lot

=== Proposal for transcriptions of spoken language ===
 * Who: Hanna (HZSK), trying to squeeze transcription data into Olli's DataView proposal (here Proposal 1; should work for Proposal 2 too?)
 * What: A first draft, following Olli's DataView Proposal 1, of a small excerpt from a transcription made according to the HIAT system, which is widely used for discourse transcription, at least in Germany. It represents the transcription of the conversation, i.e. a text and not the conversation itself, and it has been linearized although it was originally created for musical score notation. A common issue not represented here (it also applies to diachronic corpora) is the use of deviations from standard orthography ("i wanna kinda...").
 * Code:
 {{{
 #!xml
 Als o … Sch ön . Auf W iedersehen . Ne ? Äh … ((0,3s)) Öh ˙
 }}}
 * Issues/Discussion:
   * a (normalized?) word/token layer?
   * sometimes another attribute on the layer element might be useful to describe what is in the value-info (e.g. the original tagset name for the pos layer)
   * everything concerning spoken language...

=== Proposal 2 ===
 * Who: ljo (Språkbanken)
 * What: Multilayered anchor-based annotations allowing parallel annotations of the same kind/type
 * Code: To be published
 {{{
 #!xml
 }}}
 * Issues/Discussion:
   * This can look a bit crowded, so I don't expect anyone to get too excited about it, but bits and pieces might spur some discussion at least. We use it for text corpora, and anchors are only added, never removed, for the sake of persistence. This differs a bit from our DataView perspective, which is for transport only and quite short-lived.

=== Proposal 3 ===
 * Who: Peter Beinema (MPI)
 * What: Attempt to map the eaf description of CGN utterance nl000783.1 to the DataView example
 * Code:
 {{{
 #!xml
 LINGUISTIC_TYPE_REF="Words" PARTICIPANT="N01999"> t da 's de enige echte hoop voor ons mensen t da 's de enige echte hoop voor ons mensen
 }}}

=== Proposal 4 ===
 * Who: Oliver (IDS)
 * What: Stand-off representation of the MPI example
 * Code:
 {{{
 #!xml
 fcs> t da 's de enige echte hoop voor ons mensen word word word word word word word word word word X PRON VERB DET DET ADJ NOUN ADP PRON NOUN _ dat zijn de enig echt hoop voor ons mens t@ dAz dAz d@ en@G@ Ext@ hop for Ons mEns@
 }}}
 * Discussion:
   * No preferred/primary layer; all layers are equal.
   * Each {{{}}} has a {{{@type}}}, which denotes the type of the layer (closed controlled vocabulary).
     * If several layers of a certain type are available, the Aggregator would initially display the first.
   * Each {{{}}} has an {{{@id}}}, which uniquely identifies the layer (with regard to the XML snippet); URIs are used to foster uniqueness.
   * Each {{{}}} has {{{@start}}} and {{{@end}}} offsets to establish the relations to other spans.
   * The actual "content" is stored as PCDATA in {{{}}}. Additional data, e.g. the original tags in the case of pos, can be supplied in the {{{@alt-value}}} attribute. Information about what is contained there must be present in {{{@alt-value-info}}} on {{{}}}.
   * Optionally a {{{}}} may carry {{{@time-start}}} and {{{@time-end}}} attributes; the value type is denoted by {{{@time-unit}}} on {{{}}}. Additionally, a reference to an audio file can be supplied by {{{@audio-file-ref}}} on {{{}}}; this can be used to generate links for playback.
 * Issues:
   * Quite a lot of redundancy (especially when the segments are more or less the same across all layers)
   * Time info adds even more redundancy. Furthermore, it is unclear how this information is supposed to be combined into a direct playback link.
     * Possible solution: endpoints need to provide a complete and valid playback link?
   * "Marker layers" (e.g. "words") only mark intervals but contain useless PCDATA.
     * Possible solution: allow empty PCDATA (or, better, empty {{{}}} elements in that case)?
   * What is the Aggregator supposed to display if it wants/needs to display offsets? The artificial character offsets make no real sense in the spoken data example. Time-based start/end offsets (e.g. 00:01:12.22/00:01:23.42) would probably be nicer.
     * Possible solution: add a display label?
   * Hit markers (highlights) are missing.

=== Proposal N ===
 * Who: Name (Center)
 * What: short description
 * Code:
 {{{
 #!xml
 }}}
 * Issues/Discussion:
   * ...

== General Discussion ==

...
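
To make the stand-off model discussed under Proposal 4 easier to reason about, here is a minimal Python sketch rather than a proposed implementation. It only mirrors the attributes named in the discussion ({{{@id}}}, {{{@type}}}, {{{@start}}}, {{{@end}}}). The class names {{{Layer}}} and {{{Span}}}, the helper functions, the layer ids, the type name "token" and the exclusive end offset are all assumptions made for illustration. It shows how equal offsets relate spans across layers, and how a time offset could be turned into a display label of the kind suggested above.

{{{
#!python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    start: int       # character offset of the first character
    end: int         # offset one past the last character (assumption)
    value: str = ""  # PCDATA; may stay empty for pure marker layers


@dataclass
class Layer:
    id: str          # placeholder id; the proposal suggests URIs
    type: str        # layer type from a controlled vocabulary
    spans: List[Span] = field(default_factory=list)


def token_layer(layer_id, text):
    """Build a token layer by splitting on single spaces and recording offsets."""
    spans, pos = [], 0
    for tok in text.split(" "):
        spans.append(Span(pos, pos + len(tok), tok))
        pos += len(tok) + 1  # skip the separating space
    return Layer(layer_id, "token", spans)


def parallel_layer(layer_id, layer_type, base, values):
    """Reuse the offsets of `base` for a layer annotating the same segments."""
    spans = [Span(s.start, s.end, v) for s, v in zip(base.spans, values)]
    return Layer(layer_id, layer_type, spans)


def aligned(layer, start, end):
    """Return the values of all spans in `layer` with exactly these offsets."""
    return [s.value for s in layer.spans if s.start == start and s.end == end]


def time_label(seconds):
    """Format a time offset given in seconds as HH:MM:SS.hh for display."""
    total = round(seconds * 100)  # work in hundredths to avoid float drift
    rest, hundredths = divmod(total, 100)
    rest, secs = divmod(rest, 60)
    hours, minutes = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{hundredths:02d}"


# First four tokens of the Proposal 4 example with their pos and lemma values.
tokens = token_layer("l1", "t da 's de")
pos = parallel_layer("l2", "pos", tokens, ["X", "PRON", "VERB", "DET"])
lemma = parallel_layer("l3", "lemma", tokens, ["_", "dat", "zijn", "de"])

da = tokens.spans[1]
print(da.value, aligned(pos, da.start, da.end), aligned(lemma, da.start, da.end))
print(time_label(72.22), time_label(83.42))
}}}

The sketch also makes the redundancy issue visible: every parallel layer repeats the same offsets, which is why {{{parallel_layer}}} simply copies them from the base layer.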