FCS-Specification-Draft

Timestamp:: 11/02/15 14:54:26
Author:: Oliver Schonefeld
Comment:: --

Taskforces/FCS/FCS-Specification-Draft

-                      v31
+                      v32
 ==== Layers #layers
+||=Identifier =||=Annotation Layer Description                                =||=Syntax                          =||=Examples (without quotes)           =||
+|| `token`     || Appropriate tokenisation of resource, i.e. words            || ''String''                       || "Dog", "cat", "walking", "better"               ||
+|| `lemma`     || Lemmatisation of tokens                                     || ''String''                       || "good", "walk", "dog"             ||
+|| `pos`       || Part-of-Speech annotations                                  || [#REF_UD_POS Universal POS tags] || "NOUN", "VERB", "ADJ"                ||
+|| `orth`      || Orthographic transcription of (mostly) spoken resources     || ''String''                       || "dug", "cat", "wolking"              ||
+|| `norm`      || Orthographic normalization of (mostly) spoken resources     || ''String''                       || "dog", "cat", "walking", "best"              ||
+|| `phonetic`  || Phonetic transcription                                      || [#REF_SAMPA SAMPA]               || "'du:", "'vi:-d6 'ha:-b@n"           ||
+|| `text`      || Annotation layer that is used in [#basicSearch Basic Search] || ''String''                       || "Dog", "cat", "walking", "better"               ||
+The column Syntax describes the inventory of symbols that a Client `MUST` use with a corresponding annotation layer; the value ''String'' denotes that symbols are arbitrary Unicode Strings, i.e. no fixed inventory of symbols is defined. An Endpoint `SHOULD` provide an appropriate error if a Client uses an invalid value.
+==== FCS-QL
+||=Layer Type Identifier =||=Annotation Layer Description                                =||=Syntax                          =||=Examples (without quotes)           =||
+|| `token`                || Appropriate tokenisation of resource, i.e. words            || ''String''                       || "Dog", "cat", "walking", "better"               ||
+|| `lemma`                || Lemmatisation of tokens                                     || ''String''                       || "good", "walk", "dog"             ||
+|| `pos`                  || Part-of-Speech annotations                                  || [#REF_UD_POS Universal POS tags] || "NOUN", "VERB", "ADJ"                ||
+|| `orth`                 || Orthographic transcription of (mostly) spoken resources     || ''String''                       || "dug", "cat", "wolking"              ||
+|| `norm`                 || Orthographic normalization of (mostly) spoken resources     || ''String''                       || "dog", "cat", "walking", "best"              ||
+|| `phonetic`             || Phonetic transcription                                      || [#REF_SAMPA SAMPA]               || "'du:", "'vi:-d6 'ha:-b@n"           ||
+|| `text`                 || Annotation layer that is used in [#basicSearch Basic Search] || ''String''                       || "Dog", "cat", "walking", "better"               ||
+The column ''Layer Type Identifier'' denotes the identifier for a layer. It is used in [#fcsQL FCS-QL] queries and the XML serialization for the [#advancedDataView Advanced Data View]. All valid identifiers are defined in the table above; all other identifiers are reserved and `MUST NOT` be used. Clients and Endpoints `MAY` create custom Layer Type Identifiers, e.g. for testing purposes. If they do so, the custom Layer Type Identifiers `MUST` start with the String `x-`, e.g. `x-customLayer`.
+The column ''Syntax'' describes the inventory of symbols that a Client `MUST` use with a corresponding annotation layer; the value ''String'' denotes that symbols are arbitrary Unicode Strings, i.e. no fixed inventory of symbols is defined. An Endpoint `SHOULD` provide an appropriate error if a Client uses an invalid value.
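The Layer Type Identifier rule above can be illustrated with a minimal Python sketch. The names `STANDARD_LAYER_TYPES`, `CUSTOM_PREFIX` and `is_valid_layer_type` are hypothetical and not part of the specification. Requiring at least one character after the `x-` prefix is an assumption of this sketch, since the draft does not say so explicitly.

```python
# Identifiers taken from the Layers table of this draft.
STANDARD_LAYER_TYPES = frozenset(
    {"token", "lemma", "pos", "orth", "norm", "phonetic", "text"}
)

# Custom Layer Type Identifiers MUST start with this prefix.
CUSTOM_PREFIX = "x-"


def is_valid_layer_type(identifier: str) -> bool:
    """Return True for a standard identifier or a custom 'x-' identifier."""
    if identifier in STANDARD_LAYER_TYPES:
        return True
    # Assumption of this sketch: a bare "x-" with nothing after it is rejected.
    return identifier.startswith(CUSTOM_PREFIX) and len(identifier) > len(CUSTOM_PREFIX)
```

An Endpoint could use such a check to decide when to return an error for a reserved identifier, e.g. `is_valid_layer_type("syntax")` is false while `is_valid_layer_type("x-customLayer")` is true.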
+==== FCS-QL #fcsQL
 TODO: About FCS-QL
 …
 }}}
+||=Data    =|| t || || d || a || || ' || s ||   || d || e ||   || e || n || i || g || e ||   || e || c || h || t || e ||   || h || o || o || p ||   || v || o || o || r ||   || o || n || s ||   || m || e || n || s || e || n ||
+||= Offset =|| 1 || 2 || 3 || 4 || 5 || 6 || 7 || 8 || 9 || 10 || 11 || 12 || 13 || 14 || 15 || 16 || 17 || 18 || 19 || 20 || 21 || 22 || 23 || 24 || 25 || 26 || 27 || 28 || 29 || 30 || 31 || 32 || 33 || 34 || 35 || 36 || 37 || 38 || 39 || 40 || 41 || 42 || 43 ||
+||=Offset Start       =|| 1  || 3    || 6    || 9   || 12    || 18    || 24   || 29   || 34   || 38     ||
+||=Offset End         =|| 1  || 4    || 7    || 10  || 16    || 22    || 27   || 32   || 36   || 43     ||
+||=Layer ''orth''     =|| t  || da   || 's   || de  || enige || echte || hoop || voor || ons  || mensen ||
+||=Layer ''pos''      =|| X  || PRON || VERB || DET || DET   || ADJ   || NOUN || ADP  || PRON || NOUN   ||
+||=Layer ''lemma''    =|| _  || dat  || zijn || de  || enig  || echt  || hoop || voor || ons  || mens   ||
+||=Layer ''phonetic'' =|| t@ || dAz  || dAz  || d@  || en@G@ || Ext@  || hop  || for  || Ons  || mEns@  ||
+ - alignable (and referable)
+ - the ADV Data View allows Endpoints to return structured information for Advanced Search queries
+ - organized in one or more annotation layers
+ - annotation layer := annotations of a specific type, e.g. part-of-speech or orthographic transcription
+ - annotations of two annotation layers may freely overlap; no self-overlap in an annotation layer
+ - Data View serialization in a stand-off like format, i.e. annotations are ranges over the signal (= language resource as character, token or audio stream) that are denoted by start and end offsets
+ - layers alignable (through their offsets) and referable (through their layer identifier)
+ - ADV Data View serialization:
+   - a list of segments (= "inventory" of all ranges used to describe annotations)
+     - units can be "items" (= offsets in character or token stream) or "timestamp" (= timestamps in audio stream); timestamps may have a resolution of up to 1/1000 second.
+     - segments may also have an endpoint-specific reference (= URI); it can be shown in the Aggregator and, if the user clicks the link, can open a viewer (e.g. an audio player) at the Endpoint
+   - a list of layers; each has a type (e.g. "pos", "lemma", see Layer Type Identifier in section Layers above) and a layer identifier (= URI)
+     - a layer consists of one or more Spans. A Span references a segment (and thus inherits the start and end offsets) and contains the actual annotation (e.g. the part-of-speech label) in its content; it MAY also contain an alt-value (e.g. original annotation value)
+   - Hit Markers are added by marking Spans as hits (add `@highlight` attribute); multiple hit markers are supported and the Aggregator may display them visually distinct
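The relationship between segments, layers and Spans described above might be modelled roughly as follows. This is only an illustrative in-memory sketch: the class and field names (`Segment`, `Span`, `Layer`, `ref`, `alt_value`, `highlight`), the helper functions and the example layer URI are assumptions, not the normative XML serialization.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Segment:
    id: str      # identifier that Spans use to reference this segment
    start: int   # start offset (here: character "items")
    end: int     # end offset, inclusive


@dataclass
class Span:
    ref: str                         # id of the referenced Segment
    value: str                       # the actual annotation, e.g. a POS label
    alt_value: Optional[str] = None  # optional, e.g. original annotation value
    highlight: Optional[str] = None  # set when the Span is marked as a hit


@dataclass
class Layer:
    id: str    # layer identifier (a URI)
    type: str  # Layer Type Identifier, e.g. "pos"
    spans: List[Span] = field(default_factory=list)


def resolve(layer: Layer, segments: List[Segment]) -> List[Tuple[int, int, str]]:
    """Return (start, end, value) per Span; offsets are inherited from the segment."""
    by_id = {s.id: s for s in segments}
    return [(by_id[sp.ref].start, by_id[sp.ref].end, sp.value) for sp in layer.spans]


def hits(layer: Layer) -> List[Span]:
    """Return the Spans of a layer that carry a hit marker."""
    return [sp for sp in layer.spans if sp.highlight is not None]


segments = [Segment("s0", 1, 1), Segment("s1", 3, 4), Segment("s2", 6, 7)]
pos = Layer(
    "http://endpoint.example.org/layers/pos",  # hypothetical layer URI
    "pos",
    [Span("s0", "X"), Span("s1", "PRON"), Span("s2", "VERB", highlight="h1")],
)
```

Because Spans only reference segments, two layers that point at the same segment ids are aligned without repeating any offsets.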
+Example: a sentence interpreted as a character stream
+||=Data   =|| t || || d || a || || ' || s ||   || d || e ||   || e || n || i || g || e ||   || e || c || h || t || e ||   || h || o || o || p ||   || v || o || o || r ||   || o || n || s ||   || m || e || n || s || e || n ||
+||=Offset =|| 1 || 2 || 3 || 4 || 5 || 6 || 7 || 8 || 9 || 10 || 11 || 12 || 13 || 14 || 15 || 16 || 17 || 18 || 19 || 20 || 21 || 22 || 23 || 24 || 25 || 26 || 27 || 28 || 29 || 30 || 31 || 32 || 33 || 34 || 35 || 36 || 37 || 38 || 39 || 40 || 41 || 42 || 43 ||
+Example: several annotation layers for the sentence
+||=Offset (Start, End) =|| 1,1  || 3,4    || 6,7    || 9,10   || 12,16    || 18,22    || 24,27   || 29,32   || 34,36   || 38,43  ||
+||=Layer ''orth''      =|| t    || da     || 's     || de     || enige    || echte    || hoop    || voor    || ons     || mensen ||
+||=Layer ''pos''       =|| X    || PRON   || VERB   || DET    || DET      || ADJ      || NOUN    || ADP     || PRON    || NOUN   ||
+||=Layer ''lemma''     =|| _    || dat    || zijn   || de     || enig     || echt     || hoop    || voor    || ons     || mens   ||
+||=Layer ''phonetic''  =|| t@   || dAz    || dAz    || d@     || en@G@    || Ext@     || hop     || for     || Ons     || mEns@  ||
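The (Start, End) offsets in the table above can be reproduced from the character stream with a short Python sketch. It assumes 1-based, inclusive character offsets (as in the Offset row) and whitespace tokenisation; the function name `token_offsets` is hypothetical.

```python
import re
from typing import List, Tuple


def token_offsets(text: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, token) with 1-based, inclusive character offsets."""
    # m.start() is 0-based, so add 1; m.end() is 0-based and exclusive,
    # which equals the 1-based inclusive end offset.
    return [(m.start() + 1, m.end(), m.group()) for m in re.finditer(r"\S+", text)]


sentence = "t da 's de enige echte hoop voor ons mensen"
segments = token_offsets(sentence)
```

Each resulting triple corresponds to one column of the ''orth'' layer, e.g. `(12, 16, "enige")`.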
+Example: XML serialization
 {{{#!xml
 <Advanced>