Changes between Version 31 and Version 32 of Taskforces/FCS/FCS-Specification-Draft


Timestamp:
11/02/15 14:54:26
Author:
Oliver Schonefeld
Comment:

--

  • Taskforces/FCS/FCS-Specification-Draft

    v31 v32  
    425425
    426426==== Layers #layers
    427 ||=Identifier =||=Annotation Layer Description                                =||=Syntax                          =||=Examples (without quotes)           =||
    428 || `token`     || Appropriate tokenisation of resource, i.e. words            || ''String''                       || "Dog", "cat", "walking", "better"               ||
    429 || `lemma`     || Lemmatisation of tokens                                     || ''String''                       || "good", "walk", "dog"             ||
    430 || `pos`       || Part-of-Speech annotations                                  || [#REF_UD_POS Universal POS tags] || "NOUN", "VERB", "ADJ"                ||
    431 || `orth`      || Orthographic transcription of (mostly) spoken resources     || ''String''                       || "dug", "cat", "wolking"              ||
    432 || `norm`      || Orthographic normalization of (mostly) spoken resources     || ''String''                       || "dog", "cat", "walking", "best"              ||
    433 || `phonetic`  || Phonetic transcription                                      || [#REF_SAMPA SAMPA]               || "'du:", "'vi:-d6 'ha:-b@n"           ||
    434 || `text`      || Annotation layer that is used in [#basicSearch Basic Search] || ''String''                       || "Dog", "cat", "walking", "better"               ||
    435 
    436 The column Syntax describes the inventory of symbols that a Client `MUST` use with a corresponding annotation layer; the value ''String'' denotes that symbols are arbitrary Unicode Strings, i.e. no fixed inventory of symbols is defined. An Endpoint `SHOULD` provide an appropriate error if a Client uses an invalid value.
    437 
    438 ==== FCS-QL
     427||=Layer Type Identifier =||=Annotation Layer Description                                =||=Syntax                          =||=Examples (without quotes)           =||
     428|| `token`                || Appropriate tokenisation of resource, i.e. words            || ''String''                       || "Dog", "cat", "walking", "better"               ||
     429|| `lemma`                || Lemmatisation of tokens                                     || ''String''                       || "good", "walk", "dog"             ||
     430|| `pos`                  || Part-of-Speech annotations                                  || [#REF_UD_POS Universal POS tags] || "NOUN", "VERB", "ADJ"                ||
     431|| `orth`                 || Orthographic transcription of (mostly) spoken resources     || ''String''                       || "dug", "cat", "wolking"              ||
     432|| `norm`                 || Orthographic normalization of (mostly) spoken resources     || ''String''                       || "dog", "cat", "walking", "best"              ||
     433|| `phonetic`             || Phonetic transcription                                      || [#REF_SAMPA SAMPA]               || "'du:", "'vi:-d6 'ha:-b@n"           ||
     434|| `text`                 || Annotation layer that is used in [#basicSearch Basic Search] || ''String''                       || "Dog", "cat", "walking", "better"               ||
     435
     436The column ''Layer Type Identifier'' denotes the identifier for a layer. It is used in [#fcsQL FCS-QL] queries and in the XML serialization for the [#advancedDataView Advanced Data View]. All valid identifiers are defined in the table above; all other identifiers are reserved and `MUST NOT` be used. Clients and Endpoints `MAY` create custom Layer Type Identifiers, e.g. for testing purposes. If they do so, the custom Layer Type Identifiers `MUST` start with the String `x-`, e.g. `x-customLayer`.
     437The column ''Syntax'' describes the inventory of symbols that a Client `MUST` use with a corresponding annotation layer; the value ''String'' denotes that symbols are arbitrary Unicode Strings, i.e. no fixed inventory of symbols is defined. An Endpoint `SHOULD` provide an appropriate error if a Client uses an invalid value.
     438
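The identifier rule above can be sketched as a small check. This is only an illustration, not part of the specification: the names `STANDARD_LAYER_TYPES`, `CUSTOM_PREFIX` and `is_valid_layer_type` are hypothetical, and the sketch assumes that identifiers are compared case-sensitively and that a custom identifier needs at least one character after `x-`.

```python
# Layer Type Identifiers listed in the table above.
STANDARD_LAYER_TYPES = frozenset(
    {"token", "lemma", "pos", "orth", "norm", "phonetic", "text"}
)

# Prefix that custom Layer Type Identifiers must start with.
CUSTOM_PREFIX = "x-"


def is_valid_layer_type(identifier: str) -> bool:
    """Return True for a standard identifier or a custom `x-` identifier."""
    if identifier in STANDARD_LAYER_TYPES:
        return True
    # Assumption: a bare "x-" without any further characters is rejected.
    return identifier.startswith(CUSTOM_PREFIX) and len(identifier) > len(CUSTOM_PREFIX)
```

With this check, `pos` and `x-customLayer` are accepted, while `customLayer` is rejected because it is neither defined in the table nor prefixed with `x-`.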
     439==== FCS-QL #fcsQL
    439440TODO: About FCS-QL
    440441
     
    564565}}}
    565566
    566 ||=Data    =|| t || || d || a || || ' || s ||   || d || e ||   || e || n || i || g || e ||   || e || c || h || t || e ||   || h || o || o || p ||   || v || o || o || r ||   || o || n || s ||   || m || e || n || s || e || n ||
    567 ||= Offset =|| 1 || 2 || 3 || 4 || 5 || 6 || 7 || 8 || 9 || 10 || 11 || 12 || 13 || 14 || 15 || 16 || 17 || 18 || 19 || 20 || 21 || 22 || 23 || 24 || 25 || 26 || 27 || 28 || 29 || 30 || 31 || 32 || 33 || 34 || 35 || 36 || 37 || 38 || 39 || 40 || 41 || 42 || 43 ||
    568 
    569 
    570 ||=Offset Start       =|| 1  || 3    || 6    || 9   || 12    || 18    || 24   || 29   || 34   || 38     ||
    571 ||=Offset End         =|| 1  || 4    || 7    || 10  || 16    || 22    || 27   || 32   || 36   || 43     ||
    572 ||=Layer ''orth''     =|| t  || da   || 's   || de  || enige || echte || hoop || voor || ons  || mensen ||
    573 ||=Layer ''pos''      =|| X  || PRON || VERB || DET || DET   || ADJ   || NOUN || ADP  || PRON || NOUN   ||
    574 ||=Layer ''lemma''    =|| _  || dat  || zijn || de  || enig  || echt  || hoop || voor || ons  || mens   ||
    575 ||=Layer ''phonetic'' =|| t@ || dAz  || dAz  || d@  || en@G@ || Ext@  || hop  || for  || Ons  || mEns@  ||
    576 
    577  - alignable (and referable)
    578 
     567 - The ADV Data View allows returning structured information for Advanced Search queries
     568 - organized in one or more annotation layers
     569 - annotation layer := annotations of a specific type, e.g. part-of-speech or orthographic transcription
     570 - annotations of two annotation layers may freely overlap; no self-overlap in an annotation layer
     571 - Data View serialization in a stand-off like format, i.e. annotations are ranges over the signal (= language resource as character, token or audio stream) and are denoted by start and end offsets
     572 - layers are alignable (through their offsets) and referable (through their layer identifier)
     573 - ADV Data View serialization:
     574   - a list of segments (= "inventory" of all ranges used to describe annotations)
     575     - units can be "items" (= offsets in character or token stream) or "timestamp" (= timestamps in audio stream); timestamps may have a resolution of up to 1/1000 second
     576     - segments may also have an endpoint-specific reference (= URI); it can be shown in the Aggregator and, if the user clicks the link, can open a viewer (e.g. audio player) at the endpoint
     577   - a list of layers, each has a type (e.g. "pos", "lemma", see Layer Type Identifier in section Layers above) and a layer identifier (= URI)
     578     - a layer consists of one or more Spans. A Span references a segment (and thus inherits the start and end offsets) and contains the actual annotation (e.g. the part-of-speech label) in its content; it MAY also contain an alt-value (e.g. original annotation value)
     579   - Hit Markers are added by marking Spans as hits (add `@highlight` attribute); multiple hit markers are supported and the Aggregator may display them visually distinct
     580
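The stand-off model described in the list above might be represented in memory roughly as follows. This is a minimal sketch under stated assumptions, not the normative serialization: the class names `Segment`, `Span` and `Layer`, the helper `resolve`, the segment ids `s1` to `s3` and the `example.org` layer URIs are all hypothetical, and the unit ("item" or "timestamp") of the segment list is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Segment:
    """A range over the signal, shared by all layers."""
    id: str
    start: int
    end: int
    ref: Optional[str] = None  # optional endpoint-specific reference (URI)


@dataclass
class Span:
    """An annotation; its offsets come from the referenced segment."""
    segment_id: str
    value: str
    alt_value: Optional[str] = None  # e.g. original annotation value
    highlight: Optional[str] = None  # hit marker, if this Span is a hit


@dataclass
class Layer:
    layer_type: str  # Layer Type Identifier, e.g. "pos"
    layer_id: str    # layer identifier (URI)
    spans: List[Span] = field(default_factory=list)


def resolve(layer: Layer, segments: List[Segment]) -> List[Tuple[int, int, str]]:
    """Return (start, end, value) for every Span by looking up its segment."""
    by_id = {segment.id: segment for segment in segments}
    return [
        (by_id[span.segment_id].start, by_id[span.segment_id].end, span.value)
        for span in layer.spans
    ]


# First three tokens of a character-stream example; ids and URIs are made up.
segments = [Segment("s1", 1, 1), Segment("s2", 3, 4), Segment("s3", 6, 7)]
orth_layer = Layer(
    "orth",
    "http://endpoint.example.org/Layers/orth",
    [Span("s1", "t"), Span("s2", "da"), Span("s3", "'s")],
)
pos_layer = Layer(
    "pos",
    "http://endpoint.example.org/Layers/pos",
    [Span("s1", "X"), Span("s2", "PRON"), Span("s3", "VERB", highlight="h1")],
)
```

Because both layers reference the same segments, they are aligned through the segment offsets without repeating start and end values in every Span.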
     581Example: a sentence interpreted as a character stream
     582||=Data   =|| t || || d || a || || ' || s ||   || d || e ||   || e || n || i || g || e ||   || e || c || h || t || e ||   || h || o || o || p ||   || v || o || o || r ||   || o || n || s ||   || m || e || n || s || e || n ||
     583||=Offset =|| 1 || 2 || 3 || 4 || 5 || 6 || 7 || 8 || 9 || 10 || 11 || 12 || 13 || 14 || 15 || 16 || 17 || 18 || 19 || 20 || 21 || 22 || 23 || 24 || 25 || 26 || 27 || 28 || 29 || 30 || 31 || 32 || 33 || 34 || 35 || 36 || 37 || 38 || 39 || 40 || 41 || 42 || 43 ||
     584
     585Example: several annotation layers for the sentence
     586||=Offset (Start, End) =|| 1,1  || 3,4    || 6,7    || 9,10   || 12,16    || 18,22    || 24,27   || 29,32   || 34,36   || 38,43  ||
     587||=Layer ''orth''      =|| t    || da     || 's     || de     || enige    || echte    || hoop    || voor    || ons     || mensen ||
     588||=Layer ''pos''       =|| X    || PRON   || VERB   || DET    || DET      || ADJ      || NOUN    || ADP     || PRON    || NOUN   ||
     589||=Layer ''lemma''     =|| _    || dat    || zijn   || de     || enig     || echt     || hoop    || voor    || ons     || mens   ||
     590||=Layer ''phonetic''  =|| t@   || dAz    || dAz    || d@     || en@G@    || Ext@     || hop     || for     || Ons     || mEns@  ||
     591
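The offsets in the tables above can be reproduced with a few lines of code. The sketch assumes simple whitespace tokenisation and 1-based, inclusive character offsets, which is what the example tables show; the names `SENTENCE` and `token_offsets` are hypothetical.

```python
import re
from typing import List, Tuple

# The example sentence as a character stream (tokens separated by single spaces).
SENTENCE = "t da 's de enige echte hoop voor ons mensen"


def token_offsets(text: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, token) with 1-based, inclusive character offsets."""
    return [
        # match.start() is 0-based, so add 1; match.end() is exclusive and
        # 0-based, which equals the inclusive 1-based end offset.
        (match.start() + 1, match.end(), match.group())
        for match in re.finditer(r"\S+", text)
    ]
```

Calling `token_offsets(SENTENCE)` yields one tuple per token, starting with `(1, 1, "t")` and `(3, 4, "da")`, and the sentence itself is 43 characters long, matching the Offset row.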
     592Example: XML serialization
    579593{{{#!xml
    580594<Advanced>