427 | | ||=Identifier =||=Annotation Layer Description =||=Syntax =||=Examples (without quotes) =|| |
428 | | || `token` || Appropriate tokenisation of resource, i.e. words || ''String'' || "Dog", "cat", "walking", "better" || |
429 | | || `lemma` || Lemmatisation of tokens || ''String'' || "good", "walk", "dog" || |
430 | | || `pos` || Part-of-Speech annotations || [#REF_UD_POS Universal POS tags] || "NOUN", "VERB", "ADJ" || |
431 | | || `orth` || Orthographic transcription of (mostly) spoken resources || ''String'' || "dug", "cat", "wolking" || |
432 | | || `norm` || Orthographic normalization of (mostly) spoken resources || ''String'' || "dog", "cat", "walking", "best" || |
433 | | || `phonetic` || Phonetic transcription || [#REF_SAMPA SAMPA] || "'du:", "'vi:-d6 'ha:-b@n" || |
434 | | || `text` || Annotation layer that is used in [#basicSearch Basic Search] || ''String'' || "Dog", "cat" "walking", "better" || |
435 | | |
436 | | The column Syntax describes the inventory of symbols that a Client `MUST` use with a corresponding annotation layer; the value ''String'' denotes that symbols are arbitrary Unicode Strings, i.e. no fixed inventory of symbols are defined. An Endpoint `SHOULD` provide an appropriate error, if a Client used an invalid value. |
437 | | |
438 | | ==== FCS-QL |
| 427 | ||=Layer Type Identifier =||=Annotation Layer Description =||=Syntax =||=Examples (without quotes) =|| |
| 428 | || `token` || Appropriate tokenisation of resource, i.e. words || ''String'' || "Dog", "cat", "walking", "better" || |
| 429 | || `lemma` || Lemmatisation of tokens || ''String'' || "good", "walk", "dog" || |
| 430 | || `pos` || Part-of-Speech annotations || [#REF_UD_POS Universal POS tags] || "NOUN", "VERB", "ADJ" || |
| 431 | || `orth` || Orthographic transcription of (mostly) spoken resources || ''String'' || "dug", "cat", "wolking" || |
| 432 | || `norm` || Orthographic normalization of (mostly) spoken resources || ''String'' || "dog", "cat", "walking", "best" || |
| 433 | || `phonetic` || Phonetic transcription || [#REF_SAMPA SAMPA] || "'du:", "'vi:-d6 'ha:-b@n" || |
| 434 | || `text` || Annotation layer that is used in [#basicSearch Basic Search] || ''String'' || "Dog", "cat", "walking", "better" || |
| 435 | |
| 436 | The column ''Layer Type Identifier'' denotes the identifier for a layer. It is used in [#fcsQL FCS-QL] queries and the XML serialization for the [#advancedDataView Advanced Data View]. All valid identifiers are defined in the table above; all other identifiers are reserved and `MUST NOT` be used. Clients and Endpoints `MAY` create custom Layer Type Identifiers, e.g. for testing purposes. If they do so, the custom Layer Type Identifiers `MUST` start with the String `x-`, e.g. `x-customLayer`. |
| 437 | The column ''Syntax'' describes the inventory of symbols that a Client `MUST` use with a corresponding annotation layer; the value ''String'' denotes that symbols are arbitrary Unicode Strings, i.e. no fixed inventory of symbols is defined. An Endpoint `SHOULD` provide an appropriate error if a Client uses an invalid value. |
| 438 | |
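The identifier rules above can be sketched as a small check. This is only an illustration, not part of the specification: the names `STANDARD_LAYER_TYPES`, `CUSTOM_PREFIX` and `is_valid_layer_type` are hypothetical, and rejecting a bare `x-` with nothing after the prefix is an assumption the text does not state.

```python
# Layer Type Identifiers defined in the table above.
STANDARD_LAYER_TYPES = frozenset(
    {"token", "lemma", "pos", "orth", "norm", "phonetic", "text"}
)

# Custom identifiers MUST start with this prefix.
CUSTOM_PREFIX = "x-"


def is_valid_layer_type(identifier: str) -> bool:
    """Return True for a standard identifier or an 'x-' prefixed custom one."""
    if identifier in STANDARD_LAYER_TYPES:
        return True
    # Assumption: a custom identifier needs at least one character after 'x-'.
    return identifier.startswith(CUSTOM_PREFIX) and len(identifier) > len(CUSTOM_PREFIX)
```

With this sketch, `is_valid_layer_type("pos")` and `is_valid_layer_type("x-customLayer")` hold, while `is_valid_layer_type("customLayer")` does not, because every identifier outside the table is reserved.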
| 439 | ==== FCS-QL #fcsQL |
566 | | ||=Data =|| t || || d || a || || ' || s || || d || e || || e || n || i || g || e || || e || c || h || t || e || || h || o || o || p || || v || o || o || r || || o || n || s || || m || e || n || s || e || n || |
567 | | ||= Offset =|| 1 || 2 || 3 || 4 || 5 || 6 || 7 || 8 || 9 || 10 || 11 || 12 || 13 || 14 || 15 || 16 || 17 || 18 || 19 || 20 || 21 || 22 || 23 || 24 || 25 || 26 || 27 || 28 || 29 || 30 || 31 || 32 || 33 || 34 || 35 || 36 || 37 || 38 || 39 || 40 || 41 || 42 || 43 || |
568 | | |
569 | | |
570 | | ||=Offset Start =|| 1 || 3 || 6 || 9 || 12 || 18 || 24 || 29 || 34 || 38 || |
571 | | ||=Offset End =|| 1 || 4 || 7 || 10 || 16 || 22 || 27 || 32 || 36 || 43 || |
572 | | ||=Layer ''orth'' =|| t || da || 's || de || enige || echte || hoop || voor || ons || mensen || |
573 | | ||=Layer ''pos'' =|| X || PRON || VERB || DET || DET || ADJ || NOUN || ADP || PRON || NOUN || |
574 | | ||=Layer ''lemma'' =|| _ || dat || zijn || de || enig || echt || hoop || voor || ons || mens || |
575 | | ||=Layer ''phonetic'' =|| t@ || dAz || dAz || d@ || en@G@ || Ext@ || hop || for || Ons || mEns@ || |
576 | | |
577 | | - alignable (and referable) |
578 | | |
| 567 | - ADV Data View allows Endpoints to return structured information for Advanced Search queries |
| 568 | - organized in one or more annotation layers |
| 569 | - annotation layer := annotations of a specific type, e.g. part-of-speech or orthographic transcription |
| 570 | - annotations of two annotation layers may freely overlap; no self-overlap in an annotation layer |
| 571 | - Data View serialization in a stand-off like format, i.e. annotations are ranges over the signal (= language resource as character, token or audio stream), denoted by start and end offsets |
| 572 | - layers alignable (through their offsets) and referable (through their layer identifier) |
| 573 | - ADV Data View serialization: |
| 574 | - a list of segments (= "inventory" of all ranges used to describe annotations) |
| 575 | - units can be "items" (= offsets in character or token-stream) or "timestamp" (timestamps in audio-stream), timestamps may have a resolution of up to 1/1000 second. |
| 576 | - segments may also have an endpoint-specific reference (= URI); it can be shown in the Aggregator, and if the user clicks the link, a viewer (e.g. audio player) can open at the Endpoint |
| 577 | - a list of layers, each has a type (e.g. "pos", "lemma", see Layer Type Identifier in section Layers above) and a layer identifier (= URI) |
| 578 | - a layer consists of one or more Spans. A Span references a segment (and thus inherits the start and end offsets) and contains the actual annotation (e.g. the part-of-speech label) in its content; MAY also contain alt-value (e.g. original annotation value) |
| 579 | - Hit Markers are added by marking Spans as hits (add `@highlight` attribute); multiple hit markers are supported and the Aggregator may render them in visually distinct ways |
| 580 | |
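The stand-off model outlined in the notes above can be sketched as plain data classes. This is a minimal illustration under assumptions, not the XML serialization itself: the class names, the field names, and the layer URI `http://endpoint.example.org/Layers/pos` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Segment:
    """One range over the signal; offsets are inclusive."""
    id: str
    start: int
    end: int
    unit: str = "item"         # "item" or "timestamp"
    ref: Optional[str] = None  # optional endpoint-specific URI


@dataclass
class Span:
    """An annotation value attached to a segment."""
    segment_id: str
    value: str
    alt_value: Optional[str] = None
    highlight: Optional[str] = None  # set when the Span is marked as a hit


@dataclass
class Layer:
    id: str    # layer identifier (URI)
    type: str  # Layer Type Identifier, e.g. "pos"
    spans: List[Span] = field(default_factory=list)


def resolve(layer: Layer, segments: List[Segment]) -> List[Tuple[int, int, str]]:
    """Look up each Span's segment and return (start, end, value) triples."""
    by_id = {seg.id: seg for seg in segments}
    return [
        (by_id[sp.segment_id].start, by_id[sp.segment_id].end, sp.value)
        for sp in layer.spans
    ]


segments = [Segment("s0", 1, 1), Segment("s1", 3, 4), Segment("s2", 6, 7)]
pos = Layer(
    "http://endpoint.example.org/Layers/pos",  # hypothetical URI
    "pos",
    [Span("s0", "X"), Span("s1", "PRON"), Span("s2", "VERB", highlight="h1")],
)
```

Because Spans only reference segments, two layers that point at the same segment share its offsets, which is what makes the layers alignable.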
| 581 | Example: a sentence interpreted as a character stream |
| 582 | ||=Data =|| t || || d || a || || ' || s || || d || e || || e || n || i || g || e || || e || c || h || t || e || || h || o || o || p || || v || o || o || r || || o || n || s || || m || e || n || s || e || n || |
| 583 | ||=Offset =|| 1 || 2 || 3 || 4 || 5 || 6 || 7 || 8 || 9 || 10 || 11 || 12 || 13 || 14 || 15 || 16 || 17 || 18 || 19 || 20 || 21 || 22 || 23 || 24 || 25 || 26 || 27 || 28 || 29 || 30 || 31 || 32 || 33 || 34 || 35 || 36 || 37 || 38 || 39 || 40 || 41 || 42 || 43 || |
| 584 | |
| 585 | Example: several annotation layers for the sentence |
| 586 | ||=Offset (Start, End) =|| 1,1 || 3,4 || 6,7 || 9,10 || 12,16 || 18,22 || 24,27 || 29,32 || 34,36 || 38,43 || |
| 587 | ||=Layer ''orth'' =|| t || da || 's || de || enige || echte || hoop || voor || ons || mensen || |
| 588 | ||=Layer ''pos'' =|| X || PRON || VERB || DET || DET || ADJ || NOUN || ADP || PRON || NOUN || |
| 589 | ||=Layer ''lemma'' =|| _ || dat || zijn || de || enig || echt || hoop || voor || ons || mens || |
| 590 | ||=Layer ''phonetic'' =|| t@ || dAz || dAz || d@ || en@G@ || Ext@ || hop || for || Ons || mEns@ || |
| 591 | |
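The offsets in the tables above are 1-based and inclusive. As a rough sketch, they can be recomputed from the sentence by splitting on whitespace; the function name `token_offsets` is hypothetical, and whitespace tokenisation is an assumption that happens to match this example.

```python
from typing import List, Tuple


def token_offsets(text: str) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (start, end) character offsets per token."""
    offsets = []
    start = None
    for pos, ch in enumerate(text, start=1):
        if ch.isspace():
            if start is not None:
                # A token ended on the previous character.
                offsets.append((start, pos - 1))
                start = None
        elif start is None:
            start = pos
    if start is not None:
        # The last token runs to the end of the text.
        offsets.append((start, len(text)))
    return offsets


sentence = "t da 's de enige echte hoop voor ons mensen"
offsets = token_offsets(sentence)
# Slicing with the offsets recovers the values of the ''orth'' layer.
tokens = [sentence[start - 1:end] for start, end in offsets]
```

For this sentence, `offsets` begins with `(1, 1)`, `(3, 4)`, `(6, 7)` and ends with `(38, 43)`, in line with the offset row of the table.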
| 592 | Example: XML serialization |