Changes between Version 39 and Version 40 of Taskforces/FCS/FCS-Specification-Draft


Timestamp:
11/03/15 12:33:20 (9 years ago)
Author:
Oliver Schonefeld
Comment:

--

  • Taskforces/FCS/FCS-Specification-Draft

    v39 v40  
    179179|| `diag`  || `http://docs.oasis-open.org/ns/search-ws/diagnostic`  || SRU Version 2.0 Diagnostics                              || prefixed ||
    180180|| `zr`    || `http://explain.z3950.org/dtd/2.0/`                   || SRU/ZeeRex Explain                                       || prefixed ||
    181 || `sru`   || `http://www.loc.gov/zing/srw/`                        || SRU Version 1.2, ''only comparability mode''             || prefixed ||
    182 || `diag`  || `http://www.loc.gov/zing/srw/diagnostic/`             || SRU Version 1.2 Diagnostics, ''only comparability mode'' || prefixed ||
     181|| `sru`   || `http://www.loc.gov/zing/srw/`                        || SRU Version 1.2, ''only compatibility mode''             || prefixed ||
     182|| `diag`  || `http://www.loc.gov/zing/srw/diagnostic/`             || SRU Version 1.2 Diagnostics, ''only compatibility mode'' || prefixed ||
    183183
    184184= CLARIN-FCS Interface Specification
     
    381381        <ed:Resource pid="http://hdl.handle.net/4711/0815">
    382382            <ed:Title xml:lang="de">Goethe Korpus</ed:Title>
    383             <ed:Title xml:lang="en">Goethe Corpus</ed:Title>
     383            <ed:Title xml:lang="en">Goethe corpus</ed:Title>
    384384            <ed:Description xml:lang="de">Der Goethe Korpus des IDS Mannheim.</ed:Description>
    385385            <ed:Description xml:lang="en">The Goethe corpus of IDS Mannheim.</ed:Description>
     
    596596 - organized in one or more annotation layers
    597597 - annotation layer := annotations of a specific type, e.g. part-of-speech or orthographic transcription
    598  - annotations of two annotation layers may freely overlap; no self-overlap in an annotation layer
     598 - annotations of two different annotation layers may freely overlap; no self-overlap in an annotation layer
    599599 - Data View serialization in a stand-off like format, i.e. annotations are ranges over the signal (= language resource as character, token or audio stream) and are denoted by start and end offsets
    600600 - layers alignable (through their offsets) and referable (through their layer identifier)
     
    602602   - a list of segments (= "inventory" of all ranges used to describe annotations)
    603603     - units can be "items" (= offsets in character or token-stream) or "timestamp" (timestamps in audio-stream), timestamps may have a resolution of up to 1/1000 second.
     604       - computing the correct offsets is up to the endpoint; it must do so in a consistent manner. A recommendation for character streams: character := Unicode codepoint, normalized to Unicode Normalization Form KC (NFKC; Compatibility Decomposition, followed by Canonical Composition)
    604605     - segments may also have an endpoint-specific reference (= URI); it can be shown in the Aggregator, and if the user clicks the link, a viewer (e.g. audio player) can open at the endpoint
    605606   - a list of layers, each has a type (e.g. "pos", "lemma", see Layer Type identifier in section Layers above) and a layer identifier (= URI)
    606607     - a layer consists of one or more Spans. A Span references a segment (and thus inherits the start and end offsets) and contains the actual annotation (e.g. the part-of-speech label) in its content; MAY also contain alt-value (e.g. original annotation value)
     608     - document order of layer elements defines the view order in the Aggregator
     609     - endpoints should return at least all layers that were referenced in the query; they may return more
    607610   - Hit Markers are added by marking Spans as hits (add `@highlight` attribute); multiple hit markers are supported and the Aggregator may display them visually distinct
     611     - where to add hit markers is up to the endpoint; generally "things" that were referenced in the query should be marked.
     612
    608613
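The notes above on segments, layers, Spans and hit markers can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the serialization defined by the specification: the class names (`Segment`, `Span`, `Layer`), the helper functions, the layer URI, the marker value `h1`, and the offset convention (0-based, end-exclusive) are all made up for this sketch. The only part taken from the text is the recommendation to count Unicode codepoints after NFKC normalization.

```python
import unicodedata
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Segment:
    """One entry of the segment inventory: a range over the signal."""
    id: str
    start: int  # assumption: 0-based codepoint offset, inclusive
    end: int    # assumption: end offset is exclusive


@dataclass
class Span:
    """An annotation that references a segment and inherits its offsets."""
    segment: Segment
    value: str                        # the actual annotation, e.g. a POS label
    alt_value: Optional[str] = None   # e.g. the original annotation value
    highlight: Optional[str] = None   # hit marker; None if the Span is no hit


@dataclass
class Layer:
    id: str    # layer identifier (URI); the value used below is hypothetical
    type: str  # layer type, e.g. "pos" or "lemma"
    spans: List[Span] = field(default_factory=list)


def build_segments(text: str) -> Tuple[str, List[Segment]]:
    """Normalize to NFKC, then emit one segment per whitespace-separated token.

    Python strings are indexed by codepoint, so the offsets follow the
    "character := Unicode codepoint" recommendation.
    """
    normalized = unicodedata.normalize("NFKC", text)
    segments = []
    cursor = 0
    for number, token in enumerate(normalized.split(), start=1):
        start = normalized.index(token, cursor)
        cursor = start + len(token)
        segments.append(Segment(f"s{number}", start, cursor))
    return normalized, segments


def mark_hits(layer: Layer, is_hit: Callable[[Span], bool], marker: str = "h1") -> Layer:
    """Mark every Span selected by is_hit with the given hit marker."""
    for span in layer.spans:
        if is_hit(span):
            span.highlight = marker
    return layer


# Input with a ligature (U+FB01) and a combining accent (U+0301); both change
# the codepoint count under NFKC, which is why normalization must come first.
norm, segments = build_segments("\ufb01ne Cafe\u0301 opens")
pos_layer = Layer(
    id="http://endpoint.example.org/layers/pos",  # hypothetical URI
    type="pos",
    spans=[Span(seg, tag) for seg, tag in zip(segments, ["ADJ", "NOUN", "VERB"])],
)
mark_hits(pos_layer, lambda span: span.value == "NOUN")

for span in pos_layer.spans:
    seg = span.segment
    print(seg.id, seg.start, seg.end, norm[seg.start:seg.end], span.value, span.highlight)
```

Because all layers share one segment inventory, a second layer (e.g. "lemma") built over the same `segments` list would be aligned with the "pos" layer through identical offsets, which is the property the stand-off format relies on.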
    609614Example: a sentence interpreted as a character stream