Changes between Version 39 and Version 40 of Taskforces/FCS/FCS-Specification-Draft
- Timestamp: 11/03/15 12:33:20 (9 years ago)
Taskforces/FCS/FCS-Specification-Draft
v39  v40
179  179    || `diag` || `http://docs.oasis-open.org/ns/search-ws/diagnostic` || SRU Version 2.0 Diagnostics || prefixed ||
180  180    || `zr` || `http://explain.z3950.org/dtd/2.0/` || SRU/ZeeRex Explain || prefixed ||
181       - || `sru` || `http://www.loc.gov/zing/srw/` || SRU Version 1.2, ''only compa rability mode'' || prefixed ||
182       - || `diag` || `http://www.loc.gov/zing/srw/diagnostic/` || SRU Version 1.2 Diagnostics, ''only compa rability mode'' || prefixed ||
     181  + || `sru` || `http://www.loc.gov/zing/srw/` || SRU Version 1.2, ''only compatibility mode'' || prefixed ||
     182  + || `diag` || `http://www.loc.gov/zing/srw/diagnostic/` || SRU Version 1.2 Diagnostics, ''only compatibility mode'' || prefixed ||
183  183
184  184    = CLARIN-FCS Interface Specification
…    …
381  381    <ed:Resource pid="http://hdl.handle.net/4711/0815">
382  382    <ed:Title xml:lang="de">Goethe Korpus</ed:Title>
383       - <ed:Title xml:lang="en">Goethe Corpus</ed:Title>
     383  + <ed:Title xml:lang="en">Goethe corpus</ed:Title>
384  384    <ed:Description xml:lang="de">Der Goethe Korpus des IDS Mannheim.</ed:Description>
385  385    <ed:Description xml:lang="en">The Goethe corpus of IDS Mannheim.</ed:Description>
…    …
596  596    - organized in one or more annotation layers
597  597    - annotation layer := annotations of a specific type, e.g. part-of-speech or orthographic transcription
598       - - annotations of two annotation layers may freely overlap; no self-overlap in an annotation layer
     598  + - annotations of two different annotation layers may freely overlap; no self-overlap in an annotation layer
599  599    - Data View serialization in a stand-off like format, i.e annotations are ranges over the signal (= language resource as character, token or audio stream) are denoted by start and end offsets
600  600    - layers alignable (through their offsets) and referable (trough their layer identifier)
…    …
602  602    - a list of segments (= "inventory" of all ranges used to describe annotations")
603  603    - units can be "items" (= offsets in character or token-stream) or "timestamp" (timestamps in audio-stream), timestamps may have a resolution of up to 1/1000 second.
     604  + - to come up with the correct offsets is up to the endpoint; it must do so in a consistent manner. a recommendation for character streams: character := Unicode codepoint, normalized to Unicode Normalization Form KC (NFKC; Compatibility Decomposition, followed by Canonical Composition)
604  605    - segments may also have an endpoint specific reference (= URI); can be show in aggregator and if user clicks link can open a viewer (e.g. audio-player) at the endpoint
605  606    - a list of layers, each has a type (e.g. "pos", "lemma", see Layer Type identifier in section Layers above) and an layer identifier (= URI)
606  607    - a layer consists of one or more Spans. A span references a segment (and thus inherits the start- and en- offsets) and contains the actual annotation (e.g. the port-of-speech label) in it's content; MAY also contain alt-value (e.g. original annotation value)
     608  + - document order of layer elements define the view order in the Aggregator
     609  + - endpoints should return at least all layers that where referenced the query; they may return more
607  610    - Hit Makers are added by marking Spans as hits (add `@highlight` attribute); multiple hit-makers are supported and Aggregator may display them visually distinct
     611  + - where to add hit markers is up to the endpoint; generally "things" that where referenced in the query should be marked.
     612  +
608  613
609  614    Example: a sentence interpreted as a character stream
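As a rough illustration of the character-stream recommendation added in v40 (character := Unicode codepoint after NFKC normalization), the following Python sketch normalizes a string and derives segment offsets from it. The function names `char_stream` and `token_segments` are hypothetical and not part of the specification; whitespace tokenization and exclusive end offsets are assumptions made only for this sketch, since the draft leaves offset computation to the endpoint.

```python
import unicodedata
from typing import List, Tuple


def char_stream(text: str) -> str:
    """Normalize to NFKC so that one character equals one codepoint of the normalized text."""
    return unicodedata.normalize("NFKC", text)


def token_segments(stream: str) -> List[Tuple[int, int]]:
    """Return (start, end) codepoint offsets for whitespace-separated tokens.

    Assumption: end offsets are exclusive; the draft does not state this.
    """
    segments = []
    pos = 0
    for token in stream.split():
        start = stream.index(token, pos)  # search from the end of the previous token
        end = start + len(token)
        segments.append((start, end))
        pos = end
    return segments


# The raw input uses a ligature (U+FB03) and a combining accent (U+0301);
# NFKC expands the ligature and composes the accent, which shifts the offsets.
raw = "o\ufb03ce cafe\u0301"
stream = char_stream(raw)
print(len(raw), len(stream))
print(token_segments(stream))
```

Computing offsets on the normalized stream rather than on the raw input is what keeps them consistent across endpoints that store the same text in different Unicode forms.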