Changes between Version 59 and Version 60 of Taskforces/FCS/FCS-Specification-Draft


Timestamp: 06/12/17 18:56:33 (7 years ago)
Author: Leif-Jöran
Comment: Minor proofing

  • Taskforces/FCS/FCS-Specification-Draft

    v59 v60
    10 10
    11 11 == Terminology
    12    The key words `MUST`, `MUST NOT`, `REQUIRED`, `SHALL`, `SHALL NOT`, `SHOULD`, `SHOULD NOT`, `RECOMMENDED`, `MAY`, and `OPTIONAL` in this document are to be interpreted as described in [#REF_RFC_2119 RFC2119].
       12 The key words `MUST`, `MUST NOT`, `REQUIRED`, `SHALL`, `SHALL NOT`, `SHOULD`, `SHOULD NOT`, `RECOMMENDED`, `MAY`, and `OPTIONAL` in this document are to be interpreted as in [#REF_RFC_2119 RFC2119].
    13 13
    14 14 == Glossary
     
    419 419 The ''Advanced Search'' capability allows searching in annotated data, that is represented in annotation layers. An annotation ''layer'' contains annotations of a specific type, e.g. lemma or part-of-speech layer. Queries can be across annotation layer.
    420 420
    421     CLARIN-FCS defined a set of searchable annotation layers with certain semantics and syntax. Endpoints `SHOULD` support as many different, of course depending on the resource type, annotation layers as possible.
        421 CLARIN-FCS defines a set of searchable annotation layers with certain semantics and syntax. Endpoints `SHOULD` support as many different, of course depending on the resource type, annotation layers as possible.
    422 422
    423 423 ==== Layers #layers
     
    467 467
    468 468 === Result Format
    469     {{{
    470     #!div style="border: 1px solid #000000; font-size: 75%"
    471     TODO: check and adjust / proof-read
    472     }}}
    473 469 The Search Engine will produce a result set containing several hits as the outcome of processing a query. The Endpoint `MUST` serialize these hits in the CLARIN-FCS result format. Endpoints are `REQUIRED` to adhere to the principle, that ''one'' hit `MUST` be serialized as ''one'' CLARIN-FCS result record and `MUST NOT` combine several hits in one CLARIN-FCS result record. E.g., if a query matches five different sentences within one text (= the resource), the Endpoint must serialize them as five SRU records each with one Hit each referencing the same containing Resource (see section [#searchRetrieve Operation ''searchRetrieve'']).
    474 470
    475     CLARIN-FCS uses a customized format for returning results. ''Resource'' and ''Resource Fragments'' serve as containers for hit results, which are presented in one or more ''Data View''. The following section describes the Resource format and Data View format and section [#searchRetrieve Operation ''searchRetrieve''] will describe, how hits are embedded within SRU responses.
        471 CLARIN-FCS uses a customized format for returning results. ''Resource'' and ''Resource Fragments'' serve as containers for hit results, which are presented in one or more ''Data View''. The following section describes the Resource format and Data View format and section [#searchRetrieve Operation ''searchRetrieve''] will describe how hits are embedded within SRU responses.
    476 472
    477 473 ==== Resource and !ResourceFragment
     
    482 478    A ''Resource Fragment'' is a smaller unit in a ''Resource'', i.e. a sentence in a text corpus or a time interval in an audio transcription.
    483 479
    484     A Resource `SHOULD` be the most precise unit of data that is directly addressable as a "whole". A Resource `SHOULD` contain a Resource Fragment, if the hit consists of just a part of the Resource unit (for example if the hit is a sentence within a large text). A Resource Fragment `SHOULD` be addressable within a resource, i.e. it has an offset or a resource-internal identifier. Using Resource Fragments is `OPTIONAL`, but Endpoints are encouraged to use them. If the Endpoint encodes a hit with a Resource Fragment, the actual hit `SHOULD` be encoded as a Data View that within the Resource Fragment.
    485
    486     Endpoints `SHOULD` always provide a link to the resource itself, i.e. each Resource or Resource Fragment `SHOULD` be identified by a persistent identifier or providing a URI, that is unique for Endpoint. Even if direct linking is not possible, i.e. due to licensing issues, the Endpoints `SHOULD` provide a URI to link to a web-page describing the corpus or collection, including instruction on how to obtain it. Endpoints `SHOULD` provide links that are as specific as possible (and logical), i.e. if a sentence within a resource cannot be addressed directly, the Resource Fragment `SHOULD NOT` contain a persistent identifier or a URI.
        480 A Resource `SHOULD` be the most precise unit of data that is directly addressable as a "whole". A Resource `SHOULD` contain a Resource Fragment, if the hit consists of just a part of the Resource unit (for example if the hit is a sentence within a large text). A Resource Fragment `SHOULD` be addressable within a resource, i.e. it has an offset or a resource-internal identifier. Using Resource Fragments is `OPTIONAL`, but Endpoints are encouraged to use them. If the Endpoint encodes a hit with a Resource Fragment, the actual hit `SHOULD` be encoded as a Data View within the Resource Fragment.
        481
        482 Endpoints `SHOULD` always provide a link to the resource itself, i.e. each Resource or Resource Fragment `SHOULD` be identified by a persistent identifier or providing a URI, that is unique for the Endpoint. Even if direct linking is not possible, i.e. due to licensing issues, the Endpoints `SHOULD` provide a URI to link to a web-page describing the corpus or collection, including instruction on how to obtain it. Endpoints `SHOULD` provide links that are as specific as possible (and logical), i.e. if a sentence within a resource cannot be addressed directly, the Resource Fragment `SHOULD NOT` contain a persistent identifier or an URI.
    487 483
    488 484 If the Endpoint can provide both, a persistent identifier as well as a URI, for either Resource or Resource Fragment, they `SHOULD` provide both. When working with results, Clients `SHOULD` prefer persistent identifiers over regular URIs.
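
Reviewer note: two rules from the Result Format text in this changeset can be illustrated with a minimal Python sketch. The rules are that one hit is serialized as exactly one result record, and that Clients prefer a persistent identifier over a regular URI. The class names `Hit` and `ResultRecord`, their fields, and the functions `serialize_hits` and `preferred_link` are hypothetical and not part of the specification. The identifier and URL values in the usage lines are placeholders. The sketch models records as plain objects and does not produce the actual XML serialization.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    """Hypothetical search-engine hit; field names are illustrative only."""
    text: str
    resource_pid: Optional[str] = None  # persistent identifier of the containing Resource
    resource_uri: Optional[str] = None  # regular URI of the containing Resource


@dataclass
class ResultRecord:
    """Hypothetical stand-in for one CLARIN-FCS result record."""
    hit_text: str
    pid: Optional[str]
    ref: Optional[str]


def serialize_hits(hits: List[Hit]) -> List[ResultRecord]:
    # One hit becomes exactly one record. Hits are never merged,
    # even when they reference the same containing Resource.
    return [ResultRecord(h.text, h.resource_pid, h.resource_uri) for h in hits]


def preferred_link(record: ResultRecord) -> Optional[str]:
    # Client side: prefer the persistent identifier, fall back to the URI.
    return record.pid or record.ref


# Usage: five matching sentences in one text yield five records,
# each pointing at the same (placeholder) Resource.
hits = [Hit(f"sentence {i}", "hdl:0000/example", "https://example.org/text")
        for i in range(5)]
records = serialize_hits(hits)
```

Keeping the mapping a plain one-to-one list comprehension makes the "no combining" rule hard to violate by accident. An endpoint that also emits Resource Fragments would need additional fields that are not modeled here.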