Changes between Version 37 and Version 38 of FCS-Specification-ScrapBook


Ignore:
Timestamp:
02/12/14 17:36:05 (10 years ago)
Author:
oschonef
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • FCS-Specification-ScrapBook

    v37 v38  
    4848
    4949 Client::
    50     A software component, that implements the interface specification to query endpoints, i.e. an aggregator or an user-interface.     
     50    A software component, that implements the interface specification to query Endpoints, i.e. an aggregator or an user-interface.     
    5151
    5252 CQL::
     
    7878
    7979 Repository Registry::
    80     A separate service that allows registering endpoints and provides information about these to other components, e.g. an aggegator. The [http://centres.clarin.eu/ CLARIN Center Registry] is an implementation of such a repository registry.
     80    A separate service that allows registering Endpoints and provides information about these to other components, e.g. an aggegator. The [http://centres.clarin.eu/ CLARIN Center Registry] is an implementation of such a repository registry.
    8181
    8282=== Normative References ===
     
    181181 ''Basic profile''::
    182182   Endpoints `MUST` support ''term-only'' queries. \\
    183    Endpoints `SHOULD` support ''terms'' combined with boolean operator queries (''AND'' and ''OR''), including subqueries. Endpoints `MAY` also support ''NOT'' or ''PROX'' operator queries. If an endpoint does not support a query, i.e. the used operators are not supported by the Endpoint, it `MUST` return an appropriate error message using the appropriate SRU diagnostic. \\
     183   Endpoints `SHOULD` support ''terms'' combined with boolean operator queries (''AND'' and ''OR''), including subqueries. Endpoints `MAY` also support ''NOT'' or ''PROX'' operator queries. If an Endpoint does not support a query, i.e. the used operators are not supported by the Endpoint, it `MUST` return an appropriate error message using the appropriate SRU diagnostic. \\
    184184   Examples for valid CQL queries for the ''basic profile'':
    185185{{{
     
    192192cat AND (mouse OR "lazy dog")
    193193}}}
    194    The endpoint is `MUST` perform the query on an annotation tier, that makes the most sense for the user, i.e. the textual content for a text corpus resource or the orthographic transcription of a spoken language corpus. Endpoints `SHOULD` perform the query case-sensitive.\\
     194   The Endpoint is `MUST` perform the query on an annotation tier, that makes the most sense for the user, i.e. the textual content for a text corpus resource or the orthographic transcription of a spoken language corpus. Endpoints `SHOULD` perform the query case-sensitive.\\
    195195   Endpoint `MUST NOT` silently accept queries that include CQL features besides ''term-only'' and ''terms'' combined with boolean operator queries, i.e. queries involving context sets, etc.
    196196
     
    203203
    204204=== Result Format ===
    205 CLARIN-FCS uses a customized format for returning results. ''Resource'' and ''Resource Fragments'' serve as containers for hit results, that are presented in one or more  ''Data View''. The following section describes the result result and data view format and section [#searchRetrieve Operation ''searchRetrieve''] will describe, how hits are embedded within SRU responses.
     205CLARIN-FCS uses a customized format for returning results. ''Resource'' and ''Resource Fragments'' serve as containers for hit results, that are presented in one or more  ''Data View''. The following section describes the Resource format and Data View format and section [#searchRetrieve Operation ''searchRetrieve''] will describe, how hits are embedded within SRU responses.
    206206
    207207==== Resource and !ResourceFragment ====
     
    224224Endpoints `MUST` use the identifier `http://clarin.eu/fcs/1.0` for the ''responseItemType'' (= content for the `<sru:recordSchema>` element) in SRU responses.
    225225
    226 Endpoints `MAY` serialize hits as multiple Data Views, however they `MUST` provide the Generic Hits (HITS) Data View either encoded as a Resource Fragment (if applicable), or otherwise within the Resource (if there is no reasonable resource fragment). Other Data Views `SHOULD` be put in a place that is logical for their content (as is to be determined by the Endpoint), e.g. a metadata data view would most likely be put directly below Resource and a Data View representing some annotation layers directly around the hit is more likely to belong within a Resource Fragment.
     226Endpoints `MAY` serialize hits as multiple Data Views, however they `MUST` provide the Generic Hits (HITS) Data View either encoded as a Resource Fragment (if applicable), or otherwise within the Resource (if there is no reasonable Resource Fragment). Other Data Views `SHOULD` be put in a place that is logical for their content (as is to be determined by the Endpoint), e.g. a metadata Data View would most likely be put directly below Resource and a Data View representing some annotation layers directly around the hit is more likely to belong within a Resource Fragment.
    227227
    228228[=#REF_Example_1]Example 1:
     
    246246</fcs:Resource>
    247247}}}
    248 This [#REF_Example_2 example] shows a hit encoded as a Resource Fragment embedded within a Resource. The actual hit is again encoded as one Data View of type ''Generic Hits''. The hit is not directly referenceable, but the Resource, in which hit occurred, is referenceable by the persistent identifier `http://hdl.handle.net/4711/08-15`. In contrast to [#REF_Example_1 Example 1], the endpoint decided to provide a "semantically richer" encoding and embedded the hit using a Resource Fragment within the Resource to indicate that the hit is a part of a larger resource, e.g. a sentence in a text document.
     248This [#REF_Example_2 example] shows a hit encoded as a Resource Fragment embedded within a Resource. The actual hit is again encoded as one Data View of type ''Generic Hits''. The hit is not directly referenceable, but the Resource, in which hit occurred, is referenceable by the persistent identifier `http://hdl.handle.net/4711/08-15`. In contrast to [#REF_Example_1 Example 1], the Endpoint decided to provide a "semantically richer" encoding and embedded the hit using a Resource Fragment within the Resource to indicate that the hit is a part of a larger resource, e.g. a sentence in a text document.
    249249
    250250[=#REF_Example_3]Example 3:
     
    266266All entities of the Hit can be referenced by a persistent identifier and an URI. The complete Resource is referenceable by either the persistent identifier `http://hdl.handle.net/4711/08-15` or the URI `http://repos.example.org/file/text_08_15.html` and the CMDI metadata record in the CMDI Data View is referenceable either by the persistent identifier `http://hdl.handle.net/4711/08-15-1` or the URI `http://repos.example.org/file/08_15_1.cmdi`. The actual hit in the Resource Fragment is also directly referenceable by either the persistent identifier `http://hdl.handle.net/4711/00-15-2` or the URI `http://repos.example.org/file/text_08_15.html#sentence2`.   
    267267
    268 Endpoints `MUST` serialize one Resource for each hit, i.e. they `MUST NOT` combine several hits in one Resource. E.g., if a query matches five different sentences within one text (= the resource), the endpoint must serialize five Resource (= one per hit) and embed each within one SRU result (see [#query below]).
     268Endpoints `MUST` serialize one Resource for each hit, i.e. they `MUST NOT` combine several hits in one Resource. E.g., if a query matches five different sentences within one text (= the resource), the Endpoint must serialize five Resource (= one per hit) and embed each within one SRU result (see [#query below]).
    269269
    270270
    271271==== Data View ====
    272 A ''Data View'' serves as a container for representing search results within CLARIN-FCS. Data Views are designed to allow for different representations of results, i.e. they are deliberately kept open to allow further extensions with more supported data view formats. The content of a Data View is called ''Payload''. Each Payload is typed and the type of the Payload is recorded in the `@type` attribute if the `<fcs:DataView>` element. The Payload type is is identified by a MIME type ([#REF_RFC_6838 RFC6838], [#REF_RFC_3023 RFC3023]). If no existing MIME type can be used, implementors `SHOULD` define a properer private mime type.
     272A ''Data View'' serves as a container for representing search results within CLARIN-FCS. Data Views are designed to allow for different representations of results, i.e. they are deliberately kept open to allow further extensions with more supported Data View formats. The content of a Data View is called ''Payload''. Each Payload is typed and the type of the Payload is recorded in the `@type` attribute if the `<fcs:DataView>` element. The Payload type is is identified by a MIME type ([#REF_RFC_6838 RFC6838], [#REF_RFC_3023 RFC3023]). If no existing MIME type can be used, implementors `SHOULD` define a properer private mime type.
    273273
    274274The Payload of a Data View can either be deposited ''inline'' or by ''reference''. In the case of ''inline'', it `MUST` be serialized as an XML fragment below the `<fcs:DataView>` element. This method is the preferred methods payloads that can easily serialized in XML. In the case of by ''reference'', the content cannot easily deposited inline, i.e. it is binary content. In this case, the Data View `MUST` include a `@ref` or `@pid` attribute that links location for Clients to download the payload. This location `SHOULD` be ''openly accessible'', i.e. data can be downloaded freely without any need to perform a login.