Changes between Version 88 and Version 89 of FCS-Specification-ScrapBook


Timestamp:
03/19/14 11:13:17 (11 years ago)
Author:
Oliver Schonefeld
Comment:

--

  • FCS-Specification-ScrapBook

    v88 v89  
    275275
    276276=== Result Format ===#resultFormat
    277 The Search Engine will produce a result set containing several hits as the outcome of processing a query. The Endpoint `MUST` serialize these hits in the CLARIN-FCS result format. Endpoints are `REQUIRED` to adhere to the principle, that ''one'' hit `MUST` be serialized as ''one'' CLARIN-FCS result record and `MUST NOT` combine several hits in one CLARIN-FCS result record. E.g., if a query matches five different sentences within one text (= the resource), the Endpoint must serialize five Resource (= one per hit) and embed each within one SRU result (see [#searchRetrieve below]).
     277The Search Engine will produce a result set containing several hits as the outcome of processing a query. The Endpoint `MUST` serialize these hits in the CLARIN-FCS result format. Endpoints are `REQUIRED` to adhere to the principle that ''one'' hit `MUST` be serialized as ''one'' CLARIN-FCS result record and `MUST NOT` combine several hits in one CLARIN-FCS result record. E.g., if a query matches five different sentences within one text (= the resource), the Endpoint must serialize five Resources (= one per hit) and embed each within one SRU result (see section [#searchRetrieve Operation ''searchRetrieve'']).
    278278
    279279CLARIN-FCS uses a customized format for returning results. ''Resource'' and ''Resource Fragments'' serve as containers for hit results, that are presented in one or more ''Data View''. The following section describes the Resource format and Data View format and section [#searchRetrieve Operation ''searchRetrieve''] will describe, how hits are embedded within SRU responses.
     
    381381
    382382=== Endpoint Description ===#endpointDescription
    383 Endpoints need to provide information about their capabilities to support auto-configuration of Clients, This capabilities include, among other information, the Profile that is supported by the Endpoint. The ''Endpoint Description'' mechanism provides the necessary facility to provide this information to the Clients. Endpoints `MUST` encode their capabilities using an XML format and embed this information into the SRU/CQL protocol as described in section [#explain Operation ''explain'']. The XML fragment generated by the Endpoint for the Endpoint Description `MUST` be valid according to the XML schema "[source:FederatedSearch/schema/Endpoint-Description.xsd Endpoint-Description.xsd]" ([source:FederatedSearch/schema/Endpoint-Description.xsd?format=txt download]).
     383Endpoints need to provide information about their capabilities to support auto-configuration of Clients. The ''Endpoint Description'' mechanism provides the necessary facility to provide this information to the Clients. Endpoints `MUST` encode their capabilities using an XML format and embed this information into the SRU/CQL protocol as described in section [#explain Operation ''explain'']. The XML fragment generated by the Endpoint for the Endpoint Description `MUST` be valid according to the XML schema "[source:FederatedSearch/schema/Endpoint-Description.xsd Endpoint-Description.xsd]" ([source:FederatedSearch/schema/Endpoint-Description.xsd?format=txt download]).
    384384
    385385The XML fragment for ''Endpoint Description'' is encoded as an `<ed:EndpointDescription>` element, that contains the following attributes and children:
    386386 * one `@version` attribute (`REQUIRED`) on the `<ed:EndpointDescription>` element. The value of the `@version` attribute `MUST` be `1`.
    387387 * one `<ed:Capabilities>` element (`REQUIRED`) that contains one or more `<ed:Capability>` elements \\
    388    The content of the `<ed:Capability>` element indicates the capabilities, that are supported by the Endpoint. For valid values, see [#capabilities section Capabilities and Profiles]
     388   The content of the `<ed:Capability>` element indicates the capabilities, that are supported by the Endpoint. For valid values, see section [#capabilities Capabilities and Profiles]. This list `MUST NOT` include duplicate values.
    389389 * one `<ed:SupportedDataViews>` (`REQUIRED`) \\
    390    A list of Data Views, that are supported by this Endpoint. This list is composed of one or more `<ed:SupportedDataView>` elements. The content of a `<ed:SupportedDataView>` `MUST` be the MIME type of a supported Data View, e.g. `application/x-clarin-fcs-hits+xml`. Each `<ed:SupportedDataView>` element `MUST`carry a `@id` and a `@delivery-policy` attribute. The value of the `@id` attribute is later used in the `<ed:Resource>` element to indicate, which Data View is supported by a resource (see below). The `@delivery-policy` indicates, the Endpoint's delivery policy, for that Data View. Valid values are `send-default` for the ''send-default'' and `need-to-request` for the ''need-to-request'' delivery policy.
     390   A list of Data Views that are supported by this Endpoint. This list is composed of one or more `<ed:SupportedDataView>` elements. The content of a `<ed:SupportedDataView>` `MUST` be the MIME type of a supported Data View, e.g. `application/x-clarin-fcs-hits+xml`. Each `<ed:SupportedDataView>` element `MUST` carry a `@id` and a `@delivery-policy` attribute. The value of the `@id` attribute is later used in the `<ed:Resource>` element to indicate which Data View is supported by a resource (see below). The `@delivery-policy` attribute indicates the Endpoint's delivery policy for that Data View. Valid values are `send-default` for the ''send-default'' and `need-to-request` for the ''need-to-request'' delivery policy. \\
     391   This list `MUST NOT` include duplicate entries, i.e. no MIME type may appear more than once. \\
     392   The value of the `@id` attribute `MUST NOT` contain the characters `,` (comma) or `;` (semicolon).
    391393 * one `<ed:Resources>` element (`REQUIRED`) \\
    392394   A list of (top-level) resources that are available, i.e. searchable, at the Endpoint. The `<ed:Resources>` element contains one or more `<ed:Resource>` elements (see below). The Endpoint `MUST` declare at least one (top-level) resource.
     
    402404   The (relevant) languages available within the resource. The `<ed:Languages>` element contains one or more `<ed:Language>` elements. The content of a `<ed:Language>` element `MUST` be an ISO 639-3 three-letter language code. This element should be repeated for all (relevant) languages available ''within'' the resource; however, this list `MUST NOT` contain duplicate entries.
    403405 * one `<ed:AvailableDataViews>` element (`REQUIRED`) \\
    404    The Data Views, that are available for the resource. The `<ed:AvailableDataViews>` `MUST` carry a `@ref` attribute, that contains a whitespace separated list of id values, that correspond to value of `@id` attribute on `<ed:SupportedDataView>` elements. 
     406   The Data Views, that are available for the resource. The `<ed:AvailableDataViews>` `MUST` carry a `@ref` attribute, that contains a whitespace separated list of id values, that correspond to value of `@id` attribute on `<ed:SupportedDataView>` elements. \\
     407   In case of sub-resources, each Resource `SHOULD` support all Data Views, that are supported by the parent resource. However, every resource `MUST` declare all available Data Views independently, i.e. there is no implicit inheritance semantic.
    405408 * zero or one `<ed:Resources>` element (`OPTIONAL`) \\
    406409   If a resource has searchable sub-resources the Endpoint `MUST` supply additional finer grained resource elements, which are wrapped in a `<ed:Resources>` element. A sub-resource is a searchable entity within a resource, e.g. a sub-corpus.
     
    432435</ed:EndpointDescription>
    433436}}}
    434 This [#REF_Example_4 example] shows a simple Endpoint Description for an Endpoint that only supports the ''basic-search'' capability (and this, the basic'' Profile) and only provides the Generic Hits Data View, which is indicated by a `<ed:SupportedDataView>` element. This element carries a `@id` attribute with a value of `dv1` and indicates a delivery policy of ''send-default' by the `@delivery-policy` attribute. It only provides one top-level resource identified by the persistent identifier `http://hdl.handle.net/4711/0815`. The resource has a title as well as a description in German and English. A landing page is located at `http://repos.example.org/corpus1.html`. The predominant language in the resource contents is German. Only the Generic Hits Data View is supported for this resource, because the `<ed:AvailableDataViews>` element only references the `<ed:SupporedDataView>` element with the `@id` with a value of `dv1`.
     437This [#REF_Example_4 example] shows a simple Endpoint Description for an Endpoint that only supports the ''basic-search'' capability (and thus, the ''basic'' Profile) and only provides the Generic Hits Data View, which is indicated by a `<ed:SupportedDataView>` element. This element carries a `@id` attribute with a value of `dv1` and indicates a delivery policy of ''send-default'' by the `@delivery-policy` attribute. It only provides one top-level resource identified by the persistent identifier `http://hdl.handle.net/4711/0815`. The resource has a title as well as a description in German and English. A landing page is located at `http://repos.example.org/corpus1.html`. The predominant language in the resource contents is German. Only the Generic Hits Data View is supported for this resource, because the `<ed:AvailableDataViews>` element only references the `<ed:SupportedDataView>` element with the `@id` with a value of `dv1`.
    435438
    436439[=#REF_Example_5]Example 5:
     
    492495</ed:EndpointDescription>
    493496}}}
    494 This more complex [#REF_Example_5 example] show an Endpoint Description for an Endpoint that, similar to [#REF_Example_4 Example 4], supports the ''basic'' profile. In addition to the Generic Hits Data View it also supports the CMDI Data View. The Endpoint has two top-level resources (identified by the persistent identifiers `http://hdl.handle.net/4711/0815` and `http://hdl.handle.net/4711/0816`. The second top-level resource has two independently searchable sub-resources, identified by the persistent identifier `http://hdl.handle.net/4711/0816-1` and `http://hdl.handle.net/4711/0816-2`. All resources are described using several properties, like title, description, etc.
     497This more complex [#REF_Example_5 example] shows an Endpoint Description for an Endpoint that, similar to [#REF_Example_4 Example 4], supports the ''basic'' profile. In addition to the Generic Hits Data View it also supports the CMDI Data View. The delivery policies are ''send-default'' for the Generic Hits Data View and ''need-to-request'' for the CMDI Data View. The Endpoint has two top-level resources (identified by the persistent identifiers `http://hdl.handle.net/4711/0815` and `http://hdl.handle.net/4711/0816`). The second top-level resource has two independently searchable sub-resources, identified by the persistent identifiers `http://hdl.handle.net/4711/0816-1` and `http://hdl.handle.net/4711/0816-2`. All resources are described using several properties, like title, description, etc. The first top-level resource provides only the Generic Hits Data View, while the other top-level resource, including its children, provides the Generic Hits and the CMDI Data Views.
    495498
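The constraints that the Endpoint Description rules add on top of the XML schema (no duplicate capabilities or MIME types, no `,` or `;` in `@id` values, and every `@ref` pointing at a declared Data View) could be checked roughly as follows. This is only a sketch, not part of the specification: the function name `check_endpoint_description` and the wording of the reported problems are assumptions made for illustration, and the namespace URI is the one used in the ''explain'' example of this document.

```python
import xml.etree.ElementTree as ET

# Namespace as used in the explain example of this document.
ED = "http://clarin.eu/fcs/endpoint-description"
NS = {"ed": ED}
POLICIES = {"send-default", "need-to-request"}


def check_endpoint_description(xml_text):
    """Hypothetical helper: return a list of problems found (empty if none)."""
    root = ET.fromstring(xml_text)
    problems = []

    if root.get("version") != "1":
        problems.append("@version must be 1")

    caps = [(c.text or "").strip()
            for c in root.findall("ed:Capabilities/ed:Capability", NS)]
    if len(caps) != len(set(caps)):
        problems.append("duplicate <ed:Capability> values")

    ids, mimes = set(), set()
    for dv in root.findall("ed:SupportedDataViews/ed:SupportedDataView", NS):
        dv_id = dv.get("id")
        mime = (dv.text or "").strip()
        if dv_id is None or dv.get("delivery-policy") not in POLICIES:
            problems.append(f"data view {mime!r}: missing @id or invalid @delivery-policy")
        if dv_id is not None:
            if "," in dv_id or ";" in dv_id:
                problems.append(f"@id {dv_id!r} contains ',' or ';'")
            ids.add(dv_id)
        if mime in mimes:
            problems.append(f"duplicate MIME type {mime!r}")
        mimes.add(mime)

    # iter() also visits nested sub-resources; nothing is inherited from parents.
    for res in root.iter(f"{{{ED}}}Resource"):
        avail = res.find("ed:AvailableDataViews", NS)
        refs = avail.get("ref", "").split() if avail is not None else []
        if not refs:
            problems.append(f"resource {res.get('pid')!r}: no available Data Views declared")
        for ref in refs:
            if ref not in ids:
                problems.append(f"resource {res.get('pid')!r}: unknown Data View id {ref!r}")
    return problems
```

Such a check would complement, not replace, validation against `Endpoint-Description.xsd`.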
    496499=== Endpoint Custom Extensions ===
     
    539542   `<zr:databaseInfo>` as defined in [#REF_Explain SRU-Explain] (`REQUIRED`) \\
    540543   `<zr:schemaInfo>` as defined in [#REF_Explain SRU-Explain] (`REQUIRED`). This element `MUST` contain an element `<zr:schema>` with an `@identifier` attribute with a value of `http://clarin.eu/fcs/resource` and an `@name` attribute with a value of `fcs`. \\
    541    `<zr:configInfo>` is `OPTIONAL``\\
     544   `<zr:configInfo>` is `OPTIONAL` \\
    542545   An ''extended'' profile may define how the `<zr:indexInfo>` element is to be used, therefore it is `NOT RECOMMENDED` for Endpoints to define custom extensions.
    543546 ''Extended'' Profile::
     
    547550
    548551The following example shows a request and response to an ''explain'' request with added extra request parameter `x-fcs-endpoint-description`:
    549 * HTTP GET request: Client ? Endpoint:
     552* HTTP GET request: Client → Endpoint:
    550553{{{#!sh
    551554http://repos.example.org/fcs-endpoint?operation=explain&version=1.2&x-fcs-endpoint-description=true
    552555}}}
    553 * HTTP Response: Endpoint ? Client:
     556* HTTP Response: Endpoint → Client:
    554557{{{#!xml
    555558<?xml version='1.0' encoding='utf-8'?>
     
    594597  </sru:echoedExplainRequest>
    595598  <sru:extraResponseData>
    596     <ed:EndpointDescription xmlns:ed="http://clarin.eu/fcs/endpoint-description">
    597       <ed:Profile>basic</ed:Profile>
     599    <ed:EndpointDescription xmlns:ed="http://clarin.eu/fcs/endpoint-description" version="1">
     600      <ed:Capabilities>
     601          <ed:Capability>http://clarin.eu/fcs/capability/basic-search</ed:Capability>
     602      </ed:Capabilities>
    598603      <ed:SupportedDataViews>
    599         <ed:SupportedDataView>application/x-clarin-fcs-hits+xml</ed:SupportedDataView>
     604          <ed:SupportedDataView id="dv1" delivery-policy="send-default">application/x-clarin-fcs-hits+xml</ed:SupportedDataView>
    600605      </ed:SupportedDataViews>
    601606      <ed:Resources>
    602         <!-- just one top-level resource at the Endpoint -->
    603         <ed:Resource pid="http://hdl.handle.net/4711/0815">
    604           <ed:Title xml:lang="de">Goethe Corpus</ed:Title>
    605           <ed:Title xml:lang="en">Goethe Korpus</ed:Title>
    606           <ed:Description xml:lang="de">Der Goethe Korpus des IDS Mannheim.</ed:Description>
    607           <ed:Description xml:lang="en">The Goethe corpus of IDS Mannheim.</ed:Description>
    608           <ed:LandingPageURI>http://repos.example.org/corpus1.html</ed:LandingPageURI>
    609           <ed:Languages>
    610             <ed:Language>deu</ed:Language>
    611           </ed:Languages>
    612         </ed:Resource>
     607          <!-- just one top-level resource at the Endpoint -->
     608          <ed:Resource pid="http://hdl.handle.net/4711/0815">
     609              <ed:Title xml:lang="de">Goethe Korpus</ed:Title>
     610              <ed:Title xml:lang="en">Goethe Corpus</ed:Title>
     611              <ed:Description xml:lang="de">Der Goethe Korpus des IDS Mannheim.</ed:Description>
     612              <ed:Description xml:lang="en">The Goethe corpus of IDS Mannheim.</ed:Description>
     613              <ed:LandingPageURI>http://repos.example.org/corpus1.html</ed:LandingPageURI>
     614              <ed:Languages>
     615                  <ed:Language>deu</ed:Language>
     616              </ed:Languages>
     617              <ed:AvailableDataViews ref="dv1"/>
     618          </ed:Resource>
    613619      </ed:Resources>
    614620    </ed:EndpointDescription>
     
    627633
    628634The following example shows a request and response to a ''searchRetrieve'' request with a ''term-only'' query for "cat":
    629 * HTTP GET request: Client ? Endpoint:
     635* HTTP GET request: Client → Endpoint:
    630636{{{#!sh
    631637http://repos.example.org/fcs-endpoint?operation=searchRetrieve&version=1.2&query=cat
    632638}}}
    633 * HTTP Response: Endpoint ? Client:
     639* HTTP Response: Endpoint → Client:
    634640{{{#!xml
    635641<?xml version='1.0' encoding='utf-8'?>
     
    689695http://repos.example.org/fcs-endpoint?operation=searchRetrieve&version=1.2&query=cat&x-fcs-context=http://hdl.handle.net/4711/0815,http://hdl.handle.net/4711/0816-2
    690696}}}
    691 If an invalid persistent identifier is passed by the Client, the Endpoint `MUST` issue a `http://clarin.eu/fcs/diagnostic/1` diagnostic, i.e add the appropiate XML fragment to the `<sru:diagnostics>` element of the response. The Endpoint `MAY` treat this condition as fatal, i.e. just issue the diagnostic and perform no search or it `MAY` treat it a non-fatal and perform the search.
     697If an invalid persistent identifier is passed by the Client, the Endpoint `MUST` issue a `http://clarin.eu/fcs/diagnostic/1` diagnostic, i.e. add the appropriate XML fragment to the `<sru:diagnostics>` element of the response. The Endpoint `MAY` treat this condition as fatal, i.e. just issue the diagnostic and perform no search, or it `MAY` treat it as non-fatal and perform the search.
     698
     699If a Client wants to request one or more Data Views that are handled by the Endpoint with the ''need-to-request'' delivery policy, it `MUST` pass a comma-separated list of ''Data View identifiers'' in the `x-fcs-dataviews` extra request parameter of the ''searchRetrieve'' request. A Client can extract valid values for the ''Data View identifiers'' from the `@id` attribute of the `<ed:SupportedDataView>` elements in the Endpoint Description of the Endpoint (see section [#explain Operation ''explain''] and section [#endpointDescription Endpoint Description]).
     700
     701For example, to request the CMDI Data View from an Endpoint that has an Endpoint Description as described in [#REF_Example_5 Example 5], a Client would need to use the ''Data View identifier'' `dv2` and submit the following request:
     702{{{#!sh
     703http://repos.example.org/fcs-endpoint?operation=searchRetrieve&version=1.2&query=cat&x-fcs-dataviews=dv2
     704}}}
     705If an invalid ''Data View identifier'' is passed by the Client, the Endpoint `MUST` issue a `http://clarin.eu/fcs/diagnostic/4` diagnostic, i.e. add the appropriate XML fragment to the `<sru:diagnostics>` element of the response. The Endpoint `MAY` treat this condition as fatal, i.e. just issue the diagnostic and perform no search, or it `MAY` treat it as non-fatal and perform the search.
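
On the Client side, assembling a ''searchRetrieve'' request with the `x-fcs-context` and `x-fcs-dataviews` extra request parameters might look like the following sketch. The helper name `build_search_url` is hypothetical, and leaving `:`, `/` and `,` unescaped in the query string is an assumption made only so that the output matches the request examples shown in this document.

```python
from urllib.parse import urlencode


def build_search_url(endpoint, query, contexts=(), dataviews=()):
    """Hypothetical helper: build a searchRetrieve URL with the FCS extra parameters."""
    for dv in dataviews:
        # Data View identifiers must not contain ',' or ';', so the
        # comma-separated list stays unambiguous.
        if "," in dv or ";" in dv:
            raise ValueError(f"invalid Data View identifier: {dv!r}")

    params = {"operation": "searchRetrieve", "version": "1.2", "query": query}
    if contexts:
        # Persistent identifiers of the resources to search in.
        params["x-fcs-context"] = ",".join(contexts)
    if dataviews:
        # Identifiers taken from @id of <ed:SupportedDataView> elements.
        params["x-fcs-dataviews"] = ",".join(dataviews)

    # safe=":/," keeps the URL readable like the examples in this document.
    return endpoint + "?" + urlencode(params, safe=":/,")


url = build_search_url("http://repos.example.org/fcs-endpoint", "cat", dataviews=["dv2"])
```

A Client would still need to inspect `<sru:diagnostics>` in the response, since the Endpoint may answer an unknown identifier with a fatal or a non-fatal diagnostic.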
    692706
    693707== Normative Appendix