Changes between Version 14 and Version 15 of Taskforces/FCS/FCS-Specification-Draft


Timestamp:
10/23/15 10:00:11 (9 years ago)
Author:
Oliver Schonefeld

  • Taskforces/FCS/FCS-Specification-Draft

    v14 v15  
    9292    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part1-apd/searchRetrieve-v1.0-os-part1-apd.pdf (PDF)]
    9393
    94  OASIS-SRU12[=#REF_SRU_12]::
    95     searchRetrieve: Part 2. SRU searchRetrieve Operation: APD Binding for SRU 1.2 Version 1.0, OASIS, January 2013, \\
     94 OASIS-SRU20[=#REF_SRU_20]::
     95    searchRetrieve: Part 3. SRU searchRetrieve Operation: APD Binding for SRU 2.0 Version 1.0, OASIS, January 2013, \\
    9696    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part2-sru1.2/searchRetrieve-v1.0-os-part2-sru1.2.doc]
    9797    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part2-sru1.2/searchRetrieve-v1.0-os-part2-sru1.2.html (HTML)]
    9898    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part2-sru1.2/searchRetrieve-v1.0-os-part2-sru1.2.pdf (PDF)]
     99
     100 OASIS-SRU12[=#REF_SRU_12]::
     101    searchRetrieve: Part 2. SRU searchRetrieve Operation: APD Binding for SRU 1.2 Version 1.0, OASIS, January 2013, \\
     102    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part3-sru2.0/searchRetrieve-v1.0-os-part3-sru2.0.doc]
     103    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part3-sru2.0/searchRetrieve-v1.0-os-part3-sru2.0.html (HTML)]
     104    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part3-sru2.0/searchRetrieve-v1.0-os-part3-sru2.0.pdf (PDF)]
    99105
    100106 OASIS-CQL[=#REF_CQL]::
     
    265271
    266272=== Result Format
     273{{{
     274#!div style="border: 1px solid #000000; font-size: 75%"
     275TODO: check and adjust / proof-read
     276}}}
    267277The Search Engine will produce a result set containing several hits as the outcome of processing a query. The Endpoint `MUST` serialize these hits in the CLARIN-FCS result format. Endpoints are `REQUIRED` to adhere to the principle that ''one'' hit `MUST` be serialized as ''one'' CLARIN-FCS result record and `MUST NOT` combine several hits in one CLARIN-FCS result record. E.g., if a query matches five different sentences within one text (= the resource), the Endpoint must serialize them as five SRU records, each with one Hit and each referencing the same containing Resource (see section [#searchRetrieve Operation ''searchRetrieve'']).
    268278
     
    407417
    408418= CLARIN-FCS to SRU/CQL binding
    409 == SRU/CQL
     419== SRU/CQL #sruCQL
    410420{{{
    411421#!div style="border: 1px solid #000000; font-size: 75%"
    412422SRU 2.0 requirement
    413423}}}
     424CLARIN-FCS uses SRU 2.0 (!Search/Retrieve via URL) as its underlying communication protocol. SRU specifies a general communication protocol for searching and retrieving records, and CQL (Contextual Query Language) specifies an extensible query language. SRU 2.0 also allows the use of additional custom query languages. CLARIN-FCS Core 2.0 uses FCS-QL for the Advanced Search capability.
     425
     426Endpoints and Clients `MUST` implement the SRU/CQL protocol suite as defined in [#REF_SRU_Overview OASIS-SRU-Overview], [#REF_SRU_APD OASIS-SRU-APD], [#REF_CQL OASIS-CQL], [#REF_Explain SRU-Explain], [#REF_Scan SRU-Scan], especially with respect to:
     427 * Data Model,
     428 * Query Model,
     429 * Processing Model,
     430 * Result Set Model, and
     431 * Diagnostics Model   
     432
     433Endpoints and Clients `MUST` implement the APD Binding for SRU 2.0, as defined in [#REF_SRU_20 OASIS-SRU-20]. \\
     434Clients `MUST` implement APD Binding for SRU 1.2, as defined in [#REF_SRU_12 OASIS-SRU-12]. \\
     435Endpoints `SHOULD` implement APD Binding for SRU 1.2, as defined in [#REF_SRU_12 OASIS-SRU-12]. \\
     436Endpoints and Clients `MAY` also implement APD binding for version 1.1.
     437
     438{{{
     439#!div style="border: 1px solid #000000; font-size: 75%"
     440TODO: think about XML namespaces
     441}}}
     442Endpoints and Clients `MUST` use the following XML namespace names (namespace URIs) for serializing responses:
     443 * `http://www.loc.gov/zing/srw/` for SRU response documents, and
     444 * `http://www.loc.gov/zing/srw/diagnostic/` for diagnostics within SRU response documents.
     445CLARIN-FCS deviates from the OASIS specification [#REF_SRU_Overview OASIS-SRU-Overview] and [#REF_SRU_12 OASIS-SRU-12] to ensure backwards compatibility with SRU 1.2 services as they were defined by the [#REF_LOC_SRU_12 LOC-SRU12].
     446
     447Endpoints and Clients `MUST` support CQL conformance ''Level 2'' (as defined in [#REF_CQL OASIS-CQL, section 6]), i.e. be able to ''parse'' (Endpoints) or ''serialize'' (Clients) all of CQL and respond with appropriate error messages to the search/retrieve protocol interface.
     448
     449'''NOTE''': This does ''not'' imply that Endpoints are ''required'' to support all of CQL, but rather that they are able to ''parse'' all of CQL and generate the appropriate error message if a query includes a feature they do not support.
     450
     451Endpoints `MUST` generate diagnostics according to [#REF_SRU_20 OASIS-SRU-20, Appendix D] for error conditions or to indicate unsupported features. Unfortunately, the OASIS specification does not provide a comprehensive list of diagnostics for CQL-related errors. Therefore, Endpoints `MUST` use diagnostics from [#REF_LOC_DIAG LOC-DIAG, section "Diagnostics Relating to CQL"] for CQL-related errors.
     452
     453Endpoints `MUST` support the HTTP GET [#REF_SRU_20 OASIS-SRU-20, Appendix B.1] and HTTP POST [#REF_SRU_20 OASIS-SRU-20, Appendix B.2] lower-level protocol bindings. Endpoints `MAY` also support the SOAP [#REF_SRU_20 OASIS-SRU-20, Appendix B.3] binding.
     454
     455
    414456== Operation ''explain''
    415457{{{
    416458#!div style="border: 1px solid #000000; font-size: 75%"
    417 Basically stays the same, but adjust for advanced stuff.
    418 }}}
     459TODO: adjust for advanced stuff!
     460}}}
     461The ''explain'' operation of the SRU protocol serves to announce server capabilities and to allow clients to configure themselves automatically. CLARIN-FCS uses this operation for the same purpose.
     462
     463The Endpoint `MUST` respond to an ''explain'' request with a proper ''explain'' response. As per [#REF_Explain SRU-Explain], the response `MUST` contain one `<sru:record>` element that contains an ''SRU Explain'' record. The `<sru:recordSchema>` element `MUST` contain the literal `http://explain.z3950.org/dtd/2.0/`, i.e. the official ''identifier'' for Explain records.
     464
     465According to the Capabilities supported by the Endpoint the Explain record `MUST` contain the following elements:
     466 ''Basic-Search'' Capability::
     467   `<zr:serverInfo>` as defined in [#REF_Explain SRU-Explain] (`REQUIRED`) \\
     468   `<zr:databaseInfo>` as defined in [#REF_Explain SRU-Explain] (`REQUIRED`) \\
     469   `<zr:schemaInfo>` as defined in [#REF_Explain SRU-Explain] (`REQUIRED`). This element `MUST` contain an element `<zr:schema>` with an `@identifier` attribute with a value of `http://clarin.eu/fcs/resource` and an `@name` attribute with a value of `fcs`. \\
     470   `<zr:configInfo>` is `OPTIONAL` \\
     471   Other capabilities may define how the `<zr:indexInfo>` element is to be used; therefore, it is `NOT RECOMMENDED` for Endpoints to use it in custom extensions.
     472
     473To support auto-configuration in CLARIN-FCS, the Endpoint `MUST` support the ''Endpoint Description''. The Endpoint Description is included in the explain response using SRU's extension mechanism, i.e. by embedding an XML fragment into the `<sru:extraResponseData>` element. The Endpoint `MUST` include the Endpoint Description ''only'' if the Client performs an explain request with the ''extra request parameter'' `x-fcs-endpoint-description` with a value of `true`. If the Client performs an explain request ''without'' supplying this extra request parameter, the Endpoint `MUST NOT` include the Endpoint Description. The format of the Endpoint Description XML fragment is defined in [#endpointDescription Endpoint Description].
     474
     475The following example shows a request and response to an ''explain'' request with added extra request parameter `x-fcs-endpoint-description`:
     476* HTTP GET request: Client → Endpoint:
     477{{{#!sh
     478http://repos.example.org/fcs-endpoint?operation=explain&version=1.2&x-fcs-endpoint-description=true
     479}}}
     480* HTTP Response: Endpoint → Client:
     481{{{#!xml
     482<?xml version='1.0' encoding='utf-8'?>
     483<sru:explainResponse xmlns:sru="http://www.loc.gov/zing/srw/">
     484  <sru:version>1.2</sru:version>
     485  <sru:record>
     486    <sru:recordSchema>http://explain.z3950.org/dtd/2.0/</sru:recordSchema>
     487    <sru:recordPacking>xml</sru:recordPacking>
     488    <sru:recordData>
     489      <zr:explain xmlns:zr="http://explain.z3950.org/dtd/2.0/">
     490        <!-- <zr:serverInfo > is REQUIRED -->
     491        <zr:serverInfo protocol="SRU" version="1.2" transport="http">
     492          <zr:host>repos.example.org</zr:host>
     493          <zr:port>80</zr:port>
     494          <zr:database>fcs-endpoint</zr:database>
     495        </zr:serverInfo>
     496        <!-- <zr:databaseInfo> is REQUIRED -->
     497        <zr:databaseInfo>
     498          <zr:title lang="de">Goethe Korpus</zr:title>
     499          <zr:title lang="en" primary="true">Goethe Corpus</zr:title>
     500          <zr:description lang="de">Der Goethe Korpus des IDS Mannheim.</zr:description>
     501          <zr:description lang="en" primary="true">The Goethe corpus of IDS Mannheim.</zr:description>
     502        </zr:databaseInfo>
     503        <!-- <zr:schemaInfo> is REQUIRED -->
     504        <zr:schemaInfo>
     505          <zr:schema identifier="http://clarin.eu/fcs/resource" name="fcs">
     506            <zr:title lang="en" primary="true">CLARIN Federated Content Search</zr:title>
     507          </zr:schema>
     508        </zr:schemaInfo>
     509        <!-- <zr:configInfo> is OPTIONAL -->
     510        <zr:configInfo>
     511          <zr:default type="numberOfRecords">250</zr:default>
     512          <zr:setting type="maximumRecords">1000</zr:setting>
     513        </zr:configInfo>
     514      </zr:explain>
     515    </sru:recordData>
     516  </sru:record>
     517  <!-- <sru:echoedExplainRequest> is OPTIONAL -->
     518  <sru:echoedExplainRequest>
     519    <sru:version>1.2</sru:version>
     520    <sru:baseUrl>http://repos.example.org/fcs-endpoint</sru:baseUrl>
     521  </sru:echoedExplainRequest>
     522  <sru:extraResponseData>
     523    <ed:EndpointDescription xmlns:ed="http://clarin.eu/fcs/endpoint-description" version="1">
     524      <ed:Capabilities>
     525          <ed:Capability>http://clarin.eu/fcs/capability/basic-search</ed:Capability>
     526      </ed:Capabilities>
     527      <ed:SupportedDataViews>
     528          <ed:SupportedDataView id="hits" delivery-policy="send-by-default">application/x-clarin-fcs-hits+xml</ed:SupportedDataView>
     529      </ed:SupportedDataViews>
     530      <ed:Resources>
     531          <!-- just one top-level resource at the Endpoint -->
     532          <ed:Resource pid="http://hdl.handle.net/4711/0815">
     533              <ed:Title xml:lang="de">Goethe Korpus</ed:Title>
     534              <ed:Title xml:lang="en">Goethe Corpus</ed:Title>
     535              <ed:Description xml:lang="de">Der Goethe Korpus des IDS Mannheim.</ed:Description>
     536              <ed:Description xml:lang="en">The Goethe corpus of IDS Mannheim.</ed:Description>
     537              <ed:LandingPageURI>http://repos.example.org/corpus1.html</ed:LandingPageURI>
     538              <ed:Languages>
     539                  <ed:Language>deu</ed:Language>
     540              </ed:Languages>
     541              <ed:AvailableDataViews ref="hits"/>
     542          </ed:Resource>
     543      </ed:Resources>
     544    </ed:EndpointDescription>
     545  </sru:extraResponseData>
     546</sru:explainResponse>
     547}}}
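
The client side of the exchange above can be sketched in Python. This is only an illustration under stated assumptions, not part of the specification: the function names `build_explain_url` and `parse_endpoint_description` are hypothetical, and the code assumes a response shaped like the example above (SRU 1.2 namespace, Endpoint Description inside `<sru:extraResponseData>`).

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the example response above.
NS = {
    "sru": "http://www.loc.gov/zing/srw/",
    "ed": "http://clarin.eu/fcs/endpoint-description",
}


def build_explain_url(base_url, with_endpoint_description=True, version="1.2"):
    """Build an explain request URL (hypothetical helper)."""
    params = {"operation": "explain", "version": version}
    if with_endpoint_description:
        # Without this parameter the Endpoint MUST NOT send the description.
        params["x-fcs-endpoint-description"] = "true"
    return base_url + "?" + urlencode(params)


def parse_endpoint_description(xml_text):
    """Return capabilities, data views and resource PIDs, or None if the
    response carries no Endpoint Description (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    desc = root.find("sru:extraResponseData/ed:EndpointDescription", NS)
    if desc is None:
        return None
    capabilities = [
        (c.text or "").strip()
        for c in desc.findall("ed:Capabilities/ed:Capability", NS)
    ]
    # Map Data View identifier (@id) to its MIME type.
    data_views = {
        v.get("id"): (v.text or "").strip()
        for v in desc.findall("ed:SupportedDataViews/ed:SupportedDataView", NS)
    }
    # iter() also visits nested resources, should an Endpoint declare any.
    pids = [r.get("pid") for r in desc.iter("{%s}Resource" % NS["ed"])]
    return {"capabilities": capabilities, "data_views": data_views, "pids": pids}
```

The persistent identifiers collected here are the values a Client could later pass in `x-fcs-context`, and the Data View identifiers are the values for `x-fcs-dataviews`.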
     548
    419549== Operation ''scan''
    420 {{{
    421 #!div style="border: 1px solid #000000; font-size: 75%"
    422 Basically stays the same, but adjust for advanced stuff (if required).
    423 }}}
     550The ''scan'' operation of the SRU protocol is currently not used in the ''Basic Search'' or ''Advanced Search'' capability of CLARIN-FCS. Future capabilities may use this operation; therefore, it is `NOT RECOMMENDED` for Endpoints to define custom extensions that use this operation.
     551
    424552== Operation ''searchRetrieve''
    425553{{{
     
    428556Define String for SRU query-language parameter ("fcs"? "clarin-fcs"?)
    429557}}}
     558The ''searchRetrieve'' operation of the SRU protocol is used for searching in the Resources that are provided by the Endpoint. The SRU protocol defines the serialization of request and response formats in [#REF_SRU_20 OASIS-SRU-20] for SRU version 2.0 and [#REF_SRU_12 OASIS-SRU-12] for SRU version 1.2. An Endpoint `MUST` respond in the correct format, i.e. when the Endpoint also supports SRU 1.2 and the request is issued in SRU version 1.2, the response must be encoded accordingly.
     559
     560In SRU, search result hits are encoded down to a record level, i.e. the `<sru:record>` element, and SRU allows records to be serialized in various formats, so-called ''record schemas''. Endpoints `MUST` support the CLARIN-FCS record schema (see section [#resultFormat Result Format]) and `MUST` use the value `http://clarin.eu/fcs/resource` for the ''responseItemType'' ("record schema identifier").
     561Endpoints `MUST` represent exactly ''one hit'' within the Resource as one SRU record, i.e. `<sru:record>` element.
     562
     563The following example shows a request and response to a ''searchRetrieve'' request with a ''term-only'' query for "cat":
     564* HTTP GET request: Client → Endpoint:
     565{{{#!sh
     566http://repos.example.org/fcs-endpoint?operation=searchRetrieve&version=1.2&query=cat
     567}}}
     568* HTTP Response: Endpoint → Client:
     569{{{#!xml
     570<?xml version='1.0' encoding='utf-8'?>
     571<sru:searchRetrieveResponse xmlns:sru="http://www.loc.gov/zing/srw/">
     572  <sru:version>1.2</sru:version>
     573  <sru:numberOfRecords>6</sru:numberOfRecords>
     574  <sru:records>
     575    <sru:record>
     576      <sru:recordSchema>http://clarin.eu/fcs/resource</sru:recordSchema>
     577      <sru:recordPacking>xml</sru:recordPacking>
     578      <sru:recordData>
     579        <fcs:Resource xmlns:fcs="http://clarin.eu/fcs/resource" pid="http://hdl.handle.net/4711/08-15">
     580          <fcs:ResourceFragment>
     581            <fcs:DataView type="application/x-clarin-fcs-hits+xml">
     582              <hits:Result xmlns:hits="http://clarin.eu/fcs/dataview/hits">
     583                The quick brown <hits:Hit>cat</hits:Hit> jumps over the lazy dog.
     584              </hits:Result>
     585            </fcs:DataView>
     586          </fcs:ResourceFragment>
     587        </fcs:Resource>
     588      </sru:recordData>
     589      <sru:recordPosition>1</sru:recordPosition>
     590    </sru:record>
     591    <!-- more <sru:record> elements omitted for brevity -->
     592  </sru:records>
     593  <!-- <sru:echoedSearchRetrieveRequest> is OPTIONAL -->
     594  <sru:echoedSearchRetrieveRequest>
     595    <sru:version>1.2</sru:version>
     596    <sru:query>cat</sru:query>
     597    <sru:xQuery xmlns="http://www.loc.gov/zing/cql/xcql/">
     598      <searchClause>
     599        <index>cql.serverChoice</index>
     600        <relation>
     601          <value>=</value>
     602        </relation>
     603        <term>cat</term>
     604      </searchClause>
     605    </sru:xQuery>
     606    <sru:startRecord>1</sru:startRecord>
     607    <sru:baseUrl>http://repos.example.org/fcs-endpoint</sru:baseUrl>
     608  </sru:echoedSearchRetrieveRequest>
     609</sru:searchRetrieveResponse>
     610}}}
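
A Client might process a response like the one above roughly as follows. This is a minimal sketch, assuming an SRU 1.2 response with the Generic Hits Data View as shown in the example; the function name `extract_hits` and the shape of the returned dictionaries are illustrative choices, not defined by CLARIN-FCS.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the example response above.
NS = {
    "sru": "http://www.loc.gov/zing/srw/",
    "fcs": "http://clarin.eu/fcs/resource",
    "hits": "http://clarin.eu/fcs/dataview/hits",
}


def extract_hits(xml_text):
    """Return one dictionary per <sru:record> (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    results = []
    for record in root.findall("sru:records/sru:record", NS):
        resource = record.find("sru:recordData/fcs:Resource", NS)
        if resource is None:
            continue  # e.g. a surrogate diagnostic record
        result = resource.find(".//hits:Result", NS)
        if result is None:
            continue  # record carries no Hits Data View
        # Flatten the mixed content and normalize whitespace for display.
        context = " ".join("".join(result.itertext()).split())
        matched = [h.text or "" for h in result.findall("hits:Hit", NS)]
        results.append(
            {"pid": resource.get("pid"), "context": context, "hits": matched}
        )
    return results
```

Because each hit is serialized as its own SRU record, the returned list has one entry per hit, and several entries may share the same `pid`.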
     611
     612In general, the Endpoint is `REQUIRED` to accept an ''unrestricted search'' and `SHOULD` perform the search operation on ''all'' Resources that are available at the Endpoint. If that is for some reason not feasible, e.g. performing an unrestricted search would allocate too many resources, the Endpoint `MAY` independently restrict the search to a scope that it can handle. If it does so, it `MUST` issue a non-fatal diagnostic `http://clarin.eu/fcs/diagnostic/2` ("Resource set too large. Query context automatically adjusted."). The details field of the diagnostic `MUST` contain the persistent identifier of the resource to which the query scope was limited. If the Endpoint limits the query scope to more than one resource, it `MUST` generate a ''separate'' non-fatal diagnostic `http://clarin.eu/fcs/diagnostic/2` for each of the resources.
     613
     614The Client can request the Endpoint to ''restrict the search'' to a sub-resource of these Resources. In this case, the Client `MUST` pass a comma-separated list of persistent identifiers in the `x-fcs-context` extra request parameter of the ''searchRetrieve'' request. The Endpoint `MUST` then restrict the search to those Resources that are identified by the persistent identifiers passed by the Client. If a Client requests too many resources for the Endpoint to handle with `x-fcs-context`, the Endpoint `MAY` issue a fatal diagnostic `http://clarin.eu/fcs/diagnostic/3` ("Resource set too large. Cannot perform Query.") and terminate processing. Alternatively, the Endpoint `MAY` also automatically adjust the scope and issue a non-fatal diagnostic `http://clarin.eu/fcs/diagnostic/2` (see above). An Endpoint `MUST NOT` issue a `http://clarin.eu/fcs/diagnostic/3` diagnostic in response to a request if a Client performed the request ''without'' the `x-fcs-context` extra request parameter.
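
The scope rules in the two paragraphs above could be implemented on the Endpoint side along these lines. This is a sketch under assumptions: the function `resolve_scope`, its parameters, the `limit` threshold and the "take the first `limit` resources" adjustment strategy are all hypothetical, and validation of the persistent identifiers is left out.

```python
DIAG_SCOPE_ADJUSTED = "http://clarin.eu/fcs/diagnostic/2"  # non-fatal
DIAG_SCOPE_TOO_LARGE = "http://clarin.eu/fcs/diagnostic/3"  # fatal


def resolve_scope(available, requested=None, limit=10, adjust=True):
    """Decide which resources to search (hypothetical helper).

    available: PIDs of all Resources at the Endpoint.
    requested: PIDs from x-fcs-context, or None if the parameter was absent.
    limit:     assumed maximum number of resources the Endpoint can handle.
    adjust:    if False, an oversized x-fcs-context is rejected as fatal.
    Returns (scope, diagnostics, fatal); diagnostics are (uri, details) pairs.
    """
    candidates = list(available) if requested is None else list(requested)
    if len(candidates) <= limit:
        return candidates, [], False
    # Diagnostic 3 is only allowed when the Client supplied x-fcs-context.
    if requested is not None and not adjust:
        # The text above does not say what the details field holds here.
        return [], [(DIAG_SCOPE_TOO_LARGE, None)], True
    # Otherwise shrink the scope; one diagnostic 2 per resource kept,
    # with the resource's PID in the details field.
    scope = candidates[:limit]
    return scope, [(DIAG_SCOPE_ADJUSTED, pid) for pid in scope], False
```

Note that an unrestricted search never yields the fatal diagnostic in this sketch, regardless of the `adjust` flag, which mirrors the `MUST NOT` at the end of the paragraph above.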
     615
     616The Client can extract all valid persistent identifiers from the `@pid` attribute of the `<ed:Resource>` element, obtained by the ''explain'' request (see section [#explain Operation ''explain''] and section [#endpointDescription Endpoint Description]). The list of persistent identifiers can get extensive, but a Client can use the HTTP POST method instead of HTTP GET method for submitting the request.
     617
     618For example, to restrict the search to the Resource with the persistent identifier `http://hdl.handle.net/4711/0815` the Client must issue the following request:
     619{{{#!sh
     620http://repos.example.org/fcs-endpoint?operation=searchRetrieve&version=1.2&query=cat&x-fcs-context=http://hdl.handle.net/4711/0815
     621}}}
     622To restrict the search to the Resources with the persistent identifiers `http://hdl.handle.net/4711/0815` and `http://hdl.handle.net/4711/0816-2` the Client must issue the following request:
     623{{{#!sh
     624http://repos.example.org/fcs-endpoint?operation=searchRetrieve&version=1.2&query=cat&x-fcs-context=http://hdl.handle.net/4711/0815,http://hdl.handle.net/4711/0816-2
     625}}}
     626If an invalid persistent identifier is passed by the Client, the Endpoint `MUST` issue a `http://clarin.eu/fcs/diagnostic/1` diagnostic, i.e. add the appropriate XML fragment to the `<sru:diagnostics>` element of the response. The Endpoint `MAY` treat this condition as fatal, i.e. just issue the diagnostic and perform no search, or it `MAY` treat it as non-fatal and perform the search.
     627
     628If a Client wants to request one or more Data Views that are handled by the Endpoint with the ''need-to-request'' delivery policy, it `MUST` pass a comma-separated list of ''Data View identifiers'' in the `x-fcs-dataviews` extra request parameter of the ''searchRetrieve'' request. A Client can extract valid values for the ''Data View identifiers'' from the `@id` attribute of the `<ed:SupportedDataView>` elements in the Endpoint Description of the Endpoint (see section [#explain Operation ''explain''] and section [#endpointDescription Endpoint Description]).
     629
     630For example, to request the CMDI Data View from an Endpoint that has an Endpoint Description, as described in [#REF_Example_5 Example 5], a Client would need to use the ''Data View identifier'' `cmdi` and submit the following request:
     631{{{#!sh
     632http://repos.example.org/fcs-endpoint?operation=searchRetrieve&version=1.2&query=cat&x-fcs-dataviews=cmdi
     633}}}
     634If an invalid ''Data View identifier'' is passed by the Client, the Endpoint `MUST` issue a `http://clarin.eu/fcs/diagnostic/4` diagnostic, i.e. add the appropriate XML fragment to the `<sru:diagnostics>` element of the response. The Endpoint `MAY` treat this condition as fatal, i.e. simply issue the diagnostic and perform no search, or it `MAY` treat it as non-fatal and perform the search.
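
The request URLs shown in this section can also be assembled programmatically. The following is a small sketch, assuming a hypothetical helper named `build_search_url`; it joins the persistent identifiers and Data View identifiers with commas as described above. Unlike the hand-written examples, `urlencode` percent-encodes reserved characters such as `:`, `/` and `,` in parameter values, which a conforming HTTP server decodes back to the same strings.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query, context_pids=(), dataviews=(), version="1.2"):
    """Build a searchRetrieve request URL (hypothetical helper)."""
    params = {"operation": "searchRetrieve", "version": version, "query": query}
    if context_pids:
        # Comma-separated list of persistent identifiers.
        params["x-fcs-context"] = ",".join(context_pids)
    if dataviews:
        # Comma-separated list of Data View identifiers.
        params["x-fcs-dataviews"] = ",".join(dataviews)
    return base_url + "?" + urlencode(params)


url = build_search_url(
    "http://repos.example.org/fcs-endpoint",
    "cat",
    context_pids=["http://hdl.handle.net/4711/0815"],
    dataviews=["cmdi"],
)
print(url)
```

For long identifier lists, the same parameters could instead be sent in the body of an HTTP POST request, as noted above.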
     635
     636
    430637= Normative Appendix
    431638{{{