| 424 | CLARIN-FCS uses SRU 2.0 (!Search/Retrieve via URL) as its underlying communication protocol. SRU specifies a general communication protocol for searching and retrieving records, and CQL (Contextual Query Language) specifies an extensible query language. SRU 2.0 also allows additional custom query languages to be used; CLARIN-FCS Core 2.0 uses FCS-QL for the Advanced Search capability. |
| 425 | |
| 426 | Endpoints and Clients `MUST` implement the SRU/CQL protocol suite as defined in [#REF_SRU_Overview OASIS-SRU-Overview], [#REF_SRU_APD OASIS-SRU-APD], [#REF_CQL OASIS-CQL], [#REF_Explain SRU-Explain], [#REF_Scan SRU-Scan], especially with respect to: |
| 427 | * Data Model, |
| 428 | * Query Model, |
| 429 | * Processing Model, |
| 430 | * Result Set Model, and |
| 431 | * Diagnostics Model |
| 432 | |
| 433 | Endpoints and Clients `MUST` implement the APD Binding for SRU 2.0, as defined in [#REF_SRU_20 OASIS-SRU-20]. \\ |
| 434 | Clients `MUST` implement the APD Binding for SRU 1.2, as defined in [#REF_SRU_12 OASIS-SRU-12]. \\ |
| 435 | Endpoints `SHOULD` implement the APD Binding for SRU 1.2, as defined in [#REF_SRU_12 OASIS-SRU-12]. \\ |
| 436 | Endpoints and Clients `MAY` also implement the APD Binding for SRU 1.1. |
| 437 | |
| 438 | {{{ |
| 439 | #!div style="border: 1px solid #000000; font-size: 75%" |
| 440 | TODO: think about XML namespaces |
| 441 | }}} |
| 442 | Endpoints and Clients `MUST` use the following XML namespace names (namespace URIs) for serializing responses: |
| 443 | * `http://www.loc.gov/zing/srw/` for SRU response documents, and |
| 444 | * `http://www.loc.gov/zing/srw/diagnostic/` for diagnostics within SRU response documents. |
| 445 | CLARIN-FCS deviates from the OASIS specifications [#REF_SRU_Overview OASIS-SRU-Overview] and [#REF_SRU_12 OASIS-SRU-12] to ensure backwards compatibility with SRU 1.2 services as they were defined by [#REF_LOC_SRU_12 LOC-SRU12]. |
| 446 | |
| 447 | Endpoints and Clients `MUST` support CQL conformance ''Level 2'' (as defined in [#REF_OASIS_CQL OASIS-CQL, section 6]), i.e. be able to ''parse'' (Endpoints) or ''serialize'' (Clients) all of CQL and respond with appropriate error messages to the search/retrieve protocol interface. |
| 448 | |
| 449 | '''NOTE''': this does ''not imply'' that Endpoints are ''required'' to support all of CQL, but rather that they are able to ''parse'' all of CQL and generate the appropriate error message if a query includes a feature they do not support. |
| 450 | |
| 451 | Endpoints `MUST` generate diagnostics according to [#REF_SRU_20 OASIS-SRU-20, Appendix D] for error conditions or to indicate unsupported features. Unfortunately, the OASIS specification does not provide a comprehensive list of diagnostics for CQL-related errors. Therefore, Endpoints `MUST` use diagnostics from [#REF_LOC_DIAG LOC-DIAG, section "Diagnostics Relating to CQL"] for CQL-related errors. |
| 452 | |
| 453 | Endpoints `MUST` support the HTTP GET [#REF_SRU_20 OASIS-SRU-20, Appendix B.1] and HTTP POST [#REF_SRU_20 OASIS-SRU-20, Appendix B.2] lower-level protocol bindings. Endpoints `MAY` also support the SOAP [#REF_SRU_20 OASIS-SRU-20, Appendix B.3] binding. |
| 454 | |
| 455 | |
| 459 | TODO: adjust for advanced stuff! |
| 460 | }}} |
| 461 | The ''explain'' operation of the SRU protocol serves to announce server capabilities and to allow clients to configure themselves automatically. CLARIN-FCS uses this operation for the same purpose. |
| 462 | |
| 463 | The Endpoint `MUST` respond to an ''explain'' request with a proper ''explain'' response. As per [#REF_Explain SRU-Explain], the response `MUST` contain one `<sru:record>` element that contains an ''SRU Explain'' record. The `<sru:recordSchema>` element `MUST` contain the literal `http://explain.z3950.org/dtd/2.0/`, i.e. the official ''identifier'' for Explain records. |
| 464 | |
| 465 | Depending on the Capabilities supported by the Endpoint, the Explain record `MUST` contain the following elements: |
| 466 | ''Basic-Search'' Capability:: |
| 467 | `<zr:serverInfo>` as defined in [#REF_Explain SRU-Explain] (`REQUIRED`) \\ |
| 468 | `<zr:databaseInfo>` as defined in [#REF_Explain SRU-Explain] (`REQUIRED`) \\ |
| 469 | `<zr:schemaInfo>` as defined in [#REF_Explain SRU-Explain] (`REQUIRED`). This element `MUST` contain an element `<zr:schema>` with an `@identifier` attribute with a value of `http://clarin.eu/fcs/resource` and an `@name` attribute with a value of `fcs`. \\ |
| 470 | `<zr:configInfo>` is `OPTIONAL` \\ |
| 471 | Other capabilities may define how the `<zr:indexInfo>` element is to be used; it is therefore `NOT RECOMMENDED` for Endpoints to use it in custom extensions. |
| 472 | |
| 473 | To support auto-configuration in CLARIN-FCS, the Endpoint `MUST` support the ''Endpoint Description''. The Endpoint Description is included in the explain response using SRU's extension mechanism, i.e. by embedding an XML fragment into the `<sru:extraResponseData>` element. The Endpoint `MUST` include the Endpoint Description ''only'' if the Client performs an explain request with the ''extra request parameter'' `x-fcs-endpoint-description` with a value of `true`. If the Client performs an explain request ''without'' supplying this extra request parameter, the Endpoint `MUST NOT` include the Endpoint Description. The format of the Endpoint Description XML fragment is defined in [#endpointDescription Endpoint Description]. |
| 474 | |
| 475 | The following example shows an ''explain'' request that includes the extra request parameter `x-fcs-endpoint-description`, together with the corresponding response: |
| 476 | * HTTP GET request: Client → Endpoint: |
| 477 | {{{#!sh |
| 478 | http://repos.example.org/fcs-endpoint?operation=explain&version=1.2&x-fcs-endpoint-description=true |
| 479 | }}} |
| 480 | * HTTP Response: Endpoint → Client: |
| 481 | {{{#!xml |
| 482 | <?xml version='1.0' encoding='utf-8'?> |
| 483 | <sru:explainResponse xmlns:sru="http://www.loc.gov/zing/srw/"> |
| 484 | <sru:version>1.2</sru:version> |
| 485 | <sru:record> |
| 486 | <sru:recordSchema>http://explain.z3950.org/dtd/2.0/</sru:recordSchema> |
| 487 | <sru:recordPacking>xml</sru:recordPacking> |
| 488 | <sru:recordData> |
| 489 | <zr:explain xmlns:zr="http://explain.z3950.org/dtd/2.0/"> |
| 490 | <!-- <zr:serverInfo > is REQUIRED --> |
| 491 | <zr:serverInfo protocol="SRU" version="1.2" transport="http"> |
| 492 | <zr:host>repos.example.org</zr:host> |
| 493 | <zr:port>80</zr:port> |
| 494 | <zr:database>fcs-endpoint</zr:database> |
| 495 | </zr:serverInfo> |
| 496 | <!-- <zr:databaseInfo> is REQUIRED --> |
| 497 | <zr:databaseInfo> |
| 498 | <zr:title lang="de">Goethe Korpus</zr:title> |
| 499 | <zr:title lang="en" primary="true">Goethe Corpus</zr:title> |
| 500 | <zr:description lang="de">Der Goethe Korpus des IDS Mannheim.</zr:description> |
| 501 | <zr:description lang="en" primary="true">The Goethe corpus of IDS Mannheim.</zr:description> |
| 502 | </zr:databaseInfo> |
| 503 | <!-- <zr:schemaInfo> is REQUIRED --> |
| 504 | <zr:schemaInfo> |
| 505 | <zr:schema identifier="http://clarin.eu/fcs/resource" name="fcs"> |
| 506 | <zr:title lang="en" primary="true">CLARIN Federated Content Search</zr:title> |
| 507 | </zr:schema> |
| 508 | </zr:schemaInfo> |
| 509 | <!-- <zr:configInfo> is OPTIONAL --> |
| 510 | <zr:configInfo> |
| 511 | <zr:default type="numberOfRecords">250</zr:default> |
| 512 | <zr:setting type="maximumRecords">1000</zr:setting> |
| 513 | </zr:configInfo> |
| 514 | </zr:explain> |
| 515 | </sru:recordData> |
| 516 | </sru:record> |
| 517 | <!-- <sru:echoedExplainRequest> is OPTIONAL --> |
| 518 | <sru:echoedExplainRequest> |
| 519 | <sru:version>1.2</sru:version> |
| 520 | <sru:baseUrl>http://repos.example.org/fcs-endpoint</sru:baseUrl> |
| 521 | </sru:echoedExplainRequest> |
| 522 | <sru:extraResponseData> |
| 523 | <ed:EndpointDescription xmlns:ed="http://clarin.eu/fcs/endpoint-description" version="1"> |
| 524 | <ed:Capabilities> |
| 525 | <ed:Capability>http://clarin.eu/fcs/capability/basic-search</ed:Capability> |
| 526 | </ed:Capabilities> |
| 527 | <ed:SupportedDataViews> |
| 528 | <ed:SupportedDataView id="hits" delivery-policy="send-by-default">application/x-clarin-fcs-hits+xml</ed:SupportedDataView> |
| 529 | </ed:SupportedDataViews> |
| 530 | <ed:Resources> |
| 531 | <!-- just one top-level resource at the Endpoint --> |
| 532 | <ed:Resource pid="http://hdl.handle.net/4711/0815"> |
| 533 | <ed:Title xml:lang="de">Goethe Korpus</ed:Title> |
| 534 | <ed:Title xml:lang="en">Goethe Corpus</ed:Title> |
| 535 | <ed:Description xml:lang="de">Der Goethe Korpus des IDS Mannheim.</ed:Description> |
| 536 | <ed:Description xml:lang="en">The Goethe corpus of IDS Mannheim.</ed:Description> |
| 537 | <ed:LandingPageURI>http://repos.example.org/corpus1.html</ed:LandingPageURI> |
| 538 | <ed:Languages> |
| 539 | <ed:Language>deu</ed:Language> |
| 540 | </ed:Languages> |
| 541 | <ed:AvailableDataViews ref="hits"/> |
| 542 | </ed:Resource> |
| 543 | </ed:Resources> |
| 544 | </ed:EndpointDescription> |
| 545 | </sru:extraResponseData> |
| 546 | </sru:explainResponse> |
| 547 | }}} |
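
As a non-normative illustration, the following Python sketch builds the explain request shown above and reads the Endpoint Description out of the response. The helper names `build_explain_url` and `parse_endpoint_description` are hypothetical and not part of CLARIN-FCS. The sketch assumes the element structure of the example and only looks at top-level `<ed:Resource>` elements.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace URIs taken from the example response above.
NS = {
    "sru": "http://www.loc.gov/zing/srw/",
    "ed": "http://clarin.eu/fcs/endpoint-description",
}


def build_explain_url(base_url, version="1.2"):
    """Return an explain URL that asks for the Endpoint Description."""
    params = {
        "operation": "explain",
        "version": version,
        "x-fcs-endpoint-description": "true",
    }
    return base_url + "?" + urlencode(params)


def parse_endpoint_description(xml_text):
    """Return capabilities, Data Views and top-level resource PIDs,
    or None if the response carries no Endpoint Description."""
    root = ET.fromstring(xml_text)
    ed = root.find("sru:extraResponseData/ed:EndpointDescription", NS)
    if ed is None:
        return None
    return {
        "capabilities": [
            c.text for c in ed.findall("ed:Capabilities/ed:Capability", NS)
        ],
        # Maps the Data View identifier (@id) to its MIME type.
        "dataviews": {
            dv.get("id"): dv.text
            for dv in ed.findall("ed:SupportedDataViews/ed:SupportedDataView", NS)
        },
        "pids": [r.get("pid") for r in ed.findall("ed:Resources/ed:Resource", NS)],
    }
```

A Client could feed the `pids` and `dataviews` values into later ''searchRetrieve'' requests. A `None` result indicates that the Endpoint did not include an Endpoint Description.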
| 548 | |
| 558 | The ''searchRetrieve'' operation of the SRU protocol is used for searching in the Resources that are provided by the Endpoint. The SRU protocol defines the serialization of request and response formats in [#REF_SRU_20 OASIS-SRU-20] for SRU version 2.0 and [#REF_SRU_12 OASIS-SRU-12] for SRU version 1.2. An Endpoint `MUST` respond in the correct format, i.e. when the Endpoint also supports SRU 1.2 and the request is issued in SRU version 1.2, the response must be encoded accordingly. |
| 559 | |
| 560 | In SRU, search result hits are encoded down to the record level, i.e. the `<sru:record>` element, and SRU allows records to be serialized in various formats, so-called ''record schemas''. Endpoints `MUST` support the CLARIN-FCS record schema (see section [#resultFormat Result Format]) and `MUST` use the value `http://clarin.eu/fcs/resource` for the ''responseItemType'' ("record schema identifier"). |
| 561 | Endpoints `MUST` represent exactly ''one hit'' within the Resource as one SRU record, i.e. `<sru:record>` element. |
| 562 | |
| 563 | The following example shows a request and response to a ''searchRetrieve'' request with a ''term-only'' query for "cat": |
| 564 | * HTTP GET request: Client → Endpoint: |
| 565 | {{{#!sh |
| 566 | http://repos.example.org/fcs-endpoint?operation=searchRetrieve&version=1.2&query=cat |
| 567 | }}} |
| 568 | * HTTP Response: Endpoint → Client: |
| 569 | {{{#!xml |
| 570 | <?xml version='1.0' encoding='utf-8'?> |
| 571 | <sru:searchRetrieveResponse xmlns:sru="http://www.loc.gov/zing/srw/"> |
| 572 | <sru:version>1.2</sru:version> |
| 573 | <sru:numberOfRecords>6</sru:numberOfRecords> |
| 574 | <sru:records> |
| 575 | <sru:record> |
| 576 | <sru:recordSchema>http://clarin.eu/fcs/resource</sru:recordSchema> |
| 577 | <sru:recordPacking>xml</sru:recordPacking> |
| 578 | <sru:recordData> |
| 579 | <fcs:Resource xmlns:fcs="http://clarin.eu/fcs/resource" pid="http://hdl.handle.net/4711/08-15"> |
| 580 | <fcs:ResourceFragment> |
| 581 | <fcs:DataView type="application/x-clarin-fcs-hits+xml"> |
| 582 | <hits:Result xmlns:hits="http://clarin.eu/fcs/dataview/hits"> |
| 583 | The quick brown <hits:Hit>cat</hits:Hit> jumps over the lazy dog. |
| 584 | </hits:Result> |
| 585 | </fcs:DataView> |
| 586 | </fcs:ResourceFragment> |
| 587 | </fcs:Resource> |
| 588 | </sru:recordData> |
| 589 | <sru:recordPosition>1</sru:recordPosition> |
| 590 | </sru:record> |
| 591 | <!-- more <sru:record> elements omitted for brevity --> |
| 592 | </sru:records> |
| 593 | <!-- <sru:echoedSearchRetrieveRequest> is OPTIONAL --> |
| 594 | <sru:echoedSearchRetrieveRequest> |
| 595 | <sru:version>1.2</sru:version> |
| 596 | <sru:query>cat</sru:query> |
| 597 | <sru:xQuery xmlns="http://www.loc.gov/zing/cql/xcql/"> |
| 598 | <searchClause> |
| 599 | <index>cql.serverChoice</index> |
| 600 | <relation> |
| 601 | <value>=</value> |
| 602 | </relation> |
| 603 | <term>cat</term> |
| 604 | </searchClause> |
| 605 | </sru:xQuery> |
| 606 | <sru:startRecord>1</sru:startRecord> |
| 607 | <sru:baseUrl>http://repos.example.org/fcs-endpoint</sru:baseUrl> |
| 608 | </sru:echoedSearchRetrieveRequest> |
| 609 | </sru:searchRetrieveResponse> |
| 610 | }}} |
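
To illustrate the rule that one SRU record represents exactly one hit, the following non-normative Python sketch walks the `<sru:record>` elements of a response like the one above and collects the Generic Hits Data View of each record. The function name `extract_hits` is hypothetical. The namespace URIs and the MIME type are copied from the example, and whitespace normalization is a choice made for this sketch only.

```python
import xml.etree.ElementTree as ET

# Namespace URIs and MIME type as they appear in the example response.
NS = {
    "sru": "http://www.loc.gov/zing/srw/",
    "fcs": "http://clarin.eu/fcs/resource",
    "hits": "http://clarin.eu/fcs/dataview/hits",
}
HITS_MIME = "application/x-clarin-fcs-hits+xml"


def extract_hits(xml_text):
    """Return a list of (pid, text, hits) tuples, one per hits Data View."""
    root = ET.fromstring(xml_text)
    results = []
    for record in root.findall("sru:records/sru:record", NS):
        resource = record.find("sru:recordData/fcs:Resource", NS)
        if resource is None:
            continue  # not a CLARIN-FCS record
        # Data Views may sit below a ResourceFragment, so search recursively.
        for view in resource.iter("{%s}DataView" % NS["fcs"]):
            if view.get("type") != HITS_MIME:
                continue
            result = view.find("hits:Result", NS)
            if result is None:
                continue
            # Join all text nodes and collapse runs of whitespace.
            text = " ".join("".join(result.itertext()).split())
            hits = [h.text for h in result.findall("hits:Hit", NS)]
            results.append((resource.get("pid"), text, hits))
    return results
```

For the record shown in the example, this would yield the persistent identifier of the Resource, the full sentence, and the single highlighted hit `cat`.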
| 611 | |
| 612 | In general, the Endpoint is `REQUIRED` to accept an ''unrestricted search'' and `SHOULD` perform the search operation on ''all'' Resources that are available at the Endpoint. If that is for some reason not feasible, e.g. performing an unrestricted search would allocate too many resources, the Endpoint `MAY` independently restrict the search to a scope that it can handle. If it does so, it `MUST` issue a non-fatal diagnostic `http://clarin.eu/fcs/diagnostic/2` ("Resource set too large. Query context automatically adjusted."). The details field of the diagnostic `MUST` contain the persistent identifier of the resource to which the query scope was limited. If the Endpoint limits the query scope to more than one resource, it `MUST` generate a ''separate'' non-fatal diagnostic `http://clarin.eu/fcs/diagnostic/2` for each of the resources. |
| 613 | |
| 614 | The Client can request the Endpoint to ''restrict the search'' to a sub-resource of these Resources. In this case, the Client `MUST` pass a comma-separated list of persistent identifiers in the `x-fcs-context` extra request parameter of the ''searchRetrieve'' request. The Endpoint `MUST` then restrict the search to those Resources that are identified by the persistent identifiers passed by the Client. If a Client requests too many resources for the Endpoint to handle with `x-fcs-context`, the Endpoint `MAY` issue a fatal diagnostic `http://clarin.eu/fcs/diagnostic/3` ("Resource set too large. Cannot perform Query.") and terminate processing. Alternatively, the Endpoint `MAY` also automatically adjust the scope and issue a non-fatal diagnostic `http://clarin.eu/fcs/diagnostic/2` (see above). An Endpoint `MUST NOT` issue a `http://clarin.eu/fcs/diagnostic/3` diagnostic in response to a request if the Client performed the request ''without'' the `x-fcs-context` extra request parameter. |
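
The scope rules above can be sketched from the Endpoint's point of view as follows. This is a non-normative Python illustration: the function `resolve_scope`, the `limit` parameter and the strategy of keeping the first `limit` Resources are assumptions made for this sketch. The specification does not prescribe how an Endpoint chooses a reduced scope, nor what the details field of diagnostic 3 contains (left empty here). Persistent identifiers are assumed to have been validated already.

```python
DIAG_ADJUSTED = "http://clarin.eu/fcs/diagnostic/2"   # non-fatal
DIAG_TOO_LARGE = "http://clarin.eu/fcs/diagnostic/3"  # fatal


def resolve_scope(available, requested=None, limit=10, adjust=True):
    """Return (scope, diagnostics, fatal) for a searchRetrieve request.

    available -- PIDs of all Resources at the Endpoint
    requested -- PIDs from x-fcs-context, or None if the parameter is absent
    limit     -- hypothetical maximum number of Resources searched at once
    adjust    -- shrink an oversized x-fcs-context scope instead of failing
    Each diagnostic is a (uri, details) tuple.
    """
    scope = list(available) if requested is None else list(requested)
    if len(scope) <= limit:
        return scope, [], False
    if requested is not None and not adjust:
        # Diagnostic 3 is only permitted when x-fcs-context was supplied.
        return [], [(DIAG_TOO_LARGE, "")], True
    # Automatically adjusted scope: one diagnostic 2 per remaining Resource,
    # with the Resource's persistent identifier in the details field.
    scope = scope[:limit]
    return scope, [(DIAG_ADJUSTED, pid) for pid in scope], False
```

Note that an unrestricted search (`requested=None`) never results in the fatal diagnostic, regardless of the `adjust` setting.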
| 615 | |
| 616 | The Client can extract all valid persistent identifiers from the `@pid` attribute of the `<ed:Resource>` element, obtained by the ''explain'' request (see section [#explain Operation ''explain''] and section [#endpointDescription Endpoint Description]). The list of persistent identifiers can get extensive, but a Client can use the HTTP POST method instead of HTTP GET method for submitting the request. |
| 617 | |
| 618 | For example, to restrict the search to the Resource with the persistent identifier `http://hdl.handle.net/4711/0815` the Client must issue the following request: |
| 619 | {{{#!sh |
| 620 | http://repos.example.org/fcs-endpoint?operation=searchRetrieve&version=1.2&query=cat&x-fcs-context=http://hdl.handle.net/4711/0815 |
| 621 | }}} |
| 622 | To restrict the search to the Resources with the persistent identifiers `http://hdl.handle.net/4711/0815` and `http://hdl.handle.net/4711/0816-2`, the Client must issue the following request: |
| 623 | {{{#!sh |
| 624 | http://repos.example.org/fcs-endpoint?operation=searchRetrieve&version=1.2&query=cat&x-fcs-context=http://hdl.handle.net/4711/0815,http://hdl.handle.net/4711/0816-2 |
| 625 | }}} |
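
The request URLs above are shown unescaped for readability. A Client that assembles such URLs programmatically would typically percent-encode the parameter values. The following Python sketch shows one way to do so; it is non-normative, and the function name `build_search_url` and its parameters are hypothetical.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query, pids=(), dataviews=(), version="1.2"):
    """Return a searchRetrieve URL with optional CLARIN-FCS extra parameters.

    pids      -- persistent identifiers for x-fcs-context
    dataviews -- Data View identifiers for x-fcs-dataviews
    """
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "query": query,
    }
    if pids:
        params["x-fcs-context"] = ",".join(pids)
    if dataviews:
        params["x-fcs-dataviews"] = ",".join(dataviews)
    # urlencode percent-encodes ':', '/' and ',' inside parameter values.
    return base_url + "?" + urlencode(params)
```

For long lists of persistent identifiers, the same `urlencode(params)` string could instead be sent as the body of an HTTP POST request.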
| 626 | If an invalid persistent identifier is passed by the Client, the Endpoint `MUST` issue a `http://clarin.eu/fcs/diagnostic/1` diagnostic, i.e. add the appropriate XML fragment to the `<sru:diagnostics>` element of the response. The Endpoint `MAY` treat this condition as fatal, i.e. just issue the diagnostic and perform no search, or it `MAY` treat it as non-fatal and perform the search. |
| 627 | |
| 628 | If a Client wants to request one or more Data Views that are handled by the Endpoint with the ''need-to-request'' delivery policy, it `MUST` pass a comma-separated list of ''Data View identifiers'' in the `x-fcs-dataviews` extra request parameter of the ''searchRetrieve'' request. A Client can extract valid values for the ''Data View identifiers'' from the `@id` attribute of the `<ed:SupportedDataView>` elements in the Endpoint Description of the Endpoint (see section [#explain Operation ''explain''] and section [#endpointDescription Endpoint Description]). |
| 629 | |
| 630 | For example, to request the CMDI Data View from an Endpoint that has an Endpoint Description, as described in [#REF_Example_5 Example 5], a Client would need to use the ''Data View identifier'' `cmdi` and submit the following request: |
| 631 | {{{#!sh |
| 632 | http://repos.example.org/fcs-endpoint?operation=searchRetrieve&version=1.2&query=cat&x-fcs-dataviews=cmdi |
| 633 | }}} |
| 634 | If an invalid ''Data View identifier'' is passed by the Client, the Endpoint `MUST` issue a `http://clarin.eu/fcs/diagnostic/4` diagnostic, i.e. add the appropriate XML fragment to the `<sru:diagnostics>` element of the response. The Endpoint `MAY` treat this condition as fatal, i.e. simply issue the diagnostic and perform no search, or it `MAY` treat it as non-fatal and perform the search. |
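
On the Client side, diagnostics such as the ones described in this section can be read from the `<sru:diagnostics>` element of a response. The following non-normative Python sketch assumes the usual SRU diagnostic layout, i.e. `<diag:diagnostic>` elements with `<diag:uri>`, `<diag:details>` and `<diag:message>` children in the diagnostics namespace given earlier. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "sru": "http://www.loc.gov/zing/srw/",
    "diag": "http://www.loc.gov/zing/srw/diagnostic/",
}
FCS_DIAGNOSTIC_PREFIX = "http://clarin.eu/fcs/diagnostic/"


def _text(element, path):
    """Return the stripped text of a child element, or '' if it is absent."""
    return element.findtext(path, default="", namespaces=NS).strip()


def read_diagnostics(xml_text):
    """Return all diagnostics of an SRU response as a list of dicts."""
    root = ET.fromstring(xml_text)
    return [
        {
            "uri": _text(d, "diag:uri"),
            "details": _text(d, "diag:details"),
            "message": _text(d, "diag:message"),
        }
        for d in root.findall("sru:diagnostics/diag:diagnostic", NS)
    ]


def fcs_diagnostics(diagnostics):
    """Keep only the CLARIN-FCS specific diagnostics."""
    return [d for d in diagnostics if d["uri"].startswith(FCS_DIAGNOSTIC_PREFIX)]
```

Because an Endpoint may treat diagnostics 1 and 4 as either fatal or non-fatal, a Client would still need to check whether the response also contains records.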
| 635 | |
| 636 | |