Changes between Version 88 and Version 89 of FCS-Specification-ScrapBook
- Timestamp:
- 03/19/14 11:13:17 (11 years ago)
FCS-Specification-ScrapBook
v88 v89 275 275 276 276 === Result Format ===#resultFormat 277 The Search Engine will produce a result set containing several hits as the outcome of processing a query. The Endpoint `MUST` serialize these hits in the CLARIN-FCS result format. Endpoints are `REQUIRED` to adhere to the principle, that ''one'' hit `MUST` be serialized as ''one'' CLARIN-FCS result record and `MUST NOT` combine several hits in one CLARIN-FCS result record. E.g., if a query matches five different sentences within one text (= the resource), the Endpoint must serialize five Resource (= one per hit) and embed each within one SRU result (see [#searchRetrieve below]).277 The Search Engine will produce a result set containing several hits as the outcome of processing a query. The Endpoint `MUST` serialize these hits in the CLARIN-FCS result format. Endpoints are `REQUIRED` to adhere to the principle, that ''one'' hit `MUST` be serialized as ''one'' CLARIN-FCS result record and `MUST NOT` combine several hits in one CLARIN-FCS result record. E.g., if a query matches five different sentences within one text (= the resource), the Endpoint must serialize five Resource (= one per hit) and embed each within one SRU result (see section [#searchRetrieve Operation ''searchRetrieve'']). 278 278 279 279 CLARIN-FCS uses a customized format for returning results. ''Resource'' and ''Resource Fragments'' serve as containers for hit results, that are presented in one or more ''Data View''. The following section describes the Resource format and Data View format and section [#searchRetrieve Operation ''searchRetrieve''] will describe, how hits are embedded within SRU responses. … … 381 381 382 382 === Endpoint Description ===#endpointDescription 383 Endpoints need to provide information about their capabilities to support auto-configuration of Clients, Th is capabilities include, among other information, the Profile that is supported by the Endpoint. 
The ''Endpoint Description'' mechanism provides the necessary facility to provide this information to the Clients. Endpoints `MUST` encode their capabilities using an XML format and embed this information into the SRU/CQL protocol as described in section [#explain Operation ''explain'']. The XML fragment generated by the Endpoint for the Endpoint Description `MUST` be valid according to the XML schema "[source:FederatedSearch/schema/Endpoint-Description.xsd Endpoint-Description.xsd]" ([source:FederatedSearch/schema/Endpoint-Description.xsd?format=txt download]). 383 Endpoints need to provide information about their capabilities to support auto-configuration of Clients. The ''Endpoint Description'' mechanism provides the necessary facility to provide this information to the Clients. Endpoints `MUST` encode their capabilities using an XML format and embed this information into the SRU/CQL protocol as described in section [#explain Operation ''explain'']. The XML fragment generated by the Endpoint for the Endpoint Description `MUST` be valid according to the XML schema "[source:FederatedSearch/schema/Endpoint-Description.xsd Endpoint-Description.xsd]" ([source:FederatedSearch/schema/Endpoint-Description.xsd?format=txt download]). 384 384 385 385 The XML fragment for ''Endpoint Description'' is encoded as an `<ed:EndpointDescription>` element, that contains the following attributes and children: 386 386 * one `@version` attribute (`REQUIRED`) on the `<ed:EndpointDescription>` element. The value of the `@version` attribute `MUST` be `1`. 387 387 * one `<ed:Capabilities>` element (`REQUIRED`) that contains one or more `<ed:Capability>` elements \\ 388 The content of the `<ed:Capability>` element indicates the capabilities, that are supported by the Endpoint. For valid values, see [#capabilities section Capabilities and Profiles] 388 The content of the `<ed:Capability>` element indicates the capabilities, that are supported by the Endpoint. 
For valid values, see section [#capabilities Capabilities and Profiles]. This list `MUST NOT` include duplicate values. 389 389 * one `<ed:SupportedDataViews>` (`REQUIRED`) \\ 390 A list of Data Views, that are supported by this Endpoint. This list is composed of one or more `<ed:SupportedDataView>` elements. The content of a `<ed:SupportedDataView>` `MUST` be the MIME type of a supported Data View, e.g. `application/x-clarin-fcs-hits+xml`. Each `<ed:SupportedDataView>` element `MUST`carry a `@id` and a `@delivery-policy` attribute. The value of the `@id` attribute is later used in the `<ed:Resource>` element to indicate, which Data View is supported by a resource (see below). The `@delivery-policy` attribute indicates the Endpoint's delivery policy for that Data View. Valid values are `send-default` for the ''send-default'' and `need-to-request` for the ''need-to-request'' delivery policy. 390 A list of Data Views, that are supported by this Endpoint. This list is composed of one or more `<ed:SupportedDataView>` elements. The content of a `<ed:SupportedDataView>` `MUST` be the MIME type of a supported Data View, e.g. `application/x-clarin-fcs-hits+xml`. Each `<ed:SupportedDataView>` element `MUST` carry a `@id` and a `@delivery-policy` attribute. The value of the `@id` attribute is later used in the `<ed:Resource>` element to indicate, which Data View is supported by a resource (see below). The `@delivery-policy` attribute indicates the Endpoint's delivery policy for that Data View. Valid values are `send-default` for the ''send-default'' and `need-to-request` for the ''need-to-request'' delivery policy. \\ 391 This list `MUST NOT` include duplicate entries, i.e. no MIME type may appear more than once. \\ 392 The value of the `@id` attribute `MUST NOT` contain the characters `,` (comma) or `;` (semicolon). 391 393 * one `<ed:Resources>` element (`REQUIRED`) \\ 392 394 A list of (top-level) resources that are available, i.e. searchable, at the Endpoint. 
The `<ed:Resources>` element contains one or more `<ed:Resource>` elements (see below). The Endpoint `MUST` declare at least one (top-level) resource. … … 402 404 The (relevant) languages available within the resource. The `<ed:Languages>` element contains one or more `<ed:Language>` elements. The content of a `<ed:Language>` element `MUST` be a ISO 639-3 three letter language code. This element should be repeated for all languages (relevant) available ''within'' the resource, however this list `MUST NOT` contain duplicate entries. 403 405 * one `<ed:AvailableDataViews>` element (`REQUIRED`) \\ 404 The Data Views, that are available for the resource. The `<ed:AvailableDataViews>` `MUST` carry a `@ref` attribute, that contains a whitespace separated list of id values, that correspond to value of `@id` attribute on `<ed:SupportedDataView>` elements. 406 The Data Views, that are available for the resource. The `<ed:AvailableDataViews>` `MUST` carry a `@ref` attribute, that contains a whitespace separated list of id values, that correspond to value of `@id` attribute on `<ed:SupportedDataView>` elements. \\ 407 In case of sub-resources, each Resource `SHOULD` support all Data Views, that are supported by the parent resource. However, every resource `MUST` declare all available Data Views independently, i.e. there is no implicit inheritance semantic. 405 408 * zero or one `<ed:Resources>` element (`OPTIONAL`) \\ 406 409 If a resource has searchable sub-resources the Endpoint `MUST` supply additional finer grained resource elements, which are wrapped in a `<ed:Resources>` element. A sub-resource is a searchable entity within a resource, e.g. a sub-corpus. 
… … 432 435 </ed:EndpointDescription> 433 436 }}} 434 This [#REF_Example_4 example] shows a simple Endpoint Description for an Endpoint that only supports the ''basic-search'' capability (and th is, the basic'' Profile) and only provides the Generic Hits Data View, which is indicated by a `<ed:SupportedDataView>` element. This element carries a `@id` attribute with a value of `dv1` and indicates a delivery policy of ''send-default' by the `@delivery-policy` attribute. It only provides one top-level resource identified by the persistent identifier `http://hdl.handle.net/4711/0815`. The resource has a title as well as a description in German and English. A landing page is located at `http://repos.example.org/corpus1.html`. The predominant language in the resource contents is German. Only the Generic Hits Data View is supported for this resource, because the `<ed:AvailableDataViews>` element only references the `<ed:SupportedDataView>` element with the `@id` with a value of `dv1`. 437 This [#REF_Example_4 example] shows a simple Endpoint Description for an Endpoint that only supports the ''basic-search'' capability (and thus, the ''basic'' Profile) and only provides the Generic Hits Data View, which is indicated by a `<ed:SupportedDataView>` element. This element carries a `@id` attribute with a value of `dv1` and indicates a delivery policy of ''send-default'' by the `@delivery-policy` attribute. It only provides one top-level resource identified by the persistent identifier `http://hdl.handle.net/4711/0815`. The resource has a title as well as a description in German and English. A landing page is located at `http://repos.example.org/corpus1.html`. The predominant language in the resource contents is German. Only the Generic Hits Data View is supported for this resource, because the `<ed:AvailableDataViews>` element only references the `<ed:SupportedDataView>` element with the `@id` with a value of `dv1`. 
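The constraints stated for the Endpoint Description (version `1`, no duplicate capabilities or MIME types, no `,` or `;` in `@id`, and every `@ref` pointing at a declared `<ed:SupportedDataView>`) can be checked mechanically. The following is a minimal sketch in Python, not part of the specification: the function name `check_endpoint_description`, the constant names and the wording of the messages are assumptions made for illustration, and the sample document is the Endpoint Description from the ''explain'' response shown in this changeset. It does not replace validation against `Endpoint-Description.xsd`.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the explain response example in this document.
ED_NS = "http://clarin.eu/fcs/endpoint-description"
NS = {"ed": ED_NS}
# The two delivery policies named in the specification text.
POLICIES = {"send-default", "need-to-request"}

# Sample input: the simple Endpoint Description used in the explain example.
EXAMPLE = """
<ed:EndpointDescription xmlns:ed="http://clarin.eu/fcs/endpoint-description" version="1">
  <ed:Capabilities>
    <ed:Capability>http://clarin.eu/fcs/capability/basic-search</ed:Capability>
  </ed:Capabilities>
  <ed:SupportedDataViews>
    <ed:SupportedDataView id="dv1" delivery-policy="send-default">application/x-clarin-fcs-hits+xml</ed:SupportedDataView>
  </ed:SupportedDataViews>
  <ed:Resources>
    <ed:Resource pid="http://hdl.handle.net/4711/0815">
      <ed:LandingPageURI>http://repos.example.org/corpus1.html</ed:LandingPageURI>
      <ed:Languages><ed:Language>deu</ed:Language></ed:Languages>
      <ed:AvailableDataViews ref="dv1"/>
    </ed:Resource>
  </ed:Resources>
</ed:EndpointDescription>
"""


def check_endpoint_description(xml_text):
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems = []
    root = ET.fromstring(xml_text)
    if root.get("version") != "1":
        problems.append("@version must be 1")

    # Capabilities: at least one, no duplicates.
    caps = [(c.text or "").strip()
            for c in root.findall("ed:Capabilities/ed:Capability", NS)]
    if not caps:
        problems.append("at least one ed:Capability is required")
    if len(caps) != len(set(caps)):
        problems.append("duplicate ed:Capability value")

    # Supported Data Views: valid @id, known policy, unique MIME types.
    ids, mime_types = set(), []
    for dv in root.findall("ed:SupportedDataViews/ed:SupportedDataView", NS):
        dv_id = dv.get("id")
        if dv_id is None or "," in dv_id or ";" in dv_id:
            problems.append("missing or invalid @id: %r" % dv_id)
        else:
            ids.add(dv_id)
        if dv.get("delivery-policy") not in POLICIES:
            problems.append("unknown @delivery-policy on Data View %r" % dv_id)
        mime_types.append((dv.text or "").strip())
    if len(mime_types) != len(set(mime_types)):
        problems.append("duplicate MIME type in ed:SupportedDataViews")

    # Every resource (including sub-resources) declares its own Data Views;
    # there is no implicit inheritance from the parent resource.
    for res in root.iter("{%s}Resource" % ED_NS):
        pid = res.get("pid")
        adv = res.find("ed:AvailableDataViews", NS)
        refs = adv.get("ref", "").split() if adv is not None else []
        if not refs:
            problems.append("resource %s declares no available Data Views" % pid)
        for ref in refs:
            if ref not in ids:
                problems.append(
                    "resource %s references unknown Data View id %r" % (pid, ref))
    return problems


if __name__ == "__main__":
    for problem in check_endpoint_description(EXAMPLE):
        print(problem)
```

Collecting the problems in a list rather than raising on the first one lets a caller report everything that is wrong with a description in a single pass.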
435 438 436 439 [=#REF_Example_5]Example 5: … … 492 495 </ed:EndpointDescription> 493 496 }}} 494 This more complex [#REF_Example_5 example] shows an Endpoint Description for an Endpoint that, similar to [#REF_Example_4 Example 4], supports the ''basic'' profile. In addition to the Generic Hits Data View it also supports the CMDI Data View. The Endpoint has two top-level resources (identified by the persistent identifiers `http://hdl.handle.net/4711/0815` and `http://hdl.handle.net/4711/0816`). The second top-level resource has two independently searchable sub-resources, identified by the persistent identifiers `http://hdl.handle.net/4711/0816-1` and `http://hdl.handle.net/4711/0816-2`. All resources are described using several properties, like title, description, etc. 497 This more complex [#REF_Example_5 example] shows an Endpoint Description for an Endpoint that, similar to [#REF_Example_4 Example 4], supports the ''basic'' profile. In addition to the Generic Hits Data View it also supports the CMDI Data View. The delivery policies are ''send-default'' for the Generic Hits Data View and ''need-to-request'' for the CMDI Data View. The Endpoint has two top-level resources (identified by the persistent identifiers `http://hdl.handle.net/4711/0815` and `http://hdl.handle.net/4711/0816`). The second top-level resource has two independently searchable sub-resources, identified by the persistent identifiers `http://hdl.handle.net/4711/0816-1` and `http://hdl.handle.net/4711/0816-2`. All resources are described using several properties, like title, description, etc. The first top-level resource provides only the Generic Hits Data View, while the other top-level resource, including its children, provides the Generic Hits and the CMDI Data Views. 495 498 496 499 === Endpoint Custom Extensions === … … 539 542 `<zr:databaseInfo>` as defined in [#REF_Explain SRU-Explain] (`REQUIRED`) \\ 540 543 `<zr:schemaInfo>` as defined in [#REF_Explain SRU-Explain] (`REQUIRED`). 
This element `MUST` contain an element `<zr:schema>` with an `@identifier` attribute with a value of `http://clarin.eu/fcs/resource` and an `@name` attribute with a value of `fcs`. \\ 541 `<zr:configInfo>` is `OPTIONAL` `\\544 `<zr:configInfo>` is `OPTIONAL` \\ 542 545 An ''extended'' profile may define how the `<zr:indexInfo>` element is to be used, therefore it is `NOT RECOMMENDED` for Endpoints to define custom extensions. 543 546 ''Extended'' Profile:: … … 547 550 548 551 The following example shows a request and response to an ''explain'' request with added extra request parameter `x-fcs-endpoint-description`: 549 * HTTP GET request: Client ?Endpoint:552 * HTTP GET request: Client → Endpoint: 550 553 {{{#!sh 551 554 http://repos.example.org/fcs-endpoint?operation=explain&version=1.2&x-fcs-endpoint-description=true 552 555 }}} 553 * HTTP Response: Endpoint ?Client:556 * HTTP Response: Endpoint → Client: 554 557 {{{#!xml 555 558 <?xml version='1.0' encoding='utf-8'?> … … 594 597 </sru:echoedExplainRequest> 595 598 <sru:extraResponseData> 596 <ed:EndpointDescription xmlns:ed="http://clarin.eu/fcs/endpoint-description"> 597 <ed:Profile>basic</ed:Profile> 599 <ed:EndpointDescription xmlns:ed="http://clarin.eu/fcs/endpoint-description" version="1"> 600 <ed:Capabilities> 601 <ed:Capability>http://clarin.eu/fcs/capability/basic-search</ed:Capability> 602 </ed:Capabilities> 598 603 <ed:SupportedDataViews> 599 <ed:SupportedDataView>application/x-clarin-fcs-hits+xml</ed:SupportedDataView>604 <ed:SupportedDataView id="dv1" delivery-policy="send-default">application/x-clarin-fcs-hits+xml</ed:SupportedDataView> 600 605 </ed:SupportedDataViews> 601 606 <ed:Resources> 602 <!-- just one top-level resource at the Endpoint --> 603 <ed:Resource pid="http://hdl.handle.net/4711/0815"> 604 <ed:Title xml:lang="de">Goethe Corpus</ed:Title> 605 <ed:Title xml:lang="en">Goethe Korpus</ed:Title> 606 <ed:Description xml:lang="de">Der Goethe Korpus des IDS Mannheim.</ed:Description> 607 
<ed:Description xml:lang="en">The Goethe corpus of IDS Mannheim.</ed:Description> 608 <ed:LandingPageURI>http://repos.example.org/corpus1.html</ed:LandingPageURI> 609 <ed:Languages> 610 <ed:Language>deu</ed:Language> 611 </ed:Languages> 612 </ed:Resource> 607 <!-- just one top-level resource at the Endpoint --> 608 <ed:Resource pid="http://hdl.handle.net/4711/0815"> 609 <ed:Title xml:lang="de">Goethe Corpus</ed:Title> 610 <ed:Title xml:lang="en">Goethe Korpus</ed:Title> 611 <ed:Description xml:lang="de">Der Goethe Korpus des IDS Mannheim.</ed:Description> 612 <ed:Description xml:lang="en">The Goethe corpus of IDS Mannheim.</ed:Description> 613 <ed:LandingPageURI>http://repos.example.org/corpus1.html</ed:LandingPageURI> 614 <ed:Languages> 615 <ed:Language>deu</ed:Language> 616 </ed:Languages> 617 <ed:AvailableDataViews ref="dv1"/> 618 </ed:Resource> 613 619 </ed:Resources> 614 620 </ed:EndpointDescription> … … 627 633 628 634 The following example shows a request and response to an ''searchRetrieve'' request with a ''term-only'' query for "cat": 629 * HTTP GET request: Client ?Endpoint:635 * HTTP GET request: Client → Endpoint: 630 636 {{{#!sh 631 637 http://repos.example.org/fcs-endpoint?operation=searchRetrieve&version=1.2&query=cat 632 638 }}} 633 * HTTP Response: Endpoint ?Client:639 * HTTP Response: Endpoint → Client: 634 640 {{{#!xml 635 641 <?xml version='1.0' encoding='utf-8'?> … … 689 695 http://repos.example.org/fcs-endpoint?operation=searchRetrieve&version=1.2&query=cat&x-fcs-context=http://hdl.handle.net/4711/0815,http://hdl.handle.net/4711/0816-2 690 696 }}} 691 If an invalid persistent identifier is passed by the Client, the Endpoint `MUST` issue a `http://clarin.eu/fcs/diagnostic/1` diagnostic, i.e add the appropiate XML fragment to the `<sru:diagnostics>` element of the response. The Endpoint `MAY` treat this condition as fatal, i.e. just issue the diagnostic and perform no search or it `MAY` treat it a non-fatal and perform the search. 
697 If an invalid persistent identifier is passed by the Client, the Endpoint `MUST` issue a `http://clarin.eu/fcs/diagnostic/1` diagnostic, i.e. add the appropriate XML fragment to the `<sru:diagnostics>` element of the response. The Endpoint `MAY` treat this condition as fatal, i.e. just issue the diagnostic and perform no search, or it `MAY` treat it as non-fatal and perform the search. 698 699 If a Client wants to request one or more Data Views that are handled by the Endpoint with the ''need-to-request'' delivery policy, it `MUST` pass a comma-separated list of ''Data View identifiers'' in the `x-fcs-dataviews` extra request parameter of the ''searchRetrieve'' request. A Client can extract valid values for the ''Data View identifiers'' from the `@id` attribute of the `<ed:SupportedDataView>` elements in the Endpoint Description of the Endpoint (see section [#explain Operation ''explain''] and section [#endpointDescription Endpoint Description]). 700 701 For example, to request the CMDI Data View from an Endpoint that has an Endpoint Description as described in [#REF_Example_5 Example 5], a Client would need to use the ''Data View identifier'' `dv2` and submit the following request: 702 {{{#!sh 703 http://repos.example.org/fcs-endpoint?operation=searchRetrieve&version=1.2&query=cat&x-fcs-dataviews=dv2 704 }}} 705 If an invalid ''Data View identifier'' is passed by the Client, the Endpoint `MUST` issue a `http://clarin.eu/fcs/diagnostic/4` diagnostic, i.e. add the appropriate XML fragment to the `<sru:diagnostics>` element of the response. The Endpoint `MAY` treat this condition as fatal, i.e. just issue the diagnostic and perform no search, or it `MAY` treat it as non-fatal and perform the search. 692 706 693 707 == Normative Appendix