Changes between Version 47 and Version 48 of FCS-Specification-ScrapBook
- Timestamp:
- 02/14/14 13:12:25 (10 years ago)
FCS-Specification-ScrapBook
v47 v48
152 152 Endpoints and Clients `MUST` adhere the [#REF_XML_Namespaces XML-Namespaces] specification. The CLARIN-FCS interface specification generally does not dictate whether XML elements should be serialized in their prefixed or non-prefixed syntax, but Endpoints `MUST` ensure that the correct XML namespace is used for elements and that XML namespaces are declared correctly. Clients `MUST` be agnostic to which syntax for serializing the XML elements, i.e. if the prefixed or un-prefixed variant was used, and `SHOULD` operate solely on ''expanded names'', i.e. pairs of ''namespace name'' and ''local name''.
153 153
154     The following XML namespace names and prefixes are used throughout this specification. The column ''Recommended Syntax'' indicates, which syntax variant `SHOULD` be used by Endpoints when serializing the XML response.
    154 The following XML namespace names and prefixes are used throughout this specification. The column "Recommended Syntax" indicates, which syntax variant `SHOULD` be used by the Endpoint to serialize the XML response.
155 155 ||=Prefix =||=Namespace Name =||=Comment =||= Recommended Syntax =||
156 156 || `fsc` || `http://clarin.eu/fcs/1.0` || CLARIN-FCS Resources || prefixed ||
… …
202 202 ''Basic profile''::
203 203 Endpoints `MUST` support ''term-only'' queries. \\
204     Endpoints `SHOULD` support ''terms'' combined with boolean operator queries (''AND'' and ''OR''), including subqueries. Endpoints `MAY` also support ''NOT'' or ''PROX'' operator queries. If an Endpoint does not support a query, i.e. the used operators are not supported by the Endpoint, it `MUST` return an appropriate error message using the appropriate SRU diagnostic. \\
    204 Endpoints `SHOULD` support ''terms'' combined with boolean operator queries (''AND'' and ''OR''), including subqueries. Endpoints `MAY` also support ''NOT'' or ''PROX'' operator queries. If the Endpoint does not support a query, i.e. the used operators are not supported by the Endpoint, it `MUST` return an appropriate error message using the appropriate SRU diagnostic. \\
205 205 Examples for valid CQL queries for the ''basic profile'':
206 206 {{{
… …
228 228 To encode search results, CLARIN-FCS supports two building blocks:
229 229 Resources::
230     A ''Resource'' is an searchable entity at an Endpoint, such as a text corpus or an multi-modal corpus. A resource `SHOULD` be a self-contained unit, i.e. not a sentence in a text corpus or a time interval in an audio transcription.
    230 A ''Resource'' is an searchable entity at the Endpoint, such as a text corpus or an multi-modal corpus. A resource `SHOULD` be a self-contained unit, i.e. not a sentence in a text corpus or a time interval in an audio transcription.
231 231 Resource Fragments::
232 232 A ''Resource Fragment'' is a smaller unit in a ''Resource'', i.e. a sentence in a text corpus or a time interval in an audio transcription.
233 233
234     A Resource `SHOULD` be the most precise unit of data that is directly addressable as a "whole". A Resource `SHOULD` contain a Resource Fragment, if the hit consists of just a part of the Resource unit, if the hit is a sentence within a large text. A Resource Fragment `SHOULD` be addressable within a resource, i.e. it has an offset or a resource-internal identifier. Using Resource Fragments is `OPTIONAL`, but Endpoints are encouraged to use them. If an Endpoint encodes a hit with a Resource Fragment, the actual hit `SHOULD` be encoded as a Data View that is encoded in a Resource Fragment.
235
236     Endpoints `SHOULD` always provide a links to the resource itself, i.e. each Resource or Resource Fragment `SHOULD` be identified by a persistent identifier or providing an Endpoint unique URI. Even if direct linking is not possible, i.e. due to licensing issues, the Endpoints `SHOULD` provide a URI to link to a web-page describing the corpus or collection, including instruction on how to obtain it. Endpoints `SHOULD` provide links that are as specific as possible (and logical), i.e. if a sentence within a resource cannot be addressed directly, the Resource Fragment `SHOULD NOT` contain a persistent identifier or an URI.
237
238     If an Endpoint can provide both, a persistent identifier as well as an URI, for either Resource or Resource Fragment, they `SHOULD` provide both. When working with results, Clients `SHOULD` prefer persistent identifiers over regular URIs.
    234 A Resource `SHOULD` be the most precise unit of data that is directly addressable as a "whole". A Resource `SHOULD` contain a Resource Fragment, if the hit consists of just a part of the Resource unit, if the hit is a sentence within a large text. A Resource Fragment `SHOULD` be addressable within a resource, i.e. it has an offset or a resource-internal identifier. Using Resource Fragments is `OPTIONAL`, but Endpoints are encouraged to use them. If the Endpoint encodes a hit with a Resource Fragment, the actual hit `SHOULD` be encoded as a Data View that is encoded in a Resource Fragment.
    235
    236 Endpoints `SHOULD` always provide a links to the resource itself, i.e. each Resource or Resource Fragment `SHOULD` be identified by a persistent identifier or providing a URI, that is unique for Endpoint. Even if direct linking is not possible, i.e. due to licensing issues, the Endpoints `SHOULD` provide a URI to link to a web-page describing the corpus or collection, including instruction on how to obtain it. Endpoints `SHOULD` provide links that are as specific as possible (and logical), i.e. if a sentence within a resource cannot be addressed directly, the Resource Fragment `SHOULD NOT` contain a persistent identifier or an URI.
    237
    238 If the Endpoint can provide both, a persistent identifier as well as an URI, for either Resource or Resource Fragment, they `SHOULD` provide both. When working with results, Clients `SHOULD` prefer persistent identifiers over regular URIs.
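Editorial illustration (not part of either revision): the rows above ask Clients to operate on expanded names and to prefer persistent identifiers over plain URIs. The Python sketch below shows one way a Client might do both. It assumes the identifiers are carried in attributes named `pid` and `ref`, which this excerpt does not confirm, and the sample documents are trimmed and not schema-valid.

```python
import xml.etree.ElementTree as ET

FCS_NS = "http://clarin.eu/fcs/1.0"  # namespace name listed in the specification


def preferred_link(element):
    """Return the persistent identifier if present, otherwise the plain URI.

    Assumption: the attribute names "pid" and "ref" are illustrative only.
    """
    return element.get("pid") or element.get("ref")


def resource_links(xml_text):
    """Collect the preferred link of every Resource and Resource Fragment."""
    root = ET.fromstring(xml_text)
    wanted = {f"{{{FCS_NS}}}Resource", f"{{{FCS_NS}}}ResourceFragment"}
    # ElementTree exposes tags as expanded names ("{namespace}local"), so the
    # prefixed and un-prefixed serializations are treated identically.
    return [preferred_link(el) for el in root.iter() if el.tag in wanted]


# Trimmed samples (not schema-valid): same content, two serializations.
PREFIXED = (
    '<fcs:Resource xmlns:fcs="http://clarin.eu/fcs/1.0"'
    ' pid="http://hdl.handle.net/4711/08-15"'
    ' ref="http://repos.example.org/file/text_08_15.html">'
    '<fcs:ResourceFragment'
    ' ref="http://repos.example.org/file/text_08_15.html#sentence2"/>'
    '</fcs:Resource>'
)
UNPREFIXED = (
    '<Resource xmlns="http://clarin.eu/fcs/1.0"'
    ' pid="http://hdl.handle.net/4711/08-15"'
    ' ref="http://repos.example.org/file/text_08_15.html">'
    '<ResourceFragment'
    ' ref="http://repos.example.org/file/text_08_15.html#sentence2"/>'
    '</Resource>'
)
```

Both samples yield the same list: the persistent identifier for the Resource, and the URI for the Resource Fragment because no persistent identifier is given for it.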
239 239
240 240 Resource and Resource Fragment are serialized in XML and Endpoints `MUST` generate responses, that are valid according to the XML schema "[source:FederatedSearch/schema/Resource.xsd Resource.xsd]" ([source:FederatedSearch/schema/Resource.xsd?format=txt download]). A Resource is encoded in the form of a `<fcs:Resource>` element, a ''Resource Fragment'' in the form of a `<fcs:ResourceFragment>` element. The content of a Data View is wrapped in a `<fcs:DataView>` element. `<fcs:Resource>` is the top-level element and `MAY` contain zero or more `<fcs:DataView>` elements and `MAY` contain zero or more `<fcs:ResourceFragment>` elements. A `<fcs:ResourceFragment>` element `MUST` contain one or more `<fcs:DataView>` elements.
… …
283 283 </fcs:Resource>
284 284 }}}
285     The most complex [#REF_Example_3 example] is similar to [#REF_Example_2 Example 2], i.e. it shows a hit is encoded as one ''Generic Hits'' Data View in a Resource Fragment, that is embedded in a Resource. In contrast to Example 2, another Data View of type ''CMDI'' is embedded directly within the Resource. An Endpoint can use this type of Data View to directly provide CMDI metadata about the Resource to Clients.
    285 The most complex [#REF_Example_3 example] is similar to [#REF_Example_2 Example 2], i.e. it shows a hit is encoded as one ''Generic Hits'' Data View in a Resource Fragment, that is embedded in a Resource. In contrast to Example 2, another Data View of type ''CMDI'' is embedded directly within the Resource. The Endpoint can use this type of Data View to directly provide CMDI metadata about the Resource to Clients.
286 286 All entities of the Hit can be referenced by a persistent identifier and an URI. The complete Resource is referenceable by either the persistent identifier `http://hdl.handle.net/4711/08-15` or the URI `http://repos.example.org/file/text_08_15.html` and the CMDI metadata record in the CMDI Data View is referenceable either by the persistent identifier `http://hdl.handle.net/4711/08-15-1` or the URI `http://repos.example.org/file/08_15_1.cmdi`. The actual hit in the Resource Fragment is also directly referenceable by either the persistent identifier `http://hdl.handle.net/4711/00-15-2` or the URI `http://repos.example.org/file/text_08_15.html#sentence2`.
287 287
… …
375 375
376 376 === Endpoint Description ===#endpointDescription
377     Endpoints need to provide information about their capabilities to support auto-configuration of Clients, This capabilities include, among other information, the Profile that is supported by an Endpoint. The ''Endpoint Description'' mechanism provides the necessary facility to provide this information to the Clients. Endpoints `MUST` encode their capabilities using an XML format and embed this information into the SRU/CQL protocol as described in section [#explain Operation ''explain'']. The XML fragment generated by the Endpoint for the Endpoint Description `MUST` be valid according to the XML schema "[source:FederatedSearch/schema/Endpoint-Description.xsd Endpoint-Description.xsd]" ([source:FederatedSearch/schema/Endpoint-Description.xsd?format=txt download]).
    377 Endpoints need to provide information about their capabilities to support auto-configuration of Clients, This capabilities include, among other information, the Profile that is supported by the Endpoint. The ''Endpoint Description'' mechanism provides the necessary facility to provide this information to the Clients. Endpoints `MUST` encode their capabilities using an XML format and embed this information into the SRU/CQL protocol as described in section [#explain Operation ''explain'']. The XML fragment generated by the Endpoint for the Endpoint Description `MUST` be valid according to the XML schema "[source:FederatedSearch/schema/Endpoint-Description.xsd Endpoint-Description.xsd]" ([source:FederatedSearch/schema/Endpoint-Description.xsd?format=txt download]).
378 378
379 379 The XML fragment for ''Endpoint Description'' is encoded as an `<ed:EndpointDescription>` element, that contains the following children:
… …
386 386 A list of Data Views, that are supported by this Endpoint. This list is composed of one or more `<ed:SupportedDataView>` elements. The content of a `<ed:SupportedDataView>` `MUST` be the MIME type of a supported Data View, e.g. `application/x-clarin-fcs-hits+xml`.
387 387 * one `<ed:Collections>` element (`REQUIRED`) \\
388     A list of (top-level) collections that are available at the Endpoint. The `<ed:Collections>` element contains one or more `<ed:Collection>` elements (see below). An Endpoint `MUST` declare at least one (top-level) collection.
389
390     The `<ed:Collection>` element contains a detailed description of a collection that is available at an Endpoint. A collection is a searchable entity, e.g. a single corpus. The `<ed:Collection>` has a mandatory `@pid` attribute, that contains persistent identifier of the collection. This value `MUST` be the same as the ''!MdSelfLink'' of the CMDI record describing the collection. The `<ed:Collection>` element contains the following children:
    388 A list of (top-level) collections that are available at the Endpoint. The `<ed:Collections>` element contains one or more `<ed:Collection>` elements (see below). The Endpoint `MUST` declare at least one (top-level) collection.
    389
    390 The `<ed:Collection>` element contains a detailed description of a collection that is available at the Endpoint. A collection is a searchable entity, e.g. a single corpus. The `<ed:Collection>` has a mandatory `@pid` attribute, that contains persistent identifier of the collection. This value `MUST` be the same as the ''!MdSelfLink'' of the CMDI record describing the collection. The `<ed:Collection>` element contains the following children:
391 391 * one or more `<ed:Title>` elements (`REQUIRED`) \\
392 392 A human readable title for the collection. A `REQUIRED` `@xml:lang` attribute indicates the language of the title. An English version of the title is `REQUIRED`. The list of titles `MUST NOT` contain duplicate entries for the same language.
… …
476 476 </ed:EndpointDescription>
477 477 }}}
478     This more complex [#REF_Example_5 example] show a Endpoint Description for an Endpoint that, similar to [#REF_Example_4 Example 4], supports the ''basic'' profile. In addition to the Generic Hits Data View it also supports CMDI the CMDI Data View. The Endpoint has two top-level collections (identified by the persistent identifiers `http://hdl.handle.net/4711/0815` and `http://hdl.handle.net/4711/0816`. The second top-level collection has two sub-collections, identified by the persistent identifier `http://hdl.handle.net/4711/0816-1` and `http://hdl.handle.net/4711/0816-2`. All collections are described using several properties, like title, description, etc.
    478 This more complex [#REF_Example_5 example] show an Endpoint Description for an Endpoint that, similar to [#REF_Example_4 Example 4], supports the ''basic'' profile. In addition to the Generic Hits Data View it also supports CMDI the CMDI Data View. The Endpoint has two top-level collections (identified by the persistent identifiers `http://hdl.handle.net/4711/0815` and `http://hdl.handle.net/4711/0816`. The second top-level collection has two sub-collections, identified by the persistent identifier `http://hdl.handle.net/4711/0816-1` and `http://hdl.handle.net/4711/0816-2`. All collections are described using several properties, like title, description, etc.
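Editorial illustration (not part of either revision): a Client that wants the valid persistent identifiers could walk the Endpoint Description and read the `@pid` attribute of every `<ed:Collection>`. The Python sketch below makes two assumptions that this excerpt does not settle: the namespace name bound to the `ed` prefix is a placeholder, and sub-collections are taken to be `<ed:Collection>` elements nested somewhere below their parent collection.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace name for the "ed" prefix; substitute the value from
# the namespace table of the specification.
ED_NS = "http://example.org/hypothetical/endpoint-description"


def collection_pids(xml_text, ed_ns=ED_NS):
    """Return the @pid of every collection, top-level and nested, in document order."""
    root = ET.fromstring(xml_text)
    pids = []
    # iter() with an expanded name visits matching descendants at any depth.
    for collection in root.iter(f"{{{ed_ns}}}Collection"):
        pid = collection.get("pid")
        if pid is None:
            raise ValueError("ed:Collection without the mandatory @pid attribute")
        pids.append(pid)
    return pids


# Trimmed sample (not schema-valid); the nesting shown here is an assumption.
SAMPLE = f"""<ed:EndpointDescription xmlns:ed="{ED_NS}">
  <ed:Collections>
    <ed:Collection pid="http://hdl.handle.net/4711/0815"/>
    <ed:Collection pid="http://hdl.handle.net/4711/0816">
      <ed:Collections>
        <ed:Collection pid="http://hdl.handle.net/4711/0816-1"/>
        <ed:Collection pid="http://hdl.handle.net/4711/0816-2"/>
      </ed:Collections>
    </ed:Collection>
  </ed:Collections>
</ed:EndpointDescription>"""
```

Calling `collection_pids(SAMPLE)` returns the four identifiers used in the example above, parents before their sub-collections.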
479 479
480 480 == CLARIN-FCS to SRU/CQL binding ==
… …
505 505 The ''explain'' operation of the SRU protocol serves to announce server capabilities and to allows clients to configure themselves automatically. This operation is used similarly.
506 506
507     An Endpoint `MUST` respond to a ''explain'' request by a proper ''explain'' response. As per [#REF_Explain SRU-Explain], the response `MUST` contain one `<sru:record>` element that contains an ''SRU Explain'' record. The `<sru:recordSchema>` element `MUST` contain the literal `http://explain.z3950.org/dtd/2.0/`, i.e. the official ''identifier'' for Explain records.
    507 The Endpoint `MUST` respond to a ''explain'' request by a proper ''explain'' response. As per [#REF_Explain SRU-Explain], the response `MUST` contain one `<sru:record>` element that contains an ''SRU Explain'' record. The `<sru:recordSchema>` element `MUST` contain the literal `http://explain.z3950.org/dtd/2.0/`, i.e. the official ''identifier'' for Explain records.
508 508
509 509 According to the Profile supported by the Endpoint the Explain record `MUST` contain the following elements:
… …
517 517 '''NOTE''': the extended profile is not yet defined and will be part of a future CLARIN-FCS specification.
518 518
519     To support auto-configuration in CLARIN-FCS, an Endpoint provide an ''Endpoint Description''. The Endpoint Description is included in explain response utilizing SRUs extension mechanism, i.e. by embedding an XML fragment into the `<sru:extraResponseData>` element. Endpoints `MUST` include the Endpoint Description ''only'' if a Client performs an explain request with the ''extra request parameter'' `x-clarin-fcs-endpoint-description` with a value of `true`. If a Client performs an explain request ''without'' supplying this extra request parameter the Endpoint `MUST NOT` include
    519 To support auto-configuration in CLARIN-FCS, the Endpoint provide an ''Endpoint Description''. The Endpoint Description is included in explain response utilizing SRUs extension mechanism, i.e. by embedding an XML fragment into the `<sru:extraResponseData>` element. The Endpoint `MUST` include the Endpoint Description ''only'' if the Client performs an explain request with the ''extra request parameter'' `x-clarin-fcs-endpoint-description` with a value of `true`. If the Client performs an explain request ''without'' supplying this extra request parameter the Endpoint `MUST NOT` include
520 520 the Endpoint Description. The format of the Endpoint Description XML fragment is defined in [#REF_endpointDescription Endpoint Description].
521 521
… …
595 595
596 596 === Operation ''searchRetrieve'' ===#searchRetrieve
597     The ''searchRetrieve'' operation of the SRU protocol is used for searching in the Resources that are provided by an Endpoint. The SRU protocol defines the serialization of request and response formats in [#REF_SRU_12 OASIS-SRU-12]. In SRU, search result hits are encoded down to a record level, i.e. the `<sru:record>` element, and SRU allows records to be serialized in various formats, so called ''record schemas''.
    597 The ''searchRetrieve'' operation of the SRU protocol is used for searching in the Resources that are provided by the Endpoint. The SRU protocol defines the serialization of request and response formats in [#REF_SRU_12 OASIS-SRU-12]. In SRU, search result hits are encoded down to a record level, i.e. the `<sru:record>` element, and SRU allows records to be serialized in various formats, so called ''record schemas''.
598 598
599 599 Endpoints `MUST` support the CLARIN-FCS record schema (see section [#resultFormat Result Format]) and `MUST` use the value `http://clarin.eu/fcs/1.0` for the ''responseItemType'' ("record schema identifier").
… …
648 648 }}}
649 649
650     In general, the Endpoint is `REQUIRED` to accept an unrestricted search and `MUST` then perform the search operation on all Resources at an Endpoint. The Client can request the Endpoint to ''restrict the search'' to a sub-collection of the Resources available. In this case, the Client `MUST` pass a comma-separated list of persistent identifiers in the {{{x-clarin-fcs-context}}} extra request parameter of the ''searchRetrieve'' request. The Endpoint `MUST` then restrict the search to those Resources, that are identified by the persistent identifiers passed by Client. A Client can extract all valid persistent identifiers from the `@pid` attribute of the `<ed:Collection>` element, obtained by the ''explain'' request (see section [#explain Operation ''explain''] and section [#endpointDescription Endpoint Description]). The list of persistent identifiers can get extensive, but an agent `MAY` use the POST method instead of GET method for submitting the request.
    650 In general, the Endpoint is `REQUIRED` to accept an unrestricted search and `MUST` then perform the search operation on all Resources, that are available at the Endpoint. The Client can request the Endpoint to ''restrict the search'' to a sub-collection of these Resources. In this case, the Client `MUST` pass a comma-separated list of persistent identifiers in the {{{x-clarin-fcs-context}}} extra request parameter of the ''searchRetrieve'' request. The Endpoint `MUST` then restrict the search to those Resources, that are identified by the persistent identifiers passed by Client. The Client can extract all valid persistent identifiers from the `@pid` attribute of the `<ed:Collection>` element, obtained by the ''explain'' request (see section [#explain Operation ''explain''] and section [#endpointDescription Endpoint Description]). The list of persistent identifiers can get extensive, but an agent `MAY` use the POST method instead of GET method for submitting the request.
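Editorial illustration (not part of either revision): the restriction mechanism in row 650 amounts to joining the persistent identifiers with commas, URL-encoding the result, and falling back to POST when the request grows long. The Python sketch below shows one possible Client-side helper. The function name, the base URL in the usage note and the length threshold are assumptions made for this example; only the `operation`, `version` and `query` parameters come from SRU 1.2 and `x-clarin-fcs-context` from the text above.

```python
from urllib.parse import urlencode

# Assumed cut-off for switching to POST; the specification does not define one.
MAX_GET_URL_LENGTH = 2000


def build_search_request(base_url, query, context_pids=()):
    """Return (method, url, body) for a searchRetrieve request.

    When context_pids is empty the search is unrestricted.
    """
    params = {"operation": "searchRetrieve", "version": "1.2", "query": query}
    if context_pids:
        # Comma-separated list of persistent identifiers, as required above.
        params["x-clarin-fcs-context"] = ",".join(context_pids)
    encoded = urlencode(params)  # percent-encodes ":", "/" and ","
    url = f"{base_url}?{encoded}"
    if len(url) <= MAX_GET_URL_LENGTH:
        return ("GET", url, None)
    # Long identifier lists: send the same parameters as a form-encoded body.
    return ("POST", base_url, encoded.encode("ascii"))
```

For instance, `build_search_request("http://repos.example.org/fcs-endpoint", "cat", ["http://hdl.handle.net/4711/0815"])` produces a GET request whose query string ends in `x-clarin-fcs-context=http%3A%2F%2Fhdl.handle.net%2F4711%2F0815`.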
651 651
652 652 For example, to restrict the search to the Resource with the persistent identifier `http://hdl.handle.net/4711/0815` the Client must issue the following request:
… …
659 659 }}}
660 660
661     If an invalid persistent identifier is passed by a Client, the Endpoint `MUST` issue a `http://clarin.eu/fcs/1.0/diagnostic/1` diagnostic, i.e add the appropiate XML fragment to the `<sru:diagnostics>` element of the response. The Endpoint `MAY` treat this condition as fatal, i.e. just issue the diagnostic and perform no search or it `MAY` treat it a non-fatal and perform the search.
    661 If an invalid persistent identifier is passed by the Client, the Endpoint `MUST` issue a `http://clarin.eu/fcs/1.0/diagnostic/1` diagnostic, i.e add the appropiate XML fragment to the `<sru:diagnostics>` element of the response. The Endpoint `MAY` treat this condition as fatal, i.e. just issue the diagnostic and perform no search or it `MAY` treat it a non-fatal and perform the search.
662 662
663 663
… …
666 666 The following extra request parameters are used in CLARIN-FCS:
667 667 ||=Parameter Name =||=SRU operations =||=Allowed values =||= Description =||
668     || `x-clarin-fcs-endpoint-description` || explain || `true` \\ All other values are reserved an `MUST` not be used by Clients || If present, the Endpoint `MUST` include a Endpoint Description in the\\`<sru:extraResponseData>` element of an ''explain'' response. ||
    668 || `x-clarin-fcs-endpoint-description` || explain || `true` \\ All other values are reserved an `MUST` not be used by Clients || If present, the Endpoint `MUST` include an Endpoint Description in the\\`<sru:extraResponseData>` element of an ''explain'' response. ||
669 669 || `x-clarin-fcs-context` || searchRetrieve || A comma separated list of persistent identifiers || The Endpoint `MUST` restrict the search to the collections identified by\\the persistent identifiers ||
670 670
… …
709 709 *WIP*
710 710
711     An Endpoint `MAY` add arbitrary XML fragments to a `<fcs:Resource>` element. Clients `MUST` ignore any custom extensions they do not understand.
    711 The Endpoint `MAY` add arbitrary XML fragments to a `<fcs:Resource>` element. Clients `MUST` ignore any custom extensions they do not understand.
712 712 Endpoints `MUST` use a custom XML namespace name for their extensions. Endpoints `MUST NOT` use XML namespace names, that start with the prefixes `http://clarin.eu`, `http://www.clarin.eu/`, `https://clarin.eu` or `http://www.clarin.eu/`.
713 713
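Editorial illustration (not part of either revision): the namespace rule in row 712 can be checked mechanically. The Python sketch below uses a hypothetical helper name, takes the reserved prefixes from the distinct values listed in that row, and treats an empty namespace name as not allowed because a custom namespace is required.

```python
# Distinct reserved prefixes as listed in the text above.
RESERVED_PREFIXES = (
    "http://clarin.eu",
    "http://www.clarin.eu/",
    "https://clarin.eu",
)


def is_allowed_extension_namespace(namespace_name):
    """Return True if an Endpoint may use this namespace name for extensions."""
    if not namespace_name:
        # Extensions need a custom namespace; "no namespace" does not qualify.
        return False
    # str.startswith accepts a tuple and checks each prefix in turn.
    return not namespace_name.startswith(RESERVED_PREFIXES)
```

With this check, `http://clarin.eu/fcs/1.0` is rejected and a namespace name such as `http://example.org/my-extension` is accepted.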