Changes between Version 47 and Version 48 of FCS-Specification-ScrapBook


Timestamp:
02/14/14 13:12:25
Author:
oschonef
Comment:

--

  • FCS-Specification-ScrapBook

    v47 v48  
    152152Endpoints and Clients `MUST` adhere to the [#REF_XML_Namespaces XML-Namespaces] specification. The CLARIN-FCS interface specification generally does not dictate whether XML elements should be serialized in their prefixed or non-prefixed syntax, but Endpoints `MUST` ensure that the correct XML namespace is used for elements and that XML namespaces are declared correctly. Clients `MUST` be agnostic regarding the syntax used for serializing the XML elements, i.e. whether the prefixed or non-prefixed variant was used, and `SHOULD` operate solely on ''expanded names'', i.e. pairs of ''namespace name'' and ''local name''.
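
A minimal Python sketch of what operating on ''expanded names'' could look like on the Client side. It relies on the standard library's `xml.etree.ElementTree`, which exposes element names as `{namespace}local`. The two sample documents and the helper names `expanded_name` and `expanded_names` are illustrative assumptions; attributes and Data View content are omitted, so the samples are not schema-valid.

```python
import xml.etree.ElementTree as ET

FCS_NS = "http://clarin.eu/fcs/1.0"

# Two serializations of the same simplified content (illustrative only).
PREFIXED = (
    '<fcs:Resource xmlns:fcs="http://clarin.eu/fcs/1.0">'
    "<fcs:DataView/></fcs:Resource>"
)
UNPREFIXED = (
    '<Resource xmlns="http://clarin.eu/fcs/1.0">'
    "<DataView/></Resource>"
)


def expanded_name(element):
    """Split ElementTree's '{namespace}local' tag into a (namespace, local) pair."""
    tag = element.tag
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return None, tag  # element is in no namespace


def expanded_names(xml_text):
    """Return the expanded names of all elements in document order."""
    root = ET.fromstring(xml_text)
    return [expanded_name(el) for el in root.iter()]


# Both syntax variants yield the same expanded names.
same = expanded_names(PREFIXED) == expanded_names(UNPREFIXED)
```

Comparing `(namespace, local)` pairs rather than raw tag strings such as `fcs:Resource` keeps the Client independent of whichever prefix an Endpoint happens to declare.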
    153153
    154 The following XML namespace names and prefixes are used throughout this specification. The column ''Recommended Syntax'' indicates which syntax variant `SHOULD` be used by Endpoints when serializing the XML response.
     154The following XML namespace names and prefixes are used throughout this specification. The column "Recommended Syntax" indicates which syntax variant `SHOULD` be used by the Endpoint to serialize the XML response.
    155155||=Prefix =||=Namespace Name                                 =||=Comment                        =||= Recommended Syntax =||
    156156|| `fcs`   || `http://clarin.eu/fcs/1.0`                      || CLARIN-FCS Resources            || prefixed ||
     
    202202 ''Basic profile''::
    203203   Endpoints `MUST` support ''term-only'' queries. \\
    204    Endpoints `SHOULD` support ''terms'' combined with boolean operator queries (''AND'' and ''OR''), including subqueries. Endpoints `MAY` also support ''NOT'' or ''PROX'' operator queries. If an Endpoint does not support a query, i.e. the used operators are not supported by the Endpoint, it `MUST` return an appropriate error message using the appropriate SRU diagnostic. \\
     204   Endpoints `SHOULD` support ''terms'' combined with boolean operator queries (''AND'' and ''OR''), including subqueries. Endpoints `MAY` also support ''NOT'' or ''PROX'' operator queries. If the Endpoint does not support a query, i.e. the used operators are not supported by the Endpoint, it `MUST` return an appropriate error message using the appropriate SRU diagnostic. \\
    205205   Examples for valid CQL queries for the ''basic profile'':
    206206{{{
     
    228228To encode search results, CLARIN-FCS supports two building blocks:
    229229 Resources::
    230    A ''Resource'' is a searchable entity at an Endpoint, such as a text corpus or a multi-modal corpus. A resource `SHOULD` be a self-contained unit, i.e. not a sentence in a text corpus or a time interval in an audio transcription.
     230   A ''Resource'' is a searchable entity at the Endpoint, such as a text corpus or a multi-modal corpus. A resource `SHOULD` be a self-contained unit, i.e. not a sentence in a text corpus or a time interval in an audio transcription.
    231231 Resource Fragments::
    232232   A ''Resource Fragment'' is a smaller unit in a ''Resource'', i.e. a sentence in a text corpus or a time interval in an audio transcription.
    233233
    234 A Resource `SHOULD` be the most precise unit of data that is directly addressable as a "whole". A Resource `SHOULD` contain a Resource Fragment if the hit consists of just a part of the Resource unit, e.g. if the hit is a sentence within a large text. A Resource Fragment `SHOULD` be addressable within a resource, i.e. it has an offset or a resource-internal identifier. Using Resource Fragments is `OPTIONAL`, but Endpoints are encouraged to use them. If an Endpoint encodes a hit with a Resource Fragment, the actual hit `SHOULD` be encoded as a Data View that is encoded in a Resource Fragment.
    235 
    236 Endpoints `SHOULD` always provide a link to the resource itself, i.e. each Resource or Resource Fragment `SHOULD` be identified by a persistent identifier or an Endpoint-unique URI. Even if direct linking is not possible, e.g. due to licensing issues, the Endpoints `SHOULD` provide a URI to link to a web page describing the corpus or collection, including instructions on how to obtain it. Endpoints `SHOULD` provide links that are as specific as possible (and logical), i.e. if a sentence within a resource cannot be addressed directly, the Resource Fragment `SHOULD NOT` contain a persistent identifier or a URI.
    237 
    238 If an Endpoint can provide both a persistent identifier and a URI for either Resource or Resource Fragment, it `SHOULD` provide both. When working with results, Clients `SHOULD` prefer persistent identifiers over regular URIs.
     234A Resource `SHOULD` be the most precise unit of data that is directly addressable as a "whole". A Resource `SHOULD` contain a Resource Fragment if the hit consists of just a part of the Resource unit, e.g. if the hit is a sentence within a large text. A Resource Fragment `SHOULD` be addressable within a resource, i.e. it has an offset or a resource-internal identifier. Using Resource Fragments is `OPTIONAL`, but Endpoints are encouraged to use them. If the Endpoint encodes a hit with a Resource Fragment, the actual hit `SHOULD` be encoded as a Data View that is encoded in a Resource Fragment.
     235
     236Endpoints `SHOULD` always provide a link to the resource itself, i.e. each Resource or Resource Fragment `SHOULD` be identified by a persistent identifier or a URI that is unique for the Endpoint. Even if direct linking is not possible, e.g. due to licensing issues, the Endpoints `SHOULD` provide a URI to link to a web page describing the corpus or collection, including instructions on how to obtain it. Endpoints `SHOULD` provide links that are as specific as possible (and logical), i.e. if a sentence within a resource cannot be addressed directly, the Resource Fragment `SHOULD NOT` contain a persistent identifier or a URI.
     237
     238If the Endpoint can provide both a persistent identifier and a URI for either Resource or Resource Fragment, it `SHOULD` provide both. When working with results, Clients `SHOULD` prefer persistent identifiers over regular URIs.
    239239
    240240Resource and Resource Fragment are serialized in XML and Endpoints `MUST` generate responses, that are valid according to the XML schema "[source:FederatedSearch/schema/Resource.xsd Resource.xsd]" ([source:FederatedSearch/schema/Resource.xsd?format=txt download]). A Resource is encoded in the form of a `<fcs:Resource>` element, a ''Resource Fragment'' in the form of a `<fcs:ResourceFragment>` element. The content of a Data View is wrapped in a `<fcs:DataView>` element. `<fcs:Resource>` is the top-level element and `MAY` contain zero or more `<fcs:DataView>` elements and `MAY` contain zero or more `<fcs:ResourceFragment>` elements. A `<fcs:ResourceFragment>` element `MUST` contain one or more `<fcs:DataView>` elements.
     
    283283</fcs:Resource>
    284284}}}
    285 The most complex [#REF_Example_3 example] is similar to [#REF_Example_2 Example 2], i.e. it shows a hit encoded as one ''Generic Hits'' Data View in a Resource Fragment that is embedded in a Resource. In contrast to Example 2, another Data View of type ''CMDI'' is embedded directly within the Resource. An Endpoint can use this type of Data View to directly provide CMDI metadata about the Resource to Clients.
     285The most complex [#REF_Example_3 example] is similar to [#REF_Example_2 Example 2], i.e. it shows a hit encoded as one ''Generic Hits'' Data View in a Resource Fragment that is embedded in a Resource. In contrast to Example 2, another Data View of type ''CMDI'' is embedded directly within the Resource. The Endpoint can use this type of Data View to directly provide CMDI metadata about the Resource to Clients.
    286286All entities of the Hit can be referenced by a persistent identifier and a URI. The complete Resource is referenceable by either the persistent identifier `http://hdl.handle.net/4711/08-15` or the URI `http://repos.example.org/file/text_08_15.html` and the CMDI metadata record in the CMDI Data View is referenceable either by the persistent identifier `http://hdl.handle.net/4711/08-15-1` or the URI `http://repos.example.org/file/08_15_1.cmdi`. The actual hit in the Resource Fragment is also directly referenceable by either the persistent identifier `http://hdl.handle.net/4711/00-15-2` or the URI `http://repos.example.org/file/text_08_15.html#sentence2`.
    287287
     
    375375
    376376=== Endpoint Description ===#endpointDescription
    377 Endpoints need to provide information about their capabilities to support auto-configuration of Clients. These capabilities include, among other information, the Profile that is supported by an Endpoint. The ''Endpoint Description'' mechanism provides the necessary facility to provide this information to the Clients. Endpoints `MUST` encode their capabilities using an XML format and embed this information into the SRU/CQL protocol as described in section [#explain Operation ''explain'']. The XML fragment generated by the Endpoint for the Endpoint Description `MUST` be valid according to the XML schema "[source:FederatedSearch/schema/Endpoint-Description.xsd Endpoint-Description.xsd]" ([source:FederatedSearch/schema/Endpoint-Description.xsd?format=txt download]).
     377Endpoints need to provide information about their capabilities to support auto-configuration of Clients. These capabilities include, among other information, the Profile that is supported by the Endpoint. The ''Endpoint Description'' mechanism provides the necessary facility to provide this information to the Clients. Endpoints `MUST` encode their capabilities using an XML format and embed this information into the SRU/CQL protocol as described in section [#explain Operation ''explain'']. The XML fragment generated by the Endpoint for the Endpoint Description `MUST` be valid according to the XML schema "[source:FederatedSearch/schema/Endpoint-Description.xsd Endpoint-Description.xsd]" ([source:FederatedSearch/schema/Endpoint-Description.xsd?format=txt download]).
    378378
    379379The XML fragment for ''Endpoint Description'' is encoded as an `<ed:EndpointDescription>` element, that contains the following children:
     
    386386   A list of Data Views, that are supported by this Endpoint. This list is composed of one or more `<ed:SupportedDataView>` elements. The content of a `<ed:SupportedDataView>` `MUST` be the MIME type of a supported Data View, e.g. `application/x-clarin-fcs-hits+xml`.
    387387 * one `<ed:Collections>` element (`REQUIRED`) \\
    388    A list of (top-level) collections that are available at the Endpoint. The `<ed:Collections>` element contains one or more `<ed:Collection>` elements (see below). An Endpoint `MUST` declare at least one (top-level) collection.
    389 
    390 The `<ed:Collection>` element contains a detailed description of a collection that is available at an Endpoint. A collection is a searchable entity, e.g. a single corpus. The `<ed:Collection>` element has a mandatory `@pid` attribute that contains the persistent identifier of the collection. This value `MUST` be the same as the ''!MdSelfLink'' of the CMDI record describing the collection. The `<ed:Collection>` element contains the following children:
     388   A list of (top-level) collections that are available at the Endpoint. The `<ed:Collections>` element contains one or more `<ed:Collection>` elements (see below). The Endpoint `MUST` declare at least one (top-level) collection.
     389
     390The `<ed:Collection>` element contains a detailed description of a collection that is available at the Endpoint. A collection is a searchable entity, e.g. a single corpus. The `<ed:Collection>` element has a mandatory `@pid` attribute that contains the persistent identifier of the collection. This value `MUST` be the same as the ''!MdSelfLink'' of the CMDI record describing the collection. The `<ed:Collection>` element contains the following children:
    391391 * one or more `<ed:Title>` elements (`REQUIRED`) \\
    392392   A human readable title for the collection. A `REQUIRED` `@xml:lang` attribute indicates the language of the title. An English version of  the title is `REQUIRED`. The list of titles `MUST NOT` contain duplicate entries for the same language.
     
    476476</ed:EndpointDescription>
    477477}}}
    478 This more complex [#REF_Example_5 example] shows an Endpoint Description for an Endpoint that, similar to [#REF_Example_4 Example 4], supports the ''basic'' profile. In addition to the Generic Hits Data View it also supports the CMDI Data View. The Endpoint has two top-level collections (identified by the persistent identifiers `http://hdl.handle.net/4711/0815` and `http://hdl.handle.net/4711/0816`). The second top-level collection has two sub-collections, identified by the persistent identifiers `http://hdl.handle.net/4711/0816-1` and `http://hdl.handle.net/4711/0816-2`. All collections are described using several properties, like title, description, etc.
     478This more complex [#REF_Example_5 example] shows an Endpoint Description for an Endpoint that, similar to [#REF_Example_4 Example 4], supports the ''basic'' profile. In addition to the Generic Hits Data View it also supports the CMDI Data View. The Endpoint has two top-level collections (identified by the persistent identifiers `http://hdl.handle.net/4711/0815` and `http://hdl.handle.net/4711/0816`). The second top-level collection has two sub-collections, identified by the persistent identifiers `http://hdl.handle.net/4711/0816-1` and `http://hdl.handle.net/4711/0816-2`. All collections are described using several properties, like title, description, etc.
    479479
    480480== CLARIN-FCS to SRU/CQL binding ==
     
    505505The ''explain'' operation of the SRU protocol serves to announce server capabilities and to allow clients to configure themselves automatically. CLARIN-FCS uses this operation in the same way.
    506506
    507 An Endpoint `MUST` respond to an ''explain'' request with a proper ''explain'' response. As per [#REF_Explain SRU-Explain], the response `MUST` contain one `<sru:record>` element that contains an ''SRU Explain'' record. The `<sru:recordSchema>` element `MUST` contain the literal `http://explain.z3950.org/dtd/2.0/`, i.e. the official ''identifier'' for Explain records.
     507The Endpoint `MUST` respond to an ''explain'' request with a proper ''explain'' response. As per [#REF_Explain SRU-Explain], the response `MUST` contain one `<sru:record>` element that contains an ''SRU Explain'' record. The `<sru:recordSchema>` element `MUST` contain the literal `http://explain.z3950.org/dtd/2.0/`, i.e. the official ''identifier'' for Explain records.
    508508
    509509According to the Profile supported by the Endpoint the Explain record `MUST` contain the following elements:
     
    517517   '''NOTE''': the extended profile is not yet defined and will be part of a future CLARIN-FCS specification.
    518518
    519 To support auto-configuration in CLARIN-FCS, an Endpoint provides an ''Endpoint Description''. The Endpoint Description is included in the explain response utilizing SRU's extension mechanism, i.e. by embedding an XML fragment into the `<sru:extraResponseData>` element. Endpoints `MUST` include the Endpoint Description ''only'' if a Client performs an explain request with the ''extra request parameter'' `x-clarin-fcs-endpoint-description` with a value of `true`. If a Client performs an explain request ''without'' supplying this extra request parameter, the Endpoint `MUST NOT` include
     519To support auto-configuration in CLARIN-FCS, the Endpoint provides an ''Endpoint Description''. The Endpoint Description is included in the explain response utilizing SRU's extension mechanism, i.e. by embedding an XML fragment into the `<sru:extraResponseData>` element. The Endpoint `MUST` include the Endpoint Description ''only'' if the Client performs an explain request with the ''extra request parameter'' `x-clarin-fcs-endpoint-description` with a value of `true`. If the Client performs an explain request ''without'' supplying this extra request parameter, the Endpoint `MUST NOT` include
    520520the Endpoint Description. The format of the Endpoint Description XML fragment is defined in [#REF_endpointDescription Endpoint Description].
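
As a rough illustration of the request side, the following Python sketch builds an ''explain'' URL with and without the extra request parameter. The base URL `http://repos.example.org/fcs-endpoint`, the helper name `build_explain_url` and the choice of `version=1.2` are assumptions made for the example, not values taken from this specification.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical Endpoint base URL, used for illustration only.
BASE_URL = "http://repos.example.org/fcs-endpoint"


def build_explain_url(base_url, with_endpoint_description=True):
    """Build an SRU explain request URL.

    The extra request parameter is only added when the Client wants the
    Endpoint Description; 'true' is the only value defined for it.
    """
    params = {"operation": "explain", "version": "1.2"}  # SRU version assumed
    if with_endpoint_description:
        params["x-clarin-fcs-endpoint-description"] = "true"
    return base_url + "?" + urlencode(params)


plain_url = build_explain_url(BASE_URL, with_endpoint_description=False)
described_url = build_explain_url(BASE_URL)
```

A Client that needs the list of collections would use `described_url`; a generic SRU client that only wants the Explain record would use `plain_url` and should then not receive an Endpoint Description.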
    521521
     
    595595
    596596=== Operation ''searchRetrieve'' ===#searchRetrieve
    597 The ''searchRetrieve'' operation of the SRU protocol is used for searching in the Resources that are provided by an Endpoint. The SRU protocol defines the serialization of request and response formats in [#REF_SRU_12 OASIS-SRU-12]. In SRU, search result hits are encoded down to a record level, i.e. the `<sru:record>` element, and SRU allows records to be serialized in various formats, so called ''record schemas''.
     597The ''searchRetrieve'' operation of the SRU protocol is used for searching in the Resources that are provided by the Endpoint. The SRU protocol defines the serialization of request and response formats in [#REF_SRU_12 OASIS-SRU-12]. In SRU, search result hits are encoded down to a record level, i.e. the `<sru:record>` element, and SRU allows records to be serialized in various formats, so called ''record schemas''.
    598598
    599599Endpoints `MUST` support the CLARIN-FCS record schema (see section [#resultFormat Result Format]) and `MUST` use the value `http://clarin.eu/fcs/1.0` for the ''responseItemType'' ("record schema identifier").
     
    648648}}}
    649649
    650 In general, the Endpoint is `REQUIRED` to accept an unrestricted search and `MUST` then perform the search operation on all Resources at an Endpoint. The Client can request the Endpoint to ''restrict the search'' to a sub-collection of the Resources available. In this case, the Client `MUST` pass a comma-separated list of persistent identifiers in the {{{x-clarin-fcs-context}}} extra request parameter of the ''searchRetrieve'' request. The Endpoint `MUST` then restrict the search to those Resources that are identified by the persistent identifiers passed by the Client. A Client can extract all valid persistent identifiers from the `@pid` attribute of the `<ed:Collection>` element, obtained by the ''explain'' request (see section [#explain Operation ''explain''] and section [#endpointDescription  Endpoint Description]). The list of persistent identifiers can get extensive; in that case an agent `MAY` use the POST method instead of the GET method for submitting the request.
     650In general, the Endpoint is `REQUIRED` to accept an unrestricted search and `MUST` then perform the search operation on all Resources that are available at the Endpoint. The Client can request the Endpoint to ''restrict the search'' to a sub-collection of these Resources. In this case, the Client `MUST` pass a comma-separated list of persistent identifiers in the {{{x-clarin-fcs-context}}} extra request parameter of the ''searchRetrieve'' request. The Endpoint `MUST` then restrict the search to those Resources that are identified by the persistent identifiers passed by the Client. The Client can extract all valid persistent identifiers from the `@pid` attribute of the `<ed:Collection>` element, obtained by the ''explain'' request (see section [#explain Operation ''explain''] and section [#endpointDescription  Endpoint Description]). The list of persistent identifiers can get extensive; in that case an agent `MAY` use the POST method instead of the GET method for submitting the request.
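
The restriction mechanism could be sketched on the Client side roughly as follows. The helper `build_search_request`, the base URL, the `version=1.2` parameter and the length threshold `MAX_GET_QUERY_LENGTH` are assumptions for illustration; the specification only states that POST `MAY` be used and does not define a limit.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed threshold; the specification does not define when to switch to POST.
MAX_GET_QUERY_LENGTH = 2000


def build_search_request(base_url, cql_query, context_pids=()):
    """Return (method, url, body) for a searchRetrieve request.

    The persistent identifiers are joined into one comma-separated value
    for the x-clarin-fcs-context extra request parameter.
    """
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",  # SRU version assumed
        "query": cql_query,
    }
    if context_pids:
        params["x-clarin-fcs-context"] = ",".join(context_pids)
    encoded = urlencode(params)
    if len(encoded) > MAX_GET_QUERY_LENGTH:
        # Long identifier lists go into the request body instead of the URL.
        return "POST", base_url, encoded
    return "GET", base_url + "?" + encoded, None


method, url, body = build_search_request(
    "http://repos.example.org/fcs-endpoint",  # hypothetical Endpoint URL
    "cat",
    ["http://hdl.handle.net/4711/0815", "http://hdl.handle.net/4711/0816"],
)
```

Omitting `context_pids` produces an unrestricted search, which every Endpoint is required to accept.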
    651651
    652652For example, to restrict the search to the Resource with the persistent identifier `http://hdl.handle.net/4711/0815` the Client must issue the following request:
     
    659659}}}
    660660
    661 If an invalid persistent identifier is passed by a Client, the Endpoint `MUST` issue a `http://clarin.eu/fcs/1.0/diagnostic/1` diagnostic, i.e. add the appropriate XML fragment to the `<sru:diagnostics>` element of the response. The Endpoint `MAY` treat this condition as fatal, i.e. just issue the diagnostic and perform no search, or it `MAY` treat it as non-fatal and perform the search.
     661If an invalid persistent identifier is passed by the Client, the Endpoint `MUST` issue a `http://clarin.eu/fcs/1.0/diagnostic/1` diagnostic, i.e. add the appropriate XML fragment to the `<sru:diagnostics>` element of the response. The Endpoint `MAY` treat this condition as fatal, i.e. just issue the diagnostic and perform no search, or it `MAY` treat it as non-fatal and perform the search.
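
One possible way for an Endpoint to implement both behaviours is sketched below in Python. The function `check_context`, the dictionary shape used for diagnostics and the decision to search only the remaining valid identifiers in the non-fatal case are assumptions; the specification leaves the scope of a non-fatal search open, and serializing the diagnostics into `<sru:diagnostics>` is not shown.

```python
INVALID_CONTEXT_DIAGNOSTIC = "http://clarin.eu/fcs/1.0/diagnostic/1"


def check_context(requested_pids, known_pids, fatal=False):
    """Validate the identifiers from x-clarin-fcs-context.

    Returns (diagnostics, pids_to_search). pids_to_search is None when the
    Endpoint treats an invalid identifier as fatal and performs no search.
    """
    invalid = [pid for pid in requested_pids if pid not in known_pids]
    # One diagnostic per invalid identifier (an assumed granularity).
    diagnostics = [
        {"uri": INVALID_CONTEXT_DIAGNOSTIC, "details": pid} for pid in invalid
    ]
    if invalid and fatal:
        return diagnostics, None
    # Non-fatal: continue with the identifiers that are known (assumption).
    valid = [pid for pid in requested_pids if pid in known_pids]
    return diagnostics, valid


known = {"http://hdl.handle.net/4711/0815", "http://hdl.handle.net/4711/0816"}
requested = ["http://hdl.handle.net/4711/0815", "http://hdl.handle.net/4711/9999"]
lenient = check_context(requested, known, fatal=False)
strict = check_context(requested, known, fatal=True)
```

In both modes the diagnostic is issued; only the second element of the result differs.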
    662662
    663663
     
    666666The following extra request parameters are used in CLARIN-FCS:
    667667||=Parameter Name =||=SRU operations =||=Allowed values =||= Description =||
    668 || `x-clarin-fcs-endpoint-description` || explain || `true` \\ All other values are reserved and `MUST NOT` be used by Clients || If present, the Endpoint `MUST` include a Endpoint Description in the\\`<sru:extraResponseData>` element of an ''explain'' response. ||
     668|| `x-clarin-fcs-endpoint-description` || explain || `true` \\ All other values are reserved and `MUST NOT` be used by Clients || If present, the Endpoint `MUST` include an Endpoint Description in the\\`<sru:extraResponseData>` element of an ''explain'' response. ||
    669669|| `x-clarin-fcs-context` || searchRetrieve || A comma separated list of persistent identifiers || The Endpoint `MUST` restrict the search to the collections identified by\\the persistent identifiers ||
    670670
     
    709709*WIP*
    710710
    711 An Endpoint `MAY` add arbitrary XML fragments to a `<fcs:Resource>` element. Clients `MUST` ignore any custom extensions they do not understand.
     711The Endpoint `MAY` add arbitrary XML fragments to a `<fcs:Resource>` element. Clients `MUST` ignore any custom extensions they do not understand.
    712712Endpoints `MUST` use a custom XML namespace name for their extensions. Endpoints `MUST NOT` use XML namespace names that start with the prefixes `http://clarin.eu`, `http://www.clarin.eu/`, `https://clarin.eu` or `https://www.clarin.eu/`.
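
A small Python sketch of how an Endpoint implementation might guard against choosing a reserved namespace name for its extensions. The helper name `is_reserved_namespace` and the example extension namespace are hypothetical, and the `https://www.clarin.eu/` entry is included on the assumption that it is also meant to be reserved.

```python
# Prefixes that extension namespace names must not start with.
RESERVED_PREFIXES = (
    "http://clarin.eu",
    "http://www.clarin.eu/",
    "https://clarin.eu",
    "https://www.clarin.eu/",  # assumed to be reserved as well
)


def is_reserved_namespace(namespace_name):
    """Return True if the namespace name starts with a reserved prefix."""
    # str.startswith accepts a tuple and checks each prefix in turn.
    return namespace_name.startswith(RESERVED_PREFIXES)


# Hypothetical custom namespace for an Endpoint-specific extension.
custom_ok = not is_reserved_namespace("http://repos.example.org/fcs-ext/1.0")
```

The check is a plain string prefix comparison, mirroring the wording of the rule above; it does not attempt any URI normalization.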
    713713