126 | | Generally, CLARIN-FCS Interface Specification consists of two components, a set of ''formats'' and a ''transport protocol''. The ''Endpoint'' component is a software component that acts as a bridge between the Formats, that are send by a ''Client'' using the ''Transport Protocol'', and a ''Search Engine''. The ''Search Engine'' is a custom software component, that allows searching in the language resources of a CLARIN center. The ''Endpoint'' basically implements the ''transport protocol'' and acts as an mediator between the CLRAIN-FCS speceific formats and the idiosyncrasies of ''Search Engines''. The following figure illustrates the overall architecture. |
| 128 | Generally, the CLARIN-FCS Interface Specification consists of two components: a set of ''formats'' and a ''transport protocol''. The ''Endpoint'' is a software component that acts as a bridge between the formats, which are sent by a ''Client'' using the ''transport protocol'', and a ''Search Engine''. The ''Search Engine'' is a custom software component that allows searching in the language resources of a CLARIN center. The ''Endpoint'' basically implements the ''transport protocol'' and acts as a mediator between the CLARIN-FCS specific formats and the idiosyncrasies of ''Search Engines''. The following figure illustrates the overall architecture. |
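To make the mediator role of the Endpoint more concrete, here is a minimal Python sketch. Everything in it is hypothetical: the `SearchEngine` protocol, the `InMemoryEngine` stand-in and the `Endpoint.search()` method are illustrative names, not part of the specification, and the transport protocol layer is omitted entirely. The namespace URI bound to the `fcs` prefix and the `type` attribute on `<fcs:DataView>` are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol
import xml.etree.ElementTree as ET

FCS_NS = "http://clarin.eu/fcs/resource"  # assumed namespace URI for the fcs prefix
ET.register_namespace("fcs", FCS_NS)


@dataclass
class EngineHit:
    """A hit in the Search Engine's own, idiosyncratic representation (hypothetical)."""
    doc_uri: str
    text: str


class SearchEngine(Protocol):
    """Hypothetical interface of a center-specific Search Engine."""
    def find(self, term: str) -> List[EngineHit]: ...


class InMemoryEngine:
    """Toy Search Engine: substring search over a dict mapping document URIs to text."""
    def __init__(self, docs: Dict[str, str]):
        self.docs = docs

    def find(self, term: str) -> List[EngineHit]:
        return [EngineHit(uri, text) for uri, text in self.docs.items() if term in text]


class Endpoint:
    """Mediates between FCS-style results and a concrete Search Engine."""
    def __init__(self, engine: SearchEngine):
        self.engine = engine

    def search(self, query: str) -> List[ET.Element]:
        results = []
        for hit in self.engine.find(query):
            # Translate the engine-specific hit into an <fcs:Resource> element.
            resource = ET.Element(f"{{{FCS_NS}}}Resource", {"ref": hit.doc_uri})
            view = ET.SubElement(resource, f"{{{FCS_NS}}}DataView",
                                 {"type": "application/x-clarin-fcs-hits+xml"})
            # Simplification: a real Generic Hits Data View carries XML markup, not plain text.
            view.text = hit.text
            results.append(resource)
        return results


if __name__ == "__main__":
    engine = InMemoryEngine({"http://repos.example.org/file/a.html": "the quick brown fox"})
    for r in Endpoint(engine).search("fox"):
        print(ET.tostring(r, encoding="unicode"))
```

The point of the design is that only `Endpoint` knows about FCS formats; swapping in a different `SearchEngine` implementation leaves the FCS-facing side unchanged.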
194 | | Each ''Resource'' `SHOULD` be identified by a persistent identifier. A ''Resource'' `MAY` be identified by an endpoint unique URI. A ''Resource'' `SHOULD` be the most precise unit of data that is directly addressable as a "whole". A ''Resource'' `SHOULD` contain a ''Resource Fragment'', if the hit consists of just a part of the ''Resource'' unit, if the hit is a sentence within a large text. A ''Resource Fragment'' `SHOULD` be addressable within a resource, i.e. it has an offset or a resource-internal identifier. Using ''Resource Fragments'' is `OPTIONAL`, but Endpoints are encouraged to use them. If an Endpoint encodes a hit with a ''Resource Fragment'', the actual hit `SHOULD` be encoded as a ''Data View'' that is encoded in a ''Resource Fragment''. |
195 | | |
196 | | Endpoints `SHOULD` always provide a link to the resource itself, i.e. by supplying the persistent identifier o the ''Resource'' or providing a URI to reference the ''Resource''. If direct linking is not possible, i.e. due to licensing issues, the Endpoints `SHOULD` use a URI to link to a web-page describing a corpus or collection (including instruction on how to obtain it). Endpoints `SHOULD` provide links that are as specific as possible (and logical), i.e. if a sentence within a resource cannot be addressed, the ''Resource Fragment'' `SHOULD NOT` contain a persistent identifier or an URI. |
197 | | |
198 | | ''Resource'' and ''Resource Fragment'' are serialized in XML and Endpoints `MUST` generate responses, that are valid according to the XML schema "[source:FederatedSearch/Resource.xsd Resource.xsd]" ([source:FederatedSearch/Resource.xsd?format=txt download]). A ''Resource'' is encoded in the form of a `<fcs:Resource>` element, a ''Resource Fragment'' in the form of a `<fcs:ResourceFragment>` element. The content of a ''Data View'' is wrapped in a `<fcs:DataView>` element. `<fcs:Resource>` is the top-level element and `MAY` contain zero or more `<fcs:DataView>` elements and `MAY` contain zero or more `<fcs:ResourceFragment>` elements. A `<fcs:ResourceFragment>` element `MUST` contain one or more `<fcs:DataView>` elements. The elements `<fcs:Resource>`, `<fcs:ResourceFragment>` and `<fcs:DataView>` `MAY` carry a `@pid` and/or a `@ref` attribute, which allows linking to the original data represented by the resource, resource fragment, or data view. A `@pid` attribute `MUST` contain a valid persistent identifier, a `@ref` `MUST` contain valid URI (without the additional semantics of being persistent reference). |
| 196 | Each Resource `SHOULD` be identified by a persistent identifier. A Resource `MAY` be identified by an endpoint-unique URI. A Resource `SHOULD` be the most precise unit of data that is directly addressable as a "whole". A Resource `SHOULD` contain a Resource Fragment if the hit consists of just a part of the Resource unit, e.g. if the hit is a sentence within a large text. A Resource Fragment `SHOULD` be addressable within a resource, i.e. it has an offset or a resource-internal identifier. Using Resource Fragments is `OPTIONAL`, but Endpoints are encouraged to use them. If an Endpoint encodes a hit with a Resource Fragment, the actual hit `SHOULD` be encoded as a Data View embedded in that Resource Fragment. |
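The decision described above (whole Resource versus Resource Fragment) can be sketched with plain Python dataclasses. This is an illustrative model only: the class names mirror the specification's terms, but the fields and the helper `encode_hit()` are hypothetical and do not come from the schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

HITS_TYPE = "application/x-clarin-fcs-hits+xml"  # MIME type of Generic Hits Data Views


@dataclass
class DataView:
    mime_type: str  # identifies the kind of Data View
    content: str    # serialized payload, kept as a plain string in this sketch


@dataclass
class ResourceFragment:
    data_views: List[DataView]  # the specification requires at least one
    pid: Optional[str] = None
    ref: Optional[str] = None


@dataclass
class Resource:
    pid: Optional[str] = None
    ref: Optional[str] = None
    data_views: List[DataView] = field(default_factory=list)
    fragments: List[ResourceFragment] = field(default_factory=list)


def encode_hit(hit_text: str, resource_pid: str, hit_is_partial: bool) -> Resource:
    """Hypothetical helper that wraps a hit following the SHOULD rules above."""
    view = DataView(mime_type=HITS_TYPE, content=hit_text)
    if hit_is_partial:
        # The hit is only part of the Resource (e.g. one sentence of a long text),
        # so its Data View goes into a Resource Fragment inside the Resource.
        return Resource(pid=resource_pid, fragments=[ResourceFragment(data_views=[view])])
    # The hit covers the whole Resource: attach the Data View directly.
    return Resource(pid=resource_pid, data_views=[view])
```

For example, `encode_hit("a sentence", "http://hdl.handle.net/4711/08-15", True)` yields a Resource with one fragment and no direct Data Views.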
| 197 | |
| 198 | Endpoints `SHOULD` always provide a link to the resource itself, e.g. by supplying the persistent identifier of the Resource or a URI that references it. If direct linking is not possible, e.g. due to licensing issues, Endpoints `SHOULD` use a URI to link to a web page describing the corpus or collection, including instructions on how to obtain it. Endpoints `SHOULD` provide links that are as specific as possible (and logical), e.g. if a sentence within a resource cannot be addressed, the Resource Fragment `SHOULD NOT` contain a persistent identifier or a URI. |
| 199 | |
| 200 | If an Endpoint can provide both a persistent identifier and a URI for a Resource or Resource Fragment, it `SHOULD` provide both. When working with results, Clients `SHOULD` prefer persistent identifiers over regular URIs. |
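On the Client side, the two linking rules (most specific link first, persistent identifier before plain URI) can be combined as in the following Python sketch. The helper names `preferred_link()` and `hit_link()` are hypothetical, the namespace URI for the `fcs` prefix is an assumption, and the sketch only looks at the first Resource Fragment.

```python
import xml.etree.ElementTree as ET
from typing import Optional

FCS_NS = "http://clarin.eu/fcs/resource"  # assumed namespace URI for the fcs prefix


def preferred_link(element: ET.Element) -> Optional[str]:
    """Prefer the persistent identifier (@pid) over a plain URI (@ref)."""
    return element.get("pid") or element.get("ref")


def hit_link(resource: ET.Element) -> Optional[str]:
    """Most specific available link for a hit: the fragment first, then the Resource."""
    fragment = resource.find(f"{{{FCS_NS}}}ResourceFragment")
    if fragment is not None:
        link = preferred_link(fragment)
        if link:
            return link
    # No linkable fragment (e.g. the sentence cannot be addressed): fall back to the Resource.
    return preferred_link(resource)
```

If the fragment carries no `@pid` or `@ref`, the Client falls back to the Resource's own link, which matches the case where an Endpoint cannot address the sentence itself.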
| 201 | |
| 202 | Resource and Resource Fragment are serialized in XML, and Endpoints `MUST` generate responses that are valid according to the XML schema "[source:FederatedSearch/Resource.xsd Resource.xsd]" ([source:FederatedSearch/Resource.xsd?format=txt download]). A Resource is encoded in the form of a `<fcs:Resource>` element, a Resource Fragment in the form of a `<fcs:ResourceFragment>` element. The content of a Data View is wrapped in a `<fcs:DataView>` element. `<fcs:Resource>` is the top-level element; it `MAY` contain zero or more `<fcs:DataView>` elements and zero or more `<fcs:ResourceFragment>` elements. A `<fcs:ResourceFragment>` element `MUST` contain one or more `<fcs:DataView>` elements. |
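The nesting rules above can be checked with a short structural test. This Python sketch is not a substitute for validating against Resource.xsd; it only checks the three rules stated in the paragraph, and it treats any child other than `<fcs:DataView>` or `<fcs:ResourceFragment>` as an error, which is a simplification. The namespace URI and the function name `check_structure()` are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import List

FCS_NS = "http://clarin.eu/fcs/resource"  # assumed namespace URI for the fcs prefix
RESOURCE = f"{{{FCS_NS}}}Resource"
FRAGMENT = f"{{{FCS_NS}}}ResourceFragment"
DATAVIEW = f"{{{FCS_NS}}}DataView"


def check_structure(xml_text: str) -> List[str]:
    """Return a list of structural problems; an empty list means none were found."""
    root = ET.fromstring(xml_text)
    if root.tag != RESOURCE:
        return [f"top-level element must be fcs:Resource, got {root.tag}"]
    problems = []
    for child in root:
        if child.tag == FRAGMENT:
            views = [c for c in child if c.tag == DATAVIEW]
            if not views:
                # A Resource Fragment MUST contain one or more Data Views.
                problems.append("fcs:ResourceFragment without any fcs:DataView")
            if len(views) != len(child):
                problems.append("fcs:ResourceFragment contains non-DataView children")
        elif child.tag != DATAVIEW:
            problems.append(f"unexpected child of fcs:Resource: {child.tag}")
    return problems
```

For full conformance, Endpoint developers would still validate responses against Resource.xsd with a schema-aware XML library.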
| 203 | |
| 204 | The elements `<fcs:Resource>`, `<fcs:ResourceFragment>` and `<fcs:DataView>` `MAY` carry a `@pid` and/or a `@ref` attribute, which allows linking to the original data represented by the Resource, Resource Fragment, or Data View. A `@pid` attribute `MUST` contain a valid persistent identifier; a `@ref` attribute `MUST` contain a valid URI, i.e. a "plain" URI without the additional semantics of being a persistent reference. |
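Whether a string is a "valid persistent identifier" cannot be decided purely syntactically, but an Endpoint can still catch obvious mistakes before emitting `@pid` and `@ref`. The following Python sketch is a heuristic under stated assumptions: the allow-list `PID_PREFIXES` (Handle and DOI forms) is hypothetical and not taken from the specification, and the URI check only requires an absolute URI.

```python
from urllib.parse import urlparse

# Hypothetical allow-list of PID forms; the specification does not enumerate them here.
PID_PREFIXES = ("http://hdl.handle.net/", "https://hdl.handle.net/", "hdl:",
                "https://doi.org/", "doi:")


def looks_like_uri(value: str) -> bool:
    """Rough check that a @ref value is an absolute URI (http(s) URIs also need a host)."""
    parts = urlparse(value)
    if not parts.scheme:
        return False
    if parts.scheme in ("http", "https"):
        return bool(parts.netloc)
    return True


def looks_like_pid(value: str) -> bool:
    """Heuristic: accept only identifiers written in one of the assumed PID forms."""
    return value.startswith(PID_PREFIXES)
```

A check like this would flag a relative path such as `text_08_15.html` in `@ref`, or a plain repository URL placed in `@pid` by mistake.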
239 | | *TODO*: explain examples. |
| 245 | [#XREF_Example_1 Example 1] shows a simple hit, which is encoded in one Data View of type ''Generic Hits'' embedded within a Resource. The type of the Data View is identified by the MIME type `application/x-clarin-fcs-hits+xml`. The Resource is referenceable by the persistent identifier `http://hdl.handle.net/4711/08-15`. |
| 246 | |
| 247 | [#XREF_Example_2 Example 2] shows a hit encoded as a Resource Fragment embedded within a Resource. The actual hit is again encoded as one Data View of type ''Generic Hits''. The hit is not directly referenceable, but the Resource in which the hit occurred is referenceable by the persistent identifier `http://hdl.handle.net/4711/08-15`. In contrast to Example 1, the Endpoint decided to provide a "semantically richer" encoding and embedded the hit using a Resource Fragment within the Resource to indicate that the hit is part of a larger resource, e.g. a sentence in a text document. |
| 248 | |
| 249 | The most complex example, [#XREF_Example_3 Example 3], is similar to Example 2, i.e. it shows a hit encoded as one ''Generic Hits'' Data View in a Resource Fragment, which is embedded in a Resource. In contrast to Example 2, another Data View of type ''CMDI'' is embedded directly within the Resource. An Endpoint can use this type of Data View to provide CMDI metadata about the Resource directly to Clients. |
| 250 | All entities in this example can be referenced by both a persistent identifier and a URI. The complete Resource is referenceable by either the persistent identifier `http://hdl.handle.net/4711/08-15` or the URI `http://repos.example.org/file/text_08_15.html`, and the CMDI metadata record in the CMDI Data View by either the persistent identifier `http://hdl.handle.net/4711/08-15-1` or the URI `http://repos.example.org/file/08_15_1.cmdi`. The actual hit in the Resource Fragment is also directly referenceable by either the persistent identifier `http://hdl.handle.net/4711/00-15-2` or the URI `http://repos.example.org/file/text_08_15.html#sentence2`. |
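The structure of Example 3 can be reproduced programmatically; the following Python sketch builds it with `xml.etree.ElementTree`, using the identifiers listed above. Several details are assumptions rather than facts from this section: the namespace URIs for the `fcs` and `hits` prefixes, the `type` attribute and the `application/x-cmdi+xml` MIME type on the CMDI Data View, the `hits:Result`/`hits:Hit` markup of the Generic Hits payload, and the sample sentence. The CMDI record itself is left out.

```python
import xml.etree.ElementTree as ET

FCS_NS = "http://clarin.eu/fcs/resource"        # assumed namespace URI for fcs
HITS_NS = "http://clarin.eu/fcs/dataview/hits"  # assumed namespace URI for Generic Hits
ET.register_namespace("fcs", FCS_NS)
ET.register_namespace("hits", HITS_NS)


def fcs(tag: str) -> str:
    """Qualified name in the (assumed) fcs namespace."""
    return f"{{{FCS_NS}}}{tag}"


def build_example_3() -> ET.Element:
    resource = ET.Element(fcs("Resource"), {
        "pid": "http://hdl.handle.net/4711/08-15",
        "ref": "http://repos.example.org/file/text_08_15.html",
    })
    # CMDI metadata about the whole Resource, embedded directly in the Resource.
    # The actual CMDI record would go inside this element; it is omitted here.
    ET.SubElement(resource, fcs("DataView"), {
        "type": "application/x-cmdi+xml",  # assumed MIME type for CMDI Data Views
        "pid": "http://hdl.handle.net/4711/08-15-1",
        "ref": "http://repos.example.org/file/08_15_1.cmdi",
    })
    # The hit is one sentence of the larger text, hence a Resource Fragment.
    fragment = ET.SubElement(resource, fcs("ResourceFragment"), {
        "pid": "http://hdl.handle.net/4711/00-15-2",
        "ref": "http://repos.example.org/file/text_08_15.html#sentence2",
    })
    hits_view = ET.SubElement(fragment, fcs("DataView"),
                              {"type": "application/x-clarin-fcs-hits+xml"})
    result = ET.SubElement(hits_view, f"{{{HITS_NS}}}Result")
    result.text = "The quick brown "  # hypothetical sentence text
    hit = ET.SubElement(result, f"{{{HITS_NS}}}Hit")
    hit.text = "fox"
    hit.tail = " jumps over the lazy dog."
    return resource


if __name__ == "__main__":
    print(ET.tostring(build_example_3(), encoding="unicode"))
```

Dropping the CMDI Data View and the attributes on the fragment yields the shape of Example 2; dropping the fragment and attaching the Generic Hits Data View to the Resource yields the shape of Example 1.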