186 | | Yada ... |
187 | | |
188 | | |
189 | | ''' WIP: will be reformulated ''' |
190 | | |
191 | | In CLARIN-FCS, each {{{<sru:record>}}} element represents one hit within the ''resource'', which is encoded as a {{{<fcs:Resource>}}} element. Each resource shall be identified a persistent identifier (or, less preferably, a endpoint unique URI). The correct resource to return here is the most precise unit of data that is directly addressable as a "whole". The hit may contain a ''resource fragment'', which is encoded as as a {{{<fcs:ResourceFragment>}}} element. The resource fragment shall be addressable within a resource, i.e. it has an offset or a resource-internal identifier. Using resource fragments is optional, but encouraged. |
192 | | |
193 | | The actual hit within a resource is provided encoded as a ''data view'' format and is serialized as a {{{<fcs:DataView>}}} element inside either the {{{<fcs:Resource>}}} or {{{<fcs:ResourceFragment>}}} element. Each hit may be serialized as multiple data views, however the keyword-in-context (KWIC) data view is mandatory with the resource fragment (if applicable), or otherwise within the resource (if there is no reasonable resource fragment). Other data views should be put in a place that is logical for their content (as is to be determined by the endpoint. E.g. a metadata data view would most likely be put directly under a resource. On the other hand a data view representing some annotation layers directly around the hit is more likely to belong in within the resource fragment. |
194 | | |
195 | | Each entity (i.e. {{{<fcs:Resource>}}}, {{{<fcs:ResourceFragment>}}} or {{{<fcs:DataView>}}} element) contains a {{{ref}}} attribute, which points to the original data represented by the resource, resource fragment, or data view as well as possible. It should always be possible to directly link to the resource itself. Worst case this will be a web-page describing a corpus or collection (including instruction on how to obtain it). Best case it directly links to a specific file or part of a resource in which the hit was obtained. The latter is not always possible, and when possible often constrained by licensing issues. Endpoints should provide links that are as specific as possible/logical. |
196 | | |
197 | | For CLARIN-FCS, a custom record schema has been defined. The ''record schema identifier'' for this schema is {{{http://clarin.eu/fcs/1.0}}} and the appropriate XML Schema can be found at [source:FederatedSearch/Resource.xsd Resource.xsd] ([source:FederatedSearch/Resource.xsd?format=txt download]). |
198 | | |
| 187 | A ''Resource Fragment'' is a smaller unit within a ''Resource'', e.g. a sentence in a text corpus or a time interval in an audio transcription. 
| 188 | |
| 189 | Each ''Resource'' `SHOULD` be identified by a persistent identifier. A ''Resource'' `MAY` be identified by an endpoint-unique URI. A ''Resource'' `SHOULD` be the most precise unit of data that is directly addressable as a "whole". A ''Resource'' `SHOULD` contain a ''Resource Fragment'' if the hit consists of just a part of the ''Resource'' unit, e.g. if the hit is a sentence within a large text. A ''Resource Fragment'' `SHOULD` be addressable within a resource, i.e. it has an offset or a resource-internal identifier. Using ''Resource Fragments'' is `OPTIONAL`, but Endpoints are encouraged to use them. If an Endpoint encodes a hit with a ''Resource Fragment'', the actual hit `SHOULD` be encoded as a ''Data View'' that is enclosed in that ''Resource Fragment''. 
| 190 | |
| 191 | Endpoints `SHOULD` always provide a link to the resource itself, i.e. by supplying the persistent identifier of the ''Resource'' or providing a URI to reference the ''Resource''. If direct linking is not possible, e.g. due to licensing issues, Endpoints `SHOULD` use a URI to link to a web page describing the corpus or collection (including instructions on how to obtain it). Endpoints `SHOULD` provide links that are as specific as possible (and logical), e.g. if a sentence within a resource cannot be addressed, the ''Resource Fragment'' `SHOULD NOT` contain a persistent identifier or a URI. 
| 192 | |
| 193 | ''Resource'' and ''Resource Fragment'' are serialized in XML, and Endpoints `MUST` generate responses that are valid according to the XML schema "[source:FederatedSearch/Resource.xsd Resource.xsd]" ([source:FederatedSearch/Resource.xsd?format=txt download]). A ''Resource'' is encoded in the form of a `<fcs:Resource>` element, a ''Resource Fragment'' in the form of a `<fcs:ResourceFragment>` element. The content of a ''Data View'' is wrapped in a `<fcs:DataView>` element. `<fcs:Resource>` is the top-level element and `MAY` contain zero or more `<fcs:DataView>` elements and `MAY` contain zero or more `<fcs:ResourceFragment>` elements. A `<fcs:ResourceFragment>` element `MUST` contain one or more `<fcs:DataView>` elements. The elements `<fcs:Resource>`, `<fcs:ResourceFragment>` and `<fcs:DataView>` `MAY` carry a `@pid` and/or a `@ref` attribute, which allows linking to the original data represented by the resource, resource fragment, or data view. A `@pid` attribute `MUST` contain a valid persistent identifier; a `@ref` attribute `MUST` contain a valid URI (without the additional semantics of being a persistent reference). 
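The containment rules above can be sketched as a small check. This is only an illustration and not a substitute for validating against Resource.xsd: the function name `check_resource_structure` is hypothetical, and the namespace URI is taken from the examples in this section.

```python
import xml.etree.ElementTree as ET

# Namespace URI as used in the examples of this section.
FCS_NS = "http://clarin.eu/fcs/1.0"


def q(local_name):
    """Return the namespace-qualified tag name in ElementTree notation."""
    return "{%s}%s" % (FCS_NS, local_name)


def check_resource_structure(xml_text):
    """Hypothetical helper: return a list of problems (empty if none found).

    Only two rules are checked:
    * <fcs:Resource> is the top-level element;
    * every <fcs:ResourceFragment> contains at least one <fcs:DataView>.
    """
    root = ET.fromstring(xml_text)
    if root.tag != q("Resource"):
        return ["top-level element is not <fcs:Resource>"]
    problems = []
    for fragment in root.findall(q("ResourceFragment")):
        if not fragment.findall(q("DataView")):
            problems.append("<fcs:ResourceFragment> without <fcs:DataView>")
    return problems
```

A `<fcs:Resource>` without any children passes this check, since both child element types are allowed to occur zero times.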
| 194 | |
| 195 | Endpoints `MUST` use the identifier `http://clarin.eu/fcs/1.0` for the ''responseItemType'' (= content for the `<sru:recordSchema>` element) in SRU responses. |
| 196 | |
| 197 | Endpoints `MAY` serialize hits as multiple ''Data Views''; however, they `MUST` provide the Generic Hits (HITS) ''Data View'', either within a ''Resource Fragment'' (if applicable) or otherwise within the ''Resource'' (if there is no reasonable ''Resource Fragment''). Other ''Data Views'' `SHOULD` be put in a place that is logical for their content (as determined by the Endpoint), e.g. a metadata ''Data View'' would most likely be put directly under a ''Resource'', whereas a ''Data View'' representing some annotation layers directly around the hit more likely belongs within a ''Resource Fragment''. 
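A client could locate the mandatory HITS ''Data View'' roughly as follows. This is a sketch under assumptions: the helper name `hits_location` is hypothetical, and the MIME type `application/x-clarin-fcs-hits+xml` is assumed to identify the HITS ''Data View'' because the examples in this section use it.

```python
import xml.etree.ElementTree as ET

# Prefix mapping for ElementTree path expressions; the URI is taken from
# the examples of this section.
NS = {"fcs": "http://clarin.eu/fcs/1.0"}

# Assumption: this MIME type marks the Generic Hits (HITS) data view.
HITS_TYPE = "application/x-clarin-fcs-hits+xml"


def hits_location(xml_text):
    """Hypothetical helper: report where the HITS data view is placed.

    Returns "fragment" if it sits inside a <fcs:ResourceFragment>,
    "resource" if it sits directly under <fcs:Resource>, and None if
    the record carries no HITS data view at all.
    """
    root = ET.fromstring(xml_text)
    in_fragment = root.findall("fcs:ResourceFragment/fcs:DataView", NS)
    if any(dv.get("type") == HITS_TYPE for dv in in_fragment):
        return "fragment"
    in_resource = root.findall("fcs:DataView", NS)
    if any(dv.get("type") == HITS_TYPE for dv in in_resource):
        return "resource"
    return None
```

A result of `None` indicates a record that does not satisfy the `MUST` requirement above.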
| 198 | |
| 199 | Some examples: |
| 200 | * [=#XREF_Example_1]Example 1: a ''Resource'' with a ''Data View'' |
| 201 | {{{#!xml |
| 202 | <fcs:Resource xmlns:fcs="http://clarin.eu/fcs/1.0" pid="http://hdl.handle.net/4711/00-15"> |
| 203 | <fcs:DataView type="application/x-clarin-fcs-hits+xml"> |
| 204 | <!-- data view content omitted --> |
| 205 | </fcs:DataView> |
| 206 | </fcs:Resource> |
| 207 | }}} |
| 208 | * [=#XREF_Example_2]Example 2: a ''Resource'' with a ''Resource Fragment'', that has a ''Data View'' |
| 209 | {{{#!xml |
| 210 | <fcs:Resource xmlns:fcs="http://clarin.eu/fcs/1.0" pid="http://hdl.handle.net/4711/00-15"> |
| 211 | <fcs:ResourceFragment> |
| 212 | <fcs:DataView type="application/x-clarin-fcs-hits+xml"> |
| 213 | <!-- data view content omitted --> |
| 214 | </fcs:DataView> |
| 215 | </fcs:ResourceFragment> |
| 216 | </fcs:Resource> |
| 217 | }}} |
| 218 | * [=#XREF_Example_3]Example 3: a ''Resource'' with a ''Data View'' and a ''Resource Fragment'', that has a ''Data View'' 
| 219 | {{{#!xml |
| 220 | <fcs:Resource xmlns:fcs="http://clarin.eu/fcs/1.0" |
| 221 | pid="http://hdl.handle.net/4711/00-15" ref="http://repos.example.org/file/text_00_15.html"> |
| 222 | <fcs:DataView type="application/x-cmdi+xml" |
| 223 | pid="http://hdl.handle.net/4711/00-15-1" ref="http://repos.example.org/file/00_15_1.cmdi"> |
| 224 | <!-- data view content omitted --> |
| 225 | </fcs:DataView> |
| 226 | <fcs:ResourceFragment pid="http://hdl.handle.net/4711/00-15-2" ref="http://repos.example.org/file/text_00_15.html#sentence2"> |
| 227 | <fcs:DataView type="application/x-clarin-fcs-hits+xml"> |
| 228 | <!-- data view content omitted --> |
| 229 | </fcs:DataView> |
| 230 | </fcs:ResourceFragment> |
| 231 | </fcs:Resource> |
| 232 | }}} |
| 233 | |
| 234 | *TODO*: explain examples. |