Changes between Version 37 and Version 38 of FCS-Specification-ScrapBook
Timestamp: 02/12/14 17:36:05
FCS-Specification-ScrapBook
v37 v38
 48  48
 49  49    Client::
 50      - A software component, that implements the interface specification to query endpoints, i.e. an aggregator or an user-interface.
     50  + A software component, that implements the interface specification to query Endpoints, i.e. an aggregator or an user-interface.
 51  51
 52  52    CQL::
  …   …
 78  78
 79  79    Repository Registry::
 80      - A separate service that allows registering endpoints and provides information about these to other components, e.g. an aggegator. The [http://centres.clarin.eu/ CLARIN Center Registry] is an implementation of such a repository registry.
     80  + A separate service that allows registering Endpoints and provides information about these to other components, e.g. an aggegator. The [http://centres.clarin.eu/ CLARIN Center Registry] is an implementation of such a repository registry.
 81  81
 82  82    === Normative References ===
  …   …
181 181    ''Basic profile''::
182 182    Endpoints `MUST` support ''term-only'' queries. \\
183      - Endpoints `SHOULD` support ''terms'' combined with boolean operator queries (''AND'' and ''OR''), including subqueries. Endpoints `MAY` also support ''NOT'' or ''PROX'' operator queries. If an endpoint does not support a query, i.e. the used operators are not supported by the Endpoint, it `MUST` return an appropriate error message using the appropriate SRU diagnostic. \\
    183  + Endpoints `SHOULD` support ''terms'' combined with boolean operator queries (''AND'' and ''OR''), including subqueries. Endpoints `MAY` also support ''NOT'' or ''PROX'' operator queries. If an Endpoint does not support a query, i.e. the used operators are not supported by the Endpoint, it `MUST` return an appropriate error message using the appropriate SRU diagnostic. \\
184 184    Examples for valid CQL queries for the ''basic profile'':
185 185    {{{
  …   …
192 192    cat AND (mouse OR "lazy dog")
193 193    }}}
194      - The endpoint is `MUST` perform the query on an annotation tier, that makes the most sense for the user, i.e. the textual content for a text corpus resource or the orthographic transcription of a spoken language corpus. Endpoints `SHOULD` perform the query case-sensitive.\\
    194  + The Endpoint is `MUST` perform the query on an annotation tier, that makes the most sense for the user, i.e. the textual content for a text corpus resource or the orthographic transcription of a spoken language corpus. Endpoints `SHOULD` perform the query case-sensitive.\\
195 195    Endpoint `MUST NOT` silently accept queries that include CQL features besides ''term-only'' and ''terms'' combined with boolean operator queries, i.e. queries involving context sets, etc.
196 196
  …   …
203 203
204 204    === Result Format ===
205      - CLARIN-FCS uses a customized format for returning results. ''Resource'' and ''Resource Fragments'' serve as containers for hit results, that are presented in one or more ''Data View''. The following section describes the result result and data view format and section [#searchRetrieve Operation ''searchRetrieve''] will describe, how hits are embedded within SRU responses.
    205  + CLARIN-FCS uses a customized format for returning results. ''Resource'' and ''Resource Fragments'' serve as containers for hit results, that are presented in one or more ''Data View''. The following section describes the Resource format and Data View format and section [#searchRetrieve Operation ''searchRetrieve''] will describe, how hits are embedded within SRU responses.
206 206
207 207    ==== Resource and !ResourceFragment ====
  …   …
224 224    Endpoints `MUST` use the identifier `http://clarin.eu/fcs/1.0` for the ''responseItemType'' (= content for the `<sru:recordSchema>` element) in SRU responses.
225 225
226      - Endpoints `MAY` serialize hits as multiple Data Views, however they `MUST` provide the Generic Hits (HITS) Data View either encoded as a Resource Fragment (if applicable), or otherwise within the Resource (if there is no reasonable resource fragment). Other Data Views `SHOULD` be put in a place that is logical for their content (as is to be determined by the Endpoint), e.g. a metadata data view would most likely be put directly below Resource and a Data View representing some annotation layers directly around the hit is more likely to belong within a Resource Fragment.
    226  + Endpoints `MAY` serialize hits as multiple Data Views, however they `MUST` provide the Generic Hits (HITS) Data View either encoded as a Resource Fragment (if applicable), or otherwise within the Resource (if there is no reasonable Resource Fragment). Other Data Views `SHOULD` be put in a place that is logical for their content (as is to be determined by the Endpoint), e.g. a metadata Data View would most likely be put directly below Resource and a Data View representing some annotation layers directly around the hit is more likely to belong within a Resource Fragment.
227 227
228 228    [=#REF_Example_1]Example 1:
  …   …
246 246    </fcs:Resource>
247 247    }}}
248      - This [#REF_Example_2 example] shows a hit encoded as a Resource Fragment embedded within a Resource. The actual hit is again encoded as one Data View of type ''Generic Hits''. The hit is not directly referenceable, but the Resource, in which hit occurred, is referenceable by the persistent identifier `http://hdl.handle.net/4711/08-15`. In contrast to [#REF_Example_1 Example 1], the endpoint decided to provide a "semantically richer" encoding and embedded the hit using a Resource Fragment within the Resource to indicate that the hit is a part of a larger resource, e.g. a sentence in a text document.
    248  + This [#REF_Example_2 example] shows a hit encoded as a Resource Fragment embedded within a Resource. The actual hit is again encoded as one Data View of type ''Generic Hits''. The hit is not directly referenceable, but the Resource, in which hit occurred, is referenceable by the persistent identifier `http://hdl.handle.net/4711/08-15`. In contrast to [#REF_Example_1 Example 1], the Endpoint decided to provide a "semantically richer" encoding and embedded the hit using a Resource Fragment within the Resource to indicate that the hit is a part of a larger resource, e.g. a sentence in a text document.
249 249
250 250    [=#REF_Example_3]Example 3:
  …   …
266 266    All entities of the Hit can be referenced by a persistent identifier and an URI. The complete Resource is referenceable by either the persistent identifier `http://hdl.handle.net/4711/08-15` or the URI `http://repos.example.org/file/text_08_15.html` and the CMDI metadata record in the CMDI Data View is referenceable either by the persistent identifier `http://hdl.handle.net/4711/08-15-1` or the URI `http://repos.example.org/file/08_15_1.cmdi`. The actual hit in the Resource Fragment is also directly referenceable by either the persistent identifier `http://hdl.handle.net/4711/00-15-2` or the URI `http://repos.example.org/file/text_08_15.html#sentence2`.
267 267
268      - Endpoints `MUST` serialize one Resource for each hit, i.e. they `MUST NOT` combine several hits in one Resource. E.g., if a query matches five different sentences within one text (= the resource), the endpoint must serialize five Resource (= one per hit) and embed each within one SRU result (see [#query below]).
    268  + Endpoints `MUST` serialize one Resource for each hit, i.e. they `MUST NOT` combine several hits in one Resource. E.g., if a query matches five different sentences within one text (= the resource), the Endpoint must serialize five Resource (= one per hit) and embed each within one SRU result (see [#query below]).
269 269
270 270
271 271    ==== Data View ====
272      - A ''Data View'' serves as a container for representing search results within CLARIN-FCS. Data Views are designed to allow for different representations of results, i.e. they are deliberately kept open to allow further extensions with more supported data view formats. The content of a Data View is called ''Payload''. Each Payload is typed and the type of the Payload is recorded in the `@type` attribute if the `<fcs:DataView>` element. The Payload type is is identified by a MIME type ([#REF_RFC_6838 RFC6838], [#REF_RFC_3023 RFC3023]). If no existing MIME type can be used, implementors `SHOULD` define a properer private mime type.
    272  + A ''Data View'' serves as a container for representing search results within CLARIN-FCS. Data Views are designed to allow for different representations of results, i.e. they are deliberately kept open to allow further extensions with more supported Data View formats. The content of a Data View is called ''Payload''. Each Payload is typed and the type of the Payload is recorded in the `@type` attribute if the `<fcs:DataView>` element. The Payload type is is identified by a MIME type ([#REF_RFC_6838 RFC6838], [#REF_RFC_3023 RFC3023]). If no existing MIME type can be used, implementors `SHOULD` define a properer private mime type.
273 273
274 274    The Payload of a Data View can either be deposited ''inline'' or by ''reference''. In the case of ''inline'', it `MUST` be serialized as an XML fragment below the `<fcs:DataView>` element. This method is the preferred methods payloads that can easily serialized in XML. In the case of by ''reference'', the content cannot easily deposited inline, i.e. it is binary content. In this case, the Data View `MUST` include a `@ref` or `@pid` attribute that links location for Clients to download the payload. This location `SHOULD` be ''openly accessible'', i.e. data can be downloaded freely without any need to perform a login.
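
The Data View rules quoted in the last hunk (a typed Payload, deposited either inline or by reference) could be checked on the Client side roughly as follows. This is only a sketch, not part of the specification: the function name `check_data_view`, the sample MIME types and the sample download URL are hypothetical, and the XML namespace bound to the `fcs:` prefix is an assumption, since the excerpt only gives `http://clarin.eu/fcs/1.0` as the ''responseItemType'' identifier.

```python
import xml.etree.ElementTree as ET

# Assumption: the excerpt does not state the namespace of the fcs: prefix;
# the responseItemType identifier is reused here purely for illustration.
FCS_NS = "http://clarin.eu/fcs/1.0"


def check_data_view(element):
    """Return a list of problems found in one <fcs:DataView> element."""
    if element.tag != f"{{{FCS_NS}}}DataView":
        return [f"not a DataView element: {element.tag}"]
    problems = []
    # The Payload type is recorded in @type as a MIME type.
    if not element.get("type"):
        problems.append("missing @type (MIME type of the Payload)")
    # Inline: the Payload is an XML fragment below the element.
    has_inline = len(element) > 0
    # By reference: @ref or @pid points to a download location.
    has_link = element.get("ref") is not None or element.get("pid") is not None
    if not has_inline and not has_link:
        problems.append("no inline Payload and no @ref or @pid")
    return problems


# Hypothetical sample elements (MIME types and URL are made up).
INLINE = (
    f'<fcs:DataView xmlns:fcs="{FCS_NS}" type="application/x-example+xml">'
    "<hit>cat</hit></fcs:DataView>"
)
BY_REF = (
    f'<fcs:DataView xmlns:fcs="{FCS_NS}" type="audio/wav" '
    'ref="http://repos.example.org/file/audio_08_15.wav"/>'
)
EMPTY = f'<fcs:DataView xmlns:fcs="{FCS_NS}" type="audio/wav"/>'

for sample in (INLINE, BY_REF, EMPTY):
    print(check_data_view(ET.fromstring(sample)))
```

The check deliberately does not validate the MIME type syntax or try to download the referenced Payload; it only mirrors the structural `MUST` rules from the text.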