Changes between Version 54 and Version 55 of FCS-Specification-ScrapBook
- Timestamp: 02/17/14 12:38:04 (10 years ago)
FCS-Specification-ScrapBook
v54  v55
34   34
35   35   == Introduction ==
36        The main goal of CLARIN federated content search (CLARIN-FCS) is to introduce a ''interface specification'', to decouple the ''search engine'' functionality from its ''exploitation'', i.e. user-interfaces, third-party applications and to allow services to access search engines in a nuniform way.
     36   The main goal of CLARIN federated content search (CLARIN-FCS) is to introduce a ''interface specification'', to decouple the ''search engine'' functionality from its ''exploitation'', i.e. user-interfaces, third-party applications and to allow services to access search engines in a uniform way.
37   37
38   38   === Terminology ===
…    …
47   47
48   48   Client::
49        A software component, that implements the interface specification to query Endpoints, i.e. an aggregator or a nuser-interface.
     49   A software component, that implements the interface specification to query Endpoints, i.e. an aggregator or a user-interface.
50   50
51   51   CQL::
…    …
235  235  A Resource `SHOULD` be the most precise unit of data that is directly addressable as a "whole". A Resource `SHOULD` contain a Resource Fragment, if the hit consists of just a part of the Resource unit, if the hit is a sentence within a large text. A Resource Fragment `SHOULD` be addressable within a resource, i.e. it has an offset or a resource-internal identifier. Using Resource Fragments is `OPTIONAL`, but Endpoints are encouraged to use them. If the Endpoint encodes a hit with a Resource Fragment, the actual hit `SHOULD` be encoded as a Data View that is encoded in a Resource Fragment.
236  236
237       Endpoints `SHOULD` always provide a links to the resource itself, i.e. each Resource or Resource Fragment `SHOULD` be identified by a persistent identifier or providing a URI, that is unique for Endpoint. Even if direct linking is not possible, i.e. due to licensing issues, the Endpoints `SHOULD` provide a URI to link to a web-page describing the corpus or collection, including instruction on how to obtain it. Endpoints `SHOULD` provide links that are as specific as possible (and logical), i.e. if a sentence within a resource cannot be addressed directly, the Resource Fragment `SHOULD NOT` contain a persistent identifier or a nURI.
238
239       If the Endpoint can provide both, a persistent identifier as well as a nURI, for either Resource or Resource Fragment, they `SHOULD` provide both. When working with results, Clients `SHOULD` prefer persistent identifiers over regular URIs.
     237  Endpoints `SHOULD` always provide a links to the resource itself, i.e. each Resource or Resource Fragment `SHOULD` be identified by a persistent identifier or providing a URI, that is unique for Endpoint. Even if direct linking is not possible, i.e. due to licensing issues, the Endpoints `SHOULD` provide a URI to link to a web-page describing the corpus or collection, including instruction on how to obtain it. Endpoints `SHOULD` provide links that are as specific as possible (and logical), i.e. if a sentence within a resource cannot be addressed directly, the Resource Fragment `SHOULD NOT` contain a persistent identifier or a URI.
     238
     239  If the Endpoint can provide both, a persistent identifier as well as a URI, for either Resource or Resource Fragment, they `SHOULD` provide both. When working with results, Clients `SHOULD` prefer persistent identifiers over regular URIs.
240  240
241  241  Resource and Resource Fragment are serialized in XML and Endpoints `MUST` generate responses, that are valid according to the XML schema "[source:FederatedSearch/schema/Resource.xsd Resource.xsd]" ([source:FederatedSearch/schema/Resource.xsd?format=txt download]). A Resource is encoded in the form of a `<fcs:Resource>` element, a ''Resource Fragment'' in the form of a `<fcs:ResourceFragment>` element. The content of a Data View is wrapped in a `<fcs:DataView>` element. `<fcs:Resource>` is the top-level element and `MAY` contain zero or more `<fcs:DataView>` elements and `MAY` contain zero or more `<fcs:ResourceFragment>` elements. A `<fcs:ResourceFragment>` element `MUST` contain one or more `<fcs:DataView>` elements.
…    …
285  285  }}}
286  286  The most complex [#REF_Example_3 example] is similar to [#REF_Example_2 Example 2], i.e. it shows a hit is encoded as one ''Generic Hits'' Data View in a Resource Fragment, that is embedded in a Resource. In contrast to Example 2, another Data View of type ''CMDI'' is embedded directly within the Resource. The Endpoint can use this type of Data View to directly provide CMDI metadata about the Resource to Clients.
287       All entities of the Hit can be referenced by a persistent identifier and a nURI. The complete Resource is referenceable by either the persistent identifier `http://hdl.handle.net/4711/08-15` or the URI `http://repos.example.org/file/text_08_15.html` and the CMDI metadata record in the CMDI Data View is referenceable either by the persistent identifier `http://hdl.handle.net/4711/08-15-1` or the URI `http://repos.example.org/file/08_15_1.cmdi`. The actual hit in the Resource Fragment is also directly referenceable by either the persistent identifier `http://hdl.handle.net/4711/00-15-2` or the URI `http://repos.example.org/file/text_08_15.html#sentence2`.
     287  All entities of the Hit can be referenced by a persistent identifier and a URI. The complete Resource is referenceable by either the persistent identifier `http://hdl.handle.net/4711/08-15` or the URI `http://repos.example.org/file/text_08_15.html` and the CMDI metadata record in the CMDI Data View is referenceable either by the persistent identifier `http://hdl.handle.net/4711/08-15-1` or the URI `http://repos.example.org/file/08_15_1.cmdi`. The actual hit in the Resource Fragment is also directly referenceable by either the persistent identifier `http://hdl.handle.net/4711/00-15-2` or the URI `http://repos.example.org/file/text_08_15.html#sentence2`.
288  288
289  289  Endpoints `MUST` serialize one Resource for each hit, i.e. they `MUST NOT` combine several hits in one Resource. E.g., if a query matches five different sentences within one text (= the resource), the Endpoint must serialize five Resource (= one per hit) and embed each within one SRU result (see [#searchRetrieve below]).
…    …
690  690  Centers are encouraged to provide links to their CLARIN-FCS Endpoints in the metadata records for their resources. Other services, like the VLO, can use this information for automatically configuring an Aggregator for searching resources at the Endpoint.
691  691
692       To refer to an Endpoint a `<cmdi:ResourceProxy>` with `<cmdi:ResourceType>` set to the value `SearchService` and a `@mimetype` attribute with a value of `application/sru+xml` need to be added to the CMDI record. The content of the `<cmdi:ResourceRef>` element must contain a nURI that points to the Endpoint web service.
     692  To refer to an Endpoint a `<cmdi:ResourceProxy>` with `<cmdi:ResourceType>` set to the value `SearchService` and a `@mimetype` attribute with a value of `application/sru+xml` need to be added to the CMDI record. The content of the `<cmdi:ResourceRef>` element must contain a URI that points to the Endpoint web service.
693  693
694  694  Example: