Changes between Version 54 and Version 55 of FCS-Specification-ScrapBook


Ignore:
Timestamp:
02/17/14 12:38:04 (10 years ago)
Author:
kisler
Comment:

Super minor changes ("an" in context with u vowels not always correct)

Legend:

Unmodified
Added
Removed
Modified
  • FCS-Specification-ScrapBook

    v54 v55  
    3434
    3535== Introduction ==
    36 The main goal of CLARIN federated content search (CLARIN-FCS) is to introduce a ''interface specification'', to decouple the ''search engine'' functionality from its ''exploitation'', i.e. user-interfaces, third-party applications and to allow services to access search engines in an uniform way.
     36The main goal of CLARIN federated content search (CLARIN-FCS) is to introduce a ''interface specification'', to decouple the ''search engine'' functionality from its ''exploitation'', i.e. user-interfaces, third-party applications and to allow services to access search engines in a uniform way.
    3737
    3838=== Terminology ===
     
    4747
    4848 Client::
    49     A software component, that implements the interface specification to query Endpoints, i.e. an aggregator or an user-interface.     
     49    A software component, that implements the interface specification to query Endpoints, i.e. an aggregator or a user-interface.     
    5050
    5151 CQL::
     
    235235A Resource `SHOULD` be the most precise unit of data that is directly addressable as a "whole". A Resource `SHOULD` contain a Resource Fragment, if the hit consists of just a part of the Resource unit, if the hit is a sentence within a large text. A Resource Fragment `SHOULD` be addressable within a resource, i.e. it has an offset or a resource-internal identifier. Using Resource Fragments is `OPTIONAL`, but Endpoints are encouraged to use them. If the Endpoint encodes a hit with a Resource Fragment, the actual hit `SHOULD` be encoded as a Data View that is encoded in a Resource Fragment.
    236236
    237 Endpoints `SHOULD` always provide a links to the resource itself, i.e. each Resource or Resource Fragment `SHOULD` be identified by a persistent identifier or providing a URI, that is unique for Endpoint. Even if direct linking is not possible, i.e. due to licensing issues, the Endpoints `SHOULD` provide a URI to link to a web-page describing the corpus or collection, including instruction on how to obtain it. Endpoints `SHOULD` provide links that are as specific as possible (and logical), i.e. if a sentence within a resource cannot be addressed directly, the Resource Fragment `SHOULD NOT` contain a persistent identifier or an URI.
    238 
    239 If the Endpoint can provide both, a persistent identifier as well as an URI, for either Resource or Resource Fragment, they `SHOULD` provide both. When working with results, Clients `SHOULD` prefer persistent identifiers over regular URIs.
     237Endpoints `SHOULD` always provide a links to the resource itself, i.e. each Resource or Resource Fragment `SHOULD` be identified by a persistent identifier or providing a URI, that is unique for Endpoint. Even if direct linking is not possible, i.e. due to licensing issues, the Endpoints `SHOULD` provide a URI to link to a web-page describing the corpus or collection, including instruction on how to obtain it. Endpoints `SHOULD` provide links that are as specific as possible (and logical), i.e. if a sentence within a resource cannot be addressed directly, the Resource Fragment `SHOULD NOT` contain a persistent identifier or a URI.
     238
     239If the Endpoint can provide both, a persistent identifier as well as a URI, for either Resource or Resource Fragment, they `SHOULD` provide both. When working with results, Clients `SHOULD` prefer persistent identifiers over regular URIs.
    240240
    241241Resource and Resource Fragment are serialized in XML and Endpoints `MUST` generate responses, that are valid according to the XML schema "[source:FederatedSearch/schema/Resource.xsd Resource.xsd]" ([source:FederatedSearch/schema/Resource.xsd?format=txt download]). A Resource is encoded in the form of a `<fcs:Resource>` element, a ''Resource Fragment'' in the form of a `<fcs:ResourceFragment>` element. The content of a Data View is wrapped in a `<fcs:DataView>` element. `<fcs:Resource>` is the top-level element and `MAY` contain zero or more `<fcs:DataView>` elements and `MAY` contain zero or more `<fcs:ResourceFragment>` elements. A `<fcs:ResourceFragment>` element `MUST` contain one or more `<fcs:DataView>` elements.
     
    285285}}}
    286286The most complex [#REF_Example_3 example] is similar to [#REF_Example_2 Example 2], i.e. it shows a hit is encoded as one ''Generic Hits'' Data View in a Resource Fragment, that is embedded in a Resource. In contrast to Example 2, another Data View of type ''CMDI'' is embedded directly within the Resource. The Endpoint can use this type of Data View to directly provide CMDI metadata about the Resource to Clients.
    287 All entities of the Hit can be referenced by a persistent identifier and an URI. The complete Resource is referenceable by either the persistent identifier `http://hdl.handle.net/4711/08-15` or the URI `http://repos.example.org/file/text_08_15.html` and the CMDI metadata record in the CMDI Data View is referenceable either by the persistent identifier `http://hdl.handle.net/4711/08-15-1` or the URI `http://repos.example.org/file/08_15_1.cmdi`. The actual hit in the Resource Fragment is also directly referenceable by either the persistent identifier `http://hdl.handle.net/4711/00-15-2` or the URI `http://repos.example.org/file/text_08_15.html#sentence2`.   
     287All entities of the Hit can be referenced by a persistent identifier and a URI. The complete Resource is referenceable by either the persistent identifier `http://hdl.handle.net/4711/08-15` or the URI `http://repos.example.org/file/text_08_15.html` and the CMDI metadata record in the CMDI Data View is referenceable either by the persistent identifier `http://hdl.handle.net/4711/08-15-1` or the URI `http://repos.example.org/file/08_15_1.cmdi`. The actual hit in the Resource Fragment is also directly referenceable by either the persistent identifier `http://hdl.handle.net/4711/00-15-2` or the URI `http://repos.example.org/file/text_08_15.html#sentence2`.   
    288288
    289289Endpoints `MUST` serialize one Resource for each hit, i.e. they `MUST NOT` combine several hits in one Resource. E.g., if a query matches five different sentences within one text (= the resource), the Endpoint must serialize five Resource (= one per hit) and embed each within one SRU result (see [#searchRetrieve below]).
     
    690690Centers are encouraged to provide links to their CLARIN-FCS Endpoints in the metadata records for their resources. Other services, like the VLO, can use this information for automatically configuring an Aggregator for searching resources at the Endpoint.
    691691   
    692 To refer to an Endpoint a `<cmdi:ResourceProxy>` with `<cmdi:ResourceType>` set to the value `SearchService` and a `@mimetype` attribute with a value of `application/sru+xml` need to be added to the CMDI record. The content of the `<cmdi:ResourceRef>` element must contain an URI that points to the Endpoint web service.
     692To refer to an Endpoint a `<cmdi:ResourceProxy>` with `<cmdi:ResourceType>` set to the value `SearchService` and a `@mimetype` attribute with a value of `application/sru+xml` need to be added to the CMDI record. The content of the `<cmdi:ResourceRef>` element must contain a URI that points to the Endpoint web service.
    693693
    694694Example: