Changes between Version 93 and Version 94 of FCS-Specification-ScrapBook


Timestamp:
03/19/14 13:42:12 (11 years ago)
Author:
Oliver Schonefeld
Comment:

--

  • FCS-Specification-ScrapBook

    v93 v94  
    164164
    165165== CLARIN-FCS Interface Specification ==
    166 The CLARIN-FCS interface specification defines a set of capabilites, an extensible result format and a set of required operations. CLARIN-FCS is built on the SRU/CQL standard and additional functionality required for CLARIN-FCS is added through SRU/CQL's extension mechanisms.
    167 
    168 Generally, CLARIN-FCS Interface Specification consists of two components, a set of ''formats'' and a ''transport protocol''. The ''Endpoint'' component is a software component that acts as a bridge between the Formats, that are send by a ''Client'' using the ''Transport Protocol'', and a ''Search Engine''. The ''Search Engine'' is a custom software component, that allows searching in the language resources of a CLARIN center. The ''Endpoint'' basically implements the ''transport protocol'' and acts as an mediator between the CLARIN-FCS specific formats and the idiosyncrasies of ''Search Engines''. The following figure illustrates the overall architecture.
     166The CLARIN-FCS Interface Specification defines a set of capabilities, an extensible result format and a set of required operations. CLARIN-FCS is built on the SRU/CQL standard and additional functionality required for CLARIN-FCS is added through SRU/CQL's extension mechanisms.
     167
     168Specifically, the CLARIN-FCS Interface Specification consists of two ''components'', a set of ''formats'' and a ''transport protocol''. The ''Endpoint'' component is a software component that acts as a bridge between the formats, which are sent by a ''Client'' using the ''Transport Protocol'', and a ''Search Engine''. The ''Search Engine'' is a custom software component that allows searching in the language resources of a CLARIN center. The ''Endpoint'' basically implements the ''transport protocol'' and acts as a mediator between the CLARIN-FCS specific formats and the idiosyncrasies of ''Search Engines''. The following figure illustrates the overall architecture.
    169169{{{
    170170                 +---------+
     
    195195        +---------------------------+
    196196}}}
    197 In general, the work-flow in CLARIN-FCS is as follows: a Client submits a query to an Endpoint. The Endpoint translates the query from CQL to the dialect used by the Search Engine and submits the translated query to the Search Engine. The Search Engine processes the query and generates a result set, i.e. it compiles a set of hits that match the search criterion. The Endpoint then translates the results from the Search Engine specific result set format to the CLARIN-FCS result format and sends it to the Client.
     197In general, the work-flow in CLARIN-FCS is as follows: a Client submits a query to an Endpoint. The Endpoint translates the query from CQL to the query dialect used by the Search Engine and submits the translated query to the Search Engine. The Search Engine processes the query and generates a result set, i.e. it compiles a set of hits that match the search criterion. The Endpoint then translates the results from the Search Engine specific result set format to the CLARIN-FCS result format and sends it to the Client.
    198198
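The work-flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ToySearchEngine`, its `contains:` query dialect, the `Endpoint` class and the dictionary-based result shape are all hypothetical stand-ins and are not the actual CLARIN-FCS result format or any real Search Engine API. The sketch handles a single bare term only; a real Endpoint has to parse all of CQL.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hit:
    """One hit in the (hypothetical) engine-specific result set format."""
    doc_id: str
    text: str


class ToySearchEngine:
    """Stand-in for a center's custom Search Engine (hypothetical)."""

    def __init__(self, docs: Dict[str, str]) -> None:
        self.docs = docs

    def search(self, native_query: str) -> List[Hit]:
        # Toy dialect: "contains:<term>" is a case-sensitive substring match.
        term = native_query.split(":", 1)[1]
        return [Hit(doc_id, text) for doc_id, text in self.docs.items() if term in text]


class Endpoint:
    """Bridges between the Client-facing formats and one Search Engine."""

    def __init__(self, engine: ToySearchEngine) -> None:
        self.engine = engine

    def translate_query(self, cql: str) -> str:
        # Assumption: the query is a single, optionally quoted, term.
        return "contains:" + cql.strip().strip('"')

    def translate_results(self, hits: List[Hit]) -> List[dict]:
        # Placeholder for the translation into the CLARIN-FCS result format.
        return [{"resource": hit.doc_id, "hit": hit.text} for hit in hits]

    def handle(self, cql: str) -> List[dict]:
        native_query = self.translate_query(cql)   # CQL -> engine dialect
        hits = self.engine.search(native_query)    # engine builds the result set
        return self.translate_results(hits)        # engine format -> common format


if __name__ == "__main__":
    endpoint = Endpoint(ToySearchEngine({"a": "the cat sat", "b": "a dog ran"}))
    print(endpoint.handle("cat"))
```

The point of the split into `translate_query` and `translate_results` is that the Client never sees the engine dialect or the engine's result set format; both translations stay inside the Endpoint.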
    199199The following sections describe the CLARIN-FCS capabilities, the query and result formats, how SRU/CQL is used as a transport protocol in the context of CLARIN-FCS and the required CLARIN-FCS specific extensions to SRU.
    200200
    201201=== Capabilities===#capabilities
    202 CLARIN-FCS aims to integrate several heterogeneous Search Engines. Each Search Engine has different capabilities and the available resources have different properties, e.g. some resources provide Part-Of-Speech annotations and some do not. To accommodate for the different capabilities of the Search Engine, the Interface Specification defines several ''Capabilities''. A Capability defined a certain aspect of CLARIN-FCS, e.g. what kind of queries are supported. Each Endpoint implements some (or all) of these Capabilities. The Endpoint will announce the capabilities it provides to allow a Client to auto-tune itself (see section [#endpointDescription  Endpoint Description]. Each Capability is identified by a ''Capability Identifier'', which uses the URI syntax.
     202Since CLARIN-FCS aims to integrate several heterogeneous Search Engines, it needs to support the different features of the Search Engines, e.g. some only allow simple search while others support wild-cards or regular expressions. Also, the available resources have different properties, e.g. some resources provide Part-Of-Speech annotations and some do not. To accommodate these different features, CLARIN-FCS defines several ''Capabilities''. A Capability defines a certain feature that is part of CLARIN-FCS, e.g. what kind of queries are supported. Each Endpoint implements some (or all) of these Capabilities. The Endpoint will announce the capabilities it provides to allow a Client to auto-tune itself (see section [#endpointDescription  Endpoint Description]). Each Capability is identified by a ''Capability Identifier'', which uses the URI syntax.
    203203
    204204The following Capabilities are defined by CLARIN-FCS:
     
    218218   The Endpoint `MUST` perform the query on an annotation tier, that makes the most sense for the user, i.e. the textual content for a text corpus resource or the orthographic transcription of a spoken language corpus. Endpoints `SHOULD` perform the query case-sensitive. \\
    219219   '''NOTE''': CLARIN-FCS requires Endpoints to be able to parse all of CQL and, if they don't support a certain CQL feature, to generate the appropriate error message (see section [#sruCQL  SRU/CQL]). Especially, if an Endpoint ''only'' supports ''Basic Search'', it `MUST NOT` silently accept queries that include CQL features besides ''term-only'' and ''terms'' combined with boolean operator queries, i.e. queries involving context sets, etc.
     220
     221Every Endpoint `MUST` implement the ''Basic Search'' Capability.
    220222
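The NOTE above says that an Endpoint which ''only'' supports ''Basic Search'' must not silently accept queries that go beyond terms and terms combined with boolean operators. A minimal sketch of such a gate is shown below. It rests on several assumptions: the function and exception names are hypothetical, only `AND`, `OR` and `NOT` are treated as permitted boolean operators, and the diagnostic URI is SRU's general "Query feature unsupported" diagnostic rather than something this section prescribes (the SRU/CQL section is the authority on the exact error to return). The sketch does not distinguish syntax errors from unsupported features; a real Endpoint would use a full CQL parser for that.

```python
import re

# Tokens: a quoted string, a parenthesis, or a run of other non-space characters.
TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|[()]|[^\s()"]+')
BOOLEANS = {"AND", "OR", "NOT"}  # assumption: PROX is not part of Basic Search


class UnsupportedQueryFeature(Exception):
    """Raised instead of silently accepting a query outside Basic Search."""
    diagnostic_uri = "info:srw/diagnostic/1/48"  # SRU: query feature unsupported


def check_basic_search(cql: str) -> None:
    """Accept only `term (BOOLEAN term)*` with optional parentheses."""
    if TOKEN.sub("", cql).strip():
        # Something was left unmatched, e.g. an unterminated quote.
        raise UnsupportedQueryFeature("unrecognised input in query")

    expect_term = True
    depth = 0
    for tok in TOKEN.findall(cql):
        quoted = tok.startswith('"')
        if tok == "(":
            if not expect_term:
                raise UnsupportedQueryFeature("unexpected '('")
            depth += 1
        elif tok == ")":
            if expect_term or depth == 0:
                raise UnsupportedQueryFeature("unexpected ')'")
            depth -= 1
        elif not quoted and tok.upper() in BOOLEANS:
            if expect_term:
                raise UnsupportedQueryFeature("boolean operator without left operand")
            expect_term = True
        else:
            # Two adjacent terms look like `index relation term`, which is
            # outside Basic Search; so are relation or modifier symbols.
            if not expect_term:
                raise UnsupportedQueryFeature("index/relation clause: " + tok)
            if not quoted and any(ch in tok for ch in "=<>/"):
                raise UnsupportedQueryFeature("relation or modifier: " + tok)
            expect_term = False

    if expect_term or depth != 0:
        raise UnsupportedQueryFeature("incomplete query")


if __name__ == "__main__":
    check_basic_search('(cat OR dog) and "big fish"')  # accepted, no exception
    try:
        check_basic_search("dc.title any fish")
    except UnsupportedQueryFeature as err:
        print(err.diagnostic_uri, err)
```

Rejecting by default and only whitelisting the Basic Search shape keeps the Endpoint from quietly returning results for a query it did not fully understand.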
    221223Endpoints `MUST NOT` invent custom Capability Identifiers and `MUST` only use the values defined above.