Changes between Version 57 and Version 58 of Taskforces/FCS/FCS-Specification-Draft
- Timestamp: 06/12/17 15:53:45
Taskforces/FCS/FCS-Specification-Draft
v57 v58
182 182
183 183 Specifically, the CLARIN-FCS Interface Specification consists of two parts: a set of formats and a transport protocol. The ''Endpoint'' component is a software component that acts as a bridge between a ''Client'' and a ''Search Engine'' and passes the requests sent by the ''Client'' to the ''Search Engine''. The ''Search Engine'' is a custom software component that allows searching language resources in a Repository. The ''Endpoint'' implements the ''Transport Protocol'' and acts as a mediator between the CLARIN-FCS specific formats and the idiosyncrasies of ''Search Engines'' of the individual Repositories. The following figure illustrates the overall architecture:
184     [[Image(fcs-2-interaktion.svg, 960)]]
    184 [[Image(fcs-2-interaktion.svg, 940)]]
185 185 {{{#!comment
186 186 +---------+
… …
226 226
227 227 === Endpoint Description #endpointDescription
228     {{{
229     #!div style="border: 1px solid #000000; font-size: 75%"
230     Add stuff required for advanced capability.
231     }}}
232 228 Endpoints need to provide information about their capabilities to support auto-configuration of Clients. The ''Endpoint Description'' mechanism provides this information to the Clients. Endpoints `MUST` encode their capabilities using an XML format and embed this information into the SRU/CQL protocol as described in section [#explain Operation ''explain'']. The XML fragment generated by the Endpoint for the Endpoint Description `MUST` be valid according to the XML schema "[source:FederatedSearch/schema/Core_2/Endpoint-Description.xsd Endpoint-Description.xsd]" ([source:FederatedSearch/schema/Core_2/Endpoint-Description.xsd?format=txt download]).
233 229
… …
241 237 The value of the `@id` attribute `MUST NOT` contain the characters `,` (comma) or `;` (semicolon)
242 238 * one `<ed:SupportedLayers>` element (`REQUIRED` if Endpoint supports ''Advanced Search'' capability) \\
243     A list of Layers that are generally supported by this Endpoint. This list is composed of one or more `<ed:SupportedLayer>` elements. The content of a `<ed:SupportedLayer>` `MUST` be the identifier of a Layer (see [#layers section "Layers"]), e.g. `orth`. Each `<ed:SupportedLayer>` element `MUST` carry an `@id` and a `@delivery-policy` attribute. The value of the `@id` attribute is later used in the `<ed:Resource>` element to indicate which Layers are supported by a resource (see below). The `@result-id` attribute is used in the Advanced Data View (see [#advancedDataView section "Advanced Data View"]). Each `<ed:SupportedLayer>` element `MAY` carry an optional `@qualifier` attribute. It is used a a qualifier in an FCS-QL search term to address this specific layer. \\
    239 A list of Layers that are generally supported by this Endpoint. This list is composed of one or more `<ed:SupportedLayer>` elements. The content of a `<ed:SupportedLayer>` `MUST` be the identifier of a Layer (see [#layers section "Layers"]), e.g. `orth`. Each `<ed:SupportedLayer>` element `MUST` carry an `@id` and a `@delivery-policy` attribute. The value of the `@id` attribute is later used in the `<ed:Resource>` element to indicate which Layers are supported by a resource (see below). The `@result-id` attribute is used in the Advanced Data View (see [#advancedDataView section "Advanced Data View"]). Each `<ed:SupportedLayer>` element `MAY` carry an optional `@qualifier` attribute. It is used as a qualifier in an FCS-QL search term to address this specific layer. \\
244 240 This list `MUST NOT` include duplicate entries, i.e. no Layer with the same `@result-id` must appear more than once. \\
245 241 The value of the `@id` or `@result-id` attribute `MUST NOT` contain the characters `,` (comma) or `;` (semicolon)
246     The value of the `@qualifier` attribute `MUST NOT` contain characters other than `a`-`z`,`A`-`Z`,`0`-`9` and `-` (hyphen).
    242 The value of the `@qualifier` attribute `MUST NOT` contain characters other than `a`-`z`,`A`-`Z`,`0`-`9` and `-` (hyphen) and the first character `MUST` be one of `a`-`z` or `A`-`Z`.
247 243 The `<ed:SupportedLayer>` element `MAY` carry `@alt-value-info` and `@alt-value-info-uri` attributes; `@alt-value-info` `SHOULD` contain a short description of the layer, e.g. the original tag set used; `@alt-value-info-uri` `MUST` contain a well-formed URI and `SHOULD` point to a web site with further information, e.g. about the original tag set and how the translation to FCS is done. Clients, e.g. the Aggregator, can display this information together with the search result.
248 244 * one `<ed:Resources>` element (`REQUIRED`) \\
… …
356 352 {{{#!xml
357 353 <ed:EndpointDescription xmlns:ed="http://clarin.eu/fcs/endpoint-description" version="2">
358       <ed:Capabilities>
359         <ed:Capability>http://clarin.eu/fcs/capability/basic-search</ed:Capability>
360         <ed:Capability>http://clarin.eu/fcs/capability/advanced-search</ed:Capability>
361       </ed:Capabilities>
362       <ed:SupportedDataViews>
363         <ed:SupportedDataView id="hits" delivery-policy="send-by-default">application/x-clarin-fcs-hits+xml</ed:SupportedDataView>
364       </ed:SupportedDataViews>
365       <!-- ADV-FCS -->
366       <SupportedLayers>
367         <SupportedLayer id="l1" result-id="http://endpoint.example.org/Layers/orth1">orth</SupportedLayer>
368         <SupportedLayer id="l2" result-id="http://endpoint.example.org/Layers/pos1" qualifier="x">pos</SupportedLayer>
369         <SupportedLayer id="l3" result-id="http://endpoint.example.org/Layers/pos2" qualifier="y"
370             alt-value-info="STTS tagset"
371             alt-value-info-uri="http://repos.example.org/tagset_doc.html">pos</SupportedLayer>
372         <SupportedLayer id="l4" result-id="http://endpoint.example.org/Layers/word" type="empty">word</SupportedLayer>
373         <SupportedLayer id="l5" result-id="http://endpoint.example.org/Layers/lemma1">lemma</SupportedLayer>
374       </SupportedLayers>
    354   <ed:Capabilities>
    355     <ed:Capability>http://clarin.eu/fcs/capability/basic-search</ed:Capability>
    356     <ed:Capability>http://clarin.eu/fcs/capability/advanced-search</ed:Capability>
    357   </ed:Capabilities>
    358   <ed:SupportedDataViews>
    359     <ed:SupportedDataView id="hits" delivery-policy="send-by-default">application/x-clarin-fcs-hits+xml</ed:SupportedDataView>
    360     <ed:SupportedDataView id="adv" delivery-policy="send-by-default">application/x-clarin-fcs-adv+xml</ed:SupportedDataView>
    361     <ed:SupportedDataView id="cmdi" delivery-policy="need-to-request">application/x-cmdi+xml</ed:SupportedDataView>
    362   </ed:SupportedDataViews>
    363   <ed:SupportedLayers>
    364     <ed:SupportedLayer id="word" result-id="http://spraakbanken.gu.se/ns/fcs/layer/word">text</ed:SupportedLayer>
    365     <ed:SupportedLayer id="orth" result-id="http://endpoint.example.org/Layers/orth" type="empty">orth</ed:SupportedLayer>
    366     <ed:SupportedLayer id="lemma" result-id="http://spraakbanken.gu.se/ns/fcs/layer/lemma">lemma</ed:SupportedLayer>
    367     <ed:SupportedLayer id="pos" result-id="http://spraakbanken.gu.se/ns/fcs/layer/pos"
    368         alt-value-info="SUC tagset"
    369         alt-value-info-uri="https://spraakbanken.gu.se/parole/Docs/SUC2.0-manual.pdf"
    370         qualifier="suc">pos</ed:SupportedLayer>
    371     <ed:SupportedLayer id="pos2" result-id="http://spraakbanken.gu.se/ns/fcs/layer/pos2"
    372         alt-value-info="2nd tagset"
    373         qualifier="t2">pos</ed:SupportedLayer>
    374   </ed:SupportedLayers>
375 375
376 376   <ed:Resources>
377 377     <!-- just one top-level resource at the Endpoint -->
378         <ed:Resource pid="http://hdl.handle.net/4711/0815">
379           <ed:Title xml:lang="de">Goethe Korpus</ed:Title>
380           <ed:Title xml:lang="en">Goethe corpus</ed:Title>
381           <ed:Description xml:lang="de">Der Goethe Korpus des IDS Mannheim.</ed:Description>
382           <ed:Description xml:lang="en">The Goethe corpus of IDS Mannheim.</ed:Description>
383           <ed:LandingPageURI>http://repos.example.org/corpus1.html</ed:LandingPageURI>
384           <ed:Languages>
385             <ed:Language>deu</ed:Language>
386           </ed:Languages>
387           <ed:AvailableDataViews ref="hits" />
388           <AvailableLayers ref="l1 l2 l3 l4 l5" />
    378     <ed:Resource pid="hdl:10794/suc">
    379       <ed:Title xml:lang="sv">SUC-korpusen</ed:Title>
    380       <ed:Title xml:lang="en">The SUC corpus</ed:Title>
    381       <ed:Description xml:lang="sv">Stockholm-Umeå-korpusen hos Språkbanken.</ed:Description>
    382       <ed:Description xml:lang="en">The Stockholm-Umeå corpus at Språkbanken.</ed:Description>
    383       <ed:LandingPageURI>https://spraakbanken.gu.se/resurser/suc</ed:LandingPageURI>
    384       <ed:Languages>
    385         <ed:Language>swe</ed:Language>
    386       </ed:Languages>
    387       <ed:AvailableDataViews ref="hits" />
    388       <ed:AvailableDataViews ref="adv"/>
    389       <ed:AvailableLayers ref="word lemma pos pos2" />
389 390     </ed:Resource>
390 391   </ed:Resources>
391 392 </ed:EndpointDescription>
392 393 }}}
393     {{{
394     #!div style="border: 1px solid #000000; font-size: 75%"
395     TODO: describe the above example
396     }}}
    394 [#REF_Example_6 Example 6] shows an Endpoint Description for an Endpoint that supports the ''Advanced Search'' capability. In this case, `ed:SupportedDataViews` also declares support for the Advanced Data View (ADV). The `ed:SupportedLayers` element contains the list of `ed:SupportedLayer` elements. Each of these elements must carry a `@delivery-policy` attribute and an `@id` attribute, which an `ed:Resource` element references (via `ed:AvailableLayers`) to indicate which Layers it supports. The `@result-id` attribute is used in the ADV. If needed, the optional `@qualifier` attribute is used in an FCS-QL search term to address a specific layer, e.g. `suc` for the layer `pos` and `t2` for the layer `pos2`. The `@alt-value-info` attribute should contain a short description of the layer; if further information is needed, use the `@alt-value-info-uri` attribute with a well-formed URI pointing to a web site. Clients such as the Aggregator can show this information together with the search results. The `@type` attribute has a default value of `value`, which should only be changed to `empty` when needed.
397 395
398 396 == Searching
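The `<ed:SupportedLayer>` rules in this revision of the Endpoint Description section are mechanical enough to check in code: `@id` and `@result-id` values without `,` or `;`, no duplicate `@result-id`, a `@qualifier` that starts with a letter and otherwise uses only letters, digits and hyphens, and `ed:AvailableLayers/@ref` entries that resolve to declared layer ids. The Python sketch below is not part of the specification. `check_layers`, `SAMPLE` and the message strings are hypothetical, and only the standard library (`re`, `xml.etree.ElementTree`) is used. It assumes every `<ed:SupportedLayer>` carries `@id` and `@result-id`, as all layers in the v58 example do, and it is no substitute for validating against `Endpoint-Description.xsd`.

```python
import re
import xml.etree.ElementTree as ET

# Namespace URI of the Endpoint Description, as declared in the example above.
ED = "http://clarin.eu/fcs/endpoint-description"
NS = {"ed": ED}

# @qualifier: first character a-z or A-Z, then only a-z, A-Z, 0-9 or '-'.
QUALIFIER_RE = re.compile(r"[A-Za-z][A-Za-z0-9-]*")


def check_layers(xml_text: str) -> list[str]:
    """Hypothetical helper: return a list of rule violations (empty if none)."""
    root = ET.fromstring(xml_text)
    problems = []
    layer_ids = set()
    seen_result_ids = set()

    for layer in root.findall("ed:SupportedLayers/ed:SupportedLayer", NS):
        lid, rid = layer.get("id"), layer.get("result-id")
        if lid is None or rid is None:
            problems.append(f"layer {layer.text!r}: missing @id or @result-id")
            continue
        # Neither @id nor @result-id may contain ',' or ';'.
        for name, value in (("id", lid), ("result-id", rid)):
            if "," in value or ";" in value:
                problems.append(f"layer {lid}: @{name} contains ',' or ';'")
        # No two layers may share the same @result-id.
        if rid in seen_result_ids:
            problems.append(f"layer {lid}: duplicate @result-id {rid}")
        seen_result_ids.add(rid)
        layer_ids.add(lid)

        qualifier = layer.get("qualifier")
        if qualifier is not None and not QUALIFIER_RE.fullmatch(qualifier):
            problems.append(f"layer {lid}: invalid @qualifier {qualifier!r}")

    # Every id listed in an <ed:AvailableLayers ref="..."> must name a declared layer.
    for avail in root.findall(".//ed:AvailableLayers", NS):
        for ref in avail.get("ref", "").split():
            if ref not in layer_ids:
                problems.append(f"AvailableLayers references unknown layer {ref!r}")
    return problems


# Trimmed-down fragment modelled on the v58 example (pos/pos2 layers only).
SAMPLE = """<ed:EndpointDescription xmlns:ed="http://clarin.eu/fcs/endpoint-description" version="2">
  <ed:SupportedLayers>
    <ed:SupportedLayer id="pos" result-id="http://spraakbanken.gu.se/ns/fcs/layer/pos" qualifier="suc">pos</ed:SupportedLayer>
    <ed:SupportedLayer id="pos2" result-id="http://spraakbanken.gu.se/ns/fcs/layer/pos2" qualifier="t2">pos</ed:SupportedLayer>
  </ed:SupportedLayers>
  <ed:Resources>
    <ed:Resource pid="hdl:10794/suc">
      <ed:AvailableLayers ref="pos pos2" />
    </ed:Resource>
  </ed:Resources>
</ed:EndpointDescription>"""

print(check_layers(SAMPLE))  # []
```

Because the lookups are namespace-qualified, the unprefixed `<SupportedLayers>` and `<AvailableLayers>` elements of the v57 example are not in the `ed` namespace, so this sketch would skip them silently rather than report them.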