Changes between Version 57 and Version 58 of Taskforces/FCS/FCS-Specification-Draft


Timestamp:
06/12/17 15:53:45
Author:
Leif-Jöran
Comment:

Adding example 6 Advanced Search description and some proofing.

  • Taskforces/FCS/FCS-Specification-Draft

    v57 v58  
    182182
    183183Specifically, the CLARIN-FCS Interface Specification consists of two parts: a set of formats and a transport protocol. The ''Endpoint'' is a software component that acts as a bridge between a ''Client'' and a ''Search Engine'' and passes the requests sent by the ''Client'' to the ''Search Engine''. The ''Search Engine'' is a custom software component that allows searching the language resources in a Repository. The ''Endpoint'' implements the ''Transport Protocol'' and acts as a mediator between the CLARIN-FCS specific formats and the idiosyncrasies of the ''Search Engines'' of the individual Repositories. The following figure illustrates the overall architecture:
    184 [[Image(fcs-2-interaktion.svg, 960)]]
     184[[Image(fcs-2-interaktion.svg, 940)]]
    185185{{{#!comment
    186186                   +---------+
     
    226226
    227227=== Endpoint Description #endpointDescription
    228 {{{
    229 #!div style="border: 1px solid #000000; font-size: 75%"
    230 Add stuff required for advanced capability.
    231 }}}
    232228Endpoints need to provide information about their capabilities to support auto-configuration of Clients. The ''Endpoint Description'' mechanism provides the necessary facility to provide this information to the Clients. Endpoints `MUST` encode their capabilities using an XML format and embed this information into the SRU/CQL protocol as described in section [#explain Operation ''explain'']. The XML fragment generated by the Endpoint for the Endpoint Description `MUST` be valid according to the XML schema "[source:FederatedSearch/schema/Core_2/Endpoint-Description.xsd Endpoint-Description.xsd]" ([source:FederatedSearch/schema/Core_2/Endpoint-Description.xsd?format=txt download]).
    233229
     
    241237   The value of the `@id` attribute `MUST NOT` contain the characters `,` (comma) or `;` (semicolon)
    242238 * one `<ed:SupportedLayers>` element (`REQUIRED` if Endpoint supports ''Advanced Search'' capability) \\
    243    A list of Layers that are generally supported by this Endpoint. This list is composed of one or more `<ed:SupportedLayer>` elements. The content of a `<ed:SupportedLayer>` `MUST` be the identifier of a Layer (see [#layers section "Layers"]), e.g. `orth`. Each `<ed:SupportedLayer>` element `MUST` carry an `@id` and a `@delivery-policy` attribute. The value of the `@id` attribute is later used in the `<ed:Resource>` element to indicate, which Data View is supported by a resource (see below). The `@result-id` attribute is used in the Advanced Data View (see [#advancedDataView section "Advanced Data View"]). Each `<ed:SupportedLayer>` element `MAY` carry an optional `@qualifier` attribute. It is used a a qualifier in a FCS-QL search term in to address this specific layer. \\
     239   A list of Layers that are generally supported by this Endpoint. This list is composed of one or more `<ed:SupportedLayer>` elements. The content of a `<ed:SupportedLayer>` `MUST` be the identifier of a Layer (see [#layers section "Layers"]), e.g. `orth`. Each `<ed:SupportedLayer>` element `MUST` carry an `@id` and a `@delivery-policy` attribute. The value of the `@id` attribute is later used in the `<ed:Resource>` element to indicate which Data View is supported by a resource (see below). The `@result-id` attribute is used in the Advanced Data View (see [#advancedDataView section "Advanced Data View"]). Each `<ed:SupportedLayer>` element `MAY` carry an optional `@qualifier` attribute. It is used as a qualifier in an FCS-QL search term to address this specific layer. \\
    244240   This list `MUST NOT` include duplicate entries, i.e. no Layer with the same `@result-id` may appear more than once. \\
    245241   The value of the `@id` or `@result-id` attribute `MUST NOT` contain the characters `,` (comma) or `;` (semicolon)
    246    The value of the `@qualifier` attribute `MUST NOT` contain characters other than `a`-`z`,`A`-`Z`,`0`-`9` and `-` (hyphen).
     242   The value of the `@qualifier` attribute `MUST NOT` contain characters other than `a`-`z`, `A`-`Z`, `0`-`9` and `-` (hyphen), and the first character `MUST` be one of `a`-`z` or `A`-`Z`.
    247243   The `<ed:SupportedLayer>` element `MAY` carry an `@alt-value-info` and an `@alt-value-info-uri` attribute; `@alt-value-info` `SHOULD` contain a short description of the layer, e.g. the original tag set used; `@alt-value-info-uri` `MUST` contain a well-formed URI and `SHOULD` point to a web site with further information, e.g. about the original tag set and how the translation to FCS is done. Clients, e.g. the Aggregator, can display this information together with the search result.
    248244 * one `<ed:Resources>` element (`REQUIRED`) \\
     
    356352{{{#!xml
    357353<ed:EndpointDescription xmlns:ed="http://clarin.eu/fcs/endpoint-description" version="2">
    358     <ed:Capabilities>
    359         <ed:Capability>http://clarin.eu/fcs/capability/basic-search</ed:Capability>
    360         <ed:Capability>http://clarin.eu/fcs/capability/advanced-search</ed:Capability>
    361     </ed:Capabilities>
    362     <ed:SupportedDataViews>
    363         <ed:SupportedDataView id="hits" delivery-policy="send-by-default">application/x-clarin-fcs-hits+xml</ed:SupportedDataView>
    364     </ed:SupportedDataViews>
    365     <!-- ADV-FCS -->
    366     <SupportedLayers>
    367         <SupportedLayer id="l1" result-id="http://endpoint.example.org/Layers/orth1">orth</SupportedLayer>
    368         <SupportedLayer id="l2" result-id="http://endpoint.example.org/Layers/pos1" qualifier="x">pos</SupportedLayer>
    369         <SupportedLayer id="l3" result-id="http://endpoint.example.org/Layers/pos2" qualifier="y"
    370             alt-value-info="STTS tagset"
    371             alt-value-info-uri="http://repos.example.org/tagset_doc.html">pos</SupportedLayer>
    372         <SupportedLayer id="l4" result-id="http://endpoint.example.org/Layers/word" type="empty">word</SupportedLayer>
    373         <SupportedLayer id="l5" result-id="http://endpoint.example.org/Layers/lemma1">lemma</SupportedLayer>
    374     </SupportedLayers>
     354  <ed:Capabilities>
     355    <ed:Capability>http://clarin.eu/fcs/capability/basic-search</ed:Capability>
     356    <ed:Capability>http://clarin.eu/fcs/capability/advanced-search</ed:Capability>
     357  </ed:Capabilities>
     358  <ed:SupportedDataViews>
     359    <ed:SupportedDataView id="hits" delivery-policy="send-by-default">application/x-clarin-fcs-hits+xml</ed:SupportedDataView>
     360    <ed:SupportedDataView id="adv" delivery-policy="send-by-default">application/x-clarin-fcs-adv+xml</ed:SupportedDataView>
     361    <ed:SupportedDataView id="cmdi" delivery-policy="need-to-request">application/x-cmdi+xml</ed:SupportedDataView>
     362  </ed:SupportedDataViews>
     363  <ed:SupportedLayers>
     364    <ed:SupportedLayer id="word" result-id="http://spraakbanken.gu.se/ns/fcs/layer/word">text</ed:SupportedLayer>
     365    <ed:SupportedLayer id="orth" result-id="http://endpoint.example.org/Layers/orth" type="empty">orth</ed:SupportedLayer>
     366    <ed:SupportedLayer id="lemma" result-id="http://spraakbanken.gu.se/ns/fcs/layer/lemma">lemma</ed:SupportedLayer>
     367    <ed:SupportedLayer id="pos" result-id="http://spraakbanken.gu.se/ns/fcs/layer/pos"
     368                    alt-value-info="SUC tagset"
     369                    alt-value-info-uri="https://spraakbanken.gu.se/parole/Docs/SUC2.0-manual.pdf"
     370                    qualifier="suc">pos</ed:SupportedLayer>
     371    <ed:SupportedLayer id="pos2" result-id="http://spraakbanken.gu.se/ns/fcs/layer/pos2"
     372                    alt-value-info="2nd tagset"
     373                    qualifier="t2">pos</ed:SupportedLayer>
     374  </ed:SupportedLayers>
    375375
    376376    <ed:Resources>
    377377        <!-- just one top-level resource at the Endpoint -->
    378         <ed:Resource pid="http://hdl.handle.net/4711/0815">
    379             <ed:Title xml:lang="de">Goethe Korpus</ed:Title>
    380             <ed:Title xml:lang="en">Goethe corpus</ed:Title>
    381             <ed:Description xml:lang="de">Der Goethe Korpus des IDS Mannheim.</ed:Description>
    382             <ed:Description xml:lang="en">The Goethe corpus of IDS Mannheim.</ed:Description>
    383             <ed:LandingPageURI>http://repos.example.org/corpus1.html</ed:LandingPageURI>
    384             <ed:Languages>
    385                 <ed:Language>deu</ed:Language>
    386             </ed:Languages>
    387             <ed:AvailableDataViews ref="hits" />
    388             <AvailableLayers ref="l1 l2 l3 l4 l5" />
     378        <ed:Resource pid="hdl:10794/suc">
     379          <ed:Title xml:lang="sv">SUC-korpusen</ed:Title>
     380          <ed:Title xml:lang="en">The SUC corpus</ed:Title>
     381          <ed:Description xml:lang="sv">Stockholm-Umeå-korpusen hos Språkbanken.</ed:Description>
     382          <ed:Description xml:lang="en">The Stockholm-Umeå corpus at Språkbanken.</ed:Description>
     383          <ed:LandingPageURI>https://spraakbanken.gu.se/resurser/suc</ed:LandingPageURI>
     384          <ed:Languages>
     385             <ed:Language>swe</ed:Language>
     386          </ed:Languages>
     387          <ed:AvailableDataViews ref="hits" />
     388          <ed:AvailableDataViews ref="adv"/>
     389          <ed:AvailableLayers ref="word lemma pos pos2" />
    389390        </ed:Resource>
    390391    </ed:Resources>
    391392</ed:EndpointDescription>
    392393}}}
    393 {{{
    394 #!div style="border: 1px solid #000000; font-size: 75%"
    395 TODO: describe the above example
    396 }}}
     394[#REF_Example_6 Example 6] shows an Endpoint Description for an Endpoint that supports the ''Advanced Search'' capability. In this case, `ed:SupportedDataViews` also declares support for the Advanced Data View (ADV). The `ed:SupportedLayers` element contains the list of `ed:SupportedLayer` elements. Each of these elements must carry an `@id` attribute, which an `ed:Resource` element refers to in order to indicate which Data View is supported, and a `@delivery-policy` attribute. The `@result-id` attribute is used in ADV. If needed, the optional `@qualifier` attribute is used in an FCS-QL search term to address a specific layer, e.g. `pos` or `pos2`. The attribute `@alt-value-info` should contain a short description of the layer. If further information is needed, use the `@alt-value-info-uri` attribute with a well-formed URI that points to a web site. This information could be shown by the Aggregator together with any search results. The attribute `@type` has a default value of `value`, which should only be changed to `empty` when needed.
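
The attribute constraints on `<ed:SupportedLayer>` described above lend themselves to a mechanical check. The following is only a minimal sketch in Python and not part of the specification: the function name `check_supported_layers`, the sample document and the wording of the messages are assumptions made for illustration. It checks that `@id` and `@result-id` contain no comma or semicolon, that no `@result-id` occurs twice, and that `@qualifier` uses only letters, digits and hyphens and starts with a letter. It does not replace validation against `Endpoint-Description.xsd`.

```python
import re
import xml.etree.ElementTree as ET

# Namespace URI taken from the Endpoint Description example.
ED_NS = "http://clarin.eu/fcs/endpoint-description"

# Letters, digits and hyphen only; the first character must be a letter.
QUALIFIER_RE = re.compile(r"[A-Za-z][A-Za-z0-9-]*")


def check_supported_layers(xml_text):
    """Return a list of problems found in <ed:SupportedLayer> elements.

    Hypothetical helper for illustration; an empty list means that
    none of the checked constraints is violated.
    """
    root = ET.fromstring(xml_text)
    problems = []
    seen_result_ids = set()
    for layer in root.iter("{%s}SupportedLayer" % ED_NS):
        layer_id = layer.get("id", "")
        result_id = layer.get("result-id", "")
        # @id and @result-id must not contain ',' or ';'.
        for name, value in (("id", layer_id), ("result-id", result_id)):
            if "," in value or ";" in value:
                problems.append(f"layer {layer_id!r}: @{name} contains ',' or ';'")
        # Assumption: duplicates are detected by comparing @result-id values.
        if result_id in seen_result_ids:
            problems.append(f"layer {layer_id!r}: duplicate @result-id {result_id!r}")
        seen_result_ids.add(result_id)
        # @qualifier is optional, so it is only checked when present.
        qualifier = layer.get("qualifier")
        if qualifier is not None and not QUALIFIER_RE.fullmatch(qualifier):
            problems.append(f"layer {layer_id!r}: invalid @qualifier {qualifier!r}")
    return problems


# Made-up sample: the second qualifier starts with a digit and is rejected.
SAMPLE = """<ed:EndpointDescription xmlns:ed="http://clarin.eu/fcs/endpoint-description" version="2">
  <ed:SupportedLayers>
    <ed:SupportedLayer id="pos" result-id="http://endpoint.example.org/Layers/pos" qualifier="suc">pos</ed:SupportedLayer>
    <ed:SupportedLayer id="pos2" result-id="http://endpoint.example.org/Layers/pos2" qualifier="2t">pos</ed:SupportedLayer>
  </ed:SupportedLayers>
</ed:EndpointDescription>"""

if __name__ == "__main__":
    for problem in check_supported_layers(SAMPLE):
        print(problem)
```

An Endpoint implementer could run such a check on the generated Endpoint Description before exposing it through the ''explain'' operation.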
    397395
    398396== Searching