Changes between Version 20 and Version 21 of Taskforces/FCS/FCS-Specification-Draft


Timestamp:
10/28/15 11:06:42 (9 years ago)
Author:
Oliver Schonefeld
Comment:

--

  • Taskforces/FCS/FCS-Specification-Draft

    v20 v21  
    227227Add stuff required for advanced capability.
    228228}}}
     229Endpoints need to provide information about their capabilities to support auto-configuration of Clients. The ''Endpoint Description'' mechanism provides the necessary facility to provide this information to the Clients. Endpoints `MUST` encode their capabilities using an XML format and embed this information into the SRU/CQL protocol as described in section [#explain Operation ''explain'']. The XML fragment generated by the Endpoint for the Endpoint Description `MUST` be valid according to the XML schema "[source:FederatedSearch/schema/Endpoint-Description.xsd Endpoint-Description.xsd]" ([source:FederatedSearch/schema/Endpoint-Description.xsd?format=txt download]).
     230
     231The XML fragment for ''Endpoint Description'' is encoded as an `<ed:EndpointDescription>` element that contains the following attributes and children:
     232 * one `@version` attribute (`REQUIRED`) on the `<ed:EndpointDescription>` element. The value of the `@version` attribute `MUST` be `2`.
     233 * one `<ed:Capabilities>` element (`REQUIRED`) that contains one or more `<ed:Capability>` elements \\
     234   The content of the `<ed:Capability>` element is a Capability Identifier that indicates a capability supported by the Endpoint. For valid values for the Capability Identifier, see section [#capabilities Capabilities]. This list `MUST NOT` include duplicate values.
     235 * one `<ed:SupportedDataViews>` element (`REQUIRED`) \\
     236   A list of Data Views that are supported by this Endpoint. This list is composed of one or more `<ed:SupportedDataView>` elements. The content of an `<ed:SupportedDataView>` element `MUST` be the MIME type of a supported Data View, e.g. `application/x-clarin-fcs-hits+xml`. Each `<ed:SupportedDataView>` element `MUST` carry an `@id` and a `@delivery-policy` attribute. The value of the `@id` attribute is later used in the `<ed:Resource>` element to indicate which Data View is supported by a resource (see below). Endpoints `SHOULD` use the recommended short identifier for the Data View. The `@delivery-policy` attribute indicates the Endpoint's delivery policy for that Data View. Valid values are `send-by-default` for the ''send-by-default'' and `need-to-request` for the ''need-to-request'' delivery policy. \\
     237   This list `MUST NOT` include duplicate entries, i.e. no MIME type may appear more than once. \\
     238   The value of the `@id` attribute `MUST NOT` contain the characters `,` (comma) or `;` (semicolon).
     239 * one `<ed:SupportedLayers>` element (`REQUIRED` if Endpoint supports ''Advanced Search'' capability) \\
     240   A list of Layers that are generally supported by this Endpoint. This list is composed of one or more `<ed:SupportedLayer>` elements. The content of an `<ed:SupportedLayer>` element `MUST` be the identifier of a Layer (see [#layers section "Layers"]), e.g. `orth`. Each `<ed:SupportedLayer>` element `MUST` carry an `@id` and a `@result-id` attribute. The value of the `@id` attribute is later used in the `<ed:Resource>` element to indicate which Layer is supported by a resource (see below). The `@result-id` attribute is used in the Advanced Data View (see [#advancedDataView section "Advanced Data View"]). Each `<ed:SupportedLayer>` element `MAY` carry an optional `@qualifier` attribute, which is used as a qualifier in an FCS-QL search term to address this specific layer. \\
     241   This list `MUST NOT` include duplicate entries, i.e. no Layer with the same `@result-id` may appear more than once. \\
     242   The value of the `@id` or `@result-id` attribute `MUST NOT` contain the characters `,` (comma) or `;` (semicolon). \\
     243   The value of the `@qualifier` attribute `MUST NOT` contain characters other than `a`-`z`, `A`-`Z`, `0`-`9` and `-` (hyphen). \\
     244   The `<ed:SupportedLayer>` element `MAY` carry an `@alt-value-info` and an `@alt-value-info-uri` attribute; `@alt-value-info` `SHOULD` contain a short description of the layer, e.g. the original tag set used; `@alt-value-info-uri` `MUST` contain a well-formed URI and `SHOULD` point to a web site with further information, e.g. about the original tag set and how the translation to FCS is done. Clients, e.g. the Aggregator, can display this information together with the search result.
     245 * one `<ed:Resources>` element (`REQUIRED`) \\
     246   A list of (top-level) resources that are available, i.e. searchable, at the Endpoint. The `<ed:Resources>` element contains one or more `<ed:Resource>` elements (see below). The Endpoint `MUST` declare at least one (top-level) resource.
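The top-level constraints listed above can be checked mechanically. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; it does not replace validation against `Endpoint-Description.xsd`. The function name `check_endpoint_description` and the wording of its error messages are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

ED_NS = "http://clarin.eu/fcs/endpoint-description"
NS = {"ed": ED_NS}


def check_endpoint_description(xml_text):
    """Return a list of violated top-level rules (hypothetical helper)."""
    errors = []
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{ED_NS}}}EndpointDescription":
        return ["root element is not ed:EndpointDescription"]
    if root.get("version") != "2":
        errors.append("@version must be 2")

    # Capability Identifiers must be present and unique.
    caps = [(c.text or "").strip()
            for c in root.findall("ed:Capabilities/ed:Capability", NS)]
    if not caps:
        errors.append("at least one ed:Capability is required")
    if len(caps) != len(set(caps)):
        errors.append("duplicate Capability Identifier")

    # Each Data View needs @id and @delivery-policy; MIME types must be unique.
    views = root.findall("ed:SupportedDataViews/ed:SupportedDataView", NS)
    if not views:
        errors.append("at least one ed:SupportedDataView is required")
    mime_types = [(v.text or "").strip() for v in views]
    if len(mime_types) != len(set(mime_types)):
        errors.append("duplicate MIME type in ed:SupportedDataViews")
    for view in views:
        view_id = view.get("id")
        if view_id is None or "," in view_id or ";" in view_id:
            errors.append("missing or invalid @id on ed:SupportedDataView")
        if view.get("delivery-policy") not in ("send-by-default", "need-to-request"):
            errors.append("missing or invalid @delivery-policy")

    # At least one top-level resource must be declared.
    if not root.findall("ed:Resources/ed:Resource", NS):
        errors.append("at least one top-level ed:Resource is required")
    return errors
```

An empty list means that none of the rules covered by this sketch were violated; it says nothing about rules that only the XML schema enforces.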
     247
     248The `<ed:Resource>` element contains a basic description of a resource that is available at the Endpoint. A resource is a searchable entity, e.g. a single corpus. The `<ed:Resource>` element has a mandatory `@pid` attribute that contains the persistent identifier of the resource. This value `MUST` be the same as the ''!MdSelfLink'' of the CMDI record describing the resource. The `<ed:Resource>` element contains the following children:
     249 * one or more `<ed:Title>` elements (`REQUIRED`) \\
     250   A human-readable title for the resource. A `REQUIRED` `@xml:lang` attribute indicates the language of the title. An English version of the title is `REQUIRED`. The list of titles `MUST NOT` contain duplicate entries for the same language.
     251 * zero or more `<ed:Description>` elements (`OPTIONAL`) \\
     252   An optional human-readable description of the resource. It `SHOULD` be at most one sentence. A `REQUIRED` `@xml:lang` attribute indicates the language of the description. If supplied, an English version of the description is `REQUIRED`. The list of descriptions `MUST NOT` contain duplicate entries for the same language.
     253 * zero or one `<ed:LandingPageURI>` element (`OPTIONAL`) \\
     254   A link to a website for the resource, e.g. a landing page that describes a corpus.
     255 * one `<ed:Languages>` element (`REQUIRED`) \\
     256   The (relevant) languages available within the resource. The `<ed:Languages>` element contains one or more `<ed:Language>` elements. The content of an `<ed:Language>` element `MUST` be an ISO 639-3 three-letter language code. This element should be repeated for all relevant languages available ''within'' the resource; however, this list `MUST NOT` contain duplicate entries.
     257 * one `<ed:AvailableDataViews>` element (`REQUIRED`) \\
     258   The Data Views that are available for the resource. The `<ed:AvailableDataViews>` element `MUST` carry a `@ref` attribute that contains a whitespace-separated list of id values. Each id value corresponds to the value of the `@id` attribute of the referenced `<ed:SupportedDataView>` element. \\
     259   In case of sub-resources, each Resource `SHOULD` support all Data Views that are supported by the parent resource. However, every resource `MUST` declare all available Data Views independently, i.e. there is no implicit inheritance semantic.
     260 * one `<ed:AvailableLayers>` element (`REQUIRED` if Endpoint supports ''Advanced Search'' capability). The `<ed:AvailableLayers>` element `MUST` carry a `@ref` attribute that contains a whitespace-separated list of id values. Each id value corresponds to the value of the `@id` attribute of the referenced `<ed:SupportedLayer>` element. \\
     261   In case of sub-resources, each Resource `SHOULD` support all Layers that are supported by the parent resource. However, every resource `MUST` declare all available Layers independently, i.e. there is no implicit inheritance semantic.
     262 * zero or one `<ed:Resources>` element (`OPTIONAL`) \\
     263   If a resource has searchable sub-resources, the Endpoint `MUST` supply additional finer-grained resource elements, which are wrapped in an `<ed:Resources>` element. A sub-resource is a searchable entity within a resource, e.g. a sub-corpus.
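The per-resource rules above, including the absence of implicit inheritance for sub-resources, can be sketched as a recursive check. This is an illustrative sketch only; `check_resources` is a hypothetical helper, and it covers titles, languages and Data View references but not descriptions or Layers.

```python
import xml.etree.ElementTree as ET

NS = {"ed": "http://clarin.eu/fcs/endpoint-description"}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def check_resources(root):
    """Check every ed:Resource, including sub-resources (hypothetical helper)."""
    known_ids = {v.get("id") for v in
                 root.findall("ed:SupportedDataViews/ed:SupportedDataView", NS)}
    errors = []

    def visit(resource):
        pid = resource.get("pid")
        if not pid:
            errors.append("ed:Resource without @pid")
        # An English title is required; one title per language at most.
        title_langs = [t.get(XML_LANG) for t in resource.findall("ed:Title", NS)]
        if "en" not in title_langs:
            errors.append(f"{pid}: English title is missing")
        if len(title_langs) != len(set(title_langs)):
            errors.append(f"{pid}: duplicate title language")
        # At least one language code, no duplicates.
        langs = [(lang.text or "").strip()
                 for lang in resource.findall("ed:Languages/ed:Language", NS)]
        if not langs or len(langs) != len(set(langs)):
            errors.append(f"{pid}: languages missing or duplicated")
        # Every referenced id must match a declared ed:SupportedDataView.
        views = resource.find("ed:AvailableDataViews", NS)
        refs = views.get("ref", "").split() if views is not None else []
        if not refs:
            errors.append(f"{pid}: no Data Views declared")
        for ref in refs:
            if ref not in known_ids:
                errors.append(f"{pid}: unknown Data View id {ref!r}")
        # No implicit inheritance: each sub-resource is checked on its own.
        for child in resource.findall("ed:Resources/ed:Resource", NS):
            visit(child)

    for top in root.findall("ed:Resources/ed:Resource", NS):
        visit(top)
    return errors
```

The function expects the parsed `<ed:EndpointDescription>` element, e.g. the result of `ET.fromstring(...)`, because it needs the `<ed:SupportedDataViews>` declarations to resolve the `@ref` values.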
     264
     265[=#REF_Example_4]Example 4:
     266{{{#!xml
     267<ed:EndpointDescription xmlns:ed="http://clarin.eu/fcs/endpoint-description" version="2">
     268    <ed:Capabilities>
     269        <ed:Capability>http://clarin.eu/fcs/capability/basic-search</ed:Capability>
     270    </ed:Capabilities>
     271    <ed:SupportedDataViews>
     272        <ed:SupportedDataView id="hits" delivery-policy="send-by-default">application/x-clarin-fcs-hits+xml</ed:SupportedDataView>
     273    </ed:SupportedDataViews>
     274    <ed:Resources>
     275        <!-- just one top-level resource at the Endpoint -->
     276        <ed:Resource pid="http://hdl.handle.net/4711/0815">
     277            <ed:Title xml:lang="de">Goethe Korpus</ed:Title>
     278            <ed:Title xml:lang="en">Goethe Corpus</ed:Title>
     279            <ed:Description xml:lang="de">Der Goethe Korpus des IDS Mannheim.</ed:Description>
     280            <ed:Description xml:lang="en">The Goethe corpus of IDS Mannheim.</ed:Description>
     281            <ed:LandingPageURI>http://repos.example.org/corpus1.html</ed:LandingPageURI>
     282            <ed:Languages>
     283                <ed:Language>deu</ed:Language>
     284            </ed:Languages>
     285            <ed:AvailableDataViews ref="hits" />
     286        </ed:Resource>
     287    </ed:Resources>
     288</ed:EndpointDescription>
     289}}}
     290[#REF_Example_4 Example 4] shows a simple Endpoint Description for an Endpoint that only supports the ''Basic Search'' Capability and only provides the Generic Hits Data View, which is indicated by an `<ed:SupportedDataView>` element. This element carries an `@id` attribute with a value of `hits`, the recommended value for the short identifier, and indicates a delivery policy of ''send-by-default'' by the `@delivery-policy` attribute. The Endpoint only provides one top-level resource, identified by the persistent identifier `http://hdl.handle.net/4711/0815`. The resource has a title as well as a description in German and English. A landing page is located at `http://repos.example.org/corpus1.html`. The predominant language in the resource contents is German. Only the Generic Hits Data View is supported for this resource, because the `<ed:AvailableDataViews>` element only references the `<ed:SupportedDataView>` element whose `@id` has the value `hits`.
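From a Client's perspective, auto-configuration amounts to reading these declarations back. A minimal sketch, assuming Python's standard `xml.etree.ElementTree`; the helper name `data_views_by_policy` is hypothetical and not part of any CLARIN-FCS library.

```python
import xml.etree.ElementTree as ET

NS = {"ed": "http://clarin.eu/fcs/endpoint-description"}


def data_views_by_policy(xml_text):
    """Group declared Data View ids by delivery policy (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    grouped = {"send-by-default": [], "need-to-request": []}
    for view in root.findall("ed:SupportedDataViews/ed:SupportedDataView", NS):
        # setdefault keeps unexpected policy values visible instead of failing.
        grouped.setdefault(view.get("delivery-policy"), []).append(view.get("id"))
    return grouped
```

For an Endpoint Description like the one in Example 4, the `send-by-default` list would contain only `hits` and the `need-to-request` list would be empty.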
     291
     292[=#REF_Example_5]Example 5:
     293{{{#!xml
     294<ed:EndpointDescription xmlns:ed="http://clarin.eu/fcs/endpoint-description" version="2">
     295    <ed:Capabilities>
     296        <ed:Capability>http://clarin.eu/fcs/capability/basic-search</ed:Capability>
     297    </ed:Capabilities>
     298    <ed:SupportedDataViews>
     299        <ed:SupportedDataView id="hits" delivery-policy="send-by-default">application/x-clarin-fcs-hits+xml</ed:SupportedDataView>
     300        <ed:SupportedDataView id="cmdi" delivery-policy="need-to-request">application/x-cmdi+xml</ed:SupportedDataView>
     301    </ed:SupportedDataViews>
     302    <ed:Resources>
     303        <!-- top-level resource 1 -->
     304        <ed:Resource pid="http://hdl.handle.net/4711/0815">
     305            <ed:Title xml:lang="de">Goethe Korpus</ed:Title>
     306            <ed:Title xml:lang="en">Goethe Corpus</ed:Title>
     307            <ed:Description xml:lang="de">Der Goethe Korpus des IDS Mannheim.</ed:Description>
     308            <ed:Description xml:lang="en">The Goethe corpus of IDS Mannheim.</ed:Description>
     309            <ed:LandingPageURI>http://repos.example.org/corpus1.html</ed:LandingPageURI>
     310            <ed:Languages>
     311                <ed:Language>deu</ed:Language>
     312            </ed:Languages>
     313            <ed:AvailableDataViews ref="hits" />
     314        </ed:Resource>
     315        <!-- top-level resource 2 -->
     316        <ed:Resource pid="http://hdl.handle.net/4711/0816">
     317            <ed:Title xml:lang="de">Zeitungskorpus des Mannheimer Morgen</ed:Title>
     318            <ed:Title xml:lang="en">Mannheimer Morgen newspaper Corpus</ed:Title>
     319            <ed:LandingPageURI>http://repos.example.org/corpus2.html</ed:LandingPageURI>
     320            <ed:Languages>
     321                <ed:Language>deu</ed:Language>
     322            </ed:Languages>
     323            <ed:AvailableDataViews ref="hits cmdi" />
     324            <ed:Resources>
     325                <!-- sub-resource 1 of top-level resource 2 -->
     326                <ed:Resource pid="http://hdl.handle.net/4711/0816-1">
     327                    <ed:Title xml:lang="de">Zeitungskorpus des Mannheimer Morgen (vor 1990)</ed:Title>
     328                    <ed:Title xml:lang="en">Mannheimer Morgen newspaper Corpus (before 1990)</ed:Title>
     329                    <ed:LandingPageURI>http://repos.example.org/corpus2.html#sub1</ed:LandingPageURI>
     330                    <ed:Languages>
     331                        <ed:Language>deu</ed:Language>
     332                    </ed:Languages>
     333                    <ed:AvailableDataViews ref="hits cmdi" />
     334                </ed:Resource>
     335                <!-- sub-resource 2 of top-level resource 2 -->
     336                <ed:Resource pid="http://hdl.handle.net/4711/0816-2">
     337                    <ed:Title xml:lang="de">Zeitungskorpus des Mannheimer Morgen (nach 1990)</ed:Title>
     338                    <ed:Title xml:lang="en">Mannheimer Morgen newspaper Corpus (after 1990)</ed:Title>
     339                    <ed:LandingPageURI>http://repos.example.org/corpus2.html#sub2</ed:LandingPageURI>
     340                    <ed:Languages>
     341                        <ed:Language>deu</ed:Language>
     342                    </ed:Languages>
     343                    <ed:AvailableDataViews ref="hits cmdi" />
     344                </ed:Resource>
     345            </ed:Resources>
     346        </ed:Resource>
     347    </ed:Resources>
     348</ed:EndpointDescription>
     349}}}
     350The more complex [#REF_Example_5 Example 5] shows an Endpoint Description for an Endpoint that, similar to [#REF_Example_4 Example 4], supports the ''Basic Search'' capability. In addition to the Generic Hits Data View, it also supports the CMDI Data View. The delivery policies are ''send-by-default'' for the Generic Hits Data View and ''need-to-request'' for the CMDI Data View. The Endpoint has two top-level resources (identified by the persistent identifiers `http://hdl.handle.net/4711/0815` and `http://hdl.handle.net/4711/0816`). The second top-level resource has two independently searchable sub-resources, identified by the persistent identifiers `http://hdl.handle.net/4711/0816-1` and `http://hdl.handle.net/4711/0816-2`. All resources are described using several properties, like title, landing page, etc. The first top-level resource provides only the Generic Hits Data View, while the other top-level resource, including its children, provides the Generic Hits and the CMDI Data Views.
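A Client that wants to offer every searchable entity has to walk the nested `<ed:Resources>` elements. The following sketch flattens the hierarchy into a list; `list_resources` and the tuple layout `(depth, pid, data_view_ids)` are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"ed": "http://clarin.eu/fcs/endpoint-description"}


def list_resources(xml_text):
    """Return (depth, pid, data_view_ids) for each resource in document order."""
    root = ET.fromstring(xml_text)
    result = []

    def walk(resources, depth):
        for res in resources.findall("ed:Resource", NS):
            views = res.find("ed:AvailableDataViews", NS)
            refs = views.get("ref", "").split() if views is not None else []
            result.append((depth, res.get("pid"), refs))
            # Sub-resources are wrapped in a nested ed:Resources element.
            nested = res.find("ed:Resources", NS)
            if nested is not None:
                walk(nested, depth + 1)

    top = root.find("ed:Resources", NS)
    if top is not None:
        walk(top, 0)
    return result
```

Because each resource declares its Data Views independently, the sketch reads `@ref` on every level and never copies values from the parent.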
     351
     352[=#REF_Example_6]Example 6:
     353{{{#!xml
     354<ed:EndpointDescription xmlns:ed="http://clarin.eu/fcs/endpoint-description" version="2">
     355    <ed:Capabilities>
     356        <ed:Capability>http://clarin.eu/fcs/capability/basic-search</ed:Capability>
     357        <ed:Capability>http://clarin.eu/fcs/capability/advanced-search</ed:Capability>
     358    </ed:Capabilities>
     359    <ed:SupportedDataViews>
     360        <ed:SupportedDataView id="hits" delivery-policy="send-by-default">application/x-clarin-fcs-hits+xml</ed:SupportedDataView>
     361    </ed:SupportedDataViews>
     362    <!-- ADV-FCS -->
     363    <ed:SupportedLayers>
     364        <ed:SupportedLayer id="l1" result-id="http://endpoint.example.org/Layers/orth1">orth</ed:SupportedLayer>
     365        <ed:SupportedLayer id="l2" result-id="http://endpoint.example.org/Layers/pos1" qualifier="x">pos</ed:SupportedLayer>
     366        <ed:SupportedLayer id="l3" result-id="http://endpoint.example.org/Layers/pos2" qualifier="y"
     367            alt-value-info="STTS tagset"
     368            alt-value-info-uri="http://repos.example.org/tagset_doc.html">pos</ed:SupportedLayer>
     369        <ed:SupportedLayer id="l4" result-id="http://endpoint.example.org/Layers/word" type="empty">word</ed:SupportedLayer>
     370        <ed:SupportedLayer id="l5" result-id="http://endpoint.example.org/Layers/lemma1">lemma</ed:SupportedLayer>
     371    </ed:SupportedLayers>
     372
     373    <ed:Resources>
     374        <!-- just one top-level resource at the Endpoint -->
     375        <ed:Resource pid="http://hdl.handle.net/4711/0815">
     376            <ed:Title xml:lang="de">Goethe Korpus</ed:Title>
     377            <ed:Title xml:lang="en">Goethe Corpus</ed:Title>
     378            <ed:Description xml:lang="de">Der Goethe Korpus des IDS Mannheim.</ed:Description>
     379            <ed:Description xml:lang="en">The Goethe corpus of IDS Mannheim.</ed:Description>
     380            <ed:LandingPageURI>http://repos.example.org/corpus1.html</ed:LandingPageURI>
     381            <ed:Languages>
     382                <ed:Language>deu</ed:Language>
     383            </ed:Languages>
     384            <ed:AvailableDataViews ref="hits" />
     385            <ed:AvailableLayers ref="l1 l2 l3 l4 l5" />
     386        </ed:Resource>
     387    </ed:Resources>
     388</ed:EndpointDescription>
     389}}}
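The layer-related constraints for the ''Advanced Search'' capability can be checked in the same spirit. The sketch below assumes that the layer elements live in the `ed` namespace like the rest of the Endpoint Description; `check_layers` is a hypothetical helper and the error texts are illustrative.

```python
import re
import xml.etree.ElementTree as ET

ED_NS = "http://clarin.eu/fcs/endpoint-description"
NS = {"ed": ED_NS}
# Only a-z, A-Z, 0-9 and hyphen are allowed in @qualifier.
QUALIFIER_RE = re.compile(r"[A-Za-z0-9-]*")


def check_layers(root):
    """Return violations of the layer rules (hypothetical helper)."""
    errors = []
    layers = root.findall("ed:SupportedLayers/ed:SupportedLayer", NS)
    result_ids = [layer.get("result-id") for layer in layers]
    if len(result_ids) != len(set(result_ids)):
        errors.append("duplicate @result-id")
    for layer in layers:
        for name in ("id", "result-id"):
            value = layer.get(name)
            if value is None or "," in value or ";" in value:
                errors.append(f"missing or invalid @{name}")
        qualifier = layer.get("qualifier")
        if qualifier is not None and not QUALIFIER_RE.fullmatch(qualifier):
            errors.append(f"invalid @qualifier {qualifier!r}")
    # Every id referenced by a resource must match a declared layer.
    known = {layer.get("id") for layer in layers}
    for avail in root.iter(f"{{{ED_NS}}}AvailableLayers"):
        for ref in avail.get("ref", "").split():
            if ref not in known:
                errors.append(f"unknown layer id {ref!r}")
    return errors
```

The function takes the parsed root element and returns an empty list when none of the checked rules are violated.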
    229390
    230391== Searching
     
    253414The ''Advanced Search'' capability allows searching in annotated data. Queries can span several annotation layers, e.g. the token and part-of-speech layers. CLARIN-FCS defines a set of searchable annotation layers with certain semantics and syntax. Endpoints `SHOULD` support as many annotation layers as possible, depending on the resource type.
    254415
    255 ==== Layers
     416==== Layers #layers
    256417||=Identifier =||=Annotation Tier Description                                =||=Syntax                          =||=Examples (without quotes)           =||
    257418|| `token`     || Appropriate tokenisation of resource, i.e. words            || ''String''                       || "Dog", "cat", "walked"               ||
     
    379540}}}
    380541
    381 ===== Advanced (ADV)
     542===== Advanced (ADV) #advancedDataView
    382543||=Description                  =|| The representation of the hit for Advanced Search ||
    383544||=MIME type                    =|| `application/x-clarin-fcs-adv+xml` ||
     
    449610
    450611
    451 == Operation ''explain''
     612== Operation ''explain'' #explain
    452613{{{
    453614#!div style="border: 1px solid #000000; font-size: 75%"
     
    542703}}}
    543704
    544 == Operation ''scan''
     705== Operation ''scan'' #scan
    545706The ''scan'' operation of the SRU protocol is currently not used in the ''Basic Search'' or ''Advanced Search'' capability of CLARIN-FCS. Future capabilities may use this operation; therefore, it is `NOT RECOMMENDED` for Endpoints to define custom extensions that use this operation.
    546707
    547 == Operation ''searchRetrieve''
     708== Operation ''searchRetrieve'' #searchRetrieve
    548709{{{
    549710#!div style="border: 1px solid #000000; font-size: 75%"