Changes between Version 61 and Version 62 of Taskforces/FCS/FCS-Specification-Draft
- Timestamp:
- 06/13/17 12:27:49 (7 years ago)
Taskforces/FCS/FCS-Specification-Draft
v61 v62
582 582 ||=XML Schema =|| [source:FederatedSearch/schema/Core_2/DataView-Advanced.xsd DataView-Advanced.xsd] ([source:FederatedSearch/schema/Core_2/DataView-Advanced.xsd?format=txt download]) ||
583 583
584 (v61) The ''Advanced (ADV)'' Data View serves as the natural serialization of search results for ''Advanced Search'' queries. The ADV Data View supports structured information in one or more annotation layers. The annotations are streams (ranges) over the signal in a stand-off-like format with start and end offsets. The list of `Segment` elements building a stream can be of type `item` for character-based streams or `timestamp` for audio streams (granularity up to 0.001s). The Endpoint is responsible for choosing the proper offsets for the segments. It `MUST` be possible to align the segments across all annotation layers. For character streams the recommendation is Unicode Normalization Form ''KC''. Segments `MAY` also have an endpoint-specific reference indicated by a URI that could be shown in the Aggregator, e.g. to open an audio player or other viewer with contents from the Search Engine. The list of `Layer` elements contains `Span` elements making references to the segments. A Span inherits the start and end offsets from its segments and contains the actual annotation as its content. It `MAY` also carry information about the original annotation value in an `@alt-value` attribute. The document order of the `Layer` elements defines the view order in the Aggregator. Each Layer has a ''Layer Type Identifier'' and a ''Layer Identifier''. The Endpoint `SHOULD` at least resturn all layers that were referenced in the query. It `MAY` return more layers. The attribute `@highlight` is used to mark Spans as hits. Multiple hit markers are supported and the Aggregator `MAY` display them in visually distinct ways. It is up to the Endpoint to decide what should be marked as a hit, but the recommendation is to mark everything referenced in the query.
584 (v62) The ''Advanced (ADV)'' Data View serves as the natural serialization of search results for ''Advanced Search'' queries. The ADV Data View supports structured information in one or more annotation layers. The annotations are streams (ranges) over the signal in a stand-off-like format with start and end offsets. The list of `Segment` elements building a stream can be of type `item` for character-based streams or `timestamp` for audio streams (granularity up to 0.001s). The Endpoint is responsible for choosing the proper offsets for the segments. It `MUST` be possible to align the segments across all annotation layers. For character streams the recommendation is Unicode Normalization Form ''KC''. Segments `MAY` also have an endpoint-specific reference indicated by a URI that could be shown in the Aggregator, e.g. to open an audio player or other viewer with contents from the Search Engine. The list of `Layer` elements contains `Span` elements making references to the segments. A `Span` inherits the start and end offsets from its segments and contains the actual annotation as its content. It `MAY` also carry information about the original annotation value in an `@alt-value` attribute. The document order of the `Layer` elements defines the view order in the Aggregator. Each Layer has a ''Layer type identifier'' and a ''Layer identifier''. The Endpoint `SHOULD` at least return all layers that were referenced in the Advanced Search query. It `MAY` return more layers. The attribute `@highlight` is used to mark Spans as hits. Multiple hit markers are supported and the Aggregator `MAY` display them in visually distinct ways. It is up to the Endpoint to decide what should be marked as a hit, but the recommendation is to mark everything referenced in the Advanced Search query.
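The stand-off relationship described in line 584 (Segments carry offsets, Spans only point at Segments) may be easier to follow with a small model. The following is a minimal sketch, not the normative data model: the class names, the single `ref` per Span, and the sample values are assumptions made for illustration, and the XML serialization defined by the schema is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Segment:
    """One stretch of the signal; the Endpoint chooses the offsets."""
    id: str
    start: int
    end: int


@dataclass(frozen=True)
class Span:
    """An annotation that references a Segment instead of storing offsets."""
    ref: str                          # id of the referenced Segment (assumed: exactly one)
    value: str                        # the annotation itself
    highlight: Optional[str] = None   # hit marker, if this Span is a hit
    alt_value: Optional[str] = None   # original annotation value, if any


@dataclass
class Layer:
    layer_id: str
    spans: List[Span] = field(default_factory=list)


def resolve(layer: Layer, segments: Dict[str, Segment]) -> List[Tuple[int, int, str]]:
    """Return (start, end, value) per Span, offsets inherited from its Segment."""
    return [(segments[s.ref].start, segments[s.ref].end, s.value) for s in layer.spans]


def hit_offsets(layer: Layer, segments: Dict[str, Segment]) -> List[Tuple[int, int]]:
    """Return the offsets of every Span that carries a highlight marker."""
    return [
        (segments[s.ref].start, segments[s.ref].end)
        for s in layer.spans
        if s.highlight is not None
    ]


# Hypothetical sample: two segments shared by two annotation layers.
segments = {s.id: s for s in (Segment("s1", 1, 3), Segment("s2", 5, 9))}
word = Layer("word", [Span("s1", "The"), Span("s2", "corpus", highlight="h1")])
pos = Layer("pos", [Span("s1", "DT"), Span("s2", "NN", highlight="h1")])
```

Because both layers resolve their offsets through the same `segments` table, they align by construction, which is one way to satisfy the alignment requirement stated above.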
585 585
586 586 {{{#!comment
… …
699 699 === Versioning and Extensions
700 700 ==== Backwards Compatibility #backwardsCompatibility
701 (v61) {{{
702 (v61) #!div style="border: 1px solid #000000; font-size: 75%"
703 (v61) TODO: check and proof-read
704 (v61) }}}
705 (v61)
706 (v61) Clients `MUST` be compatible with CLARIN-FCS 1.0, thus must implement SRU 1.2. If a Client uses CLARIN-FCS 1.0 to talk to an Endpoint, it `MUST NOT` use features beyond the Basic Search capability. Clients `MUST` implement a heuristic to automatically determine which CLARIN-FCS protocol version, i.e. which version of the SRU protocol, can be used to talk to an Endpoint.
701 (v62) Clients `MUST` be compatible with CLARIN-FCS 1.0, thus `MUST` implement SRU 1.2. If a Client uses CLARIN-FCS 1.0 to talk to an Endpoint, it `MUST NOT` use features beyond the Basic Search capability. Clients `MUST` implement a heuristic to automatically determine which CLARIN-FCS protocol version, i.e. which version of the SRU protocol, can be used to talk to an Endpoint.
707 702
708 703 Clients `MUST` be able to process the legacy XML namespaces:
… …
846 841 }}}
847 842
843-925 (v62)
{{{#!xml
<sruResponse:explainResponse>
  <sruResponse:version>2.0</sruResponse:version>
  <sruResponse:record>
    <sruResponse:recordSchema>http://explain.z3950.org/dtd/2.0/</sruResponse:recordSchema>
    <sruResponse:recordXMLEscaping>xml</sruResponse:recordXMLEscaping>
    <sruResponse:recordData>
      <zr:explain>
        <zr:serverInfo protocol="SRU" version="2.0" transport="http">
          <zr:host>127.0.0.1</zr:host>
          <zr:port>8080</zr:port>
          <zr:database>korp-endpoint</zr:database>
        </zr:serverInfo>
        <zr:databaseInfo>
          <zr:title lang="se">Språkbankens korpusar</zr:title>
          <zr:title lang="en" primary="true">The Språkbanken corpora</zr:title>
          <zr:description lang="se">Sök i Språkbankens korpusar.</zr:description>
          <zr:description lang="en" primary="true">Search in the Språkbanken corpora.</zr:description>
          <zr:author lang="en">Språkbanken (The Swedish Language Bank)</zr:author>
          <zr:author lang="se" primary="true">Språkbanken</zr:author>
        </zr:databaseInfo>
        <zr:indexInfo>
          <zr:set identifier="http://clarin.eu/fcs/resource" name="fcs">
            <zr:title lang="se">Clarins innehållssökning</zr:title>
            <zr:title lang="en" primary="true">CLARIN Content Search</zr:title>
          </zr:set>
          <zr:index search="true" scan="false" sort="false">
            <zr:title lang="en" primary="true">Words</zr:title>
            <zr:map primary="true">
              <zr:name set="fcs">words</zr:name>
            </zr:map>
          </zr:index>
        </zr:indexInfo>
        <zr:schemaInfo>
          <zr:schema identifier="http://clarin.eu/fcs/resource" name="fcs">
            <zr:title lang="en" primary="true">CLARIN Content Search</zr:title>
          </zr:schema>
        </zr:schemaInfo>
        <zr:configInfo>
          <zr:default type="numberOfRecords">250</zr:default>
          <zr:setting type="maximumRecords">1000</zr:setting>
        </zr:configInfo>
      </zr:explain>
    </sruResponse:recordData>
  </sruResponse:record>
  <sruResponse:echoedExplainRequest>
    <sruResponse:version>2.0</sruResponse:version>
  </sruResponse:echoedExplainRequest>
  <sruResponse:extraResponseData>
    <ed:EndpointDescription version="2">
      <ed:Capabilities>
        <ed:Capability>http://clarin.eu/fcs/capability/basic-search</ed:Capability>
        <ed:Capability>http://clarin.eu/fcs/capability/advanced-search</ed:Capability>
      </ed:Capabilities>
      <ed:SupportedDataViews>
        <ed:SupportedDataView id="hits" delivery-policy="send-by-default">application/x-clarin-fcs-hits+xml</ed:SupportedDataView>
        <ed:SupportedDataView id="adv" delivery-policy="send-by-default">application/x-clarin-fcs-adv+xml</ed:SupportedDataView>
        <ed:SupportedDataView id="cmdi" delivery-policy="need-to-request">application/x-cmdi+xml</ed:SupportedDataView>
      </ed:SupportedDataViews>
      <ed:SupportedLayers>
        <ed:SupportedLayer id="word" result-id="http://spraakbanken.gu.se/ns/fcs/layer/word">text</ed:SupportedLayer>
        <ed:SupportedLayer id="lemma" result-id="http://spraakbanken.gu.se/ns/fcs/layer/lemma">lemma</ed:SupportedLayer>
        <ed:SupportedLayer id="pos" result-id="http://spraakbanken.gu.se/ns/fcs/layer/pos">pos</ed:SupportedLayer>
      </ed:SupportedLayers>
      <ed:Resources>
        <ed:Resource pid="hdl:10794/suc">
          <ed:Title xml:lang="sv">SUC-korpusen</ed:Title>
          <ed:Title xml:lang="en">The SUC corpus</ed:Title>
          <ed:Description xml:lang="sv">Stockholm-Umeå-korpusen hos Språkbanken.</ed:Description>
          <ed:Description xml:lang="en">The Stockholm-Umeå corpus at Språkbanken.</ed:Description>
          <ed:LandingPageURI>https://spraakbanken.gu.se/resurser/suc</ed:LandingPageURI>
          <ed:Languages>
            <ed:Language>swe</ed:Language>
          </ed:Languages>
          <ed:AvailableDataViews ref="hits"/>
          <ed:AvailableLayers ref="word"/>
        </ed:Resource>
      </ed:Resources>
    </ed:EndpointDescription>
  </sruResponse:extraResponseData>
</sruResponse:explainResponse>
}}}

848 926 == Operation ''scan'' #scan
849 927 The ''scan'' operation of the SRU protocol is currently not used in the ''Basic Search'' or ''Advanced Search'' capability of CLARIN-FCS. Future capabilities may use this operation; therefore, it is `NOT RECOMMENDED` for Endpoints to define custom extensions that use this operation.
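Returning to the Endpoint Description carried in the explain response example added in v62: a Client might read the advertised capabilities, Data Views and layers roughly as sketched below. This is an illustration under stated assumptions, not a reference implementation. The example in the specification does not declare its namespaces, so the URI bound to `ed:` here is an assumption and the code deliberately matches elements by local name only. The function names `summarize` and `supports_advanced_search` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the Endpoint Description from the example.
# The xmlns:ed binding is an assumption; the example leaves it undeclared.
SAMPLE = """
<ed:EndpointDescription xmlns:ed="http://clarin.eu/fcs/endpoint-description" version="2">
  <ed:Capabilities>
    <ed:Capability>http://clarin.eu/fcs/capability/basic-search</ed:Capability>
    <ed:Capability>http://clarin.eu/fcs/capability/advanced-search</ed:Capability>
  </ed:Capabilities>
  <ed:SupportedDataViews>
    <ed:SupportedDataView id="hits" delivery-policy="send-by-default">application/x-clarin-fcs-hits+xml</ed:SupportedDataView>
    <ed:SupportedDataView id="adv" delivery-policy="send-by-default">application/x-clarin-fcs-adv+xml</ed:SupportedDataView>
  </ed:SupportedDataViews>
  <ed:SupportedLayers>
    <ed:SupportedLayer id="word" result-id="http://spraakbanken.gu.se/ns/fcs/layer/word">text</ed:SupportedLayer>
    <ed:SupportedLayer id="pos" result-id="http://spraakbanken.gu.se/ns/fcs/layer/pos">pos</ed:SupportedLayer>
  </ed:SupportedLayers>
</ed:EndpointDescription>
"""

ADVANCED_SEARCH = "http://clarin.eu/fcs/capability/advanced-search"


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def summarize(xml_text: str) -> dict:
    """Collect capabilities, Data Views and layers from an Endpoint Description."""
    info = {"capabilities": [], "data_views": {}, "layers": {}}
    for el in ET.fromstring(xml_text).iter():
        name = local_name(el.tag)
        text = (el.text or "").strip()
        if name == "Capability":
            info["capabilities"].append(text)
        elif name == "SupportedDataView":
            info["data_views"][el.get("id")] = text            # id -> MIME type
        elif name == "SupportedLayer":
            info["layers"][el.get("id")] = el.get("result-id")  # id -> result-id URI
    return info


def supports_advanced_search(info: dict) -> bool:
    """True if the Endpoint advertises the Advanced Search capability."""
    return ADVANCED_SEARCH in info["capabilities"]


info = summarize(SAMPLE)
```

Matching on local names keeps the sketch independent of the exact namespace URI; a production Client would more likely resolve the namespaces properly and validate against the schema.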