{{{ #!div class="system-message" '''NOTE''': This page is work-in-progress. Final draft is scheduled to be delivered by 2015-10-31. }}} [[PageOutline(1-6)]] = CLARIN Federated Content Search (CLARIN-FCS) - Core 2.0 = Introduction {{{ #!div style="border: 1px solid #000000; font-size: 75%" TODO: Proof-read/Check sub-sections. }}} The goal of the ''CLARIN Federated Content Search (CLARIN-FCS) - Core'' specification is to introduce an ''interface specification'' that decouples the ''search engine'' functionality from its ''exploitation'', i.e. user-interfaces, third-party applications, and to allow services to access heterogeneous search engines in a uniform way. == Terminology The key words `MUST`, `MUST NOT`, `REQUIRED`, `SHALL`, `SHALL NOT`, `SHOULD`, `SHOULD NOT`, `RECOMMENDED`, `MAY`, and `OPTIONAL` in this document are to be interpreted as described in [#REF_RFC_2119 RFC2119]. == Glossary Aggregator:: A module or service to dispatch queries to repositories and collect results. CLARIN-FCS, FCS:: CLARIN federated content search, an interface specification to allow searching within resource content of repositories. Client:: A software component, which implements the interface specification to query Endpoints, i.e. an aggregator or a user-interface. CQL:: Contextual Query Language, previously known as Common Query Language, is a domain specific language for representing queries to information retrieval systems such as search engines, bibliographic catalogs and museum collection information. Data View:: A Data View is a mechanism to support different representations of search results, e.g. a "hits with highlights" view, an image or a geolocation. Data View Payload, Payload:: The actual content encoded within a Data View, i.e. a CMDI metadata record or a KML encoded geolocation. Endpoint:: A software component, which implements the CLARIN-FCS interface specification and translates between CLARIN-FCS and a search engine. FCS-QL:: Federated Content Search Query Language is the query language used in the advanced CLARIN-FCS profile. It is derived from Corpus Workbench's [#REF_CQP_Tutorial CQP-TUTORIAL] Hit:: A piece of data returned by a Search Engine that matches the search criterion. What is considered a Hit highly depends on Search Engine. Interface Specification:: Common harmonized interface and suite of protocols that repositories need to implement. PID:: A Persistent identifier is a long-lasting reference to a digital object. Repository:: A software component at a CLARIN center that stores resources (= data) and information about these resources (= metadata). Repository Registry:: A separate service that allows registering Repositories and their Endpoints and provides information about these to other components, e.g. an Aggregator. The [http://centres.clarin.eu/ CLARIN Center Registry] is an implementation of such a repository registry. Resource:: A searchable and addressable entity at an Endpoint, such as a text corpus or a multi-modal corpus. Resource Fragment:: A smaller unit in a Resource, i.e. a sentence in a text corpus or a time interval in an audio transcription. Result Set:: An (ordered) set of hits that match a search criterion produced by a search engine as the result of processing a query. Search Engine:: A software component within a repository, that allows for searching within the repository contents. SRU:: Search and Retrieve via URL, is a protocol for Internet search queries. Originally introduced by Library of Congress [#REF_LOC_SRU_12 LOC-SRU12], later standardization process moved to OASIS [#REF_SRU_12 OASIS-SRU12]. == Normative References RFC2119[=#REF_RFC_2119]:: Key words for use in RFCs to Indicate Requirement Levels, IETF RFC 2119, March 1997, \\ [http://www.ietf.org/rfc/rfc2119.txt] XML-Namespaces[=#REF_XML_Namespaces]:: Namespaces in XML 1.0 (Third Edition), W3C, 8 December 2009, \\ [http://www.w3.org/TR/2009/REC-xml-names-20091208/] OASIS-SRU-Overview[=#REF_SRU_Overview]:: searchRetrieve: Part 0. Overview Version 1.0, OASIS, January 2013, \\ [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part0-overview/searchRetrieve-v1.0-os-part0-overview.doc] [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part0-overview/searchRetrieve-v1.0-os-part0-overview.html (HTML)], [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part0-overview/searchRetrieve-v1.0-os-part0-overview.pdf (PDF)] OASIS-SRU-APD[=#REF_SRU_APD]:: searchRetrieve: Part 1. Abstract Protocol Definition Version 1.0, OASIS, January 2013, \\ [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part1-apd/searchRetrieve-v1.0-os-part1-apd.doc] [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part1-apd/searchRetrieve-v1.0-os-part1-apd.html (HTML)] [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part1-apd/searchRetrieve-v1.0-os-part1-apd.pdf (PDF)] OASIS-SRU12[=#REF_SRU_12]:: searchRetrieve: Part 2. SRU searchRetrieve Operation: APD Binding for SRU 1.2 Version 1.0, OASIS, January 2013, \\ [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part2-sru1.2/searchRetrieve-v1.0-os-part2-sru1.2.doc] [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part2-sru1.2/searchRetrieve-v1.0-os-part2-sru1.2.html (HTML)] [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part2-sru1.2/searchRetrieve-v1.0-os-part2-sru1.2.pdf (PDF)] OASIS-SRU20[=#REF_SRU_20]:: searchRetrieve: Part 3. SRU searchRetrieve Operation: APD Binding for SRU 2.0 Version 1.0, OASIS, January 2013, \\ [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part3-sru2.0/searchRetrieve-v1.0-os-part3-sru2.0.doc] [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part3-sru2.0/searchRetrieve-v1.0-os-part3-sru2.0.html (HTML)] [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part3-sru2.0/searchRetrieve-v1.0-os-part3-sru2.0.pdf (PDF)] OASIS-CQL[=#REF_CQL]:: searchRetrieve: Part 5. CQL: The Contextual Query Language version 1.0, OASIS, January 2013, \\ [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part5-cql/searchRetrieve-v1.0-os-part5-cql.doc] [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part5-cql/searchRetrieve-v1.0-os-part5-cql.html (HTML)] [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part5-cql/searchRetrieve-v1.0-os-part5-cql.pdf (PDF)] SRU-Explain[=#REF_Explain]:: searchRetrieve: Part 7. SRU Explain Operation version 1.0, OASIS, January 2013, \\ [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part7-explain/searchRetrieve-v1.0-os-part7-explain.doc] [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part7-explain/searchRetrieve-v1.0-os-part7-explain.html (HTML)] [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part7-explain/searchRetrieve-v1.0-os-part7-explain.pdf (PDF)] SRU-Scan[=#REF_Scan]:: searchRetrieve: Part 6. SRU Scan Operation version 1.0, OASIS, January 2013, \\ [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part6-scan/searchRetrieve-v1.0-os-part6-scan.doc] [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part6-scan/searchRetrieve-v1.0-os-part6-scan.html (HTML)] [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part6-scan/searchRetrieve-v1.0-os-part6-scan.PDF (PDF)] LOC-SRU12[=#REF_LOC_SRU_12]:: SRU Version 1.2: SRU !Search/Retrieve Operation, Library of Congress, \\ [http://www.loc.gov/standards/sru/sru-1-2.html] LOC-DIAG[=#REF_LOC_DIAG]:: SRU Version 1.2: SRU Diagnostics List, Library of Congress,\\ [http://www.loc.gov/standards/sru/diagnostics/diagnosticsList.html] UD-POS[=#REF_UD_POS]:: Universal Dependencies, Universal POS tags, \\ [https://universaldependencies.github.io/docs/u/pos/index.html] SAMPA[=#REF_SAMPA]:: Dafydd Gibbon, Inge Mertins, Roger Moore (Eds.): Handbook of Multimodal and Spoken Language Systems. Resources, Terminology and Product Evaluation, Kluwer Academic Publishers, Boston MA, 2000, ISBN 0-7923-7904-7 CLARIN-FCS-!DataViews[=#REF_FCS_DataViews]:: CLARIN Federated Content Search (CLARIN-FCS) - Data Views, SCCTC FCS Task-Force, April 2014, \\ [https://trac.clarin.eu/wiki/FCS/Dataviews] == Non-Normative References CQP-TUTORIAL[=#REF_CQP_Tutorial]:: Evert et al.: The IMS Open Corpus Workbench (CWB) CQP Query Language Tutorial, CWB Version 3.0, February 2010, \\ [http://cwb.sourceforge.net/files/CQP_Tutorial/] RFC6838[=#REF_RFC_6838]:: Media Type Specifications and Registration Procedures, IETF RFC 6838, January 2013, \\ [http://www.ietf.org/rfc/rfc6838.txt] RFC3023[=#REF_RFC_3023]:: XML Media Types, IETF RFC 3023, January 2001, \\ [http://www.ietf.org/rfc/rfc3023.txt] == Typographic and XML Namespace conventions The following typographic conventions for XML fragments will be used throughout this specification: * `` \\ An XML element with the Generic Identifier ''Element'' that is bound to an XML namespace denoted by the prefix ''prefix''. * `@attr` \\ An XML attribute with the name ''attr'' {{{#!comment * `@prefix:attr` \\ An XML attribute with the name ''attr'' that is bound to an XML namespaces denoted by the prefix ''prefix''. }}} * `string` \\ The literal ''string'' must be used either as element content or attribute value. Endpoints and Clients `MUST` adhere to the [#REF_XML_Namespaces XML-Namespaces] specification. The CLARIN-FCS interface specification generally does not dictate whether XML elements should be serialized in their prefixed or non-prefixed syntax, but Endpoints `MUST` ensure that the correct XML namespace is used for elements and that XML namespaces are declared correctly. Clients `MUST` be agnostic regarding syntax for serializing the XML elements, i.e. if the prefixed or un-prefixed variant was used, and `SHOULD` operate solely on ''expanded names'', i.e. pairs of ''namespace name'' and ''local name''. The following XML namespace names and prefixes are used throughout this specification. The column "Recommended Syntax" indicates which syntax variant `SHOULD` be used by the Endpoint to serialize the XML response. ||=Prefix =||=Namespace Name =||=Comment =||=Recommended Syntax =|| || `fcs` || `http://clarin.eu/fcs/resource` || CLARIN-FCS Resources || prefixed || || `ed` || `http://clarin.eu/fcs/endpoint-description` || CLARIN-FCS Endpoint Description || prefixed || || `hits` || `http://clarin.eu/fcs/dataview/hits` || CLARIN-FCS Generic Hits Data View || prefixed || || `adv` || `http://clarin.eu/fcs/dataview/advanced` || CLARIN-FCS Advanced Data View || prefixed || || `sru` || `http://www.loc.gov/zing/srw/` || SRU || prefixed || || `diag` || `http://www.loc.gov/zing/srw/diagnostic/` || SRU Diagnostics || prefixed || || `zr` || `http://explain.z3950.org/dtd/2.0/` || SRU/ZeeRex Explain || prefixed || {{{ #!div style="border: 1px solid #000000; font-size: 75%" Careful with the SRU Namespaces; they probably need to be adjusted SRU 2.0 (=> OASIS). }}} = CLARIN-FCS Interface Specification The CLARIN-FCS Interface Specification defines a set of capabilities, an extensible result format and a set of required operations. CLARIN-FCS is built on the SRU/CQL standard and additional functionality required for CLARIN-FCS is added through SRU/CQL's extension mechanisms. Specifically, the CLARIN-FCS Interface Specification consists of two parts, a set of formats, and a transport protocol. The ''Endpoint'' component is a software component that acts as a bridge between a ''Client'' and a ''Search Engine'' and passes the requests sent by the ''Client'' to the ''Search Engine''. The ''Search Engine'' is a custom software component that allows the search of language resources in a Repository. The ''Endpoint'' implements the ''Transport Protocol'' and acts as a mediator between the CLARIN-FCS specific formats and the idiosyncrasies of ''Search Engines'' of the individual Repositories. The following figure illustrates the overall architecture: {{{ +---------+ | Client | +---------+ /|\ | | SRU / CQL | w/CLARIN-FCS extensions | \|/ +----------------------------------------------+ | | Endpoint /|\ | | | | | | ------------------- ------------------- | | | translate request | | translate result | | | ------------------- ------------------- | | | | | | \|/ | | +----------------------------------------------+ /|\ | | Search Engine specific protocols/formats | \|/ +---------------------------+ | Search Engine | +---------------------------+ }}} In general, the work flow in CLARIN-FCS is as follows: a Client submits a query to an Endpoint. The Endpoint translates the query from CQL or FCS-QL to the query dialect used by the Search Engine and submits the translated query to the Search Engine. The Search Engine processes the query and generates a result set, i.e. it compiles a set of hits that match the search criterion. The Endpoint then translates the results from the Search Engine-specific result set format to the CLARIN-FCS result format and sends them to the Client. == Discovery #Discovery The ''Discovery'' step allows a Client to gather information about an Endpoint, in particular which capabilities are supported or which resources are available for searching. === Capabilities A ''Capability'' defines a certain feature set that is part of CLARIN-FCS, e.g. what kind of queries are supported. Each Endpoint implements some (or all) of these Capabilities. The Endpoint will announce the capabilities it provides to allow a Client to auto-tune itself (see section [#endpointDescription Endpoint Description]). Each Capability is identified by a ''Capability Identifier'', which uses the URI syntax. The following Capabilities are defined in CLARIN-FCS defined: ||=Name =||=Capability Identifier =||=Summary =|| || ''Basic Search'' || `http://clarin.eu/fcs/capability/basic-search` || Simple full-text searching || || ''Advanced Search'' || `http://clarin.eu/fcs/capability/advanced-search` || Searching in structured and/or annotated data || Endpoints `MUST` implement the ''Basic Search'' Capability. Endpoints `MUST NOT` invent custom Capability Identifiers and `MUST` only use the values defined above. === Endpoint Description #endpointDescription {{{ #!div style="border: 1px solid #000000; font-size: 75%" Add stuff required for advanced capability. }}} == Searching In the ''Searching'' step the Client performs the actual search request to a to previously [#Discovery discovered] Endpoint. === Basic Search #basicSearch The ''Basic Search'' capability provides simple full-text search. Queries in Basic Search `MUST` be performed in the ''Contextual Query Language'' ([#REF_CQL OASIS-CQL]). The Endpoint `MUST` support ''term-only'' queries. The Endpoint `SHOULD` support ''terms'' combined with boolean operator queries (''AND'' and ''OR''), including sub-queries. An Endpoint `MAY` also support ''NOT'' or ''PROX'' operator queries. If an Endpoint does not support a query, i.e. the used operators are not supported by the Endpoint, it `MUST` return an appropriate error message using the appropriate SRU diagnostic ([#REF_LOC_DIAG LOC-DIAG]). The Endpoint `MUST` perform the query on an annotation tier that makes the most sense for the user, i.e. the textual content for a text corpus resource or the orthographic transcription of a spoken language corpus. Endpoints `SHOULD` perform the query case-sensitive. Examples for valid CQL queries for Basic Search are: {{{ cat "cat" cat AND dog "grumpy cat" "grumpy cat" AND dog "grumpy cat" OR "lazy dog" cat AND (mouse OR "lazy dog") }}} '''NOTE''': In CQL, a ''term'' can be a single token or a phrase, i.e. tokens separated by spaces. If a single ''term'' contains spaces, it needs to be quoted. \\ '''NOTE''': Endpoints `MUST` be able to parse all of CQL. If they don't support a certain CQL feature, they `MUST` generate an appropriate error message (see section [#sruCQL SRU/CQL]). Especially, if an Endpoint ''only'' supports ''Basic Search'', it `MUST NOT` silently accept queries that include CQL features besides ''term-only'' and ''terms'' combined with boolean operator queries, i.e. queries involving context sets, etc. === Advanced Search The ''Advanced Search'' capability allows searching in annotated data. Queries can be across annotation layer, e.g. token and part-of-speech layer. CLARIN-FCS defined a set of search-able annotation layers with certain semantics and syntax. Endpoints `SHOULD` support as many different, of course depending n the resource type, annotation layers as possible. ==== Layers ||=Identifier =||=Annotation Tier Description =||=Syntax =||=Examples (without quotes) =|| || `token` || Appropriate tokenisation of resource, i.e. words || ''String'' || "Dog", "cat", "walked" || || `lemma` || Lemmatisation of tokens || ''String'' || "good", "walking", "dog" || || `pos` || Part-of-Speech annotations || [#REF_UD_POS Universal POS tags] || "NOUN", "VERB", "ADJ" || || `orth` || Orthographic transcription of (mostly) spoken resources || ''String'' || "dug", "cat", "wolking" || || `norm` || Orthographic normalization of (mostly) spoken resources || ''String'' || "dog", "cat", "walking" || || `phonetic` || Phonetic transcription || [#REF_SAMPA SAMPA] || "'du:", "'vi:-d6 'ha:-b@n" || || `text` || Annotation tier that is used in [#basicSearch Basic Search] || ''String'' || "Dog", "cat" "walked" || The column Syntax describes the inventory of symbols that a Client `MUST` use with a corresponding annotation layer; the value ''String'' denotes that symbols are arbitrary Unicode Strings, i.e. no fixed inventory of symbols are defined. An Endpoint `SHOULD` provide an appropriate error, if a Client used an invalid value. ==== FCS-QL TODO: About FCS-QL === Result Format {{{ #!div style="border: 1px solid #000000; font-size: 75%" TODO: check and adjust / proof-read }}} The Search Engine will produce a result set containing several hits as the outcome of processing a query. The Endpoint `MUST` serialize these hits in the CLARIN-FCS result format. Endpoints are `REQUIRED` to adhere to the principle, that ''one'' hit `MUST` be serialized as ''one'' CLARIN-FCS result record and `MUST NOT` combine several hits in one CLARIN-FCS result record. E.g., if a query matches five different sentences within one text (= the resource), the Endpoint must serialize them as five SRU records each with one Hit each referencing the same containing Resource (see section [#searchRetrieve Operation ''searchRetrieve'']). CLARIN-FCS uses a customized format for returning results. ''Resource'' and ''Resource Fragments'' serve as containers for hit results, which are presented in one or more ''Data View''. The following section describes the Resource format and Data View format and section [#searchRetrieve Operation ''searchRetrieve''] will describe, how hits are embedded within SRU responses. ==== Resource and !ResourceFragment To encode search results, CLARIN-FCS supports two building blocks: Resources:: A ''Resource'' is a ''searchable'' and ''addressable'' entity at the Endpoint, such as a text corpus or a multi-modal corpus. A resource `SHOULD` be a self-contained unit, i.e. not a single sentence in a text corpus or a time interval in an audio transcription, but rather a complete document from a text corpus or a complete audio transcription. Resource Fragments:: A ''Resource Fragment'' is a smaller unit in a ''Resource'', i.e. a sentence in a text corpus or a time interval in an audio transcription. A Resource `SHOULD` be the most precise unit of data that is directly addressable as a "whole". A Resource `SHOULD` contain a Resource Fragment, if the hit consists of just a part of the Resource unit (for example if the hit is a sentence within a large text). A Resource Fragment `SHOULD` be addressable within a resource, i.e. it has an offset or a resource-internal identifier. Using Resource Fragments is `OPTIONAL`, but Endpoints are encouraged to use them. If the Endpoint encodes a hit with a Resource Fragment, the actual hit `SHOULD` be encoded as a Data View that within the Resource Fragment. Endpoints `SHOULD` always provide a link to the resource itself, i.e. each Resource or Resource Fragment `SHOULD` be identified by a persistent identifier or providing a URI, that is unique for Endpoint. Even if direct linking is not possible, i.e. due to licensing issues, the Endpoints `SHOULD` provide a URI to link to a web-page describing the corpus or collection, including instruction on how to obtain it. Endpoints `SHOULD` provide links that are as specific as possible (and logical), i.e. if a sentence within a resource cannot be addressed directly, the Resource Fragment `SHOULD NOT` contain a persistent identifier or a URI. If the Endpoint can provide both, a persistent identifier as well as a URI, for either Resource or Resource Fragment, they `SHOULD` provide both. When working with results, Clients `SHOULD` prefer persistent identifiers over regular URIs. Resource and Resource Fragment are serialized in XML and Endpoints `MUST` generate responses that are valid according to the XML schema "[source:FederatedSearch/schema/Resource.xsd Resource.xsd]" ([source:FederatedSearch/schema/Resource.xsd?format=txt download]). A Resource is encoded in the form of a `` element, a ''Resource Fragment'' in the form of a `` element. The content of a Data View is wrapped in a `` element. `` is the top-level element and `MAY` contain zero or more `` elements and `MAY` contain zero or more `` elements. A `` element `MUST` contain one or more `` elements. The elements ``, `` and `` `MAY` carry a `@pid` and/or a `@ref` attribute, which allows linking to the original data represented by the Resource, Resource Fragment, or Data View. A `@pid` attribute `MUST` contain a valid persistent identifier, a `@ref` `MUST` contain valid URI, i.e. a "plain" URI without the additional semantics of being a persistent reference. If the Endpoint cannot provide a `@pid` attribute for a ``, they `SHOULD` provide a `@ref` attribute. Endpoint `SHOULD` add either a `@pid` or `@ref` attribute to either the `` or the `` element, if possible to both elements. Endpoints are `RECOMMENDED` to give `@pid` attributes, if they can provide them. Endpoints `MUST` use the identifier `http://clarin.eu/fcs/resource` for the ''responseItemType'' (= content for the `` element) in SRU responses. Endpoints `MAY` serialize hits as multiple Data Views, however they `MUST` provide the Generic Hits (HITS) Data View either encoded as a Resource Fragment (if applicable), or otherwise within the Resource (if there is no reasonable Resource Fragment). Other Data Views `SHOULD` be put in a place that is logical for their content (as is to be determined by the Endpoint), e.g. a metadata Data View would most likely be put directly below Resource and a Data View representing some annotation layers directly around the hit is more likely to belong within a Resource Fragment. [=#REF_Example_1]Example 1: {{{#!xml }}} [#REF_Example_1 Example 1] shows a simple hit, which is encoded in one Data View of type ''Generic Hits'' embedded within a Resource. The type of the Data View is identified by the MIME type `application/x-clarin-fcs-hits+xml`. The Resource is referenceable by the persistent identifier `http://hdl.handle.net/4711/08-15`. [=#REF_Example_2]Example 2: {{{#!xml }}} [#REF_Example_2 Example 2] shows a hit encoded as a Resource Fragment embedded within a Resource. The actual hit is again encoded as one Data View of type ''Generic Hits''. The hit is not directly referenceable, but the Resource, in which hit occurred, is referenceable by the persistent identifier `http://hdl.handle.net/4711/08-15`. In contrast to [#REF_Example_1 Example 1], the Endpoint decided to provide a "semantically richer" encoding and embedded the hit using a Resource Fragment within the Resource to indicate that the hit is a part of a larger resource, e.g. a sentence in a text document. [=#REF_Example_3]Example 3: {{{#!xml }}} The most complex [#REF_Example_3 Example 3] is similar to [#REF_Example_2 Example 2], i.e. it shows a hit is encoded as one ''Generic Hits'' Data View in a Resource Fragment, which is embedded in a Resource. In contrast to Example 2, another Data View of type ''CMDI'' is embedded directly within the Resource. The Endpoint can use this type of Data View to directly provide CMDI metadata about the Resource to Clients. All entities of the Hit can be referenced by a persistent identifier and a URI. The complete Resource is referenceable by either the persistent identifier `http://hdl.handle.net/4711/08-15` or the URI `http://repos.example.org/file/text_08_15.html` and the CMDI metadata record in the CMDI Data View is referenceable either by the persistent identifier `http://hdl.handle.net/4711/08-15-1` or the URI `http://repos.example.org/file/08_15_1.cmdi`. The actual hit in the Resource Fragment is also directly referenceable by either the persistent identifier `http://hdl.handle.net/4711/00-15-2` or the URI `http://repos.example.org/file/text_08_15.html#sentence2`. ==== Data View A ''Data View'' serves as a container for encoding the actual search results (the data fragments relevant to search) within CLARIN-FCS. Data Views are designed to allow for different representations of results, i.e. they are deliberately kept open to allow further extensions with more supported Data View formats. This specification only defines a ''most basic'' Data View for representing search results, called ''Generic Hits'' (see below). More Data Views are defined in the supplementary specification [#REF_FCS_DataViews CLARIN-FCS-DataViews]. The content of a Data View is called ''Payload''. Each Payload is typed and the type of the Payload is recorded in the `@type` attribute of the `` element. The Payload type is identified by a MIME type ([#REF_RFC_6838 RFC6838], [#REF_RFC_3023 RFC3023]). If no existing MIME type can be used, implementers `SHOULD` define a proper private mime type. The Payload of a Data View can either be deposited ''inline'' or by ''reference''. In the case of ''inline'', it `MUST` be serialized as an XML fragment below the `` element. This is the preferred method for payloads that can easily be serialized in XML. Deposition by ''reference'' is meant for content that cannot easily be deposited inline, i.e. binary content (like images). In this case, the Data View `MUST` include a `@ref` or `@pid` attribute that links location for Clients to download the payload. This location `SHOULD` be ''openly accessible'', i.e. data can be downloaded freely without any need to perform a login. Data Views are classified into a ''send-by-default'' and a ''need-to-request'' delivery policy. In case of the ''send-by-default'' delivery policy, the Endpoint `MUST` send the Data View automatically, i.e. Endpoints `MUST` unconditionally include the Data View when they serialize a response to a search request. In the case of ''need-to-request'', the Client must explicitly request the Endpoint to include this Data View in the response. This enables the Endpoint to not generate and serialize Data Views that are "expensive" in terms of computational power or bandwidth for every response. To request such a Data View, a Client `MUST` submit a comma separated list of Data View identifiers (see section [#endpointDescription Endpoint Description]) in the `x-fcs-dataviews` extra request parameter with the ''searchRetrieve'' request. If a Client requests a Data View that is not valid for the search context, the Endpoint `MUST` generate a non-fatal diagnostic `http://clarin.eu/fcs/diagnostic/4` ("Requested Data View not valid for this resource"). The details field of the diagnostic `MUST` contain the MIME type of the Data View that was not valid. If more than one requested Data View is invalid, the Endpoint `MUST` generate a ''separate'' non-fatal diagnostic `http://clarin.eu/fcs/diagnostic/4` for each of the requested Data Views. The description of every Data View contains a recommendation as to how the Endpoint should handle the payload delivery, i.e. if a Data View is by default considered ''send-by-default'' or ''need-to-request''. Endpoint `MAY` choose to implement different policy. The relevant information which policy is implemented by an Endpoint for a specific Data View is part of the ''Endpoint Description'' (see section [#endpointDescription Endpoint Description]). For each Data View, a ''Recommended Short Identifier'' is defined, that Endpoint `SHOULD` use for an identifier of the Data View in the list of supported Data Views in the ''Endpoint Description'' The ''Generic Hits'' Data View is mandatory, thus all Endpoints `MUST` implement this it and provide search results represented in the ''Generic Hits'' Data View. Endpoints `MUST` implement the ''Generic Hits'' Data View with the ''send-by-default'' delivery policy. '''NOTE''': The examples in the following sections ''show only'' the payload with the enclosing `` element of a Data View. Of course, the Data View must be embedded either in a `` or a `` element. The `@pid` and `@ref` attributes have been omitted for all ''inline'' payload types. ===== Generic Hits (HITS) ||=Description =|| The representation of the hit || ||=MIME type =|| `application/x-clarin-fcs-hits+xml` || ||=Payload Disposition =|| ''inline'' || ||=Payload Delivery =|| ''send-by-default'' (`REQUIRED`) || ||=Recommended Short Identifier =|| `hits` (`RECOMMENDED`) || The ''Generic Hits'' Data View serves as the ''most basic'' agreement in CLARIN-FCS for serialization of search results and `MUST` be implemented by all Endpoints. In many cases, this Data View can only serve as an (lossy) approximation, because resources at Endpoints are very heterogeneous. For instance, the Generic Hits Data View is probably not the best representation for a hit result in a corpus of spoken language, but an architecture like CLARIN-FCS requires one common representation to be implemented by all Endpoints, therefore this Data View was defined. The Generic Hits Data View supports multiple markers for supplying highlighting for an individual hit, e.g. if a query contains a (boolean) conjunction, the Endpoint can use multiple markers to provide individual highlights for the matching terms. An Endpoint `MUST NOT` use this Data View to aggregate several hits within one resource. Each hit `SHOULD` be presented within the context of a complete sentence. If that is not possible due to the nature of the type of the resource, the Endpoint `MUST` provide an equivalent reasonable unit of context (e.g. within a phrase of an orthographic transcription of an utterance). The `` element within the `` element is not enforced by the XML schema, but Endpoints are `RECOMMENDED` to use it. The XML fragment of the Generic Hits payload `MUST` be valid according to the XML schema "[source:FederatedSearch/schema/DataView-Hits.xsd DataView-Hits.xsd]" ([source:FederatedSearch/schema/DataView-Hits.xsd?format=txt download]). * Example (single hit marker): {{{#!xml The quick brown fox jumps over the lazy dog. }}} * Example (multiple hit markers): {{{#!xml The quick brown fox jumps over the lazy dog. }}} ===== Advanced (ADV) ||=Description =|| The representation of the hit for Advanced Search || ||=MIME type =|| `application/x-clarin-fcs-adv+xml` || ||=Payload Disposition =|| ''inline'' || ||=Payload Delivery =|| ''send-by-default'' (`REQUIRED`) || ||=Recommended Short Identifier =|| `adv` (`RECOMMENDED`) || {{{ #!div style="border: 1px solid #000000; font-size: 75%" TODO: describe! }}} === Versioning and Extensions ==== Backwards Compatibility {{{ #!div style="border: 1px solid #000000; font-size: 75%" TODO: check and proof-read }}} Endpoints `SHOULD` also be compatible to CLARIN-FCS 1.0, thus support the SRU 1.2 protocol. Clients `MUST` be compatible to CLARIN-FCS 1.0, thus must implement SRU 1.2. If a Client uses CLARIN-FCS 1.0 to talk to an Endpoint, it `MUST NOT` use features beyond the Basic Search capability. Clients `MUST` implement a heuristic to automatically determine which CLARIN-FCS protocol version, i.e. which version of the SRU protocol, can be used talk an Endpoint. Pseudo algorithm for version detection heuristic: * Send ''explain'' request without `version` and `operation` parameter * Check SRU response for content of the element `/` ==== Endpoint Custom Extensions Endpoints can add custom extensions, i.e. custom data, to the Result Format. This extension mechanism can for example be used to provide hints for an (XSLT/XQuery) application that works directly on CLARIN-FCS, e.g. to allow it to generate back and forward links to navigate in a result set. An Endpoint `MAY` add arbitrary XML fragments to the extension hooks provided in the `` element (see the XML schema for "Resource.xsd"). The XML fragment for the extension `MUST` use a custom XML namespace name for the extension. Endpoints `MUST NOT` use XML namespace names that start with the prefixes `http://clarin.eu`, `http://www.clarin.eu/`, `https://clarin.eu` or `https://www.clarin.eu/`. A Client `MUST` ignore any custom extensions it does not understand and skip over these XML fragments when parsing the Endpoint's response. The non-normative appendix contains an [#extensionExample example], how an extension could be implemented. = CLARIN-FCS to SRU/CQL binding == SRU/CQL #sruCQL {{{ #!div style="border: 1px solid #000000; font-size: 75%" SRU 2.0 requirement }}} CLARIN-FCS uses SRU 2.0 (!Search/Retrieve via URL) as underlaying communication protocol. SRU specifies a general communication protocol for searching and retrieving records and CQL (Contextual Query Language) specifies extensible query language. SRU 2.0 allows using additional custom query languages. CLARIN-FCS Core 2.0 uses FCS-QL for the Advanced Search capability. Endpoints and Clients `MUST` implement the SRU/CQL protocol suite as defined in [#REF_SRU_Overview OASIS-SRU-Overview], [#REF_SRU_APD OASIS-SRU-APD], [#REF_CQL OASIS-CQL], [#REF_Explain SRU-Explain], [#REF_Scan SRU-Scan], especially with respect to: * Data Model, * Query Model, * Processing Model, * Result Set Model, and * Diagnostics Model Endpoints and Clients `MUST` implement the APD Binding for SRU 2.0, as defined in [#REF_SRU_20 OASIS-SRU-20]. \\ Clients `MUST` implement APD Binding for SRU 1.2, as defined in [#REF_SRU_12 OASIS-SRU-12]. \\ Endpoints `SHOULD` implement APD Binding for SRU 1.2, as defined in [#REF_SRU_12 OASIS-SRU-12]. \\ Endpoints and Clients `MAY` also implement APD binding for version 1.1. {{{ #!div style="border: 1px solid #000000; font-size: 75%" TODO: think about XML namespaces }}} Endpoints and Clients `MUST` use the following XML namespace names (namespace URIs) for serializing responses: * `http://www.loc.gov/zing/srw/` for SRU response documents, and * `http://www.loc.gov/zing/srw/diagnostic/` for diagnostics within SRU response documents. CLARIN-FCS deviates from the OASIS specification [#REF_SRU_Overview OASIS-SRU-Overview] and [#REF_SRU_12 OASIS-SRU-12] to ensure backwards comparability with SRU 1.2 services as they were defined by the [#REF_LOC_SRU_12 LOC-SRU12]. Endpoints or Clients `MUST` support CQL conformance ''Level 2'' (as defined in [#REF_OASIS_CQL OASIS-CQL, section 6]), i.e. be able to ''parse'' (Endpoints) or ''serialize'' (Clients) all of CQL and respond with appropriate error messages to the search/retrieve protocol interface. '''NOTE''': this does ''not imply'', that Endpoints are ''required'' to support all of CQL, but rather that they are able to ''parse'' all of CQL and generate the appropriate error message, if a query includes a feature they do not support. Endpoints `MUST` generate diagnostics according to [#REF_SRU_20 OASIS-SRU-20, Appendix D] for error conditions or to indicate unsupported features. Unfortunately, the OASIS specification does not provides a comprehensive list of diagnostics for CQL-related errors. Therefore, Endpoints `MUST` use diagnostics from [#REF_LOC_DIAG LOC-DIAG, section "Diagnostics Relating to CQL"] for CQL related errors. Endpoints `MUST` support the HTTP GET [#REF_SRU_20 OASIS-SRU-20, Appendix B.1] and HTTP POST [#REF_SRU_20 OASIS-SRU-20, Appendix B.2] lower level protocol binding. Endpoints `MAY` also support the SOAP [#REF_SRU_20 OASIS-SRU-20, Appendix B.3] binding. == Operation ''explain'' {{{ #!div style="border: 1px solid #000000; font-size: 75%" TODO: adjust for advanced stuff! }}} The ''explain'' operation of the SRU protocol serves to announce server capabilities and to allow clients to configure themselves automatically. This operation is used similarly. The Endpoint `MUST` respond to a ''explain'' request by a proper ''explain'' response. As per [#REF_Explain SRU-Explain], the response `MUST` contain one `` element that contains an ''SRU Explain'' record. The `` element `MUST` contain the literal `http://explain.z3950.org/dtd/2.0/`, i.e. the official ''identifier'' for Explain records. According to the Capabilities supported by the Endpoint the Explain record `MUST` contain the following elements: ''Basic-Search'' Capability:: `` as defined in [#REF_Explain SRU-Explain] (`REQUIRED`) \\ `` as defined in [#REF_Explain SRU-Explain] (`REQUIRED`) \\ `` as defined in [#REF_Explain SRU-Explain] (`REQUIRED`). This element `MUST` contain an element `` with an `@identifier` attribute with a value of `http://clarin.eu/fcs/resource` and an `@name` attribute with a value of `fcs`. \\ `` is `OPTIONAL` \\ Other capabilities may define how the `` element is to be used, therefore it is `NOT RECOMMENDED` for Endpoints to use it in custom extensions. To support auto-configuration in CLARIN-FCS, the Endpoint `MUST` provide support ''Endpoint Description''. The Endpoint Description is included in explain response utilizing SRUs extension mechanism, i.e. by embedding an XML fragment into the `` element. The Endpoint `MUST` include the Endpoint Description ''only'' if the Client performs an explain request with the ''extra request parameter'' `x-fcs-endpoint-description` with a value of `true`. If the Client performs an explain request ''without'' supplying this extra request parameter the Endpoint `MUST NOT` include the Endpoint Description. The format of the Endpoint Description XML fragment is defined in [#endpointDescription Endpoint Description]. The following example shows a request and response to an ''explain'' request with added extra request parameter `x-fcs-endpoint-description`: * HTTP GET request: Client → Endpoint: {{{#!sh http://repos.example.org/fcs-endpoint?operation=explain&version=1.2&x-fcs-endpoint-description=true }}} * HTTP Response: Endpoint → Client: {{{#!xml 1.2 http://explain.z3950.org/dtd/2.0/ xml repos.example.org 80 fcs-endpoint Goethe Corpus Goethe Korpus Der Goethe Korpus des IDS Mannheim. The Goethe corpus of IDS Mannheim. CLARIN Federated Content Search 250 1000 1.2 http://repos.example.org/fcs-endpoint http://clarin.eu/fcs/capability/basic-search application/x-clarin-fcs-hits+xml Goethe Corpus Goethe Korpus Der Goethe Korpus des IDS Mannheim. The Goethe corpus of IDS Mannheim. http://repos.example.org/corpus1.html deu }}} == Operation ''scan'' The ''scan'' operation of the SRU protocol is currently not used in the ''Basic Search'' or ''Advanced Search'' capability of CLARIN-FCS. Future capabilities may use this operation, therefore it `NOT RECOMMENDED` for Endpoints to define custom extensions that use this operation. == Operation ''searchRetrieve'' {{{ #!div style="border: 1px solid #000000; font-size: 75%" Align with newly introduced section "Search Phase" \\ Define String for SRU query-lanaguge paramater ("fcs"? "clarin-fcs"?) }}} The ''searchRetrieve'' operation of the SRU protocol is used for searching in the Resources that are provided by the Endpoint. The SRU protocol defines the serialization of request and response formats in [#REF_SRU_20 OASIS-SRU-20] for SRU version 2.0 and [#REF_SRU_12 OASIS-SRU-12] for SRU version 1.2. An Endpoint `MUST` respond in the correct format, i.e. when Endpoint also supports SRU 1.2 and the request is issued in SRU version 1.2, the response must be encoded accordingly. In SRU, search result hits are encoded down to a record level, i.e. the `` element, and SRU allows records to be serialized in various formats, so called ''record schemas'' Endpoints `MUST` support the CLARIN-FCS record schema (see section [#resultFormat Result Format]) and `MUST` use the value `http://clarin.eu/fcs/resource` for the ''responseItemType'' ("record schema identifier"). Endpoints `MUST` represent exactly ''one hit'' within the Resource as one SRU record, i.e. `` element. The following example shows a request and response to a ''searchRetrieve'' request with a ''term-only'' query for "cat": * HTTP GET request: Client → Endpoint: {{{#!sh http://repos.example.org/fcs-endpoint?operation=searchRetrieve&version=1.2&query=cat }}} * HTTP Response: Endpoint → Client: {{{#!xml 1.2 6 http://clarin.eu/fcs/resource xml The quick brown cat jumps over the lazy dog. 1 1.2 cat cql.serverChoice = cat 1 http://repos.example.org/fcs-endpoint }}} In general, the Endpoint is `REQUIRED` to accept an ''unrestricted search'' and `SHOULD` perform the search operation on ''all'' Resources that are available at the Endpoint. If that is for some reason not feasible, e.g. performing an unrestricted search would allocate too many resources, the Endpoint `MAY` independently restrict the search to a scope that it can handle. If it does so, it `MUST` issue a non-fatal diagnostics `http://clarin.eu/fcs/diagnostic/2` ("Resource set too large. Query context automatically adjusted."). The details field of diagnostics `MUST` contain the persistent identifier of the resources to which the query scope was limited to. If the Endpoint limits the query scope to more than one resource, it `MUST` generate a ''separate'' non-fatal diagnostic `http://clarin.eu/fcs/diagnostic/2` for each of the resources. The Client can request the Endpoint to ''restrict the search'' to a sub-resource of these Resources. In this case, the Client `MUST` pass a comma-separated list of persistent identifiers in the `x-fcs-context` extra request parameter of the ''searchRetrieve'' request. The Endpoint `MUST` then restrict the search to those Resources, which are identified by the persistent identifiers passed by the Client. If a Client requests too many resources for the Endpoint to handle with `x-fcs-context`, the Endpoint `MAY` issue a fatal diagnostic `http://clarin.eu/fcs/diagnostic/3` ("Resource set too large. Cannot perform Query.") and terminate processing. Alternatively, the Endpoint `MAY` also automatically adjust the scope and issue a non-fatal diagnostic `http://clarin.eu/fcs/diagnostic/2` (see above). And Endpoint `MUST NOT` issue a `http://clarin.eu/fcs/diagnostic/3` diagnostic in response to a request, if a Client performed the request ''without'' the `x-fcs-context` extra request parameter. The Client can extract all valid persistent identifiers from the `@pid` attribute of the `` element, obtained by the ''explain'' request (see section [#explain Operation ''explain''] and section [#endpointDescription Endpoint Description]). The list of persistent identifiers can get extensive, but a Client can use the HTTP POST method instead of HTTP GET method for submitting the request. For example, to restrict the search to the Resource with the persistent identifier `http://hdl.handle.net/4711/0815` the Client must issue the following request: {{{#!sh http://repos.example.org/fcs-endpoint?operation=searchRetrieve&version=1.2&query=cat&x-fcs-context=http://hdl.handle.net/4711/0815 }}} To restrict the search to the Resources with the persistent identifier `http://hdl.handle.net/4711/0815` and `http://hdl.handle.net/4711/0816-2` the Client must issue the following request: {{{#!sh http://repos.example.org/fcs-endpoint?operation=searchRetrieve&version=1.2&query=cat&x-fcs-context=http://hdl.handle.net/4711/0815,http://hdl.handle.net/4711/0816-2 }}} If an invalid persistent identifier is passed by the Client, the Endpoint `MUST` issue a `http://clarin.eu/fcs/diagnostic/1` diagnostic, i.e. add the appropriate XML fragment to the `` element of the response. The Endpoint `MAY` treat this condition as fatal, i.e. just issue the diagnostic and perform no search, or it `MAY` treat it as non-fatal and perform the search. If a Client wants to request one or more Data Views, that are handled by Endpoint with the ''need-to-request'' delivery policy, it `MUST` pass a comma-separated list of ''Data View identifier'' in the `x-fcs-dataviews` extra request parameter of the 'searchRetrieve' request. A Client can extract valid values for the ''Data View identifiers'' from the `@id` attribute of the `` elements in the Endpoint Description of the Endpoint (see section [#explain Operation ''explain''] and section [#endpointDescription Endpoint Description]). For example, to request the CMDI Data View from an Endpoint that has an Endpoint Description, as described in [#REF_Example_5 Example 5], a Client would need to use the ''Data View identifier'' `cmdi` and submit the following request: {{{#!sh http://repos.example.org/fcs-endpoint?operation=searchRetrieve&version=1.2&query=cat&x-fcs-dataviews=cmdi }}} If an invalid ''Data View identifier'' is passed by the Client, the Endpoint `MUST` issue a `http://clarin.eu/fcs/diagnostic/4`diagnostic, i.e. add the appropriate XML fragment to the `` element of the response. The Endpoint `MAY` treat this condition as fatal, i.e. simply issue the diagnostic and perform no search, or it `MAY` treat it a non-fatal and perform the search. = Normative Appendix {{{ #!div style="border: 1px solid #000000; font-size: 75%" TODO: check and proof-read all sub-sections. }}} == List of extra request parameters The following extra request parameters are used in CLARIN-FCS. The column ''SRU operations'' lists the SRU operation, for which this extra request parameter is to be used. Clients `MUST NOT` use the parameter for an operation that is not listed in this column. However, if a Client sends an invalid parameter, an Endpoint `SHOULD` issue a fatal diagnostic "Unsupported Parameter" (`info:srw/diagnostic/1/8`) and stop processing the request. Alternatively, an Endpoint `MAY` silently ignore the invalid parameter. ||=Parameter Name =||=SRU operations =||=Allowed values =||=Description =|| || `x-fcs-endpoint-description` || explain || `true`; all other values are reserved and `MUST` not be used by Clients || If the parameter is given (with the value `true`), the Endpoint `MUST` include an Endpoint Description in the `` element of the ''explain'' response. || || `x-fcs-context` || searchRetrieve || A comma-separated list of persistent identifiers || The Endpoint `MUST` restrict the search to the resources identified by the persistent identifiers. || || `x-fcs-dataviews` || searchRetrieve || A comma-separated list of Data View identifiers || The Endpoint `SHOULD` include the additional ''need-to-request'' type Data Views in the response. || || `x-fcs-rewrites-allowed` || searchRetrieve || `true`; all other values are reserved and `MUST` not be used by Clients. \\ Clients `MUST` only use this parameter when performing an Advanced Search request. || If the parameter is given (with the value `true`), the Endpoint `MAY` rewrite the query to a simpler query to allow for more recall. || == List of diagnostics {{{ #!div style="border: 1px solid #000000; font-size: 75%" TODO: keep column if diagnostic is fatal or non-fatal? }}} Apart from the SRU diagnostics defined in [#REF_SRU_12 OASIS-SRU-12, Appendix C] and [#REF_LOC_DIAG LOC-DIAG], the following diagnostics are used in CLARIN-FCS. The column "Details Format" specifies what `SHOULD` be returned in the details field. If this column is blank, the format is "undefined" and the Endpoint `MAY` return whatever it feels appropriate, including nothing. The column "Impact" specifies, if the endpoint should continue ("non-fatal") or should stop ("fatal") processing. ||=Identifier URI =||=Description =||=Details Format =||=Impact =||=Note =|| || `http://clarin.eu/fcs/diagnostic/1` || Persistent identifier passed by the Client for restricting the search is invalid. || The offending persistent identifier. || non-fatal || If more than one invalid persistent identifiers were submitted by the Client, the Endpoint `MUST` generate a separate diagnostic for each invalid persistent identifier. || || `http://clarin.eu/fcs/diagnostic/2` || Resource set too large. Query context automatically adjusted. || The persistent identifier of the resource to which the query context was adjusted. || non-fatal || If an Endpoint limited the query context to more than one resource, it `MUST` generate a separate diagnostic for each resource to which the query context was adjusted. || || `http://clarin.eu/fcs/diagnostic/3` || Resource set too large. Cannot perform Query. || || fatal || || || `http://clarin.eu/fcs/diagnostic/4` || Requested Data View not valid for this resource. || The Data View MIME type. || non-fatal || If more than one invalid Data View was requested, the Endpoint `MUST` generate a separate diagnostic for each invalid Data View. || || `http://clarin.eu/fcs/diagnostic/10` || General query syntax error. || Detailed error message why the query could not be parsed. || fatal || Endpoints `MUST` use this diagnostic only if the Client performed an Advanced Search request. || || `http://clarin.eu/fcs/diagnostic/11` || Query too complex. Cannot perform Query. || Details why could not be performed, e.g. unsupported layer or unsupported combination of operators. || fatal || Endpoints `MUST` use this diagnostic only if the Client performed an Advanced Search request. || || `http://clarin.eu/fcs/diagnostic/12` || Query was rewritten. || Details how the query was rewritten. || non-fatal || Endpoints `MUST` use this diagnostic only if the Client performed an Advanced Search request with the `x-fcs-rewrites-allowed` request parameter. || || `http://clarin.eu/fcs/diagnostic/14` || General processing hint. || E.g. "No matches, because layer 'XY' is not available in your selection of resources" || non-fatal || Endpoints `MUST` use this diagnostic only if the Client performed an Advanced Search. || = Non-normative Appendix {{{ #!div style="border: 1px solid #000000; font-size: 75%" TODO: check and proof-read all sub-sections. }}} == Syntax variant for Handle system Persistent Identifier URIs Persistent Identifiers from the Handle system are defined in two syntax variants: a regular URI format for the Handle protocol, i.e. with a `hdl:` prefix, or ''actionable'' URIs with a `http://hdl.handle.net/` prefix. Generally, CLARIN software should support both syntax variants, therefore the CLARIN-FCS Interface Specification does not endorse a specific syntax variant. However, Endpoints are recommended to use the ''actionable'' syntax variant. == Referring to an Endpoint from a CMDI record Centers are encouraged to provide links to their CLARIN-FCS Endpoints in the metadata records for their resources. Other services, like the VLO, can use this information for automatically configuring an Aggregator for searching resources at the Endpoint. To refer to an Endpoint, a `` element with child-element `` set to the value `SearchService` and a `@mimetype` attribute with a value of `application/sru+xml` need to be added to the CMDI record. The content of the `` element must contain a URI that points to the Endpoint web service. Example: {{{#!xml http://hdl.handle.net/4711/0815 SearchService http://repos.example.org/fcs-endpoint }}} == Endpoint custom extensions #extensionExample The CLARIN-FCS protocol specification allows Endpoints to add custom data to their responses, e.g. to provide hints to an (XSLT/XQuery) application that works directly on CLARIN-FCS. It could use the custom data to generate back and forward links for a GUI to navigate in a result set. The following example illustrates how extensions can be embedded into the Result Format: {{{#!xml The quick brown fox jumps over the lazy dog. }}} == Endpoint highlight hints for repositories An Aggregator can use the `@ref` attributes of the ``, `` or `` elements to provide a link for the user to directly jump to the resource at a Repository. To support hit highlighting, an Endpoint can augment the URI in the `@ref` attribute with query parameters to implement hit highlighting in the Repository. In the following example, the URI `http://repos.example.org/resource.cgi/` is a CGI script that displays a given resource at the Repository in HTML format and uses the `highlight` query parameter to add highlights to the resource. Of course, it's up to the Endpoint and the Repository, if and how they implement such a feature. {{{#!xml The quick brown fox jumps over the lazy dog. }}}