Changes between Version 52 and Version 53 of Taskforces/FCS/FCS-Specification-Draft

Timestamp: 06/09/17 10:00:42
Author: Leif-Jöran
Comment: Reverted to version 50 since a lot of the formatting went bad.

}}}
[[PageOutline(1-6)]]

= CLARIN Federated Content Search (CLARIN-FCS) - Core 2.0

= Introduction
The goal of the ''CLARIN Federated Content Search (CLARIN-FCS) - Core'' specification is to introduce an ''interface specification'' that decouples the ''search engine'' functionality from its ''exploitation'', e.g. user interfaces and third-party applications, and that allows services to access heterogeneous search engines in a uniform way.

== Terminology
The key words `MUST`, `MUST NOT`, `REQUIRED`, `SHALL`, `SHALL NOT`, `SHOULD`, `SHOULD NOT`, `RECOMMENDED`, `MAY`, and `OPTIONAL` in this document are to be interpreted as described in [#REF_RFC_2119 RFC2119].

== Glossary
 Aggregator::
   A module or service that dispatches queries to Repositories and collects the results.

 [=#REF_Annotation_Layer Annotation Layer]::
   An annotation layer is the sum of possible annotations for a language resource, such as part of speech or orthographic transcription. Usually it is related to a given annotation task or topic. For the scope of this specification, it is used as a synonym for annotation tier.

 CLARIN-FCS, FCS::
   CLARIN Federated Content Search, an interface specification that allows searching within the content of resources held in Repositories.

 Client::
   A software component that implements the interface specification to query Endpoints, e.g. an Aggregator or a user interface.

 CQL::
   Contextual Query Language, previously known as Common Query Language, a domain-specific language for representing queries to information retrieval systems such as search engines, bibliographic catalogs and museum collection information.

 Data View::
   A Data View is a mechanism to support different representations of search results, e.g. a "hits with highlights" view, an image or a geolocation.

 Data View Payload, Payload::
   The actual content encoded within a Data View, e.g. a CMDI metadata record or a KML-encoded geolocation.

 Endpoint::
   A software component that implements the CLARIN-FCS interface specification and translates between CLARIN-FCS and a Search Engine.

 FCS-QL::
   Federated Content Search Query Language, the query language used in the advanced CLARIN-FCS profile. It is derived from the CQP query language of the Corpus Workbench [#REF_CQP_Tutorial CQP-TUTORIAL].

 Hit::
   A piece of data returned by a Search Engine that matches the search criterion. What is considered a Hit depends highly on the Search Engine.

 Interface Specification::
   A common, harmonized interface and suite of protocols that Repositories need to implement.

 Layer::
   See [#REF_Annotation_Layer ''Annotation Layer''].

 PID::
   A persistent identifier, i.e. a long-lasting reference to a digital object.

 Repository::
   A software component at a CLARIN center that stores resources (= data) and information about these resources (= metadata).

 Repository Registry::
   A separate service that allows registering Repositories and their Endpoints and provides information about them to other components, e.g. an Aggregator. The [http://centres.clarin.eu/ CLARIN Center Registry] is an implementation of such a Repository Registry.

 Resource::
   A searchable and addressable entity at an Endpoint, such as a text corpus or a multi-modal corpus.

 Resource Fragment::
   A smaller unit in a Resource, e.g. a sentence in a text corpus or a time interval in an audio transcription.

 Result Set::
   An (ordered) set of Hits that match a search criterion, produced by a Search Engine as the result of processing a query.

 Search Engine::
   A software component within a Repository that allows searching within the Repository contents.

 SRU::
   Search and Retrieve via URL, a protocol for Internet search queries. It was originally introduced by the Library of Congress [#REF_LOC_SRU_12 LOC-SRU12]; the standardization process later moved to OASIS [#REF_SRU_12 OASIS-SRU12].

== Normative References
 RFC2119[=#REF_RFC_2119]::
   Key words for use in RFCs to Indicate Requirement Levels, IETF RFC 2119, March 1997, \\
   [http://www.ietf.org/rfc/rfc2119.txt]

 XML-Namespaces[=#REF_XML_Namespaces]::
   Namespaces in XML 1.0 (Third Edition), W3C, 8 December 2009, \\
   [http://www.w3.org/TR/2009/REC-xml-names-20091208/]

 OASIS-SRU-Overview[=#REF_SRU_Overview]::
   searchRetrieve: Part 0. Overview Version 1.0, OASIS, January 2013, \\
   [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part0-overview/searchRetrieve-v1.0-os-part0-overview.doc]
   [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part0-overview/searchRetrieve-v1.0-os-part0-overview.html (HTML)]
   [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part0-overview/searchRetrieve-v1.0-os-part0-overview.pdf (PDF)]

 OASIS-SRU-APD[=#REF_SRU_APD]::
   searchRetrieve: Part 1. Abstract Protocol Definition Version 1.0, OASIS, January 2013, \\
   [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part1-apd/searchRetrieve-v1.0-os-part1-apd.doc]
   [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part1-apd/searchRetrieve-v1.0-os-part1-apd.html (HTML)]
   [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part1-apd/searchRetrieve-v1.0-os-part1-apd.pdf (PDF)]

 OASIS-SRU12[=#REF_SRU_12]::
   searchRetrieve: Part 2. SRU searchRetrieve Operation: APD Binding for SRU 1.2 Version 1.0, OASIS, January 2013, \\
   [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part2-sru1.2/searchRetrieve-v1.0-os-part2-sru1.2.doc]
   [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part2-sru1.2/searchRetrieve-v1.0-os-part2-sru1.2.html (HTML)]
   [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part2-sru1.2/searchRetrieve-v1.0-os-part2-sru1.2.pdf (PDF)]

 OASIS-SRU20[=#REF_SRU_20]::
   searchRetrieve: Part 3. SRU searchRetrieve Operation: APD Binding for SRU 2.0 Version 1.0, OASIS, January 2013, \\
   [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part3-sru2.0/searchRetrieve-v1.0-os-part3-sru2.0.doc]
   [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part3-sru2.0/searchRetrieve-v1.0-os-part3-sru2.0.html (HTML)]
   [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part3-sru2.0/searchRetrieve-v1.0-os-part3-sru2.0.pdf (PDF)]

 OASIS-CQL[=#REF_CQL]::
   searchRetrieve: Part 5. CQL: The Contextual Query Language version 1.0, OASIS, January 2013, \\
   [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part5-cql/searchRetrieve-v1.0-os-part5-cql.doc]
   [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part5-cql/searchRetrieve-v1.0-os-part5-cql.html (HTML)]
   [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part5-cql/searchRetrieve-v1.0-os-part5-cql.pdf (PDF)]

 SRU-Explain[=#REF_Explain]::
   searchRetrieve: Part 7. SRU Explain Operation version 1.0, OASIS, January 2013, \\
   [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part7-explain/searchRetrieve-v1.0-os-part7-explain.doc]
   [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part7-explain/searchRetrieve-v1.0-os-part7-explain.html (HTML)]
   [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part7-explain/searchRetrieve-v1.0-os-part7-explain.pdf (PDF)]

 SRU-Scan[=#REF_Scan]::
   searchRetrieve: Part 6. SRU Scan Operation version 1.0, OASIS, January 2013, \\
   [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part6-scan/searchRetrieve-v1.0-os-part6-scan.doc]
   [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part6-scan/searchRetrieve-v1.0-os-part6-scan.html (HTML)]
   [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part6-scan/searchRetrieve-v1.0-os-part6-scan.PDF (PDF)]

 LOC-SRU12[=#REF_LOC_SRU_12]::
   SRU Version 1.2: SRU !Search/Retrieve Operation, Library of Congress, \\
   [http://www.loc.gov/standards/sru/sru-1-2.html]

 LOC-DIAG[=#REF_LOC_DIAG]::
   SRU Version 1.2: SRU Diagnostics List, Library of Congress, \\
   [http://www.loc.gov/standards/sru/diagnostics/diagnosticsList.html]

 UD-POS[=#REF_UD_POS]::
   Universal Dependencies, Universal POS tags v2.0, \\
   [https://universaldependencies.github.io/u/pos/index.html]

 SAMPA[=#REF_SAMPA]::
   Dafydd Gibbon, Inge Mertins, Roger Moore (Eds.): Handbook of Multimodal and Spoken Language Systems. Resources, Terminology and Product Evaluation, Kluwer Academic Publishers, Boston MA, 2000, ISBN 0-7923-7904-7

 CLARIN-FCS-!DataViews[=#REF_FCS_DataViews]::
   CLARIN Federated Content Search (CLARIN-FCS) - Data Views, SCCTC FCS Task-Force, April 2014, \\
   [https://trac.clarin.eu/wiki/FCS/Dataviews]

== Non-Normative References
 CQP-TUTORIAL[=#REF_CQP_Tutorial]::
   Evert et al.: The IMS Open Corpus Workbench (CWB) CQP Query Language Tutorial, CWB Version 3.0, February 2010, \\
   [http://cwb.sourceforge.net/files/CQP_Tutorial/]

 RFC6838[=#REF_RFC_6838]::
   Media Type Specifications and Registration Procedures, IETF RFC 6838, January 2013, \\
   [http://www.ietf.org/rfc/rfc6838.txt]

 RFC3023[=#REF_RFC_3023]::
   XML Media Types, IETF RFC 3023, January 2001, \\
   [http://www.ietf.org/rfc/rfc3023.txt]

== Typographic and XML Namespace conventions
The following typographic conventions for XML fragments are used throughout this specification:
 * `<prefix:Element>` \\ An XML element with the Generic Identifier ''Element'' that is bound to an XML namespace denoted by the prefix ''prefix''.
 * `@attr` \\ An XML attribute with the name ''attr''.
{{{#!comment
 * `@prefix:attr` \\ An XML attribute with the name ''attr'' that is bound to an XML namespace denoted by the prefix ''prefix''.
}}}
 * `string` \\ The literal ''string'' must be used either as element content or as attribute value.

Endpoints and Clients `MUST` adhere to the [#REF_XML_Namespaces XML-Namespaces] specification. The CLARIN-FCS interface specification generally does not dictate whether XML elements should be serialized in their prefixed or non-prefixed syntax, but Endpoints `MUST` ensure that the correct XML namespace is used for elements and that XML namespaces are declared correctly. Clients `MUST` be agnostic regarding the syntax used for serializing the XML elements, i.e. whether the prefixed or un-prefixed variant was used, and `SHOULD` operate solely on ''expanded names'', i.e. pairs of ''namespace name'' and ''local name''.
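A minimal client-side sketch in Python of what operating on expanded names means. It uses the standard library's `xml.etree.ElementTree`, which reports element tags as `{namespace name}local name`. The two XML fragments are trimmed, hypothetical excerpts (not complete or valid Endpoint Descriptions) that differ only in prefixed versus default-namespace syntax. The helper `capabilities()` is an illustrative name, not part of any CLARIN-FCS library.

```python
import xml.etree.ElementTree as ET

ED_NS = "http://clarin.eu/fcs/endpoint-description"

# Trimmed, hypothetical excerpts: same content, two serialization styles.
PREFIXED = """<ed:EndpointDescription xmlns:ed="http://clarin.eu/fcs/endpoint-description" version="2">
  <ed:Capabilities>
    <ed:Capability>http://clarin.eu/fcs/capability/basic-search</ed:Capability>
  </ed:Capabilities>
</ed:EndpointDescription>"""

UNPREFIXED = """<EndpointDescription xmlns="http://clarin.eu/fcs/endpoint-description" version="2">
  <Capabilities>
    <Capability>http://clarin.eu/fcs/capability/basic-search</Capability>
  </Capabilities>
</EndpointDescription>"""


def capabilities(xml_text: str) -> list[str]:
    """Collect Capability values by expanded name, ignoring whichever prefix was used."""
    root = ET.fromstring(xml_text)
    # "{namespace}local" is ElementTree's notation for an expanded name.
    return [el.text.strip() for el in root.iter(f"{{{ED_NS}}}Capability")]


# Both serializations yield the same result ...
assert capabilities(PREFIXED) == capabilities(UNPREFIXED)
# ... while matching on the literal prefixed name finds nothing.
assert list(ET.fromstring(PREFIXED).iter("ed:Capability")) == []
```

The last assertion shows why relying on a particular prefix is fragile. ElementTree has already resolved `ed:` to the namespace name, so a Client that matches the literal string `ed:Capability` misses the element entirely.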

The following XML namespace names and prefixes are used throughout this specification. The column "Recommended Syntax" indicates which syntax variant `SHOULD` be used by the Endpoint to serialize the XML response.
||=Prefix =||=Namespace Name                                       =||=Comment                                                 =||=Recommended Syntax =||
|| `fcs`   || `http://clarin.eu/fcs/resource`                       || CLARIN-FCS Resources                                     || prefixed ||
|| `ed`    || `http://clarin.eu/fcs/endpoint-description`           || CLARIN-FCS Endpoint Description                          || prefixed ||
|| `hits`  || `http://clarin.eu/fcs/dataview/hits`                  || CLARIN-FCS Generic Hits Data View                        || prefixed ||
|| `adv`   || `http://clarin.eu/fcs/dataview/advanced`              || CLARIN-FCS Advanced Data View                            || prefixed ||
|| `sru`   || `http://docs.oasis-open.org/ns/search-ws/sruResponse` || SRU Version 2.0                                          || prefixed ||
|| `diag`  || `http://docs.oasis-open.org/ns/search-ws/diagnostic`  || SRU Version 2.0 Diagnostics                              || prefixed ||
|| `zr`    || `http://explain.z3950.org/dtd/2.0/`                   || SRU/ZeeRex Explain                                       || prefixed ||
|| `sru`   || `http://www.loc.gov/zing/srw/`                        || SRU Version 1.2, ''only compatibility mode''             || prefixed ||
|| `diag`  || `http://www.loc.gov/zing/srw/diagnostic/`             || SRU Version 1.2 Diagnostics, ''only compatibility mode'' || prefixed ||

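In the table above, the prefixes `sru` and `diag` denote different namespaces in SRU 2.0 and in SRU 1.2 compatibility mode. A Client that resolves prefixed paths therefore needs a version-specific prefix map. The following Python sketch encodes the table as such a map. The names `FCS_NAMESPACES`, `SRU_NAMESPACES` and `namespaces_for()` are illustrative only, and `RESPONSE` is a hypothetical, heavily trimmed SRU 2.0 response that deliberately uses a different prefix than the table.

```python
import xml.etree.ElementTree as ET

# Version-independent prefixes from the namespace table.
FCS_NAMESPACES = {
    "fcs": "http://clarin.eu/fcs/resource",
    "ed": "http://clarin.eu/fcs/endpoint-description",
    "hits": "http://clarin.eu/fcs/dataview/hits",
    "adv": "http://clarin.eu/fcs/dataview/advanced",
    "zr": "http://explain.z3950.org/dtd/2.0/",
}

# The "sru" and "diag" prefixes depend on the SRU version in use.
SRU_NAMESPACES = {
    "2.0": {
        "sru": "http://docs.oasis-open.org/ns/search-ws/sruResponse",
        "diag": "http://docs.oasis-open.org/ns/search-ws/diagnostic",
    },
    "1.2": {  # compatibility mode only
        "sru": "http://www.loc.gov/zing/srw/",
        "diag": "http://www.loc.gov/zing/srw/diagnostic/",
    },
}


def namespaces_for(sru_version: str) -> dict[str, str]:
    """Return the prefix -> namespace name map for the given SRU version."""
    try:
        return {**FCS_NAMESPACES, **SRU_NAMESPACES[sru_version]}
    except KeyError:
        raise ValueError(f"unsupported SRU version: {sru_version!r}") from None


# Hypothetical, heavily trimmed SRU 2.0 response using its own prefix.
RESPONSE = """<sruResponse:searchRetrieveResponse
    xmlns:sruResponse="http://docs.oasis-open.org/ns/search-ws/sruResponse">
  <sruResponse:numberOfRecords>0</sruResponse:numberOfRecords>
</sruResponse:searchRetrieveResponse>"""

root = ET.fromstring(RESPONSE)
# The lookup uses the Client's own prefix map, not the prefix found in the document.
print(root.findtext("sru:numberOfRecords", namespaces=namespaces_for("2.0")))  # prints: 0
```

Looking up the same path with the SRU 1.2 map finds nothing, because `sru` then resolves to `http://www.loc.gov/zing/srw/`.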
= CLARIN-FCS Interface Specification
The CLARIN-FCS Interface Specification defines a set of capabilities, an extensible result format and a set of required operations. CLARIN-FCS is built on the SRU/CQL standard, and additional functionality required for CLARIN-FCS is added through SRU/CQL's extension mechanisms.
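Because CLARIN-FCS builds on SRU, a search request is an HTTP GET request whose query parameters are defined by SRU. The Python sketch below URL-encodes a query for an Endpoint. Its assumptions are:
 * the base URL `https://fcs.example.org/sru` is hypothetical;
 * the parameter set is limited to the SRU 2.0 parameters `query`, `queryType` and `maximumRecords`;
 * `fcs` is the `queryType` value for FCS-QL queries;
 * `[word = "walk"]` is an FCS-QL-style query.

Consult [#REF_SRU_20 OASIS-SRU20] and, for compatibility mode, [#REF_SRU_12 OASIS-SRU12] for the authoritative parameter lists.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed queryType values: "cql" for CQL, "fcs" for FCS-QL.
QUERY_TYPES = {"cql", "fcs"}


def build_search_url(endpoint: str, query: str, query_type: str = "cql",
                     maximum_records: Optional[int] = None) -> str:
    """Build a GET URL for an SRU 2.0 searchRetrieve request (sketch only)."""
    if query_type not in QUERY_TYPES:
        raise ValueError(f"unknown queryType: {query_type!r}")
    params = {"query": query, "queryType": query_type}
    if maximum_records is not None:
        params["maximumRecords"] = str(maximum_records)
    # urlencode percent-encodes brackets, quotes and '=', and turns spaces into '+'.
    return f"{endpoint}?{urlencode(params)}"


url = build_search_url("https://fcs.example.org/sru", '[word = "walk"]', "fcs", 10)
print(url)
```

Leaving the encoding to `urlencode` matters here. FCS-QL queries routinely contain brackets, quotes and `=`, and all of them must be percent-encoded before they can travel in a URL.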

Specifically, the CLARIN-FCS Interface Specification consists of two parts: a set of formats and a transport protocol. The ''Endpoint'' is a software component that acts as a bridge between a ''Client'' and a ''Search Engine'' and passes the requests sent by the ''Client'' to the ''Search Engine''. The ''Search Engine'' is a custom software component that allows searching the language resources in a Repository. The ''Endpoint'' implements the ''Transport Protocol'' and acts as a mediator between the CLARIN-FCS specific formats and the idiosyncrasies of the ''Search Engines'' of the individual Repositories. The following figure illustrates the overall architecture:
{{{
                   +---------+
     
In general, the workflow in CLARIN-FCS is as follows: a Client submits a query to an Endpoint. The Endpoint translates the query from CQL or FCS-QL into the query dialect used by the Search Engine and submits the translated query to the Search Engine. The Search Engine processes the query and generates a result set, i.e. it compiles a set of hits that match the search criterion. The Endpoint then translates the results from the Search Engine-specific result set format into the CLARIN-FCS result format and sends them to the Client.
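The workflow can be sketched in Python with a toy, in-memory search engine. Everything here is hypothetical:
 * `ToySearchEngine` stands in for a Repository's Search Engine and uses Python regular expressions as its native query dialect;
 * `ToyEndpoint` only understands single-term CQL queries;
 * `Hit` is a simplified left-context/match/right-context record, not the actual CLARIN-FCS result format.

A real Endpoint would report unsupported queries as SRU diagnostics instead of raising an exception.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Simplified, hypothetical hit record: the match plus its left and right context."""
    left: str
    match: str
    right: str


class ToySearchEngine:
    """Hypothetical search engine whose native query dialect is regular expressions."""

    def __init__(self, sentences: list[str]):
        self.sentences = sentences

    def run(self, native_query: str) -> list[tuple[str, int, int]]:
        # Engine-specific result set: (sentence, start offset, end offset).
        pattern = re.compile(native_query)
        return [(s, m.start(), m.end())
                for s in self.sentences for m in pattern.finditer(s)]


class ToyEndpoint:
    """Bridges a Client and the engine by translating queries and results."""

    def __init__(self, engine: ToySearchEngine):
        self.engine = engine

    def translate_query(self, cql: str) -> str:
        # Only a single, optionally double-quoted CQL term is supported here.
        term = cql.strip()
        if len(term) >= 2 and term[0] == term[-1] == '"':
            term = term[1:-1]
        if not term or any(c.isspace() or c in '"()<>=/' for c in term):
            raise ValueError(f"unsupported query: {cql!r}")
        # Translate into the engine's dialect: a whole-word regular expression.
        return rf"\b{re.escape(term)}\b"

    def search(self, cql: str) -> list[Hit]:
        native = self.translate_query(cql)    # CQL -> engine dialect
        result_set = self.engine.run(native)  # engine compiles the matching hits
        return [Hit(s[:a], s[a:b], s[b:])     # engine format -> hit records
                for s, a, b in result_set]


endpoint = ToyEndpoint(ToySearchEngine(["The cat sat.", "A dog barked."]))
print(endpoint.search("cat"))
```

Keeping `translate_query` separate from `search` mirrors the two translation steps in the workflow. Supporting more of CQL, or FCS-QL, would only change the query translation, not the result translation.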

== Discovery #Discovery
The ''Discovery'' step allows a Client to gather information about an Endpoint, in particular which capabilities it supports and which resources are available for searching.

=== Capabilities
A ''Capability'' defines a certain feature set that is part of CLARIN-FCS, e.g. which kinds of queries are supported. Each Endpoint implements some (or all) of these Capabilities. The Endpoint announces the Capabilities it provides to allow a Client to auto-tune itself (see section [#endpointDescription Endpoint Description]). Each Capability is identified by a ''Capability Identifier'', which uses the URI syntax. The following Capabilities are defined in CLARIN-FCS:
||=Name               =||=Capability Identifier                            =||=Summary                                      =||
|| ''Basic Search''    || `http://clarin.eu/fcs/capability/basic-search`    || Simple full-text searching                    ||
|| ''Advanced Search'' || `http://clarin.eu/fcs/capability/advanced-search` || Searching in structured and/or annotated data ||

Endpoints `MUST` implement the ''Basic Search'' Capability. Endpoints `MUST NOT` invent custom Capability Identifiers and `MUST` only use the values defined above.
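A minimal Python sketch of how a Client might check an Endpoint's announced Capability Identifiers and then auto-tune itself. It applies the rules above plus the Endpoint Description rule that the list of Capabilities must not contain duplicates. The function names `capability_problems()` and `preferred_query_language()` are hypothetical. So is the policy of preferring FCS-QL whenever ''Advanced Search'' is announced; it is an assumption for illustration, not a requirement of this specification.

```python
BASIC_SEARCH = "http://clarin.eu/fcs/capability/basic-search"
ADVANCED_SEARCH = "http://clarin.eu/fcs/capability/advanced-search"
KNOWN_CAPABILITIES = {BASIC_SEARCH, ADVANCED_SEARCH}


def capability_problems(announced: list[str]) -> list[str]:
    """Return human-readable violations of the Capability rules (empty if none)."""
    problems = []
    if BASIC_SEARCH not in announced:
        problems.append("Basic Search capability is missing")
    for cap in sorted(set(announced) - KNOWN_CAPABILITIES):
        problems.append(f"custom Capability Identifier: {cap}")
    if len(announced) != len(set(announced)):
        problems.append("duplicate Capability Identifiers")
    return problems


def preferred_query_language(announced: list[str]) -> str:
    """Hypothetical auto-tuning policy: use FCS-QL if Advanced Search is announced."""
    return "FCS-QL" if ADVANCED_SEARCH in announced else "CQL"


caps = [BASIC_SEARCH, ADVANCED_SEARCH]
print(capability_problems(caps), preferred_query_language(caps))  # prints: [] FCS-QL
```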

=== Endpoint Description #endpointDescription
{{{
#!div style="border: 1px solid #000000; font-size: 75%"
     

The XML fragment for the ''Endpoint Description'' is encoded as an `<ed:EndpointDescription>` element that contains the following attributes and children:
 * one `@version` attribute (`REQUIRED`) on the `<ed:EndpointDescription>` element. The value of the `@version` attribute `MUST` be `2`.
 * one `<ed:Capabilities>` element (`REQUIRED`) that contains one or more `<ed:Capability>` elements \\
   The content of each `<ed:Capability>` element is a Capability Identifier that indicates a capability supported by the Endpoint. For valid Capability Identifier values, see section [#capabilities Capabilities]. This list `MUST NOT` include duplicate values.
 * one `<ed:SupportedDataViews>` element (`REQUIRED`) \\
   A list of Data Views that are supported by this Endpoint. This list is composed of one or more `<ed:SupportedDataView>` elements. The content of a `<ed:SupportedDataView>` `MUST` be the MIME type of a supported Data View, e.g. `application/x-clarin-fcs-hits+xml`. Each `<ed:SupportedDataView>` element `MUST` carry an `@id` and a `@delivery-policy` attribute. The value of the `@id` attribute is later used in the `<ed:Resource>` element to indicate which Data Views are supported by a resource (see below). Endpoints `SHOULD` use the recommended short identifier for the Data View. The `@delivery-policy` attribute indicates the Endpoint's delivery policy for that Data View. Valid values are `send-by-default` for the ''send-by-default'' and `need-to-request` for the ''need-to-request'' delivery policy. \\
   This list `MUST NOT` include duplicate entries, i.e. no MIME type may appear more than once. \\
   The value of the `@id` attribute `MUST NOT` contain the characters `,` (comma) or `;` (semicolon).
 * one `<ed:SupportedLayers>` element (`REQUIRED` if the Endpoint supports the ''Advanced Search'' capability) \\
   A list of Layers that are generally supported by this Endpoint. This list is composed of one or more `<ed:SupportedLayer>` elements. The content of a `<ed:SupportedLayer>` `MUST` be the identifier of a Layer (see [#layers section "Layers"]), e.g. `orth`. Each `<ed:SupportedLayer>` element `MUST` carry an `@id` and a `@result-id` attribute. The value of the `@id` attribute is later used in the `<ed:Resource>` element to indicate which Layer is supported by a resource (see below). The `@result-id` attribute is used in the Advanced Data View (see [#advancedDataView section "Advanced Data View"]). Each `<ed:SupportedLayer>` element `MAY` carry an optional `@qualifier` attribute, which is used as a qualifier in an FCS-QL search term to address this specific layer. \\
   This list `MUST NOT` include duplicate entries, i.e. no `@result-id` value may appear more than once. \\
   The value of the `@id` or `@result-id` attribute `MUST NOT` contain the characters `,` (comma) or `;` (semicolon).
   The value of the `@qualifier` attribute `MUST NOT` contain characters other than `a`-`z`, `A`-`Z`, `0`-`9` and `-` (hyphen).
   The `<ed:SupportedLayer>` element `MAY` carry an `@alt-value-info` and an `@alt-value-info-uri` attribute: `@alt-value-info` `SHOULD` contain a short description of the layer, e.g. the original tag set used; `@alt-value-info-uri` `MUST` contain a well-formed URI and `SHOULD` point to a web site with further information, e.g. about the original tag set and how it is translated to FCS. Clients, e.g. the Aggregator, can display this information together with the search result.
 * one `<ed:Resources>` element (`REQUIRED`) \\
   A list of (top-level) resources that are available, i.e. searchable, at the Endpoint. The `<ed:Resources>` element contains one or more `<ed:Resource>` elements (see below). The Endpoint `MUST` declare at least one (top-level) resource.
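
The constraints on `<ed:SupportedDataView>` and `<ed:SupportedLayer>` lend themselves to mechanical checking. The following Python sketch (standard library only) is a hypothetical helper, not part of CLARIN-FCS or of any reference implementation. It assumes that both lists live in the `ed` namespace and that each `<ed:SupportedLayer>` carries `@id` and `@result-id`, as in [#REF_Example_6 Example 6]; it only reports the violations it knows about and is no substitute for schema validation.

{{{#!python
import re
import xml.etree.ElementTree as ET

# Namespace of the Endpoint Description, as used in the examples of this section.
NS = {"ed": "http://clarin.eu/fcs/endpoint-description"}
DELIVERY_POLICIES = {"send-by-default", "need-to-request"}
# Allowed characters for @qualifier: a-z, A-Z, 0-9 and '-'.
QUALIFIER_RE = re.compile(r"[A-Za-z0-9-]+")


def has_separator(value):
    """True if an @id or @result-id value contains a forbidden ',' or ';'."""
    return "," in value or ";" in value


def check_supported(xml_text):
    """Return human-readable violations of the SupportedDataView/SupportedLayer rules."""
    root = ET.fromstring(xml_text)
    problems = []

    seen_mime_types = set()
    for view in root.iterfind("ed:SupportedDataViews/ed:SupportedDataView", NS):
        mime_type = (view.text or "").strip()
        view_id = view.get("id", "")
        if not view_id or has_separator(view_id):
            problems.append(f"data view {mime_type}: missing or invalid @id {view_id!r}")
        if view.get("delivery-policy") not in DELIVERY_POLICIES:
            problems.append(f"data view {mime_type}: missing or invalid @delivery-policy")
        if mime_type in seen_mime_types:
            problems.append(f"duplicate data view MIME type {mime_type}")
        seen_mime_types.add(mime_type)

    seen_result_ids = set()
    for layer in root.iterfind("ed:SupportedLayers/ed:SupportedLayer", NS):
        layer_id = layer.get("id", "")
        result_id = layer.get("result-id", "")
        if not layer_id or has_separator(layer_id):
            problems.append(f"layer: missing or invalid @id {layer_id!r}")
        if not result_id or has_separator(result_id):
            problems.append(f"layer {layer_id}: missing or invalid @result-id")
        elif result_id in seen_result_ids:
            problems.append(f"duplicate layer @result-id {result_id}")
        seen_result_ids.add(result_id)
        qualifier = layer.get("qualifier")
        if qualifier is not None and not QUALIFIER_RE.fullmatch(qualifier):
            problems.append(f"layer {layer_id}: invalid @qualifier {qualifier!r}")
    return problems
}}}

An empty list means that none of the checked rules is violated; an Endpoint developer might, for instance, run such a check on the generated Endpoint Description in a test suite.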

The `<ed:Resource>` element contains a basic description of a resource that is available at the Endpoint. A resource is a searchable entity, e.g. a single corpus. The `<ed:Resource>` element has a mandatory `@pid` attribute that contains the persistent identifier of the resource. This value `MUST` be the same as the ''!MdSelfLink'' of the CMDI record describing the resource. The `<ed:Resource>` element contains the following children:
 * one or more `<ed:Title>` elements (`REQUIRED`) \\
   A human-readable title for the resource. A `REQUIRED` `@xml:lang` attribute indicates the language of the title. An English version of the title is `REQUIRED`. The list of titles `MUST NOT` contain duplicate entries for the same language.
 * zero or more `<ed:Description>` elements (`OPTIONAL`) \\
   An optional human-readable description of the resource. It `SHOULD` be at most one sentence. A `REQUIRED` `@xml:lang` attribute indicates the language of the description. If supplied, an English version of the description is `REQUIRED`. The list of descriptions `MUST NOT` contain duplicate entries for the same language.
 * zero or one `<ed:LandingPageURI>` element (`OPTIONAL`) \\
   A link to a website for the resource, i.e. a landing page such as a web site that describes a corpus.
 * one `<ed:Languages>` element (`REQUIRED`) \\
   The (relevant) languages available within the resource. The `<ed:Languages>` element contains one or more `<ed:Language>` elements. The content of a `<ed:Language>` element `MUST` be an ISO 639-3 three-letter language code. This element should be repeated for all (relevant) languages available ''within'' the resource; however, this list `MUST NOT` contain duplicate entries.
 * one `<ed:AvailableDataViews>` element (`REQUIRED`) \\
   The Data Views that are available for the resource. The `<ed:AvailableDataViews>` element `MUST` carry a `@ref` attribute that contains a whitespace-separated list of id values, which correspond to the values of the `@id` attributes of the referenced `<ed:SupportedDataView>` elements. \\
   In case of sub-resources, each Resource `SHOULD` support all Data Views that are supported by the parent resource. However, every resource `MUST` declare all available Data Views independently, i.e. there is no implicit inheritance semantic.
 * one `<ed:AvailableLayers>` element (`REQUIRED` if the Endpoint supports the ''Advanced Search'' capability). The `<ed:AvailableLayers>` element `MUST` carry a `@ref` attribute that contains a whitespace-separated list of id values, which correspond to the values of the `@id` attributes of the referenced `<ed:SupportedLayer>` elements. \\
   In case of sub-resources, each Resource `SHOULD` support all Layers that are supported by the parent resource. However, every resource `MUST` declare all available Layers independently, i.e. there is no implicit inheritance semantic.
 * zero or one `<ed:Resources>` element (`OPTIONAL`) \\
   If a resource has searchable sub-resources, the Endpoint `MUST` supply additional finer-grained resource elements, which are wrapped in a `<ed:Resources>` element. A sub-resource is a searchable entity within a resource, e.g. a sub-corpus.
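
A Client that consumes an Endpoint Description may want to check each resource against the rules above before offering it to users. The following Python sketch is a hypothetical, simplified checker, not part of CLARIN-FCS: it tests only the shape of `<ed:Language>` codes (three lowercase letters), not membership in the ISO 639-3 code table, and it verifies that every token in the `@ref` of `<ed:AvailableDataViews>` names a declared `<ed:SupportedDataView>` `@id`. Sub-resources are checked one by one, reflecting that nothing is inherited from the parent resource.

{{{#!python
import re
import xml.etree.ElementTree as ET

NS = {"ed": "http://clarin.eu/fcs/endpoint-description"}
# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
# Shape check only: three lowercase letters, no lookup in the ISO 639-3 code table.
LANGUAGE_CODE_RE = re.compile(r"[a-z]{3}")


def check_resource(resource, data_view_ids):
    """Check one <ed:Resource> element (not its sub-resources) against the rules above."""
    pid = resource.get("pid", "<missing @pid>")
    problems = []

    for tag in ("Title", "Description"):
        langs = [e.get(XML_LANG) for e in resource.findall(f"ed:{tag}", NS)]
        if tag == "Title" and not langs:
            problems.append(f"{pid}: no <ed:Title>")
        if langs and "en" not in langs:
            problems.append(f"{pid}: no English <ed:{tag}>")
        if len(langs) != len(set(langs)):
            problems.append(f"{pid}: duplicate <ed:{tag}> languages")

    codes = [(e.text or "").strip() for e in resource.findall("ed:Languages/ed:Language", NS)]
    if not codes:
        problems.append(f"{pid}: no <ed:Language>")
    for code in codes:
        if not LANGUAGE_CODE_RE.fullmatch(code):
            problems.append(f"{pid}: {code!r} is not a three-letter code")
    if len(codes) != len(set(codes)):
        problems.append(f"{pid}: duplicate <ed:Language> entries")

    views = resource.find("ed:AvailableDataViews", NS)
    if views is None:
        problems.append(f"{pid}: no <ed:AvailableDataViews>")
    else:
        for ref in views.get("ref", "").split():
            if ref not in data_view_ids:
                problems.append(f"{pid}: @ref {ref!r} matches no <ed:SupportedDataView>")
    return problems


def check_resources(xml_text):
    """Check every resource, including nested sub-resources, in an Endpoint Description."""
    root = ET.fromstring(xml_text)
    data_view_ids = {
        view.get("id")
        for view in root.iterfind("ed:SupportedDataViews/ed:SupportedDataView", NS)
    }
    problems = []
    # ".//ed:Resource" also visits sub-resources; each one is checked on its own.
    for resource in root.iterfind(".//ed:Resource", NS):
        problems.extend(check_resource(resource, data_view_ids))
    return problems
}}}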

[=#REF_Example_4]Example 4:
{{{#!xml
<ed:EndpointDescription xmlns:ed="http://clarin.eu/fcs/endpoint-description" version="2">
    <ed:Capabilities>
        <ed:Capability>http://clarin.eu/fcs/capability/basic-search</ed:Capability>
    </ed:Capabilities>
    <ed:SupportedDataViews>
        <ed:SupportedDataView id="hits" delivery-policy="send-by-default">application/x-clarin-fcs-hits+xml</ed:SupportedDataView>
    </ed:SupportedDataViews>
    <ed:Resources>
        <!-- just one top-level resource at the Endpoint -->
        <ed:Resource pid="http://hdl.handle.net/4711/0815">
            <ed:Title xml:lang="de">Goethe Korpus</ed:Title>
            <ed:Title xml:lang="en">Goethe corpus</ed:Title>
            <ed:Description xml:lang="de">Der Goethe Korpus des IDS Mannheim.</ed:Description>
            <ed:Description xml:lang="en">The Goethe corpus of IDS Mannheim.</ed:Description>
            <ed:LandingPageURI>http://repos.example.org/corpus1.html</ed:LandingPageURI>
            <ed:Languages>
                <ed:Language>deu</ed:Language>
            </ed:Languages>
            <ed:AvailableDataViews ref="hits" />
        </ed:Resource>
    </ed:Resources>
</ed:EndpointDescription>
}}}
[#REF_Example_4 Example 4] shows a simple Endpoint Description for an Endpoint that only supports the ''Basic Search'' capability and only provides the Generic Hits Data View, which is indicated by a `<ed:SupportedDataView>` element. This element carries an `@id` attribute with a value of `hits`, the recommended value for the short identifier, and indicates a delivery policy of ''send-by-default'' by the `@delivery-policy` attribute. The Endpoint provides only one top-level resource, identified by the persistent identifier `http://hdl.handle.net/4711/0815`. The resource has a title as well as a description in German and English. A landing page is located at `http://repos.example.org/corpus1.html`. The predominant language in the resource contents is German. Only the Generic Hits Data View is supported for this resource, because the `<ed:AvailableDataViews>` element only references the `<ed:SupportedDataView>` element whose `@id` has the value `hits`.

[=#REF_Example_5]Example 5:
{{{#!xml
<ed:EndpointDescription xmlns:ed="http://clarin.eu/fcs/endpoint-description" version="2">
    <ed:Capabilities>
        <ed:Capability>http://clarin.eu/fcs/capability/basic-search</ed:Capability>
    </ed:Capabilities>
    <ed:SupportedDataViews>
        <ed:SupportedDataView id="hits" delivery-policy="send-by-default">application/x-clarin-fcs-hits+xml</ed:SupportedDataView>
        <ed:SupportedDataView id="cmdi" delivery-policy="need-to-request">application/x-cmdi+xml</ed:SupportedDataView>
    </ed:SupportedDataViews>
    <ed:Resources>
        <!-- top-level resource 1 -->
        <ed:Resource pid="http://hdl.handle.net/4711/0815">
            <ed:Title xml:lang="de">Goethe Korpus</ed:Title>
            <ed:Title xml:lang="en">Goethe corpus</ed:Title>
            <ed:Description xml:lang="de">Der Goethe Korpus des IDS Mannheim.</ed:Description>
            <ed:Description xml:lang="en">The Goethe corpus of IDS Mannheim.</ed:Description>
            <ed:LandingPageURI>http://repos.example.org/corpus1.html</ed:LandingPageURI>
            <ed:Languages>
                <ed:Language>deu</ed:Language>
            </ed:Languages>
            <ed:AvailableDataViews ref="hits" />
        </ed:Resource>
        <!-- top-level resource 2 -->
        <ed:Resource pid="http://hdl.handle.net/4711/0816">
            <ed:Title xml:lang="de">Zeitungskorpus des Mannheimer Morgen</ed:Title>
            <ed:Title xml:lang="en">Mannheimer Morgen newspaper corpus</ed:Title>
            <ed:LandingPageURI>http://repos.example.org/corpus2.html</ed:LandingPageURI>
            <ed:Languages>
                <ed:Language>deu</ed:Language>
            </ed:Languages>
            <ed:AvailableDataViews ref="hits cmdi" />
            <ed:Resources>
                <!-- sub-resource 1 of top-level resource 2 -->
                <ed:Resource pid="http://hdl.handle.net/4711/0816-1">
                    <ed:Title xml:lang="de">Zeitungskorpus des Mannheimer Morgen (vor 1990)</ed:Title>
                    <ed:Title xml:lang="en">Mannheimer Morgen newspaper corpus (before 1990)</ed:Title>
                    <ed:LandingPageURI>http://repos.example.org/corpus2.html#sub1</ed:LandingPageURI>
                    <ed:Languages>
                        <ed:Language>deu</ed:Language>
                    </ed:Languages>
                    <ed:AvailableDataViews ref="hits cmdi" />
                </ed:Resource>
                <!-- sub-resource 2 of top-level resource 2 -->
                <ed:Resource pid="http://hdl.handle.net/4711/0816-2">
                    <ed:Title xml:lang="de">Zeitungskorpus des Mannheimer Morgen (nach 1990)</ed:Title>
                    <ed:Title xml:lang="en">Mannheimer Morgen newspaper corpus (after 1990)</ed:Title>
                    <ed:LandingPageURI>http://repos.example.org/corpus2.html#sub2</ed:LandingPageURI>
                    <ed:Languages>
                        <ed:Language>deu</ed:Language>
                    </ed:Languages>
                    <ed:AvailableDataViews ref="hits cmdi" />
                </ed:Resource>
            </ed:Resources>
        </ed:Resource>
    </ed:Resources>
</ed:EndpointDescription>
}}}
The more complex [#REF_Example_5 Example 5] shows an Endpoint Description for an Endpoint that, similar to [#REF_Example_4 Example 4], supports the ''Basic Search'' capability. In addition to the Generic Hits Data View, it also supports the CMDI Data View. The delivery policies are ''send-by-default'' for the Generic Hits Data View and ''need-to-request'' for the CMDI Data View. The Endpoint has two top-level resources, identified by the persistent identifiers `http://hdl.handle.net/4711/0815` and `http://hdl.handle.net/4711/0816`. The second top-level resource has two independently searchable sub-resources, identified by the persistent identifiers `http://hdl.handle.net/4711/0816-1` and `http://hdl.handle.net/4711/0816-2`. All resources are described using several properties, like title, description, etc. The first top-level resource provides only the Generic Hits Data View, while the second top-level resource and its sub-resources provide both the Generic Hits and the CMDI Data Views.
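
Because there is no implicit inheritance, a Client has to read `<ed:AvailableDataViews>` separately for every resource and sub-resource. The following Python sketch, a hypothetical helper rather than part of any FCS library, walks the nested `<ed:Resources>` elements recursively and records the Data View ids that each resource declares itself.

{{{#!python
import xml.etree.ElementTree as ET

NS = {"ed": "http://clarin.eu/fcs/endpoint-description"}


def data_views_by_resource(xml_text):
    """Map every resource @pid to the Data View ids that resource declares itself."""
    root = ET.fromstring(xml_text)
    result = {}

    def walk(resources):
        for resource in resources.findall("ed:Resource", NS):
            views = resource.find("ed:AvailableDataViews", NS)
            # No inheritance: a resource without its own element gets an empty list.
            result[resource.get("pid")] = views.get("ref", "").split() if views is not None else []
            children = resource.find("ed:Resources", NS)
            if children is not None:
                walk(children)

    top_level = root.find("ed:Resources", NS)
    if top_level is not None:
        walk(top_level)
    return result
}}}

Applied to Example 5, this yields `["hits"]` for `http://hdl.handle.net/4711/0815` and `["hits", "cmdi"]` for the other three resources.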

[=#REF_Example_6]Example 6:
{{{#!xml
<ed:EndpointDescription xmlns:ed="http://clarin.eu/fcs/endpoint-description" version="2">
    <ed:Capabilities>
        <ed:Capability>http://clarin.eu/fcs/capability/basic-search</ed:Capability>
        <ed:Capability>http://clarin.eu/fcs/capability/advanced-search</ed:Capability>
    </ed:Capabilities>
    <ed:SupportedDataViews>
        <ed:SupportedDataView id="hits" delivery-policy="send-by-default">application/x-clarin-fcs-hits+xml</ed:SupportedDataView>
    </ed:SupportedDataViews>
    <!-- ADV-FCS -->
    <ed:SupportedLayers>
        <ed:SupportedLayer id="l1" result-id="http://endpoint.example.org/Layers/orth1">orth</ed:SupportedLayer>
        <ed:SupportedLayer id="l2" result-id="http://endpoint.example.org/Layers/pos1" qualifier="x">pos</ed:SupportedLayer>
        <ed:SupportedLayer id="l3" result-id="http://endpoint.example.org/Layers/pos2" qualifier="y"
            alt-value-info="STTS tagset"
            alt-value-info-uri="http://repos.example.org/tagset_doc.html">pos</ed:SupportedLayer>
        <ed:SupportedLayer id="l4" result-id="http://endpoint.example.org/Layers/word" type="empty">word</ed:SupportedLayer>
        <ed:SupportedLayer id="l5" result-id="http://endpoint.example.org/Layers/lemma1">lemma</ed:SupportedLayer>
    </ed:SupportedLayers>

    <ed:Resources>
        <!-- just one top-level resource at the Endpoint -->
        <ed:Resource pid="http://hdl.handle.net/4711/0815">
            <ed:Title xml:lang="de">Goethe Korpus</ed:Title>
            <ed:Title xml:lang="en">Goethe corpus</ed:Title>
            <ed:Description xml:lang="de">Der Goethe Korpus des IDS Mannheim.</ed:Description>
            <ed:Description xml:lang="en">The Goethe corpus of IDS Mannheim.</ed:Description>
            <ed:LandingPageURI>http://repos.example.org/corpus1.html</ed:LandingPageURI>
            <ed:Languages>
                <ed:Language>deu</ed:Language>
            </ed:Languages>
            <ed:AvailableDataViews ref="hits" />
            <ed:AvailableLayers ref="l1 l2 l3 l4 l5" />
        </ed:Resource>
    </ed:Resources>
</ed:EndpointDescription>
}}}
{{{
#!div style="border: 1px solid #000000; font-size: 75%"
TODO: describe the above example
}}}

== Searching
In the ''Searching'' step the Client performs the actual search request to a previously [#Discovery discovered] Endpoint.

=== Basic Search #basicSearch
The ''Basic Search'' capability provides simple full-text search. Queries in Basic Search `MUST` be performed in the ''Contextual Query Language'' ([#REF_CQL OASIS-CQL]). The Endpoint `MUST` support ''term-only'' queries. The Endpoint `SHOULD` support ''terms'' combined with boolean operator queries (''AND'' and ''OR''), including sub-queries. An Endpoint `MAY` also support ''NOT'' or ''PROX'' operator queries. If an Endpoint does not support a query, i.e. the operators used are not supported by the Endpoint, it `MUST` return an appropriate error message using the appropriate SRU diagnostic ([#REF_LOC_DIAG LOC-DIAG]).

     

Examples of valid CQL queries for Basic Search are:
{{{
cat

cat AND (mouse OR "lazy dog")
}}}

'''NOTE''': In CQL, a ''term'' can be a single token or a phrase, i.e. tokens separated by spaces. If a single ''term'' contains spaces, it needs to be quoted. \\
'''NOTE''': Endpoints `MUST` be able to parse all of CQL. If they do not support a certain CQL feature, they `MUST` generate an appropriate error message (see section [#sruCQL SRU/CQL]). In particular, if an Endpoint ''only'' supports ''Basic Search'', it `MUST NOT` silently accept queries that include CQL features beyond ''term-only'' queries and ''terms'' combined with boolean operators, e.g. queries involving context sets.
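
When a Client builds a Basic Search query from user input, phrases have to be quoted as described in the first note. The following Python sketch shows one simplified way to do this. The set of characters that trigger quoting, the list of reserved words, and the backslash escaping are assumptions based on CQL's quoting rules; `cql_term` and `cql_or` are hypothetical helpers, not a complete CQL serializer.

{{{#!python
import re

# Simplified: characters that may not appear in an unquoted CQL term, plus whitespace.
NEEDS_QUOTES = re.compile(r'[\s()=<>"/]')
# Reserved words (boolean operators and sortBy) must be quoted to be searched for as terms.
RESERVED_WORDS = {"and", "or", "not", "prox", "sortby"}


def cql_term(term):
    """Render a user-supplied term or phrase as a CQL search term."""
    if term and not NEEDS_QUOTES.search(term) and term.lower() not in RESERVED_WORDS:
        return term
    # Inside a quoted CQL string, backslash and double quote are escaped with '\'.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def cql_or(terms):
    """OR-combine several terms into a parenthesised sub-query."""
    return "(" + " OR ".join(cql_term(t) for t in terms) + ")"
}}}

For example, `"cat AND " + cql_or(["mouse", "lazy dog"])` produces `cat AND (mouse OR "lazy dog")`, the last query in the examples above.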

=== Advanced Search
The ''Advanced Search'' capability allows searching in annotated data, which is represented in annotation layers. An annotation ''layer'' contains annotations of a specific type, e.g. a lemma or part-of-speech layer. Queries can span multiple annotation layers.

CLARIN-FCS defines a set of searchable annotation layers with certain semantics and syntax. Endpoints `SHOULD` support as many different annotation layers as possible, depending, of course, on the resource type.

==== Layers #layers
Each Layer is assumed to be ''segmented'', e.g. to allow for searching for a single lemma. However, CLARIN-FCS does not endorse a specific segmentation, i.e. the segmentation of Layers is in the domain of the Endpoint and ''opaque'' to CLARIN-FCS. CLARIN-FCS '''does not''' endorse or assume a ''formal linguistic relation'' or ''formal linguistic hierarchy'' between two items on two different layers.

||=Layer Type Identifier =||=Annotation Layer Description                                =||=Syntax                          =||=Examples (without quotes)           =||
|| `text`                 || Textual representation of resource, also the layer that is used in [#basicSearch Basic Search] || ''String''                       || "Dog", "cat", "walking", "better"               ||
|| `lemma`                || Lemmatisation                                               || ''String''                       || "good", "walk", "dog"             ||
|| `pos`                  || Part-of-Speech annotations                                  || [#REF_UD_POS Universal POS tags] || "NOUN", "VERB", "ADJ"                ||
|| `orth`                 || Orthographic transcription of (mostly) spoken resources     || ''String''                       || "dug", "cat", "wolking"              ||
|| `norm`                 || Orthographic normalization of (mostly) spoken resources     || ''String''                       || "dog", "cat", "walking", "best"              ||
|| `phonetic`             || Phonetic transcription                                      || [#REF_SAMPA SAMPA]               || "'du:", "'vi:-d6 'ha:-b@n"           ||

The column ''Layer Type Identifier'' denotes the identifier for a layer. It is used in [#fcsQL FCS-QL] queries and in the XML serialization of the [#advancedDataView Advanced Data View]. All valid identifiers are defined in the table above; all other identifiers are reserved and `MUST NOT` be used, except for custom identifiers as described next. Clients and Endpoints `MAY` create custom Layer Type Identifiers, e.g. for testing proposed new layer types. If they do so, the custom Layer Type Identifiers `MUST` start with the String `x-`, e.g. `x-customLayer`.
The column ''Syntax'' describes the inventory of symbols that a Client `MUST` use with a corresponding annotation layer; the value ''String'' denotes that symbols are arbitrary Unicode Strings, i.e. no fixed inventory of symbols is defined. An Endpoint `SHOULD` provide an appropriate error if a Client uses an invalid value.
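
An Endpoint might run the identifier and value rules above as a small validation step before it evaluates a query. The following is a minimal, non-normative Python sketch, and its function names are hypothetical. The ''pos'' check assumes the 17 tags of the Universal POS tag set (UD v2). Values of ''String'' layers are accepted without further checks, and for simplicity so are SAMPA values of the ''phonetic'' layer.
{{{#!python
import re

# Layer Type Identifiers defined in the table above.
RESERVED_LAYER_TYPES = frozenset({"text", "lemma", "pos", "orth", "norm", "phonetic"})

# Universal POS tags (UD v2); assumed to be the inventory of the "pos" layer.
UD_POS_TAGS = frozenset({
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
})

# Custom identifiers: "x-" followed by at least one non-space character (assumption).
CUSTOM_LAYER_TYPE = re.compile(r"x-\S+")


def is_valid_layer_type(identifier: str) -> bool:
    """True for reserved identifiers and for custom identifiers with the x- prefix."""
    if identifier in RESERVED_LAYER_TYPES:
        return True
    return CUSTOM_LAYER_TYPE.fullmatch(identifier) is not None


def is_valid_value(layer_type: str, value: str) -> bool:
    """Check a query value against the Syntax column (only "pos" is restricted here)."""
    if layer_type == "pos":
        return value in UD_POS_TAGS
    return True  # String layers accept arbitrary Unicode strings
}}}
For example, `is_valid_layer_type("x-customLayer")` is true and `is_valid_layer_type("customLayer")` is false, so an Endpoint could reject the latter with an appropriate error.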

==== FCS-QL #fcsQL
{{{
#!div style="border: 1px solid #000000; font-size: 75%"
     

Examples of valid FCS-QL queries for ''Advanced Search'' are:
{{{
"walking"
     
[z:pos = "ADJ" & q:pos = "ADJ"]
}}}

The qualifiers ''z'' in ''z:pos'' and ''q'' in ''q:pos'' `SHOULD` match an available qualifier attribute value in a ''pos''-{{{SupportedLayer}}} in a discovered ''EndpointDescription''.

'''NOTE''': Endpoints supporting ''Advanced Search'' `MUST` be able to parse all of FCS-QL. If they do not support a certain FCS-QL feature, they `MUST` generate an appropriate error message (see section [#sruCQL SRU/CQL]). If an Endpoint ''only'' supports ''Basic Search'', it `MUST NOT` silently accept queries that include FCS-QL features.\\
'''NOTE''': FCS-QL layer identifiers are reserved. The Endpoint `MUST` prepend the local prefix {{{x-}}} to any identifier used outside of the reserved set, e.g., {{{x-customLayer}}} for a local identifier {{{customLayer}}}.
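
A Client or Endpoint could check the qualifier rule by collecting the attribute references of a query and comparing them with the (layer type, qualifier) pairs announced in the ''EndpointDescription''. The following Python sketch is not an FCS-QL parser. It uses regular expressions and blanks out string literals before matching. It also assumes that an unqualified reference such as `pos` matches any supported layer of that type, which is a simplification this specification does not state. `SUPPORTED_LAYERS` is a hypothetical stand-in for values read from a discovered ''EndpointDescription''.
{{{#!python
import re

# Hypothetical SupportedLayer data: (layer type, qualifier) pairs; None = no qualifier.
SUPPORTED_LAYERS = {("pos", "z"), ("pos", "q"), ("lemma", None), ("text", None)}

# Double-quoted string literals (with backslash escapes) are blanked out before matching.
STRING_LITERAL = re.compile(r'"(?:[^"\\]|\\.)*"')
# An optional "qualifier:" and a layer identifier, directly before = or !=.
ATTRIBUTE_REF = re.compile(r"(?:([A-Za-z][\w-]*):)?([A-Za-z][\w-]*)\s*!?=")


def attribute_refs(query):
    """Return the (layer type, qualifier) pairs referenced in an FCS-QL query."""
    stripped = STRING_LITERAL.sub('""', query)
    return [(m.group(2), m.group(1)) for m in ATTRIBUTE_REF.finditer(stripped)]


def unmatched_refs(query, supported=SUPPORTED_LAYERS):
    """Return the references that no supported layer can satisfy."""
    unmatched = []
    for layer, qualifier in attribute_refs(query):
        if qualifier is None:
            # Assumption: an unqualified reference matches any layer of that type.
            ok = any(layer == known for known, _ in supported)
        else:
            ok = (layer, qualifier) in supported
        if not ok:
            unmatched.append((layer, qualifier))
    return unmatched
}}}
With the sample data, `[z:pos = "ADJ" & q:pos = "ADJ"]` yields no unmatched references. `[y:pos = "ADJ"]` reports `("pos", "y")`, because no ''pos'' layer with qualifier ''y'' is announced.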

=== Result Format
{{{
#!div style="border: 1px solid #000000; font-size: 75%"
     
CLARIN-FCS uses a customized format for returning results. ''Resources'' and ''Resource Fragments'' serve as containers for hit results, which are presented in one or more ''Data Views''. The following section describes the Resource format and the Data View format; section [#searchRetrieve Operation ''searchRetrieve''] describes how hits are embedded within SRU responses.

==== Resource and !ResourceFragment
To encode search results, CLARIN-FCS supports two building blocks:
 Resources::
   A ''Resource'' is a ''searchable'' and ''addressable'' entity at the Endpoint, such as a text corpus or a multi-modal corpus. A Resource `SHOULD` be a self-contained unit, i.e. not a single sentence in a text corpus or a time interval in an audio transcription, but rather a complete document from a text corpus or a complete audio transcription.
 Resource Fragments::
   A ''Resource Fragment'' is a smaller unit in a ''Resource'', e.g. a sentence in a text corpus or a time interval in an audio transcription.

A Resource `SHOULD` be the most precise unit of data that is directly addressable as a "whole". A Resource `SHOULD` contain a Resource Fragment if the hit consists of just a part of the Resource unit (for example, if the hit is a sentence within a large text). A Resource Fragment `SHOULD` be addressable within a Resource, i.e. it has an offset or a resource-internal identifier. Using Resource Fragments is `OPTIONAL`, but Endpoints are encouraged to use them. If the Endpoint encodes a hit with a Resource Fragment, the actual hit `SHOULD` be encoded as a Data View within the Resource Fragment.
     
Endpoints `MAY` serialize hits as multiple Data Views; however, they `MUST` provide the Generic Hits (HITS) Data View, either encoded within a Resource Fragment (if applicable) or otherwise within the Resource (if there is no reasonable Resource Fragment). Other Data Views `SHOULD` be put in a place that is logical for their content (as determined by the Endpoint), e.g. a metadata Data View would most likely be put directly below the Resource, while a Data View representing some annotation layers directly around the hit more likely belongs within a Resource Fragment.

[=#REF_Example_1]Example 1:
{{{#!xml
<fcs:Resource xmlns:fcs="http://clarin.eu/fcs/resource" pid="http://hdl.handle.net/4711/08-15">
  <fcs:DataView type="application/x-clarin-fcs-hits+xml">
    <!-- data view payload omitted -->
  </fcs:DataView>
</fcs:Resource>
}}}
[#REF_Example_1 Example 1] shows a simple hit, which is encoded in one Data View of type ''Generic Hits'' embedded within a Resource. The type of the Data View is identified by the MIME type `application/x-clarin-fcs-hits+xml`. The Resource is referenceable by the persistent identifier `http://hdl.handle.net/4711/08-15`.

[=#REF_Example_2]Example 2:
{{{#!xml
<fcs:Resource xmlns:fcs="http://clarin.eu/fcs/resource" pid="http://hdl.handle.net/4711/08-15">
  <fcs:ResourceFragment>
    <fcs:DataView type="application/x-clarin-fcs-hits+xml">
      <!-- data view payload omitted -->
    </fcs:DataView>
  </fcs:ResourceFragment>
</fcs:Resource>
}}}
[#REF_Example_2 Example 2] shows a hit encoded as a Resource Fragment embedded within a Resource. The actual hit is again encoded as one Data View of type ''Generic Hits''. The hit is not directly referenceable, but the Resource in which the hit occurred is referenceable by the persistent identifier `http://hdl.handle.net/4711/08-15`. In contrast to [#REF_Example_1 Example 1], the Endpoint decided to provide a "semantically richer" encoding and embedded the hit using a Resource Fragment within the Resource to indicate that the hit is a part of a larger resource, e.g. a sentence in a text document.

[=#REF_Example_3]Example 3:
{{{#!xml
<fcs:Resource xmlns:fcs="http://clarin.eu/fcs/resource"
              pid="http://hdl.handle.net/4711/08-15" ref="http://repos.example.org/file/text_08_15.html">
  <fcs:DataView type="application/x-cmdi+xml"
                pid="http://hdl.handle.net/4711/08-15-1" ref="http://repos.example.org/file/08_15_1.cmdi">
    <!-- data view payload omitted -->
  </fcs:DataView>
  <fcs:ResourceFragment pid="http://hdl.handle.net/4711/08-15-2" ref="http://repos.example.org/file/text_08_15.html#sentence2">
    <fcs:DataView type="application/x-clarin-fcs-hits+xml">
      <!-- data view payload omitted -->
    </fcs:DataView>
  </fcs:ResourceFragment>
</fcs:Resource>
}}}
The more complex [#REF_Example_3 Example 3] is similar to [#REF_Example_2 Example 2], i.e. it shows a hit encoded as one ''Generic Hits'' Data View in a Resource Fragment, which is embedded in a Resource. In contrast to Example 2, another Data View of type ''CMDI'' is embedded directly within the Resource. The Endpoint can use this type of Data View to directly provide CMDI metadata about the Resource to Clients.
All entities of the hit can be referenced by a persistent identifier and a URI. The complete Resource is referenceable by either the persistent identifier `http://hdl.handle.net/4711/08-15` or the URI `http://repos.example.org/file/text_08_15.html`, and the CMDI metadata record in the CMDI Data View is referenceable by either the persistent identifier `http://hdl.handle.net/4711/08-15-1` or the URI `http://repos.example.org/file/08_15_1.cmdi`. The actual hit in the Resource Fragment is also directly referenceable by either the persistent identifier `http://hdl.handle.net/4711/08-15-2` or the URI `http://repos.example.org/file/text_08_15.html#sentence2`.

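
The rule that a Generic Hits Data View must be present, either directly in the Resource or in a Resource Fragment, can be checked mechanically. The following Python sketch uses only the standard library module `xml.etree.ElementTree`. Its function names are hypothetical. It checks only this one rule and does not validate against the XML schema.
{{{#!python
import xml.etree.ElementTree as ET

NS = {"fcs": "http://clarin.eu/fcs/resource"}
RESOURCE_TAG = "{http://clarin.eu/fcs/resource}Resource"
HITS_TYPE = "application/x-clarin-fcs-hits+xml"


def hits_locations(resource_xml):
    """List where Generic Hits Data Views occur: "resource" and/or "fragment"."""
    root = ET.fromstring(resource_xml)
    if root.tag != RESOURCE_TAG:
        raise ValueError("expected an fcs:Resource root element")
    locations = []
    # Data Views placed directly below the Resource.
    for view in root.findall("fcs:DataView", NS):
        if view.get("type") == HITS_TYPE:
            locations.append("resource")
    # Data Views placed inside a Resource Fragment.
    for view in root.findall("fcs:ResourceFragment/fcs:DataView", NS):
        if view.get("type") == HITS_TYPE:
            locations.append("fragment")
    return locations


def has_hits_view(resource_xml):
    """True if at least one Generic Hits Data View is present."""
    return bool(hits_locations(resource_xml))
}}}
For [#REF_Example_3 Example 3], `hits_locations` returns `["fragment"]`. A Resource that carries only the CMDI Data View would fail the check.
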
==== Data View
A ''Data View'' serves as a container for encoding the actual search results (the data fragments relevant to the search) within CLARIN-FCS. Data Views are designed to allow for different representations of results, i.e. they are deliberately kept open to allow further extensions with more supported Data View formats. This specification defines a ''most basic'' Data View for representing search results, called ''Generic Hits'', and the ''Advanced'' Data View for Advanced Search (see below). More Data Views are defined in the supplementary specification [#REF_FCS_DataViews CLARIN-FCS-DataViews].

     
'''NOTE''': The examples in the following sections ''show only'' the payload with the enclosing `<fcs:DataView>` element of a Data View. Of course, the Data View must be embedded either in a `<fcs:Resource>` or a `<fcs:ResourceFragment>` element. The `@pid` and `@ref` attributes have been omitted for all ''inline'' payload types.

===== Generic Hits (HITS)
||=Description                  =|| The representation of the hit ||
||=MIME type                    =|| `application/x-clarin-fcs-hits+xml` ||
||=Payload Disposition          =|| ''inline'' ||
||=Payload Delivery             =|| ''send-by-default'' (`REQUIRED`) ||
||=Recommended Short Identifier =|| `hits` (`RECOMMENDED`) ||
||=XML Schema                   =|| [source:FederatedSearch/schema/Core_2/DataView-Hits.xsd DataView-Hits.xsd] ([source:FederatedSearch/schema/Core_2/DataView-Hits.xsd?format=txt download]) ||

The ''Generic Hits'' Data View serves as the ''most basic'' agreement in CLARIN-FCS for the serialization of search results and `MUST` be implemented by all Endpoints. In many cases, this Data View can only serve as a (lossy) approximation, because resources at Endpoints are very heterogeneous. For instance, the Generic Hits Data View is probably not the best representation for a hit in a corpus of spoken language, but an architecture like CLARIN-FCS requires one common representation to be implemented by all Endpoints; therefore, this Data View was defined. The Generic Hits Data View supports multiple markers for highlighting an individual hit, e.g. if a query contains a (boolean) conjunction, the Endpoint can use multiple markers to provide individual highlights for the matching terms. An Endpoint `MUST NOT` use this Data View to aggregate several hits within one Resource. Each hit `SHOULD` be presented within the context of a complete sentence. If that is not possible due to the nature of the resource, the Endpoint `MUST` provide an equivalent reasonable unit of context (e.g. a phrase of an orthographic transcription of an utterance). The `<hits:Hit>` element within the `<hits:Result>` element is not enforced by the XML schema, but Endpoints are `RECOMMENDED` to use it. The XML fragment of the Generic Hits payload `MUST` be valid according to the XML schema "[source:FederatedSearch/schema/Core_2/DataView-Hits.xsd DataView-Hits.xsd]" ([source:FederatedSearch/schema/Core_2/DataView-Hits.xsd?format=txt download]).

 * Example (single hit marker):
{{{#!xml
<!-- potential @pid and @ref attributes omitted -->
<fcs:DataView type="application/x-clarin-fcs-hits+xml">
  <hits:Result xmlns:hits="http://clarin.eu/fcs/dataview/hits">
    The quick brown <hits:Hit>fox</hits:Hit> jumps over the lazy dog.
  </hits:Result>
</fcs:DataView>
}}}
 * Example (multiple hit markers):
{{{#!xml
<!-- potential @pid and @ref attributes omitted -->
<fcs:DataView type="application/x-clarin-fcs-hits+xml">
  <hits:Result xmlns:hits="http://clarin.eu/fcs/dataview/hits">
    The quick brown <hits:Hit>fox</hits:Hit> jumps over the lazy <hits:Hit>dog</hits:Hit>.
  </hits:Result>
</fcs:DataView>
}}}
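
A Client, such as an Aggregator, has to turn this payload into something it can display. The following Python sketch uses `xml.etree.ElementTree` to collect the marked hit strings and to render the context with each hit wrapped in brackets. The function names and the bracket rendering are illustrative assumptions. The examples above omit the `fcs` namespace declaration, so a standalone input must bind the `fcs` prefix to be well-formed XML.
{{{#!python
import xml.etree.ElementTree as ET

NS = {"hits": "http://clarin.eu/fcs/dataview/hits"}
HIT_TAG = "{http://clarin.eu/fcs/dataview/hits}Hit"


def _result(dataview_xml):
    """Return the hits:Result element of a Generic Hits Data View."""
    result = ET.fromstring(dataview_xml).find("hits:Result", NS)
    if result is None:
        raise ValueError("no hits:Result element found")
    return result


def hit_texts(dataview_xml):
    """Return the text of every hits:Hit marker, in document order."""
    return ["".join(hit.itertext()) for hit in _result(dataview_xml).iter(HIT_TAG)]


def render_hits(dataview_xml, left="[", right="]"):
    """Return the context with whitespace collapsed and each hit wrapped in markers."""
    result = _result(dataview_xml)
    parts = [result.text or ""]
    for child in result:
        inner = "".join(child.itertext())
        parts.append(left + inner + right if child.tag == HIT_TAG else inner)
        parts.append(child.tail or "")
    return " ".join("".join(parts).split())
}}}
For the multiple-marker example, `render_hits` returns `The quick brown [fox] jumps over the lazy [dog].`, and `hit_texts` returns `["fox", "dog"]`.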

===== Advanced (ADV) #advancedDataView
||=Description                  =|| The representation of the hit for Advanced Search ||
||=MIME type                    =|| `application/x-clarin-fcs-adv+xml` ||
||=Payload Disposition          =|| ''inline'' ||
||=Payload Delivery             =|| ''send-by-default'' (`REQUIRED`) ||
||=Recommended Short Identifier =|| `adv` (`RECOMMENDED`) ||
||=XML Schema                   =|| [source:FederatedSearch/schema/Core_2/DataView-Advanced.xsd DataView-Advanced.xsd] ([source:FederatedSearch/schema/Core_2/DataView-Advanced.xsd?format=txt download]) ||

{{{
     
TODO: describe!
}}}

 - ADV Data View allows Endpoints to return structured information for Advanced Search queries
 - organized in one or more annotation layers
 - annotation layer := annotations of a specific type, e.g. part-of-speech or orthographic transcription
 - annotations of two different annotation layers may freely overlap; no self-overlap within an annotation layer
 - Data View serialization in a stand-off like format, i.e. annotations are ranges over the signal (= language resource as character, token or audio stream), denoted by start and end offsets
 - layers are alignable (through their offsets) and referable (through their layer identifier)
 - ADV Data View serialization:
   - a list of segments (= "inventory" of all ranges used to describe annotations)
     - units can be "items" (= offsets in a character or token stream) or "timestamp" (timestamps in an audio stream); timestamps may have a resolution of up to 1/1000 second.
       - Endpoints are responsible for choosing proper offsets for segments. They must do so in a consistent manner, i.e. in a single result (= ADV Data View instance) the chosen offsets must allow for aligning the segments of different layers. Recommendation for character streams: character := Unicode code point, normalized to Unicode Normalization Form KC (NFKC; Compatibility Decomposition, followed by Canonical Composition)
     - segments may also have an Endpoint-specific reference (= URI); the Aggregator can show it, and if the user clicks the link, a viewer (e.g. an audio player) can be opened at the Endpoint
   - a list of layers, each with a type (e.g. "pos", "lemma", see ''Layer Type Identifier'' in section Layers above) and a layer identifier (= URI)
     - a layer consists of one or more Spans. A Span references a segment (and thus inherits its start and end offsets) and contains the actual annotation (e.g. the part-of-speech label) in its content; it MAY also contain an alt-value (e.g. the original annotation value)
     - the document order of layer elements defines the view order in the Aggregator
     - Endpoints should return at least all layers that were referenced in the query; they may return more
   - hit markers are added by marking Spans as hits (add `@highlight` attribute); multiple hit markers are supported and the Aggregator may display them in visually distinct ways
     - where to add hit markers is up to the Endpoint; generally, "things" that were referenced in the query should be marked.

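
The serialization described in the notes above can be sketched with a few data classes and `xml.etree.ElementTree`. This Python sketch illustrates the idea under several assumptions and is not a normative serializer:
 * element and attribute names follow the unqualified XML example below (`Advanced`, `Segments`, `Segment`, `Layers`, `Layer`, `Span`, `alt-value`, `highlight`), and no XML namespace is used;
 * the class and function names are hypothetical;
 * the output has not been validated against DataView-Advanced.xsd.
Layers are written in the order given, because document order defines the view order in the Aggregator.
{{{#!python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    id: str
    start: int                  # first unit covered (inclusive)
    end: int                    # last unit covered (inclusive)
    ref: Optional[str] = None   # optional Endpoint-specific URI


@dataclass
class Span:
    segment: str                     # id of the referenced Segment
    value: str                       # annotation value, e.g. a Universal POS tag
    alt_value: Optional[str] = None  # e.g. the original annotation value
    highlight: Optional[str] = None  # hit marker id, e.g. "h1"


@dataclass
class Layer:
    id: str                          # layer identifier (URI)
    spans: List[Span] = field(default_factory=list)


def to_adv_xml(segments: List[Segment], layers: List[Layer], unit: str = "items") -> str:
    """Serialize segments and layers; layer order is kept as given."""
    known = {segment.id for segment in segments}
    root = ET.Element("Advanced")
    segments_el = ET.SubElement(root, "Segments", unit=unit)
    for segment in segments:
        attrs = {"id": segment.id, "start": str(segment.start), "end": str(segment.end)}
        if segment.ref is not None:
            attrs["ref"] = segment.ref
        ET.SubElement(segments_el, "Segment", attrs)
    layers_el = ET.SubElement(root, "Layers")
    for layer in layers:
        layer_el = ET.SubElement(layers_el, "Layer", id=layer.id)
        for span in layer.spans:
            # Every Span must point at a Segment from the inventory.
            if span.segment not in known:
                raise ValueError(f"span references unknown segment {span.segment!r}")
            attrs = {"ref": span.segment}
            if span.alt_value is not None:
                attrs["alt-value"] = span.alt_value
            if span.highlight is not None:
                attrs["highlight"] = span.highlight
            ET.SubElement(layer_el, "Span", attrs).text = span.value
    return ET.tostring(root, encoding="unicode")
}}}
Keeping segments as a separate inventory lets several layers share one range, such as the ''pos'' and ''lemma'' Spans for the same token, without repeating offsets.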
Example: a sentence interpreted as a character stream
||=Data   =|| t || || d || a || || ' || s ||   || d || e ||   || e || n || i || g || e ||   || e || c || h || t || e ||   || h || o || o || p ||   || v || o || o || r ||   || o || n || s ||   || m || e || n || s || e || n ||
||=Offset =|| 1 || 2 || 3 || 4 || 5 || 6 || 7 || 8 || 9 || 10 || 11 || 12 || 13 || 14 || 15 || 16 || 17 || 18 || 19 || 20 || 21 || 22 || 23 || 24 || 25 || 26 || 27 || 28 || 29 || 30 || 31 || 32 || 33 || 34 || 35 || 36 || 37 || 38 || 39 || 40 || 41 || 42 || 43 ||
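
For a character stream, the offsets in the table above can be derived programmatically. The following Python sketch follows the recommendation in the notes, where a character is a Unicode code point after NFKC normalization. Like the example, it counts 1-based, inclusive positions. Splitting tokens on whitespace is an assumption of the sketch; a real Endpoint would use its own tokenization.
{{{#!python
import unicodedata
from typing import List, Tuple


def token_offsets(text: str) -> List[Tuple[str, int, int]]:
    """Return (token, start, end) with 1-based, inclusive code point offsets.

    The text is normalized to NFKC first; tokens are assumed to be separated
    by whitespace (an assumption of this sketch, not of the specification).
    """
    normalized = unicodedata.normalize("NFKC", text)
    tokens = []
    start = None  # 1-based offset where the current token began
    for offset, char in enumerate(normalized, start=1):
        if char.isspace():
            if start is not None:
                tokens.append((normalized[start - 1:offset - 1], start, offset - 1))
                start = None
        elif start is None:
            start = offset
    if start is not None:  # token running to the end of the text
        tokens.append((normalized[start - 1:], start, len(normalized)))
    return tokens
}}}
Applied to `t da 's de enige echte hoop voor ons mensen`, it yields the (start, end) pairs of the following table, from (1, 1) for ''t'' to (38, 43) for ''mensen''.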

Example: several annotation layers for the sentence
||=Offset (Start, End) =|| 1,1  || 3,4    || 6,7    || 9,10   || 12,16    || 18,22    || 24,27   || 29,32   || 34,36   || 38,43  ||
||=Layer ''orth''      =|| t    || da     || 's     || de     || enige    || echte    || hoop    || voor    || ons     || mensen ||
||=Layer ''pos''       =|| X    || PRON   || VERB   || DET    || DET      || ADJ      || NOUN    || ADP     || PRON    || NOUN   ||
||=Layer ''lemma''     =|| _    || dat    || zijn   || de     || enig     || echt     || hoop    || voor    || ons     || mens   ||
||=Layer ''phonetic''  =|| t@   || dAz    || dAz    || d@     || en@G@    || Ext@     || hop     || for     || Ons     || mEns@  ||
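
Every annotation is a range over the same signal, so layers can be aligned purely by comparing offsets. The rule that an annotation layer must not overlap itself can be checked the same way. The following Python sketch represents a layer as a list of ((start, end), value) pairs with 1-based, inclusive offsets. This representation and the phrase annotation in the usage note are hypothetical and only serve as an illustration.
{{{#!python
from typing import List, Tuple

Interval = Tuple[int, int]          # (start, end), 1-based and inclusive
Annotation = Tuple[Interval, str]   # (range, annotation value)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two inclusive ranges share at least one unit."""
    return a[0] <= b[1] and b[0] <= a[1]


def has_self_overlap(intervals: List[Interval]) -> bool:
    """True if any two ranges overlap; after sorting, checking neighbours suffices."""
    ordered = sorted(intervals)
    return any(overlaps(x, y) for x, y in zip(ordered, ordered[1:]))


def align(layer_a: List[Annotation], layer_b: List[Annotation]) -> List[Tuple[str, List[str]]]:
    """For each annotation in layer_a, list the layer_b values whose ranges overlap it."""
    return [
        (value_a, [value_b for range_b, value_b in layer_b if overlaps(range_a, range_b)])
        for range_a, value_a in layer_a
    ]
}}}
For instance, aligning a hypothetical phrase annotation `((9, 27), "NP")` with the ''orth'' layer above yields `[("NP", ["de", "enige", "echte", "hoop"])]`.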

Example: XML serialization
{{{#!xml
<Advanced>
    <Segments unit="items">
        <Segment id="s1"  start="1"  end="1"
            ref="http://hdl.handle.net/4711/123456789?urlappend=%3Fplay=0:173"/>
        <Segment id="s2"  start="3"  end="4"
            ref="http://hdl.handle.net/4711/123456789?urlappend=%3Fplay=173:304"/>
        <Segment id="s3"  start="6"  end="7"
            ref="http://hdl.handle.net/4711/123456789?urlappend=%3Fplay=173:304"/>
        <Segment id="s4"  start="9"  end="10"
            ref="http://hdl.handle.net/4711/123456789?urlappend=%3Fplay=304:480"/>
        <Segment id="s5"  start="12" end="16"
            ref="http://hdl.handle.net/4711/123456789?urlappend=%3Fplay=480:1119"/>
        <Segment id="s6"  start="18" end="22"
            ref="http://hdl.handle.net/4711/123456789?urlappend=%3Fplay=1339:1901"/>
        <Segment id="s7"  start="24" end="27"
            ref="http://hdl.handle.net/4711/123456789?urlappend=%3Fplay=1901:2427"/>
        <Segment id="s8"  start="29" end="32"
            ref="http://hdl.handle.net/4711/123456789?urlappend=%3Fplay=3084:3493"/>
        <Segment id="s9"  start="34" end="36"
            ref="http://hdl.handle.net/4711/123456789?urlappend=%3Fplay=3493:3754"/>
        <Segment id="s10" start="38" end="43"
            ref="http://hdl.handle.net/4711/123456789?urlappend=%3Fplay=3754:4274"/>
    </Segments>

    <Layers>
        <Layer id="http://endpoint.example.org/Layers/orth1">
            <Span ref="s1">t</Span>
            <Span ref="s2">da</Span>
            <Span ref="s3">'s</Span>
            <Span ref="s4">de</Span>
            <Span ref="s5">enige</Span>
            <Span ref="s6">echte</Span>
            <Span ref="s7">hoop</Span>
            <Span ref="s8">voor</Span>
            <Span ref="s9">ons</Span>
            <Span ref="s10">mensen</Span>
        </Layer>

        <Layer id="http://endpoint.example.org/Layers/pos1">
            <Span ref="s1"  alt-value="SPEC(afgebr)">X</Span>
            <Span ref="s2"  alt-value="VNW(aanw,pron,stan,vol,3o,ev)">PRON</Span>
            <Span ref="s3"  alt-value="WW(pv,tgw,ev)">VERB</Span>
            <Span ref="s4"  alt-value="LID(bep,stan,rest)">DET</Span>
            <Span ref="s5"  alt-value="VNW(onbep,det,stan,prenom,met-e,rest)">DET</Span>
            <Span ref="s6"  alt-value="ADJ(prenom,basis,met-e,stan)">ADJ</Span>
            <Span ref="s7"  alt-value="N(soort,ev,basis,zijd,stan)">NOUN</Span>
            <Span ref="s8"  alt-value="VZ(init)">ADP</Span>
            <Span ref="s9"  alt-value="VNW(pr,pron,obl,vol,1,mv)">PRON</Span>
            <Span ref="s10" alt-value="N(soort,mv,basis)">NOUN</Span>
        </Layer>

        <Layer id="http://endpoint.example.org/Layers/lemma1">
            <Span ref="s1">_</Span>
            <Span ref="s2">dat</Span>
            <Span ref="s3">zijn</Span>
            <Span ref="s4">de</Span>
            <Span ref="s5">enig</Span>
            <Span ref="s6" highlight="h1">echt</Span>
            <Span ref="s7" highlight="h1">hoop</Span>
            <Span ref="s8">voor</Span>
     686            <Span ref="s9">ons</Span>
     687            <Span ref="s10">mens</Span>
     688        </Layer>
     689
     690        <Layer id="http://endpoint.example.org/Layers/phon">
     691            <Span ref="s1">t@</Span>
     692            <Span ref="s2" highlight="h2">dAz</Span>
     693            <Span ref="s3">dAz</Span>
     694            <Span ref="s4">d@</Span>
     695            <Span ref="s5">en@G@</Span>
     696            <Span ref="s6">Ext@</Span>
     697            <Span ref="s7">hop</Span>
     698            <Span ref="s8">for</Span>
     699            <Span ref="s9">Ons</Span>
     700            <Span ref="s10">mEns@</Span>
     701        </Layer>
     702    </Layers>
     703</Advanced>
     704}}}
     705
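A Client that consumes the Advanced Data View has to join every `<Span>` to its `<Segment>` through the `@ref` attribute. The following Python sketch shows one way to do this with the standard library's `xml.etree.ElementTree`. It assumes the un-namespaced serialization shown in the example above. The function names `parse_advanced` and `highlighted` are hypothetical and not part of the specification.

{{{#!python
import xml.etree.ElementTree as ET


def parse_advanced(xml_text):
    """Split an <Advanced> fragment into segments and per-layer span values.

    Returns (segments, layers):
      segments: segment id -> (start, end, ref or None)
      layers:   layer id -> {segment id -> (span text, highlight id or None)}
    Assumes the un-namespaced serialization used in the example above.
    """
    root = ET.fromstring(xml_text)
    segments = {
        seg.get("id"): (int(seg.get("start")), int(seg.get("end")), seg.get("ref"))
        for seg in root.iter("Segment")
    }
    layers = {}
    for layer in root.iter("Layer"):
        spans = {}
        for span in layer.iter("Span"):
            seg_id = span.get("ref")
            # Every Span must point at a Segment declared in <Segments>.
            if seg_id not in segments:
                raise ValueError(f"Span refers to unknown segment {seg_id!r}")
            spans[seg_id] = (span.text or "", span.get("highlight"))
        layers[layer.get("id")] = spans
    return segments, layers


def highlighted(layers, highlight_id):
    """Return (layer id, segment id, text) for every span marked with highlight_id."""
    return [
        (layer_id, seg_id, text)
        for layer_id, spans in layers.items()
        for seg_id, (text, hl) in spans.items()
        if hl == highlight_id
    ]
}}}

Applied to the example above, `highlighted(layers, "h1")` yields the two lemma spans `echt` (s6) and `hoop` (s7), and `highlighted(layers, "h2")` yields the phonetic span `dAz` (s2).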
=== Versioning and Extensions
==== Backwards Compatibility #backwardsCompatibility
{{{
#!div style="border: 1px solid #000000; font-size: 75%"
TODO: check and proof-read
}}}

Clients `MUST` be compatible with CLARIN-FCS 1.0 and thus must implement SRU 1.2. If a Client uses CLARIN-FCS 1.0 to talk to an Endpoint, it `MUST NOT` use features beyond the Basic Search capability. Clients `MUST` implement a heuristic to automatically determine which CLARIN-FCS protocol version, i.e. which version of the SRU protocol, can be used to talk to an Endpoint.

Clients `MUST` be able to process the OASIS XML namespaces as well as the following legacy XML namespaces, which SRU 1.2 Endpoints use for serializing responses:
 * `http://www.loc.gov/zing/srw/` for SRU response documents, and
 * `http://www.loc.gov/zing/srw/diagnostic/` for diagnostics within SRU response documents.

CLARIN-FCS deviates from the OASIS specifications [#REF_SRU_Overview OASIS-SRU-Overview] and [#REF_SRU_12 OASIS-SRU-12] to ensure backwards compatibility with SRU 1.2 services as they were defined by [#REF_LOC_SRU_12 LOC-SRU12].

Pseudo algorithm for version detection heuristic:
 * Send an ''explain'' request without the `version` and `operation` parameters.
 * Check the SRU response for the content of the element `<sru:explainResponse>/<sru:version>`.

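The second step of the heuristic can be sketched in Python as follows. The probe request is the bare base URL, e.g. `http://repos.example.org/fcs-endpoint`; fetching it is left out here. The sketch inspects the response and accepts both the legacy SRU 1.2 namespace and the OASIS SRU 2.0 response namespace. The OASIS namespace URI is our reading of OASIS-SRU-20 and should be verified there. The mapping from SRU version to CLARIN-FCS version follows the rules above: SRU 1.2 implies CLARIN-FCS 1.0, so only Basic Search may be used. All function names are hypothetical.

{{{#!python
import xml.etree.ElementTree as ET

# Legacy namespace used by SRU 1.2 Endpoints (listed above).
LEGACY_SRU_NS = "http://www.loc.gov/zing/srw/"
# SRU 2.0 response namespace (assumption: taken from OASIS-SRU-20, verify there).
OASIS_SRU_NS = "http://docs.oasis-open.org/ns/search-ws/sruResponse"


def detect_sru_version(response_xml):
    """Return the text of <explainResponse>/<version>, or None if not found."""
    root = ET.fromstring(response_xml)
    for ns in (LEGACY_SRU_NS, OASIS_SRU_NS):
        if root.tag == "{" + ns + "}explainResponse":
            version = root.findtext("sru:version", namespaces={"sru": ns})
            return version.strip() if version else None
    return None


def fcs_version_for(sru_version):
    """Map the detected SRU version to the CLARIN-FCS version a Client may use."""
    if sru_version == "2.0":
        return "2.0"
    if sru_version == "1.2":
        return "1.0"  # CLARIN-FCS 1.0: Basic Search features only
    return None  # unknown: the Client has to decide how to proceed
}}}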
==== Endpoint Custom Extensions
Endpoints can add custom extensions, i.e. custom data, to the Result Format. This extension mechanism can, for example, be used to provide hints for an (XSLT/XQuery) application that works directly on CLARIN-FCS, e.g. to allow it to generate back and forward links to navigate in a result set.

An Endpoint `MAY` add arbitrary XML fragments to the extension hooks provided in the `<fcs:Resource>` element (see the XML schema for "Resource.xsd"). The XML fragment for the extension `MUST` use a custom XML namespace name. Endpoints `MUST NOT` use XML namespace names that start with the prefixes `http://clarin.eu`, `http://www.clarin.eu/`, `https://clarin.eu` or `https://www.clarin.eu/`.
     
The non-normative appendix contains an [#extensionExample example] of how an extension could be implemented.

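An Endpoint implementer can guard against accidentally using a reserved namespace with a simple prefix check. This Python sketch applies the rule above as a literal string-prefix test. One consequence: a name such as `http://clarin.eu.example.org/` is also rejected, because it starts with `http://clarin.eu`. The function name and the sample namespace `http://repos.example.org/fcs-ext/navigation` are hypothetical.

{{{#!python
# Prefixes reserved for CLARIN, copied from the rule above.
RESERVED_PREFIXES = (
    "http://clarin.eu",
    "http://www.clarin.eu/",
    "https://clarin.eu",
    "https://www.clarin.eu/",
)


def is_allowed_extension_namespace(namespace):
    """True if an Endpoint may use this namespace name for a custom extension."""
    # str.startswith accepts a tuple and matches if any prefix matches.
    return bool(namespace) and not namespace.startswith(RESERVED_PREFIXES)
}}}

For example, `is_allowed_extension_namespace("http://repos.example.org/fcs-ext/navigation")` is `True`, while any namespace under `http://clarin.eu/fcs/` is rejected.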
= CLARIN-FCS to SRU/CQL binding
== SRU/CQL #sruCQL
{{{
#!div style="border: 1px solid #000000; font-size: 75%"
     

Endpoints and Clients `MUST` implement the SRU/CQL protocol suite as defined in [#REF_SRU_Overview OASIS-SRU-Overview], [#REF_SRU_APD OASIS-SRU-APD], [#REF_CQL OASIS-CQL], [#REF_Explain SRU-Explain], [#REF_Scan SRU-Scan], especially with respect to:
 * Data Model,
 * Query Model,
 * Processing Model,
 * Result Set Model, and
 * Diagnostics Model.

Endpoints and Clients `MUST` implement the APD Binding for SRU 2.0, as defined in [#REF_SRU_20 OASIS-SRU-20]. \\
Clients `MUST` implement the APD Binding for SRU 1.2, as defined in [#REF_SRU_12 OASIS-SRU-12]. \\
Clients `MAY` also implement the APD Binding for SRU 1.1. \\
'''NOTE''': when implementing SRU 1.2, Endpoints and Clients `MUST` behave as described in the section [#backwardsCompatibility Backwards Compatibility].

Endpoints and Clients `MUST` support CQL conformance ''Level 2'' (as defined in [#REF_OASIS_CQL OASIS-CQL, section 6]), i.e. be able to ''parse'' (Endpoints) or ''serialize'' (Clients) all of CQL and respond with appropriate error messages to the search/retrieve protocol interface.
     
Endpoints `MUST` support the HTTP GET [#REF_SRU_20 OASIS-SRU-20, Appendix B.1] and HTTP POST [#REF_SRU_20 OASIS-SRU-20, Appendix B.2] lower-level protocol bindings. Endpoints `MAY` also support the SOAP [#REF_SRU_20 OASIS-SRU-20, Appendix B.3] binding.

== Operation ''explain'' #explain
{{{
#!div style="border: 1px solid #000000; font-size: 75%"
     

According to the Capabilities supported by the Endpoint, the Explain record `MUST` contain the following elements:
 ''Basic-Search'' Capability::
   `<zr:serverInfo>` as defined in [#REF_Explain SRU-Explain] (`REQUIRED`) \\
   `<zr:databaseInfo>` as defined in [#REF_Explain SRU-Explain] (`REQUIRED`) \\
   `<zr:schemaInfo>` as defined in [#REF_Explain SRU-Explain] (`REQUIRED`). This element `MUST` contain an element `<zr:schema>` with an `@identifier` attribute with a value of `http://clarin.eu/fcs/resource` and an `@name` attribute with a value of `fcs`. \\
   `<zr:configInfo>` is `OPTIONAL` \\
   Other capabilities may define how the `<zr:indexInfo>` element is to be used; therefore it is `NOT RECOMMENDED` for Endpoints to use it in custom extensions.
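As a sanity check, for example in an Endpoint's test suite, the requirements for the ''Basic-Search'' Capability can be verified against a parsed `<zr:explain>` element. This Python sketch uses `xml.etree.ElementTree`. It only checks element presence and the `<zr:schema>` attributes listed above, not the full SRU-Explain schema. The function name is hypothetical.

{{{#!python
import xml.etree.ElementTree as ET

NS = {"zr": "http://explain.z3950.org/dtd/2.0/"}
FCS_SCHEMA_ID = "http://clarin.eu/fcs/resource"


def check_basic_search_explain(zr_explain):
    """Return a list of problems in a <zr:explain> element (empty list if none)."""
    problems = []
    # The three REQUIRED child elements.
    for name in ("serverInfo", "databaseInfo", "schemaInfo"):
        if zr_explain.find("zr:" + name, NS) is None:
            problems.append(f"missing required <zr:{name}>")
    # <zr:schemaInfo> must announce the CLARIN-FCS record schema.
    schemas = zr_explain.findall("zr:schemaInfo/zr:schema", NS)
    if not any(s.get("identifier") == FCS_SCHEMA_ID and s.get("name") == "fcs"
               for s in schemas):
        problems.append(f'no <zr:schema identifier="{FCS_SCHEMA_ID}" name="fcs">')
    return problems
}}}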

To support auto-configuration in CLARIN-FCS, the Endpoint `MUST` support the ''Endpoint Description''. The Endpoint Description is included in the explain response using SRU's extension mechanism, i.e. by embedding an XML fragment into the `<sru:extraResponseData>` element. The Endpoint `MUST` include the Endpoint Description ''only'' if the Client performs an explain request with the ''extra request parameter'' `x-fcs-endpoint-description` with a value of `true`. If the Client performs an explain request ''without'' supplying this extra request parameter, the Endpoint `MUST NOT` include the Endpoint Description. The format of the Endpoint Description XML fragment is defined in [#endpointDescription Endpoint Description].

The following example shows a request and response to an ''explain'' request with the extra request parameter `x-fcs-endpoint-description`:
 * HTTP GET request: Client → Endpoint:
{{{#!sh
http://repos.example.org/fcs-endpoint?operation=explain&version=1.2&x-fcs-endpoint-description=true
}}}
 * HTTP Response: Endpoint → Client:
{{{#!xml
<?xml version='1.0' encoding='utf-8'?>
<sru:explainResponse xmlns:sru="http://www.loc.gov/zing/srw/">
  <sru:version>1.2</sru:version>
  <sru:record>
    <sru:recordSchema>http://explain.z3950.org/dtd/2.0/</sru:recordSchema>
    <sru:recordPacking>xml</sru:recordPacking>
    <sru:recordData>
      <zr:explain xmlns:zr="http://explain.z3950.org/dtd/2.0/">
        <!-- <zr:serverInfo> is REQUIRED -->
        <zr:serverInfo protocol="SRU" version="1.2" transport="http">
          <zr:host>repos.example.org</zr:host>
          <zr:port>80</zr:port>
          <zr:database>fcs-endpoint</zr:database>
        </zr:serverInfo>
        <!-- <zr:databaseInfo> is REQUIRED -->
        <zr:databaseInfo>
          <zr:title lang="de">Goethe Korpus</zr:title>
          <zr:title lang="en" primary="true">Goethe Corpus</zr:title>
          <zr:description lang="de">Der Goethe Korpus des IDS Mannheim.</zr:description>
          <zr:description lang="en" primary="true">The Goethe corpus of IDS Mannheim.</zr:description>
        </zr:databaseInfo>
        <!-- <zr:schemaInfo> is REQUIRED -->
        <zr:schemaInfo>
          <zr:schema identifier="http://clarin.eu/fcs/resource" name="fcs">
            <zr:title lang="en" primary="true">CLARIN Federated Content Search</zr:title>
          </zr:schema>
        </zr:schemaInfo>
        <!-- <zr:configInfo> is OPTIONAL -->
        <zr:configInfo>
          <zr:default type="numberOfRecords">250</zr:default>
          <zr:setting type="maximumRecords">1000</zr:setting>
        </zr:configInfo>
      </zr:explain>
    </sru:recordData>
  </sru:record>
  <!-- <sru:echoedExplainRequest> is OPTIONAL -->
  <sru:echoedExplainRequest>
    <sru:version>1.2</sru:version>
    <sru:baseUrl>http://repos.example.org/fcs-endpoint</sru:baseUrl>
  </sru:echoedExplainRequest>
  <sru:extraResponseData>
    <ed:EndpointDescription xmlns:ed="http://clarin.eu/fcs/endpoint-description" version="1">
      <ed:Capabilities>
        <ed:Capability>http://clarin.eu/fcs/capability/basic-search</ed:Capability>
      </ed:Capabilities>
      <ed:SupportedDataViews>
        <ed:SupportedDataView id="hits" delivery-policy="send-by-default">application/x-clarin-fcs-hits+xml</ed:SupportedDataView>
      </ed:SupportedDataViews>
      <ed:Resources>
        <!-- just one top-level resource at the Endpoint -->
        <ed:Resource pid="http://hdl.handle.net/4711/0815">
          <ed:Title xml:lang="de">Goethe Korpus</ed:Title>
          <ed:Title xml:lang="en">Goethe Corpus</ed:Title>
          <ed:Description xml:lang="de">Der Goethe Korpus des IDS Mannheim.</ed:Description>
          <ed:Description xml:lang="en">The Goethe corpus of IDS Mannheim.</ed:Description>
          <ed:LandingPageURI>http://repos.example.org/corpus1.html</ed:LandingPageURI>
          <ed:Languages>
            <ed:Language>deu</ed:Language>
          </ed:Languages>
          <ed:AvailableDataViews ref="hits"/>
        </ed:Resource>
      </ed:Resources>
    </ed:EndpointDescription>
  </sru:extraResponseData>
</sru:explainResponse>
}}}

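A Client that auto-configures itself from such a response mainly needs three pieces of information: the capabilities, the supported Data Views with their delivery policies, and the persistent identifiers of the top-level Resources. This Python sketch extracts them with `xml.etree.ElementTree`. It deliberately ignores nested sub-resources, and the function name and the shape of the returned dictionary are hypothetical.

{{{#!python
import xml.etree.ElementTree as ET

NS = {"ed": "http://clarin.eu/fcs/endpoint-description"}


def parse_endpoint_description(explain_response_xml):
    """Extract auto-configuration data from an explain response.

    Returns None if no <ed:EndpointDescription> is present (e.g. the request
    lacked x-fcs-endpoint-description=true), otherwise a dict with:
      capabilities: list of capability URIs
      dataviews:    data view id -> (delivery policy, MIME type)
      resources:    PIDs of the top-level <ed:Resource> elements only
    """
    root = ET.fromstring(explain_response_xml)
    desc = root.find(".//ed:EndpointDescription", NS)
    if desc is None:
        return None
    capabilities = [
        c.text.strip()
        for c in desc.findall("ed:Capabilities/ed:Capability", NS)
        if c.text
    ]
    dataviews = {
        dv.get("id"): (dv.get("delivery-policy"), (dv.text or "").strip())
        for dv in desc.findall("ed:SupportedDataViews/ed:SupportedDataView", NS)
    }
    # Direct children of <ed:Resources> only; sub-resources are skipped.
    resources = [r.get("pid") for r in desc.findall("ed:Resources/ed:Resource", NS)]
    return {"capabilities": capabilities, "dataviews": dataviews, "resources": resources}
}}}

For the example response above, `resources` is `["http://hdl.handle.net/4711/0815"]`. This is the kind of value a Client can later pass in `x-fcs-context`.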
== Operation ''scan'' #scan
The ''scan'' operation of the SRU protocol is currently not used in the ''Basic Search'' or ''Advanced Search'' capability of CLARIN-FCS. Future capabilities may use this operation; therefore it is `NOT RECOMMENDED` for Endpoints to define custom extensions that use this operation.

== Operation ''searchRetrieve'' #searchRetrieve
{{{
#!div style="border: 1px solid #000000; font-size: 75%"
     
The ''searchRetrieve'' operation of the SRU protocol is used for searching in the Resources that are provided by the Endpoint. The SRU protocol defines the serialization of request and response formats in [#REF_SRU_20 OASIS-SRU-20] for SRU version 2.0 and [#REF_SRU_12 OASIS-SRU-12] for SRU version 1.2. An Endpoint `MUST` respond in the correct format, i.e. if the Endpoint also supports SRU 1.2 and the request is issued in SRU version 1.2, the response must be encoded accordingly.

In SRU, search result hits are encoded down to a record level, i.e. the `<sru:record>` element, and SRU allows records to be serialized in various formats, so-called ''record schemas''. Endpoints `MUST` support the CLARIN-FCS record schema (see section [#resultFormat Result Format]) and `MUST` use the value `http://clarin.eu/fcs/resource` for the ''responseItemType'' ("record schema identifier").
Endpoints `MUST` represent exactly ''one hit'' within the Resource as one SRU record, i.e. one `<sru:record>` element.

The following example shows a request and response to a ''searchRetrieve'' request with a ''term-only'' query for "cat":
 * HTTP GET request: Client → Endpoint:
{{{#!sh
http://repos.example.org/fcs-endpoint?operation=searchRetrieve&version=1.2&query=cat
}}}
 * HTTP Response: Endpoint → Client:
{{{#!xml
<?xml version='1.0' encoding='utf-8'?>
<sru:searchRetrieveResponse xmlns:sru="http://www.loc.gov/zing/srw/">
  <sru:version>1.2</sru:version>
  <sru:numberOfRecords>6</sru:numberOfRecords>
  <sru:records>
    <sru:record>
      <sru:recordSchema>http://clarin.eu/fcs/resource</sru:recordSchema>
      <sru:recordPacking>xml</sru:recordPacking>
      <sru:recordData>
        <fcs:Resource xmlns:fcs="http://clarin.eu/fcs/resource" pid="http://hdl.handle.net/4711/08-15">
          <fcs:ResourceFragment>
            <fcs:DataView type="application/x-clarin-fcs-hits+xml">
              <hits:Result xmlns:hits="http://clarin.eu/fcs/dataview/hits">
                The quick brown <hits:Hit>cat</hits:Hit> jumps over the lazy dog.
              </hits:Result>
            </fcs:DataView>
          </fcs:ResourceFragment>
        </fcs:Resource>
      </sru:recordData>
      <sru:recordPosition>1</sru:recordPosition>
    </sru:record>
    <!-- more <sru:record> elements omitted for brevity -->
  </sru:records>
  <!-- <sru:echoedSearchRetrieveRequest> is OPTIONAL -->
  <sru:echoedSearchRetrieveRequest>
    <sru:version>1.2</sru:version>
    <sru:query>cat</sru:query>
    <sru:xQuery xmlns="http://www.loc.gov/zing/cql/xcql/">
      <searchClause>
        <index>cql.serverChoice</index>
        <relation>
          <value>=</value>
        </relation>
        <term>cat</term>
      </searchClause>
    </sru:xQuery>
    <sru:startRecord>1</sru:startRecord>
    <sru:baseUrl>http://repos.example.org/fcs-endpoint</sru:baseUrl>
  </sru:echoedSearchRetrieveRequest>
</sru:searchRetrieveResponse>
}}}

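Because every `<sru:record>` carries exactly one hit, a Client can turn such a response into a flat list of hits. This Python sketch reads SRU 1.2 responses in the legacy namespace only; an SRU 2.0 response would need the OASIS namespace instead. It extracts the record position, the Resource PID, the whitespace-normalized text of the Hits Data View and the marked hit words. Records without a Hits Data View are skipped. The function name and tuple layout are hypothetical.

{{{#!python
import xml.etree.ElementTree as ET

NS = {
    "sru": "http://www.loc.gov/zing/srw/",  # SRU 1.2 (legacy) namespace
    "fcs": "http://clarin.eu/fcs/resource",
    "hits": "http://clarin.eu/fcs/dataview/hits",
}


def extract_hits(response_xml):
    """Return (position, pid, text, hit words) for each record with a Hits Data View."""
    root = ET.fromstring(response_xml)
    results = []
    for record in root.findall("sru:records/sru:record", NS):
        position = record.findtext("sru:recordPosition", namespaces=NS)
        resource = record.find(".//fcs:Resource", NS)
        result = record.find(".//hits:Result", NS)
        if resource is None or result is None:
            continue  # e.g. a record that carries no Hits Data View
        # Join all text nodes (including <hits:Hit>) and collapse whitespace.
        text = " ".join("".join(result.itertext()).split())
        hit_words = [h.text or "" for h in result.findall(".//hits:Hit", NS)]
        results.append((int(position) if position else None,
                        resource.get("pid"), text, hit_words))
    return results
}}}

For the example response above, the single record yields `(1, "http://hdl.handle.net/4711/08-15", "The quick brown cat jumps over the lazy dog.", ["cat"])`.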
In general, the Endpoint is `REQUIRED` to accept an ''unrestricted search'' and `SHOULD` perform the search operation on ''all'' Resources that are available at the Endpoint. If that is not feasible for some reason, e.g. because performing an unrestricted search would allocate too many resources, the Endpoint `MAY` independently restrict the search to a scope that it can handle. If it does so, it `MUST` issue a non-fatal diagnostic `http://clarin.eu/fcs/diagnostic/2` ("Resource set too large. Query context automatically adjusted."). The details field of the diagnostic `MUST` contain the persistent identifier of the resource to which the query scope was limited. If the Endpoint limits the query scope to more than one resource, it `MUST` generate a ''separate'' non-fatal diagnostic `http://clarin.eu/fcs/diagnostic/2` for each of the resources.
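The Endpoint-side rule of one diagnostic per resource in the adjusted scope can be sketched in Python as follows. How an Endpoint picks the reduced scope is not specified. The sketch assumes a hypothetical limit `max_resources` and simply keeps the first resources in order. The `Diagnostic` class and the function name are likewise hypothetical.

{{{#!python
from dataclasses import dataclass

SCOPE_ADJUSTED = "http://clarin.eu/fcs/diagnostic/2"


@dataclass
class Diagnostic:
    uri: str
    details: str
    message: str
    fatal: bool = False


def limit_scope(all_pids, max_resources):
    """Return (pids_to_search, diagnostics) for an unrestricted search.

    max_resources is a hypothetical Endpoint-side limit. When it is exceeded,
    the first max_resources PIDs are kept (an arbitrary policy), and one
    non-fatal diagnostic/2 is generated per kept resource.
    """
    if max_resources < 1:
        raise ValueError("max_resources must be at least 1")
    pids = list(all_pids)
    if len(pids) <= max_resources:
        return pids, []  # no adjustment, hence no diagnostics
    chosen = pids[:max_resources]
    diagnostics = [
        Diagnostic(SCOPE_ADJUSTED, pid,
                   "Resource set too large. Query context automatically adjusted.")
        for pid in chosen
    ]
    return chosen, diagnostics
}}}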
     
The Client can extract all valid persistent identifiers from the `@pid` attribute of the `<ed:Resource>` elements obtained by the ''explain'' request (see section [#explain Operation ''explain''] and section [#endpointDescription Endpoint Description]). The list of persistent identifiers can become long; in that case, a Client can submit the request using the HTTP POST method instead of the HTTP GET method.

For example, to restrict the search to the Resource with the persistent identifier `http://hdl.handle.net/4711/0815`, the Client must issue the following request:
{{{#!sh
http://repos.example.org/fcs-endpoint?operation=searchRetrieve&version=1.2&query=cat&x-fcs-context=http://hdl.handle.net/4711/0815
}}}
To restrict the search to the Resources with the persistent identifiers `http://hdl.handle.net/4711/0815` and `http://hdl.handle.net/4711/0816-2`, the Client must issue the following request:
{{{#!sh
http://repos.example.org/fcs-endpoint?operation=searchRetrieve&version=1.2&query=cat&x-fcs-context=http://hdl.handle.net/4711/0815,http://hdl.handle.net/4711/0816-2
}}}
If an invalid persistent identifier is passed by the Client, the Endpoint `MUST` issue a `http://clarin.eu/fcs/diagnostic/1` diagnostic, i.e. add the appropriate XML fragment to the `<sru:diagnostics>` element of the response. The Endpoint `MAY` treat this condition as fatal, i.e. just issue the diagnostic and perform no search, or it `MAY` treat it as non-fatal and perform the search.

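The requests above can also be assembled programmatically. This Python sketch builds the searchRetrieve parameters with an optional `x-fcs-context` list and encodes them either as a GET URL or as an `application/x-www-form-urlencoded` body for the HTTP POST binding. `urlencode` percent-encodes `:`, `/` and `,` inside the values. Decoded by the Endpoint, this is equivalent to the unencoded form shown in the examples. The function names are hypothetical, and sending the request is left out.

{{{#!python
from urllib.parse import urlencode


def search_retrieve_params(query, pids=(), version="1.2"):
    """Parameters of a searchRetrieve request, optionally restricted via x-fcs-context."""
    params = {"operation": "searchRetrieve", "version": version, "query": query}
    if pids:
        # All persistent identifiers go into one comma-separated value.
        params["x-fcs-context"] = ",".join(pids)
    return params


def as_get_url(base_url, params):
    """Encode the parameters into an HTTP GET URL (values are percent-encoded)."""
    return base_url + "?" + urlencode(params)


def as_post_body(params):
    """Encode the same parameters as a form-urlencoded body for HTTP POST."""
    return urlencode(params).encode("ascii")
}}}

For a long list of persistent identifiers, the POST body avoids URL length limits while carrying exactly the same parameters.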
If a Client wants to request one or more Data Views that the Endpoint handles with the ''need-to-request'' delivery policy, it `MUST` pass a comma-separated list of ''Data View identifiers'' in the `x-fcs-dataviews` extra request parameter of the ''searchRetrieve'' request. A Client can extract valid values for the ''Data View identifiers'' from the `@id` attribute of the `<ed:SupportedDataView>` elements in the Endpoint Description of the Endpoint (see section [#explain Operation ''explain''] and section [#endpointDescription Endpoint Description]).

For example, to request the CMDI Data View from an Endpoint that has an Endpoint Description as described in [#REF_Example_5 Example 5], a Client would need to use the ''Data View identifier'' `cmdi` and submit the following request:
{{{#!sh
http://repos.example.org/fcs-endpoint?operation=searchRetrieve&version=1.2&query=cat&x-fcs-dataviews=cmdi
}}}
If an invalid ''Data View identifier'' is passed by the Client, the Endpoint `MUST` issue a `http://clarin.eu/fcs/diagnostic/4` diagnostic, i.e. add the appropriate XML fragment to the `<sru:diagnostics>` element of the response. The Endpoint `MAY` treat this condition as fatal, i.e. simply issue the diagnostic and perform no search, or it `MAY` treat it as non-fatal and perform the search.

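On the Client side, the value for `x-fcs-dataviews` can be derived from the `@id` and `@delivery-policy` attributes of the `<ed:SupportedDataView>` elements. This Python sketch keeps only identifiers that the Endpoint handles with the ''need-to-request'' policy. ''send-by-default'' views arrive anyway. Unknown identifiers are dropped, since sending them would only provoke `http://clarin.eu/fcs/diagnostic/4`. The function name and the `adv` identifier in the usage note are hypothetical.

{{{#!python
def dataviews_param(supported, wanted):
    """Return the value for x-fcs-dataviews, or None if nothing needs requesting.

    supported: data view id -> delivery policy, read from <ed:SupportedDataView>
    wanted:    Data View identifiers the Client would like to receive
    """
    ids = [view for view in wanted if supported.get(view) == "need-to-request"]
    return ",".join(ids) if ids else None
}}}

For example, with `supported = {"hits": "send-by-default", "cmdi": "need-to-request", "adv": "need-to-request"}`, asking for `["hits", "cmdi", "adv"]` yields `"cmdi,adv"`, while asking only for `["hits"]` yields `None`, so the parameter is omitted.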

= Normative Appendix
{{{
#!div style="border: 1px solid #000000; font-size: 75%"
TODO: check and proof-read all sub-sections.
}}}
== List of extra request parameters
The following extra request parameters are used in CLARIN-FCS. The column ''SRU operations'' lists the SRU operations for which each extra request parameter is to be used. Clients `MUST NOT` use a parameter with an operation that is not listed in this column. However, if a Client sends such an invalid parameter, an Endpoint `SHOULD` issue a fatal diagnostic "Unsupported Parameter" (`info:srw/diagnostic/1/8`) and stop processing the request. Alternatively, an Endpoint `MAY` silently ignore the invalid parameter.

||=Parameter Name =||=SRU operations =||=Allowed values =||=Description =||
|| `x-fcs-endpoint-description` || explain || `true`; all other values are reserved and `MUST NOT` be used by Clients || If the parameter is given (with the value `true`), the Endpoint `MUST` include an Endpoint Description in the `<sru:extraResponseData>` element of the ''explain'' response. ||
|| `x-fcs-context` || searchRetrieve || A comma-separated list of persistent identifiers || The Endpoint `MUST` restrict the search to the resources identified by the persistent identifiers. ||
     
|| `x-fcs-rewrites-allowed` || searchRetrieve || `true`; all other values are reserved and `MUST NOT` be used by Clients. \\ Clients `MUST` only use this parameter when performing an Advanced Search request. || If the parameter is given (with the value `true`), the Endpoint `MAY` rewrite the query to a simpler query to allow for more recall. ||

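An Endpoint can enforce the ''SRU operations'' column with a lookup table. This Python sketch takes the `SHOULD` branch and reports the first misplaced parameter together with the fatal "Unsupported Parameter" diagnostic; an Endpoint could equally drop it silently. The mapping is copied from the rows above plus `x-fcs-dataviews`, which the ''searchRetrieve'' section names as a searchRetrieve parameter. Parameters not in the table are not checked. The function name and return shape are hypothetical.

{{{#!python
# Extra request parameter -> SRU operations it may be used with.
EXTRA_PARAMS = {
    "x-fcs-endpoint-description": {"explain"},
    "x-fcs-context": {"searchRetrieve"},
    "x-fcs-dataviews": {"searchRetrieve"},  # from the searchRetrieve section
    "x-fcs-rewrites-allowed": {"searchRetrieve"},
}
UNSUPPORTED_PARAMETER = "info:srw/diagnostic/1/8"


def check_extra_params(operation, params):
    """Return None if acceptable, else (diagnostic URI, offending parameter name)."""
    for name in params:
        allowed_ops = EXTRA_PARAMS.get(name)
        # Only the CLARIN-FCS parameters listed above are checked here.
        if allowed_ops is not None and operation not in allowed_ops:
            return UNSUPPORTED_PARAMETER, name
    return None
}}}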
== List of diagnostics
{{{
#!div style="border: 1px solid #000000; font-size: 75%"
     
}}}
Apart from the SRU diagnostics defined in [#REF_SRU_12 OASIS-SRU-12, Appendix C] and [#REF_LOC_DIAG LOC-DIAG], the following diagnostics are used in CLARIN-FCS. The column "Details Format" specifies what `SHOULD` be returned in the details field. If this column is blank, the format is "undefined" and the Endpoint `MAY` return whatever it deems appropriate, including nothing. The column "Impact" specifies whether the Endpoint should continue ("non-fatal") or stop ("fatal") processing.

||=Identifier URI =||=Description =||=Details Format =||=Impact =||=Note =||
|| `http://clarin.eu/fcs/diagnostic/1` || Persistent identifier passed by the Client for restricting the search is invalid. || The offending persistent identifier. || non-fatal || If more than one invalid persistent identifier was submitted by the Client, the Endpoint `MUST` generate a separate diagnostic for each invalid persistent identifier. ||
|| `http://clarin.eu/fcs/diagnostic/2` || Resource set too large. Query context automatically adjusted. || The persistent identifier of the resource to which the query context was adjusted. || non-fatal || If an Endpoint limited the query context to more than one resource, it `MUST` generate a separate diagnostic for each resource to which the query context was adjusted. ||
|| `http://clarin.eu/fcs/diagnostic/3` || Resource set too large. Cannot perform Query. || || fatal || ||
|| `http://clarin.eu/fcs/diagnostic/4` || Requested Data View not valid for this resource. || The Data View MIME type. || non-fatal || If more than one invalid Data View was requested, the Endpoint `MUST` generate a separate diagnostic for each invalid Data View. ||
|| `http://clarin.eu/fcs/diagnostic/10` || General query syntax error. || Detailed error message why the query could not be parsed. || fatal || Endpoints `MUST` use this diagnostic only if the Client performed an Advanced Search request. ||
|| `http://clarin.eu/fcs/diagnostic/11` || Query too complex. Cannot perform Query. || Details on why the query could not be performed, e.g. unsupported layer or unsupported combination of operators. || fatal || Endpoints `MUST` use this diagnostic only if the Client performed an Advanced Search request. ||
|| `http://clarin.eu/fcs/diagnostic/12` || Query was rewritten. || Details on how the query was rewritten. || non-fatal || Endpoints `MUST` use this diagnostic only if the Client performed an Advanced Search request with the `x-fcs-rewrites-allowed` request parameter. ||
|| `http://clarin.eu/fcs/diagnostic/14` || General processing hint. || E.g. "No matches, because layer 'XY' is not available in your selection of resources" || non-fatal || Endpoints `MUST` use this diagnostic only if the Client performed an Advanced Search request. ||

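The following non-normative sketch illustrates the "one diagnostic per item" rules for `http://clarin.eu/fcs/diagnostic/1` to `http://clarin.eu/fcs/diagnostic/3`. It assumes an Endpoint that knows its set of valid persistent identifiers and has a hypothetical upper limit on the number of resources per query. Truncating the selection to the first `max_resources` entries is only one possible adjustment policy. The names `resolve_context` and `Diagnostic` are illustrative, and serializing the diagnostics into the SRU response is not shown.

{{{#!python
# Non-normative sketch: handling x-fcs-context and producing the
# CLARIN-FCS diagnostics 1, 2 and 3.
from dataclasses import dataclass
from typing import List, Set, Tuple

DIAG_INVALID_PID = "http://clarin.eu/fcs/diagnostic/1"
DIAG_CONTEXT_ADJUSTED = "http://clarin.eu/fcs/diagnostic/2"
DIAG_TOO_LARGE = "http://clarin.eu/fcs/diagnostic/3"


@dataclass
class Diagnostic:
    uri: str
    details: str = ""
    fatal: bool = False


def resolve_context(context: str, known_pids: Set[str], max_resources: int,
                    adjust: bool = True) -> Tuple[List[str], List[Diagnostic]]:
    """Return the resources to search and the diagnostics to report."""
    diagnostics: List[Diagnostic] = []
    selected: List[str] = []
    for pid in (p.strip() for p in context.split(",") if p.strip()):
        if pid in known_pids:
            selected.append(pid)
        else:
            # One separate non-fatal diagnostic per invalid PID.
            diagnostics.append(Diagnostic(DIAG_INVALID_PID, details=pid))
    if len(selected) > max_resources:
        if not adjust:
            # Fatal: the query cannot be performed at all.
            diagnostics.append(Diagnostic(DIAG_TOO_LARGE, fatal=True))
            return [], diagnostics
        selected = selected[:max_resources]  # hypothetical adjustment policy
        # One separate non-fatal diagnostic per resource in the adjusted context.
        diagnostics.extend(Diagnostic(DIAG_CONTEXT_ADJUSTED, details=pid)
                           for pid in selected)
    return selected, diagnostics
}}}
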
== CLARIN FCS-QL Grammar Specification #fcsQLEBNF
The version of the CLARIN FCS-QL is tied to the FCS Core version, starting with version 2.0.

The grammar specification for the FCS-QL is heavily based on Poliqarp, but also draws inspiration from the grammars of other query languages.
An unqualified or qualified "attribute" denotes the annotation layer to be used, e.g. unqualified "word", "lemma", "pos" or qualified "pos:stts". The default is "text", for compatibility with FCS 1.0, where simple wordforms enclosed in a pair of single or double quotes can be matched.

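A small illustration of the attribute defaulting described above. The hypothetical helper `segment_query` emits `[attribute="value"]` for an explicit (unqualified or qualified) attribute. Without an attribute, it emits the bare quoted form, which is matched against the default "text" layer. The qualified notation simply follows the "pos:stts" example. To stay on the safe side, the helper rejects values containing `"` or `\` instead of assuming an escaping convention.

{{{#!python
# Non-normative sketch: building a single-token FCS-QL query.
from typing import Optional


def segment_query(value: str, attribute: Optional[str] = None) -> str:
    """Build '[attribute="value"]', or the bare quoted form for the default layer."""
    # Conservative: escaping rules are not assumed here, so refuse such values.
    if '"' in value or "\\" in value:
        raise ValueError("value contains characters that would need escaping")
    if attribute is None:
        return '"' + value + '"'  # matched against the default "text" layer
    return "[" + attribute + '="' + value + '"]'
}}}
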
=== FCS-QL EBNF ===
{{{#!comment
  Please keep the EBNF nicely formatted. Thanks!
}}}
{{{
 [1] query                 ::= main-query within-part?
     
=== Notes ===
 * "simple-within-scope": possible values for scope
   * "sentence", "s", "utterance", "u": denote a matching scope of something like a sentence or utterance. Provides compatibility with FCS 1.0 ("Generic Hits", "Each hit SHOULD be presented within the context of a complete sentence.")
   * "paragraph", "p", "turn", "t": denote the next larger unit, e.g. something like a paragraph
   * "article", "session": something like a whole document
 * {{{[25]}}} and {{{[26]}}} "any $SOMETHING codepoint" are hard to implement in at least ANTLR and JavaCC, especially in combination with {{{[27]}}}
 * regular expressions are not defined or guarded by this grammar

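An Endpoint has to map each "simple-within-scope" value onto a unit of its own data. The following hypothetical sketch groups the values listed above into their three groups. The group labels and the `within <scope>` suffix built by `add_within` are assumptions for illustration.

{{{#!python
# Non-normative sketch: grouping of the "simple-within-scope" values.
from typing import Dict, Tuple

# Hypothetical group labels; the aliases are the values listed in the notes.
SCOPE_GROUPS: Dict[str, Tuple[str, ...]] = {
    "sentence": ("sentence", "s", "utterance", "u"),
    "paragraph": ("paragraph", "p", "turn", "t"),
    "document": ("article", "session"),
}

# Reverse index: scope value -> group label.
SCOPE_TO_GROUP: Dict[str, str] = {
    alias: group for group, aliases in SCOPE_GROUPS.items() for alias in aliases
}


def scope_group(scope: str) -> str:
    try:
        return SCOPE_TO_GROUP[scope]
    except KeyError:
        raise ValueError(f"unknown within scope: {scope!r}") from None


def add_within(main_query: str, scope: str) -> str:
    scope_group(scope)  # validate the scope value before using it
    return f"{main_query} within {scope}"
}}}
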
= Non-normative Appendix
{{{
#!div style="border: 1px solid #000000; font-size: 75%"
TODO: check and proof-read all sub-sections.
}}}
== Syntax variant for Handle system Persistent Identifier URIs
Persistent Identifiers from the Handle system are defined in two syntax variants: a regular URI format for the Handle protocol, i.e. with a `hdl:` prefix, or ''actionable'' URIs with a `http://hdl.handle.net/` prefix. Generally, CLARIN software should support both syntax variants; therefore, the CLARIN-FCS Interface Specification does not endorse a specific syntax variant. However, Endpoints are recommended to use the ''actionable'' syntax variant.

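Because both variants may occur, software that compares persistent identifiers may want to normalize them first. A minimal sketch, assuming that a Handle PID is either `hdl:<handle>` or `http://hdl.handle.net/<handle>` and that normalizing to the ''actionable'' variant recommended above is acceptable. The function names are hypothetical.

{{{#!python
# Non-normative sketch: normalizing Handle PIDs to the actionable variant.
HDL_URI_PREFIX = "hdl:"
HDL_ACTIONABLE_PREFIX = "http://hdl.handle.net/"


def to_actionable(pid: str) -> str:
    """Turn 'hdl:<handle>' into 'http://hdl.handle.net/<handle>'."""
    if pid.startswith(HDL_URI_PREFIX):
        return HDL_ACTIONABLE_PREFIX + pid[len(HDL_URI_PREFIX):]
    return pid  # already actionable, or not a Handle PID at all


def same_handle(a: str, b: str) -> bool:
    """Compare two Handle PIDs independently of the syntax variant."""
    return to_actionable(a) == to_actionable(b)
}}}
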
== Referring to an Endpoint from a CMDI record
Centers are encouraged to provide links to their CLARIN-FCS Endpoints in the metadata records for their resources. Other services, like the VLO, can use this information to automatically configure an Aggregator for searching resources at the Endpoint.

To refer to an Endpoint, a `<cmdi:ResourceProxy>` element needs to be added to the CMDI record. Its child element `<cmdi:ResourceType>` is set to the value `SearchService` and carries a `@mimetype` attribute with the value `application/sru+xml`. The `<cmdi:ResourceRef>` element must contain a URI that points to the Endpoint web service.

Example:
{{{#!xml
<cmdi:CMD xmlns:cmdi="http://www.clarin.eu/cmd/" CMDVersion="1.1">
  <cmdi:Header>
    <!-- ... -->
    <cmdi:MdSelfLink>http://hdl.handle.net/4711/0815</cmdi:MdSelfLink>
    <!-- ... -->
  </cmdi:Header>
  <cmdi:Resources>
    <cmdi:ResourceProxyList>
      <!-- ... -->
      <cmdi:ResourceProxy id="r4711">
        <cmdi:ResourceType mimetype="application/sru+xml">SearchService</cmdi:ResourceType>
        <cmdi:ResourceRef>http://repos.example.org/fcs-endpoint</cmdi:ResourceRef>
      </cmdi:ResourceProxy>
      <!-- ... -->
    </cmdi:ResourceProxyList>
  </cmdi:Resources>
  <!-- ... -->
</cmdi:CMD>
}}}

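A hypothetical sketch of how a service, for example one that configures an Aggregator, might extract Endpoint URIs from a CMDI record using Python's standard `xml.etree.ElementTree` module. It assumes the CMDI 1.1 namespace used in the example. It selects only `<cmdi:ResourceProxy>` elements whose `<cmdi:ResourceType>` is `SearchService` with the `@mimetype` `application/sru+xml`. The function name `find_fcs_endpoints` is illustrative.

{{{#!python
# Non-normative sketch: extracting CLARIN-FCS Endpoint URIs from a CMDI record.
import xml.etree.ElementTree as ET
from typing import List

CMDI_NS = {"cmdi": "http://www.clarin.eu/cmd/"}  # CMDI 1.1 namespace, as in the example


def find_fcs_endpoints(cmdi_xml: str) -> List[str]:
    root = ET.fromstring(cmdi_xml)
    endpoints: List[str] = []
    for proxy in root.iterfind(".//cmdi:ResourceProxy", CMDI_NS):
        rtype = proxy.find("cmdi:ResourceType", CMDI_NS)
        ref = proxy.find("cmdi:ResourceRef", CMDI_NS)
        if rtype is None or ref is None:
            continue  # incomplete proxy, skip it
        if ((rtype.text or "").strip() == "SearchService"
                and rtype.get("mimetype") == "application/sru+xml"):
            endpoints.append((ref.text or "").strip())
    return endpoints
}}}
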
== Endpoint custom extensions #extensionExample
The CLARIN-FCS protocol specification allows Endpoints to add custom data to their responses, e.g. to provide hints to an (XSLT/XQuery) application that works directly on CLARIN-FCS. Such an application could use the custom data to generate back and forward links for a GUI to navigate in a result set.

The following example illustrates how extensions can be embedded into the Result Format:
{{{#!xml
<fcs:Resource xmlns:fcs="http://clarin.eu/fcs/resource" pid="http://hdl.handle.net/4711/0815">
  <fcs:DataView type="application/x-clarin-fcs-hits+xml">
    <hits:Result xmlns:hits="http://clarin.eu/fcs/dataview/hits">
      The quick brown <hits:Hit>fox</hits:Hit> jumps over the lazy <hits:Hit>dog</hits:Hit>.
    </hits:Result>
  </fcs:DataView>

  <!--
      NOTE: this is purely fictional and only serves to demonstrate how
            to add custom extensions to the result representation
            within CLARIN-FCS.
  -->

  <!--
      Example 1: a hypothetical Endpoint extension for navigation in a result
      set: it basically provides a set of hrefs that a GUI can convert into
      navigation buttons.
  -->
  <nav:navigation xmlns:nav="http://repos.example.org/navigation">
    <nav:curr href="http://repos.example.org/resultset/4711/4611" />
    <nav:prev href="http://repos.example.org/resultset/4711/4610" />
    <nav:next href="http://repos.example.org/resultset/4711/4612" />
  </nav:navigation>

  <!--
      Example 2: a hypothetical Endpoint extension for directly referencing parent
      resources: it basically provides a link to the parent resource that can be
      exploited by a GUI (e.g. built on XSLT/XQuery).
  -->
  <parent:Parent xmlns:parent="http://repos.example.org/parent"
                 ref="http://repos.example.org/path/to/parent/1235.cmdi" />
</fcs:Resource>
}}}

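On the consuming side, an application can pick out the extensions it understands and skip everything else. The following non-normative sketch handles the fictional navigation extension from Example 1. It assumes the `http://repos.example.org/navigation` namespace used there, and `navigation_links` is a hypothetical name. The parent extension and any other unknown elements are simply ignored.

{{{#!python
# Non-normative sketch: reading the fictional navigation extension of
# Example 1 and ignoring all other custom extensions.
import xml.etree.ElementTree as ET
from typing import Dict

NAV_NS = "http://repos.example.org/navigation"  # fictional, from the example
NAV_TAG = "{" + NAV_NS + "}navigation"          # ElementTree's '{uri}local' form


def navigation_links(resource_xml: str) -> Dict[str, str]:
    """Map 'curr'/'prev'/'next' to their href values."""
    root = ET.fromstring(resource_xml)  # the <fcs:Resource> element
    links: Dict[str, str] = {}
    for nav in root.findall(NAV_TAG):  # direct children only
        for link in nav:
            local_name = link.tag.rpartition("}")[2]  # drop the '{uri}' part
            href = link.get("href")
            if href is not None:
                links[local_name] = href
    return links
}}}
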
== Endpoint highlight hints for repositories
An Aggregator can use the `@ref` attributes of the `<fcs:Resource>`, `<fcs:ResourceFragment>` or `<fcs:DataView>` elements to provide a link that lets the user jump directly to the resource at a Repository. To support hit highlighting, an Endpoint can augment the URI in the `@ref` attribute with query parameters that instruct the Repository to highlight the hits.

In the following example, the URI `http://repos.example.org/resource.cgi/<pid>` is a CGI script that displays a given resource at the Repository in HTML format and uses the `highlight` query parameter to add highlights to the resource. Of course, it is up to the Endpoint and the Repository whether and how they implement such a feature.
{{{#!xml
<fcs:Resource xmlns:fcs="http://clarin.eu/fcs/resource" pid="http://hdl.handle.net/4711/0815">
  <fcs:DataView type="application/x-clarin-fcs-hits+xml" ref="http://repos.example.org/resource.cgi/4711/0815?highlight=fox">
    <hits:Result xmlns:hits="http://clarin.eu/fcs/dataview/hits">
      The quick brown <hits:Hit>fox</hits:Hit> jumps over the lazy dog.
    </hits:Result>
  </fcs:DataView>
</fcs:Resource>
}}}
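
A hypothetical Endpoint-side sketch that builds such a `@ref` URI. It assumes the fictional `resource.cgi` script from the example and one `highlight` parameter per hit word. Whether a Repository accepts repeated `highlight` parameters is an assumption. The prefix stripping covers the two Handle syntax variants described earlier in this appendix.

{{{#!python
# Non-normative sketch: building a highlight-enabled @ref URI for the
# fictional resource.cgi script of the example.
from typing import List
from urllib.parse import urlencode

REPOSITORY_VIEWER = "http://repos.example.org/resource.cgi/"  # fictional
HANDLE_PREFIXES = ("http://hdl.handle.net/", "hdl:")


def highlight_ref(pid: str, hits: List[str]) -> str:
    handle = pid
    for prefix in HANDLE_PREFIXES:
        if handle.startswith(prefix):
            handle = handle[len(prefix):]
            break
    # One 'highlight' parameter per hit word (assumed convention).
    query = urlencode([("highlight", h) for h in hits])
    return REPOSITORY_VIEWER + handle + ("?" + query if query else "")
}}}
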