Changes between Version 50 and Version 51 of Taskforces/FCS/FCS-Specification-Draft


Timestamp:
06/09/17 09:49:15 (7 years ago)
Author:
Leif-Jöran
Comment:

--

  • Taskforces/FCS/FCS-Specification-Draft

    v50 v51  
    11{{{
    22#!div class="system-message"
    3 '''NOTE''': This page is in final editing. Final draft is scheduled to be delivered by 2017-04-28.
     3'''NOTE''': This page is the final draft, scheduled for approval by the SCCTC in May 2017.
    44}}}
    55[[PageOutline(1-6)]]
    6 = CLARIN Federated Content Search (CLARIN-FCS) - Core 2.0
    7 
    8 = Introduction
     6
     7= CLARIN Federated Content Search (CLARIN-FCS) - Core 2.0 =
     8= Introduction =
    99{{{
    1010#!div style="border: 1px solid #000000; font-size: 75%"
     
    1313The goal of the ''CLARIN Federated Content Search (CLARIN-FCS) - Core'' specification is to introduce an ''interface specification'' that decouples the ''search engine'' functionality from its ''exploitation'', i.e. user-interfaces, third-party applications, and to allow services to access heterogeneous search engines in a uniform way.
    1414
    15 == Terminology
     15== Terminology ==
    1616The key words `MUST`, `MUST NOT`, `REQUIRED`, `SHALL`, `SHALL NOT`, `SHOULD`, `SHOULD NOT`, `RECOMMENDED`, `MAY`, and `OPTIONAL` in this document are to be interpreted as described in [#REF_RFC_2119 RFC2119].
    1717
    18 == Glossary
    19  Aggregator::
    20     A module or service to dispatch queries to repositories and collect results.
    21 
    22  [=#REF_Annotation_Layer Annotation Layer]::
    23     An annotation layer is the sum of possible annotations for a language resource, such as part of speech or orthographic transcription. Usually it is related to a given annotation task or topic. Within the scope of this specification, it is used as a synonym for annotation tier.
    24 
    25  CLARIN-FCS, FCS::
    26     CLARIN federated content search, an interface specification to allow searching within resource content of repositories.
    27 
    28  Client::
    29     A software component that implements the interface specification to query Endpoints, e.g. an aggregator or a user interface.
    30 
    31  CQL::
    32     Contextual Query Language, previously known as Common Query Language, is a domain specific language for representing queries to information retrieval systems such as search engines, bibliographic catalogs and museum collection information.
    33 
    34  Data View::
    35     A Data View is a mechanism to support different representations of search results, e.g. a "hits with highlights" view, an image or a geolocation.
    36 
    37  Data View Payload, Payload::
    38     The actual content encoded within a Data View, e.g. a CMDI metadata record or a KML-encoded geolocation.
    39 
    40  Endpoint::
    41     A software component, which implements the CLARIN-FCS interface specification and translates between CLARIN-FCS and a search engine.
    42 
    43  FCS-QL::
    44     Federated Content Search Query Language is the query language used in the advanced CLARIN-FCS profile. It is derived from the CQP query language of the Corpus Workbench [#REF_CQP_Tutorial CQP-TUTORIAL].
    45 
    46  Hit::
    47     A piece of data returned by a Search Engine that matches the search criterion. What is considered a Hit depends heavily on the Search Engine.
    48 
    49  Interface Specification::
    50     Common harmonized interface and suite of protocols that repositories need to implement.
    51 
    52  Layer::
    53     See [#REF_Annotation_Layer ''Annotation Layer'']
    54 
    55  PID::
    56     A Persistent identifier is a long-lasting reference to a digital object.
    57 
    58  Repository::
    59     A software component at a CLARIN center that stores resources (= data) and information about these resources (= metadata).
    60 
    61  Repository Registry::
    62     A separate service that allows registering Repositories and their Endpoints and provides information about these to other components, e.g. an Aggregator. The [http://centres.clarin.eu/ CLARIN Center Registry] is an implementation of such a repository registry.
    63 
    64  Resource::
    65     A searchable and addressable entity at an Endpoint, such as a text corpus or a multi-modal corpus.
    66 
    67  Resource Fragment::
    68     A smaller unit in a Resource, e.g. a sentence in a text corpus or a time interval in an audio transcription.
    69 
    70  Result Set::
    71     An (ordered) set of hits that match a search criterion produced by a search engine as the result of processing a query.
    72 
    73  Search Engine::
    74     A software component within a repository, that allows for searching within the repository contents.
    75 
    76  SRU::
    77     Search and Retrieve via URL is a protocol for Internet search queries. It was originally introduced by the Library of Congress [#REF_LOC_SRU_12 LOC-SRU12]; the standardization process later moved to OASIS [#REF_SRU_12 OASIS-SRU12].
    78 
    79 == Normative References
    80  RFC2119[=#REF_RFC_2119]::
    81     Key words for use in RFCs to Indicate Requirement Levels, IETF RFC 2119, March 1997, \\
    82     [http://www.ietf.org/rfc/rfc2119.txt]
    83 
    84  XML-Namespaces[=#REF_XML_Namespaces]::
    85     Namespaces in XML 1.0 (Third Edition), W3C, 8 December 2009, \\
    86     [http://www.w3.org/TR/2009/REC-xml-names-20091208/]
    87 
    88  OASIS-SRU-Overview[=#REF_SRU_Overview]::
    89     searchRetrieve: Part 0. Overview Version 1.0, OASIS, January 2013, \\
    90     [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part0-overview/searchRetrieve-v1.0-os-part0-overview.doc]
    91     [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part0-overview/searchRetrieve-v1.0-os-part0-overview.html (HTML)],
    92     [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part0-overview/searchRetrieve-v1.0-os-part0-overview.pdf (PDF)]
    93 
    94  OASIS-SRU-APD[=#REF_SRU_APD]::
    95     searchRetrieve: Part 1. Abstract Protocol Definition Version 1.0, OASIS, January 2013, \\
    96     [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part1-apd/searchRetrieve-v1.0-os-part1-apd.doc]
    97     [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part1-apd/searchRetrieve-v1.0-os-part1-apd.html (HTML)]
    98     [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part1-apd/searchRetrieve-v1.0-os-part1-apd.pdf (PDF)]
    99 
    100  OASIS-SRU12[=#REF_SRU_12]::
    101     searchRetrieve: Part 2. SRU searchRetrieve Operation: APD Binding for SRU 1.2 Version 1.0, OASIS, January 2013, \\
    102     [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part2-sru1.2/searchRetrieve-v1.0-os-part2-sru1.2.doc]
    103     [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part2-sru1.2/searchRetrieve-v1.0-os-part2-sru1.2.html (HTML)]
    104     [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part2-sru1.2/searchRetrieve-v1.0-os-part2-sru1.2.pdf (PDF)]
    105 
    106  OASIS-SRU20[=#REF_SRU_20]::
    107     searchRetrieve: Part 3. SRU searchRetrieve Operation: APD Binding for SRU 2.0 Version 1.0, OASIS, January 2013, \\
    108     [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part3-sru2.0/searchRetrieve-v1.0-os-part3-sru2.0.doc]
    109     [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part3-sru2.0/searchRetrieve-v1.0-os-part3-sru2.0.html (HTML)]
    110     [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part3-sru2.0/searchRetrieve-v1.0-os-part3-sru2.0.pdf (PDF)]
    111 
    112  OASIS-CQL[=#REF_CQL]::
    113     searchRetrieve: Part 5. CQL: The Contextual Query Language version 1.0, OASIS, January 2013, \\
    114     [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part5-cql/searchRetrieve-v1.0-os-part5-cql.doc]
    115     [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part5-cql/searchRetrieve-v1.0-os-part5-cql.html (HTML)]
    116     [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part5-cql/searchRetrieve-v1.0-os-part5-cql.pdf (PDF)]
    117 
    118  SRU-Explain[=#REF_Explain]::
    119     searchRetrieve: Part 7. SRU Explain Operation version 1.0, OASIS, January 2013, \\
    120     [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part7-explain/searchRetrieve-v1.0-os-part7-explain.doc]
    121     [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part7-explain/searchRetrieve-v1.0-os-part7-explain.html (HTML)]
    122     [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part7-explain/searchRetrieve-v1.0-os-part7-explain.pdf (PDF)]
    123 
    124  SRU-Scan[=#REF_Scan]::
    125     searchRetrieve: Part 6. SRU Scan Operation version 1.0, OASIS, January 2013, \\
    126     [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part6-scan/searchRetrieve-v1.0-os-part6-scan.doc] 
    127     [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part6-scan/searchRetrieve-v1.0-os-part6-scan.html (HTML)]
    128     [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part6-scan/searchRetrieve-v1.0-os-part6-scan.PDF (PDF)]
    129 
    130  LOC-SRU12[=#REF_LOC_SRU_12]::
    131     SRU Version 1.2: SRU !Search/Retrieve Operation, Library of Congress, \\
    132     [http://www.loc.gov/standards/sru/sru-1-2.html]
    133 
    134  LOC-DIAG[=#REF_LOC_DIAG]::
    135     SRU Version 1.2: SRU Diagnostics List, Library of Congress,\\
    136     [http://www.loc.gov/standards/sru/diagnostics/diagnosticsList.html]
    137 
    138  UD-POS[=#REF_UD_POS]::
    139     Universal Dependencies, Universal POS tags, \\
    140     [https://universaldependencies.github.io/docs/u/pos/index.html]
    141 
    142  SAMPA[=#REF_SAMPA]::
    143     Dafydd Gibbon, Inge Mertins, Roger Moore (Eds.): Handbook of Multimodal and Spoken Language Systems. Resources, Terminology and Product Evaluation, Kluwer Academic Publishers, Boston MA, 2000, ISBN 0-7923-7904-7
    144 
    145  CLARIN-FCS-!DataViews[=#REF_FCS_DataViews]::
    146     CLARIN Federated Content Search (CLARIN-FCS) - Data Views, SCCTC FCS Task-Force, April 2014, \\
    147     [https://trac.clarin.eu/wiki/FCS/Dataviews]
    148 
    149 == Non-Normative References
    150  CQP-TUTORIAL[=#REF_CQP_Tutorial]::
    151     Evert et al.: The IMS Open Corpus Workbench (CWB) CQP Query Language Tutorial, CWB Version 3.0, February 2010, \\
    152     [http://cwb.sourceforge.net/files/CQP_Tutorial/]
    153 
    154  RFC6838[=#REF_RFC_6838]::
    155     Media Type Specifications and Registration Procedures, IETF RFC 6838, January 2013, \\
    156     [http://www.ietf.org/rfc/rfc6838.txt]
    157 
    158  RFC3023[=#REF_RFC_3023]::
    159     XML Media Types, IETF RFC 3023, January 2001, \\
    160     [http://www.ietf.org/rfc/rfc3023.txt]
    161 
    162 == Typographic and XML Namespace conventions
     18== Glossary ==
     19 Aggregator:: A module or service to dispatch queries to repositories and collect results.
     20
     21 [=#REF_Annotation_Layer Annotation Layer]:: An annotation layer is the sum of possible annotations for a language resource, such as part of speech or orthographic transcription. Usually it is related to a given annotation task or topic. Within the scope of this specification, it is used as a synonym for annotation tier.
     22
     23 CLARIN-FCS, FCS:: CLARIN federated content search, an interface specification to allow searching within resource content of repositories.
     24
     25 Client:: A software component that implements the interface specification to query Endpoints, e.g. an aggregator or a user interface.
     26
     27 CQL:: Contextual Query Language, previously known as Common Query Language, is a domain specific language for representing queries to information retrieval systems such as search engines, bibliographic catalogs and museum collection information.
     28
     29 Data View:: A Data View is a mechanism to support different representations of search results, e.g. a "hits with highlights" view, an image or a geolocation.
     30
     31 Data View Payload, Payload:: The actual content encoded within a Data View, e.g. a CMDI metadata record or a KML-encoded geolocation.
     32
     33 Endpoint:: A software component, which implements the CLARIN-FCS interface specification and translates between CLARIN-FCS and a search engine.
     34
     35 FCS-QL:: Federated Content Search Query Language is the query language used in the advanced CLARIN-FCS profile. It is derived from the CQP query language of the Corpus Workbench [#REF_CQP_Tutorial CQP-TUTORIAL].
     36
     37 Hit:: A piece of data returned by a Search Engine that matches the search criterion. What is considered a Hit depends heavily on the Search Engine.
     38
     39 Interface Specification:: Common harmonized interface and suite of protocols that repositories need to implement.
     40
     41 Layer:: See [#REF_Annotation_Layer ''Annotation Layer'']
     42
     43 PID:: A Persistent identifier is a long-lasting reference to a digital object.
     44
     45 Repository:: A software component at a CLARIN center that stores resources (= data) and information about these resources (= metadata).
     46
     47 Repository Registry:: A separate service that allows registering Repositories and their Endpoints and provides information about these to other components, e.g. an Aggregator. The [http://centres.clarin.eu/ CLARIN Center Registry] is an implementation of such a repository registry.
     48
     49 Resource:: A searchable and addressable entity at an Endpoint, such as a text corpus or a multi-modal corpus.
     50
     51 Resource Fragment:: A smaller unit in a Resource, e.g. a sentence in a text corpus or a time interval in an audio transcription.
     52
     53 Result Set:: An (ordered) set of hits that match a search criterion produced by a search engine as the result of processing a query.
     54
     55 Search Engine:: A software component within a repository, that allows for searching within the repository contents.
     56
     57 SRU:: Search and Retrieve via URL is a protocol for Internet search queries. It was originally introduced by the Library of Congress [#REF_LOC_SRU_12 LOC-SRU12]; the standardization process later moved to OASIS [#REF_SRU_12 OASIS-SRU12].
     58
     59== Normative References ==
     60 RFC2119[=#REF_RFC_2119]:: Key words for use in RFCs to Indicate Requirement Levels, IETF RFC 2119, March 1997, \\ [http://www.ietf.org/rfc/rfc2119.txt]
     61
     62 XML-Namespaces[=#REF_XML_Namespaces]:: Namespaces in XML 1.0 (Third Edition), W3C, 8 December 2009, \\ [http://www.w3.org/TR/2009/REC-xml-names-20091208/]
     63
     64 OASIS-SRU-Overview[=#REF_SRU_Overview]:: searchRetrieve: Part 0. Overview Version 1.0, OASIS, January 2013, \\ [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part0-overview/searchRetrieve-v1.0-os-part0-overview.doc] [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part0-overview/searchRetrieve-v1.0-os-part0-overview.html (HTML)], [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part0-overview/searchRetrieve-v1.0-os-part0-overview.pdf (PDF)]
     65
     66 OASIS-SRU-APD[=#REF_SRU_APD]:: searchRetrieve: Part 1. Abstract Protocol Definition Version 1.0, OASIS, January 2013, \\ [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part1-apd/searchRetrieve-v1.0-os-part1-apd.doc] [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part1-apd/searchRetrieve-v1.0-os-part1-apd.html (HTML)] [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part1-apd/searchRetrieve-v1.0-os-part1-apd.pdf (PDF)]
     67
     68 OASIS-SRU12[=#REF_SRU_12]:: searchRetrieve: Part 2. SRU searchRetrieve Operation: APD Binding for SRU 1.2 Version 1.0, OASIS, January 2013, \\ [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part2-sru1.2/searchRetrieve-v1.0-os-part2-sru1.2.doc] [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part2-sru1.2/searchRetrieve-v1.0-os-part2-sru1.2.html (HTML)] [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part2-sru1.2/searchRetrieve-v1.0-os-part2-sru1.2.pdf (PDF)]
     69
     70 OASIS-SRU20[=#REF_SRU_20]:: searchRetrieve: Part 3. SRU searchRetrieve Operation: APD Binding for SRU 2.0 Version 1.0, OASIS, January 2013, \\ [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part3-sru2.0/searchRetrieve-v1.0-os-part3-sru2.0.doc] [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part3-sru2.0/searchRetrieve-v1.0-os-part3-sru2.0.html (HTML)] [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part3-sru2.0/searchRetrieve-v1.0-os-part3-sru2.0.pdf (PDF)]
     71
     72 OASIS-CQL[=#REF_CQL]:: searchRetrieve: Part 5. CQL: The Contextual Query Language version 1.0, OASIS, January 2013, \\ [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part5-cql/searchRetrieve-v1.0-os-part5-cql.doc] [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part5-cql/searchRetrieve-v1.0-os-part5-cql.html (HTML)] [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part5-cql/searchRetrieve-v1.0-os-part5-cql.pdf (PDF)]
     73
     74 SRU-Explain[=#REF_Explain]:: searchRetrieve: Part 7. SRU Explain Operation version 1.0, OASIS, January 2013, \\ [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part7-explain/searchRetrieve-v1.0-os-part7-explain.doc] [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part7-explain/searchRetrieve-v1.0-os-part7-explain.html (HTML)] [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part7-explain/searchRetrieve-v1.0-os-part7-explain.pdf (PDF)]
     75
     76 SRU-Scan[=#REF_Scan]:: searchRetrieve: Part 6. SRU Scan Operation version 1.0, OASIS, January 2013, \\ [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part6-scan/searchRetrieve-v1.0-os-part6-scan.doc]   [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part6-scan/searchRetrieve-v1.0-os-part6-scan.html (HTML)] [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part6-scan/searchRetrieve-v1.0-os-part6-scan.PDF (PDF)]
     77
     78 LOC-SRU12[=#REF_LOC_SRU_12]:: SRU Version 1.2: SRU !Search/Retrieve Operation, Library of Congress, \\ [http://www.loc.gov/standards/sru/sru-1-2.html]
     79
     80 LOC-DIAG[=#REF_LOC_DIAG]:: SRU Version 1.2: SRU Diagnostics List, Library of Congress,\\ [http://www.loc.gov/standards/sru/diagnostics/diagnosticsList.html]
     81
     82 UD-POS[=#REF_UD_POS]:: Universal Dependencies, Universal POS tags v2.0, \\ [https://universaldependencies.org/u/pos/index.html]
     83
     84 SAMPA[=#REF_SAMPA]:: Dafydd Gibbon, Inge Mertins, Roger Moore (Eds.): Handbook of Multimodal and Spoken Language Systems. Resources, Terminology and Product Evaluation, Kluwer Academic Publishers, Boston MA, 2000, ISBN 0-7923-7904-7
     85
     86 CLARIN-FCS-!DataViews[=#REF_FCS_DataViews]:: CLARIN Federated Content Search (CLARIN-FCS) - Data Views, SCCTC FCS Task-Force, April 2014, \\ [https://trac.clarin.eu/wiki/FCS/Dataviews]
     87
     88== Non-Normative References ==
     89 CQP-TUTORIAL[=#REF_CQP_Tutorial]:: Evert et al.: The IMS Open Corpus Workbench (CWB) CQP Query Language Tutorial, CWB Version 3.0, February 2010, \\ [http://cwb.sourceforge.net/files/CQP_Tutorial/]
     90
     91 RFC6838[=#REF_RFC_6838]:: Media Type Specifications and Registration Procedures, IETF RFC 6838, January 2013, \\ [http://www.ietf.org/rfc/rfc6838.txt]
     92
     93 RFC3023[=#REF_RFC_3023]:: XML Media Types, IETF RFC 3023, January 2001, \\ [http://www.ietf.org/rfc/rfc3023.txt]
     94
     95== Typographic and XML Namespace conventions ==
    16396The following typographic conventions for XML fragments will be used throughout this specification:
     97
    16498 * `<prefix:Element>` \\ An XML element with the Generic Identifier ''Element'' that is bound to an XML namespace denoted by the prefix ''prefix''.
    16599 * `@attr` \\ An XML attribute with the name ''attr''
     100
    166101{{{#!comment
     102
    167103 * `@prefix:attr` \\ An XML attribute with the name ''attr'' that is bound to an XML namespace denoted by the prefix ''prefix''.
    168 }}}
     104
     105}}}
     106
    169107 * `string` \\ The literal ''string'' must be used either as element content or attribute value.
     108
    170109Endpoints and Clients `MUST` adhere to the [#REF_XML_Namespaces XML-Namespaces] specification. The CLARIN-FCS interface specification generally does not dictate whether XML elements should be serialized in their prefixed or non-prefixed syntax, but Endpoints `MUST` ensure that the correct XML namespace is used for elements and that XML namespaces are declared correctly. Clients `MUST` be agnostic regarding the syntax used to serialize XML elements, i.e. whether the prefixed or non-prefixed variant was used, and `SHOULD` operate solely on ''expanded names'', i.e. pairs of ''namespace name'' and ''local name''.
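
To illustrate what operating on ''expanded names'' means for a Client, the following is a minimal sketch in Python using the standard `xml.etree.ElementTree` module. The element name `Result` and the helper `expanded_name` are assumptions made for this illustration only; the point is that the prefixed and the non-prefixed serialization resolve to the same pair of namespace name and local name.

```python
import xml.etree.ElementTree as ET

HITS_NS = "http://clarin.eu/fcs/dataview/hits"

# Two serializations of the same element (the element name is illustrative):
# one uses a prefix, the other declares a default namespace.
prefixed = '<hits:Result xmlns:hits="http://clarin.eu/fcs/dataview/hits">text</hits:Result>'
unprefixed = '<Result xmlns="http://clarin.eu/fcs/dataview/hits">text</Result>'


def expanded_name(xml_text):
    """Return the (namespace name, local name) pair of the root element."""
    tag = ET.fromstring(xml_text).tag  # ElementTree reports "{namespace}local"
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return None, tag  # element is not bound to any namespace


# A Client comparing expanded names does not care which syntax was used.
same = expanded_name(prefixed) == expanded_name(unprefixed)
```

A Client written this way never inspects the prefix itself, so an Endpoint may choose either syntax as long as the namespace declarations are correct.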
    171110
    172111The following XML namespace names and prefixes are used throughout this specification. The column "Recommended Syntax" indicates which syntax variant `SHOULD` be used by the Endpoint to serialize the XML response.
    173 ||=Prefix =||=Namespace Name                                       =||=Comment                                                 =||=Recommended Syntax =||
    174 || `fcs`   || `http://clarin.eu/fcs/resource`                       || CLARIN-FCS Resources                                     || prefixed ||
    175 || `ed`    || `http://clarin.eu/fcs/endpoint-description`           || CLARIN-FCS Endpoint Description                          || prefixed ||
    176 || `hits`  || `http://clarin.eu/fcs/dataview/hits`                  || CLARIN-FCS Generic Hits Data View                        || prefixed ||
    177 || `adv`   || `http://clarin.eu/fcs/dataview/advanced`              || CLARIN-FCS Advanced Data View                            || prefixed ||
    178 || `sru`   || `http://docs.oasis-open.org/ns/search-ws/sruResponse` || SRU Version 2.0                                          || prefixed ||
    179 || `diag`  || `http://docs.oasis-open.org/ns/search-ws/diagnostic`  || SRU Version 2.0 Diagnostics                              || prefixed ||
    180 || `zr`    || `http://explain.z3950.org/dtd/2.0/`                   || SRU/ZeeRex Explain                                       || prefixed ||
    181 || `sru`   || `http://www.loc.gov/zing/srw/`                        || SRU Version 1.2, ''only compatibility mode''             || prefixed ||
    182 || `diag`  || `http://www.loc.gov/zing/srw/diagnostic/`             || SRU Version 1.2 Diagnostics, ''only compatibility mode'' || prefixed ||
    183 
    184 = CLARIN-FCS Interface Specification
     112
     113||=Prefix =||=Namespace Name =||=Comment =||=Recommended Syntax =||
     114|| `fcs` || `http://clarin.eu/fcs/resource` || CLARIN-FCS Resources || prefixed ||
     115|| `ed` || `http://clarin.eu/fcs/endpoint-description` || CLARIN-FCS Endpoint Description || prefixed ||
     116|| `hits` || `http://clarin.eu/fcs/dataview/hits` || CLARIN-FCS Generic Hits Data View || prefixed ||
     117|| `adv` || `http://clarin.eu/fcs/dataview/advanced` || CLARIN-FCS Advanced Data View || prefixed ||
     118|| `sru` || `http://docs.oasis-open.org/ns/search-ws/sruResponse` || SRU Version 2.0 || prefixed ||
     119|| `diag` || `http://docs.oasis-open.org/ns/search-ws/diagnostic` || SRU Version 2.0 Diagnostics || prefixed ||
     120|| `zr` || `http://explain.z3950.org/dtd/2.0/` || SRU/ZeeRex Explain || prefixed ||
     121|| `sru` || `http://www.loc.gov/zing/srw/` || SRU Version 1.2, ''only compatibility mode'' || prefixed ||
     122|| `diag` || `http://www.loc.gov/zing/srw/diagnostic/` || SRU Version 1.2 Diagnostics, ''only compatibility mode'' || prefixed ||
     123
     124= CLARIN-FCS Interface Specification =
    185125The CLARIN-FCS Interface Specification defines a set of capabilities, an extensible result format and a set of required operations. CLARIN-FCS is built on the SRU/CQL standard and additional functionality required for CLARIN-FCS is added through SRU/CQL's extension mechanisms.
    186126
    187127Specifically, the CLARIN-FCS Interface Specification consists of two parts: a set of formats and a transport protocol. The ''Endpoint'' is a software component that acts as a bridge between a ''Client'' and a ''Search Engine'' and passes the requests sent by the ''Client'' to the ''Search Engine''. The ''Search Engine'' is a custom software component that allows searching the language resources in a Repository. The ''Endpoint'' implements the ''Transport Protocol'' and acts as a mediator between the CLARIN-FCS specific formats and the idiosyncrasies of the ''Search Engines'' of the individual Repositories. The following figure illustrates the overall architecture:
     128
    188129{{{
    189130                   +---------+
     
    216157In general, the workflow in CLARIN-FCS is as follows: a Client submits a query to an Endpoint. The Endpoint translates the query from CQL or FCS-QL to the query dialect used by the Search Engine and submits the translated query to the Search Engine. The Search Engine processes the query and generates a result set, i.e. it compiles a set of hits that match the search criterion. The Endpoint then translates the results from the Search Engine-specific result set format to the CLARIN-FCS result format and sends them to the Client.
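
The workflow above can be sketched as a toy Endpoint in Python. Everything in this sketch is hypothetical: the classes `ToySearchEngine` and `ToyEndpoint`, the `CONTAINS` query dialect and the `Hit` type are invented for illustration, and the query translation handles only a single term or quoted phrase, which is far less than real CQL or FCS-QL.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Stand-in for one entry of the CLARIN-FCS result format."""
    text: str


class ToySearchEngine:
    """Hypothetical engine with its own query dialect: 'CONTAINS <term>'."""

    def __init__(self, documents):
        self.documents = documents

    def run(self, native_query):
        term = native_query[len("CONTAINS "):].lower()
        return [doc for doc in self.documents if term in doc.lower()]


class ToyEndpoint:
    """Mediates between the Client-facing formats and the engine."""

    def __init__(self, engine):
        self.engine = engine

    def translate_query(self, cql_query):
        # Only a bare term or a quoted phrase is supported in this sketch.
        return "CONTAINS " + cql_query.strip().strip('"')

    def translate_results(self, raw_results):
        return [Hit(text=raw) for raw in raw_results]

    def search(self, cql_query):
        native_query = self.translate_query(cql_query)  # query -> engine dialect
        raw_results = self.engine.run(native_query)     # engine builds result set
        return self.translate_results(raw_results)      # engine format -> FCS format


endpoint = ToyEndpoint(ToySearchEngine(["The cat sat", "A dog barked"]))
hits = endpoint.search('"cat"')
```

The Client only ever sees the translated query language and the translated result format; the engine dialect stays private to the Endpoint.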
    217158
    218 == Discovery #Discovery
     159== Discovery == #Discovery
    219160The ''Discovery'' step allows a Client to gather information about an Endpoint, in particular which capabilities are supported or which resources are available for searching.
    220161
    221 === Capabilities
     162=== Capabilities ===
    222163A ''Capability'' defines a certain feature set that is part of CLARIN-FCS, e.g. what kind of queries are supported. Each Endpoint implements some (or all) of these Capabilities. The Endpoint will announce the capabilities it provides to allow a Client to auto-tune itself (see section [#endpointDescription Endpoint Description]). Each Capability is identified by a ''Capability Identifier'', which uses the URI syntax. The following Capabilities are defined in CLARIN-FCS:
    223 ||=Name               =||=Capability Identifier                            =||=Summary                                      =||
    224 || ''Basic Search''    || `http://clarin.eu/fcs/capability/basic-search`    || Simple full-text searching                    ||
     164
     165||=Name =||=Capability Identifier =||=Summary =||
     166|| ''Basic Search'' || `http://clarin.eu/fcs/capability/basic-search` || Simple full-text searching ||
    225167|| ''Advanced Search'' || `http://clarin.eu/fcs/capability/advanced-search` || Searching in structured and/or annotated data ||
    226168
    227169Endpoints `MUST` implement the ''Basic Search'' Capability. Endpoints `MUST NOT` invent custom Capability Identifiers and `MUST` only use the values defined above.
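
As a rough illustration of the two rules above, a Client or a conformance checker could validate the Capability Identifiers announced by an Endpoint along the following lines. The function name `check_capabilities` and the wording of the messages are assumptions of this sketch; only the two identifiers come from the table above.

```python
BASIC_SEARCH = "http://clarin.eu/fcs/capability/basic-search"
ADVANCED_SEARCH = "http://clarin.eu/fcs/capability/advanced-search"
KNOWN_CAPABILITIES = {BASIC_SEARCH, ADVANCED_SEARCH}


def check_capabilities(announced):
    """Return a list of problems found in the announced Capability Identifiers."""
    problems = []
    # Basic Search is mandatory for every Endpoint.
    if BASIC_SEARCH not in announced:
        problems.append("Basic Search capability is missing")
    # Custom identifiers are not allowed.
    for identifier in announced:
        if identifier not in KNOWN_CAPABILITIES:
            problems.append("unknown capability identifier: " + identifier)
    return problems


ok = check_capabilities([BASIC_SEARCH, ADVANCED_SEARCH])
bad = check_capabilities([ADVANCED_SEARCH, "http://example.org/custom"])
```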
    228170
    229 
    230 === Endpoint Description #endpointDescription
     171=== Endpoint Description === #endpointDescription
    231172{{{
    232173#!div style="border: 1px solid #000000; font-size: 75%"
     
    236177
    237178The XML fragment for ''Endpoint Description'' is encoded as an `<ed:EndpointDescription>` element that contains the following attributes and children:
     179
    238180 * one `@version` attribute (`REQUIRED`) on the `<ed:EndpointDescription>` element. The value of the `@version` attribute `MUST` be `2`.
    239  * one `<ed:Capabilities>` element (`REQUIRED`) that contains one or more `<ed:Capability>` elements \\
    240    The content of the `<ed:Capability>` element is a Capability Identifier that indicates a capability supported by the Endpoint. For valid values for the Capability Identifier, see section [#capabilities Capabilities]. This list `MUST NOT` include duplicate values.
    241  * one `<ed:SupportedDataViews>` element (`REQUIRED`) \\
    242    A list of Data Views that are supported by this Endpoint. This list is composed of one or more `<ed:SupportedDataView>` elements. The content of an `<ed:SupportedDataView>` element `MUST` be the MIME type of a supported Data View, e.g. `application/x-clarin-fcs-hits+xml`. Each `<ed:SupportedDataView>` element `MUST` carry an `@id` and a `@delivery-policy` attribute. The value of the `@id` attribute is later used in the `<ed:Resource>` element to indicate which Data Views are supported by a resource (see below). Endpoints `SHOULD` use the recommended short identifier for the Data View. The `@delivery-policy` attribute indicates the Endpoint's delivery policy for that Data View. Valid values are `send-by-default` for the ''send-by-default'' and `need-to-request` for the ''need-to-request'' delivery policy. \\
    243    This list `MUST NOT` include duplicate entries, i.e. no MIME type may appear more than once. \\
    244    The value of the `@id` attribute `MUST NOT` contain the characters `,` (comma) or `;` (semicolon).
    245  * one `<ed:SupportedLayers>` element (`REQUIRED` if the Endpoint supports the ''Advanced Search'' capability) \\
    246    A list of Layers that are generally supported by this Endpoint. This list is composed of one or more `<ed:SupportedLayer>` elements. The content of an `<ed:SupportedLayer>` element `MUST` be the identifier of a Layer (see [#layers section "Layers"]), e.g. `orth`. Each `<ed:SupportedLayer>` element `MUST` carry an `@id` and a `@result-id` attribute. The value of the `@id` attribute is later used in the `<ed:Resource>` element to indicate which Layers are supported by a resource (see below). The `@result-id` attribute is used in the Advanced Data View (see [#advancedDataView section "Advanced Data View"]). Each `<ed:SupportedLayer>` element `MAY` carry an optional `@qualifier` attribute. It is used as a qualifier in an FCS-QL search term to address this specific layer. \\
    247    This list `MUST NOT` include duplicate entries, i.e. no Layer with the same `@result-id` may appear more than once. \\
    248    The value of the `@id` or `@result-id` attribute `MUST NOT` contain the characters `,` (comma) or `;` (semicolon). \\
    249    The value of the `@qualifier` attribute `MUST NOT` contain characters other than `a`-`z`, `A`-`Z`, `0`-`9` and `-` (hyphen). \\
    250    The `<ed:SupportedLayer>` element `MAY` carry an `@alt-value-info` and an `@alt-value-info-uri` attribute; `@alt-value-info` `SHOULD` contain a short description of the layer, e.g. the original tag set used; `@alt-value-info-uri` `MUST` contain a well-formed URI and `SHOULD` point to a web site with further information, e.g. about the original tag set and how the translation to FCS is done. Clients, e.g. the Aggregator, can display this information together with the search result.
    251  * one `<ed:Resources>` element (`REQUIRED`) \\
    252    A list of (top-level) resources that are available, i.e. searchable, at the Endpoint. The `<ed:Resources>` element contains one or more `<ed:Resource>` elements (see below). The Endpoint `MUST` declare at least one (top-level) resource.
     181 * one `<ed:Capabilities>` element (`REQUIRED`) that contains one or more `<ed:Capability>` elements \\ The content of the `<ed:Capability>` element is a Capability Identifier that indicates a capability supported by the Endpoint. For valid values for the Capability Identifier, see section [#capabilities Capabilities]. This list `MUST NOT` include duplicate values.
     182 * one `<ed:SupportedDataViews>` element (`REQUIRED`) \\ A list of Data Views that are supported by this Endpoint. This list is composed of one or more `<ed:SupportedDataView>` elements. The content of an `<ed:SupportedDataView>` element `MUST` be the MIME type of a supported Data View, e.g. `application/x-clarin-fcs-hits+xml`. Each `<ed:SupportedDataView>` element `MUST` carry an `@id` and a `@delivery-policy` attribute. The value of the `@id` attribute is later used in the `<ed:Resource>` element to indicate which Data Views are supported by a resource (see below). Endpoints `SHOULD` use the recommended short identifier for the Data View. The `@delivery-policy` attribute indicates the Endpoint's delivery policy for that Data View. Valid values are `send-by-default` for the ''send-by-default'' and `need-to-request` for the ''need-to-request'' delivery policy. \\ This list `MUST NOT` include duplicate entries, i.e. no MIME type may appear more than once. \\ The value of the `@id` attribute `MUST NOT` contain the characters `,` (comma) or `;` (semicolon).
     183 * one `<ed:SupportedLayers>` element (`REQUIRED` if Endpoint supports ''Advanced Search'' capability) \\ A list of Layers that are generally supported by this Endpoint. This list is composed of one or more `<ed:SupportedLayer>` elements. The content of a `<ed:SupportedLayer>` `MUST` be the identifier of a Layer (see [#layers section "Layers"]), e.g. `orth`. Each `<ed:SupportedLayer>` element `MUST` carry an `@id` and a `@result-id` attribute. The value of the `@id` attribute is later used in the `<ed:Resource>` element to indicate which Layer is supported by a resource (see below). The `@result-id` attribute is used in the Advanced Data View (see [#advancedDataView section "Advanced Data View"]). Each `<ed:SupportedLayer>` element `MAY` carry an optional `@qualifier` attribute. It is used as a qualifier in an FCS-QL search term to address this specific layer. \\ This list `MUST NOT` include duplicate entries, i.e. no Layer with the same `@result-id` may appear more than once. \\ The value of the `@id` or `@result-id` attribute `MUST NOT` contain the characters `,` (comma) or `;` (semicolon). The value of the `@qualifier` attribute `MUST NOT` contain characters other than `a`-`z`, `A`-`Z`, `0`-`9` and `-` (hyphen). The `<ed:SupportedLayer>` element `MAY` carry an `@alt-value-info` and `@alt-value-info-uri` attribute; `@alt-value-info` `SHOULD` contain a short description of the layer, e.g. the original tag set used; `@alt-value-info-uri` `MUST` contain a well-formed URI and `SHOULD` point to a web site with further information, e.g. about the original tag set and how the translation to FCS is done. Clients, e.g. the Aggregator, can display this information together with the search result.
     184 * one `<ed:Resources>` element (`REQUIRED`) \\ A list of (top-level) resources that are available, i.e. searchable, at the Endpoint. The `<ed:Resources>` element contains one or more `<ed:Resource>` elements (see below). The Endpoint `MUST` declare at least one (top-level) resource.
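
The character and uniqueness constraints listed above can be checked mechanically. The following is a minimal Python sketch, not part of the specification; the function names are hypothetical, and rejecting an empty `@qualifier` is an assumption that the text above does not state.

```python
import re

# Illustrative helpers only; the names are not defined by CLARIN-FCS.
# Assumption: an empty qualifier is rejected (the text only restricts characters).
_QUALIFIER_RE = re.compile(r"[A-Za-z0-9-]+")


def is_valid_id(value: str) -> bool:
    """@id and @result-id values must not contain ',' or ';'."""
    return "," not in value and ";" not in value


def is_valid_qualifier(value: str) -> bool:
    """@qualifier values may only contain a-z, A-Z, 0-9 and '-'."""
    return _QUALIFIER_RE.fullmatch(value) is not None


def find_duplicates(values):
    """Return each value that occurs more than once, in first-seen order."""
    seen = set()
    duplicates = []
    for value in values:
        if value in seen and value not in duplicates:
            duplicates.append(value)
        seen.add(value)
    return duplicates
```

An Endpoint implementer could, for instance, run `find_duplicates` over the MIME types of all `<ed:SupportedDataView>` elements and over the Capability Identifiers before publishing an Endpoint Description.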
    253185
    254186The `<ed:Resource>` element contains a basic description of a resource that is available at the Endpoint. A resource is a searchable entity, e.g. a single corpus. The `<ed:Resource>` element has a mandatory `@pid` attribute that contains the persistent identifier of the resource. This value `MUST` be the same as the ''!MdSelfLink'' of the CMDI record describing the resource. The `<ed:Resource>` element contains the following children:
    255  * one or more `<ed:Title>` elements (`REQUIRED`) \\
    256    A human-readable title for the resource. A `REQUIRED` `@xml:lang` attribute indicates the language of the title. An English version of the title is `REQUIRED`. The list of titles `MUST NOT` contain duplicate entries for the same language.
    257  * zero or more `<ed:Description>` elements (`OPTIONAL`) \\
    258    An optional human-readable description of the resource. It `SHOULD` be at most one sentence. A `REQUIRED` `@xml:lang` attribute indicates the language of the description. If supplied, an English version of the description is `REQUIRED`. The list of descriptions `MUST NOT` contain duplicate entries for the same language.
    259  * zero or one `<ed:LandingPageURI>` element (`OPTIONAL`) \\
    260    A link to a website for the resource, e.g. a landing page that describes the corpus.
    261  * one `<ed:Languages>` element (`REQUIRED`) \\
    262    The (relevant) languages available within the resource. The `<ed:Languages>` element contains one or more `<ed:Language>` elements. The content of a `<ed:Language>` element `MUST` be an ISO 639-3 three-letter language code. The `<ed:Language>` element should be repeated for all (relevant) languages available ''within'' the resource; however, this list `MUST NOT` contain duplicate entries.
    263  * one `<ed:AvailableDataViews>` element (`REQUIRED`) \\
    264    The Data Views that are available for the resource. The `<ed:AvailableDataViews>` element `MUST` carry a `@ref` attribute that contains a whitespace-separated list of id values, each of which corresponds to the value of the `@id` attribute of a referenced `<ed:SupportedDataView>` element. \\
    265    In case of sub-resources, each Resource `SHOULD` support all Data Views that are supported by the parent resource. However, every resource `MUST` declare all available Data Views independently, i.e. there is no implicit inheritance semantic.
    266  * one `<ed:AvailableLayers>` element (`REQUIRED` if Endpoint supports ''Advanced Search'' capability). The `<ed:AvailableLayers>` element `MUST` carry a `@ref` attribute, that contains a whitespace separated list of id values, that correspond to the value of the appropriate `@id` attribute for the `<ed:SupportedLayer>` elements that are referenced. \\
    267    In case of sub-resources, each Resource `SHOULD` support all Layers that are supported by the parent resource. However, every resource `MUST` declare all available Layers independently, i.e. there is no implicit inheritance semantic.
    268 * zero or one `<ed:Resources>` element (`OPTIONAL`) \\
    269   If a resource has searchable sub-resources, the Endpoint `MUST` supply additional finer grained resource elements, which are wrapped in a `<ed:Resources>` element. A sub-resource is a searchable entity within a resource, e.g. a sub-corpus.
    270 
    271 [=#REF_Example_4]Example 4:
    272 {{{#!xml
    273 <ed:EndpointDescription xmlns:ed="http://clarin.eu/fcs/endpoint-description" version="2">
    274     <ed:Capabilities>
    275         <ed:Capability>http://clarin.eu/fcs/capability/basic-search</ed:Capability>
    276     </ed:Capabilities>
    277     <ed:SupportedDataViews>
    278         <ed:SupportedDataView id="hits" delivery-policy="send-by-default">application/x-clarin-fcs-hits+xml</ed:SupportedDataView>
    279     </ed:SupportedDataViews>
    280     <ed:Resources>
    281         <!-- just one top-level resource at the Endpoint -->
    282         <ed:Resource pid="http://hdl.handle.net/4711/0815">
    283             <ed:Title xml:lang="de">Goethe Korpus</ed:Title>
    284             <ed:Title xml:lang="en">Goethe corpus</ed:Title>
    285             <ed:Description xml:lang="de">Der Goethe Korpus des IDS Mannheim.</ed:Description>
    286             <ed:Description xml:lang="en">The Goethe corpus of IDS Mannheim.</ed:Description>
    287             <ed:LandingPageURI>http://repos.example.org/corpus1.html</ed:LandingPageURI>
    288             <ed:Languages>
    289                 <ed:Language>deu</ed:Language>
    290             </ed:Languages>
    291             <ed:AvailableDataViews ref="hits" />
    292         </ed:Resource>
    293     </ed:Resources>
    294 </ed:EndpointDescription>
    295 }}}
    296 [#REF_Example_4 Example 4] shows a simple Endpoint Description for an Endpoint that only supports the ''Basic Search'' Capability and only provides the Generic Hits Data View, which is indicated by a `<ed:SupportedDataView>` element. This element carries an `@id` attribute with a value of `hits`, the recommended value for the short identifier, and indicates a delivery policy of ''send-by-default'' by the `@delivery-policy` attribute. The Endpoint only provides one top-level resource, identified by the persistent identifier `http://hdl.handle.net/4711/0815`. The resource has a title as well as a description in German and English. A landing page is located at `http://repos.example.org/corpus1.html`. The predominant language in the resource contents is German. Only the Generic Hits Data View is supported for this resource, because the `<ed:AvailableDataViews>` element only references the `<ed:SupportedDataView>` element whose `@id` has a value of `hits`.
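
The relationship between `<ed:AvailableDataViews>` and `<ed:SupportedDataView>` described above can be verified by a Client or a test tool. The following Python sketch is one possible check, not a normative algorithm; the function name and the `SAMPLE` document (a shortened form of Example 4) are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

ED_URI = "http://clarin.eu/fcs/endpoint-description"
ED_NS = {"ed": ED_URI}

# Shortened form of Example 4, used only for illustration.
SAMPLE = """\
<ed:EndpointDescription xmlns:ed="http://clarin.eu/fcs/endpoint-description" version="2">
  <ed:SupportedDataViews>
    <ed:SupportedDataView id="hits" delivery-policy="send-by-default">application/x-clarin-fcs-hits+xml</ed:SupportedDataView>
  </ed:SupportedDataViews>
  <ed:Resources>
    <ed:Resource pid="http://hdl.handle.net/4711/0815">
      <ed:AvailableDataViews ref="hits" />
    </ed:Resource>
  </ed:Resources>
</ed:EndpointDescription>
"""


def unresolved_data_view_refs(xml_text):
    """Return (pid, [missing ids]) for every resource, including
    sub-resources, whose @ref names an undeclared Data View id."""
    root = ET.fromstring(xml_text)
    declared = {
        view.get("id")
        for view in root.findall("ed:SupportedDataViews/ed:SupportedDataView", ED_NS)
    }
    problems = []
    # iter() walks all descendants, so nested <ed:Resources> are covered too.
    for resource in root.iter("{%s}Resource" % ED_URI):
        available = resource.find("ed:AvailableDataViews", ED_NS)
        refs = available.get("ref", "").split() if available is not None else []
        missing = [ref for ref in refs if ref not in declared]
        if missing:
            problems.append((resource.get("pid"), missing))
    return problems
```

Because every resource declares its Data Views independently, the check treats each `<ed:Resource>` on its own and does not inherit anything from a parent resource.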
    297 
    298 [=#REF_Example_5]Example 5:
    299 {{{#!xml
    300 <ed:EndpointDescription xmlns:ed="http://clarin.eu/fcs/endpoint-description" version="2">
    301     <ed:Capabilities>
    302         <ed:Capability>http://clarin.eu/fcs/capability/basic-search</ed:Capability>
    303     </ed:Capabilities>
    304     <ed:SupportedDataViews>
    305         <ed:SupportedDataView id="hits" delivery-policy="send-by-default">application/x-clarin-fcs-hits+xml</ed:SupportedDataView>
    306         <ed:SupportedDataView id="cmdi" delivery-policy="need-to-request">application/x-cmdi+xml</ed:SupportedDataView>
    307     </ed:SupportedDataViews>
    308     <ed:Resources>
    309         <!-- top-level resource 1 -->
    310         <ed:Resource pid="http://hdl.handle.net/4711/0815">
    311             <ed:Title xml:lang="de">Goethe Korpus</ed:Title>
    312             <ed:Title xml:lang="en">Goethe corpus</ed:Title>
    313             <ed:Description xml:lang="de">Der Goethe Korpus des IDS Mannheim.</ed:Description>
    314             <ed:Description xml:lang="en">The Goethe corpus of IDS Mannheim.</ed:Description>
    315             <ed:LandingPageURI>http://repos.example.org/corpus1.html</ed:LandingPageURI>
    316             <ed:Languages>
    317                 <ed:Language>deu</ed:Language>
    318             </ed:Languages>
    319             <ed:AvailableDataViews ref="hits" />
    320         </ed:Resource>
    321         <!-- top-level resource 2 -->
    322         <ed:Resource pid="http://hdl.handle.net/4711/0816">
    323             <ed:Title xml:lang="de">Zeitungskorpus des Mannheimer Morgen</ed:Title>
    324             <ed:Title xml:lang="en">Mannheimer Morgen newspaper corpus</ed:Title>
    325             <ed:LandingPageURI>http://repos.example.org/corpus2.html</ed:LandingPageURI>
    326             <ed:Languages>
    327                 <ed:Language>deu</ed:Language>
    328             </ed:Languages>
    329             <ed:AvailableDataViews ref="hits cmdi" />
    330             <ed:Resources>
    331                 <!-- sub-resource 1 of top-level resource 2 -->
    332                 <ed:Resource pid="http://hdl.handle.net/4711/0816-1">
    333                     <ed:Title xml:lang="de">Zeitungskorpus des Mannheimer Morgen (vor 1990)</ed:Title>
    334                     <ed:Title xml:lang="en">Mannheimer Morgen newspaper corpus (before 1990)</ed:Title>
    335                     <ed:LandingPageURI>http://repos.example.org/corpus2.html#sub1</ed:LandingPageURI>
    336                     <ed:Languages>
    337                         <ed:Language>deu</ed:Language>
    338                     </ed:Languages>
    339                     <ed:AvailableDataViews ref="hits cmdi" />
    340                 </ed:Resource>
    341                 <!-- sub-resource 2 of top-level resource 2 -->
    342                 <ed:Resource pid="http://hdl.handle.net/4711/0816-2">
    343                     <ed:Title xml:lang="de">Zeitungskorpus des Mannheimer Morgen (nach 1990)</ed:Title>
    344                     <ed:Title xml:lang="en">Mannheimer Morgen newspaper corpus (after 1990)</ed:Title>
    345                     <ed:LandingPageURI>http://repos.example.org/corpus2.html#sub2</ed:LandingPageURI>
    346                     <ed:Languages>
    347                         <ed:Language>deu</ed:Language>
    348                     </ed:Languages>
    349                     <ed:AvailableDataViews ref="hits cmdi" />
    350                 </ed:Resource>
    351             </ed:Resources>
    352         </ed:Resource>
    353     </ed:Resources>
    354 </ed:EndpointDescription>
    355 }}}
    356 The more complex [#REF_Example_5 Example 5] shows an Endpoint Description for an Endpoint that, similar to [#REF_Example_4 Example 4], supports the ''Basic Search'' capability. In addition to the Generic Hits Data View, it also supports the CMDI Data View. The delivery policies are ''send-by-default'' for the Generic Hits Data View and ''need-to-request'' for the CMDI Data View. The Endpoint has two top-level resources (identified by the persistent identifiers `http://hdl.handle.net/4711/0815` and `http://hdl.handle.net/4711/0816`). The second top-level resource has two independently searchable sub-resources, identified by the persistent identifiers `http://hdl.handle.net/4711/0816-1` and `http://hdl.handle.net/4711/0816-2`. All resources are described using several properties, like title, description, etc. The first top-level resource provides only the Generic Hits Data View, while the second top-level resource and its children provide the Generic Hits and the CMDI Data Views.
    357 
    358 [=#REF_Example_6]Example 6:
    359 {{{#!xml
    360 <ed:EndpointDescription xmlns:ed="http://clarin.eu/fcs/endpoint-description" version="2">
    361     <ed:Capabilities>
    362         <ed:Capability>http://clarin.eu/fcs/capability/basic-search</ed:Capability>
    363         <ed:Capability>http://clarin.eu/fcs/capability/advanced-search</ed:Capability>
    364     </ed:Capabilities>
    365     <ed:SupportedDataViews>
    366         <ed:SupportedDataView id="hits" delivery-policy="send-by-default">application/x-clarin-fcs-hits+xml</ed:SupportedDataView>
    367     </ed:SupportedDataViews>
    368     <!-- ADV-FCS -->
    369     <ed:SupportedLayers>
    370         <ed:SupportedLayer id="l1" result-id="http://endpoint.example.org/Layers/orth1">orth</ed:SupportedLayer>
    371         <ed:SupportedLayer id="l2" result-id="http://endpoint.example.org/Layers/pos1" qualifier="x">pos</ed:SupportedLayer>
    372         <ed:SupportedLayer id="l3" result-id="http://endpoint.example.org/Layers/pos2" qualifier="y"
    373             alt-value-info="STTS tagset"
    374             alt-value-info-uri="http://repos.example.org/tagset_doc.html">pos</ed:SupportedLayer>
    375         <ed:SupportedLayer id="l4" result-id="http://endpoint.example.org/Layers/word" type="empty">word</ed:SupportedLayer>
    376         <ed:SupportedLayer id="l5" result-id="http://endpoint.example.org/Layers/lemma1">lemma</ed:SupportedLayer>
    377     </ed:SupportedLayers>
    378 
    379     <ed:Resources>
    380         <!-- just one top-level resource at the Endpoint -->
    381         <ed:Resource pid="http://hdl.handle.net/4711/0815">
    382             <ed:Title xml:lang="de">Goethe Korpus</ed:Title>
    383             <ed:Title xml:lang="en">Goethe corpus</ed:Title>
    384             <ed:Description xml:lang="de">Der Goethe Korpus des IDS Mannheim.</ed:Description>
    385             <ed:Description xml:lang="en">The Goethe corpus of IDS Mannheim.</ed:Description>
    386             <ed:LandingPageURI>http://repos.example.org/corpus1.html</ed:LandingPageURI>
    387             <ed:Languages>
    388                 <ed:Language>deu</ed:Language>
    389             </ed:Languages>
    390             <ed:AvailableDataViews ref="hits" />
    391             <ed:AvailableLayers ref="l1 l2 l3 l4 l5" />
    392         </ed:Resource>
    393     </ed:Resources>
    394 </ed:EndpointDescription>
    395 }}}
     187
     188 * one or more `<ed:Title>` elements (`REQUIRED`) \\ A human-readable title for the resource. A `REQUIRED` `@xml:lang` attribute indicates the language of the title. An English version of the title is `REQUIRED`. The list of titles `MUST NOT` contain duplicate entries for the same language.
     189 * zero or more `<ed:Description>` elements (`OPTIONAL`) \\ An optional human-readable description of the resource. It `SHOULD` be at most one sentence. A `REQUIRED` `@xml:lang` attribute indicates the language of the description. If supplied, an English version of the description is `REQUIRED`. The list of descriptions `MUST NOT` contain duplicate entries for the same language.
     190 * zero or one `<ed:LandingPageURI>` element (`OPTIONAL`) \\ A link to a website for the resource, e.g. a landing page that describes the corpus.
     191 * one `<ed:Languages>` element (`REQUIRED`) \\ The (relevant) languages available within the resource. The `<ed:Languages>` element contains one or more `<ed:Language>` elements. The content of a `<ed:Language>` element `MUST` be an ISO 639-3 three-letter language code. The `<ed:Language>` element should be repeated for all (relevant) languages available ''within'' the resource; however, this list `MUST NOT` contain duplicate entries.
     192 * one `<ed:AvailableDataViews>` element (`REQUIRED`) \\ The Data Views that are available for the resource. The `<ed:AvailableDataViews>` element `MUST` carry a `@ref` attribute that contains a whitespace-separated list of id values, each of which corresponds to the value of the `@id` attribute of a referenced `<ed:SupportedDataView>` element. \\ In case of sub-resources, each Resource `SHOULD` support all Data Views that are supported by the parent resource. However, every resource `MUST` declare all available Data Views independently, i.e. there is no implicit inheritance semantic.
     193 * one `<ed:AvailableLayers>` element (`REQUIRED` if Endpoint supports ''Advanced Search'' capability). The `<ed:AvailableLayers>` element `MUST` carry a `@ref` attribute, that contains a whitespace separated list of id values, that correspond to the value of the appropriate `@id` attribute for the `<ed:SupportedLayer>` elements that are referenced. \\ In case of sub-resources, each Resource `SHOULD` support all Layers that are supported by the parent resource. However, every resource `MUST` declare all available Layers independently, i.e. there is no implicit inheritance semantic.
     194 * zero or one `<ed:Resources>` element (`OPTIONAL`) \\ If a resource has searchable sub-resources, the Endpoint `MUST` supply additional finer grained resource elements, which are wrapped in a `<ed:Resources>` element. A sub-resource is a searchable entity within a resource, e.g. a sub-corpus.
     195
     196[=#REF_Example_4]Example 4: {{{#!xml <ed:EndpointDescription xmlns:ed="http://clarin.eu/fcs/endpoint-description" version="2">
     197
     198  <ed:Capabilities>
     199    <ed:Capability> http://clarin.eu/fcs/capability/basic-search </ed:Capability >
     200  </ed:Capabilities > <ed:SupportedDataViews>
     201    <ed:SupportedDataView id="hits" delivery-policy="send-by-default"> application/x-clarin-fcs-hits+xml</ed:SupportedDataView >
     202  </ed:SupportedDataViews > <ed:Resources>
     203    <!-- just one top-level resource at the Endpoint --> <ed:Resource pid="http://hdl.handle.net/4711/0815">
     204      <ed:Title xml:lang="de"> Goethe Korpus</ed:Title > <ed:Title xml:lang="en"> Goethe corpus</ed:Title > <ed:Description xml:lang="de"> Der Goethe Korpus des IDS Mannheim.</ed:Description > <ed:Description xml:lang="en"> The Goethe corpus of IDS Mannheim.</ed:Description > <ed:LandingPageURI> http://repos.example.org/corpus1.html </ed:LandingPageURI > <ed:Languages>
     205        <ed:Language> deu</ed:Language >
     206      </ed:Languages > <ed:AvailableDataViews ref="hits" />
     207    </ed:Resource >
     208  </ed:Resources >
     209
     210</ed:EndpointDescription> }}} [#REF_Example_4 Example 4] shows a simple Endpoint Description for an Endpoint that only supports the ''Basic Search'' Capability and only provides the Generic Hits Data View, which is indicated by a `<ed:SupportedDataView>` element. This element carries an `@id` attribute with a value of `hits`, the recommended value for the short identifier, and indicates a delivery policy of ''send-by-default'' by the `@delivery-policy` attribute. The Endpoint only provides one top-level resource, identified by the persistent identifier `http://hdl.handle.net/4711/0815`. The resource has a title as well as a description in German and English. A landing page is located at `http://repos.example.org/corpus1.html`. The predominant language in the resource contents is German. Only the Generic Hits Data View is supported for this resource, because the `<ed:AvailableDataViews>` element only references the `<ed:SupportedDataView>` element whose `@id` has a value of `hits`.
     211
     212[=#REF_Example_5]Example 5: {{{#!xml <ed:EndpointDescription xmlns:ed="http://clarin.eu/fcs/endpoint-description" version="2">
     213
     214  <ed:Capabilities>
     215    <ed:Capability> http://clarin.eu/fcs/capability/basic-search </ed:Capability >
     216  </ed:Capabilities > <ed:SupportedDataViews>
     217    <ed:SupportedDataView id="hits" delivery-policy="send-by-default"> application/x-clarin-fcs-hits+xml</ed:SupportedDataView > <ed:SupportedDataView id="cmdi" delivery-policy="need-to-request"> application/x-cmdi+xml</ed:SupportedDataView >
     218  </ed:SupportedDataViews > <ed:Resources>
     219    <!-- top-level resource 1 --> <ed:Resource pid="http://hdl.handle.net/4711/0815">
     220      <ed:Title xml:lang="de"> Goethe Korpus</ed:Title > <ed:Title xml:lang="en"> Goethe corpus</ed:Title > <ed:Description xml:lang="de"> Der Goethe Korpus des IDS Mannheim.</ed:Description > <ed:Description xml:lang="en"> The Goethe corpus of IDS Mannheim.</ed:Description > <ed:LandingPageURI> http://repos.example.org/corpus1.html </ed:LandingPageURI > <ed:Languages>
     221        <ed:Language> deu</ed:Language >
     222      </ed:Languages > <ed:AvailableDataViews ref="hits" />
     223    </ed:Resource > <!-- top-level resource 2 --> <ed:Resource pid="http://hdl.handle.net/4711/0816">
     224      <ed:Title xml:lang="de"> Zeitungskorpus des Mannheimer Morgen</ed:Title > <ed:Title xml:lang="en"> Mannheimer Morgen newspaper corpus</ed:Title > <ed:LandingPageURI> http://repos.example.org/corpus2.html </ed:LandingPageURI > <ed:Languages>
     225        <ed:Language> deu</ed:Language >
     226      </ed:Languages > <ed:AvailableDataViews ref="hits cmdi" /> <ed:Resources>
     227        <!-- sub-resource 1 of top-level resource 2 --> <ed:Resource pid="http://hdl.handle.net/4711/0816-1">
     228          <ed:Title xml:lang="de"> Zeitungskorpus des Mannheimer Morgen (vor 1990)</ed:Title > <ed:Title xml:lang="en"> Mannheimer Morgen newspaper corpus (before 1990)</ed:Title > <ed:LandingPageURI> http://repos.example.org/corpus2.html#sub1 </ed:LandingPageURI > <ed:Languages>
     229            <ed:Language> deu</ed:Language >
     230          </ed:Languages > <ed:AvailableDataViews ref="hits cmdi" />
     231        </ed:Resource > <!-- sub-resource 2 of top-level resource 2 --> <ed:Resource pid="http://hdl.handle.net/4711/0816-2">
     232          <ed:Title xml:lang="de"> Zeitungskorpus des Mannheimer Morgen (nach 1990)</ed:Title > <ed:Title xml:lang="en"> Mannheimer Morgen newspaper corpus (after 1990)</ed:Title > <ed:LandingPageURI> http://repos.example.org/corpus2.html#sub2 </ed:LandingPageURI > <ed:Languages>
     233            <ed:Language> deu</ed:Language >
     234          </ed:Languages > <ed:AvailableDataViews ref="hits cmdi" />
     235        </ed:Resource >
     236      </ed:Resources >
     237    </ed:Resource >
     238  </ed:Resources >
     239
     240</ed:EndpointDescription> }}} The more complex [#REF_Example_5 Example 5] shows an Endpoint Description for an Endpoint that, similar to [#REF_Example_4 Example 4], supports the ''Basic Search'' capability. In addition to the Generic Hits Data View, it also supports the CMDI Data View. The delivery policies are ''send-by-default'' for the Generic Hits Data View and ''need-to-request'' for the CMDI Data View. The Endpoint has two top-level resources (identified by the persistent identifiers `http://hdl.handle.net/4711/0815` and `http://hdl.handle.net/4711/0816`). The second top-level resource has two independently searchable sub-resources, identified by the persistent identifiers `http://hdl.handle.net/4711/0816-1` and `http://hdl.handle.net/4711/0816-2`. All resources are described using several properties, like title, description, etc. The first top-level resource provides only the Generic Hits Data View, while the second top-level resource and its children provide the Generic Hits and the CMDI Data Views.
     241
     242[=#REF_Example_6]Example 6: {{{#!xml <ed:EndpointDescription xmlns:ed="http://clarin.eu/fcs/endpoint-description" version="2">
     243
     244  <ed:Capabilities>
     245    <ed:Capability> http://clarin.eu/fcs/capability/basic-search </ed:Capability > <ed:Capability> http://clarin.eu/fcs/capability/advanced-search </ed:Capability >
     246  </ed:Capabilities > <ed:SupportedDataViews>
     247    <ed:SupportedDataView id="hits" delivery-policy="send-by-default"> application/x-clarin-fcs-hits+xml</ed:SupportedDataView >
     248  </ed:SupportedDataViews > <!-- ADV-FCS --> <SupportedLayers >
     249    <SupportedLayer  id="l1" result-id="http://endpoint.example.org/Layers/orth1 ">orth</SupportedLayer > <SupportedLayer  id="l2" result-id="http://endpoint.example.org/Layers/pos1 " qualifier="x">pos</SupportedLayer > <SupportedLayer  id="l3" result-id="http://endpoint.example.org/Layers/pos2 " qualifier="y"
     250      alt-value-info="STTS tagset" alt-value-info-uri="http://repos.example.org/tagset_doc.html ">pos</SupportedLayer >
     251    <SupportedLayer  id="l4" result-id="http://endpoint.example.org/Layers/word " type="empty">word</SupportedLayer > <SupportedLayer  id="l5" result-id="http://endpoint.example.org/Layers/lemma1 ">lemma</SupportedLayer >
     252  </SupportedLayers >
     253
     254  <ed:Resources>
     255    <!-- just one top-level resource at the Endpoint --> <ed:Resource pid="http://hdl.handle.net/4711/0815">
     256      <ed:Title xml:lang="de"> Goethe Korpus</ed:Title > <ed:Title xml:lang="en"> Goethe corpus</ed:Title > <ed:Description xml:lang="de"> Der Goethe Korpus des IDS Mannheim.</ed:Description > <ed:Description xml:lang="en"> The Goethe corpus of IDS Mannheim.</ed:Description > <ed:LandingPageURI> http://repos.example.org/corpus1.html </ed:LandingPageURI > <ed:Languages>
     257        <ed:Language> deu</ed:Language >
     258      </ed:Languages > <ed:AvailableDataViews ref="hits" /> <AvailableLayers  ref="l1 l2 l3 l4 l5" />
     259    </ed:Resource >
     260  </ed:Resources >
     261
     262</ed:EndpointDescription> }}}
     263
    396264{{{
    397265#!div style="border: 1px solid #000000; font-size: 75%"
    398266TODO: describe the above example
    399267}}}
    400 
    401 == Searching
     268== Searching ==
    402269In the ''Searching'' step the Client performs the actual search request to a previously [#Discovery discovered] Endpoint.
    403270
    404 === Basic Search #basicSearch
     271=== Basic Search === #basicSearch
    405272The ''Basic Search'' capability provides simple full-text search. Queries in Basic Search `MUST` be performed in the ''Contextual Query Language'' ([#REF_CQL OASIS-CQL]). The Endpoint `MUST` support ''term-only''  queries.  The Endpoint `SHOULD` support ''terms'' combined with boolean operator queries (''AND'' and ''OR''), including sub-queries. An Endpoint `MAY` also support ''NOT'' or ''PROX'' operator queries. If an Endpoint does not support a query, i.e. the used operators are not supported by the Endpoint, it `MUST` return an appropriate error message using the appropriate SRU diagnostic ([#REF_LOC_DIAG LOC-DIAG]).
    406273
     
    408275
    409276Examples of valid CQL queries for Basic Search are:
     277
    410278{{{
    411279cat
     
    417285cat AND (mouse OR "lazy dog")
    418286}}}
    419 
    420 '''NOTE''': In CQL, a ''term'' can be a single token or a phrase, i.e. tokens separated by spaces. If a single ''term'' contains spaces, it needs to be quoted. \\
    421 '''NOTE''': Endpoints `MUST` be able to parse all of CQL. If they don't support a certain CQL feature, they `MUST` generate an appropriate error message (see section [#sruCQL SRU/CQL]). In particular, if an Endpoint ''only'' supports ''Basic Search'', it `MUST NOT` silently accept queries that include CQL features besides ''term-only'' and ''terms'' combined with boolean operator queries, i.e. queries involving context sets, etc.
    422 
    423 === Advanced Search
     287'''NOTE''': In CQL, a ''term'' can be a single token or a phrase, i.e. tokens separated by spaces. If a single ''term'' contains spaces, it needs to be quoted. \\ '''NOTE''': Endpoints `MUST` be able to parse all of CQL. If they don't support a certain CQL feature, they `MUST` generate an appropriate error message (see section [#sruCQL SRU/CQL]). In particular, if an Endpoint ''only'' supports ''Basic Search'', it `MUST NOT` silently accept queries that include CQL features besides ''term-only'' and ''terms'' combined with boolean operator queries, i.e. queries involving context sets, etc.
     288
     289=== Advanced Search ===
    424290The ''Advanced Search'' capability allows searching in annotated data that is represented in annotation layers. An annotation ''layer'' contains annotations of a specific type, e.g. lemma or part-of-speech layer. Queries can span several annotation layers.
    425291
    426292CLARIN-FCS defines a set of searchable annotation layers with certain semantics and syntax. Endpoints `SHOULD` support as many annotation layers as possible, depending on the resource type.
    427293
    428 ==== Layers #layers
     294==== Layers ==== #layers
    429295Each Layer is assumed to be ''segmented'', e.g. to allow for searching for a single lemma. However, CLARIN-FCS does not endorse a specific segmentation, i.e. the segmentation of Layers is in the domain of the Endpoint and ''opaque'' to CLARIN-FCS. CLARIN-FCS '''does not''' endorse nor assume a ''formal linguistic relation'' or ''formal linguistic hierarchy'' between two items on two different layers.
    430296
    431 ||=Layer Type Identifier =||=Annotation Layer Description                                =||=Syntax                          =||=Examples (without quotes)           =||
    432 || `text`                 || Textual representation of resource, also the layer that is used in [#basicSearch Basic Search] || ''String''                       || "Dog", "cat", "walking", "better"                ||
    433 || `lemma`                || Lemmatisation                                               || ''String''                       || "good", "walk", "dog"             ||
    434 || `pos`                  || Part-of-Speech annotations                                  || [#REF_UD_POS Universal POS tags] || "NOUN", "VERB", "ADJ"                ||
    435 || `orth`                 || Orthographic transcription of (mostly) spoken resources     || ''String''                       || "dug", "cat", "wolking"              ||
    436 || `norm`                 || Orthographic normalization of (mostly) spoken resources     || ''String''                       || "dog", "cat", "walking", "best"              ||
    437 || `phonetic`             || Phonetic transcription                                      || [#REF_SAMPA SAMPA]               || "'du:", "'vi:-d6 'ha:-b@n"           ||
    438 
    439 The column ''Layer Type Identifier'' denotes the identifier for a layer. It is used in [#fcsQL FCS-QL] queries and the XML serialization for the [#advancedDataView Advanced Data View]. All valid identifiers are defined in the table above; all other identifiers are reserved and `MUST NOT` be used. Clients and Endpoints `MAY` create custom Layer Type Identifiers, e.g. for testing purposes. If they do so, the custom Layer Type Identifiers `MUST` start with the String `x-`, e.g. `x-customLayer`.
    440 The column ''Syntax'' describes the inventory of symbols that a Client `MUST` use with a corresponding annotation layer; the value ''String'' denotes that symbols are arbitrary Unicode Strings, i.e. no fixed inventory of symbols is defined. An Endpoint `SHOULD` provide an appropriate error, if a Client uses an invalid value.
    441 
    442 ==== FCS-QL #fcsQL
     297||=Layer Type Identifier =||=Annotation Layer Description =||=Syntax =||=Examples (without quotes) =||
     298|| `text` || Textual representation of resource, also the layer that is used in [#basicSearch Basic Search] || ''String'' || "Dog", "cat", "walking", "better" ||
     299|| `lemma` || Lemmatisation || ''String'' || "good", "walk", "dog" ||
     300|| `pos` || Part-of-Speech annotations || [#REF_UD_POS Universal POS tags] || "NOUN", "VERB", "ADJ" ||
     301|| `orth` || Orthographic transcription of (mostly) spoken resources || ''String'' || "dug", "cat", "wolking" ||
     302|| `norm` || Orthographic normalization of (mostly) spoken resources || ''String'' || "dog", "cat", "walking", "best" ||
     303|| `phonetic` || Phonetic transcription || [#REF_SAMPA SAMPA] || "'du:", "'vi:-d6 'ha:-b@n" ||
     304
     305The column ''Layer Type Identifier'' denotes the identifier for a layer. It is used in [#fcsQL FCS-QL] queries and the XML serialization for the [#advancedDataView Advanced Data View]. All valid identifiers are defined in the table above; all other identifiers are reserved and `MUST NOT` be used. Clients and Endpoints `MAY` create custom Layer Type Identifiers, e.g. for testing purposes. If they do so, the custom Layer Type Identifiers `MUST` start with the String `x-`, e.g. `x-customLayer`. The column ''Syntax'' describes the inventory of symbols that a Client `MUST` use with a corresponding annotation layer; the value ''String'' denotes that symbols are arbitrary Unicode Strings, i.e. no fixed inventory of symbols is defined. An Endpoint `SHOULD` provide an appropriate error if a Client uses an invalid value.
     306
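The identifier rule above can be sketched as a small check. This is only an illustration, not part of the specification: the function name `is_valid_layer_type` and the constant `RESERVED_LAYER_TYPES` are hypothetical, and the set of identifiers is copied from the table above.

```python
# Layer Type Identifiers defined in the table above (assumed to be the complete set).
RESERVED_LAYER_TYPES = frozenset({"text", "lemma", "pos", "orth", "norm", "phonetic"})


def is_valid_layer_type(identifier: str) -> bool:
    """Return True for a defined identifier or a custom one starting with 'x-'."""
    if identifier in RESERVED_LAYER_TYPES:
        return True
    # Custom identifiers MUST start with "x-"; requiring a non-empty
    # remainder after the prefix is an extra assumption of this sketch.
    return identifier.startswith("x-") and len(identifier) > 2


print(is_valid_layer_type("pos"))            # defined identifier
print(is_valid_layer_type("x-customLayer"))  # custom identifier with prefix
print(is_valid_layer_type("customLayer"))    # custom identifier without prefix
```

An Endpoint could run such a check while parsing a query and report an error for identifiers that fail it.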
     307==== FCS-QL ==== #fcsQL
    443308{{{
    444309#!div style="border: 1px solid #000000; font-size: 75%"
     
    450315
    451316Examples of valid FCS-QL queries for ''Advanced Search'' are:
     317
    452318{{{
    453319"walking"
     
    463329[z:pos = "ADJ" & q:pos = "ADJ"]
    464330}}}
    465 
    466 The qualifiers ''z'' in ''z:pos'' and ''q'' in ''q:pos'' `SHOULD` match an available qualifier attribute value in a ''pos''-{{{SupportedLayer}}} in a discovered ''EndpointDescription''.
    467 
    468 
    469 '''NOTE''': Endpoints supporting ''Advanced Search'' `MUST` be able to parse all of FCS-QL. If they don't support a certain FCS-QL feature, they `MUST` generate an appropriate error message (see section [#sruCQL SRU/CQL]). If an Endpoint ''only'' supports ''Basic Search'', it `MUST NOT` silently accept queries that include FCS-QL features.\\
    470 '''NOTE''': FCS-QL layer identifiers are reserved. The Endpoint `MUST` prepend the local prefix {{{x-}}} to any identifier used outside of the reserved set, e.g., {{{x-customLayer}}} for a local identifier {{{customLayer}}}.
    471 
    472 
    473 === Result Format
     331The qualifiers ''z'' in ''z:pos'' and ''q'' in ''q:pos'' `SHOULD` match an available qualifier attribute value in a ''pos''-`SupportedLayer` in a discovered ''EndpointDescription''.
     332
     333'''NOTE''': Endpoints supporting ''Advanced Search'' `MUST` be able to parse all of FCS-QL. If they don't support a certain FCS-QL feature, they `MUST` generate an appropriate error message (see section [#sruCQL SRU/CQL]). If an Endpoint ''only'' supports ''Basic Search'', it `MUST NOT` silently accept queries that include FCS-QL features.\\ '''NOTE''': FCS-QL layer identifiers are reserved. The Endpoint `MUST` prepend the local prefix `x-` to any identifier used outside of the reserved set, e.g., `x-customLayer` for a local identifier `customLayer`.
     334
     335=== Result Format ===
    474336{{{
    475337#!div style="border: 1px solid #000000; font-size: 75%"
     
    480342CLARIN-FCS uses a customized format for returning results. ''Resource'' and ''Resource Fragments'' serve as containers for hit results, which are presented in one or more ''Data Views''. The following section describes the Resource format and Data View format, and section [#searchRetrieve Operation ''searchRetrieve''] describes how hits are embedded within SRU responses.
    481343
    482 ==== Resource and !ResourceFragment
     344==== Resource and !ResourceFragment ====
    483345To encode search results, CLARIN-FCS supports two building blocks:
    484  Resources::
    485    A ''Resource'' is a ''searchable'' and ''addressable'' entity at the Endpoint, such as a text corpus or a multi-modal corpus. A resource `SHOULD` be a self-contained unit, i.e. not a single sentence in a text corpus or a time interval in an audio transcription, but rather a complete document from a text corpus or a complete audio transcription.
    486  Resource Fragments::
    487    A ''Resource Fragment'' is a smaller unit in a ''Resource'', e.g. a sentence in a text corpus or a time interval in an audio transcription.
     346
     347 Resources:: A ''Resource'' is a ''searchable'' and ''addressable'' entity at the Endpoint, such as a text corpus or a multi-modal corpus. A resource `SHOULD` be a self-contained unit, i.e. not a single sentence in a text corpus or a time interval in an audio transcription, but rather a complete document from a text corpus or a complete audio transcription.
     348 Resource Fragments:: A ''Resource Fragment'' is a smaller unit in a ''Resource'', e.g. a sentence in a text corpus or a time interval in an audio transcription.
    488349
    489350A Resource `SHOULD` be the most precise unit of data that is directly addressable as a "whole". A Resource `SHOULD` contain a Resource Fragment, if the hit consists of just a part of the Resource unit (for example if the hit is a sentence within a large text). A Resource Fragment `SHOULD` be addressable within a resource, i.e. it has an offset or a resource-internal identifier. Using Resource Fragments is `OPTIONAL`, but Endpoints are encouraged to use them. If the Endpoint encodes a hit with a Resource Fragment, the actual hit `SHOULD` be encoded as a Data View within the Resource Fragment.
     
    501362Endpoints `MAY` serialize hits as multiple Data Views, however they `MUST` provide the Generic Hits (HITS) Data View either encoded as a Resource Fragment (if applicable), or otherwise within the Resource (if there is no reasonable Resource Fragment). Other Data Views `SHOULD` be put in a place that is logical for their content (as is to be determined by the Endpoint), e.g. a metadata Data View would most likely be put directly below Resource and a Data View representing some annotation layers directly around the hit is more likely to belong within a Resource Fragment.
    502363
    503 [=#REF_Example_1]Example 1:
    504 {{{#!xml
    505 <fcs:Resource xmlns:fcs="http://clarin.eu/fcs/resource" pid="http://hdl.handle.net/4711/08-15">
     364[=#REF_Example_1]Example 1: {{{#!xml <fcs:Resource xmlns:fcs="http://clarin.eu/fcs/resource" pid="http://hdl.handle.net/4711/08-15">
     365
    506366  <fcs:DataView type="application/x-clarin-fcs-hits+xml">
    507       <!-- data view payload omitted -->
    508   </fcs:DataView>
    509 </fcs:Resource>
    510 }}}
    511 [#REF_Example_1 Example 1] shows a simple hit, which is encoded in one Data View of type ''Generic Hits'' embedded within a Resource. The type of the Data View is identified by the MIME type `application/x-clarin-fcs-hits+xml`. The Resource is referenceable by the persistent identifier `http://hdl.handle.net/4711/08-15`.
    512 
    513 [=#REF_Example_2]Example 2:
    514 {{{#!xml
    515 <fcs:Resource xmlns:fcs="http://clarin.eu/fcs/resource" pid="http://hdl.handle.net/4711/08-15">
     367    <!-- data view payload omitted -->
     368  </fcs:DataView >
     369
     370</fcs:Resource> }}} [#REF_Example_1 Example 1] shows a simple hit, which is encoded in one Data View of type ''Generic Hits'' embedded within a Resource. The type of the Data View is identified by the MIME type `application/x-clarin-fcs-hits+xml`. The Resource is referenceable by the persistent identifier `http://hdl.handle.net/4711/08-15`.
     371
     372[=#REF_Example_2]Example 2: {{{#!xml <fcs:Resource xmlns:fcs="http://clarin.eu/fcs/resource" pid="http://hdl.handle.net/4711/08-15">
     373
    516374  <fcs:ResourceFragment>
    517375    <fcs:DataView type="application/x-clarin-fcs-hits+xml">
    518376      <!-- data view payload omitted -->
    519     </fcs:DataView>
    520   </fcs:ResourceFragment>
    521 </fcs:Resource>
    522 }}}
    523 [#REF_Example_2 Example 2] shows a hit encoded as a Resource Fragment embedded within a Resource. The actual hit is again encoded as one Data View of type ''Generic Hits''. The hit is not directly referenceable, but the Resource in which the hit occurred is referenceable by the persistent identifier `http://hdl.handle.net/4711/08-15`. In contrast to [#REF_Example_1 Example 1], the Endpoint decided to provide a "semantically richer" encoding and embedded the hit using a Resource Fragment within the Resource to indicate that the hit is a part of a larger resource, e.g. a sentence in a text document.
    524 
    525 [=#REF_Example_3]Example 3:
    526 {{{#!xml
    527 <fcs:Resource xmlns:fcs="http://clarin.eu/fcs/resource"
    528               pid="http://hdl.handle.net/4711/08-15" ref="http://repos.example.org/file/text_08_15.html">
    529   <fcs:DataView type="application/x-cmdi+xml"
    530                 pid="http://hdl.handle.net/4711/08-15-1" ref="http://repos.example.org/file/08_15_1.cmdi">
    531       <!-- data view payload omitted -->
    532   </fcs:DataView>
    533   <fcs:ResourceFragment pid="http://hdl.handle.net/4711/08-15-2" ref="http://repos.example.org/file/text_08_15.html#sentence2">
     377    </fcs:DataView >
     378  </fcs:ResourceFragment >
     379
     380</fcs:Resource> }}} [#REF_Example_2 Example 2] shows a hit encoded as a Resource Fragment embedded within a Resource. The actual hit is again encoded as one Data View of type ''Generic Hits''. The hit is not directly referenceable, but the Resource in which the hit occurred is referenceable by the persistent identifier `http://hdl.handle.net/4711/08-15`. In contrast to [#REF_Example_1 Example 1], the Endpoint decided to provide a "semantically richer" encoding and embedded the hit using a Resource Fragment within the Resource to indicate that the hit is a part of a larger resource, e.g. a sentence in a text document.
     381
     382[=#REF_Example_3]Example 3: {{{#!xml <fcs:Resource xmlns:fcs="http://clarin.eu/fcs/resource"
     383
     384  pid="http://hdl.handle.net/4711/08-15" ref="http://repos.example.org/file/text_08_15.html">
     385
     386<fcs:DataView type="application/x-cmdi+xml"
     387
     388  pid="http://hdl.handle.net/4711/08-15-1" ref="http://repos.example.org/file/08_15_1.cmdi">
     389
     390<!-- data view payload omitted -->
     391
     392  </fcs:DataView > <fcs:ResourceFragment pid="http://hdl.handle.net/4711/08-15-2" ref="http://repos.example.org/file/text_08_15.html#sentence2">
    534393    <fcs:DataView type="application/x-clarin-fcs-hits+xml">
    535394      <!-- data view payload omitted -->
    536     </fcs:DataView>
    537   </fcs:ResourceFragment>
    538 </fcs:Resource>
    539 }}}
    540 The more complex [#REF_Example_3 Example 3] is similar to [#REF_Example_2 Example 2], i.e. it shows a hit encoded as one ''Generic Hits'' Data View in a Resource Fragment, which is embedded in a Resource. In contrast to Example 2, another Data View of type ''CMDI'' is embedded directly within the Resource. The Endpoint can use this type of Data View to directly provide CMDI metadata about the Resource to Clients.
    541 All entities of the Hit can be referenced by a persistent identifier and a URI. The complete Resource is referenceable by either the persistent identifier `http://hdl.handle.net/4711/08-15` or the URI `http://repos.example.org/file/text_08_15.html` and the CMDI metadata record in the CMDI Data View is referenceable either by the persistent identifier `http://hdl.handle.net/4711/08-15-1` or the URI `http://repos.example.org/file/08_15_1.cmdi`. The actual hit in the Resource Fragment is also directly referenceable by either the persistent identifier `http://hdl.handle.net/4711/08-15-2` or the URI `http://repos.example.org/file/text_08_15.html#sentence2`.
    542 
    543 ==== Data View
     395    </fcs:DataView >
     396  </fcs:ResourceFragment >
     397
     398</fcs:Resource> }}} The more complex [#REF_Example_3 Example 3] is similar to [#REF_Example_2 Example 2], i.e. it shows a hit encoded as one ''Generic Hits'' Data View in a Resource Fragment, which is embedded in a Resource. In contrast to Example 2, another Data View of type ''CMDI'' is embedded directly within the Resource. The Endpoint can use this type of Data View to directly provide CMDI metadata about the Resource to Clients. All entities of the Hit can be referenced by a persistent identifier and a URI. The complete Resource is referenceable by either the persistent identifier `http://hdl.handle.net/4711/08-15` or the URI `http://repos.example.org/file/text_08_15.html` and the CMDI metadata record in the CMDI Data View is referenceable either by the persistent identifier `http://hdl.handle.net/4711/08-15-1` or the URI `http://repos.example.org/file/08_15_1.cmdi`. The actual hit in the Resource Fragment is also directly referenceable by either the persistent identifier `http://hdl.handle.net/4711/08-15-2` or the URI `http://repos.example.org/file/text_08_15.html#sentence2`.
     399
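The nesting shown in the three examples can be walked with a few lines of Python. The following is a minimal sketch using only the standard library; the helper name `list_data_views` is hypothetical, and the XML is Example 3 with the omitted payload comments removed.

```python
import xml.etree.ElementTree as ET

FCS_NS = {"fcs": "http://clarin.eu/fcs/resource"}

# Example 3 from above, with the omitted payloads left empty.
EXAMPLE_3 = """<fcs:Resource xmlns:fcs="http://clarin.eu/fcs/resource"
    pid="http://hdl.handle.net/4711/08-15" ref="http://repos.example.org/file/text_08_15.html">
  <fcs:DataView type="application/x-cmdi+xml"
      pid="http://hdl.handle.net/4711/08-15-1" ref="http://repos.example.org/file/08_15_1.cmdi">
  </fcs:DataView>
  <fcs:ResourceFragment pid="http://hdl.handle.net/4711/08-15-2"
      ref="http://repos.example.org/file/text_08_15.html#sentence2">
    <fcs:DataView type="application/x-clarin-fcs-hits+xml">
    </fcs:DataView>
  </fcs:ResourceFragment>
</fcs:Resource>"""


def list_data_views(resource_xml):
    """Return (container, container pid, Data View MIME type) for each Data View."""
    resource = ET.fromstring(resource_xml)
    views = []
    # Data Views placed directly below the Resource, e.g. metadata.
    for view in resource.findall("fcs:DataView", FCS_NS):
        views.append(("Resource", resource.get("pid"), view.get("type")))
    # Data Views placed inside a Resource Fragment, e.g. the actual hit.
    for fragment in resource.findall("fcs:ResourceFragment", FCS_NS):
        for view in fragment.findall("fcs:DataView", FCS_NS):
            views.append(("ResourceFragment", fragment.get("pid"), view.get("type")))
    return views


for container, pid, mime_type in list_data_views(EXAMPLE_3):
    print(container, pid, mime_type)
```

A Client could use the container information to decide whether a Data View describes the whole Resource or only the part that matched.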
     400==== Data View ====
    544401A ''Data View'' serves as a container for encoding the actual search results (the data fragments relevant to search) within CLARIN-FCS. Data Views are designed to allow for different representations of results, i.e. they are deliberately kept open to allow further extensions with more supported Data View formats. This specification only defines a ''most basic'' Data View for representing search results, called ''Generic Hits'' (see below). More Data Views are defined in the supplementary specification [#REF_FCS_DataViews CLARIN-FCS-DataViews].
    545402
     
    556413'''NOTE''': The examples in the following sections ''show only'' the payload with the enclosing `<fcs:DataView>` element of a Data View. Of course, the Data View must be embedded either in a `<fcs:Resource>` or a `<fcs:ResourceFragment>` element. The `@pid` and `@ref` attributes have been omitted for all ''inline'' payload types.
    557414
    558 ===== Generic Hits (HITS)
    559 ||=Description                  =|| The representation of the hit ||
    560 ||=MIME type                    =|| `application/x-clarin-fcs-hits+xml` ||
    561 ||=Payload Disposition          =|| ''inline'' ||
    562 ||=Payload Delivery             =|| ''send-by-default'' (`REQUIRED`) ||
     415===== Generic Hits (HITS) =====
     416||=Description =|| The representation of the hit ||
     417||=MIME type =|| `application/x-clarin-fcs-hits+xml` ||
     418||=Payload Disposition =|| ''inline'' ||
     419||=Payload Delivery =|| ''send-by-default'' (`REQUIRED`) ||
    563420||=Recommended Short Identifier =|| `hits` (`RECOMMENDED`) ||
    564 ||=XML Schema                   =|| [source:FederatedSearch/schema/Core_2/DataView-Hits.xsd DataView-Hits.xsd] ([source:FederatedSearch/schema/Core_2/DataView-Hits.xsd?format=txt download]) ||
     421||=XML Schema =|| [source:FederatedSearch/schema/Core_2/DataView-Hits.xsd DataView-Hits.xsd] ([source:FederatedSearch/schema/Core_2/DataView-Hits.xsd?format=txt download]) ||
     422
    565423The ''Generic Hits'' Data View serves as the ''most basic'' agreement in CLARIN-FCS for serialization of search results and `MUST` be implemented by all Endpoints. In many cases, this Data View can only serve as a (lossy) approximation, because resources at Endpoints are very heterogeneous. For instance, the Generic Hits Data View is probably not the best representation for a hit result in a corpus of spoken language, but an architecture like CLARIN-FCS requires one common representation to be implemented by all Endpoints, therefore this Data View was defined. The Generic Hits Data View supports multiple markers for supplying highlighting for an individual hit, e.g. if a query contains a (boolean) conjunction, the Endpoint can use multiple markers to provide individual highlights for the matching terms. An Endpoint `MUST NOT` use this Data View to aggregate several hits within one resource. Each hit `SHOULD` be presented within the context of a complete sentence. If that is not possible due to the nature of the type of the resource, the Endpoint `MUST` provide an equivalent reasonable unit of context (e.g. within a phrase of an orthographic transcription of an utterance). The `<hits:Hit>` element within the `<hits:Result>` element is not enforced by the XML schema, but Endpoints are `RECOMMENDED` to use it. The XML fragment of the Generic Hits payload `MUST` be valid according to the XML schema "[source:FederatedSearch/schema/Core_2/DataView-Hits.xsd DataView-Hits.xsd]" ([source:FederatedSearch/schema/Core_2/DataView-Hits.xsd?format=txt download]).
     424
    566425 * Example (single hit marker):
    567 {{{#!xml
    568 <!-- potential @pid and @ref attributes omitted -->
    569 <fcs:DataView type="application/x-clarin-fcs-hits+xml">
     426
     427{{{#!xml <!-- potential @pid and @ref attributes omitted --> <fcs:DataView type="application/x-clarin-fcs-hits+xml">
     428
    570429  <hits:Result xmlns:hits="http://clarin.eu/fcs/dataview/hits">
    571     The quick brown <hits:Hit>fox</hits:Hit> jumps over the lazy dog.
    572   </hits:Result>
    573 </fcs:DataView>
    574 }}}
     430    The quick brown <hits:Hit>fox</hits:Hit> jumps over the lazy dog.
     431  </hits:Result >
     432
     433</fcs:DataView> }}}
     434
    575435 * Example (multiple hit markers):
    576 {{{#!xml
    577 <!-- potential @pid and @ref attributes omitted -->
    578 <fcs:DataView type="application/x-clarin-fcs-hits+xml">
     436
     437{{{#!xml <!-- potential @pid and @ref attributes omitted --> <fcs:DataView type="application/x-clarin-fcs-hits+xml">
     438
    579439  <hits:Result xmlns:hits="http://clarin.eu/fcs/dataview/hits">
    580     The quick brown <hits:Hit>fox</hits:Hit> jumps over the lazy <hits:Hit>dog</hits:Hit>.
    581   </hits:Result>
    582 </fcs:DataView>
    583 }}}
    584 
    585 ===== Advanced (ADV) #advancedDataView
    586 ||=Description                  =|| The representation of the hit for Advanced Search ||
    587 ||=MIME type                    =|| `application/x-clarin-fcs-adv+xml` ||
    588 ||=Payload Disposition          =|| ''inline'' ||
    589 ||=Payload Delivery             =|| ''send-by-default'' (`REQUIRED`) ||
     440    The quick brown <hits:Hit>fox</hits:Hit> jumps over the lazy <hits:Hit>dog</hits:Hit>.
     441  </hits:Result >
     442
     443</fcs:DataView> }}}
     444
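As an illustration of how a Client might read a Generic Hits payload, the sketch below separates the context sentence from the highlighted terms. It uses only the Python standard library; the helper name `split_hits` is hypothetical, and the payload is the multiple-marker example above.

```python
import xml.etree.ElementTree as ET

HITS_NS = "http://clarin.eu/fcs/dataview/hits"

# Payload of the "multiple hit markers" example above.
PAYLOAD = """<hits:Result xmlns:hits="http://clarin.eu/fcs/dataview/hits">
    The quick brown <hits:Hit>fox</hits:Hit> jumps over the lazy <hits:Hit>dog</hits:Hit>.
  </hits:Result>"""


def split_hits(payload_xml):
    """Return the whitespace-normalized context and the list of highlighted strings."""
    result = ET.fromstring(payload_xml)
    # itertext() yields the text inside and between the <hits:Hit> elements.
    context = " ".join("".join(result.itertext()).split())
    hits = [hit.text or "" for hit in result.iter(f"{{{HITS_NS}}}Hit")]
    return context, hits


context, hits = split_hits(PAYLOAD)
print(context)
print(hits)
```

Normalizing whitespace is a choice made for display in this sketch; the specification text above does not prescribe it.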
     445===== Advanced (ADV) ===== #advancedDataView
     446||=Description =|| The representation of the hit for Advanced Search ||
     447||=MIME type =|| `application/x-clarin-fcs-adv+xml` ||
     448||=Payload Disposition =|| ''inline'' ||
     449||=Payload Delivery =|| ''send-by-default'' (`REQUIRED`) ||
    590450||=Recommended Short Identifier =|| `adv` (`RECOMMENDED`) ||
    591 ||=XML Schema                   =|| [source:FederatedSearch/schema/Core_2/DataView-Advanced.xsd DataView-Advanced.xsd] ([source:FederatedSearch/schema/Core_2/DataView-Advanced.xsd?format=txt download]) ||
     451||=XML Schema =|| [source:FederatedSearch/schema/Core_2/DataView-Advanced.xsd DataView-Advanced.xsd] ([source:FederatedSearch/schema/Core_2/DataView-Advanced.xsd?format=txt download]) ||
    592452
    593453{{{
     
    595455TODO: describe!
    596456}}}
    597 
    598  - ADV Data View allows returning structured information for Advanced Search queries
    599  - organized in one or more annotation layers
    600  - annotation layer := annotations of a specific type, e.g. part-of-speech or orthographic transcription
    601  - annotations of two different annotation layers may freely overlap; no self-overlap in an annotation layer
    602  - Data View serialization in a stand-off like format, i.e. annotations are ranges over the signal (= language resource as character, token or audio stream) and are denoted by start and end offsets
    603  - layers alignable (through their offsets) and referable (through their layer identifier)
    604  - ADV Data View serialization:
    605    - a list of segments (= "inventory" of all ranges used to describe annotations)
    606      - units can be "items" (= offsets in character or token-stream) or "timestamp" (timestamps in audio-stream), timestamps may have a resolution of up to 1/1000 second.
    607        - endpoints are responsible for choosing proper offsets for segments. they must do so in a consistent manner, i.e. in a single result (= ADV Data View instance) the chosen offsets must allow for aligning the segments of different layers. a recommendation for character streams: character := Unicode codepoint, normalized to Unicode Normalization Form KC (NFKC; Compatibility Decomposition, followed by Canonical Composition)
    608      - segments may also have an endpoint-specific reference (= URI); it can be shown in the Aggregator and, if the user clicks the link, can open a viewer (e.g. audio player) at the endpoint
    609    - a list of layers, each has a type (e.g. "pos", "lemma", see Layer Type Identifier in section Layers above) and a layer identifier (= URI)
    610      - a layer consists of one or more Spans. A Span references a segment (and thus inherits the start and end offsets) and contains the actual annotation (e.g. the part-of-speech label) in its content; MAY also contain alt-value (e.g. original annotation value)
    611      - document order of layer elements defines the view order in the Aggregator
    612      - endpoints should return at least all layers that were referenced in the query; they may return more
    613    - Hit Markers are added by marking Spans as hits (add `@highlight` attribute); multiple hit markers are supported and the Aggregator may display them visually distinct
    614      - where to add hit markers is up to the endpoint; generally "things" that were referenced in the query should be marked.
    615 
     457 * ADV Data View allows returning structured information for Advanced Search queries
     458 * organized in one or more annotation layers
     459 * annotation layer := annotations of a specific type, e.g. part-of-speech or orthographic transcription
     460 * annotations of two different annotation layers may freely overlap; no self-overlap in an annotation layer
     461 * Data View serialization in a stand-off like format, i.e. annotations are ranges over the signal (= language resource as character, token or audio stream) and are denoted by start and end offsets
     462 * layers alignable (through their offsets) and referable (through their layer identifier)
     463 * ADV Data View serialization:
     464   * a list of segments (= "inventory" of all ranges used to describe annotations)
     465     * units can be "items" (= offsets in character or token-stream) or "timestamp" (timestamps in audio-stream), timestamps may have a resolution of up to 1/1000 second.
     466       * endpoints are responsible for choosing proper offsets for segments. they must do so in a consistent manner, i.e. in a single result (= ADV Data View instance) the chosen offsets must allow for aligning the segments of different layers. a recommendation for character streams: character := Unicode codepoint, normalized to Unicode Normalization Form KC (NFKC; Compatibility Decomposition, followed by Canonical Composition)
     467     * segments may also have an endpoint-specific reference (= URI); it can be shown in the Aggregator and, if the user clicks the link, can open a viewer (e.g. audio player) at the endpoint
     468   * a list of layers, each has a type (e.g. "pos", "lemma", see Layer Type Identifier in section Layers above) and a layer identifier (= URI)
     469     * a layer consists of one or more Spans. A Span references a segment (and thus inherits the start and end offsets) and contains the actual annotation (e.g. the part-of-speech label) in its content; MAY also contain alt-value (e.g. original annotation value)
     470     * document order of layer elements defines the view order in the Aggregator
     471     * endpoints should return at least all layers that were referenced in the query; they may return more
     472   * Hit Markers are added by marking Spans as hits (add `@highlight` attribute); multiple hit markers are supported and the Aggregator may display them visually distinct
     473     * where to add hit markers is up to the endpoint; generally "things" that were referenced in the query should be marked.
    616474
    617475Example: a sentence interpreted as a character stream
    618 ||=Data   =|| t || || d || a || || ' || s ||   || d || e ||   || e || n || i || g || e ||   || e || c || h || t || e ||   || h || o || o || p ||   || v || o || o || r ||   || o || n || s ||   || m || e || n || s || e || n ||
     476
     477||=Data =|| t || || d || a || || ' || s || || d || e || || e || n || i || g || e || || e || c || h || t || e || || h || o || o || p || || v || o || o || r || || o || n || s || || m || e || n || s || e || n ||
    619478||=Offset =|| 1 || 2 || 3 || 4 || 5 || 6 || 7 || 8 || 9 || 10 || 11 || 12 || 13 || 14 || 15 || 16 || 17 || 18 || 19 || 20 || 21 || 22 || 23 || 24 || 25 || 26 || 27 || 28 || 29 || 30 || 31 || 32 || 33 || 34 || 35 || 36 || 37 || 38 || 39 || 40 || 41 || 42 || 43 ||
    620479
    621480Example: several annotation layers for the sentence
    622 ||=Offset (Start, End) =|| 1,1  || 3,4    || 6,7    || 9,10   || 12,16    || 18,22    || 24,27   || 29,32   || 34,36   || 38,43  ||
    623 ||=Layer ''orth''      =|| t    || da     || 's     || de     || enige    || echte    || hoop    || voor    || ons     || mensen ||
    624 ||=Layer ''pos''       =|| X    || PRON   || VERB   || DET    || DET      || ADJ      || NOUN    || ADP     || PRON    || NOUN   ||
    625 ||=Layer ''lemma''     =|| _    || dat    || zijn   || de     || enig     || echt     || hoop    || voor    || ons     || mens   ||
    626 ||=Layer ''phonetic''  =|| t@   || dAz    || dAz    || d@     || en@G@    || Ext@     || hop     || for     || Ons     || mEns@  ||
    627 
    628 Example: XML serialization
    629 {{{#!xml
    630 <Advanced>
    631     <Segments unit="items">
    632         <Segment id="s1"  start="1"  end="1"
    633             ref="http://hdl.handle.net/4711/123456789?urlappend=%3Fplay=0:173"/>
    634         <Segment id="s2"  start="3"  end="4"
    635             ref="http://hdl.handle.net/4711/123456789?urlappend=%3Fplay=173:304"/>
    636         <Segment id="s3"  start="6"  end="7"
    637             ref="http://hdl.handle.net/4711/123456789?urlappend=%3Fplay=173:304"/>
    638         <Segment id="s4"  start="9"  end="10"
    639             ref="http://hdl.handle.net/4711/123456789?urlappend=%3Fplay=304:480"/>
    640         <Segment id="s5"  start="12" end="16"
    641             ref="http://hdl.handle.net/4711/123456789?urlappend=%3Fplay=480:1119"/>
    642         <Segment id="s6"  start="18" end="22"
    643             ref="http://hdl.handle.net/4711/123456789?urlappend=%3Fplay=1339:1901"/>
    644         <Segment id="s7"  start="24" end="27"
    645             ref="http://hdl.handle.net/4711/123456789?urlappend=%3Fplay=1901:2427"/>
    646         <Segment id="s8"  start="29" end="32"
    647             ref="http://hdl.handle.net/4711/123456789?urlappend=%3Fplay=3084:3493"/>
    648         <Segment id="s9"  start="34" end="36"
    649             ref="http://hdl.handle.net/4711/123456789?urlappend=%3Fplay=3493:3754"/>
    650         <Segment id="s10" start="38" end="43"
    651             ref="http://hdl.handle.net/4711/123456789?urlappend=%3Fplay=3754:4274"/>
    652     </Segments>
    653 
    654     <Layers>
    655         <Layer id="http://endpoint.example.org/Layers/orth1">
    656             <Span ref="s1">t</Span>
    657             <Span ref="s2">da</Span>
    658             <Span ref="s3">'s</Span>
    659             <Span ref="s4">de</Span>
    660             <Span ref="s5">enige</Span>
    661             <Span ref="s6">echte</Span>
    662             <Span ref="s7">hoop</Span>
    663             <Span ref="s8">voor</Span>
    664             <Span ref="s9">ons</Span>
    665             <Span ref="s10">mensen</Span>
    666         </Layer>
    667 
    668         <Layer id="http://endpoint.example.org/Layers/pos1">
    669             <Span ref="s1"  alt-value="SPEC(afgebr)">X</Span>
    670             <Span ref="s2"  alt-value="VNW(aanw,pron,stan,vol,3o,ev)">PRON</Span>
    671             <Span ref="s3"  alt-value="WW(pv,tgw,ev)">VERB</Span>
    672             <Span ref="s4"  alt-value="LID(bep,stan,rest)">DET</Span>
    673             <Span ref="s5"  alt-value="VNW(onbep,det,stan,prenom,met-e,rest)">DET</Span>
    674             <Span ref="s6"  alt-value="ADJ(prenom,basis,met-e,stan)">ADJ</Span>
    675             <Span ref="s7"  alt-value="N(soort,ev,basis,zijd,stan)">NOUN</Span>
    676             <Span ref="s8"  alt-value="VZ(init)">ADP</Span>
    677             <Span ref="s9"  alt-value="VNW(pr,pron,obl,vol,1,mv)">PRON</Span>
    678             <Span ref="s10" alt-value="N(soort,mv,basis)">NOUN</Span>
    679         </Layer>
    680 
    681         <Layer id="http://endpoint.example.org/Layers/lemma1">
    682             <Span ref="s1">_</Span>
    683             <Span ref="s2">dat</Span>
    684             <Span ref="s3">zijn</Span>
    685             <Span ref="s4" >de</Span>
    686             <Span ref="s5">enig</Span>
    687             <Span ref="s6" highlight="h1">echt</Span>
    688             <Span ref="s7" highlight="h1">hoop</Span>
    689             <Span ref="s8">voor</Span>
    690             <Span ref="s9">ons</Span>
    691             <Span ref="s10">mens</Span>
    692         </Layer>
    693 
    694         <Layer id="http://endpoint.example.org/Layers/phon">
    695             <Span ref="s1">t@</Span>
    696             <Span ref="s2" highlight="h2">dAz</Span>
    697             <Span ref="s3">dAz</Span>
    698             <Span ref="s4">d@</Span>
    699             <Span ref="s5">en@G@</Span>
    700             <Span ref="s6">Ext@</Span>
    701             <Span ref="s7">hop</Span>
    702             <Span ref="s8">for</Span>
    703             <Span ref="s9">Ons</Span>
    704             <Span ref="s10">mEns@</Span>
    705         </Layer>
    706     </Layers>
    707 </Advanced>
    708 }}}
    709 
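To illustrate the stand-off design, the sketch below resolves each Span to the offsets of the Segment it references and keeps the highlight marker. It is only a sketch under assumptions: the helper name `resolve_spans` is hypothetical, the snippet is an abridged copy of the serialization example above (two segments, one layer, `ref` attributes of segments dropped), and integer offsets are assumed because the segments use `unit="items"`.

```python
import xml.etree.ElementTree as ET

# Abridged from the XML serialization example above.
ADV_SNIPPET = """<Advanced>
  <Segments unit="items">
    <Segment id="s6" start="18" end="22"/>
    <Segment id="s7" start="24" end="27"/>
  </Segments>
  <Layers>
    <Layer id="http://endpoint.example.org/Layers/lemma1">
      <Span ref="s6" highlight="h1">echt</Span>
      <Span ref="s7" highlight="h1">hoop</Span>
    </Layer>
  </Layers>
</Advanced>"""


def resolve_spans(adv_xml):
    """Return (layer id, start, end, annotation, highlight) for every Span."""
    root = ET.fromstring(adv_xml)
    # Inventory of segments: id -> (start, end); integer offsets assumed.
    segments = {
        seg.get("id"): (int(seg.get("start")), int(seg.get("end")))
        for seg in root.iter("Segment")
    }
    rows = []
    for layer in root.iter("Layer"):
        for span in layer.iter("Span"):
            start, end = segments[span.get("ref")]  # Span inherits the offsets
            rows.append((layer.get("id"), start, end, span.text, span.get("highlight")))
    return rows


for row in resolve_spans(ADV_SNIPPET):
    print(row)
```

Because every layer points into the same segment inventory, rows from different layers that share a start and end offset can be aligned in a display.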
    710 === Versioning and Extensions
    711 ==== Backwards Compatibility #backwardsCompatibility
     481
     482||=Offset (Start, End) =|| 1,1 || 3,4 || 6,7 || 9,10 || 12,16 || 18,22 || 24,27 || 29,32 || 34,36 || 38,43 ||
     483||=Layer ''orth'' =|| t || da || 's || de || enige || echte || hoop || voor || ons || mensen ||
     484||=Layer ''pos'' =|| X || PRON || VERB || DET || DET || ADJ || NOUN || ADP || PRON || NOUN ||
     485||=Layer ''lemma'' =|| _ || dat || zijn || de || enig || echt || hoop || voor || ons || mens ||
     486||=Layer ''phonetic'' =|| t@ || dAz || dAz || d@ || en@G@ || Ext@ || hop || for || Ons || mEns@ ||
     487
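The offsets in the tables above can be reproduced with a short sketch. It assumes 1-based, inclusive character offsets (as in the tables), NFKC normalization as recommended for character streams, and whitespace tokenization, which is an assumption of this sketch and not something the specification requires. The function name `token_offsets` is hypothetical.

```python
import re
import unicodedata


def token_offsets(sentence):
    """Return 1-based, inclusive (start, end) character offsets of each token."""
    # Normalize first so that all layers are aligned against the same stream.
    normalized = unicodedata.normalize("NFKC", sentence)
    # match.start() is 0-based; match.end() is exclusive, which equals the
    # 1-based inclusive end offset.
    return [(m.start() + 1, m.end()) for m in re.finditer(r"\S+", normalized)]


SENTENCE = "t da 's de enige echte hoop voor ons mensen"
print(token_offsets(SENTENCE))
```

For the example sentence this yields the same ten ranges as the ''Offset (Start, End)'' row, from 1,1 for "t" to 38,43 for "mensen".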
     488Example: XML serialization {{{#!xml <Advanced>
     489
     490  <Segments unit="items">
     491    <Segment id="s1"  start="1"  end="1"
     492      ref="http://hdl.handle.net/4711/123456789?urlappend=%3Fplay=0:173"/>
     493    <Segment id="s2"  start="3"  end="4"
     494      ref="http://hdl.handle.net/4711/123456789?urlappend=%3Fplay=173:304"/>
     495    <Segment id="s3"  start="6"  end="7"
     496      ref="http://hdl.handle.net/4711/123456789?urlappend=%3Fplay=173:304"/>
     497    <Segment id="s4"  start="9"  end="10"
     498      ref="http://hdl.handle.net/4711/123456789?urlappend=%3Fplay=304:480"/>
     499    <Segment id="s5"  start="12" end="16"
     500      ref="http://hdl.handle.net/4711/123456789?urlappend=%3Fplay=480:1119"/>
     501    <Segment id="s6"  start="18" end="22"
     502      ref="http://hdl.handle.net/4711/123456789?urlappend=%3Fplay=1339:1901"/>
     503    <Segment id="s7"  start="24" end="27"
     504      ref="http://hdl.handle.net/4711/123456789?urlappend=%3Fplay=1901:2427"/>
     505    <Segment id="s8"  start="29" end="32"
     506      ref="http://hdl.handle.net/4711/123456789?urlappend=%3Fplay=3084:3493"/>
     507    <Segment id="s9"  start="34" end="36"
     508      ref="http://hdl.handle.net/4711/123456789?urlappend=%3Fplay=3493:3754"/>
     509    <Segment id="s10" start="38" end="43"
     510      ref="http://hdl.handle.net/4711/123456789?urlappend=%3Fplay=3754:4274"/>
     511  </Segments>
     512
     513  <Layers>
     514    <Layer id="http://endpoint.example.org/Layers/orth1 ">
     515      <Span ref="s1">t</Span> <Span ref="s2">da</Span>  <Span ref="s3">'s</Span>  <Span ref="s4">de</Span>  <Span ref="s5">enige</Span>  <Span ref="s6">echte</Span>  <Span ref="s7">hoop</Span>  <Span ref="s8">voor</Span>  <Span ref="s9">ons</Span>  <Span ref="s10">mensen</Span>
     516    </Layer>
     517
     518  <Layer id="http://endpoint.example.org/Layers/pos1 ">
     519    <Span ref="s1"  alt-value="SPEC(afgebr)">X</Span> <Span ref="s2"  alt-value="VNW(aanw,pron,stan,vol,3o,ev)">PRON</Span> <Span ref="s3"  alt-value="WW(pv,tgw,ev)">VERB</Span> <Span ref="s4"  alt-value="LID(bep,stan,rest)">DET</Span> <Span ref="s5"  alt-value="VNW(onbep,det,stan,prenom,met-e,rest)">DET</Span> <Span ref="s6"  alt-value="ADJ(prenom,basis,met-e,stan)">ADJ</Span> <Span ref="s7"  alt-value="N(soort,ev,basis,zijd,stan)">NOUN</Span> <Span ref="s8"  alt-value="VZ(init)">ADP</Span> <Span ref="s9"  alt-value="VNW(pr,pron,obl,vol,1,mv)">PRON</Span> <Span ref="s10" alt-value="N(soort,mv,basis)">NOUN</Span>
     520  </Layer>
     521
     522  <Layer id="http://endpoint.example.org/Layers/lemma1 ">
     523    <Span ref="s1">_</Span> <Span ref="s2">dat</Span> <Span ref="s3">zijn</Span> <Span ref="s4" >de</Span> <Span ref="s5">enig</Span> <Span ref="s6" highlight="h1">echt</Span> <Span ref="s7" highlight="h1">hoop</Span> <Span ref="s8">voor</Span> <Span ref="s9">ons</Span> <Span ref="s10">mens</Span>
     524  </Layer>
     525
     526  <Layer id="http://endpoint.example.org/Layers/phon ">
     527    <Span ref="s1">t@</Span> <Span ref="s2" highlight="h2">dAz</Span> <Span ref="s3">dAz</Span> <Span ref="s4">d@</Span> <Span ref="s5">en@G@</Span> <Span ref="s6">Ext@</Span> <Span ref="s7">hop</Span> <Span ref="s8">for</Span> <Span ref="s9">Ons</Span> <Span ref="s10">mEns@</Span>
     528  </Layer>
     529
     530</Layers> </Advanced> }}}
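
The relation between segments and layers in the serialization above can be sketched in a few lines of Python. This is only an illustration, not part of the specification: the function name `align_layers` and the shortened sample document are hypothetical, and the sample omits XML namespaces just as the example above does. The sketch resolves every `<Span>` through its `@ref` to the offsets of the referenced `<Segment>`, so that the values of all layers line up per offset pair, like the columns of the table above.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical sample in the shape of the serialization above.
SAMPLE = """<Advanced>
  <Segments unit="items">
    <Segment id="s1" start="1" end="1"/>
    <Segment id="s2" start="3" end="4"/>
  </Segments>
  <Layers>
    <Layer id="http://endpoint.example.org/Layers/orth1">
      <Span ref="s1">t</Span> <Span ref="s2">da</Span>
    </Layer>
    <Layer id="http://endpoint.example.org/Layers/pos1">
      <Span ref="s1" alt-value="SPEC(afgebr)">X</Span> <Span ref="s2">PRON</Span>
    </Layer>
  </Layers>
</Advanced>"""


def align_layers(xml_text):
    """Map each (start, end) offset pair to {layer id: span value}."""
    root = ET.fromstring(xml_text)
    # Segment id -> (start, end) offsets
    offsets = {
        seg.get("id"): (int(seg.get("start")), int(seg.get("end")))
        for seg in root.iter("Segment")
    }
    table = {}
    for layer in root.iter("Layer"):
        layer_id = layer.get("id").strip()
        for span in layer.iter("Span"):
            # A Span points at its Segment through the ref attribute.
            key = offsets[span.get("ref")]
            table.setdefault(key, {})[layer_id] = span.text
    return table


aligned = align_layers(SAMPLE)
```

Here `aligned[(3, 4)]` holds the orthographic and part-of-speech values of the second segment side by side.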
     531
     532=== Versioning and Extensions ===
     533==== Backwards Compatibility ==== #backwardsCompatibility
    712534{{{
    713535#!div style="border: 1px solid #000000; font-size: 75%"
    714536TODO: check and proof-read
    715537}}}
    716 
    717538Clients `MUST` be compatible with CLARIN-FCS 1.0, thus must implement SRU 1.2. If a Client uses CLARIN-FCS 1.0 to talk to an Endpoint, it `MUST NOT` use features beyond the Basic Search capability. Clients `MUST` implement a heuristic to automatically determine which CLARIN-FCS protocol version, i.e. which version of the SRU protocol, can be used to talk to an Endpoint.
    718539
    719540Clients `MUST` be able to process the legacy XML namespaces:
     541
    720542 * `http://www.loc.gov/zing/srw/` for SRU response documents, and
    721543 * `http://www.loc.gov/zing/srw/diagnostic/` for diagnostics within SRU response documents.
     544
    722545which SRU 1.2 Endpoints use for serializing responses, in addition to the OASIS XML namespaces. CLARIN-FCS deviates from the OASIS specification [#REF_SRU_Overview OASIS-SRU-Overview] and [#REF_SRU_12 OASIS-SRU-12] to ensure backwards compatibility with SRU 1.2 services as they were defined by the [#REF_LOC_SRU_12 LOC-SRU12].
    723546
    724547Pseudo algorithm for version detection heuristic:
     548
    725549 * Send ''explain'' request without `version` and `operation` parameter
    726550 * Check SRU response for content of the element `<sru:explainResponse>/<sru:version>`
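
The second step of the heuristic above could look roughly like the following Python sketch. It is an illustration under stated assumptions, not a normative implementation: the function names are hypothetical, and the sketch matches elements by local name only, so that it works for the legacy namespace `http://www.loc.gov/zing/srw/` as well as for whatever namespace an OASIS-style response uses. Sending the ''explain'' request itself (step one) is left out.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def detect_sru_version(explain_xml):
    """Return the text of <explainResponse>/<version>, or None if absent."""
    root = ET.fromstring(explain_xml)
    if local_name(root.tag) != "explainResponse":
        return None
    for child in root:
        if local_name(child.tag) == "version" and child.text:
            return child.text.strip()
    return None


# Hypothetical, minimal explain response in the legacy namespace.
LEGACY_RESPONSE = (
    '<sru:explainResponse xmlns:sru="http://www.loc.gov/zing/srw/">'
    "<sru:version>1.2</sru:version>"
    "</sru:explainResponse>"
)

detected = detect_sru_version(LEGACY_RESPONSE)
```

A Client would then pick the protocol version for subsequent requests based on `detected`, and fall back to a conservative default when the function returns `None`; that fallback is a design choice of this sketch and not something the heuristic above prescribes.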
    727551
    728 ==== Endpoint Custom Extensions
    729 Endpoints can add custom extensions, i.e. custom data, to the Result Format. This extension mechanism can for example be used to provide hints for an (XSLT/XQuery) application that works directly on CLARIN-FCS, e.g. to allow it to generate back and forward links to navigate in a result set. 
     552==== Endpoint Custom Extensions ====
     553Endpoints can add custom extensions, i.e. custom data, to the Result Format. This extension mechanism can for example be used to provide hints for an (XSLT/XQuery) application that works directly on CLARIN-FCS, e.g. to allow it to generate back and forward links to navigate in a result set.
    730554
    731555An Endpoint `MAY` add arbitrary XML fragments to the extension hooks provided in the `<fcs:Resource>` element (see the XML schema for "Resource.xsd"). The XML fragment for the extension `MUST` use a custom XML namespace name for the extension. Endpoints `MUST NOT` use XML namespace names that start with the prefixes `http://clarin.eu`, `http://www.clarin.eu/`, `https://clarin.eu` or `https://www.clarin.eu/`.
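
An Endpoint implementer may want to guard against accidentally choosing a reserved namespace name. The following Python sketch shows one way to do that; the function name `is_reserved_namespace` and the example extension namespace are hypothetical, while the four prefixes are exactly those listed above.

```python
# Prefixes that Endpoints MUST NOT use for extension namespace names.
RESERVED_PREFIXES = (
    "http://clarin.eu",
    "http://www.clarin.eu/",
    "https://clarin.eu",
    "https://www.clarin.eu/",
)


def is_reserved_namespace(namespace_name):
    """True if the namespace name starts with one of the reserved prefixes."""
    return namespace_name.startswith(RESERVED_PREFIXES)


# A namespace under the Endpoint's own domain is acceptable ...
ok = not is_reserved_namespace("http://repos.example.org/fcs/ext/navigation")
# ... whereas anything below clarin.eu is not.
rejected = is_reserved_namespace("http://clarin.eu/fcs/my-extension")
```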
     
    735559The non-normative appendix contains an [#extensionExample example] of how an extension could be implemented.
    736560
    737 = CLARIN-FCS to SRU/CQL binding
    738 == SRU/CQL #sruCQL
     561= CLARIN-FCS to SRU/CQL binding =
     562== SRU/CQL == #sruCQL
    739563{{{
    740564#!div style="border: 1px solid #000000; font-size: 75%"
     
    744568
    745569Endpoints and Clients `MUST` implement the SRU/CQL protocol suite as defined in [#REF_SRU_Overview OASIS-SRU-Overview], [#REF_SRU_APD OASIS-SRU-APD], [#REF_CQL OASIS-CQL], [#REF_Explain SRU-Explain], [#REF_Scan SRU-Scan], especially with respect to:
     570
    746571 * Data Model,
    747572 * Query Model,
    748573 * Processing Model,
    749574 * Result Set Model, and
    750  * Diagnostics Model   
    751 
    752 Endpoints and Clients `MUST` implement the APD Binding for SRU 2.0, as defined in [#REF_SRU_20 OASIS-SRU-20]. \\
    753 Clients `MUST` implement APD Binding for SRU 1.2, as defined in [#REF_SRU_12 OASIS-SRU-12]. \\
    754 Clients `MAY` also implement APD binding for version 1.1. \\
    755 '''NOTE''': when implementing SRU 1.2 Endpoints and Clients `MUST` behave like described in the section [#backwardsCompatibility Backwards Compatibility].
     575 * Diagnostics Model
     576
     577Endpoints and Clients `MUST` implement the APD Binding for SRU 2.0, as defined in [#REF_SRU_20 OASIS-SRU-20]. \\ Clients `MUST` implement APD Binding for SRU 1.2, as defined in [#REF_SRU_12 OASIS-SRU-12]. \\ Clients `MAY` also implement APD binding for version 1.1. \\ '''NOTE''': when implementing SRU 1.2, Endpoints and Clients `MUST` behave as described in the section [#backwardsCompatibility Backwards Compatibility].
    756578
    757579Endpoints or Clients `MUST` support CQL conformance ''Level 2'' (as defined in [#REF_OASIS_CQL OASIS-CQL, section 6]), i.e. be able to ''parse'' (Endpoints) or ''serialize'' (Clients) all of CQL and respond with appropriate error messages to the search/retrieve protocol interface.
     
    763585Endpoints `MUST` support the HTTP GET [#REF_SRU_20 OASIS-SRU-20, Appendix B.1] and HTTP POST [#REF_SRU_20 OASIS-SRU-20, Appendix B.2] lower level protocol binding. Endpoints `MAY` also support the SOAP [#REF_SRU_20 OASIS-SRU-20, Appendix B.3] binding.
    764586
    765 
    766 == Operation ''explain'' #explain
     587== Operation ''explain'' == #explain
    767588{{{
    768589#!div style="border: 1px solid #000000; font-size: 75%"
     
    774595
    775596According to the Capabilities supported by the Endpoint the Explain record `MUST` contain the following elements:
    776  ''Basic-Search'' Capability::
    777    `<zr:serverInfo>` as defined in [#REF_Explain SRU-Explain] (`REQUIRED`) \\
    778    `<zr:databaseInfo>` as defined in [#REF_Explain SRU-Explain] (`REQUIRED`) \\
    779    `<zr:schemaInfo>` as defined in [#REF_Explain SRU-Explain] (`REQUIRED`). This element `MUST` contain an element `<zr:schema>` with an `@identifier` attribute with a value of `http://clarin.eu/fcs/resource` and an `@name` attribute with a value of `fcs`. \\
    780    `<zr:configInfo>` is `OPTIONAL` \\
    781    Other capabilities  may define how the `<zr:indexInfo>` element is to be used, therefore it is `NOT RECOMMENDED` for Endpoints to use it in custom extensions.
     597
     598 ''Basic-Search'' Capability:: `<zr:serverInfo>` as defined in [#REF_Explain SRU-Explain] (`REQUIRED`) \\ `<zr:databaseInfo>` as defined in [#REF_Explain SRU-Explain] (`REQUIRED`) \\ `<zr:schemaInfo>` as defined in [#REF_Explain SRU-Explain] (`REQUIRED`). This element `MUST` contain an element `<zr:schema>` with an `@identifier` attribute with a value of `http://clarin.eu/fcs/resource` and an `@name` attribute with a value of `fcs`. \\ `<zr:configInfo>` is `OPTIONAL` \\ Other capabilities  may define how the `<zr:indexInfo>` element is to be used, therefore it is `NOT RECOMMENDED` for Endpoints to use it in custom extensions.
    782599
    783600To support auto-configuration in CLARIN-FCS, the Endpoint `MUST` support ''Endpoint Description''. The Endpoint Description is included in the explain response utilizing SRU's extension mechanism, i.e. by embedding an XML fragment into the `<sru:extraResponseData>` element. The Endpoint `MUST` include the Endpoint Description ''only'' if the Client performs an explain request with the ''extra request parameter'' `x-fcs-endpoint-description` with a value of `true`. If the Client performs an explain request ''without'' supplying this extra request parameter the Endpoint `MUST NOT` include the Endpoint Description. The format of the Endpoint Description XML fragment is defined in [#endpointDescription Endpoint Description].
    784601
    785602The following example shows a request and response to an ''explain'' request with added extra request parameter `x-fcs-endpoint-description`:
    786 * HTTP GET request: Client → Endpoint:
    787 {{{#!sh
    788 http://repos.example.org/fcs-endpoint?operation=explain&version=1.2&x-fcs-endpoint-description=true
    789 }}}
    790 * HTTP Response: Endpoint → Client:
    791 {{{#!xml
    792 <?xml version='1.0' encoding='utf-8'?>
    793 <sru:explainResponse xmlns:sru="http://www.loc.gov/zing/srw/">
    794   <sru:version>1.2</sru:version>
    795   <sru:record>
    796     <sru:recordSchema>http://explain.z3950.org/dtd/2.0/</sru:recordSchema>
    797     <sru:recordPacking>xml</sru:recordPacking>
    798     <sru:recordData>
     603
     604 * HTTP GET request: Client → Endpoint:
     605
     606{{{#!sh http://repos.example.org/fcs-endpoint?operation=explain&version=1.2&x-fcs-endpoint-description=true }}}
     607
     608 * HTTP Response: Endpoint → Client:
     609
     610{{{#!xml <?xml version='1.0' encoding='utf-8'?> <sru:explainResponse xmlns:sru="http://www.loc.gov/zing/srw/">
     611
     612  <sru:version> 1.2</sru:version > <sru:record>
     613    <sru:recordSchema> http://explain.z3950.org/dtd/2.0/ </sru:recordSchema > <sru:recordPacking> xml</sru:recordPacking > <sru:recordData>
    799614      <zr:explain xmlns:zr="http://explain.z3950.org/dtd/2.0/">
    800         <!-- <zr:serverInfo > is REQUIRED -->
    801         <zr:serverInfo protocol="SRU" version="1.2" transport="http">
    802           <zr:host>repos.example.org</zr:host>
    803           <zr:port>80</zr:port>
    804           <zr:database>fcs-endpoint</zr:database>
    805         </zr:serverInfo>
    806         <!-- <zr:databaseInfo> is REQUIRED -->
    807         <zr:databaseInfo>
    808           <zr:title lang="de">Goethe Corpus</zr:title>
    809           <zr:title lang="en" primary="true">Goethe Korpus</zr:title>
    810           <zr:description lang="de">Der Goethe Korpus des IDS Mannheim.</zr:description>
    811           <zr:description lang="en" primary="true">The Goethe corpus of IDS Mannheim.</zr:description>
    812         </zr:databaseInfo>
    813         <!-- <zr:schemaInfo> is REQUIRED -->
    814         <zr:schemaInfo>
     615        <!-- <zr:serverInfo>  is REQUIRED --> <zr:serverInfo protocol="SRU" version="1.2" transport="http">
     616          <zr:host> repos.example.org</zr:host > <zr:port> 80</zr:port > <zr:database> fcs-endpoint</zr:database >
     617        </zr:serverInfo > <!-- <zr:databaseInfo>  is REQUIRED --> <zr:databaseInfo>
     618          <zr:title lang="de"> Goethe Korpus</zr:title > <zr:title lang="en" primary="true"> Goethe Corpus</zr:title > <zr:description lang="de"> Der Goethe Korpus des IDS Mannheim.</zr:description > <zr:description lang="en" primary="true"> The Goethe corpus of IDS Mannheim.</zr:description >
     619        </zr:databaseInfo > <!-- <zr:schemaInfo>  is REQUIRED --> <zr:schemaInfo>
    815620          <zr:schema identifier="http://clarin.eu/fcs/resource" name="fcs">
    816             <zr:title lang="en" primary="true">CLARIN Federated Content Search</zr:title>
    817           </zr:schema>
    818         </zr:schemaInfo>
    819         <!-- <zr:configInfo> is OPTIONAL -->
    820         <zr:configInfo>
    821           <zr:default type="numberOfRecords">250</zr:default>
    822           <zr:setting type="maximumRecords">1000</zr:setting>
    823         </zr:configInfo>
    824       </zr:explain>
    825     </sru:recordData>
    826   </sru:record>
    827   <!-- <sru:echoedExplainRequest> is OPTIONAL -->
    828   <sru:echoedExplainRequest>
    829     <sru:version>1.2</sru:version>
    830     <sru:baseUrl>http://repos.example.org/fcs-endpoint</sru:baseUrl>
    831   </sru:echoedExplainRequest>
    832   <sru:extraResponseData>
     621            <zr:title lang="en" primary="true"> CLARIN Federated Content Search</zr:title >
     622          </zr:schema >
     623        </zr:schemaInfo > <!-- <zr:configInfo>  is OPTIONAL --> <zr:configInfo>
     624          <zr:default type="numberOfRecords"> 250</zr:default > <zr:setting type="maximumRecords"> 1000</zr:setting >
     625        </zr:configInfo >
     626      </zr:explain >
     627    </sru:recordData >
     628  </sru:record > <!-- <sru:echoedExplainRequest>  is OPTIONAL --> <sru:echoedExplainRequest>
     629    <sru:version> 1.2</sru:version > <sru:baseUrl> http://repos.example.org/fcs-endpoint </sru:baseUrl >
     630  </sru:echoedExplainRequest > <sru:extraResponseData>
    833631    <ed:EndpointDescription xmlns:ed="http://clarin.eu/fcs/endpoint-description" version="1">
    834632      <ed:Capabilities>
    835           <ed:Capability>http://clarin.eu/fcs/capability/basic-search</ed:Capability>
    836       </ed:Capabilities>
    837       <ed:SupportedDataViews>
    838           <ed:SupportedDataView id="hits" delivery-policy="send-by-default">application/x-clarin-fcs-hits+xml</ed:SupportedDataView>
    839       </ed:SupportedDataViews>
    840       <ed:Resources>
    841           <!-- just one top-level resource at the Endpoint -->
    842           <ed:Resource pid="http://hdl.handle.net/4711/0815">
    843               <ed:Title xml:lang="de">Goethe Corpus</ed:Title>
    844               <ed:Title xml:lang="en">Goethe Korpus</ed:Title>
    845               <ed:Description xml:lang="de">Der Goethe Korpus des IDS Mannheim.</ed:Description>
    846               <ed:Description xml:lang="en">The Goethe corpus of IDS Mannheim.</ed:Description>
    847               <ed:LandingPageURI>http://repos.example.org/corpus1.html</ed:LandingPageURI>
    848               <ed:Languages>
    849                   <ed:Language>deu</ed:Language>
    850               </ed:Languages>
    851               <ed:AvailableDataViews ref="hits"/>
    852           </ed:Resource>
    853       </ed:Resources>
    854     </ed:EndpointDescription>
    855   </sru:extraResponseData>
    856 </sru:explainResponse>
    857 }}}
    858 
    859 == Operation ''scan'' #scan
     633        <ed:Capability> http://clarin.eu/fcs/capability/basic-search </ed:Capability >
     634      </ed:Capabilities > <ed:SupportedDataViews>
     635        <ed:SupportedDataView id="hits" delivery-policy="send-by-default"> application/x-clarin-fcs-hits+xml</ed:SupportedDataView >
     636      </ed:SupportedDataViews > <ed:Resources>
     637        <!-- just one top-level resource at the Endpoint --> <ed:Resource pid="http://hdl.handle.net/4711/0815">
     638          <ed:Title xml:lang="de"> Goethe Korpus</ed:Title > <ed:Title xml:lang="en"> Goethe Corpus</ed:Title > <ed:Description xml:lang="de"> Der Goethe Korpus des IDS Mannheim.</ed:Description > <ed:Description xml:lang="en"> The Goethe corpus of IDS Mannheim.</ed:Description > <ed:LandingPageURI> http://repos.example.org/corpus1.html </ed:LandingPageURI > <ed:Languages>
     639            <ed:Language> deu</ed:Language >
     640          </ed:Languages > <ed:AvailableDataViews ref="hits"/>
     641        </ed:Resource >
     642      </ed:Resources >
     643    </ed:EndpointDescription >
     644  </sru:extraResponseData >
     645
     646</sru:explainResponse> }}}
     647
     648== Operation ''scan'' == #scan
    860649The ''scan'' operation of the SRU protocol is currently not used in the ''Basic Search'' or ''Advanced Search'' capability of CLARIN-FCS. Future capabilities may use this operation, therefore it is `NOT RECOMMENDED` for Endpoints to define custom extensions that use this operation.
    861650
    862 == Operation ''searchRetrieve'' #searchRetrieve
     651== Operation ''searchRetrieve'' == #searchRetrieve
    863652{{{
    864653#!div style="border: 1px solid #000000; font-size: 75%"
     
    868657The ''searchRetrieve'' operation of the SRU protocol is used for searching in the Resources that are provided by the Endpoint. The SRU protocol defines the serialization of request and response formats in [#REF_SRU_20 OASIS-SRU-20] for SRU version 2.0 and [#REF_SRU_12 OASIS-SRU-12] for SRU version 1.2. An Endpoint `MUST` respond in the correct format, i.e. when Endpoint also supports SRU 1.2 and the request is issued in SRU version 1.2, the response must be encoded accordingly.
    869658
    870 In SRU, search result hits are encoded down to a record level, i.e. the `<sru:record>` element, and SRU allows records to be serialized in various formats, so called ''record schemas'' Endpoints `MUST` support the CLARIN-FCS record schema (see section [#resultFormat Result Format]) and `MUST` use the value `http://clarin.eu/fcs/resource` for the ''responseItemType'' ("record schema identifier").
    871 Endpoints `MUST` represent exactly ''one hit'' within the Resource as one SRU record, i.e. `<sru:record>` element.
     659In SRU, search result hits are encoded down to a record level, i.e. the `<sru:record>` element, and SRU allows records to be serialized in various formats, so-called ''record schemas''. Endpoints `MUST` support the CLARIN-FCS record schema (see section [#resultFormat Result Format]) and `MUST` use the value `http://clarin.eu/fcs/resource` for the ''responseItemType'' ("record schema identifier"). Endpoints `MUST` represent exactly ''one hit'' within the Resource as one SRU record, i.e. `<sru:record>` element.
    872660
    873661The following example shows a request and response to a ''searchRetrieve'' request with a ''term-only'' query for "cat":
    874 * HTTP GET request: Client → Endpoint:
    875 {{{#!sh
    876 http://repos.example.org/fcs-endpoint?operation=searchRetrieve&version=1.2&query=cat
    877 }}}
    878 * HTTP Response: Endpoint → Client:
    879 {{{#!xml
    880 <?xml version='1.0' encoding='utf-8'?>
    881 <sru:searchRetrieveResponse xmlns:sru="http://www.loc.gov/zing/srw/">
    882   <sru:version>1.2</sru:version>
    883   <sru:numberOfRecords>6</sru:numberOfRecords>
    884   <sru:records>
     662
     663 * HTTP GET request: Client → Endpoint:
     664
     665{{{#!sh http://repos.example.org/fcs-endpoint?operation=searchRetrieve&version=1.2&query=cat }}}
     666
     667 * HTTP Response: Endpoint → Client:
     668
     669{{{#!xml <?xml version='1.0' encoding='utf-8'?> <sru:searchRetrieveResponse xmlns:sru="http://www.loc.gov/zing/srw/">
     670
     671  <sru:version> 1.2</sru:version > <sru:numberOfRecords> 6</sru:numberOfRecords > <sru:records>
    885672    <sru:record>
    886       <sru:recordSchema>http://clarin.eu/fcs/resource</sru:recordSchema>
    887       <sru:recordPacking>xml</sru:recordPacking>
    888       <sru:recordData>
     673      <sru:recordSchema> http://clarin.eu/fcs/resource </sru:recordSchema > <sru:recordPacking> xml</sru:recordPacking > <sru:recordData>
    889674        <fcs:Resource xmlns:fcs="http://clarin.eu/fcs/resource" pid="http://hdl.handle.net/4711/08-15">
    890675          <fcs:ResourceFragment>
    891676            <fcs:DataView type="application/x-clarin-fcs-hits+xml">
    892677              <hits:Result xmlns:hits="http://clarin.eu/fcs/dataview/hits">
    893                 The quick brown <hits:Hit>cat</hits:Hit> jumps over the lazy dog.
    894               </hits:Result>
    895             </fcs:DataView>
    896           </fcs:ResourceFragment>
    897         </fcs:Resource>
    898       </sru:recordData>
    899       <sru:recordPosition>1</sru:recordPosition>
    900     </sru:record>
    901     <!-- more <sru:records> omitted for brevity -->
    902   </sru:records>
    903   <!-- <sru:echoedSearchRetrieveRequest> is OPTIONAL -->
    904   <sru:echoedSearchRetrieveRequest>
    905     <sru:version>1.2</sru:version>
    906     <sru:query>cat</sru:query>
    907     <sru:xQuery xmlns="http://www.loc.gov/zing/cql/xcql/">
     678                The quick brown <hits:Hit> cat</hits:Hit > jumps over the lazy dog.
     679              </hits:Result >
     680            </fcs:DataView >
     681          </fcs:ResourceFragment >
     682        </fcs:Resource >
     683      </sru:recordData > <sru:recordPosition> 1</sru:recordPosition >
     684    </sru:record > <!-- more <sru:records>  omitted for brevity -->
     685  </sru:records > <!-- <sru:echoedSearchRetrieveRequest>  is OPTIONAL --> <sru:echoedSearchRetrieveRequest>
     686    <sru:version> 1.2</sru:version > <sru:query> cat</sru:query > <sru:xQuery xmlns="http://www.loc.gov/zing/cql/xcql/">
    908687      <searchClause>
    909         <index>cql.serverChoice</index>
    910         <relation>
     688        <index>cql.serverChoice</index> <relation>
    911689          <value>=</value>
    912         </relation>
    913         <term>cat</term>
     690        </relation> <term>cat</term>
    914691      </searchClause>
    915     </sru:xQuery>
    916     <sru:startRecord>1</sru:startRecord>
    917     <sru:baseUrl>http://repos.example.org/fcs-endpoint</sru:baseUrl>
    918   </sru:echoedSearchRetrieveRequest>
    919 </sru:searchRetrieveResponse>
    920 }}}
     692    </sru:xQuery > <sru:startRecord> 1</sru:startRecord > <sru:baseUrl> http://repos.example.org/fcs-endpoint </sru:baseUrl >
     693  </sru:echoedSearchRetrieveRequest >
     694
     695</sru:searchRetrieveResponse> }}}
    921696
    922697In general, the Endpoint is `REQUIRED` to accept an ''unrestricted search'' and `SHOULD` perform the search operation on ''all'' Resources that are available at the Endpoint. If that is for some reason not feasible, e.g. performing an unrestricted search would allocate too many resources, the Endpoint `MAY` independently restrict the search to a scope that it can handle. If it does so, it `MUST` issue a non-fatal diagnostic `http://clarin.eu/fcs/diagnostic/2` ("Resource set too large. Query context automatically adjusted."). The details field of the diagnostic `MUST` contain the persistent identifier of the resource to which the query scope was limited. If the Endpoint limits the query scope to more than one resource, it `MUST` generate a ''separate'' non-fatal diagnostic `http://clarin.eu/fcs/diagnostic/2` for each of the resources.
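
On the Client side, the adjusted scope can be recovered from the diagnostics of the response. The following Python sketch is illustrative only: the function name `adjusted_scope` and the sample response are hypothetical, and the sketch assumes SRU 1.2 style diagnostics, i.e. `<diag:diagnostic>` elements in the legacy namespace `http://www.loc.gov/zing/srw/diagnostic/` with `<diag:uri>` and `<diag:details>` children.

```python
import xml.etree.ElementTree as ET

DIAG_NS = "http://www.loc.gov/zing/srw/diagnostic/"
CONTEXT_ADJUSTED = "http://clarin.eu/fcs/diagnostic/2"

# Hypothetical response fragment with two "context adjusted" diagnostics.
SAMPLE = """<sru:searchRetrieveResponse
    xmlns:sru="http://www.loc.gov/zing/srw/"
    xmlns:diag="http://www.loc.gov/zing/srw/diagnostic/">
  <sru:diagnostics>
    <diag:diagnostic>
      <diag:uri>http://clarin.eu/fcs/diagnostic/2</diag:uri>
      <diag:details>http://hdl.handle.net/4711/0815</diag:details>
    </diag:diagnostic>
    <diag:diagnostic>
      <diag:uri>http://clarin.eu/fcs/diagnostic/2</diag:uri>
      <diag:details>http://hdl.handle.net/4711/0816-2</diag:details>
    </diag:diagnostic>
  </sru:diagnostics>
</sru:searchRetrieveResponse>"""


def adjusted_scope(response_xml):
    """Collect the persistent identifiers the Endpoint limited the search to."""
    root = ET.fromstring(response_xml)
    pids = []
    for diag in root.iter(f"{{{DIAG_NS}}}diagnostic"):
        uri = diag.findtext(f"{{{DIAG_NS}}}uri", "").strip()
        if uri == CONTEXT_ADJUSTED:
            # One diagnostic per resource; details carries its identifier.
            details = diag.findtext(f"{{{DIAG_NS}}}details", "").strip()
            if details:
                pids.append(details)
    return pids


scope = adjusted_scope(SAMPLE)
```

An empty list means that the Endpoint did not report any adjustment of the query scope.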
     
    926701The Client can extract all valid persistent identifiers from the `@pid` attribute of the `<ed:Resource>` element, obtained by the ''explain'' request (see section [#explain Operation ''explain''] and section [#endpointDescription Endpoint Description]). The list of persistent identifiers can get extensive, but a Client can use the HTTP POST method instead of HTTP GET method for submitting the request.
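
As a rough sketch of this step, the following Python code collects the `@pid` values from an Endpoint Description and encodes them into a request body suitable for HTTP POST. The function names and the shortened sample are hypothetical; the namespace `http://clarin.eu/fcs/endpoint-description` and the parameter `x-fcs-context` are taken from this specification. Note that `urlencode` percent-encodes the identifiers, which the abbreviated request examples in this document do not show.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ED_NS = "http://clarin.eu/fcs/endpoint-description"

# Shortened, hypothetical Endpoint Description with two resources.
SAMPLE = """<ed:EndpointDescription
    xmlns:ed="http://clarin.eu/fcs/endpoint-description" version="1">
  <ed:Resources>
    <ed:Resource pid="http://hdl.handle.net/4711/0815"/>
    <ed:Resource pid="http://hdl.handle.net/4711/0816-2"/>
  </ed:Resources>
</ed:EndpointDescription>"""


def resource_pids(description_xml):
    """Return the @pid of every <ed:Resource>, including nested ones."""
    root = ET.fromstring(description_xml)
    return [
        res.get("pid")
        for res in root.iter(f"{{{ED_NS}}}Resource")
        if res.get("pid")
    ]


def search_request_body(query, pids, version="1.2"):
    """Build a form-encoded searchRetrieve request restricted to pids."""
    params = {"operation": "searchRetrieve", "version": version, "query": query}
    if pids:
        # x-fcs-context takes a comma-separated list of identifiers.
        params["x-fcs-context"] = ",".join(pids)
    return urlencode(params)


pids = resource_pids(SAMPLE)
body = search_request_body("cat", pids)
```

The resulting `body` can be sent with the content type `application/x-www-form-urlencoded`; for a short list of identifiers the same string can be appended to the base URL of an HTTP GET request instead.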
    927702
    928 For example, to restrict the search to the Resource with the persistent identifier `http://hdl.handle.net/4711/0815` the Client must issue the following request:
    929 {{{#!sh
    930 http://repos.example.org/fcs-endpoint?operation=searchRetrieve&version=1.2&query=cat&x-fcs-context=http://hdl.handle.net/4711/0815
    931 }}}
    932 To restrict the search to the Resources with the persistent identifier `http://hdl.handle.net/4711/0815` and `http://hdl.handle.net/4711/0816-2` the Client must issue the following request:
    933 {{{#!sh
    934 http://repos.example.org/fcs-endpoint?operation=searchRetrieve&version=1.2&query=cat&x-fcs-context=http://hdl.handle.net/4711/0815,http://hdl.handle.net/4711/0816-2
    935 }}}
    936 If an invalid persistent identifier is passed by the Client, the Endpoint `MUST` issue a `http://clarin.eu/fcs/diagnostic/1` diagnostic, i.e. add the appropriate XML fragment to the `<sru:diagnostics>` element of the response. The Endpoint `MAY` treat this condition as fatal, i.e. just issue the diagnostic and perform no search, or it `MAY` treat it as non-fatal and perform the search.
    937 
    938 If a Client wants to request one or more Data Views, that are handled by Endpoint with the ''need-to-request'' delivery policy, it `MUST` pass a comma-separated list of ''Data View identifier'' in the `x-fcs-dataviews` extra request parameter of the 'searchRetrieve' request. A Client can extract valid values for the ''Data View identifiers'' from the `@id` attribute of the `<ed:SupportedDataView>` elements in the Endpoint Description of the Endpoint (see section [#explain Operation ''explain''] and section [#endpointDescription Endpoint Description]).
    939 
    940 For example, to request the CMDI Data View from an Endpoint that has an Endpoint Description, as described in [#REF_Example_5 Example 5], a Client would need to use the ''Data View identifier'' `cmdi` and submit the following request:
    941 {{{#!sh
    942 http://repos.example.org/fcs-endpoint?operation=searchRetrieve&version=1.2&query=cat&x-fcs-dataviews=cmdi
    943 }}}
    944 If an invalid ''Data View identifier'' is passed by the Client, the Endpoint `MUST` issue a `http://clarin.eu/fcs/diagnostic/4`diagnostic, i.e. add the appropriate XML fragment to the `<sru:diagnostics>` element of the response. The Endpoint `MAY` treat this condition as fatal, i.e. simply issue the diagnostic and perform no search, or it `MAY` treat it a non-fatal and perform the search.
    945 
    946 
    947 = Normative Appendix
     703For example, to restrict the search to the Resource with the persistent identifier `http://hdl.handle.net/4711/0815` the Client must issue the following request: {{{#!sh http://repos.example.org/fcs-endpoint?operation=searchRetrieve&version=1.2&query=cat&x-fcs-context=http://hdl.handle.net/4711/0815 }}} To restrict the search to the Resources with the persistent identifier `http://hdl.handle.net/4711/0815` and `http://hdl.handle.net/4711/0816-2` the Client must issue the following request: {{{#!sh http://repos.example.org/fcs-endpoint?operation=searchRetrieve&version=1.2&query=cat&x-fcs-context=http://hdl.handle.net/4711/0815,http://hdl.handle.net/4711/0816-2 }}} If an invalid persistent identifier is passed by the Client, the Endpoint `MUST` issue a `http://clarin.eu/fcs/diagnostic/1` diagnostic, i.e. add the appropriate XML fragment to the `<sru:diagnostics>` element of the response. The Endpoint `MAY` treat this condition as fatal, i.e. just issue the diagnostic and perform no search, or it `MAY` treat it as non-fatal and perform the search.
     704
     705If a Client wants to request one or more Data Views that the Endpoint handles with the ''need-to-request'' delivery policy, it `MUST` pass a comma-separated list of ''Data View identifiers'' in the `x-fcs-dataviews` extra request parameter of the ''searchRetrieve'' request. A Client can extract valid values for the ''Data View identifiers'' from the `@id` attribute of the `<ed:SupportedDataView>` elements in the Endpoint Description of the Endpoint (see section [#explain Operation ''explain''] and section [#endpointDescription Endpoint Description]).
     706
     707For example, to request the CMDI Data View from an Endpoint that has an Endpoint Description, as described in [#REF_Example_5 Example 5], a Client would need to use the ''Data View identifier'' `cmdi` and submit the following request: {{{#!sh http://repos.example.org/fcs-endpoint?operation=searchRetrieve&version=1.2&query=cat&x-fcs-dataviews=cmdi }}} If an invalid ''Data View identifier'' is passed by the Client, the Endpoint `MUST` issue a `http://clarin.eu/fcs/diagnostic/4` diagnostic, i.e. add the appropriate XML fragment to the `<sru:diagnostics>` element of the response. The Endpoint `MAY` treat this condition as fatal, i.e. simply issue the diagnostic and perform no search, or it `MAY` treat it as non-fatal and perform the search.
     708
     709= Normative Appendix =
    948710{{{
    949711#!div style="border: 1px solid #000000; font-size: 75%"
    950712TODO: check and proof-read all sub-sections.
    951713}}}
    952 == List of extra request parameters
     714== List of extra request parameters ==
    953715The following extra request parameters are used in CLARIN-FCS. The column ''SRU operations'' lists the SRU operation, for which this extra request parameter is to be used. Clients `MUST NOT` use the parameter for an operation that is not listed in this column. However, if a Client sends an invalid parameter, an Endpoint `SHOULD` issue a fatal diagnostic "Unsupported Parameter" (`info:srw/diagnostic/1/8`) and stop processing the request. Alternatively, an Endpoint `MAY` silently ignore the invalid parameter.
    954 ||=Parameter Name   =||=SRU operations =||=Allowed values =||=Description =||
     716
     717||=Parameter Name =||=SRU operations =||=Allowed values =||=Description =||
    955718|| `x-fcs-endpoint-description` || explain || `true`; all other values are reserved and `MUST NOT` be used by Clients || If the parameter is given (with the value `true`), the Endpoint `MUST` include an Endpoint Description in the `<sru:extraResponseData>` element of the ''explain'' response. ||
    956719|| `x-fcs-context` || searchRetrieve || A comma-separated list of persistent identifiers || The Endpoint `MUST` restrict the search to the resources identified by the persistent identifiers. ||
     
    958721|| `x-fcs-rewrites-allowed` || searchRetrieve || `true`; all other values are reserved and `MUST NOT` be used by Clients. \\ Clients `MUST` only use this parameter when performing an Advanced Search request. || If the parameter is given (with the value `true`), the Endpoint `MAY` rewrite the query to a simpler query to allow for more recall. ||
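
An Endpoint could enforce the ''SRU operations'' column with a simple lookup, as in the following Python sketch. The function name and the data structure are hypothetical; the parameter names, their operations and the diagnostic URI `info:srw/diagnostic/1/8` come from this specification. The sketch only reports problems and leaves the decision between issuing the fatal diagnostic and silently ignoring the parameter to the caller.

```python
# Extra request parameter -> SRU operations it may be used with.
ALLOWED_OPERATIONS = {
    "x-fcs-endpoint-description": {"explain"},
    "x-fcs-context": {"searchRetrieve"},
    "x-fcs-dataviews": {"searchRetrieve"},
    "x-fcs-rewrites-allowed": {"searchRetrieve"},
}

UNSUPPORTED_PARAMETER = "info:srw/diagnostic/1/8"


def misplaced_parameters(operation, params):
    """Return (diagnostic URI, parameter name) for each misused parameter."""
    problems = []
    for name in params:
        allowed = ALLOWED_OPERATIONS.get(name)
        # Only the parameters from the table are checked here.
        if allowed is not None and operation not in allowed:
            problems.append((UNSUPPORTED_PARAMETER, name))
    return problems


# x-fcs-context is not defined for the explain operation.
problems = misplaced_parameters(
    "explain",
    {"x-fcs-endpoint-description": "true", "x-fcs-context": "http://hdl.handle.net/4711/0815"},
)
```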
    959722
    960 == List of diagnostics
     723== List of diagnostics ==
    961724{{{
    962725#!div style="border: 1px solid #000000; font-size: 75%"
     
    964727}}}
    965728Apart from the SRU diagnostics defined in [#REF_SRU_12 OASIS-SRU-12, Appendix C] and [#REF_LOC_DIAG LOC-DIAG], the following diagnostics are used in CLARIN-FCS.  The column "Details Format" specifies what `SHOULD` be returned in the details field. If this column is blank, the format is "undefined" and the Endpoint `MAY` return whatever it deems appropriate, including nothing. The column "Impact" specifies whether the Endpoint should continue ("non-fatal") or stop ("fatal") processing.
    966 ||=Identifier URI                     =||=Description                                                          =||=Details Format =||=Impact =||=Note =||
     729
     730||=Identifier URI =||=Description =||=Details Format =||=Impact =||=Note =||
    967731|| `http://clarin.eu/fcs/diagnostic/1` || Persistent identifier passed by the Client for restricting the search is invalid. || The offending persistent identifier. || non-fatal || If more than one invalid persistent identifier was submitted by the Client, the Endpoint `MUST` generate a separate diagnostic for each invalid persistent identifier. ||
    968732|| `http://clarin.eu/fcs/diagnostic/2` || Resource set too large. Query context automatically adjusted. || The persistent identifier of the resource to which the query context was adjusted. || non-fatal || If an Endpoint limited the query context to more than one resource, it `MUST` generate a separate diagnostic for each resource to which the query context was adjusted. ||
    969733|| `http://clarin.eu/fcs/diagnostic/3` || Resource set too large. Cannot perform Query. || || fatal || ||
    970734|| `http://clarin.eu/fcs/diagnostic/4` || Requested Data View not valid for this resource. || The Data View MIME type. || non-fatal || If more than one invalid Data View was requested, the Endpoint `MUST` generate a separate diagnostic for each invalid Data View. ||
    971 || `http://clarin.eu/fcs/diagnostic/10` || General query syntax error.  || Detailed error message why the query could not be parsed. || fatal || Endpoints `MUST` use this diagnostic only if the Client performed an Advanced Search request. ||
     735|| `http://clarin.eu/fcs/diagnostic/10` || General query syntax error. || Detailed error message why the query could not be parsed. || fatal || Endpoints `MUST` use this diagnostic only if the Client performed an Advanced Search request. ||
    972736|| `http://clarin.eu/fcs/diagnostic/11` || Query too complex. Cannot perform Query. || Details why the query could not be performed, e.g. unsupported layer or unsupported combination of operators. || fatal || Endpoints `MUST` use this diagnostic only if the Client performed an Advanced Search request. ||
    973737|| `http://clarin.eu/fcs/diagnostic/12` || Query was rewritten. || Details how the query was rewritten. || non-fatal || Endpoints `MUST` use this diagnostic only if the Client performed an Advanced Search request with the `x-fcs-rewrites-allowed` request parameter. ||
    974738|| `http://clarin.eu/fcs/diagnostic/14` || General processing hint. || E.g. "No matches, because layer 'XY' is not available in your selection of resources" || non-fatal || Endpoints `MUST` use this diagnostic only if the Client performed an Advanced Search request. ||
    975739
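As a non-normative illustration, the "Impact" column of the table above can be encoded as a lookup that a Client consults when it receives a diagnostic. This is a minimal Python sketch; the function names are hypothetical and only the diagnostics listed in the table are covered.

```python
import re

FCS_DIAGNOSTIC_PREFIX = "http://clarin.eu/fcs/diagnostic/"

# Impact per diagnostic number, copied from the table above.
FCS_DIAGNOSTIC_IMPACT = {
    1: "non-fatal",
    2: "non-fatal",
    3: "fatal",
    4: "non-fatal",
    10: "fatal",
    11: "fatal",
    12: "non-fatal",
    14: "non-fatal",
}


def fcs_diagnostic_impact(uri):
    """Return 'fatal' or 'non-fatal' for a listed CLARIN-FCS diagnostic URI, else None."""
    if not uri.startswith(FCS_DIAGNOSTIC_PREFIX):
        return None
    suffix = uri[len(FCS_DIAGNOSTIC_PREFIX):]
    if not re.fullmatch(r"[0-9]+", suffix):
        return None
    return FCS_DIAGNOSTIC_IMPACT.get(int(suffix))


def invalid_pid_diagnostics(invalid_pids):
    """One (uri, details) pair per invalid persistent identifier, as diagnostic 1 requires."""
    return [(FCS_DIAGNOSTIC_PREFIX + "1", pid) for pid in invalid_pids]
```

Returning `None` for unknown URIs leaves room for the standard SRU diagnostics, which this sketch deliberately does not model.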
    976 == CLARIN FCS-QL Grammar Specification #fcsQLEBNF
     740== CLARIN FCS-QL Grammar Specification == #fcsQLEBNF
    977741The version of the CLARIN FCS-QL is tied to the FCS Core version starting with version 2.0.
    978742
    979 The grammar specification for the FCS-QL is heavily based on Poliqarp but also with inspiration from other query languages' grammars.
    980 An unqualified or qualified "attribute" denotes the annotation layer to be used, e.g. unqualified "word", "lemma", "pos" or qualified "pos:stts". Default is "text" for compatibility with FCS 1.0 where simple wordforms in a pair of single or double quotes can be matched.
     743The grammar specification for the FCS-QL is heavily based on Poliqarp but also with inspiration from other query languages' grammars. An unqualified or qualified "attribute" denotes the annotation layer to be used, e.g. unqualified "word", "lemma", "pos" or qualified "pos:stts". Default is "text" for compatibility with FCS 1.0 where simple wordforms in a pair of single or double quotes can be matched.
    981744
    982745=== FCS-QL EBNF ===
    983746{{{#!comment
     747
    984748  Please keep the EBNF nicely formatted. Thanks!
    985 }}}
     749
     750}}}
     751
    986752{{{
    987753 [1] query                 ::= main-query within-part?
     
    1092858=== Notes ===
    1093859 * "simple-within-scope": possible values for scope
    1094    *  "sentence", "s", "utterance", "u": denote a matching scope of something like a sentence or utterance. provides compatibility with FCS 1.0 ("Generic Hits", "Each hit SHOULD be presented within the context of a complete sentence.")
     860   * "sentence", "s", "utterance", "u": denote a matching scope of something like a sentence or utterance. provides compatibility with FCS 1.0 ("Generic Hits", "Each hit SHOULD be presented within the context of a complete sentence.")
    1095861   * "paragraph" | "p" | "turn" | "t": denote the next larger unit, e.g. something like a paragraph
    1096862   * "article" | "session": something like a whole document
    1097  * {{{[25]}}} and {{{[26]}}} "any $SOMETHING codepoint" are hard to implement in at least ANTLR and JavaCC, especially in combination with {{{[27]}}}
     863 * `[25]` and `[26]` "any $SOMETHING codepoint" are hard to implement in at least ANTLR and JavaCC, especially in combination with `[27]`
    1098864 * regex are not defined/guarded by this grammar
    1099865
    1100 = Non-normative Appendix
     866= Non-normative Appendix =
    1101867{{{
    1102868#!div style="border: 1px solid #000000; font-size: 75%"
    1103869TODO: check and proof-read all sub-sections.
    1104870}}}
    1105 == Syntax variant for Handle system Persistent Identifier URIs
     871== Syntax variant for Handle system Persistent Identifier URIs ==
    1106872Persistent Identifiers from the Handle system are defined in two syntax variants: a regular URI format for the Handle protocol, i.e. with a `hdl:` prefix, or ''actionable'' URIs with a `http://hdl.handle.net/` prefix. Generally, CLARIN software should support both syntax variants, therefore the CLARIN-FCS Interface Specification does not endorse a specific syntax variant. However, Endpoints are recommended to use the ''actionable'' syntax variant.
    1107873
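Since software is expected to accept both syntax variants, one possible approach is to normalise every Handle to the ''actionable'' form before comparing or storing it. The following Python sketch assumes exactly the two prefixes named above; the function names are hypothetical.

```python
HDL_PREFIX = "hdl:"
ACTIONABLE_PREFIX = "http://hdl.handle.net/"


def to_actionable(pid):
    """Rewrite a 'hdl:' Handle URI to its actionable form; leave anything else unchanged."""
    if pid.startswith(HDL_PREFIX):
        return ACTIONABLE_PREFIX + pid[len(HDL_PREFIX):]
    return pid


def same_handle(a, b):
    """True if both identifiers denote the same Handle, regardless of syntax variant."""
    return to_actionable(a) == to_actionable(b)
```

With this, `hdl:4711/0815` and `http://hdl.handle.net/4711/0815` compare as equal.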
    1108 == Referring to an Endpoint from a CMDI record
    1109 Centers are encouraged to provide links to their CLARIN-FCS Endpoints in the metadata records for their resources. Other services, like the VLO, can use this information for automatically configuring an Aggregator for searching resources at the Endpoint.
    1110    
    1111 To refer to an Endpoint, a `<cmdi:ResourceProxy>` element with child-element `<cmdi:ResourceType>` set to the value `SearchService` and a `@mimetype` attribute with a value of `application/sru+xml` need to be added to the CMDI record. The content of the `<cmdi:ResourceRef>` element must contain a URI that points to the Endpoint web service.
    1112 
    1113 Example:
    1114 {{{#!xml
    1115 <cmdi:CMD xmlns:cmdi="http://www.clarin.eu/cmd/" CMDVersion="1.1">
     874== Referring to an Endpoint from a CMDI record ==
     875Centers are encouraged to provide links to their CLARIN-FCS Endpoints in the metadata records for their resources. Other services, like the VLO, can use this information for automatically configuring an Aggregator for searching resources at the Endpoint.     To refer to an Endpoint, a `<cmdi:ResourceProxy>` element with child-element `<cmdi:ResourceType>` set to the value `SearchService` and a `@mimetype` attribute with a value of `application/sru+xml` need to be added to the CMDI record. The content of the `<cmdi:ResourceRef>` element must contain a URI that points to the Endpoint web service.
     876
     877Example: {{{#!xml <cmdi:CMD xmlns:cmdi="http://www.clarin.eu/cmd/" CMDVersion="1.1">
     878
    1116879  <cmdi:Header>
    1117     <!-- ... -->
    1118     <cmdi:MdSelfLink>http://hdl.handle.net/4711/0815</cmdi:MdSelfLink>
    1119     <!-- ... -->
    1120   </cmdi:Header>
    1121   <cmdi:Resources>
     880    <!-- ... --> <cmdi:MdSelfLink> http://hdl.handle.net/4711/0815 </cmdi:MdSelfLink > <!-- ... -->
     881  </cmdi:Header > <cmdi:Resources>
    1122882    <cmdi:ResourceProxyList>
    1123       <!-- ... -->
    1124       <cmdi:ResourceProxy id="r4711">
    1125         <cmdi:ResourceType mimetype="application/sru+xml">SearchService</cmdi:ResourceType>
    1126         <cmdi:ResourceRef>http://repos.example.org/fcs-endpoint</cmdi:ResourceRef>
    1127       </cmdi:ResourceProxy>
    1128       <!-- ... -->
    1129     </cmdi:ResourceProxyList>
    1130   </cmdi:Resources>
    1131   <!-- ... -->
    1132 </cmdi:CMD>
    1133 }}}
    1134 
    1135 == Endpoint custom extensions #extensionExample
     883      <!-- ... --> <cmdi:ResourceProxy id="r4711">
     884        <cmdi:ResourceType mimetype="application/sru+xml"> SearchService </cmdi:ResourceType > <cmdi:ResourceRef> http://repos.example.org/fcs-endpoint </cmdi:ResourceRef >
     885      </cmdi:ResourceProxy > <!-- ... -->
     886    </cmdi:ResourceProxyList >
     887  </cmdi:Resources > <!-- ... -->
     888
     889</cmdi:CMD> }}}
     890
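A service that harvests CMDI records could locate Endpoints roughly as follows. This is a non-normative Python sketch using only the standard library; it assumes the CMDI 1.1 namespace shown in the example above, and the function name `find_fcs_endpoints` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the CMDI example above.
CMDI_NS = {"cmdi": "http://www.clarin.eu/cmd/"}


def find_fcs_endpoints(cmdi_xml):
    """Return the Endpoint URIs referenced by SearchService resource proxies."""
    root = ET.fromstring(cmdi_xml)
    endpoints = []
    for proxy in root.iterfind(".//cmdi:ResourceProxy", CMDI_NS):
        rtype = proxy.find("cmdi:ResourceType", CMDI_NS)
        ref = proxy.find("cmdi:ResourceRef", CMDI_NS)
        if rtype is None or ref is None:
            continue
        is_search = (rtype.text or "").strip() == "SearchService"
        is_sru = rtype.get("mimetype") == "application/sru+xml"
        if is_search and is_sru:
            endpoints.append((ref.text or "").strip())
    return endpoints
```

Element text is stripped before comparison so that records with incidental whitespace inside the elements are still matched.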
     891== Endpoint custom extensions == #extensionExample
    1136892The CLARIN-FCS protocol specification allows Endpoints to add custom data to their responses, e.g. to provide hints to an (XSLT/XQuery) application that works directly on CLARIN-FCS. It could use the custom data to generate back and forward links for a GUI to navigate in a result set.
    1137893
    1138 The following example illustrates how extensions can be embedded into the Result Format:
    1139 {{{#!xml
    1140 <fcs:Resource xmlns:fcs="http://clarin.eu/fcs/resource" pid="http://hdl.handle.net/4711/0815">
    1141     <fcs:DataView type="application/x-clarin-fcs-hits+xml">
    1142       <hits:Result xmlns:hits="http://clarin.eu/fcs/dataview/hits">
    1143         The quick brown <hits:Hit>fox</hits:Hit> jumps over the lazy <hits:Hit>dog</hits:Hit>.
    1144       </hits:Result>
    1145     </fcs:DataView>
    1146    
    1147     <!--
    1148         NOTE: this is purely fictional and only serves to demonstrate how
    1149               to add custom extensions to the result representation
    1150               within CLARIN-FCS.
    1151     -->
    1152 
    1153     <!--
    1154         Example 1: a hypothetical Endpoint extension for navigation in a result
    1155         set: it basically provides a set of hrefs, that a GUI can convert into
    1156         navigation buttons.
    1157     -->
    1158     <nav:navigation xmlns:nav="http://repos.example.org/navigation">
    1159         <nav:curr href="http://repos.example.org/resultset/4711/4611" />
    1160         <nav:prev href="http://repos.example.org/resultset/4711/4610" />
    1161         <nav:next href="http://repos.example.org/resultset/4711/4612" />
    1162     </nav:navigation>
    1163 
    1164     <!--
    1165        Example 2: a hypothetical Endpoint extension for directly referencing parent
    1166        resources: it basically provides a link to the parent resource, that can be
    1167        exploited by a GUI (e.g. built on XSLT/XQuery).
    1168     -->
    1169     <parent:Parent xmlns:parent="http://repos.example.org/parent"
    1170                    ref="http://repos.example.org/path/to/parent/1235.cmdi" />
    1171 </fcs:Resource>
    1172 }}}
    1173 
    1174 == Endpoint highlight hints for repositories
     894The following example illustrates how extensions can be embedded into the Result Format: {{{#!xml <fcs:Resource xmlns:fcs="http://clarin.eu/fcs/resource" pid="http://hdl.handle.net/4711/0815">
     895
     896  <fcs:DataView type="application/x-clarin-fcs-hits+xml">
     897    <hits:Result xmlns:hits="http://clarin.eu/fcs/dataview/hits">
     898      The quick brown <hits:Hit>fox</hits:Hit> jumps over the lazy <hits:Hit>dog</hits:Hit>.
     899    </hits:Result>
     900  </fcs:DataView>
     901
     902  <!--
     903    NOTE: this is purely fictional and only serves to demonstrate how
     904      to add custom extensions to the result representation within CLARIN-FCS.
     905  -->
     906
     907  <!--
     908    Example 1: a hypothetical Endpoint extension for navigation in a result set: it basically provides a set of hrefs that a GUI can convert into navigation buttons.
     909  --> <nav:navigation xmlns:nav="http://repos.example.org/navigation">
     910    <nav:curr href="http://repos.example.org/resultset/4711/4611" /> <nav:prev href="http://repos.example.org/resultset/4711/4610" /> <nav:next href="http://repos.example.org/resultset/4711/4612" />
     911  </nav:navigation >
     912
     913  <!--
     914    Example 2: a hypothetical Endpoint extension for directly referencing parent resources: it basically provides a link to the parent resource that can be exploited by a GUI (e.g. built on XSLT/XQuery).
     915  --> <parent:Parent xmlns:parent="http://repos.example.org/parent"
     916    ref="http://repos.example.org/path/to/parent/1235.cmdi" />
     917
     918</fcs:Resource> }}}
     919
     920== Endpoint highlight hints for repositories ==
    1175921An Aggregator can use the `@ref` attributes of the `<fcs:Resource>`, `<fcs:ResourceFragment>` or `<fcs:DataView>` elements to provide a link for the user to directly jump to the resource at a Repository. To support hit highlighting, an Endpoint can augment the URI in the `@ref` attribute with query parameters to implement hit highlighting in the Repository.
    1176922
    1177 In the following example, the URI `http://repos.example.org/resource.cgi/<pid>` is a CGI script that displays a given resource at the Repository in HTML format and uses the `highlight` query parameter to add highlights to the resource. Of course, it's up to the Endpoint and the Repository, if and how they implement such a feature.
    1178 {{{#!xml
    1179 <fcs:Resource xmlns:fcs="http://clarin.eu/fcs/resource" pid="http://hdl.handle.net/4711/0815">
     923In the following example, the URI `http://repos.example.org/resource.cgi/<pid>` is a CGI script that displays a given resource at the Repository in HTML format and uses the `highlight` query parameter to add highlights to the resource. Of course, it's up to the Endpoint and the Repository, if and how they implement such a feature.  {{{#!xml <fcs:Resource xmlns:fcs="http://clarin.eu/fcs/resource" pid="http://hdl.handle.net/4711/0815">
     924
    1180925  <fcs:DataView type="application/x-clarin-fcs-hits+xml" ref="http://repos.example.org/resource.cgi/4711/0815?highlight=fox">
    1181926    <hits:Result xmlns:hits="http://clarin.eu/fcs/dataview/hits">
    1182       The quick brown <hits:Hit>fox</hits:Hit> jumps over the lazy dog.
    1183     </hits:Result>
    1184   </fcs:DataView>
    1185 </fcs:Resource>
    1186 }}}
     927      The quick brown <hits:Hit>fox</hits:Hit> jumps over the lazy dog.
     928    </hits:Result>
     929  </fcs:DataView>
     930</fcs:Resource>
     931}}}
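An Endpoint might build such a `@ref` value as sketched below. This Python example is non-normative: the `highlight` parameter name is taken from the example above and is specific to that fictional Repository, and the function name `highlight_ref` is hypothetical.

```python
from urllib.parse import urlencode


def highlight_ref(resource_url, term, param="highlight"):
    """Append a highlight query parameter to a Repository resource URL.

    The parameter name is an assumption; a real Repository defines its own.
    """
    separator = "&" if "?" in resource_url else "?"
    return resource_url + separator + urlencode({param: term})
```

For the example above, `highlight_ref("http://repos.example.org/resource.cgi/4711/0815", "fox")` yields the URI used in the `@ref` attribute. `urlencode` also takes care of escaping search terms that contain spaces or other reserved characters.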