Changes between Version 2 and Version 3 of Taskforces/FCS/FCS-Specification-Draft


Timestamp:
10/19/15 12:51:31
Author:
Oliver Schonefeld
Comment:

--

  • Taskforces/FCS/FCS-Specification-Draft

    v2 v3  
    55[[PageOutline(1-6)]]
    66= CLARIN Federated Content Search (CLARIN-FCS) - Core 2.0
     7The goal of the ''CLARIN Federated Content Search (CLARIN-FCS) - Core'' specification is to introduce an ''interface specification'' that decouples the ''search engine'' functionality from its ''exploitation'' (e.g. user interfaces and third-party applications) and allows services to access heterogeneous search engines in a uniform way.
     8
    79= Introduction
    8 
     10{{{
     11#!div style="border: 1px solid #000000; font-size: 75%"
     12All following sub-sections to be updated as required.
     13}}}
    914== Terminology
     15The key words `MUST`, `MUST NOT`, `REQUIRED`, `SHALL`, `SHALL NOT`, `SHOULD`, `SHOULD NOT`, `RECOMMENDED`, `MAY`, and `OPTIONAL` in this document are to be interpreted as described in [#REF_RFC_2119 RFC2119].
     16
    1017== Glossary
     18 Aggregator::
     19    A module or service to dispatch queries to repositories and collect results.
     20
     21 CLARIN-FCS, FCS::
     22    CLARIN federated content search, an interface specification to allow searching within resource content of repositories.
     23
     24 Client::
     25    A software component that implements the interface specification to query Endpoints, e.g. an Aggregator or a user interface.
     26
     27 CQL::
     28    Contextual Query Language, previously known as Common Query Language, is a formal language for representing queries to information retrieval systems such as search engines, bibliographic catalogs and museum collection information.
     29
     30 Data View::
     31    A Data View is a mechanism to support different representations of search results, e.g. a "hits with highlights" view, an image or a geolocation.
     32
     33 Data View Payload, Payload::
     34    The actual content encoded within a Data View, e.g. a CMDI metadata record or a KML-encoded geolocation.
     35
     36 Endpoint::
     37    A software component that implements the CLARIN-FCS interface specification and translates between CLARIN-FCS and a search engine.
     38
     39 FCS-QL::
     40    Federated Content Search Query Language is the query language used in the advanced CLARIN-FCS profile. It is derived from the CQP query language of the Corpus Workbench ([#REF_CQP_Tutorial CQP-TUTORIAL]).
     41
     42 Hit::
     43    A piece of data returned by a Search Engine that matches the search criterion. What is considered a Hit depends heavily on the Search Engine.
     44
     45 Interface Specification::
     46    Common harmonized interface and suite of protocols that repositories need to implement.
     47
     48 PID::
     49    A Persistent identifier is a long-lasting reference to a digital object.
     50
     51 Repository::
     52    A software component at a CLARIN center that stores resources (= data) and information about these resources (= metadata).
     53
     54 Repository Registry::
     55    A separate service that allows registering Repositories and their Endpoints and provides information about these to other components, e.g. an Aggregator. The [http://centres.clarin.eu/ CLARIN Center Registry] is an implementation of such a repository registry.
     56
     57 Resource::
     58    A searchable and addressable entity at an Endpoint, such as a text corpus or a multi-modal corpus.
     59
     60 Resource Fragment::
     61    A smaller unit in a Resource, e.g. a sentence in a text corpus or a time interval in an audio transcription.
     62
     63 Result Set::
     64    An (ordered) set of hits that match a search criterion produced by a search engine as the result of processing a query.
     65
     66 Search Engine::
     67    A software component within a Repository that allows searching within the Repository's contents.
     68
     69 SRU::
     70    Search and Retrieve via URL is a protocol for Internet search queries. It was originally introduced by the Library of Congress [#REF_LOC_SRU_12 LOC-SRU12]; the standardization process later moved to OASIS [#REF_SRU_12 OASIS-SRU12].
     71
    1172== Normative References
     73 RFC2119[=#REF_RFC_2119]::
     74    Key words for use in RFCs to Indicate Requirement Levels, IETF RFC 2119, March 1997, \\
     75    [http://www.ietf.org/rfc/rfc2119.txt]
     76
     77 XML-Namespaces[=#REF_XML_Namespaces]::
     78    Namespaces in XML 1.0 (Third Edition), W3C, 8 December 2009, \\
     79    [http://www.w3.org/TR/2009/REC-xml-names-20091208/]
     80
     81 OASIS-SRU-Overview[=#REF_SRU_Overview]::
     82    searchRetrieve: Part 0. Overview Version 1.0, OASIS, January 2013, \\
     83    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part0-overview/searchRetrieve-v1.0-os-part0-overview.doc]
     84    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part0-overview/searchRetrieve-v1.0-os-part0-overview.html (HTML)]
     85    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part0-overview/searchRetrieve-v1.0-os-part0-overview.pdf (PDF)]
     86
     87 OASIS-SRU-APD[=#REF_SRU_APD]::
     88    searchRetrieve: Part 1. Abstract Protocol Definition Version 1.0, OASIS, January 2013, \\
     89    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part1-apd/searchRetrieve-v1.0-os-part1-apd.doc]
     90    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part1-apd/searchRetrieve-v1.0-os-part1-apd.html (HTML)]
     91    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part1-apd/searchRetrieve-v1.0-os-part1-apd.pdf (PDF)]
     92
     93 OASIS-SRU12[=#REF_SRU_12]::
     94    searchRetrieve: Part 2. SRU searchRetrieve Operation: APD Binding for SRU 1.2 Version 1.0, OASIS, January 2013, \\
     95    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part2-sru1.2/searchRetrieve-v1.0-os-part2-sru1.2.doc]
     96    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part2-sru1.2/searchRetrieve-v1.0-os-part2-sru1.2.html (HTML)]
     97    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part2-sru1.2/searchRetrieve-v1.0-os-part2-sru1.2.pdf (PDF)]
     98
     99 OASIS-CQL[=#REF_CQL]::
     100    searchRetrieve: Part 5. CQL: The Contextual Query Language version 1.0, OASIS, January 2013, \\
     101    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part5-cql/searchRetrieve-v1.0-os-part5-cql.doc]
     102    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part5-cql/searchRetrieve-v1.0-os-part5-cql.html (HTML)]
     103    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part5-cql/searchRetrieve-v1.0-os-part5-cql.pdf (PDF)]
     104
     105 SRU-Explain[=#REF_Explain]::
     106    searchRetrieve: Part 7. SRU Explain Operation version 1.0, OASIS, January 2013, \\
     107    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part7-explain/searchRetrieve-v1.0-os-part7-explain.doc]
     108    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part7-explain/searchRetrieve-v1.0-os-part7-explain.html (HTML)]
     109    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part7-explain/searchRetrieve-v1.0-os-part7-explain.pdf (PDF)]
     110
     111 SRU-Scan[=#REF_Scan]::
     112    searchRetrieve: Part 6. SRU Scan Operation version 1.0, OASIS, January 2013, \\
     113    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part6-scan/searchRetrieve-v1.0-os-part6-scan.doc] 
     114    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part6-scan/searchRetrieve-v1.0-os-part6-scan.html (HTML)]
     115    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part6-scan/searchRetrieve-v1.0-os-part6-scan.PDF (PDF)]
     116
     117 LOC-SRU12[=#REF_LOC_SRU_12]::
     118    SRU Version 1.2: SRU !Search/Retrieve Operation, Library of Congress, \\
     119    [http://www.loc.gov/standards/sru/sru-1-2.html]
     120
     121 LOC-DIAG[=#REF_LOC_DIAG]::
     122    SRU Version 1.2: SRU Diagnostics List, Library of Congress, \\
     123    [http://www.loc.gov/standards/sru/diagnostics/diagnosticsList.html]
     124
     125 CLARIN-FCS-!DataViews[=#REF_FCS_DataViews]::
     126    CLARIN Federated Content Search (CLARIN-FCS) - Data Views, SCCTC FCS Task-Force, April 2014, \\
     127    [https://trac.clarin.eu/wiki/FCS/Dataviews]
     128
    12129== Non-Normative References
     130 CQP-TUTORIAL[=#REF_CQP_Tutorial]::
     131    Evert et al.: The IMS Open Corpus Workbench (CWB) CQP Query Language Tutorial, CWB Version 3.0, February 2010, \\
     132    [http://cwb.sourceforge.net/files/CQP_Tutorial/]
     133
     134 RFC6838[=#REF_RFC_6838]::
     135    Media Type Specifications and Registration Procedures, IETF RFC 6838, January 2013, \\
     136    [http://www.ietf.org/rfc/rfc6838.txt]
     137
     138 RFC3023[=#REF_RFC_3023]::
     139    XML Media Types, IETF RFC 3023, January 2001, \\
     140    [http://www.ietf.org/rfc/rfc3023.txt]
     141
    13142== Typographic and XML Namespace conventions
    14 {{{
    15 #!div style="border: 1px solid #000000; font-size: 75%"
    16 All sub-sections to be updated as required.
    17 }}}
     143The following typographic conventions for XML fragments will be used throughout this specification:
     144 * `<prefix:Element>` \\ An XML element with the Generic Identifier ''Element'' that is bound to an XML namespace denoted by the prefix ''prefix''.
     145 * `@attr` \\ An XML attribute with the name ''attr''.
     146{{{#!comment
     147 * `@prefix:attr` \\ An XML attribute with the name ''attr'' that is bound to an XML namespace denoted by the prefix ''prefix''.
     148}}}
     149 * `string` \\ The literal ''string'' must be used either as element content or attribute value.
     150Endpoints and Clients `MUST` adhere to the [#REF_XML_Namespaces XML-Namespaces] specification. The CLARIN-FCS interface specification generally does not dictate whether XML elements should be serialized in their prefixed or un-prefixed syntax, but Endpoints `MUST` ensure that the correct XML namespace is used for elements and that XML namespaces are declared correctly. Clients `MUST` be agnostic about the syntax used to serialize XML elements, i.e. whether the prefixed or un-prefixed variant was used, and `SHOULD` operate solely on ''expanded names'', i.e. pairs of ''namespace name'' and ''local name''.
     151
     152The following XML namespace names and prefixes are used throughout this specification. The column "Recommended Syntax" indicates which syntax variant `SHOULD` be used by the Endpoint to serialize the XML response.
     153||=Prefix =||=Namespace Name                                 =||=Comment                          =||=Recommended Syntax =||
     154|| `fcs`   || `http://clarin.eu/fcs/resource`                 || CLARIN-FCS Resources              || prefixed ||
     155|| `ed`    || `http://clarin.eu/fcs/endpoint-description`     || CLARIN-FCS Endpoint Description   || prefixed ||
     156|| `hits`  || `http://clarin.eu/fcs/dataview/hits`            || CLARIN-FCS Generic Hits Data View || prefixed ||
     157|| `adv`   || `http://clarin.eu/fcs/dataview/advanced`        || CLARIN-FCS Advanced Data View     || prefixed ||
     158|| `sru`   || `http://www.loc.gov/zing/srw/`                  || SRU                               || prefixed ||
     159|| `diag`  || `http://www.loc.gov/zing/srw/diagnostic/`       || SRU Diagnostics                   || prefixed ||
     160|| `zr`    || `http://explain.z3950.org/dtd/2.0/`             || SRU/ZeeRex Explain                || prefixed ||
     161{{{
     162#!div style="border: 1px solid #000000; font-size: 75%"
     163Careful with the SRU namespaces; they probably need to be adjusted for SRU 2.0 (=> OASIS).
     164}}}
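
The requirement that Clients operate on ''expanded names'' can be illustrated with a small sketch. It is not part of the specification: the element name `Resource`, the `pid` value and the helper names are assumptions made for illustration; only the namespace name is taken from the table above. The sketch parses a prefixed and an un-prefixed serialization of the same element and compares them by (namespace name, local name) pair.

```python
import xml.etree.ElementTree as ET

# Namespace name for the `fcs` prefix, as listed in the namespace table.
FCS_NS = "http://clarin.eu/fcs/resource"

# Two hypothetical serializations of the same element (illustrative only):
# one uses the prefixed syntax, the other a default namespace declaration.
PREFIXED = '<fcs:Resource xmlns:fcs="http://clarin.eu/fcs/resource" pid="hdl:example"/>'
UNPREFIXED = '<Resource xmlns="http://clarin.eu/fcs/resource" pid="hdl:example"/>'


def expanded_name(element):
    """Return the (namespace name, local name) pair of a parsed element.

    ElementTree stores qualified tags as "{namespace}local", so the prefix
    chosen by the Endpoint is already discarded at this point.
    """
    tag = element.tag
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag  # element is in no namespace


def is_fcs_resource(xml_text):
    """Check the root element by expanded name, never by prefix."""
    root = ET.fromstring(xml_text)
    return expanded_name(root) == (FCS_NS, "Resource")
```

Both `is_fcs_resource(PREFIXED)` and `is_fcs_resource(UNPREFIXED)` return `True`, while an element named `Resource` in no namespace is rejected. Comparing literal strings such as `"fcs:Resource"` would break as soon as an Endpoint chose a different prefix.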
     165
    18166= CLARIN-FCS Interface Specification
    19167== "Discovery Phase"
     
    48196{{{
    49197#!div style="border: 1px solid #000000; font-size: 75%"
    50 Say something about backwards compatibility with "basic-search"
     198Say something about backwards compatibility with "basic-search". \\
     199Clients should also be compatible with FCS 1.0 (= SRU 1.2) and use a heuristic to determine whether an Endpoint is still using FCS 1.0.
    51200}}}
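
The draft does not yet define the heuristic mentioned in the note above. One possible approach, sketched here purely as an assumption, is to inspect the namespace of the root element of an Endpoint's response: if it is the SRU 1.2 namespace listed in the namespace table (`http://www.loc.gov/zing/srw/`), the Client treats the Endpoint as FCS 1.0. The function name and the sample documents are hypothetical, and the second sample uses a placeholder namespace rather than a real SRU 2.0 one.

```python
import xml.etree.ElementTree as ET

# SRU 1.2 namespace name as listed for the `sru` prefix in this draft.
SRU_12_NS = "http://www.loc.gov/zing/srw/"


def looks_like_fcs_1_0(response_xml):
    """Hypothetical heuristic: True if the response root element is in the
    SRU 1.2 namespace, which this sketch takes as a sign of FCS 1.0."""
    root = ET.fromstring(response_xml)
    tag = root.tag
    # ElementTree encodes qualified names as "{namespace}local".
    namespace = tag[1:].split("}", 1)[0] if tag.startswith("{") else ""
    return namespace == SRU_12_NS


# Illustrative responses (assumed shapes, not taken from the specification).
LEGACY_RESPONSE = (
    '<sru:explainResponse xmlns:sru="http://www.loc.gov/zing/srw/">'
    "<sru:version>1.2</sru:version>"
    "</sru:explainResponse>"
)
OTHER_RESPONSE = (
    '<explainResponse xmlns="http://example.org/placeholder-sru-namespace">'
    "<version>2.0</version>"
    "</explainResponse>"
)
```

With these samples, `looks_like_fcs_1_0(LEGACY_RESPONSE)` is `True` and `looks_like_fcs_1_0(OTHER_RESPONSE)` is `False`. A Client would still need a fallback for Endpoints that answer with a diagnostic or with malformed XML.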
    52201==== Endpoint Custom Extensions