Changes between Version 43 and Version 44 of FCS-Specification-ScrapBook


Timestamp:
02/13/14 20:17:52 (10 years ago)
Author:
oschonef
Comment:

--

  • FCS-Specification-ScrapBook

    v43 v44  
    55 2. Resource enumeration (aka scan on fcs.resource) rather complex and unintuitive
    66 3. Basic KWIC records have no provision for multiple "highlight" hits
    7  4. Clear recommendation for using Resource and !ResourceFragment
     7 4. No (clear) recommendation for using Resource and !ResourceFragment
    88 5. What about recursiveness in Resource (see current schema)? What is the use case?
    99
     
    3535The main goal of CLARIN federated content search (CLARIN-FCS) is to introduce an ''interface specification'' that decouples the ''search engine'' functionality from its ''exploitation'', i.e. user-interfaces and third-party applications, and allows services to access search engines in a uniform way.
    3636
    37 
    3837=== Terminology ===
    3938The key words `MUST`, `MUST NOT`, `REQUIRED`, `SHALL`, `SHALL NOT`, `SHOULD`, `SHOULD NOT`, `RECOMMENDED`, `MAY`, and `OPTIONAL` in this document are to be interpreted as described in [#REF_RFC_2119 RFC2119].
    4039
     40=== Glossary ===
     41 Aggregator::
     42    A module or service to dispatch queries to repositories and collect results.
     43
     44 CLARIN-FCS, FCS::
     45    CLARIN federated content search, an interface specification to allow searching within resource content of repositories.
     46
     47 Client::
     48    A software component that implements the interface specification to query Endpoints, e.g. an aggregator or a user interface.
     49
     50 CQL::
     51    Contextual Query Language, previously known as Common Query Language, is a formal language for representing queries to information retrieval systems such as search engines, bibliographic catalogs and museum collection information.
     52
     53 Endpoint::
     54    A software component, that implements the CLARIN-FCS interface specification and translates between CLARIN-FCS and a search engine.
     55
     56 Interface Specification::
     57    Common harmonized interface and suite of protocols that repositories need to implement.
     58
     59 Search Engine::
     60    A software component within a repository, that allows for searching within the repository contents.
     61
     62 SRU::
     63    Search and Retrieve via URL, is a protocol for Internet search queries.
     64
     65 Data View::
     66    A Data View is a mechanism to support different representations of search results, e.g. a "hits with highlights" view, an image or a geolocation.
     67
     68 Data View Payload, Payload::
     69    The actual content encoded within a Data View, i.e. a CMDI metadata record or a KML encoded geolocation.
     70
     71 PID::
     72    A Persistent identifier is a long-lasting reference to a digital object.
     73
     74 Repository::
     75    A software component at a CLARIN center that stores resources (= data) and information about these resources (= metadata).
     76
     77 Repository Registry::
     78    A separate service that allows registering Endpoints and provides information about these to other components, e.g. an aggregator. The [http://centres.clarin.eu/ CLARIN Center Registry] is an implementation of such a repository registry.
     79
     80=== Normative References ===
     81 RFC2119[=#REF_RFC_2119]::
     82    Key words for use in RFCs to Indicate Requirement Levels, IETF RFC 2119, March 1997, \\
     83    [http://www.ietf.org/rfc/rfc2119.txt]
     84
     85 XML-Namespaces[=#REF_XML_Namespaces]::
     86    Namespaces in XML 1.1 (Second Edition), W3C, August 2006, \\
     87    [http://www.w3.org/TR/2006/REC-xml-names11-20060816]
     88
     89 OASIS-SRU-Overview[=#REF_SRU_Overview]::
     90    searchRetrieve: Part 0. Overview Version 1.0, OASIS, January 2013, \\
     91    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part0-overview/searchRetrieve-v1.0-os-part0-overview.doc]
     92    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part0-overview/searchRetrieve-v1.0-os-part0-overview.html (HTML)],
     93    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part0-overview/searchRetrieve-v1.0-os-part0-overview.pdf (PDF)]
     94
     95 OASIS-SRU-APD[=#REF_SRU_APD]::
     96    searchRetrieve: Part 1. Abstract Protocol Definition Version 1.0, OASIS, January 2013, \\
     97    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part1-apd/searchRetrieve-v1.0-os-part1-apd.doc]
     98    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part1-apd/searchRetrieve-v1.0-os-part1-apd.html (HTML)]
     99    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part1-apd/searchRetrieve-v1.0-os-part1-apd.pdf (PDF)]
     100
     101 OASIS-SRU12[=#REF_SRU_12]::
     102    searchRetrieve: Part 2. SRU searchRetrieve Operation: APD Binding for SRU 1.2 Version 1.0, OASIS, January 2013, \\
     103    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part2-sru1.2/searchRetrieve-v1.0-os-part2-sru1.2.doc]
     104    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part2-sru1.2/searchRetrieve-v1.0-os-part2-sru1.2.html (HTML)]
     105    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part2-sru1.2/searchRetrieve-v1.0-os-part2-sru1.2.pdf (PDF)]
     106
     107 OASIS-CQL[=#REF_CQL]::
     108    searchRetrieve: Part 5. CQL: The Contextual Query Language version 1.0, OASIS, January 2013, \\
     109    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part5-cql/searchRetrieve-v1.0-os-part5-cql.doc]
     110    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part5-cql/searchRetrieve-v1.0-os-part5-cql.html (HTML)]
     111    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part5-cql/searchRetrieve-v1.0-os-part5-cql.pdf (PDF)]
     112
     113 SRU-Explain[=#REF_Explain]::
     114    searchRetrieve: Part 7. SRU Explain Operation version 1.0, OASIS, January 2013, \\
     115    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part7-explain/searchRetrieve-v1.0-os-part7-explain.doc]
     116    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part7-explain/searchRetrieve-v1.0-os-part7-explain.html (HTML)]
     117    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part7-explain/searchRetrieve-v1.0-os-part7-explain.pdf (PDF)]
     118
     119 SRU-Scan[=#REF_Scan]::
     120    searchRetrieve: Part 6. SRU Scan Operation version 1.0, OASIS, January 2013, \\
     121    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part6-scan/searchRetrieve-v1.0-os-part6-scan.doc] 
     122    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part6-scan/searchRetrieve-v1.0-os-part6-scan.html (HTML)]
     123    [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part6-scan/searchRetrieve-v1.0-os-part6-scan.PDF (PDF)]
     124
     125 LOC-SRU12[=#REF_LOC_SRU_12]::
     126    SRU Version 1.2: SRU !Search/Retrieve Operation, Library of Congress, \\
     127    [http://www.loc.gov/standards/sru/sru-1-2.html]
     128
     129 LOC-DIAG[=#REF_LOC_DIAG]::
     130    SRU Version 1.2: SRU Diagnostics List, Library of Congress,\\
     131    [http://www.loc.gov/standards/sru/diagnostics/diagnosticsList.html]
     132
     133=== Non-Normative References ===
     134 RFC6838[=#REF_RFC_6838]::
     135    Media Type Specifications and Registration Procedures, IETF RFC 6838, January 2013, \\
     136    [http://www.ietf.org/rfc/rfc6838.txt]
     137 RFC3023[=#REF_RFC_3023]::
     138    XML Media Types, IETF RFC 3023, January 2001, \\
     139    [http://www.ietf.org/rfc/rfc3023.txt]
     140 KML[=#REF_KML_Spec]::
     141    Keyhole Markup Language (KML), Open Geospatial Consortium, 2008, \\
     142    [http://www.opengeospatial.org/standards/kml]
    41143
    42144=== Typographic and XML Namespace conventions ===
     
    61163|| `kml`   || `http://www.opengis.net/kml/2.2`                || Keyhole Markup Language         || un-prefixed ||
    62164
    63 === Glossary ===
    64  Aggregator::
    65     A module or service to dispatch queries to repositories and collect results.
    66 
    67  CLARIN-FCS, FCS::
    68     CLARIN federated content search, an interface specification to allow searching within resource content of repositories.
    69 
    70  Client::
    71     A software component that implements the interface specification to query Endpoints, e.g. an aggregator or a user interface.
    72 
    73  CQL::
    74     Contextual Query Language, previously known as Common Query Language, is a formal language for representing queries to information retrieval systems such as search engines, bibliographic catalogs and museum collection information.
    75 
    76  Endpoint::
    77     A software component, that implements the CLARIN-FCS interface specification and translates between CLARIN-FCS and a search engine.
    78 
    79  Interface Specification::
    80     Common harmonized interface and suite of protocols that repositories need to implement.
    81 
    82  Search Engine::
    83     A software component within a repository, that allows for searching within the repository contents.
    84 
    85  SRU::
    86     Search and Retrieve via URL, is a protocol for Internet search queries.
    87 
    88  Data View::
    89     A Data View is a mechanism to support different representations of search results, e.g. a "hits with highlights" view, an image or a geolocation.
    90 
    91  Data View Payload, Payload::
    92     The actual content encoded within a Data View, i.e. a CMDI metadata record or a KML encoded geolocation.
    93 
    94  PID::
    95     A Persistent identifier is a long-lasting reference to a digital object.
    96 
    97  Repository::
    98     A software component at a CLARIN center that stores resources (= data) and information about these resources (= metadata).
    99 
    100  Repository Registry::
    101     A separate service that allows registering Endpoints and provides information about these to other components, e.g. an aggregator. The [http://centres.clarin.eu/ CLARIN Center Registry] is an implementation of such a repository registry.
    102 
    103 === Normative References ===
    104  RFC2119[=#REF_RFC_2119]::
    105     Key words for use in RFCs to Indicate Requirement Levels, IETF RFC 2119, March 1997, \\
    106     [http://www.ietf.org/rfc/rfc2119.txt]
    107 
    108  XML-Namespaces[=#REF_XML_Namespaces]::
    109     Namespaces in XML 1.1 (Second Edition), W3C, August 2006, \\
    110     [http://www.w3.org/TR/2006/REC-xml-names11-20060816]
    111 
    112  OASIS-SRU-Overview[=#REF_SRU_Overview]::
    113     searchRetrieve: Part 0. Overview Version 1.0, OASIS, January 2013, \\
    114     [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part0-overview/searchRetrieve-v1.0-os-part0-overview.doc]
    115     [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part0-overview/searchRetrieve-v1.0-os-part0-overview.html (HTML)],
    116     [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part0-overview/searchRetrieve-v1.0-os-part0-overview.pdf (PDF)]
    117 
    118  OASIS-SRU-APD[=#REF_SRU_APD]::
    119     searchRetrieve: Part 1. Abstract Protocol Definition Version 1.0, OASIS, January 2013, \\
    120     [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part1-apd/searchRetrieve-v1.0-os-part1-apd.doc]
    121     [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part1-apd/searchRetrieve-v1.0-os-part1-apd.html (HTML)]
    122     [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part1-apd/searchRetrieve-v1.0-os-part1-apd.pdf (PDF)]
    123 
    124  OASIS-SRU12[=#REF_SRU_12]::
    125     searchRetrieve: Part 2. SRU searchRetrieve Operation: APD Binding for SRU 1.2 Version 1.0, OASIS, January 2013, \\
    126     [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part2-sru1.2/searchRetrieve-v1.0-os-part2-sru1.2.doc]
    127     [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part2-sru1.2/searchRetrieve-v1.0-os-part2-sru1.2.html (HTML)]
    128     [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part2-sru1.2/searchRetrieve-v1.0-os-part2-sru1.2.pdf (PDF)]
    129 
    130  OASIS-CQL[=#REF_CQL]::
    131     searchRetrieve: Part 5. CQL: The Contextual Query Language version 1.0, OASIS, January 2013, \\
    132     [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part5-cql/searchRetrieve-v1.0-os-part5-cql.doc]
    133     [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part5-cql/searchRetrieve-v1.0-os-part5-cql.html (HTML)]
    134     [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part5-cql/searchRetrieve-v1.0-os-part5-cql.pdf (PDF)]
    135 
    136  SRU-Explain[=#REF_Explain]::
    137     searchRetrieve: Part 7. SRU Explain Operation version 1.0, OASIS, January 2013, \\
    138     [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part7-explain/searchRetrieve-v1.0-os-part7-explain.doc]
    139     [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part7-explain/searchRetrieve-v1.0-os-part7-explain.html (HTML)]
    140     [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part7-explain/searchRetrieve-v1.0-os-part7-explain.pdf (PDF)]
    141 
    142  SRU-Scan[=#REF_Scan]::
    143     searchRetrieve: Part 6. SRU Scan Operation version 1.0, OASIS, January 2013, \\
    144     [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part6-scan/searchRetrieve-v1.0-os-part6-scan.doc] 
    145     [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part6-scan/searchRetrieve-v1.0-os-part6-scan.html (HTML)]
    146     [http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part6-scan/searchRetrieve-v1.0-os-part6-scan.PDF (PDF)]
    147 
    148  LOC-SRU12[=#REF_LOC_SRU_12]::
    149     SRU Version 1.2: SRU !Search/Retrieve Operation, Library of Congress, \\
    150     [http://www.loc.gov/standards/sru/sru-1-2.html]
    151 
    152  LOC-DIAG[=#REF_LOC_DIAG]::
    153     SRU Version 1.2: SRU Diagnostics List, Library of Congress,\\
    154     [http://www.loc.gov/standards/sru/diagnostics/diagnosticsList.html]
    155 
    156 === Non-Normative References ===
    157  RFC6838[=#REF_RFC_6838]::
    158     Media Type Specifications and Registration Procedures, IETF RFC 6838, January 2013, \\
    159     [http://www.ietf.org/rfc/rfc6838.txt]
    160  RFC3023[=#REF_RFC_3023]::
    161     XML Media Types, IETF RFC 3023, January 2001, \\
    162     [http://www.ietf.org/rfc/rfc3023.txt]
    163  KML[=#REF_KML_Spec]::
    164     Keyhole Markup Language (KML), Open Geospatial Consortium, 2008, \\
    165     [http://www.opengeospatial.org/standards/kml]
    166 
    167165== CLARIN-FCS Interface Specification ==
    168 
    169166The CLARIN-FCS interface specification defines two profiles, an extensible result format, and a set of required operations. CLARIN-FCS is built on the SRU/CQL standard; additional functionality required for CLARIN-FCS is added through SRU/CQL's extension mechanisms.
    170167
     
    201198The following sections describe the CLARIN-FCS profiles and query and result formats, how SRU/CQL is used as a transport protocol in the context of CLARIN-FCS and the required CLARIN-FCS specific extensions to SRU.
    202199
    203 
    204200=== Profiles ===
    205201CLARIN-FCS defines two profiles:
     
    226222Endpoints and Clients `MUST` support the ''basic profile''. For now, Endpoints and Clients `MUST NOT` claim to support the ''extended profile''.
    227223
    228 
    229224=== Result Format ===
    230225CLARIN-FCS uses a customized format for returning results. ''Resource'' and ''Resource Fragments'' serve as containers for hit results, which are presented in one or more ''Data Views''. The following section describes the Resource format and Data View format, and section [#searchRetrieve Operation ''searchRetrieve''] describes how hits are embedded within SRU responses.
     
    293288Endpoints `MUST` serialize one Resource for each hit, i.e. they `MUST NOT` combine several hits in one Resource. E.g., if a query matches five different sentences within one text (= the resource), the Endpoint must serialize five Resources (= one per hit) and embed each within one SRU result (see [#searchRetrieve below]).
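
The one-Resource-per-hit rule can be sketched as follows. This is only an illustration under stated assumptions: the namespace URI bound to the `fcs` prefix, the `pid` attribute on `<fcs:Resource>`, the `text/plain` payload type, and the helper name `serialize_hits` are hypothetical and not taken from the text above.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI for the `fcs` prefix (not shown in this excerpt).
FCS_NS = "http://clarin.eu/fcs/1.0"


def serialize_hits(resource_pid, hits, payload_type="text/plain"):
    """Return one fcs:Resource element per hit (never one Resource for all hits).

    `resource_pid` identifies the text that contains the hits; `hits` is a
    list of matched sentences. The `pid` attribute and the plain-text payload
    are illustrative assumptions.
    """
    resources = []
    for hit_text in hits:
        # A fresh Resource container for every single hit.
        resource = ET.Element(f"{{{FCS_NS}}}Resource", {"pid": resource_pid})
        view = ET.SubElement(
            resource, f"{{{FCS_NS}}}DataView", {"type": payload_type}
        )
        view.text = hit_text
        resources.append(resource)
    return resources


# Five matching sentences in one text yield five Resources.
sentences = ["s1", "s2", "s3", "s4", "s5"]
resources = serialize_hits("http://hdl.handle.net/4711/0815", sentences)
print(len(resources))
```

Each returned element would then be embedded in its own SRU result record.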
    294289
    295 
    296290==== Data View ====
    297291A ''Data View'' serves as a container for representing search results within CLARIN-FCS. Data Views are designed to allow for different representations of results, i.e. they are deliberately kept open to allow further extensions with more supported Data View formats. The content of a Data View is called ''Payload''. Each Payload is typed and the type of the Payload is recorded in the `@type` attribute of the `<fcs:DataView>` element. The Payload type is identified by a MIME type ([#REF_RFC_6838 RFC6838], [#REF_RFC_3023 RFC3023]). If no existing MIME type can be used, implementors `SHOULD` define a proper private MIME type.
     
    327321}}}
    328322
    329 
    330323===== Component Metadata (CMDI) =====
    331324||=Description         =|| A CMDI metadata record ||
     
    348341}}}
    349342
    350 
    351343===== Images (IMG) =====
    352344||=Description         =|| An image related to the hit ||
     
    355347
    356348The ''Image'' Data View allows Endpoints to provide an image that is relevant to the hit, e.g. a facsimile of the source of a transcription. Endpoints `MUST` provide the payload by the ''reference'' method and the image file `SHOULD` be downloadable without any restrictions.
    357 
    358349 * Example:
    359350{{{#!xml
     
    361352<fcs:DataView type="image/png" ref="http://repos.example.org/resources/4711/0815.png" />
    362353}}}
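
On the Client side, reading such a by-reference image Data View could look roughly like the sketch below. It assumes a namespace URI for the `fcs` prefix that is not shown in this excerpt; the helper name `image_reference` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI for the `fcs` prefix (not shown in this excerpt).
FCS_NS = "http://clarin.eu/fcs/1.0"

# The example element from above, with the namespace declared explicitly.
SNIPPET = (
    f'<fcs:DataView xmlns:fcs="{FCS_NS}" type="image/png" '
    'ref="http://repos.example.org/resources/4711/0815.png" />'
)


def image_reference(data_view_xml):
    """Return (mime_type, ref) for an Image Data View given by reference."""
    view = ET.fromstring(data_view_xml)
    if view.tag != f"{{{FCS_NS}}}DataView":
        raise ValueError("not an fcs:DataView element")
    mime_type = view.get("type", "")
    ref = view.get("ref")
    # Image payloads are delivered by reference, so @ref has to be present.
    if not mime_type.startswith("image/") or ref is None:
        raise ValueError("not an image Data View delivered by reference")
    return mime_type, ref


print(image_reference(SNIPPET))
```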
    363 
    364354
    365355===== Geolocation (GEO) =====
     
    383373</fcs:DataView>
    384374}}}
    385 
    386375
    387376=== Endpoint Description ===#endpointDescription
     
    489478This more complex [#REF_Example_5 example] shows an Endpoint Description for an Endpoint that, similar to [#REF_Example_4 Example 4], supports the ''basic'' profile. In addition to the Generic Hits Data View, it also supports the CMDI Data View. The Endpoint has two top-level collections (identified by the persistent identifiers `http://hdl.handle.net/4711/0815` and `http://hdl.handle.net/4711/0816`). The second top-level collection has two sub-collections, identified by the persistent identifiers `http://hdl.handle.net/4711/0816-1` and `http://hdl.handle.net/4711/0816-2`. All collections are described using several properties, like title, description, etc.
    490479
    491 
    492480== CLARIN-FCS to SRU/CQL binding ==
    493 
    494481=== SRU/CQL ===
    495482SRU (!Search/Retrieve via URL) specifies a general communication protocol for searching and retrieving records, and CQL (Contextual Query Language) specifies an extensible query language. CLARIN-FCS is built on SRU 1.2. A subsequent specification may be built on SRU 2.0.
     
    515502Endpoints `MUST` generate diagnostics according to [#REF_SRU_12 OASIS-SRU-12, Appendix C] for error conditions or to indicate unsupported features. Unfortunately, the OASIS specification does not provide a comprehensive list of diagnostics for CQL related errors. Therefore, Endpoints `MUST` use diagnostics from [#REF_LOC_DIAG LOC-DIAG, section "Diagnostics Relating to CQL"] for CQL related errors.
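
A minimal sketch of how an Endpoint might build such a diagnostic for a CQL error is given below. It assumes the SRU diagnostic namespace `http://www.loc.gov/zing/srw/diagnostic/` and uses diagnostic 10 ("Query syntax error") from the LOC list as the example; the helper name `make_diagnostic` and the details text are hypothetical.

```python
import xml.etree.ElementTree as ET

# SRU diagnostic namespace; the `diag` prefix is only used for readable output.
DIAG_NS = "http://www.loc.gov/zing/srw/diagnostic/"
ET.register_namespace("diag", DIAG_NS)


def make_diagnostic(code, message, details=""):
    """Build a diagnostic element with uri, optional details, and message."""
    diag = ET.Element(f"{{{DIAG_NS}}}diagnostic")
    uri = ET.SubElement(diag, f"{{{DIAG_NS}}}uri")
    uri.text = f"info:srw/diagnostic/1/{code}"
    if details:
        ET.SubElement(diag, f"{{{DIAG_NS}}}details").text = details
    ET.SubElement(diag, f"{{{DIAG_NS}}}message").text = message
    return diag


# Diagnostic 10 reports a CQL query syntax error.
diagnostic = make_diagnostic(10, "Query syntax error", "unbalanced parenthesis")
print(ET.tostring(diagnostic, encoding="unicode"))
```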
    516503
    517 
    518504=== Operation ''explain'' ===#explain
    519 
    520505The ''explain'' operation of SRU serves to announce server capabilities and to allow clients to configure themselves automatically. CLARIN-FCS uses this operation in the same way.
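
As a minimal sketch, assuming the standard SRU 1.2 request parameters `operation` and `version`, a Client could build an ''explain'' request URL as follows. The base URL and the helper name `build_explain_url` are hypothetical.

```python
from urllib.parse import urlencode


def build_explain_url(endpoint_base_url):
    """Return an SRU 1.2 explain request URL for an Endpoint base URL."""
    params = {"operation": "explain", "version": "1.2"}
    # Append to an existing query string if the base URL already has one.
    separator = "&" if "?" in endpoint_base_url else "?"
    return endpoint_base_url + separator + urlencode(params)


# Hypothetical Endpoint base URL, for illustration only.
print(build_explain_url("http://repos.example.org/fcs-endpoint"))
```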
    521506
     
    600585}}}
    601586
    602 
    603587=== Operation ''scan'' ===#scan
    604 
    605588The SRU operation ''scan'' is currently not used in the ''basic'' profile of CLARIN-FCS. An ''extended'' profile may use this operation; therefore, it is `NOT RECOMMENDED` for Endpoints to define custom extensions that use this operation.
    606589
    607 
    608590=== Operation ''searchRetrieve'' ===#searchRetrieve
    609 
    610591Yada yada ...
    611592
    612593
    613594== Appendix: Non-normative ==
    614 
    615595The following sections are non-normative.
    616596
    617597=== Referring to an Endpoint from a CMDI record ===
    618 
    619598Centers are encouraged to provide links to their CLARIN-FCS Endpoints in the metadata records for their resources. Other services, like the VLO, can use this information for automatically configuring an Aggregator for searching resources at the Endpoint.
    620599   
    621600To refer to an Endpoint, a `<cmdi:ResourceProxy>` with `<cmdi:ResourceType>` set to the value `SearchService` and a `@mimetype` attribute with a value of `application/sru+xml` needs to be added to the CMDI record. The content of the `<cmdi:ResourceRef>` element must contain a URI that points to the Endpoint web service.
    622601
    623  * Example:
     602Example:
    624603{{{#!xml
    625604<cmdi:CMD xmlns:cmdi="http://www.clarin.eu/cmd/" CMDVersion="1.1">
     
    643622}}}
    644623
    645 
    646624=== Endpoint custom extensions ===
    647 
    648625The CLARIN-FCS protocol specification allows Endpoints to add custom data to their responses. This extension mechanism can for example be used to provide hints to an (XSLT/XQuery) application that works directly on CLARIN-FCS, e.g. to allow it to generate back and forward links to navigate in a result set.
    649626
     
    678655</fcs:Resource>
    679656}}}
    680 
    681657----