FCS Specification Scrapbook
Issues with current document
- Incomprehensible and not well structured :(
- Resource enumeration (aka scan on fcs.resource) rather complex and unintuitive
- Basic KWIC records have no provision for multiple "highlight" hits
- No clear recommendation for using Resource and ResourceFragment
General ideas / design goals towards better specification
- Define FCS conformance levels independently of what SRU/CQL do. Don't call them "levels", but maybe something like "profiles" to avoid confusion.
- Do a basic profile first
- Do an advanced/extended profile later in a separate specification or specification amendment (which must, of course, be compatible with the basic profile)
- Add provisions, e.g. in the explain output, to allow endpoints to indicate the profiles they support
- Better structure of document (and don't include aggregation stuff; that's a different specification; implementors of endpoints should not need to worry about aggregator implementation)
- Keep XML sanity always in mind (so there are no namespace issues as in CMDI)
- Honor and use extension hooks provided by SRU/CQL
Proposal for new specification
The following is a proposal for a revised federated content search specification. When done, cut and paste to the appropriate section of the Wiki and publish on the CLARIN web page.
CLARIN Federated Content Search (CLARIN-FCS)
Introduction
The main goal of CLARIN federated content search (CLARIN-FCS) is to introduce an interface specification that decouples search engine functionality from its exploitation (i.e. user interfaces and third-party applications) and allows services to access search engines in a uniform way.
The CLARIN-FCS interface specification is built upon the SRU/CQL standard; additional functionality required for CLARIN-FCS is added through SRU/CQL's extension mechanisms.
Terminology
The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in RFC2119.
Glossary
- Aggregator
- A module or service to dispatch queries to repositories and collect results.
- CLARIN-FCS, FCS
- CLARIN federated content search, an interface specification to allow searching within resource content of repositories.
- Client
- A software component that implements the interface specification to query endpoints, e.g. an aggregator or a user interface.
- CQL
- Contextual Query Language, previously known as Common Query Language, is a formal language for representing queries to information retrieval systems such as search engines, bibliographic catalogs and museum collection information.
- Endpoint
- A software component that implements the CLARIN-FCS interface specification and translates between CLARIN-FCS and a search engine.
- Interface Specification
- Common harmonized interface and suite of protocols that repositories need to implement.
- Search Engine
- A software component within a repository that allows for searching within the repository contents.
- SRU
- Search/Retrieve via URL, a protocol for Internet search queries.
- Data View
- write def
- PID
- Persistent identifier, write more
- Repository
- A software component at a CLARIN center that stores resources (= data) and information about these resources (= metadata).
- Repository Registry
- A separate service that allows registering endpoints and provides information about these to other components, e.g. an aggregator. The CLARIN Center Registry is an implementation of such a repository registry.
Normative References
- RFC2119
- Key words for use in RFCs to Indicate Requirement Levels, IETF RFC 2119, March 1997, http://www.ietf.org/rfc/rfc2119.txt
- OASIS-SRU-Overview
- searchRetrieve: Part 0. Overview Version 1.0, OASIS, January 2013, http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part0-overview/searchRetrieve-v1.0-os-part0-overview.doc (HTML) (PDF)
- OASIS-SRU-APD
- searchRetrieve: Part 1. Abstract Protocol Definition Version 1.0, OASIS, January 2013, http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part1-apd/searchRetrieve-v1.0-os-part1-apd.doc (HTML) (PDF)
- OASIS-SRU12
- searchRetrieve: Part 2. SRU searchRetrieve Operation: APD Binding for SRU 1.2 Version 1.0, OASIS, January 2013, http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part2-sru1.2/searchRetrieve-v1.0-os-part2-sru1.2.doc (HTML) (PDF)
- OASIS-CQL
- searchRetrieve: Part 5. CQL: The Contextual Query Language Version 1.0, OASIS, January 2013, http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part5-cql/searchRetrieve-v1.0-os-part5-cql.doc (HTML) (PDF)
- SRU-Explain
- searchRetrieve: Part 7. SRU Explain Operation Version 1.0, OASIS, January 2013, http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part7-explain/searchRetrieve-v1.0-os-part7-explain.doc (HTML) (PDF)
- SRU-Scan
- searchRetrieve: Part 6. SRU Scan Operation Version 1.0, OASIS, January 2013, http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/os/part6-scan/searchRetrieve-v1.0-os-part6-scan.doc (HTML) (PDF)
- LOC-SRU12
- SRU Version 1.2: SRU Search/Retrieve Operation, Library of Congress, http://www.loc.gov/standards/sru/sru-1-2.html
SRU/CQL
SRU (Search/Retrieve via URL) specifies a general communication protocol for searching and retrieving records, and CQL (Contextual Query Language) specifies an extensible query language. CLARIN-FCS is built on SRU 1.2; subsequent specifications may build on SRU 2.0.
Endpoints MUST implement the SRU/CQL protocol suite as defined in OASIS-SRU-Overview, OASIS-SRU-APD, OASIS-CQL, SRU-Explain, and SRU-Scan, especially with respect to:
- Data Model,
- Query Model,
- Processing Model,
- Result Set Model, and
- Diagnostics Model
Endpoints MUST implement the APD Binding for SRU 1.2, as defined in OASIS-SRU12. Endpoints MAY implement the APD Binding for version 1.1 or version 2.0.
Endpoints MUST use the following namespace URIs for serializing responses:
- http://www.loc.gov/zing/srw/ for SRU response documents, and
- http://www.loc.gov/zing/srw/diagnostic/ for diagnostics within SRU response documents.
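As an illustration of the two namespace URIs, the following minimal Python sketch shows how a client might read an SRU 1.2 response using exactly these namespaces. The sample response document, the prefixes `sru` and `diag`, and the helper name `summarize_response` are assumptions made for this example and are not defined by this specification.

```python
import xml.etree.ElementTree as ET

# Namespace URIs listed above; the prefixes are arbitrary local choices.
NS = {
    "sru": "http://www.loc.gov/zing/srw/",
    "diag": "http://www.loc.gov/zing/srw/diagnostic/",
}

# Hand-written sample response (illustrative assumption, not real endpoint output).
SAMPLE_RESPONSE = """\
<sru:searchRetrieveResponse xmlns:sru="http://www.loc.gov/zing/srw/">
  <sru:version>1.2</sru:version>
  <sru:numberOfRecords>0</sru:numberOfRecords>
  <sru:diagnostics>
    <diag:diagnostic xmlns:diag="http://www.loc.gov/zing/srw/diagnostic/">
      <diag:uri>info:srw/diagnostic/1/10</diag:uri>
      <diag:message>Query syntax error</diag:message>
    </diag:diagnostic>
  </sru:diagnostics>
</sru:searchRetrieveResponse>
"""


def summarize_response(xml_text):
    """Return version, record count and diagnostics of an SRU response (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # Reject documents whose root element is not in the SRU response namespace.
    expected_root = "{%s}searchRetrieveResponse" % NS["sru"]
    if root.tag != expected_root:
        raise ValueError("unexpected root element: " + root.tag)
    count = root.findtext("sru:numberOfRecords", namespaces=NS)
    # Diagnostics live in their own namespace inside the SRU response.
    diagnostics = [
        (d.findtext("diag:uri", namespaces=NS),
         d.findtext("diag:message", namespaces=NS))
        for d in root.iterfind("sru:diagnostics/diag:diagnostic", NS)
    ]
    return {
        "version": root.findtext("sru:version", namespaces=NS),
        "numberOfRecords": int(count) if count is not None else None,
        "diagnostics": diagnostics,
    }


if __name__ == "__main__":
    print(summarize_response(SAMPLE_RESPONSE))
```

A client written this way fails early on responses serialized with a different namespace, which is one reason for fixing the URIs in the specification.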
CLARIN-FCS deviates from the OASIS specifications OASIS-SRU-Overview and OASIS-SRU12 to ensure backwards compatibility with SRU 1.2 services as they were defined by LOC-SRU12.
CLARIN-FCS Interface Specification
Profiles
Yada yada yada ...
Data Views
Yada Yada yada ...
Operations
Yada yada yada ...
Endpoint Identification
Is mapped to SRU explain operation. Yada yada ...
Performing Queries and returning results
Is mapped to SRU SearchRetrieve operation. Yada yada ...
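Until these sections are written, the mapping to the SRU explain and searchRetrieve operations can be sketched from the client side. The following minimal Python example builds SRU 1.2 request URLs using the standard SRU parameters `operation`, `version`, `query`, `startRecord` and `maximumRecords`. The base URL `http://example.org/sru`, the function names, and the default values are assumptions for illustration only; any CLARIN-FCS specific parameters are not covered.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

SRU_VERSION = "1.2"  # CLARIN-FCS is built on SRU 1.2


def explain_url(base_url):
    """Build an SRU explain request URL (hypothetical helper)."""
    return base_url + "?" + urlencode(
        {"operation": "explain", "version": SRU_VERSION}
    )


def search_retrieve_url(base_url, cql_query, start_record=1, maximum_records=10):
    """Build an SRU searchRetrieve request URL for a CQL query (hypothetical helper)."""
    params = {
        "operation": "searchRetrieve",
        "version": SRU_VERSION,
        "query": cql_query,  # urlencode percent-encodes quotes and spaces
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    base = "http://example.org/sru"  # placeholder endpoint, not a real service
    print(explain_url(base))
    print(search_retrieve_url(base, '"federated search"', maximum_records=5))
```

Keeping URL construction in small helpers like these means a client only needs the endpoint base URL, for example as obtained from a repository registry, to address both operations.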