
Version 4 (modified by oschonef, 10 years ago)

--

FCS Specification Scrapbook

Issues with current document

  1. Incomprehensible and not well structured :(
  2. Resource enumeration (aka scan on fcs.resource) rather complex and unintuitive
  3. Basic KWIC records have no provision for multiple "highlight" hits
  4. No clear recommendation on when to use Resource and ResourceFragment

General ideas / design goals towards better specification

  1. Define FCS conformance levels independently of what SRU/CQL do. Don't call them "levels", but something like "profiles", to avoid confusion.
    1. Do a basic profile first
    2. Do an advanced/extended profile later in a separate specification or specification amendment (which must, of course, be compatible with the basic profile)
    3. Add provisions, e.g. in the explain output, to allow endpoints to indicate the profiles they support
  2. Better structure of document (and don't include aggregation stuff; that's a different specification; implementors of endpoints should not need to worry about aggregator implementation)
  3. Keep XML sanity always in mind (so there are no namespace issues as in CMDI)
  4. Honor and use extension hooks provided by SRU/CQL

Proposal for new specification

The following is a proposal for a revised federated content search specification. When done, cut and paste to the appropriate section of the Wiki and publish on the CLARIN web page.

CLARIN Federated Content Search (CLARIN-FCS)

Introduction

The main goal of CLARIN federated content search (CLARIN-FCS) is to introduce an interface specification that decouples search engine functionality from its exploitation, i.e. user interfaces and third-party applications, and allows services to access search engines in a uniform way.

Terminology

The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in RFC2119.

Glossary

Aggregator
A module or service to dispatch queries to repositories and collect results.

CLARIN-FCS, FCS
CLARIN federated content search, an interface specification to allow searching within resource content of repositories.
Client
A software component that implements the interface specification to query endpoints, e.g. an aggregator or a user interface.
CQL
Contextual Query Language, previously known as Common Query Language, is a formal language for representing queries to information retrieval systems such as search engines, bibliographic catalogs and museum collection information.
Endpoint
A software component that implements the CLARIN-FCS interface specification and translates between CLARIN-FCS and a search engine.
Interface Specification
Common harmonized interface and suite of protocols that repositories need to implement.
Search Engine
A software component within a repository that allows for searching within the repository contents.
SRU
Search/Retrieve via URL, a protocol for Internet search queries.
Data View
write def
PID
Persistent identifier, write more
Repository
A software component at a CLARIN center that stores resources (= data) and information about these resources (= metadata).
Repository Registry
A separate service that allows registering endpoints and provides information about these to other components, e.g. an aggregator. The CLARIN Center Registry is an implementation of such a repository registry.

Normative References

RFC2119
Key words for use in RFCs to Indicate Requirement Levels, IETF RFC 2119, March 1997,
http://www.ietf.org/rfc/rfc2119.txt
OASIS-SRU-Overview
searchRetrieve: Part 0. Overview Version 1.0, OASIS, January 2013,
http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/csd01/part0-overview/searchRetrieve-v1.0-csd01-part0-overview.doc
OASIS-SRU-APD
searchRetrieve: Part 1. Abstract Protocol Definition Version 1.0, OASIS, January 2013,
csd01-part1-apd.doc
OASIS-SRU12
searchRetrieve: Part 2. SRU searchRetrieve Operation: APD Binding for SRU 1.2 Version 1.0, OASIS, January 2013,
http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/csd01/part2-sru1.2/searchRetrieve-v1.0-csd01-part2-sru1.2.doc
OASIS-CQL
searchRetrieve: Part 5. CQL: The Contextual Query Language version 1.0, OASIS, January 2013,
part5-cql/searchRetrieve-v1.0-45 csd01-part5-cql.doc
SRU-Explain
searchRetrieve: Part 7. SRU Explain Operation version 1.0, OASIS, January 2013,
http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/csd01/part7-explain/searchRetrieve-v1.0-csd01-part7-explain.doc
SRU-Scan
searchRetrieve: Part 6. SRU Scan Operation version 1.0, OASIS, January 2014,
http://docs.oasis-open.org/search-ws/searchRetrieve/v1.0/csd01/part6-scan/searchRetrieve-v1.0-csd01-part6-scan.doc
LOC-SRU12
SRU Version 1.2: SRU Search/Retrieve Operation, Library of Congress,
http://www.loc.gov/standards/sru/sru-1-2.html

SRU/CQL

Endpoints MUST implement the SRU/CQL protocol suite as defined in OASIS-SRU-Overview, OASIS-SRU-APD, OASIS-CQL, SRU-Explain, and SRU-Scan, especially with respect to:

  • Data Model,
  • Query Model,
  • Processing Model,
  • Result Set Model, and
  • Diagnostics Model

Endpoints MUST implement the APD Binding for SRU 1.2, as defined in OASIS-SRU12. Endpoints MAY implement the APD binding for version 1.1 or version 2.0.

Endpoints MUST use the following namespace URIs for serializing responses:

  • http://www.loc.gov/zing/srw/ for SRU response documents, and
  • http://www.loc.gov/zing/srw/diagnostic/ for serializing diagnostics within SRU response documents.
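As an illustration of how a client might rely on these two namespace URIs, the following is a minimal sketch that reads the record count and any diagnostics from an SRU 1.2 response. The sample response is made up for this example and is not taken from a real endpoint; the function name `summarize_response` and the returned dictionary layout are assumptions of the sketch, not part of the specification.

```python
import xml.etree.ElementTree as ET

# Namespace URIs required by this section.
SRU_NS = "http://www.loc.gov/zing/srw/"
DIAG_NS = "http://www.loc.gov/zing/srw/diagnostic/"
NS = {"sru": SRU_NS, "diag": DIAG_NS}

# Hypothetical response, written by hand for illustration only.
SAMPLE_RESPONSE = """\
<sru:searchRetrieveResponse xmlns:sru="http://www.loc.gov/zing/srw/">
  <sru:version>1.2</sru:version>
  <sru:numberOfRecords>0</sru:numberOfRecords>
  <sru:diagnostics>
    <diag:diagnostic xmlns:diag="http://www.loc.gov/zing/srw/diagnostic/">
      <diag:uri>info:srw/diagnostic/1/10</diag:uri>
      <diag:message>Query syntax error</diag:message>
    </diag:diagnostic>
  </sru:diagnostics>
</sru:searchRetrieveResponse>
"""


def summarize_response(xml_text):
    """Return version, record count and diagnostics of an SRU response."""
    root = ET.fromstring(xml_text)
    expected_tag = "{%s}searchRetrieveResponse" % SRU_NS
    if root.tag != expected_tag:
        # Wrong element name or wrong namespace URI.
        raise ValueError("unexpected root element: %s" % root.tag)

    count_text = root.findtext("sru:numberOfRecords", namespaces=NS)
    diagnostics = [
        (
            diag.findtext("diag:uri", namespaces=NS),
            diag.findtext("diag:message", namespaces=NS),
        )
        for diag in root.iterfind("sru:diagnostics/diag:diagnostic", NS)
    ]
    return {
        "version": root.findtext("sru:version", namespaces=NS),
        "numberOfRecords": int(count_text) if count_text else None,
        "diagnostics": diagnostics,
    }


if __name__ == "__main__":
    print(summarize_response(SAMPLE_RESPONSE))
```

A response serialized with a different namespace URI would be rejected by the root element check, which is why fixing the URIs in the specification matters for interoperability.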

CLARIN-FCS deviates from the OASIS specifications OASIS-SRU-Overview and OASIS-SRU12 to ensure backwards compatibility with SRU 1.2 services as they were defined by LOC-SRU12.

CLARIN-FCS Interface Specification

Profiles

Yada yada yada ...

Data Views

Yada Yada yada ...

Operations

Yada yada yada ...

Endpoint Identification

Is mapped to SRU explain operation. Yada yada ...
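Since this section is still a stub, the following is only a rough sketch of what the mapping could look like on the client side: an SRU 1.2 explain request is a plain HTTP GET with the `operation` and `version` parameters. The endpoint base URL and the helper name `build_explain_url` are hypothetical.

```python
from urllib.parse import urlencode


def build_explain_url(base_url, version="1.2"):
    """Build the URL of an SRU explain request for an endpoint."""
    params = {"operation": "explain", "version": version}
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical endpoint address, used for illustration only.
    print(build_explain_url("https://repo.example.org/fcs"))
```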

Performing Queries and returning results

Is mapped to SRU SearchRetrieve operation. Yada yada ...
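As a hedged illustration of this mapping, a client could build an SRU 1.2 searchRetrieve request as sketched below. The parameter names (`operation`, `version`, `query`, `startRecord`, `maximumRecords`) come from SRU 1.2; the endpoint URL, the helper names and the choice to always send the search term as a quoted CQL string are assumptions of this sketch.

```python
from urllib.parse import urlencode


def quote_cql_term(term):
    """Wrap a search term in double quotes for use as a CQL term.

    Backslashes and double quotes inside the term are escaped with a
    backslash, so the term is passed through as a literal string.
    """
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def build_search_url(base_url, term, start_record=1, maximum_records=10):
    """Build the URL of an SRU 1.2 searchRetrieve request."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": quote_cql_term(term),
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    # urlencode percent-encodes the quoted CQL term for the query string.
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical endpoint address, used for illustration only.
    print(build_search_url("https://repo.example.org/fcs", "federated search"))
```

Quoting every term keeps the sketch simple; a real client would more likely pass through a complete CQL query supplied by the user.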