Version 61 (modified by Oliver Schonefeld, 10 years ago)


Warning: This information is deprecated! Please find current information at the Core Specification and supplementary Data Views Specification pages.

FCS Endpoints

This page was started to collect preliminary information about the individual repositories, centers, providers, and endpoints within FederatedSearch.

In the meantime, the CenterRegistry has been set up to collect and provide information about the available repositories (see below); it is meant to become the sole source of information about the individual endpoints. However, not all endpoints provide a description in the CenterRegistry yet, so the list of endpoints below is kept for the time being, also for reference. This page also covers issues relevant to FCS-endpoints. (But see also the FCS specification.)

Announcing the endpoints

There are two ways of announcing an FCS-endpoint.

  1. CenterProfile published via the Center Registry
  2. SearchService-ResourceProxy in a CMD record of a resource/collection.

CenterProfile in the Center Registry

In a service-oriented architecture (SOA), a separate registry service is required to announce the individual services/endpoints. Within CLARIN, the CenterRegistry has been introduced to collect and expose organizational, administrative, and technical information about the individual CLARIN centers.

This information is encoded as a dedicated CMD record according to the CenterProfile.

These descriptions shall cover all aspects (especially all available services) of a given center, so the information about the FCS-endpoint is just one part of the description. Following is a snippet of a CenterProfile linking to the FCS-endpoint:


(Sidenote: There is a semantic/functional overlap between such a CMD record for the endpoint and the SRU explain response of the given service.
TODO: clarify the interaction and dependencies.)

Discussion: replicating Metadata about resources

There is an ongoing discussion about how much information about the resources provided via the endpoint should be included in the endpoint description (the foremost example being the Language), as the resources are already described in separate CMD records.

So it would be "cleaner" to just link to the separate CMD records of a collection or resource, which would provide information like ResourceType, Language, available AnnotationTiers, Time/Space Coverage etc. However, that would put an undue burden on the aggregator/client, which would have to crawl through and resolve multiple metadata records and make sense of the heterogeneous structure of the CMD records to get to the required information.

Thus, for now, Language (and possibly a few other basic fields) will be added to the endpoint description. But there are plans (and prototyping work at Meertens) to combine metadata and content search, which would allow filtering the content search by any metadata query.


SearchService-ResourceProxy in a CMD record

Another way of linking to an FCS-endpoint (bottom-up) is to add a SearchService-ResourceProxy to the CMD record of any collection/resource.

The obvious semantics is that the stated FCS-endpoint provides search capabilities over the given resource. However, because the SearchService-Proxy leads only to the "nearest" endpoint, which may expose multiple resources, the endpoint has to accept the parameter x-context to restrict the search to the given resource. A client/aggregator invoking a search for a given resource passes the PID of the resource as the x-context parameter.
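As a rough illustration of the x-context mechanism, a client might build its request URL as sketched below. The endpoint URL, the PID, and the helper name build_search_url are made up for this example; operation, version, query and maximumRecords are standard SRU parameters, and x-context is the extra parameter described above.

```python
from urllib.parse import urlencode


def build_search_url(endpoint, query, context_pid, max_records=10):
    """Build an SRU searchRetrieve URL restricted to one resource.

    Assumes `endpoint` is a base URL without a query string.
    """
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": query,
        "maximumRecords": str(max_records),
        # PID of the resource the search should be restricted to
        "x-context": context_pid,
    }
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint and PID, for illustration only
print(build_search_url("https://example.org/fcs/sru", "Haus", "hdl:1234/abc-5678"))
```

urlencode percent-encodes the PID, so characters such as ":" and "/" in a handle do not interfere with the rest of the URL.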

This method is especially important for supporting the combined metadata/content search: with the information about the respective SearchService inside the CMD descriptions of the resources, it will be easy to run any metadata query, extract a list of FCS-endpoints from the result set, and pass it e.g. to the FCS-Aggregator to continue searching in the content.
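The extraction step could look roughly like the following sketch. It assumes that each CMD record carries its proxies as ResourceProxy elements with ResourceType and ResourceRef children; the sample record, its namespace, its URLs, and the function names are invented for illustration. Namespaces are ignored on purpose so the sketch does not depend on a particular CMD version.

```python
import xml.etree.ElementTree as ET

# Invented sample record; real CMD records contain much more than this.
SAMPLE_CMD = """<CMD xmlns="http://www.clarin.eu/cmd/">
  <Resources>
    <ResourceProxyList>
      <ResourceProxy id="r1">
        <ResourceType>Resource</ResourceType>
        <ResourceRef>hdl:1234/abc-5678</ResourceRef>
      </ResourceProxy>
      <ResourceProxy id="s1">
        <ResourceType>SearchService</ResourceType>
        <ResourceRef>https://example.org/fcs/sru</ResourceRef>
      </ResourceProxy>
    </ResourceProxyList>
  </Resources>
</CMD>"""


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def search_service_refs(cmd_xml):
    """Return the ResourceRef of every SearchService proxy in one CMD record."""
    refs = []
    for elem in ET.fromstring(cmd_xml).iter():
        if local_name(elem.tag) != "ResourceProxy":
            continue
        fields = {local_name(c.tag): (c.text or "").strip() for c in elem}
        if fields.get("ResourceType") == "SearchService" and fields.get("ResourceRef"):
            refs.append(fields["ResourceRef"])
    return refs


def collect_endpoints(records):
    """Collect distinct endpoint URLs from a metadata result set, keeping order."""
    endpoints = []
    for record in records:
        for ref in search_service_refs(record):
            if ref not in endpoints:
                endpoints.append(ref)
    return endpoints


print(collect_endpoints([SAMPLE_CMD, SAMPLE_CMD]))
```

The resulting list is what would be handed on to the aggregator; the PIDs of the matching resources would travel along as x-context values.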

Implementing Endpoints

Here is a preliminary list of services available (or planned) within the FederatedSearch:

Provider | DB/Collection | link | type | language | status
KNAW | mimore |  | dialects Lexicon? | Dutch dialects | online, level 0, open issues, sample query
MPI | IMDI-subset |  | spoken corpora | Dutch, German, English, French, Swedish | online, level 0, open issues, sample query
INL | INL |  | text corpus | historic Dutch | online, level 0, open issues, sample query
INL | INL |  | text corpus | historic Dutch | online, level 0, open issues, sample query
DANS | Lieffering |  | historical text corpus | Dutch, French | online, level 1, no known issues, sample query
ICLTT | C4 |  | historical text corpus | German | online, level 0, open issues, sample query
UPF | El Pais Newspaper Corpus 2005 |  | text corpus | Spanish | online, allows index-based queries (e.g. queries only on content ccs.content=... sample query)
Uni Tübingen | Tübingen Baumbank des Deutschen - Diachrones Corpus |  | text corpus | German | online, level 0, sample query
IDS | Goethe |  | text corpus | German | online, level 0, sample query, enumerate corpora
IDS | TextGrid Digital Library (Literature folder) |  | text corpus | German | online, level 0, highly experimental, sample query, enumerate corpora
Uni Leipzig | Leipzig Corpora Collection, Deutscher Wortschatz |  | text corpus | multiple languages | online, level 1, sample query, WIP
BBAW | DTA Corpus |  | historical text corpus | German | online, level 0, sample query
BBAW | Dingler Online |  | historical text corpus | German | online, level 0, sample query
BBAW | C4 Corpus |  | reference corpus | German | online, level 0, sample query
UdS | GRUG Treebank |  | parallel Treebank | German | online, level 0, sample query
UdS | Saarbrücken Cookbook Corpora (SaCoCo) |  | diachronic corpus | German | online, level 0, sample query
BAS | Various |  | speech corpora | German, English, Japanese | online, level 0, sample query

List of corpora per endpoint.

MPI for Psycholinguistics (


Candidate Services

Provider | DB/Collection | link | type | language | status
OTA | BNC | ? | text corpus | English | potential
OTA | TRACTOR-archive |  | text corpus? | Central and East EU | potential
BAS | Speech Corpora |  | spoken corpora | German | planned
Gothenburg Uni | Spraakbanken |  | literary texts | Swedish? | planned?
Gothenburg Uni | Spraakbanken |  | text corpus | Swedish? | planned?
UPF | El Pais Newspaper Corpus (Metadata) |  | text corpus | Catalan | only metadata (date, lang), sample query?
ICLTT | CMDI-MDService |  | Metadata | multiple langs | online, but very sloppy, too different

The following are some reference SRU services:

Provider endpoint info
Library of Congress ?
Oxford English Dictionary info about OED-SRU service
Gutenberg (metadata) provided by indexdata project Open Content

See also for DE endpoint candidates:

SRU/CQL conformance testing

An SRU Server Tester for testing basic protocol conformance is available at:

Preliminary CLARIN FCS SRU/CQL tester at (requires authentication):

Current Issues

At the time of writing (3rd of May) there are some issues with all of the SRU/CQL implementations. They are listed here per service.

  • Does not implement x-cmd-collections
  • <sru:recordData/> is a closed element instead of wrapping around the <css:Resource>
  • How to generate a link to the resource containing the hit?

  • Content in the DataView element is not XML-ified. There should be at least some wrapping elements around the hit.

  • Does not implement scan on cmd.collections
  • How to generate a link to the resource containing the hit?

  • Encodes the searchRetrieve response in DCU format (and not our format)
  • Does not seem to use x-cmd-collections

  • searchRetrieve returns a completely different XML structure.
  • scan on cmd.collections generates a NullPointerException
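Some of these problems can be detected mechanically. The sketch below checks for the empty recordData issue noted above: it reports every record whose sru:recordData element has no child element. The SRU namespace is the one used by SRU 1.1/1.2 responses; the sample response, the placeholder payload namespace, and the function name are invented for illustration, and this is not the CLARIN tester mentioned earlier.

```python
import xml.etree.ElementTree as ET

SRU_NS = "http://www.loc.gov/zing/srw/"

# Invented response: the first record is empty, the second wraps a payload.
SAMPLE_RESPONSE = """<sru:searchRetrieveResponse xmlns:sru="http://www.loc.gov/zing/srw/">
  <sru:records>
    <sru:record>
      <sru:recordData/>
    </sru:record>
    <sru:record>
      <sru:recordData>
        <Resource xmlns="urn:example:placeholder"/>
      </sru:recordData>
    </sru:record>
  </sru:records>
</sru:searchRetrieveResponse>"""


def empty_record_data(response_xml):
    """Return 1-based positions of records whose recordData has no child element."""
    root = ET.fromstring(response_xml)
    empty = []
    for position, record_data in enumerate(root.iter("{%s}recordData" % SRU_NS), start=1):
        if len(record_data) == 0:  # len() of an Element counts its children
            empty.append(position)
    return empty


print(empty_record_data(SAMPLE_RESPONSE))
```

A fuller check would also verify which element is wrapped; that is left out here because it depends on the exact result format.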


An associated service must regularly check the availability of the services (and perhaps more). A simple Nagios plugin is available.
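What such a check might do is sketched below. This is not the plugin mentioned above, only a minimal stand-in under stated assumptions: the endpoint answers a plain SRU explain request, and the endpoint URL is passed on the command line. The exit codes 0 (OK) and 2 (CRITICAL) and the single status line follow the usual Nagios plugin convention.

```python
import sys
import urllib.request
import xml.etree.ElementTree as ET

OK, CRITICAL = 0, 2  # Nagios plugin exit codes


def fetch(url, timeout=10):
    """Fetch a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def check_endpoint(url, fetcher=fetch):
    """Send an SRU explain request and return (exit_code, status_line)."""
    try:
        body = fetcher(url + "?operation=explain&version=1.2")
        root = ET.fromstring(body)
    except Exception as exc:  # network errors and unparsable XML both count as down
        return CRITICAL, "CRITICAL - %s: %s" % (url, exc)
    if root.tag.rsplit("}", 1)[-1] != "explainResponse":
        return CRITICAL, "CRITICAL - %s: unexpected root element %s" % (url, root.tag)
    return OK, "OK - %s answered the explain request" % url


if __name__ == "__main__" and len(sys.argv) > 1:
    code, message = check_endpoint(sys.argv[1])
    print(message)
    sys.exit(code)
```

The fetcher argument exists only so that the check can be exercised without network access, for example with a function that returns a canned explain response.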

Further (future) Issues

  • Integration with MD-search
  • Integration with VLO
  • Integration with the Virtual Collection Registry
  • Aggregator/endpoint.

We note that all three will ideally be integrated in the same manner. For this to happen there are a few conditions that MUST be met:

  • All centres generate nice CMDI
  • This CMDI is of the same granularity as the search constrainability in the endpoint
  • Everyone includes a unique set identifier in the CMDI files (e.g., the node-id mapping at the MPI).
  • The endpoints understand the unique identifiers from their corresponding CMDI files!