
Version 15 (modified by vronk, 13 years ago)


Towards a FCS Specification

In which we list things we agree should be in our future standard / specification.


  1. Towards a FCS Specification
    1. Contents
    2. SRU-CQL servlets
      1. Explain response
      2. Search context
      3. Scan response
      4. Search result
    3. FCS webservice
    4. Clarin Metadata Search

SRU-CQL servlets

Explain response

Ideally, the explain response should list the available search indexes, each mapped to an ISOcat data category where one exists.

Example (tentative):

  <set identifier="" name="isocat"/>
  <set identifier="" name="ccs"/>

  <index id="?">
    <title lang="en">Part of Speech</title>
    <map><name set="isocat">partOfSpeech</name></map>
  </index>

  <index id="?">
    <title lang="en">Words</title>
    <map><name set="ccs">words</name></map>
  </index>

  <index id="?">
    <title lang="en">Phonetics</title>
    <map><name set="ccs">phonetics</name></map>
  </index>
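
As a rough illustration of how a client might consume such an index list, the following Python sketch extracts the (set, index name, title) triples and builds qualified CQL index names from them. The wrapping indexInfo element, the sample data and the function names list_indexes and cql_index are assumptions made for this example only; the index ids remain placeholders, as in the tentative example above.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: the fragment is wrapped in an <indexInfo> root so that it
# parses as one document; the index ids are placeholders.
EXPLAIN_FRAGMENT = """
<indexInfo>
  <set identifier="" name="isocat"/>
  <set identifier="" name="ccs"/>
  <index id="?">
    <title lang="en">Part of Speech</title>
    <map><name set="isocat">partOfSpeech</name></map>
  </index>
  <index id="?">
    <title lang="en">Words</title>
    <map><name set="ccs">words</name></map>
  </index>
</indexInfo>
"""


def list_indexes(xml_text):
    """Return (set name, index name, title) for every index in the fragment."""
    root = ET.fromstring(xml_text)
    indexes = []
    for index in root.iter("index"):
        title = index.findtext("title", default="")
        for name in index.iter("name"):
            indexes.append((name.get("set"), name.text, title))
    return indexes


def cql_index(set_name, index_name):
    """Build a qualified CQL index name of the form 'set.index'."""
    return f"{set_name}.{index_name}"


if __name__ == "__main__":
    for set_name, index_name, title in list_indexes(EXPLAIN_FRAGMENT):
        print(f"{title}: {cql_index(set_name, index_name)}")
```

A client could use the resulting names, such as isocat.partOfSpeech, as the index part of a CQL search clause.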


Search context

The search space shall be restricted via the x-cmd-context parameter, which obsoletes x-cmd-collections.
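
A minimal sketch of how a client might pass the context in an SRU searchRetrieve request, written in Python. The base URL, the query and the context PID are made-up placeholders, and build_search_url is a hypothetical helper name; only the x-cmd-context parameter itself comes from the text above.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query, context=None, maximum_records=10):
    """Build an SRU searchRetrieve URL, optionally restricted to a context."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": query,
        "maximumRecords": str(maximum_records),
    }
    if context:
        # Restrict the search space to the given resource or collection.
        params["x-cmd-context"] = context
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # ASSUMPTION: endpoint URL and context PID are illustrative only.
    print(build_search_url(
        "http://example.org/sru",
        'isocat.partOfSpeech = "noun"',
        context="hdl:0000/example-collection",
    ))
```

When no context is given, the parameter is simply omitted and the endpoint searches its whole content.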

Scan response

Search result

 <ccs:Resource pid="{PID of the resource}">
    <ccs:Metadata cmd-link="{PID of the CMD-record}">  <!-- cmd-link is optional -->
      <!-- this is for metadata provided directly by the content provider,
           NOT the CMD metadata -->
      <ccs:f key="{any metadata-key}">{any metadata value}</ccs:f>
    </ccs:Metadata>
    <ccs:ResourceFragment pid="{offset relative to the parent resource PID}">
      <ccs:Metadata>   <!-- metadata pertaining to the specific (matching) fragment, like metadata on the "current" speaker -->
      </ccs:Metadata>

      <ccs:DataView type="text/xml"><meertens:any/></ccs:DataView>

      <ccs:DataView type="image/jpeg">
        <link>{optional link to the data}</link>
      </ccs:DataView>
    </ccs:ResourceFragment>
 </ccs:Resource>

Resource: an element representing a resource and carrying its identifier. It may represent anything that has a PID (and an MDRecord), so in particular it may also be a collection aggregating other Resources. Allowed children are: Resource, ResourceFragment, Metadata and DataView.
ResourceFragment: a part of a resource without its own PID, i.e. something addressable with the PID of the Resource plus a fragment identifier. Which fragment identifier to use depends on the resource type; it may be an XPointer, a timecode, a sequence offset, etc. Allowed children are: Metadata and DataView.
DataView: the element carrying the typed data. Its content can be anything from another namespace, and it must be possible to supply it either inline or by reference, which is important for images and AV files.
Metadata: an optional element carrying metadata about the Resource or ResourceFragment. It can carry an optional attribute cmd-link with the PID of a CMD-record. (Does this only make sense for Resource/Metadata?)
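
To make the nesting of these elements concrete, here is a small Python sketch that walks one Resource and reports its metadata, its fragments and whether each DataView is inline or referenced. This page does not give the namespace URI behind the ccs prefix, so the URI below is a placeholder; the PIDs, the link and the function name summarize are likewise assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: placeholder namespace URI; the real one is not specified here.
CCS_NS = "http://example.org/ccs"
NS = {"ccs": CCS_NS}

# Made-up sample following the element structure described above.
SAMPLE = f"""
<ccs:Resource xmlns:ccs="{CCS_NS}" pid="hdl:0000/res-1">
  <ccs:Metadata cmd-link="hdl:0000/cmd-1">
    <ccs:f key="title">Sample resource</ccs:f>
  </ccs:Metadata>
  <ccs:ResourceFragment pid="42">
    <ccs:DataView type="kwic">left keyword right</ccs:DataView>
    <ccs:DataView type="image/jpeg">
      <link>http://example.org/page42.jpg</link>
    </ccs:DataView>
  </ccs:ResourceFragment>
</ccs:Resource>
"""


def summarize(xml_text):
    """Collect PID, provider metadata and data views of a single Resource."""
    resource = ET.fromstring(xml_text)
    metadata = {
        f.get("key"): f.text
        for f in resource.findall("ccs:Metadata/ccs:f", NS)
    }
    fragments = []
    for fragment in resource.findall("ccs:ResourceFragment", NS):
        views = []
        for view in fragment.findall("ccs:DataView", NS):
            # A <link> child means the data is referenced rather than inline.
            link = view.findtext("link")
            views.append((view.get("type"), "referenced" if link else "inline"))
        fragments.append({"pid": fragment.get("pid"), "views": views})
    return {"pid": resource.get("pid"), "metadata": metadata, "fragments": fragments}


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

The sketch handles only one level of nesting; since a Resource may contain further Resources, a real consumer would have to recurse.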

Data Views

A data view of a given type should be encoded the same way in all implementations. That is, if a service presents results as a KWIC, it should be the same KWIC encoding in all services. Here we propose one encoding for each of several data view types. All implementing services must conform.

Keyword in context
<ccs:DataView type="kwic">
   Junker Frauenlob , purre knix plautz - Ihr seid ein komischer Kauz - Habt ein Bärtlein von Haaren schwarz , Ziehet es aus mit einem Tropfen Harz Prrrr - ho wird das lang , Kling klang - g - a - d -
 , Scheiden thut weh - der Daus
</ccs:DataView>

Geolocation
A location on a map.

Annotations
Annotated text. We start by supporting the TCF and EAF formats, as viewers already exist for them.
<ccs:DataView type="annotations/eaf">

For an example EAF file see: sample file

Formats and viewers:

Type         Format  Viewer                                 URL
Annotations  EAF     Annexviewer provided by MPI            sample view
Annotations  TCF     AnnotationViewer provided by Tübingen  sample view
Geolocation  KML     Google-Maps?

FCS webservice

A REST web interface that performs the aggregation (metasearch, in map/reduce fashion). It should support the x-cmd-context parameter, as supplied by the metadata search service or chosen by the user on the federated search website.

PazPar2 is a federated search service (metasearch).
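
The map/reduce aggregation mentioned above could look roughly like the following Python sketch: the query is sent to every endpoint in parallel (map) and the hits are merged into one list (reduce). This is not a description of how PazPar2 works. The function names, the endpoint URLs and the shape of a hit are assumptions for illustration, and the per-endpoint search is injected as a function so that the sketch runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def federated_search(endpoints, query, search_one, context=None):
    """Fan a query out to all endpoints (map) and merge the hits (reduce).

    search_one(endpoint, query, context) is supplied by the caller; it
    queries a single endpoint and returns a list of hits.
    """
    def safe_search(endpoint):
        try:
            return endpoint, search_one(endpoint, query, context), None
        except Exception as error:  # one failing endpoint must not sink the rest
            return endpoint, [], error

    with ThreadPoolExecutor(max_workers=max(1, len(endpoints))) as pool:
        per_endpoint = list(pool.map(safe_search, endpoints))

    merged, failures = [], {}
    for endpoint, hits, error in per_endpoint:
        if error is not None:
            failures[endpoint] = str(error)
        merged.extend({"endpoint": endpoint, "hit": hit} for hit in hits)
    return merged, failures


def fake_search(endpoint, query, context):
    """Stand-in for a real SRU request, used only to exercise the sketch."""
    if endpoint == "http://example.org/broken":
        raise RuntimeError("endpoint unavailable")
    return [f"{query} @ {endpoint} in {context}"]


if __name__ == "__main__":
    hits, failures = federated_search(
        ["http://example.org/a", "http://example.org/broken"],
        "words=Kauz",
        fake_search,
        context="ctx-1",
    )
    print(hits)
    print(failures)
```

Passing the context through to every endpoint unchanged is the simplest choice; a real aggregator might first resolve the context to the subset of endpoints that actually hold the selected resources.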

Clarin Metadata Search

Metadata Search