
Version 22 (modified by vronk, 13 years ago)

--

Towards a Specification of the FCS-API for individual Content Providers

Target Audience: Technical Staff of Content Providers

Specification of the interface that Content Providers willing to join the federated search have to implement. It builds upon the SRU/CQL protocol, but concentrates mainly on the specific agreements made on top of the protocol.

This page lists the things we agree should be in our future standard / specification.

SRU basics

(We have to ensure that our specification is compatible with the established implementation and usage of the protocol.)

Context Sets

Explain operation

This basic request serves to announce the server's capabilities and should allow the client to configure itself automatically.

FIXME: start more gently.

Ideally, the explain response should list the available search indexes as ISOcat data categories. Where there is no ISOcat equivalent, the CCS context set is to be used.

Example (tentative):

 <indexInfo>
   <set identifier="http://www.isocat.org/datcat" name="isocat"/>
   <set identifier="clarin.eu/schema/ccs-v1.0" name="ccs"/>

   <index id="?">
     <title lang="en">Part of Speech</title>
     <map><name set="isocat">partOfSpeech</name></map>
   </index>

   <index id="?">
     <title lang="en">Words</title>
     <map><name set="ccs">words</name></map>
   </index>

   <index id="?">
     <title lang="en">Phonetics</title>
     <map><name set="ccs">phonetics</name></map>
   </index>
 </indexInfo>
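
To illustrate how a client might configure itself from such a response, here is a minimal Python sketch that reads an indexInfo fragment and derives CQL-style index names of the form set.index. It assumes the fragment is un-namespaced, exactly as in the tentative example above; a real explain record may well carry a namespace. The function name list_indexes is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the tentative indexInfo example (assumption: no XML namespace).
INDEX_INFO = """
<indexInfo>
  <set identifier="http://www.isocat.org/datcat" name="isocat"/>
  <set identifier="clarin.eu/schema/ccs-v1.0" name="ccs"/>
  <index id="?">
    <title lang="en">Part of Speech</title>
    <map><name set="isocat">partOfSpeech</name></map>
  </index>
  <index id="?">
    <title lang="en">Words</title>
    <map><name set="ccs">words</name></map>
  </index>
</indexInfo>
"""


def list_indexes(index_info_xml):
    """Return (qualified index name, title) pairs, e.g. ('ccs.words', 'Words')."""
    root = ET.fromstring(index_info_xml)
    # Context sets declared by the server, keyed by their short name.
    known_sets = {s.get("name") for s in root.findall("set")}
    result = []
    for index in root.findall("index"):
        title = index.findtext("title", default="")
        for name in index.findall("map/name"):
            set_name = name.get("set")
            if set_name not in known_sets:
                continue  # ignore names that refer to an undeclared context set
            result.append((f"{set_name}.{name.text}", title))
    return result


indexes = list_indexes(INDEX_INFO)
for qualified_name, title in indexes:
    print(qualified_name, "-", title)
```

A client could offer the titles to the user and put the qualified names into the CQL query.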

searchRetrieve operation

The operation to send the actual query.

Search result

The common ground is the <sru:searchRetrieveResponse> defined by the protocol. This goes down to the <sru:record> wrapping element. The proposal is to continue inside it with:

 <ccs:Resource pid="{pid of the resource}">
    <ccs:Metadata cmd-link="{PID of the CMD-record}">  <!-- cmd-link is optional -->
      <!-- this is for metadata provided directly by the content provider,
           NOT the CMD metadata -->
      <ccs:f key="{any metadata-key}">{any metadata value}</ccs:f>
    </ccs:Metadata>
    <ccs:ResourceFragment pid="{offset relative to the parent resource PID}">
      <ccs:Metadata>   <!-- metadata pertaining to the specific (matching) fragment, like metadata on the "current" speaker -->
      </ccs:Metadata>

      <ccs:DataView type="text/xml"><meertens:any/>
      </ccs:DataView>

      <ccs:DataView type="image/jpeg">
        <link>{optional link to the data}</link>
      </ccs:DataView>
    </ccs:ResourceFragment>
 </ccs:Resource>
Resource
The element representing a resource and carrying its identifier. It may represent anything that has a PID (and an MDRecord), so in particular it may also be a collection aggregating other Resources. Allowed children are: Resource, ResourceFragment, Metadata and DataView.
ResourceFragment
A part of a resource without its own PID, i.e. something addressable with: PID of the Resource + Fragment Identifier. The Fragment Identifier to be used depends on the resource type; it may be an XPointer, a timecode, a sequence offset, etc. Allowed children are: Metadata and DataView.
DataView
The element carrying the typed data. The content can be anything that is in another namespace. It must be possible to provide the content either inline or by reference, which is important for images and AV files.
Metadata
Optional element carrying metadata about the Resource or ResourceFragment. It can carry an optional attribute cmd-link with the PID of a CMD-record. (This only makes sense for Resource/Metadata?)
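
The parent/child rules in these definitions can be checked mechanically. The following Python sketch walks a ccs:Resource tree and reports children that the definitions do not allow. It is only an illustration: the namespace URI http://example.org/ccs and the pid values are placeholders, since this page does not state the actual ccs namespace, and the helper names are invented here.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI (assumption): the real ccs namespace is not given here.
CCS = "http://example.org/ccs"

# Allowed children per element, taken from the definitions above.
ALLOWED_CHILDREN = {
    "Resource": {"Resource", "ResourceFragment", "Metadata", "DataView"},
    "ResourceFragment": {"Metadata", "DataView"},
}


def split_tag(element):
    """Split an ElementTree tag into (namespace URI, local name)."""
    tag = element.tag
    if tag.startswith("{"):
        namespace, _, name = tag[1:].partition("}")
        return namespace, name
    return "", tag


def check_structure(element, path=""):
    """Return a list of messages for children that violate the rules."""
    namespace, name = split_tag(element)
    here = f"{path}/{name}"
    allowed = ALLOWED_CHILDREN.get(name) if namespace == CCS else None
    if allowed is None:
        return []  # Metadata, DataView and foreign content are not checked
    problems = []
    for child in element:
        child_ns, child_name = split_tag(child)
        if child_ns != CCS or child_name not in allowed:
            problems.append(f"{here}: unexpected child {child_name}")
        else:
            problems.extend(check_structure(child, here))
    return problems


# Deliberately invalid sample: a Resource nested inside a ResourceFragment.
SAMPLE = f"""
<ccs:Resource xmlns:ccs="{CCS}" pid="res-1">
  <ccs:Metadata><ccs:f key="title">Demo</ccs:f></ccs:Metadata>
  <ccs:ResourceFragment pid="frag-1">
    <ccs:DataView type="kwic">left <kw>hit</kw> right</ccs:DataView>
    <ccs:Resource pid="res-2"/>
  </ccs:ResourceFragment>
</ccs:Resource>
"""

problems = check_structure(ET.fromstring(SAMPLE))
for message in problems:
    print(message)
```

An XML Schema would be the more usual place for such constraints; the sketch only makes the intended nesting explicit.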

Data Views

All DataViews of a given type should be the same in all implementations. That is, if a service presents results as KWIC, that should be the same KWIC in all services. Here we propose several types of DataViews; the format has yet to be defined for most of them.

kwic
Keyword in context
<ccs:DataView type="kwic">
   Junker Frauenlob , purre knix plautz - Ihr seid ein komischer Kauz - Habt ein Bärtlein von Haaren schwarz , Ziehet es aus mit einem Tropfen Harz Prrrr - ho wird das lang , Kling klang - g - a - d -
  <kw>e</kw>
 , Scheiden thut weh - der Daus
</ccs:DataView>
Geolocation
A location on a map.
Dictionary
A dictionary entry. To be defined.
List
A list of things. To be defined.
Matrix
A matrix containing things. To be defined.
Annotations
Annotated text. We start by supporting the TCF and EAF formats, as both have existing viewers. For an example EAF file, see the sample file. For now the types are annotations/eaf and annotations/tcf.
<ccs:DataView type="annotations/eaf">
  <link>http://corpus1.mpi.nl/qfs1/media-archive/demo/pewi/Annotations/elan-example1.eaf</link>
</ccs:DataView>
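
As an illustration of how a client could consume the kwic type shown above, the following Python sketch splits a kwic DataView into left context, keyword and right context. It assumes the shape of the example (mixed text with a single un-namespaced kw child), which is not yet a fixed format; the namespace URI is a placeholder and split_kwic is a name invented for this sketch.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI (assumption): the real ccs namespace is not given here.
CCS = "http://example.org/ccs"

# Shortened version of the kwic example above.
KWIC = f"""
<ccs:DataView xmlns:ccs="{CCS}" type="kwic">
   Kling klang - g - a - d -
  <kw>e</kw>
 , Scheiden thut weh - der Daus
</ccs:DataView>
"""


def squeeze(text):
    """Collapse runs of whitespace and trim; None becomes an empty string."""
    return " ".join((text or "").split())


def split_kwic(data_view_xml):
    """Return (left context, keyword, right context) of a kwic DataView."""
    view = ET.fromstring(data_view_xml)
    if view.get("type") != "kwic":
        raise ValueError("not a kwic DataView")
    kw = view.find("kw")
    if kw is None:
        raise ValueError("kwic DataView without a kw element")
    # Text before <kw> is view.text; text after it is the tail of <kw>.
    return squeeze(view.text), squeeze(kw.text), squeeze(kw.tail)


left, keyword, right = split_kwic(KWIC)
print(f"{left} [{keyword}] {right}")
```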

Formats and viewers:

Type        | Format | Viewer                                | URL
Annotations | EAF    | Annexviewer provided by MPI           | sample view
Annotations | TCF    | AnnotationViewer provided by Tübingen | sample view
Geolocation | KML    | Google-Maps?                          | http://openlayers.org/

Restricting the search by collections

The search space shall be restricted via the x-cmd-domain parameter (which obsoletes x-cmd-collections).
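
A minimal Python sketch of a searchRetrieve request restricted this way is given below. The endpoint URL and the query are made up for illustration, the domain value is borrowed from the tentative scan example on this page, and how several domains would be combined in one parameter is not specified here, so the sketch passes a single value only.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_search_url(endpoint, cql_query, domain=None, maximum_records=10):
    """Build an SRU 1.2 searchRetrieve URL, optionally restricted to one domain."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_query,
        "maximumRecords": str(maximum_records),
    }
    if domain is not None:
        # Extension parameter from this page; SRU extension names start with "x-".
        params["x-cmd-domain"] = domain
    # urlencode percent-encodes the values ('#' becomes %23, '"' becomes %22).
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint and query, for illustration only.
url = build_search_url(
    "http://example.org/sru",
    'ccs.words = "Kauz"',
    domain="MPI86949#",
)
print(url)

# Decoding the query string again shows what the server would receive.
sent = parse_qs(urlsplit(url).query)
print(sent["x-cmd-domain"])
```

Encoding matters here because the identifiers in the scan example end in '#', which would otherwise be read as the start of a URL fragment.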

Scan operation

As an extension to standard SRU, the scan response lists the searchable collections/domains available at the provider. The scanClause argument to use is cmd.domains.

Example (tentative):

<sru:scanResponse xmlns:sru="http://www.loc.gov/zing/srw/"
          xmlns:diag="http://www.loc.gov/zing/srw/diagnostic/"
          xmlns:myServer="http://myServer.com/"> 
<sru:version>1.2</sru:version> 
  <sru:terms> 
 
    <sru:term> 
          <sru:value>MPI86949#</sru:value> 
          <sru:numberOfRecords>42</sru:numberOfRecords> 
          <sru:displayTerm>The CGN-Corpus (Corpus Gesproken Nederlands)</sru:displayTerm> 
    </sru:term> 
    <sru:term> 
          <sru:value>MPI556280#</sru:value> 
          <sru:numberOfRecords>42</sru:numberOfRecords> 
          <sru:displayTerm>ESF corpus</sru:displayTerm> 
    </sru:term> 
    <sru:term> 
          <sru:value>MPI214746#</sru:value> 
          <sru:numberOfRecords>42</sru:numberOfRecords> 
          <sru:displayTerm>IFA corpus</sru:displayTerm> 
    </sru:term> 
    <sru:term> 
          <sru:value>MPI1296694#</sru:value> 
          <sru:numberOfRecords>42</sru:numberOfRecords> 
          <sru:displayTerm>Childes corpus</sru:displayTerm> 
    </sru:term> 
    <sru:term> 
          <sru:value>MPI1259419#</sru:value> 
          <sru:numberOfRecords>42</sru:numberOfRecords> 
          <sru:displayTerm>Talkbank corpus</sru:displayTerm> 
    </sru:term> 
 
  </sru:terms> 
  <sru:echoedScanRequest> 
    <sru:version>1.2</sru:version> 
    <sru:scanClause>cmd.domains</sru:scanClause>
    <sru:responsePosition></sru:responsePosition> 
    <sru:maximumTerms>42</sru:maximumTerms> 
  </sru:echoedScanRequest> 
</sru:scanResponse>
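
To show how a client might turn such a scan response into a list of selectable collections, here is a minimal Python sketch. It uses a shortened copy of the tentative response above and the sru namespace it declares; the function name list_collections and the dictionary keys are choices made for this illustration only.

```python
import xml.etree.ElementTree as ET

NS = {"sru": "http://www.loc.gov/zing/srw/"}

# Shortened copy of the tentative scan response above.
SCAN_RESPONSE = """
<sru:scanResponse xmlns:sru="http://www.loc.gov/zing/srw/">
  <sru:version>1.2</sru:version>
  <sru:terms>
    <sru:term>
      <sru:value>MPI86949#</sru:value>
      <sru:numberOfRecords>42</sru:numberOfRecords>
      <sru:displayTerm>The CGN-Corpus (Corpus Gesproken Nederlands)</sru:displayTerm>
    </sru:term>
    <sru:term>
      <sru:value>MPI556280#</sru:value>
      <sru:numberOfRecords>42</sru:numberOfRecords>
      <sru:displayTerm>ESF corpus</sru:displayTerm>
    </sru:term>
  </sru:terms>
</sru:scanResponse>
"""


def list_collections(scan_response_xml):
    """Return one dict per sru:term with its id, display label and record count."""
    root = ET.fromstring(scan_response_xml)
    collections = []
    for term in root.findall("sru:terms/sru:term", NS):
        value = term.findtext("sru:value", default="", namespaces=NS).strip()
        # Fall back to the raw value when no display term is supplied.
        label = term.findtext("sru:displayTerm", default=value, namespaces=NS).strip()
        count = term.findtext("sru:numberOfRecords", default="", namespaces=NS).strip()
        collections.append({
            "id": value,
            "label": label,
            "records": int(count) if count.isdigit() else None,
        })
    return collections


collections = list_collections(SCAN_RESPONSE)
for collection in collections:
    print(collection["id"], "-", collection["label"], f"({collection['records']} records)")
```

The id of a chosen collection is the value a client would then send back in the x-cmd-domain parameter of a searchRetrieve request.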