Version 19 (modified by herste, 13 years ago)

Towards a FCS Specification

In which we list things we agree should be in our future standard / specification.

TODO: integrate FCS-Format

Contents

  1. Towards a FCS Specification
    1. Contents
    2. SRU-CQL servlets
      1. Explain response
      2. Search domain
      3. Scan response
      4. Search result

SRU-CQL servlets

Explain response

The explain response should, ideally, list the available search indexes, each mapped to an ISOcat data category where one exists.

Example (tentative):

<indexInfo>
  <set identifier="http://www.isocat.org/datcat" name="isocat"/>
  <set identifier="clarin.eu/schema/ccs-v1.0" name="ccs"/>

  <index id="?">
    <title lang="en">Part of Speech</title>
    <map><name set="isocat">partOfSpeech</name></map>
  </index>

  <index id="?">
    <title lang="en">Words</title>
    <map><name set="ccs">words</name></map>
  </index>

  <index id="?">
    <title lang="en">Phonetics</title>
    <map><name set="ccs">phonetics</name></map>
  </index>
</indexInfo>
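
As a rough illustration of how a client might consume such a fragment, the following Python sketch lists the advertised indexes together with their context sets. It assumes the un-namespaced layout of the tentative example above (a real explain record would normally sit inside a namespaced ZeeRex document), and the function name `list_indexes` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the tentative fragment above; the id values are
# placeholders in the source and are kept as "?".
INDEX_INFO = """
<indexInfo>
  <set identifier="http://www.isocat.org/datcat" name="isocat"/>
  <set identifier="clarin.eu/schema/ccs-v1.0" name="ccs"/>
  <index id="?">
    <title lang="en">Part of Speech</title>
    <map><name set="isocat">partOfSpeech</name></map>
  </index>
  <index id="?">
    <title lang="en">Words</title>
    <map><name set="ccs">words</name></map>
  </index>
</indexInfo>
"""


def list_indexes(xml_text):
    """Return (qualified index name, title, context set identifier) tuples."""
    root = ET.fromstring(xml_text)
    # Map the short context-set names to their identifiers.
    sets = {s.get("name"): s.get("identifier") for s in root.findall("set")}
    result = []
    for index in root.findall("index"):
        title = index.findtext("title")
        for name in index.findall("map/name"):
            set_name = name.get("set")
            qualified = f"{set_name}.{name.text}"
            result.append((qualified, title, sets.get(set_name)))
    return result


indexes = list_indexes(INDEX_INFO)
```

A client could use the qualified names (for example `isocat.partOfSpeech`) as index names in CQL queries, provided the endpoint accepts that form.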

Search domain

The search space shall be restricted via the x-cmd-domain parameter, which obsoletes x-cmd-collections.
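
A minimal sketch of how a client might add this parameter to an SRU searchRetrieve request. The base URL, the query and the helper name `build_search_url` are hypothetical, and the assumption that the parameter takes a single domain identifier (such as a value listed by the scan operation) is not confirmed by this page.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query, domain=None, maximum_records=10):
    """Build an SRU 1.2 searchRetrieve URL, optionally restricted to a domain."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": query,
        "maximumRecords": str(maximum_records),
    }
    if domain is not None:
        # Extension parameter proposed on this page (replaces x-cmd-collections).
        params["x-cmd-domain"] = domain
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and query, for illustration only.
url = build_search_url("http://example.org/sru", 'words = "Kauz"', domain="MPI86949#")
```

Note that `urlencode` percent-encodes the trailing `#` of the identifier, which matters because an unescaped `#` would otherwise start a URL fragment.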

Scan response

As an extension to normal SRU, the scan response lists the searchable collections/domains available at the provider. The scanClause argument should be cmd.domains.

Example (tentative):

<sru:scanResponse xmlns:sru="http://www.loc.gov/zing/srw/"
          xmlns:diag="http://www.loc.gov/zing/srw/diagnostic/"
          xmlns:myServer="http://myServer.com/"> 
<sru:version>1.2</sru:version> 
  <sru:terms> 
 
    <sru:term> 
          <sru:value>MPI86949#</sru:value> 
          <sru:numberOfRecords>42</sru:numberOfRecords> 
          <sru:displayTerm>The CGN-Corpus (Corpus Gesproken Nederlands)</sru:displayTerm> 
    </sru:term> 
    <sru:term> 
          <sru:value>MPI556280#</sru:value> 
          <sru:numberOfRecords>42</sru:numberOfRecords> 
          <sru:displayTerm>ESF corpus</sru:displayTerm> 
    </sru:term> 
    <sru:term> 
          <sru:value>MPI214746#</sru:value> 
          <sru:numberOfRecords>42</sru:numberOfRecords> 
          <sru:displayTerm>IFA corpus</sru:displayTerm> 
    </sru:term> 
    <sru:term> 
          <sru:value>MPI1296694#</sru:value> 
          <sru:numberOfRecords>42</sru:numberOfRecords> 
          <sru:displayTerm>Childes corpus</sru:displayTerm> 
    </sru:term> 
    <sru:term> 
          <sru:value>MPI1259419#</sru:value> 
          <sru:numberOfRecords>42</sru:numberOfRecords> 
          <sru:displayTerm>Talkbank corpus</sru:displayTerm> 
    </sru:term> 
 
  </sru:terms> 
  <sru:echoedScanRequest> 
    <sru:version>1.2</sru:version> 
    <sru:scanClause>cmd.domains</sru:scanClause> 
    <sru:responsePosition></sru:responsePosition> 
    <sru:maximumTerms>42</sru:maximumTerms> 
  </sru:echoedScanRequest> 
</sru:scanResponse>
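
To show how a client might turn such a response into a list of selectable domains, here is a small Python sketch. It uses an abridged copy of the tentative response above; the function name `list_domains` and the dictionary keys are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"sru": "http://www.loc.gov/zing/srw/"}

# Abridged copy of the tentative scan response above.
SCAN_RESPONSE = """<sru:scanResponse xmlns:sru="http://www.loc.gov/zing/srw/">
  <sru:version>1.2</sru:version>
  <sru:terms>
    <sru:term>
      <sru:value>MPI86949#</sru:value>
      <sru:numberOfRecords>42</sru:numberOfRecords>
      <sru:displayTerm>The CGN-Corpus (Corpus Gesproken Nederlands)</sru:displayTerm>
    </sru:term>
    <sru:term>
      <sru:value>MPI556280#</sru:value>
      <sru:numberOfRecords>42</sru:numberOfRecords>
      <sru:displayTerm>ESF corpus</sru:displayTerm>
    </sru:term>
  </sru:terms>
</sru:scanResponse>"""


def list_domains(xml_text):
    """Return one dict per sru:term with its identifier, label and record count."""
    root = ET.fromstring(xml_text)
    domains = []
    for term in root.findall("sru:terms/sru:term", NS):
        count = term.findtext("sru:numberOfRecords", default="0", namespaces=NS)
        domains.append({
            "id": term.findtext("sru:value", namespaces=NS),
            "label": term.findtext("sru:displayTerm", namespaces=NS),
            "records": int(count),
        })
    return domains


domains = list_domains(SCAN_RESPONSE)
```

The `id` values are presumably what a client would pass back when restricting a search to a domain, while `label` is meant for display.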

Search result

<ccs:Resource pid="{pid of the resource}">
  <ccs:Metadata cmd-link="{PID of the CMD-record}">  <!-- cmd-link is optional -->
    <!-- this is for metadata provided directly by the content provider,
         NOT the CMD metadata -->
    <ccs:f key="{any metadata-key}">{any metadata value}</ccs:f>
  </ccs:Metadata>
  <ccs:ResourceFragment pid="{offset relative to the parent resource PID}">
    <ccs:Metadata>  <!-- metadata pertaining to the specific (matching) fragment, like metadata on the "current" speaker -->
    </ccs:Metadata>

    <ccs:DataView type="text/xml"><meertens:any/>
    </ccs:DataView>

    <ccs:DataView type="image/jpeg">
      <link>{optional link to the data}</link>
    </ccs:DataView>
  </ccs:ResourceFragment>
</ccs:Resource>
Resource
An element representing a resource and carrying its identifier. It may represent anything that has a PID (and an MDRecord), so in particular it may also be a collection aggregating other Resources. Allowed children are: Resource, ResourceFragment, Metadata and DataView.
ResourceFragment
A part of a resource without its own PID, i.e. something addressable with the PID of the Resource plus a fragment identifier. Which fragment identifier to use depends on the resource type; it may be an XPointer, a timecode, a sequence offset, etc. Allowed children are: Metadata and DataView.
DataView
The element carrying the typed data. The content can be anything that is in another namespace. It must be possible to provide the content either inline or by reference, which is important for images and AV files.
Metadata
An optional element carrying metadata about the Resource or ResourceFragment. It can carry an optional attribute cmd-link with the PID of a CMD record. (This only makes sense for Resource/Metadata?)
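
The containment rules above can be sketched as a short recursive walk in Python. This is only an illustration: the namespace URI bound to `ccs` is a placeholder (this page does not state it), the PIDs and the link are invented sample values, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI: an assumption, not defined on this page.
CCS_NS = "http://example.org/ccs"
NS = {"ccs": CCS_NS}

# Invented sample record following the structure described above.
RECORD = f"""<ccs:Resource xmlns:ccs="{CCS_NS}" pid="hdl:example/resource">
  <ccs:Metadata cmd-link="hdl:example/cmd-record">
    <ccs:f key="title">Example resource</ccs:f>
  </ccs:Metadata>
  <ccs:ResourceFragment pid="offset-17">
    <ccs:DataView type="image/jpeg">
      <link>http://example.org/page17.jpg</link>
    </ccs:DataView>
  </ccs:ResourceFragment>
</ccs:Resource>"""


def metadata_of(element):
    """Return the key/value pairs of the element's own Metadata child."""
    return {f.get("key"): f.text for f in element.findall("ccs:Metadata/ccs:f", NS)}


def collect_data_views(element, parent_pids=()):
    """Yield (pid path, DataView type, link or None) for every DataView below element."""
    path = parent_pids + (element.get("pid"),)
    for view in element.findall("ccs:DataView", NS):
        yield path, view.get("type"), view.findtext("link")
    # Resources may nest Resources and ResourceFragments; fragments only
    # hold Metadata and DataView, so the recursion ends there.
    for child_tag in ("ccs:ResourceFragment", "ccs:Resource"):
        for child in element.findall(child_tag, NS):
            yield from collect_data_views(child, path)


root = ET.fromstring(RECORD)
views = list(collect_data_views(root))
metadata = metadata_of(root)
```

The pid path pairs the PID of the enclosing Resource with the fragment identifier, which is how a ResourceFragment is meant to be addressed.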

Data Views

All dataviews of a given type should be the same in all implementations. That is, if a service presents results as a KWIC, it should be the same KWIC encoding in all services. Here we propose one encoding for each of several dataview types. All implementing services must conform.

kwic
Keyword in context
<ccs:DataView type="kwic">
   Junker Frauenlob , purre knix plautz - Ihr seid ein komischer Kauz - Habt ein Bärtlein von Haaren schwarz , Ziehet es aus mit einem Tropfen Harz Prrrr - ho wird das lang , Kling klang - g - a - d -
  <kw>e</kw>
 , Scheiden thut weh - der Daus
</ccs:DataView>
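
A small Python sketch of how a client might split such a DataView into left context, keyword and right context. It assumes exactly one `kw` element per view, uses a shortened copy of the example above, and binds `ccs` to a placeholder namespace URI; `split_kwic` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI: an assumption, not defined on this page.
CCS_NS = "http://example.org/ccs"

# Shortened copy of the kwic example above.
KWIC = f"""<ccs:DataView xmlns:ccs="{CCS_NS}" type="kwic">
   Kling klang - g - a - d -
  <kw>e</kw>
 , Scheiden thut weh - der Daus
</ccs:DataView>"""


def split_kwic(view):
    """Return (left context, keyword, right context) with whitespace normalized."""
    kw = view.find("kw")
    if kw is None:
        raise ValueError("kwic DataView has no <kw> element")
    left = " ".join((view.text or "").split())   # text before <kw>
    keyword = " ".join((kw.text or "").split())  # the match itself
    right = " ".join((kw.tail or "").split())    # text after </kw>
    return left, keyword, right


left, keyword, right = split_kwic(ET.fromstring(KWIC))
```

Normalizing whitespace is a design choice made here so that the line breaks around `kw` in the serialized XML do not leak into the display.
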
Geolocation
A location on a map. TODO?
Dictionary
A dictionary entry. To be defined.
List
A list of things. To be defined.
Matrix
A matrix containing things. To be defined.
Annotations
A bunch of annotated text. We start by supporting the TCF and EAF formats, as they have existing viewers. For an example EAF file, see: sample file. For now the types are annotations/eaf and annotations/tcf.
<ccs:DataView type="annotations/eaf">
  <link>http://corpus1.mpi.nl/qfs1/media-archive/demo/pewi/Annotations/elan-example1.eaf</link>
</ccs:DataView>

Formats and viewers:

Type         Format  Viewer                                 URL
Annotations  EAF     Annexviewer provided by MPI            sample view
Annotations  TCF     AnnotationViewer provided by Tübingen  sample view
Geolocation  KML     Google-Maps?