Version 19 (modified by herste, 13 years ago)

Towards an FCS Specification

In which we list things we agree should be in our future standard / specification.

TODO: integrate FCS-Format

SRU-CQL servlets

Explain response

Ideally, the explain response should list the possible search indexes, identified by ISOcat data categories.

Example (tentative):

 <indexInfo>
   <set identifier="http://www.isocat.org/datcat" name="isocat"/>
   <set identifier="clarin.eu/schema/ccs-v1.0" name="ccs"/>

   <index id="?">
     <title lang="en">Part of Speech</title>
     <map><name set="isocat">partOfSpeech</name></map>
   </index>

   <index id="?">
     <title lang="en">Words</title>
     <map><name set="ccs">words</name></map>
   </index>

   <index id="?">
     <title lang="en">Phonetics</title>
     <map><name set="ccs">phonetics</name></map>
   </index>

 </indexInfo>
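
A client could read such a fragment along the following lines. This is only a minimal Python sketch, not part of the specification: it assumes the tentative, un-namespaced element names from the example above (a full SRU explain response would normally be namespaced), and the function name list_indexes is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the tentative indexInfo example above.
INDEX_INFO = """
<indexInfo>
  <set identifier="http://www.isocat.org/datcat" name="isocat"/>
  <set identifier="clarin.eu/schema/ccs-v1.0" name="ccs"/>
  <index id="?">
    <title lang="en">Part of Speech</title>
    <map><name set="isocat">partOfSpeech</name></map>
  </index>
  <index id="?">
    <title lang="en">Words</title>
    <map><name set="ccs">words</name></map>
  </index>
</indexInfo>
"""


def list_indexes(xml_text):
    """Return (qualified name, context set identifier, title) per index."""
    root = ET.fromstring(xml_text)
    # Map the short context-set names to their identifiers.
    sets = {s.get("name"): s.get("identifier") for s in root.findall("set")}
    result = []
    for index in root.findall("index"):
        title = index.findtext("title")
        for name in index.findall("map/name"):
            set_name = name.get("set")
            qualified = f"{set_name}.{name.text}"
            result.append((qualified, sets.get(set_name), title))
    return result


indexes = list_indexes(INDEX_INFO)
for qualified, identifier, title in indexes:
    print(qualified, identifier, title)
```

The qualified form (set name, dot, index name) is what a CQL query would use to address an index.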

Search domain

The search space shall be restricted via the x-cmd-domain parameter, which replaces the now obsolete x-cmd-collections.
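
As an illustration, a searchRetrieve request restricted to one domain might be built like this. It is a minimal Python sketch under assumptions: the endpoint http://example.org/sru, the query string and the function name build_search_url are made up for the example; only x-cmd-domain comes from this page, and the remaining parameter names are the standard SRU 1.2 ones.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query, domain=None, maximum_records=10):
    """Build an SRU 1.2 searchRetrieve URL, optionally restricted to a domain."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": query,
        "maximumRecords": str(maximum_records),
    }
    if domain is not None:
        # Extension parameter proposed on this page (replaces x-cmd-collections).
        params["x-cmd-domain"] = domain
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and query; the domain value is taken from the
# scan response example on this page.
url = build_search_url("http://example.org/sru", 'words = "Kauz"',
                       domain="MPI86949#")
print(url)
```

Note that urlencode percent-encodes the trailing "#" of the domain identifier, which would otherwise be read as the start of a URL fragment.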

Scan response

As an extension to standard SRU, the scan response lists the searchable collections/domains available at the provider. The scanClause argument should be cmd.domains.

Example (tentative):

<sru:scanResponse xmlns:sru="http://www.loc.gov/zing/srw/"
          xmlns:diag="http://www.loc.gov/zing/srw/diagnostic/"
          xmlns:myServer="http://myServer.com/"> 
  <sru:version>1.2</sru:version>
  <sru:terms> 
 
    <sru:term> 
          <sru:value>MPI86949#</sru:value> 
          <sru:numberOfRecords>42</sru:numberOfRecords> 
          <sru:displayTerm>The CGN-Corpus (Corpus Gesproken Nederlands)</sru:displayTerm> 
    </sru:term> 
    <sru:term> 
          <sru:value>MPI556280#</sru:value> 
          <sru:numberOfRecords>42</sru:numberOfRecords> 
          <sru:displayTerm>ESF corpus</sru:displayTerm> 
    </sru:term> 
    <sru:term> 
          <sru:value>MPI214746#</sru:value> 
          <sru:numberOfRecords>42</sru:numberOfRecords> 
          <sru:displayTerm>IFA corpus</sru:displayTerm> 
    </sru:term> 
    <sru:term> 
          <sru:value>MPI1296694#</sru:value> 
          <sru:numberOfRecords>42</sru:numberOfRecords> 
          <sru:displayTerm>Childes corpus</sru:displayTerm> 
    </sru:term> 
    <sru:term> 
          <sru:value>MPI1259419#</sru:value> 
          <sru:numberOfRecords>42</sru:numberOfRecords> 
          <sru:displayTerm>Talkbank corpus</sru:displayTerm> 
    </sru:term> 
 
  </sru:terms> 
  <sru:echoedScanRequest> 
    <sru:version>1.2</sru:version> 
    <sru:scanClause>cmd.domains</sru:scanClause>
    <sru:responsePosition></sru:responsePosition> 
    <sru:maximumTerms>42</sru:maximumTerms> 
  </sru:echoedScanRequest> 
</sru:scanResponse>
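
A client could turn such a scan response into a list of selectable domains roughly as follows. This is a Python sketch under assumptions, not a reference implementation: the function name list_domains and the dictionary keys are hypothetical, and the embedded response is a shortened copy of the tentative example above.

```python
import xml.etree.ElementTree as ET

NS = {"sru": "http://www.loc.gov/zing/srw/"}

# Shortened copy of the tentative scan response above.
SCAN_RESPONSE = """
<sru:scanResponse xmlns:sru="http://www.loc.gov/zing/srw/">
  <sru:version>1.2</sru:version>
  <sru:terms>
    <sru:term>
      <sru:value>MPI86949#</sru:value>
      <sru:numberOfRecords>42</sru:numberOfRecords>
      <sru:displayTerm>The CGN-Corpus (Corpus Gesproken Nederlands)</sru:displayTerm>
    </sru:term>
    <sru:term>
      <sru:value>MPI556280#</sru:value>
      <sru:numberOfRecords>42</sru:numberOfRecords>
      <sru:displayTerm>ESF corpus</sru:displayTerm>
    </sru:term>
  </sru:terms>
</sru:scanResponse>
"""


def list_domains(xml_text):
    """Return one dict per sru:term: domain id, display label, record count."""
    root = ET.fromstring(xml_text)
    domains = []
    for term in root.findall("sru:terms/sru:term", NS):
        domains.append({
            "id": term.findtext("sru:value", namespaces=NS),
            "label": term.findtext("sru:displayTerm", namespaces=NS),
            "records": int(term.findtext("sru:numberOfRecords", namespaces=NS)),
        })
    return domains


domains = list_domains(SCAN_RESPONSE)
for d in domains:
    print(d["id"], d["label"], d["records"])
```

The "id" values are what would then be passed as x-cmd-domain in a subsequent search.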

Search result

 <ccs:Resource pid="{pid of the resource}">
    <!-- cmd-link is optional -->
    <ccs:Metadata cmd-link="{PID of the CMD-record}">
      <!-- this is for metadata provided directly by the content provider,
           NOT the CMD metadata -->
      <ccs:f key="{any metadata-key}">{any metadata value}</ccs:f>
    </ccs:Metadata>
    <ccs:ResourceFragment pid="{offset relative to the parent resource PID}">
      <ccs:Metadata>
        <!-- metadata pertaining to the specific (matching) fragment, like metadata on the "current" speaker -->
      </ccs:Metadata>

      <ccs:DataView type="text/xml"><meertens:any/>
      </ccs:DataView>

      <ccs:DataView type="image/jpeg">
        <link>{optional link to the data}</link>
      </ccs:DataView>
    </ccs:ResourceFragment>
 </ccs:Resource>

Resource: an element representing a resource and carrying its identifier. It may represent anything that has a PID (and an MDRecord), so in particular it may also be a collection aggregating other Resources. Allowed children are: Resource, ResourceFragment, Metadata and DataView.
ResourceFragment: a part of a resource without its own PID, i.e. something addressable with the PID of the Resource plus a fragment identifier. The fragment identifier to use depends on the resource type; it may be an XPointer, a timecode, a sequence offset, etc. Allowed children are: Metadata and DataView.
DataView: the element carrying the typed data. The content can be anything in another namespace. It must be possible to provide the content either inline or by reference, which is important for images and AV files.
Metadata: an optional element carrying metadata about the Resource or ResourceFragment. It can carry an optional attribute cmd-link with the PID of a CMD-record. (This only makes sense for Resource/Metadata?)
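
The nesting rules above can be sketched as a small recursive walk. The following Python is illustrative only and rests on several assumptions: the namespace URI http://example.org/ccs is a placeholder (this page does not fix one), the sample record and the function name collect_views are made up, and joining the Resource PID and the fragment identifier with "#" is just one possible convention.

```python
import xml.etree.ElementTree as ET

CCS = "http://example.org/ccs"  # placeholder namespace URI (assumption)
NS = {"ccs": CCS}

# Made-up sample record following the structure described above.
RESULT = f"""
<ccs:Resource xmlns:ccs="{CCS}" pid="res-1">
  <ccs:Metadata cmd-link="cmd-1">
    <ccs:f key="title">Example resource</ccs:f>
  </ccs:Metadata>
  <ccs:ResourceFragment pid="frag-7">
    <ccs:DataView type="kwic">left <kw>hit</kw> right</ccs:DataView>
    <ccs:DataView type="image/jpeg"><link>http://example.org/page7.jpg</link></ccs:DataView>
  </ccs:ResourceFragment>
</ccs:Resource>
"""


def collect_views(resource):
    """Yield (address, type, link) for every DataView below a Resource."""
    pid = resource.get("pid")
    # DataViews attached directly to the Resource.
    for view in resource.findall("ccs:DataView", NS):
        yield (pid, view.get("type"), view.findtext("link"))
    # DataViews attached to fragments: address = resource PID + fragment id.
    for fragment in resource.findall("ccs:ResourceFragment", NS):
        address = f"{pid}#{fragment.get('pid')}"  # "#" joiner is an assumption
        for view in fragment.findall("ccs:DataView", NS):
            yield (address, view.get("type"), view.findtext("link"))
    # Resources may aggregate other Resources (collections).
    for child in resource.findall("ccs:Resource", NS):
        yield from collect_views(child)


views = list(collect_views(ET.fromstring(RESULT)))
for address, view_type, link in views:
    print(address, view_type, link)
```

A DataView with inline content yields no link (None here), while a referenced one carries the URL in its link child.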

Data Views

All data views of a given type should be the same in all implementations. That is, if a service presents results as a KWIC, it should be the same KWIC encoding in all services. Here we propose one encoding for each of several data view types. All implementing services must conform.

kwic

Keyword in context

<ccs:DataView type="kwic">
   Junker Frauenlob , purre knix plautz - Ihr seid ein komischer Kauz - Habt ein Bärtlein von Haaren schwarz , Ziehet es aus mit einem Tropfen Harz Prrrr - ho wird das lang , Kling klang - g - a - d -
  <kw>e</kw>
 , Scheiden thut weh - der Daus
</ccs:DataView>
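
To show how a consumer might use this encoding, the following Python sketch splits a kwic DataView into left context, keyword and right context. It assumes exactly one un-namespaced kw element per view, as in the example above; the namespace URI and the function name split_kwic are placeholders, and the sample text is a shortened copy of the example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the kwic example above; the namespace URI is a placeholder.
KWIC = """
<ccs:DataView xmlns:ccs="http://example.org/ccs" type="kwic">
   Kling klang - g - a - d -
  <kw>e</kw>
 , Scheiden thut weh - der Daus
</ccs:DataView>
"""


def normalize(text):
    """Collapse runs of whitespace and trim both ends."""
    return " ".join((text or "").split())


def split_kwic(xml_text):
    """Return (left context, keyword, right context) of a kwic DataView."""
    view = ET.fromstring(xml_text)
    kw = view.find("kw")  # assumes exactly one <kw> element is present
    # Text before <kw> is view.text; text after it is the tail of <kw>.
    return normalize(view.text), normalize(kw.text), normalize(kw.tail)


left, keyword, right = split_kwic(KWIC)
print(f"{left} [{keyword}] {right}")
```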

Geolocation: A location on a map. TODO?

Dictionary: A dictionary entry. To be defined
List: A list of things. To be defined

Matrix: A matrix containing things. To be defined.

Annotations

Annotated text. We start by supporting the TCF and EAF formats, as they have existing viewers. For an example EAF file see: sample file. For now the types are annotations/eaf and annotations/tcf.

<ccs:DataView type="annotations/eaf">
  <link>http://corpus1.mpi.nl/qfs1/media-archive/demo/pewi/Annotations/elan-example1.eaf</link>
</ccs:DataView>

Formats and viewers:

Type	Format	Viewer	URL
Annotations	EAF	Annexviewer provided by MPI	sample view
Annotations	TCF	AnnotationViewer provided by Tübingen	sample view
Geolocation	KML	Google-Maps?
