Towards a Specification of the FCS-API for individual Content Providers
Target Audience: Technical Staff of Content Providers
Specification of the interface that Content Providers willing to join the federated search have to implement. It builds upon the SRU/CQL protocol, but concentrates mainly on the specific agreements on top of the protocol.
In which we list things we agree should be in our future standard / specification.
SRU basics
(We have to ensure that our specification is compatible with the established implementation and usage of the protocol.)
Context Sets
Explain operation
This basic request serves to announce the server's capabilities and should allow the client to configure itself automatically.
FIXME: start more gently.
The explain response should, ideally, list the available search indexes, identified by ISOCat data categories where possible. If there is no ISOCat equivalent, the CCS context set is to be used.
Example (tentative):
<indexInfo>
  <set identifier="http://www.isocat.org/datcat" name="isocat"/>
  <set identifier="clarin.eu/schema/ccs-v1.0" name="ccs"/>
  <index id="?">
    <title lang="en">Part of Speech</title>
    <map><name set="isocat">partOfSpeech</name></map>
  </index>
  <index id="?">
    <title lang="en">Words</title>
    <map><name set="ccs">words</name></map>
  </index>
  <index id="?">
    <title lang="en">Phonetics</title>
    <map><name set="ccs">phonetics</name></map>
  </index>
</indexInfo>
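A client could use such an indexInfo block to configure itself. The following is only a minimal sketch in Python: it assumes the elements are unqualified, exactly as in the tentative example above (a real explain record may place them in a namespace), and the function name list_indexes is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the tentative example; id="?" is left open as in the draft.
INDEX_INFO = """
<indexInfo>
  <set identifier="http://www.isocat.org/datcat" name="isocat"/>
  <set identifier="clarin.eu/schema/ccs-v1.0" name="ccs"/>
  <index id="?">
    <title lang="en">Part of Speech</title>
    <map><name set="isocat">partOfSpeech</name></map>
  </index>
  <index id="?">
    <title lang="en">Words</title>
    <map><name set="ccs">words</name></map>
  </index>
</indexInfo>
"""


def list_indexes(index_info_xml):
    """Return (qualified index name, title) pairs for all declared indexes."""
    root = ET.fromstring(index_info_xml)
    # Context sets announced by the server, keyed by their short name.
    known_sets = {s.get("name") for s in root.findall("set")}
    indexes = []
    for index in root.findall("index"):
        title = index.findtext("title", default="")
        for name in index.findall("map/name"):
            set_name = name.get("set")
            if set_name not in known_sets:
                continue  # ignore names pointing to an undeclared context set
            indexes.append((f"{set_name}.{name.text}", title))
    return indexes


indexes = list_indexes(INDEX_INFO)
for qualified_name, title in indexes:
    print(qualified_name, "-", title)
```

The qualified names (context set prefix plus index name) are the form in which a client would refer to an index in a CQL query.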
searchRetrieve operation
The operation to send the actual query.
Search result
The common ground is the <sru:searchRetrieveResponse> defined by the protocol. This goes down to the <sru:record> wrapping element.
The proposal is to continue with:
<ccs:Resource pid="{pid of the resource}">
  <!-- cmd-link is optional -->
  <ccs:Metadata cmd-link="{PID of the CMD-record}">
    <!-- this is for metadata provided directly by the content provider, NOT the CMD metadata -->
    <ccs:f key="{any metadata-key}">{any metadata value}</ccs:f>
  </ccs:Metadata>
  <ccs:ResourceFragment pid="{offset relative to the parent resource PID}">
    <ccs:Metadata>
      <!-- metadata pertaining to the specific (matching) fragment, like metadata on the "current" speaker -->
    </ccs:Metadata>
    <ccs:DataView type="text/xml"><meertens:any/></ccs:DataView>
    <ccs:DataView type="image/jpeg">
      <link>{optional link to the data}</link>
    </ccs:DataView>
  </ccs:ResourceFragment>
</ccs:Resource>
- Resource
- An element representing a resource, carrying the identifier. It may represent anything that has a PID (and an MDRecord), so in particular it may also be a collection aggregating other Resources. Allowed children are: Resource, ResourceFragment, Metadata and DataView.
- ResourceFragment
- A part of a resource without its own PID, i.e. something addressable with: PID of the Resource + Fragment Identifier. The Fragment Identifier to be used depends on the resource type; it may be an XPointer, a timecode, a sequence offset, etc. Allowed children are: Metadata and DataView.
- DataView
- The element carrying the typed data. The content can be anything that is in another namespace. It must be possible to provide the content either inline or by reference; this is important for images and AV files.
- Metadata
- An optional element carrying metadata about the Resource or ResourceFragment. It can carry an optional attribute cmd-link with the PID of a CMD-record. (This only makes sense for Resource/Metadata.)
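To illustrate how a client might walk the proposed structure, here is a rough Python sketch. The namespace URI bound to the ccs prefix, the PID values and the function name summarize are all placeholders invented for this example; the proposal above does not fix any of them.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the draft does not define one for "ccs".
CCS_NS = "http://example.org/ccs"
NS = {"ccs": CCS_NS}

# Made-up record following the proposed Resource structure.
RECORD = f"""
<ccs:Resource xmlns:ccs="{CCS_NS}" pid="hdl:example/resource">
  <ccs:Metadata cmd-link="hdl:example/cmd-record">
    <ccs:f key="title">Sample resource</ccs:f>
  </ccs:Metadata>
  <ccs:ResourceFragment pid="#offset-42">
    <ccs:DataView type="kwic">left <kw>hit</kw> right</ccs:DataView>
    <ccs:DataView type="image/jpeg"><link>http://example.org/page.jpg</link></ccs:DataView>
  </ccs:ResourceFragment>
</ccs:Resource>
"""


def summarize(resource):
    """Collect metadata and data views of a Resource, recursing into nested Resources."""
    summary = {
        "pid": resource.get("pid"),
        "cmd_link": None,
        "metadata": {},
        "fragments": [],
        # A Resource may aggregate other Resources (collections).
        "children": [summarize(child) for child in resource.findall("ccs:Resource", NS)],
    }
    metadata = resource.find("ccs:Metadata", NS)
    if metadata is not None:
        summary["cmd_link"] = metadata.get("cmd-link")  # optional attribute
        summary["metadata"] = {f.get("key"): f.text for f in metadata.findall("ccs:f", NS)}
    for fragment in resource.findall("ccs:ResourceFragment", NS):
        views = []
        for view in fragment.findall("ccs:DataView", NS):
            # "link" is None for inline content and a URL for referenced content.
            views.append({"type": view.get("type"), "link": view.findtext("link")})
        summary["fragments"].append({"pid": fragment.get("pid"), "views": views})
    return summary


summary = summarize(ET.fromstring(RECORD))
print(summary["pid"], [v["type"] for f in summary["fragments"] for v in f["views"]])
```

The sketch only reads DataViews attached to a ResourceFragment; DataViews placed directly under a Resource, which the list above also allows, would need one more loop.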
Data Views
All DataViews of a specific type should look the same in all implementations. That is, if a service presents results as KWIC, that should be the same KWIC in all services.
Here we propose several types of DataViews; the format has yet to be defined for most of them.
- kwic
- Keyword in context
<ccs:DataView type="kwic"> Junker Frauenlob , purre knix plautz - Ihr seid ein komischer Kauz - Habt ein Bärtlein von Haaren schwarz , Ziehet es aus mit einem Tropfen Harz Prrrr - ho wird das lang , Kling klang - g - a - d - <kw>e</kw> , Scheiden thut weh - der Daus </ccs:DataView>
- Geolocation
- A location on a map.
- Dictionary
- A dictionary entry. To be defined
- List
- A list of things. To be defined
- Matrix
- A matrix containing things. To be defined.
- Annotations
- A bunch of annotated text.
We start by supporting the TCF and EAF formats, as they have existing viewers.
For an example EAF file see: sample file
For now, the types are annotations/eaf and annotations/tcf.
<ccs:DataView type="annotations/eaf"> <link>http://corpus1.mpi.nl/qfs1/media-archive/demo/pewi/Annotations/elan-example1.eaf</link> </ccs:DataView>
Formats and viewers:
Type | Format | Viewer | URL |
---|---|---|---|
Annotations | EAF | Annexviewer provided by MPI | sample view |
Annotations | TCF | AnnotationViewer provided by Tübingen | sample view |
Geolocation | KML | Google Maps http://openlayers.org/ | |
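For the kwic type, a client has to separate the left context, the keyword and the right context. The Python sketch below assumes exactly one unqualified <kw> element per DataView, as in the kwic example above; the namespace URI for ccs and the function name split_kwic are placeholders, and the sample text is a shortened version of that example.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the draft does not define one for "ccs".
CCS_NS = "http://example.org/ccs"

VIEW = (
    f'<ccs:DataView xmlns:ccs="{CCS_NS}" type="kwic">'
    "Kling klang - g - a - d - <kw>e</kw> , Scheiden thut weh - der Daus "
    "</ccs:DataView>"
)


def split_kwic(view):
    """Split a kwic DataView into (left context, keyword, right context)."""
    kw = view.find("kw")
    if kw is None:
        raise ValueError("kwic DataView without a <kw> element")
    left = (view.text or "").strip()   # text before <kw>
    keyword = (kw.text or "").strip()  # the match itself
    right = (kw.tail or "").strip()    # text after </kw>
    return left, keyword, right


left, keyword, right = split_kwic(ET.fromstring(VIEW))
print(f"{left} [{keyword}] {right}")
```

Whether a kwic DataView may contain several <kw> elements is not settled here; this sketch would only report the first one.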
Restricting the search by collections
Restricting the search space shall be done via the x-cmd-domain parameter (obsoleting x-cmd-collections).
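As a rough illustration, a searchRetrieve request restricted to one domain could be assembled as follows in Python. The base URL, the CQL query and the function name search_url are invented for this sketch, and the domain value is borrowed from the scan example; how several domains would be passed at once is not specified here, so the sketch accepts only one.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def search_url(base_url, query, domain=None):
    """Build an SRU 1.2 searchRetrieve URL, optionally restricted to one domain."""
    params = {"operation": "searchRetrieve", "version": "1.2", "query": query}
    if domain is not None:
        # Extension parameter agreed on in this specification.
        params["x-cmd-domain"] = domain
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and query.
url = search_url("http://example.org/sru", "ccs.words = Kauz", domain="MPI86949#")
print(url)
```

urlencode percent-encodes the value, which matters here because the sample identifiers end in "#", a character that would otherwise start a URL fragment.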
Scan operation
As an extension to standard SRU, the scan response lists the searchable collections/domains available at the provider. cmd.domains should be used as the scanClause argument.
Example (tentative):
<sru:scanResponse xmlns:sru="http://www.loc.gov/zing/srw/"
                  xmlns:diag="http://www.loc.gov/zing/srw/diagnostic/"
                  xmlns:myServer="http://myServer.com/">
  <sru:version>1.2</sru:version>
  <sru:terms>
    <sru:term>
      <sru:value>MPI86949#</sru:value>
      <sru:numberOfRecords>42</sru:numberOfRecords>
      <sru:displayTerm>The CGN-Corpus (Corpus Gesproken Nederlands)</sru:displayTerm>
    </sru:term>
    <sru:term>
      <sru:value>MPI556280#</sru:value>
      <sru:numberOfRecords>42</sru:numberOfRecords>
      <sru:displayTerm>ESF corpus</sru:displayTerm>
    </sru:term>
    <sru:term>
      <sru:value>MPI214746#</sru:value>
      <sru:numberOfRecords>42</sru:numberOfRecords>
      <sru:displayTerm>IFA corpus</sru:displayTerm>
    </sru:term>
    <sru:term>
      <sru:value>MPI1296694#</sru:value>
      <sru:numberOfRecords>42</sru:numberOfRecords>
      <sru:displayTerm>Childes corpus</sru:displayTerm>
    </sru:term>
    <sru:term>
      <sru:value>MPI1259419#</sru:value>
      <sru:numberOfRecords>42</sru:numberOfRecords>
      <sru:displayTerm>Talkbank corpus</sru:displayTerm>
    </sru:term>
  </sru:terms>
  <sru:echoedScanRequest>
    <sru:version>1.2</sru:version>
    <sru:scanClause>cmd.collections</sru:scanClause>
    <sru:responsePosition></sru:responsePosition>
    <sru:maximumTerms>42</sru:maximumTerms>
  </sru:echoedScanRequest>
</sru:scanResponse>
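A client could turn such a scan response into a list of selectable collections. The Python sketch below parses a shortened copy of the tentative example using the SRU namespace it declares; the function name list_collections and the shape of the returned dictionaries are choices made for this illustration only.

```python
import xml.etree.ElementTree as ET

# SRU namespace as declared in the example response.
SRU = {"sru": "http://www.loc.gov/zing/srw/"}

# Shortened copy of the tentative scan response.
SCAN_RESPONSE = """
<sru:scanResponse xmlns:sru="http://www.loc.gov/zing/srw/">
  <sru:version>1.2</sru:version>
  <sru:terms>
    <sru:term>
      <sru:value>MPI86949#</sru:value>
      <sru:numberOfRecords>42</sru:numberOfRecords>
      <sru:displayTerm>The CGN-Corpus (Corpus Gesproken Nederlands)</sru:displayTerm>
    </sru:term>
    <sru:term>
      <sru:value>MPI556280#</sru:value>
      <sru:numberOfRecords>42</sru:numberOfRecords>
      <sru:displayTerm>ESF corpus</sru:displayTerm>
    </sru:term>
  </sru:terms>
</sru:scanResponse>
"""


def list_collections(scan_xml):
    """Return one dict per scanned term: identifier, label and record count."""
    root = ET.fromstring(scan_xml)
    collections = []
    for term in root.findall("sru:terms/sru:term", SRU):
        value = term.findtext("sru:value", None, SRU)
        count = term.findtext("sru:numberOfRecords", None, SRU)
        # Fall back to the raw identifier when no display term is given.
        label = term.findtext("sru:displayTerm", value, SRU)
        collections.append({
            "id": value,
            "label": label,
            "records": int(count) if count else None,
        })
    return collections


collections = list_collections(SCAN_RESPONSE)
for collection in collections:
    print(collection["id"], collection["label"], collection["records"])
```

The "id" values are what a client would then send back in the x-cmd-domain parameter of a searchRetrieve request.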