Towards an FCS Specification
In which we list things we agree should be in our future standard / specification.
TODO: integrate FCS-Format
SRU-CQL servlets
Explain response
The explain response should, ideally, list the available search indexes, mapped to ISOcat data categories where possible.
Example (tentative):
<indexInfo>
  <set identifier="http://www.isocat.org/datcat" name="isocat"/>
  <set identifier="clarin.eu/schema/ccs-v1.0" name="ccs"/>
  <index id="?">
    <title lang="en">Part of Speech</title>
    <map><name set="isocat">partOfSpeech</name></map>
  </index>
  <index id="?">
    <title lang="en">Words</title>
    <map><name set="ccs">words</name></map>
  </index>
  <index id="?">
    <title lang="en">Phonetics</title>
    <map><name set="ccs">phonetics</name></map>
  </index>
</indexInfo>
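A client could read the index list out of such an explain response roughly as follows. This is a minimal sketch using only the Python standard library. The function name list_indexes is hypothetical, and the code assumes the un-namespaced element names of the tentative example above; a real explain record may well place them in a namespace. Joining set and name with a dot follows the usual CQL convention for qualified index names.

```python
import xml.etree.ElementTree as ET

# Sample data: a shortened copy of the tentative indexInfo example.
INDEX_INFO = """
<indexInfo>
  <set identifier="http://www.isocat.org/datcat" name="isocat"/>
  <set identifier="clarin.eu/schema/ccs-v1.0" name="ccs"/>
  <index id="?">
    <title lang="en">Part of Speech</title>
    <map><name set="isocat">partOfSpeech</name></map>
  </index>
  <index id="?">
    <title lang="en">Words</title>
    <map><name set="ccs">words</name></map>
  </index>
</indexInfo>
"""


def list_indexes(xml_text):
    """Return (title, 'set.name') pairs for every index in an indexInfo fragment."""
    root = ET.fromstring(xml_text)
    result = []
    for index in root.findall("index"):
        title = index.findtext("title")
        name = index.find("map/name")
        # Qualified index name as it would be used in a CQL query.
        result.append((title, f"{name.get('set')}.{name.text}"))
    return result


print(list_indexes(INDEX_INFO))
```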
Search domain
The search space shall be restricted via the x-cmd-domain parameter (which obsoletes x-cmd-collections).
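To illustrate, a searchRetrieve request restricted to one domain might be assembled as in the sketch below. The endpoint URL, the query, and the helper name build_search_url are hypothetical; operation, version, query and maximumRecords are the standard SRU 1.2 request parameters. Using the scan term value MPI86949# as the domain identifier is an assumption, not something this page settles.

```python
from urllib.parse import urlencode


def build_search_url(endpoint, query, domain=None, max_records=10):
    """Build an SRU 1.2 searchRetrieve URL, optionally restricted to a domain."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": query,
        "maximumRecords": str(max_records),
    }
    if domain is not None:
        # Extension parameter proposed on this page (replaces x-cmd-collections).
        params["x-cmd-domain"] = domain
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint and query, for illustration only.
url = build_search_url(
    "http://example.org/sru",
    'isocat.partOfSpeech = "noun"',
    domain="MPI86949#",
)
print(url)
```

Note that urlencode percent-encodes the trailing # of the identifier, which matters because a raw # would otherwise start a URL fragment.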
Scan response
As an extension to standard SRU, the scan response lists the searchable collections/domains available at the provider. The scanClause argument should be cmd.domains.
Example (tentative):
<sru:scanResponse xmlns:sru="http://www.loc.gov/zing/srw/" xmlns:diag="http://www.loc.gov/zing/srw/diagnostic/" xmlns:myServer="http://myServer.com/">
  <sru:version>1.2</sru:version>
  <sru:terms>
    <sru:term>
      <sru:value>MPI86949#</sru:value>
      <sru:numberOfRecords>42</sru:numberOfRecords>
      <sru:displayTerm>The CGN-Corpus (Corpus Gesproken Nederlands)</sru:displayTerm>
    </sru:term>
    <sru:term>
      <sru:value>MPI556280#</sru:value>
      <sru:numberOfRecords>42</sru:numberOfRecords>
      <sru:displayTerm>ESF corpus</sru:displayTerm>
    </sru:term>
    <sru:term>
      <sru:value>MPI214746#</sru:value>
      <sru:numberOfRecords>42</sru:numberOfRecords>
      <sru:displayTerm>IFA corpus</sru:displayTerm>
    </sru:term>
    <sru:term>
      <sru:value>MPI1296694#</sru:value>
      <sru:numberOfRecords>42</sru:numberOfRecords>
      <sru:displayTerm>Childes corpus</sru:displayTerm>
    </sru:term>
    <sru:term>
      <sru:value>MPI1259419#</sru:value>
      <sru:numberOfRecords>42</sru:numberOfRecords>
      <sru:displayTerm>Talkbank corpus</sru:displayTerm>
    </sru:term>
  </sru:terms>
  <sru:echoedScanRequest>
    <sru:version>1.2</sru:version>
    <sru:scanClause>cmd.collections</sru:scanClause>
    <sru:responsePosition></sru:responsePosition>
    <sru:maximumTerms>42</sru:maximumTerms>
  </sru:echoedScanRequest>
</sru:scanResponse>
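A client that wants to offer the available domains to the user could turn such a scan response into a simple list. The sketch below uses only the Python standard library and a shortened copy of the example; the function name list_domains and the dictionary keys are hypothetical.

```python
import xml.etree.ElementTree as ET

SRU_NS = {"sru": "http://www.loc.gov/zing/srw/"}

# Sample data: the first two terms of the tentative scan response.
SCAN_RESPONSE = """
<sru:scanResponse xmlns:sru="http://www.loc.gov/zing/srw/">
  <sru:version>1.2</sru:version>
  <sru:terms>
    <sru:term>
      <sru:value>MPI86949#</sru:value>
      <sru:numberOfRecords>42</sru:numberOfRecords>
      <sru:displayTerm>The CGN-Corpus (Corpus Gesproken Nederlands)</sru:displayTerm>
    </sru:term>
    <sru:term>
      <sru:value>MPI556280#</sru:value>
      <sru:numberOfRecords>42</sru:numberOfRecords>
      <sru:displayTerm>ESF corpus</sru:displayTerm>
    </sru:term>
  </sru:terms>
</sru:scanResponse>
"""


def list_domains(xml_text):
    """Return one dict per sru:term: identifier, display label, record count."""
    root = ET.fromstring(xml_text)
    domains = []
    for term in root.findall("sru:terms/sru:term", SRU_NS):
        domains.append({
            "id": term.findtext("sru:value", namespaces=SRU_NS),
            "label": term.findtext("sru:displayTerm", namespaces=SRU_NS),
            "records": int(term.findtext("sru:numberOfRecords", namespaces=SRU_NS)),
        })
    return domains


for domain in list_domains(SCAN_RESPONSE):
    print(domain["id"], domain["label"], domain["records"])
```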
Search result
<ccs:Resource pid="{pid of the resource}">
  <!-- cmd-link is optional -->
  <ccs:Metadata cmd-link="{PID of the CMD-record}">
    <!-- this is for metadata provided directly by the content provider, NOT the CMD metadata -->
    <ccs:f key="{any metadata-key}">{any metadata value}</ccs:f>
  </ccs:Metadata>
  <ccs:ResourceFragment pid="{offset relative to the parent resource PID}">
    <ccs:Metadata>
      <!-- metadata pertaining to the specific (matching) fragment, like metadata on the "current" speaker -->
    </ccs:Metadata>
    <ccs:DataView type="text/xml"><meertens:any/></ccs:DataView>
    <ccs:DataView type="image/jpeg">
      <link>{optional link to the data}</link>
    </ccs:DataView>
  </ccs:ResourceFragment>
</ccs:Resource>
- Resource
- Element representing a resource, carrying the identifier. It may represent anything that has a PID (and an MDRecord). In particular, it may also be a collection aggregating other Resources. Allowed children are: Resource, ResourceFragment, Metadata and DataView.
- ResourceFragment
- A part of a resource without its own PID, i.e. something addressable with: PID of the Resource + fragment identifier. The fragment identifier to use depends on the resource type; it may be an XPointer, timecode, sequence offset, etc. Allowed children are: Metadata and DataView.
- DataView
- The element carrying the typed data. The content can be anything in another namespace. It must be possible to supply the content either inline or by reference, which is important for images and AV files.
- Metadata
- Optional element carrying metadata about the Resource or ResourceFragment. It can carry an optional attribute cmd-link with the PID of a CMD record. (This only makes sense for Resource/Metadata?)
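Because Resource elements may nest, a consumer has to walk the result recursively and remember which Resource and ResourceFragment each DataView belongs to. The following sketch shows one way to do that with the Python standard library. The namespace URI bound to ccs, the PIDs, and the function name iter_dataviews are all hypothetical, since this page does not fix them.

```python
import xml.etree.ElementTree as ET

CCS = "http://example.org/ccs"  # hypothetical namespace URI for the ccs prefix

# Hypothetical result: a collection Resource containing a nested Resource.
RESULT = f"""
<ccs:Resource xmlns:ccs="{CCS}" pid="hdl:1234/res1">
  <ccs:Resource pid="hdl:1234/res2">
    <ccs:ResourceFragment pid="#t=10,20">
      <ccs:DataView type="kwic">left <kw>hit</kw> right</ccs:DataView>
    </ccs:ResourceFragment>
  </ccs:Resource>
  <ccs:DataView type="image/jpeg"><link>http://example.org/img.jpg</link></ccs:DataView>
</ccs:Resource>
"""


def iter_dataviews(element, resource_pid=None, fragment_pid=None):
    """Yield (resource pid, fragment pid or None, DataView type), depth-first."""
    if element.tag == f"{{{CCS}}}Resource":
        # A new Resource resets the fragment context.
        resource_pid, fragment_pid = element.get("pid"), None
    elif element.tag == f"{{{CCS}}}ResourceFragment":
        fragment_pid = element.get("pid")
    elif element.tag == f"{{{CCS}}}DataView":
        yield resource_pid, fragment_pid, element.get("type")
        return  # DataView content is opaque; do not descend into it
    for child in element:
        yield from iter_dataviews(child, resource_pid, fragment_pid)


views = list(iter_dataviews(ET.fromstring(RESULT)))
for view in views:
    print(view)
```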
Data Views
All data views of a given type should be the same in all implementations. That is, if a service presents results as a KWIC, it should be the same KWIC encoding in all services. Here we propose one encoding for each of several data view types. All implementing services must conform.
- kwic
- Keyword in context
<ccs:DataView type="kwic"> Junker Frauenlob , purre knix plautz - Ihr seid ein komischer Kauz - Habt ein Bärtlein von Haaren schwarz , Ziehet es aus mit einem Tropfen Harz Prrrr - ho wird das lang , Kling klang - g - a - d - <kw>e</kw> , Scheiden thut weh - der Daus </ccs:DataView>
- Geolocation
- A location on a map. TODO?
- Dictionary
- A dictionary entry. To be defined
- List
- A list of things. To be defined
- Matrix
- A matrix containing things. To be defined.
- Annotations
- Annotated text.
We start by supporting the TCF and EAF formats, as both have existing viewers.
For an example EAF file see: sample file
For now, the data view types are annotations/eaf and annotations/tcf.
<ccs:DataView type="annotations/eaf"> <link>http://corpus1.mpi.nl/qfs1/media-archive/demo/pewi/Annotations/elan-example1.eaf</link> </ccs:DataView>
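For the kwic data view described above, a client typically needs the left context, the keyword and the right context as separate strings. A minimal sketch with the Python standard library follows. It assumes exactly one kw element per data view, as in the example, and the namespace URI bound to ccs as well as the function name split_kwic are hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the kwic example; the ccs namespace URI is hypothetical.
KWIC = (
    '<ccs:DataView xmlns:ccs="http://example.org/ccs" type="kwic">'
    "Kling klang - g - a - d - <kw>e</kw> , Scheiden thut weh - der Daus"
    "</ccs:DataView>"
)


def split_kwic(dataview_xml):
    """Split a kwic DataView into (left context, keyword, right context)."""
    view = ET.fromstring(dataview_xml)
    kw = view.find("kw")  # assumes exactly one <kw> element
    left = (view.text or "").strip()   # text before <kw>
    keyword = (kw.text or "").strip()  # text inside <kw>
    right = (kw.tail or "").strip()    # text after </kw>
    return left, keyword, right


print(split_kwic(KWIC))
```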
Formats and viewers:
Type | Format | Viewer | URL |
Annotations | EAF | Annexviewer provided by MPI | sample view |
Annotations | TCF | AnnotationViewer provided by Tübingen | sample view |
Geolocation | KML | Google-Maps? |