Changes between Version 12 and Version 13 of FCS-Specification-ScrapBook


Timestamp:
02/03/14 17:52:43 (10 years ago)
Author:
oschonef
Comment:

--

  • FCS-Specification-ScrapBook

    v12 v13  
    6969    A separate service that allows registering endpoints and provides information about these to other components, e.g. an aggregator. The [http://centres.clarin.eu/ CLARIN Center Registry] is an implementation of such a repository registry.
    7070
    71 === Normaitive References ===
     71=== Normative References ===
    7272 RFC2119[=#REF_RFC_2119]::
    7373    Key words for use in RFCs to Indicate Requirement Levels, IETF RFC 2119, March 1997, \\
     
    170170
    171171
    172 === Data Views ===
    173 
    174 Yada Yada yada ...
    175 
    176 
    177 === Operations ===#identify
     172=== Result Format ===
     173==== Resources and !ResourceFragments ====
     174''' WIP: will be reformulated '''
     175
     176In CLARIN-FCS, each {{{<sru:record>}}} element represents one hit within the ''resource'', which is encoded as a {{{<fcs:Resource>}}} element. Each resource shall be identified by a persistent identifier (or, less preferably, an endpoint-unique URI). The correct resource to return here is the most precise unit of data that is directly addressable as a "whole". The hit may contain a ''resource fragment'', which is encoded as a {{{<fcs:ResourceFragment>}}} element. The resource fragment shall be addressable within a resource, i.e. it has an offset or a resource-internal identifier. Using resource fragments is optional, but encouraged.
     177
     178The actual hit within a resource is encoded in a ''data view'' format and serialized as a {{{<fcs:DataView>}}} element inside either the {{{<fcs:Resource>}}} or the {{{<fcs:ResourceFragment>}}} element. Each hit may be serialized as multiple data views; however, the keyword-in-context (KWIC) data view is mandatory and belongs within the resource fragment (if applicable), or otherwise within the resource (if there is no reasonable resource fragment). Other data views should be put in a place that is logical for their content (as determined by the endpoint). For example, a metadata data view would most likely be put directly under a resource, whereas a data view representing some annotation layers directly around the hit more likely belongs within the resource fragment.
     179
     180Each entity (i.e. {{{<fcs:Resource>}}}, {{{<fcs:ResourceFragment>}}} or {{{<fcs:DataView>}}} element) contains a {{{ref}}} attribute, which points as precisely as possible to the original data represented by the resource, resource fragment, or data view. It should always be possible to link directly to the resource itself. In the worst case, this will be a web page describing a corpus or collection (including instructions on how to obtain it). In the best case, it links directly to the specific file or part of a resource in which the hit was found. The latter is not always possible and, where possible, is often constrained by licensing issues. Endpoints should provide links that are as specific as is possible and logical.
     181
     182For CLARIN-FCS, a custom record schema has been defined. The ''record schema identifier'' for this schema is {{{http://clarin.eu/fcs/1.0}}} and the appropriate XML Schema can be found at [source:FederatedSearch/Resource.xsd Resource.xsd] ([source:FederatedSearch/Resource.xsd?format=txt download]).
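
The nesting described above can be illustrated with a minimal Python sketch that walks one {{{<fcs:Resource>}}} and lists where each data view sits. Several details are assumptions made for illustration only: the XML namespace URI is assumed to equal the record schema identifier, the {{{pid}}} attribute name for the persistent identifier is hypothetical, and all identifiers and URLs in the sample are made up. The authoritative structure is defined by Resource.xsd.

```python
import xml.etree.ElementTree as ET

# Assumption: the fcs namespace URI equals the record schema identifier.
NS = {"fcs": "http://clarin.eu/fcs/1.0"}

# Made-up sample record payload; "pid" is a hypothetical attribute name.
SAMPLE = """\
<fcs:Resource xmlns:fcs="http://clarin.eu/fcs/1.0"
              pid="hdl:example/corpus-1"
              ref="http://example.org/corpus-1">
  <fcs:DataView type="application/x-cmdi+xml"
                ref="http://example.org/corpus-1/cmdi"/>
  <fcs:ResourceFragment ref="http://example.org/corpus-1#s42">
    <fcs:DataView type="application/x-clarin-fcs-kwic+xml"
                  ref="http://example.org/corpus-1#s42"/>
  </fcs:ResourceFragment>
</fcs:Resource>
"""


def list_data_views(resource):
    """Return (container, type, ref) for every data view of a resource.

    Data views directly under the resource come first, followed by
    those nested inside resource fragments.
    """
    views = []
    for dv in resource.findall("fcs:DataView", NS):
        views.append(("Resource", dv.get("type"), dv.get("ref")))
    for fragment in resource.findall("fcs:ResourceFragment", NS):
        for dv in fragment.findall("fcs:DataView", NS):
            views.append(("ResourceFragment", dv.get("type"), dv.get("ref")))
    return views


resource = ET.fromstring(SAMPLE)
views = list_data_views(resource)
for container, mime_type, ref in views:
    print(container, mime_type, ref)
```

In this sample the metadata data view sits directly under the resource, while the KWIC data view sits in the resource fragment, matching the placement suggested above.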
     183
     184
     185
     186==== Data Views ====
     187''' WIP: will be reformulated '''
     188
     189The data views are designed to allow for different representations of search results within CLARIN-FCS. They are deliberately kept open to allow further extensions in the future with more supported data view formats.
     190
     191The type of each data view is identified by the {{{type}}} attribute of the {{{<fcs:DataView>}}} element. The value is defined to be a [http://en.wikipedia.org/wiki/MIME_Type MIME type]. If no existing MIME type can be used, implementors are encouraged to define a proper private MIME type. The following formats are currently being considered:
     192 Keyword-In-Context (KWIC)::
     193   Description: a keyword-in-context view, where each hit should be presented within the context of a complete sentence (if possible) or any other reasonable unit of context (e.g. if sentences cannot be determined by the endpoint). The keyword-in-context data view is '''mandatory''' for all endpoints. The appropriate XML schema can be found at [source:FederatedSearch/Resource-KWIC.xsd Resource-KWIC.xsd] ([source:FederatedSearch/Resource-KWIC.xsd?format=txt download]). \\
     194   Type: {{{application/x-clarin-fcs-kwic+xml}}} \\
     195   Example for a keyword-in-context data view (formatted for brevity):
     196{{{#!xml
     197<fcs:DataView type="application/x-clarin-fcs-kwic+xml">
     198  <kwic:kwic xmlns:kwic="http://clarin.eu/fcs/1.0/kwic">
     199    <kwic:c type="left">Some text with the </kwic:c>
     200    <kwic:kw>keyword</kwic:kw>
     201    <kwic:c type="right">highlighted.</kwic:c>
     202  </kwic:kwic>
     203</fcs:DataView>
     204}}}
     205  CMDI metadata::
     206    Description: a CMDI metadata record applicable to the specific context, i.e., if contained inside a resource element it should apply to the entire resource, and if contained inside a resource fragment it should apply specifically to the resource fragment (i.e., more specialized than the metadata for the encompassing resource). The minimal XML schema for CMDI records can be found at [source:metadata/trunk/toolkit/xsd/minimal-cmdi.xsd minimal-cmdi.xsd] ([source:metadata/trunk/toolkit/xsd/minimal-cmdi.xsd?format=txt download]). For further information refer to the [http://www.clarin.eu/cmdi Component Metadata] pages. \\
     207    Type: {{{application/x-cmdi+xml}}} \\
     208
     209  Geolocation (KML)::
     210    Description: A geographic location encoded in the Keyhole Markup Language. Please refer to the [http://www.opengeospatial.org/standards/kml KML standard] for further information and the current KML XML schema. \\
     211    Type: {{{application/vnd.google-earth.kml+xml}}}
     212
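
As a rough illustration of how a client might use the {{{type}}} attribute, the following Python sketch selects the keyword-in-context data view by its MIME type and splits it into left context, keyword, and right context. The function name {{{parse_kwic}}} is hypothetical, and the fcs namespace URI is assumed to equal the record schema identifier. The sample input is the KWIC example shown above.

```python
import xml.etree.ElementTree as ET

KWIC_TYPE = "application/x-clarin-fcs-kwic+xml"

# Assumption: the fcs namespace URI equals the record schema identifier.
# The kwic namespace URI is taken from the KWIC example.
NS = {
    "fcs": "http://clarin.eu/fcs/1.0",
    "kwic": "http://clarin.eu/fcs/1.0/kwic",
}

SAMPLE = """\
<fcs:DataView xmlns:fcs="http://clarin.eu/fcs/1.0"
              type="application/x-clarin-fcs-kwic+xml">
  <kwic:kwic xmlns:kwic="http://clarin.eu/fcs/1.0/kwic">
    <kwic:c type="left">Some text with the </kwic:c>
    <kwic:kw>keyword</kwic:kw>
    <kwic:c type="right">highlighted.</kwic:c>
  </kwic:kwic>
</fcs:DataView>
"""


def parse_kwic(data_view):
    """Return (left, keyword, right) for a KWIC data view, else None.

    Data views of any other MIME type are skipped, so a client can
    ignore formats it does not understand.
    """
    if data_view.get("type") != KWIC_TYPE:
        return None
    kwic = data_view.find("kwic:kwic", NS)
    if kwic is None:
        return None
    left = kwic.findtext("kwic:c[@type='left']", default="", namespaces=NS)
    keyword = kwic.findtext("kwic:kw", default="", namespaces=NS)
    right = kwic.findtext("kwic:c[@type='right']", default="", namespaces=NS)
    return left, keyword, right


hit = parse_kwic(ET.fromstring(SAMPLE))
print(hit)
```

Returning {{{None}}} for unknown types keeps the set of data views open for extension: an aggregator written this way can silently skip CMDI or KML data views until it adds handlers for them.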
     213
     214== CLARIN-FCS to SRU/CQL binding ==
    178215
    179216Yada yada yada ...