=== Result Format ===
==== Resources and !ResourceFragments ====
''' WIP: will be reformulated '''

In CLARIN-FCS, each {{{<sru:record>}}} element represents one hit within a ''resource'', which is encoded as a {{{<fcs:Resource>}}} element. Each resource shall be identified by a persistent identifier (or, less preferably, by a URI that is unique within the endpoint). The correct resource to return here is the most precise unit of data that is directly addressable as a "whole". The hit may contain a ''resource fragment'', which is encoded as a {{{<fcs:ResourceFragment>}}} element. A resource fragment shall be addressable within its resource, i.e. it has an offset or a resource-internal identifier. Using resource fragments is optional, but encouraged.

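The nesting of resource and resource fragment can be sketched in Python with the standard {{{xml.etree.ElementTree}}} module. This is a minimal, hypothetical sketch, not the normative encoding: the namespace URI bound to the {{{fcs}}} prefix is assumed to be {{{http://clarin.eu/fcs/1.0}}}, the {{{pid}}} and {{{offset}}} attribute names and all example.org values are illustrative placeholders, and [source:FederatedSearch/Resource.xsd Resource.xsd] remains the authority on the actual element and attribute names.

{{{#!python
import xml.etree.ElementTree as ET

# ASSUMPTION: namespace URI bound to the "fcs" prefix; check Resource.xsd
# for the authoritative value.
FCS_NS = "http://clarin.eu/fcs/1.0"
ET.register_namespace("fcs", FCS_NS)


def fcs(tag):
    """Qualify a local tag name with the (assumed) FCS namespace."""
    return "{" + FCS_NS + "}" + tag


def build_hit(pid, resource_ref, fragment_offset=None, fragment_ref=None):
    """Build the skeleton of one hit: a resource, optionally with a fragment.

    The "pid" and "offset" attribute names are hypothetical placeholders.
    """
    resource = ET.Element(fcs("Resource"), {"pid": pid, "ref": resource_ref})
    if fragment_offset is None:
        # No addressable fragment: data views would go directly into the resource.
        return resource
    fragment_attrs = {"offset": str(fragment_offset)}
    if fragment_ref is not None:
        fragment_attrs["ref"] = fragment_ref
    ET.SubElement(resource, fcs("ResourceFragment"), fragment_attrs)
    return resource


# Hypothetical values for illustration only.
hit = build_hit("hdl:1234/corpus-a", "http://example.org/corpus-a",
                fragment_offset=42, fragment_ref="http://example.org/corpus-a#s42")
print(ET.tostring(hit, encoding="unicode"))
}}}

Leaving out {{{fragment_offset}}} yields a bare resource, which corresponds to the case where no reasonable resource fragment exists.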
The actual hit within a resource is encoded in a ''data view'' format and serialized as a {{{<fcs:DataView>}}} element inside either the {{{<fcs:Resource>}}} or the {{{<fcs:ResourceFragment>}}} element. Each hit may be serialized as multiple data views; however, the keyword-in-context (KWIC) data view is mandatory and belongs inside the resource fragment (if applicable), or otherwise inside the resource (if there is no reasonable resource fragment). Other data views should be placed wherever is logical for their content, as determined by the endpoint. For example, a metadata data view would most likely be placed directly under a resource, whereas a data view representing some annotation layers directly around the hit more likely belongs within the resource fragment.

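The placement rule for the mandatory KWIC view can be checked mechanically, for instance as a sanity test on endpoint output. The following Python sketch reads the rule strictly: if a hit has a resource fragment, the KWIC view must be a direct child of that fragment; otherwise it must be a direct child of the resource. It assumes {{{http://clarin.eu/fcs/1.0}}} as the namespace URI of the {{{fcs}}} prefix and only inspects the first fragment of a hit.

{{{#!python
import xml.etree.ElementTree as ET

# ASSUMPTION: namespace URI of the "fcs" prefix.
FCS_NS = "http://clarin.eu/fcs/1.0"
KWIC_TYPE = "application/x-clarin-fcs-kwic+xml"


def fcs(tag):
    """Qualify a local tag name with the (assumed) FCS namespace."""
    return "{" + FCS_NS + "}" + tag


def has_kwic(element):
    """True if a direct <fcs:DataView> child of element is a KWIC view."""
    return any(view.get("type") == KWIC_TYPE
               for view in element.findall(fcs("DataView")))


def kwic_placement_ok(resource):
    """Check where the mandatory KWIC view sits within one hit.

    With a resource fragment, the view must be inside the (first) fragment;
    without one, it must be directly inside the resource.
    """
    fragment = resource.find(fcs("ResourceFragment"))
    if fragment is not None:
        return has_kwic(fragment)
    return has_kwic(resource)


HIT = (
    '<fcs:Resource xmlns:fcs="http://clarin.eu/fcs/1.0">'
    '<fcs:ResourceFragment>'
    '<fcs:DataView type="application/x-clarin-fcs-kwic+xml"/>'
    '</fcs:ResourceFragment>'
    '</fcs:Resource>'
)
print(kwic_placement_ok(ET.fromstring(HIT)))  # True
}}}

Under this strict reading, a hit that has a fragment but carries its KWIC view on the resource is reported as misplaced.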
Each entity (i.e. {{{<fcs:Resource>}}}, {{{<fcs:ResourceFragment>}}} or {{{<fcs:DataView>}}} element) carries a {{{ref}}} attribute, which points as closely as possible to the original data represented by the resource, resource fragment, or data view. It should always be possible to link directly to the resource itself. In the worst case, this will be a web page describing a corpus or collection (including instructions on how to obtain it). In the best case, it links directly to the specific file or part of a resource in which the hit was found. The latter is not always possible, and where it is possible, it is often constrained by licensing issues. Endpoints should provide links that are as specific as is possible and logical.

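On the client side, the "as specific as possible" guideline suggests preferring the most specific link available for a hit. A minimal Python sketch of that preference, assuming {{{http://clarin.eu/fcs/1.0}}} as the {{{fcs}}} namespace URI and using made-up example.org links:

{{{#!python
import xml.etree.ElementTree as ET

# ASSUMPTION: namespace URI of the "fcs" prefix.
FCS_NS = "http://clarin.eu/fcs/1.0"


def fcs(tag):
    """Qualify a local tag name with the (assumed) FCS namespace."""
    return "{" + FCS_NS + "}" + tag


def best_link(resource):
    """Return the most specific "ref" link available for one hit.

    Prefers the resource fragment's link and falls back to the resource's
    own link; returns None if neither carries a "ref" attribute.
    """
    fragment = resource.find(fcs("ResourceFragment"))
    if fragment is not None and fragment.get("ref"):
        return fragment.get("ref")
    return resource.get("ref")


HIT = (
    '<fcs:Resource xmlns:fcs="http://clarin.eu/fcs/1.0" ref="http://example.org/corpus">'
    '<fcs:ResourceFragment ref="http://example.org/corpus/doc7#p3"/>'
    '</fcs:Resource>'
)
print(best_link(ET.fromstring(HIT)))  # http://example.org/corpus/doc7#p3
}}}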
For CLARIN-FCS, a custom record schema has been defined. The ''record schema identifier'' for this schema is {{{http://clarin.eu/fcs/1.0}}} and the appropriate XML Schema can be found at [source:FederatedSearch/Resource.xsd Resource.xsd] ([source:FederatedSearch/Resource.xsd?format=txt download]).

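A client receiving a searchRetrieve response can use the record schema identifier to pick out the CLARIN-FCS records. The Python sketch below assumes an SRU 1.2 response (namespace {{{http://www.loc.gov/zing/srw/}}}) with XML record packing, so that the {{{<fcs:Resource>}}} element appears directly inside {{{<sru:recordData>}}}; the response text is a hand-written, minimal example rather than real endpoint output.

{{{#!python
import xml.etree.ElementTree as ET

# ASSUMPTION: an SRU 1.2 response; this is the SRU 1.2 namespace.
SRU_NS = "http://www.loc.gov/zing/srw/"
FCS_RECORD_SCHEMA = "http://clarin.eu/fcs/1.0"


def sru(tag):
    """Qualify a local tag name with the SRU 1.2 namespace."""
    return "{" + SRU_NS + "}" + tag


def fcs_payloads(response_xml):
    """Yield the root element of each record that uses the CLARIN-FCS schema."""
    root = ET.fromstring(response_xml)
    for record in root.iter(sru("record")):
        schema = (record.findtext(sru("recordSchema")) or "").strip()
        if schema != FCS_RECORD_SCHEMA:
            continue  # a record in some other schema: ignored by this sketch
        data = record.find(sru("recordData"))
        if data is not None and len(data) > 0:
            yield data[0]  # e.g. the <fcs:Resource> element


# A hand-written, minimal response for illustration only.
RESPONSE = """<sru:searchRetrieveResponse xmlns:sru="http://www.loc.gov/zing/srw/">
  <sru:records>
    <sru:record>
      <sru:recordSchema>http://clarin.eu/fcs/1.0</sru:recordSchema>
      <sru:recordPacking>xml</sru:recordPacking>
      <sru:recordData>
        <fcs:Resource xmlns:fcs="http://clarin.eu/fcs/1.0" ref="http://example.org/corpus"/>
      </sru:recordData>
    </sru:record>
  </sru:records>
</sru:searchRetrieveResponse>"""

for payload in fcs_payloads(RESPONSE):
    print(payload.tag, payload.get("ref"))
}}}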
==== Data Views ====
''' WIP: will be reformulated '''

The data views are designed to allow for different representations of search results within CLARIN-FCS. They are deliberately kept open, so that support for further data view formats can be added in the future.

The type of each data view is identified by the {{{type}}} attribute of the {{{<fcs:DataView>}}} element. Its value is defined to be a [http://en.wikipedia.org/wiki/MIME_Type MIME type]. If no existing MIME type can be used, implementors are encouraged to define a proper private MIME type. The following formats are currently being considered:

 Keyword-In-Context (KWIC)::
   Description: a keyword-in-context view, where each hit should be presented within the context of a complete sentence (if possible) or any other reasonable unit of context (e.g. if sentences cannot be determined by the endpoint). The keyword-in-context data view is '''mandatory''' for all endpoints. The appropriate XML schema can be found at [source:FederatedSearch/Resource-KWIC.xsd Resource-KWIC.xsd] ([source:FederatedSearch/Resource-KWIC.xsd?format=txt download]). \\
   Type: {{{application/x-clarin-fcs-kwic+xml}}} \\
   Example of a keyword-in-context data view (formatted for brevity):
   {{{#!xml
   <fcs:DataView type="application/x-clarin-fcs-kwic+xml">
     <kwic:kwic xmlns:kwic="http://clarin.eu/fcs/1.0/kwic">
       <kwic:c type="left">Some text with the </kwic:c>
       <kwic:kw>keyword</kwic:kw>
       <kwic:c type="right">highlighted.</kwic:c>
     </kwic:kwic>
   </fcs:DataView>
   }}}
 CMDI metadata::
   Description: a CMDI metadata record applicable to the specific context, i.e., if contained inside a resource element, it should apply to the entire resource, and if contained inside a resource fragment, it should apply specifically to that resource fragment (i.e., be more specialized than the metadata for the encompassing resource). The minimal XML schema for CMDI records can be found at [source:metadata/trunk/toolkit/xsd/minimal-cmdi.xsd minimal-cmdi.xsd] ([source:metadata/trunk/toolkit/xsd/minimal-cmdi.xsd?format=txt download]). For further information, refer to the [http://www.clarin.eu/cmdi Component Metadata] pages. \\
   Type: {{{application/x-cmdi+xml}}}

 Geolocation (KML)::
   Description: a geographic location encoded in the Keyhole Markup Language. Please refer to the [http://www.opengeospatial.org/standards/kml KML standard] for further information and the current KML XML schema. \\
   Type: {{{application/vnd.google-earth.kml+xml}}}

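Because each data view announces its format through the {{{type}}} attribute, a client can dispatch on that MIME type and skip formats it does not understand. The Python sketch below does this and renders the KWIC example above as plain text with the keyword in brackets. The {{{xmlns:fcs}}} declaration added to the example (with an assumed URI) is only there so that the snippet parses on its own, and the bracket rendering is an arbitrary choice of this sketch.

{{{#!python
import xml.etree.ElementTree as ET

KWIC_NS = "http://clarin.eu/fcs/1.0/kwic"
KWIC_TYPE = "application/x-clarin-fcs-kwic+xml"
KNOWN_UNRENDERED = {"application/x-cmdi+xml", "application/vnd.google-earth.kml+xml"}


def kwic(tag):
    """Qualify a local tag name with the KWIC namespace."""
    return "{" + KWIC_NS + "}" + tag


def render_kwic(kwic_elem):
    """Render a <kwic:kwic> element as plain text with the keyword in brackets."""
    left = keyword = right = ""
    for child in kwic_elem:
        text = child.text or ""
        if child.tag == kwic("kw"):
            keyword = text
        elif child.tag == kwic("c") and child.get("type") == "left":
            left = text
        elif child.tag == kwic("c") and child.get("type") == "right":
            right = text
    return left + "[" + keyword + "]" + right


def describe_data_view(view):
    """Dispatch on the MIME type stored in the data view's "type" attribute."""
    view_type = view.get("type")
    if view_type == KWIC_TYPE:
        inner = view.find(kwic("kwic"))
        return "KWIC: " + (render_kwic(inner) if inner is not None else "")
    if view_type in KNOWN_UNRENDERED:
        return "not rendered in this sketch: " + view_type
    return "unknown type, skipped: " + str(view_type)


# The KWIC example from above; the xmlns:fcs declaration (assumed URI) is
# added only so that the snippet parses on its own.
VIEW = """<fcs:DataView xmlns:fcs="http://clarin.eu/fcs/1.0" type="application/x-clarin-fcs-kwic+xml">
  <kwic:kwic xmlns:kwic="http://clarin.eu/fcs/1.0/kwic">
    <kwic:c type="left">Some text with the </kwic:c>
    <kwic:kw>keyword</kwic:kw>
    <kwic:c type="right">highlighted.</kwic:c>
  </kwic:kwic>
</fcs:DataView>"""

print(describe_data_view(ET.fromstring(VIEW)))
# KWIC: Some text with the [keyword]highlighted.
}}}

Skipping unknown types rather than failing keeps such a client usable as new data view formats are added.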
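The CMDI description implies that fragment-level metadata is more specialized than resource-level metadata. A client wanting the metadata that applies most narrowly to a hit could therefore look in the resource fragment first and fall back to the resource, as in this hypothetical Python sketch (the {{{fcs}}} namespace URI and the example.org links are assumptions):

{{{#!python
import xml.etree.ElementTree as ET

# ASSUMPTION: namespace URI of the "fcs" prefix.
FCS_NS = "http://clarin.eu/fcs/1.0"
CMDI_TYPE = "application/x-cmdi+xml"


def fcs(tag):
    """Qualify a local tag name with the (assumed) FCS namespace."""
    return "{" + FCS_NS + "}" + tag


def views_of_type(element, mime_type):
    """Direct <fcs:DataView> children of element that have the given type."""
    return [view for view in element.findall(fcs("DataView"))
            if view.get("type") == mime_type]


def most_specific_cmdi(resource):
    """Return the CMDI view that applies most narrowly to a hit, or None.

    Fragment-level metadata is more specialized than resource-level
    metadata, so it is preferred when both are present.
    """
    fragment = resource.find(fcs("ResourceFragment"))
    if fragment is not None:
        in_fragment = views_of_type(fragment, CMDI_TYPE)
        if in_fragment:
            return in_fragment[0]
    in_resource = views_of_type(resource, CMDI_TYPE)
    return in_resource[0] if in_resource else None


# Hypothetical hit with metadata at both levels.
HIT = (
    '<fcs:Resource xmlns:fcs="http://clarin.eu/fcs/1.0">'
    '<fcs:DataView type="application/x-cmdi+xml" ref="http://example.org/md/corpus"/>'
    '<fcs:ResourceFragment>'
    '<fcs:DataView type="application/x-cmdi+xml" ref="http://example.org/md/doc7"/>'
    '</fcs:ResourceFragment>'
    '</fcs:Resource>'
)
print(most_specific_cmdi(ET.fromstring(HIT)).get("ref"))  # http://example.org/md/doc7
}}}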
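For the KML data view, a client often only needs the coordinates. The Python sketch below extracts {{{<Point>}}} coordinates, using the KML 2.2 namespace {{{http://www.opengis.net/kml/2.2}}} and the KML convention of writing each tuple as {{{longitude,latitude[,altitude]}}}. It assumes the data view carries a KML snippet whose elements are in that namespace; the sample location is illustrative only.

{{{#!python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"  # OGC KML 2.2 namespace


def kml(tag):
    """Qualify a local tag name with the KML 2.2 namespace."""
    return "{" + KML_NS + "}" + tag


def point_coordinates(kml_xml):
    """Return (longitude, latitude) pairs for every <Point> in a KML snippet.

    KML writes each tuple as "lon,lat[,alt]"; the optional altitude is dropped.
    """
    root = ET.fromstring(kml_xml)
    pairs = []
    for point in root.iter(kml("Point")):
        text = point.findtext(kml("coordinates")) or ""
        for tuple_text in text.split():
            lon, lat = tuple_text.split(",")[:2]
            pairs.append((float(lon), float(lat)))
    return pairs


# Illustrative placemark; the coordinates are sample values.
PLACEMARK = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark>
    <name>Example location</name>
    <Point><coordinates>4.8952,52.3702,0</coordinates></Point>
  </Placemark>
</kml>"""

print(point_coordinates(PLACEMARK))  # [(4.8952, 52.3702)]
}}}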
== CLARIN-FCS to SRU/CQL binding ==