Changes between Version 35 and Version 36 of FCS-Specification-ScrapBook


Timestamp:
02/12/14 15:44:52 (10 years ago)
Author:
oschonef
Comment:

--

  • FCS-Specification-ScrapBook

    v35 v36  
    203203
    204204=== Result Format ===
    205 CLARIN-FCS uses a customized format for returning results. ''Resource'' and ''Resource Fragments'' serve as containers for hit results, that are presented in one or more  ''Data View''. The following section describes the result result and data view format and section [#query Performing Queries and returning results] will describe, how hits are embedded within SRU responses.
     205CLARIN-FCS uses a customized format for returning results. ''Resource'' and ''Resource Fragment'' serve as containers for hit results, which are presented in one or more ''Data Views''. The following section describes the result and Data View format, and section [#searchRetrieve Operation ''searchRetrieve''] describes how hits are embedded within SRU responses.
    206206
    207207==== Resource and !ResourceFragment ====
     
    234234</fcs:Resource>
    235235}}}
    236 This example shows a simple hit, which is encoded in one Data View of type ''Generic Hits'' embedded within a Resource. The type of the Data View is identified by the MIME type `application/x-clarin-fcs-hits+xml`. The Resource is referenceable by the persistent identifier `http://hdl.handle.net/4711/08-15`.
     236This [#REF_Example_1 example] shows a simple hit, which is encoded in one Data View of type ''Generic Hits'' embedded within a Resource. The type of the Data View is identified by the MIME type `application/x-clarin-fcs-hits+xml`. The Resource is referenceable by the persistent identifier `http://hdl.handle.net/4711/08-15`.
    237237
    238238[=#REF_Example_2]Example 2:
     
    246246</fcs:Resource>
    247247}}}
    248 This example shows a hit encoded as a Resource Fragment embedded within a Resource. The actual hit is again encoded as one Data View of type ''Generic Hits''. The hit is not directly referenceable, but the Resource, in which hit occurred, is referenceable by the persistent identifier `http://hdl.handle.net/4711/08-15`. In contrast to [#REF_Example_1 Example 1], the endpoint decided to provide a "semantically richer" encoding and embedded the hit using a Resource Fragment within the Resource to indicate that the hit is a part of a larger resource, e.g. a sentence in a text document.
     248This [#REF_Example_2 example] shows a hit encoded as a Resource Fragment embedded within a Resource. The actual hit is again encoded as one Data View of type ''Generic Hits''. The hit is not directly referenceable, but the Resource in which the hit occurred is referenceable by the persistent identifier `http://hdl.handle.net/4711/08-15`. In contrast to [#REF_Example_1 Example 1], the endpoint decided to provide a "semantically richer" encoding and embedded the hit using a Resource Fragment within the Resource to indicate that the hit is a part of a larger resource, e.g. a sentence in a text document.
    249249
    250250[=#REF_Example_3]Example 3:
     
    263263</fcs:Resource>
    264264}}}
    265 The most complex example is similar to [#REF_Example_2 Example 2], i.e. it shows a hit is encoded as one ''Generic Hits'' Data View in a Resource Fragment, that is embedded in a Resource. In contrast to Example 2, another Data View of type ''CMDI'' is embedded directly within the Resource. An Endpoint can use this type of Data View to directly provide CMDI metadata about the Resource to Clients.
     265The most complex [#REF_Example_3 example] is similar to [#REF_Example_2 Example 2], i.e. it shows a hit encoded as one ''Generic Hits'' Data View in a Resource Fragment that is embedded in a Resource. In contrast to Example 2, another Data View of type ''CMDI'' is embedded directly within the Resource. An Endpoint can use this type of Data View to directly provide CMDI metadata about the Resource to Clients.
    266266All entities of the Hit can be referenced by a persistent identifier and a URI. The complete Resource is referenceable by either the persistent identifier `http://hdl.handle.net/4711/08-15` or the URI `http://repos.example.org/file/text_08_15.html` and the CMDI metadata record in the CMDI Data View is referenceable either by the persistent identifier `http://hdl.handle.net/4711/08-15-1` or the URI `http://repos.example.org/file/08_15_1.cmdi`. The actual hit in the Resource Fragment is also directly referenceable by either the persistent identifier `http://hdl.handle.net/4711/00-15-2` or the URI `http://repos.example.org/file/text_08_15.html#sentence2`.
    267267
     
    282282||=MIME type           =|| `application/x-clarin-fcs-hits+xml` ||
    283283||=Payload Disposition =|| ''inline'' ||
    284 The ''Generic Hits'' Data View contains the serialization of a search result hit. It supports multiple maskers for suppling highlighting for the hit. Each hit `SHOULD` be presented within the context of a complete sentence. If that is not possible due to the nature of the type of the resource, the the Endpoint `SHALL` provide an equivalent reasonable unit of context (e.g. within a phrase of a orthographic transcription of an utterance). All Endpoints `MUST` provide hits represented in this Data View. The XML fragment of the Generic Hits payload `MUST` be valid according to the XML schema "[source:FederatedSearch/schema/DataView-Hits.xsd DataView-Hits.xsd]" ([source:FederatedSearch/schema/DataView-Hits.xsd?format=txt download])
     284The ''Generic Hits'' Data View contains the serialization of a search result hit. It supports multiple markers for supplying highlighting for the hit. Each hit `SHOULD` be presented within the context of a complete sentence. If that is not possible due to the nature of the type of the resource, the Endpoint `SHALL` provide an equivalent reasonable unit of context (e.g. within a phrase of an orthographic transcription of an utterance). All Endpoints `MUST` provide hits represented in this Data View. The XML fragment of the Generic Hits payload `MUST` be valid according to the XML schema "[source:FederatedSearch/schema/DataView-Hits.xsd DataView-Hits.xsd]" ([source:FederatedSearch/schema/DataView-Hits.xsd?format=txt download]).
    285285 * Example (single hit marker):
    286286{{{#!xml
     
    317317</fcs:DataView>
    318318}}}
    319 
    320319 * Example (referenced):
    321320{{{#!xml
     
    362361
    363362
    364 === Endpoint Description and Identification ===
    365 
    366 Yada Yada Yada ...
     363=== Endpoint Description ===
     364
     365Endpoints need to provide information about their capabilities to support auto-configuration of Clients. These capabilities include, among other information, the Profile that is supported by an Endpoint. The ''Endpoint Description'' mechanism provides the necessary facility to provide this information to the Clients. Endpoints `MUST` encode their capabilities using an XML format and embed this information into the SRU/CQL protocol as described in section [#explain Operation ''explain'']. The XML fragment generated by the Endpoint for the Endpoint Description `MUST` be valid according to the XML schema "[source:FederatedSearch/schema/Endpoint-Description.xsd Endpoint-Description.xsd]" ([source:FederatedSearch/schema/Endpoint-Description.xsd?format=txt download]).
     366
     367The XML fragment for ''Endpoint Description'' is encoded as an `<ed:EndpointDescription>` element that contains the following children:
     368 * one `<ed:Profile>` element (`REQUIRED`) \\
     369   The value of the `<ed:Profile>` element indicates the Profile that is supported by the Endpoint. \\
     370   Valid values are:
     371    * `basic`: the Endpoint supports the ''basic'' Profile \\
     372   '''NOTE''': a future CLARIN-FCS specification will introduce more values.
     373 * one `<ed:SupportedDataViews>` element (`REQUIRED`) \\
     374   A list of Data Views that are supported by this Endpoint. This list is composed of one or more `<ed:SupportedDataView>` elements. The content of a `<ed:SupportedDataView>` element `MUST` be the MIME type of a supported Data View, e.g. `application/x-clarin-fcs-hits+xml`.
     375 * one `<ed:Collections>` element (`REQUIRED`) \\
     376   A list of (top-level) collections that are available at the Endpoint. The `<ed:Collections>` element contains one or more `<ed:Collection>` elements (see below). An Endpoint `MUST` declare at least one (top-level) collection.
     377
     378The `<ed:Collection>` element contains a detailed description of a collection that is available at an Endpoint. A collection is a searchable entity, e.g. a single corpus. The `<ed:Collection>` element has a mandatory `@pid` attribute that contains the persistent identifier of the collection. This value `MUST` be the same as the ''!MdSelfLink'' of the CMDI record describing the collection. The `<ed:Collection>` element contains the following children:
     379 * one or more `<ed:Title>` elements (`REQUIRED`) \\
     380   A human readable title for the collection. A `REQUIRED` `@xml:lang` attribute indicates the language of the title. An English title is `REQUIRED`. The list of titles `MUST NOT` contain duplicate entries for the same language.
     381 * zero or more `<ed:Description>` elements (`OPTIONAL`) \\
     382   An optional human-readable description of the collection. It `SHOULD` be at most one sentence. A `REQUIRED` `@xml:lang` attribute indicates the language of the description. If supplied, an English version is `REQUIRED`. The list of descriptions `MUST NOT` contain duplicate entries for the same language.
     383 * zero or one `<ed:LandingPageURI>` element (`OPTIONAL`) \\
     384   A link to a website for this collection, e.g. a landing page that describes the corpus.
     385 * one `<ed:Languages>` element (`REQUIRED`) \\
     386   The (relevant) languages available within the collection. The `<ed:Languages>` element contains one or more `<ed:Language>` elements. The content of a `<ed:Language>` element `MUST` be an ISO 639-3 three-letter language code. This element should be repeated for all (relevant) languages available ''within'' the collection; however, this list `MUST NOT` contain duplicate entries.
     387 * zero or one `<ed:Collections>` element (`OPTIONAL`) \\
     388   If a collection has searchable sub-collections the Endpoint `MUST` supply additional finer grained collection elements, which are wrapped in a `<ed:Collections>` element. A sub-collection is a searchable entity within a collection, e.g. a sub-corpus.
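
The per-collection constraints listed above can be checked mechanically. The following is a minimal Python sketch, assuming the namespace URI used in the examples of this section; `check_collection` is a hypothetical helper, not part of CLARIN-FCS, and it does not replace validation against the XML schema.

```python
import xml.etree.ElementTree as ET

# Namespace URI as used in the examples of this section.
ED_NS = "http://clarin.eu/fcs/1.0/endpoint-description"
# Expanded name under which ElementTree reports the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def check_collection(collection):
    """Return a list of problems found in one <ed:Collection> element (hypothetical helper)."""
    problems = []
    if not collection.get("pid"):
        problems.append("missing @pid")

    for tag, required in (("Title", True), ("Description", False)):
        langs = [e.get(XML_LANG) for e in collection.findall(f"{{{ED_NS}}}{tag}")]
        if None in langs:
            problems.append(f"{tag} without @xml:lang")
        if len(langs) != len(set(langs)):
            problems.append(f"duplicate {tag} language")
        # English is always required for titles, for descriptions only if any are supplied.
        if (required or langs) and "en" not in langs:
            problems.append(f"no English {tag}")

    codes = [e.text for e in collection.findall(f"{{{ED_NS}}}Languages/{{{ED_NS}}}Language")]
    if not codes:
        problems.append("no Language entries")
    if len(codes) != len(set(codes)):
        problems.append("duplicate Language entries")
    return problems


# A collection that only carries a German title.
sample = ET.fromstring(
    f'<ed:Collection xmlns:ed="{ED_NS}" pid="http://hdl.handle.net/4711/0815">'
    '<ed:Title xml:lang="de">Goethe Korpus</ed:Title>'
    '<ed:Languages><ed:Language>deu</ed:Language></ed:Languages>'
    '</ed:Collection>'
)
print(check_collection(sample))  # ['no English Title']
```

The sketch only covers the constraints stated in prose (language attributes, duplicates, required English entries); structural validity remains the job of the schema.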
     389
     390[=#REF_ED_Example_1]Example 1:
     391{{{#!xml
     392<ed:EndpointDescription xmlns:ed="http://clarin.eu/fcs/1.0/endpoint-description">
     393  <ed:Profile>basic</ed:Profile>
     394  <ed:SupportedDataViews>
     395    <ed:SupportedDataView>application/x-clarin-fcs-hits+xml</ed:SupportedDataView>
     396  </ed:SupportedDataViews>
     397  <ed:Collections>
     398    <!-- just one top-level collection at the Endpoint -->
     399    <ed:Collection pid="http://hdl.handle.net/4711/0815">
     400      <ed:Title xml:lang="de">Goethe Korpus</ed:Title>
     401      <ed:Title xml:lang="en">Goethe Corpus</ed:Title>
     402      <ed:Description xml:lang="de">Der Goethe Korpus des IDS Mannheim.</ed:Description>
     403      <ed:Description xml:lang="en">The Goethe corpus of IDS Mannheim.</ed:Description>
     404      <ed:LandingPageURI>http://repos.example.org/corpus1.html</ed:LandingPageURI>
     405      <ed:Languages>
     406        <ed:Language>deu</ed:Language>
     407      </ed:Languages>
     408    </ed:Collection>
     409  </ed:Collections>
     410</ed:EndpointDescription>
     411}}}
     412This [#REF_ED_Example_1 example] shows a simple Endpoint Description for an Endpoint that supports the ''basic'' Profile and only provides the Generic Hits Data View. It only provides one top-level collection identified by the persistent identifier `http://hdl.handle.net/4711/0815`. The collection has a title as well as a description in German and English. A landing page is located at `http://repos.example.org/corpus1.html`. The searchable collection contents are only available in German.
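
To illustrate how a Client might use such a description for auto-configuration, the following Python sketch reads the Profile, the supported Data Views and the top-level collection identifiers from an abbreviated copy of Example 1. The function name `summarize` and the returned dictionary layout are assumptions made for this illustration only.

```python
import xml.etree.ElementTree as ET

# Prefix mapping for ElementTree path expressions; URI as in Example 1.
NS = {"ed": "http://clarin.eu/fcs/1.0/endpoint-description"}

# Abbreviated copy of Example 1 (descriptions and landing page omitted).
EXAMPLE = """
<ed:EndpointDescription xmlns:ed="http://clarin.eu/fcs/1.0/endpoint-description">
  <ed:Profile>basic</ed:Profile>
  <ed:SupportedDataViews>
    <ed:SupportedDataView>application/x-clarin-fcs-hits+xml</ed:SupportedDataView>
  </ed:SupportedDataViews>
  <ed:Collections>
    <ed:Collection pid="http://hdl.handle.net/4711/0815">
      <ed:Title xml:lang="en">Goethe Corpus</ed:Title>
      <ed:Languages>
        <ed:Language>deu</ed:Language>
      </ed:Languages>
    </ed:Collection>
  </ed:Collections>
</ed:EndpointDescription>
"""


def summarize(xml_text):
    """Extract the capabilities a Client needs from an Endpoint Description (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return {
        "profile": root.findtext("ed:Profile", namespaces=NS),
        "data_views": [
            e.text for e in root.findall("ed:SupportedDataViews/ed:SupportedDataView", NS)
        ],
        "collections": [
            c.get("pid") for c in root.findall("ed:Collections/ed:Collection", NS)
        ],
    }


info = summarize(EXAMPLE)
print(info["profile"])      # basic
print(info["data_views"])   # ['application/x-clarin-fcs-hits+xml']
print(info["collections"])  # ['http://hdl.handle.net/4711/0815']
```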
     413
     414[=#REF_ED_Example_2]Example 2:
     415{{{#!xml
     416<ed:EndpointDescription xmlns:ed="http://clarin.eu/fcs/1.0/endpoint-description">
     417  <ed:Profile>basic</ed:Profile>
     418  <ed:SupportedDataViews>
     419    <ed:SupportedDataView>application/x-clarin-fcs-hits+xml</ed:SupportedDataView>
     420    <ed:SupportedDataView>application/x-cmdi+xml</ed:SupportedDataView>
     421  </ed:SupportedDataViews>
     422  <ed:Collections>
     423    <!-- top-level collection 1 -->
     424    <ed:Collection pid="http://hdl.handle.net/4711/0815">
     425      <ed:Title xml:lang="de">Goethe Korpus</ed:Title>
     426      <ed:Title xml:lang="en">Goethe Corpus</ed:Title>
     427      <ed:Description xml:lang="de">Der Goethe Korpus des IDS Mannheim.</ed:Description>
     428      <ed:Description xml:lang="en">The Goethe corpus of IDS Mannheim.</ed:Description>
     429      <ed:LandingPageURI>http://repos.example.org/corpus1.html</ed:LandingPageURI>
     430      <ed:Languages>
     431        <ed:Language>deu</ed:Language>
     432      </ed:Languages>
     433    </ed:Collection>
     434    <!-- top-level collection 2 -->
     435    <ed:Collection pid="http://hdl.handle.net/4711/0816">
     436      <ed:Title xml:lang="de">Zeitungskorpus des Mannheimer Morgen</ed:Title>
     437      <ed:Title xml:lang="en">Mannheimer Morgen newspaper Corpus</ed:Title>
     438      <ed:LandingPageURI>http://repos.example.org/corpus2.html</ed:LandingPageURI>
     439      <ed:Languages>
     440        <ed:Language>deu</ed:Language>
     441      </ed:Languages>
     442      <ed:Collections>
     443        <!-- sub-collection 1 of top-level collection 2 -->
     444        <ed:Collection pid="http://hdl.handle.net/4711/0816-1">
     445          <ed:Title xml:lang="de">Zeitungskorpus des Mannheimer Morgen (vor 1990)</ed:Title>
     446          <ed:Title xml:lang="en">Mannheimer Morgen newspaper Corpus (before 1990)</ed:Title>
     447          <ed:LandingPageURI>http://repos.example.org/corpus2.html#sub1</ed:LandingPageURI>
     448          <ed:Languages>
     449            <ed:Language>deu</ed:Language>
     450          </ed:Languages>
     451        </ed:Collection>
     452        <!-- sub-collection 2 of top-level collection 2 -->
     453        <ed:Collection pid="http://hdl.handle.net/4711/0816-2">
     454          <ed:Title xml:lang="de">Zeitungskorpus des Mannheimer Morgen (nach 1990)</ed:Title>
     455          <ed:Title xml:lang="en">Mannheimer Morgen newspaper Corpus (after 1990)</ed:Title>
     456          <ed:LandingPageURI>http://repos.example.org/corpus2.html#sub2</ed:LandingPageURI>
     457          <ed:Languages>
     458            <ed:Language>deu</ed:Language>
     459          </ed:Languages>
     460        </ed:Collection>
     461      </ed:Collections>
     462    </ed:Collection>
     463  </ed:Collections>
     464</ed:EndpointDescription>
     465}}}
     466This more complex [#REF_ED_Example_2 example] shows an Endpoint Description for an Endpoint that, similar to [#REF_ED_Example_1 Example 1], supports the ''basic'' Profile. In addition to the Generic Hits Data View, it also supports the CMDI Data View. The Endpoint has two top-level collections (identified by the persistent identifiers `http://hdl.handle.net/4711/0815` and `http://hdl.handle.net/4711/0816`). The second top-level collection has two sub-collections, identified by the persistent identifiers `http://hdl.handle.net/4711/0816-1` and `http://hdl.handle.net/4711/0816-2`. All collections are described using several properties, like title, description, etc.
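
Because `<ed:Collections>` may nest inside `<ed:Collection>`, a Client has to traverse the description recursively to find all searchable entities. The following Python sketch shows one possible traversal over a fragment modelled on Example 2; the generator `walk` is a hypothetical helper, and the fragment omits titles and languages for brevity, so it is not schema-valid on its own.

```python
import xml.etree.ElementTree as ET

# Prefix mapping for ElementTree path expressions; URI as in Example 2.
NS = {"ed": "http://clarin.eu/fcs/1.0/endpoint-description"}

# Nesting modelled on Example 2; only the @pid attributes are kept.
NESTED = """
<ed:Collections xmlns:ed="http://clarin.eu/fcs/1.0/endpoint-description">
  <ed:Collection pid="http://hdl.handle.net/4711/0815"/>
  <ed:Collection pid="http://hdl.handle.net/4711/0816">
    <ed:Collections>
      <ed:Collection pid="http://hdl.handle.net/4711/0816-1"/>
      <ed:Collection pid="http://hdl.handle.net/4711/0816-2"/>
    </ed:Collections>
  </ed:Collection>
</ed:Collections>
"""


def walk(collections, depth=0):
    """Yield (depth, pid) for every collection below an <ed:Collections> element."""
    for collection in collections.findall("ed:Collection", NS):
        yield depth, collection.get("pid")
        sub = collection.find("ed:Collections", NS)
        if sub is not None:
            # Sub-collections are wrapped in their own <ed:Collections> element.
            yield from walk(sub, depth + 1)


for depth, pid in walk(ET.fromstring(NESTED)):
    print("  " * depth + pid)
```

Top-level collections are reported at depth 0 and the two sub-collections of the second collection at depth 1.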
    367467
    368468
     
    393493
    394494
    395 === Endpoint Identification ===
    396 
    397 Is mapped to SRU ''explain'' operation. Yada yada ...
    398 
    399 
    400 === Performing Queries and returning results ===#query
    401 
    402 Is mapped to SRU ''!SearchRetrieve'' operation. Yada yada ...
     495=== Operation ''explain'' ===#explain
     496
     497Yada yada ...
     498
     499=== Operation ''scan'' ===#scan
     500
     501The SRU operation ''scan'' is currently not used in the ''basic'' profile of CLARIN-FCS. An ''extended'' profile may use this operation; therefore, it is `NOT RECOMMENDED` for Endpoints to define custom extensions that use this operation.
     502
     503
     504=== Operation ''searchRetrieve'' ===#searchRetrieve
     505
     506Yada yada ...
     507
    403508
    404509== Appendix: Non-normative