Changes between Version 80 and Version 81 of FCS-Specification-ScrapBook


Timestamp:
03/14/14 13:57:35
Author:
Oliver Schonefeld
  • FCS-Specification-ScrapBook

    v80 v81  
    7474    This is not yet very useful for the ''basic'' Profile, but will be useful for ''extended'' Profile(s).
    7575 3. '''Need-to-Request Data Views''': Data Views will be classified into a "send-by-default" and a "need-to-request" class (which need to be documented in the spec analogous to the payload disposition). The first class indicates that the Endpoint will include this Data View unconditionally, e.g. the Generic Hits view. For the second class, a Client explicitly needs to request the Data View by using another custom query parameter (`x-fsc-request-dataviews`?) and supplying a (list of) Data View(s). The Endpoint Description should also indicate to which class a Data View belongs. This way, Endpoints could also indicate that they, e.g., always include some Data View that has been marked as "need-to-request" by the spec. This change would enable FCS to add/define Data Views that are too "expensive" (e.g. in terms of computational power or bandwidth) to generate for every request.
    76  4. Separate Data Views from Core-Spec; add them in an appendix, if a "one document" spec is compiled. Generic Hits however will also stay in Core-Spec, because it's the lowest common denominator
    77 
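
As an illustration of the need-to-request idea in item 3, a Client request could be assembled roughly as follows. This is only a sketch under assumptions: `x-fsc-request-dataviews` is the tentative parameter name from the proposal, while the comma-separated list syntax, the SRU version, the Data View identifiers `cmdi` and `geo`, the helper `build_search_url` and the endpoint URL are hypothetical.

{{{#!python
from urllib.parse import urlencode

# Tentative parameter name taken from the proposal text; not a finalized name.
REQUEST_DATAVIEWS_PARAM = "x-fsc-request-dataviews"


def build_search_url(base_url, query, dataviews=()):
    """Build an SRU searchRetrieve URL, optionally requesting extra Data Views."""
    params = [
        ("operation", "searchRetrieve"),
        ("version", "1.2"),  # assumption: SRU 1.2
        ("query", query),
    ]
    if dataviews:
        # Assumption: several Data Views are passed as one comma-separated list.
        params.append((REQUEST_DATAVIEWS_PARAM, ",".join(dataviews)))
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint URL and Data View identifiers.
url = build_search_url("http://repos.example.org/fcs", "fox", ["cmdi", "geo"])
print(url)
}}}

Omitting the `dataviews` argument leaves the parameter out entirely, which corresponds to a request that only receives the "send-by-default" Data Views.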
    78 == Proposal for new specification ==
     76
     77== Proposal for new specification: Federated Content Search - Core Specification ==
    7978The following is a proposal for a revisited federated content search specification. When done, cut and paste to the appropriate section of the Wiki and publish on the CLARIN web page.
    80 
    8179----
    82 
    83 = CLARIN Federated Content Search (CLARIN-FCS) =
     80= CLARIN Federated Content Search (CLARIN-FCS) - Core =
    8481[[PageOutline(1-5)]]
    8582
    8683== Introduction ==
    87 The main goal of CLARIN federated content search (CLARIN-FCS) is to introduce a ''interface specification'', to decouple the ''search engine'' functionality from its ''exploitation'', i.e. user-interfaces, third-party applications and to allow services to access search engines in a uniform way.
     84The goal of the ''CLARIN Federated Content Search (CLARIN-FCS) - Core'' specification is to introduce an ''interface specification'' that decouples the ''search engine'' functionality from its ''exploitation'' (i.e. user interfaces and third-party applications) and allows services to access heterogeneous search engines in a uniform way.
    8885
    8986=== Terminology ===
     
    190187    Media Type Specifications and Registration Procedures, IETF RFC 6838, January 2013, \\
    191188    [http://www.ietf.org/rfc/rfc6838.txt]
     189
    192190 RFC3023[=#REF_RFC_3023]::
    193191    XML Media Types, IETF RFC 3023, January 2001, \\
    194192    [http://www.ietf.org/rfc/rfc3023.txt]
    195  KML[=#REF_KML_Spec]::
    196     Keyhole Markup Language (KML), Open Geospatial Consortium, 2008, \\
    197     [http://www.opengeospatial.org/standards/kml]
    198193
    199194=== Typographic and XML Namespace conventions ===
     
    208203
    209204The following XML namespace names and prefixes are used throughout this specification. The column "Recommended Syntax" indicates which syntax variant `SHOULD` be used by the Endpoint to serialize the XML response.
    210 ||=Prefix =||=Namespace Name                                 =||=Comment                        =||= Recommended Syntax =||
     205||=Prefix =||=Namespace Name                                 =||=Comment                        =||=Recommended Syntax =||
    211206|| `fcs`   || `http://clarin.eu/fcs/resource`                 || CLARIN-FCS Resources            || prefixed ||
    212207|| `ed`    || `http://clarin.eu/fcs/endpoint-description`     || CLARIN-FCS Endpoint Description || prefixed ||
     
    357352||=MIME type           =|| `application/x-clarin-fcs-hits+xml` ||
    358353||=Payload Disposition =|| ''inline'' ||
    359 The ''Generic Hits'' Data View serves as the ''most basic'' agreement in CLARIN-FCS for serialization search results and `MUST` be implemented by all Endpoints. In many cases, this Data View can only serve as an (lossy) approximation, because resources at Endpoints are very heterogeneous. E.g. the Generic Hits Data View is probably not the best representation for a hit result in a corpus of spoken language, but an architecture like CLARIN-FCS requires one common representation to be implemented by all Endpoints, therefore this Data View was defined. The Generic Hits Data View supports multiple markers for suppling highlighting for an individual hit, e.g. if a query contains a (boolean) conjunction the Endpoint can use multiple markers to provide individual highlights for the matching terms. An Endpoint `MUST NOT` use this Data View to aggregate several hits within one resource. Each hit `SHOULD` be presented within the context of a complete sentence. If that is not possible due to the nature of the type of the resource, the Endpoint `MUST` provide an equivalent reasonable unit of context (e.g. within a phrase of a orthographic transcription of an utterance). The `<hits:Hits>` element within the `<hits:Result>` element is not enforced by the XML schema, but Endpoints are `RECOMMENDED` to use it. The XML fragment of the Generic Hits payload `MUST` be valid according to the XML schema "[source:FederatedSearch/schema/DataView-Hits.xsd DataView-Hits.xsd]" ([source:FederatedSearch/schema/DataView-Hits.xsd?format=txt download]).
     354The ''Generic Hits'' Data View serves as the ''most basic'' agreement in CLARIN-FCS for serialization of search results and `MUST` be implemented by all Endpoints. In many cases, this Data View can only serve as a (lossy) approximation, because resources at Endpoints are very heterogeneous. E.g. the Generic Hits Data View is probably not the best representation for a hit result in a corpus of spoken language, but an architecture like CLARIN-FCS requires one common representation to be implemented by all Endpoints, therefore this Data View was defined. The Generic Hits Data View supports multiple markers for supplying highlighting for an individual hit, e.g. if a query contains a (boolean) conjunction, the Endpoint can use multiple markers to provide individual highlights for the matching terms. An Endpoint `MUST NOT` use this Data View to aggregate several hits within one resource. Each hit `SHOULD` be presented within the context of a complete sentence. If that is not possible due to the nature of the type of the resource, the Endpoint `MUST` provide an equivalent reasonable unit of context (e.g. within a phrase of an orthographic transcription of an utterance). The `<hits:Hits>` element within the `<hits:Result>` element is not enforced by the XML schema, but Endpoints are `RECOMMENDED` to use it. The XML fragment of the Generic Hits payload `MUST` be valid according to the XML schema "[source:FederatedSearch/schema/DataView-Hits.xsd DataView-Hits.xsd]" ([source:FederatedSearch/schema/DataView-Hits.xsd?format=txt download]).
    360355 * Example (single hit marker):
    361356{{{#!xml
     
    374369    The quick brown <hits:Hit>fox</hits:Hit> jumps over the lazy <hits:Hit>dog</hits:Hit>.
    375370  </hits:Result>
    376 </fcs:DataView>
    377 }}}
    378 
    379 ===== Component Metadata (CMDI) =====
    380 ||=Description         =|| A CMDI metadata record ||
    381 ||=MIME type           =|| `application/x-cmdi+xml` ||
    382 ||=Payload Disposition =|| ''inline'' or ''reference'' ||
    383 The ''Component Metadata'' Data View allows to embed a CMDI metadata record that is ''applicable'' to the specific context into the Endpoint response, e.g. metadata about the resource in which the hit was produced. If this CMDI record is applicable for the entire Resource, it `SHOULD` be put in a `<fcs:DataView>` element below the `<fcs:Resource>` element. If it is applicable to the Resource Fragment, i.e. it contains more specialized metadata than the metadata for the encompassing resource, it `SHOULD` be put in a `<fcs:DataView>` element below the `<fcs:ResourceFragment>` element. Endpoints `SHOULD` provide the payload ''inline'', but Endpoints `MAY` also use the ''reference'' method. If an Endpoint uses the ''reference'' method, the CMDI metadata record `MUST` be downloadable without any restrictions.
    384  * Example (inline):
    385 {{{#!xml
    386 <!-- potential @pid and @ref attributes omitted -->
    387 <fcs:DataView type="application/x-cmdi+xml">
    388   <cmdi:CMD xmlns:cmdi="http://www.clarin.eu/cmd/" CMDVersion="1.1">
    389     <!-- content omitted -->
    390   </cmdi:CMD>
    391 </fcs:DataView>
    392 }}}
    393  * Example (referenced):
    394 {{{#!xml
    395 <!-- potential @pid attribute omitted -->
    396 <fcs:DataView type="application/x-cmdi+xml" ref="http://repos.example.org/resources/4711/0815.cmdi" />
    397 }}}
    398 
    399 ===== Images (IMG) =====
    400 ||=Description         =|| An image related to the hit ||
    401 ||=MIME type           =|| `image/png`, `image/jpeg`, `image/gif`, `image/svg+xml` ||
    402 ||=Payload Disposition =|| ''reference'' ||
    403 
    404 The ''Image'' Data View allows to provide an image, that is relevant to the hit, e.g. a facsimile of the source of a transcription. Endpoints `MUST` provide the payload by the ''reference'' method and the image file `SHOULD` be downloadable without any restrictions.
    405  * Example:
    406 {{{#!xml
    407 <!-- potential @pid attribute omitted -->
    408 <fcs:DataView type="image/png" ref="http://repos.example.org/resources/4711/0815.png" />
    409 }}}
    410 
    411 ===== Geolocation (GEO) =====
    412 ||=Description         =|| An geographic location related to the hit ||
    413 ||=MIME type           =|| `application/vnd.google-earth.kml+xml` ||
    414 ||=Payload Disposition =|| ''inline'' ||
    415 The ''Geolocation'' Data View allows to geolocalize a hit. If `MUST` be encoded using the XML representation of the Keyhole Markup Language (KML). The KML fragment `MUST` comply with the specification as defined by [#REF_KML_Spec KML].
    416  * Example:
    417 {{{#!xml
    418 <!-- potential @pid and @ref attributes omitted -->
    419 <fcs:DataView type="application/vnd.google-earth.kml+xml">
    420   <kml:kml xmlns:kml="http://www.opengis.net/kml/2.2">
    421     <kml:Placemark>
    422       <kml:name>IDS Mannheim</kml:name>
    423       <kml:description>Institut für Deutsche Sprache, R5 6-13, 68161 Mannheim, Germany</kml:description>
    424       <kml:Point>
    425         <kml:coordinates>8.4719510,49.4883700,0</kml:coordinates>
    426       </kml:Point>
    427     </kml:Placemark>
    428   </kml:kml>
    429371</fcs:DataView>
    430372}}}
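
The hit markers in the example above could be consumed by a Client roughly as follows. This is a minimal sketch, not a reference implementation: the namespace name bound to the `hits` prefix is not shown in this section and is assumed here, and the helper `split_hits` is hypothetical.

{{{#!python
import xml.etree.ElementTree as ET

# Assumed namespace name for the Generic Hits Data View (not shown in this section).
HITS_NS = "http://clarin.eu/fcs/dataview/hits"

PAYLOAD = """
<hits:Result xmlns:hits="http://clarin.eu/fcs/dataview/hits">
  The quick brown <hits:Hit>fox</hits:Hit> jumps over the lazy <hits:Hit>dog</hits:Hit>.
</hits:Result>
"""


def split_hits(result):
    """Return (text, is_hit) segments of a <hits:Result> element in document order."""
    segments = []
    if result.text:
        segments.append((result.text, False))
    for child in result:
        # Only <hits:Hit> children contribute highlighted text.
        if child.tag == "{%s}Hit" % HITS_NS:
            segments.append((child.text or "", True))
        # Text following a child element belongs to the surrounding context.
        if child.tail:
            segments.append((child.tail, False))
    return segments


result = ET.fromstring(PAYLOAD.strip())
segments = split_hits(result)
highlighted = [text for text, is_hit in segments if is_hit]
plain = "".join(text for text, _ in segments).strip()
print(highlighted)
print(plain)
}}}

Keeping the segments in document order lets a user interface render the context sentence and highlight each marker individually.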
     
    828770}}}
    829771----
     772----
     773== Proposal for new specification: Federated Content Search - Data Views ==
     774The following is a proposal for the separate specification of Data Views for CLARIN-FCS. When done, cut and paste to the appropriate section of the Wiki and publish on the CLARIN web page.
     775----
     776= CLARIN Federated Content Search (CLARIN-FCS) - Data Views =
     777[[PageOutline(1-5)]]
     778
     779== Introduction ==
     780This specification is a supplementary specification to the CLARIN-FCS Core specification and defines additional Data Views to be used in CLARIN-FCS. It describes the supplementary Data Views only tersely. For detailed information about the CLARIN-FCS ''interface specification'' see [#REF_FCS_Core CLARIN-FCS-Core].
     781
     782=== Terminology ===
     783The key words `MUST`, `MUST NOT`, `REQUIRED`, `SHALL`, `SHALL NOT`, `SHOULD`, `SHOULD NOT`, `RECOMMENDED`, `MAY`, and `OPTIONAL` in this document are to be interpreted as described in [#REF_RFC_2119 RFC2119].
     784
     785=== Normative References ===
     786 RFC2119[=#REF_RFC_2119]::
     787    Key words for use in RFCs to Indicate Requirement Levels, IETF RFC 2119, March 1997, \\
     788    [http://www.ietf.org/rfc/rfc2119.txt]
     789
     790 XML-Namespaces[=#REF_XML_Namespaces]::
     791    Namespaces in XML 1.0 (Third Edition), W3C, 8 December 2009, \\
     792    [http://www.w3.org/TR/2009/REC-xml-names-20091208/]
     793
     794 CLARIN-FCS-Core[=#REF_FCS_Core]::
     795    CLARIN Federated Content Search (CLARIN-FCS) - Core, SCCTC FCS Task Force, March 2014, \\
     796    [http://www.clarin.eu/fcs/add/link/here]   
     797
     798=== Non-Normative References ===
     799 RFC6838[=#REF_RFC_6838]::
     800    Media Type Specifications and Registration Procedures, IETF RFC 6838, January 2013, \\
     801    [http://www.ietf.org/rfc/rfc6838.txt]
     802
     803 RFC3023[=#REF_RFC_3023]::
     804    XML Media Types, IETF RFC 3023, January 2001, \\
     805    [http://www.ietf.org/rfc/rfc3023.txt]
     806
     807 KML[=#REF_KML_Spec]::
     808    Keyhole Markup Language (KML), Open Geospatial Consortium, 2008, \\
     809    [http://www.opengeospatial.org/standards/kml]
     810
     811=== Typographic and XML Namespace conventions ===
     812The following typographic conventions for XML fragments will be used throughout this specification:
     813 * `<prefix:Element>` \\ An XML element with the Generic Identifier ''Element'' that is bound to an XML namespace denoted by the prefix ''prefix''.
     814 * `@attr` \\ An XML attribute with the name ''attr''
     815{{{#!comment
     816 * `@prefix:attr` \\ An XML attribute with the name ''attr'' that is bound to an XML namespace denoted by the prefix ''prefix''.
     817}}}
     818 * `string` \\ The literal ''string'' must be used either as element content or attribute value.
     819Endpoints and Clients `MUST` adhere to the [#REF_XML_Namespaces XML-Namespaces] specification. The CLARIN-FCS interface specification generally does not dictate whether XML elements should be serialized in their prefixed or non-prefixed syntax, but Endpoints `MUST` ensure that the correct XML namespace is used for elements and that XML namespaces are declared correctly. Clients `MUST` be agnostic regarding the syntax used for serializing the XML elements, i.e. whether the prefixed or un-prefixed variant was used, and `SHOULD` operate solely on ''expanded names'', i.e. pairs of ''namespace name'' and ''local name''.
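
To illustrate what operating on ''expanded names'' means for a Client, the following sketch parses the same element once in prefixed and once in un-prefixed syntax and compares the results. It assumes Python's standard `xml.etree.ElementTree` parser; the helper `expanded_name` and the sample documents are hypothetical.

{{{#!python
import xml.etree.ElementTree as ET

FCS_NS = "http://clarin.eu/fcs/resource"

# The same element, serialized with and without a namespace prefix.
PREFIXED = (
    '<fcs:DataView xmlns:fcs="http://clarin.eu/fcs/resource" '
    'type="image/png" ref="http://repos.example.org/resources/4711/0815.png"/>'
)
UNPREFIXED = (
    '<DataView xmlns="http://clarin.eu/fcs/resource" '
    'type="image/png" ref="http://repos.example.org/resources/4711/0815.png"/>'
)


def expanded_name(element):
    """Split ElementTree's '{namespace}local' tag into (namespace name, local name)."""
    tag = element.tag
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return None, tag


names = [expanded_name(ET.fromstring(doc)) for doc in (PREFIXED, UNPREFIXED)]
same = names[0] == names[1] == (FCS_NS, "DataView")
print(same)
}}}

Because the comparison never looks at the prefix, a Client written this way handles both serialization variants identically.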
     820
     821The following XML namespace names and prefixes are used throughout this specification. The column "Recommended Syntax" indicates which syntax variant `SHOULD` be used by the Endpoint to serialize the XML response.
     822||=Prefix =||=Namespace Name                                 =||=Comment                        =||=Recommended Syntax =||
     823|| `fcs`   || `http://clarin.eu/fcs/resource`                 || CLARIN-FCS Resources            || prefixed ||
     824|| `cmdi`  || `http://www.clarin.eu/cmd/`                     || Component Metadata              || un-prefixed ||
     825|| `kml`   || `http://www.opengis.net/kml/2.2`                || Keyhole Markup Language         || un-prefixed ||
     826
     827== Data Views ==
     828A Data View serves as a container for representing search results within CLARIN-FCS. Data Views are designed to allow for different representations of results. This specification defines supplementary Data Views beyond the Generic Hits Data View, which is part of the CLARIN-FCS Core specification. For detailed information about what Data Views are and how they are integrated in CLARIN-FCS see [#REF_FCS_Core CLARIN-FCS-Core].
     829
     830'''NOTE''': The examples in the following sections ''show only'' the payload with the enclosing `<fcs:DataView>` element of a Data View. Of course, the Data View must be embedded either in a `<fcs:Resource>` or a `<fcs:ResourceFragment>` element. The `@pid` and `@ref` attributes have been omitted for all ''inline'' payload types.
     831
     832=== Generic Hits (HITS) ===
     833The ''Generic Hits'' (HITS) Data View is an integral part of the Core specification and serves as the ''most basic'' agreement in CLARIN-FCS for the serialization of search results. For details about this Data View, refer to the Core specification [#REF_FCS_Core CLARIN-FCS-Core, Section "Generic Hits (HITS)"].
     834
     835=== Component Metadata (CMDI) ===
     836||=Description         =|| A CMDI metadata record ||
     837||=MIME type           =|| `application/x-cmdi+xml` ||
     838||=Payload Disposition =|| ''inline'' or ''reference'' ||
     839The ''Component Metadata'' Data View allows an Endpoint to embed into its response a CMDI metadata record that is ''applicable'' to the specific context, e.g. metadata about the resource in which the hit was produced. If this CMDI record is applicable to the entire Resource, it `SHOULD` be put in a `<fcs:DataView>` element below the `<fcs:Resource>` element. If it is applicable to the Resource Fragment, i.e. it contains more specialized metadata than the metadata for the encompassing resource, it `SHOULD` be put in a `<fcs:DataView>` element below the `<fcs:ResourceFragment>` element. Endpoints `SHOULD` provide the payload ''inline'', but Endpoints `MAY` also use the ''reference'' method. If an Endpoint uses the ''reference'' method, the CMDI metadata record `MUST` be downloadable without any restrictions.
     840 * Example (inline):
     841{{{#!xml
     842<!-- potential @pid and @ref attributes omitted -->
     843<fcs:DataView type="application/x-cmdi+xml">
     844  <cmdi:CMD xmlns:cmdi="http://www.clarin.eu/cmd/" CMDVersion="1.1">
     845    <!-- content omitted -->
     846  </cmdi:CMD>
     847</fcs:DataView>
     848}}}
     849 * Example (referenced):
     850{{{#!xml
     851<!-- potential @pid attribute omitted -->
     852<fcs:DataView type="application/x-cmdi+xml" ref="http://repos.example.org/resources/4711/0815.cmdi" />
     853}}}
     854
     855=== Images (IMG) ===
     856||=Description         =|| An image related to the hit ||
     857||=MIME type           =|| `image/png`, `image/jpeg`, `image/gif`, `image/svg+xml` ||
     858||=Payload Disposition =|| ''reference'' ||
     859
     860The ''Image'' Data View allows an Endpoint to provide an image that is relevant to the hit, e.g. a facsimile of the source of a transcription. Endpoints `MUST` provide the payload by the ''reference'' method and the image file `SHOULD` be downloadable without any restrictions.
     861 * Example:
     862{{{#!xml
     863<!-- potential @pid attribute omitted -->
     864<fcs:DataView type="image/png" ref="http://repos.example.org/resources/4711/0815.png" />
     865}}}
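
The payload disposition rules of the Component Metadata and Image Data Views above could be checked by a Client along these lines. This is a sketch under assumptions: a Data View is treated as ''inline'' when it has child elements and as ''reference'' when it only carries a `@ref` attribute, and the helpers `disposition` and `check` are hypothetical.

{{{#!python
import xml.etree.ElementTree as ET

# Allowed payload dispositions per MIME type, taken from the tables above.
ALLOWED = {
    "application/x-cmdi+xml": {"inline", "reference"},
    "image/png": {"reference"},
    "image/jpeg": {"reference"},
    "image/gif": {"reference"},
    "image/svg+xml": {"reference"},
}


def disposition(dataview):
    """Classify a Data View element as 'inline', 'reference' or None (no payload)."""
    if len(dataview) > 0:
        # Assumption: child elements mean the payload is embedded inline.
        return "inline"
    if dataview.get("ref"):
        return "reference"
    return None


def check(dataview):
    """Return True if the Data View uses a disposition allowed for its MIME type."""
    allowed = ALLOWED.get(dataview.get("type"), set())
    return disposition(dataview) in allowed


referenced = ET.fromstring(
    '<fcs:DataView xmlns:fcs="http://clarin.eu/fcs/resource" type="image/png" '
    'ref="http://repos.example.org/resources/4711/0815.png"/>'
)
inlined = ET.fromstring(
    '<fcs:DataView xmlns:fcs="http://clarin.eu/fcs/resource" type="image/png">'
    '<svg xmlns="http://www.w3.org/2000/svg"/></fcs:DataView>'
)
print(check(referenced), check(inlined))
}}}

An inline Data View may still carry a `@ref` attribute, which is why the sketch tests for child elements first.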
     866
     867=== Geolocation (GEO) ===
     868||=Description         =|| A geographic location related to the hit ||
     869||=MIME type           =|| `application/vnd.google-earth.kml+xml` ||
     870||=Payload Disposition =|| ''inline'' ||
     871The ''Geolocation'' Data View allows an Endpoint to geolocalize a hit. It `MUST` be encoded using the XML representation of the Keyhole Markup Language (KML). The KML fragment `MUST` comply with the specification as defined by [#REF_KML_Spec KML].
     872 * Example:
     873{{{#!xml
     874<!-- potential @pid and @ref attributes omitted -->
     875<fcs:DataView type="application/vnd.google-earth.kml+xml">
     876  <kml:kml xmlns:kml="http://www.opengis.net/kml/2.2">
     877    <kml:Placemark>
     878      <kml:name>IDS Mannheim</kml:name>
     879      <kml:description>Institut für Deutsche Sprache, R5 6-13, 68161 Mannheim, Germany</kml:description>
     880      <kml:Point>
     881        <kml:coordinates>8.4719510,49.4883700,0</kml:coordinates>
     882      </kml:Point>
     883    </kml:Placemark>
     884  </kml:kml>
     885</fcs:DataView>
     886}}}
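
A Client could read the location from such a payload as sketched below. The sketch assumes a single point per Placemark and relies on the KML convention that coordinates are given as longitude, latitude and optional altitude; the helper `placemarks` is hypothetical.

{{{#!python
import xml.etree.ElementTree as ET

KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

PAYLOAD = """
<kml:kml xmlns:kml="http://www.opengis.net/kml/2.2">
  <kml:Placemark>
    <kml:name>IDS Mannheim</kml:name>
    <kml:Point>
      <kml:coordinates>8.4719510,49.4883700,0</kml:coordinates>
    </kml:Point>
  </kml:Placemark>
</kml:kml>
"""


def placemarks(kml_root):
    """Yield (name, longitude, latitude) for every point Placemark in a KML fragment."""
    for placemark in kml_root.iterfind(".//kml:Placemark", KML_NS):
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        coords = placemark.findtext("kml:Point/kml:coordinates", namespaces=KML_NS)
        if coords is None:
            continue  # Placemark without a point geometry
        # KML stores coordinates as longitude,latitude[,altitude].
        parts = coords.strip().split(",")
        yield name, float(parts[0]), float(parts[1])


points = list(placemarks(ET.fromstring(PAYLOAD.strip())))
print(points)
}}}

Note the order of the values: longitude comes first, so the tuple for the example reads (name, 8.471951, 49.48837).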
     887----
    830888= Discussion =
    831889* [[teckart|Thomas (ASV)]]: In the 'Basic profile' section it is stated that 'Endpoint must not silently accept queries that include CLQ features besides term-only and terms combined...'. What would be the desired action then (issuing a diagnostic?)?