Changes between Version 58 and Version 59 of FCS-Specification-ScrapBook


Timestamp:
02/17/14 18:58:23 (10 years ago)
Author:
oschonef
Comment:

--

  • FCS-Specification-ScrapBook

    v58 v59  
    753753 * [[oschonef|Oliver (IDS)]]: The question is which functionality we want to differentiate. And, of course, we need to encode these features for Clients. I think the basic profile is quite clear about what Endpoints need to support. All the "interesting" bits (query expansion, data cats, etc.) are nowhere near clear enough to be specified. I want to do this in the yet-to-be-defined "extended" profile, which could also foresee such a matrix. That could be, e.g., an additional list of `<feature>` elements that will be part of the endpoint description. However, I want to keep the distinction between a basic profile, which people can implement right away, and more advanced stuff. We really need to have a normative spec.
    754754  * [[teckart|Thomas (ASV)]]: I absolutely agree. I just wanted to point out that maybe we can already think about how these features will be encoded (without deciding which features will make it into the future extended specification).
     755   * [[oschonef|Oliver (IDS)]]: We could already prepare for feature encoding by adding `<feature>` elements to the Endpoint Description. Or better, call them capabilities, so something along the lines of:
     756{{{#!xml
     757<ed:EndpointDescription xmlns:ed="http://clarin.eu/fcs/1.0/endpoint-description">
     758  <ed:Profile>basic</ed:Profile>
     759  <ed:Capabilities>
     760    <!-- capabilities should be identified by closed vocabulary; maybe encoded by URIs? -->
     761    <ed:Capability>http://clarin.eu/fcs/1.0/feature/basic-search</ed:Capability>
     762    <!-- actually the next would already be an extended capability beyond the basic profile -->
     763    <ed:Capability>http://clarin.eu/fcs/1.0/feature/query-expansion</ed:Capability>
     764  </ed:Capabilities>
     765  <!-- other EndpointDescription stuff ... -->
     766</ed:EndpointDescription>
     767}}}
     768     If we do this, it remains to be decided whether we keep the `<Profile>` element or decide that this information is redundant because a profile will require a certain set of capabilities. However, if we ditch `<Profile>`, it's not so easy for a Client to tell at a glance which profile the Endpoint supports.
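To make the trade-off concrete, here is a minimal client-side sketch of deriving the supported profiles from the `<ed:Capability>` list when `<ed:Profile>` is absent. It is only an illustration under stated assumptions: the `PROFILE_REQUIREMENTS` table, the "extended" profile's capability set, and the function names are hypothetical and not part of any agreed specification.

```python
import xml.etree.ElementTree as ET

ED_NS = "http://clarin.eu/fcs/1.0/endpoint-description"
FEATURE_BASE = "http://clarin.eu/fcs/1.0/feature/"

# Hypothetical mapping: which capabilities a profile would require.
# Nothing like this has been decided; it only illustrates the idea.
PROFILE_REQUIREMENTS = {
    "basic": {FEATURE_BASE + "basic-search"},
    "extended": {FEATURE_BASE + "basic-search",
                 FEATURE_BASE + "query-expansion"},
}


def read_capabilities(xml_text):
    """Return the set of capability URIs listed in an endpoint description."""
    root = ET.fromstring(xml_text)
    path = "{%s}Capabilities/{%s}Capability" % (ED_NS, ED_NS)
    return {(el.text or "").strip() for el in root.findall(path)}


def supported_profiles(capabilities):
    """Return all profiles whose required capabilities are all present."""
    return sorted(name for name, required in PROFILE_REQUIREMENTS.items()
                  if required <= capabilities)


EXAMPLE = """
<ed:EndpointDescription xmlns:ed="http://clarin.eu/fcs/1.0/endpoint-description">
  <ed:Capabilities>
    <ed:Capability>http://clarin.eu/fcs/1.0/feature/basic-search</ed:Capability>
    <ed:Capability>http://clarin.eu/fcs/1.0/feature/query-expansion</ed:Capability>
  </ed:Capabilities>
</ed:EndpointDescription>
"""

if __name__ == "__main__":
    print(supported_profiles(read_capabilities(EXAMPLE)))
```

The Client needs the profile-to-capability table to answer a question that an explicit `<Profile>` element would answer directly, which is the cost of dropping the element.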
    755769* [[teckart|Thomas (ASV)]]: The endpoint specification contains information about the supported profile and data views. This is no problem for an endpoint with only one resource or with homogeneous resources. When we extend the profile by adding information about annotation tiers, this may not apply to all provided resources (e.g. an endpoint that provides two corpora, of which only one is annotated with POS tags). So maybe some of this information is specific not to the endpoint but to the resource.
    756770 * [[oschonef|Oliver (IDS)]]: My aim was to treat the list of data views as a hint about which data views are available at an Endpoint. The semantics is not that ''every'' resource/collection supports all these data views. It's a little influenced by what SRU does in explain with `<zr:schemaInfo>`. We could keep it this way, remove it, or extend it so that this information can be given at resource/collection level.
    757771  * [[teckart|Thomas (ASV)]]: Right. We should at least think about it as this information could be quite relevant for the user experience in an aggregator.
     772    * [[oschonef|Oliver (IDS)]]: Yes, so do you think we should mark "supported dataviews" per collection/resource? Something along the lines of:
     773{{{#!xml
     774<ed:EndpointDescription xmlns:ed="http://clarin.eu/fcs/1.0/endpoint-description">
     775  <ed:Profile>basic</ed:Profile>
     776  <ed:SupportedDataViews>
     777    <ed:SupportedDataView id="dv1">application/x-clarin-fcs-hits+xml</ed:SupportedDataView>
     778    <ed:SupportedDataView id="dv2">application/x-cmdi+xml</ed:SupportedDataView>
     779    <ed:SupportedDataView id="dv3">image/png</ed:SupportedDataView>
     780  </ed:SupportedDataViews>
     781  <ed:Collections>
     782    <ed:Collection pid="http://hdl.handle.net/4711/0815">
     783      <!-- NB: regular stuff skipped -->
     784      <ed:SupportedDataViews ref="dv1" />
     785    </ed:Collection>
     786    <ed:Collection pid="http://hdl.handle.net/4711/0816">
     787      <!-- NB: regular stuff skipped -->
     788      <ed:SupportedDataViews ref="dv1 dv2" />
     789    </ed:Collection>
     790    <ed:Collection pid="http://hdl.handle.net/1">
     791      <!-- NB: regular stuff skipped -->
     792      <ed:SupportedDataViews ref="dv1 dv2 dv3" />
     793      <ed:Collections>
     794        <ed:Collection pid="http://hdl.handle.net/1/1">
     795          <!-- NB: regular stuff skipped -->
     796          <ed:SupportedDataViews ref="dv1 dv2" />
     797        </ed:Collection>
     798        <ed:Collection pid="http://hdl.handle.net/1/2">
     799          <!-- NB: regular stuff skipped -->
     800          <ed:SupportedDataViews ref="dv1 dv3" />
     801        </ed:Collection>
     802      </ed:Collections>
     803    </ed:Collection>
     804  </ed:Collections>
     805</ed:EndpointDescription>
     806}}}
     807      What about Collections with Sub-Collections? Would a parent collection not indicate its supported data views, or would they be the union of the supported data views of its child collections? Is there a use case where collection `http://hdl.handle.net/1` contains more data (= resources) than the union of `http://hdl.handle.net/1/1` and `http://hdl.handle.net/1/2`, i.e. there are resources "in between" the child collections and their parent? If so, could they have "conflicting" data views?
     808    * [[oschonef|Oliver (IDS)]]: Shall we foresee some mechanism for Clients to tell the Endpoint which data views they are interested in? E.g. the Endpoint may support Geolocation, but the Client does not care about or support it and could ask the Endpoint at query time ''not to'' serialize the Geolocation data view (= fewer wasted bytes sent over the network).
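As a rough illustration of the per-collection proposal, the following sketch resolves each collection's `ref` attribute against the endpoint-level `<ed:SupportedDataView>` declarations and walks nested `<ed:Collections>`. The element layout is taken from the example in this thread; the function names are hypothetical, and the code assumes every `ref` token points at a declared `id`.

```python
import xml.etree.ElementTree as ET

ED = "{http://clarin.eu/fcs/1.0/endpoint-description}"


def declared_views(root):
    """Map data view ids (dv1, ...) to MIME types declared at endpoint level."""
    top = root.find(ED + "SupportedDataViews")
    return {el.get("id"): (el.text or "").strip()
            for el in top.findall(ED + "SupportedDataView")}


def views_by_collection(root):
    """Map each collection pid to the set of MIME types it references."""
    views = declared_views(root)
    result = {}

    def visit(collections):
        if collections is None:
            return
        for coll in collections.findall(ED + "Collection"):
            ref_el = coll.find(ED + "SupportedDataViews")
            refs = ref_el.get("ref", "").split() if ref_el is not None else []
            # Assumes every ref token is a declared id; a KeyError signals a bad ref.
            result[coll.get("pid")] = {views[r] for r in refs}
            visit(coll.find(ED + "Collections"))  # recurse into sub-collections

    visit(root.find(ED + "Collections"))
    return result


EXAMPLE = """
<ed:EndpointDescription xmlns:ed="http://clarin.eu/fcs/1.0/endpoint-description">
  <ed:SupportedDataViews>
    <ed:SupportedDataView id="dv1">application/x-clarin-fcs-hits+xml</ed:SupportedDataView>
    <ed:SupportedDataView id="dv2">application/x-cmdi+xml</ed:SupportedDataView>
    <ed:SupportedDataView id="dv3">image/png</ed:SupportedDataView>
  </ed:SupportedDataViews>
  <ed:Collections>
    <ed:Collection pid="http://hdl.handle.net/1">
      <ed:SupportedDataViews ref="dv1 dv2 dv3" />
      <ed:Collections>
        <ed:Collection pid="http://hdl.handle.net/1/1">
          <ed:SupportedDataViews ref="dv1 dv2" />
        </ed:Collection>
        <ed:Collection pid="http://hdl.handle.net/1/2">
          <ed:SupportedDataViews ref="dv1 dv3" />
        </ed:Collection>
      </ed:Collections>
    </ed:Collection>
  </ed:Collections>
</ed:EndpointDescription>
"""

if __name__ == "__main__":
    for pid, mime_types in views_by_collection(ET.fromstring(EXAMPLE)).items():
        print(pid, sorted(mime_types))
```

In this example the parent's set happens to equal the union of its children's sets. A Client could compare the two to detect resources "in between" that declare additional data views, but whether such a mismatch should be allowed is exactly the open question above.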
    758809* [[teckart|Thomas (ASV)]]: The current (old) solution for exposing the granularity and structure of supported collections is a multi-stage mechanism: the client queries for the first-level structure (= collections) and can explicitly ask the endpoint for additional information about the internal structure of these collections (and so on...). This is very helpful for endpoints that support queries on detailed subcollections. The solution proposed above would force the endpoint to expose the complete structure of all provided resources in the explain response, which would lead (for example, for the endpoint in Leipzig) to very large responses.
    759  * [[oschonef|Oliver (IDS)]]: True, but the old approach is overly complex. If the response is large (> 100MB), so be it. An efficient Endpoint implementation should do an streaming approach when serializing the response and the Client should not assume, that the response to this information will fir in 1MB of memory. If it's is hard for the endpoint to compile this list (e.g. it requires complex database queries), it's IMHO again, a matter of the endpoint to cache this information (in memory or disk) and just stream it into the explain response.
     810 * [[oschonef|Oliver (IDS)]]: True, but the old approach is overly complex. If the response is large (> 100 MB), so be it. An efficient Endpoint implementation should use a streaming approach when serializing the response, and the Client should not assume that the response will fit in, let's say, 1 MB of memory. If it is hard for the endpoint to compile this list (e.g. it requires complex database queries), it is, IMHO, again up to the endpoint to cache this information (in memory or on disk) and just stream it into the explain response.
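On the Client side, the "do not assume it fits in memory" point could look roughly like the sketch below, which reads collection pids incrementally with `xml.etree.ElementTree.iterparse` rather than building the whole tree. The function name and the choice to extract only `pid` attributes are assumptions made for illustration; a real Client would extract whatever per-collection data it needs.

```python
import io
import xml.etree.ElementTree as ET

ED = "{http://clarin.eu/fcs/1.0/endpoint-description}"


def iter_collection_pids(stream):
    """Yield collection pids from a (possibly very large) endpoint description.

    Attributes are already available on the 'start' event, so each pid can be
    reported before the collection's content has been read.
    """
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if event == "start" and elem.tag == ED + "Collection":
            yield elem.get("pid")
        elif event == "end":
            # Drop text, attributes and children of finished elements.
            # Empty element shells stay attached to their parent until the
            # parent itself ends, so memory use is low but not strictly constant.
            elem.clear()


EXAMPLE = b"""
<ed:EndpointDescription xmlns:ed="http://clarin.eu/fcs/1.0/endpoint-description">
  <ed:Collections>
    <ed:Collection pid="http://hdl.handle.net/1">
      <ed:Collections>
        <ed:Collection pid="http://hdl.handle.net/1/1" />
        <ed:Collection pid="http://hdl.handle.net/1/2" />
      </ed:Collections>
    </ed:Collection>
  </ed:Collections>
</ed:EndpointDescription>
"""

if __name__ == "__main__":
    for pid in iter_collection_pids(io.BytesIO(EXAMPLE)):
        print(pid)
```

In practice the stream would be the HTTP response body rather than an in-memory buffer; `io.BytesIO` is used here only to keep the example self-contained.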
    760811* [[teckart|Thomas (ASV)]]: For endpoints that support many collections it can be a problem to force them to allow unrestricted queries regarding resource selection. In these cases every query would lead to a high load on the local search engine. Instead I would prefer to at least allow using a default collection if no restrictions are provided by the client.
    761812  * [[oschonef|Oliver (IDS)]]: This has already been the case with the old specification. That's more in line with the ''searchRetrieve'' semantics of SRU. We should discuss this within the Taskforce.