Changes between Version 55 and Version 56 of FCS-Specification-ScrapBook


Timestamp:
02/17/14 13:11:25 (10 years ago)
Author:
oschonef
Comment:

--

  • FCS-Specification-ScrapBook

    v55 v56  
    746746Thomas Eckart: some questions and remarks
    747747* In the 'Basic profile' section it is stated that 'Endpoint must not silently accept queries that include CQL features besides term-only and terms combined...'. What would be the desired action then (issuing a diagnostic?)?
     748 * [[oschonef|Oliver (IDS)]]: Right, the desired action is to issue a diagnostic. We need to add that information to the text. In general, if something goes wrong, Endpoints shall issue an appropriate diagnostic.
    748749* Do we really want to force every endpoint to support the Generic Hits dataview? For some resources there may be no 'literal text' (like audio, video, highly specific annotation formats) which could lead to some ugly string serialization instead of using custom extensions and which would leave the client with the interpretation of the output.
     750 * [[oschonef|Oliver (IDS)]]: I guess so, that's the lowest common denominator for an aggregator. We had the KWIC data view in the old specification, that served the same purpose. However, the latter does not support multiple hits. IMHO, we need to keep this common denominator, even if this is just an approximation for other resource types. Endpoints are always free to also add a more appropriate data view for the resource to their response.
    749751* Do we want to specify additional payloads for resource types like audio and video (referenced, attributes: mimetype + optional offset information)?
     752 * [[oschonef|Oliver (IDS)]]: Yes, we certainly can do that; however, I would need more input for that.
    750753* Is it a good idea to specify the capabilities of an endpoint by using profile names (where only one is allowed in the explain response)? In the past it was planned to use some 'feature matrix' to reflect the differences in supported functionality for the endpoints. This would allow explicitly stating which subset of possible functionalities (besides the basic stuff) is supported by an endpoint.
     754 * [[oschonef|Oliver (IDS)]]: The question is which functionality we want to differentiate. And, of course, we need to encode these features for Clients. I think the basic profile is quite clear about what Endpoints need to support. None of the "interesting" bits (query expansion, data cats, etc.) are anywhere near clear enough to be specified. I want to do this in the yet-to-be-defined "extended" profile, which could also foresee such a matrix. That could be, e.g., an additional list of `<feature>` elements that will be part of the endpoint description. However, I want to keep the distinction between a basic profile, which people can implement right away, and more advanced stuff. We really need to have a normative spec.
    751755* The endpoint specification contains information about supported profile and dataviews. This is no problem for an endpoint with only one resource or homogeneous resources. When we extend the profile by adding information about annotation tiers this may not be applicable for all provided resources (like an endpoint that provides two corpora, where only one is annotated with POS tags). So maybe some of this information is not specific for the endpoint but for the resource.
     756 * [[oschonef|Oliver (IDS)]]: My aim was to treat the list of data views as a hint about which data views are available at an Endpoint. The semantics are not that ''every'' resource/collection supports all these data views. It's a little influenced by what SRU does in explain with `<zr:schemaInfo>`. We could keep it this way, remove it, or extend it so that this information can be given at the resource/collection level.
    752757* The current (old) solution for exposing granularity and structure of supported collections is a multiple-staged mechanism: the client queries for the first-level structure (=collections) and can explicitly ask the endpoint to give additional information about the internal structure of these collections (and so on...). This is very helpful for endpoints which support queries on detailed subcollections. The proposed solution above would force the endpoint to expose the complete structure of all provided resources in the explain response, which would lead (for example for the endpoint in Leipzig) to very large responses.
     758 * [[oschonef|Oliver (IDS)]]: True, but the old approach is overly complex. If the response is large (> 100MB), so be it. An efficient Endpoint implementation should use a streaming approach when serializing the response, and the Client should not assume that the response will fit in 1MB of memory. If it is hard for the endpoint to compile this list (e.g. it requires complex database queries), it is, IMHO, again up to the endpoint to cache this information (in memory or on disk) and just stream it into the explain response.
    753759* For endpoints that support many collections it can be a problem to force them to allow unrestricted queries regarding resource selection. In these cases every query would lead to a high load on the local search engine. Instead I would prefer to at least allow using a default collection if no restrictions are provided by the client.
     760  * [[oschonef|Oliver (IDS)]]: This has already been the case with the old specification. That's more in line with the ''searchRetrieve'' semantics of SRU. We should discuss this within the Taskforce.
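
To illustrate the streaming approach suggested above for large resource listings, here is a minimal Python sketch. It is only an assumption of how an Endpoint might do it: the element names `Resources`, `Resource` and `Title`, the `pid` attribute, and the functions `stream_resource_list` and `cached_resources` are all hypothetical and not taken from the FCS specification. The idea is that each resource is serialized and handed on immediately, so memory use does not grow with the size of the listing.

```python
from typing import Iterable, Iterator, Tuple
from xml.sax.saxutils import escape, quoteattr


def stream_resource_list(resources: Iterable[Tuple[str, str]]) -> Iterator[str]:
    """Yield an XML resource listing chunk by chunk.

    All element and attribute names here are hypothetical placeholders.
    """
    yield "<Resources>"
    for pid, title in resources:
        # One resource is serialized per step; nothing is accumulated,
        # so the full listing never has to fit in memory.
        yield "<Resource pid=%s><Title>%s</Title></Resource>" % (
            quoteattr(pid),
            escape(title),
        )
    yield "</Resources>"


def cached_resources() -> Iterator[Tuple[str, str]]:
    """Stand-in for a cache (in memory or on disk) of the collection structure."""
    yield ("hdl:example/1", "Corpus A & B")
    yield ("hdl:example/2", "Corpus C")


if __name__ == "__main__":
    import sys

    # Write each chunk as soon as it is produced, e.g. into the HTTP response.
    for chunk in stream_resource_list(cached_resources()):
        sys.stdout.write(chunk)
```

A Client would mirror this by parsing the response incrementally rather than loading it as a whole.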