Changes between Version 62 and Version 63 of FCS-Specification-ScrapBook

Timestamp:: 02/17/14 21:36:11 (10 years ago)
Author:: oschonef
Comment:: --

FCS-Specification-ScrapBook

-                      v62
+                      v63
      If we do this, it has to be decided whether we want to keep the `<Profile>` element or whether this information is redundant and a profile will simply require a certain set of capabilities. However, if we ditch `<Profile>`, it's not so easy for a Client to decide spot-on what profile is supported by the Endpoint.
     * [[teckart|Thomas (ASV)]]: I am not sure if this is a problem, but we could interpret "Capabilities" just as a verbose listing of all capabilities of the endpoint. The <ed:Profile> is just the shorter version regarding our definition of what is "basic" and what is "extended". That way our standard aggregator can just work with the profile name, whereas other clients (that don't care how we interpret these terms) could look if an endpoint supports all functionality they need.
+      * [[oschonef|Oliver (IDS)]]: Ok, then let's keep `<ed:Profile>` and add the verbose capabilities list to the Endpoint Description. A TODO for the ''basic'' profile is then to decide what the ''basic'' capabilities are (I think this should just be one 'basic-search').
 * [[teckart|Thomas (ASV)]]: The endpoint specification contains information about the supported profile and dataviews. This is no problem for an endpoint with only one resource or homogeneous resources. When we extend the profile by adding information about annotation tiers, this may not be applicable for all provided resources (like an endpoint that provides two corpora, where only one is annotated with POS tags). So maybe some of this information is not specific to the endpoint but to the resource.
  * [[oschonef|Oliver (IDS)]]: My aim was to consider the list of data views as a hint as to which data views are available at an Endpoint. The semantics is not that ''every'' resource/collection supports all these data views. It's a little influenced by what SRU does in explain with `<zr:schemaInfo>`. We could keep it this way, remove it, or extend it so that this information can be given on resource/collection level.
 …
       What about Collections with Sub-Collections? Would the parent collection not indicate the supported data views, or would it be the union of the supported dataviews of its child-collections? Is there a use-case where collection `http://hdl.handle.net/1` contains more data (= resources) than the union of `http://hdl.handle.net/1/1` and `http://hdl.handle.net/1/2`, i.e. there are resources "in between" the child-collections and their parent? If so, could they have "conflicting" dataviews?
      * [[teckart|Thomas (ASV)]]: I am not sure about these "invisible" resources. But in general we could think of this as normal inheritance. Every root collection element could specify the minimal set of supported dataviews for all daughter nodes (or, if missing, it is assumed that all entries in SupportedDataViews are supported). Every node in the sub-collection tree can override this configuration. If Hits is a mandatory dataview, then even for otherwise disjoint sets of dataviews in the sub-collections the root collection can at least provide Hits for everything.
+       * [[oschonef|Oliver (IDS)]]: Yes, I like that. It did not occur to me to think of this as an ''inheritance'' model. To summarize, child collections can add additional supported data views. The snippet above would then need to be corrected, i.e. the collection `http://hdl.handle.net/1` can only "announce" `dv1` (Generic Hits), as this is the only data view supported by both child nodes.
     * [[oschonef|Oliver (IDS)]]: Shall we foresee some mechanism for Clients to tell the Endpoint which data views they are interested in? E.g. the Endpoint may support Geolocation, but the Client does not care about or support it and could ask the Endpoint at query time ''not to'' serialize the Geolocation data view (= fewer wasted bytes sent over the network).
      * [[teckart|Thomas (ASV)]]: I think that this is a good idea. Maybe in some cases it could also be useful to provide dataviews only if they are explicitly requested. This could allow adding data views that are too "expensive" (computationally or regarding bandwidth) to generate for every request.
+       * [[oschonef|Oliver (IDS)]]: Then I propose to classify the data views into a "send-by-default" and a "need-to-request" class (and document a default class in the spec, analogous to the payload disposition). The first would indicate that the Endpoint will include this data view unconditionally, e.g. the Generic Hits view. For the second class, we need to invent another custom query parameter (`x-clarin-fsc-request-datacviews`?) that Clients need to send to explicitly request the Endpoint to include this additional (list of) data view(s). Furthermore, the Endpoint Description could also indicate to which class a data view belongs. This way, Endpoints could also indicate that they e.g. always include some data view that has been marked as "need-to-request" by the spec.
 * [[teckart|Thomas (ASV)]]: The current (old) solution for exposing granularity and structure of supported collections is a multi-stage mechanism: the client queries for the first-level structure (= collections) and can explicitly ask the endpoint to give additional information about the internal structure of these collections (and so on...). This is very helpful for endpoints which support queries on detailed subcollections. The proposed solution above would force the endpoint to expose the complete structure of all provided resources in the explain response, which would lead (for example for the endpoint in Leipzig) to very large responses.
  * [[oschonef|Oliver (IDS)]]: True, but the old approach is overly complex. If the response is large (> 100MB), so be it. An efficient Endpoint implementation should use a streaming approach when serializing the response, and the Client should not assume that the response will fit in, let's say, 1MB of memory. If it is hard for the endpoint to compile this list (e.g. it requires complex database queries), it is IMHO again up to the endpoint to cache this information (in memory or on disk) and just stream it into the explain response.
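
The data view discussion further up (inheritance along the collection tree, plus the "send-by-default" and "need-to-request" classes) could be sketched roughly as follows. This is only an illustration of the idea as summarized in the thread, not an agreed design. The `Collection` class, the function names, the comma-separated request format, the default class for unclassified views, and the identifiers `dv2` and `dv3` are all assumptions made for the example; only `dv1` (Generic Hits) and the two class names come from the discussion.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Optional

# Class names as proposed in the thread.
SEND_BY_DEFAULT = "send-by-default"
NEED_TO_REQUEST = "need-to-request"


@dataclass
class Collection:
    """Hypothetical model of one node in the collection tree."""
    pid: str
    announced: FrozenSet[str] = frozenset()  # views declared on this node
    parent: Optional["Collection"] = None


def effective_data_views(node: Collection,
                         endpoint_views: Iterable[str]) -> FrozenSet[str]:
    """Additive inheritance: a node supports what it and its ancestors announce.

    If nothing is announced anywhere on the path to the root, fall back to
    the endpoint-wide list (assumption based on Thomas's "if missing" remark).
    """
    views = set()
    current = node
    while current is not None:
        views |= current.announced
        current = current.parent
    return frozenset(views) if views else frozenset(endpoint_views)


def parse_requested(raw: Optional[str]) -> FrozenSet[str]:
    """Parse the value of the custom query parameter.

    A comma-separated list is an assumption; the thread does not fix a format.
    """
    if not raw:
        return frozenset()
    return frozenset(p.strip() for p in raw.split(",") if p.strip())


def views_to_serialize(supported: Iterable[str],
                       classes: Dict[str, str],
                       requested: Iterable[str],
                       default_class: str = NEED_TO_REQUEST) -> FrozenSet[str]:
    """Pick the views to include in a response for one collection."""
    requested = frozenset(requested)
    selected = set()
    for view in supported:
        view_class = classes.get(view, default_class)
        # Default views always go out; the rest only on explicit request.
        if view_class == SEND_BY_DEFAULT or view in requested:
            selected.add(view)
    return frozenset(selected)


# Example tree: the root announces only dv1, each child adds one more view.
root = Collection("http://hdl.handle.net/1", frozenset({"dv1"}))
child_a = Collection("http://hdl.handle.net/1/1", frozenset({"dv2"}), parent=root)
child_b = Collection("http://hdl.handle.net/1/2", frozenset({"dv3"}), parent=root)
classes = {"dv1": SEND_BY_DEFAULT}

supported_a = effective_data_views(child_a, endpoint_views=["dv1"])
default_response = views_to_serialize(supported_a, classes, parse_requested(None))
explicit_response = views_to_serialize(supported_a, classes,
                                       parse_requested("dv2, dv9"))
```

In this sketch a request for a view the collection does not support (`dv9`) is silently ignored; whether the Endpoint should instead answer with a diagnostic is left open, as it is in the discussion.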