Changes between Version 64 and Version 65 of FCS-Specification-ScrapBook


Timestamp:
02/18/14 14:58:49 (10 years ago)
Author:
teckart
Comment:

--

  • FCS-Specification-ScrapBook

    v64 v65  
    769769    * [[teckart|Thomas (ASV)]]: I am not sure if this is a problem, but we could interpret "Capabilities" just as a verbose listing of all capabilities of the endpoint. The `<ed:Profile>` is just the shorter version, reflecting our definition of what is "basic" and what is "extended". That way our standard aggregator can just work with the profile name, whereas other clients (that don't care how we interpret these terms) could check whether an endpoint supports all the functionality they need.
    770770      * [[oschonef|Oliver (IDS)]]: Ok, then let's keep `<ed:Profile>` and add the verbose capabilities list to the Endpoint Description. A TODO for the ''basic'' profile is then to decide what the ''basic'' capabilities are (I think this should be just one, 'basic-search').
     771       * [[teckart|Thomas (ASV)]]: I agree. Our basic profile hardly contains anything that can be differentiated more.
    771772* [[teckart|Thomas (ASV)]]: The endpoint specification contains information about the supported profile and data views. This is no problem for an endpoint with only one resource or with homogeneous resources. When we extend the profile by adding information about annotation tiers, this may not be applicable to all provided resources (for example, an endpoint that provides two corpora, only one of which is annotated with POS tags). So maybe some of this information is specific not to the endpoint but to the resource.
    772773 * [[oschonef|Oliver (IDS)]]: My aim was to treat the list of data views as a hint about which data views are available at an Endpoint. The semantics is not that ''every'' resource/collection supports all these data views. It's a little influenced by what SRU does in explain with `<zr:schemaInfo>`. We could keep it this way, remove it, or extend it so that this information can be given on the resource/collection level.
     
    813814     * [[teckart|Thomas (ASV)]]: I think that this is a good idea. Maybe in some cases it could also be useful to provide data views only if they are explicitly requested. This would allow adding data views that are too "expensive" (computationally or in terms of bandwidth) to generate for every request.
    814815       * [[oschonef|Oliver (IDS)]]: Then I propose to classify the data views into a "send-by-default" and a "need-to-request" class (and document a default class in the spec, analogous to the payload disposition). The first would indicate that the Endpoint will include this data view unconditionally, e.g. the Generic Hits view. For the second class, we need to invent another custom query parameter (`x-clarin-fsc-request-datacviews`?) that Clients need to send to explicitly request that the Endpoint include this additional (list of) data view(s). Furthermore, the Endpoint Description could also indicate to which class a data view belongs. This way, Endpoints could also indicate that they, e.g., always include some data view that has been marked as "need-to-request" by the spec.
     816        * [[teckart|Thomas (ASV)]]: Good idea.
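
The two-class proposal above could be sketched roughly as follows. This is only an illustration under stated assumptions: the data view identifiers, the class labels as constants, and the helper names are hypothetical, and the request parameter name is a placeholder because the discussion only floats a tentative one.

```python
# Hypothetical delivery classes, named after the two classes proposed in the discussion.
SEND_BY_DEFAULT = "send-by-default"
NEED_TO_REQUEST = "need-to-request"

# Placeholder parameter name (assumption); the discussion leaves the final name open.
REQUEST_PARAM = "x-clarin-fcs-request-dataviews"

# Hypothetical Endpoint Description content: data view identifier -> delivery class.
ENDPOINT_DATA_VIEWS = {
    "hits": SEND_BY_DEFAULT,
    "pos-annotations": NEED_TO_REQUEST,
}


def parse_requested(param_value):
    """Split the comma-separated parameter value into data view identifiers."""
    if not param_value:
        return []
    return [v.strip() for v in param_value.split(",") if v.strip()]


def views_to_include(endpoint_views, requested):
    """Return the data views an Endpoint would include in its response.

    Views classed as send-by-default are always included; need-to-request
    views are included only when the Client asked for them. Unknown
    identifiers are ignored in this sketch.
    """
    included = [v for v, cls in endpoint_views.items() if cls == SEND_BY_DEFAULT]
    for view in requested:
        if endpoint_views.get(view) == NEED_TO_REQUEST and view not in included:
            included.append(view)
    return included
```

An Endpoint that wants to always send an "expensive" view would simply list it as send-by-default in its own description, overriding the class suggested by the spec.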
    815817* [[teckart|Thomas (ASV)]]: The current (old) solution for exposing the granularity and structure of supported collections is a multi-stage mechanism: the client queries for the first-level structure (=collections) and can explicitly ask the endpoint for additional information about the internal structure of these collections (and so on...). This is very helpful for endpoints which support queries on detailed subcollections. The solution proposed above would force the endpoint to expose the complete structure of all provided resources in the explain response, which would lead (for example, for the endpoint in Leipzig) to very large responses.
    816818 * [[oschonef|Oliver (IDS)]]: True, but the old approach is overly complex. If the response is large (> 100 MB), so be it. An efficient Endpoint implementation should use a streaming approach when serializing the response, and the Client should not assume that the response will fit into, let's say, 1 MB of memory. If it is hard for the endpoint to compile this list (e.g. it requires complex database queries), it's IMHO again up to the endpoint to cache this information (in memory or on disk) and just stream it into the explain response.
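
The streaming idea mentioned above might look roughly like this on the Endpoint side. It is a minimal sketch, not the format from the spec: the element names `Collections` and `Collection` and the `pid` attribute are hypothetical, and the input is assumed to be any iterable of `(pid, title)` pairs, for example a database cursor or a cached list.

```python
from xml.sax.saxutils import escape, quoteattr


def stream_collections(collections):
    """Yield the collection listing chunk by chunk.

    Nothing is accumulated in memory: each chunk can be written to the
    response as soon as it is produced, so the size of the listing does
    not bound the memory the Endpoint needs.
    """
    yield "<Collections>"
    for pid, title in collections:
        # quoteattr adds the surrounding quotes and escapes the attribute value.
        yield f"<Collection pid={quoteattr(pid)}>{escape(title)}</Collection>"
    yield "</Collections>"
```

A web framework that accepts an iterable as the response body could consume this generator directly; a Client would need an incremental (SAX or pull) parser to get the same benefit on its side.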
     
    818820  * [[oschonef|Oliver (IDS)]]: This has already been the case with the old specification. That's more in line with the ''searchRetrieve'' semantics of SRU. We should discuss this within the Taskforce.\\
    819821    How about the following solution: By default, the Endpoint should query all resources if no `x-clarin-fcs-context` parameter is supplied by the Client (as already in the spec). If that is for some reason ''not feasible/possible'' for an Endpoint, it MAY restrict the search to whatever scope it thinks it can handle ''and'' MUST issue a non-fatal diagnostic (uri tbd), like "collection set too large. query context automatically adjusted" (and include the PIDs of the collections, comma-separated, in the details field). Furthermore, we should introduce a fatal diagnostic "collection set too large. cannot perform query" (uri tbd) to be issued by the Endpoint if it feels the Client requested too many collections with `x-clarin-fcs-context`. \\ I would like to avoid a situation where an Endpoint needs special treatment (e.g. by unconditionally requiring `x-clarin-fcs-context`) to return any results. With the solution above, we could get at least some results, and the Endpoint can indicate that users need to "do more" to search more broadly.
     822   * [[teckart|Thomas (ASV)]]: That is a good idea. With the old specification we decided to just restrict the search in these cases to a "default resource" without additional information for the user (=very ugly). Specifying a diagnostic would solve our load problem and would guarantee that the client knows about the situation.
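
The proposed context handling can be sketched as a small decision function. Everything concrete here is an assumption made for illustration: the diagnostic URIs are placeholders (the discussion marks them as tbd), the limit of two collections is arbitrary, and "the scope the Endpoint can handle" is simplified to the first `limit` collections.

```python
from dataclasses import dataclass

MAX_COLLECTIONS = 2  # hypothetical limit of what this Endpoint can search at once

# Placeholder URIs (assumption); the real ones are still to be defined.
DIAG_CONTEXT_ADJUSTED = "urn:example:diagnostic:context-adjusted"
DIAG_CONTEXT_TOO_LARGE = "urn:example:diagnostic:context-too-large"


@dataclass
class Diagnostic:
    uri: str
    message: str
    details: str = ""
    fatal: bool = False


def resolve_context(all_pids, requested_pids=None, limit=MAX_COLLECTIONS):
    """Return (pids_to_search, diagnostics) for one searchRetrieve request."""
    if requested_pids:
        # The Client supplied x-clarin-fcs-context explicitly.
        if len(requested_pids) > limit:
            diag = Diagnostic(
                DIAG_CONTEXT_TOO_LARGE,
                "collection set too large. cannot perform query",
                fatal=True,
            )
            return [], [diag]
        return list(requested_pids), []
    # No context given: search everything if feasible.
    if len(all_pids) <= limit:
        return list(all_pids), []
    # Otherwise restrict the scope and tell the Client which PIDs were used.
    scope = list(all_pids[:limit])
    diag = Diagnostic(
        DIAG_CONTEXT_ADJUSTED,
        "collection set too large. query context automatically adjusted",
        details=",".join(scope),
    )
    return scope, [diag]
```

With this shape, a Client that sends no context still gets results plus a non-fatal hint, which is the behaviour the discussion prefers over silently falling back to a default resource.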