
Changes between Version 12 and Version 13 of FCS-specification

Timestamp:: 04/17/12 13:40:29
Author:: dietuyt
Comment:: --

FCS-specification

-                      v12
+                      v13
 The aggregator consists of two parts:
+ * a backend, responsible for all communication with the endpoints
+ * a frontend, the web GUI that provides an end-user interface to the functionality of the backend
+=== Input for the aggregator ===
+To restrict a content search to certain (sub)collections, the aggregator accepts two input formats:
+ * a list of [MdSelfLink, endpoint URL] pairs in JSON
+   * if the client only knows the MdSelfLink, it can retrieve the endpoint URL from the ResourceProxy with MIME type "application/sru+xml" in the CMDI record that the MdSelfLink resolves to
+ * a CMDI file based on the [http://catalog.clarin.eu/ds/ComponentRegistry?item=clarin.eu:cr1:p_1271859438175 VirtualCollection profile] ('''this profile will be changed in the near future''')
+ * a '''backend''', responsible for all communication with the endpoints
+ * a '''frontend''', the web GUI that provides an end-user interface to the functionality of the backend
+=== Input for the aggregator backend ===
+The aggregator backend is also an SRU/CQL server; however, it does not act as an endpoint itself. Instead, it distributes incoming queries to the endpoints it knows and then aggregates the results.
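The distribute-and-aggregate behaviour can be pictured as a parallel fan-out. This is a minimal sketch, not the aggregator's actual implementation: it assumes plain SRU 1.2 GET requests to each endpoint, a hypothetical thread-pool size of 8, and it only collects the raw responses (how the real backend merges SRU records is not described here). `sru_search_url`, `http_fetch` and `aggregate_search` are illustrative names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List
from urllib.parse import urlencode
from urllib.request import urlopen


def sru_search_url(endpoint: str, query: str) -> str:
    """Build an SRU 1.2 searchRetrieve GET URL for one endpoint."""
    params = {"operation": "searchRetrieve", "version": "1.2", "query": query}
    return endpoint + "?" + urlencode(params)


def http_fetch(url: str) -> str:
    """Fetch a URL and return the response body as text."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def aggregate_search(query: str, endpoints: List[str],
                     fetch: Callable[[str], str] = http_fetch) -> Dict[str, str]:
    """Send the same query to every endpoint in parallel and collect the raw
    responses keyed by endpoint URL. A failing endpoint is recorded as an
    error string instead of aborting the whole search."""
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = {ep: pool.submit(fetch, sru_search_url(ep, query))
                   for ep in endpoints}
        for ep, future in futures.items():
            try:
                results[ep] = future.result()
            except Exception as exc:  # network errors, HTTP errors, timeouts
                results[ep] = f"error: {exc}"
    return results
```

Passing `fetch` as a parameter keeps the network out of the fan-out logic, so the same function can be exercised with a stub instead of live endpoints.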
+* How does the aggregator know the endpoints?
+  * It knows the endpoints by querying the [wiki:CenterRegistry CLARIN center registry]
+  * It also accepts links to CLARIN-compatible endpoints passed explicitly as a parameter (see below)
+* How can the aggregator backend restrict a content search to certain (sub)collections?
+  * With a JSON object mapping each MdSelfLink to its endpoint URL, sent as the x-aggregation-context parameter of the !SearchRetrieve operation
+    * if the client only knows the MdSelfLink, it can retrieve the endpoint URL from the ResourceProxy with MIME type "application/sru+xml" in the CMDI record that the MdSelfLink resolves to
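Looking up the endpoint URL in a resolved CMDI record can be sketched as follows. This assumes the CMDI 1.1 layout, where each `ResourceProxy` holds a `ResourceType` element with a `mimetype` attribute and a `ResourceRef` with the URL; element names are matched without their namespace so the sketch does not depend on the exact namespace URI. Resolving the MdSelfLink to the CMDI file (for example through the Handle System's HTTP proxy) is left out, and `sru_endpoints` is an illustrative name.

```python
import xml.etree.ElementTree as ET
from typing import List

SRU_MIMETYPE = "application/sru+xml"


def local_name(tag: str) -> str:
    """Strip an ElementTree '{namespace}' prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]


def sru_endpoints(cmdi_xml: str) -> List[str]:
    """Return the ResourceRef URLs of all ResourceProxy elements whose
    ResourceType has mimetype="application/sru+xml"."""
    root = ET.fromstring(cmdi_xml)
    urls: List[str] = []
    for proxy in root.iter():
        if local_name(proxy.tag) != "ResourceProxy":
            continue
        rtype = ref = None
        for child in proxy:
            if local_name(child.tag) == "ResourceType":
                rtype = child
            elif local_name(child.tag) == "ResourceRef":
                ref = child
        if rtype is not None and ref is not None and rtype.get("mimetype") == SRU_MIMETYPE:
            urls.append((ref.text or "").strip())
    return urls
```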
 An example of the JSON pairs:
 {{{#!js
+{
     "hdl:1839/00-0000-0000-0003-467E-9": "http://cqlservlet.mpi.nl",
 …
+}
 }}}
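Because the context is a JSON object keyed by MdSelfLink, each collection maps to exactly one endpoint, while one endpoint may serve many collections. A backend could therefore invert the mapping so that every endpoint receives one request listing all of its collections. The sketch below is hypothetical (the specification does not say the aggregator groups requests this way), and `group_by_endpoint` is an illustrative name.

```python
import json
from collections import defaultdict
from typing import Dict, List


def group_by_endpoint(context_json: str) -> Dict[str, List[str]]:
    """Turn {MdSelfLink: endpoint URL} pairs into
    {endpoint URL: [MdSelfLink, ...]}, preserving input order."""
    pairs = json.loads(context_json)
    if not isinstance(pairs, dict):
        raise ValueError("expected a JSON object of MdSelfLink -> endpoint URL pairs")
    grouped: Dict[str, List[str]] = defaultdict(list)
    for md_self_link, endpoint in pairs.items():
        grouped[endpoint].append(md_self_link)
    return dict(grouped)
```

Note that a JSON object cannot meaningfully repeat a key: Python's `json.loads` keeps only the last value for a duplicated MdSelfLink.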
+An example of such JSON pairs used to restrict a search at the aggregator (via HTTP POST, searching for the string "bellen"):
+{{{
+POST http://aggregator.clarin.eu
+operation: searchRetrieve
+version: 1.2
+query: bellen
+x-aggregation-context: {"hdl:1839/00-0000-0000-0003-467E-9":"http://cqlservlet.mpi.nl","hdl:1839/00-0000-0000-0003-4682-F": "http://cqlservlet.mpi.nl","hdl:1839/00-0000-0000-0003-4692-D":"http://cqlservlet.mpi.nl"}
+}}}
+'''Note 1:''' every endpoint URL must be registered in the CenterRegistry, to prevent the aggregator from being abused for DDoS attacks or privilege escalation.
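Note 1 implies that the backend should reject a context containing any unregistered endpoint before sending a single query. A minimal sketch, assuming the registered endpoint URLs have already been fetched from the CenterRegistry into a set (querying the registry is not shown, and `validate_context` and `registered` are hypothetical names):

```python
import json
from typing import Dict, Set


def validate_context(context_json: str, registered: Set[str]) -> Dict[str, str]:
    """Parse an x-aggregation-context value and return its pairs, raising
    ValueError if it is malformed or names an unregistered endpoint."""
    pairs = json.loads(context_json)
    if not isinstance(pairs, dict):
        raise ValueError("x-aggregation-context must be a JSON object")
    for link, endpoint in pairs.items():
        if not isinstance(endpoint, str):
            raise ValueError(f"endpoint for {link} is not a string")
    unknown = sorted({ep for ep in pairs.values() if ep not in registered})
    if unknown:
        raise ValueError("unregistered endpoints: " + ", ".join(unknown))
    return pairs
```

The comparison is an exact string match; a real deployment would probably need to normalize URLs (trailing slashes, scheme case) before checking them.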
+'''Note 2:''' Scalability:
+ * a POST with 100,000 pairs would be about 5 MB, which should still work
+ * the most expensive operation takes place at the endpoints: correctly restricting the search to the given list of metadata collections
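The size figure in Note 2 can be checked with a quick estimate. Assuming synthetic handles of the same length as `hdl:1839/00-0000-0000-0003-467E-9` (33 characters) and the single endpoint from the example (24 characters), each compact JSON entry costs 35 + 1 + 26 = 62 bytes, plus one separating comma, so n pairs take 63n + 1 bytes including the braces. The handles generated below are hypothetical placeholders, not real records.

```python
import json


def context_payload_size(n_pairs: int) -> int:
    """Byte size of a compact x-aggregation-context JSON object with
    n_pairs entries, using synthetic 33-character handles shaped like
    hdl:1839/00-0000-0000-0003-467E-9 and the endpoint from the example."""
    context = {
        f"hdl:1839/00-0000-0000-{i >> 16:04X}-{i & 0xFFFF:04X}-9": "http://cqlservlet.mpi.nl"
        for i in range(n_pairs)
    }
    return len(json.dumps(context, separators=(",", ":")).encode("utf-8"))


print(context_payload_size(100_000))  # 6300001 bytes, roughly 6.3 MB
```

This lands in the same order of magnitude as the note's estimate. Sent as a form field, the value is also URL-encoded, which adds overhead for every quote, colon, and slash.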
 === Referring to an SRU endpoint from a CMDI file ===