Changes between Version 12 and Version 13 of FCS-specification


Timestamp:
04/17/12 13:40:29 (12 years ago)
Author:
dietuyt
Comment:

--

  • FCS-specification

    v12 v13  
    5959The aggregator consists of 2 parts:
    6060
    61  * a backend, responsible for all communication with the end points
    62  * a frontend, the web GUI that provides an end user interface to the functionality of the backend
    63 
    64 === Input for the aggregator ===
    65 
    66 To restrict a content search to certain (sub)collections the aggregator accepts 2 input formats:
    67 
    68  * a list of [MdSelfLink, endpoint URL] pairs in JSON
    69    * in case the client only knows the MdSelfLink it can retrieve the endpoint URL via the ResourceProxy with the mimetype "application/sru+xml" that can be found when resolving the MdSelfLink
    70  * a CMDI file based on the [http://catalog.clarin.eu/ds/ComponentRegistry?item=clarin.eu:cr1:p_1271859438175 VirtualCollection profile] ('''this profile will be changed in the near future''')
     61 * a '''backend''', responsible for all communication with the endpoints
     62 * a '''frontend''', the web GUI that provides an end user interface to the functionality of the backend
     63
     64=== Input for the aggregator backend ===
     65
     66The aggregator backend is also an SRU/CQL server; however, it does not function as an endpoint. Instead, it distributes the incoming queries to the endpoints it knows and then aggregates the results.
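
The distribute-and-aggregate behaviour described above could be sketched roughly as follows. This is only an illustration, not the aggregator's actual code: `fan_out`, `fake_search` and the `*.example` endpoint URLs are made up, and a real backend would send SRU searchRetrieve requests instead of calling a stub.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(query, endpoints, search):
    """Send `query` to every endpoint in parallel.

    Returns (results, failures): results maps endpoint URL -> hits,
    failures maps endpoint URL -> the exception that was raised.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max(1, len(endpoints))) as pool:
        futures = {url: pool.submit(search, url, query) for url in endpoints}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:  # one failing endpoint must not stop the rest
                failures[url] = exc
    return results, failures


def fake_search(url, query):
    """Hypothetical stand-in for a real SRU searchRetrieve call."""
    return [f"{query}@{url}"]


endpoints = ["http://endpoint-a.example", "http://endpoint-b.example"]
results, failures = fan_out("bellen", endpoints, fake_search)
```

Keeping failures separate from results is one possible design choice; it lets the frontend show partial results when a single endpoint is down.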
     67
     68* How does the aggregator know the endpoints?
     69  * It knows the endpoints by querying the [wiki:CenterRegistry CLARIN center registry]
     70  * It also accepts links to CLARIN-compatible endpoints explicitly given as a parameter (see below)
     71
     72* How can the aggregator backend restrict a content search to certain (sub)collections?
     73  * With a list of [MdSelfLink, endpoint URL] pairs in JSON, sent as the x-aggregation-context parameter of !SearchRetrieve
     74    * if the client only knows the MdSelfLink, it can retrieve the endpoint URL via the ResourceProxy with the mimetype "application/sru+xml", which can be found by resolving the MdSelfLink
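
A minimal sketch of the ResourceProxy lookup mentioned above, assuming the resolved MdSelfLink yields a CMDI document in the `http://www.clarin.eu/cmd/` namespace. The sample document, the `SearchService` resource type and the helper name `find_sru_endpoint` are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Assumed CMDI namespace; adjust if the resolved document uses another one.
CMD_NS = {"cmd": "http://www.clarin.eu/cmd/"}

# Hypothetical CMDI fragment, as it might look after resolving an MdSelfLink.
SAMPLE = """<CMD xmlns="http://www.clarin.eu/cmd/">
  <Header>
    <MdSelfLink>hdl:1839/00-0000-0000-0003-467E-9</MdSelfLink>
  </Header>
  <Resources>
    <ResourceProxyList>
      <ResourceProxy id="sru">
        <ResourceType mimetype="application/sru+xml">SearchService</ResourceType>
        <ResourceRef>http://cqlservlet.mpi.nl</ResourceRef>
      </ResourceProxy>
    </ResourceProxyList>
  </Resources>
</CMD>"""


def find_sru_endpoint(cmdi_xml):
    """Return the ResourceRef of the first application/sru+xml proxy, or None."""
    root = ET.fromstring(cmdi_xml)
    for proxy in root.iterfind(".//cmd:ResourceProxy", CMD_NS):
        rtype = proxy.find("cmd:ResourceType", CMD_NS)
        if rtype is not None and rtype.get("mimetype") == "application/sru+xml":
            ref = proxy.find("cmd:ResourceRef", CMD_NS)
            if ref is not None and ref.text:
                return ref.text.strip()
    return None


endpoint = find_sru_endpoint(SAMPLE)
```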
    7175
    7276An example of the JSON pairs:
    7377
    74 {{{#!js
     78{{{
    7579{
    7680    "hdl:1839/00-0000-0000-0003-467E-9": "http://cqlservlet.mpi.nl",
     
    7983}
    8084}}}
     85
     86An example of using the JSON directly above to restrict the search at the aggregator (via HTTP POST, searching for the string "bellen"):
     87{{{
     88POST http://aggregator.clarin.eu
     89operation: searchRetrieve
     90version: 1.2
     91query: bellen
     92x-aggregation-context: {"hdl:1839/00-0000-0000-0003-467E-9":"http://cqlservlet.mpi.nl","hdl:1839/00-0000-0000-0003-4682-F": "http://cqlservlet.mpi.nl","hdl:1839/00-0000-0000-0003-4692-D":"http://cqlservlet.mpi.nl"}
     93}}}
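
As a rough sketch, a client could assemble the POST parameters shown above like this. It assumes the parameters are sent as an ordinary form-encoded body, which the specification does not state explicitly; `build_search_request` is a made-up helper name, and the request is only built here, not sent.

```python
import json
from urllib.parse import parse_qs, urlencode


def build_search_request(query, context):
    """Return a form-encoded body for a searchRetrieve POST."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": query,
        # JSON object mapping each MdSelfLink to its endpoint URL
        "x-aggregation-context": json.dumps(context, separators=(",", ":")),
    }
    return urlencode(params)


context = {
    "hdl:1839/00-0000-0000-0003-467E-9": "http://cqlservlet.mpi.nl",
    "hdl:1839/00-0000-0000-0003-4682-F": "http://cqlservlet.mpi.nl",
    "hdl:1839/00-0000-0000-0003-4692-D": "http://cqlservlet.mpi.nl",
}
body = build_search_request("bellen", context)

# Round trip: decoding the body gives back the original context.
decoded = parse_qs(body)
restored = json.loads(decoded["x-aggregation-context"][0])
```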
     94
     95
     96'''Note 1:''' the endpoint URL needs to be registered in the CenterRegistry (to prevent the risk of DDoS attacks and privilege escalation via the aggregator).
     97'''Note 2:''' Scalability:
     98 * an example POST of 100,000 pairs could result in a request body of about 5 MB (should work)
     99 * the most expensive operation will take place at the endpoints: correctly restricting the search given a list of metadata collections
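
The size estimate in Note 2 can be checked with a quick calculation. The sketch below generates 100,000 synthetic pairs whose handles have the same length as the real ones in the example; the handles themselves are invented, and form encoding of the body would add some overhead on top of the raw JSON size.

```python
import json

ENDPOINT = "http://cqlservlet.mpi.nl"


def synthetic_handle(i):
    """Made-up handle with the same length as e.g. hdl:1839/00-0000-0000-0003-467E-9."""
    return f"hdl:1839/00-0000-0000-{i // 0x10000:04X}-{i % 0x10000:04X}-0"


pairs = {synthetic_handle(i): ENDPOINT for i in range(100_000)}
payload = json.dumps(pairs, separators=(",", ":"))

size_bytes = len(payload.encode("utf-8"))
size_mb = size_bytes / 1_000_000
print(f"{len(pairs)} pairs -> {size_mb:.1f} MB of JSON")
```

Under these assumptions the raw JSON lands in the same order of magnitude as the figure quoted in Note 2.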
    81100
    82101=== Referring to an SRU endpoint from a CMDI file ===