61 | | * a backend, responsible for all communication with the end points |
62 | | * a frontend, the web GUI that provides an end user interface to the functionality of the backend |
63 | | |
64 | | === Input for the aggregator === |
65 | | |
66 | | To restrict a content search to certain (sub)collections the aggregator accepts 2 input formats: |
67 | | |
68 | | * a list of [MdSelfLink, endpoint URL] pairs in JSON |
69 | | * in case the client only knows the MdSelfLink it can retrieve the endpoint URL via the ResourceProxy with the mimetype "application/sru+xml" that can be found when resolving the MdSelfLink |
70 | | * a CMDI file based on the [http://catalog.clarin.eu/ds/ComponentRegistry?item=clarin.eu:cr1:p_1271859438175 VirtualCollection profile] ('''this profile will be changed in the near future''') |
| 61 | * a '''backend''', responsible for all communication with the endpoints
| 62 | * a '''frontend''', the web GUI that provides an end user interface to the functionality of the backend |
| 63 | |
| 64 | === Input for the aggregator backend === |
| 65 | |
| 66 | The aggregator backend is also an SRU/CQL server; however, it does not function as an endpoint. Instead, it distributes incoming queries to the endpoints it knows and aggregates their results.
| 67 | |
| 68 | * How does the aggregator know the endpoints? |
| 69 | * It knows the endpoints by querying the [wiki:CenterRegistry CLARIN center registry] |
| 70 | * It also accepts links to CLARIN-compatible endpoints explicitly given as a parameter (see below) |
| 71 | |
| 72 | * How can the aggregator backend restrict a content search to certain (sub)collections? |
| 73 | * With a list of [MdSelfLink, endpoint URL] pairs in JSON, sent as the x-aggregation-context parameter of !SearchRetrieve
| 74 | * If the client knows only the MdSelfLink, it can retrieve the endpoint URL via the ResourceProxy with the mimetype "application/sru+xml", which is found by resolving the MdSelfLink
| 85 | |
| 86 | An example of using the JSON directly above to restrict a search at the aggregator (via HTTP POST, searching for the string "bellen"):
| 87 | {{{ |
| 88 | POST http://aggregator.clarin.eu |
| 89 | operation: searchRetrieve |
| 90 | version: 1.2 |
| 91 | query: bellen |
| 92 | x-aggregation-context: {"hdl:1839/00-0000-0000-0003-467E-9":"http://cqlservlet.mpi.nl","hdl:1839/00-0000-0000-0003-4682-F": "http://cqlservlet.mpi.nl","hdl:1839/00-0000-0000-0003-4692-D":"http://cqlservlet.mpi.nl"} |
| 93 | }}} |
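
The request above can be assembled programmatically. The following is a minimal sketch in Python, not the aggregator's reference client: the function name `build_search_request` is hypothetical, and sending the parameters as a form-encoded POST body is an assumption, since the example above does not specify the encoding. The request is only built, not sent.

```python
import json
from urllib.parse import urlencode, parse_qs
from urllib.request import Request

AGGREGATOR_URL = "http://aggregator.clarin.eu"  # taken from the example above


def build_search_request(query, context):
    """Build (but do not send) a searchRetrieve POST restricted to `context`.

    `context` maps each MdSelfLink to its endpoint URL. Form-encoding the
    parameters is an assumption of this sketch.
    """
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": query,
        # Compact JSON keeps the parameter as small as possible.
        "x-aggregation-context": json.dumps(context, separators=(",", ":")),
    }
    return Request(
        AGGREGATOR_URL,
        data=urlencode(params).encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


context = {
    "hdl:1839/00-0000-0000-0003-467E-9": "http://cqlservlet.mpi.nl",
    "hdl:1839/00-0000-0000-0003-4682-F": "http://cqlservlet.mpi.nl",
}
request = build_search_request("bellen", context)

# Round-trip check: decode the body again to inspect what would be sent.
decoded = parse_qs(request.data.decode("utf-8"))
print(decoded["operation"], decoded["query"])
```

Passing the resulting object to `urllib.request.urlopen` would submit the query; that step is left out here so the sketch runs without network access.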
| 94 | |
| 95 | |
| 96 | '''Note 1:''' the endpoint URL needs to be registered in the CenterRegistry (to prevent the risk of DDoS attacks and privilege escalation via the aggregator)
| 97 | '''Note 2:''' Scalability: |
| 98 | * a POST containing 100,000 pairs would be about 5 MB, which should still work
| 99 | * the most expensive operation will take place at the endpoints: correctly restricting the search given a list of metadata collections
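
The size estimate in Note 2 can be checked with a rough calculation. The sketch below builds synthetic pairs shaped like the handles in the example (they are placeholders, not real identifiers) and measures the compact JSON. The helper name `context_size_bytes` is hypothetical, and the result depends on handle and URL lengths.

```python
import json


def context_size_bytes(n_pairs, endpoint="http://cqlservlet.mpi.nl"):
    """Return the size in bytes of a compact JSON context with n_pairs entries."""
    context = {}
    for i in range(n_pairs):
        hi, lo = i >> 16, i & 0xFFFF
        # Synthetic handle with the same length as the handles in the example.
        context[f"hdl:1839/00-0000-0000-{hi:04X}-{lo:04X}-0"] = endpoint
    return len(json.dumps(context, separators=(",", ":")).encode("utf-8"))


size = context_size_bytes(100_000)
print(f"{size / 1_000_000:.1f} MB")
```

At about 60 bytes per pair this lands in the same few-megabyte range as the figure above. URL-encoding the parameter for a form-encoded POST would increase the size further, because quotes, colons and slashes are percent-encoded.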