wiki:FCS-Aggregator

Version 6 (modified by vronk, 13 years ago)

--

One component of the FederatedSearch infrastructure is the aggregator: a service that accepts queries, distributes them to the target repositories, collects and merges the partial results, and passes an aggregated result back to the client.
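To make the fan-out/merge idea concrete, here is a minimal sketch in Python. It is not how Herman's implementation or Pazpar2 works; the targets are hypothetical in-memory functions standing in for real SRU endpoints, and the names `aggregate` and `Target` are made up for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A target is any callable that takes a query string and returns a list of hits.
# A real aggregator would send an SRU searchRetrieve request here; these
# stand-ins are hypothetical and only illustrate the control flow.
Target = Callable[[str], List[str]]


def aggregate(query: str, targets: Dict[str, Target]) -> Dict[str, object]:
    """Send the query to every target in parallel and merge the partial results."""
    hits: List[dict] = []
    errors: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        # Distribute: one task per target repository.
        futures = {name: pool.submit(search, query) for name, search in targets.items()}
        # Collect and merge, remembering which target each record came from.
        for name, future in futures.items():
            try:
                for record in future.result(timeout=10):
                    hits.append({"source": name, "record": record})
            except Exception as exc:
                # One failing target should not sink the whole search.
                errors[name] = str(exc)
    return {"query": query, "hits": hits, "errors": errors}


if __name__ == "__main__":
    demo_targets = {
        "repo-a": lambda q: [q + " (hit 1 from a)", q + " (hit 2 from a)"],
        "repo-b": lambda q: [q + " (hit 1 from b)"],
    }
    print(aggregate("haus", demo_targets))
```

The merge step here is plain concatenation; ranking or de-duplication across targets is left out on purpose.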

There is now one quick-and-dirty implementation by Herman Stehouwer, hosted at mpi.nl: http://lux17.mpi.nl/ds/fedsearch/.

But we should really spend some time investigating, and trying to use, the components of the very mature and widely used Zebra/YAZ/Masterkey framework. Pazpar2 in particular, the metasearching middleware, seems a great candidate for exactly what we intend. The framework was originally built on the powerful but complex Z39.50 protocol, but most components (Pazpar2 in particular) also speak SRU/SRW.

There is currently one instance of pazpar2 running as a test on the clarin-at server: http://clarin.aac.ac.at/pazpar2/jsdemo1/. For now it only connects to some of the target repositories that ship with the pazpar2 distribution, namely a number of libraries set up in edu.xml. The distribution covers many big libraries and digital collections, such as the Library of Congress, OAIster, etc.

Conformance issues

So the next step would be to adapt the configuration so that it queries our own search services. I tried to connect to the Annex/Trova SRU interface, but I keep getting errors. I also tried to connect via yaz-client (the simple console client from the Zebra/YAZ suite), which keeps complaining as well. One of the error messages is "Content type does not appear to be XML", and indeed all the answers from the Annex SRU service have the MIME type text/plain. So this would be the first thing to change. However, when I tried our internal SRU service prototype, which at least delivers the result as text/xml, I still got errors. I even tried returning the explain response from the LoC SRU service, to no avail.
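A small check like the following might help to see the MIME-type problem before involving yaz at all. It is only a sketch: which media types yaz actually accepts is an assumption on my side (here text/xml, application/xml and anything ending in +xml), the function names are made up, and the explain request assumes the endpoint speaks SRU version 1.2.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import List

# Assumption: these are the media types a strict SRU client would accept as XML.
XML_MEDIA_TYPES = {"text/xml", "application/xml"}


def diagnose_response(content_type: str, body: bytes) -> List[str]:
    """Return a list of human-readable problems found in an SRU response."""
    problems = []
    # Drop parameters such as "; charset=UTF-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type not in XML_MEDIA_TYPES and not media_type.endswith("+xml"):
        shown = media_type or "missing"
        problems.append(f"Content-Type is '{shown}', which is not an XML media type")
    try:
        ET.fromstring(body)
    except ET.ParseError as exc:
        problems.append(f"body is not well-formed XML: {exc}")
    return problems


def fetch_and_diagnose(base_url: str) -> List[str]:
    """Send an SRU explain request to base_url and diagnose the answer."""
    url = base_url + "?operation=explain&version=1.2"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return diagnose_response(resp.headers.get("Content-Type", ""), resp.read())
```

Calling `fetch_and_diagnose` with the base URL of an endpoint returns an empty list if both checks pass. An empty list does not mean yaz will be happy, only that these two obvious causes are ruled out.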

A multi-level path to the service also seems to be a problem: yaz interprets everything after the port as a database name and escapes the slashes. So as the next debugging step we should try a base URI that has only a single-step path.

Now I was able to access the mpi and icltt endpoints, at least via yaz-client. So the two things to remember:

  1. simple base path (everything after the domain is interpreted as the database name, and slashes are escaped)
  2. Content-Type: text/xml
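The first point can be illustrated with a few lines of Python. This only mimics the behaviour described above (it is not taken from the yaz sources), and the function name `yaz_view` and the example.org URLs are made up.

```python
from urllib.parse import quote, urlsplit


def yaz_view(base_url: str) -> dict:
    """Show how a base URL looks if everything after host:port is one database name."""
    parts = urlsplit(base_url)
    database = parts.path.strip("/")
    return {
        "host": parts.netloc,
        "database": database,
        # With safe="" the slash is percent-encoded too, as described above.
        "escaped": quote(database, safe=""),
        # True if the path has a single step and nothing needs escaping.
        "single_step": "/" not in database,
    }


if __name__ == "__main__":
    print(yaz_view("http://example.org:8080/ds/fedsearch/"))
    print(yaz_view("http://example.org/sru"))
```

A base URL for which `single_step` is False is a candidate for the slash-escaping problem.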

In general we need better ways of debugging the connection. One starting point is the SRU Base Profile, which specifies the minimal requirements; however, it is of little help in real debugging, when the task is to find which little piece is jamming.

Remarks on installation of pazpar2/yaz-client on Linux

(on OS: openSuse 11.2)

Software packages were available via the openSUSE distribution (YaST), but they were outdated (3.0.44), so I uninstalled them. There are also various packages in the Index Data repository, but they kept missing some libraries.

So

  1. downloaded latest sources of yaz-4.1.7 and pazpar2-1.5.6.
  2. tried simple:
     ./configure 
     make  
     make install
    
    But when I then ran yaz-client and pazpar2, they failed with a missing shared library: "error while loading shared libraries: libyaz_icu.so.4:" (although it was available under /usr/local/lib)
  3. Then tried various configurations (always with make uninstall); this one finally worked:
     ./configure --disable-shared --with-icu --with-xml2 --with-xslt
    
    --disable-shared disables shared objects; the --with-xml2 and --with-xslt options are, according to the docs, necessary for SRU support.
  4. set up edu.xml as the targets configuration in `pazpar2/etc/default.xml`
  5. started pazpar2:
     cwd:pazpar2/etc> ../src/pazpar2 -f pazpar2.cfg
    
  6. copied pazpar2/www/test1,jsdemo to /srv/www/htdocs
  7. added ReverseProxy in apache setup (in httpd.conf.local)
    (see also pazpar2-docs#apache2proxy)
     ProxyPass /pazpar2 http://localhost:9004
     ProxyVia Off
     ProxyPassReverse /pazpar2 http://localhost:9004
     ProxyPassReverseCookieDomain localhost corpus5.aac.oeaw.ac.at
     ProxyPassReverseCookiePath / /pazpar2
    
  8. try under: http://clarin.aac.ac.at/pazpar2/jsdemo1/
