Opened 11 years ago

Closed 10 years ago

#392 closed task (fixed)

Content search in result set - task for BAS, BBAW, HZSK, IDS, Saarbrücken, Stuttgart

Reported by: kathrin.beck@uni-tuebingen.de Owned by: Thorsten Trippel
Priority: major Milestone:
Component: MetadataCuration Version:
Keywords: Federated Content Search Cc: clarin-vlo-tf@sfs.uni-tuebingen.de, teckart

Description

A "Content search in result set" link from the VLO to the FCS has so far been implemented across resources only by Leipzig, MPI and Tübingen. It is still missing for the following partners:
BAS, BBAW, HZSK, IDS, Saarbrücken, Stuttgart

Leipzig has included the link via CMDI as follows:

<ResourceProxy id="SOME_ID">
<ResourceType mimetype="application/sru+xml">SearchService</ResourceType>
<ResourceRef>http://clarinws.informatik.uni-leipzig.de:8080/CQL</ResourceRef>
</ResourceProxy>
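
For centres that want to check their own records before the next harvest, here is a minimal Python sketch that lists the SearchService endpoints declared in a CMDI file. It is only an illustration: the function name, the sample record and the example.org URL are assumptions, and the sketch ignores XML namespaces, so it does not depend on a particular CMDI version.

```python
import xml.etree.ElementTree as ET

SEARCH_MIMETYPE = "application/sru+xml"


def local_name(tag):
    """Return the tag name without its '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def find_search_endpoints(cmdi_xml):
    """Return the ResourceRef of every SearchService ResourceProxy."""
    root = ET.fromstring(cmdi_xml)
    endpoints = []
    for proxy in root.iter():
        if local_name(proxy.tag) != "ResourceProxy":
            continue
        rtype = ref = None
        for child in proxy:
            if local_name(child.tag) == "ResourceType":
                rtype = child
            elif local_name(child.tag) == "ResourceRef":
                ref = child
        if rtype is None or ref is None:
            continue
        is_search = (rtype.text or "").strip() == "SearchService"
        if is_search and rtype.get("mimetype") == SEARCH_MIMETYPE:
            endpoints.append((ref.text or "").strip())
    return endpoints


# Hypothetical, heavily shortened record; only the first proxy is from the ticket.
EXAMPLE_CMDI = """
<CMD>
  <Resources>
    <ResourceProxyList>
      <ResourceProxy id="SOME_ID">
        <ResourceType mimetype="application/sru+xml">SearchService</ResourceType>
        <ResourceRef>http://clarinws.informatik.uni-leipzig.de:8080/CQL</ResourceRef>
      </ResourceProxy>
      <ResourceProxy id="OTHER_ID">
        <ResourceType mimetype="text/plain">Resource</ResourceType>
        <ResourceRef>http://example.org/data.txt</ResourceRef>
      </ResourceProxy>
    </ResourceProxyList>
  </Resources>
</CMD>
"""

print(find_search_endpoints(EXAMPLE_CMDI))
```

A record without any SearchService proxy simply yields an empty list.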

Many thanks and best regards,

Kathrin

Change History (17)

comment:1 Changed 11 years ago by DefaultCC Plugin

Cc: teckart added

comment:2 Changed 11 years ago by knappen

Is this the right way to go? The information is already in the center registry; shouldn't the VLO retrieve it from there? Otherwise, we have duplicate data storage ("doppelte Datenhaltung") in the center registry and the CMDI metadata files.

comment:3 Changed 11 years ago by teckart

At least this was the consensus in the past. It is also part of the CLARIN-FCS documentation (Referring to an SRU endpoint from a CMDI file).

Just using the center registry wouldn't be a solution, as the center description only contains the link to the SRU/CQL interface, without any information about which resources are actually accessible via this endpoint.

comment:4 Changed 11 years ago by kathrin.beck@uni-tuebingen.de

Dear all,
I am afraid I cannot help with technical details - I'm only interacting from the "user perspective". Thomas Eckart and Kees Jan van der Looy (among others) should be able to answer all your questions.

I just wanted to point out that there seems to be a problem that should be fixed.

Best and thanks to you all for your integration efforts,

Kathrin

comment:5 Changed 11 years ago by dietuyt

Component: VLO → MetadataCuration

As Thomas already mentioned, using the SearchService ResourceProxy is the way to get FCS links into the VLO, so it is each center's responsibility to add them. Full documentation and examples are available at https://trac.clarin.eu/wiki/FCS-specification

Moving this ticket to Metadata Curation as it is something that needs to be added to the CMDI files of the CLARIN centres.

comment:6 Changed 11 years ago by hedeland

Still it's not quite clear to me what is required from us. We had the "Content search in result set" thing working by M24 and it seems to be working still, just that it's not that obvious that the resource is "checked" since the aggregator start page when coming from the VLO is (now?) "About" and not "Search options". And we still can't include all our resources in the FCS due to access restrictions. Do I need to change anything anywhere?

comment:7 Changed 11 years ago by knappen

UdS metadata are now fixed, the next harvest will have them with SRU/CQL endpoint information.

comment:8 Changed 11 years ago by schiel@bas.uni-muenchen.de

We spent some time trying to understand why the FCS in the VLO does not work, although we have inserted ResourceProxy entries as requested. We also followed the Trac documentation mentioned above by Dieter, but could not find any explanation of how the VLO currently creates the FCS links in the three different locations:

  1. 'search in results left below facets': only Leipzig data, seems to be a hack, URL contains no PID(s).
  2. 'search in results right below result list': seems to appear when at least one found record contains an SRU ResourceProxy; the URL contains no PID(s), so it is not clear how the aggregator knows which corpora to search.
  3. 'search in VLO record after the ResourceProxy link list': e.g. HZSK corpus HAMATAC. This URL contains the PID of the resource; makes sense.

The FCS link created in the BAS VLO record contains the correct PID of the found record and the correct servlet address of the BAS FCS endpoint, but the VLO opens the aggregator without any useful pre-selection; instead the 'About' page is shown (as Hanna already pointed out).

Number 1 is a hack that works only for Leipzig MD.
Number 2 might work, if all PIDs of the found resources can be passed to the aggregator (how?). Currently it looks like a hack that works only if one PID is found in the results (as with HAMATAC).
Number 3 might work for resources that

  • have a known PID in the aggregator
  • can be searched

It will of course not work with all the CMDI records that are not known to the aggregator (e.g. all recording sessions).

Currently, number 3 works with HAMATAC but not with BAS (e.g. VM2) or MPI (e.g. MPI CGN) CMDI records:
We suspect that the reason for this is the '@format=cmdi' part in the PID of BAS and MPI (which by the way is the correct way to address a CMDI file).
The VLO must therefore strip this from the PID before passing it to the FCS.
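
The stripping step suggested above could look roughly like the following Python sketch. It is not the VLO's actual code: the function name and the sample handle are hypothetical, and it assumes the '@format=cmdi' part always appears as a trailing suffix of the PID.

```python
FORMAT_SUFFIX = "@format=cmdi"


def strip_format_suffix(pid, suffix=FORMAT_SUFFIX):
    """Return the PID without a trailing '@format=cmdi' part, if present."""
    if pid.endswith(suffix):
        return pid[: -len(suffix)]
    return pid


# Hypothetical handle, for illustration only.
print(strip_format_suffix("hdl:00000/EXAMPLE@format=cmdi"))
```

PIDs that do not carry the suffix pass through unchanged, so the step would be safe to apply to every record.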

Shall I put this into a separate ticket under VLO?

Last edited 11 years ago by schiel@bas.uni-muenchen.de

comment:9 Changed 11 years ago by knappen

Another point to consider: There is not necessarily a one-to-one mapping from resources to endpoints. We (UdS) are currently running one endpoint per resource for technical reasons; but there can be one endpoint for several resources. I can even think of having more than one endpoint for a given resource, e.g., separate endpoints for subcorpora or separate search engines with different behaviour.
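
To make the many-to-many relation concrete, here is a small Python sketch that groups resources by endpoint from a list of (endpoint, resource) pairs. All names and URLs in it are hypothetical; it only illustrates that neither side of the relation has to be unique.

```python
from collections import defaultdict


def group_by_endpoint(pairs):
    """Map each endpoint URL to the list of resources it serves."""
    grouped = defaultdict(list)
    for endpoint, resource in pairs:
        if resource not in grouped[endpoint]:
            grouped[endpoint].append(resource)
    return dict(grouped)


# Hypothetical pairs: one endpoint serving two corpora, one corpus on two endpoints.
PAIRS = [
    ("http://example.org/sru/a", "corpus-1"),
    ("http://example.org/sru/a", "corpus-2"),
    ("http://example.org/sru/b", "corpus-2"),
]

print(group_by_endpoint(PAIRS))
```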

comment:10 Changed 11 years ago by jstegmann

Changes have been made to the IMS Stuttgart metadata in the repository. FCS information should show up after the next VLO harvest. Further investigation on our side afterwards ...

comment:11 Changed 11 years ago by teckart

@FSchiel: Just a short description of the current functionality:
The VLO supports two kinds of redirect to the FCS aggregator:

  • Faceted Search Page (= start page): if you narrow down the result set so that it contains fewer than 200 resources with an SRU/CQL link in their respective CMDI file (200 is of course just an arbitrary limit), then you will get a link "Content search in result set", which sends a POST request to the aggregator with the identifiers of ALL these resources contained in the result set. Resources in the result set that do not contain a SearchService ResourceProxy are omitted. In the next version of the VLO the link will also state how many resources will actually be queried. The different positions of this link are just due to layout problems that have to be fixed.
  • ResultPage (of a single resource): this page contains a "Content search in resource" link if its CMDI file contains a SearchService ResourceProxy. The redirect is again a POST request to the aggregator, but only with the endpoint and identifier of this single resource. It is an error if you see such a link for a resource without a SearchService ResourceProxy. Please send me an example, so that I can check where the problem is.
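
The selection rule for the faceted search page can be sketched in Python as follows. This is a reading of the description above, not the VLO source: the `Record` type, the function name and the sample values are hypothetical, and the actual POST payload format is not specified in this ticket, so the sketch stops at collecting the (endpoint, identifier) pairs.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MAX_RESOURCES = 200  # arbitrary limit mentioned above


@dataclass
class Record:
    pid: str
    search_endpoint: Optional[str] = None  # None: no SearchService ResourceProxy


def content_search_targets(
    result_set: List[Record], limit: int = MAX_RESOURCES
) -> Optional[List[Tuple[str, str]]]:
    """Return (endpoint, pid) pairs to send, or None if no link should be shown."""
    # Records without a SearchService ResourceProxy are omitted.
    searchable = [r for r in result_set if r.search_endpoint]
    # The link appears only for a non-empty set below the limit.
    if not searchable or len(searchable) >= limit:
        return None
    return [(r.search_endpoint, r.pid) for r in searchable]


# Hypothetical result set: two searchable records and one without an endpoint.
RESULTS = [
    Record("pid-1", "http://example.org/sru"),
    Record("pid-2"),
    Record("pid-3", "http://example.org/sru"),
]

print(content_search_targets(RESULTS))
```

The single-resource page would be the special case of one record: the link is shown only when that record has a search endpoint.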

By the way, showing the "About" page is the default behaviour of the aggregator, even when the preselect functionality worked.

comment:12 Changed 11 years ago by schiel@bas.uni-muenchen.de

After some hacking of PIDs the FCS link on BAS resources works now (thanks to Thomas, Yana, Uwe and Thomas). We also removed the FCS entry from the CMDIs that are not searchable in the FCS. For the BAS this ticket can therefore be considered closed.

comment:13 Changed 10 years ago by Thorsten Trippel

Owner: changed from kathrin.beck@uni-tuebingen.de to Thorsten Trippel
Status: new → assigned

comment:14 Changed 10 years ago by Sander Maijers

Can this ticket be closed?

comment:15 Changed 10 years ago by schiel@bas.uni-muenchen.de

Regarding the BAS this ticket can be closed.

Florian

comment:16 Changed 10 years ago by Jörg Knappen

For UdS, the ticket can be closed, too.

Maybe it is best to close this ticket and re-issue remaining problems (if there are any) as new tickets.

comment:17 Changed 10 years ago by teckart@informatik.uni-leipzig.de

Resolution: fixed
Status: assigned → closed

Closing the ticket as the redirect from VLO to FCS should work now for all centres (when information is available in CMDI & via SRU Scan). For all new problems/feature requests: please create a new ticket.
