
Version 2 (modified by vronk, 13 years ago)

Intermediary work-in-progress revision.

Search Context

Allow users to restrict a content search to specific repositories, collections, or resources.

This has the following aspects:

on an individual repository
Restrict the search to selected (sub-)collections or resources within the repository. For this, the repository has to expose an index of its collections/resources, which could (and should) be done by exposing their MD records.
on the aggregating component
Here, it is additionally necessary to know the mapping between collections/resources and the repositories that provide them.
extensional
The requesting client provides a list of identifiers as a filter.
intensional
The client restricts the query by some MD filter. See more on CDMDC.
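
To make the two kinds of filter and the aggregator's extra bookkeeping concrete, here is a minimal Python sketch. The SearchContext type, the split_by_repository function and the provider_of mapping are hypothetical names, and the identifiers are made up; the only assumption taken from the text is that the aggregator knows which repository provides each collection/resource.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SearchContext:
    """Hypothetical container for the two kinds of search restriction."""
    # Extensional: explicit identifiers of collections/resources.
    identifiers: list[str] = field(default_factory=list)
    # Intensional: an MD query, kept as an opaque string here.
    md_query: Optional[str] = None


def split_by_repository(context: SearchContext,
                        provider_of: dict[str, str]
                        ) -> dict[Optional[str], list[str]]:
    """Group an extensional filter by the repository providing each item.

    Identifiers without a known provider end up under the key None, so
    the aggregator can report them instead of silently dropping them.
    """
    per_repo: defaultdict[Optional[str], list[str]] = defaultdict(list)
    for pid in context.identifiers:
        per_repo[provider_of.get(pid)].append(pid)
    return dict(per_repo)


# Usage with made-up identifiers:
provider_of = {"coll-A": "repo-1", "res-7": "repo-2", "res-8": "repo-1"}
ctx = SearchContext(identifiers=["coll-A", "res-7", "res-8", "res-x"])
print(split_by_repository(ctx, provider_of))
# {'repo-1': ['coll-A', 'res-8'], 'repo-2': ['res-7'], None: ['res-x']}
```

Each repository then only receives the part of the filter that refers to its own collections/resources.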

There seem to be only two viable ways of resolving a combined MD/content query:

A) The target repository supports extensional filtering
In that case, the Resource-PIDs can be extracted from the result of the MD query and passed to the target repository as a filter.
B) The target repository supports intensional filtering
In that case, the MD query can be passed directly to the target repository. If metadata about the resources in the repository is available, an MD query should first be performed in the MD repository to check whether it makes sense to send the query to the given repository at all.

In either case, the capabilities of the target repository have to be known, so that the CDMDC logic can act upon them.
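
The choice between A) and B) can be sketched as a small dispatch function. This is a sketch under stated assumptions, not the CDMDC logic itself: RepositoryCapabilities, ContentRequest, resolve_combined_query and run_md_query are hypothetical names, run_md_query is assumed to return the PIDs of the target repository's resources whose MD records match, and preferring B) when a repository supports both is an arbitrary choice of this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class RepositoryCapabilities:
    """Hypothetical capability flags known about a target repository."""
    extensional_filter: bool  # accepts a list of Resource-PIDs
    intensional_filter: bool  # accepts an MD query directly
    md_available: bool  # MD records of its resources are in the MD repository


@dataclass(frozen=True)
class ContentRequest:
    """What would be sent to the target repository (illustrative only)."""
    content_query: str
    resource_pids: Optional[tuple[str, ...]] = None
    md_query: Optional[str] = None


def resolve_combined_query(content_query: str,
                           md_query: str,
                           caps: RepositoryCapabilities,
                           run_md_query: Callable[[str], list[str]],
                           ) -> Optional[ContentRequest]:
    """Choose strategy A or B for one target repository.

    Returns None if sending the query to this repository is pointless.
    """
    if caps.intensional_filter:
        # B) Pass the MD query through, but skip the repository if its
        # metadata is known and no resource in it matches.
        if caps.md_available and not run_md_query(md_query):
            return None
        return ContentRequest(content_query, md_query=md_query)
    if caps.extensional_filter:
        # A) Resolve the MD query first and send the PIDs as a filter.
        pids = run_md_query(md_query)
        if not pids:
            return None
        return ContentRequest(content_query, resource_pids=tuple(pids))
    raise ValueError("repository supports neither extensional nor intensional filtering")


def lookup(query: str) -> list[str]:
    # Made-up MD index: MD query string -> matching Resource-PIDs.
    return {"genre=news": ["res-1", "res-2"]}.get(query, [])


caps_a = RepositoryCapabilities(extensional_filter=True,
                                intensional_filter=False,
                                md_available=True)
print(resolve_combined_query("Haus", "genre=news", caps_a, lookup))
# ContentRequest(content_query='Haus', resource_pids=('res-1', 'res-2'), md_query=None)
```

A repository that supports neither kind of filter raises an error here, reflecting that only these two ways seem viable.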

Scenarios for getting to a resource:

link to resource
The (resolvable) Resource-handle is shared anywhere, e.g. sent by email. The content provider is responsible for displaying the resource suitably.
  
MDRecord with resolvable Resource-handle
An MD record for the given resource is published (via harvesting within the CMDI infrastructure). The user can search by metadata and traverse from an MD record in the MD result to the resource. The resource could then be reached either by resolving/following the Resource-handle, or the MD-Search component could handle the resource link and redirect it to a viewer. Which viewer to use for which resources would be configured within the MD-Search component (user-customizable), based e.g. on the CMD-Profile of the MD record.
Content query
The user performs a (federated) content search and is provided with a KWIC list as a result, where each hit points to the originating resource. A sensible option seems to be to group the results by resource. See more about KWIC-DataView.
Resource not accessible
The critical part is how to handle situations where the resource is not accessible at all (i.e. irrespective of access rights). This is often the case for written corpora, where the individual texts of the corpus are not available directly. Possible solutions:
  • Link to the MD-Record of the resource
  • Link to the MD-Record of the collection
  • If available, link to a part of the resource that is accessible (sometimes at least a somewhat larger context snippet around the keyword is available)
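
For the MD-record scenario, the per-profile viewer configuration could look like the following minimal sketch. The profile names, viewer URLs, DEFAULT_VIEWERS and viewer_link are placeholders invented for illustration, not real CMDI profiles or services; the sketch only assumes that user settings override defaults and that, without a configured viewer, the Resource-handle is simply followed.

```python
from __future__ import annotations

from typing import Mapping, Optional
from urllib.parse import quote

# Hypothetical defaults: CMD-Profile name -> viewer URL template.
# Profile names and URLs are placeholders, not real profiles or services.
DEFAULT_VIEWERS: dict[str, str] = {
    "TextCorpusProfile": "https://viewer.example.org/text?handle={handle}",
    "MediaSessionProfile": "https://viewer.example.org/media?handle={handle}",
}


def viewer_link(resource_handle: str,
                cmd_profile: str,
                user_viewers: Optional[Mapping[str, str]] = None) -> str:
    """Return the URL the MD-Search component would redirect to.

    User-specific settings override the defaults. If no viewer is
    configured for the profile, the plain Resource-handle is returned,
    i.e. the resource is reached by simply resolving the handle.
    """
    viewers = {**DEFAULT_VIEWERS, **(user_viewers or {})}
    template = viewers.get(cmd_profile)
    if template is None:
        return resource_handle
    # Percent-encode the handle so it is safe inside a query string.
    return template.format(handle=quote(resource_handle, safe=""))


print(viewer_link("hdl:1234/5", "TextCorpusProfile"))
# https://viewer.example.org/text?handle=hdl%3A1234%2F5
```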

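Grouping KWIC hits by resource and choosing a link target for resources that are not accessible can be sketched as follows. Hit, ResourceInfo, group_by_resource and link_options are hypothetical names; the fallbacks are returned in the order the page lists them, without implying that this order is a ranking.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Hit:
    """One KWIC hit; the field names are hypothetical."""
    left: str
    keyword: str
    right: str
    resource_pid: str


@dataclass(frozen=True)
class ResourceInfo:
    """Hypothetical per-resource facts the aggregator may know."""
    accessible: bool
    md_record: Optional[str] = None  # link to the resource's MD record
    collection_md_record: Optional[str] = None  # link to the collection's MD record
    snippet_url: Optional[str] = None  # link to an accessible larger snippet


def group_by_resource(hits: Iterable[Hit]) -> dict[str, list[Hit]]:
    """Group hits by originating resource, keeping first-seen order."""
    grouped: dict[str, list[Hit]] = {}
    for hit in hits:
        grouped.setdefault(hit.resource_pid, []).append(hit)
    return grouped


def link_options(pid: str, info: ResourceInfo) -> list[str]:
    """Return where a group of hits can point to.

    An accessible resource is linked via its (resolvable) PID. Otherwise
    all available fallbacks are returned in the order the page lists them.
    """
    if info.accessible:
        return [pid]
    candidates = (info.md_record, info.collection_md_record, info.snippet_url)
    return [c for c in candidates if c]
```

An empty result from link_options marks the case the page calls critical: a resource for which no link target at all is known.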
Corpus as a Resource