
Version 5 (modified by vronk, 13 years ago)


Search Context

Allow restricting the content search to specific Repositories/Collections/Resources.

We propose an extension parameter in the protocol:

  ?x-cmd-context={list[PID]}
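A minimal sketch of how a client might attach the parameter to a request, in Python. The proposal does not fix how `list[PID]` is serialized, so this assumes a comma-separated list. The endpoint URL, the `query` parameter name, and the handle-style PIDs are hypothetical placeholders.

```python
from urllib.parse import urlencode


def build_search_url(endpoint: str, query: str, context_pids: list[str]) -> str:
    """Build a content-search URL restricted to the given PIDs.

    Assumption: PIDs are serialized as a comma-separated list; the
    proposal leaves the exact format of list[PID] open.
    """
    params = {"query": query}  # hypothetical name for the query parameter
    if context_pids:
        # Only send x-cmd-context when the client actually restricts the search.
        params["x-cmd-context"] = ",".join(context_pids)
    # urlencode percent-encodes ':', '/' and ',' inside the PIDs.
    return endpoint + "?" + urlencode(params)


url = build_search_url(
    "https://repo.example.org/sru",           # hypothetical endpoint
    "Haus",
    ["hdl:1234/corpus-a", "hdl:1234/corpus-b"],  # hypothetical PIDs
)
print(url)
```

Omitting the parameter when no PIDs are given keeps an unrestricted search identical to a request from a client that does not know the extension.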

This has the following aspects:

extensional
the requesting client provides a list of identifiers as a filter.
This requires an index of the collections/resources exposed by the Content Provider, which could, or rather should, be provided by publishing the MD-Records.
intensional
the client restricts the query by some MD filter expressed as part of the combined MD/Content-query. The extension parameter is not used/supported.
See more in CDMDC.
on an individual Repository
the default use of the extension parameter: restrict the search to selected (sub-)collections or resources within the repository by providing an explicit list of PIDs.
on the aggregating component
the aggregator additionally needs to know the mapping between collections/resources and the repositories providing them. The value of x-cmd-context would therefore have to be of the form:
  list[
    Repository-endpoint: list[Resource-PIDs]
  ]
Based on this information, the Aggregator component could formulate the requests to the individual endpoints, passing each endpoint its corresponding Resource-PIDs in the x-cmd-context parameter.
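The fan-out performed by the aggregating component can be sketched as follows. Here the nested context value is modeled as a Python dict from endpoint URL to Resource-PIDs; that representation, the endpoint URLs, the PIDs, the `query` parameter name, and the comma-joined serialization are all assumptions for illustration.

```python
from urllib.parse import urlencode

# Hypothetical in-memory form of the aggregator-level x-cmd-context value:
# each repository endpoint maps to the Resource-PIDs it provides.
AggregatorContext = dict[str, list[str]]


def split_context(query: str, context: AggregatorContext) -> dict[str, str]:
    """Return one request URL per endpoint, each carrying only that endpoint's PIDs."""
    requests = {}
    for endpoint, pids in context.items():
        # Assumption: per-endpoint PIDs are comma-separated, as in a plain request.
        params = {"query": query, "x-cmd-context": ",".join(pids)}
        requests[endpoint] = endpoint + "?" + urlencode(params)
    return requests


context = {
    "https://repo-a.example.org/sru": ["hdl:1234/a1", "hdl:1234/a2"],
    "https://repo-b.example.org/sru": ["hdl:5678/b1"],
}
for endpoint, request_url in split_context("Haus", context).items():
    print(endpoint, "->", request_url)
```

Each repository then only sees the plain, per-repository form of the parameter, so individual endpoints do not need to understand the nested aggregator-level value.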

Scenarios for getting to a resource

link to resource
the (resolvable) Resource-handle can appear anywhere, e.g. sent by email. The Content Provider is responsible for displaying the resource suitably.
  
MDRecord with resolvable Resource-handle
The MDRecord for a given Resource is published (via harvesting within the CMDI-Infrastructure). The user can search by metadata and traverse from the MD-Record in the MD result to the Resource. The resource could be reached either by resolving/following the Resource-handle again, or the MD-Search component could handle the resource link and redirect it to a viewer. The decision which viewer to use for which resources would be configured within the MD-Search component (user-customizable), based e.g. on the CMD-Profile of the MDRecord.
Content query
The user performs a (federated) content search and is provided with a KWIC-list as the result, where each hit points to its originating Resource. A sensible option seems to be grouping the results by Resource. See more about KWIC-DataView.
Resource not accessible
The critical part is how to handle situations where the Resource is not accessible at all (i.e. irrespective of access rights). This is often the case for written corpora, where the individual texts of the corpus are not directly available. Possible solutions:
  • Link to the MD-Record of the resource
  • Link to the MD-Record of the collection
  • If available, link to a part of the Resource that is accessible (sometimes at least a somewhat bigger context snippet around the keyword is available)
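One possible client-side handling of the content-query and not-accessible scenarios, sketched in Python: KWIC hits are grouped by their originating Resource, and each group gets a link that falls back from the Resource itself to the MD-Record of the resource and then to the MD-Record of the collection. The `Hit` fields, the `resource_accessible` flag, the link values, and the fallback order are hypothetical (the list above does not rank the options); the larger-snippet option is left out because it depends on what each Content Provider returns.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    # Hypothetical KWIC hit: left context, keyword, right context, and its origin.
    left: str
    keyword: str
    right: str
    resource_pid: str
    resource_accessible: bool
    resource_md_record: Optional[str] = None
    collection_md_record: Optional[str] = None


def link_for(hit: Hit) -> Optional[str]:
    """Pick a link target; this fallback order is an assumption, not part of the proposal."""
    if hit.resource_accessible:
        return hit.resource_pid  # the resolvable Resource-handle
    # Fall back to the MD-Record of the resource, then of the collection.
    return hit.resource_md_record or hit.collection_md_record


def group_by_resource(hits: list[Hit]) -> dict[str, list[Hit]]:
    """Group KWIC hits by the PID of their originating Resource, preserving hit order."""
    groups: dict[str, list[Hit]] = defaultdict(list)
    for hit in hits:
        groups[hit.resource_pid].append(hit)
    return dict(groups)


hits = [
    Hit("das", "Haus", "am See", "hdl:1234/text-1", True),
    Hit("ein", "Haus", "im Wald", "hdl:1234/text-1", True),
    Hit("kein", "Haus", "mehr", "hdl:1234/text-2", False,
        collection_md_record="hdl:1234/corpus-md"),
]
for pid, group in group_by_resource(hits).items():
    print(pid, "->", link_for(group[0]))
    for h in group:
        print(f"  {h.left} [{h.keyword}] {h.right}")
```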

Corpus as a Resource