
Search Context

Allows the content search to be restricted to specific Repositories/Collections/Resources.

We propose an extension parameter in the protocol:

  ?x-cmd-context={list[PID]}

This has the following aspects:

extensional
the requesting client provides a list of identifiers as a filter.
This requires an index of the collections/resources exposed by the Content Provider, which could (and ideally should) be provided by publishing the MD-Records.
intensional
the client restricts the query by an MD filter expressed as part of the combined MD/Content-query. In this case the extension parameter is not used (or not supported).
See more in CDMDC.
on individual Repository
the default use of the extension parameter: Restrict the search to selected (sub-)collections or resources within the repository, by providing an explicit list of PIDs.
on the aggregating component
the aggregating component additionally needs to know the mapping between collections/resources and the repository that provides them. This means that the value of x-cmd-context would have to be of the form:
 list[Repository-endpoints 
         list[Resource-PIDs]
     ]
Based on this information the Aggregator-component could formulate the requests for individual endpoints, passing the corresponding Resource-PIDs in the x-cmd-context-parameter.
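
The fan-out described above can be sketched as follows. This is only an illustration under stated assumptions, not the protocol's definition: the wire format of list[PID] is not specified here, so a comma-separated list is assumed; the `query` parameter name, the endpoint URLs, the PIDs, and the function name `build_endpoint_requests` are all hypothetical.

```python
from urllib.parse import urlencode


def build_endpoint_requests(context, query):
    """Build one request URL per repository endpoint.

    `context` maps a repository endpoint URL to the list of Resource-PIDs
    selected within that repository (the nested form of x-cmd-context).
    ASSUMPTION: PIDs are serialized as a comma-separated list.
    ASSUMPTION: the search term is passed in a parameter named `query`.
    """
    requests = {}
    for endpoint, pids in context.items():
        params = {"query": query}
        if pids:
            # Pass only the PIDs that belong to this endpoint.
            params["x-cmd-context"] = ",".join(pids)
        requests[endpoint] = endpoint + "?" + urlencode(params)
    return requests


# Hypothetical endpoints and PIDs, for illustration only.
context = {
    "http://repo-a.example.org/sru": ["hdl:1234/corpus-1", "hdl:1234/corpus-2"],
    "http://repo-b.example.org/sru": ["hdl:5678/coll-9"],
}
urls = build_endpoint_requests(context, "Haus")
for url in urls.values():
    print(url)
```

An endpoint with an empty PID list is queried without the extension parameter, i.e. the whole repository is searched; whether that is the desired default is a design decision left open here.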

Scenarios how to get to a resource

link to resource
the (resolvable) Resource-handle may appear anywhere, e.g. sent by email. The Content Provider is responsible for displaying the resource suitably.
  
MDRecord with resolvable Resource-handle
An MDRecord for the given Resource is published (via harvesting within the CMDI-Infrastructure). The user can search by metadata and traverse from the MD-Record in the MD-result to the Resource. The resource can then be reached either by resolving the Resource-handle again, or the MD-Search component can intercept the resource link and redirect it to a viewer. The decision which viewer to use for which resources would be configured within the MD-Search component (user-customizable), based e.g. on the CMD-Profile of the MDRecord.
Content query
The user performs a (federated) content search and receives a KWIC-list as a result, where each hit points to the originating Resource. A sensible option seems to be to group the result by Resource. See more about KWIC-DataView.
Resource not accessible
The critical question is how to handle situations where the Resource is not accessible at all (i.e. irrespective of access rights). This is often the case for written corpora, where the individual texts of the corpus are not directly available. Ideally this situation should not arise: if there is a Metadata-record, it should contain a link to the resource (or a sensible proxy), and if the resource itself is not accessible (like a piece of text in a big text corpus), it should not be announced to the outside world by a separate MD-Record at all. If it still occurs, possible solutions are:
  • Link to the MD-Record of the resource
  • Link to the MD-Record of the parent collection
  • Link to a part of the Resource that is available (in a text corpus, a somewhat larger context snippet around the keyword is sometimes available)
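
A minimal sketch of how a result viewer might choose among these fallbacks. The text above lists the options without ranking them, so the priority order used here (resource itself, then an available part, then its MD-Record, then the parent collection's MD-Record) is an assumption, as are the `Hit` class, its field names, and `pick_link`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    """Links known for one KWIC hit (hypothetical structure); any may be missing."""
    resource_url: Optional[str] = None          # the Resource itself
    snippet_url: Optional[str] = None           # an available part of the Resource
    md_record_url: Optional[str] = None         # MD-Record of the Resource
    parent_md_record_url: Optional[str] = None  # MD-Record of the parent collection


def pick_link(hit: Hit) -> Optional[str]:
    """Return the first available link, in the ASSUMED priority order."""
    candidates = [
        hit.resource_url,
        hit.snippet_url,
        hit.md_record_url,
        hit.parent_md_record_url,
    ]
    for url in candidates:
        if url:
            return url
    return None  # nothing to link to


# Hypothetical hit from a corpus whose individual texts are not accessible.
hit = Hit(md_record_url="http://example.org/md/text-17",
          parent_md_record_url="http://example.org/md/corpus")
print(pick_link(hit))
```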

Corpus as a Resource

As mentioned in the previous section (under Resource not accessible), if individual items of a corpus are not accessible at all, they should not be exposed via corresponding MD-Records. In such cases it seems more practical to provide just one MD-Record for the whole corpus, describing the corpus as a whole, including its coverage across various dimensions (time, space, genre), and linking to an endpoint where the corpus can be searched.

Last modified on 08/29/11 12:53:30