Version 5 (modified by vronk, 13 years ago)

Combined Distributed Metadata Content Search

(The related chapters in the FederatedSearch document are 4.6 “Combined Metadata and Content Query” and 6.3 “Combined Metadata Content Search”.)

A combined metadata content search shall allow the context of the content search to be restricted by metadata filters. Simple examples:

 Find all occurrences where a female Actor said “Ja”. 
 Find all occurrences of “viable system” in texts belonging to a collection whose responsible Organisation is a university.

In general, it should be possible to formulate the whole query (both the metadata part and the content part) as a single query string, using the same CQL syntax.
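As a rough sketch of how one query string could carry both parts, the Python below splits a flat CQL query joined by top-level `and` into a metadata part and a content part. It assumes a hypothetical convention that metadata indexes carry a `cmd.` prefix (e.g. `cmd.Actor.Sex`), and it handles neither parentheses nor `or`/`not`; a real implementation would work on the parse tree produced by a proper CQL parser.

```python
import re
from typing import Tuple

# Hypothetical convention: metadata indexes start with this prefix.
METADATA_PREFIX = "cmd."

# Split on a top-level "and" that is not inside a quoted term:
# the lookahead requires an even number of '"' characters after the match.
_AND_OUTSIDE_QUOTES = re.compile(r'\s+and\s+(?=(?:[^"]*"[^"]*")*[^"]*$)', re.IGNORECASE)


def split_combined_query(cql: str) -> Tuple[str, str]:
    """Split a flat, and-joined CQL query into (metadata_part, content_part)."""
    clauses = [c.strip() for c in _AND_OUTSIDE_QUOTES.split(cql) if c.strip()]
    md = [c for c in clauses if c.startswith(METADATA_PREFIX)]
    content = [c for c in clauses if not c.startswith(METADATA_PREFIX)]
    return " and ".join(md), " and ".join(content)


print(split_combined_query('cmd.Actor.Sex = "female" and "Ja"'))
# ('cmd.Actor.Sex = "female"', '"Ja"')
```

Quoted terms such as `"rock and roll"` are kept intact, because the split only happens outside double quotes.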

This can be realized as a two-phase process:

  1. metadata search – restricts the candidate collections to search in, based on the metadata part of the query.

It returns a list of candidates, which is used in:

  2. content search – the federated content search iterates through the candidates and issues the content query to each candidate in turn.
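The two phases can be sketched in Python as follows. `Candidate`, `combined_search`, `metadata_search` and `content_search` are hypothetical names; the two search functions are passed in as callables, so the control flow stays independent of any particular metadata or content search service.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Candidate:
    collection_id: str  # identifier of the candidate collection's MDRecord
    endpoint: str       # content search endpoint that serves this collection


def combined_search(
    md_query: str,
    content_query: str,
    metadata_search: Callable[[str], Iterable[Candidate]],
    content_search: Callable[[str, str, str], List[str]],
) -> Dict[str, List[str]]:
    results: Dict[str, List[str]] = {}
    # Phase 1: the metadata search restricts the set of candidate collections.
    for cand in metadata_search(md_query):
        # Phase 2: issue the content query to each candidate in turn.
        hits = content_search(cand.endpoint, cand.collection_id, content_query)
        if hits:
            results[cand.collection_id] = hits
    return results


# Usage with stub search functions standing in for real services.
candidates = [Candidate("coll-a", "http://a.example.org/sru"),
              Candidate("coll-b", "http://b.example.org/sru")]
print(combined_search(
    'cmd.Actor.Sex = "female"', '"Ja"',
    metadata_search=lambda q: candidates,
    content_search=lambda ep, cid, q: ["hit-1"] if cid == "coll-a" else [],
))
# {'coll-a': ['hit-1']}
```

Candidates for which the content search yields no hits are simply left out of the result.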

[Figure: Control Flow in a Combined Metadata Content Search]

Linking MDRecords and Content Services

The main challenge seems to be how to relate the metadata records to the (endpoints of the) repositories/search engines, so that an MD result can be mapped to the candidate repositories to search in.

Variant 1: Collection's MDRecord points to the Content Service

A basic solution would be for the ResourceRef of the Collection MDRecord to point to the endpoint of the given search engine, but this seems too simplistic.
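A minimal sketch of Variant 1, assuming a simplified, namespace-free CMDI-like record in which the content service endpoint appears as a `ResourceProxy` whose `ResourceType` is `SearchService` (both the record layout and that type value are assumptions here). It also hints at the limitation: the endpoint can only be found if every Collection MDRecord follows the same convention.

```python
import xml.etree.ElementTree as ET
from typing import List

# Simplified, namespace-free record; real CMDI records use the CMD namespace.
SAMPLE_RECORD = """<CMD>
  <Resources>
    <ResourceProxyList>
      <ResourceProxy id="p1">
        <ResourceType>SearchService</ResourceType>
        <ResourceRef>http://example.org/sru</ResourceRef>
      </ResourceProxy>
      <ResourceProxy id="p2">
        <ResourceType>Resource</ResourceType>
        <ResourceRef>http://example.org/texts/1.xml</ResourceRef>
      </ResourceProxy>
    </ResourceProxyList>
  </Resources>
</CMD>"""


def endpoints_from_record(xml_text: str, service_type: str = "SearchService") -> List[str]:
    """Return the ResourceRefs of all ResourceProxies of the given type."""
    root = ET.fromstring(xml_text)
    endpoints = []
    for proxy in root.iter("ResourceProxy"):
        # findtext returns None for a missing child, "" for an empty one.
        rtype = (proxy.findtext("ResourceType") or "").strip()
        ref = (proxy.findtext("ResourceRef") or "").strip()
        if rtype == service_type and ref:
            endpoints.append(ref)
    return endpoints


print(endpoints_from_record(SAMPLE_RECORD))  # ['http://example.org/sru']
```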

Variant 2: Separate MDRecord Repository

Define a separate CMD-Profile Repository (see RepositoryRegistry) that carries:

  1. the URL of the Content Service endpoint
  2. references to the MDRecords (<ResourceRef>) of Collections and/or Resources that are searchable via this Service
  3. any further technical information about the service

It should not carry any information about the content; that information should be maintained separately in the MDRecords of the referenced Collections and Resources.
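Variant 2 essentially calls for an inverted index from MDRecord identifiers to service endpoints. The Python sketch below models a Repository record with the three items listed above (`RepositoryRecord`, `build_index` and `candidate_endpoints` are hypothetical names, and MDRecords are referred to by plain string identifiers) and groups the hits of an MD search by the endpoint that can search them.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class RepositoryRecord:
    endpoint: str                 # 1. URL of the Content Service endpoint
    md_refs: List[str]            # 2. MDRecords of searchable Collections/Resources
    technical: Dict[str, str] = field(default_factory=dict)  # 3. further technical info


def build_index(repos: Iterable[RepositoryRecord]) -> Dict[str, Set[str]]:
    """Invert the records: MDRecord id -> set of endpoints that can search it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for repo in repos:
        for ref in repo.md_refs:
            index[ref].add(repo.endpoint)
    return index


def candidate_endpoints(md_hits: Iterable[str], index: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Group MD-search hits by the endpoint that can search them."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for md_id in md_hits:
        for ep in sorted(index.get(md_id, set())):
            grouped[ep].append(md_id)
    return dict(grouped)


repos = [RepositoryRecord("http://a.example.org/sru", ["coll1", "coll2"]),
         RepositoryRecord("http://b.example.org/sru", ["coll2", "res9"])]
index = build_index(repos)
print(candidate_endpoints(["coll2", "coll1", "unknown"], index))
# {'http://a.example.org/sru': ['coll2', 'coll1'], 'http://b.example.org/sru': ['coll2']}
```

MD hits without a registered endpoint (`"unknown"` above) are dropped silently; a real implementation might want to report them.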

Variant 3: Use Virtual Collection

The MD search result could be packaged as a virtual collection, so that only a reference to it is sent to the content search.

TODO: This needs to be elaborated further.
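Since Variant 3 is not yet worked out, the following is only a speculative Python sketch: the MD search result is registered as a virtual collection, and the content search receives only a reference, which it resolves to the member collections. `VirtualCollectionStore` and `dispatch_plan` are hypothetical stand-ins for a virtual collection registry and the content search side; the reference is derived from a hash of the sorted member list, so the same candidate set always yields the same reference.

```python
import hashlib
from typing import Dict, List, Tuple


class VirtualCollectionStore:
    """Hypothetical in-memory stand-in for a virtual collection registry."""

    def __init__(self) -> None:
        self._members: Dict[str, List[str]] = {}

    def register(self, members: List[str]) -> str:
        """Store an MD search result and return a reference to it."""
        ordered = sorted(set(members))
        digest = hashlib.sha1("\n".join(ordered).encode("utf-8")).hexdigest()
        ref = "vc-" + digest[:12]
        self._members[ref] = ordered
        return ref

    def resolve(self, ref: str) -> List[str]:
        return list(self._members[ref])


def dispatch_plan(store: VirtualCollectionStore, ref: str, content_query: str) -> List[Tuple[str, str]]:
    """Content side: only the reference is received and resolved here."""
    return [(member, content_query) for member in store.resolve(ref)]


store = VirtualCollectionStore()
ref = store.register(["coll2", "coll1"])  # MD side passes only `ref` onwards
print(dispatch_plan(store, ref, '"Ja"'))
# [('coll1', '"Ja"'), ('coll2', '"Ja"')]
```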
