Version 12 (modified by vronk, 12 years ago)

FCS - Feature Matrix

This is intended to be a comprehensive list of the individual features that an FCS-compliant search engine must or may implement. However, it is currently out of sync with the specification discussed in detail in FCS-specification.

SRU/CQL defines conformance levels, but these are too coarse-grained and vague. Here we try to be more precise and go down to the level of individual features.

The list is not yet complete. TODO: complete

explain response
Provide information about the configuration of the service, mainly about the available indices and defaults.

We have to examine whether the explain record can serve as the single authoritative source of information about a repository, carrying all the configuration information discussed below.
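For orientation, a client could read the available indices straight from the explain record. The sketch below assumes the record follows ZeeRex 2.0 (the explain format used by SRU) and lists each index as set.name; the context-set identifier URL and the sample indices are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of ZeeRex 2.0, the record format returned by SRU explain.
ZR = "{http://explain.z3950.org/dtd/2.0/}"


def list_indices(explain_xml: str) -> list:
    """Return 'set.name' for every index name declared in an explain record."""
    root = ET.fromstring(explain_xml)
    found = []
    for index in root.iter(ZR + "index"):
        # An index may map to several names, e.g. in different context sets.
        for name in index.iter(ZR + "name"):
            found.append(f"{name.get('set')}.{name.text.strip()}")
    return found


# Made-up sample record with two indices in a "ccs" context set.
SAMPLE = """<explain xmlns="http://explain.z3950.org/dtd/2.0/">
  <indexInfo>
    <set identifier="http://example.org/ccs" name="ccs"/>
    <index><title>Lemma</title><map><name set="ccs">lemma</name></map></index>
    <index><title>Part of speech</title><map><name set="ccs">pos</name></map></index>
  </indexInfo>
</explain>"""

print(list_indices(SAMPLE))  # ['ccs.lemma', 'ccs.pos']
```

Because root.iter searches all descendants, the same function also works when the record is still wrapped in an SRU explainResponse.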

Query

simple term search
accept a simple term query
 query=word
TODO: decide between substring and full-word matching.
TODO: decide whether searching in lexica should cover only the lemma list or the explanations as well (full-text).
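A simple term query is just the query parameter of an SRU searchRetrieve request. A minimal sketch, assuming SRU 1.2 and a hypothetical endpoint http://example.org/sru:

```python
from urllib.parse import urlencode


def search_url(endpoint: str, query: str, version: str = "1.2") -> str:
    """Build an SRU searchRetrieve request URL for a CQL query."""
    params = {"operation": "searchRetrieve", "version": version, "query": query}
    # urlencode percent-escapes the values and turns spaces into '+'.
    return endpoint + "?" + urlencode(params)


print(search_url("http://example.org/sru", "word"))
# http://example.org/sru?operation=searchRetrieve&version=1.2&query=word
```

Note that a multi-word query such as "rat hat" is sent as query=rat+hat.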
wildcards
support wildcards like
 wor*  *ove   *oo*  ?ool   b?t
Alternatively, the same could be expressed with a relation, e.g.: index startsWith wor
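One way a server could evaluate such masked terms is to translate them into regular expressions. This sketch follows the CQL masking convention (* for zero or more characters, ? for exactly one, \ to escape the next character) and assumes full-token matching rather than substring matching, which is still an open TODO above. Under this reading, index startsWith wor is equivalent to wor*.

```python
import re


def cql_mask_to_regex(term: str) -> re.Pattern:
    """Translate a CQL masked term (* and ?) into a compiled regex."""
    parts = []
    i = 0
    while i < len(term):
        ch = term[i]
        if ch == "\\" and i + 1 < len(term):
            parts.append(re.escape(term[i + 1]))  # escaped character is literal
            i += 2
            continue
        if ch == "*":
            parts.append(".*")  # zero or more characters
        elif ch == "?":
            parts.append(".")  # exactly one character
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts))


def mask_matches(term: str, token: str) -> bool:
    """Full-token match: the whole token must fit the pattern."""
    return cql_mask_to_regex(term).fullmatch(token) is not None


for pattern, token in [("wor*", "world"), ("*ove", "love"), ("*oo*", "book"),
                       ("?ool", "tool"), ("b?t", "boat")]:
    print(pattern, token, mask_matches(pattern, token))
```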
index search
support searchClause-queries:
 index relation searchTerm
See the Indices section below for more about indices.
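For illustration, a whitespace-separated clause can be split into its three parts as below. This is not a full CQL parser: it ignores booleans, relation modifiers and prefix declarations, and shlex treats backslashes as escapes, which clashes with CQL masking. A bare term falls back to cql.serverChoice, the CQL index for a server-chosen default, which ties simple term search to index search.

```python
import shlex
from typing import NamedTuple


class SearchClause(NamedTuple):
    index: str
    relation: str
    term: str


def parse_clause(text: str) -> SearchClause:
    """Split 'index relation searchTerm' (whitespace-separated) into parts."""
    parts = shlex.split(text)  # honours double quotes around multi-word terms
    if len(parts) == 1:
        # A bare term is searched in the server's default index.
        return SearchClause("cql.serverChoice", "=", parts[0])
    if len(parts) != 3:
        raise ValueError(f"expected 'index relation searchTerm', got {text!r}")
    return SearchClause(*parts)


print(parse_clause('ccs.lemma any "rat hat cat"'))
# SearchClause(index='ccs.lemma', relation='any', term='rat hat cat')
```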
supported relations single term
which relations are allowed in a query of the form index rel single-search-term:
  • = - exact match (on token? on annotation?)
  • regex? - regular expression
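To make the open question concrete, the sketch below evaluates the two relations against a token sequence in which every token carries several annotation layers (word form, lemma, part of speech); the layer to match on is a parameter. The data, the layer names and the relation name regex are assumptions for illustration.

```python
import re

# Hypothetical annotated text: each token carries several annotation layers.
TOKENS = [
    {"word": "The", "lemma": "the", "pos": "DET"},
    {"word": "rats", "lemma": "rat", "pos": "NOUN"},
    {"word": "ran", "lemma": "run", "pos": "VERB"},
]


def match_single(tokens, layer, relation, term):
    """Return positions of tokens whose `layer` satisfies `relation` with `term`."""
    if relation == "=":  # exact match on the chosen layer
        return [i for i, t in enumerate(tokens) if t.get(layer) == term]
    if relation == "regex":  # assumed relation name, not part of standard CQL
        pattern = re.compile(term)
        return [i for i, t in enumerate(tokens)
                if t.get(layer) is not None and pattern.fullmatch(t[layer])]
    raise ValueError(f"unsupported relation: {relation}")


print(match_single(TOKENS, "lemma", "=", "rat"))  # [1]
print(match_single(TOKENS, "pos", "regex", "NOUN|VERB"))  # [1, 2]
```

The same term "rat" matches on the lemma layer but not on the word layer, which is exactly why "= on token or on annotation?" needs an answer.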
supported relations multi-term
for a search clause with multiple terms (like "rat hat cat"), SRU proposes:
  • any - any of the terms (OR)
  • all - all of the terms (AND)
  • adj - terms in that order - a phrase
  • all/window/#N - terms within a given window (#N) (SRU/CQL 2.0 proposal)
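The intended semantics of these relations can be sketched over a plain list of tokens. This is a minimal reading, assuming terms are compared as whole tokens and that all/window/#N means that all terms occur within some run of N consecutive tokens.

```python
def match_multi(tokens, relation, terms, window=None):
    """Check whether a token sequence satisfies a multi-term relation."""
    present = set(tokens)
    if relation == "any":  # OR
        return any(t in present for t in terms)
    if relation == "all" and window is None:  # AND
        return all(t in present for t in terms)
    if relation == "adj":  # phrase: contiguous and in order
        n = len(terms)
        return any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1))
    if relation == "all":  # all/window/#N
        wanted = set(terms)
        last_start = max(0, len(tokens) - window)
        return any(wanted <= set(tokens[s:s + window]) for s in range(last_start + 1))
    raise ValueError(f"unsupported relation: {relation}")


text = "the rat and the cat sat".split()
print(match_multi(text, "adj", ["the", "cat"]))  # True
print(match_multi(text, "all", ["rat", "cat"], window=3))  # False
```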
honor search context
Allow the search to be restricted to specific collections/resources specified as a parameter (see SearchContext)
 ?x-cmd-context={list[Resource-PID]}
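The parameter format {list[Resource-PID]} does not fix a separator. The sketch below assumes a comma-separated list (an assumption, not part of the specification), shows a client building the request and a server reading the PIDs back and filtering hits, and uses made-up handle PIDs.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SEPARATOR = ","  # assumption: the spec does not define how PIDs are joined


def with_context(endpoint, query, pids):
    """Client side: add the list of resource PIDs as x-cmd-context."""
    params = {"operation": "searchRetrieve", "version": "1.2", "query": query,
              "x-cmd-context": SEPARATOR.join(pids)}
    return endpoint + "?" + urlencode(params)


def context_from_url(url):
    """Server side: recover the PIDs (an empty list means no restriction)."""
    values = parse_qs(urlsplit(url).query).get("x-cmd-context", [""])
    return [pid for pid in values[0].split(SEPARATOR) if pid]


def restrict(hits, pids):
    """Keep only hits from the requested resources (all hits if none given)."""
    if not pids:
        return hits
    allowed = set(pids)
    return [hit for hit in hits if hit["pid"] in allowed]


url = with_context("http://example.org/sru", "rat", ["hdl:1839/00-A", "hdl:1839/00-B"])
print(context_from_url(url))  # ['hdl:1839/00-A', 'hdl:1839/00-B']
```

With this convention, a PID that itself contains a comma would need escaping.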
honor VC as search context
able to process a virtual collection (VC) as a means of restricting the search context.
boolean search
 AND OR PROX
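Boolean combination can be sketched as set operations on per-document hit positions: AND keeps documents matched by both operands, OR merges them, and PROX additionally requires a pair of hits within a given distance. The hit-set representation (document id mapped to token positions) and the distance parameter are assumptions for illustration.

```python
def op_and(a, b):
    """Documents matched by both operands, with the hits of both."""
    return {d: a[d] | b[d] for d in a.keys() & b.keys()}


def op_or(a, b):
    """Documents matched by either operand."""
    return {d: a.get(d, set()) | b.get(d, set()) for d in a.keys() | b.keys()}


def op_prox(a, b, distance=1):
    """Like AND, but some hit of a and some hit of b must be at most `distance` tokens apart."""
    out = {}
    for d in a.keys() & b.keys():
        close = {x for p in a[d] for q in b[d] if abs(p - q) <= distance for x in (p, q)}
        if close:
            out[d] = close
    return out


rat = {"doc1": {1}, "doc2": {5}}
cat = {"doc1": {4}, "doc3": {0}}
print(op_and(rat, cat))  # {'doc1': {1, 4}}
```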

Indices

The following operations can be performed on indices:

  • search
  • scan (/aggregate)
  • output (/group, /sort)

The repository description needs to state explicitly which operations are supported on which indices.
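One possible shape for such a declaration is a simple table from index name to supported operations, against which incoming requests can be checked. The index names and their capabilities below are hypothetical.

```python
from enum import Enum


class Op(Enum):
    SEARCH = "search"
    SCAN = "scan"  # also covers aggregation
    OUTPUT = "output"  # also covers grouping and sorting


# Hypothetical declaration of one repository's indices and their operations.
CAPABILITIES = {
    "ccs.kwic": {Op.SEARCH, Op.OUTPUT},
    "ccs.lemma": {Op.SEARCH, Op.SCAN, Op.OUTPUT},
    "ccs.pos": {Op.SEARCH, Op.SCAN},
}


def supports(index, op):
    """True if the repository declares `op` for `index`."""
    return op in CAPABILITIES.get(index, set())


def check_request(index, op):
    """Reject a request up front instead of failing later in the engine."""
    if not supports(index, op):
        raise ValueError(f"index {index!r} does not support {op.value}")


print(supports("ccs.lemma", Op.SCAN), supports("ccs.kwic", Op.SCAN))  # True False
```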

In SRU/CQL, indices are defined in context sets. We plan to introduce the following context sets:

ccs
content indices like:
 kwic, pos, lemma
TODO: we need a way to distinguish a lemma as one annotation layer in the full-text content from a lemma as the head of a lexicon entry.
isocat
supports index search on isocat data categories (mapping internal indices to isocat).
cmd
content search that also supports MD filters ("intensional filter", see CDMDC)
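Since CQL index names are qualified by a context-set prefix (e.g. ccs.lemma), a server has to resolve them to its internal fields, and several context sets may point at the same field. A sketch with a hypothetical mapping; isocat.partOfSpeech is a placeholder, not an actual ISOcat identifier, and the default set for unqualified names is an assumption.

```python
# Hypothetical mapping from (context set, index) to an internal field name.
INDEX_MAP = {
    ("ccs", "lemma"): "lemma",
    ("ccs", "pos"): "pos_tag",
    ("isocat", "partOfSpeech"): "pos_tag",  # placeholder for an ISOcat data category
    ("cmd", "language"): "md_language",
}
DEFAULT_SET = "ccs"  # assumption: unqualified names belong to ccs


def resolve_index(qualified):
    """Map 'set.index' (or a bare 'index') to the internal field."""
    prefix, sep, name = qualified.partition(".")
    if not sep:
        prefix, name = DEFAULT_SET, qualified
    try:
        return INDEX_MAP[(prefix, name)]
    except KeyError:
        raise ValueError(f"unknown index: {qualified}") from None


print(resolve_index("isocat.partOfSpeech"))  # pos_tag
```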

We should agree on a basic set of MD indices, at least for output, that "should" be implemented to make life easier for software and users. Prime candidates are:

  • title/name (default string representation of the resource)
  • language
  • resource-type
  • creation/publication date
  • or simply take inspiration from the VLO facets

Additionally, especially regarding metadata indices (the cmd context set), we have to agree on how to use existing context sets such as dublincore.
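Reusing dublincore could be as simple as mapping the candidate metadata indices onto the CQL Dublin Core context set (prefix dc). The cmd.* names below are hypothetical; only the dc index names come from Dublin Core.

```python
# Hypothetical cmd index names for the candidate metadata indices,
# mapped onto indexes of the CQL Dublin Core context set.
CMD_TO_DC = {
    "cmd.title": "dc.title",
    "cmd.language": "dc.language",
    "cmd.resourceType": "dc.type",
    "cmd.date": "dc.date",
}
DC_TO_CMD = {dc: cmd for cmd, dc in CMD_TO_DC.items()}


def to_dc(index):
    """Rewrite a cmd index to its Dublin Core equivalent, if one is agreed."""
    return CMD_TO_DC.get(index, index)


def from_dc(index):
    """Let a server accept dc.* indices by mapping them back to cmd.*."""
    return DC_TO_CMD.get(index, index)


print(to_dc("cmd.resourceType"), from_dc("dc.language"))  # dc.type cmd.language
```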

Search Result

provide result in ccs:Resource-format
FederatedSearch/ccsResource.xsd
provide resource reference
Resource@pid
provide DataView@type=X
In which formats can you provide the results?
provide DataView@type=kwic
Can you provide basic keyword in context view?
provide DataView@type=metadata
Do you provide information about the Resource (and/or ResourceFragment)? What kind of information (which metadata fields; described by the @schema parameter?)
provide link to CMD-metadata
Fill ccs:DataView@pid/@ref with a reference to the CMD record.
provide CMD-metadata
alternatively, resolve and embed the CMD record:
   <ccs:DataView type="metadata" schema="{cmd-profile specific schema}"><cmd:CMD>...</cmd:CMD></ccs:DataView>
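As a rough illustration, a result with a resource reference, a kwic view and a link to the CMD record could be assembled with ElementTree. The namespace URI, the child elements of the kwic view (c, kw) and the PIDs are placeholders; the authoritative structure is defined in FederatedSearch/ccsResource.xsd.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace: the real one is defined by FederatedSearch/ccsResource.xsd.
CCS_NS = "http://example.org/ccs"
ET.register_namespace("ccs", CCS_NS)


def q(tag):
    """Qualify a tag name with the ccs namespace (ElementTree notation)."""
    return f"{{{CCS_NS}}}{tag}"


def build_resource(pid, left, keyword, right, cmd_ref=None):
    """Build a ccs:Resource with a kwic DataView and an optional metadata link."""
    resource = ET.Element(q("Resource"), {"pid": pid})
    kwic = ET.SubElement(resource, q("DataView"), {"type": "kwic"})
    # Hypothetical children for left context, keyword and right context.
    ET.SubElement(kwic, q("c"), {"type": "left"}).text = left
    ET.SubElement(kwic, q("kw")).text = keyword
    ET.SubElement(kwic, q("c"), {"type": "right"}).text = right
    if cmd_ref is not None:
        # Link to the CMD record instead of embedding it.
        ET.SubElement(resource, q("DataView"), {"type": "metadata", "ref": cmd_ref})
    return resource


serialized = ET.tostring(
    build_resource("hdl:1839/00-A", "the", "rat", "ran", cmd_ref="hdl:1839/00-A-md"),
    encoding="unicode")
print(serialized)
```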

Supporting Operations

scan indices
implement the scan operation, which allows querying the terms available in individual indices
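In SRU, scan is a separate request type. A minimal client sketch, assuming SRU 1.2, a hypothetical endpoint and the SRU 1.x response namespace: it builds the request and extracts (term, count) pairs from a made-up sample response.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SRW = "{http://www.loc.gov/zing/srw/}"  # SRU 1.x response namespace


def scan_url(endpoint, index, start, maximum_terms=20):
    """Ask for up to `maximum_terms` terms of `index`, starting near `start`."""
    params = {"operation": "scan", "version": "1.2",
              "scanClause": f'{index} = "{start}"',
              "maximumTerms": str(maximum_terms)}
    return endpoint + "?" + urlencode(params)


def parse_scan(response_xml):
    """Return (term, numberOfRecords) pairs from a scan response."""
    root = ET.fromstring(response_xml)
    return [(term.findtext(SRW + "value"),
             int(term.findtext(SRW + "numberOfRecords", "0")))
            for term in root.iter(SRW + "term")]


# Made-up sample response for illustration.
SAMPLE = """<scanResponse xmlns="http://www.loc.gov/zing/srw/">
  <version>1.2</version>
  <terms>
    <term><value>rat</value><numberOfRecords>12</numberOfRecords></term>
    <term><value>rate</value><numberOfRecords>3</numberOfRecords></term>
  </terms>
</scanResponse>"""

print(parse_scan(SAMPLE))  # [('rat', 12), ('rate', 3)]
```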