Version 13 (modified by olhsha, 11 years ago)

= FCS - Feature Matrix =

This page is meant to be a comprehensive list of the individual features that every FCS-compliant search engine must or can implement. However, it is currently out of sync with the specification, which is discussed in detail in FCS-specification.

SRU/CQL proposes conformance levels, but these are too coarse-grained and fuzzy. We try to be more precise here and go down to the level of individual features.

The list is not complete yet. TODO: complete it.

== explain response ==
Provide information about the configuration of the service, mainly about available indices and defaults.

We have to examine whether the explain record can be the single authoritative source of information about a repository, carrying all the configuration information discussed below.

== Query ==

 simple term search (Conformance level 0) ::
   accept a simple term query: a word or a phrase.
{{{
 query=fish
 query="system"
 query="language acquisition"
 query="She said \"Yes\""
}}}
It is a search for occurrences of a full word or a phrase. In the example queries above ''fish'' and ''"system"'' are instances of a word, and  ''"language acquisition"'' and ''"She said \"Yes\""'' are instances of a phrase.
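
A minimal sketch of how an endpoint might tell a word from a phrase in the queries above. The function name `classify_term` is hypothetical, and the rule that a phrase is any term containing whitespace is an assumption, not something the specification states.

```python
import re


def classify_term(query: str) -> tuple[str, str]:
    """Return (kind, text), where kind is 'word' or 'phrase'.

    Assumption: surrounding double quotes are optional delimiters and
    a backslash escapes the character that follows it.
    """
    q = query.strip()
    if len(q) >= 2 and q[0] == '"' and q[-1] == '"':
        # Drop the delimiting quotes, then remove the escaping backslashes.
        text = re.sub(r"\\(.)", r"\1", q[1:-1])
    else:
        text = q
    # Assumed rule: more than one whitespace-separated token means a phrase.
    kind = "phrase" if len(text.split()) > 1 else "word"
    return kind, text
```

Under these assumptions `"system"` is classified as a word even though it is quoted, which agrees with the description above.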

  '''''TODO:''' decide whether a search in lexica covers only the lemma list or also the explanations (full text)''

  honor search context (Conformance level 0) ::
  Allow restricting the search to specific collections/resources passed as a parameter (see [[SearchContext]])
{{{
 ?x-cmd-context={list[Resource-PID]}
}}}
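
As an illustration, a client could assemble such a request as sketched below. The endpoint URL and the function name `build_search_url` are hypothetical, `operation`, `version` and `query` are the usual SRU request parameters, and joining the PIDs with commas is an assumption, since the list syntax is not fixed here.

```python
from urllib.parse import urlencode


def build_search_url(endpoint: str, query: str, context_pids=()) -> str:
    """Build a searchRetrieve URL, optionally restricted to some resources."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",  # assumed SRU version
        "query": query,
    }
    if context_pids:
        # Assumption: the resource PIDs are passed as one comma-separated list.
        params["x-cmd-context"] = ",".join(context_pids)
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint and PIDs, for illustration only.
url = build_search_url("http://example.org/sru", "fish", ["hdl:1234/a", "hdl:1234/b"])
```

`urlencode` percent-encodes the PIDs, so characters such as `:` and `/` are safe inside the parameter value.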


 wildcards (Conformance level 1-2) ::
   support wildcards like
{{{
 wor*  *ove   *oo*  ?ool   b?t
}}}
  There could also be a variant encoded with the help of a `relation`, for example: `index startsWith wor`
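
A minimal sketch of one way to evaluate such patterns against single tokens, by translating them to regular expressions. It assumes the CQL reading of the masking characters (`*` for zero or more characters, `?` for exactly one) and case-insensitive matching; the function names are hypothetical.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")  # zero or more characters
        elif ch == "?":
            parts.append(".")  # exactly one character
        else:
            parts.append(re.escape(ch))  # literal character
    return re.compile("".join(parts), re.IGNORECASE)


def matches(pattern: str, token: str) -> bool:
    """Return True if the whole token matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(token) is not None
```

For example, `matches("?ool", "cool")` holds, while `matches("?ool", "school")` does not, because `?` stands for a single character.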
   
  index search (Conformance level 1-2) ::
  support searchClause-queries:
{{{
 index relation searchTerm
}}} 

Examples are
{{{
dc.creator = anderson
title adj "wonderful feelings"
bib.dateIssued < 1998
}}}
  
 See the Indices section below for more about indices.
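
To make the `index relation searchTerm` shape concrete, here is a small sketch that splits a single search clause into its three parts. It is not a full CQL parser: boolean operators, relation modifiers and nested clauses are out of scope, and the names `SearchClause` and `parse_clause` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class SearchClause(NamedTuple):
    index: str
    relation: str
    term: str


# A symbolic relation may touch its neighbours; a named relation (adj, any, ...)
# must be separated from them by whitespace.
_CLAUSE = re.compile(
    r'''^\s*(?P<index>[\w.]+)
        (?:\s*(?P<sym><=|>=|<>|=|<|>)\s*|\s+(?P<word>[A-Za-z]+)\s+)
        (?P<term>"(?:[^"\\]|\\.)*"|\S+)\s*$''',
    re.VERBOSE,
)


def parse_clause(text: str) -> Optional[SearchClause]:
    """Return the parsed clause, or None if text is not of this form."""
    m = _CLAUSE.match(text)
    if m is None:
        return None
    term = m.group("term")
    if term.startswith('"'):
        term = term[1:-1]  # drop the delimiting quotes
    return SearchClause(m.group("index"), m.group("sym") or m.group("word"), term)
```

A bare term such as `fish` yields `None`, so a caller can fall back to the simple term search described earlier.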

 supported relations single term ::
  the relations allowed in a query of the form `index rel single-search-term`:
   * `=` - exact match (''on token? on annotation?'')
   * `<` - comparison on numeric values
   * `regex`? - regular expression

 supported relations multi-term ::
   for a search clause with multiple terms (like "rat hat cat") SRU proposes:
   * `any` - any of the terms (OR)
   * `all` - all of the terms (AND)
   * `adj` - terms in that order - a phrase
   * `all/window/#N` - terms within a given window (#N) (SRU/CQL 2.0 proposal)
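
The relations listed above can be sketched over a tokenized text as follows. This is only an illustration under stated assumptions: matching is case-insensitive on whole tokens, and the window variant is read as "all terms occur within some run of N consecutive tokens", which is one possible interpretation of the proposal. The function name `match_relation` is hypothetical.

```python
def match_relation(relation, terms, tokens, window=None):
    """Evaluate a multi-term relation against a list of tokens."""
    terms = [t.lower() for t in terms]
    tokens = [t.lower() for t in tokens]
    if relation == "any":
        # At least one of the terms occurs (OR).
        return any(t in tokens for t in terms)
    if relation == "all":
        if window is None:
            # Every term occurs somewhere (AND).
            return all(t in tokens for t in terms)
        # Assumed reading of all/window/#N: every term occurs within
        # some run of `window` consecutive tokens.
        last_start = max(1, len(tokens) - window + 1)
        return any(
            all(t in tokens[i:i + window] for t in terms)
            for i in range(last_start)
        )
    if relation == "adj":
        # The terms occur next to each other in the given order (a phrase).
        n = len(terms)
        return any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1))
    raise ValueError(f"unsupported relation: {relation}")


tokens = "the rat wore a hat and chased the cat".split()
```

With these tokens, `all` succeeds for "rat hat cat" while `adj` fails for the same terms, because they do not occur next to each other.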
 
'''TODO''': discuss the lists of relations above in more detail: what do we want exactly? 
  
 honor VC as search context  (Conformance level 1-2) ::
 be able to process a virtual collection (VC) as a means of restricting the search context.


 boolean search (Conformance level 1-2) ::
{{{
 AND OR PROX
}}}

== Indices ==
The following operations are defined on indices:
 * `search`
 * `scan` (/`aggregate`)
 * `output` (/`group`, /`sort`)

The description of the repository needs to state explicitly which operations are supported on which indices.
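
One possible way to make this explicit is a per-index table of supported operations. The sketch below uses made-up assignments purely for illustration; the real declaration would live in the repository description, whose format is not settled on this page, and the names `INDEX_OPERATIONS` and `supports` are hypothetical.

```python
# Hypothetical declaration: which operations each index supports.
INDEX_OPERATIONS = {
    "kwic": {"output"},
    "pos": {"search", "scan"},
    "lemma": {"search", "scan", "output"},
}


def supports(index: str, operation: str) -> bool:
    """Return True if the operation is declared for the index."""
    return operation in INDEX_OPERATIONS.get(index, set())
```

A client could call `supports("lemma", "scan")` before issuing a scan request, and treat unknown indices as supporting nothing.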

In SRU/CQL, indices are defined in context sets. We plan to introduce [[FCS-spec#ContextSets|the following context sets]]:
 ccs ::
   content indices like:
{{{
 kwic, pos, lemma
}}}
   '''''TODO:''' we need some distinction between a '''lemma''' as one annotation layer in the full-text content and as head of a lexicon entry.''
 isocat ::
   supporting index-search on `isocat` data categories. (Mapping internal indices to `isocat`.)
 cmd ::
   content search supporting also MD-filters. ("intensional filter" - see [[CDMDC]])

   We should agree on a basic set of MD-indices, at least for `output`, that "should" be implemented to make life easier for software and users. Strong candidates are:
      * title/name (default string representation of the resource)
      * language
      * resource-type
      * creation/publication date
      * or simply take inspiration from the VLO facets

In addition, especially for metadata indices (the `cmd` context set), we have to agree on how to use existing context sets like ''dublincore''.

== Search Result ==
 provide result in ccs:Resource-format ::
   [source:FederatedSearch/ccsResource.xsd]
 provide resource reference ::
   `Resource@pid`
 provide !DataView@type=X ::
   In which formats can you provide the results?
 provide !DataView@type=kwic ::
   Can you provide basic keyword in context view?
 provide !DataView@type=metadata ::
   Do you provide information about the Resource (and/or !ResourceFragment)? What kind of information (which metadata fields; described by the @schema parameter?)
 provide link to CMD-metadata ::
   Fill ccs:DataView@pid/@ref with a reference to the CMD-record.
 provide CMD-metadata ::
   alternatively resolve and embed the CMD-record
{{{
   <ccs:DataView type="metadata" schema="{cmd-profile specific schema}">
     <cmd:CMD> ... </cmd:CMD>
   </ccs:DataView>
}}}

== Supporting Operations ==

 scan indices ::
  implement the `scan` operation, which allows querying the terms available in individual indices
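
As a rough illustration of what `scan` returns, the sketch below aggregates the values of one index into an alphabetical term list with frequencies, starting at a given term. The in-memory list of values, the case folding and the function signature are assumptions made for the example; `maximum_terms` merely mirrors the SRU `maximumTerms` request parameter.

```python
from collections import Counter


def scan(index_values, start_term="", maximum_terms=10):
    """Return up to maximum_terms (term, count) pairs from start_term on."""
    # Assumption: terms are compared case-insensitively.
    counts = Counter(v.lower() for v in index_values)
    terms = sorted(t for t in counts if t >= start_term.lower())
    return [(t, counts[t]) for t in terms[:maximum_terms]]


# Made-up index content, for illustration only.
lemmas = ["fish", "Fish", "cat", "dog", "fishing"]
result = scan(lemmas, start_term="d", maximum_terms=2)
```

Here `result` contains the first two terms at or after "d", each with the number of times it occurs in the index.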