= FCS - Feature Matrix =

This page is meant to become a comprehensive list of the individual features that every FCS-compliant search engine must or can implement. It is currently out of sync with the specification, which is discussed in detail in [[FCS-specification]].

SRU/CQL [[http://www.loc.gov/standards/sru/specs/base-profile.html|proposes conformance levels]], but these are too coarse-grained and fuzzy. We try to be clearer here and go down to the level of individual features.

''The list is not complete yet. '''TODO:''' complete''

 [[FCS-spec#Explainoperation|explain response]] ::
   Provide information about the configuration of the service, mainly about the available indices and defaults.
   ''We have to examine whether the `explain` record can be the single authoritative source of information about a repository, carrying all the configuration information discussed below.''

== Query ==

 simple term search (Conformance level 0) ::
   Accept a simple term query: a word or a phrase.
   {{{
   query=fish
   query="system"
   query="language acquisition"
   query="She said \"Yes\""
   }}}
   This is a search for occurrences of a full word or a phrase. In the example queries above, ''fish'' and ''"system"'' are instances of a word, and ''"language acquisition"'' and ''"She said \"Yes\""'' are instances of a phrase.
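The quoting used in the example queries above can be sketched in Python as follows. This is only an illustration, not part of the specification: the helper names `quote_cql_term` and `query_parameter` are hypothetical, and the rule that anything other than plain word characters gets quoted is a deliberately conservative assumption.

```python
import re
from urllib.parse import urlencode

# CQL keywords that must be quoted when they are used as search terms.
RESERVED = {"and", "or", "not", "prox", "sortby"}


def quote_cql_term(term: str) -> str:
    """Return `term` as a CQL search term, quoted when necessary (hypothetical helper)."""
    # Assumption: a bare term consists of word characters only; everything else is quoted.
    if re.fullmatch(r"\w+", term) and term.lower() not in RESERVED:
        return term
    # Inside quotes, a backslash escapes itself and the double quote.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def query_parameter(term: str) -> str:
    """Build the URL-encoded `query=...` parameter for a simple term search."""
    return urlencode({"query": quote_cql_term(term)})


for example in ["fish", "language acquisition", 'She said "Yes"']:
    print(quote_cql_term(example), "->", query_parameter(example))
```

Wildcard characters are passed through unescaped in this sketch, so they keep whatever special meaning the endpoint gives them.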
   '''''TODO:''' decide whether a search in lexica covers only the lemma list or the explanations as well (full-text)''

 honor search context (Conformance level 0) ::
   Allow the search to be restricted to specific collections/resources specified as a parameter (see [[SearchContext]]).
   {{{
   ?x-cmd-context={list[Resource-PID]}
   }}}

 wildcards (Conformance level 1-2) ::
   Support wildcards like
   {{{
   wor*
   *ove
   *oo*
   ?ool
   b?t
   }}}
   There could also be a version encoded by means of a `relation`, e.g. `index startsWith wor`.

 index search (Conformance level 1-2) ::
   Support searchClause queries:
   {{{
   index relation searchTerm
   }}}
   Examples are
   {{{
   dc.creator = anderson
   title adj "wonderful feelings"
   bib.dateIssued < 1998
   }}}
   See the Indices section below for more about indices.

 supported relations single term (Conformance level 1-2) ::
   Which relations are allowed in a query of the form `index rel single-search-term`:
    * `=` - exact match (''on token? on annotation?'')
    * `<`, `>=`, `<=`, `==`
    * `regex`? - regular expression

 supported relations multi-term (Conformance level 1-2) ::
   For a search clause with multiple terms (like "rat hat cat") SRU proposes:
    * `any` - any of the terms (OR)
    * `all` - all of the terms (AND)
    * `adj` - terms in that order - a phrase
    * `within`
    * `encloses`
    * `all/window/#N` - terms within a given window (#N) (SRU/CQL 2.0 proposal)
   '''TODO''': discuss the lists of relations above in more detail: what exactly do we want for indices?

 supported relation modifiers (Conformance level 1-2) ::
    * `/stem`
    * `/relevant`
    * `/fuzzy`
    * `/respectCase`
    * `/isoDate`
    * `/oid`

 honor VC as search context (Conformance level 1-2) ::
   Be able to process a virtual collection (VC) as a means of restricting the search context.

 boolean search (Conformance level 1-2) ::
   {{{
   AND
   OR
   PROX
   }}}
   Examples:
   {{{
   system AND language
   system OR language
   system AND (language OR acquisition)
   system NOT language /* read: AND NOT; it is not a unary operator. */
   }}}
   PROX is a special boolean proximity operator “allowing for the relative locations of the terms to be used in order to determine the resulting set of records”. It is also the only Boolean operator that takes (Boolean) modifiers:
   {{{
   PROX /unit = {unit} /distance {comparison_operator} {number} /ordered|unordered
   }}}
   Examples for PROX:
   {{{
   cat prox/unit=word/distance>2/ordered hat
   cat prox/unit=paragraph hat
   }}}

== Indices ==

The following operations are defined on indices:
 * `search`
 * `scan` (/`aggregate`)
 * `output` (/`group`, /`sort`)

The description of the repository needs to state explicitly which operations are supported on which indices.

In SRU/CQL, indices are defined in context sets. We plan to introduce [[FCS-spec#ContextSets|the following context sets]]:

 ccs ::
   Content indices like:
   {{{
   kwic, pos, lemma
   }}}
   '''''TODO:''' we need some distinction between a '''lemma''' as one annotation layer in the full-text content and as the head of a lexicon entry.''

 isocat ::
   Supporting index search on `isocat` data categories. (Mapping internal indices to `isocat`.)

 cmd ::
   Content search that also supports MD-filters. ("intensional filter" - see [[CDMDC]])

We should agree on a basic set of MD-indices, at least for `output`, that "should" be implemented to make life easier for software and users. Hot candidates are:
 * title/name (default string representation of the resource)
 * language
 * resource-type
 * creation/publication date
 * or simply take inspiration from the VLO facets

Additionally, especially regarding metadata indices (the `cmd` context set), we have to agree on how to use existing context sets like ''dublincore''.

== Search Result ==

 provide result in ccs:Resource-format ::
   [source:FederatedSearch/ccsResource.xsd]

 provide resource reference ::
   `Resource@pid`

 provide !DataView@type=X ::
   In which formats can you provide the results?

 provide !DataView@type=kwic ::
   Can you provide a basic keyword-in-context view?
 provide !DataView@type=metadata ::
   Do you provide information about the Resource (and/or !ResourceFragment)? What kind of information (which metadata fields - described by the @schema parameter?)

 provide link to CMD-metadata ::
   Fill ccs:DataView@pid/@ref with a reference to the CMD-record.

 provide CMD-metadata ::
   Alternatively, resolve and embed the CMD-record.
   {{{
   }}}

== Supporting operations ==

 scan indices ::
   Implement the `scan` operation, which allows querying the terms available in individual indices.
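A `scan` request for the terms of one index could be assembled as in the following Python sketch. The parameter names `operation`, `version`, `scanClause` and `maximumTerms` are those of SRU 1.2. The base URL, the helper name `build_scan_url` and the qualified index name `ccs.lemma` are assumptions made purely for illustration, and the start term is assumed to be a single bare word that needs no quoting.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_scan_url(base_url: str, index: str, start_term: str, maximum_terms: int = 10) -> str:
    """Build an SRU scan request URL (hypothetical helper)."""
    params = {
        "operation": "scan",
        "version": "1.2",
        # The scan clause names the index and the term at which the term list starts.
        "scanClause": f"{index} = {start_term}",
        "maximumTerms": maximum_terms,
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and index name, for illustration only.
url = build_scan_url("https://example.org/fcs", "ccs.lemma", "fi")
print(url)
# Decode the query string again to check what the endpoint would receive.
print(parse_qs(urlsplit(url).query)["scanClause"])
```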