FCS - Feature Matrix
This is meant to be a comprehensive list of the individual features that every FCS-compliant search engine must or can implement. However, it is currently out of sync with the current specification, which is discussed in detail in FCS-specification.
SRU/CQL proposes conformance levels, but these are too coarse-grained and fuzzy. We try to be clearer here and go down to the level of individual features.
Obviously the list is not complete. TODO: complete
- explain response
- Provide information about the configuration of the service, mainly about available indices and defaults.
We have to examine whether the explain record can be the single authoritative source of information about a repository, carrying all the configuration information discussed below.
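As a rough illustration of how a client might fetch such an explain record, the sketch below builds an SRU explain request. The endpoint URL is hypothetical, and SRU version 1.2 is assumed; `operation=explain` and `version` are standard SRU request parameters.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
ENDPOINT = "https://example.org/fcs"


def explain_url(endpoint: str = ENDPOINT, version: str = "1.2") -> str:
    """Build the URL of an SRU explain request for the given endpoint."""
    params = {"operation": "explain", "version": version}
    return f"{endpoint}?{urlencode(params)}"


print(explain_url())
```

Whether the returned record can carry all the configuration information is exactly the open question raised above; the sketch only shows how the record would be requested.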
Query
- simple term search
-
accept a simple term query
query=word
TODO: decide substring or full-word
TODO: decide, when searching in lexica, whether to search only in the lemma list or in the explanations as well (full-text)
- wildcards
-
support wildcards like
wor* *ove *oo* ?ool b?t
There could also be a version encoded with the help of a relation, like:
index startsWith wor
- index search
-
support searchClause-queries:
index relation searchTerm
See more about indices in next chapter.
- supported relations single term
-
what are the allowed relations in a query of the form
index rel single-search-term
:
- = - exact match (on token? on annotation?)
- regex ? - regular expression
- supported relations multi-term
-
for a search clause with multiple terms (like "rat hat cat") SRU proposes:
- any - any of the terms (OR)
- all - all of the terms (AND)
- adj - terms in that order - a phrase
- all/window/#N - terms within a given window (#N) (SRU/CQL 2.0 proposal)
- honor search context
-
Allow restricting the search to specific collections/resources specified as a parameter (see SearchContext)
?x-cmd-context={list[Resource-PID]}
- honor VC as search context
- able to process a virtual collection as a means of restricting the search context.
- boolean search
-
AND OR PROX
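The wildcard and multi-term features above can be sketched as a small matcher. This is only an illustration under stated assumptions: `*` is taken to mean any run of characters and `?` exactly one character, terms are matched against whole tokens (the substring vs. full-word question is still an open TODO above), and the function names are hypothetical. The `all/window/#N` relation and the boolean operators are not covered.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate '*' (any run of characters) and '?' (exactly one character)
    into a regular expression; all other characters are matched literally."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def term_matches(pattern: str, token: str) -> bool:
    """Assumption: a term has to match the whole token, not a substring."""
    return wildcard_to_regex(pattern).fullmatch(token) is not None


def clause_matches(relation: str, terms: list[str], tokens: list[str]) -> bool:
    """Evaluate a multi-term search clause against a tokenized text."""
    if relation == "any":  # any of the terms (OR)
        return any(term_matches(t, tok) for t in terms for tok in tokens)
    if relation == "all":  # all of the terms (AND)
        return all(any(term_matches(t, tok) for tok in tokens) for t in terms)
    if relation == "adj":  # terms in that order, directly adjacent (a phrase)
        n = len(terms)
        return any(
            all(term_matches(t, tok) for t, tok in zip(terms, tokens[i:i + n]))
            for i in range(len(tokens) - n + 1)
        )
    raise ValueError(f"unsupported relation: {relation}")
```

For example, `clause_matches("adj", ["r?t", "sat"], "the rat sat".split())` treats the two terms as a phrase and applies the wildcard inside the first term.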
Indices
The following operations on indices exist:
- search
- scan (/aggregate)
- output (/group, /sort)
The description of the repository needs to state explicitly which operations are supported on which indices.
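One possible way to record which operations are supported on which indices is a simple capability table. The sketch below is hypothetical: the table contents and the names `Op`, `SUPPORTED`, `supports` and `indices_for` are assumptions for illustration, not part of any specification.

```python
from enum import Enum


class Op(Enum):
    SEARCH = "search"
    SCAN = "scan"
    OUTPUT = "output"


# Hypothetical capability table of a repository: index name -> supported operations.
SUPPORTED = {
    "lemma": {Op.SEARCH, Op.SCAN},
    "pos": {Op.SEARCH, Op.SCAN, Op.OUTPUT},
    "title": {Op.OUTPUT},
}


def supports(index: str, op: Op) -> bool:
    """Return True if the repository declares the operation for the index."""
    return op in SUPPORTED.get(index, set())


def indices_for(op: Op) -> list[str]:
    """List all indices, sorted by name, that support the given operation."""
    return sorted(name for name, ops in SUPPORTED.items() if op in ops)
```

A client could consult such a table before offering, say, a scan on an index that only supports output.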
In SRU/CQL, indices are defined in context sets. We plan to introduce the following context sets:
- ccs
-
content indices like:
kwic, pos, lemma
TODO: we need some distinction between a lemma as one annotation layer in the full-text content and as head of a lexicon entry.
- isocat
-
supporting index-search on isocat data categories. (Mapping internal indices to isocat.)
- cmd
- content search also supporting MD-filters. ("intensional filter" - see CDMDC)
We should agree on some basic set of MD-indices, at least for output, that "should" be implemented to make life easier for software and users. Hot candidates are:
- title/name (default string representation of the resource)
- language
- resource-type
- creation/publication date
- or simply take inspiration from the VLO facets
Additionally, especially regarding metadata indices (the cmd context set), we have to agree on how to use existing context sets like dublincore.
Search Result
- provide result in ccs:Resource-format
- FederatedSearch/ccsResource.xsd
- provide resource reference
-
Resource@pid
- provide DataView@type=X
- In which formats can you provide the results?
- provide DataView@type=kwic
- Can you provide basic keyword in context view?
- provide DataView@type=metadata
- Do you provide information about the Resource (and/or ResourceFragment)? What kind of information (which metadata fields; described by the @schema parameter?)
- provide link to CMD-metadata
- Fill ccs:DataView@pid/@ref with reference to CMD-record.
- provide CMD-metadata
-
alternatively resolve and embed the CMD-record
<ccs:DataView type="metadata" schema="{cmd-profile specific schema}"><cmd:CMD>
Supporting Operations
- scan indices
-
implement the scan operation, which allows querying the terms available in individual indices
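To make the request side concrete, the following sketch builds a scan request and, for comparison, a searchRetrieve request restricted by the x-cmd-context parameter mentioned above. The endpoint, the example scan clause and the resource PIDs are hypothetical; SRU version 1.2 is assumed, and joining the PID list with commas is an assumption as well.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint, used only for illustration.
ENDPOINT = "https://example.org/fcs"


def scan_url(scan_clause: str, endpoint: str = ENDPOINT, version: str = "1.2") -> str:
    """Build an SRU scan request for the given scan clause."""
    params = {"operation": "scan", "version": version, "scanClause": scan_clause}
    return f"{endpoint}?{urlencode(params)}"


def search_url(query: str, context_pids=(), endpoint: str = ENDPOINT,
               version: str = "1.2") -> str:
    """Build an SRU searchRetrieve request, optionally restricted to resources."""
    params = {"operation": "searchRetrieve", "version": version, "query": query}
    if context_pids:
        # Assumption: the list of resource PIDs is comma-separated.
        params["x-cmd-context"] = ",".join(context_pids)
    return f"{endpoint}?{urlencode(params)}"


# Round-trip check: the scan clause survives URL encoding unchanged.
url = scan_url("lemma = wor")
print(parse_qs(urlsplit(url).query)["scanClause"])
```

Using `urlencode` keeps spaces and the `=` sign inside the scan clause from breaking the query string.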