Opened 9 years ago
Closed 9 years ago
#774 closed defect (fixed)
Index more fields for basic Text Search
Reported by: | stranak@ufal.mff.cuni.cz | Owned by: | Twan Goosen |
---|---|---|---|
Priority: | major | Milestone: | VLO-3.3 |
Component: | VLO web app | Version: | |
Keywords: | | Cc: | Sander Maijers |
Description
Searching for "Czech corpus" returns 4 records, all of which have these words in Description. However, browsing via facets for "Language:Czech AND ResourceType:Corpus" returns 72 results.
So, for the text search to be really usable, it must index more fields.
Change History (6)
comment:1 Changed 9 years ago by
Cc: | Sander Maijers added |
---|
comment:2 Changed 9 years ago by
Component: | AAI → VLO web app |
---|
comment:3 Changed 9 years ago by
comment:4 Changed 9 years ago by
Owner: | set to Twan Goosen |
---|---|
Status: | new → assigned |
comment:5 Changed 9 years ago by
Dear developers,
Just a clarification.
Can somebody explain what fields are indexed at the moment?
Also, how they are rated for the search results?
I think we cannot/should not resolve this issue without the information.
Thank you very much in advance.
comment:6 Changed 9 years ago by
Milestone: | → VLO-3.3 |
---|---|
Resolution: | → fixed |
Status: | assigned → closed |
The current stable version of the VLO (3.2.2) searches for the literal query phrase. So when searching for "Czech corpus", you will only find records that have these two words next to each other, in that order, in the full text field, i.e. anywhere in the record.
The upcoming 3.3 version, currently in beta, uses the ExtendedDisMax parser, which behaves quite differently and defaults to an "AND" interpretation of the entered search terms. See http://beta-vlo.clarin.eu/search?q=Czech+corpus for the result for this particular example (currently 264 hits).
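To make the difference between the two behaviours concrete, here is a toy sketch in Python. It is not VLO or Solr code: the helper names (`tokenize`, `phrase_match`, `and_match`) and the sample record texts are made up for illustration, and real Solr analysis (stemming, stop words, per-field matching) is more involved.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplification)."""
    return re.findall(r"\w+", text.lower())

def phrase_match(query, text):
    """True if the query tokens occur adjacent and in order (3.2.2-style)."""
    q, t = tokenize(query), tokenize(text)
    n = len(q)
    return n > 0 and any(t[i:i + n] == q for i in range(len(t) - n + 1))

def and_match(query, text):
    """True if every query token occurs somewhere in the text (AND-style)."""
    tokens = set(tokenize(text))
    return all(term in tokens for term in tokenize(query))

# Hypothetical record texts, for illustration only.
records = [
    "A Czech corpus of newspaper texts",
    "Corpus of spoken Czech",
    "Dutch treebank",
]

phrase_hits = [r for r in records if phrase_match("Czech corpus", r)]
and_hits = [r for r in records if and_match("Czech corpus", r)]
```

In this sketch the phrase interpretation finds only the first record, while the AND interpretation also finds the second, which is the kind of gap described in the ticket.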
Replying to go.sugimoto@…:
Can somebody explain what fields are indexed at the moment?
Also, how they are rated for the search results?
This is defined in this section of the Solr configuration, in particular the lines
<str name="qf"> text name^8 description^4 keywords^2 language^2 country^2 organisation^2 subject^2 collection^1 modality^1 genre^1 continent^.5 id^.1 </str>
This is the complete list of fields included in the search, with numbers representing the weight of each of these fields, which determines the order in which the results are provided (e.g. a hit in name or description will significantly 'bump' up a particular record).
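As a rough illustration of how to read that `qf` value, the following Python sketch parses it into per-field weights and builds an equivalent set of request parameters. The function names (`parse_qf`, `build_query_string`) are hypothetical. `defType`, `q` and `qf` are standard Solr parameters, but whether the VLO passes them per request or only sets them as defaults in the configuration is an assumption here. A field without an explicit `^` boost, such as `text`, is treated as having weight 1, which is Solr's default.

```python
from urllib.parse import urlencode

# The qf value quoted in this ticket.
QF = ("text name^8 description^4 keywords^2 language^2 country^2 "
      "organisation^2 subject^2 collection^1 modality^1 genre^1 "
      "continent^.5 id^.1")

def parse_qf(qf):
    """Map each field in a qf string to its boost (1.0 if none is given)."""
    weights = {}
    for token in qf.split():
        field, _, boost = token.partition("^")
        weights[field] = float(boost) if boost else 1.0
    return weights

def build_query_string(terms, qf=QF):
    """Build URL-encoded parameters for an ExtendedDisMax query (a sketch)."""
    return urlencode({"defType": "edismax", "q": terms, "qf": qf})

weights = parse_qf(QF)
# Fields ordered from most to least influential on the ranking.
ranked_fields = sorted(weights, key=weights.get, reverse=True)
query_string = build_query_string("Czech corpus")
```

Sorting the parsed weights puts `name` and `description` first, which matches the remark that hits in those fields have the largest effect on ranking.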
Given the changes in VLO 3.3, I think this ticket can be closed as resolved. I'm doing this now, but feel free to reopen the ticket if you think it is still relevant.
... and searching for "czech AND corpus" in the VLO search field gives 20 hits (with the current production VLO) ...