Opened 9 years ago

Closed 9 years ago

#774 closed defect (fixed)

Index more fields for basic Text Search

Reported by: stranak@ufal.mff.cuni.cz
Owned by: Twan Goosen
Priority: major
Milestone: VLO-3.3
Component: VLO web app
Version:
Keywords:
Cc: Sander Maijers

Description

Searching for "Czech corpus" returns 4 records, all of which have these words in Description. However, browsing via facets for "Language:Czech AND ResourceType:Corpus" returns 72 results.

So, for the text search to be really usable, it must index more fields.

Change History (6)

comment:1 Changed 9 years ago by DefaultCC Plugin

Cc: Sander Maijers added

comment:2 Changed 9 years ago by stranak@ufal.mff.cuni.cz

Component: AAI → VLO web app

comment:3 Changed 9 years ago by Jörg Knappen

... and searching for "czech AND corpus" in the VLO search field gives 20 hits (with the current production VLO) ...

comment:4 Changed 9 years ago by Twan Goosen

Owner: set to Twan Goosen
Status: new → assigned

comment:5 Changed 9 years ago by go.sugimoto@oeaw.ac.at

Dear developers,

Just a clarification.
Can somebody explain what fields are indexed at the moment?
Also, how they are rated for the search results?
I think we cannot/should not resolve this issue without the information.
Thank you very much in advance.

comment:6 Changed 9 years ago by Twan Goosen

Milestone: VLO-3.3
Resolution: fixed
Status: assigned → closed

The current stable version of the VLO (3.2.2) searches for the literal query phrase. So when searching for "Czech corpus" you will only find records that have these two words next to each other, in that order, in the full text field, i.e. anywhere in the record.

The upcoming 3.3 version, currently in beta, uses the ExtendedDisMax parser, which behaves quite differently and defaults to an "AND" interpretation of the entered search terms. See http://beta-vlo.clarin.eu/search?q=Czech+corpus for the result for this particular example (currently 264 hits).
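The difference between the two interpretations can be illustrated with a minimal Python sketch. This is not VLO or Solr code: the helper names (tokens, matches_phrase, matches_all_terms) and the sample record are hypothetical, and the simple regex tokenizer only stands in for Solr's real analysis chain.

```python
import re


def tokens(text):
    """Lowercase word tokens; a crude stand-in for Solr's analyzers (assumption)."""
    return re.findall(r"\w+", text.lower())


def matches_phrase(query, text):
    """Phrase interpretation: query words must be adjacent and in order."""
    q, t = tokens(query), tokens(text)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))


def matches_all_terms(query, text):
    """'AND' interpretation: every query word must occur somewhere in the text."""
    return set(tokens(query)) <= set(tokens(text))


# Hypothetical record text, for illustration only.
record = "A corpus of spoken Czech"
print(matches_phrase("Czech corpus", record))     # phrase search misses it
print(matches_all_terms("Czech corpus", record))  # AND search finds it
```

A record like the one above contains both words but not as a phrase, which is the kind of record the literal phrase search skips and the "AND" interpretation picks up.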

Replying to go.sugimoto@…:

Can somebody explain what fields are indexed at the moment?
Also, how they are rated for the search results?

This is defined in this section of the Solr configuration, in particular the lines

<str name="qf">
  text name^8 description^4 keywords^2 language^2 country^2 organisation^2 subject^2 collection^1 modality^1 genre^1 continent^.5 id^.1
</str>

This is the complete list of fields included in the search, with numbers representing the weight of each of these fields, which determines the order in which the results are provided (e.g. a hit in name or description will significantly 'bump' up a particular record).
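To make the weights easier to read, here is a small Python sketch that parses the qf value quoted above into a field-to-boost mapping and builds request parameters for an eDisMax query. The function names parse_qf and edismax_params are hypothetical; defType and qf are standard Solr request parameters, but whether VLO passes them per request or only via the configuration is an assumption here. A field listed without a boost (text) gets Solr's default boost of 1.

```python
from urllib.parse import urlencode

# The qf value as quoted from the Solr configuration in this ticket.
QF = ("text name^8 description^4 keywords^2 language^2 country^2 "
      "organisation^2 subject^2 collection^1 modality^1 genre^1 "
      "continent^.5 id^.1")


def parse_qf(qf):
    """Map each field in a qf string to its boost (default 1.0 if omitted)."""
    weights = {}
    for item in qf.split():
        field, _, boost = item.partition("^")
        weights[field] = float(boost) if boost else 1.0
    return weights


def edismax_params(user_query, qf=QF):
    """Build a URL query string for an eDisMax search (illustrative only)."""
    return urlencode({"defType": "edismax", "q": user_query, "qf": qf})


# List the fields from highest to lowest boost.
for field, boost in sorted(parse_qf(QF).items(), key=lambda kv: -kv[1]):
    print(f"{field:13s} {boost}")
```

Note that a boost scales a field's contribution to the relevance score; it does not by itself fix the final order, since term frequency and other scoring factors also play a role.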

Given the changes in VLO 3.3, I think this ticket can be closed as resolved. I'm doing this now, but feel free to reopen the ticket if you think it is still relevant.
