Opened 10 years ago

Closed 10 years ago

#509 closed task (fixed)

Store Solr indexes in central location

Reported by: Twan Goosen
Owned by:
Priority: minor
Milestone: VLO-3.0
Component: VLO web app
Version:
Keywords:
Cc: teckart@informatik.uni-leipzig.de, latadmin@mpi.nl, KeesJan.vandeLooij@mpi.nl

Description

Adjust the deployment assembly setup in such a way that a central location for the active Solr index can be configured to prevent old versions including old indexes from clogging up the production server.

Change History (8)

comment:1 Changed 10 years ago by DefaultCC Plugin

Cc: teckart@informatik.uni-leipzig.de added

comment:2 Changed 10 years ago by Twan Goosen

Milestone: VLO-3.0

comment:3 Changed 10 years ago by latadmin@mpi.nl

Cc: latadmin@mpi.nl added
Type: enhancement → task

comment:4 Changed 10 years ago by Twan Goosen

Cc: KeesJan.vandeLooij@mpi.nl added

The location of the Solr indices is already configured centrally. For production, the exact location is set in the file

/lat/tomcat-vlo1-solr/conf/Catalina/localhost/solr#vlo.xml

which currently has

<Environment name="solr/home" type="java.lang.String" value="/lat/webapps/vlo/current/config/solr" override="true"/>

The problem, if anything, is that the configured path goes through a symlink to the Solr configuration of the current version of the VLO package. The index therefore lives inside each version's directory, so a new version does not overwrite the index of the old one.
As far as I can see there are two things that can be done to prevent old indices from sticking around:

  • always clean up old indices after an update (but this would have to be done manually and cannot be enforced)
  • put the index in a fixed location, but keep taking the configuration from the VLO distribution

Proposal for an implementation of the latter:

  • create a base directory that serves as the SOLR home, e.g. /lat/webapps/vlo/solr (there can of course be a parallel /lat/webapps/vlo/solr-beta)
  • in this directory, make a symlink conf -> /lat/webapps/vlo/current/config/solr/conf
  • a data directory will be generated by Solr on indexing
  • note that for VLO 3.0/Solr 4.6 the directory structure should look like this:
    • /lat/webapps/vlo/solr [assumed Solr root]
      • solr.xml
      • collection1
        • conf [symlink -> /lat/webapps/vlo/current/config/solr/collection1/conf]
        • data [generated by Solr]

In any case, this is purely an application/server management issue and does not require any changes on the development/packaging side of the VLO.

With a fixed index location there is probably a risk of the index not being in sync with the configuration. With the current setup this cannot happen, because an upgrade implies reindexing from scratch. I'm not sure how much of a problem this is; maybe Thomas can comment on that.

LatAdmins, please tell us what you think of this! I think Willem's and Kees Jan's experience with maintaining the VLO is very relevant here.

comment:5 Changed 10 years ago by latadmin@mpi.nl

As a third, keep-it-simple, suggestion for preventing "index pile up":

  • make directories /lat/webapps/vlo/stable-index and /lat/webapps/vlo/beta-index ONCE
  • edit stable and beta solr#vlo.xml ONCE to let solr/home point to the correct index directories

Then the index is no longer symlinked and no longer "part of the software", not bound to specific versions of VLO and not collecting dust in multiple, never cleaned up, copies from old VLO versions :-)
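As an illustration, the stable solr#vlo.xml might then carry an entry along these lines. This is only a sketch: the attributes are copied from the existing entry quoted in comment:4, and the value is the stable-index directory suggested above. The beta file would point to /lat/webapps/vlo/beta-index instead.

```xml
<!-- Sketch: solr/home points at a fixed, version-independent directory -->
<Environment name="solr/home" type="java.lang.String" value="/lat/webapps/vlo/stable-index" override="true"/>
```

Since Solr also looks for its configuration under solr/home, that directory would still need the configuration files, or a symlink to them as described in comment:4.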

Of course you are welcome to flush indexes when upgrading VLO, if desired. I expect update packages to contain some sort of upgrade instructions text document (or script) anyway, so they can mention for which version step it is recommended to flush the index. The software could also do this automatically in the long run, triggered by detection of incompatible index content.

comment:6 Changed 10 years ago by Twan Goosen

The cleanest solution would be to adjust the dataDir parameter in solrconfig.xml. However, this setting would be server-specific, while a new solrconfig.xml is packaged with each release of the VLO, so the edit would have to be repeated on every deployment. If the LatAdmins do not see an issue in executing this step on each deployment (of course it will be documented properly), this would be my proposal. It would be the only change with respect to the current situation.
The admins can decide on the exact location of the Solr data directories (one for each instance I assume, possibly a third one for seamless re-indexing).
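A minimal sketch of what that per-deployment edit might look like. The path is hypothetical; the admins would choose the real one for each instance.

```xml
<!-- solrconfig.xml: hard-coded, server-specific index location (hypothetical path) -->
<dataDir>/lat/webapps/vlo/solr-data/stable</dataDir>
```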

comment:7 Changed 10 years ago by Twan Goosen

Addition: the use of environment variables and/or Java system properties in solrconfig.xml seems to offer some interesting possibilities, see stackoverflow

comment:8 Changed 10 years ago by Twan Goosen

Resolution: fixed
Status: new → closed

Made the Solr data directory (which by default is a child of the Solr configuration directory, next to 'conf') configurable through a Java system property in [5062].

For a description, see the changes in DEPLOY-README and UPGRADE.
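For readers unfamiliar with the mechanism: solrconfig.xml supports property substitution of the form ${name:default}, so a setup of this kind can be sketched as below. The property name solr.data.dir is the one conventionally used in Solr's example configuration and is an assumption here; the name actually introduced in [5062] is documented in DEPLOY-README.

```xml
<!-- solrconfig.xml: take the index location from a Java system property.
     The property name is an assumption (Solr's conventional one).
     With an empty default (nothing after the colon) Solr falls back to
     its default "data" directory next to 'conf'. -->
<dataDir>${solr.data.dir:}</dataDir>
```

The value could then be supplied to the Tomcat JVM with an option such as -Dsolr.data.dir=/path/to/index, where the path is purely illustrative.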
