Virtual Language Observatory (VLO)
Versions and deployment plan
Version | in testing (@RZG) | in beta | in production | Milestone | Changes |
---|---|---|---|---|---|
4.0 (development) | | | | milestone | CHANGES |
3.4.1 (current) | | | | | CHANGES |
3.4.0 | | | | milestone | CHANGES |
3.3 | 2015-06-.. | 2015-08-21 (3.3) 2015-10-29 (3.3.2) | 2015-09-30 | milestone | CHANGES |
3.2 | 2015-05-27 | 2015-06-01 | 2015-07-08 | milestone | CHANGES |
3.1 | 2015-03-11 | 2015-03-11 | 2015-03-24 | milestone | CHANGES |
3.0.1 | 2014-06-16 | 2014-06-17 | 2014-06-17 | milestone | CHANGES |
3.0 | 2014-05-15 | 2014-05-13 | 2014-05-20 | milestone | CHANGES |
2.18 | 2014-01-17 | 2014-02-06 | milestone | CHANGES | |
2.17 | 2013-11-12 | 2013-11-15 | milestone | 2.17 | |
2.16 | 2013-10-24 | 2013-11-04 | milestone | CHANGES | |
2.15 | rolled back to 2.14 (memory problems) | rolled back to 2.14 (memory problems) | CHANGES | ||
2.14 | May 2013 | June 2014 | CHANGES |
Usage
- End users will browse to the web application (e.g. https://vlo.clarin.eu)
- The import can be started from the command line or scheduled via cron. For instructions, see README.txt
- If need be, the Solr index can be flushed by removing the Solr data directory's content (for the exact location, see table below)
- The Solr interface will typically only be available locally, for usage by the web app and the importer
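The index flush mentioned above amounts to emptying the data directory while keeping the directory itself. A minimal sketch, assuming Solr (or the Tomcat hosting it) is stopped first and that the calling user has write access; the function name `flush_directory` is hypothetical and not part of the VLO distribution:

```python
import shutil
from pathlib import Path


def flush_directory(data_dir: Path) -> int:
    """Delete everything inside data_dir but keep the directory itself.

    Returns the number of top-level entries that were removed.
    """
    if not data_dir.is_dir():
        raise NotADirectoryError(f"not a directory: {data_dir}")
    removed = 0
    for entry in data_dir.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # real subdirectory: remove recursively
        else:
            entry.unlink()        # regular file or symlink
        removed += 1
    return removed


# Example call (path is an assumption; use the "Solr data dir" value
# that applies to the environment at hand):
#   flush_directory(Path("/srv/webapps/vlo/solrdata"))
```

After the flush, a fresh import run is needed to repopulate the index.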
Technical notes
Deployment/configuration
Upgrade instructions are included in the UPGRADE.txt file.
VLO is deployed to vlo.clarin.eu for production and beta-vlo.clarin.eu for beta. There is a single Tomcat instance that hosts both the web application and the Solr back end.
The application bundles, which include a Solr instance app, the importer, the web app and configuration, are deployed to directories under /lat/webapps/vlo, which also has symlinks for both the current production and current beta directories. These symlinks are used in the Tomcat configuration, so it does not need to change when the VLO version is updated.
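Switching such a symlink to a newly unpacked version could be sketched as follows. This is an illustration only, not the documented upgrade procedure (see UPGRADE.txt for that); the link name `current` and the directory names in the usage comment are hypothetical.

```python
import os
from pathlib import Path


def switch_symlink(link: Path, target: Path) -> None:
    """Point `link` at `target`, replacing an existing symlink in one step."""
    tmp_link = link.with_name(link.name + ".tmp")
    # Clean up a leftover temporary link from an earlier, interrupted run.
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()
    os.symlink(target, tmp_link)
    # Renaming over the old link is atomic on POSIX file systems, so
    # readers see either the old or the new target, never a missing link.
    os.replace(tmp_link, link)


# Example call (hypothetical directory names):
#   switch_symlink(Path("/lat/webapps/vlo/current"), Path("/lat/webapps/vlo/vlo-3.4.1"))
```

Because Tomcat only references the link, it would typically just need a restart or redeploy after the switch, without configuration changes.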
Most of the configuration takes place in the VloConfig.xml file. Both the importer and the web app read this format, and it has options for both, so a single file can be shared between them. In addition, the web app has a set of context parameters that configure the location of the VloConfig.xml file to use and optionally override the Solr base URL.
See the package's deployment instructions (DEPLOY-README.txt) for details.
Deployment environments
Environment | Test | Beta | Production |
---|---|---|---|
Server | catalog-clarin.esc.rzg.mpg.de | beta-vlo-clarin.esc.rzg.mpg.de | lvps92-51-161-129.dedicated.hosteurope.de |
URL | http://alpha-vlo.clarin.eu | http://beta-vlo.clarin.eu | https://vlo.clarin.eu |
User | vlouser | vlouser | vlouser |
Locations | | | |
Dockerised | No | Yes | No |
VLO home | /lat/webapps/vlo | /opt/webapps/vlo | /srv/webapps/vlo |
Solr data dir | /data2/vlo/config/solr/collection1/data | /opt/solr-data | /srv/webapps/vlo/solrdata |
Config | | | |
deleteAllFirst | true | true | false |
maxDaysInSolr | 0 | 0 | 7 |
solrUrl | http://localhost:8080/vlo-solr/core0/ | http://172.17.42.1:8080/solr/core0/ | http://localhost:8080/solr/core0/ |
Data sources and mapping
Data sources: see CmdiDataSources
Mapping CMDI > VLO fields and facets
- See http://vlo.clarin.eu/mapping which is based on facetConcepts.xml
- White/Blacklist as suggested by CLARIN-D centres
- Value mapping for various fields on basis of uniform mapping files: https://github.com/clarin-eric/VLO/tree/master/vlo-vocabularies/maps/uniform_maps
Development
The VLO's sources are on GitHub!
Architecture
- Solr server holds the document index; Solrj client is used in the importer and web app
- Importer processes (parsing + post-processing) CMDI documents retrieved by the OAI harvester and creates the Solr indexes (facets + full text search)
- Wicket + Spring for the front-end
  - Pages and custom components and models for Wicket
    - Reading the Wicket guide, especially the section on models, is highly recommended before diving in!
  - Spring wiring through eu.clarin.cmdi.vlo.config.VloSpringConfig
    - Bootstrapped through filter in web.xml and applicationContext.xml in WEB-INF
    - Beans are defined here and can be injected into Wicket components and other Wicket beans
    - For a tutorial/introduction, see http://hwellmann.blogspot.nl/2011/02/building-web-applicaton-with-wicket.html
    - Unit tests may inject a custom (overridden) Spring configuration
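To make the role of the Solr index more concrete, the sketch below builds the kind of facet-count request that a client such as Solrj sends to Solr over HTTP. It uses only standard Solr query parameters (`q`, `rows`, `facet`, `facet.field`, `wt`) and the default `/select` handler. Whether the VLO's Solr configuration exposes exactly this handler, and the facet field names used in the test, are assumptions.

```python
from urllib.parse import urlencode


def facet_query_url(solr_url: str, facet_fields: list[str], query: str = "*:*") -> str:
    """Build a Solr select URL that returns facet counts but no documents."""
    params = [
        ("q", query),       # match-all query by default
        ("rows", "0"),      # only the facet counts are wanted
        ("facet", "true"),
        ("wt", "json"),
    ]
    # facet.field may be repeated, once per facet to be counted
    params += [("facet.field", name) for name in facet_fields]
    return solr_url.rstrip("/") + "/select?" + urlencode(params)
```

Since the Solr interface is typically only reachable locally, such a URL would normally be requested from the server itself.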
Maven project structure (all with eu.clarin.cmdi groupId):
- vlo reactor project with some shared properties (library versions) and dependencies (logging, unit testing); it has the following child modules:
  - vlo-commons has some common classes for configuration (VloConfig and factories), constants and utility methods;
  - vlo-vocabularies has vocabularies and maps for value curation/normalisation/harmonisation;
  - vlo-solr WAR project for the Solr instance with configuration setup for VLO;
  - vlo-importer builds as a runnable importer that reads CMDI instances from configured locations and creates an index into a deployed Solr instance;
  - vlo-web-app WAR project for the VLO front-end that uses the facets and fields in the populated Solr index;
  - vlo-distribution an assembler project that takes the output of vlo-solr, vlo-importer and vlo-web-app and combines these into a single distribution package that can be used for deployment on an arbitrary server;
  - vlo-sitemap a stand-alone tool for creating a sitemap with all records and static pages in the VLO.
Configuration
VloConfig (in vlo-commons) is a POJO (only getters and setters, no reading or writing logic). There are factories for different ways of creating the configuration.
For the web app, the ServletVloConfigFactory is used; it reads the configuration from file and also processes context parameters, which can override values from the configuration file.
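The division of labour described here (a plain data holder, plus factories that know how to fill it) can be illustrated with a small sketch. It is not the actual Java implementation. The option names solrUrl, deleteAllFirst and maxDaysInSolr come from the environments table, but the XML layout, the root element name and the context-parameter name are assumptions.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class VloConfigSketch:
    """Plain data holder without any reading or writing logic."""
    solr_url: str
    delete_all_first: bool
    max_days_in_solr: int


def config_from_xml(xml_text: str) -> VloConfigSketch:
    """Factory 1: build the configuration from an XML document (layout assumed)."""
    root = ET.fromstring(xml_text)
    return VloConfigSketch(
        solr_url=root.findtext("solrUrl", default="").strip(),
        delete_all_first=root.findtext("deleteAllFirst", default="false").strip() == "true",
        max_days_in_solr=int(root.findtext("maxDaysInSolr", default="0")),
    )


def config_for_web_app(xml_text: str, context_params: dict) -> VloConfigSketch:
    """Factory 2: as above, but a context parameter may override the Solr URL."""
    config = config_from_xml(xml_text)
    override = context_params.get("solrUrl")  # hypothetical parameter name
    if override:
        config.solr_url = override
    return config
```

Keeping the override logic in the factory means the importer and the web app can share one file while the web app still points at a different Solr URL when its deployment requires it.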
Some options configurable through VloConfig:
- locations of metadata to be imported
- various solr parameters
- parameters that control the size of the wicket cache
- the fields that need to be hidden as technical fields or ignored altogether
- the facets that need to be shown in the search interface
- the name of the collection facet (or leave out to remove separate collection facet)
- base URLs for federated content search
- language code URLs
- ...
Localisation
Localisation uses Wicket's i18n support; see the Wicket Guide.
- Page or component specific resource strings in .properties files in eu.clarin.cmdi.vlo.wicket.pages and eu.clarin.cmdi.vlo.wicket.panels.*
- Field names in /src/main/resources/fieldNames.properties
- Resource types in /src/main/resources/resourceTypes.properties
Release and distribution
The project vlo-distribution is set up to produce, on build, a distribution package for easy deployment. It depends on the other projects in the VLO reactor (parent) project.
Instructions on how to prepare for release and distribution are available in the README file (e.g. README in trunk).
Testing
Test VM
Test versions of the VLO can be deployed to the following server: catalog-clarin.esc.rzg.mpg.de. It is publicly accessible. To access it over SSH, get an account for con01.rzg.mpg.de (contact Florian Kaiser) or ask Twan or Dieter to log in. From this host, an SSH connection to root@catalog-clarin.esc.rzg.mpg.de can be made using a key pair, without the need for a password. The host has Apache and Tomcat configured, with reverse proxies in the former to make the latter accessible from outside.
End user testing
A test plan for end-user testing of the VLO front-end has been implemented at MPI. Contact corpus management (corpman@mpi.nl) for more information or testing requests.
Design
Page / components hierarchy
The following diagram presents the composition of the pages of the VLO, showing the pages and their most important components (hence it is not a complete picture).
- Where not shown explicitly, the cardinality of the composition relations is 1:1
- The red lines represent possible navigation routes. The SimpleSearchPage is the default entry page
Models lifecycle
Selection models
Flow of the query and facet selection and derivatives:
Document and field models
Flow of the Solr document model, Solr field model, and derivatives
See also
- VLO-Taskforce (CLARIN-D)
- Data sources
Misc (old)
Background
- In general: see this paper for an overview of how the VLO works
Language codes
- old SIL codes and their ISO-639-3 equivalent: http://www.ethnologue.com/14/show_language.asp?code=ach
- SIL codes list: http://tal.univ-paris3.fr/mkAlign/mka-online/LanguageCodes.html
Demo scenario
Google Earth > Language Information > Hoava > Hoava data > lexicon > nice scanned word lists
Multimodal Corpus: VLO: search 'SVC', select 'SVC' from result list and look at Corpus information and Metadata, maybe click on first Link to landing page in BAS repo, go back to result list (Back button), select 'i003' from result list, scroll down in link list to the second movie link (type mpg), click on it and authenticate via AAI, movie plays.
Corpus of printed historical texts (Deutsches Textarchiv): Country: Germany > Collection: Deutsches Textarchiv > Genre: Gebrauchsliteratur (= functional literature) > name: Davidis, Henriette: Praktisches Kochbuch für die gewöhnliche und feinere Küche. 4. Aufl. Bielefeld, 1849. > Follow link to Landing Page
Inspiration sources
OLAC facet browser: http://dla.library.upenn.edu/dla/olac/index.html