[[PageOutline(2-5)]]

= Virtual Language Observatory (VLO) =

== Versions and deployment plan ==

||=Version=||=[http://catalog-clarin.esc.rzg.mpg.de/vlo in testing] (@RZG)=||=[http://catalog.clarin.eu/vlobeta/ in beta]=||=[http://catalog.clarin.eu/vlo/ in production]=||=Milestone=||=Changes=||
|| 3.2 (development) || || || || [milestone:VLO-3.2 milestone] || [source:/vlo/trunk/CHANGES.txt CHANGES] ||
|| 3.1 (staging) || 2015-03-11 || 2015-03-11 || || [milestone:VLO-3.1 milestone] || [source:/vlo/branches/vlo-3.1/CHANGES.txt CHANGES] ||
|| 3.0.1 (current) || 2014-06-16 || 2014-06-17 || 2014-06-17 || [milestone:VLO-3.0.1 milestone] || [source:/vlo/tags/vlo-3.0.1/CHANGES CHANGES] ||
|| [./3.0] || 2014-05-15 || 2014-05-13 || 2014-05-20 || [milestone:VLO-3.0 milestone] || [source:/vlo/tags/vlo-3.0/CHANGES CHANGES] ||
|| 2.18 || || 2014-01-17 || 2014-02-06 || [milestone:VLO-2.18 milestone] || [source:/vlo/tags/vlo-2.18/CHANGES CHANGES] ||
|| [./2.17] || || 2013-11-12 || 2013-11-15 || [milestone:VLO-2.17 milestone] || [./2.17] ||
|| [./2.16] || || 2013-10-24 || 2013-11-04 || [milestone:VLO-2.16 milestone] || [source:/vlo/tags/2.16.beta/CHANGES CHANGES] ||
|| [./2.15] || || rolled back to 2.14 (memory problems) || rolled back to 2.14 (memory problems) || || [source:/vlo/tags/2.15/CHANGES CHANGES] ||
|| 2.14 || || May 2013 || June 2014 || || [source:/vlo/tags/2.14.beta.fixed.again/CHANGES CHANGES] ||

== Technical notes ==

=== Deployment/configuration ===

Upgrade instructions are included in the [source:/vlo/trunk/UPGRADE UPGRADE] file.

VLO is deployed to ''catalog.clarin.eu'' for beta and production.
There are three Tomcat instances running:
 * '''tomcat-vlo1''': production web-app, publicly accessible through [http://catalog.clarin.eu/vlo/]
 * '''tomcat-vlo1-solr''': production Solr instance, '''locally''' accessible through [http://catalog.clarin.eu:8077/solr/vlo]
 * '''tomcat-vlo2''': beta web-app, publicly accessible through [http://catalog.clarin.eu/vlobeta/]

The application bundles, each of which includes a Solr instance app, the importer, the web app and configuration, are deployed to directories under /lat/webapps/vlo, which also contains symlinks to the current production and current beta directories. The Tomcat configuration refers to these symlinks, so it does not need to change when the VLO version is updated.

Most of the '''configuration''' takes place in the !VloConfig.xml file. This file is read by both the importer and the web app and has options for both, so a single file can be shared between them. In addition, the web app has a set of context parameters to configure the location of the !VloConfig.xml that should be used, and to ''optionally'' override the Solr base URL. See the package's deployment instructions ([source:/vlo/trunk/DEPLOY-README DEPLOY-README]) for details.

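
The symlink-based layout described above can be illustrated with a small Java sketch. It is not part of the VLO code base or its deployment scripts: the class name `SymlinkSwitchSketch`, the method `pointLinkAt` and the directory names are assumptions chosen for the example, and a temporary directory stands in for /lat/webapps/vlo. The sketch repoints a "current" link from one versioned directory to another, which is the step that lets the Tomcat configuration stay unchanged.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SymlinkSwitchSketch {

    /**
     * Makes 'link' point at 'versionDir'. A temporary link is created first
     * and then renamed over the old one, so that readers of 'link' never
     * find it missing (assumes a POSIX file system).
     */
    static void pointLinkAt(Path link, Path versionDir) throws IOException {
        Path temp = link.resolveSibling(link.getFileName() + ".tmp");
        Files.deleteIfExists(temp);
        Files.createSymbolicLink(temp, versionDir);
        Files.move(temp, link, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical layout in a temporary directory
        Path base = Files.createTempDirectory("vlo-sketch");
        Path oldVersion = Files.createDirectory(base.resolve("vlo-3.0.1"));
        Path newVersion = Files.createDirectory(base.resolve("vlo-3.1"));
        Path current = base.resolve("current");

        pointLinkAt(current, oldVersion);
        System.out.println(current + " -> " + Files.readSymbolicLink(current));

        // Upgrading only repoints the link; anything configured with the
        // path of 'current' keeps working without modification.
        pointLinkAt(current, newVersion);
        System.out.println(current + " -> " + Files.readSymbolicLink(current));
    }
}
```

Rolling back to a previous version is then the same operation with the older directory as the target.
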
==== Deployment environments ====

||= Environment =||= Location =||= Configuration =||
||= Test =|| http://catalog-clarin.esc.rzg.mpg.de/vlo/ (see [#Testing Testing]) || ||
||= Beta =|| http://catalog.clarin.eu/vlobeta/ || deleteAllFirst: true ||
||= Production =|| http://catalog.clarin.eu/vlo/ || deleteAllFirst: false\\maxDaysInSolr: 7 ||

=== Data sources and mapping ===

Data sources: see CmdiDataSources

Mapping CMDI > VLO fields and facets:
 * See [http://lux13.mpi.nl/isocat/clarin/vlo/mapping/] which is based on [https://trac.clarin.eu/browser/vlo/trunk/vlo_importer/src/main/resources/facetConcepts.xml]
 * [[./ManualMapping|White/Blacklist]] as suggested by CLARIN-D centres
 * Mapping from !MdCollectionDisplayName to !NationalProject: [https://trac.clarin.eu/browser/vlo/trunk/vlo-commons/src/main/resources/nationalProjectsMapping.xml nationalProjectsMapping.xml]

=== Development ===

==== Architecture ====

 * '''Solr''' server holds the document index; the SolrJ client is used in the importer and web app
 * '''Importer''' processes (parsing + post-processing) CMDI documents retrieved by the [[OAI-PMH|OAI harvester]] and creates the Solr indexes (facets + full text search)
 * Wicket + Spring for the '''front-end'''
   * Pages and custom components and models for Wicket
     * Reading the [http://wicket.apache.org/guide/ guide], especially the [http://wicket.apache.org/guide/guide/modelsforms.html#modelsforms_1 section on models], is highly recommended before diving in!
   * Spring wiring through eu.clarin.cmdi.vlo.config.!VloSpringConfig
     * Bootstrapped through filter in web.xml and applicationContext.xml in WEB-INF
     * Beans are defined here and can be injected into Wicket components and other Wicket beans
     * For a tutorial/introduction, see http://hwellmann.blogspot.nl/2011/02/building-web-applicaton-with-wicket.html
     * Unit tests may inject a custom (overridden) Spring configuration

Maven project structure (all with '''eu.clarin.cmdi''' groupId):
 * '''vlo''' reactor project with some shared properties (library versions) and dependencies (logging, unit testing); it has the following child modules:
   * '''vlo-commons''' has some common classes for configuration (!VloConfig and factories), constants and utility methods;
   * '''vlo-solr''' WAR project for the Solr instance with configuration setup for VLO;
   * '''vlo-importer''' builds as a runnable importer that reads CMDI instances from configured locations and creates an index in a deployed Solr instance;
   * '''vlo-web-app''' WAR project for the VLO front-end that uses the facets and fields in the populated Solr index;
   * '''vlo-distribution''' an assembler project that takes the output of vlo-solr, vlo-importer and vlo-web-app and combines these into a single distribution package that can be used for deployment on an arbitrary server.

==== Configuration ====

!VloConfig (in vlo-commons) is a POJO (only getters and setters, no reading or writing logic). There are factories for different ways of creating the configuration. For the web app, the !ServletVloConfigFactory is used, which reads the configuration from file but also processes context parameters, which can override the parameters in the configuration file.

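
The split between a passive configuration object and a factory that applies overrides can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual VLO code: the class names `VloConfigSketch` and `SketchConfig`, the `solrUrl` property and the `solrUrlOverride` context parameter are invented for the example, and plain maps stand in for the parsed configuration file and the servlet context parameters. Only `deleteAllFirst` and `maxDaysInSolr` are option names taken from this page.

```java
import java.util.HashMap;
import java.util.Map;

public class VloConfigSketch {

    /** Passive configuration holder: getters and setters only, no I/O. */
    static class SketchConfig {
        private String solrUrl;          // hypothetical property name
        private boolean deleteAllFirst;
        private int maxDaysInSolr;

        public String getSolrUrl() { return solrUrl; }
        public void setSolrUrl(String solrUrl) { this.solrUrl = solrUrl; }
        public boolean isDeleteAllFirst() { return deleteAllFirst; }
        public void setDeleteAllFirst(boolean value) { this.deleteAllFirst = value; }
        public int getMaxDaysInSolr() { return maxDaysInSolr; }
        public void setMaxDaysInSolr(int days) { this.maxDaysInSolr = days; }
    }

    /**
     * Hypothetical factory method: fills the configuration from values read
     * from the file, then lets a context parameter override the Solr URL.
     */
    static SketchConfig create(Map<String, String> fileValues,
                               Map<String, String> contextParams) {
        SketchConfig config = new SketchConfig();
        config.setSolrUrl(fileValues.get("solrUrl"));
        config.setDeleteAllFirst(
                Boolean.parseBoolean(fileValues.getOrDefault("deleteAllFirst", "false")));
        config.setMaxDaysInSolr(
                Integer.parseInt(fileValues.getOrDefault("maxDaysInSolr", "0")));

        // The override is optional: the file value is kept when it is absent
        String override = contextParams.get("solrUrlOverride");
        if (override != null && !override.isEmpty()) {
            config.setSolrUrl(override);
        }
        return config;
    }

    public static void main(String[] args) {
        Map<String, String> fileValues = new HashMap<>();
        fileValues.put("solrUrl", "http://localhost:8080/solr/from-file");
        fileValues.put("deleteAllFirst", "false");
        fileValues.put("maxDaysInSolr", "7");

        Map<String, String> contextParams = new HashMap<>();
        contextParams.put("solrUrlOverride", "http://localhost:8080/solr/override");

        SketchConfig config = create(fileValues, contextParams);
        System.out.println("Solr URL in use: " + config.getSolrUrl());
    }
}
```

Keeping the reading and override logic in the factory means the importer and the web app can share the same configuration class while obtaining it in different ways.
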
Some options configurable through !VloConfig:
 * locations of metadata to be imported
 * various Solr parameters
 * parameters that control the size of the Wicket cache
 * the fields that need to be hidden as technical fields or ignored altogether
 * the facets that need to be shown in the search interface
 * the name of the collection facet (or leave out to remove the separate collection facet)
 * base URLs for federated content search
 * language code URLs
 * ...

==== Localisation ====

Using Wicket i18n support, see [http://wicket.apache.org/guide/guide/i18n.html Wicket Guide].
 * Page or component specific resource strings in .properties files in eu.clarin.cmdi.vlo.wicket.pages and eu.clarin.cmdi.vlo.wicket.panels.*
 * Field names in /src/main/resources/fieldNames.properties
 * Resource types in /src/main/resources/resourceTypes.properties

==== Release and distribution ====

The project ''vlo-distribution'' is set up to produce, on build, a distribution package for easy deployment. It depends on the other projects in the VLO reactor (parent) project. Instructions on how to prepare for release and distribution are available in the README file (e.g. [source:/vlo/trunk/README README] in trunk).

==== Testing ====

===== Test VM =====

Test versions of the VLO can be deployed to the following server: '''catalog-clarin.esc.rzg.mpg.de'''. It is publicly accessible. To access it over SSH, get an account for ''con01.rzg.mpg.de'' (contact [mailto:florian.kaiser@rzg.mpg.de Florian Kaiser]) or ask [[twagoo|Twan]] or [[dietuyt|Dieter]] to log in. From this host, an SSH connection to ''!root@catalog-clarin.esc.rzg.mpg.de'' can be made on the basis of a key pair, without the need for a password. The host has Apache and Tomcat configured, with reverse proxies in the former to access the latter from outside.

===== End user testing =====

A test plan for end-user testing of the VLO front-end has been implemented at MPI. Contact corpus management (corpman@mpi.nl) for more information or testing requests.

== Design ==

=== Page / components hierarchy ===

The following diagram presents the composition of the pages of the VLO, showing the pages and their ''most important'' components (hence it is not a complete picture).
 * Where not shown explicitly, the cardinality of the composition relations is 1:1
 * The red lines represent possible navigation routes. The !SimpleSearchPage is the default entry page

[[Image(CmdiVirtualLanguageObservatory/3.0:VLO-components.png, 1024px)]]

=== Models lifecycle ===

==== Selection models ====

Flow of the query and facet selection and derivatives:

[[Image(CmdiVirtualLanguageObservatory/3.0:VLO-models-selection.png, 1024px)]]

==== Document and field models ====

Flow of the Solr document model, Solr field model, and derivatives:

[[Image(CmdiVirtualLanguageObservatory/3.0:VLO-models-document.png, 1024px)]]

== See also ==

 * [[VLO-Taskforce]] (CLARIN-D)
 * [[CmdiDataSources|Data sources]]

== Tickets ==

 * [report:10 Report of all open VLO tickets]

== Misc (old) ==

=== Background ===

 * In general: see [http://hdl.handle.net/11858/00-001M-0000-000F-85F4-5 this paper] for an overview of how the VLO works

=== Language codes ===

 * old SIL codes and their ISO-639-3 equivalent: http://www.ethnologue.com/14/show_language.asp?code=ach
 * SIL codes list: http://tal.univ-paris3.fr/mkAlign/mka-online/LanguageCodes.html

=== Demo scenario ===

Google Earth > Language Information > Hoava > Hoava data > lexicon > nice scanned word lists

Multimodal Corpus: VLO: search 'SVC', select 'SVC' from the result list and look at Corpus information and Metadata, maybe click on the first link to the landing page in the BAS repo, go back to the result list (Back button), select 'i003' from the result list, scroll down in the link list to the second movie link (type mpg), click on it and authenticate via AAI; the movie plays.

Corpus of printed historical texts (Deutsches Textarchiv): Country: Germany > Collection: Deutsches Textarchiv > Genre: Gebrauchsliteratur (= functional literature) > name: Davidis, Henriette: Praktisches Kochbuch für die gewöhnliche und feinere Küche. 4. Aufl. Bielefeld, 1849. > Follow link to Landing Page

=== Inspiration sources ===

OLAC facet browser: http://dla.library.upenn.edu/dla/olac/index.html