
Version 35 (modified by Twan Goosen, 9 years ago)

added links to resources section

Responsible for this page: Twan Goosen.
Last content check: 2015-12-10

Purpose

The purpose of this page is to collect relevant information about the CLARIN Virtual Collection Registry (VCR).

CLARIN Virtual Collection Registry

From the requirements description:

A virtual collection (VC) is a collection that is the result of browsing or searching repositories rather than being the result of construction and first-time publication by an organisation. Another characterisation is that the resources in a VC are already available in other collections, and the VC can be considered a derived collection. Nevertheless these VCs may need to be citable for future use and should therefore be able to be registered in a register, the VCR.

The Virtual Collection Registry (VCR) is an online registry that provides a RESTful web service (REST service) as well as a web based graphical user interface for the creation, publication, management and retrieval of virtual collections. Published collections are made available as CMDI records over OAI-PMH, and get assigned a persistent identifier by means of the EPIC service.


Subpages

People


Getting code

  • You can browse the code here
  • Check out from: https://trac.clarin.eu/VirtualCollectionRegistry

Usage

The components of the VCR are available at the following locations (currently referencing the alpha deployment!):

The help page explains the core concepts and terminology relevant to the VCR (based on the information gathered on this page).


System Requirements

The following requirements apply to the development environment as well as to testing/production environments:

  • MySQL server
    • A dedicated schema is required. A connection resource with write permissions needs to be configured in the servlet container and made available to the application. The application creates and updates the database schema automatically; see the README file for more details.

Dependencies

For providing CMDI metadata, the VCR uses the VirtualCollection profile which is hosted by the CLARIN Component Registry.

The application notably makes use of the following libraries:

  • clarin.webservices.pidservices2 for reading and writing EPIC PIDs (from Leipzig University)
  • mpgaai-shhaa for Shibboleth authentication (developed at RZG for the MPG)
  • OAIProvider as a framework for the embedded OAI provider (developed by Oliver Schonefeld at IDS for CLARIN)

Building and Deploying

Development

Additional requirements for building the application from source:

  • JDK 7 or higher
  • Maven 3 or higher

The VCR sources are contained in a single Maven project. Most of the code is written in Java, complemented by HTML, (S)CSS and some images for the graphical user interface and a number of XML documents. Structurally, it consists of three largely independent functional components, each providing an interface to a shared 'VirtualCollectionRegistry' core service. The sections below describe the shared 'core' and the individual components.

VCR core

The core of the VCR is the collection store, which is implemented through JPA with annotated classes in the eu.clarin.cmdi.virtualcollectionregistry.model package. The persistence.xml file configures the Hibernate Persistence Provider to use a MySQL datasource for storage. The 'update' value of the hibernate.hbm2ddl.auto property instructs Hibernate to update the database schema at runtime if needed. A singleton object of the DataStore class provides access to the JPA Entity Manager.

The DataStore is primarily used in the service implemented by the VirtualCollectionRegistry class, which provides high level CRUD methods, using the VirtualCollection model class for the representation of individual collections. It also calls the configured PID service to register and link a persistent identifier upon publication of a collection. These methods are used by the REST service, OAI provider and Wicket application classes to access and manipulate the stored collections.

The Spring framework is used to define beans for singleton service objects, which are injected into various components throughout the application. The applicationContext.xml file bootstraps the bean definition, while most beans are discovered and constructed by means of 'component scanning'. For example, a singleton bean for DataStore exists because it is annotated with @Repository. The instance is made available in the VirtualCollectionRegistry and Application (for the Wicket UI) instances because both are managed by Spring too (annotated with @Service and @Component respectively) and contain a DataStore field annotated with @Autowired. In addition, some beans are defined in configuration classes (annotated with @Configuration), by methods annotated with @Bean. Note that all beans are singletons by default; so-called 'prototype beans' are not used in the VCR.

There are a number of application context profiles; which of them are active determines which beans get instantiated by Spring. The context parameter spring.profiles.active can be used to activate one or more profiles. The profiles are defined by means of the @Profile annotation. At the moment one profile exists for each PID provider implementation; for example, the vcr.pid.epic profile activates the EPICPersistentIdentifierProvider implementation and the EPICPersistentIdentifierConfiguration configuration. Activating multiple profiles is technically valid, but with the current profiles it leads to a clash of service implementations. See deployment information for more details on using profiles to select the PID provider for a VCR instance after deployment.
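
The rule that only one PID provider profile should be active can be illustrated with a small check on the value of spring.profiles.active. This is a minimal sketch, not code from the VCR: the class PidProfileCheck is hypothetical, the common prefix vcr.pid. is inferred from the single profile name given on this page, and vcr.pid.dummy is an assumed second profile name used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of the VCR code base. */
public class PidProfileCheck {

    /** Assumed common prefix of PID provider profiles, inferred from 'vcr.pid.epic'. */
    static final String PID_PROFILE_PREFIX = "vcr.pid.";

    /** Extracts the PID provider profiles from a comma-separated spring.profiles.active value. */
    static List<String> pidProfiles(String activeProfiles) {
        List<String> result = new ArrayList<String>();
        if (activeProfiles == null) {
            return result;
        }
        for (String name : activeProfiles.split(",")) {
            String trimmed = name.trim();
            if (trimmed.startsWith(PID_PROFILE_PREFIX)) {
                result.add(trimmed);
            }
        }
        return result;
    }

    /** True when exactly one PID provider profile is active, so no implementations clash. */
    static boolean isValidSelection(String activeProfiles) {
        return pidProfiles(activeProfiles).size() == 1;
    }

    public static void main(String[] args) {
        System.out.println(isValidSelection("vcr.pid.epic"));                // prints "true"
        // 'vcr.pid.dummy' is an assumed profile name, used for illustration only
        System.out.println(isValidSelection("vcr.pid.epic, vcr.pid.dummy")); // prints "false"
    }
}
```

A check of this kind could run at startup to fail early, rather than leaving the clash to surface as an ambiguous bean definition.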

A number of core services for marshalling, validation, etcetera can be found in the eu.clarin.cmdi.virtualcollectionregistry.service package (generally interfaces with implementations in the .impl subpackage).

More info:

JAX-RS REST service

The package eu.clarin.cmdi.virtualcollectionregistry.rest defines a RESTful web service by means of JAX-RS annotations. It uses Jersey (https://jersey.java.net/) as an implementation and depends on a number of Jersey extensions, primarily injection of beans. Four classes define the following (sub)resources:

  • /virtualcollections (class VirtualCollectionsResource) allows listing (GET) of published collections and POSTing of new collections
    • /virtualcollections/{id} (class VirtualCollectionResource) is a subresource that allows GETting, PUTting and DELETEing individual collections
  • /my-virtualcollections (class MyVirtualCollectionsResource) only allows retrieval (GET) of the list of private and published collections owned by the authenticated user
  • / (class BaseResource) is a dummy resource that renders an informative HTML page (if content negotiation allows) about the service
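
The resource layout listed above can be summarised as a table of path patterns and permitted HTTP methods. The following is only a plain Java sketch of that table; the actual service declares its resources with JAX-RS annotations. The class VcrRoutes is hypothetical, and GET on / is an assumption (the page only says that this resource renders an informative HTML page).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical summary of the resource layout; the real service uses JAX-RS annotations. */
public class VcrRoutes {

    /** Path pattern -> allowed HTTP methods, transcribed from the list of (sub)resources. */
    static final Map<Pattern, List<String>> ROUTES = new LinkedHashMap<Pattern, List<String>>();
    static {
        ROUTES.put(Pattern.compile("/virtualcollections"), Arrays.asList("GET", "POST"));
        ROUTES.put(Pattern.compile("/virtualcollections/[^/]+"), Arrays.asList("GET", "PUT", "DELETE"));
        ROUTES.put(Pattern.compile("/my-virtualcollections"), Arrays.asList("GET"));
        ROUTES.put(Pattern.compile("/"), Arrays.asList("GET")); // GET is assumed here
    }

    /** Returns true when the given method is listed for the first pattern matching the whole path. */
    static boolean isAllowed(String method, String path) {
        String normalised = method.toUpperCase(Locale.ROOT);
        for (Map.Entry<Pattern, List<String>> route : ROUTES.entrySet()) {
            if (route.getKey().matcher(path).matches()) {
                return route.getValue().contains(normalised);
            }
        }
        return false; // unknown path
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("DELETE", "/virtualcollections/1001")); // prints "true"
        System.out.println(isAllowed("POST", "/my-virtualcollections"));     // prints "false"
    }
}
```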

Two BodyWriter classes are implemented and registered, taking care of the rendering of VirtualCollection objects as CMDI or XML/JSON respectively, by means of different methods of the (injected) VirtualCollectionMarshaller service. This way, a single method in VirtualCollectionResource can handle these various serialisation options (depending on the HTTP Accept header in the client request). An additional method handles requests accepting HTML by returning a redirect ('see other') response pointing to the collection details page of the Wicket UI.

A specific format can be requested, overriding any content negotiation, by appending the format identifier to the URL, e.g. virtualcollections/1001.xml.
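
The suffix override can be pictured as a lookup on the last path segment. The sketch below is an illustration under assumptions, not the way the VCR or Jersey implements it: the class FormatSuffix is hypothetical, the json suffix is assumed by analogy with the xml and cmdi suffixes mentioned on this page, and the media type strings (in particular application/x-cmdi+xml) are assumptions.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper illustrating the format suffix; not taken from the VCR sources. */
public class FormatSuffix {

    /** Suffix -> media type. The media type strings are assumptions for this sketch. */
    static final Map<String, String> MEDIA_TYPES = new HashMap<String, String>();
    static {
        MEDIA_TYPES.put("xml", "application/xml");
        MEDIA_TYPES.put("json", "application/json");       // suffix assumed
        MEDIA_TYPES.put("cmdi", "application/x-cmdi+xml");  // media type assumed
    }

    /** Returns the known format suffix of the last path segment, or null if there is none. */
    static String suffixOf(String path) {
        int dot = path.lastIndexOf('.');
        int slash = path.lastIndexOf('/');
        if (dot <= slash) {
            return null; // no dot in the last segment
        }
        String suffix = path.substring(dot + 1).toLowerCase(Locale.ROOT);
        return MEDIA_TYPES.containsKey(suffix) ? suffix : null;
    }

    /** Media type forced by the suffix, or null when content negotiation should decide. */
    static String mediaTypeFor(String path) {
        String suffix = suffixOf(path);
        return suffix == null ? null : MEDIA_TYPES.get(suffix);
    }

    /** The path with a known format suffix stripped, identifying the underlying resource. */
    static String resourcePath(String path) {
        String suffix = suffixOf(path);
        return suffix == null ? path : path.substring(0, path.length() - suffix.length() - 1);
    }

    public static void main(String[] args) {
        System.out.println(mediaTypeFor("virtualcollections/1001.xml")); // prints "application/xml"
        System.out.println(resourcePath("virtualcollections/1001.xml")); // prints "virtualcollections/1001"
    }
}
```

When mediaTypeFor returns null, the Accept header of the request would decide the representation as usual.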

The CMDI output is generated by marshalling instances of the classes generated by the JAXB Maven plugin at build time (XJC), based on the VirtualCollection profile schema, a copy of which is included in the sources.

The collections space can be queried by means of a domain specific query language, as described in the REST documentation. The grammar is defined in the file QueryParser.jjt. The javacc-maven-plugin generates the parser at build time (with the JJTree preprocessor). Query strings (as optional GET parameters) are passed from the REST resources to the VirtualCollectionRegistry service, which uses a static method to parse them into a ParsedQuery object, which in turn is used to obtain JPA query objects for retrieving concrete results.

More info:

Wicket application (UI)

The graphical user interface is implemented as a Wicket (version 1.4) web application, which can be found in eu.clarin.cmdi.virtualcollectionregistry.gui and subpackages. The Application class is the entry point to the application, and is registered as a bean that gets picked up by the SpringWebApplicationFactory defined in the web.xml. Each page is defined by a pair consisting of a class and an HTML template of the same name (minus extension). Most pages consist of components, many of which are custom to the VCR and are also defined by a class and HTML template. Each page and component typically represents a model; the VCR has a number of custom Wicket component model implementations in the gui package. All pages extend a BasePage, which defines the common layout (header and footer, including some common components).

The VCR user interface makes quite intensive use of JavaScript. First of all, it uses Wicket's out-of-the-box functionality for partial updates via AJAX. In addition, jQuery is used for enhanced client side interaction. There is some custom jQuery based code, which is integrated with the Wicket components using the WiQuery library. The base page includes the Bootstrap JavaScript library to support some out-of-the-box layout niceness.

The style of the UI is defined by a combination of Bootstrap CSS classes (like the JavaScript, the style is included in the base page) and a number of .scss (SASS) files that get compiled to CSS at build time. The main file is vcr.scss, which includes the base CLARIN style and adds a number of VCR-specific style definitions.

More info:

OAI provider

The VCR OAI provider uses the OAIProvider library, which consists of an out-of-the-box singleton OAIProvider instance and a servlet, in addition to an application specific Repository. The repository is implemented by VirtualColletionRegistryOAIRepository in the eu.clarin.cmdi.virtualcollectionregistry.oai package, which also contains a Spring configuration class that links the repository with a VirtualCollectionRegistry service instance on the one hand and the OAIProvider on the other. The servlet is mounted (via web.xml) at /oai and picks up the OAI provider instance automatically.

The provider supports Dublin Core (DC) and CMDI output. The DC is generated by the provider based on information provided by the Repository implementation. The CMDI output is generated via the same methods as the CMDI output of the REST service (see above).

Troubleshooting

When building using JDK8, you may get an error like this:

org.xml.sax.SAXParseException; systemId: file:/[..]/VirtualCollection.xsd; lineNumber: 1; columnNumber: 659; schema_reference: Failed to read schema document 'xml.xsd', because 'http' access is not allowed due to restriction set by the accessExternalSchema property. 

To fix this, add the following parameter to the mvn command: -Djavax.xml.accessExternalSchema=all (or add it to the MAVEN_OPTS environment variable). More info on this error

Deployment

The VCR builds into a single WAR file that can be deployed onto a servlet container (tested on and developed for Tomcat 7). There exists an alternative build profile called development (use mvn install -Pdevelopment) which makes logging output go to the console and enables Tomcat user realm authentication. By default, however, the application is enabled for Shibboleth authentication and logging to a file on disk. To allow switching between the authentication mechanisms, the web.xml file is duplicated and one of the two copies is selected at package time; see VirtualCollectionRegistry/trunk/VirtualCollectionRegistry/src/main/webapp/WEB-INF.

Building the VCR at least up to the package phase (preferably mvn install) will result in the creation of a .tar.gz file in the target directory that contains the deployable WAR file as well as the project documentation and licensing information in a directory that contains the artifact name and version number.

Further deployment instructions (primarily adding a datasource and a number of parameters to context.xml) can be found in the README file.

Beta

In case of a beta deployment, uncomment the following section in the base page HTML template at WEB-INF/classes/eu/clarin/cmdi/virtualcollectionregistry/gui/pages/BasePage.html:

<div id="betabadge">
  <span>BETA</span>
</div>

This will add a 'BETA' marker in the top-left corner of the page.


Interfaces

REST service

The VCR REST service provides CRUD operations on a 'virtualcollection' resource. Its internal XML format for input and output is defined by an XML schema. In addition, it supports JSON for input and output and CMDI and HTML output for individual collections (via content negotiation). The CMDI is based on the new VirtualCollection profile (xml, xsd).

In addition to the standard 'Accept header' method, the content type of the response can also be requested by extending the URL with the desired format, e.g. /service/virtualcollections/1.cmdi.

A full description of the service can be found in Protocol.txt.

Form submission service

A special endpoint of the VCR REST service allows input from HTML forms to create new virtual collections. This way, other web applications (such as the VLO) can prepare a collection based on resources gathered in their workflows, and allow the user to send them to the VCR with a single click while preserving the ability to authenticate via Shibboleth.

The submission endpoint is /vcr/service/submit. It accepts the following form parameters:

  • type (required)
  • name (required)
  • metadataUri (list, required)
  • resourceUri (list, required)
  • description (required)
  • keyword
  • purpose
  • reproducibility
  • reproducibilityNotice
  • creationDate
  • queryDescription
  • queryUri
  • queryProfile
  • queryValue
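
A client that prepares such a submission has to send the parameters listed above as a form-encoded body, repeating the list parameters once per value. The following minimal sketch shows one way to build that body in Java. The class SubmitFormBody is hypothetical, the example values (including the type value EXTENSIONAL and the example.org URIs) are assumptions for illustration, and actually sending the body is left out.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical client-side helper; not taken from the VCR or VLO sources. */
public class SubmitFormBody {

    /** Parameters marked as required in the list of form parameters. */
    static final List<String> REQUIRED =
            Arrays.asList("type", "name", "metadataUri", "resourceUri", "description");

    /** Builds an application/x-www-form-urlencoded body; list parameters are repeated per value. */
    static String encode(Map<String, List<String>> params) {
        for (String key : REQUIRED) {
            List<String> values = params.get(key);
            if (values == null || values.isEmpty()) {
                throw new IllegalArgumentException("Missing required parameter: " + key);
            }
        }
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, List<String>> entry : params.entrySet()) {
                for (String value : entry.getValue()) {
                    if (body.length() > 0) {
                        body.append('&');
                    }
                    body.append(URLEncoder.encode(entry.getKey(), "UTF-8"))
                        .append('=')
                        .append(URLEncoder.encode(value, "UTF-8"));
                }
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
        return body.toString();
    }

    /** Example parameter set; every value here is made up for illustration. */
    static Map<String, List<String>> exampleParams() {
        Map<String, List<String>> params = new LinkedHashMap<String, List<String>>();
        params.put("type", Arrays.asList("EXTENSIONAL")); // assumed value
        params.put("name", Arrays.asList("My collection"));
        params.put("metadataUri", Arrays.asList("http://example.org/record1.cmdi"));
        params.put("resourceUri", Arrays.asList("http://example.org/a", "http://example.org/b"));
        params.put("description", Arrays.asList("Collected in the VLO"));
        return params;
    }

    public static void main(String[] args) {
        System.out.println(encode(exampleParams()));
    }
}
```

The resulting string would be sent as the body of a POST request to /vcr/service/submit with the content type application/x-www-form-urlencoded. In a browser-based workflow an HTML form with the same field names achieves the same result and lets the user authenticate via Shibboleth.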

An example implementation (based on Wicket) can be found in the 'VirtualCollectionSubmissionPage' in the experimental branch of the VLO.


Design

Global architecture:

Notes:

  • The handles for individual virtual collections resolve to the REST resource vcr/service/virtualcollections/{id}. By means of content negotiation this will cause a redirect to the front end (vcr/app/details/{id}) in case HTML is requested/preferred. If no preferred response format is specified, CMDI will be returned.
  • The GUI does not use the REST service to access the data, but rather communicates with the relational database directly (using Hibernate)
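
The first note can be sketched as a decision on the Accept header of the request that arrives when a handle is resolved. This is a simplified illustration, not the Jersey-based implementation: the class HandleNegotiation is hypothetical, the media type application/x-cmdi+xml is an assumption, and the parsing only looks at q values without handling wildcards such as text/*.

```java
import java.util.Locale;

/** Hypothetical sketch of the handle resolution outcome; the real service relies on JAX-RS. */
public class HandleNegotiation {

    /** Assumed media type for CMDI output. */
    static final String CMDI = "application/x-cmdi+xml";

    /** Returns the media range with the highest q value (first one wins on ties), or null. */
    static String preferredType(String accept) {
        if (accept == null || accept.trim().isEmpty()) {
            return null;
        }
        String best = null;
        double bestQ = -1;
        for (String range : accept.split(",")) {
            String[] parts = range.trim().split(";");
            String type = parts[0].trim().toLowerCase(Locale.ROOT);
            double q = 1.0; // default quality when no q parameter is given
            for (int i = 1; i < parts.length; i++) {
                String param = parts[i].trim();
                if (param.startsWith("q=")) {
                    try {
                        q = Double.parseDouble(param.substring(2));
                    } catch (NumberFormatException e) {
                        q = 0; // treat a malformed q value as least preferred
                    }
                }
            }
            if (q > bestQ) {
                best = type;
                bestQ = q;
            }
        }
        return best;
    }

    /** Describes the response for vcr/service/virtualcollections/{id} given an Accept header. */
    static String resolve(String accept, long id) {
        String best = preferredType(accept);
        if ("text/html".equals(best) || "application/xhtml+xml".equals(best)) {
            return "303 See Other: vcr/app/details/" + id; // redirect to the front end
        }
        if (best == null || "*/*".equals(best)) {
            return "200 OK: " + CMDI; // no preference stated, CMDI is returned
        }
        return "200 OK: " + best;
    }

    public static void main(String[] args) {
        System.out.println(resolve("text/html,application/xml;q=0.9", 42)); // prints "303 See Other: vcr/app/details/42"
        System.out.println(resolve(null, 42)); // prints "200 OK: application/x-cmdi+xml"
    }
}
```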

Tickets and milestones

Milestones:

Open tickets:

Milestone: VirtualCollectionRegistry-1.1 (5 matches)

Ticket Summary Priority Owner Reporter
#654 Improve dealing with failed PID minting critical Willem Elbers Twan Goosen
#748 Inject OAIProvider instance major Twan Goosen
#603 Auto-recognize links in description field minor Twan Goosen
#605 Make it possible to re-use authors that have been entered before minor Twan Goosen
#626 Bootstrap styled pagination controls minor Twan Goosen

Milestone: VirtualCollectionRegistry-1.2 (4 matches)

Ticket Summary Priority Owner Reporter
#611 prefill the mimetype for ResourceProxy major Dieter Van Uytvanck
#837 Automatically detect mimetype of references major Twan Goosen Willem Elbers
#842 DataCite metadata store major Twan Goosen Willem Elbers
#612 add creator lookup based on ORCID minor Dieter Van Uytvanck

Milestone: None (6 matches)

Ticket Summary Priority Owner Reporter
#601 Versioning of virtual collections major Twan Goosen
#621 Custom user properties override SAML attributes major Twan Goosen
#622 More lightweight response at /service/virtualcollections major Twan Goosen
#1014 Make button group with two choices more clear major Willem Elbers Willem Elbers
#1017 Logout button major Willem Elbers Willem Elbers
#871 Record page could be stateless minor Willem Elbers Twan Goosen


Status, Planning and Roadmap

TODO Status: is the project active, on hold, mature but supported, due to be deprecated, etc.

Planning and roadmap: if there are other places with planning documents, don't forget to link to them.


Resources


History

TODO Who has worked on this project, and roughly what did they do? Mention significant developments even if the relevant code/functionality has later been removed. Include yourself, of course.
