Version 21 (modified by Twan Goosen, 10 years ago) (diff)

some fixes in technical REST section

CLARIN Virtual Collection Registry

Description

From the requirements description:

A virtual collection (VC) is a collection that is the result of browsing or searching repositories rather than being the result of construction and first-time publication by an organisation. Another characterisation is that the resources in a VC are already available in other collections, and the VC can be considered a derived collection. Nevertheless these VCs may need to be citable for future use and should therefore be able to be registered in a register, the VCR.

The Virtual Collection Registry (VCR) is an online registry that provides a RESTful web service (REST service) as well as a web-based graphical user interface for the creation, publication, management and retrieval of virtual collections. Published collections are made available as CMDI records over OAI-PMH, and are assigned a persistent identifier by means of the EPIC service.

These components are available at the following locations (currently referencing the alpha deployment!):

The help page explains the core concepts and terminology relevant to the VCR (based on the information gathered on this page).

REST service

The VCR REST service provides CRUD operations on a 'virtualcollection' resource. Its internal XML format for input and output is defined by an XML schema. In addition, it supports JSON for input and output and CMDI and HTML output for individual collections (via content negotiation). The CMDI is based on the new VirtualCollection profile (xml, xsd).

In addition to the standard 'Accept header' method, the content type of the response can also be requested by extending the URL with the desired format, e.g. /service/virtualcollections/1.cmdi.
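The two ways of selecting a response format can be sketched from the client side as follows. This is a minimal illustration using the JDK's java.net.http API (Java 11 or later). The base URL https://example.org/vcr, the class and method names, and the media type application/x-cmdi+xml are assumptions made for the example and are not taken from the service documentation. The requests are only built, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class VcrFormatRequests {

    // Hypothetical base URL of a VCR deployment (assumption, not a real endpoint).
    static final String BASE = "https://example.org/vcr";

    /** Selects the response format with the standard HTTP Accept header. */
    static HttpRequest viaAcceptHeader(long id, String mediaType) {
        return HttpRequest.newBuilder()
                .uri(URI.create(BASE + "/service/virtualcollections/" + id))
                .header("Accept", mediaType)
                .GET()
                .build();
    }

    /** Selects the response format by appending an extension to the URL, e.g. "cmdi". */
    static HttpRequest viaExtension(long id, String extension) {
        return HttpRequest.newBuilder()
                .uri(URI.create(BASE + "/service/virtualcollections/" + id + "." + extension))
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // The media type for CMDI is an assumption; check Protocol.txt for the actual value.
        HttpRequest byHeader = viaAcceptHeader(1, "application/x-cmdi+xml");
        HttpRequest byExtension = viaExtension(1, "cmdi");
        System.out.println(byHeader.uri() + " Accept: "
                + byHeader.headers().firstValue("Accept").orElse(""));
        System.out.println(byExtension.uri());
    }
}
```

Either request could then be passed to HttpClient.send; the extension form is convenient where a client cannot set headers, such as a plain link in a browser.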

A full description of the service can be found in Protocol.txt.

Form submission service

A special endpoint of the VCR REST service accepts input from HTML forms to create new virtual collections. This way, other web applications (such as the VLO) can prepare a collection based on resources gathered in their workflows, and allow the user to send it to the VCR with a single click while preserving the ability to authenticate via Shibboleth.

The submission endpoint is /vcr/service/submit. It accepts the following form parameters:

  • type (required)
  • name (required)
  • metadataUri (list, required)
  • resourceUri (list, required)
  • description (required)
  • keyword
  • purpose
  • reproducibility
  • reproducibilityNotice
  • creationDate
  • queryDescription
  • queryUri
  • queryProfile
  • queryValue
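As a rough illustration of what such a form post carries, the sketch below assembles an application/x-www-form-urlencoded body from the parameters listed above. The class name, the example values (including 'extensional' as a value for type) and the choice to send list parameters as repeated fields are assumptions. In practice the body would normally be produced by the browser from an HTML form, so that the user's Shibboleth session applies.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class VcrSubmissionForm {

    private final List<String> pairs = new ArrayList<>();

    /** Adds one form field; call it repeatedly with the same name for list parameters. */
    VcrSubmissionForm add(String name, String value) {
        pairs.add(URLEncoder.encode(name, StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8));
        return this;
    }

    /** Returns the URL-encoded body to POST to /vcr/service/submit. */
    String encode() {
        return String.join("&", pairs);
    }

    public static void main(String[] args) {
        String body = new VcrSubmissionForm()
                .add("type", "extensional") // assumed value, not confirmed by this page
                .add("name", "My collection")
                .add("description", "Resources gathered in a search session")
                .add("metadataUri", "https://example.org/record/1.cmdi")
                .add("resourceUri", "https://example.org/data/1.wav")
                .add("resourceUri", "https://example.org/data/2.wav") // second list entry
                .encode();
        System.out.println(body);
    }
}
```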

An example implementation (based on Wicket) can be found in the 'VirtualCollectionSubmissionPage' in the experimental branch of the VLO.

Technical notes

Development

The VCR sources are contained in a single Maven project. Most of the code is written in Java, complemented by HTML, (S)CSS and some images for the graphical user interface and a number of XML documents. Structurally, it consists of three largely independent functional components, each providing an interface to a shared 'VirtualCollectionRegistry' core service. The sections below describe the shared 'core' and the individual components.

Source code: VirtualCollectionRegistry

VCR core

The core of the VCR is the collection store, which is implemented through JPA with annotated classes in the eu.clarin.cmdi.virtualcollectionregistry.model package. The persistence.xml file configures the Hibernate Persistence Provider to use a MySQL datasource for storage. The 'update' value of the hibernate.hbm2ddl.auto property instructs Hibernate to update the database schema at runtime if needed. A singleton object of the DataStore class provides access to the JPA Entity Manager.

The DataStore is primarily used in the service implemented by the VirtualCollectionRegistry class, which provides high level CRUD methods, using the VirtualCollection model class for the representation of individual collections. It also calls the configured PID service to register and link a persistent identifier upon publication of a collection. These methods are used by the REST service, OAI provider and Wicket application classes to access and manipulate the stored collections.

The Spring framework is used to define beans for singleton service objects, which are injected into various components throughout the application. The applicationContext.xml file bootstraps the bean definition, while most beans are discovered and constructed by means of 'component scanning'. For example, a singleton bean for DataStore exists because it is annotated with @Repository. The instance is made available in the VirtualCollectionRegistry and Application (for the Wicket UI) instances because both are managed by Spring too (annotated with @Service and @Component respectively) and contain a DataStore field annotated with @Autowired. In addition, some beans are defined in configuration classes (annotated with @Configuration), by methods annotated with @Bean. Note that all beans are singletons by default; so-called 'prototype beans' are not used in the VCR.

There are a number of application context profiles; which of them are active determines which beans get instantiated by Spring. The context parameter spring.profiles.active can be used to activate one or more profiles. The profiles are defined by means of the @Profile annotation. At the moment one profile exists for each PID provider implementation; for example, the vcr.pid.epic profile activates the EPICPersistentIdentifierProvider implementation and the EPICPersistentIdentifierConfiguration configuration. Activating multiple profiles is technically valid, but with the current profiles it leads to a clash of service implementations. See the deployment information for more details on using profiles to select the PID provider for a VCR instance after deployment.

A number of core services for marshalling, validation, etcetera can be found in the eu.clarin.cmdi.virtualcollectionregistry.service package (generally interfaces with implementations in the .impl subpackage).

More info:

JAX-RS REST service

The package eu.clarin.cmdi.virtualcollectionregistry.rest defines a RESTful web service by means of JAX-RS annotations. It uses Jersey (https://jersey.java.net/) as an implementation and depends on a number of Jersey extensions, primarily for the injection of beans. Four classes define the following (sub)resources:

  • /virtualcollections (class VirtualCollectionsResource) allows listing (GET) of published collections and POSTing of new collections
    • /virtualcollections/{id} (class VirtualCollectionResource) is a subresource that allows GETting, PUTting and DELETEing individual collections
  • /my-virtualcollections (class MyVirtualCollectionsResource) only allows retrieval (GET) of the list of private and published collections owned by the authenticated user
  • / (class BaseResource) is a dummy resource that renders an informative HTML page (if content negotiation allows) about the service

Two BodyWriter classes are implemented and registered; they take care of rendering VirtualCollection objects as CMDI and as XML/JSON respectively, by means of different methods of the (injected) VirtualCollectionMarshaller service. This way, a single method in VirtualCollectionResource can handle these various serialisation options (depending on the HTTP Accept header in the client request). An additional method handles requests accepting HTML by returning a redirect ('see other') response pointing to the collection details page of the Wicket UI.

The CMDI output is generated by marshalling instances of the classes generated by the JAX-B Maven plugin at build time (XJC), based on the VirtualCollection profile schema, a copy of which is included in the sources.

The collections space can be queried by means of a domain specific query language, as described in the REST documentation. The grammar is defined in the file QueryParser.jjt. The javacc-maven-plugin generates the parser at build time (with the JJTree preprocessor). Query strings (as optional GET parameters) are passed from the REST resources to the VirtualCollectionRegistry service, which uses a static method to get it parsed into a ParsedQuery object, which in turn is used to obtain JPA query objects that can be used to retrieve concrete results.

More info:

Wicket application (UI)

The graphical user interface is implemented as a Wicket (version 1.4) web application, which can be found in eu.clarin.cmdi.virtualcollectionregistry.gui and subpackages. The Application class is the entry point to the application, and is registered as a bean that gets picked up by the SpringWebApplicationFactory defined in the web.xml. Each page is defined by a tuple consisting of a class and HTML template of the same name (minus extension). Most pages consist of components, many of which are custom to the VCR and are also defined by a class and HTML template. Each page and component typically represents a model; the VCR has a number of custom Wicket component model implementations in the gui package. All pages extend a BasePage, which defines the common layout (header, footer including some common components).

The VCR user interface makes quite intensive use of JavaScript. First of all, it uses Wicket's out-of-the-box functionality for partial updates via AJAX. In addition, jQuery is used for enhanced client-side interaction. There is some custom jQuery-based code, which is integrated with the Wicket components using the WiQuery library. The base page includes the Bootstrap JavaScript library to support some out-of-the-box layout niceness.

The style of the UI is defined by a combination of Bootstrap CSS classes (like the JavaScript, the style is included in the base page) and a number of .scss (SASS) files that get compiled to CSS at build time. The main file is vcr.scss, which includes the base CLARIN style and adds a number of VCR-specific style definitions.

More info:

OAI provider

The VCR OAI provider uses the OAIProvider library, which consists of an out-of-the-box singleton OAIProvider instance and a servlet, in addition to an application-specific Repository. The repository is implemented by VirtualColletionRegistryOAIRepository in the eu.clarin.cmdi.virtualcollectionregistry.oai package, which also contains a Spring configuration class that links the repository with a VirtualCollectionRegistry service instance on the one hand and the OAIProvider on the other. The servlet is mounted (via web.xml) at /oai and picks up the OAI provider instance automatically.

The provider supports Dublin Core (DC) and CMDI output. The DC is generated by the provider based on information provided by the Repository implementation. The CMDI output is generated via the same methods as the CMDI output of the REST service (see above).
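From a harvester's point of view, the two output formats are selected with the metadataPrefix argument of OAI-PMH requests. The sketch below only builds the request URLs. The host and context path are hypothetical, and 'cmdi' as the prefix for CMDI output is an assumption; 'oai_dc' is the Dublin Core prefix defined by OAI-PMH itself. A ListMetadataFormats request reports the prefixes a given deployment actually supports.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class VcrOaiRequests {

    // Hypothetical host and context path; only the /oai mount point comes from the text.
    static final String ENDPOINT = "https://example.org/vcr/oai";

    /** Builds an OAI-PMH ListRecords request for the given metadata prefix. */
    static URI listRecords(String metadataPrefix) {
        return URI.create(ENDPOINT + "?verb=ListRecords&metadataPrefix="
                + URLEncoder.encode(metadataPrefix, StandardCharsets.UTF_8));
    }

    /** Builds an OAI-PMH ListMetadataFormats request, which lists the supported prefixes. */
    static URI listMetadataFormats() {
        return URI.create(ENDPOINT + "?verb=ListMetadataFormats");
    }

    public static void main(String[] args) {
        System.out.println(listMetadataFormats());
        System.out.println(listRecords("oai_dc")); // Dublin Core prefix defined by OAI-PMH
        System.out.println(listRecords("cmdi"));   // assumed prefix for the CMDI output
    }
}
```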

Deployment

The VCR builds into a single WAR file that can be deployed onto a servlet container (tested on and developed for Tomcat 7). There is an alternative build profile called development (use mvn install -Pdevelopment), which makes logging output go to the console and enables Tomcat user realm authentication. By default, however, the application is enabled for Shibboleth authentication and logging to a file on disk. To allow for the switch between authentication mechanisms, the web.xml file is duplicated and one of the two is selected at package time - see VirtualCollectionRegistry/trunk/VirtualCollectionRegistry/src/main/webapp/WEB-INF.

Building the VCR at least up to the package phase (preferably mvn install) will result in the creation of a .tar.gz file in the target directory that contains the deployable WAR file as well as the project documentation and licensing information in a directory that contains the artifact name and version number.

Further deployment instructions (primarily adding a datasource and a number of parameters to context.xml) can be found in the README file.

Tickets

Milestones:

Open tickets:

Ticket Summary Priority Owner Reporter Milestone
#654 Improve dealing with failed PID minting critical Willem Elbers Twan Goosen VirtualCollectionRegistry-1.1
#601 Versioning of virtual collections major Twan Goosen
#611 prefill the mimetype for ResourceProxy major Dieter Van Uytvanck VirtualCollectionRegistry-1.2
#621 Custom user properties override SAML attributes major Twan Goosen
#622 More lightweight response at /service/virtualcollections major Twan Goosen
#748 Inject OAIProvider instance major Twan Goosen VirtualCollectionRegistry-1.1
#837 Automatically detect mimetype of references major Twan Goosen Willem Elbers VirtualCollectionRegistry-1.2
#842 DataCite metadata store major Twan Goosen Willem Elbers VirtualCollectionRegistry-1.2
#1014 Make button group with two choices more clear major Willem Elbers Willem Elbers
#1017 Logout button major Willem Elbers Willem Elbers
#603 Auto-recognize links in description field minor Twan Goosen VirtualCollectionRegistry-1.1
#605 Make it possible to re-use authors that have been entered before minor Twan Goosen VirtualCollectionRegistry-1.1
#612 add creator lookup based on ORCID minor Dieter Van Uytvanck VirtualCollectionRegistry-1.2
#626 Bootstrap styled pagination controls minor Twan Goosen VirtualCollectionRegistry-1.1
#871 Record page could be stateless minor Willem Elbers Twan Goosen
