
Piwik as a central web analytics solution for CLARIN websites and services

Twan Goosen, June 2014

Latest official paper

Introduction

"Web analytics is the measurement, collection, analysis and reporting of web data for purposes of understanding and optimizing web usage." (source: wikipedia)

Some (most?) CLARIN services have some method of obtaining statistics on the number of visitors, pages visited and/or visitor origins and characteristics. Generally this is based on an analysis of server logs (e.g. through AWStats). The alternative ‘page tagging’ method provides more detailed information. It also allows for easy centralisation of this analysis, so that different centres can make use of the same server without having to install or maintain any software locally. Google Analytics is an example of a platform based on this method, but there are (debatable?) privacy concerns. Piwik is an open-source solution that consists of a server component that collects statistics from an arbitrary number of websites or other services and does not require interaction with any external party; all data is therefore stored on a ‘local’ server, e.g. at a CLARIN Centre.

Applications

For websites, the most straightforward way to integrate with Piwik is by adding a JavaScript (JS) snippet to each page. The client executes this snippet and thus sends information to the central server. There is also a ‘noscript’ tracker that works for pages without JS or for clients that do not have JS enabled. In addition, there is a REST API, and there are libraries and plugins for a number of programming languages and frameworks (most importantly Java, PHP, Python and GWT). This means that, for example, web services can be tracked as well, and that tracking can be heavily customised. There are also plugins for a number of content management systems and web applications (e.g. WordPress, Drupal, Plone, MediaWiki, Trac, Jenkins).

An overview of available integration options, Tracking API

The Tracking API is used from within backend web application code to track page views. You need a user account that is specific to your application instance, and the website of that application instance must be registered with our Piwik installation. Tied to this user account is a token that you have to specify in all your API calls in order to track page views related to your application instance. In addition, your user/token must have admin privileges for that website (see: http://developer.piwik.org/api-reference/PHP-Piwik-Tracker#settokenauth).

Daniël de Kok wrote a very helpful tutorial on how to integrate Piwik in a Java application, which is available at:

attachment:piwik-weblicht.pdf

Features

The analysis web interface of Piwik shows real-time statistics on a ‘dashboard’-like interface, either per website/service or in a general overview. It provides, among other things, extensive visitor information (location, visit duration, technical details), page transitions and speed statistics. It can also track external searches (from search engines) and internal searches.

Data can be exported to a number of file formats including Excel. Existing server logs can be imported into Piwik so that historical data can be integrated.

An overview of Piwik's features

Security

The analysis interface and analytics API can be secured with Shibboleth to restrict access, for example to a fixed set of people per centre (e.g. the technical and administrative contacts in the Centre Registry). The granularity of the access restriction will have to be investigated, but in the first instance the idea is that all centres can access all statistics.

Also see "How do I force Piwik to only track Page URLs that belong to my website?" for a basic sanity check on visit recording attempts.
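To make the idea of such a sanity check concrete, here is a small Python sketch of the kind of test involved: a page URL is accepted only if its host and path fall under one of the URLs registered for the website. This is an illustration of the principle under simplifying assumptions (the scheme is ignored, hosts are compared exactly), not Piwik's actual implementation, and the function name is made up.

```python
from urllib.parse import urlsplit


def belongs_to_site(page_url, site_urls):
    """Return True if page_url lies under one of the registered site URLs.

    Illustrative only: ignores the scheme (http vs. https) and compares
    host names exactly, without any subdomain wildcards.
    """
    page = urlsplit(page_url)
    page_host = (page.hostname or "").lower()
    if not page_host:
        return False  # not an absolute URL, nothing to compare
    for site_url in site_urls:
        site = urlsplit(site_url)
        site_host = (site.hostname or "").lower()
        site_path = site.path.rstrip("/")
        same_host = page_host == site_host
        # Match the registered path itself or anything below it.
        under_path = (page.path == site_path
                      or page.path.startswith(site_path + "/"))
        if same_host and under_path:
            return True
    return False


if __name__ == "__main__":
    registered = ["http://example.org/vcr"]  # hypothetical registration
    print(belongs_to_site("https://example.org/vcr/list", registered))
    print(belongs_to_site("https://other.example.net/vcr", registered))
```

Because a check like this depends on the registered URLs, it matters that every home page URL of a website is known to the Piwik installation.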

Pilot

Since 24 June 2014, CLARIN ERIC has been hosting a Piwik test server at https://stats.clarin.eu/ and is tying some web applications to this instance.

To get a tracking API token, please send your website's ‘name’ and all of its home page URLs (http, https, etc.) to sysops@clarin.eu.

WebLicht

For the WebLicht integration and the usage of the Java library, Daniël de Kok wrote a very helpful tutorial.

Virtual Collection Registry

References

Piwik

Last modified on 04/08/16 13:41:00
