
Note

The same content is also available in Google Docs: https://docs.google.com/document/d/1arzkWarvC0TF39z2HkNw8quueI4da9MOsyLR410hSg0/edit?usp=sharing Some images are omitted here, as they are too big to fit. Readers are encouraged to consult the Google Docs version for the most up-to-date and comprehensive information.

The VLO Dashboard and its statistics

This document briefly describes the two major sets of statistics required for the VLO data workflow using the Dashboard. The first is the status indicator (Chapter 1) and the second is the quality report (Chapter 2). These statistics will play an essential role in the Dashboard functionalities described in the VLO/CMDI data workflow framework.

1. Dashboard status indicator

This section deals with the information needed to show the status of each data set in the Dashboard view (see below).

https://trac.clarin.eu/attachment/wiki/VLO/CMDI%20data%20workflow%20framework/Dashboard_UI_MockUp.png

  1. Harvesting

Harvester checks:

- Target repository and data sets
- Harvesting process
  • OK (100% harvested) or No (show error message with explanation)
  • (The number of harvested records)

  2. CMDI converting (if applicable)

Converter checks:

- OLAC schema
- Conversion results
  • OK (100% converted) or No (show error message with explanation)
  • (The number of converted records)

  3. CMDI validation

Validator checks:

- CMDI schema
- Validation results
  • OK or No (show error message with explanation)
  • (The number of validated records)

  4. Concept mapping

Mapper checks:

- Mapping results
  • OK or No (show error message with explanation)
  • (The number of the fields covered for concept mapping (percentage of all fields))

  5. Value mapping and normalisation

Mapper/normaliser checks:

- Mapping and normalisation results
  • OK or No (show error message with explanation)
  • (The number of the fields covered for value mapping and normalisation (percentage of all fields))

  6. Indexing

Indexer checks:

- Indexing results
  • The number of indexed records
  • (The number of the VLO indexed records (percentage of all VLO records))
  • The number of facets covered (percentage of all facets)

  7. Distribution

  • The number of (CMDI) records in the repository (OAI etc)
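
The per-stage checks above all report the same kind of information: an OK/No flag, a count, and an optional error explanation. As an illustration only, this could be modelled roughly as follows. The class name `StageStatus`, its fields, the label format and the sample error text are assumptions made for this sketch and are not part of any existing Dashboard code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StageStatus:
    """Status of one workflow stage for one data set (hypothetical model)."""
    stage: str                    # e.g. "Harvesting", "CMDI validation"
    ok: bool                      # True -> "OK", False -> "No"
    processed: int                # records handled by the stage
    total: int                    # records expected
    error: Optional[str] = None   # explanation shown when ok is False

    def percentage(self) -> float:
        # Guard against empty data sets to avoid division by zero.
        return 100.0 * self.processed / self.total if self.total else 0.0

    def label(self) -> str:
        # Render the indicator text in the "OK ... or No ..." style used above.
        if self.ok:
            return (f"{self.stage}: OK "
                    f"({self.percentage():.0f}% - {self.processed} records)")
        return f"{self.stage}: No ({self.error or 'no explanation available'})"


# Illustrative values; the error message is invented for the example.
harvest = StageStatus("Harvesting", ok=True, processed=3280, total=3280)
validation = StageStatus("CMDI validation", ok=False, processed=0, total=3280,
                         error="schema not reachable")
print(harvest.label())
print(validation.label())
```

A list of such objects, one per stage, would be enough to fill one row of the Dashboard view for a data set.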

2. Data quality report

The data quality report is a new, important set of information that tells a data provider about the quality of the submitted metadata. The report is the successor to the idea of MDQAS. It includes some statistics and descriptive messages. It should be sent (semi)automatically to the data providers via email (as the primary audience) and, at the same time, be available for consultation within the Dashboard by the VLO curators and administrators.

While the Dashboard will come with a wide range of metadata statistics for the whole ingestion process, the main concern of the data provider is not the statistics per se. Providers need to see very quickly how good or bad the submitted metadata is, and our goal is to make the issues easy to identify and to persuade providers to improve the metadata. Therefore, the report is best kept to short, to-the-point descriptions, avoiding the overly detailed and comprehensive statistics described in MDQAS.

An example is given below:

A Digital Archive of Research Papers in Computational Linguistics
Report date: 2015-11-09

Overview
  Last harvesting: 2015-10-13 Excellent (100% - 3280 records)
  Last indexing: 2015-10-14 Excellent (100% - 3280 records)
  Data quality: Moderate

  The coverage of concepts: 85%
  The coverage of facets: 70%
  Normalisation rate: 15%
  Broken links: 4%

Details:
  Concept uncovered fields: olac:date, olac:coverage, olac:creator
  Facets uncovered: Country, ResourceType
  Missing values in: olac:coverage
  Unnormalised values:
    <olac:type>speech intev</olac:type>
    <olac:type>gesproken</olac:type>
    <olac:type>tele communication</olac:type>
    <olac:date>1888?</olac:date>
    <olac:date>?-1988</olac:date>
    <olac:date>it is unknown</olac:date>
  Broken links:
    https://vlo.clarin.eu/record?docId=11022_47_0000-0000-20DF-1
    https://vlo.clarin.eu/record?docId=11022_47_0000-0000-20DF-2
    https://vlo.clarin.eu/record?docId=11022_47_0000-0000-20DF-3

Next step: Please check our guideline and consult admin@vlo.clarin.eu if needed.

Your little effort makes a difference for the visibility of your data in VLO.

Enjoy getting more traffic from VLO!
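
The percentages and the list of unnormalised dates in the example report could be derived along the following lines. This is a minimal sketch under stated assumptions: the accepted date pattern (YYYY, YYYY-MM or YYYY-MM-DD), the helper names `percent` and `unnormalised_dates`, and the sample values are all hypothetical and do not describe how the VLO normaliser actually decides.

```python
import re

# Assumption: a "normalised" date is YYYY, YYYY-MM or YYYY-MM-DD.
DATE_PATTERN = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")


def percent(part: int, whole: int) -> int:
    """Rounded percentage; 0 when there is nothing to count."""
    return round(100 * part / whole) if whole else 0


def unnormalised_dates(values):
    """Return the values that do not match the assumed date pattern."""
    return [v for v in values if not DATE_PATTERN.fullmatch(v)]


# Sample values; three of them are taken from the example report.
dates = ["2015-10-13", "1888?", "?-1988", "it is unknown", "1999"]
bad = unnormalised_dates(dates)
rate = percent(len(dates) - len(bad), len(dates))

print("Unnormalised values:", bad)
print(f"Normalisation rate: {rate}%")
```

The same `percent` helper could serve the concept coverage, facet coverage and broken-link figures, given the relevant counts from the mapper, indexer and link checker.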

Last modified on 11/11/15 11:23:47