Changes between Initial Version and Version 1 of Dashboard statistics


Timestamp:
11/11/15 11:06:22 (9 years ago)
Author:
go.sugimoto@oeaw.ac.at
Comment:

--

  • Dashboard statistics

    v1 v1  
     1== Note ==
     2The same content is also available at GoogleDocs [https://docs.google.com/document/d/1arzkWarvC0TF39z2HkNw8quueI4da9MOsyLR410hSg0/edit?usp=sharing]. Some images are omitted here, as they are too big to fit. Readers are encouraged to consult the GoogleDocs version for the most up-to-date and comprehensive information.
     3
     4
     5== The VLO Dashboard and its statistics ==
     6
     7This document briefly describes the two major kinds of statistics required for the VLO data workflow using the Dashboard. The first is the status indicator (Chapter 1) and the second is the quality report (Chapter 2). The statistics will play an essential role in the Dashboard functionalities described in [[VLO/CMDI data workflow framework]].
     8
     9
     10== 1. Dashboard status indicator ==
     11This section deals with the information needed to show the status of each data set in the Dashboard view (see below).
     12
     13[[Image()]]
     14
     15
     161. Harvesting
     17Harvester checks:
     18{{{
     19- Target repository and data sets
     20- Harvesting process
     21}}}
     22
     23* OK (100% harvested) or No (show error message with explanation)
     24* (The number of harvested records)
     25
     26
     272. CMDI conversion (if applicable)
     28Converter checks:
     29
     30{{{
     31- OLAC schema
     32- Conversion results
     33}}}
     34
     35* OK (100% converted) or No (show error message with explanation)
     36* (The number of converted records)
     37
     38
     393. CMDI validation
     40Validator checks:
     41
     42{{{
     43- CMDI schema
     44- Validation results
     45}}}
     46
     47* OK or No (show error message with explanation)
     48* (The number of validated records)
     49
     50
     514. Concept mapping
     52Mapper checks:
     53
     54{{{
     55- Mapping results
     56}}}
     57
     58* OK or No (show error message with explanation)
     59* (The number of the fields covered for concept mapping (percentage of all fields))
     60
     61
     625. Value mapping and normalisation
     63Mapper/normaliser checks:
     64
     65{{{
     66- Mapping and normalisation results
     67}}}
     68
     69* OK or No (show error message with explanation)
     70* (The number of the fields covered for value mapping and normalisation (percentage of all fields))
     71
     72
     736. Indexing
     74Indexer checks:
     75
     76{{{
     77- Indexing results
     78}}}
     79
     80* The number of indexed records
     81* (The number of the VLO indexed records (percentage of all VLO records))
     82* The number of facets covered (percentage of all facets)
     83
     847. Distribution
     85* The number of (CMDI) records in the repository (OAI etc)
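
The per-stage indicators listed above (OK or No, an error explanation, and a count with an optional percentage) could be held in a small record per stage. The following is only a minimal sketch under that reading: the class `StageStatus` and all of its field names are hypothetical and do not come from the Dashboard code base.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StageStatus:
    """Status of one workflow stage for one data set (hypothetical structure)."""
    stage: str                      # e.g. "Harvesting", "Indexing"
    ok: bool                        # True -> "OK", False -> "No"
    processed: int = 0              # records (or fields) handled by this stage
    total: Optional[int] = None     # denominator for the percentage, if known
    error: Optional[str] = None     # explanation shown when ok is False

    def percentage(self) -> Optional[float]:
        """Share of processed items, or None when no total is available."""
        if not self.total:
            return None
        return round(100.0 * self.processed / self.total, 1)

    def label(self) -> str:
        """One-line indicator text for the Dashboard view."""
        if not self.ok:
            return f"{self.stage}: No ({self.error or 'no explanation available'})"
        pct = self.percentage()
        detail = f"{self.processed}" if pct is None else f"{self.processed}, {pct}%"
        return f"{self.stage}: OK ({detail})"


harvest = StageStatus("Harvesting", ok=True, processed=3280, total=3280)
print(harvest.label())
```

Keeping the count and the total separate, rather than storing a ready-made percentage, would let the same record serve stages that report records (harvesting, indexing) and stages that report fields or facets (concept mapping, value mapping).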
     86
     87
     88== 2. Data quality report ==
     89The data quality report is a new and important set of information that tells a data provider about the quality of the submitted metadata. It is the successor to the idea of MDQAS. It includes some statistics and descriptive messages. It should be sent (semi-)automatically to the data providers via email (as the primary audience) and, at the same time, be available within the Dashboard for the VLO curators and administrators.
     90
     91While the Dashboard will come with a wide range of metadata statistics for the whole ingestion process, the main concern for the data provider is not the statistics per se. Data providers need to see very quickly how good or bad the submitted metadata is, and our goal is to make the issues easy to identify and to persuade them to improve the metadata. The report should therefore be short and to the point, avoiding overly detailed and comprehensive statistics such as those in MDQAS.
     92
     93
     94An example is given below:
     95
     96
     97
     98{{{#!td
     99'''A Digital Archive of Research Papers in Computational Linguistics'''
     100Report date:  2015-11-09
     101
     102
     103'''Overview'''
     104Last harvesting:   2015-10-13 Excellent (100% - 3280 records)
     105Last indexing:     2015-10-14 Excellent (100% - 3280 records)
     106Data quality:      Moderate
     107                     The coverage of concepts: 85%
     108                     The coverage of facets: 70%
     109                     Normalisation rate: 15% 
     110                     Broken links: 4%
     111             
     112       
     113'''Details:'''
     114Concept uncovered fields: olac:date, olac:coverage, olac:creator
     115Facets uncovered: Country, ResourceType,
     116Missing values in: olac:coverage,
     117Unnormalised values: <olac:type>speech intev</olac:type>
     118                     <olac:type>gesproken</olac:type>
     119                     <olac:type>tele communication</olac:type>
     120                     <olac:date>1888?</olac:date>
     121                     <olac:date>?-1988</olac:date>
     122                     <olac:date>it is unknown</olac:date>
     123Broken links:         https://vlo.clarin.eu/record?docId=11022_47_0000-0000-20DF-1
     124                      https://vlo.clarin.eu/record?docId=11022_47_0000-0000-20DF-2
     125                      https://vlo.clarin.eu/record?docId=11022_47_0000-0000-20DF-3
     126
     127
     128'''Next step:'''
     129Please check our guidelines and contact admin@vlo.clarin.eu if needed.
     130
     131A little effort on your part makes a difference to the visibility of your data in VLO.
     132
     133Enjoy getting more traffic from VLO!
     134}}}
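
As an illustration of how the title and "Overview" part of such a report might be assembled for the (semi-)automatic email, here is a minimal sketch. The function `render_overview` and its parameters are hypothetical. The quality label ("Moderate") is passed in as is, because this document does not define how it is derived from the individual percentages.

```python
def render_overview(title, report_date, stages, quality, metrics):
    """Build the title and 'Overview' part of a quality report as plain text.

    stages:  list of (name, date, label, percent, records) tuples
    metrics: list of (name, percent) tuples shown under the quality label
    All names are illustrative assumptions, not an existing Dashboard API.
    """
    lines = [title, f"Report date: {report_date}", "", "Overview"]
    for name, date, label, pct, records in stages:
        lines.append(f"Last {name}: {date} {label} ({pct}% - {records} records)")
    lines.append(f"Data quality: {quality}")
    for name, pct in metrics:
        # Indent the individual measures below the overall quality label.
        lines.append(f"    {name}: {pct}%")
    return "\n".join(lines)


report = render_overview(
    "A Digital Archive of Research Papers in Computational Linguistics",
    "2015-11-09",
    stages=[
        ("harvesting", "2015-10-13", "Excellent", 100, 3280),
        ("indexing", "2015-10-14", "Excellent", 100, 3280),
    ],
    quality="Moderate",
    metrics=[
        ("The coverage of concepts", 85),
        ("The coverage of facets", 70),
        ("Normalisation rate", 15),
        ("Broken links", 4),
    ],
)
print(report)
```

The values used here are the ones from the example report above. The "Details" and "Next step" parts could be appended in the same way.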