[=#topofpage]
''Responsible for this page: [mailto:davor.ostojic@oeaw.ac.at Davor Ostojić].''\\
''Last content check: 03-11-2015''\\
''Status: design''

{{{
#!html

Purpose

}}}
The purpose of this page is to collect relevant information about the Curation Module project.

= Project: Curation Module =

The goal of this project is to implement a software component for curation and quality assessment that can be integrated into CLARIN's VLO workflow. The project was initiated by the [[wiki:Taskforces/Curation|Metadata Curation Task Force]]. The specification for the Curation Module is based on the [[wiki:MDQAS|Metadata Quality Assessment Service]] proposal.

The Curation Module will be able to validate and normalize single MD records, repositories and profiles, to assess their quality, and to produce reports with information tailored to the different actors in the VLO workflow. The implementation will reuse some of the existing CLARIN components.

{{{
#!comment
----
== Subpages ==
If there are subpages to this page, uncomment this section and add links to these pages.
}}}

----
{{{
#!comment
This section can be skipped for short pages.
}}}

{{{
#!html

Contents

}}}
[[PageOutline(1-2, , inline)]]

----
== People ==
 * [mailto:matej.durco@oeaw.ac.at Matej Ďurčo] - coordinator (CLARIN-AT, CLARIN Center Vienna)
 * [mailto:davor.ostojic@oeaw.ac.at Davor Ostojić] - developer (CLARIN-AT, CLARIN Center Vienna)

----
== Getting code ==
GitHub: https://github.com/clarin-eric/clarin-curation-module

{{{
#!comment
 * You can browse the code [source:yourproject here]
 * Check out from: {{{https://trac.clarin.eu/yourproject}}}
}}}

----
== Usage ==
==== Web Application ====
URL: https://clarin.oeaw.ac.at/curate/

==== REST API ====
Instance:
 * https://clarin.oeaw.ac.at/curate/rest/instance?url=url_of_an_instance

Profile:
 * https://clarin.oeaw.ac.at/curate/rest/profile?url=a_valid_url_of_a_profile
 * https://clarin.oeaw.ac.at/curate/rest/profile/id/{profilesID}

==== CLI ====
To run it from the command line:

{{{java -cp curate.jar:path_to_maven_dependencies/* eu.clarin.cmdi.curation.main.Main}}}

Parameters:
 * -config: path to the configuration file. This parameter is optional; the internal configuration file is used by default.

Input type:
 * -p to curate a profile
 * -i to curate an instance
 * -c to curate a collection

Resource:
 * -path: space-separated paths to the files or folders to be curated
 * -url: space-separated URLs of the profiles or instances to be curated
 * -id: space-separated CLARIN profile IDs in the format clarin.eu:cr1:p_xxx

Allowed combinations are:
 * -p with -path, -url or -id
 * -i with -path or -url
 * -c with -path

----
== System Requirements ==
Requirements for the project are based on the [[wiki:MDQAS#Requirements|Metadata Quality Assessment Service requirements]].

=== Identified Use Cases ===
==== Use Case 1 – Metadata Creator checks the validity of a newly created record ====
 * Title: Check validity of metadata record
 * Actor: MD Creator
 * Level: User Goal
 * Main Success Scenario:[[BR]]
   1. User copies the MD record into the web form and starts validation by clicking the "Validate" button[[BR]]
   2. Module does schema validation, link checks, vocabulary check and facet coverage assessment[[BR]]
   3. User gets the report with status, any errors and the assessment[[BR]]
   4. User gets instructions on how to improve the MD record (recommended profile, recommended values)[[BR]]

==== Use Case 2 – MD Modeler checks the quality of profiles ====
 * Title: Check quality of profile/schema
 * Actor: MD Modeler
 * Level: User Goal
 * Main Success Scenario:[[BR]]
   1. CMDI Editor runs the curation module, passing a profile or schema as argument[[BR]]
   2. Module does link checks and facet coverage assessment[[BR]]
   3. User gets the report on link availability and facet coverage[[BR]]

==== Use Case 3 – Repository Admin checks quality of metadata in his repository ====
 * Title: Check overall quality of metadata in repository
 * Actor: Repository Admin
 * Level: User Goal
 * Main Success Scenario:[[BR]]
   1. Admin runs the module from the command line, passing the location of the MD records as argument[[BR]]
   2. Module does quality assessment of the records[[BR]]
   3. Admin gets a summary report on the overall quality of the MD records in his repository[[BR]]

==== Use Case 4 – Curation Module in VLO workflow ====
 * Title: Curation Module in VLO workflow
 * Actor: VLO workflow
 * Level: Summary
 * Main Success Scenario:[[BR]]
   1. Curation Module is called before the vlo-importer component, with the location where the MD records are stored as argument[[BR]]
   2. Module does validation and normalization and generates different kinds of reports and normalized MD records[[BR]]
   3. VLO importer uses the normalized records in the post-processing phase and imports them into SOLR[[BR]]
   4. After the import, a script emails the reports to the VLO admin, MD curators and data providers[[BR]]

=== Requirements ===
 * The curation module will be integrated into the VLO workflow. It has to provide VLO with normalized MD records for further (post-)processing and ingestion into SOLR. Since there is already a large number of records (currently 800K) and this number grows every week, the curation module has to perform well.
 * Ability to provide a validation and/or assessment service for metadata creators (on instance level).
 * The curation module has to provide a CMDI profile/schema assessment service for MD modelers.
 * The curation module must be able to work with single records and with collections in batch mode.
 * Local and remote invocation.
 * The curation workflow includes the following steps: schema validation, URL inspection, value validation against vocabularies, and normalization. (See [[Cmdi/QualityCriteria]] for the full list of checks.)
 * Feedback about errors, per record or per collection.
 * Feedback about quality: score, facet coverage.
 * Provision of instructions on MD improvement.
 * Time dimension: keep track of quality over time for analysis.

----
== Dependencies ==
The implementation will use the following projects:
 * [[browser:CMDIValidator|CMDI Validator]]
 * [[wiki:CmdiVirtualLanguageObservatory|VLO]] backend
 * [[wiki:OAIHarvester|OAI Harvester]]
 * [[Cmdi/QualityCriteria]]

----
== Requirements ==
 * Java 8
 * Maven 3
 * Tomcat 7+

== Building and Deploying ==
To build the application:
{{{
#!rst
mvn clean install
}}}

To deploy, copy the WAR file from curation-module-web into Tomcat's webapps folder.

----
== Design ==
Component Diagram:

[[Image(Curatoin Module Component Diagram.jpg)]]

----
== Tickets ==
#676 - Create a metadata curation module

----
== Status, Planning and Roadmap ==
Status: implementation

----
== Resources ==

----
== History ==

----
== Meetings ==
 * [https://docs.google.com/document/d/1LgvfUVd7XKhoiDw54BpQM7_fQLdC_sYXNl_MXjFPK4o gdoc used for minutes and notes]
 * [attachment:telco-08-04-2016.pdf telco 08/04/2016]
 * [attachment:telco-28-04-2016.pdf telco 28/04/2016]
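
----
== REST API request example ==
The REST endpoints listed under Usage take the location of a record or profile in a url query parameter. Because that value is itself a URL, it normally has to be percent-encoded before it is appended. The following is a minimal sketch of how such request URLs could be assembled. It is not part of the Curation Module code base: the function names are hypothetical, example.org is a placeholder host, and it assumes the service accepts standard percent-encoding.

```python
from urllib.parse import quote, urlencode

# Base URL of the public instance named in the Usage section.
BASE = "https://clarin.oeaw.ac.at/curate/rest"


def instance_request(record_url: str) -> str:
    """Request URL for curating a single instance located at record_url."""
    # urlencode percent-encodes ':', '/', '?', '&' and '=' in the value,
    # so the record's own query string cannot leak into the outer request.
    return f"{BASE}/instance?{urlencode({'url': record_url})}"


def profile_request_by_url(profile_url: str) -> str:
    """Request URL for curating a profile located at profile_url."""
    return f"{BASE}/profile?{urlencode({'url': profile_url})}"


def profile_request_by_id(profile_id: str) -> str:
    """Request URL for curating a profile by ID (format clarin.eu:cr1:p_xxx)."""
    # ':' is legal inside a path segment, so it is left unescaped;
    # everything else that is unsafe, including '/', is encoded.
    return f"{BASE}/profile/id/{quote(profile_id, safe=':')}"


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    print(instance_request("http://example.org/record.xml?a=1&b=2"))
    print(profile_request_by_id("clarin.eu:cr1:p_xxx"))
```

The resulting strings can be passed to any HTTP client. The format of the returned report is not described on this page, so the sketch stops at building the request.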