Changes between Version 18 and Version 19 of VLO/CMDI data workflow framework


Timestamp:
11/11/15 10:14:02 (9 years ago)
Author:
go.sugimoto@oeaw.ac.at
Comment:

--

  • VLO/CMDI data workflow framework

    v18 v19  
    1717OLAC and CMDI are the two formats that can be imported into the VLO environment; the former is converted to CMDI by a predefined mapping. Once the CMDI is ready, it is ingested into the Solr/Lucene index, governed by a set of configuration files: facetConcepts.xml, which maps elements to facets (via concepts), and a set of text files that define the normalisation of values. These files are the essence of the CMDI-VLO facet mapping and are, in principle, edited manually by the VLO curators. The processed data is indexed and published seamlessly on the VLO website, where end users can browse and search it. The VLO curators also have difficulty controlling data quality, because they have to manually edit the raw files (XML, CSV and the like) for concept mapping and for value mapping and normalisation, in conjunction with the external CLARIN services. They also need to examine the results on the public website to check data integrity.
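
The two-stage mapping described above (element to facet via a concept, then value normalisation) can be sketched as follows. This is only an illustrative sketch: the concept identifiers, the table contents and the function names are hypothetical and do not reflect the actual structure of facetConcepts.xml or of the normalisation text files.

```python
from typing import Dict, List, Optional

# Hypothetical concept-to-facet table; in the VLO this role is played by facetConcepts.xml.
CONCEPT_TO_FACET: Dict[str, str] = {
    "concept:language": "language",
    "concept:country": "country",
}

# Hypothetical per-facet normalisation tables; in the VLO these are separate text files.
VALUE_NORMALISATION: Dict[str, Dict[str, str]] = {
    "language": {"eng": "English", "en": "English", "deu": "German"},
}


def map_to_facet(concept: str) -> Optional[str]:
    """Return the facet for a concept, or None if the concept is unmapped."""
    return CONCEPT_TO_FACET.get(concept)


def normalise_value(facet: str, raw: str) -> str:
    """Look up the trimmed, lower-cased value; fall back to the trimmed raw value."""
    table = VALUE_NORMALISATION.get(facet, {})
    cleaned = raw.strip()
    return table.get(cleaned.lower(), cleaned)


def index_record(record: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Turn {concept: [raw values]} into {facet: [sorted normalised values]}."""
    facets: Dict[str, set] = {}
    for concept, values in record.items():
        facet = map_to_facet(concept)
        if facet is None:
            continue  # unmapped concepts are skipped in this sketch
        bucket = facets.setdefault(facet, set())
        for value in values:
            bucket.add(normalise_value(facet, value))
    return {facet: sorted(values) for facet, values in facets.items()}
```

Keeping both tables as plain data mirrors the point made in the text: the mapping behaviour is driven entirely by files that curators edit, which is why an interface for editing them safely would reduce manual errors.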
    1818
     19Known issues and potential solutions:
     20
     21||= When? = ||= What? =||= Solutions? =||
     22
     23The following sections will try to resolve the issues above, implementing the solutions.
    1924
    2025== Optimised VLO data workflow ==
    2126
    2227This section highlights the two main development tasks illustrated in Figure 2: 1) the VLO Dashboard data management system and 2) an enhanced MD authoring tool. In terms of implementation, it seems most natural to ask the VLO central developers to work on 1) and the CLARIN centres to work on 2), given the current division of responsibilities between local and central development, so that the two can be developed in parallel where feasible.
     28
     29[[Image()]]
     30
     31{{{Figure 2 - VLO Dashboard data management system and extended CLARIN centre MD authoring tool
     32(VLO Dashboard will manage the whole data ingestion and publication pipeline, communicating with extra CLARIN services.
     33MD authoring tool will connect to the extra CLARIN services)}}}
    2334
    2435
     
    5061* Send an email to the data provider (e.g. a data quality report)
    5162
    52 
    53 
     63[[Image()]]
    5464{{{Figure 3. Dashboard overview of data ingestion process}}}
    5565
     
    6474
    6575
     76[[Image()]]
    6677{{{Figure 4. Concept mapping interface in line with the XML code structure}}}
    6778
    68 
     79[[Image()]]
    6980{{{Figure 5. Value mapping and normalisation interface in line with the XML code structure}}}
    7081
     
    93104
    94105
    95 It is also very important that all technical implementation in this document follows the VLO guidelines and recommendations, which indicate what needs to be done to organise and manage the metadata for the sake of the end users.
     106It is also very important that all technical implementation in this document follows '''the VLO guidelines and recommendations''', which indicate what needs to be done to organise and manage the metadata for the sake of the end users.
    96107
    97108