Changes between Version 18 and Version 19 of VLO/CMDI data workflow framework
- Timestamp: 11/11/15 10:14:02
Legend:
- Unmodified
- Added
- Removed
- Modified
VLO/CMDI data workflow framework
v18 v19
17 17 OLAC and CMDI are the two formats that can be imported into the VLO environment; the former is converted to CMDI by a predefined mapping. Once the CMDI is ready, it is ingested into the Solr/Lucene index. Ingestion is governed by a set of configuration files: facetConcepts.xml, which maps elements to facets (via concepts), and a set of text files defining the normalisation of values. These files are the essence of the CMDI-VLO facet mapping and are, in principle, edited manually by the VLO curators. The processed data is then indexed and published seamlessly on the VLO website, where end users can browse and search it. The VLO curators have difficulty controlling data quality, because they have to manually edit the raw files (XML and CSV alike) for concept mapping, value mapping and normalisation, in conjunction with the external CLARIN services. They also need to examine the results on the public website to check data integrity.
18 18
+ 19 Known issues and potential solutions:
+ 20
+ 21 ||= When? =||= What? =||= Solutions? =||
+ 22
+ 23 The following sections address the issues above and implement the proposed solutions.
19 24
20 25 == Optimised VLO data workflow ==
21 26
22 27 This section highlights the two main development tasks illustrated in Figure 2: 1) the VLO Dashboard data management system and 2) the enhanced MD authoring tool. In terms of implementation, it seems most natural to ask the VLO central developers to work on 1) and the CLARIN centres to work on 2), given the current division of responsibilities between local and central development, so that the two can be developed in parallel where feasible.
+ 28
+ 29 [[Image()]]
+ 30
+ 31 {{{Figure 2 - VLO Dashboard data management system and extended CLARIN centre MD authoring tool
+ 32 (VLO Dashboard will manage the whole data ingestion and publication pipeline, communicating with extra CLARIN services.
+ 33 MD authoring tool will connect to the extra CLARIN services)}}}
23 34
24 35
… …
50 61 * Send an email to the data provider (e.g. a data quality report)
51 62
- 52
53 63 [[Image()]]
54 64 {{{Figure 3. Dashboard overview of data ingestion process}}}
55 65
… …
64 74
65 75
+ 76 [[Image()]]
66 77 {{{Figure 4. Concept mapping interface in line with the XML code structure}}}
67 78
68 79 [[Image()]]
69 80 {{{Figure 5. Value mapping and normalisation interface in line with the XML code structure}}}
70 81
… …
93 104
94 105
- 95 It is also very important that all technical implementation in this document follows the VLO guidelines and recommendations, which indicate what is to be done to organise and manage the metadata for the sake of the end users.
+ 106 It is also very important that all technical implementation in this document follows '''the VLO guidelines and recommendations''', which indicate what is to be done to organise and manage the metadata for the sake of the end users.
96 107
97 108
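
The ingestion step described in the first paragraph of the diff (elements mapped to facets via concepts, then values normalised) can be sketched roughly as below. This is only an illustration, not the VLO importer: the concept identifiers, the facet names, the normalisation entries and the function `to_facets` are all hypothetical stand-ins, and the real structure of facetConcepts.xml and of the normalisation text files is not reproduced here.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for facetConcepts.xml: concept identifier -> facet name.
CONCEPT_TO_FACET: Dict[str, str] = {
    "concept:languageName": "language",
    "concept:resourceType": "resourceType",
}

# Hypothetical stand-in for the value-normalisation text files:
# facet name -> {raw value (lower-cased): normalised value}.
NORMALISATION: Dict[str, Dict[str, str]] = {
    "language": {"eng": "English", "english": "English", "nld": "Dutch"},
}


def to_facets(elements: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map (concept, value) pairs of one metadata record to normalised facet values."""
    facets: Dict[str, List[str]] = {}
    for concept, raw in elements:
        facet = CONCEPT_TO_FACET.get(concept)
        if facet is None:
            continue  # the element's concept is not mapped to any facet
        value = raw.strip()
        # Fall back to the cleaned raw value when no normalisation rule exists.
        value = NORMALISATION.get(facet, {}).get(value.lower(), value)
        bucket = facets.setdefault(facet, [])
        if value not in bucket:
            bucket.append(value)  # keep each facet value once, in first-seen order
    return facets


# Assumed sample record, for illustration only.
record = [
    ("concept:languageName", "eng"),
    ("concept:languageName", "English "),
    ("concept:resourceType", "Corpus"),
    ("concept:unmapped", "ignored"),
]
print(to_facets(record))
```

Keeping the two lookups as plain tables mirrors the situation described above, where curators edit mapping and normalisation files by hand; a dashboard would presumably edit the same tables through an interface instead.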