Changes between Version 5 and Version 6 of VLO/CMDI data workflow framework
- Timestamp: 11/05/15 14:38:48
VLO/CMDI data workflow framework
== Current VLO data workflow ==
{{{(may not be accurate. to be precisely produced soon)}}}

Figure 1 illustrates the current state of the data workflow. (It may not be 100% accurate, but will be modified later if needed.) It is well established, but not optimised. A data provider may use a metadata authoring tool hosted at one of the CLARIN centres; typical examples are ARBIL in the Netherlands, COMEDI developed in Norway, and the submission form of DSpace as developed in the Czech Republic. Such a tool provides an easy-to-use web GUI where a CMDI profile can be imported or generated and metadata records can be created. Each of the tools integrates more or less tightly with the underlying repository, where the metadata is stored together with the actual resources in one digital object. The metadata is exposed via an OAI-PMH endpoint, from where it is fetched by the VLO harvester on a regular basis. However, while the authoring tools try to provide local control over the quality of the metadata (offering custom auto-complete functionality and various consistency checks), a common, formal and rigorous mechanism for the authoring tools to control metadata quality is lacking, which the VLO team is struggling to cope with. These applications have only a limited ability to synchronise and interoperate with the four extra CLARIN services, namely the Centre Registry, Component Registry, CLAVAS, and the CCR. In particular, CLAVAS is not used as an authoritative source of controlled vocabularies. At the moment, the data providers are required to make some effort to improve the metadata quality through consultation.

…

== Phase I - VLO Dashboard data management system and extended CLARIN centre MD authoring tool ==

{{{VLO Dashboard will manage the whole data ingestion and publication pipeline, communicating with extra CLARIN services. The MD authoring tool will connect to the extra CLARIN services.}}}
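To make the local quality control that the extended authoring tool is meant to formalise more concrete, here is a minimal sketch of the kind of consistency check a tool like ARBIL or COMEDI might run on a record before it is harvested. It assumes the CMDI 1.1 envelope namespace; the profile ID and resource URL are made-up placeholders, and real tools check far more than this.

```python
import xml.etree.ElementTree as ET

# CMDI 1.1 envelope namespace (an assumption; later CMDI versions use
# versioned namespace URIs).
CMD_NS = "{http://www.clarin.eu/cmd/}"

def check_record(cmdi_xml):
    """Return a list of problems found in a CMDI record (empty if none).

    Only an illustrative subset of what an authoring tool might verify
    locally before the record is exposed for harvesting.
    """
    problems = []
    root = ET.fromstring(cmdi_xml)
    # Every record should declare which profile it instantiates.
    if not root.findtext(f"{CMD_NS}Header/{CMD_NS}MdProfile"):
        problems.append("missing MdProfile in Header")
    # Metadata without any resource proxies points at nothing.
    proxies = root.findall(
        f"{CMD_NS}Resources/{CMD_NS}ResourceProxyList/{CMD_NS}ResourceProxy")
    if not proxies:
        problems.append("no ResourceProxy entries")
    return problems

# The profile ID and resource URL below are fabricated placeholders.
sample = """<?xml version="1.0"?>
<CMD xmlns="http://www.clarin.eu/cmd/">
  <Header><MdProfile>clarin.eu:cr1:p_0000000000</MdProfile></Header>
  <Resources>
    <ResourceProxyList>
      <ResourceProxy id="r1">
        <ResourceType>Resource</ResourceType>
        <ResourceRef>https://example.org/resource</ResourceRef>
      </ResourceProxy>
    </ResourceProxyList>
  </Resources>
</CMD>"""

print(check_record(sample))  # → []
```

A common, formal mechanism would pull such checks out of the individual tools and run them uniformly, which is what the Dashboard pipeline is positioned to do.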
[[Image()]]

…

== Phase II - Extended Dissemination Information Package (DIP) ==
{{{OAI-PMH repository / RDF triple store / LOD API}}}

After establishing a best-practice ingestion data workflow, this phase is rather simple. It can be skipped if it is not needed, or incorporated into another phase. In essence, it will add a more sophisticated Dissemination Information Package (DIP) module where data records will be stored in more advanced format(s) and could be made accessible via different methods.

…

== Phase III - VLO central service ==
{{{Integration of CLARIN centre services (MD authoring tool) and extra services (Centre Registry, Component Registry, CLAVAS, CCR, and Persistent Identifier) with one Dashboard VLO data management system}}}

This is a rather ambitious plan. Phase III is to eliminate redundant services within the VLO framework, and possibly the best solution is to introduce a cloud-like central environment where all services are served centrally, and the CLARIN partners will access it via two major entry points to manage data in the whole system.
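As a concrete footnote to the OAI-PMH access path that recurs throughout the phases above (the harvester in the current workflow, the OAI-PMH repository option in Phase II), the sketch below shows how a ListRecords request is formed and how identifiers are pulled from a response. The endpoint URL, the record identifiers, and the `cmdi` metadataPrefix are assumptions for illustration; each centre advertises its own endpoint and supported prefixes.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# OAI-PMH 2.0 response namespace.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def list_records_url(endpoint, metadata_prefix=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    Per the protocol, resumptionToken (used to page through large
    result sets) is an exclusive argument and replaces metadataPrefix.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return endpoint + "?" + urlencode(params)

def record_identifiers(response_xml):
    """Extract the record identifiers from a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.iter(OAI_NS + "identifier")]

# Abridged example response; a real one also carries datestamps, the
# CMDI payload inside <metadata>, and possibly a resumptionToken.
sample = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:rec1</identifier></header></record>
    <record><header><identifier>oai:example.org:rec2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""

# The endpoint URL and the "cmdi" metadataPrefix are placeholders.
print(list_records_url("https://repo.example.org/oai", "cmdi"))
# → https://repo.example.org/oai?verb=ListRecords&metadataPrefix=cmdi
print(record_identifiers(sample))
# → ['oai:example.org:rec1', 'oai:example.org:rec2']
```

Whatever central environment Phase III converges on, this request/response shape stays the same, which is why the harvesting side of the pipeline is comparatively easy to centralise.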