Changes between Version 5 and Version 6 of VLO/CMDI data workflow framework


Timestamp: 11/05/15 14:38:48 (9 years ago)
Author: go.sugimoto@oeaw.ac.at
Comment: --

  • VLO/CMDI data workflow framework

    v5  v6
    12  12
    13  13  == Current VLO data workflow ==
    14      (may not be accurate. to be precisely produced soon)
        14  {{{(may not be accurate. to be precisely produced soon)}}}
    15  15
    16  16  Figure 1 illustrates the current state of the data workflow. (It may not be 100% accurate, but will be revised later if needed.) The workflow is well established, but not optimised. A data provider may use a metadata authoring tool hosted at one of the CLARIN centres; typical examples are ARBIL in the Netherlands, COMEDI developed in Norway, and the submission form of DSpace as developed in the Czech Republic. Each tool provides an easy-to-use web GUI where a CMDI profile can be imported or generated and metadata records can be created, and each integrates more or less tightly with the underlying repository, where the metadata is stored together with the actual resources in one digital object. The metadata is exposed via an OAI-PMH endpoint, from where it is fetched by the VLO harvester on a regular basis. However, while the authoring tools try to provide local control over the quality of the metadata (offering custom auto-complete functionality and various consistency checks), a common, formal and rigorous mechanism for controlling metadata quality is lacking, which the VLO team is struggling to cope with. These tools also have only a limited ability to synchronise and interoperate with the four extra CLARIN services, namely the Centre Registry, the Component Registry, CLAVAS, and the CCR. In particular, CLAVAS is not used as an authoritative source of controlled vocabularies. At the moment, data providers are required to make some effort to improve metadata quality through consultation.
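As a minimal sketch of the harvesting step described above: the endpoint URL and the {{{cmdi}}} metadata prefix below are placeholder assumptions rather than any particular centre's configuration, and a real harvester would also follow resumption tokens, harvest incrementally and handle protocol errors.

{{{
#!python
# Minimal OAI-PMH ListRecords request for CMDI metadata (illustrative sketch;
# the endpoint and metadata prefix are placeholders, not a real configuration).
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ENDPOINT = "https://repository.example.org/oai"    # a centre's OAI-PMH endpoint
OAI = "{http://www.openarchives.org/OAI/2.0/}"     # OAI-PMH XML namespace

def list_records(metadata_prefix="cmdi"):
    """Fetch one page of records; a production harvester would follow resumptionToken."""
    query = urllib.parse.urlencode({"verb": "ListRecords",
                                    "metadataPrefix": metadata_prefix})
    with urllib.request.urlopen(f"{ENDPOINT}?{query}") as response:
        tree = ET.parse(response)
    for record in tree.iter(f"{OAI}record"):
        header = record.find(f"{OAI}header")
        yield (header.findtext(f"{OAI}identifier"),
               header.findtext(f"{OAI}datestamp"),
               record.find(f"{OAI}metadata"))       # element wrapping the CMDI payload

if __name__ == "__main__":
    for identifier, datestamp, metadata in list_records():
        print(identifier, datestamp)
}}}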
     
    23  23  == Phase I - VLO Dashboard data management system and extended CLARIN centre MD authoring tool ==
    24  24
    25      VLO Dashboard will manage the whole data ingestion and publication pipeline, communicating with the extra CLARIN services. The MD authoring tool will connect to the extra CLARIN services.
        25  {{{VLO Dashboard will manage the whole data ingestion and publication pipeline, communicating with the extra CLARIN services. The MD authoring tool will connect to the extra CLARIN services}}}
        26
    26  27  [[Image()]]
    27  28
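To make "communicating with the extra CLARIN services" slightly more concrete, here is a rough sketch of pre-ingestion checks such a Dashboard could run, verifying that a record's CMDI profile is known to the Component Registry and that a field value appears in a CLAVAS vocabulary. All base URLs, paths and response formats below are hypothetical illustrations, not the actual service APIs.

{{{
#!python
# Illustrative pre-ingestion checks for the VLO Dashboard pipeline.
# The service base URLs and the JSON response shape are hypothetical.
import json
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

COMPONENT_REGISTRY = "https://registry.example.org/profiles"   # placeholder URL
CLAVAS = "https://vocabularies.example.org/schemes"            # placeholder URL

def profile_is_registered(profile_id):
    """Accept a record only if its CMDI profile is known to the Component Registry."""
    try:
        with urllib.request.urlopen(f"{COMPONENT_REGISTRY}/{profile_id}") as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        return False

def value_in_vocabulary(scheme, value):
    """Check a metadata value against a CLAVAS-style controlled vocabulary."""
    with urllib.request.urlopen(f"{CLAVAS}/{scheme}") as resp:
        labels = json.load(resp)        # assumed: a flat JSON list of labels
    return value in labels

def precheck(cmdi_xml):
    """Return a small quality report the Dashboard could show to the provider."""
    root = ET.fromstring(cmdi_xml)
    profile_id = root.findtext(".//{*}MdProfile")   # profile link in the CMDI header
    return {
        "profile_registered": profile_is_registered(profile_id),
        # Further checks (CCR concept links, per-field vocabularies) would go here.
    }
}}}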
     
    79  80
    80  81  == Phase II - Extended Dissemination Information Package (DIP) ==
    81      OAI-PMH repository / RDF triple store / LOD API
        82  {{{OAI-PMH repository / RDF triple store / LOD API}}}
    82  83
    83  84  After establishing a best-practice ingestion data workflow, this phase is rather simple. It can be skipped if it is not needed, or incorporated into another phase. In essence, it will add a more sophisticated Dissemination Information Package (DIP) module in which data records will be stored in more advanced format(s) and can be accessed via different methods.
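As one possible illustration of "more advanced format(s)", the sketch below maps a few record-level fields onto RDF triples that could be loaded into the triple store behind a LOD API. The Dublin Core mapping, the record URI and the example values are assumptions for illustration, not a prescribed CMDI-to-RDF conversion.

{{{
#!python
# Sketch of turning a harvested record into RDF for the extended DIP.
# The Dublin Core mapping and the example values are illustrative assumptions.
from rdflib import Graph, Literal, Namespace, URIRef

DCT = Namespace("http://purl.org/dc/terms/")

g = Graph()
g.bind("dct", DCT)

record = URIRef("https://example.org/record/placeholder-pid")   # record PID (placeholder)
g.add((record, DCT.title, Literal("Example corpus")))
g.add((record, DCT.language, Literal("nld")))
g.add((record, DCT.publisher, Literal("Example CLARIN centre")))

# The Turtle output could be bulk-loaded into the triple store and
# exposed record-by-record through the LOD API.
print(g.serialize(format="turtle"))
}}}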
     
    87  88
    88  89  == Phase III - VLO central service ==
    89
    90      Integration of CLARIN centre services (MD authoring tool) and extra services (Centre Registry, Component Registry, CLAVAS, CCR, and Persistent Identifier) with one Dashboard VLO data management system
        90  {{{Integration of CLARIN centre services (MD authoring tool) and extra services (Centre Registry, Component Registry, CLAVAS, CCR, and Persistent Identifier) with one Dashboard VLO data management system}}}
    91  91
    92  92  This is a rather ambitious plan. Phase III is to eliminate redundant services within the VLO framework, and possibly the best solution is to introduce a cloud-like central environment where all services are served centrally and the CLARIN partners access them via two major entry points to manage data in the whole system.
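Purely as an illustration of the "two major entry points" idea, the sketch below routes an authoring-side and a dashboard-side entry point to centrally hosted services; every name and URL in it is hypothetical and is only meant to show the shape of such a central routing layer.

{{{
#!python
# Hypothetical routing table for a centralised VLO environment: two entry
# points in front of centrally hosted services. All names and URLs are invented.
CENTRAL_SERVICES = {
    "centre-registry":    "https://central.example.org/centre-registry",
    "component-registry": "https://central.example.org/component-registry",
    "clavas":             "https://central.example.org/clavas",
    "ccr":                "https://central.example.org/ccr",
    "pid":                "https://central.example.org/pid",
    "dashboard":          "https://central.example.org/dashboard",
}

# Which central services each entry point may reach (illustrative only).
ENTRY_POINTS = {
    "md-authoring": ["component-registry", "clavas", "ccr", "pid"],
    "dashboard":    ["centre-registry", "component-registry", "clavas", "ccr",
                     "pid", "dashboard"],
}

def resolve(entry_point, service):
    """Return the central URL for a service if this entry point may use it."""
    if service not in ENTRY_POINTS.get(entry_point, []):
        raise LookupError(f"{entry_point!r} has no route to {service!r}")
    return CENTRAL_SERVICES[service]

print(resolve("md-authoring", "clavas"))
}}}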