Changes between Version 12 and Version 13 of VLO/CMDI data workflow framework


Timestamp: 11/06/15 13:52:38
Author: go.sugimoto@oeaw.ac.at

{{{(may not be accurate; to be produced precisely soon)}}}

- [[Image(Copy of Current VLO data workflow.png)]]
+ [[Image(VLO data workflow current .png)]]

Figure 1 illustrates the current state of the data workflow. (It may not be 100% accurate, but will be modified later if needed.) The workflow is well established, but not optimised. A data provider may use a metadata authoring tool hosted at one of the CLARIN centres; typical examples are ARBIL in the Netherlands, COMEDI developed in Norway, and the submission form of DSpace as developed in the Czech Republic. Such a tool provides an easy-to-use GUI web interface in which a CMDI profile can be imported or generated and metadata records can be created. Each of the tools integrates more or less tightly with the underlying repository, where the metadata is stored together with the actual resources in one digital object. The metadata is exposed via an OAI-PMH endpoint, from which it is fetched by the VLO harvester on a regular basis. However, while the authoring tools try to provide local control over metadata quality (offering custom auto-complete functionality and various consistency checks), they lack a common, formal and rigorous quality-control mechanism, which leaves the VLO team struggling to cope. The ability of these applications to synchronise and interoperate with the four extra CLARIN services, namely the Centre Registry, the Component Registry, CLAVAS, and the CCR, is also limited; in particular, CLAVAS is not used as an authoritative source of controlled vocabularies. At the moment, data providers are required to make some effort to improve metadata quality through consultation.
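
As a minimal sketch of the harvesting step described above, the following Python fragment fetches CMDI records from a centre's OAI-PMH endpoint and follows resumption tokens, as a harvester must. The endpoint URL is a placeholder; `cmdi` is the metadata prefix commonly advertised by CLARIN repositories, but a real harvester should confirm it via `ListMetadataFormats`.

{{{#!python
# Minimal sketch of the harvesting step: fetch CMDI records from an
# OAI-PMH endpoint and follow resumption tokens, as the VLO harvester
# does on a regular basis. ENDPOINT is a hypothetical placeholder.
import requests
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
ENDPOINT = "https://repository.example.org/oai"  # placeholder endpoint

def harvest_cmdi(endpoint):
    """Yield (identifier, metadata element) pairs for every record."""
    params = {"verb": "ListRecords", "metadataPrefix": "cmdi"}
    while True:
        resp = requests.get(endpoint, params=params, timeout=30)
        resp.raise_for_status()
        root = ET.fromstring(resp.content)
        for record in root.iter(OAI + "record"):
            header = record.find(OAI + "header")
            identifier = header.findtext(OAI + "identifier")
            yield identifier, record.find(OAI + "metadata")
        # An empty or absent resumptionToken means the list is complete.
        token = root.find(OAI + "ListRecords/" + OAI + "resumptionToken")
        if token is None or not (token.text or "").strip():
            break
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

for identifier, metadata in harvest_cmdi(ENDPOINT):
    print(identifier)
}}}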
     
{{{The VLO Dashboard will manage the whole data ingestion and publication pipeline, communicating with the extra CLARIN services. The MD authoring tool will connect to the extra CLARIN services}}}

- [[Image(Go - VLO data workflow(3).png)]]
+ [[Image(VLO data workflow Phase 1 .png)]]

In this phase, the main tasks are to develop 1) the VLO Dashboard data management system and 2) an enhanced MD authoring tool. In terms of implementation, it seems most natural to ask the central VLO developers to work on 1) and the CLARIN centres to work on 2), simply because of the current division of responsibilities between central and local development; the two can then be developed in parallel, where feasible.
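
To make the Dashboard side of this split more concrete, here is a rough Python sketch of how one ingestion stage could be composed: each record is checked against a profile list (standing in for a Component Registry lookup) and its vocabulary values are normalised (standing in for a CLAVAS call). Every name, profile ID, and mapping here is an illustrative assumption, not an actual design.

{{{#!python
# Rough sketch of one VLO Dashboard ingestion stage. The profile check
# stands in for a Component Registry lookup and the language table for a
# CLAVAS call; all names and IDs are made up for illustration.
from dataclasses import dataclass, field

@dataclass
class Record:
    identifier: str
    profile_id: str
    fields: dict
    issues: list = field(default_factory=list)

def validate_profile(record, known_profiles):
    # A real Dashboard would consult the Component Registry here.
    if record.profile_id not in known_profiles:
        record.issues.append("unknown profile: " + record.profile_id)
    return record

def normalise_language(record, clavas_lookup):
    # A real Dashboard would query CLAVAS as the authoritative source.
    lang = record.fields.get("language")
    if lang in clavas_lookup:
        record.fields["language"] = clavas_lookup[lang]
    return record

def ingest(records, known_profiles, clavas_lookup):
    for rec in records:
        rec = normalise_language(validate_profile(rec, known_profiles), clavas_lookup)
        if rec.issues:
            print("held back", rec.identifier, rec.issues)  # flag for curation
        else:
            print("published", rec.identifier)              # hand over to VLO

# Hypothetical profile ID and vocabulary mapping, for demonstration only.
records = [Record("rec-001", "clarin.eu:cr1:p_demo", {"language": "dut"})]
ingest(records, {"clarin.eu:cr1:p_demo"}, {"dut": "nld"})
}}}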
     
{{{OAI-PMH repository / RDF triple store / LOD API}}}

- [[Image(Go - VLO data workflow(2).png)]]
+ [[Image(VLO data workflow Phase 2 .png)]]

After a best-practice ingestion data workflow has been established, this phase is rather simple. It can be skipped if it is not needed, or incorporated into another phase. In essence, it adds a more sophisticated Dissemination Information Package (DIP) module, in which data records will be stored in more advanced format(s) and could be made accessible via different methods.
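
As an illustration of what the DIP module could produce, the sketch below (assuming the `rdflib` Python library) re-expresses a harvested record as RDF triples ready for a triple store or LOD API. The Dublin Core property mapping and the base URI are assumptions chosen for the example, not a fixed design.

{{{#!python
# Sketch of a DIP export: re-express one harvested record as RDF triples
# that a triple store or LOD API could serve. Requires rdflib; the
# property mapping and base URI are illustrative assumptions.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCTERMS, RDF

BASE = "https://vlo.example.org/record/"  # placeholder namespace

def record_to_rdf(identifier, fields):
    g = Graph()
    subject = URIRef(BASE + identifier)
    g.add((subject, RDF.type, DCTERMS.BibliographicResource))
    if "title" in fields:
        g.add((subject, DCTERMS.title, Literal(fields["title"])))
    if "language" in fields:
        g.add((subject, DCTERMS.language, Literal(fields["language"])))
    return g

g = record_to_rdf("rec-001", {"title": "Sample corpus", "language": "nld"})
print(g.serialize(format="turtle"))
}}}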
     
{{{Integration of CLARIN centre services (MD authoring tool) and extra services (Centre Registry, Component Registry, CLAVAS, CCR, and Persistent Identifier) with one VLO Dashboard data management system}}}

- [[Image(Copy of Go - VLO data workflow_longterm.png)]]
+ [[Image(VLO data workflow Phase 3 .png)]]

This is a rather ambitious plan and should be regarded as a potential direction. Although it may be a long way off, it is also good to aim for a more innovative future; we can then look back from that futuristic viewpoint and adjust the course of our long-term development strategies.