Responsible for this page: Twan Goosen. Last content check: 22-12-2014


IMDI2CMDI

This CMDI transformation converts IMDI metadata documents into valid CMD documents.

From   To                                    Notes
IMDI   lat-corpus CMD profile                largely lossless‡
IMDI   lat-session CMD profile               largely lossless‡
IMDI   Sign Language Session CMD profiles    largely lossless‡

‡ Retention of all information is not guaranteed. Tests on the TLA archive showed that all essential values were retained, although links to external vocabularies and the vocabulary specifications are generally omitted.


Status

testing


Contact


Contents

  1. Status
  2. Contact
  3. Contents
  4. Getting the code
  5. Description
    1. Usage information
    2. Questions and Answers


Getting the code

  • ISLE2CLARIN
    • Source code on GitHub; clone from: https://github.com/TheLanguageArchive/ISLE2CLARIN.git
  • MetadataTranslator stylesheets
    • Source code on GitHub; clone from: https://github.com/TheLanguageArchive/MetadataTranslator.git

Description

The Language Archive (TLA) has developed a tool to batch convert ISLE metadata (IMDI) files to Component Metadata (CMDI), primarily to migrate its own archive to CMDI. This tool, ISLE2CLARIN, performs the conversion using an XSLT stylesheet and can optionally validate both the IMDI input and the CMDI output. It can also be configured to skip certain files.
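As an illustration of the kind of pipeline described above (an XSLT transformation with optional validation of input and output), here is a minimal Java sketch that uses only the standard JAXP APIs (javax.xml.transform and javax.xml.validation). It is not ISLE2CLARIN's actual code: the class and method names are hypothetical, passing a null schema to skip a validation step is an assumption of the sketch, and batch processing and file skipping are left out.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical sketch of an "XSLT + optional validation" step; not ISLE2CLARIN's real code.
public class XsltConversionSketch {

    /** Compiles an XML Schema (e.g. an IMDI or CMDI XSD) from its source text. */
    public static Schema loadSchema(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("Invalid schema", e);
        }
    }

    /** Validates a document if a schema is given; a null schema means "skip validation". */
    static void validateIfRequested(String xml, Schema schema, String label) {
        if (schema == null) {
            return;
        }
        try {
            // The default Validator error handling throws on validation errors.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(label + " failed validation", e);
        }
    }

    /** Applies the stylesheet to one input document, validating input and output on request. */
    public static String convert(String inputXml, String stylesheetXml,
                                 Schema inputSchema, Schema outputSchema) {
        validateIfRequested(inputXml, inputSchema, "input");
        String result;
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheetXml)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(inputXml)), new StreamResult(out));
            result = out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
        validateIfRequested(result, outputSchema, "output");
        return result;
    }
}
```

A caller would load the real imdi2cmdi stylesheet and the IMDI and CMDI schemas from their respective locations, which are not given here. Validation failures surface as an IllegalStateException, so a batch driver could log them and continue with the next file.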

The stylesheet that defines the actual transformation lives in a separate project, the MetadataTranslator. This project contains two parts: a REST service, which can be run in a servlet container to translate between IMDI and CMDI on the fly (bi-directionally for selected profiles), and a library of stylesheets that define the underlying transformations. The ISLE2CLARIN tool uses the same stylesheet library.

An IMDI file is transformed into an instance of one of several CMDI profiles, depending on the 'profile' of the IMDI document. The main distinction is between IMDI Corpus and IMDI Session files. In addition, a number of specialised Session profiles map to distinct CMDI profiles:

  • CGN (Corpus Gesproken Nederlands, the Spoken Dutch Corpus)
  • CNGT (Corpus NGT, Dutch Sign Language)
  • DBD (Dutch Bilingual Database)
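The profile dispatch can be pictured with the following Java sketch. It assumes that the IMDI root element (METATRANSCRIPT) carries a Type attribute with values such as CORPUS and SESSION. How the stylesheets recognise the specialised CGN, CNGT and DBD sessions is not described on this page, so the sketch takes that as a hypothetical specialisation parameter; the enum constants merely stand in for the target CMDI profiles and are not their real names or identifiers.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical sketch of choosing a target CMDI profile; all names are illustrative only.
public class ProfileSelectionSketch {

    /** Placeholder targets; the real CMDI profile identifiers are not listed here. */
    public enum TargetProfile { LAT_CORPUS, LAT_SESSION, CGN_SESSION, CNGT_SESSION, DBD_SESSION }

    /**
     * Picks a target profile from the Type attribute of the IMDI root element.
     * The specialisation hint is hypothetical: null means a generic Session.
     */
    public static TargetProfile select(String imdiXml, String specialisation) {
        Element root;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(imdiXml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Not a well-formed XML document", e);
        }
        String type = root.getAttribute("Type"); // "" when the attribute is absent
        switch (type) {
            case "CORPUS":
                return TargetProfile.LAT_CORPUS;
            case "SESSION":
                if (specialisation == null) {
                    return TargetProfile.LAT_SESSION;
                }
                switch (specialisation) {
                    case "CGN":
                        return TargetProfile.CGN_SESSION;
                    case "CNGT":
                        return TargetProfile.CNGT_SESSION;
                    case "DBD":
                        return TargetProfile.DBD_SESSION;
                    default:
                        throw new IllegalArgumentException("Unknown specialisation: " + specialisation);
                }
            default:
                throw new IllegalArgumentException("Unsupported IMDI type: " + type);
        }
    }
}
```

Keeping the decision in a single function makes it easy to see that every IMDI document ends up in exactly one target profile, or is rejected explicitly.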

The imdi2cmdi stylesheet depends on a set of language code mappings: for each language code it encounters, it attempts to output the ISO 639-3 representation.
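A minimal Java sketch of such a mapping, assuming IMDI language identifiers of the form SCHEME:code (for example ISO639-3:nld or ISO639-2:dut). The class, the method and the tiny mapping table are hypothetical excerpts for illustration only; the real mapping files used by the stylesheet are much larger and may treat unmapped codes differently.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of language code normalisation; not the real imdi2cmdi mapping.
public class LanguageCodeSketch {

    // Illustrative excerpt: ISO 639-2/B (bibliographic) codes that differ from ISO 639-3.
    private static final Map<String, String> ISO639_2B_TO_3 = Map.of(
            "dut", "nld",
            "ger", "deu",
            "fre", "fra",
            "chi", "zho");

    /** Returns the ISO 639-3 code for an assumed "SCHEME:code" identifier, or empty if unknown. */
    public static Optional<String> toIso6393(String imdiLanguageId) {
        if (imdiLanguageId == null) {
            return Optional.empty();
        }
        int colon = imdiLanguageId.indexOf(':');
        if (colon < 0) {
            return Optional.empty(); // no scheme prefix: cannot interpret the code
        }
        String scheme = imdiLanguageId.substring(0, colon).trim().toUpperCase(Locale.ROOT);
        String code = imdiLanguageId.substring(colon + 1).trim().toLowerCase(Locale.ROOT);
        switch (scheme) {
            case "ISO639-3": {
                // Already in the target scheme; only a basic length check is done.
                return code.length() == 3 ? Optional.of(code) : Optional.empty();
            }
            case "ISO639-2": {
                // Bibliographic codes are looked up; other three-letter codes are assumed
                // to be valid ISO 639-3 codes (a simplification: collective codes are not).
                String mapped = ISO639_2B_TO_3.get(code);
                if (mapped != null) {
                    return Optional.of(mapped);
                }
                return code.length() == 3 ? Optional.of(code) : Optional.empty();
            }
            default:
                // Other schemes (e.g. RFC 1766 / SIL identifiers) would need their own tables.
                return Optional.empty();
        }
    }
}
```

Returning an Optional makes the "no ISO 639-3 equivalent found" case explicit instead of silently passing the original code through.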

The transformation from IMDI to CMDI is not guaranteed to be lossless. However, it has been tested on a representative selection of TLA's IMDI archive and was found to retain all essential values. Some information is discarded by design: for example, the links to and specifications of external vocabularies are not kept. The embedded history information is updated during the transformation.

TLA has also developed a set of CMDI2IMDI transformations, which are likewise available as part of the MetadataTranslator. These are provided 'as is', and the transformations are likely to be lossy.


Usage information

The easiest way to convert one or more IMDI files to CMDI is by running the ISLE2CLARIN executable jar:

java -jar isle2clarin.jar <DIR with IMDI files>

For more information and options, see the documentation of ISLE2CLARIN.

Questions and Answers

If you have a question about the IMDI2CMDI transformation, please contact Alexander König.

Last modified on 12/22/14 15:05:58