56 | | This is a version of {{{imdi2clarin.xsl}}} that batch processes a whole directory structure of IMDI files. Call it from the command line like this: |
| 56 | [http://tla.mpi.nl The Language Archive] has developed a tool to batch convert ISLE metadata (IMDI) files to Component Metadata (CMDI), primarily to migrate their own archive to CMDI. This tool, [https://github.com/TheLanguageArchive/ISLE2CLARIN ISLE2CLARIN], performs the conversion on the basis of an XSLT stylesheet and can optionally validate both the IMDI input and the CMDI output. It can also be configured to skip certain files. |
| 57 | |
| 58 | The [https://github.com/TheLanguageArchive/MetadataTranslator/tree/master/Translator/src/main/resources/templates/imdi2cmdi stylesheet] that defines the actual transformation is contained in a separate project called the [https://github.com/TheLanguageArchive/MetadataTranslator MetadataTranslator]. This project contains two parts: a REST service, which can be run in a servlet container to translate between IMDI and CMDI on the fly (bi-directionally for selected profiles), and a library of stylesheets that define the underlying transformations. The ISLE2CLARIN tool uses the same stylesheet library. |
| 59 | |
| 60 | An IMDI file is transformed into an instance of one of several CMDI profiles, depending on the 'profile' of the IMDI file. The main distinction is between '''IMDI Corpus''' and '''IMDI Session'''. In addition, there are a number of specialised Session profiles that map to distinct CMDI profiles: |
| 61 | * CGN (Spoken Dutch Corpus) |
| 62 | * CNGT (Dutch Sign Language) |
| 63 | * DBD (Dutch Bilingual Database) |
| 64 | |
| 65 | The imdi2cmdi stylesheet depends on a set of language code mappings. For each language code that it encounters, it attempts to output the ISO-639-3 representation. |
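The lookup described above can be pictured with a minimal Python sketch. It is an illustration only and not the stylesheet's actual logic: the table {{{CODE_TO_ISO639_3}}} and the function {{{to_iso639_3}}} are hypothetical names, the table is a tiny hand-written excerpt, and returning nothing for an unknown code is an assumption about what "attempts" means here.

```python
# Hypothetical excerpt of a language code mapping table (assumption: the real
# mappings used by the stylesheet are far larger and stored differently).
CODE_TO_ISO639_3 = {
    "nl": "nld",   # ISO 639-1 Dutch
    "dut": "nld",  # ISO 639-2/B Dutch
    "en": "eng",   # ISO 639-1 English
    "de": "deu",   # ISO 639-1 German
    "ger": "deu",  # ISO 639-2/B German
}


def to_iso639_3(code):
    """Return the ISO 639-3 code for a known input code, or None if unmapped."""
    normalized = code.strip().lower()
    return CODE_TO_ISO639_3.get(normalized)


if __name__ == "__main__":
    for raw in ["nl", " DUT ", "xx"]:
        print(repr(raw), "->", to_iso639_3(raw))
```

How an unmapped code is actually handled in the output is not stated on this page, so the {{{None}}} result only marks the case where the attempt fails.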
| 66 | |
| 67 | The transformation from IMDI to CMDI is '''not''' guaranteed to be lossless. However, it has been tested on a representative selection of the IMDI archive of TLA and was found to retain all essential values. By design, some information is discarded: for example, the links and specifications with respect to external vocabularies are not kept. The embedded history information is updated in the transformation process. |
| 68 | |
| 69 | TLA has also developed a set of CMDI2IMDI transformations, which are available as a part of the Metadata Translator as well. These are provided 'as is' and transformations are likely to be lossy. |
| 70 | |
| 71 | ---- |
| 72 | === Usage information === |
| 73 | The easiest way to convert one or more IMDI files to CMDI is by running the ISLE2CLARIN executable jar: |
62 | | Two optional parameters can be provided: |
63 | | |
64 | | * ''collection'': a collection name can be specified for each record. This information is extrinsic to the IMDI file, so it is given as an external parameter. Omit this if you are unsure. |
65 | | * ''uri-base'': if this optional parameter is defined, the behaviour of this stylesheet changes in the following ways: If no archive handle is available for !MdSelfLink, the base URI is inserted there instead. All links (!ResourceProxy elements) that contain relative paths are resolved into absolute URIs in the context of the base URI. Omit this if you are unsure. |
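The effect of ''uri-base'' can be sketched in Python as follows. This is an illustration under stated assumptions, not the stylesheet's code: the function names {{{resolve_link}}} and {{{self_link}}} and the {{{example.org}}} URIs are hypothetical, and standard relative-reference resolution (as done by {{{urllib.parse.urljoin}}}) is assumed to match what "resolved in the context of the base URI" means.

```python
from urllib.parse import urljoin


def resolve_link(link, uri_base=None):
    """Resolve a resource link against uri_base; leave it untouched if no base is given."""
    if uri_base is None:
        return link
    # urljoin returns absolute links unchanged and resolves relative ones.
    return urljoin(uri_base, link)


def self_link(archive_handle, uri_base=None):
    """Prefer the archive handle; fall back to the base URI when there is none."""
    if archive_handle:
        return archive_handle
    # Assumption: with neither a handle nor a base URI, nothing is returned.
    return uri_base


if __name__ == "__main__":
    base = "http://example.org/corpus/"  # hypothetical base URI
    print(resolve_link("media/rec1.wav", base))
    print(resolve_link("http://other.example.org/a.wav", base))
    print(self_link(None, base))
```

With the hypothetical base above, the relative path {{{media/rec1.wav}}} becomes {{{http://example.org/corpus/media/rec1.wav}}}, while a link that is already absolute is left as it is.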
66 | | |
67 | | === Technical information === |
68 | | |
69 | | No Saxon-specific extensions are used. However, the exact syntax used by a {{{collection()}}} invocation is implementation-defined, so it might need to be adapted when using an XSLT 2.0 engine other than Saxon. |
| 78 | For more information and options, see the documentation of [https://github.com/TheLanguageArchive/ISLE2CLARIN ISLE2CLARIN]. |