non-English language names should be ignored

Some CMDI files contain both a language name and a language code, e.g.:

<LanguageName xml:lang="de">Deutsch</LanguageName>

This leads to two facet values for the same language: "German" and "Deutsch".

When the xml:lang attribute has a value other than "en" or "eng", the LanguageName should be ignored upon import.
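A rough sketch of that rule in Python, for illustration only. It is not the importer's actual code: the function name `english_language_names` is made up, and it matches elements by local name rather than through the generated XPaths, which is a simplification.

```python
import xml.etree.ElementTree as ET

# Namespace URI that the reserved "xml" prefix is always bound to.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ENGLISH_TAGS = {"en", "eng"}


def english_language_names(xml_text):
    """Return LanguageName values, skipping those explicitly tagged as non-English.

    Hypothetical helper: the name and the matching strategy are assumptions.
    """
    root = ET.fromstring(xml_text)
    names = []
    for elem in root.iter():
        # Compare the local name only, so namespaced CMDI elements match too.
        if elem.tag.rsplit("}", 1)[-1] != "LanguageName":
            continue
        lang = elem.get(XML_LANG)
        # Elements without xml:lang are kept; this rule does not cover them.
        if lang is not None and lang.lower() not in ENGLISH_TAGS:
            continue
        if elem.text and elem.text.strip():
            names.append(elem.text.strip())
    return names
```

Called on a record that holds `<LanguageName xml:lang="de">Deutsch</LanguageName>` next to an English-tagged name, only the English one is returned.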

Note: in the given example the XSD is neatly annotated with the ISOcat data category, which results in multiple XPaths. This needs some experimentation, as the situation may differ for other instances or components.

To clarify: as these XPaths are generated automatically, the solution is not completely trivial (if we don't want to break a lot of stuff).

Of which ticket is this a duplicate?

There is an additional problem in CMDI files where xml:lang is not specified.


I see several possible solutions:

  • ignore LanguageName completely when one code and one name are available
  • have a large mapping list of known language names that map to the canonical English name, e.g. Nederlands > Dutch, Deutsch > German (could be part of the import or of a curation module that runs beforehand)
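The second option could look roughly like the sketch below. The table `NAME_TO_ENGLISH` and the function `canonical_language_name` are hypothetical, and the two entries are just the examples from the list above; a real mapping would be much larger and would have to be maintained somewhere.

```python
# Hypothetical, deliberately tiny mapping; keys are case-folded native names.
NAME_TO_ENGLISH = {
    "nederlands": "Dutch",
    "deutsch": "German",
}


def canonical_language_name(name):
    """Map a known non-English language name to its English name.

    Unknown names are returned unchanged (apart from trimmed whitespace),
    so nothing is lost when the table has no entry.
    """
    cleaned = name.strip()
    return NAME_TO_ENGLISH.get(cleaned.casefold(), cleaned)
```

Passing unknown names through unchanged keeps the import from dropping languages that are missing from the table, at the cost of still showing them as separate facet values.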

comment:5 Changed 12 years ago by sanmai

I agree about ignoring LanguageName. Shouldn't such a mapping point to the language code instead of to common or canonical names? No language identifier is as canonical as an ISO 639-3 code. I would say there is only a need for an ISO 639-3 code. Any further descriptive information can be sourced from the SIL code table, and that data even changes from time to time.

For certain languages (e.g. many of the ones studied at the MPI) no ISO 639-3 identifier exists. So we need a fallback in case no code is given, but for most of the CMDI records there is such a code, and in that case the language name can be ignored.
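A minimal sketch of that "code first, name as fallback" decision, assuming the code and name have already been extracted from the record. The function `pick_language_identifier` and its return shape are invented for illustration.

```python
from typing import Optional, Tuple


def pick_language_identifier(
    code: Optional[str], name: Optional[str]
) -> Optional[Tuple[str, str]]:
    """Prefer the language code; use the name only when no code is given.

    Returns ("code", value) or ("name", value), or None if both are missing.
    Hypothetical helper, not taken from the importer.
    """
    if code and code.strip():
        # A code is present, so the language name can be ignored.
        return ("code", code.strip().lower())
    if name and name.strip():
        # Fallback for languages that have no ISO 639-3 identifier.
        return ("name", name.strip())
    return None
```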

Priority: minor → major

Boosting priority, as this can be fixed relatively easily (for the case when xml:lang is there) and more and more records result in double language entries.

Splitting the 3.1 milestone. Most open tickets go to 3.2 so that we can have a release in the short term.

Fixed in trunk (r6150)

