Opened 10 years ago

Closed 9 years ago

#554 closed defect (fixed)

Missing language information for some files

Reported by: teckart@informatik.uni-leipzig.de
Owned by: teckart
Priority: major
Milestone: VLO-3.1
Component: VLO importer
Version:
Keywords:
Cc: Twan Goosen

Change History (13)

comment:1 Changed 10 years ago by DefaultCC Plugin

Cc: Twan Goosen added

comment:2 Changed 10 years ago by Twan Goosen

Milestone: VLO-3.0

comment:3 Changed 10 years ago by Jörg Knappen

For that particular resource, the language is not tagged in the Metadata. It seems that VLO 2.18 somehow extracts the language information from the title, the subject, or the SIL identifier.

Looks like a case for metadata curation to me.

comment:4 in reply to:  3 Changed 10 years ago by Twan Goosen

Replying to j.knappen@…:

> For that particular resource, the language is not tagged in the Metadata. It seems that VLO 2.18 somehow extracts the language information from the title, the subject, or the SIL identifier.
>
> Looks like a case for metadata curation to me.

The language code is in there actually:

...
<OLAC-DcmiTerms>
  ...
  <publisher>SIL International (www.sil.org)</publisher>
  <subject olac-language="kil"/>
  <title>Kariya: a language of Nigeria</title>
  ...

It matches the pattern

/c:CMD/c:Components//c:OLAC-DcmiTerms/c:subject/@olac-language

defined in facetConcepts.xml.
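
A minimal sketch of how that pattern selects the code, assuming the `c` prefix is bound to the CMDI 1.1 namespace `http://www.clarin.eu/cmd/` and using a trimmed, hypothetical stand-in for the harvested record. It is written in Python for illustration and is not the importer's own code:

```python
import xml.etree.ElementTree as ET

# Assumption: the "c" prefix maps to the CMDI 1.1 namespace.
NS = {"c": "http://www.clarin.eu/cmd/"}

# Hypothetical, trimmed record built around the snippet quoted above.
RECORD = """
<CMD xmlns="http://www.clarin.eu/cmd/">
  <Components>
    <OLAC-DcmiTerms>
      <publisher>SIL International (www.sil.org)</publisher>
      <subject olac-language="kil"/>
      <title>Kariya: a language of Nigeria</title>
    </OLAC-DcmiTerms>
  </Components>
</CMD>
"""


def olac_language_codes(xml_text):
    """Return the attribute values the pattern would select."""
    root = ET.fromstring(xml_text)  # root is the c:CMD element
    # ElementTree cannot return attribute nodes directly, so select the
    # elements that carry the attribute and read it afterwards.
    path = "c:Components//c:OLAC-DcmiTerms/c:subject[@olac-language]"
    return [el.get("olac-language") for el in root.findall(path, NS)]


print(olac_language_codes(RECORD))  # prints ['kil']
```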

comment:5 Changed 10 years ago by teckart@informatik.uni-leipzig.de

That is true, but the importer normally ignores these hard-coded patterns if the CMDI profile contains at least one element with a matching concept link. That is the case here ("OLAC-DcmiTerms/language", although it is empty), so this attribute is normally not used to fill the language facet for this profile.

I agree with Jörg here that it would be cleaner to use the element that was actually supposed to store the language information instead of using some underspecified attribute. Nonetheless, this leaves open the question of how the language information was extracted in the production VLO.
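
The rule described here can be sketched as follows. This is an illustration under stated assumptions, not the importer's code: the function names, the dictionary standing in for a record, and the `LANGUAGE` XPath are all hypothetical.

```python
# Hard-coded pattern from facetConcepts.xml (quoted earlier in this ticket).
PATTERN = "/c:CMD/c:Components//c:OLAC-DcmiTerms/c:subject/@olac-language"
# Hypothetical XPath for the profile element that carries the concept link.
LANGUAGE = "/c:CMD/c:Components/c:OLAC-DcmiTerms/c:language/text()"


def xpaths_for_facet(concept_link_xpaths, fallback_patterns):
    """Choose the XPaths for one facet of one profile.

    The hard-coded patterns are used only when the profile has no element
    with a matching concept link.
    """
    if concept_link_xpaths:
        return list(concept_link_xpaths)
    return list(fallback_patterns)


def facet_values(record, xpaths):
    """Collect non-empty values; `record` is a stand-in xpath -> value map."""
    return [record[x] for x in xpaths if record.get(x)]


# The language element exists but is empty; the attribute holds the code.
record = {LANGUAGE: "", PATTERN: "kil"}

chosen = xpaths_for_facet([LANGUAGE], [PATTERN])
print(facet_values(record, chosen))  # prints []
```

Because the profile has an element with a matching concept link, the pattern is never consulted, and the empty element leaves the facet without a value.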

comment:6 in reply to:  5 Changed 10 years ago by Twan Goosen

Replying to teckart@…:

> I agree with Jörg here that it would be cleaner to use the element that was actually supposed to store the language information instead of using some underspecified attribute. Nonetheless, this leaves open the question of how the language information was extracted in the production VLO.

The reason seems to be that the overriding (if I understand this correctly) concept link was added in the 3.0 branch in [4575] and this change was never applied to 2.x. Maybe the mapping for 3.0 needs to be reconsidered?

comment:7 Changed 10 years ago by teckart@informatik.uni-leipzig.de

That's it! I think we should leave the facet definition as it is (as http://purl.org/dc/terms/language is a correct DC for this facet) and try to convince the creator to include values in the language element in the future. In my opinion the fixed patterns should be seen only as a short-term workaround, and I don't think it is a good idea to optimize the VLO results for them.

comment:8 Changed 10 years ago by Twan Goosen

The direct creator of these CMDIs is the harvester, so maybe something can be done there, i.e. fill the 'language' field with the value from the attribute. Additionally or alternatively, we could assign the data category to the 'olac-language' attribute in the profile specification as well. But maybe there are arguments against doing that?

comment:9 Changed 10 years ago by teckart@informatik.uni-leipzig.de

I would prefer the first solution (should be only a minor change in the XSLT). We could also assign a concept link to the attribute, but as there is already a "correct" element in the profile this is probably not necessary...
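
The intended transformation can be sketched as below. The actual change would live in the harvester's XSLT, which is not shown in this ticket; this Python version only illustrates the effect, and the function name and the namespace-free sample element are assumptions made for brevity.

```python
import xml.etree.ElementTree as ET


def fill_language_from_attribute(terms):
    """Copy subject/@olac-language into an empty <language> child, in place."""
    subject = terms.find("subject[@olac-language]")
    if subject is None:
        return terms  # nothing to copy
    language = terms.find("language")
    if language is None:
        language = ET.SubElement(terms, "language")
    # Only fill the element when it has no value of its own.
    if not (language.text or "").strip():
        language.text = subject.get("olac-language")
    return terms


# Hypothetical sample without namespaces.
terms = ET.fromstring(
    '<OLAC-DcmiTerms><language/><subject olac-language="kil"/>'
    "<title>Kariya: a language of Nigeria</title></OLAC-DcmiTerms>"
)
fill_language_from_attribute(terms)
print(terms.find("language").text)  # prints kil
```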

comment:10 Changed 10 years ago by Twan Goosen

Milestone: VLO-3.0

Taking this off the 3.0 milestone; the issue has been resolved (or mitigated) by removing the "http://purl.org/dc/terms/language" data category from the facet mapping (see #412, r5192, r5195).

comment:11 Changed 10 years ago by Twan Goosen

Closing this; followed up by MPI #4271.

Menzo has looked into an alternative mapping strategy where there is a fallback to patterns in case of no matches from the data categories; this might improve the situation, too.
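
A minimal sketch of such a fallback, assuming a record is represented as a hypothetical xpath -> value map; the function name and the `LANGUAGE` XPath are illustrative and not taken from the VLO code base:

```python
# Hard-coded pattern from facetConcepts.xml (quoted earlier in this ticket).
PATTERN = "/c:CMD/c:Components//c:OLAC-DcmiTerms/c:subject/@olac-language"
# Hypothetical XPath derived from the data category of the profile element.
LANGUAGE = "/c:CMD/c:Components/c:OLAC-DcmiTerms/c:language/text()"


def extract_with_fallback(record, concept_link_xpaths, fallback_patterns):
    """Prefer values found via data categories; otherwise try the patterns."""
    values = [record[x] for x in concept_link_xpaths if record.get(x)]
    if values:
        return values
    # No match from the data categories: fall back to the fixed patterns.
    return [record[x] for x in fallback_patterns if record.get(x)]


# Empty language element, language code only in the attribute.
record = {LANGUAGE: "", PATTERN: "kil"}
print(extract_with_fallback(record, [LANGUAGE], [PATTERN]))  # prints ['kil']
```

The difference from the stricter rule is that the decision is made per record on the values actually found, not per profile on whether a concept link exists.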

comment:12 in reply to:  11 Changed 10 years ago by Twan Goosen

Replying to twan.goosen@…:

> Closing this; followed up by MPI #4271.
>
> Menzo has looked into an alternative mapping strategy where there is a fallback to patterns in case of no matches from the data categories; this might improve the situation, too.

See #668

comment:13 Changed 9 years ago by teckart@informatik.uni-leipzig.de

Milestone: VLO-3.1
Resolution: fixed
Status: new → closed

The concept "http://purl.org/dc/terms/language" was added to the language facet again (r5990). The problematic Ethnologue files still work because of the modified extraction mechanism (r5985).
