Opened 11 years ago
Closed 9 years ago
#246 closed enhancement (fixed)
exclude actor language from language facet
Reported by: | dietuyt | Owned by: | teckart@informatik.uni-leipzig.de |
---|---|---|---|
Priority: | major | Milestone: | VLO-3.3 |
Component: | VLO importer | Version: | |
Keywords: | | Cc: | teckart, keeloo, mwindhouwer |
Description
Currently, all elements linked to data categories 2482 (language code) and 2484 (language name) are imported into the "language" facet. As reported by Florian Schiel and Jan Odijk, this means that languages that do not occur in a given recording are nevertheless listed in the VLO.
To avoid this, the importer should check whether any ancestor of the element is linked to a container data category indicating that the language is *not* a relevant content language, and if so exclude the value. Currently the only such category is http://www.isocat.org/datcat/DC-4146 (container data category "actor"), but this list will be extended (e.g. for documentation language).
See also: example of such a CMDI record, leading to the (incorrect) language facet value "Latin".
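The check proposed in the description can be sketched as a recursive walk that carries the concepts of all ancestors down to each element. This is a minimal sketch, not the VLO importer's actual code: it assumes a hypothetical `concept_of` dictionary mapping element names to concept URIs (standing in for the concept links in the CMDI profile), and the element names in the example record are illustrative only.

```python
import xml.etree.ElementTree as ET

ISOCAT = "http://www.isocat.org/datcat/"
# Data categories whose values feed the language facet: language code, language name.
LANGUAGE_CONCEPTS = {ISOCAT + "DC-2482", ISOCAT + "DC-2484"}
# Container data categories marking a language as *not* a content language.
# Only "actor" (DC-4146) for now; the list is expected to grow.
ILLEGAL_CONTEXTS = {ISOCAT + "DC-4146"}


def language_facet_values(root, concept_of):
    """Collect language facet values, skipping values inside an illegal context.

    concept_of is a hypothetical dict from element tag to concept URI,
    standing in for the concept links defined in the CMDI profile.
    """
    values = []

    def walk(elem, ancestor_concepts):
        concept = concept_of.get(elem.tag)
        text = (elem.text or "").strip()
        if concept in LANGUAGE_CONCEPTS and text:
            # Keep the value only if no ancestor is linked to an illegal context.
            if not ancestor_concepts & ILLEGAL_CONTEXTS:
                values.append(text)
        # Children see the concepts of this element and of all its ancestors.
        below = (ancestor_concepts | {concept}) if concept else ancestor_concepts
        for child in elem:
            walk(child, below)

    walk(root, frozenset())
    return values


# Illustrative record: one subject language and one actor language.
record = ET.fromstring(
    "<Session>"
    "<Content><SubjectLanguage><LanguageName>German</LanguageName></SubjectLanguage></Content>"
    "<Actors><Actor><ActorLanguage><LanguageName>Latin</LanguageName></ActorLanguage></Actor></Actors>"
    "</Session>"
)
concepts = {"LanguageName": ISOCAT + "DC-2484", "Actor": ISOCAT + "DC-4146"}
print(language_facet_values(record, concepts))  # ['German']
```

Such an ancestor check only works when the language element is actually nested inside an element linked to the actor container category; a component used without that embedding slips through, which is the limitation discussed in the mail below.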
Change History (10)
comment:1 Changed 11 years ago by
Cc: | teckart keeloo added |
---|
comment:2 Changed 11 years ago by
comment:3 Changed 11 years ago by
Considering the CMDI example in the initial report: wouldn't you want to look for SubjectLanguage instead of ActorLanguage anyway?
comment:4 Changed 11 years ago by
Cc: | mwindhouwer added |
---|
This is being addressed by adding the necessary container data categories (overview) and by Menzo's recent addition to the VLO ingester, which takes care of container data categories. To be tested soon.
comment:5 Changed 11 years ago by
Milestone: | → VLO-2.16 |
---|
comment:6 Changed 10 years ago by
Component: | VLO web app → VLO importer |
---|---|
Milestone: | VLO-2.16 |
comment:7 Changed 10 years ago by
Owner: | changed from herste to Twan Goosen |
---|---|
Status: | new → assigned |
comment:8 Changed 9 years ago by
Owner: | changed from Twan Goosen to teckart@informatik.uni-leipzig.de |
---|
comment:9 Changed 9 years ago by
Milestone: | → VLO-3.3 |
---|
Moved a number of existing VLO tickets to the 3.3 milestone.
comment:10 Changed 9 years ago by
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Re-enabled context blacklisting, using concept DC-4146 as an illegal context (r6318). This has no direct impact on extraction, as the problematic elements were already blacklisted using simple blacklist patterns, but it improves the readability of the facetconcept file. More blacklisting rules will very likely have to be added for other facets in the future.
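To illustrate the difference between the two mechanisms mentioned in this comment, here is a hedged sketch that rejects a language value either when its element path matches a blacklist pattern or when one of its ancestors is linked to an illegal context concept. The pattern and the paths are hypothetical and do not reproduce the actual VLO configuration or the facetconcept file format.

```python
import re

ISOCAT = "http://www.isocat.org/datcat/"
# Illegal context: container data category "actor".
ILLEGAL_CONTEXTS = {ISOCAT + "DC-4146"}

# Hypothetical path-based blacklist, in the spirit of "simple blacklist patterns".
BLACKLIST_PATTERNS = [re.compile(r"/Actors?/")]


def is_blacklisted(path, ancestor_concepts):
    """Return True if a language value at `path` must not enter the language facet."""
    by_pattern = any(p.search(path) for p in BLACKLIST_PATTERNS)
    by_context = bool(set(ancestor_concepts) & ILLEGAL_CONTEXTS)
    return by_pattern or by_context


print(is_blacklisted("/CMD/Components/Session/Actors/Actor/ActorLanguage/LanguageName", []))
print(is_blacklisted("/CMD/Components/Session/Content/SubjectLanguage/LanguageName", []))
print(is_blacklisted("/CMD/Components/Session/Speakers/Speaker/Language/LanguageName",
                     [ISOCAT + "DC-4146"]))
```

In the third case the element names contain no "Actor", so the path pattern misses it, while the concept-based context rule still catches it because it does not depend on element names.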
Some context: a related mail exchange with Florian on this subject:
Hi Florian,
Thanks for reporting. I'll answer this mail in two parts, as different
people are involved in each issue. Here is part 1: container data
categories and the VLO.
On 30/11/12 09:44, Florian Schiel wrote:
(see http://catalog.clarin.eu/ds/ComponentRegistry?item=clarin.eu:cr1:c_1271859438147)
Yes, it's not the most elegant solution. But I changed it this way
because you reported earlier that the value of actor.cmdi-language
ended up in the VLO facet "Language", as explained in
https://trac.clarin.eu/ticket/246.
So we need to be able to distinguish between content language and
actor language. I see three options:
1. […] container datcat for actor + the general language datcat): that is what
the ticket above says. The only way to make sure that this is excluded
is to have both the container datcat and the language datcat in the same
component (relying on an actor located somewhere higher up does not work if
someone uses this component without embedding it in an actor).
(backward compatible)
2. […] ± 7000 language codes) and give it a more detailed data category
(content language, actor language, etc.).
(maybe backward compatible if we give it the same name, which would be
confusing)
3. Create new components that are purely a wrapper around
iso-language-639-3, so that we have a direct
[actor container DC].[language ID DC]
[content container DC].[language ID DC]
etc.
(not backward compatible)
What are the opinions on these approaches?