Opened 8 years ago
Closed 8 years ago
#836 closed defect (fixed)
Support extraction of CMD attributes based on conceptlinks
Reported by: | teckart@informatik.uni-leipzig.de | Owned by: | teckart@informatik.uni-leipzig.de |
---|---|---|---|
Priority: | major | Milestone: | VLO-4.0 |
Component: | VLO importer | Version: | |
Keywords: | | Cc: | Twan Goosen |
Description
Change History (12)
comment:1 Changed 8 years ago by
Cc: | Twan Goosen added |
---|
comment:2 follow-up: 4 Changed 8 years ago by
comment:3 Changed 8 years ago by
Owner: | changed from teckart@informatik.uni-leipzig.de to Twan Goosen |
---|---|
Status: | new → assigned |
comment:4 Changed 8 years ago by
Owner: | changed from Twan Goosen to teckart@informatik.uni-leipzig.de |
---|
Replying to teckart@…:
@Twan: could you add a concept link to "@olac-language" in clarin.eu:cr1:p_1288172614026? (probably: http://hdl.handle.net/11459/CCR_C-2482_08eded24-4086-7e3f-88e5-e0807fb01e17). AFAIK this shouldn't do any harm, as importer v3.3 ignores these attributes and according to CMD the link is missing anyway. Once that is changed, I will start a test import on my local test machine and we can have a look at the consequences.
Done (see clarin.eu:cr1:p_1288172614026).
comment:5 Changed 8 years ago by
Milestone: | → VLO-4.0 or later |
---|
comment:6 Changed 8 years ago by
Support added with e3b762473. This shouldn't change a lot as relevant attributes are included in the facet definition via fallback patterns.
For active use (and a cleanup of the facetconcepts file), some components have to be modified, i.e. conceptlinks have to be added. The only apparent change is that language/@olac-language for OLAC-DcmiTerms (clarin.eu:cr1:p_1288172614026) is now extracted based on its conceptlink (and its pattern has been removed). New values may be extracted from attributes that are not currently covered by patterns, but a test import didn't show noise in any of the facets.
Nonetheless, a more systematic analysis of the fallback patterns should take place before the 4.0 release; leaving this ticket open.
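To illustrate the lookup order described above (concept link first, pattern only as a fallback), here is a minimal Python sketch. It is not the importer's code: the `FACETS` structure, the facet name `languageCode`, the function `facets_for` and the example pattern are assumptions made up for illustration. Only the concept link handle is taken from this ticket.

```python
# Concept link mentioned in this ticket for @olac-language.
LANGUAGE_CONCEPT = (
    "http://hdl.handle.net/11459/"
    "CCR_C-2482_08eded24-4086-7e3f-88e5-e0807fb01e17"
)

# Hypothetical facet definition: concept links plus fallback XPath patterns.
# The facet name and the pattern are invented for this sketch.
FACETS = {
    "languageCode": {
        "concepts": {LANGUAGE_CONCEPT},
        "patterns": {"/cmd:CMD/cmd:Components/cmd:Example/cmd:language/@code"},
    },
}


def facets_for(xpath, concept_link, facets=FACETS):
    """Return the facets an XPath feeds, preferring concept links over patterns."""
    by_concept = [
        name for name, definition in facets.items()
        if concept_link in definition["concepts"]
    ]
    if by_concept:
        return by_concept
    # Fallback: no concept link matched, so compare the XPath with the patterns.
    return [
        name for name, definition in facets.items()
        if xpath in definition["patterns"]
    ]
```

With a concept link on the attribute, the pattern entry for that XPath is no longer needed, which is the cleanup of the facetconcepts file referred to above.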
comment:8 Changed 8 years ago by
Milestone: | VLO-4.1 or later → VLO-4.0 |
---|
comment:9 Changed 8 years ago by
To extract attribute values for the "teiHeader" profile, the XPath "./xs:complexType/xs:simpleContent/xs:extension/xs:attribute" is not sufficient; "./complexType/attribute" is needed as well.
Unfortunately, I couldn't find the problematic CMD record, but it was related to the availability facet and the "status" attribute in the XSD. Look for:
<xs:attribute name="status" dcr:datcat="http://hdl.handle.net/11459/CCR_C-5303_9afa7d0c-f292-8e89-24f8-997a3d2971ae">
comment:10 Changed 8 years ago by
Thanks for the hint regarding profile "clarin.eu:cr1:p_1380106710826". The current solution only works for attributes on CMD elements. "./complexType/attribute" is needed for component attributes.
Example: CLARIN_DK_UCPH_Repository/oai_clarin_dk_dkclarin_309874.xml (for CCR_C-5303 which is currently not part of the facet mapping)
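The two schema locations discussed in comments 9 and 10 can be sketched in Python with the standard library's ElementTree. This is an illustration, not the importer's implementation. The schema fragment is a made-up, heavily simplified stand-in for a profile XSD, the `dcr` namespace URI is an assumption, and the second path is written with the `xs:` prefix because ElementTree needs it to resolve the names.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
DCR = "http://www.isocat.org/ns/dcr"  # assumed namespace URI for dcr:datcat
NS = {"xs": XS}

# Hypothetical schema fragment: one attribute on an element with simple
# content, one attribute directly under complexType (component attribute).
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dcr="http://www.isocat.org/ns/dcr">
  <xs:element name="language">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:string">
          <xs:attribute name="olac-language" dcr:datcat="http://hdl.handle.net/11459/CCR_C-2482_08eded24-4086-7e3f-88e5-e0807fb01e17"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
  <xs:element name="availability">
    <xs:complexType>
      <xs:sequence/>
      <xs:attribute name="status" dcr:datcat="http://hdl.handle.net/11459/CCR_C-5303_9afa7d0c-f292-8e89-24f8-997a3d2971ae"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

# Both locations have to be checked, relative to each xs:element.
ATTRIBUTE_PATHS = (
    "./xs:complexType/xs:simpleContent/xs:extension/xs:attribute",
    "./xs:complexType/xs:attribute",
)


def attribute_concept_links(schema_xml):
    """Map 'element/@attribute' to the concept link declared on the attribute."""
    root = ET.fromstring(schema_xml)
    links = {}
    for element in root.iter(f"{{{XS}}}element"):
        for path in ATTRIBUTE_PATHS:
            for attribute in element.findall(path, NS):
                datcat = attribute.get(f"{{{DCR}}}datcat")
                if datcat:
                    key = f"{element.get('name')}/@{attribute.get('name')}"
                    links[key] = datcat
    return links
```

With only the first path, the `status` attribute in this fragment would be missed, which matches the behaviour reported for the teiHeader profile.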
comment:12 Changed 8 years ago by
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Evaluation showed that 14 XPaths are currently extracted based on attribute concept links. Of those, two contain potentially relevant information for the language facet (OLAC-DcmiTerms and teiHeader). The other 12 are now part of the blacklist (4602b98 + 41fcbf4).
Closing the ticket.
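As a rough illustration of the blacklisting step, the following Python sketch drops blacklisted XPaths from a set of attribute candidates. The function name `apply_blacklist` and all XPaths and concept names in the usage example are hypothetical, and exact string matching is an assumption; the actual blacklist in the importer may be matched differently.

```python
def apply_blacklist(candidates, blacklist):
    """Keep only the XPath -> concept link entries that are not blacklisted."""
    return {
        xpath: concept
        for xpath, concept in candidates.items()
        if xpath not in blacklist
    }


# Hypothetical usage: two candidates, one of them blacklisted.
candidates = {
    "/cmd:CMD/cmd:Components/cmd:A/cmd:language/@olac-language": "concept-language",
    "/cmd:CMD/cmd:Components/cmd:B/cmd:note/@type": "concept-other",
}
blacklist = {"/cmd:CMD/cmd:Components/cmd:B/cmd:note/@type"}
kept = apply_blacklist(candidates, blacklist)
```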
A first test implementation (which treats attributes the same way as elements) is available and awaits testing. There is still the question of whether we want more advanced logic such as "ignore an element's content if there are attributes with the same conceptlink" or "prefer the first concept link found in the facetMapping file", but I guess we are not there yet.
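The difference between the simple approach and the first "more advanced" option can be sketched in Python as follows. This is only an illustration of the two behaviours as described above. The `Node` class, both function names and the sample values are invented for this sketch and do not reflect the importer's data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical CMD element: text content, concept link, attributes."""
    text: str
    concept: Optional[str] = None
    # attribute name -> (attribute value, concept link of the attribute)
    attributes: Dict[str, Tuple[str, Optional[str]]] = field(default_factory=dict)


def values_same_way(node: Node, concept: str) -> List[str]:
    """Treat attributes like elements: collect every value with the concept."""
    values = []
    if node.concept == concept and node.text:
        values.append(node.text)
    values += [v for v, c in node.attributes.values() if c == concept]
    return values


def values_prefer_attributes(node: Node, concept: str) -> List[str]:
    """Ignore the element's content if an attribute has the same concept link."""
    attribute_values = [v for v, c in node.attributes.values() if c == concept]
    if attribute_values:
        return attribute_values
    if node.concept == concept and node.text:
        return [node.text]
    return []
```

For an element with content "Dutch" and an attribute value "nld" that share a concept link, the first function yields both values and the second yields only "nld".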