wiki:CmdiVirtualLanguageObservatory/ManualMapping/HZSK
Now, I'm afraid I don't have the simple blacklist and whitelist you asked for, and I also have a rather fundamental question:
Why aren't we trying (or are we, and I don't know about it?) to harmonize
a) the VLO Facets
b) the FCS ResourceInfo
c) the recommended CLARIN CMDI CollectionInfo component?
Since we're collecting so much information on profile, component and element usage, wouldn't it be a good idea to just decide on a small common set of information that is simply required for a resource to integrate smoothly into the CLARIN infrastructure? Should this set ever change, the changes would at least only affect this one component. If required, I assume it would also be possible to create corresponding new CMDI profiles and instances automatically, by adding the component and filling it out with some XSLT on the basis of the XPaths collected. Redundancy is ugly, but if the data is not maintained manually, it might not be a problem. In addition, we'd have a user-friendly recommended component on which new CMDI profiles could be based.
Anyway, now to the real question: the mapping for the HZSK resources. Please let me know if you need more or other information...

1. We use the SelfLink.

(2. Yes, OK.)

3. Project name: In the cmdi-project component, the Name and Title DCs are both used (with different definitions). Please just take one (preferably Title, DC-2537) if both are present.

4. Name: I'm not sure whether you use our dc-title or the DC-2545-Title from GeneralInfo/Title, but at least we don't have both Name (DC-2544) and Title (DC-2545), which is good.

5. Year: Maybe there could be some post-processing to extract only the years from the dates and then create a time span if there are several years? We should probably also differentiate between the creation of a resource, meaning the publishing of a text or the recording of a conversation, and the publication of such resources as a digital corpus, but this is a question of facet semantics...
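To make the post-processing idea concrete, here is a minimal sketch in Python. It is not how the VLO importer works; the function name `year_span`, the regular expression and the `first-last` output format are all assumptions made for illustration. It pulls four-digit years out of arbitrary date strings and collapses them into a single year or a span.

```python
import re

# Assumption: a "year" is a stand-alone four-digit number from 1000 to 2099.
YEAR_RE = re.compile(r"(?<!\d)(1[0-9]{3}|20[0-9]{2})(?!\d)")


def year_span(dates):
    """Return a single year, a 'first-last' span, or None if no year is found."""
    years = sorted({int(y) for d in dates for y in YEAR_RE.findall(d)})
    if not years:
        return None
    if years[0] == years[-1]:
        return str(years[0])
    return f"{years[0]}-{years[-1]}"


if __name__ == "__main__":
    print(year_span(["2009-05-12", "2011-11-03T10:00:00", "2009"]))
```

Whether a span or a list of individual years works better for the facet depends on how the facet is supposed to be searched, which is again the semantic question.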

6. Continent: I don't know why VLO doesn't accept our two-letter ISO codes, but it's good, since we only filled out Continent within the OriginLocation component - see 7. Country.

7. Country: Since the "Component" (I don't know what went wrong there) OriginLocation in cmdi-COLLECTION doesn't have any documentation or an ISOcat-DC, it's hard to know what it's supposed to mean, and I don't remember what I used as a reference. Our interpretation is the origin of the resource, not of the data, i.e. Germany for our corpora, even if the conversations were recorded in Argentina. This is not what the user expects as "Country", though, so please either
 - blacklist "//OriginLocation/Location/Country" and "//OriginLocation/Location/Continent"
or, if we got the meaning of the facet wrong, let me know and we'll change our CMDI data.
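For illustration only, this is roughly what I mean by blacklisting. It is a sketch in Python, not the actual VLO mechanism: the helper `is_blacklisted` and the absolute example paths are hypothetical, and a leading `//` is simply read as "matches anywhere, as long as the path ends with these steps".

```python
# The two patterns requested above; everything else here is hypothetical.
BLACKLIST = [
    "//OriginLocation/Location/Country",
    "//OriginLocation/Location/Continent",
]


def is_blacklisted(path, blacklist=BLACKLIST):
    """True if the element path ends with the steps of any blacklist pattern."""
    steps = path.strip("/").split("/")
    for pattern in blacklist:
        tail = pattern.lstrip("/").split("/")
        if steps[-len(tail):] == tail:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical absolute paths, used only to exercise the helper.
    candidates = [
        "/CMD/Components/Example/OriginLocation/Location/Country",
        "/CMD/Components/Example/Location/Country",
    ]
    print([p for p in candidates if not is_blacklisted(p)])
```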

8. Language(s): If Florian's conclusion is right, i.e. we were all (including whoever wrote the "original" IMDI and CMDI profiles we were re-using) using the wrong component, then we'll have to change our components. But maybe you could also (this would work in our case)
 - use http://purl.org/dc/elements/1.1/language exclusively if present?
Btw I'll look into the "Spanish" vs. "Spanish; Castilian" issue later.
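The "exclusively if present" rule could look something like the following sketch. It assumes the values have already been collected per concept link in a dictionary; that data structure, the function name `pick_language_values` and the second concept key in the example are made up for illustration and are not taken from the VLO code.

```python
DC_LANGUAGE = "http://purl.org/dc/elements/1.1/language"


def pick_language_values(values_by_concept, preferred=DC_LANGUAGE):
    """Use only the preferred concept's values if it has any; otherwise
    fall back to all other values, de-duplicated in order of appearance."""
    preferred_values = values_by_concept.get(preferred, [])
    if preferred_values:
        return list(preferred_values)
    merged = []
    for values in values_by_concept.values():
        for value in values:
            if value not in merged:
                merged.append(value)
    return merged


if __name__ == "__main__":
    record = {
        DC_LANGUAGE: ["German"],
        "some-other-language-concept": ["German", "Spanish"],  # hypothetical key
    }
    print(pick_language_values(record))
```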

9. Organisation: This facet is kind of vague, since the ISOcat DCs do not equate organisation with publisher, while the Dublin Core element publisher of course does. We have the dc-publisher, but Collection/Access/Contact/Organisation with DC-2979 ("developers or distributors") is used instead.

10. Genre: VLO only uses one of our sub-genres (DC-3899) per resource.

11. Subject: We use the dc-subject component with the http://purl.org/dc/elements/1.1/subject conceptlink, but there are no results for our corpora.

12. Description: Please
 - blacklist //HZSKAssociatedFile/Type
The DC-2520 in the component hzsk-shortDescription works great. We also have a more general description in the dc-description component, which is not used by the VLO.

13. Resource type: Since we know these are language resources and tools, maybe we could be a bit more specific - and user-friendly. Perhaps DC-3806 would be an alternative/addition?

14. National project: This has been fixed for the next update by Thomas, who was faster than me ;).

(15. OK)

16. Tag: I'd like to change the ISOcat-DC for the component hzsk-keyword to 5436, but I can't change my public components in the registry... :)

Last modified on 11/11/13 16:52:34