Opened 13 years ago

Closed 13 years ago

#61 closed enhancement (fixed)

add a resource type facet

Reported by: dietuyt
Owned by: patdui
Priority: major
Milestone:
Component: VLO web app
Version:
Keywords:
Cc:

Description

See http://trac.clarin.eu/wiki/CmdiVirtualLanguageObservatory for the mapping from the LRT inventory to the facet that is to be created.

For IMDI we should look at the mime types of the linked resources and map them to:

  • audio
  • video
  • text (txt, PDF)
  • annotation (EAF, shoebox, toolbox, CHAT)
  • image (png, jpg, gif, tiff)
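
The IMDI mapping above could be sketched roughly as follows. This is only an illustration, not the importer's actual code: the function name facet_for_mime_type is hypothetical, and the annotation mime type strings are assumptions that should be checked against the mime types that really occur in the IMDI records.

```python
from typing import Optional

# ASSUMPTION: mime type strings for the annotation formats (EAF, CHAT,
# shoebox, toolbox). Verify these against the actual IMDI records.
ANNOTATION_MIME_TYPES = {
    "text/x-eaf+xml",
    "text/x-chat",
    "text/x-shoebox-text",
    "text/x-toolbox-text",
}

# PDF has an "application/..." mime type but belongs in the "text" category.
TEXT_MIME_TYPES = {"application/pdf"}

# Top-level mime types that map directly onto a facet value.
TOP_LEVEL_CATEGORIES = {"audio", "video", "image", "text"}


def facet_for_mime_type(mime_type: str) -> Optional[str]:
    """Return the facet value for a mime type, or None if it is not covered."""
    normalized = mime_type.strip().lower()
    # Annotation formats are checked first because they also start with "text/".
    if normalized in ANNOTATION_MIME_TYPES:
        return "annotation"
    if normalized in TEXT_MIME_TYPES:
        return "text"
    top_level = normalized.split("/", 1)[0]
    if top_level in TOP_LEVEL_CATEGORIES:
        return top_level
    return None
```

Checking the annotation formats before the generic "text/" prefix keeps e.g. EAF files from ending up under "text".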

For OLAC we should map the following values of /CMD/Components/OLAC-DcmiTerms/type[@dcterms-type="DCMIType"]:

(based on http://dublincore.org/documents/dcmi-type-vocabulary/)

Still Image > image
Image > image
Text > text
Sound > audio
Moving Image > video
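
A minimal sketch of this DCMI type mapping, for illustration only. The names DCMI_TYPE_TO_FACET and facet_for_dcmi_type are hypothetical, and passing unmapped values through unchanged is an assumption about the desired behaviour, not something the mapping list itself specifies.

```python
# The five mappings listed in the ticket description.
DCMI_TYPE_TO_FACET = {
    "Still Image": "image",
    "Image": "image",
    "Text": "text",
    "Sound": "audio",
    "Moving Image": "video",
}


def facet_for_dcmi_type(dcmi_type: str) -> str:
    """Map a DCMI type value to a facet value.

    ASSUMPTION: values without a mapping are returned unchanged
    (apart from trimmed whitespace) rather than dropped.
    """
    cleaned = dcmi_type.strip()
    return DCMI_TYPE_TO_FACET.get(cleaned, cleaned)
```

For example, facet_for_dcmi_type("Still Image") gives "image", while a value such as "Dataset" stays as it is.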

All these mappings will result in a heterogeneous facet ("Speech corpus" next to "video"), but that is because of the differences in the metadata we are importing.

Attachments (1)

oai_crdo_fr_crdo000001.xml.cmdi (3.0 KB) - added by dietuyt 13 years ago.
CMDI file with a type element that should be used for the ResourceType facet


Change History (5)

comment:1 Changed 13 years ago by patdui

The /CMD/Components/OLAC-DcmiTerms/type[@dcterms-type="DCMIType"]

should be put in the mimetype attribute of the ResourceType element:
<ResourceType mimetype="">Resource</ResourceType>

At least that is what is being used for IMDI now; it is probably a good idea to get that all out when generating the CMDI files.

comment:2 Changed 13 years ago by herste

OLAC still needs to be mapped (in the importer), then it is fine :)

Changed 13 years ago by dietuyt

CMDI file with a type element that should be used for the ResourceType facet

comment:3 Changed 13 years ago by dietuyt

I attached an example file of an OLAC CMDI description where the filename extension cannot be used as guidance and where the only information about the resource type is in <type dcterms-type="DCMIType">Sound</type>
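
To illustrate, reading that type element could look roughly like the sketch below. The SAMPLE document is a made-up, heavily simplified stand-in for the attached file (real CMDI files declare an XML namespace, which is left out here), and the function name dcmi_types is hypothetical.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: simplified sample without namespaces; not the attached file.
SAMPLE = """<CMD>
  <Components>
    <OLAC-DcmiTerms>
      <type dcterms-type="DCMIType">Sound</type>
      <type>recording</type>
    </OLAC-DcmiTerms>
  </Components>
</CMD>"""


def dcmi_types(xml_text: str) -> list:
    """Return the text of every <type> element marked as a DCMIType."""
    root = ET.fromstring(xml_text)
    return [
        element.text.strip()
        for element in root.iter("type")
        # Only elements with dcterms-type="DCMIType" and non-empty text count.
        if element.get("dcterms-type") == "DCMIType" and element.text
    ]
```

Calling dcmi_types(SAMPLE) picks up only the element that carries the dcterms-type attribute and skips the plain <type> element.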

comment:4 Changed 13 years ago by patdui

Resolution: fixed
Status: new → closed

I added the resource type pattern for the OLAC DCMITypes and a postprocessor to map e.g. Still Image to image.

Currently the resourceType facet shows all resource type values without normalizing them to "real" mimetypes. E.g. 'Lexicon / Knowledge Source' is a value; normalizing it would result in an "unknown type" mimetype. This mimetype is stored with the resource URL. I think this is both good and bad. Bad because we show a list of resource types that look weird, and the postprocessor remaps some resource types but not all. Good because we show the data that is in the metadata record, which is perhaps what people will search for. If we should add the normalized mimetype, I propose making a new ticket for it, because this ticket is getting a little sidetracked.
