
Version 27 (modified by dietuyt, 13 years ago)


Notes on the re-implementation of the VLO

inspiration sources

technology to be used

  • SOLR for the backend
  • wicket for the front-end
  • CMDI for the input

data sources

requirements

  • initially, the same functionality as VLO 1.0
  • use CMDI as direct input
  • be reliable and debuggable
  • use standardized, well-maintained libraries
  • provide a button to report errors spotted in the metadata
  • link each ISO-639-3 language code to its information page (e.g. http://www.clarin.eu/external/language.php?code=eng for "eng") so that users can get more information about a language wherever it is shown
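
The language link in the last requirement could be built roughly as follows. This is a minimal Python sketch, not the VLO implementation: the function name `language_info_link` and the validation rule (three ASCII letters) are assumptions; only the base URL and the `code` parameter come from the requirement itself.

```python
from urllib.parse import urlencode

# Base URL as given in the requirement above.
LANGUAGE_INFO_URL = "http://www.clarin.eu/external/language.php"


def language_info_link(iso639_3_code: str) -> str:
    """Return the language information URL for an ISO 639-3 code.

    Hypothetical helper: rejects anything that is not three ASCII letters.
    """
    code = iso639_3_code.strip().lower()
    if len(code) != 3 or not (code.isascii() and code.isalpha()):
        raise ValueError(f"not an ISO 639-3 code: {iso639_3_code!r}")
    return f"{LANGUAGE_INFO_URL}?{urlencode({'code': code})}"
```

Rejecting malformed codes early keeps broken links out of the result list and gives the error-report button less to do.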

suggested mapping CMDI OLAC profile > VLO 2.0 fields/facets

/CMD/Components/OLAC-DcmiTerms/title -> name

/CMD/Components/OLAC-DcmiTerms/subject -> subject

When multiple subject elements exist, prefer (in order of preference):

[@dcterms-type="LCSH"]

Ignore (for now, until we can resolve the numeric codes): [@dcterms-type="DDC"] [@dcterms-type="LCC"]
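
The subject preference rule could be sketched like this in Python. Subjects are modelled as `(dcterms-type, text)` pairs that have already been read from the record; the name `pick_subject` and the fallback to the first remaining subject when no LCSH entry exists are assumptions, since the notes only name LCSH as preferred and DDC/LCC as ignored.

```python
from typing import Optional, Sequence, Tuple

# (value of the dcterms-type attribute or None, element text)
Subject = Tuple[Optional[str], str]

PREFERRED_TYPES = ["LCSH"]      # in order of preference
IGNORED_TYPES = {"DDC", "LCC"}  # numeric codes that cannot be resolved yet


def pick_subject(subjects: Sequence[Subject]) -> Optional[str]:
    """Choose one subject value for the VLO 'subject' field."""
    # Drop ignored schemes and empty values first.
    usable = [
        (scheme, text.strip())
        for scheme, text in subjects
        if scheme not in IGNORED_TYPES and text.strip()
    ]
    # Return the first subject that uses a preferred scheme.
    for preferred in PREFERRED_TYPES:
        for scheme, text in usable:
            if scheme == preferred:
                return text
    # Assumption: otherwise fall back to the first remaining subject.
    return usable[0][1] if usable else None
```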

/CMD/Components/OLAC-DcmiTerms/publisher -> organisation

/CMD/Header/MdSelfLink -> id

/CMD/Components/OLAC-DcmiTerms/language[@olac-language] -> language

/CMD/Header/MdSelfLink (URL after first ":") (via OAI-PMH) -> origin

/CMD/Components/OLAC-DcmiTerms/type[@olac-linguistic-type] -> genre

/CMD/Components/OLAC-DcmiTerms/description -> description

/CMD/Components/OLAC-DcmiTerms/identifier (if starting with http:// or hdl:) -> open in original context (now: IMDI browser)
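
The prefix check for the identifier could look like the sketch below. The helper name `original_context_link` is hypothetical, and taking the first matching identifier when several qualify is an assumption; the notes only state the two accepted prefixes.

```python
from typing import Iterable, Optional

# Prefixes named in the mapping rule above.
LINK_PREFIXES = ("http://", "hdl:")


def original_context_link(identifiers: Iterable[str]) -> Optional[str]:
    """Return the first identifier usable as an 'open in original context' link."""
    for identifier in identifiers:
        candidate = identifier.strip()
        # str.startswith accepts a tuple and matches any of its entries.
        if candidate.startswith(LINK_PREFIXES):
            return candidate
    return None
```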

/CMD/Components/OLAC-DcmiTerms/date -> year (new facet, extract yyyy from yyyy-mm-dd or yyyy-mm-ddThh:mm:ssZ or take over yyyy)

Some example dates follow; these will not be easy to curate and sort into neat facets.
<date>1985-87</date>
<date>1988-90?</date>
<date>Jan-Feb 1990</date>
<date>Dec 1988-</date>
<date>1988-?</date>
<date>5/91</date>
<date>16/9/90</date>
<date>5/91</date>
<date>6/4/1991</date>
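
A strict version of the year extraction could be sketched in Python as below. It accepts only the three well-formed shapes named in the mapping rule (yyyy, yyyy-mm-dd, yyyy-mm-ddThh:mm:ssZ) and returns `None` for everything else, so all the irregular examples above would be left out of the year facet rather than guessed at. The function name `extract_year` and the choice to reject rather than guess are assumptions.

```python
import re
from typing import Optional

_YEAR_PATTERN = re.compile(
    r"^(\d{4})"                     # yyyy
    r"(?:-\d{2}-\d{2}"              # optional -mm-dd
    r"(?:T\d{2}:\d{2}:\d{2}Z)?)?$"  # optional Thh:mm:ssZ
)


def extract_year(date_value: str) -> Optional[str]:
    """Return the four-digit year, or None if the date is not well-formed."""
    match = _YEAR_PATTERN.match(date_value.strip())
    return match.group(1) if match else None
```

Values that return `None` could be collected in a log for later curation instead of being silently dropped.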

/CMD/Components/OLAC-DcmiTerms/type[@dcterms-type="DCMIType"] -> resource type (new facet)

/CMD/Components/OLAC-DcmiTerms/spatial[@dcterms-type="ISO3166"] -> country

/CMD/Components/OLAC-DcmiTerms/coverage[@dcterms-type="ISO3166"] -> country

/CMD/Components/OLAC-DcmiTerms/format[@dcterms-type="IMT"] -> format (new facet, contains mime type)

suggested mapping CMDI LRT profile > VLO 2.0 fields/facets

/CMD/Components/LrtInventoryResource/LrtCommon/ResourceName -> name

/CMD/Header/MdSelfLink -> id

/CMD/Components/LrtInventoryResource/LrtCommon/Languages/ISO639/iso-639-3-code -> language (convert code to full language name)

clarin.eu -> origin (fixed value)

?? -> genre

/CMD/Components/LrtInventoryResource/LrtCommon/Description -> description

/CMD/Components/LrtInventoryResource/LrtCommon/MetadataLink (if not existing: ReferenceLink?) -> open in original context

/CMD/Components/LrtInventoryResource/LrtCommon/FinalizationYearResourceCreation -> year

/CMD/Components/LrtInventoryResource/LrtCommon/ResourceType -> resource type

/CMD/Components/LrtInventoryResource/LrtCommon/Countries/Country/code -> country (convert code to full country name)

/CMD/Components/LrtInventoryResource/LrtCommon/Format -> format
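
The LRT mapping above could be expressed as a small lookup table plus a conversion step, as in this Python sketch. Several things here are assumptions for illustration only: the sample record is invented and has no XML namespace (real CMDI records typically declare one, which the paths would then have to account for), the two lookup tables are tiny stand-ins for full code lists, two-letter country codes are assumed, and the tentative ReferenceLink fallback is implemented as written in the notes. The genre field is left out because its source is still open.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

LRT_COMMON = "Components/LrtInventoryResource/LrtCommon"

# VLO field -> path relative to the <CMD> root element, following the list above.
LRT_FIELD_PATHS = {
    "name": f"{LRT_COMMON}/ResourceName",
    "id": "Header/MdSelfLink",
    "language": f"{LRT_COMMON}/Languages/ISO639/iso-639-3-code",
    "description": f"{LRT_COMMON}/Description",
    "year": f"{LRT_COMMON}/FinalizationYearResourceCreation",
    "resource type": f"{LRT_COMMON}/ResourceType",
    "country": f"{LRT_COMMON}/Countries/Country/code",
    "format": f"{LRT_COMMON}/Format",
}

# Illustrative stand-ins; real tables would cover the complete code lists.
LANGUAGE_NAMES = {"eng": "English", "nld": "Dutch"}
COUNTRY_NAMES = {"NL": "Netherlands", "DE": "Germany"}


def map_lrt_record(xml_text: str) -> Dict[str, Optional[str]]:
    """Map one (namespace-free) LRT record to VLO fields."""
    root = ET.fromstring(xml_text)
    fields: Dict[str, Optional[str]] = {}
    for field, path in LRT_FIELD_PATHS.items():
        text = root.findtext(path)
        fields[field] = text.strip() if text and text.strip() else None
    fields["origin"] = "clarin.eu"  # fixed value
    # MetadataLink first, then the tentative ReferenceLink fallback.
    fields["open in original context"] = (
        root.findtext(f"{LRT_COMMON}/MetadataLink")
        or root.findtext(f"{LRT_COMMON}/ReferenceLink")
    )
    # Convert codes to full names; keep the raw code if it is unknown.
    if fields["language"]:
        fields["language"] = LANGUAGE_NAMES.get(fields["language"], fields["language"])
    if fields["country"]:
        fields["country"] = COUNTRY_NAMES.get(fields["country"], fields["country"])
    return fields


# Invented record, for illustration only.
SAMPLE_RECORD = """<CMD>
  <Header><MdSelfLink>http://example.org/record/1</MdSelfLink></Header>
  <Components><LrtInventoryResource><LrtCommon>
    <ResourceName>Example corpus</ResourceName>
    <Description>An invented record for illustration.</Description>
    <Languages><ISO639><iso-639-3-code>nld</iso-639-3-code></ISO639></Languages>
    <Countries><Country><code>NL</code></Country></Countries>
    <FinalizationYearResourceCreation>2009</FinalizationYearResourceCreation>
    <ResourceType>Written Corpus</ResourceType>
    <Format>text/xml</Format>
    <ReferenceLink>http://example.org/corpus</ReferenceLink>
  </LrtCommon></LrtInventoryResource></Components>
</CMD>"""

record_fields = map_lrt_record(SAMPLE_RECORD)
```

Keeping the paths in one table means a changed profile only requires editing the table, not the mapping code.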

short notes VLO meeting

  • have result list next to facets
  • result list: name + description
  • link back to original OLAC xml record / imdi record
  • post-process the harvested data before importing it into VLO: country, language, organisation
  • extract the year on the fly
  • experiment with SOV facets, derived from the language name

language codes