== Notes on the re-implementation of the VLO ==
=== inspiration sources ===
* VLO 1.0: http://ems06.mpi.nl/cgi-bin/flamenco.cgi/flamh/Flamenco
* OLAC facet browser: http://syslsl01.library.upenn.edu/dla/olac/index.html
=== technology to be used ===
* Solr for the back end
* Wicket for the front end
* CMDI for the input
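
As a rough illustration of what a Solr back end offers for a facet browser, the sketch below builds a select URL that asks for facet counts and narrows the result set with filter queries. It uses Solr's standard `facet`, `facet.field` and `fq` request parameters. The endpoint URL and the field names are assumptions made for this example and are not fixed anywhere in these notes.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; the real host and core are not given in these notes.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_facet_query(facet_fields, filters=None, rows=10):
    """Build a Solr select URL that returns facet counts for the given fields.

    facet_fields: field names to facet on (assumed names, e.g. "language").
    filters: optional dict of already selected facet values, field -> value.
    """
    params = [("q", "*:*"), ("rows", str(rows)), ("facet", "true")]
    for field in facet_fields:
        params.append(("facet.field", field))
    for field, value in (filters or {}).items():
        # Quote the value so that multi-word facet values match as a phrase.
        # Values that themselves contain quotes would need extra escaping.
        params.append(("fq", '%s:"%s"' % (field, value)))
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_facet_query(["language", "country"], {"language": "Dutch"})
```

Each selected facet value becomes one more `fq` parameter, so the front end only has to keep track of the current selection.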
=== data sources ===
* The CMDI'zed IMDI corpus: http://www.mpi.nl/imdi/documents/cmdi-20100924.tar.gz
* the CMDI files harvested via OAI-PMH: http://www.mpi.nl/imdi/documents/olac-cmdi-20101105.tar.gz
* CMDI version of the LRT inventory: http://www.mpi.nl/imdi/documents/lrt-20101201.tar.gz
=== requirements ===
* initially, the same functionality as VLO 1.0
* using CMDI as direct input
* being reliable and debuggable
* using standardized well-maintained libraries
* a button to report errors spotted in the metadata
* link each ISO-639-3 language code to its information page (e.g. http://www.clarin.eu/external/language.php?code=eng for eng), so that users can look up more information about a language wherever it is shown
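
The language link from the last requirement could be generated along these lines. This is a minimal sketch: the base URL is the one given above, while the function name and the decision to return nothing for values that are not three-letter codes are assumptions.

```python
import re
from urllib.parse import urlencode

# Base URL taken from the requirement above.
LANGUAGE_INFO_URL = "http://www.clarin.eu/external/language.php"


def language_info_link(code):
    """Return the information URL for an ISO-639-3 code, or None if the
    value does not look like a three-letter code."""
    code = code.strip().lower()
    if not re.fullmatch(r"[a-z]{3}", code):
        return None
    return LANGUAGE_INFO_URL + "?" + urlencode({"code": code})


link = language_info_link("eng")
```

The check only looks at the shape of the code; it does not verify that the code is actually assigned in ISO-639-3.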
=== suggested mapping CMDI OLAC profile > VLO 2.0 fields/facets ===
/CMD/Components/OLAC-!DcmiTerms/title -> name
/CMD/Components/OLAC-!DcmiTerms/subject -> subject
when multiple subject elements exist, prefer (in this order):
[@dcterms-type="LCSH"]
ignore (for now, until we have a way to resolve the numeric codes):
[@dcterms-type="DDC"]
[@dcterms-type="LCC"]
/CMD/Components/OLAC-!DcmiTerms/publisher -> organisation
/CMD/Header/MdSelfLink -> id
/CMD/Components/OLAC-!DcmiTerms/language[@olac-language] -> language
/CMD/Header/MdSelfLink (URL after first ":") (via OAI-PMH) -> origin
/CMD/Components/OLAC-!DcmiTerms/type[@olac-linguistic-type] -> genre
/CMD/Components/OLAC-!DcmiTerms/description -> description
/CMD/Components/OLAC-!DcmiTerms/identifier (if starting with !http:// or hdl:) -> open in original context (now: IMDI browser)
/CMD/Components/OLAC-!DcmiTerms/date -> year (new facet; extract yyyy from yyyy-mm-dd or yyyy-mm-ddThh:mm:ssZ, or use yyyy as is)
{{{
Some example dates; these will not be easy to curate and sort into tidy facets.
1985-87
1988-90?
Jan-Feb 1990
Dec 1988-
1988-?
5/91
16/9/90
5/91
6/4/1991
}}}
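
The year rule for the date element could be sketched as follows. The strict mode accepts only the three clean forms named in the mapping (yyyy, yyyy-mm-dd, yyyy-mm-ddThh:mm:ssZ) and returns nothing for everything else, so that messy values like the examples above can be set aside for curation. The optional lenient mode, which takes the first four-digit year between 1800 and 2099 found anywhere in the value, is an assumption and not something these notes decide.

```python
import re

# The three clean forms named in the mapping: yyyy, yyyy-mm-dd, yyyy-mm-ddThh:mm:ssZ.
CLEAN_DATE = re.compile(r"^(\d{4})(?:-\d{2}-\d{2}(?:T\d{2}:\d{2}:\d{2}Z)?)?$")
# Lenient fallback (an assumption): a standalone four-digit year from 1800 to 2099.
ANY_YEAR = re.compile(r"(?<!\d)(1[89]\d{2}|20\d{2})(?!\d)")


def extract_year(value, lenient=False):
    """Return the year as a four-character string, or None if none is found."""
    value = value.strip()
    match = CLEAN_DATE.match(value)
    if match:
        return match.group(1)
    if lenient:
        match = ANY_YEAR.search(value)
        if match:
            return match.group(1)
    return None


year = extract_year("2010-09-24")
```

Even in lenient mode, values with two-digit years such as 5/91 or 16/9/90 yield nothing, since guessing the century would require a curation decision.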
/CMD/Components/OLAC-!DcmiTerms/type[@dcterms-type="DCMIType"] -> resource type (new facet)
/CMD/Components/OLAC-!DcmiTerms/spatial[@dcterms-type="ISO3166"] -> country
/CMD/Components/OLAC-!DcmiTerms/coverage[@dcterms-type="ISO3166"] -> country
/CMD/Components/OLAC-!DcmiTerms/format[@dcterms-type="IMT"] -> format (new facet, contains mime type)
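
The subject preference rule in the mapping above (prefer LCSH, ignore DDC and LCC for now) could be applied like this. The sketch assumes the subject elements have already been read into (dcterms-type, text) pairs. Falling back to the first remaining subject when no LCSH value exists is an assumption, since the notes only state the preference.

```python
# Preference order and ignore list as given in the mapping above.
PREFERRED_TYPES = ["LCSH"]
IGNORED_TYPES = {"DDC", "LCC"}


def pick_subject(subjects):
    """Pick one subject from a list of (dcterms_type, text) pairs.

    dcterms_type is None for subject elements without a dcterms-type attribute.
    Returns the chosen text, or None if nothing usable remains.
    """
    candidates = [
        (dcterms_type, text.strip())
        for dcterms_type, text in subjects
        if dcterms_type not in IGNORED_TYPES and text.strip()
    ]
    for preferred in PREFERRED_TYPES:
        for dcterms_type, text in candidates:
            if dcterms_type == preferred:
                return text
    # Assumed fallback: the first subject that is not ignored.
    return candidates[0][1] if candidates else None


subject = pick_subject([(None, "folk tales"), ("DDC", "398.2"), ("LCSH", "Tales")])
```

Adding another preferred scheme later only means appending it to `PREFERRED_TYPES` in the desired order.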
=== suggested mapping CMDI LRT profile > VLO 2.0 fields/facets ===
/CMD/Components/LrtInventoryResource/LrtCommon/ResourceName -> name
/CMD/Header/MdSelfLink -> id
/CMD/Components/LrtInventoryResource/LrtCommon/Languages/ISO639/iso-639-3-code -> language (convert code to full language name)
clarin.eu -> origin (fixed value)
?? -> genre
/CMD/Components/LrtInventoryResource/LrtCommon/Description -> description
/CMD/Components/LrtInventoryResource/LrtCommon/MetadataLink (if not existing: ReferenceLink) -> open in original context
/CMD/Components/LrtInventoryResource/LrtCommon/FinalizationYearResourceCreation -> year
/CMD/Components/LrtInventoryResource/LrtCommon/ResourceType -> resource type
/CMD/Components/LrtInventoryResource/LrtCommon/Countries/Country/code -> country (convert code to full country name)
/CMD/Components/LrtInventoryResource/LrtCommon/Format -> format
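
The LRT mapping above can be read as a small lookup table from field name to element path. The sketch below applies such a table with Python's standard ElementTree, including the fallback from !MetadataLink to !ReferenceLink and the fixed origin value. Several things are assumptions: the sample record is invented and has no XML namespace (real CMDI files may declare one, in which case the paths need namespace prefixes), the output field names are made up for this example, and the conversion of codes to full language and country names is left out because it needs a lookup table that these notes do not provide.

```python
import xml.etree.ElementTree as ET

LRT_COMMON = "Components/LrtInventoryResource/LrtCommon/"

# Single-valued fields: output field name (assumed) -> path relative to <CMD>.
FIELD_PATHS = {
    "name": LRT_COMMON + "ResourceName",
    "id": "Header/MdSelfLink",
    "description": LRT_COMMON + "Description",
    "year": LRT_COMMON + "FinalizationYearResourceCreation",
    "resource_type": LRT_COMMON + "ResourceType",
    "format": LRT_COMMON + "Format",
}


def text_of(root, path):
    """Return the stripped text of the first match, or None if absent or empty."""
    value = root.findtext(path)
    return value.strip() if value and value.strip() else None


def map_lrt_record(xml_text):
    """Map one LRT record (namespace-free XML, an assumption) to a field dict."""
    root = ET.fromstring(xml_text)  # the root element is <CMD>
    doc = {field: text_of(root, path) for field, path in FIELD_PATHS.items()}
    doc["origin"] = "clarin.eu"  # fixed value, as stated in the mapping
    # Use MetadataLink if it exists, otherwise fall back to ReferenceLink.
    doc["original_context"] = (text_of(root, LRT_COMMON + "MetadataLink")
                               or text_of(root, LRT_COMMON + "ReferenceLink"))
    # Codes are kept as codes here; converting them to names needs a lookup table.
    doc["language_codes"] = [
        e.text.strip()
        for e in root.findall(LRT_COMMON + "Languages/ISO639/iso-639-3-code")
        if e.text and e.text.strip()
    ]
    doc["country_codes"] = [
        e.text.strip()
        for e in root.findall(LRT_COMMON + "Countries/Country/code")
        if e.text and e.text.strip()
    ]
    return doc


# Invented sample record, for illustration only.
SAMPLE = """<CMD>
  <Header><MdSelfLink>example-id</MdSelfLink></Header>
  <Components><LrtInventoryResource><LrtCommon>
    <ResourceName>Example corpus</ResourceName>
    <ReferenceLink>http://example.org/corpus</ReferenceLink>
    <Languages><ISO639><iso-639-3-code>nld</iso-639-3-code></ISO639></Languages>
  </LrtCommon></LrtInventoryResource></Components>
</CMD>"""

record = map_lrt_record(SAMPLE)
```

Keeping the paths in one table makes it easy to compare the code against the mapping list and to add the still undecided genre field later.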
=== short notes VLO meeting ===
* have result list next to facets
* result list: name + description
* link back to the original OLAC XML record / IMDI record
* post-process the harvested data before importing it into VLO: country, language, organisation
* extract the year on the fly
* experiment with SOV facets, derived from the language name
=== language codes ===
* old SIL codes and their ISO-639-3 equivalent: http://www.ethnologue.com/14/show_language.asp?code=ach