== our contact persons ==
Christian Thomas, Susanne Haaf, Axel Herold (BBAW), Dieter Van Uytvanck

== their contact persons ==
Alastair Dunning (coordination), Nuno Freire (technical, metadata), Antoine Isaac (metadata, LOD)

== collaboration ==
=== harvesting metadata ===
 * harvest their metadata: an example selection is available in [http://catalog-clarin.esc.rzg.mpg.de/vlo/search?fq=collection:The+European+Library+Central+Repository the alpha version of the VLO]
 * the CLARIN-D VLO taskforce is [wiki:VLO-Taskforce/Meeting-2015-04-23#a3ResourcestobeharvestedfortheVLOraisedbyDieter checking all available sets] to decide which to include
 * [https://docs.google.com/spreadsheets/d/1SGnkqTxJ00wTlV-mieD5xPKUsdZq3vSiXiLPXYSHonY/edit?usp=sharing analysis of the sets for harvesting]
 * [https://docs.google.com/document/d/1coeMFNHSkcg9wYTf2VrahjruKEGurrLr0ujd02qjFy4/edit?usp=sharing deliverable on collaboration with, among others, CLARIN]
 * the BBAW colleagues will make a CMDI profile for historical newspapers that The European Library could use

=== full-text access ===
 * look at use cases for full-text OCRed data (see the information from Nuno Freire below)

=== Europeana Sounds ===
 * Some CLARIN centres are also data providers for Europeana in the context of [http://europeanasounds.eu Europeana Sounds], e.g.:
   * [http://www.europeana.eu/portal/search.html?query=*%3A*&rows=24&qf=PROVIDER%3A%22Europeana+Sounds%22&qf=DATA_PROVIDER%3A%22The+Language+Archive%22&qt=false The Language Archive]: contact persons: [http://www.mpi.nl/people/koenig-alexander Alexander König] and [http://www.mpi.nl/people/geerts-jeroen Jeroen Geerts]
   * The British Library
 * It is important to avoid two unnecessary metadata conversions (CMDI > EDM > CMDI), so we should harvest the source CMDI directly from the CLARIN centres

== information from Nuno Freire ==
{{{
Meanwhile, maybe you can have a look at what we have in Github.
It contains only one newspaper title, but you can see the metadata available and the fulltext (as plain text files).
https://github.com/nfreire/HistoricalNewspapersCorpus/

Some more notes:
- We will not continue using Github (so don't expect to have the materials available through Git)
- The metadata formats in there are the final available ones: Dublin Core, EDM (Europeana Data Model - a richer format used in the Europeana network)
- If we find the storage capacity, we will make available also the ALTO files, along with the plain text.
}}}

== more information ==
 * [https://basecamp.com/1768384/projects/8527777 Europeana Research Basecamp page] - if interested ask one of our contact persons for an invitation