= Repository Registry =

We need a service/registry that provides information about the available repositories. Until that is formalized, we collect the information about individual providers here.

== Candidate Services, Implementing Endpoints ==

Here is a preliminary list of services available (or planned) within the ''FederatedSearch - '''EDC''''':

||= Provider =||= DB/Collection =||= link =||= type =||= language =||= status =||
|| KNAW || mimore || http://www.meertens.knaw.nl/mimore/srucql/ || dialects Lexicon? || Dutch dialects? || online, level 0, open issues, [[http://www.meertens.knaw.nl/mimore/srucql/?operation=searchRetrieve&query=koe&version=1.2| sample query]] ||
|| MPI || IMDI-subset || http://cqlservlet.mpi.nl/ || spoken corpus || Dutch? || online, level 0, open issues ||
|| INL || INL? || http://ds.dev.clarin.inl.nl/cqlwebapp/cql || text corpus? || Dutch? || online, level 0, open issues, [[http://ds.dev.clarin.inl.nl/cqlwebapp/cql?operation=searchRetrieve&query=a&version=1.2&maximumRecords=10 | sample query]] ||
|| DANS || Lieffering || http://95.211.15.130:8080/CqlSearch/ || historical text corpus || Dutch, French || online, level 1, no known issues, [[http://95.211.15.130:8080/CqlSearch/?operation=searchRetrieve&query=koe&version=1.2| sample query]] ||
|| ICLTT || C4 || http://corpus3.aac.ac.at/ddconsru || historical text corpus || German || online, level 0, open issues, [[http://corpus3.aac.ac.at/ddconsru?operation=searchRetrieve&query=Wasser | sample query]] ||
|| UPF || El Pais Newspaper Corpus (Metadata) || http://gilmere.upf.edu/girona_sru || text corpus || Catalan || only metadata (date, lang), [[http://gilmere.upf.edu/girona_sru/?version=1.1&operation=searchRetrieve&query=dc.coverage=2006-07-17&startRecord=1&maximumRecords=30&recordSchema=dc | sample query]] ||
|| UPF || El Pais Newspaper Corpus 2005 || http://gilmere.upf.edu/pais_sru || text corpus || Spanish || online, allows index-based queries (e.g. queries only on content `ccs.content=...` [[http://gilmere.upf.edu/pais_sru/?version=1.1&operation=searchRetrieve&query=ccs.content=crisis&startRecord=1&maximumRecords=30&recordSchema=dc | sample query]]) ||
|| Uni Tueb || Gutenberg || http://clarin.sfs.uni-tuebingen.de/gt/startPage.php || text corpus || German || planned ||
|| OTA || BNC ? || http://ota.oucs.ox.ac.uk/ || text corpus || English || potential ||
|| OTA || TRACTOR-archive || || text corpus? || Central and East EU || potential ||
|| BAS || Speech Corpora || || spoken corpora || German || planned ||
|| IDS || Goethe || || text corpus || German || online, level 0, open issues, [[http://clarin.ids-mannheim.de/cosmassru?operation=searchRetrieve&version=1.2&query=Gott | sample query]] [[http://clarin.ids-mannheim.de/cosmassru | explain ]] ||
|| Gothenburg Uni || Spraakbanken || http://litteraturbanken.se/ || literary texts || Swedish? || planned? ||
|| Gothenburg Uni || Spraakbanken || http://demosb.spraakdata.gu.se/korp/ || text corpus || Swedish? || planned? ||
|| ICLTT || CMDI-MDService || http://clarin.aac.ac.at/MDService2/sru || Metadata || multiple langs || online, but very sloppy, too different ||

The following are some reference SRU services:

|| ''Provider'' || ''endpoint'' || ''info'' ||
|| Library of Congress || http://z3950.loc.gov:7090/voyager || ? ||
|| Oxford English Dictionary || http://www.oed.com/srupage || [http://www.oed.com/public/sruservice/sru-service info about OED-SRU service] ||
|| Gutenberg (metadata) provided by indexdata || http://opencontent.indexdata.com/gutenberg || [http://www.indexdata.com/opencontent project Open Content] ||

== Current Issues ==

At the time of writing (3rd of May), all of the SRU/CQL implementations have some issues. They are listed here per service.

=== http://www.meertens.knaw.nl/mimore/srucql/ ===

 * Does not implement x-cmd-collections
 * `` is a closed element instead of wrapping around the
 * How to generate link to resource containing hit?
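
For reference, the requests behind the sample queries in the table and the scan checks in this section can be composed mechanically. The following is only a minimal sketch: the helper name `sru_url` is hypothetical, and `scanClause` is the standard SRU parameter for the scan operation. Whether a given endpoint actually answers the scan request is exactly what is listed as open per service.

```python
from urllib.parse import urlencode


def sru_url(endpoint, operation, version="1.2", **params):
    """Compose an SRU request URL (hypothetical helper, not part of any service)."""
    query = {"operation": operation, "version": version, **params}
    return f"{endpoint}?{urlencode(query)}"


MIMORE = "http://www.meertens.knaw.nl/mimore/srucql/"

# Same request as the sample query in the table (parameter order differs).
search = sru_url(MIMORE, "searchRetrieve", query="koe")

# Scan on the collections index, as referred to in the issue lists.
scan = sru_url(MIMORE, "scan", scanClause="cmd.collections")

print(search)
print(scan)
```

The same helper works for the other endpoints in the table by swapping the base URL, and for `version="1.1"` where a service requires it.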
=== http://cqlservlet.mpi.nl/ ===

 * Content in the DataView element is not XML-ified. It should at least have elements wrapped around the hit.

=== http://corpus3.aac.ac.at/ddconsru ===

 * Does not implement scan on cmd.collections
 * How to generate link to resource containing hit?

=== http://gilmere.upf.edu/girona_sru ===

 * Encodes the searchRetrieve response in DCU format (and not our format)
 * Does not seem to use x-cmd-collections

=== http://clarin.aac.ac.at/MDService2/sru ===

 * searchRetrieve returns a completely different XML structure.
 * scan on cmd.collections generates a NullPointerException

=== http://ds.dev.clarin.inl.nl/cqlwebapp/cql ===

 * Doesn't seem to return any results.

== Requirements ==

 Monitoring:: an associated service must regularly check the availability of the services (and perhaps more).

== Ideas on implementation of the Registry ==

As the list above already suggests, a simple list of endpoints will not be enough. Even if we rely as much as possible on auto-configuration based on the explain record, additional information (''Metadata'') needs to be stored and provided.

So one obvious choice would be to define a CMD-Profile '''for the Repositories'''. This profile would carry only the minimal necessary, primarily technical, information. In particular, it should not describe the data provided, but rather link to separate MDRecords of a collection or a resource, which would provide information such as `ResourceType`, `Language`, available `AnnotationTiers`, `Time/Space Coverage` etc.

We also have to be aware of the large semantic/functional overlap between such a `Repository`-MDRecord and the `SRU-explain` response of a given service. '''TODO:''' clarify the interaction and dependencies.

The following is a proposal for a sample `Repository` instance:

{{{
Metadata {collection-handle} ? MPI corpus MPI Corpora: ESF, CGN, ...
http://cqlservlet.mpi.nl/ SRU text http://corpus1.mpi.nl/ds/imdi_browser/ WebApp/User Interface
}}}

=== indexdata solution ===

The Indexdata/Masterkey framework (from which we are examining the Pazpar2 component as aggregator) provides [[http://www.indexdata.com/irspy|IRSpy]] (GPL) and [[http://www.indexdata.com/masterkey|Torus]] (seems proprietary).

== Further (future) Issues ==

 * Integration with MD-search
 * Integration with VLO
 * Integration with the Virtual Collection Registry
 * Aggregator/endpoint.

We note that all three will ideally be integrated in the same manner. For this to happen there are a few conditions that MUST be met:

 * All centres generate nice CMDI
 * This CMDI is of the same granularity as the search constrainability in the endpoint
 * Everyone includes in the CMDI files a unique set identifier (e.g., the node-id mapping hdl.net thing at the MPI).
 * The endpoints understand the unique identifiers from their corresponding CMDI files!
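
Returning to the monitoring requirement stated under Requirements, an availability check could look roughly like the following. This is a sketch under assumptions, not a proposed implementation: the names `Status`, `fetch_explain` and `check_endpoints` are hypothetical, and treating a reply as healthy when it contains the string `explainResponse` is only a heuristic.

```python
from dataclasses import dataclass
from urllib.request import urlopen


@dataclass
class Status:
    endpoint: str
    available: bool
    detail: str


def fetch_explain(endpoint, timeout=10.0):
    """Request the explain record of an SRU endpoint (performs a network call)."""
    sep = "&" if "?" in endpoint else "?"
    url = f"{endpoint}{sep}operation=explain&version=1.2"
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def check_endpoints(endpoints, fetch=fetch_explain):
    """Probe each endpoint once and record whether an explain record came back.

    `fetch` is injectable so the check can be exercised without network access.
    """
    results = []
    for endpoint in endpoints:
        try:
            body = fetch(endpoint)
        except (OSError, ValueError) as exc:  # URLError is a subclass of OSError
            results.append(Status(endpoint, False, str(exc)))
            continue
        ok = b"explainResponse" in body  # heuristic, see the assumptions above
        detail = "explain record found" if ok else "no explainResponse in reply"
        results.append(Status(endpoint, ok, detail))
    return results
```

Run periodically (for example from cron) over the endpoints in the table above, such a check would cover availability only; validating the response format against the issues listed per service would need more than this.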