Changes between Version 4 and Version 5 of FCS-specification


Timestamp:
04/17/12 12:19:19 (12 years ago)
Author:
dietuyt
Comment:

--

  • FCS-specification

    v4 v5  
    33 33 The base of the communication protocol is SRU and the base of the query language is CQL. The wiki on trac.clarin.eu talks about the protocol and the query language in detail. We remark that for a proper understanding one needs to be familiar with the official documentation of the SRU/CQL standard, which can be found at http://www.loc.gov/standards/sru/.
    34 34
     35 *** However, it is clear that the VCR will have at least two types of collections: (1) intentional collections, i.e., collections which represent the will of the user for specific records such as a metadata query, and (2) explicit collections, i.e., collections which simply contain a list of PIDs.
     36
    35 37 == Corpus Selection ==
    36 38 We plan to show a list of available corpora (collections) that are searchable by the aggregator. We will probably need to let users first make a rough selection between types of collections (this could be relevant later). Each endpoint serves one or more of these corpora/resources. If the endpoint serves more than one corpus/collection, it must support the domain/scope selection method (as explained on the clarin.eu wiki).
     
    226 228 == Restricting the search ==
    227 229
    228 It is possible to restrict the search to a list of PIDs that match resources. These are passed in the x-cmd-context parameter. This list can become long; however, we foresee two ways to deal with this problem.
    229 
    230 First, clients can use HTTP POST instead of HTTP GET, so that the parameters are passed in the body of the request instead of in the URL. As a consequence, the practical limit on the number of PIDs and other arguments becomes much higher.
    231 
    232 Second, we propose the implementation of a virtual collection registry (VCR). A PID could point to a specific virtual collection, which would result in the endpoint querying the virtual collection about the search domain (the part of it relevant for the specific endpoint). Discussions on the VCR are active and ongoing within CLARIN-D.
    233 
    234 However, it is clear that the VCR will have at least two types of collections: (1) intentional collections, i.e., collections which represent the will of the user for specific records such as a metadata query, and (2) explicit collections, i.e., collections which simply contain a list of PIDs.
     230 As mentioned above, it is possible to restrict the content search to specific (sub)collections. A list of these (comma-separated [http://www.clarin.eu/faq/3460 MdSelfLinks]) can be passed in the x-context parameter. This list can become long. However, clients calling the SRU endpoint can use HTTP POST instead of HTTP GET, so that the parameters are passed in the body of the request instead of in the URL. As a consequence, the practical limit on the number of MdSelfLinks and other arguments becomes much higher.
     231
     232 For example, to search only for occurrences of the string "bellen" within the CGN corpus as offered by our hypothetical endpoint (used throughout this document), we could format the searchRetrieve query as follows:
     233
     234 http://clarin_sru_endpoint?version=1.2&operation=searchRetrieve&x-context=hdl:1839/00-0000-0000-0001-53A5-2&query=bellen
     235
     236 In short, we pass a comma-separated list of MdSelfLinks (in this case consisting of only one) in the x-context parameter to restrict the search to the (sub)corpus described in the CMDI files that have those MdSelfLinks.
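
The request described in the v5 text can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the specification: the helper name `build_search_request` and its arguments are hypothetical, the endpoint is the hypothetical one used in the example above, and the sketch only builds the URL (for GET) or the form-encoded body (for POST) without sending anything.

```python
from urllib.parse import urlencode


def build_search_request(endpoint, query, md_self_links, use_post=False):
    """Return (url, body) for an SRU searchRetrieve restricted via x-context.

    Hypothetical helper: for GET the parameters go into the URL and body is
    None; for POST the URL is the bare endpoint and body holds the same
    parameters, form-encoded, as bytes.
    """
    params = {
        "version": "1.2",
        "operation": "searchRetrieve",
        # MdSelfLinks are joined into one comma-separated value.
        "x-context": ",".join(md_self_links),
        "query": query,
    }
    # Keep ':' and '/' literal so handles read as in the example URL;
    # the separating commas are percent-encoded as %2C.
    encoded = urlencode(params, safe=":/")
    if use_post:
        return endpoint, encoded.encode("utf-8")
    return f"{endpoint}?{encoded}", None


url, body = build_search_request(
    "http://clarin_sru_endpoint",
    "bellen",
    ["hdl:1839/00-0000-0000-0001-53A5-2"],
)
print(url)
```

With a single MdSelfLink and `use_post=False`, the printed URL matches the example query above. Passing `use_post=True` moves the same parameters into the request body, which is the route suggested when the list of MdSelfLinks grows long; a POST body of this kind would normally be sent with the `application/x-www-form-urlencoded` content type.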