Changes between Version 38 and Version 39 of FCS-Endpoints


Timestamp:
09/10/12 10:36:30
Author:
vronk
Comment:

updated to reflect CenterRegistry and CenterProfile

  • FCS-Endpoints

    v38 v39  
    11= Repository Registry =
    2 We need a service/registry that informs about the available repositories.
     2''Note: better rename page to: '''FCS-Endpoints'''?''
    33
    4 Until we have that formalized, we will collect the information about individual providers here.
     4This page was started to preliminarily collect information about individual repositories/centers/providers/endpoints wrt [[FederatedSearch]]. Meanwhile there is the CenterRegistry, which collects and provides the information about the available repositories. This information is encoded as a dedicated CMDI record according to the [[http://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/profiles/clarin.eu:cr1:p_1320657629667/xsd|CenterProfile]] [`clarin.eu:cr1:p_1320657629667`]. Note that these center descriptions are not restricted to the FCS endpoints, but describe, besides organizational/administrative information, all services available at a given center (including FCS).
    55
    66== Implementing Endpoints ==
     
    9191As the above list already suggests, a simple list of endpoints will not be enough. Even if we rely as much as possible on auto-configuration based on the explain record, additional information (''Metadata'') needs to be stored and provided.
    9292
    93 So one obvious choice would be to define a CMD-Profile '''for the Repositories'''.
    94 This profile would carry only the minimal necessary primarily technical information. In particular it shouldn't provide any information about the data provided, but rather link to separate MDRecords of a collection or a resource, that would provide information like `ResourceType`, `Language`, available `AnnotationTiers`, `Time/Space Coverage` etc.
     93For this, a separate service was conceived and implemented, the CenterRegistry, together with a dedicated CMD-Profile '''for describing the Repositories''', the [[http://catalog.clarin.eu/ds/ComponentRegistry?item=clarin.eu:cr1:p_1320657629667|CenterProfile]]. Besides organizational information, this profile mainly stores the endpoints of the services available at a given center (and minimal technical details).
    9594
    96 We also have to beware of a big semantic/functional overlap of such a `Repository`-MDRecord and the `SRU-explain`-response of given service and '''TODO:''' have to clarify the interaction and dependencies.
     95''(Sidenote: There is a semantic/functional overlap between such a CMD record for the endpoint and the `SRU-explain` response of the given service. [[BR]]'''TODO:''' the interaction and dependencies have to be clarified.)''
    9796
    98 Following is a proposal of a sample `Repository`-instance:
     97The following is a snippet of a `CenterProfile` record linking to the FCS endpoint:
    9998{{{
    100  <CMD>
    101    <Resources>
    102      <ResourceProxyList> <!-- should refer to collections or resources that are reachable by given Repository. -->
    103        <ResourceProxy id="mpi-subcorpus">
    104          <ResourceType>Metadata</ResourceType>
    105          <ResourceRef>{collection-handle}</ResourceRef>
    106             </ResourceProxy>
    107         </ResourceProxyList>
    108    </Resources>
    109    <Components>
    110      <Repository>
    111         <GeneralInfo>
    112            <ID>?</ID>
    113            <Name>MPI corpus</Name>
    114            <Description>MPI Corpora: ESF, CGN, ...</Description>
    115         </GeneralInfo>
    116        <Endpoint>
    117          <URL>http://cqlservlet.mpi.nl/</URL>
    118          <type>SRU</type>
    119          <Views>
    120             <view>text</view>
    121          </Views>
    122        </Endpoint>
    123        <Endpoint>
    124          <URL>http://corpus1.mpi.nl/ds/imdi_browser/</URL>
    125          <type>WebApp/User Interface</type>
    126        </Endpoint>
    127       </Repository>
    128    </Components>
    129  </CMD>
     99#!xml
     100 <WebReference>
     101   <Website>http://weblicht.sfs.uni-tuebingen.de/rws/sru/</Website>
     102   <Description>CQL</Description>
     103 </WebReference>
    130104}}}
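As a rough illustration of how a client might read such a snippet, the following Python sketch pulls the endpoint URL out of the `WebReference` element shown above. It assumes a namespace-free fragment and that the `Description` value `CQL` marks the FCS endpoint, which is only inferred from this one snippet. `find_endpoints` is a hypothetical helper name, and real CMDI records would additionally need XML namespace handling.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the CenterProfile snippet above (namespace-free by assumption).
SNIPPET = """
<WebReference>
  <Website>http://weblicht.sfs.uni-tuebingen.de/rws/sru/</Website>
  <Description>CQL</Description>
</WebReference>
"""


def find_endpoints(xml_text, description="CQL"):
    """Return the Website URLs of all WebReference elements whose
    Description equals `description` (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    endpoints = []
    # iter() also yields the root element itself if its tag matches.
    for ref in root.iter("WebReference"):
        if (ref.findtext("Description") or "").strip() == description:
            url = (ref.findtext("Website") or "").strip()
            if url:
                endpoints.append(url)
    return endpoints


print(find_endpoints(SNIPPET))
```

Matching on the `Description` text is a fragile convention. If the profile offers a more explicit marker for the service type, that would be the better key.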
    131105
    132 === indexdata solution ===
    133 The Indexdata/Masterkey framework (from which we are examining the Pazpar2 component as aggregator) provides [[http://www.indexdata.com/irspy|IRSpy]] (GPL) and [[http://www.indexdata.com/masterkey|Torus]] (seems proprietary).
     106There is an ongoing discussion about how much information about the provided data should be included in these descriptions (the foremost example being `Language`), as the data is already described in separate CMD records.
     107So it would be "cleaner" to just link to separate CMD records of a collection or a resource, which would provide information like `ResourceType`, `Language`, available `AnnotationTiers`, `Time/Space Coverage` etc. However, that would put an undue burden on the aggregator/client, which would have to crawl through and resolve multiple metadata records and make sense of the heterogeneous structure of the CMD records to get to the required information. Thus, for now, `Language` (and possibly a few other basic fields) will be added to the endpoint description. But there are plans (and prototyping work at Meertens) to '''combine the metadata and content search''', which would allow filtering the content search by any metadata query.
     108
     109To support the metadata/content search, there is an alternative way of announcing the FCS endpoint: adding a [[http://trac.clarin.eu/wiki/FCS-specification#ReferringtoanSRUendpointfromaCMDIfile|`SearchService-ResourceProxy`]]
     110to the CMD record of any collection/resource.
     111Because the SearchService-Proxy leads just to the "nearest" endpoint,
     112an endpoint that exposes multiple resources has to accept the parameter `x-context` to restrict the search to a given resource. A client/aggregator invoking a search for a given resource passes the PID of that resource as the `x-context` parameter.
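A minimal sketch of how a client/aggregator might build such a request is given below. It assumes a standard SRU 1.2 `searchRetrieve` call; `build_search_url` is a hypothetical helper, and the PID `hdl:0000/example` is a placeholder, not a real handle. Only the `x-context` parameter itself is taken from the text above.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_search_url(endpoint, query, resource_pid=None, maximum_records=10):
    """Build an SRU searchRetrieve URL (hypothetical helper).

    If `resource_pid` is given, it is passed as `x-context` so that the
    endpoint can restrict the search to that resource.
    """
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",  # assumption: the endpoint speaks SRU 1.2
        "query": query,
        "maximumRecords": str(maximum_records),
    }
    if resource_pid:
        params["x-context"] = resource_pid
    # Append to an existing query string if the endpoint URL already has one.
    separator = "&" if "?" in endpoint else "?"
    return endpoint + separator + urlencode(params)


example_url = build_search_url(
    "http://weblicht.sfs.uni-tuebingen.de/rws/sru/",
    "Haus",
    resource_pid="hdl:0000/example",  # placeholder PID
)
print(example_url)
print(parse_qs(urlparse(example_url).query)["x-context"])
```

Omitting `resource_pid` leaves out `x-context` entirely, which corresponds to searching everything the endpoint exposes.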
    134113
    135114== Further (future) Issues ==