93 | | So one obvious choice would be to define a CMD-Profile '''for the Repositories'''. |
94 | | This profile would carry only the minimal necessary, primarily technical, information. In particular, it shouldn't provide any information about the data provided, but rather link to separate MDRecords of a collection or a resource, which would provide information like `ResourceType`, `Language`, available `AnnotationTiers`, `Time/Space Coverage` etc. |
| 93 | For this, a separate service, the CenterRegistry, was conceived and implemented, together with a dedicated CMD-Profile '''for describing the Repositories''', the [[http://catalog.clarin.eu/ds/ComponentRegistry?item=clarin.eu:cr1:p_1320657629667|CenterProfile]]. Besides organizational information, this profile mainly stores the endpoints of the services available at a given center (plus minimal technical details). |
100 | | <CMD> |
101 | | <Resources> |
102 | | <ResourceProxyList> <!-- should refer to collections or resources that are reachable by given Repository. --> |
103 | | <ResourceProxy id="mpi-subcorpus"> |
104 | | <ResourceType>Metadata</ResourceType> |
105 | | <ResourceRef>{collection-handle}</ResourceRef> |
106 | | </ResourceProxy> |
107 | | </ResourceProxyList> |
108 | | </Resources> |
109 | | <Components> |
110 | | <Repository> |
111 | | <GeneralInfo> |
112 | | <ID>?</ID> |
113 | | <Name>MPI corpus</Name> |
114 | | <Description>MPI Corpora: ESF, CGN, ...</Description> |
115 | | </GeneralInfo> |
116 | | <Endpoint> |
117 | | <URL>http://cqlservlet.mpi.nl/</URL> |
118 | | <type>SRU</type> |
119 | | <Views> |
120 | | <view>text</view> |
121 | | </Views> |
122 | | </Endpoint> |
123 | | <Endpoint> |
124 | | <URL>http://corpus1.mpi.nl/ds/imdi_browser/</URL> |
125 | | <type>WebApp/User Interface</type> |
126 | | </Endpoint> |
127 | | </Repository> |
128 | | </Components> |
129 | | </CMD> |
| 99 | #!xml |
| 100 | <WebReference> |
| 101 | <Website>http://weblicht.sfs.uni-tuebingen.de/rws/sru/</Website> |
| 102 | <Description>CQL</Description> |
| 103 | </WebReference> |
132 | | === indexdata solution === |
133 | | The Indexdata/Masterkey framework (from which we are examining the Pazpar2 component as aggregator) provides [[http://www.indexdata.com/irspy|IRSpy]] (GPL) and [[http://www.indexdata.com/masterkey|Torus]] (seems proprietary). |
| 106 | There is an ongoing discussion about how much information about the data provided should be included in these descriptions (the foremost example being the `Language`), as the data is actually already described in separate CMD records. |
| 107 | So it would be "cleaner" to just link to the separate CMD records of a collection or a resource, which would provide information like `ResourceType`, `Language`, available `AnnotationTiers`, `Time/Space Coverage` etc. However, that would put an undue burden on the aggregator/client, which would have to crawl through and resolve multiple metadata records and make sense of the heterogeneous structure of the CMD records to get to the required information. Thus, for now, `Language` (and possibly a few other basic fields) will be added to the endpoint description. There are also plans (and prototyping work at Meertens) to '''combine the metadata and content search''', which would allow filtering the content search by any metadata query. |
| 108 | |
| 109 | To support the metadata/content search, there is an alternative way of announcing the FCS-endpoint, by adding a [[http://trac.clarin.eu/wiki/FCS-specification#ReferringtoanSRUendpointfromaCMDIfile|`SearchService-ResourceProxy`]] |
| 110 | into the CMD record of any collection/resource. |
| 111 | Because the SearchService-Proxy leads just to the "nearest" endpoint,
| 112 | an endpoint that exposes multiple resources has to accept the parameter `x-context` to restrict the search to a given resource. A client/aggregator invoking a search for a given resource passes the PID of that resource as the `x-context` parameter.
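
As a rough illustration of the client side, the following sketch looks up a `SearchService` proxy in a CMD record and builds a search request restricted via `x-context`. It rests on several assumptions that are not taken from the text above: the record is a simplified, namespace-free CMD fragment, the `ResourceType` value is assumed to be `SearchService`, the PID is a made-up placeholder, the function names are hypothetical, and the request uses the standard SRU `searchRetrieve` parameters with version 1.2.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Simplified CMD record of a collection (assumption: namespaces omitted,
# ResourceType value "SearchService" assumed for the search proxy).
CMD_RECORD = """<CMD>
  <Resources>
    <ResourceProxyList>
      <ResourceProxy id="search">
        <ResourceType>SearchService</ResourceType>
        <ResourceRef>http://cqlservlet.mpi.nl/</ResourceRef>
      </ResourceProxy>
    </ResourceProxyList>
  </Resources>
</CMD>"""


def find_search_endpoint(cmd_xml):
    """Return the URL of the first SearchService proxy, or None if absent."""
    root = ET.fromstring(cmd_xml)
    for proxy in root.iter("ResourceProxy"):
        if proxy.findtext("ResourceType") == "SearchService":
            return proxy.findtext("ResourceRef")
    return None


def build_search_url(endpoint, query, resource_pid):
    """Build an SRU searchRetrieve URL restricted to one resource."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",           # assumed SRU version
        "query": query,             # CQL query string
        "x-context": resource_pid,  # PID of the resource to search in
    }
    separator = "&" if "?" in endpoint else "?"
    return endpoint + separator + urlencode(params)


if __name__ == "__main__":
    endpoint = find_search_endpoint(CMD_RECORD)
    # Hypothetical PID, used only for illustration.
    print(build_search_url(endpoint, "water", "hdl:0000/example-pid"))
```

The PID is passed unchanged as the parameter value; `urlencode` only takes care of percent-encoding it for the query string.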