currently reachable via
server + services monitoring
- nagios: http://www.nagios.org/
- icinga (nagios fork, same format for plugins etc): https://www.icinga.org/
- clients/remote servers need the Nagios NRPE package installed (only for local checks such as disk space, memory, etc.) and port 5666 open
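Whether port 5666 is reachable from the monitoring server can be verified before NRPE checks are configured. The following is a minimal sketch in Python, assuming a plain TCP connection test is enough for a first diagnosis. It does not speak the NRPE protocol, and the helper name `port_is_open` is made up for this illustration.

```python
import socket

NRPE_PORT = 5666  # default NRPE port named in the list above


def port_is_open(host: str, port: int = NRPE_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host name and opens a TCP connection;
        # the context manager closes the socket again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

A result of `False` only says that the port could not be reached (firewall, NRPE not running, wrong host name); it does not say which of these is the cause.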
AAI monitoring
- AAI eye: http://www.csc.fi/english/institutions/haka/instructions/services-tech/aaieye
- RAPTOR: http://iam.cf.ac.uk/trac/RAPTOR
- IDS: https://trac.clarin.eu/browser/monitoring/plugins/ids
- https://svn.ms.mff.cuni.cz/redmine/projects/dspace-modifications/wiki/AAIShibbie
MPI services monitoring
- see: https://trac.clarin.eu/browser/monitoring/plugins/mpi (includes SRU/CQL and OAI-PMH probe)
CLARIN-D monitoring requirements
The monitoring requirements of CLARIN-D are fairly modest; a simple Icinga or Nagios installation would be sufficient to fulfill all of them:
- regular checks if hosts are up and reachable (ping)
- regular checks if certain network services are working (e.g. http)
- Each center can provide nagios plugins to assess the state of that center's services.
- Each center can register one or more contact persons. They will get an email with a warning if a host or service is not working correctly.
- The server running the monitoring software is also monitored itself.
- Users (not only the center administrators) should be able to see the status of each center and its service(s) on a website.
- Authentication and authorization via Shibboleth would be nice for access to the Icinga/Nagios web interface.
- (IP) addresses for external probes should be delivered via the Center Registry (docs)
- For visualisation, a map of Germany at http://de.clarin.eu/status, currently here (for Joomla users), via the NagVis plugin
- with traffic light alarm indication (?) maybe like this (?)
- with graphs to see how long or how often services have been unavailable in the past? Via link to nagios?
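For centers that want to provide their own plugins, the following sketch shows the general shape of a Nagios/Icinga plugin in Python: it prints one status line and reports its state through the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The mapping of HTTP status codes to states and all function names are assumptions made for this illustration, not part of the CLARIN-D setup.

```python
import urllib.error
import urllib.request
from typing import List, Tuple

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify_http_status(code: int) -> int:
    """Map an HTTP status code to a plugin state (this mapping is an assumption)."""
    if 200 <= code < 400:
        return OK
    if 400 <= code < 500:
        return WARNING
    return CRITICAL


def check_http(url: str, timeout: float = 10.0) -> Tuple[int, str]:
    """Fetch `url` and return (state, status line)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            code = resp.status
    except urllib.error.HTTPError as err:
        code = err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError) as err:
        return CRITICAL, f"HTTP CRITICAL - {url} unreachable: {err}"
    state = classify_http_status(code)
    return state, f"HTTP {LABELS[state]} - {url} returned {code}"


def main(argv: List[str]) -> int:
    """Print the status line and return the state to be used as exit code."""
    if len(argv) != 1:
        print("HTTP UNKNOWN - expected exactly one argument: URL")
        return UNKNOWN
    state, line = check_http(argv[0])
    print(line)
    return state
```

When saved as a script, the plugin would end with `sys.exit(main(sys.argv[1:]))` so that the monitoring server receives the state as the process exit code.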
Service Types / Tests | ping | http | disk space | load | free mem | users | functional check | query duration time |
---|---|---|---|---|---|---|---|---|
AAI Service Providers (SP) | * | # | #(IDS probe?) | |||||
AAI Identity Providers (IdP) | * | # | * | * | #(IDS probe?) | |||
AAI Where are you From (WAYF) | * | # | #(MPI discojuice probe?) | |||||
REST web services (WebLicht) | * | #(provenance data from TCF?) | ||||||
Federated Content Search Endpoints (SRU/CQL) | * | # | #(MPI probe?) | |||||
Federated Content Search Aggregator | * | # | # | |||||
Repositories | * | # | * | #(test for a Fedora content model?) | ||||
OAI-PMH Gateway | * | #(MPI probe?) | ||||||
Handle Servers | * | #(EUDAT/Jülich probe?) | #(Eric's timeout probe) | |||||
resolve a sample PID for each repository | # | # | ||||||
Center Registry | * | # | ||||||
WebLicht webserver | * | # | ||||||
VLO webserver | * | # | ||||||
TLA webserver | * | # | ||||||
other webservers | * | # | ||||||
Nagios servers (selfcheck) | * | # | #(check_nagios plugin) | |||||
Nagios servers crosscheck (from other center) | * | #(check_nagios plugin) | ||||||
Workspaces server (not yet) | n.a. | n.a. | n.a. |
# mandatory; * optional
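The "query duration time" column can be sketched as a probe that measures how long a request takes and compares the result against a warning and a critical threshold. The threshold values and the function names below are assumptions for illustration. The part of the status line after the `|` follows the Nagios performance data format (`label=value[unit];warn;crit`), which is what allows graphs of past behaviour to be drawn.

```python
import time
from typing import Callable, Tuple

OK, WARNING, CRITICAL = 0, 1, 2  # Nagios plugin states


def classify_duration(seconds: float, warn: float, crit: float) -> int:
    """Return the plugin state for a measured duration."""
    if seconds >= crit:
        return CRITICAL
    if seconds >= warn:
        return WARNING
    return OK


def timed_probe(probe: Callable[[], object],
                warn: float = 1.0, crit: float = 5.0) -> Tuple[int, str]:
    """Run `probe`, time it, and build a status line with performance data.

    Exceptions raised by `probe` are not caught here; a real plugin would
    turn them into a CRITICAL or UNKNOWN result.
    """
    start = time.monotonic()
    probe()
    elapsed = time.monotonic() - start
    state = classify_duration(elapsed, warn, crit)
    label = ("OK", "WARNING", "CRITICAL")[state]
    line = (f"QUERY {label} - took {elapsed:.3f}s"
            f" | time={elapsed:.3f}s;{warn:.3f};{crit:.3f}")
    return state, line
```

Here `probe` stands for any callable that performs the actual query, for example a request to an SRU/CQL endpoint or the resolution of a sample PID.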
center requirements
- Shibboleth Identity Providers (IdP) status pages must be reachable from other servers
- Shibboleth Service Providers (SP) status pages must be reachable from other servers
center registry requirements
Current priorities:
- Federated content search endpoints (multiple per center)
- OAI-PMH end points
Other points:
add server data for:
- Shibboleth Identity Providers (IdP)
- Shibboleth Service Providers (SP) (multiple per center)
- Shibboleth Where are You From servers (WAYF, currently only one available?)
- REST Webservices (multiple per center)
- Repositories
- Handle servers (in case centers have their own, like IDS?)
- sample PID URLs per center
- nagios servers (if available per center)
and a list of various web servers (VLO, TLA, WebLicht)
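Once the Center Registry holds this server data, Nagios/Icinga object definitions could be generated from it instead of being maintained by hand. The sketch below assumes a very simple data model (`Host`, `Service`) and the `generic-host` / `generic-service` templates from the Nagios sample configuration. The host name, address and the `check_oai_pmh` command in the usage example are hypothetical, and how the data would actually be exported from the Center Registry is not covered.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Service:
    description: str    # shown in the web interface, e.g. "OAI-PMH"
    check_command: str  # name of a command defined elsewhere in the Nagios config


@dataclass
class Host:
    name: str
    address: str
    services: List[Service] = field(default_factory=list)


def render_nagios(host: Host) -> str:
    """Render one host and its services as Nagios object definitions."""
    blocks = [
        "define host {\n"
        "    use        generic-host\n"
        f"    host_name  {host.name}\n"
        f"    address    {host.address}\n"
        "}"
    ]
    for svc in host.services:
        blocks.append(
            "define service {\n"
            "    use                  generic-service\n"
            f"    host_name            {host.name}\n"
            f"    service_description  {svc.description}\n"
            f"    check_command        {svc.check_command}\n"
            "}"
        )
    return "\n\n".join(blocks) + "\n"
```

For example, `render_nagios(Host("repo.example.org", "192.0.2.10", [Service("HTTP", "check_http"), Service("OAI-PMH", "check_oai_pmh")]))` yields one host definition followed by two service definitions, which could be written to a file in the Nagios configuration directory.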