wiki:SystemAdministration/Monitoring/Icinga/Outdated

Version 80 (modified by Kai Zimmer, 10 years ago)

Currently reachable via:

  • server + services monitoring
  • AAI monitoring
  • MPI services monitoring

CLARIN-D monitoring requirements

The monitoring requirements of CLARIN-D are fairly modest; a simple icinga or nagios installation would be sufficient to fulfill all of them:

  • regular checks if hosts are up and reachable (ping)
  • regular checks if certain network services are working (e.g. http)
  • Each center can provide nagios plugins to assess the state of that center's services.
  • Each center can register one or more contact persons. They will get an email with a warning if a host or service is not working correctly.
  • The server running the monitoring software is also monitored itself.
  • Users (not only the center administrators) should be able to see the status of each center and its service(s) on a website.
  • For access to the web interface of icinga/nagios, authentication & authorization via Shibboleth would be nice.
  • (IP) addresses for external probes should be delivered via the Center Registry (docs)
  • As visualisation, a map of Germany at http://de.clarin.eu/status, currently here (for Joomla users) via the NagVis plugin
    • with traffic-light alarm indication (?), maybe like this (?)
    • with graphs showing how long or how often services have been unavailable in the past? Via a link to nagios?
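
The "network service is working" check and the per-center plugins mentioned above could look roughly like the following. This is only a sketch: the script name `check_center_http.py`, the function names and the 2 s / 5 s thresholds are assumptions made for illustration, not part of any existing CLARIN-D setup. The exit codes 0 to 3 (OK, WARNING, CRITICAL, UNKNOWN) follow the usual nagios plugin convention.

```python
import sys
import time
import urllib.error
import urllib.request

# Exit codes used by the nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(http_status, elapsed, warn=2.0, crit=5.0):
    """Map an HTTP status and a response time (seconds) to a plugin state.

    The warn/crit thresholds are illustrative assumptions.
    """
    if http_status is None:
        return CRITICAL, "no response"
    if http_status >= 400:
        return CRITICAL, f"HTTP {http_status}"
    if elapsed >= crit:
        return CRITICAL, f"HTTP {http_status} in {elapsed:.2f}s"
    if elapsed >= warn:
        return WARNING, f"HTTP {http_status} in {elapsed:.2f}s"
    return OK, f"HTTP {http_status} in {elapsed:.2f}s"


def probe(url, timeout=10.0):
    """Fetch the URL once and return (status or None, elapsed seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        status = None  # DNS failure, refused connection, timeout, ...
    return status, time.monotonic() - start


def main(argv):
    """Run one check; print a status line and return the exit code."""
    if len(argv) != 2:
        print("UNKNOWN - usage: check_center_http.py URL")
        return UNKNOWN
    status, elapsed = probe(argv[1])
    state, text = evaluate(status, elapsed)
    # Text after "|" is treated as performance data by nagios.
    print(f"{LABELS[state]} - {text} | time={elapsed:.3f}s")
    return state
```

Ending the script with `sys.exit(main(sys.argv))` would make it usable as a plugin. Keeping `evaluate` free of network access means the threshold logic can be tested without a live server.
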
Service Types / Tests: ping | http | disk space | load | free mem | users | functional check | query duration time
AAI Service Providers (SP) * # #(IDS probe?)
AAI Identity Providers (IdP) * # * * #(IDS probe?)
AAI Where are you From (WAYF) * # #(MPI discojuice probe?)
REST-Webservices (WebLicht) * #(provenance data from TCF?)
Federated Content Search Endpoints (SRU/CQL) * # #(MPI probe?)
Federated Content Search Aggregator * # #
Repositories * # * #(test for a Fedora content model?)
OAI-PMH Gateway * #(MPI probe?)
Handle Servers * #(EUDAT/Jülich probe?) #(Eric's timeout probe)
resolve a sample PID for each repository # #
Center Registry * #
WebLicht webserver * #
VLO webserver * #
TLA webserver * #
other webservers * #
Nagios servers (selfcheck) * # #(check_nagios plugin)
Nagios servers crosscheck (from other center) * #(check_nagios plugin)
Workspaces server (not yet) n.a. n.a. n.a.

# mandatory; * optional
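
For the "resolve a sample PID for each repository" row, a functional check combined with a query duration measurement might be sketched as follows. It assumes that the PIDs are handles and that they are resolved through the public proxy at hdl.handle.net. The function names are hypothetical, and the handle in the usage note is a made-up placeholder.

```python
import time
import urllib.parse
import urllib.request

HANDLE_PROXY = "https://hdl.handle.net/"  # public Handle System proxy


def pid_url(pid):
    """Build a proxy URL from a handle such as 'hdl:1234/abc' or '1234/abc'."""
    handle = pid[4:] if pid.lower().startswith("hdl:") else pid
    return HANDLE_PROXY + urllib.parse.quote(handle, safe="/")


def timed(func, *args):
    """Call func(*args) and return (result, elapsed seconds)."""
    start = time.monotonic()
    result = func(*args)
    return result, time.monotonic() - start


def resolves(url, timeout=10.0):
    """True if the redirect chain ends in a successful (2xx) response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError, HTTPError and timeouts all derive from OSError
        return False
```

A probe would then call `ok, seconds = timed(resolves, pid_url("hdl:1234/example-5678"))` and compare `seconds` against whatever duration limit the centers agree on.
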

center requirements

  • Shibboleth Identity Providers (IdP) status pages must be reachable from other servers
  • Shibboleth Service Providers (SP) status pages must be reachable from other servers

center registry requirements

Current priorities:

  • Federated content search endpoints (multiple per center)
  • OAI-PMH end points

Other points:

add server data for:

  • Shibboleth Identity Providers (IdP)
  • Shibboleth Service Providers (SP) (multiple per center)
  • Shibboleth Where are You From servers (WAYF, currently only one available?)
  • REST Webservices (multiple per center)
  • Repositories
  • Handle servers (in case centers have their own, like IDS?)
  • sample PID URLs per center
  • nagios servers (if available per center)

and a list of various webservers (VLO, TLA, WebLicht)
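
Once the Center Registry holds this server data, the nagios configuration could be generated from it instead of being maintained by hand. The sketch below is one possible shape, not a description of the actual registry: the `Endpoint` and `Center` classes and their fields are assumptions, and the output relies on the `generic-host` and `generic-service` templates and a `check_http` command that accepts extra arguments, as in the nagios sample configuration.

```python
from dataclasses import dataclass, field


@dataclass
class Endpoint:
    name: str        # label used in the service description
    kind: str        # e.g. "IdP", "SP", "FCS", "OAI-PMH", "REST"
    host: str        # DNS name or IP address delivered by the registry
    path: str = "/"  # URL path to probe


@dataclass
class Center:
    name: str
    endpoints: list = field(default_factory=list)


def nagios_objects(center):
    """Render host and service definitions in nagios object syntax."""
    blocks = []
    # One host object per distinct address, even if it serves several endpoints.
    for host in sorted({e.host for e in center.endpoints}):
        blocks.append(
            "define host {\n"
            f"    host_name  {host}\n"
            f"    address    {host}\n"
            "    use        generic-host\n"
            "}"
        )
    # One HTTP service check per registered endpoint.
    for e in center.endpoints:
        blocks.append(
            "define service {\n"
            f"    host_name            {e.host}\n"
            f"    service_description  {center.name} {e.kind} {e.name}\n"
            f"    check_command        check_http!-u {e.path}\n"
            "    use                  generic-service\n"
            "}"
        )
    return "\n\n".join(blocks)
```

For example, `nagios_objects(Center("ExampleCenter", [Endpoint("endpoint1", "FCS", "fcs.example.org", "/sru")]))` yields one host block and one service block. Both the center name and the host name here are placeholders.
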

nagios server requirements