wiki:SystemAdministration/Monitoring/Icinga/Outdated

Version 29 (modified by kzimmer, 11 years ago)


server + services monitoring

AAI monitoring

CLARIN-D monitoring requirements

The monitoring requirements of CLARIN-D are fairly modest: a simple Icinga or Nagios installation would be sufficient to fulfill all of the following needs:

  • Regular checks of whether hosts are up and reachable (ping).
  • Regular checks of whether certain network services are working (e.g. HTTP).
  • Each center can provide Nagios plugins to assess the state of that center's services.
  • Each center can register one or more contact persons, who receive a warning email if a host or service is not working correctly.
  • The server running the monitoring software is itself monitored.
  • Users (not only the center administrators) should be able to see the status of each center and its service(s) on a website.
  • Authentication and authorization via Shibboleth would be nice to have for access to the Icinga/Nagios web interface.
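
The check-related requirements above fit the standard Nagios plugin convention, in which a plugin prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The following is a minimal sketch of an HTTP check in that style, not a tested production plugin. The function names, the thresholds and the script name in the usage message are illustrative assumptions.

```python
"""Minimal sketch of a Nagios/Icinga-style HTTP check plugin."""
import sys
import time
import urllib.error
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(status_code, elapsed, warn=2.0, crit=5.0):
    """Map an HTTP status and response time (seconds) to (exit_code, message).

    The thresholds are illustrative defaults, not CLARIN-D policy.
    """
    if status_code is None:
        return CRITICAL, "no HTTP response"
    if status_code >= 400:
        return CRITICAL, f"HTTP {status_code}"
    if elapsed >= crit:
        return CRITICAL, f"HTTP {status_code}, slow response ({elapsed:.2f}s)"
    if elapsed >= warn:
        return WARNING, f"HTTP {status_code}, slow response ({elapsed:.2f}s)"
    return OK, f"HTTP {status_code} in {elapsed:.2f}s"


def check_http(url, timeout=10.0):
    """Fetch url once and return (status_code or None, elapsed seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError, ValueError):
        status = None  # unreachable host, DNS failure, timeout, malformed URL
    return status, time.monotonic() - start


def main(argv):
    """Run the check for the URL in argv[1] and return the plugin exit code."""
    if len(argv) != 2:
        print("HTTP UNKNOWN - usage: check_http_sketch.py URL")
        return UNKNOWN
    status, elapsed = check_http(argv[1])
    code, message = evaluate(status, elapsed)
    # One status line; the response time follows the pipe as performance data.
    print(f"HTTP {LABELS[code]} - {message} | time={elapsed:.3f}s")
    return code
```

To use this as an actual plugin, the script would end with `sys.exit(main(sys.argv))` under the usual `if __name__ == "__main__":` guard, so that the monitoring server sees the exit code.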
Service Types / Tests: ping | http | disk space | load | free mem | shib func | query func | performance
AAI Service Providers (SP) * *
AAI Identity Providers (IdP) * *
AAI Where are you From (WAYF) * *
REST-Webservices (WebLicht) *
Federated Content Search Endpoints (FCS) * * *
Federated Content Search Aggregator * * *
Repositories * * *
OAI-PMH Gateway * *
Handle Servers * * *
local handles resolvable * *
Center Registry * *
WebLicht webserver * *
VLO webserver * *
other webservers * *
Nagios servers crosscheck * *
Workspaces server * * *

Visualisation for CLARIN-D: a map of Germany at http://de.clarin.eu/status with traffic-light alarm indication (?), maybe like this (?)
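
One way to derive a traffic-light colour per center for such a map is to take the worst current state among that center's checks. The sketch below assumes the standard Nagios service states (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The colour mapping, the ranking of UNKNOWN and the center names are illustrative assumptions, not a decided design.

```python
# Standard Nagios service states.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumed severity ranking: UNKNOWN counts as worse than WARNING
# but better than CRITICAL.
SEVERITY = {OK: 0, WARNING: 1, UNKNOWN: 2, CRITICAL: 3}

# Assumed colour mapping for the map display.
COLOUR = {OK: "green", WARNING: "yellow", UNKNOWN: "yellow", CRITICAL: "red"}


def center_colour(service_states):
    """Return the traffic-light colour for one center.

    service_states maps a service name to its current Nagios state.
    A center without any reported service is shown as "grey".
    """
    if not service_states:
        return "grey"
    worst = max(service_states.values(), key=SEVERITY.__getitem__)
    return COLOUR[worst]


# Hypothetical example data; the center names are placeholders.
status = {
    "center-a": {"ping": OK, "http": OK},
    "center-b": {"ping": OK, "http": WARNING},
    "center-c": {"ping": CRITICAL, "http": UNKNOWN},
    "center-d": {},
}
colours = {name: center_colour(states) for name, states in status.items()}
```

Taking the worst state keeps the map conservative: a single failing service is enough to turn a center's light yellow or red.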

Open questions: Do we need graphs to see how long or how often services have been unavailable in the past? Perhaps via a link to Nagios?
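
To make the question concrete: both figures (how often, how long) can be derived from a log of state changes, independently of any graphing. The sketch below counts outages and sums downtime for one service. The event format `(timestamp, is_up)` and the sample data are assumptions for illustration and do not reflect how Icinga or Nagios store their history.

```python
from datetime import datetime, timedelta


def outage_stats(events, end):
    """Return (number_of_outages, total_downtime) for one service.

    events is a chronologically sorted list of (timestamp, is_up) state
    changes; end closes the observation window. An outage that is still
    open at end is counted up to end.
    """
    outages = 0
    downtime = timedelta(0)
    down_since = None
    for timestamp, is_up in events:
        if not is_up and down_since is None:
            down_since = timestamp  # an outage begins
            outages += 1
        elif is_up and down_since is not None:
            downtime += timestamp - down_since  # the outage ends
            down_since = None
    if down_since is not None:
        downtime += end - down_since
    return outages, downtime


# Hypothetical state-change log for a single service.
events = [
    (datetime(2013, 1, 1, 0, 0), True),
    (datetime(2013, 1, 1, 8, 0), False),
    (datetime(2013, 1, 1, 8, 30), True),
    (datetime(2013, 1, 2, 22, 0), False),
]
count, total = outage_stats(events, end=datetime(2013, 1, 3, 0, 0))
```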