== Server + services monitoring ==
 * Nagios: http://www.nagios.org/
 * Icinga (Nagios fork, same format for plugins etc.): https://www.icinga.org/

== AAI monitoring ==
 * AAI eye: http://www.csc.fi/english/institutions/haka/instructions/services-tech/aaieye
 * RAPTOR: http://iam.cf.ac.uk/trac/RAPTOR
 * IDS: https://trac.clarin.eu/browser/monitoring/plugins/ids

== MPI services monitoring ==
 * see: https://trac.clarin.eu/browser/monitoring/plugins/mpi (includes SRU/CQL and OAI-PMH probes)

== CLARIN-D monitoring requirements ==
The monitoring requirements of CLARIN-D are modest. A simple Icinga or Nagios installation would be sufficient to fulfill all of them:
 * Regular checks that hosts are up and reachable (ping).
 * Regular checks that certain network services are working (e.g. http).
 * Each center can provide [http://nagiosplug.sourceforge.net/developer-guidelines.html nagios plugins] to assess the state of that center's services.
 * Each center can register one or more contact persons. They will get a warning email if a host or service is not working correctly.
 * The server running the monitoring software is itself monitored.
 * Users (not only the center administrators) should be able to see the status of each center and its service(s) on a website.
 * For access to the web interface of Icinga/Nagios, authentication and authorization via Shibboleth would be nice.
 * As visualisation, a map of Germany under http://de.clarin.eu/status:
   * with traffic light alarm indication (?), maybe [http://www.laendercheck-wissenschaft.de/archiv/privater_hochschulsektor/status_quo/status_quo_deutschlandkarte.jpg like this] (?)
   * with graphs to see how long or how often services have been unavailable in the past? Via a link to Nagios?

||= Service Types / Tests =||= ping =||= http =||= disk space =||= load =||= free mem =||= shib func =||= query func =||= query duration time =||
||= AAI Service Providers (SP) =|| * || || || || || * || || ||
||= AAI Identity Providers (IdP) =|| * || || || || || * || || ||
||= AAI Where are you From (WAYF) =|| * || || || || || * || || ||
||= REST-Webservices (WebLicht) =|| * || || || || || || || ||
||= Federated Content Search Endpoints (FCS) =|| * || * || || || || || * || ||
||= Federated Content Search Aggregator =|| * || * || || || || || * || ||
||= Repositories =|| * || * || || || || || * || ||
||= OAI-PMH Gateway =|| * || || || || || || * (MPI probe?) || ||
||= Handle Servers =|| * || || || || || || * || * ||
||= resolve a sample PID for each center =|| || || || || || || * || * ||
||= Center Registry =|| * || || || || || || * || ||
||= WebLicht webserver =|| * || * || || || || || || ||
||= VLO webserver =|| * || * || || || || || || ||
||= other webservers =|| * || * || || || || || || ||
||= Nagios servers crosscheck =|| * || || || || || || * || ||
||= Workspaces server =|| * || || * || || || || * || ||
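
The "http" and "query duration time" columns could be covered by a center-provided plugin along the following lines. This is only a minimal sketch, not an existing CLARIN plugin: the function names, the output prefix and the default thresholds (2 s warning, 5 s critical, 10 s timeout) are assumptions chosen for illustration. What it takes from the Nagios plugin developer guidelines is the exit code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the single status line with optional performance data after a `|`.

```python
"""Sketch of a Nagios/Icinga plugin: HTTP availability plus response time."""
import argparse
import time
import urllib.error
import urllib.request

# Exit codes defined by the Nagios plugin developer guidelines.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(elapsed, warn, crit):
    """Map a response time in seconds to a plugin status code."""
    if elapsed >= crit:
        return CRITICAL
    if elapsed >= warn:
        return WARNING
    return OK


def format_line(status, message, elapsed=None, warn=None, crit=None):
    """Build the one-line plugin output, with perfdata if a time is known."""
    line = "HTTP %s - %s" % (LABELS[status], message)
    if elapsed is not None:
        # Perfdata layout: label=value[unit];warn;crit;min
        line += " | time=%.3fs;%.3f;%.3f;0" % (elapsed, warn, crit)
    return line


def check_http(url, warn, crit, timeout):
    """Fetch the URL once and return (status code, output line)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()
            code = response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return CRITICAL, format_line(CRITICAL, "HTTP status %d" % exc.code)
    except OSError as exc:
        # Covers URLError, connection failures and timeouts.
        return CRITICAL, format_line(CRITICAL, "request failed: %s" % exc)
    elapsed = time.monotonic() - start
    status = classify(elapsed, warn, crit)
    message = "status %d in %.3fs" % (code, elapsed)
    return status, format_line(status, message, elapsed, warn, crit)


def main(argv=None):
    """Parse arguments, run the check, print the line, return the exit code."""
    parser = argparse.ArgumentParser(description="HTTP response time check")
    parser.add_argument("url")
    parser.add_argument("-w", "--warning", type=float, default=2.0)
    parser.add_argument("-c", "--critical", type=float, default=5.0)
    parser.add_argument("-t", "--timeout", type=float, default=10.0)
    try:
        args = parser.parse_args(argv)
    except SystemExit:
        # argparse would exit with 2 (CRITICAL); bad usage should be UNKNOWN.
        return UNKNOWN
    status, line = check_http(args.url, args.warning, args.critical, args.timeout)
    print(line)
    return status
```

To use this as an actual plugin, the file would end with `sys.exit(main())` (plus `import sys`) so that Icinga/Nagios sees the return value as the process exit code. The same skeleton could serve the "query func" column by replacing the plain fetch with a service-specific request, for example an SRU/CQL query against an FCS endpoint, and checking the response body.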