== server + services monitoring ==
 * nagios: http://www.nagios.org/
 * icinga (nagios fork, same format for plugins etc): https://www.icinga.org/
 * clients/remote servers will need to have the Nagios NRPE package installed (only for local checks like disk space, memory etc) and port 5666 open

== AAI monitoring ==
 * AAI eye: http://www.csc.fi/english/institutions/haka/instructions/services-tech/aaieye
 * RAPTOR: http://iam.cf.ac.uk/trac/RAPTOR
 * IDS: https://trac.clarin.eu/browser/monitoring/plugins/ids

== MPI services monitoring ==
 * see: https://trac.clarin.eu/browser/monitoring/plugins/mpi (includes SRU/CQL and OAI-PMH probe)

== CLARIN-D monitoring requirements ==
The monitoring requirements of CLARIN-D are pretty modest; a simple icinga or nagios installation would be sufficient to fulfill all the needs:
 * Regular checks whether hosts are up and reachable (ping).
 * Regular checks whether certain network services are working (e.g. http).
 * Each center can provide [http://nagiosplug.sourceforge.net/developer-guidelines.html nagios plugins] to assess the state of that center's services.
 * Each center can register one or more contact persons. They will get an email with a warning if a host or service is not working correctly.
 * The server running the monitoring software is also monitored itself.
 * Users (not only the center administrators) should be able to see the status of each center and its service(s) on a website.
 * For access to the web interface of icinga/nagios, authentication & authorization via Shibboleth would be nice.
 * As visualisation, a map of Germany under http://de.clarin.eu/status:
   * with traffic light alarm indication (?), maybe [http://www.laendercheck-wissenschaft.de/archiv/privater_hochschulsektor/status_quo/status_quo_deutschlandkarte.jpg like this] (?)
   * with graphs to see how long or how often services have been unavailable in the past? Via a link to nagios?

||= Service Types / Tests =||= ping =||= http =||= disk space =||= load =||= free mem =||= users =||= functional check =||= query duration time =||
||= AAI Service Providers (SP) =|| # || # || || || || || #(IDS probe?) || ||
||= AAI Identity Providers (IdP) =|| # || # || || * || * || || #(IDS probe?) || ||
||= AAI Where are you From (WAYF) =|| # || # || || || || || #(MPI discojuice probe?) || ||
||= REST-Webservices (WebLicht) =|| # || || || || || || # || ||
||= Federated Content Search Endpoints (SRU/CQL) =|| # || # || || || || || #(MPI probe?) || ||
||= Federated Content Search Aggregator =|| # || # || || || || || # || ||
||= Repositories =|| # || # || * || || || || #(test for [http://localhost:8080/fedora/objects/fedora-system:FedoraObject-3.0 content model]?) || ||
||= OAI-PMH Gateway =|| # || || || || || || #(MPI probe?) || ||
||= Handle Servers =|| # || || || || || || #(EUDAT/Jülich probe?) || # ||
||= resolve a sample PID for each repository =|| || || || || || || # || # ||
||= Center Registry =|| # || || || || || || # || ||
||= WebLicht webserver =|| # || # || || || || || || ||
||= VLO webserver =|| # || # || || || || || || ||
||= TLA webserver =|| # || # || || || || || || ||
||= other webservers =|| # || # || || || || || || ||
||= Nagios servers (selfcheck) =|| # || # || || || || || #(check_nagios plugin) || ||
||= Nagios servers crosscheck (from other center) =|| # || || || || || || #(check_nagios plugin) || ||
||= Workspaces server =|| # || || # || || || || # || ||

Legend: # mandatory; * optional/useful
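
The "functional check" and "query duration time" columns could be covered by one small plugin per service. Below is a minimal sketch in Python of how such a plugin might look, e.g. for resolving a sample PID. It follows the exit codes and the one-line output format from the nagios plugin guidelines linked above. The file name `check_pid.py`, the service label `PID` and the thresholds of 2 and 5 seconds are assumptions made for illustration only, not agreed CLARIN-D values.

```python
#!/usr/bin/env python3
"""Sketch of a nagios/icinga plugin: functional check plus query duration."""
import http.client
import time
import urllib.request

# Exit codes defined by the nagios plugin guidelines.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(success, seconds, warn=2.0, crit=5.0):
    """Map a probe result to a status code.

    warn and crit are illustrative thresholds in seconds (assumption).
    """
    if not success or seconds >= crit:
        return CRITICAL
    if seconds >= warn:
        return WARNING
    return OK


def format_output(service, status, seconds, warn=2.0, crit=5.0):
    """Build the status line; performance data follows the pipe character."""
    return "%s %s - answered in %.3fs | time=%.3fs;%.1f;%.1f" % (
        service, LABELS[status], seconds, seconds, warn, crit)


def probe(url, timeout=10.0):
    """Fetch url and return (success, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            success = 200 <= response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # HTTP errors, refused connections, timeouts and malformed URLs
        success = False
    return success, time.monotonic() - start


def main(argv):
    """Run the probe for argv[1] and return the plugin exit code."""
    if len(argv) != 2:
        print("PID UNKNOWN - usage: check_pid.py <url>")
        return UNKNOWN
    success, seconds = probe(argv[1])
    status = classify(success, seconds)
    print(format_output("PID", status, seconds))
    return status

# A real plugin file would end with:
#   import sys; sys.exit(main(sys.argv))
```

Keeping `classify` separate from `probe` means the threshold logic can be tested without network access, and the same mapping could be reused for other rows of the table (SRU/CQL, OAI-PMH) by swapping in a different probe function.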