== server + services monitoring ==
 * nagios: http://www.nagios.org/
 * icinga (nagios fork, same format for plugins etc.): https://www.icinga.org/
 * clients/remote servers need the Nagios NRPE package (only for local checks such as disk space, memory etc.) and port 5666 open

== AAI monitoring ==
 * AAI eye: http://www.csc.fi/english/institutions/haka/instructions/services-tech/aaieye
 * RAPTOR: http://iam.cf.ac.uk/trac/RAPTOR
 * IDS: https://trac.clarin.eu/browser/monitoring/plugins/ids

== MPI services monitoring ==
 * see: https://trac.clarin.eu/browser/monitoring/plugins/mpi (includes SRU/CQL and OAI-PMH probes)

== CLARIN-D monitoring requirements ==
The monitoring requirements of CLARIN-D are modest; a simple icinga or nagios installation would be sufficient to fulfil all the needs:
 * Regular checks whether hosts are up and reachable (ping).
 * Regular checks whether certain network services are working (e.g. http).
 * Each center can provide [http://nagiosplug.sourceforge.net/developer-guidelines.html nagios plugins] to assess the state of that center's services.
 * Each center can register one or more contact persons. They will receive a warning email if a host or service is not working correctly.
 * The server running the monitoring software is itself monitored.
 * Users (not only the center administrators) should be able to see the status of each center and its service(s) on a website.
 * For access to the web interface of icinga/nagios, authentication & authorization via Shibboleth would be nice.
 * (IP) addresses for external probes should be delivered via the [https://centerregistry-clarin.esc.rzg.mpg.de/ Center Registry].
 * As visualisation: a [http://www.clarin-d.de/images/karte.png map of Germany] at http://de.clarin.eu/status, currently [http://clarin-d.de/de/aktuelles/status-infrastruktur.html here] (for Joomla users), via the NagVis plugin - with traffic-light alarm indication (?)
   maybe [http://www.laendercheck-wissenschaft.de/archiv/privater_hochschulsektor/status_quo/status_quo_deutschlandkarte.jpg like this] (?) - with graphs to see how long or how often services have been unavailable in the past? Via link to nagios?

|| Service Types / Tests ||= ping =||= http =||= disk space =||= load =||= free mem =||= users =||= functional check =||= query duration time =||
||= AAI Service Providers (SP) =|| # || # || || || || || #([https://trac.clarin.eu/browser/monitoring/plugins/ids IDS probe]?) || ||
||= AAI Identity Providers (IdP) =|| # || # || || * || * || || #([https://trac.clarin.eu/browser/monitoring/plugins/ids IDS probe]?) || ||
||= AAI Where are you From (WAYF) =|| # || # || || || || || #([https://trac.clarin.eu/browser/monitoring/plugins/mpi MPI discojuice probe]?) || ||
||= REST-Webservices (WebLicht) =|| # || || || || || || # || ||
||= Federated Content Search Endpoints (SRU/CQL) =|| # || # || || || || || #([https://trac.clarin.eu/browser/monitoring/plugins/mpi MPI probe]?) || ||
||= Federated Content Search Aggregator =|| # || # || || || || || # || ||
||= Repositories =|| # || # || * || || || || #(test for a [http://localhost:8080/fedora/objects/fedora-system:FedoraObject-3.0 fedora content model]?) || ||
||= OAI-PMH Gateway =|| # || || || || || || #([https://trac.clarin.eu/browser/monitoring/plugins/mpi MPI probe]?) || ||
||= Handle Servers =|| # || || || || || || #(EUDAT/Jülich probe?) || #(Eric's timeout [https://svn.clarin.eu/monitoring/plugins/mpi/HandleSystem/ probe]) ||
||= resolve a sample PID for each repository =|| || || || || || || # || # ||
||= Center Registry =|| # || || || || || || # || ||
||= WebLicht webserver =|| # || # || || || || || || ||
||= VLO webserver =|| # || # || || || || || || ||
||= TLA webserver =|| # || # || || || || || || ||
||= other webservers =|| # || # || || || || || || ||
||= Nagios servers (selfcheck) =|| # || # || || || || || #(check_nagios plugin) || ||
||= Nagios servers crosscheck (from other center) =|| # || || || || || || #(check_nagios plugin) || ||
||= Workspaces server (not yet) =|| n.a. || || n.a. || || || || n.a. || ||

# mandatory; * optional

== center requirements ==

== center registry requirements ==
add server data for:
 * Shibboleth Identity Providers (IdP)
 * Shibboleth Service Providers (SP) (multiple per center)
 * Shibboleth Where are You From servers (WAYF, currently only one available?)
 * REST Webservices (multiple per center)
 * Federated content search endpoints (multiple per center)
 * Repositories
 * Handle servers (in case centers have their own, like IDS?)
 * sample PID URLs per center
 * nagios servers (if available per center)
and a list of diverse webservers (VLO, TLA, WebLicht)

== nagios server requirements ==
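
As an illustration of what the "functional check" and "query duration time" columns in the table above could mean for a plugin run by the nagios server, here is a minimal sketch in Python. It is not one of the CLARIN probes linked above. The function names `evaluate` and `check_url` and the default thresholds (2 s warning, 5 s critical) are assumptions made for this example. Only the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the performance data layout after the `|` follow the nagios plugin guidelines.

```python
"""Sketch of a nagios/icinga plugin: functional HTTP check plus query duration."""
import time
import urllib.request

# Exit codes defined by the nagios plugin guidelines.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(duration, warn, crit):
    """Map a measured duration (seconds) to an exit code and a status line."""
    if duration >= crit:
        code = CRITICAL
    elif duration >= warn:
        code = WARNING
    else:
        code = OK
    # Everything after '|' is performance data: label=value[UOM];warn;crit
    line = "HTTP %s - response in %.3fs | time=%.3fs;%.3f;%.3f" % (
        LABELS[code], duration, duration, warn, crit)
    return code, line


def check_url(url, warn=2.0, crit=5.0, timeout=10.0):
    """Fetch url once, time the request, and return (exit_code, status_line).

    The thresholds are hypothetical defaults, not CLARIN-D requirements.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()
    except Exception as exc:  # malformed URL, DNS failure, timeout, HTTP error
        return CRITICAL, "HTTP CRITICAL - %s" % exc
    return evaluate(time.monotonic() - start, warn, crit)


# Demonstration without network access: a 0.25 s response is below both thresholds.
demo_code, demo_line = evaluate(0.25, 2.0, 5.0)
print(demo_line)  # HTTP OK - response in 0.250s | time=0.250s;2.000;5.000
```

To use this as an actual plugin, a small wrapper would call `check_url` with the URL passed on the command line, print the status line and exit with the returned code, since nagios reads the state from the process exit status.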