
Service status guidelines

Guidelines under construction!

This is a draft guideline for handling expected and unexpected downtime of A-services and other central or crucial CLARIN services.

Monitoring

Be aware that important CLARIN services are monitored in several ways: primarily with Icinga, but also with Uptime Robot and StatusCake, which post to a private Slack channel and thereby notify the CLARIN administrators.

Live and historical StatusCake data is available at status.clarin.eu.

Expected downtime report

Use the expected service downtime report form to announce planned downtime. The central development/admin team will then add the information to the CLARIN services status page.

Don't forget to notify the system administrators when your service is back up!

Unexpected downtime

{todo}

On-site maintenance notification

It is good practice to replace the content of your service page, front end, or portal with a page that states the current status and the expected timeframe of the activities causing the downtime. If you do so, make sure the page is returned with a 503 Service Unavailable response code, so that automated status checkers can recognise the current state of your service.
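As a rough illustration of the advice above, the following sketch serves a maintenance page with a 503 status for every GET and HEAD request, using only the Python standard library. It is not how any particular CLARIN service is set up: the page text, the port, the `make_server` helper, and the `Retry-After` header (a standard HTTP header giving clients a hint when to try again) are all assumptions added for the example. In practice the same effect is usually achieved in the web server or reverse proxy configuration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical values: adapt the message and the retry hint to your own
# maintenance window.
MAINTENANCE_HTML = (
    b"<html><body><h1>Down for maintenance</h1>"
    b"<p>We expect to be back within the hour.</p></body></html>"
)
RETRY_AFTER_SECONDS = 3600


class MaintenanceHandler(BaseHTTPRequestHandler):
    """Answer every GET/HEAD request with 503 and the maintenance page."""

    def _send_maintenance(self, include_body):
        # 503 tells status checkers the outage is temporary and intentional.
        self.send_response(503)
        # Optional hint (in seconds) for when clients may retry.
        self.send_header("Retry-After", str(RETRY_AFTER_SECONDS))
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(MAINTENANCE_HTML)))
        self.end_headers()
        if include_body:
            self.wfile.write(MAINTENANCE_HTML)

    def do_GET(self):
        self._send_maintenance(include_body=True)

    def do_HEAD(self):
        # HEAD responses carry the same headers but no body.
        self._send_maintenance(include_body=False)


def make_server(host="127.0.0.1", port=8080):
    """Create (but do not start) the maintenance server."""
    return HTTPServer((host, port), MaintenanceHandler)


if __name__ == "__main__":
    make_server().serve_forever()
```

Human visitors still see the explanatory page, while a checker that looks only at the status code sees 503 rather than a misleading 200 OK.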

Also see this post on how to handle downtime during site maintenance.

More information

Last modified on 11/16/16 09:44:23