
A live service is both usable and reliable. Some effort has been put into ensuring that the service is maintainable and that enough people are available to maintain it. A service is not expected never to fail, but recovery should take no more than a couple of days. That means keeping regular backups and being able to restore from them.


All requirements for Beta are met, plus:

No reliance on any Beta infrastructure.

Automated backup.
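As an illustration only, one way to check that automated backups are actually being produced is to look at the age of the newest file in the backup directory. The sketch below assumes backups are written as files into a single directory and that a daily schedule is wanted; the function names and the 24-hour threshold are hypothetical and not part of this policy.

```python
import time
from pathlib import Path
from typing import Optional

# Assumption: backups are expected at least once a day.
MAX_BACKUP_AGE_SECONDS = 24 * 60 * 60


def newest_backup_age(backup_dir: str, now: Optional[float] = None) -> Optional[float]:
    """Return the age in seconds of the most recently modified file
    in backup_dir, or None if the directory contains no files."""
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in Path(backup_dir).iterdir() if p.is_file()]
    if not mtimes:
        return None
    return now - max(mtimes)


def backup_is_fresh(
    backup_dir: str,
    max_age: float = MAX_BACKUP_AGE_SECONDS,
    now: Optional[float] = None,
) -> bool:
    """True only if at least one backup exists and the newest is recent enough."""
    age = newest_backup_age(backup_dir, now)
    return age is not None and age <= max_age
```

A check like this could be run from a scheduled job and raise an alert when it returns False. It only shows that a backup file was written recently, not that the backup can be restored, so the restore instructions in the ops guide still need to be tested separately.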

A maintenance / ops guide on the wiki, with instructions that all the maintainers feel happy following, for:

  • service restart
  • upgrade
  • restore from a backup
  • remove
  • migrate an existing instance
  • deploy new instances
  • fix expected problems and events
  • carry out maintenance and rollback plans/processes

Any information (including keys/passwords) needed to access and administer the service must be provided to the board, to be stored using the key management tools.

Infrastructure must define how any functionality provided to another service can be cleanly removed if needed, what default levels of resources it will provide at each level, and how those limits will be enforced/monitored. This can include requiring information from the client. Anything too sensitive to be provided up front, e.g. passwords, can instead be escrowed with the board/membership.
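A rough sketch of what "default levels of resources at each level" and their monitoring might look like in practice is given below. Everything in it is an assumption made for illustration: the resource names, the numbers, and the `over_quota` helper are hypothetical, and only the level names Beta and Live come from this page.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Quota:
    disk_gb: float
    memory_mb: int


# Hypothetical defaults per service level; the figures are invented.
DEFAULT_QUOTAS: Dict[str, Quota] = {
    "beta": Quota(disk_gb=5, memory_mb=512),
    "live": Quota(disk_gb=20, memory_mb=2048),
}


def over_quota(level: str, used: Quota) -> List[str]:
    """Return the names of the resources where usage exceeds the
    default quota for the given level (empty list if within limits)."""
    limit = DEFAULT_QUOTAS[level]
    exceeded = []
    if used.disk_gb > limit.disk_gb:
        exceeded.append("disk_gb")
    if used.memory_mb > limit.memory_mb:
        exceeded.append("memory_mb")
    return exceeded
```

Writing the defaults down in one place like this makes it easier to state them in the ops guide and to monitor them with a periodic check. How a breach is handled (warning, throttling, removal) is left to the infrastructure's own documentation.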

An explanation of the service’s security and of how access to any particularly sensitive data is managed.