An apparent hardware problem on compute node CN37 caused it to spontaneously reboot. While the system appears to be working at the moment, the event has left it unresponsive to control via the remote console. We have just been advised by the hardware vendor that the node needs to be physically power cycled urgently. We will be doing this at 10 am tomorrow (Thursday 23rd November).
Starting at 10 am, we will perform an orderly shutdown of all running instances on the compute node, and then power cycle the hardware. Once the node is working properly, the instances that were shut down will be restarted. We anticipate that the entire process will take less than one hour.
We do not advise any special precautions. However:
- If you have not backed up your instance recently, we recommend that you do so before the outage.
- If your instance has special requirements for shutdown and restart, we advise that you perform a manual shutdown before the outage window, and then restart manually afterwards.
- If you have users who depend on an instance on this compute node, you may want to warn them.
We apologize for the short notice for this outage.
Tenant managers and members of the tenants with affected instances have been emailed.
UPDATE 2017-11-23 12:15 - The reboot happened as scheduled without problems. Unfortunately, power cycling the node did not fix the underlying problem. We are awaiting vendor advice on the next step to take.