On Tuesday, December 5th, the hardware vendor will again attempt to fix a hardware problem with compute node CN37 that is preventing console access. The previous attempt (on December 1st) failed because the replacement motherboard for the node was "dead on arrival". We have scheduled an outage window from 09:00 to 13:00 for this essential maintenance.
Starting at 09:00, all running instances on the compute node will be shut down in an orderly fashion. Once the hardware problem has been fixed, all instances that were shut down will be restarted.
We do not advise any special precautions. However:
- If you have not backed up your instance recently, you should address this oversight.
- If your instance has special requirements for shutdown and restart, we advise that you perform a manual shutdown before the outage window, and then restart manually afterwards.
- If you have users who depend on your instance, you may want to warn them of the outage.
Tenant managers and members of the instances affected by the outage will be notified by email.
UPDATE - 2017-12-05 - 12:25 - The outage is complete and instances have been restarted. If you experience problems connecting to your instance, we recommend that you Hard Reboot it from the NeCTAR Dashboard. If that fails, please raise a support ticket.
UPDATE - 2017-12-05 - 13:55 - CN37 has suffered another (different) hardware failure which makes the compute node unusable - all instances on the node are offline. The hardware vendor is being contacted.
UPDATE - 2017-12-05 - 16:45 - The vendor is sending hardware spare parts for CN37. These will be delivered tomorrow or Thursday, and the vendor will work on CN37 on Friday.
UPDATE - 2017-12-07 - 15:05 - The spare parts have arrived and the vendor will be visiting the data center tomorrow to install them.