On Monday 26th of November, QRIScloud staff will be performing essential hardware maintenance on CN17 to replace a faulty RAID controller. It was a fault in this controller that caused an outage last week for all CN17 instances. (The RAID controller is the interface between the compute node's CPUs and the local disk drives that hold root and ephemeral file systems.)
It may be necessary to power off CN18 (the "sister blade" for CN17) while maintenance is being done on CN17. This will depend on the state of the chassis that houses the two blades.
The plan is to perform an orderly shutdown of all instances on CN17 (and possibly CN18). Once the maintenance has been completed, we will restart all instances that were running before the shutdown. We are allowing an outage window of 2 hours, starting at 11:00, for this work.
- There is no particular risk to the Nectar instances on these nodes. However (as always) we recommend that you check that your backups are up to date.
- If your instances have specific requirements for shutdown and restart, we recommend that you shut them down manually before 11:00 and restart them manually once the outage has been completed.
Contacts for all instances on CN17 and CN18 should receive a notification about this outage.
2018-11-26 11:05 - shutdowns of CN17 instances commenced. CN18 instances are still running at this stage.
2018-11-26 11:35 - hardware work and RAID reconfiguration complete. All CN17 instances have now been restarted. If you see any problems, please raise a support ticket.