Outage: Awoonga, FlashLite, and Tinaroo 8pm June 25 2021 - 9am June 28 2021

The UQ RCC Infrastructure team has obtained approval to replace the old scratch disk array with the new scratch disk array on Saturday the 26th of June. The old scratch disk array provides /home, /30days, /90days, /sw, /sw7 and /groups to Awoonga, Tinaroo and FlashLite. The new scratch disk array will provide /home, /sw and /scratch. This RCC HPC Alert outlines the general timeline for the planned outage, along with some general notes. An RCC HPC Alert with more details will be sent in the week of 7th June.

The hardware that provides the old /30days, /90days, /sw7 and /groups will be switched off at 8pm on Friday 25th June. This is a hard deadline. After this time there will be no access to any of the clusters or user data in /home, /30days, /90days and /groups. Login nodes and queues for Awoonga, Tinaroo and FlashLite will be available again after 9am on Monday the 28th June.

After the replacement of the old scratch disk array, all users will have an allocation of 150GB in /scratch. All users will be required to move their data from /30days and /90days to /scratch. To reiterate the message from previous notifications: users are strongly advised to save as much data as possible from /30days and /90days now, for example to their Q collection(s) under /QRISdata or /RDS. Moving the data after the replacement will be very slow, because the old array will be connected differently and with reduced capacity compared to now. It should also be noted that the old scratch disk array is failing and becoming unreliable. The old scratch will remain available for a short time so that users can move their data to the new scratch, but if the old array fails during this time it will not be brought back.
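As a rough aid for planning the move, the following Python sketch adds up the size of the files under a directory and compares the total with the 150GB allocation. It is an illustration only and not an RCC-provided tool: the function names are made up for this example, and it assumes that 150GB means 150 * 10**9 bytes, which this alert does not specify.

```python
import os
import stat

# Assumption: 150GB is taken as 150 * 10**9 bytes. The alert does not say
# whether the quota is counted in units of 10**9 or 2**30 bytes.
QUOTA_BYTES = 150 * 10**9


def directory_size_bytes(root):
    """Sum the apparent sizes of regular files under root (symlinks are not followed)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)
            except OSError:
                # Skip files that vanish or cannot be read while scanning.
                continue
            if stat.S_ISREG(info.st_mode):
                total += info.st_size
    return total


def fits_in_quota(size_bytes, quota_bytes=QUOTA_BYTES):
    """Return True if size_bytes does not exceed the assumed quota."""
    return size_bytes <= quota_bytes
```

For example, `fits_in_quota(directory_size_bytes("/30days/your_username"))` would indicate whether that directory could fit in the default allocation, where `/30days/your_username` is a placeholder path. The quota accounting on the new array may differ from this simple byte count, so treat the result as an estimate.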

As mentioned above and in previous alerts, the new quota in /scratch is smaller than quotas in /30days and /90days. Users who require a larger allocation will need to apply for a project space in /scratch via this form: https://forms.office.com/r/nKiVRrEiyE

Users with current space in /groups and /30days/GROUPS will also need to apply for project space in /scratch via the above form.

In the lead-up to the replacement, the maximum walltime limits currently available for jobs will be gradually reduced so that jobs do not run into the outage window. This avoids the need to kill running jobs for the outage. However, any job still running at 8pm on Friday 25th June will be killed. All queued jobs will also be deleted from the queues. Users will have to submit new jobs when the clusters are available again.
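To illustrate the arithmetic behind the walltime reduction, the Python sketch below works out the longest walltime a job could request and still finish before the 8pm cut-off. It is a hypothetical helper, not part of the cluster software: it assumes the job starts running at the given time (time spent waiting in the queue is not accounted for), uses an arbitrary 30 minute safety margin, and formats the result as an HH:MM:SS string.

```python
from datetime import datetime, timedelta

# 8pm on Friday 25th June 2021, in the clusters' local time (naive datetime).
OUTAGE_START = datetime(2021, 6, 25, 20, 0)


def max_walltime(start, outage_start=OUTAGE_START, margin=timedelta(minutes=30)):
    """Longest run time for a job starting at `start` that ends `margin` before the outage."""
    remaining = outage_start - start - margin
    # Never return a negative duration: past the cut-off there is no time left.
    return max(remaining, timedelta(0))


def format_walltime(delta):
    """Format a timedelta as HH:MM:SS, with hours allowed to exceed 24."""
    total_seconds = int(delta.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"
```

For a job starting at 8am on Thursday 24th June, `format_walltime(max_walltime(datetime(2021, 6, 24, 8, 0)))` gives `35:30:00`. The limits actually enforced on the queues are set by RCC and may be lower than this figure.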

More details on the new scratch and project scratch space will be sent out in the week of 7th June, together with the time frame during which the old scratch will remain available for users to move their remaining data from recent calculations to the new scratch.

Details on what users need to change to submit jobs again after the replacement will be sent around mid-June.
