Advice: Euramoo running with reduced capacity: 2016-05-30 ... resolved

The Euramoo cluster is currently running with reduced capacity so that critical patches can be applied to the hypervisors that host the compute node VMs.

New jobs destined for "amd" nodes are currently being held in the PBS queues so that work already running on those nodes can "drain" before patching. As the hypervisors are patched, the compute node VMs will be restarted progressively and queued jobs will be scheduled.

We apologize for the delays in processing your jobs.

UPDATE - 2016-05-31 - 09:00 - Jobs on "intel" and "biolinux" nodes are now draining as well. Some of the "amd" capacity is being restored to service.

UPDATE - 2016-06-02 - 09:00 - Some capacity on the updated nodes is now available. We are still waiting for a number of long-running (two-week) jobs to finish before we can update the rest of the nodes.

We have also changed how the job queues are organized. The old "SharedMemory" and "SingleCPU" queues are being replaced by separate "Intel", "AMD" and "BioLinux" queues. A new "LongWallTime" queue has been introduced for jobs with a walltime greater than one week.

UPDATE - 2016-06-03 - 12:25 - We now have 29 jobs running from the new queues. Unfortunately, 88 long-running jobs on the old queues are still holding up the process. Given the limited capacity and the size of the backlog, it will be some time before "interactive" jobs can be scheduled.

UPDATE - 2016-06-03 - 15:55 - There are now 45 jobs running from the new queues.

UPDATE - 2016-06-06 - 09:55 - There are now 74 jobs running from the new queues. Note that the LongWallTime queue is stopped, so jobs with a walltime greater than one week are not being scheduled.

UPDATE - 2016-06-07 - 13:15 - There are now 87 jobs running from the new queues.

UPDATE - 2016-06-08 - 09:20 - There are now 154 jobs running from the new queues. Unfortunately, interactive jobs are not scheduling properly at the moment.

UPDATE - 2016-06-08 - 16:10 - There are now 199 jobs running from the new queues.

UPDATE - 2016-06-08 - 16:35 - Here is a temporary workaround for the interactive job outage. 

UPDATE - 2016-06-13 - 16:15 - Getting close to done.  There are now 433 jobs running, and the LongWallTime queue has been started.

UPDATE - 2016-07-01 - 13:10 - Interactive jobs are now working normally.
