The following known operational issues relate to systems that QCIF or UQ RCC support.
QRIScloud NeCTAR issues:
- The QRIScloud availability zone (AZ) is typically full, and the ability to launch new instances is often curtailed. Users attempting to launch new instances in QRIScloud are likely to see a "No available host" error message. If you see this message, we recommend that you try to launch in a different AZ. If you need to launch in QRIScloud in order to access storage (for example) and are having difficulty, please contact QRIScloud Support.
- QRIScloud does not provide Volume Storage Backup. The Dashboard options for doing this will not work for volumes in the QRIScloud AZ.
Login / data transfer issues:
These issues apply to the QRISdata Collection access services, Euramoo, Flashlite and Tinaroo:
- Each of these systems is accessed via an "haproxy" load balancer that balances the load across 2 or more login or data transfer nodes:
- The load balancers employ connection rate limiting. If there are too many simultaneous connections or connection attempts from the same system, connections can be closed abruptly with this message:
ssh_exchange_identification: read: Connection reset by peer
- The data transfer rate through the load balancers is limited by hardware.
- For bulk data transfers, it is advisable to connect to the login nodes / data transfer nodes directly to avoid these problems. This can also be used as a work-around if you are experiencing problems with login sessions.
| Service | Load balancer | Login / transfer nodes |
|---|---|---|
| Data access | data.qriscloud.org.au | ssh1.qriscloud.org.au |
| Flashlite | flashlite.rcc.uq.edu.au | flashlite1.rcc.uq.edu.au |
| Tinaroo | tinaroo.rcc.uq.edu.au | tinaroo1.rcc.uq.edu.au |
- After a small number (typically 3) of failed login attempts, further attempts at login will be refused, even if you get the account and password correct.
- This "banning" typically lasts for 10 minutes.
- If you use SSH keys to log in, and your SSH client offers multiple SSH keypairs, each "offer" that is not accepted is counted by "fail2ban" as a failed login attempt. If you have lots of keypairs in (for example) your "~/.ssh" directory, you can be banned before your SSH client offers the right keypair. The solution is to use the "-i" option to specify the key to be used.
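As an illustration of the "-i" workaround, here is a minimal Python sketch that assembles an ssh command line offering one named key. The user name and key file are hypothetical. The "IdentitiesOnly=yes" option is an addition that the text above does not mention: it is a standard OpenSSH client option that should stop the client from also offering keys held by an agent, but treat its use here as an assumption and check it against your own client.

```python
import shlex
from pathlib import Path


def build_ssh_command(user, host, key_path):
    """Return an ssh argument list that offers a single, named private key."""
    key = str(Path(key_path).expanduser())
    return [
        "ssh",
        "-i", key,                   # the key to use, as recommended above
        "-o", "IdentitiesOnly=yes",  # assumption: do not offer agent or default keys as well
        f"{user}@{host}",
    ]


if __name__ == "__main__":
    # Hypothetical user name and key file; the host name is taken from the table above.
    cmd = build_ssh_command("jbloggs", "tinaroo1.rcc.uq.edu.au", "~/.ssh/id_tinaroo")
    print(shlex.join(cmd))
```

The same two settings could instead be placed in a per-host entry in "~/.ssh/config" so that they apply to every connection to that host.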
- Aspera is no longer available for QRIScloud collections on the GPFS service.
- Aspera is due to be decommissioned for other types of collection at the end of 2017.
- PBS file stage-in and stage-out does not work, and is liable to CRASH the PBS system. Please do not attempt to use it.
- There is a bug in the way that the "qsub" command handles memory resource requests. If you specify a "vmem" resource without a "mem" resource, the qsub filter is supposed to set "mem" to the "vmem" value. For interactive jobs this does not happen, and "mem" is left at its default value ("1GB"). The workaround is to specify both "mem" and "vmem" for interactive jobs.
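The workaround can be sketched as a small Python helper that always emits both "mem" and "vmem" for an interactive job, copying "vmem" into "mem" when only "vmem" is given. The "select=1:ncpus=...:mem=...:vmem=..." resource syntax is an assumption based on PBS Professional conventions, and the helper name is hypothetical; check the resource syntax documented for the cluster you are using.

```python
import shlex


def interactive_qsub_args(vmem, mem=None, ncpus=1):
    """Build a qsub argument list for an interactive job with mem and vmem both set."""
    # Do by hand what the qsub filter is supposed to do: default mem to vmem.
    mem = mem or vmem
    # Assumed PBS Professional style select statement (not confirmed by the text above).
    resources = f"select=1:ncpus={ncpus}:mem={mem}:vmem={vmem}"
    return ["qsub", "-I", "-l", resources]


if __name__ == "__main__":
    print(shlex.join(interactive_qsub_args("4GB")))
```

Calling the helper with only a "vmem" value of "4GB" yields a command that requests 4GB for both resources, so the job does not fall back to the 1GB default for "mem".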
- Users are strongly discouraged from submitting jobs that compute directly against QRIScloud collections, or from using collections for temporary or scratch storage. If a job needs to read or write files in a collection, the files should be staged in at the beginning of the job and staged out at the end, preferably as TAR or ZIP archives if there are lots of small files.
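The staging pattern described above might look like the following Python sketch, which uses the standard "tarfile" module so that the collection sees one large archive rather than many small files. All directory and file names are hypothetical, and temporary directories stand in for the collection and the scratch area. Only unpack archives that you created yourself.

```python
import tarfile
import tempfile
from pathlib import Path


def stage_in(archive, scratch_dir):
    """Unpack one archive (e.g. from a collection) into local scratch space."""
    scratch_dir = Path(scratch_dir)
    scratch_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:*") as tar:  # "r:*" auto-detects compression
        tar.extractall(scratch_dir)
    return scratch_dir


def stage_out(results_dir, archive):
    """Pack a results directory into a single gzip-compressed archive."""
    results_dir = Path(results_dir)
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(results_dir, arcname=results_dir.name)
    return Path(archive)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        collection = root / "collection"  # stand-in for a collection directory
        scratch = root / "scratch"        # stand-in for job-local scratch space
        collection.mkdir()

        # Prepare a small input archive so the demo is self-contained.
        inputs = root / "inputs"
        inputs.mkdir()
        (inputs / "sample.txt").write_text("example data\n")
        stage_out(inputs, collection / "inputs.tar.gz")

        # Start of job: one read from the collection.
        stage_in(collection / "inputs.tar.gz", scratch)

        # The job itself works only in scratch space.
        results = scratch / "results"
        results.mkdir()
        text = (scratch / "inputs" / "sample.txt").read_text()
        (results / "summary.txt").write_text(text.upper())

        # End of job: one write back to the collection.
        print(stage_out(results, collection / "results.tar.gz"))
```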
- Euramoo no longer exists:
- Euramoo users have been automatically granted an Awoonga account.
- User home directories from Euramoo have been made available on Awoonga.
- For more information, please refer to the Euramoo transition document.
Please also refer to the RCC Active Incidents page for current Awoonga, Flashlite, Tinaroo & Barrine incidents.