The following known operational issues relate to systems that QCIF or UQ RCC support.
QRIScloud NeCTAR issues:
- The QRIScloud availability zone (AZ) is typically full, and the ability to launch new instances is often curtailed. Users attempting to launch new instances in QRIScloud are likely to see a "No available host" error message. If you see this message, we recommend that you try to launch in a different AZ. If you need to launch in QRIScloud in order to access storage (for example) and are having difficulty, please contact QRIScloud Support.
- QRIScloud does not provide Volume Storage Backup. The Dashboard options for doing this will not work for volumes in the QRIScloud AZ. (This is by design.)
Login / data transfer issues:
These issues apply to the QRISdata Collection access services, Awoonga, Flashlite and Tinaroo:
- Each of these systems is accessed via an "haproxy" load balancer that balances the load across 2 or more login or data transfer nodes:
- The load balancers employ connection rate limiting. If there are too many simultaneous connections or connection attempts from the same system, connections can be closed abruptly with this message:
ssh_exchange_identification: read: Connection reset by peer
- The data transfer rate through the load balancers is limited by hardware.
- For bulk data transfers, it is advisable to connect to the login nodes / data transfer nodes directly to avoid these problems. This can also be used as a work-around if you are experiencing problems with login sessions.
| Service | Load balancer | Login / transfer nodes |
| --- | --- | --- |
| Data access | data.qriscloud.org.au | ssh1.qriscloud.org.au, ssh2.qriscloud.org.au |
| Awoonga | awoonga.qriscloud.org.au | awoonga1.qriscloud.org.au |
| Flashlite | flashlite.rcc.uq.edu.au | flashlite1.rcc.uq.edu.au, flashlite2.rcc.uq.edu.au |
| Tinaroo | tinaroo.rcc.uq.edu.au | tinaroo1.rcc.uq.edu.au, tinaroo2.rcc.uq.edu.au |

- The login / data transfer nodes are all configured with "fail2ban" to protect against password guessing.
- After a small number (typically 3) of failed login attempts, further attempts at login will be refused, even if you get the account and password correct.
- This "banning" typically lasts for 10 minutes.
- If you use SSH keys to log in, and your SSH client offers multiple SSH keypairs, each "offer" that is not accepted is counted by "fail2ban" as a failed login attempt. If you have lots of keypairs in (for example) your "~/.ssh" directory, you can be banned before your SSH client offers the right keypair. The solution is to use the "-i" option to specify the key to be used.
- All of the above systems should accept either a QSAC account name and password or (for UQ users only) a UQ account name and password. At times, one or the other may stop working. In such cases, we recommend that you try the other approach as a work-around.
- Mediaflux and Nextcloud require either AAF login or account name / password login using a QSAC.
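The two work-arounds above (connecting to a login / data transfer node directly, and naming one key with "-i") can be combined. The following is a minimal sketch only: the user name, key path and file paths are hypothetical, and the "IdentitiesOnly=yes" option is an addition that is not mentioned above. It is a standard OpenSSH option that stops the client offering keys other than the one named, which should help to avoid the "fail2ban" problem when an SSH agent holds many keys.

```python
import os
import shlex

# Load balancer -> direct login / data transfer nodes, copied from the table above.
DIRECT_NODES = {
    "data.qriscloud.org.au": ["ssh1.qriscloud.org.au", "ssh2.qriscloud.org.au"],
    "awoonga.qriscloud.org.au": ["awoonga1.qriscloud.org.au"],
    "flashlite.rcc.uq.edu.au": ["flashlite1.rcc.uq.edu.au", "flashlite2.rcc.uq.edu.au"],
    "tinaroo.rcc.uq.edu.au": ["tinaroo1.rcc.uq.edu.au", "tinaroo2.rcc.uq.edu.au"],
}


def direct_node(load_balancer, index=0):
    """Return one of the direct nodes behind the given load balancer."""
    nodes = DIRECT_NODES[load_balancer]
    return nodes[index % len(nodes)]


def ssh_command(user, load_balancer, key_path):
    """Build an ssh command that bypasses the load balancer and offers one key."""
    return [
        "ssh",
        "-i", os.path.expanduser(key_path),  # the "-i" option recommended above
        "-o", "IdentitiesOnly=yes",          # assumption: offer only the key named by -i
        f"{user}@{direct_node(load_balancer)}",
    ]


def rsync_command(user, load_balancer, key_path, source, destination):
    """Build an rsync command for a bulk transfer over the same direct connection."""
    ssh_part = shlex.join(ssh_command(user, load_balancer, key_path)[:-1])
    target = f"{user}@{direct_node(load_balancer)}:{destination}"
    return ["rsync", "-av", "-e", ssh_part, source, target]


if __name__ == "__main__":
    # Hypothetical user name, key file and paths, for illustration only.
    print(shlex.join(ssh_command("jbloggs", "tinaroo.rcc.uq.edu.au", "~/.ssh/id_qcif")))
    print(shlex.join(rsync_command("jbloggs", "data.qriscloud.org.au",
                                   "~/.ssh/id_qcif", "results/", "/path/on/server/")))
```

The functions only build and print the command lines, so you can inspect them before running anything against a real system.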
Nextcloud:
- There is a file size limit of 10GB for uploading individual files via the Nextcloud web interface.
- There is no file size limit for file download.
- The Nextcloud Sync client is not affected by this limit.
- You can use the SSH access services to upload a larger file to a QRIScloud collection, but this option is not available for Nextcloud personal shares.
- If you reset your QSAC, this will break any Nextcloud Sync that you have configured. You will need to change the saved password in your Nextcloud Sync client configuration.
Aspera:
The QRIScloud Aspera service was decommissioned in December 2017.
HPC issues:
- PBS file stage-in and stage-out does not work, and is liable to CRASH the PBS system. Please do not attempt to use it.
- PBS jobs cannot be submitted from compute nodes. One job "qsub"-ing another job will not work.
- Users are strongly discouraged from submitting jobs that compute directly against QRIScloud collections, or using collections for temporary or scratch storage. If a job needs to read or write files in a collection, the files should be staged in or out at the beginning / end of the job, preferably as TAR or ZIP archives if there are lots of small files.
- Euramoo no longer exists:
- Euramoo users have been automatically granted an Awoonga account.
- User home directories from Euramoo have been made available on Awoonga.
- For more information, please refer to the Euramoo transition document.
- Much of the RCC HPC user documentation can only be accessed via the UQ campus network.
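The advice above about staging files as archives can be sketched as follows. This is an illustration under stated assumptions rather than a supported tool: the function name and directory layout are hypothetical, and the copy is done by the job itself at the end of its run, not by the PBS stage-out feature (which must not be used). The results are packed into a single TAR archive on local or scratch storage, so that the collection sees one large sequential write instead of many small ones.

```python
import shutil
import tarfile
from pathlib import Path


def stage_out(result_dir, collection_dir):
    """Pack result_dir into one .tar.gz beside it, then copy it to collection_dir.

    Returns the path of the archive inside the collection directory.
    """
    src = Path(result_dir)
    # Build the archive next to the results (for example on scratch storage),
    # not inside the collection.
    local_archive = src.parent / f"{src.name}.tar.gz"
    with tarfile.open(local_archive, "w:gz") as tar:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # Store names relative to the parent, e.g. "results/a.txt".
                tar.add(path, arcname=path.relative_to(src.parent).as_posix())
    # A single copy of one file into the collection.
    return Path(shutil.copy2(local_archive, collection_dir))
```

Stage-in is the reverse: copy the one archive from the collection to scratch storage at the start of the job and unpack it there before computing.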
Please also refer to the RCC Active Incidents page for current Awoonga, Flashlite, & Tinaroo incidents.