QRIScloud Known Issues

The following known operational issues relate to systems that QCIF or UQ RCC support.

QRIScloud NeCTAR issues:

  1. The QRIScloud availability zone (AZ) is typically full, and the ability to launch new instances is often curtailed.  Users attempting to launch new instances in QRIScloud are likely to see a "No available host" error message. If you see this message, we recommend that you try to launch in a different AZ.  If you need to launch in QRIScloud in order to access storage (for example) and are having difficulty, please contact QRIScloud Support.
  2. QRIScloud does not provide Volume Storage Backup. The Dashboard options for doing this will not work for volumes in the QRIScloud AZ.  (This is by design.)

Login / data transfer issues:

These issues apply to the QRISdata Collection access services, Awoonga, Flashlite and Tinaroo:

  1. Each of these systems is accessed via an "haproxy" load balancer that balances the load across 2 or more login or data transfer nodes:
    • The load balancers employ connection rate limiting.  If there are too many simultaneous connections or connection attempts from the same system, connections can be closed abruptly with this message:
          ssh_exchange_identification: read: Connection reset by peer
    • The data transfer rate through the load balancers is limited by hardware. 
    • For bulk data transfers, it is advisable to connect to the login nodes / data transfer nodes directly to avoid these problems.  This can also be used as a work-around if you are experiencing problems with login sessions.
      Service      Load balancer              Login / transfer nodes
      Data access  data.qriscloud.org.au      ssh1.qriscloud.org.au
                                              ssh2.qriscloud.org.au
      Awoonga      awoonga.qriscloud.org.au   awoonga1.qriscloud.org.au
      Flashlite    flashlite.rcc.uq.edu.au    flashlite1.rcc.uq.edu.au
                                              flashlite2.rcc.uq.edu.au
      Tinaroo      tinaroo.rcc.uq.edu.au      tinaroo1.rcc.uq.edu.au
                                              tinaroo2.rcc.uq.edu.au
  2. The login / data transfer nodes are all configured with "fail2ban" to protect against password guessing.
    • After a small number (typically 3) of failed login attempts, further attempts at login will be refused, even if you get the account and password correct. 
    • This "banning" typically lasts for 10 minutes.
    • If you use SSH keys to log in, and your SSH client offers multiple SSH keypairs, each "offer" that is not accepted is counted by "fail2ban" as a failed login attempt.  If you have lots of keypairs in (for example) your "~/.ssh" directory, you can be banned before your SSH client offers the right keypair.  The solution is to use the "-i" option to specify the key to be used.
  3. All of the above systems should accept either a QSAC account name and password or (for UQ users only) a UQ account name and password.  At times, one or the other may stop working.  In such cases, we recommend that you try the other approach as a work-around.
  4. Mediaflux and Nextcloud require either AAF login or account name / password login using a QSAC.
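
The work-arounds in items 1 and 2 can be combined. The following is a minimal Python sketch, not an official QRIScloud tool: it maps each load balancer name from the table above to its direct nodes and builds an "ssh" command line that names one key explicitly. The function name "direct_ssh_command", the user name and the key path are all hypothetical. "-o IdentitiesOnly=yes" is a standard OpenSSH option that stops the client from offering keys other than the one given with "-i"; that it is needed on these systems is an assumption.

```python
# Direct login / data transfer nodes, keyed by load balancer host name
# (copied from the table in this article).
DIRECT_NODES = {
    "data.qriscloud.org.au": ["ssh1.qriscloud.org.au", "ssh2.qriscloud.org.au"],
    "awoonga.qriscloud.org.au": ["awoonga1.qriscloud.org.au"],
    "flashlite.rcc.uq.edu.au": ["flashlite1.rcc.uq.edu.au", "flashlite2.rcc.uq.edu.au"],
    "tinaroo.rcc.uq.edu.au": ["tinaroo1.rcc.uq.edu.au", "tinaroo2.rcc.uq.edu.au"],
}


def direct_ssh_command(balancer, user, key_path, node_index=0):
    """Return an ssh argument list that bypasses the given load balancer."""
    node = DIRECT_NODES[balancer][node_index]
    return [
        "ssh",
        "-i", key_path,              # offer this key ...
        "-o", "IdentitiesOnly=yes",  # ... and no others, to avoid fail2ban strikes
        f"{user}@{node}",
    ]


if __name__ == "__main__":
    # "jbloggs" and the key path are placeholders, not real account details.
    cmd = direct_ssh_command("tinaroo.rcc.uq.edu.au", "jbloggs", "~/.ssh/id_tinaroo")
    print(" ".join(cmd))
```

The returned list can be passed to "subprocess.run" or pasted into a terminal. Choosing a different "node_index" selects the second node where one exists.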

Nextcloud:

  1. There is a file size limit of 10GB for uploading individual files via the Nextcloud web interface.
    • There is no file size limit for file download.
    • The Nextcloud Sync client is not affected by this limit.
    • You can use the SSH access services to upload a larger file to a QRIScloud collection, but this option is not available for Nextcloud personal shares.
  2. If you reset your QSAC, this will break any Nextcloud Sync that you have configured. You will need to change the saved password in your Nextcloud Sync client configuration.
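
The upload options in item 1 amount to a small decision rule, sketched below in Python. This is an illustration only: the function name "upload_route" is hypothetical, and reading "10GB" as 10 * 10**9 bytes (rather than 10 * 2**30) is an assumption, as is treating a file of exactly that size as allowed.

```python
# Assumption: "10GB" means 10 * 10**9 bytes; the article does not say whether
# the limit is decimal or binary.
WEB_UPLOAD_LIMIT_BYTES = 10 * 10**9


def upload_route(size_bytes, in_collection):
    """Suggest an upload method for one file, following the notes above."""
    if size_bytes <= WEB_UPLOAD_LIMIT_BYTES:
        return "web interface"
    if in_collection:
        return "SSH access services or Nextcloud Sync client"
    # Personal shares cannot be reached through the SSH access services.
    return "Nextcloud Sync client"


if __name__ == "__main__":
    print(upload_route(25 * 10**9, in_collection=False))  # Nextcloud Sync client
```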

Aspera:

The QRIScloud Aspera service was decommissioned in December 2017.

HPC issues:

  1. PBS file stage-in and stage-out does not work, and is liable to CRASH the PBS system. Please do not attempt to use it.
  2. PBS jobs cannot be submitted from compute nodes. One job "qsub"-ing another job will not work.
  3. Users are strongly discouraged from submitting jobs that compute directly against QRIScloud collections, or using collections for temporary or scratch storage. If a job needs to read or write files in a collection, the files should be staged in or out at the beginning / end of the job, preferably as TAR or ZIP archives if there are lots of small files.
  4. Euramoo no longer exists: 
    • Euramoo users have been automatically granted an Awoonga account.
    • User home directories from Euramoo have been made available on Awoonga.
    • For more information, please refer to the Euramoo transition document.
  5. Much of the RCC HPC user documentation can only be accessed via the UQ campus network.
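
The archive-based staging recommended in item 3 can be done by hand inside the job itself (this is separate from the PBS stage-in / stage-out feature in item 1, which should not be used). The following is a minimal Python sketch using only the standard library. The function names "pack_directory" and "unpack_archive" are hypothetical, and the demonstration runs in a throwaway temporary directory; in a real job the archive would sit in the collection and the destination would be scratch space.

```python
import tarfile
import tempfile
from pathlib import Path


def pack_directory(src_dir, archive_path):
    """Bundle src_dir into one gzip-compressed TAR archive."""
    src = Path(src_dir)
    with tarfile.open(str(archive_path), "w:gz") as tar:
        # arcname stores paths relative to the directory name, not absolute.
        tar.add(str(src), arcname=src.name)
    return Path(archive_path)


def unpack_archive(archive_path, dest_dir):
    """Extract an archive created by pack_directory into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(archive_path), "r:gz") as tar:
        tar.extractall(str(dest))
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Create a few small files standing in for job inputs.
        inputs = Path(tmp) / "inputs"
        inputs.mkdir()
        for i in range(3):
            (inputs / f"sample_{i}.txt").write_text(f"record {i}\n")
        # One archive is written and read instead of many small files.
        archive = pack_directory(inputs, Path(tmp) / "inputs.tar.gz")
        scratch = unpack_archive(archive, Path(tmp) / "scratch")
        print(sorted(p.name for p in (scratch / "inputs").iterdir()))
```

Reading or writing one archive means the collection sees a single large sequential transfer instead of many small file operations.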

Please also refer to the RCC Active Incidents page for current Awoonga, Flashlite, Tinaroo & Barrine incidents.
