Section 3: Basic QRISdata questions


Q3.1 - What is cloud storage?

By analogy with cloud computing, cloud storage is where you store your data on infrastructure run by someone else "on the internet".

Q3.1.1 - Is QRISdata really cloud storage?

Yes it is. The models for resourcing and allocating QRISdata storage are different from commercial cloud storage providers, but QRISdata storage qualifies as cloud storage.

Q3.1.2 - Why should I use QRISdata storage?

QRISdata storage is held in physically secure data centres in Brisbane and Townsville, on systems that are managed by IT professionals.  It provides a more reliable place to store research data than putting it on USB drives and portable hard disks.

QRISdata is designed to complement well-managed data storage provided by your institution. In addition to simple storage, QRISdata's merit-based allocation processes encourage sharing and publication of data as a way to foster research collaboration, and preserve valuable research results.

Q3.1.3 - How do I get data storage?

You can start the process of applying for QRISdata storage via the QRIScloud Portal's Services page. Someone from the QCIF eRA team will then contact you to discuss your needs, and prepare the formal RDSI storage application.

Q3.1.4 - How much storage should I apply for?

You can apply for as much storage as you need: a few gigabytes to hundreds of terabytes, or even more.  It is a relatively simple process to increase the allocation size, if you find that you need more space; see FAQ 3.9 on how collection usage limits are implemented.

However there are a couple of practical caveats:

  • Requests for large amounts of storage are subject to capacity constraints.
  • For really large requests, we are only able to provide HSM collections; see FAQ 3.3.4.

Please do not apply for more space than you need. Our stakeholders measure us on actual data stored, not on allocated storage. If you are allocated storage and don't use it within the agreed time-frame, we reserve the right to take it back.

Q3.1.5 - What happens after my application is submitted?

Your RDSI application will be assessed to determine whether it meets the relevant merit allocation criteria (see FAQs 3.2.3 & 3.2.4).

  • Requests for 1 terabyte or more will be assessed by the QCIF RDSI Resource Allocation Committee (RAC).  This can take up to 1 month, so we will often "pre-allocate" some storage for you to get started.
  • For smaller storage requests, a "fast track" assessment procedure is used.

Your storage will then be provisioned and we will send you the collection's details and links to the QRIScloud collection storage documentation.  We will SMS the collection's passwords to your mobile phone number.

Q3.1.6 - Can I request QRISdata storage via the NeCTAR Dashboard?

No. RDSI and NeCTAR resources are requested, allocated and accounted using different processes and mechanisms.

Q3.2 - What is an RDSI Collection?

There is no clear official definition from RDSI that says what a Collection is. However, we take it to encompass a collection of data (e.g. files) that is related to scientific / academic research activities.

Q3.2.1 - What do ReDS and CDS mean?

ReDS stands for Research Data Services. It is / was the merit allocated component of RDSI storage capacity.  ReDS collections are intended to be research data holdings of lasting value and importance.

CDS stands for Collection Development Storage.  It represents the 20% of storage capacity that was set aside for each RDSI "node" for the purposes of developing collections that would later qualify as ReDS collections.

Q3.2.2 - What data is eligible for RDSI storage?

Data is eligible for storage as RDSI ReDS storage if it is "nationally significant data".  This is defined by RDSI as data that is:

  • "viable and relevant as an input for future research",
  • "intended to be available to and usable by other researchers, with adequate supporting metadata", and
  • "recognised by a research organisation as valuable".

Q3.2.3 - What does RDSI merit allocation take account of?

The RDSI merit allocation process takes into account a number of factors including:

  • The storage capacity of the node.
  • Your time-frame and readiness to ingest data.
  • The significance of the data collection itself; see above.

Applications for well-described, widely relevant data collections that can be quickly ingested and openly shared are viewed favourably.

Q3.2.4 - What are the RDSI merit allocation criteria?

The current criteria are outlined in the "RDSI - Merit Allocation Committee Checklist".

Q3.3 - What kind of QRISdata RDSI storage is available?

We currently offer five kinds of NFS-based RDSI storage:

  • Classic disk storage for collections up to 100 terabytes.  This is being phased out.
  • Classic DMF-based HSM storage.  This is best for collections that don't need to be online at all times.  Storage migrates between disk and tape "on demand".
  • GPFS-based disk storage. This is the default RDSI storage offering as of July 2017.  It has HSM characteristics, with the advantage that each collection's on-disk cache size and residency policy can be controlled individually.
  • RDSI volume storage.  This has the same operational properties as NeCTAR volume storage, and requires a NeCTAR VM to allocate it.  (The reason for having two distinct offerings is to do with accounting.  Don't ask ...)
  • Collection storage managed by Mediaflux.

We should note that NeCTAR Object Storage and NeCTAR Volume Storage are covered by QRIScompute FAQs.

Q3.3.1 - What is a Storage Pool?

The storage that holds collection data is divided into Storage Pools. Each pool corresponds to a file system on one of the NFS servers.

  • The "Tier 2" pools are "classic" disk storage pools that are 229 terabytes in size.  A collection is limited to roughly half a pool for operational reasons.  (The old "Tier 1" pools no longer exist.)
  • The "GPFS" pools are for GPFS storage.  They are much larger and can accommodate collections of indefinite size, with the caveat that larger collections cannot be 100% on-disk.
  • There are two public storage pools for HSM storage:
    • The "Tier 3a" pool has a large (120 terabyte) front-end disk cache, and is designed for use-cases where files are likely to be accessed occasionally.
    • The "Tier 3b" pool has a smaller (30 terabyte) disk cache, and is designed for holding archival data.

Q3.3.2 - Why are the classic disk storage pools 229 terabytes?

This size was chosen so that a pool can be restored from tape in 24 hours in the event of a total failure of the pool's file system.  With the GPFS storage pools the time to restore is not so significant because of the HSM nature of the storage offering.
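
As a rough check of the figures above, the following Python sketch works out the average restore rate implied by a 229 terabyte pool and a 24 hour restore window. It assumes decimal terabytes (10^12 bytes) and ignores tape mount and seek overheads; the function name is illustrative only.

```python
# Average throughput implied by restoring one classic disk pool from tape.
POOL_SIZE_TB = 229          # pool size stated in this FAQ
RESTORE_WINDOW_HOURS = 24   # restore target stated in this FAQ
BYTES_PER_TB = 10**12       # assumption: decimal terabytes


def required_throughput_gb_per_s(size_tb: float, hours: float) -> float:
    """Return the average rate, in decimal gigabytes per second."""
    total_bytes = size_tb * BYTES_PER_TB
    seconds = hours * 3600
    return total_bytes / seconds / 10**9


rate = required_throughput_gb_per_s(POOL_SIZE_TB, RESTORE_WINDOW_HOURS)
print(f"{rate:.2f} GB/s")
```

Under these assumptions the restore needs roughly 2.65 GB/s sustained for the whole 24 hours.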

Q3.3.3 - Why can't I have a really large on-disk collection?

We don't have enough disk space to give everyone what they would like, not even on GPFS.  Building and running a large-scale disk array costs a significant amount of money.  Since QRIScloud is not operated on a "cost recovery" basis, the economics mean that resources need to be rationed.

Q3.3.4 - What are the implications of using HSM?

HSM stands for Hierarchical Storage Management. With HSM, the primary copy of your files will be held on tapes, with a cache copy held on disk for faster access.  The problem is that the HSM disk caches are relatively small compared to the amount of data held on tape. If you try to access a file that is no longer in the cache, it has to be retrieved from tape.

As a consequence, classic HSM is not suitable if you require fast access to your files at all times. If you are going to access a number of files in a short space of time, you can use DMF tools to instruct the HSM system to perform a bulk retrieval.  We don't currently have a way for users to bulk retrieve files with GPFS, but this is planned.

Q3.4 - How do I access a collection?

As of January 2016, we will provide a new collection owner with the collection identifier, together with instructions on how to access the collection and how to manage the access control groups (if applicable).

Q3.4.1 - What is the collection identifier?

Each collection has a unique identifier of the form "Qnnnn" where "nnnn" is a 4 digit number.
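
If you handle collection identifiers in scripts, a simple format check can catch typing mistakes early. This is a minimal sketch based only on the "Qnnnn" form described above; the function name is illustrative, and passing the check does not mean that the collection exists.

```python
import re

# "Q" followed by exactly four ASCII digits, e.g. "Q0042".
_COLLECTION_ID = re.compile(r"Q[0-9]{4}")


def is_collection_id(text: str) -> bool:
    """Return True if text has the form Qnnnn (format check only)."""
    return _COLLECTION_ID.fullmatch(text) is not None


print(is_collection_id("Q0042"))
print(is_collection_id("Q42"))
```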

Q3.4.2 - What are access methods?

The access methods determine how you and your collaborators will be able to access the data in your collection. This needs to be specified when a collection is provisioned. There are currently 3 access methods to choose from for new collections: Standard, NFS-only and Mediaflux.  Some old collections use a Legacy access method, but these are in the process of being migrated (see Q3.4.7).

The access method also determines how per-user access control works:

  • For Standard and Mediaflux collections, per-user access is implemented using access groups.
  • For Legacy collections, access control uses per-collection shared credentials.
  • For NFS-only collections, access control is your responsibility.

Q3.4.3 - What is Standard access?

The Standard access method allows you to read and write your data using a variety of tools.  These tools include Cyberduck, Filezilla and WinSCP, various command-line utilities, and fast file transfer tools like GridFTP and Aspera.

In addition, collections with Standard access are NFS mounted on Euramoo, Flashlite and (soon) Tinaroo.  This will allow users of these systems who are members of the appropriate access groups to be able to access the collection data via the file system.

Q3.4.4 - What is NFS-only access?

The NFS-only access method allows you to NFS mount the collection on NeCTAR instances running in the QRIScloud availability zone. You need to nominate the NeCTAR tenants that can mount the collection, and then you need to configure the mounts on each instance.  Once you have done that, you can implement whatever data access services and access controls you want.

For more information, please refer to NFS access to QRIScloud Collections.

Q3.4.5 - What is Mediaflux access?

Mediaflux is a sophisticated and powerful data management product that is available for managing RDSI collections. For more details, please refer to the Getting Started with Mediaflux document, and the collection of training videos that it links to.

Q3.4.6 - Can I change my mind about my collection's Access methods?

Yes, you can change your mind.

  • Switching between "Standard" and "NFS-only" is relatively straight-forward, but it does entail a collection outage.
  • Switching between "Mediaflux" and other access methods is a significant amount of work as it entails copying all existing data in your collection.  This is likely to require an extensive outage.

Q3.4.7 - What is the access method migration?

We are currently in the process of migrating all collections to use one of the new access methods.  Please refer to Upcoming changes to QRIScloud Collections for an overview.  All owners of affected collections have been contacted about the migration, and the need to choose an access method.

Q3.5 - How is access to my collection controlled?

There is a document called Guide to Managing Collection Access that explains how a collection administrator can manage user access.  Note that this only applies to collections configured with the Standard or Mediaflux access methods.

Q3.5.1 - What are the access groups?

An access group is essentially a managed list of (specific) people with access to a (specific) collection.  Each collection has two access groups: a "read-only" group and a "read-write" group.  Within each group, a person can have one of three roles: "user", "administrator" or "owner".
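
The structure described above can be pictured as a small data model. The sketch below is illustrative only and is not how the QRIScloud portal is implemented; the class and method names are assumptions, and the rule that owners and administrators approve requests is taken from Q3.5.4.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class Access(Enum):
    READ_ONLY = "read-only"
    READ_WRITE = "read-write"


class Role(Enum):
    USER = "user"
    ADMINISTRATOR = "administrator"
    OWNER = "owner"


@dataclass
class AccessGroup:
    """One of the two access groups attached to a collection (hypothetical model)."""
    collection_id: str
    access: Access
    members: Dict[str, Role] = field(default_factory=dict)

    def can_approve_requests(self, person: str) -> bool:
        # Assumption from Q3.5.4: owners and administrators approve requests.
        return self.members.get(person) in (Role.ADMINISTRATOR, Role.OWNER)


rw_group = AccessGroup("Q0042", Access.READ_WRITE)
rw_group.members["alice"] = Role.OWNER
rw_group.members["bob"] = Role.USER
print(rw_group.can_approve_requests("alice"), rw_group.can_approve_requests("bob"))
```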

Q3.5.2 - How do I grant someone collection access?

The procedure for "inviting" a person is described in the Guide to Managing Collection Access. This procedure works for any person with current AAF access. If you wish to "invite" people who do not have AAF access, please contact QRIScloud Support.

Q3.5.3 - How do I revoke someone's collection access?

The procedure is described in the Guide to Managing Collection Access.

Q3.5.4 - How do I request access to a collection?

RDSI collections that are designated as "public" will appear in the Collections register. A collection page in the register will give:

  • The collection's title and FoR codes.
  • A link to the collection's portal (if one has been created and registered by the collection owner).
  • Links for requesting direct collection access, subject to approval by the access group owners or administrators.

We do not provide a way for users to find out about non-public collections. However, if you are aware of a collection and want to request access, you can email the owner and ask them to "invite" you.

Q3.5.5 - What do I do when I get an "invitation" URL?

If you were expecting to be invited, copy-and-paste the URL into your web browser.  If you were not expecting it, either ignore it (do nothing!) or reach out to the person who (apparently) sent it to you.

A valid "user" invitation URL will look like this:<hex-digits>/user

where <hex-digits> is a string of 32 digits and letters ('a' through 'f').  If it looks different, be suspicious.
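
The shape check described above can be automated. This sketch only inspects the tail of the URL path (32 lowercase hex digits followed by "/user"); it does not verify the host name, and the example host used here is a hypothetical placeholder, so a URL that passes is not thereby proven genuine.

```python
import re
from urllib.parse import urlparse

# A path that ends with "/<32 lowercase hex digits>/user".
_INVITE_TAIL = re.compile(r"/[0-9a-f]{32}/user\Z")


def looks_like_user_invitation(url: str) -> bool:
    """Return True if the URL path ends like a "user" invitation link."""
    path = urlparse(url).path
    return _INVITE_TAIL.search(path) is not None


# Hypothetical host and path prefix, for illustration only.
example = "https://portal.example.org/invite/" + "0123456789abcdef" * 2 + "/user"
print(looks_like_user_invitation(example))
```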

Q3.5.6 - What happens when I load an "invitation" URL?

The first thing that happens is that you are directed to the QRIScloud Portal's AAF login page:

  • If you belong to an AAF member organization (e.g. any Australian University), login as described in FAQ 1.5.4.
  • If you do not have an AAF login, you can apply to QCIF for an AAF VHO account as described in FAQ 1.5.6. Once that has been set up, start this procedure again, using the AAF VHO as your organization.

If you don't have a QRIScloud account associated with your AAF identity, the next thing that happens is that you will be asked to register, and acknowledge the QRIScloud Terms and Conditions.

Finally, you will be sent to a form for requesting access to the collection. Fill it in and submit it, and your request will be sent to the collection's owners / administrators for approval.

Q3.5.7 - What happens when my request is approved?

When your request is approved, you will receive an email from the QRIScloud portal with instructions on accessing the collection.

In your MyServices page, you can see all of the collection access groups that you belong to. If you click on the link for a collection group, you will get a page that gives some details on accessing the collection.

Q3.6 - Will my collection be backed up?

No. QRISdata does not provide a conventional backup service for collection data (see Q3.6.1).

Instead of backup, we replicate the data to protect against operator errors, storage system failures and major data center catastrophes.

Q3.6.1 - How is replication different to backup?

With a backup system, you would have a reasonable expectation that we could restore your files if you accidentally deleted or overwrote them.  A typical backup system guarantees to keep old versions of files for months or years, and provides mechanisms that allow the backup administrator to restore them.

In a replication system, the primary goal is to keep copies of files so that we can restore to the most recent "known good" state of a collection.  Restoration of files from older states may be possible, but is not the primary goal.

Q3.6.2 - How does replication work?

For on-disk collections, the file system is scanned periodically, and any file that has changed since the last scan is copied to a "shadow" HSM system.

For HSM collections (and the "shadow" HSM for on-disk collections), the replicas are created by the HSM system itself. The normal replication policy is to create two tape replicas of each file in the tape store in the Polaris data centre, and a third replica in the tape store in the St Lucia data centre.

Q3.6.3 - How long will it be before my data is replicated?

The design goals for QRISdata collection replication state that the first on-site tape replica should be completed within 24 hours, and that the off-site replica should be completed within 48 hours.

Q3.6.4 - How long are the replicas retained?

The design goals for QRISdata collection replication state that the on-site replica of a file should be retained for 4 weeks after deletion, and the off-site replica should be retained for 12 weeks.  (Currently we are retaining data for 6 months, though this is subject to change.)
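
To see what the design-goal retention periods mean for a particular deletion, the dates can be worked out as follows. This is a sketch of the stated design goals only (4 weeks on-site, 12 weeks off-site); it does not reflect the longer retention currently in effect, and the function name is illustrative.

```python
from datetime import datetime, timedelta
from typing import Dict

ONSITE_RETENTION = timedelta(weeks=4)    # design goal stated in this FAQ
OFFSITE_RETENTION = timedelta(weeks=12)  # design goal stated in this FAQ


def replica_expiry(deleted_at: datetime) -> Dict[str, datetime]:
    """Return the earliest date each replica of a deleted file may be discarded."""
    return {
        "on-site": deleted_at + ONSITE_RETENTION,
        "off-site": deleted_at + OFFSITE_RETENTION,
    }


for site, expiry in replica_expiry(datetime(2017, 7, 1)).items():
    print(site, expiry.date())
```

For a file deleted on 1 July 2017, this gives 29 July 2017 for the on-site replica and 23 September 2017 for the off-site replica.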

Q3.6.5 - Why doesn't QRIScloud implement backup for me?

The RDSI organisation decided not to fund the provision of backup for collection storage. Instead, they directed their funding to maximize the available storage. (The original policy from RDSI was that backup was the user's responsibility!)

QCIF designed their initial RDSI storage offerings to meet RDSI's stated goals and requirements. The replication model was the best that we could do / justify.

The other aspect is that implementing the kind of backup system that a typical user desires is extremely expensive at the scale that QRISdata operates; i.e. multiple petabytes of data and 500 million files. The bottom line is that if we implemented "time machine" style backup and restore for QRISdata, we could probably only afford to store a tenth of the data that we currently store.

Q3.7 - How can I access data in a collection?

Collection data can be accessed in the following ways, depending on the collection's access method:

  • For Standard access:
    • Using the "" access system; see Q3.7.1.
    • Using SSH based protocols such as "scp", "rsync" and "sftp" via the above system; see Q3.7.1.
    • Using Globus GridFTP; see Q3.7.2.
    • Using Aspera Shares or Aspera Drive; see Q3.7.3.  (Note: Aspera has been withdrawn for GPFS collections, and will be withdrawn for other collection types later this year.)
    • Using Nextcloud.
    • Via auto-provisioned file-system mounts on Euramoo, Flashlite and Tinaroo.
    • Via campus Medici caches (currently UQ only).
    • Soon it will be possible to access GPFS-based collections via Object Storage APIs.
  • For Mediaflux access; see Q3.4.5.
    • Using the Arcitecta Desktop
    • Using the Arcitecta File Explorer
    • Using a custom Mediaflux portal.
  • For NFS-only access; see Q3.7.5.
    • Via an NFS mount on a NeCTAR VM
    • Using data access services that you set up on such a VM

Q3.7.1 - How do I access data via SSH-based data access services?

The "" system is a load-balancer for two access machines: "" and "".  These machines allow you to read and write files using SSH and SSH-based file transfer protocols. 

  1. You can login to the machines ("data", "ssh1" or "ssh2") using an SSH client such as the "ssh" command on Mac OSX and Linux, or "putty" on Windows:
    • Use your QSAC to login, or your UQ credentials if the account names match: see Q1.2.6, Q1.2.7 & Q1.2.8.
    • Once logged in, you will have a standard Linux command environment, similar to what you have when you connect to a NeCTAR VM and typical HPC systems.
    • The files for each collection are auto-mounted as "Qnnnn" directories beneath the "/data" directory.  Access is restricted to users who are members of the respective collection access groups.
  2. You can use a desktop file transfer command or tool to copy files between your desktop and the collection via the access machines:
    • On Windows you can use Cyberduck, Filezilla, WinSCP among others.
    • On Mac OSX or Linux you can use Cyberduck (Mac OSX only), Filezilla and command line tools such as "scp", "rsync" or "sftp".
    • The path to your collection will be as above; i.e. "/data/Qnnnn", where "Qnnnn" is your collection's identifier.
    • The "ssh1" and "ssh2" machines are preferred over "data".  The latter is connection rate-limited which can cause problems when transferring lots of files.
  3. If you have an account on one of the HPC systems that mount the collections (i.e. Euramoo, Flashlite and Tinaroo) you can access the files via the "/RDS" directory.

Note: if you are transferring large numbers of small files, you will typically get better performance using "rsync" rather than "scp" or "sftp".  The latter need to create a separate SSH-enabled TCP connection for each file transferred.  When the files are small, the connection overhead is large compared to the time taken to transfer the bytes.
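
If you script your uploads, the "rsync" invocation can be assembled as shown below. This is a sketch under stated assumptions: the host name "ssh1.example.org" and the user name "alice" are hypothetical placeholders (use the access machine name and account that you were given), and the helper function is illustrative. The "-a" and "--partial" options are standard rsync options for preserving file attributes and keeping partially transferred files.

```python
import shlex
from typing import List


def build_rsync_command(source_dir: str, user: str, host: str,
                        collection_id: str, dest_subdir: str = "") -> List[str]:
    """Build an rsync argument list that copies the contents of source_dir
    into /data/<collection_id>/<dest_subdir>/ on the access machine."""
    remote_path = f"/data/{collection_id}/{dest_subdir}".rstrip("/") + "/"
    return [
        "rsync",
        "-a",          # archive mode: recurse and preserve times and permissions
        "--partial",   # keep partially transferred files so a rerun can resume
        source_dir.rstrip("/") + "/",   # trailing slash: copy the contents
        f"{user}@{host}:{remote_path}",
    ]


# Hypothetical user and host, for illustration only.
command = build_rsync_command("results", "alice", "ssh1.example.org", "Q0042")
print(shlex.join(command))
```

The resulting list can be passed to "subprocess.run(command, check=True)" on a machine that has rsync and SSH installed.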

Q3.7.2 - What is Globus GridFTP?

Globus GridFTP is a file transfer protocol that is designed for large-scale transfers on high bandwidth networks. The Globus GridFTP for data transfer document provides brief instructions on using GridFTP with QRISdata collections.

Q3.7.3 - What is Aspera?

Aspera is a suite of proprietary high-speed file transfer software. We have a license that allows all QRIScloud users to download, install and use the client-side software. The primary services are:

  • Aspera Shares which provides web-based access to your collection.
  • Aspera Drive which supports "synchronization" of files between a collection and another computer.

For more details, please refer to the Getting Started Guide for Aspera document.


  1. Aspera does not perform well for transfers that involve moving a lot of small files. If you have many small files to transfer you will get a much better file throughput using "rsync"; see Q3.7.1.
  2. We no longer support the Aspera "ascp" command line tool.
  3. For operational reasons, we have had to turn off Aspera access to GPFS collections.
  4. It turns out that for typical users transferring files to and from desktops and laptops over low-speed or congested networks, the proprietary protocol offers little speedup.

Q3.7.4 - What is Nextcloud?

Nextcloud is an open source file access portal that offers similar functionality to Aspera.

  • Nextcloud is available as a browser-based web portal, a desktop client or a mobile app.  This includes the facility for creating "share" links that can be emailed to other people.
  • Nextcloud can be connected to a QRISdata collection as an "external storage server".  We will shortly be rolling out a service that allows a collection's custodian to make the collection directly available.
  • The Nextcloud desktop client offers file syncing.
  • Nextcloud can also be connected to AARNET CloudStor+, Dropbox, Google Drive, Amazon S3 and NeCTAR Object Storage.


QRIScloud users of Nextcloud are provided with a "free" allocation of 110 GB that is separate from their collections.


For more details, please refer to the Getting Started Guide for Nextcloud document.

Q3.7.5 - How do I get NFS access to my data?

The first step is that the collection manager needs to lodge a QRIScloud support request to "export" the collection to a specified NeCTAR project.  Once that has been done, the NFS access to QRIScloud Collections document explains how to set up an NFS mount on a NeCTAR instance.

You are free to install and use other software on your NeCTAR instance, and use that to manage your collection data. However, the onus will be on you to manage security and access control.

Note: NFS access is only available for "NFS-only" collections.  NFS-only and Standard Access are mutually exclusive, and NFS-only collections are NOT auto-mounted on the HPC / HTC systems.

Q3.7.6 - Is NFS access secure?

That is a complicated question.

  • On the one hand, exporting your collection does not directly expose it to the internet.  The exported collection is only directly accessible via a private IP address, and it should be impossible for anything outside of the Polaris data centre to access it.
  • On the other hand, once your collection has been attached to an instance in your NeCTAR project, anyone who has (or can gain) privileged access to the instance has unfettered access to that data.

Thus, while providing NFS access to your collection is not insecure per se, it is definitely increasing the risk to your data. 

(In theory, the cryptolocker problem exists for NFS mounts on Linux; see Q3.7.7. However there have been no reports of cryptolocker criminals targeting Linux systems.  A typical Linux-based NeCTAR instance is less likely to be targeted because 1) it is not a Windows system, and 2) because the standard NeCTAR images don't include a web browser or email client.  Web pages and email are the most common attack vectors.)

Q3.7.7 - Can I mount my collection as a "network share"?

We strongly recommend that you DO NOT mount your collection on your home system (or any other system outside of QRIScloud) as a "network share".  Doing this will place your RDSI data at risk.  Please read this page for more information. 

The short explanation is that Windows network shares make your data vulnerable if your home system is infected with cryptolocker ransomware. If your RDSI collection data does get locked by cryptolocker criminals, please contact QRIScloud support urgently.

Q3.8 - How do I get data into my collection?

The normal method for "ingesting" data into a collection is to go to the system where the data currently lives and "upload" it to the collection using one of the supported access methods; see Q3.7.

If your data is held on removable media (e.g. external hard drives, memory sticks, DVDs) you will need to plug them in one at a time.

Q3.8.1 - How do I ingest huge amounts of data?

If you have really large amounts of data to ingest, then conventional upload is liable to be problematic. Transferring terabytes of data over the network takes a long time, especially if your local networking is slow.  Some of the alternatives that we can try include:

  • Using a high bandwidth transfer method such as Globus GridFTP or Aspera.
  • Running transfers as background processes.
  • Optimizing transfer patterns; e.g. instead of downloading files from somewhere to your laptop or PC and then uploading them, transfer them directly.
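
To get a feel for why network speed matters here, the following sketch estimates how long an upload takes for a given data size and link speed. The 80% link efficiency is an assumed figure for illustration, not a measurement, and real transfers are also affected by file counts and storage speed.

```python
def transfer_time_hours(size_tb: float, link_mbit_per_s: float,
                        efficiency: float = 0.8) -> float:
    """Estimate transfer time in hours.

    size_tb          data size in decimal terabytes (10**12 bytes)
    link_mbit_per_s  nominal link speed in megabits per second
    efficiency       assumed fraction of the link actually achieved
    """
    total_bits = size_tb * 10**12 * 8
    effective_bits_per_s = link_mbit_per_s * 10**6 * efficiency
    return total_bits / effective_bits_per_s / 3600


for speed in (100, 1000, 10000):
    hours = transfer_time_hours(10, speed)
    print(f"10 TB at {speed} Mbit/s: about {hours:.0f} hours")
```

With these assumptions, 10 terabytes takes about 278 hours (over 11 days) on a 100 Mbit/s link, but about 28 hours on a 1 Gbit/s link.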

If you have lots of really small files to ingest / upload then you are going to be in for a hard time, no matter what approach you take. We strongly advise you NOT to do this. Instead, we recommend that you use a utility like "zip" or "tar" to bundle up the files into larger "archive" files before you upload them.  If you need to compute against the little files, there are two approaches:

  1. Copy or download the ZIP / TAR archive file to a local file system on the machine where you are doing the computation, unzip / untar the bundle into a local directory (i.e. not NFS mounted!) and compute against that tree.
  2. Modify your application so that it can open and use the archive file directly. (There are standard runtime libraries for doing this in most mainstream programming languages.)
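
As an illustration of bundling and of the second approach, the sketch below uses Python's standard "tarfile" module to pack a directory of small files into one compressed archive, and then reads a single member back without unpacking the whole archive. The function names are illustrative; keep the archive file outside the directory being bundled so that it does not include itself.

```python
import tarfile
from pathlib import Path


def bundle_directory(src_dir: Path, archive_path: Path) -> int:
    """Pack every regular file under src_dir into one gzip-compressed TAR
    archive and return the number of files added."""
    count = 0
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in sorted(src_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to src_dir so the archive is relocatable.
                tar.add(path, arcname=path.relative_to(src_dir).as_posix())
                count += 1
    return count


def read_member(archive_path: Path, member_name: str) -> bytes:
    """Read one file directly from the archive without extracting the rest."""
    with tarfile.open(archive_path, "r:gz") as tar:
        extracted = tar.extractfile(member_name)
        if extracted is None:
            raise ValueError(f"{member_name} is not a regular file")
        with extracted:
            return extracted.read()
```

For example, "bundle_directory(Path("samples"), Path("samples.tar.gz"))" produces a single file to upload, and "read_member(Path("samples.tar.gz"), "run1/a.txt")" later returns the bytes of one bundled file. Note that reading from a gzip-compressed archive is sequential, so an application that needs random access to many members may be better served by an uncompressed TAR or a ZIP file.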

Q3.8.2 - Are there alternatives to uploading?

QCIF has a couple of systems that provide alternatives to over-the-net uploading.

  • DustBuster is a portable 20 terabyte NAS system.  We can loan you this system temporarily to load up your data. When you return the system, we can plug it into a fast network and upload the data to your collection.
  • Hoover is a system that we can use to read and upload data from a range of portable USB media.

We have access to fast node-to-node data transfer mechanisms that can be used for high-volume ingestion.

Q3.8.3 - Can you help me with ingestion?

If you need advice or assistance with ingesting your data, please contact QRIScloud support.

Q3.9 - Is my QRISdata usage controlled?

For Tier 1 and Tier 2 collections, we enforce quotas on each collection's data usage.  For Tier 3 (HSM) collections and GPFS collections quotas are not currently enforced.

Q3.9.1 - How are disk quotas implemented?

On-disk collections have associated disk quotas, implemented using the XFS quota mechanism on the NFS server:

  • The "soft" quota is set to your current allocation size.  (The quota system allows you to exceed the soft quota for a short period of time; see Q3.9.2.)
  • The "hard" quota is set to a value larger than your current allocation size.

Currently, the hard quota is set at twice the allocation size, but this is subject to change.

A different approach is used for HSM and GPFS.  There are quotas on the total storage usage, but the more critical issue is how much cache disk space your collection uses:

  • For classical HSM (Tier3a and Tier3b) collections, all collections compete for disk space in the cache.  If the cache gets too full, the HSM system evicts files based on when they were last accessed.
  • For GPFS collections, each collection has its own cache, and the dimensions of that cache can be controlled on a per collection basis. We will adjust collection GPFS cache sizes up and down according to your access patterns, and general demand for disk space.
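
The eviction behaviour described for classic HSM can be pictured as a least-recently-used cache. The following is a simplified model for illustration only; it is not how DMF or GPFS is implemented, and the class and method names are assumptions.

```python
from collections import OrderedDict


class HsmDiskCache:
    """Toy model of a shared disk cache in front of tape storage."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.used = 0
        self._files = OrderedDict()  # file name -> size, least recent first

    def access(self, name: str, size: int) -> bool:
        """Return True on a cache hit, False if the file came from tape."""
        if name in self._files:
            self._files.move_to_end(name)   # mark as most recently accessed
            return True
        if size > self.capacity:
            return False                    # too big to cache at all
        # Evict the least recently accessed files until the new one fits.
        while self.used + size > self.capacity:
            _, evicted_size = self._files.popitem(last=False)
            self.used -= evicted_size
        self._files[name] = size
        self.used += size
        return False


cache = HsmDiskCache(capacity_bytes=100)
for name in ["a", "b", "a", "c", "b"]:
    hit = cache.access(name, 40)
    print(name, "disk cache" if hit else "recalled from tape")
```

In this run the second access to "a" is served from disk, but "b" has to be recalled from tape again because it was the least recently accessed file when "c" needed space.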

Q3.9.2 - What happens when my collection is over quota?

The following applies to Tier 2 collections only:

  • When the collection goes over its soft quota, you should start seeing a message each time you login to the collection VM, warning you that you need to reduce your data usage. You can also check this by running the "quota" command.
  • When the collection goes over its hard quota, you will be unable to create or modify files.  The only thing that you can do is to delete files.
  • If a collection is over soft quota for more than 50 days, the quota violation escalates to a hard quota violation.
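
The rules above, together with the current hard quota of twice the allocation (Q3.9.1), can be summarised in a few lines of Python. This is a sketch of the policy as described in this FAQ, not of the XFS quota implementation; the names are illustrative and the factor of two is subject to change.

```python
from enum import Enum

HARD_QUOTA_FACTOR = 2   # FAQ: hard quota is currently twice the allocation
SOFT_GRACE_DAYS = 50    # FAQ: soft violations escalate after more than 50 days


class QuotaState(Enum):
    OK = "within allocation"
    OVER_SOFT = "over soft quota: reduce usage or request more space"
    OVER_HARD = "over hard quota: only deletions are possible"


def quota_state(usage_bytes: int, allocation_bytes: int,
                days_over_soft: int = 0) -> QuotaState:
    """Classify a Tier 2 collection's usage against its quotas."""
    soft_limit = allocation_bytes
    hard_limit = allocation_bytes * HARD_QUOTA_FACTOR
    if usage_bytes >= hard_limit:
        return QuotaState.OVER_HARD
    if usage_bytes > soft_limit:
        if days_over_soft > SOFT_GRACE_DAYS:
            return QuotaState.OVER_HARD   # grace period has expired
        return QuotaState.OVER_SOFT
    return QuotaState.OK


TB = 10**12
print(quota_state(12 * TB, 10 * TB, days_over_soft=10).value)
print(quota_state(12 * TB, 10 * TB, days_over_soft=51).value)
```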

Q3.9.3 - What should I do if my collection is over soft quota?

You need to either delete files or take other steps to reduce your usage below your allocated amount, OR request an increase in your RDSI allocation.

You should not wait until the soft quota violation escalates to a hard quota violation.  When your collection is in that state, it is difficult to recover to a state where you can use your collection normally.

Q3.9.4 - What should I do if my collection is over hard quota?

Contact QRIScloud support urgently and we will advise you on how to proceed.

BEWARE: The only thing that you can safely do when you are over hard quota is to delete files and directories.  If you attempt to compress files or create ZIP files or TAR files in place, your files are liable to be truncated and lost.

Q3.9.5 - Are there quotas on file counts?

Not yet.  Currently, we only implement quotas on bytes stored, but we are also considering imposing quotas on the file counts. 

Collections that contain large numbers of small files present a number of technical and operational problems.  Data access performance is impacted, replication scanning is impacted, writing of replicas to tape is impacted, migration of collections is impacted.

Q3.10 - How do I share my data with other people?

  • If your collection is configured with "Standard" or "Mediaflux" access, then you can grant read-only or read-write access to your collection to anyone who has a QRIScloud account.  (Anyone with AAF access can get a QRIScloud account.)
  • If your collection is configured as "NFS only", it is up to you to implement your own collection access controls.

Q3.10.1 - Does a collection have a DOI or other permanent identifier?

No. Usually your institution's library manages Digital Object Identifiers (DOI) and other identifiers for publications and data. Please contact your library for institution-specific questions.

Q3.10.2 - Can I make my QRIScloud data open access?

  • We don't currently have a way to implement open access for Standard Access or Mediaflux collections.  However, we can include your collection in the list of collections that other QRIScloud users can request access to.
  • If your collection is NFS-only, then you are free to expose the data as you see fit. However, you need to be mindful of data security and privacy concerns.

Q3.10.3 - Can I provide access to different files to different people?

This is not something that is currently supported.  There are some other options:

  • You could NFS mount your collection on a NeCTAR instance, then implement a portal that does fine-grained access control. 
  • A Mediaflux collection could be configured to provide fine-grained control.
  • It is technically possible to split a collection into sub-collections with separate read-write and read-only access groups.  However, this results in overhead for QCIF operational and administrative staff.

Q3.10.4 - Can I change my collection's metadata?

Your collection's metadata includes things such as:

  • The collection's title and description.
  • The collection's custodian, requester and technical contacts.
  • The collection's FoR codes.
  • The organization and organizational unit that the collection belongs to.
  • The URL of a public portal for the collection.
  • The URL of a public metadata record for the collection.

If you need changes to be made to the metadata for your collection, please submit a QRIScloud support request.

Q3.11 - How do collection file and directory permissions work?

Collections are stored as files and directories on a POSIX-compliant file system. This means that the low-level access control mechanisms for files and directories in a collection are based on POSIX users and groups, POSIX file permissions and POSIX ACLs (access control lists).

By contrast, the QRIScloud collection access model is based on each collection having one group of users with read-write (RW) access to the collection, and a second group of users with read-only (RO) access. Ideally, the person who created a file should not have more access to it than anyone else.

The QRIScloud access model is implemented as follows:

  • The RW and RO groups are implemented as POSIX groups.  Thus collection Q0042 has POSIX groups called Q0042RW and Q0042RO. 
  • Each collection user maps to a distinct POSIX user.
  • There is a distinct collection user identity for each collection (e.g. Q0042) which is used when the actual user identity is not available. For example, files and directories ingested using Aspera are owned by the collection's user identity.
  • Each file or directory should have "rwx" as its owner and group access settings, and "---" for "other" access.
  • Each file or directory should have the collection's RW group as its POSIX group.
  • All directories should have the setgid ("inherit group") access bit set, so that newly created files and subdirectories also belong to the RW group.
  • All files and directories should inherit ACLs that:
    • Grant "r-x" access to members of the collection's RO group.
    • Provide default group access and umask settings.
    • Forbid access to "other" users.
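
The mode-bit part of this model can be expressed as a small check.  The following Python sketch is illustrative only: the function names expected_groups and mode_problems are hypothetical, and it looks only at the POSIX mode bits, not at ACLs.

```python
import stat


def expected_groups(collection_id):
    """Return the (RW, RO) POSIX group names for a collection id."""
    return collection_id + "RW", collection_id + "RO"


def mode_problems(mode, is_dir):
    """Return a list of ways the mode bits differ from the access model."""
    problems = []
    perms = stat.S_IMODE(mode)  # keep only the permission bits
    if perms & 0o770 != 0o770:
        problems.append("owner and group should both have rwx")
    if perms & 0o007:
        problems.append("'other' should have no access")
    if is_dir and not perms & stat.S_ISGID:
        problems.append("directory is missing the setgid (inherit group) bit")
    return problems
```

For example, mode_problems(os.stat(path).st_mode, os.path.isdir(path)) returns an empty list for a directory shown as "drwxrws---" by "ls -l".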

Q3.11.1 - Are there any permissions gotchas?

Unfortunately, there are:

  • The person who creates a file or directory will be the POSIX owner of the object.  That means they will be able to change the access bits and the ACLs.
  • If a file or directory (somehow) has access other than "rwxrwx---" / "drwxrws---", or has its POSIX group set incorrectly, or has its ACLs set incorrectly, it could prevent users in the RW or RO groups from doing what they should be able to do.  These problems can spread to newly created subdirectories of a "broken" parent directory.
  • For files and directories in GPFS, problems can arise if user and group identities or ACLs are not mapped consistently by all caches.

Q3.11.2 - How do I fix incorrect permissions in my collection?

If the owners, permissions or ACLs on a file or directory are incorrect, you have two options:

  • If the file system recognizes you as the owner of the file or directory (according to the "ls -l" command), you may be able to use "chgrp" to fix incorrect groups, "chmod" to correct permissions, and "setfacl" to add missing ACLs.
  • If you are not the owner then you should raise a support ticket.

It is also worth noting that fixing a collection takes time in proportion to the number of files. This is another case where having lots of small files causes pain.
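
If you own the affected files, the first option can be scripted.  The following is a minimal Python sketch of the "chgrp" and "chmod" steps, assuming the target modes described in FAQ 3.11 ("rwxrwx---" for files, "drwxrws---" for directories).  The function names plan_fixes and apply_fixes are hypothetical, rw_gid is assumed to be the numeric id of your collection's RW group, and ACLs are not handled; you would still need "setfacl" for those.

```python
import os
import stat

FILE_MODE = 0o770   # rwxrwx---
DIR_MODE = 0o2770   # drwxrws--- (setgid bit set)


def plan_fixes(root, rw_gid):
    """Yield (path, wanted_mode, wrong_mode, wrong_group) for bad entries."""
    for dirpath, _dirnames, filenames in os.walk(root):
        entries = [(dirpath, DIR_MODE)]
        entries += [(os.path.join(dirpath, n), FILE_MODE) for n in filenames]
        for path, wanted in entries:
            st = os.lstat(path)
            if stat.S_ISLNK(st.st_mode):
                continue  # leave symbolic links alone
            wrong_mode = stat.S_IMODE(st.st_mode) != wanted
            wrong_group = st.st_gid != rw_gid
            if wrong_mode or wrong_group:
                yield path, wanted, wrong_mode, wrong_group


def apply_fixes(root, rw_gid):
    """Repair everything plan_fixes reports; return the number of entries."""
    count = 0
    for path, wanted, wrong_mode, wrong_group in list(plan_fixes(root, rw_gid)):
        if wrong_group:
            os.chown(path, -1, rw_gid)  # -1 leaves the owner unchanged
        if wrong_mode:
            os.chmod(path, wanted)      # after chown, which may clear setgid
        count += 1
    return count
```

Running plan_fixes on its own first acts as a dry run, so you can review what would change before calling apply_fixes.  Both functions visit every file and directory, which is why the time taken grows with the number of files.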

Q3.12 - What is a collection migration?

Sometimes QRIScloud operations staff need to move a collection from one physical storage medium to another.  Unfortunately, our infrastructure does not allow this to be done without an outage. It is also sometimes necessary to make access control changes that impact on your ability to use the collection.

If we need to migrate your collection, we will contact you and provide you with a detailed description of the migration procedure.  There are some points in the procedure where we need to coordinate with you (the collection owner) via a support ticket.  We would ask you to respond to migration tickets promptly.

Q3.13 - Are there other things that I need to know to use my collection?

We have written a "QRIScloud Collections Dos and Don'ts" document with recommendations on how to use QRISdata collections safely, and without causing operational problems.
