Tips for using GPFS from a script or program

General advice

Avoid using GPFS (or any collection, for that matter) as computational storage.  It is not designed for that purpose.  If your application needs to repeatedly read and write files, copy them to local storage first, so that you are computing against the local disk (or SSD) file system.  (You could even use a "tmpfs" file system, where the files exist only in memory.)
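
For example (a minimal sketch; the collection path is a placeholder for your own, and "/dev/shm" is the usual tmpfs mount point on Linux systems):

    # Copy the working file from GPFS to node-local storage first...
    cp /gpfs/mycollection/input.dat "$TMPDIR"/
    # ...or to tmpfs, if it fits in memory.
    cp /gpfs/mycollection/input.dat /dev/shm/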

Avoid writing large numbers of small files to GPFS.  GPFS collections have a default file count limit of 100,000 files.  Generally speaking, storing data in small files leads to inefficient storage (i.e. wasted space due to disk block granularity), inefficient file access (due to lots of open / close syscalls), and inefficient replication to the back-end tape.

If you need to access a large number of files that are not in your collection's GPFS cache, contact QRIScloud support. (We have ways to make it happen efficiently, but it involves running privileged commands.)

Using GPFS from a program

To tell whether a file is in the GPFS cache, use the "stat" library function (C / C++) or the equivalent in your implementation language, then compare the file's size in bytes with its block count.  If the file size is greater than zero but the block count is zero, the file is not in the cache.

The current behavior of GPFS, when you attempt to read a file that is not in the cache, is to send a request to the back-end server to fetch it.  If the server does not deliver the file within a given period (currently 1 minute, but this could change), the GPFS server returns an I/O error to the client.  (The reason for returning an I/O error is to avoid recalls blocking NFS server threads indefinitely.  That eventually leads to kernel thread starvation, which blocks access to files that >>are<< in the cache!)

If you get an I/O error on a recall, wait a few seconds or minutes and try again.  The original recall request will still be in the back-end DMF server's recall queue, and the server will eventually deliver the file to the DMF cache.  Once the file is there, a read request to the GPFS server will succeed quickly.
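
A script can deal with this by simply retrying the read after a delay.  The following is a minimal sketch in shell; the 60-second delay and the 30-attempt limit are arbitrary choices, not values prescribed by GPFS or DMF:

    #!/bin/sh
    # Read the named file, retrying on I/O errors while it is being
    # recalled from tape.  The delay and attempt count are arbitrary.
    f="$1"
    attempts=30
    while [ "$attempts" -gt 0 ]; do
        if dd if="$f" of=/dev/null bs=1M 2>/dev/null; then
            echo "read succeeded: $f"
            exit 0
        fi
        echo "read failed (file probably still on tape), retrying: $f" >&2
        attempts=$((attempts - 1))
        sleep 60
    done
    echo "giving up on: $f" >&2
    exit 1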

For people who are trying to expose GPFS-resident files to end users via a web portal: it is not possible to predict recall times for files that are not in the GPFS cache:

  • It may take a second or so for a file that is already in the cache on the back-end DMF server.
  • It may take a couple of minutes if the correct tape is already loaded or a drive is idle.
  • It may take many minutes if the DMF system is really busy.
  • There is no way for GPFS to find out what the delay will be.

Using GPFS from a script

You can use the "stat" command to find out a file's size and block count, then compare them to test if a file is in the GPFS cache.
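
For example (a minimal sketch using the GNU "stat" command, where "%s" gives the file size in bytes and "%b" gives the number of blocks allocated):

    #!/bin/sh
    # Report whether each named file is currently in the GPFS cache.
    for f in "$@"; do
        size=$(stat -c %s "$f")
        blocks=$(stat -c %b "$f")
        if [ "$size" -gt 0 ] && [ "$blocks" -eq 0 ]; then
            echo "$f: not in cache"
        else
            echo "$f: in cache"
        fi
    done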

Use the "tar" command to bundle up small files as TAR files before writing them to a GPFS collection.

If you need to unpack a TAR file, don't unpack it into a GPFS collection.  Instead, unpack it into local storage on the NeCTAR instance or HPC system where you need to use the files.

The HPC systems have a script called "recall_medici" that you can use to ensure that a GPFS file is cached.  If you run it before you submit a job, you can avoid having the job wait for a file to be recalled; i.e. wasting "wall time".
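
For example, assuming that "recall_medici" takes file paths as arguments (check the usage notes on your HPC system; the exact interface may differ):

    # Assumed interface: recall_medici takes one or more file paths.
    recall_medici /gpfs/mycollection/inputs/*.dat
    # ...then submit the job as usual.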
