We have become aware of several occurrences of truncated files on QRIScloud in MeDiCI data collections.
It is important to note that no underlying data stored on QRIScloud has been corrupted. However, at present the QRIScloud support team needs to intervene to make the affected files available for use through the MeDiCI cache.
If you become aware of what appear to be truncated and/or corrupted files for your MeDiCI collection, please contact the QRIScloud Helpdesk.
What is causing this to occur?
After a thorough investigation, the QRIScloud operations team has determined that the truncated files are an artefact of a timeout being triggered in the MeDiCI caching layer due to delays in retrieving files from tape. Files are retrieved from tape when they are not resident in the MeDiCI cache or the underlying DMF cache.
There is no mechanism within the MeDiCI caching layer to query the DMF caching layer to determine whether the files being accessed are on-line (resident in the DMF cache) or off-line (not resident in the DMF cache and needing to be retrieved from tape).
When the timeout to retrieve data from the DMF cache into the MeDiCI cache expires, the MeDiCI caching layer creates a sparse file. This file appears truncated and/or corrupted because it does not contain the expected data, and it is unusable.
In the background, the original request to access data, which triggered the retrieval of files from tape, continues, and the files are eventually pulled into the DMF caching layer. Although the files are then available in the DMF caching layer, they are not automatically pulled into the MeDiCI caching layer, leaving the MeDiCI cache out of sync.
There is presently no mechanism for end-users to re-synchronise the MeDiCI cache with the DMF cache, so the QRIScloud support team needs to become involved to correct it.
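As a rough illustration of how a sparse file can be recognised on a POSIX system, the sketch below compares a file's apparent size with the disk space actually allocated to it. This is a generic heuristic, not a QRIScloud tool: the function names and the 0.9 threshold are assumptions made for this example, and some file systems report low allocation for other reasons (for example, compression). A positive result is therefore only a hint that the file is worth reporting to the QRIScloud Helpdesk.

```python
import os

BLOCK_UNIT = 512  # st_blocks counts 512-byte units on POSIX systems


def allocated_fraction(apparent_size: int, blocks: int) -> float:
    """Return the share of a file's apparent size that is backed by disk blocks."""
    if apparent_size == 0:
        return 1.0  # an empty file has nothing missing
    return min(1.0, blocks * BLOCK_UNIT / apparent_size)


def looks_sparse(path: str, threshold: float = 0.9) -> bool:
    """Heuristic (assumed threshold): True if far less data is allocated
    on disk than the file's reported size implies."""
    st = os.stat(path)
    return allocated_fraction(st.st_size, st.st_blocks) < threshold
```

Calling `looks_sparse` on a suspect file in your collection returns `True` when most of its reported size is not backed by stored data. The check relies on `st_blocks`, which is not available on Windows.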
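The sequence described above can be summarised with a toy model. It is a simplified sketch for illustration only: the names `Outcome`, `fetch` and `out_of_sync` are hypothetical, the timings are made up, and nothing here reflects the actual MeDiCI or DMF implementation.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    medici_has_data: bool  # file content is usable in the MeDiCI cache
    dmf_has_data: bool     # file content is resident in the DMF cache


def fetch(recall_seconds: float, timeout_seconds: float) -> Outcome:
    """Model one file request that has to be recalled from tape."""
    # The recall continues in the background, so DMF always ends up with the file.
    dmf_has_data = True
    # MeDiCI only receives the data if the recall finishes before its timeout;
    # otherwise it is left with a sparse placeholder.
    medici_has_data = recall_seconds <= timeout_seconds
    return Outcome(medici_has_data, dmf_has_data)


def out_of_sync(outcome: Outcome) -> bool:
    """True when DMF holds the file but the MeDiCI cache does not."""
    return outcome.dmf_has_data and not outcome.medici_has_data
```

With these illustrative numbers, `out_of_sync(fetch(300, 60))` is `True` (a slow tape recall exceeds the timeout), whereas `out_of_sync(fetch(0.5, 60))` is `False` (the data arrives in time). In this model the two caches diverge only when the recall time exceeds the timeout.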
How are we addressing this?
In the short term, the QRIScloud operations team has increased the timeout value in the MeDiCI caching layer to minimise the occurrence of truncated files. However, truncated files may still occur.
This issue will be fully resolved in the coming months with the introduction of a new disk layer within the DMF environment. The DMF disk layer that is being introduced has been designed to store a complete copy of MeDiCI collections data on disk, in conjunction with the additional tape copies for resilience. With a complete copy of MeDiCI collections on disk, the latency associated with retrieving files from tape is removed, preventing the timeout from being triggered in the MeDiCI caching layer.
What do you need to do?
If you become aware of what appear to be truncated and/or corrupted files for your MeDiCI collection or have any questions, please contact the QRIScloud Helpdesk.