We experienced some network-related problems within Polaris between about 11:40 and 12:10 that affected a limited number of machines. For example, some of the Euramoo login nodes were rejecting logins.
The problems have stopped for now. Engineers are investigating to determine the root cause.
UPDATE - The problem has been diagnosed as hardware / firmware faults in the pair of DataIO network switches. These have been causing intermittent problems for some services including (at various times) Flashlite and the "QRIScloud-rdc" Ceph cluster.
UPDATE - The problems were resolved by updating the firmware.