Taki is back online
The cooling issue has been resolved
I have brought Taki back online. You should be able to start running jobs. I do want to note the status of some nodes that are still offline.
- gpunode001
This node is currently offline. We have had issues with this node in the past, and the usual resolution is to fix the problem onsite. I should be able to resolve this issue on Monday.
- A number of 2013 nodes
A failed InfiniBand switch on the cluster has taken a number of 2013 nodes offline. We have a ticket in with the vendor to get this issue resolved. I expect this to be resolved early next week.
As always, if you run into issues, please submit a help desk ticket with as much detail as you can provide about the issue.
I hope everyone has a good weekend.
Posted: June 2, 2023, 8:45 PM