
Roll-out of Dedicated Access to GPU Machines for Contributing Groups

To all HPC users,


Based on guidance provided by the Shared Infrastructure Governance subcommittee on advanced GPU resources, the DoIT Research Computing Team has made some changes to the Slurm model that governs chip-gpu. In this new model, all GPU machines are available for general use so that resources never sit idle, but research groups whose funds contributed specific machines can opt for dedicated access to those machines through Slurm. Jobs running on a contributed machine may be preempted if and when a user from the contributing research group exercises this dedicated access. Jobs submitted by all other users run normally, but may occasionally be preempted when they happen to be running on a contributed machine.
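
For users outside contributing groups, one way to tolerate an occasional preemption is to ask Slurm to requeue the job automatically and to checkpoint regularly so a restart loses little work. Below is a minimal sketch using standard Slurm directives; the job name, script, and requested resources are hypothetical and should be adjusted to your own workload:

    #!/bin/bash
    #SBATCH --job-name=gpu-example    # hypothetical job name
    #SBATCH --gres=gpu:1              # request one GPU
    #SBATCH --time=24:00:00           # requested wall time
    #SBATCH --requeue                 # if preempted, return the job to the queue to restart

    srun python train.py              # hypothetical workload; save checkpoints so a restart can resume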


In this way, all users have access to all GPU machines, but users belonging to contributing research groups retain dedicated access to the contributed machines.


Users will notice that the chip-gpu Slurm cluster now shows three partitions. The default gpu partition encompasses all GPU machines; the gpu-contrib partition encompasses only the contributed machines, where jobs may be preempted; and the gpu-general partition encompasses the machines where jobs will not be preempted due to dedicated contributor access. Note that the gpu-contrib partition is meant only to show users which nodes may see jobs preempted due to dedicated contributor access.
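
As an illustration, the partitions can be inspected and targeted with standard Slurm commands. This is only a sketch; the job script name is hypothetical, and the node listing you see on chip will differ:

    sinfo --partition=gpu,gpu-contrib,gpu-general   # list nodes and their state in each partition
    sbatch --partition=gpu-general job.sh           # run only on nodes not subject to contributor preemption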


The three-day preemption rule is unchanged across chip-gpu.


For more information about any of these hardware partitions and how preemption works across the chip HPC cluster, please see the wiki page: https://umbc.atlassian.net/wiki/spaces/faq/pages/1249509377/chip+Partitions+and+Usage


As always, let us know if you notice any issues or otherwise have questions or concerns by submitting a descriptive help request using the form at the following link.


https://rtforms.umbc.edu/rt_authenticated/doit/DoIT-support.php?auto=Research%20Computing


Thanks for reading,

Roy Prouty

Assistant Director for Research Computing

(& Max!)


Posted: September 4, 2025, 4:19 PM