Why hasn't my (Slurm) job started?

A job can be blocked from being scheduled for the following reasons:

  • There are insufficient resources available to start the job, either due to active reservations, other running jobs, component status, or system/partition size.

  • Other higher-priority jobs are waiting to run, and the job's time limit prevents it from being backfilled.

  • The job's time limit extends into an upcoming reservation (e.g., scheduled preventative maintenance).

  • The job is associated with an account that has reached or exceeded its GrpCPUMins limit.

Use squeue to display the queued jobs, sorted in the order in which the scheduler considers them.

squeue --sort=-p,i --priority --format '%7T %7A %10a %5D %.12L %10P %10S %20r'

Reason codes

A list of reason codes [1] is available as part of the squeue man page. [2]

Common reason codes:

  • ReqNodeNotAvail
  • AssocGrpJobsLimit
  • AssocGrpCPUMinsLimit
  • Resources
  • QOSResourceLimit
  • Priority
  • AssociationJobLimit
  • JobHeldAdmin

How are jobs prioritized?

PriorityType=priority/multifactor

Slurm prioritizes jobs using the multifactor plugin [3], based on a weighted sum of age, size, QOS, and fair-share factors.

Use the sprio command to inspect each weighted priority value separately.

sprio [-j jobid]
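The weighted sum described above can be sketched in a few lines of Python. This is only an illustration, not Slurm's implementation: it assumes the four weights configured in the sections that follow, takes each factor as a value between 0 and 1, and ignores any other terms a site may configure. The names WEIGHTS and job_priority are hypothetical.

```python
# Weights as configured in slurm.conf in the sections of this post.
WEIGHTS = {
    "age": 1000,        # PriorityWeightAge
    "job_size": 2000,   # PriorityWeightJobSize
    "qos": 1500,        # PriorityWeightQOS
    "fairshare": 2000,  # PriorityWeightFairshare
}


def job_priority(factors):
    """Sum weight * factor over the four factors; each factor is in [0, 1]."""
    for name in WEIGHTS:
        if not 0.0 <= factors[name] <= 1.0:
            raise ValueError(f"factor {name!r} must be between 0 and 1")
    return sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)


if __name__ == "__main__":
    example = {"age": 0.5, "job_size": 0.25, "qos": 0.4, "fairshare": 0.5}
    print(job_priority(example))
```

The per-factor products in this sketch correspond to the separate weighted values that sprio reports for a job.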

Age Factor

PriorityWeightAge=1000
PriorityMaxAge=14-0

The age factor represents the length of time a job has been sitting in the queue and eligible to run. In general, the longer a job waits in the queue, the larger its age factor grows. However, the age factor for a dependent job will not change while it waits for the job it depends on to complete. Also, the age factor will not change when scheduling is withheld for a job whose node or time limits exceed the cluster's current limits.

The weighted age priority is calculated as PriorityWeightAge[1000]*[0..1], where the factor rises from 0 to 1 as the job's age approaches PriorityMaxAge[14-0], or 14 days. As such, each hour of wait time is worth ~2.976 priority (1000 priority spread over 336 hours).
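As a rough check of that arithmetic, here is a minimal Python sketch. It assumes the age factor grows linearly with eligible wait time and is capped at PriorityMaxAge; the function name weighted_age_priority is hypothetical.

```python
PRIORITY_WEIGHT_AGE = 1000         # PriorityWeightAge
PRIORITY_MAX_AGE_HOURS = 14 * 24   # PriorityMaxAge=14-0, i.e. 14 days


def weighted_age_priority(wait_hours):
    """Weighted age priority for a job that has been eligible for wait_hours."""
    # Assumed linear growth from 0 to 1, capped once PriorityMaxAge is reached.
    age_factor = min(max(wait_hours, 0.0) / PRIORITY_MAX_AGE_HOURS, 1.0)
    return PRIORITY_WEIGHT_AGE * age_factor


if __name__ == "__main__":
    for hours in (1, 24, 336, 1000):
        print(hours, round(weighted_age_priority(hours), 3))
```

A job that has waited longer than 14 days gains no further age priority in this model.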

Job Size Factor

PriorityWeightJobSize=2000

The job size factor correlates to the number of nodes or CPUs the job has requested. The weighted job size priority is calculated as PriorityWeightJobSize[2000]*[0..1], where the factor approaches 1 as the job size approaches the entire size of the system. A job that requests all the nodes on the machine gets a job size factor of 1.0 and a weighted job size priority of 2000, the equivalent of 28 wait-days (although job age priority itself is capped at 14 days).
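The wait-day equivalence can be worked through with a short Python sketch. It assumes the size factor is simply the fraction of the system's nodes requested, which is a simplification, and the node counts in the example are hypothetical.

```python
PRIORITY_WEIGHT_JOB_SIZE = 2000  # PriorityWeightJobSize
PRIORITY_WEIGHT_AGE = 1000       # PriorityWeightAge
PRIORITY_MAX_AGE_DAYS = 14       # PriorityMaxAge=14-0


def weighted_job_size_priority(requested_nodes, total_nodes):
    """Weighted job size priority, assuming factor = requested / total nodes."""
    size_factor = min(requested_nodes / total_nodes, 1.0)
    return PRIORITY_WEIGHT_JOB_SIZE * size_factor


def wait_days_equivalent(priority):
    """Days of queue wait that would earn the same amount of age priority."""
    return priority * PRIORITY_MAX_AGE_DAYS / PRIORITY_WEIGHT_AGE


if __name__ == "__main__":
    # Hypothetical 100-node system: a full-machine job and a half-machine job.
    for nodes in (100, 50):
        p = weighted_job_size_priority(nodes, 100)
        print(nodes, p, wait_days_equivalent(p))
```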

Quality of Service (QOS) Factor

PriorityWeightQOS=1500

Each QOS can be assigned a priority: the larger the number, the greater the priority of jobs that request this QOS. This priority value is then normalized to the highest priority among all QOSes to become the QOS factor. As such, the weighted QOS priority is calculated as PriorityWeightQOS[1500]*QOSPriority[0..1000]/MAX(QOSPriority[1000]).

QOS          Priority  Weighted priority  Wait-days equivalent
-----------  --------  -----------------  --------------------
admin            1000               1500                  21.0
janus               0                  0                   0.0
janus-debug       400                600                   8.4
janus-long        200                300                   4.2
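The table values can be reproduced with the following Python sketch, which applies the normalization described above to the QOS priorities listed in the table. The helper names are hypothetical.

```python
PRIORITY_WEIGHT_QOS = 1500   # PriorityWeightQOS
PRIORITY_WEIGHT_AGE = 1000   # PriorityWeightAge
PRIORITY_MAX_AGE_DAYS = 14   # PriorityMaxAge=14-0

# QOS priorities from the table above.
QOS_PRIORITY = {"admin": 1000, "janus": 0, "janus-debug": 400, "janus-long": 200}


def weighted_qos_priority(qos):
    """Normalize the QOS priority to the largest one, then apply the weight."""
    max_priority = max(QOS_PRIORITY.values())
    return PRIORITY_WEIGHT_QOS * QOS_PRIORITY[qos] / max_priority


def wait_days_equivalent(priority):
    """Days of queue wait that would earn the same amount of age priority."""
    return priority * PRIORITY_MAX_AGE_DAYS / PRIORITY_WEIGHT_AGE


if __name__ == "__main__":
    for qos in QOS_PRIORITY:
        weighted = weighted_qos_priority(qos)
        print(f"{qos:12s} {weighted:7.1f} {wait_days_equivalent(weighted):5.1f}")
```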

Fair-share Factor

PriorityWeightFairshare=2000
PriorityDecayHalfLife=14-0

The fair-share factor serves to prioritize queued jobs such that those jobs charging accounts that are under-serviced are scheduled first, while jobs charging accounts that are over-serviced are scheduled when the machine would otherwise go idle.

The simplified formula for calculating the fair-share factor for usage that spans multiple time periods and is subject to a half-life decay is:

F = 2**(-NormalizedUsage/NormalizedShares)

Each account is granted an equal share, and historic records of use decay with a half-life of 14 days. As such, the weighted fair-share priority is calculated as PriorityWeightFairshare[2000]*[0..1] depending on the account's historic use of the system relative to its allocated share.

A fair-share factor of 0.5 indicates that the account's jobs have used exactly the portion of the machine that they have been allocated, and assigns the job an additional 1000 priority (the equivalent of 336 wait-hours, or 14 wait-days). A fair-share factor above 0.5 indicates that the account's jobs have consumed less than their allocated share and assigns the job up to 2000 additional priority, for an effective 14 wait-day priority boost relative to an account at exactly its share. A fair-share factor below 0.5 indicates that the account's jobs have consumed more than their allocated share of the computing resources; the added priority approaches 0 depending on the account's history relative to its equal share of the system, for an effective relative priority penalty of up to 14 wait-days.
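The simplified formula and the 0.5 threshold can be checked with a small Python sketch. It takes already-normalized usage and shares as inputs (how the decayed usage is accumulated is not modeled here), and the example share of 0.1 is hypothetical.

```python
PRIORITY_WEIGHT_FAIRSHARE = 2000  # PriorityWeightFairshare


def fair_share_factor(normalized_usage, normalized_shares):
    """Simplified factor: F = 2**(-NormalizedUsage/NormalizedShares)."""
    if normalized_shares <= 0:
        raise ValueError("normalized_shares must be positive")
    return 2 ** (-normalized_usage / normalized_shares)


def weighted_fair_share_priority(normalized_usage, normalized_shares):
    """Apply PriorityWeightFairshare to the simplified fair-share factor."""
    return PRIORITY_WEIGHT_FAIRSHARE * fair_share_factor(
        normalized_usage, normalized_shares
    )


if __name__ == "__main__":
    shares = 0.1  # hypothetical account entitled to 10% of the machine
    for usage in (0.0, 0.05, 0.1, 0.2):
        print(usage, round(weighted_fair_share_priority(usage, shares), 1))
```

An account with no recorded usage gets the full 2000, an account that has used exactly its share gets 1000, and each further multiple of the share halves the result again.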