A job can be blocked from being scheduled for the following reasons:
There are insufficient resources available to start the job, whether due to active reservations, other running jobs, component status, or system/partition size.
Other higher-priority jobs are waiting to run, and the job's time limit prevents it from being backfilled.
The job's time limit exceeds an upcoming reservation (e.g., scheduled preventative maintenance).
The job is associated with an account that has reached or exceeded its
Display a list of queued jobs sorted in the order considered by the
squeue --sort=-p,i --priority --format '%7T %7A %10a %5D %.12L %10P %10S %20r'
Common reason codes:
How are jobs prioritized?
Slurm prioritizes jobs using the multifactor plugin [3], based on a weighted summation of age, size, QOS, and fair-share factors.
Use the sprio command to inspect each weighted priority value:
sprio [-j jobid]
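The weighted summation can be sketched as follows. This is an illustration only: the weight values are assumptions inferred from the figures quoted elsewhere on this page (not read from the site's configuration), the function name is hypothetical, and each factor is taken to be a number between 0 and 1.

```python
# Assumed weights, inferred from figures quoted on this page. The real values
# can be checked with `scontrol show config | grep PriorityWeight`.
WEIGHTS = {"age": 1000, "jobsize": 2000, "qos": 1500, "fairshare": 2000}

def job_priority(factors, weights=WEIGHTS):
    """Return the weighted sum of the factors, each clamped to 0..1."""
    total = 0.0
    for name, weight in weights.items():
        factor = min(max(factors.get(name, 0.0), 0.0), 1.0)
        total += weight * factor
    return round(total)

# Hypothetical job: half the maximum age, 10% of the machine,
# a QOS factor of 0.4, and an account exactly at its fair share.
example = {"age": 0.5, "jobsize": 0.1, "qos": 0.4, "fairshare": 0.5}
print(job_priority(example))  # 2300
```

The per-factor contributions in this sketch correspond to the columns that sprio reports for a queued job.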
Age Factor
The age factor represents the length of time a job has been sitting in the queue and eligible to run. In general, the longer a job waits in the queue, the larger its age factor grows. However, the age factor for a dependent job will not change while it waits for the job it depends on to complete. Also, the age factor will not change when scheduling is withheld for a job whose node or time limits exceed the cluster's current limits.
The weighted age priority is calculated as PriorityWeightAge*[0..1], where the factor approaches 1 as the job age approaches PriorityMaxAge (14-0, or 14 days). As such, an hour of wait time is equivalent to ~2.976 priority.
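The arithmetic behind the per-hour figure can be checked with a short sketch. It assumes a PriorityWeightAge of 1000, which is implied by the ~2.976 figure (1000 / 336 hours) but not stated on this page; the function name is hypothetical.

```python
PRIORITY_WEIGHT_AGE = 1000         # assumed; implied by ~2.976 priority per hour
PRIORITY_MAX_AGE_HOURS = 14 * 24   # PriorityMaxAge of 14-0 (14 days, 0 hours)

def weighted_age_priority(hours_eligible):
    """Age factor grows linearly with eligible wait time and is capped at 1."""
    factor = min(hours_eligible / PRIORITY_MAX_AGE_HOURS, 1.0)
    return PRIORITY_WEIGHT_AGE * factor

print(round(weighted_age_priority(1), 3))  # 2.976
print(weighted_age_priority(336))          # 1000.0
print(weighted_age_priority(1000))         # 1000.0, capped at 14 days
```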
Job Size Factor
The job size factor correlates to the number of nodes or CPUs the job has requested. The weighted job size priority is calculated as PriorityWeightJobSize*[0..1] as the job size approaches the entire size of the system. A job that requests all the nodes on the machine will get a job size factor of 1.0, with an effective weighted job size priority of 28 wait-days (except that job age priority is capped at 14 days).
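As a rough sketch, the job size priority and its wait-day equivalent might be computed as below. The weight of 2000 is an assumption derived from the 28 wait-day figure, the factor is assumed to scale linearly with node count, and the 1000-node machine is a hypothetical size chosen only for illustration.

```python
PRIORITY_WEIGHT_JOB_SIZE = 2000    # assumed; implied by the 28 wait-day figure
PRIORITY_PER_WAIT_DAY = 1000 / 14  # assumed age weight of 1000 spread over 14 days

def weighted_job_size_priority(nodes_requested, total_nodes):
    """Size factor is the fraction of the machine requested, capped at 1."""
    factor = min(nodes_requested / total_nodes, 1.0)
    return PRIORITY_WEIGHT_JOB_SIZE * factor

def wait_days_equivalent(priority):
    """Express a priority value as the days of queue wait that would earn it."""
    return priority / PRIORITY_PER_WAIT_DAY

TOTAL_NODES = 1000  # hypothetical machine size
full = weighted_job_size_priority(TOTAL_NODES, TOTAL_NODES)
print(full, round(wait_days_equivalent(full), 1))   # 2000.0 28.0
half = weighted_job_size_priority(500, TOTAL_NODES)
print(half, round(wait_days_equivalent(half), 1))   # 1000.0 14.0
```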
Quality of Service (QOS) Factor
Each QOS can be assigned a priority: the larger the number, the greater the priority of jobs that request this QOS. This priority value is then normalized to the highest priority of all the QOSs to become the QOS factor. As such, the weighted QOS priority is calculated as PriorityWeightQOS*QOSPriority[0..1000]/MAX(QOSPriority).
QOS          Priority  Weighted priority  Wait-days equivalent
-----------  --------  -----------------  --------------------
admin        1000      1500               21.0
janus        0         0                  0.0
janus-debug  400       600                8.4
janus-long   200       300                4.2
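The table values can be reproduced with a short sketch. It assumes a PriorityWeightQOS of 1500 (implied by the admin row) and an age weight of 1000 over 14 days for the wait-day conversion; neither weight is stated explicitly on this page, and the function names are hypothetical.

```python
PRIORITY_WEIGHT_QOS = 1500  # assumed; implied by the admin row of the table
QOS_PRIORITY = {"admin": 1000, "janus": 0, "janus-debug": 400, "janus-long": 200}

def weighted_qos_priority(qos):
    """Normalize the QOS priority to the largest one, then apply the weight."""
    return PRIORITY_WEIGHT_QOS * QOS_PRIORITY[qos] / max(QOS_PRIORITY.values())

def wait_days(priority):
    """Assumes 1000 priority corresponds to 14 days of queue wait."""
    return priority * 14 / 1000

for name in QOS_PRIORITY:
    weighted = weighted_qos_priority(name)
    print(f"{name:<12} {QOS_PRIORITY[name]:>5} {weighted:>7.0f} {wait_days(weighted):>5.1f}")
```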
Fair-share Factor
The fair-share factor serves to prioritize queued jobs such that those jobs charging accounts that are under-serviced are scheduled first, while jobs charging accounts that are over-serviced are scheduled when the machine would otherwise go idle.
The simplified formula for calculating the fair-share factor for usage that spans multiple time periods and is subject to a half-life decay is:
F = 2**(-NormalizedUsage/NormalizedShares)
Each account is granted an equal share, and historic records of use decay with a half-life of 14 days. As such, the weighted fair-share priority is calculated as PriorityWeightFairshare*[0..1] depending on the account's historic use of the system relative to its allocated share.
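The effect of the half-life on historic usage can be illustrated with a minimal sketch. It assumes a simple continuous exponential decay; the function name and the usage figures are hypothetical, and the scheduler applies its decay incrementally rather than with this exact expression.

```python
HALF_LIFE_DAYS = 14  # half-life stated above

def decayed_usage(raw_usage, age_days, half_life_days=HALF_LIFE_DAYS):
    """Weight of a past usage record after age_days of exponential decay."""
    return raw_usage * 0.5 ** (age_days / half_life_days)

print(decayed_usage(1000.0, 0))   # 1000.0
print(decayed_usage(1000.0, 14))  # 500.0
print(decayed_usage(1000.0, 28))  # 250.0
```

In other words, usage recorded two weeks ago counts half as much against an account as usage recorded today.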
A fair-share factor of 0.5 indicates that the account's jobs have used exactly the portion of the machine that they have been allocated, and assigns the job an additional 1000 priority (the equivalent of 336 wait-hours, or 14 wait-days). A fair-share factor above 0.5 indicates that the account's jobs have consumed less than their allocated share, and assigns the job up to 2000 additional priority, for an effective 14 wait-day priority boost relative to an account at exactly its share. A fair-share factor below 0.5 indicates that the account's jobs have consumed more than their allocated share of the computing resources; the added priority approaches 0 depending on the account's history relative to its equal share of the system, for an effective relative priority penalty of up to 14 wait-days.
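The three cases above can be worked through with the simplified formula. This sketch assumes a PriorityWeightFairshare of 2000 (implied by a factor of 0.5 yielding 1000 priority); the function names and the sample share of 0.1 are hypothetical.

```python
PRIORITY_WEIGHT_FAIRSHARE = 2000  # assumed; a factor of 0.5 then gives 1000 priority

def fair_share_factor(normalized_usage, normalized_shares):
    """Simplified formula: F = 2**(-NormalizedUsage / NormalizedShares)."""
    return 2 ** (-normalized_usage / normalized_shares)

def weighted_fair_share_priority(normalized_usage, normalized_shares):
    return PRIORITY_WEIGHT_FAIRSHARE * fair_share_factor(normalized_usage, normalized_shares)

SHARES = 0.1  # hypothetical: the account is allocated 10% of the machine
print(weighted_fair_share_priority(0.0, SHARES))  # 2000.0, no recent usage
print(weighted_fair_share_priority(0.1, SHARES))  # 1000.0, usage equals share
print(weighted_fair_share_priority(0.2, SHARES))  # 500.0, twice the share
```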