Scheduler Control Commands

Scontrol

Overview

Besides scancel, scontrol is the primary command for modifying and showing information about running, pending, or held jobs. The format of the command is:
scontrol [options] [command]
Where [options] are typically:
Option Short Long Description
Help -h --help Show scontrol help.
One-Line -o --oneliner Use one line per record for show command.
All -a --all List all jobs on all partitions for show command.
Details -d --details List all available information about jobs for show command.
Verbose -v --verbose Show debugging information when running any command.
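
The one-line option is convenient for scripting, because each record arrives as a single line of KEY=value fields. The following is a minimal sketch of pulling one field out of such a record. The helper name job_field is hypothetical (it is not part of Slurm), and the sample record is abbreviated from the output shown in the Examples section. Values that contain spaces, such as Command, would be cut at the first space.

```shell
# Hypothetical helper: print the value of one KEY=value field from a
# one-line scontrol record. Usage: job_field KEY "record"
job_field() (
    key=$1
    set -f    # disable globbing so values such as CoreSpec=* stay literal
    for field in $2; do
        case $field in
            "$key"=*) printf '%s\n' "${field#*=}"; exit 0 ;;
        esac
    done
    exit 1    # field not found
)

# Sample record, abbreviated from the Examples section.
sample='JobId=185995 JobName=lipo2Nano_-0.2_17500_1.0472.sh JobState=RUNNING Partition=computeq NumCPUs=8'

job_field JobState "$sample"    # prints RUNNING

# On a cluster, the record would come from scontrol itself:
#   job_field JobState "$(scontrol -o show job 185995)"
```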

And [command] is usually:
Command Input Example Description
update specification scontrol update jobid=12345 timelimit=4-00:00:00 Used to update a job specification. When timelimit, numcpus, numtasks, numnodes, minmemorycpu, or minmemorynode is updated for a pending job, the new value must be less than what was initially requested. Updates usually do not work for running jobs.
hold jobList scontrol hold 12345 Hold a pending job to prevent it from running.
suspend jobList scontrol suspend 12345 Suspend a running job to allow other jobs to run.
requeue jobList scontrol requeue 12345 Requeue a running, suspended, or completed job.
resume jobList scontrol resume 12345 Resume a suspended job.
release jobList scontrol release 12345 Release a held job.
show entity=ID or entity ID scontrol show job 12345 Show details about an entity.
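
A typical sequence with these commands is to hold a pending job, lower one of its requests, and release it again. The sketch below assumes a hypothetical run wrapper and DRY_RUN variable (neither is part of Slurm) so that the commands are only printed by default; the job id and time limit are the placeholder values from the table.

```shell
# Hypothetical wrapper: print the command when DRY_RUN=1 (the default),
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

jobid=12345    # placeholder job id

run scontrol hold "$jobid"                                # keep the pending job from starting
run scontrol update jobid="$jobid" timelimit=4-00:00:00   # lower the time limit
run scontrol release "$jobid"                             # let the scheduler consider it again
```

Setting DRY_RUN=0 before these lines would execute the commands on a cluster instead of printing them.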

A job list (jobList for a given [command]) is a single job id or a comma-separated list such as 12345,12346,12347 to which the [command] is applied. A job name, given as jobname=[some job name] instead of jobList, applies the [command] to all jobs with that name. Entities, specified with entity ID, can be:
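
As a small illustration of the two forms, the sketch below builds a comma-separated job list from individual ids and prints the resulting commands rather than running them. The helper join_job_ids is hypothetical, and the ids and the job name slurmJob are placeholders.

```shell
# Hypothetical helper: join its arguments with commas.
join_job_ids() (
    IFS=,
    printf '%s\n' "$*"
)

job_list=$(join_job_ids 12345 12346 12347)

# Apply one command to several jobs by id ...
printf 'scontrol hold %s\n' "$job_list"

# ... or to every job that shares a name.
printf 'scontrol hold jobname=%s\n' "slurmJob"
```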
Entity
ID example
Description
Account
hkurtzlab
Execute the sacctmgr command to see available accounts:
sacctmgr show assoc user=$USER format=account -P
Comment
"A Comment"
Can be almost anything, but it is best to include information about the job, its dependencies, etc.
Dependency
after:12345
Job is held until the dependency given in ID is satisfied:
  1. "after:jobid_1:jobid_2:...": Wait until jobs have started.
  2. "afterany:jobid_1:jobid_2:...": Wait until jobs have terminated.
  3. "afternotok:jobid_1:jobid_2:...": Wait until jobs have terminated with a non-zero exit code (usually failed).
  4. "afterok:jobid_1:jobid_2:...": Wait until jobs have terminated with a zero exit code (usually succeeded).
  5. "singleton": Wait until jobs with the same name and user have terminated.
JobId or job
12345
Cannot be modified, but can be used with show to see details about a job.
ExcNodeList
c001,c002,c003
Exclude a comma separated list of nodes. Usually useful if a job has a problem with a particular node.
MinCPUsNode
4
Execute with at least ID's number of CPUs per node.
MinMemoryCPU
400
Execute with at least ID megabytes of memory per CPU. Applies to pending jobs.
MinMemoryNode
400
Execute with at least ID megabytes of memory per node. Applies to pending jobs.
JobName
slurmJob
Name of the job(s) to be shown or modified.
NodeList
c001,c002,c003
Shrink a job to the nodes listed in ID. Must be a subset of currently allocated nodes for the job.
NumCPUs
3-5
Set minimum-maximum number of CPUs for job. Maximum is optional, and if the job is running, it must be smaller than what is currently allocated for the job.
NumTasks
4
Set the number of tasks required for a job.
Partition
computeq
Set the job's partition.
QOS
mpi
Set the job's quality of service.
ReqNodeList
c001,c002,c003
Require a comma separated list of nodes. Usually useful if a job must run on particular nodes.
Requeue
1
Set the job to requeue (resubmit) after a node failure.
StartTime
4/25/19
Set the job to initiate on or after a particular time/date.
TimeLimit
10-20:00:00
Set the job to terminate after running for an allotted amount of time.
UserID
jspngler
Identify a job by user name.
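
Entities are combined with update as entity=ID pairs. The sketch below builds a dependency ID and prints the matching update command. It assumes the form used in the Slurm manual pages, where the job ids after the dependency type are separated by colons; the helper dependency_spec is hypothetical and the job ids are placeholders.

```shell
# Hypothetical helper: build a dependency ID such as afterok:12345:12346.
# Usage: dependency_spec TYPE JOBID [JOBID...]
dependency_spec() (
    type=$1
    shift
    IFS=:
    printf '%s:%s\n' "$type" "$*"
)

dep=$(dependency_spec afterok 12345 12346)

# Make pending job 12347 wait until both jobs have succeeded.
printf 'scontrol update jobid=%s dependency=%s\n' 12347 "$dep"
```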

Examples

Showing a currently running job with scontrol show:
[user@log001 ~] scontrol show job 185995 -d
JobId=185995 JobName=lipo2Nano_-0.2_17500_1.0472.sh
UserId=jspngler(10123) GroupId=users(100) MCS_label=N/A
Priority=4294771864 Nice=0 Account=mlaradjilab QOS=normal
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
DerivedExitCode=0:0
RunTime=3-22:27:11 TimeLimit=5-00:00:00 TimeMin=N/A
SubmitTime=2019-04-11T17:44:03 EligibleTime=2019-04-11T17:44:03
StartTime=2019-04-18T10:06:03 EndTime=2019-04-23T10:06:03 Deadline=N/A
PreemptTime=None SuspendTime=None SecsPreSuspend=0
LastSchedEval=2019-04-18T10:06:03
Partition=computeq AllocNode:Sid=log002:91002
ReqNodeList=(null) ExcNodeList=(null)
NodeList=c063
BatchHost=c063
NumNodes=1 NumCPUs=8 NumTasks=1 CPUs/Task=8 ReqB:S:C:T=0:0:*:*
TRES=cpu=8,mem=3200M,node=1,billing=8
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
Nodes=c063 CPU_IDs=10-13,20-23 Mem=3200 GRES_IDX=
MinCPUsNode=8 MinMemoryCPU=400M MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
Gres=(null) Reservation=(null)
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/home/jspngler/nanoPatch/test/2N_D10_vesicle/U_-0.2/lipo2Nano_-0.2_17500_1.0472.sh --job-name=5NM_-0.2_17500_1.0472 -D /home/jspngler/nanoPatch/test/2N_D10_vesicle/U_-0.2
WorkDir=/home/jspngler/nanoPatch/test/2N_D10_vesicle/U_-0.2
StdErr=/home/jspngler/nanoPatch/test/2N_D10_vesicle/U_-0.2/MD-185995-4294967294.err
StdIn=/dev/null
StdOut=/home/jspngler/nanoPatch/test/2N_D10_vesicle/U_-0.2/MD-185995-4294967294.out
Power=
Updating a pending job with less memory per CPU:
[jspngler@log001 ~] scontrol show job 186120
JobId=186120 JobName=lipo2Nano_-0.7_21500_2.0944.sh
UserId=jspngler(10123) GroupId=users(100) MCS_label=N/A
Priority=4294771739 Nice=0 Account=mlaradjilab QOS=normal
JobState=PENDING Reason=AssocMaxJobsLimit Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:00:00 TimeLimit=5-00:00:00 TimeMin=N/A
SubmitTime=2019-04-11T17:49:49 EligibleTime=2019-04-11T17:49:49
StartTime=Unknown EndTime=Unknown Deadline=N/A
PreemptTime=None SuspendTime=None SecsPreSuspend=0
LastSchedEval=2019-04-22T08:46:10
Partition=computeq AllocNode:Sid=log002:91002
ReqNodeList=(null) ExcNodeList=(null)
NodeList=(null)
NumNodes=1 NumCPUs=8 NumTasks=1 CPUs/Task=8 ReqB:S:C:T=0:0:*:*
TRES=cpu=8,mem=3200M,node=1
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=8 MinMemoryCPU=400M MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
Gres=(null) Reservation=(null)
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/home/jspngler/nanoPatch/test/2N_D10_vesicle/U_-0.7/lipo2Nano_-0.7_21500_2.0944.sh --job-name=5NM_-0.7_21500_2.0944 -D /home/jspngler/nanoPatch/test/2N_D10_vesicle/U_-0.7
WorkDir=/home/jspngler/nanoPatch/test/2N_D10_vesicle/U_-0.7
StdErr=/home/jspngler/nanoPatch/test/2N_D10_vesicle/U_-0.7/MD-186120-4294967294.err
StdIn=/dev/null
StdOut=/home/jspngler/nanoPatch/test/2N_D10_vesicle/U_-0.7/MD-186120-4294967294.out
Power=
[jspngler@log001 ~] scontrol update job 186120 minmemorycpu=300
[jspngler@log001 ~] scontrol show job 186120
JobId=186120 JobName=lipo2Nano_-0.7_21500_2.0944.sh
UserId=jspngler(10123) GroupId=users(100) MCS_label=N/A
Priority=4294771739 Nice=0 Account=mlaradjilab QOS=normal
JobState=PENDING Reason=AssocMaxJobsLimit Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:00:00 TimeLimit=5-00:00:00 TimeMin=N/A
SubmitTime=2019-04-11T17:49:49 EligibleTime=2019-04-11T17:49:49
StartTime=Unknown EndTime=Unknown Deadline=N/A
PreemptTime=None SuspendTime=None SecsPreSuspend=0
LastSchedEval=2019-04-22T08:46:32
Partition=computeq AllocNode:Sid=log002:91002
ReqNodeList=(null) ExcNodeList=(null)
NodeList=(null)
NumNodes=1 NumCPUs=8 NumTasks=1 CPUs/Task=8 ReqB:S:C:T=0:0:*:*
TRES=cpu=8,mem=2400M,node=1
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=8 MinMemoryCPU=300M MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
Gres=(null) Reservation=(null)
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/home/jspngler/nanoPatch/test/2N_D10_vesicle/U_-0.7/lipo2Nano_-0.7_21500_2.0944.sh --job-name=5NM_-0.7_21500_2.0944 -D /home/jspngler/nanoPatch/test/2N_D10_vesicle/U_-0.7
WorkDir=/home/jspngler/nanoPatch/test/2N_D10_vesicle/U_-0.7
StdErr=/home/jspngler/nanoPatch/test/2N_D10_vesicle/U_-0.7/MD-186120-4294967294.err
StdIn=/dev/null
StdOut=/home/jspngler/nanoPatch/test/2N_D10_vesicle/U_-0.7/MD-186120-4294967294.out
Power=
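
In the two listings above, the mem value in TRES drops from 3200M to 2400M after the update. That is consistent with the total being the memory per CPU multiplied by the number of CPUs (8 here). A minimal sketch of the arithmetic, using a hypothetical helper total_mem_mb:

```shell
# Hypothetical helper: total memory in MB = memory per CPU (MB) x number of CPUs.
total_mem_mb() {
    echo $(( $1 * $2 ))
}

total_mem_mb 400 8    # before the update: prints 3200
total_mem_mb 300 8    # after the update:  prints 2400
```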