SLURM Scheduler

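This article covers several job scheduling policies in the SLURM cluster management system, including the built-in priority scheduler, backfill scheduling, and gang scheduling, and explains how each policy works and how it is configured.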


1. src/plugins/sched/
built-in: initiates jobs strictly in priority order, typically first-in-first-out (FIFO).
backfill: initiates a lower-priority job if doing so does not delay the expected start time of any higher-priority job, essentially using smaller jobs to fill holes in the resource allocation plan. Effective backfill scheduling requires users to specify job time limits.
gang: time-slices jobs in the same partition/queue, and can be used to preempt jobs from lower-priority queues in order to run jobs in higher-priority queues.
wiki: an interface for use with the Maui Scheduler.
wiki2: an interface for use with the Moab Cluster Suite.
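The backfill rule above can be sketched with a toy model. This is not Slurm code: `toy_running`, `toy_job`, `toy_expected_start` and `toy_can_backfill` are hypothetical names, the cluster is treated as a single pool of identical nodes, and the check is a deliberately conservative one (the candidate must finish before the top job's expected start). It mainly shows why time limits matter: without them the scheduler cannot tell whether a small job would finish in time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; none of these types exist in Slurm. */
struct toy_running {
    int nodes;      /* nodes held by a running job */
    int end_time;   /* minutes from now at which they are released */
};

struct toy_job {
    int nodes;      /* nodes requested */
    int time_limit; /* user-supplied limit in minutes */
};

/*
 * Expected start time (minutes from now) of the highest-priority pending
 * job, which needs `need` nodes. `run` must be sorted by end_time.
 * Returns -1 if the job can never fit in this toy model.
 */
static int toy_expected_start(int free_now, const struct toy_running *run,
                              size_t n_run, int need)
{
    assert(need > 0);
    int avail = free_now;

    if (avail >= need)
        return 0;
    for (size_t i = 0; i < n_run; i++) {
        avail += run[i].nodes;
        if (avail >= need)
            return run[i].end_time;
    }
    return -1;
}

/*
 * Conservative backfill test: the candidate must fit in the nodes that are
 * free right now AND be finished (by its time limit) before the top job's
 * expected start, so the top job is not delayed.
 */
static bool toy_can_backfill(const struct toy_job *cand, int free_now,
                             int top_start)
{
    assert(cand->time_limit > 0);
    if (cand->nodes > free_now)
        return false;
    return cand->time_limit <= top_start;
}
```

With 2 free nodes and a running job that releases 4 nodes at minute 30, a 6-node top job is expected to start at minute 30. A 2-node candidate with a 20-minute limit can be backfilled, while the same candidate with a 45-minute limit cannot.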


2. slurm.conf 
# SCHEDULING 
FastSchedule=0
SchedulerType=sched/backfill
#SchedulerPort=7321 
SelectType=select/cons_res




3. slurmctld/job_scheduler.c


/*
  * build_job_queue - build (non-priority ordered) list of pending jobs
  * IN clear_start - if set then clear the start_time for pending jobs
  * RET the job queue
  * NOTE: the caller must call list_destroy() on RET value to free memory
  */
extern List build_job_queue(bool clear_start);


/*
  * schedule - attempt to schedule all pending jobs
  * pending jobs for each partition will be scheduled in priority
  * order until a request fails
  * IN job_limit - maximum number of jobs to test now, avoid testing the full
  *  queue on every job submit (0 means to use the system default,
  *  SchedulerParameters for default_queue_depth)
  * RET count of jobs scheduled
  * Note: We re-build the queue every time. Jobs can not only be added
  * or removed from the queue, but have their priority or partition
  * changed with the update_job RPC. In general nodes will be in priority
  * order (by submit time), so the sorting should be pretty fast.
  */
extern int schedule(uint32_t job_limit);
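The comment on schedule() describes three behaviours: the queue is re-sorted on every call, jobs in each partition are tried in priority order until one request fails, and job_limit caps how many jobs are tested. The following is a minimal sketch of that control flow under simplifying assumptions, not the actual implementation. `toy_pending`, `toy_schedule`, `TOY_MAX_PARTS` and `TOY_DEFAULT_QUEUE_DEPTH` are hypothetical, each partition is modelled as a separate counter of free nodes, and the default depth is a stand-in for the value Slurm would take from SchedulerParameters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_MAX_PARTS 8
#define TOY_DEFAULT_QUEUE_DEPTH 100u /* stand-in for default_queue_depth */

/* Toy pending-job record; not Slurm's struct job_record. */
struct toy_pending {
    uint32_t job_id;
    uint32_t priority; /* larger value is scheduled first */
    int part;          /* partition index, 0..TOY_MAX_PARTS-1 */
    int nodes;         /* nodes requested */
    bool started;
};

/* Sort by descending priority; ties go to the lower (older) job id. */
static int by_priority_desc(const void *a, const void *b)
{
    const struct toy_pending *x = a, *y = b;

    if (x->priority != y->priority)
        return (x->priority < y->priority) ? 1 : -1;
    return (x->job_id > y->job_id) - (x->job_id < y->job_id);
}

/*
 * Try to start pending jobs. Returns the count of jobs started.
 * job_limit == 0 means "use the toy default depth".
 */
static int toy_schedule(struct toy_pending *queue, size_t n,
                        int free_nodes[TOY_MAX_PARTS], uint32_t job_limit)
{
    bool part_blocked[TOY_MAX_PARTS] = { false };
    uint32_t tested = 0;
    int started = 0;

    if (job_limit == 0)
        job_limit = TOY_DEFAULT_QUEUE_DEPTH;

    /* The queue is re-sorted on every call, as priorities may change. */
    qsort(queue, n, sizeof(queue[0]), by_priority_desc);

    for (size_t i = 0; i < n; i++) {
        struct toy_pending *job = &queue[i];

        assert(job->part >= 0 && job->part < TOY_MAX_PARTS);
        if (tested >= job_limit)
            break;              /* avoid testing the full queue */
        if (part_blocked[job->part])
            continue;           /* an earlier request in this partition failed */
        tested++;
        if (job->nodes <= free_nodes[job->part]) {
            free_nodes[job->part] -= job->nodes;
            job->started = true;
            started++;
        } else {
            part_blocked[job->part] = true;
        }
    }
    return started;
}
```

Note that once a job in a partition fails to start, lower-priority jobs in that partition are skipped even if they would fit. That strictness is what a backfill scheduler relaxes.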


/* Determine if a pending job will run using only the specified nodes
  * (in job_desc_msg->req_nodes), build response message and return
  * SLURM_SUCCESS on success. Otherwise return an error code. Caller
  * must free response message */
extern int job_start_data(job_desc_msg_t *job_desc_msg, will_run_response_msg_t **resp);


/*
  * launch_job - send an RPC to a slurmd to initiate a batch job
  * IN job_ptr - pointer to job that will be initiated
  */
extern void launch_job(struct job_record *job_ptr);