SLURM Scheduler

amaowolf

License: CC 4.0 BY-SA

Category: Cluster

Article link: https://blog.youkuaiyun.com/amaowolf/article/details/8127008

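This article introduces several job scheduling strategies in the SLURM cluster management system, including the built-in priority scheduler, backfill scheduling, and gang scheduling, and explains how these strategies work and how they are configured.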

1. src/plugins/sched/
builtin: initiates jobs strictly in priority order, which is typically first-in, first-out (FIFO).
backfill: initiates a lower-priority job if doing so does not delay the expected start time of any higher-priority job; in effect it uses smaller jobs to fill holes in the resource allocation plan. Effective backfill scheduling requires users to specify job time limits.
gang: time-slices jobs in the same partition/queue, and can preempt jobs from lower-priority queues in order to run jobs in higher-priority queues.
wiki: an interface for use with the Maui Scheduler.
wiki2: an interface for use with the Moab Cluster Suite.
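
The backfill rule above can be illustrated with a small sketch. This is not SLURM's implementation; it is a simplified, EASY-style check on a single pool of identical nodes, and the types running_job and pending_job and the functions shadow_time and can_backfill are hypothetical names introduced only for this example. It assumes running jobs end exactly at their time limit, which is why user-specified time limits matter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified types; these are not SLURM's job_record. */
struct running_job { int nodes; int end_time; };
struct pending_job { int nodes; int time_limit; };

/*
 * Earliest time at which `needed` nodes are free, assuming every running
 * job ends exactly at its end_time. running[] must be sorted by end_time,
 * ascending. On success *spare receives the nodes left over at that time
 * once `needed` nodes are reserved. Returns -1 if the job never fits.
 */
static int shadow_time(const struct running_job *running, size_t n_running,
                       int free_now, int now, int needed, int *spare)
{
    int free_nodes = free_now;

    if (free_nodes >= needed) {
        *spare = free_nodes - needed;
        return now;
    }
    for (size_t i = 0; i < n_running; i++) {
        free_nodes += running[i].nodes;
        if (free_nodes >= needed) {
            *spare = free_nodes - needed;
            return running[i].end_time;
        }
    }
    return -1;
}

/*
 * Return 1 if the lower-priority job `cand` may start now without delaying
 * the expected start of `top` (the highest-priority pending job), else 0.
 */
static int can_backfill(const struct running_job *running, size_t n_running,
                        int free_now, int now,
                        const struct pending_job *top,
                        const struct pending_job *cand)
{
    int spare = 0;
    int start;

    assert(top != NULL && cand != NULL);
    if (cand->nodes > free_now)
        return 0;   /* does not fit in the currently idle nodes */

    start = shadow_time(running, n_running, free_now, now, top->nodes, &spare);
    if (start < 0)
        return 1;   /* top never fits in this model, so nothing is delayed */

    /* Either cand is finished before top's reserved start time, or it only
     * uses nodes that top does not need at that time. */
    return (now + cand->time_limit <= start) || (cand->nodes <= spare);
}
```

For example, with 4 idle nodes, one running job holding 6 nodes until t=100, and a top-priority job needing 8 nodes, a 4-node job with a time limit of 50 can be backfilled, while a 4-node job with a time limit of 200 cannot, because it would still hold nodes the top job needs at t=100.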

2. slurm.conf
# SCHEDULING
FastSchedule=0
SchedulerType=sched/backfill
#SchedulerPort=7321
SelectType=select/cons_res

3. slurmctld/job_scheduler.c

/*
 * build_job_queue - build (non-priority ordered) list of pending jobs
 * IN clear_start - if set then clear the start_time for pending jobs
 * RET the job queue
 * NOTE: the caller must call list_destroy() on RET value to free memory
 */
extern List build_job_queue(bool clear_start);

/*
 * schedule - attempt to schedule all pending jobs
 *      pending jobs for each partition will be scheduled in priority
 *      order until a request fails
 * IN job_limit - maximum number of jobs to test now, avoid testing the full
 *                queue on every job submit (0 means to use the system default,
 *                SchedulerParameters for default_queue_depth)
 * RET count of jobs scheduled
 * Note: We re-build the queue every time. Jobs can not only be added
 *      or removed from the queue, but have their priority or partition
 *      changed with the update_job RPC. In general nodes will be in priority
 *      order (by submit time), so the sorting should be pretty fast.
 */
extern int schedule(uint32_t job_limit);
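
The behavior described in the comment above (priority order per partition, stop at the first request that fails, test at most job_limit jobs) can be sketched as follows. This is a toy model under stated assumptions, not the body of SLURM's schedule(): toy_job, toy_schedule, MAX_PARTITIONS, and DEFAULT_QUEUE_DEPTH are hypothetical names, a partition is modeled as a plain count of free nodes, and the value 100 is only an assumed stand-in for default_queue_depth.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_PARTITIONS 8
#define DEFAULT_QUEUE_DEPTH 100 /* assumed stand-in for default_queue_depth */

/* Hypothetical, simplified job; this is not SLURM's job_record. */
struct toy_job {
    uint32_t priority;
    int partition;  /* index into free_nodes[], 0..MAX_PARTITIONS-1 */
    int nodes;      /* nodes requested */
    int started;    /* set to 1 once the job has been started */
};

/* qsort comparator: higher priority first. */
static int by_priority_desc(const void *a, const void *b)
{
    const struct toy_job *x = a;
    const struct toy_job *y = b;

    if (x->priority == y->priority)
        return 0;
    return (x->priority < y->priority) ? 1 : -1;
}

/*
 * Re-sort the queue, then walk it in priority order. Once a job in a
 * partition cannot start, no lower-priority job in that partition is
 * tested. Returns the number of jobs started.
 */
static int toy_schedule(struct toy_job *queue, size_t n_jobs,
                        int *free_nodes, uint32_t job_limit)
{
    int failed[MAX_PARTITIONS] = {0};
    int started = 0;
    uint32_t tested = 0;

    if (job_limit == 0)
        job_limit = DEFAULT_QUEUE_DEPTH;
    qsort(queue, n_jobs, sizeof(queue[0]), by_priority_desc);

    for (size_t i = 0; i < n_jobs && tested < job_limit; i++) {
        struct toy_job *job = &queue[i];

        assert(job->partition >= 0 && job->partition < MAX_PARTITIONS);
        if (job->started || failed[job->partition])
            continue;
        tested++;
        if (job->nodes <= free_nodes[job->partition]) {
            free_nodes[job->partition] -= job->nodes;
            job->started = 1;
            started++;
        } else {
            failed[job->partition] = 1; /* block lower-priority jobs here */
        }
    }
    return started;
}
```

Note that in this model a small job stays pending behind a blocked higher-priority job in the same partition even when it would fit; that gap is what a backfill pass is meant to fill.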

/* Determine if a pending job will run using only the specified nodes
 * (in job_desc_msg->req_nodes), build response message and return
 * SLURM_SUCCESS on success. Otherwise return an error code. Caller
 * must free response message */
extern int job_start_data(job_desc_msg_t *job_desc_msg, will_run_response_msg_t **resp);

/*
 * launch_job - send an RPC to a slurmd to initiate a batch job
 * IN job_ptr - pointer to job that will be initiated
 */
extern void launch_job(struct job_record *job_ptr);