The SKEW algorithm for handling data skew in parallel sort-merge join -- excerpted from J. L. Wolf, D. M. Dias, and P. S. Yu. A parallel sort merge join alg

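This post presents SKEW, a task scheduling algorithm that creates tasks and assigns them to multiple processors so as to minimize the time needed to finish all of them (that is, to minimize the makespan). SKEW takes as input the number of processors and the sorted runs held at each processor, accounts for the type and multiplicity of each task, and refines the task partition iteratively until the resulting schedule is close to optimal.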

Procedure: SKEW

Input: Number of processors P, 2P sets of sorted runs, {a_i,p,r | i = 1, ..., CARD_p,r}, one for each processor p ∈ {1, ..., P} and each relation r ∈ {1, 2}, where CARD_p,r is the cardinality of the sorted run of relation r at processor p, and a_i,p,r is the ith tuple in this sorted run.

Output: The creation of tasks and a heuristic assignment of those tasks to the processors which approximately minimizes the makespan.

Set the number of tasks N = 1.

Set the top and bottom of the first task to be TOP_N,p,r = 1 and BOT_N,p,r = CARD_p,r for each processor p = 1, ..., P and each relation r = 1, 2.

Determine the type (1 or 2) of the first task.

Do forever

Determine the optimal multiplicities MULT_n of each type 2 task n ∈ {1, ..., N}. (Set MULT_n = 1 for each type 1 task n ∈ {1, ..., N}.) Compute the total number of tasks to be NN = ∑_{n=1}^{N} MULT_n. Compute the task times {TIME_n^MULT_n | n = 1, ..., N}.

If NN >= P then apply LPT.

If [solution is unacceptable] then begin

Apply GM to find the median element μ^(η) for the region {TOP_n,p,r, ..., BOT_n,p,r | p = 1, ..., P, r = 1, 2} corresponding to the largest type 1 task n.

The median element corresponds to a type 2 task with region {TOP_n,p,r², ..., BOT_n,p,r² | p = 1, ..., P, r = 1, 2}.

Relabel this new type 2 task as task number n.

Determine its optimal multiplicity MULT_n and task time TIME_n^MULT_n.

There also exist (1 or) 2 tasks, most likely of type 1, corresponding to regions {TOP_n,p,r¹, ..., BOT_n,p,r¹ | p = 1, ..., P, r = 1, 2} and {TOP_n,p,r³, ..., BOT_n,p,r³ | p = 1, ..., P, r = 1, 2}. Increment N (by 1 or 2) to add these tasks and their optimal multiplicities and task times.

Sort the tasks in order of decreasing task times, so that n₁ <= n₂ implies TIME_n₁^MULT_n₁ >= TIME_n₂^MULT_n₂.

End

Else halt with solution from final LPT.

End do

End SKEW
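
Two steps of the procedure above can be sketched in Python. The first function is the standard LPT (longest processing time first) heuristic that SKEW applies to the task times. The second splits one sorted run of a region around a given median value μ into the three sub-regions the procedure labels 1, 2 and 3. This is only an illustrative sketch under stated assumptions: the names `lpt` and `split_region` are hypothetical, the GM median search itself is not implemented (μ is passed in), and treating region 2 as exactly the tuples equal to μ is an assumption about what a type 2 task covers, not something the excerpt states.

```python
import heapq
from bisect import bisect_left, bisect_right


def lpt(task_times, num_procs):
    """Longest Processing Time first: hand each task, longest first,
    to the processor that currently has the least total work."""
    heap = [(0.0, p) for p in range(num_procs)]  # (load, processor id)
    assignment = [[] for _ in range(num_procs)]
    # Visit task indices in order of decreasing task time.
    for n in sorted(range(len(task_times)), key=lambda i: -task_times[i]):
        load, p = heapq.heappop(heap)            # least-loaded processor
        assignment[p].append(n)
        heapq.heappush(heap, (load + task_times[n], p))
    makespan = max(load for load, _ in heap)
    return makespan, assignment


def split_region(run, top, bot, mu):
    """Split the 1-indexed region run[top..bot] of one sorted run into
    (below mu, equal to mu, above mu). Each part is a (TOP, BOT) pair;
    a part is empty when TOP > BOT.
    Assumption: region 2 holds exactly the tuples equal to mu."""
    lo = bisect_left(run, mu, top - 1, bot)   # 0-based index of first value >= mu
    hi = bisect_right(run, mu, top - 1, bot)  # 0-based index of first value > mu
    return (top, lo), (lo + 1, hi), (hi + 1, bot)


if __name__ == "__main__":
    makespan, assignment = lpt([7, 5, 4, 3, 3, 2], num_procs=3)
    print(makespan, assignment)
    print(split_region([1, 2, 5, 5, 5, 8, 9], top=1, bot=7, mu=5))
```

In a full implementation `split_region` would be called once per processor p and relation r, so that the three results give TOP¹/BOT¹, TOP²/BOT² and TOP³/BOT³ for that run. The excerpt leaves the "[solution is unacceptable]" test unspecified; one possible choice (an assumption, not taken from the source) is to compare the LPT makespan against the perfectly balanced bound `sum(task_times) / num_procs`.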