Fastest Table Sort in the West - Redesigning DuckDB’s Sort - Laurens Kuiper
Bottlenecks:
- Random memory access
- Branch mispredictions
Mitigations:
- Row-wise comparison
- Key Normalization
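Key normalization can be illustrated with a minimal Python sketch. It assumes a two-column sort key (a signed 32-bit integer, then a string truncated to a fixed 8-byte prefix) and ignores NULLs and collations. The helper names are hypothetical, not DuckDB's API. Each column is encoded so that plain unsigned byte comparison gives the same order as comparing the original values. The encodings are then concatenated, so a whole row can be ordered with one memcmp-style comparison instead of a per-column, type-dispatched comparison.

```python
import struct


def normalize_int32(value: int, descending: bool = False) -> bytes:
    # Big-endian puts the most significant byte first. Flipping the sign bit
    # makes negative numbers sort before positive ones under unsigned byte order.
    encoded = struct.pack(">I", (value & 0xFFFFFFFF) ^ 0x80000000)
    if descending:
        # Inverting every byte reverses the byte-wise order.
        encoded = bytes(b ^ 0xFF for b in encoded)
    return encoded


def normalize_str(value: str, width: int = 8) -> bytes:
    # Fixed-width prefix: truncate, then pad with zero bytes. Rows that tie on
    # the prefix would need a fallback comparison on the full string (not shown).
    raw = value.encode("utf-8")[:width]
    return raw + b"\x00" * (width - len(raw))


def normalize_row(row):
    # Concatenate per-column keys so a single byte comparison orders the row.
    number, text = row
    return normalize_int32(number) + normalize_str(text)


rows = [(3, "b"), (-5, "z"), (3, "a"), (0, "m")]
print(sorted(rows, key=normalize_row))
```

Because the normalized keys are plain `bytes`, the comparison no longer needs to know the column types at all, which is what makes a single tight comparison loop (and radix sort) possible.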
Key implementation points:
- Key Normalization
- Row compare vs. column compare
- Choice of algorithm:
  - RadixSort
  - QuickSort
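Once keys are normalized to fixed-width byte strings, both algorithms in the list can operate on them directly. The sketch below assumes all keys have the same length. `RADIX_MAX_WIDTH` and the rule "radix sort for narrow keys, quicksort otherwise" are hypothetical and only stand in for whatever selection policy DuckDB actually uses. The quicksort is a simple out-of-place version chosen for readability.

```python
RADIX_MAX_WIDTH = 8  # hypothetical cut-over point, not DuckDB's actual value


def lsd_radix_sort(keys):
    """Sort equal-length byte strings with least-significant-digit radix sort."""
    if not keys:
        return []
    width = len(keys[0])
    out = list(keys)
    for pos in range(width - 1, -1, -1):
        # Stable bucket (counting) pass on byte `pos`; stability preserves the
        # order established by the passes over less significant bytes.
        buckets = [[] for _ in range(256)]
        for key in out:
            buckets[key[pos]].append(key)
        out = [key for bucket in buckets for key in bucket]
    return out


def quicksort(keys):
    """Out-of-place quicksort using byte-wise comparison of the keys."""
    if len(keys) <= 1:
        return list(keys)
    pivot = keys[len(keys) // 2]
    less = [k for k in keys if k < pivot]
    equal = [k for k in keys if k == pivot]
    greater = [k for k in keys if k > pivot]
    return quicksort(less) + equal + quicksort(greater)


def sort_normalized(keys):
    # Narrow keys: few radix passes. Wide keys: fall back to comparisons.
    if keys and len(keys[0]) <= RADIX_MAX_WIDTH:
        return lsd_radix_sort(keys)
    return quicksort(keys)
```

The trade-off it illustrates: LSD radix sort never compares keys but makes one pass over the data per key byte, so it is most attractive for short keys, while wide keys favor a comparison sort whose byte comparisons can stop at the first differing byte.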
Related topics:
- Parallelized using Merge Path
- Pointer swizzling: https://en.wikipedia.org/wiki/Pointer_swizzling
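Merge Path splits the merge of two sorted runs into independent, equally sized pieces. For each output position (a "diagonal" d), a binary search finds how many elements i come from the first run and j = d - i from the second; after that, every worker merges its own slice with no coordination. A minimal sketch, assuming in-memory Python lists and ties broken in favor of the first run. The function names are hypothetical, and the Python threads only demonstrate the partitioning (the GIL prevents a real speedup).

```python
from concurrent.futures import ThreadPoolExecutor


def merge_path_split(a, b, diag):
    """Return (i, j), i + j == diag: how many of the first `diag` merged
    elements come from `a` and from `b` (ties taken from `a` first)."""
    lo, hi = max(0, diag - len(b)), min(diag, len(a))
    while lo < hi:
        i = (lo + hi) // 2
        j = diag - i
        if a[i] <= b[j - 1]:
            # a[i] belongs before b[j-1], so the split needs more of `a`.
            lo = i + 1
        else:
            hi = i
    return lo, diag - lo


def merge(a, b):
    """Sequential stable merge; ties take the element from `a` first."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def parallel_merge(a, b, workers=4):
    total = len(a) + len(b)
    # Equally spaced diagonals give each worker the same amount of output.
    diags = [total * k // workers for k in range(workers + 1)]
    splits = [merge_path_split(a, b, d) for d in diags]

    def merge_part(k):
        (i0, j0), (i1, j1) = splits[k], splits[k + 1]
        return merge(a[i0:i1], b[j0:j1])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(merge_part, range(workers)))
    return [x for part in parts for x in part]
```

Because each slice is fixed before any merging starts, the work per thread is balanced by construction, regardless of how the values are distributed between the two runs.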
Push-Based Execution in DuckDB - Mark Raasveldt
Drawbacks of the pull-based pipeline model:
- Load imbalance
- Plan explosion
- Added materialization costs
Morsel-Driven Parallelism:
- Operators are parallelism-aware, so their degree of concurrency can be controlled
- A query can be split into pipelines
- Pipelines can be executed in parallel
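The morsel-driven idea can be sketched with a shared queue of small input ranges ("morsels"). Worker threads repeatedly grab the next morsel and run the whole pipeline on it (here a hypothetical filter followed by a sum), so a worker that finishes early simply takes more work; this is what counters load imbalance. Each worker keeps thread-local partial state that is combined at the end. `MORSEL_SIZE` and the function name are assumptions for illustration; DuckDB's actual scheduler is a C++ task system and is not reproduced here.

```python
import queue
import threading

MORSEL_SIZE = 1000  # hypothetical morsel size (rows per unit of work)


def morsel_driven_sum(data, num_workers=4, morsel_size=MORSEL_SIZE):
    """Sum the even values of `data` using workers that pull morsels."""
    morsels = queue.Queue()
    for start in range(0, len(data), morsel_size):
        morsels.put((start, min(start + morsel_size, len(data))))
    partials = [0] * num_workers  # one slot per worker: no shared counter

    def worker(worker_id):
        local = 0
        while True:
            try:
                start, end = morsels.get_nowait()
            except queue.Empty:
                break  # all morsels were enqueued up front, so empty means done
            # Pipeline for one morsel: filter (even values), then aggregate.
            local += sum(x for x in data[start:end] if x % 2 == 0)
        partials[worker_id] = local

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Combine the thread-local partial aggregates (the pipeline's sink step).
    return sum(partials)
```

The number of workers is independent of how the input is split, which is what lets the scheduler control the degree of parallelism per pipeline.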
Moving to a push-based pipeline model:
- A better way to implement UNION
- Right/Full outer join
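A toy push-based executor shows why these two items become simpler: a source drives execution and pushes chunks through operators into a sink. This is a sketch under stated assumptions: all class and function names are hypothetical, chunks are Python lists of tuples, and the join key is column 0. UNION ALL is expressed as two pipelines with different sources that push into the same sink. A full outer join emits unmatched build-side rows from an extra source that must run only after every probe pipeline has finished.

```python
from typing import Callable, Dict, Iterable, List

Chunk = List[tuple]
Operator = Callable[[Chunk], Chunk]


class CollectSink:
    """Pipeline sink: accumulates every chunk pushed into it."""

    def __init__(self):
        self.rows: List[tuple] = []

    def sink(self, chunk: Chunk) -> None:
        self.rows.extend(chunk)


def run_pipeline(source: Iterable[Chunk], operators: List[Operator], sink) -> None:
    # Push model: the source loop drives execution and pushes each chunk
    # through the operator chain into the sink.
    for chunk in source:
        for op in operators:
            chunk = op(chunk)
        sink.sink(chunk)


class FullOuterHashJoin:
    """Simplified hash join on column 0 with full outer semantics."""

    def __init__(self, build_rows, probe_width: int, build_width: int):
        self.build = list(build_rows)
        self.probe_width, self.build_width = probe_width, build_width
        self.matched = [False] * len(self.build)
        self.table: Dict[object, List[int]] = {}
        for idx, row in enumerate(self.build):
            self.table.setdefault(row[0], []).append(idx)

    def probe(self, chunk: Chunk) -> Chunk:
        out = []
        for row in chunk:
            hits = self.table.get(row[0], [])
            for idx in hits:
                self.matched[idx] = True  # remember for the outer-join pass
                out.append(row + self.build[idx])
            if not hits:
                out.append(row + (None,) * self.build_width)
        return out

    def unmatched_source(self):
        # Run after all probe pipelines: build rows that never matched,
        # padded with None (NULL) on the probe side.
        yield [(None,) * self.probe_width + row
               for idx, row in enumerate(self.build) if not self.matched[idx]]


# UNION ALL: two sources, same operators, same sink.
union_sink = CollectSink()
double = lambda chunk: [(r[0] * 2,) for r in chunk]
run_pipeline(iter([[(1,), (2,)]]), [double], union_sink)
run_pipeline(iter([[(3,)]]), [double], union_sink)

# Full outer join: probe pipeline first, then the unmatched-rows pipeline.
join = FullOuterHashJoin([(1, "a"), (3, "c")], probe_width=2, build_width=2)
join_sink = CollectSink()
run_pipeline(iter([[(1, "x"), (2, "y")]]), [join.probe], join_sink)
run_pipeline(join.unmatched_source(), [], join_sink)
```

The ordering between the last two `run_pipeline` calls is the dependency a push-based scheduler has to enforce: the unmatched-rows pipeline is only correct once all probing is done.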
Future Work
- Hybrid Async IO
- Hybrid Early/Late Materialization
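This post looks at the optimizations in DuckDB's sort redesign, focusing on memory access efficiency and branch prediction, and on how row-wise comparison and Key Normalization improve performance. It covers the use of RadixSort and QuickSort and, from the second talk, Morsel-Driven Parallelism and the move to a push-based pipeline model. Future work includes Hybrid Async IO and Hybrid Early/Late Materialization.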