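While reading someone else's blog today I came across STREAMTABLE. The author describes it like this:
Put the large table on the right side of the JOIN; when the large table is not the rightmost one, you need to name it with the /*+ STREAMTABLE(..) */ hint: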
- hive> SELECT /*+ STREAMTABLE(b) */ a.val, b.val, c.val FROM a JOIN b
- > ON (a.key = b.key1) JOIN c ON (c.key = b.key1)
I was a bit lost at first, and it only clicked after I read another person's explanation:

From my understanding, when the join happens in the map or reduce phase, the values corresponding to a key from all tables except one (if two tables are joined on the same key, then just one table here) are buffered in memory, and the remaining one is streamed. Usually the largest table should be the one that is streamed; otherwise the larger data goes into the memory buffer and can cause OOM errors.

The STREAMTABLE hint is used to specify which table is streamed. By default, the table on the right is streamed and the others are buffered. If you want a table other than the rightmost one to be streamed, use this hint.

If you are joining more tables on different keys, then for every join set just put the larger table on the right of the ON condition. No STREAMTABLE hint is needed there.
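To make the buffering-versus-streaming idea concrete, here is a minimal Python sketch of a join on one shared key. It is only a toy model of the behaviour described above, not Hive's actual implementation; the function name `reduce_side_join`, the `peak_buffered` metric, and the sample tables are all made up for illustration.

```python
from collections import defaultdict
from itertools import product


def reduce_side_join(tables, streamed):
    """Toy inner join on a shared key (hypothetical helper, not a Hive API).

    tables:   dict mapping table name -> list of (key, value) rows
    streamed: name of the table whose rows are read one at a time
    Returns (joined_rows, peak_buffered), where peak_buffered is the largest
    number of rows held in memory for any single join key.
    """
    buffered_names = [name for name in tables if name != streamed]

    # Every table except the streamed one is held in memory, grouped by key.
    buffers = {name: defaultdict(list) for name in buffered_names}
    for name in buffered_names:
        for key, value in tables[name]:
            buffers[name][key].append(value)

    # Largest number of buffered rows that share one key.
    keys = set()
    for name in buffered_names:
        keys.update(buffers[name])
    peak_buffered = max(
        (sum(len(buffers[n].get(k, ())) for n in buffered_names) for k in keys),
        default=0,
    )

    # Rows of the streamed table pass through one by one and are never stored.
    joined = []
    for key, value in tables[streamed]:
        per_table = [buffers[n].get(key, []) for n in buffered_names]
        for combo in product(*per_table):
            row = dict(zip(buffered_names, combo))
            row[streamed] = value
            joined.append((key, row))
    return joined, peak_buffered


# Assumed sample data: b is the large table, a and c are small.
tables = {
    "a": [(1, "a0")],
    "b": [(1, f"b{i}") for i in range(1000)],
    "c": [(1, "c0")],
}

# Default behaviour: the rightmost table (c) is streamed, so b is buffered.
rows_default, peak_default = reduce_side_join(tables, streamed="c")
# With the equivalent of STREAMTABLE(b): b is streamed, a and c are buffered.
rows_hinted, peak_hinted = reduce_side_join(tables, streamed="b")

print(len(rows_default), len(rows_hinted))  # 1000 1000
print(peak_default, peak_hinted)            # 1001 2
```

Both runs produce the same 1000 joined rows; only the number of rows that must sit in memory changes, which is why the large table is the one that should be streamed.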
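This post covers how to use the STREAMTABLE hint in Hive and why it matters. When joining tables, designating the larger table as the streamed table helps avoid out-of-memory errors. It explains how to choose the streamed table and what to do when a query joins several tables.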




