SQL_优化/问题_column reference should appear in group by key-优快云博客

本文链接：https://blog.youkuaiyun.com/weixin_43875878/article/details/145879100

1 优化

1.1 SQL优化

(1) 数据读取和筛选

①分区:筛选并优先筛选分区,区分度大的字段先过滤

②列裁剪:读取操作(select、where、join、group by、sort by等),不读取不需要的列,减少IO消耗.

③尽量不在筛选字段上使用函数和表达式.

(2) 分组聚合

①使用窗口函数时,排序键区分度大的放在前面.

②数据量大时,用group by替代distinct.

(3) 关联

①关联前预处理数据:筛选不必要的数据;处理用于关联的字段.

②能减少数据量的表先关联,关联键区分度大的先执行.

③减少不必要的关联字段.

④拆分及合并一些join操作.

(4) 索引

①不在索引列使用函数和表达式.

②不做不等值判断(<>)和模糊匹配(like).

(5) 整体

①将复杂SQL分解为多个简单SQL.

②将重复计算的结果存储在变量或临时表中.

1.2 表结构优化

(1) 字段

①减少单表字段数.

②整数尽量用短的类型.

③可用枚举或者整数代替字符串类型.

(2) 索引

合理设置索引.

1.3 参数优化

1.4 资源优化

①调整内存参数:如缓冲池大小、缓存大小等.

②调整连接池:优化数据库连接池的大小和配置.

1.5 综合优化

1.5.1 数据倾斜

(1) 通用

①过滤空值.

②单独处理热点数据.

③调整分区策略.

④在join列或分组列上添加索引.

(2) join 数据倾斜

使用随机分片

with large_tb2 as (
    select
         ...
		,concat(case floor(rand() * 10) 
						when 0 then 'a1'
						when 1 then 'b1'
						when 2 then 'c1'
						when 3 then 'd1'
						when 4 then 'e1'
						when 5 then 'f1'
						when 6 then 'g1'
						when 7 then 'h1'
						when 8 then 'i1'
						when 9 then 'j1'
					end
		,'_',join_key)  as prefixed_join_key 	   
    from large_tb
)
,prefix_helper as (
    select 'a1' as n union all
    select 'b1' as n union all
    select 'c1' as n union all
    select 'd1' as n union all
    select 'e1' as n union all
    select 'f1' as n union all
    select 'g1' as n union all
    select 'h1' as n union all
    select 'i1' as n union all
    select 'j1' as n    
)
,smalll_tb2 as (
    select
         ...
        ,concat(cast(prefix_helper.n as string), '_',join_key) as prefixed_join_key
    from smalll_tb
    cross join prefix_helper
)
select
    ...
from large_tb2 
inner join smalll_tb2 
    on large_tb2.prefixed_join_key = smalll_tb2.prefixed_join_key

(2) 聚合数据倾斜

①group by 的字段的值有不同数据类型:将数据类型调整一致.

②group by使用随机分片

select group_key, sum(count) as total_count
from (
    select group_key, count(*) as count
    from (
        select group_key, concat(group_key, '_', floor(rand() * 10)) as new_group_key
        from example_table
    ) as subquery
    group by new_group_key
) as final_result
group by group_key;

③count distinct 时,数据量非常大:使用 sum ... group by 代替.

1.6 高级优化

1.6.1 数据分片(分库分表)

(1) 垂直分片

根据数据库中表的相关性进行拆分

①专库专用:按照业务将表进行归类,分布到不同的数据库或表中.

②大表拆分:按常用字段和非常用字段,按列拆分到不同表中,表之间用主键关联.

(2) 水平分片:根据分片策略,将数据按行分散到不同的表或库.分片策略有范围分片、时间分片、字段冷热拆分.

1.6.2 读写分离

一台数据库服务器提供增删改业务,一台数据库服务器提供查询业务.

1.6.3 数据归档

将历史数据归档到其他实例

1.6.4 使用缓存

①MyBatis针对SQL语句做缓存;

②针对web页面做缓存,客户端做缓存.

1.7 针对性优化

Hive_优化/问题-优快云博客文章浏览阅读2次。①除了需要导入数据表使用textfile外,表的存储格式使用orc 和 parquet.https://blog.youkuaiyun.com/weixin_43875878/article/details/145882724

2 问题

2.1 SQL

(1) group by

Semantic analysis exception - column reference 表名.字段名 should appear in GROUP BY key
原因:查询字段不是分组列且未聚合

(2) union

1 type mismatch for UNION, left has 2 columns while right has 1 columns

原因:union 两边的字段数量不一样.


2 Illegal union operation - type mismatch for column 0 of UNION, left is BIGINT while right is STRING
原因:union 两边对应字段的数据类型不一致.

(3) insert

1 wrong columns count 1 in data source, requires 23 columns (includes dynamic partitions if any)
原因:insert into/overwrite 后的结果集的字段数与表的字段数不一致.