SQL常见考点及万能模板

邪恶马铃薯

已于 2024-09-13 17:36:43 修改

阅读量1.3k

点赞数 25

CC 4.0 BY-SA版权

分类专栏： MySQL知识点文章标签： sql 数据库数据分析

于 2024-09-07 15:39:16 首次发布

本文链接：https://blog.youkuaiyun.com/sherrinzhang/article/details/141996445

MySQL知识点专栏收录该内容

6 篇文章

订阅专栏

笔记来自书籍《SQL面试宝典：图解数据库求职题》

1. 简单查询

1.1 基本查询语句如何写？

select 列名,列名,……
from 表名
where 列名 = 条件;

1.2 数据清洗常用的SQL语句如何写？

例如查找空值（缺失值）、重命名列名、去掉重复数据等。

（1）查找空值/非空值

查找空值

select 列名,列名,……
from 表名
where 列名 is null;

查找非空值

select 列名,列名,……
from 表名
where 列名 is not null;

（2）列名重命名

select distinct 列名,列名,……

（3）去掉重复数据

select 列名 as 新列名,列名 as 新列名,……
from 表名
where 列名 = 条件;

1.3 SQL语句的书写规则是什么？

（1）以英文分号;结尾
（2）不区分英文语句的大小写，使用英文字符

2. 汇总分析

常用在如分析每个部门的销售业绩。

2.1 汇总函数

在这里插入图片描述

2.2 分组汇总

涉及“查找重复数据”或者类似**“每个”“每天”**这样的词汇时，要马上想到可以用分组汇总group by来实现。
分组筛选何时用where 和 having？
where子句无法与聚合函数一起用，先筛选后分组。having支持聚合后筛选。

2.3 SQL书写顺序和运行顺序

书写顺序：select—from—where—group by—having—order by—limit
运行顺序：from—where—group by—having—select—order by—limit

3. 复杂查询

3.1 子查询

（1）常见结构

【万能模板】查找字段1排名第二的字段1信息：子查询+max

select max(字段)
from 表名
where 字段 < (
    select max(字段)
    from 表名);

【案例】查找排名第二的学生成绩
如找到成绩排名第二的学生成绩，找到薪资排名第二的员工薪资

select max(成绩)
from 成绩表
where 成绩 < (
    select max(成绩)
    from 成绩表);
    
--输出结果：90

（2） Where in+子查询

【万能模板】找出字段2满足多个条件的字段1信息：子查询+where in
比如访客表中，查找访问过A页面、B页面，但无访问过C页面的访客id

select count(distinct字段1)
from 表名
where 字段1 in （select 字段1 from 表名 where 字段2条件1）
    and 字段1 not in （select 字段1 from 表名 where 字段2条件2）;

【案例】找出买过A产品又买过B产品，但没有买过C产品的顾客人数。

select count(distinct顾客ID)
from 订单表
where 顾客ID in （select 顾客ID from 订单表 where 产品='ProductA'）
    and 顾客ID in （select 顾客ID from 订单表 where 产品='ProductB'）
    and 顾客ID not in （select 顾客ID from 订单表 where 产品='ProductC'）;

（3） all（子查询）和any（子查询）

需要和比较运算符一起用，如<、>、<>
all（子查询）表示要满足all里的所有条件；any（子查询）表示满足any里任一条件即可。

--A表：列名数字，2、3。B表：列名数字，1、2、3、4

--找出B表中比A表数据都大的数
select * from B表
where 数字 > all (select 数字 from A表)  ;
--输出：4

--找出B表中比A表任一数据大的数
select * from B表
where 数字 > any (select 数字 from A表) ;
--输出：3、4

3.2 临时表with…as

将子查询定义为临时表，可读性更强。
临时表无需删除，创建使用后立即释放，不会保存在数据库中

with
临时表名称1 as 子查询语句1,
临时表名称2 as 子查询语句2,
……

注意：with…as 不能表嵌套使用

--以下用法错误
with
a as (select 字段1,字段2 from sheet1)
b as (select 字段1 from a)

【案例】

--找出买过A产品又买过B产品，但没有买过C产品的顾客人数。
select count(distinct顾客ID)
from 订单表
where 顾客ID in （select 顾客ID from 订单表 where 产品='ProductA'）
    and 顾客ID in （select 顾客ID from 订单表 where 产品='ProductB'）
    and 顾客ID not in （select 顾客ID from 订单表 where 产品='ProductC'）;
    
    
--改成with…as语句
with
a as (select 顾客ID from 订单表 where 产品='ProductA')
b as (select 顾客ID from 订单表 where 产品='ProductB')
c as (select 顾客ID from 订单表 where 产品='ProductC')
select count(distinct 顾客ID)
from 订单表
where 顾客ID in （select * from a）
    and 顾客ID in （select * from b）
    and 顾客ID not in （select * from c）;

3.3 视图

试图是一个SQL语句，可以理解为一个虚拟表。一般用在私信信息的查看，保证数据安全。

create view 视图名称 as
select *
from 视图名称;

With…as语句和视图的区别：
（1）是否存储在数据库中。
with as是创建临时表，不储存在数据库中；视图是创建虚拟表，储存在数据库中。
（2）是否能多次调用。
With as只能使用在紧跟其后的select语句，视图可在SQL中多次调用。
（3）作用不同。
With as用于提高易读性；视图提高易读性之外，保证数据安全性。

3.4 case表达式（用于多条件判断）。

常用于多条件判断、按区间统计、行列互换等问题。

case
when 条件判断表达式1 then 值1
when 条件判断表达式2 then 值2
……
else 值n
end

【案例】

--多条件判断，判断学生成绩对应的等级
select 学号,成绩，（case when 成绩>=60 then '及格'
    when 成绩>=80 then '良好'
    else '不及格'
    end） as 等级
from 成绩表;


--区间统计，统计等级区间的学生人数
select 等级,count(*) as 学生人数
from(
    `select select 学号,成绩，（case when 成绩>=60 then '及格'
    when 成绩>=80 then '良好'
    else '不及格'
    end） as 等级
    from 成绩表) as a
group by 等级


--行列互换，一维图表变二维图表，万能模板
（用power query更快）
select A,
max(case B when 'm' then C else 0 end) as 'm'
max(case B when 'n' then C else 0 end) as 'n'
from 表
group by A;

--原表格字段包括学号、科目、成绩，转换成字段为学号、语文成绩、数学成绩
select 学号，
max(case 课程 when '语文' then '成绩' else 0 end) as 语文成绩，
max(case 课程 when '数学' then '成绩' else 0 end) as 数学成绩，
group by 学号

4. 多表查询

4.1连接的类型

在这里插入图片描述

5. 窗口函数

窗口函数可以对数据进行复杂分析

窗口函数 over (partition by 分组字段 order by 排序字段)

5.1 排名问题

排名函数
rank():考虑并列排名，序号不连续
dense_rank():考虑并列排名，序号连续——常用
row_number():不考虑并列排名
percent_rank():百分位数
【案例】

--输出每名学生不同课程的成绩排名
select *,row_number() over(partition by 学号 order by 成绩 desc) as 排名
from 成绩表;


--找到每个部门工资最高的数据(可能拥有最高薪资的不止一人)
select *
from (select *, rank() over(partition by 部门id order by 工资 desc) as 部门工资排名
    from 工资表) as a
where 部门工资排名=1;


--去掉工资最高、最低的记录，求部门平均工资
select a.部门id,avg(a.工资) as 平均工资
from (select *, rank() over(partition by 部门id order by 工资 desc) as 部门工资排名降序,
    rank() over(partition by 部门id order by 工资) as 部门工资排名升序
    from 工资表) as a
where 部门工资排名降序>1 and 部门工资排名升序>1
group by a.部门id;

5.2 Top N（排名前N）问题

【案例】

--查询每个科目排名前三的学生信息
--考虑并列且序号连续，用dense_rank()
select * 
from (select *,dense_rank() over (partition by 科目 order by 成绩 desc) as 排名
from 成绩表) as a
where 排名<=3;

5.3 前百分之N问题

【案例】

--查询成绩排名在前40%的学生信息
select *
from (select *,percent_rank() over(partition by 班级 order by 成绩 desc) as 百分数排名
from 成绩表) as a
where 百分位排名<=40%;

5.4 累计问题

移动窗口：非固定，随着设定条件逐行移动的。
【万能模板】

select *,sum(字段) over(partition by 分组字段 
                        order by 排序字段
                        rows between 起始行 and 终止行)
from 表名;

起始行、终止行使用的特定关键词：
n preceding：当前行的前n行
n following：当前行的后n行
current row：当前行
unbounded preceding：第一行
Unbounded following：最后一行
【案例】

--成绩从大到小排列后，进行累计求和
select *,sum(成绩) over(order by 成绩 desc
                    row between unbounded preceding and current row) as 累计得分
from 成绩表;

5.5 每组内比较问题

比较的值可以是：
sum、avg、max、min
【万能模板】

select *
from (
    select *,avg(字段) over(partition by 分组字段) as 组内比较值
    from 表名) as a

【案例】

--薪资低于部门平均水平的员工
select 员工id,部门id,工资
from(select *,avg(工资) over(partition by 部门id) as 平均工资
    from 工资表) as a
where 工资<平均工资;

5.6 连续问题。

偏移窗口函数lead()、lag()

--lag：查看上一行数据
lag(列名,n,null) over (partition 分组列名 order by 排序列名)
--lead：查看下一行数据
lead(列名,n,null) over (partition 分组列名 order by 排序列名)

【万能模板】连续出现N次问题

select distinct 列
from (select 列,
    lead(列名,1) over (partition by 分组列名 order by 排序列名) as 列1,
    lead(列名,2) over (partition by 分组列名 order by 排序列名) as 列2,
    ……
    lead(列名,n-1) over (partition by 分组列名 order by 排序列名) as 列n-1,
    from 表名) as a
where 列=列1 and 列=列2……and 列=列n-1;

【案例】每组球队内连续3次得分的球员名单

select distinct 球员名字
from (select 球员名字,
    lead(球员名字,1) over (partition by 球队id order by 得分时间) as 姓名1,
    lead(球员名字,2) over (partition by 球队id order by 得分时间) as 姓名2,
    from 得分表) as a
where 球员姓名=姓名1 and 球员姓名= 姓名2;

6. SQL高级功能

6.1 存储过程

存储：又是需要重复编写特定SQL语句，可通过存储语句，有需要时直接调用
优点：简化语句、提高效率、加强数据安全、易于维护、减少网络流量
缺点：可移植性差、调试困难、无法处理复杂的业务逻辑

--定义存储过程
dilimiter $$
create procedure <存储过程名称>([参数1,参数2,……])
begin
<SQL语句1>;
<SQL语句12>;
……
end$$
delimiter;

--调用存储语句
call <存储过程名称>([参数1,参数2,……])

（1）不带参数的存储过程

--定义一个查询所有学生姓名的存储过程
dilimiter $$
create procedure student_name()
begin
select 姓名
from 学生表;
end$$
delimiter;

--调用该存储过程的SQL语句
call student_name()

（2）带in参数的存储过程

--学生学号值当作一个变量处理，调用时可直接传入制定参数值、
--定义通过查询学号查询学生姓名的存储过程
dilimiter $$
create procedure get_name(in num varchar(100))
begin
select 姓名
from 学生表
where 学号 = num;
end$$
delimiter;

--调用函数，查询学号为001的学生姓名
call get_name('0001')

【案例】编写添加数据、查询数据的存储过程，并调用

--编写带参数的存储过程
--向职员表中添加记录，字段包括姓名、年龄、职位
dilimiter $$
create procedure insert_table(in name varchar(20),
    in age in int,in post varchar(20))
begin
insert into 职员表(姓名,年龄,职位)
values(name,age,post);
end$$
delimiter;

--编写带参数的存储过程
-- 查询表中年龄大于某一数值的记录
dilimiter $$
create procedure filter_age(in age int)
begin
select *
from 年龄 > age;
end$$
delimiter;

--调用insert_table,插入记录“张三 23 专员”
call insert_table('张三',23,'专员')

--调用filter_age,查询年龄大于30岁的员工信息
call filter_age(30)

6.2 自定义变量

分为系统变量、用户自定义变量
系统变量是用于设置数据库行为和方式的参数。比如日志大小、文件放置位置等。
用户自定义包括会话变量、局部变量，主要用于SQL语句
在这里插入图片描述

（1）会话变量：

会话变量是在当前会话中起作用的变量，其作用域和生命周期均与当前客户端（例如数据库客户端Navicat）连接。若当前客户端连接断开，会话变量随即被清除。会话变量主要用于在不同SQL语句之间传递数据。
（1.1）使用set关键字声明并赋值

set @会话变量名 =  值或表达式

【案例】使用set关键字对会话变量声明并赋值，变量@1=1，变量@2=2

--定义变量
set @a1=1,@2=2;

--查询变量值
select @1=1,@2=2
  （1.2）使用select关键字声明并赋值
select @会话变量名 := 值或表达式;
select 值或表达式 into @会话变量名;
select 字段 into @会话变量名 from …;  #字段必须为单个值

【案例】

#使用select 关键字对会话变量进行声明并赋值
select @a1 := 1 as a1,@a2 := 2 as a2;
select @a1 + @a2 into @a3;
#使用select...into语句声明赋值后，必须再使用select关键字查询便来给你的值
select @a3

（2）局部变量

放在begin和end之间的语句块中，作用仅限于该语句块内。

declare 变量名[,...] 数据类型 [default value];
set 局部变量名 = 值或表达式;

【案例】

--定义存储过程
dilimiter $$
create procedure ab(in a int,in b int)
begin
declare c,d int default 0
set c=a+b;
select c,d;
end$$
delimiter;


调用存储过程
dall ab(4,5)