MySQL

原创已于 2022-02-19 17:13:23 修改 · 1.3k 阅读

0 ·

CC 4.0 BY-SA版权

文章标签：

#mysql #数据库

于 2022-02-19 17:13:05 首次发布

mysql 专栏收录该内容

2 篇文章

订阅专栏

MySQL

简介：

按照数据结构来组织存储和管理数据的仓库

每个仓库都有一个或多个不同的API用于创建访问管理搜索和复制保存的数据

MySQL CUME_DIST() 函数 | 新手教程 (begtut.com)

RDBMS术语

列、行、冗余、主键（主键唯一）、外键、复合键（多个列作为一个索引键）、索引、参照完整性（关系中不允许应用不存在的实体，与实体完整性是关系模型必须满足的完整性约束条件目的是保证数据的唯一性）

Linux下检查mysql是否启动

ps -ef | grep musqld #检查进程列表mysql是否启动
./mysqladmin -u root -p shutdown

mysql -h localhost -u root -p #cmd进入mysql

#退出命令
exit
quit
\q

#windows 重启数据库
 #cmd
 net stop mysql
 net start mysql
#linux重启数据库
 1、service mysqld restart
 2、$mysql_dir/bin/mysqladmin -uroot -p shutdown
    $mysql_dir/bin/mysqld_safe &
 3、killall mysql
 #https://www.cnblogs.com/adolfmc/p/5497974.html

CTE

公用公共表达式cte，定义cte后可以使用增改查，以及创建视图

#cte语法
with sc as (query)
select * from cte_name

#实例
with c as(select customername,state from customers where country ='usa')
select custmoername from c where state='CA' order by customername;


#实例二
WITH dups AS (SELECT 
        id,
        name,
        ROW_NUMBER() OVER(PARTITION BY name ORDER BY name) AS row_num
    FROM rowNumberDemo)
DELETE rowNumberDemo FROM rowNumberDemo INNER JOIN dups ON rowNumberDemo.id = dups.id
WHERE dups.row_num <> 1;

注意mysql不支持基于CTE的删除，但是可以通过将原始表与CTE一起操作（内联一起）后才能删除，如实例二

特殊字符

Mysql管理

更改当前数据库

use 20211221test; --更改当前连接数据库
show tables; --
show columns from sys_config; --
-- show tables where tables_in_sys ='sys_config';
-- show columns from sys_config;
-- select * from user;

select * from mysql.user where user='root';
#查看用户在使用数据库的权限 表段增删改查，视图创建等，所有权限都能看见

数据库创建\删除\连接

create database jkl;

drop database jkl;
mysqladmin -u root -p drop RUNOOB #shell进入删除

#连接jkl
use jkl;

数据库数据类型

数值类型

类型	大小	范围（有符号负号）	范围（无符号）	用途
tinyint	1bytes	（-127，128）	（0，255）	小整数
smallint	2bytes	（-32768，32767）	（0，65535）	大整数值
mediumint	3bityes			(0，18 446 744 073 709 551 615)大整数值
int/integer	4bityes			大整数值
bigint	8	(-9,223,372,036,854,775,808，9 223 372 036 854 775 807)	(0，18 446 744 073 709 551 615)	极大整数值
float	4	(-3.402 823 466 E+38，-1.175 494 351 E-38)，0，(1.175 494 351 E-38，3.402 823 466 351 E+38)	（0，1.175 494 351 E-38，3.402 823 466 E+38)	单精度浮点数
double	8			双精度浮点数
decimal	decimal（m,d）m>d大小为m+2，否则为d+2	依赖md的值	依赖md的值	小数值

字符串类型

类型	大小	用途
char（n）	0-255bytes	定长字符串
varchar(n)		变长字符串
tinyblob		不超过255字符的二进制字符串
tinytext		短文本字符串
blob		二进制形式长文本数据
text		长文本数据
mediumblob		二进制中等长度文本数据
mediumtext		中等文本长度数据
longblob		二进制极大文本数据
longtext		极大文本数据

char和varchar中n代表字符个数，不代表字节bytes个数，存储和检索过程中不进行大小写转换

binary和varbinary类似char和varchar，但是只支持字节字符串，没有字符集

四种text对应四种blob，text与blob范围仅在于存储的最大长度不同

时间日期类型

类型	格式	用途
date	YYYY-MM-DD	日期值
time	HH:MM:SS	时间值或持续时间
year	YYYY	年份值
datetime	YYYY-MM-DD HH:MM:SS	混合日期和时间值
timestamp	YYYYMMDDHHMMSS	混合日期和时间值,时间戳

创建与删除数据库表段

#要使用反引号创建表名mysql8
create table table02(
 `id` integer unsigned auto_increment,
 `runob_title` varchar(100) not null,
 `runoob_author` varchar(40) not null,
 `runoob_date` date,
 primary key(`id`)
)engine=innodb default charset=utf8;

select * from 20211221test.table02;
#auto_increment #定义列为自增加属性，engine设置存储引擎，charset设置编码

create table if not exist `table02`(
`id` int unsigned auto_increment,
 `title` varchar(100)   not null,
  `author` varchar(40) not null,
   `date` date,
    permary key(`id`)
)engine=innodb default charset=utf8

drop table table02;

数据库存储引擎

mysql中数据用不同存储技术存储在文件或内存里，这些技术中每一种技术都是用不同存储机制、索引技巧等，选择不同的技术能获得额外的速度或者功能。mysql8支持九种引擎

show engines; #展示所有引擎

create table table_name(column type)engine=Innodb;
alter table table_name engine=MyISAM

引擎engine名称	优势	劣势	适用范围	特征
myisam	快速读取	不支持事务、外键没有原子性	做大量cont计算且插入不频繁，查询频繁；不需要事务功能	.frm格式文件\.MYD数据文件\.MYI索引文件共三个文件组成一个表，灵活使用auto_increment自增加属性，可以转换压缩节省空间
mrg_myisam				本质是多个myisam表聚合一起，内部无数据，真实数据还是在myisam表内部，删除mrg_myisam不会影响myisam的数据
innodb	提交、回滚、崩溃恢复的防止失误产生的安全引擎	读取比isam慢	标段更新查询非常频繁且标段锁定计划比较大
archive	高校插入速度	不支持索引和事务，除insert和create语句外都不支持	存储大量独立的历史记录数据等，日志	压缩比非常高，使用行锁实现高并发插入操作
blackhole		丢失数据	用于binlog做复制的中继存储
memory	插入效率快	不支持事务	行长度固定，存储在内存里的数据	.frm格式文件，不支持text和二进制blob字段
csv	csv文件存储表内容，csm文件存储表元数据和状态	不支持索引，不支持为null的行	需要存储csv文件的，	.frm格式文件结构，存储csv文件作为表段内容，使用csv引擎的标段所有行不为空
federated	适合开发使用，本地建立federated表段连接远程数据库			不真实存放数据，都是获取远程mysql服务器数据，没有本地表

performance_schema引擎

(5条消息) 【MySQL】MySQL8.0存储引擎详解_GengMS的博客-优快云博客_mysql8 引擎

监控mysql server在一个较低级别运行过程中的资源消耗等待情况

提供数据库运行实时检查server内部执行情况的方法
performance_schema监视server事件来实现监视server内部运行情况

数据库增删改查

条件选择有两种，having和where ,只有having支持聚合函数的使用

#插入数据用单引号，不是反引号
insert into table02(id,runob_title,runoob_author,runoob_date)values('1','wheris','prince','2022-02-01'),('7','wheris','prince','2022-02-01'),
('8','wheris','prince','2022-02-01');


create table 20211221test.table03(
 `'id'` integer unsigned auto_increment,
 `runob_title` varchar(100) not null,
 `runoob_author` varchar(40) not null,
 `runoob_date` date,
 primary key(`'id'`)
)engine=innodb default charset=utf8;
insert into 20211221test.table03
(`'id'`,runob_title,runoob_author,runoob_date)
values('6','wheris','prince','2022-02-01'),
('7','wheris','prince','2022-02-01'),
('8','wheris','prince','2022-02-01');

#查询数据
select * from 20211221test.table02 where id between 5 and 8 limit 3;
#limit 限制3条数据

#where
 #=
 #<> != 不等于 select * from table where id !=3;
 #<,>,>=,<=
 select * from table where id !=3;
 select id,runob_title from table02 where runob_title='wheris' limit 2;

#where binary 
 #设定where比较区分大小写
 select * from table02 where binary runob_title='Wheris';
 select * from table02 where runob_title='Wheris';

#update 更新已有的某一行数据内容和
update table02 set runob_title='jkl' where id between 5 and 6 and runob_title='wheris';
#workbench 使用update必须添加where关键字，且必须带上主键搜索，否则不能update

#delete 删除
delete from table03 where 'id'=6;
delete from table02 where id=6;

数据库的where的like语句

select * from table03 where runob_title like w%;

数据库union连接操作符

#连接两个以上select 语句，然后结果放到一个显示上，多余重复数据会自动删除
select expresion1,expersion2,... from table1 
[where conditions]
union [all | Distinct]
select expersion1,expersion2,... from table2
[where conditions]

union [all | Distinct]#判断是否展示或删除多余重复数据
不同表连接查询，需要在查询前将连接的临时表段互相保持一致

select country from websites
union all
select country from apps
order by country;

排序

指定读取数据字段名进行排序然后返回搜索结果

select field1,field2,... from table,table2,...
where 
order by field1 [ASC|DESC]
#任何使用情况下，order by默认升序排列ASC

select * from table01 order by id

分组

group by 根据一个或多个列结果进行分组，由此使用count、sum、avg等函数作为列

#创建数据
-- ----------------------------
--  Table structure for `employee_tbl`
-- ----------------------------
DROP TABLE IF EXISTS 20220210_test.employee_tb1;
CREATE TABLE 20220210_test.employee_tbl (
  `id` int(11) NOT NULL,
  `name` char(10) NOT NULL DEFAULT '',
  `date` datetime NOT NULL,
  `signin` tinyint(4) NOT NULL DEFAULT '0' COMMENT '登录次数',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- ----------------------------
--  Records of `employee_tbl`
-- ----------------------------
BEGIN;
INSERT INTO 20220210_test.employee_tbl VALUES ('1', '小明', '2016-04-22 15:25:33', '1'), ('2', '小王', '2016-04-20 15:25:47', '3'), ('3', '小丽', '2016-04-19 15:26:02', '2'), ('4', '小王', '2016-04-07 15:26:14', '4'), ('5', '小明', '2016-04-11 15:26:40', '4'), ('6', '小明', '2016-04-04 15:26:54', '2');
COMMIT;

SET FOREIGN_KEY_CHECKS = 1;
set names utf8;
select * from employee_tbl;

 select colmunname, function（avg\count\sum）
 from table_name
 where column_name operator value
 group by column_name;
 

#使用
select name count(*) avg(signin) from 20220210_test.employee_tb1 group by name;

#with rollup 实现分组统计数据之上的在进行相同统计，
#说白了，第二次统计的是第一次统计数据的统计，sum就是所有之和总数，avg是所有之和的平均数
select name sum(signin) as signin_count from employee_tb1 group by name with rollup;

#上面的统计完一次后会在进行一次总统计

#having 一般与group连用，过滤条件
select ordernumber,sum(quaryordered)as itemscount,sum(priceeach*quaintordered) as  tatal
from
 orderedtailed
group by ordernumber
having total >550000;

in操作符

select * from websites where name in ('google','school');
#拿取name字段为google和school的数据

select * from websites where (alexa between 1 and 20 )
and country not in ('USA','ind');

select customers.name as customers from customers
where customers.id not in (select c.id from customers as c inner join orders as o on c.id=o.customerid ) ;

数据库的连接

在这里插入图片描述

select * from table where a.key between 1 and 10 order by a.key desc as A inner join table as B on A.key=B.key ;
#内连接 表中至少一个匹配则返回行
select * from table as A left join table as B on A.key=B.key;
#左连接，右表没有匹配内容也从左表返回全部内容；同理右连接

#数据前提
drop table websites;
create table websites (
 `id` int unsigned auto_increment,
 `name` varchar(40) not null,
 `url` varchar(100) not null,
 `alexa` int not null,
 `country` char(40),
 primary key(`id`)
)engine=innodb default charset=utf8;
 
 insert  into 20220210_test.websites(id,name,url,alexa,country) 
 values ('1','google','http://www.baidu.com',1,'USA'),
 ('2','taobao','http://www.taobao.com',13,'CN'),
 ('3','school','http://www.school.com',4689,'CN'),
 ('4','weibo','http://www.weibo.com',20,'CN'),
 ('5','facebook','http://www.facebook.com',3,'USA');
 
 drop table access_log;
 create table 20220210_test.access_log (
  `aid` int unsigned auto_increment,
  `site_id`int not null,
  `count` int,
  Primary key(`aid`)
 )engine=innodb default charset=utf8;
 
 show tables;
 insert into 20220210_test.access_log(aid,site_id,count) values
 (1,1,45),
 (2,3,100),
 (3,1,230),
 (4,2,10),
 (5,5,205),
 (6,4,13),
 (7,3,220),
 (8,5,545),
 (9,3,201);

select w.id,w.name,a.count from websites as w 
inner join access_log as a on w.id=a.site_id
order by id desc;
#内连接后以id为基准降序排列

select * from websites as w 
inner join access_log as a 
on w.id=a.aid and w.name in ('google','school')  
order by w.id desc;

#表连接三种方式
select * from type,atricle where type.id=article.type_id;

#最常用，on指定连接条件
select * from type inner join article on type.id=article.id;
#using替代on连接表段
select type.*,article.* from type inner join article using(id);

#having 
 #having 必须跟group by 一起使用，有having不能有where,有group by可以没有having
select ordernumber,sum(quantityordered) as itemscount,sum(priceeach*quantityordered)as total
from ordertails
group by ordernumber
having total >1000;

数据库用法

复制数据到新的表（insert into select）

#创建websites2表，以wensites为模板（表结构的所有约束），同时拉取数据到webisties2
create table websites2 as select * from wensites;

#mysql、sql通用，将websites满足条件的数据插入到websites3中
insert into websites3 select * from websites where websites.id between 2 and 3;

#仅sql支持
select * into websites3 from websites where websites.id between 2 and 3;

限制数据条数limit

 select * from websites limit 2;
 select * from websites limit 0,3;
 select* from websites limit 1,2;
 
 select * from websites limit m,n;
 #m为第一行偏移量，n为返回的最大行数
 # m+1到m+n行
 # m+1 = m+n时只显示第m+1行  
 
 limit不能用运算表达式

创建数据库表段的约束

 create table createtabletest (
 `cid` int unique not null auto_increment,
 `name` varchar(40) not null,
 `sex` varchar(10) not null,
 `age` int default '18',
 `id` int not null,
 `time` date,
 primary key(`cid`),
 foreign key(`id`) references websites.id,   
 check(`sex` in ('femal','male','unknown'))
) engine=innodb default charset=utf8;;
#unique 表示字段在表内唯一适用于主键primary key一起使用
#auto_increment 表示没有值自动创建值
#not null 表示值不空
#dafault 没有值填入采取规定默认值
#primary key() 主键值
#check（） 值只能在规定范围内的
#foreign key（） # 外键，必须要跟连接位置
 #foreign key（`id`） references websites.id
 #constraint persion foreign key(`pid`) references persion2(pid)
    #
#constraint 约束附加加不加都行

alter修改列

alter table table04 modify runob_title varchar(50) not null;
#修改table04的runob_title的列

罗列表段约束条件

desc table04;

视图

虚拟数据表，数据指向真实表，行和列的数据从真实表中拿取，是使用视图时动态生成。所拿取的数据以真实表数据为基准

视图与表段要点

视图不是真实表，结构和数据是建立在对数据中真实表的查询基础上
存储在数据库中查询操作定义了视图的内容，列数据和行数据来自查询使用的实际表
视图没有物理记录，不以数据集存在数据库中，对应数据存在真实表中
视图是数据窗口，表是数据内容。视图展示的表数据，来源是实际表
视图建立和删除只影响试图本身，不影响基本表

视图优点

定制用户数据，聚焦特定数据‘
简化数据操作，查询是要聚合函数，同时显示其他字段信息
提高数据安全性，视图是虚拟的不影响表段
共享所需数据，同样数据只存储一次
更改数据格式，格式阿虎检索出的数据
重用sql语句，视图提供对查询操作封装，本身不包括数据本身

#创建视图
create view view as select * from websites; --#创建视图

create view view_websites(w_id,w_name,w_url,w_alexa,w_country) 
as select * from websites;--#创建视图，重新命名保护基本表websites中数据’

#查看视图信息
describe view_websites; #等效与 desc view_websites,查看表段信息
show create view view_websites;#查看视图详细定义

#修改视图表段,展示内容，只是修改视图内展示的内容，而不是修改列约束
 #视图时虚拟表，引用数据来自真实表端，所以可以增改删操作更新数据，
 #千万注意视图被修改，就是基本表的修改，修改时必须满足基本表的数据定义
 #修改视图有限制，拥有以下结构的视图不能被更新
   #聚合函数、distinct，group by/having，union ,from中的不可更新表段和视图，where子句中的子查询（多表查询），
alter view  view_websites as select id,name,url from tb_students_info; #将websites的id,name,url更新到视图上

update view view_websites set alexa=20 where id =1;
select * from tb_students_info limit 2;
#修改视图同时原表段数据也会被修改


#删除视图
drop view if exists view_websites;
show create view view_websites;

索引

特殊数据结构，由表中一个列或者多个列组成，用来快速查询数据

索引类型

存储方式区分

b树索引，树结构索引参考数据结构，根节点，叶子节点。支持order by查询
哈希索引，

逻辑区分

普通索引，

create index index_id on tb_student(id);

唯一索引

不为查询，只为保证数据唯一性

create unique index index_id on tb_student(id); #id有没有在创建表时unique约束都无所谓

主键索引

primery key()#创建表时有主键约束就是主键索引

空间索引

#创建空间索引，对应的索引列必须规定not null 约束，同时engine=MyISAM，一般适用于表段数据类型geometry地理数据类型
create spatial index index_line on tb_student(line);

全文索引

#查找文本中的关键字，只能在char，varchar，text上创建索引，engine=myisam支持全文索引
#允许，插入重复值和空值，大容量数据非常消耗时间和硬盘空间
create fulltext index index_info on tb _students(info);
 #info字段必须是char\varchar\text

实际使用区分

单列索引

#索引包含原表一个列，在表中单个字段创建索引，只要是单字段的索引（全文、唯一、主键、空间等）都叫单列索引
create index index_addr on tb_students(address(4))
 #查询时只查询address字段内容的前四个字符

多列索引

create index index_addr on tb_students(name,address)

索引特点

索引会提高查询速度，但是严重影响插入记录的速度，因为索引表中插入记录时，数据系统会先按照索引先排序—建议先删除索引后插入数据然后在创建索引
索引可以保持数据每行唯一性，增加查询速度，加快表与表之间的连接

#查看索引
show index from websites2 ;

#索引可以分开创建多个
create index index_id from tb_students;
create fulltext index index_name from tb_students;
#删除索引
drop index index_name on websites2;
alter table websites2 drop index index_name;

存储过程

完成特定功能的sql语句集合，将常用的sql语句预先写好同时指定名称存储在sql数据库内。使用时调用

优点：

封装性，完成一个逻辑功能需要多条sql语句，封装起来不影响应用程序源代码
增强sql语句功能和灵活性，流程控制编写
减少网络流量，只传送调用语句，而不是一大堆sql执行语句，降低网络负载
高性能
提高数据库安全性和数据完整性
使数据独立

#创建存储过程
create procedure showstuscore()
begin
select * from tb_student_score

生成临时表段with as

with productline as (
    select productline,year(orderdate) order_year,round(sum(quantityordered * princeach),0)order_value
from orders
inner join orderdetails using (ordernumber)
inner join products using(productcode)
group by productline,order_year)
select
   productline,order_year,order_value,
   lag(order_value,1) over (
    partition by productline
       order by order_year
   ) prev_year_order_value

判断条件null 与in

# 判断空只使用 is not null/is null
select c.name as customers from 
customers as c left join orders as o on c.id=o.customerid
where o.customerid is null;

#in 结果为真返回1，假为0
expr in(value1,...)
expr  not in (value1,....)

select customers.name as customers from customers
where customers.id not in (select c.id from customers as c inner join orders as o on c.id=o.customerid )

数据库自定义函数创建

函数只会返回一个值，不允许返回一个结果集。函数强调返回值，所以函数不允许返回多个值的情况，即使是查询语句。

CREATE FUNCTION getNthHighestSalary(N INT) RETURNS INT
BEGIN
  set n=n-1;
  RETURN (
      # Write your MySQL query statement below.
      select distinct salary from employee 
      order by salary desc limit n,1
  );
END
#create function getback(N int)

数据库窗口函数

mysql8之后支持数据库窗口函数，用更加简单方式解决查询问题

CREATE TABLE sales(
    sales_employee VARCHAR(50) NOT NULL,
    fiscal_year INT NOT NULL,
    sale DECIMAL(14,2) NOT NULL,
    PRIMARY KEY(sales_employee,fiscal_year)
)engine=innodb default charset=utf8;
 
INSERT INTO sales(sales_employee,fiscal_year,sale)
VALUES('Bob',2016,100),
      ('Bob',2017,150),
      ('Bob',2018,200),
      ('Alice',2016,150),
      ('Alice',2017,100),
      ('Alice',2018,200),
       ('John',2016,200),
      ('John',2017,150),
      ('John',2018,250);
 
SELECT  * FROM sales

#窗口函数作用如下
select firsttime,sum(sale) from sales group by firsttime
#参照聚合函数，也是作用于子集，按照要求削减行数

#窗口函数与聚合函数唯一区别是窗口函数不会削减子集，也就是查询行数
select fiscal_year,sales_employee,sale, 
sum(sale) over (partition by fiscal_year) 
total_sales from sales;   
 #sum函数用作窗口函数，函数通过over子句内容对子集操作，sum()称为窗口函数

#窗口函数语法
windowsFunction(expression)
over(partition by ext1,ext2,...
     order by ext1 asc/desc,ext2 asc/desc,...
     frame_unit{frame_start|frame_between}
     )
#over内即使没有内容，也必须跟一个over()
#over内部分为三个部分，分区定义partition by，顺序定义order by，帧定义frame_unit{}

 #分区定义parttion by
    #子句后跟随一到多个表达式，逗号隔开
    
 #顺序定义order by 
    #子句指定分区中排序方式，可以在多个列（键）上的分区进行排序
    #order by仅在对顺序敏感的窗口函数使用才有意义
    
 #帧定义 frame_unit
    #何为帧，即当前分区的子集，定义帧也就是定义当前分区的子集。允许帧根据其分区内当前行位置在分区内移动
    #帧单位指定当前行与帧行（子集的行）之间的关系

frame_unit{frame_start|frame_between}
#帧定义
帧单位是指定当前行与帧行关系类型

当前行与帧行（分区子集行）偏移量是行号，如果帧单位是rows行值，则行值是帧单位的range
frame_start和frame_between定义帧边界

frame_start 包含三项：
   1、
frame_between 构成
  between frame_boundary_1 and frame_boundary_2
  
#未定义frame_defination,也就是没有帧定义使用如下默认内容
  range between unbounded preceding and current row
 rows
 range

cume_dist()

计算一组值中值累积分布

升序排列，表示值小于或等于行的值除以总行数的行数

降序排列，表示值大于或等于行的值除以总行数的行数

应用：

假设有一张统计某部门每个人工资的表段

统计小于或等于当前工资的人数在部门中的比例，

人话是：以这个人的自己工资salary为基础，小于该salary的人在部门中的比例，即小于等于salary的工资的人数占比
统计大于或等于当前工资的人数在部门中的比例

人话是，以这个人自己的工资salary为基础，大于或等于该salary的人在部门中的比例，即大于等于salary的工资的人数占比

CREATE TABLE scores (
    name VARCHAR(20) PRIMARY KEY,
    score INT NOT NULL
);
 
INSERT INTO
 scores(name, score)
VALUES
 ('Smith',81),
 ('Jones',55),
 ('Williams',55),
 ('Taylor',62),
 ('Brown',62),
 ('Davies',84),
 ('Evans',87),
 ('Wilson',72),
 ('Thomas',72),
 ('Johnson',100); 
 
 #查询集中分数累计,order by没指定排序舒徐默认asc
 select name,score,
 row_number() OVER(order by score) row_num,
 cume_dist() over(order by score ) cume_dist_total
 from scores;
 
+----------+-------+---------+---------------+
| name     | score | row_num | cume_dist_val |
+----------+-------+---------+---------------+
| Jones    |    55 |       1 |           0.2 |
| Williams |    55 |       2 |           0.2 |
| Brown    |    62 |       3 |           0.4 |
| Taylor   |    62 |       4 |           0.4 |
| Thomas   |    72 |       5 |           0.6 |
| Wilson   |    72 |       6 |           0.6 |
| Smith    |    81 |       7 |           0.7 |
| Davies   |    84 |       8 |           0.8 |
| Evans    |    87 |       9 |           0.9 |
| Johnson  |   100 |      10 |             1 |
+----------+-------+---------+---------------+
#第四行为例，行数查找集中行数，其值小于或等于62的有四个数据，所以4/10=0.2，即得出分数小于等于62的占总人数的0.4，应用第一行逻辑

dense_rank()

利用order by对每一行进行排名

最简单的应用就是给出排名

dense_rank() over(partition by
                  order by columnname asc|desc)
              #partition by将from子句划分分区，dense_rank作用在每个分区内
              #order by 指定dense_rank窗口函数操作每个分区的行顺序

 #以上面表段sales为例,介绍排名函数order by和partition by 区别
select sales_employee,fiscal_year,sale,
 dense_rank() over(
     partition by fiscal_year
     order by sale asc) saleyears_rank
 from sales;
 #因为制定了分区partition by fiscal_year,此为计算在不同年度内，每年sale的排名，图一
 
select sales_employee,fiscal_year,sale,
dense_rank() over(
order by sale asc
)sales_rank
from sales;  
#没有指定分区partition by 所以这里变成sale的销售排名，图二

在这里插入图片描述

first_value(）

返回指定表达式相对窗口框架第一行值

选择窗口框架、分区、或结果集的第一行

first_value(columnname1) over(partition by exper1,..
                              order by exper1,...)
#columnname1 为拿取到第一行数据时选取你所需要的列的数据。说是拿取一整行数据，但是实际测试只能拿取第一行数据的某列数据，这个columnname1就是你要拿取的某列数据项，名

#partition by 按照分区，每个分区单独拿取分区的第一行数据
#order by 排序顺序，在多个窗口函数使用情况下，不需要在使用了，除非特殊要求

在这里插入图片描述

用途

可以与其他窗口函数一起使用，比如在已经拍好顺序的dense_rank表里，单独单独拿取第一行，这个数据可以作为最大项或最小项使用

#以上面sales表段为例，拿取最高销售额的人名（如果要使用，最好是id,但是本表没设置）
select sales_employee,fiscal_year,sale,
 dense_rank() 
 over(partition by fiscal_year order by sale asc) sales_rank,
 first_value(sales_employee) 
 over(partition by fiscal_year order by sale asc) leastsale      
 from sales;
 #first_value这里就表示计算每年销量最低者，图一，asc和asc
 
select sales_employee,fiscal_year,sale,
 dense_rank() over(partition by fiscal_year
     order by sale desc) sales_rank,
 first_value(sales_employee) 
 over(partition by fiscal_year order by sale desc) leastsale      
 from sales;
 #每年度最高销量者 desc和desc
 
 #如果是如下asc和desc，图三
  select sales_employee,fiscal_year,sale,
 dense_rank() over(partition by fiscal_year
     order by sale asc) sales_rank,
 first_value(sales_employee) 
 over(partition by fiscal_year order by sale desc) leastsale      
 from sales;
 #得出是仍然是每年度最高销售人，但是表段会乱掉，不好看，所以多个窗口函数使用order by时注意保持order by排序顺序一致得到美观的

lag（）

返回分区内当前行之前的第n行的值，没有就空

回顾多行并从当前行访问其他行数据

lag(expression，offset) 
over(partition by   order by )
#expression是当前行之前行的值
#offset 是在此行前的offset行

select customername,orderdate,
       lag(orderdate,1) over(
       partition by customernumber
       order by orderdate) nextorderdate
from orders
inner join customers using(customernumber)
#lead 这里是返回当前行和前1行内的orderdate列数据

select customername,orderdate,
       lag(orderdate,3) over(
       partition by customernumber
       order by orderdate) nextorderdate
from orders
inner join customers using(customernumber)
#这里前3行数据

last_value(）

返回指定表达式相对窗口框架最后一行值

lead(）

返回分区中当前行之后第n行，没有就空

lead(experssion1，offset) over(partition by
                       order by)
#expression 返回当前expression的值，从排序后结果
#offset 是当前行后的第offset行

#查找每个客户的订单日期和下一个订单日期
select customername,orderdate,
       lead(orderdate,1) over(
       partition by customernumber
       order by orderdate) nextorderdate
from orders
inner join customers using(customernumber)
#lead 这里是返回当前行和后1行内的orderdate列数据

select customername,orderdate,
       lead(orderdate,3) over(
       partition by customernumber
       order by orderdate) nextorderdate
from orders
inner join customers using(customernumber)#后3行的orderdat列数据

nth_value(）

返回窗口框架第n行参数值

ntile (）

将每个窗口分区分配到指定数量的已排名组

#排序分区partition中将行划分为特定数量的组，每组分配一个桶号，ntile产生的行返回一个桶号
ntile(n) over(
partition by expression1
    order by expression1
)
#n代表字面正整数 桶号范围是1到n
#分区partition 返回结果集划分from子集为分区到ntile

percent_rank(）

计算分区或结果子集中百分数

percent_rank()
 over(partition by expr,
     order by expr desc|asc)
     #返回的是一个0-1的数字
     #(rank -1)/(total_rows -1) 一共10行，当前计算第6行百分比为5/9=0.5555555

可以用来分析某一种产品相对与其他产品的好坏，前提是这张表已经按照某种规则排过序

select productline,orderyear,ordervalue,
round(percent_rank() over(partition by orderyear order by ordervalue),2) percent_rank
from productlinesales
#因为是小数，所以使用聚合函数round去两位小数

rank(）

与dense_rank函数类似，当两行或多行有相同排名时，排列序列产生空隙

row_number(）

为分区中每一行分配一个连续整数

使用row_number可以实现自动给每行分配一个序号

#
row_number() over(
partition by
    order by
)

select
 row_number() over(order by profuctname) row num,
 productname,msrp
 from products order by productname

可以扩展为查找每组的前n行

with invertory as (
select productline,productname,quantituinstock,
    row_number() over(partition by productline order by  quantityinstock desc) row_num
    from products
)
select productline,productname,quantityinstock from inventoty where row_num <=3;
#order by根据quantity大小为productline的不同分区排序，然后因为是按分区给定的自增序号，那我指定该序号范围就能指定前n行数据

可以分页

使用该窗口函数，会生成表段自增序号，指定该自增序号范围就能排序

可以删除重复行

将非唯一行转换为唯一行（多加一列数据），然后删除重复行。

#数据
DROP TABLE IF EXISTS rowNumberDemo;
CREATE TABLE rowNumberDemo (
    id INT,
    name VARCHAR(10) NOT NULL
);
 
INSERT INTO rowNumberDemo(id,name) 
VALUES(1,'A'),
      (2,'B'),
      (3,'B'),
      (4,'C'),
      (5,'C'),
      (6,'C'),
      (7,'D'); 
      
# 
with a as(select id,name,row_number() over(partition by name order by name) rowname from rowNumberDemo)
delete r.* from rowNumberDemo as r inner join a on a.id = r.id where a.rowname<>1;
#因为是CTE删除，所以要将原始表段与CTE一起作为解决方式

数据库实例-题目来自力扣

做题思路，要将一个问题，拆分成多个小问题，通过小问题解决方式的综合得出大问题结果，数据库就要这么做题

#查找第二大的某数据,记得去重
select ifnull((select distinct salary from employee as e order by e.salary desc limit 1,1 ),null) as secondhighestsalary;

select distinct max(salary) as SecondHighestSalary from employee as e
where e.salary!=(select distinct max(salary) from employee);

编写 SQL 查询对分数进行排序。排名按以下规则计算:

分数应按从高到低排列。
如果两个分数相等，那么两个分数的排名应该相同。
在排名相同的分数后，排名数应该是下一个连续的整数。换句话说，排名之间不应该有空缺的数字。

scores表

±------------±--------+
| Column Name | Type |
±------------ ±--------+
| id | int |
| score | decimal |
±------------ ±--------+

#解题思路
#分成两部分，第一部分计算排序，第二部分计算排名
 #第一部简单
 select score from scores order by scores.score desc;
 #排名计算
 #假设最高分是99，两个个人考97，，三个人考98。
  #我们计算第三名名次时，只要知道大于97的分数有几种就行了，所以count(distinct score),但又因为是名次不是从0开始的所以要计算score>=97的才是当前名次
  #所以 计算当前分数x的排名就出来了
  select count(distinct s2.score) from scores as s2 where s2.score>=x;
  #带入第一部分可得出答案
  select s1.scores,(select count(distinct s2.score) from scores as s2 where s2.score>=s1.score) as 'rank' from score as s1 order by s1.score desc;
  
#新版本提供了rank排名函数 
select score,dense_rank() over(order by score desc)as'rank' from score;

招收收入超过经理的员工

±------------±--------+
| Column Name | Type |
±------------±--------+
| id | int |
| name | varchar |
| salary | int |
| managerId | int |
±------------±--------+

编写一个SQL查询来查找收入比经理高的员工。

以 任意顺序 返回结果表。

查询结果格式如下所示。

输入: 
Employee 表:
+----+-------+--------+-----------+
| id | name  | salary | managerId |
+----+-------+--------+-----------+
| 1  | Joe   | 70000  | 3         |
| 2  | Henry | 80000  | 4         |
| 3  | Sam   | 60000  | Null      |
| 4  | Max   | 90000  | Null      |
+----+-------+--------+-----------+
输出: 
+----------+
| Employee |
+----------+
| Joe      |
+----------+
解释: Joe 是唯一挣得比经理多的雇员。

#解法1 分成两个表，一个表用来查询数据，另外一个用来当选择条件筛选
select e1 as employee from employee as e1,employee as e2 
where e1.mangerid=e2.id and e1.salery>e2.salary

#解法2 内连接产生一个包含经理所有数据的表段,然后在这个表段里查询数据
select  e1.name as employee from employee as e1 inner join employee as e2 on e1.managerid = e2.id where e1.salary >e2.salary;

编写一个 SQL 查询，查找 Person 表中所有重复的电子邮箱。

示例：

±—±--------+
| Id | Email |
±—±--------+
| 1 | a@b.com |
| 2 | c@d.com |
| 3 | a@b.com |
±—±--------+
根据以上输入，你的查询应返回以下结果：

±--------+
| Email |
±--------+
| a@b.com |
±--------+

#解法1
with e as (select email,count(email) as totalemail from person group by email)
select e.email from e where e.totalemail>1;
#解法2
select email from person group by email having count(email)>1
#解法3

求第n高工资

±------------±-----+
| Column Name | Type |
±------------±-----+
| id | int |
| salary | int |
±------------±-----+
Id是该表的主键列。
该表的每一行都包含有关员工工资的信息。

编写一个SQL查询来报告 Employee 表中第 n 高的工资。如果没有第 n 个最高工资，查询应该报告为 null 。

查询结果格式如下所示。

示例 1:

输入:
Employee table:
±—±-------+
| id | salary |
±—±-------+
| 1 | 100 |
| 2 | 200 |
| 3 | 300 |
±—±-------+
n = 2
输出:
±-----------------------+
| getNthHighestSalary(2) |
±-----------------------+
| 200 |
±-----------------------+
示例 2:

输入:
Employee 表:
±—±-------+
| id | salary |
±—±-------+
| 1 | 100 |
±—±-------+
n = 2
输出:
±-----------------------+
| getNthHighestSalary(2) |
±-----------------------+
| null |
±-----------------------+

CREATE FUNCTION getNthHighestSalary(N INT) RETURNS INT
BEGIN
  RETURN (
      # Write your MySQL query statement below.
            # Write your MySQL query statement below.
       select ifnull(c.salary,null) from (
          select distinct rs.salary ,rs.rankname from (select id,salary,dense_rank() 
          over( order by salary desc) rankName 
          from employee) as rs  
          where rs.rankName=n) as c
  );
END
#

某网站包含两个表，Customers 表和 Orders 表。编写一个 SQL 查询，找出所有从不订购任何东西的客户。

Customers 表：

±—±------+
| Id | Name |
±—±------+
| 1 | Joe |
| 2 | Henry |
| 3 | Sam |
| 4 | Max |
±—±------+
Orders 表：

±—±-----------+
| Id | CustomerId |
±—±-----------+
| 1 | 3 |
| 2 | 1 |
±—±-----------+
例如给定上述表格，你的查询应返回：

±----------+
| Customers |
±----------+
| Henry |
| Max |

#1
select customers.name as customers from customers
where customers.id not in (select c.id from customers as c inner join orders as o on c.id=o.customerid ) ;
#内连接拿取买过东西的顾客的表，然后在主表里来一次不在上述表内的内容，
#不用name，是为了防止出现意外，若客户订单表出现重复订单，会把客户表内的数据剔除，类似情况一定要使用主键等特殊列判断

#2
select c.name as customers from 
customers as c left join orders as o on c.id=o.customerid
where o.customerid is null;
#左连接两张表，找到customerid为空的就是没有买过东西的顾客

编写SQL查询以查找每个部门中薪资最高的员工。

按任意顺序返回结果表。
查询结果格式如下例所示。

表： Employee

±-------------±--------+
| 列名 | 类型 |
±-------------±--------+
| id | int |
| name | varchar |
| salary | int |
| departmentId | int |
±-------------±--------+
id是此表的主键列。
departmentId是Department表中ID的外键。
此表的每一行都表示员工的ID、姓名和工资。它还包含他们所在部门的ID。

表： Department

±------------±--------+
| 列名 | 类型 |
±------------±--------+
| id | int |
| name | varchar |
±------------±--------+
id是此表的主键列。
此表的每一行都表示一个部门的ID及其名称。

编写SQL查询以查找每个部门中薪资最高的员工。
按任意顺序返回结果表。
查询结果格式如下例所示。

示例 1:

输入：
Employee 表:
±—±------±-------±-------------+
| id | name | salary | departmentId |
±—±------±-------±-------------+
| 1 | Joe | 70000 | 1 |
| 2 | Jim | 90000 | 1 |
| 3 | Henry | 80000 | 2 |
| 4 | Sam | 60000 | 2 |
| 5 | Max | 90000 | 1 |
±—±------±-------±-------------+
Department 表:
±—±------+
| id | name |
±—±------+
| 1 | IT |
| 2 | Sales |
±—±------+
输出：
±-----------±---------±-------+
| Department | Employee | Salary |
±-----------±---------±-------+
| IT | Jim | 90000 |
| Sales | Henry | 80000 |
| IT | Max | 90000 |
±-----------±---------±-------+
解释：Max 和 Jim 在 IT 部门的工资都是最高的，Henry 在销售部的工资最高。

#解法1 使用窗口函数
with TP as (select e.id,e.name,e.salary,e.departmentid,d.name as departmentname,
dense_rank() over( partition by e.departmentid
order by salary desc ) departRankSalary from employee as e inner join department as d 
on e.departmentid = d.id)
#获取连接总表
#表内根据工资分部门排名
select departmentname as department,name as employee,
salary from TP
where departRankSalary=1;


#解法二非窗口函数
select d.name as department ,e.name as employee,e.salary 
from employee as e left join department as d on d.id=e.departmentid
where (e.departmentid,e.salary) in (select departmentid,max(salary) from employee group by departmentid)
and e.departmentid is not null;
#将问题拆分成多个小问题，求的是部门最高工资的员工，所以我们要先弄出来最高工资的表table1，然后根据这个table1去连接后的总表筛选出

编写SQL查询以查找每个部门中薪资前三高的员工。

条件与上题一致，注意审题，前三高工资员工不是就只有三个人，否则题目变成工资

#解法一 窗口函数最简单
select departname department,name employee,salary from (select e.id,e.name,e.salary,e.departmentid,d.name departname ,
dense_rank() over(
    partition by e.departmentid 
    order by e.salary desc
) departrank
from employee as e left join department as d
on e.departmentid=d.id ) as tp where tp.departrank between 1 and 3;

游戏玩法分析

编写一个 SQL 查询，同时报告每组玩家和日期，以及玩家到目前为止玩了多少游戏。也就是说，在此日期之前玩家所玩的游戏总数。详细情况请查看示例。

查询结果格式如下所示：

Activity table:
±----------±----------±-----------±-------------+
| player_id | device_id | event_date | games_played |
±----------±----------±-----------±-------------+
| 1 | 2 | 2016-03-01 | 5 |
| 1 | 2 | 2016-05-02 | 6 |
| 1 | 3 | 2017-06-25 | 1 |
| 3 | 1 | 2016-03-02 | 0 |
| 3 | 4 | 2018-07-03 | 5 |
±----------±----------±-----------±-------------+

Result table:
±----------±-----------±--------------------+
| player_id | event_date | games_played_so_far |
±----------±-----------±--------------------+
| 1 | 2016-03-01 | 5 |
| 1 | 2016-05-02 | 11 |
| 1 | 2017-06-25 | 12 |
| 3 | 2016-03-02 | 0 |
| 3 | 2018-07-03 | 5 |
±----------±-----------±--------------------+
对于 ID 为 1 的玩家，2016-05-02 共玩了 5+6=11 个游戏，2017-06-25 共玩了 5+6+1=12 个游戏。
对于 ID 为 3 的玩家，2018-07-03 共玩了 0+5=5 个游戏。
请注意，对于每个玩家，我们只关心玩家的登录日期。

#解法 mssql
select 
    player_id, 
    event_date, 
    sum(games_played)over(partition by player_id order by event_date) games_played_so_far 
from Activity;

#普通解法
select a1.player_id,a1.event_date,sum(a2.games_played) from activity a1,activity a2 
where a1.player_id=a2.player_id and a1.event_date >=a2.event_date 
group by a1.player_id,a1.event_date;
#两个表查询，因为是两个构成主键所以，同组下当一个表段日期比另外一个日期大，我就进行加和操作

游戏玩法分析二

如上题，求

请编写一个 SQL 查询，描述每一个玩家首次登陆的设备名称

查询结果格式在以下示例中：

Activity table:
±----------±----------±-----------±-------------+
| player_id | device_id | event_date | games_played |
±----------±----------±-----------±-------------+
| 1 | 2 | 2016-03-01 | 5 |
| 1 | 2 | 2016-05-02 | 6 |
| 2 | 3 | 2017-06-25 | 1 |
| 3 | 1 | 2016-03-02 | 0 |
| 3 | 4 | 2018-07-03 | 5 |
±----------±----------±-----------±-------------+

Result table:
±----------±----------+
| player_id | device_id |
±----------±----------+
| 1 | 2 |
| 2 | 3 |
| 3 | 1 |
±----------±----------+

#
with a1 as (select player_id,(event_date) mindateevent from activity group by player_id)
select a.player_id,a.device_id from activity as a,a1 where a.player_id=a1.player_id and a.event_date=a1.mindateevent;
#查出最小年纪的主键，然后根据这个主键表查询第一次登录

求订单最多客户

Orders

±----------------±---------+
| Column Name | Type |
±----------------±---------+
| order_number | int |
| customer_number | int |
±----------------±---------+
Order_number是该表的主键。
此表包含关于订单ID和客户ID的信息。

编写一个SQL查询，为下了最多订单的客户查找 customer_number 。

测试用例生成后，恰好有一个客户比任何其他客户下了更多的订单。

查询结果格式如下所示。

示例 1:

输入:
Orders 表:
±-------------±----------------+
| order_number | customer_number |
±-------------±----------------+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 3 |
±-------------±----------------+
输出:
±----------------+
| customer_number |
±----------------+
| 3 |
±----------------+
解释:
customer_number 为 ‘3’ 的顾客有两个订单，比顾客 ‘1’ 或者 ‘2’ 都要多，因为他们只有一个订单。
所以结果是该顾客的 customer_number ，也就是 3 。

select t.customer_number from(select count(order_number) customercount ,customer_number  from orders group by customer_number 
order by customercount desc ) as t limit 0,1;

#窗口函数排名升序排名

求连续空余座位

Cinema

编写一个SQL查询来报告电影院所有连续可用的座位。

返回按 seat_id 升序排序的结果表。

测试用例的生成使得两个以上的座位连续可用。

查询结果格式如下所示。

示例 1:

输入:
Cinema 表:
±--------±-----+
| seat_id | free |
±--------±-----+
| 1 | 1 |
| 2 | 0 |
| 3 | 1 |
| 4 | 1 |
| 5 | 1 |
±--------±-----+
输出:
±--------+
| seat_id |
±--------+
| 3 |
| 4 |
| 5 |
±--------+

#解法一
select distinct c2.seat_id from cinema as c1 ,cinema as c2 
where abs(c2.seat_id-c1.seat_id)=1 and c1.free=1 and c2.free =1;
 #连续座位，先要知道成为连续作为的标准，题给出seat_id是自增主键，那么筛选连续座位id方式出来了，只要后seat_id-前seat_id =1 ，那这两个作为就是连续的，
 #所以需要两张表一张作为前seat_id，一张作为后seat_id

#解法二
select distinct a1.seat_id
from cinema as a1 left join cinema as a2
on (a1.seat_id+1=a2.seat_id or a1.seat_id-1=a2.seat_id)
where a1.free=1 and a2.free=1;
#操作连接条件，本质还是是添加连续判断标准

求每部门的工资中位数

表: Employee

±-------------±--------+
| Column Name | Type |
±-------------±--------+
| id | int |
| company | varchar |
| salary | int |
±-------------±--------+
Id是该表的主键列。
该表的每一行表示公司和一名员工的工资。

写一个SQL查询，找出每个公司的工资中位数。

以任意顺序返回结果表。

查询结果格式如下所示。

示例 1:

输入:
Employee 表:
±—±--------±-------+
| id | company | salary |
±—±--------±-------+
| 1 | A | 2341 |
| 2 | A | 341 |
| 3 | A | 15 |
| 4 | A | 15314 |
| 5 | A | 451 |
| 6 | A | 513 |
| 7 | B | 15 |
| 8 | B | 13 |
| 9 | B | 1154 |
| 10 | B | 1345 |
| 11 | B | 1221 |
| 12 | B | 234 |
| 13 | C | 2345 |
| 14 | C | 2645 |
| 15 | C | 2645 |
| 16 | C | 2652 |
| 17 | C | 65 |
±—±--------±-------+
输出:
±—±--------±-------+
| id | company | salary |
±—±--------±-------+
| 5 | A | 451 |
| 6 | A | 513 |
| 12 | B | 234 |
| 9 | B | 1154 |
| 14 | C | 2645 |
±—±--------±-------+

进阶: 你能在不使用任何内置函数或窗口函数的情况下解决它吗?

#做题，要先了解什么是中位数，即正着数在中间，倒着数也在中间
#中位数分奇数和偶数两种情况，奇数情况在最中间的，偶数情况则是中间两个
#假设目前有5和6两种表，只使用，5情况下中位数3=floor(（5+1）/2) ,6情况下中位数3=floor（（6+1）/2），4=floor（（6+2）/2）
#可以注意到向下取整，奇数情况下floor（n+1/2）和floor（n+2/2）结果是一致的
#那么排名数据在floor((n+1)/2)和floor((n+2)/2)之间就可以是中位数

with t as (select id,company,salary,row_number over(partition by company order by salary asc) ranknumber,count(id) over(partition by company) totalofcompany from employee)

select t.id,t.company,t.salary from t where t.ranknumber in (floor((t.totalofcompany+1)/2),floor((t.totalofcompany+2)/2));
 #中位数判断方式有多种，如果用开窗函数思想是一致，不同写法是中位数判断不同


#不用窗口函数解法
#表本身与自己通过company进行连接，假设A公司有6个员工，自连接后表内数据对于A公司就有36条数据，
#每位员工各自对应包括自己在内的其他员工数据6条，此时只要比较当前员工工资大于其他员工工资的出现次数，是否等于中位数计算结果就能确定当前员工是否中位数
#
with t as (select e1.id i1,e1.salary s1,e2.id i2,e2.salary s2,e1.company c1 from employee e1 join employee e2 using(company) group by e1.id  
having sum(e1.salary >=e2.salary) >=count(e1.id)/2 and sum(e1.salary<=e2.salary)>=count(e1.id)/2)
select t.i1 id,t.c1 company,t.s1 salary  from t;

#这种解法，在同一公司内有多个与中位数相同的数字后会一起展示，即无法去重，但是确实能查出中位数。我是按照员工id分组的

删除重复电子邮箱

编写一个 SQL 查询，来删除 Person 表中所有重复的电子邮箱，重复的邮箱里只保留 Id 最小的那个。

±—±-----------------+
| Id | Email |
±—±-----------------+
| 1 | john@example.com |
| 2 | bob@example.com |
| 3 | john@example.com |
±—±-----------------+
Id 是这个表的主键。
例如，在运行你的查询语句之后，上面的 Person 表应返回以下几行:

±—±-----------------+
| Id | Email |
±—±-----------------+
| 1 | john@example.com |
| 2 | bob@example.com |
±—±-----------------+

delete p1 from person p1,person p2 where p1.id>p2.id and p1.email=p2.email
#不要想的复杂，

游戏玩法分析 IV-首日连续两天登录的

Table: Activity

编写一个 SQL 查询，报告在首次登录的第二天再次登录的玩家的比率，四舍五入到小数点后两位。换句话说，您需要计算从首次登录日期开始至少连续两天登录的玩家的数量，然后除以玩家总数。

查询结果格式如下所示：

Activity table:
±----------±----------±-----------±-------------+
| player_id | device_id | event_date | games_played |
±----------±----------±-----------±-------------+
| 1 | 2 | 2016-03-01 | 5 |
| 1 | 2 | 2016-03-02 | 6 |
| 2 | 3 | 2017-06-25 | 1 |
| 3 | 1 | 2016-03-02 | 0 |
| 3 | 4 | 2018-07-03 | 5 |
±----------±----------±-----------±-------------+

Result table:
±----------+
| fraction |
±----------+
| 0.33 |
±----------+

#注意此题有坑，求的是首日开始连续两天登录的比例，而不是连续两天登录的比例，千万注意
#很容易写成这样
select 
round(count(distinct a4.id)/(select count(distinct player_id) from activity),2) fraction 
from (select distinct a.player_id as id,count(t.games_played) 
      over(partition by a.player_id) totalsum
      from (select a1.* from activity a1 inner join activity a2 
            where a1.event_date=(a2.event_date +1) 
            and a1.player_id=a2.player_id) 
      as t,activity as a
      where a.player_id = t.player_id) as a4;
      
      
#所以实际应该这么写  
select round(count(distinct t.player_id)/(select count(distinct player_id) from activity),2) fraction 
from (select *,
      first_value(event_date) over(partition by player_id order by event_date) firstdate,
      lead(event_date,1) over(partition by player_id order by event_date) nextdate from activity) t
      where t.nextdate-t.firstdate=1;
      #首日连续登录，我们要先过滤出每个人的第一次登录时间，然后才能做下一步所以first_value(partition by player_id order by event_date)同时按照id分区和日期排序
      #因为每行登陆日期不一样，所以每行都要按照第一次登录时间判断是否满足，那我们假设同组下不同行的登陆时间都是第一次登录时间，前面已经排序了，那我们只要拿到下一行的登陆时间，然后外部表通过这张表根据第一次登陆时间去比较就能筛选出当前行是否连续了
      
      #总表基础上生成一个每行添加两列数据，第一次登陆日期，以及同id的下一次登录日期的表，
      #在上面这张表上再次进行查询，按照条件拿取数据

数据库测试手段

CRUD----C创建用户、R检索视图、U更新数据、D删除数据

预备工作

数据库事务在执行前的四个基本要素是否正确，原子性、一致性、隔离性、持久性
数据完整性，考虑不同模块应用程序用不同方式使用相同数据执行CRUD，所以必须确保数据的准确和一致，这就是数据完整性
业务准确性，能否完整拉取满足用户要求的数据

测试思路

创建查询，验证数据正确和准确性，验证表段的结构
查询数据表段，查看数据表段来验证应用程序gui的操作数据库的结果，但是对于大量数据的数据库是很繁琐的，要验证多个数据表也是困难的
开发人员查询，最简单的验证方式，从workbench或者其他linux执行脚本，执行各自的sql查询，从开发人员处获得结果并验证，在查出错误情况下可以尽快修改数据库。完善业务

数据库测试

压力测试，多个业务语句并发操作
| 2016-03-02 | 0 |
| 3 | 4 | 2018-07-03 | 5 |
±----------±----------±-----------±-------------+

Result table:
±----------+
| fraction |
±----------+
| 0.33 |
±----------+

#注意此题有坑，求的是首日开始连续两天登录的比例，而不是连续两天登录的比例，千万注意
#很容易写成这样
select 
round(count(distinct a4.id)/(select count(distinct player_id) from activity),2) fraction 
from (select distinct a.player_id as id,count(t.games_played) 
      over(partition by a.player_id) totalsum
      from (select a1.* from activity a1 inner join activity a2 
            where a1.event_date=(a2.event_date +1) 
            and a1.player_id=a2.player_id) 
      as t,activity as a
      where a.player_id = t.player_id) as a4;
      
      
#所以实际应该这么写  
select round(count(distinct t.player_id)/(select count(distinct player_id) from activity),2) fraction 
from (select *,
      first_value(event_date) over(partition by player_id order by event_date) firstdate,
      lead(event_date,1) over(partition by player_id order by event_date) nextdate from activity) t
      where t.nextdate-t.firstdate=1;
      #首日连续登录，我们要先过滤出每个人的第一次登录时间，然后才能做下一步所以first_value(partition by player_id order by event_date)同时按照id分区和日期排序
      #因为每行登陆日期不一样，所以每行都要按照第一次登录时间判断是否满足，那我们假设同组下不同行的登陆时间都是第一次登录时间，前面已经排序了，那我们只要拿到下一行的登陆时间，然后外部表通过这张表根据第一次登陆时间去比较就能筛选出当前行是否连续了
      
      #总表基础上生成一个每行添加两列数据，第一次登陆日期，以及同id的下一次登录日期的表，
      #在上面这张表上再次进行查询，按照条件拿取数据

数据库测试手段

CRUD----C创建用户、R检索视图、U更新数据、D删除数据

预备工作

数据库事务在执行前的四个基本要素是否正确，原子性、一致性、隔离性、持久性
数据完整性，考虑不同模块应用程序用不同方式使用相同数据执行CRUD，所以必须确保数据的准确和一致，这就是数据完整性
业务准确性，能否完整拉取满足用户要求的数据

测试思路

创建查询，验证数据正确和准确性，验证表段的结构
查询数据表段，查看数据表段来验证应用程序gui的操作数据库的结果，但是对于大量数据的数据库是很繁琐的，要验证多个数据表也是困难的
开发人员查询，最简单的验证方式，从workbench或者其他linux执行脚本，执行各自的sql查询，从开发人员处获得结果并验证，在查出错误情况下可以尽快修改数据库。完善业务