阿里巴巴开发手册解析个人笔记（五）Mysql规约(三）SQL语句_标准统计行数的语法,跟数据库无关,跟 null 和非 null 无关。-优快云博客

本文链接：https://blog.youkuaiyun.com/u010498753/article/details/85994623

文章目录

(三)SQL 语句

(三)SQL 语句

1. 【强制】不要使用 count(列名)或 count(常量)来替代 count()， count()是 SQL92 定义的

标准统计行数的语法，跟数据库无关，跟 NULL 和非 NULL 无关。
说明： count(*)会统计值为 NULL 的行，而 count(列名)不会统计此列为 NULL 值的行。
例子：摘自：https://blog.youkuaiyun.com/iFuMI/article/details/77920767 并做了点优化

drop table if exists counttest;
create table counttest(name varchar(255), age  varchar(255));

delete from counttest;
#插入测试数据，有的为null列，有的是空字符串
insert into counttest values ('age为14的列', '14'),('age为15的列', '15'), ('age为15的列', '15'),('age为NULL的列', NULL), ('age为16的列', '16'),('age为17的列', '17'),('age为NULL的列', null), ('age为空字符串的列', '');


select * from counttest;

select name, count(name), count(2), count(*), count(age), count(distinct(age)) from counttest group by name;

在这里插入图片描述

1.1题外话count(1) 与 count(*)有什么区别？

没区别
摘自
https://stackoverflow.com/questions/18291036/what-is-the-difference-between-count0-count1-and-count-in-mysql-sql

Nothing really, unless you specify a field in a table or an expression 
within parantheses instead of constant values or *

Let me give you a detailed answer. Count will give you non-null
 record number of given field. Say you have a table named A

select 1 from A
select 0 from A
select * from A
will all return same number of records, that is the number of rows in table A. Still the output is different. If there are 3 records in table. With X and Y as field names

select 1 from A will give you

1
1
1

select 0 from A will give you
0
0
0

select * from A will give you ( assume two columns X and Y is in the table )
X      Y
--     --
value1 value1
value2 (null)
value3 (null)
So, all three queries return the same number. Unless you use

select count(Y) from A 
since there is only one non-null value you will get 1 as output

2. 【强制】 count(distinct col) 计算该列除 NULL 之外的不重复行数

注意 count(distinct col1, col2) 如果其中一列全为 NULL，那么即使另一列有不同的值，也返回为 0。

drop table if exists counttest;
create table counttest(name varchar(255), age  varchar(255));

delete from counttest;
#插入测试数据，有的为null列，有的是空字符串
insert into counttest values ('age为14的列', '14'),('age为15的列', '15'), ('age为15的列', '15'),('age为NULL的列', NULL), ('age为16的列', '16'),('age为17的列', '17'),('age为NULL的列', null), ('age为空字符串的列', '');


select * from counttest;

select name, count(name), count(2), count(*), count(age), count(distinct(age)) from counttest group by name;

#这个最后一列的结果也是一样的，我觉的应该是不标准的sql导致的。
select name, count(name), count(2), count(*), count(age), count(distinct(age)),count(distinct age,name) from counttest group by name;

在这里插入图片描述

3. 【强制】当某一列的值全是 NULL 时， count(col)的返回结果为 0，但 sum(col)的返回结果为NULL，因此使用 sum()时需注意 NPE 问题。

正例：可以使用如下方式来避免 sum 的 NPE 问题： SELECT IF(ISNULL(SUM(g)),0,SUM(g))
FROM table; 所有聚合函数同理
例子程序

drop table if exists counttest;
create table counttest(name varchar(255), age  varchar(255));

delete from counttest;
#插入测试数据，有的为null列，有的是空字符串
insert into counttest values ('age为14的列', '14'),('age为15的列', '15'), ('age为15的列', '15'),('age为NULL的列', NULL), ('age为16的列', '16'),('age为17的列', '17'),('age为NULL的列', null), ('age为空字符串的列', '');


select * from counttest;

select name, count(name), count(2), count(*), count(age), count(distinct(age)),count(distinct age,name),SUM(age),sum(IFNULL(age,0)) from counttest group by name;

在这里插入图片描述

4. 【强制】使用 ISNULL()来判断是否为 NULL 值。

说明： NULL 与任何值的直接比较都为 NULL。
1） NULL<>NULL 的返回结果是 NULL，而不是 false。
2） NULL=NULL 的返回结果是 NULL，而不是 true。
3） NULL<>1 的返回结果是 NULL，而不是 true。

NULL与任何值比较(>,<,!=,!,+,-)时在mysql都返回null，不是0或者1，所以在sql查询中要注意为NULL的列

drop table if exists counttest;
create table counttest(name varchar(255), age  varchar(255));

delete from counttest;
#插入测试数据，有的为null列，有的是空字符串
insert into counttest values ('age为14的列', '14'),('age为15的列', '15'), ('age为15的列', '15'),('age为NULL的列', NULL), ('age为16的列', '16'),('age为17的列', '17'),('age为NULL的列', null), ('age为空字符串的列', '');
select * from counttest;

#NULL 是空指针，可以是一个黑箱，黑箱里可能藏着很多东西，但他不是确定的值 
#一共七行，age有 2列为NULL，5列不为NULL
select name,age from counttest where age = NULL #null有2行以为能找到
select name,age from counttest where age != NULL #null有5行以为能找到 ，而且返回的是NULL
select name,age from counttest where isnull(age) #正确返回


#都是NULL，而不是0或者1
select NULL=NULL,NULL!=NULL,NULL<>NULL,NULL<>1

5. 【强制】在代码中写分页查询逻辑时，若 count 为 0 应直接返回，避免执行后面的分页语句。

我们前面说了 limit 10000,20的原理是把1万20行找出来，再把前1万行去掉，这里规范说明的意思应该是在开始时

drop table if exists counttest;
create table counttest(name varchar(255), age  varchar(255));

delete from counttest;
#插入测试数据，有的为null列，有的是空字符串

#为0的数据了
select count(*) from counttest

#白白浪费了搜索5000条数据的效率
select * from counttest limit 0,5000

6. 【强制】不得使用外键与级联，一切外键概念必须在应用层解决。

说明：以学生和成绩的关系为例，学生表中的 student_id是主键，那么成绩表中的 student_id
则为外键。如果更新学生表中的 student_id，同时触发成绩表中的 student_id 更新，即为
级联更新。外键与级联更新适用于单机低并发，不适合分布式、高并发集群；级联更新是强阻
塞，存在数据库更新风暴的风险；外键影响数据库的插入速度。
转载来自：https://www.zhihu.com/question/19600081/answer/13295957

在这里插入图片描述

7. 【强制】禁止使用存储过程，存储过程难以调试和扩展，更没有移植性。

存储过程大部分是为了清理脏数据，不要用他来控制业务逻辑。
若必须要调试mysql，使用

dbForge Studio for MySQL

8. 【强制】数据订正（特别是删除、修改记录操作）时，要先 select，避免出现误删除，确认无误才能执行更新语句。

一旦delete或者update，数据库无法回滚找回，没有后悔药，经常有些人 update或者delete时忘记写条件，一旦发现后悔莫及，又经常忘记备份数据库，导致爆炸

select column_x from table_a,table_b ; #确认一下这一句的正确性
#再执行这一句
insert table(column_y) select column_x from table_a,table_b

select column_x from table_a,table_b ; #确认一下这一句的正确性
#再执行 
update table_a,table_b set column_x=k ; #注意select 和update的方法不一样

column_y 与column_x 的数据状态要一致

9. 【推荐】 in 操作能避免则避免，若实在避免不了，需要仔细评估 in 后边的集合元素数量，控制在 1000 个之内。

我们来查看一下 http://www.mysqltutorial.org/mysql-exists/
1、IN()语句内部工作原理
IN()只执行一次，它查出B表中的所有id字段并缓存起来。之后，检查A表的id是否与B表中的id相等，如果相等则将A表的记录加入结果集中，直到遍历完A表的所有记录。
摘自：
https://www.jianshu.com/p/f212527d76ff
http://www.mysqltutorial.org/mysql-exists/

select * from A where id in (select id from B);

select * from A where exists (select 1 from B where A.id=B.id);
in的过程它的查询过程类似于以下过程：
select * from A where id in (select id from B);
IN()只执行一次，它查出B表中的所有id字段并缓存起来。之后，检查A表的id是否与B表中的id相等，如果相等则将A表的记录加入结果集中，直到遍历完A表的所有记录。
它的查询过程类似于以下过程：

List resultSet={};
Array A=(select * from A);
Array B=(select id from B);

for(int i=0;i<A.length;i++) {
  for(int j=0;j<B.length;j++) {
      if(A[i].id==B[j].id) {
        resultSet.add(A[i]);
        break;
      }
  }
}
return resultSet;

2、EXISTS()语句内部工作原理
exists()会执行A.length次，它并不缓存exists()结果集，因为exists()结果集的内容并不重要，重要的是其内查询语句的结果集空或者非空，空则返回false，非空则返回true。
它的查询过程类似于以下过程：

select * from A where exists (select 1 from B where A.id=B.id);

List resultSet={};
Array A=(select * from A);

for(int i=0;i<A.length;i++) {
   if(exists(A[i].id) {  //执行select 1 from B where B.id=A.id是否有记录返回
       resultSet.add(A[i]);
   }
}
return resultSet;

当B表比A表数据大时适合使用exists()，因为它没有那么多遍历操作，只需要再执行一次查询就行。

例1：A表有10000条记录，B表有1000000条记录，那么exists()会执行10000次去判断A表中的id是否与B表中的id相等。
例2：A表有10000条记录，B表有100000000条记录，那么exists()还是执行10000次，因为它只执行A.length次，可见B表数据越多，越适合exists()发挥效果。
例3：A表有10000条记录，B表有100条记录，那么exists()还是执行10000次，还不如使用in()遍历10000*100次，因为in()是在内存里遍历比较，而exists()需要查询数据库，我们都知道查询数据库所消耗的性能更高，而内存比较很快。

结论：EXISTS()适合B表比A表数据大的情况