SQL数据处理

DBeaver数据库操作软件,SQL语言

SELECT …… FROM…… AS

select 表内容 form 数据库名 order by asc/desc
升序或降序排列
只能出现一个selcet

从一个/多个表中检索数据。
select 是sql中的数据查询关键词 from 和 where是后缀的定位词 。
select from【<数据库名!>】<表名>【,【<数据库名!>】<表名>……】 where<连接条件>
SELECT后跟想检索的内容,一/多个字段,即可以是字段名也可以是函数(系统自定义的);也可以是星号,表示输出表中所有的字段。
FROM是检索内容的来源,即表的名称。
WHERE是指定查询条件,把满足逻辑表达式的座位查询结果,是可选项。

SELECT …… FROM…… WHERE

如:SELECT 学号,成绩 FROM 学生选课 WHERE 学号 like ‘2006*’
在学生选课表中查询学号以2006开头的学生的学号和成绩。

select distinct :去重操作,当多行重复时,只取出一行

 如:
 select DISTINCT
t1.send_stockorg_code as org_id,
t2.mng_org_name as mng_reg_id  
from edw_ai.dim_stockorg_relationship as t1 
left join 
(select 
store_code,
mng_org_name
from edw_ai.dim_mng_org_store) as t2
on t1.receive_stockorg_code = t2.store_code 

## 

两个应用,一个是SELECT…FROM…AS…
一个是,作为连接语句的操作符。
如sql中:create table tablename as select * from tablename2;
解释:先获取tablename表中的所有记录,之后创建一张tablename表,结构和tablename2表相同,记录为后面语句的查询结果。

WITH… AS…

with 公用表达式名字 as
(select * from xxx where 条件)
select * from 公用表达式名字
如:with A as (select * from class)
select *from A
这个语句的意思就是,先执行select * from class 得到一个结果,将这个结果记录为A ,再执行select *from A 语句。A 表只是一个别名。 也就是将重复用到的大批量 的SQL语句,放到with as 中,加一个别名,在后面用到的时候就可以直接用。(也就是说,with as 类似于子函数)

对于大批量的SQL数据,起到优化的作用。

WITH的相关总结

with as 的使用范围仅限制在,with as 紧跟着的下一条SQL。之后with as 公用SQL失效。
1.使用with子句可以让子查询重用相同的with查询块,通过select调用(with子句只能被select查询块引用),一般在with查询用到多次情况下。在引用的select语句之前定义,同级只能定义with关键字只能使用一次,多个用逗号分割。
2.with子句的返回结果存到用户的临时表空间中,只做一次查询,反复使用,提高效率。
3.在同级select前有多个查询定义的时候,第1个用with,后面的不用with,并且用逗号隔开。
4.最后一个with 子句与下面的查询之间不能有逗号,只通过右括号分割,with 子句的查询必须用括号括起来。
5.如果定义了with子句,而在查询中不使用,那么会报ora-32035 错误:未引用在with子句中定义的查询名。(至少一个with查询的name未被引用,解决方法是移除未被引用的with查询),注意:只要后面有引用的就可以,不一定非要在主查询中引用,比如后面的with查询也引用了,也是可以的。
6.前面的with子句定义的查询在后面的with子句中可以使用。但是一个with子句内部不能嵌套with子句。
7.当一个查询块名字和一个表名或其他的对象相同时,解析器从内向外搜索,优先使用子查询块名字。
8.with查询的结果列有别名,引用的时候必须使用别名或*。

多表合并的时候,经常会用到 WITH xxx AS和JOIN命令,比如说:

JOIN
SELECT t1.product_code AS prod_id,
              t1.color_code AS color_id,
              t1.size_code AS size_id,
              t1.size_order,
FROM edw_ai.dim_sku AS t1
JOIN (SELECT product_year, product_quarter 
      FROM edw_ai.config_target_product 
      WHERE day_date = '2019-04-15') AS t2 
ON t1.sku_year = t2.product_year AND
  t1.sku_quarter = t2.product_quarter

注释:

  1. ’‘’用product,color,size的code来代替其id
    2.’’'在form的表里的2019-04-15一天里找到poduct_year及product_quarter
    3.JOIN ON 语句几位连接t1和t2表

WITH……AS和JOIN一起使用,

WITH……AS相当于建立了一个临时表

 WITH 
      prod_info_targ AS
      (SELECT t1.product_code AS prod_id,
                    t1.color_code AS color_id
       FROM edw_ai.dim_sku AS t1
       JOIN (SELECT product_year, product_quarter 
             FROM edw_ai.config_target_product 
             WHERE day_date = '{0}') AS t2
       ON t1.sku_year = t2.product_year AND
          t1.sku_quarter = t2.product_quarter)  
          
    SELECT t1.product_code AS prod_id,
        t1.color_code AS color_id,
        t1.size_code AS size_id,
        t1.size_code_num AS lfs
    FROM edw.skc_main_size AS t1
    JOIN prod_info_targ AS t2
    ON t1.product_code = t2.prod_id
    AND t1.color_code = t2.color_id


      with
       t1 as 
       (SELECT stockorg_code as store_id,
               org_code AS mng_reg_id
       FROM edw_ai.dim_stockorg),
       
       t2 as
       (SELECT send_stockorg_code as org_id,
               receive_stockorg_code,
               t1.store_id,
               t1.mng_reg_id
       FROM edw_ai.dim_stockorg_relationship
       join t1
       on t1.store_id = edw_ai.dim_stockorg_relationship.receive_stockorg_code
       where move_type = '发货')
      
     select store_id, mng_reg_id, org_id, receive_stockorg_code from t2

偶尔还会用到的命令:

COALESCE(t1.qty, 0)
这个函数主要用来进行空值处理。即返回其参数中第一个非空表达式,如果所有参数均为 NULL,则 COALESCE 返回 NULL。至少应有一个 Null 值为 NULL 类型。
sql = '''
		WITH 
		  prod_info_targ AS
		  (SELECT t1.product_code AS prod_id,
		          t1.color_code AS color_id,
		          t1.size_code AS size_id
		   FROM  edw_ai.dim_sku AS t1
		   JOIN (SELECT product_year, product_quarter 
		         FROM edw_ai.config_target_product 
		         WHERE day_date = '{1}') AS  t2
		   ON t1.sku_year = t2.product_year AND
		      t1.sku_quarter = t2.product_quarter),
		
		  org_info_targ AS
		  (SELECT DISTINCT stockorg_code AS org_id
		   FROM edw_ai.dim_stockorg
		   WHERE status = '正常' AND org_flag = '1')
		
		SELECT t1.product_code AS prod_id,
		       t1.color_code AS color_id,
		       t1.size_code AS size_id,
		       t1.org_code AS org_id,
		       t1.sale_date AS date_sell,
		       SUM(COALESCE(t1.qty, 0)) AS s
		FROM edw_ai.fct_sales AS t1
		JOIN prod_info_targ AS t2
		ON t1.product_code = t2.prod_id AND
		   t1.color_code = t2.color_id AND
		   t1.size_code = t2.size_id
		JOIN org_info_targ AS t3
		ON t1.org_code = t3.org_id
		     WHERE t1.sale_date BETWEEN '{0}' AND '{1}'
		GROUP BY t1.product_code, t1.color_code, t1.size_code,
		         t1.org_code, t1.sale_date
'''.format((date_decision - relativedelta(weeks=4))
           .strftime('%Y-%m-%d'),
           (date_decision - relativedelta(days=1))
           .strftime('%Y-%m-%d'))

 sales_info = spark.sql(sql).toPandas()
 sql =  '''  
     select 
       product_code as prod_id,color_code as color_id ,receive_org_code as org_id ,min(receive_date)as first_date
      from edw_ai.fct_stock_move
      where send_date BETWEEN '{0}' and '{1}'
      group by product_code,color_code,receive_org_code
      '''.format((date_decision - relativedelta(years=1))
                  .strftime('%Y-%m-%d'),
                  (date_decision - relativedelta(days=1))
                  .strftime('%Y-%m-%d'))

data_type = {'dt': ['first_date'],
              'int': [],
              'float': [],
              'str': ['prod_id', 'color_id',
                      'org_id']}
 first_rec_date = spark.sql(sql).toPandas()
 first_rec_date = self.trans_data(first_rec_date, data_type)

# 转换类型函数 def trans_data(a,b)
df_t = df.copy()
for col in data_type['dt']:
    if col in df_t.columns:
        df_t[col] = pd.to_datetime(df_t[col])

for col in data_type['int']:
    if col in df_t.columns:
        df_t[col] = df_t[col].astype('float').astype('int')

for col in data_type['float']:
    if col in df_t.columns:
        df_t[col] = df_t[col].astype('float')

for col in data_type['str']:
    if col in df_t.columns:
        df_t[col] = df_t[col].astype('str')
SELECT stockorg_code AS org_id, 
       province AS prov_id, 
       city AS city_id,
       reserved1 AS dist_id,
       org_code AS mng_reg_id,
       brand_code,
       (CASE org_flag WHEN 1 THEN 1 ELSE 0 END) AS is_store
FROM edw_ai.dim_stockorg
WHERE  status = '正常' and ((stockorg_code <> '9686'  and org_flag= 2) OR
      org_flag = 1 )

COALESCE ( expression,value1,value2……,valuen)
COALESCE()函数的第一个参数expression为待检测的表达式,而其后的参数个数不定。
COALESCE()函数将会返回包括expression在内的所有参数中的第一个非空表达式。如果
expression不为空值则返回expression;否则判断value1是否是空值,如果value1不为空值则返
回value1;否则判断value2是否是空值,如果value2不为空值则返回value3;……以此类推,
如果所有的表达式都为空值,则返回NULL。

GROUP BY
统计某一个组的信息。如:
SELECT Customer,SUM(OrderPrice) FROM Orders
GROUP BY Customer 即统计customer的orderprice之和。

BETWEEN ‘2019-04-15’ AND ‘2019-05-01’
选取介于两个值之间的数据范围。这些值可以是数值、文本或者日期。
有的是介于两者之间,不包含边界;有的是包含左界,不包含右届。

SUM(t1.qty) AS q

评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值