1. Regular expressions
Search for several keywords with "or" semantics; this is equivalent to LIKE '%上海%' OR LIKE '%内蒙古%':
SELECT * FROM analysis_result WHERE result REGEXP '上海|内蒙古' LIMIT 1;
"And" semantics, written first with REGEXP and then with LIKE:
SELECT * FROM analysis_result WHERE id = 1 AND result REGEXP '上海' AND result REGEXP '云南' LIMIT 1;
SELECT * FROM analysis_result WHERE id = 1 AND result LIKE '%上海%' AND result LIKE '%云南%' LIMIT 1;
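To check that the REGEXP and LIKE forms above really select the same rows, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows are made up, and because SQLite ships without a REGEXP implementation, the sketch registers one backed by `re.search` (an assumption of this demo, not something the original queries need).

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite evaluates "value REGEXP pattern" by calling regexp(pattern, value),
# so the pattern arrives as the first argument.
def regexp(pattern, value):
    return value is not None and re.search(pattern, value) is not None

conn.create_function("REGEXP", 2, regexp)

# Hypothetical sample data, for illustration only.
conn.execute("CREATE TABLE analysis_result (id INTEGER, result TEXT)")
conn.executemany(
    "INSERT INTO analysis_result VALUES (?, ?)",
    [(1, "上海,云南"), (2, "内蒙古"), (3, "北京")],
)

def ids(sql):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]

# "or": one alternation pattern versus two LIKE clauses joined with OR
or_regexp = ids("SELECT id FROM analysis_result WHERE result REGEXP '上海|内蒙古' ORDER BY id")
or_like = ids("SELECT id FROM analysis_result WHERE result LIKE '%上海%' OR result LIKE '%内蒙古%' ORDER BY id")

# "and": two REGEXP clauses versus two LIKE clauses joined with AND
and_regexp = ids("SELECT id FROM analysis_result WHERE result REGEXP '上海' AND result REGEXP '云南' ORDER BY id")
and_like = ids("SELECT id FROM analysis_result WHERE result LIKE '%上海%' AND result LIKE '%云南%' ORDER BY id")

print(or_regexp, or_like)    # both lists should match
print(and_regexp, and_like)  # both lists should match
```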
2. Replacing strings
Replace "3G" with "MOBILE":
UPDATE analysis_result SET result=replace(result,'3G','MOBILE') WHERE result_type = 'PCNUM';
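The effect of the UPDATE above can be tried safely on throwaway data. This is a small sketch, again assuming SQLite as a stand-in (its `replace()` takes the same three arguments). The two sample rows are hypothetical. It shows that only rows matching the WHERE clause are rewritten.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE analysis_result (result TEXT, result_type TEXT)")

# Hypothetical rows: only the first one has result_type = 'PCNUM'.
conn.executemany(
    "INSERT INTO analysis_result VALUES (?, ?)",
    [("3G,WIFI", "PCNUM"), ("3G", "OTHER")],
)

# Same statement shape as in the text: replace every '3G' inside result,
# but only for rows whose result_type is 'PCNUM'.
conn.execute(
    "UPDATE analysis_result SET result = replace(result, '3G', 'MOBILE') "
    "WHERE result_type = 'PCNUM'"
)

rows = conn.execute(
    "SELECT result, result_type FROM analysis_result ORDER BY result_type"
).fetchall()
print(rows)
```

The row with result_type 'OTHER' keeps its original '3G' value, which is a quick way to confirm the WHERE clause is doing its job before running the statement on real data.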
3. BETWEEN ... AND is the same as >= combined with <=, and must never be treated as equivalent to IN
select count(1) from default.user where univname REGEXP '上海' and univyear between 2011 and 2013;
select count(1) from default.user where univname REGEXP '上海' and univyear >= 2011 and univyear <= 2013;
Both of these differ from the query below, where IN matches only the listed values rather than a range:
select count(1) from default.user where univname REGEXP '上海' and univyear in (2010, 2013);
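The difference is easy to see with counts on a tiny table. The sketch below assumes SQLite in place of Hive and a hypothetical table `demo_user` with one row per year from 2010 to 2013.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: one user per enrollment year, 2010 through 2013.
conn.execute("CREATE TABLE demo_user (id INTEGER, univyear INTEGER)")
conn.executemany(
    "INSERT INTO demo_user VALUES (?, ?)",
    [(1, 2010), (2, 2011), (3, 2012), (4, 2013)],
)

def count(where):
    """Count the rows matching the given WHERE condition."""
    sql = "SELECT count(1) FROM demo_user WHERE " + where
    return conn.execute(sql).fetchone()[0]

# BETWEEN is inclusive on both ends, so it matches 2011, 2012 and 2013.
between_count = count("univyear BETWEEN 2011 AND 2013")
# The explicit range form matches the same three rows.
range_count = count("univyear >= 2011 AND univyear <= 2013")
# IN is a list of exact values: only 2010 and 2013 match.
in_count = count("univyear IN (2010, 2013)")

print(between_count, range_count, in_count)  # 3 3 2
```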
select a.collegename, a.allcount from (select collegename, count(id) as allcount from default.user where collegename != '\N' group by collegename) a sort by a.allcount DESC
5. When a value is displayed as \N, query for it with = '\\N'; to exclude it, use != '\N'
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/result.txt' select id,name from t_test;
hive -e "select count(id), friendcount from default.user where id > 0 and birthday != '\\N' group by friendcount" > friendcount.txt
hive -e "select * from default.user where id > 0 and birthday != '\\N' limit 10000" > user.txt
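Note in particular that when filtering on some time fields, rows whose value is \N still get through.
For example, the following query lets \N rows in: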
select * from user where regtime > '2014-01-01' limit 10
So the query should be:
select * from user where regtime != '\\N' and regtime > '2014-01-01' limit 10;
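One plausible explanation, offered as an assumption rather than something the notes state: if regtime is stored as a string and the missing values are the literal two-character text \N, then the comparison is a plain string comparison, and the backslash (0x5C) sorts after the digit '2' (0x32). The sketch below reproduces that with SQLite and a hypothetical `demo_user` table. It does not model how Hive treats real NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumption: regtime is a text column, and "missing" is the literal string \N.
conn.execute("CREATE TABLE demo_user (id INTEGER, regtime TEXT)")
conn.executemany(
    "INSERT INTO demo_user VALUES (?, ?)",
    [(1, "2013-05-01"), (2, "2014-06-01"), (3, "\\N")],
)

def ids(sql, params=()):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql, params)]

# Naive filter: the \N row slips in, because '\' compares greater than '2'.
naive = ids("SELECT id FROM demo_user WHERE regtime > '2014-01-01' ORDER BY id")

# Guarded filter: exclude the placeholder first. It is passed as a bound
# parameter so that no SQL-level escaping rules are involved.
guarded = ids(
    "SELECT id FROM demo_user WHERE regtime != ? AND regtime > '2014-01-01' ORDER BY id",
    ("\\N",),
)

print(naive)    # [2, 3]
print(guarded)  # [2]
```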
6. Sorting after grouping
select a.univname, a.allcount from (select univname, count(id) as allcount from default.user where univname != '\N' group by univname) a sort by a.allcount DESC
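The group-then-sort pattern can be tried on a few rows as follows. This sketch assumes SQLite and a hypothetical `demo_user` table, and it uses ORDER BY because SQLite has no SORT BY. In Hive, SORT BY orders rows within each reducer, while ORDER BY produces a single total order, so the two only agree when there is one reducer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_user (id INTEGER PRIMARY KEY, univname TEXT)")

# Hypothetical data: three universities plus one row with the \N placeholder.
names = ["Univ A", "Univ A", "Univ A", "Univ B", "Univ C", "Univ C", "\\N"]
conn.executemany(
    "INSERT INTO demo_user (univname) VALUES (?)",
    [(name,) for name in names],
)

# The inner query groups and counts; the outer query sorts by that count.
# The \N placeholder is excluded through a bound parameter.
result = conn.execute(
    "SELECT a.univname, a.allcount FROM ("
    "  SELECT univname, count(id) AS allcount FROM demo_user"
    "  WHERE univname != ? GROUP BY univname"
    ") a ORDER BY a.allcount DESC",
    ("\\N",),
).fetchall()

print(result)
```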
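This article shows how to search for keywords (such as 上海 and 内蒙古) with regular expressions, compares LIKE and REGEXP in SQL, and explains how `between and` and `>= <=` differ from `in`. It also covers string replacement (3G to MOBILE), data filtering, and handling of the special value \N.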




