1. Regular expressions
Search for several keywords with OR. This is equivalent to result LIKE '%上海%' OR result LIKE '%内蒙古%':
SELECT * FROM analysis_result WHERE result REGEXP '上海|内蒙古' LIMIT 1;
AND: the row must contain every keyword. The REGEXP form and the LIKE form below are equivalent:
SELECT * FROM analysis_result WHERE id = 1 AND result REGEXP '上海' AND result REGEXP '云南' LIMIT 1;
SELECT * FROM analysis_result WHERE id = 1 AND result LIKE '%上海%' AND result LIKE '%云南%' LIMIT 1;
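The OR/AND equivalences above can be checked with a minimal sketch. It uses Python's built-in sqlite3 as a stand-in engine, not MySQL or Hive. SQLite has no built-in REGEXP, so the sketch registers a `regexp` function by hand, and the sample rows are hypothetical.

```python
import re
import sqlite3

# SQLite rewrites "X REGEXP Y" as a call to regexp(Y, X): pattern first, value second.
def regexp(pattern, value):
    return value is not None and re.search(pattern, value) is not None

conn = sqlite3.connect(":memory:")
conn.create_function("regexp", 2, regexp)
conn.execute("CREATE TABLE analysis_result (id INTEGER, result TEXT)")
conn.executemany(
    "INSERT INTO analysis_result VALUES (?, ?)",
    [(1, "上海,云南"), (2, "内蒙古"), (3, "北京")],  # hypothetical sample rows
)

def ids(sql):
    return [row[0] for row in conn.execute(sql)]

# OR: one alternation pattern matches the same rows as LIKE ... OR LIKE ...
or_regexp = ids("SELECT id FROM analysis_result WHERE result REGEXP '上海|内蒙古' ORDER BY id")
or_like = ids("SELECT id FROM analysis_result "
              "WHERE result LIKE '%上海%' OR result LIKE '%内蒙古%' ORDER BY id")

# AND: one condition per keyword, written with either operator
and_regexp = ids("SELECT id FROM analysis_result "
                 "WHERE result REGEXP '上海' AND result REGEXP '云南' ORDER BY id")
and_like = ids("SELECT id FROM analysis_result "
               "WHERE result LIKE '%上海%' AND result LIKE '%云南%' ORDER BY id")

print(or_regexp, or_like)    # [1, 2] [1, 2]
print(and_regexp, and_like)  # [1] [1]
```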
2. Replacing substrings
Replace "3G" with "MOBILE" in result, only for rows whose result_type is PCNUM:
UPDATE analysis_result SET result=replace(result,'3G','MOBILE') WHERE result_type = 'PCNUM';
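The following minimal sketch runs the same UPDATE against SQLite through Python's sqlite3 as a stand-in for MySQL. SQLite's replace(X, Y, Z) also substitutes every occurrence. The table contents are hypothetical. It shows that only rows selected by the WHERE clause are rewritten.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE analysis_result (result_type TEXT, result TEXT)")
conn.executemany(
    "INSERT INTO analysis_result VALUES (?, ?)",
    [("PCNUM", "3G:10,WIFI:5"), ("OTHER", "3G:7")],  # hypothetical rows
)

# replace() rewrites every '3G' in result; WHERE restricts it to PCNUM rows.
conn.execute(
    "UPDATE analysis_result SET result = replace(result, '3G', 'MOBILE') "
    "WHERE result_type = 'PCNUM'"
)

rows = conn.execute(
    "SELECT result_type, result FROM analysis_result ORDER BY result_type"
).fetchall()
print(rows)  # [('OTHER', '3G:7'), ('PCNUM', 'MOBILE:10,WIFI:5')]
```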
3. BETWEEN ... AND is the same as >= ... AND <= (both ends inclusive), and is never the same as IN
select count(1) from default.user where univname REGEXP '上海' and univyear between 2011 and 2013;
select count(1) from default.user where univname REGEXP '上海' and univyear >= 2011 and univyear <= 2013;
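Both differ from the following query. IN matches only the listed values (exactly 2010 and 2013), not the range between them: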
select count(1) from default.user where univname REGEXP '上海' and univyear in (2010, 2013);
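The difference can be checked with a small sketch that uses Python's sqlite3 as a stand-in for Hive. The table holds hypothetical rows, one per year from 2009 to 2014. BETWEEN and the >=/<= pair both count the three years 2011 to 2013. IN counts only the two listed years.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (univname TEXT, univyear INTEGER)")
conn.executemany(
    "INSERT INTO user VALUES (?, ?)",
    [("上海大学", y) for y in range(2009, 2015)],  # hypothetical: one row per year 2009..2014
)

def count(where):
    return conn.execute(f"SELECT count(1) FROM user WHERE {where}").fetchone()[0]

between = count("univyear BETWEEN 2011 AND 2013")        # 2011, 2012, 2013
range_ = count("univyear >= 2011 AND univyear <= 2013")  # the same three rows
in_list = count("univyear IN (2010, 2013)")              # only 2010 and 2013

print(between, range_, in_list)  # 3 3 2
```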
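4. GROUP BY error: Expression Not In Group By Key
Hive raises this error when a selected column is neither part of the GROUP BY key nor inside an aggregate function. The query below aggregates in a subquery, where every selected column is a grouping key or an aggregate, and sorts the results in the outer query: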
select a.collegename, a.allcount from (select collegename, count(id) as allcount from default.user where collegename != '\N' group by collegename) a sort by a.allcount DESC;
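Here is a minimal sketch of the aggregate-in-a-subquery pattern, using Python's sqlite3 as a stand-in engine. SQLite does not enforce Hive's GROUP BY rule, so the sketch only illustrates the query shape. It uses ORDER BY in place of Hive's SORT BY, and the college names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER, collegename TEXT)")
conn.executemany(
    "INSERT INTO user VALUES (?, ?)",
    [(1, "A"), (2, "B"), (3, "B"), (4, "C"), (5, "C"), (6, "C")],  # hypothetical rows
)

# Inner query: each selected column is a GROUP BY key or an aggregate.
# Outer query: sorts the already-aggregated rows by the alias allcount.
sql = """
SELECT a.collegename, a.allcount
FROM (SELECT collegename, count(id) AS allcount
      FROM user
      GROUP BY collegename) a
ORDER BY a.allcount DESC
"""
rows = conn.execute(sql).fetchall()
print(rows)  # [('C', 3), ('B', 2), ('A', 1)]
```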
5. Values that display as \N (Hive's default NULL marker in text files): match them with = '\\N'. For the negated condition, use != '\N'.
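To export query results, write them to a local directory or redirect the output of hive -e. Note that Hive creates /tmp/result.txt as a directory containing the output files, not as a single file:
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/result.txt' select id,name from t_test;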
hive -e "select count(id), friendcount from default.user where id > 0 and birthday != '\\N' group by friendcount" > friendcount.txt
hive -e "select * from default.user where id > 0 and birthday != '\\N' limit 10000" > user.txt
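Note in particular that when filtering some time fields, rows containing \N still pass the filter. For example, the following query returns \N rows. When regtime is stored as a string, it is compared character by character, and the backslash sorts after every digit, so '\N' > '2014-01-01' is true:
select * from user where regtime > '2014-01-01' limit 10;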
So exclude \N explicitly:
select * from user where regtime != '\\N' and regtime > '2014-01-01' limit 10;
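The string-comparison pitfall can be reproduced with a small sketch. It assumes regtime is a text column that holds the literal two-character string \N for missing values, and it uses Python's sqlite3 as a stand-in for Hive. The rows are hypothetical. SQLite string literals do not treat backslash as an escape, so '\N' in the SQL text below is already backslash plus N, unlike Hive's '\\N'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: regtime is TEXT, and "\N" marks a missing value.
conn.execute("CREATE TABLE user (id INTEGER, regtime TEXT)")
conn.executemany(
    "INSERT INTO user VALUES (?, ?)",
    [(1, "2013-06-01"), (2, "2014-03-15"), (3, "\\N")],
)

# '\' (0x5C) sorts after the digit '2' (0x32), so the \N row passes the date filter.
print("\\N" > "2014-01-01")  # True

naive = [r[0] for r in conn.execute(
    "SELECT id FROM user WHERE regtime > '2014-01-01' ORDER BY id")]

# Excluding the marker first removes it from the result.
fixed = [r[0] for r in conn.execute(
    "SELECT id FROM user WHERE regtime != '\\N' AND regtime > '2014-01-01' ORDER BY id")]

print(naive, fixed)  # [2, 3] [2]
```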
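6. Sorting after grouping
In Hive, SORT BY only orders rows within each reducer. For a single globally sorted result, use ORDER BY instead.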
select a.univname, a.allcount from (select univname, count(id) as allcount from default.user where univname != '\N' group by univname) a sort by a.allcount DESC;