说明
Hive、Phoenix、Mysql、Oracle、SparkSQL、FlinkSQL函数
目录
2、函数大全
a、关系运算:
- 等值比较: =
- 不等值比较: <>
- 小于比较: <
- 小于等于比较: <=
- 大于比较: >
- 大于等于比较: >=
- 空值判断: is null
- 非空判断: is not null
- like比较: like
- java的like操作: rlike
- regexp操作: regexp
b、数学运算:
- 加法操作: +
- 减法操作: -
- 乘法操作: *
- 除法操作: /
- 取余操作: %
- 位与操作: &
- 位或操作: |
- 位异或操作: ^
- 位取反操作: ~
c、逻辑运算:
- 逻辑与操作: and
- 逻辑或操作: or
- 逻辑非操作: not
d、数值计算:
- 取整函数: round
- 指定精度取整函数: round
- 向下取整函数: floor
- 向上取整函数: ceil
- 向上取整函数: ceiling
- 取随机数函数: rand
- 自然指数函数: exp
- 以10为底对数函数: log10
- 以2为底对数函数: log2
- 对数函数: log
- 幂运算函数: pow
- 幂运算函数: power
- 开平方函数: sqrt
- 二进制函数: bin
- 十六进制函数: hex
- 反转十六进制函数: unhex
- 进制转换函数: conv
- 绝对值函数: abs
- 正取余函数: pmod
- 正弦函数: sin
- 反正弦函数: asin
- 余弦函数: cos
- 反余弦函数: acos
- positive函数: positive
- negative函数: negative
e、日期函数:
- unix时间戳转日期函数: from_unixtime
- 获取当前unix时间戳函数: unix_timestamp
- 日期转unix时间戳函数: unix_timestamp
- 指定格式日期转unix时间戳函数: unix_timestamp
- 日期时间转日期函数: to_date
- 日期转年函数: year
- 日期转月函数: month
- 日期转天函数: day
- 日期转小时函数: hour
- 日期转分钟函数: minute
- 日期转秒函数: second
- 日期转周函数: weekofyear
- 日期比较函数: datediff
- 日期增加函数: date_add
- 日期减少函数: date_sub
f、条件函数:
- if函数: if
- 非空查找函数: coalesce
- 条件判断函数:case
- 条件判断函数:case
g、字符串函数:
- 字符串长度函数:length
- 字符串反转函数:reverse
- 字符串连接函数:concat
- 带分隔符字符串连接函数:concat_ws
- 字符串截取函数:substr,substring
- 字符串截取函数:substr,substring
- 字符串转大写函数:upper,ucase
- 字符串转小写函数:lower,lcase
- 去空格函数:trim
- 左边去空格函数:ltrim
- 右边去空格函数:rtrim
- 正则表达式替换函数:regexp_replace
- 正则表达式解析函数:regexp_extract
- url解析函数:parse_url
- json解析函数:get_json_object
- 空格字符串函数:space
- 重复字符串函数:repeat
- 首字符ascii函数:ascii
-左补足函数:lpad
-右补足函数:rpad
-分割字符串函数: split
-集合查找函数: find_in_set
h、集合统计函数:
-个数统计函数: count
-总和统计函数: sum
-平均值统计函数: avg
-最小值统计函数: min
-最大值统计函数: max
-非空集合总体变量函数: var_pop
-非空集合样本变量函数: var_samp
-总体标准偏离函数: stddev_pop
-样本标准偏离函数: stddev_samp
-中位数函数: percentile
-中位数函数: percentile
-近似中位数函数: percentile_approx
-近似中位数函数: percentile_approx
-直方图: histogram_numeric
i、复合类型构建操作:
-map类型构建: map
-struct类型构建: struct
-array类型构建: array
j、复杂类型访问操作:
-array类型访问: a[n]
-map类型访问: m[key]
-struct类型访问: s.x
k、复杂类型长度统计函数:
-map类型长度函数: size(map
-array类型长度函数: size(array
-类型转换函数:cast
l、序号函数:
-唯一行号:row_number
-跳跃序号:rank
-连续序号:dense_rank
-分组函数:ntile
m、窗口函数:
-开窗函数:over
n、脚本函数:
-python脚本:transform
3、自定义函数
1、查看支持哪些函数
show functions;
2、查看函数描述
desc function 函数名;
desc function extended 函数名;
二、函数大全
hive> select 1 from ljc_dual where 1=1;
1
操作类型:所有基本类型
描述:如果 A 和 B 都是非 NULL 值,则返回结果和=一样,如果两者都为 NULL,返回 TRUE, 如果有一个为 NULL,则返回 FALSE。
举例:
hive> select 1 from ljc_dual where NULL <=> NULL;
OK
1
hive> select 1 from ljc_dual where 1 <> 2;
1
hive> select 1 from ljc_dual where 1 < 2;
1
hive> select 1 from ljc_dual where 1 <= 1;
1
hive> select 1 from ljc_dual where 2 > 1;
1
hive> select 1 from ljc_dual where 1 >= 1;
1
hive> select * from ljc_dual;
ok
2011111209 00:00:00 2011111209
hive> select a,b,a<b,a>b,a=b from ljc_dual;
2011111209 00:00:00 2011111209 false true false
操作类型: 所有类型
描述: 如果 A、B、C 有任一个为 NULL,则返回 FALSE. 等价于 B <= A < C. 举例:
hive> select 1 from ljc_dual where 1 between 1 and 2;
OK
1
hive> select 1 from ljc_dual where null is null;
1
hive> select 1 from ljc_dual where 1 is not null;
1
hive> select 1 from ljc_dual where 'football' like 'foot%';
1
hive> select 1 from ljc_dual where 'football' like 'foot____';
1
hive> select 1 from ljc_dual where not 'football' like 'fff%';
1
# 是否以f开头r结尾
hive> select 1 from ljc_dual where 'footbar’ rlike '^f.*r$’;
1
# 判断一个字符串是否全为数字:
hive>select 1 from ljc_dual where '123456' rlike '^\\d+$';
1
hive> select 1 from ljc_dual where '123456aa' rlike '^\\d+$';
# 查找name中是否有中文
hive>select 1 from ljc_dual where name rlike '^[\\u4e00-\\u9fa5]+$';
1
# 是否以f开头r结尾
hive> select 1 from ljc_dual where 'footbar' REGEXP '^f.*r$';
1
# 取create_date_wid不为数值的记录
select count(*) from log_adx_request where create_date_wid not regexp '\\d{8}'
与下面查询的效果是等效的:
select count(*) from olap_b_dw_hotelorder_f where create_date_wid not rlike '\\d{8}';
hive> select 1 + 9 from ljc_dual;
10
hive> create table ljc_dual as select 1 + 1.2 from ljc_dual;
hive> describe ljc_dual;
_c0 double
hive> select 10 – 5 from ljc_dual;
5
hive> create table ljc_dual as select 5.6 – 4 from ljc_dual;
hive> describe ljc_dual;
_c0 double
hive> select 40 * 5 from ljc_dual;
200
hive> select 40 / 5 from ljc_dual;
8.0
hive>select ceil(28.0/6.999999999999999999999) from ljc_dual limit 1;
hive>select ceil(28.0/6.99999999999999) from ljc_dual limit 1;
hive> select 41 % 5 from ljc_dual;
1
hive> select 8.4 % 4 from ljc_dual;
0.40000000000000036
hive> select round(8.4 % 4 , 2) from ljc_dual;
0.4
hive> select 4 & 8 from ljc_dual;
0
hive> select 6 & 4 from ljc_dual;
4
hive> select 4 | 8 from ljc_dual;
12
hive> select 6 | 8 from ljc_dual;
14
hive> select 4 ^ 8 from ljc_dual;
12
hive> select 6 ^ 4 from ljc_dual;
2
hive> select ~6 from ljc_dual;
-7
hive> select ~4 from ljc_dual;
-5
hive> select 1 from ljc_dual where 1=1 and 2=2;
1
hive> select 1 from ljc_dual where 1=2 or 2=2;
1
hive> select 1 from ljc_dual where not 1=2;
1
hive> select round(3.1415926) from ljc_dual;
3
hive> select round(3.5) from ljc_dual;
4
hive> create table ljc_dual as select round(9542.158) from ljc_dual;
hive> describe ljc_dual;
_c0 bigint
hive> select round(3.1415926,4) from ljc_dual;
3.1416
hive> select floor(3.1415926) from ljc_dual;
3
hive> select floor(25) from ljc_dual;
25
hive> select ceil(3.1415926) from ljc_dual;
4
hive> select ceil(46) from ljc_dual;
46
hive> select ceiling(3.1415926) from ljc_dual;
4
hive> select ceiling(46) from ljc_dual;
46
hive> select rand() from ljc_dual;
0.5577432776034763
hive> select rand() from ljc_dual;
0.6638336467363424
hive> select rand(100) from ljc_dual;
0.7220096548596434
hive> select rand(100) from ljc_dual;
0.7220096548596434
hive> select exp(2) from ljc_dual;
7.38905609893065
hive> select ln(7.38905609893065) from ljc_dual;
2.0
hive> select log10(100) from ljc_dual;
2.0
hive> select log2(8) from ljc_dual;
3.0
hive> select log(4,256) from ljc_dual;
4.0
hive> select pow(2,4) from ljc_dual;
16.0
hive> select power(2,4) from ljc_dual;
16.0
hive> select sqrt(16) from ljc_dual;
4.0
hive> select bin(7) from ljc_dual;
111
hive> select hex(17) from ljc_dual;
11
hive> select hex(‘abc’) from ljc_dual;
616263
hive> select unhex(‘616263’) from ljc_dual;
abc
hive> select unhex(‘11’) from ljc_dual;
-
hive> select unhex(616263) from ljc_dual;
abc
hive> select conv(17,10,16) from ljc_dual;
11
hive> select conv(17,10,2) from ljc_dual;
10001
hive> select abs(-3.9) from ljc_dual;
3.9
hive> select abs(10.9) from ljc_dual;
10.9
hive> select pmod(9,4) from ljc_dual;
1
hive> select pmod(-9,4) from ljc_dual;
3
hive> select sin(0.8) from ljc_dual;
0.7173560908995228
hive> select asin(0.7173560908995228) from ljc_dual;
0.8
hive> select cos(0.9) from ljc_dual;
0.6216099682706644
hive> select acos(0.6216099682706644) from ljc_dual;
0.9
hive> select positive(-10) from ljc_dual;
-10
hive> select positive(12) from ljc_dual;
12
hive> select negative(-5) from ljc_dual;
5
hive> select negative(8) from ljc_dual;
-8
hive> select from_unixtime(1323308943,'yyyyMMdd') from ljc_dual;
20111208
hive> select unix_timestamp() from ljc_dual;
1323309615
hive> select unix_timestamp('2011-12-07 13:01:03') from ljc_dual;
1323234063
hive> select unix_timestamp('20111207 13:01:03','yyyyMMdd HH:mm:ss') from ljc_dual;
1323234063
hive> select to_date('2011-12-08 10:03:01') from ljc_dual;
2011-12-08
hive> select year('2011-12-08 10:03:01') from ljc_dual;
2011
hive> select year('2012-12-08') from ljc_dual;
2012
hive> select month('2011-12-08 10:03:01') from ljc_dual;
12
hive> select month('2011-08-08') from ljc_dual;
8
hive> select day('2011-12-08 10:03:01') from ljc_dual;
8
hive> select day('2011-12-24') from ljc_dual;
24
hive> select hour('2011-12-08 10:03:01') from ljc_dual;
10
hive> select minute('2011-12-08 10:03:01') from ljc_dual;
3
hive> select second('2011-12-08 10:03:01') from ljc_dual;
1
hive> select weekofyear('2011-12-08 10:03:01') from ljc_dual;
49
hive> select datediff('2012-12-08','2012-05-09') from ljc_dual;
213
hive> select date_add('2012-12-08',10) from ljc_dual;
2012-12-18
hive> select date_sub('2012-12-08',10) from ljc_dual;
2012-11-28
参数:string/date:不支持中文格式的(yyyy年mm月dd日),支持yyyy-mm-dd格式的
返回值:string
hive> select date_format('2011-12-08 10:03:01','yyyy-mm') from task31_table limit 1;
2011-03
hive> select if(1=2,100,200) from ljc_dual;
200
hive> select if(1=1,100,200) from ljc_dual;
100
hive> select COALESCE(null,'100','50′) from ljc_dual;
100
hive> Select case 100 when 50 then 'tom' when 100 then 'mary' else 'tim' end from ljc_dual;
mary
hive> Select case 200 when 50 then 'tom' when 100 then 'mary' else 'tim' end from ljc_dual;
tim
hive> select case when 1=2 then 'tom' when 2=2 then 'mary' else 'tim' end from ljc_dual;
mary
hive> select case when 1=1 then 'tom' when 2=2 then 'mary' else 'tim' end from ljc_dual;
tom
hive> select length('abcedfg') from ljc_dual;
7
hive> select reverse(abcedfg’) from ljc_dual;
gfdecba
hive> select concat(‘abc’,'def’,'gh’) from ljc_dual;
abcdefgh
d、带分隔符字符串连接函数:concat_ws
hive> select concat_ws(',','abc','def','gh') from ljc_dual;
abc,def,gh
hive> select substr('abcde',3) from ljc_dual;
cde
hive> select substring('abcde',3) from ljc_dual;
cde
hive> select substr('abcde',-1) from ljc_dual; (和ORACLE相同)
e
hive> select substr('abcde',3,2) from ljc_dual;
cd
hive> select substring('abcde',3,2) from ljc_dual;
cd
hive>select substring('abcde',-2,2) from ljc_dual;
de
hive> select upper('abSEd') from ljc_dual;
ABSED
hive> select ucase('abSEd') from ljc_dual;
ABSED
hive> select lower('abSEd') from ljc_dual;
absed
hive> select lcase('abSEd') from ljc_dual;
absed
hive> select trim(' abc ') from ljc_dual;
abc
hive> select ltrim(' abc ') from ljc_dual;
abc
hive> select rtrim(' abc ') from ljc_dual;
abc
hive> select regexp_replace('foobar', 'oo|ar', '') from ljc_dual;
fb
第二参数: 需要匹配的正则表达式
第三个参数:
0是显示与之匹配的整个字符串
1 是显示第一个括号里面的
2 是显示第二个括号里面的字段
hive> select regexp_extract('foothebar', 'foo(.*?)(bar)', 1) from ljc_dual;
the
hive> select regexp_extract('foothebar', 'foo(.*?)(bar)', 2) from ljc_dual;
bar
hive> select regexp_extract('foothebar', 'foo(.*?)(bar)', 0) from ljc_dual;
foothebar
select data_field,
regexp_extract(data_field,'.*?bgStart\\=([^&]+)',1) as aaa,
regexp_extract(data_field,'.*?contentLoaded_headStart\\=([^&]+)',1) as bbb,
regexp_extract(data_field,'.*?AppLoad2Req\\=([^&]+)',1) as ccc
from pt_nginx_loginlog_st
where pt = '2012-03-26' limit 2;
n、url解析函数:parse_url
hive> select parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST') from ljc_dual;
facebook.com
hive> select parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY', 'k1') from ljc_dual;
v1
hive> select get_json_object('{"store":
> {"fruit":\[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],
> "bicycle":{"price":19.95,"color":"red"}
> },
> "email":"amy@only_for_json_udf_test.net",
> "owner":"amy"
> }
> ','$.owner') from ljc_dual;
amy
hive> select space(10) from ljc_dual;
hive> select length(space(10)) from ljc_dual;
10
hive> select repeat('abc',5) from ljc_dual;
abcabcabcabcabc
hive> select ascii('abcde') from ljc_dual;
97
hive> select lpad('abc',10,'td') from ljc_dual;
tdtdtdtabc
hive> select rpad('abc',10,'td') from ljc_dual;
abctdtdtdt
# 返回为一个数组
hive> select split('a,b,c,d',',') from ljc_dual;
["a","b","c","d"]
# 截取字符串中的某个值
hive> select split('a,b,c,d',',')[0] from ljc_dual;
["a","b","c","d"]
a
# 特殊字符处理,特殊字符有. \ |等,regex 为字符串匹配的参数,所以遇到特殊字符的时候需要做特殊的处理,以下"."点为例
hive> select split('192.168.0.1','.') from ljc_dual;
[]
# 正确的写法
hive> select split('192.168.0.1','\\.') from ljc_dual;
["192","168","0","1"]
v、集合查找函数: find_in_set
hive> select find_in_set('ab','ef,ab,de') from ljc_dual;
2
hive> select find_in_set('at','ef,ab,de') from ljc_dual;
0
hive> select count(*) from ljc_dual;
20
hive> select count(distinct t) from ljc_dual;
10
hive> select sum(t) from ljc_dual;
100
hive> select sum(distinct t) from ljc_dual;
70
hive> select avg(t) from ljc_dual;
50
hive> select avg (distinct t) from ljc_dual;
30
hive> select min(t) from ljc_dual;
20
hive> select max(t) from ljc_dual;
120
select percentile(score,<0.2,0.4>) from ljc_dual; 取0.2,0.4位置的数据
hive> select histogram_numeric(100,5) from ljc_dual;
[{"x":100.0,"y":1.0}]
hive> Create table ljc_test as select map('100','tom','200','mary') as t from ljc_dual;
hive> describe ljc_test;
t map<string,string>
hive> select t from ljc_test;
{"100":"tom","200":"mary"}
hive> create table ljc_test as select struct('tom','mary','tim') as t from ljc_dual;
hive> describe ljc_test;
t struct<col1:string,col2:string,col3:string>
hive> select t from ljc_test;
{"col1":"tom","col2":"mary","col3":"tim"}
hive> create table ljc_test as select array("tom","mary","tim") as t from ljc_dual;
hive> describe ljc_test;
t array
hive> select t from ljc_test;
["tom","mary","tim"]
hive> create table ljc_test as select array("tom","mary","tim") as t from ljc_dual;
hive> select t[0],t[1],t[2] from ljc_test;
tom mary tim
hive> Create table ljc_test as select map('100','tom','200','mary') as t from ljc_dual;
hive> select t['200'],t['100'] from ljc_test;
mary tom
hive> create table ljc_test as select struct('tom','mary','tim') as t from ljc_dual;
hive> describe ljc_test;
t struct<col1:string,col2:string,col3:string>
hive> select t.col1,t.col3 from ljc_test;
tom tim
hive> select size(map('100','tom','101','mary')) from ljc_dual;
2
hive> select size(array('100','101','102','103')) from ljc_dual;
4
举例
select userid,month,money,row_number() over(partition by userid order by money) rownumber from user;
select userid,month,money,rank() over(partition by userid order by money) rownumber from user;
select userid,month,money,dense_rank() over(partition by userid order by money) rownumber from user;
select money,userid,ntile(10) over(partition by money order by month) rownumber from user
partition by用于分组
order by用于排序
rows、range用于指定区间
结合preceding 、following来用
preceding用于指定上区间
following用于指定下区间
示例
指定上、下区间
select *,sum(money) over(order by month range between 2 preceding and 2 following) aa from user order by userid,month;
指定上区间
select *,sum(money) over(partition by userid order by month range 12 preceding) aa from user order by userid,month;
语法:transform(字段,字段,..) using '脚本' as (字段,字段,...)
说明:
1)、脚本可以python脚本,但py脚本中,每个字段必须是通过\t分割的,否则可能遇到查询结果为null
2)、python脚本的第一行必须正确,如果运行在centos,编码需是unix,否则会报no such file or directory
1、udf(user defined function): 自定义函数,特点是输入一行,输出一行
2、udaf(user defined aggregation function): 自定义聚合函数,特点是输入多行,输出一行
3、udtf(user-defined table-generating functions): 自定义拆分函数,特点是输入一行,输出多行
注意:本文归作者所有,未经作者允许,不得转载