Common built-in functions in Spark SQL


m0_48714980


License: CC 4.0 BY-SA

Article link: https://blog.youkuaiyun.com/m0_48714980/article/details/112461145

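This article covers commonly used Spark SQL string functions, such as concat, concat_ws, decode, encode and format_string. It also covers date/time functions such as current_date, current_timestamp, unix_timestamp and from_unixtime. These functions play an important role in data processing and analysis.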


Strings:

1. concat: concatenates strings
concat(str1, str2, …, strN) - Returns the concatenation of str1, str2, …, strN.

Examples: > SELECT concat('Spark', 'SQL'); → SparkSQL

2. concat_ws: concatenates strings with a separator placed between them
concat_ws(sep, [str | array(str)]+) - Returns the concatenation of the strings separated by sep.

Examples: > SELECT concat_ws(' ', 'Spark', 'SQL'); → Spark SQL
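The two functions differ in how they treat NULL. concat returns NULL as soon as any argument is NULL, while concat_ws skips NULL arguments. The Python sketch below models only that rule, with None standing in for SQL NULL. The names spark_concat and spark_concat_ws are hypothetical, and array arguments to concat_ws are not modeled.

```python
from typing import Optional


def spark_concat(*args: Optional[str]) -> Optional[str]:
    """Model of concat: a single NULL (None) argument makes the whole result NULL."""
    if any(a is None for a in args):
        return None
    return "".join(args)


def spark_concat_ws(sep: str, *args: Optional[str]) -> str:
    """Model of concat_ws: NULL (None) arguments are skipped instead of propagated."""
    return sep.join(a for a in args if a is not None)


print(spark_concat("Spark", "SQL"))                # SparkSQL
print(spark_concat("Spark", None, "SQL"))          # None
print(spark_concat_ws(" ", "Spark", None, "SQL"))  # Spark SQL
```

In practice, concat_ws is the safer choice when some columns may be NULL and you still want the remaining pieces.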

3. decode: decodes binary data into a string
decode(bin, charset) - Decodes the first argument using the second argument character set.

Examples: > SELECT decode(encode('abc', 'utf-8'), 'utf-8'); → abc

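4. encode: encodes a string into binary using a character set
encode(str, charset) - Encodes the first argument using the second argument character set.

Examples: > SELECT encode('abc', 'utf-8'); → a BINARY value holding the bytes 61 62 63 (the UTF-8 encoding of abc)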

5. format_string/printf: formats a string
format_string(strfmt, obj, …) - Returns a formatted string from printf-style format strings.

Examples: > SELECT format_string("Hello World %d %s", 100, "days"); → Hello World 100 days

6. initcap: capitalizes the first letter of each word and lowercases the rest; lower converts the whole string to lowercase, upper to uppercase
initcap(str) - Returns str with the first letter of each word in uppercase. All other letters are in lowercase. Words are delimited by white space.

Examples: > SELECT initcap('sPark sql'); → Spark Sql

7. length: returns the length of a string in characters (trailing spaces are counted)
Examples: > SELECT length('Spark SQL '); → 10

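8. levenshtein: edit distance (the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other)
levenshtein(str1, str2) - Returns the Levenshtein distance between the two given strings.

Examples: > SELECT levenshtein('kitten', 'sitting'); → 3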
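The distance can be computed with the classic dynamic-programming table. The sketch below is plain Python, not Spark's own code. It assumes unit cost for every insertion, deletion and substitution, which is the standard Levenshtein definition.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming, keeping only one previous row."""
    # prev[j] = distance between a[:i-1] and b[:j] (row i-1 of the table)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (free if equal)
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

For kitten → sitting the three edits are k→s, e→i and inserting a final g.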

9. lpad: returns a string of fixed length, left-padding it with the given characters if it is too short; rpad pads on the right
lpad(str, len, pad) - Returns str, left-padded with pad to a length of len. If str is longer than len, the return value is shortened to len characters.

Examples: > SELECT lpad('hi', 5, '??'); → ???hi
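The padding rule quoted above has two parts. When the string is shorter, the pad string is repeated and cut to fit. When the string is too long, it is truncated to len characters. Here is a minimal Python model of both parts; the function names mirror the SQL ones but are only illustrative.

```python
def lpad(s: str, length: int, pad: str) -> str:
    """Left-pad s with repetitions of pad up to `length` characters;
    if s is already at least that long, keep only its first `length` characters."""
    if length <= len(s):
        return s[:max(length, 0)]
    need = length - len(s)
    fill = (pad * need)[:need]  # repeat pad, then cut to exactly `need` chars
    return fill + s


def rpad(s: str, length: int, pad: str) -> str:
    """Same rule as lpad, but the fill goes on the right."""
    if length <= len(s):
        return s[:max(length, 0)]
    need = length - len(s)
    return s + (pad * need)[:need]


print(lpad("hi", 5, "??"))  # ???hi
print(rpad("hi", 5, "??"))  # hi???
print(lpad("hi", 1, "??"))  # h
```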

10. ltrim: removes leading spaces, or leading characters from a given set; rtrim works on the right end, trim on both ends
ltrim(str) - Removes the leading space characters from str.

ltrim(trimStr, str) - Removes the leading characters of str that appear in trimStr.

Examples:

SELECT ltrim(' SparkSQL '); → 'SparkSQL ' (the trailing space is kept)
SELECT ltrim('Sp', 'SSparkSQLS'); → arkSQLS
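The two-argument form treats trimStr as a set of characters, not as a literal prefix. That is why 'Sp' also strips the second S. Python's str.lstrip, rstrip and strip use the same set semantics, so they are a quick way to sanity-check an expected result. This is an analogy for checking results, not Spark itself.

```python
# Each strip method treats its argument as a *set* of characters to remove.
left = "SSparkSQLS".lstrip("Sp")    # like ltrim('Sp', 'SSparkSQLS')
right = "SSparkSQLS".rstrip("Sp")   # same trim set applied to the right end
both = "SSparkSQLS".strip("Sp")     # same trim set applied to both ends
spaces = " SparkSQL ".lstrip(" ")   # like ltrim(' SparkSQL '): trailing space kept

print(left)          # arkSQLS
print(right)         # SSparkSQL
print(both)          # arkSQL
print(repr(spaces))  # 'SparkSQL '
```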
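11. regexp_extract: extracts the substring matched by one capture group of a regular expression (the third argument is the group index; 0 means the whole match); regexp_replace: replaces every match. Inside Spark SQL string literals the backslash must be doubled (\\d).
Examples: > SELECT regexp_extract('100-200', '(\\d+)-(\\d+)', 1); → 100

Examples: > SELECT regexp_replace('100-200', '(\\d+)', 'num'); → num-num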
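Spark evaluates these patterns with Java regular expressions. For simple patterns like the ones above, Python's `re` module behaves the same way, so it can be used to check what a pattern will capture. This is not a general equivalence; for example, back-references in the replacement string are written `$1` in Java but `\1` in Python. The helper `regexp_extract` below is a hypothetical model.

```python
import re


def regexp_extract(s: str, pattern: str, idx: int = 1) -> str:
    """Model of regexp_extract: return capture group idx of the first match.
    Returns '' when there is no match or the group did not participate."""
    m = re.search(pattern, s)
    if m is None:
        return ""
    return m.group(idx) or ""


# Raw strings r"..." play the role of the doubled backslashes in Spark SQL literals.
first = regexp_extract("100-200", r"(\d+)-(\d+)", 1)    # group 1
whole = regexp_extract("100-200", r"(\d+)-(\d+)", 0)    # idx 0 = whole match
replaced = re.sub(r"(\d+)", "num", "100-200")           # like regexp_replace

print(first)     # 100
print(whole)     # 100-200
print(replaced)  # num-num
```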

12. repeat: repeats the given string n times
Examples: > SELECT repeat('123', 2); → 123123

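13. instr / locate: return the (1-based) position of a substring; note the opposite argument order, instr(str, substr) vs locate(substr, str[, pos])
instr(str, substr) - Returns the (1-based) index of the first occurrence of substr in str.

Examples: > SELECT instr('SparkSQL', 'SQL'); → 6

Examples: > SELECT locate('bar', 'foobarbar'); → 4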
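Both functions count positions from 1 and use 0 for "not found". The Python model below maps them onto `str.find`, which counts from 0 and returns -1 for "not found". The optional `pos` of locate is treated as a 1-based start position; returning 0 for pos < 1 is how this sketch handles that case.

```python
def instr(s: str, sub: str) -> int:
    """Model of instr(str, substr): 1-based index of the first match, 0 if absent."""
    return s.find(sub) + 1  # str.find gives -1 when absent, and -1 + 1 == 0


def locate(sub: str, s: str, pos: int = 1) -> int:
    """Model of locate(substr, str[, pos]): note the reversed argument order.
    Searching starts at 1-based position pos; this sketch returns 0 for pos < 1."""
    if pos < 1:
        return 0
    return s.find(sub, pos - 1) + 1


print(instr("SparkSQL", "SQL"))       # 6
print(locate("bar", "foobarbar"))     # 4
print(locate("bar", "foobarbar", 5))  # 7
```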

14. space: returns a string of n spaces, for example to prepend n spaces to another string
space(n) - Returns a string consisting of n spaces.

Examples: > SELECT concat(space(2), '1'); → '  1'

15. split: splits a string around matches of a regular expression
split(str, regex) - Splits str around occurrences that match regex.

Examples: > SELECT split('oneAtwoBthreeC', '[ABC]'); → ["one","two","three",""]

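16. substr: extracts a substring (positions are 1-based; a negative position counts from the end); substring_index: returns the part of a string before the count-th occurrence of a delimiter (a negative count counts from the right)
Examples:

SELECT substr('Spark SQL', 5); → k SQL
SELECT substr('Spark SQL', -3); → SQL
SELECT substr('Spark SQL', 5, 1); → k
SELECT substring_index('www.apache.org', '.', 2); → www.apache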
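The position rules are easy to get wrong when coming from 0-based languages. Here is a small Python model that reproduces the examples above. How it treats pos 0 and an empty delimiter is this sketch's own choice, not a documented Spark rule. Both functions are hypothetical re-implementations.

```python
from typing import Optional


def substr(s: str, pos: int, length: Optional[int] = None) -> str:
    """Model of substr(str, pos[, len]): 1-based pos, negative pos counts from the end."""
    if pos > 0:
        start = pos - 1
    elif pos < 0:
        start = len(s) + pos
    else:
        start = 0  # pos 0 is treated like 1 in this sketch
    end = len(s) if length is None else start + length
    return s[max(start, 0):max(end, 0)]


def substring_index(s: str, delim: str, count: int) -> str:
    """Model of substring_index: text before the count-th delim (count > 0),
    or after the |count|-th delim counting from the right (count < 0)."""
    if count == 0 or delim == "":
        return ""  # empty result for these cases in this sketch
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    return delim.join(parts[count:])


print(substr("Spark SQL", 5))                        # k SQL
print(substr("Spark SQL", -3))                       # SQL
print(substr("Spark SQL", 5, 1))                     # k
print(substring_index("www.apache.org", ".", 2))     # www.apache
print(substring_index("www.apache.org", ".", -2))    # apache.org
```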
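17. translate: replaces each character of the input that appears in the from string with the character at the same position in the to string
Examples: > SELECT translate('AaBbCc', 'abc', '123'); → A1B2C3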

18. get_json_object: extracts a value from a JSON string using a JSON path such as '$.a'
get_json_object(json_txt, path) - Extracts a json object from path.

Examples: > SELECT get_json_object('{"a":"b"}', '$.a'); → b

19. unhex: converts a hexadecimal string to binary
unhex(expr) - Converts hexadecimal expr to binary.

Examples: > SELECT decode(unhex('537061726B2053514C'), 'UTF-8'); → Spark SQL
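unhex, encode and decode all move between text and raw bytes. Python's bytes type makes the same round trips explicit. The sketch below mirrors the examples in this section; it is an analogy for checking results, not Spark's implementation.

```python
raw = bytes.fromhex("537061726B2053514C")  # like unhex: hex digits -> raw bytes
text = raw.decode("utf-8")                 # like decode(..., 'UTF-8'): bytes -> string

encoded = "abc".encode("utf-8")            # like encode('abc', 'utf-8'): string -> bytes
roundtrip = encoded.decode("utf-8")        # like decode(encode('abc', 'utf-8'), 'utf-8')

print(text)           # Spark SQL
print(encoded.hex())  # 616263
print(roundtrip)      # abc
```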

20. to_json: converts a struct, array or map value to a JSON string
to_json(expr[, options]) - Returns a json string with a given struct value

Examples:

SELECT to_json(named_struct('a', 1, 'b', 2)); → {"a":1,"b":2}

SELECT to_json(named_struct('time', to_timestamp('2015-08-26', 'yyyy-MM-dd')), map('timestampFormat', 'dd/MM/yyyy')); → {"time":"26/08/2015"}

SELECT to_json(array(named_struct('a', 1, 'b', 2))); → [{"a":1,"b":2}]

SELECT to_json(map('a', named_struct('b', 1))); → {"a":{"b":1}}

SELECT to_json(map(named_struct('a', 1), named_struct('b', 2))); → {"[1]":{"b":2}}

SELECT to_json(map('a', 1)); → {"a":1}

SELECT to_json(array((map('a', 1)))); → [{"a":1}]

Dates and times:

I. Getting the current time
1. current_date: returns the current date
Example output: 2018-04-09

2. current_timestamp/now(): returns the current timestamp
Example output: 2018-04-09 15:20:49.247

II. Extracting fields from a date/time
1. year, month, day/dayofmonth, hour, minute, second
Examples: > SELECT day('2009-07-30'); → 30

2. dayofweek (1 = Sunday, 2 = Monday, …, 7 = Saturday), dayofyear
Examples: > SELECT dayofweek('2009-07-30'); → 5

Since: 2.3.0
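The Sunday = 1 numbering differs from the ISO convention that many libraries use. Python's isoweekday(), for example, uses Monday = 1 … Sunday = 7, so a one-step shift is needed. The function names below mirror the SQL ones for illustration only.

```python
from datetime import date


def dayofweek(d: date) -> int:
    """Spark-style numbering, 1 = Sunday ... 7 = Saturday.
    isoweekday() is 1 = Monday ... 7 = Sunday, so shift it by one (mod 7)."""
    return d.isoweekday() % 7 + 1


def dayofyear(d: date) -> int:
    """1-based day number within the year."""
    return d.timetuple().tm_yday


print(dayofweek(date(2009, 7, 30)))  # 5 (a Thursday)
print(dayofyear(date(2016, 4, 9)))   # 100
```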

3. weekofyear
weekofyear(date) - Returns the week of the year of the given date. A week is considered to start on a Monday and week 1 is the first week with >3 days.

Examples: > SELECT weekofyear('2008-02-20'); → 8
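The definition quoted above (weeks start on Monday; week 1 is the first week with more than 3 days) is the ISO 8601 week rule. Python exposes it through `date.isocalendar()`. Here is a short check, including a late-December date that already belongs to week 1 of the next year under that rule.

```python
from datetime import date


def weekofyear(d: date) -> int:
    """ISO 8601 week number (Monday-start weeks, week 1 has more than 3 days)."""
    return d.isocalendar()[1]


print(weekofyear(date(2008, 2, 20)))   # 8
print(weekofyear(date(2008, 12, 29)))  # 1 (this Monday starts ISO week 1 of 2009)
```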

4. trunc: truncates a date to the given unit; the smaller fields are reset to 01
The second argument is one of ["year", "yyyy", "yy", "mon", "month", "mm"]

Examples:

SELECT trunc('2009-02-12', 'MM');
2009-02-01
SELECT trunc('2015-10-27', 'YEAR');
2015-01-01
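Truncation just resets every field below the chosen unit to its first value. A minimal Python model for the formats listed above follows. Returning None for any other format stands in for a NULL result and is an assumption of this sketch.

```python
from datetime import date
from typing import Optional


def trunc(d: date, fmt: str) -> Optional[date]:
    """Truncate a date to year or month, matching the format names case-insensitively."""
    f = fmt.lower()
    if f in ("year", "yyyy", "yy"):
        return d.replace(month=1, day=1)  # keep only the year
    if f in ("mon", "month", "mm"):
        return d.replace(day=1)           # keep year and month
    return None  # unsupported format: None models a NULL result


print(trunc(date(2009, 2, 12), "MM"))     # 2009-02-01
print(trunc(date(2015, 10, 27), "YEAR"))  # 2015-01-01
```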
5. date_trunc(fmt, ts): truncates a timestamp to the unit fmt (note that the unit comes first), one of ["YEAR", "YYYY", "YY", "MON", "MONTH", "MM", "DAY", "DD", "HOUR", "MINUTE", "SECOND", "WEEK", "QUARTER"]
Examples: > SELECT date_trunc('HOUR', '2015-03-05T09:32:05.359'); → 2015-03-05 09:00:00

Since: 2.3.0

6. date_format: converts a date/time to a string in the given format
Examples: > SELECT date_format('2016-04-08', 'y'); → 2016

III. Date/time conversion
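1. unix_timestamp: returns the Unix timestamp (seconds since 1970-01-01 00:00:00 UTC) of the current time, or of a given time string parsed with a format
Examples:

SELECT unix_timestamp(); → 1476884637
SELECT unix_timestamp('2016-04-08', 'yyyy-MM-dd'); → 1460041200
The result for a time string depends on the session time zone: 1460041200 is midnight of 2016-04-08 at UTC+9, while with a UTC session time zone the same call returns 1460073600.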
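2. from_unixtime: converts a Unix timestamp to a time string in the session time zone; to_unix_timestamp: converts a time string to a Unix timestamp
Examples:

SELECT from_unixtime(0, 'yyyy-MM-dd HH:mm:ss'); → 1970-01-01 00:00:00 (with the session time zone set to UTC)
SELECT to_unix_timestamp('2016-04-08', 'yyyy-MM-dd'); → 1460041200 (with the session time zone at UTC+9)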
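The same wall-clock string maps to different epoch seconds depending on the time zone it is read in. The reverse is also true: epoch 0 renders differently per zone. The Python sketch below shows both directions with fixed UTC offsets, which stand in for whatever session time zone Spark is configured with. The helper `unix_seconds` is hypothetical, and Python's `%Y-%m-%d` plays the role of the `yyyy-MM-dd` pattern.

```python
from datetime import datetime, timedelta, timezone


def unix_seconds(text: str, fmt: str, utc_offset_hours: int) -> int:
    """Parse a wall-clock string and return epoch seconds, reading it at a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return int(datetime.strptime(text, fmt).replace(tzinfo=tz).timestamp())


def render(epoch: int, utc_offset_hours: int) -> str:
    """Render epoch seconds as 'YYYY-MM-DD HH:MM:SS' at a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(epoch, tz).strftime("%Y-%m-%d %H:%M:%S")


print(unix_seconds("2016-04-08", "%Y-%m-%d", 0))  # 1460073600
print(unix_seconds("2016-04-08", "%Y-%m-%d", 9))  # 1460041200
print(render(0, 0))                               # 1970-01-01 00:00:00
print(render(0, 9))                               # 1970-01-01 09:00:00
```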
3. to_date/date: convert a string to a date; to_timestamp converts a string to a timestamp (Since: 2.2.0)
SELECT to_date('2009-07-30 04:17:52'); → 2009-07-30
SELECT to_date('2016-12-31', 'yyyy-MM-dd'); → 2016-12-31
SELECT to_timestamp('2016-12-31 00:12:00'); → 2016-12-31 00:12:00
4. quarter: returns the quarter of the year the date falls in (range 1 to 4)
Examples: > SELECT quarter('2016-08-31'); → 3

IV. Date/time arithmetic
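1. months_between: the number of months between two dates
months_between(timestamp1, timestamp2) - Returns number of months between timestamp1 and timestamp2.
If both dates fall on the same day of the month, or both on the last day of a month, the result is a whole number and the time of day is ignored. Otherwise the fractional part is computed assuming 31 days per month and rounded to 8 digits.

Examples: > SELECT months_between('1997-02-28 10:30:00', '1996-10-30'); → 3.94959677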
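The rule described above can be checked by hand. Here is a Python re-implementation of that rule; it is a model for checking results, not Spark's source. It reproduces the documented value 3.94959677. Fractional seconds are ignored in this sketch.

```python
import calendar
from datetime import datetime

SECONDS_PER_DAY = 86400


def is_last_day_of_month(t: datetime) -> bool:
    return t.day == calendar.monthrange(t.year, t.month)[1]


def seconds_of_day(t: datetime) -> int:
    return t.hour * 3600 + t.minute * 60 + t.second


def months_between(t1: datetime, t2: datetime, round_off: bool = True) -> float:
    month_diff = (t1.year * 12 + t1.month) - (t2.year * 12 + t2.month)
    # Same day of month, or both at month end: whole months, time of day ignored
    if t1.day == t2.day or (is_last_day_of_month(t1) and is_last_day_of_month(t2)):
        return float(month_diff)
    # Otherwise the leftover days and seconds are measured against a 31-day month
    seconds_diff = (t1.day - t2.day) * SECONDS_PER_DAY + seconds_of_day(t1) - seconds_of_day(t2)
    result = month_diff + seconds_diff / (31 * SECONDS_PER_DAY)
    return round(result, 8) if round_off else result


print(months_between(datetime(1997, 2, 28, 10, 30), datetime(1996, 10, 30)))  # 3.94959677
```

In the example, the month difference is 4. The leftover is (28 - 30) days plus 10:30:00, which is -135000 seconds, giving 4 - 135000 / 2678400 ≈ 3.94959677.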

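2. add_months: returns the date n months after start_date; if the target month is shorter, the day is clamped to that month's last day
Examples: > SELECT add_months('2016-08-31', 1); → 2016-09-30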
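The clamping in the example (August 31 plus one month gives September 30) can be modeled in a few lines of Python. This sketch only clamps. It does not model any special handling a given Spark version may apply when the start date is itself a month end.

```python
import calendar
from datetime import date


def add_months(d: date, n: int) -> date:
    """Shift d by n months (n may be negative), clamping the day to the target month's length."""
    total = d.year * 12 + (d.month - 1) + n   # months counted from year 0
    year, month0 = divmod(total, 12)          # month0 is 0-based
    last = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last))


print(add_months(date(2016, 8, 31), 1))   # 2016-09-30
print(add_months(date(2016, 1, 31), 1))   # 2016-02-29 (leap year)
print(add_months(date(2016, 3, 31), -1))  # 2016-02-29
```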

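3. last_day(date) returns the last day of the month the date belongs to; next_day(start_date, day_of_week) returns the first date later than start_date that falls on the given day of the week
Examples:

SELECT last_day('2009-01-12'); → 2009-01-31
SELECT next_day('2015-01-14', 'TU'); → 2015-01-20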
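Both functions reduce to simple calendar arithmetic. In the Python model below, next_day always moves strictly forward, so a Tuesday start goes to the following Tuesday. The prefix-based day-name lookup (accepting 'TU', 'TUE' or 'TUESDAY') and returning None for unknown names are simplifications made for this sketch.

```python
import calendar
from datetime import date, timedelta
from typing import Optional

# Python's date.weekday(): Monday = 0 ... Sunday = 6
DAYS = {"MO": 0, "TU": 1, "WE": 2, "TH": 3, "FR": 4, "SA": 5, "SU": 6}


def last_day(d: date) -> date:
    """Last day of the month that d belongs to."""
    return d.replace(day=calendar.monthrange(d.year, d.month)[1])


def next_day(d: date, day_of_week: str) -> Optional[date]:
    """First date strictly after d that falls on day_of_week; None for unknown names."""
    target = DAYS.get(day_of_week[:2].upper())
    if target is None:
        return None
    ahead = (target - d.weekday()) % 7
    return d + timedelta(days=ahead or 7)  # 0 means same weekday: jump a full week


print(last_day(date(2009, 1, 12)))        # 2009-01-31
print(next_day(date(2015, 1, 14), "TU"))  # 2015-01-20
```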
4. date_add, date_sub (subtraction)
date_add(start_date, num_days) - Returns the date that is num_days after start_date.

Examples:

SELECT date_add('2016-07-30', 1); → 2016-07-31
SELECT date_sub('2016-07-30', 1); → 2016-07-29
5. datediff: the number of days between two dates
datediff(endDate, startDate) - Returns the number of days from startDate to endDate.

Examples: > SELECT datediff('2009-07-31', '2009-07-30'); → 1

6. UTC conversions
to_utc_timestamp
to_utc_timestamp(timestamp, timezone) - Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone, and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield '2017-07-14 01:40:00.0'.

Examples: > SELECT to_utc_timestamp('2016-08-31', 'Asia/Seoul'); → 2016-08-30 15:00:00

from_utc_timestamp
from_utc_timestamp(timestamp, timezone) - Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in UTC, and renders that time as a timestamp in the given time zone. For example, 'GMT+1' would yield '2017-07-14 03:40:00.0'.

Examples: > SELECT from_utc_timestamp('2016-08-31', 'Asia/Seoul'); → 2016-08-31 09:00:00
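The two functions are inverses. One reads a zone-less timestamp as local time and shifts it to UTC; the other reads it as UTC and shifts it to local time. Here is a Python model using naive datetimes for the zone-less values. As an assumption, a fixed UTC+9 offset stands in for the 'Asia/Seoul' zone ID, which had no daylight saving time in 2016. A real zone database would be needed for zones with DST.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC+9 offset in place of the 'Asia/Seoul' zone ID
SEOUL = timezone(timedelta(hours=9))


def to_utc_timestamp(local: datetime, tz: timezone) -> datetime:
    """Read a naive timestamp as wall-clock time in tz; return the naive UTC equivalent."""
    return local.replace(tzinfo=tz).astimezone(timezone.utc).replace(tzinfo=None)


def from_utc_timestamp(utc: datetime, tz: timezone) -> datetime:
    """Read a naive timestamp as UTC; return the naive wall-clock time in tz."""
    return utc.replace(tzinfo=timezone.utc).astimezone(tz).replace(tzinfo=None)


print(to_utc_timestamp(datetime(2016, 8, 31), SEOUL))    # 2016-08-30 15:00:00
print(from_utc_timestamp(datetime(2016, 8, 31), SEOUL))  # 2016-08-31 09:00:00
```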