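Since most work in Spark is done on DataFrames, the object org.apache.spark.sql.functions (usually imported as org.apache.spark.sql.functions._) gathers the built-in functions for operating on DataFrame columns. Its API reference is at https://spark.apache.org/docs/2.4.3/api/scala/index.html#org.apache.spark.sql.functions$. The next few articles in this series walk through the different groups of functions it contains.
First up are the date time functions. I split them into several smaller groups, show the examples first, and follow each with a brief explanation. The examples use this data: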
// spark-shell imports these automatically; a standalone application needs them explicitly.
import spark.implicits._                 // enables .toDF and the 'Date column syntax
import org.apache.spark.sql.functions._  // the date time functions used below

val df = Seq(
  (1, "2019-02-19", "2019-02-19 03:14:12.254", "02/19/2019", "20190219T031412.254"),
  (1, "2019-06-13", "2019-06-13 05:38:53.892", "06/13/2019", "20190613T053853.892"),
  (4, "9999-12-31", "9999-12-31 12:23:41.182", "12/31/9999", "99991231T122341.182"),
  (5, "2001-01-25", "2001-01-25 09:56:24.712", "01/25/2001", "20010125T095624.712"),
  (0, "2019-12-19", "2019-12-19 03:51:42.671", "12/19/2019", "20191219T035142.671"),
  (7, "1890-11-09", "1890-11-09 07:45:45.012", "11/09/1890", "18901109T074545.012"),
  (7, "1906-08-06", "1906-08-06 01:17:32.932", "08/06/1906", "19060806T011732.932")
).toDF("Id", "Date", "Timestamp", "ExDate", "ExTimestamp")
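A note before going on: the standard date and timestamp formats in Spark are the ones used by the Date and Timestamp columns above (yyyy-MM-dd and yyyy-MM-dd HH:mm:ss.SSS). Both columns in the sample data are plainly of type string, but Spark's date time functions accept strings in these standard formats and cast them implicitly, and functions that return a date or timestamp produce values of the real date or timestamp type rather than strings.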
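The implicit cast described above can be checked by looking at the schema before and after calling a date time function. This is a minimal sketch, not part of the original examples: the one-row `sample` DataFrame, the column names `NextDay` and `Hour`, and the local `SparkSession` are assumptions made for illustration (in spark-shell, `spark` already exists and the builder line can be dropped).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.{DateType, IntegerType, StringType}

// Assumption: a throwaway local session, only needed outside spark-shell.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("implicit-cast-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical one-row sample using the same standard formats as the article's data.
val sample = Seq(("2019-02-19", "2019-02-19 03:14:12.254")).toDF("Date", "Timestamp")

val casted = sample
  .withColumn("NextDay", date_add($"Date", 1)) // string column in, date column out
  .withColumn("Hour", hour($"Timestamp"))      // string column in, integer column out

// Date and Timestamp remain strings; only the derived columns carry the new types.
casted.printSchema()
```

The input columns stay untouched; the cast happens only inside each function call, so the original strings are still available for other uses.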
Functions that operate on dates
/*
add_months
date_add
date_sub
datediff
last_day
months_between
next_day
*/
val rstDFOne = df
  .select("Date")
  .withColumn("AddOneMonth", add_months('Date, 1))
  .withColumn("DateAddOne", date_add('Date, 1))
  .withColumn("DateSubTwo", date_sub('Date, 2))
  .withColumn("DateDiff", datediff('Date, lit("2019-02-15")))
  .withColumn("MonthBetween", months_between('Date, lit("2019-02-15")))
  .withColumn("MonthBtw(notRound)", months_between('Date, lit("2019-02-15"), false))
  .withColumn("LastDay(OfMonth)", last_day('Date))
  .withColumn("Next(Saturday)Day", next_day('Date, "Friday"))
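The two months_between calls are the least obvious ones here, so a worked version of the arithmetic may help. As far as I understand Spark's rule, the result is a whole number when both dates fall on the same day of the month or both are the last day of their month; otherwise the leftover days are divided by a fixed 31-day month, and the result is rounded to 8 decimal places unless the third argument is false. The sketch below reproduces that rule for plain dates with java.time; `monthsBetweenSketch` is a hypothetical helper, not a Spark function, and it ignores the time-of-day part that Spark also considers for timestamps.

```scala
import java.time.LocalDate

// Hypothetical helper (not part of Spark): months between two whole dates,
// using the "31 days per month" rule described above, without rounding.
def monthsBetweenSketch(end: LocalDate, start: LocalDate): Double = {
  val wholeMonths =
    (end.getYear - start.getYear) * 12 + (end.getMonthValue - start.getMonthValue)
  val bothLastDay =
    end.getDayOfMonth == end.lengthOfMonth && start.getDayOfMonth == start.lengthOfMonth
  if (end.getDayOfMonth == start.getDayOfMonth || bothLastDay) wholeMonths.toDouble
  else wholeMonths + (end.getDayOfMonth - start.getDayOfMonth) / 31.0
}

val ref = LocalDate.parse("2019-02-15")

// 2019-06-13 vs 2019-02-15: 4 whole months, minus 2 days out of 31.
val a = monthsBetweenSketch(LocalDate.parse("2019-06-13"), ref)

// 2001-01-25 vs 2019-02-15: -217 whole months, plus 10 days out of 31.
val b = monthsBetweenSketch(LocalDate.parse("2001-01-25"), ref)

println(s"a = $a, b = $b")
```

With roundOff left at its default, Spark rounds such values to 8 decimal places; passing false, as the MonthBtw(notRound) column does, keeps the full double.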
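The comment at the top of the rstDFOne snippet lists every function in this group. All of them operate on the Date column. Note that the column named Next(Saturday)Day is actually computed with next_day('Date, "Friday"), so it holds the first Friday after each date. The results are: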
scala> rstDFOne.show(false)
+----------+-----------+-----------+----------+--------+--------------+-------------------+----------------+-----------------+
|Date |AddOneMonth|DateAddOne |DateSubTwo|DateDiff|MonthBetween |MonthBtw(notRound) |LastDay(OfMonth)|Next(Saturday)Day|
+----------+-----------+-----------+----------+--------+--------------+-------------------+----------------+-----------------+
|2019-02-19|2019-03-19 |2019-02-20 |2019-02-17|4 |0.12903226 |0.12903225806451613|2019-02-28 |2019-02-22 |
|2019-06-13|2019-07-13 |2019-06-14 |2019-06-11|118 |3.93548387 |3.935483870967742 |2019-06-30 |2019-06-14 |
|9999-12-31|10000-01-31|10000-01-01|9999-12-29|2914954 |95770.51612903|95770.51612903226 |9999-12-31 |10000-01-07 |
|2001-01-25|2001-02-25 |2001-01-26 |2001-01-23|-6595 |-216.67741935 |-216.67741935483872|2001-01-31 |2001-01-26 |
|2019-12-19|2020-01-19 |2019-12-20 |2019-12-17|307