Scala Notes
杨羊不是羊
Spark: convert a DataFrame column to a List

After `collect()`, take element 0 of each `Row`:

```scala
df.select("ad_id").collect().map(_(0)).toList
```

(posted 2021-09-07)
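As a runnable analogue without a Spark session (the data is made up; each inner `Seq` stands in for a `Row`), the `_(0)` indexing works the same way on plain collections:

```scala
// Stand-in for collect()'s Array[Row]: each inner Seq plays the role of a Row
val rows = Seq(Seq("ad1", 10), Seq("ad2", 20), Seq("ad3", 30))

// Same shape as .collect().map(_(0)).toList: take field 0 of each row
val adIds = rows.map(_(0)).toList
println(adIds) // prints List(ad1, ad2, ad3)
```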
Spark: splitting into train/test/validation sets

```scala
val weightList = Array(1 - validationRatio - testRatio, validationRatio, testRatio)
val dsList = result.randomSplit(weightList, splitSeed)
val dfList = dsList.map(_.toDF)
val trainDF = dfList(0)
val validDF = dfList(1)
val testDF  = dfList(2)
```

The post went on to mention an alternative using SQL's row_nu… (cut off in the listing). (posted 2020-07-27)
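To see what `randomSplit` does with those weights, here is a pure-Scala sketch of the same behaviour; `weightedSplit` is my own helper, not a Spark API, and it makes the rounding deterministic so the bucket sizes are exact:

```scala
import scala.util.Random

// Pure-Scala sketch of randomSplit: shuffle deterministically by seed,
// then cut the list into buckets proportional to the normalized weights
def weightedSplit[T](xs: List[T], weights: Array[Double], seed: Long): List[List[T]] = {
  val shuffled = new Random(seed).shuffle(xs)
  val total = weights.sum
  val counts = weights.map(w => math.round(w / total * xs.length).toInt)
  // let the last bucket absorb rounding error so the counts sum to xs.length
  val adjusted = counts.updated(counts.length - 1, xs.length - counts.init.sum)
  var rest = shuffled
  adjusted.toList.map { n => val (head, tail) = rest.splitAt(n); rest = tail; head }
}

val parts = weightedSplit((1 to 100).toList, Array(0.8, 0.1, 0.1), seed = 42L)
println(parts.map(_.size)) // prints List(80, 10, 10)
```

Note that Spark's own `randomSplit` samples per element, so its bucket sizes only approximate the weights; this sketch cuts exactly.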
Spark shell: suppress noisy connection logs

```scala
import org.apache.log4j.{Level, Logger}

Logger.getLogger("org").setLevel(Level.OFF)
Logger.getLogger("akka").setLevel(Level.OFF)
```

(posted 2020-09-07)
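The same silencing can be done declaratively instead of in code, assuming Log4j 1.x (which the `org.apache.log4j` imports indicate), by putting this in `log4j.properties` on the classpath:

```properties
log4j.logger.org=OFF
log4j.logger.akka=OFF
```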
Scala: row-wise concatenation

Only the title survived in this listing. (posted 2020-07-24)
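Since the body of this post was lost, here is only a guess at what "row-wise concatenation" means, as a minimal plain-Scala sketch; the column values and the `_` separator are my own:

```scala
// Two "columns" of equal length, concatenated row by row with a separator
val names  = List("A", "B", "C")
val scores = List(50, 39, 42)

val rows = names.zip(scores).map { case (n, s) => s"${n}_${s}" }
println(rows) // prints List(A_50, B_39, C_42)
```

In Spark the equivalent per-row concatenation would typically use `concat_ws` on columns.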
Scala: ranking with a window function (rank)

```scala
import org.apache.spark.sql.expressions.Window
import spark.implicits._

val testDF = Seq(
  ("A", 50), ("B", 39), ("A", 48), ("A", 48), ("B", 35),
  ("C", 42), ("C", 60), ("C", 45), ("C", 52), ("C", 52)
).toDF("name", "score")
```

The rank-over-window part of the snippet is cut off in the listing. (posted 2020-05-27)
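On the same data, a pure-Scala analogue of `rank().over(Window.partitionBy("name").orderBy(desc("score")))` shows the intended semantics without a Spark session: within each name, tied scores share a rank and the next rank is skipped, as in SQL `RANK()`:

```scala
val scores = Seq(
  ("A", 50), ("B", 39), ("A", 48), ("A", 48), ("B", 35),
  ("C", 42), ("C", 60), ("C", 45), ("C", 52), ("C", 52)
)

// For each name: sort that partition's scores descending; a score's rank is
// its first position in that sorted list + 1 (ties share, gaps follow)
val ranked = scores.groupBy(_._1).map { case (name, grp) =>
  val desc = grp.map(_._2).sorted(Ordering[Int].reverse)
  name -> grp.map { case (_, s) => (s, desc.indexOf(s) + 1) }
}

println(ranked("C").sortBy(_._2)) // prints List((60,1), (52,2), (52,2), (45,4), (42,5))
```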
Scala之判断hdfs路径
import java.net.URIval putPath = new Path(modelPath)val conf = new Configuration()val hdfs = FileSystem.newInstance(URI.create(modelPath),conf)if (hdfs.exits(putPath)){println("1")}else{println("2")}if (hdfs.getFileStatus(putPath).isDirectory){pr原创 2020-05-18 21:15:12 · 1505 阅读 · 0 评论 -
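The same exists/isDirectory pattern can be exercised on the local filesystem with `java.nio.file`, so it runs without a Hadoop cluster; `describePath` is a helper name I made up:

```scala
import java.nio.file.{Files, Path, Paths}

// Local-filesystem analogue of the HDFS exists/getFileStatus.isDirectory checks
def describePath(path: Path): String =
  if (!Files.exists(path)) "missing"
  else if (Files.isDirectory(path)) "dir"
  else "file"

val tmpDir = Files.createTempDirectory("model")
println(describePath(tmpDir))                     // prints dir
println(describePath(Paths.get("/no/such/path"))) // prints missing
```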
Scala之求差集,使用RDD
#rdd求差集val monthActiveImei = sql(""“select imei from ad_tag.f_tag_month_active_user where dayno = 20200512"”")val kuaishouYearImei = sql(""“select imei from ad_tmp.test_tag_video_0513_02_kuaishou”"")val monthActiveImeiRdd=monthActiveImei.rdd.map(x =>原创 2020-05-14 21:07:57 · 664 阅读 · 0 评论 -
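A pure-Scala analogue of the intended `rddA.subtract(rddB)`: imeis in the monthly-active set but not in the kuaishou set. The sample values are made up:

```scala
// Set difference: keep elements of monthActive that are absent from kuaishou
val monthActive = List("imei1", "imei2", "imei3", "imei4")
val kuaishou    = List("imei2", "imei4")

val diff = monthActive.toSet -- kuaishou.toSet
println(diff) // contains imei1 and imei3
```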
Scala: UDFs (inner product and intersection as examples)

1. Inner product of two `List(String)`. Define an inner-product UDF; the list can't be converted with `toInt` directly, you need `split(",").map(_.toInt)`:

```scala
// define the inner-product UDF
def getInner(listNameA: String, listNameB: String): Int = {
  val listIntA = listNameA.split(",").map(_.toInt)
  val listIn… // (cut off in the listing)
```

(posted 2020-03-28)
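The truncated function body, completed as a sketch: the second list is split the same way, then the two are zipped, multiplied pairwise, and summed. An intersection-count variant (the "交集" case from the title) follows the same pattern; `getIntersect` is my own name:

```scala
// Inner product of two comma-separated integer lists
def getInner(listNameA: String, listNameB: String): Int = {
  val listIntA = listNameA.split(",").map(_.toInt)
  val listIntB = listNameB.split(",").map(_.toInt)
  listIntA.zip(listIntB).map { case (a, b) => a * b }.sum
}

// Intersection size of the same two lists
def getIntersect(listNameA: String, listNameB: String): Int =
  listNameA.split(",").map(_.toInt)
    .intersect(listNameB.split(",").map(_.toInt)).length

println(getInner("1,2,3", "4,5,6"))     // prints 32  (4 + 10 + 18)
println(getIntersect("1,2,3", "2,3,4")) // prints 2   ({2, 3})
```

To use either from Spark SQL, wrap it with `udf(getInner _)` and apply it to string columns.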