
Spark Basics
checkpoint (2022-06-17)
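The checkpoint entry carries no code in this listing, so here is a minimal sketch of RDD checkpointing; the object name, checkpoint directory, and input data are hypothetical, not from the original post:

```scala
package com.shujia.core

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object Demo17Checkpoint {
  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf()
    conf.setAppName("Demo17Checkpoint")
    conf.setMaster("local")
    val sc = new SparkContext(conf)

    // Checkpoint data goes to a reliable store (HDFS in a real job);
    // this local path is only for the sketch
    sc.setCheckpointDir("spark/checkpoint")

    val rdd: RDD[Int] = sc.parallelize(1 to 100)
    rdd.cache()       // cache first so the checkpoint job does not recompute the lineage
    rdd.checkpoint()  // truncates the lineage once the data is materialized
    rdd.count()       // the first action triggers both the job and the checkpoint write
  }
}
```

Unlike a cache, checkpointed data survives executor failures, and the RDD's lineage is dropped after the write.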
Caching with cache (2022-06-16)
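A minimal sketch of `cache`/`persist` reuse across two actions; the object name and input file are hypothetical:

```scala
package com.shujia.core

import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel
import org.apache.spark.{SparkConf, SparkContext}

object Demo18Cache {
  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf()
    conf.setAppName("Demo18Cache")
    conf.setMaster("local")
    val sc = new SparkContext(conf)

    val lines: RDD[String] = sc.textFile("data/students.txt") // hypothetical input

    // cache() is shorthand for persist(StorageLevel.MEMORY_ONLY)
    lines.cache()
    // persist() lets you pick the storage level explicitly:
    // lines.persist(StorageLevel.MEMORY_AND_DISK)

    // Both actions reuse the cached partitions instead of re-reading the file
    println(lines.count())
    println(lines.map(_.split(",")(0)).distinct().count())
  }
}
```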
mapPartitions (2022-06-16)
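A short sketch of `mapPartitions` versus `map`; the object name and data are hypothetical:

```scala
package com.shujia.core

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object Demo19MapPartitions {
  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf()
    conf.setAppName("Demo19MapPartitions")
    conf.setMaster("local")
    val sc = new SparkContext(conf)

    val rdd: RDD[Int] = sc.parallelize(1 to 10, numSlices = 2)

    // map calls the function once per record; mapPartitions calls it once per
    // partition with an iterator, so per-partition setup (e.g. opening a DB
    // connection) runs only once per partition instead of once per record
    val doubled: RDD[Int] = rdd.mapPartitions(iter => {
      // per-partition setup would go here
      iter.map(_ * 2)
    })

    doubled.foreach(println)
  }
}
```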
Body metrics (an MLlib exercise) (2022-05-25)

```xml
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.11</artifactId>
    <version>2.4.5</version>
</dependency>
```
StructuredStreaming (2022-05-24)

```scala
package com.shujia.streaming

import org.apache.spark.sql.streaming.OutputMode
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

object Demo05StructuredStreaming {
  def main(args: Array[String]): Unit = {
    // Create the SparkSession
    val spark: SparkSession = SparkSession
      .builder()
      .appName("Demo05StructuredStreaming")
      .master("local")
      .getOrCreate()
  }
}
```
Vehicle watchlist monitoring exercise (2022-05-20)

```scala
package com.shujia.streaming

import org.apache.spark.broadcast.Broadcast
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.dstream.{DStream, ReceiverInputDStream}
import org.apache.spark.streaming.{Durations, StreamingContext}
```
Sliding window operations (2022-05-20)

```scala
package com.shujia.streaming

import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.{Durations, StreamingContext}

object Demo03Window {
  def main(args: Array[String]): Unit = {
  }
}
```
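A hedged sketch of the kind of windowed word count this post covers, using `reduceByKeyAndWindow`; the socket host, port, and durations are hypothetical:

```scala
package com.shujia.streaming

import org.apache.spark.SparkConf
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.{Durations, StreamingContext}

object Demo03WindowSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Demo03WindowSketch").setMaster("local[2]")
    // batch interval: 5 seconds
    val ssc = new StreamingContext(conf, Durations.seconds(5))

    val words: DStream[(String, Int)] = ssc
      .socketTextStream("localhost", 8888) // hypothetical source
      .flatMap(_.split(","))
      .map((_, 1))

    // Count over the last 15 seconds, recomputed every 5 seconds;
    // both durations must be multiples of the batch interval
    val windowed: DStream[(String, Int)] = words.reduceByKeyAndWindow(
      (a: Int, b: Int) => a + b,
      Durations.seconds(15), // window length
      Durations.seconds(5)   // slide interval
    )

    windowed.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```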
Action operators; computing Pi (2022-05-19)

```scala
package com.shujia.core

import com.shujia.core.Demo10Join.Student
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object Demo16Action {
  def main(args: Array[String]): Unit = {
    // Common Action operators:
    // foreach, take, col…
  }
}
```
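The preview cuts off before showing the actions, so here is a sketch of the common ones plus a Monte Carlo Pi estimate; the object name and sample count are hypothetical:

```scala
package com.shujia.core

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object Demo16ActionSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Demo16ActionSketch").setMaster("local")
    val sc = new SparkContext(conf)

    val rdd: RDD[Int] = sc.parallelize(1 to 10)

    // Actions trigger a job and return a result (or side effect) to the driver
    rdd.foreach(println)                 // runs on the executors
    val first3: Array[Int] = rdd.take(3)
    val all: Array[Int] = rdd.collect()  // pulls everything to the driver: use with care
    val n: Long = rdd.count()
    val total: Int = rdd.reduce(_ + _)

    // Monte Carlo Pi: the fraction of random points that land inside the unit circle
    val samples = 1000000
    val inside = sc.parallelize(1 to samples).filter(_ => {
      val x = math.random * 2 - 1
      val y = math.random * 2 - 1
      x * x + y * y <= 1
    }).count()
    println(s"Pi is roughly ${4.0 * inside / samples}")
  }
}
```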
Stateful operators (2022-05-19)

```scala
package com.shujia.streaming

import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.dstream.{DStream, ReceiverInputDStream}
import org.apache.spark.streaming.{Durations, StreamingContext}

object Demo01WordCountOnStreaming {
  def main(args: Array[String]): Unit = {
  }
}
```
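A hedged sketch of a stateful word count with `updateStateByKey`, the usual stateful operator; the object name, socket source, and checkpoint path are hypothetical:

```scala
package com.shujia.streaming

import org.apache.spark.SparkConf
import org.apache.spark.streaming.dstream.{DStream, ReceiverInputDStream}
import org.apache.spark.streaming.{Durations, StreamingContext}

object Demo02UpdateStateByKey {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Demo02UpdateStateByKey").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Durations.seconds(5))
    // updateStateByKey needs a checkpoint directory to persist the running state
    ssc.checkpoint("spark/checkpoint")

    val lines: ReceiverInputDStream[String] = ssc.socketTextStream("localhost", 8888)
    val pairs: DStream[(String, Int)] = lines.flatMap(_.split(",")).map((_, 1))

    // Merge this batch's counts with the running total Spark keeps per key
    val counts: DStream[(String, Int)] = pairs.updateStateByKey(
      (newValues: Seq[Int], state: Option[Int]) => Some(newValues.sum + state.getOrElse(0))
    )

    counts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```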
Spark Streaming introduction and development environment setup (2022-05-19)

```xml
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>2.4.5</version>
</dependency>
```
aggregateByKey (2022-05-19)
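This entry has no code preview, so here is a minimal sketch of `aggregateByKey`; the object name and sample data are hypothetical:

```scala
package com.shujia.core

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object Demo20AggregateByKey {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Demo20AggregateByKey").setMaster("local")
    val sc = new SparkContext(conf)

    val pairs: RDD[(String, Int)] = sc.parallelize(
      Seq(("a", 1), ("a", 2), ("b", 3), ("b", 4)), numSlices = 2)

    // aggregateByKey(zeroValue)(seqOp, combOp):
    // seqOp folds each value into the accumulator inside a partition,
    // combOp merges accumulators coming from different partitions
    val sums: RDD[(String, Int)] = pairs.aggregateByKey(0)(_ + _, _ + _)

    // The accumulator type may differ from the value type:
    // compute (max, sum) per key in a single pass
    val maxAndSum: RDD[(String, (Int, Int))] = pairs.aggregateByKey((Int.MinValue, 0))(
      (acc, v) => (math.max(acc._1, v), acc._2 + v),
      (a, b) => (math.max(a._1, b._1), a._2 + b._2)
    )

    sums.foreach(println)
    maxAndSum.foreach(println)
  }
}
```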
GroupByKey vs ReduceByKey (2022-05-18)

```scala
package com.shujia.core

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object Demo11Cartesian {
  def main(args: Array[String]): Unit = {
    // Create the Spark Context
    val conf: SparkConf = new SparkConf()
    conf.setAppName("Demo11Cartesian")
    conf.setMaster("local")
    val sc = new SparkContext(conf)
  }
}
```
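The preview above shows a different demo, so here is a hedged sketch of the actual comparison the title names; the object name and data are hypothetical:

```scala
package com.shujia.core

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object Demo21GroupVsReduce {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Demo21GroupVsReduce").setMaster("local")
    val sc = new SparkContext(conf)

    val pairs: RDD[(String, Int)] = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))

    // groupByKey shuffles every record; the sum happens only after the shuffle
    val viaGroup: RDD[(String, Int)] = pairs.groupByKey().mapValues(_.sum)

    // reduceByKey combines values inside each partition first (map-side combine),
    // so far less data crosses the network for the same result
    val viaReduce: RDD[(String, Int)] = pairs.reduceByKey(_ + _)

    viaGroup.foreach(println)
    viaReduce.foreach(println)
  }
}
```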
Factors that decide an RDD's partition count; joins (2022-05-18)

```scala
package com.shujia.core

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object Demo09Union {
  def main(args: Array[String]): Unit = {
    // Create the Spark Context
    val conf: SparkConf = new SparkConf()
    conf.setAppName("Demo09Union")
    conf.setMaster("local")
    val sc = new SparkContext(conf)
  }
}
```
Spark on Hive (2022-05-18)

```scala
package com.shujia.sql

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.{DataFrame, SparkSession}

object Demo06SparkOnHive {
  def main(args: Array[String]): Unit = {
    /**
      * enableHiveSupport() turns on Hive support;
      * this requires, in the po…
      */
  }
}
```
Ways to write Spark SQL code (2022-05-18)

```scala
package com.shujia.sql

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

object Demo04DSL {
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession
      .builder()
      .appName("Demo04DSL")
      .master("local")
      .getOrCreate()
  }
}
```
Burks exercise; JD log exercise (2022-05-17)

Company code, year, and revenue for January through December:

```
burk,year,tsl01,tsl02,tsl03,tsl04,tsl05,tsl06,tsl07,tsl08,tsl09,tsl10,tsl11,tsl12
853101,2010,100200,25002,19440,20550,14990,17227,40990,28778,19088,29889,10990,20990
853101,2011,19446,20556,14996,17233,40996,2…
```
union, join, and case when with the DSL (2022-05-17)

```scala
package com.shujia.sql

import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

object Demo04DSL {
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession
      .builder()
      .appName("Demo04DSL")
      .master("local")
      .getOrCreate()
  }
}
```
Basic DSL usage (2022-05-16)

```scala
package com.shujia.sql

import org.apache.spark.sql.{DataFrame, SparkSession}

object Demo04DSL {
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession
      .builder()
      .appName("Demo04DSL")
      .master("local")
      .getOrCreate()
  }
}
```
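The preview stops before any DSL calls, so here is a hedged sketch of the select/where/groupBy style the DSL replaces SQL strings with; the object name, input file, and column names are hypothetical:

```scala
package com.shujia.sql

import org.apache.spark.sql.functions._
import org.apache.spark.sql.{DataFrame, SparkSession}

object Demo04DSLSketch {
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession
      .builder()
      .appName("Demo04DSLSketch")
      .master("local")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical input: id,name,age,clazz per line with a header row
    val students: DataFrame = spark.read
      .option("header", "true")
      .csv("data/students.csv")

    // Method calls instead of an SQL string
    students
      .select($"name", $"age", $"clazz")
      .where($"age" > 20)
      .groupBy($"clazz")
      .agg(count($"name").as("cnt"))
      .orderBy($"cnt".desc)
      .show()
  }
}
```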
Common Spark SQL sources (2022-05-16)

```scala
package com.shujia.sql

import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

// Common DataSource APIs in Spark SQL
object Demo03SourceAPI {
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession
      .builder()
      .appName("Demo03SourceAPI")
      .master("local")
      .getOrCreate()
  }
}
```
Spark SQL environment setup; SQL vs DSL (2022-05-16)

```xml
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.4.5</version>
</dependency>
```
Common operators (2022-05-16)

```scala
package com.shujia.core

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object Demo04FlatMap {
  def main(args: Array[String]): Unit = {
    // Create the Spark Context
    val conf: SparkConf = new SparkConf()
    conf.setAppName("Demo04FlatMap")
    conf.setMaster("local")
    val sc = new SparkContext(conf)
  }
}
```
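A short sketch of the everyday transformations this post covers; the object name and data are hypothetical:

```scala
package com.shujia.core

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object Demo04FlatMapSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Demo04FlatMapSketch").setMaster("local")
    val sc = new SparkContext(conf)

    val lines: RDD[String] = sc.parallelize(Seq("java,spark", "hadoop,hive,spark"))

    // Transformations are lazy: nothing runs until an action is called
    val words: RDD[String] = lines.flatMap(_.split(","))  // one record in, many out
    val upper: RDD[String] = words.map(_.toUpperCase)     // one in, one out
    val noJava: RDD[String] = words.filter(_ != "java")
    val unique: RDD[String] = words.distinct()

    unique.foreach(println) // foreach is the action that triggers the job
  }
}
```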
Spark on Yarn history server configuration; transformation vs action operators (2022-05-15)

```
[root@master spark-2.4.5]# cd conf/
[root@master conf]# ls
docker.properties.template   slaves.template
fairscheduler.xml.template   spark-defaults.conf.template
log4j.properties.template    spark-env.sh
metrics.properties.template  spark-env.sh.template
```
Submitting jobs to Yarn; yarn-client vs yarn-cluster (2022-05-15)

```scala
package com.shujia.core

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object Demo02WordCountOnYarn {
  /**
    * Steps for submitting the program to Yarn:
    * 1. setMaster must not be set to run in local mode
    * 2. change the input and output paths to HDFS paths
    * 3. package the program as a jar (if dependency problems show up
    *    at runtime, you can add…
    */
}
```
Standalone and on-Yarn modes (2022-05-12)
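The post body is not preserved in this listing, so for reference here are hedged `spark-submit` commands for the two modes; the master URL, class name, and jar name are hypothetical:

```shell
# Standalone mode: the driver registers with Spark's own master
spark-submit \
  --master spark://master:7077 \
  --class com.shujia.core.Demo01WordCount \
  spark-demo.jar

# Yarn, client mode: the driver stays on the submitting machine,
# so logs appear in your console (handy for debugging)
spark-submit --master yarn --deploy-mode client \
  --class com.shujia.core.Demo01WordCount spark-demo.jar

# Yarn, cluster mode: the driver runs inside the ApplicationMaster
# on the cluster (the usual choice for production)
spark-submit --master yarn --deploy-mode cluster \
  --class com.shujia.core.Demo01WordCount spark-demo.jar
```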
The five main properties of an RDD (2022-05-12)

```scala
package com.shujia.core

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object Demo01WordCount {
  def main(args: Array[String]): Unit = {
    // Initialize the Spark environment
    // Create the Spark configuration object
    val conf: SparkConf = new SparkConf()
  }
}
```
WordCount implementation and analysis (2022-05-12)
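The post body is not preserved in this listing, so here is a minimal word count sketch in the style of the other demos; the object name and file paths are hypothetical:

```scala
package com.shujia.core

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object Demo01WordCountSketch {
  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf()
    conf.setAppName("Demo01WordCountSketch")
    conf.setMaster("local")
    val sc = new SparkContext(conf)

    // Hypothetical input: comma-separated words on each line
    val lines: RDD[String] = sc.textFile("data/words.txt")

    val counts: RDD[(String, Int)] = lines
      .flatMap(_.split(","))  // split each line into words
      .map((_, 1))            // pair each word with a count of 1
      .reduceByKey(_ + _)     // sum the counts per word (one shuffle)

    counts.saveAsTextFile("output/wordcount")
  }
}
```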
Spark development environment setup (2022-05-12)

```xml
<dependencies>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>2.11.12</version>
    </dependency>
    <dependency>
        …
    </dependency>
</dependencies>
```