Spark
drl_blogs
Articles in this column
Installing spark-2.2.2-bin-hadoop2.7
1. Upload spark-2.2.2-bin-hadoop2.7.tgz. 2. Extract it: tar -zxvf spark-2.2.2-bin-hadoop2.7.tgz -C /usr/local/. 3. In conf/, rename spark-env.sh.template to spark-env.sh: cd /usr/local/spark-2.2.2-bin-hadoop2.7/conf/ mv s...
Original · 2019-06-14 09:23:37 · 3554 views · 0 comments
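The truncated steps in this entry can be laid out as shell commands (the final rename completes step 3, which the preview states in words):

```shell
# 1. Upload spark-2.2.2-bin-hadoop2.7.tgz to the server, then
# 2. extract it into /usr/local/
tar -zxvf spark-2.2.2-bin-hadoop2.7.tgz -C /usr/local/

# 3. Rename the env template so Spark picks it up
cd /usr/local/spark-2.2.2-bin-hadoop2.7/conf/
mv spark-env.sh.template spark-env.sh
```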
Example: Spark Streaming + Flume integration
Contents: push, pull. push: import org.apache.log4j.{Level, Logger} import org.apache.spark.SparkConf import org.apache.spark.streaming.dstream.ReceiverInputDStream import org.apache.spark.streaming.flume.{Flum...
Original · 2019-07-01 14:49:50 · 288 views · 0 comments
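A minimal sketch of the push (receiver-based) approach the entry opens with, assuming the spark-streaming-flume_2.11 artifact is on the classpath; host and port are placeholders:

```scala
import org.apache.log4j.{Level, Logger}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

object FlumePushWordCount {
  def main(args: Array[String]): Unit = {
    Logger.getLogger("org").setLevel(Level.ERROR)
    val conf = new SparkConf().setMaster("local[2]").setAppName("FlumePushWordCount")
    val ssc = new StreamingContext(conf, Seconds(5))
    // push mode: Flume's avro sink pushes events to this host:port
    val flumeStream = FlumeUtils.createStream(ssc, "hadoop01", 41414)
    flumeStream.map(e => new String(e.event.getBody.array()).trim)
      .flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```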
Example: Spark Streaming + Kafka integration (spark-streaming-kafka-0-8_2.11)
Contents: Receiver, Direct. Receiver: import org.apache.log4j.{Level, Logger} import org.apache.spark.SparkConf import org.apache.spark.streaming.kafka.KafkaUtils import org.apache.spark.streaming.{Seconds, Str...
Original · 2019-07-01 17:24:20 · 2368 views · 0 comments
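For the Direct variant this entry covers, the 0-8 API takes the decoders as type parameters; a sketch with an assumed broker address and topic:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

// broker list and topic name are placeholders
val kafkaParams = Map("metadata.broker.list" -> "hadoop01:9092")
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("test_topic"))
stream.map(_._2).flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()
```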
Example: Log4j-generated mock log data, integrated via Flume + Kafka + Spark Streaming
flume_kafka.conf: agent1.sources = avro-source agent1.channels = logger-channel agent1.sinks = kafka-sink # define source agent1.sources.avro-source.type = avro agent1.sources.avro-source.bind = 0....
Original · 2019-07-01 21:22:06 · 313 views · 0 comments
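A sketch completing the truncated flume_kafka.conf; the port, broker address, and topic below are assumptions, but the sink properties are the standard Flume KafkaSink keys:

```
agent1.sources.avro-source.bind = 0.0.0.0
agent1.sources.avro-source.port = 41414

agent1.channels.logger-channel.type = memory

agent1.sinks.kafka-sink.type = org.apache.flume.sink.kafka.KafkaSink
agent1.sinks.kafka-sink.kafka.bootstrap.servers = hadoop01:9092
agent1.sinks.kafka-sink.kafka.topic = streaming_topic

agent1.sources.avro-source.channels = logger-channel
agent1.sinks.kafka-sink.channel = logger-channel
```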
[error] SparkUI port already in use
ERROR ui.SparkUI: Failed to bind SparkUI java.net.BindException: Address already in use: bind: Service 'SparkUI' failed after 16 retries (starting from 4040)! Consider explicitly setting the appropri...
Original · 2019-07-02 14:35:00 · 1925 views · 1 comment
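Each SparkContext binds its UI starting at 4040 and retries upward; when all retries are taken (or the range is blocked), the usual fix is to stop the other applications or pin the port explicitly, e.g.:

```shell
# pick a free port for this application's web UI
spark-submit --conf spark.ui.port=4050 ...
```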
Example: Python-generated mock logs + Flume + Kafka + Spark Streaming
Generate mock data by writing generate_log.py: #coding=UTF-8 import random import time url_paths=[ "class/112.html", "class/128.html", "class/145.html", "class/130.html", "class/146.html", "cla...
Original · 2019-07-02 16:44:44 · 1083 views · 0 comments
Spark: [error] Cannot resolve overloaded method 'agg'
Error message: Cannot resolve overloaded method 'agg'. Fix: add the import import org.apache.spark.sql.functions._
Original · 2019-06-29 13:13:50 · 2942 views · 0 comments
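With that import in scope, the column-based agg overloads resolve; a short spark-shell illustration (the DataFrame and column names are assumed):

```scala
import org.apache.spark.sql.functions._

// avg and max come from sql.functions; without the import,
// agg(Column, Column*) cannot be resolved
df.groupBy("name").agg(avg("age"), max("age")).show()
```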
Example: Spark Streaming + Kafka integration (spark-streaming-kafka-0-10_2.11)
import org.apache.kafka.common.serialization.StringDeserializer import org.apache.log4j.{Level, Logger} import org.apache.spark.SparkConf import org.apache.spark.streaming.kafka010.ConsumerStrategies....
Original · 2019-07-05 11:01:03 · 2948 views · 0 comments
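The 0-10 API the entry's imports point to uses consumer properties plus location/consumer strategies; a sketch with assumed broker, group, and topic:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

// standard new-consumer properties; addresses are placeholders
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "hadoop01:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "spark_group",
  "auto.offset.reset" -> "latest",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)
val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Set("test_topic"), kafkaParams))
stream.map(_.value).print()
```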
Spark Core: creating RDDs
# Method 1: parallelize the data across the nodes sc.parallelize(Array(1,2,3,4)) # Method 2: parallelize the data across the nodes sc.makeRDD(Array(1,2,3)) # Method 3: lets you specify where the RDD's partitions are placed # create a List val list1=List((1,List("Hello","Word","spark")),(2,List("at","as"))) # put the List into an R...
Original · 2019-06-17 16:35:16 · 213 views · 0 comments
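The three methods from the preview, as a spark-shell session; in method 3, makeRDD's Seq[(T, Seq[String])] overload uses the string lists as preferred locations for each partition:

```scala
scala> sc.parallelize(Array(1, 2, 3, 4))   // method 1
scala> sc.makeRDD(Array(1, 2, 3))          // method 2 (delegates to parallelize)
scala> val list1 = List((1, List("Hello", "Word", "spark")), (2, List("at", "as")))
scala> sc.makeRDD(list1)                   // method 3: values with preferred locations
```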
Spark Core:RDD编程Transformation
文章目录创建RDD操作map[U: ClassTag](f: T => U): RDD[U]filter(f: T => Boolean): RDD[T]flatMap[U: ClassTag](f: T => TraversableOnce[U]): RDD[U]mapPartition[U: ClassTag]( f: Iterator[T] => Iterator[U...原创 2019-06-19 16:53:34 · 726 阅读 · 0 评论 -
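The first few transformations from that table of contents, tried in a local-mode spark-shell:

```scala
scala> val rdd = sc.parallelize(List(1, 2, 3, 4))
scala> rdd.map(_ * 2).collect            // Array(2, 4, 6, 8)
scala> rdd.filter(_ % 2 == 0).collect    // Array(2, 4)
scala> sc.parallelize(List("a b", "c d")).flatMap(_.split(" ")).collect
                                         // Array(a, b, c, d)
```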
spark-2.2.2-bin-hadoop2.7 HA setup
First install spark-2.2.2-bin-hadoop2.7: https://blog.youkuaiyun.com/drl_blogs/article/details/91948394 1. On the master node, edit conf/spark-env.sh: export JAVA_HOME=/usr/local/jdk1.8.0_211 # export SPARK_MASTER_HOST=hadoop01 # export ...
Original · 2019-06-14 11:08:28 · 353 views · 0 comments
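The standard standby-master setup (which the commented-out SPARK_MASTER_HOST in this entry points toward) delegates master election to ZooKeeper via spark-env.sh; the quorum addresses below are assumptions:

```shell
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url=hadoop01:2181,hadoop02:2181,hadoop03:2181 \
  -Dspark.deploy.zookeeper.dir=/spark"
```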
Spark Core: RDD actions
Contents: reduce(f: (T, T) => T): T; collect(): Array[T]; count(): Long; first(): T; take(num: Int): Array[T]; takeOrdered(num: Int)(implicit ord: Ordering[T]); aggregate[U: ClassTag](zeroValue: U)(seqOp: (U, T)...
Original · 2019-06-19 16:53:59 · 212 views · 0 comments
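The actions in that list, exercised in a local-mode spark-shell:

```scala
scala> val rdd = sc.parallelize(List(3, 1, 4, 2))
scala> rdd.reduce(_ + _)               // 10
scala> rdd.count                       // 4
scala> rdd.first                       // 3
scala> rdd.take(2)                     // Array(3, 1)
scala> rdd.takeOrdered(2)              // Array(1, 2)
scala> rdd.aggregate(0)(_ + _, _ + _)  // 10
```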
Spark Core: data input and output
Text file input/output. Reading a text file: scala> sc.textFile("./wc.txt") res4: org.apache.spark.rdd.RDD[String] = ./wc.txt MapPartitionsRDD[5] at textFile at <console>:25 Saving a text file: scala> res4.saveAsTextFile("./test"...
Original · 2019-06-19 16:54:20 · 546 views · 0 comments
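Beyond plain text, the usual companions (likely the other formats this entry walks through) are SequenceFiles and object files; a spark-shell sketch:

```scala
scala> val pairs = sc.parallelize(List(("a", 1), ("b", 2)))
scala> pairs.saveAsSequenceFile("./seq")
scala> sc.sequenceFile[String, Int]("./seq").collect   // Array((a,1), (b,2))
scala> pairs.saveAsObjectFile("./obj")
scala> sc.objectFile[(String, Int)]("./obj").collect
```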
Spark SQL: converting between RDDs, DataFrames, and Datasets
Contents: RDD to DataFrame; RDD to Dataset; DataFrame/Dataset to RDD; DataFrame to Dataset; Dataset to DataFrame. people.txt: Michael,29 Andy,30 Justin,19. RDD to DataFrame: scala> val rdd=sc.textFile("people.txt") rdd: org.apache...
Original · 2019-06-19 16:55:28 · 299 views · 0 comments
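The conversions in that table of contents fit in a few spark-shell lines, built on the people.txt sample shown above:

```scala
scala> case class Person(name: String, age: Long)
scala> import spark.implicits._
scala> val df = sc.textFile("people.txt").map(_.split(","))
     |   .map(p => Person(p(0), p(1).trim.toLong)).toDF()   // RDD -> DataFrame
scala> val ds = df.as[Person]   // DataFrame -> Dataset
scala> ds.toDF()                // Dataset -> DataFrame
scala> df.rdd                   // DataFrame -> RDD[Row]
scala> ds.rdd                   // Dataset -> RDD[Person]
```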
Spark: commonly used JAR dependencies (pom.xml)
<?xml version="1.0" encoding="UTF-8"?><project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://ma...
Original · 2019-06-19 23:17:11 · 728 views · 0 comments
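For a Spark 2.2.2 / Scala 2.11 project like the one this column builds, the core dependency block inside that pom.xml presumably looks like this (artifact coordinates are the standard Maven Central ones):

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.2.2</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.11</artifactId>
  <version>2.2.2</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming_2.11</artifactId>
  <version>2.2.2</version>
</dependency>
```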
Spark: [error] remote debugging fails with root:supergroup:drwxr-xr-x permission denied
Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=xxxx, access=WRITE, inode="/test/out/_temporary/0":root:supergroup:drwxr-xr-x...
Original · 2019-06-15 16:27:46 · 884 views · 0 comments
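Two common fixes for this AccessControlException (either one suffices): run the driver as the directory's owner by setting HADOOP_USER_NAME=root before the SparkContext is created (e.g. in the IDE run configuration), or relax the permissions on the target path:

```shell
# open the output directory to the debugging user
hdfs dfs -chmod -R 777 /test
```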
Spark: [error] System memory 259522560 must be at least 471859200
java.lang.IllegalArgumentException: System memory 259522560 must be at least 471859200. Please increase heap size using the --driver-memory option or spark.driver.memory in Spark configuration...
Original · 2019-06-20 15:36:29 · 484 views · 0 comments
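The message names the fix itself: the driver JVM's heap (about 247 MB here) is below Spark's 450 MB minimum, so give the driver more memory:

```shell
# when submitting a job
spark-submit --driver-memory 1g ...
```

When running inside an IDE, the equivalent is adding `-Xmx1g` to the run configuration's VM options, since the JVM is already started before SparkConf is read.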
Spark: [error] DataFrame to Dataset conversion fails
Error:(45, 63) Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for seria...
Original · 2019-06-20 17:01:13 · 339 views · 0 comments
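The error means no Encoder for the target type is in scope; the usual remedy is to define the case class at the top level (not inside the method that calls .as[...]) and bring the implicits in:

```scala
// must be declared outside the method where .as[Person] is used,
// or Spark cannot derive an Encoder for it
case class Person(name: String, age: Long)

import spark.implicits._
val ds = df.as[Person]   // df is an existing DataFrame with matching columns
```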
Example: Spark SQL user-defined functions (UDF/UDAF)
Contents: UDF, UDAF. UDF: scala> val df=spark.read.json("people.json") df: org.apache.spark.sql.DataFrame = [age: bigint, name: string] scala> df.show +---+------+ |age| name| +---+------+ | 30| ...
Original · 2019-06-20 17:12:00 · 1621 views · 0 comments
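A minimal UDF registration in the spark-shell, continuing from the people.json DataFrame in the preview (the function name addName is illustrative):

```scala
scala> val df = spark.read.json("people.json")
scala> spark.udf.register("addName", (name: String) => "Name:" + name)
scala> df.createOrReplaceTempView("people")
scala> spark.sql("select addName(name), age from people").show
```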
Example: mapping IP addresses to provinces and writing the results to MySQL
Data: ip.txt, access.log. import java.io.{BufferedReader, FileInputStream, InputStreamReader} import java.sql.{Connection, DriverManager, PreparedStatement} import org.apache.spark.{SparkConf, SparkCont...
Original · 2019-06-25 10:20:58 · 956 views · 3 comments
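The core of such a job is converting dotted IPs to Longs so the (start, end, province) ranges from ip.txt can be binary-searched; a sketch with hypothetical helper names:

```scala
// convert "1.2.3.4" to its unsigned 32-bit value for range comparison
def ip2Long(ip: String): Long =
  ip.split("\\.").foldLeft(0L)((acc, seg) => acc << 8 | seg.toLong)

// binary-search sorted (startIp, endIp, province) rows loaded from ip.txt
def search(rows: Array[(Long, Long, String)], ip: Long): Option[String] = {
  var lo = 0
  var hi = rows.length - 1
  while (lo <= hi) {
    val mid = (lo + hi) / 2
    val (s, e, prov) = rows(mid)
    if (ip < s) hi = mid - 1
    else if (ip > e) lo = mid + 1
    else return Some(prov)
  }
  None
}
```

The PreparedStatement import in the preview suggests the counts are then written inside foreachPartition, so each partition shares one MySQL connection instead of opening one per record.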