
Spark Ecosystem
维维weiwei
Passionate about the software development industry
Transformation Operators in Spark (original post, 2017-04-09)
A Java walkthrough, package com.uplooking.bigdata.core.p2, of Spark's transformation operators on JavaRDD and JavaPairRDD.
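The post itself is Java; the following is a minimal Scala sketch of the same operators, with made-up input data:

    import org.apache.spark.{SparkConf, SparkContext}

    object TransformationDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("TransformationDemo"))
        val lines = sc.parallelize(List("hello spark", "hello hadoop"))
        val words = lines.flatMap(_.split(" "))            // one line -> many words
        val longWords = words.filter(_.length > 5)         // keep only longer words
        val counts = words.map((_, 1)).reduceByKey(_ + _)  // per-key aggregation
        counts.foreach(println)
        sc.stop()
      }
    }

Transformations are lazy: nothing actually runs until an action such as foreach or collect is called.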
Spark Transformations (original post, 2017-04-09)
A second Java example, package com.java, exercising the same transformation operators through JavaSparkContext; the sketch above applies here as well.
SparkSQL: Saving Data (original post, 2017-04-20)
A Java example (app name JavaSparkSQLSave, master local) that configures spark.app.name and spark.master by hand and saves a DataFrame's contents.
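The post is Java and its body is cut off above; here is a rough Scala equivalent of the save path, assuming a spark-shell session (sc predefined) and placeholder file names:

    import org.apache.spark.sql.{SQLContext, SaveMode}

    val sqlContext = new SQLContext(sc)
    val df = sqlContext.read.json("people.json")   // placeholder input
    // write the same data back out as Parquet, replacing any previous run
    df.write.mode(SaveMode.Overwrite).parquet("people.parquet")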
SparkSQL: Connecting to the Thrift Server (original post, 2017-04-20)
A Scala example that registers org.apache.hive.jdbc.HiveDriver and opens a plain JDBC connection to jdbc:hive2://master:10000/default as user root.
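Completing the truncated snippet in the direction it points, with the password and table name as placeholders since the excerpt cuts off before them:

    import java.sql.DriverManager

    // force-load the Hive JDBC driver, as in the post
    classOf[org.apache.hive.jdbc.HiveDriver]
    val url = "jdbc:hive2://master:10000/default"
    val conn = DriverManager.getConnection(url, "root", "")  // password unknown; placeholder
    val stmt = conn.createStatement()
    val rs = stmt.executeQuery("SELECT * FROM some_table LIMIT 10")  // placeholder table
    while (rs.next()) println(rs.getString(1))
    rs.close(); stmt.close(); conn.close()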
SparkSQL: Function Operations (original post, 2017-04-20)
A Scala example, package com.uplooking.bigdata.sql.p3, working with SparkSQL functions through SQLContext and Column (with a project-local MySparkUtil helper).
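The excerpt stops at the imports, so the exact functions covered are unknown; one typical "function operation" is registering a UDF, sketched here in Scala under that assumption with made-up data:

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._
    val df = sc.parallelize(List(("zhangsan", 13), ("lisi", 14))).toDF("name", "age")
    df.registerTempTable("person")
    // register a UDF and call it from SQL
    sqlContext.udf.register("strLen", (s: String) => s.length)
    sqlContext.sql("SELECT name, strLen(name) AS len FROM person").show()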
SparkSQL: Hive Operations (original post, 2017-05-10)
A Scala example, package com.uplooking.bigdata.sql.p2, covering basic SparkSQL-on-Hive integration via HiveContext, reworking the earlier JDBC exercise against Hive.
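A minimal HiveContext sketch of the kind of code the post describes; it assumes hive-site.xml is on the classpath so Spark can reach the metastore, and uses a placeholder table:

    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)
    hiveContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
    hiveContext.sql("SELECT key, value FROM src LIMIT 10").show()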
SparkStreaming: HDFS Operations (original post, 2017-04-20)
A Java example (app name JavaSparkStreamingHDFS, master local[2]) that builds a JavaStreamingContext and processes files as they arrive in HDFS.
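The post is Java; an equivalent Scala sketch with a placeholder HDFS path:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setMaster("local[2]").setAppName("HdfsStreaming")
    val ssc = new StreamingContext(conf, Seconds(5))
    // watches the directory and picks up files created after the job starts
    val lines = ssc.textFileStream("hdfs://master:9000/streaming/input")  // placeholder path
    lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()
    ssc.start()
    ssc.awaitTermination()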
SparkStreaming: TCP Stream Processing (netcat) (original post, 2017-04-20)
A Java example (app name JavaSparkStreamingNC, master local[2]) that consumes a TCP text stream fed by netcat through a JavaStreamingContext with a fixed batch Duration.
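Again Java in the post; a Scala sketch of the same netcat setup, host and port assumed:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setMaster("local[2]").setAppName("NetcatStreaming")
    val ssc = new StreamingContext(conf, Seconds(2))
    // reads lines from the socket that `nc -lk 9999` is serving
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()
    ssc.start()
    ssc.awaitTermination()

Feed it with nc -lk 9999 in another terminal; local[2] matters because the socket receiver permanently occupies one of the two threads.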
SparkStreaming with Kafka (original post, 2017-05-08)
A Java example, package com.uplooking.bigdata.streaming.p2, integrating Spark Streaming with Kafka via the 0.10-style org.apache.kafka.common.serialization.StringDeserializer API.
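The StringDeserializer import points at the Kafka 0.10 integration; a Scala sketch against that API, reusing a StreamingContext ssc like the one in the HDFS sketch above (broker, group id, and topic are placeholders):

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "master:9092",          // placeholder broker list
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "spark-demo",
      "auto.offset.reset"  -> "latest")
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Set("test"), kafkaParams))
    stream.map(_.value()).print()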
SparkStreaming with Kafka (2) (original post, 2017-05-08)
A second Kafka integration in the same package, this time against the older 0.8 direct API with kafka.serializer.StringDecoder.
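The kafka.serializer.StringDecoder import marks this as the 0.8 direct API; a matching Scala sketch (broker and topic are placeholders, ssc as above):

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    val kafkaParams = Map("metadata.broker.list" -> "master:9092")  // placeholder
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("test"))
    stream.map(_._2).print()   // the stream yields (key, value) pairs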
Spark with ElasticSearch (original post, 2017-04-20)
A Scala example (app name ScalaSparkElasticSearch) which, following the ES documentation the post cites, sets es.index.auto.create=true before writing.
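A Scala sketch of writing an RDD to Elasticsearch with the elasticsearch-spark connector, using the es.index.auto.create setting the post quotes; the host, index, and document are placeholders:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.elasticsearch.spark._

    val conf = new SparkConf().setMaster("local").setAppName("ScalaSparkElasticSearch")
    conf.set("es.index.auto.create", "true")   // the setting quoted in the post
    conf.set("es.nodes", "master")             // placeholder ES host
    val sc = new SparkContext(conf)
    val docs = sc.makeRDD(Seq(Map("name" -> "zhangsan", "age" -> 13)))
    docs.saveToEs("spark/person")              // placeholder index/type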
Spark: TopN (original post, 2017-04-09)
A Java example, package com.uplooking.bigdata.core.p3, computing TopN over a JavaPairRDD.
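A Scala sketch of two common TopN shapes, with made-up data since the post's input is not visible in the excerpt:

    // global TopN: top() takes the N largest by the implicit ordering
    val top3 = sc.parallelize(List(3, 7, 1, 9, 5, 8)).top(3)   // Array(9, 8, 7)

    // grouped TopN: fine for small groups, since each group is materialized
    val scores = sc.parallelize(List(("a", 3), ("a", 9), ("a", 7), ("b", 5), ("b", 8)))
    val top2PerKey = scores.groupByKey().mapValues(_.toList.sortWith(_ > _).take(2))
    top2PerKey.foreach(println)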
Spark: Broadcast (original post, 2017-04-09)
A Java example, package com.uplooking.bigdata.core.p3, using a broadcast variable from a JavaRDD job.
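The idea in a few lines of Scala (spark-shell style, sc predefined):

    // ship a read-only value to every executor once, instead of with every task
    val factor = sc.broadcast(3)
    val tripled = sc.parallelize(1 to 5).map(_ * factor.value).collect()
    println(tripled.toList)   // List(3, 6, 9, 12, 15)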
Spark: Secondary Sort (original post, 2017-04-09)
A Java example, package com.uplooking.bigdata.core.p3, built around a custom com.uplooking.bigdata.domain.SecondSort key class.
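The excerpt shows a custom SecondSort key class; a Scala sketch of the standard pattern, with made-up input:

    // a composite key: primary field ascending, secondary field descending
    class SecondSort(val first: Int, val second: Int)
        extends Ordered[SecondSort] with Serializable {
      def compare(that: SecondSort): Int =
        if (first != that.first) first - that.first else that.second - second
      override def toString: String = s"$first $second"
    }

    val lines = sc.parallelize(List("3 5", "1 9", "3 2", "1 4"))
    val sorted = lines
      .map { line =>
        val Array(a, b) = line.split(" ")
        (new SecondSort(a.toInt, b.toInt), line)
      }
      .sortByKey()   // uses SecondSort's compare
      .map(_._2)
    sorted.foreach(println)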
Spark: WordCount (original post, 2017-04-09)
The canonical Java WordCount, package com.uplooking.bigdata.core.p1.
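The same pipeline in Scala, with placeholder paths:

    val counts = sc.textFile("input.txt")    // placeholder input path
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
    counts.saveAsTextFile("wc-output")       // placeholder output path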
Spark: Actions (original post, 2017-04-09)
A Java example, package com.uplooking.bigdata.core.p2, of action operators, writing output through Hadoop's TextOutputFormat with NullWritable/Text.
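The imports in the excerpt (NullWritable, Text, TextOutputFormat) suggest output through the new Hadoop API; a Scala sketch combining that with a couple of ordinary actions (output path is a placeholder):

    import org.apache.hadoop.io.{NullWritable, Text}
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat

    val rdd = sc.parallelize(List("a", "b", "c"))
    println(rdd.count())          // actions trigger actual execution
    rdd.take(2).foreach(println)
    // write through the new Hadoop OutputFormat API, matching the imports above
    rdd.map(s => (NullWritable.get(), new Text(s)))
       .saveAsNewAPIHadoopFile[TextOutputFormat[NullWritable, Text]]("action-output")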
SparkSQL: Caching Tables (original post, 2017-04-19)
A Scala example (app name ScalaDataFrameOps) that parallelizes lines such as "zhangsan 13 1…" into a DataFrame and caches the registered table.
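A Scala sketch of the same flow with invented rows, since the excerpt's data is cut off:

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._
    val df = sc.parallelize(List("zhangsan 13 168.5", "lisi 14 175.0"))  // made-up rows
      .map(_.split(" "))
      .map(a => (a(0), a(1).toInt, a(2).toDouble))
      .toDF("name", "age", "height")
    df.registerTempTable("person")
    sqlContext.cacheTable("person")   // later queries read the in-memory columnar copy
    sqlContext.sql("SELECT name FROM person WHERE age > 13").show()
    sqlContext.uncacheTable("person")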
SparkSQL: Creating a DataFrame (original post, 2017-04-19)
A Java example creating a DataFrame by parallelizing a list of Person beans (张三, 李四, 王五, 赵六, each with an age and a height).
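The post builds the DataFrame from Java Person beans; the Scala analogue uses a case class (spark-shell style):

    case class Person(name: String, age: Int, height: Double)

    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._
    val df = sc.parallelize(Seq(
      Person("张三", 13, 168.8),
      Person("李四", 14, 169.8))).toDF()
    df.printSchema()
    df.show()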
Spark: Broadcast Variables (original post, 2017-04-19)
A Java example (broadCastOps) that broadcasts a small sex-code table and resolves it against user rows such as "1,3,张三,河北" without shuffling.
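A Scala sketch of the map-side-join idea visible in the excerpt; the code-to-label mapping is an assumption, since the excerpt cuts off before it:

    // small dimension table, broadcast once to every executor
    val sexMap = sc.broadcast(Map("0" -> "female", "1" -> "male"))
    val users = sc.parallelize(List("1,3,张三,河北", "2,1,李四,北京"))
    val resolved = users.map { line =>
      val f = line.split(",")
      // a local lookup replaces a shuffling join against the dimension table
      (f(0), sexMap.value.getOrElse(f(1), "unknown"), f(2), f(3))
    }
    resolved.foreach(println)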
SparkSQL: JDBC (original post, 2017-04-19)
A Scala example (app name ScalaSparkSQLJDBCOps, master local) that sets spark.sql.shuffle.partitions=1 and works against a database over JDBC.
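A Scala sketch of a JDBC read through SQLContext; database URL, table, and credentials are placeholders:

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)
    val df = sqlContext.read.format("jdbc").options(Map(
      "url"      -> "jdbc:mysql://master:3306/test",   // placeholder database
      "driver"   -> "com.mysql.jdbc.Driver",
      "dbtable"  -> "person",                          // placeholder table
      "user"     -> "root",
      "password" -> "root")).load()
    df.show()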
SparkSQL: Sorting and Saving Data (original post, 2017-04-19)
A Scala example (app name ScalaSparkSQLJson, master local), again with spark.sql.shuffle.partitions=1, that sorts a JSON-sourced DataFrame and saves the result.
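A plausible Scala reading of "sort, then save", reusing sqlContext from the sketch above with placeholder paths; the single shuffle partition mirrors the conf.set in the excerpt and yields one output file:

    import org.apache.spark.sql.SaveMode

    val people = sqlContext.read.json("people.json")   // placeholder input
    val sorted = people.sort(people("age").desc)       // descending by age
    sorted.write.mode(SaveMode.Overwrite).json("people-sorted")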
SparkSQL: Queries and Filtering (original post, 2017-04-19)
A Scala example (app name ScalaSparkDataFrameOps) of DataFrame queries and filters through SQLContext.
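A few representative DataFrame operations in Scala, with a placeholder input file:

    val df = sqlContext.read.json("people.json")   // placeholder input
    df.select("name", "age").show()                // projection
    df.filter(df("age") > 13).show()               // predicate on a Column
    df.where(df("name") === "zhangsan").show()     // where is an alias for filter
    df.groupBy("age").count().show()               // aggregation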
SparkSQL: Reading Data (original post, 2017-04-19)
A Scala example (app name ScalaSparkSQL) reading files of various formats into DataFrames through SQLContext.
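The built-in readers in Scala, paths as placeholders (read.text needs Spark 1.6+):

    val jsonDF    = sqlContext.read.json("people.json")
    val parquetDF = sqlContext.read.parquet("people.parquet")
    val textDF    = sqlContext.read.text("people.txt")   // single "value" column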
Spark: The RDD API in Full (original post, 2017-04-11)
A Scala grab-bag (object SparkAPI extends App) touring the RDD transformation API.
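A few of the less-everyday RDD methods such a tour usually includes, sketched in Scala with small in-memory data:

    val nums = sc.parallelize(1 to 9, 3)
    // mapPartitions: one call per partition instead of one per element
    val partSums = nums.mapPartitions(it => Iterator(it.sum)).collect()  // Array(6, 15, 24)
    // zip: pairs elements positionally (same partitioning on both sides)
    val pairs = nums.zip(sc.parallelize('a' to 'i', 3)).collect()
    // aggregate: max within each partition, then sum across partitions
    val agg = nums.aggregate(0)(math.max, _ + _)   // 3 + 6 + 9 = 18
    println((partSums.toList, agg))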
How Spark Jobs Run (original post, 2017-05-10)
1. When the cluster starts, the Master node collects resource information, memory and CPU core counts, from every Worker node.
2. Worker nodes connect to the Master through a heartbeat mechanism and periodically report their own resource status to it.
3. On receiving a Worker's report, the Master sends a brief acknowledgement back.
4. The driver submits a Spark job to the Master with a registration request, asking the Master to reserve the resources the application needs.