Spark operations on Elasticsearch

  1. Querying Elasticsearch data

Our Elasticsearch was running on JDK 1.7 while Spark was installed with JDK 1.8; it took two days of tinkering before Spark could finally access ES. You also need to pull in the elasticsearch-spark jar.

Create a new Maven project and add the following dependencies:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.2.0</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.2.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-spark-20_2.11</artifactId>
    <version>5.5.1</version>
    <exclusions>
        <exclusion>
            <artifactId>log4j-over-slf4j</artifactId>
            <groupId>org.slf4j</groupId>
        </exclusion>
    </exclusions>
</dependency>

We can query ES either directly through the elasticsearch-spark package, or through Spark SQL using SQL syntax. Let's look at the first approach.

 

    1. Approach 1: JavaEsSpark

Here's the code:

// `sc` is a JavaSparkContext whose SparkConf sets es.nodes/es.port
// (see the write example below); the query string is standard ES query DSL
String query = "{\"query\":{\"term\":{\"bizCode\":\"140000000040\"}}}";
JavaPairRDD<String, Map<String, Object>> esRDD = JavaEsSpark.esRDD(sc, "cmall_order/order", query);
long count = esRDD.count();
System.out.println("total count:" + count);
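Each element of the returned pair RDD is (document _id, map of field name to value). A minimal usage sketch for inspecting a few hits (the take(10) limit is just for illustration; Tuple2 comes from scala.Tuple2):

// Print a small sample: each pair is (document _id, field map)
for (Tuple2<String, Map<String, Object>> doc : esRDD.take(10)) {
    System.out.println(doc._1() + " -> " + doc._2());
}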

The query can be written in three ways:

# uri (or parameter) query

es.query = ?q=costinl

 

# query dsl

es.query = { "query" : { "term" : { "user" : "costinl" } } }

 

# external resource

es.query = org/mypackage/myquery.json

The official docs recommend the third form: write the query in a JSON file, point es.query at the file's path, and package the JSON file as a resource inside the jar.
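A minimal sketch of that form, assuming a resource file src/main/resources/org/mypackage/myquery.json that contains the term query from above. Because the query argument of esRDD simply populates es.query, a classpath resource path works here just like an inline DSL string:

// myquery.json holds: {"query":{"term":{"bizCode":"140000000040"}}}
JavaPairRDD<String, Map<String, Object>> rdd =
        JavaEsSpark.esRDD(sc, "cmall_order/order", "org/mypackage/myquery.json");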

    2. Approach 2: Spark SQL
private static void read_es_sql(JavaSparkContext sc) {
    SparkSession spark = SparkSession
            .builder()
            .appName("Java Spark SQL basic example")
            .config("pushdown", "true")
            .config("es.nodes", "10.37.154.83")
            .config("es.port", "9200")
            .getOrCreate();

    // DataFrame API: load the index/type, then project and filter
    // (col() requires: import static org.apache.spark.sql.functions.col)
    Dataset<Row> rows = spark.read().format("org.elasticsearch.spark.sql").load("mydb/order")
            .select(col("skuId"), col("orderId"))
            .filter(col("skuId").equalTo("140000000040"));
    rows.show();
    long count = rows.count();
    System.out.println("total count:" + count);

    // SQL API: register a view over the index and query it with plain SQL,
    // mapping the result rows back to TestBean via a bean encoder
    Encoder<TestBean> testEncoder = Encoders.bean(TestBean.class);
    Dataset<Row> df = spark.read().format("org.elasticsearch.spark.sql").load("test/testbean");
    df.createOrReplaceGlobalTempView("table1");
    Dataset<TestBean> selects = spark.sql("SELECT myid,name,age FROM global_temp.table1 where age>13").as(testEncoder);
    selects.show();

    spark.close();
}
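All of these examples use a TestBean POJO that the original post never shows. Judging from the constructor calls and the SELECT myid,name,age query, it is presumably a bean along these lines (a hypothetical reconstruction; Encoders.bean needs the no-arg constructor and getters/setters):

// Hypothetical TestBean, reconstructed from how it is used in this post
public class TestBean implements java.io.Serializable {
    private String myid;
    private String name;
    private int age;

    public TestBean() { }  // required by Encoders.bean()
    public TestBean(String myid, String name, int age) {
        this.myid = myid;
        this.name = name;
        this.age = age;
    }
    public String getMyid() { return myid; }
    public void setMyid(String myid) { this.myid = myid; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
}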

 

  2. Writing data to Elasticsearch
    1. Approach 1: writing via JavaEsSpark

SparkConf sparkConf = new SparkConf();
sparkConf.setAppName("Demo_Mysql2");
sparkConf.set("pushdown", "true");
sparkConf.set("es.nodes", "10.37.154.83");
sparkConf.set("es.port", "9200");

JavaSparkContext sc = null;
try {
    sc = new JavaSparkContext(sparkConf);
    insert_es(sc);  // do the actual write before tearing the context down
} catch (Exception e) {
    e.printStackTrace();
} finally {
    if (sc != null) {
        sc.stop();
    }
}



private static void insert_es(JavaSparkContext sc) {
    TestBean b1 = new TestBean("1", "name1", 12);
    TestBean b2 = new TestBean("2", "name3", 34);
    // ImmutableList comes from Guava (com.google.common.collect.ImmutableList)
    JavaRDD<TestBean> javaRDD = sc.parallelize(ImmutableList.of(b1, b2));
    Map<String, String> map = new HashMap<String, String>();
    map.put("es.mapping.id", "myid");  // use the bean's myid field as the document _id
    JavaEsSpark.saveToEs(javaRDD, "test/testbean", map);
}
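Because es.mapping.id points at the myid field, saving a bean whose myid already exists in the index replaces that document instead of indexing a duplicate (the connector's default write operation indexes by the supplied _id). A quick sketch, with hypothetical field values, reusing the map from above:

// Re-save id "1" with new values: the existing document is overwritten
TestBean updated = new TestBean("1", "name1-updated", 13);
JavaEsSpark.saveToEs(sc.parallelize(ImmutableList.of(updated)), "test/testbean", map);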

    2. Approach 2: writing via Spark SQL
// Create the SparkSession
SparkSession spark = SparkSession.builder()
        .appName("Java Spark SQL basic example")
        .config("pushdown", "true")
        .config("es.nodes", "10.37.154.83")
        .config("es.port", "9200")
        .getOrCreate();

// Build a one-element Dataset<TestBean> with a bean encoder
Encoder<TestBean> testEncoder = Encoders.bean(TestBean.class);
TestBean bean1 = new TestBean("4", "hello", 1222);
Dataset<TestBean> javaBeanDS = spark.createDataset(
        Collections.singletonList(bean1),
        testEncoder);

// Use the bean's myid field as the document _id ("primary key") in ES
Map<String, String> map = new HashMap<String, String>();
map.put("es.mapping.id", "myid");
javaBeanDS.write().mode(SaveMode.Append).format("org.elasticsearch.spark.sql").options(map).save("test/testbean");
spark.close();

 

Reposted from: https://my.oschina.net/u/778683/blog/1828796
