Spark 2.1.0: Writing JSON Data to MongoDB

This post walks through reading JSON data from an external file with Spark and writing it to MongoDB via a DataFrame. It covers the Spark configuration, reading the data, converting it to a typed object, and finally storing it.

Requirements:

1. Read JSON data from an external file
2. Split the data as needed (a hypothetical sample input line is shown below)
3. Write it to MongoDB directly via a DataFrame
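Both implementations below split each input line on "____" and keep the second field, so each line of the input file is assumed to look roughly like this (the record id before the separator is hypothetical):

someRecordId____{"bookId":"b001","content":"note text","pageIndex":"12"}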

Writing to MongoDB with the official Spark-MongoDB example
I first tried the approach from the official example, but it did not succeed: some fields in the JSON data are empty, and reading them threw errors.

import com.mongodb.spark.MongoSpark;
import com.mongodb.spark.config.WriteConfig;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.SparkSession;
import org.json.JSONObject;

import java.util.HashMap;
import java.util.Map;

import mongo.Book;

public class Mongo {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("mongo")
                .master("local[4]")
                .config("spark.mongodb.input.url","mongodb://127.0.0.1/spark.mongo")
                .config("spark.mongodb.output.uri","mongodb://127.0.0.1/spark.mongoœ")
                .getOrCreate();
        JavaRDD<String> input = spark.sparkContext()
                .textFile("/Users/yangyang/Desktop/json/part-r-00003.txt",1)
                .toJavaRDD().map(x -> x.split("____")[1]);
        Map<String, String> writeOverrides = new HashMap<String, String>();
        writeOverrides.put("collection", "spark");
        writeOverrides.put("writeConcern.w", "majority");
        WriteConfig writeConfig = WriteConfig.create(spark).withOptions(writeOverrides);

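        // Map each JSON line to a Book bean. JSONObject.get(...) throws when a key
        // is missing, which is why this approach broke on records with empty fields.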
        JavaRDD<Book> books = input.map(new Function<String, Book>() {
            public Book call(String s) throws Exception {
                JSONObject jsons = new JSONObject(s);
                Book book = new Book();

                book.setBookId(jsons.get("bookId").toString());
                book.setContent(jsons.get("content").toString());
                book.setContentStartPos(jsons.get("contentStartPos").toString());
                book.setCoord(jsons.get("coord").toString());
                book.setId(jsons.get("id").toString());
                book.setLineColor(jsons.get("lineColor").toString());
                book.setLineType(jsons.get("lineType").toString());
                book.setLineWidth(jsons.get("lineWidth").toString());
                book.setNoteCatalog(jsons.get("noteCatalog").toString());
                book.setNoteLabels(jsons.get("noteLabels").toString());
                book.setNoteOrigin(jsons.get("noteOrigin").toString());
                book.setNotePath(jsons.get("notePath").toString());
                book.setNotePostil(jsons.get("notePostil").toString());
                book.setNoteType(jsons.get("noteType").toString());
                book.setPageAngle(jsons.get("pageAngle").toString());
                book.setPageHeight(jsons.get("pageHeight").toString());
                book.setPageIndex(jsons.get("pageIndex").toString());
                book.setPageWidth(jsons.get("pageWidth").toString());
                book.setPdfId(jsons.get("pdfId").toString());
                book.setUpdateTime(jsons.get("updateTime").toString());
                book.setUserName(jsons.get("userName").toString());
                book.setSourceType(jsons.get("sourceType").toString());
                return book;
            }
        });
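        // Note: the writeConfig built above is never passed to save(...), so this
        // call falls back to the spark.mongodb.output.uri from the session config.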
        MongoSpark.save(books, Book.class);
        spark.close();
    }
}
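A plausible fix for the read errors, sketched here rather than taken from the original attempt: org.json's optString(key, fallback) returns the fallback instead of throwing when a key is missing or null, so the mapping step in the class above could be rewritten defensively. Only a few of the Book fields are shown; the rest follow the same pattern.

        JavaRDD<Book> books = input.map(new Function<String, Book>() {
            public Book call(String s) throws Exception {
                JSONObject jsons = new JSONObject(s);
                Book book = new Book();
                // optString returns "" when the key is absent or the value is null,
                // instead of throwing like get(...) does
                book.setBookId(jsons.optString("bookId", ""));
                book.setContent(jsons.optString("content", ""));
                book.setPageIndex(jsons.optString("pageIndex", ""));
                // ...the remaining setters follow the same pattern
                return book;
            }
        });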

After digging into the MongoSpark API a bit, it turns out the API already has a method for converting an RDD of JSON strings directly.

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import com.mongodb.spark.MongoSpark;

public class DataSetMongo {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("mongo")
                .master("local[4]")
                .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/spark.mongo")
                .config("spark.mongodb.output.uri", "mongodb://127.0.0.1/spark.momgo")
                .getOrCreate();
        JavaRDD<String> input = spark.sparkContext()
                .textFile("/Users/yangyang/Desktop/json/part-r-00003.txt",1)
                .toJavaRDD().map(x -> x.split("____")[1]);
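        // MongoSpark.read(...) returns a DataFrameReader pre-configured for MongoDB;
        // calling .json(...) on it parses the RDD of JSON strings and infers the schema.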
        Dataset<Row> books = MongoSpark.read(spark).json(input);
        books.show();
        MongoSpark.write(books).option("collection", "mongo").mode("overwrite").save();
        spark.close();
    }
}

After running, the data in MongoDB looks like this:

[screenshot: documents written to the spark.mongo collection]
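To double-check the write without opening the mongo shell, the collection can be read back through the same connector. A minimal sketch, assuming the SparkSession from the example above is still open:

        Dataset<Row> check = MongoSpark.read(spark)
                .option("collection", "mongo")
                .load();
        check.show();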
