Analysis of Three Data-Ingestion Code Approaches

Survive by day and develop by night.
Talk about the import business, show your perfect code, stay fully busy, skip the hardship, make a better result, wait for change, challenge survival.
Happy to work through hardship to solve dependencies.

Overview

Web crawling is a very common requirement.

Requirements:

Design approach

Implementation analysis

1. Single-record insert

private void processSingle(List<Map<String, Object>> list1) {
		// 1. Iterate over the source records and insert them one at a time.
		for (int i = 0; i < list1.size(); i++) {
			// 2. Copy the record into a new map with camelCase keys.
			Map<String, Object> dataMap = list1.get(i);
			Map<Object, Object> dm = new HashMap<>();
			for (Map.Entry<String, Object> entry : dataMap.entrySet()) {
				dm.put(lineToHump(entry.getKey()), entry.getValue());
			}
			// Fixed demo values: set once per record, after the copy, so they still
			// override any source field with the same name.
			dm.put("description", "description");
			dm.put("year", 2008);
			dm.put("trxId", "transaction ID");
			dm.put("contractNo", "12332131");
			dm.put("deadline", 12332L);
			ArcDocument arcDocument = arcDocumentConvert.convert(dm);
			// 3. One service call per record.
			arcDocumentService.createDoc(arcDocument);
			log.info("Do create action, id={}, record index={}", arcDocument.getId(), i);
		}
	}
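The article does not show `lineToHump`. As a minimal sketch, assuming it turns lower-case underscore names such as `arc_id` into camelCase `arcId`, the key conversion could look like the following. The class name `KeyNames` and the helper `toCamelKeys` are hypothetical and exist only for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class KeyNames {
    // Hypothetical stand-in for the article's lineToHump helper.
    // Assumes lower-case snake_case input; letters keep their case otherwise.
    static String lineToHump(String name) {
        StringBuilder sb = new StringBuilder(name.length());
        boolean upperNext = false;
        for (char c : name.toCharArray()) {
            if (c == '_') {
                upperNext = true;          // drop the underscore, capitalize what follows
            } else if (upperNext) {
                sb.append(Character.toUpperCase(c));
                upperNext = false;
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds a new map whose keys are camelCase; values are copied unchanged.
    static Map<String, Object> toCamelKeys(Map<String, Object> row) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : row.entrySet()) {
            out.put(lineToHump(e.getKey()), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("arc_id", 1L);
        row.put("contract_no", "12332131");
        System.out.println(toCamelKeys(row)); // {arcId=1, contractNo=12332131}
    }
}
```

Doing the key conversion in its own loop keeps it separate from the fixed demo fields, which only need to be set once per record.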

2. Batch insert

private void processBatch(List<Map<String, Object>> list1) {
		List<ArcDocument> docList = new ArrayList<>();
		// 1. Iterate over the source records and collect the converted documents.
		for (int i = 0; i < list1.size(); i++) {
			// 2. Copy the record into a new map with camelCase keys.
			Map<String, Object> dataMap = list1.get(i);
			Map<Object, Object> dm = new HashMap<>();
			for (Map.Entry<String, Object> entry : dataMap.entrySet()) {
				dm.put(lineToHump(entry.getKey()), entry.getValue());
			}
			// Fixed demo values: set once per record, after the copy, so they still
			// override any source field with the same name.
			dm.put("description", "description");
			dm.put("year", 2008);
			dm.put("trxId", "transaction ID");
			dm.put("contractNo", "12332131");
			dm.put("deadline", 12332L);
			ArcDocument arcDocument = arcDocumentConvert.convert(dm);
			docList.add(arcDocument);
			log.info("batch action, id={}, record index={}", arcDocument.getId(), i);
		}
		// 3. A single service call writes the whole list.
		arcDocumentService.insertBatch(docList);
	}
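`processBatch` hands the whole list to `insertBatch` in one call, so the size of that call grows with the input. One common way to bound it is to split the list into fixed-size chunks first. The sketch below is JDK-only and illustrative: `ChunkedBatch`, `partition`, `writeInChunks`, and the `batchWriter` callback are hypothetical names, and the chunk size of 1000 is an arbitrary assumption, not a recommendation from the article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class ChunkedBatch {
    // Splits items into consecutive sublists of at most chunkSize elements.
    static <T> List<List<T>> partition(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < items.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, items.size());
            chunks.add(new ArrayList<>(items.subList(from, to)));
        }
        return chunks;
    }

    // Passes each chunk to batchWriter and returns how many calls were made.
    static <T> int writeInChunks(List<T> items, int chunkSize, Consumer<List<T>> batchWriter) {
        int calls = 0;
        for (List<T> chunk : partition(items, chunkSize)) {
            batchWriter.accept(chunk);
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            docs.add(i);
        }
        // The lambda stands in for a call such as arcDocumentService.insertBatch(chunk).
        int calls = writeInChunks(docs, 1000,
                chunk -> System.out.println("insertBatch size=" + chunk.size()));
        System.out.println("calls=" + calls); // calls=3
    }
}
```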

Asynchronous code sample:

 /**
     * Elasticsearch bulk data import
     */
    public void addElasticsearchData(List<Map<String, Object>> addEsDataMapList) {
        // Obtain a client connection
        RestHighLevelClient client = restHighLevelClient();
        try {
            // One bulk request collects all index operations
            BulkRequest bulkRequest = new BulkRequest();
            for (Map<String, Object> addEsDataMap : addEsDataMapList) {
                // Use a fresh map per record so keys from earlier records do not leak into later ones
                Map<Object, Object> dataMap = new HashMap<>();
                for (Map.Entry<String, Object> entry : addEsDataMap.entrySet()) {
                    dataMap.put(lineToHump(entry.getKey()), entry.getValue());
                }
                // Fixed demo values, set once per record
                dataMap.put("description", "description");
                dataMap.put("year", 2008);
                dataMap.put("trxId", "transaction ID");
                dataMap.put("contractNo", "12332131");
                dataMap.put("deadline", 12332L);

                ArcDocument arcDocument = arcDocumentConvert.convert(dataMap);

                // Each document needs its own IndexRequest. The client talks to Elasticsearch over HTTP (TCP), not UDP.
                // Note: source(String, XContentType) expects a JSON string; if ArcDocument is a POJO, serialize it to JSON first.
                IndexRequest requestData = new IndexRequest(arc_document, "_doc", dataMap.get("arcId").toString()).source(arcDocument, XContentType.JSON);
                bulkRequest.add(requestData);
            }
            log.info("es sync record count: {}", bulkRequest.numberOfActions());
            // Refresh policy: make the documents searchable as soon as the request returns
            bulkRequest.setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE);
            // Submit only when there is at least one action; splitting into fixed-size batches is not implemented here
            if (bulkRequest.numberOfActions() >= 1) {
                BulkResponse bulkResponse = client.bulk(bulkRequest, RequestOptions.DEFAULT);
                if (bulkResponse.hasFailures()) {
                    log.error("es bulk write failed: {}", bulkResponse.buildFailureMessage());
                } else {
                    log.info("es bulk write succeeded");
                }
            }
        } catch (Exception e) {
            // Pass the exception as the last argument so the stack trace goes to the log
            log.error("es sync failed, data: {}", addEsDataMapList, e);
        } finally {
            try {
                client.close();
            } catch (IOException e) {
                log.error("failed to close es client", e);
            }
        }
    }
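The method above calls `client.bulk`, which blocks until Elasticsearch responds, so the caller waits for the whole write. A minimal JDK-only sketch of the asynchronous idea follows: chunks are submitted to a thread pool and the caller gets a single future back. `AsyncBulkSketch`, `submitAll`, and the `bulkWriter` callback are hypothetical names; in real code the callback would wrap the bulk call, and error handling would need more care than shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

class AsyncBulkSketch {
    // Submits every chunk to the pool. The returned future completes when all
    // chunks have been written, or completes exceptionally if any write threw.
    static <T> CompletableFuture<Void> submitAll(List<List<T>> chunks,
                                                 Consumer<List<T>> bulkWriter,
                                                 ExecutorService pool) {
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (List<T> chunk : chunks) {
            futures.add(CompletableFuture.runAsync(() -> bulkWriter.accept(chunk), pool));
        }
        return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]));
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        AtomicInteger written = new AtomicInteger();
        List<List<String>> chunks = Arrays.asList(
                Arrays.asList("doc1", "doc2"),
                Arrays.asList("doc3"));
        try {
            // The lambda is a stand-in for the real bulk write.
            submitAll(chunks, chunk -> written.addAndGet(chunk.size()), pool).join();
            System.out.println("written=" + written.get()); // written=3
        } finally {
            pool.shutdown();
        }
    }
}
```

The demo calls `join()` only so it can print the total; a caller that wants to stay non-blocking would attach a callback to the returned future instead.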

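Extended implementation

For reference, a simple implementation of the flow above is on GitHub.
Entry-level implementation: [partial source code], source code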

Performance test:
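No measurements are given here. To compare the approaches on your own data, a small wall-clock timer is often enough for a first look. The sketch below is illustrative: `IngestTimer` and `timeMillis` are hypothetical names, and the loop in `main` is a placeholder workload, not one of the ingestion methods.

```java
import java.util.concurrent.TimeUnit;

class IngestTimer {
    // Runs the task once and returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        int records = 10_000;
        long[] sink = new long[1];
        // Placeholder workload; replace the lambda with a call to the method under test.
        long elapsed = timeMillis(() -> {
            for (int i = 0; i < records; i++) {
                sink[0] += i;
            }
        });
        System.out.println("records=" + records + " elapsedMs=" + elapsed);
    }
}
```

A single run is noisy, so repeating each measurement several times on the same input gives a fairer comparison.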

