Interactive Analysis of Hadoop Data with Elasticsearch

This article describes how to use ES-Hadoop to move data efficiently between Elasticsearch and Hadoop, enabling both real-time decision making and deep analytics. It covers application scenarios such as product recommendation and genome sequencing, highlights the native integration with Spark, and notes support for data security and multiple Hadoop distributions.

With ES-Hadoop, you can index Hadoop data into the Elastic Stack and take full advantage of the fast Elasticsearch engine and Kibana's visualizations.

Through ES-Hadoop, you can build dynamic, embedded search applications over Hadoop data, or perform deep, low-latency analytics using full-text queries, geospatial queries, and aggregations.

Typical applications include product recommendation and genome sequencing.
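As a sketch of what such an indexing job can look like, the snippet below uses the elasticsearch-spark connector to index records from Spark into Elasticsearch. The index name `products`, the host address, and the sample data are illustrative assumptions, not details from the original article; recent versions of the connector accept a bare index name in `saveToEs`, while older ones expect an `index/type` pair.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._  // adds saveToEs to RDDs

object IndexToEs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("index-hadoop-data")
      .set("es.nodes", "localhost")  // Elasticsearch host (assumed)
      .set("es.port", "9200")
    val sc = new SparkContext(conf)

    // Hypothetical product records to index
    val products = sc.makeRDD(Seq(
      Map("id" -> 1, "name" -> "keyboard", "price" -> 29.9),
      Map("id" -> 2, "name" -> "mouse",    "price" -> 9.9)
    ))

    // Index each map as one document into the "products" index
    products.saveToEs("products")
    sc.stop()
  }
}
```

The job needs the elasticsearch-spark artifact on the classpath (for example via `--packages` when submitting with `spark-submit`).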

 

Move data seamlessly between Elasticsearch and Hadoop

Fast data movement is what makes real-time decision making possible. By extending the existing Hadoop APIs, ES-Hadoop lets you move data easily in both directions between Elasticsearch and Hadoop, while using HDFS as a repository for long-term archiving. Partition awareness, failure handling, type conversion, and data co-location are all handled transparently.
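The reverse direction, archiving Elasticsearch data to HDFS, can be sketched the same way. The index name and HDFS path below are placeholders for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._  // adds esRDD to SparkContext

object EsToHdfs {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf()
      .setAppName("archive-to-hdfs")
      .set("es.nodes", "localhost"))

    // Read documents out of Elasticsearch as (id, fields) pairs
    val docs = sc.esRDD("products")

    // Archive to HDFS for long-term storage (path is an assumption)
    docs.map { case (id, fields) => s"$id\t$fields" }
        .saveAsTextFile("hdfs:///archive/products")
    sc.stop()
  }
}
```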

 

Native integration with Spark and its derivatives

ES-Hadoop fully supports Spark, Spark Streaming, and Spark SQL.
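With the Spark SQL integration, an Elasticsearch index can be loaded directly as a DataFrame; supported filters are pushed down to Elasticsearch as queries. Again, the index name and host are assumed for illustration.

```scala
import org.apache.spark.sql.SparkSession

object EsSparkSql {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("es-sql")
      .config("es.nodes", "localhost")
      .getOrCreate()

    // Load an Elasticsearch index as a DataFrame; filter pushdown
    // translates predicates into Elasticsearch queries where possible
    val df = spark.read
      .format("org.elasticsearch.spark.sql")
      .load("products")

    df.filter(df("price") < 20).show()
    spark.stop()
  }
}
```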

Data security

ES-Hadoop supports HTTP authentication and SSL/TLS. It also works with Kerberos-enabled Hadoop and with Elasticsearch clusters secured by X-Pack.
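As an illustration, a secured job might set connector properties like the following. The user name, passwords, and truststore path are placeholders, not values from the article.

```properties
# HTTP basic authentication (values are placeholders)
es.net.http.auth.user = es_user
es.net.http.auth.pass = es_password

# Enable SSL/TLS for the REST transport
es.net.ssl = true
es.net.ssl.truststore.location = file:///path/to/truststore.jks
es.net.ssl.truststore.pass = truststore_password
```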

 

Works with any flavor of Hadoop, and is certified on CDH, MapR, and HDP.

ES-Hadoop download: https://www.elastic.co/cn/downloads/hadoop

Reposted from: https://www.cnblogs.com/dadouxiaodou/p/9109599.html
