Plume_WZ - CSDN Blog

Original: Notes on starting Hue with Docker

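1. Pull the image to the local machine: docker pull plumewhite/hue 2. Start the container. 2.1 Start it directly: docker run -it -p <host port>:<container service port> <image ID> bash, for example: docker run -it -p 1012:8888 7bea914601b4 bash 2.2 Start it with docker-compose: version: "2.1" services: hue: image: 622d6a0a98c5 # image ID hostname: hue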

2020-11-05 16:24:43 574

Original: Hue login, in brief

Login: clicking the sign in button calls http://192.168.1.102:8000/hue/accounts/login to log in. Django login: the file hue-release-4.7.1\desktop\core\ext-py\Django-1.11.29\build\lib\django\contrib\auth\views.py displays the login form and handles the login action. def dispatch(self, request, *args, **kwargs): if self.redir

2020-09-11 15:29:15 1302

Original: Building Hue from source and adding Phoenix

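Preparation. Environment required by Hue: hadoop, zookeeper, hbase, and Phoenix (this setup connects to Phoenix, so Phoenix is required). Download the Hue source package. Unpack the source package: unzip hue-release-4.7.1.zip. Create a hue user. Install the required dependencies: yum install apache-maven \ ant \ asciidoc \ cyrus-sasl-devel \ cyrus-sas.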

2020-09-11 15:27:28 580

Original: Comparing syntax and functions in Hue and Phoenix

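Contents: Grammar, Aggregate Functions.
Grammar
Command | Phoenix | hue | Comparison/differences | SQL
SELECT | Selects data from one or more tables | - | None | select * from test.us_population
UPSERT VALUES | Inserts the row if it does not exist, otherwise updates the values in the table. | - | None | UPSERT INTO test.us_population (state, city, population) values ('IA','Chicanm',28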

2020-09-11 14:01:58 360

Original: Spark operators: reduceByKey and groupByKey

reduceByKey: /** * Merge the values for each key using an associative and commutative reduce function. This will * also perform the merging locally on each mapper before sending results to a r...

2019-07-10 22:55:21 253
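
The reduceByKey excerpt above mentions merging locally on each mapper before results are sent on. The following is a minimal sketch of that idea in plain Python, not Spark code: `reduce_by_key`, `group_by_key`, and the nested-list `partitions` structure are hypothetical stand-ins for an RDD and its operators, assumed here only for illustration.

```python
from collections import defaultdict


def reduce_by_key(partitions, func):
    """Combine values per key inside each partition, then merge the partials."""
    partials = []
    for part in partitions:
        local = {}
        for key, value in part:
            # Local (map-side) combine: one partial result per key per partition.
            local[key] = func(local[key], value) if key in local else value
        partials.append(local)

    merged = {}
    for local in partials:
        for key, value in local.items():
            # Only the partial results cross the "shuffle" boundary.
            merged[key] = func(merged[key], value) if key in merged else value
    return merged


def group_by_key(partitions):
    """Collect every value per key without any local combining."""
    grouped = defaultdict(list)
    for part in partitions:
        for key, value in part:
            grouped[key].append(value)
    return dict(grouped)


partitions = [[("a", 1), ("b", 2), ("a", 3)], [("a", 4), ("b", 5)]]
sums = reduce_by_key(partitions, lambda x, y: x + y)
groups = group_by_key(partitions)
print(sums)    # {'a': 8, 'b': 7}
print(groups)  # {'a': [1, 3, 4], 'b': [2, 5]}
```

In this sketch the first partition forwards one partial value per key instead of three raw records, which is why reduceByKey usually moves less data than groupByKey followed by a reduction.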

Original: Spark operators map(), mapPartitions(), and mapPartitionsWithIndex()

map(): returns a new RDD by applying a function to all elements of this RDD. /** * Return a new RDD by applying a function to all elements of this RDD. */ def map[U: ClassTag](f: T => U): RDD[U] = withScope { val cleanF = sc.cl...

2019-07-08 11:34:32 260
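
To make the difference between the three operators in the title concrete, here is a rough plain-Python sketch, assuming an RDD can be modelled as a list of partitions. The helper names (`rdd_map`, `rdd_map_partitions`, `rdd_map_partitions_with_index`) are hypothetical and are not Spark APIs.

```python
def rdd_map(partitions, f):
    """f is called once per element."""
    return [[f(x) for x in part] for part in partitions]


def rdd_map_partitions(partitions, f):
    """f is called once per partition and receives an iterator over its elements."""
    return [list(f(iter(part))) for part in partitions]


def rdd_map_partitions_with_index(partitions, f):
    """Like rdd_map_partitions, but f also receives the partition index."""
    return [list(f(i, iter(part))) for i, part in enumerate(partitions)]


partitions = [[1, 2], [3, 4, 5]]
doubled = rdd_map(partitions, lambda x: x * 2)
part_sums = rdd_map_partitions(partitions, lambda it: [sum(it)])
tagged = rdd_map_partitions_with_index(
    partitions, lambda i, it: ((i, x) for x in it)
)
print(doubled)    # [[2, 4], [6, 8, 10]]
print(part_sums)  # [[3], [12]]
print(tagged)     # [[(0, 1), (0, 2)], [(1, 3), (1, 4), (1, 5)]]
```

Because the per-partition variants call the function once per partition, any expensive setup (for example opening a connection) can be done once per partition rather than once per element.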

Original: Kafka_LEO_HW

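HW (High Watermark) and LEO (Log End Offset). LEO: the end of each replica's log, that is, the offset the next message appended to that replica will receive. HW: the smallest LEO among the replicas in the ISR (the replicas whose LEO has lagged behind the leader's LEO for no longer than replica.lag.time.max.ms, default 10 s) is the partition's HW. It is used to judge how far replication has progressed, and messages beyond the HW are not visible to consumers. The HW held by the leader is the partition's HW, ...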

2019-07-07 09:09:59 622
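
The HW rule in the excerpt above (smallest LEO among the ISR replicas) can be sketched as a small calculation. This is an illustrative Python model, not Kafka code; the broker names, the offsets, and the `high_watermark` helper are made-up assumptions.

```python
def high_watermark(leo_by_replica, isr):
    """Return the smallest LEO among the in-sync replicas."""
    return min(leo_by_replica[replica] for replica in isr)


# Hypothetical LEOs for one partition; broker-3 lags too far and is not in the ISR.
leo = {"broker-1": 120, "broker-2": 115, "broker-3": 90}
isr = {"broker-1", "broker-2"}

hw = high_watermark(leo, isr)

# Offsets the leader holds in this range, filtered to those below the HW,
# which are the ones a consumer is allowed to read.
visible = [offset for offset in range(110, leo["broker-1"]) if offset < hw]
print(hw)       # 115
print(visible)  # [110, 111, 112, 113, 114]
```

Note that broker-3's lower LEO does not hold the HW back, because only ISR members take part in the minimum.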

Original: Kafka overview

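Kafka is a distributed message queue based on the publish/subscribe model, used mainly for real-time big data processing. There are two message queue models. Point-to-point: one-to-one; the consumer actively pulls data, and a message is removed once it has been received. The producer sends messages to a Queue, and the consumer then takes messages from the Queue and consumes them. Once a message has been consumed it is no longer stored in the queue, so a consumer cannot consume a message that has already been consumed. A Queue supports multiple consumers, but for any single message, only...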

2019-07-07 08:29:53 110

Original: Flume tuning

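Flume parameter tuning. Source: increasing the number of Sources (or, when using Taildir Source, the number of FileGroups) increases the Source's capacity to read data. For example, when a single directory produces too many files, split it into several directories and configure several Sources so that the Sources have enough capacity to pick up newly produced data. The batchSize parameter determines how many events the Source delivers to the Channel in one batch, ...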

2019-07-05 10:31:36 394

Original: Flume components

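Flume: a distributed system for collecting, aggregating, and transporting massive volumes of log data. It is based on a streaming architecture and is flexible and simple. Advantages: it can be integrated with any storage process. When the incoming data rate exceeds the rate at which data can be written to the destination store, Flume buffers the data. Transactions in Flume are channel-based and use two transaction models (sender + receiver) to ensure that messages are delivered reliably. The components are Source, Channel, and Sink. Source: the data input side. Common types include Spooling, di...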

2019-07-04 22:35:38 148

Original: Spark operators: map and flatMap

map(func): source code /** * Return a new RDD by applying a function to all elements of this RDD. */ def map[U: ClassTag](f: T => U): RDD[U] = withScope { val cleanF = sc.clean(f) new MapP...

2019-07-04 21:57:25 569
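
The excerpt above only shows map; as a hedged illustration of how flatMap differs, the following plain-Python sketch applies the same splitting function both ways. It uses ordinary lists rather than RDDs, and the sample `lines` are invented for the example.

```python
from itertools import chain

lines = ["hello spark", "hello kafka"]

# map: exactly one output element per input element (here, one list per line).
mapped = [line.split(" ") for line in lines]

# flatMap: each input may yield zero or more outputs, flattened into one sequence.
flat_mapped = list(chain.from_iterable(line.split(" ") for line in lines))

print(mapped)       # [['hello', 'spark'], ['hello', 'kafka']]
print(flat_mapped)  # ['hello', 'spark', 'hello', 'kafka']
```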

Swhite_WZ