super_ruichao - CSDN Blog

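Original: Changing an HBase table's compression and compressing existing data

Compression was not specified when the HBase tables were first created, and disk usage later grew too quickly, so we decided to switch to snappy compression. The steps are: disable 'tableName'; alter 'tableName', {NAME => 'cf', COMPRESSION => 'SNAPPY'}; enable 'tableName'; major_compact 'tableName'. This post mainly explains major_compact...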

2020-04-27 15:08:00 1070

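Original: Deploying Jupyter with pip and configuring Spark 2

Deploying Jupyter with pip and configuring Spark 2. The following steps are run on node HDP03. Create the jupyter user and the Jupyter data directory: // create the user [root@HDP03 ~]# useradd jupyter [root@HDP03 ~]# passwd jupyter Changing password for user jupyter. New password: // password: jup...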

2020-04-24 16:19:37 256

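Original: Maintaining Kafka offsets in Spark Streaming

Managing consumer offsets with checkpoints (works on both Spark 1.6 and Spark 2.3). If the business logic does not change, checkpoints can be used to manage consumer offsets, using streamingContext.getOrCreate(checkpoint directory, streamingContext). This first restores the streaming configuration, logic, and offsets from the checkpoint directory. If the business logic has changed, this approach will not...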

2020-04-21 11:40:40 307

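Original: Pitfalls of using the IK analyzer with Elasticsearch

Using the IK analyzer with Elasticsearch: define the entity and specify the index, type, and number of shards. IK analyzer modes: ik_smart (coarse-grained) and ik_max_word (fine-grained). Note: ES versions below 7.0.0 do not support ik_max_word and support only the ik_smart analyzer. Usage is as follows: @Data @Document(indexName = "entertainment_idx_sole...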

2020-04-20 15:16:17 1909 1

Original: Syncing Hive data to ES with Spark (upsert)

Syncing Hive data to ES with Spark (upsert). ES connection configuration: private static Map<String, String> getEsOption() { Map<String, String> map = new HashMap<>(6); map.put("es.index.auto.create", "tr...

2020-04-20 15:07:02 2700 1

Original: Spark Streaming API examples in Java

Spark Streaming API examples in Java: public static void main(String[] args) throws InterruptedException { SparkSession spark = SparkSession.builder().appName("test streaming").master("local[2]").getOr...

2020-04-20 14:56:09 566

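Original: Loading data into HBase: generating HFiles and bulk loading them into HBase, with multiple columns

Loading data into HBase: generating HFiles and bulk loading them into HBase. The post walks through the implementation steps and the points to watch out for: development used Spark 2.1.0 and Scala 2.11.12 (note: the Spark and Scala versions must be compatible); if the HBase column family has a single column, only the rowkey needs to be sorted; if the column family has multiple columns, both the rowkey and the column must be sorted, in lexicographic order. The full implementation code follows, including a custom pre-partitioning...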

2019-05-24 16:20:24 1251 4
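
The ordering rule mentioned in the HFile bulk-load entry above (sort by rowkey, then by column, lexicographically) can be illustrated with a small JDK-only sketch. It is not the post's code, which is truncated here and uses Spark with Scala. The `Cell` class, the `HFileCellOrder` name, and the sample data are hypothetical, and the sketch assumes a single column family with UTF-8 string keys compared as unsigned bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class HFileCellOrder {

    /** Hypothetical stand-in for one HBase cell within a single column family. */
    public static final class Cell {
        public final String rowKey;
        public final String qualifier;
        public final String value;

        public Cell(String rowKey, String qualifier, String value) {
            this.rowKey = rowKey;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    /** Compares two byte arrays lexicographically, treating each byte as unsigned. */
    public static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        // When one array is a prefix of the other, the shorter one sorts first.
        return a.length - b.length;
    }

    private static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Orders cells by rowkey first, then by column qualifier. */
    public static final Comparator<Cell> ROW_THEN_QUALIFIER = (x, y) -> {
        int byRow = compareBytes(utf8(x.rowKey), utf8(y.rowKey));
        return byRow != 0 ? byRow : compareBytes(utf8(x.qualifier), utf8(y.qualifier));
    };

    /** Returns a sorted copy, leaving the input list untouched. */
    public static List<Cell> sorted(List<Cell> cells) {
        List<Cell> copy = new ArrayList<>(cells);
        copy.sort(ROW_THEN_QUALIFIER);
        return copy;
    }

    public static void main(String[] args) {
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell("row2", "name", "b"));
        cells.add(new Cell("row10", "name", "c"));
        cells.add(new Cell("row1", "name", "a"));
        cells.add(new Cell("row1", "age", "30"));
        for (Cell c : sorted(cells)) {
            System.out.println(c.rowKey + " " + c.qualifier + " " + c.value);
        }
    }
}
```

Because the comparison is byte-wise rather than numeric, "row10" sorts before "row2" in this sketch, and within "row1" the "age" column comes before "name".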
