风语飘摇 - CSDN Blog

Original: Flink CEP Example (Runnable)

package com.cep
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.cep.PatternSelectFunction
import org.apache.flink.cep.pattern.conditions.SimpleCondition
import org.apache.flink.cep.scala.pattern.Pattern
import .

2022-02-25 13:30:27 1260
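The Scala code in this post is cut off after its imports. To show roughly what a CEP pattern such as `Pattern.begin(...).next(...).within(...)` detects, here is a toy sketch in pure Python. It is not Flink's API. The `Event` type, the strict-contiguity reading of "next", and the per-key grouping (standing in for `keyBy`) are all assumptions made for illustration.

```python
from typing import Callable, List, NamedTuple, Tuple


class Event(NamedTuple):
    key: str   # e.g. a user id; plays the role of keyBy
    kind: str  # event type
    ts: int    # event time in seconds


def match_next_within(
    events: List[Event],
    first: Callable[[Event], bool],
    second: Callable[[Event], bool],
    window: int,
) -> List[Tuple[Event, Event]]:
    """Toy analogue of begin(first).next(second).within(window) on a keyed stream.

    'next' means strict contiguity: `second` must hold for the very next event
    of the same key, and the two events must be at most `window` seconds apart.
    """
    last_seen = {}  # key -> previous event for that key
    matches = []
    for event in sorted(events, key=lambda e: e.ts):
        prev = last_seen.get(event.key)
        if (
            prev is not None
            and first(prev)
            and second(event)
            and event.ts - prev.ts <= window
        ):
            matches.append((prev, event))
        last_seen[event.key] = event
    return matches


if __name__ == "__main__":
    stream = [
        Event("u1", "fail", 0),
        Event("u2", "fail", 1),
        Event("u1", "fail", 5),
        Event("u1", "ok", 7),
    ]
    is_fail = lambda e: e.kind == "fail"
    print(match_next_within(stream, is_fail, is_fail, window=10))
```

In this sample, two consecutive failures for `u1` within 10 seconds form the only match. In a real Flink job the two predicates would be `SimpleCondition`s, and each match would go to a `PatternSelectFunction`.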

Original: Scala SDK plugin download location (scala-intellij-bin**.zip)

Scala SDK plugin download locations for every version. Adds support for the Scala language. The following features are available for free with IntelliJ IDEA Community Edition: coding assistance (highlighting, completion, formatting, refactorings, etc.), navigation, search, informat...

2021-07-09 08:12:51 253

Original: Hive with the MapReduce execution engine shows garbled Chinese characters in query output

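Hive with the MapReduce execution engine shows garbled Chinese characters in query output. After configuring MultiDelimitSerDe and creating a Hive multi-delimiter table, select * from tab1 displays Chinese characters correctly. However, the output of select s2,substr(s2,3) from db_mul.multi_delimiter_test is garbled once the query runs on the MR engine. The table DDL is: create table db_mul.multi_delimiter_test( s1 string, s2 string, s3 string)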

2020-11-24 14:01:29 940

Original: Class org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe not found

Class org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe not found. To resolve this issue, do the following: Option 1: For clusters without Sentry: manually add the jar from Hive/Beeline before running the query: ADD JAR /opt/cloudera/parcels/CDH

2020-11-24 13:35:38 1635

Original: Changing Kudu's file descriptor limit

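Kudu exceeds its file descriptor threshold. By default, Kudu may open at most 32768 files. Under /etc/security/limits.d/ there is a Cloudera limits file, /etc/security/limits.d/cloudera-scm.conf, that caps the value at 32768. This setting overrides the system configuration, so every process started by CM gets a maximum of 32768 open files. To change it, you need to change cm...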

2019-11-26 15:51:37 1237
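Once the limit has been raised, it helps to confirm which limit the running Kudu process actually got. On Linux, `/proc/<pid>/limits` reports this. Below is a minimal sketch that parses the `Max open files` row of that file. It assumes a Linux host with procfs, and finding the PID (for example of `kudu-tserver`) is left to the reader.

```python
from pathlib import Path
from typing import Tuple

LABEL = "Max open files"


def parse_max_open_files(limits_text: str) -> Tuple[str, str]:
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.

    Values are returned as strings because either column may read 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith(LABEL):
            soft, hard = line[len(LABEL):].split()[:2]
            return soft, hard
    raise ValueError("no 'Max open files' row found")


def max_open_files_for_pid(pid: int) -> Tuple[str, str]:
    """Linux only: read the limits the running process actually received."""
    return parse_max_open_files(Path(f"/proc/{pid}/limits").read_text())
```

If both values still read 32768 after the change, the process was not restarted, or the cloudera-scm.conf override still applies.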

Original: IDEA Error: java: Compilation failed: internal java compiler error

IDEA Error: java: Compilation failed: internal java compiler error. The fix is simple: go to File --> Settings... --> Build, Execution, Deployment --> Compiler --> Java Compiler and set the affected module's target bytecode version to a suitable version...

2019-11-26 13:51:46 210

Original: HUE middleware INFO Processing exception: StandbyException: Operation category READ is not supported

HUE middleware INFO Processing exception: StandbyException: Operation category READ is not supported. Cause: the active node of the HDFS high-availability (HA) pair changed, but the HUE HDFS Web URL did not. HUE's HDFS Web URL therefore pointed at the standby NameNode, which caused the problem...

2019-11-21 14:23:27 246
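To avoid pointing HUE at a standby NameNode, you can first check which NameNode is currently active. Below is a minimal sketch that queries each NameNode's JMX endpoint and reads the `State` attribute of the `NameNodeStatus` bean. It assumes the NameNode web UI exposes `/jmx` (on Hadoop 2.x the default HTTP port is 50070). The host names in the usage note are hypothetical.

```python
import json
from typing import Iterable, Optional
from urllib.request import urlopen

# JMX bean exposed by the NameNode web UI; its "State" is "active" or "standby".
JMX_QUERY = "/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus"


def namenode_state(jmx_json: str) -> str:
    """Extract the HA state from a NameNodeStatus JMX response body."""
    beans = json.loads(jmx_json)["beans"]
    return beans[0]["State"]


def find_active_namenode(base_urls: Iterable[str], timeout: float = 5.0) -> Optional[str]:
    """Return the first NameNode web URL that reports State == 'active'."""
    for base in base_urls:
        try:
            with urlopen(base.rstrip("/") + JMX_QUERY, timeout=timeout) as resp:
                if namenode_state(resp.read().decode("utf-8")) == "active":
                    return base
        except OSError:
            continue  # unreachable NameNode: try the next candidate
    return None
```

Example: `find_active_namenode(["http://nn1.example.com:50070", "http://nn2.example.com:50070"])` returns the URL that HUE's HDFS Web URL should currently use, or `None` if neither NameNode answers as active.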

Original: Keras upgrade command

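Keras upgrade command: pip install git+git://github.com/fchollet/keras.git --upgrade --no-deps (GitHub has since disabled the unauthenticated git:// protocol, so use git+https://github.com/fchollet/keras.git instead)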

2019-01-19 11:59:05 5709

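Original: CDH5: every role on one node reports "The health of this role's host is concerning. The following health tests are concerning: network interface speed." Check whether it is a network problem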

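CDH5: every role on one node reports "The health of this role's host is concerning. The following health tests are concerning: network interface speed." Check whether it is a network problem. Resolution: 1. Checked and ruled out the network and the NIC; 2. Checked the firewall status (OS: RHEL 7.3) and found that the firewall was running: #systemctl status firewalld ● firewalld.service - firewalld - dynamic...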

2018-09-21 10:09:42 9545

Repost: PR curves, ROC curves, the AUC metric, etc.; Accuracy vs Precision

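Confusion matrix. PR (Precision-Recall) curve: this probably comes from relevance evaluation in information retrieval. Precision is the fraction of the retrieved results that are relevant. Recall is the fraction of all relevant results in the database that were retrieved. To plot a PR curve, first sort by the decision value and use it as a rank, then turn the classification problem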

2018-02-02 12:24:19 1331
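The definitions above can be sketched in plain Python, together with the "sort by decision value and treat each top-k prefix as the retrieved set" procedure. The AUC function uses the rank (Mann-Whitney) interpretation: the probability that a randomly chosen positive scores above a randomly chosen negative. This is a standard equivalent of the area under the ROC curve. The code is quadratic, so it is meant only for small examples; labels are assumed to be 0/1.

```python
from typing import List, Sequence, Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP): share of retrieved items that are relevant.
    Recall    = TP / (TP + FN): share of all relevant items that were retrieved."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pr_curve(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Rank by decision score (highest first); each top-k prefix is the 'retrieved' set."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(labels)
    points, tp = [], 0
    for k, (_, label) in enumerate(ranked, start=1):
        tp += label
        points.append(precision_recall(tp, k - tp, total_pos - tp))
    return points


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For scores `[0.9, 0.8, 0.7, 0.6]` with labels `[1, 0, 1, 0]`, three of the four positive/negative pairs are ordered correctly, so the AUC is 0.75.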

Original: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.zookeeper.KeeperException$NoNodeE

I wrote a program that integrates Storm with Kafka, using data consumed by a KafkaSpout as Storm's data source. Running it produced this error: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brok

2018-01-10 20:29:29 4504

Repost: Upgrading GCC/G++ on CentOS 6.6 (latest version currently v6.1.0) (complete)

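Upgrading GCC/G++ on CentOS 6.6 (the latest GCC/G++ version is currently v6.1.0). There is no shortcut. yum update, yum install, and adding a yum repo file all fail: they only get you to 4.4.7! The only option left is to compile and install it manually, so the first step is to download the source code. GO! 1. Get the package and extract it: wget http://ft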

2017-03-18 09:04:54 1359

Repost: Why some Spark cluster workers cannot be stopped, and how to fix it

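Today I wanted to stop the Spark cluster, but running stop-all.sh did not stop any of the Spark processes. It reported: no org.apache.spark.deploy.master.Master to stop, no org.apache.spark.deploy.worker.Worker to stop. I looked up some material online and read through stop-all.sh, stop-master.sh,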

2017-03-13 10:06:27 3356

Repost: Problem installing scipy on CentOS: File "scipy/linalg/setup.py", line 20, in configuration raise NotFoundE

Dependencies: pyparsing, dateutil, scipy, numpy, libpng 1.2 (or later), `freetype` 1.4 (or later). Install pyparsing: # pip install pyparsing. Install numpy: # pip install numpy. Install dateutil: # pip install

2017-03-13 08:54:03 1075
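Before building scipy, you can quickly check which of the listed Python-level dependencies are already importable. This is a minimal sketch that assumes the usual import names (python-dateutil is imported as `dateutil`). libpng and freetype are C libraries, so this check does not cover them.

```python
import importlib.util
from typing import List, Sequence

# Import names of the Python-level dependencies listed in the post.
REQUIRED = ("pyparsing", "dateutil", "numpy", "scipy")


def missing_modules(names: Sequence[str] = REQUIRED) -> List[str]:
    """Return the modules from `names` that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```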

Repost: Summary of HBase performance tuning for trillion-row storage

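Background: our main HBase cluster has run stably in production for a year and a half. The largest single table has more than 7,200 regions, and about ten billion new rows are ingested every day. Along the way, our understanding of HBase went from novice to familiar. To cope with the pressure of business data, ingestion was upgraded from the original single-machine multithreaded loader to distributed ingestion with disaster recovery. To catch cluster problems early, we also built an alerting system that monitors the HBase cluster's services and applications end to end. Here is a summary of HBase tuning (for version 0.94)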

2017-03-08 12:07:55 745

Repost: Spark (2): Memory management

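Spark's strength is in-memory computation, so memory management is one of its most important modules. Spark memory falls broadly into two categories, execution and storage. Execution covers the memory needed for shuffles, joins, sorts, and aggregations; storage covers caching and data transferred between nodes. In Spark 1.5 and earlier, the two were configured statically and could not borrow from each other. Spark 1.6 optimized the memory management module, through memory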

2017-03-08 11:29:17 1560
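The snippet stops just as it reaches Spark 1.6's change, the unified memory manager, in which execution and storage share one region and can borrow from each other. Below is a back-of-the-envelope sketch of how one executor heap is divided. It uses Spark's documented defaults: 300 MB reserved, `spark.memory.fraction` of 0.75 in 1.6 and 0.6 from 2.0, and `spark.memory.storageFraction` of 0.5. It ignores off-heap memory and is an approximation, not Spark's own code.

```python
RESERVED_MB = 300  # fixed "reserved system memory" in the unified memory manager


def unified_memory_mb(heap_mb: float, memory_fraction: float = 0.6,
                      storage_fraction: float = 0.5) -> dict:
    """Approximate region sizes (MB) for one executor heap.

    memory_fraction  ~ spark.memory.fraction (0.75 in Spark 1.6, 0.6 from 2.0)
    storage_fraction ~ spark.memory.storageFraction (0.5 by default)
    """
    usable = heap_mb - RESERVED_MB
    unified = usable * memory_fraction    # shared by execution and storage
    storage = unified * storage_fraction  # cached blocks here are immune to eviction by execution
    return {
        "unified": unified,
        "storage": storage,
        "execution": unified - storage,   # initial share; the boundary moves at runtime
        "user": usable - unified,         # user data structures and internal metadata
    }
```

For example, with a 4096 MB heap and the Spark 1.6 default fraction of 0.75, the unified region is (4096 - 300) × 0.75 = 2847 MB. Storage and execution each start with half of it.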

Repost: Spark (1): Basic architecture and principles

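Apache Spark is a big data processing framework built around speed, ease of use, and sophisticated analytics. It was originally developed in 2009 at UC Berkeley's AMPLab and open-sourced in 2010; it later entered the Apache Incubator in 2013. Compared with other big data and MapReduce technologies such as Hadoop and Storm, Spark has the following advantages: it provides a comprehensive, unified framework for managing datasets of different natures (text data, graph data, etc.) and data sources (batch data or real-time stream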

2017-03-08 11:26:45 116263 3

Original: CentOS 7 can reach the internal network but cannot open external web pages

In the connection's configuration file (/etc/sysconfig/network-scripts/ifcfg-Shared_Wired_Connection), change BOOTPROTO=none to BOOTPROTO=static or BOOTPROTO=dhcp. Note on this network configuration parameter: BOOTPROTO=static means a static IP; BOOTPROTO=dhcp means dynamic

2017-03-03 15:19:54 2233
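The edit described above can be scripted. Below is a minimal sketch that rewrites the `BOOTPROTO` line in the text of an ifcfg file, or appends it if it is missing. The function names are hypothetical. After changing the file, networking still has to be restarted (for example with `systemctl restart network` on CentOS 7) before the change takes effect.

```python
import re
from pathlib import Path

# Values discussed in the post; "none" is the one that caused the problem.
VALID_BOOTPROTO = {"none", "static", "dhcp"}
BOOTPROTO_LINE = re.compile(r"^BOOTPROTO=.*$", re.MULTILINE)


def set_bootproto(ifcfg_text: str, value: str) -> str:
    """Return ifcfg file text with BOOTPROTO set to `value` (replaced or appended)."""
    if value not in VALID_BOOTPROTO:
        raise ValueError(f"unexpected BOOTPROTO value: {value}")
    if BOOTPROTO_LINE.search(ifcfg_text):
        return BOOTPROTO_LINE.sub(f"BOOTPROTO={value}", ifcfg_text)
    return ifcfg_text.rstrip("\n") + f"\nBOOTPROTO={value}\n"


def update_ifcfg(path: str, value: str) -> None:
    """Rewrite an ifcfg file in place, e.g. under /etc/sysconfig/network-scripts/."""
    p = Path(path)
    p.write_text(set_bootproto(p.read_text(), value))
```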

Original: Hue installation problem: django.core.exceptions.ImproperlyConfigured: Error loading MySQLdb module: libmysqlclient.so

Problem: [root@master hue-3.11.0]# build/env/bin/hue syncdb Traceback (most recent call last): File "build/env/bin/hue", line 9, in load_entry_point('desktop==3.11.0', 'console_scripts', 'hue'

2017-03-01 14:56:05 5037

Repost: Writing data from Spark to HBase and reading data from HBase

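This article covers: 1. how Spark writes an RDD to HBase with saveAsHadoopDataset and saveAsNewAPIHadoopDataset; 2. how Spark reads data from HBase and converts it into an RDD. Spark runs locally in Eclipse and connects to a remote HBase. Java version: 1.7.0; Scala version: 2.10.4; ZooKeeper version: 3.4.5 (HBase's bundled zoo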

2017-02-06 18:58:14 1380

Original: HIVE2: ERROR [main]: ql.Driver (:()) - FAILED: Execution Error, return code 1 from org.apache.hadoop.

Running select count(*) from students; in a Hive 2.1 on Tez environment produced ERROR [main]: ql.Driver (:()) - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask. According to the Hive log, the underlying problem was: 2016-12

2016-12-21 11:31:53 8636

Original: HIVE2: things to watch when setting the username and password for a beeline connection

beeline can connect in several authentication modes, configured in hive-site.xml; the default is NONE. Property hive.server2.authentication, value NONE: "Expects one of [nosasl, none, ldap, kerberos, pam, custom]. Client authentication types

2016-12-19 17:26:17 35614
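The credentials beeline needs depend on this property, so a small check can help. Below is a sketch that reads `hive.server2.authentication` from the text of a hive-site.xml and validates it against the values listed above. It assumes the usual `<configuration><property><name>/<value>` layout and falls back to `none` when the property is absent.

```python
import xml.etree.ElementTree as ET

# Values accepted by hive.server2.authentication, as listed in hive-site.xml
ALLOWED_AUTH = {"nosasl", "none", "ldap", "kerberos", "pam", "custom"}


def hiveserver2_auth(hive_site_xml: str) -> str:
    """Return the lower-cased hive.server2.authentication mode; 'none' if unset."""
    root = ET.fromstring(hive_site_xml)
    value = "none"  # default when the property is absent
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "hive.server2.authentication":
            value = prop.findtext("value", "").strip().lower()
            break
    if value not in ALLOWED_AUTH:
        raise ValueError(f"unexpected hive.server2.authentication: {value!r}")
    return value
```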

Original: HIVE2 Error: Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteExc

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] Connecting to jdbc:hive2://localhost:10

2016-12-19 17:00:44 7557 4

Original: https://packages.elastic.co/elasticsearch/2.3/centos/repodata/repomd.xml: [Errno 14] PYCURL ERROR 22

OS: CentOS. # yum install xinetd Loaded plugins: fastestmirror, refresh-packagekit, security Setting up Install Process Loading mirror speeds from cached hostfile * base: mirrors.btte.net *

2016-11-29 16:17:53 6592

Original: Hive 2.1: testing insert, update, and delete

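Testing insert, update, and delete in Hive 2.1. With Hive's default configuration, the transaction manager does not support update or delete. To enable update and delete, Hive needs additional configuration; for details see https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions Configuratio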

2016-11-27 18:06:18 4536
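The post's configuration list is cut off. Based on the Hive Transactions wiki page it links to, here is a sketch that emits the usual session-level settings plus DDL for a transactional table. It assumes Hive 2.x, where ACID tables must be bucketed ORC tables with `'transactional'='true'`. The compactor settings belong in the metastore's hive-site.xml rather than in a client session. The table and column names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Client/session-side settings from the Hive Transactions wiki (Hive 2.x).
SESSION_SETTINGS = {
    "hive.support.concurrency": "true",
    "hive.txn.manager": "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager",
    "hive.exec.dynamic.partition.mode": "nonstrict",
}
# Metastore-side settings: put these in the metastore's hive-site.xml.
METASTORE_SETTINGS = {
    "hive.compactor.initiator.on": "true",
    "hive.compactor.worker.threads": "1",
}


def acid_table_sql(table: str, columns: Sequence[Tuple[str, str]],
                   bucket_col: str, buckets: int = 4) -> List[str]:
    """SET statements plus DDL for a table that supports UPDATE/DELETE."""
    stmts = [f"SET {k}={v};" for k, v in SESSION_SETTINGS.items()]
    cols = ", ".join(f"{name} {col_type}" for name, col_type in columns)
    stmts.append(
        f"CREATE TABLE {table} ({cols}) "
        f"CLUSTERED BY ({bucket_col}) INTO {buckets} BUCKETS "
        "STORED AS ORC TBLPROPERTIES ('transactional'='true');"
    )
    return stmts
```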

Repost: Hadoop columnar storage engines Parquet/ORC and snappy compression

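Hadoop columnar storage engines Parquet/ORC and snappy compression. Original article: http://www.itweet.cn/2016/03/15/columnar-storage-parquet-and-orc/ Topics: Parquet, Hadoop. Compared with traditional row-oriented storage formats, columnar storage engines are popular because they achieve higher compression ratios with fewer IO operations. Drawback of columnar storage: when there are many columns and, on each operation, most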

2016-11-26 22:01:39 6259
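To make the row-versus-column trade-off concrete, here is a toy sketch; it is not Parquet's or ORC's actual file layout. It pivots rows into per-column lists, then run-length encodes one column. Parquet and ORC do apply run-length and dictionary encodings to column data, which is one reason they compress well. The sample data is made up.

```python
from itertools import groupby
from typing import Any, Dict, List, Sequence, Tuple


def to_columns(rows: Sequence[Sequence[Any]], names: Sequence[str]) -> Dict[str, List[Any]]:
    """Pivot row-oriented records into one list per column (columnar layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def run_length_encode(values: Sequence[Any]) -> List[Tuple[Any, int]]:
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


if __name__ == "__main__":
    rows = [(1, "CN", 10.0), (2, "CN", 12.5), (3, "CN", 7.0), (4, "US", 3.0)]
    cols = to_columns(rows, ["id", "country", "amount"])
    # An aggregate over one column only touches that column's values...
    print(sum(cols["amount"]))
    # ...and a low-cardinality column stored contiguously encodes compactly.
    print(run_length_encode(cols["country"]))
```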

Original: Caused by: org.apache.hive.service.cli.HiveSQLException: Failed to open new session: java.lang.Runti

Caused by: org.apache.hive.service.cli.HiveSQLException: Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: Permission denied: user=hive, access

2016-11-26 14:18:47 16884

Repost: Hadoop configuration reference (hdfs-site.xml)

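HADOOP: hdfs-site.xml settings (name | value | description):
dfs.default.chunk.view.size | 32768 | how much of each file's content the NameNode HTTP page displays; usually does not need to be set.
dfs.datanode.du.reserved | 1073741824 | space reserved on each disk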

2016-11-26 14:11:22 1023
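The two settings above can be emitted in the `<configuration><property>` layout that Hadoop's XML config files use. Below is a minimal sketch built on the standard library's ElementTree. Note that 1073741824 bytes is 1 GiB reserved per disk (volume) for non-HDFS use.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Union

# The two properties from the post.
PROPS = {
    "dfs.default.chunk.view.size": 32768,
    "dfs.datanode.du.reserved": 1024 ** 3,  # 1073741824 bytes = 1 GiB per disk
}


def hdfs_site_xml(props: Dict[str, Union[int, str]]) -> str:
    """Render properties in the <configuration><property> layout Hadoop uses."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(hdfs_site_xml(PROPS))
```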

Original: YARN installation and configuration

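(1) A first look at YARN. [Figure: YARN architecture diagram] 1. YARN: the next-generation MapReduce framework, also called MRv2 (MapReduce version 2), is a general-purpose resource management system that provides unified resource management and scheduling to the applications above it. The basic idea of YARN is to split the JobTracker's two main functions (resource management and job scheduling/monitoring), mainly by creating a global ResourceManager (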

2016-11-26 10:05:36 3689

Repost: Hive configuration parameters explained

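Hive configuration parameters explained: hive.ddl.output.format: the output format of Hive DDL statements. The default is text (plain text); json is also available. This setting was introduced after 0.90. hive.exec.script.wrapper: the wrapper Hive uses when calling scripts; the default is null. If set to python, the script invocation is prefixed with python; if null, the script is executed directly; h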

2016-11-26 09:03:42 485

Original: Hive 2.1 installation: a note on the configuration file name

Note: renaming hive-default.xml.template to hive-default.xml does not make the configuration take effect. The file must ultimately be named hive-site.xml for its settings to be applied.

2016-11-25 16:38:00 455

Original: Hive 2.1: Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException:

Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D at org.apache.hadoop.

2016-11-25 16:34:12 3357
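The error message shows that the literal placeholders `${system:java.io.tmpdir}` and `${system:user.name}` were never expanded. The post's fix is cut off. A commonly used workaround, assumed here rather than taken from the post, is to replace those placeholders in hive-site.xml with absolute values. Below is a minimal sketch of that substitution; `/tmp/hive` and `hive` in the usage below are hypothetical values.

```python
from typing import List

# Placeholders that Hive failed to expand, as seen in the error message
PLACEHOLDERS = ("${system:java.io.tmpdir}", "${system:user.name}")


def expand_hive_placeholders(config_text: str, tmpdir: str, user: str) -> str:
    """Replace the unexpanded system-property placeholders with literal values."""
    return (config_text
            .replace("${system:java.io.tmpdir}", tmpdir)
            .replace("${system:user.name}", user))


def unresolved(config_text: str) -> List[str]:
    """List the placeholders that still appear in the config text."""
    return [p for p in PLACEHOLDERS if p in config_text]
```

For example, `<value>${system:java.io.tmpdir}/${system:user.name}</value>` becomes `<value>/tmp/hive/hive</value>`. Running `unresolved()` on the rewritten file should then return an empty list.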

Original: Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveExcept

[root@master apache-hive-2.1.0-bin]# hiveSLF4J: Class path contains multiple SLF4J bindings.SLF4J: Found binding in [jar:file:/home/hive/apache-hive-2.1.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/s

2016-11-25 16:30:00 15156

Repost: Search commands on Ubuntu

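Search commands on Ubuntu. I. Searching by file name: 1. The find command: find / -name "filename". Purpose: search from the root directory "/" for files named filename. The name may contain wildcards (*, ?). Note: filename is the file name string and may be written with or without double quotes, but when it contains wildcards, quote it so the shell does not expand them first. find is powerful and has many options for searching files in different ways, for example by date, file size, permissions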

2016-11-19 14:45:50 378
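In scripts, the same name-with-wildcards search can be done in Python. Below is a minimal sketch using `os.walk` and `fnmatch.fnmatchcase`. Like `find -name`, it matches the pattern case-sensitively against each entry's base name, covering both files and directories. It does not replicate find's other options (date, size, permissions).

```python
import fnmatch
import os
from typing import List


def find_by_name(root: str, pattern: str) -> List[str]:
    """Roughly `find <root> -name '<pattern>'`: match base names against a glob."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if fnmatch.fnmatchcase(name, pattern):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


if __name__ == "__main__":
    for path in find_by_name(".", "*.xml"):
        print(path)
```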

Repost: Installing PostgreSQL 9.4 on CentOS

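I. Preface. PostgreSQL, often shortened to Postgres, is a relational database management system that runs on various Linux distributions, Windows, Solaris, BSD, and Mac OS X. It is open-source software released under the PostgreSQL License. PostgreSQL is developed by the PostgreSQL Global Development Group, a volunteer group overseen by a small number of companies, including Red Hat and EnterpriseDB.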

2016-11-19 14:45:18 426

Repost: Solr tutorial, worth reading for developers new to search

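Contents: Solr research summary. Development type: full-text search. Solr version: 4.2. This article describes how to use Solr's features and what to watch out for. It covers: environment setup and debugging; the two core configuration files; maintaining the index; querying the index; and query-time features such as highlighting, spell checking, search suggestions, grouped statistics, and pinyin search.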

2016-11-19 14:44:58 3693

Repost: ZooKeeper in practice: a cluster on a single machine

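That covers installing and using ZooKeeper in standalone mode. However, ZooKeeper is designed for distributed applications, so it usually runs in cluster mode. I don't have enough machines, so I plan to run three ZooKeeper servers on one machine to form a ZooKeeper cluster. Extract the ZooKeeper package into /opt and use three directories for the three ZooKeeper instances: /opt/zookeeper1, /opt/zookeeper2, and /op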

2016-11-19 14:44:04 390
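Here is one way to generate the three instances' configs, sketched in Python. Because all three servers share one address, each needs its own client port and its own quorum/election port pair, and each data directory needs a `myid` file containing the server id. The following are assumptions, since the post's own settings are cut off: the `data` subdirectory, the port scheme (client ports 2181 to 2183, quorum ports 2888 to 2890, election ports 3888 to 3890), and the `tickTime`/`initLimit`/`syncLimit` values, which match ZooKeeper's sample config.

```python
from pathlib import Path

BASE_DIR = "/opt"  # the post uses /opt/zookeeper1 .. /opt/zookeeper3


def zoo_cfg(instance: int, count: int = 3, host: str = "127.0.0.1") -> str:
    """zoo.cfg for one of `count` ZooKeeper instances sharing a single host."""
    lines = [
        "tickTime=2000",
        "initLimit=10",
        "syncLimit=5",
        f"dataDir={BASE_DIR}/zookeeper{instance}/data",
        f"clientPort={2180 + instance}",  # 2181, 2182, 2183
    ]
    for sid in range(1, count + 1):
        # server.N=host:quorumPort:electionPort, unique per server on one host
        lines.append(f"server.{sid}={host}:{2887 + sid}:{3887 + sid}")
    return "\n".join(lines) + "\n"


def write_instance(instance: int, count: int = 3) -> None:
    """Write conf/zoo.cfg and data/myid (containing the server id) for one instance."""
    root = Path(BASE_DIR) / f"zookeeper{instance}"
    (root / "conf").mkdir(parents=True, exist_ok=True)
    (root / "data").mkdir(parents=True, exist_ok=True)
    (root / "conf" / "zoo.cfg").write_text(zoo_cfg(instance, count))
    (root / "data" / "myid").write_text(f"{instance}\n")
```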

Original: Compiling and installing MySQL from source on Linux CentOS 6.5

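Contents: I. Preparation before compiling and installing MySQL. Install the tools and libraries needed to compile the source: yum install gcc gcc-c++ ncurses-devel perl. Install cmake by downloading its source from http://www.cmake.org and compiling it: wget http://www.cmake.org/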

2016-11-19 14:42:36 597

Repost: Exception in thread "main" java.lang.RuntimeException: java.lang.IllegalArgumentException: java.net.

Exception details: Exception in thread "main" java.lang.RuntimeException: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:us

2016-11-19 14:41:19 11820

Original: Problem running hplsql in Hive 2.0 (unresolved)

[root@master Desktop]# hplsql -f /home/hive/1.sqlSLF4J: Class path contains multiple SLF4J bindings.SLF4J: Found binding in [jar:file:/home/hive/apache-hive-2.0.0-bin/lib/hive-jdbc-2.0.0-standalon

2016-11-19 14:40:29 2747

Oracle Database 10g: OLAP

2008-09-16

Beginning Spring Boot 2 Applications and Microservices with the Spring Framework

Copyright © 2017 by K. Siva Prasad Reddy Spring is the most popular Java-based framework for building enterprise applications. The Spring framework provides a rich ecosystem of projects to address modern application needs, like security, simplified access to relational and NoSQL datastores, batch processing, integration with social networking sites, large volume of data streams processing, etc. As Spring is a very flexible and customizable framework, there are usually multiple ways to configure the application. Although it is a good thing to have multiple options, it can be overwhelming to the beginners. Spring Boot addresses this “Spring applications need complex configuration” problem by using its powerful autoconfiguration mechanism. Spring Boot is an opinionated framework following the “Convention Over Configuration” approach, which helps build Spring-based applications quickly and easily. The main goal of Spring Boot is to quickly create Spring-based applications without requiring the developers to write the same boilerplate configuration again and again. In recent years, the microservices architecture has become the preferred architecture style for building complex enterprise applications. Spring Boot is a great choice for building microservices-based applications using various Spring Cloud modules. This book will help you understand what Spring Boot is, how Spring Boot helps you build Spring-based applications quickly and easily, and the inner workings of Spring Boot using easy-to-follow examples.

2017-11-10

Learning PySpark.pdf

In this book, we will guide you through the latest incarnation of Apache Spark using Python. We will show you how to read structured and unstructured data, how to use some fundamental data types available in PySpark, build machine learning models, operate on graphs, read streaming data, and deploy your models in the cloud. Each chapter will tackle a different problem, and by the end of the book we hope you will be knowledgeable enough to solve other problems we did not have space to cover here.

2017-10-28

User Network Behavior Profiling: User Network Behavior Profiling in Big Data

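How to keep existing users engaged, attract new ones, and understand users' preferences, interests, and moods is crucial to a company's growth, even a matter of survival. The way to solve this problem is a recommender system. The book has three parts and 13 chapters. Part 1 covers the knowledge-engineering foundations of user profiling: representation modeling, profile computation, storage, and management operations such as updates and maintenance. Part 2 covers recommender systems and user profiling: classic recommendation algorithms such as traditional collaborative filtering, as well as recommendation methods that use user profiles. Part 3 presents application case studies: classic data cases from competitions such as Netflix's and Alibaba's, plus concrete cases from real engineering projects. Each case is covered in detail from five angles: system requirements, overall architecture, algorithm design, execution flow, and test results.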

2017-11-22

Deep Learning with Hadoop.pdf

Deep Learning with Hadoop Copyright © 2017 Packt Publishing All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews. Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book. Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information. First published: February 2017 Production reference: 1130217 Published by Packt Publishing Ltd.

2017-10-28

FAW-Volkswagen MDM introductory training: overview

Become familiar with the main functions of SAP MDM; become familiar with the main MDM client tools.

2014-04-30

Hadoop MapReduce Practical Handbook

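This is a one-stop guide to learning Hadoop MapReduce. It gives a complete introduction to the Hadoop ecosystem, including installing, deploying, and operating the Hadoop platform, as well as ecosystem members such as Hive, Pig, HBase, and Mahout. Most importantly, it contains a wealth of examples and varied real-world scenarios, presenting 90 practical recipes in a simple, direct way with step-by-step guidance. Starting from obtaining Hadoop and running it on a cluster, the book covers, in order: advanced HDFS; advanced Hadoop MapReduce administration; developing complex Hadoop MapReduce applications; the Hadoop ecosystem; statistical analysis; searching and indexing; clustering, recommendations, and finding associations; processing massive text data; and cloud deployment.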

2017-10-31

Lean Analytics (high-resolution, with table of contents)

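This book covers a lot of ground. We interviewed more than 100 founders, investors, intrapreneurs, and innovators, many of whom shared their experiences with us, and we present more than 30 case studies. We also list many best practices you can apply right away. The material is organized into four parts. Part One focuses on understanding Lean Startup and basic analytics techniques, and on the data-informed mindset that helps you succeed. We review many existing startup frameworks and propose our own, analytics-focused framework. This is your first lesson in the world of Lean Analytics, and by the end of this part you will have a good grasp of basic analytics techniques. Part Two shows how to apply Lean Analytics to a startup. Using six business models as examples, we discuss the five stages every startup goes through as it gradually finds the right product and the best target market. We also discuss how to find the One Metric That Matters for your business. After reading this part, you will know what business you are in, what stage you are at, and what you should do next. Part Three examines what normal ranges for metrics look like. Unless you draw a line in the sand, you will never know whether you are doing well or badly. This part gives reference values for key metrics and shows how to set your own targets. Part Four shows how to apply Lean Analytics inside your organization to change its culture, whether it is a consumer or enterprise startup or an established company. After all, data-driven approaches are not just for startups. Most chapters end with questions to help you reflect on and apply what you have read.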

2017-08-14

machine-learning-algorithms (Machine Learning Algorithms)

This book is an introduction to the world of machine learning, a topic that is becoming more and more important, not only for IT professionals and analysts but also for all those scientists and engineers who want to exploit the enormous power of techniques such as predictive analysis, classification,clustering and natural language processing. Of course, it's impossible to cover all the details with the appropriate precision; for this reason, some topics are only briefly described, giving the user the double opportunity to focus only on some fundamental concepts and, through the references, examine in depth all those elements that will generate much interest. I apologize in advance for any imprecision or mistakes, and I'd like to thank all Packt editors for their collaboration and constant attention.

2019-01-07

Docker Development Guide (PDF)

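Part 1 first explains what containers are and why you should care about them. It then demonstrates basic Docker operations and closes with a longer treatment of Docker's core concepts and technologies, including an overview of Docker commands. Part 2 explains how to use Docker across the software development lifecycle. It starts with setting up a development environment and then builds a simple web application that serves as the running example for the rest of Part 2. This part also covers development, testing, and integration, how to deploy containers, and how to monitor production and record its logs effectively. Part 3 goes deeper: it covers the tools and techniques that keep Docker containers running securely and reliably in multi-host clusters. It is aimed at readers who already use Docker and need to learn how to scale or solve networking and security problems.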

2017-08-28

Understanding Redis in Depth (English edition) (Mastering Redis)

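This book introduces the NoSQL database Redis, moving from basics to advanced topics and from principles to application scenarios. It explains Redis's data structures and popular usage patterns in detail and offers constructive approaches to Redis key design and management and to memory management. The author also digs into the Redis source code, showing its internals through source-level debugging. The book suits developers or architects with some NoSQL experience. Readers will find many application scenarios and solutions, such as Docker deployment, Redis message queues, Redis-based ETL applications, and Redis-based machine learning.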

2017-08-15

The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling

2008-09-16
