
Big Data
Ji Kang (嵇康)
Calm, thoughtful, and fond of reading; enjoys meeting people with shared interests; growing together with the company.
Installing Hadoop 2 on Ubuntu
1. Install the JDK from a PPA:
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java7-installer
sudo update-java-alternatives -s java-7-oracle
java -version
2. Use…
Original · 2016-11-12 11:29:28 · 364 views · 0 comments
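The excerpt is cut off at step 2; a minimal sketch of the usual follow-up, assuming a Hadoop 2 tarball install (paths are hypothetical):
> tar -zxf hadoop-2.7.0.tar.gz -C ~/software
> echo 'export JAVA_HOME=/usr/lib/jvm/java-7-oracle' >> ~/software/hadoop-2.7.0/etc/hadoop/hadoop-env.sh
> ~/software/hadoop-2.7.0/bin/hadoop version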
install docker in aws redhat7
Edit the /etc/yum.repos.d/redhat-rhui.repo file and locate the following block:
[rhui-REGION-rhel-server-extras]
name=Red Hat Enterprise Linux Server 7 Extra(RPMs)
mirrorlist=https://rhui2-cds01.REGION.aws.ce.redhat.com/pulp/mirror/…
Original · 2017-03-27 15:58:33 · 851 views · 0 comments
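Once the extras repo is reachable, the usual install sequence is something like the sketch below (an assumption; the excerpt ends before this point):
> sudo yum-config-manager --enable rhui-REGION-rhel-server-extras
> sudo yum install -y docker
> sudo systemctl start docker && sudo systemctl enable docker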
integrate hbase into sparkSql in fedora
1. In hbase-env.sh:
export JAVA_HOME=/home/jka07@int.hrs.com/software/jdk1.8.0_121
2. In hbase-site.xml:
hbase.zookeeper.property.dataDir = /data
hbase.rootdir = hdfs://hrs-hadoop:9000/hbase
…
Original · 2017-03-22 08:58:55 · 348 views · 0 comments
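The excerpt flattens the hbase-site.xml entries; in XML form they would look roughly like this (reconstructed from the values above):
<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/data</value>
</property>
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://hrs-hadoop:9000/hbase</value>
</property>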
install hive-2.1.1 in fedora
1. Environment variables in /etc/profile:
PATH=$PATH:$HOME/.local/bin:$HOME/bin
export JAVA_HOME=/home/jka07@int.hrs.com/software/jdk1.8.0_121
PATH=$PATH:$JAVA_HOME/bin
export SCALA_HOME=/usr/share/scala
export CL…
Original · 2017-03-23 11:45:42 · 383 views · 0 comments
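For Hive 2.x the metastore schema must be initialized before first use; a minimal sketch, assuming the default embedded Derby metastore (install path is hypothetical):
> export HIVE_HOME=~/software/apache-hive-2.1.1-bin
> $HIVE_HOME/bin/schematool -dbType derby -initSchema
> $HIVE_HOME/bin/hive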
install hive in fedora
1. Environment variables in ~/.bash_profile:
export JAVA_HOME=/home/jka07@int.hrs.com/software/jdk1.8.0_121
PATH=$PATH:$JAVA_HOME/bin
export SCALA_HOME=/usr/share/scala
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOM…
Original · 2017-03-17 09:39:23 · 336 views · 0 comments
integrate hbase-1.2.4 into hive-2.1.1
hbase> create 'hvtohbase', 'cf1'
hive> create table ccc(foo int, bar string) row format delimited fields terminated by '\t' lines terminated by '\n' stored as textfile;
hive> CREATE external…
Original · 2017-03-23 14:00:24 · 509 views · 0 comments
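The excerpt cuts off at the CREATE external statement; for Hive-HBase integration it plausibly continues along these lines (column names here are hypothetical):
hive> CREATE EXTERNAL TABLE hbase_ccc(key int, value string)
      STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
      TBLPROPERTIES ("hbase.table.name" = "hvtohbase");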
spark JavaSQLPageViewanalyzer for hive table
package com.ibeifeng.bigdata.spark.sql;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spa…
Original · 2017-03-23 14:43:57 · 350 views · 0 comments
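A job like this is typically launched with spark-submit; a hedged sketch (the class name is guessed from the title, the jar name is hypothetical):
> spark-submit --class com.ibeifeng.bigdata.spark.sql.JavaSQLPageViewanalyzer --master local[2] bigdata-spark.jar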
SQLAccesslogAnalyzer in sparkSQL
package com.ibeifeng.bigdata.spark.sql
import com.ibeifeng.bigdata.spark.core.ApacheAccessLog
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spa…
Original · 2017-03-23 18:58:27 · 538 views · 0 comments
install hadoop in fedora
1> hostname hrs-hadoop (which will be used in the XML configs)
2> mkdir -p /home/jka07@int.hrs.com/hadoop/dfs/name
3> add a property in /home/jka07@int.hrs.com/software/hadoop-2.7.0/etc/hadoop/hdfs-site.xml
…
Original · 2017-03-16 16:27:21 · 349 views · 0 comments
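Step 3's property is cut off; given the directory created in step 2, it is presumably the NameNode data dir, e.g.:
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///home/jka07@int.hrs.com/hadoop/dfs/name</value>
</property>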
Installing Hive on macOS
Reference: http://www.cnblogs.com/yjmyzz/p/4555507.html
1》brew install hive
2. Environment variables in /etc/profile:
HBASE_HOME=/usr/local/Cellar/hbase/1.2.2
HADOOP_HOME=/usr/local/Cellar/hadoop/2.7.3
export HIVE_HOME=/u…
Original · 2016-11-27 12:02:25 · 3969 views · 0 comments
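The HIVE_HOME line is truncated; with a Homebrew install it typically points under /usr/local/Cellar (the version below is an assumption):
export HIVE_HOME=/usr/local/Cellar/hive/2.1.0/libexec
export PATH=$PATH:$HIVE_HOME/bin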
deploy hadoop cluster in docker using sequenceiq/hadoop-docker:2.7.0
See:
http://www.jianshu.com/p/5f4be94630a3
http://blog.youkuaiyun.com/xu470438000/article/details/50512442
10.177.3.93> sudo docker run --name hadoop0 --hostname hadoop0 --restart=always -d --net=host…
Original · 2017-06-22 12:07:21 · 661 views · 0 comments
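The docker run line presumably ends with the image named in the title; a full sketch:
10.177.3.93> sudo docker run --name hadoop0 --hostname hadoop0 --restart=always -d --net=host sequenceiq/hadoop-docker:2.7.0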
upgrade pip3 in fedora
How can we use pip3 after installing Python 2.7 (which already bundles pip) alongside Python 3.5?
Refer to: http://blog.youkuaiyun.com/ouening/article/details/53358726
wget https://bootstrap.pypa.io/ez_setup.py   # install…
Reposted · 2017-11-17 10:35:21 · 598 views · 0 comments
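An alternative, commonly used bootstrap (an assumption, not necessarily the post's exact steps):
> wget https://bootstrap.pypa.io/get-pip.py
> python3.5 get-pip.py
> pip3 --version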
Trying out word2vec
See http://www.52nlp.cn/%E4%B8%AD%E8%8B%B1%E6%96%87%E7%BB%B4%E5%9F%BA%E7%99%BE%E7%A7%91%E8%AF%AD%E6%96%99%E4%B8%8A%E7%9A%84word2vec%E5%AE%9E%E9%AA%8C
1. Install python3 and opencc:
brew install python3…
Original · 2017-11-16 10:47:46 · 1003 views · 0 comments
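A sketch of step 1 plus the usual follow-up (gensim is an assumption; the referenced 52nlp experiment is normally run with it):
> brew install python3 opencc
> pip3 install gensim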
install sparkStreaming + kafka in fedora
> source /etc/profile
> sudo hostname hrs-hadoop
> sudo systemctl start sshd
hadoop/sbin> ./start-all.sh
kafka/bin> ./kafka-server-start.sh ../config/server.properties
kafka/bin> ./kafka-console-producer.sh --br…
Original · 2017-03-21 08:41:53 · 382 views · 0 comments
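The producer line is truncated; with a Kafka 0.10-era console producer it would typically read (host and topic are assumptions):
kafka/bin> ./kafka-console-producer.sh --broker-list localhost:9092 --topic test
kafka/bin> ./kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning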
Installing and configuring Kafka on macOS
》brew install kafka
4. Configure config/server.properties:
broker.id: sequentially increasing (0, 1, 2, 3, 4), unique within the cluster
log.dirs: set to a path on a large disk
num.network.threads
num.partitions: default partition count
num.io.threads: recommended value is the machine's core count
zookeeper.connect: set…
Original · 2017-01-15 16:46:35 · 6164 views · 0 comments
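A minimal server.properties sketch along those lines (all values are assumptions):
broker.id=0
log.dirs=/data/kafka-logs
num.network.threads=3
num.io.threads=8
num.partitions=1
zookeeper.connect=localhost:2181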
integrate sparkSQL with hive
Copy hive-site.xml to {SPARK_HOME}/conf and add two properties:
hive.metastore.local = true
hive.metastore.schema.verification = false
Original · 2017-03-20 10:09:55 · 452 views · 0 comments
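In XML form (reconstructed from the two values above):
<property>
  <name>hive.metastore.local</name>
  <value>true</value>
</property>
<property>
  <name>hive.metastore.schema.verification</name>
  <value>false</value>
</property>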
Running a Spark program from IDEA on Windows (Spark installed on another Mac)
Source code: https://github.com/jimingkang/spark
Environment: Scala 2.11.8, spark-2.0.2-bin-hadoop2.7
IDEA setup reference: http://blog.youkuaiyun.com/javastart/article/details/43372977
Key code: val conf = new SparkConf().setAp…
Original · 2016-12-10 18:02:51 · 836 views · 0 comments
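The key line presumably continues with setAppName and the remote Mac's master URL; a sketch (app name and host are hypothetical):
val conf = new SparkConf().setAppName("demo").setMaster("spark://<mac-host>:7077")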
Installing CDH5 on CentOS
》service iptables stop
》chkconfig iptables off
》vi /etc/selinux/config
SELINUX=disabled
》vi /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
HWADDR=08:00:27:31:53:F9
TYPE=Eth…
Original · 2016-11-19 15:04:54 · 451 views · 0 comments
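The ifcfg file is cut off at TYPE=Eth…; a typical static-IP completion (addresses are assumptions):
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.56.101
NETMASK=255.255.255.0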
Exporting Hive table data to MySQL with sqoop
Loading data into MySQL directly from the local filesystem:
mysql》LOAD DATA LOCAL INFILE 'C:\\Users\\asys\\Documents\\Tencent Files\\13174605\\FileRecv\\2015082818' INTO TABLE track_log FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\r'…
Original · 2016-12-04 12:56:59 · 6954 views · 3 comments
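For the export direction named in the title, the usual shape of the command is the sketch below (credentials follow the other posts in this list; the warehouse path is an assumption):
> sqoop export --connect jdbc:mysql://localhost:3306/track_log --username root --password Nokia123 --table track_log --export-dir /user/hive/warehouse/track_log --input-fields-terminated-by '\t'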
Pitfalls when submitting Hadoop jobs from Eclipse on Windows
1. User permission problem: the Windows user may differ from the user on the remote cluster.
Solution: first add to hdfs-site.xml:
dfs.permissions = false
dfs.web.ugi = jimmy,supergroup   // jimmy is the OS user I installed Hadoop with
dfs.datanode.data.dir.per…
Original · 2016-11-28 13:48:15 · 1183 views · 0 comments
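As XML (reconstructed from the values above; the third property is cut off, presumably dfs.datanode.data.dir.perm):
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
<property>
  <name>dfs.web.ugi</name>
  <value>jimmy,supergroup</value>
</property>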
Solving the multi-partition problem when importing from MySQL into a Hive table with sqoop
Reference: http://blog.youkuaiyun.com/liweiwei71/article/details/23434189
For a partitioned table:
drop table track_log;
create table track_log (
id string,
…
curMerchantId string,
provi…
Original · 2016-12-18 16:37:41 · 6556 views · 0 comments
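A hedged sketch of how such a DDL usually ends for this kind of log table (the partition columns are assumptions):
...
provinceId string
) partitioned by (ds string, hour string)
row format delimited fields terminated by '\t';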
Importing data from MySQL into Hive with sqoop
1. Importing straight into the HDFS filesystem (Hive queries then misbehave; e.g. select id from track_log returns multiple columns):
sqoop import --connect jdbc:mysql://localhost:3306/track_log --username root --password Nokia123 --table track_log18   // the MySQL source table
…
Original · 2016-12-18 15:47:23 · 2031 views · 0 comments
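The usual fix is to let sqoop create and load the Hive table itself; a sketch (the flags are standard sqoop, the table names follow the excerpt):
> sqoop import --connect jdbc:mysql://localhost:3306/track_log --username root --password Nokia123 --table track_log18 --hive-import --hive-table track_log --fields-terminated-by '\t'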
Importing MySQL data into HBase with sqoop
Create the table fct_session_info ahead of time, with column family session:
hbase》create 'fct_session_info', 'session'
jimmy》sqoop import --connect jdbc:mysql://localhost:3306/track_log --username root --password Nokia123 -…
Original · 2016-12-25 14:08:04 · 594 views · 0 comments
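The sqoop command is cut off; the HBase-specific flags it presumably continues with are (source table and row key are assumptions):
jimmy》sqoop import --connect jdbc:mysql://localhost:3306/track_log --username root --password Nokia123 --table session_info --hbase-table fct_session_info --column-family session --hbase-row-key sessionId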
Installing Hadoop 2.7.3 on macOS
Reference: http://www.cnblogs.com/micrari/p/5716851.html
》ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
》brew install hadoop
》ssh-keygen -t dsa -P ''…
Original · 2016-11-04 23:22:55 · 1628 views · 3 comments
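The ssh-keygen line is cut off; the standard passphrase-less SSH setup plus first start looks like this (a sketch of the usual steps, not necessarily the post's):
》ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
》cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
》hdfs namenode -format
》start-dfs.sh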
Running a Kafka project in Eclipse on Windows
The project needs at least kafka-clients-0.10.1.1.jar, kafka_2.10-0.10.1.1.jar, scala-library-2.10.6.jar, and metrics-core-2.2.0.jar; to combine with Storm, also include storm-core-0.10.2.jar.
Project files:
1) KafkaProperties properties file
package com.c…
Original · 2017-01-17 15:56:47 · 2651 views · 1 comment
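If the project is built with Maven rather than hand-copied jars, the equivalent client dependency would be (coordinates match the jar version above):
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-clients</artifactId>
  <version>0.10.1.1</version>
</dependency>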
Integrating Storm with Kafka to consume data
See my git project: https://github.com/jimingkang/StormTwo/tree/master/src/user_visit
Project files:
1) package user_visit;
import cloudy.spout.OrderBaseSpout;
import com.ibf.base.spout.SourceSpout;
import backtype.…
Original · 2017-01-17 16:14:54 · 3171 views · 0 comments
Secondary indexing in HBase
1) MySQL source table sale_order -> so
》desc so;
+-----------+--------------+------+-----+---------+-------+
| Field     | Type         | Null | Key | Default | Extra |
+-----------+--------------+------+…
Original · 2017-01-12 22:29:17 · 376 views · 0 comments
Running the offline_data_analysis project locally on Windows (the project runs on local Hadoop)
Project path: https://github.com/jimingkang/offline_data_analystics.git
1. Download hadoop-2.5.0-cdh5.3.6 and unpack it to E:\hadoop-2.5.0-cdh5.3.6
2. Add winutils.exe under bin
3. Add the modified NativeIO to the org.apache… package under src/main/extr…
Original · 2017-02-26 18:44:00 · 504 views · 0 comments
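Running Hadoop client code on Windows usually also needs HADOOP_HOME set; a sketch (an assumption, the excerpt ends before this):
> setx HADOOP_HOME E:\hadoop-2.5.0-cdh5.3.6
> setx PATH "%PATH%;%HADOOP_HOME%\bin"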
Setting up a private chain with go-ethereum
Ref:
http://blog.youkuaiyun.com/lhtzbj12/article/details/79405238
https://www.cnblogs.com/jackluo/p/8513880.html
> docker pull ethereum/client-go
> cd docker_ethereum/
> [root@mo-cn-491 docker_ethereum]# …
Reposted · 2018-03-08 10:14:19 · 525 views · 0 comments
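A typical way to run the pulled image as a node (the ports follow the image's documented defaults; the mount path is an assumption):
> docker run -d --name geth -v $HOME/ethereum:/root -p 8545:8545 -p 30303:30303 ethereum/client-go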