
Hadoop
ylzhjlinux
Articles in this column
7 Tips for Improving MapReduce Performance
Since MapReduce and HDFS are complex distributed systems that run arbitrary user code, there’s no hard and fast set of rules to achieve optimal performance; instead, I tend to think of tuning a cl...
Original · 2014-05-15 15:32:28 · 156 views · 0 comments
Content-based and collaborative-filtering-based recommendation and personalization
References: https://github.com/pranab/sifarish
Original · 2015-01-21 15:53:59 · 200 views · 0 comments
Compiling Hadoop 2.3.0 on 64-bit Ubuntu/CentOS
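The native libraries under ./lib/native in the official Hadoop 2.3.0 tarball only work on 32-bit operating systems. Installing on a 64-bit system produces errors and Hadoop fails to start, so the libraries have to be recompiled on the 64-bit system.
1. Environment: Hadoop 2.3.0, Ubuntu 12.04 64-bit
2. Follow these steps to recompile Hadoop: sudo a...
Original · 2014-03-13 17:38:49 · 131 views · 0 comments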
Hadoop 2.2.0 cluster error messages
1. When putting a local file into HDFS with #hadoop fs -put in.txt /test, this error appears: hdfs.DFSClient: Exception in createBlockOutputStream java.net.NoRouteToHostException
Solution...
Original · 2014-03-23 23:08:24 · 216 views · 0 comments
Adding third-party jars to a job
When I submit a Java job (which includes some MapReduce jobs) through the Hue UI using the Oozie Editor, the third-party jars are not loaded correctly.
1. The only approach that worked for me was to build a fat jar wh...
Original · 2014-08-18 15:10:25 · 130 views · 0 comments
Hadoop: data join exception
http://stackoverflow.com/questions/12956488/hadoop-nosuchmethodexception
Original · 2014-08-26 18:39:28 · 107 views · 0 comments
Hadoop: Writing output data to multiple directories
import java.io.IOException;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
im...
Original · 2014-09-01 12:47:02 · 112 views · 0 comments
Eclipse Hadoop Development Environment Setup
Environment: Ubuntu 12.04
1. Install Eclipse.
2. Create a desktop shortcut for Eclipse:
a. Create an empty file named eclipse.xx.
b. Edit eclipse.xx as follows (to avoid failing to open eclip...
Original · 2014-04-24 09:58:41 · 111 views · 0 comments
Hadoop: Integrating Hadoop Data with Oracle Parallel Processing
Reference: https://blogs.oracle.com/datawarehousing/entry/integrating_hadoop_data_with_o
Original · 2014-10-09 16:52:42 · 121 views · 0 comments
Hadoop: Using two mappers to do different things
In my work, I ran into a situation where I wanted mapper A to read a file with two fields (questionId, questionTags) and output key: questionId, value: questionTags, while mapper B readi...
Original · 2014-10-10 10:30:35 · 103 views · 0 comments
The Hadoop Ecosystem Table
http://hadoopecosystemtable.github.io/
http://blog.andreamostosi.name/big-data/
https://github.com/youngwookim/awesome-hadoop
Original · 2014-11-10 15:28:24 · 128 views · 0 comments
MySQL Applier with Hadoop
MySQL Applier for Hadoop
Replication via the Hadoop Applier is implemented by connecting to the MySQL master and reading binary log events as soon as they are committed, and writing them into a fil...
Original · 2014-12-08 11:25:14 · 174 views · 0 comments
MySQL Applier For Hadoop: Real time data export from MySQL to HDFS
http://innovating-technology.blogspot.com/2013/04/mysql-hadoop-applier-part-1.html
MySQL replication enables data to be replicated from one MySQL database server (the master) to one or more MySQL da...
Original · 2014-12-08 17:00:04 · 210 views · 0 comments
MySQL Applier For Hadoop: Implementation
http://innovating-technology.blogspot.com/2013/04/mysql-hadoop-applier-part-2.html
This is a follow-up post describing the implementation details of the Hadoop Applier, and steps to configure and insta...
Original · 2014-12-08 17:15:34 · 255 views · 0 comments
MySQL Hadoop Applier: installation and configuration
1. Install and configure hadoop-2.6.0 ($HADOOP_HOME must be set).
2. Download the mysql-5.6.22.tar.gz source code from http://dev.mysql.com/downloads/mysql/
#tar xf mysql-5.6.22.tar.gz
#cd mysql-5.6....
Original · 2014-12-11 17:36:50 · 178 views · 0 comments
Is HDFS an append-only file system? Then how do people modify the files stored
HDFS is append-only, yes. The short answer to your question is that, to modify any portion of a file that is already written, one must rewrite the entire file and replace the old file. "Even for a sin...
Original · 2014-12-17 17:22:14 · 139 views · 0 comments
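The rewrite-and-replace approach can be sketched with plain java.nio on a local file system. This is an analogy, not HDFS code: nothing is edited in place; the whole file is read, changed in memory, written out as a new copy, and the copy replaces the original. On HDFS the same steps would presumably go through FileSystem.create and FileSystem.rename, which is an assumption here. The class name RewriteAndReplace is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Local-file analogy of modifying an append-only file: nothing is edited in place.
class RewriteAndReplace {

    /** Replaces the 0-based line {@code index} by rewriting the whole file. */
    static void replaceLine(Path file, int index, String newLine) throws IOException {
        // 1. Read the entire existing file.
        List<String> lines = new ArrayList<>(Files.readAllLines(file, StandardCharsets.UTF_8));
        // 2. Apply the change in memory.
        lines.set(index, newLine);
        // 3. Write a complete new copy next to the original.
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        Files.write(tmp, lines, StandardCharsets.UTF_8);
        // 4. Swap the new copy in place of the old file.
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

The cost of changing one line is proportional to the size of the whole file, which is why HDFS workloads usually append new records or regenerate whole files rather than patch them.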
Sqoop: truncating a table before exporting data from HDFS
We are using Sqoop to export data from Hive to SQL Server. The new data is always appended to the existing data in SQL Server. Is it possible to truncate the SQL Server table via Sqoop before...
Original · 2015-01-06 17:18:02 · 610 views · 0 comments
Real-time Clickstream Analytics using Flume, Avro, Kite Morphlines and Impala
http://techkites.blogspot.com/2014/06/real-time-clickstream-analytics-using.html
Original · 2014-12-30 14:16:20 · 112 views · 0 comments
Top-K problem in Hadoop
Some example code:
https://github.com/adamjshook/mapreducepatterns/tree/master/MRDP/src/main/java/mrdp
https://github.com/adamjshook/mapreducepatterns/blob/maste...
Original · 2014-05-19 18:08:44 · 101 views · 0 comments
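The classic "top ten" MapReduce pattern keeps a bounded TreeMap in each mapper and sends every mapper's local winners to a single reducer, which selects the global top K. Below is a minimal in-memory sketch of that idea in plain Java, not Hadoop code: each input split keeps only its local top K, and the "reducer" merges those local lists and trims to K again. The class name and the (score, id) record shape are assumptions; as in TreeMap-keyed examples, records with equal scores overwrite each other.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory sketch of the Top-K pattern (plain Java, not Hadoop code).
// Records are hypothetical (score, id) pairs.
class TopK {

    /** Keeps at most k records, evicting the lowest score whenever the map grows past k. */
    static TreeMap<Integer, String> keepTopK(Iterable<Map.Entry<Integer, String>> records, int k) {
        TreeMap<Integer, String> best = new TreeMap<>();
        for (Map.Entry<Integer, String> r : records) {
            best.put(r.getKey(), r.getValue()); // equal scores overwrite each other
            if (best.size() > k) {
                best.remove(best.firstKey());
            }
        }
        return best;
    }

    /** "Map" phase per split, then one "reducer" that merges the local results. */
    static List<String> topK(List<List<Map.Entry<Integer, String>>> splits, int k) {
        List<Map.Entry<Integer, String>> merged = new ArrayList<>();
        for (List<Map.Entry<Integer, String>> split : splits) {
            // Each mapper emits only its local top K, so the shuffle stays small.
            merged.addAll(keepTopK(split, k).entrySet());
        }
        // The single reducer applies the same bounded selection to all local winners.
        return new ArrayList<>(keepTopK(merged, k).descendingMap().values());
    }
}
```

Because every mapper forwards at most K records, the single reducer sees at most K times the number of mappers, which is what makes one reduce task acceptable here.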
Number of Maps and Reduces
The number of map tasks for a given job is driven by the number of input splits, not by the mapred.map.tasks parameter. For each input split, a map task is spawned. So, over the lifetime of a mapr...
Original · 2014-05-20 09:45:01 · 246 views · 0 comments
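To see why the split count, not mapred.map.tasks, decides the number of maps, here is a sketch of how a FileInputFormat-style input format cuts one splittable file into splits. It follows the Hadoop 2.x rule splitSize = max(minSize, min(maxSize, blockSize)), where minSize and maxSize come from mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize, and it lets the last split be up to 10% larger (a slop factor of 1.1) so no tiny trailing split is created. Data locality, non-splittable (compressed) files, empty files and multi-file inputs are deliberately left out; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of FileInputFormat-style split computation for a single splittable file.
class SplitCalculator {
    // The last split may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Returns the length of each split; one map task is launched per split. */
    static List<Long> splitLengths(long fileLength, long blockSize, long minSize, long maxSize) {
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        List<Long> splits = new ArrayList<>();
        long remaining = fileLength;
        // Cut full-size splits while more than 1.1 splits' worth of data is left.
        while (((double) remaining) / splitSize > SPLIT_SLOP) {
            splits.add(splitSize);
            remaining -= splitSize;
        }
        // Whatever is left (at most 1.1 * splitSize) becomes the final split.
        if (remaining != 0) {
            splits.add(remaining);
        }
        return splits;
    }
}
```

With a 128 MB block size, a 300 MB file yields three splits (128, 128 and 44 MB), while a 140 MB file yields a single split because 140/128 is below the 1.1 slop. Lowering maxSize to 64 MB turns the 300 MB file into five map tasks.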
Difference between zero reducers and the identity reducer
Zero reducers means the reduce step is skipped and the mapper output is the final output.
An identity reducer means shuffling/sorting still takes place.
If you do not need sorting of map results -...
Original · 2014-05-20 15:38:36 · 140 views · 0 comments
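The difference can be simulated in a few lines of plain Java (this is not Hadoop code). With zero reducers the map output is written as-is, in the order the mapper emitted it; with an identity reducer the records still pass through shuffle/sort, so the output comes out sorted and grouped by key while the values are unchanged. In the Hadoop new API these correspond to job.setNumReduceTasks(0) and to using the base org.apache.hadoop.mapreduce.Reducer class, whose reduce() writes every value through unchanged. The class and method names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory simulation of a map-only job versus a job with an identity reducer.
class ReducerModes {

    /** Zero reducers: mapper output is the final output, in emission order. */
    static List<String> mapOnly(List<Map.Entry<String, String>> mapOutput) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> kv : mapOutput) {
            out.add(kv.getKey() + "\t" + kv.getValue());
        }
        return out;
    }

    /** Identity reducer: shuffle/sort groups by key, then each value is emitted unchanged. */
    static List<String> identityReduce(List<Map.Entry<String, String>> mapOutput) {
        // Shuffle/sort: keys in sorted order, values grouped per key.
        // (Real Hadoop does not guarantee value order within a key; here it is emission order.)
        TreeMap<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> kv : mapOutput) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        // Identity reduce: write every (key, value) pair as-is.
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> group : grouped.entrySet()) {
            for (String value : group.getValue()) {
                out.add(group.getKey() + "\t" + value);
            }
        }
        return out;
    }
}
```

Both variants emit the same records; only the identity reducer pays for the shuffle and sort, so a map-only job is the cheaper choice when sorted output is not needed.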
Chain MapReduce Jobs
References:
http://stackoverflow.com/questions/2499585/chaining-multiple-mapreduce-jobs-in-hadoop
https://developer.yahoo.com/hadoop/tutorial/module4.html#chaining
ht...
Original · 2014-05-20 18:16:20 · 131 views · 0 comments
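The simplest form of chaining is sequential: the output of one job becomes the input of the next, and the next job starts only after the previous one has finished. The in-memory sketch below (plain Java, not Hadoop code) models that with two hypothetical jobs, a word count followed by grouping words by their count. In a real Hadoop driver the assumed equivalent is calling job1.waitForCompletion(true), checking its result, and then pointing job2's input path at job1's output directory.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory sketch of two chained jobs; both jobs are hypothetical examples.
class JobChain {

    /** Job 1: count occurrences of each whitespace-separated word. */
    static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    /** Job 2: consume job 1's output and group words by count, highest count first. */
    static TreeMap<Integer, List<String>> groupByCount(Map<String, Integer> counts) {
        TreeMap<Integer, List<String>> byCount = new TreeMap<>(Comparator.<Integer>reverseOrder());
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            byCount.computeIfAbsent(e.getValue(), c -> new ArrayList<>()).add(e.getKey());
        }
        return byCount;
    }

    /** Driver: job 2 runs only after job 1 has produced its complete output. */
    static TreeMap<Integer, List<String>> run(List<String> input) {
        Map<String, Integer> intermediate = wordCount(input); // job 1 must finish first
        return groupByCount(intermediate);                    // job 2 reads job 1's output
    }
}
```

Strictly sequential chaining is easy to reason about but leaves the cluster idle between jobs; independent jobs in a larger workflow can instead be submitted concurrently.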
Solutions to common errors
1. When I create a Hive table in Hue, errors occur.
Solution: #hadoop dfsadmin -safemode leave
http://www.linkedin.com/groups/Creating-table-in-Hive-getting-4547204.S.225243871
2. error ...
Original · 2014-05-26 11:09:19 · 136 views · 0 comments
Making Hadoop MapReduce Work with a Redis Cluster
Redis is a very cool open-source key-value store that can add instant value to your Hadoop installation. Since keys can hold values of type string, hash, list, set and sorted set, Redis can be used a...
Original · 2014-05-28 15:18:49 · 147 views · 0 comments
Hadoop: Configuration 1
hadoop-env.sh: JAVA_HOME must be set on the NameNode and Secondary NameNode, otherwise start-dfs.sh fails with errors.
Original · 2014-06-12 11:45:07 · 83 views · 0 comments
Hadoop 2.2.0 cluster installation guide
Installing a Hadoop 2.2.0 cluster with 3 nodes (one for the NameNode/ResourceManager and Secondary NameNode, and the other two for DataNode/NodeManager).
1. IP assignments
192.168.122.1 ...
Original · 2014-02-05 01:56:35 · 124 views · 0 comments
HDFS: API Introduction
References: http://blog.youkuaiyun.com/lastsweetop/article/details/9001467
Original · 2014-06-17 15:27:31 · 108 views · 0 comments
Hadoop: Data Join
Reduce-side joining / repartitioned sort-merge join
Note: DataJoinReducerBase, on the other hand, is the workhorse of the datajoin package, and it simplifies our programming by performing a fu...
Original · 2014-06-30 15:12:12 · 164 views · 0 comments
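A reduce-side (repartitioned sort-merge) join can be sketched in memory to show the data flow. Map side: every record is emitted under its join key together with a tag naming its source. Shuffle: all tagged records with the same key reach one reduce call. Reduce side: the values are separated by tag and the cross product of the two sources is emitted; a key missing from either source produces nothing, which gives an inner join. This loosely mirrors the per-tag grouping and combine() step of DataJoinReducerBase, but it is plain Java, not the datajoin package API. The "key,value" record format, the tags "L" and "R", and the class name are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory sketch of a reduce-side inner join between two tagged sources.
class ReduceSideJoin {

    /** Map phase: emit each "key,value" line under its key, tagged with its source. */
    static void map(String tag, List<String> lines, Map<String, List<String[]>> shuffle) {
        for (String line : lines) {
            String[] parts = line.split(",", 2); // assumes every line contains a comma
            shuffle.computeIfAbsent(parts[0], k -> new ArrayList<>()).add(new String[] {tag, parts[1]});
        }
    }

    /** Reduce phase: per key, split values by tag and emit their cross product. */
    static List<String> reduce(Map<String, List<String[]>> shuffle, String leftTag, String rightTag) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String[]>> group : shuffle.entrySet()) {
            List<String> left = new ArrayList<>(), right = new ArrayList<>();
            for (String[] tagged : group.getValue()) {
                if (tagged[0].equals(leftTag)) {
                    left.add(tagged[1]);
                } else if (tagged[0].equals(rightTag)) {
                    right.add(tagged[1]);
                }
            }
            // Inner join: keys present in only one source contribute nothing.
            for (String l : left) {
                for (String r : right) {
                    out.add(group.getKey() + "," + l + "," + r);
                }
            }
        }
        return out;
    }

    static List<String> join(List<String> left, List<String> right) {
        Map<String, List<String[]>> shuffle = new TreeMap<>(); // sorted by key, like the shuffle
        map("L", left, shuffle);
        map("R", right, shuffle);
        return reduce(shuffle, "L", "R");
    }
}
```

Joining customers ("1,Alice", "2,Bob", "3,Carol") with orders ("1,book", "1,pen", "3,lamp", "4,cup") yields 1,Alice,book, 1,Alice,pen and 3,Carol,lamp; Bob has no orders and order 4 has no customer. The whole of both data sets crosses the network in the shuffle, which is the main cost of this join style.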
Hadoop: High-Quality Blogs
http://www.cnblogs.com/zhangchaoyang/articles/2647905.html
http://blog.pureisle.net/archives/1618.html
http://www.youkuaiyun.com/article/2014-01-01/2817984-13-tools-let-hadoop-fly
http://blog.mortar...
Original · 2014-07-01 15:01:29 · 149 views · 0 comments
In-Memory Hadoop Accelerator
https://gridgaintech.wordpress.com/2013/11/07/hadoop-100x-faster-how-we-did-it/
Almost two years ago, Dmitriy and I stood in front of a white board at GridGain’s office thinking: “How can we deli...
Original · 2014-12-19 15:02:29 · 349 views · 0 comments
Data replication between different databases
tungsten-replicator-3.0.0-524-src
Original · 2014-12-22 10:22:15 · 144 views · 0 comments
Open Replicator
http://blog.youkuaiyun.com/menergy/article/details/17583823
Original · 2014-12-22 20:35:00 · 197 views · 0 comments
Data ETL tools for the Hadoop ecosystem: Morphlines
When I use it, there is an error:
java.lang.NoClassDefFoundError: org/kitesdk/morphline/api/MorphlineCompilationException
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Cla...
Original · 2014-12-25 11:39:45 · 315 views · 0 comments
Flume source using mysql-replication-listener to copy data from MySQL in real time
https://bitbucket.org/winebarrel/mysql-replication-listener
http://flume.apache.org/FlumeUserGuide.html#a-simple-example
https://www.cyberagent.co.jp/recruit/techreport/report/id=7474
https://do...
Original · 2014-12-18 11:46:06 · 175 views · 0 comments