hadoop
Articles in this column:
some features of hadoop-0.20.2
some hdfs features of hadoop-0.20.2. High fault tolerance: a. A block write does not require every replica to succeed before it counts as successful; a threshold controls this, and once it is reached the write succeeds. Afterwards the NN checks for blocks that fall short of the required replication num and fixes them. (Presumably this is just one of several mechanisms.) b. Skipping mode gives more flexible control. High reliability: a. uses redundan...2011-05-04 15:07:42 · 131 reads · 0 comments
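The threshold mentioned above is, to my understanding, the dfs.replication.min property (hadoop-0.20.x name; treat it as an assumption). The NN's follow-up pass can be watched from the shell:

```bash
# A write returns success once the minimum number of replicas is on disk;
# the NN then re-replicates anything still short. fsck reports the stragglers:
hadoop fsck / | grep -i 'under-replicated'
```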
hadoop source reading - shell startup flow - start-all
When start-all.sh executes (or start-dfs.sh / start-mapred.sh), the config files listed below get loaded: libexec/hadoop-config.sh (with bin/hadoop-config.sh as the fallback when the former is absent), confi...2012-05-06 01:13:54 · 140 reads · 0 comments
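A condensed sketch of that flow, paraphrasing the hadoop-1.x scripts:

```bash
# start-all.sh, condensed: source hadoop-config.sh (libexec copy preferred),
# then hand off to the per-component start scripts.
bin=$(cd "$(dirname "$0")" && pwd)
if [ -e "$bin"/../libexec/hadoop-config.sh ]; then
  . "$bin"/../libexec/hadoop-config.sh
else
  . "$bin"/hadoop-config.sh
fi
"$bin"/start-dfs.sh --config "$HADOOP_CONF_DIR"
"$bin"/start-mapred.sh --config "$HADOOP_CONF_DIR"
```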
hadoop-2.0 alpha standalone install
I read a pile of not very relevant material first... in fact you only need to unpack and run, just as with hadoop-0.20.xx, but note where the jars now live: hadoop jar share/hadoop/hadoop-mapreduce/hadoop-example-xx.jar wordcount input output. Next up is a cluster deployment, to better understand how the new architecture moves. 12/06/17 11:...2012-06-10 12:02:59 · 136 reads · 0 comments
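The whole standalone flow as a sketch (the tarball name is a placeholder; the jar path is the one noted above):

```bash
# Standalone mode runs jobs in-process; no daemons, no format, no hdfs.
tar xzf hadoop-2.0.0-alpha.tar.gz && cd hadoop-2.0.0-alpha
bin/hadoop jar share/hadoop/hadoop-mapreduce/hadoop-example-*.jar wordcount input output
```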
Mind relative paths when accessing hadoop data
Today nutch distributed search returned no results. Background: the index was built under the hadoop account, but searches under the xx account found nothing. Strangely, from the mr plugin the xx account could access it fine. At first I suspected the files under conf, but deploying the whole tomcat under the hadoop account did return results, so it was not a configuration problem. I then suspected hadoop restricts access across accounts; if so, why would the mr plugin work...2011-12-07 00:30:53 · 446 reads · 0 comments
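The likely cause, consistent with the symptom (my reading, since the excerpt is cut off): a relative HDFS path resolves against the current user's home directory, /user/<username>, so two accounts silently read different locations:

```bash
# As user 'hadoop', a relative path means /user/hadoop/crawl
hadoop fs -ls crawl
# As user 'xx', the same relative path means /user/xx/crawl (likely empty)
hadoop fs -ls crawl
# An absolute path behaves identically for every account:
hadoop fs -ls /user/hadoop/crawl
```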
hadoop 2 (0.23.x) compared with 0.20.x
Most of what follows comes from around the web; the point here is to study and compare. 1. Limitations of Hadoop 0.20.*: the single-NameNode HDFS falls short because 1) scalability: data capacity scales horizontally, but the metadata server does not; 2) as the file count grows, pressure on the metadata server rises; reportedly 250 million files take up roughly 64GB of NameNode namespace memory; 3)...Original 2012-07-01 12:09:11 · 226 reads · 0 comments
hadoop: removing nodes (Decommission nodes)
The concrete steps are all over the web; here are just the things I had to watch for: 1. The nodes added to the exclude-file must not be the entries from slaves; specify the ip (host-name is ok?). 2. Do not use start-balancer.sh; use hadoop dfsadmin -refreshNodes. The former balances across all current nodes and ignores exclude-nod...2012-09-02 03:28:58 · 556 reads · 0 comments
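A minimal sketch of the sequence, assuming dfs.hosts.exclude already points the NN at the exclude file (the file path and ip are hypothetical):

```bash
echo 192.168.0.12 >> /cc/hadoop/conf/excludes   # ip, not a slaves-file hostname
hadoop dfsadmin -refreshNodes                   # NN re-reads the exclude file
hadoop dfsadmin -report                         # node shows 'Decommission in progress'
```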
hadoop-2 dfs/yarn related concepts
1. dfs. (1) The old dfs design: block management is one-to-one with the NN, i.e. the whole cluster has a single 'block pool', so isolation is poor; also, while it physically sits on the NN side, logically it belongs to block storage, which is worth noting. (2) dfs federation: the new storage architecture adopts multi-namespaces, with good security/failure isolation; each NS has its own Pool...2012-10-03 00:22:32 · 448 reads · 0 comments
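A hedged sketch of what a two-namespace federation config looked like around 0.23/2.0 (property names from memory; treat them as assumptions):

```bash
# Each nameservice gets its own NN; every DN then hosts one block pool per NS.
cat <<'EOF' >> hdfs-site.xml.fragment
<property><name>dfs.federation.nameservices</name><value>ns1,ns2</value></property>
<property><name>dfs.namenode.rpc-address.ns1</name><value>nn1.example.com:8020</value></property>
<property><name>dfs.namenode.rpc-address.ns2</name><value>nn2.example.com:8020</value></property>
EOF
```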
3. hbase rpc/ipc/proxy communication mechanism
1. RPC vs IPC (relationship/difference). IPC: inter-process communication. As [1] says, there are two types of IPC at present: 1. LPC, which is like RPC but a local 'epitome' of it, i.e. in general it will be us...2013-07-15 15:12:20 · 634 reads · 0 comments
install snappy compression in hadoop and hbase
1. what is snappy 2. why use it 3. how it works 4. comparison with similar compressions 5. install it on hadoop 6. install it on hbase 6.2 verify a. b. c....2014-03-08 00:36:23 · 153 reads · 0 comments
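For step 6.2, a sketch of the checks I would run (the CompressionTest utility and its usage are real hbase tooling; hadoop checknative exists only in newer 2.x releases, an assumption here; file path and table name are placeholders):

```bash
# a. confirm the native snappy library is visible to hadoop (newer 2.x CLIs only)
hadoop checknative | grep -i snappy
# b. hbase's built-in compression round-trip test against a local file
hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/test.txt snappy
# c. create a table that actually uses the codec
echo "create 'snappy_test', {NAME => 'cf', COMPRESSION => 'SNAPPY'}" | hbase shell
```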
how to submit jars to a map reduce job?
There may be two ways: 1. server side: a. place the jars into the HADOOP_HOME/lib dir; or b. set up HADOOP_CLASSPATH to include the jars needed by a job. 2. client side: a. create a 'fat' job jar which co...2014-04-02 01:23:19 · 137 reads · 0 comments
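There is also the generic-options route on the client side, worth listing alongside the fat jar (jar and class names are placeholders; -libjars only applies when the main class goes through Tool/GenericOptionsParser):

```bash
# Ship dependency jars with the job via the distributed cache;
# they are appended to the task classpath at run time.
export HADOOP_CLASSPATH=deps/guava.jar   # the client JVM needs them at submit time too
hadoop jar myjob.jar com.example.MyJob -libjars deps/guava.jar input output
```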
hoya--hbase on yarn
Introducing Hoya – HBase on YARN application architecture. Original 2015-04-23 17:00:02 · 223 reads · 0 comments
hadoop-compression
http://blog.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/ (namely: making hadoop support splittable LZO compression). Very basic question about Hadoop and compressed input files. Hadoop g...Original 2015-10-26 16:52:11 · 126 reads · 0 comments
upgrades of hadoop and hbase
1. The current compatibility matrix between hadoop and hbase; that is, if you are now on hadoop-1.0.x and hbase-0.94.x and you want to upgrade to hadoop-2.5.x and hbase-1.0.x, the step...2014-10-28 11:39:14 · 145 reads · 0 comments
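A sketch of the HDFS half of such an upgrade (standard hadoop CLI commands; the hbase half and all the safety checks are omitted):

```bash
stop-hbase.sh                              # hbase down before touching hdfs
hadoop-daemon.sh start namenode -upgrade   # start the NEW namenode, converting the layout
hadoop dfsadmin -safemode get              # watch it come out of safe mode
hadoop dfsadmin -finalizeUpgrade           # irreversible; only once everything is verified
```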
compile hadoop-2.5.x on OS X(macbook)
Same as compiling hbase: it's worth compiling your own dist of hadoop to get the best performance specific to your hardware. Below are the steps needed: 1. install some ...2014-10-30 15:42:42 · 140 reads · 0 comments
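The build command itself, per hadoop's BUILDING.txt; the brew line is my assumption of what "install some tools" expands to:

```bash
brew install maven cmake protobuf   # package names are assumptions; the native profile wants protobuf 2.5.x
mvn package -Pdist,native -DskipTests -Dtar
```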
yarn-similar logs when starting up container
15/12/09 16:47:52 INFO yarn.ExecutorRunnable: Setting up executor with environment: Map(CLASSPATH -> {{PWD}}<CPS>{{PWD}}/__spark__.jar<CPS>$HADOOP_CONF_DIR<CPS>$HADOOP_COMMON_HO...2015-12-09 17:17:49 · 126 reads · 0 comments
hbase-export table to json file
I wanted to export a table to json-format files, but after googling, no ready-made solution turned up. I know pig is used for some sql-like mapreduce stuff, and hive is a data warehouse that can be built on hbase, but I can't s...2015-12-25 17:21:34 · 294 reads · 0 comments
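The nearest stock tool I'm aware of; it writes SequenceFiles rather than json, which is exactly the gap the post describes (table and path are placeholders):

```bash
# Dumps rows as SequenceFiles of Result objects; a custom MR job or script
# is still needed to turn those into json.
hbase org.apache.hadoop.hbase.mapreduce.Export mytable /export/mytable
```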
hadoop source reading - shell startup flow
Open the bin/hadoop file and you will see there is a config file to load: either libexec/hadoop-config.sh or bin/hadoop-config.sh; the former is loaded if it exists, otherwise the latter. ...2012-05-03 01:58:54 · 140 reads · 0 comments
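A quick, hadoop-agnostic way to confirm which copy your install actually sources:

```bash
# Trace the launcher and keep only the line where hadoop-config.sh gets sourced.
bash -x bin/hadoop version 2>&1 | grep 'hadoop-config.sh'
```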
hadoop source reading - starting the second pass
Out of work needs and the changes brought by version updates, I am entering the source space again, this time hadoop-1.0.1. Goals for this pass: read the overall code fairly completely, getting clear on the general workflow and how the parts connect and interact; the changes in common and its main responsibilities; the config/shell startup flow; the concrete design and implementation of hdfs; the detailed design and implementation of mapreduce; the detailed implementation of ipc; others ...Original 2012-05-03 01:03:17 · 104 reads · 0 comments
components commands comparison
component | start | stop | client test
hdfs | start-dfs.sh | stop-dfs.sh | jps; hadoop fs -ls .
mapred | start-mapred.sh | stop-mapred.sh | jps; hadoop j...
2011-05-22 20:50:19 · 127 reads · 0 comments
hadoop standalone install
Only JAVA_HOME in hadoop-env.sh needs changing; no format, no copyFromLocal to hdfs. Note this is hadoop in standalone mode. use 5s hadoop@leibnitz-laptop:/cc/hadoop/standalone/hadoop-0.20.2$ ./bin/hadoop jar hadoop-0.20.2-ex...2011-02-27 21:32:18 · 153 reads · 0 comments
hadoop pseudo-cluster install
vvvvvvvvvvvv config vvvvvvvvvvvvvvv install jdk: sudo -s ./jdk.bin; set environments: /etc/profile #global, ~/.profile #personalize; #optional: sudo addgroup hadoopgrp; sudo adduser --ingroup h...2011-02-27 21:34:57 · 140 reads · 0 comments
hadoop cluster install
vvvvvvvvv config vvvvvvvv set domain alias in all nodes (optional must): /etc/hosts. #let the master access all the slaves without passwords: #method 1: ssh-copy-id -i $HOME/.ssh/id_rsa.pub h...2011-02-27 21:39:21 · 132 reads · 0 comments
hadoop standalone running procedure
todo 2011-03-04 01:08:23 · 110 reads · 0 comments
hadoop cluster running procedure
todo 2011-03-04 01:09:07 · 119 reads · 0 comments
nutch search flow 2 - distributed search
Once local search is understood, distributed search is quite simple: it only involves searching across a few machines and merging the results. The dashed lines in the figure mark the local-fs case, where each machine holds its own index and segments (note the segments can also be distributed). references: nutch1.0 distributed query ...Original 2011-07-20 13:55:21 · 140 reads · 0 comments
hdfs data flow-part reading
The DistributedFileSystem returns a FSDataInputStream (an input stream that supports file seeks) to the client for it to read data from. FSDataInputStream in turn wraps a DFSInputStream, which manag...2011-03-17 02:39:00 · 138 reads · 0 comments
hdfs data flow-part writing
The client creates the file by calling create() on DistributedFileSystem (step 1 in Figure 3-3). DistributedFileSystem makes an RPC call to the namenode to create a new file in the filesystem's name...2011-03-17 02:43:26 · 213 reads · 0 comments
nutch incremental data updates
Below is a reposted recrawl script (it could still be optimized, e.g. the parameter handling and the backup step), for comparing how it differs from an ordinary crawl. # runbot script to run the Nutch bot for crawling and re-crawling. # Usage: bin/runbot [safe] # If executed in 'safe' mode,...2011-07-22 19:25:17 · 175 reads · 0 comments
hadoop-serializations
1. Writable. Note: part of the code comes from someone else's blog! This is an integrated and optimized shard of it. package test; import java.io.IOException; import org.apache.hadoop.conf.Configuration; impor...2011-03-24 23:00:49 · 172 reads · 0 comments
key classes in the nutch search architecture
After the whole crawl → recrawl cycle, only two directories actually serve search: * index (indexes): serves the search itself plus the details (in fact also obtained through lucene doc fields), e.g. title, url, last-modified, cache and so on. * segments: serves the summary, i.e. the page description, namely parse_text, and cached (the snapshot, con...2011-12-13 00:17:12 · 168 reads · 0 comments
nutch with hadoop: the RPC mechanism explained
todo 2011-12-13 00:18:19 · 168 reads · 0 comments
a brief look at hadoop's sorting modes
In the map reduce framework, beyond the usual distributed computation, sorting is a fairly important part, much as ordering matters in sql queries. 1. No sort: when writing code, set mapred.reduce.tasks=0 (same effect as setNumReduceTasks) and the goal is reached. The effect, of course, is a single part file whose entries are unorder...Original 2011-12-16 21:52:54 · 245 reads · 0 comments
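The same switch from the command line, as a sketch (jar and class are placeholders; -D only applies to jobs that parse the generic options):

```bash
# Zero reduces: map output goes straight to hdfs, unsorted and unmerged.
hadoop jar myjob.jar com.example.MyJob -D mapred.reduce.tasks=0 input output
```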
hadoop join operations
hadoop's join works like the corresponding sql feature: take subsets of multiple tables and merge them together. Many tools already cover this, e.g. pig, hive, cascading. Map-side join. Reduce-side join: as the name says, the joining happens on the reduce side. What are the steps? (Annoyingly, the original book's translator rendered these as several 'methods'!) 1. Since processing happens at the reduce end, multiple inputs, i.e. multiple tables, must be considered, so MultiInputs...2012-01-02 18:06:46 · 138 reads · 0 comments
hadoop-replication write flow
Legend: w = net write; r = net read (corresponding to a net write); wl = local write, i.e. fs write. Cost time: t4 - t0 = 1 client w + 2 DN w + 1 DN wl ≈ 3w + 1wl ≈ 1wl (assuming disk write is the bottleneck). 2017-08-14 17:00:54 · 148 reads · 0 comments