A Collection of Hadoop Deployment Errors

三少GG

License: CC 4.0 BY-SA

Article link: https://blog.youkuaiyun.com/pkueecser/article/details/11004439

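This article covers several key problems encountered when migrating a Hadoop environment and how to resolve them: the "readObject can't find class" error, DistributedCache returning null, serialization handling, and Java heap overflow. For each one it gives the approach to the fix and the relevant code.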

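Pseudo-distributed and fully distributed modes behave differently! The following problems show up when a job is moved from one environment to the other.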

1. Error: readObject can't find class

caught: java.lang.RuntimeException:
readObject can't find class

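Fix: replace JobConf job = new JobConf();

with JobConf job = new JobConf(getConf(), BloomFilterJoin.class);

Passing the driver class lets Hadoop set the job jar from the jar that contains that class, so tasks running on other nodes can load the job's classes.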

Going further (untested): ((JobConf) job.getConfiguration()).setJar("pr.jar");
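Since the fix depends on the driver class living inside a jar that can be shipped to the cluster, a quick diagnostic can help confirm where a class was actually loaded from. The following is a minimal stdlib-only sketch; JarLocator and its method names are hypothetical, and this is not the lookup code Hadoop itself uses.

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical diagnostic helper (not part of Hadoop): reports where a class was loaded from.
final class JarLocator {

    /** Returns the jar or directory a class was loaded from, or null if unknown (e.g. JDK core classes). */
    static String locationOf(Class<?> clazz) {
        CodeSource source = clazz.getProtectionDomain().getCodeSource();
        if (source == null) {
            return null;
        }
        URL location = source.getLocation();
        return location == null ? null : location.toString();
    }

    /** True only when the class comes from a .jar file rather than a classes directory. */
    static boolean isPackagedInJar(Class<?> clazz) {
        String location = locationOf(clazz);
        return location != null && location.endsWith(".jar");
    }

    public static void main(String[] args) {
        // In a real driver, pass the job's main class (BloomFilterJoin.class in this article).
        System.out.println("loaded from: " + locationOf(JarLocator.class));
        System.out.println("packaged in a jar: " + isPackagedInJar(JarLocator.class));
    }
}
```

If the driver class turns out to be loaded from a plain classes directory (for example when launched from an IDE), there is no jar for Hadoop to pick up from the class, which is the situation where setting the jar explicitly becomes relevant.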

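2. DistributedCache returns null

1) Check the path! Confirm which filesystem the configuration points at, and set it explicitly if needed: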

conf.get("fs.default.name");

conf.set("fs.default.name", "hdfs://master:9000");

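2) Note the difference between getCacheFiles() and getLocalCacheFiles()

Which one returns a value depends on whether the job runs in fully distributed or pseudo-distributed mode!

Workaround: call getLocalCacheFiles() first; if it returns null, fall back to getCacheFiles().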

Subject: Re: reading distributed cache returns null pointer

The DistributedCache behavior is not symmetrical in local mode vs
distributed mode.

As I replied earlier, you need to use

DistributedCache.getCacheFiles() in distributed mode.

In your code, you can put a check:

if getLocalCacheFiles() returns null, then use getCacheFiles() instead. Or
use the right API depending upon the mode you are executing in

Thanks Rahul... That worked. Using DistributedCache.getCacheFiles() in distributed mode and

DistributedCache.getLocalCacheFiles() in pseudo-distributed mode.
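The "try one API, fall back to the other" check can be sketched as below. To keep the example runnable without Hadoop on the classpath, the two lookups are passed in as suppliers of strings; CacheFileResolver, its method name, and the sample paths are hypothetical. In a real mapper the suppliers would wrap DistributedCache.getLocalCacheFiles(conf) and DistributedCache.getCacheFiles(conf), with their results converted to strings and their IOException handled.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper: picks whichever cache-file lookup actually returns something.
final class CacheFileResolver {

    /**
     * Returns the local cache paths when they are available,
     * otherwise the cache file URIs, otherwise an empty list.
     */
    static List<String> resolve(Supplier<String[]> localCacheFiles, Supplier<String[]> cacheFiles) {
        String[] local = localCacheFiles.get();
        if (local != null && local.length > 0) {
            return Arrays.asList(local);
        }
        String[] remote = cacheFiles.get();
        if (remote != null && remote.length > 0) {
            return Arrays.asList(remote);
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        // Simulates the case where the local lookup returns null and the fallback is used.
        List<String> files = resolve(
                () -> null,
                () -> new String[] {"hdfs://master:9000/cache/filter.bin"}); // made-up path
        System.out.println(files);
    }
}
```

Returning an empty list instead of null keeps the caller from hitting the same NullPointerException one step later; the caller can then fail with a clear message if no cache file was found at all.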

3. Serialization

This needs special handling.

java.lang.NullPointerException
at xxxjoin$TaggedWritable.readFields(xxxjoin.java:166)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)

Implement readFields exactly as in the Stack Overflow answer at stackoverflow.com/questions/10201500/hadoop-reduce-side-join-using-datajoin, especially the null check!
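The general pattern behind that null check can be sketched as follows, assuming the NullPointerException comes from a field that is still null when readFields runs on an instance created through the no-argument constructor. To compile without Hadoop, SimpleWritable is a stand-in for org.apache.hadoop.io.Writable, and TextValue and TaggedValue are hypothetical classes; this is an illustration of the idea, not a copy of the linked answer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Stand-in for org.apache.hadoop.io.Writable so the sketch compiles without Hadoop.
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical payload type with the required no-argument constructor.
final class TextValue implements SimpleWritable {
    String value = "";
    TextValue() {}
    TextValue(String value) { this.value = value; }
    public void write(DataOutput out) throws IOException { out.writeUTF(value); }
    public void readFields(DataInput in) throws IOException { value = in.readUTF(); }
}

final class TaggedValue implements SimpleWritable {
    private String tag = "";
    private SimpleWritable data; // still null after the no-argument constructor

    TaggedValue() {}
    TaggedValue(String tag, SimpleWritable data) { this.tag = tag; this.data = data; }

    String getTag() { return tag; }
    SimpleWritable getData() { return data; }

    public void write(DataOutput out) throws IOException {
        out.writeUTF(tag);
        out.writeUTF(data.getClass().getName()); // record the payload type for the reader
        data.write(out);
    }

    public void readFields(DataInput in) throws IOException {
        tag = in.readUTF();
        String dataClass = in.readUTF();
        // The null check: create the payload object before delegating to it.
        if (data == null || !data.getClass().getName().equals(dataClass)) {
            try {
                data = (SimpleWritable) Class.forName(dataClass).getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IOException("cannot instantiate " + dataClass, e);
            }
        }
        data.readFields(in);
    }

    /** Serializes and deserializes into a fresh instance, as a framework would. */
    static TaggedValue roundTrip(TaggedValue original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            original.write(new DataOutputStream(buffer));
            TaggedValue copy = new TaggedValue(); // data is null here
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Writing the payload's class name in write() is what lets readFields() rebuild the right object on the receiving side; without it, the reader has nothing to instantiate when the field is null.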

4. Java heap overflow

The heap size of the task JVMs can also be changed from the MapReduce code through the job configuration: job.set("mapred.child.java.opts", "-Xmx512m");

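When starting the JVM, adding the parameters -Xms800m -Xmx800m is enough.
-Xms <size>
Sets the initial JVM heap size.

-Xmx <size>
Sets the maximum JVM heap size.

For a standalone application: java -Xms800m -Xmx800m YourClassName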
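To check that a heap setting actually took effect, the running JVM can report its own limit. The sketch below prints the maximum heap and also parses an options string such as the one given to mapred.child.java.opts; HeapCheck and its method names are hypothetical, and the value reported by Runtime.maxMemory() may differ slightly from the -Xmx figure depending on the JVM.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for sanity-checking heap settings.
final class HeapCheck {

    private static final Pattern XMX = Pattern.compile("-Xmx(\\d+)([kKmMgG]?)");

    /** Maximum heap the current JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** Extracts the -Xmx value from a JVM options string, in MiB; returns -1 if absent. */
    static long parseXmxMiB(String jvmOpts) {
        Matcher matcher = XMX.matcher(jvmOpts);
        if (!matcher.find()) {
            return -1;
        }
        long amount = Long.parseLong(matcher.group(1));
        long bytes;
        switch (matcher.group(2).toLowerCase()) {
            case "k": bytes = amount * 1024L; break;
            case "m": bytes = amount * 1024L * 1024L; break;
            case "g": bytes = amount * 1024L * 1024L * 1024L; break;
            default:  bytes = amount; // no suffix means bytes
        }
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("requested (MiB): " + parseXmxMiB("-Xms800m -Xmx800m"));
        System.out.println("actual max heap (MiB): " + maxHeapMiB());
    }
}
```

Running it as java -Xms800m -Xmx800m HeapCheck should show an actual maximum close to the requested one; calling maxHeapMiB() from a mapper's setup code is one way to see what the task JVM really received.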