
hadoop hive
I finally have a blog
Just a newbie
Important Hive-related variables
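1. Map only: controlling the number of map tasks. How do you merge small files to reduce the number of maps? Suppose a SQL task: Select count(1) from popt_tbaccountcopy_mes where pt = '2012-07-04'; The task's input dir is /group/p_sdo_data/p_sdo_data_etl/pt/popt_tb... Original · 2018-01-02 10:16:07 · 340 views · 0 comments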
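Why small files drive up the map count can be sketched with the split arithmetic of Hadoop's FileInputFormat: the split size is max(minSize, min(maxSize, blockSize)), and every non-empty file yields at least one split, so a thousand tiny files become a thousand maps. This is a simplified illustration, not Hive's own code: the class and method names are hypothetical, the 128 MB block size is an assumed example value, and compressed or non-splittable files and Hive's CombineHiveInputFormat (which can pack many small files into one split) are not modeled.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Simplified sketch of how FileInputFormat turns input files into splits
 * (and therefore map tasks). Names are hypothetical; small-file combining
 * by CombineHiveInputFormat is deliberately not modeled.
 */
public class SplitEstimate {
    // FileInputFormat lets the last split be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    // splitSize = max(minSize, min(maxSize, blockSize))
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of splits for one non-empty file, using the same loop shape as FileInputFormat.
    static int splitsForFile(long length, long splitSize) {
        int splits = 0;
        long remaining = length;
        while (((double) remaining) / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining > 0) {
            splits++; // the tail (up to 1.1 x splitSize) becomes the last split
        }
        return splits;
    }

    // Each non-empty file yields at least one split, so many small files mean many maps.
    static int totalSplits(List<Long> fileLengths, long splitSize) {
        int total = 0;
        for (long len : fileLengths) {
            total += splitsForFile(len, splitSize);
        }
        return total;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Assumed example: 128 MB blocks, default min (1) and max (Long.MAX_VALUE) split sizes.
        long splitSize = computeSplitSize(128 * mb, 1, Long.MAX_VALUE);
        List<Long> smallFiles = Collections.nCopies(1000, 1 * mb);
        List<Long> oneBigFile = Arrays.asList(1000 * mb);
        System.out.println("1000 x 1 MB files -> " + totalSplits(smallFiles, splitSize) + " maps");
        System.out.println("1 x 1000 MB file  -> " + totalSplits(oneBigFile, splitSize) + " maps");
    }
}
```

With these assumptions the same 1000 MB of data costs 1000 maps as small files but only 8 maps as one file, which is why merging small files (or combining them into larger splits) reduces the map count.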
Some pitfalls after installing Hive on a Hadoop cluster
1. Hive on Spark (Spark 1): to run Hive on Spark, the versions must first be aligned (Spark 2 and later apparently cannot be made to work yet). In hive-site, set hive.execution.engine = spark, spark.home=/home/hadoop/spark-2.3.0-bin-hadoop2.7, spark.submit.deployMode = client or cl... Original · 2018-06-13 16:03:15 · 2295 views · 0 comments
Important settings for the Hadoop shuffle phase
mapreduce.reduce.shuffle.input.buffer.percent: this setting specifies the percentage of the reducer's heap memory to be allocated for the circular buffer to sto... Original · 2018-10-17 10:22:51 · 406 views · 0 comments
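How this percentage turns into actual bytes can be sketched as plain arithmetic. The sketch assumes the Hadoop 2.x defaults from mapred-default.xml (0.70 for mapreduce.reduce.shuffle.input.buffer.percent, 0.25 for mapreduce.reduce.shuffle.memory.limit.percent, 0.66 for mapreduce.reduce.shuffle.merge.percent) and a hypothetical 1 GB reducer heap; the class and method names are illustrative, and the real shuffle code has extra details (caps, overrides) that are not modeled.

```java
/**
 * Simplified sketch of reduce-side shuffle memory sizing. The percentages are
 * assumed Hadoop 2.x defaults; check the values configured on your cluster.
 * Class and method names are hypothetical.
 */
public class ShuffleMemory {
    static final double INPUT_BUFFER_PERCENT = 0.70; // mapreduce.reduce.shuffle.input.buffer.percent
    static final double MEMORY_LIMIT_PERCENT = 0.25; // mapreduce.reduce.shuffle.memory.limit.percent
    static final double MERGE_PERCENT = 0.66;        // mapreduce.reduce.shuffle.merge.percent

    // Heap bytes reserved for holding copied map outputs in memory.
    static long shuffleBuffer(long maxHeapBytes, double inputBufferPercent) {
        return (long) (maxHeapBytes * inputBufferPercent);
    }

    // Largest single map output kept in memory; larger outputs go straight to disk.
    static long maxSingleSegment(long shuffleBuffer, double memoryLimitPercent) {
        return (long) (shuffleBuffer * memoryLimitPercent);
    }

    // Buffer usage at which an in-memory merge (writing merged output to disk) starts.
    static long mergeThreshold(long shuffleBuffer, double mergePercent) {
        return (long) (shuffleBuffer * mergePercent);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long heap = 1024 * mb; // hypothetical 1 GB reducer heap (e.g. -Xmx1024m)
        long buffer = shuffleBuffer(heap, INPUT_BUFFER_PERCENT);
        System.out.println("shuffle buffer     : " + buffer / mb + " MB");
        System.out.println("max single segment : " + maxSingleSegment(buffer, MEMORY_LIMIT_PERCENT) / mb + " MB");
        System.out.println("merge threshold    : " + mergeThreshold(buffer, MERGE_PERCENT) / mb + " MB");
    }
}
```

Raising the input buffer percentage keeps more map output in memory during the shuffle, but it takes that heap away from the reducer's own processing.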
Notes on a Hive client deployment problem
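Scenario: the CDH cluster is running normally, and a Hive client must be installed on a remote machine to connect to the CDH cluster and work with its data. 1. The client version must match the CDH cluster's Hive version. 2. Install a Hadoop environment locally and configure core-site, hdfs-site, and yarn-site. 3. In hive-site, configure the metastore database and the remote Hive metastore connection. 4. Start the local Hive; ordinary statements run normally. Errors: 1. Running locally (... Original · 2018-10-24 14:17:20 · 1125 views · 0 comments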
Handling missing blocks in HDFS
You can use hdfs fsck / to determine which files have problems. Look through the output for missing or corrupt blocks (ignore under-replicated blocks for now). This command is really verbo... Original · 2018-11-06 11:13:55 · 1613 views · 0 comments
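Because the fsck output is so verbose, the filtering step can be automated. The following is a minimal sketch, assuming that per-file problem lines look like "/path/to/file: CORRUPT ..." or "/path/to/file: MISSING ...", that under-replicated lines start with "Under replicated" after the path, and that healthy files show up as runs of dots; verify these formats against your Hadoop version. The class name FsckFilter is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Sketch: reduce "hdfs fsck /" output to the file paths that report MISSING or
 * CORRUPT blocks, skipping under-replicated blocks. The line format is an assumption.
 */
public class FsckFilter {
    static Set<String> problemFiles(List<String> fsckLines) {
        Set<String> paths = new LinkedHashSet<>(); // first-seen order, no duplicates
        for (String raw : fsckLines) {
            // fsck prints a '.' per healthy file; strip any dots that precede a path.
            String line = raw.trim().replaceFirst("^\\.+", "");
            int sep = line.indexOf(": ");
            if (!line.startsWith("/") || sep < 0) {
                continue; // progress dots, summary lines, blank lines
            }
            String message = line.substring(sep + 2).trim();
            if (message.startsWith("Under replicated")) {
                continue; // ignore under-replicated blocks for now
            }
            if (message.contains("MISSING") || message.contains("CORRUPT")) {
                paths.add(line.substring(0, sep));
            }
        }
        return paths;
    }

    // Usage after compiling: hdfs fsck / | java FsckFilter
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            lines.add(line);
        }
        for (String path : problemFiles(lines)) {
            System.out.println(path);
        }
    }
}
```

Printing each affected path once, rather than every block-level line, gives a short list of files to inspect or restore.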
Submitting an application to YARN through the client
pom: <dependencies> <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-common</artifactId> <version>2.6.0... Original · 2019-01-04 17:45:23 · 2900 views · 1 comment
Monitoring remote cluster jobs with JobClient
pom: <dependencies> <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-common</artifactId> <version>2.6.0... Original · 2019-01-14 16:16:29 · 432 views · 0 comments