Storing HDFS data into HBase 2.3.0 with MapReduce

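This post walks through setting up an HBase cluster from scratch, including installing and configuring the JDK and ZooKeeper and setting up the HBase environment. It also provides a WordCount-style MapReduce code example that reads a file from HDFS and stores its records in HBase. The project uses the following Maven dependencies: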
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>2.3.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-server</artifactId>
    <version>2.3.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase-mapreduce -->
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-mapreduce</artifactId>
    <version>2.3.0</version>
</dependency>

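1. First, set up the HBase cluster

HBase relies on ZooKeeper for coordination, so ZooKeeper has to be installed first.

Start by uploading the JDK rpm file and installing it: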

scp jdk-8u211-linux-x64.rpm peiyajie@10.202.234.56:/home/peiyajie
sudo rpm -ivh jdk-8u211-linux-x64.rpm

That completes the JDK installation.

Unpack the ZooKeeper archive.

Go into the conf directory and copy the sample configuration:
cp zoo_sample.cfg zoo.cfg

Add the following to zoo.cfg:

server.2=10.202.234.233:2888:3888
server.3=10.202.234.56:2888:3888
server.1=10.202.234.244:2888:3888

Then change the data directory:

dataDir=/home/peiyajie/zookeeper-3.4.14/data

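On each node, create a file named myid under /home/peiyajie/zookeeper-3.4.14/data. Its content must be that node's id in the cluster, that is, the N of the node's server.N line in zoo.cfg.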
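
To make the myid rule concrete, here is a minimal sketch in plain Java that looks up which id belongs to which host from the server.N lines shown above. The class name MyIdLookup and the method myIdFor are made up for this illustration and are not part of ZooKeeper; the sketch only assumes the server.N=host:port:port line format used in this post.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of ZooKeeper.
class MyIdLookup {
    // Matches lines such as "server.2=10.202.234.233:2888:3888".
    private static final Pattern SERVER_LINE =
            Pattern.compile("^server\\.(\\d+)=([^:]+):\\d+:\\d+$");

    /** Returns the id N of the server.N line naming the host, or -1 if none does. */
    static int myIdFor(List<String> zooCfgLines, String host) {
        for (String line : zooCfgLines) {
            Matcher m = SERVER_LINE.matcher(line.trim());
            if (m.matches() && m.group(2).equals(host)) {
                return Integer.parseInt(m.group(1));
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The three server lines from zoo.cfg above.
        List<String> lines = Arrays.asList(
                "server.2=10.202.234.233:2888:3888",
                "server.3=10.202.234.56:2888:3888",
                "server.1=10.202.234.244:2888:3888");
        for (String host : Arrays.asList("10.202.234.244", "10.202.234.233", "10.202.234.56")) {
            // The printed id is what goes into the myid file on that host.
            System.out.println(host + " -> myid " + myIdFor(lines, host));
        }
    }
}
```

With the configuration above, the myid file on 10.202.234.244 would contain 1, the one on 10.202.234.233 would contain 2, and the one on 10.202.234.56 would contain 3.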

Then unpack the HBase archive.

Edit hbase-site.xml and add:

  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>10.202.234.244,10.202.234.233,10.202.234.56</value>
    <description>Comma-separated list of hosts in the ZooKeeper quorum.
    </description>
  </property>
 
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://10.202.234.244:9000/hbase</value>
    <description>The directory shared by RegionServers.
    </description>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
    <description>The mode the cluster will be in. Possible values are
      false: standalone and pseudo-distributed setups with managed Zookeeper
      true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
    </description>
  </property>
  <property>
   <name>hbase.master.info.port</name>
   <value>60010</value>
  </property>

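Add the following to hbase-env.sh (HBASE_MANAGES_ZK=false tells HBase to use the external ZooKeeper ensemble instead of starting its own):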

export JAVA_HOME=/usr/java/jdk1.8.0_211-amd64
export HBASE_MANAGES_ZK=false

Now the code:

package com.qihoo.hadoop.util;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

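/**
 * Passes every input line through unchanged as the map output key.
 * Identical lines are grouped by the shuffle, so the reducer sees each
 * distinct line once.
 */
public class WCHbaseMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        context.write(value, NullWritable.get());
    }
}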
package com.qihoo.hadoop.util;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;

import java.io.IOException;

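/**
 * Writes one HBase row per distinct input line of the form "rowkey name age".
 */
public class WCHbaseReducer extends TableReducer<Text, NullWritable, ImmutableBytesWritable> {
    private static final byte[] FAMILY = Bytes.toBytes("info");

    @Override
    protected void reduce(Text key, Iterable<NullWritable> values, Context context) throws IOException, InterruptedException {

        String[] split = key.toString().split(" ");
        if (split.length < 3) {
            // Skip blank or malformed lines instead of failing the whole job.
            return;
        }
        byte[] rowKey = Bytes.toBytes(split[0]);
        Put put = new Put(rowKey);

        // Bytes.toBytes(String) always encodes as UTF-8, unlike String.getBytes().
        put.addColumn(FAMILY, Bytes.toBytes("name"), Bytes.toBytes(split[1]));
        // The age is stored as a 4-byte big-endian int, not as text.
        put.addColumn(FAMILY, Bytes.toBytes("age"), Bytes.toBytes(Integer.parseInt(split[2])));

        context.write(new ImmutableBytesWritable(rowKey), put);
    }

}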
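
The reducer treats each line as three space-separated fields and stores the age as a binary int. The sketch below reproduces those two steps in plain Java so they can be tried without a cluster. The class RecordEncoding, its methods, and the sample line "1001 zhangsan 25" are made up for this illustration; the real input file is not shown in this post. ByteBuffer is used here as a stand-in that yields the same 4-byte big-endian layout as HBase's Bytes.toBytes(int).

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper for illustration only; it mirrors the reducer's parsing
// and value encoding without depending on HBase.
class RecordEncoding {
    /** Splits "rowkey name age" on single spaces; returns null for malformed lines. */
    static String[] parse(String line) {
        String[] fields = line.split(" ");
        if (fields.length != 3) {
            return null;
        }
        try {
            Integer.parseInt(fields[2]); // the age must be a valid int
        } catch (NumberFormatException e) {
            return null;
        }
        return fields;
    }

    /** Encodes an int as 4 bytes, most significant byte first (big-endian). */
    static byte[] intToBytes(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    public static void main(String[] args) {
        String[] fields = parse("1001 zhangsan 25"); // assumed sample line
        System.out.println(Arrays.toString(fields));
        // Prints [0, 0, 0, 25]: the age cell holds four raw bytes, not the text "25".
        System.out.println(Arrays.toString(intToBytes(Integer.parseInt(fields[2]))));
    }
}
```

One consequence of the binary encoding is that a scan in the HBase shell shows the age cell as escaped bytes (for 25, \x00\x00\x00\x19) rather than as a readable number. If readable values matter more than a compact encoding, storing the age with Bytes.toBytes(String) is a possible alternative.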
package com.qihoo.hadoop.util;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

import java.io.IOException;

public class HbaseUtil {
    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException
    {
        Configuration conf = new Configuration();
        conf.set("hbase.zookeeper.quorum", "10.202.234.244,10.202.234.233,10.202.234.56");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        Job job = Job.getInstance(conf, "Runner");

        // Input file on HDFS; each line is expected to look like "rowkey name age".
        Path path = new Path("hdfs://10.202.234.244:9000/wordcount/inputpeiyajie/3.txt");
        FileInputFormat.addInputPath(job, path);

        job.setJarByClass(HbaseUtil.class);

        // These must match the types WCHbaseMapper actually emits (Text, NullWritable);
        // declaring ImmutableBytesWritable/Put here fails at runtime with a type mismatch.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(NullWritable.class);

        job.setMapperClass(WCHbaseMapper.class);

        // Registers WCHbaseReducer and sets TableOutputFormat to write into the "member" table.
        // The table and its "info" column family must already exist; the job does not create them.
        TableMapReduceUtil.initTableReducerJob("member", WCHbaseReducer.class, job);

        // Report job success or failure through the process exit code.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

}
sqoop import \ > --connect jdbc:mysql://localhost:3306/mydatabase \ > --username root \ > -P \ > --table products \ > --target-dir /user/hadoop/products \ > --fields-terminated-by &#39;,&#39; \ > --lines-terminated-by &#39;\n&#39; \ > --delete-target-dir \ > --num-mappers 1 Warning: /home/ljm1/daolun/servers/sqoop-1.4.6/../hcatalog does not exist! HCatalog jobs will fail. Please set $HCAT_HOME to the root of your HCatalog installation. Warning: /home/ljm1/daolun/servers/sqoop-1.4.6/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation. 2025-07-05 01:36:29,069 INFO [main] sqoop.Sqoop (Sqoop.java:<init>(92)) - Running Sqoop version: 1.4.6 Enter password: 2025-07-05 01:36:35,176 INFO [main] manager.MySQLManager (MySQLManager.java:initOptionDefaults(69)) - Preparing to use a MySQL streaming resultset. 2025-07-05 01:36:35,177 INFO [main] tool.CodeGenTool (CodeGenTool.java:generateORM(92)) - Beginning code generation 2025-07-05 01:36:35,718 INFO [main] manager.SqlManager (SqlManager.java:execute(757)) - Executing SQL statement: SELECT t.* FROM `products` AS t LIMIT 1 2025-07-05 01:36:35,822 INFO [main] manager.SqlManager (SqlManager.java:execute(757)) - Executing SQL statement: SELECT t.* FROM `products` AS t LIMIT 1 2025-07-05 01:36:35,850 INFO [main] orm.CompilationManager (CompilationManager.java:findHadoopJars(94)) - HADOOP_MAPRED_HOME is /home/ljm1/daolun/servers/hadoop-2.7.4 Note: /tmp/sqoop-ljm1/compile/280c35d4ceb0b5eb13ccf7a0afe259ef/products.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. 2025-07-05 01:36:42,374 INFO [main] orm.CompilationManager (CompilationManager.java:jar(330)) - Writing jar file: /tmp/sqoop-ljm1/compile/280c35d4ceb0b5eb13ccf7a0afe259ef/products.jar SLF4J: Class path contains multiple SLF4J bindings. 
SLF4J: Found binding in [jar:file:/home/ljm1/daolun/servers/hadoop-2.7.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/home/ljm1/daolun/servers/hbase-1.4.0/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 2025-07-05 01:36:43,216 INFO [main] tool.ImportTool (ImportTool.java:deleteTargetDir(534)) - Destination directory /user/hadoop/products is not present, hence not deleting. 2025-07-05 01:36:43,216 WARN [main] manager.MySQLManager (MySQLManager.java:importTable(107)) - It looks like you are importing from mysql. 2025-07-05 01:36:43,217 WARN [main] manager.MySQLManager (MySQLManager.java:importTable(108)) - This transfer can be faster! Use the --direct 2025-07-05 01:36:43,217 WARN [main] manager.MySQLManager (MySQLManager.java:importTable(109)) - option to exercise a MySQL-specific fast path. 2025-07-05 01:36:43,217 INFO [main] manager.MySQLManager (MySQLManager.java:checkDateTimeBehavior(189)) - Setting zero DATETIME behavior to convertToNull (mysql) 2025-07-05 01:36:43,314 INFO [main] mapreduce.ImportJobBase (ImportJobBase.java:runImport(235)) - Beginning import of products 2025-07-05 01:36:43,375 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1173)) - mapred.jar is deprecated. Instead, use mapreduce.job.jar 2025-07-05 01:36:43,412 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1173)) - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address 2025-07-05 01:36:43,458 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1173)) - session.id is deprecated. 
Instead, use dfs.metrics.session-id 2025-07-05 01:36:43,462 INFO [main] jvm.JvmMetrics (JvmMetrics.java:init(76)) - Initializing JVM Metrics with processName=JobTracker, sessionId= 2025-07-05 01:36:44,783 INFO [main] db.DBInputFormat (DBInputFormat.java:setTxIsolation(192)) - Using read commited transaction isolation 2025-07-05 01:36:44,854 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(198)) - number of splits:1 2025-07-05 01:36:45,148 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(287)) - Submitting tokens for job: job_local745453984_0001 2025-07-05 01:36:46,724 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:sym/mapred/local/1751704605463/jackson-mapper-asl-1.9.13.jar <- /home/ljm1/jackson-mapper-asl-1.9.13.jar 2025-07-05 01:36:46,755 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setvers/sqoop-1.4.6/lib/jackson-mapper-asl-1.9.13.jar as file:/tmp/hadoop-ljm1/mapred/local/1751704605463/jackson- 2025-07-05 01:36:46,793 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:sym/mapred/local/1751704605464/kite-data-core-1.0.0.jar <- /home/ljm1/kite-data-core-1.0.0.jar 2025-07-05 01:36:46,798 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setvers/sqoop-1.4.6/lib/kite-data-core-1.0.0.jar as file:/tmp/hadoop-ljm1/mapred/local/1751704605464/kite-data-cor 2025-07-05 01:36:46,798 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:sym/mapred/local/1751704605465/xz-1.0.jar <- /home/ljm1/xz-1.0.jar 2025-07-05 01:36:46,801 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setvers/sqoop-1.4.6/lib/xz-1.0.jar as file:/tmp/hadoop-ljm1/mapred/local/1751704605465/xz-1.0.jar 2025-07-05 01:36:46,801 INFO [main] mapred.LocalDistributedCacheManager 
(LocalDistributedCacheManager.java:sym/mapred/local/1751704605466/kite-hadoop-compatibility-1.0.0.jar <- /home/ljm1/kite-hadoop-compatibility-1.0.0.j 2025-07-05 01:36:46,805 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setvers/sqoop-1.4.6/lib/kite-hadoop-compatibility-1.0.0.jar as file:/tmp/hadoop-ljm1/mapred/local/1751704605466/ki 2025-07-05 01:36:46,805 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:sym/mapred/local/1751704605467/parquet-avro-1.4.1.jar <- /home/ljm1/parquet-avro-1.4.1.jar 2025-07-05 01:36:46,810 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setvers/sqoop-1.4.6/lib/parquet-avro-1.4.1.jar as file:/tmp/hadoop-ljm1/mapred/local/1751704605467/parquet-avro-1. 2025-07-05 01:36:46,811 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:sym/mapred/local/1751704605468/sqoop-1.4.6.jar <- /home/ljm1/sqoop-1.4.6.jar 2025-07-05 01:36:46,818 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setvers/sqoop-1.4.6/sqoop-1.4.6.jar as file:/tmp/hadoop-ljm1/mapred/local/1751704605468/sqoop-1.4.6.jar 2025-07-05 01:36:46,818 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:sym/mapred/local/1751704605469/kite-data-hive-1.0.0.jar <- /home/ljm1/kite-data-hive-1.0.0.jar 2025-07-05 01:36:46,829 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setvers/sqoop-1.4.6/lib/kite-data-hive-1.0.0.jar as file:/tmp/hadoop-ljm1/mapred/local/1751704605469/kite-data-hiv 2025-07-05 01:36:46,829 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:sym/mapred/local/1751704605470/parquet-generator-1.4.1.jar <- /home/ljm1/parquet-generator-1.4.1.jar 2025-07-05 01:36:46,833 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setvers/sqoop-1.4.6/lib/parquet-generator-1.4.1.jar as 
file:/tmp/hadoop-ljm1/mapred/local/1751704605470/parquet-ge 2025-07-05 01:36:46,833 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:sym/mapred/local/1751704605471/hsqldb-1.8.0.10.jar <- /home/ljm1/hsqldb-1.8.0.10.jar 2025-07-05 01:36:46,840 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setvers/sqoop-1.4.6/lib/hsqldb-1.8.0.10.jar as file:/tmp/hadoop-ljm1/mapred/local/1751704605471/hsqldb-1.8.0.10.ja 2025-07-05 01:36:46,840 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:sym/mapred/local/1751704605472/jackson-annotations-2.3.0.jar <- /home/ljm1/jackson-annotations-2.3.0.jar 2025-07-05 01:36:46,843 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setvers/sqoop-1.4.6/lib/jackson-annotations-2.3.0.jar as file:/tmp/hadoop-ljm1/mapred/local/1751704605472/jackson- 2025-07-05 01:36:46,844 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:sym/mapred/local/1751704605473/avro-1.7.5.jar <- /home/ljm1/avro-1.7.5.jar 2025-07-05 01:36:46,849 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setvers/sqoop-1.4.6/lib/avro-1.7.5.jar as file:/tmp/hadoop-ljm1/mapred/local/1751704605473/avro-1.7.5.jar 2025-07-05 01:36:46,849 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:sym/mapred/local/1751704605474/ant-contrib-1.0b3.jar <- /home/ljm1/ant-contrib-1.0b3.jar 2025-07-05 01:36:46,852 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setvers/sqoop-1.4.6/lib/ant-contrib-1.0b3.jar as file:/tmp/hadoop-ljm1/mapred/local/1751704605474/ant-contrib-1.0b 2025-07-05 01:36:46,853 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:sym/mapred/local/1751704605475/commons-compress-1.4.1.jar <- /home/ljm1/commons-compress-1.4.1.jar 2025-07-05 01:36:46,856 INFO [main] mapred.LocalDistributedCacheManager 
(LocalDistributedCacheManager.java:setvers/sqoop-1.4.6/lib/commons-compress-1.4.1.jar as file:/tmp/hadoop-ljm1/mapred/local/1751704605475/commons-com 2025-07-05 01:36:46,856 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:sym/mapred/local/1751704605476/mysql-connector-java-5.1.47.jar <- /home/ljm1/mysql-connector-java-5.1.47.jar 2025-07-05 01:36:46,864 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setvers/sqoop-1.4.6/lib/mysql-connector-java-5.1.47.jar as file:/tmp/hadoop-ljm1/mapred/local/1751704605476/mysql- 2025-07-05 01:36:46,865 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:sym/mapred/local/1751704605477/slf4j-api-1.6.1.jar <- /home/ljm1/slf4j-api-1.6.1.jar 2025-07-05 01:36:46,869 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setvers/sqoop-1.4.6/lib/slf4j-api-1.6.1.jar as file:/tmp/hadoop-ljm1/mapred/local/1751704605477/slf4j-api-1.6.1.ja 2025-07-05 01:36:46,869 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:sym/mapred/local/1751704605478/opencsv-2.3.jar <- /home/ljm1/opencsv-2.3.jar 2025-07-05 01:36:46,873 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setvers/sqoop-1.4.6/lib/opencsv-2.3.jar as file:/tmp/hadoop-ljm1/mapred/local/1751704605478/opencsv-2.3.jar 2025-07-05 01:36:46,874 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:sym/mapred/local/1751704605479/parquet-column-1.4.1.jar <- /home/ljm1/parquet-column-1.4.1.jar 2025-07-05 01:36:46,878 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setvers/sqoop-1.4.6/lib/parquet-column-1.4.1.jar as file:/tmp/hadoop-ljm1/mapred/local/1751704605479/parquet-colum 2025-07-05 01:36:46,879 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:sym/mapred/local/1751704605480/jackson-core-asl-1.9.13.jar <- 
/home/ljm1/jackson-core-asl-1.9.13.jar 2025-07-05 01:36:46,882 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setvers/sqoop-1.4.6/lib/jackson-core-asl-1.9.13.jar as file:/tmp/hadoop-ljm1/mapred/local/1751704605480/jackson-co 2025-07-05 01:36:46,882 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:sym/mapred/local/1751704605481/snappy-java-1.0.5.jar <- /home/ljm1/snappy-java-1.0.5.jar 2025-07-05 01:36:46,888 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setvers/sqoop-1.4.6/lib/snappy-java-1.0.5.jar as file:/tmp/hadoop-ljm1/mapred/local/1751704605481/snappy-java-1.0. 2025-07-05 01:36:46,889 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:sym/mapred/local/1751704605482/commons-codec-1.4.jar <- /home/ljm1/commons-codec-1.4.jar 2025-07-05 01:36:46,893 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setvers/sqoop-1.4.6/lib/commons-codec-1.4.jar as file:/tmp/hadoop-ljm1/mapred/local/1751704605482/commons-codec-1. 
2025-07-05 01:36:46,893 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:sym/mapred/local/1751704605483/commons-logging-1.1.1.jar <- /home/ljm1/commons-logging-1.1.1.jar 2025-07-05 01:36:46,897 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setvers/sqoop-1.4.6/lib/commons-logging-1.1.1.jar as file:/tmp/hadoop-ljm1/mapred/local/1751704605483/commons-logg 2025-07-05 01:36:46,898 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:sym/mapred/local/1751704605484/commons-io-1.4.jar <- /home/ljm1/commons-io-1.4.jar 2025-07-05 01:36:46,902 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setvers/sqoop-1.4.6/lib/commons-io-1.4.jar as file:/tmp/hadoop-ljm1/mapred/local/1751704605484/commons-io-1.4.jar 2025-07-05 01:36:46,903 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:sym/mapred/local/1751704605485/commons-jexl-2.1.1.jar <- /home/ljm1/commons-jexl-2.1.1.jar 2025-07-05 01:36:46,906 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setvers/sqoop-1.4.6/lib/commons-jexl-2.1.1.jar as file:/tmp/hadoop-ljm1/mapred/local/1751704605485/commons-jexl-2. 
2025-07-05 01:36:46,907 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:sym/mapred/local/1751704605486/jackson-databind-2.3.1.jar <- /home/ljm1/jackson-databind-2.3.1.jar 2025-07-05 01:36:46,913 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setvers/sqoop-1.4.6/lib/jackson-databind-2.3.1.jar as file:/tmp/hadoop-ljm1/mapred/local/1751704605486/jackson-dat 2025-07-05 01:36:46,913 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:sym/mapred/local/1751704605487/parquet-jackson-1.4.1.jar <- /home/ljm1/parquet-jackson-1.4.1.jar 2025-07-05 01:36:46,916 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setvers/sqoop-1.4.6/lib/parquet-jackson-1.4.1.jar as file:/tmp/hadoop-ljm1/mapred/local/1751704605487/parquet-jack 2025-07-05 01:36:46,916 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:sym/mapred/local/1751704605488/paranamer-2.3.jar <- /home/ljm1/paranamer-2.3.jar 2025-07-05 01:36:46,919 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setvers/sqoop-1.4.6/lib/paranamer-2.3.jar as file:/tmp/hadoop-ljm1/mapred/local/1751704605488/paranamer-2.3.jar 2025-07-05 01:36:46,920 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:sym/mapred/local/1751704605489/parquet-common-1.4.1.jar <- /home/ljm1/parquet-common-1.4.1.jar 2025-07-05 01:36:46,924 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setvers/sqoop-1.4.6/lib/parquet-common-1.4.1.jar as file:/tmp/hadoop-ljm1/mapred/local/1751704605489/parquet-commo 2025-07-05 01:36:46,925 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:sym/mapred/local/1751704605490/avro-mapred-1.7.5-hadoop2.jar <- /home/ljm1/avro-mapred-1.7.5-hadoop2.jar 2025-07-05 01:36:46,929 INFO [main] mapred.LocalDistributedCacheManager 
(LocalDistributedCacheManager.java:setvers/sqoop-1.4.6/lib/avro-mapred-1.7.5-hadoop2.jar as file:/tmp/hadoop-ljm1/mapred/local/1751704605490/avro-map 2025-07-05 01:36:46,930 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:sym/mapred/local/1751704605491/ant-eclipse-1.0-jvm1.2.jar <- /home/ljm1/ant-eclipse-1.0-jvm1.2.jar 2025-07-05 01:36:46,932 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setvers/sqoop-1.4.6/lib/ant-eclipse-1.0-jvm1.2.jar as file:/tmp/hadoop-ljm1/mapred/local/1751704605491/ant-eclipse 2025-07-05 01:36:46,933 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:sym/mapred/local/1751704605492/parquet-hadoop-1.4.1.jar <- /home/ljm1/parquet-hadoop-1.4.1.jar 2025-07-05 01:36:46,943 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setvers/sqoop-1.4.6/lib/parquet-hadoop-1.4.1.jar as file:/tmp/hadoop-ljm1/mapred/local/1751704605492/parquet-hadoo 2025-07-05 01:36:46,943 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:sym/mapred/local/1751704605493/mysql-connector-java-5.1.32.jar <- /home/ljm1/mysql-connector-java-5.1.32.jar 2025-07-05 01:36:46,947 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setvers/sqoop-1.4.6/lib/mysql-connector-java-5.1.32.jar as file:/tmp/hadoop-ljm1/mapred/local/1751704605493/mysql- 2025-07-05 01:36:46,947 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:sym/mapred/local/1751704605494/parquet-format-2.0.0.jar <- /home/ljm1/parquet-format-2.0.0.jar 2025-07-05 01:36:46,950 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setvers/sqoop-1.4.6/lib/parquet-format-2.0.0.jar as file:/tmp/hadoop-ljm1/mapred/local/1751704605494/parquet-forma 2025-07-05 01:36:46,950 INFO [main] mapred.LocalDistributedCacheManager 
(LocalDistributedCacheManager.java:sym/mapred/local/1751704605495/jackson-core-2.3.1.jar <- /home/ljm1/jackson-core-2.3.1.jar 2025-07-05 01:36:46,960 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setvers/sqoop-1.4.6/lib/jackson-core-2.3.1.jar as file:/tmp/hadoop-ljm1/mapred/local/1751704605495/jackson-core-2. 2025-07-05 01:36:46,960 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:sym/mapred/local/1751704605496/parquet-encoding-1.4.1.jar <- /home/ljm1/parquet-encoding-1.4.1.jar 2025-07-05 01:36:46,966 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setvers/sqoop-1.4.6/lib/parquet-encoding-1.4.1.jar as file:/tmp/hadoop-ljm1/mapred/local/1751704605496/parquet-enc 2025-07-05 01:36:46,967 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:sym/mapred/local/1751704605497/kite-data-mapreduce-1.0.0.jar <- /home/ljm1/kite-data-mapreduce-1.0.0.jar 2025-07-05 01:36:46,973 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:setvers/sqoop-1.4.6/lib/kite-data-mapreduce-1.0.0.jar as file:/tmp/hadoop-ljm1/mapred/local/1751704605497/kite-dat 2025-07-05 01:36:47,158 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:maked/local/1751704605463/jackson-mapper-asl-1.9.13.jar 2025-07-05 01:36:47,158 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:maked/local/1751704605464/kite-data-core-1.0.0.jar 2025-07-05 01:36:47,158 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:maked/local/1751704605465/xz-1.0.jar 2025-07-05 01:36:47,158 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:maked/local/1751704605466/kite-hadoop-compatibility-1.0.0.jar 2025-07-05 01:36:47,158 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:maked/local/1751704605467/parquet-avro-1.4.1.jar 
2025-07-05 01:36:47,158 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:maked/local/1751704605468/sqoop-1.4.6.jar 2025-07-05 01:36:47,158 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:maked/local/1751704605469/kite-data-hive-1.0.0.jar 2025-07-05 01:36:47,158 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:maked/local/1751704605470/parquet-generator-1.4.1.jar 2025-07-05 01:36:47,159 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:maked/local/1751704605471/hsqldb-1.8.0.10.jar 2025-07-05 01:36:47,159 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:maked/local/1751704605472/jackson-annotations-2.3.0.jar 2025-07-05 01:36:47,159 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:maked/local/1751704605473/avro-1.7.5.jar 2025-07-05 01:36:47,159 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:maked/local/1751704605474/ant-contrib-1.0b3.jar 2025-07-05 01:36:47,159 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:maked/local/1751704605475/commons-compress-1.4.1.jar 2025-07-05 01:36:47,159 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:maked/local/1751704605476/mysql-connector-java-5.1.47.jar 2025-07-05 01:36:47,159 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:maked/local/1751704605477/slf4j-api-1.6.1.jar 2025-07-05 01:36:47,159 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:maked/local/1751704605478/opencsv-2.3.jar 2025-07-05 01:36:47,159 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:maked/local/1751704605479/parquet-column-1.4.1.jar 2025-07-05 01:36:47,159 INFO [main] mapred.LocalDistributedCacheManager 
(LocalDistributedCacheManager.java:maked/local/1751704605480/jackson-core-asl-1.9.13.jar 2025-07-05 01:36:47,160 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:maked/local/1751704605481/snappy-java-1.0.5.jar 2025-07-05 01:36:47,160 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:maked/local/1751704605482/commons-codec-1.4.jar 2025-07-05 01:36:47,160 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:maked/local/1751704605483/commons-logging-1.1.1.jar 2025-07-05 01:36:47,160 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:maked/local/1751704605484/commons-io-1.4.jar 2025-07-05 01:36:47,160 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:maked/local/1751704605485/commons-jexl-2.1.1.jar 2025-07-05 01:36:47,160 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:maked/local/1751704605486/jackson-databind-2.3.1.jar 2025-07-05 01:36:47,160 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:maked/local/1751704605487/parquet-jackson-1.4.1.jar 2025-07-05 01:36:47,160 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:maked/local/1751704605488/paranamer-2.3.jar 2025-07-05 01:36:47,160 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:maked/local/1751704605489/parquet-common-1.4.1.jar 2025-07-05 01:36:47,160 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:maked/local/1751704605490/avro-mapred-1.7.5-hadoop2.jar 2025-07-05 01:36:47,160 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:maked/local/1751704605491/ant-eclipse-1.0-jvm1.2.jar 2025-07-05 01:36:47,160 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:maked/local/1751704605492/parquet-hadoop-1.4.1.jar 2025-07-05 01:36:47,161 
INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:maked/local/1751704605493/mysql-connector-java-5.1.32.jar
2025-07-05 01:36:47,188 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:maked/local/1751704605494/parquet-format-2.0.0.jar
2025-07-05 01:36:47,188 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:maked/local/1751704605495/jackson-core-2.3.1.jar
2025-07-05 01:36:47,188 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:maked/local/1751704605496/parquet-encoding-1.4.1.jar
2025-07-05 01:36:47,188 INFO [main] mapred.LocalDistributedCacheManager (LocalDistributedCacheManager.java:maked/local/1751704605497/kite-data-mapreduce-1.0.0.jar
2025-07-05 01:36:47,213 INFO [main] mapreduce.Job (Job.java:submit(1294)) - The url to track the job: http://l
2025-07-05 01:36:47,214 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1339)) - Running job: job_local
2025-07-05 01:36:47,220 INFO [Thread-44] mapred.LocalJobRunner (LocalJobRunner.java:createOutputCommitter(471)
2025-07-05 01:36:47,468 INFO [Thread-44] output.FileOutputCommitter (FileOutputCommitter.java:<init>(108)) - F
2025-07-05 01:36:47,470 INFO [Thread-44] mapred.LocalJobRunner (LocalJobRunner.java:createOutputCommitter(489)uce.lib.output.FileOutputCommitter
2025-07-05 01:36:47,482 ERROR [Thread-44] output.FileOutputCommitter (FileOutputCommitter.java:setupJob(314)) -ucts/_temporary/0
2025-07-05 01:36:47,686 INFO [Thread-44] mapred.LocalJobRunner (LocalJobRunner.java:runTasks(448)) - Waiting f
2025-07-05 01:36:47,688 INFO [LocalJobRunner Map Task Executor #0] mapred.LocalJobRunner (LocalJobRunner.java:84_0001_m_000000_0
2025-07-05 01:36:47,791 INFO [LocalJobRunner Map Task Executor #0] output.FileOutputCommitter (FileOutputCommiAlgorithm version is 1
2025-07-05 01:36:47,828 INFO [LocalJobRunner Map Task Executor #0] mapred.Task (Task.java:initialize(612)) -
2025-07-05 01:36:47,919 INFO [LocalJobRunner Map Task Executor #0] db.DBInputFormat (DBInputFormat.java:setTxIn isolation
2025-07-05 01:36:47,931 INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:runNewMapper(7
2025-07-05 01:36:47,959 INFO [Thread-44] mapred.LocalJobRunner (LocalJobRunner.java:runTasks(456)) - map task
2025-07-05 01:36:47,966 WARN [Thread-44] mapred.LocalJobRunner (LocalJobRunner.java:run(560)) - job_local74545
java.lang.Exception: java.io.IOException: Mkdirs failed to create file:/user/hadoop/products/_temporary/0/_tempxists=false, cwd=file:/home/ljm1)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.io.IOException: Mkdirs failed to create file:/user/hadoop/products/_temporary/0/_temporary/attee, cwd=file:/home/ljm1)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:455)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:440)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:892)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:789)
    at org.apache.sqoop.mapreduce.RawKeyTextOutputFormat.getRecordWriter(RawKeyTextOutputFormat.java:98)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:647)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2025-07-05 01:36:48,217 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1360)) - Job job_local745453984
2025-07-05 01:36:48,221 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1367)) - map 0% reduce 0%
2025-07-05 01:36:48,226 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1380)) - Job job_local745453984
2025-07-05 01:36:48,258 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1385)) - Counters: 0
2025-07-05 01:36:48,265 WARN [main] mapreduce.Counters (AbstractCounters.java:getGroup(234)) - Group FileSystemapreduce.FileSystemCounter instead
2025-07-05 01:36:48,272 INFO [main] mapreduce.ImportJobBase (ImportJobBase.java:runJob(184)) - Transferred 0 b
2025-07-05 01:36:48,277 WARN [main] mapreduce.Counters (AbstractCounters.java:getGroup(234)) - Group org.apach org.apache.hadoop.mapreduce.TaskCounter instead
2025-07-05 01:36:48,277 INFO [main] mapreduce.ImportJobBase (ImportJobBase.java:runJob(186)) - Retrieved 0 rec
2025-07-05 01:36:48,277 ERROR [main] tool.ImportTool (ImportTool.java:run(621)) - Error during import: Import j

What is going on here?
Latest post, 07-06
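### ❗ Why the Sqoop import failed

The Sqoop command you ran was:

```bash
sqoop import \
  --connect jdbc:mysql://localhost:3306/mydatabase \
  --username root \
  -P \
  --table products \
  --target-dir /user/hadoop/products \
  --fields-terminated-by ',' \
  --lines-terminated-by '\n' \
  --delete-target-dir \
  --num-mappers 1
```

According to the log, the database connection succeeded, but the MapReduce task **hit a permission or path problem and could not write its output**:

---

### 🚨 Key error message

```text
ERROR ... Mkdirs failed to create file:/user/hadoop/products/_temporary/0/_temporary/...
Caused by: java.io.IOException: Mkdirs failed to create ...
```

Sqoop **could not create the target directory `/user/hadoop/products` and its temporary subdirectories**. Note the `file:` scheme in the path, together with the `LocalJobRunner` and `ChecksumFileSystem` frames in the stack trace: the job ran in local mode and tried to create the directory on the local filesystem (the working directory is `file:/home/ljm1`), not on HDFS. The most likely cause is that the Sqoop client is not picking up the cluster configuration, so `fs.defaultFS` falls back to the local filesystem.

---

### ✅ Fix: make sure the job writes to HDFS and the target directory is writable

#### ✅ Step 1: Check that HDFS is running

Make sure Hadoop's HDFS (and YARN) are started:

```bash
start-dfs.sh
start-yarn.sh
```

Also confirm that `fs.defaultFS` in the `core-site.xml` read by Sqoop is an `hdfs://` URI. If it is not, the commands in the next step will act on the local filesystem as well.

---

#### ✅ Step 2: Create the target directory and set its permissions

Create the target directory in advance with `hdfs dfs` and give the current user access:

```bash
hdfs dfs -mkdir -p /user/hadoop/products
hdfs dfs -chmod 777 /user/hadoop/products
```

> If you are working as the `ljm1` user, you can set the owner instead:
>
> ```bash
> hdfs dfs -chown ljm1 /user/hadoop/products
> ```

Because the command passes `--delete-target-dir`, Sqoop deletes and recreates this directory on each run, so the user should also be able to write to the parent directory `/user/hadoop`.

---

#### ✅ Step 3: Rerun the Sqoop import

Run the same Sqoop command again (keep `-P`, which prompts for the password instead of exposing it on the command line):

```bash
sqoop import \
  --connect jdbc:mysql://localhost:3306/mydatabase \
  --username root \
  -P \
  --table products \
  --target-dir /user/hadoop/products \
  --fields-terminated-by ',' \
  --lines-terminated-by '\n' \
  --delete-target-dir \
  --num-mappers 1
```

Enter the password and check whether the error still appears.

---

### 🔍 Additional notes and tuning suggestions

1. **Avoid the limits of local mode**
   - Sqoop may run the job in local mode (`LocalJobRunner`), as it did in the log above. That is fine for debugging but not for a real cluster.
   - Make sure `mapreduce.framework.name=yarn` is set in `mapred-site.xml`.

2. **Use direct import for better performance (optional)**
   - Adding `--direct` enables the MySQL fast import path, which relies on `mysqldump` being available:
     ```bash
     sqoop import --connect ... --direct ...
     ```

3. **Verify that the data was imported**
   - After a successful import, view the HDFS file contents with:
     ```bash
     hdfs dfs -cat /user/hadoop/products/part-m-00000
     ```

---

### 🧠 Summary: key points

| Topic | Notes |
|--------|------|
| **HDFS directory permissions** | Before Sqoop writes to HDFS, the target path must be creatable or exist, and the current user must have write permission. |
| **MapReduce task execution** | Sqoop relies on MapReduce to transfer the data and creates temporary directories automatically. |
| **Sqoop target directory overwrite** | If the directory already exists, add `--delete-target-dir` or delete the old directory first. |

---

If you want to go on to data cleaning, analysis, or exporting back to MySQL, let me know and I can walk you through the whole workflow step by step. 💪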
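
The `Mkdirs failed to create file:/user/hadoop/products/...` message in the log carries a `file:` scheme, which denotes the local filesystem rather than HDFS. As a quick check, the sketch below classifies a filesystem URI by its scheme and, when the `hdfs` command is available, applies that to the configured `fs.defaultFS`. The function name `classify_fs_uri` is a hypothetical helper introduced here for illustration; it is not part of Hadoop or Sqoop. Only `hdfs getconf -confKey` is an actual Hadoop command.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not a Hadoop or Sqoop tool): report which kind of
# filesystem a URI points at, based only on its scheme.
classify_fs_uri() {
  case "$1" in
    hdfs://*) echo "hdfs" ;;      # e.g. hdfs://10.202.234.244:9000
    file:*)   echo "local" ;;     # e.g. file:/// or file:/user/hadoop/products
    *)        echo "unknown" ;;   # any other or missing scheme
  esac
}

# Query the client-side configuration only if the hdfs CLI is on PATH,
# so the script stays harmless on a machine without Hadoop.
if command -v hdfs >/dev/null 2>&1; then
  default_fs="$(hdfs getconf -confKey fs.defaultFS)"
  echo "fs.defaultFS=${default_fs} ($(classify_fs_uri "${default_fs}"))"
fi
```

If this reports `local`, the Sqoop client is probably reading a configuration without the cluster's `core-site.xml`, and a `--target-dir` such as `/user/hadoop/products` would resolve to a local path, as seen in the log.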