Java Basics -- Serialization and I/O

This article takes a close look at saving and restoring object state in Java: how serialization and deserialization work, how serialization is used to save an object's state, and how that state can be restored in a later session. It also covers the pitfalls of serialization, such as version control, the impact of changes to a class's structure, and how to use serialVersionUID to keep serialized objects compatible.

Basics

      Objects can be flattened and inflated. Objects have state and behavior. Behavior lives in the class, but state lives within each individual object. So what happens when it’s time to save the state of an object? If you’re writing a game, you’re gonna need a Save/Restore Game feature. If you’re writing an app that creates charts, you’re gonna need a Save/Open File feature. If your program needs to save state, you can do it the hard way, interrogating each object, then painstakingly writing the value of each instance variable to a file, in a format you create. Or, you can do it the easy OO way – you simply flatten the object itself, and inflate it to get it back. But you’ll still have to do it the hard way sometimes, especially when the file your app saves has to be read by some other non-Java application.
      You have lots of options for how to save the state of your Java program, and what you choose will probably depend on how you plan to use the saved state.
      If your data will be used by only the Java program that generated it: Use serialization – Write a file that holds flattened (Serialized) objects. Then have your program read the serialized objects from the file and inflate them back into living, breathing, heap-inhabiting objects.
      If your data will be used by other programs: Write a plain text file – Write a file, with delimiters that other programs can parse. For example, a tab-delimited file that a spreadsheet or database application can use.
      These aren’t the only options, of course. You can save data in any format you choose. Instead of writing characters, for example, you can write your data as bytes. But regardless of the method you use, the fundamental I/O techniques are pretty much the same: write some data to something, and usually the something is either a file on disk or a stream coming from a network connection. Reading the data is the same process in reverse: read some data from either a file on disk or a network connection. And of course everything we talk about in this part is for times when you aren’t using an actual database.

Demo – Saving State

      Imagine you have a program, say, a fantasy adventure game, that takes more than one session to complete. As the game progresses, characters in the game become stronger, weaker, smarter, etc., and gather and use weapons. You don’t want to start from scratch each time you launch the game. So, you need a way to save the state of the characters, and a way to restore the state when you resume the game.
      Imagine you have three game characters to save …
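
      A minimal sketch of what such a character class might look like (the class name and fields here are assumptions for illustration; any class whose state you want to save would work the same way):

import java.io.Serializable;

public class GameCharacter implements Serializable {
	private int power;
	private String type;
	private String[] weapons;

	public GameCharacter(int power, String type, String[] weapons) {
		this.power = power;
		this.type = type;
		this.weapons = weapons;
	}

	public int getPower() { return power; }
	public String getType() { return type; }
	public String getWeapons() { return String.join(", ", weapons); }
}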

Serialization

  • Writing a serialized object to a file

//Step 1
/* Make a FileOutputStream */
/* 
 Note: If the file "MyGame.ser" doesn't exist, it will be created automatically.
 Note: FileOutputStream knows how to connect to (and create) a file. 
*/
FileOutputStream fileStream = new FileOutputStream("MyGame.ser");

//Step 2
/* Make an ObjectOutputStream */
/*
 Note: ObjectOutputStream lets you write objects, but it can't directly connect to a file. It needs to be fed a 'helper' connection stream. This is actually called 'chaining' one stream to another.
*/
ObjectOutputStream os = new ObjectOutputStream(fileStream);

//Step 3
/* Write the object */
os.writeObject(characterOne);
os.writeObject(characterTwo);
os.writeObject(characterThree);

//Step 4
/* Close the ObjectOutputStream */
/*
 Note: Closing the stream at the top closes the ones underneath, so the FileOutputStream (and the file) will close automatically.
*/
os.close();
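
      Putting the four steps together, here is a compact sketch (assuming a GameCharacter class like the one sketched earlier). The try-with-resources statement closes the ObjectOutputStream, and the FileOutputStream underneath it, automatically:

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

public class GameSaver {
	public static void saveGame(GameCharacter one, GameCharacter two, GameCharacter three) {
		try (ObjectOutputStream os =
				new ObjectOutputStream(new FileOutputStream("MyGame.ser"))) {
			os.writeObject(one);
			os.writeObject(two);
			os.writeObject(three);
		} catch (IOException ex) {
			ex.printStackTrace();
		}
	}
}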

  • Data Moves in streams from one place to another

      The Java I/O API has connection streams that represent connections to destinations and sources such as files on disk or network sockets, and chain streams that work only if chained to other streams.
      Often, it takes at least two streams hooked together to do something useful – one to represent the connection and another to call methods on. Why two? Because connection streams are usually too low-level. FileOutputStream (a connection stream), for example, has methods for writing bytes. But we don’t want to write bytes! We want to write objects, so we need a high-level chain stream.
      OK, then why not have just a single stream that does exactly what you want? One that lets you write objects but underneath converts them to bytes? Think good OO. Each class does one thing well. FileOutputStream writes bytes to a file. ObjectOutputStream turns objects into data that can be written to a stream. So we make a FileOutputStream that lets us write to a file, and we hook an ObjectOutputStream (a chain stream) on the end of it. When we call writeObject() on the ObjectOutputStream, the object gets pumped into the stream and then moves to the FileOutputStream where it ultimately gets written as bytes to a file.
      The ability to mix and match different combinations of connection and chain streams gives you tremendous flexibility!
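
      For example (a sketch, not part of the original demo): if you wanted the bytes buffered before they hit the disk, you could slip a BufferedOutputStream between the ObjectOutputStream and the FileOutputStream:

/*
 ObjectOutputStream -> BufferedOutputStream -> FileOutputStream
 The ObjectOutputStream turns objects into bytes, the BufferedOutputStream batches
 those bytes, and the FileOutputStream writes them to the file. All three classes
 live in java.io.
*/
ObjectOutputStream os = new ObjectOutputStream(
		new BufferedOutputStream(
				new FileOutputStream("MyGame.ser")));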

  • What really happens to an object when it’s serialized?

  • What exactly IS an object’s state? What needs to be saved?

      Now it starts to get interesting. Easy enough to save the primitive values 37 and 70. But what if an object has an instance variable that’s an object reference? What about an object that has five instance variables that are object references? What if those object instance variables themselves have instance variables?
      Think about it. What part of an object is potentially unique? Imagine what needs to be restored in order to get an object that’s identical to the one that was saved. It will have a different memory location, of course, but we don’t care about that. All we care about is that out there on the heap, we’ll get an object that has the same state that the object had when it was saved.
      When an object is serialized, all the objects it refers to from instance variables are also serialized. And all the objects those objects refer to are serialized. And all the objects those objects refer to are serialized… and the best part is, it happens automatically.
      Serialization saves the entire object graph: all objects referenced by instance variables, starting with the object being serialized.

  • If you want your class to be serializable, implement Serializable.

      The Serializable interface is known as a marker or tag interface, because the interface doesn’t have any methods to implement. Its sole purpose is to announce that the class implementing it is, well, serializable. In other words, objects of that type are savable through the serialization mechanism. If any superclass of a class is serializable, the subclass is automatically serializable even if the subclass doesn’t explicitly declare implements Serializable.
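
      A quick sketch of both cases (the class names are made up for illustration):

/* Pond implements Serializable directly. There are no methods to implement. */
class Pond implements java.io.Serializable { }

/* Duck never declares 'implements Serializable', but it extends a serializable
   class, so Duck objects can be serialized too. */
class Duck extends Pond { }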


/* 
 Note: whatever goes here (the argument characterOne) MUST implement Serializable or it will fail at runtime 
 */
os.writeObject(characterOne);

  • Serialization is all or nothing.

      Either the entire object graph is serialized correctly or serialization fails. You can’t serialize an object if one of its instance variables refuses to be serialized (by not implementing Serializable).
      If you want an instance variable to be skipped by the serialization process, mark the variable with the transient keyword.


class Game implements Serializable {
	/* transient says, "don't save this variable during serialization, just skip it." */
	transient String currentID;
	/* The name variable will be saved as part of the object's state during serialization. */
	String name;
}

Deserialization

      The whole point of serializing an object is so that you can restore it back to its original state at some later date, in a different ‘run’ of the JVM (which might not even be the same JVM that was running at the time the object was serialized). Deserialization is a lot like serialization in reverse.


//Step 1
/* Make a FileInputStream */
/*
 Note: If the file "MyGame.ser" doesn't exist, you'll get an exception.
 Note: The FileInputStream knows how to connect to an existing file.
*/
FileInputStream fileStream = new FileInputStream("MyGame.ser");

//Step 2
/* Make an ObjectInputStream */
/*
 Note: ObjectInputStream lets you read objects, but it can't directly connect to a file. It needs to be chained to a connection stream, in this case a FileInputStream.
*/
ObjectInputStream os = new ObjectInputStream(fileStream);

//Step 3
/* Read the objects */
/*
 Note: Each time you say readObject(), you get the next object in the stream. So you'll read them back in the same order in which they were written. You'll get a big fat exception if you try to read more objects than you wrote.
*/
Object one = os.readObject();
Object two = os.readObject();
Object three = os.readObject();

//Step 4
/* Cast the objects */
/*
 Note: The return value of readObject() is type Object, so you have to cast it back to the type you know it really is.
*/
GameCharacter elf = (GameCharacter)one;
GameCharacter troll = (GameCharacter)two;
GameCharacter magician = (GameCharacter)three;

//Step 5
/* Close the ObjectInputStream */
/*
 Note: Closing the stream at the top closes the ones underneath, so the FileInputStream (and the file) will close automatically.
*/
os.close();

  • What happens during deserialization?

      When an object is deserialized, the JVM attempts to bring the object back to life by making a new object on the heap that has the same state the serialized object had at the time it was serialized. Well, except for the transient variables, which come back either null (for object references) or as default primitive values.

      1)The object is read from the stream;
      2)The JVM determines (through info stored with the serialized object) the object’s class type.
      3)The JVM attempts to find and load the object’s class. If the JVM can’t find and/or load the class, the JVM throws an exception and the deserialization fails.
      4)A new object is given space on the heap, but the serialized object’s constructor does not run! Obviously, if the constructor ran, it would restore the state of the object back to its original ‘new’ state, and that’s not what we want. We want the object to be restored to the state it had when it was serialized, not when it was first created.
      5)If the object has a non-serializable class somewhere up its inheritance tree, the constructor for that non-serializable class will run, along with any constructors above that (even if they’re serializable). Once the constructor chaining begins, you can’t stop it, which means all superclasses, beginning with the first non-serializable one, will reinitialize their state (see the sketch after this list).
      6)The object’s instance variables are given the values from the serialized state. Transient variables are given a value of null for object references and defaults (0, false, etc.) for primitives.
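
      A sketch of point 5 (the class names are made up for illustration). The non-serializable superclass needs an accessible no-arg constructor, and that constructor runs during deserialization, so the inherited field is reinitialized rather than restored:

/* NOT serializable: its fields are not saved, and its no-arg constructor
   WILL run when a serializable subclass is deserialized. */
class Animal {
	int age;
	Animal() { age = 1; }   // runs again during deserialization of a Dog
}

/* Serializable subclass: its own constructor does NOT run during deserialization,
   and its own field (name) is restored from the stream. The inherited 'age' field
   comes back as 1 (whatever Animal() set), not as the value it had when the Dog
   was serialized. */
class Dog extends Animal implements java.io.Serializable {
	String name;
	Dog(String name) { this.name = name; age = 10; }
}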

SerialVersionUID

  • Version Control is crucial

If you serialize an object, you must have the class in order to deserialize and use the object. OK, that’s obvious. But what might be less obvious is what happens if you change the class in the meantime? Yikes. Imagine trying to bring back a Dog object when one of its instance variables (non-transient) has changed from a double to a String. That violates Java’s type-safe sensibilities in a Big Way. But that’s not the only change that might hurt compatibility.

Changes to a class that can hurt deserialization:
Deleting an instance variable
Changing the declared type of an instance variable
Changing a non-transient instance variable to transient
Moving a class up or down the inheritance hierarchy
Changing a class (anywhere in the object graph) from Serializable to not Serializable (by removing ‘implements Serializable’ from a class declaration)
Changing an instance variable to static

Changes to a class that are usually OK:
Adding new instance variables to the class (existing objects will deserialize with default values for the instance variables they didn’t have when they were serialized)
Adding classes to the inheritance tree
Removing classes from the inheritance tree
Changing the access level of an instance variable (this has no effect on the ability of deserialization to assign a value to the variable)
Changing an instance variable from transient to non-transient (previously-serialized objects will simply have a default value for the previously-transient variables)

  • Using the serialVersionUID

      Each time an object is serialized, the object (including every object in its graph) is ‘stamped’ with a version ID number for the object’s class. The ID is called the serialVersionUID, and it’s computed based on information about the class structure. As an object is being deserialized, if the class has changed since the object was serialized, the class would have a different serialVersionUID, and deserialization will fail. But you can control this.
      If you think there is ANY possibility that your class might evolve, put a serial version ID in your class.
      When Java tries to deserialize an object, it compares the serialized object’s serialVersionUID with that of the class the JVM is using for deserializing the object. For example, if a Dog instance was serialized with an ID of, say, 23 (in reality a serialVersionUID is much longer), when the JVM deserializes the Dog object it will first compare the Dog object’s serialVersionUID with the Dog class’s serialVersionUID. If the two numbers don’t match, the JVM assumes the class is not compatible with the previously-serialized object, and you’ll get an exception during deserialization.
      So, the solution is to put a serialVersionUID in your class, and then as the class evolves, the serialVersionUID will remain the same and the JVM will say, “OK, cool, the class is compatible with this serialized object,” even though the class has actually changed.
      This works only if you’re careful with your class changes! In other words, you are taking responsibility for any issues that come up when an older object is brought back to life with a newer class.
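
      Declaring the ID is a one-liner inside the class; the value itself is up to you (you can also generate one with the JDK's serialver tool), what matters is that it stays the same as the class evolves:

import java.io.Serializable;

class Dog implements Serializable {
	/* Keep this value unchanged as the class evolves, and the JVM will treat
	   old serialized Dog objects as compatible with the new class. */
	private static final long serialVersionUID = 1L;
	private String name;
	private int size;
}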

I/O

  • Writing a String to a Text File

      Writing text data (a String, actually) is similar to writing an object except you write a String instead of an Object, and you use a FileWriter instead of a FileOutputStream (and you don’t chain it to an ObjectOutputStream).


/*
 Note: All the I/O stuff must be in a try/catch. Everything can throw an exception.
*/
try {
	/* If the file "Ftd.txt" doesn't exist, FileWriter will create it. */
	FileWriter writer = new FileWriter("Ftd.txt");
	writer.write("Hello world!");
	writer.close();
} catch(IOException ex) {
	ex.printStackTrace();
}

  • The java.io.File class

      The java.io.File class represents a file on disk, but doesn’t actually represent the contents of the file. What? Think of a File object as something more like a pathname of a file (or even a directory) rather than The Actual File Itself. The File class does not, for example, have methods for reading and writing. One VERY useful thing about a File object is that it offers a much safer way to represent a file than just using a String file name. For example, most classes that take a String file name in their constructor (like FileWriter or FileInputStream) can take a File object instead. You can construct a File object, verify that you’ve got a valid path, etc. and then give that File object to the FileWriter or FileInputStream.
      A File object represents the name and path of a file or directory on disk, for example: /Users/dqs/Desktop/Ftd.txt. But it does NOT represent, or give you access to, the data in the file.
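
      A few of the things you can do with a File object (a sketch; the file and directory names are just examples):

File f = new File("Ftd.txt");             // represents the name/path, not the contents
System.out.println(f.exists());           // does the file actually exist on disk?
System.out.println(f.getAbsolutePath());  // the full path of the file
f.delete();                               // delete the file (returns true if it worked)

File dir = new File("Chapter7");
dir.mkdir();                              // create a directory
System.out.println(dir.isDirectory());    // true if it represents a directory
for (String name : dir.list()) {          // list what's inside the directory
	System.out.println(name);
}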

  • The beauty of buffers

      If there were no buffers, it would be like shopping without a cart. You’d have to carry each thing out to your car, one soup can or toilet paper roll at a time.

BufferedWriter writer = new BufferedWriter(new FileWriter("Ftd.txt"));

      The cool thing about buffers is that they’re much more efficient than working without them. You can write to a file using FileWriter alone, by calling write(something), but FileWriter writes each and everything you pass to the file each and every time. That’s overhead you don’t want or need, since every trip to the disk is a Big Deal compared to manipulating data in memory. By chaining a BufferedWriter onto a FileWriter, the BufferedWriter will hold all the stuff you write to it until it’s full. Only when the buffer is full will the FileWriter actually be told to write to the file on disk.
      If you want to send data before the buffer is full, you do have control. Just Flush It. Calls to writer.flush() say, “send whatever’s in the buffer, now”.
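
      A sketch of the chained version, with an explicit flush (note that a BufferedWriter is chained to a Writer such as FileWriter, not to a File; as with all I/O, in real code this goes inside a try/catch like the FileWriter example above):

BufferedWriter writer = new BufferedWriter(new FileWriter("Ftd.txt"));
writer.write("first line");
writer.newLine();     // writes a line separator
writer.write("second line");
writer.flush();       // push whatever is sitting in the buffer to the FileWriter right now
writer.close();       // closing also flushes, and closes the FileWriter underneath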

  • Reading from a Text File

      Reading text from a file is simple, but this time we’ll use a File object to represent the file, a FileReader to do the actual reading and a BufferedReader to make the reading more efficient.
      The read happens by reading lines in a while loop, ending the loop when the result of a readLine() is null. That’s the most common style for reading data (pretty much anything that’s not a Serialized object): read stuff in a while loop, terminating when there’s nothing left to read (which we know because the result of whatever read method we’re using is null).

try{
	/* A FileReader is a connection stream for characters, that connects to a text file. */
	FileReader fileReader = new FileReader(new File("Ftd.txt"));
	/* Chain the FileReader to a BufferedReader for more efficient reading. */
	BufferedReader reader = new BufferedReader(fileReader);
	/* Make a String variable to hold each line as the line is read. */
	String line = null;
	while ((line = reader.readLine()) != null) {
		  /*
			This says, "Read a line of text, and assign it to the String variable 'line'. While that variable is not null (because there WAS something to read) print out the line that was just read."
			Or another way of saying it, "While there are still lines to read, read them and print them."
		  */
		  System.out.println(line);
	}
	reader.close();
} catch(IOException ex) {
	ex.printStackTrace();
}