C. Heap Operations (CF #357)

This post describes how to use a priority queue to repair an inconsistent operation log: by adding the minimum number of operations, every getMin result matches the recorded value, and the heap is never empty when a getMin or removeMin operation is applied.

C. Heap Operations
time limit per test
1 second
memory limit per test
256 megabytes
input
standard input
output
standard output

Petya has recently learned the data structure named "Binary heap".

The heap he is now operating with allows the following operations:

  • put the given number into the heap;
  • get the value of the minimum element in the heap;
  • extract the minimum element from the heap;

Thus, at any moment of time the heap contains several integers (possibly none), some of them might be equal.

In order to better learn this data structure Petya took an empty heap and applied some operations above to it. Also, he carefully wrote down all the operations and their results to his event log, following the format:

  • insert x — put the element with value x in the heap;
  • getMin x — the value of the minimum element contained in the heap was equal to x;
  • removeMin — the minimum element was extracted from the heap (only one instance, if there were many).

All the operations were correct, i.e. there was at least one element in the heap each time getMin or removeMin operations were applied.

While Petya was away for a lunch, his little brother Vova came to the room, took away some of the pages from Petya's log and used them to make paper boats.

Now Vova is worried that he may have made Petya's sequence of operations inconsistent. For example, if one applies the operations one by one in the order they are written in the event log, the results of getMin operations might differ from the results recorded by Petya, and some of the getMin or removeMin operations may be incorrect, as the heap is empty at the moment they are applied.

Now Vova wants to add some new operation records to the event log in order to make the resulting sequence of operations correct. That is, the result of each getMin operation is equal to the result in the record, and the heap is non-empty when getMin and removeMin are applied. Vova wants to complete this as fast as possible, as Petya may get back at any moment. He asks you to add the least possible number of operation records to the current log. Note that an arbitrary number of operations may be added at the beginning, between any two other operations, or at the end of the log.

Input

The first line of the input contains the only integer n (1 ≤ n ≤ 100 000) — the number of the records left in Petya's journal.

Each of the following n lines describes a record in the current log, in the order the operations are applied, using the format described in the statement. All numbers in the input are integers not exceeding 10^9 in absolute value.

Output

The first line of the output should contain a single integer m — the minimum possible number of records in the modified sequence of operations.

Next m lines should contain the corrected sequence of records following the format of the input (described in the statement), one per line and in the order they are applied. All the numbers in the output should be integers not exceeding 10^9 in absolute value.

Note that the input sequence of operations must be a subsequence of the output sequence.

It's guaranteed that there exists a correct answer consisting of no more than 1 000 000 operations.

Examples
input
2
insert 3
getMin 4
output
4
insert 3
removeMin
insert 4
getMin 4
input
4
insert 1
insert 1
removeMin
getMin 2
output
6
insert 1
insert 1
removeMin
removeMin
insert 2
getMin 2
Note

In the first sample, after number 3 is inserted into the heap, the minimum number is 3. To make the result of the first getMin equal to 4, one should first remove number 3 from the heap and then add number 4 to the heap.

In the second sample, number 1 is inserted twice, so it likewise has to be removed twice.



This problem is what really taught me about the priority queue. I had only known such a thing existed; it was this problem that made me truly understand what a priority queue is.

priority_queue<int> q; is the ordinary priority queue: larger values have higher priority, so q.top() returns the largest element.

priority_queue<int, vector<int>, greater<int> > q; is the adapted priority queue: smaller values have higher priority, so q.top() returns the smallest element.

priority_queue<node> q; is a priority queue of a user-defined type, which requires overloading the < operator, for example:

struct node
{
    friend bool operator< (node n1, node n2)
    {
        return n1.priority < n2.priority;   // the node with the larger priority comes out first
    }
    int priority;
    int value;
};
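
Below is a small standalone sketch of how this user-defined queue behaves (the priority/value numbers are made up purely for illustration): because operator< compares the priority field, the node with the largest priority is always at the top.

#include <cstdio>
#include <queue>
using namespace std;

struct node
{
    friend bool operator< (node n1, node n2)
    {
        return n1.priority < n2.priority;   // larger priority means higher priority in the queue
    }
    int priority;
    int value;
};

int main()
{
    priority_queue<node> q;
    int prio[3] = {1, 5, 3};
    int val[3]  = {100, 200, 300};
    for (int i = 0; i < 3; i++)
    {
        node t;
        t.priority = prio[i];
        t.value = val[i];
        q.push(t);
    }
    // Nodes come out in decreasing order of priority: 5, 3, 1.
    while (!q.empty())
    {
        printf("priority=%d value=%d\n", q.top().priority, q.top().value);
        q.pop();
    }
    return 0;
}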

Commonly used priority queue member functions:

  q.push(x); adds an element to the priority queue

  q.pop(); removes the element with the highest priority

  q.top(); returns the element with the highest priority

  q.size(); returns the number of elements in the priority queue

  q.empty(); returns true if the priority queue is empty
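
Here is a small standalone demo of these functions on the min-heap variant declared earlier (not part of the solution; the inserted values are arbitrary):

#include <cstdio>
#include <queue>
#include <vector>
using namespace std;

int main()
{
    // Min-heap: the smallest element has the highest priority.
    priority_queue<int, vector<int>, greater<int> > q;

    q.push(3);                                  // add elements
    q.push(1);
    q.push(2);

    printf("size = %d\n", (int)q.size());       // 3
    printf("top  = %d\n", q.top());             // 1, the current minimum

    q.pop();                                    // removes the minimum (1)
    printf("top after pop = %d\n", q.top());    // 2

    while (!q.empty())                          // drain the queue: prints 2 then 3
    {
        printf("%d\n", q.top());
        q.pop();
    }
    return 0;
}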


A few additional notes on this problem: if the heap is empty when a removeMin or getMin record appears, an insert has to be added before it; and if the heap contains values smaller than the recorded getMin value, all of those smaller values have to be removed with removeMin first. The full solution follows.

#include <cstdio>
#include <queue>
#include <vector>
#include <string>
using namespace std;

// Min-heap mirroring the state of Petya's heap while the log is rebuilt.
priority_queue<int, vector<int>, greater<int> > q;
int f[1000050];      // numeric argument of the i-th output record (insert / getMin)
string s[1000050];   // operation name of the i-th output record
int main()
{
    int n, num = 0;
    scanf("%d", &n);
    while (n--)
    {
        char a[10];
        int x;
        scanf("%s", a);
        if (a[0] == 'i')                        // insert x: replay it as-is
        {
            scanf("%d", &x);
            q.push(x);
            s[num] = "insert";
            f[num++] = x;
        }
        if (a[0] == 'g')                        // getMin x
        {
            scanf("%d", &x);
            while (!q.empty() && q.top() < x)   // remove every element smaller than x first
            {
                q.pop();
                s[num++] = "removeMin";
            }
            if (q.empty() || q.top() > x)       // x is not the current minimum, so insert it
            {
                q.push(x);
                s[num] = "insert";
                f[num++] = x;
            }
            s[num] = "getMin";
            f[num++] = x;
        }
        if (a[0] == 'r')                        // removeMin
        {
            if (q.empty())                      // heap would be empty: insert a dummy value first
            {
                q.push(0);
                s[num] = "insert";
                f[num++] = 0;
            }
            s[num++] = "removeMin";
            q.pop();
        }
    }
    printf("%d\n", num);
    for (int i = 0; i < num; i++)
    {
        if (s[i][0] != 'r')
            printf("%s %d\n", s[i].c_str(), f[i]);
        else
            printf("%s\n", s[i].c_str());
    }
    return 0;
}


