The difference between context.collect and context.write in Hadoop

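This post looks at how the way of emitting output changed in Hadoop. Older versions use the OutputCollector.collect method, while newer versions use Context.write to output key-value pairs. It summarizes what changed between the two.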

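When I was learning Hadoop I got confused: a lot of source code calls context.collect rather than context.write. After searching on Baidu, I came to the following conclusion: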

Old version: output.collect(key, result); // output's type is: OutputCollector
New version: context.write(key, result); // context's type is: Context
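To make the two call shapes concrete, here is a minimal sketch in plain Java. The two nested interfaces (`OldStyleCollector`, `NewStyleContext`) and the class name are hypothetical stand-ins I made up so the example compiles without the Hadoop jars; they only mirror the `collect(key, value)` and `write(key, value)` method shapes. In real Hadoop the old API lives in the `org.apache.hadoop.mapred` package and the new one in `org.apache.hadoop.mapreduce`, keys and values are usually Writable types such as Text and IntWritable rather than String and Integer, and the real methods declare checked exceptions that this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone sketch: the nested interfaces are hypothetical stand-ins that
// only mirror the method shapes of OutputCollector.collect and Context.write.
class CollectVsWrite {

    // Stand-in for the old API's output collector.
    interface OldStyleCollector<K, V> {
        void collect(K key, V value);
    }

    // Stand-in for the context object passed to map()/reduce() in the new API.
    interface NewStyleContext<K, V> {
        void write(K key, V value);
    }

    // Old style: the collector is a dedicated parameter used only for output.
    static void oldStyleMap(String line, OldStyleCollector<String, Integer> output) {
        for (String word : line.trim().split("\\s+")) {
            output.collect(word, 1);
        }
    }

    // New style: output goes through the context object.
    static void newStyleMap(String line, NewStyleContext<String, Integer> context) {
        for (String word : line.trim().split("\\s+")) {
            context.write(word, 1);
        }
    }

    // Helper that records each emitted pair as "key<TAB>value".
    static List<String> runOldStyle(String line) {
        List<String> out = new ArrayList<>();
        oldStyleMap(line, (k, v) -> out.add(k + "\t" + v));
        return out;
    }

    // Same helper for the new-style call.
    static List<String> runNewStyle(String line) {
        List<String> out = new ArrayList<>();
        newStyleMap(line, (k, v) -> out.add(k + "\t" + v));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(runOldStyle("hello hadoop hello"));
        System.out.println(runNewStyle("hello hadoop hello"));
    }
}
```

Both variants emit the same pairs; only the object you call and the method name differ. So when porting old code, replacing output.collect(key, result) with context.write(key, result) is the core of the change at the call site.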

org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1246) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1169) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1953) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3203) at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1584) at org.apache.hadoop.ipc.Client.call(Client.java:1529) at org.apache.hadoop.ipc.Client.call(Client.java:1426) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:258) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:139) at jdk.proxy2/jdk.proxy2.$Proxy11.mkdirs(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.lambda$mkdirs$20(ClientNamenodeProtocolTranslatorPB.java:611) at org.apache.hadoop.ipc.internal.ShadedProtobufHelper.ipc(ShadedProtobufHelper.java:160) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:611) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at 
java.base/java.lang.reflect.Method.invoke(Method.java:568) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:437) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:170) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:162) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:100) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:366) at jdk.proxy2/jdk.proxy2.$Proxy12.mkdirs(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2555) at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2531) at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1497) at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1494) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:1511) at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:1486) at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2494) at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.setupJob(FileOutputCommitter.java:356) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:541) 21:37:31.663 [IPC Parameter Sending Thread for xxjdxnj/192.168.88.101:8020] DEBUG org.apache.hadoop.ipc.Client - IPC Client (1759899303) connection to xxjdxnj/192.168.88.101:8020 from СIPC Client (1759899303) connection to xxjdxnj/192.168.88.101:8020 from С sending #4 org.apache.hadoop.hdfs.protocol.ClientProtocol.delete 21:37:31.674 [IPC Client (1759899303) connection to xxjdxnj/192.168.88.101:8020 from С] DEBUG org.apache.hadoop.ipc.Client - IPC Client (1759899303) connection to xxjdxnj/192.168.88.101:8020 from С got value #4 21:37:31.675 [Thread-5] 
DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine2 - Call: delete took 12ms 21:37:31.678 [Thread-5] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local1106899704_0001 org.apache.hadoop.security.AccessControlException: Permission denied: user=С, access=WRITE, inode="/":hadoop:supergroup:drwxr-xr-x at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:661) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:501) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermissionWithContext(FSPermissionChecker.java:525) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:395) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1964) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1945) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1904) at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:60) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3531) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1173) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:750) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227) at 
org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1246) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1169) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1953) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3203) at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77) at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499) at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:88) at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2557) at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2531) at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1497) at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1494) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:1511) at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:1486) at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2494) at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.setupJob(FileOutputCommitter.java:356) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:541) Caused by: org.apache.hadoop.ipc.RemoteException: Permission denied: user=С, access=WRITE, 
inode="/":hadoop:supergroup:drwxr-xr-x at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:661) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:501) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermissionWithContext(FSPermissionChecker.java:525) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:395) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1964) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1945) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1904) at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:60) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3531) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1173) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:750) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1246) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1169) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1953) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3203) at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1584) at org.apache.hadoop.ipc.Client.call(Client.java:1529) at org.apache.hadoop.ipc.Client.call(Client.java:1426) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:258) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:139) at jdk.proxy2/jdk.proxy2.$Proxy11.mkdirs(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.lambda$mkdirs$20(ClientNamenodeProtocolTranslatorPB.java:611) at org.apache.hadoop.ipc.internal.ShadedProtobufHelper.ipc(ShadedProtobufHelper.java:160) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:611) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:568) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:437) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:170) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:162) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:100) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:366) at jdk.proxy2/jdk.proxy2.$Proxy12.mkdirs(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2555) ... 
9 common frames omitted 21:37:31.683 [Thread-5] DEBUG org.apache.hadoop.security.UserGroupInformation - PrivilegedAction [as: С (auth:SIMPLE)][action: org.apache.hadoop.fs.FileContext$2@15fc336f] java.lang.Exception: null at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1950) at org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:343) at org.apache.hadoop.fs.FileContext.getFileContext(FileContext.java:465) at org.apache.hadoop.fs.FileContext.getFileContext(FileContext.java:442) at org.apache.hadoop.fs.FileContext.getLocalFSFileContext(FileContext.java:428) at org.apache.hadoop.mapred.LocalDistributedCacheManager.close(LocalDistributedCacheManager.java:268) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:598) 21:37:32.626 [main] DEBUG org.apache.hadoop.security.UserGroupInformation - PrivilegedAction [as: С (auth:SIMPLE)][action: org.apache.hadoop.mapreduce.Job$1@77b9d0c7] java.lang.Exception: null at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1950) at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:329) at org.apache.hadoop.mapreduce.Job.isUber(Job.java:1866) at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1747) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1698) at cn.itcast.mr.dedup.MatrixMultiplication.main(MatrixMultiplication.java:128) 21:37:32.626 [main] INFO org.apache.hadoop.mapreduce.Job - Job job_local1106899704_0001 running in uber mode : false 21:37:32.628 [main] INFO org.apache.hadoop.mapreduce.Job - map 0% reduce 0% 21:37:32.628 [main] DEBUG org.apache.hadoop.security.UserGroupInformation - PrivilegedAction [as: С (auth:SIMPLE)][action: org.apache.hadoop.mapreduce.Job$6@3b0ee03a] java.lang.Exception: null at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1950) at org.apache.hadoop.mapreduce.Job.getTaskCompletionEvents(Job.java:730) at 
org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1759) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1698) at cn.itcast.mr.dedup.MatrixMultiplication.main(MatrixMultiplication.java:128) 21:37:32.629 [main] DEBUG org.apache.hadoop.security.UserGroupInformation - PrivilegedAction [as: С (auth:SIMPLE)][action: org.apache.hadoop.mapreduce.Job$1@796065aa] java.lang.Exception: null at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1950) at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:329) at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:613) at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1736) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1698) at cn.itcast.mr.dedup.MatrixMultiplication.main(MatrixMultiplication.java:128) 21:37:32.629 [main] DEBUG org.apache.hadoop.security.UserGroupInformation - PrivilegedAction [as: С (auth:SIMPLE)][action: org.apache.hadoop.mapreduce.Job$1@28a6301f] java.lang.Exception: null at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1950) at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:329) at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:613) at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1737) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1698) at cn.itcast.mr.dedup.MatrixMultiplication.main(MatrixMultiplication.java:128) 21:37:32.630 [main] DEBUG org.apache.hadoop.security.UserGroupInformation - PrivilegedAction [as: С (auth:SIMPLE)][action: org.apache.hadoop.mapreduce.Job$6@2c306a57] java.lang.Exception: null at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1950) at org.apache.hadoop.mapreduce.Job.getTaskCompletionEvents(Job.java:730) at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1759) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1698) at 
cn.itcast.mr.dedup.MatrixMultiplication.main(MatrixMultiplication.java:128) 21:37:32.630 [main] DEBUG org.apache.hadoop.security.UserGroupInformation - PrivilegedAction [as: С (auth:SIMPLE)][action: org.apache.hadoop.mapreduce.Job$1@773e2eb5] java.lang.Exception: null at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1950) at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:329) at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:613) at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1736) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1698) at cn.itcast.mr.dedup.MatrixMultiplication.main(MatrixMultiplication.java:128) 21:37:32.631 [main] DEBUG org.apache.hadoop.security.UserGroupInformation - PrivilegedAction [as: С (auth:SIMPLE)][action: org.apache.hadoop.mapreduce.Job$1@d8948cd] java.lang.Exception: null at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1950) at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:329) at org.apache.hadoop.mapreduce.Job.isSuccessful(Job.java:625) at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1763) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1698) at cn.itcast.mr.dedup.MatrixMultiplication.main(MatrixMultiplication.java:128) 21:37:32.631 [main] INFO org.apache.hadoop.mapreduce.Job - Job job_local1106899704_0001 failed with state FAILED due to: NA 21:37:32.631 [main] DEBUG org.apache.hadoop.security.UserGroupInformation - PrivilegedAction [as: С (auth:SIMPLE)][action: org.apache.hadoop.mapreduce.Job$8@7abe27bf] java.lang.Exception: null at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1950) at org.apache.hadoop.mapreduce.Job.getCounters(Job.java:818) at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1770) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1698) at cn.itcast.mr.dedup.MatrixMultiplication.main(MatrixMultiplication.java:128) 
21:37:32.651 [main] INFO org.apache.hadoop.mapreduce.Job - Counters: 0 21:37:32.651 [main] DEBUG org.apache.hadoop.security.UserGroupInformation - PrivilegedAction [as: С (auth:SIMPLE)][action: org.apache.hadoop.mapreduce.Job$1@2679311f] java.lang.Exception: null at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1950) at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:329) at org.apache.hadoop.mapreduce.Job.isSuccessful(Job.java:625) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1710) at cn.itcast.mr.dedup.MatrixMultiplication.main(MatrixMultiplication.java:128) 21:37:32.653 [shutdown-hook-0] DEBUG org.apache.hadoop.fs.FileSystem - FileSystem.close() by method: org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:1530)); Key: (С (auth:SIMPLE))@hdfs://192.168.88.101:8020; URI: hdfs://192.168.88.101:8020; Object Identity Hash: 2e075efe 21:37:32.653 [shutdown-hook-0] DEBUG org.apache.hadoop.ipc.Client - stopping client from cache: Client-e9ac678cebb441d58dd3dc3f8f54b798 21:37:32.654 [shutdown-hook-0] DEBUG org.apache.hadoop.ipc.Client - removing client from cache: Client-e9ac678cebb441d58dd3dc3f8f54b798 21:37:32.654 [shutdown-hook-0] DEBUG org.apache.hadoop.ipc.Client - stopping actual client because no more references remain: Client-e9ac678cebb441d58dd3dc3f8f54b798 21:37:32.654 [shutdown-hook-0] DEBUG org.apache.hadoop.ipc.Client - Stopping client 21:37:32.655 [IPC Client (1759899303) connection to xxjdxnj/192.168.88.101:8020 from С] DEBUG org.apache.hadoop.ipc.Client - IPC Client (1759899303) connection to xxjdxnj/192.168.88.101:8020 from С: closed 21:37:32.655 [IPC Client (1759899303) connection to xxjdxnj/192.168.88.101:8020 from С] DEBUG org.apache.hadoop.ipc.Client - IPC Client (1759899303) connection to xxjdxnj/192.168.88.101:8020 from С: stopped, remaining connections 0 21:37:32.655 [shutdown-hook-0] DEBUG org.apache.hadoop.fs.FileSystem - FileSystem.close() by method: 
org.apache.hadoop.fs.FilterFileSystem.close(FilterFileSystem.java:529)); Key: (С (auth:SIMPLE))@file://; URI: file:///; Object Identity Hash: 2a38dfe6 21:37:32.655 [shutdown-hook-0] DEBUG org.apache.hadoop.fs.FileSystem - FileSystem.close() by method: org.apache.hadoop.fs.RawLocalFileSystem.close(RawLocalFileSystem.java:895)); Key: null; URI: file:///; Object Identity Hash: 6f3a54c5 21:37:32.656 [shutdown-hook-0] DEBUG org.apache.hadoop.hdfs.KeyProviderCache - Invalidating all cached KeyProviders. 21:37:32.656 [Thread-1] DEBUG org.apache.hadoop.util.ShutdownHookManager - Completed shutdown in 0.004 seconds; Timeouts: 0 21:37:32.664 [Thread-1] DEBUG org.apache.hadoop.util.ShutdownHookManager - ShutdownHookManager completed shutdown. Process finished with exit code 1
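import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    /*
     * MapReduceBase class: base class for implementations of the Mapper and Reducer interfaces (old API).
     * Mapper interface:
     * WritableComparable interface: classes that implement WritableComparable can be compared with
     * each other. Every class used as a key must implement this interface.
     */
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

        /*
         * LongWritable, IntWritable and Text are Hadoop classes that wrap Java data types. They implement
         * WritableComparable and can be serialized, which makes it easy to exchange data in a distributed
         * environment. They can be seen as replacements for long, int and String.
         */
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text(); // Text extends BinaryComparable, so it can be used as a key

        /*
         * The map method of the Mapper interface in the old API:
         *   void map(K1 key, V1 value, OutputCollector<K2,V2> output, Reporter reporter)
         * It maps a single input <K1,V1> pair to intermediate output <K2,V2> pairs.
         * The intermediate pairs need not have the same types as the input pair, and one input pair
         * can map to zero or more output pairs.
         * OutputCollector interface: collects the <K,V> pairs output by a Mapper or Reducer.
         * OutputCollector.collect(k, v): adds one (k, v) pair to the output.
         * Reporter: used to report the progress of the whole application.
         *
         * Note: the method below uses the new API, where a single Context parameter replaces
         * OutputCollector and Reporter, and output is emitted with context.write(k, v).
         */
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            /**
             * Raw data (using test1.txt as the example):
             *   tale as old as time
             *   true as it can be
             *   beauty and the beast
             * In the map phase the data is passed to map in the following form, where the key is the offset:
             *   <0 tale as old as time> <21 world java hello> <39 you me too>
             */
            /**
             * After splitting, the key/value pairs <K2,V2> are obtained (test1.txt only).
             * Format: the word is the key, the number is the value.
             * tale 1
             * as 1
             * old 1
             * as 1
             * time 1
             * true 1
             * as 1
             * it 1
             * can 1
             * be 1
             * beauty 1
             * and 1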
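The comments in the snippet above describe the old `map(key, value, OutputCollector, Reporter)` signature, while the method itself takes a `Context`. The sketch below puts the two call shapes side by side. To keep it runnable without Hadoop on the classpath, `OutputCollector` and `Context` here are hypothetical stand-in interfaces that model only `collect` and `write`, and the class and method names (`CollectVsWrite`, `mapOld`, `mapNew`, `runOld`, `runNew`) are invented for illustration. In real Hadoop the old interface is `org.apache.hadoop.mapred.OutputCollector` and the new one is the `Context` nested in `org.apache.hadoop.mapreduce.Mapper`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class CollectVsWrite {
    // Hypothetical stand-in for the old-API OutputCollector; only collect() is modeled.
    interface OutputCollector<K, V> {
        void collect(K key, V value);
    }

    // Hypothetical stand-in for the new-API Mapper.Context; only write() is modeled.
    interface Context<K, V> {
        void write(K key, V value);
    }

    // Old-API shape: each pair is emitted with output.collect(key, value).
    static void mapOld(String line, OutputCollector<String, Integer> output) {
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            output.collect(itr.nextToken(), 1);
        }
    }

    // New-API shape: each pair is emitted with context.write(key, value).
    static void mapNew(String line, Context<String, Integer> context) {
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            context.write(itr.nextToken(), 1);
        }
    }

    // Runs the old-style mapper and records each emitted pair as "key<TAB>value".
    static List<String> runOld(String line) {
        List<String> out = new ArrayList<>();
        mapOld(line, (k, v) -> out.add(k + "\t" + v));
        return out;
    }

    // Runs the new-style mapper and records each emitted pair the same way.
    static List<String> runNew(String line) {
        List<String> out = new ArrayList<>();
        mapNew(line, (k, v) -> out.add(k + "\t" + v));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(runOld("tale as old as time"));
        System.out.println(runNew("tale as old as time"));
    }
}
```

Both versions emit the same pairs. What changes when porting real code is the package (`mapred` versus `mapreduce`), the parameter list of `map`, and the name of the emitting call.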
package cn.itcast.mr.dedup; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.io.Text; import org.apache.hadoop.mapreduce.Job; import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; import org.apache.hadoop.util.GenericOptionsParser; public class ParallelFPGrowth { public static void main(String[] args) throws Exception { Configuration conf = new Configuration(); // 配置Hadoop集群连接 conf.setInt("ipc.maximum.data.length", 2000000000); conf.setInt("ipc.maximum.response.length", 2000000000); conf.setInt("dfs.client.socket-timeout", 1200000); conf.setBoolean("mapreduce.map.output.compress", true); conf.set("mapreduce.map.output.compress.codec", "org.apache.hadoop.io.compress.SnappyCodec"); conf.set("fs.defaultFS", "hdfs://192.168.88.101:8020"); // PFP算法参数配置 conf.setFloat("pfp.min.support", 0.05f); // 最小支持度 conf.setInt("pfp.num.groups", 10); // 分组数量 conf.setInt("pfp.max.heap.size", 50); // 最大堆大小 // 直接指定输入输出路径 String inputPath = "hdfs://192.168.88.101:8020/input1"; String outputPath = "hdfs://192.168.88.101:8020/output2"; // 第一阶段:计算频繁项 Job job1 = Job.getInstance(conf, "PFP Pass 1"); job1.setJarByClass(ParallelFPGrowth.class); job1.setMapperClass(PFPMapper.Pass1Mapper.class); job1.setReducerClass(PFPReducer.Pass1Reducer.class); job1.setOutputKeyClass(Text.class); job1.setOutputValueClass(IntWritable.class); FileInputFormat.addInputPath(job1, new Path(inputPath)); FileOutputFormat.setOutputPath(job1, new Path(outputPath + "/pass1")); if (!job1.waitForCompletion(true)) { System.exit(1); } // 第二阶段:并行FP - Growth Job job2 = Job.getInstance(conf, "PFP Pass 2"); job2.setJarByClass(ParallelFPGrowth.class); job2.setMapperClass(PFPMapper.Pass2Mapper.class); job2.setReducerClass(PFPReducer.Pass2Reducer.class); job2.setOutputKeyClass(Text.class); job2.setOutputValueClass(Text.class); FileInputFormat.addInputPath(job2, new 
Path(inputPath)); FileOutputFormat.setOutputPath(job2, new Path(outputPath + "/pass2")); System.exit(job2.waitForCompletion(true)? 0 : 1); } };package cn.itcast.mr.dedup; import java.util.List; public class Pattern { private List items; private int support; public Pattern(List<String> items, int support) { this.items = items; this.support = support; } public List<String> getItems() { return items; } public int getSupport() { return support; } @Override public String toString() { return items.toString() + " (" + support + ")"; } };package cn.itcast.mr.dedup; import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.io.LongWritable; import org.apache.hadoop.io.Text; import org.apache.hadoop.mapreduce.Mapper; import java.io.IOException; import java.util.ArrayList; import java.util.List; public class PFPMapper { // 第一阶段Mapper:计算项频次 public static class Pass1Mapper extends Mapper<LongWritable, Text, Text, IntWritable> { private final static IntWritable one = new IntWritable(1); private Text item = new Text(); @Override protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { String[] items = value.toString().split(" "); for (String i : items) { // 只保留字母组成的标签(过滤标点、数字等) if (i.matches("[a-zA-Z]+")) { item.set(i); context.write(item, one); } } } } // 第二阶段Mapper:分组处理事务 public static class Pass2Mapper extends Mapper<LongWritable, Text, Text, Text> { private Text groupId = new Text(); private Text transaction = new Text(); @Override protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { String[] items = value.toString().split("\\s+"); if (items.length == 0) return; // 根据第一个项的哈希值决定分组 int group = Math.abs(items[0].hashCode()) % context.getConfiguration().getInt("pfp.num.groups", 10); groupId.set(String.valueOf(group)); // 构建事务字符串 StringBuilder sb = new StringBuilder(); for (String item : items) { sb.append(item).append(" "); } transaction.set(sb.toString().trim()); 
context.write(groupId, transaction); } } };package cn.itcast.mr.dedup; import org.apache.hadoop.io.Text; import org.apache.hadoop.mapreduce.Reducer; import java.io.IOException; import java.util.ArrayList; import java.util.Arrays; import java.util.List; import java.util.Map; public class PFPReducer { // 第一阶段Reducer:汇总项频次 public static class Pass1Reducer extends Reducer<Text, org.apache.hadoop.io.IntWritable, Text, org.apache.hadoop.io.IntWritable> { private org.apache.hadoop.io.IntWritable result = new org.apache.hadoop.io.IntWritable(); @Override protected void reduce(Text key, Iterable<org.apache.hadoop.io.IntWritable> values, Context context) throws IOException, InterruptedException { int sum = 0; for (org.apache.hadoop.io.IntWritable val : values) { sum += val.get(); } result.set(sum); context.write(key, result); } } // 第二阶段Reducer:执行优化后的FP-Growth算法 public static class Pass2Reducer extends Reducer<Text, Text, Text, Text> { private Text result = new Text(); private int numThreads; @Override protected void setup(Context context) throws IOException, InterruptedException { super.setup(context); // 从配置中获取线程数,默认为可用处理器数量 numThreads = context.getConfiguration().getInt("pfp.num.threads", Runtime.getRuntime().availableProcessors()); } @Override protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException { // 收集本组所有事务 List<List<String>> transactions = new ArrayList<>(); for (Text val : values) { String[] items = val.toString().split(" "); transactions.add(Arrays.asList(items)); } // 获取最小支持度 float minSupport = context.getConfiguration().getFloat("pfp.min.support", 0.05f); // 创建FPGrowth实例,使用多线程优化 FPGrowth fpGrowth = new FPGrowth(minSupport, numThreads); Map<List<String>, Integer> frequentPatterns = fpGrowth.findFrequentPatterns(transactions); // 输出频繁项集 for (Map.Entry<List<String>, Integer> entry : frequentPatterns.entrySet()) { StringBuilder patternStr = new StringBuilder(); for (String item : entry.getKey()) { 
                    patternStr.append(item).append(" ");
                }
                result.set(patternStr.toString().trim() + " (" + entry.getValue() + ")");
                context.write(key, result);
            }
        }
    }
};

package cn.itcast.mr.dedup;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

import java.io.IOException;

public class TransactionReader extends RecordReader<Text, Text> {
    private LineRecordReader lineRecordReader;
    private Text key;
    private Text value;

    public TransactionReader() {
        lineRecordReader = new LineRecordReader();
    }

    @Override
    public void initialize(InputSplit split, TaskAttemptContext context) throws IOException, InterruptedException {
        lineRecordReader.initialize(split, context);
    }

    @Override
    public boolean nextKeyValue() throws IOException, InterruptedException {
        if (!lineRecordReader.nextKeyValue()) {
            return false;
        }
        // Use the key from LineRecordReader (labelled "line number" in the original) as the key and the line content as the value
        if (key == null) {
            key = new Text();
        }
        if (value == null) {
            value = new Text();
        }
        key.set(String.valueOf(lineRecordReader.getCurrentKey().get()));
        value.set(lineRecordReader.getCurrentValue());
        return true;
    }

    @Override
    public Text getCurrentKey() throws IOException, InterruptedException {
        return key;
    }

    @Override
    public Text getCurrentValue() throws IOException, InterruptedException {
        return value;
    }

    @Override
    public float getProgress() throws IOException, InterruptedException {
        return lineRecordReader.getProgress();
    }

    @Override
    public void close() throws IOException {
        lineRecordReader.close();
    }
};

package cn.itcast.mr.dedup;

import java.util.HashMap;
import java.util.Map;

class TreeNode {
    String item;
    int count;
    TreeNode parent;
    Map<String, TreeNode> children;
    TreeNode nodeLink;

    public TreeNode(String item, int count, TreeNode parent) {
        this.item = item;
        this.count = count;
        this.parent = parent;
        this.children = new HashMap<>();
        this.nodeLink = null;
    }

    public void increment(int
count) {
        this.count += count;
    }
};

package cn.itcast.mr.dedup;

import java.util.*;
import java.util.concurrent.*;
import java.util.stream.Collectors;

/** FP-Growth implementation with parallel processing for better performance */
public class FPGrowth {
    private final float minSupport;
    private final int numThreads;
    private final ExecutorService executorService;

    public FPGrowth(float minSupport) {
        this(minSupport, Runtime.getRuntime().availableProcessors());
    }

    public FPGrowth(float minSupport, int numThreads) {
        this.minSupport = minSupport;
        this.numThreads = numThreads;
        this.executorService = Executors.newFixedThreadPool(numThreads);
    }

    public Map<List<String>, Integer> findFrequentPatterns(List<List<String>> transactions) {
        if (transactions.isEmpty()) {
            return Collections.emptyMap();
        }

        // Phase 1: count the global frequency of each item
        Map<String, Integer> globalItemCounts = countItems(transactions);

        // Compute the minimum support count
        int minCount = (int) Math.ceil(minSupport * transactions.size());

        // Filter and sort the frequent items
        List<String> frequentItems = globalItemCounts.entrySet().stream()
                .filter(e -> e.getValue() >= minCount)
                .sorted(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.comparingByKey()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());

        // Map each item to its index for faster lookups
        Map<String, Integer> itemIndexMap = new HashMap<>();
        for (int i = 0; i < frequentItems.size(); i++) {
            itemIndexMap.put(frequentItems.get(i), i);
        }

        // Phase 2: build the FP-tree
        TreeNode root = buildFPTree(transactions, itemIndexMap, minCount);

        // Phase 3: mine the frequent patterns
        Map<List<String>, Integer> frequentPatterns = new ConcurrentHashMap<>();
        try {
            // Create one task per frequent item
            List<Callable<Void>> tasks = new ArrayList<>();
            for (String item : frequentItems) {
                tasks.add(() -> {
                    List<String> prefixPath = Collections.singletonList(item);
                    Map<List<String>, Integer> conditionalPatterns =
                            findPatternsInConditionalTree(root, item, itemIndexMap, minCount);

                    // Add the conditional patterns to the result
                    for (Map.Entry<List<String>, Integer> entry : conditionalPatterns.entrySet()) {
                        List<String> pattern = new ArrayList<>(prefixPath);
                        pattern.addAll(entry.getKey());
                        frequentPatterns.put(pattern, entry.getValue());
                    }

                    // Add the support of the single item
                    frequentPatterns.put(prefixPath, globalItemCounts.get(item));
                    return null;
                });
            }

            // Run all tasks
            executorService.invokeAll(tasks);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("Interrupted while mining frequent patterns", e);
        } finally {
            executorService.shutdown();
        }

        return frequentPatterns;
    }

    private Map<String, Integer> countItems(List<List<String>> transactions) {
        Map<String, Integer> itemCounts = new ConcurrentHashMap<>();

        // Count item frequencies in parallel
        transactions.parallelStream().forEach(transaction -> {
            for (String item : transaction) {
                itemCounts.compute(item, (k, v) -> v == null ? 1 : v + 1);
            }
        });

        return itemCounts;
    }

    private TreeNode buildFPTree(List<List<String>> transactions, Map<String, Integer> itemIndexMap, int minCount) {
        // Create the root node
        TreeNode root = new TreeNode("Null", 0, null);

        // Build the header table
        Map<String, TreeNode> headerTable = new HashMap<>();

        // Process the transactions in parallel
        List<Runnable> tasks = new ArrayList<>();
        for (List<String> transaction : transactions) {
            // Filter and sort the items of the transaction
            List<String> filteredItems = transaction.stream()
                    .filter(itemIndexMap::containsKey)
                    .sorted(Comparator.comparingInt(itemIndexMap::get))
                    .collect(Collectors.toList());

            if (!filteredItems.isEmpty()) {
                tasks.add(() -> insertTransaction(filteredItems, root, headerTable));
            }
        }

        // Run the insert tasks in parallel
        executeTasksInParallel(tasks);

        return root;
    }

    private void insertTransaction(List<String> items, TreeNode root, Map<String, TreeNode> headerTable) {
        TreeNode currentNode = root;
        for (String item : items) {
            // Check whether a child for this item already exists
            TreeNode child = currentNode.children.get(item);
            if (child == null) {
                // Create a new node
                child = new TreeNode(item, 1, currentNode);
                currentNode.children.put(item, child);

                // Update the header table
                updateHeaderTable(item, child, headerTable);
            } else {
                // The node already exists; increase its count
                child.increment(1);
            }
            currentNode = child;
        }
    }

    private void updateHeaderTable(String item, TreeNode node, Map<String, TreeNode> headerTable) {
        synchronized (headerTable) {
            if (!headerTable.containsKey(item)) {
                headerTable.put(item,
node);
            } else {
                // Walk to the end of the node-link chain and append the new node
                TreeNode tail = headerTable.get(item);
                while (tail.nodeLink != null) {
                    tail = tail.nodeLink;
                }
                tail.nodeLink = node;
            }
        }
    }

    private Map<List<String>, Integer> findPatternsInConditionalTree(
            TreeNode root, String item, Map<String, Integer> itemIndexMap, int minCount) {
        // Collect all conditional pattern bases
        List<List<String>> conditionalPatternBases = new ArrayList<>();

        // Get all nodes for this item from the header table
        TreeNode node = findNodeInHeaderTable(root, item);
        while (node != null) {
            // Build the prefix path
            List<String> prefixPath = buildPrefixPath(node);
            if (!prefixPath.isEmpty()) {
                // The prefix path is repeated as many times as the current node's count
                for (int i = 0; i < node.count; i++) {
                    conditionalPatternBases.add(new ArrayList<>(prefixPath));
                }
            }
            node = node.nodeLink;
        }

        // Return immediately if there are no conditional pattern bases
        if (conditionalPatternBases.isEmpty()) {
            return Collections.emptyMap();
        }

        // Recursively mine the conditional FP-tree
        return findFrequentPatterns(conditionalPatternBases);
    }

    private TreeNode findNodeInHeaderTable(TreeNode root, String item) {
        // Simplified here; a real implementation needs to traverse the header table.
        // To keep the code short, assume the root's children hold the header-table information.
        return root.children.get(item);
    }

    private List<String> buildPrefixPath(TreeNode node) {
        List<String> path = new ArrayList<>();
        TreeNode current = node.parent;
        while (current != null && current.item != null) {
            path.add(0, current.item);
            current = current.parent;
        }
        return path;
    }

    private void executeTasksInParallel(List<Runnable> tasks) {
        // Run the tasks in parallel on the thread pool
        List<Future<?>> futures = new ArrayList<>();
        for (Runnable task : tasks) {
            futures.add(executorService.submit(task));
        }

        // Wait for all tasks to finish
        for (Future<?> future : futures) {
            try {
                future.get();
            } catch (Exception e) {
                throw new RuntimeException("Error while executing task", e);
            }
        }
    }
}

The code above produces the following errors at runtime:

17:32:52.294 [main] DEBUG org.apache.hadoop.security.UserGroupInformation - PrivilegedAction [as: С (auth:SIMPLE)][action: org.apache.hadoop.mapreduce.Job$6@1229a2b7]
java.lang.Exception: null
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1950)
    at org.apache.hadoop.mapreduce.Job.getTaskCompletionEvents(Job.java:730)
    at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1759)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1698)
    at cn.itcast.mr.dedup.ParallelFPGrowth.main(ParallelFPGrowth.java:57)
17:32:52.294 [main] DEBUG org.apache.hadoop.security.UserGroupInformation - PrivilegedAction [as: С (auth:SIMPLE)][action: org.apache.hadoop.mapreduce.Job$1@e5cbff2]
java.lang.Exception: null
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1950)
    at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:329)
    at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:613)
    at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1736)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1698)
    at cn.itcast.mr.dedup.ParallelFPGrowth.main(ParallelFPGrowth.java:57)

However, my cluster is running normally, the output folder on that path has already been deleted, and I am running the job directly from IDEA. What is the best way to fix this?
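Both traces above are logged at DEBUG level, so they may be diagnostic output rather than the real failure. One thing that does look worth checking in the FPGrowth class is the thread pool: findFrequentPatterns shuts down the shared executorService in its finally block, yet findPatternsInConditionalTree calls findFrequentPatterns again on the same instance. A standard java.util.concurrent ExecutorService rejects new tasks once shutdown() has been called. The sketch below only illustrates that behaviour and one possible alternative (a pool owned by each call). The class name ExecutorReuseDemo and its methods are hypothetical and not part of the code above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical demo class; not part of the code in the comment.
public class ExecutorReuseDemo {

    // Returns true if a pool that has been shut down rejects a new task.
    static boolean submitAfterShutdownIsRejected() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.shutdown();
        Callable<Integer> task = () -> 1;
        try {
            pool.submit(task);
            return false;
        } catch (RejectedExecutionException e) {
            return true;
        }
    }

    // One possible alternative: each call creates and shuts down its own pool,
    // so the method can safely be called more than once.
    static int sumSquares(List<Integer> values) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (Integer v : values) {
                tasks.add(() -> v * v);
            }
            int sum = 0;
            // invokeAll blocks until every task has completed
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                sum += f.get();
            }
            return sum;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("rejected after shutdown: " + submitAfterShutdownIsRejected());
        List<Integer> values = Arrays.asList(1, 2, 3);
        System.out.println("first call:  " + sumSquares(values));
        System.out.println("second call: " + sumSquares(values));
    }
}
```

Whether this is related to the behaviour you see depends on the actual task logs; the job's own ERROR or WARN output (or the failed task attempt's log) would be the place to confirm.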
06-27