hadoop 2.x - HDFS snapshot


  I don't want to reinvent the wheels of open source; instead, I just want to explore the implied features and use cases as far as possible, so I will write something here as a summary or memo.

Agenda

 1. what is

 2. how to

 3. hadoop snapshot vs hbase snapshot

 4. demos to use snapshot

 

  1. what is

  A long time ago, the term 'snapshot' was introduced to describe 'the aspect of something at a point in time', e.g. a memory snapshot, a database snapshot, or even Google's page snapshot. They all carry a similar or close meaning: a certain view/image of one thing at a point in history.

  Likewise, Hadoop's snapshot uses this 'view' to capture a set of files at a point in time. So its usages look like this:

  a. a periodic backup

  b. restoring some key data after a mistaken deletion

  c. isolating some important data from production for testing, comparison, etc.
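For use case (a), a periodic backup can be sketched as a cron entry that takes a daily snapshot. This is a minimal sketch, assuming an HDFS directory /data; the path, snapshot name, and schedule are all illustrative:

```shell
# one-time admin step: mark the directory as snapshottable
hdfs dfsadmin -allowSnapshot /data

# hypothetical crontab entry: take a dated snapshot of /data every day at 02:00
# 0 2 * * * hdfs dfs -createSnapshot /data "daily-$(date +\%Y\%m\%d)"
```

Old snapshots can later be dropped with `hdfs dfs -deleteSnapshot /data <name>` to let the namenode reclaim the retained state.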

 

  And this snapshot has some notable features:

  - no data is moved or copied, so network bandwidth is not affected

  - it does not create many extra tasks for the namenode or datanodes to handle, so reliability is preserved as well

 

  2. how to

  Hadoop snapshots benefit from HDFS's write-once, read-many characteristic. When you create a snapshot on a directory, the namenode registers it as a snapshottable directory and protects it: all subsequent operations, including deletion, move, or creation of files and directories, only affect the 'metadata' in the namenode, so the actual files and directories are not touched immediately. Later, if you want to restore some files/directories, you can move or copy the snapshotted files or directories from the '.snapshot' directory to anywhere you want. When you delete the snapshot created before, the deferred operations take effect.
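The create/delete/restore cycle above can be walked through with the standard HDFS CLI. A minimal sketch, assuming an existing HDFS directory /data and a file important.txt inside it (both names are illustrative):

```shell
hdfs dfsadmin -allowSnapshot /data        # admin: mark /data as snapshottable
hdfs dfs -createSnapshot /data s0         # point-in-time view appears under /data/.snapshot/s0
hdfs dfs -rm /data/important.txt          # the 'delete' only updates namenode metadata
hdfs dfs -cp /data/.snapshot/s0/important.txt /data   # restore the file from the snapshot
hdfs dfs -deleteSnapshot /data s0         # drop the snapshot; deferred reclamation can now happen
```

Note that between the `rm` and the `deleteSnapshot`, the file's blocks are still held on the datanodes because the snapshot references them.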

  For a deeper study of the underlying 'linked data structure', check out the paper 'Making Data Structures Persistent'.

 

  3. hadoop snapshot vs hbase snapshot

  Judging from the release timeline of Hadoop and HBase, I think Hadoop's snapshot was inspired by HBase's :) , so the underlying implementations are similar. Here are some differences between the two snapshots:

                                              hadoop    hbase    supplement
  copy/move data                              n         n
  generate new files referring
  to the original files                       n         y        hbase will generate many temp
                                                                 files pointing to the real hdfs files

  So for an HBase cluster, I think it is unnecessary to back up (snapshot) HDFS again if HBase snapshots are already in use; otherwise it is necessary. The point is that the two kinds of snapshots overlap heavily.

 

  4. demos to use snapshot

  There are some usage demos on the Apache official site [2], but I want to point out that this snapshot is read-only (RO) rather than RW; hence making any changes inside the '.snapshot' directory will cause errors. In addition, if you want to check the real principles behind the commands, see the details in 'NameNodeRpcServer.java'.
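Beyond create/restore, the CLI can also inspect snapshots. A minimal sketch, assuming a snapshottable directory /data that already has two snapshots named s0 and s1 (all names are illustrative):

```shell
hdfs lsSnapshottableDir                 # list all snapshottable directories for this user
hdfs snapshotDiff /data s0 s1           # show +/-/M/R entries that changed between s0 and s1

# snapshots are read-only: any write into .snapshot is expected to fail, e.g.
hdfs dfs -put local.txt /data/.snapshot/s0/
```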

 

 

ref:

[1] jira: Support for RW/RO snapshots in HDFS

[2] HDFS Snapshots

[3] hbase - tables replication/snapshot/backup within/cross clusters

[4] hadoop 2.x - new features
