OpenStack Swift Source Code Analysis (2): Generating the Ring File

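This article analyzes the key operations in the swift-ring-builder script, in particular the rebalance method, which regenerates the ring file so that partitions are evenly distributed across the system. It explains how consistent hashing, replicas, zones, and weights are put into effect through this method, and it covers the main steps and the error handling of a rebalance.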


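The previous article looked at the swift-ring-builder script. Its most complex and most important operation is the rebalance method. After you modify the builder file (for example by adding or removing devices), rebalance regenerates the ring file so that partitions are evenly distributed across the system (after a rebalance, the system's services need to be restarted). Consistent hashing, replicas, zones, and weights are all put into effect through it.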

Source excerpt:

The rebalance method in swift-ring-builder:

def rebalance():
    """
swift-ring-builder <builder_file> rebalance
    Attempts to rebalance the ring by reassigning partitions that haven't been
    recently reassigned.
    """
    # devs_changed records whether the builder's devices have changed. It
    # defaults to False and is set to True by add_dev, set_dev_weight and
    # remove_dev.
    devs_changed = builder.devs_changed
    try:
        # get_balance returns the ring's balance as a percentage, e.g. 0.83.
        last_balance = builder.get_balance()
        # The main rebalance call: returns the number of partitions that
        # were reassigned and the new balance.
        parts, balance = builder.rebalance()
    except exceptions.RingBuilderError, e:
        print '-' * 79
        print ("An error has occurred during ring validation. Common\n"
               "causes of failure are rings that are empty or do not\n"
               "have enough devices to accommodate the replica count.\n"
               "Original exception message:\n %s" % e.message
               )
        print '-' * 79
        exit(EXIT_ERROR)
    if not parts:
        print 'No partitions could be reassigned.'
        print 'Either none need to be or none can be due to ' \
              'min_part_hours [%s].' % builder.min_part_hours
        exit(EXIT_WARNING)
    if not devs_changed and abs(last_balance - balance) < 1:
        print 'Cowardly refusing to save rebalance as it did not change ' \
              'at least 1%.'
        exit(EXIT_WARNING)
    try:
        # Safety check that catches bugs: makes sure partitions are assigned
        # to real devices, are not assigned twice, and so on.
        builder.validate()
    except exceptions.RingValidationError, e:
        print '-' * 79
        print ("An error has occurred during ring validation. Common\n"
               "causes of failure are rings that are empty or do not\n"
               "have enough devices to accommodate the replica count.\n"
               "Original exception message:\n %s" % e.message
               )
        print '-' * 79
        exit(EXIT_ERROR)
    # Print the result of the rebalance.
    print 'Reassigned %d (%.02f%%) partitions. Balance is now %.02f.' % \
          (parts, 100.0 * parts / builder.parts, balance)
    status = EXIT_SUCCESS
    # A balance above 5 triggers a note telling the operator to wait at
    # least min_part_hours before rebalancing again.
    if balance > 5:
        print '-' * 79
        print 'NOTE: Balance of %.02f indicates you should push this ' % \
              balance
        print '      ring, wait at least %d hours, and rebalance/repush.' \
              % builder.min_part_hours
        print '-' * 79
        status = EXIT_WARNING
    ts = time()  # Timestamp used to name the backup copies.
    # Save timestamped backups of the new ring and builder files first,
    # then overwrite the ring file and the builder file themselves.
    builder.get_ring().save(
        pathjoin(backup_dir, '%d.' % ts + basename(ring_file)))
    pickle.dump(builder.to_dict(), open(pathjoin(backup_dir,
        '%d.' % ts + basename(argv[1])), 'wb'), protocol=2)
    builder.get_ring().save(ring_file)
    pickle.dump(builder.to_dict(), open(argv[1], 'wb'), protocol=2)
    exit(status)

 

I have added some comments of my own to make the code easier to follow. The command actually delegates to the rebalance method in builder.py.
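The balance figure that the command compares before and after rebalancing can be pictured with a small standalone sketch. It assumes that balance means the largest percentage by which any device's partition count deviates from its weight-proportional share. The function name `balance_of` and the sample device dicts are hypothetical, the code is written for Python 3, and it is not the builder's actual implementation.

```python
def balance_of(devs, total_part_replicas):
    """Return the worst percentage deviation from a device's fair share.

    Assumption: a device's fair share is proportional to its weight.
    """
    total_weight = sum(d['weight'] for d in devs)
    worst = 0.0
    for d in devs:
        if not d['weight']:
            continue  # zero-weight devices want no partitions in this sketch
        wanted = total_part_replicas * d['weight'] / total_weight
        deviation = abs(100.0 * d['parts'] / wanted - 100.0)
        worst = max(worst, deviation)
    return worst


# Three equally weighted devices sharing 768 partition replicas.
devs = [
    {'id': 0, 'weight': 100.0, 'parts': 260},
    {'id': 1, 'weight': 100.0, 'parts': 252},
    {'id': 2, 'weight': 100.0, 'parts': 256},
]
print(balance_of(devs, 768))
```

Under this reading, a result below 1 means every device is within 1% of its ideal share, which is the threshold the command uses when deciding whether a rebalance is worth saving.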

The rebalance method in builder.py:

def rebalance(self):
    """
    Rebalance the ring.

    This is the main work function of the builder, as it will assign and
    reassign partitions to devices in the ring based on weights, distinct
    zones, recent reassignments, etc.

    The process doesn't always perfectly assign partitions (that'd take a
    lot more analysis and therefore a lot more time -- I had code that did
    that before). Because of this, it keeps rebalancing until the device
    skew (number of partitions a device wants compared to what it has) gets
    below 1% or doesn't change by more than 1% (only happens with ring that
    can't be balanced no matter what -- like with 3 zones of differing
    weights with replicas set to 3).

    :returns: (number_of_partitions_altered, resulting_balance)
    """
    self._ring = None  # Invalidate the ring cached on this instance.
    if self._last_part_moves_epoch is None:
        # First rebalance: assign every partition, with some extra
        # initialization.
        self._initial_balance()
        self.devs_changed = False
        return self.parts, self.get_balance()
    retval = 0
    # Update the record of when each partition was last moved.
    self._update_last_part_moves()
    last_balance = 0
    while True:
        # Gather the list of (part, replicas) pairs that need reassigning.
        reassign_parts = self._gather_reassign_parts()
        # Perform the actual reassignment.
        self._reassign_parts(reassign_parts)
        retval += len(reassign_parts)
        while self._remove_devs:
            # Drop the devices that were marked for removal.
            self.devs[self._remove_devs.pop()['id']] = None
        balance = self.get_balance()  # Compute the new balance.
        if balance < 1 or abs(last_balance - balance) < 1 or \
                retval == self.parts:
            break
        last_balance = balance
    self.devs_changed = False
    self.version += 1
    return retval, balance

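The path taken depends on whether _last_part_moves_epoch is None. If it is None (meaning this is the first rebalance), the method calls _initial_balance() and returns the result. What that call does is broadly the same as the path taken when _last_part_moves_epoch is not None, except that _initial_balance also performs some initialization. The method that actually carries out the reassignment is _reassign_parts.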
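The stopping rule of the loop in rebalance can be isolated in a short sketch. The helper `rebalance_loop` and the callable `step` are hypothetical stand-ins for one gather-and-reassign pass; only the three exit conditions are taken from the excerpt. The sketch is written for Python 3.

```python
def rebalance_loop(step, total_parts):
    """Repeat `step` until one of the three stop conditions holds.

    `step` is a hypothetical callable that returns (parts_moved, balance)
    for a single gather-and-reassign pass.
    """
    moved = 0
    last_balance = 0
    while True:
        parts_moved, balance = step()
        moved += parts_moved
        # Stop when balanced, when progress stalls, or when every
        # partition has already been moved once.
        if (balance < 1 or abs(last_balance - balance) < 1
                or moved == total_parts):
            break
        last_balance = balance
    return moved, balance


# Simulated passes in which the balance improves from 40 to 0.5.
passes = iter([(500, 40.0), (200, 12.0), (50, 3.0), (10, 0.5)])
print(rebalance_loop(lambda: next(passes), total_parts=1024))
```

The second condition is the one that ends the loop for rings that cannot be balanced: once a pass changes the balance by less than 1, further passes are assumed not to help.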

The _reassign_parts method in builder.py, which performs the partition assignment:

def _reassign_parts(self, reassign_parts):
    """
    For an existing ring data set, partitions are reassigned similarly to
    the initial assignment. The devices are ordered by how many partitions
    they still want and kept in that order throughout the process. The
    gathered partitions are iterated through, assigning them to devices
    according to the "most wanted" while keeping the replicas as "far
    apart" as possible. Two different zones are considered the
    farthest-apart things, followed by different ip/port pairs within a
    zone; the least-far-apart things are different devices with the same
    ip/port pair in the same zone.

    If you want more replicas than devices, you won't get all your
    replicas.

    :param reassign_parts: An iterable of (part, replicas_to_replace)
                           pairs. replicas_to_replace is an iterable of the
                           replica (an int) to replace for that partition.
                           replicas_to_replace may be shared for multiple
                           partitions, so be sure you do not modify it.
    """
    for dev in self._iter_devs():
        # Give every device a sort_key.
        dev['sort_key'] = self._sort_key_for(dev)
    # Collect the usable devices (non-zero weight), sorted by sort_key.
    available_devs = \
        sorted((d for d in self._iter_devs() if d['weight']),
               key=lambda x: x['sort_key'])

    # Build the tier hierarchy for the available devices.
    tier2children = build_tier_tree(available_devs)

    tier2devs = defaultdict(list)  # tier -> devices in that tier
    tier2sort_key = defaultdict(list)  # tier -> sort keys of those devices
    tiers_by_depth = defaultdict(set)  # depth -> tiers at that depth
    # Index the devices by tier in each of these three ways.
    for dev in available_devs:
        for tier in tiers_for_dev(dev):
            tier2devs[tier].append(dev)  # <-- starts out sorted!
            tier2sort_key[tier].append(dev['sort_key'])
            tiers_by_depth[len(tier)].add(tier)

    for part, replace_replicas in reassign_parts:
        # Gather up what other tiers (zones, ip_ports, and devices) the
        # replicas not-to-be-moved are in for this part.
        # Replica counts are keyed by tier (zone, ip_port, device id).
        other_replicas = defaultdict(lambda: 0)
        for replica in xrange(self.replicas):
            if replica not in replace_replicas:
                dev = self.devs[self._replica2part2dev[replica][part]]
                for tier in tiers_for_dev(dev):
                    # Count the replicas that are staying where they are.
                    other_replicas[tier] += 1

        def find_home_for_replica(tier=(), depth=1):
            # Order the tiers by how many replicas of this
            # partition they already have. Then, of the ones
            # with the smallest number of replicas, pick the
            # tier with the hungriest drive and then continue
            # searching in that subtree.
            #
            # There are other strategies we could use here,
            # such as hungriest-tier (i.e. biggest
            # sum-of-parts-wanted) or picking one at random.
            # However, hungriest-drive is what was used here
            # before, and it worked pretty well in practice.
            #
            # Note that this allocator will balance things as
            # evenly as possible at each level of the device
            # layout. If your layout is extremely unbalanced,
            # this may produce poor results.
            # Tier by tier, find the children holding the fewest
            # replicas of this part.
            candidate_tiers = tier2children[tier]
            min_count = min(other_replicas[t] for t in candidate_tiers)
            candidate_tiers = [t for t in candidate_tiers
                               if other_replicas[t] == min_count]
            candidate_tiers.sort(
                key=lambda t: tier2sort_key[t][-1])

            if depth == max(tiers_by_depth.keys()):
                return tier2devs[candidate_tiers[-1]][-1]

            return find_home_for_replica(tier=candidate_tiers[-1],
                                         depth=depth + 1)

        # Update the bookkeeping for each device that receives a replica.
        for replica in replace_replicas:
            dev = find_home_for_replica()
            dev['parts_wanted'] -= 1
            dev['parts'] += 1
            old_sort_key = dev['sort_key']
            new_sort_key = dev['sort_key'] = self._sort_key_for(dev)
            for tier in tiers_for_dev(dev):
                other_replicas[tier] += 1

                index = bisect.bisect_left(tier2sort_key[tier],
                                           old_sort_key)
                tier2devs[tier].pop(index)
                tier2sort_key[tier].pop(index)

                new_index = bisect.bisect_left(tier2sort_key[tier],
                                               new_sort_key)
                tier2devs[tier].insert(new_index, dev)
                tier2sort_key[tier].insert(new_index, new_sort_key)

            # Assign this replica of this part to dev['id'].
            self._replica2part2dev[replica][part] = dev['id']

    # Just to save memory and keep from accidental reuse.
    for dev in self._iter_devs():
        del dev['sort_key']

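This function implements the reassignment. The key concept is the three-tier structure provided by utils.py, whose helpers return the three tiers for a single device, or a dictionary describing the whole tier tree for a list of devices.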
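The two helpers named in the excerpt, `tiers_for_dev` and `build_tier_tree`, can be approximated as follows. This is an illustrative Python 3 reimplementation based only on how they are used above, not a copy of the Swift code; the device dict keys `zone`, `ip`, `port` and `id` are assumptions.

```python
from collections import defaultdict


def tiers_for_dev(dev):
    """Return the (zone,), (zone, ip_port), (zone, ip_port, id) tiers."""
    zone = dev['zone']
    ip_port = '%s:%s' % (dev['ip'], dev['port'])
    return ((zone,), (zone, ip_port), (zone, ip_port, dev['id']))


def build_tier_tree(devs):
    """Map every tier to the set of tiers directly beneath it."""
    tier2children = defaultdict(set)
    for dev in devs:
        for tier in tiers_for_dev(dev):
            # The parent of a tier is the same tuple minus its last
            # element; the parent of a zone tier is the empty tuple.
            tier2children[tier[:-1]].add(tier)
    return tier2children


devs = [
    {'id': 0, 'zone': 1, 'ip': '192.168.1.1', 'port': 6000},
    {'id': 1, 'zone': 1, 'ip': '192.168.1.1', 'port': 6000},
    {'id': 6, 'zone': 2, 'ip': '192.168.2.1', 'port': 6000},
]
tree = build_tier_tree(devs)
print(sorted(tree[()]))
print(sorted(tree[(1, '192.168.1.1:6000')]))
```

Keying the tree by tuples means the empty tuple acts as the root, so a search can start at `()` and descend one tuple element at a time.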

The source gives an example:

  Example:

    zone 1 -+---- 192.168.1.1:6000 -+---- device id 0
            |                       |
            |                       +---- device id 1
            |                       |
            |                       +---- device id 2
            |
            +---- 192.168.1.2:6000 -+---- device id 3
                                    |
                                    +---- device id 4
                                    |
                                    +---- device id 5

    zone 2 -+---- 192.168.2.1:6000 -+---- device id 6
            |                       |
            |                       +---- device id 7
            |                       |
            |                       +---- device id 8
            |
            +---- 192.168.2.2:6000 -+---- device id 9
                                    |
                                    +---- device id 10
                                    |
                                    +---- device id 11

    The tier tree would look like:

    {
      (): [(1,), (2,)],

      (1,): [(1, 192.168.1.1:6000),
             (1, 192.168.1.2:6000)],
      (2,): [(2, 192.168.2.1:6000),
             (2, 192.168.2.2:6000)],

      (1, 192.168.1.1:6000): [(1, 192.168.1.1:6000, 0),
                              (1, 192.168.1.1:6000, 1),
                              (1, 192.168.1.1:6000, 2)],
      (1, 192.168.1.2:6000): [(1, 192.168.1.2:6000, 3),
                              (1, 192.168.1.2:6000, 4),
                              (1, 192.168.1.2:6000, 5)],
      (2, 192.168.2.1:6000): [(2, 192.168.2.1:6000, 6),
                              (2, 192.168.2.1:6000, 7),
                              (2, 192.168.2.1:6000, 8)],
      (2, 192.168.2.2:6000): [(2, 192.168.2.2:6000, 9),
                              (2, 192.168.2.2:6000, 10),
                              (2, 192.168.2.2:6000, 11)],
    }


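Devices are divided into three tiers by zone, ip_port and device_id, and the subsequent steps operate tier by tier (this is how concepts such as zones and replicas are put into effect).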
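The tier-by-tier descent can be illustrated with a simplified standalone sketch. It assumes that a device's hunger is just its `parts_wanted` value, whereas the excerpt orders devices by a `sort_key` that is not shown here. The names `place_replica` and `make_dev` are hypothetical, and the code is written for Python 3.

```python
from collections import defaultdict


def tiers_for_dev(dev):
    ip_port = '%s:%s' % (dev['ip'], dev['port'])
    return ((dev['zone'],), (dev['zone'], ip_port),
            (dev['zone'], ip_port, dev['id']))


def place_replica(devs, replica_counts):
    """Pick a device by descending zone -> ip:port -> device.

    At each level, keep the child tiers that hold the fewest replicas of
    this partition and follow the one containing the hungriest device.
    """
    children = defaultdict(set)
    hunger = defaultdict(lambda: float('-inf'))
    dev_by_tier = {}
    for dev in devs:
        for tier in tiers_for_dev(dev):
            children[tier[:-1]].add(tier)
            hunger[tier] = max(hunger[tier], dev['parts_wanted'])
        dev_by_tier[tiers_for_dev(dev)[-1]] = dev

    tier = ()
    while tier not in dev_by_tier:
        fewest = min(replica_counts[t] for t in children[tier])
        candidates = [t for t in children[tier]
                      if replica_counts[t] == fewest]
        # Ties on hunger are broken by the tier tuple itself.
        tier = max(candidates, key=lambda t: (hunger[t], t))
    dev = dev_by_tier[tier]
    for t in tiers_for_dev(dev):
        replica_counts[t] += 1
    dev['parts_wanted'] -= 1
    return dev


def make_dev(dev_id, zone, ip, parts_wanted):
    return {'id': dev_id, 'zone': zone, 'ip': ip, 'port': 6000,
            'parts_wanted': parts_wanted}


devs = [make_dev(0, 1, '192.168.1.1', 5), make_dev(1, 1, '192.168.1.1', 9),
        make_dev(2, 2, '192.168.2.1', 3), make_dev(3, 2, '192.168.2.1', 4)]
counts = defaultdict(int)  # replicas of one partition, keyed by tier
print([place_replica(devs, counts)['id'] for _ in range(3)])
```

In this sample the second replica goes to zone 2 even though zone 1 still has the hungrier devices, because the replica count is compared before hunger. That ordering is what keeps replicas of one partition in different zones whenever possible.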


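That completes a ring rebalance. Finally, the new builder file and ring file are saved. The ring file is written from the generated builder by calling methods of the RingData class; this part is fairly simple, so it is not analyzed here.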
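The save step at the end of the command follows a simple pattern: write a timestamped backup, then overwrite the live file. The sketch below reproduces that pattern for the pickled builder state only. The function `save_with_backup`, the sample dict, and the temporary paths are hypothetical, and the ring file itself (written through RingData) is not modelled. The code is written for Python 3.

```python
import os
import pickle
import tempfile
from time import time


def save_with_backup(state, path, backup_dir):
    """Pickle `state` to a timestamped backup and then to `path`."""
    os.makedirs(backup_dir, exist_ok=True)
    backup = os.path.join(backup_dir,
                          '%d.' % time() + os.path.basename(path))
    for target in (backup, path):
        with open(target, 'wb') as f:
            # protocol=2 matches the pickle.dump calls in the excerpt.
            pickle.dump(state, f, protocol=2)
    return backup


workdir = tempfile.mkdtemp()
builder_path = os.path.join(workdir, 'object.builder')
backup_path = save_with_backup({'version': 1, 'parts': 1024},
                               builder_path,
                               os.path.join(workdir, 'backups'))
with open(backup_path, 'rb') as f:
    print(pickle.load(f))
```

Prefixing the backup name with the integer timestamp keeps one copy per rebalance and makes the backups sort chronologically.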


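That roughly covers swift-ring-builder and the files under /swift/common/ring/. For the exact behavior and implementation of individual functions, consult the source. The next article will look at swift-init, using its start method to explain how the services are started.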
