Problem Statement
In eBay's production deployment, two AZs (Availability Zones) in one data center are running OpenStack Folsom with Nova Network. To align with efforts to upgrade and operationalize a consistent deployment pattern across all eBay production data centers, these two AZs need to be upgraded to OpenStack Havana and Neutron with the VMware NSX plugin.
The most challenging task in the upgrade is the nova-network to Neutron migration, because all existing VMs (Virtual Machines) must stay alive and the traffic interruption must be as short as possible. This post shares our experience of how we accomplished this mission.
Overview of Cloud Environment
Figure 1 shows the original Folsom deployment, which uses Nova Network multi-host mode. Each Compute Node runs one nova-network process and several dnsmasq processes; nova-network handles network management, while dnsmasq provides DHCP service for the VMs running on the same Compute Node. All VMs run in bridged mode: each Compute Node has Linux bridge interfaces, and the VMs' tap devices are attached to those Linux bridge interfaces.
All Compute Nodes in SLC are running either RHEL 6.3 or RHEL 6.4.
Figure 1
Figure 2 shows the target Havana deployment. All OpenStack components are upgraded to the Havana release and the Neutron server is enabled; in addition, VMware NSX SDN nodes are deployed. No nova-network or dnsmasq processes run on the Compute Nodes anymore; instead, Open vSwitch components are installed, and the Linux bridge interfaces are replaced by Open vSwitch bridge interfaces. All existing VMs still run in bridged mode.
Figure 2
Nova-network to Neutron Migration
- Control-plane Migration.
The first step is to set up a new set of Havana OpenStack nodes and NSX nodes, while keeping the Folsom OpenStack nodes running. Transport Zones are created in the NSX controller, and all Compute Nodes are registered to the NSX controller.
The second step is database migration. For the Keystone, Nova, and Glance databases, this is the normal procedure: export the MySQL databases from Folsom, import them into Havana, and run db_sync. For the Neutron database, networks and subnets are created according to the networks and fixed_ips tables in the Folsom Nova database, as shown in Figure 3.
Figure 3
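As a rough illustration (not the exact scripts we used), the Keystone/Nova/Glance part of this step can be sketched as below. The database hosts, credentials, and file names are placeholders, and the db_sync invocations assume the standard Havana management commands.

# Sketch of "export from Folsom, import to Havana, then db_sync".
# folsom-db-host, havana-db-host and $MYSQL_PASS are placeholders.
for db in keystone nova glance; do
    mysqldump -h folsom-db-host -u root -p"$MYSQL_PASS" "$db" > "${db}_folsom.sql"
    mysql     -h havana-db-host -u root -p"$MYSQL_PASS" "$db" < "${db}_folsom.sql"
done

# Upgrade the imported schemas to the Havana level.
keystone-manage db_sync
nova-manage db sync
glance-manage db_sync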
After the networks and subnets are created in Neutron, the next step is to create ports in Neutron and the NSX controller for all existing VMs. Here the Neutron and Nova APIs are called to create each port and attach it to its VM. One thing to pay attention to is that the tap devices have already been created on the Compute Nodes and attached to the existing VMs. In order not to break anything on the Compute Nodes, the fake driver is enabled in the nova-compute service while doing port creation and attachment, see Figure 4.
Figure 4
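As a hedged sketch of the port bootstrap for a single VM (the real migration scripts drive the APIs directly), assuming the Havana neutron/nova CLI clients and that nova-compute is temporarily running the fake virt driver (e.g. compute_driver=fake.FakeDriver in nova.conf) so the attach does not touch the real tap device:

# Sketch: create a Neutron port that reuses the VM's existing MAC and fixed IP,
# then attach it to the instance. All values are read from the Folsom Nova DB;
# the placeholders below are illustrative only.
NET_ID=<neutron-net-uuid>          # network created in the previous step
MAC=<existing-vm-mac>
IP=<existing-vm-fixed-ip>
INSTANCE_UUID=<existing-vm-uuid>

PORT_ID=$(neutron port-create "$NET_ID" \
            --mac-address "$MAC" \
            --fixed-ip ip_address="$IP" | awk '$2 == "id" {print $4}')

# With the fake driver enabled, Nova only records the attachment; the tap
# device already attached to the running VM is left untouched.
nova interface-attach --port-id "$PORT_ID" "$INSTANCE_UUID"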
- Data-plane Migration.
The data-plane migration moves the VMs' tap devices from the Linux bridge to the Open vSwitch bridge, and puts each Compute Node's Open vSwitch under the control of the SDN controller, see Figure 5.
Figure 5
The detailed steps are:
a. Install the Open vSwitch components, including the Open vSwitch kernel module and the user-space applications ovs-vswitchd and ovsdb-server. At this step we only install the packages; we don't load the kernel module or start the services. N.B. ovs-l3d (VMware's l3d for Open vSwitch) is installed later.
#rpm -Uvh kmod-openvswitch-2.1.0.*.rpm
#rpm -Uvh openvswitch-2.1.0.*.rpm
b. Detach the VMs' tap devices and the Compute Node's physical interface (e.g. eth0) from the Linux bridge. At this point, the VMs on this Compute Node lose network connectivity. We need to rename each tap device from vnetX to tapXXX to follow the Havana Neutron tap device naming convention.
#brctl delif $lnxbr $phyif
#brctl delif $lnxbr $vm_tap_folsom
#ip link set $vm_tap_folsom down
#ip link set $vm_tap_folsom name $vm_tap_havana
#ip link set $vm_tap_havana up
c. Delete the Linux bridge interface and remove the Linux bridge module from the kernel. This step is especially important on RHEL 6.3, as the Linux bridge module and the Open vSwitch kernel module have symbol conflicts.
#ip addr del $ip dev $lnxbr
#ip link set $lnxbr down
#brctl delbr $lnxbr
#rmmod bridge
d. Start all Open vSwitch services. Since the nicira-ovs-hypervisor-node rpm package's scripts automatically start the ovs-l3d service, we install that rpm here.
#service openvswitch start
#rpm -Uvh nicira-ovs-hypervisor-node*.rpm
e. Create the Open vSwitch integration bridge br-int and the external bridge (e.g. br0), and set the external id for the external bridge interface. N.B. br-int is actually created by the nicira-ovs-hypervisor-node rpm package's scripts.
#ovs-vsctl -- --may-exist add-br br-int \
    -- br-set-external-id br-int bridge-id br-int \
    -- set bridge br-int other-config:disable-in-band=true \
    -- set bridge br-int fail-mode=secure
#ovs-vsctl add-br $ovsbr
#ip link set $ovsbr up
#ovs-vsctl br-set-external-id $ovsbr bridge-id $ovsbr
#ovs-vsctl set bridge $ovsbr fail-mode=standalone
f. Attach the Compute Node's physical interface to the Open vSwitch external bridge interface, configure the Compute Node's IP on the external bridge interface, and add the default route.
#ovs-vsctl add-port $ovsbr $phyif
#ip addr add $ip dev $ovsbr
#ip route add default via $gw dev $ovsbr
g. Attach the VMs' tap devices to the Open vSwitch integration bridge, and set properties on the tap devices. The iface-id value is the Neutron port's UUID and attached-mac is its MAC address, both read from the Neutron database. Since the same UUID and MAC values exist in the SDN controller too, the VMs' tap devices now carry enough information to be linked to the SDN controller.
#ovs-vsctl add-port br-int $vm_tap_havana -- \
    set Interface $vm_tap_havana external-ids:iface-id=$iface_id -- \
    set Interface $vm_tap_havana external-ids:attached-mac=$mac -- \
    set Interface $vm_tap_havana external-ids:iface-status=active
h. Set the connection to the SDN controller. The SDN controllers then direct ovsdb-server to set up the patch ports between the integration bridge interface and the external bridge interface, and download traffic-forwarding flows to the Compute Node via the OpenFlow protocol. At this point, the VMs' network connections are restored.
#ovs-vsctl set-manager $sdn_controller_url
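Once the manager is set, the controller-driven setup can be sanity-checked with the standard Open vSwitch tools; a minimal check (exact output depends on the NSX version) could be:

# Verify the manager/controller connection and the bridges.
ovs-vsctl show
# List ports on the integration bridge; the controller-created patch ports
# and the VM tap devices should be present.
ovs-vsctl list-ports br-int
# Dump the forwarding flows pushed down via OpenFlow.
ovs-ofctl dump-flows br-int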
The VMs' traffic stop time is less than 10 seconds, which usually doesn't break existing TCP connections. On RHEL 6.4 we can load the Open vSwitch kernel module and start its services (step d) before the tap devices and the physical interface are detached from the Linux bridge (step b), which brings the VMs' traffic stop time to less than 5 seconds.
- Post-migration
After the control-plane and data-plane migrations are done, the existing VMs work under the OpenStack Havana controllers and SDN controllers, but we still need to consider cases like VM stop, VM restart, and Compute Node restart. To make all these cases work cleanly, the following two post-migration jobs are required.
- Because of the change from Linux bridge interfaces to Open vSwitch bridge interfaces, the Linux network configuration files on each Compute Node need to be updated, e.g. /etc/sysconfig/network-scripts/ifcfg-XXX in RHEL (see the sketch after this list).
- The running VMs still regard their tap devices as attached to the Linux bridge, so we need to update the libvirt runtime XML file /var/run/libvirt/qemu/<instance>.xml and restart the libvirtd service.
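For the first item, a hypothetical ifcfg pair for the external bridge and the physical interface could look like the sketch below. It assumes the ifup-ovs/ifdown-ovs network scripts shipped with the Open vSwitch RPM; br0, eth0, and the addresses are placeholders.

# /etc/sysconfig/network-scripts/ifcfg-br0  (hypothetical example)
DEVICE=br0
DEVICETYPE=ovs
TYPE=OVSBridge
ONBOOT=yes
BOOTPROTO=static
IPADDR=10.10.10.11
NETMASK=255.255.255.0
GATEWAY=10.10.10.1

# /etc/sysconfig/network-scripts/ifcfg-eth0  (hypothetical example)
DEVICE=eth0
DEVICETYPE=ovs
TYPE=OVSPort
OVS_BRIDGE=br0
ONBOOT=yes
BOOTPROTO=none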
- Roll-back plan
There are thousands of Compute Nodes in production, and it is possible that configuration drift on some Compute Nodes could cause migration failure, so a roll-back plan is important for production.
We log all steps during the migration. If there is a failure at a certain step, we run the revert-migration script, i.e. we move back from Open vSwitch to Linux bridge. This helps shorten the VMs' traffic break time, and then we have enough time to fix the configuration drift and run the migration script again.
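A simplified revert sketch, using the same variable names as the data-plane steps above (per-VM loops and error handling omitted), might look like:

# Move the physical NIC and a VM tap device back from Open vSwitch
# to the Linux bridge, undoing steps b-h on one Compute Node.
ovs-vsctl del-port br-int "$vm_tap_havana"
ovs-vsctl del-port "$ovsbr" "$phyif"
ip addr del "$ip" dev "$ovsbr"
service openvswitch stop
rmmod openvswitch                 # needed on RHEL 6.3 before the bridge module can load again

modprobe bridge
brctl addbr "$lnxbr"
ip link set "$vm_tap_havana" down
ip link set "$vm_tap_havana" name "$vm_tap_folsom"
ip link set "$vm_tap_folsom" up
brctl addif "$lnxbr" "$phyif"
brctl addif "$lnxbr" "$vm_tap_folsom"
ip addr add "$ip" dev "$lnxbr"
ip link set "$lnxbr" up
ip route add default via "$gw" dev "$lnxbr"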
OpenStack Summit
This topic was presented at the OpenStack Paris Summit 2014; the slides and video are at the links below:
http://www.youtube.com/watch?v=YMLDCBPUnJo