Problem Statement
In eBay's production deployment, two AZs (Availability Zones) in one data center are running OpenStack Folsom with Nova Network. To align with efforts to upgrade and operationalize a consistent deployment pattern across all eBay production data centers, these two AZs need to be upgraded to OpenStack Havana and Neutron with the VMware NSX plugin.
The most challenging task in the upgrade is the nova-network to Neutron migration, because all existing VMs (Virtual Machines) must stay alive and the traffic interruption must be as short as possible. This post shares our experience of how we accomplished this mission.
Overview of Cloud Environment
Figure 1 shows the original Folsom deployment, which uses Nova Network multi-host mode. Each Compute Node runs one nova-network process and several dnsmasq processes; nova-network handles network management, while dnsmasq provides DHCP service for the VMs running on the same Compute Node. All VMs run in bridged mode: each Compute Node has Linux bridge interfaces, and the VMs' tap devices are attached to those Linux bridge interfaces.
All Compute Nodes in SLC are running either RHEL 6.3 or RHEL 6.4.
Figure 1
Figure 2 shows the target Havana deployment. All OpenStack components are upgraded to the Havana release and the Neutron server is enabled; in addition, VMware NSX SDN nodes are deployed. No nova-network or dnsmasq processes run on the Compute Nodes anymore; instead, Open vSwitch components are installed, and the Linux bridge interfaces are replaced by Open vSwitch bridge interfaces. All existing VMs still run in bridged mode.
Figure 2
Nova-network to Neutron Migration
- Control-plane Migration.
The first step is to set up a new set of Havana OpenStack nodes and NSX nodes, while keeping the Folsom OpenStack nodes running. Transport Zones are created in the NSX controller, and all Compute Nodes are registered to the NSX controller.
The second step is database migration. For the Keystone, Nova, and Glance databases, this is the normal procedure: export the MySQL databases from Folsom, import them into Havana, and run db_sync. For the Neutron database, networks and subnets are created according to the networks and fixed_ips tables in the Folsom Nova database, as shown in Figure 3.
Figure 3
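As a rough illustration (not the exact scripts we used), the Keystone/Nova/Glance part of this step can be sketched as below. The database hosts, credentials, and file names are placeholders, and the db_sync invocations assume the standard Havana management commands.

# Sketch of "export from Folsom, import to Havana, then db_sync".
# folsom-db-host, havana-db-host and $MYSQL_PASS are placeholders.
for db in keystone nova glance; do
    mysqldump -h folsom-db-host -u root -p"$MYSQL_PASS" "$db" > "${db}_folsom.sql"
    mysql     -h havana-db-host -u root -p"$MYSQL_PASS" "$db" < "${db}_folsom.sql"
done

# Upgrade the imported schemas to the Havana level.
keystone-manage db_sync
nova-manage db sync
glance-manage db_sync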
After the networks and subnets are created in Neutron, the next step is to create ports in Neutron and the NSX controller for all existing VMs. Here the Neutron and Nova APIs are called to create each port and attach it to its VM. One thing to pay attention to is that the tap devices have already been created on the Compute Nodes and attached to the existing VMs. In order not to break anything on the Compute Nodes, the fake driver is enabled in the nova-compute service while doing port creation and attachment, see Figure 4.
Figure 4
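As a hedged sketch of the port bootstrap for a single VM (the real migration scripts drive the APIs directly), assuming the Havana neutron/nova CLI clients and that nova-compute is temporarily running the fake virt driver (e.g. compute_driver=fake.FakeDriver in nova.conf) so the attach does not touch the real tap device:

# Sketch: create a Neutron port that reuses the VM's existing MAC and fixed IP,
# then attach it to the instance. All values are read from the Folsom Nova DB;
# the placeholders below are illustrative only.
NET_ID=<neutron-net-uuid>          # network created in the previous step
MAC=<existing-vm-mac>
IP=<existing-vm-fixed-ip>
INSTANCE_UUID=<existing-vm-uuid>

PORT_ID=$(neutron port-create "$NET_ID" \
            --mac-address "$MAC" \
            --fixed-ip ip_address="$IP" | awk '$2 == "id" {print $4}')

# With the fake driver enabled, Nova only records the attachment; the tap
# device already attached to the running VM is left untouched.
nova interface-attach --port-id "$PORT_ID" "$INSTANCE_UUID"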
- Data-plane Migration.
The data-plane migration moves the VMs' tap devices from the Linux bridge to the Open vSwitch bridge, and puts each Compute Node's Open vSwitch under the control of the SDN controller, see Figure 5.
Figure 5
The detailed steps are:
a. Install the Open vSwitch components, including the Open vSwitch kernel module and the user-space applications ovs-vswitchd and ovsdb-server. At this step we only install the packages; we don't load the kernel module or start the services. N.B. ovs-l3d (VMware's l3d for Open vSwitch) is installed later.
#rpm -Uvh kmod-openvswitch-2.1.0.*.rpm
#rpm -Uvh openvswitch-2.1.0.*.rpm
b. Detach the VMs' tap devices and the Compute Node's physical interface (e.g. eth0) from the Linux bridge. At this point, the VMs on this Compute Node lose network connectivity. We need to rename each tap device from vnetX to tapXXX to follow the Havana Neutron tap device naming convention.
#brctl delif $lnxbr $phyif
#brctl delif $lnxbr $vm_tap_folsom
#ip link set $vm_tap_folsom down
#ip link set $vm_tap_folsom name $vm_tap_havana
#ip link set $vm_tap_havana up
c. Delete the Linux bridge interface and remove the Linux bridge module from the kernel. This step is especially important on RHEL 6.3, as the Linux bridge module and the Open vSwitch kernel module have symbol conflicts.
#ip addr del $ip dev $lnxbr
#ip link set $lnxbr down
#brctl delbr $lnxbr
#rmmod bridge
d. Start all Open vSwitch services. Since the nicira-ovs-hypervisor-node rpm package's scripts automatically start the ovs-l3d service, we install that rpm here.
#service openvswitch start
#rpm -Uvh nicira-ovs-hypervisor-node*.rpm
e. Create the Open vSwitch integration bridge br-int and the external bridge (e.g. br0), and set the external id for the external bridge interface. N.B. br-int is actually created by the nicira-ovs-hypervisor-node rpm package's scripts.
#ovs-vsctl -- --may-exist add-br br-int \
    -- br-set-external-id br-int bridge-id br-int \
    -- set bridge br-int other-config:disable-in-band=true \
    -- set bridge br-int fail-mode=secure
#ovs-vsctl add-br $ovsbr
#ip link set $ovsbr up
#ovs-vsctl br-set-external-id $ovsbr bridge-id $ovsbr
#ovs-vsctl set bridge $ovsbr fail-mode=standalone
f. Attach the Compute Node's physical interface to the Open vSwitch external bridge interface, configure the Compute Node's IP on the external bridge interface, and add the default route.
#ovs-vsctl add-port $ovsbr $phyif
#ip addr add $ip dev $ovsbr
#ip route add default via $gw dev $ovsbr
g. Attach the VMs' tap devices to the Open vSwitch integration bridge, and set properties on the tap devices. The iface-id value is the Neutron port's UUID and attached-mac is its MAC address, both read from the Neutron database. Since the same UUID and MAC values exist in the SDN controller too, the VMs' tap devices now carry enough information to be linked to the SDN controller.
#ovs-vsctl add-port br-int $vm_tap_havana -- \
    set Interface $vm_tap_havana external-ids:iface-id=$iface_id -- \
    set Interface $vm_tap_havana external-ids:attached-mac=$mac -- \
    set Interface $vm_tap_havana external-ids:iface-status=active
h. Set the connection to the SDN controller. The SDN controllers then direct ovsdb-server to set up the patch ports between the integration bridge interface and the external bridge interface, and download traffic-forwarding flows to the Compute Node via the OpenFlow protocol. At this point, the VMs' network connections are restored.
#ovs-vsctl set-manager $sdn_controller_url
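Once the manager is set, the controller-driven setup can be sanity-checked with the standard Open vSwitch tools; a minimal check (exact output depends on the NSX version) could be:

# Verify the manager/controller connection and the bridges.
ovs-vsctl show
# List ports on the integration bridge; the controller-created patch ports
# and the VM tap devices should be present.
ovs-vsctl list-ports br-int
# Dump the forwarding flows pushed down via OpenFlow.
ovs-ofctl dump-flows br-int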
The VMs' traffic stop time is less than 10 seconds, which usually doesn't break existing TCP connections. On RHEL 6.4 we can load the Open vSwitch kernel module and start its services (step d) before the tap devices and the physical interface are detached from the Linux bridge (step b), which brings the VMs' traffic stop time to less than 5 seconds.
- Post-migration
After the control-plane and data-plane migrations are done, the existing VMs work under the OpenStack Havana controllers and SDN controllers, but we still need to consider cases like VM stop, VM restart, and Compute Node restart. To make all these cases work cleanly, the following two post-migration jobs are required.
- Because of the change from Linux bridge interfaces to Open vSwitch bridge interfaces, the Linux network configuration files on each Compute Node need to be updated, e.g. /etc/sysconfig/network-scripts/ifcfg-XXX in RHEL (see the sketch after this list).
- The running VMs still regard their tap devices as attached to the Linux bridge, so we need to update the libvirt runtime XML file /var/run/libvirt/qemu/<instance>.xml and restart the libvirtd service.
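For the first item, a hypothetical ifcfg pair for the external bridge and the physical interface could look like the sketch below. It assumes the ifup-ovs/ifdown-ovs network scripts shipped with the Open vSwitch RPM; br0, eth0, and the addresses are placeholders.

# /etc/sysconfig/network-scripts/ifcfg-br0  (hypothetical example)
DEVICE=br0
DEVICETYPE=ovs
TYPE=OVSBridge
ONBOOT=yes
BOOTPROTO=static
IPADDR=10.10.10.11
NETMASK=255.255.255.0
GATEWAY=10.10.10.1

# /etc/sysconfig/network-scripts/ifcfg-eth0  (hypothetical example)
DEVICE=eth0
DEVICETYPE=ovs
TYPE=OVSPort
OVS_BRIDGE=br0
ONBOOT=yes
BOOTPROTO=none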
- Roll-back plan
There are thousands of Compute Nodes in production, and it is possible that configuration drift on some Compute Nodes could cause migration failure, so a roll-back plan is important for production.
We log all steps during the migration. If there is a failure at a certain step, we run the revert-migration script, i.e. we move back from Open vSwitch to Linux bridge. This helps shorten the VMs' traffic break time, and then we have enough time to fix the configuration drift and run the migration script again.
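A simplified revert sketch, using the same variable names as the data-plane steps above (per-VM loops and error handling omitted), might look like:

# Move the physical NIC and a VM tap device back from Open vSwitch
# to the Linux bridge, undoing steps b-h on one Compute Node.
ovs-vsctl del-port br-int "$vm_tap_havana"
ovs-vsctl del-port "$ovsbr" "$phyif"
ip addr del "$ip" dev "$ovsbr"
service openvswitch stop
rmmod openvswitch                 # needed on RHEL 6.3 before the bridge module can load again

modprobe bridge
brctl addbr "$lnxbr"
ip link set "$vm_tap_havana" down
ip link set "$vm_tap_havana" name "$vm_tap_folsom"
ip link set "$vm_tap_folsom" up
brctl addif "$lnxbr" "$phyif"
brctl addif "$lnxbr" "$vm_tap_folsom"
ip addr add "$ip" dev "$lnxbr"
ip link set "$lnxbr" up
ip route add default via "$gw" dev "$lnxbr"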
OpenStack Summit
This topic was presented at the OpenStack Paris Summit 2014; the slides and video are at the links below:
http://www.youtube.com/watch?v=YMLDCBPUnJo