OpenStack VM DHCP Problems with Quantum? A Debugging Guideline and a Real Case

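This article provides a detailed troubleshooting procedure for virtual machines in an OpenStack environment that cannot obtain an IP address automatically. Starting from issuing a DHCP request, it checks step by step whether the network node receives the request, whether a DHCP reply is sent, and whether the reply reaches the compute node, and then uses a real case to show how to locate and fix this kind of problem.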
When using OpenStack in real deployments, you run into a lot of devilish details. One notorious problem is that a booted VM sometimes fails to obtain an IP address via DHCP. Many people have hit similar problems and proposed several fixes, including restarting the Quantum-related services. However, such fixes may work in some cases and fail in others.
So how do you find the culprit behind your specific problem? In this article, we present a guideline for locating the cause of a DHCP failure and demonstrate it with a real case.
Debug Guideline:
0) Start a DHCP request inside the VM using
sudo udhcpc
or another DHCP client.
1) Does the DHCP request reach the network node?
To find out, use tcpdump to capture packets on the data-network interface of both the compute node and the network node, for example:
tcpdump -ni eth1 port 67 or port 68
A DHCP request usually looks like:
IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:82:ee:fe, length 286
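On a busy interface it helps to filter the capture text rather than read it by eye. Below is a minimal Python sketch, assuming tcpdump was run with -n (numeric addresses, printed as ip.port) and produces lines in the format shown above; the helper name dhcp_requests is hypothetical. It returns the client MAC of every DHCP request (UDP 68 to 67) so you can check whether your VM's MAC shows up on each node.

```python
import re

# Matches tcpdump -n lines such as:
#   IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:82:ee:fe, length 286
# tcpdump prints each endpoint as ip.port, so the port is the last dotted field.
DHCP_LINE = re.compile(
    r"IP (?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+) > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+): BOOTP/DHCP, "
    r"(?P<kind>Request|Reply)(?: from (?P<mac>[0-9a-f:]{17}))?"
)


def dhcp_requests(capture: str) -> list[str]:
    """Return the client MACs of all DHCP requests (port 68 -> 67) in a capture."""
    macs = []
    for line in capture.splitlines():
        m = DHCP_LINE.search(line)
        if m and m["kind"] == "Request" and m["sport"] == "68" and m["dport"] == "67":
            macs.append(m["mac"])
    return macs


sample = ("IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, "
          "Request from fa:16:3e:82:ee:fe, length 286")
print(dhcp_requests(sample))  # ['fa:16:3e:82:ee:fe']
```

Running the same filter on captures from the compute node and the network node tells you on which side the request disappears.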
2) If the DHCP request successfully reaches the network node, make sure the quantum-dhcp-agent sends a reply. You can verify this in the log file (/var/log/syslog) or, again, with tcpdump.
The log may look like:
Jun 21 10:42:31 localhost, dnsmasq-dhcp[541]: DHCPREQUEST(tap9c753e61-fc) 50.50.1.6 fa:16:3e:82:ee:fe
Jun 21 10:42:31 localhost, dnsmasq-dhcp[541]: DHCPACK(tap9c753e61-fc) 50.50.1.6 fa:16:3e:82:ee:fe 50-50-1-6
And a DHCP reply usually looks like:
IP 50.50.1.3.67 > 50.50.1.7.68: BOOTP/DHCP, Reply, length 308
If there is no reply, make sure the quantum-* services have started successfully on the network node, e.g.:
service quantum-dhcp-agent status
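The dnsmasq entries in /var/log/syslog can also be checked mechanically. The following is a minimal Python sketch, assuming the log line format shown above (other dnsmasq versions may log differently); unanswered_requests is a hypothetical helper name. It reports every MAC that sent a DHCPREQUEST but never received a DHCPACK.

```python
import re

# dnsmasq-dhcp syslog lines as shown above, e.g.
#   ... dnsmasq-dhcp[541]: DHCPREQUEST(tap9c753e61-fc) 50.50.1.6 fa:16:3e:82:ee:fe
#   ... dnsmasq-dhcp[541]: DHCPACK(tap9c753e61-fc) 50.50.1.6 fa:16:3e:82:ee:fe 50-50-1-6
# Lines without an IP (e.g. DHCPDISCOVER) simply do not match and are ignored.
EVENT = re.compile(
    r"dnsmasq-dhcp\[\d+\]: (?P<event>DHCP[A-Z]+)\((?P<iface>[^)]+)\) "
    r"(?P<ip>\S+) (?P<mac>[0-9a-f:]{17})"
)


def unanswered_requests(log: str) -> set[str]:
    """Return MACs that sent a DHCPREQUEST but never got a DHCPACK in this log."""
    requested, acked = set(), set()
    for line in log.splitlines():
        m = EVENT.search(line)
        if not m:
            continue
        if m["event"] == "DHCPREQUEST":
            requested.add(m["mac"])
        elif m["event"] == "DHCPACK":
            acked.add(m["mac"])
    return requested - acked


log = (
    "Jun 21 10:42:31 localhost, dnsmasq-dhcp[541]: DHCPREQUEST(tap9c753e61-fc) 50.50.1.6 fa:16:3e:82:ee:fe\n"
    "Jun 21 10:42:31 localhost, dnsmasq-dhcp[541]: DHCPACK(tap9c753e61-fc) 50.50.1.6 fa:16:3e:82:ee:fe 50-50-1-6\n"
)
print(unanswered_requests(log))  # set()
```

In practice you would read the text from /var/log/syslog instead of the inline sample; a non-empty result points at the DHCP agent rather than the network path.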
3) Use tcpdump again to make sure the DHCP reply makes it back to the compute node.
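Step 3 is the same capture run on the compute node, this time looking for server-to-client traffic (port 67 to port 68). Below is a hedged Python sketch under the same assumption about tcpdump's -n output format; replies_seen is a hypothetical name. Because the reply line carries no client MAC, it reports the (server IP, client IP) pair instead.

```python
import re

# One endpoint as printed by tcpdump -n: ip.port (two capture groups).
ADDR = r"(\d+\.\d+\.\d+\.\d+)\.(\d+)"
# e.g. IP 50.50.1.3.67 > 50.50.1.7.68: BOOTP/DHCP, Reply, length 308
REPLY = re.compile(rf"IP {ADDR} > {ADDR}: BOOTP/DHCP, Reply")


def replies_seen(capture: str) -> list[tuple[str, str]]:
    """Return (server_ip, client_ip) for every DHCP reply (67 -> 68) in a capture."""
    found = []
    for line in capture.splitlines():
        m = REPLY.search(line)
        if m and m.group(2) == "67" and m.group(4) == "68":
            found.append((m.group(1), m.group(3)))
    return found


sample = "IP 50.50.1.3.67 > 50.50.1.7.68: BOOTP/DHCP, Reply, length 308"
print(replies_seen(sample))  # [('50.50.1.3', '50.50.1.7')]
```

Comparing the result for the network node's capture with the compute node's capture shows whether the reply is lost in between.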
4) If the DHCP reply reaches the compute node, capture on the VM's corresponding tap* network interface to make sure the reply actually reaches the VM.
If it does not, check that the quantum-plugin-openvswitch-agent service is working properly on the compute node:
service quantum-plugin-openvswitch-agent status
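To find the VM's tap interface you need its Quantum port UUID (for example from `quantum port-list`). The sketch below assumes the naming convention used by Nova and Quantum in this era: the device is "tap" followed by the port UUID, truncated to 14 characters. Treat that convention as an assumption and confirm it with `ip link` on your compute node. The helper names and the example UUID are hypothetical.

```python
# Assumption: tap device name = "tap" + port UUID, truncated to 14 characters
# (Linux interface names are limited to 15 characters).
TAP_NAME_LEN = 14


def tap_device(port_id: str) -> str:
    """Guess the tap interface name for a Quantum port UUID."""
    return ("tap" + port_id)[:TAP_NAME_LEN]


def capture_command(port_id: str) -> list[str]:
    """Build a tcpdump argv that captures DHCP traffic on the VM's tap device."""
    # tcpdump joins the remaining arguments into one filter expression.
    return ["tcpdump", "-ni", tap_device(port_id), "port 67 or port 68"]


port = "12345678-9abc-def0-1234-56789abcdef0"  # hypothetical port UUID
print(tap_device(port))       # tap12345678-9a
print(capture_command(port))
```

The argv list can be passed to subprocess.run on the compute node, or simply printed and pasted into a root shell.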
5) If the problem keeps recurring on one particular machine, you may sometimes need to reboot the whole node.
A real case
I once ran into a weird case.
Everything seemed OK: the network node received the DHCP request and sent back the offer, and the compute node successfully received the DHCP offer. However, the VM still often failed to get an IP, though occasionally it did get one!
I examined the entire process very carefully and made sure all services were running.
That left Open vSwitch as the only suspicious component.
I checked the OpenFlow rules on br-int (the bridge the VM is attached to) using
ovs-ofctl dump-flows br-int
and they looked like this:
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=2219.925s, table=0, n_packets=0, n_bytes=85038, idle_age=3, priority=3,in_port=1,dl_vlan=2 actions=mod_vlan_vid:1,NORMAL
 cookie=0x0, duration=2231.487s, table=0, n_packets=0, n_bytes=120021, idle_age=3, priority=1 actions=NORMAL
 cookie=0x0, duration=2227.341s, table=0, n_packets=0, n_bytes=16868, idle_age=5, priority=2,in_port=1 actions=drop
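When comparing many flows it is handy to turn each dump line into structured fields. This is a minimal Python sketch, assuming the text format shown above (comma-separated key=value fields followed by actions=...); parse_flow is a hypothetical helper, and the default of 32768 for a missing priority reflects my understanding that ovs-ofctl omits the priority when it equals the OpenFlow default.

```python
# Per-flow statistics that are not part of the match.
STATS = {"cookie", "duration", "table", "table_id", "n_packets", "n_bytes",
         "idle_age", "hard_age", "priority"}


def parse_flow(line: str) -> dict:
    """Split one dump-flows line into priority, n_packets, match fields and actions."""
    # Actions may contain commas (e.g. "mod_vlan_vid:1,NORMAL"), so cut them off first.
    head, _, actions = line.strip().partition("actions=")
    fields = {}
    for token in head.replace(",", " ").split():
        key, _, value = token.partition("=")
        fields[key] = value  # bare keywords such as "arp" map to ""
    return {
        "priority": int(fields.get("priority", 32768)),  # assumed OpenFlow default
        "n_packets": int(fields.get("n_packets", 0)),
        "match": {k: v for k, v in fields.items() if k not in STATS},
        "actions": actions.strip(),
    }


flow = parse_flow(" cookie=0x0, duration=2219.925s, table=0, n_packets=0, n_bytes=85038, "
                  "idle_age=3, priority=3,in_port=1,dl_vlan=2 actions=mod_vlan_vid:1,NORMAL")
print(flow["priority"], flow["match"], flow["actions"])
# 3 {'in_port': '1', 'dl_vlan': '2'} mod_vlan_vid:1,NORMAL
```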
They look quite normal, since all of these rules are generated by the quantum-plugin-openvswitch-agent service.
I also made sure the DHCP offer reached br-int by capturing packets on its data-network interface:
tcpdump -ni int-br-eth1 port 67 or port 68
As I expected, the DHCP offer should match rule #1 (the VLAN rule) and be sent out. However, after watching for a while, I saw that its n_packets did not increase, which means the DHCP offer was not matching that rule.
Strange, right? Why doesn't OVS behave as expected?
Based on my years of experience with OVS, I suspected that some HIDDEN rule was interfering with the processing, so I looked at the rules in more detail:
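Watching n_packets by eye is error-prone; diffing two snapshots of `ovs-ofctl dump-flows br-int` taken a few seconds apart makes it explicit. Below is a minimal Python sketch, assuming the text format shown above. It identifies a flow by its line with volatile counters (duration, n_packets, n_bytes, idle_age and so on) removed, which is a heuristic rather than an official flow identity; the function names are hypothetical.

```python
import re

# Volatile per-flow counters, stripped so the same flow compares equal across dumps.
VOLATILE = re.compile(
    r"\b(cookie|duration|table|n_packets|n_bytes|idle_age|hard_age)=[^,\s]*,?\s*")
PACKETS = re.compile(r"n_packets=(\d+)")


def flow_key(line: str) -> str:
    """Identify a flow by its dump line with the volatile counters removed."""
    return VOLATILE.sub("", line).strip()


def counters(dump: str) -> dict[str, int]:
    """Map each flow key to its n_packets value."""
    result = {}
    for line in dump.splitlines():
        m = PACKETS.search(line)
        if m:
            result[flow_key(line)] = int(m.group(1))
    return result


def flows_that_matched(before: str, after: str) -> list[str]:
    """Return flows whose n_packets grew between two dump-flows snapshots."""
    old = counters(before)
    return [key for key, n in counters(after).items() if n > old.get(key, 0)]
```

If the DHCP offer is arriving on int-br-eth1 but the priority=3 VLAN rule never appears in the result, something else is consuming the packets.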
ovs-appctl bridge/dump-flows br-int
HAHA, now something comes to the surface:
duration=151s, priority=180001, n_packets=0, n_bytes=0, priority=180001,arp,dl_dst=fe:86:a7:fd:c0:4f,arp_op=2,actions=NORMAL
duration=151s, priority=180003, n_packets=0, n_bytes=0, priority=180003,arp,dl_dst=00:1a:64:99:f2:72,arp_op=2,actions=NORMAL
duration=148s, priority=3, n_packets=0, n_bytes=0, priority=3,in_port=1,dl_vlan=2,actions=mod_vlan_vid:1,NORMAL
duration=151s, priority=180006, n_packets=0, n_bytes=0, priority=180006,arp,nw_src=10.0.1.197,arp_op=1,actions=NORMAL
duration=151s, priority=180004, n_packets=0, n_bytes=0, priority=180004,arp,dl_src=00:1a:64:99:f2:72,arp_op=1,actions=NORMAL
duration=151s, priority=180002, n_packets=0, n_bytes=0, priority=180002,arp,dl_src=fe:86:a7:fd:c0:4f,arp_op=1,actions=NORMAL
duration=151s, priority=15790320, n_packets=174, n_bytes=36869, priority=15790320,actions=NORMAL
duration=151s, priority=180005, n_packets=0, n_bytes=0, priority=180005,arp,nw_dst=10.0.1.197,arp_op=2,actions=NORMAL
duration=151s, priority=180008, n_packets=0, n_bytes=0, priority=180008,tcp,nw_src=10.0.1.197,tp_src=6633,actions=NORMAL
duration=151s, priority=180007, n_packets=0, n_bytes=0, priority=180007,tcp,nw_dst=10.0.1.197,tp_dst=6633,actions=NORMAL
duration=151s, priority=180000, n_packets=0, n_bytes=0, priority=180000,udp,in_port=65534,dl_src=fe:86:a7:fd:c0:4f,tp_src=68,tp_dst=67,actions=NORMAL
table_id=254, duration=165s, priority=0, n_packets=13, n_bytes=2146, priority=0,reg0=0x1,actions=controller(reason=no_match)
table_id=254, duration=165s, priority=0, n_packets=0, n_bytes=0, priority=0,reg0=0x2,drop
See that? Packets are matching the rule with priority=15790320, which has a very high priority and simply forwards the VLAN packets with the NORMAL action!
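Flows whose priority exceeds 65535, the largest value an OpenFlow controller can set, are internal to OVS; `ovs-appctl bridge/dump-flows` lists them, while `ovs-ofctl dump-flows` hides them. As far as I know, 15790320 is 0xF0F0F0, the value the Open vSwitch sources use for the fail-open flow. The minimal Python sketch below (hypothetical helper name, input format as shown above) picks out such hidden flows, highest priority first.

```python
import re

PRIORITY = re.compile(r"priority=(\d+)")
OPENFLOW_MAX_PRIORITY = 65535  # largest priority an OpenFlow controller can set


def hidden_flows(dump: str) -> list[tuple[int, str]]:
    """Return (priority, line) for flows above the OpenFlow priority range, highest first."""
    flows = []
    for line in dump.splitlines():
        m = PRIORITY.search(line)
        if m and int(m.group(1)) > OPENFLOW_MAX_PRIORITY:
            flows.append((int(m.group(1)), line.strip()))
    return sorted(flows, reverse=True)


dump = """\
duration=148s, priority=3, n_packets=0, n_bytes=0, priority=3,in_port=1,dl_vlan=2,actions=mod_vlan_vid:1,NORMAL
duration=151s, priority=15790320, n_packets=174, n_bytes=36869, priority=15790320,actions=NORMAL
duration=151s, priority=180000, n_packets=0, n_bytes=0, priority=180000,udp,in_port=65534,dl_src=fe:86:a7:fd:c0:4f,tp_src=68,tp_dst=67,actions=NORMAL
"""
print([p for p, _ in hidden_flows(dump)])  # [15790320, 180000]
print(hex(15790320))                       # 0xf0f0f0
```

Any hidden flow with a higher priority than the agent's rules and a catch-all match will win, which is exactly what the n_packets=174 counter shows here.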
So where does this rule come from?
In some versions of OVS, when OVS is started without a controller specified, it may "smartly" behave like a standalone L2 switch, and some rules are added automatically.
So how do we solve the problem?
We need to tell OVS not to be so "smart", with the command:
ovs-vsctl set bridge br-int fail-mode=secure
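If you manage several compute nodes, the fix can be applied idempotently from a script. The sketch below is a hedged example that shells out to ovs-vsctl: `get-fail-mode` prints the bridge's current mode (empty if unset), and the set command is the one used above. In secure mode, OVS does not set up flows on its own when no controller is defined or the controller connection fails. The function names are hypothetical, and the runner parameter exists only so the logic can be tested without OVS installed.

```python
import subprocess
from typing import Callable, Sequence


def run(cmd: Sequence[str]) -> str:
    """Run a command and return its stripped stdout (raises CalledProcessError on failure)."""
    return subprocess.run(list(cmd), check=True, capture_output=True, text=True).stdout.strip()


def ensure_secure(bridge: str, runner: Callable[[Sequence[str]], str] = run) -> bool:
    """Set fail-mode=secure on a bridge unless it is already set; True if changed."""
    if runner(["ovs-vsctl", "get-fail-mode", bridge]) == "secure":
        return False
    runner(["ovs-vsctl", "set", "bridge", bridge, "fail-mode=secure"])
    return True


if __name__ == "__main__":
    # Needs root and a running Open vSwitch on the node.
    print("changed" if ensure_secure("br-int") else "already secure")
```

After changing the mode, rerun `ovs-appctl bridge/dump-flows br-int` to confirm the high-priority NORMAL flow is gone.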

In the end, this problem puzzled our team for several weeks. While solving it, I summarized the guideline above, and I hope it proves a little helpful.

[note] First published at 2013/06/21