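In the previous article we got a network namespace (ns) talking to the other namespaces on its network, but we ended up finding that it could not reach the host, let alone any external network. That is clearly not good enough: a container created by Docker can talk to its own subnet, to the host, and to external networks.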
[root@localhost ~]# docker run -it --rm busybox
/ # ping -c 2 192.168.159.14
PING 192.168.159.14 (192.168.159.14): 56 data bytes
64 bytes from 192.168.159.14: seq=0 ttl=64 time=0.141 ms
64 bytes from 192.168.159.14: seq=1 ttl=64 time=0.497 ms
--- 192.168.159.14 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.141/0.319/0.497 ms
/ # ping -c 2 qq.com
PING qq.com (58.60.9.21): 56 data bytes
64 bytes from 58.60.9.21: seq=0 ttl=127 time=40.367 ms
64 bytes from 58.60.9.21: seq=1 ttl=127 time=40.170 ms
--- qq.com ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 40.170/40.268/40.367 ms
From the ns we created, nothing outside its own network is reachable.
Host IP: 192.168.159.14
[root@localhost ~]# ip netns exec net1 ping -c 2 192.168.159.14
connect: Network is unreachable
[root@localhost ~]# ip netns exec net1 ping -c 2 qq.com
ping: qq.com: Name or service not known
Let's look at the routing table inside net1:
[root@localhost ~]# ip netns exec net1 route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
10.10.1.0 0.0.0.0 255.255.255.0 U 0 0 0 veth1
There is only one route, the one for the 10.10.1.0/24 subnet. With this route alone there is no way to reach any external network.
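The observation above (a routing table that only contains the directly connected subnet cannot reach anything else) can be checked mechanically. The following is a minimal sketch, assuming the `route -n` column layout shown above; the function name `has_default_route` is made up for illustration.

```shell
#!/bin/sh
# has_default_route: hypothetical helper, not part of any existing tool.
# Reads `route -n` output on stdin and succeeds (exit status 0) only if a
# default route is present, i.e. a row whose Destination (column 1) and
# Genmask (column 3) are both 0.0.0.0.
has_default_route() {
    awk '$1 == "0.0.0.0" && $3 == "0.0.0.0" { found = 1 }
         END { exit !found }'
}

# Example use (needs root and the net1 namespace from this article):
#   ip netns exec net1 route -n | has_default_route \
#       || echo "net1 has no default route"
```

Reading from stdin keeps the helper independent of how the routing table was obtained, so it can be tried on saved output as well.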
So we add a default route inside the ns:
[root@localhost ~]# ip netns exec net1 ip route add default via 10.10.1.1
[root@localhost ~]# ip netns exec net1 route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 10.10.1.1 0.0.0.0 UG 0 0 0 veth1
10.10.1.0 0.0.0.0 255.255.255.0 U 0 0 0 veth1
[root@localhost ~]# ip netns exec net1 ping 192.168.159.14
PING 192.168.159.14 (192.168.159.14) 56(84) bytes of data.
64 bytes from 192.168.159.14: icmp_seq=1 ttl=64 time=0.097 ms
64 bytes from 192.168.159.14: icmp_seq=2 ttl=64 time=0.042 ms
64 bytes from 192.168.159.14: icmp_seq=3 ttl=64 time=0.049 ms
--- 192.168.159.14 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.042/0.062/0.097/0.026 ms
Now the ns can reach the host, but it still cannot reach the Internet (58.60.9.21 is the address of www.qq.com):
[root@localhost ~]# ip netns exec net1 ping 58.60.9.21
PING 58.60.9.21 (58.60.9.21) 56(84) bytes of data.
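This time the behavior is completely different. Before, ping failed immediately with "Network is unreachable"; now it just hangs, apparently waiting for a reply from the target host.
We need tcpdump to see how the traffic is being forwarded across the network devices:
Bridge bridge0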
[root@localhost ~]# tcpdump -n -i bridge0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on bridge0, link-type EN10MB (Ethernet), capture size 262144 bytes
01:59:46.882448 IP 10.10.1.2 > 58.60.9.21: ICMP echo request, id 7608, seq 166, length 64
01:59:47.882304 IP 10.10.1.2 > 58.60.9.21: ICMP echo request, id 7608, seq 167, length 64
01:59:48.882345 IP 10.10.1.2 > 58.60.9.21: ICMP echo request, id 7608, seq 168, length 64
01:59:49.882310 IP 10.10.1.2 > 58.60.9.21: ICMP echo request, id 7608, seq 169, length 64
NIC ens33
[root@localhost ~]# tcpdump -n -i ens33 host 58.60.9.21
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens33, link-type EN10MB (Ethernet), capture size 262144 bytes
02:02:11.915210 IP 10.10.1.2 > 58.60.9.21: ICMP echo request, id 7608, seq 311, length 64
02:02:12.915248 IP 10.10.1.2 > 58.60.9.21: ICMP echo request, id 7608, seq 312, length 64
02:02:13.915238 IP 10.10.1.2 > 58.60.9.21: ICMP echo request, id 7608, seq 313, length 64
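The symptom visible in the ens33 capture above is that packets leave the physical NIC still carrying the namespace's own address as their source. The following is a rough sketch of how that could be counted automatically, assuming the `tcpdump -n` line format shown above (timestamp, `IP`, source, `>`, destination); the function name `count_leaked_sources` and the idea of matching on a textual prefix are illustrative choices, not something from the original setup.

```shell
#!/bin/sh
# count_leaked_sources: hypothetical helper.
# $1: textual prefix of the namespace subnet, e.g. "10.10.1."
# Reads `tcpdump -n` output on stdin and prints how many IP packets have a
# source address (field 3) starting with that prefix.
count_leaked_sources() {
    awk -v prefix="$1" '
        $2 == "IP" && index($3, prefix) == 1 { n++ }
        END { print n + 0 }'
}

# Example use (needs root; -c stops the capture after 10 packets):
#   tcpdump -n -c 10 -i ens33 host 58.60.9.21 | count_leaked_sources 10.10.1.
```

A non-zero count on the outward-facing interface means the source address has not been rewritten before the packet left the host.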
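Something is wrong here. The echo requests leave ens33 with the namespace's private address 10.10.1.2 as their source, and no echo reply ever comes back. Normally, the source address of a packet bound for an external network has to be translated to the host's IP before it is sent out, just as happens for containers, as shown below:
Start a busybox container and ping 8.8.8.8 from inside it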
[root@localhost ~]# docker run -it --rm busybox
/ # ping -c 4 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=127 time=42.431 ms
64 bytes from 8.8.8.8: seq=1 ttl=127 time=42.438 ms
64 bytes from 8.8.8.8: seq=2 ttl=127 time=42.565 ms
64 bytes from 8.8.8.8: seq=3 ttl=127 time=42.527 ms
--- 8.8.8.8 ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max = 42.431/42.490/42.565 ms
The capture on the docker0 bridge looks like this:
[root@localhost ~]# tcpdump -n -i docker0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on docker0, link-type EN10MB (Ethernet), capture size 262144 bytes
02:12:37.721456 IP 172.17.3.3 > 8.8.8.8: ICMP echo request, id 2304, seq 0, length 64
02:12:37.763998 IP 8.8.8.8 > 172.17.3.3: ICMP echo reply, id 2304, seq 0, length 64
02:12:38.722085 IP 172.17.3.3 > 8.8.8.8: ICMP echo request, id 2304, seq 1, length 64
02:12:38.764550 IP 8.8.8.8 > 172.17.3.3: ICMP echo reply, id 2304, seq 1, length 64
02:12:39.723085 IP 172.17.3.3 > 8.8.8.8: ICMP echo request, id 2304, seq 2, length 64
02:12:39.766769 IP 8.8.8.8 > 172.17.3.3: ICMP echo reply, id 2304, seq 2, length 64
02:12:40.724125 IP 172.17.3.3 > 8.8.8.8: ICMP echo request, id 2304, seq 3, length 64
02:12:40.766629 IP 8.8.8.8 > 172.17.3.3: ICMP echo reply, id 2304, seq 3, length 64
The capture on the physical NIC ens33 looks like this:
[root@localhost ~]# tcpdump -n -i ens33 host 8.8.8.8
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens33, link-type EN10MB (Ethernet), capture size 262144 bytes
02:08:40.113619 IP 192.168.159.14 > 8.8.8.8: ICMP echo request, id 1792, seq 0, length 64
02:08:40.155887 IP 8.8.8.8 > 192.168.159.14: ICMP echo reply, id 1792, seq 0, length 64
02:08:41.114615 IP 192.168.159.14 > 8.8.8.8: ICMP echo request, id 1792, seq 1, length 64
02:08:41.156882 IP 8.8.8.8 > 192.168.159.14: ICMP echo reply, id 1792, seq 1, length 64
02:08:42.115587 IP 192.168.159.14 > 8.8.8.8: ICMP echo request, id 1792, seq 2, length 64
02:08:42.157990 IP 8.8.8.8 > 192.168.159.14: ICMP echo reply, id 1792, seq 2, length 64
02:08:43.116547 IP 192.168.159.14 > 8.8.8.8: ICMP echo request, id 1792, seq 3, length 64
02:08:43.158905 IP 8.8.8.8 > 192.168.159.14: ICMP echo reply, id 1792, seq 3, length 64
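On ens33 the source IP has been translated: the container's IP 172.17.3.3 has been replaced by the host's IP 192.168.159.14. We can look at the iptables rules to see why:
# Show the POSTROUTING chain of the nat table (unrelated rules omitted)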
[root@localhost ~]# iptables -nv -L -t nat
Chain POSTROUTING (policy ACCEPT 529 packets, 39833 bytes)
pkts bytes target prot opt in out source destination
38 2488 MASQUERADE all -- * !docker0 172.17.3.0/24 0.0.0.0/0
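In the POSTROUTING chain there is a rule saying that packets whose source is in 172.17.3.0/24 and which leave through any interface other than docker0 are masqueraded, which is a form of SNAT that uses the address of the outgoing interface. As a result, traffic from a container to the outside appears, from the outside, to come from the host.
So we need to translate the source address of our packets too, and that calls for an iptables rule.
# Masquerade every packet sent from 10.10.1.0/24 that leaves through ens33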
[root@localhost ~]# iptables -t nat -A POSTROUTING -s 10.10.1.0/255.255.255.0 -o ens33 -j MASQUERADE
# Show the nat table now (unrelated rules omitted)
[root@localhost ~]# iptables -nv -L --line -t nat
Chain POSTROUTING (policy ACCEPT 1 packets, 76 bytes)
num pkts bytes target prot opt in out source destination
1 38 2488 MASQUERADE all -- * !docker0 172.17.3.0/24 0.0.0.0/0
2 0 0 MASQUERADE all -- * ens33 10.10.1.0/24 0.0.0.0/0
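Appending the rule with `-A` a second time would create a duplicate entry, and NAT only helps if the host forwards packets at all. The following is a minimal sketch of a more defensive version, under two assumptions that are not stated in the article: that the kernel exposes the forwarding switch at `/proc/sys/net/ipv4/ip_forward`, and that forwarding was already on for this host (the earlier capture showed 10.10.1.2 packets reaching ens33, which suggests it was, possibly because Docker had enabled it). The function names are made up for illustration.

```shell
#!/bin/sh
# masq_rule: hypothetical helper that prints the rule specification used
# in this article. $1: source subnet, $2: outbound interface.
masq_rule() {
    printf '%s\n' "POSTROUTING -s $1 -o $2 -j MASQUERADE"
}

# ensure_masquerade: hypothetical helper. Adds the rule only if an identical
# one is not already present (iptables -C checks for an existing rule).
# Needs root. $rule is left unquoted on purpose so it splits into arguments.
ensure_masquerade() {
    rule=$(masq_rule "$1" "$2")
    iptables -t nat -C $rule 2>/dev/null || iptables -t nat -A $rule
}

# forwarding_enabled: succeeds if the given file contains "1".
# $1 (optional): path to read, defaulting to the assumed kernel switch.
forwarding_enabled() {
    [ "$(cat "${1:-/proc/sys/net/ipv4/ip_forward}")" = "1" ]
}

# Example use (needs root):
#   forwarding_enabled || echo "IPv4 forwarding is off on this host"
#   ensure_masquerade 10.10.1.0/24 ens33
```

Keeping the rule text in one function means the check and the append can never drift apart.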
With this rule in place, we can ping networks outside the host:
[root@localhost ~]# ip netns exec net1 ping -c 2 58.60.9.21
PING 58.60.9.21 (58.60.9.21) 56(84) bytes of data.
64 bytes from 58.60.9.21: icmp_seq=1 ttl=127 time=41.5 ms
64 bytes from 58.60.9.21: icmp_seq=2 ttl=127 time=42.8 ms
--- 58.60.9.21 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1004ms
rtt min/avg/max/mdev = 41.573/42.221/42.869/0.648 ms
# The capture now looks like this:
[root@localhost ~]# tcpdump -n -i ens33 host 58.60.9.21
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens33, link-type EN10MB (Ethernet), capture size 262144 bytes
03:18:54.514143 IP 192.168.159.14 > 58.60.9.21: ICMP echo request, id 8451, seq 1, length 64
03:18:54.555632 IP 58.60.9.21 > 192.168.159.14: ICMP echo reply, id 8451, seq 1, length 64
03:18:55.519990 IP 192.168.159.14 > 58.60.9.21: ICMP echo request, id 8451, seq 2, length 64
03:18:55.561868 IP 58.60.9.21 > 192.168.159.14: ICMP echo reply, id 8451, seq 2, length 64
DNS name resolution works now as well:
[root@localhost ~]# ip netns exec net1 ping -c 2 qq.com
PING qq.com (58.60.9.21) 56(84) bytes of data.
64 bytes from 58.60.9.21 (58.60.9.21): icmp_seq=1 ttl=127 time=42.1 ms
64 bytes from 58.60.9.21 (58.60.9.21): icmp_seq=2 ttl=127 time=41.9 ms
--- qq.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 41.923/42.044/42.166/0.238 ms
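The earlier "Name or service not known" error presumably came from the resolver being unreachable rather than from a missing DNS configuration, since a namespace started with `ip netns exec` sees the host's `/etc/resolv.conf` by default. If a namespace should use its own resolver instead, `ip netns exec` bind-mounts files found under `/etc/netns/<name>/` over their counterparts in `/etc`. The following is a small sketch of writing such a file; the function name `write_ns_resolv`, the optional base-directory argument, and the choice of 8.8.8.8 are all illustrative assumptions.

```shell
#!/bin/sh
# write_ns_resolv: hypothetical helper.
# $1: namespace name
# $2: nameserver IP address
# $3 (optional): base directory, defaulting to /etc/netns
# Writes a one-line resolv.conf for the namespace.
write_ns_resolv() {
    dir="${3:-/etc/netns}/$1"
    mkdir -p "$dir" && printf 'nameserver %s\n' "$2" > "$dir/resolv.conf"
}

# Example use (needs root when writing under /etc):
#   write_ns_resolv net1 8.8.8.8
#   ip netns exec net1 cat /etc/resolv.conf
```

The base directory is a parameter only so the helper can be tried in a scratch directory before touching `/etc`.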
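At this point we have basically reproduced what a container gets: namespaces can talk to each other through a bridge, and an isolated namespace can, like a container, reach both the host and external networks. A requirement we had earlier was to let VMs created by OpenStack communicate with Pods created by Kubernetes. Whatever network mode OpenStack uses and whichever network solution Kubernetes uses, underneath they are both built on bridges, which made me wonder: how can two bridges on the same host talk to each other? So the next step is to use Linux networking to get Linux bridges talking to each other, first on one host and then across hosts.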