Preface
Previously we used nginx-ingress-controller to log the real backend IP. But one reader said: "I'm not using nginx-ingress-controller, I'm running plain nginx — how do I log the real backend IP in that case?"
Environment setup
nginx:
upstream backend_ups {
    server backend-service:10000;
}

server {
    listen 80;
    listen [::]:80;
    server_name localhost;

    location /test {
        proxy_pass http://backend_ups;
    }
}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-test
  namespace: default
spec:
  selector:
    matchLabels:
      app: nginx-test
  template:
    metadata:
      labels:
        app: nginx-test
    spec:
      containers:
      - image: registry.cn-beijing.aliyuncs.com/wilsonchai/nginx:latest
        imagePullPolicy: IfNotPresent
        name: nginx-test
        ports:
        - containerPort: 80
          protocol: TCP
        volumeMounts:
        - mountPath: /etc/nginx/conf.d/default.conf
          name: nginx-config
          subPath: default.conf
        - mountPath: /etc/nginx/nginx.conf
          name: nginx-base-config
          subPath: nginx.conf
      volumes:
      - configMap:
          defaultMode: 420
          name: nginx-config
        name: nginx-config
      - configMap:
          defaultMode: 420
          name: nginx-base-config
        name: nginx-base-config
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-test
  namespace: default
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: nginx-test
  type: NodePort
backend:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      containers:
      - image: backend-service:v1
        imagePullPolicy: Never
        name: backend
        ports:
        - containerPort: 10000
          protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: backend-service
  namespace: default
spec:
  ports:
  - port: 10000
    protocol: TCP
    targetPort: 10000
  selector:
    app: backend
  type: ClusterIP
Deployment finished; let's check:
▶ kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
backend-6d4cdd4c68-mqzgj 1/1 Running 0 6m3s 10.244.0.64 wilson <none> <none>
backend-6d4cdd4c68-qjp9m 1/1 Running 0 6m5s 10.244.0.66 wilson <none> <none>
nginx-test-b9bcf66d7-2phvh 1/1 Running 0 6m20s 10.244.0.67 wilson <none> <none>
▶ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
backend-service ClusterIP 10.105.148.194 <none> 10000/TCP 5m
nginx-test NodePort 10.110.71.55 <none> 80:30785/TCP 5m2s
At this point the architecture is roughly: curl hits the nginx-test NodePort, the nginx pod proxies the request to backend-service, and the Service load-balances it across the two backend pods.
Try accessing nginx:
▶ curl 127.0.0.1:30785/test
i am backend in backend-6d4cdd4c68-qjp9m
The request has been reverse-proxied to the backend; now look at the nginx log
10.244.0.1 - - [04/Dec/2025:07:27:17 +0000] "GET /test HTTP/1.1" 200 10.105.148.194:10000 40 "-" "curl/7.81.0" "-"
- As expected, 10.105.148.194 there is the backend-service cluster IP, not a pod IP (that field is the upstream address; see the log_format sketch after this list)
- In the nginx config, the upstream points at backend-service, so the Kubernetes Service does the load balancing; at the nginx layer there is simply no way to get the backend pod's IP
- What we need is to turn on logging at the layer that actually does the load balancing; there we can see which real server the request was forwarded to
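As a side note, the 10.105.148.194:10000 field in the nginx log above is the upstream address. The nginx-base-config ConfigMap is not shown in this post, so this is only an assumed sketch, but a log_format inside the http {} block of nginx.conf along these lines would produce log lines of exactly that shape (it relies on nginx's built-in $upstream_addr variable):

log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                  '$status $upstream_addr $body_bytes_sent "$http_referer" '
                  '"$http_user_agent" "$http_x_forwarded_for"';
access_log  /var/log/nginx/access.log  main;

Since the upstream is a Service name, $upstream_addr can only ever show the Service address, which is exactly the limitation described above.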
iptables
If the forwarding is done by iptables, it is worth reviewing how iptables forwards packets (and first confirming the kube-proxy mode; see the check after this list):
- When a packet comes in from the NIC, it first enters PREROUTING
- Next comes routing; based on the destination address there are two paths:
  - Packets destined for this host go through INPUT and then up to the application layer
  - Packets not destined for this host go through FORWARD and then POSTROUTING
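Before digging into the rules, it helps to double-check that kube-proxy really is running in iptables mode. Two quick ways to check, assuming a typical kubeadm-style setup where kube-proxy keeps its configuration in a ConfigMap (an empty mode usually means the default, iptables):

kubectl -n kube-system get configmap kube-proxy -o yaml | grep mode
# or, on a node, ask kube-proxy directly over its metrics port
curl http://127.0.0.1:10249/proxyMode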
Exploring the rules
Let's look at the concrete rules:
- First check PREROUTING; right away we find the chain added by Kubernetes, KUBE-SERVICES

▶ sudo iptables -L PREROUTING -t nat
Chain PREROUTING (policy ACCEPT)
target         prot opt source      destination
KUBE-SERVICES  all  --  anywhere    anywhere    /* kubernetes service portals */
...
- Keep going with KUBE-SERVICES, which holds rules for quite a few Services

▶ sudo iptables -L KUBE-SERVICES -t nat
Chain KUBE-SERVICES (2 references)
target                     prot opt source     destination
KUBE-SVC-W67AXLFK7VEUVN6G  tcp  --  anywhere   10.110.71.55     /* default/nginx-test cluster IP */
KUBE-SVC-EDNDUDH2C75GIR6O  tcp  --  anywhere   10.98.224.124    /* ingress-nginx/ingress-nginx-controller:https cluster IP */
KUBE-SVC-ERIFXISQEP7F7OF4  tcp  --  anywhere   10.96.0.10       /* kube-system/kube-dns:dns-tcp cluster IP */
KUBE-SVC-JD5MR3NA4I4DYORP  tcp  --  anywhere   10.96.0.10       /* kube-system/kube-dns:metrics cluster IP */
KUBE-SVC-ZZAJ2COS27FT6J6V  tcp  --  anywhere   10.105.148.194   /* default/backend-service cluster IP */
KUBE-SVC-NPX46M4PTMTKRN6Y  tcp  --  anywhere   10.96.0.1        /* default/kubernetes:https cluster IP */
KUBE-SVC-CG5I4G2RS3ZVWGLK  tcp  --  anywhere   10.98.224.124    /* ingress-nginx/ingress-nginx-controller:http cluster IP */
KUBE-SVC-EZYNCFY2F7N6OQA2  tcp  --  anywhere   10.101.164.9     /* ingress-nginx/ingress-nginx-controller-admission:https-webhook cluster IP */
KUBE-SVC-TCOU7JCQXEZGVUNU  udp  --  anywhere   10.96.0.10       /* kube-system/kube-dns:dns cluster IP */
- Find the chain for our target Service, backend-service, which is KUBE-SVC-ZZAJ2COS27FT6J6V, and keep digging

▶ sudo iptables -L KUBE-SVC-ZZAJ2COS27FT6J6V -t nat
Chain KUBE-SVC-ZZAJ2COS27FT6J6V (1 references)
target                     prot opt source           destination
KUBE-MARK-MASQ             tcp  --  !10.244.0.0/16   10.105.148.194   /* default/backend-service cluster IP */
KUBE-SEP-GYMSSX4TMUPRH3OB  all  --  anywhere         anywhere         /* default/backend-service -> 10.244.0.64:10000 */ statistic mode random probability 0.50000000000
KUBE-SEP-EZWMSI6QFXP3WRHV  all  --  anywhere         anywhere         /* default/backend-service -> 10.244.0.66:10000 */
- The conclusion is already obvious: there are two KUBE-SEP chains, one per pod. Dig one level deeper into one of them, KUBE-SEP-GYMSSX4TMUPRH3OB

▶ sudo iptables -L KUBE-SEP-GYMSSX4TMUPRH3OB -t nat
Chain KUBE-SEP-GYMSSX4TMUPRH3OB (1 references)
target          prot opt source        destination
KUBE-MARK-MASQ  all  --  10.244.0.64   anywhere   /* default/backend-service */
DNAT            tcp  --  anywhere      anywhere   /* default/backend-service */ tcp to:10.244.0.64:10000
- These rules show that a packet entering KUBE-SEP-GYMSSX4TMUPRH3OB gets DNATed, its destination IP rewritten to 10.244.0.64; by the same logic the other chain rewrites the destination to 10.244.0.66. So if we add log rules right here, we can trace which peer each request was sent to

sudo iptables -t nat -I KUBE-SEP-GYMSSX4TMUPRH3OB 1 -j LOG --log-prefix "backend-service-pod: " --log-level 4
sudo iptables -t nat -I KUBE-SEP-EZWMSI6QFXP3WRHV 1 -j LOG --log-prefix "backend-service-pod: " --log-level 4
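The LOG target writes to the kernel log, so these entries end up wherever the system logger puts kernel messages, typically /var/log/syslog or /var/log/messages; they can also be followed straight from the kernel ring buffer, for example:

sudo dmesg -w | grep "backend-service-pod"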
- Now look at the overall effect

▶ sudo iptables -L KUBE-SEP-GYMSSX4TMUPRH3OB -t nat
Chain KUBE-SEP-GYMSSX4TMUPRH3OB (1 references)
target          prot opt source        destination
LOG             all  --  anywhere      anywhere   LOG level warning prefix "backend-service-pod: "
KUBE-MARK-MASQ  all  --  10.244.0.64   anywhere   /* default/backend-service */
DNAT            tcp  --  anywhere      anywhere   /* default/backend-service */ tcp to:10.244.0.64:10000

▶ sudo iptables -L KUBE-SEP-EZWMSI6QFXP3WRHV -t nat
Chain KUBE-SEP-EZWMSI6QFXP3WRHV (1 references)
target          prot opt source        destination
LOG             all  --  anywhere      anywhere   LOG level warning prefix "backend-service-pod: "
KUBE-MARK-MASQ  all  --  10.244.0.66   anywhere   /* default/backend-service */
DNAT            tcp  --  anywhere      anywhere   /* default/backend-service */ tcp to:10.244.0.66:10000
Both chains now have their LOG rules in place; time to test
- Request nginx:

curl 127.0.0.1:30785/test

- Check the log:

tail -f /var/log/syslog | grep backend-service
Dec 4 17:17:30 wilson kernel: [109258.569426] backend-service-pod: IN=cni0 OUT= PHYSIN=veth1dc60dd3 MAC=76:d8:f3:a1:f8:1b:a2:79:4b:23:58:d8:08:00 SRC=10.244.0.70 DST=10.105.148.194 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=64486 DF PROTO=TCP SPT=51596 DPT=10000 WINDOW=64860 RES=0x00 SYN URGP=0
What?! Why does DST still show 10.105.148.194?
Fixing the problem
- Look at the chain rules again: at the point where the LOG rule fires, DNAT has not happened yet, so DST is still the Service IP. And if we placed the LOG rule after DNAT instead, a packet that matches DNAT never reaches LOG, so it still would not get recorded

▶ sudo iptables -L KUBE-SEP-GYMSSX4TMUPRH3OB -t nat
Chain KUBE-SEP-GYMSSX4TMUPRH3OB (1 references)
target          prot opt source        destination
LOG             all  --  anywhere      anywhere   LOG level warning prefix "backend-service-pod: "
KUBE-MARK-MASQ  all  --  10.244.0.64   anywhere   /* default/backend-service */
DNAT            tcp  --  anywhere      anywhere   /* default/backend-service */ tcp to:10.244.0.64:10000
- Back to how iptables works: a request passes through the PREROUTING chain, then FORWARD, then POSTROUTING. Since the DNAT rewrite already happened in PREROUTING, we can log in either of the two later chains; here we log in POSTROUTING

iptables -t nat -I POSTROUTING -p tcp -d 10.244.0.64 -j LOG --log-prefix "backend-service: "
iptables -t nat -I POSTROUTING -p tcp -d 10.244.0.66 -j LOG --log-prefix "backend-service: "

▶ sudo iptables -L POSTROUTING -t nat
Chain POSTROUTING (policy ACCEPT)
target  prot opt source      destination
LOG     tcp  --  anywhere    10.244.0.66   LOG level warning prefix "backend-service: "
LOG     tcp  --  anywhere    10.244.0.64   LOG level warning prefix "backend-service: "
...
- Test once more; this time DST shows the real pod IPs, 10.244.0.64 and 10.244.0.66

Dec 4 17:33:23 wilson kernel: [110211.770728] backend-service: IN= OUT=cni0 PHYSIN=veth1dc60dd3 PHYSOUT=vetha515043b SRC=10.244.0.70 DST=10.244.0.64 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=59709 DF PROTO=TCP SPT=33454 DPT=10000 WINDOW=64860 RES=0x00 SYN URGP=0
Dec 4 17:33:24 wilson kernel: [110213.141975] backend-service: IN= OUT=cni0 PHYSIN=veth1dc60dd3 PHYSOUT=veth0a8a2dd3 SRC=10.244.0.70 DST=10.244.0.66 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=54719 DF PROTO=TCP SPT=33468 DPT=10000 WINDOW=64860 RES=0x00 SYN URGP=0
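Once the experiment is finished, the temporary LOG rules can be removed again with the matching delete commands (same rule specification, -D instead of -I):

iptables -t nat -D POSTROUTING -p tcp -d 10.244.0.64 -j LOG --log-prefix "backend-service: "
iptables -t nat -D POSTROUTING -p tcp -d 10.244.0.66 -j LOG --log-prefix "backend-service: "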
iptables wrap-up
This test confirms that with plain Linux iptables rules we can still trace the path of every single request; along the way we reviewed how iptables works and explored how Kubernetes builds its iptables-based forwarding rules. There are, however, a few problems:
- Whenever a pod IP changes, the log rules have to change too; in an environment where pods come and go all the time, keeping them in sync takes a lot of effort
- Every new Kubernetes node also needs the rules added by hand, which again means significant maintenance
- The method touches the cluster's core forwarding rules; a careless add, delete, or change can cause unpredictable damage, up to and including bringing the cluster down
So this approach is not really usable in production.
The detailed analysis here is only meant to prove that the idea is sound: log at the layer that actually does the load balancing, and you can see the real server.
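For lab use, the first two maintenance problems could at least be softened with a small helper that regenerates the LOG rules from the Service's current endpoints. The sketch below is hypothetical (the script and its structure are mine, not part of the original setup) and still not production material; stale rules for pods that have disappeared would also need separate cleanup:

#!/usr/bin/env bash
# Hypothetical helper: (re)create POSTROUTING LOG rules for the current
# endpoints of backend-service. For experiments only -- not for production.
set -euo pipefail

SVC=backend-service
PREFIX="backend-service: "

# Pod IPs currently backing the Service
IPS=$(kubectl get endpoints "$SVC" -o jsonpath='{.subsets[*].addresses[*].ip}')

for ip in $IPS; do
    # -C checks whether the rule already exists; insert it only if it does not
    if ! sudo iptables -t nat -C POSTROUTING -p tcp -d "$ip" -j LOG --log-prefix "$PREFIX" 2>/dev/null; then
        sudo iptables -t nat -I POSTROUTING -p tcp -d "$ip" -j LOG --log-prefix "$PREFIX"
    fi
done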
ipvs
ipvs forwards packets in the kernel and does not produce access logs of its own, but the iptables approach still applies: log in the POSTROUTING chain.
Even when the Kubernetes forwarding mode is ipvs, that does not mean ipvs alone handles all the forwarding; it takes ipvs and iptables working together.
IPVS is a core module built into the Linux kernel; it works at a Netfilter hook point ahead of INPUT.
When a request's destination IP is an ipvs VIP, ipvs steps in, "hijacks" the packet, matches it against its rules, and rewrites it, for example performing DNAT.
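The post only shows the resulting virtual-server table, so the exact command is an assumption on my part, but a listing like the one below can be produced with ipvsadm (output trimmed to the backend-service entry):

▶ sudo ipvsadm -Ln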
TCP 10.105.148.194:10000 rr
-> 10.244.0.73:10000 Masq 1 0 0
-> 10.244.0.74:10000 Masq 1 0 0
- If the rewritten destination IP is a local IP, ipvs hands the packet straight up to the local application
- If the rewritten destination IP is not local, the packet is re-routed and then forwarded out via FORWARD and POSTROUTING
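Mirroring the iptables section, logging in POSTROUTING for the two real servers listed above (10.244.0.73 and 10.244.0.74) would then look roughly like this; it is only a sketch of the same approach, since the post does not show the ipvs-mode test output:

iptables -t nat -I POSTROUTING -p tcp -d 10.244.0.73 -j LOG --log-prefix "backend-service: "
iptables -t nat -I POSTROUTING -p tcp -d 10.244.0.74 -j LOG --log-prefix "backend-service: "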
Summary
This article revisited some classic Linux fundamentals, namely how iptables and ipvs work, and walked through in detail how, without installing any extra plugins, the operating system's own tracing facilities can reveal which backend pod actually served a request. None of this is suitable for production, though: it is far too low-level, and day-to-day operations should not touch configuration at this layer directly. Even if you had to, it should first be wrapped in automation scripts or a plugin.
If you are not using nginx-ingress-controller, or you want to record the real pod behind service-to-service traffic, is there a ready-made plugin or component that does this for us? There certainly is, and that is the topic of the next post. Stay tuned.
Contact me
- Contact me for a more in-depth discussion
That's all for this article.
My knowledge is limited; if I've spilled or missed anything along the way, please don't hesitate to point it out...
This post is from 博客园 (cnblogs), by it排球君. When reposting, please credit the original link: https://www.cnblogs.com/MrVolleyball/p/19360725