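DOCKER INGRESS INTRODUCTION
# Official Docker ingress documentation: https://docs.docker.com/engine/swarm/ingress/
As the official Docker documentation describes, swarm mode uses the ingress routing mesh: once a service publishes a port, a request to that port on any node in the swarm reaches the service, even if that node is not running a replica of it.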
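Environment verification
To verify this, we build a three-node Docker swarm cluster with one manager and two workers:
PS: Use the same Docker version on all nodes, and keep the operating system and kernel versions of the three hosts identical.
# Node information:
# worker-1: 192.168.100.228
# worker-2: 192.168.100.234
# leader: 192.168.100.253
[root@253 ~]# docker node ls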
ID                            HOSTNAME   STATUS   AVAILABILITY   MANAGER STATUS   ENGINE VERSION
wll4l9u5sj9xyon5u1bvq8wth     228        Ready    Active                          20.10.12
t1ll9hzipxjms5mt14kxx7u3o     234        Ready    Active                          20.10.12
pkv93oel5hesk9bn22uyit9rz *   253        Ready    Active         Leader           20.10.12
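We deploy a single-replica service from the whoami image. A request to the service returns the hostname and IP address of the container that served it.
# Image: docker pull containous/whoami:1.5.0
Deploy the whoami service
# Run on the leader node 253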
[root@253 ~]# docker service create --name whoami --replicas 1 -p 8080:80 hub.dehuinet.com:58443/middleware/whoami:v1.5.0
mnauoiowxg541iw0fenhqwemq
overall progress: 1 out of 1 tasks
1/1: running [==================================================>]
verify: Service converged
Check the state of the whoami service and the node it runs on
# Run on the leader node 253
[root@253 ~]# docker service ps whoami
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
qmpt38h7rk36 whoami.1 hub.dehuinet.com:58443/middleware/whoami:v1.5.0 253 Running Running 2 minutes ago
# The output shows that the service task was scheduled on the leader node 253
# Run on the leader node 253
[root@253 ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
07e821c9674b hub.dehuinet.com:58443/middleware/whoami:v1.5.0 "/whoami" 3 minutes ago Up 3 minutes 80/tcp whoami.1.qmpt38h7rk36evcywsm5pvaft
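Open port 8080 on node 253 in a browser. The whoami response is returned:
(screenshot: whoami response from node 253)
Open port 8080 on node 234 in a browser:
(screenshot: whoami response from node 234)
Open port 8080 on node 228 in a browser:
(screenshot: whoami response from node 228)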
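The browser check can also be scripted. The following is a minimal sketch, assuming the three node addresses and the published port 8080 from this article; the helper names `build_urls` and `probe` are illustrative and not part of any Docker tooling.

```python
from urllib.request import urlopen

# Node addresses and published port taken from the article.
NODES = ["192.168.100.253", "192.168.100.234", "192.168.100.228"]
PORT = 8080


def build_urls(nodes, port):
    """Build one HTTP URL per swarm node for the published port."""
    return [f"http://{ip}:{port}/" for ip in nodes]


def probe(urls, fetch=None, timeout=3):
    """Request every URL and return {url: response body or error text}.

    `fetch` can be replaced with a stub so the function can be tried
    without a running cluster.
    """
    if fetch is None:
        def fetch(url):
            with urlopen(url, timeout=timeout) as resp:
                return resp.read().decode("utf-8", errors="replace")

    results = {}
    for url in urls:
        try:
            results[url] = fetch(url)
        except OSError as exc:  # URLError and socket timeouts are OSError subclasses
            results[url] = f"ERROR: {exc}"
    return results
```

On a host that can reach the cluster, `probe(build_urls(NODES, PORT))` should return a whoami response for each of the three nodes if the routing mesh works as described.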
As the official Docker documentation states, port 8080 on every node in the cluster reaches the service. So how does this work under the hood?
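HOW DOCKER INGRESS WORKS
The request arrives at the local network interface
First, look at the iptables tables on the node's host.
Traffic first arrives on the local interface ens192: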
[root@253 docker]# iptables -t nat -nvL
Chain PREROUTING (policy ACCEPT 11740 packets, 708K bytes)
pkts bytes target prot opt in out source destination
1542K 93M DOCKER-INGRESS all -- * * 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL
20M 1219M DOCKER all -- * * 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL

Chain INPUT (policy ACCEPT 11740 packets, 708K bytes)
pkts bytes target prot opt in out source destination

Chain OUTPUT (policy ACCEPT 708 packets, 42669 bytes)
pkts bytes target prot opt in out source destination
3 180 DOCKER-INGRESS all -- * * 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL
1 60 DOCKER all -- * * 0.0.0.0/0 !127.0.0.0/8 ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT 708 packets, 42669 bytes)
pkts bytes target prot opt in out source destination
3 180 MASQUERADE all -- * docker_gwbridge 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match src-type LOCAL
399 28723 MASQUERADE all -- * !docker_gwbridge 192.168.0.0/20 0.0.0.0/0
114K 6913K MASQUERADE all -- * !docker0 172.17.0.0/16 0.0.0.0/0
0 0 MASQUERADE tcp -- * * 172.17.0.7 172.17.0.7 tcp dpt:9000

Chain DOCKER (2 references)
pkts bytes target prot opt in out source destination
8 480 RETURN all -- docker_gwbridge * 0.0.0.0/0 0.0.0.0/0
2 120 RETURN all -- docker0 * 0.0.0.0/0 0.0.0.0/0
167 10020 DNAT tcp -- !docker0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:9000 to:172.17.0.7:9000

Chain DOCKER-INGRESS (2 references)
pkts bytes target prot opt in out source destination
0 0 DNAT tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:8080 to:192.168.0.2:8080
1542K 93M RETURN all -- * * 0.0.0.0/0 0.0.0.0/0
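The PREROUTING chain of the nat table hands the packet to the DOCKER-INGRESS chain of the same table. There it matches destination port 8080 and is DNATed, so the destination becomes 192.168.0.2:8080. Neither "ifconfig" nor "ip a" shows an interface on the host that carries this address, so where does 192.168.0.2 come from? Let's keep going.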
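The effect of the DNAT rule can be modelled in a few lines. This is only an illustrative sketch of the match-and-rewrite step, not how netfilter is implemented; the rule text is copied from the DOCKER-INGRESS chain above, while the client address 192.168.100.10 and the names `Packet`, `parse_dnat` and `apply_dnat` are assumptions made for the example.

```python
import re
from typing import NamedTuple


class Packet(NamedTuple):
    src: str
    dst: str
    dport: int
    proto: str = "tcp"


# Rule line copied from the DOCKER-INGRESS chain of the nat table.
RULE = "0 0 DNAT tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:8080 to:192.168.0.2:8080"


def parse_dnat(rule: str):
    """Extract (protocol, matched port, new address, new port) from a DNAT line."""
    m = re.search(r"DNAT\s+(\w+)\s.*dpt:(\d+)\s+to:([\d.]+):(\d+)", rule)
    if m is None:
        raise ValueError(f"not a DNAT rule: {rule!r}")
    proto, dpt, ip, port = m.groups()
    return proto, int(dpt), ip, int(port)


def apply_dnat(pkt: Packet, rule: str) -> Packet:
    """Rewrite the destination if the packet matches the rule, else return it unchanged."""
    proto, dpt, ip, port = parse_dnat(rule)
    if pkt.proto == proto and pkt.dport == dpt:
        return pkt._replace(dst=ip, dport=port)
    return pkt


# Hypothetical client 192.168.100.10 calling the published port on node 253.
request = Packet(src="192.168.100.10", dst="192.168.100.253", dport=8080)
print(apply_dnat(request, RULE))
```

Only the destination changes in this step; the source address is left as it was.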
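Because the DNATed destination is not a local address of the host, the routing decision after PREROUTING is to forward the packet, so it is matched against the FORWARD chain next: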
[root@253 docker]# iptables -nvL
Chain INPUT (policy ACCEPT 19540 packets, 1497K bytes)
pkts bytes target prot opt in out source destination

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
946K 1192M DOCKER-USER all -- * * 0.0.0.0/0 0.0.0.0/0
946K 1192M DOCKER-INGRESS all -- * * 0.0.0.0/0 0.0.0.0/0
3301K 4111M DOCKER-ISOLATION-STAGE-1 all -- * * 0.0.0.0/0 0.0.0.0/0
27686 95M ACCEPT all -- * docker_gwbridge 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
0 0 DOCKER all -- * docker_gwbridge 0.0.0.0/0 0.0.0.0/0
26943 1583K ACCEPT all -- docker_gwbridge !docker_gwbridge 0.0.0.0/0 0.0.0.0/0
12M 18G ACCEPT all -- * docker0 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
230 13848 DOCKER all -- * docker0 0.0.0.0/0 0.0.0.0/0
2026K 3654M ACCEPT all -- docker0 !docker0 0.0.0.0/0 0.0.0.0/0
19 1188 ACCEPT all -- docker0 docker0 0.0.0.0/0 0.0.0.0/0
0 0 DROP all -- docker_gwbridge docker_gwbridge 0.0.0.0/0 0.0.0.0/0

Chain OUTPUT (policy ACCEPT 18203 packets, 1201K bytes)
pkts bytes target prot opt in out source destination

Chain DOCKER (2 references)
pkts bytes target prot opt in out source destination
167 10020 ACCEPT tcp -- !docker0 docker0 0.0.0.0/0 172.17.0.7 tcp dpt:9000

Chain DOCKER-INGRESS (1 references)
pkts bytes target prot opt in out source destination
0 0 ACCEPT tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:8080
0 0 ACCEPT tcp -- * * 0.0.0.0/0 0.0.0.0/0 state RELATED,ESTABLISHED tcp spt:8080
2134K 2679M RETURN all -- * * 0.0.0.0/0 0.0.0.0/0

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
pkts bytes target prot opt in out source destination
26943 1583K DOCKER-ISOLATION-STAGE-2 all -- docker_gwbridge !docker_gwbridge 0.0.0.0/0 0.0.0.0/0
2026K 3654M DOCKER-ISOLATION-STAGE-2 all -- docker0 !docker0 0.0.0.0/0 0.0.0.0/0
14M 21G RETURN all -- * * 0.0.0.0/0 0.0.0.0/0

Chain DOCKER-ISOLATION-STAGE-2 (2 references)
pkts bytes target prot opt in out source destination
14 728 DROP all -- * docker_gwbridge 0.0.0.0/0 0.0.0.0/0
0 0 DROP all -- * docker0 0.0.0.0/0 0.0.0.0/0
2053K 3656M RETURN all -- * * 0.0.0.0/0 0.0.0.0/0

Chain DOCKER-USER (1 references)
pkts bytes target prot opt in out source destination
14M 21G RETURN all -- * * 0.0.0.0/0 0.0.0.0/0
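In the FORWARD chain, the DOCKER-INGRESS chain accepts tcp traffic to port 8080 and nothing else is done to the packet, so it moves on to the POSTROUTING chain: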
[root@253 docker]# iptables -t nat -nvL
Chain PREROUTING (policy ACCEPT 11740 packets, 708K bytes)
pkts bytes target prot opt in out source destination
1542K 93M DOCKER-INGRESS all -- * * 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL
20M 1219M DOCKER all -- * * 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL

Chain INPUT (policy ACCEPT 11740 packets, 708K bytes)
pkts bytes target prot opt in out source destination

Chain OUTPUT (policy ACCEPT 708 packets, 42669 bytes)
pkts bytes target prot opt in out source destination
3 180 DOCKER-INGRESS all -- * * 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL
1 60 DOCKER all -- * * 0.0.0.0/0 !127.0.0.0/8 ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT 708 packets, 42669 bytes)
pkts bytes target prot opt in out source destination
3 180 MASQUERADE all -- * docker_gwbridge 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match src-type LOCAL
399 28723 MASQUERADE all -- * !docker_gwbridge 192.168.0.0/20 0.0.0.0/0
114K 6913K MASQUERADE all -- * !docker0 172.17.0.0/16 0.0.0.0/0
0 0 MASQUERADE tcp -- * * 172.17.0.7 172.17.0.7 tcp dpt:9000

Chain DOCKER (2 references)
pkts bytes target prot opt in out source destination
8 480 RETURN all -- docker_gwbridge * 0.0.0.0/0 0.0.0.0/0
2 120 RETURN all -- docker0 * 0.0.0.0/0 0.0.0.0/0
167 10020 DNAT tcp -- !docker0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:9000 to:172.17.0.7:9000

Chain DOCKER-INGRESS (2 references)
pkts bytes target prot opt in out source destination
0 0 DNAT tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:8080 to:192.168.0.2:8080
1542K 93M RETURN all -- * * 0.0.0.0/0 0.0.0.0/0
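In the POSTROUTING chain, the MASQUERADE rule for traffic leaving through docker_gwbridge only matches packets whose source address is local to the host (ADDRTYPE match src-type LOCAL), for example a request that node 253 sends to its own port 8080. For such a packet, MASQUERADE rewrites the source to the address of the outgoing interface docker_gwbridge, 192.168.0.1. A request forwarded on behalf of an external client does not match the rule and keeps its original source address at this stage. For a locally originated request, the packet addresses are now:
src: 192.168.0.1
dst: 192.168.0.2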
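Now back to the 192.168.0.2 address. It belongs to the subnet of docker_gwbridge, a bridge network that Docker creates automatically when the swarm is initialized. List the networks to see it: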
[root@253 docker]# docker network ls
NETWORK ID NAME DRIVER SCOPE
99806da37661 bridge bridge local
e8fdcfd2b3f0 docker_gwbridge bridge local
b68f7bfff223 host host local
n3dex97iv1gs ingress overlay swarm
7e38e2b5d547 none null local
# Inspect docker_gwbridge in detail
[root@253 docker]# docker network inspect docker_gwbridge
[
    {
        "Name": "docker_gwbridge",
        "Id": "e8fdcfd2b3f03ae9ecc7f4548df5f2629ed1d0a52ef050e510d2c928221bc78a",
        "Created": "2025-01-14T16:09:16.75781064+08:00",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "192.168.0.0/20",
                    "Gateway": "192.168.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "c482097de23a1296adfb2f4bde725309ad87137b53d32f2362375a50faf8cf8c": {
                "Name": "gateway_b19d36310a0c",
                "EndpointID": "dcb1d4c82ef30d7d0580feec09af643bcae39b07dae6b290d43910fdf575156e",
                "MacAddress": "02:42:c0:a8:00:03",
                "IPv4Address": "192.168.0.3/20",
                "IPv6Address": ""
            },
            "ingress-sbox": {
                "Name": "gateway_ingress-sbox",
                "EndpointID": "8fee04df9694740d19a7582e66c026b867887f5d2070c3f49c163a5a6fa604bc",
                "MacAddress": "02:42:c0:a8:00:02",
                "IPv4Address": "192.168.0.2/20",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.bridge.enable_icc": "false",
            "com.docker.network.bridge.enable_ip_masquerade": "true",
            "com.docker.network.bridge.name": "docker_gwbridge"
        },
        "Labels": {}
    }
]
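The output can be summarized as follows:
docker_gwbridge uses the subnet 192.168.0.0/20
docker_gwbridge has the gateway 192.168.0.1
docker_gwbridge has two endpoints attached:
whoami (c482097de23a1296adfb2f4bde725309ad87137b53d32f2362375a50faf8cf8c), at 192.168.0.3
ingress-sbox, at 192.168.0.2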
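The same summary can be pulled out of the JSON programmatically. The sketch below works on a trimmed copy of the `docker network inspect docker_gwbridge` output shown above (only the fields it needs are kept); the function name `summarize` is illustrative.

```python
import ipaddress
import json

# Trimmed copy of the inspect output above; unrelated fields are omitted.
INSPECT_OUTPUT = """
[{"Name": "docker_gwbridge",
  "IPAM": {"Config": [{"Subnet": "192.168.0.0/20", "Gateway": "192.168.0.1"}]},
  "Containers": {
    "c482097de23a1296adfb2f4bde725309ad87137b53d32f2362375a50faf8cf8c": {
      "Name": "gateway_b19d36310a0c", "IPv4Address": "192.168.0.3/20"},
    "ingress-sbox": {
      "Name": "gateway_ingress-sbox", "IPv4Address": "192.168.0.2/20"}}}]
"""


def summarize(inspect_json: str) -> dict:
    """Return the subnet, gateway and attached endpoints of the first network."""
    network = json.loads(inspect_json)[0]
    config = network["IPAM"]["Config"][0]
    subnet = ipaddress.ip_network(config["Subnet"])
    endpoints = {}
    for endpoint in network["Containers"].values():
        ip = ipaddress.ip_interface(endpoint["IPv4Address"]).ip
        # Record the address and whether it really lies inside the subnet.
        endpoints[endpoint["Name"]] = (str(ip), ip in subnet)
    return {
        "name": network["Name"],
        "subnet": str(subnet),
        "gateway": config["Gateway"],
        "endpoints": endpoints,
    }


summary = summarize(INSPECT_OUTPUT)
print(summary)
```

The entry for `gateway_ingress-sbox` is the one that carries 192.168.0.2, the DNAT target seen in the nat table.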
ingress-sbox is not a real container but a network namespace created by Docker, and 192.168.0.2 is the address of an interface inside it. Confirm this with:
[root@253 netns]# nsenter --net="/run/docker/netns/ingress_sbox" ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2504: eth0@if2505: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether 02:42:0a:00:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.0.0.2/24 brd 10.0.0.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 10.0.0.97/32 scope global eth0
       valid_lft forever preferred_lft forever
2506: eth1@if2507: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:c0:a8:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet 192.168.0.2/20 brd 192.168.15.255 scope global eth1
       valid_lft forever preferred_lft forever
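To see which Docker network each interface of the namespace sits on, the addresses from the `ip addr` output can be checked against the two subnets reported by `docker network inspect` (10.0.0.0/24 for ingress, 192.168.0.0/20 for docker_gwbridge). A small sketch with an illustrative helper `network_of`:

```python
import ipaddress

# Subnets reported by `docker network inspect` in this article.
NETWORKS = {
    "ingress": "10.0.0.0/24",
    "docker_gwbridge": "192.168.0.0/20",
}

# Primary interface addresses from the `ip addr` output of ingress_sbox.
INTERFACES = {
    "eth0": "10.0.0.2/24",
    "eth1": "192.168.0.2/20",
}


def network_of(address: str):
    """Return the name of the network whose subnet contains the address, or None."""
    ip = ipaddress.ip_interface(address).ip
    for name, cidr in NETWORKS.items():
        if ip in ipaddress.ip_network(cidr):
            return name
    return None


for ifname, address in INTERFACES.items():
    print(ifname, address, "->", network_of(address))
```

eth0 maps to the ingress overlay network and eth1 to docker_gwbridge, so the namespace has one leg in each network.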
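Besides the lo loopback interface, the ingress_sbox namespace has two interfaces, eth0 and eth1. The address of eth1 is exactly the DNAT destination of the traffic that entered the host through ens192. Next, look at the rules inside the ingress_sbox namespace: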
[root@253 netns]# nsenter --net=/run/docker/netns/ingress_sbox iptables -t mangle -nvL
Chain PREROUTING (policy ACCEPT 67 packets, 7047 bytes)
pkts bytes target prot opt in out source destination
36 3380 MARK tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:8080 MARK set 0x205a

Chain INPUT (policy ACCEPT 36 packets, 3380 bytes)
pkts bytes target prot opt in out source destination
0 0 MARK all -- * * 0.0.0.0/0 10.0.0.97 MARK set 0x205a

Chain FORWARD (policy ACCEPT 31 packets, 3667 bytes)
pkts bytes target prot opt in out source destination

Chain OUTPUT (policy ACCEPT 36 packets, 3380 bytes)
pkts bytes target prot opt in out source destination

Chain POSTROUTING (policy ACCEPT 67 packets, 7047 bytes)
pkts bytes target prot opt in out source destination
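In the PREROUTING chain, the MARK target tags tcp traffic to port 8080 with 0x205a (8282 in decimal). Later routing or firewall rules can use this mark to treat these packets specially. Here the kernel's IPVS picks up packets carrying the mark and forwards them. Confirm this with: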
[root@253 netns]# nsenter --net=/run/docker/netns/ingress_sbox ipvsadm
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port Forward Weight ActiveConn InActConn
FWM 8282 rr
  -> 10.0.0.98:0 Masq 1 0 0
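iptables prints the mark in hexadecimal (0x205a) while ipvsadm prints the firewall mark in decimal (8282). A quick check that both refer to the same value, using the two strings from the outputs above:

```python
# Mark as printed by iptables and the virtual-service line printed by ipvsadm.
IPTABLES_MARK = "0x205a"
IPVSADM_LINE = "FWM 8282 rr"

mark_from_iptables = int(IPTABLES_MARK, 16)         # parse the hexadecimal mark
mark_from_ipvsadm = int(IPVSADM_LINE.split()[1])    # second field is the decimal mark

print(mark_from_iptables, mark_from_ipvsadm)
print(hex(mark_from_ipvsadm))
```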
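For traffic carrying firewall mark 8282, IPVS picks a backend with the round-robin scheduler (rr) and forwards the packet in NAT mode (Masq). The only backend is 10.0.0.98, which is the container's address on the ingress network: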
[root@253 netns]# docker network inspect ingress
[
    {
        "Name": "ingress",
        "Id": "n3dex97iv1gs5ugdluoc34dxi",
        "Created": "2025-01-15T17:01:36.92041533+08:00",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.0.0/24",
                    "Gateway": "10.0.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": true,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "c482097de23a1296adfb2f4bde725309ad87137b53d32f2362375a50faf8cf8c": {
                "Name": "whoami.1.oqfzi4mpkxgneq76zd5stkitx",
                "EndpointID": "fe5a7fc1cff508a3ec30cc486b6f2f956e9ad5cb51c11ae6a89501bc37714e3d",
                "MacAddress": "02:42:0a:00:00:62",
                "IPv4Address": "10.0.0.98/24",
                "IPv6Address": ""
            },
            "ingress-sbox": {
                "Name": "ingress-endpoint",
                "EndpointID": "bbe2023c0e3b4069a6e4bc06c51647fb4db962734a4584b94fd52c16ea6935a2",
                "MacAddress": "02:42:0a:00:00:02",
                "IPv4Address": "10.0.0.2/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4096"
        },
        "Labels": {},
        "Peers": [
            {
                "Name": "4cc45fe7cbbe",
                "IP": "192.168.100.253"
            },
            {
                "Name": "05c5717c0238",
                "IP": "192.168.100.234"
            },
            {
                "Name": "2182e20600db",
                "IP": "192.168.100.228"
            }
        ]
    }
]
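The ipvsadm output lists `rr` (round robin) as the scheduler. With a single replica every request lands on 10.0.0.98; with more replicas the requests would be handed out in turn. The following is a simplified model of that idea only, not the IPVS implementation, and the second backend 10.0.0.99 is a hypothetical address added for illustration.

```python
from itertools import cycle


def dispatch(request_count, backends):
    """Assign each request to the next backend in turn (plain round robin)."""
    if not backends:
        raise ValueError("at least one backend is required")
    picker = cycle(backends)
    return [next(picker) for _ in range(request_count)]


# Single replica, as in this article: every request reaches the same task.
print(dispatch(3, ["10.0.0.98"]))

# Two replicas; 10.0.0.99 is a made-up second task address.
print(dispatch(4, ["10.0.0.98", "10.0.0.99"]))
```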
The ARP table of ingress_sbox contains the MAC address that belongs to 10.0.0.98:
[root@253 netns]# nsenter --net=/run/docker/netns/ingress_sbox arp -a
? (10.0.0.37) at 02:42:0a:00:00:25 [ether] on eth0
? (10.0.0.47) at 02:42:0a:00:00:2f [ether] on eth0
? (10.0.0.8) at 02:42:0a:00:00:08 [ether] on eth0
? (10.0.0.155) at 02:42:0a:00:00:9b [ether] on eth0
? (10.0.0.154) at 02:42:0a:00:00:9a [ether] on eth0
? (10.0.0.139) at 02:42:0a:00:00:8b [ether] on eth0
? (10.0.0.96) at 02:42:0a:00:00:60 [ether] on eth0
? (10.0.0.98) at 02:42:0a:00:00:62 [ether] on eth0
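In every entry above, the MAC address is the prefix 02:42 followed by the four bytes of the IPv4 address in hexadecimal, which is how Docker derived container MAC addresses from their IPs in this output. The sketch below decodes that pattern; the function name `mac_to_ipv4` is illustrative, and the mapping is an observation about these addresses rather than a guarantee for every Docker version or network driver.

```python
def mac_to_ipv4(mac: str) -> str:
    """Decode a Docker-style MAC (02:42 + four IPv4 bytes) into dotted-quad form."""
    octets = mac.lower().split(":")
    if len(octets) != 6 or octets[:2] != ["02", "42"]:
        raise ValueError(f"not a 02:42-prefixed MAC address: {mac!r}")
    # The last four octets are the IPv4 address bytes in hexadecimal.
    return ".".join(str(int(octet, 16)) for octet in octets[2:])


print(mac_to_ipv4("02:42:0a:00:00:62"))  # whoami task on the ingress network
print(mac_to_ipv4("02:42:c0:a8:00:02"))  # eth1 of ingress_sbox on docker_gwbridge
```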