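flannel uses etcd to store each node's physical IP together with the subnet that node holds, and every node is assigned one subnet. The per-node subnets themselves are allocated by the controller-manager.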
Flannel network models
flannel supports three Pod network models, each of which flannel calls a "backend":
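vxlan
: Pods communicate through tunnel encapsulation. The nodes only need to be able to reach each other; they do not have to be on the same layer-2 network.
vxlan directrouting
: Pods on different nodes in the same layer-2 network communicate without tunnel encapsulation. Pods on nodes that are not in the same layer-2 network still communicate through the tunnel.
host-gw
: Pods communicate directly without tunnel encapsulation. This requires all the nodes involved to be on the same layer-2 network.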
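The three backends differ only in when traffic between two nodes is tunnelled. The rule can be written down as a small decision function. This is a minimal sketch, not flannel's code: `needs_encapsulation` is a hypothetical helper, and "same layer-2 network" is approximated here by "both node IPs fall inside the same IP prefix", which is an assumption.

```python
import ipaddress


def needs_encapsulation(backend, src_node_ip, dst_node_ip, l2_prefix,
                        direct_routing=False):
    """Return True if Pod traffic between the two nodes would be tunnelled.

    l2_prefix stands in for "the shared layer-2 network" (an assumption:
    nodes inside the same IP prefix are treated as layer-2 adjacent).
    """
    net = ipaddress.ip_network(l2_prefix)
    same_l2 = (ipaddress.ip_address(src_node_ip) in net
               and ipaddress.ip_address(dst_node_ip) in net)

    if backend == "host-gw":
        # host-gw has no tunnel to fall back on.
        if not same_l2:
            raise ValueError("host-gw requires nodes on the same layer-2 network")
        return False
    if backend == "vxlan":
        # Plain vxlan always tunnels; DirectRouting skips the tunnel
        # only for layer-2 adjacent nodes.
        return not (direct_routing and same_l2)
    raise ValueError(f"unknown backend: {backend}")
```

For example, `needs_encapsulation("vxlan", "172.16.11.71", "172.16.11.81", "172.16.11.0/24", direct_routing=True)` returns `False`, while the same call without `direct_routing` returns `True`.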
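Allocating IP addresses to the Pods on a node out of that node's subnet is usually handled by a dedicated plugin.
Such plugins are collectively known as IPAM (IP Address Management) plugins.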
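To make the role of IPAM concrete, here is a toy allocator that hands out Pod addresses from one node's subnet. It is only an illustration under stated assumptions: `NodeIPAM` is a hypothetical class, not the API of any real CNI IPAM plugin, and reserving the first host address for the node's bridge is an assumption as well.

```python
import ipaddress


class NodeIPAM:
    """Toy per-node allocator: one lease per Pod, taken from the node subnet."""

    def __init__(self, node_subnet):
        hosts = iter(ipaddress.ip_network(node_subnet).hosts())
        # Assumption: the first usable address is kept for the node's bridge.
        self.gateway = str(next(hosts))
        self._free = hosts
        self._leases = {}

    def allocate(self, pod_name):
        """Return the Pod's address, allocating a new one on first use."""
        if pod_name not in self._leases:
            try:
                self._leases[pod_name] = str(next(self._free))
            except StopIteration:
                raise RuntimeError("node subnet exhausted") from None
        return self._leases[pod_name]


ipam = NodeIPAM("10.244.1.0/24")
first = ipam.allocate("pod-a")   # first address after the reserved one
second = ipam.allocate("pod-b")  # next address in the subnet
```

Asking again for `pod-a` returns the same lease, which is the property a real IPAM plugin also has to guarantee.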
    ---
    kind: ConfigMap
    apiVersion: v1
    metadata:
      name: kube-flannel-cfg
      namespace: kube-system
      labels:
        tier: node
        app: flannel
    data:
      cni-conf.json: |
        {
          "name": "cbr0",
          "cniVersion": "0.3.1",
          "plugins": [
            {
              "type": "flannel",
              "delegate": {
                "hairpinMode": true,
                "isDefaultGateway": true
              }
            },
            {
              "type": "portmap",
              "capabilities": {
                "portMappings": true
              }
            }
          ]
        }
      net-conf.json: |
        {
          "Network": "10.244.0.0/16",
          "Backend": {
            "Type": "vxlan"
          }
        }
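The `Network` value in `net-conf.json` is the cluster-wide Pod range from which each node receives its own subnet. The sketch below shows the arithmetic with Python's `ipaddress` module. The /24 per-node prefix length is taken from the routing tables in this article; the node names other than `k8s-master01` and `k8s-node01` are made up, and the in-order assignment is an assumption for illustration only.

```python
import ipaddress

# Cluster-wide Pod network, as configured in net-conf.json.
network = ipaddress.ip_network("10.244.0.0/16")

# Carve the /16 into per-node /24 subnets.
node_subnets = list(network.subnets(new_prefix=24))

# Hypothetical assignment order; real allocation is not necessarily sequential.
nodes = ["k8s-master01", "k8s-node01", "k8s-node02", "k8s-node03"]
assignment = {name: str(subnet) for name, subnet in zip(nodes, node_subnets)}

for name, subnet in assignment.items():
    print(f"{name}: {subnet}")
```

A /16 split into /24 blocks yields 256 node subnets, each with 254 usable Pod addresses, which bounds both the number of nodes and the number of Pods per node for this configuration.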
Verifying vxlan mode
1. View the routing table on the master node
    root@k8s-master01:~
    Kernel IP routing table
    Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
    0.0.0.0         172.16.11.1     0.0.0.0         UG    0      0        0 eth0
    10.244.0.0      0.0.0.0         255.255.255.0   U     0      0        0 cni0
    10.244.1.0      10.244.1.0      255.255.255.0   UG    0      0        0 flannel.1
    10.244.2.0      10.244.2.0      255.255.255.0   UG    0      0        0 flannel.1
    10.244.3.0      10.244.3.0      255.255.255.0   UG    0      0        0 flannel.1
    172.16.11.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
    172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
2. View the routing table on node01
    root@k8s-node01:~
    Kernel IP routing table
    Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
    0.0.0.0         172.16.11.1     0.0.0.0         UG    0      0        0 eth0
    10.244.0.0      10.244.0.0      255.255.255.0   UG    0      0        0 flannel.1
    10.244.1.0      0.0.0.0         255.255.255.0   U     0      0        0 cni0
    10.244.2.0      10.244.2.0      255.255.255.0   UG    0      0        0 flannel.1
    10.244.3.0      10.244.3.0      255.255.255.0   UG    0      0        0 flannel.1
    172.16.11.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
    172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
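How node01 picks an outgoing interface for a given Pod address can be reproduced with a longest-prefix-match lookup over the table above. This is a simplified model of the kernel's route selection (metrics and flags are ignored), and `lookup` is a hypothetical helper written for this illustration.

```python
import ipaddress

# node01 routing table from the listing above:
# (destination prefix, gateway or None if directly connected, interface)
ROUTES_NODE01 = [
    ("0.0.0.0/0", "172.16.11.1", "eth0"),
    ("10.244.0.0/24", "10.244.0.0", "flannel.1"),
    ("10.244.1.0/24", None, "cni0"),
    ("10.244.2.0/24", "10.244.2.0", "flannel.1"),
    ("10.244.3.0/24", "10.244.3.0", "flannel.1"),
    ("172.16.11.0/24", None, "eth0"),
    ("172.17.0.0/16", None, "docker0"),
]


def lookup(dst_ip, routes=ROUTES_NODE01):
    """Return (prefix, gateway, interface) of the most specific matching route."""
    addr = ipaddress.ip_address(dst_ip)
    matches = [
        (ipaddress.ip_network(prefix), gateway, dev)
        for prefix, gateway, dev in routes
        if addr in ipaddress.ip_network(prefix)
    ]
    # The default route always matches, so matches is never empty.
    net, gateway, dev = max(matches, key=lambda m: m[0].prefixlen)
    return str(net), gateway, dev


print(lookup("10.244.1.5"))  # local Pod: stays on cni0
print(lookup("10.244.2.7"))  # remote Pod: handed to flannel.1
```

A local Pod address resolves to the directly connected `cni0` route, whereas an address in another node's subnet resolves to a route via `flannel.1` whose gateway is that subnet's `.0` address.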
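The routing tables above show that cross-host Pod communication is implemented through routing. They also show a gap: the routes tell a node how to send a packet out (through flannel.1), but not which host the packet should be delivered to. Filling that gap is the job of flanneld, which queries etcd to learn which host owns which Pod subnet.
Because querying etcd for every lookup would put pressure on etcd, flanneld saves the results in the kernel neighbor (neigh) table.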
    root@k8s-node01:~
    10.244.3.0 dev flannel.1 lladdr d2:f7:0a:5c:0c:a4 PERMANENT
    10.244.0.0 dev flannel.1 lladdr 3a:08:06:36:54:6d PERMANENT
    10.244.2.0 dev flannel.1 lladdr 0a:9f:ec:95:bb:cf PERMANENT
flannel forwarding logic
    root@k8s-node01:~
    d2:f7:0a:5c:0c:a4 dev flannel.1 dst 172.16.11.83 self permanent
    3a:08:06:36:54:6d dev flannel.1 dst 172.16.11.71 self permanent
    0a:9f:ec:95:bb:cf dev flannel.1 dst 172.16.11.82 self permanent
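Taken together, the route, the neighbor entry and the forwarding entry form a three-step lookup: Pod IP to remote subnet gateway, gateway to the MAC address of the remote flannel.1, and MAC address to the remote node's physical IP. The sketch below chains the two tables from node01 shown above. `resolve_vtep` is a hypothetical helper, and the fixed /24 prefix length is an assumption matching this cluster.

```python
import ipaddress

# Neighbor table on node01: remote subnet gateway -> MAC of the remote flannel.1
NEIGH = {
    "10.244.3.0": "d2:f7:0a:5c:0c:a4",
    "10.244.0.0": "3a:08:06:36:54:6d",
    "10.244.2.0": "0a:9f:ec:95:bb:cf",
}

# Forwarding table on node01: remote flannel.1 MAC -> physical IP of that node
FDB = {
    "d2:f7:0a:5c:0c:a4": "172.16.11.83",
    "3a:08:06:36:54:6d": "172.16.11.71",
    "0a:9f:ec:95:bb:cf": "172.16.11.82",
}


def resolve_vtep(pod_ip, prefixlen=24):
    """Return (gateway, inner destination MAC, outer destination IP)."""
    subnet = ipaddress.ip_network(f"{pod_ip}/{prefixlen}", strict=False)
    gateway = str(subnet.network_address)  # route: via <subnet>.0 dev flannel.1
    mac = NEIGH[gateway]                   # neighbor entry: gateway -> MAC
    return gateway, mac, FDB[mac]          # forwarding entry: MAC -> node IP


print(resolve_vtep("10.244.3.9"))
```

For a Pod in 10.244.3.0/24 the chain ends at 172.16.11.83, which becomes the destination of the outer packet that carries the encapsulated frame.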
vxlan DirectRouting mode
Enabling DirectRouting requires editing flannel's ConfigMap and re-applying it.
    root@k8s-master01:~
    ---
    kind: ConfigMap
    apiVersion: v1
    metadata:
      name: kube-flannel-cfg
      namespace: kube-system
      labels:
        tier: node
        app: flannel
    data:
      cni-conf.json: |
        {
          ...unchanged...
        }
      net-conf.json: |
        {
          "Network": "10.244.0.0/16",
          "Backend": {
            "Type": "vxlan",
            "DirectRouting": true
          }
        }
Re-apply the manifest
    root@k8s-master01:~
    Warning: policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
    podsecuritypolicy.policy/psp.flannel.unprivileged configured
    clusterrole.rbac.authorization.k8s.io/flannel unchanged
    clusterrolebinding.rbac.authorization.k8s.io/flannel unchanged
    serviceaccount/flannel unchanged
    configmap/kube-flannel-cfg configured
    daemonset.apps/kube-flannel-ds unchanged
    root@k8s-master01:~
    NAME                    READY   STATUS    RESTARTS   AGE
    kube-flannel-ds-dl2rq   1/1     Running   0          20d
    kube-flannel-ds-fmml6   1/1     Running   0          20d
    kube-flannel-ds-rwh5f   1/1     Running   0          20d
    kube-flannel-ds-tqbbv   1/1     Running   0          20d
    root@k8s-master01:~
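Check the routing table on the node again. Only the ConfigMap changed while the DaemonSet did not, so the existing flannel Pods have to be recreated before the new backend settings show up in the routes.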
    root@k8s-master01:~
    Kernel IP routing table
    Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
    0.0.0.0         172.16.11.1     0.0.0.0         UG    0      0        0 eth0
    10.244.0.0      0.0.0.0         255.255.255.0   U     0      0        0 cni0
    10.244.1.0      172.16.11.81    255.255.255.0   UG    0      0        0 eth0
    10.244.2.0      172.16.11.82    255.255.255.0   UG    0      0        0 eth0
    10.244.3.0      172.16.11.83    255.255.255.0   UG    0      0        0 eth0
    172.16.11.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
    172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
host-gw mode
Enabling host-gw mode requires editing flannel's ConfigMap and re-applying it.
    root@k8s-master01:~
    ---
    kind: ConfigMap
    apiVersion: v1
    metadata:
      name: kube-flannel-cfg
      namespace: kube-system
      labels:
        tier: node
        app: flannel
    data:
      cni-conf.json: |
        {
          ...unchanged...
        }
      net-conf.json: |
        {
          "Network": "10.244.0.0/16",
          "Backend": {
            "Type": "host-gw"
          }
        }
Re-apply the manifest
    root@k8s-master01:~
    Warning: policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
    podsecuritypolicy.policy/psp.flannel.unprivileged configured
    clusterrole.rbac.authorization.k8s.io/flannel unchanged
    clusterrolebinding.rbac.authorization.k8s.io/flannel unchanged
    serviceaccount/flannel unchanged
    configmap/kube-flannel-cfg configured
    daemonset.apps/kube-flannel-ds unchanged
    root@k8s-master01:~
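Once the flannel Pods have been recreated, the flannel.1 interface on the node is no longer in use: the device is still present, but no Pod route points to it any more.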
    root@k8s-master01:~
    4: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
        link/ether 3a:08:06:36:54:6d brd ff:ff:ff:ff:ff:ff
        inet 10.244.0.0/32 brd 10.244.0.0 scope global flannel.1
           valid_lft forever preferred_lft forever
        inet6 fe80::3808:6ff:fe36:546d/64 scope link
           valid_lft forever preferred_lft forever
View the routing table
    root@k8s-master01:~
    Kernel IP routing table
    Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
    0.0.0.0         172.16.11.1     0.0.0.0         UG    0      0        0 eth0
    10.244.0.0      0.0.0.0         255.255.255.0   U     0      0        0 cni0
    10.244.1.0      172.16.11.81    255.255.255.0   UG    0      0        0 eth0
    10.244.2.0      172.16.11.82    255.255.255.0   UG    0      0        0 eth0
    10.244.3.0      172.16.11.83    255.255.255.0   UG    0      0        0 eth0
    172.16.11.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
    172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
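In host-gw mode every remote Pod subnet becomes an ordinary route whose gateway is the physical IP of the node that owns the subnet. The sketch below derives such a table from a node-to-subnet mapping. The mapping is read off the listings in this article; `host_gw_routes` is a hypothetical helper, and the interface names `cni0` and `eth0` are simply the ones that appear in the output above.

```python
# Physical node IP -> Pod subnet owned by that node (from the listings above).
NODE_SUBNETS = {
    "172.16.11.71": "10.244.0.0/24",
    "172.16.11.81": "10.244.1.0/24",
    "172.16.11.82": "10.244.2.0/24",
    "172.16.11.83": "10.244.3.0/24",
}


def host_gw_routes(local_node_ip, node_subnets=NODE_SUBNETS):
    """Build the Pod routes one node needs: (subnet, gateway, interface)."""
    routes = []
    for node_ip, subnet in node_subnets.items():
        if node_ip == local_node_ip:
            # Own subnet: directly connected through the local bridge.
            routes.append((subnet, None, "cni0"))
        else:
            # Remote subnet: next hop is the owning node itself.
            routes.append((subnet, node_ip, "eth0"))
    return routes


for route in host_gw_routes("172.16.11.71"):
    print(route)
```

Because the next hop must be directly reachable for such a route to work, this construction only makes sense when all nodes share a layer-2 network, which is the constraint host-gw imposes.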
Summary
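The working logic of flannel's vxlan and vxlan DirectRouting backends is fairly complex: flanneld has to query etcd and generate static forwarding entries on every node.
When the nodes are on the same subnet, host-gw or vxlan DirectRouting mode does not need flannel.1 to encapsulate packets.
Everything above deals with communication between Pods. On Kubernetes, however, a network plugin has to handle network policy as well as Pod-to-Pod communication. A namespace only isolates names and places no restriction on traffic between Pods, so actually restricting Pod communication requires network policies. flannel's approach of storing state in etcd and generating routing tables from it is usable in small networks, but it does not suit large networks that change frequently. There, each node's routing table should be learned through a routing protocol, which is what the calico network plugin provides.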