Kubernetes offers the following advanced scheduling mechanisms:

  • Node affinity scheduling
  • Pod affinity scheduling
  • Node taints and Pod tolerations
  • Topology spread scheduling

Node affinity scheduling

Node affinity can be implemented in the following ways:

  1. pod.spec.nodeName: manually pin the Pod to a specific node
  2. pod.spec.nodeSelector: select nodes by their labels
  3. pod.spec.affinity.nodeAffinity: node affinity, which itself comes in four flavors
    • Affinity: the Pod is attracted to matching nodes
    • Anti-affinity: the Pod and matching nodes never mix
    • Hard affinity: the Pod must run on a matching node
    • Soft affinity: run on a matching node if one exists; if none does, settle for another node

So-called affinity works by activating the predicate (filtering) and priority (scoring) functions of the scheduling policy.

nodeName affinity

nodeName affinity only requires setting the pod.spec.nodeName field of the Pod template to the name of the target node, as sketched below.
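A minimal sketch (the Pod name here is made up for illustration; k8s-node03 is one of the worker nodes used later in this article):

apiVersion: v1
kind: Pod
metadata:
  name: pod-with-nodename          # hypothetical name, for illustration only
spec:
  nodeName: k8s-node03             # bind directly to this node, bypassing the scheduler
  containers:
  - name: demoapp
    image: ikubernetes/demoapp:v1.0

Because nodeName bypasses the scheduler entirely, none of the affinity rules discussed below are evaluated for such a Pod.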

nodeSelector affinity

nodeSelector affinity relies on labels attached to Node objects, so it takes two steps (see the sketch after this list):

  1. Label the target Node; for example, if the node has an SSD, label it disktype=ssd
  2. Select that label in the Pod template's pod.spec.nodeSelector field, e.g. disktype=ssd
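A minimal sketch of both steps, assuming k8s-node01 is the node with the SSD: first label it with kubectl label node k8s-node01 disktype=ssd, then select that label in the Pod spec:

apiVersion: v1
kind: Pod
metadata:
  name: pod-with-ssd               # hypothetical name, for illustration only
spec:
  containers:
  - name: demoapp
    image: ikubernetes/demoapp:v1.0
  nodeSelector:
    disktype: ssd                  # only nodes carrying exactly this label qualify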

nodeAffinity

Node affinity comes in a hard and a soft form, as the API documentation shows:

root@k8s-master01:~/yaml/chapter11# kubectl explain pod.spec.affinity.nodeAffinity
KIND:     Pod
VERSION:  v1

RESOURCE: nodeAffinity <Object>

DESCRIPTION:
     Describes node affinity scheduling rules for the pod.

     Node affinity is a group of node affinity scheduling rules.

FIELDS:
   preferredDuringSchedulingIgnoredDuringExecution   <[]Object>   # soft affinity
     The scheduler will prefer to schedule pods to nodes that satisfy the
     affinity expressions specified by this field, but it may choose a node that
     violates one or more of the expressions. The node that is most preferred is
     the one with the greatest sum of weights, i.e. for each node that meets all
     of the scheduling requirements (resource request, requiredDuringScheduling
     affinity expressions, etc.), compute a sum by iterating through the
     elements of this field and adding "weight" to the sum if the node matches
     the corresponding matchExpressions; the node(s) with the highest sum are
     the most preferred.

   requiredDuringSchedulingIgnoredDuringExecution   <Object>   # hard affinity
     If the affinity requirements specified by this field are not met at
     scheduling time, the pod will not be scheduled onto the node. If the
     affinity requirements specified by this field cease to be met at some point
     during pod execution (e.g. due to an update), the system may or may not try
     to eventually evict the pod from its node.
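For orientation, here is a skeleton (not from the original walkthrough) showing where the two fields sit inside a Pod spec; the label keys mirror the examples that follow:

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:    # hard: nodes failing this are filtered out
        nodeSelectorTerms:
        - matchExpressions:
          - key: gpu
            operator: Exists
      preferredDuringSchedulingIgnoredDuringExecution:   # soft: matching nodes gain "weight" points during scoring
      - weight: 50
        preference:
          matchExpressions:
          - key: region
            operator: In
            values: ["foo"]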

Node affinity examples

nodeSelector example

1. Write the manifest

root@k8s-master01:~/yaml/chapter11# vim pod-with-nodeselector.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-nodeselector
spec:
  containers:
  - name: demoapp
    image: ikubernetes/demoapp:v1.0
  nodeSelector:
    gpu: ''   # the node must carry a label with the key "gpu"

2. Apply the manifest

root@k8s-master01:~/yaml/chapter11# kubectl apply -f pod-with-nodeselector.yaml
pod/pod-with-nodeselector created

root@k8s-master01:~/yaml/chapter11# kubectl get pods pod-with-nodeselector -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod-with-nodeselector 0/1 Pending 0 43s <none> <none> <none> <none>

# The Pod is stuck in Pending because no node currently carries the gpu='' label
root@k8s-master01:~/yaml/chapter11# kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
k8s-master01 Ready control-plane,master 5d20h v1.21.2 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-master01,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=
k8s-node01 Ready <none> 5d20h v1.21.2 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node01,kubernetes.io/os=linux
k8s-node02 Ready <none> 5d20h v1.21.2 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node02,kubernetes.io/os=linux
k8s-node03 Ready <none> 5d20h v1.21.2 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node03,kubernetes.io/os=linux

3. Label node03 and check the Pod again

root@k8s-master01:~/yaml/chapter11# kubectl label node k8s-node03 gpu=''
node/k8s-node03 labeled

# Checking again shows the Pod has been scheduled to node03
root@k8s-master01:~/yaml/chapter11# kubectl get pods pod-with-nodeselector -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod-with-nodeselector 1/1 Running 0 3m35s 192.168.30.4 k8s-node03 <none> <none>

This shows that nodeSelector is a hard constraint: a Pod that matches no node stays in Pending.
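A follow-up note, consistent with the IgnoredDuringExecution semantics quoted above: removing the label later does not evict the already-running Pod, because the constraint is only evaluated at scheduling time. To clean up the experiment, a trailing dash removes a label:

kubectl label node k8s-node03 gpu-

(The later examples in this article keep the label in place, so this is not run here.)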

nodeAffinity hard-affinity example

1. Write the manifest

apiVersion: apps/v1
kind: Deployment
metadata:
  name: node-affinity-required
  namespace: default
spec:
  replicas: 5
  selector:
    matchLabels:
      app: demoapp
      ctlr: node-affinity-required
  template:
    metadata:
      labels:
        app: demoapp
        ctlr: node-affinity-required
    spec:
      containers:
      - name: demoapp
        image: ikubernetes/demoapp:v1.0
        livenessProbe:
          httpGet:
            path: '/livez'
            port: 80
          initialDelaySeconds: 5
        readinessProbe:
          httpGet:
            path: '/readyz'
            port: 80
          initialDelaySeconds: 15
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:   # hard affinity
            nodeSelectorTerms:                              # node selector terms
            - matchExpressions:                             # label match expressions
              - key: gpu                                    # label key "gpu"
                operator: Exists                            # the label must exist
              - key: node-role.kubernetes.io/master         # label key marking master nodes
                operator: DoesNotExist                      # the label must not exist
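For reference: matchExpressions supports the operators In, NotIn, Exists, DoesNotExist, Gt, and Lt. Multiple expressions within one term must all match (AND), while multiple entries under nodeSelectorTerms are alternatives (OR).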

2. Apply the manifest

root@k8s-master01:~/yaml/chapter11# kubectl apply -f node-affinity-required-demo.yaml
deployment.apps/node-affinity-required created

3. Check the Pods

root@k8s-master01:~/yaml/chapter11# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
node-affinity-required-6ccb64cd6f-gc8lv 1/1 Running 1 104s 192.168.30.5 k8s-node03 <none> <none>
node-affinity-required-6ccb64cd6f-nxssz 1/1 Running 0 104s 192.168.30.7 k8s-node03 <none> <none>
node-affinity-required-6ccb64cd6f-pz9rx 1/1 Running 0 104s 192.168.30.9 k8s-node03 <none> <none>
node-affinity-required-6ccb64cd6f-rg2fk 1/1 Running 1 104s 192.168.30.6 k8s-node03 <none> <none>
node-affinity-required-6ccb64cd6f-wsfjt 1/1 Running 0 104s 192.168.30.8 k8s-node03 <none> <none>
pod-with-nodeselector 1/1 Running 0 59m 192.168.30.4 k8s-node03 <none> <none>

Because node03 was already labeled in the nodeSelector example, it carries the gpu label; under the hard-affinity rule, all of the Pods were therefore scheduled onto node03.
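This can be double-checked with a label selector; listing nodes that carry the gpu key should return only k8s-node03:

kubectl get nodes -l gpu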

nodeAffinity soft-affinity example

1. Write the manifest

root@k8s-master01:~/yaml/chapter11# vim node-affinity-preferred-demo.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: node-affinity-preferred
spec:
  replicas: 5
  selector:
    matchLabels:
      app: demoapp
      ctlr: node-affinity-preferred
  template:
    metadata:
      name: demoapp
      labels:
        app: demoapp
        ctlr: node-affinity-preferred
    spec:
      containers:
      - name: demoapp
        image: ikubernetes/demoapp:v1.0
        resources:
          requests:
            cpu: 1500m
            memory: 1Gi
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 60
            preference:
              matchExpressions:
              - key: gpu
                operator: Exists
          - weight: 30
            preference:
              matchExpressions:
              - key: region
                operator: In
                values: ["foo","bar"]

2. Apply the manifest

root@k8s-master01:~/yaml/chapter11# kubectl apply -f node-affinity-preferred-demo.yaml
deployment.apps/node-affinity-preferred created

3. Check the scheduling result

root@k8s-master01:~/yaml/chapter11# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
node-affinity-preferred-7844dd55fb-7n8bq 1/1 Running 0 10s 192.168.30.12 k8s-node03 <none> <none>
node-affinity-preferred-7844dd55fb-khchg 1/1 Running 0 10s 192.168.30.11 k8s-node03 <none> <none>
node-affinity-preferred-7844dd55fb-qj5ll 1/1 Running 0 10s 192.168.96.4 k8s-node02 <none> <none>
node-affinity-preferred-7844dd55fb-v8l6v 1/1 Running 0 10s 192.168.131.14 k8s-node01 <none> <none>
node-affinity-preferred-7844dd55fb-xn4j8 1/1 Running 0 10s 192.168.30.10 k8s-node03 <none> <none>
node-affinity-required-6ccb64cd6f-gc8lv 1/1 Running 1 140m 192.168.30.5 k8s-node03 <none> <none>
node-affinity-required-6ccb64cd6f-nxssz 1/1 Running 0 140m 192.168.30.7 k8s-node03 <none> <none>
node-affinity-required-6ccb64cd6f-pz9rx 1/1 Running 0 140m 192.168.30.9 k8s-node03 <none> <none>
node-affinity-required-6ccb64cd6f-rg2fk 1/1 Running 4 140m 192.168.30.6 k8s-node03 <none> <none>
node-affinity-required-6ccb64cd6f-wsfjt 1/1 Running 0 140m 192.168.30.8 k8s-node03 <none> <none>
pod-with-nodeselector 1/1 Running 0 3h17m 192.168.30.4 k8s-node03 <none> <none>

# Most of the node-affinity-preferred Pods landed on node03; two went to other nodes because node03 no longer had enough unreserved CPU to satisfy their requests

4. Check resource usage on node03

root@k8s-master01:~/yaml/chapter11# kubectl describe node k8s-node03 | grep -A 30 Namespace
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
default node-affinity-preferred-7844dd55fb-7n8bq 1500m (25%) 0 (0%) 1Gi (13%) 0 (0%) 12m
default node-affinity-preferred-7844dd55fb-khchg 1500m (25%) 0 (0%) 1Gi (13%) 0 (0%) 12m
default node-affinity-preferred-7844dd55fb-xn4j8 1500m (25%) 0 (0%) 1Gi (13%) 0 (0%) 12m
default node-affinity-required-6ccb64cd6f-gc8lv 0 (0%) 0 (0%) 0 (0%) 0 (0%) 152m
default node-affinity-required-6ccb64cd6f-nxssz 0 (0%) 0 (0%) 0 (0%) 0 (0%) 152m
default node-affinity-required-6ccb64cd6f-pz9rx 0 (0%) 0 (0%) 0 (0%) 0 (0%) 152m
default node-affinity-required-6ccb64cd6f-rg2fk 0 (0%) 0 (0%) 0 (0%) 0 (0%) 152m
default node-affinity-required-6ccb64cd6f-wsfjt 0 (0%) 0 (0%) 0 (0%) 0 (0%) 152m
default pod-with-nodeselector 0 (0%) 0 (0%) 0 (0%) 0 (0%) 3h30m
dev deployment-demo-fb544c5d8-frmr7 0 (0%) 0 (0%) 0 (0%) 0 (0%) 2d21h
kube-system calico-node-d9krp 250m (4%) 0 (0%) 0 (0%) 0 (0%) 5d23h
kube-system kube-proxy-hvvm6 0 (0%) 0 (0%) 0 (0%) 0 (0%) 5d23h
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 4750m (79%) 0 (0%)
memory 3Gi (39%) 0 (0%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events: <none>
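Working backwards from this output: 4750m is reported as 79% of allocatable CPU, so k8s-node03 has roughly 6000m allocatable. The three node-affinity-preferred Pods (3 × 1500m) plus calico-node (250m) account for the full 4750m; a fourth 1500m Pod would push requests to 6250m, beyond what the node can allocate. That is why two replicas ended up on k8s-node01 and k8s-node02 even though node03 had the highest affinity score.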