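A container packaged as an image and run on an orchestrator or container engine is a black box. The container boundary prevents outsiders from seeing whether the application process inside is actually healthy. Any application developed for a cloud-native environment should therefore account for this and expose a probe endpoint through which the health of the application inside the container can be checked.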

In general, a cloud-native application should expose the following interfaces:

To make it easy to check whether the containers in a Pod are healthy, the Pod spec lets users attach probes to each container in the Pod.

The three built-in probe types

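  • LivenessProbe: liveness probe. It runs periodically. When it fails, kubelet kills the container and decides whether to restart it according to the Pod's restartPolicy. If no liveness probe is defined, kubelet treats the container as healthy as long as it has not terminated.
  • ReadinessProbe: readiness probe. It runs periodically. When it fails, the Pod is removed from the available backend endpoints of every Service associated with it, and it is added back once it becomes ready again. If no readiness probe is defined, the container is considered ready as long as it has not terminated.
  • StartupProbe: startup probe. It checks, right after the container starts, whether startup has succeeded. The livenessProbe and readinessProbe only begin running after the startupProbe has succeeded. This lets users apply different parameters or thresholds to the startup phase than to the livenessProbe.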
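To make the division of labor concrete, here is a sketch of one container carrying all three probes. It reuses the `ikubernetes/demoapp:v1.0` image and its `/livez` and `/readyz` endpoints from the examples below. The Pod name and every numeric value are assumptions chosen for illustration. Until the startupProbe succeeds, kubelet holds back the other two probes, so a slow start is not mistaken for a hung process.

```yaml
# Sketch only: the Pod name and all numeric values are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: probes-demo            # hypothetical name
  namespace: default
spec:
  containers:
  - name: demoapp
    image: ikubernetes/demoapp:v1.0
    imagePullPolicy: IfNotPresent
    startupProbe:              # runs first; liveness/readiness wait until it succeeds
      httpGet:
        path: /livez
        port: 80
      periodSeconds: 10
      failureThreshold: 30     # allows up to 30 x 10 = 300 s for startup
    livenessProbe:             # takes over after startup succeeds, with a tighter budget
      httpGet:
        path: /livez
        port: 80
      periodSeconds: 10
      failureThreshold: 3      # restart after 3 consecutive failures
    readinessProbe:            # decides whether the Pod receives Service traffic
      httpGet:
        path: /readyz
        port: 80
      periodSeconds: 5
```

If the startupProbe never succeeds within its window, kubelet kills the container and applies the restartPolicy, just as it does for a failed liveness probe.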

Probe fields in the manifest

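```yaml
spec:
  containers:
  - name: …
    image: …
    livenessProbe:
      exec <Object>                  # command-based probe
      httpGet <Object>               # HTTP GET probe
      tcpSocket <Object>             # TCP socket probe
      initialDelaySeconds <integer>  # delay before the first probe is sent
      periodSeconds <integer>        # interval between probes
      timeoutSeconds <integer>       # probe timeout
      successThreshold <integer>     # consecutive successes needed to count as passing
      failureThreshold <integer>     # consecutive failures needed to count as failing
```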
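For reference, the fragment below spells out the values kubelet uses when these fields are omitted. The same defaults show up in the `kubectl describe` output later: `timeout=1s ... #success=1 #failure=3`, and `period=10s` for the httpGet example, which does not set `periodSeconds`. The `httpGet` handler and its path are placeholders. The same fields apply unchanged under `readinessProbe` and `startupProbe`.

```yaml
# Sketch only: the handler is a placeholder; the numeric values are the Kubernetes defaults.
livenessProbe:
  httpGet:
    path: /livez             # placeholder endpoint
    port: 80
  initialDelaySeconds: 0     # default: probing starts as soon as the container starts
  periodSeconds: 10          # default; minimum 1
  timeoutSeconds: 1          # default; minimum 1
  successThreshold: 1        # default; must be 1 for liveness and startup probes
  failureThreshold: 3        # default; minimum 1
```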

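Built-in probe handlers

Each of the three probe types supports the same three check mechanisms:

  • ExecAction: runs a command inside the container; the probe succeeds if the command exits with status 0.
  • TCPSocketAction: the probe succeeds if a TCP connection to the specified port can be opened.
  • HTTPGetAction: sends an HTTP GET request to the specified path; a 2xx or 3xx response code means success.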
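As a compact reference, the fragment below places each handler under a different probe of a single container. The pairing is hypothetical; any handler can be used with any probe type, but each probe must specify exactly one handler. The file `/tmp/healthy` and the `X-Probe` header are hypothetical as well.

```yaml
# Sketch only: fields of one container; the handler/probe pairing is arbitrary.
startupProbe:
  tcpSocket:
    port: 80                          # success: a TCP connection to port 80 is established
livenessProbe:
  exec:
    command: ['cat', '/tmp/healthy']  # success: exit status 0 (hypothetical file)
readinessProbe:
  httpGet:
    path: /readyz
    port: 80
    scheme: HTTP                      # the default; HTTPS is also accepted
    httpHeaders:                      # optional extra request headers
    - name: X-Probe                   # hypothetical header
      value: kubelet
    # success: response status code >= 200 and < 400
```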

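Liveness probe examples

ExecAction example

An exec probe periodically runs a command inside the container. If the command succeeds, the container is considered healthy.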

1. Write the manifest

```bash
root@k8s-master01:~/yaml/chapter04# vim liveness-exec-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-exec-demo
  namespace: default
spec:
  containers:
  - name: demo
    image: ikubernetes/demoapp:v1.0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      exec:
        # exits 0 only if /livez returns exactly "OK"; '=' is the portable test(1) comparison
        command: ['/bin/sh','-c','[ "$(curl -s 127.0.0.1/livez)" = "OK" ]']
      initialDelaySeconds: 5
      timeoutSeconds: 1
      periodSeconds: 5
```

2. Create the Pod

```bash
root@k8s-master01:~/yaml/chapter04# kubectl apply -f liveness-exec-demo.yaml
pod/liveness-exec-demo created

root@k8s-master01:~/yaml/chapter04# kubectl get pods liveness-exec-demo -o wide
NAME                 READY   STATUS    RESTARTS   AGE   IP            NODE         NOMINATED NODE   READINESS GATES
liveness-exec-demo   1/1     Running   0          68s   10.244.1.20   k8s-node01   <none>           <none>
```

3. Test the liveness probe

```bash
# Request the container's /livez endpoint; it returns OK
root@k8s-master01:~/yaml/chapter04# curl 10.244.1.20/livez
OK

# The demo app accepts POST requests: set livez to FAIL, then request it again
root@k8s-master01:~/yaml/chapter04# curl -XPOST -d 'livez=FAIL' 10.244.1.20/livez
root@k8s-master01:~/yaml/chapter04# curl 10.244.1.20/livez
FAIL

# Watch the Pod
root@k8s-master01:~/yaml/chapter04# kubectl get pods liveness-exec-demo -w
NAME                 READY   STATUS    RESTARTS   AGE
liveness-exec-demo   1/1     Running   0          11m
liveness-exec-demo   1/1     Running   1          11m    # the container has been restarted
```

TCPSocketAction example


1. Write the manifest

```bash
root@k8s-master01:~/yaml/chapter04# vim liveness-tcpsocket-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-tcpsocket-demo
  namespace: default
spec:
  containers:
  - name: demo
    image: ikubernetes/demoapp:v1.0
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
    securityContext:
      capabilities:
        add:
        - NET_ADMIN   # grant network admin capability so iptables can be used in the test below
    livenessProbe:
      tcpSocket:
        port: http
      periodSeconds: 5
      initialDelaySeconds: 5
```

2. Apply the manifest

```bash
root@k8s-master01:~/yaml/chapter04# kubectl apply -f liveness-tcpsocket-demo.yaml
pod/liveness-tcpsocket-demo created

root@k8s-master01:~/yaml/chapter04# kubectl get pods liveness-tcpsocket-demo -o wide
NAME                      READY   STATUS    RESTARTS   AGE     IP            NODE         NOMINATED NODE   READINESS GATES
liveness-tcpsocket-demo   1/1     Running   0          7m29s   10.244.3.19   k8s-node03   <none>           <none>
```

3. Test the liveness probe

```bash
# Inside the container, DROP all incoming TCP traffic to port 80
root@k8s-master01:~# kubectl exec liveness-tcpsocket-demo -- iptables -A INPUT -p TCP --dport 80 -j DROP

# Describe the Pod and check the Events section
root@k8s-master01:~# kubectl describe pod liveness-tcpsocket-demo
Name:         liveness-tcpsocket-demo
Namespace:    default
Priority:     0
Node:         k8s-node03/172.16.11.83
Start Time:   Fri, 02 Jul 2021 01:15:08 +0000
Labels:       <none>
Annotations:  <none>
Status:       Running
IP:           10.244.3.19
IPs:
  IP:  10.244.3.19
Containers:
  demo:
    Container ID:   docker://9d3b43e56ff1ddfb12be7315402193d4ce086f943a8aae2a70805982e5730712
    Image:          ikubernetes/demoapp:v1.0
    Image ID:       docker-pullable://ikubernetes/demoapp@sha256:6698b205eb18fb0171398927f3a35fe27676c6bf5757ef57a35a4b055badf2c3
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Fri, 02 Jul 2021 01:15:11 +0000
    Ready:          True
    Restart Count:  0
    Liveness:       tcp-socket :http delay=5s timeout=1s period=5s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-rq2n7 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  kube-api-access-rq2n7:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  21m                default-scheduler  Successfully assigned default/liveness-tcpsocket-demo to k8s-node03
  Normal   Pulled     21m                kubelet            Container image "ikubernetes/demoapp:v1.0" already present on machine
  Normal   Created    21m                kubelet            Created container demo
  Normal   Started    21m                kubelet            Started container demo
  Warning  Unhealthy  14s (x3 over 24s)  kubelet            Liveness probe failed: dial tcp 10.244.3.19:80: i/o timeout
  Normal   Killing    14s                kubelet            Container demo failed liveness probe, will be restarted
# The probe has detected that the container is unhealthy, and it is being restarted

# Watch the Pod status
root@k8s-master01:~/yaml/chapter04# kubectl get pods liveness-tcpsocket-demo -o wide -w
NAME                      READY   STATUS    RESTARTS   AGE   IP            NODE         NOMINATED NODE   READINESS GATES
liveness-tcpsocket-demo   1/1     Running   0          15m   10.244.3.19   k8s-node03   <none>           <none>
liveness-tcpsocket-demo   1/1     Running   1          21m   10.244.3.19   k8s-node03   <none>           <none>
# The container has been restarted once
```
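The event timestamps fit a simple model of how `failureThreshold` and `periodSeconds` interact. The model is an approximation: it assumes probes fire exactly every `periodSeconds` and ignores `timeoutSeconds` and jitter. Under it, kubelet kills the container at the third consecutive failure, two probe intervals after the first:

$$
t_{\text{kill}} - t_{\text{first failure}} \approx (\text{failureThreshold} - 1) \times \text{periodSeconds} = (3 - 1) \times 5\,\text{s} = 10\,\text{s}
$$

This agrees with the events above. The first of the three failures was 24 s before the `describe` and the kill 14 s before it (`14s (x3 over 24s)`), 10 s apart. For the httpGet example below, which keeps the default `periodSeconds` of 10, the same model gives 20 s, matching `15s (x3 over 35s)` there.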

HTTPGetAction example


1. Write the manifest

```bash
root@k8s-master01:~/yaml/chapter04# vim liveness-httpget-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-httpget-demo
  namespace: default
spec:
  containers:
  - name: demoapp
    image: ikubernetes/demoapp:v1.0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      httpGet:
        port: 80
        path: "/livez"
        scheme: HTTP
      initialDelaySeconds: 5
```

2. Apply the manifest

```bash
root@k8s-master01:~/yaml/chapter04# kubectl apply -f liveness-httpget-demo.yaml
pod/liveness-httpget-demo created

root@k8s-master01:~/yaml/chapter04# kubectl get pods liveness-httpget-demo -o wide
NAME                    READY   STATUS    RESTARTS   AGE   IP            NODE         NOMINATED NODE   READINESS GATES
liveness-httpget-demo   1/1     Running   0          10s   10.244.2.21   k8s-node02   <none>           <none>
```

3. Test the httpGet probe

```bash
# The demo app's /livez returns a 5xx status code once its livez value is no longer OK
root@k8s-master01:~/yaml/chapter04# curl 10.244.2.21/livez
OK

# Set livez to FAIL; /livez now returns 506
root@k8s-master01:~/yaml/chapter04# curl -XPOST -d 'livez=FAIL' 10.244.2.21/livez
root@k8s-master01:~/yaml/chapter04# curl -I 10.244.2.21/livez
HTTP/1.0 506 VARIANT ALSO NEGOTIATES
Content-Type: text/html; charset=utf-8
Content-Length: 4
Server: Werkzeug/1.0.0 Python/3.8.2
Date: Fri, 02 Jul 2021 03:20:15 GMT

# Watch the Pod: the container is restarted once
root@k8s-master01:~/yaml/chapter04# kubectl get pods liveness-httpget-demo -o wide -w
NAME                    READY   STATUS    RESTARTS   AGE   IP            NODE         NOMINATED NODE   READINESS GATES
liveness-httpget-demo   1/1     Running   0          56m   10.244.2.21   k8s-node02   <none>           <none>
liveness-httpget-demo   1/1     Running   1          57m   10.244.2.21   k8s-node02   <none>           <none>

# Describe the Pod
root@k8s-master01:~/yaml/chapter04# kubectl describe pod liveness-httpget-demo
Name:         liveness-httpget-demo
Namespace:    default
Priority:     0
Node:         k8s-node02/172.16.11.82
Start Time:   Fri, 02 Jul 2021 02:23:34 +0000
Labels:       <none>
Annotations:  <none>
Status:       Running
IP:           10.244.2.21
IPs:
  IP:  10.244.2.21
Containers:
  demoapp:
    Container ID:   docker://106715f35270b244563fddbca0bb6cd9dc0745623a24ddad4bb22e13c6754fb3
    Image:          ikubernetes/demoapp:v1.0
    Image ID:       docker-pullable://ikubernetes/demoapp@sha256:6698b205eb18fb0171398927f3a35fe27676c6bf5757ef57a35a4b055badf2c3
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Fri, 02 Jul 2021 02:23:36 +0000
    Ready:          True
    Restart Count:  0
    Liveness:       http-get http://:80/livez delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-tv76f (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  kube-api-access-tv76f:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  57m                default-scheduler  Successfully assigned default/liveness-httpget-demo to k8s-node02
  Normal   Pulled     57m                kubelet            Container image "ikubernetes/demoapp:v1.0" already present on machine
  Normal   Created    57m                kubelet            Created container demoapp
  Normal   Started    57m                kubelet            Started container demoapp
  Warning  Unhealthy  57m                kubelet            Liveness probe failed: Get "http://10.244.2.21:80/livez": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  15s (x3 over 35s)  kubelet            Liveness probe failed: HTTP probe failed with statuscode: 506
  Normal   Killing    15s                kubelet            Container demoapp failed liveness probe, will be restarted
# After 3 consecutive failed probes (the default failureThreshold), the container is restarted
```
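One detail in these events deserves a note. The first `Unhealthy` entry (57m old) is a `Client.Timeout` failure from right after startup, long before `livez` was set to FAIL. It did not cause a restart because a single failure stays below the default `failureThreshold` of 3. If such early timeouts are frequent, one possible adjustment is to give the probe more headroom, as sketched below. The values are illustrative assumptions, not tuned recommendations. A startupProbe is another way to cover a slow start.

```yaml
# Sketch only: an alternative livenessProbe for liveness-httpget-demo; values are illustrative.
livenessProbe:
  httpGet:
    port: 80
    path: "/livez"
    scheme: HTTP
  initialDelaySeconds: 10   # illustrative: wait longer before the first probe
  timeoutSeconds: 2         # illustrative: the default is 1 s ("timeout=1s" above)
  periodSeconds: 10         # the default, made explicit
  failureThreshold: 3       # the default, made explicit
```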

Readiness probe examples

HTTPGetAction example

1. Write the manifest

```bash
root@k8s-master01:~/yaml/chapter04# vim readiness-httpget-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: readiness-httpget-demo
  namespace: default
spec:
  containers:
  - name: demoapp
    image: ikubernetes/demoapp:v1.0
    imagePullPolicy: IfNotPresent
    readinessProbe:
      httpGet:
        path: '/readyz'
        port: 80
        scheme: HTTP
      initialDelaySeconds: 15
      timeoutSeconds: 2
      periodSeconds: 5
      failureThreshold: 3
  restartPolicy: Always
```

2. Apply the manifest

```bash
root@k8s-master01:~/yaml/chapter04# kubectl apply -f readiness-httpget-demo.yaml
pod/readiness-httpget-demo created

# Probing starts after the 15 s initialDelaySeconds; once the readiness probe succeeds, READY becomes 1/1
root@k8s-master01:~/yaml/chapter04# kubectl get pods readiness-httpget-demo -o wide -w
NAME                     READY   STATUS    RESTARTS   AGE   IP            NODE         NOMINATED NODE   READINESS GATES
readiness-httpget-demo   0/1     Running   0          10s   10.244.1.23   k8s-node01   <none>           <none>
readiness-httpget-demo   1/1     Running   0          35s   10.244.1.23   k8s-node01   <none>           <none>
```

3. Simulate a readiness probe failure

```bash
root@k8s-master01:~/yaml/chapter04# curl 10.244.1.23/readyz
OK

# Set readyz to FAIL
root@k8s-master01:~/yaml/chapter04# curl -XPOST -d 'readyz=FAIL' 10.244.1.23/readyz
root@k8s-master01:~/yaml/chapter04# curl 10.244.1.23/readyz
FAIL

# READY drops to 0/1: the Pod is removed from the endpoints of any Service that selects it,
# but the container is not restarted (RESTARTS stays 0)
root@k8s-master01:~/yaml/chapter04# kubectl get pods readiness-httpget-demo -o wide
NAME                     READY   STATUS    RESTARTS   AGE     IP            NODE         NOMINATED NODE   READINESS GATES
readiness-httpget-demo   0/1     Running   0          4m26s   10.244.1.23   k8s-node01   <none>           <none>
```
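The readiness example Pod has no labels, so no Service selects it and the endpoint removal cannot be observed directly. One way to observe it is sketched below under stated assumptions. First give the Pod a hypothetical label, e.g. `kubectl label pod readiness-httpget-demo app=readiness-demo`. Then create a Service that selects that label. The label and the Service name are both hypothetical.

```yaml
# Sketch only: assumes the Pod carries the hypothetical label app: readiness-demo.
apiVersion: v1
kind: Service
metadata:
  name: readiness-demo-svc      # hypothetical name
  namespace: default
spec:
  selector:
    app: readiness-demo         # must match the Pod's label
  ports:
  - name: http
    port: 80
    targetPort: 80              # the demoapp container listens on port 80
```

While the Pod is ready, `kubectl get endpoints readiness-demo-svc` should list its IP (10.244.1.23 in this run). After `readyz` is set to FAIL, the address moves to `notReadyAddresses` in `kubectl get endpoints readiness-demo-svc -o yaml`, and the Service stops sending traffic to it.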