The Kubernetes scheduler's main job is to take new Pods created through the API Server, pick a host (node) for each one, and record the binding in etcd.

Scheduling Flow

The Kubernetes scheduler queries the API Server for Pods that have not yet been assigned a host and tries to schedule them. The end-to-end flow:

  • A client submits a creation request, either through the API Server's RESTful API or with the kubectl command-line tool; both JSON and YAML payloads are supported
  • The API Server handles the request and stores the Pod object in etcd
  • The scheduler watches the API Server for unbound Pods and tries to assign each one a host
  • Host filtering: the scheduler applies a set of predicates to rule out unsuitable hosts. For example, if the Pod requests a certain amount of resources, hosts with less available capacity than requested are filtered out
  • Host scoring: the hosts that survive filtering are scored. At this stage the scheduler applies cluster-wide optimization policies, such as spreading replicas of the same Replication Controller across different hosts, or preferring the least-loaded host
  • Host selection: the highest-scoring host wins; the scheduler performs a binding operation and the result is stored in etcd
  • The kubelet on the selected host sees the scheduling result and creates the Pod (a quick way to observe this flow is shown below)
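
To watch this from outside, you can look for Pods that are still Pending (not yet bound to a host) and check the scheduler's binding event. A minimal sketch, where <pod-name> is a placeholder:

kubectl get pods --all-namespaces -o wide | grep Pending   # unbound Pods have no NODE assigned yet
kubectl describe pod <pod-name>                            # the Events section records the default-scheduler's "Scheduled" decision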

Kubernetes implements the following filtering predicates (in kubernetes/plugin/pkg/scheduler/algorithm/predicates); a small Pod spec that exercises several of them follows the list:
Reference: https://blog.csdn.net/horsefoot/article/details/51263364

  • NoDiskConflict: checks for volume conflicts on the host. If the host has already mounted a given volume, other Pods that use the same volume cannot be scheduled there.
  • NoVolumeZoneConflict: checks, subject to the given zone constraints, whether placing the Pod on this host would cause a volume conflict.
  • PodFitsResources: checks that the host's resources can satisfy the Pod's requests. Scheduling is based on the resources already allocated (requested), not on the resources actually in use.
  • PodFitsHostPorts: checks whether any HostPort required by the Pod's containers is already taken on the host. If a required HostPort is unavailable, the Pod cannot be scheduled there.
  • HostName: checks whether the host's name matches the HostName the Pod specified.
  • MatchNodeSelector: checks whether the host's labels satisfy the Pod's nodeSelector.
  • MaxEBSVolumeCount: ensures the number of attached EBS volumes does not exceed the configured maximum (default 39). It counts volumes used directly as well as those used indirectly through PVCs, summing the distinct volumes; if placing the new Pod would push the count past the maximum, the Pod cannot be scheduled there.
  • MaxGCEPDVolumeCount: ensures the number of attached GCE persistent disks does not exceed the configured maximum (default 16). Same rule as above.
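
For illustration, a hypothetical Pod spec like the one below gives several of these predicates something to evaluate (the name, image, label, and values are made up for this sketch):

apiVersion: v1
kind: Pod
metadata:
  name: demo-pod              # hypothetical name, for illustration only
spec:
  nodeSelector:
    disktype: ssd             # MatchNodeSelector: the host must carry this label
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: "500m"           # PodFitsResources: the host must have 500m CPU unallocated
        memory: "256Mi"       # ...and 256Mi of memory unallocated
    ports:
    - containerPort: 80
      hostPort: 80            # PodFitsHostPorts: port 80 must be free on the host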

Environment

Versions:
   k8s v1.9.2
   CentOS v7.4
   Tomcat v8.0/9.0
   Docker v18.05.0-ce

Node name            IP              Notes
KUBE-MASTER+ETCD1    192.168.1.201   runs the nginx reverse proxy
KUBE-ETCD2           192.168.1.202   -
KUBE-ETCD3           192.168.1.203   -
KUBE-NODE1           192.168.1.204   -
KUBE-NODE2           192.168.1.205   -
KUBE-NODE3           192.168.1.206   -

RollingUpdate

(Figure: Kubernetes rolling-update upgrade flow)
Review the Tomcat manifest:

[root@localhost ~]# cat tomcat.yml
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: tomcat-deployment
  namespace: web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: tomcat
  template:
    metadata:
      labels:
        app: tomcat
    spec:
      nodeSelector:
        nodelabel: tomcat
      containers:
      - name: tomcat
        image: tomcat:8.0
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
        volumeMounts:
        - name: time
          mountPath: "/etc/localtime"
          readOnly: true
      volumes:
      - name: time
        hostPath:
          path: "/etc/localtime"
---
apiVersion: v1
kind: Service
metadata:
  name: tomcat-service
  namespace: web
  labels:
    app: tomcat-service
spec:
  type: NodePort
  selector:
    app: tomcat
  ports:
  - port: 8080
    nodePort: 38080
[root@localhost ~]#

Check the Kubernetes node status:
Because the application runs with replicas: 2 and pins a fixed port, the number of labeled Nodes must be at least one greater than the replica count, or the rolling update will fail (see Question 1 below).

[root@localhost ~]# kubectl get nodes
NAME            STATUS    ROLES     AGE       VERSION
192.168.1.204   Ready     <none>    1d        v1.9.2
192.168.1.205   Ready     <none>    1d        v1.9.2
192.168.1.206   Ready     <none>    5h        v1.9.2
[root@localhost ~]#

Apply the Tomcat manifest:

[root@localhost ~]# kubectl label node 192.168.1.204 192.168.1.205 192.168.1.206 "nodelabel=tomcat"
[root@localhost ~]# kubectl create namespace web
[root@localhost ~]# kubectl apply -f tomcat.yml
[root@localhost ~]# kubectl get pod -n web -o wide
NAME                                 READY     STATUS    RESTARTS   AGE       IP            NODE
tomcat-deployment-5768d76c66-jwlhp   1/1       Running   0          46s       172.30.5.2    192.168.1.205
tomcat-deployment-5768d76c66-sxcj6   1/1       Running   0          46s       172.30.79.2   192.168.1.204
[root@localhost ~]#
[root@localhost ~]# kubectl get svc -n web -o wide
NAME             TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE       SELECTOR
tomcat-service   NodePort   172.16.30.72   <none>        8080:38080/TCP   21s       app=tomcat
[root@localhost ~]#
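
As a quick sanity check (not part of the original session), the nodes matching the Deployment's nodeSelector can be listed by the label that was just applied:

kubectl get nodes -l nodelabel=tomcat --show-labels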

Check the deployment's rollout status:

[root@localhost ~]# kubectl rollout status deployment tomcat-deployment --namespace=web
deployment "tomcat-deployment" successfully rolled out
[root@localhost ~]#
#Before the upgrade, Tomcat is at version 8.0
[root@localhost ~]# kubectl describe pod tomcat-deployment-5768d76c66-jwlhp -n web | grep -i 'image:'
Image: tomcat:8.0
[root@localhost ~]#

Upgrade the application:

[root@localhost ~]# kubectl set image deployment tomcat-deployment tomcat=tomcat:9.0 --namespace=web --record
deployment "tomcat-deployment" image updated
[root@localhost ~]#
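
If the new image misbehaves partway through, the rollout can also be paused and resumed with the standard kubectl subcommands (a sketch, not run in this session):

kubectl rollout pause deployment tomcat-deployment --namespace=web
kubectl rollout resume deployment tomcat-deployment --namespace=web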

Check the upgrade result:

[root@localhost ~]# kubectl get pod -n web -o wide
NAME                                 READY     STATUS    RESTARTS   AGE       IP            NODE
tomcat-deployment-54b6fc9b99-6jr2x   1/1       Running   0          1m        172.30.79.3   192.168.1.204
tomcat-deployment-54b6fc9b99-c6rs9   1/1       Running   0          1m        172.30.27.2   192.168.1.206
[root@localhost ~]#
[root@localhost ~]# kubectl rollout status deployment tomcat-deployment --namespace=web
Waiting for rollout to finish: 1 out of 2 new replicas have been updated...
Waiting for rollout to finish: 1 out of 2 new replicas have been updated...
Waiting for rollout to finish: 1 out of 2 new replicas have been updated...
Waiting for rollout to finish: 1 old replicas are pending termination...
Waiting for rollout to finish: 1 old replicas are pending termination...
deployment "tomcat-deployment" successfully rolled out
[root@localhost ~]#

[root@localhost ~]# kubectl describe deployment tomcat-deployment --namespace=web
Name:                   tomcat-deployment
Namespace:              web
CreationTimestamp:      Mon, 28 May 2018 17:28:27 +0800
Labels:                 app=tomcat
Annotations:            deployment.kubernetes.io/revision=2
                        kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"apps/v1beta2","kind":"Deployment","metadata":{"annotations":{},"name":"tomcat-deployment","namespace":"web"},"spec":{"replicas":2,"selec...
                        kubernetes.io/change-cause=kubectl set image deployment tomcat-deployment tomcat=tomcat:9.0 --namespace=web --record=true
Selector:               app=tomcat
Replicas:               2 desired | 2 updated | 2 total | 2 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  app=tomcat
  Containers:
   tomcat:
    Image:        tomcat:9.0
    Port:         8080/TCP
    Environment:  <none>
    Mounts:
      /etc/localtime from time (ro)
  Volumes:
   time:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/localtime
    HostPathType:
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   tomcat-deployment-54b6fc9b99 (2/2 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  4m    deployment-controller  Scaled up replica set tomcat-deployment-5768d76c66 to 2
  Normal  ScalingReplicaSet  1m    deployment-controller  Scaled up replica set tomcat-deployment-54b6fc9b99 to 1
  Normal  ScalingReplicaSet  1m    deployment-controller  Scaled down replica set tomcat-deployment-5768d76c66 to 1
  Normal  ScalingReplicaSet  1m    deployment-controller  Scaled up replica set tomcat-deployment-54b6fc9b99 to 2
  Normal  ScalingReplicaSet  52s   deployment-controller  Scaled down replica set tomcat-deployment-5768d76c66 to 0
[root@localhost ~]#
#Tomcat has been successfully upgraded to version 9.0
[root@localhost ~]# kubectl describe pod tomcat-deployment-54b6fc9b99-6jr2x -n web | grep -i 'image:'
Image: tomcat:9.0
[root@localhost ~]#

Check the Deployment's revision history:
Because --record was passed during the upgrade (set image), the revision history records the command that produced each revision.

[root@localhost ~]# kubectl rollout history deployment tomcat-deployment --namespace=web
deployments "tomcat-deployment"
REVISION  CHANGE-CAUSE
1         <none>
2         kubectl set image deployment tomcat-deployment tomcat=tomcat:9.0 --namespace=web --record=true
[root@localhost ~]#
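
The full pod template recorded for a given revision can also be inspected, for example:

kubectl rollout history deployment tomcat-deployment --namespace=web --revision=2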

Rolling Back

[root@localhost ~]# kubectl rollout undo deployment tomcat-deployment --namespace=web --to-revision=1
deployment "tomcat-deployment"
[root@localhost ~]# kubectl rollout history deployment tomcat-deployment --namespace=web
deployments "tomcat-deployment"
REVISION  CHANGE-CAUSE
2         kubectl set image deployment tomcat-deployment tomcat=tomcat:9.0 --namespace=web --record=true
3         <none>
[root@localhost ~]#
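
Note that the undo re-records revision 1 as the new revision 3. Without --to-revision, kubectl rollout undo simply steps back to the immediately previous revision:

kubectl rollout undo deployment tomcat-deployment --namespace=web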

Check the rollback result:
Tomcat has been rolled back to version 8.0:

[root@localhost ~]# kubectl get pod -n web -o wide
NAME                                 READY     STATUS    RESTARTS   AGE       IP            NODE
tomcat-deployment-5768d76c66-bjwlc   1/1       Running   0          3m        172.30.5.2    192.168.1.205
tomcat-deployment-5768d76c66-bz5rf   1/1       Running   0          2m        172.30.79.2   192.168.1.204
[root@localhost ~]#
[root@localhost ~]# kubectl describe pod tomcat-deployment-5768d76c66-bjwlc -n web | grep -i 'image:'
Image: tomcat:8.0
[root@localhost ~]#

Check the rollback history:

[root@localhost ~]# kubectl describe deployment tomcat-deployment --namespace=web
Name:                   tomcat-deployment
Namespace:              web
CreationTimestamp:      Mon, 28 May 2018 17:28:27 +0800
Labels:                 app=tomcat
Annotations:            deployment.kubernetes.io/revision=3
                        kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"apps/v1beta2","kind":"Deployment","metadata":{"annotations":{},"name":"tomcat-deployment","namespace":"web"},"spec":{"replicas":2,"selec...
Selector:               app=tomcat
Replicas:               2 desired | 2 updated | 2 total | 2 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  app=tomcat
  Containers:
   tomcat:
    Image:        tomcat:8.0
    Port:         8080/TCP
    Environment:  <none>
    Mounts:
      /etc/localtime from time (ro)
  Volumes:
   time:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/localtime
    HostPathType:
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   tomcat-deployment-5768d76c66 (2/2 replicas created)
Events:
  Type    Reason              Age               From                   Message
  ----    ------              ----              ----                   -------
  Normal  ScalingReplicaSet   55m               deployment-controller  Scaled up replica set tomcat-deployment-54b6fc9b99 to 1
  Normal  ScalingReplicaSet   55m               deployment-controller  Scaled down replica set tomcat-deployment-5768d76c66 to 1
  Normal  ScalingReplicaSet   55m               deployment-controller  Scaled up replica set tomcat-deployment-54b6fc9b99 to 2
  Normal  ScalingReplicaSet   54m               deployment-controller  Scaled down replica set tomcat-deployment-5768d76c66 to 0
  Normal  ScalingReplicaSet   4m                deployment-controller  Scaled up replica set tomcat-deployment-5768d76c66 to 1
  Normal  DeploymentRollback  4m                deployment-controller  Rolled back deployment "tomcat-deployment" to revision 1
  Normal  ScalingReplicaSet   4m (x2 over 58m)  deployment-controller  Scaled up replica set tomcat-deployment-5768d76c66 to 2
  Normal  ScalingReplicaSet   4m                deployment-controller  Scaled down replica set tomcat-deployment-54b6fc9b99 to 1
  Normal  ScalingReplicaSet   3m                deployment-controller  Scaled down replica set tomcat-deployment-54b6fc9b99 to 0
[root@localhost ~]#

Note: while upgrading or rolling back the containerized application, testing showed roughly 2-3 seconds during which the application was unreachable.
Test setup: an Nginx reverse proxy in front of all Node IPs + the application port. It is advisable to keep the application port globally unique across the labeled Nodes.

Parameter notes (a combined sketch follows the list):
References: https://www.jianshu.com/p/6bc8e0ae65d1
https://www.ipcpu.com/2017/09/kubernetes-rolling-update/

  • maxSurge / maxUnavailable
    maxSurge: 1 means the rolling update may start 1 extra pod before taking any old one down; maxUnavailable: 1 is the maximum number of pods allowed to be unavailable during the update
  • terminationGracePeriodSeconds
    Kubernetes sends the application a SIGTERM signal so it can shut down correctly and gracefully; the grace period defaults to 30 seconds. For an even cleaner shutdown, declare a pre-stop lifecycle hook, which runs before the SIGTERM is sent
  • livenessProbe / readinessProbe
    livenessProbe is how Kubernetes decides a pod is alive; if the probe fails, the pod is killed and a new one is started to maintain the count set by replicas. readinessProbe is how Kubernetes decides a pod has started successfully; the check should suit each application, e.g. an exec command or an httpGet
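
Putting these together, a Deployment fragment might declare them as follows; this is a sketch only, and the probe paths, delays, and preStop command are assumptions rather than values from the manifest above:

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1                         # start at most 1 extra pod during the update
      maxUnavailable: 1                   # allow at most 1 pod to be unavailable
  template:
    spec:
      terminationGracePeriodSeconds: 30   # time allowed after SIGTERM (default 30s)
      containers:
      - name: tomcat
        image: tomcat:9.0
        lifecycle:
          preStop:                        # runs before the SIGTERM is sent
            exec:
              command: ["sh", "-c", "sleep 5"]
        livenessProbe:                    # failed probe => pod is killed and replaced
          httpGet:
            path: /
            port: 8080
          initialDelaySeconds: 30
        readinessProbe:                   # failed probe => pod removed from Service endpoints
          httpGet:
            path: /
            port: 8080
          initialDelaySeconds: 10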

Q & A

Question 1:
If the error below appears, the scheduler found no schedulable Node: the host port the application needs is already occupied on every Node, so the RollingUpdate cannot complete.

Warning FailedScheduling 12s (x26 over 6m) default-scheduler 0/2 nodes are available: 2 PodFitsHostPorts.

Follow-up:
After a successful upgrade the Pod may land on a different Node, so the Node IP changes; how do you keep reaching the containerized application?
Answer: You can reach the container directly through the Pod's IP and port, but Pod IPs are unreliable: when the Node hosting a Pod fails, Kubernetes reschedules the Pod onto another Node and the Pod's IP changes, so the old address stops working. For a distributed containerized application with multiple instances serving traffic at once, put a load balancer in front of the instances to forward requests. Kubernetes' kind: Service exists to solve exactly these problems.
Observed in testing: an application deployed as kind: Deployment plus a Service is reachable on any Node IP + port, while one deployed with kind: Deployment alone is reachable only through the IP of the Node its Pod runs on + port (see the example below).
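
For example, with the tomcat-service above (type: NodePort, nodePort: 38080), any node in this walkthrough's cluster should answer, whether or not a Tomcat Pod runs on it:

curl http://192.168.1.204:38080/
curl http://192.168.1.206:38080/   # kube-proxy forwards even from a node without a Tomcat Pod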


This post is from the "Jack Wang Blog": http://www.yfshare.vip/2018/05/28/k8s滚动升级-RollingUpdate/