The Kubernetes scheduler's main job is to accept new Pods created through the API Server, assign each one to a host, and write that information into etcd.
Scheduling workflow
The Kubernetes scheduler queries the API Server for Pods that have not yet been assigned a host and tries to schedule them:
- A client submits a creation request, either through the API Server's RESTful API or with the kubectl command-line tool. Both JSON and YAML payloads are supported.
- The API Server handles the request and stores the Pod data in etcd.
- The scheduler watches for unbound Pods through the API Server and tries to assign each one a host.
- Host filtering: the scheduler applies a set of rules to filter out hosts that do not meet the Pod's requirements. For example, if the Pod declares the resources it needs, hosts with less available capacity than the Pod requests are filtered out (see the sketch after this list).
- Host scoring: the hosts that pass filtering are scored. At this stage the scheduler applies global optimization strategies, such as spreading replicas of the same Replication Controller across different hosts, or preferring the least-loaded host.
- Host selection: the highest-scoring host is chosen, a binding operation is performed, and the result is stored in etcd.
- The kubelet on the selected host then creates the Pod according to the scheduling result.
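For instance, the filtering step works off the resource requests declared on the Pod, not its live usage. A minimal sketch (hypothetical name and values) of a Pod that the scheduler would only place on hosts with at least this much unallocated CPU and memory:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resource-demo        # hypothetical name
spec:
  containers:
  - name: app
    image: tomcat:8.0
    resources:
      requests:              # filtering compares these requests against each
        cpu: "500m"          # host's allocatable-minus-allocated resources
        memory: "512Mi"
```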
The filtering rules (predicates) implemented in Kubernetes mainly include the following (implemented under kubernetes/plugin/pkg/scheduler/algorithm/predicates):
Reference: https://blog.csdn.net/horsefoot/article/details/51263364
- NoDiskConflict: checks whether deploying the Pod on this host would cause a volume conflict. If the host has already mounted a volume, other Pods using the same volume cannot be scheduled onto it.
- NoVolumeZoneConflict: checks, given the zone restrictions, whether deploying the Pod on this host would cause a volume conflict.
- PodFitsResources: checks whether the host's resources satisfy the Pod's requests. Scheduling is based on the resources already allocated, not on the resources actually in use.
- PodFitsHostPorts: checks whether any HostPort required by the Pod's containers is already occupied on the host. If a required HostPort is unavailable, the Pod cannot be scheduled onto this host.
- HostName: checks whether the host's name matches the HostName the Pod specifies.
- MatchNodeSelector: checks whether the host's labels satisfy the Pod's nodeSelector (see the sketch after this list).
- MaxEBSVolumeCount: ensures the number of mounted EBS volumes does not exceed the configured maximum (default 39). It checks both volumes used directly and PVCs that indirectly use this volume type, counting the total number of distinct volumes; if deploying the new Pod would push the count over the maximum, the Pod cannot be scheduled onto this host.
- MaxGCEPDVolumeCount: ensures the number of mounted GCE persistent disks does not exceed the configured maximum (default 16). Same rule as above.
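To illustrate MatchNodeSelector and PodFitsHostPorts together, a hedged sketch with a hypothetical label: the Pod below can only land on Nodes carrying `disktype=ssd` whose port 8080 is not already claimed by another Pod's hostPort.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: selector-demo          # hypothetical name
spec:
  nodeSelector:
    disktype: ssd              # MatchNodeSelector: only hosts with this label pass
  containers:
  - name: app
    image: tomcat:8.0
    ports:
    - containerPort: 8080
      hostPort: 8080           # PodFitsHostPorts: host port 8080 must be free
```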
Environment

- k8s v1.9.2
- CentOS v7.4
- Tomcat v8.0/9.0
- Docker v18.05.0-ce
| Node name | IP | Notes |
| --- | --- | --- |
| KUBE-MASTER+ETCD1 | 192.168.1.201 | Runs the Nginx reverse proxy |
| KUBE-ETCD2 | 192.168.1.202 | - |
| KUBE-ETCD3 | 192.168.1.203 | - |
| KUBE-NODE1 | 192.168.1.204 | - |
| KUBE-NODE2 | 192.168.1.205 | - |
| KUBE-NODE3 | 192.168.1.206 | - |
RollingUpdate
View the Tomcat manifest:
```yaml
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: tomcat-deployment
  namespace: web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: tomcat
  template:
    metadata:
      labels:
        app: tomcat
    spec:
      nodeSelector:
        nodelabel: tomcat
      containers:
      - name: tomcat
        image: tomcat:8.0
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
        volumeMounts:
        - name: time
          mountPath: "/etc/localtime"
          readOnly: true
      volumes:
      - name: time
        hostPath:
          path: "/etc/localtime"
---
apiVersion: v1
kind: Service
metadata:
  name: tomcat-service
  namespace: web
  labels:
    app: tomcat-service
spec:
  type: NodePort
  selector:
    app: tomcat
  ports:
  - port: 8080
    nodePort: 38080
```
Check the Kubernetes node info. Because the application uses replicas: 2 with a fixed port, the number of labeled Nodes must be at least one greater than the number of application instances, otherwise the rolling update will fail: the new Pod needs a spare Node where the port is still free.
```bash
[root@localhost ~]# kubectl get nodes
NAME            STATUS    ROLES     AGE       VERSION
192.168.1.204   Ready     <none>    1d        v1.9.2
192.168.1.205   Ready     <none>    1d        v1.9.2
192.168.1.206   Ready     <none>    5h        v1.9.2
```
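The Deployment above schedules onto Nodes via `nodelabel: tomcat`, so the worker Nodes must already carry that label. The original labeling step isn't shown; a sketch of how the labels were presumably applied:

```bash
# Label the worker nodes so the Deployment's nodeSelector can match them
kubectl label node 192.168.1.204 nodelabel=tomcat
kubectl label node 192.168.1.205 nodelabel=tomcat
kubectl label node 192.168.1.206 nodelabel=tomcat

# Verify the labels
kubectl get nodes --show-labels
```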
Apply the Tomcat manifest, then check the Pods and the Service:
```bash
[root@localhost ~]# kubectl get pods -o wide --namespace=web
NAME                                 READY     STATUS    RESTARTS   AGE       IP            NODE
tomcat-deployment-5768d76c66-jwlhp   1/1       Running   0          46s       172.30.5.2    192.168.1.205
tomcat-deployment-5768d76c66-sxcj6   1/1       Running   0          46s       172.30.79.2   192.168.1.204
[root@localhost ~]# kubectl get svc -o wide --namespace=web
NAME             TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE       SELECTOR
tomcat-service   NodePort   172.16.30.72   <none>        8080:38080/TCP   21s       app=tomcat
```
Check the deployment rollout status:
```bash
[root@localhost ~]# kubectl rollout status deployment tomcat-deployment --namespace=web
deployment "tomcat-deployment" successfully rolled out
[root@localhost ~]# kubectl describe deployment tomcat-deployment --namespace=web | grep Image
    Image:        tomcat:8.0
```
Upgrade the application:
```bash
[root@localhost ~]# kubectl set image deployment tomcat-deployment tomcat=tomcat:9.0 --namespace=web --record=true
deployment "tomcat-deployment" image updated
```
Check the upgrade result:
```bash
[root@localhost ~]# kubectl get pods -o wide --namespace=web
NAME                                 READY     STATUS    RESTARTS   AGE       IP            NODE
tomcat-deployment-54b6fc9b99-6jr2x   1/1       Running   0          1m        172.30.79.3   192.168.1.204
tomcat-deployment-54b6fc9b99-c6rs9   1/1       Running   0          1m        172.30.27.2   192.168.1.206
[root@localhost ~]# kubectl rollout status deployment tomcat-deployment --namespace=web
Waiting for rollout to finish: 1 out of 2 new replicas have been updated...
Waiting for rollout to finish: 1 out of 2 new replicas have been updated...
Waiting for rollout to finish: 1 out of 2 new replicas have been updated...
Waiting for rollout to finish: 1 old replicas are pending termination...
Waiting for rollout to finish: 1 old replicas are pending termination...
deployment "tomcat-deployment" successfully rolled out
```
```bash
[root@localhost ~]# kubectl describe deployment tomcat-deployment --namespace=web
Name:                   tomcat-deployment
Namespace:              web
CreationTimestamp:      Mon, 28 May 2018 17:28:27 +0800
Labels:                 app=tomcat
Annotations:            deployment.kubernetes.io/revision=2
                        kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"apps/v1beta2","kind":"Deployment","metadata":{"annotations":{},"name":"tomcat-deployment","namespace":"web"},"spec":{"replicas":2,"selec...
                        kubernetes.io/change-cause=kubectl set image deployment tomcat-deployment tomcat=tomcat:9.0 --namespace=web --record=true
Selector:               app=tomcat
Replicas:               2 desired | 2 updated | 2 total | 2 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  app=tomcat
  Containers:
   tomcat:
    Image:        tomcat:9.0
    Port:         8080/TCP
    Environment:  <none>
    Mounts:
      /etc/localtime from time (ro)
  Volumes:
   time:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/localtime
    HostPathType:
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   tomcat-deployment-54b6fc9b99 (2/2 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  4m    deployment-controller  Scaled up replica set tomcat-deployment-5768d76c66 to 2
  Normal  ScalingReplicaSet  1m    deployment-controller  Scaled up replica set tomcat-deployment-54b6fc9b99 to 1
  Normal  ScalingReplicaSet  1m    deployment-controller  Scaled down replica set tomcat-deployment-5768d76c66 to 1
  Normal  ScalingReplicaSet  1m    deployment-controller  Scaled up replica set tomcat-deployment-54b6fc9b99 to 2
  Normal  ScalingReplicaSet  52s   deployment-controller  Scaled down replica set tomcat-deployment-5768d76c66 to 0
[root@localhost ~]#
```
```bash
[root@localhost ~]# kubectl describe deployment tomcat-deployment --namespace=web | grep Image
    Image:        tomcat:9.0
```
View the deployment revisions. Because --record was passed during the upgrade (set image), the command that produced each revision is recorded in the history:
```bash
[root@localhost ~]# kubectl rollout history deployment tomcat-deployment --namespace=web
deployments "tomcat-deployment"
REVISION  CHANGE-CAUSE
1         <none>
2         kubectl set image deployment tomcat-deployment tomcat=tomcat:9.0 --namespace=web --record=true
```
Version rollback
```bash
[root@localhost ~]# kubectl rollout undo deployment tomcat-deployment --namespace=web
deployment "tomcat-deployment"
[root@localhost ~]# kubectl rollout history deployment tomcat-deployment --namespace=web
deployments "tomcat-deployment"
REVISION  CHANGE-CAUSE
2         kubectl set image deployment tomcat-deployment tomcat=tomcat:9.0 --namespace=web --record=true
3         <none>
```
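Besides undoing to the immediately previous revision, kubectl can also roll back to a specific revision from the history list; a usage sketch:

```bash
# Roll back explicitly to revision 2 (the tomcat:9.0 revision above)
kubectl rollout undo deployment tomcat-deployment --to-revision=2 --namespace=web
```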
Check the rollback result: Tomcat has successfully rolled back to version 8.0.
```bash
[root@localhost ~]# kubectl get pods -o wide --namespace=web
NAME                                 READY     STATUS    RESTARTS   AGE       IP            NODE
tomcat-deployment-5768d76c66-bjwlc   1/1       Running   0          3m        172.30.5.2    192.168.1.205
tomcat-deployment-5768d76c66-bz5rf   1/1       Running   0          2m        172.30.79.2   192.168.1.204
[root@localhost ~]# kubectl describe deployment tomcat-deployment --namespace=web | grep Image
    Image:        tomcat:8.0
```
View the rollback history:
```bash
[root@localhost ~]# kubectl describe deployment tomcat-deployment --namespace=web
Name:                   tomcat-deployment
Namespace:              web
CreationTimestamp:      Mon, 28 May 2018 17:28:27 +0800
Labels:                 app=tomcat
Annotations:            deployment.kubernetes.io/revision=3
                        kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"apps/v1beta2","kind":"Deployment","metadata":{"annotations":{},"name":"tomcat-deployment","namespace":"web"},"spec":{"replicas":2,"selec...
Selector:               app=tomcat
Replicas:               2 desired | 2 updated | 2 total | 2 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  app=tomcat
  Containers:
   tomcat:
    Image:        tomcat:8.0
    Port:         8080/TCP
    Environment:  <none>
    Mounts:
      /etc/localtime from time (ro)
  Volumes:
   time:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/localtime
    HostPathType:
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   tomcat-deployment-5768d76c66 (2/2 replicas created)
Events:
  Type    Reason              Age               From                   Message
  ----    ------              ----              ----                   -------
  Normal  ScalingReplicaSet   55m               deployment-controller  Scaled up replica set tomcat-deployment-54b6fc9b99 to 1
  Normal  ScalingReplicaSet   55m               deployment-controller  Scaled down replica set tomcat-deployment-5768d76c66 to 1
  Normal  ScalingReplicaSet   55m               deployment-controller  Scaled up replica set tomcat-deployment-54b6fc9b99 to 2
  Normal  ScalingReplicaSet   54m               deployment-controller  Scaled down replica set tomcat-deployment-5768d76c66 to 0
  Normal  ScalingReplicaSet   4m                deployment-controller  Scaled up replica set tomcat-deployment-5768d76c66 to 1
  Normal  DeploymentRollback  4m                deployment-controller  Rolled back deployment "tomcat-deployment" to revision 1
  Normal  ScalingReplicaSet   4m (x2 over 58m)  deployment-controller  Scaled up replica set tomcat-deployment-5768d76c66 to 2
  Normal  ScalingReplicaSet   4m                deployment-controller  Scaled down replica set tomcat-deployment-54b6fc9b99 to 1
  Normal  ScalingReplicaSet   3m                deployment-controller  Scaled down replica set tomcat-deployment-54b6fc9b99 to 0
[root@localhost ~]#
```
Note: during both the version upgrade and the rollback, testing showed a window of roughly 2-3 seconds in which access to the application failed. Test setup: Nginx reverse-proxies all Node IPs plus the application port. It is recommended that the application port be globally unique across the labeled Nodes.
Parameter notes:
References: https://www.jianshu.com/p/6bc8e0ae65d1 and https://www.ipcpu.com/2017/09/kubernetes-rolling-update/
maxSurge and maxUnavailable: maxSurge: 1 means the rolling update starts one extra pod first; maxUnavailable: 1 means at most one pod may be unavailable during the update.
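These two fields live under the Deployment's spec.strategy. The manifest above doesn't set them, so the defaults apply (25% / 25%, as shown in the describe output). A sketch of setting them explicitly:

```yaml
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # start at most 1 extra pod before old ones are killed
      maxUnavailable: 1    # tolerate at most 1 unavailable pod during the update
```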
terminationGracePeriodSeconds: Kubernetes sends the application a SIGTERM signal, which can be used to shut the application down cleanly and gracefully; the default grace period is 30 seconds. For an even more graceful shutdown, declare a pre-stop lifecycle hook, which runs before the SIGTERM is sent.
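A sketch of both settings on the Tomcat pod spec; the 60-second value and the shutdown command are assumptions, not taken from the original manifest:

```yaml
spec:
  terminationGracePeriodSeconds: 60   # assumption: allow up to 60s to exit after SIGTERM
  containers:
  - name: tomcat
    image: tomcat:8.0
    lifecycle:
      preStop:
        exec:
          # assumption: stop Tomcat cleanly before Kubernetes sends SIGTERM
          command: ["/usr/local/tomcat/bin/shutdown.sh"]
```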
livenessProbe and readinessProbe: the livenessProbe tells Kubernetes whether the pod is still alive; if it fails, the pod is killed and a new one is started to maintain the count specified by replicas. The readinessProbe tells Kubernetes whether the pod has started successfully; how to judge that depends on each application's characteristics, and the probe can either run a command or perform an httpGet.
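A hedged sketch of both probes on the Tomcat container; the paths, ports, and timings are assumptions:

```yaml
containers:
- name: tomcat
  image: tomcat:8.0
  livenessProbe:                 # if this fails, the pod is killed and recreated
    httpGet:
      path: /                    # assumption: Tomcat's default page
      port: 8080
    initialDelaySeconds: 30
    periodSeconds: 10
  readinessProbe:                # the pod only receives Service traffic once this passes
    httpGet:
      path: /
      port: 8080
    initialDelaySeconds: 10
    periodSeconds: 5
```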
Q & A
Question 1:
If you see the error below, the scheduler found no schedulable Node because the host port the application needs is already occupied on every Node, so the RollingUpdate cannot complete.
```bash
Warning  FailedScheduling  12s (x26 over 6m)  default-scheduler  0/2 nodes are available: 2 PodFitsHostPorts.
```
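One possible workaround (a sketch, not from the original post): if the fixed host port is already taken on every labeled Node, configure the rolling update not to surge, so an old pod is torn down before its replacement needs the port:

```yaml
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 0          # never create an extra pod that would need a free host port
    maxUnavailable: 1    # take one old pod down first instead
```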
Further thought:
After a successful upgrade the Pods land on different Node IPs, so how do you access the containerized application?
Answer: you can reach the application directly through the Pod's IP and port, but a Pod's IP address is unreliable: when the Pod's Node fails, Kubernetes reschedules the Pod onto another Node, at which point the Pod's IP changes and the application becomes unreachable at the old address. For a distributed containerized application with multiple instances serving traffic simultaneously, put a load balancer in front of them to forward requests. Kubernetes' kind: Service exists precisely to solve this core problem.
Test findings: an application deployed with kind: Deployment plus a Service is reachable through any Node IP plus the port, whereas an application deployed with kind: Deployment alone is reachable only through the IP of the Node it runs on plus the port.
This post comes from "Jack Wang Blog": http://www.yfshare.vip/2018/05/28/k8s滚动升级-RollingUpdate/