Kubernetes is an open-source system for managing containerized applications across multiple hosts in a cloud platform. Its goal is to make deploying containerized applications simple and powerful, and it provides mechanisms for application deployment, scheduling, updating, and maintenance. A core feature of Kubernetes is that it autonomously manages containers so that they keep running in the state the user expects. Environment: CentOS 7.4.1708, Docker 18.02.0-ce-rc1, Kubernetes v1.9.2, etcd 3.2.15
Kubernetes downloads: https://github.com/kubernetes/kubernetes/releases https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.9.md#v192
Basic configuration
Synchronize time (run on all nodes):
* */3 * * * ntpdate s1a.time.edu.cn &> /dev/null
Set hostnames (a sketch of the typical commands follows the host list below):
[root@master1 ~]
[root@master2 ~]
[root@master3 ~]
[root@node1 ~]
[root@node2 ~]
192.168.1.195 master1.example.com master1
192.168.1.196 master2.example.com master2
192.168.1.197 master3.example.com master3
192.168.1.198 node1.example.com node1
192.168.1.199 node2.example.com node2
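The hostname commands behind the prompts above were lost in extraction; a minimal sketch of the usual steps, assuming the names and addresses listed above (run the matching set-hostname command on each machine):
hostnamectl set-hostname master1.example.com   # repeat on each node with its own name
cat >> /etc/hosts <<EOF
192.168.1.195 master1.example.com master1
192.168.1.196 master2.example.com master2
192.168.1.197 master3.example.com master3
192.168.1.198 node1.example.com node1
192.168.1.199 node2.example.com node2
EOF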
Disable the firewall and SELinux (run on all nodes):
iptables -F
systemctl stop firewalld
systemctl disable firewalld
setenforce 0
Environment overview

Role | IP | Components
Kube-Master + ETCD1 | 192.168.1.195 | etcd, kube-apiserver, kube-controller-manager, kube-scheduler, Flannel
ETCD2 | 192.168.1.196 | etcd
ETCD3 | 192.168.1.197 | etcd
Kube-Node1 | 192.168.1.198 | kubelet, kube-proxy, docker, Flannel
Kube-Node2 | 192.168.1.199 | kubelet, kube-proxy, docker, Flannel
Name | Value | Flag | Notes
Service_CIDR | 172.16.0.0/16 | --service-cluster-ip-range | Service network
Cluster_CIDR | 172.30.0.0/16 | --cluster-cidr | Pod network
CLUSTER_KUBERNETES_SVC_IP | 172.16.0.1 | - | kubernetes service IP, the first IP in SERVICE_CIDR
CLUSTER_DNS_SVC_IP | 172.16.0.2 | - | cluster DNS service IP, the second IP in SERVICE_CIDR
NODE_PORT_RANGE | 8400-9000 | --service-node-port-range | NodePort service port range
CLUSTER_DNS_DOMAIN | cluster.local. | - | cluster DNS domain
FLANNEL_ETCD_PREFIX | /kubernetes/network | - | flanneld network configuration prefix
Install Docker
Install the Docker engine on both node1 and node2:
yum remove docker docker-common docker-selinux docker-engine -y
yum install -y yum-utils device-mapper-persistent-data lvm2
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum-config-manager --enable docker-ce-edge
yum-config-manager --enable docker-ce-test
yum install docker-ce -y
systemctl enable docker
systemctl start docker
Create the CA certificate and key
The Kubernetes components encrypt their communication with TLS certificates. Here CloudFlare's PKI toolkit, cfssl, is used to generate the Certificate Authority (CA) certificate and key files. The CA is a self-signed certificate that is used to sign every other TLS certificate created later.
Install CFSSL
cfssl downloads: https://github.com/cloudflare/cfssl/releases (cfssl R1.2 local download)
[root@master1 ~]
[root@master1 ~]
[root@master1 ~]
[root@master1 ~]
[root@master1 ~]
[root@master1 ssl]
[root@master1 ssl]
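The install commands behind the prompts above were not preserved; a sketch of a typical cfssl R1.2 installation (the download URLs and target paths are assumptions, not taken from the original):
wget https://pkg.cfssl.org/R1.2/cfssl_linux-amd64
wget https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64
wget https://pkg.cfssl.org/R1.2/cfssl-certinfo_linux-amd64
chmod +x cfssl_linux-amd64 cfssljson_linux-amd64 cfssl-certinfo_linux-amd64
mv cfssl_linux-amd64 /usr/local/bin/cfssl
mv cfssljson_linux-amd64 /usr/local/bin/cfssljson
mv cfssl-certinfo_linux-amd64 /usr/local/bin/cfssl-certinfo
mkdir -p /etc/kubernetes/ssl && cd /etc/kubernetes/ssl   # working directory used by the ssl prompts below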
Create the CA (Certificate Authority)

Certificate | Config file | Purpose
ca.pem | ca-config.json | CA configuration file
etcd.pem | ca-csr.json | CA certificate
[root@master1 ssl]
{
"signing" : {
"default" : {
"expiry" : "8760h"
},
"profiles" : {
"kubernetes" : {
"usages" : [
"signing" ,
"key encipherment" ,
"server auth" ,
"client auth"
],
"expiry" : "8760h"
}
}
}
}
[root@master1 ssl]
ca-config.json: multiple profiles can be defined, each with its own expiry, usage scenario, and other parameters; a specific profile is chosen later when signing a certificate.
signing: the certificate can be used to sign other certificates; the generated ca.pem has CA=TRUE.
server auth: a client can use this CA to verify certificates presented by a server.
client auth: a server can use this CA to verify certificates presented by a client.
Create the CA certificate signing request:
[root@master1 ssl]
{
"CN" : "kubernetes" ,
"key" : {
"algo" : "rsa" ,
"size" : 2048
},
"names" : [
{
"C" : "CN" ,
"ST" : "BeiJing" ,
"L" : "BeiJing" ,
"O" : "k8s" ,
"OU" : "System"
}
]
}
[root@master1 ssl]
"CN": Common Name. kube-apiserver extracts this field from the certificate as the requesting user name (User Name); browsers use this field to verify whether a site is legitimate.
"O": Organization. kube-apiserver extracts this field from the certificate as the group (Group) the requesting user belongs to.
Generate the CA certificate and private key:
[root@master1 ssl]
[root@master1 ssl]
ca-config.json ca.csr ca-csr.json ca-key.pem ca.pem
[root@master1 ssl]
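The generation command itself was lost in extraction; with cfssl it typically looks like this sketch (the output files ca.pem, ca-key.pem, and ca.csr match the listing above):
cfssl gencert -initca ca-csr.json | cfssljson -bare ca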
Distribute the certificates:
[root@master1 ssl]
[root@master1 ssl]
[root@master1 ssl]
ca-config.json ca.csr ca-csr.json ca-key.pem ca.pem
[root@master1 ssl]
[root@master1 ssl]
[root@master1 ssl]
[root@master1 ssl]
[root@master1 ssl]
[root@master1 ssl]
[root@master1 ssl]
[root@master1 ssl]
[root@master1 ssl]
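The distribution commands were lost; they usually amount to copying the CA files into /etc/kubernetes/ssl on every node, roughly like this sketch (host list taken from this document):
for ip in 192.168.1.196 192.168.1.197 192.168.1.198 192.168.1.199; do
  ssh root@$ip "mkdir -p /etc/kubernetes/ssl"
  scp ca.pem ca-key.pem ca-config.json root@$ip:/etc/kubernetes/ssl/
done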
Deploy a highly available etcd cluster
SELinux must be disabled, the firewall stopped, and time synchronized with ntpdate. Kubernetes stores all of its data in etcd; here etcd is installed on the same machines as the Kubernetes masters. The three etcd members are named etcd1, etcd2, and etcd3.
Install etcd
Install etcd on all three master nodes. etcd downloads: https://github.com/coreos/etcd/releases (etcd-v3.2.15 local download)
[root@master1 ~]
[root@master1 ~]
[root@master1 ~]
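The etcd install commands were lost; a sketch of the usual steps (binary and directory paths taken from the unit file below):
tar xf etcd-v3.2.15-linux-amd64.tar.gz
cp etcd-v3.2.15-linux-amd64/etcd etcd-v3.2.15-linux-amd64/etcdctl /usr/local/bin/
mkdir -p /var/lib/etcd /etc/etcd/ssl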
Create TLS keys and certificates
To keep communication secure, clients (such as etcdctl) talking to the etcd cluster, and etcd members talking to each other, use TLS encryption.
Create the etcd certificate signing request:
[root@master1 ~]
[root@master1 ~]
[root@master1 ssl]
{
"CN" : "etcd" ,
"hosts" : [
"127.0.0.1" ,
"192.168.1.195" ,
"192.168.1.196" ,
"192.168.1.197"
],
"key" : {
"algo" : "rsa" ,
"size" : 2048
},
"names" : [
{
"C" : "CN" ,
"ST" : "BeiJing" ,
"L" : "BeiJing" ,
"O" : "k8s" ,
"OU" : "System"
}
]
}
[root@master1 ssl]
The hosts field lists the IP addresses of the etcd nodes authorized to use this certificate.
Generate the etcd certificate and private key:
[root@master1 ssl]
[root@master1 ssl]
etcd.csr etcd-csr.json etcd-key.pem etcd.pem
[root@master1 ssl]
[root@master1 ssl]
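The signing command was lost; with cfssl it typically looks like this sketch (CA paths as created above; copying into /etc/etcd/ssl is an assumption based on the unit file below):
cfssl gencert -ca=/etc/kubernetes/ssl/ca.pem \
  -ca-key=/etc/kubernetes/ssl/ca-key.pem \
  -config=/etc/kubernetes/ssl/ca-config.json \
  -profile=kubernetes etcd-csr.json | cfssljson -bare etcd
cp etcd*.pem /etc/etcd/ssl/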
Create the etcd systemd unit file:
[root@master1 ~]
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
Documentation=https://github.com/coreos
[Service]
Type=notify
WorkingDirectory=/var/lib/etcd/
ExecStart=/usr/local/bin/etcd \
--name=etcd1 \
--cert-file=/etc/etcd/ssl/etcd.pem \
--key-file=/etc/etcd/ssl/etcd-key.pem \
--peer-cert-file=/etc/etcd/ssl/etcd.pem \
--peer-key-file=/etc/etcd/ssl/etcd-key.pem \
--trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
--peer-trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
--initial-advertise-peer-urls=https://192.168.1.195:2380 \
--listen-peer-urls=https://192.168.1.195:2380 \
--listen-client-urls=https://192.168.1.195:2379,http://127.0.0.1:2379 \
--advertise-client-urls=https://192.168.1.195:2379 \
--initial-cluster-token=etcd-cluster-0 \
--initial-cluster=etcd1=https://192.168.1.195:2380,etcd2=https://192.168.1.196:2380,etcd3=https://192.168.1.197:2380 \
--initial-cluster-state=new \
--data-dir=/var/lib/etcd
Restart=on-failure
RestartSec=5
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
[root@master1 ~]
The etcd working directory and data directory are /var/lib/etcd; this directory must be created before starting the service.
--name is the name of the current etcd member; every member must use a different name.
--initial-advertise-peer-urls, --listen-peer-urls, --listen-client-urls, and --advertise-client-urls must be changed to the IP address of the current node.
--initial-cluster lists the IP addresses of all members of the etcd cluster.
To secure communication, specify etcd's own certificate and key (cert-file and key-file), the certificate, key, and CA for peer communication (peer-cert-file, peer-key-file, peer-trusted-ca-file), and the CA used to verify clients (trusted-ca-file).
When --initial-cluster-state is new, the value of --name must appear in the --initial-cluster list.
Distribute the etcd certificates:
[root@master1 ~]
[root@master1 ~]
[root@master1 ~]
[root@master1 ~]
[root@master1 ~]
[root@master1 ~]
Distribute etcd.service:
[root@master1 ~]
[root@master1 ~]
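The scp commands were lost; distributing the certificates and unit file typically looks like this sketch (assuming the unit file lives under /etc/systemd/system/):
scp /etc/etcd/ssl/etcd*.pem root@192.168.1.196:/etc/etcd/ssl/
scp /etc/etcd/ssl/etcd*.pem root@192.168.1.197:/etc/etcd/ssl/
scp /etc/systemd/system/etcd.service root@192.168.1.196:/etc/systemd/system/
scp /etc/systemd/system/etcd.service root@192.168.1.197:/etc/systemd/system/
# On master2/master3, edit etcd.service: change --name to etcd2/etcd3 and the peer/client URLs to the local IP.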
Start the etcd service
Start etcd on every node, one after another:
systemctl daemon-reload
systemctl enable etcd
systemctl start etcd
The first etcd process to start will appear to hang for a while as it waits for the other members to join the cluster; this is normal.
[root@master1 ~]
[root@master1 ~]
https://192.168.1.197:2379 is healthy: successfully committed proposal: took = 1.631081ms
https://192.168.1.195:2379 is healthy: successfully committed proposal: took = 1.187637ms
https://192.168.1.196:2379 is healthy: successfully committed proposal: took = 1.461928ms
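The health-check command itself was not preserved; the output above is what etcdctl v3 prints for an endpoint health check, roughly (a sketch; certificate paths as used in this setup):
export ETCDCTL_API=3
etcdctl --endpoints=https://192.168.1.195:2379,https://192.168.1.196:2379,https://192.168.1.197:2379 \
  --cacert=/etc/kubernetes/ssl/ca.pem \
  --cert=/etc/etcd/ssl/etcd.pem \
  --key=/etc/etcd/ssl/etcd-key.pem \
  endpoint health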
Deploy the Flannel network
Flannel is an overlay-network tool designed by the CoreOS team for Kubernetes. Its purpose is to give every host in the cluster a complete subnet so that Docker containers created on different nodes get cluster-wide unique virtual IP addresses. With the default Docker configuration, the Docker daemon on each node assigns container IPs independently, so containers on different nodes can end up with the same address. Flannel re-plans IP allocation across all nodes, so containers on different nodes get non-overlapping addresses that "belong to the same internal network" and can talk to each other directly over those internal IPs.
Flannel maintains a routing table between nodes in etcd. Reference: http://dockone.io/article/618
flannel downloads: https://github.com/coreos/flannel/releases (flannel-v0.10.0 local download)
Node | IP | Allocated subnet
node1 | 192.168.1.198 | 172.30.57.0
node2 | 192.168.1.199 | 172.30.41.0
Run the following on the server where the cfssl tools are installed.
Create TLS keys and certificates: the etcd cluster has mutual TLS authentication enabled, so flanneld must be given the CA and keys it needs to talk to etcd.
Create the CA configuration file and the flanneld certificate signing request:
[root@master1 ssl]
{
"CN" : "flanneld" ,
"hosts" : [],
"key" : {
"algo" : "rsa" ,
"size" : 2048
},
"names" : [
{
"C" : "CN" ,
"ST" : "BeiJing" ,
"L" : "BeiJing" ,
"O" : "k8s" ,
"OU" : "System"
}
]
}
[root@master1 ssl]
Generate the flanneld certificate and private key:
[root@master1 ssl]
[root@master1 ssl]
flanneld.csr flanneld-csr.json flanneld-key.pem flanneld.pem
[root@master1 ssl]
[root@master1 ssl]
Distribute the flanneld certificates:
[root@master1 ssl]
[root@master1 ssl]
[root@master1 ssl]
[root@master1 ssl]
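The copy commands were lost; a brief sketch (the /etc/flanneld/ssl directory is the one referenced by the flanneld unit file below):
ssh root@192.168.1.198 "mkdir -p /etc/flanneld/ssl"
scp flanneld*.pem root@192.168.1.198:/etc/flanneld/ssl/
ssh root@192.168.1.199 "mkdir -p /etc/flanneld/ssl"
scp flanneld*.pem root@192.168.1.199:/etc/flanneld/ssl/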
Write the cluster Pod network information into etcd
The configuration is written with the etcd v2 API. Since export ETCDCTL_API=3 was run earlier (which makes etcdctl use the v3 API), open a new xshell window first, otherwise the command will fail:
[root@master1 ~]
{"Network" :"172.30.0.0/16" , "SubnetLen" : 24, "Backend" : {"Type" : "vxlan" }}
[root@master1 ~]
[root@master1 ~]
{"Network" :"172.30.0.0/16" , "SubnetLen" : 24, "Backend" : {"Type" : "vxlan" }}
[root@master1 ~]
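The etcdctl command that writes the configuration shown above was lost; with the v2 API it typically looks like this sketch (the key matches FLANNEL_ETCD_PREFIX/config):
etcdctl --endpoints=https://192.168.1.195:2379,https://192.168.1.196:2379,https://192.168.1.197:2379 \
  --ca-file=/etc/kubernetes/ssl/ca.pem \
  --cert-file=/etc/flanneld/ssl/flanneld.pem \
  --key-file=/etc/flanneld/ssl/flanneld-key.pem \
  set /kubernetes/network/config '{"Network":"172.30.0.0/16", "SubnetLen": 24, "Backend": {"Type": "vxlan"}}'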
Install and configure flanneld
Install on node1 and node2; flannel is not installed on the masters:
[root@node1 ~]
[root@node1 ~]
[root@node1 ~]
[root@node1 ~]
[root@node1 ~]
[Unit]
Description=Flanneld overlay address etcd agent
After=network.target
After=network-online.target
Wants=network-online.target
After=etcd.service
Before=docker.service
[Service]
Type=notify
ExecStart=/usr/local/bin/flanneld \
-etcd-cafile=/etc/kubernetes/ssl/ca.pem \
-etcd-certfile=/etc/flanneld/ssl/flanneld.pem \
-etcd-keyfile=/etc/flanneld/ssl/flanneld-key.pem \
-etcd-endpoints=https://192.168.1.195:2379,https://192.168.1.196:2379,https://192.168.1.197:2379 \
-etcd-prefix=/kubernetes/network
ExecStartPost=/usr/local/bin/mk-docker-opts.sh -k DOCKER_NETWORK_OPTIONS -d /run/flannel/docker
Restart=on-failure
[Install]
WantedBy=multi-user.target
RequiredBy=docker.service
[root@node1 ~]
The mk-docker-opts.sh script writes the Pod subnet allocated to flanneld into /run/flannel/docker; when docker starts later it uses the values in that file to configure the docker0 bridge.
flanneld communicates with other nodes over the interface carrying the system default route. On machines with several interfaces (e.g. an internal and a public one), use the -iface option to pick the interface (the systemd unit above does not set it); with Vagrant + VirtualBox, for example, you would specify -iface=enp0s8.
Start flanneld:
systemctl daemon-reload
systemctl enable flanneld
systemctl start flanneld
Integrate Docker with the flanneld network
Make the docker0 bridge use an address from the flanneld network by editing docker.service (as installed by yum):
[root@node1 ~]
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target firewalld.service
Wants=network-online.target
[Service]
Type=notify
ExecStart=/usr/bin/dockerd $DOCKER_NETWORK_OPTIONS
EnvironmentFile=/run/flannel/docker
ExecReload=/bin/kill -s HUP $MAINPID
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
TimeoutStartSec=0
Delegate=yes
KillMode=process
Restart=on-failure
StartLimitBurst=3
StartLimitInterval=60s
[Install]
WantedBy=multi-user.target
[root@node1 ~]
Check the flanneld service:
[root@node1 ~]
flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 172.30.57.0 netmask 255.255.255.255 broadcast 0.0.0.0
inet6 fe80::7890:e0ff:fe20:836e prefixlen 64 scopeid 0x20<link>
ether 7a:90:e0:20:83:6e txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 8 overruns 0 carrier 0 collisions 0
[root@node1 ~]
Check the Pod subnets allocated to each flanneld
View the cluster Pod network (/16):
[root@master1 ~]
{"Network" :"172.30.0.0/16" , "SubnetLen" : 24, "Backend" : {"Type" : "vxlan" }}
[root@master1 ~]
List the allocated Pod subnets (/24):
[root@master1 ~]
/kubernetes/network/subnets/172.30.57.0-24
View the IP and network parameters of the flanneld process behind a given Pod subnet:
[root@master1 ~]
/flanneld/ssl/flanneld.pem --key-file=/etc/flanneld/ssl/flanneld-key.pem get /kubernetes/network/subnets/172.30.57.0-24
{"PublicIP" :"192.168.1.198" ,"BackendType" :"vxlan" ,"BackendData" :{"VtepMAC" :"7a:90:e0:20:83:6e" }}
[root@master1 ~]
Make sure the Pod subnets on different nodes can reach each other
After Flannel has been deployed on all nodes, list the allocated Pod subnets (/24):
[root@master1 ~]
/kubernetes/network/subnets/172.30.41.0-24
/kubernetes/network/subnets/172.30.57.0-24
[root@master1 ~]
The Pod subnets currently allocated to the two nodes are 172.30.57.0-24 and 172.30.41.0-24 (as listed above); check them on each node:
[root@node1 ~]
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
inet 172.30.57.1/24 brd 172.30.57.255 scope global docker0
4: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN
inet 172.30.57.0/32 scope global flannel.1
[root@node1 ~]
[root@node2 ~]
3: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN
inet 172.30.42.0/32 scope global flannel.1
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
inet 172.30.42.1/24 brd 172.30.41.255 scope global docker0
[root@node2 ~]
On each node, ping the gateway addresses of these two subnets to make sure they are reachable:
[root@node1 ~]
[root@node1 ~]
Deploy the Kubernetes master node
The Kubernetes master node runs the following components:
kube-apiserver
kube-scheduler
kube-controller-manager
These three components need to be deployed on the same machine:
kube-scheduler, kube-controller-manager, and kube-apiserver are tightly coupled;
only one kube-scheduler and one kube-controller-manager process can be active at a time; if several are run, a leader must be elected.
Kubernetes release tarball (download script): https://github.com/kubernetes/kubernetes/releases; Kubernetes CHANGELOG (server/client) downloads: https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG.md
Service | Address
kube-apiserver | 192.168.1.195:6443
kube-controller-manager | 127.0.0.1:10252
kube-scheduler | 127.0.0.1:10251
Baidu netdisk packages directory, password: nwzk:
[root@master1 ~]
[root@master1 ~]
[root@master1 ~]
[root@master1 kubernetes]
[root@master1 kubernetes]
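The extraction commands were lost; installing the master binaries typically looks like this sketch (the tarball name and server/bin layout are the standard Kubernetes release layout):
tar xf kubernetes-server-linux-amd64.tar.gz
cd kubernetes
cp server/bin/kube-apiserver server/bin/kube-controller-manager server/bin/kube-scheduler /usr/local/bin/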
Create the kubernetes certificate
Create the kubernetes certificate signing request:
[root@master1 ~]
[root@master1 ssl]
{
"CN" : "kubernetes" ,
"hosts" : [
"127.0.0.1" ,
"192.168.1.195" ,
"172.16.0.1" ,
"kubernetes" ,
"kubernetes.default" ,
"kubernetes.default.svc" ,
"kubernetes.default.svc.cluster" ,
"kubernetes.default.svc.cluster.local"
],
"key" : {
"algo" : "rsa" ,
"size" : 2048
},
"names" : [
{
"C" : "CN" ,
"ST" : "BeiJing" ,
"L" : "BeiJing" ,
"O" : "k8s" ,
"OU" : "System"
}
]
}
[root@master1 ssl]
192.168.1.195 is the IP of the master being deployed here.
172.16.0.1 is the kubernetes service IP (pre-allocated, normally the first IP of SERVICE_CIDR).
If the hosts field is not empty, it must list the IPs or domain names authorized to use this certificate, which is why the master host IP is included above;
it must also include the cluster IP of the kubernetes service registered by kube-apiserver, normally the first IP of the network given by the kube-apiserver --service-cluster-ip-range option, e.g. 172.16.0.1.
Generate the kubernetes certificate and private key:
[root@master1 ~]
[root@master1 ssl]
[root@master1 ssl]
kubernetes.csr kubernetes-csr.json kubernetes-key.pem kubernetes.pem
[root@master1 ssl]
[root@master1 ssl]
Configure and start kube-apiserver
Create the client token file used by kube-apiserver. When kubelet first starts it sends a TLS Bootstrapping request to kube-apiserver; kube-apiserver checks whether the token in the request matches its token.csv, and if it does, it automatically issues a certificate and key for that kubelet:
[root@master1 ~]
6240b18d950d086ff9eb596e215d243f
[root@master1 ssl]
6240b18d950d086ff9eb596e215d243f,kubelet-bootstrap,10001,"system:kubelet-bootstrap"
[root@master1 ssl]
[root@master1 ssl]
admin,admin@123,1
readonly,readonly,2
[root@master1 ssl]
创建 kube-apiserver 的 systemd unit 文件1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
[root@master1 ssl]
[Unit]
Description=Kubernetes API Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=network.target
[Service]
ExecStart=/usr/local/bin/kube-apiserver \
--admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,ResourceQuota \
--advertise-address=192.168.1.195 \
--bind-address=192.168.1.195 \
--insecure-bind-address=192.168.1.195 \
--authorization-mode=RBAC \
--runtime-config=rbac.authorization.k8s.io/v1alpha1 \
--kubelet-https=true \
--token-auth-file=/etc/kubernetes/token.csv \
--service-cluster-ip-range=172.16.0.0/16 \
--service-node-port-range=8400-9000 \
--tls-cert-file=/etc/kubernetes/ssl/kubernetes.pem \
--tls-private-key-file=/etc/kubernetes/ssl/kubernetes-key.pem \
--client-ca-file=/etc/kubernetes/ssl/ca.pem \
--service-account-key-file=/etc/kubernetes/ssl/ca-key.pem \
--etcd-cafile=/etc/kubernetes/ssl/ca.pem \
--etcd-certfile=/etc/kubernetes/ssl/kubernetes.pem \
--etcd-keyfile=/etc/kubernetes/ssl/kubernetes-key.pem \
--etcd-servers=https://192.168.1.195:2379,https://192.168.1.196:2379,https://192.168.1.197:2379 \
--enable-swagger-ui=true \
--allow-privileged=true \
--apiserver-count=3 \
--audit-log-maxage=30 \
--audit-log-maxbackup=3 \
--audit-log-maxsize=100 \
--audit-log-path=/var/lib/audit.log \
--event-ttl=1h \
--v=2
Restart=on-failure
RestartSec=5
Type=notify
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
[root@master1 ssl]
--bind-address=192.168.1.195 is the IP of this master; it must not be 127.0.0.1.
--insecure-bind-address=192.168.1.195 is the IP of this master.
--service-cluster-ip-range specifies the Service cluster IP range; this range must not be routable.
--service-node-port-range="8400-9000" specifies the NodePort port range.
--etcd-servers="https://192.168.1.195:2379,https://192.168.1.196:2379,https://192.168.1.197:2379" is the list of etcd cluster endpoints.
Starting with kube-apiserver 1.6, the etcd v3 API and storage format are used.
--authorization-mode=RBAC enables RBAC authorization on the secure port and rejects unauthorized requests.
kube-scheduler and kube-controller-manager are normally deployed on the same machine as kube-apiserver and talk to it over the insecure port.
kube-proxy and kubectl pass RBAC authorization through the User and Group set in their certificates.
If the kubelet TLS Bootstrap mechanism is used, do not also set --kubelet-certificate-authority, --kubelet-client-certificate, and --kubelet-client-key, otherwise kube-apiserver will later fail to verify the kubelet certificate with an "x509: certificate signed by unknown authority" error.
Start kube-apiserver:
[root@master1 ~]
[root@master1 ~]
[root@master1 ~]
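The start commands were lost in extraction; presumably the usual systemd sequence (kube-controller-manager and kube-scheduler below are started the same way):
systemctl daemon-reload
systemctl enable kube-apiserver
systemctl start kube-apiserver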
[root@master1 ~]
tcp 0 0 192.168.1.195:6443 0.0.0.0:* LISTEN 19206/kube-apiserve
tcp 0 0 192.168.1.195:8080 0.0.0.0:* LISTEN 19206/kube-apiserve
[root@master1 ~]
[root@master1 ~]
● kube-apiserver.service - Kubernetes API Server
Loaded: loaded (/etc/systemd/system/kube-apiserver.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2018-01-29 10:53:14 CST; 15min ago
Docs: https://github.com/GoogleCloudPlatform/kubernetes
Main PID: 19206 (kube-apiserver)
CGroup: /system.slice/kube-apiserver.service
└─19206 /usr/local/bin/kube-apiserver --admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,ResourceQuota --advertise-address=...
Jan 29 11:08:14 master1.example.com kube-apiserver[19206]: I0129 11:08:14.947346 19206 wrap.go:42] PUT /apis/apiregistration.k8s.io/v1beta1/apiservices/v1beta...95:42960]
Jan 29 11:08:14 master1.example.com kube-apiserver[19206]: I0129 11:08:14.947470 19206 wrap.go:42] PUT /apis/apiregistration.k8s.io/v1beta1/apiservices/v1beta...95:42960]
Jan 29 11:08:14 master1.example.com kube-apiserver[19206]: I0129 11:08:14.948643 19206 wrap.go:42] PUT /apis/apiregistration.k8s.io/v1beta1/apiservices/v1beta...95:42960]
Jan 29 11:08:14 master1.example.com kube-apiserver[19206]: I0129 11:08:14.997905 19206 wrap.go:42] GET /api/v1/services: (1.191044ms) 200 [[kube-apiserver/v1....95:42960]
Jan 29 11:08:15 master1.example.com kube-apiserver[19206]: I0129 11:08:15.003220 19206 wrap.go:42] GET /api/v1/services: (1.016741ms) 200 [[kube-apiserver/v1....95:42960]
Jan 29 11:08:15 master1.example.com kube-apiserver[19206]: I0129 11:08:15.054109 19206 wrap.go:42] GET /api/v1/namespaces/kube-system: (1.388292ms) 200 [[kube...95:42960]
Jan 29 11:08:15 master1.example.com kube-apiserver[19206]: I0129 11:08:15.055377 19206 wrap.go:42] GET /api/v1/namespaces/kube-public: (1.053295ms) 200 [[kube...95:42960]
Jan 29 11:08:15 master1.example.com kube-apiserver[19206]: I0129 11:08:15.405477 19206 wrap.go:42] GET /api/v1/namespaces/default: (1.48537ms) 200 [[kube-apis...95:42960]
Jan 29 11:08:15 master1.example.com kube-apiserver[19206]: I0129 11:08:15.407091 19206 wrap.go:42] GET /api/v1/namespaces/default/services/kubernetes: (1.0559...95:42960]
Jan 29 11:08:15 master1.example.com kube-apiserver[19206]: I0129 11:08:15.408295 19206 wrap.go:42] GET /api/v1/namespaces/default/endpoints/kubernetes: (971.4….195:42960]
Hint: Some lines were ellipsized, use -l to show in full.
[root@master1 ~]
Configure and start kube-controller-manager
Create the kube-controller-manager systemd unit file:
[root@master1 ssl]
[Unit]
Description=Kubernetes Controller Manager
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
[Service]
ExecStart=/usr/local/bin/kube-controller-manager \
--address=127.0.0.1 \
--master=http://192.168.1.195:8080 \
--allocate-node-cidrs=true \
--service-cluster-ip-range=172.16.0.0/16 \
--cluster-cidr=172.30.0.0/16 \
--cluster-name=kubernetes \
--cluster-signing-cert-file=/etc/kubernetes/ssl/ca.pem \
--cluster-signing-key-file=/etc/kubernetes/ssl/ca-key.pem \
--service-account-private-key-file=/etc/kubernetes/ssl/ca-key.pem \
--root-ca-file=/etc/kubernetes/ssl/ca.pem \
--leader-elect=true \
--v=2
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
[root@master1 ssl]
--address must be 127.0.0.1, because kube-apiserver currently expects scheduler and controller-manager to run on the same machine.
--master=http://192.168.1.195:8080: talk to kube-apiserver over the insecure 8080 port.
--cluster-cidr specifies the CIDR range for Pods in the cluster; this range must be routable between nodes (flanneld guarantees this).
--service-cluster-ip-range specifies the CIDR range for Services in the cluster; this network must not be routable between nodes and must match the kube-apiserver setting.
--cluster-signing-* specify the certificate and key used to sign the certificates created for TLS BootStrap.
--root-ca-file is used to verify the kube-apiserver certificate; when set, this CA certificate is placed in the ServiceAccount of Pod containers.
--leader-elect=true: when several machines form a master cluster, elect the one kube-controller-manager process that is active.
Start kube-controller-manager:
[root@master1 ~]
[root@master1 ~]
[root@master1 ~]
[root@master1 ~]
tcp 0 0 127.0.0.1:10252 0.0.0.0:* LISTEN 19300/kube-controll
[root@master1 ~]
[root@master1 ~]
● kube-controller-manager.service - Kubernetes Controller Manager
Loaded: loaded (/etc/systemd/system/kube-controller-manager.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2018-02-01 14:57:01 CST; 41s ago
Docs: https://github.com/GoogleCloudPlatform/kubernetes
Main PID: 7798 (kube-controller)
CGroup: /system.slice/kube-controller-manager.service
└─7798 /usr/local/bin/kube-controller-manager --address=127.0.0.1 --master=http://192.168.1.195:8080 --allocate-node-cidrs=true --service-cluster-ip-range=172...
Jan 29 11:10:14 master1.example.com kube-controller-manager[7798]: I0201 14:57:12.832889 7798 controller_utils.go:1019] Waiting for caches to sync for cidrall...ntroller
Jan 29 11:10:14 master1.example.com kube-controller-manager[7798]: I0201 14:57:12.832920 7798 taint_controller.go:181] Starting NoExecuteTaintManager
Jan 29 11:10:15 master1.example.com kube-controller-manager[7798]: I0201 14:57:12.932953 7798 controller_utils.go:1026] Caches are synced for cidrallocator controller
Jan 29 11:10:15 master1.example.com kube-controller-manager[7798]: I0201 14:57:13.549661 7798 resource_quota_controller.go:434] syncing resource quota control... {apps v
Jan 29 11:10:15 master1.example.com kube-controller-manager[7798]: I0201 14:57:13.549766 7798 controller_utils.go:1019] Waiting for caches to sync for resourc...ntroller
Jan 29 11:10:15 master1.example.com kube-controller-manager[7798]: I0201 14:57:13.649852 7798 controller_utils.go:1026] Caches are synced for resource quota controller
Jan 29 11:10:15 master1.example.com kube-controller-manager[7798]: I0201 14:57:13.696714 7798 garbagecollector.go:182] syncing garbage collector with updated ...onregist
Jan 29 11:10:15 master1.example.com kube-controller-manager[7798]: I0201 14:57:15.046578 7798 controller_utils.go:1019] Waiting for caches to sync for garbage...ntroller
Jan 29 11:10:15 master1.example.com kube-controller-manager[7798]: I0201 14:57:15.146718 7798 controller_utils.go:1026] Caches are synced for garbage collecto...ntroller
Jan 29 11:10:15 master1.example.com kube-controller-manager[7798]: I0201 14:57:15.146731 7798 garbagecollector.go:219] synced garbage collector
Hint: Some lines were ellipsized, use -l to show in full.
[root@master1 ~]
Configure and start kube-scheduler
Create the kube-scheduler systemd unit file:
[root@master1 ssl]
[Unit]
Description=Kubernetes Scheduler
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
[Service]
ExecStart=/usr/local/bin/kube-scheduler \
--address=127.0.0.1 \
--master=http://192.168.1.195:8080 \
--leader-elect=true \
--v=2
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
[root@master1 ssl]
--address must be 127.0.0.1, because kube-apiserver currently expects scheduler and controller-manager on the same machine.
--master=http://192.168.1.195:8080: talk to kube-apiserver over the insecure 8080 port.
--leader-elect=true: when several machines form a master cluster, elect the one kube-scheduler process that is active.
Start kube-scheduler:
[root@master1 ~]
[root@master1 ~]
[root@master1 ~]
[root@master1 ~]
● kube-scheduler.service - Kubernetes Scheduler
Loaded: loaded (/etc/systemd/system/kube-scheduler.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2018-01-29 11:27:07 CST; 2min 34s ago
Docs: https://github.com/GoogleCloudPlatform/kubernetes
Main PID: 19360 (kube-scheduler)
CGroup: /system.slice/kube-scheduler.service
└─19360 /usr/local/bin/kube-scheduler --address=127.0.0.1 --master=http://192.168.1.195:8080 --leader-elect=true --v=2
Jan 29 11:27:07 master1.example.com systemd[1]: Starting Kubernetes Scheduler...
Jan 29 11:27:07 master1.example.com kube-scheduler[19360]: W0129 11:27:07.324795 19360 server.go:159] WARNING: all flags than --config are deprecated. Please ...ile ASAP.
Jan 29 11:27:07 master1.example.com kube-scheduler[19360]: I0129 11:27:07.325298 19360 server.go:551] Version: v1.9.2
Jan 29 11:27:07 master1.example.com kube-scheduler[19360]: I0129 11:27:07.325412 19360 factory.go:837] Creating scheduler from algorithm provider 'DefaultProvider'
Jan 29 11:27:07 master1.example.com kube-scheduler[19360]: I0129 11:27:07.325419 19360 factory.go:898] Creating scheduler with fit predicates 'map[CheckNodeDi...{} NoDisk
Jan 29 11:27:07 master1.example.com kube-scheduler[19360]: I0129 11:27:07.325564 19360 server.go:570] starting healthz server on 127.0.0.1:10251
Jan 29 11:27:08 master1.example.com kube-scheduler[19360]: I0129 11:27:08.126353 19360 controller_utils.go:1019] Waiting for caches to sync for scheduler controller
Jan 29 11:27:08 master1.example.com kube-scheduler[19360]: I0129 11:27:08.226450 19360 controller_utils.go:1026] Caches are synced for scheduler controller
Jan 29 11:27:08 master1.example.com kube-scheduler[19360]: I0129 11:27:08.226468 19360 leaderelection.go:174] attempting to acquire leader lease...
Jan 29 11:27:08 master1.example.com kube-scheduler[19360]: I0129 11:27:08.232415 19360 leaderelection.go:184] successfully acquired lease kube-system/kube-scheduler
Hint: Some lines were ellipsized, use -l to show in full.
[root@master1 ~]#
[root@master1 ~]
tcp 0 0 127.0.0.1:10251 0.0.0.0:* LISTEN 19360/kube-schedule
[root@master1 ~]
Deploy the kubectl client node
Kubernetes release tarball (download script): https://github.com/kubernetes/kubernetes/releases; Kubernetes CHANGELOG (server/client) downloads: https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG.md; Baidu netdisk packages directory, password: nwzk:
[root@master1 ~]
[root@master1 ~]
[root@master1 ~]
Create the admin certificate
kubectl talks to kube-apiserver's secure port and needs a TLS certificate and key for that communication.
Create the admin certificate signing request:
[root@master1 ssl]
{
"CN" : "admin" ,
"hosts" : [],
"key" : {
"algo" : "rsa" ,
"size" : 2048
},
"names" : [
{
"C" : "CN" ,
"ST" : "BeiJing" ,
"L" : "BeiJing" ,
"O" : "system:masters" ,
"OU" : "System"
}
]
}
kube-apiserver uses RBAC to authorize client requests (from kubelet, kube-proxy, Pods, and so on).
kube-apiserver predefines a number of RoleBindings for RBAC; for example cluster-admin binds the Group system:masters to the Role cluster-admin, which grants permission to call all kube-apiserver APIs.
O sets the Group of this certificate to system:masters. When kubectl uses the certificate to access kube-apiserver, authentication succeeds because the certificate is signed by the CA, and because the certificate's group is the pre-authorized system:masters it is granted access to all APIs.
The hosts attribute is an empty list.
Generate the admin certificate and private key:
[root@master1 ssl]
[root@master1 ssl]
admin.csr admin-csr.json admin-key.pem admin.pem
[root@master1 ssl]
Create the kubectl kubeconfig file:
[root@master1 ~]
[root@master1 ~]
[root@master1 ~]
[root@master1 ~]
[root@master1 ~]
config
[root@master1 ~]
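The four kubectl config commands that produced ~/.kube/config were lost in extraction; they typically look like this sketch (server address and certificate paths taken from this setup):
kubectl config set-cluster kubernetes \
  --certificate-authority=/etc/kubernetes/ssl/ca.pem \
  --embed-certs=true \
  --server=https://192.168.1.195:6443
kubectl config set-credentials admin \
  --client-certificate=/etc/kubernetes/ssl/admin.pem \
  --client-key=/etc/kubernetes/ssl/admin-key.pem \
  --embed-certs=true
kubectl config set-context kubernetes --cluster=kubernetes --user=admin
kubectl config use-context kubernetes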
When setting the cluster and client-authentication parameters, --embed-certs is true, which embeds the contents of the files referenced by certificate-authority, client-certificate, and client-key into the generated kubeconfig file.
The kube-proxy.pem certificate has CN system:kube-proxy; kube-apiserver's predefined RoleBinding binds the User system:kube-proxy to the Role system:node-proxier, which grants access to the kube-apiserver Proxy-related APIs.
The admin.pem certificate has O set to system:masters; kube-apiserver's predefined RoleBinding cluster-admin binds the Group system:masters to the Role cluster-admin, which grants access to the kube-apiserver APIs.
The generated kubeconfig is saved to the ~/.kube/config file.
Distribute the kubeconfig file: copy ~/.kube/config into the ~/.kube/ directory of every machine that will run kubectl.
Deploy the kubectl client tool on other servers
First extract and install kubernetes-client-linux-amd64-v1.9.2.tar.gz on the machine that needs it:
[root@master1 ~]
[root@master1 ~]
[root@master1 ~]
[root@master1 ~]
[root@master2 ~]
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-0 Healthy {"health" : "true" }
etcd-1 Healthy {"health" : "true" }
etcd-2 Healthy {"health" : "true" }
[root@master2 ~]
If you hit the error below, check whether kubectl can talk to the apiserver:
[root@node1 ~]
The connection to the server localhost:8080 was refused - did you specify the right host or port?
Check cluster status
View the component status (the kubectl command from kubernetes-client must be installed first):
[root@master1 ~]
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-0 Healthy {"health" : "true" }
etcd-1 Healthy {"health" : "true" }
etcd-2 Healthy {"health" : "true" }
[root@master1 ~]
Or:
[root@master1 ~]
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-2 Healthy {"health" : "true" }
etcd-1 Healthy {"health" : "true" }
etcd-0 Healthy {"health" : "true" }
[root@master1 ~]
View the kubernetes service address (the service virtual IP):
[root@master1 ~]
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 172.16.0.1 <none> 443/TCP 1d
[root@master1 ~]
Or:
[root@master1 ~]
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 172.16.0.1 <none> 443/TCP 1d
[root@master1 ~]
View cluster info:
[root@master1 ~]
Kubernetes master is running at https://192.168.1.195:6443
[root@master1 ~]
Deploy the Kubernetes Node
A Kubernetes node runs the following components:
flanneld
docker
kubelet
kube-proxy
Install and configure kubelet
When kubelet starts, it sends a TLS bootstrapping request to kube-apiserver, so the kubelet-bootstrap user from the bootstrap token file must first be bound to the system:node-bootstrapper role; only then does kubelet have permission to create certificate signing requests (certificatesigningrequests).
Deploy kubelet and kube-proxy on node1 and node2. Baidu netdisk packages directory, password: nwzk:
[root@node1 ~]
[root@node1 ~]
[root@node1 ~]
Copy the admin certificate and key generated on the server with the cfssl tools (master1 here) to node1:
[root@master1 ~]
[root@master1 ~]
[root@master1 ~]
[root@master1 ~]
--user=kubelet-bootstrap is the user name specified in /etc/kubernetes/token.csv on the master, and is also written into /etc/kubernetes/bootstrap.kubeconfig.
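The binding command this note refers to was not preserved; it is typically the following (a sketch):
kubectl create clusterrolebinding kubelet-bootstrap \
  --clusterrole=system:node-bootstrapper \
  --user=kubelet-bootstrap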
If you see the error below, install the kubectl tool first (see the steps above):
[root@node2 ~]
The connection to the server localhost:8080 was refused - did you specify the right host or port?
[root@node2 ~]
Create the kubelet bootstrapping kubeconfig file:
[root@node1 ~]
[root@node1 ~]
[root@node1 ~]
[root@node1 ~]
[root@node1 ~]
bootstrap.kubeconfig
[root@node1 ~]
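The commands that generated bootstrap.kubeconfig were lost; a sketch of the usual sequence (the token is the one written to token.csv earlier):
kubectl config set-cluster kubernetes \
  --certificate-authority=/etc/kubernetes/ssl/ca.pem \
  --embed-certs=true \
  --server=https://192.168.1.195:6443 \
  --kubeconfig=bootstrap.kubeconfig
kubectl config set-credentials kubelet-bootstrap \
  --token=6240b18d950d086ff9eb596e215d243f \
  --kubeconfig=bootstrap.kubeconfig
kubectl config set-context default \
  --cluster=kubernetes \
  --user=kubelet-bootstrap \
  --kubeconfig=bootstrap.kubeconfig
kubectl config use-context default --kubeconfig=bootstrap.kubeconfig
mv bootstrap.kubeconfig /etc/kubernetes/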
[root@node1 ~]
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=docker.service
Requires=docker.service
[Service]
WorkingDirectory=/var/lib/kubelet
ExecStart=/usr/local/bin/kubelet \
--address=192.168.1.198 \
--hostname-override=192.168.1.198 \
--pod-infra-container-image=registry.access.redhat.com/rhel7/pod-infrastructure:latest \
--experimental-bootstrap-kubeconfig=/etc/kubernetes/bootstrap.kubeconfig \
--kubeconfig=/etc/kubernetes/kubelet.kubeconfig \
--require-kubeconfig \
--cert-dir=/etc/kubernetes/ssl \
--cluster-dns=223.5.5.5,223.6.6.6,8.8.8.8 \
--cluster-domain=aliyun.com. \
--hairpin-mode promiscuous-bridge \
--allow-privileged=true \
--serialize-image-pulls=false \
--logtostderr=true \
--v=2
ExecStartPost=/sbin/iptables -A INPUT -s 172.30.0.0/16 -p tcp --dport 4194 -j ACCEPT
ExecStartPost=/sbin/iptables -A INPUT -s 172.16.0.0/16 -p tcp --dport 4194 -j ACCEPT
ExecStartPost=/sbin/iptables -A INPUT -s 192.168.0.0/16 -p tcp --dport 4194 -j ACCEPT
ExecStartPost=/sbin/iptables -A INPUT -p tcp --dport 4194 -j DROP
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
[root@node1 ~]
--address must not be 127.0.0.1, otherwise Pods will fail when they call the kubelet API, because 127.0.0.1 from inside a Pod points to the Pod itself, not to the kubelet. Set --address to the IP of the node being deployed.
If --hostname-override is set, kube-proxy must set it to the same value, otherwise the Node will not be found. Set --hostname-override to the IP of the node being deployed.
--cluster-dns is the cluster DNS service IP (pre-allocated from SERVICE_CIDR); the SERVICE_CIDR used here is 172.16.0.0/16, which can be seen in the kube-apiserver configuration on the master.
--cluster-domain is the cluster DNS domain.
--experimental-bootstrap-kubeconfig points to the bootstrap kubeconfig file; kubelet uses the user name and token in that file to send the TLS Bootstrapping request to kube-apiserver.
After the administrator approves the CSR, kubelet automatically creates the certificate and key (kubelet-client.crt and kubelet-client.key) in the --cert-dir directory and then writes the file specified by --kubeconfig (creating it automatically).
It is recommended to put the kube-apiserver address in the --kubeconfig file. If --api-servers is not given, --require-kubeconfig must be set so the kube-apiserver address is read from that file; otherwise kubelet will not find kube-apiserver after starting (the log says the API Server was not found) and kubectl get nodes will not return the Node.
--cluster-dns specifies the kubedns Service IP (it can be allocated in advance and used later when creating the kubedns service); --cluster-domain specifies the domain suffix; both parameters must be set for them to take effect.
kubelet's cAdvisor listens on port 4194 on all interfaces by default, which is unsafe for machines with a public interface; the iptables rules in ExecStartPost only allow internal machines to reach port 4194.
--cluster-dns: the cluster DNS service IP (pre-allocated from SERVICE_CIDR). In testing, if no DNS is deployed inside k8s, it is better to use an external public DNS address; if the configured DNS does not exist, name resolution for workloads will suffer.
--cluster-domain: the cluster DNS domain. In testing, this suffix had to be a real domain suffix, because k8s writes it into the Pod's /etc/resolv.conf; with the default suffix (cluster.local.), which does not resolve, DNS lookups time out and severely affect workloads.
Start kubelet
Since v1.8, the system:nodes group must be bound to the system:node clusterrole manually.
[root@node1 ~]
[root@node1 ~]
[root@node1 ~]
[root@node1 ~]
Swap must be disabled, otherwise kubelet reports the following error:
Feb 1 16:56:59 k8s-4 kubelet: error: failed to run Kubelet: Running with swap on is not supported, please disable swap! or set --fail-swap-on flag to false. /proc/swaps contained: [Filename
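A typical way to disable swap (not shown in the original):
swapoff -a
sed -i '/ swap / s/^/#/' /etc/fstab   # keep swap off across reboots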
[root@node1 ~]
● kubelet.service - Kubernetes Kubelet
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2018-02-01 16:41:49 CST; 7min ago
Docs: https://github.com/GoogleCloudPlatform/kubernetes
Process: 30456 ExecStartPost=/sbin/iptables -A INPUT -p tcp --dport 4194 -j DROP (code=exited, status=0/SUCCESS)
Process: 30452 ExecStartPost=/sbin/iptables -A INPUT -s 192.168.0.0/16 -p tcp --dport 4194 -j ACCEPT (code=exited, status=0/SUCCESS)
Process: 30451 ExecStartPost=/sbin/iptables -A INPUT -s 172.16.0.0/16 -p tcp --dport 4194 -j ACCEPT (code=exited, status=0/SUCCESS)
Process: 30446 ExecStartPost=/sbin/iptables -A INPUT -s 172.30.0.0/16 -p tcp --dport 4194 -j ACCEPT (code=exited, status=0/SUCCESS)
Main PID: 30445 (kubelet)
Memory: 11.7M
CGroup: /system.slice/kubelet.service
└─30445 /usr/local/bin/kubelet --address=192.168.1.198 --hostname-override=192.168.1.198 --pod-infra-container-image=registry.access.redhat.com/rhel7/pod-infr...
Feb 01 16:41:49 node1.example.com kubelet[30445]: I0201 16:41:49.807803 30445 controller.go:114] kubelet config controller: starting controller
Feb 01 16:41:49 node1.example.com kubelet[30445]: I0201 16:41:49.807806 30445 controller.go:118] kubelet config controller: validating combination of defaults and flags
Feb 01 16:41:49 node1.example.com kubelet[30445]: I0201 16:41:49.812805 30445 mount_linux.go:202] Detected OS with systemd
Feb 01 16:41:49 node1.example.com kubelet[30445]: W0201 16:41:49.812875 30445 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Feb 01 16:41:49 node1.example.com kubelet[30445]: I0201 16:41:49.815926 30445 server.go:182] Version: v1.9.2
Feb 01 16:41:49 node1.example.com kubelet[30445]: I0201 16:41:49.815950 30445 feature_gate.go:220] feature gates: &{{} map[]}
Feb 01 16:41:49 node1.example.com kubelet[30445]: W0201 16:41:49.816011 30445 server.go:280] --require-kubeconfig is deprecated. Set --kubeconfig without usi...ubeconfig.
Feb 01 16:41:49 node1.example.com kubelet[30445]: I0201 16:41:49.816019 30445 plugins.go:101] No cloud provider specified.
Feb 01 16:41:49 node1.example.com kubelet[30445]: I0201 16:41:49.816025 30445 server.go:303] No cloud provider specified: "" from the config file: ""
Feb 01 16:41:49 node1.example.com kubelet[30445]: I0201 16:41:49.816036 30445 bootstrap.go:58] Using bootstrap kubeconfig to generate TLS client cert, key an...onfig file
Hint: Some lines were ellipsized, use -l to show in full.
[root@node1 ~]
Q&A: the master log reports an RBAC DENY: user "kubelet-bootstrap" groups error
After starting the kubelet service, check the kube-apiserver log on the master (/var/log/messages); kubelet has only started successfully if the following error does not appear:
Feb 1 03:41:53 master1 kube-apiserver: I0201 16:41:53.245112 8202 rbac.go:116] RBAC DENY: user "kubelet-bootstrap" groups ["system:kubelet-bootstrap" "system:authentica
ted"] cannot "create" resource "certificatesigningrequests.certificates.k8s.io/nodeclient" cluster-wide
If the master's /var/log/messages reports the error above when the node starts, the reason is: before 1.8, with RBAC enabled, the apiserver bound the system:nodes group to the system:node clusterrole by default. Since v1.8 this binding no longer exists by default and must be created by hand; otherwise kubelet reports an authentication error after starting and the node never reaches the Ready state in kubectl get nodes.
Default roles and default role bindings: the API Server creates a set of default ClusterRole and ClusterRoleBinding objects. Many of them carry the system: prefix, marking them as "owned" by the Kubernetes core components. Modifying these resources can leave you with a non-functional cluster. The system:node role defines the permissions of kubelets; if it is modified, kubelets may stop working correctly. All default ClusterRole and ClusterRoleBinding objects are labeled kubernetes.io/bootstrapping=rbac-defaults.
kubectl get clusterrolebinding and kubectl get clusterrole list the roles and role bindings in the system; kubectl get clusterrolebindings system:node -o yaml or kubectl describe clusterrolebindings system:node shows the details of the system:node role binding.
View the details of the system:node role binding (the system:node role has no subjects bound by default):
[root@master1 ~]
Name: system:node
Labels: kubernetes.io/bootstrapping=rbac-defaults
Annotations: rbac.authorization.kubernetes.io/autoupdate=true
Role:
Kind: ClusterRole
Name: system:node
Subjects:
Kind Name Namespace
---- ---- ---------
[root@master1 ~]
Grant the system:node ClusterRole cluster-wide to the user system:node:192.168.1.198 or to the group system:nodes:
[root@master1 ~]
Name: kubelet-node-clusterbinding
Labels: <none>
Annotations: <none>
Role:
Kind: ClusterRole
Name: system:node
Subjects:
Kind Name Namespace
---- ---- ---------
User system:node:192.168.1.198
[root@master1 ~]
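The command that created the binding shown above was not preserved; it typically looks like this (a sketch; --group=system:nodes can be used instead of --user):
kubectl create clusterrolebinding kubelet-node-clusterbinding \
  --clusterrole=system:node \
  --user=system:node:192.168.1.198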
Approve the kubelet TLS certificate request
When kubelet first starts, it sends a certificate signing request to kube-apiserver; the request must be approved before Kubernetes adds the Node to the cluster.
View pending CSR requests (if nothing is returned here, kubelet did not start successfully):
[root@master1 ~]
NAME AGE REQUESTOR CONDITION
node-csr-DeFIxWS7IZimyAUaZGtIh8q4sp_CiNHL2bO1cwEm26U 9m kubelet-bootstrap Pending
[root@master1 ~]
[root@master1 ~]
No resources found.
[root@master1 ~]
Approve the CSR request:
[root@master1 ~]
certificatesigningrequest "node-csr-DeFIxWS7IZimyAUaZGtIh8q4sp_CiNHL2bO1cwEm26U" approved
[root@master1 ~]
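The approval command itself was lost; it is normally:
kubectl certificate approve node-csr-DeFIxWS7IZimyAUaZGtIh8q4sp_CiNHL2bO1cwEm26U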
[root@master1 ~]
NAME AGE REQUESTOR CONDITION
node-csr-DeFIxWS7IZimyAUaZGtIh8q4sp_CiNHL2bO1cwEm26U 13m kubelet-bootstrap Approved,Issued
[root@master1 ~]
The kubelet kubeconfig file and key pair are generated automatically:
[root@node1 ~]
-rw------- 1 root root 2280 Jan 31 19:19 /etc/kubernetes/kubelet.kubeconfig
[root@node1 ~]
-rw-r--r-- 1 root root 1046 Jan 31 19:19 /etc/kubernetes/ssl/kubelet-client.crt
-rw------- 1 root root 227 Jan 31 19:18 /etc/kubernetes/ssl/kubelet-client.key
-rw-r--r-- 1 root root 1115 Jan 31 19:15 /etc/kubernetes/ssl/kubelet.crt
-rw------- 1 root root 1675 Jan 31 19:15 /etc/kubernetes/ssl/kubelet.key
[root@node1 ~]
The node has only joined successfully once its status becomes Ready and the logs look clean:
[root@master1 ~]
NAME STATUS ROLES AGE VERSION
192.168.1.198 Ready <none> 25m v1.9.2
[root@master1 ~]
Reference: http://blog.csdn.net/zhaihaifei/article/details/79098564
Joining additional Nodes:
[root@master1 ~]
Name: kubelet-node199-clusterbinding
Labels: <none>
Annotations: <none>
Role:
Kind: ClusterRole
Name: system:node
Subjects:
Kind Name Namespace
---- ---- ---------
User system:node:192.168.1.199
[root@master1 ~]
[root@master1 ~]
NAME STATUS ROLES AGE VERSION
192.168.1.198 Ready <none> 3h v1.9.2
192.168.1.199 Ready <none> 2m v1.9.2
[root@master1 ~]
Configure kube-proxy
Create the kube-proxy certificate signing request:
[root@master1 ssl]
{
"CN" : "system:kube-proxy" ,
"hosts" : [],
"key" : {
"algo" : "rsa" ,
"size" : 2048
},
"names" : [
{
"C" : "CN" ,
"ST" : "BeiJing" ,
"L" : "BeiJing" ,
"O" : "k8s" ,
"OU" : "System"
}
]
}
[root@master1 ssl]
CN sets the User of this certificate to system:kube-proxy;
kube-apiserver's predefined RoleBinding system:node-proxier binds the User system:kube-proxy to the Role system:node-proxier, which grants access to the kube-apiserver Proxy-related APIs;
the hosts attribute is an empty list.
Generate the kube-proxy client certificate and private key:
[root@master1 ssl]
[root@master1 ssl]
kube-proxy.csr kube-proxy-csr.json kube-proxy-key.pem kube-proxy.pem
[root@master1 ssl]
Copy the kube-proxy certificate and key generated on the server with the cfssl tools (master1 here) to node1.
Create the kube-proxy kubeconfig file:
[root@node1 ~]
[root@node1 ~]
[root@node1 ~]
[root@node1 ~]
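The commands that generate kube-proxy.kubeconfig were lost; a sketch of the usual sequence (paths taken from this setup):
kubectl config set-cluster kubernetes \
  --certificate-authority=/etc/kubernetes/ssl/ca.pem \
  --embed-certs=true \
  --server=https://192.168.1.195:6443 \
  --kubeconfig=kube-proxy.kubeconfig
kubectl config set-credentials kube-proxy \
  --client-certificate=/etc/kubernetes/ssl/kube-proxy.pem \
  --client-key=/etc/kubernetes/ssl/kube-proxy-key.pem \
  --embed-certs=true \
  --kubeconfig=kube-proxy.kubeconfig
kubectl config set-context default \
  --cluster=kubernetes \
  --user=kube-proxy \
  --kubeconfig=kube-proxy.kubeconfig
kubectl config use-context default --kubeconfig=kube-proxy.kubeconfig
mv kube-proxy.kubeconfig /etc/kubernetes/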
When setting the cluster and client-authentication parameters, --embed-certs is true, which embeds the contents of the files referenced by certificate-authority, client-certificate, and client-key into the generated kube-proxy.kubeconfig file.
The kube-proxy.pem certificate has CN system:kube-proxy; kube-apiserver's predefined RoleBinding binds the User system:kube-proxy to the Role system:node-proxier, which grants access to the kube-apiserver Proxy-related APIs.
Create the kube-proxy systemd unit file; create the working directory first.
[root@node1 ~]
[Unit]
Description=Kubernetes Kube-Proxy Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=network.target
[Service]
WorkingDirectory=/var/lib/kube-proxy
ExecStart=/usr/local/bin/kube-proxy \
--bind-address=192.168.1.198 \
--hostname-override=192.168.1.198 \
--cluster-cidr=172.30.0.0/16 \
--kubeconfig=/etc/kubernetes/kube-proxy.kubeconfig \
--logtostderr=true \
--v=2
Restart=on-failure
RestartSec=5
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
[root@node1 ~]
--hostname-override must match the kubelet value, otherwise kube-proxy cannot find the Node after starting and will not create any iptables rules.
--cluster-cidr must match kube-controller-manager's --cluster-cidr option, 172.30.0.0/16 here. kube-proxy uses --cluster-cidr to distinguish traffic inside and outside the cluster; only when --cluster-cidr or --masquerade-all is set will kube-proxy SNAT requests to Service IPs.
--kubeconfig points to the configuration file that embeds the kube-apiserver address, user name, certificate, and key used for requests and authentication.
The predefined RoleBinding system:node-proxier binds the User system:kube-proxy to the Role system:node-proxier, which grants access to the kube-apiserver Proxy-related APIs.
Start kube-proxy:
[root@node1 ~]
[root@node1 ~]
[root@node1 ~]
[root@node1 ~]
[root@node1 ~]
● kube-proxy.service - Kubernetes Kube-Proxy Server
Loaded: loaded (/etc/systemd/system/kube-proxy.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2018-02-02 14:24:10 CST; 15s ago
Docs: https://github.com/GoogleCloudPlatform/kubernetes
Main PID: 3660 (kube-proxy)
Memory: 8.7M
CGroup: /system.slice/kube-proxy.service
‣ 3660 /usr/local/bin/kube-proxy --bind-address=192.168.1.198 --hostname-override=192.168.1.198 --cluster-cidr=172.30.0.0/16 --kubeconfig=/etc/kubernetes/kube...
Feb 02 14:24:10 node1.example.com kube-proxy[3660]: I0202 14:24:10.223982 3660 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
Feb 02 14:24:10 node1.example.com kube-proxy[3660]: I0202 14:24:10.224981 3660 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
Feb 02 14:24:10 node1.example.com kube-proxy[3660]: I0202 14:24:10.225415 3660 config.go:202] Starting service config controller
Feb 02 14:24:10 node1.example.com kube-proxy[3660]: I0202 14:24:10.225427 3660 controller_utils.go:1019] Waiting for caches to sync for service config controller
Feb 02 14:24:10 node1.example.com kube-proxy[3660]: I0202 14:24:10.225514 3660 config.go:102] Starting endpoints config controller
Feb 02 14:24:10 node1.example.com kube-proxy[3660]: I0202 14:24:10.225523 3660 controller_utils.go:1019] Waiting for caches to sync for endpoints config controller
Feb 02 14:24:10 node1.example.com kube-proxy[3660]: I0202 14:24:10.325575 3660 controller_utils.go:1026] Caches are synced for service config controller
Feb 02 14:24:10 node1.example.com kube-proxy[3660]: I0202 14:24:10.325624 3660 proxier.go:984] Not syncing iptables until Services and Endpoints have been re...om master
Feb 02 14:24:10 node1.example.com kube-proxy[3660]: I0202 14:24:10.326087 3660 controller_utils.go:1026] Caches are synced for endpoints config controller
Feb 02 14:24:10 node1.example.com kube-proxy[3660]: I0202 14:24:10.326163 3660 proxier.go:329] Adding new service port "default/kubernetes:https" at 172.16.0.1:443/TCP
Hint: Some lines were ellipsized, use -l to show in full.
[root@node1 ~]
Verify cluster functionality
Definition file: nginx-ds.yml:
[root@master1 ~]
apiVersion: v1
kind: Service
metadata:
  name: nginx-ds
  labels:
    app: nginx-ds
spec:
  type: NodePort
  selector:
    app: nginx-ds
  ports:
  - name: http
    port: 80
    targetPort: 80
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: nginx-ds
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  template:
    metadata:
      labels:
        app: nginx-ds
    spec:
      containers:
      - name: my-nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
[root@master1 ~]
Check Pod IP connectivity on each Node
There are currently two Nodes in the cluster, and nginx-ds.yml uses a DaemonSet, so the Pod is started on every Node:
A DaemonSet ensures that all (or some) Nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them. As nodes are removed from the cluster, those Pods are garbage collected. Deleting a DaemonSet will clean up the Pods it created.
Reference: https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/#what-is-a-daemonset
[root@master1 ~]
NAME READY STATUS RESTARTS AGE IP NODE
nginx-ds-9p9pl 1/1 Running 0 4s 172.30.57.2 192.168.1.198
nginx-ds-tvp6b 1/1 Running 0 4s 172.30.41.2 192.168.1.199
[root@master1 ~]
Check that the service IP and port are reachable:
[root@master1 ~]
nginx-ds NodePort 172.16.117.40 <none> 80:8446/TCP 5m
[root@master1 ~]
Service IP: 172.16.117.40
Service port: 80
NodePort: 8446
Run the check on every Node, using the CLUSTER-IP of nginx-ds (obtained with kubectl get svc | grep nginx-ds).
Check NodePort reachability: from an external machine, open http://192.168.1.198:8446/ in a browser; the nginx welcome page should appear.
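A quick way to test both addresses (not shown in the original):
curl 172.16.117.40          # from any Node: the service ClusterIP on port 80
curl 192.168.1.198:8446     # from outside the cluster: the NodePort on a Node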
Deploy the kubedns add-on
Official manifests: kubernetes/cluster/addons/dns; https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pods-dns-config
Predefined RoleBinding: the system:kube-dns RoleBinding binds the kube-dns ServiceAccount in the kube-system namespace to the system:kube-dns Role, which grants access to the kube-apiserver DNS-related APIs:
[root@master1 ~]
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  creationTimestamp: 2018-01-31T11:03:07Z
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:kube-dns
  resourceVersion: "86"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterrolebindings/system%3Akube-dns
  uid: 51871810-0676-11e8-8cb0-1e2d0a5bc3f5
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:kube-dns
subjects:
- kind: ServiceAccount
  name: kube-dns
  namespace: kube-system
[root@master1 ~]
The Pods defined in kubedns-controller.yaml use the kube-dns ServiceAccount defined in kubedns-sa.yaml, so they have permission to access the kube-apiserver DNS-related APIs.
Configure the kube-dns service (busybox.yaml, kube-dns.yaml):
[root@master1 dns]
/root/dns
[root@master1 dns]
busybox.yaml kube-dns.yaml
[root@master1 dns]
clusterIP: 172.16.0.2
image: registry.cn-hangzhou.aliyuncs.com/google_containers/k8s-dns-kube-dns-amd64:1.14.7
- --domain=cluster.local.
image: registry.cn-hangzhou.aliyuncs.com/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.7
- --server=/cluster.local/127.0.0.1
image: registry.cn-hangzhou.aliyuncs.com/google_containers/k8s-dns-sidecar-amd64:1.14.7
[root@master1 dns]
clusterIP must be set to the CLUSTER_DNS_SVC_IP value from the cluster environment variables; this IP must match the kubelet --cluster-dns parameter.
Configure the kube-dns Deployment: --domain is the CLUSTER_DNS_DOMAIN value from the cluster environment.
The kube-dns ServiceAccount that already has a system RoleBinding is used; that account has access to the kube-apiserver DNS-related APIs.
Create the DNS service with kubectl create -f kube-dns.yaml; delete it with kubectl delete -f kube-dns.yaml:
[root@master1 ~]
[root@master1 ~]
NAME READY STATUS RESTARTS AGE IP NODE
busybox 1/1 Running 0 19s 172.30.57.2 192.168.1.198
[root@master1 ~]
[root@master1 ~]
[root@master1 ~]
[root@master1 ~]
NAME READY STATUS RESTARTS AGE IP NODE
kube-dns-9d8b5fb76-vz6ll 3/3 Running 0 1h 172.30.41.5 192.168.1.199
[root@master1 ~]
[root@master1 ~]
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
kube-dns ClusterIP 172.16.0.2 <none> 53/UDP,53/TCP 1h k8s-app=kube-dns
[root@master1 ~]
If something goes wrong, the error logs can be viewed with this command.
Create an nginx service to test DNS (nginx-deployment.yaml, nginx-service.yaml):
[root@master1 ~]
[root@master1 ~]
[root@master1 ~]
NAME READY STATUS RESTARTS AGE IP NODE
nginx-deployment-d8d99448f-rb57v 1/1 Running 0 11m 172.30.41.2 192.168.1.199
[root@master1 ~]
[root@master1 ~]
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
kubernetes ClusterIP 172.16.0.1 <none> 443/TCP 7d <none>
nginx-service NodePort 172.16.215.237 <none> 88:8527/TCP 2d app=nginx
[root@master1 ~]
[root@master1 ~]
Server: 172.16.0.2
Address 1: 172.16.0.2 kube-dns.kube-system.svc.cluster.local
Name: nginx-service
Address 1: 172.16.215.237 nginx-service.default.svc.cluster.local
[root@master1 ~]
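The command that produced the lookup above was lost; it is typically an nslookup run inside the busybox Pod, e.g.:
kubectl exec busybox -- nslookup nginx-service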
Deploy the heapster add-on
heapster release downloads: https://github.com/kubernetes/heapster/releases (heapster-v1.5.0.tar.gz):
[root@master1 ~]
[root@master1 ~]
[root@master1 ~]
[root@master1 influxdb]
grafana.yaml heapster.yaml influxdb.yaml
[root@master1 influxdb]
Configure RBAC: heapster-rbac.yaml needs no changes:
[root@master1 influxdb]
/root/heapster-1.5.0/deploy/kube-config/influxdb
[root@master1 influxdb]
If heapster-rbac.yaml is not applied, the following error appears:
E0518 06:08:09.927460 1 reflector.go:190] k8s.io/heapster/metrics/util/util.go:30: Failed to list *v1.Node: nodes is forbidden: User "system:serviceaccount:kube-system:heapster" cannot list nodes at the cluster scope
If a container stays in the ContainerCreating state and eventually reports the error below:
Warning FailedCreatePodSandBox 34s (x11 over 43s) kubelet, 192.168.1.204 Failed create pod sandbox.
Normal SandboxChanged 33s (x11 over 43s) kubelet, 192.168.1.204 Pod sandbox changed, it will be killed and re-created.
Running journalctl --since 01:02:00 -u kubelet shows that the registry.access.redhat.com/rhel7/pod-infrastructure:latest image is still being downloaded. The fix is that the kube-node is missing the registry.access.redhat.com/rhel7/pod-infrastructure:latest image:
docker pull registry.access.redhat.com/rhel7/pod-infrastructure:latest
Configure the influxdb deployment (influxdb.yaml):
[root@master1 influxdb]
image: lvanneo/heapster-influxdb-amd64:v1.1.1
type: NodePort
[root@master1 influxdb]
[root@master1 influxdb]
/root/heapster-1.5.0/deploy/kube-config/influxdb
[root@master1 influxdb]
View the influxdb service cluster address and port:
[root@master1 influxdb]
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
monitoring-influxdb NodePort 172.16.38.42 <none> 8086:8898/TCP 14s k8s-app=influxdb
[root@master1 influxdb]
Configure the heapster deployment (heapster.yaml):
[root@master1 influxdb]
image: lvanneo/heapster-amd64:v1.3.0-beta.1
- --source=kubernetes:https://192.168.1.195:6443
- --sink=influxdb:http://172.16.38.42:8086
[root@master1 influxdb]
[root@master1 influxdb]
/root/heapster-1.5.0/deploy/kube-config/influxdb
[root@master1 influxdb]
View the heapster cluster address and port:
[root@master1 influxdb]
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
heapster ClusterIP 172.16.202.225 <none> 80/TCP 12s k8s-app=heapster
[root@master1 influxdb]
Configure the grafana deployment (grafana.yaml); modify the following settings:
[root@master1 influxdb]
image: lvanneo/heapster-grafana-amd64:v4.0.2
value: /
type: NodePort
[root@master1 influxdb]
[root@master1 influxdb]
/root/heapster-1.5.0/deploy/kube-config/influxdb
[root@master1 influxdb]
[root@master1 influxdb]
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
monitoring-grafana NodePort 172.16.207.209 <none> 80:8642/TCP 31m k8s-app=grafana
Check the Deployments:
[root@master1 ~]
heapster 1 1 1 1 25m
monitoring-grafana 1 1 1 1 53s
monitoring-influxdb 1 1 1 1 28m
[root@master1 ~]
Check the Pods:
[root@master1 ~]
NAME READY STATUS RESTARTS AGE IP NODE
heapster-5fc8f648dc-pn4lm 1/1 Running 0 1h 172.30.57.4 192.168.1.198
monitoring-grafana-57b8fcd7b4-67r2j 1/1 Running 0 28m 172.30.57.5 192.168.1.198
monitoring-influxdb-68d87d45f5-5d7qs 1/1 Running 0 1h 172.30.41.3 192.168.1.199
[root@master1 ~]
Check the logs of each container one by one for error messages.
Access grafana at http://192.168.1.198:8642/ and configure the influxdb datasource.
Deploy the dashboard add-on
kubernetes-dashboard.yaml authenticates through RBAC (kind: ServiceAccount) and uses the credential files (/etc/kubernetes/bootstrap.kubeconfig and /etc/kubernetes/kube-proxy.kubeconfig) to access the API:
[root@master1 ~]
image: k8scn/kubernetes-dashboard-amd64:v1.8.0
- --apiserver-host=http://192.168.1.195:8080
- --heapster-host=http://172.16.202.225
type: NodePort
[root@master1 ~]
View the Node that kubernetes-dashboard was scheduled to:
[root@master1 ~]
NAME READY STATUS RESTARTS AGE IP NODE
kubernetes-dashboard-666fbbf977-v9vsh 1/1 Running 0 49s 172.30.41.4 192.168.1.199
View the NodePort allocated to kubernetes-dashboard:
[root@master1 ~]
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
kubernetes-dashboard NodePort 172.16.59.24 <none> 443:8847/TCP 39m k8s-app=kubernetes-dashboard
NodePort 8847 maps to port 80 of the dashboard pod.
Check the controller:
[root@master1 ~]
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
kubernetes-dashboard 1 1 1 1 15m
[root@master1 ~]
Get the list of cluster service addresses:
[root@master1 ~]
Kubernetes master is running at https://192.168.1.195:6443
Heapster is running at https://192.168.1.195:6443/api/v1/namespaces/kube-system/services/heapster/proxy
KubeDNS is running at https://192.168.1.195:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
monitoring-grafana is running at https://192.168.1.195:6443/api/v1/namespaces/kube-system/services/monitoring-grafana/proxy
monitoring-influxdb is running at https://192.168.1.195:6443/api/v1/namespaces/kube-system/services/monitoring-influxdb/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump' .
[root@master1 ~]
Access the dashboard
The kubernetes-dashboard service exposes a NodePort, so it can be reached at http://NodeIP:NodePort.
Open https://192.168.1.199:8847 to access the k8s dashboard. In testing this works in Firefox, but the 360 browser cannot open it and reports a 404.
Without the Heapster/influxdb add-ons, the dashboard cannot show CPU and memory metric graphs for Pods and Nodes.
Save images
Export the images (they are stored in the Baidu netdisk packages directory, password: nwzk):
docker save k8scn/kubernetes-dashboard-amd64:v1.8.0 > kubernetes-dashboard-amd64-v1.8.0.tar.gz
docker save registry.cn-hangzhou.aliyuncs.com/google_containers/k8s-dns-sidecar-amd64:1.14.7 > k8s-dns-sidecar-amd64-1.14.7.tar.gz
docker save registry.cn-hangzhou.aliyuncs.com/google_containers/k8s-dns-kube-dns-amd64:1.14.7 > k8s-dns-kube-dns-amd64-1.14.7.tar.gz
docker save registry.cn-hangzhou.aliyuncs.com/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.7 > k8s-dns-dnsmasq-nanny-amd64-1.14.7.tar.gz
docker save lvanneo/heapster-influxdb-amd64:v1.1.1 > heapster-influxdb-amd64-v1.1.1.tar.gz
docker save lvanneo/heapster-grafana-amd64:v4.0.2 > heapster-grafana-amd64-v4.0.2.tar.gz
docker save lvanneo/heapster-amd64:v1.3.0-beta.1 > heapster-amd64-v1.3.0-beta.1.tar.gz
Import an image:
docker load -i kubernetes-dashboard-amd64-v1.8.0.tar.gz
Reference: https://github.com/opsnull/follow-me-install-kubernetes-cluster
This article is from the "Jack Wang Blog": http://www.yfshare.vip/2018/02/23/部署TLS-k8s/