Implementing Kubernetes LoadBalancer Services in a Local Environment

Kubernetes does not ship a LoadBalancer implementation for local environments (bare-metal, on-premise); Services of type LoadBalancer are natively supported mainly on the major public clouds. When a LoadBalancer Service is created in a local environment, its EXTERNAL-IP stays in the <pending> state, because no controller exists in the local environment to handle these LoadBalancer Services. For example:

[root@master1 vagrant]# kubectl get svc
NAME         TYPE           CLUSTER-IP    EXTERNAL-IP   PORT(S)        AGE
kubernetes   ClusterIP      10.32.0.1     <none>        443/TCP        31d
whoami       LoadBalancer   10.32.0.132   <pending>     80:31620/TCP   103s

A previous article, <<基于LVS DR模式的Kubernetes Service External-IP实现>> (Kubernetes Service External-IP with LVS DR mode), showed how to implement external load balancing by setting the EXTERNAL-IP manually. In this article we implement a simple controller for the local environment that handles LoadBalancer Services, so that a load balancer is provisioned automatically. The architecture is shown in the figure below:

The load balancer is an independent cluster that sits outside the Kubernetes cluster. Requests can be spread across the different LoadBalancer nodes with ECMP, and the LoadBalancer nodes then distribute the requests to the Kubernetes nodes.

The service reuses the whoami definition from the previous article, with type changed to LoadBalancer:

apiVersion: v1
kind: Service
metadata:
  labels:
    name: whoami
  name: whoami
spec:
  ports:
  - port: 80
    name: web
    protocol: TCP
  selector:
    app: whoami
  type: LoadBalancer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: whoami
  labels:
    app: whoami
spec:
  replicas: 3
  selector:
    matchLabels:
      app: whoami
  template:
    metadata:
      labels:
        app: whoami
    spec:
      containers:
      - name: whoami
        image: containous/whoami:latest
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80
          name: web

Create the service:

[root@master1 vagrant]# kubectl apply -f whoami.yaml
service/whoami created
deployment.apps/whoami created

Looking at the Services in the cluster now, the whoami service's EXTERNAL-IP is in the <pending> state:

[root@master1 vagrant]# kubectl get svc -o wide
NAME         TYPE           CLUSTER-IP    EXTERNAL-IP   PORT(S)        AGE   SELECTOR
kubernetes   ClusterIP      10.32.0.1     <none>        443/TCP        31d   <none>
whoami       LoadBalancer   10.32.0.188   <pending>     80:31220/TCP   6s    app=whoami

Next we write a custom controller to handle the LoadBalancer Services. For how to write a custom controller, see the earlier article <<Kubernetes CRD和Custom Controller>>. The source of main.py is as follows:

# coding: utf-8

import logging
import sys
import os
from kubernetes import client, config, watch

log = logging.getLogger(__name__)
out_hdlr = logging.StreamHandler(sys.stdout)
out_hdlr.setFormatter(logging.Formatter('%(asctime)s %(message)s'))
out_hdlr.setLevel(logging.INFO)
log.addHandler(out_hdlr)
log.setLevel(logging.INFO)


# VIP pool; these addresses must already be configured on the IPVS node.
lb_ip_pools = [
    '10.240.0.210',
    '10.240.0.211',
    '10.240.0.212',
    '10.240.0.213',
    '10.240.0.214',
    '10.240.0.215'
]

# Kubernetes node IPs used as IPVS real servers.
node_ips = ['10.240.0.101', '10.240.0.102']


# VIP -> {port: [protocols]} currently allocated on the load balancer.
lb_services = {}
# Service name -> list of (protocol, VIP, port) tuples allocated for it.
kubernetes_services = {}


def add_services(svc_manifest):
    """Allocate a VIP for the service; a VIP is reused when ports do not conflict."""
    ports = svc_manifest.spec.ports
    lb_svcs = []

    for ip in lb_ip_pools:
        if (ip not in lb_services) or (len(lb_services[ip]) == 0):
            # The VIP is unused: take it for all ports of this service.
            lb_services[ip] = {}

            for port in ports:
                lb_svcs.append((port.protocol, ip, port.port))

                if port.port not in lb_services[ip]:
                    lb_services[ip][port.port] = []
                lb_services[ip][port.port].append(port.protocol)

            kubernetes_services[svc_manifest.metadata.name] = lb_svcs
            return lb_svcs

        # The VIP is in use: it can still be shared if none of the ports conflict.
        valid_ip = True
        for port in ports:
            if port.port in lb_services[ip]:
                valid_ip = False
                break

        if valid_ip:
            for port in ports:
                lb_svcs.append((port.protocol, ip, port.port))

                if port.port not in lb_services[ip]:
                    lb_services[ip][port.port] = []
                lb_services[ip][port.port].append(port.protocol)

            kubernetes_services[svc_manifest.metadata.name] = lb_svcs
            return lb_svcs

    return None


def del_services(svc_manifest):
    """Release the VIP/port allocations held by the service."""
    lb_svcs = kubernetes_services[svc_manifest.metadata.name]
    del kubernetes_services[svc_manifest.metadata.name]

    for svc in lb_svcs:
        del lb_services[svc[1]][svc[2]]
    return lb_svcs


def del_ipvs(lb_svcs):
    """Remove the IPVS virtual services."""
    for item in lb_svcs:
        if item[0] == 'TCP':
            command = "ipvsadm -D -t %s:%d" % (item[1], item[2])
            os.system(command)
        elif item[0] == 'UDP':
            command = "ipvsadm -D -u %s:%d" % (item[1], item[2])
            os.system(command)


def add_ipvs(lb_svcs):
    """Create the IPVS virtual services (rr scheduler, DR forwarding) and add the real servers."""
    for item in lb_svcs:
        if item[0] == 'TCP':
            command = "ipvsadm -A -t %s:%d -s rr" % (item[1], item[2])
            os.system(command)
            for node_ip in node_ips:
                command = "ipvsadm -a -t %s:%d -r %s -g" % (item[1], item[2], node_ip)
                os.system(command)
        elif item[0] == 'UDP':
            command = "ipvsadm -A -u %s:%d -s rr" % (item[1], item[2])
            os.system(command)
            for node_ip in node_ips:
                command = "ipvsadm -a -u %s:%d -r %s -g" % (item[1], item[2], node_ip)
                os.system(command)
        else:
            log.error("invalid protocol: %s", item[0])


def main():
    config.load_kube_config()

    v1 = client.CoreV1Api()
    w = watch.Watch()
    # Watch Service objects in all namespaces and react to the events.
    for item in w.stream(v1.list_service_for_all_namespaces):
        if item["type"] == "ADDED":
            svc_manifest = item['object']
            namespace = svc_manifest.metadata.namespace
            name = svc_manifest.metadata.name
            svc_type = svc_manifest.spec.type

            log.info("Service ADDED: %s %s %s" % (namespace, name, svc_type))

            if svc_type == "LoadBalancer":
                if svc_manifest.status.load_balancer.ingress == None:
                    log.info("Process load balancer service add event")
                    lb_svcs = add_services(svc_manifest)
                    if lb_svcs == None:
                        log.error("no available loadbalancer IP")
                        continue
                    add_ipvs(lb_svcs)
                    # Report the allocated VIP back in the service status.
                    svc_manifest.status.load_balancer.ingress = [{'ip': lb_svcs[0][1]}]
                    v1.patch_namespaced_service_status(name, namespace, svc_manifest)
                    log.info("Update service status")

        elif item["type"] == "MODIFIED":
            log.info("Service MODIFIED: %s %s" % (item['object'].metadata.name, item['object'].spec.type))

        elif item["type"] == "DELETED":
            svc_manifest = item['object']
            namespace = svc_manifest.metadata.namespace
            name = svc_manifest.metadata.name
            svc_type = svc_manifest.spec.type

            log.info("Service DELETED: %s %s %s" % (namespace, name, svc_type))

            if svc_type == "LoadBalancer":
                if svc_manifest.status.load_balancer.ingress != None:
                    log.info("Process load balancer service delete event")
                    lb_svcs = del_services(svc_manifest)
                    if len(lb_svcs) != 0:
                        del_ipvs(lb_svcs)


if __name__ == '__main__':
    main()

Our controller watches Service objects through the Kubernetes APIServer. When it detects that a Service of type LoadBalancer has been created, it allocates a VIP, creates the IPVS service on the node, and then updates the Service's status. IPVS still runs in DR mode with a round-robin (rr) scheduler, distributing packets to the backend nodes. The packets, whose destination MAC address has been rewritten, can be handled correctly when they reach a node because kube-proxy creates the corresponding iptables rules from the updated Service object:

-A KUBE-SERVICES -d 10.240.0.210/32 -p tcp -m comment --comment "default/whoami:web loadbalancer IP" -m tcp --dport 80 -j KUBE-FW-225DYIB7Z2N6SCOU

Our test environment has only a single IPVS node, and for simplicity the controller runs directly on that IPVS node, so the Service status does not end up being modified multiple times. In a multi-node IPVS cluster, the controller could instead create the IPVS services through deployment tools such as Puppet, SaltStack, or Ansible (a simpler SSH-based sketch of the same idea follows).
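
The following is only a minimal, hypothetical sketch of that idea: it replaces the controller's local os.system() calls with SSH invocations to every load-balancer node. The lb_nodes list and the run_on_lb_nodes() helper are assumptions (passwordless root SSH from the controller host); a real deployment would more likely use the configuration-management tools named above.

# Hypothetical helper: execute each ipvsadm command on every LB node over SSH
# instead of only on the local machine. Assumes passwordless root SSH.
import subprocess

lb_nodes = ['10.240.0.6', '10.240.0.7']   # hypothetical LB node addresses

def run_on_lb_nodes(command):
    # command is an ipvsadm invocation, e.g. "ipvsadm -A -t 10.240.0.210:80 -s rr"
    for node in lb_nodes:
        subprocess.run(['ssh', 'root@%s' % node, command], check=False)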

Install the kubernetes client library that the controller depends on, on the IPVS node. Since our Kubernetes version is 1.15.3, we install version 11.0.0 of the client; see the official documentation for the exact version compatibility.

pip3 install kubernetes==11.0.0

Run the controller:

[root@lb1 ipvslb]# python3 main.py
2021-11-21 04:18:25,018 Service ADDED: default kubernetes ClusterIP
2021-11-21 04:18:25,019 Service ADDED: default whoami LoadBalancer
2021-11-21 04:18:25,019 Process load balancer service add event
2021-11-21 04:18:25,064 Update service status
2021-11-21 04:18:25,066 Service MODIFIED: whoami LoadBalancer


Now look at the Service again:

[root@master1 vagrant]# kubectl get svc
NAME         TYPE           CLUSTER-IP    EXTERNAL-IP    PORT(S)        AGE
kubernetes   ClusterIP      10.32.0.1     <none>         443/TCP        31d
whoami       LoadBalancer   10.32.0.132   10.240.0.210   80:31620/TCP   2m40s

The whoami service's EXTERNAL-IP now shows the allocated VIP, 10.240.0.210. Looking at the IPVS services, the corresponding virtual service has been created as well:

[root@lb1 ipvslb]# ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.240.0.210:80 rr
  -> 10.240.0.101:80              Route   1      0          0
  -> 10.240.0.102:80              Route   1      0          0

Accessing the VIP now succeeds:

[root@master1 vagrant]# curl http://10.240.0.210
Hostname: whoami-6756777fd4-4lpvl
IP: 127.0.0.1
IP: ::1
IP: 10.230.64.2
IP: fe80::ec48:94ff:fea0:db31
RemoteAddr: 10.230.10.0:45298
GET / HTTP/1.1
Host: 10.240.0.210
User-Agent: curl/7.29.0
Accept: */*

[root@master1 vagrant]# curl http://10.240.0.210
Hostname: whoami-6756777fd4-qzm6r
IP: 127.0.0.1
IP: ::1
IP: 10.230.10.19
IP: fe80::60b7:2aff:feab:68c2
RemoteAddr: 10.230.64.0:45304
GET / HTTP/1.1
Host: 10.240.0.210
User-Agent: curl/7.29.0
Accept: */*

Next, delete the whoami service:

[root@master1 vagrant]# kubectl delete svc/whoami
service "whoami" deleted

The controller output shows that the event is handled correctly:

2021-11-21 04:27:34,046 Service DELETED: default whoami LoadBalancer
2021-11-21 04:27:34,047 Process load balancer service delete event

Looking at the IPVS services, they have been removed correctly as well:

[root@lb1 ipvslb]# ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn

In our example, the IPs in the VIP pool need to be configured on the IPVS node in advance, for example (a small scripted helper for this step is sketched after the output):

[root@lb1 ipvslb]# ip a show eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:48:90:6c brd ff:ff:ff:ff:ff:ff
    inet 10.240.0.6/24 scope global eth1
       valid_lft forever preferred_lft forever
    inet 10.240.0.201/32 scope global eth1
       valid_lft forever preferred_lft forever
    inet 10.240.0.210/32 scope global eth1
       valid_lft forever preferred_lft forever
    inet 10.240.0.211/32 scope global eth1
       valid_lft forever preferred_lft forever
    inet 10.240.0.212/32 scope global eth1
       valid_lft forever preferred_lft forever
    inet 10.240.0.213/32 scope global eth1
       valid_lft forever preferred_lft forever
    inet 10.240.0.214/32 scope global eth1
       valid_lft forever preferred_lft forever
    inet 10.240.0.215/32 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe48:906c/64 scope link
       valid_lft forever preferred_lft forever
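
If you prefer to script this prerequisite, the following minimal sketch reuses the lb_ip_pools list from main.py and adds each VIP to eth1 before the controller starts. The interface name is taken from the output above, and "ip addr add" only reports "RTNETLINK answers: File exists" for addresses that are already configured.

# Minimal sketch: ensure every VIP from the pool is present on eth1.
# Assumes the same lb_ip_pools list as main.py and interface eth1.
import os

lb_ip_pools = ['10.240.0.210', '10.240.0.211', '10.240.0.212',
               '10.240.0.213', '10.240.0.214', '10.240.0.215']

for vip in lb_ip_pools:
    # Harmless if the address already exists ("RTNETLINK answers: File exists").
    os.system("ip addr add %s/32 dev eth1" % vip)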

The node IPs are also configured ahead of time in the code; normally they should be fetched from the Kubernetes APIServer. We kept it simple for this example.
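
For reference, here is a minimal sketch of discovering the node addresses from the APIServer with the same kubernetes client library; picking the InternalIP address type is an assumption about what should serve as the IPVS real-server addresses:

# Minimal sketch: discover node IPs from the APIServer instead of hard-coding them.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

node_ips = [
    addr.address
    for node in v1.list_node().items
    for addr in node.status.addresses
    if addr.type == 'InternalIP'
]
print(node_ips)   # e.g. ['10.240.0.101', '10.240.0.102']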

MetalLB is the most widely used LoadBalancer implementation for local environments today. It does not use an external, standalone load-balancer cluster; instead the load-balancing function is implemented on the nodes themselves. Many Kubernetes distributions already integrate it; I may analyze its source code in a future article.

References: