Kubernetes does not provide a load-balancer implementation for local environments (bare-metal, on-premise); Services of type LoadBalancer are natively supported mainly on the major public cloud providers. When a LoadBalancer Service is created in a local environment, its EXTERNAL-IP stays in the <pending> state, because there is no controller in the local environment to handle these LoadBalancer Services. For example:
```
[root@master1 vagrant]
NAME         TYPE           CLUSTER-IP    EXTERNAL-IP   PORT(S)        AGE
kubernetes   ClusterIP      10.32.0.1     <none>        443/TCP        31d
whoami       LoadBalancer   10.32.0.132   <pending>     80:31620/TCP   103s
```
 
An earlier article, <<基于LVS DR模式的Kubernetes Service External-IP实现>>, showed how to provide external load balancing by setting EXTERNAL-IP manually. In this article we implement a simple controller in the local environment that handles LoadBalancer Services and sets up the load balancer automatically. The architecture is sketched in the figure below.
The load balancer is a separate cluster that sits outside the Kubernetes cluster. Requests can be spread across the different load-balancer nodes via ECMP, and each load-balancer node then distributes the requests to the Kubernetes nodes.
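As a rough illustration of the ECMP step (not something configured in this article's single-node experiment), the upstream gateway could install one multipath route toward a VIP with one next hop per load-balancer node. The addresses below are assumptions; the snippet only drives iproute2 in the same os.system style the controller later uses:

```python
# Hedged sketch: program an ECMP route to a VIP on the upstream gateway.
# The VIP and load-balancer node addresses are illustrative assumptions.
import os

VIP = '10.240.0.210'
lb_node_ips = ['10.240.0.6', '10.240.0.7']  # hypothetical load-balancer nodes

# One route with multiple next hops; the kernel hashes flows across them (ECMP).
nexthops = ' '.join('nexthop via %s' % ip for ip in lb_node_ips)
os.system('ip route replace %s/32 %s' % (VIP, nexthops))
```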
The Service still uses the whoami definition from the previous article, with its type changed to LoadBalancer:
```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    name: whoami
  name: whoami
spec:
  ports:
  - port: 80
    name: web
    protocol: TCP
  selector:
    app: whoami
  type: LoadBalancer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: whoami
  labels:
    app: whoami
spec:
  replicas: 3
  selector:
    matchLabels:
      app: whoami
  template:
    metadata:
      labels:
        app: whoami
    spec:
      containers:
      - name: whoami
        image: containous/whoami:latest
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80
          name: web
```
 
Create the Service:
```
[root@master1 vagrant]
service/whoami created
deployment.apps/whoami created
```
 
Looking at the Services in the cluster now, the whoami Service's EXTERNAL-IP is in the <pending> state:
```
[root@master1 vagrant]
NAME         TYPE           CLUSTER-IP    EXTERNAL-IP   PORT(S)        AGE   SELECTOR
kubernetes   ClusterIP      10.32.0.1     <none>        443/TCP        31d   <none>
whoami       LoadBalancer   10.32.0.188   <pending>     80:31220/TCP   6s    app=whoami
```
 
Next we write a custom controller to handle the LoadBalancer Services. Writing a custom controller was covered in the earlier article <<Kubernetes CDR和Custom Controller>>. The source of main.py is as follows:
```python
import logging
import sys
import os

from kubernetes import client, config, watch

log = logging.getLogger(__name__)
out_hdlr = logging.StreamHandler(sys.stdout)
out_hdlr.setFormatter(logging.Formatter('%(asctime)s %(message)s'))
out_hdlr.setLevel(logging.INFO)
log.addHandler(out_hdlr)
log.setLevel(logging.INFO)

# Pool of VIPs that can be handed out to LoadBalancer services.
lb_ip_pools = [
    '10.240.0.210',
    '10.240.0.211',
    '10.240.0.212',
    '10.240.0.213',
    '10.240.0.214',
    '10.240.0.215'
]

# Kubernetes node IPs used as IPVS real servers.
node_ips = ['10.240.0.101', '10.240.0.102']

# lb_services: VIP -> {port: [protocols]} currently in use.
# kubernetes_services: service name -> list of (protocol, VIP, port) tuples.
lb_services = {}
kubernetes_services = {}


def add_services(svc_manifest):
    """Allocate a VIP for the service and record its (protocol, VIP, port) tuples."""
    ports = svc_manifest.spec.ports
    lb_svcs = []

    for ip in lb_ip_pools:
        # An unused VIP: take it and register all of the service's ports on it.
        if (ip not in lb_services) or (len(lb_services[ip]) == 0):
            lb_services[ip] = {}

            for port in ports:
                lb_svcs.append((port.protocol, ip, port.port))

                if port.port not in lb_services[ip]:
                    lb_services[ip][port.port] = []
                lb_services[ip][port.port].append(port.protocol)

            kubernetes_services[svc_manifest.metadata.name] = lb_svcs
            return lb_svcs

        # VIP already in use: it is still usable if none of the ports conflict.
        valid_ip = True
        for port in ports:
            if port.port in lb_services[ip]:
                valid_ip = False
                break

        if valid_ip:
            for port in ports:
                lb_svcs.append((port.protocol, ip, port.port))

                if port.port not in lb_services[ip]:
                    lb_services[ip][port.port] = []
                lb_services[ip][port.port].append(port.protocol)

            kubernetes_services[svc_manifest.metadata.name] = lb_svcs
            return lb_svcs

    # The VIP pool is exhausted.
    return None


def del_services(svc_manifest):
    """Release the VIP/port allocations that belong to the deleted service."""
    lb_svcs = kubernetes_services[svc_manifest.metadata.name]
    del kubernetes_services[svc_manifest.metadata.name]

    for svc in lb_svcs:
        del lb_services[svc[1]][svc[2]]
    return lb_svcs


def del_ipvs(lb_svcs):
    """Remove the IPVS virtual services for the given (protocol, VIP, port) tuples."""
    for item in lb_svcs:
        if item[0] == 'TCP':
            command = "ipvsadm -D -t %s:%d" % (item[1], item[2])
            os.system(command)
        elif item[0] == 'UDP':
            command = "ipvsadm -D -u %s:%d" % (item[1], item[2])
            os.system(command)


def add_ipvs(lb_svcs):
    """Create IPVS virtual services (round-robin, DR mode) with the nodes as real servers."""
    for item in lb_svcs:
        if item[0] == 'TCP':
            command = "ipvsadm -A -t %s:%d -s rr" % (item[1], item[2])
            os.system(command)
            for node_ip in node_ips:
                command = "ipvsadm -a -t %s:%d -r %s -g" % (item[1], item[2], node_ip)
                os.system(command)
        elif item[0] == 'UDP':
            command = "ipvsadm -A -u %s:%d -s rr" % (item[1], item[2])
            os.system(command)
            for node_ip in node_ips:
                command = "ipvsadm -a -u %s:%d -r %s -g" % (item[1], item[2], node_ip)
                os.system(command)
        else:
            log.error("invalid protocol: %s", item[0])


def main():
    config.load_kube_config()

    v1 = client.CoreV1Api()
    w = watch.Watch()
    # Watch Service events across all namespaces.
    for item in w.stream(v1.list_service_for_all_namespaces):
        if item["type"] == "ADDED":
            svc_manifest = item['object']
            namespace = svc_manifest.metadata.namespace
            name = svc_manifest.metadata.name
            svc_type = svc_manifest.spec.type

            log.info("Service ADDED: %s %s %s" % (namespace, name, svc_type))

            if svc_type == "LoadBalancer":
                # Only handle services that have not been assigned an ingress IP yet.
                if svc_manifest.status.load_balancer.ingress is None:
                    log.info("Process load balancer service add event")
                    lb_svcs = add_services(svc_manifest)
                    if lb_svcs is None:
                        log.error("no available loadbalancer IP")
                        continue
                    add_ipvs(lb_svcs)
                    # Report the allocated VIP back through the service status.
                    svc_manifest.status.load_balancer.ingress = [{'ip': lb_svcs[0][1]}]
                    v1.patch_namespaced_service_status(name, namespace, svc_manifest)
                    log.info("Update service status")

        elif item["type"] == "MODIFIED":
            log.info("Service MODIFIED: %s %s" % (item['object'].metadata.name, item['object'].spec.type))

        elif item["type"] == "DELETED":
            svc_manifest = item['object']
            namespace = svc_manifest.metadata.namespace
            name = svc_manifest.metadata.name
            svc_type = svc_manifest.spec.type

            log.info("Service DELETED: %s %s %s" % (namespace, name, svc_type))

            if svc_type == "LoadBalancer":
                if svc_manifest.status.load_balancer.ingress is not None:
                    log.info("Process load balancer service delete event")
                    lb_svcs = del_services(svc_manifest)
                    if len(lb_svcs) != 0:
                        del_ipvs(lb_svcs)


if __name__ == '__main__':
    main()
```
 
Our controller watches Service objects through the Kubernetes API server. When it detects that a Service of type LoadBalancer has been created, it allocates a VIP, creates the corresponding IPVS service on the node, and then updates the Service's status. As in the previous article, we use DR mode (ipvsadm's -g option) with round-robin scheduling to distribute packets to the backend nodes. A packet whose destination MAC address has been rewritten can still be processed correctly when it reaches a node, because kube-proxy creates the matching iptables rules from the updated Service object:
```
-A KUBE-SERVICES -d 10.240.0.210/32 -p tcp -m comment --comment "default/whoami:web loadbalancer IP" -m tcp --dport 80 -j KUBE-FW-225DYIB7Z2N6SCOU
```
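Coming back to the status update itself: main.py patches the whole Service object back. A minimal alternative sketch (assuming the same kubernetes Python client) would patch only the status subresource with a small dict body; the helper name set_ingress_ip below is hypothetical, not part of main.py:

```python
# Hedged sketch: patch only the LoadBalancer ingress IP on the Service status.
from kubernetes import client, config

def set_ingress_ip(name, namespace, vip):
    config.load_kube_config()
    v1 = client.CoreV1Api()
    body = {"status": {"loadBalancer": {"ingress": [{"ip": vip}]}}}
    return v1.patch_namespaced_service_status(name, namespace, body)

# Example: set_ingress_ip("whoami", "default", "10.240.0.210")
```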
 
Our test environment has a single IPVS node, and for simplicity the controller runs directly on that IPVS node, so there is no issue of the Service status being updated multiple times. In a multi-node IPVS cluster, the controller could create the IPVS services through deployment tools such as Puppet, SaltStack, or Ansible.
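One hedged way to picture the multi-node case is the controller pushing the same ipvsadm commands to every load-balancer node, for example over SSH; the article itself suggests Puppet, SaltStack, or Ansible instead. The node list and helper below are assumptions:

```python
# Hedged sketch: apply an ipvsadm command on every load-balancer node over SSH.
# lb_nodes and run_on_lb_nodes are illustrative, not part of main.py.
import subprocess

lb_nodes = ['10.240.0.6', '10.240.0.7']  # hypothetical load-balancer node addresses

def run_on_lb_nodes(command):
    for node in lb_nodes:
        # Assumes passwordless SSH as root to each load-balancer node.
        subprocess.run(['ssh', 'root@%s' % node, command], check=False)

# Example: run_on_lb_nodes("ipvsadm -A -t 10.240.0.210:80 -s rr")
```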
Install the kubernetes Python library the controller depends on. Since our Kubernetes version is 1.15.3, we install version 11.0.0 of the library; see the official documentation for the exact version compatibility matrix.
```
pip3 install kubernetes==11.0.0
```
 
Run our controller:
```
[root@lb1 ipvslb]
2021-11-21 04:18:25,018 Service ADDED: default kubernetes ClusterIP
2021-11-21 04:18:25,019 Service ADDED: default whoami LoadBalancer
2021-11-21 04:18:25,019 Process load balancer service add event
2021-11-21 04:18:25,064 Update service status
2021-11-21 04:18:25,066 Service MODIFIED: whoami LoadBalancer
```
 
Now look at the Services again:
```
[root@master1 vagrant]
NAME         TYPE           CLUSTER-IP    EXTERNAL-IP    PORT(S)        AGE
kubernetes   ClusterIP      10.32.0.1     <none>         443/TCP        31d
whoami       LoadBalancer   10.32.0.132   10.240.0.210   80:31620/TCP   2m40s
```
 
The whoami Service's EXTERNAL-IP has picked up the allocated VIP, 10.240.0.210. Looking at the IPVS services, the corresponding virtual service has been created as well:
```
[root@lb1 ipvslb]
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.240.0.210:80 rr
  -> 10.240.0.101:80              Route   1      0          0
  -> 10.240.0.102:80              Route   1      0          0
```
 
Accessing the VIP now succeeds:
```
[root@master1 vagrant]
Hostname: whoami-6756777fd4-4lpvl
IP: 127.0.0.1
IP: ::1
IP: 10.230.64.2
IP: fe80::ec48:94ff:fea0:db31
RemoteAddr: 10.230.10.0:45298
GET / HTTP/1.1
Host: 10.240.0.210
User-Agent: curl/7.29.0
Accept: */*

[root@master1 vagrant]
Hostname: whoami-6756777fd4-qzm6r
IP: 127.0.0.1
IP: ::1
IP: 10.230.10.19
IP: fe80::60b7:2aff:feab:68c2
RemoteAddr: 10.230.64.0:45304
GET / HTTP/1.1
Host: 10.240.0.210
User-Agent: curl/7.29.0
Accept: */*
```
 
Next, delete the whoami Service:
```
[root@master1 vagrant]
service "whoami" deleted
```
 
The controller output shows that the event was handled correctly:
```
2021-11-21 04:27:34,046 Service DELETED: default whoami LoadBalancer
2021-11-21 04:27:34,047 Process load balancer service delete event
```
 
Checking the IPVS services, they have been removed correctly as well:
```
[root@lb1 ipvslb]
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
```
 
In our example, the IPs in the VIP pool have to be configured on the IPVS node ahead of time, for example:
```
[root@lb1 ipvslb]
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:48:90:6c brd ff:ff:ff:ff:ff:ff
    inet 10.240.0.6/24 scope global eth1
       valid_lft forever preferred_lft forever
    inet 10.240.0.201/32 scope global eth1
       valid_lft forever preferred_lft forever
    inet 10.240.0.210/32 scope global eth1
       valid_lft forever preferred_lft forever
    inet 10.240.0.211/32 scope global eth1
       valid_lft forever preferred_lft forever
    inet 10.240.0.212/32 scope global eth1
       valid_lft forever preferred_lft forever
    inet 10.240.0.213/32 scope global eth1
       valid_lft forever preferred_lft forever
    inet 10.240.0.214/32 scope global eth1
       valid_lft forever preferred_lft forever
    inet 10.240.0.215/32 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe48:906c/64 scope link
       valid_lft forever preferred_lft forever
```
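If you want to automate this step rather than adding the addresses by hand, a minimal sketch (assuming the VIPs live on eth1, as in the output above) could drive iproute2 the same way main.py drives ipvsadm:

```python
# Hedged sketch: make sure every VIP from the pool exists as a /32 on the LB interface.
# LB_DEVICE and ensure_vips are illustrative; in the article this was done manually.
import os

LB_DEVICE = 'eth1'

def ensure_vips(vips):
    for vip in vips:
        # 'ip addr replace' is idempotent: it adds the address if it is missing.
        os.system("ip addr replace %s/32 dev %s" % (vip, LB_DEVICE))

if __name__ == '__main__':
    ensure_vips(['10.240.0.210', '10.240.0.211', '10.240.0.212',
                 '10.240.0.213', '10.240.0.214', '10.240.0.215'])
```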
 
The node IPs are also hard-coded in the controller; normally they should be retrieved from the Kubernetes API server. The example keeps this simple.
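A hedged sketch of what that could look like with the same kubernetes client, collecting each node's InternalIP (get_node_ips is a hypothetical helper, not part of main.py):

```python
# Hedged sketch: discover node IPs from the API server instead of hard-coding them.
from kubernetes import client, config

def get_node_ips():
    config.load_kube_config()
    v1 = client.CoreV1Api()
    ips = []
    for node in v1.list_node().items:
        for addr in node.status.addresses:
            if addr.type == 'InternalIP':
                ips.append(addr.address)
    return ips

# node_ips = get_node_ips()  # would replace the hard-coded list in main.py
```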
MetalLB is currently the most widely used LoadBalancer implementation for local environments. It does not rely on an external, stand-alone load-balancer cluster; the load balancing is implemented on the nodes themselves. Many Kubernetes products already integrate it, and its source code may be worth analyzing in a later article.