Load balanced Kubernetes Ingress. So metal.

Kubernetes has some incredible features, one of them being Ingress. Ingress can be described as a way to give external access to a Kubernetes-run service, typically over HTTP(S). This is useful when you run webapps (Grafana, Binder) in your Kubernetes cluster that need to be accessed by users across your network.

Typically, Ingress integrates with automation provided by public cloud providers like GCP/GKE, AWS, Azure, Digital Ocean, etc., where the external IP and routing are done for you. I’ve found bare-metal Ingress configuration examples on the web to be hand-wavy at best. So what do you do when there are so many standards and you’re not sure which one to pick? You make your own. Below is how I configured bare-metal Ingress on my CoreOS-based Kubernetes cluster to access Grafana.

Overview:

Starting from the outside in, I spun up another VM (Debian) on my network and ran HAProxy on it; it is the only non-Kubernetes piece of the solution. This HAProxy instance load balances incoming HTTP (port 80) and HTTPS (port 443) connections to the nginx-ingress-controller pods running on the controller nodes. It is the only single point of failure in the whole chain, and one I could improve by running multiple VMs with identical HAProxy configurations sharing a single virtual IP (VIP). An optimization for another time.

Ports 80 and 443 load balance to ports 30080 and 30443 respectively on one of the three Kubernetes controller nodes. Why do this and not just open up ports 80 and 443 on the controller nodes themselves? NodePort services are allocated from a high port range (30000-32767 by default), so the nodes do not listen on low-number ports (<1024) for them. I’d also rather rely on HAProxy’s proven track record of load balancing to distribute connections than on the kubelet/kube-proxy. And given the automatic-update nature of CoreOS, if one controller node restarts for an update, the other two nodes can still route Ingress requests.

proxy1% cat /etc/haproxy/haproxy.cfg
defaults
 log global
 mode tcp
 option tcplog

listen stats
 mode http
 bind 10.10.2.14:8081
 stats enable
 stats hide-version
 stats refresh 30s
 stats show-node
 stats auth haproxyadmin:haproxyadmin
 stats uri /haproxy_stats
 
frontend http-in
 bind 10.10.2.120:80
 default_backend http-out
 
backend http-out
 server corea-controller0 10.10.0.125:30080 check
 server corea-controller1 10.10.0.126:30080 check
 server corea-controller2 10.10.0.127:30080 check

frontend https-in
 bind 10.10.2.120:443
 default_backend https-out
 
backend https-out
 server corea-controller0 10.10.0.125:30443
 server corea-controller1 10.10.0.126:30443
 server corea-controller2 10.10.0.127:30443

frontend kubernetes-api-in
 bind 10.10.2.119:443
 default_backend kubernetes-api-out
 
backend kubernetes-api-out
 server corea-controller0 10.10.0.125:6443
 server corea-controller1 10.10.0.126:6443
 server corea-controller2 10.10.0.127:6443
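
Before pointing HAProxy at the controllers, it can help to confirm that each NodePort actually accepts TCP connections. The following is a minimal Python sketch, not part of my setup: the helper names `port_is_open` and `check_backends` are made up for illustration, and the IPs and ports are simply copied from the backends above.

```python
import socket

# Controller IPs and NodePorts as they appear in the haproxy backends above.
CONTROLLERS = ["10.10.0.125", "10.10.0.126", "10.10.0.127"]
NODE_PORTS = {"http": 30080, "https": 30443}


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def check_backends(hosts, ports, timeout=2.0):
    """Map (host, port name) -> reachability for every host/port combination."""
    return {
        (host, name): port_is_open(host, port, timeout)
        for host in hosts
        for name, port in ports.items()
    }


# Usage, from a machine that can reach the controllers (e.g. the proxy VM):
#   for (host, name), ok in check_backends(CONTROLLERS, NODE_PORTS).items():
#       print(host, name, "open" if ok else "unreachable")
```

This only tests that something accepts the connection; it says nothing about whether the ingress controller behind the port is healthy.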

HTTP requests for Grafana have now hit port 30080 on one of 10.10.0.{125,126,127}, where the ingress-nginx-service listens. This Service routes those requests to port 80 (the targetPort) of pods with the label ‘app: nginx-ingress-controller’. ‘Port’ in the Service specification is the port the Service exposes on its cluster IP for in-cluster clients. For HTTPS, substitute 30443 and 443 for 30080 and 80.

% kubectl get service ingress-nginx-service -o wide -n ingress-nginx
NAME                    TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)                                      AGE   SELECTOR
ingress-nginx-service   NodePort   10.122.81.28   <none>        80:30080/TCP,18080:30081/TCP,443:30443/TCP   6d    app=nginx-ingress-controller

% kubectl describe service ingress-nginx-service -n ingress-nginx
Name: ingress-nginx-service
Namespace: ingress-nginx
Labels: service=ingress-nginx
Annotations: kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"service":"ingress-nginx"},"name":"ingress-nginx-service","namespace":"ingre...
Selector: app=nginx-ingress-controller
Type: NodePort
IP: 10.122.81.28
Port: http 80/TCP
TargetPort: 80/TCP
NodePort: http 30080/TCP
Endpoints: 10.244.0.10:80,10.244.1.10:80,10.244.2.10:80
Port: nginxstatus 18080/TCP
TargetPort: 18080/TCP
NodePort: nginxstatus 30081/TCP
Endpoints: 10.244.0.10:18080,10.244.1.10:18080,10.244.2.10:18080
Port: https 443/TCP
TargetPort: 443/TCP
NodePort: https 30443/TCP
Endpoints: 10.244.0.10:443,10.244.1.10:443,10.244.2.10:443
Session Affinity: None
External Traffic Policy: Cluster
Events: <none>
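
For readers who want to recreate a similar Service, here is a hedged reconstruction of the manifest from the `kubectl describe` output above. It is a sketch rather than the exact file I applied: every field value comes from that output, but the function name `nodeport_service_manifest` is invented for this example.

```python
import json


def nodeport_service_manifest():
    """Build a Service manifest matching the describe output shown above."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": "ingress-nginx-service",
            "namespace": "ingress-nginx",
            "labels": {"service": "ingress-nginx"},
        },
        "spec": {
            "type": "NodePort",
            # Traffic is forwarded to pods carrying this label.
            "selector": {"app": "nginx-ingress-controller"},
            "ports": [
                # nodePort: opened on every node; port: the Service's
                # cluster-internal port; targetPort: the port on the pod.
                {"name": "http", "protocol": "TCP",
                 "port": 80, "targetPort": 80, "nodePort": 30080},
                {"name": "nginxstatus", "protocol": "TCP",
                 "port": 18080, "targetPort": 18080, "nodePort": 30081},
                {"name": "https", "protocol": "TCP",
                 "port": 443, "targetPort": 443, "nodePort": 30443},
            ],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML manifests.
    print(json.dumps(nodeport_service_manifest(), indent=2))
```

Saved under a name of your choosing (say, `service.py`), the output can be piped into `kubectl apply -f -`.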

At this point requests are in the Pod network and will stay there (not using externally-accessible IPs and ports) for the remainder of the HTTP request. One of the nginx-ingress-controller pods will now consult its Ingress configuration for Grafana, using the FQDN configured for this service (matched against the request’s Host header) to select where to send the HTTP request.

% kubectl get ing -o wide --all-namespaces
NAMESPACE   NAME      HOSTS                           ADDRESS   PORTS   AGE
default     grafana   grafana.obfuscated.domain.net             80      3d


% kubectl describe ing grafana -n default
Name: grafana
Namespace: default
Address:
Default backend: default-http-backend:80 (<none>)
Rules:
  Host                           Path  Backends
  ----                           ----  --------
  grafana.obfuscated.domain.net
                                 /     grafana-service:3000 (<none>)
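
Because the ingress controller chooses the backend by host name, the rule can be exercised without touching DNS by sending the Host header explicitly to the load balancer address. Below is a small Python sketch of that idea; `probe_ingress` is a hypothetical helper name, and the address in the usage comment is the HAProxy frontend from earlier in this post.

```python
import http.client


def probe_ingress(lb_host, lb_port, virtual_host, path="/", timeout=5.0):
    """Send a GET to lb_host:lb_port with an explicit Host header.

    Returns the HTTP status code of the response.
    """
    conn = http.client.HTTPConnection(lb_host, lb_port, timeout=timeout)
    try:
        # Supplying "Host" here replaces the header http.client would
        # otherwise derive from lb_host.
        conn.request("GET", path, headers={"Host": virtual_host})
        return conn.getresponse().status
    finally:
        conn.close()


# Usage, from a machine on the same network as the load balancer:
#   status = probe_ingress("10.10.2.120", 80, "grafana.obfuscated.domain.net")
#   print(status)
```

A request whose host name matches no rule should instead land on the default backend listed in the describe output.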

Kubernetes will now route requests to grafana-service, destined for port 3000 on that service’s pods. This is where the beauty of cloud infrastructure pays off. I can reboot the worker and controller nodes to my heart’s content, and requests will continue to be serviced. You do have more than one replica of your service and ingress, right?

% kubectl describe service grafana-service -n default
 Name: grafana-service
 Namespace: default
 Labels: <none>
 Annotations: kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"grafana-service","namespace":"default"},"spec":{"ports":[{"port":80,"protocol"...
 Selector: app=grafana
 Type: ClusterIP
 IP: 10.122.184.150
 Port: <unset> 80/TCP
 TargetPort: 3000/TCP
 Endpoints: 10.244.4.8:3000,10.244.5.15:3000
 Session Affinity: None
 Events: <none>

% kubectl get pod -l app=grafana -n default -o wide
 NAME                       READY   STATUS    RESTARTS   AGE   IP            NODE
 grafana-667fc96676-rcxdh   1/1     Running   0          3d    10.244.4.8    corea-worker1
 grafana-667fc96676-tjpkg   1/1     Running   0          3d    10.244.5.15   corea-worker2

Warts:
The ingress-nginx service is exposed on all Kubernetes nodes (workers and controllers), which I don’t love, since I was trying to reduce the surface through which requests can enter the Kubernetes cluster. This surface could be reduced by firewalling off those nodes/ports from the rest of the network. But this is just how Kubernetes works: a service exposed via NodePort listens on all cluster hosts (so kube-proxy can route you there).

Some apps don’t play nicely with the round-robin distribution of HTTP(S) requests within the ingress if you are not using any sort of session/cookie stickiness.
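
To make the difference concrete, here is a conceptual Python sketch contrasting round-robin selection with a simple hash-based sticky selection. It is not how ingress-nginx or HAProxy implement affinity; the function names and the session key are invented for illustration, and the backend addresses are the Grafana pod endpoints shown earlier.

```python
import hashlib
from itertools import cycle

# Grafana pod endpoints from the service description above.
BACKENDS = ["10.244.4.8:3000", "10.244.5.15:3000"]


def round_robin(backends):
    """Yield backends in rotation, ignoring who is asking."""
    return cycle(backends)


def sticky_backend(session_key, backends):
    """Pick a backend deterministically from a session key (e.g. a cookie value)."""
    digest = hashlib.sha256(session_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


if __name__ == "__main__":
    rr = round_robin(BACKENDS)
    # Consecutive requests from one user alternate between pods...
    print([next(rr) for _ in range(4)])
    # ...while the sticky choice for a given key never changes.
    print({sticky_backend("user-42", BACKENDS) for _ in range(4)})
```

With round-robin, an app that keeps login state in pod-local memory sees the same user arrive at a pod that has never heard of them; a sticky choice keeps each session on one pod as long as the backend list does not change.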