Kubernetes, the slow way.

It all started when I began hearing about this container thing outside of work. I’ve been a Google SRE for going on six years, but I knew that the way we run containers internally on Borg is probably not how the rest of the world builds reliable, scalable infrastructure. I was curious: how hard could it be to spin up a few containers and play around like I do at work?

Little did I know it would take two months, at a few hours a night, a few nights a week, to get to the point where I could access a web service inside my homegrown Kubernetes cluster. Below are the high-level steps, scripts, and notes I kept during the process.

Step One: Build the CoreOS cluster.

Using my virtbuilder script, I built a five-node CoreOS VM cluster on top of an Ubuntu host. I wanted enough VMs to have quorum, with enough left over to restart needlessly and watch pods migrate from host to host.

$ ./vmbuilder.py create_vm --bridge_interface vlan12 --domain_name foo.local.net --disk_pool_name vm-store --vm_type coreos --host_name coreA --cluster_size 5 --coreos_create_cluster --debug --ip_address 10.10.2.121 --nameserver 10.10.2.1 --gateway 10.10.2.1 --netmask 255.255.255.0 --memory 2048

Lots of VM builds happen.

core@coreD1 ~ $ etcdctl cluster-health
 member abc1234 is healthy: got healthy result from http://10.10.2.124:2379
 member abc1235 is healthy: got healthy result from http://10.10.2.122:2379
 member abc1236 is healthy: got healthy result from http://10.10.2.121:2379
 member abc1237 is healthy: got healthy result from http://10.10.2.125:2379
 member abc1238 is healthy: got healthy result from http://10.10.2.123:2379
 cluster is healthy

Having a healthy etcd cluster is a prerequisite to building Kubernetes.
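
As a rough illustration of why five nodes leaves room for needless restarts: etcd needs a majority of its members to be reachable, so the arithmetic can be sketched as below. The function names are my own, made up for this note, and are not part of etcd or etcdctl.

```python
def quorum_size(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """How many members can be down while a majority is still reachable."""
    return members - quorum_size(members)


if __name__ == "__main__":
    # Print members, quorum size, and tolerated failures for a few cluster sizes.
    for n in (1, 3, 5):
        print(n, quorum_size(n), failures_tolerated(n))
```

With five members the quorum is three, so two VMs can be down at the same time before the etcd cluster stops accepting writes.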

Step Two: Install Kubernetes

There are a ton of guides online explaining how to deploy a Kubernetes cluster on AWS or GCE, but not many for bare metal. The ones I found for bare metal used Vagrant (which felt too turnkey) or minikube (what good is a single node?) to marshal the VMs. Given that I already had my own custom way to deploy VMs on a host machine, I had to splice in my own workflow.

I wanted to run on CoreOS given its tight integration with Docker and containers, and based most of my installation workflow on CoreOS’s Kubernetes documentation.

After performing many manual installs of Kubernetes on my CoreOS cluster, I wrote rudimentary shell scripts to make it a bit easier. The repository of my scripts is at github.com/jforman/kubernetes (better documentation is forthcoming).

These scripts create the necessary certificates, systemd unit files, and Kubernetes manifests. The final step is to deploy them to both the master and the workers. It’s possible to wrap this in Ansible, but over-engineering my first rollout in another framework felt like premature optimization before I really ‘knew’ the install.

core@coreD1 ~ $ ./kubectl cluster-info
Kubernetes master is running at https://10.10.2.121

core@coreD1 ~ $ ./kubectl get pods --namespace=kube-system
NAME                                READY STATUS  RESTARTS AGE
kube-apiserver-10.10.2.121          1/1   Running 45       41d
kube-controller-manager-10.10.2.121 1/1   Running 12       41d
kube-proxy-10.10.2.121              1/1   Running 7        41d
kube-proxy-10.10.2.122              1/1   Running 9        41d
kube-proxy-10.10.2.123              1/1   Running 9        41d
kube-proxy-10.10.2.124              1/1   Running 9        41d
kube-proxy-10.10.2.125              1/1   Running 10       41d
kube-scheduler-10.10.2.121          1/1   Running 12       41d

Step Three: Configure Addons

Kubernetes addons make the whole system a lot more usable and, in my opinion, perhaps functional at all. The Dashboard provides a UI for both viewing and changing the state of the cluster. I found it invaluable in getting a sense of how the concepts of Kubernetes connect (nodes to pods to replica sets to deployments).

Step Four: Set up Ingress

This is the step in the process where the light bulb of Kubernetes went off over my head. What is Ingress? It is a way to route external requests to services running on the Kubernetes cluster. It watches services moving around the Kubernetes cluster, and directs traffic to them internally based upon external requests. This bit of the infrastructure is what connected the box diagrams of an externally-accessible IP and port, to an internal service running in the service subnet.
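
To make that routing idea concrete, here is a toy model of name-based routing: look at the Host header and path of a request, and pick an internal service. This is only a mental model, not how the nginx ingress controller is implemented, and the rule table, host name, service name, and function name are placeholders I chose for illustration.

```python
# Hypothetical rule table: (host, path prefix, (service name, service port)).
RULES = [
    ("foo.server.localdomain.net", "/", ("foo-service", 80)),
]


def pick_backend(host_header, path, rules=RULES):
    """Return the backend for the longest matching path prefix on this host, or None."""
    # Drop an optional ":port" suffix and compare host names case-insensitively.
    host = host_header.split(":")[0].lower()
    best = None
    for rule_host, prefix, backend in rules:
        if host == rule_host and path.startswith(prefix):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, backend)
    return best[1] if best else None


if __name__ == "__main__":
    print(pick_backend("foo.server.localdomain.net", "/index.html"))
    print(pick_backend("unknown.example", "/"))
```

A request for a host with no matching rule returns None here; a real controller would typically hand such requests to some default backend instead.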

I used the YAML templates from the kubernetes/ingress/examples/deployment/nginx GitHub repo, modifying only the namespace. Why did I modify the namespace? Ingress currently routes only to services in the same namespace it runs in. Since I run my containers in the ‘default’ namespace, and not kube-system (where I try to keep more infrastructure-type pods), I modified the templates accordingly.

Then to route to my service based upon a name-based virtual host, the ingress yaml looks like this:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress1
  namespace: default
spec:
  rules:
  - host: foo.server.localdomain.net
    http:
      paths:
      - path: /
        backend:
          serviceName: foo-service
          servicePort: 80

Things I learned

Being able to access a command line in a busybox container in a pod on the Kubernetes cluster is very helpful. Why? It helped clear up the fact that you can’t just ping or nmap service IPs from outside the cluster, or even from the host VM. It just didn’t make sense until:

$ kubectl exec -ti busybox -- /bin/sh 


/ # nslookup kubernetes.default.svc.cluster.local
Server: 10.11.0.2
Address 1: 10.11.0.2 kube-dns.kube-system.svc.cluster.local

Name: kubernetes.default.svc.cluster.local
Address 1: 10.11.0.1 kubernetes.default.svc.cluster.local
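
What finally made this click for me is that the service addresses live in a different range from the node addresses. A small sketch of that distinction follows. The node network comes from the netmask I passed to the VM builder; the service range prefix length is a guess on my part, since only individual service addresses appear above, and the function name is made up.

```python
import ipaddress

# Node network: 10.10.2.x with netmask 255.255.255.0, as used when building the VMs.
NODE_NETWORK = ipaddress.ip_network("10.10.2.0/24")
# Service network: ASSUMED /16; only 10.11.0.1 and 10.11.0.2 are shown in the output above.
SERVICE_NETWORK = ipaddress.ip_network("10.11.0.0/16")


def classify(address: str) -> str:
    """Label an IPv4 address as a node address, a service address, or other."""
    ip = ipaddress.ip_address(address)
    if ip in SERVICE_NETWORK:
        return "service"
    if ip in NODE_NETWORK:
        return "node"
    return "other"


if __name__ == "__main__":
    for addr in ("10.10.2.121", "10.11.0.1", "10.11.0.2"):
        print(addr, classify(addr))
```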

The nginx-ingress-controller is the service that is actually externally accessible (that is, reachable from hosts NOT in the Kubernetes cluster). The ingress controller’s IP is where DNS entries for a particular name-based virtually hosted FQDN need to point. If that node gets restarted, the controller may move to another host, breaking your FQDN-to-IP mapping. Follow-up action item: spread the nginx-ingress-controller to every node (or to a pool of nodes that run as inbound proxies) and point your DNS entry at all of those IPs. That ‘should’ work through node reboots.