
Faster Ubuntu installs up in the clouds

It all started with an Ubuntu blog post about a slimmer Ubuntu server image. I play around with virtual machines at home, many based on Ubuntu’s full-size server ISO. It would take 20-25 minutes to spin up a new VM using preseed files I had constructed to automate user creation and SSH key copying. I knew there was a better way, and it turns out that using the pre-built Ubuntu minimal image (referred to below as the cloud image), combined with cloud-init, I was able to spin up minimal Ubuntu VMs in under 2 minutes.
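For flavor, here is a rough sketch of the general recipe, not my exact virthelper code; the release, user name, SSH key, and paths below are placeholders:

# Hypothetical sketch: boot an Ubuntu cloud image with a cloud-init seed.
wget https://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64.img

# Minimal cloud-init user-data: create a user and drop in an SSH key.
cat > user-data <<'EOF'
#cloud-config
hostname: testvm
users:
  - name: jeff
    sudo: ALL=(ALL) NOPASSWD:ALL
    shell: /bin/bash
    ssh_authorized_keys:
      - ssh-ed25519 AAAA... jeff@desktop
EOF

# cloud-localds (from cloud-image-utils) packs user-data into a NoCloud seed image.
cloud-localds seed.img user-data

# Import the disk directly; no ISO install step.
virt-install --name testvm --memory 2048 --vcpus 2 \
  --disk path=bionic-server-cloudimg-amd64.img,format=qcow2 \
  --disk path=seed.img,device=cdrom \
  --import --os-variant ubuntu18.04 --network network=default --noautoconsole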

What did it take?

Two git commits (one including some refactoring of my virtbuilder script):

What’s the big difference?

  • Full-size server ISO install: 21m10s.
  • Cloud image install: 10m57s including the cloud image download; 56s when the image is already downloaded locally.

Documentation:

Where might you be able to find this handy automation script? https://github.com/jforman/virthelper

I remember IPv6 being difficult.

I remember using he.net years ago for their IPv6 tunnels, and have painful memories of configuring them, both on the router and when sharing them with the subnets on my home LAN. Not this time.

Read more

Load balanced Kubernetes Ingress. So metal.

Kubernetes has some incredible features, one of them being Ingress. Ingress can be described as a way to give external access to a Kubernetes-run service, typically over HTTP(S). This is useful when you run webapps (Grafana, Binder) in your Kubernetes cluster that need to be accessed by users across your network.

Typically, Ingress integrates with automation provided by public cloud providers (GCP/GKE, AWS, Azure, Digital Ocean, etc.) where the external IP and routing are handled for you. I’ve found bare-metal Ingress configuration examples on the web to be hand-wavy at best. So what do you do when there are so many approaches and it’s not clear which one to pick? You make your own. Below is how I configured bare-metal Ingress on my CoreOS-based Kubernetes cluster to access Grafana.
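As a taste of the shape this takes (purely a sketch, not necessarily the manifest I ended up with; the namespace, hostname, and port are placeholders), assuming an nginx ingress controller is already running in the cluster and an existing Grafana Service listening on port 3000:

# Hypothetical minimal Ingress: route grafana.example.com to an existing
# "grafana" Service. Assumes the ingress controller is reachable on the nodes,
# e.g. via a NodePort or hostNetwork.
kubectl apply -f - <<'EOF'
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: grafana
  namespace: monitoring
spec:
  rules:
  - host: grafana.example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: grafana
          servicePort: 3000
EOF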

Read more

Kubernetes, CoreOS, and many lines of Python later.

Several months after my last post, and lots of code hacking later, I can rebuild my CoreOS-based bare-metal Kubernetes cluster in roughly 20 minutes. It only took ~1300 lines of Python following Kelsey Hightower’s Kubernetes the Hard Way instructions.

Why? The challenge.

But really, why? I like to hack on code at home, and spinning up a new VM for another Django or Golang app was pretty heavyweight when all I needed was an easy way to push it out as a container. And with various open source projects on the web providing easy ways to run their code, running my own Kubernetes cluster seemed like a no-brainer.

Read more

Large refactors require large changes in code.

I had been away from my machines at home for several months, and in that time CoreOS changed their bare-metal installation procedures quite a bit, to the point where it almost seemed like an afterthought that folks would run CoreOS anywhere outside of GCE/AWS/Azure. Since I don’t want to spend money on cloud-based infrastructure when I’ve got a perfectly adequate 8-core machine at home with 32GB of RAM and a few TB of storage, I knew I needed to update my virthelper scripts to get with the program.

High level requirements

  • Automate converting Container Linux Configs (no longer cloud-init configs) to Ignition configs.
  • Modify the libvirt domain XML myself (with code) to pass the Ignition config path arguments (a rough sketch of both steps follows this list).
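Roughly, those two steps boil down to something like the following sketch, not my actual virthelper code; file names and paths are placeholders:

# 1. Transpile a Container Linux Config (YAML) into an Ignition config (JSON).
ct -in-file node.clc.yaml -out-file node.ign

# 2. Hand the Ignition config to the guest via QEMU's fw_cfg. In the libvirt
#    domain XML this ends up under <qemu:commandline> as:
#      -fw_cfg name=opt/com.coreos/config,file=/var/lib/libvirt/ignition/node.ign
virt-install --name coreos-node1 --memory 2048 --vcpus 2 \
  --disk path=/var/lib/libvirt/images/coreos-node1.qcow2 \
  --import --os-variant generic --noautoconsole \
  --qemu-commandline='-fw_cfg name=opt/com.coreos/config,file=/var/lib/libvirt/ignition/node.ign'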

I found that some of the automation and trickery included with CoreOS to generate the etcd snippets did not support libvirt. The vagrant-virtualbox helpers were a close fit, but not quite enough (they expect eth1 instead of eth0 for the network interface). This causes the coreos-metadata service to fail completely, which is currently the major blocker keeping my new scripts from bearing all their fruit. I’ve filed some issues/pull requests below with the CoreOS team to get that fixed.

There were some cleanup commits in my repository to allow for flexibility in running virt-install, but the main commit is: https://github.com/jforman/virthelper/commit/0cc65134d3dfd1aaaf14392a9e947e428969b491.

Issues/Pull Requests:

Supporting Documentation and Links:

No more powerline networking in this house.

I finally got around to wiring Cat6 to my desktop machines at home and ripped out those powerline network adapters. I ran iperf between my desktop and my router before and after the upgrade to see how things fared.

iperf results before:
desktop1:~$ iperf -f m -V -t 30 -c 10.10.0.1
------------------------------------------------------------
Client connecting to 10.10.0.1, TCP port 5001
TCP window size: 0.08 MByte (default)
------------------------------------------------------------
[ 3] local 10.10.0.241 port 35262 connected with 10.10.0.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-30.0 sec 510 MBytes 142 Mbits/sec

iperf results over Cat6:
$ iperf -f m -V -t 30 -c 10.10.0.1
------------------------------------------------------------
Client connecting to 10.10.0.1, TCP port 5001
TCP window size: 0.08 MByte (default)
------------------------------------------------------------
[ 3] local 10.10.0.241 port 55044 connected with 10.10.0.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-30.0 sec 2135 MBytes 597 Mbits/sec


142 Mbits/sec to 597 Mbits/sec. That’ll do.

What I read today:

I’d like to (try to) keep a running tab of all the technical, and non-technical, bits of information I pick up day to day. I’m hoping it might provide some insight into what I’m interested in at the time, or capture little tidbits of helpful information I find lying around the web.

Pain(less) NGINX Ingress

Once I get my Kubernetes cluster back up at home, I want to create separate environments for promotions. Right now the deployment I have running is much more pets than cattle, and I want to change that. I want to treat each piece as completely replaceable and interchangeable, and that only happens with a setup that isn’t one big snowflake you’re afraid to touch.

How we grow Junior Developers at the BBC

All of this rang true for me as an SRE trying to write more code. Mentoring others while being mentored yourself is, I feel, crucial to being part of a productive team. You can’t just sit behind your monitoring with headphones on and expect to build relationships and have impact.

Kubernetes, the slow way.

It all started when I began hearing about this container thing outside of work. I’ve been a Google SRE going on 6 years, but I know that the way we do containers internally on Borg is probably not how the rest of the world does reliable, scalable infrastructure. I was curious: how hard could it be to spin up a few containers and play around like I do at work?

Little did I know, it would take two months, a few hours a few nights a week, to get to the point where I could access a web service inside my home-grown Kubernetes cluster. Below are the high-level steps, scripts, and notes I kept during the process.

Read more

A simplified way to securely move all the bits.

A while back, I wrote a post about setting up an L2TP/IPSec VPN on my home firewall/router. It required two daemons and a bunch of configuration with hard-coded IP addresses. While that solution used firmly established practices (L2TP/IPSec), it felt too brittle. What happens when my dynamic IP address changes? Now I need to update config files, restart daemons, and so on. There had to be a better way.

Enter IKEv2. IKEv2 is the successor to IKE version 1, which was built on the Internet Security Association and Key Management Protocol (ISAKMP) and Oakley.
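To give a flavor of why that appealed to me, here is a purely illustrative sketch using strongSwan, a common IKEv2 implementation; the names, addresses, and auth choices are placeholders, not necessarily the daemon or options I ended up with. The server identifies itself by DNS name rather than a hard-coded IP, so a changing dynamic address no longer means rewriting configs:

# Hypothetical strongSwan IKEv2 "road warrior" connection; everything here is
# a placeholder for illustration only.
cat > /etc/ipsec.conf <<'EOF'
conn roadwarrior
    keyexchange=ikev2
    left=%any
    leftid=vpn.example.com
    leftcert=serverCert.pem
    leftsubnet=0.0.0.0/0
    right=%any
    rightauth=eap-mschapv2
    rightsourceip=10.10.50.0/24
    eap_identity=%identity
    auto=add
EOF
ipsec restart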

Read more

LACP, VLANs, always stay connected.

I was bored last weekend, so I configured a two-port LACP bonded trunk from my FreeBSD-based NAS to my HP ProCurve switch.

Why?

  • I could?
  • I had all these spare Ethernet ports on my NAS, and they seemed bored.
  • More seriously: high availability. One interface serving all my storage traffic just seemed ripe for failure. Imagine serving all your VMs over NFS to a VM server across the network over one NIC, and then that NIC dies. Bad news bears.
  • I also wanted to set up VLANs on top of the trunk. Why? So if I ever want to add a network segment for my NAS on another Layer 3 domain, I won’t have to walk down to the basement to patch another cable.

On to the configuration.

First, I configured the NAS box. Relevant /etc/rc.conf configuration:

# LACP Group
# https://www.freebsd.org/doc/handbook/network-aggregation.html
ifconfig_igb0="up"
ifconfig_igb1="up"
cloned_interfaces="lagg0 vlan12"
ifconfig_lagg0="laggproto lacp laggport igb0 laggport igb1"

# VLAN Configuration
# https://www.freebsd.org/doc/handbook/network-vlan.html
vlans_lagg0="12"
ifconfig_lagg0_12="inet 10.10.2.31/24"

In the first stanza, I define the interfaces (igb0, igb1) that make up the lagg (link aggregation) interface, simply bringing them up, and then configure the lagg interface itself. It’s important to specify the lagg protocol (LACP in my case). Since I’ll be assigning IPs to the tagged VLAN interface, I don’t assign an IP to the raw trunk interface; if I wanted to handle untagged traffic on the trunk, I would specify an IP address here.

The second stanza configures the tagged VLAN interface. vlans_lagg0 specifies the list of VLAN IDs that traverse the lagg0 interface, and ifconfig_lagg0_12 assigns the IP address for traffic tagged with VLAN 12, riding on lagg0.
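To pick up these rc.conf changes without a full reboot, restarting the network scripts should do it (though a reboot is still the cleanest test that everything comes up on its own):

# Re-run the rc(8) network scripts so the new lagg/VLAN config takes effect.
service netif restart && service routing restart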

Ifconfig output should look like the following:

nas1:~ % ifconfig lagg0
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
 options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
 ether 0c:c0:7a:54:84:12
 nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
 media: Ethernet autoselect
 status: active
 groups: lagg 
 laggproto lacp lagghash l2,l3,l4
 laggport: igb0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
 laggport: igb1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>

nas1:~ % ifconfig lagg0.12
lagg0.12: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
 options=303<RXCSUM,TXCSUM,TSO4,TSO6>
 ether 0c:c0:7a:54:84:12
 inet 10.10.2.31 netmask 0xffffff00 broadcast 10.10.2.255 
 nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
 media: Ethernet autoselect
 status: active
 vlan: 12 vlanpcp: 0 parent interface: lagg0
 groups: vlan

With the NAS’s Ethernet cables disconnected from the switch, we configure the static LACP trunk. Note that I run an HP ProCurve 2910al-24G switch, so my commands won’t work, for instance, on a Cisco switch running IOS. This is the point in the exercise where I wish I had command history on the switch: I can’t remember the exact commands or the order I ran them in. It probably went something close to the following:

switch1$ enable

switch1# config t

switch1(config)# trunk ethernet 14,16 trk1 lacp 

switch1(config)# vlan 12

switch1(vlan-12)# tagged trk1

What does this do? It creates a two-port, static LACP trunk (Trk1) on ports 14 and 16. We then create VLAN 12 and add Trk1 as a tagged member of it.

Now plug those Ethernet cables in and wait a minute for LACP negotiation to take place.

switch1# show trunk

Load Balancing

 Port | Name         Type      | Group Type
 ---- + ------------ --------- + ----- ----
 14   | nas1-trunk-1 100/1000T | Trk1  LACP
 16   | nas1-trunk-2 100/1000T | Trk1  LACP

switch1# show lacp

LACP

       LACP     Trunk    Port             LACP
 Port  Enabled  Group    Status  Partner  Status
 ----  -------  -------  ------  -------  -------
 14    Active   Trk1     Up      Yes      Success
 16    Active   Trk1     Up      Yes      Success

Hooray. A working, two-port Ethernet LACP trunk.

I did some testing, pulling Ethernet cables from the two-port trunk one at a time to see how many packets were dropped and what the switch logs looked like.

64 bytes from 10.10.2.31: icmp_seq=204 ttl=63 time=5.204 ms
Request timeout for icmp_seq 207
64 bytes from 10.10.2.31: icmp_seq=208 ttl=63 time=5.223 ms
....
64 bytes from 10.10.2.31: icmp_seq=224 ttl=63 time=4.502 ms
Request timeout for icmp_seq 225
Request timeout for icmp_seq 226
Request timeout for icmp_seq 227
64 bytes from 10.10.2.31: icmp_seq=228 ttl=63 time=7.078 ms

Four packets dropped across two cable pulls. Not bad! What does this look like from the switch?

switch1# log Trk1
 Keys: W=Warning I=Information
 M=Major D=Debug E=Error
 ---- Event Log listing: Events Since Boot ----
 I 10/22/16 14:00:51 00078 ports: trunk Trk1 is now active
 I 10/22/16 14:00:53 00076 ports: port 14 in Trk1 is now on-line
 I 10/22/16 14:00:59 00079 ports: trunk Trk1 is now inactive
 I 10/22/16 14:00:59 00078 ports: trunk Trk1 is now active
 I 10/22/16 14:01:02 00076 ports: port 16 in Trk1 is now on-line
 I 10/22/16 14:02:01 00077 ports: port 14 in Trk1 is now off-line
 I 10/22/16 14:02:20 00076 ports: port 14 in Trk1 is now on-line
 I 10/22/16 14:04:50 00077 ports: port 16 in Trk1 is now off-line
 I 10/22/16 14:05:02 00076 ports: port 16 in Trk1 is now on-line
 I 10/22/16 14:05:09 00077 ports: port 14 in Trk1 is now off-line
 I 10/22/16 14:05:21 00076 ports: port 14 in Trk1 is now on-line

Hopefully this end-to-end configuration example proves useful to others.

References: