Large refactors require large changes in code.

I had been away from my machines at home for several months, and in that time CoreOS changed its bare-metal installation procedures quite a bit, to the point where it almost seemed like an afterthought that folks would run CoreOS anywhere outside of GCE/AWS/Azure. Since I don’t want to spend money on cloud-based infrastructure when I’ve got a perfectly adequate 8-core machine at home with 32GB of RAM and a few TB of storage, I knew I needed to update my virthelper scripts to get with the program.

High-level requirements

  • Automate converting the Container Linux Config (no longer cloud-init configs) to Ignition configs.
  • Modify the libvirt domain XML programmatically to pass Ignition config path arguments (a sketch of both steps follows this list).
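
Roughly, here is what those two steps amount to. This is an illustrative sketch, not my actual scripts: it assumes the Container Linux config transpiler ct is installed, and the file names and paths are made up.

# Step 1: transpile the Container Linux Config into an Ignition config.
ct -in-file node1.yaml -out-file node1.ign

# Step 2: in the libvirt domain XML, hand the Ignition config to QEMU
# via fw_cfg (note the qemu XML namespace on the root element):
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  ...
  <qemu:commandline>
    <qemu:arg value='-fw_cfg'/>
    <qemu:arg value='name=opt/com.coreos/config,file=/var/lib/libvirt/ignition/node1.ign'/>
  </qemu:commandline>
</domain>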

I found that some of the automation and trickery CoreOS includes to generate the etcd snippets did not support libvirt. The vagrant-virtualbox helpers were a close fit, but not quite close enough: they expect eth1 rather than eth0 for the network interface, which causes the coreos-metadata service to fail outright. That is currently the major blocker keeping my new scripts from bearing all their fruit. I’ve filed issues/pull requests (linked below) with the CoreOS team to get it fixed.

There were some cleanup commits in my repository to allow for flexibility in running virt-install, but the main commit is:

Issues/Pull Requests:


Supporting Documentation and Links:


No more powerline networking in this house.

I finally got around to wiring Cat6 to my desktop machines at home and ripped out those powerline network adapters. I ran an iperf test between my desktop and my router before and after the upgrade to see how things fared.
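
For reference, iperf needs a listener on the far end; a minimal sketch of the setup, with the router’s address as a placeholder since the real IPs are omitted throughout:

router:~$ iperf -s                              # server side, listening on TCP 5001
desktop1:~$ iperf -f m -V -t 30 -c <router-ip>  # client, same flags as the runs below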

iperf results before:
desktop1:~$ iperf -f m -V -t 30 -c
Client connecting to, TCP port 5001
TCP window size: 0.08 MByte (default)
[ 3] local port 35262 connected with port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-30.0 sec 510 MBytes 142 Mbits/sec

iperf results over Cat6:
$ iperf -f m -V -t 30 -c
Client connecting to, TCP port 5001
TCP window size: 0.08 MByte (default)
[ 3] local port 55044 connected with port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-30.0 sec 2135 MBytes 597 Mbits/sec

142 Mbit/sec to 597 Mbit/sec. That’ll do.

What I read today:

I’d like to (try to) keep a running tab of all the technical, and non-technical, bits of information I pick up day to day. I’m hoping it might provide some insight into what I’m interested in at the time, or surface helpful little tidbits I find lying around the web.

Pain(less) NGINX Ingress

Once I get my Kubernetes cluster back up at home, I want to create separate environments for promoting releases. Right now the deployment I have running is much more pets than cattle, and I want to change that: I want to treat each piece as completely replaceable and interchangeable, and that only happens with a setup that is not one big snowflake you are afraid to touch.

How we grow Junior Developers at the BBC

All of this one rang true for me, an SRE trying to write more code. Mentoring others while being mentored yourself is, I feel, crucial to being part of a productive team. You can’t just sit behind your monitors with headphones on and expect to build relationships and have impact.

Kubernetes, the slow way.

It all started when I began hearing about this container thing outside of work. I’ve been a Google SRE going on six years, but I know that the way we do containers internally on Borg is probably not how the rest of the world builds reliable, scalable infrastructure. I was curious: how hard could it be to spin up a few containers and play around like I do at work?

Little did I know it would take two months, at a few hours a few nights a week, to get to the point where I could access a web service inside my home-grown Kubernetes cluster. Below are the high-level steps, scripts, and notes I kept during the process.

Continue reading

A simplified way to securely move all the bits.

A while back, I wrote a post about setting up an L2TP/IPSec VPN on my home firewall/router. It required two daemons and a bunch of configuration with hard-coded IP addresses. While that solution used firmly established practices (L2TP/IPSec), it felt too brittle. What happens when my dynamic IP address changes? Now I need to update config files, restart daemons, and so on. There had to be a better way.

Enter IKEv2, the successor to IKE version 1, which was itself built on the Internet Security Association and Key Management Protocol (ISAKMP) and Oakley.
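
The full post has the details; as a taste, here is a minimal sketch of the kind of configuration this buys you, assuming OpenBSD’s iked(8) (the names, networks, and key below are placeholders, not my real setup):

# /etc/iked.conf
ikev2 "roadwarrior" passive esp \
        from 10.0.1.0/24 to 0.0.0.0/0 \
        peer any \
        srcid vpn.example.com \
        psk "use-a-much-longer-secret-than-this"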

Continue reading

LACP, VLANs, always stay connected.

I was bored last weekend, so I configured a two-port LACP bonded trunk from my FreeBSD NAS to my HP ProCurve switch.


Why?

  • Because I could?
  • I had all these spare Ethernet ports on my NAS, and they seemed bored.
  • More seriously: high availability. One interface serving all my storage traffic just seemed ripe for failure. Imagine serving all your VMs over NFS to a VM server across the network over one NIC, and that NIC dies. Bad news bears.
  • I also wanted to set up VLANs on top of the trunk, so that if I later add a network segment for my NAS on another Layer 3 domain, I don’t have to walk down to the basement to patch another cable.

On to the configuration.

First, I configured the NAS box. Relevant /etc/rc.conf configuration:

# LACP Group
ifconfig_igb0="up"
ifconfig_igb1="up"
cloned_interfaces="lagg0"
ifconfig_lagg0="laggproto lacp laggport igb0 laggport igb1 up"

# VLAN Configuration
vlans_lagg0="12"
# Example address only; substitute your own subnet (0xffffff00 = /24).
ifconfig_lagg0_12="inet 192.0.2.10 netmask 255.255.255.0"

In the first stanza, define the interfaces (igb0, igb1) that comprise the LAGG (link aggregation) interface, simply up’ing them. Then configure your LAGG interface. It’s important to specify the LAGG proto (LACP in my case). Since I’ll be assigning IPs to the tagged VLAN interface, I don’t assign an IP to the raw trunk interface. If I wanted to handle untagged traffic on the trunk, I would specify an IP address.

In the second stanza, we configure the tagged VLAN interface. “vlans_lagg0” specifies the list of VLANs that traverse the lagg0 interface. Then we configure the IP address whose traffic will be tagged with VLAN 12, riding on lagg0.
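
To apply all of this without a reboot, restarting the standard FreeBSD network services should do it (note that doing so over the very interfaces you are reconfiguring will drop your session):

nas1:~ % sudo service netif restart && sudo service routing restart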

Ifconfig output should look like the following:

nas1:~ % ifconfig lagg0
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
 ether 0c:c0:7a:54:84:12
 media: Ethernet autoselect
 status: active
 groups: lagg 
 laggproto lacp lagghash l2,l3,l4
 laggport: igb0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
 laggport: igb1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>

nas1:~ % ifconfig lagg0.12
lagg0.12: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
 ether 0c:c0:7a:54:84:12
 inet netmask 0xffffff00 broadcast 
 media: Ethernet autoselect
 status: active
 vlan: 12 vlanpcp: 0 parent interface: lagg0
 groups: vlan

With the NAS’s Ethernet cables disconnected from the switch, we configure the static LACP trunk. Note that I run an HP ProCurve 2910al-24G switch, so my commands will not work on, say, a Cisco switch running IOS. This is the point in the exercise where I wish I had command history on the switch: I can’t remember the exact commands or the order I ran them in, but it went something like the following:

switch1$ enable

switch1# config t

switch1(config)# trunk ethernet 14,16 trk1 lacp 

switch1(config)# vlan 12

switch1(vlan-12)# tagged trk1

What does this do? It creates a two-port static LACP trunk (Trk1) on ports 14 and 16, then creates VLAN 12 and makes Trk1 a tagged member of it.

Now plug those Ethernet cables in and wait a minute for LACP negotiation to take place.

switch1# show trunk

Load Balancing

 Port | Name         | Type      | Group | Type
 ---- + ------------ + --------- + ----- + -----
 14   | nas1-trunk-1 | 100/1000T | Trk1  | LACP
 16   | nas1-trunk-2 | 100/1000T | Trk1  | LACP

switch1# show lacp


       LACP    Trunk   Port            LACP
 Port  Enabled Group   Status  Partner Status
 ----  ------- ------- ------- ------- -------
 14    Active  Trk1    Up      Yes     Success
 16    Active  Trk1    Up      Yes     Success

Hooray. A working, two-port Ethernet LACP trunk.

I did some testing, removing Ethernet cables from the trunk pair one at a time to see how many packets were dropped and what the switch logs looked like.

64 bytes from icmp_seq=204 ttl=63 time=5.204 ms
Request timeout for icmp_seq 207
64 bytes from icmp_seq=208 ttl=63 time=5.223 ms
64 bytes from icmp_seq=224 ttl=63 time=4.502 ms
Request timeout for icmp_seq 225
Request timeout for icmp_seq 226
Request timeout for icmp_seq 227
64 bytes from icmp_seq=228 ttl=63 time=7.078 ms

Four packets dropped across two cable pulls (one the first time, three the second). Not bad! What does this look like from the switch?

switch1# log Trk1
 Keys: W=Warning I=Information
 M=Major D=Debug E=Error
 ---- Event Log listing: Events Since Boot ----
 I 10/22/16 14:00:51 00078 ports: trunk Trk1 is now active
 I 10/22/16 14:00:53 00076 ports: port 14 in Trk1 is now on-line
 I 10/22/16 14:00:59 00079 ports: trunk Trk1 is now inactive
 I 10/22/16 14:00:59 00078 ports: trunk Trk1 is now active
 I 10/22/16 14:01:02 00076 ports: port 16 in Trk1 is now on-line
 I 10/22/16 14:02:01 00077 ports: port 14 in Trk1 is now off-line
 I 10/22/16 14:02:20 00076 ports: port 14 in Trk1 is now on-line
 I 10/22/16 14:04:50 00077 ports: port 16 in Trk1 is now off-line
 I 10/22/16 14:05:02 00076 ports: port 16 in Trk1 is now on-line
 I 10/22/16 14:05:09 00077 ports: port 14 in Trk1 is now off-line
 I 10/22/16 14:05:21 00076 ports: port 14 in Trk1 is now on-line

Hopefully this end-to-end configuration example proves useful to others.


Get off my lawn, DMZ edition.

I recently changed Internet providers from Comcast Business to Verizon Fios. Part of the Fios package is TV set-top boxes (STBs), which use coax for video and Internet via MoCA for the guide data. That made me curious: what kind of traffic are these things sending on the network? What are they trying to access? And how hard would it be to wall them off in a DMZ, away from the rest of my wired/wifi network, given that I have no idea what they are up to? Behold, a DMZ configuration with the following requirements (a pf sketch follows the list):


  • Cable boxes need to get out to the Internet.
  • Cable boxes should not be able to touch anything network-wise inside my house except what’s inside the DMZ.
  • My wifi/wired networks should be able to initiate connections to the DMZ devices. For science, of course (but mostly for seeing what they are doing).
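
The full post covers the real configuration; as a taste, here is a minimal pf.conf sketch of what those requirements translate to, assuming an OpenBSD firewall (the interface names and the deliberately simplified rules are placeholders, not my actual ruleset):

# Hypothetical interface assignments.
lan_if = "em1"
dmz_if = "em2"

# Cable boxes may reach the Internet...
pass in on $dmz_if from $dmz_if:network to any

# ...but, since the last matching rule wins, not the internal network.
block in on $dmz_if from any to $lan_if:network

# The LAN may initiate connections into the DMZ. For science.
pass in on $lan_if from $lan_if:network to $dmz_if:network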

Continue reading

I wrote my own network latency monitoring agent in Go

For a while I had used Smokeping to generate pretty graphs of network latency between various hosts on my network. The downside with Smokeping was always getting it working. Did I configure my webserver just right? Did I remember to save the webserver configs so that the next time I set this up, things just worked? Did I install all the right Perl modules (and the right versions of each) so that Smokeping’s binary worked? Then there were the differences in operation depending on whether I ran it on Linux, OpenBSD, or FreeBSD. There had to be a simpler solution.

I’ve been dabbling in Go and Graphite as side projects at home for a while. Go was a language I’d been wanting to use more given its popularity where I work. Graphite was always this itch I scratched whenever I wanted to visualize machine and network statistics for the various machines on my network. I knew I could come up with a simple solution using these two pieces of tech.

I wanted to start small. Smokeping provides graphs of minimum, maximum, average, and standard deviation for round-trip times, as well as packet loss. These are all statistics provided by the ping command-line tool. Why couldn’t I just wrap ping in a Go binary and send those data points off to Carbon for graphing in Graphite?
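
The core idea fits in a couple of lines of shell, which is roughly what the Go binary automates across many hosts. A sketch, assuming a Carbon plaintext listener on its standard port 2003 (hostnames are placeholders):

# Pull the average RTT out of ping's summary line and ship it to Carbon.
rtt_avg=$(ping -c 10 nas1 | awk -F/ '/^(rtt|round-trip)/ {print $5}')
echo "network.ping.nas1.rtt_avg ${rtt_avg} $(date +%s)" | nc carbon.example.com 2003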

I present the resultant Go binary and library.

parallelping is a Go binary used to ping remote hosts, in parallel. If provided with a Carbon host and port, the data is shipped off to Carbon/Graphite.

carbon-golang is a Go library that takes Carbon metrics and sends them off to a Carbon Cache over TCP. I admit I borrowed a lot of the logic from marpaia/graphite-golang, partly because I couldn’t quite get that library to integrate as documented, and partly because I wanted the learning experience of building my own Go-based TCP client.

Both of these are my first non-trivial pieces of Go code. The more time I spent with Go, the less I felt its barrier to entry was as high as I had anticipated (I’ve mainly been a Python person for many years). Further usage documentation for each bit of code can be found on their respective GitHub project pages, eventually.

A screenshot so far:




A brand new blog for 2016

A new year gave me an itch to scratch. For years I had been running a pretty standard setup when it came to blogging.

It was as vanilla a setup as one can get, running on a $10/month Linode instance out of their datacenter in Atlanta. I never used the VM much other than for keeping what was an almost-completely static blog. I never had any issues with it. I just wanted to try something new.

The new setup:

I save $5/month and run what I consider a more secure, simpler alternative. We’ll see how this goes.

From 0 to an OpenBSD install, with no hands and a custom disk layout

No one likes doing repetitive OS installs. You know the kind: clicking through a bunch of prompts for username, password, and partitioning scheme as fast as you can, just to get to the point where you can do some real work. This scenario happens to me every time OpenBSD releases new errata. OpenBSD is my OS of choice for firewalls/routers, and I use a fresh OS install as the baseline for building -stable install set files.

While OpenBSD has automated away most of those manual installation tasks with autoinstall(8), as of a week ago you still could not customize your disk layout. But thanks to commits by OpenBSD developers henning@ and rpe@, you can now specify your own disk layout programmatically, to be used during an automated install.

While building a new set of install files is not part of this post, continue reading to see how I got one step closer by completely automating the base OS install with my custom disk layout.

Using the source presentation slide decks and Undeadly write-ups below, along with copious man page reading, my baseline infrastructure and configuration for a completely automated OpenBSD install is as follows:

A DHCP server on the local LAN configured to provide both an ‘auto_install’ filename and a ‘next-server’ parameter. These two parameters point the PXE-booting host at where to grab the code for the next step of the install:

host openbsd-pxeboottest {
    hardware ethernet 52:54:aa:bb:cc:dd;
    filename "auto_install";
    next-server;
}

Next was a TFTP server prepared to handle the request for auto_install:

$ ps auxww -U _tftpd
USER      PID %CPU %MEM  VSZ  RSS TT  STAT STARTED     TIME COMMAND
_tftpd  16244  0.0  0.1  780 1172 ??  Is   Wed06AM  0:10.14 /usr/sbin/tftpd -4 -l /tftp

$ ls -al /tftp/
total 14312
drwxrwxr-x   2 root  wheel      512 May 8 19:46 .
drwxr-xr-x  17 root  wheel      512 May 6 06:24 ..
lrwxr-xr-x   1 root  wheel        7 May 6 06:25 auto_install -> pxeboot
-rw-r--r--   1 root  wheel  7612185 May 8 21:37 bsd
-rw-r--r--   1 root  wheel    80996 May 8 21:37 pxeboot

The pxeboot and bsd files are the same ones from the install set.

Last but not least is install.conf, the file containing the answers to the various questions OpenBSD asks during an install. This file must be in the root directory of the httpd server configured above as ‘next-server’.
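
For reference, install.conf is just “question = answer” pairs matching the installer’s prompts. A few representative lines (the answers here are illustrative, not my real ones):

System hostname = openbsd-pxeboottest
Password for root account = *************
Allow root ssh login = no
Location of sets = http
Set name(s) = -game* -x*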

Aside from all the normal answers for installation, the new prompt for auto-configuring disk layout is:

URL to autopartitioning template = 

Autodisklabel config:

/       100M-*  75%
swap    10M-*   25%

This config states: with minimums of 100MB for / and 10MB for swap, configure the disk layout to give 75% of the disk to / and 25% to swap.

The install should continue as expected and reboot at the end. Upon logging in, I verified the disk layout was what I wanted.

# disklabel -pm wd0
# /dev/rwd0c:
...
#        size     offset  fstype  [fsize bsize cpg]
  a:  6159.5M         64  4.2BSD   2048 16384 1 # /
  b:  2029.8M   12614720  swap                  # none
  c:  8192.0M          0  unused

Based on the 8GB virtual disk file I used for testing, ~6000MB for / and ~2000MB for swap fits the bill.

Kudos to all the developers involved with this new functionality. I look forward to using it increasingly in the future.