Kubernetes, the slow way.

It all started when I began hearing about this container thing outside of work. I’ve been a Google SRE for going on six years, and I knew that the way we do containers internally on Borg is probably not how the rest of the world runs reliable, scalable infrastructure. I was curious: how hard could it be to spin up a few containers and play around like I do at work?

Little did I know it would take two months, a few hours a few nights a week, to get to the point where I could access a web service inside my homegrown Kubernetes cluster. Below are the high-level steps, scripts, and notes I kept during the process.


A simplified way to securely move all the bits.

A while back, I wrote a post about setting up an L2TP/IPSec VPN on my home firewall/router. It required two daemons and a bunch of configuration with hard-coded IP addresses. While this solution used well-established practices (L2TP/IPSec), it felt too brittle. What happens when my dynamic IP address changes? Now I need to update config files, restart daemons, and so on. There had to be a better way.

Enter IKEv2. IKEv2 is the successor to IKE version 1, which was built on the Internet Security Association and Key Management Protocol (ISAKMP) and Oakley.


LACP, VLANs, always stay connected.

I was bored last weekend, so I configured a two-port LACP bonded trunk between my FreeBSD-based NAS and my HP ProCurve switch. Why?


  • Because I could.
  • I had all these spare Ethernet ports on my NAS, and they seemed bored.
  • More seriously: high availability. One interface carrying all my storage traffic just seemed ripe for failure. Imagine serving all your VMs over NFS to a VM server across the network over one NIC, and that NIC dies. Bad news bears.
  • I also wanted to set up VLANs on top of the trunk. Why? So that if I want to add a network segment for my NAS on another Layer 3 domain, I don’t have to walk down to the basement to patch another cable.

On to the configuration.

First, I configured the NAS box. Relevant /etc/rc.conf configuration:

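# LACP Group
# https://www.freebsd.org/doc/handbook/network-aggregation.html
ifconfig_igb0="up"
ifconfig_igb1="up"
cloned_interfaces="lagg0"
ifconfig_lagg0="up laggproto lacp laggport igb0 laggport igb1"

# VLAN Configuration
# https://www.freebsd.org/doc/handbook/network-vlan.html
vlans_lagg0="12"
# NAS_VLAN12_IP is a placeholder; the real address is not shown in this post.
ifconfig_lagg0_12="inet NAS_VLAN12_IP netmask 255.255.255.0"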

In the first stanza, bring up the physical interfaces (igb0, igb1) that make up the LAGG (link aggregation) interface, then configure the LAGG interface itself. It’s important to specify the LAGG protocol (LACP in my case). Since I’ll be assigning IPs to the tagged VLAN interface, I don’t assign an IP to the raw trunk interface. If I wanted to handle untagged traffic on the trunk, I would specify an IP address there.

The second stanza configures the tagged VLAN interface. “vlans_lagg0” specifies the list of VLANs that traverse the lagg0 interface. The next line then configures the IP address that carries tagged VLAN 12 traffic riding on lagg0.
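To make the naming relationship explicit (vlans_lagg0="12" creates lagg0.12, whose rc.conf variable is spelled ifconfig_lagg0_12), here is a small, hypothetical Python helper that renders these stanzas from a list of ports and VLANs. It is only a sketch of the FreeBSD handbook conventions, not something I actually used, and 192.0.2.10 is a documentation placeholder address.

```python
def render_lagg_vlan_rc(lagg, ports, vlans):
    """Render FreeBSD rc.conf lines for an LACP lagg carrying tagged VLANs.

    lagg:  name of the lagg interface, e.g. "lagg0"
    ports: physical member ports, e.g. ["igb0", "igb1"]
    vlans: mapping of VLAN ID -> ifconfig arguments for that VLAN interface
    """
    lines = []
    # Member ports get no address of their own; they only need to be up.
    for port in ports:
        lines.append(f'ifconfig_{port}="up"')
    lines.append(f'cloned_interfaces="{lagg}"')
    members = " ".join(f"laggport {port}" for port in ports)
    lines.append(f'ifconfig_{lagg}="up laggproto lacp {members}"')
    # vlans_<parent> creates <parent>.<id>; its rc.conf variable uses "_" for ".".
    ids = sorted(vlans)
    id_list = " ".join(str(vid) for vid in ids)
    lines.append(f'vlans_{lagg}="{id_list}"')
    for vid in ids:
        lines.append(f'ifconfig_{lagg}_{vid}="{vlans[vid]}"')
    return "\n".join(lines)


# 192.0.2.10 is a documentation placeholder, not the NAS's real address.
print(render_lagg_vlan_rc("lagg0", ["igb0", "igb1"],
                          {12: "inet 192.0.2.10 netmask 255.255.255.0"}))
```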

The ifconfig output should look like the following:

nas1:~ % ifconfig lagg0
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
 ether 0c:c0:7a:54:84:12
 media: Ethernet autoselect
 status: active
 groups: lagg 
 laggproto lacp lagghash l2,l3,l4
 laggport: igb0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
 laggport: igb1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>

nas1:~ % ifconfig lagg0.12
lagg0.12: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
 ether 0c:c0:7a:54:84:12
 inet netmask 0xffffff00 broadcast 
 media: Ethernet autoselect
 status: active
 vlan: 12 vlanpcp: 0 parent interface: lagg0
 groups: vlan
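The line worth watching in that output is each laggport's flags: a healthy LACP member shows ACTIVE, COLLECTING, and DISTRIBUTING. As a rough sketch (a hypothetical helper, assuming the ifconfig text format shown above), a monitoring script could flag members that are missing any of them:

```python
import re

# Matches e.g. "laggport: igb0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>"
LAGGPORT_RE = re.compile(r"laggport: (\S+) flags=[0-9a-f]+<([A-Z,]*)>")
REQUIRED_FLAGS = {"ACTIVE", "COLLECTING", "DISTRIBUTING"}


def unhealthy_laggports(ifconfig_output):
    """Return lagg member ports that are missing any of the required LACP flags."""
    unhealthy = []
    for port, flags in LAGGPORT_RE.findall(ifconfig_output):
        if not REQUIRED_FLAGS <= set(flags.split(",")):
            unhealthy.append(port)
    return unhealthy


SAMPLE = """\
 laggproto lacp lagghash l2,l3,l4
 laggport: igb0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
 laggport: igb1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
"""
print(unhealthy_laggports(SAMPLE))  # []
```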

With the NAS’s Ethernet cables disconnected from the switch, we configure the static LACP trunk. Note that I run an HP ProCurve 2910al-24G switch, so my commands will not work on, for instance, a Cisco switch running IOS. This is the point in the exercise where I wish I had command history on the switch: I can’t remember the exact commands or the order I ran them in. It probably went something like the following:

switch1$ enable
switch1# config t
switch1(config)# trunk ethernet 14,16 trk1 lacp
switch1(config)# vlan 12
switch1(vlan-12)# tagged trk1

What does this do? It creates a two-port, static LACP trunk (Trk1) on ports 14 and 16. It then creates VLAN 12 and adds Trk1 to it as a tagged member, so the VLAN 12 traffic that lagg0 tags on the NAS is carried over the trunk.

Now plug those Ethernet cables in and wait a minute for LACP negotiation to take place.

switch1# show trunk

Load Balancing

Port  | Name Type              | Group Type
 ---- + -------------------------------- 
 14   | nas1-trunk-1 100/1000T | Trk1 LACP
 16   | nas1-trunk-2 100/1000T | Trk1 LACP

switch1# show lacp


LACP Trunk Port LACP
 Port Enabled Group   Status  Partner Status
 ---- ------- ------- ------- ------- -------
 14   Active  Trk1    Up      Yes     Success
 16   Active  Trk1    Up      Yes     Success
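If you want to check this from a script rather than by eye, a sketch like the following could scan the `show lacp` table. It is hypothetical and assumes the six-column layout printed by my 2910al above; other firmware may format the table differently.

```python
def lacp_ports_not_ok(show_lacp_output):
    """Return switch ports whose LACP row is not Up / partner Yes / Success."""
    problems = []
    for line in show_lacp_output.splitlines():
        fields = line.split()
        # Data rows look like: "14   Active  Trk1    Up      Yes     Success"
        if len(fields) == 6 and fields[0].isdigit():
            port, _enabled, _group, status, partner, partner_status = fields
            if (status, partner, partner_status) != ("Up", "Yes", "Success"):
                problems.append(port)
    return problems


SHOW_LACP = """\
LACP Trunk Port LACP
 Port Enabled Group   Status  Partner Status
 ---- ------- ------- ------- ------- -------
 14   Active  Trk1    Up      Yes     Success
 16   Active  Trk1    Up      Yes     Success
"""
print(lacp_ports_not_ok(SHOW_LACP))  # []
```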

Hooray. A working, two-port Ethernet LACP trunk.

I did some testing, pulling Ethernet cables from the two-port trunk one at a time to see how many packets were dropped and what the switch logs looked like.

64 bytes from icmp_seq=204 ttl=63 time=5.204 ms
Request timeout for icmp_seq 207
64 bytes from icmp_seq=208 ttl=63 time=5.223 ms
64 bytes from icmp_seq=224 ttl=63 time=4.502 ms
Request timeout for icmp_seq 225
Request timeout for icmp_seq 226
Request timeout for icmp_seq 227
64 bytes from icmp_seq=228 ttl=63 time=7.078 ms

Four packets dropped twice. Not bad! What does this look like from the switch?
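Tallying the excerpt above by hand is easy enough, but here is a small sketch of doing it in Python, assuming BSD/macOS-style ping output (which is what the "Request timeout" lines look like). Note it only counts what the excerpt shows; sequence numbers missing from the pasted text (205, 206, 209 through 223) are not counted either way.

```python
import re


def tally_ping(lines):
    """Split BSD/macOS-style ping output into replied and timed-out sequence numbers."""
    replied, timed_out = [], []
    for line in lines:
        if m := re.search(r"icmp_seq=(\d+)", line):
            replied.append(int(m.group(1)))
        elif m := re.search(r"Request timeout for icmp_seq (\d+)", line):
            timed_out.append(int(m.group(1)))
    return replied, timed_out


EXCERPT = """\
64 bytes from icmp_seq=204 ttl=63 time=5.204 ms
Request timeout for icmp_seq 207
64 bytes from icmp_seq=208 ttl=63 time=5.223 ms
64 bytes from icmp_seq=224 ttl=63 time=4.502 ms
Request timeout for icmp_seq 225
Request timeout for icmp_seq 226
Request timeout for icmp_seq 227
64 bytes from icmp_seq=228 ttl=63 time=7.078 ms
"""
replied, timed_out = tally_ping(EXCERPT.splitlines())
print(len(timed_out), timed_out)  # 4 [207, 225, 226, 227]
```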

switch1# log Trk1
 Keys: W=Warning I=Information
 M=Major D=Debug E=Error
 ---- Event Log listing: Events Since Boot ----
 I 10/22/16 14:00:51 00078 ports: trunk Trk1 is now active
 I 10/22/16 14:00:53 00076 ports: port 14 in Trk1 is now on-line
 I 10/22/16 14:00:59 00079 ports: trunk Trk1 is now inactive
 I 10/22/16 14:00:59 00078 ports: trunk Trk1 is now active
 I 10/22/16 14:01:02 00076 ports: port 16 in Trk1 is now on-line
 I 10/22/16 14:02:01 00077 ports: port 14 in Trk1 is now off-line
 I 10/22/16 14:02:20 00076 ports: port 14 in Trk1 is now on-line
 I 10/22/16 14:04:50 00077 ports: port 16 in Trk1 is now off-line
 I 10/22/16 14:05:02 00076 ports: port 16 in Trk1 is now on-line
 I 10/22/16 14:05:09 00077 ports: port 14 in Trk1 is now off-line
 I 10/22/16 14:05:21 00076 ports: port 14 in Trk1 is now on-line
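To put numbers on those log entries, here is a sketch (a hypothetical helper, assuming the ProCurve event-log line format shown above) that pairs each port's off-line event with its next on-line event. Keep in mind this measures how long the switch considered the port out of the trunk, with whole-second resolution; it is not the same thing as the packet-loss window, since traffic could keep flowing over the other member port.

```python
import re
from datetime import datetime

# e.g. "I 10/22/16 14:02:01 00077 ports: port 14 in Trk1 is now off-line"
EVENT_RE = re.compile(
    r"(\d\d/\d\d/\d\d \d\d:\d\d:\d\d) \d+ ports: port (\d+) in \S+ is now (on|off)-line"
)


def port_outages(log_text):
    """Pair each 'off-line' event with the port's next 'on-line' event.

    Returns (port, seconds_offline) tuples in log order.
    """
    went_down = {}
    outages = []
    for stamp, port, state in EVENT_RE.findall(log_text):
        when = datetime.strptime(stamp, "%m/%d/%y %H:%M:%S")
        if state == "off":
            went_down[port] = when
        elif port in went_down:
            down_for = when - went_down.pop(port)
            outages.append((port, int(down_for.total_seconds())))
    return outages


LOG = """\
 I 10/22/16 14:00:53 00076 ports: port 14 in Trk1 is now on-line
 I 10/22/16 14:01:02 00076 ports: port 16 in Trk1 is now on-line
 I 10/22/16 14:02:01 00077 ports: port 14 in Trk1 is now off-line
 I 10/22/16 14:02:20 00076 ports: port 14 in Trk1 is now on-line
 I 10/22/16 14:04:50 00077 ports: port 16 in Trk1 is now off-line
 I 10/22/16 14:05:02 00076 ports: port 16 in Trk1 is now on-line
 I 10/22/16 14:05:09 00077 ports: port 14 in Trk1 is now off-line
 I 10/22/16 14:05:21 00076 ports: port 14 in Trk1 is now on-line
"""
print(port_outages(LOG))  # [('14', 19), ('16', 12), ('14', 12)]
```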

Hopefully this end-to-end configuration example helps others.


Get off my lawn, DMZ edition.

I recently changed Internet providers from Comcast Business to a Verizon Fios connection. The Fios package includes TV set-top boxes (STBs), which use coax for video and reach the Internet over MoCA for guide data. It made me curious: what kind of traffic were these things sending on the network? What would they be trying to access? And how hard would it be to DMZ these things off from the rest of my wired/wifi network, given I have no idea what they are up to? Behold, a DMZ configuration with these goals:


  • Cable boxes need to get out to the Internet.
  • Cable boxes should not be able to touch anything else on my home network except what’s inside the DMZ.
  • My wifi/wired networks should be able to initiate connections to the DMZ devices. For science, of course (but mostly for seeing what they are doing).
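Those three goals boil down to a default-deny table of which zone may initiate connections to which. Here is a minimal Python model of that policy, purely as a sketch for reasoning about it and not a firewall config. The zone names are made up, the DMZ-to-DMZ and LAN-to-Internet/LAN entries are my assumptions rather than stated goals, and reply traffic is assumed to be handled by a stateful firewall.

```python
from enum import Enum


class Zone(Enum):
    INTERNET = "internet"
    DMZ = "dmz"   # the cable boxes
    LAN = "lan"   # wired and wifi networks


# (source, destination) pairs allowed to *initiate* connections.
# A stateful firewall is assumed to let reply traffic back through.
ALLOWED = {
    (Zone.DMZ, Zone.INTERNET),  # cable boxes need to get out to the Internet
    (Zone.DMZ, Zone.DMZ),       # assumption: DMZ devices may talk to each other
    (Zone.LAN, Zone.DMZ),       # LAN may initiate connections into the DMZ
    (Zone.LAN, Zone.INTERNET),  # assumption: LAN keeps its normal Internet access
    (Zone.LAN, Zone.LAN),       # assumption: LAN hosts may talk to each other
}


def may_initiate(src, dst):
    """Default deny: anything not listed in ALLOWED is blocked."""
    return (src, dst) in ALLOWED


print(may_initiate(Zone.DMZ, Zone.LAN))  # False: cable boxes can't reach the house
print(may_initiate(Zone.LAN, Zone.DMZ))  # True
```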


I wrote my own network latency monitoring agent in Go

For a while I had used Smokeping to generate pretty graphs of network latency between various hosts on my network. The downside with Smokeping was always getting it working. Did I configure my webserver just right? Did I remember to save the webserver configs so that the next time I set this up, things just worked? Did I install all the right Perl modules (and the right versions of each) so that Smokeping’s binary worked? Then there were the differences in operation depending on whether I ran it on Linux, OpenBSD, or FreeBSD. There had to be a simpler solution.

I’ve been dabbling in Go and Graphite as side projects at home for a while. Go was a language I’d been wanting to use more given its popularity where I work. Graphite was always this itch I scratched whenever I wanted to visualize machine and network statistics for the various machines on my network. I knew I could come up with a simple solution using these two pieces of tech.

I wanted to start small. Smokeping provides graphs of the minimum, maximum, average, and standard deviation of round-trip times, as well as packet loss. These are all statistics provided by the ping command-line tool. Why couldn’t I just wrap ping in a Go binary and send those data points off to Carbon for graphing in Graphite?
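The "wrap ping" idea mostly comes down to scraping ping's summary lines. As an illustration only (in Python rather than Go, and not the parsing code from my project), here is a sketch that handles both the BSD/macOS and the Linux summary formats; the sample summary is illustrative, not captured from a real run.

```python
import re

# BSD/macOS: "round-trip min/avg/max/stddev = 4.502/5.502/7.078/1.100 ms"
# Linux:     "rtt min/avg/max/mdev = 4.502/5.502/7.078/1.100 ms"
RTT_RE = re.compile(
    r"min/avg/max/(?:stddev|mdev) = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)
LOSS_RE = re.compile(r"([\d.]+)% packet loss")


def parse_ping_summary(output):
    """Extract packet loss (percent) and RTT statistics (ms) from ping's summary."""
    stats = {}
    if m := LOSS_RE.search(output):
        stats["loss_pct"] = float(m.group(1))
    if m := RTT_RE.search(output):
        for name, value in zip(("min", "avg", "max", "stddev"), m.groups()):
            stats[name] = float(value)
    return stats


SUMMARY = """\
10 packets transmitted, 8 packets received, 20.0% packet loss
round-trip min/avg/max/stddev = 4.502/5.502/7.078/1.100 ms
"""
print(parse_ping_summary(SUMMARY))
```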

I present the resultant Go binary and library.

parallelping is a Go binary used to ping remote hosts in parallel. If provided with a Carbon host and port, it ships the data off to Carbon/Graphite.
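The "in parallel" part is a plain fan-out: run one ping per host concurrently and collect the results. parallelping does this in Go; the following is just a Python sketch of the same shape. `ping_one` is a hypothetical callable that you would supply, for example one that runs the system ping through subprocess and feeds its output to a summary parser.

```python
from concurrent.futures import ThreadPoolExecutor


def ping_all(hosts, ping_one, max_workers=8):
    """Call ping_one(host) for every host concurrently; return {host: result}."""
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input hosts.
        return dict(zip(hosts, pool.map(ping_one, hosts)))


# Stand-in for a real ping function, so the example runs without a network.
print(ping_all(["nas1", "switch1"], len))  # {'nas1': 4, 'switch1': 7}
```

Threads are fine here because each worker spends nearly all its time waiting on an external ping process, not on Python code.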

carbon-golang is a Go library used to take Carbon metrics and send them off to a Carbon Cache over TCP. I admit I borrowed a lot of the logic from marpaia/graphite-golang, partly because I couldn’t quite get that library to integrate as documented, but also because I wanted the learning experience of building my own Go-based TCP client.
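On the wire, Carbon's plaintext protocol is simple: one `<metric.path> <value> <unix-timestamp>` line per data point, written to a TCP connection (port 2003 by default). Here is a Python sketch of that idea, not a port of carbon-golang; the metric names are made up for illustration.

```python
import socket
import time


def format_carbon(metrics, timestamp=None):
    """Render {metric_path: value} as Carbon plaintext lines: '<path> <value> <unix-time>'."""
    ts = int(time.time() if timestamp is None else timestamp)
    return "".join(f"{path} {value} {ts}\n" for path, value in sorted(metrics.items()))


def send_to_carbon(host, metrics, port=2003, timestamp=None, timeout=5.0):
    """Open a TCP connection to a Carbon cache and write the metrics in one batch."""
    payload = format_carbon(metrics, timestamp).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)


# Hypothetical metric names; only the line format matters here.
print(format_carbon({"ping.nas1.avg": 5.204, "ping.nas1.loss_pct": 0.0},
                    timestamp=1477144851), end="")
```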

Both of these are my first non-trivial pieces of Go code. The more time I spent with Go, the more I found its barrier to entry was not as high as I had anticipated (I’ve mainly been a Python person for many years). Further usage documentation for each will eventually be found on its respective GitHub project page.


A brand new blog for 2016

A new year gave me an itch to scratch. For years I had been running a pretty standard setup when it came to blogging.

It was as vanilla a setup as one can get, running on a $10/month Linode instance out of their datacenter in Atlanta. I never used the VM much other than for keeping what was an almost-completely static blog. I never had any issues with it. I just wanted to try something new.

The new setup:

I save $5/month and run what I consider a more secure, simpler alternative. We’ll see how this goes.

From 0 to an OpenBSD install, with no hands and a custom disk layout

No one likes doing repetitive OS installs. You know the kind, where you click through a bunch of prompts for username, password, and partitioning scheme as fast as you can to get to the point where you can do some actual work. This happens to me every time OpenBSD releases new errata. OpenBSD is my OS of choice for firewalls/routers, and I use a fresh OS install as the baseline for building a -stable branch of install set files.

While OpenBSD had automated away most of those manual installation tasks with autoinstall(8), as of a week ago you still could not customize your disk layout. But thanks to commits by OpenBSD developers henning@ and rpe@, you can now specify your own disk layout programmatically for use during an automated install.

While building a new set of install files is not part of this post, continue reading to see how I got one step closer by completely automating the base OS install with my custom disk layout.

Using presentation slide decks and Undeadly write-ups, along with copious man page reading, I arrived at the following baseline infrastructure and configuration for a completely automated OpenBSD install.

First, a DHCP server on the local LAN, configured to provide both an ‘auto_install’ filename and a ‘next-server’ parameter. Together, these tell the PXE-booting host which file to fetch, and from which server, to run the next step of the install.

host openbsd-pxeboottest {
    hardware ethernet 52:54:aa:bb:cc:dd;
    filename "auto_install";
    next-server;
}

Next, a TFTP server prepared to handle the request for auto_install:

$ ps auxww -U _tftpd
USER     PID %CPU %MEM  VSZ  RSS TT  STAT STARTED      TIME COMMAND
_tftpd 16244  0.0  0.1  780 1172 ??  Is   Wed06AM   0:10.14 /usr/sbin/tftpd -4 -l /tftp

$ ls -al /tftp/
total 14312
drwxrwxr-x   2 root  wheel      512 May  8 19:46 .
drwxr-xr-x  17 root  wheel      512 May  6 06:24 ..
lrwxr-xr-x   1 root  wheel        7 May  6 06:25 auto_install -> pxeboot
-rw-r--r--   1 root  wheel  7612185 May  8 21:37 bsd
-rw-r--r--   1 root  wheel    80996 May  8 21:37 pxeboot

The pxeboot and bsd files are the same ones from the install set.

Last but not least is install.conf, the file containing the answers to the various questions OpenBSD asks during an install. This file must be in the root directory of an HTTP server on the host configured above as ‘next-server’, in my case

Aside from all the normal answers for installation, the new prompt for auto-configuring disk layout is:

URL to autopartitioning template = 

Autodisklabel config:

/       100M-*   75%
swap    10M-*    25%

This config states that, with minimums of 100MB for / and 10MB for swap, the disk layout should give 75% of the disk’s space to / and 25% to swap.

The install should continue as expected and reboot at the end. Upon logging in, I verified the disk layout was what I wanted.

# disklabel -pm wd0
# /dev/rwd0c:
...
#                size           offset  fstype [fsize bsize   cpg]
  a:          6159.5M               64  4.2BSD   2048 16384     1 # /
  b:          2029.8M         12614720    swap                    # none
  c:          8192.0M                0  unused

Based on the 8GB virtual disk file used for testing, ~6000MB for / and ~2000MB for swap fits the bill.
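As a sanity check on "fits the bill", here is the naive arithmetic: each partition gets its percentage of the disk, but never less than its minimum. This is a simplified model for comparison only, not disklabel’s actual allocation algorithm; the real sizes above (6159.5M and 2029.8M) differ slightly from a straight 75/25 split, so disklabel clearly does more than this.

```python
def naive_split(disk_mb, template):
    """Give each partition pct% of the disk, but never less than its minimum.

    template: list of (mountpoint, min_mb, pct) tuples.
    """
    return {mount: max(min_mb, disk_mb * pct / 100) for mount, min_mb, pct in template}


TEMPLATE = [("/", 100, 75), ("swap", 10, 25)]
print(naive_split(8192, TEMPLATE))  # {'/': 6144.0, 'swap': 2048.0}
```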

Kudos to all the developers involved with this new functionality. I look forward to using it more in the future.


All the bits, from anywhere.

Problem Statement: While OpenVPN has served me well over the past few years for both site-to-site and road-warrior style VPN connections, it always bugged me that I had to hack a config file, juggle certificates, and use a custom client that isn’t part of the base OS to bring up the links. My Android phone has a built-in L2TP/IPSec VPN client. My MacBook Pro running OS X 10.9 has both IPSec and L2TP VPN client GUIs wrapped around racoon. I run OpenBSD as my firewall/router gateway at home. There must be a solution here.

Goal: To allow all remote clients (both site-to-site and road-warrior) to connect and route all their traffic securely over the Internet through my OpenBSD machine at home.

Some of the hurdles I dealt with, and corners I knew I was cutting, to get the below solution working:

  • The devices I carry with me most of the time (Nexus 5 Android phone, OS X laptop) only support IKEv1, not IKEv2. Therefore I could not use iked on OpenBSD; I had to use isakmpd.
  • I know that using client certificates is the more secure way to go when authenticating IPSec traffic, but I used a pre-shared key in this example for expediency and simplicity. I plan to migrate to certificates once I get my head wrapped around easily managing them.

On to the configuration. First, the PPP server on the OpenBSD machine. A simple configuration using npppd, handing out IPs on, using statically configured usernames and passwords.

npppd.conf(5) configuration:

set user-max-session 2

authentication LOCAL type local {
    users-file "/etc/npppd/npppd-users"
}

tunnel L2TP protocol l2tp {
    listen on $external_ipv4_ip
    l2tp-hostname $external_dns_A_record
    idle-timeout 3600 # 1 hour
}

ipcp IPCP {
    pool-address ""
    dns-servers
}

interface tun1 address $vpn_endpoint_IP ipcp IPCP
bind tunnel from L2TP authenticated by LOCAL to tun1

Next was isakmpd, the daemon responsible for negotiating the security associations (SAs) used to encrypt and authenticate network traffic.

ipsec.conf(5) configuration, loaded into isakmpd via ipsecctl(8):

ike passive esp transport \
    proto udp from $(external_IPv4_IP) to any port 1701 \
    main auth hmac-sha1 enc aes group modp1024 \
    quick auth hmac-sha2-256 enc aes group modp1024 \
    psk dce930cbf010a35f336e640de0b7ff8e94b6b2a512d0ec41268e8e20a154fooo

For my PSK (pre-shared key), I used OpenSSL to generate this random string. Don’t worry, the following is not my PSK, and you should not copy this verbatim.

$ openssl rand -hex 32
dce930cbf010a35f336e640de0b7ff8e94b6b2a512d0ec41268e8e20a1546044
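If OpenSSL isn’t handy, Python’s standard `secrets` module produces an equivalent key. This is just an alternative sketch to the openssl command above: 32 random bytes from a cryptographically secure source, hex-encoded into 64 characters.

```python
import secrets


def generate_psk(num_bytes=32):
    """Return num_bytes of cryptographically secure randomness as a hex string,
    comparable to `openssl rand -hex 32` for the default of 32 bytes."""
    return secrets.token_hex(num_bytes)


psk = generate_psk()
print(len(psk), psk)  # 64 hex characters; the value differs on every run
```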

The PF rules below allow both authenticated and encrypted communication. I tried to be as specific as I could with all rules; rules like ‘from any to any’ were avoided at all costs. The last line is not specifically IPSec-related, but I will explain it after.

ext_if = "YOUR_EXTERNAL_INTERFACE_TO_THE_INTERNET"
ipsec_if = "enc0"
ipsec_tun = "tun1"
table <ipsec_net> { }

pass in proto { esp } from any to ($ext_if)
pass in on $ext_if proto udp from any to any port {500, 1701, 4500} keep state
pass in on $ipsec_if from any to ($ext_if) keep state (if-bound)
pass in on $ipsec_tun from <ipsec_net> to any keep state (if-bound)
pass out on $ext_if inet from <ipsec_net> to any nat-to ($ext_if)


The UDP ports opened by the second rule are used for:

  • 500: isakmpd key management (IKE)
  • 1701: L2TP (used by npppd)
  • 4500: IPSec NAT traversal (used by isakmpd)

The last rule allows remote road-warrior VPN clients to be NATed and route their traffic out through my OpenBSD machine. The reason ‘$ipsec_tun:network’ is not a viable macro to use in the NAT rule is that the tun1 interface created by npppd is configured with a /32 netmask, so it has no subnet attached to it. Try as I might, even with /etc/hostname.tun1 configured, when npppd comes up the interface is configured as pasted below. The only solution I found was specifying the network itself as either a table or a macro.

$ ifconfig tun1
tun1: flags=20043<UP,BROADCAST,RUNNING,NOINET6> mtu 1500
        priority: 0
        groups: tun
        status: active
        inet netmask 0xffffffff
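To see why the /32 matters, here is a quick check with Python’s `ipaddress` module. The addresses are hypothetical, since the real pool and tun1 address aren’t shown here: a rule written against tun1’s network only covers tun1’s own address, while a rule written against the pool network covers the VPN clients.

```python
import ipaddress

# Hypothetical addresses: the real pool and tun1 address are not shown in this post.
TUN1 = ipaddress.ip_interface("10.0.50.1/32")  # how npppd leaves tun1 configured
POOL = ipaddress.ip_network("10.0.50.0/24")    # what <ipsec_net> would need to hold


def matched_by(network, client_ip):
    """True if a rule written for `network` would match this client source address."""
    return ipaddress.ip_address(client_ip) in network


client = "10.0.50.23"
print(TUN1.network)                      # 10.0.50.1/32
print(matched_by(TUN1.network, client))  # False
print(matched_by(POOL, client))          # True
```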

Relevant rc.conf.local snippets:

# IPSec Rules
ipsec=yes
ipsec_rules=/etc/ipsec.conf
isakmpd_flags="-K"
npppd_flags="-f /etc/npppd/npppd.conf"

At this point, start npppd and isakmpd via their rc.d scripts. Given the ‘-K’ flag for isakmpd, it is absolutely critical that you load the IPSec rules manually via ipsecctl each time you restart isakmpd. This bit me many times when testing connections, as isakmpd complained about encryption mismatches between what the client sent and what the server expected.

# ipsecctl -f /etc/ipsec.conf 

The above ipsecctl command is the magic incantation you must run manually every time you restart isakmpd (the ipsec settings in rc.conf.local ensure it is run at boot).

At this point, I configured my client to use the above username, password, and pre-shared key to connect. I now had a working road-warrior style L2TP/IPSec VPN connection that I could use to access both my internal infrastructure and route traffic out through my Internet connection at home as if I was a client sitting on the internal network.

And now, some log snippets to show what it looks like when an active PPP/L2TP IPSec connection is made:

# npppctl session all
Ppp Id = 3
          Ppp Id                  : 3
          Username                : jforman
          Realm Name              : LOCAL
          Concentrated Interface  : tun1
          Assigned IPv4 Address   :
          Tunnel Protocol         : L2TP
          Tunnel From             : $REMOTE_IP
          Start Time              : 2015/04/25 15:05:40
          Elapsed Time            : 23 sec
          Input Bytes             : 9029 (8.8 KB)
          Input Packets           : 86
          Input Errors            : 2 (2.3%)
          Output Bytes            : 345
          Output Packets          : 14
          Output Errors           : 0 (0.0%)

Npppd logs:

Apr 25 15:05:38 VPNCONCENTRATOR npppd[9306]: l2tpd ctrl=6 logtype=Started RecvSCCRQ from=$(PUBLIC SOURCE IP OF VPN CLIENT):63771/udp tunnel_id=6/7 protocol=1.0 winsize=4 hostname=roadwarrior.theinter.net vendor=(no vendorname) firm=0000
Apr 25 15:05:38 VPNCONCENTRATOR npppd[9306]: l2tpd ctrl=6 SendSCCRP
Apr 25 15:05:38 VPNCONCENTRATOR npppd[9306]: l2tpd ctrl=6 logtype=Started RecvSCCRQ from=$(PUBLIC SOURCE IP OF VPN CLIENT):63771/udp tunnel_id=6/7 protocol=1.0 winsize=4 hostname=roadwarrior.theinter.net vendor=(no vendorname) firm=0000
Apr 25 15:05:38 VPNCONCENTRATOR npppd[9306]: l2tpd ctrl=6 RecvSCCN
Apr 25 15:05:38 VPNCONCENTRATOR npppd[9306]: l2tpd ctrl=6 SendZLB
Apr 25 15:05:39 VPNCONCENTRATOR npppd[9306]: l2tpd ctrl=6 call=18525 RecvICRQ session_id=16020
Apr 25 15:05:39 VPNCONCENTRATOR npppd[9306]: l2tpd ctrl=6 call=18525 SendICRP session_id=18525
Apr 25 15:05:39 VPNCONCENTRATOR npppd[9306]: l2tpd ctrl=6 call=18525 RecvICCN session_id=16020 calling_number= tx_conn_speed=1000000 framing=async
Apr 25 15:05:39 VPNCONCENTRATOR npppd[9306]: l2tpd ctrl=6 call=18525 logtype=PPPBind ppp=3
Apr 25 15:05:39 VPNCONCENTRATOR npppd[9306]: ppp id=3 layer=base logtype=Started tunnel=L2TP($(PUBLIC SOURCE IP OF VPN CLIENT):63771)
Apr 25 15:05:39 VPNCONCENTRATOR npppd[9306]: l2tpd ctrl=6 call=18525 SendZLB
Apr 25 15:05:39 VPNCONCENTRATOR npppd[9306]: l2tpd ctrl=6 call=18525 logtype=PPPBind ppp=3
Apr 25 15:05:42 VPNCONCENTRATOR npppd[9306]: ppp id=3 layer=lcp logtype=Opened mru=1360/1360 auth=MS-CHAP-V2 magic=145d130e/4efdd7cc
Apr 25 15:05:42 VPNCONCENTRATOR npppd[9306]: ppp id=3 layer=chap proto=mschap_v2 logtype=Success username="$USERNAME" realm=LOCAL
Apr 25 15:05:43 VPNCONCENTRATOR npppd[9306]: ppp id=3 layer=ccp CCP is stopped
Apr 25 15:05:45 VPNCONCENTRATOR npppd[9306]: ppp id=3 layer=ipcp logtype=Opened ip= assignType=dynamic
Apr 25 15:05:45 VPNCONCENTRATOR npppd[9306]: ppp id=3 layer=base logtype=TUNNELSTART user="$USERNAME" duration=6sec layer2=L2TP layer2from=$(PUBLIC SOURCE IP OF VPN CLIENT):63771 auth=MS-CHAP-V2 ip= iface=tun1
Apr 25 15:05:45 VPNCONCENTRATOR npppd[9306]: ppp id=3 layer=base Using pipex=yes
Apr 25 15:05:45 VPNCONCENTRATOR npppd[9306]: ppp id=3 layer=base logtype=TUNNELSTART user="$USERNAME" duration=6sec layer2=L2TP layer2from=$(PUBLIC SOURCE IP OF VPN CLIENT):63771 auth=MS-CHAP-V2 ip= iface=tun1
Apr 25 15:05:45 VPNCONCENTRATOR /bsd: pipex: ppp=3 iface=tun1 protocol=L2TP id=18525 PIPEX is ready.
Apr 25 15:05:45 VPNCONCENTRATOR npppd[9306]: ppp id=3 layer=base Using pipex=yes

ipsecctl output showing flows and security associations:

# ipsecctl -s all
flow esp in proto udp from $(PUBLIC SOURCE IP OF VPN CLIENT) port 63771 to $(PUBLIC IPV4 VPN TERMINATOR IP) port l2tp peer $(PUBLIC SOURCE IP OF VPN CLIENT) srcid $(PUBLIC IPV4 VPN TERMINATOR IP)/32 dstid $(RFC1918 IP of VPN CLIENT)/32 type use
flow esp out proto udp from $(PUBLIC IPV4 VPN TERMINATOR IP) port l2tp to $(PUBLIC SOURCE IP OF VPN CLIENT) port 63771 peer $(PUBLIC SOURCE IP OF VPN CLIENT) srcid $(PUBLIC IPV4 VPN TERMINATOR IP)/32 dstid $(RFC1918 IP of VPN CLIENT)/32 type require
flow esp out from ::/0 to ::/0 type deny

SAD:
esp transport from $(PUBLIC IPV4 VPN TERMINATOR IP) to $(PUBLIC SOURCE IP OF VPN CLIENT) spi 0x083cd308 auth hmac-sha1 enc aes-256
esp transport from $(PUBLIC SOURCE IP OF VPN CLIENT) to $(PUBLIC IPV4 VPN TERMINATOR IP) spi 0x5f9cd5a0 auth hmac-sha1 enc aes-256


Many, many OpenBSD man pages: isakmpd(8), iked(8), ipsec.conf(5), npppd(8), npppd.conf(5)

Family Tech Support: Vacation Edition

This was an epic visit home, tech-wise. Just so I don’t forget, and so I can hold it over my folks’ heads for a while:

  • Upgraded two five-year-old Linksys E2000 APs to Netgear R6250s. The old ones just weren’t reaching the entire length of the house anymore.
  • Upgraded the firewall/router from OpenBSD 5.5-stable to OpenBSD 5.6-stable. It just so happens I’m home every six months to stay relatively close to the most recent errata.
  • Converted my father’s Gmail account over from one-factor to two-factor authentication thanks to some nasty spyware/adware and potential identity-theft issues he’s had recently. I wasn’t willing to do this conversion remotely given the horror of Application Specific Passwords and how many devices I would have to do it on (desktops, laptops, one iPhone, and one iPad).
  • Reinstalled a Late 2009 21.5″ iMac with OS X 10.10 via Internet Recovery due to the aforementioned nasty adware infestation.
  • Upgraded that same iMac from 4GB to 16GB of RAM.

All I can say is that it’s nice having all Macs in the house now, after finally kicking out the last Windows-based PC on my previous visit.

Third time’s a charm? Gitolite, Git, Nagios, and a bunch of hooks

I was hoping that with my past posts on this topic, I would have enough examples to just copy and paste along to configure my Gitolite+Nagios monitoring setup. Not so. It looked like there were semicolons missing in my past examples. After looking at the huge number of changes in Gitolite, I had to redo everything. Not to mention I always wanted a better way to manage the hooks, as opposed to editing them directly on the host. In short, my goal is still simple: be able to manage and verify Nagios configuration remotely via Git. Below is how I did it. For the third time.

First, install Gitolite. I run Gitolite under the ‘git’ user on my Ubuntu-based VM, from now on called monitor1. I clone the Gitolite source under /home/git/gitolite.

In /home/git/.gitolite.rc, in the %RC block, uncomment:

LOCAL_CODE => "$rc{GL_ADMIN_BASE}/local", 

This option tells Gitolite we have some local configuration in our gitolite-admin repository under the ‘local’ directory. More on this later.

In the ENABLE list, uncomment:

'repo-specific-hooks',

This option tells Gitolite we want to be able to use repo-specific hooks, as opposed to having one set of hooks for all repositories.

Since several of our yet-to-be-defined hooks need elevated permissions, I have configured a sudoers file to allow this.

%nagios-admin ALL=(ALL) NOPASSWD: /usr/sbin/nagios3 -v /tmp/nagiostest-*/nagios.cfg
%nagios-admin ALL=(ALL) NOPASSWD: /usr/sbin/service nagios3 restart

Our ‘nagios’ user is added to the nagios-admin group, along with the git user. This allows us via Gitolite’s hooks to test, update, and restart the Nagios installation.

This concludes all the work on monitor1 as it relates to Gitolite.

On your local workstation, clone the gitolite-admin repo. I’ve chosen to name the repo containing my Nagios configuration, ‘nagios’. At this point, it is probably safe to copy a known-working copy of your Nagios configuration files into the nagios repository itself. The steps following here, if done in one fell swoop, could completely blow away your /etc/nagios3 directory on monitor1 if you are not careful.

One modification necessary to nagios.cfg itself is changing the references to the paths of the configuration files. By default, nagios.cfg lists absolute paths to the files, e.g. /etc/nagios3/conf.d/. In our case, we will be checking out the configuration files to a temporary directory while we run our pre-flight checks, so we need relative paths instead to make this possible.

Therefore in your nagios.cfg file, perform the following changes:

cfg_file=commands.cfg
cfg_dir=conf.d

Now that I look at it, I’m not quite sure why these are specified separately, as your commands.cfg file could live under conf.d. But I’ll leave that for readers who have their own structure preferences. The key here is that relative paths must be used, NOT absolute ones.

Next, we move on to the gitolite-admin configuration:

repo nagios
    RW+ = jforman_rsa
    option hook.pre-receive = nagios-pre-receive
    option hook.post-receive = nagios-post-receive
    option hook.post-update = nagios-post-update

This tells Gitolite the name of my Nagios config repository, who is ACL’d to read and write to it, and which hooks I wish to override with my custom hooks. Note that in Gitolite, you can only override these three hooks: pre-receive, post-receive, and post-update. Other hooks, such as post-merge and merge, are special to Gitolite, and you will get an error if you attempt to override them. Each hook name (nagios-pre-receive and so on) corresponds to a file that will live in my gitolite-admin repository under the ‘local’ directory.

Now we come to defining our hooks. Under your gitolite-admin directory, create the directory structure ‘hooks/repo-specific’ under the directory we defined in the LOCAL_CODE setting above. In our case, that is ‘local’.

In other words, in our local checkout of the gitolite-admin repository:

mkdir -p ${gitolite_admin_path}/local/hooks/repo-specific 

Under this repo-specific directory, using whatever language you prefer (Python, Shell, Perl, etc), create the files for the repository’s hooks.

gitolite-admin$ tree local/
local/
└── hooks
    └── repo-specific
        ├── nagios-post-receive
        ├── nagios-post-update
        └── nagios-pre-receive


#!/bin/bash
umask 022

# Git feeds one "<old-sha> <new-sha> <refname>" line per pushed ref on stdin.
while read -r OLD_SHA1 NEW_SHA1 REFNAME; do
  export GIT_WORK_TREE="/tmp/nagiostest-$NEW_SHA1"
  mkdir -p "$GIT_WORK_TREE"
  # Check out the pushed commit into the temporary work tree.
  /usr/bin/git checkout -f "$NEW_SHA1"
  # Run the Nagios pre-flight check against the checked-out config.
  if ! sudo /usr/sbin/nagios3 -v "$GIT_WORK_TREE/nagios.cfg"; then
    echo "Nagios Preflight Failed"
    echo "See the above error, fix your config, and re-push to attempt to update Nagios."
    exit 1
  else
    echo "Nagios Preflight Passed"
    echo "Clearing temporary work directory."
    rm -rf "$GIT_WORK_TREE"
    exit 0
  fi
done

nagios-pre-receive: Using the most recent commit, check out this body of work into a temporary directory and run the Nagios pre-flight checks over it. If those checks pass, exit 0 (without error). If the pre-flight checks fail, error out. This latter case stops the Git push completely, and your running Nagios configs in /etc/nagios3 are untouched.
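One caveat: as written, the shell hook exits inside the loop, so it only ever checks the first pushed ref, and it does not special-case ref deletions, where Git sends an all-zero new SHA. If you want to be more careful, here is a Python sketch of parsing the pre-receive input; it is a hypothetical helper, not a drop-in replacement for the hook. In a real hook you would call it on `sys.stdin.read()`.

```python
ZERO_SHA = "0" * 40  # Git sends this as the new SHA when a ref is deleted (SHA-1 repos)


def refs_to_check(stdin_text):
    """Parse pre-receive input: one '<old-sha> <new-sha> <refname>' line per pushed ref.

    Returns (refname, new_sha) pairs that need a pre-flight check, skipping deletions.
    """
    refs = []
    for line in stdin_text.splitlines():
        if not line.strip():
            continue
        _old_sha, new_sha, refname = line.split()
        if new_sha != ZERO_SHA:
            refs.append((refname, new_sha))
    return refs


# Made-up SHAs: one branch update and one branch deletion.
example = f"{'a' * 40} {'b' * 40} refs/heads/master\n{'c' * 40} {ZERO_SHA} refs/heads/old\n"
print(refs_to_check(example))
```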


#!/bin/bash
echo "Updating repo /etc/nagios3"
/usr/bin/update-gitrepo /etc/nagios3

nagios-post-receive: This runs a companion script to update the cloned Git repository at /etc/nagios3. Note that this only runs if the pre-receive hook succeeds, and it executes after the push has updated the refs, which means our Gitolite nagios repository has now accepted the commits we are pushing.


#!/bin/bash
sudo chown -R root:nagios-admin /etc/nagios3
sudo /usr/sbin/service nagios3 restart

nagios-post-update: The post-update hook runs after post-receive; it ensures our permissions are correct and then restarts Nagios.

At this point, the custom hooks and gitolite.conf should be committed and pushed to the remote Gitolite gitolite-admin repository. No more storing hooks in the bare repository itself! The hooks themselves are version controlled. This was what bugged me the most about my prior solutions. I always hated not having any history on how I fixed (broke) the scripts in the past.


#!/bin/bash
umask 022

REPO_DIR=$1
cd "$REPO_DIR" || exit 1
# Hooks run with GIT_DIR set; unset it so git uses the checkout in REPO_DIR.
unset GIT_DIR
/usr/bin/git pull origin master

update-gitrepo: Lives on monitor1 under /usr/bin and merely runs ‘git pull’ in the directory passed to it.

This last bit of configuration I’m not too happy with, given it feels so manual and hacky. It’s how you get /etc/nagios3 to be the Git repository checkout itself. On monitor1, I normally do this initial work in /tmp, as the ‘git’ user.

cd /tmp
git clone /home/git/repositories/nagios.git/
(as root) mv /etc/nagios3 /etc/nagios3.notgit
(as root) mv /tmp/nagios /etc/nagios3
(as root) chown -R root:nagios-admin /etc/nagios3
/usr/bin/update-gitrepo /etc/nagios3

This performs a checkout of the Nagios repository itself (note that we’re skirting Gitolite’s ACL control by accessing the repository directly on the file system). If you wish to further control who can check out the nagios repository, create an SSH key for your local ‘git’ user, add it to the gitolite-admin repository ACL, and perform your checkout over SSH instead of using the absolute path of the directory.

Wow. This post turned out to be much longer than I expected. If you’ve made it this far, you should have the groundwork laid to clone your Nagios configuration files remotely and to run the pre-flight check to verify their correctness before they ever get near your production Nagios directory. Good luck.