Finally! #
Like many others, I’ve been waiting for Raspberry Pi 4 stock to come back so I could start a new project I’ve been wanting to work on for a while, and a couple of weeks ago it finally happened. I picked up eight Pi 4s and got to finalising the design.
OK hold up, why? #
Great question. Before we get started, what’s the point? I’ve been wanting to move the applications I run in my homelab away from a single point of failure for a while now, not only for better service availability (which, let’s be honest, is only for my benefit and nerd points) but also as a learning experience, to start understanding the practicality behind these architectures. It’s all well and good knowing “how it works in theory”, but there’s a load of nuance to unpack in the real world. And if that doesn’t justify it for you, it’s fun!
Good enough for me, let’s get started.
The Plan #
Hardware #
The Pis sorted the compute side of the upgrades, but there were some other considerations to make. Firstly, heat. Pi 4s running a heavy load can creep up into the uncomfortable (for me) temperature range, so I picked up some heatsinks. I’m yet to run a heavy workload on all the Pis simultaneously, but at idle they’re now down 10℃, which is much more tolerable. Secondly, storage. I had spent a few evenings researching storage for the Pis; knowing that SD cards can be unreliable over time, I wanted to upgrade that, and thankfully the Pi 4 firmware has supported USB boot by default since 2020, which is one of the fastest storage configurations you can set up on a Pi, so I went with that. I bought some 500GB NVMe SSDs and external enclosures to keep everything nice and pretty, then bought a rack shelf to mount at the back of the rack to hold them all. Speaking of the rack, the last thing to do on the hardware side was to install the Pis into it. I had already bought the 3U rackmount for NUCs from MyElectronics linked in my first blog post, so it was just a matter of installing the Pis into that mount (with the heatsinks I needed to order some extra M2.5 screws), and that basically covers all the Pi-related upgrades.
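If you’re planning the same USB boot setup, it’s worth confirming the bootloader is recent enough before pulling the SD card. A minimal sketch on Raspberry Pi OS, assuming the rpi-eeprom package is installed (USB mass storage boot shipped in the stable bootloader around late 2020):

```bash
# Show the currently installed bootloader version
vcgencmd bootloader_version

# Check whether a newer EEPROM release is available
sudo rpi-eeprom-update

# Apply any pending bootloader update, then reboot for it to take effect
sudo rpi-eeprom-update -a
sudo reboot
```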
While I was at it, I thought it’d be a good time to sort out cabling, so I bought some 28AWG Cat6 cables from FS.com, colour coded to VLANs, even though my switch doesn’t support VLANs (there’s always the next upgrade to fix that problem 😉). Oh, and I finally received delivery of some rails for my storage server, so those got installed too.
Software #
I knew I wanted to use Kubernetes, mainly because it’s not what I work with at my full-time job: something new, with a whole bunch of depth and complexity that I wanted to get into. I had heard that K8S can be quite challenging to run on low-spec hardware like the Raspberry Pis though, so I was mainly looking at K3S, a lightweight Kubernetes distribution.
After doing some research, I found this excellent repo by Techno-Tim. I can’t shout this guy out enough: his content is top quality, covers a whole bunch of technical topics in very easy to follow videos, and explains the core concepts well and succinctly. Seriously, one of my favourite channels. The example he goes through in the video covers a simpler five-node setup though: three servers and two agents. Since I’ve got more nodes in my cluster, I’ve also created three virtual machines on the NUC Proxmox node: one server and two agents. My thinking here is that I don’t want to tie up three Pis with managing the cluster, I want some agents running on x86 hardware, and I want to be able to lose at least one physical server before my `kubectl` commands stop working. So my final configuration has three highly available K3S servers managing the cluster (one of which is x86) and eight agents for my workloads, two of which are virtualised.
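For context, under the hood the playbook is doing roughly what a manual K3S HA install would: bootstrap a first server with embedded etcd, join the remaining servers against the API VIP, then join the agents. A rough sketch of those underlying commands (the token and the VIP address below are placeholders, and the Ansible playbook automates all of this):

```bash
# First server: initialise the embedded etcd cluster
curl -sfL https://get.k3s.io | K3S_TOKEN=<secret> sh -s - server --cluster-init

# Additional servers: join the existing cluster through the VIP
curl -sfL https://get.k3s.io | K3S_TOKEN=<secret> sh -s - server --server https://192.168.1.50:6443

# Agents: join as workers only
curl -sfL https://get.k3s.io | K3S_TOKEN=<secret> sh -s - agent --server https://192.168.1.50:6443
```

A quick `kubectl get nodes -o wide` afterwards confirms every Pi and VM has joined with the expected role.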
Installation #
After following that guide I had a cluster up and responding on the VIP, copied my `.kube/config` down to my machines, and I was ready to go. First up was installing Traefik. I’ve bought a domain for internal use so I can have valid certificates for all my internal services, issued via the DNS-01 challenge type. Once I had that set up on my existing services, I started migrating services from Docker containers and “jails” on TrueNAS Core into containers in the cluster. Some services I’ve left in virtual machines because they’re not designed for HA, so simply replicating a virtual machine works out to be the simpler path to HA for those applications.
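To give a flavour of what exposing one of those migrated services looks like, here’s a sketch of a Traefik v2 IngressRoute that picks up a DNS-01 issued certificate. Everything in it is a placeholder (the hostname, namespace, service name, and port), and it assumes Traefik has already been configured with an ACME certificate resolver named `letsencrypt` using the dnsChallenge for your DNS provider:

```bash
kubectl apply -f - <<'EOF'
apiVersion: traefik.containo.us/v1alpha1   # Traefik v2 CRD API group
kind: IngressRoute
metadata:
  name: grafana
  namespace: monitoring
spec:
  entryPoints:
    - websecure                            # HTTPS entrypoint
  routes:
    - match: Host(`grafana.internal.example.com`)
      kind: Rule
      services:
        - name: grafana
          port: 3000
  tls:
    certResolver: letsencrypt              # resolver configured for DNS-01
EOF
```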
Next, I wanted to get Tailscale into my network so I could have remote access from anywhere, and that meant figuring out Kubernetes networking. Having never worked with it before, it can seem a bit daunting, and I’m by no means an expert after just a week of tinkering, but I got enough of a grasp to get remote access working, with DNS, via the subnet routing approach in the Tailscale docs.
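The subnet router side of that is pleasantly short. A sketch straight from the Tailscale subnet routing docs, run on whichever machine advertises the homelab network (the subnet below is a placeholder, and the advertised route still has to be approved in the Tailscale admin console):

```bash
# Enable IP forwarding on the machine acting as the subnet router
echo 'net.ipv4.ip_forward = 1' | sudo tee -a /etc/sysctl.d/99-tailscale.conf
echo 'net.ipv6.conf.all.forwarding = 1' | sudo tee -a /etc/sysctl.d/99-tailscale.conf
sudo sysctl -p /etc/sysctl.d/99-tailscale.conf

# Advertise the homelab subnet to the tailnet
sudo tailscale up --advertise-routes=192.168.1.0/24
```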
Some of the services I wanted required persistent disk access though, so Kubernetes storage was next on the list. I chose Longhorn, mainly because it’s directly mentioned in the K3S docs as supported. This is where the problems started coming in…
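For anyone following along, the Helm route from the Longhorn docs looks roughly like this (chart repo and names as per their documentation; a sketch, not a recommendation given what follows):

```bash
# Add Longhorn's chart repository and install it into its own namespace
helm repo add longhorn https://charts.longhorn.io
helm repo update
helm install longhorn longhorn/longhorn \
  --namespace longhorn-system --create-namespace
```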
Problems #
Installing Longhorn on K3S is made out to be very simple, but in my experience it has been far from it. After following the official Longhorn documentation, I’ve had problems with the `core-dns` and `metrics-server` containers in the `kube-system` namespace, and containers stuck in the `ContainerCreating` status within the Longhorn namespace, which basically means that my cluster can’t monitor the running state of containers across my nodes; the Traefik containers I had running on the network also stop responding. I’ve spent a few days trying to debug this, including a complete re-installation of the cluster through the playbooks and installing Longhorn through one of the other installation methods, but there’s obviously something I’m missing here that I just need more time to figure out.
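If you want to poke at something similar yourself, these are the sort of generic debugging commands I’ve been leaning on (assuming the default `longhorn-system` namespace; `<pod-name>` is a placeholder). A missing `open-iscsi` package on a node is one commonly mentioned culprit for Longhorn pods stuck in `ContainerCreating`, though I’m not yet sure that’s my issue:

```bash
# Where are things stuck?
kubectl get pods -n longhorn-system -o wide
kubectl get pods -n kube-system

# Why is a pod stuck? The Events section at the bottom is usually the clue
kubectl describe pod <pod-name> -n longhorn-system

# Recent events across the cluster, newest last
kubectl get events -A --sort-by=.metadata.creationTimestamp

# Logs from the system components that are misbehaving
kubectl logs -n kube-system deploy/coredns
kubectl logs -n kube-system deploy/metrics-server

# Longhorn's documented node prerequisite (run on each node)
sudo apt install -y open-iscsi && sudo systemctl enable --now iscsid
```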
Conclusion #
So I would love to have a succinct conclusion for you, but unfortunately I’ve run into some problems that ate up more time than I would have liked during my limited time off. Still, that’s probably a good take-home lesson in itself for those who have read this far, and the time has been far from a failure: I’ve already learnt a bunch about how Kubernetes networking operates, as well as Kubernetes storage and architecting services for HA.
So the conclusion, like with many homelabs, is that there is no conclusion. I’ll keep working on my Kubernetes cluster to get as many of my applications as possible up and running, highly available, while learning more about Kubernetes along the way (and most likely breaking it a few more times just to be sure). I’ll post an update when I get to a configuration I’m happy with. New hardware is always exciting and opens up a lot more possibilities, and I’m very excited to learn a whole bunch and reach that desired end state.