pipe-bot, the program I successfully wrote in Rust, is very simple — it listens to standard in, then calls the Discord API based on each message:
A diagram showing the flow of execution for pipe-bot. After starting a Discord client, it waits for stdin, parses it, and then either sends a message, updates the Discord status, or logs an error based on the parsed message.
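If you squint, the whole program is a parse-and-dispatch loop. Here’s a rough sketch of that shape (the line format, names, and printed output are made up for illustration; the real bot calls the Discord API instead of printing):

use std::io::{self, BufRead};

// The three outcomes shown in the diagram above.
enum Action {
    SendMessage(String),
    SetStatus(String),
    Error(String),
}

// Hypothetical line format: "say <text>" or "status <text>".
fn parse(line: &str) -> Action {
    match line.split_once(' ') {
        Some(("say", text)) => Action::SendMessage(text.to_string()),
        Some(("status", text)) => Action::SetStatus(text.to_string()),
        _ => Action::Error(format!("unrecognized input: {line}")),
    }
}

fn main() {
    for line in io::stdin().lock().lines() {
        let line = line.expect("failed to read stdin");
        match parse(&line) {
            Action::SendMessage(text) => println!("would send: {text}"),
            Action::SetStatus(text) => println!("would set status: {text}"),
            Action::Error(err) => eprintln!("{err}"),
        }
    }
}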
However, I started with systemctl-bot, which monitors and controls systemd units, parses and shares a config file, reads async streams, and generally has weird edge cases. While it’s not overly complex, it’s a lot to get your head around when you’re also learning the borrow checker and async Rust.
A diagram showing the more complex flow of execution for systemctl-bot. It parses a config file, starts a Discord client, and then branches off into two threads: one handling status updates, and the other handling commands (after registering them with the Discord client.) The status thread waits for unit status updates, then fetches all units' statuses and updates Discord. The command loop waits for a user command, calls systemctl with the appropriate arguments (provided that the targeted unit is in the config file) and then posts the results. If the unit was not in the config file it logs an error instead.
Async Rust
I anticipated fighting with the borrow checker, but—oh boy!—it pales in comparison to writing and understanding async Rust. Since I was coming from the world of “””enterprise software”””, I was used to writing with a level of indirection to facilitate code reuse, unit testing, and refactoring. However, async Rust makes you pay for indirection that introduces more state, or more complex state, because that state has to be carried inside the future while an async call is in progress. Watch this video to hear someone much smarter than me explain why the current state of async Rust ain’t quite it yet:
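As a tiny, hypothetical illustration (this is not systemctl-bot code, just the shape of the problem): once a dependency sits behind a trait, a future that holds it across an .await tends to need Arc plus Send and Sync bounds before a multi-threaded executor will accept it, because everything alive across the .await becomes part of the future’s own state.

use std::sync::Arc;

// Stand-in for "some dependency hidden behind an interface".
trait Notifier {
    fn notify(&self, msg: &str);
}

// The notifier is held across the .await below, so it is stored inside the
// future itself; that is where the Arc + Send + Sync ceremony comes from.
async fn watch_units(notifier: Arc<dyn Notifier + Send + Sync>) {
    // stands in for awaiting a stream of systemd status updates
    let event = std::future::ready("nginx.service changed state");
    let msg = event.await;
    notifier.notify(msg);
}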
Something possessed me to go full enterprise software sicko mode during the development of systemctl-bot and unit test every module to as close to 100% coverage as possible. I’m glad I did because it taught me more about generics and about Box, Rc, and Arc as I tried to find ways to mock dependencies, but it also taught me that this style of testing in Rust produces a huge glob of code that is painful to wrangle.
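The style looked roughly like this (a simplified, hypothetical sketch rather than real systemctl-bot code): every collaborator hides behind a trait, the concrete type comes in through a generic parameter, and tests swap in a fake.

// Stand-in for the thing that actually shells out to systemctl.
trait Systemctl {
    fn start(&self, unit: &str) -> Result<(), String>;
}

// Generic over its dependency so tests can inject a fake.
struct CommandHandler<S: Systemctl> {
    systemctl: S,
}

impl<S: Systemctl> CommandHandler<S> {
    fn handle_start(&self, unit: &str) -> String {
        match self.systemctl.start(unit) {
            Ok(()) => format!("started {unit}"),
            Err(e) => format!("failed to start {unit}: {e}"),
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    // A fake that always succeeds.
    struct FakeSystemctl;

    impl Systemctl for FakeSystemctl {
        fn start(&self, _unit: &str) -> Result<(), String> {
            Ok(())
        }
    }

    #[test]
    fn reports_success() {
        let handler = CommandHandler { systemctl: FakeSystemctl };
        assert_eq!(handler.handle_start("nginx.service"), "started nginx.service");
    }
}

Multiply that by every module and every dependency and you can see where the glob comes from.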
I decided to take a different approach while developing pipe-bot: I just mocked the outer edges of my program and let every test be an integration test. Any unit-level errors that mattered seemed to come up in these tests, and since my program was small it wasn’t difficult to identify the specific function where the error originated. I got 99% of the benefit of unit testing with 20% of the effort.
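Concretely, it looked something like this (again a hypothetical sketch, not pipe-bot’s real code): run() only knows about “somewhere lines come from” and “something that talks to Discord”, so main() can hand it stdin and a live client while tests hand it a string and a fake.

use std::io::BufRead;

// The only mocked edge: the thing that talks to Discord.
trait Discord {
    fn send_message(&mut self, text: &str);
}

// Everything between the edges runs unchanged in tests.
fn run(input: impl BufRead, discord: &mut impl Discord) {
    for line in input.lines() {
        let line = line.expect("failed to read input");
        discord.send_message(&line);
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[derive(Default)]
    struct FakeDiscord {
        sent: Vec<String>,
    }

    impl Discord for FakeDiscord {
        fn send_message(&mut self, text: &str) {
            self.sent.push(text.to_string());
        }
    }

    #[test]
    fn forwards_lines_to_discord() {
        let mut discord = FakeDiscord::default();
        run("hello\nworld".as_bytes(), &mut discord);
        assert_eq!(discord.sent, vec!["hello", "world"]);
    }
}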
Final thoughts
I enjoy Rust, but I respect Go. Rust is more fun to write, and the compiler’s strict checking is a superpower that ensures you don’t screw yourself up too badly. However, async Rust is a huge pain for me, and while Go is boring, sometimes boring is exactly the ticket to completing a project.
Keeping my NixOS servers up to date was dead simple before I switched to flakes – I enabled
system.autoUpgrade, and I was good to go. Trying the same with a shared flakes-based config introduced a
few problems:
1. I configured autoUpgrade to commit flake lock changes, but it ran as root. This created file permission issues since my user owned my NixOS config.
2. Even when committing worked, each machine piled up slightly different commits waiting for me to upstream.
I could have fixed issue #1 by changing the owner, but fixing #2 required me to rethink the process. Instead of having
each individual machine update its own lock file, I realized it would be cleaner to update the lock file upstream first,
and then rebuild each server from upstream. Updating the lock file first ensures there’s only one version of history,
and that makes it easier to reason about what is installed on each server.
Below is one method of updating the shared lock file before updating each server:
Updating flake.lock with GitHub Actions
The update-flake-lock GitHub Action updates your project’s flake lock file on a schedule. It essentially runs
nix flake update --commit-lock-file and then opens a pull request. Add it to your NixOS config repository like this:
# /.github/workflows/main.yml
name: update-dependencies
on:
  workflow_dispatch: # allows manual triggering
  schedule:
    - cron: '0 6 * * *' # daily at 1 am EST/2 am EDT
jobs:
  update-dependencies:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: DeterminateSystems/nix-installer-action@v12
      - id: update
        uses: DeterminateSystems/update-flake-lock@v23
Add this step if you want to automatically merge the pull request:
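Something like the following works (hedged: double-check the pull-request-number output name against the action’s docs; the repo also needs “Allow auto-merge” enabled, and the workflow token needs contents and pull-requests write permissions):

      # appended to the steps list above
      - name: Enable auto-merge
        run: gh pr merge --rebase --auto "$PR_NUMBER"
        env:
          GH_TOKEN: ${{ github.token }}
          PR_NUMBER: ${{ steps.update.outputs.pull-request-number }}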
Next, it’s time to configure NixOS to pull changes and rebuild. The configuration below adds two systemd services:

- pull-updates pulls config changes from upstream daily at 4:40. It has a few guardrails: it ensures the local repository is on the main branch, and it only permits fast-forward merges. You’ll want to set serviceConfig.User to the user owning the repository. If it succeeds, it kicks off rebuild.
- rebuild builds the new system configuration and makes it the boot default, then either activates it in place or reboots, depending on whether the kernel, initrd, or kernel modules changed.
systemd.services.pull-updates = {
  description = "Pulls changes to system config";
  restartIfChanged = false;
  onSuccess = [ "rebuild.service" ];
  startAt = "04:40";
  path = [ pkgs.git pkgs.openssh ];
  script = ''
    test "$(git branch --show-current)" = "main"
    git pull --ff-only
  '';
  serviceConfig = {
    WorkingDirectory = "/etc/nixos";
    User = "user-that-owns-the-repo";
    Type = "oneshot";
  };
};

systemd.services.rebuild = {
  description = "Rebuilds and activates system config";
  restartIfChanged = false;
  path = [ pkgs.nixos-rebuild pkgs.systemd ];
  script = ''
    nixos-rebuild boot
    booted="$(readlink /run/booted-system/{initrd,kernel,kernel-modules})"
    built="$(readlink /nix/var/nix/profiles/system/{initrd,kernel,kernel-modules})"
    if [ "''${booted}" = "''${built}" ]; then
      nixos-rebuild switch
    else
      reboot now
    fi
  '';
  serviceConfig.Type = "oneshot";
};
There are many possible variations. For example, in my real config I split the pull service into separate fetch and
merge services so I can fetch more frequently. You could also replace the GitHub action with a different scheduled
script, or change the rebuild service to never (or always!) reboot.
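Here’s a rough sketch of that split (the service names and timings are illustrative, not my exact config): fetch-updates grabs new commits every hour, and merge-updates fast-forwards and triggers the rebuild once a day.

systemd.services.fetch-updates = {
  description = "Fetches changes to system config";
  restartIfChanged = false;
  startAt = "hourly";
  path = [ pkgs.git pkgs.openssh ];
  script = "git fetch origin main";
  serviceConfig = {
    WorkingDirectory = "/etc/nixos";
    User = "user-that-owns-the-repo";
    Type = "oneshot";
  };
};

systemd.services.merge-updates = {
  description = "Merges previously fetched changes";
  restartIfChanged = false;
  onSuccess = [ "rebuild.service" ];
  startAt = "04:40";
  path = [ pkgs.git ];
  script = ''
    test "$(git branch --show-current)" = "main"
    git merge --ff-only origin/main
  '';
  serviceConfig = {
    WorkingDirectory = "/etc/nixos";
    User = "user-that-owns-the-repo";
    Type = "oneshot";
  };
};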
I restarted my server the other day, and I realized one of my systemd services failed to start on boot because the
Tailscale IP address was not assignable:
This is easy enough to fix. The service should wait to start until after Tailscale is online, so let’s just add
tailscaled.service to the service’s wants and after properties, reboot, and…
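In unit-file terms, that attempted fix looks something like this (the service name is a placeholder):

# my-service.service (hypothetical)
[Unit]
Wants=tailscaled.service
After=tailscaled.service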
Huh. It turns out Tailscale comes up a bit before its IP address is available. I was tempted to add an ExecStartPre
to my service to sleep for 1 second – gross! – but eventually I found systemd’s fabulous
systemd-networkd-wait-online command, which exits when a given interface has an IP
address. Call it with -i [interface name] and either -4 or -6 to wait for an IPv4 or IPv6 address.
Wrapping it up into a service gives you something like this:
# tailscale-online.service
[Unit]
Description=Wait for Tailscale to have an IPv4 address
Requisite=systemd-networkd.service
After=systemd-networkd.service
Conflicts=shutdown.target

[Service]
ExecStart=/usr/lib/systemd/systemd-networkd-wait-online -i tailscale0 -4
RemainAfterExit=true
Type=oneshot
Services using your Tailscale IP address can now depend on tailscale-online.
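For example, a unit that binds to the Tailscale address (the name below is a placeholder) only needs the usual dependency declarations:

# hypothetical service that listens on the Tailscale IP
[Unit]
Requires=tailscale-online.service
After=tailscale-online.service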