Immutable Dev Environment

…of course that SSD would fail under me and wreck my install!

Thankfully I was really disciplined and had been carefully keeping my dotfiles repository up-to-date. Getting back to the state I was in before that disaster should be easy, right?

Well, what ended up happening is that the dotfiles weren’t in sync as expected; a bunch of files had been ignored during the backups. Even though I was very careful in keeping that dotfiles repo updated, there were still a lot of unforeseen circumstances:

  • those ignored files were actually important settings
  • worse, I had completely forgotten that I even used that very obscure package created from a one-off build to solve an immediate need.

Automation with Ansible

I had to solve that problem, so I went straight on a quest to automate my development environment setup. A few days later, I had a pretty complete set of ansible scripts that would install all the packages I needed, clone my dotfiles, and completely set up my environment in a repeatable way…

What I didn’t anticipate is that, since setting up a new environment meant a pretty long wait for ansible to run, I would unconsciously avoid re-creating it frequently, and worse, I ended up installing more stuff by hand after a while. When I finally went to use my own scripts, they were completely out of sync with what I was actually running on my machine.

Clearly I couldn’t expect to be disciplined enough to keep the scripts updated, nor to resist installing new stuff on the environment by simply running apt-get install whatever instead of editing some random ansible script and re-provisioning my environment.

Then the worst happened. I had another hardware failure, and there I was again, scrambling to remember what the heck I had done to begin with. The Linux distribution I was using had renamed some packages and updated some package versions, and I wasn’t paying attention. It took me an entire day to fix my environment.

Enough is enough.

I was already a pretty heavy docker user. After getting inspired by reading Jess Frazelle’s blog, I started moving more and more of the things I used into small containers. The satisfaction of not having to taint my host install with random packages was great.

Then it dawned on me: why not set up the whole development environment inside a docker container?

The Docker Dev Environment

I was already used to running things in docker passing the --rm flag to run, which essentially makes the container filesystem vanish on exit. The convenience of mounting both the ssh-agent and docker sockets inside my container already allowed me to get a lot of work done.
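
As an illustration, an invocation along these lines (the image name is a placeholder, and the socket paths assume a Linux host with a running ssh-agent):

# throwaway container with the host docker daemon and ssh-agent available
docker run --rm -it \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v "$SSH_AUTH_SOCK:/ssh-agent" -e SSH_AUTH_SOCK=/ssh-agent \
    <some image> zsh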

The Dockerfile can be found here.

Design Decisions

  • The security setup for this assumes you added all your public keys to your GitHub account, and took all the required steps to lock down your account (2FA, etc).
  • To avoid port collisions, the OpenSSH server runs on port 3222.
  • There are no passwords for your user or for the root account. No sudo is configured either.
    • You can still go to the host, run docker exec -it <container id> sh and get a root shell, but the whole idea is to force you to rebuild the image instead.
  • Every time you restart this container you will lose everything that wasn’t stored in volumes.
    • This might sound radical, but that’s the only thing that worked for me to finally build the discipline of keeping track of the settings I really care about.
    • You can persist your settings by moving them to /mnt/secretz/$USER/pack and starting a new shell. .zshrc will call stow and symlink the files appropriately (there’s a sketch of how this can work right after this list).
    • Likewise, the zsh shell history will look for /mnt/secretz/$USER/.zhistory before falling back to creating it in the home folder.
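
A minimal sketch of what the relevant .zshrc logic can look like (the exact details in my dotfiles may differ):

# symlink persisted settings from the secrets volume into $HOME
if [ -d "/mnt/secretz/$USER/pack" ]; then
    stow -d "/mnt/secretz/$USER" -t "$HOME" pack
fi

# prefer the persisted zsh history, otherwise fall back to the home folder
if [ -f "/mnt/secretz/$USER/.zhistory" ]; then
    export HISTFILE="/mnt/secretz/$USER/.zhistory"
else
    export HISTFILE="$HOME/.zhistory"
fi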

Challenges

  • Running docker outside docker meant that my container needed to share some paths with the host.
    • Solution: mount a host folder to my container, and always work inside it. So instead of using ~/dev for my code I ended up using /mnt/codez/dev. This way I could always run docker run --rm -it -v "$(pwd):$(pwd)" <some image> and docker would happily just work.
  • Mounting the docker and ssh-agent sockets inside the container was annoying due to permission mismatches.
    • For the docker socket, if .zshrc cannot detect the socket, it sets docker up for TCP access on localhost. This way I can set up a dind container that is available to my dev container (see the sketch after this list).
    • For the ssh-agent, I ended up running an OpenSSH server inside the dev container. Since I’m an avid tmux user, this not only allows me to use the ssh-agent with my Yubikeys, but also keeps my tmux session running when I disconnect, without terminating the container.
  • A few things don’t quite work inside a container, like parts of the Rust test suite that interact with gdb - you might need to run this as a privileged container and disable some seccomp policies.
    • --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --privileged
    • here be dragons, you’ve been warned :P
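
The docker socket fallback mentioned above boils down to something like this (the dind hostname and port are assumptions for this sketch):

# use the mounted docker socket when present, otherwise talk to dind over TCP
if [ ! -S /var/run/docker.sock ]; then
    export DOCKER_HOST="tcp://localhost:2375"
fi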

Docker-for-Mac

I ended up getting back to using Macs as my dev machines after a ridiculous fight between my Bose QC35 and the Linux Bluetooth stack, but that deserves a whole blog post of its own.

Networking

Docker-for-Mac uses hyperkit for its VM - while it’s really nice to have a very lightweight VM, it comes with a pretty terrible limitation: the Mac’s network is completely isolated from the VM’s network. When you run docker run -p 8090:80 nginx, Docker-for-Mac does some tricks and explicitly forwards port 8090 to your Mac’s localhost.
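
For instance, only the published port makes it across; the container’s bridge IP, which you could reach directly on a Linux host, stays inside the hyperkit VM:

# the published port is forwarded to the Mac's localhost...
docker run -d --name web -p 8090:80 nginx
curl -s http://localhost:8090 >/dev/null && echo "published port reachable"

# ...but the bridge IP printed here is only routable inside the VM
docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' web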

While this is “good” from a sandboxing perspective, it’s a real pain when you want to access anything running inside your dev container that is only exposed on Docker’s network inside the VM. One solution is to run OpenVPN inside a container, forward its main port to the Mac’s localhost, and use Tunnelblick to connect to it. The OpenVPN client gets a stable IP address you can connect to, and actually sits on the same VM network. There’s prior art, but it’s over two years old, with no signs of being updated.
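
A rough sketch of that approach; the image name, mount path and port mapping are placeholders (my actual setup is wired up through docker-compose, see “Running it” below):

# run an OpenVPN server in a container and publish its port on the Mac's
# localhost; OpenVPN needs NET_ADMIN and a tun device
docker run -d --name vpn \
    --cap-add=NET_ADMIN --device /dev/net/tun \
    -p 31194:1194/udp \
    -v "$(pwd)/config:/etc/openvpn" \
    <some openvpn image>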

Setting up OpenVPN is annoying and involves dealing with certificates. I’ve been playing with terraform’s TLS provider, and it felt like a perfect fit for generating and renewing very short-lived certificates. For this purpose, I ended up creating terraform-openvpn-module, which is super easy to use:

module "openvpn" {
	source            = "github.com/qmx/terraform-openvpn-module"
	vpn_endpoint      = "localhost"
	vpn_port          = "31194"
	server_cidr       = "10.23.0.0/29"
	additional_routes = ["10.22.44.0/24", "10.96.0.0/12"]
}

resource "local_file" "server_conf" {
	content = "${module.openvpn.server_config}"
	filename = "${path.module}/config/openvpn.conf"
}

resource "local_file" "client_conf" {
	content = "${module.openvpn.client_config}"
	filename = "${path.module}/workstation.ovpn"
}

This terraform snippet will create both the server and client configuration files, and renew the client certificate when it nears expiration.

I originally created this terraform module for generating OpenVPN configuration files tailored to Kubernetes. I use this to be able to access my personal infrastructure like my CI server and Docker registry that are not exposed to the public internet.

Storage

I’ll call out the elephant in the room right away: osxfs is really, really terrible.

Not only is it several orders of magnitude slower, it also doesn’t quite preserve permissions the same way traditional UNIX filesystems do. Worse, it conceals the fact that this is happening by storing the permissions as extended attributes on the Linux side while showing all the files with different permissions on the Mac side. The worst of both worlds.

Did I mention that it is terribly slow?

The solution I’ve found for that is ugly, but it’s totally worth it. As soon as you mount a host volume in Docker-for-Mac, it automatically picks osxfs as the filesystem:

❯ docker run --rm -it -v /Users/qmx:/shire alpine:3.8 mount | grep shire
osxfs on /shire type fuse.osxfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other,max_read=1048576)

But that doesn’t happen when you create a named docker volume!

❯ docker run --rm -it -v foo:/shire alpine:3.8 mount | grep shire
/dev/sda1 on /shire type ext4 (rw,relatime,data=ordered)

So my very ugly solution is creating named volumes for my code, my secrets, and dind’s docker cache, mounting a host folder to store backups, and backing up regularly using restic.
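
Roughly, that looks like this; the volume names and the restic repository path are assumptions for this sketch:

# named volumes live on ext4 inside the VM instead of going through osxfs
docker volume create codez
docker volume create secretz
docker volume create dind-cache

# inside the dev container, back the important mounts up to the
# host-mounted backup folder with restic
restic -r /backups/restic init
restic -r /backups/restic backup /mnt/codez /mnt/secretz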

I’m still working out a good way of scripting both the backups and the restores. Stay tuned!

When I have to recreate my dev environment from scratch, I restore the backup from the host folder and move on. Note that this is only really needed when you do a factory reset on your Docker-for-Mac app.
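
One way the restore can look, under the same assumptions as the backup sketch above:

# restore the latest snapshot back to its original paths inside the
# freshly created volumes
restic -r /backups/restic restore latest --target /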

Running it

To run it, clone the workstation repository.

Generate the certificates using terraform:

terraform init
terraform apply

Run the containers:

docker-compose up -d

Install the VPN client profile (it’s called workstation.ovpn).

Tunnelblick should be able to connect after the startup is finished.

Finally, you can reach the container (and conveniently spawn or reattach to an existing tmux session):

ssh 10.23.0.1 -p 3222 -A -t "tmux new -A -s dev"

What’s next?

There are still some rough edges, but overall my life is way better with this environment - there are some things I’d like to improve in the long run:

  • Start adding automated tests
    • Ensure the binaries built by the multi-stage build actually work on the target container (a common problem is forgetting to install some dependencies on the dev container).
    • Verify some of my common workflows, like successfully building the Rust compiler.
    • Check if the diagnostic tools needed by my vim install are working properly.
  • Find ways of detecting stray configuration files that were created in $HOME, and build some tooling to “adopt” them, effectively moving them to /mnt/secretz (a rough sketch of the detection part follows this list).
  • Backup automation
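
A starting point for the detection part could be as simple as listing the non-symlinked dotfiles sitting directly in $HOME, since everything stow manages is a symlink back into /mnt/secretz:

# stray configuration files: dotfiles in $HOME that aren't stow-managed symlinks
find "$HOME" -maxdepth 1 -name '.*' ! -type l ! -type d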

Well, that’s a wrap! Hope this is useful to someone else!