In 2019, I sat wide-eyed in a presentation hall at KubeCon in San Diego. There I mustered all of my self-restraint to keep myself from leaping off my chair and yelling “Yes! I have seen this bug! And today I know I am not crazy!”
In “The Gotchas of Zero-Downtime Traffic With Kubernetes,” Leigh Capili of Weaveworks gave an excellent presentation on some unexpected network issues in Kubernetes and their not-so-obvious solutions.
Earlier that year, my own team at SPS Commerce had delved deep into the same territory when we spent a couple of weeks performing hundreds of tests where we’d blast applications with traffic while rotating pods.
These tests uncovered a few surprises with pod lifecycles and cluster networking. Among them, we found that a container in a pod will sometimes terminate before Kubernetes removes the endpoint. In other words, the cluster might send traffic to an application that is no longer there. And this can happen any time a pod is terminating (due to a rolling deployment, a node eviction, a scale-down event, etc.).
We came up with the same solution that Capili did: We configured pre-stop lifecycle hooks on all of our pods to ensure Kubernetes removes the endpoint before the listening container receives the termination signal. The hook just called “/bin/sleep.”
Easy! And it worked! Except, as one of the other KubeCon attendees asked, what do you do if you don’t have “/bin/sleep” in your container? After the presentation, I ran up to Capili and shared what we believe is a creative solution to this problem. He asked if we could make it available to the community as open source, and today I’m happy to announce “nap.”
Nap is a tiny program that calls the “sleep” system call. In fact, it doesn’t do much else. So what’s the big deal?
What we think makes “nap” interesting is that the program can run in minimal or scratch containers that have no shell or even the standard libraries (no libc, etc.). And you can wedge it into those containers through some very small spaces.
We wrote it in assembly language (x86, 64-bit) to be as compact as possible. When fully assembled and linked, it weighs about 800 to 900 bytes depending on the linker options.
Wait … but … did you say assembly language?
Yes. To make “nap” available to pods, we mount the binary code through a Kubernetes ConfigMap, which imposes a 1-megabyte limit on the size. That rules out creating something like a statically linked Go program, which includes a runtime and libraries.
We also wanted the binary to be available to every namespace in the cluster. That means duplicating the ConfigMap (and binary data) all over the place.
So we optimized for size. And then, well, we kept optimizing.
Seriously, though. Assembly language??
OK. I hear you. Why not C or C++? Those are languages with minimal runtime overhead that are well-known to produce tight, efficient code.
Calling “sleep” in C requires linking to libc. Static-linking yields a 120-kilobyte binary. Technically small enough, but surely we can do better.
Digging deeper, we can invoke the “nanosleep” system call in C, which requires including time.h (a header, so there is nothing extra to link). This gives us a 40-kilobyte binary.
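For the curious, here is roughly what that C version could look like. This is a minimal sketch and not code from the project: the function name nap_seconds is made up for illustration, and it assumes a POSIX system where nanosleep is declared in time.h. It retries when a signal interrupts the sleep, so the full duration still elapses.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for the POSIX declaration of nanosleep */
#include <errno.h>
#include <time.h>

/* Hypothetical helper (not part of nap): sleep for a whole number of seconds.
 * Returns 0 on success, -1 if nanosleep fails for a reason other than EINTR. */
int nap_seconds(unsigned int seconds)
{
    struct timespec req = { .tv_sec = (time_t)seconds, .tv_nsec = 0 };
    struct timespec rem;

    /* nanosleep returns -1 with errno set to EINTR when a signal cuts the
     * sleep short; rem then holds the time left, so we sleep again for that. */
    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;
        req = rem;
    }
    return 0;
}
```

Calling nap_seconds(15) from main would block for about 15 seconds. Small as it is, the resulting binary still carries whatever part of the C library the linker pulls in.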
From here, we could keep descending the deep, dark staircase of abstraction, inching closer to the metal while linking to smaller and smaller libraries to optimize our code. Or we could just talk to the machine in its native tongue.
Making a system call in assembly language is actually very simple: Move a few values into a few registers and invoke interrupt 0x80. You can do it in four instructions!
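To show the shape of a raw system call without dropping into assembly, here is a sketch in C that uses the generic syscall() wrapper found on Linux. It is not how nap does it: nap loads the registers itself, while this version hands the system call number and arguments to the C library, which does the register shuffling for us. The name raw_nap is made up for illustration, and the sketch assumes 64-bit Linux, where the C library's struct timespec has the layout the kernel expects.

```c
#define _GNU_SOURCE              /* exposes syscall() in unistd.h */
#include <sys/syscall.h>         /* SYS_nanosleep */
#include <time.h>                /* struct timespec */
#include <unistd.h>              /* syscall() */

/* Hypothetical helper (not part of nap): invoke the nanosleep system call by
 * number. Assumes 64-bit Linux. Returns 0 on success, or -1 with errno set. */
long raw_nap(long seconds)
{
    struct timespec req = { .tv_sec = seconds, .tv_nsec = 0 };

    /* Argument 1: the system call number. Arguments 2 and 3: the requested
     * duration and an optional "time remaining" pointer, unused here. */
    return syscall(SYS_nanosleep, &req, NULL);
}
```

The assembly version does the same thing by hand: put the call number and the arguments where the kernel expects them, then trap into the kernel.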
If you read the source, you’ll see our program isn’t that simple. We also parse an optional command-line argument and print a few helpful messages to stdout. But you didn’t think we’d go this far without having some fun, did you? 😉
We hope you find this project interesting and informative. This is the first project sponsored by the Open Source Guild at SPS Commerce. This first step was a big one and took a lot of cooperation and partnership with the company’s technical leadership and legal team. One of SPS Commerce’s core values is “giving back” to our communities. We hope that this little project is just the first in a long line of contributions that we can give back to our open-source community.