The Apollo Guidance Computer is a neat little machine with quite the accomplishment: successfully taking people to the Moon and back. Creating and programming it were momentous achievements of technology. A photo of NASA engineer Margaret Hamilton stacking code printouts for it is on the wall of my apartment. How much processing power did this feat take?
As it turns out, not much. The final configuration had 72 kilobytes of program memory and 4 kilobytes of RAM, along with a processor that ran at a whopping 2 megahertz. Today’s computers are of course much more powerful, but their usefulness hasn’t kept pace with the raw increase in their abilities. A computer with a thousand times more processing power and a million times more memory and storage can start to stutter, not while calculating spaceship trajectories, but while rendering a blog post[1] 😑 What happened? Was the knowledge of programming lost, sending us into a software dark age? Not quite. The problem is code. Code poorly thought out, stacked on top of more poor code, and covered with yet more as soon as any issues arise.
When I hear the word “platform”, I reach for a pun
I was inspired to write this by many things, but most immediately by a software talk I saw at a Meetup recently. Like many such talks, it was really a stealth ad for the speaker’s own software project. But what really struck me was how pointless it was. I won’t name it because the details don’t matter. The same patterns happen time and time again in these sorts of things. Someone finds a supposed problem (“I don’t understand how the code I’ve already written works”, “Java needs better dependency injection”, “data scientists refuse to use version control”) and then builds an entirely unnecessary platform to try and solve it 🤦‍♀️
I believe many of these issues are caused by a desire for a sort of quasi-modularity. The idea is that layer X of software doesn’t provide what you want, but for whatever reason X shouldn’t be changed. So instead a new layer is added on top of X. This provides a veneer of simplicity (“layer X does X, and layer Y does Y”). But when too many layers are stacked on top of each other the result is always a confusing mess. Systems are what get used, not layers, and the simple API provided between layers never tells the whole story about what can and will go wrong. Only a look at the entire stack does.
Ghosts of /dev/console
A good example of how these annoying messes can form and hide right under our noses is the modern Unix-like command line. These started off simple, at least relative to computers in general. A physical teleprinter was hooked up to the computer. It had a keyboard and printer, and talked to the computer over a serial line. Then the teleprinter became a terminal. Increased brains in the terminal meant that it could do something besides print a stream of letters, and so control codes were born. Codes could move the cursor around, change the text color, and similar little feats. Programs could use these to great effect to create entire UIs, allowing on-screen text editing, displaying multiple terminal sessions, and more.
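To make "control codes" a bit more concrete, here is a minimal sketch in Python of the kind of byte sequences involved. The escape sequences themselves are the standard ANSI/ECMA-48 ones that most terminal emulators still honor; the helper names (`move_cursor`, `colored`) are made up for this illustration and do not come from any particular library.

```python
# A sketch of classic terminal control codes (ANSI / ECMA-48).
# The helper names are hypothetical; only the byte sequences are standard.

ESC = "\x1b"
CSI = ESC + "["  # Control Sequence Introducer: starts most control codes

CLEAR_SCREEN = CSI + "2J"  # erase the whole display


def move_cursor(row: int, col: int) -> str:
    """Build the sequence that moves the cursor to a 1-based row and column."""
    return f"{CSI}{row};{col}H"


def colored(text: str, color_code: int) -> str:
    """Wrap text in a color-select sequence, then reset all attributes."""
    return f"{CSI}{color_code}m{text}{CSI}0m"


if __name__ == "__main__":
    # 31 is the standard code for a red foreground.
    print(move_cursor(1, 1) + colored("hello from the 80s", 31))
```

Everything a full-screen text UI does is, in the end, a stream of strings like these mixed in with the text itself, which is part of why the client can't easily tell content from presentation.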
There was a fundamental change with the introduction of GUIs, which offered an entirely different UI paradigm. The terminal was replaced with the terminal emulator: a GUI application that displayed everything a terminal did, but now on the pixel-oriented screen. The shell program continued to send control codes as if it was talking to a terminal, even though it was actually talking to the emulator over a virtual serial line provided by the kernel. This was fine as a stopgap, but incredibly enough it’s the state-of-the-art architecture for running command-line programs to this day 😱 It’s 2018, and as I speak someone is using an entire Chrome tab to run a blob of JS that processes control codes from the 80s to talk over a virtual serial line to other software on the same machine just to run a program. They might even be using a terminal multiplexer in it, valiantly performing a pale imitation of what Chrome or the window manager could do far better. Just think of all the cycles wasted and possible features lost 🔥
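The "virtual serial line" is what Unix calls a pseudo-terminal, or pty. As a rough sketch, assuming a Unix-like system where Python's standard `pty` module is available, the snippet below opens one and pushes a few bytes through it: the program's side is the "slave" end, and the emulator's side is the "master" end. The function name `echo_through_pty` is hypothetical, and a real emulator does far more than this.

```python
import os
import pty


def echo_through_pty(message: bytes) -> bytes:
    """Write bytes to the program side of a pty and read them on the emulator side."""
    # The kernel hands back two file descriptors joined by the virtual serial line.
    master_fd, slave_fd = pty.openpty()
    try:
        # To the program, the slave end looks like a real terminal device.
        assert os.isatty(slave_fd)
        os.write(slave_fd, message)
        # A terminal emulator would read here and interpret any control codes.
        return os.read(master_fd, 1024)
    finally:
        os.close(slave_fd)
        os.close(master_fd)


if __name__ == "__main__":
    print(echo_through_pty(b"hi"))
```

Note that the kernel's terminal layer sits in the middle and may rewrite some bytes (newlines, for instance) depending on the terminal settings, which is yet another layer with opinions of its own.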
“So what”, you say, “it may be convoluted but it still works! Besides, compatibility is important, and the whole stack is still pretty simple.” That may be true, but it prevents a variety of features from being added. Here are some things that you might want[2] on a command line, but are next-to-impossible to get on the current stack:
- Output tables that can use proportional fonts and don’t break when the window is resized.
- Splitting a shell transcript into individual command executions.
- Client control (rather than program control) of output formatting.
- Structured data in pipes that also has pretty printing.
- Graphical output that doesn’t use text character approximations.
To be fair this is not an easy problem to solve. A few projects like TermKit and upterm have tried, but have their own problems. The terminal “protocol” is embedded in a lot of old code. But every day it gets more and more absurd that these pointless and functionality-denying layers still exist. I have worked on some solutions myself but not found anything great yet.
Of course there’s a webdev section
Web development has taken unneeded complexity to the extreme. In SRE the common refrain is “automate yourself out of a job”, which is a snarky way of putting how we improve software systems: by making them reliable, predictable, and homogeneous. Rather than losing a job we expand it by making a single human able to manage more functionality (as opposed to code) at once. Webdev is somewhat different; the best practice seems to be making a stack so complex you get a permanent job maintaining it and rewriting it when fashions quickly change. Luckily they have a fitting punishment: having to maintain these stacks.
“Scalability” has been a common excuse for this sort of poor design. Let’s take an illustrative example. A common pattern in modern Web development is something called client-side rendering. The essence of this is that the Web server, when any page is requested, just sends out an empty HTML page. But don’t worry about missing out on content: the “empty” page turns out to be very large and slow because of all the JS that’s being downloaded and interpreted 😴 Once all the code is loaded, the browser can finally call various backend servers directly and put a page together itself. This is supposedly for scalability, the idea being that processing time is distributed instead of centralized. But a closer examination shows how false that notion is. Putting together the HTML itself is hardly a load-intensive operation and can be cached. Getting everything needed from backends still has to happen with client-side rendering, and the processing is now slower: all the backend requests have to come from the user’s device instead of from within the datacenter. The backend requests can be cached as well, but they still have to be made, and no amount of “modern software” can change light speed delay.
At the extreme, client-side rendering can make your page essentially unusable, especially if there’s a large physical distance between client and server. A combination of distance and dozens of serial roundtrips makes one monitoring webpage I know of take >10 seconds to load some line graphs if loaded in Australia. Had processing been done on the server instead, with final results sent to the client, almost all of the latency would be gone. Can these sorts of latency issues be fixed while keeping client-side rendering? Possibly yes, but not very easily. The big stacks that are often used just to assemble the code that goes to the browser obscure what actually happens and when, making it hard to find and fix these bugs. Webdev’s tendency to constantly change the stacks they use doesn’t help matters either, nor does JS’s inherently asynchronous, out-of-order nature.
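The arithmetic behind this is simple enough to sketch. The numbers below are assumptions picked for illustration, not measurements of the page mentioned above: 40 serial round trips, a 250 ms round trip between a far-away browser and the datacenter, and a 1 ms round trip between machines inside the datacenter. The model also ignores bandwidth, connection setup, and processing time, so treat it as a back-of-the-envelope estimate only.

```python
def serial_wait_ms(round_trips: int, rtt_ms: float) -> float:
    """Time spent waiting when each request must finish before the next starts."""
    return round_trips * rtt_ms


# Assumed values, for illustration only.
ROUND_TRIPS = 40
CLIENT_RTT_MS = 250.0      # distant browser <-> datacenter
DATACENTER_RTT_MS = 1.0    # server <-> backend inside the datacenter

# Client-side rendering: every backend call crosses the long link.
client_side = serial_wait_ms(ROUND_TRIPS, CLIENT_RTT_MS)

# Server-side rendering: backend calls stay local, plus one trip to the user.
server_side = serial_wait_ms(ROUND_TRIPS, DATACENTER_RTT_MS) + CLIENT_RTT_MS

print(f"client-side rendering: {client_side / 1000:.1f} s of waiting")
print(f"server-side rendering: {server_side / 1000:.2f} s of waiting")
```

Under these assumptions the same work costs about 10 seconds of waiting from the browser and under a third of a second from the server, and no framework can buy that difference back.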
Not that client-side rendering is always terrible either. Its primary usefulness is not the talking points often seen online about scalability and slow backends, but about user-level dynamism. If part of your page changes a lot every time it is loaded and every time someone else loads it, splitting that off to be downloaded in the browser does potentially improve things. The canonical example of this is a social network. The server can send a page with the UI elements and such, with posts being loaded and rendered in the browser after the UI skeleton has appeared. But those very real benefits are obscured when people wrongly insist on using it everywhere.
Another harbinger of problems is “code reuse”. Like “scalability” it’s a nice platitude. But it’s not always the right thing. Consider trying to avoid writing 100 lines of code by adding a 50k library to your project. Now you’ve added a massive dependency that’ll be with you every step of the way. The time supposedly saved on those 100 lines is soon lost in increased build times, additional maintenance costs when that library inevitably does something unexpected, and a larger attack surface. Obviously libraries have their place but they should be used sparingly. There’s no npm install for actually understanding what you’ve built.
Know thyself’s heap
Another important benefit of simplicity is better introspection. Consider the Linux utility strace. This humble little program runs a program and prints out all the system calls that it makes along with parameters and return values. Rather than trying to do a lot of things for a few programs it does one thing for many programs, and that makes it incredibly useful for debugging many different things. System calls are how a program actually gets anything done, and running strace means you get a full transcript of that. Unfortunately its power can be ruined when a program is “improved” by unneeded code. When a program makes many calls unrelated to what it’s actually getting done, finding out why it’s breaking is almost impossible.
The same principles apply to other introspective tools. CPU and RAM profiling are utterly useless if your framework is constantly spinning its wheels and making allocations. Stack traces are just line noise if your asynchronous framework (for “scalability”) has turned your control flow into spaghetti. Log output isn’t helpful if all 100 of your dependency libraries log everything that they do too.
While strace is probably not going to be very helpful for network servers, the same principles apply to them. Inside Google[3] tools like Borg, Stubby, and Monarch are widely used. Doing things this way certainly results in some compromises, but it’s worth it in the end. When you get paged in the middle of the night because a service you’ve never heard of before starts failing it pays to have systems that are as simple and homogeneous as possible.[4] And while I certainly don’t like all of the Google-internal tools, they are predictable. Teams generally avoid writing idiosyncratic solutions to “fix” them, which in reality would cause more cognitive load when I have to find out why their service is breaking mine. Simplicity even pays off in webdev. Debugging layout and formatting issues is far easier if the document is just there instead of being ephemeral and JS-generated, with piles of CSS rules all overriding each other.
Put down the IDE and back away slowly
People talk about “code” as if more of it is constantly needed to get anything done. But that’s simply not true; rather the right code is needed. The wrong code makes getting things done more difficult, and is far more common than you might think. Again, the Apollo Guidance Computer’s program was stored in 72 kilobytes of memory. That’s a rounding error to most modern programs, even some websites!
So please, next time you want to write some code: don’t. Delete some instead. Or contribute to one of your dependencies by deleting some of its code, or making it not your dependency anymore ✂️ I can almost guarantee you’ll find something to mercilessly refactor if you look around you.[5]
[1] Though hopefully not this one.
[2] And I definitely want.
[3] Who I do not speak for, only for myself.
[4] More details can be found at https://landing.google.com/sre/book/chapters/production-environment.html
[5] Getting support from your boss/the code maintainer/whoever’s permission you need is left as an exercise to the reader.