Cultivate your developer toolbox
Stop writing throw-away scripts to solve your day-to-day engineering challenges; cultivate a toolbox instead.
One of the highest-return-on-investment practices I've adopted in my software engineering career is changing how I think about scripts, one-liners, prototypes, and scratchpads. Instead of one-off, throw-away scripts, I think of them as "tools" that I keep in my "toolbox".
It isn't a revolutionary idea, being adjacent to a "second brain," professional journal, or physical toolbox, yet I'm surprised by how few of my peers follow this simple practice, so I feel compelled to share it. The practice is now high on my list of new-hire tips and tricks, right under finding a mentor.
Copying a role model on my team, I started a ZachStuff git repo almost a decade ago. It started as really just "stuff": a junk drawer of random bash scripts. Over time, ZachStuff has become a resource I reference weekly, and it differentiates me from my peers, in a minor way, in how fast I can achieve day-to-day engineering tasks.
Over time, each new tool gets faster to write, higher quality, and more powerful. That's not just because I learn from the experience of making tools: mastering design patterns, frameworks, libraries, etc. The value is also the ever-increasing corpus of known-good solutions to copy from.
Many day-to-day engineering challenges share a small number of problem kernels, so copying a working tool skips past the boilerplate and gets down to business more quickly. There are only so many versions of "get the logs and search for Foo," "build and run a little bit of code in the cloud somewhere," or "make some chained AWS API calls" that I tend to do several times a week.
Below is an example (which was also the motivation for this post!):
Eight years ago I joined a new engineering team and was working on a ramp-up task. My job was to hunt for clues to a bug by joining many big production log streams of various formats to construct a normalized timeline of events for each instance of the bug.
There are tons of ways to dive through logs. Brute-forcing via the CloudWatch browser console wouldn't cut it due to the volume, and deploying a logging service backed by expensive document-search DBs seemed like overkill that would go mostly unused.
I thought a reasonable balance was to launch a small EMR/Spark app to fetch, transform, normalize, join, and search the logs, then log the results. I only intended to use this thing once, and I spent a couple of days wrestling with EMR and Spark just enough to do the job. My small team didn't own any Spark applications, had never needed to do something like this before, and had no plans to in the future. The tool wasn't obviously transferable or worth productionizing, so it was destined to be thrown away as soon as I got the search results I needed.
However, since I had just started keeping my "Stuff" repo, once I found the logs I was looking for, I checked in the EMR/Spark log-searching tool and wrote myself a README before moving on.
I hadn't just learned from the experience, which was my first ever use of Spark or Hadoop. By checking in the application with usage instructions, ugly one-off junk though it was, I had created an artifact to use again: maybe not verbatim, but at least to copy and modify to my needs.
Some years later, on an unrelated team with no experience with or need for big-data tools, I needed to search through a huge AWS S3 bucket. I copied the log-diving tool I had built years before to "spin up a quick Spark cluster," updated its libraries, added a few features and configuration knobs, and solved that day's problem quickly.
Because I had an existing Spark app to start from, not only did I finish the immediate job faster than if I'd started from scratch, but the new features I added that time around would make the next copy *even better.*
Recently, years later yet again, I found myself saving hours by reusing parts of that initial "junk drawer" Spark/EMR boilerplate, which is why you're reading this (and hopefully finding it valuable)! While after the third copy it bears a resemblance to the Ship of Theseus, I can see the progression over time because I've kept the copies around.
Tips for cultivating your toolbox
I now have roughly 400 "tools" in my toolbox!
Pretty much everything I do that's longer than a one-liner gets checked in. Most are regexes and piped commands to search files. Some tools, however, are huge multi-language CDK-deployed applications, like my own interpretation of Bees With Machine Guns for generating extreme load against a web app.
Over time, the practices below have proven useful with my tools.
Prefer copies over abstraction
Every time I come across a new problem similar to one I've already solved, I prefer copying the old tool over complicating it. I then prune away what isn't needed and expand the "shell" to fit the new problem.
Trying to build a multi-tool with enough config or abstraction to do many things well enough is difficult. In a production setting, it makes sense to get that kind of thing right, but for tools the goal is low time-to-completion. Copying is typically just faster (though not always!).
I think of it kind of like encountering an obscure nut and bolt during a home improvement project. You could buy an expensive, infinitely adjustable wrench that can handle every shape of nut, or just have a drawer full of dozens of differently-sized cheap wrenches.
In the case of software tools, though, it's as if you can pick up a wrench that's close to what you need, magically duplicate it, then transmute it into the perfect shape for the job at hand. And also store infinitely many in your "drawer" at no cost.
Keep representative data
Another tip is to keep a representative sample of data alongside the data-heavy tools in your toolbox, following data-handling best practices (anonymized, cleaned, no secrets, no PII, etc.). Keeping that sample nearby lets you rediscover how the tool works later, and it doubles as a sandbox when copying the tool to solve a new, very similar problem.
For example, a regex-y file-manipulation script with a complicated `cat | grep | sed | awk` pipeline should sit next to a 1,000-line sample of the original file against which the script was authored. If that's more than a few KB, a pointer is fine too.
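As a sketch of what that co-location might look like (the tool name, log format, and file names here are all made up):

```bash
# Hypothetical layout: the script sits next to the sample it was written against.
#   count-errors/
#   ├── count-errors.sh
#   ├── sample-app.log   # ~1,000 anonymized lines; no secrets, no PII
#   └── README.md

# count-errors.sh assumes ISO-8601-prefixed lines like:
#   2024-01-02T03:04:05Z ERROR Something broke
grep ' ERROR ' sample-app.log \
  | sed -E 's/^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9:]{8}Z? //' \
  | awk '{ $1 = ""; counts[$0]++ } END { for (m in counts) print counts[m], m }' \
  | sort -rn
```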
I think of it as keeping a few spare boards next to your saws in the garage. Your next sawing job is easier because, if needed, you can practice on material you know the saw can handle.
Practice using standalone columnar data tools
Standalone (no daemon) local columnar data tools are incredibly powerful when chained together. Some of my favorite tools are not much more than recs and SQLite glued together. Recs is an underrated tool for converting STDOUT into "RECord Streams"; from there, the records go into a SQLite DB for queries, joins, and so on.
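Here's a minimal sketch of the SQLite half of that pattern (skipping recs and using sqlite3's built-in CSV import; the file and column names are assumptions):

```bash
# Join two log extracts in a throwaway, daemon-less SQLite database.
# requests.csv and errors.csv are assumed CSV files (with header rows)
# that share a request_id column.
sqlite3 scratch.db <<'SQL'
.mode csv
.import requests.csv requests
.import errors.csv errors
SELECT r.request_id, r.ts, e.message
FROM requests r
JOIN errors e ON e.request_id = r.request_id
ORDER BY r.ts;
SQL
```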
This type of tool is invaluable for reconstructing or finding a scenario in a distributed system from its logs while debugging. Because the columnar tools are standalone, they always work, don't depend on the state of the system, and make it natural to keep data and code co-located. That's related to my next tip: packages.
Tools are packages
My ZachStuff repo is a monorepo, full of many unrelated things. A recent practice I've adopted is keeping each tool as its own package in a subdirectory: Node tools have their `package.json`, Python tools a `requirements.txt` or `setup.py`, Java tools a `pom.xml`, Docker-based tools a `Dockerfile` and `docker-compose.yml`, and so on.
This avoids any dependency on system libraries, making each tool relatively portable. It also lets the tool serve as a "v0" of something shareable.
This also prompts you to think of tools as "buildable projects" that accept "arguments" to do what you want, which I find helps ensure the tool is solving the right problem.
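For a Python-based tool, the "buildable project" entry point might be a small wrapper like this (the venv setup and the `log_search.py` name are illustrative):

```bash
#!/usr/bin/env bash
# run.sh: build the tool's dependencies once, then forward all arguments to it.
set -euo pipefail
cd "$(dirname "$0")"
[ -d .venv ] || python3 -m venv .venv          # assumes a Python tool
.venv/bin/pip install --quiet -r requirements.txt
exec .venv/bin/python log_search.py "$@"       # log_search.py is a placeholder name
```

When every tool answers to the same `./run.sh <args>` shape, copying one into the next problem gets that much cheaper.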
README
Related to keeping tools as packages is keeping a README. Write some amount of instructions for your future self. Whether it's a sample usage with a one-line description or a multi-step setup guide, anything that helps you get going on a copy of the tool more quickly in the future is worthwhile.
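The README itself can be tiny. A sketch for a hypothetical tool:

```bash
# Writing a minimal README for a hypothetical tool; a few lines is plenty.
cat > README.md <<'EOF'
# log-search
Joins exported log files and greps them for a pattern. One-off; not productionized.

Usage:
  ./run.sh --input ./logs --pattern "Foo"

Setup: needs python3. run.sh builds a venv and installs requirements.txt.
EOF
```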
What to - and not to - keep in your toolbox
Sometimes a one-off tool existing at all is an indicator that something is missing from your team's software.
If you feel the need to script out chained commands to boot your service, build a package, or something that everyone on your team needs to do frequently, that is a smell that the script or process does not belong in your personal toolbox. Instead, that process belongs as close to the service or package as possible - like a build target or bin/ subdirectory in the package.
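For example, a boot-the-service script belongs in the service's own repo, something like this hypothetical `bin/dev-up` (the compose service and build tool are assumptions):

```bash
#!/usr/bin/env bash
# bin/dev-up: one command to boot the service locally.
# Lives in the service's repo so the whole team shares it; details are illustrative.
set -euo pipefail
cd "$(dirname "$0")/.."
docker compose up -d db   # assumes a compose file defining a "db" service
./gradlew run             # assumes a Gradle-built service
```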
There's some judgement involved, but here are some rules of thumb for what belongs in your personal toolbox:
- If it took you less than 5 minutes to get working, then it doesn't need to be checked in anywhere. If it took you more than 1 hour, it should *definitely* be checked in *somewhere*.
- If you think what you're about to do would be immediately reusable by your teammates, and would take only a few hours to get into shippable shape, then it belongs in your team's software packages.
- Everything else should be checked into your personal toolbox repository.
Some more examples that do belong in your toolbox are:
- Prototypes for new product ideas
- Hackathon projects
- Scratch-pads used to learn a new library, language, or framework
- That whacky `grep | sed` thing you ran during a Sev-1 to find the broken-host needle in the fleet haystack
- A script you wrote for something random that doesn’t “belong” anywhere else
Closing thoughts
I think of my “ZachStuff” git repo as my version of MacGyver’s pockets. The fuller MacGyver’s pockets were with string and rubber bands and paper clips and seemingly random junk, the more likely he was to be able to quickly rig something together to save the day.
A toolbox repo isn’t glamorous, and it doesn’t replace battle-tested operational tooling or any part of a functional production system. But it’s a practice I encourage you to adopt, as I’ve found it to be very useful.