Earlier this year, I attended CfgMgmtCamp in Ghent and listened to Adam Jacob’s “What if Infrastructure as Code never existed” keynote.
Note: I’d like to extend a huge thanks to Adam for taking the time to review this post, and to an unnamed person for consistently reviewing these posts and for inspiring the thoughts here.
Not only is Adam an excellent speaker, but he can capture thoughts that most people can’t articulate and explain them in a way that revolutionises people’s thinking.
This talk was no exception. If you have yet to see it, I’d like to introduce you to something I’ve always known but had never managed to properly identify: the 200% knowledge problem.
You can listen to Adam’s excellent explanation of the 200% knowledge problem here. My own explanation of this problem is:
To successfully use an abstraction, you need to understand the problem the abstraction is trying to solve and also understand how the abstraction has solved the problem.
Terraform Modules
My best example of this is the Terraform module ecosystem. Terraform modules, in theory, are designed to solve specific problems in the cloud provider ecosystem. A module like the AWS VPC module, for example, removes the need to understand all of the glue that AWS needs to successfully create a VPC, like route tables, subnets and NAT Gateways.
That’s the theory. The reality of these modules is that you need to understand the magical incantation of random functions and use dynamic to succeed.
In addition, the desire to create Terraform modules that meet every single user’s possible use case means that often, the module will expose the entire surface area of the APIs the module is managing to the user. Usually, this leaves you in a position of having to painstakingly read the whole module’s code before using it, and if something breaks, you’re shit out of luck.
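To make that concrete, here is a rough sketch of what even a small call to a VPC module can look like. I'm assuming the community terraform-aws-modules/vpc/aws module here, which may not be the exact module you use; the name, region, availability zones and CIDR ranges are made-up illustrative values, and the input names should be checked against whichever module version you pin.

```hcl
# Illustrative sketch only: assumes the community terraform-aws-modules/vpc/aws
# module. All values below are hypothetical placeholders.
module "vpc" {
  source = "terraform-aws-modules/vpc/aws"

  name = "example"
  cidr = "10.0.0.0/16" # you still need to plan the address space yourself

  # You still need to know which availability zones exist in your region...
  azs = ["eu-west-1a", "eu-west-1b"]

  # ...and how to carve the VPC range into public and private subnets.
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24"]

  # You still need to understand what a NAT Gateway is for, and the
  # cost/availability trade-off of running one versus one per zone.
  enable_nat_gateway = true
  single_nat_gateway = true
}
```

Even in this tiny sketch, every input assumes you already understand the thing the module is meant to abstract away (subnets, availability zones, NAT Gateways), and you also have to learn how the module chose to name and combine those concepts.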
Hearing Adam describe this problem has had my brain slowly creaking for a while. We’ve seen lots of literature in the past few years about the explosion of knowledge required to be a successful “DevOps Engineer”, “Site Reliability Engineer”, “Platform Engineer” or whatever that role’s title is this week. As I’ve noodled on this for the past few months, I’ve started to build on the 200% problem and give it a moniker of my own: “the 300% production problem”.
The 300% Production Problem
The definition o