Have you ever mentioned something that seems totally normal to you only to be greeted by surprise? Happens to me all the time when I describe something everyone at work thinks is normal. For some reason, my conversation partner’s face morphs from pleasant smile to rictus of horror. Here are a few representative examples.
There’s the company that is perhaps the nicest place I’ve ever worked, combining the best parts of Valve and Netflix. The people are amazing and you’re given near total freedom to do whatever you want. But as a side effect of the culture, they lose perhaps half of new hires in the first year, some voluntarily and some involuntarily. Totally normal, right? Here are a few more anecdotes that were considered totally normal by people in places I’ve worked. And often not just normal, but laudable.
There’s the company that’s incredibly secretive about infrastructure. For example, there’s the team that was afraid that, if they reported bugs to their hardware vendor, the bugs would get fixed and their competitors would be able to use the fixes. Solution: request the firmware and fix bugs themselves! More recently, I know a group of folks outside the company who tried to reproduce the algorithm in the paper the company published earlier this year. The group found that they couldn’t reproduce the result, and that the algorithm in the paper resulted in an unusual level of instability; when asked about this, one of the authors responded “well, we have some tweaks that didn’t make it into the paper” and declined to share the tweaks, i.e., the company purposely published an unreproducible result to avoid giving away the details, as is normal. This company enforces secrecy by having a strict policy of firing leakers. This is introduced at orientation with examples of people who got fired for leaking (e.g., the guy who leaked that a concert was going to happen inside a particular office), and by announcing firings for leaks at the company all hands. The result of those policies is that I know multiple people who are afraid to forward emails about things like updated info on health insurance to a spouse for fear of forwarding the wrong email and getting fired; instead, they use another computer to retype the email and pass it along, or take photos of the email on their phone.
There’s the office where I asked one day about the fact that I almost never saw two particular people in the same room together. I was told that they had a feud going back a decade, and that things had actually improved — for years, they literally couldn’t be in the same room because one of the two would get too angry and do something regrettable, but things had now cooled to the point where the two could, occasionally, be found in the same wing of the office or even the same room. These weren’t just random people, either. They were the two managers of the only two teams in the office.
There’s the company whose culture is so odd that, when I sat down to write a post about it, I found that I’d not only written more than for any other single post, but more than all other posts combined (which is well over 100k words now, the length of a moderate book). This is the same company where someone recently explained to me how great it is that, instead of using data to make decisions, we use political connections, and that the idea of making decisions based on data is a myth anyway; no one does that. This is also the company where all four of the things they told me to get me to join were false, and the job ended up being the one thing I specifically said I didn’t want to do. When I joined this company, my team didn’t use version control for months and it was a real fight to get everyone to use version control. Although I won that fight, I lost the fight to get people to run a build, let alone run tests, before checking in, so the build is broken multiple times per day. When I mentioned that I thought this was a problem for our productivity, I was told that it’s fine because it affects everyone equally. Since the only thing that mattered was my stack ranked productivity, I shouldn’t care that it impacts the entire team; the fact that it’s normal for everyone means that there’s no cause for concern.
There’s the company that created multiple massive initiatives to recruit more women into engineering roles, where women still get rejected in recruiter screens for not being technical enough after being asked questions like “was your experience with algorithms or just coding?”. I thought that my referral with a very strong recommendation would have prevented that, but it did not.
There’s the company where I worked on a four person effort with a multi-hundred million dollar budget and a billion dollar a year impact, where requests for things that cost hundreds of dollars routinely took months or were denied.
You might wonder if I’ve just worked at places that are unusually screwed up. Sure, the companies are generally considered to be ok places to work and two of them are considered to be among the best places to work, but maybe I’ve just ended up at places that are overrated. But I have the same experience when I hear stories about how other companies work, even places with stellar engineering reputations, except that it’s me that’s shocked and my conversation partner who thinks their story is normal.
There are the companies that use @flaky, which includes the vast majority of Python-using SF Bay area unicorns. If you don’t know what this is, it’s a library that lets you add a Python decorator to those annoying flaky tests that sometimes pass and sometimes fail. When I asked multiple co-workers and former co-workers from three different companies what they thought this did, they all guessed that it re-runs the test multiple times and reports a failure if any of the runs fail. Close, but not quite. It’s technically possible to use @flaky for that, but in practice it’s used to re-run the test multiple times and report a pass if any of the runs pass. The company that created @flaky is effectively a storage infrastructure company, and the library is widely used at its biggest competitor.
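To make the difference between those two readings concrete, here is a minimal sketch in plain Python. It does not use the flaky library itself; `run_with_retries` and `make_flaky_test` are hypothetical helpers invented for illustration, with parameter names chosen to echo the `max_runs` and `min_passes` options that, as far as I know, the real decorator exposes.

```python
def make_flaky_test(outcomes):
    """Build a fake test that passes or fails according to a fixed script.

    `outcomes` is a list of booleans, one per run (hypothetical helper).
    """
    runs = iter(outcomes)

    def test():
        assert next(runs), "simulated intermittent failure"

    return test


def run_with_retries(test, max_runs, min_passes):
    """Run `test` up to `max_runs` times; pass once `min_passes` runs succeed."""
    passes = 0
    for _ in range(max_runs):
        try:
            test()
            passes += 1
        except AssertionError:
            pass  # a failed run is simply retried
        if passes >= min_passes:
            return True
    return False


# A test that fails twice and then passes on the third run.
script = [False, False, True]

# What people guess: every run must pass, so one failure fails the test.
strict = run_with_retries(make_flaky_test(script), max_runs=3, min_passes=3)

# How it is used in practice: a single passing run is enough.
lenient = run_with_retries(make_flaky_test(script), max_runs=3, min_passes=1)

print("strict reading passes:", strict)
print("lenient reading passes:", lenient)
```

Under the strict reading the scripted test is reported as failing; under the lenient one, the same two failures are hidden behind a single pass.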
There’s the company with a reputation for having great engineering practices that had 2 9s of reliability last time I checked, for reasons that are entirely predictable from their engineering practices. This is the second thing in a row that can’t be deanonymized because multiple companies fit the description. Here, I’m not talking about companies trying to be the next reddit or twitter where it’s, apparently, totally fine to have 1 9. I’m talking about companies that sell platforms that other companies rely on, where an outage will cause dependent companies to pause operations for the duration of the outage. Multiple companies that build infrastructure find practices that lead to 2 9s of reliability to be normal.
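For a sense of scale, the arithmetic behind "nines" can be sketched as follows. This assumes the usual convention that N nines means an availability of 1 - 10^-N (so 2 9s is 99%) and a 365-day year; `downtime_hours_per_year` is a name made up for this example.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year


def downtime_hours_per_year(nines: int) -> float:
    """Expected yearly downtime for an availability of 1 - 10**-nines."""
    unavailability = 10 ** -nines
    return HOURS_PER_YEAR * unavailability


for nines in (1, 2, 3, 4):
    hours = downtime_hours_per_year(nines)
    print(f"{nines} nine(s): about {hours:.2f} hours ({hours / 24:.2f} days) down per year")
```

By this convention, 2 9s allows roughly 87.6 hours of downtime a year, or about three and a half days during which dependent companies are paused.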
As far as I can tell, what happens at a lot of these companies is that they started by concentrating almost totally on product growth. That’s completely and totally reasonable, because companies are worth approximately zero when they’re founded; they don’t bother with things that protect them from losses, like good ops practices or actually having security, because there’s nothing to lose (well, except for user data when the inevitable security breach happens, and if you talk to security folks at unicorns you’ll know that these happen).
The result is a culture where people are hyper-focused on growth and ignore risk. That culture tends to stick even after a company has grown to be worth well over a billion dollars and has something to lose. Anyone who comes into one of these companies from Google, Amazon, or another place with solid ops practices is shocked. Often, they try to fix things, and then leave when they can’t make a dent.
Google probably has the best ops and security practices of any tech company today. It’s easy to say that you should take these things as ser