If you were deciding whether to use AWS, Azure or Google Cloud for a
production application, what factors would you weigh? You can probably think
of many, but I’d like to point out one that isn’t commonly considered: the
relative trustworthiness of Google’s accounts system compared to that of, say,
AWS.
Google’s accounts system controls access to all Google services, including
Google Cloud. The design of Google’s accounts system gives me serious
reservations about trusting it for anything mission-critical.
The root cause is the modern trend of “risk-based authentication”, though a
more descriptive term would be “non-deterministic login”.
The problem is precisely this: the credentials required to access a Google
account are essentially indeterminate. Supposedly, for a simple Google account
without 2FA enabled, knowledge of the account email and password should be
sufficient to access the account; except that sometimes, it isn’t. Sometimes,
Google might randomly decide your login attempt is suspicious, and demand you
complete some additional verification step.
This sounds innocuous until you realise that there’s no guarantee you can
actually complete this additional verification step. I recall numerous stories
of people being locked out of accounts for which they had the password, because
Google decided that the login was suspicious and that having the password was
not enough.
The problem is not with requiring a high level of authentication; for example,
enabling 2FA is commonly considered a best practice nowadays. If I enable 2FA,
I’m committing to being able to complete that authentication challenge in the
future and can make appropriate arrangements. The problem is that Google seems
to reserve the right to randomly and unpredictably demand a higher level of
authentication without any kind of prior opt-in. It’s the unpredictability
that’s the problem here.
What makes this unpredictability particularly pernicious is that it creates the
possibility of being able to access an account for a long period of time before
suddenly having that access essentially destroyed without warning or recourse. If some
additional authentication step X is required for every single login, you’re
going to make sure you can always complete that additional authentication step.
If additional authentication step X is instead only demanded for logins which
are deemed “suspicious” according to the inscrutable statistical determination
of some ML model, there’s a good chance you will have no idea that step X can
be a necessary condition for accessing the account at all. This means the
first time you find out about it is when your infrastructure blows up, you
have to log in from a different machine than usual, and Google decides this is
“suspicious” and demands an additional authentication step. Because this is
the first time you have encountered it, it may or may not be a step you are
capable of completing. If it isn’t, you’re screwed as far as Google Cloud is
concerned: you can’t access your infrastructure or begin to fix it.
Fundamentally, the issue here comes down to the fact that an accounts system
for critical infrastructure needs to fulfil two objectives:
- It must be possible for authorized users to gain access.
- It must not be possible for unauthorized users to gain access.
“Risk-based” authentication essentially tries so hard to fulfil the second
objective that it compromises on the first.
In particular, if access to an account can be mission-critical, it’s desirable
for all authentication steps needed to access it to be tested regularly. This
suggests that if an accounts system designed for critical infrastructure has a
repertoire of authentication steps A, B, C it can variously demand, it’s
better for it to demand A, B and C every time than to demand A and B in most
cases, and also C if the login is deemed “suspicious” on undocumented
statistical grounds. The inconvenience of always having to go through step C
is nothing relative to the risk that if C isn’t demanded most of the time,
the account holder may not understand that C is sometimes necessary until
it’s too late, and lose access to the account.
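To make the A, B, C comparison concrete, here is a minimal Python sketch of the two policies. Everything in it is hypothetical: the `LoginContext` fields, the function names, and the crude "suspicious" rule are stand-ins I made up for illustration, and say nothing about how Google's actual risk model works. It only shows that under the risk-based policy, a long run of routine logins never exercises step C, while the deterministic policy exercises it every time.

```python
from dataclasses import dataclass

# Hypothetical step names mirroring A, B and C in the text.
ALL_STEPS = ("A", "B", "C")


@dataclass(frozen=True)
class LoginContext:
    """Illustrative login attributes (assumed; a real system uses other signals)."""
    known_device: bool
    usual_location: bool


def deterministic_steps(ctx: LoginContext) -> tuple:
    """Demand every step on every login, regardless of context."""
    return ALL_STEPS


def risk_based_steps(ctx: LoginContext) -> tuple:
    """Demand C only when the login looks unusual.

    The rule below is a transparent stand-in for an opaque risk model.
    """
    suspicious = not (ctx.known_device and ctx.usual_location)
    return ALL_STEPS if suspicious else ("A", "B")


def steps_exercised(policy, logins) -> set:
    """Return the set of steps the account holder has actually had to complete."""
    seen = set()
    for ctx in logins:
        seen.update(policy(ctx))
    return seen


# A hundred routine logins from the usual machine and place.
routine = [LoginContext(known_device=True, usual_location=True)] * 100
# The emergency login from a different machine.
emergency = LoginContext(known_device=False, usual_location=True)

print(sorted(steps_exercised(deterministic_steps, routine)))
print(sorted(steps_exercised(risk_based_steps, routine)))
print(risk_based_steps(emergency))
```

In this sketch the risk-based account holder completes only A and B across all routine logins, so the emergency login is the first time step C is ever demanded of them.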
In short, what I want from an infrastructure accounts system is determinism. I
explicitly don’t w