I just returned from an incredible two days back at Princeton University (where I did my PhD), at the Center for Information Technology Policy, with some of the people who run the world’s Transport Layer Security (TLS) certificate infrastructure. The workshop, Securing the Public Key Infrastructure, was held under the Chatham House Rule, so I’m free (and thrilled) to share a few insights from the event as long as I don’t attribute them to anyone. In particular, I’ll explore why running a certificate authority is so challenging for network security, and why an alternative may be the better approach.
What is a Certificate Authority, and What Does it Mean for Network Security?
Unless you are familiar with public key infrastructure (PKI), you’ve probably never thought much about certificate authorities (CAs). They make up a part of the Internet’s plumbing that most people don’t spend time thinking about. A CA is an entity that stores, signs, and issues certificates. A certificate binds a public key to the named subject of the certificate. This allows others (relying parties) to rely upon signatures made by the private key that corresponds to the certified public key. This is the value of a certificate authority in network security: if you trust the CA, you can trust that the signature was actually created by the named subject of the certificate.
Wikipedia says it pretty well: “A CA acts as a trusted third party—trusted both by the subject (owner) of the certificate and by the party relying upon the certificate.” In other words, the CA is a single point of compromise.
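To make that binding concrete, here is a minimal sketch of a CA signing a (subject, public key) pair and a relying party checking the signature. It uses textbook RSA with a tiny key and no padding, purely for illustration; real CAs issue X.509 certificates with proper signature schemes, and the function names and certificate fields below are assumptions of mine, not anything from the workshop.

```python
import hashlib
import json

# Toy textbook RSA built from two well-known Mersenne primes.
# Illustration only: tiny key, no padding, NOT secure.
P = 2**61 - 1
Q = 2**89 - 1
CA_MODULUS = P * Q
CA_PUBLIC_EXPONENT = 65537
# The CA's private exponent (modular inverse needs Python 3.8+).
CA_PRIVATE_EXPONENT = pow(CA_PUBLIC_EXPONENT, -1, (P - 1) * (Q - 1))


def encode_fields(subject: str, public_key: str) -> bytes:
    """Canonical byte encoding of the fields the CA signs."""
    fields = {"subject": subject, "public_key": public_key}
    return json.dumps(fields, sort_keys=True).encode()


def digest(data: bytes, modulus: int) -> int:
    """Hash the data and reduce it into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % modulus


def issue_certificate(subject: str, public_key: str) -> dict:
    """The CA binds the public key to the subject by signing both together."""
    to_be_signed = encode_fields(subject, public_key)
    signature = pow(digest(to_be_signed, CA_MODULUS), CA_PRIVATE_EXPONENT, CA_MODULUS)
    return {"subject": subject, "public_key": public_key, "signature": signature}


def verify_certificate(cert: dict) -> bool:
    """A relying party needs only the CA's public key to check the binding."""
    to_be_signed = encode_fields(cert["subject"], cert["public_key"])
    recovered = pow(cert["signature"], CA_PUBLIC_EXPONENT, CA_MODULUS)
    return recovered == digest(to_be_signed, CA_MODULUS)
```

Anyone holding the CA's private exponent can mint a valid certificate for any subject, which is exactly why the CA is a single point of compromise.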
This week, I met some of the folks who operate the world's biggest CAs, as well as those who decide which CAs get to be CAs.
Who decides who gets to be a CA? Well, as it turns out, your operating system or browser ships with a list of CAs that you should trust. That list is used for code signing, to ensure that each software update really came from the actual software developer (not malware distributed by North Korea), and for TLS, which is why we have secure web connections. There are people (at places like Google, Microsoft, and Mozilla) who are responsible for maintaining those lists. Some of these people were in the room too.
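If you are curious what such a list looks like on your own machine, the sketch below asks Python's standard ssl module for the CA certificates loaded into its default trust store. The helper name is mine. Depending on the platform, the result may be incomplete or even empty, because certificates kept in a directory-style store are only loaded once they have been used.

```python
import ssl


def trusted_ca_names(limit: int = 5) -> list:
    """Return the names of up to `limit` CAs from Python's default trust store."""
    context = ssl.create_default_context()
    names = []
    for cert in context.get_ca_certs():
        # "subject" is a tuple of RDNs, each a tuple of (field, value) pairs.
        fields = {key: value for rdn in cert.get("subject", ()) for key, value in rdn}
        name = fields.get("organizationName") or fields.get("commonName") or "<unnamed>"
        names.append(name)
    return names[:limit]


if __name__ == "__main__":
    for name in trusted_ca_names():
        print(name)
```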
So, a room full of high-powered geeks (aka my favorite type of room).
One thing I’ve heard my whole career is that it’s really hard to run a CA. I used to wonder why. I mean, it’s just signing entries in a database. But now, I know why. Let me share a few nuggets of wisdom.
Network Security Issues with Certificate Authorities
These CAs maintain keys with 25-year lifespans in airgapped hardware security modules (HSMs) that are sealed in tamper-evident bags and stored inside dual control safes within secure rooms that let humans enter only in pairs. If you are an unlucky human who is the only one in the secure room, you get locked inside!
Intermediate CAs are held in online HSMs. These are used to sign the certificates we use day to day in TLS and for code signing. Their lifetimes can be one, three, or five years. So, nowhere near what we would consider “short lived,” but a bit shorter lived than the root CAs that are shipped with your operating system or browser. Why do these keys live so long? Because issuing them is really hard. And if you mess up issuance, the consequences can be really bad.
Issuing CA keys is hard because CAs need to maintain physical control of private keys at all times. So, if you get an intermediate CA key signed by a root CA key, and you need to move it from one data center to another (because the root key never leaves its secure room), then you need to drive the intermediate key from one place to the other. There are strict protocols about who must be around the HSM at all times. Going to the bathroom is tricky, apparently, because the key always needs to be supervised by a pair of people who work at the CA. Flying is out of the question because, if an HSM goes through security (the TSA), it is considered compromised, as it’s left the control of the CA employees. An employee could get locked in the secure room with the root CA. And the list goes on.
There are a lot of questions about worst-case scenarios in PKI certificate management. What happens if a key goes bad? Or gets compromised? Here’s the crazy part: keys never really get revoked because there’s always the worry that they could be baked into some embedded device that’s still online 5–10 years later. If you turn off the old keys, hundreds of devices could go offline. So, the keys never get turned off. Apparently, some old super root keys are still being used to issue intermediate CAs because no one really knows who is relying on those keys. The CAs at the workshop called this problem “calcification”: the keys are out there being used, and you can’t just get rid of them, even if they are super old.
Even when keys do get revoked, no one really checks the revocation status! Clients are constantly requesting revocation information from CAs via the Online Certificate Status Protocol (OCSP), and this drives the majority of the traffic that CAs must support. However, clients don’t really act on this information. Adam Langley of Google wrote a classic post on his blog that explains why. This means a ton of the effort put forth by CAs is just wasted.
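One widely documented reason revocation checks carry so little weight is "soft-fail" behavior: if a client cannot get a revocation answer at all, it accepts the certificate anyway, so an attacker who can block the lookup defeats the check. The sketch below is a simplified model of that policy with names of my own choosing. It is not any browser's actual code.

```python
from enum import Enum


class OcspStatus(Enum):
    GOOD = "good"
    REVOKED = "revoked"
    UNAVAILABLE = "unavailable"  # responder unreachable, timed out, or blocked


def accept_certificate(status: OcspStatus, hard_fail: bool = False) -> bool:
    """Decide whether to accept a certificate given its revocation lookup result."""
    if status is OcspStatus.REVOKED:
        return False
    if status is OcspStatus.UNAVAILABLE:
        # Soft-fail accepts when no answer arrives; hard-fail rejects.
        return not hard_fail
    return True
```

Under soft-fail, a revoked certificate is rejected only when the lookup succeeds, which is precisely the case an attacker on the network path can prevent.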
All of this means that even large Fortune 100 companies struggle to run their own CAs. The compliance requirements are too hard, and all the requirements around key management in hardware are too esoteric for most companies whose business is not "being a CA." I heard stories about multi-person teams at massive enterprises failing to successfully run private CAs for their organizations. That’s why some CAs have businesses running “outsourced” private CAs for big enterprises — so “private CA as a service” is a real thing!
Certificate Authorities Must Be Highly Available
Beyond security, availability is also a big issue. Site reliability engineers (SREs) at some of these big CAs must live with the worry that if they mess up, “half the Internet could turn off.” We talked about some of the things CAs do to maintain reliability, like running software in data centers that are designed to withstand earthquakes or flooding, and having generators and batteries that would allow them to operate for hours, even during a power outage.
But even that is not enough. At the workshop, there was a lot of conversation about “failover CAs,” that is, allowing a subject (like *.bastionzero.com) to request certificates from two different independent CAs (e.g., one certificate from Google Trust Services and another from Let’s Encrypt). That way, if one of the CAs goes dark due to a failure, the other one will still be alive, so the subject will still have a valid certificate that can be used for connectivity. Other interesting topics of discussion were how to signal this to users correctly without creating security vulnerabilities, and how to automate the issuance of certificates from different providers (which is done via a protocol called ACME).
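The failover idea can be sketched as a small loop that requests a certificate from every configured CA and keeps whatever succeeds. This is only an outline under my own assumptions: the issuer callables stand in for real ACME clients, and the function and parameter names are hypothetical.

```python
from typing import Callable, Dict, Tuple


def request_from_all(
    domain: str,
    issuers: Dict[str, Callable[[str], str]],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Request a certificate for `domain` from every CA; tolerate individual outages.

    `issuers` maps a CA name to a callable that returns a certificate (as text)
    or raises ConnectionError if that CA is unreachable.
    """
    certificates: Dict[str, str] = {}
    failures: Dict[str, str] = {}
    for ca_name, request_certificate in issuers.items():
        try:
            certificates[ca_name] = request_certificate(domain)
        except ConnectionError as exc:
            failures[ca_name] = str(exc)
    if not certificates:
        raise RuntimeError(f"no CA could issue a certificate for {domain}: {failures}")
    return certificates, failures
```

As long as one CA answers, the subject still ends up holding a usable certificate, and the failures dictionary records which CA to retry later.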
Dangers of Domain-Validated Certificates
Finally, we spent a ton of time on the security of domain validation as it relates to PKI certificate management.
The idea here is that CAs need proof that a given subject owns a domain. For instance, if they issue a certificate for *.bastionzero.com to me, they need to know that I am actually the owner of *.bastionzero.com. How do they know that? They visit the domain via either an HTTP request or a DNS request and check that I am able to display a validation token that the CA provided to me ahead of time. This verifies that I really own the domain.
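Here is a minimal sketch of that token check from the CA's side, assuming an HTTP-based challenge. The URL path follows the convention ACME uses for HTTP challenges, but everything else is simplified: real ACME expects a "key authorization" derived from the token and the requester's account key rather than the bare token, and the function names and the injected fetch callable are my own.

```python
import secrets
from typing import Callable


def new_challenge(domain: str) -> dict:
    """The CA picks an unguessable token and tells the requester where to publish it."""
    token = secrets.token_urlsafe(32)
    return {
        "domain": domain,
        "token": token,
        # Plain HTTP: no certificate exists yet, so this cannot run over TLS.
        "url": f"http://{domain}/.well-known/acme-challenge/{token}",
    }


def validate(challenge: dict, fetch: Callable[[str], str]) -> bool:
    """The CA fetches the URL and checks that the response is the expected token."""
    try:
        body = fetch(challenge["url"])
    except OSError:
        return False  # unreachable domain: no proof of control
    # Constant-time comparison of the published value against the token.
    return secrets.compare_digest(body.strip().encode(), challenge["token"].encode())
```

Note that the CA learns only that whoever answered at that URL knew the token. If an attacker can redirect the CA's request, they can answer in the owner's place.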
Cool. But remember! This cannot be done over TLS! There is no TLS certificate yet, because the whole point of this process is to issue the TLS certificate! That means this process runs over the 2000s Internet, without TLS. And so, all those crazy routing and DNS attacks that everyone dismissed 15 years ago as too far-fetched, and that everyone dismisses now because they can be prevented by TLS, are a big deal. These attacks were demonstrated in the lab (we got to see a live demo) and are actually happening in the wild. We heard folks cite real-life incidents in which routing and DNS attacks were used to obtain fraudulent certificates. Naturally, there was a lot of discussion at the workshop on how to prevent these. I hope other attendees prepare a deeper write-up on this issue because it was a significant focus of the conversation.
Finding an Alternative to Certificate Authority Challenges
Running a CA is hard. Don’t try it at home.
If you’ve followed my blogging over the past two years, you probably know that one of the reasons Ethan Heilman and I started BastionZero was to build an access system that doesn’t rely on CAs. There’s real value in an alternative to certificate authorities. Because operating a CA is hard. Because a CA is a single point of compromise. And because you don’t want to try this at home.
Instead, BastionZero uses multiple roots of trust to authenticate, authorize, and audit access to infrastructure. This way, you don’t need to worry about a single point of compromise (like a CA) being hacked. BastionZero makes remote access simple, affordable, and secure across all of your workflows.