Cloudflare is down and nobody can check

Nov 18, 2025 · 3 min read

Cloudflare went down this morning. So did Downdetector. The site that tells you things are broken was broken because things were broken.

DevOps · Cloud · Infrastructure

Cloudflare went down this morning at 11:20 UTC. “Unusual traffic spike” according to their status page, which is corporate speak for “we’re not sure what happened but everything’s on fire”.

X, ChatGPT, Spotify, Facebook, Canva, a bunch of crypto sites… 500 errors everywhere. Cloudflare’s stock (NET) dropped 3.5% in pre-market trading. Over 11,500 outage reports on X from the US alone.

But here’s where it gets beautiful.

Downdetector was down too

Downdetector, THE website people use to check if services are down, went down with them. Why? Because it uses Cloudflare to block bots.

The irony

The site that tells you something is down… was down because that something was down.

This is peak infrastructure irony. You can’t make this stuff up.

The “decentralized” internet isn’t

We like to think the internet is this resilient distributed network. It’s not. A handful of providers carry most of the traffic.

Cloudflare alone handles around 20% of global web traffic. AWS carries a similar chunk. When one of them sneezes, half the internet catches a cold.

And here’s the thing: your monitoring, your alerting, your status page… there’s a good chance all of that sits behind the same CDN as your app.

When it breaks, you don’t know it broke. Your users find out before you do.
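
A quick way to sanity-check this is to look at who serves DNS for your app, your status page, and your alerting endpoints. Here’s a minimal sketch, assuming dnspython is installed; the hostnames are placeholders and the NS-record heuristic is crude, but it catches the obvious case where everything points at the same provider:

```python
# pip install dnspython
# Rough audit: do your app, status page, and alerting hostnames all
# rely on the same DNS provider? Hostnames below are placeholders.
import dns.resolver

HOSTS = {
    "app": "app.example.com",
    "status": "status.example.com",
    "alerting": "alerts.example.com",
}

def apex(hostname: str) -> str:
    # Naive apex extraction; fine for a quick audit, not for every TLD.
    return ".".join(hostname.split(".")[-2:])

def dns_providers(hostname: str) -> set[str]:
    # NS records of the apex tell you who answers DNS for the zone
    # (Cloudflare-managed zones show up as *.ns.cloudflare.com).
    answers = dns.resolver.resolve(apex(hostname), "NS")
    return {apex(str(r.target).rstrip(".")) for r in answers}

providers = {name: dns_providers(host) for name, host in HOSTS.items()}
for name, provs in providers.items():
    print(f"{name:10s} -> {sorted(provs)}")

shared = set.intersection(*providers.values())
if shared:
    print(f"Everything depends on {sorted(shared)}: that's your hidden SPOF.")
```

It won’t catch everything (a CNAME onto a CDN can hide behind a different DNS host), but it’s a five-minute way to spot the obvious overlaps.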

The real lesson

Don’t just diversify clouds for the sake of multicloud buzzword compliance. Diversify your critical dependencies.

Your DNS. Your CDN. Your alerting system. Your status page.

If everything goes through the same pipe, you have a single point of failure dressed up as “high availability”. It looks redundant on the architecture diagram. It isn’t.

Some practical moves:

  • Use a different DNS provider than your main CDN
  • Host your status page on a separate provider (GitHub Pages works fine for this)
  • Have alerting that doesn’t depend on your primary infrastructure (see the sketch after this list)
  • Keep a way to communicate with users that doesn’t require your main stack
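
If you want something concrete for the last two points, here’s a minimal sketch of an out-of-band probe: it hits your app through the CDN and fires a webhook hosted on a different provider when things look broken. The URLs are placeholders, and where you run it matters more than the code itself: pick a box or scheduler that doesn’t share infrastructure with your main stack.

```python
# Out-of-band health check: run this from somewhere that does NOT sit
# behind your main CDN/cloud (a cheap VPS, a cron on another provider).
# CHECK_URL and ALERT_WEBHOOK below are placeholders.
import json
import urllib.error
import urllib.request

CHECK_URL = "https://app.example.com/healthz"       # your app, through the CDN
ALERT_WEBHOOK = "https://hooks.example.org/notify"  # hosted on a *different* provider
TIMEOUT = 10

def probe(url: str) -> tuple[bool, str]:
    try:
        with urllib.request.urlopen(url, timeout=TIMEOUT) as resp:
            return resp.status < 400, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        return False, f"HTTP {exc.code}"          # e.g. the 500s everyone saw this morning
    except (urllib.error.URLError, TimeoutError) as exc:
        return False, f"unreachable: {exc}"

def alert(message: str) -> None:
    # Plain JSON POST; adapt the payload to whatever your webhook expects.
    body = json.dumps({"text": message}).encode()
    req = urllib.request.Request(
        ALERT_WEBHOOK, data=body, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req, timeout=TIMEOUT)

if __name__ == "__main__":
    ok, detail = probe(CHECK_URL)
    if not ok:
        alert(f"{CHECK_URL} is down ({detail}), checked from outside the main stack")
```

A scheduled GitHub Actions workflow, a free-tier VM at a second cloud, even a Raspberry Pi at the office all work. The point is that the probe and the alert path share nothing with the thing they’re watching.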

The fix

Cloudflare Access and WARP came back first. Services are recovering as I write this.

But next time someone presents an architecture with “high availability” that has a hidden SPOF, remember this morning. The outage detector couldn’t detect outages. That’s the kind of failure mode nobody thinks about until it happens.

Redundancy isn’t about having two of everything. It’s about having two of everything that don’t share a common failure point.

Enjoyed this article?

Let me know! A share is always appreciated.

About the author

Sofiane Djerbi

Cloud & Kubernetes Architect, FinOps Expert. I help companies build scalable, secure, and cost-effective infrastructures.
