Recently I was recapping some technical decisions our team made around multi-regional support for the gateway. The legacy gateway only ran in us-central. When we started building the new one, "multi-regional by default" was the loudest idea. Everyone thought it was the right call: better availability, it sounded great, and nobody really disagreed.
I'm not here to say multi-region is wrong. But I want to write down why it sounded better on paper than it turned out to be, at least for us so far.
Why it sounded great
Multi-region means you're not tied to one place. If us-central has a bad day, traffic can go somewhere else. Your gateway's availability goes up. You're not a single point of failure. On paper that's exactly what you want for a critical path service.
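On paper, the math backs that up. A quick sketch, assuming the two regions fail independently (which real regions only approximately do):

```python
def combined_availability(a1: float, a2: float) -> float:
    # With independent failures, the pair is unavailable
    # only when both regions are down at the same time.
    return 1 - (1 - a1) * (1 - a2)

# Two regions at 99.9% each look like "six nines" on paper.
print(f"{combined_availability(0.999, 0.999):.6f}")  # 0.999999
```

The catch is the independence assumption: a shared dependency that lives in only one region quietly breaks it, which is where the rest of this post goes.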
What we ran into in reality
We have a bunch of dependency services. Most of them only run in us-central. So even when we deployed our gateway in eu-west, we still had to call back to us-central for a lot of things: auth, config, filtering, whatever. Those calls become the bottleneck. The gateway might be in two regions, but the dependencies aren't. So we didn't really get "eu traffic stays in eu." We got "eu gateway calls us-central," and that latency shows up on every request.
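To make the latency cost concrete, here's a back-of-the-envelope sketch. The round-trip times and the call count are made-up illustrative numbers, not our real measurements:

```python
# Assumed, illustrative round-trip times (ms).
CROSS_REGION_RTT_MS = 100  # eu-west <-> us-central
IN_REGION_RTT_MS = 2       # call that stays inside the region

def added_dependency_latency_ms(serial_calls: int, cross_region: bool) -> int:
    # Each serial dependency call pays one round trip; if the
    # dependencies live in another region, that's the cross-region RTT.
    rtt = CROSS_REGION_RTT_MS if cross_region else IN_REGION_RTT_MS
    return serial_calls * rtt

# Three serial calls (say auth, config, filtering) per request:
print(added_dependency_latency_ms(3, cross_region=True))   # 300 ms from eu-west
print(added_dependency_latency_ms(3, cross_region=False))  # 6 ms from us-central
```

The point of the sketch: the extra region doesn't remove the round trips, it multiplies them by a much bigger RTT.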
Making those dependency services multi-regional isn't a small ask. From our side we tried to stay as stateless as we could. But for other services you're talking about moving stateless pods, stateful databases, caches, the whole stack. The work multiplies. And it's not just one team — we depend on several. Getting all of them to support multiple regions is a huge amount of work, and you still have to keep the current production environment stable while you do it.
It doesn't stop there. Some of the upstream services that integrate with us aren't multi-regional either. Same story.
For us, the downsides were real. Operating multiple regions means more to maintain: deployments, config. Double the regions can mean double the cost, and debugging gets harder when things go wrong.
What we actually gained
The gains are real, but they're modest. Our gateway's availability did improve. But we've never yet had a case where a whole region went down and the other one saved us.
I might dig into availability SLOs in another post. We tend to assume more uptime is always better and that multi-region is the way to get there. But how much is enough? If you're targeting 99.9% uptime, you can sometimes hit that without multi-region: with a solid incident and on-call process, even a single region can meet the target. You get the same guarantee with less ops, lower cost, and simpler debugging. Maybe that's the better trade.
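The error-budget arithmetic behind that is simple enough to sketch (the 30-day window is an assumption; use whatever window your SLO is defined over):

```python
def downtime_budget_minutes(slo: float, window_days: int = 30) -> float:
    # Minutes of allowed downtime in the window for a given availability SLO.
    return (1 - slo) * window_days * 24 * 60

# 99.9% over 30 days leaves roughly 43 minutes of allowed downtime,
# which a good on-call rotation can often stay inside with one region.
print(f"{downtime_budget_minutes(0.999):.1f}")  # 43.2
```

Whether 43 minutes a month is survivable in a single region depends on how fast you detect and mitigate, which is an ops question, not an architecture one.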
But the bottlenecks we care about — the dependencies, the upstreams — are still there. Only a small fraction of our upstreams are truly multi-regional. So a lot of the time we're still stuck behind the us-central dependency bottleneck. We added complexity and cost for a win that's real but smaller than we expected.
You ain't gonna need it
This got me thinking about the old line: you ain't gonna need it. We built for a world where multi-region would unlock a lot. In practice we built for a world we're not in yet. The dependencies and the ecosystem weren't there. So we paid the cost of multi-region — the ops, the cost, the complexity — before we could get most of the benefit.
I'm not saying we should rip it out. We have it, and maybe over time more of the chain will be multi-regional and the payoff will grow. But I'd be more careful next time. Before we go "multi-X by default" again, I'd ask: are the things we depend on ready? Are we solving a problem we have today or one we might have in the near future? And how much are we willing to pay for "might"?
You ain't gonna need it — until you do. The trick is not building the thing you might need so early that you're the only one who has it, and you're paying for it alone.