The Conversation
I was explaining how we’d scaled our backend to someone senior from another company. Nothing dramatic—just the usual shop-talk about traffic patterns and what we’d done about them.
I got to the part about our callback servers. How third-party webhooks used to hit our main API and drown out human requests during spikes. How we pulled the callbacks out into their own service, dropped events onto SNS, let SQS queue them up, and processed them with Lambda. How the main API went back to doing the one thing it was supposed to do—serve humans—and the callback path became boring.
He cut in, sharp.
“Why wouldn’t you just vertically scale the API? Bigger box, done. Why are we spending time on architecture when we could be shipping?”
I started to explain. He pushed back harder. I started to explain again. He pushed back harder again.
And somewhere in the middle of that third explanation, I realized something: I didn’t want to keep fighting. Not because he was right. Because he was operating from a completely different set of defaults than we were, and the cost of bridging that gap—right there, in that conversation—wasn’t worth it.
That moment has been rattling around in my head ever since. This post is me trying to unpack it honestly.
What We Actually Built
Quick detour, because the rest of this won’t make sense without it.
Our main API was a standard human-facing service. Users logged in, clicked buttons, expected responses in a few hundred milliseconds. Traffic was predictable—daylight-weighted, gently autoscaled.
We also had integrations with a bunch of third parties. Payment providers, CRM systems, communication platforms—anything that sends webhooks when something changes on their side. Those events were bursty by nature. A third party would batch a retry storm. A customer would sync a year of historical data. A partner would run a migration script against our endpoint. Traffic to the callback path could spike 10x–50x for minutes at a time.
Early on, these callbacks lived inside the main API. Bad idea. When the spikes hit, the API started responding slowly to everything—webhooks and humans. Users saw degraded performance because a partner decided Tuesday morning was a good time to reconcile.
The fix was structural, not computational. We pulled the callback endpoints into a separate, tiny service—the callback server. Its only job was to validate the incoming event, acknowledge it fast, and drop the payload onto SNS. From there, SQS queued it up, and Lambda consumers picked events off the queue at a rate we controlled—parallelism dialed so downstream dependencies wouldn’t fall over, retries and DLQs handling failure without the sender caring.
The main API stopped seeing third-party traffic entirely. The callback server could spike without affecting humans. The Lambda consumers processed the actual business logic at a pace our databases could handle.
Three different jobs, three different blast radii, three different scaling profiles. That’s it. That’s the whole architecture. Nothing clever—just separation.
To Us, It Wasn’t a Decision
Here’s the thing that threw me when he pushed back: we never debated this.
My team and I saw the traffic pattern, looked at each other, and the split was obvious. We wrote up the proposal, talked it through with leadership, got alignment, and built it. Vertical scaling never came up as a serious alternative. Not because we were zealots about decoupling—because the problem itself pointed at a particular shape.
You can feel it if you’ve been in enough systems: receiving an event and processing an event are two different jobs. Receiving is cheap, fast, and must not fail. Processing is expensive, slow, and can be retried. Bolt them together and you’ve tied the reliability of a fast job to the performance of a slow one. Separate them and each side can be optimized for what it’s actually doing.
The same instinct applies to traffic shape. Human traffic is smooth. Third-party traffic is spiky. Putting them on the same server means sizing for the spike or suffering during it. Either you overpay for idle capacity or you underpay and degrade. The third option—isolate the spiky path—doesn’t cost more, it just requires a little wiring.
And then there’s blast radius. If the callback processing logic has a bug that crashes the service, do you want your human-facing API to go down with it? No. If a partner sends you malformed payloads for an hour, do you want your login page to get slow? No. Failure domains should match functional boundaries, not accidental co-location.
These weren’t three trade-offs we weighed. They were starting points. Defaults. The way you know not to put your hand on a hot stove—not by deliberating, but because the answer is already wired in.
So when this guy kept asking “why not just scale vertically?” he wasn’t asking a question I had an answer to. He was asking a question that, in our shop, didn’t even get posed.
The Uncomfortable Part: I Couldn’t Explain It Well
Here’s the catch, and this is the part I’m still sitting with.
In the moment, I couldn’t explain any of what I just wrote above. Not crisply, not convincingly, not in the five-minute window this guy was giving me. I could feel myself reaching for words and grabbing clichés. You know, separation of concerns. Different failure modes. It’s just cleaner. None of that landed, because none of that is actually the argument. The argument is: here’s the traffic shape, here’s why the shapes don’t mix, here’s what breaks if you co-locate them, here’s the math on vertical scaling for bursty workloads.
I had the argument. I just didn’t have it ready.
That’s the real cost of defaults you can’t articulate. As long as the default lives inside the team, it’s fine—nobody has to defend it, it just works. But the moment you step outside the team, you’re selling. You’re in a budget review. You’re onboarding a senior hire who didn’t come from the same school of thought. You’re pitching a platform decision to a sibling org that wants to understand why their thing should look like your thing.
And in those moments, “we just did it” isn’t an argument. It isn’t even a bad argument. It’s no argument at all.
This is true of a lot of what senior engineers carry around. We know why we don’t SSH into prod servers. We know why we don’t skip migrations. We know why we break services up this way instead of that way. We know because we’ve been burned, or we’ve watched someone else get burned. But the knowledge lives in our hands, not on our tongues, and when we have to put it on our tongues, we fumble.
The pushback exposed that. Uncomfortable—but fair.
Why I Stopped Fighting
So why did I disengage?
Not because he was right. He wasn’t. Vertical scaling would have worked until it didn’t, and it would have stopped working loudly—at 2am, during a third-party retry storm, with human users watching dashboards go red. The callback isolation is right. I’d build it the same way again.
I disengaged because the cost-benefit of that specific conversation didn’t pencil out. He wasn’t a stakeholder in our system. He wasn’t going to make a decision based on what I said. The exchange was essentially rhetorical—he wanted me to justify, and I wanted to get through lunch. There was no version of the next ten minutes where I changed his mind. There were plenty of versions where I spent my energy and got nothing for it.
That sounds cynical. I don’t think it is. I think it’s one of the actual skills you develop as you get more senior: a sense for which arguments are worth having. Not every technical disagreement deserves a defense. Some do—the ones inside your team, the ones with people who can actually use your answer. Those, you invest in. You find the words. You close the articulation gap, because the team needs you to.
But the person at lunch who keeps asking why you didn’t just throw money at the problem? That’s a different conversation. You can nod, change the subject, and move on. Not every default needs to be re-litigated in public.
The one thing I did take away—and this is why I’m writing it down—is that I owe my own team a better version of the answer than the one I gave at lunch. Not for the next stranger. For the next engineer we hire who wants to understand why we do things this way. For the next budget conversation. For the next time “why not just scale it?” comes from someone whose mind actually does need to change.
Some decisions aren’t decisions. But they still deserve the words.