How Northwind Freight cut bug detection from days to minutes

Northwind Freight used to find its production bugs by reading one-star reviews and support tickets. Now a Slack alert fires within minutes of a regression, with the symbolicated stack trace already attached. The change took less than an afternoon to set up.

Northwind builds shipment-tracking and freight-booking software for mid-market carriers — a Next.js dashboard on top of a Node and Express API, kept running by about 25 engineers. They ship several times a week. What they didn't have was any way to know when a deploy broke something. Bugs reached them the slow way: a support ticket, a frustrated email, occasionally a one-star review days after the fact.

The breaking point was a checkout regression. A change to the booking flow silently failed for a subset of customers — the request errored server-side, the user saw a spinner that never resolved, and nothing told the team. It ran like that for the better part of a week before a support thread connected enough dots for an engineer to reproduce it. By then they'd lost bookings they couldn't get back and had no idea how many.

"We were debugging blind," Devin Marsh, a staff engineer at Northwind, said. "Someone would forward a ticket, we'd spend half a day trying to reproduce it from a vague description, and half the time we couldn't. We knew something was wrong long before we knew what."

Their first instinct was more logging. They turned up verbosity across the API and piped everything to their log aggregator. It made things worse — the real errors were buried under thousands of routine lines, and nobody had time to read them. Grepping logs after a customer complained wasn't detection. It was archaeology.

What they did

Northwind installed Catch across both halves of the stack. The Next.js app got @catch.dev/next, which covers client and server in one package, and the Express API got @catch.dev/express.

npm install @catch.dev/next
npm install @catch.dev/express

They set CATCH_ACCESS_TOKEN=ck_live_... in each service's environment, wired source-map upload into their existing build so production stack traces pointed at real lines instead of minified soup, and connected a Slack channel for alerts. The whole thing — both SDKs, source maps, the Slack alert — was done before the afternoon was out. The first real error showed up in the dashboard within minutes of pointing staging traffic at it, grouped into an issue and sorted by how many users it touched. You can see the SDKs they used in the SDK reference.

The detail that mattered most to Marsh wasn't the capture — it was the grouping. Instead of a wall of identical log lines, Catch folded every occurrence of the same fault into one issue, told him how many users hit it, and let him assign it. The signal stopped drowning.
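To make the grouping idea concrete, here is a minimal sketch of folding raw error events into issues and ranking them by how many distinct users each one touched. It is an illustration under stated assumptions, not Catch's implementation: the `CapturedError` and `Issue` shapes, the `fingerprintOf` rule (error type plus top stack frame), and the function names are all hypothetical.

```typescript
// Hypothetical event shape for illustration; not Catch's actual data model.
interface CapturedError {
  errorType: string; // e.g. "TypeError"
  topFrame: string;  // e.g. "booking.ts:88"
  userId: string;
}

interface Issue {
  fingerprint: string;
  occurrences: number;
  affectedUsers: Set<string>;
}

// Assumption: two events are "the same fault" when they share an error type
// and a top stack frame. Real grouping rules are usually more involved.
function fingerprintOf(event: CapturedError): string {
  return `${event.errorType}@${event.topFrame}`;
}

// Fold every occurrence of the same fault into one issue, then sort the
// issues so the one touching the most distinct users comes first.
function groupIntoIssues(events: CapturedError[]): Issue[] {
  const byFingerprint = new Map<string, Issue>();
  for (const event of events) {
    const fingerprint = fingerprintOf(event);
    let issue = byFingerprint.get(fingerprint);
    if (!issue) {
      issue = { fingerprint, occurrences: 0, affectedUsers: new Set<string>() };
      byFingerprint.set(fingerprint, issue);
    }
    issue.occurrences += 1;
    issue.affectedUsers.add(event.userId);
  }
  return Array.from(byFingerprint.values()).sort(
    (a, b) => b.affectedUsers.size - a.affectedUsers.size
  );
}
```

Counting distinct users rather than raw occurrences keeps one customer stuck in a retry loop from outranking a fault that quietly hits many customers once each.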

What changed

The checkout-class regression that once hid for a week now surfaces in minutes. Time to detect a production problem went from days — however long it took a customer to complain and a human to notice — to the few minutes between a deploy and a Slack alert firing on the spike.
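The article doesn't say how Catch decides that an error rate counts as a spike, so the following is only a sketch of one common approach: compare the error count in the current window against the average of recent windows, and require a minimum count so a single stray error doesn't page anyone. The function name `isSpike` and the defaults for `factor` and `minCount` are assumptions chosen for illustration.

```typescript
// Hypothetical spike check; the thresholds are illustrative defaults,
// not values taken from Catch or from Northwind's configuration.
function isSpike(
  recentCounts: number[], // error counts from earlier windows, e.g. per 5 minutes
  currentCount: number,   // error count in the window that just ended
  factor: number = 3,     // how far above baseline counts as a spike
  minCount: number = 5    // ignore windows with only a handful of errors
): boolean {
  if (currentCount < minCount) {
    return false;
  }
  // With no history the baseline is zero, so any window that clears
  // minCount is treated as a spike (a brand-new fault after a deploy).
  const baseline =
    recentCounts.length === 0
      ? 0
      : recentCounts.reduce((sum, count) => sum + count, 0) / recentCounts.length;
  return currentCount > baseline * factor;
}

// Example: a quiet baseline of about 2 errors per window, then 40 after a deploy.
const shouldAlert = isSpike([2, 3, 1, 2], 40);
```

The `minCount` floor is what keeps a check like this from recreating the noisy feed the team was trying to escape: small fluctuations on a quiet route never reach the channel.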

Triage moved with it. Reproducing a bug from a support ticket used to eat about two days of back-and-forth; with the stack trace, the affected route, and the user count already in front of them, the same triage now takes roughly twenty minutes. The engineer opens the issue, reads the trace, and ships a fix instead of playing detective.

"I stopped hearing about bugs from our support inbox," Marsh said. "Now I hear about them from Slack, minutes after they happen, with the stack trace already attached."

It isn't a finished story. Catch is web and JavaScript only, so the parts of Northwind's infrastructure that aren't Node still sit outside it, and the team is deliberate about which alerts route to Slack so the channel doesn't become the noisy log feed they were trying to escape.

What's next

With detection no longer a manual job, Northwind has started using the user-impact counts to decide what to fix first, instead of working from whoever shouted loudest. Next they're rolling the web feedback widget into the same dashboard, so a customer who hits a confusing screen can report it with an annotated screenshot, and the report lands right next to the errors the team already watches.
