Developer-First Traffic Management: See Live Traffic & Override APIs Instantly

What if developers could see live traffic flowing in a complex cluster, and then instantly mock or override any API in that microservices stack for testing?
Service meshes are the undisputed titans of modern infrastructure, giving platform engineers unprecedented control over production traffic. But they were built for platform engineers, not developers. When you hand the keys to that complex engine to a developer who just wants to take a test drive, you get friction, slowdowns, and a development cycle that feels more like navigating a bureaucracy than building innovative software.
This isn't a critique of the meshes themselves; it's a critique of how they're used. We’ve learned a hard lesson: tools built for the operational rigor of production are often the wrong tools for the creative chaos of development.
This post focuses on what we built to solve this problem: a new set of developer-first traffic management tools that puts the developer back in the driver's seat. In a follow-up post (Part 2), we'll dive deep into how we built it.
The Developer's Dilemma: A Day in the Life
Let's ground this in a concrete example. Imagine you're a developer working on a large e-commerce application, tasked with adding a feature to the checkout-service. To test it properly, you need it to interact with the real payment-service and inventory-service running in a shared staging environment.
Testing this change the "old way" presented a terrible choice: attempt to run the entire stack locally (a fantasy), or redeploy the entire service for every minor code change (a nightmare). We've previously solved this core problem with Sandboxes: a way to create an isolated environment that intelligently routes all traffic for your checkout-service to your local workstation, while all other service-to-service traffic remains unaffected.
This is a massive leap forward. But as developers started using this power, they wanted to do more. The "all-or-nothing" routing of an entire service was great, but it didn't solve for more granular, complex scenarios. Developers still needed to:
- Deeply inspect live traffic: Understand the actual request and response payloads flowing between services, not just what the (often stale) documentation claims.
- Reroute a single API call: Make a granular change to just one API endpoint (e.g., `POST /checkout/new-feature`) to test a new function, while letting the stable cluster version handle all other calls (e.g., `GET /checkout/history`).
- Mock specific network calls: Test failure modes by mocking a specific response (e.g., what happens when the `payment-service` returns a `503`?) without writing a ton of custom mocking code.
- Debug and reproduce specific cases: Isolate a single problematic API call to reproduce a bug, without having to change the entire running service.
Technically, most of these things are possible with a service mesh. But as we quickly learned, "possible" is not the same as "practical."
Where the Mesh Falls Short for Developers
This is where the dream collides with reality. To implement that "simple" granular route, a developer must:
- Become a Mesh Expert: They have to understand the specific, complex CRDs of their mesh, like Istio's `VirtualService` and `DestinationRule`, or the Kubernetes Gateway API constructs like `HTTPRoute`. They need to learn a platform engineer-centric YAML schema that was designed for cluster-wide operations, not for a developer's temporary workflow.
- Suffer High Cognitive Load: Their focus shifts from "How does my feature work?" to "What's the right incantation of YAML to make the network do what I want?" Even with newer standards like the Kubernetes Gateway API, it's still a leaky abstraction that forces developers to understand and manage Kubernetes-native configuration, which is not their primary job.
- Risk Breaking Things: In many setups, developers might be editing shared mesh configuration files. A small typo in an Istio `VirtualService` could inadvertently disrupt traffic for the entire team working in that shared environment. The blast radius for a simple mistake is far too large.
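To make that cognitive load concrete, here is roughly the raw Istio configuration a developer would need to route a single endpoint to their own version of a service. This is an illustrative sketch only: the host names, subset labels, and the existence of a matching `DestinationRule` are assumptions, not Signadot configuration.

```yaml
# Illustrative Istio VirtualService: send only POST /checkout/new-feature
# to a developer's test version, keeping all other traffic on the stable
# version. Subsets ("dev-feature", "stable") are assumed to be defined in
# a separate DestinationRule.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: checkout-service
spec:
  hosts:
    - checkout-service
  http:
    - match:
        - uri:
            exact: /checkout/new-feature
          method:
            exact: POST
      route:
        - destination:
            host: checkout-service
            subset: dev-feature
    - route:
        - destination:
            host: checkout-service
            subset: stable
```

Note that even this "simple" rule touches a shared, cluster-scoped resource: any mistake in it affects every user of `checkout-service` in that environment, which is exactly the blast-radius problem described above.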
Using mesh-native tools for simple dev tasks felt like using a sledgehammer to crack a nut. We needed a different approach.
Unleashing Developer Superpowers: What We Built
We built a new, programmable layer that sits above the mesh, designed specifically as a developer-first toolkit. It provides two core capabilities to start, enabling a complete, frictionless workflow. This isn't an exhaustive list, but rather the first two powerful tools we chose to build.
1. Real-Time Traffic Discovery: "What's Actually Happening?"
Before you can change an API, you have to understand it. Stale documentation and out-of-date OpenAPI specs don't cut it. Our new layer lets a developer safely tap into a sandboxed version of a service on the cluster and see the real-time flow.
With a simple policy, a developer can get a "read-only" stream of all request and response payloads for a given service in their sandbox. This provides a safe, passive view into the live behavior of the system, letting them work with the ground truth, not just documentation.
2. API-Level Sandboxing: "Surgical Override for Your Local Code"
Once a developer understands the traffic, they can act. This is the killer feature. Instead of redeploying an entire service, the developer runs a single command:
`signadot local override --sandbox ... -w ... --p=80 --to=localhost:8000`
This command connects their local process (running on localhost:8000) to a specific sandbox's workload port. Here's the magic: our proxy first sends the request to the local process. If that process returns the sd-override header set to true, its response is used (the override is successful). If it doesn't return that header, the proxy seamlessly falls back, using the response from the original workload in the sandbox.
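From the local process's side, the protocol described above is simple: include the `sd-override` header set to `"true"` to claim a request, or omit it to let the proxy fall back to the sandbox workload. The following is a minimal sketch of that decision logic, assuming a single overridden endpoint; the path and response bodies are illustrative, not part of Signadot's API.

```python
# Sketch of a local process participating in the override protocol:
# respond with the `sd-override` header set to "true" to have the proxy
# use our response; omit it so the proxy falls back to the sandbox
# workload's response. The endpoint path below is a hypothetical example.

OVERRIDDEN_PATHS = {"/checkout/new-feature"}


def handle_request(method: str, path: str) -> tuple[int, dict[str, str], bytes]:
    """Decide how the local process answers a request forwarded by the proxy.

    Returns (status, headers, body).
    """
    if method == "POST" and path in OVERRIDDEN_PATHS:
        # Claim this call: the proxy will use this response as the override.
        return 200, {"sd-override": "true"}, b'{"source": "local-ide"}'
    # No sd-override header: the proxy falls back to the sandbox workload.
    return 204, {}, b""
```

In a real setup this function would sit behind the HTTP server listening on `localhost:8000`, so the developer's in-progress handler answers only the calls it cares about while everything else flows through untouched.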
This simple mechanism becomes incredibly powerful when coupled with a new workload type we call "virtual." Unlike "fork" or "local" workloads, a "virtual" workload isn't a separate copy of a service; it's just a pointer to the stable, "baseline" version.
By using the override command against a "virtual" workload, a developer can use their sandbox's routing key (an identifier for their request) to test a single API call against the stable baseline service, but have it served by their local code. This means in just two commands, they can surgically override one API call from the baseline cluster service with code running in their local IDE, all without touching any YAML.
Conclusion: A New Framework for Developer-First Networking
Our journey taught us a critical lesson, and it's the key takeaway we want to leave you with: a vast number of powerful capabilities already exist in our infrastructure, but they aren't exposed at the right level of abstraction. When we build the right developer-first layer to expose them, we can supercharge the developer workflow.
In the modern stack, where teams are building with AI and moving faster than ever, the ability to safely inspect and manipulate live traffic isn't just a convenience; it's a developer superpower. The tools built for networking and operations, like service meshes, are powerful but operate at the wrong abstraction level for this kind of development velocity. The key to unlocking developer productivity is to partner with platform engineers and use a simple, programmable, developer-first layer on top of the mesh to safely expose its power.
The capabilities we've shown today, live traffic discovery and API-level sandboxing, are just the beginning. We've built a general "middleware" framework that allows many more traffic capabilities to be exposed to developers safely. We're on a path to build out a full plugin system, one that empowers platform teams to write and deploy their own custom middlewares, tailored to their organization's specific needs.
Coming in Part 2: Now that you've seen what this layer does, you're probably wondering how it works. In Part 2, we'll dive into the architecture: how we built a high-performance middleware proxy, and integrated it all with different service meshes.


