The Pitfalls of Using Feature Flagging for Testing in Shared Staging Environments

Microservices architecture has changed how software is built, with its focus on decoupled, independently deployable components. However, testing microservices brings unique challenges, especially when developers share a high-fidelity environment. A popular strategy for avoiding destabilizing a shared environment is feature flagging. Traditionally used in production, feature flagging has found its way into the staging process. In this blog post, we'll discuss the pitfalls of using feature flags for testing in shared staging environments and why this may not be an ideal approach.

Feature Flagging: A Brief Overview

Feature flagging, or feature toggling, is a technique that allows teams to change an application's behavior at runtime without redeploying it. The new code ships behind a conditional check, and a configuration value decides whether it runs. While it's primarily used in production to toggle features on or off for specific user segments or to manage rollouts, developers have started using it in staging environments to wrap new code and test it in isolation.

On the surface, this seems like a good idea. Feature flags can isolate new changes so that they're only visible to specific test users, thereby safeguarding other developers from potential disruptions. But does this strategy work in a shared staging environment? Let's dissect it.
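To make the idea concrete, here is a minimal sketch of what wrapping new code in a flag for a specific test user might look like. The `FlagStore` class, the `new-discount-engine` flag name, and the `checkout_total` function are all hypothetical and exist only for illustration; a real setup would typically query a flag service rather than an in-memory dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class FlagStore:
    """Hypothetical in-memory flag store (an assumption for this sketch)."""
    # Maps a flag name to the set of user IDs it is enabled for.
    enabled_for: dict[str, set[str]] = field(default_factory=dict)

    def is_enabled(self, flag: str, user_id: str) -> bool:
        return user_id in self.enabled_for.get(flag, set())


def checkout_total(prices: list[float], user_id: str, flags: FlagStore) -> float:
    total = sum(prices)
    if flags.is_enabled("new-discount-engine", user_id):
        # New code under test: only designated test users reach this branch.
        return round(total * 0.9, 2)
    # Existing behavior for everyone else sharing the environment.
    return round(total, 2)


flags = FlagStore({"new-discount-engine": {"test-user-1"}})
print(checkout_total([10.0, 20.0], "test-user-1", flags))  # new path
print(checkout_total([10.0, 20.0], "other-dev", flags))    # old path
```

Note that both branches live in the same deployed service. The flag only decides which branch a given request takes.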

Why Feature Flagging Might Fall Short in Staging Environments

1) Long Feedback Cycles

Fast feedback loops are critical for agile development. The quicker developers can identify and address issues, the better the end product. Unfortunately, using feature flags in shared staging environments can lead to longer feedback cycles.

Each code iteration involves modifying the code, correctly implementing feature flags, and then deploying the code through the CI/CD pipeline to the staging environment. The process can be slow and unwieldy, counteracting the need for speedy feedback.

2) Code Pollution

Feature flags have a clear purpose in production environments. However, when used for testing in shared staging environments, they could lead to code pollution. Code pollution, in the context of feature flags, refers to the additional code or clutter that gets introduced into the codebase in order to control the execution of features.

Every new boolean feature flag doubles the number of configurations the application can be in, so the configuration space grows exponentially with the number of flags: n independent flags give 2^n possible states. This amplifies the difficulty of testing and can create obscure bugs that only appear in certain combinations of flag states. Furthermore, without careful management, feature flags can lead to technical debt. If old and deprecated flags are not promptly removed, they clutter the codebase and make it harder to understand and maintain. Flags added purely for testing have no place in the final code, resulting in additional cleanup work and potential confusion down the line.
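A quick way to see how fast the number of states grows is to enumerate them. The sketch below assumes independent on/off flags; the flag names are made up for illustration.

```python
from itertools import product


def flag_states(flag_names: list[str]) -> list[dict[str, bool]]:
    """Enumerate every on/off combination of the given boolean flags."""
    return [
        dict(zip(flag_names, combo))
        for combo in product([False, True], repeat=len(flag_names))
    ]


# Hypothetical flags that three developers might be testing at the same time.
flag_names = ["new-discount-engine", "async-checkout", "v2-inventory-client"]
states = flag_states(flag_names)

print(len(states))  # 8 combinations for 3 flags
print(2 ** 10)      # 1024 combinations for 10 flags
```

Even if most combinations are never exercised on purpose, a shared staging environment can end up in any of them, depending on what each developer happens to have switched on.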

Code pollution also makes collaboration harder. When multiple developers test their features concurrently in a shared staging environment, the feature flags they're using can overlap and interact. This can lead to a tangled web of conditional code paths that is difficult to manage and debug.

While feature flags are a powerful tool for managing and releasing new features in production, their use in shared staging environments can lead to a variety of issues related to code complexity, maintainability, and readability.

3) Difficulty in Handling Refactoring

Feature flags work well for introducing new features. However, code changes during active development aren't just about new features - they often involve refactoring or modifying existing code. A refactor has no single entry point to toggle: to put it behind a flag, the old and new versions must both stay in the codebase, with a branch wherever they diverge. Wrapping refactored code this way can be challenging and might even defeat the purpose if done incorrectly. Put another way: feature flags are great for an expanding product, but don’t make as much sense during refactoring.
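As a small illustration of why this is awkward, consider flag-guarding a refactor of a single helper. The function names and the `use_refactored` parameter are hypothetical; the point is only that both implementations have to coexist until the flag is removed.

```python
def format_name_legacy(first: str, last: str) -> str:
    # Original implementation, kept only so the flag has something to fall back to.
    return last.upper() + ", " + first


def format_name_refactored(first: str, last: str) -> str:
    # Refactored implementation: intended to behave identically.
    return f"{last.upper()}, {first}"


def format_name(first: str, last: str, use_refactored: bool) -> str:
    # The flag check itself is extra code that exists purely for testing.
    if use_refactored:
        return format_name_refactored(first, last)
    return format_name_legacy(first, last)


print(format_name("Ada", "Lovelace", use_refactored=False))
print(format_name("Ada", "Lovelace", use_refactored=True))
```

For a one-line helper this is manageable. For a refactor that touches many call sites or changes a data model, the same pattern has to be repeated at every point where the old and new code differ.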

4) Potential Resource Leaks

Even with feature flags, faulty code can negatively impact a shared environment. A flag controls who can trigger the new code, not where it runs: the code still executes inside the shared service. If a new feature leaks memory or hogs resources, it degrades the performance of the shared environment for everyone, regardless of whether it's under a feature flag.

Rethinking Testing in Shared Staging Environments

Given the shortcomings of using feature flags in shared staging environments, it's vital to reassess our testing strategies. Isolating changes is crucial, but the method employed needs to be efficient and productive.

One alternative approach is using a tool like Signadot, which provides lightweight, ephemeral "Sandboxes" within a staging or production Kubernetes cluster. These sandboxes only contain the services that have been modified and take mere seconds to spin up, offering rapid, isolated testing without the need for feature flags.

Conclusion

While feature flags have proven immensely useful in production environments, their application for testing in shared staging environments presents challenges. We must remember that tools and techniques are not one-size-fits-all. As we continue to evolve our microservices testing strategies, we need to consider the unique requirements and challenges of shared staging environments, employing solutions that foster both productivity and code stability.
