Testing Code That Has Feature Flags: Strategies That Don't Explode Your Test Matrix
Add ten boolean flags and, in theory, you have 1,024 versions of your application to test. Nobody actually does that — but the fear of it is why some teams still treat flags as a thing to avoid in tested code paths.
They shouldn't. Flags are gates, not branches in your architecture. With a few patterns, your test suite stays small, fast, and honest.
Stub the SDK, not the flag
The biggest mistake is reaching into the network during tests. Real flag evaluations belong in production; in tests, you want a deterministic, in-memory client.
Wrap the SDK behind a thin interface and inject a fake. Each test pins exactly the variants it cares about — nothing else.
```ts
// flags.ts
export interface Flags {
  evaluate(key: string, fallback: string): string;
}

// flags.test.ts
const fakeFlags = (overrides: Record<string, string>): Flags => ({
  evaluate: (key, fallback) => overrides[key] ?? fallback,
});

const flags = fakeFlags({ "checkout.new-payment": "on" });
```
The same shape works for the Featureflow SDKs: wrap client.evaluate() behind your own port and your tests stop caring whether the network exists.
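To make that concrete, here is one full test in that shape. paymentPath is a hypothetical function under test (not a Featureflow API), and the Flags port and fake are repeated so the snippet runs on its own:

```typescript
// Flags port and fake, repeated here so the snippet is self-contained.
interface Flags {
  evaluate(key: string, fallback: string): string;
}

const fakeFlags = (overrides: Record<string, string>): Flags => ({
  evaluate: (key, fallback) => overrides[key] ?? fallback,
});

// Hypothetical code under test: routes checkout based on one flag variant.
function paymentPath(flags: Flags): string {
  return flags.evaluate("checkout.new-payment", "off") === "on"
    ? "new-payment-service"
    : "legacy-payment-service";
}

// Each test pins exactly the variant it cares about; nothing touches the network.
const onFlags = fakeFlags({ "checkout.new-payment": "on" });
const offFlags = fakeFlags({}); // no override: the call-site fallback decides
```

Because the fake is pure and in-memory, both paths are deterministic no matter what the real flag service is doing.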
Test the paths that matter, not the cartesian product
You don't need a test for every flag combination. You need tests for:
- The default state — what new users get when no overrides apply.
- Each variant of the flag under change — the one your PR is touching, both on and off.
- The kill-switch path — the safe fallback you'd flip to in an incident.
Everything else is interactions, and interactions are best caught by integration tests on the handful of flag pairs that actually compose in production.
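Those three paths fit naturally in a small table-driven test. A sketch, reusing the fake client from above; checkoutMode and the variant names are illustrative assumptions, not part of any SDK:

```typescript
interface Flags {
  evaluate(key: string, fallback: string): string;
}

const fakeFlags = (overrides: Record<string, string>): Flags => ({
  evaluate: (key, fallback) => overrides[key] ?? fallback,
});

// Hypothetical code under test: reads one flag, defaulting to the safe "off".
function checkoutMode(flags: Flags): string {
  return flags.evaluate("checkout.new-payment", "off");
}

// The three paths worth pinning: default, the variant under change, kill switch.
const cases: Array<{ name: string; overrides: Record<string, string>; want: string }> = [
  { name: "default state (no overrides)", overrides: {}, want: "off" },
  { name: "variant under change: on", overrides: { "checkout.new-payment": "on" }, want: "on" },
  { name: "kill switch forced to safe", overrides: { "checkout.new-payment": "off" }, want: "off" },
];

for (const { name, overrides, want } of cases) {
  const got = checkoutMode(fakeFlags(overrides));
  if (got !== want) throw new Error(`${name}: got ${got}, want ${want}`);
}
```

Three rows instead of 1,024 combinations, and each row maps to a state you would actually ship.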
Default to the safe variant in tests
When a test doesn't explicitly set a flag, your fake client should return the production-default variant. Two benefits: tests written today don't break tomorrow when you add an unrelated flag, and your CI proves the production default is still green.
The Featureflow SDKs already require a fallback at the call site — mirror that in your fake and you get the same guarantee in tests.
Clean up the flag, then clean up the test
When a flag retires, its tests retire with it. Leaving stale if (flags.evaluate(...)) branches in test fixtures is exactly how flag debt creeps back in through the side door.
Treat flag retirement as a single PR: remove the flag in code, remove the conditional in tests, delete the flag in your dashboard. Three steps, one ticket.
Flagged code isn't harder to test — it's just easier to test badly. Stub the SDK, pin variants per test, default to safe, and clean up alongside the flag itself. Your matrix stays tractable, and the flags keep doing what they're supposed to: protecting users from your next idea until it's ready.
Want flag patterns that are designed to test cleanly? Start free at featureflow.com.