A/B Testing with Feature Flags: Run Experiments Without Extra Infrastructure
Every redesign starts as a guess. Every new checkout flow, onboarding change, or pricing page tweak is a hypothesis dressed up as a decision. A/B testing turns those guesses into evidence — but most teams either skip it entirely or bolt on a dedicated experimentation platform that takes months to integrate.
There's a simpler path: if you're already using feature flags, you already have most of what you need.
Why Teams Skip Experimentation
Real A/B testing feels like a big investment. You need consistent user bucketing (the same user should always see the same variant), traffic splitting, and some way to tie flag state to your analytics events. Building that from scratch is a week of work before you run a single experiment.
Dedicated tools solve this, but they come with their own overhead: a new SDK, a new dashboard, data pipelines to maintain, and eventual divergence between "experiment flags" and "feature flags." You end up managing two systems that do overlapping things.
Feature Flags Already Do the Hard Part
A feature flag with percentage-based targeting is functionally a traffic splitter. Add stable user-level bucketing (hash on user ID so the split is deterministic) and you have a consistent A/B assignment engine. The only missing piece is tying that assignment to your metrics.
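Deterministic bucketing is only a few lines. The sketch below is illustrative, not Featureflow's internal implementation; it uses an FNV-1a string hash and hypothetical function names, but any stable hash gives the same property: one user, one bucket, forever.

```javascript
// FNV-1a: a fast, stable string hash with a reasonable distribution.
// Illustrative only — not Featureflow's internal implementation.
function hashString(input) {
  let hash = 2166136261;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 16777619);
  }
  return hash >>> 0; // force unsigned 32-bit
}

// Map a user into a bucket 0–99. Hashing on flagKey + userId means the
// same user always lands in the same bucket for a given flag, so the
// assignment survives page loads, sessions, and deploys.
function assignVariant(userId, flagKey, treatmentPercent) {
  const bucket = hashString(`${flagKey}:${userId}`) % 100;
  return bucket < treatmentPercent ? 'treatment' : 'control';
}
```

Because assignment is a pure function of user ID and flag key, no sticky sessions or server-side state are needed to keep variants consistent.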
Here's what the pattern looks like in practice with the Featureflow JavaScript SDK:
// Evaluate the flag — Featureflow hashes on userId for stable bucketing
const variant = featureflow.evaluate('checkout-redesign').value();
// variant === 'control' | 'treatment'
// Pass the assignment downstream to your analytics layer
analytics.track('Checkout Viewed', {
userId,
variant,
});

Featureflow's targeting rules let you define the split — 50/50, 90/10 for a cautious rollout, or segment by plan, region, or any user attribute. The flag evaluates client-side or server-side with consistent results either way. No sticky sessions required.
The Practical Workflow
A typical experiment looks like this:
- Wrap both variants behind the flag in the same codebase — no separate branches, no separate deploys.
- Set the split in the Featureflow dashboard. Start at 10% treatment if you're cautious, 50/50 if you want faster results.
- Log the variant alongside your conversion events in whatever analytics tool you already use (Amplitude, Mixpanel, Segment, PostHog — it doesn't matter).
- Declare a winner and promote the flag to 100% — or kill the variant and clean up the code. Either way, you're done in the same tool you already manage.
Because the flag and the feature lifecycle live in the same place, you don't end up with zombie experiment flags that nobody dares to delete six months later.
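Step one of that workflow can be sketched as follows. The render helpers are hypothetical stand-ins for your real UI code, and `featureflow` is assumed to be an already-initialized client:

```javascript
// Hypothetical render helpers — stand-ins for your real UI code.
function renderClassicCheckout(user) {
  return `classic checkout for ${user}`;
}
function renderNewCheckout(user) {
  return `redesigned checkout for ${user}`;
}

// Both variants ship in the same build; the flag picks one at runtime.
// `featureflow` is an already-initialized client (assumption).
function renderCheckout(featureflow, user) {
  const variant = featureflow.evaluate('checkout-redesign').value();
  return variant === 'treatment'
    ? renderNewCheckout(user)
    : renderClassicCheckout(user);
}
```

Killing the losing variant later is a small, mechanical change: delete one branch and the flag evaluation, and the winner is all that remains.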
What This Approach Doesn't Cover
If you need built-in statistical significance calculations, Bayesian inference, or automated stopping rules, a dedicated experimentation platform earns its keep. Feature flags give you the assignment mechanism — the analysis is still on you.
For most product teams running a handful of concurrent experiments, that trade-off is fine. Export the variant field from your analytics tool, run a chi-square or t-test, and move on.
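For a 2×2 conversion comparison, the chi-square statistic itself is short enough to compute by hand. This is a minimal sketch, assuming you've exported per-variant conversion counts and totals from your analytics tool; 3.841 is the critical value for p < 0.05 at one degree of freedom.

```javascript
// Chi-square statistic for a 2x2 table:
// rows = variants (control, treatment), columns = converted / not converted.
function chiSquare2x2(convA, totalA, convB, totalB) {
  const failA = totalA - convA;
  const failB = totalB - convB;
  const total = totalA + totalB;
  const observed = [convA, failA, convB, failB];
  // Expected cell count = (row total * column total) / grand total.
  const expected = [
    (totalA * (convA + convB)) / total,
    (totalA * (failA + failB)) / total,
    (totalB * (convA + convB)) / total,
    (totalB * (failA + failB)) / total,
  ];
  return observed.reduce(
    (chi, o, i) => chi + (o - expected[i]) ** 2 / expected[i],
    0
  );
}

// 3.841 = chi-square critical value for p < 0.05, df = 1.
function isSignificant(convA, totalA, convB, totalB) {
  return chiSquare2x2(convA, totalA, convB, totalB) > 3.841;
}
```

One caveat worth knowing: if you check significance repeatedly while the experiment runs, the false-positive rate climbs. Pick a sample size up front and test once.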
Experimentation doesn't have to mean a six-week integration project. If your flags already support percentage splits and attribute-based targeting, you're 80% of the way there. Start with one experiment, track the variant in your analytics events, and build from there.
👉 See how Featureflow's targeting rules work at docs.featureflow.io.
#ABTesting #FeatureFlags #Experimentation #ContinuousDelivery #ProductEngineering
Start experimenting without the overhead
Featureflow gives you percentage splits, attribute targeting, and stable user bucketing out of the box. Free to get started.
Start Now (Free)

Related Articles
Feature Flags for Mobile Apps: Ship Without Waiting on App Store Review
App store reviews take days. Bugs don't wait. Feature flags let mobile teams ship code continuously, gate features remotely, and kill broken behaviour — without a new release.
Feature Flags and Observability: How to Know If Your Rollout Is Working
Releasing to 10% of users without watching metrics is just gambling at a smaller scale. Here's how to connect flag evaluations to your observability stack — so you know when to expand, and when to pull back.
Feature Flags for Customer Targeting: Roll Out Features to the Right Users
Percentage rollouts control how many users see a feature. Targeting rules control which users see it — by plan tier, beta opt-in, org ID, or any attribute you pass. Here's how to use them.