Continuous Delivery · Apr 15, 2026

Feature Flags and Observability: How to Know If Your Rollout Is Working

Marcus Johnson
Platform Engineer

You flipped the flag to 10%. Now what?

Most teams treat the rollout step and the monitoring step as entirely separate concerns. The flag controls exposure; dashboards exist somewhere else; someone checks them later. In practice, "later" means "when Slack gets loud" — which is after customers have already had a bad experience.

Releasing to a slice of users without watching what happens to that slice is just gambling at a smaller scale. The percentage reduces blast radius. It doesn't replace knowing whether the rollout is safe to expand.

Segment Your Metrics by Flag Variant

The core pattern is simple: when you evaluate a flag, you know exactly which users are in which variant. Instrument that. Attach the variant as a dimension to every metric event those users generate.

Here's what that looks like with the Featureflow JavaScript SDK:

import featureflow from '@featureflow/js-client-sdk';

// Evaluate the flag — returns the assigned variant for this user
const variant = featureflow.evaluate('new-search-engine', user).value;
// variant === 'on' | 'off'

const start = performance.now();
const results = await runSearch(query);
const duration = performance.now() - start;

// Tag every metric with the variant so you can split charts by cohort
analytics.track('search_completed', {
  userId: user.id,
  variant,                 // 'on' or 'off'
  resultCount: results.length,
  latencyMs: duration,
  hasResults: results.length > 0,
});

Now your analytics tool can plot error rate, latency, and conversion side-by-side for the on cohort and the off cohort. If the lines diverge unfavourably after a rollout step, you see it immediately — not in the next post-mortem.
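To make the comparison concrete, here is a minimal sketch of the aggregation your analytics tool does for you: group the tracked events by variant and compute per-cohort stats. The event shape matches the `search_completed` payload above; the function name and the use of `hasResults` as an error proxy are illustrative, not part of any SDK.

```javascript
// Sketch: roll tracked events up into per-variant stats so the
// 'on' and 'off' cohorts can be compared side by side.
function summarizeByVariant(events) {
  const stats = {};
  for (const e of events) {
    const s = (stats[e.variant] ??= { count: 0, errors: 0, latencies: [] });
    s.count += 1;
    if (!e.hasResults) s.errors += 1; // stand-in for a real error signal
    s.latencies.push(e.latencyMs);
  }
  for (const variant of Object.keys(stats)) {
    const s = stats[variant];
    s.errorRate = s.errors / s.count;
    s.latencies.sort((a, b) => a - b);
    s.p50LatencyMs = s.latencies[Math.floor(s.latencies.length / 2)];
  }
  return stats;
}
```

In production you would run this query in the analytics tool itself; the point is that the split is only possible because `variant` was attached at evaluation time.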

Rollout Checkpoints, Not a Single Flip

Once your metrics are variant-aware, progressive rollout becomes a real feedback loop instead of a ritual. A sensible checkpoint sequence looks like this:

  • 5% — smoke test. Watch error rates and p99 latency for 30 minutes. If clean, proceed.
  • 25% — load signal. You now have enough volume to spot subtle regressions in conversion or session depth. Watch for 1–2 hours.
  • 100% — full release. Only after the previous cohorts look healthy. At this point the flag is close to a formality — most of the risk has already been retired at the earlier steps.

The specific thresholds don't matter as much as the principle: each step has a defined pass/fail condition before the next one starts. Without that, progressive delivery is just delayed delivery.
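One way to keep those pass/fail conditions honest is to write them down as data. Below is a sketch of the checkpoint sequence as a table of explicit gates; the thresholds and metric names are illustrative assumptions, not Featureflow features.

```javascript
// Sketch: each rollout step carries an explicit pass condition.
// Thresholds here are examples — tune them to your own baselines.
const checkpoints = [
  { percent: 5,   soakMinutes: 30,  passes: (m) => m.errorRate < 0.01 && m.p99LatencyMs < 800 },
  { percent: 25,  soakMinutes: 120, passes: (m) => m.errorRate < 0.01 && m.conversionDelta > -0.02 },
  { percent: 100, soakMinutes: 0,   passes: () => true },
];

// Returns the next percentage to roll to, or null to hold at the current step.
function nextRolloutStep(currentPercent, cohortMetrics) {
  const idx = checkpoints.findIndex((c) => c.percent === currentPercent);
  if (idx === -1 || idx === checkpoints.length - 1) return null; // unknown step, or already at 100%
  return checkpoints[idx].passes(cohortMetrics) ? checkpoints[idx + 1].percent : null;
}
```

Whether this lives in a script, a CI job, or a runbook matters less than the fact that "proceed" is now a computed answer rather than a judgment call under pressure.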

Alerts That Know About Flags

The final piece is making your alerting variant-aware. An error rate spike that only affects the on cohort is a clear rollout signal. The same spike across both cohorts is an infrastructure problem. Treating them identically — one undifferentiated alert — hides the distinction you need to act quickly.

Most observability tools (Datadog, Grafana, New Relic) support alert conditions filtered by custom dimensions. Add variant to your metric events and you can write an alert that fires specifically when the treatment cohort degrades — and pages the right person with the right context.
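The routing logic itself is small. Here is a sketch of the classification described above, assuming you already have per-cohort error rates from variant-tagged metrics; the function name and threshold are hypothetical.

```javascript
// Sketch: classify a spike by comparing the treatment and control cohorts.
// In practice this condition lives in your observability tool, as an
// alert filtered on the `variant` dimension.
function classifySpike(onErrorRate, offErrorRate, baseline = 0.01) {
  const onBad = onErrorRate > baseline;
  const offBad = offErrorRate > baseline;
  if (onBad && !offBad) return 'rollout-regression'; // only treatment degraded: roll the flag back
  if (onBad && offBad) return 'infrastructure';      // both cohorts degraded: page on-call, not the feature owner
  return 'healthy';
}
```

The payoff is in the page itself: a "rollout-regression" alert can link straight to the flag that needs flipping back.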

Feature flags give you control over exposure. Observability gives you the signal to use that control wisely. Neither is much use without the other.

👉 Get started with Featureflow at featureflow.com — SDK docs at docs.featureflow.io.

#FeatureFlags #Observability #ContinuousDelivery #DevOps #ProgressiveDelivery
