singleflight: How Go Collapses Duplicate Calls Under a Thundering Herd

Fri Jul 03 2026•10 min read

Your service is fast because of a cache. Redis sits in front of a slow database, and 99% of requests never touch the database at all. Then a hot key expires, and for a few milliseconds your fast service turns into a thundering herd stampeding a single slow query. This post is about the tiny standard-adjacent package that fixes it — golang.org/x/sync/singleflight — and about what it actually does under the hood, because the internals are a small masterclass in concurrency correctness.

The problem: cache stampede

The pattern is cache-aside. You ask Redis first. On a miss, you go to the database, get the value, and repopulate the cache so the next reader is fast again.

func GetOdds(eventID string, redis *Redis, db *Database) (Odds, error) {
	if v, err := redis.Get(eventID); err == nil {
		return v, nil // fast path: cache hit
	}
	// miss: hit the database and repopulate
	odds, err := db.FetchFromSource(eventID) // the expensive query
	if err != nil {
		return Odds{}, err
	}
	redis.Set(eventID, odds)
	return odds, nil
}

This is correct for one caller. Now put it under load. event1 is a hot key — a live sports event everyone is watching — and its cache entry expires. In the same millisecond, 20 in-flight requests call GetOdds("event1"). All 20 miss Redis at the same time. All 20 fall through to db.FetchFromSource. All 20 run the same expensive query for the same value, concurrently.

                    cache entry for "event1" expires
                                 |
   req1 ─ MISS ─┐                v
   req2 ─ MISS ─┤        ┌──────────────┐
   req3 ─ MISS ─┼──────► │    Redis     │  (empty)
    ...         │        └──────────────┘
  req20 ─ MISS ─┘                │
                                 ▼
                    ┌──────────────────────────┐
   req1  ─────────► │                          │
   req2  ─────────► │        DATABASE          │  20 identical
   req3  ─────────► │   SELECT odds WHERE ...   │  queries at once
    ...            │                          │  for the SAME value
  req20  ─────────► │                          │
                    └──────────────────────────┘

Nineteen of those queries are pure waste. Every caller wants the identical value, but the database does the same work 20 times. Multiply by every hot key expiring across your key space and you get periodic load spikes — CPU, connection-pool exhaustion, lock contention — all triggered by the very cache that was supposed to protect the database. This is the cache stampede (or thundering herd). The problem isn't the miss; it's the duplication of work across concurrent callers.
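To make the duplication concrete, here is a minimal sketch that simulates the stampede with the standard library only. The fetchFromSource function and the dbCalls counter are hypothetical stand-ins for the real query, and the cache is left out entirely, on the assumption that every caller has already missed it.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// dbCalls counts how many times the simulated expensive query runs.
var dbCalls atomic.Int64

// fetchFromSource is a hypothetical stand-in for the slow database query.
func fetchFromSource(eventID string) string {
	dbCalls.Add(1)
	time.Sleep(20 * time.Millisecond) // pretend the query is slow
	return "odds-for-" + eventID
}

// stampede simulates n concurrent cache misses for one key and returns
// how many times the "database" was queried.
func stampede(n int) int64 {
	dbCalls.Store(0)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = fetchFromSource("event1") // every caller misses and queries
		}()
	}
	wg.Wait()
	return dbCalls.Load()
}

func main() {
	fmt.Println("database calls:", stampede(20)) // prints "database calls: 20"
}
```

Every caller pays for its own query, so the count always equals the number of callers.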

The solution, in one sentence

singleflight.Group collapses concurrent calls for the same key into a single execution: only the first caller runs the function, and everyone else waits and receives that same result.

Twenty goroutines go in, one database query comes out.

The public API

The surface is tiny. You keep one Group — it's a namespace with duplicate suppression — and route work through it.

Group — a class of work that acts as a namespace for duplicate suppression. The zero value is ready to use.
Do(key, fn) (v, err, shared) — blocking. Runs fn unless a call for key is already in flight, in which case it waits. shared reports whether the value was handed to more than one caller.
DoChan(key, fn) <-chan Result — same suppression, but returns a channel so you can combine it with select and timeouts.
Forget(key) — drop an in-flight key so the next call re-executes instead of joining the current one.
Result{ Val, Err, Shared } — what DoChan sends.

Wiring it into the cache-aside function is a small change. The database call moves inside g.Do, and GetOdds now also returns the shared flag:

var g singleflight.Group
 
func GetOdds(eventID string, redis *Redis, db *Database) (Odds, bool, error) {
	// 1. normal Redis attempt
	if v, err := redis.Get(eventID); err == nil {
		return v, false, nil
	}
	// 2. MISS -> all goroutines for the same key collapse here
	v, err, shared := g.Do(eventID, func() (any, error) {
		// only ONE of the concurrent callers runs this
		odds, err := db.FetchFromSource(eventID) // the expensive query
		if err != nil {
			return Odds{}, err
		}
		redis.Set(eventID, odds) // repopulate the cache
		return odds, nil
	})
	if err != nil {
		return Odds{}, shared, err
	}
	return v.(Odds), shared, nil
}

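Fire 20 goroutines at GetOdds("event1") now and, as long as their calls overlap, the database counter reads 1. All 20 callers get shared == true, including the one that ran the query: shared reports that the result was handed to more than one caller, not that this particular caller skipped the database.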

📝Note

The key is what defines "the same work". Two callers with the same key share one execution; two callers with different keys run independently. Choosing that key well is the whole game — more on that later.

How it works inside

The mechanism is smaller than you'd expect. A Group holds a mutex and a lazily-initialized map:

type Group struct {
	mu sync.Mutex       // protects m
	m  map[string]*call // lazily initialized
}
 
type call struct {
	wg  sync.WaitGroup   // the coordination primitive
	val interface{}      // written once, before wg is done
	err error
	dups  int            // how many duplicate callers joined
	chans []chan<- Result
}

Everything hinges on one sync.WaitGroup per in-flight call. The WaitGroup is the gate: the first caller holds it closed while it works, and all duplicates block on it. Here's the exact path each caller takes.

The first caller takes the mutex, sees no entry for the key, creates a call, does wg.Add(1), stores it in the map, and releases the mutex before running anything. Then it executes fn:

c := new(call)
c.wg.Add(1)
g.m[key] = c
g.mu.Unlock()      // released BEFORE the slow work runs
 
g.doCall(c, key, fn)
return c.val, c.err, c.dups > 0

Releasing the mutex first is what makes duplicates possible — the lock protects the map, not the work. The expensive fn runs with no lock held, so other goroutines can arrive and find the in-flight call.

A duplicate caller takes the mutex, finds the existing call, bumps dups, releases the mutex, and waits on the WaitGroup:

if c, ok := g.m[key]; ok {
	c.dups++
	g.mu.Unlock()
	c.wg.Wait()               // parks until the first caller finishes
	// ... (panic/goexit checks omitted)
	return c.val, c.err, true // same val, same err, shared = true
}

c.wg.Wait() parks the goroutine. When the first caller finishes and calls wg.Done(), every waiter wakes at once and reads the same c.val / c.err. Note the return: duplicates always report shared == true. The first caller instead reports c.dups > 0 — true if anyone joined while it worked. Both sides agree on the same answer.

first caller                     duplicates (x19)
   |                                   |
 Lock ── create call, wg.Add(1) ─ Unlock
   |                                   |
 run fn()  <── mutex free ──►      Lock, dups++, Unlock
   | (200ms)                          |
   |                              c.wg.Wait()  ── parked ──
 wg.Done() ───────────────────────► all wake, read c.val
   |                                   |
 delete(m,key)                    return c.val, true

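Cleanup happens the moment the work finishes. When fn returns, doCall releases the waiters and deletes the key from the map under the mutex. The g.m[key] == c guard is there because Forget may already have removed the entry, in which case a newer call could now own the key and must not be deleted: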

g.mu.Lock()
c.wg.Done()
if g.m[key] == c {
	delete(g.m, key)
}
g.mu.Unlock()

This is the single most important property to internalize:

🚨Gotcha

singleflight is not a cache. The key lives in the map only while the call is in flight. As soon as it completes, the entry is deleted. The next wave of concurrent callers re-executes fn from scratch. singleflight deduplicates calls that overlap in time — nothing more. You still need Redis (or an in-memory cache) for actual caching; singleflight only protects the moment the cache is cold.
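The first-caller path, the duplicate path, and the cleanup fit into a short standard-library sketch. This is a simplified stand-in, not the real package: miniGroup, joined, and run are hypothetical names, and panic handling, Goexit handling, DoChan, and Forget are all left out.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// call tracks one in-flight execution.
type call struct {
	wg   sync.WaitGroup
	val  any
	err  error
	dups int
}

// miniGroup is a stripped-down stand-in for singleflight.Group.
type miniGroup struct {
	mu sync.Mutex
	m  map[string]*call
}

func (g *miniGroup) Do(key string, fn func() (any, error)) (any, error, bool) {
	g.mu.Lock()
	if g.m == nil {
		g.m = make(map[string]*call)
	}
	if c, ok := g.m[key]; ok { // duplicate: join the in-flight call
		c.dups++
		g.mu.Unlock()
		c.wg.Wait()
		return c.val, c.err, true
	}
	c := new(call) // first caller: register, then work without the lock
	c.wg.Add(1)
	g.m[key] = c
	g.mu.Unlock()

	c.val, c.err = fn()

	g.mu.Lock()
	c.wg.Done()
	delete(g.m, key) // the entry lives only while the call is in flight
	g.mu.Unlock()
	return c.val, c.err, c.dups > 0
}

// joined reports how many duplicates are waiting on key (demo helper).
func (g *miniGroup) joined(key string) int {
	g.mu.Lock()
	defer g.mu.Unlock()
	if c, ok := g.m[key]; ok {
		return c.dups
	}
	return 0
}

// run fires n overlapping callers, then one later caller, and returns
// how many times fn executed in total.
func run(n int) int {
	var g miniGroup
	executions := 0 // fn never runs concurrently in this demo
	release := make(chan struct{})
	fn := func() (any, error) {
		executions++
		<-release
		return "odds", nil
	}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); g.Do("event1", fn) }()
	}
	for g.joined("event1") < n-1 { // wait until all duplicates have joined
		time.Sleep(time.Millisecond)
	}
	close(release)
	wg.Wait()
	g.Do("event1", fn) // no overlap any more, so fn runs again
	return executions
}

func main() {
	fmt.Println("fn executions:", run(20)) // prints "fn executions: 2"
}
```

The 20 overlapping callers cost one execution, and the single later caller costs another, because nothing was kept after the first wave finished.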

The robustness details: the double-defer

Read doCall and you'll find a curious shape: two layers of defer, one on doCall itself and one inside a nested function literal that wraps the call to fn. It exists to distinguish three different ways fn can end: a normal return, a panic, and a runtime.Goexit. Getting this wrong would either swallow panics or deadlock every waiter forever.

func (g *Group) doCall(c *call, key string, fn func() (interface{}, error)) {
	normalReturn := false
	recovered := false
 
	defer func() {
		// if we get here without a normal return AND without a recover,
		// fn must have called runtime.Goexit
		if !normalReturn && !recovered {
			c.err = errGoexit
		}
		g.mu.Lock()
		defer g.mu.Unlock()
		c.wg.Done()              // <-- waiters are released no matter what
		if g.m[key] == c {
			delete(g.m, key)
		}
		// ... dispatch panic / goexit / normal result ...
	}()
 
	func() {
		defer func() {
			if !normalReturn {
				if r := recover(); r != nil {
					c.err = newPanicError(r) // capture panic + stack
				}
			}
		}()
		c.val, c.err = fn()
		normalReturn = true
	}()
 
	if !normalReturn {
		recovered = true
	}
}

Why two layers? Because a panic and a runtime.Goexit look almost identical from the outside — both unwind the stack — and the only way to tell them apart is to observe whether recover actually stopped the unwinding. The inner defer runs recover(). If it caught something, control returns to doCall, recovered gets set to true, and we know it was a panic. If it was a Goexit, recover returns nil, the goroutine keeps terminating, and we reach the outer defer with both normalReturn and recovered false — the signature of a Goexit, recorded as errGoexit.

The outer defer then propagates the outcome faithfully to every waiter, so the shared result is honest about how the original call ended:

Panic: wrapped in a panicError (value + trimmed stack trace) and re-panicked in each caller. A failure in the shared work surfaces in every caller, not just the one unlucky enough to run it.
Goexit: each waiter calls runtime.Goexit() too, mirroring the original.
Normal: the value is sent to any DoChan channels.

There's one sharp edge worth knowing. If callers are waiting on channels (DoChan), a plain re-panic is not enough: something further up the stack could recover it, and the channel waiters would then never receive a result and stay parked forever. So singleflight raises the panic on a fresh goroutine, where nothing can recover it, and pins the current one:

if len(c.chans) > 0 {
	go panic(e)
	select {} // keep this goroutine alive so it shows up in the crash dump
}

That select {} looks like a bug and is actually deliberate: it keeps the goroutine around so the panic appears in the crash dump instead of vanishing. Small detail, real care.

Traps, and when not to reach for it

singleflight is sharp in both senses. The failure modes are the mirror image of its strengths:

It's not a cache. Combine it with Redis or an in-memory cache. On its own it only smooths the cold-cache instant.
One slow call blocks everyone sharing the key. If fn takes 30 seconds, all joined callers wait 30 seconds — including ones whose own deadline was 1 second. Use DoChan with a context so a caller can walk away:
```
select {
case res := <-g.DoChan(key, fn):
    return res.Val, res.Err
case <-ctx.Done():
    return nil, ctx.Err() // give up on the shared call, keep our SLA
}
```
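One failure fails the whole wave. If fn returns an error, every joined caller gets that same error. A transient blip (a dropped connection, a one-off timeout) gets amplified across the entire herd. The entry is removed as soon as the call returns, so the next caller already retries fresh; Forget(key) only helps while a call is still in flight, for example to stop new callers from joining one that looks stuck. To absorb transient errors, add retry/backoff inside fn.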
Key granularity matters. Too coarse a key merges unrelated work: keying on "odds" instead of "odds:event1" would collapse every event into one execution and hand callers the odds for whichever event ran first. Too fine and you suppress nothing. The key should name exactly the unit of work that's safe to share.

⚡TL;DR

Reach for singleflight when concurrent callers request the same expensive, idempotent thing — hot cache keys, identical DB/API calls.
Do collapses overlapping calls into one; duplicates wait on a per-call WaitGroup and read the same result.
It is not a cache — the key is deleted the instant the call finishes. Pair it with Redis/memory.
Use DoChan + context so a slow shared call can't blow past an individual caller's deadline.
Every joined caller shares the same error too. Retry inside fn to avoid amplifying transient blips, and use Forget to detach new callers from a call that is still in flight.
The double-defer in doCall faithfully propagates panics (with stack) and runtime.Goexit to all waiters — correctness, not decoration.

singleflight earns its place whenever the same costly, idempotent work fires in concurrent bursts: caches with hot keys, deduplicated calls to expensive APIs or databases, config or token fetches that a hundred goroutines want at once. It won't cache for you and it won't retry for you, but for that narrow, common window where a herd all wants the identical value at the identical moment, it turns N units of work into one, with a surprisingly careful implementation behind those three methods.