Cache Churn Cascades
Our Redis instance couldn't hold the working set. During peak hours, we'd evict hot data before it could be reused. Miss rates would spike, the database would get hammered, and suddenly everything got slower. The more users we had, the worse it got. It felt like we were fighting the system.
Hot Partitions and Bottlenecks
Certain user segments were hitting the same database nodes repeatedly. The data wasn't distributed evenly. We had capacity on other nodes but couldn't use it. It's a maddening problem because the metrics look fine until they don't, and then everything degrades at once.
Layered Caching
Instead of betting everything on Redis, we built three tiers. CDN for the edge, regional caches for warm data, and Redis for the hot working set. Each layer is simple and solves one problem.
Safe Retries Through Idempotency
We made every write safe to retry using idempotency keys. Once retries were safe, we could spread reads across replicas without worrying about duplicate side effects. Reads distributed, writes safe. No more write bottlenecks.
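To make the shape concrete, here is a minimal sketch of idempotency-key handling, assuming Redis stores completed responses; the key prefix, TTL, and handler names are illustrative, not our actual code.

import json
import redis  # assumes the redis-py client

r = redis.Redis(host="cache.internal", port=6379)  # placeholder host

def handle_write(idempotency_key, apply_write):
    """Run apply_write() at most once per key; retries replay the stored response."""
    cached = r.get(f"idem:{idempotency_key}")
    if cached is not None:
        return json.loads(cached)  # retry: return the original response
    result = apply_write()  # first attempt: perform the real write
    # remember the response for a day; a production version would also guard
    # concurrent first attempts (e.g. SET NX on an in-progress marker)
    r.set(f"idem:{idempotency_key}", json.dumps(result), ex=86400)
    return result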
Graceful Degradation
Stop fighting load. We built backpressure tied to SLO burn rates. When the system started degrading, we rejected requests early instead of letting them queue and cascade.
How We Actually Built It
Redis Cluster with Consistent Hashing
Our single Redis instance was running at 85% CPU. We deployed a 12-node cluster where each node owns a share of the keyspace's hash slots, so rebalancing is cheap: we could add nodes and let the cluster migrate slots to them. CPU dropped to 35%, and suddenly we had breathing room for traffic spikes instead of immediate panic.
CRC16(key) mod 16384 -> hash slot -> node_5
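On the client side the routing is transparent; a minimal sketch using redis-py's cluster client (the seed host is a placeholder, and this is not necessarily the client we used):

from redis.cluster import RedisCluster  # redis-py >= 4.1

rc = RedisCluster(host="redis-cluster.internal", port=6379)  # placeholder seed node
rc.set("user:42:session", "abc123")  # client hashes the key to a slot, routes to its owner
rc.get("user:42:session")  # same slot, same node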
L1 cache hit: 1ms
L1 miss, Redis hit: 5ms
Both miss, database: 50-200ms
Each layer buys us capacity
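Stitched together, the read path looks roughly like this; the cache handles and database loader are placeholders, not our actual clients.

def get_profile(user_id, l1, redis_client, load_from_db):
    """Check the in-process L1 cache, then Redis, then fall back to the database."""
    key = f"profile:{user_id}"

    value = l1.get(key)  # ~1 ms
    if value is not None:
        return value

    value = redis_client.get(key)  # ~5 ms
    if value is not None:
        l1.set(key, value)  # backfill the faster layer
        return value

    value = load_from_db(user_id)  # 50-200 ms
    redis_client.set(key, value, ex=300)  # illustrative 5-minute TTL
    l1.set(key, value)
    return value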
Async Write-Behind for Aggregations
Expensive operations like ranking calculations were hitting the database on every write. We inverted it: write to cache immediately, return to the user, then background workers batch updates to the database. Eventual consistency with versioning handles any conflicts that arise. The write is now 5ms instead of 50ms.
1. Write to cache (blocking)
2. Return 200 OK to client (5ms total)
3. Background worker flushes batch (50ms)
4. Database eventually consistent
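A stripped-down sketch of that flow, assuming a Redis list as the flush queue and a batch-upsert helper on the database side (both illustrative, not our production code):

import json
import time
import redis  # assumes the redis-py client

r = redis.Redis(host="cache.internal", port=6379)  # placeholder host

def record_score(user_id, score):
    """Fast path: update the cache, enqueue the write, return without touching the database."""
    r.set(f"score:{user_id}", score)
    r.rpush("score:flush_queue", json.dumps({"user_id": user_id, "score": score}))

def flush_worker(db, batch_size=500):
    """Background worker: drain the queue and apply updates to the database in batches."""
    while True:
        batch = r.lpop("score:flush_queue", batch_size)  # count argument needs Redis >= 6.2
        if not batch:
            time.sleep(0.1)
            continue
        rows = [json.loads(item) for item in batch]
        db.bulk_upsert("scores", rows)  # hypothetical helper; versioning resolves conflicts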
SLO Burn Rate Autoscaling
CPU-based autoscaling was too slow. We needed something that looked ahead instead of behind. We tied concurrency limits to SLO burn rates: if we were burning more than 10% of our error budget per hour, we shrank the token bucket. Preemptive instead of reactive. The system scales before users notice problems.
# adjust the admission token bucket from the hourly SLO burn rate
if burn_rate > 0.10:                # burning >10% of error budget per hour
    tokens = int(tokens * 0.90)     # shed 10% of concurrency
    # requests beyond the bucket get 429 Too Many Requests
elif burn_rate < 0.05:              # burn is back under control
    tokens = int(tokens * 1.05)     # recover capacity gradually
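For reference, the burn-rate signal itself can be computed from a rolling error window; the 99.9% target and 30-day budget window below are assumptions for illustration, not our actual SLO.

SLO_TARGET = 0.999           # assumed availability target
WINDOW_HOURS = 30 * 24       # assumed 30-day error-budget window

def budget_burned_per_hour(errors_last_hour, requests_last_hour):
    """Fraction of the total error budget consumed in the last hour."""
    if requests_last_hour == 0:
        return 0.0
    error_budget = 1.0 - SLO_TARGET                    # allowed failure rate
    observed = errors_last_hour / requests_last_hour   # actual failure rate
    # the budget has to last the whole window, so scale by 1 / WINDOW_HOURS
    return (observed / error_budget) / WINDOW_HOURS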
Four Months of Incremental Progress
What Changed
When We Started
Where We Ended Up
What We Actually Learned
Cold Starts Kill Performance
We were chasing database optimization and missed the real problem. When Redis came up empty after a restart, every request fell through to the database and the thundering herd cascaded immediately. The lesson: predictive cache warming before traffic peaks matters more than faster queries. Preventing cold-cache misses beats everything.
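A warm-up job can be as simple as pre-loading yesterday's hottest keys before the morning peak; the hot_keys rollup table and hosts below are hypothetical, sketched to show the shape rather than our actual pipeline.

import redis  # assumes the redis-py client

r = redis.Redis(host="cache.internal", port=6379)  # placeholder host

def warm_cache(db_conn, top_n=10000):
    """Pre-populate Redis with the most-read keys so peak traffic never hits a cold cache."""
    cur = db_conn.cursor()
    cur.execute(
        "SELECT cache_key, payload FROM hot_keys ORDER BY reads DESC LIMIT %s",
        (top_n,),
    )
    pipe = r.pipeline(transaction=False)
    for key, payload in cur.fetchall():
        pipe.set(key, payload, ex=3600)  # illustrative 1-hour TTL
    pipe.execute()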
Idempotency Changed Our Thinking
Making writes safe to retry with idempotency keys seemed like a small thing. It wasn't. Retries became free. Transient failures recovered gracefully. We went from rejecting retries to encouraging them. This pattern alone cut cascading failures dramatically.
SLO Burn Signals Are the Right Lever
CPU at 75% means nothing on its own. But "burn rate exceeding 10% per hour" is unambiguous: keep that up and you will violate your SLO. We switched everything to burn-rate signals. Alert fatigue disappeared, and scaling became smooth instead of thrashing.
Simplicity at Each Layer Matters
We could have tried to build the perfect database. Instead, we added simple layers that kept the database from becoming the bottleneck. Three independent cache layers, each one simple and focused. Together, far more resilient than any single optimization.
