Daniel Bloom / 03 · Performance Optimization
Five bottlenecks, one stack, >80% faster.
A walkthrough of a full-stack performance program: the five bottlenecks I found, what the fix looked like, and what each one did to P95 — measured the same way before and after.
The headline
P95 latency reduction across the program
Discrete bottlenecks identified and fixed
Server CPU reduction
01 · OPcache
Turned OPcache on
P95 sat at ~850ms across PHP entry points. Every request paid the cost of recompiling the same code paths, over and over.
Enabling OPcache, with validate_timestamps and the memory budget tuned to the workload, produced a single sharp inflection point: P95 dropped to ~280ms within one deploy, and CPU usage followed.
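To make the "measured the same way before and after" claim concrete, here is a minimal sketch of one common way to compute P95 (nearest-rank) and the relative reduction. The function names are illustrative, and the nearest-rank method is an assumption, not necessarily how the monitoring stack on this project computed it. The only real figures used are the ~850ms and ~280ms quoted above.

```python
def p95(samples_ms):
    """Nearest-rank 95th percentile of a list of latencies in milliseconds."""
    ordered = sorted(samples_ms)
    # Integer form of ceil(0.95 * n), giving a 1-based rank.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def reduction_pct(before_ms, after_ms):
    """Relative drop from before to after, as a percentage of before."""
    return (before_ms - after_ms) / before_ms * 100


# Figures quoted for the OPcache phase: ~850ms before, ~280ms after.
opcache_reduction = reduction_pct(850, 280)  # roughly a two-thirds reduction
```

Applying the same percentile definition to both the before and after windows is what makes the two numbers comparable.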

02 · Redis
Cached the hot reads
Several core endpoints were re-reading large config and reference tables on every request. The DB was healthy — until it wasn't.
Layered Redis caching in front of those reads, with TTLs tuned per endpoint and explicit invalidation on writes. Query latency halved on each of the five biggest endpoints.

03 · Queues
Queued the non-critical writes
User-facing write actions were doing the full downstream work inline — search indexing, notifications, audit logs — pushing P95 well over a second on writes.
Moved non-critical write side-effects onto async queues with idempotent workers. Request duration cliff-dropped overnight; failure modes became visible instead of silent.
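A minimal sketch of the idea, assuming Python's standard-library queue in place of a real message broker and an in-memory set in place of a durable idempotency store. The job kinds mirror the side-effects named above. Every function and variable name is hypothetical.

```python
import queue

jobs = queue.Queue()
processed_ids = set()  # stand-in for a durable store of completed job ids
side_effects = []      # records what the worker actually did
failures = []          # failed jobs are recorded rather than lost silently


def handle_write(order_id):
    """Do the critical write inline (omitted here), then enqueue the rest."""
    for kind in ("search_index", "notification", "audit_log"):
        jobs.put({"id": f"{kind}:{order_id}", "kind": kind, "order_id": order_id})
    return "ok"


def run_worker():
    """Drain the queue, skipping any job whose id was already processed."""
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return
        try:
            if job["id"] in processed_ids:
                continue  # duplicate delivery: safe to ignore
            side_effects.append((job["kind"], job["order_id"]))
            processed_ids.add(job["id"])
        except Exception as exc:
            failures.append((job["id"], repr(exc)))
        finally:
            jobs.task_done()
```

Because each job carries a deterministic id, a retried or double-enqueued job does nothing the second time, which is what makes retries safe.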
04 · Block cache
Block-cached the heavy components
Page render was dominated by a handful of slow components — the product grid, reviews block, and footer — each re-rendering on every request.
Per-component block caching with stable cache keys collapsed the heaviest layers. Each component's contribution to render time fell, and the slow tail disappeared.
05 · API concurrency
Parallelized the Shopify calls
The shop page made multiple real-time calls to the Shopify API per request — one after another. Server response time grew with every hop, and Shopify itself isn't fast.
Parallelized the real-time Shopify calls and layered caching in front. In the scatter, the dense band of sub-200ms responses near the x-axis is the cached path — those requests skip Shopify entirely.
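The difference between sequential and concurrent calls can be sketched with asyncio. This is an illustration under stated assumptions: the calls are simulated with sleeps rather than real Shopify requests, and the call names and delays are made up.

```python
import asyncio
import time


async def fake_shopify_call(name, delay_s):
    """Simulated remote call; the sleep stands in for network latency."""
    await asyncio.sleep(delay_s)
    return {name: "ok"}


# Hypothetical calls a page might need, each taking about 50ms.
CALLS = [("products", 0.05), ("inventory", 0.05), ("prices", 0.05)]


async def fetch_sequential():
    """One call after another: total time is roughly the sum of delays."""
    return [await fake_shopify_call(n, d) for n, d in CALLS]


async def fetch_parallel():
    """All calls in flight at once: total time is roughly the slowest call."""
    return await asyncio.gather(*(fake_shopify_call(n, d) for n, d in CALLS))


def timed(coro_fn):
    """Run an async function to completion and return (result, seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn())
    return result, time.perf_counter() - start
```

asyncio.gather returns results in the order the calls were passed in, so downstream code can stay unchanged when switching from the sequential version.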

Cumulative impact
Five fixes, compounding.
Each phase landed independently — but the wins stacked. Here's the P95 trajectory from baseline to the end of the program, plus the headline numbers the business cared about.
P95 latency reduction
Server CPU drop at peak
Throughput on the same hardware
Downtime reduction (deploy pipeline)