Daniel Bloom / 03 · Performance Optimization

Five bottlenecks, one stack, >80% faster.

A walkthrough of a full-stack performance program: the five bottlenecks I found, what the fix looked like, and what each one did to P95 — measured the same way before and after.

The headline

>80%

P95 latency reduction across the program

5

Discrete bottlenecks identified and fixed

0%

Server CPU reduction

01 · OPcache

Turned OPcache on

Before

P95 sat at ~850ms across PHP entry points. Every request paid the cost of recompiling the same code paths, over and over.

After

Enabling OPcache with the right validate_timestamps and memory budget produced a single sharp inflection point — P95 dropped to ~280ms within one deploy, and CPU followed.

−67% P95 on PHP entry points
Fig. 01 · Datadog · Lumen production
Datadog latency dashboard showing the OPcache deploy inflection
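The −67% figure follows from the two P95 values quoted above. Here is a minimal sketch of that arithmetic, assuming a nearest-rank percentile. The dashboards may compute P95 differently, and the helper names `p95` and `reduction_pct` are illustrative, not taken from the project.

```python
def p95(samples_ms):
    """Nearest-rank 95th percentile of latency samples, in milliseconds."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    # Integer form of ceil(0.95 * n), which avoids floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def reduction_pct(before_ms, after_ms):
    """Percentage reduction going from before_ms to after_ms."""
    return (before_ms - after_ms) / before_ms * 100


# Values quoted in this section: ~850 ms before, ~280 ms after.
print(round(reduction_pct(850, 280)))  # 67
```

Applying the same percentile definition to the same entry points before and after the deploy is what makes the two numbers comparable.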

02 · Redis

Cached the hot reads

Before

Several core endpoints were re-reading large config and reference tables on every request. The DB was healthy — until it wasn't.

After

Layered Redis caching, with TTLs tuned per endpoint and explicit invalidation on writes. The five biggest endpoints all halved their query latency.

−65% avg endpoint query latency
Fig. 02 · Datadog · shop page · 273k requests
Datadog requests and latency dashboard for the shop page, 273k requests
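The pattern described above (read through the cache, expire by TTL, delete the key on write) can be sketched as follows. This is an illustration under stated assumptions, not the production code: `TTLCache` is an in-memory stand-in for Redis so the example runs without a server, and the table names, TTL values, and `CachedTables` class are hypothetical.

```python
import time


class TTLCache:
    """In-memory stand-in for a Redis-like get / setex / delete interface."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: behave as a miss
            return None
        return value

    def setex(self, key, ttl_seconds, value):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def delete(self, key):
        self._data.pop(key, None)


class CachedTables:
    """Cache-aside reads with per-table TTLs and invalidation on write."""

    def __init__(self, db, ttl_seconds, cache):
        self._db = db            # hypothetical: dict standing in for the DB
        self._ttl = ttl_seconds  # hypothetical per-table TTLs, in seconds
        self._cache = cache
        self.db_reads = 0        # counts how often the "DB" is actually hit

    def read(self, name):
        key = f"table:{name}"
        cached = self._cache.get(key)
        if cached is not None:
            return cached
        self.db_reads += 1
        value = dict(self._db[name])
        self._cache.setex(key, self._ttl[name], value)
        return value

    def write(self, name, field, value):
        self._db[name][field] = value
        # Explicit invalidation: the next read reloads instead of waiting for TTL.
        self._cache.delete(f"table:{name}")


tables = CachedTables({"config": {"currency": "USD"}}, {"config": 300}, TTLCache())
tables.read("config")
tables.read("config")
print(tables.db_reads)  # 1
```

Deleting the key on write, rather than updating it in place, keeps the write path simple and leaves the TTL as a safety net for any write that bypasses this code.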

03 · Queues

Queued the non-critical writes

Before

User-facing write actions were doing the full downstream work inline — search indexing, notifications, audit logs — pushing P95 well over a second on writes.

After

Moved non-critical write side-effects onto async queues with idempotent workers. Request duration cliff-dropped overnight; failure modes became visible instead of silent.

−72% write-action duration
Fig. 03 · Synthetic · Datadog screenshots pending
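A minimal sketch of the queue-plus-idempotent-worker idea, assuming an in-process `queue.Queue` in place of a real broker and an in-memory set of completed job IDs in place of a durable store. The job ID scheme and the names `SideEffectWorker` and `handle_write` are hypothetical.

```python
import queue
import threading


class SideEffectWorker:
    """Single background worker that applies each job ID at most once."""

    def __init__(self):
        self.jobs = queue.Queue()
        self.done_ids = set()  # in production this would need to be durable
        self.applied = []      # record of side effects actually performed
        threading.Thread(target=self._run, daemon=True).start()

    def enqueue(self, job_id, kind, payload):
        self.jobs.put((job_id, kind, payload))

    def _run(self):
        while True:
            job_id, kind, payload = self.jobs.get()
            try:
                # Idempotency check: a redelivered job is skipped, not repeated.
                if job_id not in self.done_ids:
                    self.applied.append((kind, payload))
                    self.done_ids.add(job_id)
            finally:
                self.jobs.task_done()

    def drain(self):
        """Block until every queued job has been processed."""
        self.jobs.join()


def handle_write(worker, order_id):
    """Do the critical write inline, then defer the non-critical work."""
    # ... the critical database write would happen here ...
    for kind in ("search_index", "notification", "audit_log"):
        worker.enqueue(f"{kind}:{order_id}", kind, order_id)
    return {"status": "ok", "order_id": order_id}


worker = SideEffectWorker()
handle_write(worker, 42)
handle_write(worker, 42)  # simulated retry: same job IDs are enqueued again
worker.drain()
print(len(worker.applied))  # 3
```

The request returns as soon as the jobs are enqueued, so its duration no longer includes indexing, notifications, or audit logging. Because each job carries a deterministic ID, a retry or redelivery does not repeat the side effect.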

04 · Block cache

Block-cached the heavy components

Before

Page render was dominated by a handful of slow components — the product grid, reviews block, and footer — each re-rendering on every request.

After

Per-component block caching with stable cache keys collapsed the heaviest layers. Each component's contribution to render time fell, and the slow tail disappeared.

−58% server render time
Fig. 04 · Synthetic · Datadog screenshots pending
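"Stable cache keys" here means that the same component with the same inputs always maps to the same key, regardless of argument order. One way to sketch that is shown below, assuming JSON-serializable parameters. The key layout, the `version` tag, and the `BlockCache` class are illustrative assumptions, not the project's actual scheme.

```python
import hashlib
import json


def block_key(component, params, version="v1"):
    """Build a deterministic cache key for one rendered component."""
    # sort_keys makes the serialization independent of dict insertion order.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f"block:{component}:{version}:{digest}"


class BlockCache:
    """Caches rendered component output under its stable key."""

    def __init__(self):
        self._store = {}
        self.renders = 0  # counts real renders, to show cache hits

    def render(self, component, params, render_fn):
        key = block_key(component, params)
        if key not in self._store:
            self.renders += 1
            self._store[key] = render_fn(params)
        return self._store[key]


cache = BlockCache()
render_grid = lambda p: f"<ul data-page='{p['page']}'>...</ul>"
cache.render("product_grid", {"page": 1, "sort": "price"}, render_grid)
cache.render("product_grid", {"sort": "price", "page": 1}, render_grid)
print(cache.renders)  # 1
```

Bumping the `version` argument is one simple way to invalidate every cached copy of a component after its template changes.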

05 · API concurrency

Parallelized the Shopify calls

Before

The shop page made multiple real-time calls to the Shopify API per request — one after another. Server response time grew with every hop, and Shopify itself isn't fast.

After

Parallelized the real-time Shopify calls and layered caching in front. In the scatter, the dense band of sub-200ms responses near the x-axis is the cached path — those requests skip Shopify entirely.

−43% shop page server response time
Fig. 05 · Datadog · shop page · response time scatter
Datadog duration scatter showing concurrent API call wall time
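The difference between sequential and concurrent calls can be sketched with `asyncio`. The `fetch` coroutine below is a hypothetical stand-in that only sleeps; it does not call Shopify, and the resource names and 0.2 s delay are assumptions chosen to make the effect visible.

```python
import asyncio
import time


async def fetch(resource, delay_s=0.2):
    """Hypothetical stand-in for one real-time API call."""
    await asyncio.sleep(delay_s)
    return {"resource": resource}


async def load_sequential(resources):
    # Each call waits for the previous one: total time is the sum of delays.
    return [await fetch(r) for r in resources]


async def load_concurrent(resources):
    # All calls are in flight together: total time is roughly the slowest call.
    return await asyncio.gather(*(fetch(r) for r in resources))


def timed(loader, resources):
    """Run a loader to completion and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(loader(resources))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    calls = ["products", "inventory", "prices"]
    _, seq_s = timed(load_sequential, calls)
    _, con_s = timed(load_concurrent, calls)
    print(f"sequential: {seq_s:.2f}s, concurrent: {con_s:.2f}s")
```

`asyncio.gather` returns results in the order the calls were passed in, so the caller can unpack them positionally even though they complete in any order.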

Cumulative impact

Five fixes, compounding.

Each phase landed independently — but the wins stacked. Here's the P95 trajectory from baseline to the end of the program, plus the headline numbers the business cared about.

>80%

P95 latency reduction

0%

Server CPU drop at peak

0×

Throughput on the same hardware

0%

Downtime reduction (deploy pipeline)

Fig. 06 · Cumulative · P95 latency · ms