HFT Engineer's Roadmap

Lock-free SPSC ring buffer: the queue under every trading system

Sat, 20 Jun 2026 00:00:00 +0000

A single-producer/single-consumer (SPSC) ring buffer is the fastest way to move data between two threads, and the canonical low-latency interview question. We build one in C++, prove it correct with acquire/release ordering, then watch a textbook false-sharing ‘fix’ make it slower before the real optimisation makes it 14× faster. All numbers are measured, and the code is ThreadSanitizer-clean.
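For readers who want a feel for the shape of such a queue before opening the post, here is a minimal sketch of an SPSC ring buffer using acquire/release ordering. It is not the post's implementation: the names `SpscRing`, `try_push`, `try_pop` and `sum_through_ring` are hypothetical, the capacity is assumed to be a power of two, and it deliberately leaves out any padding or index-caching optimisations.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <thread>

// Hypothetical sketch: one producer thread, one consumer thread, fixed capacity.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two so wrap-around is a mask");

public:
    // Producer thread only. Returns false when the ring is full.
    bool try_push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        // Acquire pairs with the consumer's release store to tail_, so the
        // slot we are about to overwrite has already been read.
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;
        slots_[head & (Capacity - 1)] = value;
        // Release publishes the slot write before the new head is visible.
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

    // Consumer thread only. Returns std::nullopt when the ring is empty.
    std::optional<T> try_pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        // Acquire pairs with the producer's release store to head_.
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return std::nullopt;
        T value = slots_[tail & (Capacity - 1)];
        // Release hands the slot back to the producer after the read.
        tail_.store(tail + 1, std::memory_order_release);
        return value;
    }

private:
    T slots_[Capacity]{};
    std::atomic<std::size_t> head_{0};  // next index to write (producer-owned)
    std::atomic<std::size_t> tail_{0};  // next index to read (consumer-owned)
};

// Pushes 0..count-1 through the ring on one thread and sums them on another.
std::uint64_t sum_through_ring(std::uint64_t count) {
    SpscRing<std::uint64_t, 1024> ring;
    std::uint64_t sum = 0;
    std::thread producer([&] {
        for (std::uint64_t i = 0; i < count; ++i) {
            while (!ring.try_push(i)) std::this_thread::yield();
        }
    });
    std::thread consumer([&] {
        for (std::uint64_t received = 0; received < count;) {
            if (auto v = ring.try_pop()) {
                sum += *v;
                ++received;
            } else {
                std::this_thread::yield();
            }
        }
    });
    producer.join();
    consumer.join();
    return sum;
}
```

The indices here are monotonically increasing counters masked on access, which keeps the full and empty checks unambiguous without wasting a slot. In this sketch `head_` and `tail_` sit next to each other in memory, which is exactly the layout that false-sharing discussions start from.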

Mechanical sympathy: cache, branches, false sharing

Sun, 03 May 2026 00:00:00 +0000

Three hardware ideas that decide whether your low-latency code is fast or pretending to be: how the cache hierarchy works, why branch prediction can change runtime by 5×, and how false sharing makes lock-free code slower than mutexes.

About

Sat, 25 Apr 2026 00:00:00 +0000

Who’s writing

I’m a software engineer writing Java in production at a high-frequency trading firm. The day job is the kind of work where a 2 ms GC pause is a missed market and the design of a queue can be the difference between making and missing the open. I came to it from regular backend engineering and the interests below reflect that arc.

What this blog is

Deep dives on the systems, languages, and tools that make latency-sensitive software work — and the broader ecosystem they sit in. Some of it is JVM-internal (ZGC, JIT, Loom), some of it is data-shaped (DuckDB, ClickHouse, Iceberg), some of it is networking and OS (kernel-bypass, eBPF), some of it is the trading-domain mental models I wish I’d had on day one (order books, single-writer designs, market microstructure).