Deep dives into the systems behind low-latency and high-frequency trading — JVM and GC internals, lock-free C++, the kernel and the network card, and the data tooling around them. Written from the perspective of an engineer working toward the top trading firms. New here? Start with the reading order for a guided path, or browse Concepts at a glance for a wider index of technologies worth knowing.

Lock-free SPSC ring buffer: the queue under every trading system
A single-producer/single-consumer ring buffer is the fastest way to move data between two threads, and the canonical low-latency interview question. We build one in C++, prove it correct with acquire/release ordering, and then watch a textbook false-sharing ‘fix’ make it slower before the real optimisation makes it 14× faster. All numbers are measured, and the code is ThreadSanitizer-clean.
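For a feel of what the article covers, here is a minimal sketch of an SPSC ring buffer using acquire/release ordering. It is not the article's implementation: the names `SpscRing`, `try_push`, `try_pop` and `transfer_sum` are hypothetical, the capacity is assumed to be a power of two, and no padding or index caching is attempted.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>

// Illustrative sketch only; all names here are hypothetical.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Call from the single producer thread only.
    bool try_push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        // Acquire pairs with the consumer's release store to head_, so the
        // slot we are about to overwrite has already been read.
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (tail - head == Capacity) return false;  // full
        slots_[tail & (Capacity - 1)] = value;
        // Release publishes the slot write before the new tail is visible.
        tail_.store(tail + 1, std::memory_order_release);
        return true;
    }

    // Call from the single consumer thread only.
    bool try_pop(T& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        // Acquire pairs with the producer's release store to tail_.
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head == tail) return false;  // empty
        out = slots_[head & (Capacity - 1)];
        // Release tells the producer this slot may be reused.
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

private:
    T slots_[Capacity];
    std::atomic<std::size_t> head_{0};  // next index to read
    std::atomic<std::size_t> tail_{0};  // next index to write
};

// Pushes 1..count from this thread, pops on a second thread,
// and returns the sum the consumer observed.
std::uint64_t transfer_sum(std::uint64_t count) {
    SpscRing<std::uint64_t, 1024> ring;
    std::uint64_t sum = 0;
    std::thread consumer([&] {
        std::uint64_t v = 0;
        for (std::uint64_t received = 0; received < count;) {
            if (ring.try_pop(v)) { sum += v; ++received; }
        }
    });
    for (std::uint64_t i = 1; i <= count; ++i) {
        while (!ring.try_push(i)) { /* spin until a slot frees up */ }
    }
    consumer.join();
    return sum;
}
```

The indices only ever grow and are masked on access, which is why the sketch insists on a power-of-two capacity: `tail - head` stays correct even when the counters wrap.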

Mechanical sympathy: cache, branches, false sharing
Three hardware ideas that decide whether your low-latency code is fast or pretending to be: how the cache hierarchy works, why branch prediction can change runtime by 5×, and how false sharing can make lock-free code slower than a mutex.