
Mechanical sympathy: cache, branches, false sharing
Three hardware ideas that decide whether your low-latency code is fast or pretending to be: how the cache hierarchy works, why branch prediction can change runtime by 5×, and how false sharing can make lock-free code slower than code that uses mutexes.