TL;DR
- Generational ZGC (JDK 21+), the generational mode of the Z Garbage Collector, splits the heap into a young and an old generation while keeping ZGC’s defining property: pause times bounded in the hundreds of microseconds, independent of heap size.
- The trick that makes any ZGC variant work is coloured pointers: a few metadata bits stored inside every heap reference, read by a load barrier on each reference load. The barrier checks the colour, and if it’s stale, healing rewrites the slot to a current pointer.
- Generational ZGC adds a store barrier for the first time. It maintains the young/old remembered set and feeds a snapshot for marking. The fast path is a single colour check; a thread-local buffer absorbs most slow cases.
- Generational ZGC drops the multi-mapped memory trick (three virtual aliases per physical page) that pre-gen ZGC relied on. Barriers now do explicit colour masking, RSS reports the truth, and more metadata bits are free.
- The “concurrent compacting GC” problem reduces to: can a mutator dereference a stale pointer mid-relocation without observing torn state? ZGC’s answer is “yes, because every load goes through a barrier that fixes it before the value is used.”
The problem: why concurrent compacting GC is hard
Start from the requirements:
- Compaction. Long-running JVMs fragment. A non-compacting collector eventually fails to allocate a moderately-sized contiguous object even with plenty of free bytes available. So we move live objects together to defragment.
- No long pauses. A multi-TB heap can hold billions of references. Walking and updating all of them in a single stop-the-world phase takes, on commodity hardware, seconds at best. That’s a non-starter for any latency-sensitive system.
- Correctness. The application keeps running. Mutator threads load and store object references. They must never see a torn pointer, and they must never read a field through a stale pointer to the old copy of an object that has been relocated.
Those three together are the hard problem. Compaction means relocation. Relocation means the address of an object changes. If you change an address, you have to fix every pointer that referred to the old address — and you can’t do that atomically across a TB-scale heap.
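To make the fragmentation argument concrete, here is a minimal sketch that models the heap as fixed-size slots. The class and method names are made up for illustration and nothing here resembles a real allocator; it only shows that free bytes and the largest contiguous free run are different numbers until you compact.

```java
import java.util.Arrays;

/** Toy heap of fixed-size slots: true = live object, false = free. */
final class FragmentationDemo {
    /** Length of the longest run of consecutive free slots. */
    static int largestFreeRun(boolean[] live) {
        int best = 0;
        int run = 0;
        for (boolean slotLive : live) {
            run = slotLive ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    /** Slide every live slot to the front; the live count is unchanged. */
    static boolean[] compact(boolean[] live) {
        int liveCount = 0;
        for (boolean slotLive : live) {
            if (slotLive) liveCount++;
        }
        boolean[] packed = new boolean[live.length];
        Arrays.fill(packed, 0, liveCount, true);
        return packed;
    }

    public static void main(String[] args) {
        boolean[] heap = new boolean[16];
        for (int i = 0; i < heap.length; i += 2) heap[i] = true; // live, free, live, free, ...
        System.out.println("largest free run before: " + largestFreeRun(heap));
        System.out.println("largest free run after:  " + largestFreeRun(compact(heap)));
    }
}
```

Half the toy heap is free both before and after; only the compacted layout can satisfy a request larger than one slot.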
There are essentially three strategies in play.
Stop the world while you fix everything. This is what Serial, Parallel, and the Concurrent Mark Sweep (CMS) collector during full GC do. Pauses scale with live set. For a 100 GB heap with 60 GB live, that’s seconds. It does not scale.
Compact in regions, evacuate a bounded set of regions at a time, do book-keeping with remembered sets. This is G1. Each pause evacuates only a chosen set of regions, so the per-pause work is bounded. You still need to find every pointer into the evacuated regions (the remembered set), and you still pay a stop-the-world (STW) pause to copy the objects and fix those pointers up, though a smaller one than full compaction. Tail latencies are milliseconds-good but not microseconds-good.
Concurrent compaction with read or write barriers. This is the regime ZGC, Shenandoah, and Azul C4 occupy. The mutator runs throughout. Pointers get fixed up lazily as the application loads or stores them. The cost is on every reference load (or every reference store). The win is that pause times no longer scale with live set or with heap size — only with the root set (thread stacks, JNI globals, etc.), which is small and bounded.
ZGC takes the “concurrent + read barrier on every load” route. It’s a strong choice but it forces a specific solution to a specific subproblem: when a thread loads a pointer from the heap, how does it know whether that pointer is current?
Anatomy of one GC cycle
Before the pointer tricks make sense, you need the shape of what ZGC is doing. A single collection cycle, for one generation, runs three logical stages, all concurrent with the application apart from a few microsecond-scale pauses.
- Mark. Starting from the roots (thread stacks, statics, and for a young collection the remembered set), walk the live object graph and record every reachable object. Anything never reached is garbage.
- Relocate (compact). Choose the most sparsely-populated regions as the relocation set, then copy their live objects into fresh regions, packed tight. Once a region’s survivors are all copied out, the whole region is reclaimed at once. This is the compaction that fights fragmentation.
- Remap. Relocation changed objects’ addresses, so every reference still holding an old address now points at a stale location and has to be repointed to the new copy.
Relocation has to record where everything went, and it does so in a small per-region side table called the forwarding table: for each live object, its old location maps to its new address. The rest of the article keeps coming back to this table. A thread that later loads a stale pointer into a relocated region recovers the object’s new address by looking it up there.
The remap then happens lazily. A stale pointer is repaired the next time some thread loads it, by consulting the forwarding table, and the collector cleans up whatever pointers no thread touched (folding that work into the next cycle’s mark). That one decision is what keeps pauses off the heap-size curve. It also creates the single hazard the rest of ZGC exists to neutralise: in the window between an object moving and a given pointer being repaired, the application can load that pointer, and it has to reach the live copy rather than the stale one. The coloured pointer is the mechanism that guarantees it.
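The forwarding-table idea can be sketched in a few lines. This is an illustration only: the class name and the use of a HashMap are assumptions, and HotSpot's real table is a compact per-region structure, not a general-purpose map.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy per-region forwarding table: old address -> new address. */
final class ForwardingTable {
    private final Map<Long, Long> forwarding = new HashMap<>();

    /** Called by whoever copies an object out of the region. */
    void recordMove(long oldAddress, long newAddress) {
        forwarding.put(oldAddress, newAddress);
    }

    /** Lazy remap: a stale address resolves to the new copy, a current one is returned unchanged. */
    long resolve(long address) {
        return forwarding.getOrDefault(address, address);
    }
}
```

A thread holding a stale address calls resolve once and gets the live copy; nothing forces every holder of the old address to be updated at the moment the object moves.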
Coloured pointers, in one diagram
A coloured pointer stores GC state inside the reference itself. A reference is 64 bits, but real virtual addresses on x86-64 and AArch64 are only 48 bits or fewer, so ZGC steals the unused top bits to encode that state directly in the pointer.
When the layout is right, the question “is this pointer current?” becomes a register-only check: do the colour bits match the globally-agreed-upon current colour? No memory load.
Here is the JDK 21+ generational layout. The address bits are at the bottom, the colour bits sit just above, and the rest of the word is reserved.
A few things to note that aren’t obvious from the picture:
- Exactly one of {Marked0, Marked1, Remapped} is set at any moment, so every pointer carries one of three colours. One of the three is the current good colour: the value the load barrier reads as “current, nothing to do.” The other two are bad, and loading a bad pointer drops into the slow path.
- The good colour tracks the phase. In a mark phase it is a mark colour; during and after relocation it is Remapped. So “good” really means “this pointer already had the current phase’s work done to it”: already marked, or already confirmed to point at a current address.
- Why two mark colours, M0 and M1? A pointer that no thread happened to load last cycle still carries last cycle’s mark. Reuse the same colour for the next mark and those stale pointers masquerade as already-marked, so the marker wrongly skips them. Alternating fixes it: each mark phase flips to the colour the previous one didn’t use, so every pointer left over from an earlier epoch is automatically the wrong colour and gets re-examined. This is why the good colour rotates M0 → Rm → M1 → Rm → ….
- Remapped does not mean “has been remapped”. It means “confirmed not to point into the active relocation set”, so the address is current. When the next mark phase starts, the good colour leaves Remapped, so every such pointer is the wrong colour again (still a valid address, just not yet brought up to date). Read a wrong-coloured pointer and the load barrier fixes it on the spot: it follows the forwarding table if the object has moved, restamps the pointer with the new good colour, and writes it back, so the slot is correct and takes the fast path from then on.
- Finalizable marks an object that is reachable only through a finalizer (Object.finalize()). Increasingly vestigial, but the bit is still there.
Flipping the good colour like that, so every existing pointer is suddenly the wrong colour, is the move that trips people up: “invalidate every pointer” sounds like a heap-wide sweep, and it isn’t. The good colour is a single global value the barrier compares against, and a flip writes one new value into it at a safepoint. Not one pointer in the heap changes. What changes is the answer the barrier gives for all of them at once, so invalidating the entire heap costs one O(1) write: you move the goalposts, not the pointers.
Older single-generation ZGC laid the colour bits out differently and multi-mapped the heap into three virtual aliases per page, which is why old diagrams place the bits elsewhere and why pre-gen ZGC famously reported about 3× its real memory (ps and container limits counted the heap three times). Generational ZGC masks the bits off explicitly instead, so neither quirk applies today; just don’t be thrown if you meet the old layout in an older post.
The whole design hangs on one move: put the GC state inside the pointer. Once a few bits of the reference encode “which collection epoch is this from,” the question every load barrier must answer — is this pointer current? — collapses into a register-only mask-and-compare. No memory load, no lock. Everything else in ZGC is machinery to keep that one check cheap and correct.
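A minimal sketch of that mask-and-compare, using a Java long as the 64-bit reference. The bit positions below are assumptions chosen for readability, not HotSpot's actual layout, and the class name is hypothetical.

```java
/** Toy coloured pointer: the low 44 bits are the address, three bits above hold the colour. */
final class ColouredPointer {
    static final int  ADDRESS_BITS = 44;                        // illustrative, not HotSpot's layout
    static final long ADDRESS_MASK = (1L << ADDRESS_BITS) - 1;
    static final long MARKED0      = 1L << ADDRESS_BITS;
    static final long MARKED1      = 1L << (ADDRESS_BITS + 1);
    static final long REMAPPED     = 1L << (ADDRESS_BITS + 2);
    static final long COLOUR_MASK  = MARKED0 | MARKED1 | REMAPPED;

    /** The single global value every barrier compares against. */
    static volatile long goodColour = REMAPPED;

    /** Stamp an address with exactly one colour bit. */
    static long colour(long address, long colourBit) {
        return (address & ADDRESS_MASK) | colourBit;
    }

    /** Strip the colour bits to recover the plain address. */
    static long addressOf(long pointer) {
        return pointer & ADDRESS_MASK;
    }

    /** Fast-path test: mask and compare, without touching the object the pointer refers to. */
    static boolean isGood(long pointer) {
        return (pointer & COLOUR_MASK) == goodColour;
    }
}
```

Assigning a new value to goodColour makes isGood return false for every pointer stamped with the old colour, which is the O(1) goalpost move described above: no pointer is rewritten by the flip itself.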
The load barrier
Every time a Java method reads a reference field from the heap (or an array element, or any oop slot), the just-in-time (JIT) compiler inserts a load barrier. There is no way for the application to read a heap reference without going through one.
The barrier is a snippet of compiled machine code. Conceptually:
loadedRef = *fieldAddress;
if (colourBitsOf(loadedRef) != currentGoodColour) {
loadedRef = slowPathFix(fieldAddress, loadedRef); // may relocate, may mark
}
useRef(loadedRef);
The fast path is a register-only test against a global “bad colour mask” — typically two or three instructions. On a modern CPU, with branch prediction and no memory dependency, the steady-state cost is well under 1 ns per load, often unmeasurable in macrobenchmarks.
The slow path is where the real work happens, and what it does depends on the GC’s current phase:
- During concurrent mark: if the loaded pointer is unmarked, the barrier marks it (pushes it onto the marking stack) and returns a recoloured pointer (the unmarked → mark branch in the diagram). The mutator’s own load is doing a slice of the collector’s marking work, which is part of how marking stays concurrent.
- During concurrent relocate: if the pointer targets the relocation set, the barrier reads the object’s new address from the forwarding table and returns a recoloured pointer to it (the relocated → forwarding branch).
- Between cycles: the only reason for a bad colour is that the global good colour just flipped, so the barrier recolours the pointer and returns.
Whichever case applies, the barrier then heals the slot, which the diagram traces top to bottom:
After the slow path computes a corrected pointer, it does a compare-and-swap (CAS) to write that pointer back into the slot it was loaded from. This is the self-healing property: each mutator thread that hits a bad pointer pays the slow-path cost once, then the slot is fixed and the next thread to read that slot takes the fast path. Across a concurrent GC cycle, the cost of pointer fixup is amortised across whichever threads happen to load a given slot. The GC itself walks the heap to handle slots no mutator visits.
The CAS can fail. Two threads race to heal the same slot, or a mutator stores a different pointer concurrently. Both cases are fine — losing the CAS just means the slot already has an acceptable value, and the slow-path code re-checks and exits cleanly.
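The healing step can be sketched with an AtomicLong standing in for a heap slot. Everything here is an assumption made for illustration (the bit layout, the names, the map used as a forwarding table), and the sketch covers only the relocation case, not marking.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

final class LoadBarrierSketch {
    static final long ADDRESS_MASK = (1L << 44) - 1;    // illustrative layout
    static final long COLOUR_MASK  = 0b111L << 44;
    static volatile long goodColour = 0b100L << 44;     // "Remapped" in this toy

    /** Old address -> new address for objects that have already moved. */
    static final Map<Long, Long> forwarding = new ConcurrentHashMap<>();

    /** Load a reference from a heap slot, healing the slot if its colour is stale. */
    static long load(AtomicLong slot) {
        long ref = slot.get();
        if ((ref & COLOUR_MASK) == goodColour) {
            return ref;                                  // fast path: nothing to do
        }
        long address = ref & ADDRESS_MASK;
        long healed = forwarding.getOrDefault(address, address) | goodColour;
        // Self-heal. Losing the race is fine: either another thread already healed
        // the slot or the mutator stored a newer reference, and neither should be
        // overwritten by this thread.
        slot.compareAndSet(ref, healed);
        return healed;
    }
}
```

The first load of a stale slot pays for the lookup and the CAS; the second load of the same slot sees the good colour and returns immediately.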
Two consequences worth being explicit about:
- There are no read barriers on field reads of primitives. Only references go through the barrier. A long field read is a plain load.
- The compiler can sometimes elide barriers. Two loads of the same field within a region the JIT can prove is barrier-free (e.g. between consecutive safepoints with stable GC state) can collapse to a single barrier. This is one reason the steady-state cost is hard to pin down with a microbenchmark.
Store barriers (new in generational ZGC)
Single-generation ZGC had no store barrier: every reference write was a plain memory store. Generational ZGC needs one, because two jobs require noticing writes as they happen. The barrier does both on every reference store.
Track old-to-young pointers (the remembered set). A young collection wants to scan only young objects, but an old object can hold a reference into the young generation, and that reference is a root the young collection must follow or it will free a live young object by mistake. Scanning all of old to find such references would defeat the purpose of collecting only the young. So ZGC records them as they are created: when the program stores a young-gen reference into an old-gen field, the barrier notes that field’s location. That growing set of old→young slots is the remembered set, and a young collection reads it as extra roots alongside the thread stacks.
Keep concurrent marking honest (SATB). Marking runs while the program keeps mutating, which is the dangerous part. If the app overwrites a reference field mid-mark, the old value it held may have been the marker’s only route to an object that was alive when the cycle began. ZGC uses snapshot-at-the-beginning (SATB) marking: it treats everything reachable at the cycle’s start as live for this cycle. To honour that, the barrier stashes the previous reference just before each overwrite, so the marker still gets to visit it. If that object turns out to be dead, it is collected next cycle instead; erring this way is safe, while the other way would free live memory.
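The two jobs can be written down as a naive barrier that does both unconditionally. This sketch shows the semantics only, not the optimised fast path; the Field and Obj classes and their generation flags are hypothetical stand-ins.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

final class StoreBarrierSketch {
    /** A heap object; the flag is a toy stand-in for "lives in a young region". */
    static final class Obj {
        final boolean isYoung;
        Obj(boolean isYoung) { this.isYoung = isYoung; }
    }

    /** A reference field inside some owning object. */
    static final class Field {
        final boolean ownerIsOld;
        Obj value;
        Field(boolean ownerIsOld, Obj value) { this.ownerIsOld = ownerIsOld; this.value = value; }
    }

    static final Set<Field> rememberedSet = new HashSet<>();   // old-to-young fields
    static final Deque<Obj> satbQueue = new ArrayDeque<>();    // previous values for the marker
    static boolean markingActive = false;

    static void store(Field field, Obj newValue) {
        Obj previous = field.value;
        if (markingActive && previous != null) {
            satbQueue.add(previous);        // SATB: the overwritten value stays visible to the marker
        }
        if (field.ownerIsOld && newValue != null && newValue.isYoung) {
            rememberedSet.add(field);       // extra root for the next young collection
        }
        field.value = newValue;
    }
}
```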
The implementation is engineered to keep the fast path tight. Like the load barrier, the fast path is a single colour-bit test:
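- The test is on the reference about to be overwritten, the value already sitting in the field. If its colour is current for stores, this field has already been dealt with in the current phase (remembered and handed to the marker as needed), so the store just proceeds.
- If the colour is stale, the field needs attention: the previous value may have to be kept for the marker, and an old-generation field may have to be recorded in the remembered set.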
When the fast path can’t decide, the barrier doesn’t immediately drop into a global slow path. It writes the field address and the previous value into a thread-local store-barrier buffer (a small ring) and continues. The slow path runs only when the buffer is full. This medium path is the main reason the store-barrier overhead stays small in practice.
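A rough sketch of that medium path, assuming a flat fixed-size array rather than whatever structure HotSpot actually uses. The class name, the capacity, and the flush counter are illustrative.

```java
/** Toy thread-local store-barrier buffer: append cheaply, process in batches. */
final class StoreBarrierBuffer {
    private final long[] fieldAddresses;
    private final long[] previousValues;
    private int count = 0;
    int flushes = 0;   // how many times the slow path ran

    StoreBarrierBuffer(int capacity) {
        fieldAddresses = new long[capacity];
        previousValues = new long[capacity];
    }

    /** Medium path: append and return; only a full buffer triggers the slow path. */
    void record(long fieldAddress, long previousValue) {
        if (count == fieldAddresses.length) {
            flush();
        }
        fieldAddresses[count] = fieldAddress;
        previousValues[count] = previousValue;
        count++;
    }

    /** Slow-path stand-in: a real collector would process every buffered entry here. */
    void flush() {
        flushes++;
        count = 0;
    }

    int pending() {
        return count;
    }
}
```

With a capacity of 4, ten recorded stores cost two slow-path calls instead of ten, which is the amortisation the buffer exists for.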
One refinement, which the comparison table later refers to: ZGC’s remembered set is a precise per-field bitmap, not a coarse card table. It keeps two such bitmaps per region, one that mutators write and one the collector reads, swapped at each young-cycle start so neither side has to lock or clear the other’s copy.
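The double-buffering can be sketched with two java.util.BitSet instances. This is a single-threaded toy under stated assumptions: field indices are plain ints, the names are hypothetical, and the real collector's concurrency handling is left out entirely.

```java
import java.util.BitSet;

/** Toy per-region remembered set with two bitmaps that swap roles. */
final class RegionRememberedSet {
    private BitSet current = new BitSet();    // store barriers set bits here
    private BitSet previous = new BitSet();   // the young collection scans this one

    /** Store-barrier side: remember the field at this index within the region. */
    void remember(int fieldIndex) {
        current.set(fieldIndex);
    }

    /** At young-cycle start: the filled bitmap goes to the collector, the other is reused empty. */
    void swapAtYoungCycleStart() {
        BitSet filled = current;
        previous.clear();          // the collector is done with its old bitmap
        current = previous;
        previous = filled;
    }

    /** Collector side: fields to treat as extra roots in this young cycle. */
    int[] fieldsToScan() {
        return previous.stream().toArray();
    }
}
```

After a swap, new stores land in the fresh bitmap while the collector reads a stable one, so neither side has to wait for the other.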
GC cycle phases
Both young and old collections in generational ZGC follow the same coarse shape: (short STW) mark start, concurrent mark, (short STW) mark end, concurrent prepare, (short STW) relocate start, concurrent relocate.
What’s in the STW pauses is minimal. Modern ZGC scans zero roots in stop-the-world phases, because per-thread root scanning moved to a concurrent process some releases ago. The remaining STW work (flipping the global good colour, swapping remembered-set bitmaps, hooking up the relocation set) is constant-time, on the order of tens of microseconds.
sequenceDiagram
autonumber
participant App as Mutator threads
participant GC as ZGC threads
Note over App,GC: STW: Pause Mark Start (~microseconds)
GC->>GC: set good colour to the new mark colour
Note over App,GC: Concurrent Mark
App->>GC: loads mark any unmarked refs (slow path)
GC->>GC: mark the live graph from the roots
Note over App,GC: STW: Pause Mark End
GC->>GC: drain the last marking work
Note over App,GC: Concurrent Prepare for Relocation
GC->>GC: pick sparse regions, build forwarding tables
Note over App,GC: STW: Pause Relocate Start
GC->>GC: set good colour to Remapped, all refs now wrong
Note over App,GC: Concurrent Relocate
App->>GC: loads forward and self-heal refs (slow path)
GC->>GC: relocate whatever no load has touched
A few specifics:
- Pause Mark Start makes this cycle’s mark colour the new good colour, the single global value the barriers compare against. (What gets called the good-colour mask is just that value in the machine form the barrier’s bitwise test needs, not a second thing.) Flipping it leaves every existing reference the wrong colour, so the next load through any of them takes the slow path.
- Pause Mark End drains the last of the mark stacks (the worklists of objects found but not yet scanned) in a brief handshake with the mutators, then computes which objects are live.
- Pause Relocate Start flips the colour again to invalidate references into the relocation set and installs the forwarding tables. After this, mutators continue running, and any load of a pointer into the relocation set takes the slow path and self-heals.
- The bound on each STW pause is the size of the per-thread root set you walk in the safepoint. Since concurrent stack processing moved that to zero, the bound is essentially how long it takes to flip a few global flags and synchronise threads at the safepoint: microseconds, on a sane machine.
This is why ZGC pause times are independent of heap size and live set. Nothing in the STW phase scales with either.
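The colour rotation across the pauses can be written as a tiny state machine. It is a sketch of the simplified three-colour model used in this article, with hypothetical names, and it models only the global good colour, not any pointer.

```java
/** Toy model of the global good colour as it moves through GC pauses. */
final class GoodColourRotation {
    enum Colour { MARKED0, MARKED1, REMAPPED }

    private Colour good = Colour.REMAPPED;
    private Colour lastMark = Colour.MARKED1;   // so the first mark phase uses MARKED0

    /** Pause Mark Start: flip to the mark colour the previous cycle did not use. */
    Colour pauseMarkStart() {
        lastMark = (lastMark == Colour.MARKED0) ? Colour.MARKED1 : Colour.MARKED0;
        good = lastMark;
        return good;
    }

    /** Pause Relocate Start: the good colour becomes Remapped. */
    Colour pauseRelocateStart() {
        good = Colour.REMAPPED;
        return good;
    }

    Colour good() {
        return good;
    }
}
```

Each method is a constant-time assignment, which is the whole point: the pause does the same amount of work whether the heap is 1 GB or 1 TB.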
The generational design
Generational ZGC builds on the classic weak generational hypothesis: most objects die young. A short-lived object should never get a tenured-space scan; allocate it, use it, throw it away, never look at it again. A long-lived cache or dictionary should survive many young collections without the marker re-traversing it on every cycle.
The young/old split:
- Young gen. Newly allocated objects land here. Collected frequently. A surviving object is either kept in young (if it hasn’t survived enough cycles) or promoted to old.
- Old gen. Objects that survived a configurable number of young cycles. Collected separately, less often.
- Promotion is by relocating the object from a young region to an old region during a young cycle. The forwarding pointer is in the same forwarding table the load barrier consults.
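The keep-or-promote decision can be sketched as an age counter checked at each young cycle. The fixed threshold of 3 and all names here are assumptions for illustration; the real collector chooses its own tenuring policy.

```java
/** Toy promotion policy: survive enough young cycles and you move to old. */
final class PromotionSketch {
    enum Generation { YOUNG, OLD }

    static final class Obj {
        Generation generation = Generation.YOUNG;
        int age = 0;   // young collections survived so far
    }

    /** Hypothetical threshold, not a HotSpot default. */
    static final int TENURING_THRESHOLD = 3;

    /** Called for each live young object when a young cycle relocates it. */
    static void surviveYoungCycle(Obj obj) {
        if (obj.generation == Generation.OLD) {
            return;                              // old objects are not touched by young cycles
        }
        obj.age++;
        if (obj.age >= TENURING_THRESHOLD) {
            obj.generation = Generation.OLD;     // relocated into an old region
        }
    }
}
```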
The hard part wasn’t “add a generation”. The hard part was making everything in ZGC’s existing concurrent-compaction model generation-aware without giving up on the load barrier’s fast path.
What had to change:
- Distinct mark and remap state per generation. A young-gen mark cycle is independent of an old-gen mark cycle. The colour bits aren’t enough to encode that on their own, so the global good-colour mask becomes generation-aware — the barrier slow path picks the right behaviour based on which generation the target object lives in.
- A store barrier — the first time ZGC has had one. Discussed above.
- An SATB marking algorithm for both generations, replacing the older “marker walks the heap and the load barrier marks-on-load” approach. SATB has the act-once property the store-barrier fast path relies on.
- A precise remembered set (per-region bitmap pairs) instead of a card table. Card tables are coarse; with multi-megabyte regions you don’t want to scan an entire region per dirty card.
What we get back is real:
- Most allocations don’t pay for old-gen marking. Short-lived objects are collected by the young cycle alone, and are never visited by the old marker.
- Allocation stalls drop sharply. Single-gen ZGC’s biggest practical problem was that on workloads with very high allocation rates, the collector couldn’t reclaim memory fast enough and mutators stalled in the allocator. The young/old split lets the young cycle keep up with mutator allocation without having to reprocess long-lived state every time.
- Throughput catches up to G1. Reported figures put generational ZGC throughput within roughly 5–15% of G1 on most workloads, while keeping ZGC’s pause profile.
Where the sub-millisecond claim comes from
The sub-ms pause claim has a real, tractable explanation. The total STW work per cycle is:
- A handshake to bring all mutator threads to a safepoint. Normally tens of microseconds.
- Some flag flips: change the global good colour, swap remembered-set bitmaps, install the forwarding-table pointer. Constant work, microseconds.
- Releasing the safepoint.
That’s it. None of the collector’s own STW work scales with heap size, live set, or stack depth; thread count matters only through the safepoint handshake. Published measurements typically show average pause times around 50 µs and max around 500 µs on a normally-loaded box. The generational rewrite didn’t change that profile.
Interview angle. “Pause time independent of heap size” is the ZGC headline — but the senior move is naming what it costs and what it doesn’t cover. The price is a barrier on every reference load/store (throughput, not pause). And “pause time” excludes the two things that actually bite tails in production: slow-path work on the loading thread, and allocation stalls when the collector can’t keep up. Knowing pauses are bounded but tail latency isn’t is what separates a real answer from reciting the brochure.
The fine print:
- Safepoint sync time is a real floor. It depends on the number of threads, OS scheduling, and how long each thread takes to reach a poll point. On a busy machine with 1000 Java threads it’s higher than on a 16-thread quiet box.
- The slow path is not bounded. A single slow-path execution can take longer than a fast-path test by orders of magnitude, especially if it has to relocate an object. But it’s not in an STW pause; it’s a per-load cost paid by the mutator that loaded the bad pointer. It contributes to per-request latency tails, not to pause time as the term is normally used.
- Allocation stalls are also not pause time. When the collector can’t keep up with the mutator’s allocation rate, allocators block until memory is reclaimed. This does show up as user-visible latency, and it’s the most likely thing to bite you at the tail in production. Java Flight Recorder (JFR) has a dedicated jdk.ZAllocationStall event for it (see below).
Practical: when to pick ZGC, and how to monitor it
Pick ZGC when:
- Heap is large (tens of GB and up). ZGC’s strengths are most visible above ~10 GB. Below that the fixed overheads dominate and G1 is fine.
- Tail latency matters more than throughput. If you’d trade 10–15% throughput to never see another 100 ms pause, you want ZGC.
- You’re allocation-rate-stable, or you’ve sized the heap so the collector can keep up.
- You’re on JDK 21 or later. Earlier versions are usable but missing the generational rewrite that closes most of the throughput gap.
Don’t pick ZGC when:
- You have a very small heap (sub-GB). The fixed cost of barrier and safepoint machinery dominates relative gains.
- You’re throughput-bound and your tail latency budget is generous (say 50–100 ms). G1 will give you more compute per dollar.
- Your hot path is allocation-stupid. ZGC’s young cycle is fast but it’s not free; if you are allocating gigabytes per second and not reusing buffers, you’ll churn through allocation stalls regardless of GC choice. Fix the allocation rate first.
Key flags
Modern JDK (23+):
-XX:+UseZGC # ZGC is generational by default; this is enough
-Xms<heap> -Xmx<heap> # pin the heap, avoid resizing
-XX:+AlwaysPreTouch # commit pages at startup, no first-touch latency
-XX:SoftMaxHeapSize=<below-Xmx> # soft target, ZGC tries to stay under it
JDK 21–22:
-XX:+UseZGC -XX:+ZGenerational # explicit generational mode
# JDK 23 deprecates the flag (gen is default)
# JDK 24 removes non-gen entirely
For low-latency workloads you almost always want Xms == Xmx and +AlwaysPreTouch. You don’t want ZGC working out heap sizing while you’re trying to measure tail latency.
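A quick way to confirm from inside the process that the flags took effect is to list the collector MXBeans. The sketch below uses the standard java.lang.management API; the assumption that ZGC's bean names start with "ZGC" is a heuristic, so check the printed names on your own JDK.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

final class CollectorCheck {
    /** Names of the collector beans the running JVM exposes. */
    static List<String> collectorNames() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .collect(Collectors.toList());
    }

    /** Assumption: ZGC's beans have names starting with "ZGC". */
    static boolean looksLikeZgc(List<String> names) {
        return names.stream().anyMatch(name -> name.startsWith("ZGC"));
    }

    public static void main(String[] args) {
        List<String> names = collectorNames();
        System.out.println(names + " -> ZGC active: " + looksLikeZgc(names));
    }
}
```

Running this at startup and logging the result is a cheap guard against a container image or launcher script silently dropping the GC flags.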
Monitoring
The right tool is JFR, not GC logs. Streaming JFR events is cheap, well-typed, and gives you exactly the right level of detail. The events worth wiring an alert to:
- jdk.GCPhasePause: STW phases. Anything over a few ms means something is misconfigured (safepoint contention, swapping, memory pressure).
- jdk.ZAllocationStall: the killer. A non-zero rate means the collector isn’t keeping up. Default threshold is 10 ms but you should care about any of them in a low-latency context. Threshold can be lowered via JFR config.
- jdk.ZPageAllocation / jdk.ZRelocationSet: useful for understanding heap pressure and relocation work, especially when comparing two configurations.
- jdk.SafepointBegin / jdk.SafepointStateSynchronization: the non-GC safepoint pauses. ZGC’s pauses are tiny; if your application sees 5 ms safepoints, the cause is almost always something else (deopt, JFR safepoint, JVMTI agent). Distinguishing GC pauses from non-GC safepoints requires looking at both classes of events.
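Wiring those events to an alert can be sketched with the JFR streaming API (jdk.jfr.consumer.RecordingStream, available since JDK 14). The class name, the helper, and the choice to print to stderr are assumptions; a real service would feed its own metrics or alerting pipeline instead.

```java
import java.time.Duration;
import jdk.jfr.consumer.RecordingStream;

final class ZgcAlerts {
    /** True when an event lasted longer than the latency budget. */
    static boolean exceeds(Duration observed, Duration budget) {
        return observed.compareTo(budget) > 0;
    }

    /** Start an in-process stream that reports stalls and over-budget pauses. */
    static RecordingStream start(Duration pauseBudget) {
        RecordingStream stream = new RecordingStream();
        stream.enable("jdk.ZAllocationStall").withoutThreshold();   // report every stall
        stream.enable("jdk.GCPhasePause").withoutThreshold();
        stream.onEvent("jdk.ZAllocationStall",
                event -> System.err.println("allocation stall: " + event.getDuration()));
        stream.onEvent("jdk.GCPhasePause", event -> {
            if (exceeds(event.getDuration(), pauseBudget)) {
                System.err.println("long GC pause: " + event.getDuration());
            }
        });
        stream.startAsync();   // events are delivered on a background thread
        return stream;         // the caller closes it on shutdown
    }
}
```

Calling ZgcAlerts.start(Duration.ofMillis(2)) once at startup is enough; removing the threshold on the stall event trades a little overhead for seeing every stall rather than only those above the default.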
Logs, if you must:
-Xlog:gc*,gc+phases=debug:file=gc.log:time,level,tags
The phases tag is what you want for ZGC. Per-cycle output is dense; tools like GCeasy will parse it for you.
Compared to G1 and Shenandoah
It’s worth being precise about what each collector buys you, because the tradeoffs aren’t subtle.
| | G1 | Shenandoah | Generational ZGC |
|---|---|---|---|
| Compaction | Region evacuation, mostly STW | Concurrent, with Brooks pointers (load reference barriers in modern Shenandoah) | Concurrent, with coloured pointers + load barrier |
| Pause time profile | ms–tens of ms; STW-dominated | sub-ms aspirational, low-ms typical | sub-ms typical, hundreds of µs at p99 |
| Pause scaling with heap | Sublinear; live set matters | Independent of heap | Independent of heap |
| Generational | Yes | Optional (experimental) | Yes (JDK 21+) |
| Inter-gen tracking | Card table + SATB | Card table | Per-region bitmaps + SATB |
| Throughput vs Parallel | ~85% | ~80% | ~80–90% |
| Memory overhead | Low | ~10% with the legacy Brooks pointer word; lower since its removal | Higher; legacy multi-mapping inflated reported RSS about 3×, gen ZGC reports normally |
| Default since | JDK 9 | Never the default; opt-in | Never the default collector; generational is ZGC’s default mode since JDK 23 |
flowchart TD
A["Picking a collector"] --> B{"heap > 16 GB and
tail latency < 5 ms?"}
B -- no --> C{"heap < 4 GB or
throughput-only?"}
C -- yes --> D[Parallel or G1 default]
C -- no --> E[G1]
B -- yes --> F{"on JDK 21+?"}
F -- no --> G[Upgrade JDK first,
then ZGC or Shenandoah]
F -- yes --> H{"max pause
requirement?"}
H -- "sub-ms tail" --> I[Generational ZGC]
H -- "low-ms ok" --> J[Generational ZGC
or Shenandoah]
Three comparison notes that matter in practice:
- G1’s failure mode is humongous allocations and full-GC fallbacks. Pathological allocation patterns can trigger a full STW collection that scales with live set. Generational ZGC has no equivalent — there is no STW full collection.
- Shenandoah originally used a Brooks pointer: an extra forwarding word in every object header, plus an indirection through it. Modern Shenandoah (JDK 13 and later) uses load reference barriers and keeps the forwarding pointer in the existing mark word, so the extra word is gone. ZGC’s coloured pointer was “free” from the start in the sense that the storage was already there in unused pointer bits.
- Pre-generational Shenandoah and pre-generational ZGC both struggled with very high allocation rates. Both have moved (or are moving) generational; Shenandoah’s mode is still experimental.
Wrapping up
The whole of ZGC follows from one decision: put the collector’s state inside the pointer, so “is this reference current?” becomes a register-only check on every load. From there, marking and relocation run concurrently with the application, the load barrier repairs pointers lazily as they are touched, and the only stop-the-world work left is flipping a global colour and bringing threads to a safepoint. That is why pause times track the small root set rather than the heap, and stay in the tens to hundreds of microseconds even on multi-terabyte heaps.
Generational ZGC keeps that property and adds the old observation that most objects die young: collect the young generation often and cheaply, promote the survivors, and revisit long-lived state rarely. The price is the store barrier, maintaining the remembered set and feeding SATB marking. The payoff is far less work per cycle and far fewer allocation stalls, which closes most of the throughput gap to G1 that held single-generation ZGC back.
It isn’t magic. The constraint that remains is allocation rate: out-allocate the collector and you stall no matter which GC you run, so the real fix is allocating less, not tuning. A few rough edges linger too, mostly invisible until you hit them: no compressed object pointers (which costs memory on heaps below ~32 GB, where other collectors would use them), some class-unloading bookkeeping, and a load-barrier slow path whose tail latency is real but never shows up as “pause time.” None of that changes the headline. If you want bounded pauses on a large heap and can spend a little throughput to get them, generational ZGC is the default to reach for on JDK 21 and later.
Further reading
- JEP 439: Generational ZGC — the canonical spec. https://openjdk.org/jeps/439
- JEP 333: ZGC: A Scalable Low-Latency Garbage Collector (Experimental) — original ZGC. https://openjdk.org/jeps/333
- JEP 376: ZGC: Concurrent Thread-Stack Processing — what got pause times to constant. https://openjdk.org/jeps/376
- JEP 474: ZGC: Generational Mode by Default — the JDK 23 default flip. https://openjdk.org/jeps/474
- Inside Java: “Introducing Generational ZGC” — the best non-spec explainer of the redesign. https://inside.java/2023/11/28/gen-zgc-explainer/
- Erik Österlund — “Generational ZGC and Beyond” (JVMLS 2023) — talks through the design tradeoffs. https://www.youtube.com/watch?v=YyXjC68l8mw
- Per Liden — “ZGC: What’s new in JDK 16” — the concurrent stack processing release. https://malloc.se/blog/zgc-jdk16
- Netflix Tech Blog — “Bending pause times to your will with Generational ZGC” — production-shaped numbers from a real deployment. https://netflixtechblog.com/bending-pause-times-to-your-will-with-generational-zgc-256629c9386b
- Yang et al., “Deep Dive into ZGC: A Modern Garbage Collector in OpenJDK”, ACM TOPLAS 2022 — the only academic-quality writeup of ZGC internals. https://dl.acm.org/doi/full/10.1145/3538532
- HotSpot VM Garbage Collection Tuning Guide (JDK 25) — flag reference. https://docs.oracle.com/en/java/javase/25/gctuning/z-garbage-collector.html
- Gunnar Morling — “Lower Java tail latencies with ZGC” — short, practical, with JFR. https://www.morling.dev/blog/lower-java-tail-latencies-with-zgc/
