Where production policy belongs: the thesis behind Eliya

Every configurable system has two spaces. The configuration space: every behaviour you can reach by setting the knobs the system already exposes. And there is the implementation space: behaviour that only exists if someone goes in and changes the internals.

A wrapper script can do a lot. Setting flags, launching a -javaagent, mounting a volume. That is more than "selecting a flag"; it composes external system behaviour around the JVM. But notice the floor it hits, and the floor is the point. It cannot rewrite the heap-dump byte stream as that stream is being written. It cannot tell you a flag's origin, whether the value was derived internally or set on the command line. Even a native JVMTI agent, the strongest external case, running in the JVM's own address space, works through the runtime's published extension points. The boundary that matters is not how many knobs you can reach from outside. It is whether you can change what the JVM's own code does on the inside, and a wrapper, however elaborate, is always on the outside.

Most operational requirements live well outside that boundary, and that is fine. Most requirements never need the JVM's interior. The interesting problems start when one does.

Policy, mechanism, and a caveat

OS research named this separation fifty years ago, the HYDRA paper. Policy names an intent; mechanism implements it. The JVM exposes mechanisms individually: -XX:+HeapDumpOnOutOfMemoryError is a mechanism. The JVM has long exposed first-class flags for some intents, like compilation (-server, tiered compilation) and memory. What it does not expose is a policy that says "when this process dies, it should leave enough evidence to diagnose why".

The upstream JDK is largely neutral, by design, so it does not ship the kind of opinionated defaults that particular groups or industries would want as their baseline. A JDK distribution, however, can take that opinion and cater to those groups with the policies built in.

Why should such an opinionated distribution exist at all? Defaults are policy, whether anyone intends them to be or not. Johnson and Goldstein measured organ-donor consent at around 12% in Germany and over 99% in Austria, two culturally similar countries, and the entire gap came down to which box was pre-ticked. People run what ships. What a JDK distribution defaults to becomes, in practice, the configuration of the fleet. That justifies shipping a distribution with production-grade defaults.

It still does not justify patching a VM (which we did), because a wrapper could ship defaults too. Here is the caveat. It is easy to overreach, so let us be exact about the limit. Separating policy from mechanism does not mean policy has to live inside the mechanism. Plenty of systems externalise their policy correctly: Kubernetes pushes authorisation to admission controllers, applications delegate access decisions to IAM. So the claim is not "policy belongs in the runtime". The real claim is narrower, and harder to dispute: some policies depend on what only the runtime can see or do, a flag's origin, the live object graph, the bytes a dump writer is about to emit. Such policies require runtime intimacy, and externalising them is not a style preference, it is simply not possible. An out-of-the-box JVM often leaves less evidence than operators expect when something goes wrong. No GC log. No heap dump. A crash log written to some ephemeral path that dies with the container. And recording those traces correctly (redacting what is sensitive, signing what has to be tamper-evident, attesting what produced each setting) needs exactly that runtime intimacy. That is the thing a wrapper cannot ship, and the only reason a patch exists.

The placement claim

Let C be the set of external controllers that configure the JVM, and B the set of behaviors those controllers can produce. Every wrapper, Helm chart, or webhook is merely a selector operating within C to pick a point inside B. But certain policies demand behaviors outside B (call them B′) because they depend entirely on what only the runtime can see or do. A B′ behavior can be obtained only by changing the runtime itself. Therefore, the policy that targets B′ is, narrowly, the one that belongs in the VM.

Take an example where a requirement actually falls outside B, PCI DSS Requirement 3.5.1, which says the PAN (Primary Account Number) must be unreadable anywhere it is stored. A heap dump of a payment service writes live card numbers to disk in cleartext. A critic is right that you can deal with this from outside the VM by disabling heap dumps, or encrypting the volume. But look at what each one costs. By disabling dumps you have thrown away the forensic evidence that was the whole reason for running this way; volume encryption protects the disk while the dump still travels cleartext from memory to the writer inside the trust boundary. Redacting the dump as the stream is written, inside the VM, kills the dilemma instead of trading one risk for another. That is a dump-writer problem in HotSpot. You cannot implement that behaviour purely by composing existing JVM flags.

The flag

The whole argument comes down to one flag, which activates a policy group:

java -XX:EliyaProfile=Production -jar app.jar

That flag ships in Eliya, an OpenJDK 25 LTS distribution from Asymm Systems, built for compliance-conscious production in regulated industries: telecom, banking and financial services, healthcare, government. EliyaProfile is the policy point the thesis calls for: a ccstr enum. Production, the general set of production-readiness defaults, is the value that ships today (Phase 1); further values are reserved, some on the roadmap, the rest demand-gated.

A quick word on the name. Eliya is short for Nuwara Eliya, the highland tea country of Sri Lanka. In Sinhala the word means light. Java took its name from an Indonesian island that grows coffee; ours comes from highlands that grow tea.

The Production profile

The Production profile's ergonomic defaults:

Heap dump on OOM, written to a structured path under ${ELIYA_DIAGNOSTIC_PATH}/${service}/${replica}/heap-dumps/.
Exit on OOM, a clean shutdown so orchestration can restart you instead of leaving a zombie JVM.
Native Memory Tracking in summary mode.
Crash log path, a predictable hs_err_pidNNNN.log location under the same tree.
Container support reinforced: UseContainerSupport=true is guaranteed under the profile. Upstream JDK 25 already defaults it on, so today this is a no-op; it is there so the guarantee survives a future upstream changing its mind.
Diagnostic VM options unlocked, which JFR sampling and profiler attachment need to work accurately.

All six of these are existing HotSpot flags, and the structured path could be resolved by a wrapper script and handed to -XX:HeapDumpPath. Phase 1 ships no behaviour a script could not reproduce. Strictly speaking, Production is selection from the configuration space. So why patch the VM at all, instead of shipping a wrapper?

Because Phase 1 is not the capability, it is the boundary. EliyaProfile is a named policy point established inside the runtime, and that is where the genuinely runtime-only capabilities of later phases attach.

Production is a fail-safe default in the Saltzer-Schroeder sense: the VM sets its ergonomics with ergonomic origin, so an operator who explicitly passes a value at the command line wins. Normal JVM precedence (command line beats ergonomic) is respected. Production yields to the operator by design.

Two independent dimensions get conflated here, and they are worth separating:

Enforcement style: does the profile actively defend an invariant by rejecting a conflicting override, or does it only set a default and step aside? A profile could reject startup when an override conflicts with a constraint it holds, but whether it should depends entirely on the second dimension.
Authority model: who owns the invariant, and who has the right to waive it? This is the real difference. Production's constraints are operational: they exist to serve the operator's own goal, diagnosability. An operator who deliberately overrides one is making a call they are entitled to make; they own the goal, so the consequence is theirs to accept, and the profile yields. A compliance value's constraints are external: they exist to satisfy a regulator, not the operator. An SRE saying "I accept card numbers in a world-readable dump" is not authorised to make that trade; the regulation is. So a compliance profile fails closed at startup regardless of operator intent.

Same flag, two authority models. Mandatory enforcement requires the orchestration plane to seal the profile flag in the deployment manifest, so a random operator cannot bypass compliance by removing the flag entirely.

Everything else is upstream OpenJDK 25 unchanged. java.security is bit-identical to upstream: TLS 1.0 / 1.1 disabled, weak ciphers blocked, and current minimum key-size requirements already in place. GC selection is left to JDK 25's ergonomics. Outside the profile, Eliya stays intentionally close to upstream, and EliyaProfile=None preserves upstream behaviour.

The 3 AM story

None of the theory above is how anyone experiences the problem. Here is the version everyone recognises.

It is 3 AM. The pager goes off: the JVM running your settlement engine just OOM'd and the container restarted. Your first instinct is to pull the GC logs. There are none, nobody enabled -Xlog:gc*. Fine, the heap dump then. Also missing: nobody set -XX:+HeapDumpOnOutOfMemoryError, and nobody set -XX:HeapDumpPath either, so even if there were a dump it would be in whatever ephemeral container path the JVM happened to be running from. The crash log? Same story.

Everybody knows these flags. Most have negligible runtime cost. Yet production systems run without them, because every shop builds its own answer to "what flags should be on?", usually after its first incident. The configuration-errors literature says this is not carelessness: hand humans enough knobs and misconfiguration becomes a dominant failure mode (Yin et al., SOSP 2011; Xu et al., FSE 2015). The remedy the literature points to is the one the thesis points to: ship the answer in the system, as one intent-level control, instead of re-deriving it per team as a dozen mechanism-level ones.

If you are not in the audience that needs this, you are building an internal tool, or your team has already done the flag work, a neutral upstream build is the right answer, and there are several good ones.

Where things stand

Phase 1, shipping today: one opt-in flag; Linux x86_64 and aarch64; .tar.gz / .deb / .rpm / multi-arch GHCR Docker; signed and reproducible; quarterly upstream-CPU refreshes within two weeks of each upstream CPU, through the JDK 25 LTS window (September 2029); GPLv2 with Classpath Exception; corresponding source attached to each release. No JDK 21 build, deliberately. JDK 29 LTS arrives at its GA with a 24-month overlap before Eliya 25 sunsets.

Phase 2 (target H2 2026) builds on the same policy point with continuous JFR, bundled local-only diagnostics tooling, and a FIPS-validated provider variant. It also brings unified GC logging (-Xlog:gc*) under the profile, deferred from Phase 1 only because HotSpot's internal LogConfiguration initialises too early for a safe ergonomic override without a deeper source patch.

The rest gets built in public on Foojay, one piece at a time: reproducible builds, the glibc floor, signing and key hygiene, and the structured diagnostic path (the one genuine source patch in Phase 1). The full trajectory is on the roadmap.

Verify it yourself

Do not take any of this on trust. Download the artefact and the signed checksums, fetch the signing key and cross-check its fingerprint against an independent channel, then verify the signature and the checksum. With the profile active, -XX:+PrintFlagsFinal shows {ergonomic} as the origin on each flag the profile set; re-run with EliyaProfile=None to see the difference. The full multi-channel ceremony is on the verify your download page.

References

Levin, Cohen, Corwin, Pollack & Wulf, "Policy/Mechanism Separation in HYDRA", SOSP 1975.
Johnson & Goldstein, "Do Defaults Save Lives?", Science 302, 2003.
Yin et al., "An Empirical Study on Configuration Errors", SOSP 2011.
Xu et al., "Hey, You Have Given Me Too Many Knobs!", ESEC/FSE 2015.
Saltzer & Schroeder, "The Protection of Information in Computer Systems", Proc. IEEE 63(9), 1975.
PCI DSS v4.0 Requirement 3.5.1.

Read next: Eliya JDK, the four-phase roadmap, or Choosing a JDK in 2026.