Cleat - Changelog

v1.2.4

2026-07-14 latest

The kit learns from a real job: sequential dispatch, honest worker reports and a built-in way to verify it is working.

A deep re-audit against Anthropic's coordinator-pattern cookbook, plus one big real-world build, turned into a second generation of the flagship kit's delegation policy.

Features

Kit delegation policy v2 - the plan-big-execute-small fragment now teaches dispatch craft: a dispatch names the files, the exact change, the constraints and the test to run, so the worker executes instead of re-deriving context. It prefers fewer, larger chunks because every dispatch pays a fixed overhead. A dispatch that dies with an error is re-dispatched unchanged instead of being absorbed into the expensive main loop. Plans build on scout-verified findings, not on memory. Each rule ports a caveat from Anthropic's cookbook.
Worker reports open with a status - done as dispatched, done with deviations, partial or blocked. Large changes report per-file line ranges instead of pasting whole diffs, so the planner's context stays lean and review happens in git diff.
cleat kit show tells you how to verify routing - after a heavy session, /usage should show the bulk of tokens on the worker model. If a heavy session shows only your session model, delegation is not happening.

Fixes

Parallel worker dispatch is out - launching several workers in one message scrambled briefs, spawned duplicates and clobbered files on a real build. The failure classes match open upstream Claude Code bugs (duplicated parallel fan-out, concurrent edit clobbering), unfixed as of this release. The fragment now dispatches one worker at a time: dispatch, review, then dispatch the next. The ban lifts once upstream ships deduplication and edit protection.
The confirm screen stops overstating scout - scout structurally lacks Edit and Write, but its Bash is kept inspection-only by instruction alone, so the screen now says read-only by contract instead of stating it flatly. When your own agent shadows a kit agent by name, the collision warning now says the kit's policy will still steer your agent, which keeps its own model and tools.

Changes

1447 (+8) behavioral tests across 47 files - every new fragment rule, the worker report contract, the parallel-dispatch ban, the confirm-screen wording, the collision note and the kit show verification note, each pinned by a test
298 (+8) mutations caught, 0 missed, 0 skipped
README copy polish

v1.2.3

2026-07-14

The kit picker stops eating arrow keys and starts explaining itself. Every box finally learns the clipboard rules.

Features

Kit picker: detail pane, chevrons, arrow-key models - the kit list now shows the highlighted kit's full description in a detail pane, so nobody has to enable one to find out what it does. The model rows show the active value in ‹ › chevrons and ←/→ (or space) cycles it, ⏎ or → selects and the hint lines name the keys. The non-TTY text picker lists each kit's description too.
The kit carries its provenance - the enable screen credits the pattern to Anthropic's cookbook by its own title (big models for planning, small models for execution), cleat kit show prints the notebook URL and the description was rewritten around it: worker and scout each run in their own context window so the main session stays lean, the mechanical bulk bills at the worker model's rate and heavy work burns your rate limit far slower.

Fixes

Arrow keys no longer close the pickers - every unrecognized escape sequence (left and right arrows, PgUp, function keys) decoded as Escape and the picker loops cancel on Escape, so pressing the right arrow on a model row (the most natural key there) closed the picker instead of changing the model. The caps picker shared the decoder and the bug. Only q or a bare Escape cancels now. Unknown keys are ignored.
The clipboard rules reach the box at last - since v0.1.0 the image baked in-box guidance (copying with xclip/pbcopy/clip reaches your host clipboard, paste-back is not supported, exit 0 means the copy worked) into ~/.claude/CLAUDE.md, but the host ~/.claude mount shadowed that file the whole time, so no box ever saw it. The generated CLAUDE.md every box mounts now composes three marked layers: your global content first, byte for byte, then the box notes, then the kit section when one is enabled. Existing boxes pick it up on their next start with no recreate.
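A minimal sketch of the three-layer compose described above, assuming plain HTML-comment marker lines (the marker strings and the function name are hypothetical, not Cleat's actual ones). The point it shows is ordering: the user's global content stays an exact byte prefix and the kit layer appears only when a kit is enabled.

```python
from typing import Optional

# Hypothetical marker lines; Cleat's real layer markers are not documented here.
BOX_MARKER = "<!-- cleat:box-notes -->"
KIT_MARKER = "<!-- cleat:kit -->"


def compose_claude_md(user_global: str, box_notes: str,
                      kit_section: Optional[str] = None) -> str:
    """Compose the generated CLAUDE.md from up to three marked layers.

    Layer order: the user's global content (kept as an exact byte prefix),
    then the box notes, then the kit section only when a kit is enabled.
    """
    out = user_global
    if out and not out.endswith("\n"):
        out += "\n"  # only a separator is appended; the user's bytes are untouched
    out += f"\n{BOX_MARKER}\n{box_notes.rstrip()}\n"
    if kit_section is not None:
        out += f"\n{KIT_MARKER}\n{kit_section.rstrip()}\n"
    return out
```

Because the layers are regenerated from their sources on each start rather than edited in place, an existing box can pick up new box notes without being recreated.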

Changes

1439 (+15) behavioral tests across 47 files - the keypress decoder pinned key by key, the picker driven through arrows, chevrons, pane and cancel paths on a real pty, the three-layer CLAUDE.md compose and its ordering, a byte-identity guard between the box notes and the image bake and a pane-fit guard so no future kit description silently truncates
290 (+6) mutations caught, 0 missed, 0 skipped

v1.2.2

2026-07-12

The kit's scout subagent loads again. Every mode of working now routes to the kit's cheap models.

A live probe of the plan-big-execute-small kit (dispatch each agent, ask for its model) showed the worker answering on Sonnet 5 and the scout never registering at all. The follow-up audit found that even a healthy kit let exploration and multi-agent fan-outs ride expensive models.

Fixes

Kit scout agent registers again - since v1.2.0 the generated kit-scout.md carried a colon-space inside its unquoted YAML description ("Use for all exploration: finding"), which is invalid YAML, and Claude Code silently drops an agent whose frontmatter fails to parse. Every scout dispatch errored with "agent not found" and the planner quietly fell back to searching in the main loop, so the kit ran at half strength with nothing visibly broken. The description is reworded, the no-colon constraint is documented at the source and a regression test now validates every generated kit agent's frontmatter as strict plain-scalar YAML. Existing kit boxes heal on their next cleat start with no recreate (the overlay regenerates in place) and the scout appears in the first session started after that.
Kit routing covers every mode of working - the kit's delegation policy only governed the planner's own loop, so exploration could ride the built-in Explore agent, multi-agent workflow fan-outs defaulted to the expensive session model and sizable single-file work never triggered the worker. The policy now names worker and scout as the only subagents the planner dispatches (built-ins are banned by name), routes workflow stages through them explicitly, widens the dispatch trigger to sizable single-file changes, keeps the locating step of judgment reads on the scout and requires the session to report a missing worker or scout as a broken kit instead of quietly substituting a built-in. The worker also gains a tools pin (Read, Edit, Write, Grep, Glob, Bash) mirroring the scout's.
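The scout bug above comes down to one YAML rule: inside an unquoted (plain) scalar, a colon followed by a space is read as a mapping indicator, so the value fails to parse. The sketch below is a conservative, stdlib-only lint in the spirit of the regression test described above; it is not Cleat's test. It assumes flat `key: value` frontmatter and deliberately rejects more than YAML strictly forbids (any indicator character in first position).

```python
from typing import List

# Characters that may not safely start an unquoted (plain) YAML scalar.
# Conservative: YAML allows "-", "?" and ":" before a non-space, but we reject them.
INDICATORS = set("-?:,[]{}#&*!|>'\"%@`")


def is_safe_plain_scalar(value: str) -> bool:
    """Conservative check that `value` can appear unquoted after `key: `."""
    if not value or value != value.strip() or "\n" in value:
        return False  # YAML would trim the padding or fold the line
    if value[0] in INDICATORS:
        return False
    if ": " in value or value.endswith(":"):
        return False  # colon-space reads as a nested mapping key
    if " #" in value or "\t#" in value:
        return False  # whitespace-hash starts a comment and truncates the value
    return True


def lint_frontmatter(markdown: str) -> List[str]:
    """Return problems found in a flat `key: value` frontmatter block."""
    lines = markdown.splitlines()
    if not lines or lines[0] != "---":
        return ["missing opening ---"]
    problems: List[str] = []
    for line in lines[1:]:
        if line == "---":
            return problems
        key, sep, value = line.partition(": ")
        if not sep:
            problems.append(f"not a 'key: value' line: {line!r}")
        elif not is_safe_plain_scalar(value):
            problems.append(f"{key}: unsafe plain scalar {value!r}")
    return problems + ["missing closing ---"]
```

Running it over the original description flags the colon-space. A reworded description such as "Use for all exploration, finding files" (an illustrative wording, not necessarily the one shipped) passes, and so does a comma-separated `tools:` pin.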

Changes

1424 (+6) behavioral tests across 47 files - a frontmatter class check that generates the real overlay and fails on any unparseable kit agent, plus five policy-content tests (workflow routing, built-in ban, single-file trigger, scout-locates carve-out, worker tools pin), each mutation-verified
284 (+7) mutations caught, 0 missed, 0 skipped

v1.2.1

2026-07-12

Sessions on a root-only host launch again. The memory advisories stop calling a native Linux engine a VM.

A first run on a stock Ubuntu VPS surfaced both: the box came up green but Claude exited immediately, and the readiness nod above it claimed a 4 GB "VM" with "room for many parallel sessions" on a host that has neither.

Fixes

Sessions on a root-only host no longer die at launch - Cleat maps the box user to your host UID so workspace files come back owned by you. On a root-only host (a stock VPS image, or sudo cleat) that makes the box user root, and Claude Code refuses to run as root with permissions skipped (its root/sudo guard, upstream issue 9184). Every session now carries IS_SANDBOX=1, Claude Code's own escape hatch for sandboxed containers, on root hosts only: ordinary boxes keep an unchanged environment and existing root boxes heal on their next session with no recreate.
Memory advisories name the real pool on a native Linux engine - the readiness nod on a 4 GB VPS read "Docker tuned for Cleat (4 GB VM, room for many parallel sessions)", but on a native engine no VM exists, nothing was tuned and four 1 GB ceilings are not room for many. Where the daemon does run in a VM (Docker Desktop anywhere and every macOS backend) the wording is unchanged. A native engine now reads "Docker ready for Cleat (N GB RAM, ...)" and drops the parallel-headroom claim below 8 GB, and the overload warning and the cleat status overcommit line now say the host has only N GB of RAM instead of naming a VM you don't have.
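A minimal sketch of the root-host rule, assuming the session environment is built as a dict and passed to `docker exec` as `-e KEY=VALUE` flags (the function names are hypothetical). Only a host UID of 0 adds IS_SANDBOX=1; every other host gets its environment back unchanged, which is why ordinary boxes see no difference.

```python
import os
from typing import Dict, List, Optional


def session_env(base: Dict[str, str], host_uid: Optional[int] = None) -> Dict[str, str]:
    """Environment for a session exec.

    Only when the host user is root (which makes the UID-mapped box user root
    too) is IS_SANDBOX=1 added; every other host gets `base` unchanged.
    """
    uid = os.getuid() if host_uid is None else host_uid  # os.getuid is POSIX-only
    env = dict(base)  # copy so the caller's mapping is never mutated
    if uid == 0:
        env["IS_SANDBOX"] = "1"
    return env


def exec_env_flags(env: Dict[str, str]) -> List[str]:
    """Turn an env mapping into `docker exec -e KEY=VALUE` flags."""
    flags: List[str] = []
    for key, value in env.items():
        flags += ["-e", f"{key}={value}"]
    return flags
```

Deciding this per exec rather than at container create is what lets an existing root box heal on its next session without a recreate.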

Changes

1418 (+7) behavioral tests across 47 files - the root-host IS_SANDBOX injection pinned on both branches at source time, the engine pinned in every readiness, overload and status wording test with refutes covering both readings and a strict-mode smoke pass over the native small-host path
277 (+5) mutations caught, 0 missed, 0 skipped

v1.2.0

2026-07-11

Kits: one command enables a tuned Claude Code setup inside the box. Cleat now starts Docker for you when the daemon is down.

The release that makes a Cleat box the nicest place to run Claude, not just the safest: maintainer-curated kits bring the plan-big-execute-small orchestration pattern to any box with one command. The Docker tax shrinks with an autopilot that launches your engine, waits and continues, plus a consent-first install offer on machines that have no Docker at all.

Features

Kits: curated Claude Code pre-configurations, per box - cleat kit plan-big-execute-small enables the flagship kit, Anthropic's coordinator pattern (big models plan, small models execute) adapted into one command: run your session on Fable 5 and it holds the plan and every review while worker and scout subagents (Sonnet 5 by default) carry the mechanical bulk in their own context windows, so multi-file work burns your rate limit far slower. The kit merges on top of your own config instead of replacing it: your global CLAUDE.md comes first with the kit's delegation policy appended under a marked header, your agents win name collisions and everything applies inside that box only. Enabling, switching and turning a kit off never recreates the container and takes effect next session (boxes created before kits existed lack the mount points, so enabling there offers an explicit rebuild first). Bare cleat kit opens an interactive picker (kit, then per-agent models). cleat kit show prints the kit's full contents before you trust it. cleat kit off returns to vanilla. Pin or swap the agent models under a [kits] section in the global config (worker_model = haiku). The planner is always your session's model and a hint after enabling names the knob (/model inside the session). Adapted from the pattern in Anthropic's cookbook, with its guardrails encoded as behavior: workers report distilled results instead of raw logs, scouts never guess when a search comes up empty and the planner reads subtle code itself instead of trusting a summary.
Read-only masks close the host config write channel - a caged agent could previously write to the user-level ~/.claude/CLAUDE.md, agents and commands that your native host Claude then obeys. Those three surfaces are now read-only masks inside every new box (kit or not): the box sees your real content merged with the kit's, but cannot write a byte of it back. Project-level .claude files in your repo keep working through /workspace. Boxes created before the masks show a one-line recreate note on every start until rebuilt.
Docker autopilot: a down daemon is started for you - cleat on a stopped engine used to die with a raw "Cannot connect to the Docker daemon". Session commands now detect the engine you actually use and start it: Docker Desktop, OrbStack, or Colima (named profiles included) on macOS, Docker Desktop or a rootless engine on Linux and Docker Desktop from inside WSL2, then wait with a spinner and continue. Root-owned Linux engines get the exact sudo command printed instead of a privileged attempt, remote daemons (tcp://, ssh://) are never started for you, and a daemon that is up but unreachable because your user lacks the docker group gets the real diagnosis and the usermod remedy instead of a false "Docker is down". The wait is bounded by CLEAT_AUTOSTART_TIMEOUT_SECS (default 90 seconds). Opt out entirely with CLEAT_NO_AUTOSTART=1. cleat status now says "Docker isn't running" with the exact start command instead of pretending no boxes exist.
No Docker at all? Cleat offers to install it - when the docker CLI itself is missing, interactive sessions print the exact command and ask before running anything: a three-way Homebrew menu on macOS (Desktop, OrbStack, or Colima, with licensing stated), the official convenience script on Linux (downloaded to a private directory and shown, never piped blind into a shell) and winget under WSL2. Default is No, non-interactive runs never prompt and CLEAT_NO_AUTOSTART=1 suppresses the offer.
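The autopilot's guard rails (kill switch, remote refusal, bounded wait) can be sketched as two small functions. This is an illustration, not Cleat's code: it assumes a remote daemon is recognised from the `DOCKER_HOST` scheme (the real CLI may inspect the active Docker context instead), and the one-second poll interval is an assumption. The clock and sleep are injectable so the timeout can be tested without waiting.

```python
import time
from typing import Callable, Mapping

DEFAULT_TIMEOUT_SECS = 90.0


def autostart_allowed(env: Mapping[str, str]) -> bool:
    """Whether Cleat may try to start the engine at all."""
    if env.get("CLEAT_NO_AUTOSTART") == "1":
        return False  # the documented kill switch
    # Assumption: a remote daemon is recognised from DOCKER_HOST's scheme.
    return not env.get("DOCKER_HOST", "").startswith(("tcp://", "ssh://"))


def wait_for_daemon(is_up: Callable[[], bool], env: Mapping[str, str],
                    clock: Callable[[], float] = time.monotonic,
                    sleep: Callable[[float], None] = time.sleep,
                    poll_secs: float = 1.0) -> bool:
    """Poll `is_up` until it succeeds or the bounded wait runs out."""
    try:
        timeout = float(env.get("CLEAT_AUTOSTART_TIMEOUT_SECS", DEFAULT_TIMEOUT_SECS))
    except ValueError:
        timeout = DEFAULT_TIMEOUT_SECS  # a malformed override keeps the default
    deadline = clock() + timeout
    while True:
        if is_up():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_secs)
```

Using a monotonic clock keeps a wall-clock jump (for example, an NTP correction) from stretching or cutting the wait.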

Fixes

Login opens your browser in boxes created before v1.1.1 - the v1.1.1 BROWSER fix landed at container create, but docker exec inherits the container's environment frozen at create time, so existing boxes never saw it and /login stayed on the manual code-paste flow with nothing ever suggesting a recreate. Every session now injects BROWSER at exec time (cleat, cleat shell, cleat login), healing every existing box on its next session with no recreate. A BROWSER= in your .cleat still wins.
A second session can no longer swallow a login link - starting cleat login or a shell next to a running session wiped the browser bridge file unconditionally, so a login URL written moments earlier could vanish before any watcher opened it. The startup sweep is now age-gated: fresh URLs survive and get opened, only stale leftovers from dead sessions are cleared. Every open-or-defer decision in the bridge log now records the URL itself, so a link that "did nothing" can be recovered by hand, including with the bridge set to off.
A fresh machine without ~/.claude/settings.json can create a box - on macOS, creating the very first box on a host where Claude had never written its settings file failed with an opaque Docker error (a nested file mount whose target was missing in the parent bind source). Cleat now pre-creates the file and refuses a broken symlink at any masked path with a clear remedy instead of a raw trace.
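The age-gated bridge sweep above can be sketched as follows. The sketch assumes the bridge keeps one file per pending URL in a single directory and measures age from the file's modification time. The function name and the threshold a caller would pass are hypothetical; the changelog does not give the real cutoff.

```python
import os
import time
from typing import List, Optional


def sweep_stale(bridge_dir: str, max_age_secs: float,
                now: Optional[float] = None) -> List[str]:
    """Delete bridge files older than `max_age_secs` and return their names.

    Fresh files are kept, so a URL another session wrote moments ago
    survives and can still be opened by its watcher.
    """
    now = time.time() if now is None else now
    removed: List[str] = []
    for name in sorted(os.listdir(bridge_dir)):
        path = os.path.join(bridge_dir, name)
        try:
            if not os.path.isfile(path) or now - os.stat(path).st_mtime <= max_age_secs:
                continue
            os.remove(path)
        except FileNotFoundError:
            continue  # another watcher consumed or swept it first
        removed.append(name)
    return removed
```

Tolerating `FileNotFoundError` matters because several watchers can sweep the same directory at once.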

Changes

cleat describe sees new companions - per-box kit selections live next to box descriptions, are removed with cleat rm and are wiped by cleat nuke.
Total-memory detection fails safe - a garbled /proc/meminfo line used to kill a session under strict mode. It now just skips the memory advisory.
The README's top fold now carries a real recorded demo of an agent probing for credentials and wiping the box, not a mockup
1411 (+157) behavioral tests across 47 files - the kit merge and mask semantics (user content first, collisions by agent name, inode-stable regeneration on a running box, the :ro masks, broken-symlink refusal, the pre-mask recreate note), the full autopilot launch matrix (macOS Desktop, OrbStack, Colima profiles, Linux rootless, WSL2 interop gating, remote refusal, bounded timeout, kill switch) and the install offer's consent gates, the exec-time BROWSER injection pinned per exec site, the age-gated bridge sweep from both sides and the picker TUI driven on a real pty
272 (+31) mutations caught, 0 missed, 0 skipped
Note on upgrade: existing boxes keep working untouched. Recreating a box (cleat rm && cleat) is only needed to gain the new read-only masks and each start says so until you do
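The fail-safe memory detection noted above means "no number" must be a normal outcome rather than an error. A minimal Python sketch of that contract (the function name is hypothetical; the input format is the standard `/proc/meminfo` layout of `MemTotal:  <value> kB`):

```python
from typing import Optional


def mem_total_bytes(meminfo: str) -> Optional[int]:
    """Parse MemTotal from /proc/meminfo text, or return None.

    Any missing or garbled line yields None, so the caller skips the memory
    advisory instead of aborting the session.
    """
    for line in meminfo.splitlines():
        if not line.startswith("MemTotal:"):
            continue
        fields = line.split()
        if len(fields) == 3 and fields[1].isdigit() and fields[2] == "kB":
            return int(fields[1]) * 1024  # the kernel's "kB" here means KiB
        return None
    return None
```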

v1.1.1

2026-07-08

Logging in to Claude opens your browser again. A second terminal no longer asks you to log in after you have already authenticated.

Two login regressions, both surfacing after a recent Claude Code update: the login link stopped opening the browser and always fell back to pasting a code, and a box you had logged out of kept demanding login on every session even after you signed back in elsewhere. Both were traced to changes in how recent Claude builds handle login, then fixed and hardened.

Fixes

Logging in to Claude opens your browser again - a recent Claude Code update (2.1.191 and later) stopped opening any URL from inside a display-less Linux container unless the BROWSER environment variable is set. In that state it only ever hands its opener the hands-free loopback login URL (the manual code-paste URL is printed, never opened). A Cleat box is display-less Linux and did not set BROWSER, so Claude never invoked the container's open shim, the host-side bridge had nothing to open and every login dropped to "paste this code". Cleat now points BROWSER at its bridge shim inside the box, so cleat login (and the first-run login) opens the real login page on your host and completes automatically through the callback proxy, the way it did before. The variable is baked into the image and also passed at container start, so a box created from an older image is fixed without a rebuild and a BROWSER you set in .cleat still wins.
A second terminal no longer re-prompts for login after you have authenticated - logging out inside one box writes hasCompletedOnboarding: false into that box's per-project Claude config and deletes the shared credential file. Recent Claude builds gate the startup login screen on that one flag, so the box you logged out of kept asking you to log in on every session even after you signed back in from another terminal and the shared credential was restored. Cleat only re-asserted that flag when it first created a container, never afterward. It now heals the per-project config whenever you start a stopped box, resume one, or attach to a box that is already running (in place and only when nothing is working inside it) and it carries an identity you established in any box across to boxes that were created earlier. Log in once and every box is logged in.
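A minimal sketch of the heal step, assuming the per-project config is a JSON file with a top-level `hasCompletedOnboarding` key (the key comes from the text above; the file layout and function name are assumptions). It reflects the "never-truncate" guard from the test notes: the new content goes to a temporary file in the same directory and replaces the original atomically. The caller is assumed to have already checked that nothing is working inside the box.

```python
import json
import os
import tempfile


def heal_onboarding(path: str) -> bool:
    """Flip hasCompletedOnboarding from false back to true in a JSON config.

    Returns True when the file was rewritten. The new content is written to
    a temp file in the same directory and swapped in with os.replace, so a
    crash mid-write can never leave the original truncated.
    """
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except (OSError, ValueError):
        return False  # unreadable or malformed: leave it alone
    if not isinstance(data, dict) or data.get("hasCompletedOnboarding") is not False:
        return False  # nothing to heal
    data["hasCompletedOnboarding"] = True
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh, indent=2)
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
    return True
```

Every other key is carried through untouched, so only the one flag that gates the login screen changes.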

Changes

The browser bridge now covers cleat shell and sweeps its own stale markers - cleat shell runs the same host-side browser watcher a session does, so a login started from a raw shell completes instead of waiting forever on a callback nobody was proxying and the watcher clears expired single-open debounce markers when it starts so a long-lived clip directory cannot accumulate them.
Adopting this release refreshes the image once - the BROWSER shim is baked into the image, so the image content advances to spec 3 and the first start after upgrading offers the usual one-time "refresh the image?" prompt. The fix already works without it (the variable is also passed at container start), so the refresh is optional and version-only releases after it will not prompt again.
1254 (+24) behavioral tests across 45 files - the auth-URL classification truth table and its strict-mode probes, the BROWSER shim present at create and ordered before user [env] so a .cleat override wins, the cross-box identity fallback and the start, resume and running-box attach heal paths (with a live-agent skip and a never-truncate guard), the cleat shell watcher and the stale-marker sweep. Two adversarial-review rounds caught and fixed a real regression in the first cut (a rebuild that could drop a box-only top-level config key for API-key users) before release.
241 (+11) mutations caught, 0 missed, 0 skipped

v1.1.0

2026-06-27

Closed-terminal sessions stop reserving Docker VM memory on their own. A brand-new project no longer asks you to log in to Claude when you are already logged in on the host.

The first two issues after 1.0, both from a single screenshot: a Mac left running for hours where closed-terminal boxes had over-committed the Docker VM and a fresh project that re-prompted for login even though Claude was authenticated on the host and older projects still worked.

Features

Idle sessions are stopped automatically to free Docker VM memory - closing a terminal ends the Claude session but leaves its box running, still holding its memory ceiling, so a day of closed terminals can over-commit the Docker VM and make every session swap. On each interactive start, Cleat now stops other boxes that are safe to stop and tells you what it freed, in one line. A box is only stopped when nothing is running inside it (so a session working unattended, with its terminal left open, is never touched), it is not the box you are launching and it has been idle past a grace window (30 minutes by default). A stopped box is preserved: run cleat in its project to bring it right back. Disable the sweep with CLEAT_NO_IDLE_SWEEP=1, or change the grace with CLEAT_IDLE_GRACE_MINS.
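The stop rule above combines four gates: sweep enabled, not the launching box, nothing running inside it and idle past the grace window. A minimal sketch follows. The `Box` record and function names are hypothetical, and falling back to the 30-minute default on a malformed CLEAT_IDLE_GRACE_MINS is an assumption. An unreadable process list or an unknown idle age is treated as "keep", matching the never-touch-a-live-session promise.

```python
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass
class Box:
    name: str
    busy: Optional[bool]         # None: the process list could not be read
    idle_secs: Optional[float]   # None: the idle age is unknown


def grace_secs(env: Mapping[str, str]) -> Optional[float]:
    """Grace window in seconds, or None when the sweep is disabled."""
    if env.get("CLEAT_NO_IDLE_SWEEP") == "1":
        return None
    try:
        return float(env.get("CLEAT_IDLE_GRACE_MINS", "30")) * 60
    except ValueError:
        return 30 * 60.0  # assumption: a malformed value keeps the default


def boxes_to_stop(boxes: List[Box], launching: str, env: Mapping[str, str]) -> List[str]:
    grace = grace_secs(env)
    if grace is None:
        return []
    return [
        b.name for b in boxes
        if b.name != launching        # never the box being launched
        and b.busy is False           # live or unreadable process list: keep
        and b.idle_secs is not None   # unknown age: skip
        and b.idle_secs > grace
    ]
```

Testing `busy is False` rather than `not busy` is what keeps an unreadable process list (None) from being mistaken for an idle box.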

Fixes

A brand-new project no longer re-prompts you to log in to Claude - on macOS, logging in again on the host rotated the Keychain token and invalidated the older copy in the shared credential file. Sessions already running kept working on a live token, but a freshly created box read the stale one and dropped to a login screen. Cleat now refreshes that shared credential file from the Keychain when its token has expired and the Keychain has a newer, valid one and never overwrites a still-valid in-box token, so a fresh box (and any box on its next resume) starts authenticated. A new project is also pre-trusted inside the cage so a newer bundled Claude does not re-run its first-run onboarding there.
The cleat status memory line can no longer contradict itself - its over-commit warning compared running limits against the kernel's reported VM memory but printed the configured slider size, so a total landing between the two could read as "reserves 23 GB on a 24 GB VM". It now compares and prints in the same whole-GB unit, the same way the on-start advisory already did.
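The credential re-seed rule above reduces to a two-condition check. This sketch assumes each side exposes an expiry timestamp in a shared clock and unit (that representation is an assumption). Requiring the Keychain token to be still valid while the shared one has expired also guarantees it is the newer of the two.

```python
def should_reseed(shared_expires_at: float, keychain_expires_at: float, now: float) -> bool:
    """Decide whether to refresh the shared credential file from the Keychain.

    All three timestamps use the same clock and unit. A still-valid shared
    token is never overwritten; an expired one is replaced only by a
    Keychain token that is itself still valid (and therefore newer).
    """
    shared_expired = shared_expires_at <= now
    keychain_valid = keychain_expires_at > now
    return shared_expired and keychain_valid
```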

Changes

The over-commit notice reads clearly - it now names how many sessions are still running and reminds you that closing a terminal does not stop its box, instead of a bare GB number that looked like an off-by-one against your VM size. "promised X GB" is now "reserve X GB of memory ceilings" and the stale "~4 GB ceiling" wording was corrected (the per-box default is a quarter of the VM, 4 to 8 GB).
1230 (+36) behavioral tests across 45 files - the idle-session sweep (the liveness gate across a live, detached and unreadable process list, the grace window against both mocked and real run-dir timestamps, the unknown-age skip, self-exclusion, multi-box stop with the freed-memory summary and the malformed-argument, terminal and env-var gates), the expiry-aware credential re-seed paths with a pinned clock, the per-workspace onboarding seed, the session-count copy and the status-line contradiction guard and a strict-mode regression for the singular-session notice
230 (+12) mutations caught, 0 missed, 0 skipped

v1.0.0

2026-06-22

The first stable release. A CLI version bump no longer re-downloads the image and recreates your container when nothing in the image actually changed.

Cleat is 1.0. The headline is that upgrades stop being disruptive: the on-start image-refresh prompt is keyed to the image's real content, not the CLI version, so a routine release leaves your running container and its writable layer untouched. This release also pins the base image by digest, fixes two config-parsing bugs and restyles the workspace-trust prompt.

Features

The image refreshes only when the image actually changed - the on-start "refresh the image?" prompt was tied to the CLI version, so every release asked you to re-pull the image and recreate this project's container, discarding its writable layer (anything you installed in the box, caches) even when the image was byte-identical. The decision is now keyed to the image's content, tracked by a spec number baked into the image as a label, so a version-only release leaves your container alone. It still prompts when something that lands in the image genuinely changes: the entrypoint, the clipboard or browser bridge, the Dockerfile, or the base. Images built before this scheme are classified by the version they were built at, so upgrading to 1.0.0 is a single expected one-time refresh to adopt the pinned base, after which version-only releases never recreate again.
The base image is pinned by digest - the container base is pinned to an exact multi-arch image digest instead of a floating tag, so builds are reproducible and a base or security update becomes a deliberate, reviewed change that ships through the same content-aware refresh path instead of drifting in silently.
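A sketch of the content-keyed refresh decision, including the legacy version inference the test notes mention (per-field numeric comparison, suffix stripping, leading-zero safety). The spec numbers and the 1.0.0 boundary used for unlabelled images are hypothetical placeholders. The changelog does not publish Cleat's real mapping, only that spec 3 arrived with v1.1.1.

```python
from typing import Optional, Tuple

# Hypothetical: unlabelled images built at or after this version count as spec 2,
# anything older as spec 1. Cleat's real mapping is not published in this changelog.
LEGACY_BOUNDARY = (1, 0, 0)


def parse_version(text: str) -> Tuple[int, ...]:
    """'v0.16.5-rc1' -> (0, 16, 5): drop a leading v and any suffix, then
    compare field by field as integers ('10' > '9', '05' == '5')."""
    core = text.strip().lstrip("v").split("-", 1)[0].split("+", 1)[0]
    return tuple(int(part) for part in core.split("."))


def image_spec(label: Optional[str], built_at_version: str) -> int:
    if label is not None and label.isdigit():
        return int(label)  # modern image: the spec is baked in as a label
    return 2 if parse_version(built_at_version) >= LEGACY_BOUNDARY else 1


def needs_refresh(label: Optional[str], built_at_version: str, expected_spec: int) -> bool:
    """Prompt only when the image's content spec differs from the expected one."""
    return image_spec(label, built_at_version) != expected_spec
```

Comparing integer tuples rather than strings is what keeps "0.10.0" above "0.9.9". A plain string comparison would order them the other way.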

Fixes

A hand-edited .cleat no longer drops its last line - a .cleat (or its [resources] block) whose final line had no trailing newline silently dropped that line: a capability listed last was never requested, so the trust prompt never fired and the cap never applied and a final memory = 8g fell back to the VM-derived default instead of your configured ceiling. Both the capability and resource readers now read that final line, matching the env-file parser.
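A common source of this class of bug is a line reader that only yields newline-terminated lines. For example, a shell `while read` loop returns non-zero on a final unterminated line and skips it. The Python sketch below reads one `[section]` of a `.cleat`-style file. It assumes `key = value` lines and `#` comments (the comment syntax is an assumption), and it keeps the first occurrence of a duplicate key, as the resource tests describe. `str.splitlines()` yields the final line whether or not a trailing newline follows it.

```python
from typing import Dict, Optional


def read_section(text: str, section: str) -> Dict[str, str]:
    """Read key = value pairs from one [section] of a .cleat-style file.

    splitlines() yields the final line whether or not it ends in a newline,
    and the first occurrence of a duplicate key wins.
    """
    values: Dict[str, str] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and # comments are ignored
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()
            continue
        if current != section or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values.setdefault(key.strip(), value.strip())  # first wins
    return values
```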

Changes

The workspace-trust prompt matches the other startup prompts - approving a project's .cleat capabilities now uses the same single aligned line and [y/N] question as the image-refresh, Claude-update and recreate prompts, with docker flagged amber because it breaks the sandbox, instead of a bordered box. It still defaults to deny: only an explicit yes applies the caps.
1194 (+37) behavioral tests across 44 files - the image content-spec comparison and its legacy version inference (per-field numeric, the equal and just-below boundaries, suffix stripping, leading-zero safety), the no-trailing-newline reads for the capability and resource parsers with mutation-verified regressions, duplicate-key first-wins for resources and the restyled trust prompt (approve, deny, empty-and-EOF default-deny, the no-box style, re-prompt wording and the docker amber rendering)
218 (+6) mutations caught, 0 missed, 0 skipped

v0.16.5

2026-06-17

Clicking a link in the terminal opens exactly one tab again, on any terminal. The Docker VM size now matches the slider you set (a 24 GB VM no longer reads as "23 GB").

A polish release, no migration. Both fixes came from live use: a link that opened twice and a tuned-VM line that under-reported the configured memory.

Fixes

Clicking a link opens one tab, not two - your terminal already opens a clicked link itself (every modern terminal makes URLs clickable) and the in-container open shim also forwarded it through the bridge, so the bridge opened it again about half a second later. The bridge cannot see the terminal's open, so on an interactive terminal it now defers plain links to the terminal and opens via the bridge only what the terminal will not: auth and OAuth-callback URLs (which you never click) and non-interactive or piped runs (where nothing else opens them). This is distinct from the v0.16.3 fix, which made the bridge itself open each URL once. This stops the bridge from re-opening what the terminal already handled. Override with CLEAT_BROWSER_BRIDGE=always (open every URL through the bridge, for a terminal that does not open links itself or for scripts that open plain URLs) or off (never auto-open, though the login callback proxy still runs so cleat login completes when you open the printed URL by hand).
The Docker VM size now reads the configured slider - v0.16.4 rounded the kernel's reported memory to the nearest GB, but the kernel reserve grows with VM size, so a 24 GB VM reported about 23.4 GiB and still read as "23 GB" (and the kernel value alone cannot tell a 23 GB slider from a 24 GB one). Cleat now reads the configured slider value straight from the Docker Desktop settings file (the same way it reads swap) and falls back to rounding the kernel value only when the settings cannot be read (native Linux, no Docker Desktop). The displayed size and the undersized and overload checks all use that one value, so a correctly sized VM is never flagged and the number you see is the slider you set.
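The open policy in the first fix above can be written as one decision function. In this sketch the mode comes from CLEAT_BROWSER_BRIDGE, and treating an unrecognised value as `auto` is an assumption. The auth-URL classifier is passed in because the changelog does not say how Cleat recognises auth and OAuth-callback URLs; the classifier in the usage lines is purely illustrative.

```python
from typing import Callable, Mapping


def bridge_mode(env: Mapping[str, str]) -> str:
    mode = env.get("CLEAT_BROWSER_BRIDGE", "auto")
    return mode if mode in ("auto", "always", "off") else "auto"  # assumption


def should_bridge_open(url: str, mode: str, interactive: bool,
                       is_auth_url: Callable[[str], bool]) -> bool:
    """Decide whether the host bridge, rather than the terminal, opens `url`."""
    if mode == "off":
        return False  # never auto-open; the login callback proxy runs separately
    if mode == "always":
        return True
    if not interactive:
        return True   # piped or scripted run: nothing else will open it
    return is_auth_url(url)  # interactive: the terminal opens plain clicked links
```

Usage, with an illustrative classifier:

```python
is_auth = lambda u: "/oauth/" in u  # hypothetical, for illustration only
should_bridge_open("https://example.com/docs", "auto", True, is_auth)  # False
```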

Changes

The overload warning never contradicts its own numbers - it compares the promised memory and the VM size in the same whole-GB unit it prints, so it can no longer fire while reading "promised 24 GB but only 24 GB" on a VM whose slider rounds above the kernel-reported value.
1157 (+27) behavioral tests across 42 files - the configured-slider reader across both Docker Desktop settings formats with its fail-soft, base-10 and non-1024-aligned rounding edges, the display-and-threshold agreement on a 24 GB slider, the browser-bridge open policy (plain link deferred on an interactive terminal, auth URL and non-interactive run always opened) across auto, always and off, the off-mode proxy-still-runs contract and the cleat login open-and-message behavior
212 (+11) mutations caught, 0 missed, 0 skipped
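The overload fix above is a general rule: compare the same numbers you print. In this sketch (function name, rounding choice and message text are assumptions), the summed ceilings are rounded to the whole-GB figure that would be displayed, and that figure, not the raw byte count, is what gets compared with the whole-GB VM size.

```python
from typing import List, Optional

GIB = 2 ** 30


def overload_warning(limits_bytes: List[int], vm_gb: int) -> Optional[str]:
    """Warn only when the printed totals themselves show an overload.

    The ceilings are rounded to the whole-GB figure that would be printed and
    that same figure is compared with the whole-GB VM size, so the warning can
    never read "promised 24 GB but only 24 GB".
    """
    promised_gb = round(sum(limits_bytes) / GIB)
    if promised_gb <= vm_gb:
        return None
    return f"promised {promised_gb} GB but only {vm_gb} GB"
```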

v0.16.4

2026-06-16

Checks that your Docker VM has enough swap, not just enough memory. Stops a phantom "config changed" recreate after you resize the VM or upgrade the CLI.

A polish release, no migration. The headline fix is a recreate prompt that fired on a box you never touched. The swap check rounds out the on-start Docker tuning advice.

Features

Swap is checked alongside memory - memory and swap are separate Docker Desktop sliders, so a VM with the right memory can still have swap left at the default. When memory is sized right but swap is below 2 GB, the start now shows a focused advisory (memory is good, swap needs a bump) with the exact step, in place of the "Docker tuned" confirmation. docker info cannot report swap, so Cleat reads it from the Docker Desktop settings file and fails soft: a value it cannot read leaves the confirmation as-is, so it never warns on a guess.
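A sketch of the fail-soft swap check above. The key names are assumptions about the two Docker Desktop settings formats ("SwapMiB" in the newer settings-store.json, "swapMiB" in the older settings.json), so verify them against your Docker Desktop version. An unreadable value returns None and leaves the "tuned" confirmation in place. The display floors to whole GB and shows "disabled" for zero, as the test notes describe.

```python
import json
from typing import Optional

MIN_SWAP_MIB = 2 * 1024


def read_swap_mib(settings_text: str) -> Optional[int]:
    """Read the swap slider from Docker Desktop settings JSON, failing soft.

    Assumed key names: "SwapMiB" (newer settings-store.json) and "swapMiB"
    (older settings.json). Anything unreadable returns None.
    """
    try:
        settings = json.loads(settings_text)
    except ValueError:
        return None
    if not isinstance(settings, dict):
        return None
    for key in ("SwapMiB", "swapMiB"):
        value = settings.get(key)
        if isinstance(value, int) and not isinstance(value, bool) and value >= 0:
            return value
    return None


def start_advisory(memory_ok: bool, swap_mib: Optional[int]) -> str:
    if not memory_ok:
        return "undersized"       # the memory advisory takes precedence
    if swap_mib is not None and swap_mib < MIN_SWAP_MIB:
        return "swap-advisory"    # memory is good, swap needs a bump
    return "tuned"                # includes unreadable swap: never warn on a guess


def swap_label(swap_mib: int) -> str:
    """Floor to whole GB so 1.5 GB of swap never displays as a passing 2 GB."""
    return "disabled" if swap_mib == 0 else f"{swap_mib // 1024} GB"
```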

Fixes

No more phantom "config changed" recreate - after you resized the Docker VM or upgraded the CLI, the next start could claim "Config changed ... caps or env keys differ" and offer to recreate a box you never touched. The config fingerprint folded in a memory ceiling derived from the VM size (and from the CLI's own default formula), so a memory-slider change or a routine upgrade shifted it. The fingerprint now reads only the resource limits you set explicitly in [resources], so an unconfigured box stays stable across VM resizes and upgrades. Boxes created before this release are left alone (never re-prompted) and adopt the new format the next time they are recreated for a real reason. The message also names what actually changed now: capabilities, environment, or resource limits.
The Docker VM size now reads true to the slider - docker info reports the guest kernel's memory, which sits a few hundred MB under the Docker Desktop slider (the kernel reserves some at boot), so a 16 GB VM read as "15 GB" and could false-trigger the "Tight for parallel sessions" advisory on a VM that was in fact sized right. The size now rounds to the nearest GB everywhere it shows (the advisory, the "Docker tuned" line and cleat status) and the undersized check compares that rounded value, so a correctly sized VM is never flagged and the number you see matches the slider you set.
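The phantom-recreate fix is about what goes into the fingerprint. A minimal sketch, assuming a SHA-256 over a canonical JSON payload (the hashing scheme and function name are hypothetical). Only user-declared inputs are hashed: capabilities, env keys (mirroring the "caps or env keys differ" message) and the keys set explicitly in [resources]. A ceiling derived from the VM size never enters it, so a VM resize or CLI upgrade leaves the hash unchanged.

```python
import hashlib
import json
from typing import Dict, List


def config_fingerprint(caps: List[str], env: Dict[str, str],
                       explicit_resources: Dict[str, str]) -> str:
    """Hash only settings the user wrote down.

    Derived values, such as a memory ceiling computed from the VM size or from
    the CLI's default formula, are deliberately left out, so resizing the VM
    or upgrading the CLI cannot change the result.
    """
    payload = {
        "caps": sorted(caps),
        "env_keys": sorted(env),                # the message compares env keys
        "resources": explicit_resources,        # only keys set in [resources]
    }
    blob = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()
```

Sorting and `sort_keys=True` make the hash independent of the order in which settings were written.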

Changes

A blank line after "Claude Code upgraded" - the upgrade confirmation no longer sits flush against the container bring-up that follows.
A blank line between the prune prompt and the VM advisory - on the daily check both can fire. They no longer run flush together.
No doubled blank line after the what's-new note - on the first start after an upgrade with a well-sized VM, the release highlight and the "Docker tuned" line could be separated by two blank lines instead of one.
1130 (+45) behavioral tests across 42 files - swap detection across both Docker Desktop settings formats with its fail-soft and base-10 edge cases, the swap advisory branch and its display (floored, never rounded up, "disabled" for zero swap), the VM-size rounding (a 16 GB slider read as ~15.6 GiB resolves to 16, not 15) with its no-false-warning guarantee across the advisory, the all-clear and cleat status, the VM-resize and CLI-upgrade fingerprint stability, the legacy-hash grandfather, the configured-only resolvers and the startup-spacing fixes
201 (+17) mutations caught, 0 missed, 0 skipped

v0.16.3 #

2026-06-15

Terminal links open exactly once again. An undersized Docker VM is flagged on every start until you fix it. A well-sized one now gets a one-line confirmation instead of silence.

A polish release, no migration. The headline fix is a recurring double-open on terminal links under concurrent sessions. The on-start Docker memory check is now clearer, harder to miss and honest about what it can promise.

Features #

"Docker tuned for Cleat" confirmation - when your Docker VM is sized right for parallel work, every start now prints a one-line green confirmation directly above Image ready, so a good setup gets confirmed, not just a bad one flagged. It is the exact inverse of the undersized-VM warning, so you see one or the other, never both.

Fixes #

Links no longer open twice - clicking a link in the terminal could open it twice (or more) after a long session with a few concurrent cleat sessions. A TUI click fires the open shim on both press and release, and several host watchers can be alive at once (a cleat login beside a session, two shells on one box, or one left by a crashed session). The same-URL debounce that folds those into one tab was not atomic, so two watchers could each open the same URL. It is now an atomic per-URL claim, so each URL opens exactly once no matter how many watchers are live.
No stray characters after a session ends - the clean-exit "Session ended. Resume with: cleat resume" line could be trailed by leftover characters from a heavily used terminal. The exit sequence now clears that line fully.
The Docker memory advice points at the right panels - it named a Settings → Resources → Memory panel that does not exist in current Docker Desktop. Memory limit and Swap are under Settings → Resources → Advanced. VirtioFS file sharing is under Settings → General → Virtual Machine Options. Both are now named correctly.
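The atomic per-URL claim from the link fix can be sketched in Python as below. It assumes every host watcher shares one claim directory and keys claims by a hash of the URL. claim_url and that directory layout are hypothetical, not Cleat's actual implementation. O_CREAT | O_EXCL makes creating the claim file one atomic step, so exactly one of any number of racing watchers succeeds.

```python
import hashlib
import os


def claim_url(url, claim_dir):
    """Return True for exactly one caller per URL, however many watchers race.

    os.open with O_CREAT | O_EXCL fails with FileExistsError if the file
    already exists, so the check and the create cannot be interleaved.
    """
    name = hashlib.sha256(url.encode()).hexdigest()
    path = os.path.join(claim_dir, name)
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        return False  # another watcher already claimed (and opens) this URL
    os.close(fd)
    return True
```

Expiring old claims, so the same URL can be opened again later, is left out here. A naive "delete the stale claim and retry" would reintroduce the race between watchers.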

Changes #

An undersized Docker VM is flagged on every start until you fix it - the "your VM is too small" advice was shown at most once a day. An undersized VM is the root cause of boxes getting killed at their memory ceiling, so it now shows on every start until the VM is sized right (the same cadence as the new positive confirmation). The interactive prune prompt and the transient overload notice keep the daily cadence.
The Docker advice never quotes a session count - it used to say "room for ~N parallel sessions", which read as a hard cap. It is not one: a box memory ceiling is a limit, not a reservation, so you can run many sessions at once and the VM only swaps if many do genuinely heavy memory work at the same time (rare and no different from Claude Code outside Docker). The copy now says "many parallel sessions".
The on-start Docker advisory lands as its own section - it opens with its own blank line instead of sitting flush against the auto-update line above it, and its fix steps use one consistent color treatment.
Em dashes removed from all CLI output - they read as machine-written. The output now uses ordinary punctuation throughout.
1085 (+18) behavioral tests across 42 files - the atomic link debounce (including a concurrent-watcher race test), the every-start undersized advisory, the overload-stays-daily contrast, the positive confirmation with its placement and whitespace, the warning-versus-confirmation deferral and a strict-mode smoke test for the confirmation
184 (+9) mutations caught, 0 missed, 0 skipped

v0.16.2 #

2026-06-14

Fixes the on-start Docker memory advice that went missing under load on real Macs: the "give Docker more memory" steps now always show, the what's-new note keeps its blank line and the overload notice is honest about a machine that can't grow its VM.

A follow-up to v0.16.1 from a real-Mac report, no migration. All three fixes are in the on-start startup block.

Fixes #

The "give Docker more memory" steps no longer vanish under load - on Docker Desktop the overload notice printed the warning but dropped its concrete fix (the memory value, click-path and safe max). The cause was the Docker Desktop detection: it piped docker info into grep, and under the shell's strict mode (set -o pipefail) an early grep match closed the pipe, killed docker info with a broken-pipe signal and made the pipeline report failure even though grep had matched. Under memory pressure (exactly when the advice matters) docker info is slow, so the failure was frequent. It now reads the one field it needs directly, with no pipe, and detects the engine once per check instead of repeatedly.
A blank line is back above the what's-new note - the "New in" note relied on the advisory's trailing blank for separation, so when that fix block was skipped the note sat flush against the warning above it. The startup block now guarantees exactly one blank line there, never zero and never two.
The overload notice is host-aware - it always names your real RAM and the Docker VM and only offers the grow-the-VM fix when the recommended size is actually larger than the current VM. When the VM is already as large as your RAM can safely back (for example a ~7 GB VM on an 8 GB Mac, where four sessions still cannot fit), it steers you to running fewer sessions or lowering a box's own ceiling ([resources] memory in .cleat) instead of naming a VM target smaller than what you already have.
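The host-aware steering above boils down to one comparison: offer to grow the VM only when the recommended size exceeds the current VM. The Python sketch below assumes the "four sessions x ~4 GB" target from v0.16.1. The safe-max formula (host RAM minus a fixed reserve) and the vm_advice name are hypothetical, since the changelog does not give Cleat's real formula.

```python
def vm_advice(host_ram_gb, vm_gb, sessions=4, per_session_gb=4, host_reserve_gb=2):
    """Pick the fix to suggest (hypothetical sketch).

    host_reserve_gb is an assumed allowance for the host OS; the real
    safe-max calculation may differ.
    """
    safe_max = max(host_ram_gb - host_reserve_gb, 1)
    target = min(sessions * per_session_gb, safe_max)
    if target > vm_gb:
        return f"grow the VM to {target} GB (safe max {safe_max} GB)"
    # The VM is already as large as the host can safely back:
    # never name a target smaller than what the user already has.
    return "run fewer sessions or lower [resources] memory in .cleat"
```

With these assumptions a ~7 GB VM on an 8 GB Mac gets the "run fewer sessions" steering, while an 8 GB VM on a 32 GB machine is told to grow to 16 GB.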

Changes #

1067 (+5) behavioral tests across 42 files - a set -o pipefail regression for Docker Desktop detection (the broken-pipe false negative), the host-aware overload steering on a VM that cannot grow and the guaranteed-single blank line above the what's-new note (open-the-gap, no-double-gap and the end-to-end advisory-then-note wiring)
175 (+5) mutations caught, 0 missed, 0 skipped - every new behavior fails when reverted. Two existing advisory mutations were retargeted to the refactored code

v0.16.1 #

2026-06-14

A follow-up to v0.16.0: macOS logins now carry into boxes, out-of-memory crashes are explained with a fix, the Docker memory advice is sized to your actual machine and the outdated-image prompt downloads instead of rebuilding.

All improvements and fixes on top of v0.16.0, no migration. Throughout, a box and a cleat running in another worktree of the same project count the same way: each is one session with one memory ceiling, and all of them share the one Docker VM.

Features #

Out-of-memory crashes are explained - when a box hits its memory ceiling (detected from the cgroup OOM flag or exit code 137) cleat now names it and offers three fixes instead of leaving an unexplained crash: raise [resources] memory, cap your test runner's workers, or set [resources] cpus. The usual cause is a test runner like jest or vitest sizing its worker pool to the CPU count while a box sees every host core, with no swap to absorb the spike.
Docker memory advice sized to your machine - on Docker Desktop, cleat reads both your host RAM and the Docker VM size and, when the VM is too small for about four parallel sessions (4 x ~4 GB), tells you the exact memory to set, the click-path (Settings > Resources > Memory) and your machine's safe max. It recurs on the daily check while the VM stays small, then stops once you fix it. A native-Linux engine (no resizable VM) is left alone.
macOS login carries into boxes - on macOS, Claude keeps its OAuth login in the login Keychain, not in ~/.claude/.credentials.json, so the shared ~/.claude mount carried no token and a freshly logged-in host still re-prompted inside a box. cleat now bridges the Keychain credential into the box before launch, only when the box has none yet, so it never clobbers a fresher in-box token. No-op on Linux.
Clickable, version-anchored changelog link - the on-start what's-new note links straight to the release's section on the changelog page (#vX.Y.Z), rendered as a real clickable hyperlink (OSC 8) where the terminal supports it (iTerm2, VS Code, WezTerm, Ghostty, kitty, GNOME) and a bare clickable URL everywhere else.

Fixes #

claude doctor no longer reports "install method unknown" - the per-project ~/.claude.json cleat builds for each box now declares the native install, matching the on-disk installer, so both the doctor warning and the related install_failed update record are cleared.
The outdated-image prompt downloads instead of rebuilding - when your local image was built by an older cleat, the on-start offer now pulls the released multi-arch image for this version (a fast download of the exact tested setup) and only builds locally if that version is not published, instead of an unconditional two-minute local rebuild.

Changes #

Default per-box memory floor raised from 2 GB to 4 GB - still a quarter of the Docker VM, now clamped to 4-8 GB (2 GB only when the VM size cannot be read). The 2 GB floor was too tight for a 1M-context session: at the 60% node-heap pin that is only a ~1.2 GB heap, which thrashed the garbage collector and OOM-killed under load. A memory limit is a ceiling, not a reservation, so the larger floor only binds when several sessions run at once, which the overload notice and the new advisory both warn about.
The release highlight reads cleaner - the try-command and the changelog link each get their own labelled line instead of a cramped run-on, and the changelog link is the clickable anchor described above.
1062 (+45) behavioral tests across 43 files - new credentials.bats (macOS Keychain bridge), plus installMethod cases, OOM detection, pull-first image refresh, the host-relative VM advisory with click-path and safe max, the 4 GB floor, host-RAM detection, OSC 8 hyperlink detection and fallback and the version-anchored changelog link
170 (+26) mutations caught, 0 missed, 0 skipped - every new behavior fails when reverted, verified deterministic across three full harness runs
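As a rough illustration of the sizing rules above (a quarter of the Docker VM clamped to 4-8 GB, 2 GB when the VM size cannot be read, node's heap pinned at 60% of the box ceiling), here is a Python sketch. The function names are hypothetical, and handing the heap size to node via NODE_OPTIONS=--max-old-space-size is an assumption about the mechanism, not something this changelog states.

```python
def default_box_memory_gb(vm_gb):
    """Default per-box memory ceiling in GB (hypothetical sketch).

    A quarter of the Docker VM, clamped to 4-8 GB; 2 GB when the VM size
    could not be read (vm_gb is None).
    """
    if vm_gb is None:
        return 2
    return min(max(vm_gb / 4, 4), 8)


def node_heap_mb(box_memory_gb, fraction=0.6):
    """Heap size in MB for node, ~60% of the box ceiling, e.g. for an
    assumed NODE_OPTIONS=--max-old-space-size=<MB>."""
    return int(box_memory_gb * 1024 * fraction)
```

Under the old 2 GB floor, node_heap_mb(2) comes to 1228 MB, the ~1.2 GB heap that thrashed the garbage collector.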

v0.16.0 #

2026-06-11

Native Apple Silicon images, right-sized per-box memory plus optional CPU limits, automatic disk GC and an end to frozen terminals, zombie wedges and duplicate browser tabs.

Everything in this release comes from one live performance investigation: four boxes froze at once, the terminal filled with escape garbage, Docker Desktop showed AMD64 badges on an Apple Silicon Mac and 121 GB of stale images had piled up. Each symptom is fixed at its root cause. Existing boxes get a one-time recreate offer on start. Project files and ~/.claude state survive recreation.

Features #

Multi-arch prebuilt images - the GHCR image is now published for amd64 and arm64 (built on native runners, merged into one manifest). Apple Silicon pulls run natively instead of under emulation, which was the documented trigger for node segfaults, hangs and garbled sessions.
Arch-aware CLI - pulls are pinned to your Docker daemon's architecture, a cached image of the wrong arch is treated as missing and replaced, and cleat status loudly flags an emulated image. The arch is also shown in the pull and ready lines.
Per-box resource limits via [resources] - set memory = 4g and optionally cpus = 2 (decimals OK) in the global config, .cleat, or .cleat.<box>. The memory default is a quarter of the Docker VM clamped to 2-8 GB, a ceiling sized so a runaway box OOMs alone instead of swap-thrashing every session. Box swap is off by design. CPU is unlimited unless you set it. Repo-supplied values are clamped (8g memory, your core count for cpus) so an untrusted .cleat cannot overcommit your machine. Sessions pin node's heap to ~60% of the box limit so node stops believing in memory the box does not have.
cleat prune - removes only cleat-owned image bloat: dangling superseded builds and old prebuilt version tags. Boxes (even exited ones) and other projects' images are never touched. The same GC runs silently after every successful pull and rebuild, so the bloat stops accreting.
Daily pressure check - on an interactive start, cleat warns when stale images pile up past ~5 GB and offers the prune with one keypress. It also prints a one-line notice when running containers are promised more memory than the Docker VM has: the freeze-everything condition, surfaced before the freeze.
Zombie reaper - boxes now run a real PID 1 (--init) that reaps orphaned processes and forwards SIGTERM: long sessions no longer wedge at the pids cap with a frozen terminal and cleat stop is instant instead of a 10-second timeout into SIGKILL. Pre-existing boxes get a one-time [Y/n] recreate offer on start or resume.
Sharper diagnostics - cleat status shows the image architecture (with an EMULATED warning), the live zombie count in a running box and a VM-overcommit line. cleat ps explains Exited (255) as a Docker restart with the resume hint.

Fixes #

Browser links no longer open multiple tabs - one click could fire the in-box open shim twice and watchers leaked by crashed sessions each claimed a write, so tabs multiplied the longer cleat had been running. A same-URL debounce window now folds repeat writes into one tab (distinct URLs, like OAuth logins, are never deduped) and every host-side watcher - browser, clipboard, hook bridge - now exits on its own when the cleat process that spawned it is gone, instead of polling forever. An orphaned clipboard watcher can no longer overwrite the host clipboard and an orphaned hook bridge can no longer run host hooks for a dead session.
A crashed claude no longer leaves the terminal spraying escape garbage - after every interactive session, shell and login, cleat restores the terminal: stty sane plus resets for alt-screen, mouse tracking, bracketed paste and the hidden cursor. All no-ops on a clean exit, full recovery after a SIGSEGV.
Crashes are no longer reported as clean exits - the session script used to lose claude's exit code to the clipboard-daemon cleanup and then erase the crash message. The real code now survives, docker's stderr is shown on failure and the cosmetic line-erase only runs on a clean exit on a real TTY.
Memory-clamp bypasses closed - a project config could spell zero as 00g (docker reads --memory 0 as unlimited) or use 64-bit-overflowing values to slip past the 8g clamp. Both are rejected now and leading-zero values like 08g no longer trip octal arithmetic.
Terminal type forwarded - TERM (with a sane fallback) and COLORTERM (when set) are passed into sessions, fixing wrong key sequences and degraded colors in iTerm2, Ghostty and tmux.
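The memory-clamp guards can be sketched in Python as a validator for a repo-supplied [resources] memory value. The parse_repo_memory name and the accepted unit set are assumptions. Python integers never overflow, so the 64-bit bound is checked explicitly to mirror the shell's arithmetic, and parsing in base 10 sidesteps the octal trap where bash's $(( 08 )) fails.

```python
import re

GIB = 1024 ** 3
UNITS = {"": 1, "b": 1, "k": 1024, "m": 1024 ** 2, "g": GIB}  # binary units
INT64_MAX = 2 ** 63 - 1
REPO_CAP = 8 * GIB  # repo-supplied values are clamped to 8g


def parse_repo_memory(value):
    """Validate and clamp a repo-supplied memory value (hypothetical sketch)."""
    m = re.fullmatch(r"([0-9]+)([bkmg]?)", value.strip().lower())
    if not m:
        raise ValueError(f"invalid memory value: {value!r}")
    digits, unit = m.groups()
    n = int(digits, 10) * UNITS[unit]  # base 10: "08g" is 8, not bad octal
    if n == 0:
        # docker reads --memory 0 as "unlimited", so "0" or "00g" must not pass
        raise ValueError("zero memory is rejected")
    if n > INT64_MAX:
        raise ValueError("value overflows 64 bits and is rejected")
    return min(n, REPO_CAP)
```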

Changes #

Spinner hardening - the progress spinner is no longer disowned (its stop really synchronizes), guards against nested starts and erases itself with the cursor restored if the parent process dies hard.
1017 (+116) behavioral tests across 41 files - new arch.bats, resources.bats, prune.bats, init_recreate_check.bats, watcher and spinner orphan harness tests with a zombie-aware process helper, no-seam-override prune ownership tests, marker-file auto-GC spies, status and ps diagnostic assertions, COLORTERM and TERM-fallback subprocess tests and an executed-session-script exit-code test
144 (+49) mutations caught, 0 missed - including the auto-GC call sites, the prune label and repo-scope filters, the status EMULATED comparison, the zero and overflow memory guards, the cpus clamp and wiring, the exit-code capture order and the watcher liveness checks

v0.15.1 #

2026-06-09

Three real-world fixes: an ssh box now survives a Mac reboot instead of failing to start, cleat upgrade-claude no longer dies with a permission error and the startup output reads as one clean block.

All bug fixes, no new commands, no migration. Pick them up by rebuilding the image (cleat rebuild, or accept the on-start "image is outdated" prompt) so the new entrypoint takes effect.

Fixes #

An ssh box survives a reboot - macOS regenerates the SSH agent socket directory on every restart, so the path baked into an ssh-cap container went stale and docker start aborted with a cryptic "mount ... not a directory" error, prompting you to remove the container. Cleat now checks every bind source before restarting a stopped container and, if one has vanished, recreates the box transparently with a "Recreating container (host paths changed)" note. Sessions, auth and keys live in other mounts, so nothing is lost. cleat resume self-heals the same way and continues with --continue. The check also covers a ~/.gitconfig or ~/.ssh removed between sessions.
cleat upgrade-claude no longer fails with EACCES - the Claude installer stages each downloaded build under ~/.cache/claude/staging before moving it into ~/.local. That directory is owned by the image build UID and is remapped to your host UID at runtime, but the entrypoint chowned only ~/.local, so the staging mkdir failed with EACCES: permission denied. The entrypoint now chowns ~/.cache too, fixing cleat upgrade-claude, the on-start update prompt and a manual in-container claude update.
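The pre-start bind-source check can be sketched in Python, assuming the mounts come from docker inspect's Mounts entries (each with "Type" and "Source" keys). vanished_bind_sources is a hypothetical name. A non-empty result means "recreate", not "docker start".

```python
import os


def vanished_bind_sources(mounts):
    """Return host paths of bind mounts whose source no longer exists.

    mounts: list of dicts shaped like docker inspect's Mounts entries.
    Only bind mounts are checked; volumes are managed by Docker itself.
    """
    return [
        m["Source"]
        for m in mounts
        if m.get("Type") == "bind" and not os.path.exists(m["Source"])
    ]
```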

Changes #

Contiguous startup output - removed the stray blank line before Image ready (cached), so the green bring-up block (a rebuild's Image rebuilt, then Image ready, Container started, ...) reads as one group instead of being split. The on-start release highlight now carries its own trailing blank, so it stays visually separate without a blank leaking into the bring-up.
901 (+6) behavioral tests - 3 bind-source detection cases and a rotated-SSH-socket regression in start_resume.bats/regressions.bats, an entrypoint ~/.cache chown test and a release-highlight trailing-blank test
95 (+4) mutations caught - the vanished-bind-source recreate, the entrypoint ~/.cache chown, the contiguous-bring-up blank-line rule and the highlight trailing-blank

v0.15.0 #

2026-06-09

Cleaner, quieter startup plus a config-drift fix: upgrading Cleat no longer triggers a false "Config changed" prompt, browser links open one tab instead of one per past session and the on-start notices are plain text.

This release is all polish on the start path: the drift check, the update cadence, the release note and the notice styling. No new commands and existing projects upgrade with no migration.

Features #

On-start release highlight - the first few launches after an upgrade show a short, non-blocking note about what changed in the new version, then it goes quiet. Local only (no network), TTY only and fail-safe: if a release forgets to refresh the highlight copy, or the state cannot be written (read-only install), it shows nothing rather than stale or endlessly repeating text.
Self-update check cadence from 24h to 10min - a published release now surfaces within minutes of a launch instead of up to a day later. The throttle gates only the network lookup, not the prompt itself. Decline-suppression and offline fail-soft behavior are unchanged.

Fixes #

No false "Config changed" after a version upgrade - the config-drift fingerprint no longer includes the CLI version, so bumping Cleat no longer fires a "caps or env keys differ" prompt on a container whose caps and env are untouched. Version drift is handled by the separate image-rebuild prompt, which offers a real rebuild (the only thing that actually applies new container setup) instead of the no-op recreate config drift used to suggest. The fingerprint now also sorts env keys internally, so a reordered env list can never drift the hash.
Browser links no longer open duplicate tabs - a session that died without running its cleanup (crash, SIGKILL, closed terminal) left an orphaned watcher on the bridge directory. The next session started another, so one in-container open produced one tab per live watcher. Each URL is now consumed with an atomic rename, so exactly one watcher opens it no matter how many are alive and orphaned watchers self-exit once their run directory is removed.
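The consume-once rename can be sketched in Python as follows, assuming each open request is a file in the bridge directory and every watcher renames it to a private name before reading it. consume_once and the ".claimed-<id>" naming are hypothetical. rename is atomic within one filesystem, so only the first watcher finds the file and the rest get FileNotFoundError.

```python
import os


def consume_once(request_path, watcher_id):
    """Atomically take ownership of one bridge request; return its URL or None."""
    claimed = f"{request_path}.claimed-{watcher_id}"
    try:
        os.rename(request_path, claimed)  # atomic: exactly one watcher wins
    except FileNotFoundError:
        return None  # another watcher already consumed this request
    with open(claimed) as f:
        url = f.read().strip()
    os.remove(claimed)
    return url
```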

Changes #

Startup notices are plain text - the config-drift and outdated-image notices are now single-line messages in the same style as the rest of the startup output, not bordered boxes, and a blank line now separates the container bring-up from the preceding notices.
Removed two cosmetic opt-out env vars - CLEAT_NO_WHATS_NEW and CLEAT_NO_REBUILD_CHECK are gone. Both gated local, non-blocking, TTY-only output that needed no escape hatch. The two network opt-outs (CLEAT_NO_UPDATE_CHECK, CLEAT_NO_CLAUDE_UPDATE_CHECK) remain, since disabling a phone-home is legitimate for a sandbox tool.
895 (+31) behavioral tests across 37 files - new whats_new.bats (the release-highlight state machine) and browser_bridge.bats (consume-once and orphan self-exit), plus fingerprint order-independence and value-exclusion cases, a CLAUDE_CHECK_INTERVAL stale-side bracket, the CLAUDE_ENV exact-key-set guard and a startup-spacing test
91 (+12) mutations caught - covering version-out-of-fingerprint, cap and env-key sorting, env-value exclusion, the bounded-highlight cap, the plain-text drift and image notices, the rebuild and image-ready blank-line rules, the consume-once rename, the orphaned-watcher self-exit and the 10-minute Claude-check window

v0.14.0 #

2026-06-06

Boxes - multiple named, least-privilege sandboxes per project over the same live files - plus runtime hardening for long-lived containers, the node:24 base image and a single-inspect cleat ps / cleat status.

Run a locked-down dev box beside a cloud-capable az box on the same /workspace, each with its own capabilities, writable layer and Claude session. The default box is byte-identical to the pre-boxes container, so existing projects upgrade with zero migration.

Features #

Boxes - named per-project sandboxes - cleat <verb> [box] runs a named, isolated container scoped to the current directory, mounting the same live files as every other box for that project. A locked-down dev box can run beside a cloud-capable az box over the same repo, where the dev agent can't reach the Docker socket or cloud token the az box holds. Containers are named cleat-<dir>-<hash>-<box> (project hash preserved, dir re-truncated under Docker's 63-char limit). The default box (main or empty) is byte-identical to the legacy cleat-<dir>-<hash> container, so existing projects keep working unchanged. Per-box Claude sessions, history and .claude.json (re-keyed by box, with the default box keeping the legacy key so history survives the upgrade), plus an sh.cleat.box container label.
Per-box capabilities and trust - a box's caps come from .cleat.<box>, which replaces .cleat rather than merging with it, so a box can declare *fewer* caps than the project default (the whole point of least privilege), with fallback to .cleat, plus .cleat.<box>.env for env. Per-box workspace trust uses a 3-column trust file (legacy two-column rows still read as the main box). cleat config <box> --enable <cap> writes per-box config.
Box management - status, ps, describe - cleat status lists this project's boxes and their running state. cleat ps gains a box column. cleat describe <box> [text] (and --desc at start) sets a host-side description that never recreates the container or wipes its writable layer.

Fixes #

Docker cap survives Docker restarts - the socket's owning GID is re-resolved and coder re-added to that group on every session exec (_heal_docker_sock, run as root, timeout-wrapped, a no-op when the cap is off), not just once at container start. A Docker Desktop restart that renumbered the socket GID under a days-old running container used to wedge docker with permission denied - a bare docker exec never re-ran the entrypoint - and now self-heals on cleat, cleat shell and cleat login. The entrypoint and heal path use groupmod to re-point the docker-host group idempotently across GID changes. A dead socket inode (Desktop replaced the socket) is detected and surfaced with cleat stop && cleat resume recovery guidance rather than auto-restarting and killing a live session.
clip-daemon fork-storm - socat now takes a -T 5 inactivity timeout, so a client that connects and never sends can't leave a handler hung on head -c forever. Accumulated hung handlers were exhausting the container's PIDs (fork: Resource temporarily unavailable) and crashing Claude with exit 254. --pids-limit raised from 1024 to 4096 for headroom.

Changes #

Base image node:20-bookworm-slim → node:24-bookworm-slim - current LTS, same Debian bookworm so the Docker CE apt repo line and the node:node 1000:1000 user the entrypoint expects are unchanged. Claude Code is a standalone binary, so the bundled node version doesn't affect it.
Single combined docker inspect for cleat ps and cleat status - both now read a row's box label, running state and /workspace source from one docker inspect per container (was two inspects plus a docker ps in ps and an inspect plus an is_running/container_exists re-probe in status). Field order box|running|path keeps a literal | in a project path from corrupting the parse.
864 (+104) behavioral tests across 35 files - new box_name.bats, boxes.bats, box_hardening.bats, box_fuzz.bats (property/fuzz tests over a cross-product of adversarial paths and box names) and docker_cap.bats, plus box cases in container_name.bats and adapted binary-level smoke, terminal, stub and integration tests
79 (+12) mutations caught - covering default-box and main-container byte-identity, session-key box threading, caps replace-not-merge, cleat status box discovery, description lifecycle, the docker-cap session self-heal, the socat idle timeout and the ps/status single-inspect parse
The .claude_update_check runtime artifact is now ignored via .gitignore
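The box|running|path field order from the single-inspect change works because the one free-form field comes last. A Python sketch, assuming each row is formatted as box|running|path with running printed as true or false (parse_inspect_row is a hypothetical name):

```python
def parse_inspect_row(row):
    """Split one box|running|path row from a combined docker inspect.

    The path is the last field, so splitting at most twice leaves any
    literal '|' inside the project path intact.
    """
    box, running, path = row.rstrip("\n").split("|", 2)
    return box, running == "true", path
```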

v0.13.1 #

2026-06-05

Fixes a startup freeze and a recurring clipboard-daemon permission error that surfaced after upgrading to v0.13.0 on hosts whose UID differs from the image (every macOS user).

Both trace to a freshly recreated container: Claude Code's own launch-time self-updater could hang the TUI and a session could start before the container finished remapping its user - leaving clipboard files owned by the wrong UID.

Fixes #

Startup freeze after recreate - On a freshly recreated container, Claude Code's launch-time self-updater could run under docker exec and hang the terminal. Cleat now sets DISABLE_AUTOUPDATER=1 for every session and in the image - Cleat owns the bundled Claude version via the image and cleat upgrade-claude, so an in-container self-update is both ephemeral and unnecessary. Manual claude update still works.
Clipboard "Operation not permitted" storm - A session could docker exec into a just-created container before the entrypoint finished remapping the coder user to the host UID, so clip-daemon created /tmp/clip.* owned by the stale image UID. Later correctly-mapped sessions then couldn't clean them from the sticky /tmp and spewed rm/socat errors on every launch. Fixed in depth: sessions wait for the UID remap before launching (cleat, cleat shell, cleat login). clip-daemon keeps its socket, pidfile and handler in a per-UID directory (/tmp/cleat-run-<uid>) so two UIDs can never collide. socat uses unlink-early. And the entrypoint clears stale runtime files as root on every start, auto-healing already-affected containers on recreate.
cleat upgrade-claude ownership - The throwaway upgrade container now receives the host's UID and GID (HOST_UID/HOST_GID), so the committed ~/.local is owned by the runtime user and manual claude update keeps working after an upgrade.

Changes #

760 (+5) behavioral tests across 30 files - new regression tests for the auto-updater disable, the UID-remap wait, the per-UID clip directory and the clip↔clip-daemon socket-path consistency, plus an entrypoint stale-file-cleanup test
67 (+5) mutations caught - one per fix above

v0.13.0 #

2026-06-03

The bundled Claude Code stays current and per-container state stays put - adds cleat upgrade-claude and an on-start update prompt, fixes native claude update, makes ad-hoc installs survive restarts and removes the cloud CLI caps in favor of host install + the env cap.

Claude Code is no longer frozen for the life of an image: you can bump it in place and Cleat offers to do it for you when a newer build ships. Anything you install inside a container - and its login - now survives cleat restarts instead of silently vanishing. Capabilities slim to two categories (mount / sandbox).

Features #

cleat upgrade-claude [stable|latest|VERSION] - Upgrade the Claude Code build bundled in the image without a full rebuild. Runs the official installer in a throwaway container, docker commits the result back over the local image, reports the before → after version and offers to recreate the current project's container so the new build is live immediately. Hardened: strict channel validation (rejects shell injection like 2.1.0; rm -rf ~), set -euo pipefail in the in-container install so a failed download can't silently commit an unchanged image, post-install binary verification and best-effort orphaned-layer cleanup
On-start update prompt - An interactive cleat start checks at most once every 10 minutes (cached) whether a newer Claude Code than the image bundles is available and offers a durable image upgrade before launching. TTY-only and network-failure-safe so it never blocks scripts or a flaky connection. Strictly-newer so it never nags to downgrade. Defaults to the latest channel - override with CLEAT_CLAUDE_CHANNEL, or disable entirely with CLEAT_NO_CLAUDE_UPDATE_CHECK=1
Two more on-start prompts: image rebuild and CLI self-update - When the local image was built by an older Cleat, an interactive *"Rebuild the image now? [Y/n]"* offers to rebuild and recreate this project's container before launch (replacing the old static "run cleat rebuild" notice, disable with CLEAT_NO_REBUILD_CHECK=1). When a newer Cleat release is available, *"Upgrade now? [Y/n]"* applies the update and re-execs the new version for this run (CLEAT_NO_UPDATE_CHECK=1 to disable). Both are TTY-only and never nag to downgrade
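Two guards from the features above, strict channel validation and the strictly-newer check, can be sketched in Python. validate_channel and is_strictly_newer are hypothetical names, and the sketch assumes versions are plain dotted integers like 2.1.161. fullmatch rejects "2.1.0; rm -rf ~" outright instead of accepting its valid prefix, and comparing integer tuples orders 2.1.10 above 2.1.9, which a string comparison gets wrong.

```python
import re

CHANNEL_RE = re.compile(r"(?:stable|latest|[0-9]+\.[0-9]+\.[0-9]+)")


def validate_channel(channel):
    """Accept only stable, latest or an exact X.Y.Z version."""
    if not CHANNEL_RE.fullmatch(channel):
        raise ValueError(f"invalid channel: {channel!r}")
    return channel


def is_strictly_newer(candidate, current):
    """True only when candidate > current, compared numerically per part,
    so the prompt never offers a downgrade or a same-version reinstall."""
    def as_tuple(version):
        return tuple(int(part) for part in version.split("."))
    return as_tuple(candidate) > as_tuple(current)
```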

Fixes #

Native claude update works inside the sandbox - claude update and the on-launch auto-updater failed with EACCES on any host whose UID isn't 1000 (every macOS user): the entrypoint remapped the coder user at runtime but never chowned ~/.local, where Claude Code's binary store lives, so it stayed owned by the build UID. The entrypoint now runs chown -R on ~/.local after the remap, so the native update path works and claude doctor reports a healthy install. In-container updates remain ephemeral - cleat upgrade-claude is the durable path
Ad-hoc installs and logins survive cleat restarts - The per-container settings overlay, clipboard bridge and hook spool moved from /tmp/cleat-*-<container> to a persistent ~/.config/cleat/run/<container>/. macOS /tmp file rotation and reboots were deleting a container's bind-mount sources, forcing cleat to recreate the container and silently discard its writable layer - so a tool you installed inside (e.g. an az CLI) and its login vanished on the next start even though you never ran cleat rm. Persistent sources let cleat resume the container instead. cleat clean now prunes runtime dirs for containers that no longer exist, cleat nuke removes them all and containers created before this release migrate automatically on their next start
No more "Configuration Error / Unexpected EOF" at startup - Cleat mounted the single host ~/.claude.json read-write into every container. Since all containers run at /workspace, concurrent or interrupted writes truncated the shared file and Claude greeted you with a JSON parse error. Cleat now builds an isolated, persistent per-project ~/.claude.json (host global state as the base + this project's own approvals) and mounts that - containers never write the host file, so the corruption is gone by construction. This also fixes silent cross-project bleed: trust-dialog, MCP-server and allowedTools approvals were shared across unrelated projects and are now per-project (surviving cleat rm). A corrupt host file is backed up to ~/.claude.json.bak and left untouched rather than silently reset. Auth/login are unaffected (credentials live in ~/.claude/.credentials.json)
open / xdg-open no longer blocks on an interactive terminal - the in-container browser-bridge shim read a URL from stdin (cat) when invoked with no argument. On a terminal that blocked forever instead of printing usage. It now only consumes stdin when stdin is not a tty, so a piped URL (printf '%s' "$url" | open) still forwards while a bare open falls through cleanly.

Changes #

Removed the cloud CLI caps az, aws, gcloud (and the lazy-install framework) - at ~150-250 MB each they dominated the image's perceived weight and first-run time without earning it, so cloud CLIs now belong on the host. Pass credentials into the sandbox with the env cap (e.g. AWS_ACCESS_KEY_ID or AZURE_* in .cleat.env). The post-launch capability display collapses from three categories (mount / cloud / sandbox) to two (mount / sandbox), reversing the v0.11.0-v0.12.0 cloud-cap work
Startup output polish - Warnings (the docker-socket caution, the sandbox: caps row) now render in amber instead of plain yellow, distinct from neutral status markers. The docker-socket sandbox-break warning is amber across its whole line (not just the !), matching the docker entry in the sandbox: row. The Claude-upgrade result folds onto one line (✔ Claude Code upgraded (2.1.156 → 2.1.161)). Every interactive Y/n prompt is aligned under its headline via a shared helper. And the Project: summary row tells the truth under the docker cap - ~/proj (same path, sandboxed) since the container's workdir is the host path there, not /workspace
macOS CI pinned to macos-15 ahead of GitHub's macos-latest → macOS 26 migration (Jun 15 - Jul 15 2026), keeping the bash 3.2 compatibility canary deterministic
755 (+84) behavioral tests across 30 files - new upgrade_claude.bats, claude_update_check.bats, entrypoint.bats, run_dir.bats, claude_json.bats and image_rebuild_check.bats suites. The az/aws/gcloud capability tests were removed with the feature
62 (+22) mutations caught - covering channel validation, the install pipefail guard, the ~/.local chown, the on-start update check's strictly-newer / channel-injection / TTY-only guards, the runtime-dir relocation and the per-project .claude.json isolation mount
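The cleat clean pruning of runtime dirs (part of the runtime-dir relocation) can be pictured with a minimal sketch. It assumes each container's runtime dir sits directly under one root, such as ~/.config/cleat/run/, and that the names of existing containers arrive on stdin. prune_runtime_dirs is a hypothetical helper, not Cleat's implementation.

```shell
#!/usr/bin/env bash
# Live container names are read from stdin, one per line. In practice they
# could come from: docker ps -a --format '{{.Names}}'
prune_runtime_dirs() {
  local run_root="$1" live dir name
  live=$(cat)
  for dir in "$run_root"/*/; do
    [ -d "$dir" ] || continue            # glob matched nothing
    name=$(basename "$dir")
    # Keep the dir if its container still exists (exact whole-line match).
    if printf '%s\n' "$live" | grep -qxF -- "$name"; then
      continue
    fi
    rm -rf -- "$dir"
    echo "pruned $name"
  done
}
```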

v0.12.3 #

2026-05-03

Fixes a long-standing cleat start failure mode - when /tmp rotated individual overlay files but kept the parent dir, the existing container would fail to start with an opaque OCI runtime "not a directory" error instead of cleanly recreating.

In the reported case, the user declined the drift recreate prompt and cleat start then aborted with error mounting "/host_mnt/private/tmp/cleat-settings-<cname>/project-settings.local.json" ... not a directory: Are you trying to mount a directory onto a file (or vice-versa)? Docker auto-creates a missing bind source as a directory, which can't mount onto a file destination inside the container. The pre-fix stale-mount detection only checked that the dir existed, so this partial-rotation state slipped past the gate and into docker start.

Fixes #

Stale-mount detection covers per-file rotation - new _settings_overlay_intact helper enumerates the container's bind sources via docker inspect and verifies each source inside /tmp/cleat-settings-<cname> is a regular file before docker start. cmd_start auto-recreates and cmd_resume errors out with "host paths changed" guidance when any expected file is missing or the wrong type - the dir-only check would let this state through to an opaque OCI runtime failure
cmd_resume stale-mount message generalized - "host was rebooted" → "host paths changed" so the message covers reboots AND partial-/tmp-rotation cases the new check catches
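The per-file check behind _settings_overlay_intact can be sketched as follows. overlay_files_intact is a hypothetical name, and its arguments are assumed to be the overlay dir followed by the container's bind sources. Those could be listed with docker inspect -f '{{range .Mounts}}{{println .Source}}{{end}}' <cname>.

```shell
#!/usr/bin/env bash
overlay_files_intact() {
  local overlay_dir="$1" src
  shift
  for src in "$@"; do
    case "$src" in
      "$overlay_dir"/*)
        # A missing file, or a directory Docker auto-created in its place,
        # would later fail at docker start with "not a directory".
        [ -f "$src" ] || return 1
        ;;
    esac
  done
  return 0
}
```

Checking the file type, and not only existence, is what catches the auto-created directory case.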

Changes #

671 (+1) behavioral tests across 24 files - 1 new regression test in regressions.bats simulates the partial-rotation state (overlay dir present + one referenced file missing) and asserts cmd_start auto-recreates instead of falling through to docker start
40 (+1) mutations caught - new v0.12.3_overlay_intact_per_file_check mutation deletes the per-file regular-file check from _settings_overlay_intact and confirms the regression test fails
The resume: stale mounts show clear error directing to cleat start test was updated to match the new wording

v0.12.2 #

2026-05-03

Fixes a v0.12.1 papercut - the new drift recreate prompt rendered as garbled \033[1m...\033[0m literals instead of the intended bold container name. Also fixes a CI-only flake in the _hook_bridge_cleanup test that was masking real failures under ./test.sh.

The prompt was shipped using echo -n, which prints backslash escapes verbatim. ${BOLD} and ${RESET} are ANSI escape strings that need echo -e to be interpreted. Users hitting the prompt saw Recreate \033[1mcleat-foo\033[0m now? [Y/n] instead of the intended bold container name.

Fixes #

Drift recreate prompt renders ANSI escapes - _resolve_config_drift switched from echo -n to echo -en so ${BOLD} and ${RESET} are interpreted as ANSI escape sequences instead of printed verbatim
Flaky _hook_bridge_cleanup test - the test backgrounded sleep 60 & directly in the bats shell, so when _hook_bridge_cleanup killed the children, bats's DEBUG trap raced bash's SIGCHLD reaper and bash emitted wait_for: No record of process to stderr - flipping the suite to "1 failed" under ./test.sh while passing in isolation. Disowned the backgrounded sleeps so bats's job table doesn't reference them. Also tightened the test: it was iterating _HOOK_BRIDGE_CHILDREN *after* _hook_bridge_cleanup zeros it, so the kill -0 assertions never ran - now snapshots the PIDs into locals first
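The echo -n versus echo -en difference can be reproduced in isolation. This sketch assumes ${BOLD} and ${RESET} hold literal backslash sequences, which is what the symptom implies. The prompt_* functions are hypothetical. Bash's builtin echo leaves these sequences verbatim unless -e is given (or the xpg_echo option is set).

```shell
#!/usr/bin/env bash
BOLD='\033[1m'    # literal backslash sequence, not an ESC byte
RESET='\033[0m'
cname=cleat-foo

prompt_broken() { echo -n "Recreate ${BOLD}${cname}${RESET} now? [Y/n] "; }
prompt_fixed()  { echo -en "Recreate ${BOLD}${cname}${RESET} now? [Y/n] "; }
# printf's %b is a more portable spelling of the same fix.
prompt_printf() { printf '%b' "Recreate ${BOLD}${cname}${RESET} now? [Y/n] "; }
```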

Changes #

670 (+1) behavioral tests across 24 files - 1 new regression test in regressions.bats asserts the prompt output contains no literal \033[1m / \033[0m substrings
39 (+1) mutations caught - new v0.12.1_drift_prompt_ansi mutation reverts echo -en back to echo -n and confirms the regression test fails

v0.12.1 #

2026-04-28

Drift detection now prompts to recreate the container interactively instead of just printing a notice - closes the most common UX gap users hit after cleat config --enable <cap> followed by cleat.

The fingerprint-based drift detection that landed in v0.3.0 already covered cap and env-key changes, but the response was a static "Run: cleat rm && cleat" notice. Users who enabled hooks (or any other cap) on an existing container kept the old mount set and silently saw nothing change - hooks never fired, env vars were missing, etc. Now cleat, cleat resume and cleat claude ask "Recreate <cname> now? [Y/n]" before any docker operation. Sessions persist on the host (~/.claude/projects/<key>/) and survive the rebuild, so accepting is safe by default.

Fixes #

Auto-prompt drift recreate - new _resolve_config_drift helper invoked early in cmd_start, cmd_resume and cmd_claude (before any docker operation). On a TTY with config drift it prompts the user, stops + removes the container, cleans /tmp/cleat-{hooks,clip,settings}-<cname> and falls into the existing "no container" path so cmd_run rebuilds with the new caps/env. Non-TTY runs (CI, scripts) print the legacy informational notice and continue with the existing container - Cleat never auto-destroys without explicit consent
Flaky docker test stub - test/fixtures/mock_bin/docker routed ps calls to ps_a_output whenever *"-a"* matched the args as a substring. Container names of the form cleat-<basename>-<hash8> carry a literal -a whenever the random hash starts with hex digit a - roughly 1 in 16 runs flipped is_running to falsely report true on brand-new containers and broke docker_commands.bats on Ubuntu CI. Switched to the same token-bounded pattern ( -a | --all ) the test-side mocks already use
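The stub bug and its fix come down to two case patterns. This minimal sketch uses hypothetical function names. The first matches "-a" anywhere in the joined arguments, so a container name whose hash starts with a matches too. The second pads the argument list with spaces so that only whole arguments match.

```shell
#!/usr/bin/env bash
# Substring match: "cleat-proj-a1b2c3d4" contains "-a", so this misfires.
wants_all_substring() {
  case "$*" in *"-a"*) return 0 ;; *) return 1 ;; esac
}

# Token-bounded match: only a standalone -a or --all argument counts.
wants_all_token() {
  case " $* " in *" -a "*|*" --all "*) return 0 ;; *) return 1 ;; esac
}
```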

Changes #

check_drift is now image-version-only - config drift moved into _resolve_config_drift, which is called earlier in the lifecycle so the prompt fires before docker start of a stale container
669 (+8) behavioral tests across 24 files - 5 new drift-resolution unit tests in capabilities.bats (no-container no-op, hash-match no-op, non-TTY notice, TTY accept removes container, TTY decline keeps container), 3 new stub-routing tests in stub_validation.bats (token-bounded -a , --all , default-to-ps), 1 new regression test in regressions.bats pinning the wiring of _resolve_config_drift from cmd_start
38 (+1) mutations caught - new v0.12.1_drift_recreate_wired mutation drops the _resolve_config_drift call from cmd_start and confirms the regression test fails
regression v0.5.1 updated to also mock docker ps -a so the new cmd_run fallback in cmd_claude (post-drift recreate) doesn't mask the original _RESOLVED_PROJECT assertion
_resolve_config_drift no-op stub added to test files that exercise cmd_start / cmd_resume / cmd_claude (hooks, regressions, edge_cases, terminal_ux, capabilities, start_resume, docker_commands)
Documentation: concept/10-capabilities.md "Drift notice" section reflects the interactive flow. cli/README.md "Configuration drift detection" section now describes the prompt instead of the old static notice

v0.12.0 #

2026-04-26

aws and gcloud caps round out cloud CLI coverage and the post-launch caps display now groups capabilities by behavioral category - same UI in the CLI and on the landing page.

az introduced the lazy-install framework in v0.11.0. aws and gcloud reuse it. The summary block previously rendered active caps as a single inline row, which scaled poorly past four or five names. The new categorized renderer breaks them into mount / cloud / sandbox lines with consistent colour coding so the categorization itself teaches users what each cap actually does.

Features #

aws capability - cleat config --enable aws or cleat --cap aws mounts ~/.aws (read-write) so aws configure and SSO sessions persist on the host. AWS CLI v2 (~150 MB) is lazy-installed inside the container on first activation from the official awscli-exe-linux-{x86_64,aarch64}.zip bundle, with architecture auto-detected via dpkg --print-architecture
gcloud capability - cleat config --enable gcloud or cleat --cap gcloud mounts ~/.config/gcloud (read-write) so gcloud auth login credentials persist on the host. The Google Cloud SDK (~200 MB) is lazy-installed via Google's official Debian repo at packages.cloud.google.com, pinned by GPG keyring at /etc/apt/keyrings/cloud.google.gpg
Categorized caps display - _print_caps groups active caps into mount (green: git, ssh, env, hooks, gh), cloud (blue: az, aws, gcloud) and sandbox (amber: docker). Single-line form when only one category is active, multi-line block with category labels and inline notes ((lazy install), (breaks isolation)) when caps span two or more categories. The same renderer drives both the post-launch summary block and cleat status
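A single cap-to-category mapping, plus a renderer that groups by it, can be sketched like this. cap_category and group_caps are hypothetical stand-ins for _cap_category and _print_caps. They drop colours and notes, and the "unknown" fallback is an assumption.

```shell
#!/usr/bin/env bash
cap_category() {
  case "$1" in
    git|ssh|env|hooks|gh) echo mount ;;
    az|aws|gcloud)        echo cloud ;;
    docker)               echo sandbox ;;
    *)                    echo unknown ;;
  esac
}

# Print one "category: caps" line per non-empty category, in display order.
group_caps() {
  local category cap line
  for category in mount cloud sandbox; do
    line=""
    for cap in "$@"; do
      if [ "$(cap_category "$cap")" = "$category" ]; then
        line="$line $cap"
      fi
    done
    if [ -n "$line" ]; then
      echo "$category:$line"
    fi
  done
}
```

Keeping the mapping in one function means a new cap only needs a new case arm. The renderer does not change.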

Changes #

KNOWN_CAPS adds aws and gcloud. Both join LAZY_CAPS so the existing _run_lazy_installs machinery picks them up automatically. _lazy_cap_label and _lazy_cap_probe get the new entries. _cap_description gains the picker copy
_cap_category is the single source of truth for the mount/cloud/sandbox mapping. Adding a new cap means deciding which category it falls into. The renderer handles the rest
661 (+29) behavioral tests across 24 files - 12 new aws/gcloud unit tests in capabilities.bats (mounts, registration, descriptions, install paths), 13 new categorization-display unit tests (_cap_category, _caps_bucket_active, _print_caps single-line / multi-line / empty-category branches), 4 new smoke tests for --cap aws / --cap gcloud flag parsing and config round-trips
Landing page Hero, ProblemSolution, HowItWorks and Features mockups updated to render the new categorized output 1:1 with the CLI - same labels, same colours, same indentation
Full design in concept/10-capabilities.md (new aws and gcloud cap sections + Display categories table). docs/cli.md and cli/README.md updated to match

v0.11.0 #

2026-04-25

az capability and a reusable lazy-install framework - opt-in tools too large to ship in the base image now install inside the container on first activation, with auth dirs persisted on the host.

The gh and docker caps already pre-install their CLIs in every image. That doesn't scale to cloud-vendor tooling: azure-cli is ~250 MB, awscli ~80 MB, google-cloud-cli ~200 MB. Pre-installing all of them would inflate the image for every user. The new lazy-install framework keeps the base image lean and pushes the cost to users who actually opt in. az is the first cap to use it. aws and gcloud are queued to follow the same pattern.

Features #

az capability - cleat config --enable az or cleat --cap az mounts ~/.azure (read-write) so az login tokens persist on the host across cleat rm, cleat nuke and cleat rebuild. Same auth-persistence model as gh
Lazy-install framework - caps listed in the new LAZY_CAPS registry have an install script at cli/docker/cap-installs/<cap>.sh. After docker run -d, cleat probes the container with command -v <tool>. If absent, it runs the install script via docker exec --user root with a spinner. Subsequent starts hit the fast path and skip entirely. Aborts cleat on install failure rather than silently launching a half-broken environment
Per-container install scope - the tool itself lives inside the container (lost on cleat rm, preserved across cleat resume and Docker daemon restarts). Auth dirs always bind-mount from the host, so credentials survive every container lifecycle operation
Audit-friendly install path - cap-installs/az.sh spells out the apt repo + GPG keyring steps explicitly (Microsoft's official Debian 12 repo at packages.microsoft.com) rather than piping aka.ms/InstallAzureCLIDeb to bash, so each step is reviewable
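The probe-then-install flow can be sketched as a single function. lazy_install is hypothetical, and so is the way the script reaches the container: this sketch streams cli/docker/cap-installs/<cap>.sh into bash -s over docker exec -i, which may not match how Cleat actually runs it.

```shell
#!/usr/bin/env bash
lazy_install() {
  local cname="$1" cap="$2" tool="$3" script_dir="$4"
  # Fast path: the tool is already on PATH inside this container.
  if docker exec "$cname" sh -c 'command -v "$1"' sh "$tool" >/dev/null 2>&1; then
    return 0
  fi
  echo "Installing $cap..." >&2
  # Slow path: run the cap's install script in a root shell.
  if ! docker exec -i --user root "$cname" bash -s < "$script_dir/$cap.sh"; then
    echo "Install of $cap failed; not launching a half-broken environment" >&2
    return 1
  fi
}
```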

Changes #

KNOWN_CAPS adds az. The config picker, cleat config --list and --cap validation pick it up automatically
_run_lazy_installs is invoked from cmd_start, cmd_resume and cmd_claude before exec_claude - so any path that launches Claude in a container with a lazy cap active gets the install
_lazy_cap_is_installed is exposed as an override point so tests can simulate both the missing-tool and present-tool branches without an actual command -v round-trip
632 (+12) behavioral tests across 24 files - 9 az-cap unit tests in capabilities.bats (mounts, registration, description, install/skip/no-op/failure paths), 2 smoke tests in smoke.bats (--cap az --help, cleat config --enable az round-trip)
36 mutations caught
Full design in concept/10-capabilities.md → "Lazy install capabilities" section + dedicated az cap section

v0.10.1 #

2026-04-24

First-run no longer rebuilds locally when a transient pull error hides an already-cached prebuilt image.

_do_pull always issued a network pull against GHCR even when the version-tagged prebuilt image was already on disk. A transient registry, network, or auth error there flipped the image into "unavailable" and triggered a 2-5 min local rebuild - even though the prebuilt image was sitting in the local image store waiting to be reused.

Fixes #

Reuse cached prebuilt image without a network call - _do_pull short-circuits when ghcr.io/cleatdev/cleat:v${VERSION} is already in the local image store. It retags as cleat and returns success without touching the registry. Eliminates spurious "Prebuilt image unavailable, building locally" warnings from transient network/auth blips
Cache-hit retag failure falls through to network pull - if docker tag fails after a cache hit (disk full, permission, weird image-store state), _do_pull prints a yellow ! Cached prebuilt image found but could not be tagged warning and falls through to the normal pull flow, instead of silently lying about success
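The cache short-circuit and its tag-failure fallthrough can be sketched as follows. pull_prebuilt is hypothetical, VERSION is assumed to be set by the caller, and the size shown in the real success message is omitted.

```shell
#!/usr/bin/env bash
pull_prebuilt() {
  local image="ghcr.io/cleatdev/cleat:v${VERSION}"
  # Cache hit: reuse the local image without touching the registry.
  if docker image inspect "$image" >/dev/null 2>&1; then
    if docker tag "$image" cleat; then
      echo "Image ready (cached v${VERSION})"
      return 0
    fi
    echo "! Cached prebuilt image found but could not be tagged" >&2
  fi
  # Normal network pull. A failure here lets the caller fall back to a
  # local build, e.g. pull_prebuilt || build_locally.
  docker pull "$image" >/dev/null && docker tag "$image" cleat
}
```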

Changes #

Cache-hit success message includes image size for parity with the post-pull message: Image ready (cached v0.10.1, 487 MB)
Docker test stub gained DOCKER_TAG_EXIT_CODE env var and a cached_images mock file backing docker image inspect for testing the cache short-circuit and tag-failure paths
620 (+4) behavioral tests (1 cache-hit unit test in docker_commands.bats, 2 regression tests in regressions.bats)
37 (+2) mutations caught - short-circuit revert + tag-failure-fallthrough revert

v0.10.0 #

2026-04-22

Host Docker daemon access + workspace trust - test dockerized apps from inside the sandbox, without letting untrusted .cleat files silently escalate capabilities.

Two interlocking features. The docker capability mounts your host's Docker socket so docker compose up, docker compose exec and docker build all work against your real daemon from inside Cleat - sibling containers, zero overhead, no DinD. Workspace trust then hardens every capability against supply-chain attacks by gating project-level .cleat files through a per-project approval prompt, so cloning a random repo can no longer silently grant sandbox-escaping Docker access.

Features #

docker capability - cleat config --enable docker or cleat --cap docker mounts /var/run/docker.sock so the container's docker CLI talks to your host daemon. Containers launched from inside Cleat run as siblings on the host, not nested - zero virtualization overhead
Host-path identity mount - when the docker cap is active, the project is bind-mounted at its host path (in addition to /workspace) and --workdir is set there. $(pwd) returns a host-valid path, so docker run -v $(pwd):/app, docker build . and docker-compose.yml relative paths all resolve correctly on the host daemon. CLEAT_HOST_PROJECT is exported for scripts
Docker CLI + compose in the image - docker-ce-cli and docker-compose-plugin are installed in the container (no daemon). Entrypoint stats the mounted socket's GID and adds coder to a matching group so the user can actually talk to the daemon
Workspace trust - a project's .cleat file is now gated through per-project approval. On first launch Cleat prompts with a box listing the requested capabilities. Approval is stored at ~/.config/cleat/trust (mode 0600), keyed on a hash of the canonical (sorted, deduped) cap list. Comment edits and cap reordering don't invalidate trust. Adding, removing, or changing a cap triggers a re-prompt
Scripting escape hatches - --trust-project global flag and CLEAT_TRUST_PROJECT=1 env var bypass the prompt and record approval in one step. Non-interactive contexts without either opt-in silently default-deny project .cleat caps (global config and --cap CLI flags still apply) - the supply-chain protection
cleat trust / cleat untrust subcommands - cleat trust [path] records approval non-interactively, cleat trust --list shows trusted projects with a yellow marker for ones whose .cleat has drifted since approval, cleat untrust [path] removes approval. Safe no-op on missing or unknown paths
Docker-cap startup warning - when the cap is active, startup prints a yellow ! Docker socket mounted - container can create host-level processes line so the tradeoff is never silent
cleat status in readonly trust mode - never prompts and never modifies the trust file, regardless of TTY state. Safe for scripts that pipe through status
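The trust key's tolerance to comment edits and reordering follows from hashing a canonical cap list. This minimal sketch assumes .cleat holds whitespace-separated cap names with # comments; the real parser may differ. caps_trust_hash is hypothetical. _md5 mirrors the md5sum / md5 -q split and the awk '{print $1}' suffix strip mentioned in this changelog.

```shell
#!/usr/bin/env bash
_md5() {
  if command -v md5sum >/dev/null 2>&1; then
    md5sum | awk '{print $1}'   # drop md5sum's trailing "  -"
  else
    md5 -q                      # macOS / BSD
  fi
}

# Strip comments, split into one cap per line, sort and dedupe, then hash.
caps_trust_hash() {
  sed 's/#.*//' "$1" | tr -s ' \t\r' '\n' | grep -v '^$' | sort -u | _md5
}
```

Adding, removing or renaming a cap changes the sorted list and therefore the hash, which is what triggers the re-prompt.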

Fixes #

cleat resume after cleat rm - previously errored out with "No container found" because cmd_resume refused to create a container. Sessions persist on the host at ~/.claude/projects/<key>/, so the right behavior is to create a fresh container and launch Claude with --continue. Now it does. User visible: cleat rm && cleat resume just works and picks up the last session
cleat rm hint - adds a dim trailing line making it explicit that sessions are preserved: Sessions preserved. Run cleat resume to pick up where you left off.
Session overlay under --cap docker - with the docker cap, workdir is the host project path (not /workspace), so Claude Code encodes its session dir from that path (/Users/marcin/proj → projects/-Users-marcin-proj/) instead of the v0.8.0-assumed projects/-workspace/. Without a second overlay, sessions split between two host dirs. The docker cap block now mounts the per-project session dir at the host-path-derived key too, so sessions always land in the same per-project overlay regardless of which cap was active when they were created

Changes #

docker and gh capability descriptions rewritten in pure ASCII (no em-dashes) so _notice_box renders with correct alignment across POSIX and UTF-8 locales
_hash_cleat_caps pipes through awk '{print $1}' to strip md5sum's stdin-filename suffix - keeps the trust file hex-only
Trust file writes are atomic (temp + rename), 0600 permissions, refuse paths containing tab / newline / carriage return to protect the format
Session-scoped decision cache avoids double-prompts when resolve_caps is called multiple times per invocation
New test/unit/trust.bats (42 tests) plus 8 docker-cap tests in capabilities.bats, 8 new smoke tests, 2 new terminal-UX tests, 12 new regression tests in regressions.bats and 10 new mutation entries
616 (+74) behavioral tests across 24 files
35 (+7) mutations caught - all with a real revert confirming the test fails
Full design documented in concept/15-docker-capability.md and concept/16-workspace-trust.md
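The atomic 0600 trust-file write and the control-character guard can be sketched with a temp file and a rename. trust_record and the "<path><TAB><hash>" line format are hypothetical assumptions, not the documented trust file format.

```shell
#!/usr/bin/env bash
trust_record() {
  local file="$1" path="$2" hash="$3" tmp
  # A tab, newline or CR in the path would break the line format.
  case "$path" in
    *$'\t'*|*$'\n'*|*$'\r'*)
      echo "refusing path with control characters" >&2
      return 1 ;;
  esac
  # Create the temp file next to the target so the rename stays on one filesystem.
  tmp=$(umask 077; mktemp "${file}.XXXXXX") || return 1
  # Keep every other project's entry, then append the new one.
  if [ -f "$file" ]; then
    p="$path" awk -F '\t' '$1 != ENVIRON["p"]' "$file" > "$tmp"
  fi
  printf '%s\t%s\n' "$path" "$hash" >> "$tmp"
  chmod 600 "$tmp"
  mv -f "$tmp" "$file"   # rename replaces the old file in one step
}
```

Readers never see a half-written file: they get either the old contents or the new ones.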

v0.9.2 #

2026-04-19

First-run now pulls the prebuilt image from GHCR instead of building locally, plus live pull progress and terminal-output polish.

Fresh installs were always supposed to get the ~30s GHCR pull before falling back to a local build, but cmd_run's missing-image branch called the build function directly, skipping the pull entirely. Every clean install was paying the 2-5 min build cost even though a matching prebuilt image was waiting. The pull tag is also version-matched to the installed CLI now and the pull UX shows live layer progress instead of a silent spinner.

Features #

Live pull progress - _do_pull parses docker pull's line-per-event output to show N/M layers in real time on a single live-updating line, ending with the pulled version and image size (Image ready (pulled v0.9.2, 450 MB)). Non-TTY contexts get a single info line plus the success line
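The N/M layer counter can be sketched as a small stdin parser. It assumes Docker's usual per-layer status lines ("<id>: Pulling fs layer", "<id>: Already exists", "<id>: Pull complete"). Those strings are Docker's output, not a stable API. pull_progress is hypothetical.

```shell
#!/usr/bin/env bash
pull_progress() {
  local line id status total=0 done_n=0 seen=" "
  while IFS= read -r line; do
    case "$line" in
      *": "*) id=${line%%: *}; status=${line#*: } ;;
      *) continue ;;
    esac
    # Count each layer once toward the total.
    case "$status" in
      "Pulling fs layer"|"Already exists")
        case "$seen" in
          *" $id "*) ;;
          *) seen="$seen$id "; total=$((total + 1)) ;;
        esac ;;
    esac
    # Redraw a single live line whenever a layer finishes.
    case "$status" in
      "Pull complete"|"Already exists")
        done_n=$((done_n + 1))
        printf '\r\033[K%d/%d layers' "$done_n" "$total" ;;
    esac
  done
  printf '\n'
}
```

It could be fed with something like docker pull "$image" | pull_progress.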

Fixes #

First run skipped the prebuilt image pull - cmd_run's missing-image branch called _do_build directly. Only cleat build hit _do_pull. Result: every clean install paid a 2-5 min local build cost even though ghcr.io/cleatdev/cleat already had a matching image. Fix: _do_pull || _do_build on first run
Registry image tag was not version-matched - REGISTRY_IMAGE was hardcoded to :latest, meaning an installed v0.9.1 CLI would silently pull whatever shipped last to GHCR. Now :v${VERSION}, with a new REGISTRY_BASE so cmd_update can target the freshly-checked-out tag
Installer printed literal \033 escape codes - install.sh's spin_stop used printf %s which passes backslash escapes through unchanged. Messages built with ${BOLD}...${RESET} showed up as Downloaded to \033[1m/Users/you/.cleat\033[0m. Fix: %b to interpret escapes in the arg
Installer spinner left trailing text on shorter success lines - \r alone rewound the cursor without clearing the rest of the line, so Pinned to v0.9.1 overwriting Checking out latest release... produced Pinned to v0.9.1est release.... Fix: \r\033[K to clear to end of line
bin/cleat spinner had the same tail bug - Container started (17 chars) overwriting Starting container... (21 chars) left r... visible in the dim spinner color. Same \r\033[K fix
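Both installer fixes fit into one printf call. The function names in this sketch are hypothetical. \r returns to column 0, \033[K erases to the end of the line so a longer spinner message leaves no tail, and %b interprets escapes such as \033[1m inside the message argument.

```shell
#!/usr/bin/env bash
spin_stop_old() { printf '\r%s\n' "$1"; }         # before: no clear, escapes printed verbatim
spin_stop_new() { printf '\r\033[K%b\n' "$1"; }   # after: clear line, interpret escapes
```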

Changes #

Mutation runner (test/mutation_regressions.sh) now accepts an optional target file parameter so companion scripts like install.sh can be mutation-tested, not just bin/cleat
README.md: install URL now https://cleat.sh/install (short, branded, resolved to the latest tagged release by the Cloudflare worker) instead of raw GitHub on main
README.md: requirements section now lists Pro, Max, Team, Enterprise plans and API keys (was incorrectly "team or Pro plan")
README.md: first-run timing text reflects the pull-first flow (~30s pull with ~2 min local-build fallback)
542 (+4) behavioral tests (52 regressions)
25 (+5) mutations caught

v0.9.1 #

2026-04-14

macOS hardening - full bash 3.2 compatibility, 538 tests green on both platforms.

Config drift detection, the config command and all smoke tests were broken on macOS due to GNU-only commands (md5sum, timeout), bash 3.2 empty-array crashes and BSD sed incompatibilities. Every issue is fixed and 9 new tests cover the pull-fallback logic, portable hashing and update behavior.

Fixes #

Config drift on macOS - compute_config_fingerprint used bare md5sum which doesn't exist on macOS. Now uses the portable _md5 wrapper (md5 -q fallback)
Config command crash on bash 3.2 - ${current_caps[@]} expansion on an empty array triggers "unbound variable" under set -u. Fixed with the safe ${arr[@]+"${arr[@]}"} pattern
Smoke tests on macOS - timeout command doesn't exist on macOS. Replaced with portable perl -e 'alarm shift @ARGV; exec @ARGV' fallback
BSD sed compatibility - all sed -i calls in tests now use -i.bak (BSD sed requires a backup extension)
Smoke _compute_cname - reimplemented inline instead of sourcing the entire CLI (which failed on bash 3.2 due to strict-mode interactions)
Version box alignment test - replaced locale-dependent awk with bash ${#} for consistent width measurement
Docker start failure test - mock docker now fails for both start and run, handling macOS CI TTY recovery edge case
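Two of the portability patterns above can be shown standalone. The function names are hypothetical. The ${arr[@]+"${arr[@]}"} form expands to nothing for an empty array, so it is safe under set -u on bash 3.2, and to the quoted elements otherwise. The perl fallback uses alarm, which survives exec, to approximate timeout(1) where coreutils is missing.

```shell
#!/usr/bin/env bash
count_args() { echo "$#"; }

expand_safely() {
  local -a current_caps
  current_caps=("$@")
  # Plain "${current_caps[@]}" on an empty array is "unbound" under set -u in bash 3.2.
  count_args ${current_caps[@]+"${current_caps[@]}"}
}

portable_timeout() {
  if command -v timeout >/dev/null 2>&1; then
    timeout "$@"
  else
    # SIGALRM kills the exec'd command after N seconds.
    perl -e 'alarm shift @ARGV; exec @ARGV' "$@"
  fi
}
```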

Changes #

Portable _md5 helper and _portable_timeout moved to shared test/setup.bash
CI diagnostic step verifies binary runs on macOS bash 3.2 before test suite
538 (+9) behavioral tests (46 docker_commands, 13 helpers, 9 update)
20 mutations caught
All 538 tests green on Linux (bash 5) and macOS (bash 3.2)

v0.9.0 #

2026-04-13

GitHub CLI capability + faster, lighter Docker image.

New gh capability gives the container access to your GitHub CLI auth - gh auth login inside any container writes tokens back to the host, so you authenticate once and it persists across rm, nuke and rebuild. The Docker image switches to node:20-bookworm-slim, drops vim and build-essential and adds pre-built image pull support for faster first starts.

Features #

gh capability - cleat config --enable gh mounts ~/.config/gh (read-write) into the container. gh auth login works via the browser bridge, tokens persist to the host. GH_TOKEN via --env or .cleat.env works as an alternative
Pre-built image pull - cleat start tries docker pull ghcr.io/cleatdev/cleat:latest before falling back to local build. cleat update also pulls the latest image after updating the CLI

Changes #

Docker image switched from debian:bookworm-slim to node:20-bookworm-slim - Node.js pre-installed, faster layer caching
Removed vim and build-essential from image - smaller footprint, users can apt install if needed
GitHub CLI (gh) pre-installed in the image via official apt repository
Docker stub handles pull and tag commands for test coverage
529 (+5) behavioral tests (45 capabilities, 33 smoke)
20 mutations caught

v0.8.1 #

2026-04-13

Fix arrow-up history leaking across projects.

The v0.8.0 per-project session overlay isolated projects/-workspace/ but missed ~/.claude/history.jsonl - the global input history file shared via the base ~/.claude mount. Arrow-up in Claude showed commands from other projects. Now history.jsonl is overlaid per-project alongside sessions.

Fixes #

History isolation - overlay history.jsonl with a per-project copy from the same session directory used for projects/-workspace, so each project has its own arrow-up history
macOS virtiofs compatibility - ensure ~/.claude/history.jsonl exists on the host before mounting the nested overlay (virtiofs rejects nested mounts when the target file is missing inside the parent bind source)

Changes #

524 (+4) behavioral tests (48 regressions, 31 smoke)
20 (+1) mutations caught

v0.8.0 #

2026-04-12

Per-project session isolation - each project gets its own Claude history.

Previously, all containers shared a single ~/.claude directory. cleat resume for project A showed Claude's conversation history from project B. Now each container mounts a per-project session directory so sessions, tasks and project memory are isolated. Auth and global settings remain shared.

Features #

Per-project sessions - each project's Claude Code sessions are stored under ~/.claude/projects/<basename>-<hash>/ on the host, mounted as an overlay at /home/coder/.claude/projects/-workspace inside the container. --continue finds the correct latest session for each project
Hash-based session keys - project paths are mapped to <lowercase-basename>-<md5-8char> keys, avoiding collisions from similar path names and normalizing case for macOS HFS+ compatibility

Fixes #

Session key collision - the initial tr '/' '-' approach mapped /a-b/c and /a/b-c to the same key, mixing sessions between unrelated projects. Switched to hash-based keys
macOS case sensitivity - session key basename is lowercased so MyProject and myproject share the same key on case-insensitive filesystems
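The move from tr '/' '-' to hash-based keys can be sketched in a few lines. session_key and old_key are hypothetical. This sketch assumes the hash is taken over the lowercased absolute path, so that case variants share a key as described above. The actual hash input is not specified here.

```shell
#!/usr/bin/env bash
_md5() {
  if command -v md5sum >/dev/null 2>&1; then
    md5sum | awk '{print $1}'
  else
    md5 -q
  fi
}

# <lowercase-basename>-<first 8 hex chars of md5>
session_key() {
  local lower
  lower=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  printf '%s-%s\n' "$(basename "$lower")" "$(printf '%s' "$lower" | _md5 | cut -c1-8)"
}

# The old mapping: /a-b/c and /a/b-c both become -a-b-c.
old_key() { printf '%s\n' "$1" | tr '/' '-'; }
```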

Changes #

cmd_rm and cmd_nuke preserve session directories on the host (only container temp dirs are cleaned)
520 (+9) behavioral tests (19 mutations caught)

v0.7.0 #

2026-04-12

511 tests. Zero regressions ever again.

Three independent test layers - regression registry, real-binary smoke tests and a hardened Docker stub - now catch every class of bug that previously shipped undetected. The test suite grew from 383 to 511 tests, all pre-existing failures were fixed and every regression test is mutation-verified to prove it catches its target bug.

Features #

Regression test registry - 45 tests, one per historical bug from v0.5.1 through v0.6.5, each mutation-tested to confirm it catches the exact failure when the fix is reverted
Real-binary smoke tests - 29 tests that exec bin/cleat as a subprocess under full set -euo pipefail, catching strict-mode bugs that sourced unit tests miss
Hardened Docker stub - opt-in DOCKER_STUB_STRICT=1 validates bind mount sources exist. DOCKER_STUB_SIMULATE_VIRTIOFS=1 reproduces the macOS Docker Desktop nested-mount failure that caused v0.6.5
Edge-case test suite - 33 tests for hostile inputs: paths with $, &, backticks, unicode, broken symlinks. Env values with =, spaces, quotes. Config files with CRLF, BOM, comments-only. Docker exit codes 125/127/137
Mutation test harness - test/mutation_regressions.sh applies 18 targeted mutations and verifies each regression test fails, portable across GNU and BSD sed
GitHub Actions CI - lint + full suite on Ubuntu (bash 5) and macOS (bash 3.2 via /bin/bash) + real-Docker integration tests, with timeouts on every job
Integration test framework - test/integration/ runs against a real Docker daemon (skips gracefully when unavailable), covering full container lifecycle and env passthrough end-to-end

Fixes #

Corrupted update cache crash - check_for_update passed a non-numeric last_check from a garbled cache file into an arithmetic expression. Under set -u this crashed the CLI with "unbound variable". Added regex guard defaulting to 0
Project overlay mount for missing settings files - cmd_run wrote empty {} overlay and bind-mounted to /workspace/.claude/settings.json for host files that didn't exist, failing on macOS Docker Desktop virtiofs with "outside of rootfs". Now only mounts files that exist
Partial container cleanup on docker run failure - a failed docker run could leave a half-created container that blocked the next attempt. Now force-removes on failure
Test HOME isolation - tests no longer touch the developer's real ~/.gitconfig or ~/.claude/settings.json. HOME is redirected to a temp directory per test, fixing 5 pre-existing test failures
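The update-cache guard can be sketched as validate-then-default. read_last_check, update_due and the one-number cache file are hypothetical. The regex also rejects leading zeros, because bash arithmetic would otherwise read a value like 08 as invalid octal.

```shell
#!/usr/bin/env bash
read_last_check() {
  local last_check="" re='^(0|[1-9][0-9]*)$'
  if [ -r "$1" ]; then
    IFS= read -r last_check < "$1" || true   # tolerate a missing final newline
  fi
  # Anything that is not a plain integer becomes 0, so the check is simply due.
  [[ $last_check =~ $re ]] || last_check=0
  echo "$last_check"
}

update_due() {
  local now="$1" interval="$2" last
  last=$(read_last_check "$3")
  (( now - last >= interval ))
}
```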

Changes #

511 (+128) behavioral tests across 23 files (was 383 across 19)
Container name tests cover shell metacharacters ($, &, ;, backtick), unicode and very long paths
test/setup.bash isolates HOME, injects git author env vars, documents the strict-mode trade-off

v0.6.4 #

2026-04-10

cleat login actually works - OAuth callback proxy fixed for IPv6, stdin EOF and busy ports.

The browser bridge's OAuth callback proxy had three latent bugs that made cleat login fail silently in most real-world setups. Authentication completed inside the container, but the browser either hung or showed a spurious "callback forwarding failed" page. All three root causes are now fixed with diagnostic logging so future regressions are visible.

Fixes #

Callback reached dead socket (IPv6 vs IPv4) - Node.js inside the container binds Claude Code's callback HTTP server to ::1 when given "localhost", but socat defaults to 127.0.0.1. Every connection was refused. The proxy now tries TCP6:localhost:PORT first, falls back to TCP:localhost:PORT
Browser saw "callback forwarding failed" after a successful login - with the plain - (stdio) address, socat propagated stdin EOF to the TCP side and exited before reading the 302 response, so the proxy reported success as failure. Fixed by using socat -,ignoreeof so the TCP read continues until the server closes the connection. Connection: keep-alive in the browser request is also rewritten to Connection: close so the server actually closes after responding
Proxy gave up silently when the callback port was temporarily in use - bind failures (EADDRINUSE) exited immediately with no log line. Both the socat and python3 paths now retry the bind up to 30 times (one per second) and the python3 path sets SO_REUSEPORT on supporting systems
Zero diagnostic output when the proxy failed - every error was suppressed with 2>/dev/null. The proxy now writes to /tmp/cleat-clip-<container>/.proxy-log with timestamps, protocol used, bind attempts, connection acceptance, bytes forwarded and exit status
Fallback success page on timeout or empty response - when the docker-exec forwarder timed out or produced no response body despite rc=0, the browser got a generic HTTP 502 page even though the code had been delivered. The proxy now sends a styled "Authentication Successful" page in those cases
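The bind retry can be sketched as a generic wrapper that reruns a listener command until it succeeds or the attempt budget runs out, logging each failure instead of exiting silently. This is an assumption-laden sketch: retry_bind is a hypothetical name, and unlike a real proxy it retries on any non-zero exit rather than only on EADDRINUSE.

```shell
#!/usr/bin/env bash
set -euo pipefail

# retry_bind MAX DELAY CMD...: rerun CMD until it succeeds or MAX attempts pass.
# Hypothetical helper; it retries on any failure, not only on EADDRINUSE.
retry_bind() {
  local max="$1" delay="$2" attempt=1 rc=0
  shift 2
  while :; do
    rc=0
    "$@" || rc=$?                          # keep set -e from aborting the loop
    if [ "$rc" -eq 0 ]; then
      return 0
    fi
    if [ "$attempt" -ge "$max" ]; then
      echo "bind failed after $attempt attempts (rc=$rc)" >&2
      return "$rc"
    fi
    echo "bind attempt $attempt failed (rc=$rc), retrying" >&2
    attempt=$(( attempt + 1 ))
    sleep "$delay"
  done
}
```

With the changelog's numbers this would be called as retry_bind 30 1 followed by the socat or python3 listener command, so every failed attempt leaves a line in the proxy log.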

Changes #

Active capability names now render in green in the startup summary and the first-run caps line (matches the green ✔ success glyph)
Browser watcher waits 500ms (was 200ms) after starting the proxy before opening the browser, giving the bind retry loop a better chance to succeed on the first try
381 (+2) behavioral tests (75 hooks)

v0.6.3 #

2026-04-09

Environment variables work everywhere - shell, login and exec all respect .cleat.env.

Previously, env vars from .cleat.env were only passed at container creation time (docker run). Sessions started with cleat shell or cleat login, and resumed containers, didn't see them. Now every entry point resolves env vars at exec time, so changes to .cleat.env take effect immediately without recreating the container.

Fixes #

Env vars missing in cleat shell - cmd_shell didn't call resolve_env_args or pass _RESOLVED_ENV_ARGS to docker exec. Env vars from .cleat.env were invisible in the shell session
Env vars missing in cleat login - same issue as shell. Custom API endpoints or credentials in .cleat.env weren't available during authentication
Env vars missing after container restart - exec_claude only passed HOME and PATH to docker exec, not the resolved env args. Values added to .cleat.env after container creation were silently dropped
cleat shell missing PATH - cmd_shell passed only -e HOME=/home/coder instead of the full CLAUDE_ENV array, so ~/.local/bin wasn't on PATH
Env summary hiding a zero count - when .cleat.env existed but contained only comments, the startup summary omitted the line entirely instead of showing 0 from .cleat.env
Env file missing last line - _parse_env_file skipped the final line when it had no trailing newline
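The missing-last-line bug is the classic `while read` pitfall: read returns non-zero at EOF even when it filled the variable, so a final line without a trailing newline is skipped. A minimal sketch of the usual fix follows; parse_env_file is a hypothetical stand-in for _parse_env_file, whose actual rules may differ.

```shell
#!/usr/bin/env bash
set -euo pipefail

# parse_env_file FILE: print KEY=VALUE lines, skipping comments and blank lines.
# Hypothetical stand-in for cleat's _parse_env_file; the real parser may differ.
parse_env_file() {
  local line cr=$'\r'
  # `|| [ -n "$line" ]` keeps the final line when the file has no trailing newline
  while IFS= read -r line || [ -n "$line" ]; do
    line="${line%"$cr"}"                 # tolerate CRLF line endings
    case "$line" in
      ''|'#'*) continue ;;               # blank line or comment
      *=*) printf '%s\n' "$line" ;;
    esac
  done < "$1"
}
```

A comments-only file yields no output, which a summary can then report as 0 from .cleat.env instead of dropping the line.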

Changes #

--env, --env-file and --cap flags now apply to shell and login commands (previously only start, run, resume, claude)
379 (+9) behavioral tests

v0.6.2 #

2026-04-07

Startup diagnostics - see why containers fail, fix them in one keystroke.

After a reboot or Docker restart, stale containers often refuse to start. Previously you'd see "Container failed to start" with no explanation. Now the CLI shows Docker's actual error message and offers to remove and recreate the container automatically.

Features #

Startup failure diagnostics - stderr from docker run and docker start is captured and displayed when a container fails to start, showing the actual Docker error (e.g., mount conflicts, network issues, OCI runtime failures)
Interactive recovery prompt - when docker start fails in a TTY, the CLI asks "Remove container and start fresh? [Y/n]" and auto-recreates on confirmation. Non-TTY mode shows a cleat rm hint instead

Fixes #

Settings overlay directory collision - after cleat rm, Docker's leftover mount targets could turn settings.json into a directory, causing "Is a directory" errors on subsequent starts. The overlay dir is now wiped clean before each docker run
Quoted tilde in project path - summary block showed '~'/Workspaces/project instead of ~/Workspaces/project

Changes #

370 (+2) behavioral tests (73 hooks, 44 config, 12 installer)

v0.6.1 #

2026-04-07

Browser bridge fix - URLs open reliably again.

Fixes #

Browser bridge not opening URLs - v0.6.0 pre-initialized the file timestamp to skip stale URLs, but same-second writes left the timestamp unchanged, so new URLs were silently dropped. The watcher now deletes the stale file instead, so every new write is detected

Changes #

368 (+1) behavioral tests (73 hooks, 44 config, 12 installer)

v0.6.0 #

2026-04-06

Interactive config, polished UI, battle-tested hooks.

TUI capability picker with keyboard navigation. Hooks and browser bridge hardened against stale session data. Notice boxes render cleanly at any width.

Features #

TUI config picker - cleat config now uses arrow keys to navigate, space to toggle, enter to save, q to cancel. Falls back to text mode in non-TTY environments
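Reading those keys in plain bash comes down to read -n with IFS cleared, plus a second read for the escape sequence that arrow keys send. The sketch below is a hypothetical read_key helper, not cleat's picker; it skips the short timeout a real TUI would use to tell a bare Esc from an arrow key.

```shell
#!/usr/bin/env bash
set -euo pipefail

# read_key: read one keypress from stdin and print an action name.
# Hypothetical helper; bindings follow the changelog (arrows, space, enter, q).
read_key() {
  local key rest=''
  IFS= read -rsn1 key || return 1        # IFS= keeps a literal space
  if [ "$key" = $'\033' ]; then
    IFS= read -rsn2 rest || true         # arrow keys arrive as ESC [ A / ESC [ B
  fi
  case "$key$rest" in
    $'\033[A') echo up ;;
    $'\033[B') echo down ;;
    ' ')       echo toggle ;;
    '')        echo save ;;              # Enter: read stops at the newline
    q)         echo cancel ;;
    *)         echo ignore ;;
  esac
}
```

A picker loop would call read_key repeatedly and redraw the list after each action; the non-TTY fallback simply never enters that loop.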

Fixes #

Browser bridge replaying old URLs - watcher was opening URLs left over from previous sessions on startup
Hook bridge replaying old events - event watcher was re-executing hooks from prior sessions on every start
Project overlay creating .claude/ as root - Docker created the directory on the host when it didn't exist, causing permission errors on first start
Notice box alignment - drift and update banners had misaligned borders when version strings varied in length

Changes #

Dynamic notice boxes - config drift, image version and update banners use a shared _notice_box helper with auto-calculated width
367 (+16) behavioral tests (72 hooks, 44 config, 12 installer)

v0.5.2 #

2026-04-06

Hooks just work - no container recreation needed.

Adding, changing or removing hooks in your project or global settings takes effect immediately on cleat resume or when attaching with cleat claude. No more cleat rm required.

Features #

Automatic project overlay mounts - project-level settings overlays are always created at container startup (even when no hooks exist yet), so hooks added later take effect via the existing bind mount
cleat claude refreshes overlays - attaching to a running container now refreshes project-level settings overlays, picking up any hook changes
cleat resume handles all hook states - overlay refresh now correctly handles hooks added, changed, or removed between sessions

Changes #

351 (+4) behavioral tests (70 hooks tests, 12 installer tests)

v0.5.1 #

2026-04-05

Simplified hooks - your hooks, running on your host.

Hooks capability redesigned: no custom loggers or injected settings. When enabled, your existing Claude Code hooks from all three settings locations run on the host via the bridge watcher.

Fixes #

Project hooks not firing - project-level hook overlays were stripping hooks instead of replacing them with event forwarders, so Claude Code saw no hooks and no events were forwarded to the host bridge
cleat claude ignoring project hooks - cmd_claude did not set the resolved project path, causing the hook bridge to look for project hooks in the wrong directory
cleat resume not refreshing project overlays - resume only refreshed the global settings overlay. Project-level hook changes between sessions were ignored
Update banner shown incorrectly - version comparison used string inequality instead of semver sort, so the banner could appear when the local version was already newer
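The difference between inequality and ordering is easy to show: with a local 0.5.1 and a published 0.5.0 the strings differ, yet no update exists. A bash 3.2 compatible sketch of a numeric comparison follows; version_gt is a hypothetical name, and it ignores pre-release suffixes rather than implementing full semver precedence.

```shell
#!/usr/bin/env bash
set -euo pipefail

# version_gt A B: succeed when A is strictly newer than B (MAJOR.MINOR.PATCH).
# Hypothetical helper; a leading "v" and any "-suffix" are ignored.
version_gt() {
  local a="${1#v}" b="${2#v}"
  a="${a%%-*}"; b="${b%%-*}"
  local IFS=.
  local -a av bv                        # indexed arrays work on bash 3.2
  av=($a); bv=($b)
  local i x y
  for i in 0 1 2; do
    x="${av[$i]:-0}"; y="${bv[$i]:-0}"
    if (( 10#$x > 10#$y )); then return 0; fi
    if (( 10#$x < 10#$y )); then return 1; fi
  done
  return 1                              # equal versions are not "newer"
}
```

The banner check then becomes if version_gt "$latest" "$current", which also orders 0.10.0 after 0.9.9, something a plain string sort gets wrong.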

Changes #

Simplified hooks - removed cleat-hook-logger, entrypoint hook injection, cleat hooks command and CLEAT_NO_HOOKS env var
Settings overlay with forwarder - when hooks ON, hook commands are replaced with an event forwarder in the overlay instead of being stripped. The bridge reads forwarded events and runs the originals on the host
Project-level hook support - hooks from .claude/settings.json and .claude/settings.local.json are also forwarded to the host bridge
Cleaner entrypoint - no longer modifies project directories or creates .claude/settings.local.json
Installer fix - protected all spinner-wrapped operations from silent exits under set -euo pipefail (update path, fresh install checkout, tag resolution)
347 (−9) behavioral tests (66 hooks tests, 12 installer tests)

v0.5.0 #

2026-04-05

Hooks, browser bridge and host connectivity.

Claude Code hooks work transparently - host-defined hooks run on the host, container events are logged to JSONL. Browser URLs from inside the container open on the host with OAuth callbacks proxied back. host.docker.internal is always available.

Features #

Host hook execution - host-defined hooks in ~/.claude/settings.json are stripped from the container (settings overlay) and executed on the host via a bridge watcher, with event JSON on stdin and matcher support
Hook event logging - cleat-hook-logger ships in the Docker image. Entrypoint auto-configures Claude Code to log 13 event types to /var/log/cleat/hooks.jsonl
cleat hooks command - pretty-printed event timeline with --json, --follow and --clear flags
hooks capability - opt-in event logging via cleat config --enable hooks or --cap hooks
Browser bridge - open, xdg-open and sensible-browser shims inside the container forward URLs to the host browser (auth flows, OAuth, etc.)
OAuth callback proxy - browser watcher detects redirect_uri in auth URLs and starts a TCP proxy (socat or python3) from host to container via docker exec, so OAuth callbacks reach Claude Code's HTTP server inside Docker
cleat login browser bridge - login command starts the browser watcher so the auth URL opens automatically and the callback is proxied back
Host connectivity - on Linux, --add-host host.docker.internal:host-gateway is always added; on Docker Desktop, which already provides host.docker.internal, the flag is skipped. No capability needed
Concurrent write safety - flock-based file locking prevents interleaved JSONL lines from parallel hooks
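A common shape for flock-based appends is to open the log on a spare file descriptor and hold an exclusive lock for the duration of one write. This sketch assumes flock(1) from util-linux is installed (it is standard on Linux but not shipped with macOS); append_jsonl is a hypothetical name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# append_jsonl FILE LINE: append one JSON line while holding an exclusive lock.
# Hypothetical helper; relies on flock(1) from util-linux.
append_jsonl() {
  local file="$1" line="$2"
  {
    flock -x 9                    # block until no other writer holds the lock
    printf '%s\n' "$line" >&9     # the whole line is written while locked
  } 9>>"$file"                    # lock is released when fd 9 closes
}
```

Because every writer takes the same lock before writing, parallel hooks produce whole lines rather than interleaved fragments.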

Fixes #

Settings overlay - ~/.claude/settings.json is mounted with hooks stripped so host-only commands (e.g. osascript) don't fail inside the container. It falls back to an empty {} if jq is unavailable
Resume refreshes overlay - cleat resume refreshes the settings overlay so hook changes between stop/resume take effect
Container cleanup - cleat rm removes hooks, clipboard and settings overlay temp directories
Entrypoint resilience - hook injection failures no longer prevent container startup
Hook timeout - host hook commands time out after 30s to prevent bridge hangs
Process safety - the hook bridge tracks and reaps child processes. Cleanup kills all children on session exit, and a wait after every kill reaps disowned children
Spinner orphan on Docker failure - docker start, docker run and docker build protected with || rc=$? so set -euo pipefail cannot kill the script before spin_stop runs. Global EXIT trap as defense-in-depth
Login failure cleanup - docker exec ... claude login protected with || rc=$? so browser watcher is always killed even if login fails or user cancels
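The || rc=$? guard works because set -e does not fire on a command that is the left side of ||, so cleanup runs before the failure is propagated. A minimal sketch with hypothetical spin_start, spin_stop and run_spun helpers (stand-ins, not cleat's spinner):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-ins for the real spinner helpers.
spin_start() { printf '▸ %s\n' "$1"; }
spin_stop()  { printf 'spinner stopped (rc=%s)\n' "$1"; }

# run_spun LABEL CMD...: run CMD under the spinner, always stop the spinner,
# then return CMD's exit status to the caller.
run_spun() {
  local label="$1"; shift
  local rc=0
  spin_start "$label"
  "$@" || rc=$?     # errexit is suppressed here, so spin_stop always runs
  spin_stop "$rc"
  return "$rc"
}
```

The caller still sees the failure (and may still exit under set -e), but only after the spinner has been cleaned up; an EXIT trap covers paths that bypass the wrapper.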

Changes #

356 (+87) behavioral tests (87 hooks/bridge/safety tests covering event forwarding, host hook execution, browser bridge, OAuth proxy, settings overlay, spinner orphan, process safety, capability gating)
Hook settings injected into .claude/settings.local.json (project-local, gitignored)
Docker image includes /var/log/cleat, cleat-hook-logger and open-bridge shims
Source-level regression guard greps for unprotected docker commands in spin contexts

v0.4.0 #

2026-03-29

Unified terminal design system with spinners and clean output.

No Docker noise. Concise status lines with color, braille spinners and suppressed boilerplate.

Features #

Terminal design system - unified symbols (✔ ▸ ! ✖), 8-color ANSI palette and formatting rules shared across CLI and installer
Braille spinner - 10-frame animation at 80ms/frame for slow operations, with ASCII fallback for non-Unicode terminals
Clean startup sequence - step-by-step checkmarks: Image ready, Container started, Auth shared, Claude launched
Summary block - post-launch output showing container name, project path, active capabilities and env var counts
Docker output suppression - build logs hidden on success, shown on failure. Container IDs and promo text removed
Clean exit - ✔ Session ended - resume with: cleat resume. Docker promo text and Terminated messages suppressed
TTY detection - spinners degrade to static ▸ lines when stdout is not a terminal
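A spinner with these properties can be sketched in a few lines. The braille frames below are the common ten-frame set used by many CLI spinners and may not match cleat's exact glyphs, and choosing ASCII from the locale is an assumption about how "non-Unicode terminal" is detected; spin_frame and spin_demo are hypothetical names.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Common ten-frame braille set; cleat's exact frames may differ.
SPIN_UNICODE=(⠋ ⠙ ⠹ ⠸ ⠼ ⠴ ⠦ ⠧ ⠇ ⠏)
SPIN_ASCII=('|' '/' '-' '\')

# spin_frame N: print frame N, using ASCII unless the locale advertises UTF-8.
spin_frame() {
  local locale="${LC_ALL:-${LC_CTYPE:-${LANG:-}}}"
  case "$locale" in
    *UTF-8*|*utf-8*|*UTF8*|*utf8*) printf '%s' "${SPIN_UNICODE[$(( $1 % 10 ))]}" ;;
    *) printf '%s' "${SPIN_ASCII[$(( $1 % 4 ))]}" ;;
  esac
}

# spin_demo SECONDS LABEL: animate at 80ms/frame on a TTY, static ▸ line otherwise.
spin_demo() {
  local frames=$(( $1 * 1000 / 80 )) i=0
  if [ ! -t 1 ]; then printf '▸ %s\n' "$2"; return 0; fi
  printf '\033[?25l'                     # hide cursor
  while [ "$i" -lt "$frames" ]; do
    printf '\r%s %s' "$(spin_frame "$i")" "$2"
    sleep 0.08; i=$(( i + 1 ))
  done
  printf '\r\033[K\033[?25h'             # clear the line, restore the cursor
}
```

Fractional sleep is not POSIX but works with both GNU coreutils and macOS sleep; a real spinner would run the animation in the background while the slow command runs.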

Fixes #

Clipboard watcher cleanup - trap on TERM/INT/HUP and disown prevent Terminated messages
Cursor restoration - spinner cleanup restores cursor visibility on unexpected exit via EXIT trap

Changes #

269 (+53) behavioral tests (12 new for terminal UX and output suppression)
Terminal design system documented in concept/12-terminal-design-system.md

v0.3.0 #

2026-03-29

Opt-in capabilities for git, SSH and environment variables.

Extend what the container can access from the host. All disabled by default - the baseline sandbox is unchanged.

Features #

cleat config wizard - interactive mode to toggle capabilities. Direct mode with --enable, --disable, --list
git capability - mount ~/.gitconfig read-only so commits use your host identity
ssh capability - mount ~/.ssh read-only with SSH agent forwarding for private repos
env capability - auto-load env vars from ~/.config/cleat/env (global) and .cleat.env (project)
Session-scoped overrides - --cap, --env KEY=VALUE, --env-file PATH CLI flags
Configuration drift detection - config fingerprint stored as Docker label. Warns when config changes after container creation
Image version detection - suggests cleat rebuild when CLI and image versions diverge
Project-level config - cleat config --project saves to <project>/.cleat, merged with global
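Drift detection reduces to hashing the effective config at creation time, storing the hash on the container, and comparing it with a fresh hash later. The sketch uses POSIX cksum so it runs on macOS and Linux alike; the hash choice, the function names and the label name in the comments are all assumptions, not cleat's implementation.

```shell
#!/usr/bin/env bash
set -euo pipefail

# config_fingerprint: hash the effective config (read from stdin).
# cksum is POSIX, so no dependency on sha256sum or shasum. Hypothetical helper.
config_fingerprint() {
  cksum | awk '{ print $1 "-" $2 }'      # CRC and byte count
}

# config_drifted STORED CURRENT: succeed when the fingerprints differ.
config_drifted() {
  [ "$1" != "$2" ]
}

# Sketch of the Docker side (label name hypothetical):
#   docker run --label "cleat.config=$(config_fingerprint < "$cfg")" ...
#   stored=$(docker inspect --format '{{ index .Config.Labels "cleat.config" }}' "$name")
```

Storing only a fingerprint keeps the label short and avoids copying env values into container metadata.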

Fixes #

Bash 3.2 compatibility - removed associative arrays (local -A) that require bash 4.0+
Empty array expansion - protected against set -u failures on empty arrays in bash < 4.4
Env resolution - replaced grep/sed pipeline that silently exited under set -euo pipefail
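The empty-array failure is specific to bash before 4.4, where "${arr[@]}" on an empty array counts as an unbound variable under set -u. The portable idiom expands to nothing for an empty array and to the quoted elements otherwise; count_args and extra_args are hypothetical names used only to show the word count.

```shell
#!/usr/bin/env bash
set -euo pipefail

# count_args: print how many arguments it received.
count_args() { echo "$#"; }

extra_args=()
# Plain "${extra_args[@]}" aborts here on bash 3.2 under set -u.
count_args docker run "${extra_args[@]+"${extra_args[@]}"}"    # prints 2

extra_args=(-e "GREETING=hello world")
count_args docker run "${extra_args[@]+"${extra_args[@]}"}"    # prints 4
```

The inner quotes keep elements with spaces intact, so the second call passes GREETING=hello world as one argument.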

Changes #

216 (+95) behavioral tests (95 new for capabilities, config, hardening, bash compat)
21 mutation tests - all mutations caught
Source-level scans for forbidden bash 4+ patterns
Strict-mode regression tests that run the actual binary

v0.2.0 #

2026-03-28

BATS test suite with 121 behavioral tests.

Features #

Test suite - 121 tests covering every CLI command, clipboard shim, container naming, update logic and Docker entrypoint. 12/12 code mutations caught
Test runner - ./test.sh with per-file summary, skip counts, timing and failure details
Sourceable CLI - bin/cleat can be sourced without running main, enabling direct function testing

Changes #

BATS framework (bats-core, bats-assert, bats-support) added as git submodules
Docker stub with file-based mock responses and function-override mocks
14 test files covering all CLI surface area

v0.1.0 #

2026-03-25

Docker sandbox for AI coding agents. One command. Your host stays untouched.

Features #

One command - cleat builds the image, starts a per-project container and launches Claude Code with full permissions
Per-project isolation - each project gets its own container, run as many as you need in parallel
Session persistence - cleat stop and cleat resume pick up where you left off
Zero permission issues - host UID/GID mapped into the container automatically
Clipboard bridge - pbcopy, xclip, xsel shims copy to host clipboard via OSC 52
Shared auth - ~/.claude mounted into all containers, log in once
Auto-upgrade notifications - daily lightweight tag check, never blocks your workflow
Security hardening - --pids-limit 1024, --memory 8g, numeric UID/GID validation, Debian slim base