Changelog

Release history for the Cleat CLI.

v0.12.3

latest

Fixes a long-standing cleat start failure mode — when /tmp rotated individual overlay files but kept the parent dir, the existing container would fail to start with an opaque OCI runtime "not a directory" error instead of cleanly recreating.

In the failing case, the user declined the drift recreate prompt, then cleat start aborted with error mounting "/host_mnt/private/tmp/cleat-settings-/project-settings.local.json" ... not a directory: Are you trying to mount a directory onto a file (or vice-versa)?. Docker auto-creates a missing bind source as a directory, which can't be mounted onto a file destination inside the container. The pre-fix stale-mount detection only checked that the directory existed, so this partial-rotation state slipped past the gate and into docker start.

Fixes

  • Stale-mount detection covers per-file rotation: new _settings_overlay_intact helper enumerates the container's bind sources via docker inspect and verifies each source inside /tmp/cleat-settings- is a regular file before docker start. cmd_start auto-recreates and cmd_resume errors out with "host paths changed" guidance when any expected file is missing or the wrong type. The old dir-only check let this state through to an opaque OCI runtime failure
  • cmd_resume stale-mount message generalized: "host was rebooted" → "host paths changed", so the message covers both reboots and the partial-/tmp-rotation cases the new check catches
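
The per-file check described above can be sketched roughly as follows. This is an illustration only, not Cleat's actual _settings_overlay_intact: the function name overlay_sources_intact, its prefix argument, and reading the bind sources from stdin (instead of from docker inspect) are assumptions made for the example.

```shell
# Hypothetical sketch: read bind-mount source paths on stdin (one per line)
# and succeed only if every source under the overlay prefix is a regular file.
overlay_sources_intact() {
    prefix=$1
    while IFS= read -r src || [ -n "$src" ]; do
        case $src in
            "$prefix"*)
                # A missing source, or a directory that Docker auto-created
                # in its place, both fail the regular-file test.
                [ -f "$src" ] || return 1
                ;;
        esac
    done
    return 0
}
```

Sources outside the prefix are ignored, so unrelated mounts never trigger a recreate. A caller would feed it the container's bind sources and recreate the container when it returns non-zero.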

Changes

  • 671 (+1) behavioral tests across 24 files — 1 new regression test in regressions.bats simulates the partial-rotation state (overlay dir present + one referenced file missing) and asserts cmd_start auto-recreates instead of falling through to docker start
  • 40 (+1) mutations caught — new v0.12.3_overlay_intact_per_file_check mutation deletes the per-file regular-file check from _settings_overlay_intact and confirms the regression test fails
  • Test "resume: stale mounts show clear error directing to cleat start" updated to match the new wording

v0.12.2

Fixes a v0.12.1 papercut — the new drift recreate prompt rendered as garbled \033[1m...\033[0m literals instead of the intended bold container name. Also fixes a CI-only flake in _hook_bridge_cleanup that was masking real failures under ./test.sh.

The prompt was shipped using echo -n, which prints backslash escapes verbatim. ${BOLD} and ${RESET} are ANSI escape strings that need echo -e to be interpreted. Users hitting the prompt saw Recreate \033[1mcleat-foo\033[0m now? [Y/n] instead of the intended bold container name.

Fixes

  • Drift recreate prompt renders ANSI escapes: _resolve_config_drift switched from echo -n to echo -en so ${BOLD} and ${RESET} are interpreted as ANSI escape sequences instead of printed verbatim
  • Flaky _hook_bridge_cleanup test: the test backgrounded sleep 60 & directly in the bats shell, so when _hook_bridge_cleanup killed the children, bats's DEBUG trap raced bash's SIGCHLD reaper and bash emitted wait_for: No record of process to stderr, flipping the suite to "1 failed" under ./test.sh while passing in isolation. Disowned the backgrounded sleeps so bats's job table doesn't reference them. Also tightened the test: it was iterating _HOOK_BRIDGE_CHILDREN *after* _hook_bridge_cleanup zeros it, so the kill -0 assertions never ran; it now snapshots the PIDs into locals first

Changes

  • 670 (+1) behavioral tests across 24 files — 1 new regression test in regressions.bats asserts the prompt output contains no literal \033[1m / \033[0m substrings
  • 39 (+1) mutations caught — new v0.12.1_drift_prompt_ansi mutation reverts echo -en back to echo -n and confirms the regression test fails

v0.12.1

Drift detection now prompts to recreate the container interactively instead of just printing a notice — closes the most common UX gap users hit after cleat config --enable followed by cleat.

The fingerprint-based drift detection that landed in v0.3.0 already covered cap and env-key changes, but the response was a static "Run: cleat rm && cleat" notice. Users who enabled hooks (or any other cap) on an existing container kept the old mount set and silently saw nothing change: hooks never fired, env vars were missing, etc. Now cleat, cleat resume, and cleat claude ask "Recreate now? [Y/n]" before any docker operation. Sessions persist on the host (~/.claude/projects//) and survive the rebuild, so accepting is safe by default.

Fixes

  • Auto-prompt drift recreate: new _resolve_config_drift helper invoked early in cmd_start, cmd_resume, and cmd_claude (before any docker operation). On a TTY with config drift it prompts the user, stops and removes the container, cleans /tmp/cleat-{hooks,clip,settings}-, and falls into the existing "no container" path so cmd_run rebuilds with the new caps/env. Non-TTY runs (CI, scripts) print the legacy informational notice and continue with the existing container; Cleat never auto-destroys without explicit consent
  • Flaky docker test stub: test/fixtures/mock_bin/docker routed ps calls to ps_a_output whenever *"-a"* matched the args as a substring. Container names of the form cleat-- carry a literal -a whenever the random hash starts with hex digit a, so roughly 1 in 16 runs flipped is_running to falsely report true on brand-new containers and broke docker_commands.bats on Ubuntu CI. Switched to the same token-bounded pattern ( -a | --all ) the test-side mocks already use
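
The difference between the substring match and the token-bounded match in the stub fix can be shown with a small sketch. The function names and the sample container name cleat-proj-a1b2c3 are made up for illustration; the real stub's code is not shown in this changelog.

```shell
# Hypothetical helper: succeed only when -a or --all is its own argument.
# Padding the joined args with spaces lets the pattern require a space on
# both sides of the flag.
has_all_flag() {
    case " $* " in
        *" -a "*|*" --all "*) return 0 ;;
        *) return 1 ;;
    esac
}

# The old behaviour for comparison: a plain substring match, which also
# fires on any argument that merely contains "-a".
has_all_flag_substring() {
    case "$*" in
        *-a*) return 0 ;;
        *) return 1 ;;
    esac
}
```

With these definitions, `has_all_flag_substring ps --filter name=cleat-proj-a1b2c3` succeeds even though no -a flag was passed, while `has_all_flag` with the same arguments fails as intended.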

Changes

  • check_drift is now image-version-only — config drift moved into _resolve_config_drift and called earlier in the lifecycle so the prompt fires before docker start of a stale container
  • 669 (+8) behavioral tests across 24 files — 5 new drift-resolution unit tests in capabilities.bats (no-container no-op, hash-match no-op, non-TTY notice, TTY accept removes container, TTY decline keeps container), 3 new stub-routing tests in stub_validation.bats (token-bounded -a , --all , default-to-ps), 1 new regression test in regressions.bats pinning the wiring of _resolve_config_drift from cmd_start
  • 38 (+1) mutations caught — new v0.12.1_drift_recreate_wired mutation drops the _resolve_config_drift call from cmd_start and confirms the regression test fails
  • regression v0.5.1 updated to also mock docker ps -a so the new cmd_run fallback in cmd_claude (post-drift recreate) doesn't mask the original _RESOLVED_PROJECT assertion
  • _resolve_config_drift no-op stub added to test files that exercise cmd_start / cmd_resume / cmd_claude (hooks, regressions, edge_cases, terminal_ux, capabilities, start_resume, docker_commands)
  • Documentation: concept/10-capabilities.md "Drift notice" section reflects the interactive flow; cli/README.md "Configuration drift detection" section now describes the prompt instead of the old static notice

v0.12.0

aws and gcloud caps round out cloud CLI coverage, and the post-launch caps display now groups capabilities by behavioral category — same UI in the CLI and on the landing page.

az introduced the lazy-install framework in v0.11.0; aws and gcloud reuse it. The summary block previously rendered active caps as a single inline row, which scaled poorly past four or five names. The new categorized renderer breaks them into mount / cloud / sandbox lines with consistent colour coding so the categorization itself teaches users what each cap actually does.

Features

  • aws capability: cleat config --enable aws or cleat --cap aws mounts ~/.aws (read-write) so aws configure and SSO sessions persist on the host. AWS CLI v2 (~150 MB) is lazy-installed inside the container on first activation from the official awscli-exe-linux-{x86_64,aarch64}.zip bundle, with architecture auto-detected via dpkg --print-architecture
  • gcloud capability: cleat config --enable gcloud or cleat --cap gcloud mounts ~/.config/gcloud (read-write) so gcloud auth login credentials persist on the host. The Google Cloud SDK (~200 MB) is lazy-installed via Google's official Debian repo at packages.cloud.google.com, pinned by GPG keyring at /etc/apt/keyrings/cloud.google.gpg
  • Categorized caps display: _print_caps groups active caps into mount (green: git, ssh, env, hooks, gh), cloud (blue: az, aws, gcloud), and sandbox (amber: docker). Single-line form when only one category is active, multi-line block with category labels and inline notes ((lazy install), (breaks isolation)) when caps span two or more categories. The same renderer drives both the post-launch summary block and cleat status
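
A category lookup of the kind described above might look like the sketch below. The cap-to-category mapping is taken from the list in this entry; the function names cap_category and caps_in_category and their calling conventions are assumptions for illustration, not the CLI's real _cap_category or _caps_bucket_active.

```shell
# Hypothetical single source of truth for the mount/cloud/sandbox mapping.
cap_category() {
    case $1 in
        git|ssh|env|hooks|gh) echo mount ;;
        az|aws|gcloud)        echo cloud ;;
        docker)               echo sandbox ;;
        *)                    return 1 ;;   # unknown cap
    esac
}

# Print the caps that belong to category $1, space-separated.
# Usage: caps_in_category <category> <cap>...
caps_in_category() {
    want=$1; shift
    out=
    for cap in "$@"; do
        if [ "$(cap_category "$cap")" = "$want" ]; then
            out="${out:+$out }$cap"
        fi
    done
    printf '%s\n' "$out"
}
```

Keeping the mapping in one function means a renderer only needs to ask for each category in turn; adding a cap is a one-line change to the case statement.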

Changes

  • KNOWN_CAPS adds aws and gcloud; both join LAZY_CAPS so the existing _run_lazy_installs machinery picks them up automatically. _lazy_cap_label and _lazy_cap_probe get the new entries; _cap_description gains the picker copy
  • _cap_category is the single source of truth for the mount/cloud/sandbox mapping. Adding a new cap means deciding which category it falls into; the renderer handles the rest
  • 661 (+29) behavioral tests across 24 files — 12 new aws/gcloud unit tests in capabilities.bats (mounts, registration, descriptions, install paths), 13 new categorization-display unit tests (_cap_category, _caps_bucket_active, _print_caps single-line / multi-line / empty-category branches), 4 new smoke tests for --cap aws / --cap gcloud flag parsing and config round-trips
  • Landing page Hero, ProblemSolution, HowItWorks, and Features mockups updated to render the new categorized output 1:1 with the CLI — same labels, same colours, same indentation
  • Full design in concept/10-capabilities.md (new aws and gcloud cap sections + Display categories table); docs/cli.md and cli/README.md updated to match

v0.11.0

az capability and a reusable lazy-install framework — opt-in tools too large to ship in the base image now install inside the container on first activation, with auth dirs persisted on the host.

The gh and docker caps already pre-install their CLIs in every image. That doesn't scale to cloud-vendor tooling: azure-cli is ~250 MB, awscli ~80 MB, google-cloud-cli ~200 MB. Pre-installing all of them would inflate the image for every user. The new lazy-install framework keeps the base image lean and pushes the cost to users who actually opt in. az is the first cap to use it; aws and gcloud are queued to follow the same pattern.

Features

  • az capability: cleat config --enable az or cleat --cap az mounts ~/.azure (read-write) so az login tokens persist on the host across cleat rm, cleat nuke, and cleat rebuild. Same auth-persistence model as gh
  • Lazy-install framework: caps listed in the new LAZY_CAPS registry have an install script at cli/docker/cap-installs/.sh. After docker run -d, cleat probes the container with command -v ; if absent, it runs the install script via docker exec --user root with a spinner. Subsequent starts hit the fast path and skip the install entirely. Cleat aborts on install failure rather than silently launching a half-broken environment
  • Per-container install scope: the tool itself lives inside the container (lost on cleat rm, preserved across cleat resume and Docker daemon restarts). Auth dirs always bind-mount from the host, so credentials survive every container lifecycle operation
  • Audit-friendly install path: cap-installs/az.sh spells out the apt repo + GPG keyring steps explicitly (Microsoft's official Debian 12 repo at packages.microsoft.com) rather than piping aka.ms/InstallAzureCLIDeb to bash, so each step is reviewable
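
The probe-then-install flow can be sketched as below. This runs the probe on the local machine to stay self-contained, whereas the changelog describes probing inside the container; tool_is_installed and ensure_tool are hypothetical names, not the CLI's functions.

```shell
# Hypothetical override point: report whether a tool is already available.
tool_is_installed() {
    command -v "$1" >/dev/null 2>&1
}

# Run the installer command only when the probe says the tool is missing.
# Usage: ensure_tool <tool> <installer command...>
ensure_tool() {
    tool=$1; shift
    if tool_is_installed "$tool"; then
        return 0    # fast path: already installed, skip the installer
    fi
    if ! "$@"; then
        echo "install of $tool failed, aborting" >&2
        return 1
    fi
}
```

Because the probe is a separate function, a test can redefine tool_is_installed to force either branch without installing anything, which mirrors the override-point idea mentioned for _lazy_cap_is_installed.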

Changes

  • KNOWN_CAPS adds az; the config picker, cleat config --list, and --cap validation pick it up automatically
  • _run_lazy_installs is invoked from cmd_start, cmd_resume, and cmd_claude before exec_claude — so any path that launches Claude in a container with a lazy cap active gets the install
  • _lazy_cap_is_installed is exposed as an override point so tests can simulate both the missing-tool and present-tool branches without an actual command -v round-trip
  • 632 (+12) behavioral tests across 24 files — 9 az-cap unit tests in capabilities.bats (mounts, registration, description, install/skip/no-op/failure paths), 2 smoke tests in smoke.bats (--cap az --help, cleat config --enable az round-trip)
  • 36 mutations caught
  • Full design in concept/10-capabilities.md → "Lazy install capabilities" section + dedicated az cap section

v0.10.1

First-run no longer rebuilds locally when a transient pull error hides an already-cached prebuilt image.

_do_pull always issued a network pull against GHCR even when the version-tagged prebuilt image was already on disk. A transient registry, network, or auth error there flipped the image into "unavailable" and triggered a 2-5 min local rebuild — even though the prebuilt image was sitting in the local image store waiting to be reused.

Fixes

  • Reuse cached prebuilt image without a network call: _do_pull short-circuits when ghcr.io/cleatdev/cleat:v${VERSION} is already in the local image store. It retags as cleat and returns success without touching the registry. Eliminates spurious "Prebuilt image unavailable, building locally" warnings from transient network/auth blips
  • Cache-hit retag failure falls through to network pull: if docker tag fails after a cache hit (disk full, permission, weird image-store state), _do_pull prints a yellow ! Cached prebuilt image found but could not be tagged warning and falls through to the normal pull flow, instead of silently lying about success
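
A minimal sketch of the short-circuit and its fall-through, assuming a function that takes the versioned image reference and the local tag as arguments. The name pull_or_reuse and the message text are illustrative; only docker image inspect, docker tag, and docker pull are real commands.

```shell
# Hypothetical sketch: reuse a locally cached image before touching the registry.
# $1 = versioned image reference, $2 = local tag to apply.
pull_or_reuse() {
    ref=$1
    tag=$2
    if docker image inspect "$ref" >/dev/null 2>&1; then
        if docker tag "$ref" "$tag"; then
            echo "Image ready (cached)"
            return 0
        fi
        # Retag failed: warn and fall through to the network pull.
        echo "! Cached image found but could not be tagged" >&2
    fi
    docker pull "$ref" && docker tag "$ref" "$tag"
}
```

The function only returns success early when both the inspect and the retag succeed, so a failed retag is never reported as a cache hit.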

Changes

  • Cache-hit success message includes image size for parity with the post-pull message: Image ready (cached v0.10.1, 487 MB)
  • Docker test stub gained DOCKER_TAG_EXIT_CODE env var and a cached_images mock file backing docker image inspect for testing the cache short-circuit and tag-failure paths
  • 620 (+4) behavioral tests (1 cache-hit unit test in docker_commands.bats, 2 regression tests in regressions.bats)
  • 37 (+2) mutations caught — short-circuit revert + tag-failure-fallthrough revert

v0.10.0

Host Docker daemon access + workspace trust — test dockerized apps from inside the sandbox, without letting untrusted .cleat files silently escalate capabilities.

Two interlocking features. The docker capability mounts your host's Docker socket so docker compose up, docker compose exec, and docker build all work against your real daemon from inside Cleat — sibling containers, zero overhead, no DinD. Workspace trust then hardens every capability against supply-chain attacks by gating project-level .cleat files through a per-project approval prompt, so cloning a random repo can no longer silently grant sandbox-escaping Docker access.

Features

  • docker capability: cleat config --enable docker or cleat --cap docker mounts /var/run/docker.sock so the container's docker CLI talks to your host daemon. Containers launched from inside Cleat run as siblings on the host, not nested, with zero virtualization overhead
  • Host-path identity mount: when the docker cap is active, the project is bind-mounted at its host path (in addition to /workspace) and --workdir is set there. $(pwd) returns a host-valid path, so docker run -v $(pwd):/app, docker build ., and docker-compose.yml relative paths all resolve correctly on the host daemon. CLEAT_HOST_PROJECT is exported for scripts
  • Docker CLI + compose in the image: docker-ce-cli and docker-compose-plugin are installed in the container (no daemon). Entrypoint stats the mounted socket's GID and adds coder to a matching group so the user can actually talk to the daemon
  • Workspace trust: a project's .cleat file is now gated through per-project approval. On first launch Cleat prompts with a box listing the requested capabilities; approval is stored at ~/.config/cleat/trust (mode 0600), keyed on a hash of the canonical (sorted, deduped) cap list. Comment edits and cap reordering don't invalidate trust; adding, removing, or changing a cap triggers a re-prompt
  • Scripting escape hatches: --trust-project global flag and CLEAT_TRUST_PROJECT=1 env var bypass the prompt and record approval in one step. Non-interactive contexts without either opt-in silently default-deny project .cleat caps (global config and --cap CLI flags still apply); this default-deny is the supply-chain protection
  • cleat trust / cleat untrust subcommands: cleat trust [path] records approval non-interactively, cleat trust --list shows trusted projects with a yellow marker for ones whose .cleat has drifted since approval, cleat untrust [path] removes approval. Safe no-op on missing or unknown paths
  • Docker-cap startup warning when the cap is active, startup prints a yellow ! Docker socket mounted — container can create host-level processes line so the tradeoff is never silent
  • cleat status in readonly trust mode never prompts and never modifies the trust file, regardless of TTY state. Safe for scripts that pipe through status
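
Hashing a canonical cap list, as the workspace-trust entry describes, could be sketched like this. The names md5_hex and caps_hash and the one-cap-per-argument interface are assumptions; the real _hash_cleat_caps may read the .cleat file differently.

```shell
# Portable MD5 of stdin: GNU md5sum on Linux, md5 -q on macOS.
md5_hex() {
    if command -v md5sum >/dev/null 2>&1; then
        md5sum | awk '{print $1}'   # drop the trailing filename column
    else
        md5 -q
    fi
}

# Hypothetical canonical hash: caps sorted and deduplicated before hashing,
# so reordering or repeating a cap does not change the result.
caps_hash() {
    printf '%s\n' "$@" | LC_ALL=C sort -u | md5_hex
}
```

Under this scheme `caps_hash docker gh` and `caps_hash gh docker docker` produce the same digest, while adding or removing a cap changes it, which is the property the re-prompt rule relies on.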

Fixes

  • cleat resume after cleat rm: previously errored out with "No container found" because cmd_resume refused to create a container. Sessions persist on the host at ~/.claude/projects//, so the right behavior is to create a fresh container and launch Claude with --continue. Now it does. User visible: cleat rm && cleat resume just works and picks up the last session
  • cleat rm hint: adds a dim trailing line making it explicit that sessions are preserved: Sessions preserved. Run cleat resume to pick up where you left off.
  • Session overlay under --cap docker: with the docker cap, workdir is the host project path (not /workspace), so Claude Code encodes its session dir from that path (/Users/marcin/proj becomes projects/-Users-marcin-proj/) instead of the v0.8.0-assumed projects/-workspace/. Without a second overlay, sessions split between two host dirs. The docker cap block now mounts the per-project session dir at the host-path-derived key too, so sessions always land in the same per-project overlay regardless of which cap was active when they were created

Changes

  • docker and gh capability descriptions rewritten in pure ASCII (no em-dashes) so _notice_box renders with correct alignment across POSIX and UTF-8 locales
  • _hash_cleat_caps pipes through awk '{print $1}' to strip md5sum's stdin-filename suffix — keeps the trust file hex-only
  • Trust file writes are atomic (temp + rename), 0600 permissions, refuse paths containing tab / newline / carriage return to protect the format
  • Session-scoped decision cache avoids double-prompts when resolve_caps is called multiple times per invocation
  • New test/unit/trust.bats (42 tests) plus 8 docker-cap tests in capabilities.bats, 8 new smoke tests, 2 new terminal-UX tests, 12 new regression tests in regressions.bats, and 10 new mutation entries
  • 616 (+74) behavioral tests across 24 files
  • 35 (+7) mutations caught — all with a real revert confirming the test fails
  • Full design documented in concept/15-docker-capability.md and concept/16-workspace-trust.md

v0.9.2

First-run now pulls the prebuilt image from GHCR instead of building locally, plus live pull progress and terminal-output polish.

Fresh installs were always supposed to get the ~30s GHCR pull before falling back to a local build, but cmd_run's missing-image branch called the build function directly, skipping the pull entirely. Every clean install was paying the 2-5 min build cost even though a matching prebuilt image was waiting. The pull tag is also version-matched to the installed CLI now, and the pull UX shows live layer progress instead of a silent spinner.

Features

  • Live pull progress: _do_pull parses docker pull's line-per-event output to show N/M layers in real time on a single live-updating line, ending with the pulled version and image size (Image ready (pulled v0.9.2, 450 MB)). Non-TTY contexts get a single info line plus the success line

Fixes

  • First run skipped the prebuilt image pull: cmd_run's missing-image branch called _do_build directly; only cleat build hit _do_pull. Result: every clean install paid a 2-5 min local build cost even though ghcr.io/cleatdev/cleat already had a matching image. Fix: _do_pull || _do_build on first run
  • Registry image tag was not version-matched: REGISTRY_IMAGE was hardcoded to :latest, meaning an installed v0.9.1 CLI would silently pull whatever shipped last to GHCR. Now :v${VERSION}, with a new REGISTRY_BASE so cmd_update can target the freshly-checked-out tag
  • Installer printed literal \033 escape codes: install.sh's spin_stop used printf %s, which passes backslash escapes through unchanged; messages built with ${BOLD}...${RESET} showed up as Downloaded to \033[1m/Users/you/.cleat\033[0m. Fix: %b to interpret escapes in the arg
  • Installer spinner left trailing text on shorter success lines: \r alone rewound the cursor without clearing the rest of the line, so Pinned to v0.9.1 overwriting Checking out latest release... produced Pinned to v0.9.1est release.... Fix: \r\033[K to clear to end of line
  • bin/cleat spinner had the same tail bug: Container started (17 chars) overwriting Starting container... (21 chars) left r... visible in the dim spinner color. Same \r\033[K fix
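
Both installer fixes come down to how printf treats its arguments. The snippet below is a small demonstration, assuming BOLD and RESET hold the escape sequences as literal backslash text; the path /tmp/example is a placeholder.

```shell
# Assumed definitions: the variables hold literal backslash sequences,
# not raw ESC bytes.
BOLD='\033[1m'
RESET='\033[0m'
msg="Downloaded to ${BOLD}/tmp/example${RESET}"

# %s prints the backslash sequences literally; %b interprets them.
printf '%s\n' "$msg"
printf '%b\n' "$msg"

# Overwriting a longer spinner line: \r returns to column 0 and
# \033[K (erase to end of line) clears whatever the old text left behind.
printf 'Starting container...'
printf '\r\033[K%s\n' 'Container started'
```

Without the \033[K, the four characters that the longer line extends past the shorter one would remain on screen after the overwrite.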

Changes

  • Mutation runner (test/mutation_regressions.sh) now accepts an optional target file parameter so companion scripts like install.sh can be mutation-tested, not just bin/cleat
  • README.md: install URL now https://cleat.sh/install (short, branded, resolved to the latest tagged release by the Cloudflare worker) instead of raw GitHub on main
  • README.md: requirements section now lists Pro, Max, Team, Enterprise plans, and API keys (was incorrectly "team or Pro plan")
  • README.md: first-run timing text reflects the pull-first flow (~30s pull with ~2 min local-build fallback)
  • 542 (+4) behavioral tests (52 regressions)
  • 25 (+5) mutations caught

v0.9.1

macOS hardening — full bash 3.2 compatibility, 538 tests green on both platforms.

Config drift detection, the config command, and all smoke tests were broken on macOS due to GNU-only commands (md5sum, timeout), bash 3.2 empty-array crashes, and BSD sed incompatibilities. Every issue is fixed, and 9 new tests cover the pull-fallback logic, portable hashing, and update behavior.

Fixes

  • Config drift on macOS: compute_config_fingerprint used bare md5sum, which doesn't exist on macOS. Now uses the portable _md5 wrapper (md5 -q fallback)
  • Config command crash on bash 3.2: ${current_caps[@]} expansion on an empty array triggers "unbound variable" under set -u. Fixed with the safe ${arr[@]+"${arr[@]}"} pattern
  • Smoke tests on macOS: the timeout command doesn't exist on macOS. Replaced with portable perl -e 'alarm shift @ARGV; exec @ARGV' fallback
  • BSD sed compatibility: all sed -i calls in tests now use -i.bak (BSD sed requires a backup extension)
  • Smoke _compute_cname: reimplemented inline instead of sourcing the entire CLI (which failed on bash 3.2 due to strict-mode interactions)
  • Version box alignment test: replaced locale-dependent awk with bash ${#} for consistent width measurement
  • Docker start failure test: mock docker now fails for both start and run, handling macOS CI TTY recovery edge case
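
The perl fallback quoted above can be wrapped so callers do not care which platform they are on. The wrapper name portable_timeout and its preference for GNU timeout when present are assumptions; the changelog only names the helper _portable_timeout and the perl one-liner.

```shell
# Run a command with a time limit, preferring GNU timeout when present.
# Usage: portable_timeout <seconds> <command...>
portable_timeout() {
    secs=$1; shift
    if command -v timeout >/dev/null 2>&1; then
        timeout "$secs" "$@"
    else
        # The pending alarm survives exec, so SIGALRM terminates the
        # command once the limit is reached.
        perl -e 'alarm shift @ARGV; exec @ARGV' "$secs" "$@"
    fi
}
```

Either branch returns non-zero when the limit is hit, so a test can simply check the exit status of something like `portable_timeout 1 sleep 5`.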

Changes

  • Portable _md5 helper and _portable_timeout moved to shared test/setup.bash
  • CI diagnostic step verifies binary runs on macOS bash 3.2 before test suite
  • 538 (+9) behavioral tests (46 docker_commands, 13 helpers, 9 update)
  • 20 mutations caught
  • All 538 tests green on Linux (bash 5) and macOS (bash 3.2)

v0.9.0

GitHub CLI capability + faster, lighter Docker image.

New gh capability gives the container access to your GitHub CLI auth — gh auth login inside any container writes tokens back to the host, so you authenticate once and it persists across rm, nuke, and rebuild. The Docker image switches to node:20-bookworm-slim, drops vim and build-essential, and adds pre-built image pull support for faster first starts.

Features

  • gh capability: cleat config --enable gh mounts ~/.config/gh (read-write) into the container. gh auth login works via the browser bridge, tokens persist to the host. GH_TOKEN via --env or .cleat.env works as an alternative
  • Pre-built image pull: cleat start tries docker pull ghcr.io/cleatdev/cleat:latest before falling back to local build. cleat update also pulls the latest image after updating the CLI

Changes

  • Docker image switched from debian:bookworm-slim to node:20-bookworm-slim — Node.js pre-installed, faster layer caching
  • Removed vim and build-essential from image — smaller footprint, users can apt install if needed
  • GitHub CLI (gh) pre-installed in the image via official apt repository
  • Docker stub handles pull and tag commands for test coverage
  • 529 (+5) behavioral tests (45 capabilities, 33 smoke)
  • 20 mutations caught

v0.8.1

Fix arrow-up history leaking across projects.

The v0.8.0 per-project session overlay isolated projects/-workspace/ but missed ~/.claude/history.jsonl — the global input history file shared via the base ~/.claude mount. Arrow-up in Claude showed commands from other projects. Now history.jsonl is overlaid per-project alongside sessions.

Fixes

  • History isolation: overlay history.jsonl with a per-project copy from the same session directory used for projects/-workspace, so each project has its own arrow-up history
  • macOS virtiofs compatibility: ensure ~/.claude/history.jsonl exists on the host before mounting the nested overlay (virtiofs rejects nested mounts when the target file is missing inside the parent bind source)

Changes

  • 524 (+4) behavioral tests (48 regressions, 31 smoke)
  • 20 (+1) mutations caught

v0.8.0

Per-project session isolation — each project gets its own Claude history.

Previously, all containers shared a single ~/.claude directory. cleat resume for project A showed Claude's conversation history from project B. Now each container mounts a per-project session directory so sessions, tasks, and project memory are isolated. Auth and global settings remain shared.

Features

  • Per-project sessions: each project's Claude Code sessions are stored under ~/.claude/projects/-/ on the host, mounted as an overlay at /home/coder/.claude/projects/-workspace inside the container. --continue finds the correct latest session for each project
  • Hash-based session keys: project paths are mapped to - keys, avoiding collisions from similar path names and normalizing case for macOS HFS+ compatibility

Fixes

  • Session key collision: the initial tr '/' '-' approach mapped /a-b/c and /a/b-c to the same key, mixing sessions between unrelated projects. Switched to hash-based keys
  • macOS case sensitivity: session key basename is lowercased so MyProject and myproject share the same key on case-insensitive filesystems
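
The collision is easy to reproduce, and a hash-based key avoids it. In the sketch below, naive_key reproduces the tr approach named above; hashed_key is purely hypothetical: the key layout, lowercasing the whole path, and the use of cksum are all assumptions, since the changelog does not show the exact format or hash that Cleat uses.

```shell
# The original approach: replace every slash with a dash.
naive_key() {
    printf '%s' "$1" | tr '/' '-'
}

# Hypothetical hash-based key: lowercased basename plus a checksum of the
# lowercased path. cksum is used only because it is available everywhere.
hashed_key() {
    p=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    base=$(basename "$p")
    sum=$(printf '%s' "$p" | cksum | awk '{print $1}')
    printf '%s-%s\n' "$base" "$sum"
}
```

Both /a-b/c and /a/b-c collapse to -a-b-c under naive_key, while hashed_key keeps them apart and maps paths that differ only in case to the same key.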

Changes

  • cmd_rm and cmd_nuke preserve session directories on the host (only container temp dirs are cleaned)
  • 520 (+9) behavioral tests (19 mutations caught)

v0.7.0

511 tests. Zero regressions ever again.

Three independent test layers — regression registry, real-binary smoke tests, and a hardened Docker stub — now catch every class of bug that previously shipped undetected. The test suite grew from 383 to 511 tests, all pre-existing failures were fixed, and every regression test is mutation-verified to prove it catches its target bug.

Features

  • Regression test registry: 45 tests, one per historical bug from v0.5.1 through v0.6.5, each mutation-tested to confirm it catches the exact failure when the fix is reverted
  • Real-binary smoke tests: 29 tests that exec bin/cleat as a subprocess under full set -euo pipefail, catching strict-mode bugs that sourced unit tests miss
  • Hardened Docker stub: opt-in DOCKER_STUB_STRICT=1 validates bind mount sources exist; DOCKER_STUB_SIMULATE_VIRTIOFS=1 reproduces the macOS Docker Desktop nested-mount failure that caused v0.6.5
  • Edge-case test suite: 33 tests for hostile inputs: paths with $, &, backticks, unicode, broken symlinks; env values with =, spaces, quotes; config files with CRLF, BOM, comments-only; Docker exit codes 125/127/137
  • Mutation test harness: test/mutation_regressions.sh applies 18 targeted mutations and verifies each regression test fails, portable across GNU and BSD sed
  • GitHub Actions CI: lint + full suite on Ubuntu (bash 5) and macOS (bash 3.2 via /bin/bash) + real-Docker integration tests, with timeouts on every job
  • Integration test framework: test/integration/ runs against a real Docker daemon (skips gracefully when unavailable), covering full container lifecycle and env passthrough end-to-end

Fixes

  • Corrupted update cache crash: check_for_update passed a non-numeric last_check from a garbled cache file into an arithmetic expression; under set -u this crashed the CLI with "unbound variable". Added regex guard defaulting to 0
  • Project overlay mount for missing settings files: cmd_run wrote empty {} overlay and bind-mounted to /workspace/.claude/settings.json for host files that didn't exist, failing on macOS Docker Desktop virtiofs with "outside of rootfs". Now only mounts files that exist
  • Partial container cleanup on docker run failure: a failed docker run could leave a half-created container that blocked the next attempt. Now force-removes on failure
  • Test HOME isolation: tests no longer touch the developer's real ~/.gitconfig or ~/.claude/settings.json; HOME is redirected to a temp directory per test, fixing 5 pre-existing test failures
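
A guard of the kind described for the update cache can be sketched as follows. Note that this version uses a POSIX case pattern rather than the regex the entry mentions, and the name sanitize_epoch and the sample values are invented for the example.

```shell
# Hypothetical sketch: coerce a cache value to an integer, defaulting to 0.
sanitize_epoch() {
    case $1 in
        ''|*[!0-9]*) echo 0 ;;      # empty, or contains a non-digit
        *)           echo "$1" ;;
    esac
}

# A garbled value is replaced by 0 before it reaches the arithmetic,
# so the expansion below never sees a non-numeric operand.
last_check=$(sanitize_epoch "garbled!")
now=1700000000
echo $(( now - last_check ))
```

Defaulting to 0 makes a corrupted cache look like a check that happened long ago, so the next update check simply runs instead of the CLI crashing.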

Changes

  • 511 (+128) behavioral tests across 23 files (was 383 across 19)
  • Container name tests cover shell metacharacters ($, &, ;, backticks), unicode, and very long paths
  • test/setup.bash isolates HOME, injects git author env vars, documents the strict-mode trade-off

v0.6.4

cleat login actually works — OAuth callback proxy fixed for IPv6, stdin EOF, and busy ports.

The browser bridge's OAuth callback proxy had three latent bugs that made cleat login fail silently in most real-world setups. Authentication completed inside the container, but the browser either hung or showed a spurious "callback forwarding failed" page. All three root causes are now fixed with diagnostic logging so future regressions are visible.

Fixes

  • Callback reached dead socket (IPv6 vs IPv4): Node.js inside the container binds Claude Code's callback HTTP server to ::1 when given "localhost", but socat defaults to 127.0.0.1; every connection was refused. The proxy now tries TCP6:localhost:PORT first, falls back to TCP:localhost:PORT
  • Browser saw "callback forwarding failed" after a successful login: socat - propagated stdin EOF to the TCP side and exited before reading the 302 response; the proxy reported success as failure. Fixed by using socat -,ignoreeof so the TCP read continues until the server closes the connection. Connection: keep-alive in the browser request is also rewritten to Connection: close so the server actually closes after responding
  • Proxy gave up silently when the callback port was temporarily in use: bind failures (EADDRINUSE) exited immediately with no log line. Both the socat and python3 paths now retry the bind up to 30 times (one per second), and the python3 path sets SO_REUSEPORT on supporting systems
  • Zero diagnostic output when the proxy failed: every error was suppressed with 2>/dev/null. The proxy now writes to /tmp/cleat-clip-/.proxy-log with timestamps, protocol used, bind attempts, connection acceptance, bytes forwarded, and exit status
  • Fallback success page on timeout or empty response: when the docker-exec forwarder timed out or produced no response body despite rc=0, the browser got a generic HTTP 502 page even though the code had been delivered. The proxy now sends a styled "Authentication Successful" page in those cases

Changes

  • Active capability names now render in green in the startup summary and the first-run caps line (matches the green success glyph)
  • Browser watcher waits 500ms (was 200ms) after starting the proxy before opening the browser, giving the bind retry loop a better chance to succeed on the first try
  • 381 (+2) behavioral tests (75 hooks)

v0.6.3

Environment variables work everywhere — shell, login, and exec all respect .cleat.env.

Previously, env vars from .cleat.env were only passed at container creation time (docker run). Sessions entered via cleat shell, cleat login, or resumed containers didn't see them. Now all entry points resolve env vars at exec time, so changes to .cleat.env take effect immediately without recreating the container.
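
Resolving env vars at exec time amounts to adding -e KEY=VALUE pairs to every docker exec call instead of relying on what docker run received. A minimal sketch, assuming a hypothetical build_exec_command helper (the CLI itself is a shell script and builds this argument list differently):

```python
def build_exec_command(container, env, command):
    """Build a `docker exec` argv that passes env vars at exec time."""
    argv = ["docker", "exec", "-it"]
    for key, value in env.items():
        # One -e flag per variable, re-read on every exec.
        argv += ["-e", f"{key}={value}"]
    return argv + [container] + list(command)
```

Because the variables are attached per exec, editing .cleat.env changes the next session without touching the container.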

Fixes #

  • Env vars missing in cleat shell cmd_shell didn't call resolve_env_args or pass _RESOLVED_ENV_ARGS to docker exec; env vars from .cleat.env were invisible in the shell session
  • Env vars missing in cleat login same issue as shell; custom API endpoints or credentials in .cleat.env weren't available during authentication
  • Env vars missing after container restart exec_claude only passed HOME and PATH to docker exec, not the resolved env args; values added to .cleat.env after container creation were silently dropped
  • cleat shell missing PATH used only -e HOME=/home/coder instead of the full CLAUDE_ENV array, so ~/.local/bin wasn't on PATH
  • Env summary showing 0 vars as empty when .cleat.env existed but contained only comments, the startup summary omitted the line entirely instead of showing 0 from .cleat.env
  • Env file missing last line _parse_env_file skipped the final line when it had no trailing newline
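
The last two fixes describe parsing behavior that is easy to get wrong: a final line without a trailing newline must still be read, and a comments-only file must report zero variables rather than nothing. A sketch of both, assuming a plain KEY=VALUE syntax with # comments (the real _parse_env_file is shell code and its exact grammar is not given here; parse_env_text and env_summary are illustrative names):

```python
def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    # splitlines() yields the final line even without a trailing newline.
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def env_summary(env, source=".cleat.env"):
    """Always produce a summary line, including the zero case."""
    return f"{len(env)} from {source}"
```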

Changes #

  • --env, --env-file, and --cap flags now apply to shell and login commands (previously only start, run, resume, claude)
  • 379 (+9) behavioral tests

v0.6.2 #

Startup diagnostics — see why containers fail, fix them in one keystroke.

After a reboot or Docker restart, stale containers often refuse to start. Previously you'd see "Container failed to start" with no explanation. Now the CLI shows Docker's actual error message and offers to remove and recreate the container automatically.

Features #

  • Startup failure diagnostics stderr from docker run and docker start is captured and displayed when a container fails to start, showing the actual Docker error (e.g., mount conflicts, network issues, OCI runtime failures)
  • Interactive recovery prompt when docker start fails in a TTY, the CLI asks "Remove container and start fresh? [Y/n]" and auto-recreates on confirmation; non-TTY mode shows a cleat rm hint instead
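
Both features reduce to two small steps: keep the failing command's stderr instead of discarding it, then pick a recovery message based on whether a TTY is attached. A rough sketch with hypothetical helper names; the non-TTY hint wording is an assumption, only the prompt text is quoted from the notes.

```python
import subprocess


def run_and_capture(argv):
    """Run a command and return (exit_code, stderr_text)."""
    result = subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        text=True,
    )
    return result.returncode, result.stderr.strip()


def recovery_message(is_tty):
    """Interactive sessions get a prompt, scripts get a hint."""
    if is_tty:
        return "Remove container and start fresh? [Y/n]"
    return "Run cleat rm, then start again"  # assumed wording
```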

Fixes #

  • Settings overlay directory collision after cleat rm, Docker's leftover mount targets could turn settings.json into a directory, causing "Is a directory" errors on subsequent starts; the overlay dir is now wiped clean before each docker run
  • Quoted tilde in project path summary block showed '~'/Workspaces/project instead of ~/Workspaces/project

Changes #

  • 370 (+2) behavioral tests (73 hooks, 44 config, 12 installer)

v0.6.1 #

Browser bridge fix — URLs open reliably again.

Fixes #

  • Browser bridge not opening URLs v0.6.0 pre-initialized the file timestamp to skip stale URLs, but same-second writes caused new URLs to be silently dropped; the watcher now deletes the stale file instead, so every new write is detected
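
Why deleting works where timestamps did not: file modification times often have one-second resolution, so a write in the same second as startup looks unchanged, whereas the mere existence of a file is unambiguous. A minimal sketch of that idea; the class name and the consume-on-read step are assumptions, not the watcher's actual logic.

```python
import os


class UrlFileWatcher:
    """Detect new URLs by file existence rather than by timestamp."""

    def __init__(self, path):
        self.path = path
        # Drop anything left over from a previous session.
        try:
            os.remove(path)
        except FileNotFoundError:
            pass

    def poll(self):
        """Return a newly written URL, or None if there is nothing new."""
        try:
            with open(self.path, encoding="utf-8") as handle:
                url = handle.read().strip()
        except FileNotFoundError:
            return None
        os.remove(self.path)  # consume, so the next write is new again
        return url or None
```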

Changes #

  • 368 (+1) behavioral tests (73 hooks, 44 config, 12 installer)

v0.6.0 #

Interactive config, polished UI, battle-tested hooks.

TUI capability picker with keyboard navigation. Hooks and browser bridge hardened against stale session data. Notice boxes render cleanly at any width.

Features #

  • TUI config picker cleat config now uses arrow keys to navigate, space to toggle, enter to save, q to cancel. Falls back to text mode in non-TTY environments

Fixes #

  • Browser bridge replaying old URLs watcher was opening URLs left over from previous sessions on startup
  • Hook bridge replaying old events event watcher was re-executing hooks from prior sessions on every start
  • Project overlay creating .claude/ as root Docker created the directory on the host when it didn't exist, causing permission errors on first start
  • Notice box alignment config drift and update banners had misaligned borders when version strings varied in length

Changes #

  • Dynamic notice boxes config drift, image version, and update banners use a shared _notice_box helper with auto-calculated width
  • 367 (+16) behavioral tests (72 hooks, 44 config, 12 installer)
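
Auto-calculated width means the border is derived from the longest content line rather than hard-coded. A sketch of what a helper like _notice_box has to compute; the box-drawing characters and the Python form are assumptions, and len() is only correct for text without wide or combining characters.

```python
def notice_box(lines):
    """Draw a box whose width follows the longest line."""
    width = max((len(line) for line in lines), default=0)
    top = "┌" + "─" * (width + 2) + "┐"
    bottom = "└" + "─" * (width + 2) + "┘"
    # Pad every row to the same width so the right border lines up.
    body = [f"│ {line.ljust(width)} │" for line in lines]
    return "\n".join([top, *body, bottom])
```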

v0.5.2 #

Hooks just work — no container recreation needed.

Adding, changing, or removing hooks in your project or global settings takes effect immediately on resume or claude attach. No more cleat rm required.

Features #

  • Automatic project overlay mounts project-level settings overlays are always created at container startup (even when no hooks exist yet), so hooks added later take effect via the existing bind mount
  • cleat claude refreshes overlays attaching to a running container now refreshes project-level settings overlays, picking up any hook changes
  • cleat resume handles all hook states overlay refresh now correctly handles hooks added, changed, or removed between sessions

Changes #

  • 351 (+4) behavioral tests (70 hooks tests, 12 installer tests)

v0.5.1 #

Simplified hooks — your hooks, running on your host.

Hooks capability redesigned: no custom loggers or injected settings. When enabled, your existing Claude Code hooks from all three settings locations run on the host via the bridge watcher.

Fixes #

  • Project hooks not firing project-level hook overlays were stripping hooks instead of replacing them with event forwarders, so Claude Code saw no hooks and no events were forwarded to the host bridge
  • cleat claude ignoring project hooks cmd_claude did not set the resolved project path, causing the hook bridge to look for project hooks in the wrong directory
  • cleat resume not refreshing project overlays resume only refreshed the global settings overlay; project-level hook changes between sessions were ignored
  • Update banner shown incorrectly version comparison used string inequality instead of semver sort, so the banner could appear when the local version was already newer
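
The update banner fix comes down to comparing versions numerically. String inequality reports "different" whenever the two versions differ, including when the local one is newer. A sketch of the numeric comparison, assuming plain MAJOR.MINOR.PATCH tags with an optional leading v and no pre-release suffixes (function names are illustrative):

```python
def parse_version(text):
    """Turn 'v0.10.2' into (0, 10, 2) for numeric comparison."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def update_available(local, remote):
    """True only when the remote version is strictly newer."""
    return parse_version(remote) > parse_version(local)
```

With this, a local 0.10.0 against a remote 0.9.0 shows no banner, where the inequality check would have shown one.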

Changes #

  • Simplified hooks removed cleat-hook-logger, entrypoint hook injection, cleat hooks command, and CLEAT_NO_HOOKS env var
  • Settings overlay with forwarder when hooks ON, hook commands are replaced with an event forwarder in the overlay instead of being stripped; the bridge reads forwarded events and runs the originals on the host
  • Project-level hook support hooks from .claude/settings.json and .claude/settings.local.json are also forwarded to the host bridge
  • Cleaner entrypoint no longer modifies project directories or creates .claude/settings.local.json
  • Installer fix protected all spinner-wrapped operations from silent exits under set -euo pipefail (update path, fresh install checkout, tag resolution)
  • 347 (−9) behavioral tests (66 hooks tests, 12 installer tests)
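
The forwarder overlay can be pictured as a rewrite of the settings JSON in which every hook command is swapped for one forwarding command, leaving matchers and all other settings untouched. The sketch below assumes the hooks section maps event names to lists of matcher entries, each holding a list of command hooks; the forwarder name cleat-forward-event is hypothetical.

```python
import copy

FORWARDER = "cleat-forward-event"  # hypothetical command name


def overlay_with_forwarder(settings, forwarder=FORWARDER):
    """Return a copy of settings with hook commands replaced."""
    overlay = copy.deepcopy(settings)  # never mutate the host settings
    for matchers in overlay.get("hooks", {}).values():
        for matcher in matchers:
            for hook in matcher.get("hooks", []):
                if hook.get("type") == "command":
                    hook["command"] = forwarder
    return overlay
```

Stripping the hooks entirely, as the earlier overlay did, removes the trigger along with the command, which is why no events reached the host.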

v0.5.0 #

Hooks, browser bridge, and host connectivity.

Claude Code hooks work transparently — host-defined hooks run on the host, container events are logged to JSONL. Browser URLs from inside the container open on the host with OAuth callbacks proxied back. host.docker.internal is always available.

Features #

  • Host hook execution host-defined hooks in ~/.claude/settings.json are stripped from the container (settings overlay) and executed on the host via a bridge watcher, with event JSON on stdin and matcher support
  • Hook event logging cleat-hook-logger ships in the Docker image; entrypoint auto-configures Claude Code to log 13 event types to /var/log/cleat/hooks.jsonl
  • cleat hooks command pretty-printed event timeline with --json, --follow, and --clear flags
  • hooks capability opt-in event logging via cleat config --enable hooks or --cap hooks
  • Browser bridge open, xdg-open, and sensible-browser shims inside the container forward URLs to the host browser (auth flows, OAuth, etc.)
  • OAuth callback proxy browser watcher detects redirect_uri in auth URLs and starts a TCP proxy (socat or python3) from host to container via docker exec, so OAuth callbacks reach Claude Code's HTTP server inside Docker
  • cleat login browser bridge login command starts the browser watcher so the auth URL opens automatically and the callback is proxied back
  • Host connectivity --add-host host.docker.internal:host-gateway is always added on Linux and skipped when Docker Desktop is detected, since it already provides the host name; no capability needed
  • Concurrent write safety flock-based file locking prevents interleaved JSONL lines from parallel hooks
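
Detecting redirect_uri in an auth URL, as the OAuth callback proxy does, is a matter of decoding one query parameter and reading its port. A sketch under the assumption that only loopback callbacks should be proxied; callback_port is an illustrative name and the real watcher is shell code.

```python
from urllib.parse import parse_qs, urlsplit

LOOPBACK_HOSTS = ("localhost", "127.0.0.1", "::1")


def callback_port(auth_url):
    """Return the loopback port named in redirect_uri, or None."""
    query = parse_qs(urlsplit(auth_url).query)  # also percent-decodes
    values = query.get("redirect_uri")
    if not values:
        return None
    redirect = urlsplit(values[0])
    if redirect.hostname not in LOOPBACK_HOSTS:
        return None
    try:
        return redirect.port
    except ValueError:  # malformed or out-of-range port
        return None
```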

Fixes #

  • Settings overlay ~/.claude/settings.json is mounted with hooks stripped so host-only commands (e.g. osascript) don't fail inside the container; falls back to empty {} if jq is unavailable
  • Resume refreshes overlay cleat resume refreshes the settings overlay so hook changes between stop/resume take effect
  • Container cleanup cleat rm removes hooks, clipboard, and settings overlay temp directories
  • Entrypoint resilience hook injection failures no longer prevent container startup
  • Hook timeout host hook commands time out after 30s to prevent bridge hangs
  • Process safety hook bridge tracks and reaps child processes; cleanup kills all children on session exit; wait runs after every kill to reap disowned children
  • Spinner orphan on Docker failure docker start, docker run, and docker build protected with || rc=$? so set -euo pipefail cannot kill the script before spin_stop runs; global EXIT trap as defense-in-depth
  • Login failure cleanup docker exec ... claude login protected with || rc=$? so browser watcher is always killed even if login fails or user cancels
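
The hook timeout pairs naturally with the stdin contract from the Features list: each hook receives the event JSON on stdin and is killed if it runs past the limit. A sketch assuming the hook is given as an argv list (real hook commands are shell strings) and a hypothetical run_host_hook helper:

```python
import json
import subprocess


def run_host_hook(argv, event, timeout=30.0):
    """Run one hook with the event JSON on stdin; None means timed out."""
    try:
        result = subprocess.run(
            argv,
            input=json.dumps(event),
            text=True,
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and reaped the child here.
        return None
    return result.returncode
```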

Changes #

  • 356 (+87) behavioral tests (87 hooks/bridge/safety tests covering event forwarding, host hook execution, browser bridge, OAuth proxy, settings overlay, spinner orphan, process safety, capability gating)
  • Hook settings injected into .claude/settings.local.json (project-local, gitignored)
  • Docker image includes /var/log/cleat, cleat-hook-logger, and open-bridge shims
  • Source-level regression guard greps for unprotected docker commands in spin contexts

v0.4.0 #

Unified terminal design system with spinners and clean output.

No Docker noise. Concise status lines with color, braille spinners, and suppressed boilerplate.

Features #

  • Terminal design system unified symbols (✔ ▸ ! ✖), 8-color ANSI palette, and formatting rules shared across CLI and installer
  • Braille spinner 10-frame animation at 80ms/frame for slow operations, with ASCII fallback for non-Unicode terminals
  • Clean startup sequence step-by-step checkmarks: Image ready, Container started, Auth shared, Claude launched
  • Summary block post-launch output showing container name, project path, active capabilities, and env var counts
  • Docker output suppression build logs hidden on success, shown on failure; container IDs and promo text removed
  • Clean exit ✔ Session ended — resume with: cleat resume; Docker promo text and Terminated messages suppressed
  • TTY detection spinners degrade to static lines when stdout is not a terminal
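
The spinner rules above (10 frames, 80ms each, ASCII fallback, static output without a TTY) fit in a few lines. The specific braille glyphs below are a common 10-frame set chosen for illustration, not necessarily the ones the CLI ships, and all function names are assumptions.

```python
BRAILLE_FRAMES = "⠋⠙⠹⠸⠼⠴⠦⠧⠇⠏"  # assumed glyph set, 10 frames
ASCII_FRAMES = "|/-\\"
FRAME_MS = 80


def pick_frames(encoding):
    """Use braille only if the terminal encoding can represent it."""
    try:
        BRAILLE_FRAMES.encode(encoding or "ascii")
    except (UnicodeEncodeError, LookupError):
        return ASCII_FRAMES
    return BRAILLE_FRAMES


def frame_at(elapsed_ms, frames):
    """Select the frame for a given elapsed time in milliseconds."""
    return frames[(elapsed_ms // FRAME_MS) % len(frames)]


def status_line(label, elapsed_ms, is_tty, encoding="utf-8"):
    """Animated line on a terminal, plain label otherwise."""
    if not is_tty:
        return label
    return f"{frame_at(elapsed_ms, pick_frames(encoding))} {label}"
```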

Fixes #

  • Clipboard watcher cleanup trap on TERM/INT/HUP and disown prevent Terminated messages
  • Cursor restoration spinner cleanup restores cursor visibility on unexpected exit via EXIT trap

Changes #

  • 269 (+53) behavioral tests (12 new for terminal UX and output suppression)
  • Terminal design system documented in concept/12-terminal-design-system.md

v0.3.0 #

Opt-in capabilities for git, SSH, and environment variables.

Extend what the container can access from the host. All disabled by default — the baseline sandbox is unchanged.

Features #

  • cleat config wizard interactive mode to toggle capabilities; direct mode with --enable, --disable, --list
  • git capability mount ~/.gitconfig read-only so commits use your host identity
  • ssh capability mount ~/.ssh read-only with SSH agent forwarding for private repos
  • env capability auto-load env vars from ~/.config/cleat/env (global) and .cleat.env (project)
  • Session-scoped overrides --cap, --env KEY=VALUE, --env-file PATH CLI flags
  • Configuration drift detection config fingerprint stored as Docker label; warns when config changes after container creation
  • Image version detection suggests cleat rebuild when CLI and image versions diverge
  • Project-level config cleat config --project saves to /.cleat, merged with global
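
Configuration drift detection needs a fingerprint that is stable for equal configs and changes when any setting changes. One way to sketch it is a hash over a canonical serialization, compared against the value stored as a container label. The hashing scheme, the truncation to 12 characters, and the function names are all assumptions; the notes only say a fingerprint is stored as a Docker label.

```python
import hashlib
import json


def config_fingerprint(config):
    """Hash a canonical form so key order does not matter."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def has_drifted(label_value, current_config):
    """Compare the label stored at creation with the current config."""
    return label_value != config_fingerprint(current_config)
```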

Fixes #

  • Bash 3.2 compatibility removed associative arrays (local -A) that require bash 4.0+
  • Empty array expansion protected against set -u failures on empty arrays in bash < 4.4
  • Env resolution replaced grep/sed pipeline that silently exited under set -euo pipefail

Changes #

  • 216 (+95) behavioral tests (95 new for capabilities, config, hardening, bash compat)
  • 21 mutation tests — all mutations caught
  • Source-level scans for forbidden bash 4+ patterns
  • Strict-mode regression tests that run the actual binary

v0.2.0 #

BATS test suite with 121 behavioral tests.

Features #

  • Test suite 121 tests covering every CLI command, clipboard shim, container naming, update logic, and Docker entrypoint; 12/12 code mutations caught
  • Test runner ./test.sh with per-file summary, skip counts, timing, and failure details
  • Sourceable CLI bin/cleat can be sourced without running main, enabling direct function testing

Changes #

  • BATS framework (bats-core, bats-assert, bats-support) added as git submodules
  • Docker stub with file-based mock responses and function-override mocks
  • 14 test files covering all CLI surface area

v0.1.0 #

Docker sandbox for AI coding agents. One command. Zero risk.

Features #

  • One command cleat builds the image, starts a per-project container, and launches Claude Code with full permissions
  • Per-project isolation each project gets its own container; run as many as you need in parallel
  • Session persistence cleat stop and cleat resume pick up where you left off
  • Zero permission issues host UID/GID mapped into the container automatically
  • Clipboard bridge pbcopy, xclip, xsel shims copy to host clipboard via OSC 52
  • Shared auth ~/.claude mounted into all containers; log in once
  • Auto-upgrade notifications daily lightweight tag check, never blocks your workflow
  • Security hardening --pids-limit 1024, --memory 8g, numeric UID/GID validation, Debian slim base
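
For readers unfamiliar with OSC 52, the clipboard bridge relies on a terminal escape sequence that carries base64-encoded text to the host terminal, which then sets the clipboard. A sketch of the sequence a shim would print; the function name is illustrative, and the shims themselves are not Python.

```python
import base64


def osc52_sequence(text):
    """Build the OSC 52 escape sequence that sets the clipboard."""
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    # ESC ] 52 ; c ; <base64> BEL, where "c" selects the clipboard.
    return f"\x1b]52;c;{payload}\x07"
```

Writing this sequence to the terminal works across docker exec because it travels in the ordinary output stream, provided the host terminal supports OSC 52.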