Skeptik Log

OpenClaw 4.20 broke things for power users: one community member documented all 29 fixes

By u/Marcelovc
Note: Sections highlighted in blue are research additions for completeness, not present in the original thread.

After upgrading to OpenClaw 4.20, one user found 29 things that needed manual patching. Some of their fixes later appeared upstream. They also say they got banned from GitHub for reporting the bugs.

📋 Source: Reddit r/openclaw (u/Marcelovc), GitHub, OpenClaw changelog

Why this matters

If you’re running OpenClaw with ACP agents, Discord bindings, long sessions, or any non-trivial setup, version 4.20 likely broke something you rely on. u/Marcelovc didn’t just complain: they catalogued every failure and every fix. The result is part survival guide, part indictment of a release that shipped incomplete and of a community response that, by their account, banned the messenger.

What is OpenClaw? OpenClaw is an open-source personal AI assistant (363K+ GitHub stars) that runs AI agents across any OS and platform. Version 4.20 (tagged v2026.4.20), released April 21, 2026, is officially a minor release: wizard onboarding improvements, tiered model pricing support, Kimi K2.6 as the new default for Moonshot, cron state split into jobs-state.json, compaction start/completion notifications, and various fixes. The gateway is the central daemon managing sessions, channel bindings (Telegram, Discord, etc.), and message routing. ACP (Agent Client Protocol) lets external agents like Codex, Claude Code, and Kimi connect as persistent, stateful sessions within the OpenClaw runtime.

The 29 fixes, by subsystem

Rather than listing each fix mechanically, here they are grouped by subsystem, with verification against what the project has actually acknowledged.

Gateway and shutdown (fixes 1, 22)

  • Fix 1: Gateway restarts dropped sessions and bindings mid-flight. Solution: add SIGTERM drain handling and use KillMode=mixed in the systemd unit.
  • Fix 22: Inconsistent restart behavior via systemd. Solution: proper drop-in overrides.
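
To make the drain idea in fix 1 concrete, here is a minimal sketch in Python. It is not OpenClaw's code: the class name, method names, and timeout are hypothetical. On SIGTERM the process stops accepting new sessions and waits a bounded time for in-flight ones. With KillMode=mixed, systemd sends SIGTERM to the main process only and later sends SIGKILL to whatever is left in the unit's control group, which is what gives a handler like this room to run.

```python
import signal
import threading
import time


class DrainingGateway:
    """Toy gateway (hypothetical) that drains sessions before exiting."""

    def __init__(self, drain_timeout_s: float = 30.0) -> None:
        self.drain_timeout_s = drain_timeout_s
        self.draining = False
        self._active = 0
        self._cond = threading.Condition()

    def install_signal_handler(self) -> None:
        # Call from the main thread. The handler only flips a flag.
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame) -> None:
        self.draining = True

    def start_session(self) -> bool:
        """Register a new session; refuse once draining has started."""
        with self._cond:
            if self.draining:
                return False
            self._active += 1
            return True

    def end_session(self) -> None:
        with self._cond:
            self._active -= 1
            self._cond.notify_all()

    def drain(self) -> int:
        """Wait for sessions to finish; return how many were still active."""
        self.draining = True
        deadline = time.monotonic() + self.drain_timeout_s
        with self._cond:
            while self._active > 0:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                self._cond.wait(timeout=remaining)
            return self._active
```

The return value of drain() is the piece the linked issues ask for: a count of what was lost, so it can at least be logged before the process exits.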

These are not controversial bugs. GitHub issue #26412 documents that gateway restarts kill active sub-agents with no drain mode, no session recovery, and no log of what was lost. Issue #32961 requests graceful session drain before restart. Issue #23887 reports that KillMode=process in the systemd unit leaves orphan Chrome/Playwright processes after gateway crashes. PR #20357 proposed making KillMode configurable but the underlying drain problem remains open. The 4.20 changelog does not mention any of these fixes.

ACP and binding persistence (fixes 2, 7, 14, 17, 28, 29)

  • Fix 2: ACP binding identity and state lost across sessions
  • Fix 7: ACP manager metadata corruption
  • Fix 14: Delayed streaming in ACP sessions (solution: deliveryMode = "live")
  • Fix 17: 30-minute limit on long ACP turns (solution: bump to 3600s)
  • Fix 28: Claude CLI crashes in ACP context (solution: heal daemon)
  • Fix 29: Missing persistence for ACP binding healing (solution: lossless-claw extension)

ACP stability is a known pain point. GitHub issue #62128 documents intermittent empty payloads for ACP agents (pi, opencode) triggered by concurrency. The issue is still open. The 4.20 official changelog includes a fix for Anthropic API defaulting that was scoping anthropic-messages transport incorrectly for non-Anthropic providers, which maps closely to fix 18 (Kimi streaming tool arguments missing). But none of the ACP binding persistence or metadata corruption fixes appear in the changelog.

Compaction and memory (fixes 15, 16)

  • Fix 15: Uncontrolled memory growth from startup context. Solution: disable startup context in openclaw.json.
  • Fix 16: Overly aggressive compaction in long sessions. Solution: increase context tokens to 400k.
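
Both workarounds are edits to openclaw.json, so the safe pattern is: back up first, then change one key. The sketch below assumes the file is strict JSON. The key path used in the usage note is purely hypothetical, since the thread summary does not name the real keys.

```python
import json
import shutil
from pathlib import Path


def set_config_key(config_path: Path, dotted_key: str, value) -> Path:
    """Back up a JSON config, then set one nested key. Returns the backup path."""
    backup_path = config_path.with_name(config_path.name + ".bak")
    shutil.copy2(config_path, backup_path)

    config = json.loads(config_path.read_text(encoding="utf-8"))
    node = config
    *parents, leaf = dotted_key.split(".")
    for part in parents:
        # Create intermediate objects when they are missing.
        node = node.setdefault(part, {})
    node[leaf] = value

    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return backup_path
```

A call such as set_config_key(Path("openclaw.json"), "example.contextTokens", 400000) would raise the value and leave openclaw.json.bak behind; substitute the real key name from your own config.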

Compaction is OpenClaw’s process for compressing conversation history when the context window fills up. It is one of the most complained-about subsystems in the project. GitHub issue #13624 describes an “auto-compaction death spiral” where overflow recovery fails when the transcript is too large to compact. Issue #10613 documents compaction retry cascades causing context overflow loops. A detailed postmortem from Clelp traces a non-stop compaction loop to a 184MB embedding cache with no eviction policy and a SQLite store that grew unbounded.

The 4.20 release did add opt-in start and completion notifications during compaction (PR #67830), but this only tells you compaction is happening - it doesn’t fix the underlying threshold problem. Marcelovc’s fix (bumping context to 400k) is a workaround, not a solution. The project has not addressed compaction’s fundamental sensitivity to long, tool-heavy sessions.

Bash and execution safety (fixes 4, 5, 6, 21)

  • Fix 4: Multi-edit failures with no hints, causing infinite loops
  • Fix 5: Compound bash commands (a && b) bypassing preflight checks
  • Fix 6: Runtime self-killing in a loop via pkill (solution: use fuser -k <port>/tcp)
  • Fix 21: Unbounded foreground bash executions with potential hangs (solution: enforce timeouts via hooks)
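
Fix 5 is easiest to see in code. A preflight that only inspects the first word of a command never sees what follows an && operator. The sketch below is an illustration, not OpenClaw's checker: the denylist is hypothetical. It splits a command line on shell control operators with Python's shlex and checks every sub-command.

```python
import shlex

SEPARATORS = {"&&", "||", ";", "|", "&"}
BLOCKED = {"pkill", "killall"}  # hypothetical denylist, for illustration only


def split_compound(command: str) -> list[list[str]]:
    """Split a shell command line into sub-commands at control operators."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    parts: list[list[str]] = [[]]
    for token in lexer:
        if token in SEPARATORS:
            parts.append([])
        else:
            parts[-1].append(token)
    return [part for part in parts if part]


def preflight(command: str) -> list[str]:
    """Return blocked program names found in any sub-command."""
    return [part[0] for part in split_compound(command) if part[0] in BLOCKED]
```

This is deliberately shallow. It does not look inside subshells, command substitution, or quoted strings passed to sh -c, so a real check would need a proper shell parser.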

These are all defensive hardening patches. None appear in the 4.20 changelog. The pkill self-match bug is a classic process management footgun: the runtime’s own process matches the kill pattern, so it keeps killing itself in a loop (pkill sends SIGTERM by default, not SIGKILL).
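
The self-match problem can be sketched without touching real processes. pkill never matches itself, but with -f it matches against full command lines, so the runtime and the shell that launched the command can both match. The function below is a hypothetical helper that works on a plain list of (pid, command line) pairs and shows the filtering a safe kill needs.

```python
import os
import re


def select_targets(processes, pattern, protected_pids=None):
    """Return PIDs whose command line matches, minus protected ones.

    processes: iterable of (pid, cmdline) pairs.
    protected_pids: defaults to this process and its parent.
    """
    if protected_pids is None:
        protected = {os.getpid(), os.getppid()}
    else:
        protected = set(protected_pids)
    regex = re.compile(pattern)
    return [
        pid
        for pid, cmdline in processes
        if regex.search(cmdline) and pid not in protected
    ]
```

The fuser -k <port>/tcp workaround in fix 6 avoids the issue from a different angle: it targets whichever process holds the port, not whichever command line happens to contain a string.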

Model and provider issues (fixes 8, 9, 10, 18, 19, 27)

  • Fix 8: Raw model IDs shown instead of readable names in the selection UI
  • Fix 9: Infinite hang on Codex initialization (solution: 60s timeout)
  • Fix 10: Moonshot API rejecting requests with the thinking field (solution: strip the key)
  • Fix 18: Empty tool arguments during Kimi streaming (solution: switch to anthropic-messages format)
  • Fix 19: Incorrect fallback models in the Codex cascade (solution: empty the fallback array)
  • Fix 27: Behavioral drift in Gemini tool use and thinking (solution: reapply hotfix after every update)

The 4.20 changelog does address some provider-specific issues: it adds support for thinking.keep = "all" on kimi-k2.6 and strips thinking for other Moonshot models or when pinned tool_choice disables it (PR #68816). It also fixes Anthropic API defaulting that was incorrectly scoping anthropic-messages transport to non-Anthropic providers. But Codex initialization hangs, model display names, and Gemini behavioral drift are absent from the official notes.
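
The Moonshot workaround in fix 10 amounts to removing one field from the request before it goes out. A minimal sketch, assuming a dict-shaped payload with "model" and "thinking" keys (the payload shape and the second model name in the usage note are assumptions; only kimi-k2.6 comes from the changelog):

```python
import copy

KEEP_THINKING_MODELS = {"kimi-k2.6"}


def prepare_payload(payload: dict) -> dict:
    """Return a copy of the payload with 'thinking' removed where unsupported."""
    cleaned = copy.deepcopy(payload)
    if cleaned.get("model") not in KEEP_THINKING_MODELS:
        # pop with a default is a no-op when the key is already absent.
        cleaned.pop("thinking", None)
    return cleaned
```

For example, prepare_payload({"model": "moonshot-example", "thinking": {"keep": "all"}}) returns a payload without the thinking field, while the same call with "kimi-k2.6" leaves it in place. The changelog's tool_choice condition is not modeled here.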

Auth, logging, and plugin noise (fixes 11, 12, 13)

  • Fix 11: False OAuth token expiry warnings (solution: respect refresh_token in health checks)
  • Fix 12: Repeated noisy warnings from the plugin loader (solution: treat installs as a trusted source)
  • Fix 13: Files not attached when inline media paths are present (solution: detect, validate, and stage found files)
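
The logic behind fix 11 is small enough to sketch. An access token that has passed its expiry is only a real problem if there is no refresh token to renew it. The function and status names below are hypothetical, not OpenClaw's health check.

```python
from typing import Optional


def token_status(
    expires_at: float,
    refresh_token: Optional[str],
    now: float,
    skew_s: float = 60.0,
) -> str:
    """Classify a credential as 'valid', 'refreshable', or 'expired'."""
    if now + skew_s < expires_at:
        return "valid"
    if refresh_token:
        # Past expiry but renewable: not worth a warning.
        return "refreshable"
    return "expired"
```

A health check built this way would warn only on "expired", which removes the false alarms without hiding genuinely dead credentials.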

The 4.20 changelog does include a fix for webchat treating inline image attachments as media for empty-turn gating, which overlaps with fix 13.

Claude SDK/CLI workarounds (fixes 24, 25, 26)

  • Fix 24: Upstream bugs in Claude vendor code for ACP (solution: patch with OPENCLAW_* markers)
  • Fix 25: Unwanted system reminders injected by the Claude SDK
  • Fix 26: Same reminder issue at the CLI level, requiring a patch to the CLI binary itself

These three fixes are effectively patching Anthropic’s code from the outside, which is fragile by definition and will break with every SDK update.

Infrastructure and cleanup (fixes 3, 20, 23)

  • Fix 3: Crash on dirty shutdown with AgentDisconnectedError (solution: suppress the error before exit)
  • Fix 20: Accumulated stale state in the runtime (solution: periodic crontab cleanup)
  • Fix 23: Cross-project session leaks at startup (solution: rotate sessions on boot)

The 4.20 changelog does address session store bloat indirectly, with PR #69404 enforcing built-in entry caps and age pruning by default to prevent cron/executor session backlogs from causing OOM before the write path runs.
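
Entry caps plus age pruning is a simple policy, sketched here on an in-memory list. The field names and limits are assumptions for illustration and do not reflect the actual session store schema.

```python
def prune_entries(entries, now, max_age_s, max_entries):
    """Drop entries older than max_age_s, then keep the newest max_entries.

    entries: list of dicts with 'id' and 'updated_at' (epoch seconds).
    """
    fresh = [e for e in entries if now - e["updated_at"] <= max_age_s]
    fresh.sort(key=lambda e: e["updated_at"], reverse=True)
    return fresh[:max_entries]
```

Running something like this before the write path, as the changelog describes, keeps the store bounded even when a backlog has already built up.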

Community reactions

The post sparked predictably divided responses.

u/Flimsy_Exercise_1561, identifying as a developer, pushed back on the methodology: “If you patch files in dist/, they will disappear with the next release. You have to patch the source code, rebuild, then check whether it works. If you weren’t able to get your changes in, they either suck, descriptions suck, or the changes are not that important in the eyes of the maintainers.”

u/ArchiDevil was blunter: “If these are ‘bugs’ you had reported to GitHub, that not surprising that you’ve been banned.”

u/Odd-Energy71 contributed an ironic fix #30: “Rug pull. Problem: References Openclaw founder. Path: Title. Solution: wut?”

And u/Correct_Support_2444, on the version number: “Dude, it’s a 4.20 release. It’s probably a damned inside joke. It’s well baked.”

The community pushback on patching dist/ files is valid. OpenClaw ships minified, hashed JavaScript bundles (the filenames in the fixes - gateway-cli-Dk7XTZhh.js, bash-tools-UuDLD4ZI.js - include content hashes). Any npm update or version bump replaces these files entirely. The proper path is either contributing patches to the source repository or building from a fork. That said, Marcelovc’s core observation stands: these bugs exist, they affect production deployments, and several have open GitHub issues confirming them regardless of who reported them first.

The version number elephant

Release 4.20 came out on April 21, 2026, and the “4/20” jokes wrote themselves. But the rapid-fire patch cycle tells its own story: v2026.4.21 landed on April 22 and v2026.4.22 followed shortly after. When three releases ship within days of each other, the first one wasn’t finished. Most of Marcelovc’s 29 issues do not appear in the official 4.20 changelog, which focuses on wizard restyling, Kimi defaults, cron state split, and a handful of provider fixes; the few overlaps noted above (Moonshot thinking handling, anthropic-messages defaulting, inline attachments, session pruning) are partial. The gap between the official narrative and what power users actually experience is the real story here.

The technical details

From here on, this gets into the weeds. If you’re interested in the big picture rather than the implementation details, you can skip to the conclusion.

Post-upgrade verification

Marcelovc suggests a verification prompt:

“Audit my actual install against this fixes inventory (check above). Search by affected subsystem, config key, runtime behavior, and file path.”

The full inspection checklist covers:

  • Installed dist/ files (compare against known patched versions)
  • openclaw.json (check for missing or incorrect keys)
  • Systemd units and drop-in overrides
  • Hooks (verify patch markers and paths)
  • Cron jobs (check for stale state accumulation)
  • Logs (scan for the specific error patterns listed)
  • Backups (verify they exist before any manual patch)
  • Local extensions (check compatibility after upgrade)
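
Part of this checklist can be automated. Fix 24 mentions OPENCLAW_* markers in patched files, so one rough check after an upgrade is to count which files still contain that prefix. The sketch below assumes the markers are plain text inside .js files; the exact marker names are not given in the thread.

```python
from pathlib import Path


def find_patch_markers(root: Path, marker: str = "OPENCLAW_") -> dict[str, int]:
    """Map relative file path -> number of marker occurrences under root."""
    hits: dict[str, int] = {}
    for path in sorted(root.rglob("*.js")):
        text = path.read_text(encoding="utf-8", errors="replace")
        count = text.count(marker)
        if count:
            hits[str(path.relative_to(root))] = count
    return hits
```

An empty result after an update is a strong hint that the patched bundles were replaced and the fixes need to be reapplied.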

Why patching dist/ is problematic

OpenClaw bundles are content-hashed and minified. A patch to gateway-cli-Dk7XTZhh.js works until the next npm update replaces the entire file. The sustainable approaches are:

  • Fork the source, apply patches, rebuild
  • Contribute fixes upstream and wait for the next release
  • Use the hooks system where possible to work around bugs without touching dist/
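
If you do patch dist/ anyway, it helps to detect when an update has swapped a bundle out from under you. The sketch below parses names like gateway-cli-Dk7XTZhh.js into a stem and a hash, then compares a recorded file list with the current one. The 8-character hash length is an assumption inferred from the two filenames quoted in the thread.

```python
import re

# Assumed naming scheme: <stem>-<8-char hash>.js
BUNDLE_RE = re.compile(r"^(?P<stem>.+)-(?P<hash>[A-Za-z0-9_-]{8})\.js$")


def parse_bundle_name(filename: str):
    """Return (stem, hash) for a hashed bundle name, or None."""
    match = BUNDLE_RE.match(filename)
    return (match["stem"], match["hash"]) if match else None


def changed_bundles(recorded, current):
    """Map stem -> (old_hash, new_hash) for bundles whose hash changed."""
    old = dict(filter(None, map(parse_bundle_name, recorded)))
    new = dict(filter(None, map(parse_bundle_name, current)))
    return {
        stem: (old[stem], new[stem])
        for stem in old
        if stem in new and old[stem] != new[stem]
    }
```

Any stem that shows up in the result is a file whose manual patch is gone.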

The bottom line

Key points:

  • 29 manual fixes were needed to make OpenClaw 4.20 functional for power-user setups - most of them absent from the official changelog, a few only partially addressed
  • The user who documented them says they were banned from GitHub for reporting the bugs, while some of their patches later appeared upstream
  • Patching dist/ files is a temporary fix that breaks on every update - the real gap is between the official release narrative and production reality

The version number might be a joke, but the bugs aren’t. When your community’s most thorough bug reporter gets banned instead of listened to, the problem isn’t the bugs.

Resources

Reddit thread by u/Marcelovc (r/openclaw)