Using AI to Fix AI: How I Upgraded OpenClaw and Built a System So I Never Have to Debug It Again

There is a particular kind of frustration that comes from asking your AI assistant to upgrade itself and watching it go completely silent.

No error. No response. Just a dead Telegram chat and me SSH-ing into my Hetzner server at an inconvenient hour trying to figure out what went wrong.

This happened to me more than once. And every time, the debugging process was the same: remote into the VM, poke around the logs, find something broken, fix it manually, restart the gateway, hope for the best. Not exactly the smooth “AI running my life” aesthetic I was going for.

So after the most recent incident, I decided to stop patching and start building. Here is what happened, what broke, and the system I put in place so it does not happen this way again.

The Setup

I run OpenClaw on a Linux VM on Hetzner. OpenClaw is a self-hosted AI assistant that connects to messaging channels like Telegram, so I can talk to my AI assistant the same way I message anyone else. My assistant is named August, running on OpenAI Codex gpt-5.2, and handles tasks, reminders, content research, and general automation work.

When a new version drops, the natural instinct is to just tell August to handle it. “Hey, upgrade to the latest version.” August runs the command, the gateway restarts, and Telegram goes silent.

The problem is structural. Telegram is both the control channel and the thing being upgraded. The moment the gateway restarts mid-upgrade, the connection dies. August cannot tell you what happened because August is the thing that just went offline.

What Actually Broke This Time

I was upgrading from 2026.2.24 to 2026.3.11. The npm install completed, the gateway restarted, but Telegram stopped responding entirely.

The root cause had nothing to do with the version itself. The systemd service file was still pointing to a temporary path that npm had created and then renamed during the install process. Every time August received a message, it tried to load workspace templates from a directory that no longer existed, crashed silently, and never replied.

The error was: Missing workspace template: AGENTS.md.

Not obvious. Not loud. Just silence.

I brought in Claude Code, Anthropic’s agentic coding tool, gave it SSH access to the VM, and told it to find and fix the problem. It traced the broken path in the service file, updated the ExecStart to point to the correct location, rebuilt the patched version of OpenClaw for 2026.3.11, restarted the gateway, and confirmed Telegram was responding again. All autonomously, with me just watching the output.

That is the “using AI to fix AI” part. And it worked cleanly.

The System I Built Afterwards

Fixing the immediate problem was not enough. The same thing would happen next upgrade if I did not change the process.

So I used Claude Code to build a proper upgrade automation system directly on the VM. Claude Code runs in a terminal. Unlike the Claude you chat with in a browser, it can execute commands, read and write files, and work directly on your machine or server. In this case I SSH’d into my Hetzner VM, opened a Claude Code session, and gave it the task. It handled everything from there without me writing a single line of code.

If you have never set up Claude Code on a remote server before, that process deserves its own post and I will cover it separately. For now, just know it is what made this possible.

The system it built has three parts.

The first is an upgrade script that handles the full process in the right order: back up the config, stop the gateway cleanly, run the npm install, verify and update the service file path, run openclaw doctor, apply any custom patches, restart, wait thirty seconds, then check that Telegram is showing as running. If any step fails, it rolls back to the backup, restarts on the old version, and sends a failure message to August, who then fires a Gmail alert so I know what happened even if Telegram itself did not recover.
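The ordering and rollback behaviour described above can be sketched in a few lines of shell. This is a minimal illustration under assumptions, not the script Claude Code generated: the function names (`backup_config`, `restore_config`, `run_with_rollback`) are hypothetical, and the real steps (stopping the gateway, npm install, openclaw doctor and so on) would be passed in as step functions.

```shell
#!/usr/bin/env bash
# Sketch only: back up a config directory, run steps in order,
# and restore the backup if any step fails. All names are hypothetical.

# Copy the config directory to <dir>.bak-<version>-<timestamp>
# and print the backup path.
backup_config() {
  local dir="$1" version="$2"
  local dest="${dir}.bak-${version}-$(date +%Y%m%d-%H%M%S)"
  cp -a "$dir" "$dest" && printf '%s\n' "$dest"
}

# Replace the config directory with a copy of the backup.
restore_config() {
  local dir="$1" backup="$2"
  [ -d "$backup" ] || return 1
  rm -rf "$dir" && cp -a "$backup" "$dir"
}

# Usage: run_with_rollback <config_dir> <version> <step> [<step> ...]
# Each step is the name of a function or command that returns
# non-zero on failure.
run_with_rollback() {
  local dir="$1" version="$2" backup step
  shift 2
  backup="$(backup_config "$dir" "$version")" || return 1
  for step in "$@"; do
    if ! "$step"; then
      echo "step failed: $step, restoring $backup" >&2
      restore_config "$dir" "$backup"
      return 1
    fi
  done
  echo "upgrade to $version succeeded"
}
```

A call might look like `run_with_rollback ~/.openclaw 2026.3.11 stop_gateway install_version check_service_path`, where each step name is a function you define yourself. The point of the design is that the backup is taken before anything changes, so every later failure has something to fall back to.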

The second is a one-shot systemd timer that August can write to on demand. When I tell August via Telegram to schedule an upgrade for 3am, it runs a script that sets the timer, enables it, and confirms back to me. The upgrade runs while I am asleep, and the first message I get in the morning is either August confirming success or a Gmail telling me it rolled back and why.
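As a rough sketch of what writing such a timer can look like, the function below generates a timer unit for a single date and time. The function name `write_upgrade_timer` and the `openclaw-upgrade.service` unit it triggers are assumptions for illustration; the generated scheduling script may well differ.

```shell
#!/usr/bin/env bash
# Sketch only: write a one-shot systemd user timer for a given date and time.
# The function name and the service unit it triggers are assumptions.

# Usage: write_upgrade_timer <timer_file> "<YYYY-MM-DD HH:MM:SS>"
write_upgrade_timer() {
  local timer_file="$1" when="$2"
  mkdir -p "$(dirname "$timer_file")"
  cat > "$timer_file" <<EOF
[Unit]
Description=One-shot OpenClaw upgrade

[Timer]
# A full date and time matches only once, so the timer fires a single time.
OnCalendar=$when
Unit=openclaw-upgrade.service

[Install]
WantedBy=timers.target
EOF
}

# After writing the file, the timer would be loaded and armed with:
#   systemctl --user daemon-reload
#   systemctl --user enable --now openclaw-upgrade.timer
```

For example, `write_upgrade_timer ~/.config/systemd/user/openclaw-upgrade.timer "2026-03-12 03:00:00"` would schedule a run for 3am on that date. A matching service unit still has to call the upgrade script with the target version; how that version gets passed through is left open here.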

The third is an instruction file that lives in August’s workspace so it knows how to handle upgrade requests from me in every future session without needing to be reminded.

No n8n. No webhooks. No extra infrastructure. August already had Gmail connected via MCP, so the whole notification chain runs natively.

The Prompt I Used

If you want Claude Code to build the same system for your setup, here is a cleaned up version of the prompt I used. Replace the placeholders with your own details.

If you are not sure what to put in the placeholders, just ask your assistant before you start. Something like:

“What is my current username, npm global path, and systemd service file location for OpenClaw?”

Your assistant will give you those details in seconds. No manual digging through config files required.

Prompt below:

I need you to build an OpenClaw upgrade automation system on this machine. Here is what I need:

1. An upgrade script at ~/scripts/openclaw-upgrade.sh that:

  • Accepts a target version as an argument e.g. ./openclaw-upgrade.sh 2026.3.11

  • Backs up ~/.openclaw to ~/.openclaw.bak-[version]-[timestamp]

  • Stops the gateway cleanly

  • Runs npm install -g openclaw@[version]

  • Verifies and updates the systemd service ExecStart path to [YOUR_NPM_GLOBAL_PATH]/lib/node_modules/openclaw/dist/index.js

  • Runs openclaw doctor

  • Applies patches from ~/openclaw-patches/repatch.sh if present

  • Restarts the gateway, waits 30 seconds, checks that Telegram shows as running

  • On success: sends a Telegram confirmation via openclaw message send

  • On failure: rolls back automatically, restarts on the old version, notifies your assistant to send a Gmail alert

2. A one-shot systemd timer at ~/.config/systemd/user/openclaw-upgrade.timer that your assistant can write to on demand

3. A scheduling script at ~/scripts/schedule-upgrade.sh that takes a version and time as arguments, writes the timer, enables it, and confirms back

4. An instruction file at ~/scripts/UPGRADE-INSTRUCTIONS.md explaining how your assistant should handle upgrade requests, extract version and time from messages, run the script, and handle failure notifications

Context: username is [YOUR_USERNAME], npm global path is [YOUR_NPM_GLOBAL_PATH], gateway runs as a systemd user service, assistant has Gmail connected via MCP. Do not run the actual upgrade. Just build and validate the scripts.

One thing worth calling out before you use this prompt. The single most important step in the entire script is the service file path check. When npm installs a new version of OpenClaw, it sometimes creates a temporary directory during the process and then renames it once the install completes. If your systemd service file is still pointing to the old temporary path, OpenClaw will appear to start fine but will crash silently on every message because it cannot find its own files.

You will not see a loud error. You will just see silence.

If you want to check this manually at any point, just ask your assistant in Telegram:

“Check my OpenClaw service file and confirm the ExecStart path points to the correct dist/index.js location.”

Your assistant will inspect the file and tell you immediately whether the path is correct or needs updating. The upgrade script handles this automatically on every run, but it is worth knowing you can ask your assistant to verify it anytime, especially if OpenClaw ever goes silent after an update and you are not sure why.
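If you would rather run the check yourself from a shell, something along these lines would do it. This is a sketch under assumptions: it supposes the ExecStart line contains an absolute path ending in dist/index.js with no spaces in it, and the function names `execstart_entry` and `check_execstart` are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: check that the entry point named in a unit's ExecStart exists.
# Assumes ExecStart contains an absolute path ending in dist/index.js
# and that the path has no spaces in it.

# Print the first path ending in dist/index.js from the ExecStart line.
execstart_entry() {
  local unit_file="$1"
  grep -m1 '^ExecStart=' "$unit_file" \
    | tr ' ' '\n' \
    | sed 's/^ExecStart=//' \
    | grep -m1 'dist/index\.js$'
}

# Return 0 if the entry point exists on disk, 1 otherwise.
check_execstart() {
  local unit_file="$1" entry
  entry="$(execstart_entry "$unit_file")" || {
    echo "no entry point found in $unit_file" >&2
    return 1
  }
  if [ -f "$entry" ]; then
    echo "ok: $entry"
  else
    echo "missing: $entry" >&2
    return 1
  fi
}
```

Point `check_execstart` at your own service file under ~/.config/systemd/user/. A "missing" result means the unit references a file that is no longer on disk, which is the silent failure described above. After correcting the path, run `systemctl --user daemon-reload` before restarting the gateway so systemd picks up the change.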

The Lesson

The upgrade process breaking was not really an OpenClaw problem. It was an architecture problem. I was using the thing being upgraded as the tool to run the upgrade, with no fallback, no safety net, and no documented process.

Claude Code was the right tool to fix it precisely because it has no dependency on the thing it was fixing. It sits outside OpenClaw entirely, works directly on the VM, and does not care whether the gateway is up or down.

The system I have now is not complicated. It is just in the right order, with the right fallbacks, and documented well enough that August can operate it on my behalf.

That is what good automation actually looks like. Not impressive. Just reliable.

If you are running OpenClaw on a self-hosted Linux setup and hitting similar upgrade pain, the service file path issue is worth checking first. After any upgrade, ask your assistant to confirm the service file is pointing to the right location. That one check would have saved me a couple of debugging sessions.
