What I Automated with OpenClaw (and What I Couldn't)
OpenClaw hit 60,000 GitHub stars in 72 hours. Every dev influencer posted their "10 workflows that changed my life" thread. I installed it expecting JARVIS. What I got was more interesting — a tool that's genuinely useful in specific spots and completely unreliable in others, and the gap between those two taught me more about AI agents than any tutorial.
Here's what actually happened when I tried to wire it into my workflow.
What Worked: The Boring Stuff
The best use cases were the ones nobody posts about on Twitter.
GitHub issue triage. I connected OpenClaw to my repos and set up a skill that reads new issues, labels them by type (bug, feature, question), and drops a summary into a Discord channel. It gets the label right maybe 80% of the time. That's good enough — I skim the channel once a day and fix the misses in seconds. Before this, issues sat unlabeled for weeks.
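The non-LLM half of that workflow is just plumbing. A minimal sketch of it — the label clamping and the Discord summary — assuming hypothetical field names that mirror the GitHub REST API issue object (`number`, `title`, `html_url`); the model call itself is omitted:

```python
ALLOWED_LABELS = {"bug", "feature", "question"}

def normalize_label(raw: str) -> str:
    """Coerce the model's free-text label into the allowed set.
    A bad guess falls back to 'question' so the issue still gets triaged."""
    label = raw.strip().lower()
    return label if label in ALLOWED_LABELS else "question"

def format_triage_summary(issue: dict) -> str:
    """Build the one-line Discord summary for a freshly labeled issue."""
    return f"#{issue['number']} [{issue['label']}] {issue['title']}\n{issue['html_url']}"

issue = {
    "number": 42,
    "title": "Crash when config file is missing",
    "html_url": "https://github.com/example/repo/issues/42",
    "label": normalize_label("Bug "),
}
summary = format_triage_summary(issue)
```

Clamping the label matters precisely because the model is only right ~80% of the time: an off-list label would silently break the Discord channel's filtering.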
Daily digest. Every morning at 8am, OpenClaw pulls my open PRs, any failing CI runs, and calendar conflicts, then posts a single message to my phone. This took maybe 20 minutes to set up and saves me from opening four different apps before coffee.
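The digest itself is a pure formatting step once the three lists are fetched. A sketch under that assumption — `open_prs`, `failing_runs`, and `conflicts` are hypothetical names for already-fetched lists of strings:

```python
def section(title: str, items: list[str]) -> list[str]:
    """Render one digest section, with a '(none)' placeholder when empty."""
    lines = [f"{title} ({len(items)}):"]
    lines += [f"  - {item}" for item in items] if items else ["  (none)"]
    return lines

def build_digest(open_prs: list[str], failing_runs: list[str], conflicts: list[str]) -> str:
    """Assemble the single morning message from three already-fetched lists."""
    lines = ["Morning digest"]
    lines += section("Open PRs", open_prs)
    lines += section("Failing CI", failing_runs)
    lines += section("Calendar conflicts", conflicts)
    return "\n".join(lines)

digest = build_digest(["fix: auth bug (#12)"], [], ["standup overlaps code review"])
```

Keeping the fetch and the formatting separate is what makes a workflow like this debuggable: you can test the message layout without touching any API token.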
File organization. I pointed it at my Downloads folder and told it to sort PDFs, screenshots, and code files into labeled directories. Stupid simple. Works every time. This is the kind of automation that doesn't make a good demo but actually changes your day.
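This kind of sorter is simple enough to sketch in full. The extension-to-directory mapping below is an assumption (the actual rules live in the skill config); the demo runs against a throwaway temp directory:

```python
import shutil
import tempfile
from pathlib import Path

# hypothetical rules — extensions the sorter knows about
RULES = {".pdf": "pdfs", ".png": "screenshots", ".jpg": "screenshots",
         ".py": "code", ".js": "code"}

def sort_downloads(folder: Path) -> None:
    """Move files into labeled subdirectories by extension; skip unknown types."""
    for f in folder.iterdir():
        if not f.is_file():
            continue
        dest = RULES.get(f.suffix.lower())
        if dest is None:
            continue  # leave unrecognized files where they are
        target = folder / dest
        target.mkdir(exist_ok=True)
        shutil.move(str(f), str(target / f.name))

# tiny demo in a throwaway directory
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "report.pdf").touch()
    (root / "shot.png").touch()
    sort_downloads(root)
    moved = sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())
```

Skipping unknown extensions instead of guessing is the safe default: a sorter that moves files it doesn't understand is how you lose things.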
What Broke: The Interesting Part
The failures were more instructive than the wins.
Multi-step code changes. I asked OpenClaw to "add error handling to all API routes in this project." It read the codebase, generated patches, and applied them — but the patches conflicted with each other because it processed files independently without tracking shared state. Three routes imported the same error utility differently. I spent longer fixing its work than I would have spent doing it myself.
The lesson: AI agents are bad at tasks where step N depends on what happened in step N-1. Each skill invocation starts with a fresh context. There's no memory between steps unless you explicitly build it into the workflow, and even then the context window limits what it can hold.
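The "explicitly build it into the workflow" fix can be sketched as threading a shared context object through every step, so later steps read earlier decisions instead of re-deciding them. All names here are hypothetical stand-ins for what the agent would otherwise do independently per file:

```python
def step_pick_error_util(ctx: dict) -> dict:
    """First step makes the shared decision once and records it."""
    ctx["error_util"] = "raise_api_error"  # hypothetical helper name
    return ctx

def step_patch_route(ctx: dict, route: str) -> dict:
    """Later steps read the recorded decision instead of choosing their own."""
    ctx.setdefault("patched", []).append((route, ctx["error_util"]))
    return ctx

ctx = step_pick_error_util({})
for route in ["/users", "/orders", "/billing"]:
    ctx = step_patch_route(ctx, route)
# every route now imports the same helper, by construction
```

This is exactly the state the agent lacked when it patched three routes independently: without the shared `ctx`, each invocation picks its own import style and the patches conflict.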
Anything requiring judgment. I tried to set up a workflow that would review PRs and leave comments. It ran fine technically — it read the diff, generated comments, posted them. But the comments were either obvious ("this variable could have a better name") or wrong ("this function doesn't handle null" — it did, two lines down, in a way the agent couldn't see because it was looking at the diff, not the full file). I turned it off after a day because bad automated comments are worse than no comments.
Debugging its own failures. When a workflow fails, OpenClaw gives you logs. But the logs often just say the LLM returned an unexpected response. You're debugging a conversation, not code. There's no stack trace. There's no line number. You're reading a prompt and guessing why the model misunderstood it. This is a new kind of debugging that I don't think we have good tools for yet.
What I Actually Learned
The 80/20 rule is real for AI automation. The easy 20% of tasks — triage, sorting, digests, notifications — work reliably and save real time. The hard 80% — code generation, review, multi-step reasoning — fails in ways that cost more time than they save. Knowing which bucket a task falls into before you automate it is the actual skill.
Skills are just prompts with a nice name. OpenClaw's skill system looks like a plugin architecture, but under the hood each skill is a markdown file that gets injected into the LLM's context. If the LLM can't do the task in a single conversation turn, the skill won't work either. Understanding this changed how I wrote skills — I stopped trying to build complex multi-step automations and started building small, single-purpose ones that chain together.
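The "small skills that chain together" shape can be sketched as ordinary function composition, where each skill is a single text-in, text-out turn. The two skills below are trivial stand-ins for one-turn LLM calls, not anything OpenClaw ships:

```python
from typing import Callable

Skill = Callable[[str], str]

def chain(*skills: Skill) -> Skill:
    """Run single-purpose skills in sequence, piping text through each."""
    def run(text: str) -> str:
        for skill in skills:
            text = skill(text)
        return text
    return run

# hypothetical stand-ins for one-turn skills
summarize = lambda t: t.split(".")[0] + "."
shout = lambda t: t.upper()

pipeline = chain(summarize, shout)
result = pipeline("Issue reported. Lots of detail follows.")
```

Each link only has to succeed within one conversation turn, which is the constraint the skill system actually imposes; the chain, not the prompt, carries the multi-step structure.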
The setup cost is real. Every tutorial shows the "after" — a clean workflow running smoothly. Nobody shows the three hours of fiddling with API tokens, debugging YAML indentation, testing edge cases, and discovering that the Telegram integration drops messages over 4096 characters. OpenClaw is powerful, but it's not plug-and-play. You need to be comfortable debugging systems you didn't write.
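The 4096-character limit is a documented Telegram Bot API constraint on message text. A minimal splitter sketch — breaking on newlines where possible so chunks stay readable, and hard-wrapping any single line that is itself over the limit:

```python
TELEGRAM_LIMIT = 4096

def split_message(text: str, limit: int = TELEGRAM_LIMIT) -> list[str]:
    """Split text into chunks of at most `limit` chars, preferring newline boundaries."""
    chunks, current = [], ""
    for line in text.splitlines(keepends=True):
        # hard-wrap a single line that is itself over the limit
        while len(line) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:limit])
            line = line[limit:]
        if len(current) + len(line) > limit:
            chunks.append(current)
            current = line
        else:
            current += line
    if current:
        chunks.append(current)
    return chunks

parts = split_message("a\n" * 3000)  # 6000 chars → two chunks
```

Wrapping every outbound send in something like this is cheaper than discovering the drop the way I did: by noticing a digest silently never arrived.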
Who Should Actually Use It
If you're a developer who's comfortable with YAML, API keys, and debugging opaque systems — and you have a few repetitive workflows that don't require judgment — OpenClaw is genuinely worth the setup time. Start with notifications and triage. Don't start with code generation.
If you're expecting it to replace your IDE or your brain, you'll be disappointed. The best AI agent in the world is still just a language model with access to your APIs. It's as smart as the prompt you give it and as reliable as the weakest integration in the chain.
I'm keeping it running for the boring stuff. And I'm keeping my hands on the keyboard for everything else.