Claude is good at assembling blocks, but still falls apart at creating them


Opus 4.5 is out and people cannot stop raving about it. AGI is nigh! It's a step-change in capabilities!

Don't get me wrong. It's very impressive. But after trying it out in a real codebase for a few weeks, I think that view is overly simplistic. Claude is now incredibly good at assembling well-designed blocks – but it still falls apart when it has to create them.

To demonstrate, I'll run through three real examples: a Sentry debugging loop where Claude ran on its own for 90 minutes and solved the problem; an AWS migration it one-shotted in three hours; and a React refactor where it proposed a hack that would have made our codebase worse.

The same pattern explains all three. And in doing so, it also demonstrates what senior engineers actually do – and why we'll be safe from AGI for a long time.

The most impressive thing Claude Code has done for me is debug, on its own.

I was trying to attach Sentry to our system. Sentry is a wonderful service that creates nice traces of when parts of your code run. That makes it easy to figure out why your code is running slower than you expect.

It's usually very easy to set up, but on this day it wasn't working. And there were no good debug logs, so the only way to figure out what was going on was to guess-and-check. I had to send a test message on our frontend, then look into the Sentry logs to see if we successfully set it up, then randomly try another approach based on the docs. It was frustrating and tedious.

So I had Claude write a little testing script with Playwright that logged into our website and sent a chat. Then I had it connect to Sentry via MCP and look for the exact codepath I was trying to debug. Finally, I gave it the Sentry docs and told it to keep plugging away until it figured it out.
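
To make the shape of that setup concrete, here is a minimal Python sketch of the guess-and-check loop. It is not the actual script from this project: `Attempt`, `debug_loop`, `send_test_message`, and `trace_arrived` are hypothetical names, and the browser automation and Sentry lookup are passed in as plain callables rather than real Playwright or Sentry calls.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Attempt:
    """One candidate change to try (hypothetical structure for illustration)."""
    description: str
    apply: Callable[[], None]  # makes the code or config change


def debug_loop(
    attempts: Iterable[Attempt],
    send_test_message: Callable[[], None],
    trace_arrived: Callable[[], bool],
) -> Optional[Attempt]:
    """Try each change in order and return the first one whose trace shows up.

    Returns None if no attempt produced a trace.
    """
    for attempt in attempts:
        attempt.apply()        # 1. make a change
        send_test_message()    # 2. exercise the codepath (assumed: a browser script)
        if trace_arrived():    # 3. check the tracing backend (assumed: a Sentry query)
            return attempt
    return None
```

In a setup like the one described, `send_test_message` would wrap the browser script and `trace_arrived` would wrap the Sentry query. The point of the sketch is only that once both are callable without a human in the loop, the iteration can run unattended.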

It took about an hour and a half, but Claude finally got it. This was pretty cool! The core loop of performance engineering is straightforward: make a code change, test, check tracing logs, repeat. With this tooling, Claude could d

Source: HackerNews (https://news.ycombinator.com/item?id=46618042)