Tools: I asked Opus 4.6 and Codex 5.2 to build the same thing, and I was surprised with the results

Source: Dev.to

Most of the world is now using Opus 4.6 and Codex 5.2. A little over 24 hours after their release, I was finally able to test them myself. Was I about to need to change my workflow?

I built an application a while back for my own Discord server that ingests content for a RAG system while also helping moderate by auto-assigning roles. It's a fun side project. I figured that before I hop back in and start messing around with these new models, I could test them first to see which one I should use. I settled on a visualization of my data as the test, so consider this frontend-heavy with minor logic implementation. I'll be sure to run separate tests for backend-category projects soon. In this test, I'm measuring two main things: speed and accuracy. Let's begin.

## Setting up the test

First, for both models, I ensured they had a connection to my Xano.com workspace and could read the data. There's another discussion to be had on MCP, but it's how the models will be communicating with the platform.

Second, I used Cursor to test Codex, and I used Claude Code to test Opus 4.6. (Critically, these are different environments, and the behind-the-scenes aspects are handled differently enough to skew results, but I have other tests planned for this in the future.)

Third, they each got the prompt:

> Please take the data from workspace 11 inside Xano (MCP); I want you to create a visual representation of all of my data as it relates to one another. By this, I want you to show me an isometric view of all the relationships between tables, data, and functions; this includes middlewares, authentication systems, tasks, and anything else. Please go through the entire application and assess all functions, tables, endpoints, tasks, and more to create a map. First, scan through the necessary .XS files. Use MCP to assist with both application flow and data storage.
> Second, create an HTML page with CSS and JS that shows, in isometric view, the landscape of the application, with a way to visualize how everything is interconnected. This should be mildly video-game-like, but with emphasis on readability and accessibility. To assist with readability: query all data, persist as files within local.

Your prompts may look different, which will 100% impact the outcome of this experiment. However, I want to test the models on their ability to extrapolate from what I provide.

## And we're off to the races!

Pressing enter on both, I watched Codex whiz through the tasks while Claude was left deliberating for several minutes at a time. There isn't a ton of variability in the decision-making, but the speed at which they execute is notably different. Regardless, Codex finished about two whole minutes before Claude in this instance, at 5 minutes and 55 seconds, with Claude following shortly after the 8-minute mark.

**Winner of Development Speed: Codex**

## The awaited outputs!

### Opus 4.6

I started with the output from Opus 4.6 first. I was the most eager to see this result, expecting only good things. It wasn't much of a surprise, but fortunately, when I opened the page, it worked, was accessible, and matched the visual model I had conceived in my head. I could auto-zoom, drag my mouse around, click on and off the nodes, and the sidebar opened up with connectivity information. To be frank, I wasn't blown away, but only because I fully expected Opus 4.6 to do a great job. The standard was upheld.

### Codex 5.2

This one was NOT expected. I've only heard good things about Codex, so not seeing anything load... well... I can't necessarily blame Codex, but within the constraints of the given task, it over-engineered, expecting to serve content from the server side. Since I wanted to keep this local, all I had to do was copy and paste the error messages, throw them into Cursor like a proper vibe-coder, and then hit refresh. Still, though: slightly let down.
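If you hit the same wall, the generic fix is to serve the generated folder over HTTP instead of opening it via file://, since most browsers block fetch() of local JSON from a file:// page. Below is a minimal sketch of that idea; the file name and contents are placeholders I've assumed, not the actual generated output (in practice, running `python -m http.server` in the output folder does the same job):

```python
# Sketch: serve a generated static page over HTTP so its data fetches work.
# The index.html here is a stand-in for the real generated output.
import http.server
import os
import pathlib
import tempfile
import threading
import urllib.request

# Stand-in for the generated index.html plus persisted data files.
workdir = tempfile.mkdtemp()
pathlib.Path(workdir, "index.html").write_text("<h1>app graph</h1>")
os.chdir(workdir)

# Port 0 lets the OS pick a free port; the handler serves the current directory.
server = http.server.ThreadingHTTPServer(
    ("127.0.0.1", 0), http.server.SimpleHTTPRequestHandler
)
threading.Thread(target=server.serve_forever, daemon=True).start()

port = server.server_address[1]
html = urllib.request.urlopen(f"http://127.0.0.1:{port}/index.html").read().decode()
print(html)  # the page is now reachable over HTTP instead of file://
server.shutdown()
```

The same one-off server works for any locally generated visualization that pulls in JSON or other assets at runtime.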
The visualization was clunky, and I had little understanding of what I was looking at. The entire UX required additional prompting for me to make it feasible to use.

**Winner of Development Accuracy: Opus**

## Summary

Ultimately, I wasn't too disappointed with the outcome: Claude has always seemed to perform a little better for me on the frontend. Factoring in the environment differences, Codex is still aching to be field-tested in the CLI with some proper backend development. But was I surprised? Yes. I truly did expect similar results between the two. It doesn't seem that I'll need to change my workflow much for the time being, as Claude really does have an affinity for reading between the lines, extrapolating the user's intentions, and delivering. But it does open a debate around the personality of a model and which one suits your building style best: interpretive vs. executional.

With that, and my building style, I'd name Opus 4.6 the winner of this test. Codex delivers speed, but accuracy and outcome are still the deciding factors.

Leave me a comment if you want me to test anything in particular. More tests to come!