Tools: Show HN: A Real-time Strategy Game That AI Agents Can Play 2026

It's been great to see the energy in the last year around using games to evaluate LLMs. Yet there's a weird disconnect between frontier LLMs one-shotting full coding projects and those same models struggling to get out of Pokemon Red's Mt. Moon.

We wanted to create an LLM game benchmark that put this generation of frontier LLMs' superpower, coding, on full display. Ten years ago, a team released a game called Screeps, described as an "MMO RTS sandbox for programmers." In Screeps, human players write JavaScript strategies that get executed in the game's environment. Players gain resources, lose territory, and have units wiped out. It's a traditional RTS, but controlled entirely through code.

The Screeps paradigm, writing code and having it execute in a real-time game environment, is well suited for an LLM benchmark. Drawing on a version of the Screeps open source API, LLM Skirmish pits LLMs head-to-head in a series of 1v1 real-time strategy games.
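To make the paradigm concrete, here is a minimal sketch of the kind of per-frame strategy script a player might submit. The names (`Game.creeps`, `harvest`, `transfer`) follow the style of the Screeps API but are assumptions here, not LLM Skirmish's actual interface; a tiny mock of the game state is included so the loop can run standalone.

```javascript
// Minimal mock of the game state, so the strategy loop below runs standalone.
// In the real environment, the engine would provide this object each frame.
const Game = {
  creeps: {
    worker1: {
      store: { energy: 0, capacity: 50 },
      harvest(source) {
        this.store.energy = Math.min(this.store.capacity, this.store.energy + 2);
      },
      transfer(target) {
        target.energy += this.store.energy;
        this.store.energy = 0;
      },
    },
  },
  spawns: { Spawn1: { energy: 0 } },
  sources: [{}],
};

// The strategy itself, executed once per game frame: economic units
// harvest until full, then deliver energy back to the spawn.
function loop() {
  for (const name in Game.creeps) {
    const creep = Game.creeps[name];
    if (creep.store.energy < creep.store.capacity) {
      creep.harvest(Game.sources[0]);
    } else {
      creep.transfer(Game.spawns.Spawn1);
    }
  }
}

// Simulate a few frames of the match.
for (let frame = 0; frame < 30; frame++) loop();
console.log(Game.spawns.Spawn1.energy); // → 50
```

The key property for the benchmark is that the model only writes the `loop` body; everything else is the environment's job, so a model's coding ability maps directly onto its in-game performance.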

1 GPT 5.2 was run with the high reasoning level. Future versions of LLM Skirmish could use the xhigh reasoning level, but in initial test rounds xhigh slowed down rounds without showing notable improvements over high.

2 Gemini 3 Pro's underperformance is driven by rounds 4-5. Explored in detail here.

In LLM Skirmish, each player begins with a "spawn" (a building that can create units), one military unit, and three economic units. The objective of each LLM Skirmish match is to eliminate your opponent's spawn. If a player is not eliminated within 2,000 game frames (each player is allowed up to one second of runtime computation per frame), the game ends and the victor is determined based on score.
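The termination rule above can be sketched as a small function. The field names (`spawnAlive`, `score`) are illustrative, and the source doesn't say how a tied score at the frame cap is resolved, so this sketch calls it a draw.

```javascript
// Sketch of the match-termination rule: a match ends when a spawn is
// destroyed, or at the 2,000-frame cap, where the higher score wins.
const FRAME_LIMIT = 2000;

function matchResult(state) {
  // Elimination: destroying the opponent's spawn wins outright.
  if (!state.p1.spawnAlive) return 'p2';
  if (!state.p2.spawnAlive) return 'p1';
  // Timeout: fall back to score once the frame cap is reached.
  if (state.frame >= FRAME_LIMIT) {
    if (state.p1.score === state.p2.score) return 'draw'; // assumption
    return state.p1.score > state.p2.score ? 'p1' : 'p2';
  }
  return null; // match still in progress
}

console.log(matchResult({
  frame: 2000,
  p1: { spawnAlive: true, score: 120 },
  p2: { spawnAlive: true, score: 95 },
})); // → 'p1'
```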

Every LLM Skirmish tournament consists of five rounds. In each round, each LLM is asked to write a script implementing its strategy. For all rounds after the first, each LLM can see the results of all its matches from the previous round and use that information to make changes to the script it submits for the next round. In every round, every player plays all other players once. This means there are 10 matches per round and 50 matches per tournament.
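The tournament arithmetic follows from a round-robin over five models: C(5,2) = 10 pairings per round, and five rounds give 50 matches per tournament. A quick sketch (model names are placeholders):

```javascript
// Generate all 1v1 pairings for a single round-robin round.
function roundRobinPairs(players) {
  const pairs = [];
  for (let i = 0; i < players.length; i++) {
    for (let j = i + 1; j < players.length; j++) {
      pairs.push([players[i], players[j]]);
    }
  }
  return pairs;
}

const models = ['A', 'B', 'C', 'D', 'E']; // placeholder names
const perRound = roundRobinPairs(models).length;
console.log(perRound, perRound * 5); // → 10 50
```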

LLM Skirmish was conducted using OpenCode, an open source general purpose agentic coding harness. OpenCo

Source: HackerNews