
Crypto: OpenAI Pits AI Agents Against Each Other To Detect Smart Contract...

2026-02-19

OpenAI said it is becoming increasingly important to evaluate the performance of AI agents in “economically meaningful environments” as their adoption grows.

OpenAI has launched a new benchmark that evaluates how well different AI models detect, patch and even exploit security vulnerabilities found in crypto smart contracts.

OpenAI released the paper “EVMbench: Evaluating AI Agents on Smart Contract Security” on Wednesday, in collaboration with crypto investment firm Paradigm and crypto security firm OtterSec. The benchmark measures how much value AI agents could, in theory, extract from 120 smart contract vulnerabilities.

Anthropic’s Claude Opus 4.6 came out on top with an average “detect award” of $37,824, followed by OpenAI’s OC-GPT-5.2 and Google’s Gemini 3 Pro at $31,623 and $25,112, respectively.

While AI agents are becoming increasingly efficient at handling basic tasks, OpenAI said it is becoming more important to evaluate their performance in “economically meaningful environments.”

“We expect agentic stablecoin payments to grow, and help ground it in a domain of emerging practical importance,” OpenAI added.

Circle CEO Jeremy Allaire predicted on Jan. 22 that billions of AI agents will be transacting with stablecoins for everyday payments on behalf of users within five years, while former Binance boss Changpeng “CZ” Zhao also recently tipped that crypto would end up being the “native currency for AI agents.”

The need to test agentic AI performance in spotting security vulnerabilities comes as attackers stole $3.4 billion worth of crypto funds in 2025, a marginal increase from 2024.


EVMbench drew on 120 curated vulnerabilities from 40 smart contract audits, most of which were sourced from open-source audit competitions. OpenAI said it hopes the benchmark will help track AI progress in spotting and mitigating smart contract vulnerabilities at scale.

Source: CoinTelegraph