OpenZeppelin finds data contamination in OpenAI’s EVMbench

Blockchain security firm OpenZeppelin says it has found methodological flaws and data contamination in its audit of OpenAI’s new artificial intelligence benchmark for blockchain security, EVMbench.

EVMbench was launched in partnership with crypto investment firm Paradigm in mid-February. It was built to evaluate how well different artificial intelligence models can identify, patch, and exploit smart contract vulnerabilities.

In an X post on Monday, OpenZeppelin said it welcomed the initiative but recently decided to put EVMbench “through the same scrutiny” it applies to all the protocols it helps secure, including the likes of decentralized finance heavyweights Aave, Lido and Uniswap.

In its audit, OpenZeppelin found two key issues: training data contamination and classification issues related to several high-severity vulnerabilities.

“We reviewed the dataset and identified methodological flaws and invalid vulnerability classifications, including at least four issues labeled high severity that are not exploitable in practice,” OpenZeppelin said.

The EVMbench release included an evaluation of how well AI agents could exploit smart contract vulnerabilities. Anthropic’s Claude Opus 4.6 topped the list, followed by OpenAI’s OC-GPT-5.2 and Google’s Gemini 3 Pro.

EVMbench testing may need revising

On the first issue, data contamination, OpenZeppelin said the most important capability in “AI security is finding novel vulnerabilities in code the model has never seen before.”

However, OpenZeppelin said that the highest-scoring AI agents in EVMbench’s testing had “likely been exposed to the benchmark’s vulnerability reports during pretraining.”

During EVMbench testing, internet access was cut off for the AI agents, meaning they couldn’t simply search for solutions to problems. However, the benchmark was based on curated vulnerabilities from 120 audits conducted between 2024 and mid-2025, and the training knowledge cutoffs for these agents were generally around mid-2025.

As such, the benchmark ran the risk that the AI agents had already seen the answers to many of its problems during training.
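The overlap between audit dates and training cutoffs can be checked mechanically. The following is a minimal sketch, not OpenZeppelin's or OpenAI's method: the audit names, publication dates, and model cutoff are hypothetical placeholders, and a report published before a cutoff is only treated as possibly seen in training, not as proven contamination.

```python
from datetime import date

# Hypothetical records: (audit identifier, date the audit report was published).
# These are illustrative placeholders, not entries from the EVMbench dataset.
audits = [
    ("audit-a", date(2024, 3, 15)),
    ("audit-b", date(2025, 1, 10)),
    ("audit-c", date(2025, 8, 1)),
]

# Assumed training cutoff for a hypothetical model.
model_cutoff = date(2025, 6, 30)


def possibly_contaminated(published: date, cutoff: date) -> bool:
    """Return True if the report was public on or before the training cutoff."""
    return published <= cutoff


# Collect every audit whose report the model could have seen during training.
flagged = [
    name for name, published in audits
    if possibly_contaminated(published, model_cutoff)
]
contamination_rate = len(flagged) / len(audits)

print(flagged)
print(f"{contamination_rate:.0%} of items predate the cutoff")
```

A check like this only bounds the risk. Whether a given report actually appeared in a model's training data cannot be determined from dates alone.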

“While this does not necessarily enable the model to identify the issue immediately, it reduces the quality of the test. The dataset’s limited size further narrows the evaluation surface, making these contamination concerns more significant,” OpenZeppelin said.

Finally, OpenZeppelin said that there had been some significant factual errors in the EVMbench’s dataset, arguing that several “high-severity vulnerabilities” were invalid.

OpenZeppelin said it had assessed at least four vulnerabilities that EVMbench classified as high severity but whose described exploits don’t actually work. Despite this, EVMbench had been crediting AI agents for finding these invalid vulnerabilities.

“These aren’t subjective severity disagreements; they are findings where the described exploit doesn’t work.”
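To illustrate why invalid labels matter, the sketch below compares a detection score computed over all labeled items with one computed only over items that are actually exploitable. The item IDs, validity flags, agent findings, and the simple recall-style score are all hypothetical assumptions. EVMbench's real scoring method is not described in this article.

```python
# Hypothetical benchmark items: ID -> whether the labeled vulnerability
# is actually exploitable. Not taken from the EVMbench dataset.
labels = {"H-01": True, "H-02": False, "H-03": True, "H-04": False, "H-05": True}

# Hypothetical set of findings reported by an agent.
agent_findings = {"H-01", "H-02", "H-04"}


def detect_score(findings, items):
    """Fraction of the given items that the agent reported (simple recall)."""
    items = set(items)
    if not items:
        return 0.0
    return len(findings & items) / len(items)


# Score over every labeled item, including the invalid ones.
raw = detect_score(agent_findings, labels.keys())

# Score over only the items whose exploit actually works.
valid_items = [item for item, exploitable in labels.items() if exploitable]
adjusted = detect_score(agent_findings, valid_items)

print(f"raw: {raw:.2f}, adjusted: {adjusted:.2f}")
```

In this toy case, two of the agent's three credited findings are invalid, so the score drops once they are excluded.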

Ultimately, OpenZeppelin reiterated that AI will have a significant impact on bolstering blockchain security, but stressed the importance of applying the tech and testing it properly to maximize its potential.

“The question isn’t whether AI will transform smart contract security — it will. The question is whether the data and benchmarks we use to build and evaluate these tools are held to the same standard as the contracts they’re meant to protect.”
