Google’s Gemini 2.5 Pro Tops Coding Charts and MENSA Tests in AI ‘IQ’ Battle
By: bitcoin ethereum news | 2025/05/09 13:15:02
In brief
- Google’s new Gemini 2.5 Pro tops the WebDev Arena leaderboard, outperforming competitors like Claude in coding tasks, which makes it a standout choice for developers who need strong coding capabilities.
- The model features a 1 million-token context window (expandable to 2 million), enabling it to handle large codebases and complex projects far beyond the capacity of models like ChatGPT and Claude 3.7 Sonnet.
- It also achieved the highest scores on reasoning benchmarks, including a MENSA IQ test and Humanity’s Last Exam, demonstrating the advanced problem-solving skills that sophisticated development tasks require.

Google’s recently launched Gemini 2.5 Pro has risen to the top spot on coding leaderboards, beating Claude in the well-known WebDev Arena, an independent ranking site akin to the LLM Arena but focused specifically on measuring how good AI models are at coding.

The achievement comes amid Google’s push to position its flagship AI model as a leader in both coding and reasoning tasks. Released earlier this year, Gemini 2.5 Pro ranks first across several categories, including coding, style control, and creative writing.

The model’s massive context window, one million tokens with two million coming soon, lets it handle large codebases and complex projects that would overwhelm even its closest competitors. For comparison, ChatGPT tops out at 128,000 tokens and Claude 3.7 Sonnet at 200,000.

Gemini also has the highest “IQ” of any AI model tested. TrackingAI put it through formalized MENSA tests, using verbalized questions from Mensa Norway to create a standardized way to compare AI models. Gemini 2.5 Pro outscored its competitors on these tests, even on bespoke offline questions that are not publicly available and so should not appear in training data.

With an IQ score of 115 on the offline test, the new Gemini ranks among the “bright minded”; average human intelligence falls roughly between 85 and 114 points. But the notion of an AI having an IQ needs unpacking.
AI systems don’t have intelligence quotients the way humans do, so it’s better to treat the score as a proxy for performance on reasoning tasks. On benchmarks designed specifically for AI, Gemini 2.5 Pro scored 86.7% on the AIME 2025 math test and 84.0% on the GPQA science assessment. On Humanity’s Last Exam (HLE), a newer, harder benchmark created to avoid test saturation, Gemini 2.5 Pro scored 18.8%, beating OpenAI’s o3-mini (14%) and Claude 3.7 Sonnet (8.9%), a remarkable jump in performance.

The new version of Gemini 2.5 Pro is now available for free (with rate limits) to all Gemini users. Google previously described this release as an “experimental version of 2.5 Pro,” part of its family of “thinking models” designed to reason through responses rather than simply generate text.

Despite not winning every benchmark, Gemini has caught developers’ attention with its versatility. The model can create complex applications from a single prompt, building interactive web apps, endless runner games, and visual simulations without detailed instructions.

We tested the model by asking it to fix broken HTML5 code. It generated almost 1,000 lines of code, and its results beat those of Claude 3.7 Sonnet, the previous leader, in both quality and adherence to the full set of instructions.

For working developers, Gemini 2.5 Pro’s input costs $2.50 per million tokens and output costs $15.00 per million tokens, positioning it as a cheaper alternative to some competitors while still offering impressive capabilities. The model handles up to 30,000 lines of code under the Gemini Advanced plan, making it suitable for enterprise-level projects. Its multimodal abilities (text, code, audio, images, and video) add flexibility that other coding-focused models can’t match.
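The pricing and context-window figures in this article lend themselves to a quick back-of-the-envelope check. The sketch below is a hypothetical estimator, not a Google tool or part of the Gemini API: it assumes roughly four characters per token (a common rule of thumb, not Gemini’s actual tokenizer), assumes about 40 characters per line of code, and hard-codes the article’s quoted prices and 1 million-token window, any of which may change.

```python
# Hypothetical estimator for a single Gemini 2.5 Pro request.
# Assumptions (not from Google): ~4 characters per token as a rough
# heuristic, and the per-million-token prices quoted in the article.

CONTEXT_WINDOW_TOKENS = 1_000_000   # article: 1M tokens (2M announced)
INPUT_USD_PER_MILLION = 2.50        # article's quoted input price
OUTPUT_USD_PER_MILLION = 15.00      # article's quoted output price
CHARS_PER_TOKEN = 4                 # assumed heuristic, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Rough token count from character length (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(input_tokens: int) -> bool:
    """True if the prompt alone stays within the assumed context window."""
    return input_tokens <= CONTEXT_WINDOW_TOKENS


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at the quoted per-million-token prices."""
    return (input_tokens * INPUT_USD_PER_MILLION
            + output_tokens * OUTPUT_USD_PER_MILLION) / 1_000_000


if __name__ == "__main__":
    # Example: 30,000 lines of code at an assumed ~40 characters per line.
    codebase = "x" * (30_000 * 40)
    prompt_tokens = estimate_tokens(codebase)
    reply_tokens = 20_000  # hypothetical response length
    print(prompt_tokens, fits_in_context(prompt_tokens))
    print(f"${request_cost_usd(prompt_tokens, reply_tokens):.2f}")
```

Under these assumptions, a 30,000-line codebase comes to about 300,000 tokens, well inside the window, and a request with a 20,000-token reply costs about $1.05. For real budgeting, a provider’s own token counter gives far more accurate numbers than a character heuristic.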
Source: https://decrypt.co/318416/googles-gemini-2-5-pro-tops-coding-charts-mensa-tests-ai-iq-battle