8 Top AI Models Compete in Live Trading: Only Grok 4.20 Made a Profit, The Worst Lost Half

2025-12-09 06:46:22

[BlockBeats] I recently came across the results of a pretty interesting AI quantitative trading competition. The new season kicked off on the 20th of last month, and so far the performance of the 8 top AI models has been worlds apart.

Bottom line: only Grok 4.20 made money, with a 22.27% return. Even Musk couldn’t resist boasting in a tweet that “our Grok is the strongest trader,” and joked that the GPU bill is covered now, haha.

The others got wrecked: GPT-5.1 held relatively steady with a small loss of 1.41%, but GEMINI-3-PRO fell to -24.28%, and DeepSeek-3.1 was about the same at -24.51%. Kimi 2, Qwen 3-MAX, and Claude-sonnet-4-5 landed between -25% and -32%. Dead last was Grok 4, Grok 4.20's own sibling, at -52.45%, a sharp contrast with the season's only winner.
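To put those percentages in perspective, here is a minimal Python sketch of the arithmetic. It uses only the exact figures quoted above (the three models reported as a range are left out), and the starting balance of 10,000 is a hypothetical number chosen for illustration, since the article does not say how much capital each model traded.

```python
# Returns in percent, exactly as quoted in the article. Kimi 2, Qwen 3-MAX and
# Claude-sonnet-4-5 are omitted because only a range (-25% to -32%) is given.
returns_pct = {
    "Grok 4.20": 22.27,
    "GPT-5.1": -1.41,
    "GEMINI-3-PRO": -24.28,
    "DeepSeek-3.1": -24.51,
    "Grok 4": -52.45,
}

# Hypothetical starting balance; the article does not state the real capital.
START_BALANCE = 10_000


def ending_balance(start, return_pct):
    """Balance after applying a percentage return to a starting balance."""
    return start * (1 + return_pct / 100)


def breakeven_gain_pct(return_pct):
    """Percentage gain needed from the post-loss balance to return to the start."""
    return (1 / (1 + return_pct / 100) - 1) * 100


# Rank the models from best to worst return.
ranking = sorted(returns_pct.items(), key=lambda item: item[1], reverse=True)
for model, ret in ranking:
    balance = ending_balance(START_BALANCE, ret)
    print(f"{model:<13} {ret:>+7.2f}%  ending balance: {balance:>10,.2f}")

# Gap between the two Grok models, in percentage points.
spread = returns_pct["Grok 4.20"] - returns_pct["Grok 4"]
print(f"Grok 4.20 vs Grok 4 spread: {spread:.2f} percentage points")

# A loss of about half requires more than doubling to recover.
recovery = breakeven_gain_pct(returns_pct["Grok 4"])
print(f"Grok 4 needs roughly +{recovery:.1f}% to get back to even")
```

The last line shows why a loss of roughly half is so painful: the account has to more than double from its reduced balance just to return to where it started.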

This season had the AIs trading tokenized U.S. stocks with real money on trade.xyz. Each round had a different theme, but all models received the same information. With identical inputs, the differences between the models' algorithms were massively amplified in the results. It looks like in AI quantitative trading, the fine details of model optimization can be make-or-break.




Comment


AirdropHermit

· 2025-12-12 04:44

Haha, Grok 4 getting shown up by one of its own, this plot is incredible

GovernancePretender

· 2025-12-12 02:54

Grok 4 accidentally wrecked itself haha, this storyline is kind of interesting

FortuneTeller42

· 2025-12-09 07:14

LOL, Grok 4 exposed itself—infighting among its own kind.


Blockblind

· 2025-12-09 07:13

Why is Grok 4.20 pumping so hard? Its sibling got dumped by 52%. The difference is just insane.

TrustlessMaximalist

· 2025-12-09 07:07

That 52% drop for Grok 4 next to Grok 4.20's gain is just insane. How can products from the same company perform so differently...

Musk's marketing is really impressive this time—one model makes money while all the others lose. These numbers are just too wild, haha.

GPT is still stable, but looking at the other data is just painful. Quantitative trading really is a high-risk, high-failure game.

Grok 4 is basically endorsing Grok 4.20, holding down that bottom spot...

After these rankings come out, I wonder if some people will start going all in on Grok. Feels pretty risky.

The other models losing only 24% is actually not too bad, compared to that -52%...


LayerZeroHero

· 2025-12-09 06:56

Wait, Grok 4.20 makes money while Grok 4 loses 52%? That’s a huge difference—how can models from the same company perform so differently? Could it be that the test parameters were set incorrectly?


AlwaysQuestioning

· 2025-12-09 06:53

Grok 4 drops 52%? This guy just slapped Musk in the face.
