8 Top AI Models Compete in Live Trading: Only Grok 4.20 Made a Profit, The Worst Lost Half

[BlockBeats] I recently saw a pretty interesting AI quantitative trading competition result. The new season kicked off on the 20th last month, and looking at it now, the performance of 8 top AI models was worlds apart.

Bottom line: only Grok 4.20 made money, with a 22.27% return. Even Musk couldn’t help but tweet a boast that “our Grok is the strongest trader,” and joked that the GPU bill is covered now, haha.

The others got wrecked: GPT-5.1 had a small loss of 1.41% and was relatively steady, but GEMINI-3-PRO went straight to -24.28%, and DeepSeek-3.1 was about the same at -24.51%. Kimi 2, Qwen 3-MAX, and Claude-sonnet-4-5 hovered between -25% and -32%. The worst was Grok 4, Grok 4.20's own sibling, dead last at -52.45%.

This season’s setup had the AIs putting real money into tokenized U.S. stocks on trade.xyz. Each round had a different theme, but all models received the same information. Under this setup, differences in algorithms were massively amplified. Looks like in AI quantitative trading, the fine details of model optimization can be make-or-break.

Comment
AirdropHermit · 2025-12-12 04:44
Haha, Grok 4 insiders betraying insiders, this plot is incredible
GovernancePretender · 2025-12-12 02:54
grok4 accidentally broke itself haha, this script is a bit interesting
FortuneTeller42 · 2025-12-09 07:14
LOL, Grok 4 exposed itself—infighting among its own kind.
Blockblind · 2025-12-09 07:13
Why is Grok 4.20 pumping so hard? Its sibling got dumped by 52%. The difference is just insane.
TrustlessMaximalist · 2025-12-09 07:07
Grok 4's 52% drop next to Grok 4.20's gain is just insane. How can products from the same company perform so differently...

Musk's marketing is really impressive this time—one model makes money while all the others lose. These numbers are just too wild, haha.

GPT is still stable, but looking at the other data is just painful. Quantitative trading really is a high-risk, high-failure game.

Grok 4 is basically endorsing Grok 4.20, holding down that bottom spot...

After these rankings come out, I wonder if some people will start going all in on Grok. Feels pretty risky.

The other models losing only 24% is actually not too bad, compared to that -52%...
LayerZeroHero · 2025-12-09 06:56
Wait, Grok 4.20 makes money while Grok 4 loses 52%? That’s a huge difference—how can models from the same company perform so differently? Could it be that the test parameters were set incorrectly?
AlwaysQuestioning · 2025-12-09 06:53
Grok 4 drops 52%? This guy just slapped Musk in the face.