2026-01-17 07:27:46

In a recent experiment, several large models were each given $10,000 to trade for 6 weeks in a football prediction market. The results were dramatic.

GPT-5.1 led the pack with a 42.6% gain, followed at a distance by DeepSeek at 10.7% and Gemini 3 Pro at 5.5%. Opus 4.2 returned 3.9% and Grok 4.1 Fast 2.1%. GPT-5.2, however, lost 21.8%, which suggests not every model excels in this area.
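
To put the percentages in dollar terms, here is a minimal sketch that converts each reported return into an ending balance. It assumes each figure is the total return over the full 6 weeks on the $10,000 starting stake and ignores fees; the post does not confirm either point, and the names `START_BALANCE` and `final_balance` are illustrative only.

```python
START_BALANCE = 10_000  # dollars allocated to each model, as stated in the post

# Six-week returns in percent, copied from the post.
# Assumption: each figure is the total return on the starting balance.
reported_returns_pct = {
    "GPT-5.1": 42.6,
    "DeepSeek": 10.7,
    "Gemini 3 Pro": 5.5,
    "Opus 4.2": 3.9,
    "Grok 4.1 Fast": 2.1,
    "GPT-5.2": -21.8,
}


def final_balance(start: float, return_pct: float) -> float:
    """Ending balance after applying a total percentage return (fees ignored)."""
    return round(start * (1 + return_pct / 100), 2)


balances = {
    model: final_balance(START_BALANCE, pct)
    for model, pct in reported_returns_pct.items()
}

# Print the models from best to worst ending balance.
for model, balance in sorted(balances.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model:<14} ${balance:>10,.2f}")
```

Under these assumptions GPT-5.1 would end at $14,260 and GPT-5.2 at $7,820, a gap of $6,440 between the best and worst performers.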

This comparative test was jointly promoted by a prediction market platform and an AI research team. The idea is to use real funds to test how different AIs perform on non-standardized decision-making tasks. Football prediction markets combine data analysis, probability estimation, and risk judgment, which makes them a useful setting for evaluating the practical trading ability of large models. The wide spread in results also suggests that parameter count and training scale alone do not guarantee good market decisions; execution strategy and the quality of data understanding matter just as much.




rekt_but_resilient

· 01-20 07:25

GPT-5.2 just crashed, now that's awkward haha


SorryRugPulled

· 01-18 06:35

GPT-5.1 soars 42.6% while GPT-5.2 goes the other way and loses 21.8%… Are these two long-lost brothers? Haha. DeepSeek quietly gains 10.7%, putting it in the conservative camp. But honestly, can something like football predictions really prove anything? It feels like gambling with real money to test AI. I do believe a large parameter count can't save a model that makes poor decisions. But six weeks of data… I'm not sure how meaningful that really is.


RooftopReserver

· 01-17 17:00

GPT-5.2's negative return is really something; you couldn't learn to lose like that even by paying tuition... DeepSeek, on the other hand, is more stable. What does this show? In the market, large models need to rely on intelligence rather than size.


BridgeTrustFund

· 01-17 07:57

gpt5.1 soars 42.6%, is this serious? gpt5.2 goes the other way and loses 21.8%. Is the gap between two apprentices of the same master really that big?


DeFiCaffeinator

· 01-17 07:57

GPT-5.1 takes off, DeepSeek follows steadily, but GPT-5.2's move was truly something else... The failure of a large-parameter model shows that practical decision-making ability is what ultimately matters.


MetaverseMortgage

· 01-17 07:55

GPT-5.2 lost big haha, this is the real test of "intelligence"... Armchair strategizing and actual trading are two different things.


ChainSherlockGirl

· 01-17 07:31

GPT-5.2's 21.8% loss is truly striking, making it the biggest mystery of the year... Based on my analysis, it may have overfitted to some match pattern, only to be hit hard by reality. Conversely, 5.1's 42.6% gain is also suspicious; if that result isn't just luck, then it has discovered some pattern we haven't seen.


0xInsomnia

· 01-17 07:30

GPT-5.2 was truly incredible, turning 100,000 into 28,000... This is the true face of AI crypto trading.


ProveMyZK

· 01-17 07:29

GPT-5.2 directly lost money, this is a bit outrageous... just outrageous
DeepSeek is causing trouble again, this guy really has something
To put it simply, stock trading with models still depends on execution, having many parameters is useless
42.6%? GPT-5.1, what kind of cheat code is this, I don't really believe it
Using the football prediction market here to stress test AI, the creativity is really impressive
Haha, why is Grok so disappointing, it's not even as good as Opus
This experiment tells me one thing: even large models need to have strategy
Wait, $10k in 6 weeks? This data seems a bit too ideal, is it real?
DeepSeek isn't bragging, at least it didn't lose money
Daring to verify AI with real money, these people are really brave


SatsStacking

· 01-17 07:28

gpt5.1 takes off 42%? This number is outrageous, it feels a bit too perfect, but 5.2 losing 21% is probably deserved haha
