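A research team at the University of California, Berkeley has proposed GEPA, a new method for training AI systems, which has been accepted as an Oral paper at ICLR 2026. GEPA does not update model weights and needs no GPU training. Instead, it uses an LLM that reads the training traces to repeatedly rewrite the AI system's prompts. Across six tasks it beats the mainstream reinforcement learning method GRPO by 6% on average and by up to 20%, while using up to 35 times fewer training attempts (rollouts). The work drew discussion on X after the AI engineering community summarized and shared it, and it is now integrated into DSPy as a first-class optimizer.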
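What GEPA does: treating training traces as teaching material, not just looking at scores
Traditional reinforcement learning methods such as GRPO work like this: the AI runs a task once, receives a score such as +1 or -1 based on the result, and that score is used to adjust the model weights over many iterations. The problem is that a single run usually contains thousands of tokens of reasoning steps, tool calls, and error messages. All of that detail is compressed into one score, and the information about the process is discarded. This is why RL needs thousands or tens of thousands of runs to converge.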
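To make the information loss concrete, here is a minimal Python sketch. The `Rollout` record and `scalar_reward` function are hypothetical names invented for illustration; they are not taken from GRPO or any RL library. The sketch only shows that two very different failures collapse into the same number.

```python
from dataclasses import dataclass, field


@dataclass
class Rollout:
    """One complete attempt at a task (all fields are illustrative assumptions)."""
    reasoning: str
    tool_calls: list = field(default_factory=list)
    errors: list = field(default_factory=list)
    success: bool = False


def scalar_reward(rollout: Rollout) -> int:
    """Collapse a whole rollout into a single +1 / -1 signal."""
    return 1 if rollout.success else -1


# Failure A: the tool was called, but with a misread date format.
rollout_a = Rollout(
    reasoning="Parsed the date as MM/DD instead of DD/MM.",
    tool_calls=["calendar.lookup('03/04')"],
    errors=["ValueError: no event on March 4"],
)

# Failure B: the tool was never called at all.
rollout_b = Rollout(reasoning="Answered from memory without searching.")

# Both failures produce the identical training signal.
print(scalar_reward(rollout_a), scalar_reward(rollout_b))
```

The reasoning text, tool calls, and error messages differ between the two rollouts, yet none of that reaches the optimizer once the run is reduced to a score.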
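GEPA takes the opposite approach: after each run, it hands the entire trace (reasoning, tool calls, error logs) unchanged to a separate "reflection LLM". Like a senior engineer reading program logs, the reflection LLM identifies which step failed, why it failed, and how the prompt should change, then directly rewrites that module's prompt. From the same single run, GEPA extracts far more signal than RL's single score.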
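The run, read, rewrite cycle described here can be sketched as a toy loop. This is a simplified illustration under stated assumptions, not the GEPA implementation: `run_task`, `reflect`, and `optimize_prompt` are hypothetical names, the "AI system" and the "reflection LLM" are rule-based stand-ins rather than real model calls, and the candidate selection that the full method performs is omitted.

```python
def run_task(prompt: str, question: str) -> dict:
    """Toy stand-in for the AI system: it answers in the right format
    only when the prompt explicitly asks for it."""
    if "integer only" in prompt:
        return {"answer": "42", "trace": "computed 6*7 -> 42"}
    return {
        "answer": "The answer is 42.",
        "trace": "computed 6*7 -> 42; ERROR: grader expected a bare integer",
    }


def reflect(prompt: str, trace: str) -> str:
    """Toy stand-in for the reflection LLM: read the full trace and
    return a rewritten prompt that addresses the diagnosed failure."""
    if "expected a bare integer" in trace:
        return prompt + " Reply with an integer only."
    return prompt


def optimize_prompt(prompt: str, question: str, expected: str, max_rounds: int = 5) -> str:
    """Repeat: run the task, stop on success, otherwise rewrite the prompt
    from what the trace reveals."""
    for _ in range(max_rounds):
        result = run_task(prompt, question)
        if result["answer"] == expected:
            break
        prompt = reflect(prompt, result["trace"])
    return prompt


final_prompt = optimize_prompt("Solve the math problem.", "What is 6*7?", "42")
print(final_prompt)
```

In this toy setting a single failed run is enough to fix the prompt, because the trace states why the answer was rejected. A bare -1 score would not say whether the arithmetic or the output format was at fault.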
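Why it wins: replacing "scoring" with "reading the whole process"
Across six tasks, GEPA beats GRPO by 6% on average and by up to 20%. Against MIPROv2, another mainstream prompt optimizer, it also wins by more than 10% (for example, a 12% gain on the AIME-2025 math benchmark). Most important is the training cost: GEPA needs up to 35 times fewer rollouts (one rollout is one complete run of the task) to reach comparable performance.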
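Another data point: the "Full Program Adapter" that comes with the GEPA and DSPy integration can optimize an entire DSPy program (including signatures, modules, and control flow). It reaches 93% accuracy on the MATH benchmark, well above the 67% of DSPy's baseline ChainOfThought implementation. GEPA also performs especially well on multi-module workflows (AI agents built from chained modules), because it can pinpoint the one module that failed and rewrite its prompt rather than adjusting the whole system.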
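The idea of targeting a single failing module can be illustrated with a short sketch. Everything here is an assumption made for illustration: the trace format, the module names, and the helpers `pick_failing_module` and `rewrite_failing_module` are hypothetical and do not reflect the data structures used by GEPA or DSPy.

```python
from typing import Callable, Optional


def pick_failing_module(trace: list) -> Optional[str]:
    """Return the name of the first module whose step recorded an error."""
    for step in trace:
        if step["error"] is not None:
            return step["module"]
    return None


def rewrite_failing_module(prompts: dict, trace: list,
                           rewrite: Callable[[str, str], str]) -> dict:
    """Return a copy of `prompts` in which only the failing module's
    prompt has been rewritten; every other prompt is left untouched."""
    updated = dict(prompts)
    target = pick_failing_module(trace)
    if target is not None:
        error = next(s["error"] for s in trace if s["module"] == target)
        updated[target] = rewrite(prompts[target], error)
    return updated


# A three-module pipeline with one prompt per module (illustrative).
prompts = {
    "retrieve": "Find relevant passages.",
    "summarize": "Summarize the passages.",
    "answer": "Answer the question.",
}

# Per-module trace of one run: only the summarizer went wrong.
trace = [
    {"module": "retrieve", "error": None},
    {"module": "summarize", "error": "summary dropped the key date"},
    {"module": "answer", "error": None},
]

# Stand-in for the reflection LLM's rewrite step.
new_prompts = rewrite_failing_module(
    prompts, trace, lambda prompt, error: f"{prompt} Avoid this failure: {error}."
)
print(new_prompts["summarize"])
```

Only the `summarize` prompt changes, while the retrieval and answer prompts are carried over as-is.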
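Who will use it first: a first-class citizen in DSPy, open-sourced on GitHub
GEPA's code is open source on GitHub. It is integrated into the DSPy framework as dspy.GEPA and is also released as a standalone Python library. The research team spans institutions including UC Berkeley, Stanford, Notre Dame, and Anthropic, and the paper's authors include Matei Zaharia (Databricks co-founder and a DSPy co-author) and Omar Khattab (lead author of DSPy).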
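For the developer community, GEPA offers a new answer to the problem of "having plenty of rollouts but no idea how to use them". Most teams have already accumulated thousands of agent run records, yet apart from digging through a few of them to debug a failure, they have no systematic way to turn those records into improvements. The next things to watch are real-world adoption of GEPA in enterprise agentic workflows (such as customer service automation and automated program repair), and whether GEPA-style implementations appear outside the DSPy framework.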
This article, "Berkeley GEPA explained: teaching AI new tasks without updating weights, beating RL with 35 times lower training cost", first appeared on 鏈新聞 ABMedia.