The Rise of Multimodal AI: Poe's Spring 2025 Report
As generative AI evolves rapidly, usage trends have become one of the more reliable indicators of real innovation and user value. Poe, a platform that aggregates access to more than 100 AI models, has published its Spring 2025 report, based on weekly data from January through April. The analysis spans text, reasoning, image, video, and audio, and shows which AI models are rising and which may be left behind.
Reasoning Models: A Rapid Ascent
The usage share of reasoning-capable models grew sharply, from about 2% in January to 10% by April. Google’s Gemini 2.5 Pro led this category, reaching a 31% share within six weeks of its release. Across all message activity on Poe, OpenAI’s GPT-4o extended its lead to 35.8%, while GPT-4.1 contributed another 9.4%.
Other entries include OpenAI’s o1-pro, o3, and o4-mini, which were quickly adopted. DeepSeek R1 peaked at 7% usage in February but fell to 3% by April’s end. Hybrid reasoning models such as Gemini 2.5 Flash Preview and Qwen3 showed adaptive potential but remained niche, with combined usage around 1%.
Text Modality: Fast Turnover Among Leading Models
OpenAI’s GPT-4.1 climbed to about 10% share in text-based conversations, while Gemini 2.5 Pro reached 5%. In contrast, Anthropic’s Claude series lost roughly 10 percentage points in share. Within that series, Claude-3.7-Sonnet rapidly overtook Claude-3.5-Sonnet, although the older version still held onto a 12% share.
Image Generation: GPT-Image-1 Enters Fast and Loud
In visual AI, OpenAI’s GPT-Image-1, launched via Poe API in late April, captured a 17% share of image usage in just two weeks. Google’s Imagen3 family steadily grew from 10% to 30%, showing strong user interest in refined generative image quality. Meanwhile, FLUX by Black Forest Labs continued to lead with 35% share, though slightly down from its previous 45%.
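Share figures like these can be read in two ways: as a change in percentage points or as a change relative to the starting share. The short Python sketch below shows both for the Imagen3 and FLUX figures quoted above. The helper name `share_change` is hypothetical and is not taken from Poe’s report.

```python
def share_change(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (percentage-point change, relative change in percent)."""
    point_change = after_pct - before_pct
    relative_change = point_change / before_pct * 100
    return point_change, relative_change


# Shares quoted in the paragraph above (before, after), in percent.
for name, before, after in [("Imagen3", 10, 30), ("FLUX", 45, 35)]:
    points, relative = share_change(before, after)
    print(f"{name}: {points:+.0f} points, {relative:+.1f}% relative")

# Prints:
# Imagen3: +20 points, +200.0% relative
# FLUX: -10 points, -22.2% relative
```

On these numbers, Imagen3’s share tripled, while FLUX lost a little under a quarter of its earlier share.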
Video Generation: Kling Redefines the Space
Kuaishou’s Kling series disrupted the video generation modality. Within three weeks of its late April debut, Kling reached 30%, led by Kling-2.0-Master with a 21% solo share. Google’s Veo 2 held steady at around 20%.
Runway, on the other hand, dropped sharply from 60% to 20% usage, partly because it continued to rely on Gen-3-Alpha-Turbo rather than a newer model version.
Audio: ElevenLabs Holds the Throne—for Now
ElevenLabs maintained an 80% usage share in text-to-speech (TTS) through the reporting window, thanks to its consistent quality and speed. Competitors such as Cartesia, Unreal Speech, PlayAI, and Orpheus have entered the field with promising voice innovations, but current usage remains minimal.
To ensure accuracy, Poe’s data excluded automatically triggered outputs (such as one-click audio buttons), focusing solely on active user choices.
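As a rough illustration of that filtering step, the Python sketch below computes usage share from a toy event log after dropping automatically triggered events. The record layout, the model names, and the `usage_share` function are all assumptions made for this example; Poe’s actual pipeline and data schema are not described in the report.

```python
from collections import Counter

# Toy event log: (model name, was the output triggered automatically?).
# Entirely made-up data with placeholder model names.
events = [
    ("model_a", False),
    ("model_a", False),
    ("model_a", True),   # e.g. a one-click audio button, so excluded
    ("model_b", False),
    ("model_c", True),   # only ever auto-triggered, so excluded
]


def usage_share(events):
    """Percentage share per model, counting only active user choices."""
    counts = Counter(model for model, auto in events if not auto)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {model: n / total * 100 for model, n in counts.items()}


shares = usage_share(events)
for model, pct in sorted(shares.items()):
    print(f"{model}: {pct:.1f}%")
```

In this toy log, `model_c` drops out of the result entirely because every one of its events was automatic. That is the effect the exclusion is meant to have.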
Conclusion: A Fast-Moving Frontier—with Platform Limits
Spring 2025 marks a major turning point in the generative AI landscape. Reasoning models are rapidly becoming essential, while image and video modalities are in the midst of a heated race.
However, this report is based solely on user activity within the Poe platform. While Poe offers one of the most comprehensive AI model ecosystems, its findings may not fully represent broader market dynamics, especially in enterprise environments, on region-specific platforms, or for models hosted outside of Poe.
Still, Poe’s dataset serves as a valuable real-time barometer of model competitiveness and user sentiment. It highlights which models are thriving in a multimodal, consumer-facing environment—and which are falling behind.
— Dr. Ken FONG
Original report available here
Keywords
AI trends 2025, Gemini 2.5 Pro, GPT-4.1, GPT-4o, Kling video AI, GPT-Image-1, Poe platform, Claude-3.7-Sonnet, reasoning models, ElevenLabs TTS, multimodal AI, Veo 2, Imagen3, FLUX model, DeepSeek R1, Qwen3, Poe model usage, AI model shift
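Summary (translated from Traditional Chinese)
Generative AI enters a shake-up: Poe report reveals new trends across five modalities and the limits of platform-based observation
Generative AI continues to evolve. Poe has published a spring trend report based on its internal model usage data from January to April 2025, covering five modalities (text, reasoning, image, video, and audio) and showing that several emerging models have quickly come to dominate the market.
In the reasoning modality, Google’s Gemini 2.5 Pro reached 31% within six weeks, a sign of sharply rising user demand for reasoning-capable models, while OpenAI’s GPT-4o accounted for 35.8% of message activity. DeepSeek R1 has lost momentum, and hybrid models such as Qwen3 and Gemini Flash remain marginal.
The text modality shows users quickly embracing new flagship models: GPT-4.1 reached 10% and Gemini 2.5 Pro reached 5%, while the Claude series declined sharply.
In image generation, GPT-Image-1 reached 17% in just two weeks, and Imagen3 grew steadily to 30%. FLUX still leads, but its share fell from 45% to 35%.
The video modality is led by the Kling series, which reached 30% within three weeks, while Google’s Veo 2 held steady at 20%. Runway plunged from 60% to 20%.
In audio, ElevenLabs kept its 80% lead. Startups such as Cartesia and Unreal Speech show potential but are not yet widely used.
Note that the Poe report reflects only model usage within the platform and may not fully represent global trends, particularly for enterprise applications, open-source tools, or regional platforms. The report is therefore best read as “a real-time observation window into a consumer-oriented multimodal environment” rather than a comprehensive market survey.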