From Gen AI to AI Agents and the Future of Intelligent Automation: Highlights from Mr. Shek Ka-wai’s PolyU SPEED Guest Lecture

On 17 May 2025, Mr. Shek Ka-wai, a generative AI expert and founder of OMP, delivered a guest lecture to students of PolyU SPEED’s Professional Certificate in AI-Driven Digital and Social Media Marketing, with many alumni joining online. He walked them through the transformative journey from traditional generative AI to the rise of intelligent agents, a progression that is redefining how AI supports real-world tasks and decision-making. His message was clear: AI is no longer just a tool that answers questions; it is evolving into systems that can reason, act, and innovate.


The Five Levels of AI: A Framework by OpenAI

Mr. Shek introduced a key framework from OpenAI that outlines the five progressive levels of artificial intelligence. This framework helps contextualize where today’s AI systems are and where we are heading:

  1. Chatbots – Entry-level AI systems designed for simple conversational tasks, such as answering questions and generating content. Think of early versions of ChatGPT or other LLMs. These bots lack the ability to retain memory or apply complex reasoning.
  2. Reasoners – Advanced AI systems that can engage in logical reasoning and multi-step problem-solving. This includes models like OpenAI’s o1 and DeepSeek’s R1. These models are designed to simulate aspects of human critical thinking and analytic deduction.
  3. Agents – AI systems capable of acting in environments, using tools, and autonomously executing tasks. They don’t just answer prompts but can decide which steps to take to complete a goal. Examples include agents that interface with software systems to complete accounting entries or automate research tasks.
  4. Innovators – AI that generates novel insights or solutions that transcend human-taught knowledge. These models apply first-principles thinking to devise new concepts, technologies, or scientific breakthroughs. Innovation is no longer restricted to humans.
  5. Organizational AI – The final level, where AI systems can coordinate across multiple agents and departments, manage organizations, and potentially outperform human executives in decision-making and strategic alignment.

He emphasized that in 2025, we are firmly transitioning into the third level—AI Agents—thanks to advances in reasoning models and reinforcement learning.



The Evolution: From LLMs to AI Agents

Mr. Shek began by drawing a distinction between large language models (LLMs) and AI agents. LLMs are excellent at language-based tasks—answering questions, summarizing information, and generating content—but they require human intervention to act on their outputs. They are akin to an academic who can quote theories but cannot apply them to hands-on work.

In contrast, AI agents are active problem solvers. When given an instruction, they don’t just respond; they interpret the goal, break it into steps, choose the right tools, and iterate until the task is done. For example, when asked to extract and store data from a receipt, a traditional Gen AI model might recognize the text but leave the storage to the user. AI agents, however, not only recognize the text but can also autonomously select the appropriate accounting software and input the data directly—saving significant manual effort.
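The interpret-plan-act-iterate loop described above can be sketched in a few lines of Python. The tool functions and the receipt task below are hypothetical stand-ins, invented purely to illustrate the control flow, not any real accounting API:

```python
# A minimal, hypothetical agent loop: plan, pick a tool, act, check, repeat.

def ocr_tool(task):
    # Pretend OCR: "reads" the receipt text carried in the task dict.
    task["text"] = task["receipt"]
    return task

def bookkeeping_tool(task):
    # Pretend accounting entry: stores the extracted text as a ledger line.
    task.setdefault("ledger", []).append(task["text"])
    task["done"] = True
    return task

def plan(task):
    # The "reasoning" step: decide which tool the current state calls for.
    if "text" not in task:
        return ocr_tool
    return bookkeeping_tool

def run_agent(task, max_steps=5):
    # Iterate until the goal is reached instead of answering once and stopping.
    for _ in range(max_steps):
        if task.get("done"):
            break
        tool = plan(task)
        task = tool(task)
    return task

result = run_agent({"receipt": "Lunch HK$120"})
print(result["ledger"])  # ['Lunch HK$120']
```

The point of the sketch is the loop itself: a plain LLM would stop after producing the text, whereas the agent keeps selecting tools until the goal state is reached.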

This level of capability results from combining advanced reasoning models (like OpenAI’s o3 and o4) with reinforcement learning. Much like how a child learns to ride a bicycle through trial and error, AI learns from real-world feedback and evolves independently—building genuine action intelligence.

This new class of AI is capable of action, not just reaction—marking a monumental leap in capability.


Why Reasoning Models Matter

The leap from passive LLMs to active agents is made possible by advanced reasoning models such as OpenAI’s o3 and o4. These models simulate cognitive functions: they can plan, troubleshoot, and adapt. Mr. Shek illustrated this with a powerful analogy—training a scientist. You can give them textbooks (supervised learning), or you can place them in a lab to experiment and learn from outcomes (reinforcement learning).

It’s the latter approach, reinforcement learning, that enables AI to develop practical intelligence. The AI learns from environmental feedback—not just from human ratings. For example, if a tool fails to process a PDF, the agent doesn’t give up. It switches strategies, finds a new tool, and tries again.
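That try-switch-retry behavior can be illustrated with a tiny fallback loop. The parser names here are made up for illustration; they stand in for whatever tools an agent might have available:

```python
# Hypothetical illustration of strategy switching: if one tool fails,
# the agent tries the next rather than giving up.

def pdf_parser(doc):
    # Stand-in for a tool that only handles PDFs.
    if not doc.endswith(".pdf"):
        raise ValueError("not a PDF")
    return f"parsed {doc} with pdf_parser"

def ocr_parser(doc):
    # Stand-in fallback that can handle anything, e.g. a scanned image.
    return f"parsed {doc} with ocr_parser"

def parse_with_fallback(doc, tools=(pdf_parser, ocr_parser)):
    errors = []
    for tool in tools:
        try:
            return tool(doc)       # first tool that succeeds wins
        except Exception as exc:
            errors.append(str(exc))  # note the failure, switch strategy
    raise RuntimeError(f"all tools failed: {errors}")

print(parse_with_fallback("receipt.png"))  # parsed receipt.png with ocr_parser
```

In a real agent the "tools" would be external services and the switching decision would come from the model's reasoning, but the principle is the same: failure is feedback, not a dead end.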

A Live Example: AXA Tower Visual Identification

To highlight the new multi-modal and tool-using capabilities, Mr. Shek demonstrated OpenAI’s o3/o4 models with an image-based challenge. He uploaded an upside-down photo of a skyscraper and asked the model to identify the location. The AI agent first rotated the photo using code, zoomed in to read “AXA Tower” on the building, and reasoned further by noticing a red taxi—typical of Hong Kong. To confirm, it performed a web search and matched the address to AXA Tower in Kwun Tong, Hong Kong. This seamless use of coding, visual analysis, and web tools—chained autonomously by the AI—showcases the leap from text-based models to AI agents that solve real-world problems with minimal human guidance.

These abilities are vital in real-world applications like automation, customer service, and data processing, where variables shift constantly.


OpenAI’s Model Progression: o1, o3, o4—What’s in a Name?

Mr. Shek also explained the naming of OpenAI’s models. The “o” stands for “Omni,” reflecting OpenAI’s ambition to build models that are versatile across tasks and modalities. o1 was the first reasoning model released in 2024. o3 and o4 represent the next generations, skipping “o2” due to a trademark conflict with a telecom brand. o3 and o4 marked a leap in autonomy, integrating capabilities such as tool usage, coding, and internet browsing directly into the core AI architecture.


Breaking Free from Human Path Dependency

A particularly thought-provoking section of the lecture was Mr. Shek’s discussion on “path dependency”—our human bias toward established norms and inherited systems. He shared a fascinating historical tale tracing the width of a modern rocket booster all the way back to Roman horse carriages. Through centuries of transportation evolution, what began as a standard for chariot wheels inadvertently shaped the dimensions of railway tunnels, which in turn influenced rocket design.

This anecdote underscored how human systems can get locked into suboptimal decisions simply because "that’s how it’s always been done." AI, unconstrained by history, can approach problems from a clean slate. But that also raises concerns: AI might find solutions that humans never would—not always for the better. It emphasizes the need for clear goal-setting and rigorous oversight.


Reinforcement Learning in Action

To illustrate how reinforcement learning drives AI agents, Mr. Shek described a game developed by OpenAI where AI agents play hide-and-seek. Over millions of iterations, they devised ingenious strategies—blocking doors, hiding ramps, or even riding moving objects—all without explicit programming.

This wasn’t just entertaining; it was evidence of how AI learns adaptively and creatively through trial and error. Mr. Shek warned that when these capabilities extend into real-world software or robotics, the potential for both innovation and disruption becomes enormous.

These scenarios highlight how reinforcement-trained agents can surprise even their creators, discovering loopholes or techniques that weren’t anticipated. It’s this unpredictability that fuels the dual narrative of AI’s promise and peril.
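The trial-and-error principle behind those agents can be shown in miniature. The sketch below is not OpenAI’s hide-and-seek environment; it is plain tabular Q-learning on a toy five-cell corridor with a reward at the right end, but the feedback loop is the same one the lecture described: act, observe the outcome, and adjust future behavior.

```python
import random

# Toy reinforcement learning: a 5-cell corridor, goal (reward +1) at the
# right end. The agent starts knowing nothing and learns a policy purely
# from environmental feedback.

random.seed(0)
N, GOAL = 5, 4
Q = {(s, a): 0.0 for s in range(N) for a in (-1, +1)}  # actions: left/right
alpha, gamma, epsilon = 0.5, 0.9, 0.2

for _ in range(500):                       # many short episodes
    s = 0
    while s != GOAL:
        # Explore occasionally, otherwise exploit what has been learned.
        if random.random() < epsilon:
            a = random.choice((-1, +1))
        else:
            a = max((-1, +1), key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), N - 1)     # stay inside the corridor
        r = 1.0 if s2 == GOAL else 0.0     # feedback from the environment
        best_next = max(Q[(s2, -1)], Q[(s2, +1)])
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# After training, the greedy policy walks straight toward the goal.
policy = [max((-1, +1), key=lambda act: Q[(s, act)]) for s in range(N - 1)]
print(policy)  # [1, 1, 1, 1]
```

Nothing in the code tells the agent to walk right; the rule emerges from reward alone. Scaled up to rich simulated worlds, the same mechanism produced the door-blocking and ramp-hiding strategies, and the same mechanism is what makes agent behavior hard to fully anticipate.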


Conclusion: From Understanding to Application

Mr. Shek closed Part 1 of his talk by urging students to explore the potential of AI not just theoretically, but practically. He encouraged experimentation with tools like ChatGPT, agent-based platforms, and reinforcement-learning environments to build hands-on understanding.

AI agents represent the convergence of learning, reasoning, and action. For marketers, educators, and digital professionals, they offer a glimpse into a future where AI isn’t just reactive but adaptive and proactive. Mastering how to collaborate with such systems will be key to future success.

In addition to the theoretical frameworks, Mr. Shek also covered live demos and applied use cases—including multi-modal reasoning, Agentic RAG in customer support, and autonomous research agents like Manus—offering students a front-row seat to the next chapter of intelligent automation.

— Dr. Ken FONG


Summary

On 17 May 2025, Mr. Shek Ka-wai, founder of OMP and an expert in the business application of generative AI, was invited to deliver a highly inspiring lecture to students of PolyU SPEED’s Professional Certificate in AI-Driven Digital and Social Media Marketing. This summary reviews the key content, covering the five levels of AI development, reasoning models, the rise of AI Agents and their impact on industry, and how to embrace the future of intelligent automation with an innovative mindset.

1. The Five Levels of AI: From Chatbots to Organizational AI
Mr. Shek began by introducing the five-level framework of AI capability that OpenAI proposed in 2024: 1) Chatbots, focused on conversation, simple question answering, and content generation, without persistent memory or reasoning; 2) Reasoners, capable of multi-step logical reasoning and analytical problem-solving (models such as o1 and R1); 3) Agents, action-oriented AI that can operate in an environment, select tools, and complete goal-directed tasks automatically (for example, entering data into an accounting system); 4) Innovators, AI that breaks through the limits of existing knowledge and makes new discoveries and technological breakthroughs from first principles; 5) Organizational AI, which coordinates multiple AI teams, optimizes corporate decision-making, and may even replace senior management.
We are now at the third level, the Agent stage. AI no longer merely answers questions; it can actively decompose goals, plan steps, and invoke multiple tools to handle complex tasks automatically in real settings. This is a revolutionary breakthrough for business applications, process automation, and intelligent marketing.

2. From LLMs to AI Agents: Escaping the “Book-Smart but Helpless” Trap
Traditional large language models (LLMs) such as ChatGPT focus on understanding and generating language. They excel at answering queries, summarizing material, and drafting copy, but their capacity to act is limited; humans must still carry out the follow-up steps (such as manually entering data). Mr. Shek likened an LLM to a scholar with perfect theory but little field experience, able to explain knowledge yet unable to step in and solve the problem personally.
The arrival of AI Agents moves AI from passive response to active problem-solving. Given an instruction, an Agent decomposes it into a multi-step workflow, decides which tools to use, switches strategy when one approach fails, and keeps trying until the task is done. When processing a receipt, for example, an Agent can not only recognize the text but also automatically choose the accounting software and enter the data, eliminating a great deal of manual work.
This capability comes from combining reasoning models (such as OpenAI’s o3 and o4) with reinforcement learning. Rather than relying only on pre-loaded knowledge, the AI, much like a child learning to ride a bicycle, improves itself through trial and error and environmental feedback, acquiring genuine “action intelligence.”

3. Reasoning Models: A Live Demonstration of o3/o4 Multi-modal Tool Use
During the lecture, Mr. Shek showcased the new breakthroughs of OpenAI’s o3/o4 models. These models simulate human planning, debugging, and learning processes, and come with built-in tools such as Python programming, web search, and visual analysis, supporting multi-modal data processing.
In a live demonstration, he uploaded an upside-down photo of the AXA Tower. The AI first rotated the image with code, then zoomed in to identify the “AXA Tower” lettering, inferred a Hong Kong location from a red taxi, and finally verified the result with a web search, successfully locating the AXA Tower on How Ming Street, Kwun Tong. The entire chain, from image processing through logical inference to information retrieval, was orchestrated autonomously by the AI, demonstrating how AI Agents have moved from single tasks to multi-tool coordination and complex problem-solving.
As for the names o3 and o4, the “o” stands for Omni, meaning all-round and multi-modal: OpenAI aims to build a comprehensive AI platform that works across languages, media, and tasks. o2 was skipped for trademark reasons, a sign of how quickly and flexibly the field moves.

4. Breaking Free from Path Dependency: New Questions for AI Innovation and Oversight
Mr. Shek highlighted “path dependency,” a common human thinking trap: the dimensions of a rocket booster, for instance, can be traced back to the wheel gauge of ancient Roman chariots, showing how technological development is often constrained by historical norms. AI can shed this baggage and solve problems with fresh thinking (though this may also bring unpredictable risks).
He noted that without oversight, AI Agents may exploit loopholes and disregard ethics and safety in pursuit of their goals. The industry therefore needs to set clear objectives and limits for AI, apply multi-layered monitoring, and even use AI to supervise AI, ensuring its decisions serve society’s interests.

5. Reinforcement Learning in Action: Self-Learning and Emergent Creativity
He shared OpenAI’s hide-and-seek experiment: over millions of simulated rounds, AI agents taught themselves to block doors with boxes, hide ramps, and even borrow their opponents’ tools to find a way through. This demonstrates AI’s capacity for learning and innovation, but it is also a reminder of its double-edged nature: behavior that is hard to control, and unexpected creativity such as discovering loopholes.

6. Looking Ahead: AI Agents in Business, Customer Service, and Education
In closing, Mr. Shek urged students to move from theory to practice and to gain hands-on experience with ChatGPT, AI Agent platforms, and reinforcement-learning environments, as this is the only way to truly master the skills of the AI era.
He also outlined practical applications of AI Agents in customer service (such as Agentic RAG) and enterprise automation (such as the autonomous agent Manus): an AI can take a customer query, decide which data sources to consult, apply the right tools, generate a response, and even compile quotations and verify membership status automatically. This not only raises efficiency but also drives digital transformation and intelligent decision-making.
In short, the AI Agent is a new species that unites learning, reasoning, and action. Mastering its use will be decisive in digital marketing, entrepreneurship, education, and beyond. Embracing innovation and oversight in parallel, and unlocking AI’s full potential, is essential training for every future professional.


Keywords

AI Agent, Generative AI, Reinforcement Learning, Reasoning Models, Agentic RAG, Manus AI, LLM vs AI Agent, OpenAI o1, OpenAI o3, OpenAI o4, Path Dependency, Multi-modal Reasoning, Intelligent Automation, PolyU SPEED, SME Digital Transformation, Innovation in AI, Organizational AI, AI Levels Framework, Autonomous Agents, Business Application of AI, Digital Marketing AI
