LLMs: How OpenManus Works

news / 2025-3-15 23:43:35 / Source: https://www.cnblogs.com/theseventhson/p/18773966

   Hot on the heels of DeepSeek, the Wuhan team behind Monica released Manus, billed as the world's first general-purpose agent, and self-media outlets rushed to crown it the next DeepSeek. Not long after Manus launched, however, five engineers from the MetaGPT team said they had put together a demo version in about three hours. They named it OpenManus, and within days it had collected 34.4K stars on GitHub and gone viral itself. This post digs into the core principles behind OpenManus.

   1. First, why do we need agents at all?

  • Today's LLMs can only make decisions; they cannot carry them out, so external tools are still needed to do the actual work
  • Even with various CoT techniques, an LLM cannot reliably drive the whole chain by itself; humans still have to step in for planning, action, and review

  That is why agents were born. Whether it is deep search, deep research, or Manus, the core idea is the same: plan -> action -> review -> action -> review... looping until a termination condition fires. The rough flow looks like this:

  

   In OpenManus specifically, the core flow works like this: after the user enters a prompt, a dedicated agent calls the LLM to decompose the task, breaking the complex problem into small, logically coherent sub-problems. It then works through those sub-problems one by one, invoking tools from the tool box, and finally returns the result to the user.
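The plan -> action -> review loop described above can be sketched in a few lines of Python. Everything here (`plan_task`, `run_tool`, `review`) is a toy stand-in for the LLM planner and the tool box, not OpenManus code:

```python
def run_agent(request: str, max_steps: int = 5) -> list[str]:
    """Toy agent loop: plan once, then act/review until done or out of steps."""
    steps = plan_task(request)           # the LLM would decompose the request here
    results = []
    for i, step in enumerate(steps):
        if i >= max_steps:               # hard stop, mirrors the max_steps guard
            results.append("Terminated: reached max steps")
            break
        outcome = run_tool(step)         # a tool from the tool box does the real work
        results.append(outcome)
        if review(outcome) == "finish":  # review decides whether to keep looping
            break
    return results

def plan_task(request: str) -> list[str]:
    # Stand-in for the LLM planner: split the request into pseudo-steps.
    return [f"step {n}: {request}" for n in range(1, 4)]

def run_tool(step: str) -> str:
    # Stand-in for a tool-box tool.
    return f"done <{step}>"

def review(outcome: str) -> str:
    # Stand-in reviewer: stop after the final planned step.
    return "finish" if "step 3" in outcome else "continue"

print(run_agent("find AI news"))  # three "done <step n: ...>" entries
```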

       

   A general-purpose agent like this has exactly two core competitive strengths:

  • Whether the plan is accurate: this mostly depends on the underlying LLM's ability to perform named entity recognition and intent recognition on the prompt
  • Whether the tool box is rich enough: user needs are diverse, so are the tools sufficient to cover them?

     2. Now let's look at the OpenManus directory structure: four folders, agent, flow, prompt, and tool. The names alone tell you what each module does.

  

   The entry point of the whole program is, of course, the agents. The relationships between the main agents are as follows:

  

  (1) One of the agent's core jobs is planning. OpenManus handles this through its prompt: the prompt states outright that the model is an expert planning agent and must produce an executable plan.

PLANNING_SYSTEM_PROMPT = """
You are an expert Planning Agent tasked with solving problems efficiently through structured plans.
Your job is:
1. Analyze requests to understand the task scope
2. Create a clear, actionable plan that makes meaningful progress with the `planning` tool
3. Execute steps using available tools as needed
4. Track progress and adapt plans when necessary
5. Use `finish` to conclude immediately when the task is complete

Available tools will vary by task but may include:
- `planning`: Create, update, and track plans (commands: create, update, mark_step, etc.)
- `finish`: End the task when complete

Break tasks into logical steps with clear outcomes. Avoid excessive detail or sub-steps.
Think about dependencies and verification methods.
Know when to conclude - don't continue thinking once objectives are met.
"""

NEXT_STEP_PROMPT = """
Based on the current state, what's your next action?
Choose the most efficient path forward:
1. Is the plan sufficient, or does it need refinement?
2. Can you execute the next step immediately?
3. Is the task complete? If so, use `finish` right away.

Be concise in your reasoning, then select the appropriate tool or action.
"""

  With the prompt in place, the next step is to have the LLM generate a plan from it, in agent/planning.py:

async def create_initial_plan(self, request: str) -> None:
    """Create an initial plan based on the request."""
    logger.info(f"Creating initial plan with ID: {self.active_plan_id}")

    messages = [
        Message.user_message(
            f"Analyze the request and create a plan with ID {self.active_plan_id}: {request}"
        )
    ]
    self.memory.add_messages(messages)
    response = await self.llm.ask_tool(
        messages=messages,
        system_msgs=[Message.system_message(self.system_prompt)],
        tools=self.available_tools.to_params(),
        tool_choice=ToolChoice.AUTO,
    )
    assistant_msg = Message.from_tool_calls(
        content=response.content, tool_calls=response.tool_calls
    )
    self.memory.add_message(assistant_msg)

    plan_created = False
    for tool_call in response.tool_calls:
        if tool_call.function.name == "planning":
            result = await self.execute_tool(tool_call)
            logger.info(
                f"Executed tool {tool_call.function.name} with result: {result}"
            )

            # Add tool response to memory
            tool_msg = Message.tool_message(
                content=result,
                tool_call_id=tool_call.id,
                name=tool_call.function.name,
            )
            self.memory.add_message(tool_msg)
            plan_created = True
            break

    if not plan_created:
        logger.warning("No plan created from initial request")
        tool_msg = Message.assistant_message(
            "Error: Parameter `plan_id` is required for command: create"
        )
        self.memory.add_message(tool_msg)
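The loop above scans `response.tool_calls` for a call whose `function.name` is "planning". The tool-call shape appears to follow the OpenAI-style function-calling convention, where arguments arrive as a JSON string rather than a dict. The payload below is an illustrative guess at that shape, not captured output:

```python
import json

# Hypothetical tool call, shaped like an OpenAI-style function call.
tool_call = {
    "id": "call_0",
    "type": "function",
    "function": {
        "name": "planning",
        # arguments are a JSON *string*, so they must be parsed before use
        "arguments": json.dumps({
            "command": "create",
            "plan_id": "plan_1",
            "steps": ["search the web", "save results to file"],
        }),
    },
}

if tool_call["function"]["name"] == "planning":
    args = json.loads(tool_call["function"]["arguments"])
    print(args["command"], len(args["steps"]))  # create 2
```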

  Once the plan is generated, the agent enters the think/act loop. This part is implemented in agent/toolcall.py, as follows: think asks the LLM to choose a tool for the job, and act invokes the chosen tool.

async def think(self) -> bool:
    """Process current state and decide next actions using tools"""
    if self.next_step_prompt:
        user_msg = Message.user_message(self.next_step_prompt)
        self.messages += [user_msg]

    # Get response with tool options: let the LLM pick which tool to use
    response = await self.llm.ask_tool(
        messages=self.messages,
        system_msgs=[Message.system_message(self.system_prompt)]
        if self.system_prompt
        else None,
        tools=self.available_tools.to_params(),
        tool_choice=self.tool_choices,
    )
    self.tool_calls = response.tool_calls

    # Log response info
    logger.info(f"✨ {self.name}'s thoughts: {response.content}")
    logger.info(
        f"🛠️ {self.name} selected "
        f"{len(response.tool_calls) if response.tool_calls else 0} tools to use"
    )
    if response.tool_calls:
        logger.info(
            f"🧰 Tools being prepared: "
            f"{[call.function.name for call in response.tool_calls]}"
        )

    try:
        # Handle different tool_choices modes
        if self.tool_choices == ToolChoice.NONE:
            if response.tool_calls:
                logger.warning(
                    f"🤔 Hmm, {self.name} tried to use tools when they weren't available!"
                )
            if response.content:
                self.memory.add_message(Message.assistant_message(response.content))
                return True
            return False

        # Create and add assistant message
        assistant_msg = (
            Message.from_tool_calls(content=response.content, tool_calls=self.tool_calls)
            if self.tool_calls
            else Message.assistant_message(response.content)
        )
        self.memory.add_message(assistant_msg)

        if self.tool_choices == ToolChoice.REQUIRED and not self.tool_calls:
            return True  # Will be handled in act()

        # For 'auto' mode, continue with content if no commands but content exists
        if self.tool_choices == ToolChoice.AUTO and not self.tool_calls:
            return bool(response.content)

        return bool(self.tool_calls)
    except Exception as e:
        logger.error(f"🚨 Oops! The {self.name}'s thinking process hit a snag: {e}")
        self.memory.add_message(
            Message.assistant_message(f"Error encountered while processing: {str(e)}")
        )
        return False

async def act(self) -> str:
    """Execute tool calls and handle their results"""
    if not self.tool_calls:
        if self.tool_choices == ToolChoice.REQUIRED:
            raise ValueError(TOOL_CALL_REQUIRED)
        # Return last message content if no tool calls
        return self.messages[-1].content or "No content or commands to execute"

    results = []
    for command in self.tool_calls:
        result = await self.execute_tool(command)  # invoke the selected tool to do the real work
        if self.max_observe:
            result = result[: self.max_observe]
        logger.info(
            f"🎯 Tool '{command.function.name}' completed its mission! Result: {result}"
        )

        # Add tool response to memory
        tool_msg = Message.tool_message(
            content=result, tool_call_id=command.id, name=command.function.name
        )
        self.memory.add_message(tool_msg)
        results.append(result)

    return "\n\n".join(results)

  think and act run in a loop until a stop condition is met; this part is implemented in agent/base.py:

async def run(self, request: Optional[str] = None) -> str:
    """Execute the agent's main loop asynchronously.

    Args:
        request: Optional initial user request to process.

    Returns:
        A string summarizing the execution results.

    Raises:
        RuntimeError: If the agent is not in IDLE state at start.
    """
    if self.state != AgentState.IDLE:
        raise RuntimeError(f"Cannot run agent from state: {self.state}")

    if request:
        self.update_memory("user", request)

    results: List[str] = []
    async with self.state_context(AgentState.RUNNING):
        while (  # stop conditions: max steps reached, or the agent is already FINISHED
            self.current_step < self.max_steps
            and self.state != AgentState.FINISHED
        ):
            self.current_step += 1
            logger.info(f"Executing step {self.current_step}/{self.max_steps}")
            step_result = await self.step()

            # Check for stuck state
            if self.is_stuck():
                self.handle_stuck_state()

            results.append(f"Step {self.current_step}: {step_result}")

        if self.current_step >= self.max_steps:
            self.current_step = 0
            self.state = AgentState.IDLE
            results.append(f"Terminated: Reached max steps ({self.max_steps})")

    return "\n".join(results) if results else "No steps executed"

  Since this is a while loop, what actually changes between iterations? Take an example: find the latest AI news and save it to a file. On the first think, the LLM is given the user's prompt, the system persona, and the available tools, and is asked to pick a suitable tool, which it returns in the response. Here the LLM chose Google search to look for the news and supplied the search query.

  

   On the second think, the prompt sent to the LLM includes the first round's prompt and response, much like a multi-turn conversation: all the accumulated context is gathered into the latest prompt, and the LLM produces the next result, i.e. what the next action should be.
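This context accumulation is easy to see with toy messages: each round appends the user prompt and the assistant reply, and the next call to the LLM sends the entire list. This is a sketch with plain dicts, not the real Message/Memory classes:

```python
# Shared history: every think() round sees everything appended so far.
messages = []

def think_round(user_prompt: str, assistant_reply: str) -> int:
    """Append one user/assistant round and return how many messages the LLM now sees."""
    messages.append({"role": "user", "content": user_prompt})
    # ... llm.ask_tool(messages=messages) would be called here ...
    messages.append({"role": "assistant", "content": assistant_reply})
    return len(messages)

think_round("find the latest AI news", "use google_search('AI news')")
think_round("(previous context)", "use file_save(results)")
n = think_round("(previous context)", "no tool needed -- finish")
print(n)  # 6: three user/assistant rounds
```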

  

   On the third think, the prompt again carries the previous two rounds. This time the LLM replies that no further tool calls are needed, so the query is complete.

  

   The whole flow is simple. Users can also add their own tools, as long as they conform to the MCP protocol.
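As a sketch of what a user-supplied tool might look like: the BaseTool shape below (name / description / JSON-schema parameters / an async execute) is an assumption modeled on common tool-calling conventions, not the exact OpenManus or MCP interface:

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class BaseTool:
    """Hypothetical base class: metadata the LLM sees, plus an execute hook."""
    name: str
    description: str
    parameters: dict = field(default_factory=dict)

    async def execute(self, **kwargs) -> str:
        raise NotImplementedError

@dataclass
class WordCountTool(BaseTool):
    """Toy custom tool: counts the words in a piece of text."""
    name: str = "word_count"
    description: str = "Count the words in a piece of text."
    parameters: dict = field(default_factory=lambda: {
        "type": "object",
        "properties": {"text": {"type": "string"}},
        "required": ["text"],
    })

    async def execute(self, text: str = "") -> str:
        return str(len(text.split()))

result = asyncio.run(WordCountTool().execute(text="hello agent world"))
print(result)  # "3"
```

The agent would advertise `name`, `description`, and `parameters` to the LLM as a tool schema, and act() would route the matching tool call to `execute`.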

  

 

 

 

References:

1、https://github.com/mannaandpoem/OpenManus/blob/main/README_zh.md   

 2、https://www.bilibili.com/video/BV1WzQPYWEGY/?spm_id_from=333.1007.tianma.8-3-29.click&vd_source=241a5bcb1c13e6828e519dd1f78f35b2  OpenManus core code walkthrough
