07 Agent Autonomous Planning and Tool Development
Summary
This material explores the capabilities and design principles of AI Agents, systems powered by Large Language Models (LLMs) capable of autonomous planning, memory, and tool use to execute complex task
AI Agent Fundamentals
An AI Agent is an LLM-powered system designed to autonomously understand, plan, and execute complex tasks given a high-level goal.
Core Capabilities
-
Planning: Decomposing large tasks into smaller, manageable sub-goals.
-
Memory:
Code_Short-term memory_*: Used for in-context learning within the current interaction. _Long-term memory_*: Stores past experiences and knowledge, often through external vector databases for fast retrieval, enabling learning over time. -
Tool Use: The ability to call external APIs or functions (e.g., search engines, code interpreters, custom services) to gather additional information or perform specific actions.
Generative Agents
Originating from research by Google and Stanford ("Generative Agents: Interactive Simulacra of Human Behavior," 2023), Generative Agents are designed to simulate believable human and emergent group behaviors.
-
Smallville Example: A simulated town where 25 agents live, interact, remember, reflect, and plan their day using natural language.
-
Agent-Environment Interaction: Agents navigate and interact with detailed environments (e.g., cafes, houses with kitchens), influencing the state of the world and other agents.
-
Information Propagation: Agents possess social attributes, allowing information to spread and relationships to form organically through interactions.
-
Memory Stream: The core module, acting as a comprehensive database of all agent experiences. Agents retrieve relevant records from this stream for planning actions and reacting to the environment, enabling higher-level behavior instructions.
-
Core Architectural Modules:
-
Long-term Memory: Manages extensive historical data, critical for multi-round decisions that exceed LLM token limits.
-
External Tools: Enables the LLM to interact with external services and APIs, augmenting its capabilities.
-
Short-term Memory: Uses attention mechanisms to feed the most relevant parts of long-term memory to the LLM for current decision-making.
-
AI Agent Framework Comparison
Different frameworks offer varying architectures and strengths for building LLM applications.
Tool | Core Positioning | Architecture Features | Applicable Scenarios
LangChain | Open-source LLM app development framework | Chain-based linear or branching workflows, supports Agent mode | Rapidly building RAG, dialogue systems, tool calling.
LangGraph | LangChain extension for complex workflows | Graph-based, supports cycles, conditional logic, multi-Agent collaboration | Complex tasks requiring loops, dynamic branching, or state management (e.g., adaptive RAG).
Qwen-Agent | Alibaba Cloud's AI Agent framework | Based on Alibaba Cloud's LLM, multi-modal interaction, tool calling, MCP (Model-Controller-Perception) | Open-source, integrated with various tools for the Qwen model ecosystem.
Coze | ByteDance's no-code AI Bot platform | Visual drag-and-drop, built-in knowledge base, multi-modal plugins | Quickly deploying social media bots, lightweight workflows.
Dify | Open-source LLM application development platform | API-first, supports Prompt engineering and flexible orchestration | Developers needing customized LLM applications, deep integration, or private deployment.
When to Use Agents & Best Practices
AI Agents are best suited for open-ended problems where the number of steps is unpredictable, and a fixed, hardcoded path is not feasible.
- Decision Trust: LLMs might run multiple iterations, so a degree of trust in their decision-making capabilities is essential.
- Costs and Errors: Autonomy comes with higher operational costs and the potential for error accumulation. Extensive testing in sandbox environments with appropriate guardrails is crucial.
- Complementary with Workflows: Agents and workflows are synergistic. Agents handle specific tasks, while workflows coordinate these tasks into coherent, efficient processes. This enhances automation, scalability, and resilience.
Three Core Principles for Building Agents
- Keep Agent Design Simple: Focus on core functionality, avoiding unnecessary complexity.
- Prioritize Transparency: Clearly show the agent's planning steps so users can understand its decision process.
- Build Functions/MCP: Develop robust tools with clear documentation and tests to ensure reliable interaction with external environments.
Agent Classification
AI Agents can be classified into three main architectures based on their decision-making processes.
1. Reactive Agents
- Characteristics: "Intuitive" agents that make immediate decisions based on the current environment without long-term planning, relying on pre-set rules or direct LLM inference.
- Working Principle: Perceive environment input -> LLM/rule system generates action -> Execute action -> Observe results -> Repeat until task completion.
- Advantages: Fast response times (milliseconds), simple, and reliable for well-defined tasks.
- Limitations: Lacks adaptability for unforeseen scenarios, short-sighted, and may get stuck in local loops.
- Typical Applications: Robot obstacle avoidance, game AI (NPCs reacting to player attacks), industrial control systems (triggering alarms).
- Applicable Scenarios: Tasks with clear rules, real-time response requirements, and no need for long-term strategic planning.
CASE: Private Fund Operations Guidance Q&A Assistant (Reactive)
This agent, built with LangChain and Qwen-Turbo, provides rule consultation for private funds using a reactive architecture.
-
Key Features: Autonomous tool selection, transparent thinking, knowledge boundary awareness, multi-tool collaboration, exception handling.
-
Step 1: Data Preparation: Knowledge base (
FUND_RULES_DB) with rule IDs, categories, questions, and answers.JSON{ "id": "rule001", "category": "设立与募集", "question": "私募基金的合格投资者标准是什么?", "answer": "合格投资者是指具备相应风险识别能力和风险承担能力..." } -
Step 2: Tool Design:
-
search_rules_by_keywords: Retrieves rules via keyword matching. -
search_rules_by_category: Queries rules based on their category. -
answer_question: Directly answers based on question-rule matching.
-
-
Step 3: Agent Architecture: Uses LangChain's Agent framework:
CustomPromptTemplatefor defining thinking format,CustomOutputParserfor LLM output, andAgentExecutorfor managing agent-tool interactions. -
Step 4: Knowledge Boundary Handling: Identifies question topics, distinguishes knowledge base content from LLM's general knowledge, and provides guidance for out-of-scope queries.
2. Deliberative Agents
- Characteristics: Agents that engage in long-term planning, building internal models, and reasoning to select optimal action plans. They are goal-oriented.
- Core Flow: Perceive environment -> Model (update internal world state) -> Reason (generate candidate plans, simulate outcomes) -> Decide (select optimal plan) -> Execute.
- Advantages: Capable of handling complex, multi-step tasks, optimizes for long-term goals, and adapts to dynamic environments.
- Typical Applications: Path planning, logistics scheduling, complex investment decisions.
- Applicable Scenarios: Tasks requiring strategic planning and multi-step optimization, similar to a chess master.
CASE: Smart Investment Research Assistant (Deliberative)
This agent, built with LangGraph, integrates market data, performs multi-step analysis, and generates investment views and research reports for investment research scenarios.
-
Key Features: Internal modeling, multi-plan generation, plan evaluation, long-term planning, transparent reasoning.
-
Processing Steps:
Code_Step 1: Perception_*: Collects market overview, key indicators, recent news, and industry trends. _Step 2: Modeling_*: Builds an internal world model based on collected data, assessing market state, economic cycle, risks, opportunities, and sentiment. _Step 3: Reasoning_*: Generates multiple candidate investment analysis plans, each with hypotheses, analysis approaches, expected outcomes, and pros/cons. _Step 4: Decision_*: Evaluates candidate plans and selects the optimal investment viewpoint, forming a thesis, supporting evidence, risk assessment, and recommendation. _Step 5: Report_*: Generates a comprehensive investment research report. -
State Definition: Uses
TypedDictforResearchAgentStateto maintain the agent's complete state across phases.Pythonfrom typing import Literal, Optional, Dict, Any, List from typing_extensions import TypedDict class ResearchAgentState(TypedDict): research_topic: str industry_focus: str time_horizon: str perception_data: Optional[Dict[str, Any]] world_model: Optional[Dict[str, Any]] reasoning_plans: Optional[List[Dict[str, Any]]] selected_plan: Optional[Dict[str, Any]] final_report: Optional[str] current_phase: Literal["perception", "modeling", "reasoning", "decision", "report"] error: Optional[str] -
Output Models: Pydantic models (e.g.,
PerceptionOutput,ModelingOutput) enforce structured data output for each stage. -
Workflow Implementation: LangGraph's
StateGraphconnects each phase as a directed graph.Pythonfrom langgraph.graph import StateGraph, END def create_research_agent_workflow() -> StateGraph: workflow = StateGraph(ResearchAgentState) workflow.add_node("perception", perception) workflow.add_node("modeling", modeling) workflow.add_node("reasoning", reasoning) workflow.add_node("decision", decision) workflow.add_node("report", report_generation) workflow.set_entry_point("perception") workflow.add_edge("perception", "modeling") workflow.add_edge("modeling", "reasoning") workflow.add_edge("reasoning", "decision") workflow.add_edge("decision", "report") workflow.add_edge("report", END) return workflow.compile() -
Phase Processing Logic: Each phase typically involves checking preconditions, preparing LLM prompts, invoking the LLM, parsing results, updating the state, and handling errors.
3. Hybrid Agents
-
Characteristics: Combines the immediate response of reactive agents with the strategic planning of deliberative agents to achieve a balance of intelligence and efficiency.
-
Three-Layer Design:
Code_Bottom (Reactive Layer)_*: Handles simple, urgent tasks with millisecond response times based on pre-set rules. _Middle (Coordinator Layer)_*: Evaluates task type and priority, dynamically selecting the appropriate processing mode (reactive or deliberative). _Top (Deliberative Layer)_*: Manages complex analyses and long-term planning, building internal models and generating multiple alternatives. -
Operation Mechanism: An arbitration system (e.g., a supervisor) dynamically switches between reactive mode for emergencies and deliberative mode for routine or strategic tasks.
-
Typical Applications: Autonomous driving (immediate braking for obstacles, strategic route planning for normal driving).
-
Core Advantages: Possesses both real-time responsiveness and strategic planning capabilities.
CASE: Investment Advisor AI Assistant (Hybrid)
This agent provides intelligent, personalized wealth management advice, combining the immediate response of a reactive agent with the long-term planning of a deliberative agent via a coordination layer.
-
Processing Flow:
Code_Step 1: Query Assessment (Coordinator Layer)_*: Assesses the user query for type (emergency, informational, analytical) and determines the appropriate processing mode (reactive or deliberative) based on complexity, urgency, and resource needs. _Step 2A: Reactive Processing_*: For simple queries (e.g., "What's the Shanghai Index today?"), providing low-latency, direct data-driven answers. _Step 2B: Deliberative Processing_*: For complex analytical queries (e.g., "How to adjust my portfolio for economic recession?"), involving data collection, deep analysis, and generating detailed recommendations across multiple steps. -
State Management:
WealthAdvisorStateusesTypedDictto maintain the complete state, including query type, processing mode, and results from different layers.Pythonclass WealthAdvisorState(TypedDict): user_query: str customer_profile: Optional[Dict[str, Any]] query_type: Optional[Literal["emergency", "informational", "analytical"]] processing_mode: Optional[Literal["reactive", "deliberative"]] emergency_response: Optional[Dict[str, Any]] market_data: Optional[Dict[str, Any]] analysis_results: Optional[Dict[str, Any]] final_response: Optional[str] current_phase: Literal["assess", "reactive", "collect_data", "analyze", "recommend", "respond"] error: Optional[str] -
Usage Scenarios: Demonstrates handling a market info query reactively (fast response) and a portfolio optimization query deliberatively (multi-step analysis, detailed report).
LangGraph Usage Specifics
LangGraph is a powerful framework for building complex, stateful LLM applications, especially multi-agent systems.
-
ChatTongyi: A LangChain wrapper for Alibaba Cloud's Qwen models. It supports
bind_tools(), a critical feature that allows the LLM to understand and request specific tool calls.Pythonfrom langchain_community.chat_models import ChatTongyi # Create LLM instance llm = ChatTongyi(model_name="qwen-turbo-latest", dashscope_api_key="YOUR_API_KEY") # Bind tools to LLM (key step for Agent autonomy) llm_with_tools = llm.bind_tools(tools) -
Tool Calling in LangGraph:
-
Tools are defined using the
@tooldecorator. -
The LLM analyzes the user query, decides which tool to call,
ToolNodeexecutes the tool, returns the result, and the LLM then generates a response (potentially calling more tools).
Pythonfrom langchain_core.tools import tool @tool def query_shanghai_index() -> str: """查询上证指数实时行情,获取当前点位、涨跌和涨跌幅信息""" # ... actual API call logic ... return "上证指数 当前点位: 3125.62,涨跌: 6.32,涨跌幅: 0.20%" tools = [query_shanghai_index] # Example tool list-
A function like
should_continue_toolsis often used to check if the LLM's last message containstool_calls, indicating that tools need to be executed.Pythondef should_continue_tools(state: WealthAdvisorState) -> str: messages = state.get("messages", []) last_message = messages[-1] if hasattr(last_message, "tool_calls") and last_message.tool_calls: return "tools" # Needs to execute tools return "end" # No tools, end
-
-
ToolNode: A pre-built LangGraph node specifically for executing tool calls. It receives messages containing tool call requests, executes the corresponding functions, and returns the results as
ToolMessageobjects to update the state.Pythonfrom langgraph.prebuilt import ToolNode tool_node = ToolNode(tools) workflow.add_node("tools", tool_node) -
StateGraph: The core class in LangGraph, enabling definition and management of state transition graphs for complex, stateful workflows.
Code_State Management_*: Maintains a shared state (defined by `TypedDict`) that nodes can read and update. _Nodes_*: Functions encapsulated as workflow nodes (`add_node()`). _Routing_*: Conditional (`add_