
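AI agent systems are all the rage right now! At their core they are LLMs connected to specific prompts and tools, and they can carry out tasks for you autonomously. You can also build dependable step-by-step workflows that guide the LLM to solve problems more reliably. In February 2025, OpenAI launched Deep Research, an agent that takes a user's topic, automatically runs a large number of searches, and compiles the results into a polished report. However, it is only available on the $200 Pro plan. In this guide, I will walk you through building your own Deep Research and report generation agent with LangGraph for less than a dollar!
A Brief Introduction to OpenAI Deep Research
OpenAI launched Deep Research on February 2, 2025, as an additional capability in ChatGPT. They describe it as a new agentic capability that performs multi-step research on the internet for complex tasks or queries posed by the user. They claim it accomplishes in tens of minutes what would take a human many hours.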

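Deep Research executing a task - Source: OpenAI
Deep Research is OpenAI's current Agentic AI product that can do work for you autonomously. You give it a task or topic in a prompt, and ChatGPT finds, analyzes, and synthesizes hundreds of online sources to create a comprehensive report at the level of a research analyst. It is powered by a version of the upcoming OpenAI o3 model that is optimized for web browsing and data analysis, and it uses reasoning to search, interpret, and analyze large amounts of text, images, and PDFs on the internet before compiling a well-structured report.
There is a limitation, though: it is only available with the $200 ChatGPT Pro subscription. This is where my Agentic AI system comes in, since it can carry out deep research and write a polished report for less than a dollar. Let's get started!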
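Deep Research and Structured Report Generation Planning Agentic AI System Architecture
The following figure shows the overall architecture of our system. We will implement it with LangGraph, LangChain's open-source framework for building stateful agentic systems.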

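Deep Research and Report Generation AI Agent
The key components that power this system are:
- A powerful Large Language Model (LLM) with strong reasoning ability. We use GPT-4o, which is inexpensive and fast, but you could even use LLMs such as Llama 3.2 or other open-source alternatives.
- LangGraph, for building the agentic system. It is a great framework for building cyclical graph-based systems that maintain state variables across the whole workflow, and it makes agentic feedback loops easy to build.
- Tavily AI, an AI search engine that is well suited to web research and to fetching data from websites, which powers our deep research system.
This project focuses on building a planning agent for deep research and structured report generation as an alternative to OpenAI's Deep Research. The agent follows the popular Planning Agent Design Pattern: it automatically analyzes a user-defined topic, performs deep web research, and generates a well-structured report. The workflow is inspired by LangChain's own Report mAIstro, so I give them full credit for the workflow they came up with: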
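1. Report planning:
- The agent analyzes the user-provided topic and the default report template to create a custom plan for the report.
- Sections such as the introduction, key sections, and conclusion are defined based on the topic.
- A web search tool is used to gather the required information before the main sections are decided.
2. Parallel execution of research and writing:
- The agent uses parallel execution to efficiently perform:
  - Web research: queries are generated for each section and run through the web search tool to retrieve up-to-date information.
  - Section writing: the retrieved data is used to write the content of each section, with the following flow:
    - The researcher gathers relevant data from the web.
    - The section writer uses this data to generate structured content for the assigned section.
3. Formatting completed sections:
- Once all sections are written, they are formatted to ensure a consistent and coherent report structure.
4. Writing the introduction and conclusion:
- After the main sections are written and formatted:
  - The introduction and conclusion are written based on the content of the other sections (in parallel).
  - This ensures these sections align with the overall flow and insights of the report.
5. Final compilation:
- All completed sections are compiled into the final report.
- The final output is a comprehensive, well-organized report in the style of a wiki document.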
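The five steps above can be pictured with a small plain-Python sketch. This is only an illustration of the control flow, not the LangGraph implementation built later in this guide: every function here (plan_report, research_and_write, write_from_context, build_report) is a hypothetical stand-in that returns canned text instead of calling an LLM or a search engine.

```python
import asyncio

async def plan_report(topic):
    # Stand-in for step 1: returns (section name, needs_research) pairs.
    return [
        ("Introduction", False),
        (f"{topic} basics", True),
        (f"{topic} use cases", True),
        ("Conclusion", False),
    ]

async def research_and_write(name):
    # Stand-in for step 2: the sleep simulates web search plus LLM latency.
    await asyncio.sleep(0.01)
    return f"## {name}\n(researched content)"

async def write_from_context(name, context):
    # Stand-in for step 4: written from the already completed sections.
    await asyncio.sleep(0.01)
    return f"## {name}\n(summary of {context.count('##')} researched sections)"

async def build_report(topic):
    plan = await plan_report(topic)
    researched = [name for name, needs_research in plan if needs_research]
    # Step 2: research and write all main sections concurrently.
    bodies = await asyncio.gather(*(research_and_write(n) for n in researched))
    written = dict(zip(researched, bodies))
    # Step 3: format the completed sections into one context string.
    context = "\n\n".join(bodies)
    # Step 4: introduction and conclusion are written concurrently from the context.
    others = [name for name, needs_research in plan if not needs_research]
    finals = await asyncio.gather(*(write_from_context(n, context) for n in others))
    written.update(zip(others, finals))
    # Step 5: compile everything in the planned order.
    return "\n\n".join(written[name] for name, _ in plan)

if __name__ == "__main__":
    print(asyncio.run(build_report("LangGraph")))
```

In a notebook, where an event loop is already running, you would call `await build_report("LangGraph")` instead of `asyncio.run(...)`.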
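Now, let's start building these components step by step with LangGraph and Tavily.
Hands-on Implementation of the Deep Research and Structured Report Generation Planning Agentic AI System
We will now implement the end-to-end workflow of our deep research report generator Agentic AI system step by step, following the architecture discussed in the previous section, with detailed explanations, code, and outputs.
Install Dependencies
We start by installing the necessary libraries for building our system. These include langchain, LangGraph, and rich, which we use to render nicely formatted Markdown reports.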
!pip install langchain==0.3.14
!pip install langchain-openai==0.3.0
!pip install langchain-community==0.3.14
!pip install langgraph==0.2.64
!pip install rich
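Because the versions above are pinned, it can help to confirm what actually got installed before going further. The following is a minimal sketch using only the standard library; PINNED and check_versions are illustrative names, not part of the original tutorial.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in the install step above (rich is not pinned, so it is omitted).
PINNED = {
    "langchain": "0.3.14",
    "langchain-openai": "0.3.0",
    "langchain-community": "0.3.14",
    "langgraph": "0.2.64",
}

def check_versions(pinned):
    """Map each package to (installed version or None, whether it matches the pin)."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        report[name] = (installed, installed == wanted)
    return report

if __name__ == "__main__":
    for name, (installed, ok) in check_versions(PINNED).items():
        print(f"{name}: installed={installed} matches_pin={ok}")
```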
Enter the OpenAI API Key
We enter the OpenAI key with the getpass() function so that the key is not accidentally exposed in the code.
from getpass import getpass
OPENAI_KEY = getpass('Enter Open AI API Key: ')
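Enter the Tavily Search API Key
We enter the Tavily Search key with the getpass() function so that the key is not accidentally exposed in the code. You can get your key from Tavily, which also offers a free tier.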
TAVILY_API_KEY = getpass('Enter Tavily Search API Key: ')
Set Environment Variables
Next, we set a few environment variables that will later be used to authenticate the LLM and Tavily Search.
import os
os.environ['OPENAI_API_KEY'] = OPENAI_KEY
os.environ['TAVILY_API_KEY'] = TAVILY_API_KEY
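If you rerun the notebook often, one possible variation is to prompt only when a key is not already present in the environment. This is a sketch of that idea rather than something the original workflow requires; load_key is a hypothetical helper name.

```python
import os
from getpass import getpass

def load_key(env_name: str, prompt: str) -> str:
    """Return the key from the environment, prompting only when it is missing."""
    value = os.environ.get(env_name)
    if not value:
        # getpass hides the typed value, so the key never appears in the notebook output.
        value = getpass(prompt)
        os.environ[env_name] = value
    return value

# Example usage (commented out so that importing this snippet never blocks on input):
# load_key("OPENAI_API_KEY", "Enter Open AI API Key: ")
# load_key("TAVILY_API_KEY", "Enter Tavily Search API Key: ")
```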
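Define the Agent State Schema
We use LangGraph to build our agentic system as a graph of nodes, where each node is one specific execution step in the overall workflow. Each set of operations (node) has its own schema, defined below. You can customize these further to suit your own style of report generation.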
from typing_extensions import TypedDict
from pydantic import BaseModel, Field
import operator
from typing import Annotated, List, Optional, Literal

# defines structure for each section in the report
class Section(BaseModel):
    name: str = Field(
        description="Name for a particular section of the report.",
    )
    description: str = Field(
        description="Brief overview of the main topics and concepts to be covered in this section.",
    )
    research: bool = Field(
        description="Whether to perform web search for this section of the report."
    )
    content: str = Field(
        description="The content for this section."
    )

class Sections(BaseModel):
    sections: List[Section] = Field(
        description="All the Sections of the overall report.",
    )

# defines structure for queries generated for deep research
class SearchQuery(BaseModel):
    search_query: str = Field(None, description="Query for web search.")

class Queries(BaseModel):
    queries: List[SearchQuery] = Field(
        description="List of web search queries.",
    )

# consists of input topic and output report generated
class ReportStateInput(TypedDict):
    topic: str  # Report topic

class ReportStateOutput(TypedDict):
    final_report: str  # Final report

# overall agent state which will be passed and updated in nodes in the graph
class ReportState(TypedDict):
    topic: str  # Report topic
    sections: list[Section]  # List of report sections
    completed_sections: Annotated[list, operator.add]  # Sections written in parallel, merged via operator.add (Send() API)
    report_sections_from_research: str  # completed sections to write final sections
    final_report: str  # Final report

# defines the key structure for sections written using the agent
class SectionState(TypedDict):
    section: Section  # Report section
    search_queries: list[SearchQuery]  # List of search queries
    source_str: str  # String of formatted source content from web search
    report_sections_from_research: str  # completed sections to write final sections
    completed_sections: list[Section]  # Final key in outer state for Send() API

class SectionOutputState(TypedDict):
    completed_sections: list[Section]  # Final key in outer state for Send() API
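The `Annotated[list, operator.add]` annotation on completed_sections is what lets several parallel section writers add their results to the same key instead of overwriting each other. The sketch below imitates that merging rule in plain Python to show the idea; it is not LangGraph's internal code, and DemoState and apply_update are hypothetical names introduced only for this illustration.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints

class DemoState(TypedDict):
    topic: str
    completed_sections: Annotated[list, operator.add]

def apply_update(state: dict, update: dict) -> dict:
    """Merge a node's partial update into the state, using the reducer if one is annotated."""
    hints = get_type_hints(DemoState, include_extras=True)
    new_state = dict(state)
    for key, value in update.items():
        # Annotated[...] types carry their extra arguments in __metadata__.
        metadata = getattr(hints[key], "__metadata__", ())
        if metadata and key in new_state:
            reducer = metadata[0]  # here: operator.add, i.e. list concatenation
            new_state[key] = reducer(new_state[key], value)
        else:
            new_state[key] = value  # no reducer: the new value replaces the old one
    return new_state

state = {"topic": "LangGraph", "completed_sections": []}
state = apply_update(state, {"completed_sections": ["Section A"]})
state = apply_update(state, {"completed_sections": ["Section B"]})
state = apply_update(state, {"topic": "LangGraph agents"})
```

Keys without a reducer, such as topic, are simply replaced, while completed_sections keeps accumulating.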
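Utility Functions
We define a couple of utility functions that help us run web search queries in parallel and format the results fetched from the web.
1. run_search_queries(…)
This function asynchronously runs Tavily searches for a given list of queries and returns the search results. Because it is asynchronous, it is non-blocking and the searches can run in parallel.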
from langchain_community.utilities.tavily_search import TavilySearchAPIWrapper
import asyncio
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Union

# just to handle objects created from LLM responses
@dataclass
class SearchQuery:
    search_query: str

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)

tavily_search = TavilySearchAPIWrapper()

async def run_search_queries(
    search_queries: List[Union[str, SearchQuery]],
    num_results: int = 5,
    include_raw_content: bool = False
) -> List[Dict]:
    search_tasks = []
    for query in search_queries:
        # Handle both plain strings and query objects, just in case the LLM
        # fails to generate queries as:
        # class SearchQuery(BaseModel):
        #     search_query: str
        # Checking for the attribute (rather than isinstance) also accepts the
        # Pydantic SearchQuery model defined earlier, which the dataclass above shadows.
        query_str = query.search_query if hasattr(query, "search_query") else str(query)  # text query
        try:
            # get results from tavily async (in parallel) for each search query
            search_tasks.append(
                tavily_search.raw_results_async(
                    query=query_str,
                    max_results=num_results,
                    search_depth='advanced',
                    include_answer=False,
                    include_raw_content=include_raw_content
                )
            )
        except Exception as e:
            print(f"Error creating search task for query '{query_str}': {e}")
            continue
    # Execute all searches concurrently and await results
    try:
        if not search_tasks:
            return []
        search_docs = await asyncio.gather(*search_tasks, return_exceptions=True)
        # Filter out any exceptions from the results
        valid_results = [
            doc for doc in search_docs
            if not isinstance(doc, Exception)
        ]
        return valid_results
    except Exception as e:
        print(f"Error during search queries: {e}")
        return []
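The concurrency pattern used by run_search_queries can be tried out without any API key. The sketch below replaces the Tavily call with a fake coroutine (fake_search and run_all are hypothetical names) to show how asyncio.gather with return_exceptions=True keeps the successful results, in order, while one query fails.

```python
import asyncio

async def fake_search(query: str) -> dict:
    """Stand-in for a real search call; fails on an empty query."""
    await asyncio.sleep(0.01)  # simulates network latency
    if not query:
        raise ValueError("empty query")
    return {"query": query, "results": [f"result for {query}"]}

async def run_all(queries):
    # All coroutines run concurrently; exceptions are returned as values
    # instead of cancelling the whole batch.
    outcomes = await asyncio.gather(
        *(fake_search(q) for q in queries),
        return_exceptions=True,
    )
    # Keep only the successful results, preserving the input order.
    return [o for o in outcomes if not isinstance(o, Exception)]

if __name__ == "__main__":
    print(asyncio.run(run_all(["langgraph", "", "tavily"])))
```

In a notebook you would use `await run_all([...])` directly, since an event loop is already running there.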
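2. format_search_query_results(…)
This function extracts the context from the Tavily search results, makes sure content from the same URL is not duplicated, and formats each result to show the source, the URL, and the relevant content (and optionally the raw content, which can be truncated to a given number of tokens).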
import tiktoken
from typing import List, Dict, Union, Any

def format_search_query_results(
    search_response: Union[Dict[str, Any], List[Any]],
    max_tokens: int = 2000,
    include_raw_content: bool = False
) -> str:
    encoding = tiktoken.encoding_for_model("gpt-4")
    sources_list = []
    # Handle different response formats if search results is a dict
    if isinstance(search_response, dict):
        if 'results' in search_response:
            sources_list.extend(search_response['results'])
        else:
            sources_list.append(search_response)
    # if search results is a list
    elif isinstance(search_response, list):
        for response in search_response:
            if isinstance(response, dict):
                if 'results' in response:
                    sources_list.extend(response['results'])
                else:
                    sources_list.append(response)
            elif isinstance(response, list):
                sources_list.extend(response)
    if not sources_list:
        return "No search results found."
    # Deduplicate by URL and keep unique sources (website urls)
    unique_sources = {}
    for source in sources_list:
        if isinstance(source, dict) and 'url' in source:
            if source['url'] not in unique_sources:
                unique_sources[source['url']] = source
    # Format output
    formatted_text = "Content from web search:\n\n"
    for i, source in enumerate(unique_sources.values(), 1):
        formatted_text += f"Source {source.get('title', 'Untitled')}:\n===\n"
        formatted_text += f"URL: {source['url']}\n===\n"
        formatted_text += f"Most relevant content from source: {source.get('content', 'No content available')}\n===\n"
        if include_raw_content:
            # truncate raw webpage content to a certain number of tokens to prevent exceeding LLM max token window
            raw_content = source.get("raw_content", "")
            if raw_content:
                tokens = encoding.encode(raw_content)
                truncated_tokens = tokens[:max_tokens]
                truncated_content = encoding.decode(truncated_tokens)
                formatted_text += f"Raw Content: {truncated_content}\n\n"
    return formatted_text.strip()
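The two core ideas in this formatter, deduplicating by URL and capping the raw content, can be seen in isolation in the sketch below. To stay dependency-free it truncates by whitespace-separated words rather than by tiktoken tokens, which is an assumption made only for this illustration; dedupe_by_url and truncate_words are hypothetical names.

```python
def dedupe_by_url(sources):
    """Keep the first source seen for each URL, preserving order."""
    unique = {}
    for source in sources:
        url = source.get("url")
        if url and url not in unique:
            unique[url] = source
    return list(unique.values())

def truncate_words(text, max_words):
    """Word-based stand-in for the token-based truncation used above."""
    return " ".join(text.split()[:max_words])

sample_sources = [
    {"url": "https://example.com/a", "title": "A", "raw_content": "one two three four five"},
    {"url": "https://example.com/a", "title": "A (duplicate)", "raw_content": "ignored"},
    {"url": "https://example.com/b", "title": "B", "raw_content": "six seven"},
]

unique_sources = dedupe_by_url(sample_sources)
shortened = [truncate_words(s["raw_content"], 3) for s in unique_sources]
```

Words and tokens are not the same unit, so a real token budget should still be enforced with the tokenizer of the model you are calling.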
We can check that these functions work as expected as follows:
docs = await run_search_queries(['langgraph'], include_raw_content=True)
output = format_search_query_results(docs, max_tokens=500,
                                     include_raw_content=True)
print(output)
Output
Content from web search:

Source Introduction - GitHub Pages:
===
URL: https://langchain-ai.github.io/langgraphjs/
===
Most relevant content from source: Overview¶. LangGraph is a library for building stateful, multi-actor applications with LLMs, used to create agent and multi-agent workflows......
===
Raw Content: 🦜🕸️LangGraph.js¶⚡ Building language agents as graphs ⚡Looking for the Python version? Click here (docs).Overview......

Source LangGraph - GitHub Pages:
===
URL: https://langchain-ai.github.io/langgraph/
===
Most relevant content from source: Overview¶. LangGraph is a library for building stateful, multi-actor applications with LLMs, ......
===
Raw Content: 🦜🕸️LangGraph¶⚡ Building language agents as graphs ⚡Note: Looking for the JS version? See the JS repo and the JS docs.Overview¶LangGraph is a library for building stateful, multi-actor applications with LLMs, ......
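Create a Default Report Template
This is the starting point from which the LLM learns how to build a general report. It uses the template as a guide to create a custom report structure based on the topic. Keep in mind that this is not the final report structure; it is closer to a prompt that guides the agent.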
# Structure Guideline
DEFAULT_REPORT_STRUCTURE = """The report structure should focus on breaking-down the user-provided topic
and building a comprehensive report in markdown using the following format:
1. Introduction (no web search needed)
- Brief overview of the topic area
2. Main Body Sections:
- Each section should focus on a sub-topic of the user-provided topic
- Include any key concepts and definitions
- Provide real-world examples or case studies where applicable
3. Conclusion (no web search needed)
- Aim for 1 structural element (either a list or table) that distills the main body sections
- Provide a concise summary of the report
When generating the final response in markdown, if there are special characters in the text,
such as the dollar symbol, ensure they are escaped properly for correct rendering e.g $25.5 should become \\$25.5
"""
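Instruction Prompts for the Report Planner
There are two main instruction prompts:
1. REPORT_PLAN_QUERY_GENERATOR_PROMPT
This prompt helps the LLM generate an initial list of questions based on the topic, so that more information about the topic can be fetched from the web to plan the overall sections and structure of the report.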
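The prompts in this section are ordinary Python strings with named placeholders that are filled with str.format before being sent to the LLM. Here is a minimal sketch of that mechanism using a shortened, made-up template (MINI_TEMPLATE and build_prompt are illustrative names, not the real prompt).

```python
MINI_TEMPLATE = """The report will be focused on the following topic:
{topic}
The report structure will follow these guidelines:
{report_organization}
Generate {number_of_queries} search queries."""

def build_prompt(topic: str, report_organization: str, number_of_queries: int) -> str:
    """Fill every named placeholder; a missing name raises KeyError."""
    return MINI_TEMPLATE.format(
        topic=topic,
        report_organization=report_organization,
        number_of_queries=number_of_queries,
    )

prompt = build_prompt(
    topic="LangGraph",
    # Braces inside a substituted value are left untouched by str.format.
    report_organization="1. Introduction {keep these braces}",
    number_of_queries=3,
)
```

Because substituted values are not scanned again, the report template can safely contain braces or other special characters.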
REPORT_PLAN_QUERY_GENERATOR_PROMPT = """You are an expert technical report writer, helping to plan a report.
The report will be focused on the following topic:
{topic}
The report structure will follow these guidelines:
{report_organization}
Your goal is to generate {number_of_queries} search queries that will help gather comprehensive information for planning the report sections.
The query should:
1. Be related to the topic
2. Help satisfy the requirements specified in the report organization
Make the query specific enough to find high-quality, relevant sources while covering the depth and breadth needed for the report structure.
"""
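2. REPORT_PLAN_SECTION_GENERATOR_PROMPT
Here we give the LLM the default report template, the topic name, and the search results from the initial queries so that it can create a detailed report structure. The LLM generates a structured response with the following fields for each major section of the report (this is only the report structure; no content is created at this step):
- Name - the name of this section of the report.
- Description - a brief overview of the main topics and concepts to be covered in this section.
- Research - whether to perform a web search for this section of the report.
- Content - the content of the section, which is left blank for now.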
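To make the shape of that structured response concrete, here is a small sketch that parses a hand-written example plan and picks out the sections that need web research. It uses a plain dataclass as a stand-in for the Pydantic Section model defined earlier; PlannedSection, parse_plan, names_needing_research, and the example JSON are all made up for illustration and are not real model output.

```python
import json
from dataclasses import dataclass
from typing import List

@dataclass
class PlannedSection:
    name: str
    description: str
    research: bool
    content: str = ""  # left blank at the planning stage

def parse_plan(raw_json: str) -> List[PlannedSection]:
    """Turn a JSON plan with a 'sections' list into PlannedSection objects."""
    return [PlannedSection(**item) for item in json.loads(raw_json)["sections"]]

def names_needing_research(sections: List[PlannedSection]) -> List[str]:
    return [section.name for section in sections if section.research]

EXAMPLE_PLAN = """
{"sections": [
  {"name": "Introduction", "description": "Brief overview of the topic.", "research": false, "content": ""},
  {"name": "Core Concepts", "description": "Key concepts and definitions.", "research": true, "content": ""},
  {"name": "Conclusion", "description": "Summary of the report.", "research": false, "content": ""}
]}
"""

planned = parse_plan(EXAMPLE_PLAN)
to_research = names_needing_research(planned)
```

Only the sections flagged for research go through the web search step; the introduction and conclusion are written later from the completed sections.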
REPORT_PLAN_SECTION_GENERATOR_PROMPT = """You are an expert technical report writer, helping to plan a report.
Your goal is to generate the outline of the sections of the report.
The overall topic of the report is:
{topic}
The report should follow this organizational structure:
{report_organization}
You should reflect on this additional context information from web searches to plan the main sections of the report:
{search_context}
Now, generate the sections of the report. Each section should have the following fields:
- Name - Name for this section of the report.
- Description - Brief overview of the main topics and concepts to be covered in this section.
- Research - Whether to perform web search for this section of the report or not.
- Content - The content of the section, which you will leave blank for now.
Consider which sections require web search.
For example, introduction and conclusion will not require research because they will distill information from other parts of the report.
"""
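Report Planner Node Function
We now build the logic of the report planner node. Its goal is to create a structured, custom report template, with the names and descriptions of the main sections, based on the user's input topic and the default report template guidelines.

This function uses the two prompts created earlier to:
- First, generate a few queries based on the user's topic
- Search the web and get some information for these queries
- Use this information to generate the overall structure of the report, along with the key sections that need to be created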
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage
llm = ChatOpenAI(model_name="gpt-4o", temperature=0)
async def generate_report_plan(state: ReportState):
"""Generate the overall plan for building the report"""
print('--- Generating Report Plan ---')
report_structure = DEFAULT_REPORT_STRUCTURE
structured_llm = llm.with_structured_output(Queries)
system_instructions_query = REPORT_PLAN_QUERY_GENERATOR_PROMPT.format(
report_organization=report_structure,
number_of_queries=number_of_queries
results = structured_llm.invoke([
SystemMessage(content=system_instructions_query),
HumanMessage(content='Generate search queries that will help with planning the sections of the report.')
# Convert SearchQuery objects to strings
query.search_query if isinstance(query, SearchQuery) else str(query)
for query in results.queries
# Search web and ensure we wait for results
search_docs = await run_search_queries(
include_raw_content=False
print("Warning: No search results returned")
search_context = "No search results available."
search_context = format_search_query_results(
include_raw_content=False
system_instructions_sections = REPORT_PLAN_SECTION_GENERATOR_PROMPT.format(
report_organization=report_structure,
search_context=search_context
structured_llm = llm.with_structured_output(Sections)
report_sections = structured_llm.invoke([
SystemMessage(content=system_instructions_sections),
HumanMessage(content="Generate the sections of the report. Your response must include a 'sections' field containing a list of sections. Each section must have: name, description, plan, research, and content fields.")
print('--- Generating Report Plan Completed ---')
return {"sections": report_sections.sections}
print(f"Error in generate_report_plan: {e}")
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage

llm = ChatOpenAI(model_name="gpt-4o", temperature=0)

async def generate_report_plan(state: ReportState):
    """Generate the overall plan for building the report"""
    topic = state["topic"]
    print('--- Generating Report Plan ---')
    report_structure = DEFAULT_REPORT_STRUCTURE
    number_of_queries = 8
    structured_llm = llm.with_structured_output(Queries)
    system_instructions_query = REPORT_PLAN_QUERY_GENERATOR_PROMPT.format(
        topic=topic,
        report_organization=report_structure,
        number_of_queries=number_of_queries
    )
    try:
        # Generate queries
        results = structured_llm.invoke([
            SystemMessage(content=system_instructions_query),
            HumanMessage(content='Generate search queries that will help with planning the sections of the report.')
        ])
        # Convert SearchQuery objects to strings
        query_list = [
            query.search_query if isinstance(query, SearchQuery) else str(query)
            for query in results.queries
        ]
        # Search web and ensure we wait for results
        search_docs = await run_search_queries(
            query_list,
            num_results=5,
            include_raw_content=False
        )
        if not search_docs:
            print("Warning: No search results returned")
            search_context = "No search results available."
        else:
            search_context = format_search_query_results(
                search_docs,
                include_raw_content=False
            )
        # Generate sections
        system_instructions_sections = REPORT_PLAN_SECTION_GENERATOR_PROMPT.format(
            topic=topic,
            report_organization=report_structure,
            search_context=search_context
        )
        structured_llm = llm.with_structured_output(Sections)
        report_sections = structured_llm.invoke([
            SystemMessage(content=system_instructions_sections),
            HumanMessage(content="Generate the sections of the report. Your response must include a 'sections' field containing a list of sections. Each section must have: name, description, plan, research, and content fields.")
        ])
        print('--- Generating Report Plan Completed ---')
        return {"sections": report_sections.sections}
    except Exception as e:
        print(f"Error in generate_report_plan: {e}")
        return {"sections": []}
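The planning node accepts either SearchQuery objects or plain strings, and it falls back to a placeholder context when the search returns nothing. Here is a minimal sketch of just that logic, with the LLM and the web search left out. The `SearchQuery` dataclass is a stand-in for the model defined earlier in the article, and `normalize_queries` and `build_search_context` are hypothetical helper names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class SearchQuery:
    """Stand-in for the article's SearchQuery model (assumed to have one field)."""
    search_query: str


def normalize_queries(queries):
    """Return plain strings whether the items are SearchQuery objects or not."""
    return [
        q.search_query if isinstance(q, SearchQuery) else str(q)
        for q in queries
    ]


def build_search_context(search_docs, formatter):
    """Use a placeholder context when the search produced nothing."""
    if not search_docs:
        return "No search results available."
    return formatter(search_docs)


mixed = [SearchQuery("langgraph planning agent"), "tavily search api"]
print(normalize_queries(mixed))
print(build_search_context([], formatter=str))
```

Keeping the fallback string means the section-planning prompt still gets a value for its search context even when the web search fails.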
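Section Builder – Instruction Prompt for the Query Generator
There is one main instruction prompt:
1. REPORT_SECTION_QUERY_GENERATOR_PROMPT
It helps the LLM generate a comprehensive list of questions on the topic of the specific section that needs to be built.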
REPORT_SECTION_QUERY_GENERATOR_PROMPT = """Your goal is to generate targeted web search queries that will gather comprehensive information for writing a technical report section.
Topic for this section:
{section_topic}
When generating {number_of_queries} search queries, ensure that they:
1. Cover different aspects of the topic (e.g., core features, real-world applications, technical architecture)
2. Include specific technical terms related to the topic
3. Target recent information by including year markers where relevant (e.g., "2024")
4. Look for comparisons or differentiators from similar technologies/approaches
5. Search for both official documentation and practical implementation examples
Your queries should be:
- Specific enough to avoid generic results
- Technical enough to capture detailed implementation information
- Diverse enough to cover all aspects of the section plan
- Focused on authoritative sources (documentation, technical blogs, academic papers)"""
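The prompt has two placeholders, `{section_topic}` and `{number_of_queries}`, which are filled with `str.format` before the text is sent to the LLM. A small sketch of that step, using a shortened stand-in template and a made-up topic rather than the full prompt:

```python
# Shortened stand-in for REPORT_SECTION_QUERY_GENERATOR_PROMPT; only the
# two placeholders matter for this illustration.
TEMPLATE = """Topic for this section:
{section_topic}
When generating {number_of_queries} search queries, ensure that they:
1. Cover different aspects of the topic"""

filled = TEMPLATE.format(
    section_topic="How LangGraph keeps state across nodes",  # made-up topic
    number_of_queries=5,
)
print(filled)
```

If either named placeholder is left out of the call, `str.format` raises a `KeyError`, which is a quick way to catch a prompt and a node function drifting out of sync.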
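Node Function for the Section Builder – Generate Queries (Query Generator)
This function uses the section topic and the instruction prompt above to generate a few questions for finding useful information about the section topic on the web.

Query generator node function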
def generate_queries(state: SectionState):
    """ Generate search queries for a specific report section """
    # Get state
    section = state["section"]
    print('--- Generating Search Queries for Section: '+ section.name +' ---')
    # Get configuration
    number_of_queries = 5
    # Set up the LLM to return structured output
    structured_llm = llm.with_structured_output(Queries)
    # Format system instructions
    system_instructions = REPORT_SECTION_QUERY_GENERATOR_PROMPT.format(section_topic=section.description, number_of_queries=number_of_queries)
    # Generate queries
    user_instruction = "Generate search queries on the provided topic."
    search_queries = structured_llm.invoke([SystemMessage(content=system_instructions),
                                            HumanMessage(content=user_instruction)])
    print('--- Generating Search Queries for Section: '+ section.name +' Completed ---')
    return {"search_queries": search_queries.queries}
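Node Function for the Section Builder – Search the Web
It takes the queries generated by generate_queries(…) for a specific section, searches the web with the utility functions we defined earlier, and formats the search results.

Web researcher node function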
async def search_web(state: SectionState):
    """ Search the web for each query, then return a formatted string of sources."""
    # Get state
    search_queries = state["search_queries"]
    print('--- Searching Web for Queries ---')
    # Web search: pass the plain query strings, as generate_report_plan does
    query_list = [query.search_query for query in search_queries]
    search_docs = await run_search_queries(query_list, num_results=6, include_raw_content=True)
    # Deduplicate and format sources
    search_context = format_search_query_results(search_docs, max_tokens=4000, include_raw_content=True)
    print('--- Searching Web for Queries Completed ---')
    return {"source_str": search_context}
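The search step follows a common pattern: run all queries concurrently, wait for every result, then deduplicate the sources by URL. Here is a self-contained sketch of that pattern. `fake_search` is a hypothetical stand-in for the Tavily-backed helper and makes no network calls, and the real `run_search_queries` and `format_search_query_results` defined earlier may work differently.

```python
import asyncio


async def fake_search(query: str) -> dict:
    """Hypothetical stand-in for one web search call (no network access)."""
    await asyncio.sleep(0)  # yield control, as a real I/O call would
    return {
        "query": query,
        "results": [
            {"url": "https://example.com/shared", "title": "Shared page"},
            {"url": f"https://example.com/{query}", "title": f"About {query}"},
        ],
    }


async def search_all(queries: list[str]) -> list[dict]:
    """Run every query concurrently and wait for all of them."""
    return await asyncio.gather(*(fake_search(q) for q in queries))


def deduplicate_by_url(search_docs: list[dict]) -> list[dict]:
    """Keep the first result seen for each URL, preserving order."""
    unique = {}
    for doc in search_docs:
        for result in doc["results"]:
            unique.setdefault(result["url"], result)
    return list(unique.values())


docs = asyncio.run(search_all(["alpha", "beta"]))
sources = deduplicate_by_url(docs)
print([s["url"] for s in sources])
```

Deduplicating before formatting matters because several queries about the same section often return the same page, and repeated sources would use up the token budget passed to the formatter.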
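Section Builder – Instruction Prompt for Section Writing
There is one main instruction prompt:
1. SECTION_WRITER_PROMPT
It constrains the LLM to generate and write the content of a specific section following specific guidelines on style, structure, length, and approach, and it also passes along the documents fetched from the web by the search_web(…) function.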
SECTION_WRITER_PROMPT = """You are an expert technical writer crafting one specific section of a technical report.
Title for the section:
{section_title}
Topic for this section:
{section_topic}
Guidelines for writing:
1. Technical Accuracy:
- Include specific version numbers
- Reference concrete metrics/benchmarks
- Cite official documentation
- Use technical terminology precisely
2. Length and Style:
- Strict 150-200 word limit
- No marketing language
- Technical focus
- Write in simple, clear language do not use complex words unnecessarily
- Start with your most important insight in **bold**
- Use short paragraphs (2-3 sentences max)
3. Structure:
- Use ## for section title (Markdown format)
- Only use ONE structural element IF it helps clarify your point:
* Either a focused table comparing 2-3 key items (using Markdown table syntax)
* Or a short list (3-5 items) using proper Markdown list syntax:
- Use `*` or `-` for unordered lists
- Use `1.` for ordered lists
- Ensure proper indentation and spacing
- End with ### Sources that references the below source material formatted as:
* List each source with title, date, and URL
* Format: `- Title : URL`
4. Writing Approach:
- Include at least one specific example or case study if available
- Use concrete details over general statements
- Make every word count
- No preamble prior to creating the section content
- Focus on your single most important point
5. Use this source material obtained from web searches to help write the section:
{context}
6. Quality Checks:
- Format should be Markdown
- Exactly 150-200 words (excluding title and sources)
- Careful use of only ONE structural element (table or bullet list) and only if it helps clarify your point
- One specific example / case study if available
- Starts with bold insight
- No preamble prior to creating the section content
- Sources cited at end
- If there are special characters in the text, such as the dollar symbol,
ensure they are escaped properly for correct rendering e.g $25.5 should become \$25.5
"""
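Node Function for the Section Builder – Write Section (Section Writer)
It uses the SECTION_WRITER_PROMPT above, fills in the section name, description, and web search documents, and passes the result to the LLM, which writes the content for that section.

Section writer node function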
def write_section(state: SectionState):
    """ Write a section of the report """
    # Get state
    section = state["section"]
    source_str = state["source_str"]
    print('--- Writing Section : '+ section.name +' ---')
    # Format system instructions
    system_instructions = SECTION_WRITER_PROMPT.format(section_title=section.name, section_topic=section.description, context=source_str)
    # Generate section
    user_instruction = "Generate a report section based on the provided sources."
    section_content = llm.invoke([SystemMessage(content=system_instructions),
                                  HumanMessage(content=user_instruction)])
    # Write content to the section object
    section.content = section_content.content
    print('--- Writing Section : '+ section.name +' Completed ---')
    # Write the updated section to completed sections
    return {"completed_sections": [section]}
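SECTION_WRITER_PROMPT asks for 150-200 words (excluding the title and sources) and for escaped dollar signs, but an LLM may not always comply. A minimal sketch of a post-check that could be run on the generated text follows. `escape_dollars` and `body_word_count` are hypothetical helpers, not part of the original workflow.

```python
import re


def escape_dollars(text: str) -> str:
    """Escape dollar signs that are not already preceded by a backslash."""
    return re.sub(r"(?<!\\)\$", r"\\$", text)


def body_word_count(section_md: str) -> int:
    """Count words, skipping Markdown headings and everything after '### Sources'."""
    body = section_md.split("### Sources")[0]
    lines = [ln for ln in body.splitlines() if not ln.startswith("#")]
    return len(" ".join(lines).split())


sample = (
    "## Pricing\n"
    "**The plan costs $25.5 per month.**\n"
    "### Sources\n"
    "- Example : https://example.com"
)
print(escape_dollars(sample))
print(body_word_count(sample))
```

A check like this could be applied to `section_content.content` before it is stored on the section, for example to log a warning when the word count falls outside the requested range.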
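Create the Section Builder Sub-Agent
This agent (or, more precisely, sub-agent) will be called several times in parallel, once per section, to search the web, fetch content, and then write that specific section. We use LangGraph's Send construct to do this.

Section builder sub-agent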
from langgraph.graph import StateGraph, START, END
# Add nodes and edges
section_builder = StateGraph(SectionState, output=SectionOutputState)
section_builder.add_node("generate_queries", generate_queries)
section_builder.add_node("search_web", search_web)
section_builder.add_node("write_section", write_section)
section_builder.add_edge(START, "generate_queries")
section_builder.add_edge("generate_queries", "search_web")
section_builder.add_edge("search_web", "write_section")
section_builder.add_edge("write_section", END)
section_builder_subagent = section_builder.compile()
# Display the graph
from IPython.display import display, Image
Image(section_builder_subagent.get_graph().draw_mermaid_png())
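To see the data flow of this sub-agent without calling an LLM, here is a plain-Python sketch of the same three-step chain. Each node receives the current state and returns a partial update, which is merged into the state before the next node runs. This only illustrates the idea and is not how LangGraph is implemented; the `fake_*` node bodies are dummies.

```python
def fake_generate_queries(state: dict) -> dict:
    """Dummy node: derive one query from the section name."""
    return {"search_queries": [f"{state['section']} overview"]}


def fake_search_web(state: dict) -> dict:
    """Dummy node: pretend each query returned one source."""
    sources = "; ".join(f"source for '{q}'" for q in state["search_queries"])
    return {"source_str": sources}


def fake_write_section(state: dict) -> dict:
    """Dummy node: 'write' the section from the gathered sources."""
    text = f"{state['section']}: based on {state['source_str']}"
    return {"completed_sections": [text]}


def run_chain(state: dict, nodes) -> dict:
    """Run nodes in order, merging each partial update into the state."""
    for node in nodes:
        state = {**state, **node(state)}
    return state


final_state = run_chain(
    {"section": "LangGraph basics"},
    [fake_generate_queries, fake_search_web, fake_write_section],
)
print(final_state["completed_sections"])
```

Each node only returns the keys it changes, which is the same contract the real node functions follow when they return dictionaries such as `{"source_str": ...}`.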
Output

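Create the Dynamic Parallelization Node Function – Parallelize Section Writing
Send(…) is used to parallelize the work: it calls section_builder_subagent once per section so that the content is written in parallel.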
from langgraph.constants import Send

def parallelize_section_writing(state: ReportState):
    """ This is the "map" step when we kick off web research for some sections of the report in parallel and then write the section"""
    # Kick off section writing in parallel via Send() API for any sections that require research
    return [
        Send("section_builder_with_web_search",  # name of the subagent node
             {"section": s})
        for s in state["sections"]
        if s.research
    ]
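Send(…) creates one payload per target run, and the results of all runs are merged back into the parent state. Here is a plain-Python sketch of that map-and-merge idea. It assumes the `completed_sections` key is combined by list concatenation; the actual reducer is declared with `ReportState`, which is defined outside this section. Threads stand in for LangGraph's scheduler, and `build_section` is a hypothetical placeholder for the sub-agent.

```python
import operator
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from functools import reduce


@dataclass
class Section:
    """Stand-in with only the fields this sketch needs."""
    name: str
    research: bool


def build_section(payload: dict) -> dict:
    """Hypothetical placeholder for one sub-agent run."""
    return {"completed_sections": [payload["section"].name]}


sections = [
    Section("Introduction", False),
    Section("Architecture", True),
    Section("Pricing", True),
]

# "Map": one payload per section that needs research.
payloads = [{"section": s} for s in sections if s.research]

# Run the payloads concurrently; map() returns results in input order.
with ThreadPoolExecutor() as pool:
    updates = list(pool.map(build_section, payloads))

# "Reduce": concatenate the per-branch lists into one.
completed = reduce(operator.add, (u["completed_sections"] for u in updates), [])
print(completed)
```

The introduction is skipped here because its `research` flag is false, which mirrors the `if s.research` filter in the node function above.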
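Create the Format Sections Node Function
This is essentially the part where all the sections are formatted and merged into one large document.

Format sections node function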
def format_sections(sections: list[Section]) -> str:
    """ Format a list of report sections into a single text string """
    formatted_str = ""
    for idx, section in enumerate(sections, 1):
        formatted_str += f"""
{'='*60}
Section {idx}: {section.name}
{'='*60}
Description:
{section.description}
Requires Research:
{section.research}
Content:
{section.content if section.content else '[Not yet written]'}
"""
    return formatted_str

def format_completed_sections(state: ReportState):
    """ Gather completed sections from research and format them as context for writing the final sections """
    print('--- Formatting Completed Sections ---')
    # List of completed sections
    completed_sections = state["completed_sections"]
    # Format completed section to str to use as context for final sections
    completed_report_sections = format_sections(completed_sections)
    print('--- Formatting Completed Sections is Done ---')
    return {"report_sections_from_research": completed_report_sections}
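Instruction Prompt for the Final Sections
There is one main instruction prompt:
1. FINAL_SECTION_WRITER_PROMPT
It asks the LLM to generate and write the content of the introduction or the conclusion following certain guidelines on style, structure, length, and approach, and it also passes along the content of the sections already written.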
FINAL_SECTION_WRITER_PROMPT = """You are an expert technical writer crafting a section that synthesizes information from the rest of the report.
Title for the section:
{section_title}
Topic for this section:
{section_topic}
Available report content of already completed sections:
{context}
1. Section-Specific Approach:
For Introduction:
- Use # for report title (Markdown format)
- 50-100 word limit
- Write in simple and clear language
- Focus on the core motivation for the report in 1-2 paragraphs
- Use a clear narrative arc to introduce the report
- Include NO structural elements (no lists or tables)
- No sources section needed
For Conclusion/Summary:
- Use ## for section title (Markdown format)
- 100-150 word limit
- For comparative reports:
* Must include a focused comparison table using Markdown table syntax
* Table should distill insights from the report
* Keep table entries clear and concise
- For non-comparative reports:
* Only use ONE structural element IF it helps distill the points made in the report:
* Either a focused table comparing items present in the report (using Markdown table syntax)
* Or a short list using proper Markdown list syntax:
- Use `*` or `-` for unordered lists
- Use `1.` for ordered lists
- Ensure proper indentation and spacing
- End with specific next steps or implications
- No sources section needed
2. Writing Approach:
- Use concrete details over general statements
- Make every word count
- Focus on your single most important point
3. Quality Checks:
- For introduction: 50-100 word limit, # for report title, no structural elements, no sources section
- For conclusion: 100-150 word limit, ## for section title, only ONE structural element at most, no sources section
- Markdown format
- Do not include word count or any preamble in your response
- If there are special characters in the text, such as the dollar symbol,
ensure they are escaped properly for correct rendering e.g $25.5 should become \$25.5"""
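Create the Write Final Sections Node Function
This function uses the FINAL_SECTION_WRITER_PROMPT instruction prompt above to write the introduction and the conclusion. It is executed in parallel using Send(…), as described below.

Final section writer node function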
def write_final_sections(state: SectionState):
    """ Write the final sections of the report, which do not require web search and use the completed sections as context"""
    # Get state
    section = state["section"]
    completed_report_sections = state["report_sections_from_research"]
    print('--- Writing Final Section: '+ section.name + ' ---')
    # Format system instructions
    system_instructions = FINAL_SECTION_WRITER_PROMPT.format(section_title=section.name,
                                                             section_topic=section.description,
                                                             context=completed_report_sections)
    # Generate section
    user_instruction = "Craft a report section based on the provided sources."
    section_content = llm.invoke([SystemMessage(content=system_instructions),
                                  HumanMessage(content=user_instruction)])
    # Write content to section
    section.content = section_content.content
    print('--- Writing Final Section: '+ section.name + ' Completed ---')
    # Write the updated section to completed sections
    return {"completed_sections": [section]}
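Note that the node returns a one-element list under `completed_sections` rather than a bare section. Assuming `ReportState` declares that key with an additive reducer such as `Annotated[list, operator.add]` (the state definition sits outside this section, so treat this as an assumption), the lists coming back from parallel branches are concatenated instead of overwriting one another. The following is a minimal pure-Python sketch of that merge; `Section` and `merge_updates` are hypothetical stand-ins for illustration, not LangGraph's actual implementation.

```python
import operator
from dataclasses import dataclass
from functools import reduce


@dataclass
class Section:
    """Hypothetical stand-in for the article's Section model."""
    name: str
    content: str = ""


def merge_updates(existing, updates, key="completed_sections"):
    """Concatenate the list each branch returned under `key` onto `existing`.

    Mimics what an additive reducer (operator.add) would do with the
    partial state updates returned by parallel node invocations.
    """
    return reduce(operator.add, (u[key] for u in updates), list(existing))


# Two parallel branches, each returning a single-element list
intro_update = {"completed_sections": [Section("Introduction", "# Report ...")]}
outro_update = {"completed_sections": [Section("Conclusion", "## Conclusion ...")]}

merged = merge_updates([Section("Researched Section", "...")],
                       [intro_update, outro_update])
print([s.name for s in merged])
```

If the node returned the section without wrapping it in a list, list concatenation of this kind would fail, which is one reason to keep the wrapper.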
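Create the Dynamic Parallelization Node Function – Parallelize Final Section Writing
Send(…) is used for parallelization: it invokes write_final_sections once for the introduction and once for the conclusion, so both are written in parallel.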
from langgraph.constants import Send

def parallelize_final_section_writing(state: ReportState):
    """ Write any final sections using the Send API to parallelize the process """
    # Kick off section writing in parallel via Send() API for any sections that do not require research
    return [
        Send("write_final_sections",
             {"section": s, "report_sections_from_research": state["report_sections_from_research"]})
        for s in state["sections"]
        if not s.research
    ]
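To see which sections this routing function actually dispatches, here is a small sketch that runs the same list comprehension without LangGraph installed. `FakeSend` is a hypothetical stand-in that only mirrors the two fields of the real `Send` object (target node and payload), and the `Section` dataclass and sample state are made up for illustration.

```python
from dataclasses import dataclass
from typing import Any, NamedTuple


class FakeSend(NamedTuple):
    """Hypothetical stand-in for langgraph's Send: target node + payload."""
    node: str
    arg: Any


@dataclass
class Section:
    """Hypothetical stand-in for the article's Section model."""
    name: str
    research: bool  # True if the section needs web research


def route_final_sections(state):
    """Emit one FakeSend per section that does NOT need web research."""
    return [
        FakeSend("write_final_sections",
                 {"section": s,
                  "report_sections_from_research": state["report_sections_from_research"]})
        for s in state["sections"]
        if not s.research
    ]


state = {
    "sections": [
        Section("Introduction", research=False),
        Section("Researched Section", research=True),
        Section("Conclusion", research=False),
    ],
    "report_sections_from_research": "formatted text of researched sections",
}

sends = route_final_sections(state)
print([s.arg["section"].name for s in sends])
```

Only the sections flagged `research=False` are routed, and each one receives the same formatted context from the researched sections.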
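Compile Final Report Node Function
This function combines all sections of the report and compiles them into the final report document.

Compile final report node function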
def compile_final_report(state: ReportState):
    """ Compile the final report """
    # Get sections
    sections = state["sections"]
    completed_sections = {s.name: s.content for s in state["completed_sections"]}
    print('--- Compiling Final Report ---')
    # Update sections with completed content while maintaining original order
    for section in sections:
        section.content = completed_sections[section.name]
    # Compile final report
    all_sections = "\n\n".join([s.content for s in sections])
    # Escape unescaped $ symbols to display properly in Markdown
    formatted_sections = all_sections.replace("\\$", "TEMP_PLACEHOLDER")  # Temporarily mark already escaped $
    formatted_sections = formatted_sections.replace("$", "\\$")  # Escape all $
    formatted_sections = formatted_sections.replace("TEMP_PLACEHOLDER", "\\$")  # Restore originally escaped $
    # formatted_sections now contains the properly escaped Markdown text
    print('--- Compiling Final Report Done ---')
    return {"final_report": formatted_sections}
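The three-step placeholder swap works, but it would misbehave if the report text ever contained the literal string TEMP_PLACEHOLDER. As one possible alternative (not what the article uses), a single regular expression with a negative lookbehind can escape only those dollar signs that are not already preceded by a backslash. The helper name `escape_dollars` is hypothetical.

```python
import re

# A "$" that is not immediately preceded by a backslash
_UNESCAPED_DOLLAR = re.compile(r"(?<!\\)\$")


def escape_dollars(text: str) -> str:
    """Escape bare $ signs for Markdown, leaving already-escaped ones alone."""
    # A function replacement avoids backslash handling in the template string
    return _UNESCAPED_DOLLAR.sub(lambda match: "\\$", text)


sample = "Plan A costs $25.5, plan B costs \\$200."
print(escape_dollars(sample))
```

The function is idempotent, so running it twice leaves the text unchanged. It does not handle a backslash that is itself escaped directly before a dollar sign, which is unlikely in generated report text.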
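Build Our Deep Research and Report Writing Agent
We now bring together all the components and sub-agents defined so far to build our main planning agent.

Deep research and report writing agent workflow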
builder = StateGraph(ReportState, input=ReportStateInput, output=ReportStateOutput)
builder.add_node("generate_report_plan", generate_report_plan)
builder.add_node("section_builder_with_web_search", section_builder_subagent)
builder.add_node("format_completed_sections", format_completed_sections)
builder.add_node("write_final_sections", write_final_sections)
builder.add_node("compile_final_report", compile_final_report)
builder.add_edge(START, "generate_report_plan")
builder.add_conditional_edges("generate_report_plan",
                              parallelize_section_writing,
                              ["section_builder_with_web_search"])
builder.add_edge("section_builder_with_web_search", "format_completed_sections")
builder.add_conditional_edges("format_completed_sections",
                              parallelize_final_section_writing,
                              ["write_final_sections"])
builder.add_edge("write_final_sections", "compile_final_report")
builder.add_edge("compile_final_report", END)
reporter_agent = builder.compile()
# view agent structure
display(Image(reporter_agent.get_graph(xray=True).draw_mermaid_png()))
Output

Now we can run and test our agent system!
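As a quick sanity check of the wiring, the edges added to the builder can be written down as a plain dependency map and sorted topologically with the standard library. This is only an illustrative sketch: it treats the two conditional edges as ordinary edges and ignores the fan-out, so it shows the order of stages, not how many parallel branches run in each.

```python
from graphlib import TopologicalSorter

# node -> set of nodes that must finish before it (taken from the edges above)
predecessors = {
    "generate_report_plan": {"START"},
    "section_builder_with_web_search": {"generate_report_plan"},
    "format_completed_sections": {"section_builder_with_web_search"},
    "write_final_sections": {"format_completed_sections"},
    "compile_final_report": {"write_final_sections"},
    "END": {"compile_final_report"},
}

order = list(TopologicalSorter(predecessors).static_order())
print(" -> ".join(order))
```

The stage order matches the progress messages the agent prints while it runs: plan, research and write, format, write final sections, compile.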
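Run and Test Our Deep Research and Report Writing Agent
Finally, let's test our deep research and report writing agent! We will create a simple function that streams progress in real time and then displays the final report. I recommend turning off all the intermediate print messages once you have the agent running!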
from IPython.display import display
from rich.console import Console
from rich.markdown import Markdown as RichMarkdown

async def call_planner_agent(agent, prompt, config={"recursion_limit": 50}, verbose=False):
    events = agent.astream(
        {'topic' : prompt},
        config,
        stream_mode="values",
    )
    async for event in events:
        for k, v in event.items():
            if verbose:
                if k != "__end__":
                    display(RichMarkdown(repr(k) + ' -> ' + repr(v)))
            if k == 'final_report':
                print('='*50)
                print('Final Report:')
                md = RichMarkdown(v)
                display(md)
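With `stream_mode="values"`, each streamed event is a snapshot of the graph state after a step, which is why the function above simply watches for the `final_report` key. If you would rather get the report back as a string than display it, the same loop can collect it. The sketch below uses a hypothetical `fake_astream` async generator in place of `agent.astream(...)` so it runs without an API key; `collect_final_report` is likewise a made-up helper name.

```python
import asyncio


async def fake_astream(states):
    """Hypothetical stand-in for agent.astream(..., stream_mode="values")."""
    for state in states:
        await asyncio.sleep(0)  # hand control back to the event loop
        yield state


async def collect_final_report(events):
    """Return the last `final_report` value seen in a stream of state dicts."""
    final_report = None
    async for event in events:
        if "final_report" in event:
            final_report = event["final_report"]
    return final_report


# Made-up state snapshots, in the order a run might emit them
snapshots = [
    {"topic": "demo"},
    {"topic": "demo", "sections": ["Introduction", "Conclusion"]},
    {"topic": "demo", "final_report": "# Demo report"},
]

report = asyncio.run(collect_final_report(fake_astream(snapshots)))
print(report)
```

Inside a notebook, where an event loop is already running, use `await collect_final_report(...)` instead of `asyncio.run(...)`.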
Test run
topic = "Detailed report on how is NVIDIA winning the game against its competitors"
await call_planner_agent(agent=reporter_agent,
                         prompt=topic)
Output
--- Generating Report Plan ---
--- Generating Report Plan Completed ---
--- Generating Search Queries for Section: NVIDIA's Market Dominance in GPUs ---
--- Generating Search Queries for Section: Strategic Acquisitions and Partnerships ---
--- Generating Search Queries for Section: Technological Innovations and AI Leadership ---
--- Generating Search Queries for Section: Financial Performance and Growth Strategy ---
--- Generating Search Queries for Section: NVIDIA's Market Dominance in GPUs Completed ---
--- Searching Web for Queries ---
--- Generating Search Queries for Section: Financial Performance and Growth Strategy Completed ---
--- Searching Web for Queries ---
--- Generating Search Queries for Section: Technological Innovations and AI Leadership Completed ---
--- Searching Web for Queries ---
--- Generating Search Queries for Section: Strategic Acquisitions and Partnerships Completed ---
--- Searching Web for Queries ---
--- Searching Web for Queries Completed ---
--- Writing Section : Strategic Acquisitions and Partnerships ---
--- Searching Web for Queries Completed ---
--- Writing Section : Financial Performance and Growth Strategy ---
--- Searching Web for Queries Completed ---
--- Writing Section : NVIDIA's Market Dominance in GPUs ---
--- Searching Web for Queries Completed ---
--- Writing Section : Technological Innovations and AI Leadership ---
--- Writing Section : Strategic Acquisitions and Partnerships Completed ---
--- Writing Section : Financial Performance and Growth Strategy Completed ---
--- Writing Section : NVIDIA's Market Dominance in GPUs Completed ---
--- Writing Section : Technological Innovations and AI Leadership Completed ---
--- Formatting Completed Sections ---
--- Formatting Completed Sections is Done ---
--- Writing Final Section: Introduction ---
--- Writing Final Section: Conclusion ---
--- Writing Final Section: Introduction Completed ---
--- Writing Final Section: Conclusion Completed ---
--- Compiling Final Report ---
--- Compiling Final Report Done ---
==================================================
Final Report:

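As shown above, it gives us a fairly comprehensive, well-researched, and well-structured report!
Summary
If you are reading this, I commend your effort in sticking with this massive guide to the end! We saw that building something similar to a mature (and not exactly cheap!) commercial product from OpenAI is not too difficult. OpenAI is a company that definitely knows how to ship quality Generative AI products, and now Agentic AI products as well.
We walked through the detailed architecture and workflow for building our own deep research and report generation agentic AI system, and overall, running it costs less than the promised dollar! If you use open-source components for everything, it is completely free! The system is also fully customizable: you control how searches are run, as well as the structure, length, and style of the report. One caveat: if you use Tavily, running this agent for deep research can easily trigger a large number of searches, so keep an eye on your usage and track it. This is only a foundation; feel free to take this code and system and customize it to make it even better!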
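One lightweight way to keep search usage in check is to wrap whatever search function the agent calls in a counter with a hard cap. The sketch below is an assumption-laden illustration: `with_search_budget`, `SearchBudgetExceeded`, and `fake_search` are hypothetical names, and `fake_search` stands in for a real search call so the example runs offline.

```python
import functools


class SearchBudgetExceeded(RuntimeError):
    """Raised when the wrapped search function is called too many times."""


def with_search_budget(max_calls):
    """Decorator factory: allow at most `max_calls` calls to the search function."""
    def decorator(search_fn):
        calls = {"count": 0}

        @functools.wraps(search_fn)
        def wrapper(*args, **kwargs):
            if calls["count"] >= max_calls:
                raise SearchBudgetExceeded(
                    f"search budget of {max_calls} calls exhausted")
            calls["count"] += 1
            return search_fn(*args, **kwargs)

        wrapper.calls = calls  # expose the counter for reporting
        return wrapper
    return decorator


@with_search_budget(max_calls=2)
def fake_search(query):
    """Hypothetical stand-in for a real web search call."""
    return [f"result for {query}"]


fake_search("first query")
fake_search("second query")
try:
    fake_search("third query")
except SearchBudgetExceeded as err:
    print(f"stopped after {fake_search.calls['count']} searches: {err}")
```

The counter is a plain dictionary, which is adequate for single-threaded or asyncio use; a thread-based setup would need a lock around the increment.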