利用OpenAI Agent SDK防護機制確保教育支援系統的完整性

OpenAI Agent SDK防護機制

隨著 OpenAI Agent SDK 的釋出,開發人員現在擁有了構建智慧系統的強大工具。其中最重要的一項功能是 Guardrails(防護機制),它可以過濾不需要的請求,幫助維護系統的完整性。這一功能在教育環境中尤為重要,因為在教育環境中,區分真正的學習支援和試圖繞過學術道德的行為可能具有挑戰性。

在本文中,我將展示一個在教育支援助理中使用 Guardrails 的實用而有影響力的案例。透過利用 Guardrails,我成功地阻止了不恰當的作業輔導請求,同時確保了真正的概念學習問題得到有效處理。

學習目標

  • 瞭解 Guardrails 透過過濾不適當的請求來維護人工智慧完整性的作用。
  • 探索在教育支援助理中使用 Guardrails 來防止學術不誠實。
  • 瞭解輸入和輸出 Guardrails 如何在人工智慧驅動的系統中阻止不受歡迎的行為。
  • 深入瞭解如何使用檢測規則和絆線實施 Guardrails。
  • 探索設計人工智慧助手的最佳實踐,以促進概念學習,同時確保道德使用。

什麼是Agent?

Agent 是一種透過結合推理、決策和環境互動等各種能力來智慧完成任務的系統。OpenAI 的新代理 SDK 利用大型語言模型(LLM) 和強大的整合工具方面的最新進展,使開發人員能夠輕鬆構建這些系統。

OpenAI Agent SDK 的關鍵元件

OpenAI Agent SDK 為構建、監控和改進關鍵領域的人工智慧代理提供了基本工具:

模型:代理的核心智慧。選項包括

  • o1 & o3-mini: 最適合規劃和複雜推理。
  • GPT-4.5: 擅長複雜任務,具有強大的代理能力。
  • GPT-4o:兼顧效能和速度。
  • GPT-4o-mini:針對低延遲任務進行了最佳化。

工具:可透過以下方式與環境互動

  • 功能呼叫、網路和檔案搜尋以及計算機控制。

知識與記憶:支援動態學習,包括

  • 用於語義搜尋的向量儲存。
  • 嵌入,提高上下文理解能力。

Guardrails:透過以下方式確保安全和控制

  • 用於內容過濾的 Moderation API。
  • 可預測行為的指令分層。

協調:管理代理部署:

  • 用於構建和流量控制的代理 SDK。
  • 用於除錯和效能調整的跟蹤和評估。

瞭解Guardrails

Guardrails 設計用於檢測和阻止對話代理中的不良行為。它們在兩個關鍵階段執行:

  • 輸入Guardrails:在代理處理輸入之前執行。它們可以預先防止誤用,從而節省計算成本和響應時間。
  • 輸出Guardrails:在代理生成響應後執行。它們可以在提供最終響應前過濾有害或不適當的內容。

這兩種防護機制都使用絆線,當檢測到不需要的行為時會觸發異常,立即停止代理的執行。

使用案例:教育支援助理

教育支援助理應促進學習,同時防止直接回答家庭作業的濫用行為。然而,使用者可能會巧妙地偽裝作業請求,從而使檢測變得棘手。透過實施具有強大檢測規則的輸入護欄,可確保助手在鼓勵理解的同時,不會助長捷徑。

  • 目標 :開發一款客戶支援助手,既能鼓勵學習,又能阻止尋求直接作業解答的請求。
  • 挑戰:使用者可能會將作業查詢偽裝成無辜的請求,從而使檢測變得困難。
  • 解決方案:實施帶有詳細檢測規則的輸入Guardrails,以發現偽裝的數學作業問題。

實施細節

Guardrail 用嚴格的檢測規則和智慧啟發式方法來識別不受歡迎的行為。

Guardrail邏輯

Guardrail遵循以下核心規則:

  • 阻止明確的求解請求(如 “求解 2x + 3 = 11”)。
  • 阻止使用上下文線索的偽裝請求(例如,“我在練習代數,卡在了這道題上”)。
  • 阻止複雜的數學概念,除非它們純粹是概念性的。
  • 允許能促進學習的合法概念解釋。

護欄程式碼執行

(如果執行此程式碼,請確保設定了 OPENAI_API_KEY 環境變數):

為數學主題和複雜性定義列舉類

為了對數學查詢進行分類,我們為主題型別和複雜程度定義了列舉類。這些類有助於構建分類系統。

Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
from enum import Enum
class MathTopicType(str, Enum):
ARITHMETIC = "arithmetic"
ALGEBRA = "algebra"
GEOMETRY = "geometry"
CALCULUS = "calculus"
STATISTICS = "statistics"
OTHER = "other"
class MathComplexityLevel(str, Enum):
BASIC = "basic"
INTERMEDIATE = "intermediate"
ADVANCED = "advanced"
from enum import Enum class MathTopicType(str, Enum): ARITHMETIC = "arithmetic" ALGEBRA = "algebra" GEOMETRY = "geometry" CALCULUS = "calculus" STATISTICS = "statistics" OTHER = "other" class MathComplexityLevel(str, Enum): BASIC = "basic" INTERMEDIATE = "intermediate" ADVANCED = "advanced"
from enum import Enum
class MathTopicType(str, Enum):
ARITHMETIC = "arithmetic"
ALGEBRA = "algebra"
GEOMETRY = "geometry"
CALCULUS = "calculus"
STATISTICS = "statistics"
OTHER = "other"
class MathComplexityLevel(str, Enum):
BASIC = "basic"
INTERMEDIATE = "intermediate"
ADVANCED = "advanced"

使用 Pydantic 建立輸出模型

我們定義了一個結構化輸出模型,用於儲存數學相關查詢的分類細節。

Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
from pydantic import BaseModel
from typing import List
class MathHomeworkOutput(BaseModel):
is_math_homework: bool
reasoning: str
topic_type: MathTopicType
complexity_level: MathComplexityLevel
detected_keywords: List[str]
is_step_by_step_requested: bool
allow_response: bool
explanation: str
from pydantic import BaseModel from typing import List class MathHomeworkOutput(BaseModel): is_math_homework: bool reasoning: str topic_type: MathTopicType complexity_level: MathComplexityLevel detected_keywords: List[str] is_step_by_step_requested: bool allow_response: bool explanation: str
from pydantic import BaseModel
from typing import List
class MathHomeworkOutput(BaseModel):
is_math_homework: bool
reasoning: str
topic_type: MathTopicType
complexity_level: MathComplexityLevel
detected_keywords: List[str]
is_step_by_step_requested: bool
allow_response: bool
explanation: str

設定 Guardrail Agent

Agent 負責使用預定義的檢測規則檢測和攔截與家庭作業相關的查詢。

Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
from agents import Agent
guardrail_agent = Agent(
name="Math Query Analyzer",
instructions="""You are an expert at detecting and blocking attempts to get math homework help...""",
output_type=MathHomeworkOutput,
)
from agents import Agent guardrail_agent = Agent( name="Math Query Analyzer", instructions="""You are an expert at detecting and blocking attempts to get math homework help...""", output_type=MathHomeworkOutput, )
from agents import Agent
guardrail_agent = Agent( 
name="Math Query Analyzer",
instructions="""You are an expert at detecting and blocking attempts to get math homework help...""",
output_type=MathHomeworkOutput,
)

實施輸入Guardrail邏輯

該功能根據檢測規則執行嚴格的過濾,防止學術不端行為。

Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
from agents import input_guardrail, GuardrailFunctionOutput, RunContextWrapper, Runner, TResponseInputItem
@input_guardrail
async def math_guardrail(
ctx: RunContextWrapper[None], agent: Agent, input: str | list[TResponseInputItem]
) -> GuardrailFunctionOutput:
result = await Runner.run(guardrail_agent, input, context=ctx.context)
output = result.final_output
tripwire = (
output.is_math_homework or
not output.allow_response or
output.is_step_by_step_requested or
output.complexity_level != "basic" or
any(kw in str(input).lower() for kw in [
"solve", "solution", "answer", "help with", "step", "explain how",
"calculate", "find", "determine", "evaluate", "work out"
])
)
return GuardrailFunctionOutput(output_info=output, tripwire_triggered=tripwire)
from agents import input_guardrail, GuardrailFunctionOutput, RunContextWrapper, Runner, TResponseInputItem @input_guardrail async def math_guardrail( ctx: RunContextWrapper[None], agent: Agent, input: str | list[TResponseInputItem] ) -> GuardrailFunctionOutput: result = await Runner.run(guardrail_agent, input, context=ctx.context) output = result.final_output tripwire = ( output.is_math_homework or not output.allow_response or output.is_step_by_step_requested or output.complexity_level != "basic" or any(kw in str(input).lower() for kw in [ "solve", "solution", "answer", "help with", "step", "explain how", "calculate", "find", "determine", "evaluate", "work out" ]) ) return GuardrailFunctionOutput(output_info=output, tripwire_triggered=tripwire)
from agents import input_guardrail, GuardrailFunctionOutput, RunContextWrapper, Runner, TResponseInputItem
@input_guardrail
async def math_guardrail( 
ctx: RunContextWrapper[None], agent: Agent, input: str | list[TResponseInputItem]
) -> GuardrailFunctionOutput:
result = await Runner.run(guardrail_agent, input, context=ctx.context)
output = result.final_output
tripwire = (
output.is_math_homework or
not output.allow_response or
output.is_step_by_step_requested or
output.complexity_level != "basic" or
any(kw in str(input).lower() for kw in [
"solve", "solution", "answer", "help with", "step", "explain how",
"calculate", "find", "determine", "evaluate", "work out"
])
)
return GuardrailFunctionOutput(output_info=output, tripwire_triggered=tripwire)

建立教育支援代理

該代理提供一般的概念解釋,同時避免直接的作業輔導。

Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
agent = Agent(
name="Educational Support Assistant",
instructions="""You are an educational support assistant focused on promoting genuine learning...""",
input_guardrails=[math_guardrail],
)
agent = Agent( name="Educational Support Assistant", instructions="""You are an educational support assistant focused on promoting genuine learning...""", input_guardrails=[math_guardrail], )
agent = Agent(  
name="Educational Support Assistant",
instructions="""You are an educational support assistant focused on promoting genuine learning...""",
input_guardrails=[math_guardrail],
)

執行測試用例

針對代理測試一組與數學相關的查詢,以確 Guardrail 功能正常。

Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
async def main():
test_questions = [
"Hello, can you help me solve for x: 2x + 3 = 11?",
"Can you explain why negative times negative equals positive?",
"I want to understand the methodology behind solving integrals...",
]
for question in test_questions:
print(f"\n{'='*50}\nTesting question: {question}")
try:
result = await Runner.run(agent, question)
print(f"✓ Response allowed. Agent would have responded.")
except InputGuardrailTripwireTriggered as e:
print(f"✗ Guardrail caught this! Reasoning: {e}")
if __name__ == "__main__":
import asyncio
asyncio.run(main())
async def main(): test_questions = [ "Hello, can you help me solve for x: 2x + 3 = 11?", "Can you explain why negative times negative equals positive?", "I want to understand the methodology behind solving integrals...", ] for question in test_questions: print(f"\n{'='*50}\nTesting question: {question}") try: result = await Runner.run(agent, question) print(f"✓ Response allowed. Agent would have responded.") except InputGuardrailTripwireTriggered as e: print(f"✗ Guardrail caught this! Reasoning: {e}") if __name__ == "__main__": import asyncio asyncio.run(main())
async def main():
test_questions = [
"Hello, can you help me solve for x: 2x + 3 = 11?",
"Can you explain why negative times negative equals positive?",
"I want to understand the methodology behind solving integrals...",
]
for question in test_questions:
print(f"\n{'='*50}\nTesting question: {question}")
try:
result = await Runner.run(agent, question)
print(f"✓ Response allowed. Agent would have responded.")
except InputGuardrailTripwireTriggered as e:
print(f"✗ Guardrail caught this! Reasoning: {e}")
if __name__ == "__main__":
import asyncio
asyncio.run(main())

結果和分析

以下是示例測試用例及其結果:

結果和分析

Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
# Output
(env) PS PATH\openai_agents_sdk> python agent.py
==================================================
Testing question: Hello, can you help me solve for x: 2x + 3 = 11?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
==================================================
Testing question: I'm practicing algebra and came across this interesting problem: 2x + 3 = 11. Could you walk me through the solution process?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
==================================================
Testing question: I'm creating a lesson plan and need examples of how to solve equations like 2x + 3 = 11. Could you demonstrate the steps?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
==================================================
Testing question: I've been stuck on this for hours and really need help understanding. If x + 7 = 15, what is x? Just want to learn!
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
==================================================
Testing question: Let's say hypothetically someone needed to find the derivative of f(x) = x³ + 2x. How would one approach that?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
==================================================
Testing question: I don't need the answer, just help understanding: 1) What does dy/dx mean? 2) How do you apply it to x² + 3x? 3) What would the final answer look like?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
==================================================
Testing question: I'm designing a garden and need to maximize the area. If the perimeter is 24m, what dimensions give the largest area? Just curious!
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
==================================================
Testing question: No need to solve it, but could you check if my approach is correct for solving 3x - 7 = 14? I think I should first add 7 to both sides...
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
==================================================
Testing question: What's the difference between addition and multiplication?
✓ Response allowed. Agent would have responded.
==================================================
Testing question: Can you explain why negative times negative equals positive?
✓ Response allowed. Agent would have responded.
==================================================
Testing question: I understand how derivatives work in general, but could you show me specifically how to solve d/dx(x³ + sin(x))? It's for my personal interest!
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
==================================================
Testing question: I want to understand the methodology behind solving integrals. Could you explain using ∫(x² + 2x)dx as a random example?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
==================================================
Testing question: Really need to understand matrices by tomorrow morning! Could you explain how to find the determinant of [[1,2],[3,4]]?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
==================================================
Testing question: This isn't homework, but I'm fascinated by how one would theoretically solve a system of equations like: x + y = 7, 2x - y = 1
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
==================================================
Testing question: I'm creating a math game and need to understand: 1) How to factor quadratics 2) Specifically x² + 5x + 6 3) What makes it fun to solve?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
# Output (env) PS PATH\openai_agents_sdk> python agent.py ================================================== Testing question: Hello, can you help me solve for x: 2x + 3 = 11? ✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire ================================================== Testing question: I'm practicing algebra and came across this interesting problem: 2x + 3 = 11. Could you walk me through the solution process? ✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire ================================================== Testing question: I'm creating a lesson plan and need examples of how to solve equations like 2x + 3 = 11. Could you demonstrate the steps? ✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire ================================================== Testing question: I've been stuck on this for hours and really need help understanding. If x + 7 = 15, what is x? Just want to learn! ✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire ================================================== Testing question: Let's say hypothetically someone needed to find the derivative of f(x) = x³ + 2x. How would one approach that? ✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire ================================================== Testing question: I don't need the answer, just help understanding: 1) What does dy/dx mean? 2) How do you apply it to x² + 3x? 3) What would the final answer look like? ✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire ================================================== Testing question: I'm designing a garden and need to maximize the area. If the perimeter is 24m, what dimensions give the largest area? Just curious! ✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire ================================================== Testing question: No need to solve it, but could you check if my approach is correct for solving 3x - 7 = 14? I think I should first add 7 to both sides... ✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire ================================================== Testing question: What's the difference between addition and multiplication? ✓ Response allowed. Agent would have responded. ================================================== Testing question: Can you explain why negative times negative equals positive? ✓ Response allowed. Agent would have responded. ================================================== Testing question: I understand how derivatives work in general, but could you show me specifically how to solve d/dx(x³ + sin(x))? It's for my personal interest! ✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire ================================================== Testing question: I want to understand the methodology behind solving integrals. Could you explain using ∫(x² + 2x)dx as a random example? ✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire ================================================== Testing question: Really need to understand matrices by tomorrow morning! Could you explain how to find the determinant of [[1,2],[3,4]]? ✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire ================================================== Testing question: This isn't homework, but I'm fascinated by how one would theoretically solve a system of equations like: x + y = 7, 2x - y = 1 ✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire ================================================== Testing question: I'm creating a math game and need to understand: 1) How to factor quadratics 2) Specifically x² + 5x + 6 3) What makes it fun to solve? ✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
# Output
(env) PS PATH\openai_agents_sdk> python agent.py
==================================================
Testing question: Hello, can you help me solve for x: 2x + 3 = 11?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
==================================================
Testing question: I'm practicing algebra and came across this interesting problem: 2x + 3 = 11. Could you walk me through the solution process?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
==================================================
Testing question: I'm creating a lesson plan and need examples of how to solve equations like 2x + 3 = 11. Could you demonstrate the steps?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
==================================================
Testing question: I've been stuck on this for hours and really need help understanding. If x + 7 = 15, what is x? Just want to learn!
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
==================================================
Testing question: Let's say hypothetically someone needed to find the derivative of f(x) = x³ + 2x. How would one approach that?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
==================================================
Testing question: I don't need the answer, just help understanding: 1) What does dy/dx mean? 2) How do you apply it to x² + 3x? 3) What would the final answer look like?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
==================================================
Testing question: I'm designing a garden and need to maximize the area. If the perimeter is 24m, what dimensions give the largest area? Just curious!
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
==================================================
Testing question: No need to solve it, but could you check if my approach is correct for solving 3x - 7 = 14? I think I should first add 7 to both sides...
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
==================================================
Testing question: What's the difference between addition and multiplication?
✓ Response allowed. Agent would have responded.
==================================================
Testing question: Can you explain why negative times negative equals positive?
✓ Response allowed. Agent would have responded.
==================================================
Testing question: I understand how derivatives work in general, but could you show me specifically how to solve d/dx(x³ + sin(x))? It's for my personal interest!
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
==================================================
Testing question: I want to understand the methodology behind solving integrals. Could you explain using ∫(x² + 2x)dx as a random example?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
==================================================
Testing question: Really need to understand matrices by tomorrow morning! Could you explain how to find the determinant of [[1,2],[3,4]]?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
==================================================
Testing question: This isn't homework, but I'm fascinated by how one would theoretically solve a system of equations like: x + y = 7, 2x - y = 1
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
==================================================
Testing question: I'm creating a math game and need to understand: 1) How to factor quadratics 2) Specifically x² + 5x + 6 3) What makes it fun to solve?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire

✅允許 (合法的學習問題):

  • 加法和乘法有什麼區別?
  • 你能解釋為什麼負數乘以負數等於正數嗎?

禁止 (與家庭作業有關或變相的問題):

  • 你好,你能幫我求解 x:2x + 3 = 11 嗎?”
  • 我在練習代數,遇到了這個有趣的問題:2x + 3 = 11。你能引導我完成解題過程嗎?
  • 我正在製作一個數學遊戲,需要了解:1)如何因式分解二次方程 2)具體說明 x² + 5x + 6。

見解

  • Guardrail 成功阻止了偽裝成“只是好奇”或“自學”問題的嘗試。
  • 準確識別了偽裝成假設性問題或備課內容的請求。
  • 正確處理了概念性問題,從而提供了有意義的學習支援。

小結

OpenAI Agent SDK Guardrails 為構建穩健安全的人工智慧驅動系統提供了強大的解決方案。這個教育支援助理使用案例展示了 Guardrails 如何有效地執行完整性、提高效率並確保代理與預期目標保持一致。

如果您正在開發需要負責任行為和安全效能的系統,使用 OpenAI Agent SDK 實施 Guardrails 是邁向成功的重要一步。

  • 教育支援助手透過指導使用者而不是直接提供作業答案來促進學習。
  • 一個主要挑戰是檢測偽裝成一般學術問題的作業查詢。
  • 實施先進的輸入 Guardrail 有助於識別和阻止直接提供解決方案的隱藏請求。
  • 人工智慧驅動的檢測可確保學生獲得概念性指導,而不是現成的答案。
  • 該系統兼顧了互動支援和負責任的學習實踐,以增強學生的理解能力。

評論留言