OpenAI Agent SDK: Build Reflection Enabled Agentic
The SDK supports other third-party models. In this example, I’m using Vertex AI’s Gemini 2.0.
You can think of the SDK as a combination of LangChain's Supervisor and Swarm patterns, or as a direct counterpart to LlamaIndex's AgentWorkflow.
So far, I have still needed to adjust the prompts from my previous experiments to fit the SDK. Simply copying and pasting them gives the agentic system a low success rate, and handoffs fail to complete. One lesson learned: no matter which framework you use, copy-pasting prompts is unlikely to work well, so keep that in mind.
Note that agentic systems are nondeterministic, and there is no guarantee that every task will complete. The hit rate therefore needs to be improved continuously through prompt engineering.
The key to using third-party models is the base_url. For Gemini, we can get one from Vertex AI or from the Generative AI API (Google AI Studio).
```python
# Vertex AI:
base_url = (
    f"https://{os.environ['GOOGLE_CLOUD_REGION']}-aiplatform.googleapis.com/v1beta1/"
    f"projects/{os.environ['GOOGLE_CLOUD_PROJECT']}/locations/{os.environ['GOOGLE_CLOUD_REGION']}/endpoints/openapi"
)

# Google AI Studio:
base_url = "https://generativelanguage.googleapis.com/v1beta/openai/"
```
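As a runnable sanity check of the two endpoint shapes, here is a minimal sketch. The region and project values are placeholders I made up for illustration; in real code they would come from the `GOOGLE_CLOUD_REGION` and `GOOGLE_CLOUD_PROJECT` environment variables.

```python
# Placeholder values for illustration only; use your own region and project.
region = "us-central1"       # normally os.environ["GOOGLE_CLOUD_REGION"]
project = "my-demo-project"  # normally os.environ["GOOGLE_CLOUD_PROJECT"]

# Vertex AI OpenAI-compatible endpoint
vertex_base_url = (
    f"https://{region}-aiplatform.googleapis.com/v1beta1/"
    f"projects/{project}/locations/{region}/endpoints/openapi"
)

# Google AI Studio OpenAI-compatible endpoint
ai_studio_base_url = "https://generativelanguage.googleapis.com/v1beta/openai/"

print(vertex_base_url)
```

Whichever endpoint you choose is then passed as base_url, together with a matching API key or access token, to the OpenAI-compatible client the SDK talks to.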
The design of handoff is a bit different. While it helps to specify handoffs in the instructions (the agent's prompt), the SDK also provides an optional handoff_description, which in my experience is very helpful.
Note that handoff_description differs from how handoff details are described in LangChain Swarm. In the OpenAI SDK, this description introduces the agent itself, for the case where another agent hands off to it. Here is the source code:
```python
@dataclass
class Agent(Generic[TContext]):
    ...
    handoff_description: str | None = None
    """A description of the agent. This is used when the agent is used as a handoff, so that an
    LLM knows what it does and when to invoke it.
    """
```
Note the difference: in LangChain Swarm, to transition from the PlanAgent or CriticAgent to the RegressionAgent, we prompt on the plan and critic sides respectively:
```python
from langgraph.prebuilt import create_react_agent
from langgraph_swarm import create_handoff_tool

plan_agent = create_react_agent(
    model=plan_model,
    name="PlanAgent",
    checkpointer=checkpointer,
    tools=[
        create_handoff_tool(
            agent_name="RegressionAgent",  # agent to swarm to
            description="Transfer the PlanAgent's regression plan to the RegressionAgent, who will execute the plan and gather information according to it.",
        ),
    ],
)

critic_agent = create_react_agent(
    model=critic_model,
    name="CriticAgent",
    checkpointer=checkpointer,
    tools=[
        create_handoff_tool(
            agent_name="RegressionAgent",  # agent to swarm to
            description="Transfer the CriticAgent's feedback to the RegressionAgent, who will revise the regression findings based on the feedback.",
        ),
    ],
)
```
With handoff_description, we instead describe the RegressionAgent itself as a handoff target: for example, that it accepts handoffs from the plan and critic agents under different circumstances.
```python
regression_agent.handoff_description = (
    "Used for regression work after the PlanAgent has made the plan, "
    "and for revisions after the CriticAgent's analysis."
)
```
In short, writing handoff instructions in the agent's main prompt remains the most reliable approach, but it does increase token usage. If the project budget isn't a concern, I would do it in both places to increase the handoff hit rate.
Code
The overall structure of the code is similar to that of the other frameworks; you can find related articles and code below.
- LangChain/LangGraph: Build Reflection Enabled Agentic 👨🏻💻 code
- Llama-Index/AgentWorkflow: Build Reflection Enabled Agentic 👨🏻💻 code
- LangGraph / swarm and supervisor: Build Reflection Enabled Agentic 👨🏻💻 supervisor 👨🏻💻 swarm
- CrewAI / Hierarchical Manager: Build Reflection Enabled Agentic 👨🏻💻 code
- Llama-Index Workflows: Translation Agent via Reflection Workflow 👨🏻💻 code
Conclusion
Regardless of which framework you use for agentic development, you can't guarantee results on every run, which affects the hit rate. Here are some suggestions:
- Always debug your prompts; that is what prompt engineering is.
- Never copy and paste prompts from other projects, even similar ones.
- Keep temperature and top_p at or below 1.0, but above 0.8.
- Don't stick to one framework; try LlamaIndex, LangChain, and the OpenAI SDK for the same goal, but remember rules one and two.
- Once the project POC takes shape, it is worth transitioning to CrewAI, but again, remember rules one and two.