Build Your First AI Agent from Scratch - Part 5: Testing, Simulation, and Deployment

We now have an agent that chats, remembers, and uses tools. The final step is making it trustworthy in production. In Lumenly's rollout, this meant building simulation harnesses, release gates, and deployment packs that security and ops approved. Otherwise the agent would stay stuck in "demo" limbo forever.
Thesis: treat agents like resilient microservices. That means smoke tests, simulation-first suites, CI/CD pipelines, feature flags, and observability tuned for autonomy. This final part stitches those elements together.
Simulation-First Testing
Borrowing from our Simulation-First Testing guide, create scenario packs inside simulations/.
    # simulations/overdue_invoice.yaml
    scenario: "overdue_invoice_followup"
    seeds:
      transcript: fixtures/invoice_threads.json
    mission:
      goal: "send personalized reminder with payment link"
      constraints:
        - "never escalate without checking CRM"
    expected:
      tool_calls:
        - name: search_docs
        - name: create_support_ticket
      outputs:
        contains:
          - "invoice INV-324"
          - "payment link"
Harness:
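    # scripts/run_simulation.py
    import json
    from pathlib import Path

    import typer
    import yaml

    from agent_lab.cli import agent
    from agent_lab.contracts import Message

    app = typer.Typer()


    @app.command()
    def run(path: str):
        data = yaml.safe_load(Path(path).read_text())
        # seeds.transcript is a path to a JSON fixture, so load it before replaying.
        # Assumed shape: a list of {"role": ..., "content": ...} turns.
        transcript = json.loads(Path(data["seeds"]["transcript"]).read_text())
        bot = agent(mock_tools=True)
        for turn in transcript:
            bot.send(Message(turn["role"], turn["content"]))
        result = bot.send(Message("user", "Please handle the overdue invoice."))
        missing = [f for f in data["expected"]["outputs"]["contains"] if f not in result.content]
        assert not missing, f"missing expected fragments: {missing}"


    if __name__ == "__main__":
        app()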
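The scenario file also lists expected tool_calls, which the harness does not check. One way to cover both expectations is a small pure function like the sketch below. SimulationResult and check_expectations are hypothetical names, and the sketch assumes the agent can report the names of the tools it called, which earlier parts may expose differently.

```python
from dataclasses import dataclass, field


@dataclass
class SimulationResult:
    """Hypothetical container for what one simulated run produced."""
    content: str
    tool_calls: list[str] = field(default_factory=list)


def check_expectations(expected: dict, result: SimulationResult) -> list[str]:
    """Return human-readable failures; an empty list means the scenario passed."""
    failures = []

    # Every expected tool must appear in the listed order (extra calls are allowed).
    remaining = iter(result.tool_calls)
    for spec in expected.get("tool_calls", []):
        if not any(name == spec["name"] for name in remaining):
            failures.append(f"tool not called in order: {spec['name']}")

    # Every expected fragment must appear in the final reply.
    for fragment in expected.get("outputs", {}).get("contains", []):
        if fragment not in result.content:
            failures.append(f"missing output fragment: {fragment!r}")

    return failures
```

Returning a list of failures rather than asserting on the first one makes the CI log show everything that drifted in a single run.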
Wire these simulations into CI (GitHub Actions, GitLab CI). Fail builds if simulations fail or change unexpectedly.
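"Change unexpectedly" can be made concrete with a snapshot check: store a fingerprint of the parts of a run you want pinned and fail when it drifts. The sketch below is one possible approach, not the method used in the original rollout. fingerprint, check_snapshot, and the snapshot directory are hypothetical, and it assumes you pin deterministic data such as the mocked tool-call sequence rather than free-form model text.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(result: dict) -> str:
    """Stable hash of the pinned parts of a run (keys sorted for determinism)."""
    canonical = json.dumps(result, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_snapshot(name: str, result: dict, snapshot_dir: Path, update: bool = False) -> bool:
    """Return True if the run matches the stored snapshot, or if one was just written."""
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    path = snapshot_dir / f"{name}.sha256"
    current = fingerprint(result)
    if update or not path.exists():
        # First run, or an intentional change that a reviewer approved.
        path.write_text(current)
        return True
    return path.read_text().strip() == current
```

Committing the snapshot files means an intentional behavior change shows up in the pull request diff, where a reviewer can approve it.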
Takeaway: Agents graduate to staging only after simulations are green.
Continuous Integration Pipeline
Sample GitHub Actions workflow:
    name: agent-ci
    on: [push, pull_request]
    jobs:
      test:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-python@v5
            with:
              python-version: "3.11"
          - run: pip install -r requirements.txt
          - run: pytest --maxfail=1 --disable-warnings
          - run: python scripts/run_simulation.py simulations/overdue_invoice.yaml
Add OPENAI_API_KEY as a repository secret and set TOOL_MODE=mock as a plain environment variable, since only the key is sensitive. Run integration tests against real APIs in a separate workflow that is triggered on a schedule (nightly) or manually.
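To keep real-API tests out of the default CI run, the tests themselves can check the environment and skip. The sketch below uses the standard library's unittest, which pytest also collects. The value TOOL_MODE=real and the names integration_enabled and RealApiSmokeTest are assumptions made for illustration; only TOOL_MODE=mock appears in this series.

```python
import os
import unittest


def integration_enabled(env=os.environ) -> bool:
    """True only when real tools are requested and an API key is present."""
    return env.get("TOOL_MODE", "mock") == "real" and bool(env.get("OPENAI_API_KEY"))


class RealApiSmokeTest(unittest.TestCase):
    @unittest.skipUnless(integration_enabled(), "set TOOL_MODE=real and OPENAI_API_KEY to run")
    def test_agent_answers(self):
        # Placeholder body: call the real agent here in the nightly workflow.
        self.assertTrue(integration_enabled())
```

With this guard, the push and pull-request workflow skips the test, and the nightly workflow only needs to set the two variables.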
Takeaway: CI should mirror local Make targets to avoid "works on dev" scenarios.
Packaging and Deployment
API Service
Expose the agent via FastAPI:
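    # src/agent_lab/service.py
    from fastapi import FastAPI
    from pydantic import BaseModel

    from agent_lab.contracts import Message
    from agent_lab.core import CoreAgent

    app = FastAPI()


    class ChatRequest(BaseModel):
        message: str  # a missing field now yields a 422 instead of a 500


    @app.post("/chat")
    def chat(payload: ChatRequest):
        # Plain def: FastAPI runs it in a threadpool, so the blocking send()
        # call does not stall the event loop.
        bot = CoreAgent(system_prompt="Prod assistant.")  # fresh agent per request
        reply = bot.send(Message("user", payload.message))
        return {"response": reply.content}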
Dockerfile
    FROM python:3.11-slim
    WORKDIR /app
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt
    COPY src src
    # The package lives under src/, so put it on the import path for uvicorn.
    ENV PYTHONPATH=/app/src
    CMD ["uvicorn", "agent_lab.service:app", "--host", "0.0.0.0", "--port", "8080"]
Deployment Targets
- Staging: run with TOOL_MODE=mock and synthetic users hitting the /chat endpoint.
- Production: run behind an API gateway (Kong, Envoy) with rate limiting, auth, and tracing headers.
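The synthetic users mentioned for staging can start as a short script that posts canned messages to /chat and reports which ones got no usable reply. This is a minimal sketch using only the standard library. post_json and run_synthetic_users are hypothetical names, and the request and response shapes ({"message": ...} in, {"response": ...} out) follow the FastAPI service above.

```python
import json
import urllib.request
from typing import Callable


def post_json(url: str, body: dict, timeout: float = 10.0) -> dict:
    """POST a JSON body and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def run_synthetic_users(
    base_url: str,
    messages: list[str],
    post: Callable[[str, dict], dict] = post_json,
) -> list[str]:
    """Send each message to /chat and return the messages that got no usable reply."""
    failed = []
    for message in messages:
        try:
            reply = post(f"{base_url}/chat", {"message": message})
            if not reply.get("response"):
                failed.append(message)
        except Exception:  # network errors, error status codes, or invalid JSON
            failed.append(message)
    return failed
```

Passing the post function as a parameter lets the same loop run against a fake in unit tests and against the real staging URL in a scheduled job.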
Takeaway: Containerize early so scaling up is a config change, not a rewrite.
Observability and Alerting
Using the telemetry hooks from earlier parts, stream logs/metrics to a stack such as Loki + Grafana or Datadog.
Metrics to collect:
| Metric | Description | Alert rule |
|---|---|---|
| agent.turn_latency_ms | Time from user input to response | P95 > 2000 ms |
| tool.call_failures | Rate of failed tool executions | >5/min |
| simulation.regression | Count of failing scenarios | >0 |
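In practice these rules live in your monitoring stack, but writing them out in code makes the thresholds unambiguous. The sketch below applies the three rules to raw samples. percentile and evaluate_alerts are hypothetical helpers, and the nearest-rank percentile method is an assumption, since monitoring tools may interpolate differently.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def evaluate_alerts(
    latencies_ms: list[float],
    tool_failures_per_min: float,
    failing_scenarios: int,
) -> list[str]:
    """Apply the three alert rules from the table and return the metrics that fire."""
    fired = []
    if latencies_ms and percentile(latencies_ms, 95) > 2000:
        fired.append("agent.turn_latency_ms")
    if tool_failures_per_min > 5:
        fired.append("tool.call_failures")
    if failing_scenarios > 0:
        fired.append("simulation.regression")
    return fired
```

Note that a P95 rule tolerates up to 5% of slow turns: with 100 samples, five responses above 2000 ms do not fire the alert, while six do.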
Add distributed tracing (OpenTelemetry) to correlate user requests with tool calls and model latency.
Takeaway: Ops teams need the same visibility they expect from any service.
Release and Rollback Strategy
- Feature flags: gate risky abilities (auto-refunds) behind LaunchDarkly or ConfigCat toggles.
- Shadow mode: in prod, run the agent in read-only mode and compare outputs to human responses before auto-executing.
- Rollback: maintain scripts that revoke tokens, cancel scheduled tool calls, and revert recent changes in transcripts.
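Shadow mode needs some way to score agent proposals against what humans actually sent. The sketch below uses plain text similarity from the standard library as a rough stand-in for a real quality review. ShadowRecord, agreement, ready_to_auto_execute, and both thresholds are assumptions for illustration, not values from the rollout described here.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class ShadowRecord:
    ticket_id: str
    agent_reply: str  # proposed by the agent in read-only mode, never sent
    human_reply: str  # what the human actually sent


def agreement(record: ShadowRecord) -> float:
    """Rough text similarity in [0, 1] between the agent's and the human's reply."""
    return SequenceMatcher(None, record.agent_reply.lower(), record.human_reply.lower()).ratio()


def ready_to_auto_execute(
    records: list[ShadowRecord],
    threshold: float = 0.8,
    min_share: float = 0.95,
) -> bool:
    """True when at least min_share of shadow replies reach the similarity threshold."""
    if not records:
        return False  # no evidence yet, so stay in shadow mode
    agreeing = sum(1 for record in records if agreement(record) >= threshold)
    return agreeing / len(records) >= min_share
```

String similarity misses replies that are worded differently but equally correct, so treat a low score as a prompt for human review rather than as proof of a bad answer.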
Document release steps in docs/runbook.md:
1. Merge PR, CI runs simulations.
2. Deploy to staging via ArgoCD.
3. Run smoke suite; attach reports.
4. Flip feature flag to 5% traffic.
5. Monitor metrics for 30 minutes.
6. Gradually roll to 100%.
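Steps 4 and 6 rely on percentage rollouts, which flag services such as LaunchDarkly or ConfigCat handle for you. The sketch below only illustrates the underlying idea of deterministic bucketing and is not either vendor's API. in_rollout and the flag name are hypothetical.

```python
import hashlib


def in_rollout(user_id: str, flag: str, percent: float) -> bool:
    """Deterministically place a user in the first `percent` of 10,000 buckets."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < percent * 100


# Usage: the same user always gets the same answer for a given flag and percentage.
if in_rollout("user-42", "auto_refunds", 5):
    pass  # route this user's request to the autonomous path
```

Because each user's bucket is fixed, everyone included at 5% stays included as the percentage grows, so users do not flip between behaviors during the rollout.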
Takeaway: Autonomy increases gradually, not with a big bang.
Final Checklist and Next Steps
You now have:
- Simulation packs and harnesses.
- CI pipeline enforcing tests + simulations.
- API service + Docker packaging.
- Observability stack with alerts.
- Release runbooks and rollback plans.
From here you can:
- Integrate with Agent Observability and Ops dashboards.
- Apply Security for Web-Active Agents if the agent touches browsers.
- Explore monetization/packaging strategies using Monetizing Agent Products.
The Build-Your-First-Agent series is complete, but the real work (iterating with users, adding safeguards, expanding tools) continues. Ship your first pilot, measure everything, and keep improving.



