Build Your First AI Agent from Scratch - Part 1: Environment Setup and Safety Rails

When the customer-success team at Lumenly tried to pilot an internal research agent, the prototype ran on one engineer's laptop, depended on a dozen global pip installs, and leaked API keys into shared shell history. The proof of concept impressed stakeholders, but the rollout stalled because no one could reproduce the environment safely. That scenario happens in almost every company experimenting with agents: the ideas are ambitious, but the foundation--tooling, secrets, and policy controls--is fragile.
This tutorial kicks off the Build Your First AI Agent series by focusing on the unglamorous work that unlocks everything else. The thesis is simple: if you want agents you can trust, invest a few hours in a disciplined environment--Python that everyone can version, virtual environments that keep dependencies isolated, secret management that survives audits, and smoke tests that prove the stack works. Do this once, and Parts 2-5 (structure, memory, tools, deployment) become far smoother.
Understand What the Environment Must Guarantee
Before you install anything, align on the outcomes. An agent development environment should guarantee:
- Reproducibility: Every teammate (and future you) can clone the repo, run a single bootstrap script, and get identical versions of Python, libraries, and CLI tooling.
- Safety: Secrets stay outside version control, linting catches accidental leaks, and sandbox scripts limit the blast radius during early experiments.
- Observability from Day One: Even a toy agent should log token usage, dependency versions, and health checks. If you tack this on later, debugging cascading failures becomes painful.
With those goals in mind, the stack we'll assemble includes Python 3.11, pyenv or system installers, uv or pip for package management, a pinned lockfile (requirements.txt here; Poetry or uv lockfiles work too), pre-commit hooks, and a starter monitoring script.
Takeaway: Define the bar first--otherwise "works on my machine" becomes the default SLA.
Install and Pin Python the Right Way
Theme: predictable runtimes beat ad-hoc downloads.
Choose one Python distribution method and document it. My recommendation:
- macOS/Linux: Install pyenv so you can pin Python per project.
- Windows: Use the Microsoft Store or Python.org installer plus pyenv-win if you switch versions frequently.
# macOS/Linux example
curl https://pyenv.run | bash
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
exec "$SHELL"
pyenv install 3.11.7
pyenv local 3.11.7
python --version # Python 3.11.7
Document the command set in docs/setup.md, commit .python-version, and add a CI check that fails when someone deviates. Use python -m site to confirm the correct site-packages path--this matters once we create virtual environments and lock dependencies.
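One lightweight way to implement that CI check is a short script that compares the running interpreter against the pinned version. This is a minimal sketch; the path scripts/check_python_version.py is an illustrative choice, not something the layout below requires.
# scripts/check_python_version.py - hypothetical CI guard: fail the build when the
# interpreter does not match the version pinned in .python-version
import sys
from pathlib import Path

def main() -> int:
    pinned = Path(".python-version").read_text(encoding="utf-8").strip()
    running = ".".join(str(part) for part in sys.version_info[:3])
    if running != pinned:
        print(f"Python mismatch: running {running}, pinned {pinned}")
        return 1
    print(f"Python {running} matches .python-version")
    return 0

if __name__ == "__main__":
    raise SystemExit(main())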
Takeaway: Pin Python explicitly so you can trace bugs to code, not runtimes.
Create a Project Skeleton with Virtual Environments and Tooling
Theme: structure is the scaffolding for velocity.
Lay out the repo in a way that anticipates growth:
ai-agent-tutorial/
├── src/
│   └── agent_lab/
│       ├── __init__.py
│       └── cli.py
├── scripts/
│   └── bootstrap.sh
├── tests/
│   └── test_smoke.py
├── .env.example
├── pyproject.toml
├── README.md
└── Makefile
Create a virtual environment and install core tooling (uv or pip, ruff, pytest, pre-commit, python-dotenv, openai). I prefer uv because it resolves dependencies quickly, but plain pip works too.
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install --upgrade pip
pip install ruff pytest pre-commit python-dotenv openai
pip freeze > requirements.txt
Codify everything in scripts/bootstrap.sh so new contributors run one command:
#!/usr/bin/env bash
set -euo pipefail
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
pre-commit install
Takeaway: Bootstrap scripts save hours when onboarding teammates or spinning up CI runners.
Wire Secrets and Environment Variables Safely
Theme: treat secrets like code with policy.
Copy .env.example into .env and keep real keys out of Git. Use python-dotenv in early prototypes, then graduate to Vault or Doppler when you deploy.
OPENAI_API_KEY=sk-your-key
ANTHROPIC_API_KEY=
AGENT_DATA_DIR=.agent_data
Load variables in a single settings.py module:
# src/agent_lab/settings.py
import os
from dataclasses import dataclass
from pathlib import Path

from dotenv import load_dotenv

load_dotenv()

@dataclass(frozen=True)
class Settings:
    openai_api_key: str = os.getenv("OPENAI_API_KEY", "")
    data_dir: Path = Path(os.getenv("AGENT_DATA_DIR", ".agent_data"))

settings = Settings()
settings.data_dir.mkdir(parents=True, exist_ok=True)
Add AGENT_DATA_DIR to .gitignore, run pre-commit hooks that block accidental key commits, and log warnings if keys are missing. This is also where you can seed policy prompts later (e.g., lists of allowed tools).
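For the missing-key warning, one option is a tiny helper appended to settings.py; the validate() name here is illustrative, not a required API.
# src/agent_lab/settings.py (continued) - hypothetical validate() helper that warns
# at import time instead of failing deep inside an agent run
import warnings

def validate(current: Settings) -> None:
    if not current.openai_api_key:
        warnings.warn("OPENAI_API_KEY is not set; live API calls will fail.")

validate(settings)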
Takeaway: Centralize configuration so policy changes happen in one file instead of every script.
Verify the Stack with Smoke Tests and Observability Hooks
Theme: trust comes from evidence.
Create a minimal CLI and smoke test:
# src/agent_lab/cli.py
from openai import OpenAI

from agent_lab.settings import settings

def healthcheck() -> bool:
    if not settings.openai_api_key:
        raise RuntimeError("Missing OPENAI_API_KEY")
    client = OpenAI(api_key=settings.openai_api_key)
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "healthcheck"}],
        max_tokens=10,
    )
    print("Tokens used:", resp.usage.total_tokens)
    return True

if __name__ == "__main__":
    healthcheck()
Add a pytest case to tests/test_smoke.py that mocks OpenAI so CI can run without real calls.
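Here is a minimal sketch of that test, assuming src/ is importable in your test environment (for example via pip install -e . or a conftest.py path tweak) and using pytest's built-in monkeypatch fixture; FakeOpenAI is an illustrative stand-in, not a real library class.
# tests/test_smoke.py - sketch of a smoke test that never touches the network
from types import SimpleNamespace

import agent_lab.cli as cli

class FakeOpenAI:
    """Stands in for the OpenAI client so CI never makes a real API call."""
    def __init__(self, api_key: str):
        self.chat = SimpleNamespace(completions=SimpleNamespace(create=self._create))

    def _create(self, **kwargs):
        # Return just enough structure for healthcheck() to read usage.total_tokens.
        return SimpleNamespace(usage=SimpleNamespace(total_tokens=3))

def test_healthcheck_with_mocked_client(monkeypatch):
    # Swap in fake settings and a fake client class before calling the CLI.
    monkeypatch.setattr(cli, "settings", SimpleNamespace(openai_api_key="test-key"))
    monkeypatch.setattr(cli, "OpenAI", FakeOpenAI)
    assert cli.healthcheck() is True

Record metrics in a simple JSON log: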
# src/agent_lab/logging.py
import json
import time
from pathlib import Path

def emit_metric(name: str, **data) -> None:
    entry = {"ts": time.time(), "metric": name, **data}
    Path("logs").mkdir(exist_ok=True)
    # Append one JSON object per line so the log stays easy to parse and tail.
    with Path("logs/metrics.log").open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
Calling emit_metric("bootstrap.healthcheck", tokens=resp.usage.total_tokens) now gives you a breadcrumb trail for debugging--you can feed these logs into Part 5's deployment pipeline later.
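Because each line is standalone JSON, even a tiny reader script can already answer questions like "how many tokens did health checks burn?". The helper below is a hypothetical addition; scripts/summarize_metrics.py is not part of the layout above.
# scripts/summarize_metrics.py - hypothetical helper that totals tokens per metric name
import json
from pathlib import Path

def summarize(path: str = "logs/metrics.log") -> None:
    log = Path(path)
    if not log.exists():
        print("No metrics recorded yet.")
        return
    totals: dict[str, int] = {}
    for line in log.read_text(encoding="utf-8").splitlines():
        entry = json.loads(line)
        totals[entry["metric"]] = totals.get(entry["metric"], 0) + entry.get("tokens", 0)
    for metric, tokens in totals.items():
        print(f"{metric}: {tokens} tokens")

if __name__ == "__main__":
    summarize()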
Takeaway: Ship tests and metrics with the very first script so reliability culture starts immediately.
Automate Quality Gates with Pre-Commit and Make
Theme: keep the guardrails close to the keyboard.
Install pre-commit hooks that run Ruff, detect secrets, and enforce formatting:
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.6.1
    hooks:
      - id: ruff
  - repo: https://github.com/zricethezav/gitleaks
    rev: v8.18.1
    hooks:
      - id: gitleaks
Wrap routine tasks in a Makefile (or PowerShell script on Windows):
bootstrap:
	sh scripts/bootstrap.sh

lint:
	. .venv/bin/activate && ruff check src tests

test:
	. .venv/bin/activate && pytest -q

healthcheck:
	. .venv/bin/activate && PYTHONPATH=src python -m agent_lab.cli
This gives newcomers a self-documenting command palette and primes the repo for CI/CD down the road.
Takeaway: Automate checks locally so CI becomes enforcement, not discovery.
Troubleshoot Quickly with a Decision Tree
Issues are inevitable; documenting fixes now saves hours later. Create docs/troubleshooting.md with branches such as:
- Python version mismatch? -> Delete .venv, run pyenv local, and reinstall.
- Pip install failing behind a corporate proxy? -> Configure pip.ini/pip.conf with proxy credentials, then rerun bootstrap.
- OpenAI connection errors? -> Confirm API key scope, check billing, and run curl https://api.openai.com/v1/models -H "Authorization: Bearer $OPENAI_API_KEY".
Encourage engineers to append to this document whenever they solve a new problem. This "runbook mindset" will carry through when the agent manages asynchronous workflows in later tutorials.
Takeaway: Treat troubleshooting as shared infrastructure, not tribal knowledge.
Checklist and Next Steps
You've now built a reproducible agent lab that includes:
- Python 3.11 pinned via pyenv or system installer.
- A clean virtual environment with locked dependencies.
- Bootstrap scripts, Make targets, and pre-commit hooks.
- Centralized secrets management and configuration.
- Smoke tests plus metric logging for early observability.
- Troubleshooting playbooks so no one repeats the same pain.
Up next (Part 2): we will turn this scaffolding into a working agent skeleton--stateful CLI, planner interface, and logging that flows into the metrics we just configured. Have your OpenAI (or alternative) credentials ready, and run make healthcheck once more before continuing.
If you want to explore related material meanwhile, skim Agent Observability and Ops for inspiration on what our logs will eventually feed, and Security for Web-Active Agents to understand why secrets discipline matters so much even in day-one prototypes.