Building an AI Agent Sample Project

Building a purely example project is a different kind of challenge. You’re not shipping a feature — you’re trying to compress a learning curve into something someone else can pick up and run with. That’s what this was: a sample AI agent project meant to give teams a starting point, a set of patterns, and enough working code to not be starting from scratch.

Why Python, Why Now

One of the first decisions wasn’t really a decision at all; it was a conclusion we’d already reached. If you’re building AI agents in 2025, you’re using Python. Every SDK, every library, every partner integration we evaluated had first-class Python support. Other languages exist, but you’re constantly working against the current. With multiple teams across the company using different languages and frameworks, there was a real desire to standardize at least something, and AI tooling was the clearest case for it. Also, I love Python, so I won’t argue against its use.

So: Python as the base. Everything else flowed from there.

Picking the Right Orchestration Layer

Once Python was settled, the next question was how to manage LLM interactions and agent logic cleanly. I landed on LangGraph. It had the market adoption and the track record, and it’s flexible enough to handle the range of things a real project might need without boxing you in. I also knew from the start that observability wasn’t optional: being able to trace what an agent is actually doing matters a lot when you’re trying to debug or evaluate behavior. LangGraph made both the orchestration and the observability easier.

Tools like Maxim AI and Confident AI integrate naturally when you’re using LangChain’s abstractions, and that unlocked a lot more out of the box than I expected. A lot of example code floating around assumes you’re calling OpenAI directly, but we’re routing through Amazon Bedrock — and that mismatch made some integrations more painful than they needed to be at first. Once I leaned into LangChain’s Bedrock support, the tracing story got much cleaner.

Worth being specific about what “cleaner” actually means here: inputs, outputs, tool calls, intermediate reasoning steps, and per-call token spend — all logged and traced, without writing a bunch of custom helper code. Early on, some colleagues had been skeptical and had started vibe-coding helper functions just to get basic tracing in place. Once LangGraph was wired up properly, I could show that the full picture was already there. No custom wrappers needed.
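The tracing integration reports per-call token spend for you, but the arithmetic behind that number is worth seeing once. The sketch below is illustrative only: `CallUsage`, `TransactionLedger`, and both price constants are hypothetical names and placeholder values, not real Bedrock pricing and not code from the project.

```python
from dataclasses import dataclass, field

# Placeholder prices in USD per 1,000 tokens. These are assumptions for
# illustration, NOT real Bedrock pricing.
PRICE_PER_1K_INPUT = 0.003
PRICE_PER_1K_OUTPUT = 0.015


@dataclass
class CallUsage:
    """Token counts for a single model call (hypothetical structure)."""
    input_tokens: int
    output_tokens: int

    @property
    def cost_usd(self) -> float:
        # Input and output tokens are priced separately.
        return (self.input_tokens / 1000 * PRICE_PER_1K_INPUT
                + self.output_tokens / 1000 * PRICE_PER_1K_OUTPUT)


@dataclass
class TransactionLedger:
    """Collects every model call made while handling one user request."""
    calls: list = field(default_factory=list)

    def record(self, input_tokens: int, output_tokens: int) -> CallUsage:
        usage = CallUsage(input_tokens, output_tokens)
        self.calls.append(usage)
        return usage

    @property
    def total_cost_usd(self) -> float:
        # Cost of the transaction is the sum over its individual calls.
        return sum(call.cost_usd for call in self.calls)


ledger = TransactionLedger()
ledger.record(input_tokens=1000, output_tokens=200)  # e.g. the tool-selection call
ledger.record(input_tokens=500, output_tokens=100)   # e.g. the final answer call
print(f"calls: {len(ledger.calls)}, cost: ${ledger.total_cost_usd:.4f}")
```

A single user request usually involves more than one model call once tools are in play, which is why the per-transaction number is a sum rather than a single lookup.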

Keeping the Agent Simple on Purpose

The agentic part of the project is deliberately minimal. The agent asks the user for a math problem, calls a Python tool to solve it, and returns the answer in plain English. That’s it.
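To make “a Python tool to solve it” concrete, here is a minimal sketch of what such a tool could look like. It is not the project’s actual tool: the name `solve_math` and the choice to support only basic arithmetic are assumptions. It parses the expression with the standard library’s `ast` module and evaluates a small whitelist of operators instead of calling `eval`, so the model can’t use the tool to run arbitrary code.

```python
import ast
import operator
from typing import Union

Number = Union[int, float]

# Whitelist of operators the tool is willing to evaluate.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _evaluate(node: ast.AST) -> Number:
    """Recursively evaluate a parsed expression, rejecting anything non-arithmetic."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body)
    # type(...) check (not isinstance) so that booleans are rejected too.
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_evaluate(node.operand))
    raise ValueError("unsupported expression")


def solve_math(expression: str) -> Number:
    """Hypothetical tool: evaluate an arithmetic expression such as '2 * (3 + 4)'."""
    try:
        tree = ast.parse(expression, mode="eval")
    except SyntaxError as exc:
        raise ValueError("not a valid arithmetic expression") from exc
    return _evaluate(tree)


print(solve_math("2 * (3 + 4)"))
```

In a LangGraph setup, a function like this would typically be exposed to the model as a tool (for example via LangChain’s `@tool` decorator); that wiring is left out here so the sketch runs with the standard library alone.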

That simplicity was intentional. The point wasn’t to build something impressive — it was to build something instructive. A math solver is approachable enough that anyone can follow the logic, while still demonstrating tool calling in a realistic way. I also used it to show how guardrails work in practice: the agent validates that the input is actually a math problem, and it’s constrained to return the answer in plain English regardless of what the user tries to get it to do — different language, different format, different topic entirely. The guardrails hold.
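The post doesn’t spell out how the input check is implemented, so treat the following as one possible shape for it and not as the project’s guardrail. It is a cheap, deterministic pre-check that runs before any model call; the heuristic (a digit plus an operator or math keyword), the names `looks_like_math` and `guarded_reply`, and the refusal text are all assumptions. A real guardrail would likely pair something like this with instructions in the system prompt.

```python
import re
from typing import Callable

# Assumption: a "math problem" contains a digit and either an operator
# symbol or a common math word. This is a rough heuristic, not a parser.
_HAS_DIGIT = re.compile(r"\d")
_HAS_OPERATOR = re.compile(r"[+\-*/^=]")
_MATH_KEYWORDS = re.compile(
    r"\b(plus|minus|times|divided|sum|product|square root|percent)\b",
    re.IGNORECASE,
)

REFUSAL = "I can only help with math problems. Please give me one to solve."


def looks_like_math(user_input: str) -> bool:
    """Return True if the input plausibly contains a math problem."""
    if not _HAS_DIGIT.search(user_input):
        return False
    return bool(_HAS_OPERATOR.search(user_input) or _MATH_KEYWORDS.search(user_input))


def guarded_reply(user_input: str, answer_fn: Callable[[str], str]) -> str:
    """Only hand the input to the agent (answer_fn) if it passes the check."""
    if not looks_like_math(user_input):
        return REFUSAL
    return answer_fn(user_input)


# Stand-in for the real agent call, so the sketch runs on its own.
fake_agent = lambda text: "The answer is eighty-four."

print(guarded_reply("What is 12 * 7?", fake_agent))
print(guarded_reply("Write me a poem about spring", fake_agent))
```

The first call reaches the stand-in agent; the second is refused before any model is invoked, which also keeps off-topic requests from costing tokens.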

The goal was always to show the full picture in a context simple enough to actually understand: here’s how you call an LLM, here’s how you wire up a tool, here’s how you add guardrails, here’s how you observe all of it.

What Didn’t Work: Monorepos and Local LLMs

Two branches of work didn’t survive contact with reality. First, I tried restructuring the project as a monorepo, thinking it would make the sample more useful to teams that work that way. It made everything more complicated and less readable, which is the opposite of what an example project should do. I rolled it back. Simpler is more instructive.

Second, early on I didn’t have the permissions yet to call models through Bedrock. Getting those set up took longer than expected. In the meantime, I spun up a local LLM to keep making progress — and getting that running on Windows was a genuine pain. It worked eventually, and it let me keep moving. But once Bedrock access was in place, local model support got relegated to a branch. It’ll probably stay there.

Both felt like detours at the time. In retrospect, they were useful — you learn what to cut by trying it first.

The Timeline

Six weeks total, though that number comes with an asterisk. I was splitting time with other work throughout, so it was closer to three weeks of actual focus. The first couple of weeks got something functional. The remaining time was spent getting everything to a point where it covered the full range of things a team would actually need: tool calling, observability and tracing, cost tracking per transaction, structured logging, guardrails, and basic conversational interaction with the model.

Where Things Landed

So far, my team is the primary audience. I built it, and I’m also the one building the team’s first real AI project, so the learning has been put to use immediately. Other teams, for now, are sticking with their existing stacks — and I get it, even if I’d argue it’s the harder path long-term. Making HTTP calls from PHP to interact with an LLM is a choice. It’s not one I’d make, but people have opinions.

The project will be there when they’re ready.

What It Taught Me

Example projects are deceptively hard. You’re not just writing code — you’re writing code that teaches. Every decision about what to include, what to simplify, and what to leave out is a small act of communication. Getting that right took longer than I expected, and I’m not sure I fully got it right even now.

But the foundation is solid. And that’s what it was always supposed to be.