When a partner company that provided our AI chat product went out of business, we had three months to build a replacement from scratch. I was the sole backend engineer.
This is the story of how it got built — the architecture decisions, the hard problems, and the things I'd do differently.
What We Were Replacing
The existing product gave customers a RAG-powered chat agent that lived on their website. It crawled their site, used it as a knowledge base, and let visitors ask questions. It was simple: one agent, one source, one website. Agents could also trigger actions based on rules, such as displaying a form.
We needed a replacement, and we decided to build something better.
Starting Point
I wasn't starting completely from zero, but close. Someone in the org had written a draft of a page crawler in Node.js. I was able to reuse the API Gateway we'd built for the subject line optimizer. And I had just finished my first project using AWS Lambda and Bedrock Prompt Management, so the patterns were fresh.
Everything else — the architecture, the agent system, the multi-source design, the crawler infrastructure — I designed from scratch.
The Architecture
Here's how a request flows through the system:
The agent lives on the customer's website as an embedded widget. When a visitor opens it, the widget calls an endpoint to fetch the agent's configuration — color, placement, example prompts, which agent to use (for multi-agent support).
When the visitor sends a message, the request invokes a Lambda. The Lambda tracks analytics data, performs visitor resolution (binding the visitor to an existing contact or creating a new one), and retrieves the agent's configured actions: forms, links, and their trigger rules.
All of that context is assembled into the first prompt and sent to the Bedrock Agent, which either triggers an action or queries the knowledge base. The Lambda saves reporting data and returns the response along with a session ID for conversation continuity.
The major components: AWS Lambda, Bedrock Agents, Bedrock Knowledge Bases, PostgreSQL, API Gateway, S3, SQS, and DynamoDB.
The Crawler
The web crawler is what populates the knowledge base. It supports multiple modes — recursive crawling, sitemap parsing, and explicit URL lists. It collects page content and stores it in S3 for the knowledge base to index.
The interesting engineering was in handling the real world. Not every website cooperates with crawlers.
We built a three-step approach: first, attempt a standard crawl. If that fails in ways that suggest bot blocking, retry with a stealth approach. If that also fails, fall back to an external API that can access Cloudflare-protected pages.
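The escalation logic can be sketched like this, with each fetch strategy passed in as a callable (names are hypothetical):

```python
class CrawlBlocked(Exception):
    """Raised when a fetch strategy is refused (403, challenge page, etc.)."""

def fetch_with_fallback(url, standard_fetch, stealth_fetch, external_fetch):
    """Try each strategy in order, escalating only when the previous one is blocked."""
    for fetch in (standard_fetch, stealth_fetch, external_fetch):
        try:
            return fetch(url)
        except CrawlBlocked:
            continue  # escalate to the next, more expensive strategy
    raise CrawlBlocked(f"all strategies failed for {url}")
```

Ordering matters for cost: the standard crawl is cheap, the external API is paid per page, so the chain only escalates when it has to.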
For password-protected sites, users can provide credentials, which we encrypt with KMS before storing. The crawler authenticates, captures the session cookie, and caches the auth info in DynamoDB so it can be shared across Lambdas.
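A sketch of that cookie cache, using a plain dict-like store to stand in for the DynamoDB table (the TTL and function names are assumptions, not the production values):

```python
import time

def get_session_cookie(site_id, cache, authenticate, ttl_seconds=3600):
    """Return a cached session cookie for a site, re-authenticating on a
    cache miss or expiry. `cache` is any dict-like store (a DynamoDB table
    in production); `authenticate` logs in with the decrypted credentials
    and returns a fresh cookie."""
    entry = cache.get(site_id)
    if entry and entry["expires_at"] > time.time():
        return entry["cookie"]  # still valid, every Lambda shares it
    cookie = authenticate(site_id)
    cache[site_id] = {"cookie": cookie, "expires_at": time.time() + ttl_seconds}
    return cookie
```

The point of the shared cache is that only one Lambda pays the login cost; the rest of the fleet reuses the session until it expires.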
The crawler itself is distributed — each Lambda crawls a page, detects URLs, and pushes them onto an SQS queue that triggers more Lambdas. A queuing system prevents multiple crawlers from hitting the same website simultaneously (avoiding 429 rate-limit errors). Sources automatically recrawl every 30 days to keep knowledge fresh, and users can trigger a manual recrawl whenever they update their site.
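The fan-out step depends on deduplicating discovered links before they go onto the queue; roughly (illustrative names, same-host filtering assumed):

```python
from urllib.parse import urljoin, urlparse

def next_urls(page_url, hrefs, seen):
    """Resolve links discovered on a page, keep same-host pages that haven't
    been crawled yet, and mark them seen before they're pushed to SQS."""
    host = urlparse(page_url).netloc
    out = []
    for href in hrefs:
        url = urljoin(page_url, href).split("#")[0]  # resolve relative links, drop fragments
        if urlparse(url).netloc == host and url not in seen:
            seen.add(url)  # claim the URL so no other Lambda enqueues it
            out.append(url)
    return out
```

In production the `seen` set lives in a shared store rather than memory, since each page is crawled by a different Lambda.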
Going Beyond Parity: Many-to-Many
The original product was one agent, one source. We built many-to-many.
A customer can have multiple agents on their website. Agent A might live on a subsection of pages while Agent B covers the rest. They can share the same knowledge base with different configurations, or read from entirely separate sources. A single agent can also pull from multiple sources.
This gives customers real control over how visitors interact with different parts of their website — something the previous product never offered.
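One way to model the many-to-many mapping is a join table where each agent-source link carries its own configuration; a toy sketch with hypothetical IDs and config fields:

```python
# Hypothetical rows from an agent_sources join table:
# (agent_id, source_id, per-link configuration)
AGENT_SOURCES = [
    ("agent-docs",    "source-docs-site", {"top_k": 5}),
    ("agent-docs",    "source-blog",      {"top_k": 3}),
    ("agent-support", "source-docs-site", {"top_k": 8}),
]

def sources_for(agent_id, links=AGENT_SOURCES):
    """All knowledge sources an agent reads from, with its per-link config."""
    return [(source, cfg) for agent, source, cfg in links if agent == agent_id]

def agents_for(source_id, links=AGENT_SOURCES):
    """All agents that share a given source."""
    return [agent for agent, source, _ in links if source == source_id]
```

Putting the configuration on the link rather than the agent or the source is what lets two agents share one knowledge base while behaving differently against it.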
The Hardest Problem: Agent Behavior at Scale
The hardest technical challenge wasn't the infrastructure. It was making agents behave consistently.
Each agent can have a custom voice and tone. Each action can have custom response guidance for when and how it triggers. Getting this right for one customer is straightforward. Getting it right for many customers simultaneously is where it gets dangerous.
I saw the problem early: we'd tune the prompts to work for customers A, B, and C, then customer D's configuration would break something. Fix it for D, and A regresses. Classic prompt engineering whack-a-mole.
To solve this, I built a test framework for the agent. It defines test scenarios — specific inputs, expected behaviors, action triggers — and runs them against the agent systematically. When we make changes to prompt structure or action handling, we run the full suite and catch regressions before they hit production. This turned a fragile, manual process into something we could iterate on with confidence.
Being the Only Backend Engineer
The hardest non-technical challenge was scope. This wasn't one system — it was several: the infrastructure, the API, the crawler, the agent, reporting, analytics, visitor resolution. Each one had its own complexity, and I was the only backend engineer across all of them.
On top of that, I was learning significant parts of the stack on the fly. I'd never built Bedrock Agents or Knowledge Bases before. The crawler was written in Node.js, which I had minimal experience with. I tried converting it to Python early on but realized it would introduce more risk than it eliminated, so I kept it in Node.
This is where AI-assisted development became essential. Claude Code was instrumental throughout the project — translating my architectural intent into Node.js, helping debug unfamiliar runtime behavior, and researching different approaches for agent configuration. I ran multiple Claude Code sessions in parallel, each working on different epics, which let me make progress across the crawler, the API, and the agent simultaneously. It didn't replace the need to understand the system deeply, but it dramatically expanded what one engineer could deliver.
Hitting the Deadline
We shipped on the three-month deadline with full feature parity, better performance than the original product, and the scaffolding for multi-agent/multi-source already in place.
The strategy was focus and discipline. I spent significant time writing tests and performing manual testing throughout — not just at the end. The agent test framework was built early specifically because I knew we couldn't afford regression cycles late in the project. And the parallel Claude Code workflow meant I wasn't blocked on one workstream while another sat idle.
The Production War Story
After all that engineering, the first time we deployed to production and created the first agent, the chat widget didn't load.
We rolled it back immediately and started investigating. It took longer than I'd like to admit to find the root cause: a typo in the configuration. The field was named "style" instead of "styles". A one-character, non-code fix.
Every engineer has a story like this. The system that handles millions of records, processes real-time data, manages distributed crawlers across dozens of Lambdas — brought down by a missing letter.
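One cheap guard against this class of bug is rejecting unknown config keys at write time; a sketch with hypothetical key names:

```python
# Hypothetical set of keys the widget config schema accepts.
ALLOWED_WIDGET_KEYS = {"styles", "placement", "example_prompts", "agent_id"}

def validate_config(config):
    """Reject unknown keys so a typo like "style" fails loudly when the
    config is saved, instead of being silently ignored by the widget."""
    unknown = set(config) - ALLOWED_WIDGET_KEYS
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
```
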
What I'd Do Differently
Use one language for the entire backend. Running Python for the API and Node.js for the crawler works, but it's not ideal. Two dependency ecosystems, two sets of patterns, two mental models. If I were starting over, I'd pick one and commit.
Beyond that, I'd invest more upfront in observability. When you're the only backend engineer, you need the system to tell you what's wrong — you can't be everywhere at once.