Customer Intelligence AI Agent

An AI system that pulls call transcripts into per-client vector search and answers questions about clients from Slack.

n8n, Claude, Supabase, OpenAI Embeddings, Slack, Fathom

2025-12-15

Overview

Client-facing teams at Balboa spend real time on a task that doesn't require any judgment: finding out what a client said last month or two calls ago. Someone asks in Slack whether a client ever mentioned X, and the answer exists. It's in a Fathom transcript somewhere. Getting to it means scrolling through Fathom, reading through notes, or pinging whoever was on the call.

I spent my December 2025 break building a system to automate that. Balboa is my dad's company, so I had direct access to the users and a live environment to deploy into. About 30 hours of work over two weeks. I treated it as a chance to build something that actually runs in production, not a demo that works once.

What I Built

Two pieces.

The first is a pipeline that pulls new call transcripts from Fathom every hour, embeds them, and writes them into a vector database tagged by client. The second is a Slack bot that listens for mentions in any internal channel, retrieves relevant context from the vector store, and answers questions through Claude. The bot runs across Balboa's internal client channels, and the pipeline processes 100 to 200 transcripts a week.

The bot can draft emails and proposals when asked, but it can't send anything. Draft-only was a deliberate choice. An LLM with send permissions is a liability waiting for the wrong context window.

Architecture

Both halves run on n8n, which handles the scheduling, webhooks, and API calls between services.

Ingestion flow:

Hourly scheduled trigger
n8n pulls new transcripts from Fathom's API
Each transcript gets hashed and checked for duplicates
New transcripts are embedded with OpenAI's text embedding model
Embeddings go to Supabase, tagged by client

Query flow:

Slack bot listens for @-mentions
Mention hits an n8n webhook
The channel's client tag filters the vector search
Top matches and the question get sent to Claude
Claude's answer posts back to the thread
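
The query flow above can be sketched in a few lines of Python. This is an illustration, not the n8n workflow itself: the in-memory row list stands in for the Supabase table, and the names Chunk, retrieve, build_prompt, and CHANNEL_TO_CLIENT are all hypothetical. The point it shows is the order of operations: filter by the channel's client tag first, then rank by similarity, then hand the top matches and the question to the model.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Chunk:
    client: str             # client tag stored on every row
    text: str               # transcript excerpt
    embedding: list[float]  # vector produced at ingestion time


# Hypothetical mapping from Slack channel name to client tag.
CHANNEL_TO_CLIENT = {"client-acme": "acme", "client-globex": "globex"}


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(rows: list[Chunk], client: str,
             query_embedding: list[float], k: int = 3) -> list[Chunk]:
    """Keep only this client's rows, then return the k most similar."""
    candidates = [r for r in rows if r.client == client]
    ranked = sorted(candidates,
                    key=lambda r: cosine(r.embedding, query_embedding),
                    reverse=True)
    return ranked[:k]


def build_prompt(question: str, chunks: list[Chunk]) -> str:
    """Assemble the retrieved context and the question into one prompt."""
    context = "\n---\n".join(c.text for c in chunks)
    return f"Context from call transcripts:\n{context}\n\nQuestion: {question}"
```

Filtering before ranking is what keeps one client's transcripts out of another client's answers, even when the wording of two calls is nearly identical.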

Vector Store Migration

The first version used OpenAI's hosted vector stores, one per client. That worked, but every new client meant provisioning a new store and wiring it into the workflow. More stores, more IDs to track, more cost.

I migrated everything to Supabase with a single vector database and a client tag on every row. One place to query, one place to back up. Adding a new client is now adding a tag, not a store. It's cheaper to run and easier to extend if we ever want to search across clients.

The Duplicate Transcript Problem

Early on I wasn't hashing transcripts before ingesting them. Fathom's API returned transcripts in batches, and the hourly scheduler would sometimes pull overlapping windows, or a transcript would get re-processed after a retry. The same conversation ended up in the vector store three or four times. A query would pull back four near-identical chunks from one call, crowding out context from other calls.

The fix was a content hash on every transcript. Before embedding, n8n computes the hash and checks it against the hashes already in Supabase. If it matches, the transcript is skipped. Duplicates went to zero.
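
A minimal sketch of that check in Python, under some assumptions: SHA-256 as the hash, whitespace normalization before hashing, and a plain set standing in for the hashes stored in Supabase. The real workflow does this inside n8n; the function names and the embed and store callables here are hypothetical placeholders.

```python
import hashlib


def transcript_hash(text: str) -> str:
    """Content hash of a transcript.

    Collapsing whitespace first is an assumption made for this sketch,
    so trivial formatting differences do not defeat the check.
    """
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def ingest(transcripts, seen_hashes, embed, store):
    """Embed and store only transcripts whose hash has not been seen.

    transcripts: iterable of dicts with "client" and "text" keys
    seen_hashes: set of hashes already stored (stand-in for a DB lookup)
    embed:       callable(text) -> vector
    store:       callable(client, text, vector, digest)
    Returns the number of transcripts actually ingested.
    """
    ingested = 0
    for transcript in transcripts:
        digest = transcript_hash(transcript["text"])
        if digest in seen_hashes:
            continue  # duplicate: skip before paying for an embedding
        store(transcript["client"], transcript["text"],
              embed(transcript["text"]), digest)
        seen_hashes.add(digest)
        ingested += 1
    return ingested
```

Because the check runs before embedding, re-running the same batch after a retry or an overlapping window is a no-op, which is what makes the ingestion step idempotent.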

I didn't know I needed to solve this until it broke. Tutorials show the happy path where every document gets ingested once. Real pipelines run against real APIs with retries and overlapping schedules. Idempotency has to be built in from the start.

Logging

Every step writes to a log file. When something fails, the log shows where it broke and what data was in flight. Without logs you re-run the workflow and hope it fails the same way. With logs the failure tells you where to look.
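
As an illustration of the idea, here is one way to log each step with the data in flight, written in Python rather than as n8n nodes. The logged_step helper, the logger name, and the context keys are hypothetical. It uses only the standard logging module; pointing it at a file is a one-line logging.basicConfig(filename=..., level=logging.INFO) call at startup.

```python
import json
import logging
from contextlib import contextmanager

logger = logging.getLogger("pipeline")


@contextmanager
def logged_step(step: str, **context):
    """Log the start, success, or failure of one pipeline step.

    The keyword arguments are the data in flight (IDs, client tags,
    counts). They are serialized once and attached to the start and
    failure records, so a failure shows both where and on what.
    """
    payload = json.dumps(context, default=str)
    logger.info("start step=%s context=%s", step, payload)
    try:
        yield
    except Exception:
        # logger.exception records the traceback along with the message.
        logger.exception("failed step=%s context=%s", step, payload)
        raise
    else:
        logger.info("done step=%s", step)
```

Wrapping a step then looks like `with logged_step("embed", transcript_id="t-1", client="acme"): ...`, and the exception is re-raised after logging so the scheduler's default retry behavior still applies.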

I didn't build a retry system beyond what n8n gives you by default. The system is simple enough that most failures are recoverable once you can see what they were.

Why It Was Worth Building

Most AI side projects solve problems nobody has. This one replaced a task people were doing by hand every day: pulling up what a client said on a recent call. The AI isn't the value. Being faster than scrolling Fathom, and being in the place where the question was already being asked (Slack), is the value.

Building something a team uses daily because it's faster than the manual version is a different problem than building something that works. You have to know what the manual version is, where it breaks, and where the user is when they need the answer. The Claude plus RAG plus Slack integration part is standard. The non-obvious part was deciding what to automate and what to leave alone. Drafting emails is safe. Sending them is not, because the cost of the AI sending a wrong email to a client is much higher than the time it would save.

What I'd Change

Hourly scheduled pulls are the weakest part of the current design. If Fathom publishes webhooks, a push-based flow would cut latency from up to an hour down to seconds, and would remove the overlapping-window problem that caused the duplicate bug in the first place. The hash check is worth keeping as a backstop, but event-driven ingestion is the cleaner setup.

Status

Live. Running daily in Balboa's internal Slack channels, ingesting 100 to 200 transcripts a week, and used by the team for real client questions.
