AI/ML Platform · AI Startup · 6 weeks

Building Data Infrastructure for an AI Agent Platform

How we helped an AI company gain visibility into their agent performance and reduce debugging time by 80%

Data Pipeline Engineering · Database Architecture · Observability Dashboards
80% Faster Debugging: Reduced time to identify and fix agent issues from hours to minutes
3x Cost Visibility: Full breakdown of token usage, API costs, and compute spend per agent
15 min Real-time Alerts: Issues detected and flagged within 15 minutes of occurrence
$0 Ongoing Tool Costs: Self-hosted infrastructure with no per-seat or usage-based fees

Overview

A fast-growing AI startup was running multiple autonomous agents in production but had zero visibility into what was happening inside them. Logs were scattered, costs were unpredictable, and debugging took hours. We built a unified data pipeline that transformed their raw LangSmith traces into actionable insights.

The Challenge

The client had built a sophisticated multi-agent system handling customer support workflows. Each agent made decisions, called APIs, and processed documents autonomously. The problem? They had no idea what was actually happening.

LangSmith captured traces, but they were just raw JSON dumps nobody looked at. When an agent misbehaved—wrong responses, infinite loops, or unexpected costs—the engineering team would spend 3-4 hours manually digging through logs to understand what went wrong.

Their monthly AI spend was climbing unpredictably. They suspected certain agents were inefficient, but couldn't prove it. Leadership wanted cost attribution per customer, but the data wasn't structured for that.

Key Points

No unified view of agent behavior across the system
Debugging required manual log diving (3-4 hours per incident)
Unpredictable monthly AI costs with no attribution
LangSmith data sitting unused in raw JSON format

Our Approach

We started with a 2-day discovery sprint to understand their agent architecture, existing data sources, and what questions they actually needed answered. From there, we designed a pipeline that would flow data from LangSmith into a queryable analytics layer. The key insight was that they didn't need real-time streaming for everything—most questions could be answered with 15-minute latency. This let us build a simpler, more reliable batch pipeline rather than over-engineering with Kafka.

Key Points

Discovery sprint to map data sources and requirements
Batch pipeline design prioritizing reliability over complexity
15-minute latency SLA—fast enough for alerts, simple enough to maintain
Clear data model designed around their actual questions
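The 15-minute batch cadence reduces to a simple incremental-pull pattern: each run covers the window from the last successful watermark up to the most recent completed interval boundary. A minimal sketch of that window math (the function name and watermark handling are our illustration, not the client's actual code):

```python
from datetime import datetime, timedelta, timezone

def next_pull_window(last_watermark: datetime, now: datetime,
                     interval: timedelta = timedelta(minutes=15)) -> tuple[datetime, datetime]:
    """Return the (start, end) window for the next incremental pull.

    The window starts at the last successful watermark and ends at the
    most recent *completed* interval boundary, so a late or retried run
    simply covers a larger window instead of dropping data.
    """
    # Truncate `now` down to the last completed 15-minute boundary.
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    completed_intervals = (now - epoch) // interval
    end = epoch + completed_intervals * interval
    return last_watermark, end
```

Because the window always starts at the stored watermark, a failed run needs no special recovery: the next run picks up everything the failed one missed.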

The Solution

We built a three-layer architecture:

Data Ingestion: a Python-based ETL pipeline pulls trace data from LangSmith every 15 minutes, normalizes the nested JSON structures, and extracts key metrics including token counts, latencies, tool calls, and error rates.

Storage: split by purpose. Raw traces land in S3 for long-term retention and compliance, while aggregated metrics go into ClickHouse for fast analytical queries. This dual approach keeps storage costs low while enabling sub-second dashboard queries.

Visualization: Grafana dashboards provide real-time visibility into agent performance. Custom panels show token usage trends, error rates by agent type, latency distributions, and cost breakdowns by customer.

Key Points

LangSmith → S3 (raw) + ClickHouse (aggregated)
Python ETL with robust error handling and retry logic
Grafana dashboards with 20+ panels covering all key metrics
Automated alerts via Slack for anomalies and cost spikes

Technical Implementation

The pipeline runs on their existing AWS infrastructure with minimal additional resources. We used Lambda for the ETL jobs (cost-effective for their volume), S3 for raw storage, and a small ClickHouse cluster for analytics. We made several key technical decisions:

ClickHouse over Postgres: their query patterns (time-series aggregations, high-cardinality metrics) are exactly what ClickHouse excels at. Queries that took 30+ seconds in Postgres now complete in under 1 second.

S3 for raw data: compliance required 2-year retention. Storing raw JSON in S3 costs ~$5/month for their volume vs. $200+/month in a database.

Grafana over custom dashboards: it's battle-tested, self-hosted, and their team already knew it. No point reinventing the wheel.

Key Points

AWS Lambda for serverless ETL (pay-per-execution)
ClickHouse for sub-second analytical queries
S3 for compliant long-term storage at $5/month
Grafana with custom panels and Slack alerting
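A cost-spike check like the one behind the Slack alerts can be as simple as comparing the current 15-minute bucket against its historical baseline. A minimal sketch (the threshold logic and parameter values are our assumptions, not the production alerting rules):

```python
from statistics import mean, pstdev

def is_cost_spike(history: list[float], current: float,
                  sigma: float = 3.0, min_points: int = 8) -> bool:
    """Flag the current cost bucket if it exceeds the historical mean
    by `sigma` standard deviations.

    Returns False until enough history has accumulated, so a freshly
    deployed agent doesn't page anyone on its first few buckets.
    """
    if len(history) < min_points:
        return False
    mu = mean(history)
    sd = pstdev(history)
    # Floor the deviation at 5% of the mean so a perfectly flat
    # baseline (sd == 0) doesn't alert on trivial noise.
    threshold = mu + sigma * max(sd, 0.05 * mu)
    return current > threshold
```

The same shape works for error-rate and latency anomalies; only the metric fed into `history` changes.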

The Results

Within two weeks of going live, the client identified two agents that were using 4x more tokens than necessary due to inefficient prompts. Fixing these saved them $3,000/month in API costs—more than the entire project cost. Debugging time dropped dramatically. When an agent started behaving unexpectedly, engineers could now open a dashboard, see exactly what happened (which tools were called, what the LLM returned, where it went wrong), and fix it in minutes instead of hours. Leadership finally got the cost attribution they needed. They could now see AI spend per customer, per agent, per day—enabling better pricing decisions and identifying their most expensive (and profitable) use cases.

Key Points

$3,000/month saved by identifying inefficient agents
Debugging reduced from 3-4 hours to 15-20 minutes
Full cost attribution by customer, agent, and time period
Zero ongoing tool costs (self-hosted infrastructure)
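The cost-attribution rollup behind these numbers is conceptually a group-by over the flattened metric rows. A sketch of the idea, with illustrative per-1K-token prices (a real rollup would use the provider's actual rate card per model):

```python
from collections import defaultdict
from datetime import date

# Illustrative prices per 1,000 tokens; assumptions for this sketch only.
PRICE_PER_1K = {"prompt": 0.01, "completion": 0.03}

def attribute_costs(rows: list[dict]) -> dict[tuple[str, str, date], float]:
    """Roll token usage up into cost per (customer, agent, day)."""
    totals: dict[tuple[str, str, date], float] = defaultdict(float)
    for row in rows:
        key = (row["customer"], row["agent"], row["day"])
        cost = (row["prompt_tokens"] / 1000 * PRICE_PER_1K["prompt"]
                + row["completion_tokens"] / 1000 * PRICE_PER_1K["completion"])
        totals[key] += cost
    return dict(totals)
```

In production this lives as a ClickHouse aggregation feeding a Grafana panel; the Python version just makes the grouping explicit.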

Tech Stack

Python · AWS Lambda · S3 · ClickHouse · Grafana · LangSmith API

We went from flying blind to having complete visibility into our AI systems. The dashboard is now the first thing we check every morning.

Head of Engineering

AI Startup

Want Similar Results?

Let's discuss how we can build the data infrastructure your AI team needs. No sales pitch—just a technical conversation about your challenges.

Book a Call
Response within 24 hours
Primastat | Data Infrastructure & Observability for AI Companies