TL;DR: Everyone focuses on OpenAI's $2.50/$10 per million tokens. But API costs are only 30-40% of the true cost of running an AI product. Infrastructure (AWS/GCP), monitoring tools, caching overhead, and failed requests make up the remaining 60-70%, so your true bill is roughly 2.5-3x your API spend. Here's the complete breakdown with real numbers.
The AI Cost Iceberg
You see the API pricing:
- GPT-4o: $2.50 input / $10 output per million tokens
- Claude 3.5 Sonnet: $3 input / $15 output per million tokens
- Gemini 2.5 Pro: $1.25 input / $10 output per million tokens
You calculate your monthly OpenAI bill: $10,000/month.
You think: "Our AI costs are $10K/month. Manageable."
You're missing $15,000-20,000.
Here's what you're missing:
| Cost Category | % of Total | Monthly $ (if API = $10K) |
|---|---|---|
| API Costs (OpenAI/Anthropic) | 30-40% | $10,000 (visible) |
| Infrastructure (AWS/GCP/Azure) | 40-50% | $12,000-15,000 (often untracked) |
| Monitoring & Observability | 5-10% | $1,500-3,000 (often untracked) |
| Caching & Storage | 3-5% | $900-1,500 (often untracked) |
| Failed Requests & Retries | 2-3% | $600-900 (often untracked) |
| Development & Testing | 3-5% | $900-1,500 (often untracked) |
| TOTAL COST | 100% | $25,900-32,900 |
Your $10K/month API bill is actually a $26K-33K/month total cost.
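The table's shares can be turned into a quick estimator for your own bill. A minimal sketch in JavaScript (the 30-40% API share is the table's assumption, not a measured constant):

```javascript
// Rough total-cost estimate from a visible API bill, using the cost shares
// in the table above. Illustrative multipliers, not measured constants.
function estimateTrueMonthlyCost(apiBill) {
  const apiShareHigh = 0.40; // API is at most 40% of total -> total >= bill / 0.40
  const apiShareLow = 0.30;  // API is at least 30% of total -> total <= bill / 0.30
  return {
    low: apiBill / apiShareHigh,
    high: apiBill / apiShareLow,
  };
}

const { low, high } = estimateTrueMonthlyCost(10_000);
// low = $25,000, high ≈ $33,333 -- in line with the table's $25.9K-32.9K total
```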
It's how Cursor's AWS bill doubled from $6.2M to $12.6M/month even while the company was optimizing its API usage.
Breaking Down the Hidden 70%
1. Infrastructure Costs (40-50% of total)
What you're paying for:
- Compute: Application servers, background workers, queue processors
- Databases: PostgreSQL/MySQL for usage tracking, vector databases for embeddings
- Storage: Conversation history, generated content, user data, backups
- Networking: Data transfer, load balancers, CDN
- Container Orchestration: ECS/EKS, auto-scaling groups
Real Example: Cursor's Infrastructure Bill
| Month | AWS Bill | API Costs (est.) | AWS / API |
|---|---|---|---|
| May 2025 | $6.2M | ~$8M | 77% |
| June 2025 | $12.6M | ~$16M | 79% |
Cursor's AWS costs = 77-79% of their Anthropic bill.
Why so high?
- Massive conversation history storage (200K token context windows)
- Real-time collaboration infrastructure
- Code indexing and vector search
- Distributed caching layer
Your Infrastructure Will Cost:
| Company Stage | Monthly Active Users | Infrastructure Cost |
|---|---|---|
| Early Stage | 100-1,000 | $500-2,000/month |
| Growth | 1,000-10,000 | $2,000-10,000/month |
| Scale | 10,000-100,000 | $10,000-50,000/month |
| Enterprise | 100,000+ | $50,000-500,000/month |
Rule of thumb: Infrastructure typically runs 50-80% of your API costs, and can match or exceed them at scale.
2. Monitoring & Observability (5-10% of total)
What you need to track:
- Usage monitoring: LangSmith, Helicone, Langfuse
- Application performance: Datadog, New Relic, Sentry
- Cost tracking: Custom dashboards, alerting systems
- User analytics: Mixpanel, Amplitude, PostHog
Cost Breakdown:
| Tool | Purpose | Monthly Cost | Per-User Cost |
|---|---|---|---|
| LangSmith | LLM observability | $0-200 base + $0.50/1K traces | Variable |
| Helicone | LLM cost tracking | $50-500/month | $0.05-0.50 per user |
| Datadog | Infrastructure monitoring | $15/host + $0.10/GB logs | $500-5,000/month |
| Sentry | Error tracking | $26-80/month | $0.50-2 per user |
| PostHog | Product analytics | $0-450/month + usage | Variable |
At 10,000 users:
- LangSmith: ~$500/month (moderate usage)
- Helicone: ~$300/month
- Datadog: ~$2,000/month (10 hosts, logs)
- Sentry: ~$500/month
- PostHog: ~$200/month
Total monitoring cost: $3,500/month (roughly 11-13% of a $26K-33K total cost)
Why monitoring is essential: without it, cost attribution is incomplete:
- Can't identify which customers have highest cost-to-serve
- Don't know which features have highest costs
- Miss API errors that add to budget
- Can't optimize prompts or models
- No alerts when costs increase unexpectedly
See our analysis: Usage Variance in AI Products - shows the impact of tracking per-customer costs.
3. Caching & Storage (3-5% of total)
What gets stored:
- Prompt caching: Anthropic/OpenAI cache storage fees
- Conversation history: Database and object storage
- Generated content: S3/GCS for user outputs
- Vector embeddings: Pinecone, Weaviate, or self-hosted
- User uploads: Files, images, documents
Caching Costs (Often Overlooked):
Prompt caching isn't free—you pay to write to the cache:
| Provider | Cache Write Cost | Cache Read Discount | Break-Even |
|---|---|---|---|
| Anthropic | 1.25x input price | 90% discount | Need 2+ hits |
| OpenAI | 1x input price | 50% discount | Need 2+ hits |
Example: Anthropic Claude 3.5 Sonnet
- Normal input: $3/1M tokens
- Cache write: $3.75/1M tokens (25% premium)
- Cache read: $0.30/1M tokens (90% discount)
Math:
Cost to cache 1M tokens = $3.75
Cost to read 1M tokens from cache (8 times) = 8 × $0.30 = $2.40
Total cost = $6.15
Without caching (8 reads) = 8 × $3 = $24
Savings = $17.85 (74% cheaper)
BUT: if the cache is hit only 2 times:
Total cost = $3.75 + (2 × $0.30) = $4.35
Without caching = 2 × $3 = $6
Savings = $1.65 (27% cheaper)
When caching INCREASES costs:
- Low cache hit rate (< 50%)
- Prompts change frequently
- Context invalidated often
- Parallel requests create separate caches
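The break-even math above generalizes to any write premium and read discount. A small sketch, with multipliers following the table's Anthropic and OpenAI figures:

```javascript
// Compare cached vs. uncached cost for a prompt prefix reused `hits` times.
// writeMult / readMult are multiples of the normal input price
// (Anthropic: 1.25 write / 0.10 read; OpenAI: 1.0 write / 0.50 read).
function cachingCost(tokensM, pricePerM, hits, writeMult, readMult) {
  const cached = tokensM * pricePerM * (writeMult + hits * readMult);
  const uncached = tokensM * pricePerM * hits;
  return { cached, uncached, savings: uncached - cached };
}

// Claude 3.5 Sonnet, 1M-token prefix at $3/1M input, read 8 times:
const r8 = cachingCost(1, 3, 8, 1.25, 0.10);
// cached ≈ $6.15, uncached = $24, savings ≈ $17.85

// Only 2 hits: savings shrink, but caching still wins.
const r2 = cachingCost(1, 3, 2, 1.25, 0.10);
// cached ≈ $4.35, uncached = $6, savings ≈ $1.65
```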
Storage Costs:
| Storage Type | Use Case | Monthly Cost (10K users) |
|---|---|---|
| PostgreSQL | Usage tracking, user data | $200-500 |
| S3 | Generated content, uploads | $100-300 |
| Vector DB | Embeddings, semantic search | $200-1,000 |
| Redis | Session data, rate limiting | $100-400 |
Total storage: $600-2,200/month
4. Failed Requests & Retries (2-3% of total)
Hidden costs from failures:
- Rate limit errors: Exponential backoff = multiple attempts
- Timeout retries: Long requests that fail = wasted tokens
- Model errors: OpenAI/Anthropic 500 errors = retry logic
- Client errors: Bad requests that burn tokens before failing
Real Data:
AI applications in production commonly see:
- 5-10% rate limit errors during peak usage
- 2-5% timeout failures on long contexts
- 1-2% provider errors (OpenAI/Anthropic outages)
Cost Impact:
If 8% of requests fail and retry 3 times on average:
Failed requests = 8% of total
Average retries = 3
Wasted cost = 8% × 3 = 24% extra API spend
On a $10K/month API bill:
- Wasted on failed requests: $2,400/month
- Most companies don't even track this
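The waste estimate above is a one-line calculation you can run against your own numbers. A sketch (the failure rate and retry count are the illustrative figures from the text):

```javascript
// Extra API spend burned on failed requests that get retried.
// failRate: fraction of requests that fail; avgRetries: attempts per failure.
function retryWaste(monthlyApiBill, failRate, avgRetries) {
  return monthlyApiBill * failRate * avgRetries;
}

// 8% failures, 3 retries each, on a $10K/month bill:
retryWaste(10_000, 0.08, 3); // ≈ $2,400/month wasted
```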
How to minimize:
- Implement request caching (avoid redundant calls)
- Set aggressive timeouts (don't wait 60s for failures)
- Use exponential backoff with jitter
- Monitor error rates by endpoint
- Alert on unusual retry spikes
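The backoff-with-jitter advice above can be sketched as a small retry wrapper; this is a minimal illustration, and the request function passed in is a hypothetical placeholder:

```javascript
// Retry with exponential backoff and full jitter. `fn` is any async request
// function (hypothetical placeholder). Blind retries multiply token spend,
// so cap attempts and fail fast on client errors.
async function withBackoff(fn, { maxAttempts = 4, baseMs = 250 } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Don't retry 4xx errors: a bad request will burn tokens every time.
      if (err.status >= 400 && err.status < 500) throw err;
      if (attempt >= maxAttempts) throw err;
      // Full jitter: sleep a random duration up to the exponential cap.
      const capMs = baseMs * 2 ** (attempt - 1);
      const delay = Math.random() * capMs;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Capping attempts bounds the worst-case waste: with `maxAttempts = 4`, a persistently failing request costs at most 4x its normal tokens instead of retrying forever.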
5. Development & Testing (3-5% of total)
What gets forgotten:
- Local development: Developers testing with real APIs
- CI/CD testing: Integration tests hitting OpenAI
- Staging environments: Pre-production usage
- A/B tests: Running experiments with multiple models
- Prompt engineering: Iterating on prompts costs tokens
Typical Waste:
| Activity | Monthly Cost | Avoidable? |
|---|---|---|
| Dev testing | $500-2,000 | Partially (use mock APIs) |
| CI/CD tests | $200-800 | Yes (use fixtures) |
| Staging usage | $300-1,000 | Partially (limit to QA) |
| Prompt iteration | $400-1,500 | No (necessary cost) |
Total dev overhead: $1,400-5,300/month
Best practices:
- Use mock LLM responses for unit tests
- Cache responses for integration tests
- Limit staging to final QA only
- Track dev vs. prod usage separately
- Avoid testing in production environments
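One way to keep dev and CI spend near zero is a drop-in mock that mimics the chat-completions call shape. A sketch: the response shape mirrors the OpenAI format, but the canned content and field choices here are illustrative assumptions:

```javascript
// Minimal mock of a chat-completions client for unit tests: same call shape,
// canned responses, zero API spend. Response fields mirror the OpenAI format,
// but the content is made up for testing.
function createMockLLM(cannedReply = "mock response") {
  const calls = [];
  return {
    calls, // inspect what your code sent, without hitting a real API
    chat: {
      completions: {
        async create(params) {
          calls.push(params);
          return {
            choices: [{ message: { role: "assistant", content: cannedReply } }],
            usage: { prompt_tokens: 0, completion_tokens: 0 },
          };
        },
      },
    },
  };
}
```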
Real-World TCO Example: AI Chatbot
- Product: Customer support AI chatbot
- Scale: 10,000 active users, 500K conversations/month
- Model: GPT-4o for accuracy
API Costs (54% of total)
Average conversation:
- Input: 2,000 tokens (context + history)
- Output: 500 tokens (response)
Monthly usage:
- Input: 500K conversations × 2,000 tokens = 1B tokens
- Output: 500K conversations × 500 tokens = 250M tokens
API costs:
- Input: 1B tokens × $2.50/1M = $2,500
- Output: 250M tokens × $10/1M = $2,500
Total API: $5,000/month
Infrastructure Costs (20% of total)
AWS costs:
- ECS Fargate (5 tasks): $350/month
- PostgreSQL RDS (db.r5.large): $450/month
- Redis ElastiCache: $150/month
- S3 storage (conversations): $200/month
- Data transfer: $300/month
- CloudWatch logs: $150/month
- Load balancer: $250/month
Total infrastructure: $1,850/month
Monitoring & Tools (12% of total)
Observability:
- LangSmith: $200/month
- Datadog: $800/month
- Sentry: $100/month
Total monitoring: $1,100/month
Storage & Caching (6% of total)
Additional storage:
- Vector database (Pinecone): $300/month
- Conversation archives (S3): $200/month
- Cache storage (Anthropic): $100/month
Total storage: $600/month
Failed Requests (8% of total)
Error rate: 7% with 2 avg retries
Wasted API costs: $5,000 × 7% × 2 = $700/month
Total Cost of Ownership
| Category | Cost | % of Total |
|---|---|---|
| API Costs | $5,000 | 54% |
| Infrastructure | $1,850 | 20% |
| Monitoring | $1,100 | 12% |
| Storage | $600 | 6% |
| Failed Requests | $700 | 8% |
| TOTAL | $9,250 | 100% |
True cost per customer: $9,250 / 10,000 = $0.93/month
If you're charging $15/month, your gross margin is:
Margin = ($15 - $0.93) / $15 = 93.8%
But if you only tracked API costs ($5K):
Assumed cost per customer = $0.50/month
Assumed margin = ($15 - $0.50) / $15 = 96.7% (overstated)
The roughly 3-point margin error is significant at scale:
- At $1M ARR: $30K/year difference
- At $10M ARR: $300K/year difference
- At $100M ARR: $3M/year difference
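The whole worked example reduces to a few lines of arithmetic; a sketch summing the itemized figures above:

```javascript
// Total cost of ownership for the example chatbot, from the itemized figures.
const costs = {
  api: 5000,            // GPT-4o input + output
  infrastructure: 1850, // ECS, RDS, Redis, S3, transfer, logs, load balancer
  monitoring: 1100,     // LangSmith, Datadog, Sentry
  storage: 600,         // vector DB, archives, cache storage
  failedRequests: 700,  // 7% errors x 2 retries on the API bill
};

const total = Object.values(costs).reduce((sum, c) => sum + c, 0); // 9250
const users = 10_000;
const costPerUser = total / users; // 0.925/month

const price = 15;
const margin = (price - costPerUser) / price; // ~0.938

// Tracking only API costs overstates the margin:
const apiOnlyMargin = (price - costs.api / users) / price; // ~0.967
```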
Why This Matters: Cursor's Wake-Up Call
Cursor hit $500M ARR but discovered their AWS costs were 79% of Anthropic costs.
Their assumed economics (at $500M ARR, roughly $41.7M/month in revenue):
Anthropic API: $16M/month
Assumed total cost: ~$18M/month (~90% API)
Assumed gross margin: ~57%
Their actual economics:
Anthropic API: $16M/month
AWS infrastructure: $12.6M/month
Total cost: $28.6M/month
Actual gross margin: ~31%
The roughly 26-point margin difference led to:
- Multiple repricing cycles (4 times in 12 months)
- Usage limits (customer communication required)
- $200/month Ultra tier
- June 2025 pricing adjustments ($71 single-day charges reported)
Visibility into true costs earlier enables better pricing decisions.
How to Track Your True TCO
Week 1: Audit Current Costs
API Costs:
- Export billing from OpenAI, Anthropic, etc.
- Break down by endpoint/model/feature
- Identify highest-cost API calls
Infrastructure:
- Review AWS/GCP/Azure bills
- Tag resources by service (app, db, storage, etc.)
- Allocate shared costs (networking, monitoring)
Tools & Services:
- List all SaaS tools (monitoring, analytics, etc.)
- Calculate per-user cost for each
- Identify unused or redundant tools
Week 2: Build Cost Attribution
Tag everything:
// Example: tag API calls with metadata for per-customer cost attribution
// (with the OpenAI Node SDK, metadata on chat completions requires store: true)
import OpenAI from "openai";

const openai = new OpenAI();

await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [...],
  store: true,
  metadata: {
    customer_id: "cust_123",
    feature: "chat",
    tier: "pro"
  }
});
Track infrastructure by customer:
- Database queries per user
- Storage usage per user
- Cache hit rates per feature
- API error rates by customer
Calculate true cost-to-serve:
Cost-to-Serve = (
API Costs +
(Infrastructure × User%) +
(Monitoring / Total Users) +
(Storage × User%) +
(Failed Requests × User%)
) per customer
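That formula can be sketched in code. The allocation keys here (request share for infrastructure and storage, an even split for monitoring) are illustrative assumptions, not the only way to allocate shared costs:

```javascript
// Per-customer cost-to-serve: direct API spend plus allocated shared costs.
// usageShare: this customer's fraction of total requests (illustrative
// allocation key); monitoring is split evenly across all users here.
function costToServe({ apiCost, usageShare, totals, totalUsers }) {
  return (
    apiCost +
    totals.infrastructure * usageShare +
    totals.monitoring / totalUsers +
    totals.storage * usageShare +
    totals.failedRequests * usageShare
  );
}

// A customer driving 2% of traffic, on the example chatbot's cost base:
costToServe({
  apiCost: 100,
  usageShare: 0.02,
  totals: { infrastructure: 1850, monitoring: 1100, storage: 600, failedRequests: 700 },
  totalUsers: 10_000,
}); // ≈ 100 + 37 + 0.11 + 12 + 14 ≈ $163/month
```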
Week 3: Set Up Alerts
Cost spike alerts:
- API spend > 20% above baseline
- Infrastructure costs trending up
- Individual customer > $X threshold
- Failed request rate > 10%
Margin alerts:
- Gross margin drops below target
- High-cost customers exceed plan value
- Free tier abuse detection
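The spike alerts above amount to simple threshold checks against a rolling baseline; a sketch using the 20% threshold from the list and a plain average as the baseline:

```javascript
// Flag a cost spike when today's spend exceeds the rolling baseline by a
// threshold (20% here, matching the alert rule above).
function costSpike(recentDailySpend, todaySpend, threshold = 0.2) {
  const baseline =
    recentDailySpend.reduce((sum, d) => sum + d, 0) / recentDailySpend.length;
  return {
    baseline,
    exceeded: todaySpend > baseline * (1 + threshold),
  };
}

costSpike([300, 320, 310, 305], 450).exceeded; // true: 450 > 308.75 * 1.2
```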
Week 4: Optimize
Quick wins:
- Switch simple tasks to cheaper models (Haiku vs. Sonnet)
- Implement prompt caching (pays off after 2+ reuses)
- Reduce context window size (remove old messages)
- Fix retry logic (don't retry bad requests)
- Consolidate monitoring tools
Longer-term:
- Multi-model routing (cost vs. quality tradeoff)
- Self-host embeddings (cheaper than API)
- Optimize database queries (reduce RDS costs)
- Implement tiered pricing (align cost with value)
Common TCO Calculation Gaps
Gap #1: Only Tracking API Costs
What founders say:
"Our OpenAI bill is $5K/month, so that's our cost."
Reality:
- Total cost is $12K-15K/month
- They're off by 140-200%
- Margins are 50% worse than they think
Fix: Track all infrastructure, monitoring, and storage costs.
Gap #2: Not Allocating Shared Infrastructure
What founders say:
"AWS is just $2K/month for our whole platform."
Reality:
- $2K includes web app, API, database, storage, etc.
- AI features consume 70% of infrastructure
- Real AI infrastructure cost: $1,400/month
Fix: Tag AWS resources by service, calculate AI-specific costs.
Gap #3: Not Tracking Failed Requests
What founders say:
"We retry on errors, so we don't lose requests."
Reality:
- 8% error rate with 3 retries = 24% wasted spend
- Paying for tokens that never deliver value
- Adds $2,400/month on $10K API bill
Fix: Monitor retry rates, set aggressive timeouts, cache responses.
Gap #4: Free Tier Costing Excludes Infrastructure
What founders say:
"Free tier costs us $0.50/user in API calls, so we can afford 1,000 free users."
Reality:
- API: $0.50/user
- Infrastructure: $0.30/user (storage, database, monitoring)
- Total cost: $0.80/user
- 1,000 free users = $800/month, not $500/month
Fix: Calculate true cost-to-serve including infrastructure per user.
Tools to Help You Track TCO
Bear Billing: Built for Complete Cost Visibility
We built Bear Billing because spreadsheets don't cut it for AI cost tracking.
What we track automatically:
- API costs from OpenAI, Anthropic, AWS Bedrock, etc.
- Infrastructure costs allocated to AI features
- Per-customer costs for margin analysis
- Failed requests and wasted spend
- True cost-to-serve including all overhead
Key features:
- Real-time cost dashboards
- Per-customer margin tracking
- Alert when customers burn margins
- Pricing scenario modeling
Join the waitlist for early access to comprehensive AI cost tracking.
Related Resources
- Usage Variance in AI Products - Per-customer cost distribution and margin analysis
- GitHub Copilot Unit Economics - Case study in AI product margin calculations
- The True Cost of Running AI APIs: 2025 Guide - Model pricing comparison and profitability calculations
- AI Cost Tracking Solution - Automated cost tracking across all providers
Key Takeaways
- API costs are only 30-40% of your total bill - Infrastructure, monitoring, storage, and failed requests make up the remaining 60-70%
- Cursor's AWS bill was 79% of Anthropic costs - Infrastructure often exceeds API spend at scale
- Monitoring is 5-10% but non-negotiable - Without it, you can't identify unprofitable customers
- Failed requests waste 2-3% of budget - Most companies don't even track this
- True cost-to-serve = API + Infrastructure + Monitoring + Storage + Overhead - Calculate this per customer
- Margin errors compound at scale - A 3-point margin error = $3M/year at $100M ARR
Complete cost visibility enables accurate margin calculations.
Join our waitlist for automated AI cost tracking.