
The Real Cost of Running an AI Product in 2025: $/Token Is Only 30% of Your Bill

API pricing is one component of total cost. Infrastructure costs (47-67% of budget), monitoring overhead, and true TCO make up the complete picture for AI margin calculations.

Bear Billing Team

AI Infrastructure Experts

#infrastructure-costs #ai-economics #tco-analysis #cost-optimization #aws-costs

TL;DR: Everyone focuses on OpenAI's $2.50/$10 per million tokens. But API costs are only 30-40% of your true cost to run an AI product. Infrastructure (AWS/GCP), monitoring tools, caching overhead, and failed requests make up the other 60-70%. Here's the complete breakdown with real numbers.


The AI Cost Iceberg

You see the API pricing:

  • GPT-4o: $2.50 input / $10 output per million tokens
  • Claude 3.5 Sonnet: $3 input / $15 output per million tokens
  • Gemini 2.5 Pro: $1.25 input / $10 output per million tokens

You calculate your monthly OpenAI bill: $10,000/month.

You think: "Our AI costs are $10K/month. Manageable."

You're missing roughly $16,000-22,000.

Here's what you're missing:

| Cost Category | % of Total | Monthly $ (if API = $10K) |
|---|---|---|
| API Costs (OpenAI/Anthropic) | 30-40% | $10,000 (visible) |
| Infrastructure (AWS/GCP/Azure) | 40-50% | $12,000-15,000 (often untracked) |
| Monitoring & Observability | 5-10% | $1,500-3,000 (often untracked) |
| Caching & Storage | 3-5% | $900-1,500 (often untracked) |
| Failed Requests & Retries | 2-3% | $600-900 (often untracked) |
| Development & Testing | 3-5% | $900-1,500 (often untracked) |
| TOTAL COST | 100% | $25,900-31,900 |

Your $10K/month API bill is actually a $26K-32K/month total cost.
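
If you want to rerun this breakdown with your own numbers, a minimal sketch is below. It only sums the ranges from the table; the object keys and function names (`monthlyCosts`, `totalRange`, `shareOfTotal`) are made up for illustration.

```javascript
// Monthly cost ranges in dollars [low, high], copied from the table above (API = $10K).
const monthlyCosts = {
  api: [10000, 10000],
  infrastructure: [12000, 15000],
  monitoring: [1500, 3000],
  cachingAndStorage: [900, 1500],
  failedRequests: [600, 900],
  devAndTesting: [900, 1500],
};

// Add up the low ends and the high ends across all categories.
function totalRange(costs) {
  return Object.values(costs).reduce(
    ([lowSum, highSum], [low, high]) => [lowSum + low, highSum + high],
    [0, 0]
  );
}

// Fraction of the total bill that one category represents at each end of the range.
function shareOfTotal(costs, category) {
  const [lowTotal, highTotal] = totalRange(costs);
  return {
    atLowTotal: costs[category][0] / lowTotal,
    atHighTotal: costs[category][1] / highTotal,
  };
}

const [lowTotal, highTotal] = totalRange(monthlyCosts);
console.log(`Total: $${lowTotal}-$${highTotal} per month`);
console.log(shareOfTotal(monthlyCosts, "api"));
```

Swap in your own ranges per category to see where the API bill lands as a share of the total.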

The same pattern shows up at scale: Cursor's AWS bill reportedly doubled from $6.2M to $12.6M/month even as they optimized API usage.


Breaking Down the Hidden 70%

1. Infrastructure Costs (40-50% of total)

What you're paying for:

  • Compute: Application servers, background workers, queue processors
  • Databases: PostgreSQL/MySQL for usage tracking, vector databases for embeddings
  • Storage: Conversation history, generated content, user data, backups
  • Networking: Data transfer, load balancers, CDN
  • Container Orchestration: ECS/EKS, auto-scaling groups

Real Example: Cursor's Infrastructure Bill

| Month | AWS Bill | API Costs (est.) | Ratio (AWS/API) |
|---|---|---|---|
| May 2025 | $6.2M | ~$8M | 77% |
| June 2025 | $12.6M | ~$16M | 79% |

Cursor's AWS costs = 77-79% of their Anthropic bill.

Why so high?

  • Massive conversation history storage (200K token context windows)
  • Real-time collaboration infrastructure
  • Code indexing and vector search
  • Distributed caching layer

Your Infrastructure Will Cost:

| Company Stage | Monthly Active Users | Infrastructure Cost |
|---|---|---|
| Early Stage | 100-1,000 | $500-2,000/month |
| Growth | 1,000-10,000 | $2,000-10,000/month |
| Scale | 10,000-100,000 | $10,000-50,000/month |
| Enterprise | 100,000+ | $50,000-500,000/month |

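Rule of thumb: budget infrastructure at 50-80% of your API costs as a floor. The iceberg table above assumes more (120-150%), and Cursor's reported ratio was 77-79%.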


2. Monitoring & Observability (5-10% of total)

What you need to track:

  • Usage monitoring: LangSmith, Helicone, Langfuse
  • Application performance: Datadog, New Relic, Sentry
  • Cost tracking: Custom dashboards, alerting systems
  • User analytics: Mixpanel, Amplitude, PostHog

Cost Breakdown:

| Tool | Purpose | Monthly Cost | Per-User Cost |
|---|---|---|---|
| LangSmith | LLM observability | $0-200 base + $0.50/1K traces | Variable |
| Helicone | LLM cost tracking | $50-500/month | $0.05-0.50 per user |
| Datadog | Infrastructure monitoring | $15/host + $0.10/GB logs | $500-5,000/month |
| Sentry | Error tracking | $26-80/month | $0.50-2 per user |
| PostHog | Product analytics | $0-450/month + usage | Variable |

At 10,000 users:

  • LangSmith: ~$500/month (moderate usage)
  • Helicone: ~$300/month
  • Datadog: ~$2,000/month (10 hosts, logs)
  • Sentry: ~$500/month
  • PostHog: ~$200/month

Total monitoring cost: $3,500/month (10-15% of API costs)

Why monitoring is essential: without it, cost attribution is incomplete. You:

  • Can't identify which customers cost the most to serve
  • Don't know which features cost the most
  • Miss API errors that inflate spend
  • Can't optimize prompts or models
  • Get no alerts when costs increase unexpectedly

See our analysis: Usage Variance in AI Products - shows the impact of tracking per-customer costs.


3. Caching & Storage (3-5% of total)

What gets stored:

  • Prompt caching: Anthropic/OpenAI cache storage fees
  • Conversation history: Database and object storage
  • Generated content: S3/GCS for user outputs
  • Vector embeddings: Pinecone, Weaviate, or self-hosted
  • User uploads: Files, images, documents

Caching Costs (Often Overlooked):

Prompt caching isn't free—you pay to write to the cache:

| Provider | Cache Write Cost | Cache Read Discount | Break-Even |
|---|---|---|---|
| Anthropic | 1.25x input price | 90% discount | Need 2+ hits |
| OpenAI | 1x input price | 50% discount | Need 2+ hits |

Example: Anthropic Claude 3.5 Sonnet

  • Normal input: $3/1M tokens
  • Cache write: $3.75/1M tokens (25% premium)
  • Cache read: $0.30/1M tokens (90% discount)

Math:

Cost to cache 1M tokens = $3.75
Cost to read 1M tokens from cache (8 times) = 8 × $0.30 = $2.40
Total cost = $6.15

Without caching (8 reads) = 8 × $3 = $24
Savings = $17.85 (74% cheaper)

If the cache is only hit 2 times:
Total cost = $3.75 + (2 × $0.30) = $4.35
Without caching = 2 × $3 = $6
Savings = $1.65 (27.5% cheaper)

If the cache is only hit once:
Total cost = $3.75 + $0.30 = $4.05
Without caching = 1 × $3 = $3
Extra cost = $1.05 (35% more expensive)

When caching INCREASES costs:

  • Low cache hit rate (< 50%)
  • Prompts change frequently
  • Context invalidated often
  • Parallel requests create separate caches
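
To check whether caching pays off for your own hit counts, the caching math above can be sketched as a few small functions. This follows the same accounting as the worked example (one cache write plus N cached reads, compared with N uncached reads). The function names are illustrative, and the prices are the Claude 3.5 Sonnet figures quoted above.

```javascript
// Prices in dollars per 1M tokens, from the Claude 3.5 Sonnet example above.
const sonnet = { input: 3.0, cacheWrite: 3.75, cacheRead: 0.3 };

// One cache write plus `hits` cached reads of the same 1M-token prefix.
function cachedCost(prices, hits) {
  return prices.cacheWrite + hits * prices.cacheRead;
}

// The same number of reads at the normal input price, with no cache.
function uncachedCost(prices, hits) {
  return hits * prices.input;
}

// Positive when caching is cheaper, negative when it costs more.
function savings(prices, hits) {
  return uncachedCost(prices, hits) - cachedCost(prices, hits);
}

// Smallest whole number of hits at which caching stops costing more.
function breakEvenHits(prices) {
  return Math.ceil(prices.cacheWrite / (prices.input - prices.cacheRead));
}

for (const hits of [1, 2, 8]) {
  console.log(`${hits} hit(s): savings $${savings(sonnet, hits).toFixed(2)}`);
}
console.log("break-even hits:", breakEvenHits(sonnet));
```

Under this accounting a single hit loses money and two or more hits save money, which matches the 2-hit example above. Real cache entries also expire, so the hits have to arrive while the entry is still live.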

Storage Costs:

| Storage Type | Use Case | Monthly Cost (10K users) |
|---|---|---|
| PostgreSQL | Usage tracking, user data | $200-500 |
| S3 | Generated content, uploads | $100-300 |
| Vector DB | Embeddings, semantic search | $200-1,000 |
| Redis | Session data, rate limiting | $100-400 |

Total storage: $600-2,200/month


4. Failed Requests & Retries (2-3% of total)

Hidden costs from failures:

  • Rate limit errors: Exponential backoff = multiple attempts
  • Timeout retries: Long requests that fail = wasted tokens
  • Model errors: OpenAI/Anthropic 500 errors = retry logic
  • Client errors: Bad requests that burn tokens before failing

Real Data:

Studies show AI applications experience:

  • 5-10% rate limit errors during peak usage
  • 2-5% timeout failures on long contexts
  • 1-2% provider errors (OpenAI/Anthropic outages)

Cost Impact:

If 8% of requests fail and retry 3 times on average:

Failed requests = 8% of total
Average retries = 3
Wasted cost = 8% × 3 = 24% extra API spend (an upper bound: it assumes every retried attempt consumes as many tokens as a successful request)

On a $10K/month API bill:

  • Wasted on failed requests: $2,400/month
  • Most companies don't even track this
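
The waste estimate above is easy to rerun against your own error rates. A minimal sketch follows; `retryWaste` is a hypothetical helper, and it keeps the same simplifying assumption that every retry is billed like a normal request.

```javascript
// Estimated monthly spend burned on retries.
// Assumption (same as the estimate above): each retry costs as much as a normal request.
function retryWaste(monthlyApiBill, failureRate, avgRetries) {
  const extraSpendFraction = failureRate * avgRetries;
  return {
    extraSpendFraction,
    wastedDollars: monthlyApiBill * extraSpendFraction,
  };
}

const estimate = retryWaste(10000, 0.08, 3);
console.log(`${(estimate.extraSpendFraction * 100).toFixed(0)}% extra spend`);
console.log(`$${estimate.wastedDollars.toFixed(0)} wasted per month`);
```

Because the billing assumption is pessimistic, treat the result as a ceiling and compare it with the retry counts in your own logs.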

How to minimize:

  • Implement request caching (avoid redundant calls)
  • Set aggressive timeouts (don't wait 60s for failures)
  • Use exponential backoff with jitter
  • Monitor error rates by endpoint
  • Alert on unusual retry spikes
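
Two of the points above, exponential backoff with jitter and not retrying requests that can never succeed, can be sketched as a small wrapper. This is an illustration rather than any SDK's built-in retry logic: `withRetries`, `backoffDelayMs`, and `isRetryable` are hypothetical names, and the wrapper assumes errors carry a numeric `status` field.

```javascript
// Full-jitter delay: a random wait between 0 and min(capMs, baseMs * 2^attempt).
function backoffDelayMs(attempt, baseMs = 500, capMs = 8000, random = Math.random) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Only rate limits (429) and server errors (5xx) are worth retrying.
// Other 4xx client errors fail the same way every time.
function isRetryable(status) {
  return status === 429 || (status >= 500 && status < 600);
}

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `call` is any async function that throws an error with a numeric `status`.
async function withRetries(call, { maxAttempts = 4, sleep = defaultSleep } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      const isLastAttempt = attempt >= maxAttempts - 1;
      if (isLastAttempt || !isRetryable(err.status)) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

Wrap an API call as `withRetries(() => callModel(prompt))`, where `callModel` stands in for your own request function. The `sleep` option exists so tests can skip the real delays.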

5. Development & Testing (3-5% of total)

What gets forgotten:

  • Local development: Developers testing with real APIs
  • CI/CD testing: Integration tests hitting OpenAI
  • Staging environments: Pre-production usage
  • A/B tests: Running experiments with multiple models
  • Prompt engineering: Iterating on prompts costs tokens

Typical Waste:

| Activity | Monthly Cost | Avoidable? |
|---|---|---|
| Dev testing | $500-2,000 | Partially (use mock APIs) |
| CI/CD tests | $200-800 | Yes (use fixtures) |
| Staging usage | $300-1,000 | Partially (limit to QA) |
| Prompt iteration | $400-1,500 | No (necessary cost) |

Total dev overhead: $1,400-5,300/month

Best practices:

  • Use mock LLM responses for unit tests
  • Cache responses for integration tests
  • Limit staging to final QA only
  • Track dev vs. prod usage separately
  • Avoid testing in production environments
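
One way to keep unit tests from hitting a paid API is to pass the client into your feature code and hand it a stand-in during tests. The sketch below assumes the response shape of the OpenAI Node SDK (`choices[0].message.content`). `summarizeTicket` and `createMockClient` are hypothetical names for illustration.

```javascript
// Hypothetical feature code: takes the client as a parameter so tests can swap it.
async function summarizeTicket(client, ticketText) {
  const completion = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [
      { role: "system", content: "Summarize the support ticket in one sentence." },
      { role: "user", content: ticketText },
    ],
  });
  return completion.choices[0].message.content.trim();
}

// Stand-in client: returns a canned reply and records every request it receives.
function createMockClient(cannedReply) {
  const calls = [];
  return {
    calls,
    chat: {
      completions: {
        create: async (params) => {
          calls.push(params);
          return { choices: [{ message: { role: "assistant", content: cannedReply } }] };
        },
      },
    },
  };
}
```

In a unit test you would call `summarizeTicket(createMockClient("Customer cannot log in."), text)` and assert on both the returned summary and the recorded `calls`, with no tokens spent.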

Real-World TCO Example: AI Chatbot

Product: Customer support AI chatbot
Scale: 10,000 active users, 500K conversations/month
Model: GPT-4o for accuracy

API Costs (54% of total)

Average conversation:
- Input: 2,000 tokens (context + history)
- Output: 500 tokens (response)

Monthly usage:
- Input: 500K conversations × 2,000 tokens = 1B tokens
- Output: 500K conversations × 500 tokens = 250M tokens

API costs:
- Input: 1B tokens × $2.50/1M = $2,500
- Output: 250M tokens × $10/1M = $2,500
Total API: $5,000/month

Infrastructure Costs (20% of total)

AWS costs:
- ECS Fargate (5 tasks): $350/month
- PostgreSQL RDS (db.r5.large): $450/month
- Redis ElastiCache: $150/month
- S3 storage (conversations): $200/month
- Data transfer: $300/month
- CloudWatch logs: $150/month
- Load balancer: $250/month

Total infrastructure: $1,850/month

Monitoring & Tools (12% of total)

Observability:
- LangSmith: $200/month
- Datadog: $800/month
- Sentry: $100/month

Total monitoring: $1,100/month

Storage & Caching (6% of total)

Additional storage:
- Vector database (Pinecone): $300/month
- Conversation archives (S3): $200/month
- Cache storage (Anthropic): $100/month

Total storage: $600/month

Failed Requests (8% of total)

Error rate: 7% with 2 avg retries
Wasted API costs: $5,000 × 7% × 2 = $700/month

Total Cost of Ownership

| Category | Cost | % of Total |
|---|---|---|
| API Costs | $5,000 | 54% |
| Infrastructure | $1,850 | 20% |
| Monitoring | $1,100 | 12% |
| Storage | $600 | 6% |
| Failed Requests | $700 | 8% |
| TOTAL | $9,250 | 100% |

True cost per customer: $9,250 / 10,000 = $0.93/month

If you're charging $15/month, your gross margin is:

Margin = ($15 - $0.93) / $15 = 93.8%

But if you only tracked API costs ($5K):

Assumed cost per customer = $0.50/month
Assumed margin = ($15 - $0.50) / $15 = 96.7% (overstated)

The nearly 3-point margin error is significant at scale:

  • At $1M ARR: about $28K/year difference
  • At $10M ARR: about $280K/year difference
  • At $100M ARR: about $2.8M/year difference
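
The token arithmetic and the margin formula used in this example can be sketched as two small functions, so you can plug in your own conversation volumes and price points. The prices are the GPT-4o figures quoted in the article; the function and field names are illustrative.

```javascript
// GPT-4o prices from the article, in dollars per 1M tokens.
const gpt4o = { inputPerM: 2.5, outputPerM: 10 };

// Monthly API cost from conversation volume and average tokens per conversation.
function monthlyApiCost({ conversations, inputTokens, outputTokens }, prices) {
  const input = ((conversations * inputTokens) / 1e6) * prices.inputPerM;
  const output = ((conversations * outputTokens) / 1e6) * prices.outputPerM;
  return { input, output, total: input + output };
}

// Gross margin as a fraction of price, given monthly cost-to-serve per user.
function grossMargin(pricePerUser, costPerUser) {
  return (pricePerUser - costPerUser) / pricePerUser;
}

const api = monthlyApiCost(
  { conversations: 500000, inputTokens: 2000, outputTokens: 500 },
  gpt4o
);
console.log(api);

// Margin if only API spend is counted: $5,000 across 10,000 users.
const apiOnlyCostPerUser = api.total / 10000;
console.log((grossMargin(15, apiOnlyCostPerUser) * 100).toFixed(1) + "%");
```

Passing the full per-user cost (API plus infrastructure, monitoring, storage, and retries) into `grossMargin` instead of the API-only figure gives the true margin.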

Why This Matters: Cursor's Wake-Up Call

Cursor hit $500M ARR but discovered their AWS costs were 79% of Anthropic costs.

Their assumed economics:

Anthropic API: $16M/month
Assumed total cost: ~$18M/month (90% API)
Assumed gross margin: 64%

Their actual economics:

Anthropic API: $16M/month
AWS infrastructure: $12.6M/month
Total cost: $28.6M/month
Actual gross margin: 36%

The 28-point margin difference led to:

  1. Multiple repricing cycles (4 times in 12 months)
  2. Usage limits (customer communication required)
  3. $200/month Ultra tier
  4. June 2025 pricing adjustments ($71 single-day charges reported)

Visibility into true costs earlier enables better pricing decisions.


How to Track Your True TCO

Week 1: Audit Current Costs

API Costs:

  • Export billing from OpenAI, Anthropic, etc.
  • Break down by endpoint/model/feature
  • Identify highest-cost API calls

Infrastructure:

  • Review AWS/GCP/Azure bills
  • Tag resources by service (app, db, storage, etc.)
  • Allocate shared costs (networking, monitoring)

Tools & Services:

  • List all SaaS tools (monitoring, analytics, etc.)
  • Calculate per-user cost for each
  • Identify unused or redundant tools

Week 2: Build Cost Attribution

Tag everything:

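// Example: tag API calls with metadata for per-customer cost attribution
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "How do I reset my password?" }],
  store: true, // assumption: the API may reject metadata unless the completion is stored
  metadata: {
    customer_id: "cust_123",
    feature: "chat",
    tier: "pro"
  }
});

// Token counts to record against the tags above
console.log(response.usage);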

Track infrastructure by customer:

  • Database queries per user
  • Storage usage per user
  • Cache hit rates per feature
  • API error rates by customer

Calculate true cost-to-serve:

Cost-to-Serve = (
  API Costs +
  (Infrastructure × User%) +
  (Monitoring / Total Users) +
  (Storage × User%) +
  (Failed Requests × User%)
) per customer
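
The formula above can be sketched as a function. Here "User%" is taken to mean the customer's share of total usage, expressed as a fraction between 0 and 1, which is an interpretation rather than something the formula spells out. The shared costs reuse the chatbot example's monthly figures, and the sample customer is hypothetical.

```javascript
// Shared monthly costs in dollars, taken from the chatbot example.
const sharedCosts = {
  infrastructure: 1850,
  monitoring: 1100,
  storage: 600,
  failedRequests: 700,
  totalUsers: 10000,
};

// customer.apiCost: this customer's own API spend for the month, in dollars.
// customer.usageShare: this customer's fraction of total usage (assumed meaning of "User%").
function costToServe(customer, shared) {
  return (
    customer.apiCost +
    shared.infrastructure * customer.usageShare +
    shared.monitoring / shared.totalUsers +
    shared.storage * customer.usageShare +
    shared.failedRequests * customer.usageShare
  );
}

// Hypothetical heavy user: $2.00 of API spend and 0.05% of all usage.
const heavyUser = { apiCost: 2.0, usageShare: 0.0005 };
console.log(costToServe(heavyUser, sharedCosts).toFixed(2));
```

Monitoring is split evenly per user while the other shared costs follow usage, mirroring the formula. If your monitoring bill scales with traces or log volume, allocating it by usage share may be more accurate.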

Week 3: Set Up Alerts

Cost spike alerts:

  • API spend > 20% above baseline
  • Infrastructure costs trending up
  • Individual customer > $X threshold
  • Failed request rate > 10%

Margin alerts:

  • Gross margin drops below target
  • High-cost customers exceed plan value
  • Free tier abuse detection
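
These alert rules can be expressed as a single check over a monthly metrics snapshot. In the sketch below, the 20% and 10% thresholds come from the lists above; the per-customer dollar threshold and the margin target are hypothetical placeholders, as are the metric field names.

```javascript
// Thresholds from the lists above; the last two values are hypothetical placeholders.
const alertThresholds = {
  apiSpendOverBaseline: 0.2, // API spend > 20% above baseline
  failedRequestRate: 0.1, // failed request rate > 10%
  customerMonthlyCost: 50, // the "$X" per-customer threshold (hypothetical value)
  targetGrossMargin: 0.7, // gross margin target (hypothetical value)
};

// Returns the name of every alert triggered by the given metrics snapshot.
function checkAlerts(metrics, thresholds = alertThresholds) {
  const alerts = [];
  if (metrics.apiSpend > metrics.baselineApiSpend * (1 + thresholds.apiSpendOverBaseline)) {
    alerts.push("api_spend_spike");
  }
  const failureRate =
    metrics.totalRequests > 0 ? metrics.failedRequests / metrics.totalRequests : 0;
  if (failureRate > thresholds.failedRequestRate) {
    alerts.push("failed_request_rate");
  }
  const margin =
    metrics.revenue > 0 ? (metrics.revenue - metrics.totalCost) / metrics.revenue : 0;
  if (margin < thresholds.targetGrossMargin) {
    alerts.push("gross_margin_below_target");
  }
  for (const [customerId, cost] of Object.entries(metrics.costByCustomer)) {
    if (cost > thresholds.customerMonthlyCost) {
      alerts.push(`high_cost_customer:${customerId}`);
    }
  }
  return alerts;
}
```

Run it on a schedule against your billing exports and route the returned names to whatever paging or chat tool you already use.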

Week 4: Optimize

Quick wins:

  • Switch simple tasks to cheaper models (Haiku vs. Sonnet)
  • Implement prompt caching (8+ reuses)
  • Reduce context window size (remove old messages)
  • Fix retry logic (don't retry bad requests)
  • Consolidate monitoring tools
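
As one example of a quick win, trimming old messages before each request could look like the sketch below. It keeps system messages and then as many of the most recent messages as fit a token budget. The token estimate is a rough assumption (about 4 characters per token), so use a real tokenizer if you need exact counts; `trimHistory` and `estimateTokens` are hypothetical names.

```javascript
// Rough token estimate. Assumption: about 4 characters per token for English text.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Keep all system messages, then the newest other messages that fit in maxTokens.
function trimHistory(messages, maxTokens) {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  let budget = maxTokens - system.reduce((sum, m) => sum + estimateTokens(m.content), 0);

  const kept = [];
  // Walk backwards from the newest message and stop at the first one that does not fit.
  for (let i = rest.length - 1; i >= 0; i--) {
    const cost = estimateTokens(rest[i].content);
    if (cost > budget) break;
    kept.unshift(rest[i]);
    budget -= cost;
  }
  return [...system, ...kept];
}
```

Dropping the oldest turns is the simplest policy. Summarizing them instead preserves more context but costs an extra model call.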

Longer-term:

  • Multi-model routing (cost vs. quality tradeoff)
  • Self-host embeddings (cheaper than API)
  • Optimize database queries (reduce RDS costs)
  • Implement tiered pricing (align cost with value)

Common TCO Calculation Gaps

Gap #1: Only Tracking API Costs

What founders say:

"Our OpenAI bill is $5K/month, so that's our cost."

Reality:

  • Total cost is $12K-15K/month
  • They're off by 140-200%
  • Margins are 50% worse than they think

Fix: Track all infrastructure, monitoring, and storage costs.


Gap #2: Not Allocating Shared Infrastructure

What founders say:

"AWS is just $2K/month for our whole platform."

Reality:

  • $2K includes web app, API, database, storage, etc.
  • AI features consume 70% of infrastructure
  • Real AI infrastructure cost: $1,400/month

Fix: Tag AWS resources by service, calculate AI-specific costs.


Gap #3: Not Tracking Failed Requests

What founders say:

"We retry on errors, so we don't lose requests."

Reality:

  • 8% error rate with 3 retries = 24% wasted spend
  • Paying for tokens that never deliver value
  • Adds $2,400/month on $10K API bill

Fix: Monitor retry rates, set aggressive timeouts, cache responses.


Gap #4: Free Tier Costing Excludes Infrastructure

What founders say:

"Free tier costs us $0.50/user in API calls, so we can afford 1,000 free users."

Reality:

  • API: $0.50/user
  • Infrastructure: $0.30/user (storage, database, monitoring)
  • Total cost: $0.80/user
  • 1,000 free users = $800/month, not $500/month

Fix: Calculate true cost-to-serve including infrastructure per user.


Tools to Help You Track TCO

Bear Billing: Built for Complete Cost Visibility

We built Bear Billing because spreadsheets don't cut it for AI cost tracking.

What we track automatically:

  1. API costs from OpenAI, Anthropic, AWS Bedrock, etc.
  2. Infrastructure costs allocated to AI features
  3. Per-customer costs for margin analysis
  4. Failed requests and wasted spend
  5. True cost-to-serve including all overhead

Key features:

  • Real-time cost dashboards
  • Per-customer margin tracking
  • Alert when customers burn margins
  • Pricing scenario modeling

Join the waitlist for early access to comprehensive AI cost tracking.



Key Takeaways

  1. API costs are only 30-40% of your total bill - Infrastructure, monitoring, storage, and failed requests make up the other 60-70%
  2. Cursor's AWS bill was 79% of Anthropic costs - Infrastructure can approach API spend at scale
  3. Monitoring is 5-10% but non-negotiable - Without it, you can't identify unprofitable customers
  4. Failed requests waste 2-3% of budget - Most companies don't even track this
  5. True cost-to-serve = API + Infrastructure + Monitoring + Storage + Overhead - Calculate this per customer
  6. Margin errors compound at scale - Every point of margin error is $1M/year at $100M ARR

Complete cost visibility enables accurate margin calculations.

Join our waitlist for automated AI cost tracking.
