TL;DR: Everyone focuses on OpenAI's $2.50/$10 per million tokens. But API costs are only 30-40% of the true cost of running an AI product. Infrastructure (AWS/GCP), monitoring tools, caching overhead, and failed requests make up the remaining 60-70%, so your true bill is roughly 2.5-3x your API spend. Here's the complete breakdown with real numbers.
The AI Cost Iceberg
You see the API pricing:
- GPT-4o: $2.50 input / $10 output per million tokens
- Claude 3.5 Sonnet: $3 input / $15 output per million tokens
- Gemini 2.5 Pro: $1.25 input / $10 output per million tokens
You calculate your monthly OpenAI bill: $10,000/month.
You think: "Our AI costs are $10K/month. Manageable."
You're missing $15,000-20,000.
Here's what you're missing:
| Cost Category | % of Total | Monthly $ (if API = $10K) |
|---|---|---|
| API Costs (OpenAI/Anthropic) | 30-40% | $10,000 (visible) |
| Infrastructure (AWS/GCP/Azure) | 40-50% | $12,000-15,000 (often untracked) |
| Monitoring & Observability | 5-10% | $1,500-3,000 (often untracked) |
| Caching & Storage | 3-5% | $900-1,500 (often untracked) |
| Failed Requests & Retries | 2-3% | $600-900 (often untracked) |
| Development & Testing | 3-5% | $900-1,500 (often untracked) |
| TOTAL COST | 100% | $25,900-32,900 |
Your $10K/month API bill is actually a $26K-33K/month total cost.
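The table's shares can be turned into a quick estimator for your own bill. A minimal sketch in JavaScript (the 30-40% API share is the table's assumption, not a measured constant):

```javascript
// Rough total-cost estimate from a visible API bill, using the cost shares
// in the table above. Illustrative multipliers, not measured constants.
function estimateTrueMonthlyCost(apiBill) {
  const apiShareHigh = 0.40; // API is at most 40% of total -> total >= bill / 0.40
  const apiShareLow = 0.30;  // API is at least 30% of total -> total <= bill / 0.30
  return {
    low: apiBill / apiShareHigh,
    high: apiBill / apiShareLow,
  };
}

const { low, high } = estimateTrueMonthlyCost(10_000);
// low = $25,000, high ≈ $33,333 -- in line with the table's $25.9K-32.9K total
```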
It's how Cursor's AWS bill doubled from $6.2M to $12.6M/month even while the company was optimizing its API usage.
Breaking Down the Hidden 70%
1. Infrastructure Costs (40-50% of total)
What you're paying for:
- Compute: Application servers, background workers, queue processors
- Databases: PostgreSQL/MySQL for usage tracking, vector databases for embeddings
- Storage: Conversation history, generated content, user data, backups
- Networking: Data transfer, load balancers, CDN
- Container Orchestration: ECS/EKS, auto-scaling groups
Real Example: Cursor's Infrastructure Bill
| Month | AWS Bill | API Costs (est.) | AWS / API |
|---|---|---|---|
| May 2025 | $6.2M | ~$8M | 77% |
| June 2025 | $12.6M | ~$16M | 79% |
Cursor's AWS costs = 77-79% of their Anthropic bill.
Why so high?
- Massive conversation history storage (200K token context windows)
- Real-time collaboration infrastructure
- Code indexing and vector search
- Distributed caching layer
Your Infrastructure Will Cost:
| Company Stage | Monthly Active Users | Infrastructure Cost |
|---|---|---|
| Early Stage | 100-1,000 | $500-2,000/month |
| Growth | 1,000-10,000 | $2,000-10,000/month |
| Scale | 10,000-100,000 | $10,000-50,000/month |
| Enterprise | 100,000+ | $50,000-500,000/month |
Rule of thumb: Infrastructure typically runs 50-80% of your API costs, and can match or exceed them at scale.
2. Monitoring & Observability (5-10% of total)
What you need to track:
- Usage monitoring: LangSmith, Helicone, Langfuse
- Application performance: Datadog, New Relic, Sentry
- Cost tracking: Custom dashboards, alerting systems
- User analytics: Mixpanel, Amplitude, PostHog
Cost Breakdown:
| Tool | Purpose | Monthly Cost | Per-User Cost |
|---|---|---|---|
| LangSmith | LLM observability | $0-200 base + $0.50/1K traces | Variable |
| Helicone | LLM cost tracking | $50-500/month | $0.05-0.50 per user |
| Datadog | Infrastructure monitoring | $15/host + $0.10/GB logs | $500-5,000/month |
| Sentry | Error tracking | $26-80/month | $0.50-2 per user |
| PostHog | Product analytics | $0-450/month + usage | Variable |
At 10,000 users:
- LangSmith: ~$500/month (moderate usage)
- Helicone: ~$300/month
- Datadog: ~$2,000/month (10 hosts, logs)
- Sentry: ~$500/month
- PostHog: ~$200/month
Total monitoring cost: $3,500/month (roughly 11-13% of a $26K-33K total cost)
Why monitoring is essential: without it, cost attribution is incomplete:
- Can't identify which customers have highest cost-to-serve
- Don't know which features have highest costs
- Miss API errors that add to budget
- Can't optimize prompts or models
- No alerts when costs increase unexpectedly
See our analysis: Usage Variance in AI Products - shows the impact of tracking per-customer costs.
3. Caching & Storage (3-5% of total)
What gets stored:
- Prompt caching: Anthropic/OpenAI cache storage fees
- Conversation history: Database and object storage
- Generated content: S3/GCS for user outputs
- Vector embeddings: Pinecone, Weaviate, or self-hosted
- User uploads: Files, images, documents
Caching Costs (Often Overlooked):
Prompt caching isn't free—you pay to write to the cache:
| Provider | Cache Write Cost | Cache Read Discount | Break-Even |
|---|---|---|---|
| Anthropic | 1.25x input price | 90% discount | Need 2+ hits |
| OpenAI | 1x input price | 50% discount | Need 2+ hits |
Example: Anthropic Claude 3.5 Sonnet
- Normal input: $3/1M tokens
- Cache write: $3.75/1M tokens (25% premium)
- Cache read: $0.30/1M tokens (90% discount)
Math:
Cost to cache 1M tokens = $3.75
Cost to read 1M tokens from cache (8 times) = 8 × $0.30 = $2.40
Total cost = $6.15
Without caching (8 reads) = 8 × $3 = $24
Savings = $17.85 (74% cheaper)
BUT: if the cache is hit only 2 times:
Total cost = $3.75 + (2 × $0.30) = $4.35
Without caching = 2 × $3 = $6
Savings = $1.65 (27% cheaper)
When caching INCREASES costs:
- Low cache hit rate (< 50%)
- Prompts change frequently
- Context invalidated often
- Parallel requests create separate caches
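The break-even math above generalizes to any write premium and read discount. A small sketch, with multipliers following the table's Anthropic and OpenAI figures:

```javascript
// Compare cached vs. uncached cost for a prompt prefix reused `hits` times.
// writeMult / readMult are multiples of the normal input price
// (Anthropic: 1.25 write / 0.10 read; OpenAI: 1.0 write / 0.50 read).
function cachingCost(tokensM, pricePerM, hits, writeMult, readMult) {
  const cached = tokensM * pricePerM * (writeMult + hits * readMult);
  const uncached = tokensM * pricePerM * hits;
  return { cached, uncached, savings: uncached - cached };
}

// Claude 3.5 Sonnet, 1M-token prefix at $3/1M input, read 8 times:
const r8 = cachingCost(1, 3, 8, 1.25, 0.10);
// cached ≈ $6.15, uncached = $24, savings ≈ $17.85

// Only 2 hits: savings shrink, but caching still wins.
const r2 = cachingCost(1, 3, 2, 1.25, 0.10);
// cached ≈ $4.35, uncached = $6, savings ≈ $1.65
```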
Storage Costs:
| Storage Type | Use Case | Monthly Cost (10K users) |
|---|---|---|
| PostgreSQL | Usage tracking, user data | $200-500 |
| S3 | Generated content, uploads | $100-300 |
| Vector DB | Embeddings, semantic search | $200-1,000 |
| Redis | Session data, rate limiting | $100-400 |
Total storage: $600-2,200/month
4. Failed Requests & Retries (2-3% of total)
Hidden costs from failures:
- Rate limit errors: Exponential backoff = multiple attempts
- Timeout retries: Long requests that fail = wasted tokens
- Model errors: OpenAI/Anthropic 500 errors = retry logic
- Client errors: Bad requests that burn tokens before failing
Real Data:
AI applications in production commonly see:
- 5-10% rate limit errors during peak usage
- 2-5% timeout failures on long contexts
- 1-2% provider errors (OpenAI/Anthropic outages)
Cost Impact:
If 8% of requests fail and retry 3 times on average:
Failed requests = 8% of total
Average retries = 3
Wasted cost = 8% × 3 = 24% extra API spend
On a $10K/month API bill:
- Wasted on failed requests: $2,400/month
- Most companies don't even track this
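The waste estimate above is a one-line calculation you can run against your own numbers. A sketch (the failure rate and retry count are the illustrative figures from the text):

```javascript
// Extra API spend burned on failed requests that get retried.
// failRate: fraction of requests that fail; avgRetries: attempts per failure.
function retryWaste(monthlyApiBill, failRate, avgRetries) {
  return monthlyApiBill * failRate * avgRetries;
}

// 8% failures, 3 retries each, on a $10K/month bill:
retryWaste(10_000, 0.08, 3); // ≈ $2,400/month wasted
```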
How to minimize:
- Implement request caching (avoid redundant calls)
- Set aggressive timeouts (don't wait 60s for failures)
- Use exponential backoff with jitter
- Monitor error rates by endpoint
- Alert on unusual retry spikes
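The backoff-with-jitter advice above can be sketched as a small retry wrapper; this is a minimal illustration, and the request function passed in is a hypothetical placeholder:

```javascript
// Retry with exponential backoff and full jitter. `fn` is any async request
// function (hypothetical placeholder). Blind retries multiply token spend,
// so cap attempts and fail fast on client errors.
async function withBackoff(fn, { maxAttempts = 4, baseMs = 250 } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Don't retry 4xx errors: a bad request will burn tokens every time.
      if (err.status >= 400 && err.status < 500) throw err;
      if (attempt >= maxAttempts) throw err;
      // Full jitter: sleep a random duration up to the exponential cap.
      const capMs = baseMs * 2 ** (attempt - 1);
      const delay = Math.random() * capMs;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Capping attempts bounds the worst-case waste: with `maxAttempts = 4`, a persistently failing request costs at most 4x its normal tokens instead of retrying forever.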
5. Development & Testing (3-5% of total)
What gets forgotten:
- Local development: Developers testing with real APIs
- CI/CD testing: Integration tests hitting OpenAI
- Staging environments: Pre-production usage
- A/B tests: Running experiments with multiple models
- Prompt engineering: Iterating on prompts costs tokens
Typical Waste:
| Activity | Monthly Cost | Avoidable? |
|---|---|---|
| Dev testing | $500-2,000 | Partially (use mock APIs) |
| CI/CD tests | $200-800 | Yes (use fixtures) |
| Staging usage | $300-1,000 | Partially (limit to QA) |
| Prompt iteration | $400-1,500 | No (necessary cost) |
Total dev overhead: $1,400-5,300/month
Best practices:
- Use mock LLM responses for unit tests
- Cache responses for integration tests
- Limit staging to final QA only
- Track dev vs. prod usage separately
- Avoid testing in production environments
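One way to keep dev and CI spend near zero is a drop-in mock that mimics the chat-completions call shape. A sketch: the response shape mirrors the OpenAI format, but the canned content and field choices here are illustrative assumptions:

```javascript
// Minimal mock of a chat-completions client for unit tests: same call shape,
// canned responses, zero API spend. Response fields mirror the OpenAI format,
// but the content is made up for testing.
function createMockLLM(cannedReply = "mock response") {
  const calls = [];
  return {
    calls, // inspect what your code sent, without hitting a real API
    chat: {
      completions: {
        async create(params) {
          calls.push(params);
          return {
            choices: [{ message: { role: "assistant", content: cannedReply } }],
            usage: { prompt_tokens: 0, completion_tokens: 0 },
          };
        },
      },
    },
  };
}
```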
Real-World TCO Example: AI Chatbot
- Product: Customer support AI chatbot
- Scale: 10,000 active users, 500K conversations/month
- Model: GPT-4o for accuracy
API Costs (54% of total)
Average conversation:
- Input: 2,000 tokens (context + history)
- Output: 500 tokens (response)
Monthly usage:
- Input: 500K conversations × 2,000 tokens = 1B tokens
- Output: 500K conversations × 500 tokens = 250M tokens
API costs:
- Input: 1B tokens × $2.50/1M = $2,500
- Output: 250M tokens × $10/1M = $2,500
Total API: $5,000/month
Infrastructure Costs (20% of total)
AWS costs:
- ECS Fargate (5 tasks): $350/month
- PostgreSQL RDS (db.r5.large): $450/month
- Redis ElastiCache: $150/month
- S3 storage (conversations): $200/month
- Data transfer: $300/month
- CloudWatch logs: $150/month
- Load balancer: $250/month
Total infrastructure: $1,850/month
Monitoring & Tools (12% of total)
Observability:
- LangSmith: $200/month
- Datadog: $800/month
- Sentry: $100/month
Total monitoring: $1,100/month
Storage & Caching (6% of total)
Additional storage:
- Vector database (Pinecone): $300/month
- Conversation archives (S3): $200/month
- Cache storage (Anthropic): $100/month
Total storage: $600/month
Failed Requests (8% of total)
Error rate: 7% with 2 avg retries
Wasted API costs: $5,000 × 7% × 2 = $700/month
Total Cost of Ownership
| Category | Cost | % of Total |
|---|---|---|
| API Costs | $5,000 | 54% |
| Infrastructure | $1,850 | 20% |
| Monitoring | $1,100 | 12% |
| Storage | $600 | 6% |
| Failed Requests | $700 | 8% |
| TOTAL | $9,250 | 100% |
True cost per customer: $9,250 / 10,000 = $0.93/month
If you're charging $15/month, your gross margin is:
Margin = ($15 - $0.93) / $15 = 93.8%
But if you only tracked API costs ($5K):
Assumed cost per customer = $0.50/month
Assumed margin = ($15 - $0.50) / $15 = 96.7% (overstated)
The roughly 3-point margin error is significant at scale:
- At $1M ARR: $30K/year difference
- At $10M ARR: $300K/year difference
- At $100M ARR: $3M/year difference
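The whole worked example reduces to a few lines of arithmetic; a sketch summing the itemized figures above:

```javascript
// Total cost of ownership for the example chatbot, from the itemized figures.
const costs = {
  api: 5000,            // GPT-4o input + output
  infrastructure: 1850, // ECS, RDS, Redis, S3, transfer, logs, load balancer
  monitoring: 1100,     // LangSmith, Datadog, Sentry
  storage: 600,         // vector DB, archives, cache storage
  failedRequests: 700,  // 7% errors x 2 retries on the API bill
};

const total = Object.values(costs).reduce((sum, c) => sum + c, 0); // 9250
const users = 10_000;
const costPerUser = total / users; // 0.925/month

const price = 15;
const margin = (price - costPerUser) / price; // ~0.938

// Tracking only API costs overstates the margin:
const apiOnlyMargin = (price - costs.api / users) / price; // ~0.967
```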
Why This Matters: Cursor's Wake-Up Call
Cursor hit $500M ARR but discovered their AWS costs were 79% of Anthropic costs.
Their assumed economics (at $500M ARR, roughly $41.7M/month in revenue):
Anthropic API: $16M/month
Assumed total cost: ~$18M/month (~90% API)
Assumed gross margin: ~57%
Their actual economics:
Anthropic API: $16M/month
AWS infrastructure: $12.6M/month
Total cost: $28.6M/month
Actual gross margin: ~31%
The roughly 26-point margin difference led to:
- Multiple repricing cycles (4 times in 12 months)
- Usage limits (customer communication required)
- $200/month Ultra tier
- June 2025 pricing adjustments ($71 single-day charges reported)
Visibility into true costs earlier enables better pricing decisions.
How to Track Your True TCO
Week 1: Audit Current Costs
API Costs:
- Export billing from OpenAI, Anthropic, etc.
- Break down by endpoint/model/feature
- Identify highest-cost API calls
Infrastructure:
- Review AWS/GCP/Azure bills
- Tag resources by service (app, db, storage, etc.)
- Allocate shared costs (networking, monitoring)
Tools & Services:
- List all SaaS tools (monitoring, analytics, etc.)
- Calculate per-user cost for each
- Identify unused or redundant tools
Week 2: Build Cost Attribution
Tag everything:
// Example: tag API calls with metadata for per-customer cost attribution
// (with the OpenAI Node SDK, metadata on chat completions requires store: true)
import OpenAI from "openai";

const openai = new OpenAI();

await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [...],
  store: true,
  metadata: {
    customer_id: "cust_123",
    feature: "chat",
    tier: "pro"
  }
});
Track infrastructure by customer:
- Database queries per user
- Storage usage per user
- Cache hit rates per feature
- API error rates by customer
Calculate true cost-to-serve:
Cost-to-Serve = (
API Costs +
(Infrastructure × User%) +
(Monitoring / Total Users) +
(Storage × User%) +
(Failed Requests × User%)
) per customer
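That formula can be sketched in code. The allocation keys here (request share for infrastructure and storage, an even split for monitoring) are illustrative assumptions, not the only way to allocate shared costs:

```javascript
// Per-customer cost-to-serve: direct API spend plus allocated shared costs.
// usageShare: this customer's fraction of total requests (illustrative
// allocation key); monitoring is split evenly across all users here.
function costToServe({ apiCost, usageShare, totals, totalUsers }) {
  return (
    apiCost +
    totals.infrastructure * usageShare +
    totals.monitoring / totalUsers +
    totals.storage * usageShare +
    totals.failedRequests * usageShare
  );
}

// A customer driving 2% of traffic, on the example chatbot's cost base:
costToServe({
  apiCost: 100,
  usageShare: 0.02,
  totals: { infrastructure: 1850, monitoring: 1100, storage: 600, failedRequests: 700 },
  totalUsers: 10_000,
}); // ≈ 100 + 37 + 0.11 + 12 + 14 ≈ $163/month
```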
Week 3: Set Up Alerts
Cost spike alerts:
- API spend > 20% above baseline
- Infrastructure costs trending up
- Individual customer > $X threshold
- Failed request rate > 10%
Margin alerts:
- Gross margin drops below target
- High-cost customers exceed plan value
- Free tier abuse detection
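The spike alerts above amount to simple threshold checks against a rolling baseline; a sketch using the 20% threshold from the list and a plain average as the baseline:

```javascript
// Flag a cost spike when today's spend exceeds the rolling baseline by a
// threshold (20% here, matching the alert rule above).
function costSpike(recentDailySpend, todaySpend, threshold = 0.2) {
  const baseline =
    recentDailySpend.reduce((sum, d) => sum + d, 0) / recentDailySpend.length;
  return {
    baseline,
    exceeded: todaySpend > baseline * (1 + threshold),
  };
}

costSpike([300, 320, 310, 305], 450).exceeded; // true: 450 > 308.75 * 1.2
```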
Week 4: Optimize
Quick wins:
- Switch simple tasks to cheaper models (Haiku vs. Sonnet)
- Implement prompt caching (pays off after 2+ reuses)
- Reduce context window size (remove old messages)
- Fix retry logic (don't retry bad requests)
- Consolidate monitoring tools
Longer-term:
- Multi-model routing (cost vs. quality tradeoff)
- Self-host embeddings (cheaper than API)
- Optimize database queries (reduce RDS costs)
- Implement tiered pricing (align cost with value)
Common TCO Calculation Gaps
Gap #1: Only Tracking API Costs
What founders say:
"Our OpenAI bill is $5K/month, so that's our cost."
Reality:
- Total cost is $12K-15K/month
- They're off by 140-200%
- Margins are 50% worse than they think
Fix: Track all infrastructure, monitoring, and storage costs.
Gap #2: Not Allocating Shared Infrastructure
What founders say:
"AWS is just $2K/month for our whole platform."
Reality:
- $2K includes web app, API, database, storage, etc.
- AI features consume 70% of infrastructure
- Real AI infrastructure cost: $1,400/month
Fix: Tag AWS resources by service, calculate AI-specific costs.
Gap #3: Not Tracking Failed Requests
What founders say:
"We retry on errors, so we don't lose requests."
Reality:
- 8% error rate with 3 retries = 24% wasted spend
- Paying for tokens that never deliver value
- Adds $2,400/month on $10K API bill
Fix: Monitor retry rates, set aggressive timeouts, cache responses.
Gap #4: Free Tier Costing Excludes Infrastructure
What founders say:
"Free tier costs us $0.50/user in API calls, so we can afford 1,000 free users."
Reality:
- API: $0.50/user
- Infrastructure: $0.30/user (storage, database, monitoring)
- Total cost: $0.80/user
- 1,000 free users = $800/month, not $500/month
Fix: Calculate true cost-to-serve including infrastructure per user.
Tools to Help You Track TCO
Bear Billing: Built for Complete Cost Visibility
We built Bear Billing because spreadsheets don't cut it for AI cost tracking.
What we track automatically:
- API costs from OpenAI, Anthropic, AWS Bedrock, etc.
- Infrastructure costs allocated to AI features
- Per-customer costs for margin analysis
- Failed requests and wasted spend
- True cost-to-serve including all overhead
Key features:
- Real-time cost dashboards
- Per-customer margin tracking
- Alert when customers burn margins
- Pricing scenario modeling
Join the waitlist for early access to comprehensive AI cost tracking.
Related Resources
- Usage Variance in AI Products - Per-customer cost distribution and margin analysis
- GitHub Copilot Unit Economics - Case study in AI product margin calculations
- The True Cost of Running AI APIs: 2025 Guide - Model pricing comparison and profitability calculations
- AI Cost Tracking Solution - Automated cost tracking across all providers
Key Takeaways
- API costs are only 30-40% of your total bill - Infrastructure, monitoring, storage, and failed requests make up the remaining 60-70%
- Cursor's AWS bill was 79% of Anthropic costs - Infrastructure often exceeds API spend at scale
- Monitoring is 5-10% but non-negotiable - Without it, you can't identify unprofitable customers
- Failed requests waste 2-3% of budget - Most companies don't even track this
- True cost-to-serve = API + Infrastructure + Monitoring + Storage + Overhead - Calculate this per customer
- Margin errors compound at scale - A 3-point margin error = $3M/year at $100M ARR
Complete cost visibility enables accurate margin calculations.
Join our waitlist for automated AI cost tracking.