Summary: This guide breaks down the exact costs of running AI APIs in 2025, comparing GPT-4, Claude Sonnet, and Gemini pricing models with real profitability calculations for SaaS startups. We'll show you how to calculate your true cost-to-serve and identify which customers are burning your margins.
Quick Comparison Table
| Model | Input Cost | Output Cost | Context Window | Best For |
|---|---|---|---|---|
| GPT-4 Turbo | $10/1M tokens | $30/1M tokens | 128K | General purpose, reasoning |
| GPT-4o | $2.50/1M tokens | $10/1M tokens | 128K | Cost-optimized GPT-4 |
| Claude 3.5 Sonnet | $3/1M tokens | $15/1M tokens | 200K | Long documents, code |
| Gemini 1.5 Pro | $1.25/1M tokens | $5/1M tokens | 1M | Cost-sensitive, large context |
| Claude 3 Haiku | $0.25/1M tokens | $1.25/1M tokens | 200K | High-volume, simple tasks |
Prices as of January 2025 (the Gemini 1.5 Pro rates shown apply to prompts up to 128K tokens; longer prompts are billed at a higher rate). Check official pricing pages for updates.
What is Cost-to-Serve?
Definition: Cost-to-serve is the total infrastructure cost required to deliver your product to one customer, including API costs, compute, storage, and overhead.
Formula:
Cost-to-Serve = (AI API Costs + Infrastructure + Overhead) / Active Users
For AI-powered SaaS products, AI API costs typically represent 60-80% of your total cost-to-serve.
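To make the formula concrete, here's a minimal TypeScript sketch; the dollar figures are made up for illustration, but notice that the AI line item dominates, consistent with the 60-80% share above.

// Illustrative only: plug in your own monthly totals.
interface MonthlyCosts {
  aiApiCosts: number;     // LLM API spend for the month (USD)
  infrastructure: number; // compute, storage, networking (USD)
  overhead: number;       // monitoring, support tooling, etc. (USD)
}

function costToServe(costs: MonthlyCosts, activeUsers: number): number {
  const total = costs.aiApiCosts + costs.infrastructure + costs.overhead;
  return total / activeUsers;
}

// Example: $8,000 AI spend (~73% of total) + $2,000 infra + $1,000 overhead, 5,000 active users
console.log(costToServe({ aiApiCosts: 8_000, infrastructure: 2_000, overhead: 1_000 }, 5_000)); // 2.2 per user/month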
Real-World Example: AI Writing Assistant
Let's calculate the economics for a hypothetical AI writing assistant (similar to Jasper, Copy.ai, or Writer):
Usage Assumptions
- Average user: Generates 50,000 tokens/month (input + output combined)
- Token split: 70% input tokens, 30% output tokens
- Price: $29/month subscription
Cost Breakdown by Model
Option 1: GPT-4 Turbo
Input cost: 35,000 tokens × $10/1M = $0.35
Output cost: 15,000 tokens × $30/1M = $0.45
Total AI cost: $0.80/user/month
Margin: $29 - $0.80 = $28.20
Margin %: 97.2%
Option 2: GPT-4o (Cost-Optimized)
Input cost: 35,000 tokens × $2.50/1M = $0.09
Output cost: 15,000 tokens × $10/1M = $0.15
Total AI cost: $0.24/user/month
Margin: $29 - $0.24 = $28.76
Margin %: 99.2%
Option 3: Claude 3.5 Sonnet
Input cost: 35,000 tokens × $3/1M = $0.11
Output cost: 15,000 tokens × $15/1M = $0.23
Total AI cost: $0.33/user/month
Margin: $29 - $0.33 = $28.67
Margin %: 98.9%
Option 4: Claude 3 Haiku (High Volume)
Input cost: 35,000 tokens × $0.25/1M = $0.009
Output cost: 15,000 tokens × $1.25/1M = $0.019
Total AI cost: $0.03/user/month
Margin: $29 - $0.03 = $28.97
Margin %: 99.9%
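If you want to reproduce these numbers, or rerun them with your own token mix and price point, here's a small sketch. The rate table mirrors the January 2025 comparison table above; monthlyAiCost and marginPct are illustrative helper names, not a library API.

// Per-million-token rates (USD), copied from the comparison table above.
const RATES = {
  'gpt-4-turbo':       { input: 10,   output: 30 },
  'gpt-4o':            { input: 2.5,  output: 10 },
  'claude-3.5-sonnet': { input: 3,    output: 15 },
  'claude-3-haiku':    { input: 0.25, output: 1.25 },
} as const;

type Model = keyof typeof RATES;

function monthlyAiCost(model: Model, inputTokens: number, outputTokens: number): number {
  const r = RATES[model];
  return (inputTokens / 1_000_000) * r.input + (outputTokens / 1_000_000) * r.output;
}

function marginPct(price: number, cost: number): number {
  return ((price - cost) / price) * 100;
}

// Average user: 35k input + 15k output tokens on a $29/month plan
for (const model of Object.keys(RATES) as Model[]) {
  const cost = monthlyAiCost(model, 35_000, 15_000);
  console.log(model, `$${cost.toFixed(2)}`, `${marginPct(29, cost).toFixed(1)}%`);
}

The same helper covers the power-user scenario below: monthlyAiCost('gpt-4-turbo', 700_000, 300_000) works out to $16.00/month.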
High-Usage Customers: Margin Analysis
Not all users are average. Let's look at a power user:
For a detailed analysis of usage variance affecting GitHub Copilot, Cursor, and ChatGPT Pro, see: Usage Variance in AI Products: Understanding Per-Customer Cost Distribution.
Power User Profile
- Generates 1,000,000 tokens/month (20x average)
- Still pays $29/month subscription
GPT-4 Turbo Costs:
Input: 700,000 × $10/1M = $7.00
Output: 300,000 × $30/1M = $9.00
Total: $16.00/month
Margin: $29 - $16 = $13
Margin %: 44.8%
Still profitable, but margin compressed from 97% to 45%.
With Usage-Based Tiers:
If you implement:
- First 50k tokens: Included in $29 base
- Next 950k tokens: $0.015/1k tokens extra
Power user revenue: $29 + ($0.015 × 950) = $43.25
Margin: $43.25 - $16 = $27.25
Margin %: 63%
Better alignment between value and cost.
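Here's that tier math as a sketch; the $29 base, 50k included tokens, and $0.015/1k overage rate are the example values above, not a pricing recommendation.

// Example tier from above: $29 base includes 50k tokens, overage billed at $0.015 per 1k tokens.
const BASE_PRICE = 29;
const INCLUDED_TOKENS = 50_000;
const OVERAGE_PER_1K = 0.015;

function tieredRevenue(tokensUsed: number): number {
  const overageTokens = Math.max(0, tokensUsed - INCLUDED_TOKENS);
  return BASE_PRICE + (overageTokens / 1_000) * OVERAGE_PER_1K;
}

// Power user: 1M tokens/month, ~$16.00 of GPT-4 Turbo cost
const revenue = tieredRevenue(1_000_000);               // 29 + 950 * 0.015 = 43.25
const margin = revenue - 16;                            // 27.25
console.log(revenue, margin, (margin / revenue) * 100); // 43.25, 27.25, ~63%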
GitHub Copilot: A Cautionary Tale
GitHub Copilot has reportedly lost money on many of its users:
- Price: $10/month for individuals (or $19/month for business)
- Reported average cost: $20-$30/month per user
- Result: Negative margins on typical users
Why?
- Unlimited usage model
- Code generation is token-intensive
- Users generate millions of tokens/month
- Pricing hasn't adjusted since launch
Pattern: Usage-based pricing aligns revenue with cost-to-serve across customer segments.
Related: Learn exactly how to migrate from flat-rate to usage-based pricing without customer revolt: From Flat-Rate to Usage-Based Pricing Migration
How to Track Your Cost-to-Serve
Step 1: Instrument Your API Calls
// Example: Track tokens per request
import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function callOpenAI(prompt: string, userId: string) {
  const response = await openai.chat.completions.create({
    model: 'gpt-4-turbo',
    messages: [{ role: 'user', content: prompt }],
  });

  // Track usage (trackUsage and calculateCost are your own helpers; a calculateCost sketch follows below)
  await trackUsage({
    userId,
    model: 'gpt-4-turbo',
    inputTokens: response.usage?.prompt_tokens ?? 0,
    outputTokens: response.usage?.completion_tokens ?? 0,
    cost: calculateCost(response.usage),
  });

  return response;
}
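For completeness, here's one way calculateCost could look; the rates mirror the table above, and trackUsage is assumed to write each event into whatever store backs the SQL in Step 2.

// Minimal sketch of the cost helper used above (GPT-4 Turbo rates from the January 2025 table).
const GPT4_TURBO_INPUT_PER_M = 10;
const GPT4_TURBO_OUTPUT_PER_M = 30;

function calculateCost(usage?: { prompt_tokens: number; completion_tokens: number }): number {
  if (!usage) return 0;
  return (
    (usage.prompt_tokens / 1_000_000) * GPT4_TURBO_INPUT_PER_M +
    (usage.completion_tokens / 1_000_000) * GPT4_TURBO_OUTPUT_PER_M
  );
}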
Step 2: Aggregate by Customer
-- Per-customer AI cost over the trailing 30 days (assumes per-event input_rate/output_rate columns)
SELECT
customer_id,
SUM(input_tokens * input_rate + output_tokens * output_rate) as total_cost,
COUNT(DISTINCT user_id) as active_users,
SUM(input_tokens * input_rate + output_tokens * output_rate) / COUNT(DISTINCT user_id) as cost_per_user
FROM usage_events
WHERE event_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY customer_id
ORDER BY total_cost DESC;
Step 3: Compare to Revenue
SELECT
c.customer_id,
c.mrr as monthly_revenue,
u.total_cost as monthly_cost,
(c.mrr - u.total_cost) as margin,
((c.mrr - u.total_cost) / c.mrr * 100) as margin_pct
FROM customers c
JOIN usage_costs u ON c.customer_id = u.customer_id
WHERE u.month = date_trunc('month', CURRENT_DATE - INTERVAL '1 month') -- last full month
ORDER BY margin ASC;
When to Switch Models
Use GPT-4 Turbo when:
- Quality matters more than cost
- Complex reasoning required
- Customer willing to pay premium
Use GPT-4o when:
- Need GPT-4 quality at lower cost
- High-volume use cases
- Good balance of quality/cost
Use Claude Sonnet when:
- Working with large documents (200K context)
- Code generation and analysis
- Need excellent instruction following
Use Haiku or Gemini when:
- High-volume, simple tasks
- Classification, routing, extraction
- Cost is critical constraint
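Those rules of thumb can live in a small routing layer. This is an illustrative sketch, not a prescription: the task categories, thresholds, and model names are assumptions you'd adapt to your product.

// Pick a model per request based on the guidelines above.
type Task = 'classification' | 'extraction' | 'chat' | 'code' | 'long-document' | 'complex-reasoning';

function pickModel(task: Task, contextTokens: number): string {
  if (contextTokens > 200_000) return 'gemini-1.5-pro';                            // only option here with a 1M window
  if (task === 'classification' || task === 'extraction') return 'claude-3-haiku'; // high-volume, simple tasks
  if (task === 'code' || task === 'long-document') return 'claude-3.5-sonnet';     // code and large documents
  if (task === 'complex-reasoning') return 'gpt-4-turbo';                          // quality over cost
  return 'gpt-4o';                                                                 // balanced quality/cost default
}

console.log(pickModel('classification', 2_000)); // claude-3-haiku
console.log(pickModel('chat', 4_000));           // gpt-4o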
Profitability Checklist
- Track token usage per user/customer
- Calculate actual cost-to-serve monthly
- Identify customers with negative margins
- Implement usage caps or tiered pricing
- Monitor margin trends over time
- Test different model mixes for cost optimization
- Build alerts for high-usage customers (see the sketch after this list)
- Review pricing quarterly as AI costs drop
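For the alerting item above, a minimal sketch; the 50% cost-to-revenue threshold and the data shape are placeholders to adapt to your billing and usage data.

// Flag users whose monthly AI cost consumes too much of their subscription revenue.
interface UserUsage { userId: string; monthlyAiCost: number; monthlyRevenue: number; }

function flagHighUsageUsers(users: UserUsage[], costToRevenueThreshold = 0.5): UserUsage[] {
  return users.filter((u) => u.monthlyAiCost >= u.monthlyRevenue * costToRevenueThreshold);
}

const flagged = flagHighUsageUsers([
  { userId: 'u_power', monthlyAiCost: 16, monthlyRevenue: 29 }, // power user from earlier: flagged
  { userId: 'u_avg', monthlyAiCost: 0.8, monthlyRevenue: 29 },  // average user: fine
]);
console.log(flagged.map((u) => u.userId)); // ['u_power']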
Next Steps
- Audit your current costs: Use tools like Bear Billing to see cost-to-serve by customer
- Test cheaper models: Run A/B tests with GPT-4o vs GPT-4 Turbo to see if users notice a quality difference
- Implement usage caps: Add soft limits with upgrade prompts before margins go negative
- Consider hybrid pricing: Base + usage tiers protect margins while keeping customers happy
Tools & Resources
- OpenAI Pricing
- Anthropic Claude Pricing
- Google Gemini Pricing
- Bear Billing - Track cost-to-serve and margins automatically
- Token Estimation Predictive Model - Estimate token counts
Want to track your AI costs automatically? Join our early access program for free white-glove setup and margin dashboards.