Summary: This guide breaks down the exact costs of running AI APIs in 2025, comparing GPT-4, Claude Sonnet, and Gemini pricing models with real profitability calculations for SaaS startups. We'll show you how to calculate your true cost-to-serve and identify which customers are burning your margins.
Quick Comparison Table
| Model | Input Cost | Output Cost | Context Window | Best For |
|---|---|---|---|---|
| GPT-4 Turbo | $10/1M tokens | $30/1M tokens | 128K | General purpose, reasoning |
| GPT-4o | $2.50/1M tokens | $10/1M tokens | 128K | Cost-optimized GPT-4 |
| Claude 3.5 Sonnet | $3/1M tokens | $15/1M tokens | 200K | Long documents, code |
| Gemini 1.5 Pro | $1.25/1M tokens | $5/1M tokens | 1M | Cost-sensitive, large context |
| Claude 3 Haiku | $0.25/1M tokens | $1.25/1M tokens | 200K | High-volume, simple tasks |
Prices as of January 2025 (the Gemini 1.5 Pro rates shown apply to prompts up to 128K tokens; longer prompts are billed at a higher rate). Check official pricing pages for updates.
What is Cost-to-Serve?
Definition: Cost-to-serve is the total infrastructure cost required to deliver your product to one customer, including API costs, compute, storage, and overhead.
Formula:
Cost-to-Serve = (AI API Costs + Infrastructure + Overhead) / Active Users
For AI-powered SaaS products, AI API costs typically represent 60-80% of your total cost-to-serve.
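To make the formula concrete, here's a minimal TypeScript sketch; the dollar figures are made up for illustration, but notice that the AI line item dominates, consistent with the 60-80% share above.

// Illustrative only: plug in your own monthly totals.
interface MonthlyCosts {
  aiApiCosts: number;     // LLM API spend for the month (USD)
  infrastructure: number; // compute, storage, networking (USD)
  overhead: number;       // monitoring, support tooling, etc. (USD)
}

function costToServe(costs: MonthlyCosts, activeUsers: number): number {
  const total = costs.aiApiCosts + costs.infrastructure + costs.overhead;
  return total / activeUsers;
}

// Example: $8,000 AI spend (~73% of total) + $2,000 infra + $1,000 overhead, 5,000 active users
console.log(costToServe({ aiApiCosts: 8_000, infrastructure: 2_000, overhead: 1_000 }, 5_000)); // 2.2 per user/month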
Real-World Example: AI Writing Assistant
Let's calculate the economics for a hypothetical AI writing assistant (similar to Jasper, Copy.ai, or Writer):
Usage Assumptions
- Average user: Generates 50,000 tokens/month (input + output combined)
- Token split: 70% input tokens, 30% output tokens
- Price: $29/month subscription
Cost Breakdown by Model
Option 1: GPT-4 Turbo
Input cost: 35,000 tokens × $10/1M = $0.35
Output cost: 15,000 tokens × $30/1M = $0.45
Total AI cost: $0.80/user/month
Margin: $29 - $0.80 = $28.20
Margin %: 97.2%
Option 2: GPT-4o (Cost-Optimized)
Input cost: 35,000 tokens × $2.50/1M = $0.09
Output cost: 15,000 tokens × $10/1M = $0.15
Total AI cost: $0.24/user/month
Margin: $29 - $0.24 = $28.76
Margin %: 99.2%
Option 3: Claude 3.5 Sonnet
Input cost: 35,000 tokens × $3/1M = $0.11
Output cost: 15,000 tokens × $15/1M = $0.23
Total AI cost: $0.33/user/month
Margin: $29 - $0.33 = $28.67
Margin %: 98.9%
Option 4: Claude 3 Haiku (High Volume)
Input cost: 35,000 tokens × $0.25/1M = $0.009
Output cost: 15,000 tokens × $1.25/1M = $0.019
Total AI cost: $0.03/user/month
Margin: $29 - $0.03 = $28.97
Margin %: 99.9%
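If you want to reproduce these numbers, or rerun them with your own token mix and price point, here's a small sketch. The rate table mirrors the January 2025 comparison table above; monthlyAiCost and marginPct are illustrative helper names, not a library API.

// Per-million-token rates (USD), copied from the comparison table above.
const RATES = {
  'gpt-4-turbo':       { input: 10,   output: 30 },
  'gpt-4o':            { input: 2.5,  output: 10 },
  'claude-3.5-sonnet': { input: 3,    output: 15 },
  'claude-3-haiku':    { input: 0.25, output: 1.25 },
} as const;

type Model = keyof typeof RATES;

function monthlyAiCost(model: Model, inputTokens: number, outputTokens: number): number {
  const r = RATES[model];
  return (inputTokens / 1_000_000) * r.input + (outputTokens / 1_000_000) * r.output;
}

function marginPct(price: number, cost: number): number {
  return ((price - cost) / price) * 100;
}

// Average user: 35k input + 15k output tokens on a $29/month plan
for (const model of Object.keys(RATES) as Model[]) {
  const cost = monthlyAiCost(model, 35_000, 15_000);
  console.log(model, `$${cost.toFixed(2)}`, `${marginPct(29, cost).toFixed(1)}%`);
}

The same helper covers the power-user scenario below: monthlyAiCost('gpt-4-turbo', 700_000, 300_000) works out to $16.00/month.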
High-Usage Customers: Margin Analysis
Not all users are average. Let's look at a power user:
For a detailed analysis of usage variance affecting GitHub Copilot, Cursor, and ChatGPT Pro, see: Usage Variance in AI Products: Understanding Per-Customer Cost Distribution.
Power User Profile
- Generates 1,000,000 tokens/month (20x average)
- Still pays $29/month subscription
GPT-4 Turbo Costs:
Input: 700,000 × $10/1M = $7.00
Output: 300,000 × $30/1M = $9.00
Total: $16.00/month
Margin: $29 - $16 = $13
Margin %: 44.8%
Still profitable, but margin compressed from 97% to 45%.
With Usage-Based Tiers:
If you implement:
- First 50k tokens: Included in $29 base
- Next 950k tokens: $0.015/1k tokens extra
Power user revenue: $29 + ($0.015 × 950) = $43.25
Margin: $43.25 - $16 = $27.25
Margin %: 63%
Better alignment between value and cost.
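Here's that tier math as a sketch; the $29 base, 50k included tokens, and $0.015/1k overage rate are the example values above, not a pricing recommendation.

// Example tier from above: $29 base includes 50k tokens, overage billed at $0.015 per 1k tokens.
const BASE_PRICE = 29;
const INCLUDED_TOKENS = 50_000;
const OVERAGE_PER_1K = 0.015;

function tieredRevenue(tokensUsed: number): number {
  const overageTokens = Math.max(0, tokensUsed - INCLUDED_TOKENS);
  return BASE_PRICE + (overageTokens / 1_000) * OVERAGE_PER_1K;
}

// Power user: 1M tokens/month, ~$16.00 of GPT-4 Turbo cost
const revenue = tieredRevenue(1_000_000);               // 29 + 950 * 0.015 = 43.25
const margin = revenue - 16;                            // 27.25
console.log(revenue, margin, (margin / revenue) * 100); // 43.25, 27.25, ~63%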
GitHub Copilot: A Cautionary Tale
GitHub Copilot has reportedly lost money on many of its users:
- Price: $10/month for individuals (or $19/month for business)
- Reported average cost: $20-$30/month per user
- Result: Negative margins on typical users
Why?
- Unlimited usage model
- Code generation is token-intensive
- Users generate millions of tokens/month
- Pricing hasn't adjusted since launch
Pattern: Usage-based pricing aligns revenue with cost-to-serve across customer segments.
Related: Learn exactly how to migrate from flat-rate to usage-based pricing without customer revolt: From Flat-Rate to Usage-Based Pricing Migration
How to Track Your Cost-to-Serve
Step 1: Instrument Your API Calls
// Example: Track tokens per request
import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function callOpenAI(prompt: string, userId: string) {
  const response = await openai.chat.completions.create({
    model: 'gpt-4-turbo',
    messages: [{ role: 'user', content: prompt }],
  });

  // Track usage (trackUsage and calculateCost are your own helpers; a calculateCost sketch follows below)
  await trackUsage({
    userId,
    model: 'gpt-4-turbo',
    inputTokens: response.usage?.prompt_tokens ?? 0,
    outputTokens: response.usage?.completion_tokens ?? 0,
    cost: calculateCost(response.usage),
  });

  return response;
}
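For completeness, here's one way calculateCost could look; the rates mirror the table above, and trackUsage is assumed to write each event into whatever store backs the SQL in Step 2.

// Minimal sketch of the cost helper used above (GPT-4 Turbo rates from the January 2025 table).
const GPT4_TURBO_INPUT_PER_M = 10;
const GPT4_TURBO_OUTPUT_PER_M = 30;

function calculateCost(usage?: { prompt_tokens: number; completion_tokens: number }): number {
  if (!usage) return 0;
  return (
    (usage.prompt_tokens / 1_000_000) * GPT4_TURBO_INPUT_PER_M +
    (usage.completion_tokens / 1_000_000) * GPT4_TURBO_OUTPUT_PER_M
  );
}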
Step 2: Aggregate by Customer
-- Per-customer AI cost over the trailing 30 days (assumes per-event input_rate/output_rate columns)
SELECT
customer_id,
SUM(input_tokens * input_rate + output_tokens * output_rate) as total_cost,
COUNT(DISTINCT user_id) as active_users,
SUM(input_tokens * input_rate + output_tokens * output_rate) / COUNT(DISTINCT user_id) as cost_per_user
FROM usage_events
WHERE event_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY customer_id
ORDER BY total_cost DESC;
Step 3: Compare to Revenue
SELECT
c.customer_id,
c.mrr as monthly_revenue,
u.total_cost as monthly_cost,
(c.mrr - u.total_cost) as margin,
((c.mrr - u.total_cost) / c.mrr * 100) as margin_pct
FROM customers c
JOIN usage_costs u ON c.customer_id = u.customer_id
WHERE u.month = date_trunc('month', CURRENT_DATE - INTERVAL '1 month') -- last full month
ORDER BY margin ASC;
When to Switch Models
Use GPT-4 Turbo when:
- Quality matters more than cost
- Complex reasoning required
- Customer willing to pay premium
Use GPT-4o when:
- Need GPT-4 quality at lower cost
- High-volume use cases
- Good balance of quality/cost
Use Claude Sonnet when:
- Working with large documents (200K context)
- Code generation and analysis
- Need excellent instruction following
Use Haiku or Gemini when:
- High-volume, simple tasks
- Classification, routing, extraction
- Cost is critical constraint
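Those rules of thumb can live in a small routing layer. This is an illustrative sketch, not a prescription: the task categories, thresholds, and model names are assumptions you'd adapt to your product.

// Pick a model per request based on the guidelines above.
type Task = 'classification' | 'extraction' | 'chat' | 'code' | 'long-document' | 'complex-reasoning';

function pickModel(task: Task, contextTokens: number): string {
  if (contextTokens > 200_000) return 'gemini-1.5-pro';                            // only option here with a 1M window
  if (task === 'classification' || task === 'extraction') return 'claude-3-haiku'; // high-volume, simple tasks
  if (task === 'code' || task === 'long-document') return 'claude-3.5-sonnet';     // code and large documents
  if (task === 'complex-reasoning') return 'gpt-4-turbo';                          // quality over cost
  return 'gpt-4o';                                                                 // balanced quality/cost default
}

console.log(pickModel('classification', 2_000)); // claude-3-haiku
console.log(pickModel('chat', 4_000));           // gpt-4o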
Profitability Checklist
- Track token usage per user/customer
- Calculate actual cost-to-serve monthly
- Identify customers with negative margins
- Implement usage caps or tiered pricing
- Monitor margin trends over time
- Test different model mixes for cost optimization
- Build alerts for high-usage customers (see the sketch after this list)
- Review pricing quarterly as AI costs drop
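For the alerting item above, a minimal sketch; the 50% cost-to-revenue threshold and the data shape are placeholders to adapt to your billing and usage data.

// Flag users whose monthly AI cost consumes too much of their subscription revenue.
interface UserUsage { userId: string; monthlyAiCost: number; monthlyRevenue: number; }

function flagHighUsageUsers(users: UserUsage[], costToRevenueThreshold = 0.5): UserUsage[] {
  return users.filter((u) => u.monthlyAiCost >= u.monthlyRevenue * costToRevenueThreshold);
}

const flagged = flagHighUsageUsers([
  { userId: 'u_power', monthlyAiCost: 16, monthlyRevenue: 29 }, // power user from earlier: flagged
  { userId: 'u_avg', monthlyAiCost: 0.8, monthlyRevenue: 29 },  // average user: fine
]);
console.log(flagged.map((u) => u.userId)); // ['u_power']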
Next Steps
- Audit your current costs: Use tools like Bear Billing to see cost-to-serve by customer
- Test cheaper models: Run A/B tests with GPT-4o vs GPT-4 Turbo to see if users notice a quality difference
- Implement usage caps: Add soft limits with upgrade prompts before margins go negative
- Consider hybrid pricing: Base + usage tiers protect margins while keeping customers happy
Tools & Resources
- OpenAI Pricing
- Anthropic Claude Pricing
- Google Gemini Pricing
- Bear Billing - Track cost-to-serve and margins automatically
- Token Estimation Predictive Model - Estimate token counts
Want to track your AI costs automatically? Join our early access program for free white-glove setup and margin dashboards.