Skip to main content
Back to Blog
technical6 min read

The True Cost of Running AI APIs: Complete 2025 Guide

Compare GPT-4, Claude, and Gemini pricing with real profitability calculations. See exact $/token costs, context window pricing, and margin analysis for AI SaaS startups.

BBT

Bear Billing Team

AI Billing Experts

#ai-costs#pricing#gpt-4#claude#gemini#unit-economics

Summary: This guide breaks down the exact costs of running AI APIs in 2025, comparing GPT-4, Claude Sonnet, and Gemini pricing models with real profitability calculations for SaaS startups. We'll show you how to calculate your true cost-to-serve and identify which customers are burning your margins.


Quick Comparison Table

ModelInput CostOutput CostContext WindowBest For
GPT-4 Turbo$10/1M tokens$30/1M tokens128KGeneral purpose, reasoning
GPT-4o$2.50/1M tokens$10/1M tokens128KCost-optimized GPT-4
Claude 3.5 Sonnet$3/1M tokens$15/1M tokens200KLong documents, code
Gemini 1.5 Pro$1.25/1M tokens$5/1M tokens1MCost-sensitive, large context
Claude 3 Haiku$0.25/1M tokens$1.25/1M tokens200KHigh-volume, simple tasks

Prices as of January 2025. Check official pricing pages for updates.


What is Cost-to-Serve?

Definition: Cost-to-serve is the total infrastructure cost required to deliver your product to one customer, including API costs, compute, storage, and overhead.

Formula:

Cost-to-Serve = (AI API Costs + Infrastructure + Overhead) / Active Users

For AI-powered SaaS products, AI API costs typically represent 60-80% of your total cost-to-serve.


Real-World Example: AI Writing Assistant

Let's calculate the economics for a hypothetical AI writing assistant (similar to Jasper, Copy.ai, or Writer):

Usage Assumptions

  • Average user: Generates 50,000 tokens/month (input + output combined)
  • Model split: 70% input tokens, 30% output tokens
  • Price: $29/month subscription

Cost Breakdown by Model

Option 1: GPT-4 Turbo

Input cost: 35,000 tokens × $10/1M = $0.35
Output cost: 15,000 tokens × $30/1M = $0.45
Total AI cost: $0.80/user/month

Margin: $29 - $0.80 = $28.20
Margin %: 97.2%

Option 2: GPT-4o (Cost-Optimized)

Input cost: 35,000 tokens × $2.50/1M = $0.09
Output cost: 15,000 tokens × $10/1M = $0.15
Total AI cost: $0.24/user/month

Margin: $29 - $0.24 = $28.76
Margin %: 99.2%

Option 3: Claude 3.5 Sonnet

Input cost: 35,000 tokens × $3/1M = $0.11
Output cost: 15,000 tokens × $15/1M = $0.23
Total AI cost: $0.34/user/month

Margin: $29 - $0.34 = $28.66
Margin %: 98.8%

Option 4: Claude 3 Haiku (High Volume)

Input cost: 35,000 tokens × $0.25/1M = $0.009
Output cost: 15,000 tokens × $1.25/1M = $0.019
Total AI cost: $0.03/user/month

Margin: $29 - $0.03 = $28.97
Margin %: 99.9%

High-Usage Customers: Margin Analysis

Not all users are average. Let's look at a power user:

For a detailed analysis of usage variance affecting GitHub Copilot, Cursor, and ChatGPT Pro, see: Usage Variance in AI Products: Understanding Per-Customer Cost Distribution.

Power User Profile

  • Generates 1,000,000 tokens/month (20x average)
  • Still pays $29/month subscription

GPT-4 Turbo Costs:

Input: 700,000 × $10/1M = $7.00
Output: 300,000 × $30/1M = $9.00
Total: $16.00/month

Margin: $29 - $16 = $13
Margin %: 44.8%

Still profitable, but margin compressed from 97% to 45%.

With Usage-Based Tiers:

If you implement:
- First 50k tokens: Included in $29 base
- Next 950k tokens: $0.015/1k tokens extra

Power user revenue: $29 + ($0.015 × 950) = $43.25
Margin: $43.25 - $16 = $27.25
Margin %: 63%

Better alignment between value and cost.


GitHub Copilot: A Cautionary Tale

GitHub Copilot famously loses money on most users:

  • Price: $10/month (or $19 for business)
  • Average cost: $20-$30/month per user
  • Result: Negative margins on average users

Why?

  1. Unlimited usage model
  2. Code generation is token-intensive
  3. Users generate millions of tokens/month
  4. Pricing hasn't adjusted since launch

Pattern: Usage-based pricing aligns revenue with cost-to-serve across customer segments.

Related: Learn exactly how to migrate from flat-rate to usage-based pricing without customer revolt: From Flat-Rate to Usage-Based Pricing Migration


How to Track Your Cost-to-Serve

Step 1: Instrument Your API Calls

// Example: Track tokens per request
async function callOpenAI(prompt: string, userId: string) {
  const response = await openai.chat.completions.create({
    model: 'gpt-4-turbo',
    messages: [{ role: 'user', content: prompt }],
  });

  // Track usage
  await trackUsage({
    userId,
    model: 'gpt-4-turbo',
    inputTokens: response.usage.prompt_tokens,
    outputTokens: response.usage.completion_tokens,
    cost: calculateCost(response.usage),
  });

  return response;
}

Step 2: Aggregate by Customer

SELECT
  customer_id,
  SUM(input_tokens * input_rate + output_tokens * output_rate) as total_cost,
  COUNT(DISTINCT user_id) as active_users,
  SUM(input_tokens * input_rate + output_tokens * output_rate) / COUNT(DISTINCT user_id) as cost_per_user
FROM usage_events
WHERE event_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY customer_id
ORDER BY total_cost DESC;

Step 3: Compare to Revenue

SELECT
  c.customer_id,
  c.mrr as monthly_revenue,
  u.total_cost as monthly_cost,
  (c.mrr - u.total_cost) as margin,
  ((c.mrr - u.total_cost) / c.mrr * 100) as margin_pct
FROM customers c
JOIN usage_costs u ON c.customer_id = u.customer_id
WHERE u.month = CURRENT_DATE - INTERVAL '1 month'
ORDER BY margin ASC;

When to Switch Models

Use GPT-4 Turbo when:

  • Quality matters more than cost
  • Complex reasoning required
  • Customer willing to pay premium

Use GPT-4o when:

  • Need GPT-4 quality at lower cost
  • High-volume use cases
  • Good balance of quality/cost

Use Claude Sonnet when:

  • Working with large documents (200K context)
  • Code generation and analysis
  • Need excellent instruction following

Use Haiku or Gemini when:

  • High-volume, simple tasks
  • Classification, routing, extraction
  • Cost is critical constraint

Profitability Checklist

  • Track token usage per user/customer
  • Calculate actual cost-to-serve monthly
  • Identify customers with negative margins
  • Implement usage caps or tiered pricing
  • Monitor margin trends over time
  • Test different model mixes for cost optimization
  • Build alerts for high-usage customers
  • Review pricing quarterly as AI costs drop

Next Steps

  1. Audit your current costs: Use tools like Bear Billing to see cost-to-serve by customer
  2. Test cheaper models: Run A/B tests with GPT-4o vs GPT-4 Turbo to see if users notice quality difference
  3. Implement usage caps: Add soft limits with upgrade prompts before margins go negative
  4. Consider hybrid pricing: Base + usage tiers protect margins while keeping customers happy

Tools & Resources


Want to track your AI costs automatically? Join our early access program for free white-glove setup and margin dashboards.

Share this article