← All Articles · · 12 min read

OpenAI API Pricing Calculator: How to Estimate Your Monthly Costs

A practical guide to estimating your OpenAI API costs — GPT-4o, GPT-4o-mini, and o3 pricing explained, real calculation examples for chatbots and summarizers, cost optimization strategies, and a comparison of OpenAI vs Anthropic vs Gemini pricing.

openaiapi-pricingcost-optimizationaigpt

One of the most common developer surprises when building with OpenAI’s API is the bill at the end of the month. Token-based pricing is not intuitive, and it’s easy to underestimate costs when you’re moving fast in development. A chatbot that seems cheap in testing can become expensive at scale if you haven’t done the math.

This guide shows you how to estimate your OpenAI API costs before they surprise you — with real calculation examples, optimization strategies, and a cross-provider comparison so you can make informed model choices.


Understanding OpenAI’s Pricing Model

OpenAI charges based on tokens — not requests, not words, not time. A token is roughly 4 characters or 0.75 words in English. Every API call consumes input tokens (your prompt) and produces output tokens (the response). You pay for both, at different rates.

Key insight: Input tokens are always cheaper than output tokens. For most models, output costs 3–5× more per token than input. This matters for how you architect your prompts.


Current OpenAI Model Pricing (2025)

Prices are per 1 million tokens unless otherwise noted:

ModelInputCached InputOutput
GPT-4o$2.50$1.25$10.00
GPT-4o-mini$0.15$0.075$0.60
o3$10.00$2.50$40.00
o3-mini$1.10$0.55$4.40
o1$15.00$7.50$60.00

Notes:

  • Cached input pricing applies when the same prompt prefix is reused (see Prompt Caching below)
  • o3 and o1 are reasoning models — they think before responding, producing “reasoning tokens” that contribute to cost
  • Check platform.openai.com/docs/pricing for the latest figures — prices change

The Token Math: How to Calculate Costs

The formula is straightforward:

Cost = (input_tokens × input_price_per_token) + (output_tokens × output_price_per_token)

Where:

input_price_per_token = model_input_price / 1,000,000
output_price_per_token = model_output_price / 1,000,000

Quick Reference: Token Counting

ContentApproximate Token Count
Single word~1 token
Average sentence~15–20 tokens
1 page of text (~500 words)~650–750 tokens
Short paragraph~100 tokens
System prompt (200 words)~260 tokens
Full book chapter~4,000–8,000 tokens

You can count tokens precisely using OpenAI’s tiktoken library:

import tiktoken

def count_tokens(text: str, model: str = "gpt-4o") -> int:
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

# Example
system_prompt = "You are a helpful customer support assistant for an e-commerce store."
user_message = "I ordered 3 days ago and my package hasn't arrived. Order number is #78432."

system_tokens = count_tokens(system_prompt)
user_tokens = count_tokens(user_message)
total_input = system_tokens + user_tokens

print(f"System prompt: {system_tokens} tokens")
print(f"User message: {user_tokens} tokens")
print(f"Total input: {total_input} tokens")

Real Calculation Examples

Example 1: Customer Support Chatbot

Setup:

  • System prompt: 300 tokens (role definition, product info, guidelines)
  • Average user message: 80 tokens
  • Average conversation history passed: 400 tokens (last 3 turns)
  • Average assistant response: 200 tokens
  • Volume: 50,000 conversations/month

Per conversation cost (GPT-4o):

Input tokens:  300 (system) + 80 (user) + 400 (history) = 780 tokens
Output tokens: 200 tokens

Input cost:  780 × ($2.50 / 1,000,000)  = $0.00195
Output cost: 200 × ($10.00 / 1,000,000) = $0.00200
Cost per conversation: ~$0.004

Monthly cost at 50,000 conversations:

50,000 × $0.004 = $200/month (GPT-4o)
50,000 × $0.00024 = $12/month (GPT-4o-mini)

Recommendation: Use GPT-4o-mini for straightforward support queries and escalate to GPT-4o only for complex cases. Potential savings: 94%.


Example 2: Document Summarizer

Setup:

  • Average document length: 3,000 tokens
  • System prompt: 150 tokens
  • Average summary output: 400 tokens
  • Volume: 10,000 documents/month

Per document cost (GPT-4o):

Input tokens:  3,000 (doc) + 150 (system) = 3,150 tokens
Output tokens: 400 tokens

Input cost:  3,150 × ($2.50 / 1,000,000)  = $0.007875
Output cost: 400 × ($10.00 / 1,000,000)   = $0.004000
Cost per document: ~$0.012

Monthly cost at 10,000 documents:

GPT-4o:      10,000 × $0.012  = $120/month
GPT-4o-mini: 10,000 × $0.00072 = $7.20/month

For summarization tasks where you don’t need frontier reasoning, GPT-4o-mini produces excellent results at a fraction of the cost.


Example 3: AI Coding Assistant

Setup:

  • System prompt: 500 tokens (coding guidelines, language constraints)
  • Average code context passed: 1,500 tokens
  • Average user question: 100 tokens
  • Average code response: 800 tokens
  • Volume: 5,000 requests/month

Per request cost (GPT-4o):

Input tokens:  500 + 1,500 + 100 = 2,100 tokens
Output tokens: 800 tokens

Input cost:  2,100 × ($2.50 / 1,000,000)  = $0.00525
Output cost: 800 × ($10.00 / 1,000,000)   = $0.00800
Cost per request: ~$0.013

Monthly cost:

GPT-4o:   5,000 × $0.013  = $65/month
o3-mini:  5,000 × ($2,100 × 0.0000011 + $800 × 0.0000044) = ~$23/month

For coding tasks requiring deeper reasoning, o3-mini can outperform GPT-4o at a lower cost — run your own evals to verify for your specific use case.


Building a Cost Estimator in Python

Here’s a reusable calculator you can adapt for your own apps:

# OpenAI pricing per million tokens (update as needed)
PRICING = {
    "gpt-4o": {
        "input": 2.50,
        "cached_input": 1.25,
        "output": 10.00
    },
    "gpt-4o-mini": {
        "input": 0.15,
        "cached_input": 0.075,
        "output": 0.60
    },
    "o3": {
        "input": 10.00,
        "cached_input": 2.50,
        "output": 40.00
    },
    "o3-mini": {
        "input": 1.10,
        "cached_input": 0.55,
        "output": 4.40
    }
}

def estimate_monthly_cost(
    model: str,
    avg_input_tokens: int,
    avg_output_tokens: int,
    monthly_requests: int,
    cached_input_fraction: float = 0.0
) -> dict:
    """
    Estimate monthly API costs.

    Args:
        model: Model name (e.g., "gpt-4o")
        avg_input_tokens: Average input tokens per request
        avg_output_tokens: Average output tokens per request
        monthly_requests: Expected monthly request volume
        cached_input_fraction: Fraction of input that qualifies for cached pricing (0.0–1.0)
    """
    if model not in PRICING:
        raise ValueError(f"Unknown model: {model}. Available: {list(PRICING.keys())}")

    prices = PRICING[model]

    # Split input into cached vs non-cached
    cached_tokens = avg_input_tokens * cached_input_fraction
    fresh_tokens = avg_input_tokens * (1 - cached_input_fraction)

    cost_per_request = (
        (fresh_tokens * prices["input"] / 1_000_000) +
        (cached_tokens * prices["cached_input"] / 1_000_000) +
        (avg_output_tokens * prices["output"] / 1_000_000)
    )

    monthly_cost = cost_per_request * monthly_requests

    return {
        "model": model,
        "cost_per_request_usd": round(cost_per_request, 6),
        "monthly_cost_usd": round(monthly_cost, 2),
        "monthly_input_tokens": avg_input_tokens * monthly_requests,
        "monthly_output_tokens": avg_output_tokens * monthly_requests,
    }


def compare_models(avg_input_tokens, avg_output_tokens, monthly_requests):
    """Compare costs across all models for a given workload."""
    results = []
    for model in PRICING:
        result = estimate_monthly_cost(
            model, avg_input_tokens, avg_output_tokens, monthly_requests
        )
        results.append(result)
    return sorted(results, key=lambda x: x["monthly_cost_usd"])


# Example: Chatbot with 1,000 tokens input, 300 tokens output, 100k requests/month
comparison = compare_models(1000, 300, 100_000)
for r in comparison:
    print(f"{r['model']:20} ${r['monthly_cost_usd']:>10,.2f}/month")

Output for this example:

gpt-4o-mini          $    330.00/month
o3-mini              $  2,420.00/month
gpt-4o               $  5,500.00/month
o3                   $ 22,000.00/month

Paste the JSON output into DevPlaybook’s JSON Formatter to inspect your cost breakdown in a readable format.


Cost Optimization Strategies

1. Prompt Caching

OpenAI’s prompt caching automatically caches repeated prompt prefixes and charges 50% for cached tokens. This is highly effective when you have a large, static system prompt.

To maximize cache hits:

  • Keep your static content (system prompt, reference documents, examples) at the beginning of the prompt
  • Keep dynamic content (user message, conversation history) at the end
  • Cache hits apply when the same prefix appears in multiple requests within the caching window

Potential savings: 30–60% on input costs for applications with large, reused system prompts.

2. Model Selection Routing

Not every request needs your most expensive model. Route by complexity:

def select_model(task_type: str, complexity_score: int) -> str:
    """
    Select model based on task type and complexity.
    complexity_score: 1 (trivial) to 10 (highly complex)
    """
    if task_type in ["classification", "keyword_extraction", "sentiment"] or complexity_score <= 3:
        return "gpt-4o-mini"

    if task_type in ["code_reasoning", "math", "multi_step_analysis"] or complexity_score >= 8:
        return "o3-mini"

    return "gpt-4o"  # Default for mid-complexity tasks

3. Batch Processing

OpenAI’s Batch API offers a 50% cost reduction for non-real-time workloads (24-hour turnaround):

from openai import OpenAI
import json

client = OpenAI()

# Prepare batch
requests = [
    {
        "custom_id": f"task-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": f"Summarize: {document}"}],
            "max_tokens": 200
        }
    }
    for i, document in enumerate(documents)
]

# Write JSONL file
with open("batch_requests.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")

# Upload and submit batch
with open("batch_requests.jsonl", "rb") as f:
    batch_file = client.files.create(file=f, purpose="batch")

batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h"
)

print(f"Batch submitted: {batch.id}")

When to use batching: Nightly report generation, bulk document processing, offline classification jobs. Not appropriate for user-facing real-time responses.

4. Trim Conversation History

Passing your full conversation history on every turn is one of the most common cost leaks:

def trim_history(messages: list, max_tokens: int = 2000) -> list:
    """Keep only the most recent messages that fit within token budget."""
    import tiktoken
    enc = tiktoken.encoding_for_model("gpt-4o")

    trimmed = []
    token_count = 0

    # Always keep the system message (index 0)
    # Iterate history in reverse to keep most recent
    for message in reversed(messages[1:]):
        msg_tokens = len(enc.encode(message["content"]))
        if token_count + msg_tokens > max_tokens:
            break
        trimmed.insert(0, message)
        token_count += msg_tokens

    return [messages[0]] + trimmed  # Prepend system message

5. Set max_tokens Appropriately

If you only need a 100-word answer, don’t set max_tokens=4096. While you only pay for tokens actually generated, a tight max_tokens budget forces the model to be concise, which often improves quality for structured outputs.


OpenAI vs Anthropic vs Gemini: Pricing Comparison

As of early 2025, for the mid-tier “best value” models:

ProviderModelInput (per 1M)Output (per 1M)Context Window
OpenAIGPT-4o$2.50$10.00128K
OpenAIGPT-4o-mini$0.15$0.60128K
AnthropicClaude Sonnet 4.6$3.00$15.00200K
AnthropicClaude Haiku 4.5$0.80$4.00200K
GoogleGemini 1.5 Pro$1.25$5.001M
GoogleGemini 1.5 Flash$0.075$0.301M

Key observations:

  1. Cheapest for high-volume tasks: Gemini Flash is the most economical option for simple, high-volume tasks. GPT-4o-mini is close behind.

  2. Best context window: Claude Sonnet 4.6’s 200K context window is a practical advantage for document processing. Gemini 1.5 Pro’s 1M context window is unmatched for extremely long documents.

  3. Reasoning models: OpenAI’s o3/o3-mini and Anthropic’s Claude for extended thinking are premium offerings — use them only when reasoning depth meaningfully improves your output quality.

  4. Price vs quality trade-offs differ by task type: Run your own benchmarks on your specific use case. General benchmarks rarely match real-world results for specialized applications.


Monitoring Costs in Production

Track token usage on every API call:

import json
from datetime import datetime, UTC
from openai import OpenAI

client = OpenAI()
PRICING = {"gpt-4o": {"input": 2.50, "output": 10.00}, "gpt-4o-mini": {"input": 0.15, "output": 0.60}}

def tracked_completion(model, messages, **kwargs):
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        **kwargs
    )

    usage = response.usage
    prices = PRICING.get(model, {"input": 0, "output": 0})

    cost = (
        usage.prompt_tokens * prices["input"] / 1_000_000 +
        usage.completion_tokens * prices["output"] / 1_000_000
    )

    log_entry = {
        "ts": datetime.now(UTC).isoformat(),
        "model": model,
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        "cost_usd": round(cost, 8)
    }

    with open("api_usage.jsonl", "a") as f:
        f.write(json.dumps(log_entry) + "\n")

    return response

# Usage
response = tracked_completion(
    "gpt-4o-mini",
    [{"role": "user", "content": "Summarize this article..."}]
)

Use DevPlaybook’s API Tester to manually test your OpenAI API calls during development. It supports custom headers (for Authorization: Bearer ...), request bodies, and displays the full response including usage stats.


Monthly Cost Estimator: Quick Reference

Use this table to estimate costs for common workload sizes. Assumes 1,000 input + 300 output tokens per request.

Monthly RequestsGPT-4o-miniGPT-4oo3-mini
1,000$0.33$3.50$2.42
10,000$3.30$35.00$24.20
100,000$33.00$350.00$242.00
1,000,000$330.00$3,500.00$2,420.00

At 1M requests/month:

  • If GPT-4o-mini quality is sufficient → $330/month
  • If you need GPT-4o quality → $3,500/month
  • That’s a 10× cost difference for the same workload

Model selection is the highest-leverage cost decision you’ll make.


Common Cost Traps to Avoid

1. Not trimming conversation history. Every message in messages[] costs tokens. A long chat session can accumulate thousands of tokens of history per request.

2. Overly verbose system prompts. System prompts are paid for on every request. Audit yours for unnecessary verbiage. A 2,000-token system prompt at 100K requests/month costs an extra $500 on GPT-4o.

3. Using o3 where GPT-4o-mini would do. Reasoning models are 67× more expensive than GPT-4o-mini. Reserve them for tasks where deep reasoning genuinely changes the output quality.

4. Ignoring cached input pricing. If you have a large static system prompt, restructure your prompts to maximize cache hits. This alone can cut input costs by half.

5. Not setting usage alerts. Set budget alerts in the OpenAI console. A runaway loop in development can generate thousands of requests before you notice. Cap your development API key’s usage limit.


Summary: A Decision Framework for Model Selection

Is the task simple? (classification, extraction, short summaries)
  → GPT-4o-mini (or Gemini Flash for maximum savings)

Does the task involve complex reasoning? (multi-step math, code debugging, analysis)
  → Test GPT-4o first; escalate to o3-mini only if quality is insufficient

Do you need maximum context window?
  → Claude Sonnet 4.6 (200K) or Gemini 1.5 Pro (1M)

Is this a non-real-time batch workload?
  → Use OpenAI Batch API for 50% discount

Are you serving users in real time with tight latency requirements?
  → GPT-4o-mini or Haiku 4.5 — both return responses in under 2 seconds

The most important thing you can do is instrument your app from day one. Log every token count, track cost per request, and set budget alerts. Surprises happen when you’re not watching.


DevPlaybook offers free developer tools at devplaybook.cc, including a JSON Formatter and API Tester useful for working with LLM APIs. No affiliate relationship with OpenAI, Anthropic, or Google.

Free Newsletter

Level Up Your Dev Workflow

Get new tools, guides, and productivity tips delivered to your inbox.

Plus: grab the free Developer Productivity Checklist when you subscribe.

Found this guide useful? Check out our free developer tools.