OpenAI API Pricing Calculator: How to Estimate Your Monthly Costs
A practical guide to estimating your OpenAI API costs — GPT-4o, GPT-4o-mini, and o3 pricing explained, real calculation examples for chatbots and summarizers, cost optimization strategies, and a comparison of OpenAI vs Anthropic vs Gemini pricing.
One of the most common developer surprises when building with OpenAI’s API is the bill at the end of the month. Token-based pricing is not intuitive, and it’s easy to underestimate costs when you’re moving fast in development. A chatbot that seems cheap in testing can become expensive at scale if you haven’t done the math.
This guide shows you how to estimate your OpenAI API costs before they surprise you — with real calculation examples, optimization strategies, and a cross-provider comparison so you can make informed model choices.
Understanding OpenAI’s Pricing Model
OpenAI charges based on tokens — not requests, not words, not time. A token is roughly 4 characters or 0.75 words in English. Every API call consumes input tokens (your prompt) and produces output tokens (the response). You pay for both, at different rates.
Key insight: Input tokens are always cheaper than output tokens. For most models, output costs 3–5× more per token than input. This matters for how you architect your prompts.
Current OpenAI Model Pricing (2025)
Prices are per 1 million tokens unless otherwise noted:
| Model | Input | Cached Input | Output |
|---|---|---|---|
| GPT-4o | $2.50 | $1.25 | $10.00 |
| GPT-4o-mini | $0.15 | $0.075 | $0.60 |
| o3 | $10.00 | $2.50 | $40.00 |
| o3-mini | $1.10 | $0.55 | $4.40 |
| o1 | $15.00 | $7.50 | $60.00 |
Notes:
- Cached input pricing applies when the same prompt prefix is reused (see Prompt Caching below)
- o3 and o1 are reasoning models — they think before responding, producing “reasoning tokens” that contribute to cost
- Check platform.openai.com/docs/pricing for the latest figures — prices change
The Token Math: How to Calculate Costs
The formula is straightforward:
Cost = (input_tokens × input_price_per_token) + (output_tokens × output_price_per_token)
Where:
input_price_per_token = model_input_price / 1,000,000
output_price_per_token = model_output_price / 1,000,000
Quick Reference: Token Counting
| Content | Approximate Token Count |
|---|---|
| Single word | ~1 token |
| Average sentence | ~15–20 tokens |
| 1 page of text (~500 words) | ~650–750 tokens |
| Short paragraph | ~100 tokens |
| System prompt (200 words) | ~260 tokens |
| Full book chapter | ~4,000–8,000 tokens |
You can count tokens precisely using OpenAI’s tiktoken library:
import tiktoken
def count_tokens(text: str, model: str = "gpt-4o") -> int:
encoding = tiktoken.encoding_for_model(model)
return len(encoding.encode(text))
# Example
system_prompt = "You are a helpful customer support assistant for an e-commerce store."
user_message = "I ordered 3 days ago and my package hasn't arrived. Order number is #78432."
system_tokens = count_tokens(system_prompt)
user_tokens = count_tokens(user_message)
total_input = system_tokens + user_tokens
print(f"System prompt: {system_tokens} tokens")
print(f"User message: {user_tokens} tokens")
print(f"Total input: {total_input} tokens")
Real Calculation Examples
Example 1: Customer Support Chatbot
Setup:
- System prompt: 300 tokens (role definition, product info, guidelines)
- Average user message: 80 tokens
- Average conversation history passed: 400 tokens (last 3 turns)
- Average assistant response: 200 tokens
- Volume: 50,000 conversations/month
Per conversation cost (GPT-4o):
Input tokens: 300 (system) + 80 (user) + 400 (history) = 780 tokens
Output tokens: 200 tokens
Input cost: 780 × ($2.50 / 1,000,000) = $0.00195
Output cost: 200 × ($10.00 / 1,000,000) = $0.00200
Cost per conversation: ~$0.004
Monthly cost at 50,000 conversations:
50,000 × $0.004 = $200/month (GPT-4o)
50,000 × $0.00024 = $12/month (GPT-4o-mini)
Recommendation: Use GPT-4o-mini for straightforward support queries and escalate to GPT-4o only for complex cases. Potential savings: 94%.
Example 2: Document Summarizer
Setup:
- Average document length: 3,000 tokens
- System prompt: 150 tokens
- Average summary output: 400 tokens
- Volume: 10,000 documents/month
Per document cost (GPT-4o):
Input tokens: 3,000 (doc) + 150 (system) = 3,150 tokens
Output tokens: 400 tokens
Input cost: 3,150 × ($2.50 / 1,000,000) = $0.007875
Output cost: 400 × ($10.00 / 1,000,000) = $0.004000
Cost per document: ~$0.012
Monthly cost at 10,000 documents:
GPT-4o: 10,000 × $0.012 = $120/month
GPT-4o-mini: 10,000 × $0.00072 = $7.20/month
For summarization tasks where you don’t need frontier reasoning, GPT-4o-mini produces excellent results at a fraction of the cost.
Example 3: AI Coding Assistant
Setup:
- System prompt: 500 tokens (coding guidelines, language constraints)
- Average code context passed: 1,500 tokens
- Average user question: 100 tokens
- Average code response: 800 tokens
- Volume: 5,000 requests/month
Per request cost (GPT-4o):
Input tokens: 500 + 1,500 + 100 = 2,100 tokens
Output tokens: 800 tokens
Input cost: 2,100 × ($2.50 / 1,000,000) = $0.00525
Output cost: 800 × ($10.00 / 1,000,000) = $0.00800
Cost per request: ~$0.013
Monthly cost:
GPT-4o: 5,000 × $0.013 = $65/month
o3-mini: 5,000 × ($2,100 × 0.0000011 + $800 × 0.0000044) = ~$23/month
For coding tasks requiring deeper reasoning, o3-mini can outperform GPT-4o at a lower cost — run your own evals to verify for your specific use case.
Building a Cost Estimator in Python
Here’s a reusable calculator you can adapt for your own apps:
# OpenAI pricing per million tokens (update as needed)
PRICING = {
"gpt-4o": {
"input": 2.50,
"cached_input": 1.25,
"output": 10.00
},
"gpt-4o-mini": {
"input": 0.15,
"cached_input": 0.075,
"output": 0.60
},
"o3": {
"input": 10.00,
"cached_input": 2.50,
"output": 40.00
},
"o3-mini": {
"input": 1.10,
"cached_input": 0.55,
"output": 4.40
}
}
def estimate_monthly_cost(
model: str,
avg_input_tokens: int,
avg_output_tokens: int,
monthly_requests: int,
cached_input_fraction: float = 0.0
) -> dict:
"""
Estimate monthly API costs.
Args:
model: Model name (e.g., "gpt-4o")
avg_input_tokens: Average input tokens per request
avg_output_tokens: Average output tokens per request
monthly_requests: Expected monthly request volume
cached_input_fraction: Fraction of input that qualifies for cached pricing (0.0–1.0)
"""
if model not in PRICING:
raise ValueError(f"Unknown model: {model}. Available: {list(PRICING.keys())}")
prices = PRICING[model]
# Split input into cached vs non-cached
cached_tokens = avg_input_tokens * cached_input_fraction
fresh_tokens = avg_input_tokens * (1 - cached_input_fraction)
cost_per_request = (
(fresh_tokens * prices["input"] / 1_000_000) +
(cached_tokens * prices["cached_input"] / 1_000_000) +
(avg_output_tokens * prices["output"] / 1_000_000)
)
monthly_cost = cost_per_request * monthly_requests
return {
"model": model,
"cost_per_request_usd": round(cost_per_request, 6),
"monthly_cost_usd": round(monthly_cost, 2),
"monthly_input_tokens": avg_input_tokens * monthly_requests,
"monthly_output_tokens": avg_output_tokens * monthly_requests,
}
def compare_models(avg_input_tokens, avg_output_tokens, monthly_requests):
"""Compare costs across all models for a given workload."""
results = []
for model in PRICING:
result = estimate_monthly_cost(
model, avg_input_tokens, avg_output_tokens, monthly_requests
)
results.append(result)
return sorted(results, key=lambda x: x["monthly_cost_usd"])
# Example: Chatbot with 1,000 tokens input, 300 tokens output, 100k requests/month
comparison = compare_models(1000, 300, 100_000)
for r in comparison:
print(f"{r['model']:20} ${r['monthly_cost_usd']:>10,.2f}/month")
Output for this example:
gpt-4o-mini $ 330.00/month
o3-mini $ 2,420.00/month
gpt-4o $ 5,500.00/month
o3 $ 22,000.00/month
Paste the JSON output into DevPlaybook’s JSON Formatter to inspect your cost breakdown in a readable format.
Cost Optimization Strategies
1. Prompt Caching
OpenAI’s prompt caching automatically caches repeated prompt prefixes and charges 50% for cached tokens. This is highly effective when you have a large, static system prompt.
To maximize cache hits:
- Keep your static content (system prompt, reference documents, examples) at the beginning of the prompt
- Keep dynamic content (user message, conversation history) at the end
- Cache hits apply when the same prefix appears in multiple requests within the caching window
Potential savings: 30–60% on input costs for applications with large, reused system prompts.
2. Model Selection Routing
Not every request needs your most expensive model. Route by complexity:
def select_model(task_type: str, complexity_score: int) -> str:
"""
Select model based on task type and complexity.
complexity_score: 1 (trivial) to 10 (highly complex)
"""
if task_type in ["classification", "keyword_extraction", "sentiment"] or complexity_score <= 3:
return "gpt-4o-mini"
if task_type in ["code_reasoning", "math", "multi_step_analysis"] or complexity_score >= 8:
return "o3-mini"
return "gpt-4o" # Default for mid-complexity tasks
3. Batch Processing
OpenAI’s Batch API offers a 50% cost reduction for non-real-time workloads (24-hour turnaround):
from openai import OpenAI
import json
client = OpenAI()
# Prepare batch
requests = [
{
"custom_id": f"task-{i}",
"method": "POST",
"url": "/v1/chat/completions",
"body": {
"model": "gpt-4o-mini",
"messages": [{"role": "user", "content": f"Summarize: {document}"}],
"max_tokens": 200
}
}
for i, document in enumerate(documents)
]
# Write JSONL file
with open("batch_requests.jsonl", "w") as f:
for req in requests:
f.write(json.dumps(req) + "\n")
# Upload and submit batch
with open("batch_requests.jsonl", "rb") as f:
batch_file = client.files.create(file=f, purpose="batch")
batch = client.batches.create(
input_file_id=batch_file.id,
endpoint="/v1/chat/completions",
completion_window="24h"
)
print(f"Batch submitted: {batch.id}")
When to use batching: Nightly report generation, bulk document processing, offline classification jobs. Not appropriate for user-facing real-time responses.
4. Trim Conversation History
Passing your full conversation history on every turn is one of the most common cost leaks:
def trim_history(messages: list, max_tokens: int = 2000) -> list:
"""Keep only the most recent messages that fit within token budget."""
import tiktoken
enc = tiktoken.encoding_for_model("gpt-4o")
trimmed = []
token_count = 0
# Always keep the system message (index 0)
# Iterate history in reverse to keep most recent
for message in reversed(messages[1:]):
msg_tokens = len(enc.encode(message["content"]))
if token_count + msg_tokens > max_tokens:
break
trimmed.insert(0, message)
token_count += msg_tokens
return [messages[0]] + trimmed # Prepend system message
5. Set max_tokens Appropriately
If you only need a 100-word answer, don’t set max_tokens=4096. While you only pay for tokens actually generated, a tight max_tokens budget forces the model to be concise, which often improves quality for structured outputs.
OpenAI vs Anthropic vs Gemini: Pricing Comparison
As of early 2025, for the mid-tier “best value” models:
| Provider | Model | Input (per 1M) | Output (per 1M) | Context Window |
|---|---|---|---|---|
| OpenAI | GPT-4o | $2.50 | $10.00 | 128K |
| OpenAI | GPT-4o-mini | $0.15 | $0.60 | 128K |
| Anthropic | Claude Sonnet 4.6 | $3.00 | $15.00 | 200K |
| Anthropic | Claude Haiku 4.5 | $0.80 | $4.00 | 200K |
| Gemini 1.5 Pro | $1.25 | $5.00 | 1M | |
| Gemini 1.5 Flash | $0.075 | $0.30 | 1M |
Key observations:
-
Cheapest for high-volume tasks: Gemini Flash is the most economical option for simple, high-volume tasks. GPT-4o-mini is close behind.
-
Best context window: Claude Sonnet 4.6’s 200K context window is a practical advantage for document processing. Gemini 1.5 Pro’s 1M context window is unmatched for extremely long documents.
-
Reasoning models: OpenAI’s o3/o3-mini and Anthropic’s Claude for extended thinking are premium offerings — use them only when reasoning depth meaningfully improves your output quality.
-
Price vs quality trade-offs differ by task type: Run your own benchmarks on your specific use case. General benchmarks rarely match real-world results for specialized applications.
Monitoring Costs in Production
Track token usage on every API call:
import json
from datetime import datetime, UTC
from openai import OpenAI
client = OpenAI()
PRICING = {"gpt-4o": {"input": 2.50, "output": 10.00}, "gpt-4o-mini": {"input": 0.15, "output": 0.60}}
def tracked_completion(model, messages, **kwargs):
response = client.chat.completions.create(
model=model,
messages=messages,
**kwargs
)
usage = response.usage
prices = PRICING.get(model, {"input": 0, "output": 0})
cost = (
usage.prompt_tokens * prices["input"] / 1_000_000 +
usage.completion_tokens * prices["output"] / 1_000_000
)
log_entry = {
"ts": datetime.now(UTC).isoformat(),
"model": model,
"prompt_tokens": usage.prompt_tokens,
"completion_tokens": usage.completion_tokens,
"cost_usd": round(cost, 8)
}
with open("api_usage.jsonl", "a") as f:
f.write(json.dumps(log_entry) + "\n")
return response
# Usage
response = tracked_completion(
"gpt-4o-mini",
[{"role": "user", "content": "Summarize this article..."}]
)
Use DevPlaybook’s API Tester to manually test your OpenAI API calls during development. It supports custom headers (for Authorization: Bearer ...), request bodies, and displays the full response including usage stats.
Monthly Cost Estimator: Quick Reference
Use this table to estimate costs for common workload sizes. Assumes 1,000 input + 300 output tokens per request.
| Monthly Requests | GPT-4o-mini | GPT-4o | o3-mini |
|---|---|---|---|
| 1,000 | $0.33 | $3.50 | $2.42 |
| 10,000 | $3.30 | $35.00 | $24.20 |
| 100,000 | $33.00 | $350.00 | $242.00 |
| 1,000,000 | $330.00 | $3,500.00 | $2,420.00 |
At 1M requests/month:
- If GPT-4o-mini quality is sufficient → $330/month
- If you need GPT-4o quality → $3,500/month
- That’s a 10× cost difference for the same workload
Model selection is the highest-leverage cost decision you’ll make.
Common Cost Traps to Avoid
1. Not trimming conversation history. Every message in messages[] costs tokens. A long chat session can accumulate thousands of tokens of history per request.
2. Overly verbose system prompts. System prompts are paid for on every request. Audit yours for unnecessary verbiage. A 2,000-token system prompt at 100K requests/month costs an extra $500 on GPT-4o.
3. Using o3 where GPT-4o-mini would do. Reasoning models are 67× more expensive than GPT-4o-mini. Reserve them for tasks where deep reasoning genuinely changes the output quality.
4. Ignoring cached input pricing. If you have a large static system prompt, restructure your prompts to maximize cache hits. This alone can cut input costs by half.
5. Not setting usage alerts. Set budget alerts in the OpenAI console. A runaway loop in development can generate thousands of requests before you notice. Cap your development API key’s usage limit.
Summary: A Decision Framework for Model Selection
Is the task simple? (classification, extraction, short summaries)
→ GPT-4o-mini (or Gemini Flash for maximum savings)
Does the task involve complex reasoning? (multi-step math, code debugging, analysis)
→ Test GPT-4o first; escalate to o3-mini only if quality is insufficient
Do you need maximum context window?
→ Claude Sonnet 4.6 (200K) or Gemini 1.5 Pro (1M)
Is this a non-real-time batch workload?
→ Use OpenAI Batch API for 50% discount
Are you serving users in real time with tight latency requirements?
→ GPT-4o-mini or Haiku 4.5 — both return responses in under 2 seconds
The most important thing you can do is instrument your app from day one. Log every token count, track cost per request, and set budget alerts. Surprises happen when you’re not watching.
DevPlaybook offers free developer tools at devplaybook.cc, including a JSON Formatter and API Tester useful for working with LLM APIs. No affiliate relationship with OpenAI, Anthropic, or Google.
Free Newsletter
Level Up Your Dev Workflow
Get new tools, guides, and productivity tips delivered to your inbox.
Plus: grab the free Developer Productivity Checklist when you subscribe.
Found this guide useful? Check out our free developer tools.
Affiliate disclosure: Some links below are affiliate links — we may earn a small commission at no extra cost to you. Learn more.
Recommended Tools & Resources
DigitalOcean
$200 credit for new users. Simple, affordable cloud hosting for developers.
GitHub Student Pack
Free access to 100+ developer tools. Perfect for students and new devs.
Vercel
Deploy frontend apps instantly. Free tier is generous for side projects.
DevPlaybook Products
Boilerplates, scripts & AI toolkits to 10x your dev workflow.