Understanding Token Pricing: A Complete Guide for Developers
Everything you need to know about LLM token pricing, from how tokens are calculated to optimization strategies that can save you thousands.
Optra AI Team
Engineering
Confused by token pricing? You're not alone. Understanding how LLMs charge for usage is crucial for controlling costs. This guide breaks down everything you need to know about tokens, pricing models, and optimization strategies.
What Are Tokens?
Tokens are the basic units of text that LLMs process. Think of them as "chunks" of text—roughly ¾ of a word in English.
How Tokens Are Calculated
Different text requires different token counts. For example:
```typescript
const examples = [
  { text: "Hello!", tokens: 2 },
  { text: "The", tokens: 1 },
  { text: "ChatGPT", tokens: 2 },
  { text: "email@example.com", tokens: 5 },
  { text: "console.log('hi')", tokens: 6 }
]
```

**Rule of thumb**: 1 token ≈ 4 characters in English.
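As a quick sanity check, here is a sketch (token counts are illustrative, taken from the examples above) applying that rule of thumb to the same sample strings:

```typescript
// Compare the 4-characters-per-token heuristic against the example counts above
const samples = [
  { text: "Hello!", tokens: 2 },
  { text: "ChatGPT", tokens: 2 },
  { text: "email@example.com", tokens: 5 },
  { text: "console.log('hi')", tokens: 6 }
]

for (const { text, tokens } of samples) {
  const estimate = Math.ceil(text.length / 4)
  console.log(`${text}: actual ${tokens}, estimated ${estimate}`)
}
```

The heuristic tracks plain English well but tends to undercount code and symbols: `console.log('hi')` estimates 5 against an actual 6.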
Token Pricing Models
Input vs Output Tokens
Providers charge separate rates for input (prompt) and output (completion) tokens:
| Provider | Input ($/1M) | Output ($/1M) | Ratio |
|----------|--------------|---------------|-------|
| GPT-4 Turbo | $10.00 | $30.00 | 1:3 |
| GPT-3.5 Turbo | $0.50 | $1.50 | 1:3 |
| Claude 3 Opus | $15.00 | $75.00 | 1:5 |
| Gemini Pro | $0.50 | $1.50 | 1:3 |
**Key insight**: Output tokens are 3-5x more expensive than input tokens!
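Because of that asymmetry, two requests with the same total token count can cost very different amounts. A minimal sketch using the GPT-4 Turbo rates from the table above (token counts are illustrative):

```typescript
// GPT-4 Turbo rates from the table, converted to dollars per token
const INPUT_RATE = 10 / 1_000_000   // $10.00 per 1M input tokens
const OUTPUT_RATE = 30 / 1_000_000  // $30.00 per 1M output tokens

const requestCost = (inputTokens: number, outputTokens: number) =>
  inputTokens * INPUT_RATE + outputTokens * OUTPUT_RATE

// Summarization-style request: 4,000 tokens in, 200 out
console.log(requestCost(4000, 200).toFixed(4)) // 0.0460

// Generation-style request: 200 tokens in, 4,000 out — same 4,200 total
console.log(requestCost(200, 4000).toFixed(4)) // 0.1220
```

Same total tokens, but the generation-heavy request costs roughly 2.7× more.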
Why the Asymmetry?
Output generation requires more computation: input tokens are processed together in a single parallel pass, while output tokens are generated one at a time, each requiring a full forward pass through the model.
Estimating Token Usage
Method 1: Character Count
```typescript
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

// Example
const prompt = "Write a blog post about AI"
console.log(`Estimated tokens: ${estimateTokens(prompt)}`) // ~7 tokens
```

Method 2: OpenAI Tokenizer
```typescript
import { encode } from 'gpt-tokenizer'

function countTokens(text: string): number {
  return encode(text).length
}

// Example
const prompt = "Write a blog post about AI"
console.log(`Exact tokens: ${countTokens(prompt)}`) // 8 tokens
```

Method 3: API Response
```typescript
const response = await openai.chat.completions.create({
  model: 'gpt-4-turbo',
  messages: [{ role: 'user', content: prompt }]
})

console.log('Token usage:', response.usage)
// {
//   prompt_tokens: 8,
//   completion_tokens: 150,
//   total_tokens: 158
// }
```

Cost Calculation
Calculate Request Cost
```typescript
function calculateCost(
  promptTokens: number,
  completionTokens: number,
  model: string
): number {
  // Rates in dollars per token ($/1M ÷ 1,000,000)
  const pricing: Record<string, { input: number; output: number }> = {
    'gpt-4-turbo': { input: 0.00001, output: 0.00003 },
    'gpt-3.5-turbo': { input: 0.0000005, output: 0.0000015 },
    'claude-3-opus': { input: 0.000015, output: 0.000075 }
  }
  const rates = pricing[model]
  return (promptTokens * rates.input) + (completionTokens * rates.output)
}

// Example
const cost = calculateCost(100, 500, 'gpt-4-turbo')
console.log(`Cost: $${cost.toFixed(4)}`) // $0.0160
```

Token Optimization Strategies
Strategy 1: Reduce System Prompts
```typescript
// BEFORE: 200 tokens
const systemPrompt = `
You are a helpful AI assistant for Acme Corp.
Your role is to assist users with their questions.
Always be professional and courteous.
Never make up information or provide false data.
`

// AFTER: 50 tokens
const systemPrompt = `You're Acme Corp's AI assistant. Be accurate and helpful.`

// Savings: 150 tokens/request × 10K requests/day = 1.5M tokens/day
// At GPT-4 Turbo input rates ($10/1M): $15/day ≈ $450/month saved
```

Strategy 2: Trim Conversation History
```typescript
function trimHistory(messages: Message[], maxTokens: number = 2000) {
  let tokens = 0
  const recent: Message[] = []

  // Add messages from most recent backwards until the budget is spent
  // (index 0 is the system message, handled separately below)
  for (let i = messages.length - 1; i > 0; i--) {
    const messageTokens = countTokens(messages[i].content)
    if (tokens + messageTokens > maxTokens) break
    recent.unshift(messages[i])
    tokens += messageTokens
  }

  // Always keep the system message first
  return [messages[0], ...recent]
}
```

Strategy 3: Use Shorter Outputs
```typescript
// Specify max_tokens to prevent verbose responses
const response = await openai.chat.completions.create({
  model: 'gpt-4-turbo',
  messages: [{ role: 'user', content: 'Explain AI briefly' }],
  max_tokens: 100 // Limit response length
})
```

Real-World Scenarios
Scenario 1: Chatbot
```typescript
// Typical chat turn
const promptTokens = 500     // History + new message
const completionTokens = 150 // Response

// Monthly cost (10K conversations, one turn each)
const monthlyCost = calculateCost(promptTokens, completionTokens, 'gpt-4-turbo') * 10000
console.log(`Monthly: $${monthlyCost.toFixed(2)}`) // $95.00
```

Scenario 2: Content Generation
```typescript
// Blog post generation
const promptTokens = 200      // Instructions + outline
const completionTokens = 2000 // ~1,500-word article

// Cost per article
const costPerArticle = calculateCost(promptTokens, completionTokens, 'claude-3-opus')
console.log(`Cost per article: $${costPerArticle.toFixed(2)}`) // $0.15
```

Key Takeaways

- Tokens are the units LLMs bill for: roughly 4 characters (about ¾ of a word) of English per token.
- Output tokens cost 3-5x more than input tokens, so verbose responses are the biggest cost driver.
- Estimate with the character-count heuristic, count exactly with a tokenizer, and verify with the `usage` field in API responses.
- Trim system prompts and conversation history, and cap responses with `max_tokens`.

Tools and Resources

- `gpt-tokenizer` (npm): exact token counts for OpenAI models
- OpenAI's interactive tokenizer at platform.openai.com/tokenizer
- The `usage` object returned by the chat completions API
Understanding token pricing is the first step to controlling LLM costs. Use these strategies to optimize your usage and save thousands on your AI bills.