Technical Guides

Understanding Token Pricing: A Complete Guide for Developers

Everything you need to know about LLM token pricing, from how tokens are calculated to optimization strategies that can save you thousands.

Optra AI Team

Engineering

4 min read

Confused by token pricing? You're not alone. Understanding how LLMs charge for usage is crucial for controlling costs. This guide breaks down everything you need to know about tokens, pricing models, and optimization strategies.

What Are Tokens?

Tokens are the basic units of text that LLMs process. Think of them as "chunks" of text—roughly ¾ of a word in English.

**Examples**:

  • "Hello world" = 2 tokens
  • "The quick brown fox" = 4 tokens
  • "AI" = 1 token
  • "Artificial Intelligence" = 3 tokens

How Tokens Are Calculated

    Different text requires different token counts:

    const examples = [
      { text: "Hello!", tokens: 2 },
      { text: "The", tokens: 1 },
      { text: "ChatGPT", tokens: 2 },
      { text: "email@example.com", tokens: 5 },
      { text: "console.log('hi')", tokens: 6 }
    ]

    **Rule of thumb**: 1 token ≈ 4 characters in English

    Token Pricing Models

    Input vs Output Tokens

    Providers charge separate rates for input (prompt) and output (completion) tokens:

    | Provider | Input ($/1M) | Output ($/1M) | Ratio |
    |----------|--------------|---------------|-------|
    | GPT-4 Turbo | $10.00 | $30.00 | 1:3 |
    | GPT-3.5 Turbo | $0.50 | $1.50 | 1:3 |
    | Claude 3 Opus | $15.00 | $75.00 | 1:5 |
    | Gemini Pro | $0.50 | $1.50 | 1:3 |

    **Key insight**: Output tokens are 3-5x more expensive than input tokens!
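Because of this asymmetry, the input/output mix of your traffic matters as much as total volume. A quick sketch using the GPT-4 Turbo rates from the table above (function name is ours, for illustration):

```typescript
// Cost of one request at GPT-4 Turbo rates: $10/1M input, $30/1M output
function requestCost(inputTokens: number, outputTokens: number): number {
  return (inputTokens * 10 + outputTokens * 30) / 1_000_000
}

// Same 1,000 total tokens, very different bills:
const summarization = requestCost(900, 100) // input-heavy: $0.012
const generation = requestCost(100, 900)    // output-heavy: $0.028
```

A workload that mostly reads (summarization, classification) costs far less per token than one that mostly writes.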

    Why the Asymmetry?

    Output generation requires more computation:

  • **Input processing**: One forward pass over the whole prompt, computed in parallel
  • **Output generation**: One forward pass per generated token (autoregressive decoding)

Estimating Token Usage

    Method 1: Character Count

    function estimateTokens(text: string): number {
      return Math.ceil(text.length / 4)
    }
    
    // Example
    const prompt = "Write a blog post about AI"
    console.log(`Estimated tokens: ${estimateTokens(prompt)}`) // ~7 tokens

    Method 2: OpenAI Tokenizer

    import { encode } from 'gpt-tokenizer'
    
    function countTokens(text: string): number {
      return encode(text).length
    }
    
    // Example
    const prompt = "Write a blog post about AI"
    console.log(`Exact tokens: ${countTokens(prompt)}`) // exact count depends on the encoding

    Method 3: API Response

    const response = await openai.chat.completions.create({
      model: 'gpt-4-turbo',
      messages: [{ role: 'user', content: prompt }]
    })
    
    console.log('Token usage:', response.usage)
    // {
    //   prompt_tokens: 8,
    //   completion_tokens: 150,
    //   total_tokens: 158
    // }
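Since every response reports its own usage, you can accumulate totals per user or feature for later cost attribution. A minimal sketch — the `Usage` interface mirrors the `usage` object shown above; the tracker class is our own, not part of any SDK:

```typescript
// Accumulates token usage across requests for later cost attribution.
interface Usage {
  prompt_tokens: number
  completion_tokens: number
  total_tokens: number
}

class UsageTracker {
  promptTokens = 0
  completionTokens = 0

  record(usage: Usage): void {
    this.promptTokens += usage.prompt_tokens
    this.completionTokens += usage.completion_tokens
  }

  get totalTokens(): number {
    return this.promptTokens + this.completionTokens
  }
}

// Feed it the usage object from each API response
const tracker = new UsageTracker()
tracker.record({ prompt_tokens: 8, completion_tokens: 150, total_tokens: 158 })
```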

    Cost Calculation

    Calculate Request Cost

    function calculateCost(
      promptTokens: number,
      completionTokens: number,
      model: string
    ): number {
      // Per-token rates in USD (table rates divided by 1M)
      const pricing: Record<string, { input: number; output: number }> = {
        'gpt-4-turbo': { input: 0.00001, output: 0.00003 },
        'gpt-3.5-turbo': { input: 0.0000005, output: 0.0000015 },
        'claude-3-opus': { input: 0.000015, output: 0.000075 }
      }
    
      const rates = pricing[model]
      if (!rates) throw new Error(`Unknown model: ${model}`)
      return (promptTokens * rates.input) + (completionTokens * rates.output)
    }
    
    // Example
    const cost = calculateCost(100, 500, 'gpt-4-turbo')
    console.log(`Cost: $${cost.toFixed(4)}`) // $0.0160

    Token Optimization Strategies

    Strategy 1: Reduce System Prompts

    // BEFORE: ~200 tokens
    const systemPrompt = `
    You are a helpful AI assistant for Acme Corp.
    Your role is to assist users with their questions.
    Always be professional and courteous.
    Never make up information or provide false data.
    `
    
    // AFTER: ~50 tokens
    const systemPrompt = `You're Acme Corp's AI assistant. Be accurate and helpful.`
    
    // Savings: 150 tokens/request × 10K requests/day = 1.5M tokens/day
    // At GPT-4 Turbo input rates ($10/1M): $15/day = $450/month saved
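The same arithmetic generalizes to any prompt diet. A small helper (hypothetical, not part of any SDK) to estimate monthly savings at a given input rate:

```typescript
// Monthly savings from shaving input tokens off every request.
function monthlySavings(
  tokensSavedPerRequest: number,
  requestsPerDay: number,
  inputRatePerMillionUSD: number // e.g. $10 for GPT-4 Turbo input
): number {
  const tokensSavedPerDay = tokensSavedPerRequest * requestsPerDay
  // Convert to millions of tokens, price them, project over 30 days
  return (tokensSavedPerDay / 1_000_000) * inputRatePerMillionUSD * 30
}

monthlySavings(150, 10_000, 10) // $450/month, matching the numbers above
```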

    Strategy 2: Trim Conversation History

    interface Message { role: string; content: string }
    
    function trimHistory(messages: Message[], maxTokens: number = 2000): Message[] {
      let tokens = 0
      const trimmed: Message[] = []
    
      // Always keep the system message (assumed to be first)
      trimmed.push(messages[0])
    
      // Walk backwards from the most recent message until the budget is spent
      for (let i = messages.length - 1; i > 0; i--) {
        const messageTokens = countTokens(messages[i].content) // countTokens from Method 2
        if (tokens + messageTokens > maxTokens) break
    
        // Insert after the system message so order stays chronological
        trimmed.splice(1, 0, messages[i])
        tokens += messageTokens
      }
    
      return trimmed
    }

    Strategy 3: Use Shorter Outputs

    // Specify max_tokens to prevent verbose responses
    const response = await openai.chat.completions.create({
      model: 'gpt-4-turbo',
      messages: [{ role: 'user', content: 'Explain AI briefly' }],
      max_tokens: 100  // Limit response length
    })
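One caveat: a hard `max_tokens` cap can cut a response off mid-sentence. The chat completions API signals this through each choice's `finish_reason`, which is `'length'` when the cap was hit and `'stop'` on natural completion, so truncation is easy to detect:

```typescript
// Detects whether a completion was cut off by the max_tokens cap.
function wasTruncated(choice: { finish_reason: string }): boolean {
  return choice.finish_reason === 'length'
}

wasTruncated({ finish_reason: 'length' }) // true: response hit max_tokens
wasTruncated({ finish_reason: 'stop' })   // false: finished naturally
```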

    Real-World Scenarios

    Scenario 1: Chatbot

    // Typical chat turn
    const promptTokens = 500  // History + new message
    const completionTokens = 150  // Response
    
    // Monthly cost (10K single-turn conversations)
    const monthlyCost = calculateCost(promptTokens, completionTokens, 'gpt-4-turbo') * 10000
    console.log(`Monthly: $${monthlyCost.toFixed(2)}`) // $95.00

    Scenario 2: Content Generation

    // Blog post generation
    const promptTokens = 200  // Instructions + outline
    const completionTokens = 2000  // 1500-word article
    
    // Cost per article
    const costPerArticle = calculateCost(promptTokens, completionTokens, 'claude-3-opus')
    console.log(`Cost per article: $${costPerArticle.toFixed(2)}`) // $0.15
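Pricing the same workload across models makes the trade-off concrete. A self-contained comparison using the per-token rates from the pricing table above:

```typescript
// Per-article cost (200 prompt + 2,000 completion tokens) across models.
const rates: Record<string, { input: number; output: number }> = {
  'gpt-4-turbo': { input: 0.00001, output: 0.00003 },
  'gpt-3.5-turbo': { input: 0.0000005, output: 0.0000015 },
  'claude-3-opus': { input: 0.000015, output: 0.000075 }
}

function articleCost(model: string, promptTokens = 200, completionTokens = 2000): number {
  const r = rates[model]
  return promptTokens * r.input + completionTokens * r.output
}

articleCost('claude-3-opus')  // $0.153
articleCost('gpt-4-turbo')    // $0.062
articleCost('gpt-3.5-turbo')  // $0.0031
```

At these rates the cheapest model is roughly 50x cheaper per article than the most expensive, so it pays to match model quality to the task.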

    Key Takeaways

  • **1 token ≈ ¾ of a word** (or ~4 characters)
  • **Output tokens cost 3-5x more** than input tokens
  • **Optimize prompts** to reduce input token count
  • **Set max_tokens** to prevent runaway costs
  • **Monitor token usage** to identify optimization opportunities

    Tools and Resources

  • **OpenAI Tokenizer**: https://platform.openai.com/tokenizer
  • **TikToken Library**: Count tokens programmatically
  • **GPT Tokenizer NPM**: `npm install gpt-tokenizer`

    Understanding token pricing is the first step to controlling LLM costs. Use these strategies to optimize your usage and save thousands on your AI bills.

