Technical Guides

Understanding Token Pricing: A Complete Guide for Developers

Everything you need to know about LLM token pricing, from how tokens are calculated to optimization strategies that can save you thousands.

Optra AI Team

Engineering

4 min read

Confused by token pricing? You're not alone. Understanding how LLMs charge for usage is crucial for controlling costs. This guide breaks down everything you need to know about tokens, pricing models, and optimization strategies.

What Are Tokens?

Tokens are the basic units of text that LLMs process. Think of them as "chunks" of text—roughly ¾ of a word in English.

**Examples**:

  • "Hello world" = 2 tokens
  • "The quick brown fox" = 4 tokens
  • "AI" = 1 token
  • "Artificial Intelligence" = 3 tokens

How Tokens Are Calculated

    Different text requires different token counts:

    const examples = [
      { text: "Hello!", tokens: 2 },
      { text: "The", tokens: 1 },
      { text: "ChatGPT", tokens: 2 },
      { text: "email@example.com", tokens: 5 },
      { text: "console.log('hi')", tokens: 6 }
    ]

    **Rule of thumb**: 1 token ≈ 4 characters in English

    Token Pricing Models

    Input vs Output Tokens

    Providers charge separate rates for input (prompt) and output (completion) tokens:

    | Provider | Input ($/1M) | Output ($/1M) | Ratio |
    |----------|--------------|---------------|-------|
    | GPT-4 Turbo | $10.00 | $30.00 | 1:3 |
    | GPT-3.5 Turbo | $0.50 | $1.50 | 1:3 |
    | Claude 3 Opus | $15.00 | $75.00 | 1:5 |
    | Gemini Pro | $0.50 | $1.50 | 1:3 |

    **Key insight**: Output tokens are 3-5x more expensive than input tokens!
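Because of this asymmetry, the input/output mix of your traffic matters as much as total volume. A quick sketch using the GPT-4 Turbo rates from the table above (function name is ours, for illustration):

```typescript
// Cost of one request at GPT-4 Turbo rates: $10/1M input, $30/1M output
function requestCost(inputTokens: number, outputTokens: number): number {
  return (inputTokens * 10 + outputTokens * 30) / 1_000_000
}

// Same 1,000 total tokens, very different bills:
const summarization = requestCost(900, 100) // input-heavy: $0.012
const generation = requestCost(100, 900)    // output-heavy: $0.028
```

A workload that mostly reads (summarization, classification) costs far less per token than one that mostly writes.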

    Why the Asymmetry?

    Output generation requires more computation:

  • **Input processing**: One forward pass over the whole prompt, computed in parallel
  • **Output generation**: One forward pass per generated token (autoregressive decoding)

Estimating Token Usage

    Method 1: Character Count

    function estimateTokens(text: string): number {
      return Math.ceil(text.length / 4)
    }
    
    // Example
    const prompt = "Write a blog post about AI"
    console.log(`Estimated tokens: ${estimateTokens(prompt)}`) // ~7 tokens

    Method 2: OpenAI Tokenizer

    import { encode } from 'gpt-tokenizer'
    
    function countTokens(text: string): number {
      return encode(text).length
    }
    
    // Example
    const prompt = "Write a blog post about AI"
    console.log(`Exact tokens: ${countTokens(prompt)}`) // exact count depends on the encoding

    Method 3: API Response

    const response = await openai.chat.completions.create({
      model: 'gpt-4-turbo',
      messages: [{ role: 'user', content: prompt }]
    })
    
    console.log('Token usage:', response.usage)
    // {
    //   prompt_tokens: 8,
    //   completion_tokens: 150,
    //   total_tokens: 158
    // }
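Since every response reports its own usage, you can accumulate totals per user or feature for later cost attribution. A minimal sketch — the `Usage` interface mirrors the `usage` object shown above; the tracker class is our own, not part of any SDK:

```typescript
// Accumulates token usage across requests for later cost attribution.
interface Usage {
  prompt_tokens: number
  completion_tokens: number
  total_tokens: number
}

class UsageTracker {
  promptTokens = 0
  completionTokens = 0

  record(usage: Usage): void {
    this.promptTokens += usage.prompt_tokens
    this.completionTokens += usage.completion_tokens
  }

  get totalTokens(): number {
    return this.promptTokens + this.completionTokens
  }
}

// Feed it the usage object from each API response
const tracker = new UsageTracker()
tracker.record({ prompt_tokens: 8, completion_tokens: 150, total_tokens: 158 })
```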

    Cost Calculation

    Calculate Request Cost

    function calculateCost(
      promptTokens: number,
      completionTokens: number,
      model: string
    ): number {
      // Per-token rates in USD (table rates divided by 1M)
      const pricing: Record<string, { input: number; output: number }> = {
        'gpt-4-turbo': { input: 0.00001, output: 0.00003 },
        'gpt-3.5-turbo': { input: 0.0000005, output: 0.0000015 },
        'claude-3-opus': { input: 0.000015, output: 0.000075 }
      }
    
      const rates = pricing[model]
      if (!rates) throw new Error(`Unknown model: ${model}`)
      return (promptTokens * rates.input) + (completionTokens * rates.output)
    }
    
    // Example
    const cost = calculateCost(100, 500, 'gpt-4-turbo')
    console.log(`Cost: $${cost.toFixed(4)}`) // $0.0160

    Token Optimization Strategies

    Strategy 1: Reduce System Prompts

    // BEFORE: ~200 tokens
    const systemPrompt = `
    You are a helpful AI assistant for Acme Corp.
    Your role is to assist users with their questions.
    Always be professional and courteous.
    Never make up information or provide false data.
    `
    
    // AFTER: ~50 tokens
    const systemPrompt = `You're Acme Corp's AI assistant. Be accurate and helpful.`
    
    // Savings: 150 tokens/request × 10K requests/day = 1.5M tokens/day
    // At GPT-4 Turbo input rates ($10/1M): $15/day = $450/month saved
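The same arithmetic generalizes to any prompt diet. A small helper (hypothetical, not part of any SDK) to estimate monthly savings at a given input rate:

```typescript
// Monthly savings from shaving input tokens off every request.
function monthlySavings(
  tokensSavedPerRequest: number,
  requestsPerDay: number,
  inputRatePerMillionUSD: number // e.g. $10 for GPT-4 Turbo input
): number {
  const tokensSavedPerDay = tokensSavedPerRequest * requestsPerDay
  // Convert to millions of tokens, price them, project over 30 days
  return (tokensSavedPerDay / 1_000_000) * inputRatePerMillionUSD * 30
}

monthlySavings(150, 10_000, 10) // $450/month, matching the numbers above
```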

    Strategy 2: Trim Conversation History

    interface Message { role: string; content: string }
    
    function trimHistory(messages: Message[], maxTokens: number = 2000): Message[] {
      let tokens = 0
      const trimmed: Message[] = []
    
      // Always keep the system message (assumed to be first)
      trimmed.push(messages[0])
    
      // Walk backwards from the most recent message until the budget is spent
      for (let i = messages.length - 1; i > 0; i--) {
        const messageTokens = countTokens(messages[i].content) // countTokens from Method 2
        if (tokens + messageTokens > maxTokens) break
    
        // Insert after the system message so order stays chronological
        trimmed.splice(1, 0, messages[i])
        tokens += messageTokens
      }
    
      return trimmed
    }

    Strategy 3: Use Shorter Outputs

    // Specify max_tokens to prevent verbose responses
    const response = await openai.chat.completions.create({
      model: 'gpt-4-turbo',
      messages: [{ role: 'user', content: 'Explain AI briefly' }],
      max_tokens: 100  // Limit response length
    })
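One caveat: a hard `max_tokens` cap can cut a response off mid-sentence. The chat completions API signals this through each choice's `finish_reason`, which is `'length'` when the cap was hit and `'stop'` on natural completion, so truncation is easy to detect:

```typescript
// Detects whether a completion was cut off by the max_tokens cap.
function wasTruncated(choice: { finish_reason: string }): boolean {
  return choice.finish_reason === 'length'
}

wasTruncated({ finish_reason: 'length' }) // true: response hit max_tokens
wasTruncated({ finish_reason: 'stop' })   // false: finished naturally
```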

    Real-World Scenarios

    Scenario 1: Chatbot

    // Typical chat turn
    const promptTokens = 500  // History + new message
    const completionTokens = 150  // Response
    
    // Monthly cost (10K single-turn conversations)
    const monthlyCost = calculateCost(promptTokens, completionTokens, 'gpt-4-turbo') * 10000
    console.log(`Monthly: $${monthlyCost.toFixed(2)}`) // $95.00

    Scenario 2: Content Generation

    // Blog post generation
    const promptTokens = 200  // Instructions + outline
    const completionTokens = 2000  // 1500-word article
    
    // Cost per article
    const costPerArticle = calculateCost(promptTokens, completionTokens, 'claude-3-opus')
    console.log(`Cost per article: $${costPerArticle.toFixed(2)}`) // $0.15
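Pricing the same workload across models makes the trade-off concrete. A self-contained comparison using the per-token rates from the pricing table above:

```typescript
// Per-article cost (200 prompt + 2,000 completion tokens) across models.
const rates: Record<string, { input: number; output: number }> = {
  'gpt-4-turbo': { input: 0.00001, output: 0.00003 },
  'gpt-3.5-turbo': { input: 0.0000005, output: 0.0000015 },
  'claude-3-opus': { input: 0.000015, output: 0.000075 }
}

function articleCost(model: string, promptTokens = 200, completionTokens = 2000): number {
  const r = rates[model]
  return promptTokens * r.input + completionTokens * r.output
}

articleCost('claude-3-opus')  // $0.153
articleCost('gpt-4-turbo')    // $0.062
articleCost('gpt-3.5-turbo')  // $0.0031
```

At these rates the cheapest model is roughly 50x cheaper per article than the most expensive, so it pays to match model quality to the task.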

    Key Takeaways

  • **1 token ≈ ¾ of a word** (or ~4 characters)
  • **Output tokens cost 3-5x more** than input tokens
  • **Optimize prompts** to reduce input token count
  • **Set max_tokens** to prevent runaway costs
  • **Monitor token usage** to identify optimization opportunities

    Tools and Resources

  • **OpenAI Tokenizer**: https://platform.openai.com/tokenizer
  • **TikToken Library**: Count tokens programmatically
  • **GPT Tokenizer NPM**: `npm install gpt-tokenizer`

    Understanding token pricing is the first step to controlling LLM costs. Use these strategies to optimize your usage and save thousands on your AI bills.

