Token Consumption and Prompt Optimization 💸

Reducing the number of tokens sent to LLMs is the most effective way to lower your infrastructure costs and reduce latency. LLM Bastion provides built-in mechanisms to optimize your prompts automatically.

Strategies for Token Reduction

1. Code Comment Stripping (New)

Many requests contain large blocks of code with extensive comments and documentation. While helpful for humans, these comments often consume tokens without providing value for tasks like logic refactoring or cross-language translation.

LLM Bastion can automatically strip comments from code blocks before sending them to the provider.

How to use:

Add the X-Bastion-Optimize-Prompt: strip-comments header to your request.

You can also specify the programming language to fine-tune the stripping: X-Bastion-Code-Language: python

Supported languages:

Python: Removes # and triple-quoted docstrings.
C/Java/Go: Removes // and /* */ blocks.
SQL: Removes -- and /* */.
Shell/YAML: Removes #.

2. Semantic Caching

If you send the same or highly similar prompts multiple times, LLM Bastion can cache the response semantically. This means even if the wording is slightly different, the gateway can return the cached result if the intent is the same, saving 100% of the input tokens.

See the Caching Guide for more details.

3. Systematic Secret Redaction

By redacting secrets (PII, API keys) into short placeholders, you not only improve security but also slightly reduce token counts compared to long, unique strings.

4. Choosing "Economy" Tier Models

When precision is less critical, routing requests to smaller, cheaper models (like Llama 3 8B instead of 70B) significantly reduces the cost per token.

Use the economy profile to prioritize cost-effective routing: X-Bastion-Profile: economy

Measuring Your Savings

Use the /v1/metrics/usage endpoint to track how many tokens were saved by optimization steps.

json

{
  "request_id": "...",
  "original_prompt_tokens": 1050,
  "optimized_prompt_tokens": 780,
  "tokens_saved": 270,
  "cost_saved_micro_eur": 450
}

Token Consumption and Prompt Optimization 💸 ​

Strategies for Token Reduction ​

1. Code Comment Stripping (New) ​

How to use: ​

2. Semantic Caching ​

3. Systematic Secret Redaction ​

4. Choosing "Economy" Tier Models ​

Measuring Your Savings ​