Token Consumption and Prompt Optimization 💸
Reducing the number of tokens sent to LLMs is one of the most direct ways to lower your infrastructure costs and reduce latency. LLM Bastion provides built-in mechanisms that optimize your prompts automatically.
Strategies for Token Reduction
1. Code Comment Stripping (New)
Many requests contain large blocks of code with extensive comments and documentation. While helpful for humans, these comments often consume tokens without providing value for tasks like logic refactoring or cross-language translation.
LLM Bastion can automatically strip comments from code blocks before sending them to the provider.
How to use:
Add the X-Bastion-Optimize-Prompt: strip-comments header to your request.
To fine-tune the stripping, you can also specify the programming language with a second header: X-Bastion-Code-Language: python
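As a minimal sketch, the two headers can be assembled in a small helper and passed to whichever HTTP client you already use. The function name `bastion_headers` is hypothetical and only illustrates the header names and values given above; the gateway URL and request body are deliberately left out because they depend on your deployment.

```python
from typing import Dict, Optional


def bastion_headers(code_language: Optional[str] = None) -> Dict[str, str]:
    """Build the prompt-optimization headers described in this section.

    `bastion_headers` is a hypothetical helper, not part of LLM Bastion.
    """
    headers = {"X-Bastion-Optimize-Prompt": "strip-comments"}
    if code_language:
        # Optional hint so the gateway can tune stripping for one language.
        headers["X-Bastion-Code-Language"] = code_language
    return headers


if __name__ == "__main__":
    print(bastion_headers("python"))
```

Merging the result into your existing request headers keeps the optimization opt-in per request.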
Supported languages:
- Python: Removes # and triple-quoted docstrings.
- C/Java/Go: Removes // and /* */ blocks.
- SQL: Removes -- and /* */.
- Shell/YAML: Removes #.
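To give a feel for what stripping involves, here is a rough local sketch for the C/Java/Go rule (// and /* */). It is not LLM Bastion's implementation, and the function name `strip_c_style_comments` is hypothetical. The sketch matches string literals first so that comment markers inside quotes are left alone; it does not handle Go raw strings in backticks.

```python
import re

# String literals are matched first so "//" or "/*" inside quotes survive.
_C_STYLE = re.compile(
    r'"(?:\\.|[^"\\])*"'    # double-quoted string literal
    r"|'(?:\\.|[^'\\])*'"   # single-quoted string or char literal
    r"|//[^\n]*"            # line comment, up to the end of the line
    r"|/\*.*?\*/",          # block comment, possibly spanning lines
    re.DOTALL,
)


def strip_c_style_comments(source: str) -> str:
    """Remove // and /* */ comments while keeping string literals intact."""

    def replace(match: "re.Match[str]") -> str:
        text = match.group(0)
        if text[0] in "\"'":
            return text  # a string literal: keep it unchanged
        # A block comment becomes one space so adjacent tokens stay separate.
        return " " if text.startswith("/*") else ""

    return _C_STYLE.sub(replace, source)


if __name__ == "__main__":
    sample = 'int x = 1; // counter\n/* block */\nchar *s = "// not a comment";\n'
    print(strip_c_style_comments(sample))
```

The same pattern-based approach extends to the other rules by swapping the comment alternatives, for example `--[^\n]*` for SQL line comments.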
2. Semantic Caching
If you send the same or highly similar prompts multiple times, LLM Bastion can cache the response semantically. Even if the wording is slightly different, the gateway can return the cached result when the intent is the same. On a cache hit nothing is forwarded to the provider, so that request consumes no provider tokens.
See the Caching Guide for more details.
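The idea behind a semantic cache can be sketched as a similarity lookup: embed the incoming prompt, compare it with the embeddings of cached prompts, and reuse the stored response when the similarity passes a threshold. The example below is a toy illustration only. It uses bag-of-words counts in place of a real embedding model, and the class name `SemanticCache` and the 0.8 threshold are assumptions, not LLM Bastion's actual design.

```python
import math
import re
from collections import Counter
from typing import List, Optional, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # hypothetical similarity cutoff
        self.entries: List[Tuple[Counter, str]] = []

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((embed(prompt), response))

    def get(self, prompt: str) -> Optional[str]:
        """Return the cached response of the most similar prompt, if close enough."""
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


if __name__ == "__main__":
    cache = SemanticCache()
    cache.put("How do I reset my password?", "Use the reset link.")
    print(cache.get("How do I reset my password please?"))
    print(cache.get("What is the weather today?"))
```

The threshold is the main design lever: set it too low and unrelated prompts share answers, too high and near-duplicates miss the cache.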
3. Systematic Secret Redaction
Replacing sensitive values (PII, API keys) with short placeholders improves security and can also slightly reduce token counts, because long, unique strings typically split into many tokens.
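A placeholder-based redaction step might look like the sketch below. The two regular expressions, the placeholder format, and the function name `redact` are illustrative assumptions; they are not the detection rules LLM Bastion uses. The mapping is returned so the original values can be restored in the response if needed.

```python
import re
from typing import Dict, Tuple

# Hypothetical detection patterns, chosen only for illustration.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "API_KEY": re.compile(r"\b(?:sk|pk)_[A-Za-z0-9_]{16,}\b"),
}


def redact(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace matches with short numbered placeholders and return the mapping."""
    mapping: Dict[str, str] = {}
    for label, pattern in PATTERNS.items():

        def replace(match: "re.Match[str]", label: str = label) -> str:
            placeholder = f"<{label}_{len(mapping) + 1}>"
            mapping[placeholder] = match.group(0)
            return placeholder

        text = pattern.sub(replace, text)
    return text, mapping


if __name__ == "__main__":
    prompt = "Contact alice@example.com, key sk_EXAMPLEKEY0123456789"
    redacted, secrets = redact(prompt)
    print(redacted)
    print(secrets)
```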
4. Choosing "Economy" Tier Models
When precision is less critical, routing requests to smaller, cheaper models (like Llama 3 8B instead of 70B) significantly reduces the cost per token.
Use the economy profile to prioritize cost-effective routing: X-Bastion-Profile: economy
Measuring Your Savings
Use the /v1/metrics/usage endpoint to track how many tokens were saved by optimization steps.
```json
{
  "request_id": "...",
  "original_prompt_tokens": 1050,
  "optimized_prompt_tokens": 780,
  "tokens_saved": 270,
  "cost_saved_micro_eur": 450
}
```
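A record like the one above can be turned into a quick summary on the client side. The sketch below works on the sample payload only and makes no network call; the helper name `summarize_savings` is hypothetical, and it assumes that `cost_saved_micro_eur` is expressed in millionths of a euro, which the field name suggests but this page does not state.

```python
import json

SAMPLE = """
{
  "request_id": "...",
  "original_prompt_tokens": 1050,
  "optimized_prompt_tokens": 780,
  "tokens_saved": 270,
  "cost_saved_micro_eur": 450
}
"""


def summarize_savings(record: dict) -> dict:
    """Derive percentage and euro savings from one usage record."""
    original = record["original_prompt_tokens"]
    saved = original - record["optimized_prompt_tokens"]
    return {
        "tokens_saved": saved,
        "percent_saved": round(100 * saved / original, 1) if original else 0.0,
        # Assumption: micro EUR means 1/1,000,000 of a euro.
        "cost_saved_eur": record["cost_saved_micro_eur"] / 1_000_000,
    }


if __name__ == "__main__":
    print(summarize_savings(json.loads(SAMPLE)))
```

For the sample record this gives 270 tokens saved, roughly 25.7% of the original prompt.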