LLM Bastion Caching Strategy - Technical Card

🗄️ Hierarchical Caching (L1 & L2)

The bastion uses a two-layer caching mechanism to maximize efficiency while ensuring 100% data integrity.

Goal: Instant response for identical queries.
Logic: A SHA-256 hash of the normalized request.
Normalization:
- Keys are sorted alphabetically.
- Metadata like user_id, request_id, and stream are removed.
- Message content is trimmed and lowercased.
Performance: < 1ms latency.

Your data is private by default. Each cache entry is "salted" with the tenant_id.

Tenant A	Tenant B	Result
`What is 1+1?`	`What is 1+1?`	MISS (for Tenant B)

NOTE

Only explicitly marked "Public" queries can bypass the tenant salt to serve common knowledge answers.

For details on TTL categories, cache bypass headers, and business rules, please refer to the semantic_caching.feature living documentation.

Response headers will include X-Bastion-Cache: HIT-L1, HIT-L2, or MISS.