LLM Bastion Caching Strategy - Technical Card
🗄️ Hierarchical Caching (L1 & L2)
The bastion uses a two-layer caching mechanism to maximize efficiency while ensuring 100% data integrity.
Layer 1: Exact Match (L1)
- Goal: Instant response for identical queries.
- Logic: A SHA-256 hash of the normalized request.
- Normalization:
  - Keys are sorted alphabetically.
  - Metadata like user_id, request_id, and stream are removed.
  - Message content is trimmed and lowercased.
- Performance: < 1ms latency.
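The normalization and hashing steps above could be sketched roughly as follows. This is a minimal illustration, not the bastion's actual code: it assumes an OpenAI-style request body with a messages list, and the names VOLATILE_FIELDS, normalize_request, and l1_cache_key are hypothetical.

```python
import hashlib
import json

# Metadata fields named on this card; the constant name itself is hypothetical.
VOLATILE_FIELDS = {"user_id", "request_id", "stream"}


def normalize_request(request: dict) -> dict:
    """Drop volatile metadata and trim/lowercase string message content."""
    cleaned = {k: v for k, v in request.items() if k not in VOLATILE_FIELDS}
    # Assumption: messages is a list of dicts that carry a "content" field.
    if "messages" in cleaned:
        cleaned["messages"] = [
            {**m, "content": m["content"].strip().lower()}
            if isinstance(m.get("content"), str) else m
            for m in cleaned["messages"]
        ]
    return cleaned


def l1_cache_key(request: dict) -> str:
    """SHA-256 hex digest of the normalized request with sorted keys."""
    canonical = json.dumps(
        normalize_request(request), sort_keys=True, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Under these assumptions, two requests that differ only in key order, removed metadata, or surrounding whitespace and letter case in the message content produce the same key.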
Layer 2: Semantic Match (L2)
- Goal: Cache hits for paraphrased queries (e.g. "Hi" vs "Hello").
- Logic: Vector similarity search using Cosine Similarity.
- Safety Threshold: > 0.98 similarity is required for a HIT.
- Performance: ~10ms latency (excluding embedding).
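As a rough sketch of the threshold check, the snippet below compares a query embedding against stored embeddings and returns a response only when the best cosine similarity exceeds 0.98. The function names and the (embedding, response) entry shape are assumptions for illustration, and the linear scan stands in for whatever vector index the bastion actually uses.

```python
import math

SIMILARITY_THRESHOLD = 0.98  # value taken from this card


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def l2_lookup(query_vec, entries):
    """Return the best-matching cached response, or None on a miss.

    entries is a hypothetical iterable of (embedding, response) pairs.
    """
    best_score, best_response = 0.0, None
    for embedding, response in entries:
        score = cosine_similarity(query_vec, embedding)
        if score > best_score:
            best_score, best_response = score, response
    # Strictly greater than the threshold, as stated above.
    return best_response if best_score > SIMILARITY_THRESHOLD else None
```

A high threshold like this favors precision: near-duplicates hit, while loosely related queries fall through to a MISS.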
🛡️ Multi-Tenant Security Mask
Your data is private by default. Each cache entry is "salted" with the tenant_id.
| Tenant A | Tenant B | Result |
|---|---|---|
| What is 1+1? | What is 1+1? | MISS (for Tenant B) |
NOTE
Only explicitly marked "Public" queries can bypass the tenant salt to serve common knowledge answers.
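One way to picture the tenant salt is to mix a scope string into the cache key, so identical requests from different tenants land on different entries. The sketch below is an assumption about how this might look; scoped_cache_key, the is_public flag, and the "public" scope label are hypothetical names, not the bastion's API.

```python
import hashlib

PUBLIC_SCOPE = "public"  # hypothetical shared scope for "Public" queries


def scoped_cache_key(tenant_id: str, request_hash: str, is_public: bool = False) -> str:
    """Derive the final cache key from a scope (tenant or public) and the request hash."""
    # The "tenant:" prefix keeps a tenant literally named "public" out of the shared scope.
    scope = PUBLIC_SCOPE if is_public else f"tenant:{tenant_id}"
    return hashlib.sha256(f"{scope}|{request_hash}".encode("utf-8")).hexdigest()
```

With this scheme, the same request hash yields different keys for Tenant A and Tenant B, which matches the MISS in the table above, while a query marked public resolves to one shared key.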
⏳ Policy & Control
For details on TTL categories, cache bypass headers, and business rules, please refer to the semantic_caching.feature living documentation.
Every response includes an X-Bastion-Cache header set to HIT-L1, HIT-L2, or MISS.
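A minimal sketch of how the two layers might map to the header value is shown below. It assumes L1 is consulted before L2, which the card implies but does not state; the lookup function, the dict-based L1 store, and the l2_search callable are hypothetical.

```python
from typing import Callable, Optional


def lookup(key: str, l1_store: dict, l2_search: Callable[[], Optional[str]]):
    """Return (cached_response_or_None, response_headers)."""
    # Layer 1: exact match on the hashed key.
    if key in l1_store:
        return l1_store[key], {"X-Bastion-Cache": "HIT-L1"}
    # Layer 2: semantic match, only tried after an L1 miss (assumed order).
    semantic = l2_search()
    if semantic is not None:
        return semantic, {"X-Bastion-Cache": "HIT-L2"}
    return None, {"X-Bastion-Cache": "MISS"}
```

Clients can read the header to tell which layer served a response, for example when measuring hit rates per layer.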
