Caching
Overview
Caching in LlmTornado enables significant cost savings and performance improvements by reusing previously computed responses and embeddings. Provider-specific caching mechanisms like Anthropic's prompt caching allow you to cache large context blocks, reducing both latency and token costs for repeated requests.
Quick Start
```csharp
using LlmTornado;
using LlmTornado.Chat;
using LlmTornado.Chat.Models;
using LlmTornado.Caching;
TornadoApi api = new TornadoApi("your-api-key");
// Enable caching for large context
Conversation conversation = api.Chat.CreateConversation(new ChatRequest
{
Model = ChatModel.Anthropic.Claude35.Sonnet,
SystemMessage = "Large system context that will be cached...",
CacheControl = new CacheControl
{
Type = CacheControlTypes.Ephemeral
}
});
conversation.AddUserMessage("First question");
ChatRichResponse response1 = await conversation.GetResponseRich();
// Subsequent requests use cached context
conversation.AddUserMessage("Second question");
ChatRichResponse response2 = await conversation.GetResponseRich();
// This request will be faster and cheaper!
```
Prerequisites
- The LlmTornado package installed
- A valid API key with caching support
- Understanding of Chat Basics
- Knowledge of provider-specific caching features
Detailed Explanation
What is Caching?
Caching stores frequently used data to avoid recomputation:
- Prompt Caching - Cache large system prompts or context
- Response Caching - Reuse identical API responses
- Embedding Caching - Store computed embeddings
- Result Caching - Cache computation results
Benefits
- Cost Reduction - Pay less for cached tokens
- Faster Responses - Reduce latency dramatically
- Consistency - Same inputs produce same outputs
- Resource Efficiency - Lower API load
Provider Support
Anthropic (Claude)
- Prompt Caching - Cache system prompts and long context
- Ephemeral Type - 5-minute cache duration
- Cost Savings - Cache reads are billed at roughly 10% of the normal input-token price (cache writes cost slightly more than uncached input); see the cost sketch below
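To make the savings concrete, the sketch below compares re-sending a large context on every request with reading it from the cache. The per-token prices are illustrative placeholders, not current Anthropic list prices; substitute the rates for your model.
```csharp
// Illustrative cost comparison for a repeatedly reused 50k-token context.
// The per-million-token prices below are placeholders; check your provider's pricing page.
const double inputPricePerMTok = 3.00;   // normal input tokens
const double cacheWritePerMTok = 3.75;   // first request writes the cache (~1.25x)
const double cacheReadPerMTok  = 0.30;   // later requests read the cache (~0.1x)

const int contextTokens = 50_000;
const int requests = 20;

// Without caching: the full context is billed as normal input on every request.
double uncached = requests * contextTokens / 1_000_000.0 * inputPricePerMTok;

// With caching: one cache write, then cache reads for the remaining requests.
double cached = contextTokens / 1_000_000.0 * cacheWritePerMTok
              + (requests - 1) * contextTokens / 1_000_000.0 * cacheReadPerMTok;

Console.WriteLine($"Uncached: ${uncached:F2}, cached: ${cached:F2}");
// With these placeholder prices the cached run costs roughly 16% of the uncached one.
```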
OpenAI
- Prompt Caching - Applied automatically when requests repeat a sufficiently long prompt prefix; no code changes required
- Short Duration - Cache lifetime is managed by OpenAI and typically lasts only minutes
Basic Usage
Anthropic Prompt Caching
Cache large system contexts:
```csharp
// Large document or context
string largeContext = await File.ReadAllTextAsync("documentation.txt");
Conversation conversation = api.Chat.CreateConversation(new ChatRequest
{
Model = ChatModel.Anthropic.Claude35.Sonnet,
Messages = [
new ChatMessage(ChatMessageRoles.System, largeContext)
{
CacheControl = new CacheControl
{
Type = CacheControlTypes.Ephemeral
}
}
]
});
// First call - context is cached
conversation.AddUserMessage("What is the main topic?");
ChatRichResponse response1 = await conversation.GetResponseRich();
// Subsequent calls use cached context (faster + cheaper)
conversation.AddUserMessage("Tell me more about section 3");
ChatRichResponse response2 = await conversation.GetResponseRich();
```
Multi-Block Caching
Cache multiple context blocks:
```csharp
string documentation = await File.ReadAllTextAsync("docs.txt");
string codebase = await File.ReadAllTextAsync("code.txt");
Conversation conversation = api.Chat.CreateConversation(new ChatRequest
{
Model = ChatModel.Anthropic.Claude35.Sonnet,
Messages = [
new ChatMessage(ChatMessageRoles.System, documentation)
{
CacheControl = new CacheControl { Type = CacheControlTypes.Ephemeral }
},
new ChatMessage(ChatMessageRoles.System, codebase)
{
CacheControl = new CacheControl { Type = CacheControlTypes.Ephemeral }
}
]
});
```
Application-Level Caching
Implement your own caching layer:
```csharp
public class CachedLlmService
{
private readonly TornadoApi _api;
private readonly Dictionary<string, string> _cache;
public CachedLlmService(TornadoApi api)
{
_api = api;
_cache = new Dictionary<string, string>();
}
public async Task<string> GetResponse(string prompt)
{
// Generate cache key
string cacheKey = ComputeHash(prompt);
// Check cache
if (_cache.TryGetValue(cacheKey, out string? cachedResponse))
{
Console.WriteLine("Cache hit!");
return cachedResponse;
}
// Call API
Conversation conversation = _api.Chat.CreateConversation(new ChatRequest
{
Model = ChatModel.OpenAi.Gpt4.Turbo
});
conversation.AddUserMessage(prompt);
ChatRichResponse response = await conversation.GetResponseRich();
// Store in cache
_cache[cacheKey] = response.Content;
return response.Content;
}
private string ComputeHash(string input)
{
using System.Security.Cryptography.SHA256 sha256 = System.Security.Cryptography.SHA256.Create();
byte[] bytes = sha256.ComputeHash(System.Text.Encoding.UTF8.GetBytes(input));
return Convert.ToBase64String(bytes);
}
}
```
Advanced Usage
Distributed Caching
Use Redis or similar for shared caching:
```csharp
public class RedisCachedLlmService
{
private readonly TornadoApi _api;
private readonly IDistributedCache _cache;
public RedisCachedLlmService(TornadoApi api, IDistributedCache cache)
{
_api = api;
_cache = cache;
}
public async Task<string> GetResponse(string prompt, TimeSpan? expiration = null)
{
string cacheKey = $"llm:{ComputeHash(prompt)}";
// Try get from cache
string? cached = await _cache.GetStringAsync(cacheKey);
if (cached != null)
{
return cached;
}
// Generate response
Conversation conversation = _api.Chat.CreateConversation(new ChatRequest
{
Model = ChatModel.OpenAi.Gpt4.Turbo
});
conversation.AddUserMessage(prompt);
ChatRichResponse response = await conversation.GetResponseRich();
// Store in cache
DistributedCacheEntryOptions options = new DistributedCacheEntryOptions
{
AbsoluteExpirationRelativeToNow = expiration ?? TimeSpan.FromHours(1)
};
await _cache.SetStringAsync(cacheKey, response.Content, options);
return response.Content;
}
private string ComputeHash(string input)
{
using System.Security.Cryptography.SHA256 sha256 = System.Security.Cryptography.SHA256.Create();
byte[] bytes = sha256.ComputeHash(System.Text.Encoding.UTF8.GetBytes(input));
return Convert.ToBase64String(bytes);
}
}
```
Embedding Caching
Cache expensive embedding operations:
```csharp
public class EmbeddingCache
{
private readonly TornadoApi _api;
private readonly Dictionary<string, float[]> _cache;
public EmbeddingCache(TornadoApi api)
{
_api = api;
_cache = new Dictionary<string, float[]>();
}
public async Task<float[]> GetEmbedding(string text)
{
if (_cache.TryGetValue(text, out float[]? cached))
{
return cached;
}
EmbeddingResult? result = await _api.Embeddings.CreateEmbedding(
EmbeddingModel.OpenAi.Gen2.Ada,
text);
float[] embedding = result.Data[0].Embedding;
_cache[text] = embedding;
return embedding;
}
}
```
Cache Invalidation
Implement smart cache invalidation:
```csharp
public class SmartCache
{
private readonly Dictionary<string, (string Response, DateTime Timestamp)> _cache;
private readonly TimeSpan _ttl;
public SmartCache(TimeSpan ttl)
{
_cache = new Dictionary<string, (string, DateTime)>();
_ttl = ttl;
}
public async Task<string> GetOrCompute(string key, Func<Task<string>> compute)
{
// Check if cached and not expired
if (_cache.TryGetValue(key, out (string Response, DateTime Timestamp) entry))
{
if (DateTime.UtcNow - entry.Timestamp < _ttl)
{
return entry.Response;
}
else
{
// Expired - remove
_cache.Remove(key);
}
}
// Compute and cache
string result = await compute();
_cache[key] = (result, DateTime.UtcNow);
return result;
}
public void InvalidatePattern(string pattern)
{
List<string> toRemove = _cache.Keys
.Where(k => k.Contains(pattern))
.ToList();
foreach (string key in toRemove)
{
_cache.Remove(key);
}
}
}
```
Best Practices
- Cache Hot Paths - Cache frequently accessed data
- Set Appropriate TTL - Balance freshness vs efficiency
- Monitor Cache Hit Rate - Track hits and misses and optimize based on the numbers (see the sketch after this list)
- Use Cache Keys - Create unique, consistent keys
- Handle Cache Misses - Always have fallback logic
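The list above recommends monitoring the cache hit rate; a minimal way to do that is to count hits and misses inside whatever caching wrapper you use. The `CacheMetrics` class below is a hypothetical helper for this sketch, not part of LlmTornado.
```csharp
using System.Threading;

// Minimal hit/miss counter that can be wrapped around any cache lookup.
public class CacheMetrics
{
    private long _hits;
    private long _misses;

    public void RecordHit() => Interlocked.Increment(ref _hits);
    public void RecordMiss() => Interlocked.Increment(ref _misses);

    // Hit rate as a fraction between 0 and 1; 0 when nothing has been recorded yet.
    public double HitRate
    {
        get
        {
            long hits = Interlocked.Read(ref _hits);
            long total = hits + Interlocked.Read(ref _misses);
            return total == 0 ? 0 : (double)hits / total;
        }
    }
}

// Usage inside a lookup:
// if (_cache.TryGetValue(key, out string? value)) { _metrics.RecordHit(); ... }
// else { _metrics.RecordMiss(); ... }
```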
Caching Strategy Guidelines
```csharp
// Good: Cache static content
string cachedResponse = await GetCachedResponse("What is 2+2?");
// Good: Cache with expiration
await CacheWithTTL(prompt, response, TimeSpan.FromHours(1));
// Bad: Cache user-specific data globally
// DON'T: Cache personal information without user isolation
// Good: Cache per-user
string key = $"user:{userId}:query:{hash}";
```
Common Issues
Stale Cache Data
- Solution: Implement TTL and invalidation
- Prevention: Set appropriate expiration times
Cache Memory Bloat
- Solution: Implement LRU eviction (see the sketch after this list)
- Prevention: Monitor cache size and set limits
Cache Key Collisions
- Solution: Use better hashing algorithms
- Prevention: Include relevant context in keys
Cold Cache Performance
- Solution: Pre-warm cache for common queries
- Prevention: Accept initial slower performance
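For the memory-bloat issue above, a bounded cache with least-recently-used (LRU) eviction keeps the entry count under a fixed limit. The sketch below is a simple, non-thread-safe illustration of the idea, not an API provided by LlmTornado.
```csharp
using System.Collections.Generic;

// Simple LRU cache: a dictionary for O(1) lookups plus a linked list
// ordering keys from most- to least-recently used. Not thread-safe.
public class LruCache<TKey, TValue> where TKey : notnull
{
    private readonly int _capacity;
    private readonly Dictionary<TKey, LinkedListNode<(TKey Key, TValue Value)>> _map = new();
    private readonly LinkedList<(TKey Key, TValue Value)> _order = new();

    public LruCache(int capacity) => _capacity = capacity;

    public bool TryGet(TKey key, out TValue? value)
    {
        if (_map.TryGetValue(key, out LinkedListNode<(TKey Key, TValue Value)>? node))
        {
            // Move the entry to the front to mark it as recently used.
            _order.Remove(node);
            _order.AddFirst(node);
            value = node.Value.Value;
            return true;
        }

        value = default;
        return false;
    }

    public void Set(TKey key, TValue value)
    {
        if (_map.TryGetValue(key, out LinkedListNode<(TKey Key, TValue Value)>? existing))
        {
            _order.Remove(existing);
            _map.Remove(key);
        }
        else if (_map.Count >= _capacity)
        {
            // Evict the least-recently-used entry from the back of the list.
            LinkedListNode<(TKey Key, TValue Value)>? last = _order.Last;
            if (last != null)
            {
                _order.RemoveLast();
                _map.Remove(last.Value.Key);
            }
        }

        LinkedListNode<(TKey Key, TValue Value)> node = _order.AddFirst((key, value));
        _map[key] = node;
    }
}
```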
API Reference
CacheControl
- CacheControlTypes Type - Cache type (currently Ephemeral); used with Anthropic prompt caching
CacheControlTypes
- Ephemeral - Short-term caching (5 minutes)
Cache Usage in Response
Check if cache was used:
```csharp
ChatRichResponse response = await conversation.GetResponseRich();
if (response.Usage?.CacheReadInputTokens > 0)
{
Console.WriteLine($"Cache hit! {response.Usage.CacheReadInputTokens} tokens from cache");
}
```
Related Topics
- Chat Basics - Core chat functionality
- Embeddings - Caching embeddings
- Agents - Caching in agents
- Performance Optimization - .NET caching patterns