Caching

Overview

Caching in LlmTornado enables significant cost savings and performance improvements by reusing previously computed responses and embeddings. Provider-specific caching mechanisms like Anthropic's prompt caching allow you to cache large context blocks, reducing both latency and token costs for repeated requests.

Quick Start

csharp
using LlmTornado;
using LlmTornado.Chat;
using LlmTornado.Chat.Models;
using LlmTornado.Caching;

TornadoApi api = new TornadoApi("your-api-key");

// Enable caching for a large system context
Conversation conversation = api.Chat.CreateConversation(new ChatRequest
{
    Model = ChatModel.Anthropic.Claude35.Sonnet,
    Messages = [
        new ChatMessage(ChatMessageRoles.System, "Large system context that will be cached...")
        {
            CacheControl = new CacheControl
            {
                Type = CacheControlTypes.Ephemeral
            }
        }
    ]
});

conversation.AddUserMessage("First question");
ChatRichResponse response1 = await conversation.GetResponseRich();

// Subsequent requests use cached context
conversation.AddUserMessage("Second question");
ChatRichResponse response2 = await conversation.GetResponseRich();
// This request will be faster and cheaper!

Prerequisites

  • The LlmTornado package installed
  • A valid API key for a provider that supports caching (e.g., Anthropic for prompt caching)
  • Understanding of Chat Basics
  • Knowledge of provider-specific caching features

Detailed Explanation

What is Caching?

Caching stores frequently used data to avoid recomputation:

  • Prompt Caching - Cache large system prompts or context
  • Response Caching - Reuse identical API responses
  • Embedding Caching - Store computed embeddings
  • Result Caching - Cache computation results

Benefits

  1. Cost Reduction - Cached tokens are billed at a discounted rate
  2. Faster Responses - Long, repeated prefixes do not need to be reprocessed, cutting latency
  3. Consistency - Application-level response caching returns identical outputs for identical inputs
  4. Resource Efficiency - Fewer redundant API calls

Provider Support

Anthropic (Claude)

  • Prompt Caching - Cache system prompts and long context
  • Ephemeral Type - 5-minute cache duration
  • Cost Savings - Cache reads are billed at roughly 10% of the normal input-token rate (about a 90% reduction); cache writes cost slightly more than uncached input (see the usage check below)
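
To verify caching on a per-request basis, inspect the usage data returned with each response. CacheReadInputTokens is shown in the API Reference below; CacheCreationInputTokens is assumed here as the corresponding write-side counter and may differ in name between LlmTornado versions.

csharp
ChatRichResponse response = await conversation.GetResponseRich();

// The first request with a cache breakpoint should report cache *write* tokens;
// later requests within the cache TTL should report cache *read* tokens instead.
Console.WriteLine($"Cache write tokens: {response.Usage?.CacheCreationInputTokens ?? 0}");
Console.WriteLine($"Cache read tokens: {response.Usage?.CacheReadInputTokens ?? 0}");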

OpenAI

  • Prompt Caching - Applied automatically when a sufficiently long prompt shares its prefix with a recent request; no request changes are needed
  • Short Duration - Cached prefixes are retained briefly; exact lifetime is implementation-specific

Basic Usage

Anthropic Prompt Caching

Cache large system contexts:

csharp
// Large document or context
string largeContext = await File.ReadAllTextAsync("documentation.txt");

Conversation conversation = api.Chat.CreateConversation(new ChatRequest
{
    Model = ChatModel.Anthropic.Claude35.Sonnet,
    Messages = [
        new ChatMessage(ChatMessageRoles.System, largeContext)
        {
            CacheControl = new CacheControl
            {
                Type = CacheControlTypes.Ephemeral
            }
        }
    ]
});

// First call - context is cached
conversation.AddUserMessage("What is the main topic?");
ChatRichResponse response1 = await conversation.GetResponseRich();

// Subsequent calls use cached context (faster + cheaper)
conversation.AddUserMessage("Tell me more about section 3");
ChatRichResponse response2 = await conversation.GetResponseRich();

Multi-Block Caching

Cache multiple context blocks:

csharp
string documentation = await File.ReadAllTextAsync("docs.txt");
string codebase = await File.ReadAllTextAsync("code.txt");

Conversation conversation = api.Chat.CreateConversation(new ChatRequest
{
    Model = ChatModel.Anthropic.Claude35.Sonnet,
    Messages = [
        new ChatMessage(ChatMessageRoles.System, documentation)
        {
            CacheControl = new CacheControl { Type = CacheControlTypes.Ephemeral }
        },
        new ChatMessage(ChatMessageRoles.System, codebase)
        {
            CacheControl = new CacheControl { Type = CacheControlTypes.Ephemeral }
        }
    ]
});
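
Note that Anthropic limits how many cache breakpoints a single request may contain (currently four), so prefer a few large, stable blocks over many small ones.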

Application-Level Caching

Implement your own caching layer:

csharp
public class CachedLlmService
{
    private readonly TornadoApi _api;
    private readonly Dictionary<string, string> _cache;

    public CachedLlmService(TornadoApi api)
    {
        _api = api;
        _cache = new Dictionary<string, string>();
    }

    public async Task<string> GetResponse(string prompt)
    {
        // Generate cache key
        string cacheKey = ComputeHash(prompt);

        // Check cache
        if (_cache.TryGetValue(cacheKey, out string? cachedResponse))
        {
            Console.WriteLine("Cache hit!");
            return cachedResponse;
        }

        // Call API
        Conversation conversation = _api.Chat.CreateConversation(new ChatRequest
        {
            Model = ChatModel.OpenAi.Gpt4.Turbo
        });
        
        conversation.AddUserMessage(prompt);
        ChatRichResponse response = await conversation.GetResponseRich();

        // Store in cache
        _cache[cacheKey] = response.Content;
        return response.Content;
    }

    private string ComputeHash(string input)
    {
        using System.Security.Cryptography.SHA256 sha256 = System.Security.Cryptography.SHA256.Create();
        byte[] bytes = sha256.ComputeHash(System.Text.Encoding.UTF8.GetBytes(input));
        return Convert.ToBase64String(bytes);
    }
}
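
A minimal usage sketch of the class above (the API key and prompt are placeholders):

csharp
TornadoApi api = new TornadoApi("your-api-key");
CachedLlmService service = new CachedLlmService(api);

// First call goes to the API; repeating the same prompt returns the cached string
string first = await service.GetResponse("Summarize the main features of HTTP/2.");
string second = await service.GetResponse("Summarize the main features of HTTP/2."); // prints "Cache hit!"

Note that Dictionary<string, string> is not thread-safe; if multiple callers can hit the service concurrently, ConcurrentDictionary<string, string> is a drop-in alternative.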

Advanced Usage

Distributed Caching

Use Redis or similar for shared caching:

csharp
public class RedisCachedLlmService
{
    private readonly TornadoApi _api;
    private readonly IDistributedCache _cache;

    public RedisCachedLlmService(TornadoApi api, IDistributedCache cache)
    {
        _api = api;
        _cache = cache;
    }

    public async Task<string> GetResponse(string prompt, TimeSpan? expiration = null)
    {
        string cacheKey = $"llm:{ComputeHash(prompt)}";

        // Try get from cache
        string? cached = await _cache.GetStringAsync(cacheKey);
        if (cached != null)
        {
            return cached;
        }

        // Generate response
        Conversation conversation = _api.Chat.CreateConversation(new ChatRequest
        {
            Model = ChatModel.OpenAi.Gpt4.Turbo
        });
        
        conversation.AddUserMessage(prompt);
        ChatRichResponse response = await conversation.GetResponseRich();

        // Store in cache
        DistributedCacheEntryOptions options = new DistributedCacheEntryOptions
        {
            AbsoluteExpirationRelativeToNow = expiration ?? TimeSpan.FromHours(1)
        };
        
        await _cache.SetStringAsync(cacheKey, response.Content, options);
        return response.Content;
    }

    private string ComputeHash(string input)
    {
        using System.Security.Cryptography.SHA256 sha256 = System.Security.Cryptography.SHA256.Create();
        byte[] bytes = sha256.ComputeHash(System.Text.Encoding.UTF8.GetBytes(input));
        return Convert.ToBase64String(bytes);
    }
}
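
IDistributedCache lives in Microsoft.Extensions.Caching.Distributed. One common way to back it with Redis is the StackExchange Redis implementation; below is a registration sketch, assuming a standard .NET host builder and a placeholder connection string:

csharp
// Requires the Microsoft.Extensions.Caching.StackExchangeRedis package
builder.Services.AddStackExchangeRedisCache(options =>
{
    options.Configuration = "localhost:6379";
    options.InstanceName = "llm-cache:";
});

builder.Services.AddSingleton(new TornadoApi("your-api-key"));
builder.Services.AddSingleton<RedisCachedLlmService>();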

Embedding Caching

Cache expensive embedding operations:

csharp
public class EmbeddingCache
{
    private readonly TornadoApi _api;
    private readonly Dictionary<string, float[]> _cache;

    public EmbeddingCache(TornadoApi api)
    {
        _api = api;
        _cache = new Dictionary<string, float[]>();
    }

    public async Task<float[]> GetEmbedding(string text)
    {
        if (_cache.TryGetValue(text, out float[]? cached))
        {
            return cached;
        }

        EmbeddingResult? result = await _api.Embeddings.CreateEmbedding(
            EmbeddingModel.OpenAi.Gen2.Ada,
            text);

        if (result is null)
        {
            throw new InvalidOperationException("Embedding request returned no result.");
        }

        float[] embedding = result.Data[0].Embedding;
        _cache[text] = embedding;
        return embedding;
    }
}

Cache Invalidation

Implement smart cache invalidation:

csharp
public class SmartCache
{
    private readonly Dictionary<string, (string Response, DateTime Timestamp)> _cache;
    private readonly TimeSpan _ttl;

    public SmartCache(TimeSpan ttl)
    {
        _cache = new Dictionary<string, (string, DateTime)>();
        _ttl = ttl;
    }

    public async Task<string> GetOrCompute(string key, Func<Task<string>> compute)
    {
        // Check if cached and not expired
        if (_cache.TryGetValue(key, out (string Response, DateTime Timestamp) entry))
        {
            if (DateTime.UtcNow - entry.Timestamp < _ttl)
            {
                return entry.Response;
            }
            else
            {
                // Expired - remove
                _cache.Remove(key);
            }
        }

        // Compute and cache
        string result = await compute();
        _cache[key] = (result, DateTime.UtcNow);
        return result;
    }

    public void InvalidatePattern(string pattern)
    {
        List<string> toRemove = _cache.Keys
            .Where(k => k.Contains(pattern))
            .ToList();

        foreach (string key in toRemove)
        {
            _cache.Remove(key);
        }
    }
}
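
A usage sketch combining SmartCache with the conversation API used throughout this page (the cache key, TTL, and prompt are illustrative):

csharp
SmartCache cache = new SmartCache(TimeSpan.FromMinutes(30));

string answer = await cache.GetOrCompute("faq:shipping-policy", async () =>
{
    Conversation conversation = api.Chat.CreateConversation(new ChatRequest
    {
        Model = ChatModel.OpenAi.Gpt4.Turbo
    });

    conversation.AddUserMessage("Explain the shipping policy in two sentences.");
    ChatRichResponse response = await conversation.GetResponseRich();
    return response.Content;
});

// Drop all FAQ entries after the underlying policy document changes
cache.InvalidatePattern("faq:");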

Best Practices

  1. Cache Hot Paths - Cache frequently accessed data
  2. Set Appropriate TTL - Balance freshness vs efficiency
  3. Monitor Cache Hit Rate - Optimize based on metrics (see the counter sketch after this list)
  4. Design Cache Keys Carefully - Create unique, consistent keys
  5. Handle Cache Misses - Always have fallback logic
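
Hit rate can be tracked with a pair of counters around each cache lookup; the helper below is a sketch that can wrap any of the caches shown on this page:

csharp
public class CacheMetrics
{
    private long _hits;
    private long _misses;

    public void RecordHit() => Interlocked.Increment(ref _hits);
    public void RecordMiss() => Interlocked.Increment(ref _misses);

    // Fraction of lookups served from cache; 0 until something has been recorded
    public double HitRate
    {
        get
        {
            long hits = Interlocked.Read(ref _hits);
            long misses = Interlocked.Read(ref _misses);
            long total = hits + misses;
            return total == 0 ? 0 : (double)hits / total;
        }
    }
}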

Caching Strategy Guidelines

csharp
// Good: Cache static content
string cachedResponse = await GetCachedResponse("What is 2+2?");

// Good: Cache with expiration
await CacheWithTTL(prompt, response, TimeSpan.FromHours(1));

// Bad: Cache user-specific data globally
// DON'T: Cache personal information without user isolation

// Good: Cache per-user
string key = $"user:{userId}:query:{hash}";

Common Issues

Stale Cache Data

  • Solution: Implement TTL and invalidation
  • Prevention: Set appropriate expiration times

Cache Memory Bloat

  • Solution: Implement LRU eviction (see the sketch after this list)
  • Prevention: Monitor cache size and set limits
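
A compact LRU eviction sketch built from a dictionary and a linked list (the capacity is illustrative; this variant is not thread-safe):

csharp
public class LruResponseCache
{
    private readonly int _capacity;
    private readonly Dictionary<string, LinkedListNode<KeyValuePair<string, string>>> _map = new();
    private readonly LinkedList<KeyValuePair<string, string>> _order = new();

    public LruResponseCache(int capacity) => _capacity = capacity;

    public bool TryGet(string key, out string? response)
    {
        if (_map.TryGetValue(key, out LinkedListNode<KeyValuePair<string, string>>? node))
        {
            // Promote to most-recently-used
            _order.Remove(node);
            _order.AddFirst(node);
            response = node.Value.Value;
            return true;
        }

        response = null;
        return false;
    }

    public void Set(string key, string response)
    {
        if (_map.TryGetValue(key, out LinkedListNode<KeyValuePair<string, string>>? existing))
        {
            _order.Remove(existing);
            _map.Remove(key);
        }
        else if (_map.Count >= _capacity)
        {
            // Evict the least-recently-used entry from the tail
            LinkedListNode<KeyValuePair<string, string>>? oldest = _order.Last;
            if (oldest != null)
            {
                _order.RemoveLast();
                _map.Remove(oldest.Value.Key);
            }
        }

        _map[key] = _order.AddFirst(new KeyValuePair<string, string>(key, response));
    }
}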

Cache Key Collisions

  • Solution: Use a collision-resistant hash (e.g., SHA-256) for key generation
  • Prevention: Include relevant context in keys

Cold Cache Performance

  • Solution: Pre-warm the cache with common queries (see the sketch below)
  • Mitigation: Accept slower responses on the first, uncached requests
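
Pre-warming can be as simple as pushing the most common prompts through the cache at startup; here is a sketch reusing the CachedLlmService from earlier (the prompt list is illustrative):

csharp
string[] commonPrompts =
[
    "What payment methods do you support?",
    "How do I reset my password?",
    "What is your refund policy?"
];

// Populate the cache before real traffic arrives
foreach (string prompt in commonPrompts)
{
    await service.GetResponse(prompt);
}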

API Reference

CacheControl

  • CacheControlTypes Type - Cache type (Ephemeral)
  • Used with Anthropic prompt caching

CacheControlTypes

  • Ephemeral - Short-term caching (roughly 5 minutes, refreshed each time the cached content is reused)

Cache Usage in Response

Check if cache was used:

csharp
ChatRichResponse response = await conversation.GetResponseRich();
if (response.Usage?.CacheReadInputTokens > 0)
{
    Console.WriteLine($"Cache hit! {response.Usage.CacheReadInputTokens} tokens from cache");
}