Caching
Overview
Caching in LlmTornado enables significant cost savings and performance improvements by reusing previously computed responses and embeddings. Provider-specific caching mechanisms like Anthropic's prompt caching allow you to cache large context blocks, reducing both latency and token costs for repeated requests.
Quick Start
```csharp
using LlmTornado;
using LlmTornado.Chat;
using LlmTornado.Chat.Models;
using LlmTornado.Caching;
TornadoApi api = new TornadoApi("your-api-key");
// Enable caching for large context
Conversation conversation = api.Chat.CreateConversation(new ChatRequest
{
Model = ChatModel.Anthropic.Claude35.Sonnet,
SystemMessage = "Large system context that will be cached...",
CacheControl = new CacheControl
{
Type = CacheControlTypes.Ephemeral
}
});
conversation.AddUserMessage("First question");
ChatRichResponse response1 = await conversation.GetResponseRich();
// Subsequent requests use cached context
conversation.AddUserMessage("Second question");
ChatRichResponse response2 = await conversation.GetResponseRich();
// This request will be faster and cheaper!
```
Prerequisites
- The LlmTornado package installed
- A valid API key with caching support
- Understanding of Chat Basics
- Knowledge of provider-specific caching features
Detailed Explanation
What is Caching?
Caching stores frequently used data to avoid recomputation:
- Prompt Caching - Cache large system prompts or context
- Response Caching - Reuse identical API responses
- Embedding Caching - Store computed embeddings
- Result Caching - Cache computation results
Benefits
- Cost Reduction - Pay less for cached tokens
- Faster Responses - Reduce latency dramatically
- Consistency - Same inputs produce same outputs
- Resource Efficiency - Lower API load
Provider Support
Anthropic (Claude)
- Prompt Caching - Cache system prompts and long context
- Ephemeral Type - 5-minute cache duration
- Cost Savings - Cache reads are billed at roughly 10% of the normal input-token price (cache writes cost slightly more than uncached input); see the cost sketch below
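To make the savings concrete, the sketch below compares re-sending a large context on every request with reading it from the cache. The per-token prices are illustrative placeholders, not current Anthropic list prices; substitute the rates for your model.
```csharp
// Illustrative cost comparison for a repeatedly reused 50k-token context.
// The per-million-token prices below are placeholders; check your provider's pricing page.
const double inputPricePerMTok = 3.00;   // normal input tokens
const double cacheWritePerMTok = 3.75;   // first request writes the cache (~1.25x)
const double cacheReadPerMTok  = 0.30;   // later requests read the cache (~0.1x)

const int contextTokens = 50_000;
const int requests = 20;

// Without caching: the full context is billed as normal input on every request.
double uncached = requests * contextTokens / 1_000_000.0 * inputPricePerMTok;

// With caching: one cache write, then cache reads for the remaining requests.
double cached = contextTokens / 1_000_000.0 * cacheWritePerMTok
              + (requests - 1) * contextTokens / 1_000_000.0 * cacheReadPerMTok;

Console.WriteLine($"Uncached: ${uncached:F2}, cached: ${cached:F2}");
// With these placeholder prices the cached run costs roughly 16% of the uncached one.
```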
OpenAI
- Prompt Caching - Applied automatically when requests repeat a sufficiently long prompt prefix; no code changes required
- Short Duration - Cache lifetime is managed by OpenAI and typically lasts only minutes
Basic Usage
Anthropic Prompt Caching
Cache large system contexts:
```csharp
// Large document or context
string largeContext = await File.ReadAllTextAsync("documentation.txt");
Conversation conversation = api.Chat.CreateConversation(new ChatRequest
{
Model = ChatModel.Anthropic.Claude35.Sonnet,
Messages = [
new ChatMessage(ChatMessageRoles.System, largeContext)
{
CacheControl = new CacheControl
{
Type = CacheControlTypes.Ephemeral
}
}
]
});
// First call - context is cached
conversation.AddUserMessage("What is the main topic?");
ChatRichResponse response1 = await conversation.GetResponseRich();
// Subsequent calls use cached context (faster + cheaper)
conversation.AddUserMessage("Tell me more about section 3");
ChatRichResponse response2 = await conversation.GetResponseRich();
```
Multi-Block Caching
Cache multiple context blocks:
```csharp
string documentation = await File.ReadAllTextAsync("docs.txt");
string codebase = await File.ReadAllTextAsync("code.txt");
Conversation conversation = api.Chat.CreateConversation(new ChatRequest
{
Model = ChatModel.Anthropic.Claude35.Sonnet,
Messages = [
new ChatMessage(ChatMessageRoles.System, documentation)
{
CacheControl = new CacheControl { Type = CacheControlTypes.Ephemeral }
},
new ChatMessage(ChatMessageRoles.System, codebase)
{
CacheControl = new CacheControl { Type = CacheControlTypes.Ephemeral }
}
]
});
```
Application-Level Caching
Implement your own caching layer:
```csharp
public class CachedLlmService
{
private readonly TornadoApi _api;
private readonly Dictionary<string, string> _cache;
public CachedLlmService(TornadoApi api)
{
_api = api;
_cache = new Dictionary<string, string>();
}
public async Task<string> GetResponse(string prompt)
{
// Generate cache key
string cacheKey = ComputeHash(prompt);
// Check cache
if (_cache.TryGetValue(cacheKey, out string? cachedResponse))
{
Console.WriteLine("Cache hit!");
return cachedResponse;
}
// Call API
Conversation conversation = _api.Chat.CreateConversation(new ChatRequest
{
Model = ChatModel.OpenAi.Gpt4.Turbo
});
conversation.AddUserMessage(prompt);
ChatRichResponse response = await conversation.GetResponseRich();
// Store in cache
_cache[cacheKey] = response.Content;
return response.Content;
}
private string ComputeHash(string input)
{
using System.Security.Cryptography.SHA256 sha256 = System.Security.Cryptography.SHA256.Create();
byte[] bytes = sha256.ComputeHash(System.Text.Encoding.UTF8.GetBytes(input));
return Convert.ToBase64String(bytes);
}
}
```
Advanced Usage
Distributed Caching
Use Redis or similar for shared caching:
```csharp
public class RedisCachedLlmService
{
private readonly TornadoApi _api;
private readonly IDistributedCache _cache;
public RedisCachedLlmService(TornadoApi api, IDistributedCache cache)
{
_api = api;
_cache = cache;
}
public async Task<string> GetResponse(string prompt, TimeSpan? expiration = null)
{
string cacheKey = $"llm:{ComputeHash(prompt)}";
// Try get from cache
string? cached = await _cache.GetStringAsync(cacheKey);
if (cached != null)
{
return cached;
}
// Generate response
Conversation conversation = _api.Chat.CreateConversation(new ChatRequest
{
Model = ChatModel.OpenAi.Gpt4.Turbo
});
conversation.AddUserMessage(prompt);
ChatRichResponse response = await conversation.GetResponseRich();
// Store in cache
DistributedCacheEntryOptions options = new DistributedCacheEntryOptions
{
AbsoluteExpirationRelativeToNow = expiration ?? TimeSpan.FromHours(1)
};
await _cache.SetStringAsync(cacheKey, response.Content, options);
return response.Content;
}
private string ComputeHash(string input)
{
using System.Security.Cryptography.SHA256 sha256 = System.Security.Cryptography.SHA256.Create();
byte[] bytes = sha256.ComputeHash(System.Text.Encoding.UTF8.GetBytes(input));
return Convert.ToBase64String(bytes);
}
}
```
Embedding Caching
Cache expensive embedding operations:
```csharp
public class EmbeddingCache
{
private readonly TornadoApi _api;
private readonly Dictionary<string, float[]> _cache;
public EmbeddingCache(TornadoApi api)
{
_api = api;
_cache = new Dictionary<string, float[]>();
}
public async Task<float[]> GetEmbedding(string text)
{
if (_cache.TryGetValue(text, out float[]? cached))
{
return cached;
}
EmbeddingResult? result = await _api.Embeddings.CreateEmbedding(
EmbeddingModel.OpenAi.Gen2.Ada,
text);
float[] embedding = result.Data[0].Embedding;
_cache[text] = embedding;
return embedding;
}
}
```
Cache Invalidation
Implement smart cache invalidation:
```csharp
public class SmartCache
{
private readonly Dictionary<string, (string Response, DateTime Timestamp)> _cache;
private readonly TimeSpan _ttl;
public SmartCache(TimeSpan ttl)
{
_cache = new Dictionary<string, (string, DateTime)>();
_ttl = ttl;
}
public async Task<string> GetOrCompute(string key, Func<Task<string>> compute)
{
// Check if cached and not expired
if (_cache.TryGetValue(key, out (string Response, DateTime Timestamp) entry))
{
if (DateTime.UtcNow - entry.Timestamp < _ttl)
{
return entry.Response;
}
else
{
// Expired - remove
_cache.Remove(key);
}
}
// Compute and cache
string result = await compute();
_cache[key] = (result, DateTime.UtcNow);
return result;
}
public void InvalidatePattern(string pattern)
{
List<string> toRemove = _cache.Keys
.Where(k => k.Contains(pattern))
.ToList();
foreach (string key in toRemove)
{
_cache.Remove(key);
}
}
}
```
Best Practices
- Cache Hot Paths - Cache frequently accessed data
- Set Appropriate TTL - Balance freshness vs efficiency
- Monitor Cache Hit Rate - Track hits and misses and optimize based on the numbers (see the sketch after this list)
- Use Cache Keys - Create unique, consistent keys
- Handle Cache Misses - Always have fallback logic
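The list above recommends monitoring the cache hit rate; a minimal way to do that is to count hits and misses inside whatever caching wrapper you use. The `CacheMetrics` class below is a hypothetical helper for this sketch, not part of LlmTornado.
```csharp
using System.Threading;

// Minimal hit/miss counter that can be wrapped around any cache lookup.
public class CacheMetrics
{
    private long _hits;
    private long _misses;

    public void RecordHit() => Interlocked.Increment(ref _hits);
    public void RecordMiss() => Interlocked.Increment(ref _misses);

    // Hit rate as a fraction between 0 and 1; 0 when nothing has been recorded yet.
    public double HitRate
    {
        get
        {
            long hits = Interlocked.Read(ref _hits);
            long total = hits + Interlocked.Read(ref _misses);
            return total == 0 ? 0 : (double)hits / total;
        }
    }
}

// Usage inside a lookup:
// if (_cache.TryGetValue(key, out string? value)) { _metrics.RecordHit(); ... }
// else { _metrics.RecordMiss(); ... }
```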
Caching Strategy Guidelines
```csharp
// Good: Cache static content
string cachedResponse = await GetCachedResponse("What is 2+2?");
// Good: Cache with expiration
await CacheWithTTL(prompt, response, TimeSpan.FromHours(1));
// Bad: Cache user-specific data globally
// DON'T: Cache personal information without user isolation
// Good: Cache per-user
string key = $"user:{userId}:query:{hash}";
```
Common Issues
Stale Cache Data
- Solution: Implement TTL and invalidation
- Prevention: Set appropriate expiration times
Cache Memory Bloat
- Solution: Implement LRU eviction (see the sketch after this list)
- Prevention: Monitor cache size and set limits
Cache Key Collisions
- Solution: Use better hashing algorithms
- Prevention: Include relevant context in keys
Cold Cache Performance
- Solution: Pre-warm cache for common queries
- Prevention: Accept initial slower performance
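For the memory-bloat issue above, a bounded cache with least-recently-used (LRU) eviction keeps the entry count under a fixed limit. The sketch below is a simple, non-thread-safe illustration of the idea, not an API provided by LlmTornado.
```csharp
using System.Collections.Generic;

// Simple LRU cache: a dictionary for O(1) lookups plus a linked list
// ordering keys from most- to least-recently used. Not thread-safe.
public class LruCache<TKey, TValue> where TKey : notnull
{
    private readonly int _capacity;
    private readonly Dictionary<TKey, LinkedListNode<(TKey Key, TValue Value)>> _map = new();
    private readonly LinkedList<(TKey Key, TValue Value)> _order = new();

    public LruCache(int capacity) => _capacity = capacity;

    public bool TryGet(TKey key, out TValue? value)
    {
        if (_map.TryGetValue(key, out LinkedListNode<(TKey Key, TValue Value)>? node))
        {
            // Move the entry to the front to mark it as recently used.
            _order.Remove(node);
            _order.AddFirst(node);
            value = node.Value.Value;
            return true;
        }

        value = default;
        return false;
    }

    public void Set(TKey key, TValue value)
    {
        if (_map.TryGetValue(key, out LinkedListNode<(TKey Key, TValue Value)>? existing))
        {
            _order.Remove(existing);
            _map.Remove(key);
        }
        else if (_map.Count >= _capacity)
        {
            // Evict the least-recently-used entry from the back of the list.
            LinkedListNode<(TKey Key, TValue Value)>? last = _order.Last;
            if (last != null)
            {
                _order.RemoveLast();
                _map.Remove(last.Value.Key);
            }
        }

        LinkedListNode<(TKey Key, TValue Value)> node = _order.AddFirst((key, value));
        _map[key] = node;
    }
}
```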
API Reference
CacheControl
- CacheControlTypes Type - Cache type (currently Ephemeral); used with Anthropic prompt caching
CacheControlTypes
- Ephemeral - Short-term caching (5 minutes)
Cache Usage in Response
Check if cache was used:
```csharp
ChatRichResponse response = await conversation.GetResponseRich();
if (response.Usage?.CacheReadInputTokens > 0)
{
Console.WriteLine($"Cache hit! {response.Usage.CacheReadInputTokens} tokens from cache");
}
```
Related Topics
- Chat Basics - Core chat functionality
- Embeddings - Caching embeddings
- Agents - Caching in agents
- Performance Optimization - .NET caching patterns