Guardrails

Overview

Guardrails provide safety mechanisms and constraints for agent behavior, ensuring agents operate within acceptable boundaries and follow security best practices.

Quick Start

csharp
using LlmTornado.Agents;

public struct IsMath
{
    public string Reasoning { get; set; }
    public bool IsMathRequest { get; set; }
}
    
// Assumes `api` is an existing, configured LlmTornado client.
async ValueTask<GuardRailFunctionOutput> MathGuardRail(string? input = "")
{
    TornadoAgent mathGuardrail = new TornadoAgent(
        api,
        ChatModel.OpenAi.Gpt41.V41Mini,
        instructions: "Check if the user is asking you a Math related question.",
        outputSchema: typeof(IsMath));

    Conversation result = await TornadoRunner.RunAsync(mathGuardrail, input);

    IsMath? isMath = result.Messages.Last().Content.JsonDecode<IsMath>();

    // Here the guardrail trips (second argument is true) when the request is NOT math-related.
    return new GuardRailFunctionOutput(isMath?.Reasoning ?? "", !isMath?.IsMathRequest ?? false);
}

TornadoAgent agent = new TornadoAgent(
    client: api,
    model: ChatModel.OpenAi.Gpt41.V41,
    instructions: "You are a useful assistant"
);

try
{
    Conversation result = await agent.RunAsync("What is the weather?", inputGuardRailFunction: MathGuardRail);

    Console.WriteLine(result.Messages.Last().Content);
}
catch (GuardRailTriggerException guardRailEx)
{
    Console.WriteLine(guardRailEx.Message);
}

Other Considerations

Content Filtering

  • Reject harmful or inappropriate requests
  • Filter sensitive information from outputs
  • Validate input before processing
  • Sanitize responses

Output Validation

  • Check responses meet quality standards
  • Verify structured output schemas
  • Ensure factual accuracy when possible
  • Validate against business rules
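
As an illustration of verifying a structured output schema, the sketch below checks that a raw response is a JSON object with the two fields of the IsMath struct from the Quick Start. It uses only System.Text.Json; the helper name TryValidateIsMath and its error messages are assumptions made for this example, not part of LlmTornado.

```csharp
using System;
using System.Text.Json;

// Hypothetical validator: true when 'raw' is a JSON object with a string
// "Reasoning" and a boolean "IsMathRequest" (property names are case-sensitive).
static bool TryValidateIsMath(string raw, out string error)
{
    error = "";
    try
    {
        using JsonDocument doc = JsonDocument.Parse(raw);
        JsonElement root = doc.RootElement;

        if (root.ValueKind != JsonValueKind.Object)
        {
            error = "Expected a JSON object";
            return false;
        }

        if (!root.TryGetProperty("Reasoning", out JsonElement reasoning)
            || reasoning.ValueKind != JsonValueKind.String)
        {
            error = "Missing or non-string 'Reasoning'";
            return false;
        }

        if (!root.TryGetProperty("IsMathRequest", out JsonElement flag)
            || (flag.ValueKind != JsonValueKind.True && flag.ValueKind != JsonValueKind.False))
        {
            error = "Missing or non-boolean 'IsMathRequest'";
            return false;
        }

        return true;
    }
    catch (JsonException ex)
    {
        // Malformed JSON is reported as a validation failure instead of crashing the caller.
        error = $"Invalid JSON: {ex.Message}";
        return false;
    }
}

Console.WriteLine(TryValidateIsMath("{\"Reasoning\":\"asks about algebra\",\"IsMathRequest\":true}", out _)); // True
Console.WriteLine(TryValidateIsMath("{\"Reasoning\":\"no flag\"}", out string why));                          // False
```

Returning a reason alongside the boolean makes it possible to log why a response was rejected, which helps when monitoring guardrail behavior.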

Implementation

Tool Permissions

csharp
Dictionary<string, bool> permissions = new Dictionary<string, bool>
{
    ["send_email"] = true,      // Requires permission
    ["read_file"] = false,      // No permission needed
    ["delete_data"] = true      // Requires permission
};

TornadoAgent agent = new TornadoAgent(
    api, model,
    toolPermissionRequired: permissions
);
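
To make the meaning of the permission map concrete, here is a minimal sketch of how such a map could gate a tool call. It does not show how LlmTornado enforces toolPermissionRequired internally; MayRunTool and the approve callback are hypothetical names, and treating unlisted tools as requiring approval is an assumption of this example.

```csharp
using System;
using System.Collections.Generic;

Dictionary<string, bool> toolPermissions = new Dictionary<string, bool>
{
    ["send_email"] = true,
    ["read_file"] = false,
    ["delete_data"] = true
};

// Hypothetical gate: returns true when the tool may run.
// 'approve' stands in for whatever confirmation step the application uses.
static bool MayRunTool(
    IReadOnlyDictionary<string, bool> map,
    string toolName,
    Func<string, bool> approve)
{
    // Assumption: tools missing from the map require approval (fail closed).
    bool needsApproval = !map.TryGetValue(toolName, out bool required) || required;
    return !needsApproval || approve(toolName);
}

Console.WriteLine(MayRunTool(toolPermissions, "read_file", _ => false));  // True
Console.WriteLine(MayRunTool(toolPermissions, "send_email", _ => false)); // False
```

Failing closed for unknown tools means a newly added tool cannot run unattended until someone decides whether it needs permission.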

Input Validation

csharp
async Task<Conversation> SafeRunAsync(TornadoAgent agent, string userInput)
{
    // Reject input that carries sensitive data.
    // ContainsSensitiveInfo is a helper you supply yourself.
    if (ContainsSensitiveInfo(userInput))
    {
        throw new InvalidOperationException("Input contains sensitive information");
    }

    // Reject oversized input before it reaches the model.
    if (userInput.Length > 10000)
    {
        throw new InvalidOperationException("Input too long");
    }

    return await agent.RunAsync(userInput);
}
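
The snippet above calls ContainsSensitiveInfo without defining it. One possible implementation is sketched below; the two patterns (a US Social Security number and a 16-digit card number) are illustrative assumptions, not an exhaustive detector.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical implementation of the ContainsSensitiveInfo helper.
static bool ContainsSensitiveInfo(string text)
{
    // US Social Security number written as 123-45-6789
    if (Regex.IsMatch(text, @"\b\d{3}-\d{2}-\d{4}\b")) return true;

    // 16-digit card number, optionally grouped in fours by spaces or dashes
    if (Regex.IsMatch(text, @"\b(?:\d{4}[ -]?){3}\d{4}\b")) return true;

    return false;
}

Console.WriteLine(ContainsSensitiveInfo("My SSN is 123-45-6789")); // True
Console.WriteLine(ContainsSensitiveInfo("What is 2 + 2?"));        // False
```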

Output Filtering

csharp
async Task<string> FilteredResponse(TornadoAgent agent, string input)
{
    Conversation result = await agent.RunAsync(input);
    string response = result.Messages.Last().Content ?? string.Empty;

    // Filter sensitive patterns (requires using System.Text.RegularExpressions)
    response = Regex.Replace(response, @"\b\d{3}-\d{2}-\d{4}\b", "***-**-****"); // SSN
    response = Regex.Replace(response, @"\b\d{16}\b", "****-****-****-****"); // Credit card, 16 contiguous digits only

    return response;
}
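
A pattern that matches any 16 digits can also redact order numbers or other identifiers. As one way to reduce such false positives, the sketch below redacts a candidate only when it passes the Luhn checksum used by payment card numbers. The Luhn step and the helper names PassesLuhn and RedactCardNumbers are additions for illustration and are not part of the original example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Luhn checksum: double every second digit from the right, subtract 9 from
// any result above 9, and require the total to be divisible by 10.
static bool PassesLuhn(string digits)
{
    int sum = 0;
    bool doubleIt = false;
    for (int i = digits.Length - 1; i >= 0; i--)
    {
        int d = digits[i] - '0';
        if (doubleIt)
        {
            d *= 2;
            if (d > 9) d -= 9;
        }
        sum += d;
        doubleIt = !doubleIt;
    }
    return sum % 10 == 0;
}

// Redacts 16-digit sequences (optionally grouped by spaces or dashes)
// only when they pass the Luhn check; other matches are left untouched.
static string RedactCardNumbers(string text) =>
    Regex.Replace(text, @"\b(?:[0-9]{4}[ -]?){3}[0-9]{4}\b", match =>
    {
        string digits = new string(match.Value.Where(char.IsDigit).ToArray());
        return PassesLuhn(digits) ? "****-****-****-****" : match.Value;
    });

Console.WriteLine(RedactCardNumbers("Card: 4111 1111 1111 1111")); // Card: ****-****-****-****
```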

Best Practices

  • Define clear boundaries in instructions
  • Implement multiple layers of protection
  • Log and monitor agent behavior
  • Test guardrails thoroughly
  • Update guardrails as threats evolve
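
To illustrate layering and logging together, the following sketch runs a list of independent checks in order and logs the first one that rejects the input. The check list, the FirstViolation helper, and the messages are assumptions made for this example; in practice each layer could be any of the validation techniques shown earlier.

```csharp
using System;
using System.Collections.Generic;

// Each layer returns null when the input passes, or a reason when it fails.
List<Func<string, string?>> layers = new List<Func<string, string?>>
{
    text => string.IsNullOrWhiteSpace(text) ? "Input is empty" : null,
    text => text.Length > 10000 ? "Input too long" : null
};

// Runs the layers in order and returns the first failure reason, or null if all pass.
static string? FirstViolation(IEnumerable<Func<string, string?>> checks, string input)
{
    foreach (Func<string, string?> check in checks)
    {
        string? reason = check(input);
        if (reason is not null)
        {
            // Log every rejection so guardrail behavior can be monitored.
            Console.Error.WriteLine($"Guardrail rejected input: {reason}");
            return reason;
        }
    }
    return null;
}

Console.WriteLine(FirstViolation(layers, "Hello") ?? "ok"); // ok
Console.WriteLine(FirstViolation(layers, "   ") ?? "ok");   // Input is empty
```

Keeping each layer as a small function also makes the guardrails easy to test one at a time.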