Moderation

Overview

Content moderation helps ensure that both user input and generated content comply with usage policies. LlmTornado provides moderation capabilities to check text for potentially harmful, inappropriate, or policy-violating content across a range of categories.

Quick Start

csharp
using LlmTornado;
using LlmTornado.Moderation;

TornadoApi api = new TornadoApi("your-api-key");

// Check content for policy violations
ModerationResult? result = await api.Moderation.CreateModeration("Text to check");

if (result?.Results?[0].Flagged == true)
{
    Console.WriteLine("Content flagged for moderation");
}

Prerequisites

  • The LlmTornado package installed
  • A valid API key
  • Understanding of content policy categories

Basic Usage

Simple Moderation Check

csharp
ModerationResult? result = await api.Moderation.CreateModeration(
    "Hello, how are you today?");

ModerationEntry entry = result.Results[0];
Console.WriteLine($"Flagged: {entry.Flagged}");
Console.WriteLine($"Sexual: {entry.Categories.Sexual}");
Console.WriteLine($"Violence: {entry.Categories.Violence}");
Console.WriteLine($"Hate: {entry.Categories.Hate}");

Batch Moderation

csharp
string[] texts = [
    "First text to check",
    "Second text to check",
    "Third text to check"
];

ModerationResult? result = await api.Moderation.CreateModeration(texts);

for (int i = 0; i < result.Results.Count; i++)
{
    Console.WriteLine($"Text {i + 1}: Flagged = {result.Results[i].Flagged}");
}

Detailed Category Scores

csharp
ModerationResult? result = await api.Moderation.CreateModeration("Text to analyze");
ModerationEntry entry = result.Results[0];

Console.WriteLine("Category Scores:");
Console.WriteLine($"Sexual: {entry.CategoryScores.Sexual:F4}");
Console.WriteLine($"Hate: {entry.CategoryScores.Hate:F4}");
Console.WriteLine($"Violence: {entry.CategoryScores.Violence:F4}");
Console.WriteLine($"Self-harm: {entry.CategoryScores.SelfHarm:F4}");
Console.WriteLine($"Sexual/Minors: {entry.CategoryScores.SexualMinors:F4}");
Console.WriteLine($"Hate/Threatening: {entry.CategoryScores.HateThreatening:F4}");
Console.WriteLine($"Violence/Graphic: {entry.CategoryScores.ViolenceGraphic:F4}");

Advanced Usage

Pre-Moderation Filter

Implement a safety filter that checks user input before it is sent to the model:

csharp
async Task<bool> IsContentSafe(string userInput)
{
    ModerationResult? result = await api.Moderation.CreateModeration(userInput);
    return result?.Results?[0].Flagged != true;
}

// Usage
string userInput = Console.ReadLine() ?? string.Empty;
if (await IsContentSafe(userInput))
{
    // Safe to process - e.g. send to a previously created conversation
    ChatRichResponse response = await conversation.GetResponseRich();
}
else
{
    Console.WriteLine("Your input contains inappropriate content.");
}

Best Practices

  1. Moderate User Input - Check content before processing
  2. Moderate AI Output - Verify generated content is appropriate (see the sketch after this list)
  3. Handle False Positives - Provide appeal mechanisms
  4. Respect Privacy - Only moderate what's necessary
  5. Log Violations - Track patterns for improvement
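
Moderating model output follows the same pattern as moderating input: run the generated text through the moderation endpoint before displaying or storing it. A minimal sketch, with a placeholder string standing in for text produced by the model:

csharp
// Sketch: check generated content before returning it to the user.
// 'generated' is a placeholder for text produced by the model.
string generated = "text produced by the model";

ModerationResult? outputCheck = await api.Moderation.CreateModeration(generated);

if (outputCheck?.Results?[0].Flagged == true)
{
    // Withhold the content and log the event for later review
    Console.WriteLine("Generated content was flagged and withheld.");
}
else
{
    Console.WriteLine(generated);
}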

API Reference

Moderation Endpoint

  • CreateModeration(string input) - Check single text
  • CreateModeration(string[] input) - Check multiple texts

ModerationResult

  • List<ModerationEntry> Results - Moderation results for each input
  • string Id - Request identifier
  • string Model - Model used for moderation

ModerationEntry

  • bool Flagged - Whether content is flagged
  • Categories Categories - Boolean flags for each category
  • CategoryScores CategoryScores - Confidence scores (0-1)
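
Because CategoryScores are confidence values between 0 and 1, they can back stricter rules than the boolean Flagged decision. A minimal sketch, assuming an arbitrary custom cut-off of 0.5 for the violence score:

csharp
// Sketch: apply a custom threshold to a single category score (0.5 is an arbitrary example value)
ModerationResult? result = await api.Moderation.CreateModeration("Text to analyze");
ModerationEntry entry = result!.Results[0];

bool exceedsThreshold = entry.CategoryScores.Violence > 0.5;
Console.WriteLine(exceedsThreshold
    ? "Rejected by custom violence threshold"
    : "Accepted under custom violence threshold");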