
Vector Databases - Getting Started

Overview

LlmTornado integrates with vector databases for semantic search, embedding storage, and retrieval-augmented generation (RAG). A vector database stores high-dimensional vectors and lets you query them efficiently, most commonly to find the stored vectors nearest to a query vector.

Quick Start

csharp
// Install the packages:
// dotnet add package LlmTornado.VectorDatabases
// dotnet add package LlmTornado.VectorDatabases.ChromaDB

using LlmTornado;
using LlmTornado.VectorDatabases;

// Initialize the API client
TornadoApi api = new TornadoApi("your-api-key");

// Create embeddings
List<float[]> embeddings = await api.Embeddings.CreateEmbedding(
    new List<string> { "Hello world", "AI is amazing" },
    ChatModel.OpenAi.Embedding.Ada002
);

// Store in vector database (see ChromaDB documentation)

What are Vector Databases?

Vector databases store data as high-dimensional vectors (embeddings) that capture semantic meaning. This enables:

  • Semantic Search: Find similar content by meaning, not just keywords
  • RAG: Retrieve relevant context for AI responses
  • Similarity Matching: Find similar items across large datasets
  • Clustering: Group related items automatically
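"Similar by meaning" usually means that two embeddings point in nearly the same direction. A common way to measure this is cosine similarity, which is the cosine of the angle between two vectors. The sketch below is plain C# and makes no LlmTornado calls. The 3-dimensional vectors are made-up stand-ins for real embeddings; text-embedding-ada-002, for example, returns 1536 dimensions. `CosineSimilarity` is a hypothetical helper, not a library API.

```csharp
using System;

// Made-up toy vectors standing in for real embeddings.
float[] cat = { 0.9f, 0.1f, 0.0f };
float[] kitten = { 0.85f, 0.15f, 0.05f };
float[] car = { 0.0f, 0.2f, 0.95f };

Console.WriteLine($"cat vs kitten: {CosineSimilarity(cat, kitten):F3}");
Console.WriteLine($"cat vs car:    {CosineSimilarity(cat, car):F3}");

// Cosine similarity = dot(a, b) / (|a| * |b|), in [-1, 1]; higher means more similar.
static double CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same dimension.");

    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}
```

Related items score close to 1 and unrelated ones close to 0. A vector database answers the same kind of question for one query against a large number of stored vectors.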

Creating Embeddings

Text Embeddings

csharp
TornadoApi api = new TornadoApi("your-api-key");

// Single text
float[] embedding = await api.Embeddings.CreateEmbedding(
    "The quick brown fox",
    ChatModel.OpenAi.Embedding.Ada002
);

// Multiple texts
List<float[]> embeddings = await api.Embeddings.CreateEmbedding(
    new List<string> 
    { 
        "First document",
        "Second document",
        "Third document"
    },
    ChatModel.OpenAi.Embedding.Ada002
);

Multimodal Embeddings

csharp
// Image and text embeddings (if supported by provider)
// Useful for image search and multimodal RAG

Use Cases

Semantic Search

Store document embeddings and find similar content:

csharp
// 1. Embed documents
// 2. Store in vector database
// 3. Embed query
// 4. Find similar vectors
// 5. Return matching documents
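The five steps can be sketched without a database. The idea is to hold the embeddings in memory and rank them by cosine similarity, which is a brute-force nearest-neighbor search. This illustration rests on several assumptions:
- The documents and vectors are invented.
- `TopK` and `Cosine` are hypothetical helpers.
- In a real pipeline, the vectors would come from `api.Embeddings.CreateEmbedding`, using the same model for the documents and the query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Steps 1-2 (hypothetical): document embeddings "stored" in memory.
var store = new List<(string Text, float[] Vector)>
{
    ("How to train a puppy", new[] { 0.9f, 0.1f, 0.0f }),
    ("Dog grooming tips",    new[] { 0.8f, 0.3f, 0.1f }),
    ("Changing a car tire",  new[] { 0.0f, 0.1f, 0.9f }),
};

// Step 3 (hypothetical): the query embedding, produced by the same model as the documents.
float[] query = { 0.85f, 0.2f, 0.05f };

// Steps 4-5: rank every stored vector by similarity and return the best k matches.
foreach (var (text, score) in TopK(store, query, k: 2))
    Console.WriteLine($"{score:F3}  {text}");

static List<(string Text, double Score)> TopK(
    List<(string Text, float[] Vector)> docs, float[] q, int k) =>
    docs.Select(d => (d.Text, Score: Cosine(d.Vector, q)))
        .OrderByDescending(r => r.Score)
        .Take(k)
        .ToList();

// Cosine similarity between two equal-length vectors.
static double Cosine(float[] a, float[] b)
{
    double dot = 0, na = 0, nb = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
    }
    return dot / (Math.Sqrt(na) * Math.Sqrt(nb));
}
```

A vector database such as ChromaDB handles steps 2 and 4 for you. It typically replaces this linear scan with an approximate nearest-neighbor index, so queries stay fast as the collection grows.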

RAG (Retrieval-Augmented Generation)

Enhance AI responses with relevant context:

csharp
// 1. Embed user query
// 2. Retrieve relevant documents
// 3. Pass documents as context to AI
// 4. Generate informed response
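Step 3 usually amounts to placing the retrieved text into the prompt. The sketch below assumes retrieval has already happened. Its two "documents" are lines from the provider list on this page, and it shows one possible prompt layout. `BuildRagPrompt` is a hypothetical helper. The chat call for step 4 is left out rather than guessed.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical step-2 output: documents already retrieved for the user's question.
string userQuery = "Which embedding providers does LlmTornado support?";
var retrieved = new List<string>
{
    "OpenAI: text-embedding-ada-002, text-embedding-3-small, text-embedding-3-large",
    "Cohere: Embed v3",
};

// Step 3: pass the retrieved documents to the model as numbered context.
string prompt = BuildRagPrompt(userQuery, retrieved);
Console.WriteLine(prompt);
// Step 4 (not shown): send `prompt` to a chat model and return its reply.

static string BuildRagPrompt(string question, IReadOnlyList<string> contextDocs)
{
    var sb = new StringBuilder();
    sb.AppendLine("Answer the question using only the context below.");
    sb.AppendLine("If the context is insufficient, say so.");
    sb.AppendLine();
    sb.AppendLine("Context:");
    for (int i = 0; i < contextDocs.Count; i++)
        sb.AppendLine($"[{i + 1}] {contextDocs[i]}");
    sb.AppendLine();
    sb.AppendLine($"Question: {question}");
    return sb.ToString();
}
```

Numbering the context entries makes it easy to ask the model to cite which document supports its answer.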

Document Clustering

Group similar documents automatically:

csharp
// 1. Create embeddings for all documents
// 2. Use clustering algorithm
// 3. Group by similarity
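As one concrete choice for step 2, here is a minimal k-means (Lloyd's algorithm) in plain C#. It makes three assumptions:
- The 2-D points are invented stand-ins for embeddings.
- For determinism, the first k points seed the centroids; production code would usually use random or k-means++ seeding.
- Distance is squared Euclidean. For unit-length embeddings, Euclidean distance ranks neighbors the same way cosine similarity does.

```csharp
using System;
using System.Linq;

// Hypothetical 2-D "embeddings" forming two obvious groups.
float[][] points =
{
    new[] { 0.9f, 0.1f },   new[] { 0.1f, 0.9f },
    new[] { 0.8f, 0.2f },   new[] { 0.2f, 0.8f },
    new[] { 0.85f, 0.15f }, new[] { 0.15f, 0.85f },
};

int[] labels = KMeans(points, k: 2, maxIterations: 20);
Console.WriteLine(string.Join(", ", labels));

// Plain k-means: alternate assigning points to the nearest centroid and moving centroids.
static int[] KMeans(float[][] data, int k, int maxIterations)
{
    int dim = data[0].Length;
    // Simplification: seed centroids with the first k points.
    double[][] centroids = data.Take(k).Select(p => p.Select(x => (double)x).ToArray()).ToArray();
    int[] assign = new int[data.Length];

    for (int iter = 0; iter < maxIterations; iter++)
    {
        // Assignment step: each point joins its nearest centroid (squared Euclidean distance).
        bool changed = false;
        for (int i = 0; i < data.Length; i++)
        {
            int best = 0;
            double bestDist = double.MaxValue;
            for (int c = 0; c < k; c++)
            {
                double d = 0;
                for (int j = 0; j < dim; j++)
                {
                    double diff = data[i][j] - centroids[c][j];
                    d += diff * diff;
                }
                if (d < bestDist) { bestDist = d; best = c; }
            }
            if (assign[i] != best) { assign[i] = best; changed = true; }
        }
        if (!changed && iter > 0) break; // converged

        // Update step: move each centroid to the mean of its assigned points.
        for (int c = 0; c < k; c++)
        {
            var members = Enumerable.Range(0, data.Length).Where(i => assign[i] == c).ToList();
            if (members.Count == 0) continue; // leave an empty cluster's centroid in place
            for (int j = 0; j < dim; j++)
                centroids[c][j] = members.Average(i => data[i][j]);
        }
    }
    return assign;
}
```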

Supported Providers

LlmTornado supports various embedding providers:

  • OpenAI: text-embedding-ada-002, text-embedding-3-small, text-embedding-3-large
  • Google: Gecko embeddings
  • Cohere: Embed v3
  • Voyage AI: Voyage embeddings

Best Practices

  • Choose appropriate embedding models for your use case
  • Batch embedding requests for efficiency
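  • Cache embeddings so unchanged text is not sent to the API again
  • Weigh embedding dimensions against performance: larger vectors cost more to store and compare
  • Normalize vectors to unit length when your similarity metric or database expects it (for unit vectors, cosine similarity equals the dot product)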
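Batching, caching, and normalization can be combined in a single helper. The sketch makes these assumptions:
- `FakeEmbedBatch` is a hypothetical stand-in for your provider's batch embedding call and returns dummy vectors.
- The cache is an in-memory `Dictionary`.
- `Enumerable.Chunk` requires .NET 6 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int apiCalls = 0;
var cache = new Dictionary<string, float[]>();

// Three inputs with one duplicate: only two distinct texts need embedding.
string[] docs = { "First document", "Second document", "First document" };
float[][] first = EmbedAll(docs, batchSize: 2);
float[][] again = EmbedAll(docs, batchSize: 2); // served entirely from the cache
Console.WriteLine($"API calls: {apiCalls}, vectors: {again.Length}");

// Hypothetical stand-in for a provider's batch embedding call; returns dummy 3-D vectors.
float[][] FakeEmbedBatch(string[] texts)
{
    apiCalls++;
    return texts.Select(t => new float[] { t.Length, 1f, 2f }).ToArray();
}

// Embeds only uncached, distinct texts in batches of batchSize and caches unit vectors.
float[][] EmbedAll(IReadOnlyList<string> texts, int batchSize)
{
    string[] missing = texts.Where(t => !cache.ContainsKey(t)).Distinct().ToArray();
    foreach (string[] batch in missing.Chunk(batchSize))
    {
        float[][] vectors = FakeEmbedBatch(batch);
        for (int i = 0; i < batch.Length; i++)
            cache[batch[i]] = Normalize(vectors[i]);
    }
    return texts.Select(t => cache[t]).ToArray();
}

// Scales a vector to unit length (L2 norm of 1); zero vectors are returned unchanged.
static float[] Normalize(float[] v)
{
    double norm = Math.Sqrt(v.Sum(x => (double)x * x));
    if (norm == 0) return v;
    return v.Select(x => (float)(x / norm)).ToArray();
}
```

Repeated and previously seen texts never reach the API. Every cached vector has unit length, so a plain dot product gives cosine similarity.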