SQL REFERENCE

Vector Search

Store embeddings as F32 columns, compute cosine similarity and Euclidean distance, and find nearest neighbors with brute-force KNN or an HNSW index.

Embedding Columns

TeideDB stores dense float32 vectors using the TD_F32 type. Embeddings use a flat N×D layout: N rows of D-dimensional vectors stored contiguously in memory. This layout is cache-friendly and SIMD-ready, enabling vectorized distance computations across the full column.

Create embedding columns via the Rust API:

// Create a 3-dimensional embedding column
let embeddings: Vec<f32> = vec![
    0.1, 0.2, 0.3,   // row 0
    0.4, 0.5, 0.6,   // row 1
    0.7, 0.8, 0.9,   // row 2
];
table.create_embedding_column(&ctx, "embedding", &embeddings, 3)?;
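Given the flat layout, row i of a D-dimensional column occupies the slice [i*D, (i+1)*D). A minimal plain-Rust sketch of that indexing (independent of the TeideDB API):

```rust
// Flat N×D layout: row i lives at [i*d .. (i+1)*d).
fn row(embeddings: &[f32], d: usize, i: usize) -> &[f32] {
    &embeddings[i * d..(i + 1) * d]
}

fn main() {
    let embeddings: Vec<f32> = vec![
        0.1, 0.2, 0.3, // row 0
        0.4, 0.5, 0.6, // row 1
        0.7, 0.8, 0.9, // row 2
    ];
    assert_eq!(row(&embeddings, 3, 1), &[0.4, 0.5, 0.6]);
}
```

Because consecutive rows are adjacent in memory, a distance kernel can stream the whole column with `chunks_exact(d)` and stay cache- and SIMD-friendly.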

Cosine Similarity

COSINE_SIMILARITY(embedding_col, ARRAY[1.0, 0.0, ...])

Computes dot(a, b) / (||a|| * ||b||) per row. Returns an F64 similarity score: 1.0 means identical direction, 0.0 means orthogonal vectors, and -1.0 means opposite directions.

SELECT name, COSINE_SIMILARITY(embedding, ARRAY[0.1, 0.2, 0.3]) AS sim
FROM documents
ORDER BY sim DESC
LIMIT 10;
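The per-row computation can be sketched in plain Rust (this is an illustration, not the TeideDB kernel; accumulating in f64 is an assumption made here to match the F64 result described above):

```rust
// Cosine similarity of two D-dimensional vectors: dot(a, b) / (||a|| * ||b||).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f64 {
    let (mut dot, mut na, mut nb) = (0.0f64, 0.0f64, 0.0f64);
    for (&x, &y) in a.iter().zip(b) {
        dot += x as f64 * y as f64;
        na += (x as f64) * (x as f64);
        nb += (y as f64) * (y as f64);
    }
    dot / (na.sqrt() * nb.sqrt())
}

fn main() {
    // A vector against itself: identical direction → 1.0 (up to rounding).
    let sim = cosine_similarity(&[0.1, 0.2, 0.3], &[0.1, 0.2, 0.3]);
    assert!((sim - 1.0).abs() < 1e-9);
    // Orthogonal vectors → 0.0.
    assert_eq!(cosine_similarity(&[1.0, 0.0], &[0.0, 1.0]), 0.0);
}
```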

Euclidean Distance

EUCLIDEAN_DISTANCE(embedding_col, ARRAY[1.0, 0.0, ...])

Computes sqrt(sum((a_i - b_i)^2)) per row. Returns an F64 distance where 0.0 means identical vectors.

SELECT name, EUCLIDEAN_DISTANCE(embedding, ARRAY[0.1, 0.2, 0.3]) AS dist
FROM documents
ORDER BY dist ASC
LIMIT 10;
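The same per-row shape applies to the distance metric; a plain-Rust sketch (again an illustration, not the engine's kernel):

```rust
// Euclidean distance of two D-dimensional vectors: sqrt(sum((a_i - b_i)^2)).
fn euclidean_distance(a: &[f32], b: &[f32]) -> f64 {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| {
            let d = x as f64 - y as f64;
            d * d
        })
        .sum::<f64>()
        .sqrt()
}

fn main() {
    // Identical vectors → 0.0.
    assert_eq!(euclidean_distance(&[0.1, 0.2, 0.3], &[0.1, 0.2, 0.3]), 0.0);
    // 3-4-5 triangle: distance from (0, 0) to (3, 4) is exactly 5.
    assert_eq!(euclidean_distance(&[0.0, 0.0], &[3.0, 4.0]), 5.0);
}
```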

KNN Search

TeideDB automatically detects K-nearest-neighbor query patterns and optimizes them. When a query matches the pattern SELECT ... FROM table ORDER BY similarity_function DESC LIMIT k, the planner bypasses the general execution path and uses a fast KNN kernel (brute-force O(N*D), or HNSW O(D * log N) when an index exists).

-- This query pattern is auto-optimized as KNN search:
SELECT name, COSINE_SIMILARITY(embedding, ARRAY[0.1, 0.2, 0.3]) AS sim
FROM documents
ORDER BY sim DESC
LIMIT 10;

The optimizer detects this pattern when the ORDER BY key is a vector similarity or distance expression over an embedding column, the sort direction matches the metric (DESC for similarity, ASC for distance), and a LIMIT clause bounds the result.

If an HNSW index exists on the embedding column, it is used automatically; otherwise a brute-force scan is performed.
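The brute-force path can be sketched as a single O(N*D) scan that keeps the k best rows in a bounded min-heap. This is a plain-Rust illustration, not the TeideDB kernel; ranking by raw dot product is an assumption here (cosine similarity reduces to it when vectors are pre-normalized):

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

// A (similarity, row) pair ordered so BinaryHeap pops the *lowest* similarity,
// turning Rust's max-heap into a bounded min-heap of the k best rows so far.
#[derive(PartialEq)]
struct Candidate(f64, usize);
impl Eq for Candidate {}
impl Ord for Candidate {
    fn cmp(&self, other: &Self) -> Ordering {
        other.0.partial_cmp(&self.0).unwrap() // reversed: min-heap on similarity
    }
}
impl PartialOrd for Candidate {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

fn dot(a: &[f32], b: &[f32]) -> f64 {
    a.iter().zip(b).map(|(&x, &y)| x as f64 * y as f64).sum()
}

/// Brute-force KNN over a flat N×D column: one pass, O(N*D) distance work,
/// O(N log k) heap maintenance. Returns row indices, best match first.
fn knn(column: &[f32], d: usize, query: &[f32], k: usize) -> Vec<usize> {
    let mut heap = BinaryHeap::with_capacity(k + 1);
    for (row, v) in column.chunks_exact(d).enumerate() {
        heap.push(Candidate(dot(v, query), row));
        if heap.len() > k {
            heap.pop(); // evict the current worst of the k best
        }
    }
    let mut best: Vec<Candidate> = heap.into_vec();
    best.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap()); // best first
    best.into_iter().map(|c| c.1).collect()
}

fn main() {
    let column = [0.1, 0.2, 0.3, 0.9, 0.1, 0.0, 0.4, 0.5, 0.6];
    // Query along the first axis: row 1 scores 0.9, row 2 scores 0.4, row 0 scores 0.1.
    let top = knn(&column, 3, &[1.0, 0.0, 0.0], 2);
    assert_eq!(top, vec![1, 2]);
}
```

An HNSW index replaces this linear scan with a graph walk, which is where the O(D * log N) query time comes from.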

HNSW Index

For large-scale nearest-neighbor search, TeideDB supports HNSW (Hierarchical Navigable Small World) approximate nearest neighbor indexes. HNSW builds a multi-layer proximity graph that enables O(D * log N) query time.

Parameters

Parameter        Default  Description
M                16       Max neighbors per node. Higher = better recall, more memory.
ef_construction  200      Build-time search width. Higher = better index quality, slower build.
ef_search        64       Query-time search width. Higher = better recall, slower query.

Creating an Index

CREATE VECTOR INDEX idx_docs_embedding
ON documents (embedding)
USING HNSW(M = 16, ef_construction = 200);

Persistence

HNSW indexes support save, load, and memory-mapped (mmap) access for efficient startup and shared-memory deployments.

Performance

Vectors  Brute-force  HNSW
100K     ~50ms        ~0.5ms
1M       ~500ms       ~1ms
10M      ~5s          ~2ms

Embedding Column Restrictions

Tables with high-dimensional embedding columns (dim > 1) have DML restrictions, because the C engine's filter and sort kernels operate element-wise on flat F32 arrays.

Use the KNN query pattern (ORDER BY similarity LIMIT k) to query embedding tables; the optimizer handles this case specially.

DML and Vector Indexes

DML operations (INSERT, DELETE) on a table with vector indexes automatically drop those indexes, since the underlying column data is reallocated. Recreate the index after bulk modifications:

-- After INSERT, the index is gone; recreate it
INSERT INTO documents SELECT * FROM new_docs;
CREATE VECTOR INDEX idx_emb ON documents(embedding) USING HNSW(M = 16, ef_construction = 200);