Vector Search
Store embeddings as F32 columns, compute cosine similarity and Euclidean distance, and find nearest neighbors with brute-force KNN or an HNSW index.
Embedding Columns
TeideDB stores dense float32 vectors using the TD_F32 type. Embeddings use a flat N×D layout: N rows of D-dimensional vectors stored contiguously in memory. This layout is cache-friendly and SIMD-ready, enabling vectorized distance computations across the full column.
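The flat layout can be illustrated with a minimal Rust sketch (plain slices, not the TeideDB API): row i of a D-dimensional column occupies the half-open index range [i*D, (i+1)*D).

```rust
// Illustration of the flat N×D layout (plain Rust slices, not the TeideDB API):
// row i of a D-dimensional column occupies indices [i * d, (i + 1) * d).
fn row(column: &[f32], d: usize, i: usize) -> &[f32] {
    &column[i * d..(i + 1) * d]
}

fn main() {
    // Two rows of a 3-dimensional column, stored back-to-back.
    let column: Vec<f32> = vec![
        0.1, 0.2, 0.3, // row 0
        0.4, 0.5, 0.6, // row 1
    ];
    assert_eq!(row(&column, 3, 1), &[0.4, 0.5, 0.6]);
}
```

Because every row is a fixed-size contiguous slice, a kernel can stream the whole column linearly, which is what makes the layout cache-friendly and SIMD-ready.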
Create embedding columns via the Rust API:
// Create a 3-dimensional embedding column
let embeddings: Vec<f32> = vec![
    0.1, 0.2, 0.3, // row 0
    0.4, 0.5, 0.6, // row 1
    0.7, 0.8, 0.9, // row 2
];
table.create_embedding_column(&ctx, "embedding", &embeddings, 3)?;
Cosine Similarity
COSINE_SIMILARITY(embedding_col, ARRAY[1.0, 0.0, ...])
Computes dot(a, b) / (||a|| * ||b||) per row. Returns an F64 similarity score in [-1.0, 1.0], where 1.0 means identical direction, 0.0 means orthogonal vectors, and -1.0 means opposite direction.
SELECT name, COSINE_SIMILARITY(embedding, ARRAY[0.1, 0.2, 0.3]) AS sim
FROM documents
ORDER BY sim DESC
LIMIT 10;
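The per-row computation can be sketched in plain Rust (an illustration of the formula above, not TeideDB's vectorized kernel):

```rust
// Per-row cosine similarity: dot(a, b) / (||a|| * ||b||), accumulated in f64.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| *x as f64 * *y as f64).sum();
    let norm_a: f64 = a.iter().map(|x| (*x as f64).powi(2)).sum::<f64>().sqrt();
    let norm_b: f64 = b.iter().map(|y| (*y as f64).powi(2)).sum::<f64>().sqrt();
    dot / (norm_a * norm_b)
}

fn main() {
    // Same direction -> 1.0; orthogonal -> 0.0.
    assert!((cosine_similarity(&[1.0, 2.0], &[2.0, 4.0]) - 1.0).abs() < 1e-9);
    assert!(cosine_similarity(&[1.0, 0.0], &[0.0, 1.0]).abs() < 1e-12);
}
```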
Euclidean Distance
EUCLIDEAN_DISTANCE(embedding_col, ARRAY[1.0, 0.0, ...])
Computes sqrt(sum((a_i - b_i)^2)) per row. Returns an F64 distance where 0.0 means identical vectors.
SELECT name, EUCLIDEAN_DISTANCE(embedding, ARRAY[0.1, 0.2, 0.3]) AS dist
FROM documents
ORDER BY dist ASC
LIMIT 10;
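The same formula in plain Rust, for illustration (not TeideDB's vectorized kernel):

```rust
// Per-row Euclidean distance: sqrt(sum((a_i - b_i)^2)), accumulated in f64.
fn euclidean_distance(a: &[f32], b: &[f32]) -> f64 {
    a.iter()
        .zip(b)
        .map(|(x, y)| {
            let diff = *x as f64 - *y as f64;
            diff * diff
        })
        .sum::<f64>()
        .sqrt()
}

fn main() {
    // 3-4-5 triangle: distance between (0, 0) and (3, 4) is exactly 5.
    assert_eq!(euclidean_distance(&[0.0, 0.0], &[3.0, 4.0]), 5.0);
}
```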
KNN Search
TeideDB automatically detects K-nearest-neighbor query patterns and optimizes them. When a query matches the pattern SELECT ... FROM table ORDER BY similarity_function DESC LIMIT k, the planner bypasses the general execution path and uses a fast KNN kernel (brute-force O(N*D), or HNSW O(D * log N) when an index exists).
-- This query pattern is auto-optimized as KNN search:
SELECT name, COSINE_SIMILARITY(embedding, ARRAY[0.1, 0.2, 0.3]) AS sim
FROM documents
ORDER BY sim DESC
LIMIT 10;
The optimizer detects this pattern when all of the following are true:
- ORDER BY uses a COSINE_SIMILARITY or EUCLIDEAN_DISTANCE expression
- A LIMIT clause is present
- No WHERE, GROUP BY, DISTINCT, or JOIN clauses are present
If an HNSW index exists on the embedding column, it is used automatically. Otherwise, brute-force scan is performed.
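Conceptually, the brute-force path scores every row against the query and keeps the k best. A minimal sketch in plain Rust (the function name and the full sort are illustrative; a real kernel would use a bounded heap and SIMD distance code):

```rust
// Hypothetical sketch of a brute-force KNN scan over a flat N×D column:
// score every row against the query, then keep the k highest-scoring rows.
fn knn_brute_force(column: &[f32], d: usize, query: &[f32], k: usize) -> Vec<(usize, f64)> {
    let mut scored: Vec<(usize, f64)> = column
        .chunks_exact(d) // one chunk per row
        .enumerate()
        .map(|(i, row)| {
            // Dot product stands in for the similarity function here.
            let score: f64 = row.iter().zip(query).map(|(x, y)| *x as f64 * *y as f64).sum();
            (i, score)
        })
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap()); // descending by score
    scored.truncate(k); // keep the k best, like ORDER BY ... DESC LIMIT k
    scored
}

fn main() {
    // Four 2-D rows; the query points along the x axis.
    let column = vec![1.0, 0.0, 0.0, 1.0, 0.5, 0.5, -1.0, 0.0];
    let top = knn_brute_force(&column, 2, &[1.0, 0.0], 2);
    assert_eq!(top[0].0, 0); // row 0 scores highest (1.0)
    assert_eq!(top[1].0, 2); // row 2 next (0.5)
}
```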
HNSW Index
For large-scale nearest neighbor search, TeideDB supports HNSW (Hierarchical Navigable Small World) approximate nearest neighbor indexes. HNSW builds a multi-layer proximity graph that enables O(D * log N) query time.
Parameters
| Parameter | Default | Description |
|---|---|---|
| M | 16 | Max neighbors per node. Higher = better recall, more memory. |
| ef_construction | 200 | Build-time search width. Higher = better index quality, slower build. |
| ef_search | 64 | Query-time search width. Higher = better recall, slower query. |
Creating an Index
CREATE VECTOR INDEX idx_docs_embedding
ON documents (embedding)
USING HNSW(M = 16, ef_construction = 200);
Persistence
HNSW indexes support save, load, and memory-mapped (mmap) access for efficient startup and shared-memory deployments.
Performance
| Vectors | Brute-force | HNSW |
|---|---|---|
| 100K | ~50ms | ~0.5ms |
| 1M | ~500ms | ~1ms |
| 10M | ~5s | ~2ms |
Embedding Column Restrictions
Tables with high-dimensional embedding columns (dim > 1) have DML restrictions because the C engine's filter and sort kernels operate element-wise on flat F32 arrays:
- SELECT with WHERE / ORDER BY / LIMIT / GROUP BY / DISTINCT is not supported on embedding columns.
- UPDATE and DELETE with WHERE are not supported.
- INSERT ... VALUES is not supported (use INSERT ... SELECT).
- Embedding columns with dim = 1 are exempt from all restrictions (identical layout to plain F32).
Use the KNN query pattern (ORDER BY similarity LIMIT k) to query embedding tables; the optimizer handles this case specially.
DML and Vector Indexes
DML operations (INSERT, DELETE) on a table with vector indexes automatically drop those indexes, since the underlying column data is reallocated. Recreate the index after bulk modifications:
-- After INSERT, the index is gone; recreate it
INSERT INTO documents SELECT * FROM new_docs;
CREATE VECTOR INDEX idx_emb ON documents(embedding) USING HNSW(M = 16, ef_construction = 200);