GUIDES

Embedding TeideDB in Rust Applications

TeideDB is a library, not just a CLI. Embed a columnar analytics engine in your Rust application with zero external dependencies and zero setup.

You are building an IoT monitoring service in Rust. Sensors push temperature, humidity, and pressure readings every few seconds. Your API needs to answer queries like "average temperature over the last hour, grouped by sensor" -- fast, without a network round-trip to a separate database process. You need the database inside your binary.

This guide covers the full integration path. By the end, you will know how to:

- Add the crate and compile the vendored C engine with cargo build
- Build and execute lazy computation graphs with the Graph API
- Run SQL through the Session API and read typed results
- Store embeddings and run vector search with HNSW indexes
- Expose data to external tools over the PostgreSQL wire protocol
- Work within the engine's threading constraints

Estimated time: about 12 minutes.

The Problem: Analytics Inside Your Process

SQLite solves the embedded case for row-oriented workloads, but columnar aggregations over millions of rows are not its strength. DuckDB is columnar but exposes a C++ API -- the FFI boundary is wide and the build system is heavy. TeideDB takes a different approach: the core engine is C17 (~15,000 lines), vendored directly into the Rust crate, and compiled from source by build.rs. No system library, no dynamic linking, no separate process. cargo build gives you a statically linked columnar engine with a morsel-driven executor, a cost-based optimizer, and SQL/PGQ graph queries -- all callable from safe Rust.

Adding the Dependency

TeideDB is not yet on crates.io. Add it as a path or git dependency:

[dependencies]
# From a local checkout:
teide = { path = "../teide-rs" }

# Or from git:
teide = { git = "https://github.com/TeideDB/teide-rs.git" }

The crate exposes three feature flags:

Feature    | What it adds                  | Default
-----------|-------------------------------|--------
(default)  | Core engine + SQL session API | Yes
cli        | Interactive REPL binary       | No
server     | PgWire server binary          | No

For embedding in your own application, the default features are all you need. Leave cli and server off to avoid pulling in their dependencies (rustyline, pgwire, tokio).

Build requirement: Your system needs a C17-capable compiler (gcc >= 7, clang >= 5, or MSVC 2019+). The build.rs script compiles the vendored C source tree at vendor/teide/ using the cc crate. On most systems this just works. If it does not, check that cc can find your C compiler.

The Graph API: Low-Level Power

At the lowest level, TeideDB exposes a lazy DAG of operations. You build a computation graph, then call execute(). Nothing touches data until that final call -- the optimizer (constant folding, predicate pushdown, CSE, fusion, DCE) runs first, then the morsel-driven executor processes the optimized plan.

use teide::{Context, Table};

fn main() -> Result<(), teide::Error> {
    let ctx = Context::new()?;

    let table = Table::from_vecs(
        &ctx,
        &["sensor_id", "temperature"],
        &[vec![1i64, 2, 3, 1, 2, 3]],                  // i64 columns
        &[vec![22.1, 23.5, 19.8, 22.4, 24.1, 20.0]],   // f64 columns
    )?;

    let mut g = ctx.graph(&table)?;
    let sensor = g.scan("sensor_id")?;
    let temp   = g.scan("temperature")?;

    // Celsius to Fahrenheit: temp * 1.8 + 32
    let fahrenheit = g.add(g.mul(temp, g.const_f64(1.8)?)?, g.const_f64(32.0)?)?;

    // Filter: only sensor_id == 1
    let mask = g.eq(sensor, g.const_i64(1)?)?;
    let filtered = g.filter(fahrenheit, mask)?;

    let result = g.execute(filtered)?;  // optimizer + executor run here
    for row in 0..result.nrows() {
        println!("row {}: {:.1} F", row, result.read_f64(0, row));
    }
    Ok(())
}
row 0: 71.8 F
row 1: 72.3 F

Key points about the Graph API:

- Evaluation is lazy. scan, arithmetic, filter, and the rest only add nodes to the DAG; no data is touched until execute().
- The optimizer passes (constant folding, predicate pushdown, CSE, fusion, DCE) run over the whole graph at execute() time, so intermediate expressions like the Fahrenheit conversion can be simplified and fused.
- A graph is bound to the table it was created from via ctx.graph(&table).
- Every builder method returns a Result, so construction errors surface before anything executes.

The Session API: SQL Strings

For most applications, you do not need the Graph API at all. The Session type wraps a Context, maintains a table registry, and accepts SQL strings. It parses, plans, optimizes, and executes in a single call.

use teide::sql::{Session, ExecResult};

fn main() -> Result<(), teide::SqlError> {
    let mut session = Session::new()?;

    session.execute("CREATE TABLE sensors (id INTEGER, location VARCHAR, installed DATE)")?;
    session.execute(
        "INSERT INTO sensors VALUES
            (1, 'roof', '2024-01-15'), (2, 'basement', '2024-03-22'),
            (3, 'roof', '2024-06-01'), (4, 'garage',   '2024-07-10')")?;

    match session.execute(
        "SELECT location, COUNT(*) AS cnt FROM sensors GROUP BY location ORDER BY cnt DESC")?
    {
        ExecResult::Query(result) => {
            for row in 0..result.nrows {
                let location = result.table.read_str(0, row);
                let count    = result.table.read_i64(1, row);
                println!("{} => {}", location, count);
            }
        }
        ExecResult::Ddl(msg) => println!("{}", msg),
    }
    Ok(())
}
roof => 2
basement => 1
garage => 1

ExecResult::Query(SqlResult) gives you the result Table, column names, row count, and embedding metadata. ExecResult::Ddl(String) returns a status message for DDL/DML. Tables persist in the session's registry until dropped or the session itself is dropped.

Reading Results

The Table type provides typed accessors for each column type and format helpers for temporal values. Use col_type() to dispatch on the type code:

fn print_result(result: &teide::sql::SqlResult) {
    for row in 0..result.nrows {
        for col in 0..result.columns.len() {
            let cell = match result.table.col_type(col) {
                6  => format!("{}", result.table.read_i64(col, row)),    // TD_I64
                7  => format!("{:.2}", result.table.read_f64(col, row)), // TD_F64
                9  => {                                                   // TD_DATE
                    let days = result.table.read_i64(col, row);
                    teide::Table::format_date(days as i32)
                }
                10 => {                                                   // TD_TIME
                    let ms = result.table.read_i64(col, row);
                    teide::Table::format_time(ms)
                }
                11 => {                                                   // TD_TIMESTAMP
                    let us = result.table.read_i64(col, row);
                    teide::Table::format_timestamp(us)
                }
                20 => result.table.read_str(col, row).to_string(),       // TD_SYM
                _  => "?".to_string(),
            };
            print!("{:>15}", cell);
        }
        println!();
    }
}

Key type codes: TD_I64 = 6, TD_F64 = 7, TD_F32 = 8, TD_DATE = 9, TD_TIME = 10, TD_TIMESTAMP = 11, TD_SYM = 20 (string/symbol). Use table.nrows() and table.ncols() to get dimensions.
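Raw integer codes are easy to misread in match arms. A small helper (hypothetical, not part of the crate) keeps dispatch and debug output readable, using the codes listed above:

```rust
// Hypothetical helper, not part of the teide crate: maps the type codes
// listed above to readable names for debug output.
fn td_type_name(code: i32) -> &'static str {
    match code {
        6 => "TD_I64",
        7 => "TD_F64",
        8 => "TD_F32",
        9 => "TD_DATE",
        10 => "TD_TIME",
        11 => "TD_TIMESTAMP",
        20 => "TD_SYM",
        _ => "unknown",
    }
}

fn main() {
    println!("{}", td_type_name(7));  // TD_F64
    println!("{}", td_type_name(20)); // TD_SYM
}
```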

Working with Embeddings

TeideDB stores vector embeddings as flat TD_F32 columns -- N rows of D-dimensional vectors packed contiguously. This gives you columnar compression benefits and zero-copy access from the C engine. On top of that, you can build HNSW indexes for approximate nearest-neighbor search.
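To make the layout concrete, here is a dependency-free sketch (plain Rust, not TeideDB code) of how a flat F32 buffer indexes into per-row vectors, and how cosine similarity is computed over two slices:

```rust
// Standalone sketch of the flat layout described above (not TeideDB code):
// row i's D-dimensional vector occupies flat[i*dim .. (i+1)*dim].
fn vector_at(flat: &[f32], dim: usize, row: usize) -> &[f32] {
    &flat[row * dim..(row + 1) * dim]
}

fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

fn main() {
    // Two rows of 4-dimensional vectors, packed contiguously.
    let flat = [0.9f32, 0.1, 0.0, 0.2, 0.1, 0.8, 0.3, 0.1];
    let query = [0.85f32, 0.15, 0.05, 0.25];
    let sim = cosine_similarity(vector_at(&flat, 4, 0), &query);
    println!("{:.4}", sim);
}
```

Because the buffer is one contiguous allocation, `vector_at` is a bounds check plus pointer arithmetic, which is what makes zero-copy access from the C engine possible.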

use teide::sql::{Session, ExecResult};

fn main() -> Result<(), teide::SqlError> {
    let mut session = Session::new()?;

    session.execute("CREATE TABLE docs (id INTEGER, title VARCHAR)")?;
    session.execute(
        "INSERT INTO docs VALUES (1, 'Rust ownership'), (2, 'Graph databases'),
            (3, 'Vector search'), (4, 'Columnar storage')")?;

    // Add a 4-dimensional embedding column (flat f32 array: 4 rows * 4 dims)
    let embeddings: Vec<f32> = vec![
        0.9, 0.1, 0.0, 0.2,   // doc 1
        0.1, 0.8, 0.3, 0.1,   // doc 2
        0.2, 0.3, 0.9, 0.1,   // doc 3
        0.7, 0.1, 0.1, 0.8,   // doc 4
    ];
    session.add_embedding_column("docs", "embedding", 4, &embeddings)?;

    // Query with cosine similarity
    match session.execute(
        "SELECT title, COSINE_SIMILARITY(embedding, ARRAY[0.85, 0.15, 0.05, 0.25]) AS sim
         FROM docs")?
    {
        ExecResult::Query(r) => {
            for row in 0..r.nrows {
                println!("{:<20} {:.4}", r.table.read_str(0, row), r.table.read_f64(1, row));
            }
        }
        _ => {}
    }
    Ok(())
}
Rust ownership       0.9912
Graph databases      0.4521
Vector search        0.3187
Columnar storage     0.8436

For large collections, linear scan is too slow. Build an HNSW index for approximate nearest-neighbor search in logarithmic time:

use teide::HnswIndex;

// Build an HNSW index on the embedding column
// Parameters: table, column_index, dimension, M (neighbors), ef_construction
let stored = session.get_table("docs").unwrap();
let index = HnswIndex::build(&stored.table, 2, 4, 16, 200)?;

// Search: find 2 nearest neighbors
let query = vec![0.85f32, 0.15, 0.05, 0.25];
let results = index.search(&query, 2, 50)?;  // k=2, ef_search=50

for (row_id, distance) in &results {
    println!("row {} distance {:.4}", row_id, distance);
}

// Persist the index to disk
index.save("docs_embedding.hnsw")?;

// Later, reload it
let loaded = HnswIndex::load("docs_embedding.hnsw")?;
row 0 distance 0.0088
row 3 distance 0.1564

The HNSW index is also available via SQL DDL:

CREATE VECTOR INDEX docs_emb_idx ON docs(embedding) USING HNSW(M=16, ef_construction=200);

-- Later:
DROP VECTOR INDEX docs_emb_idx;

DML restrictions on embedding tables: Once a table has embedding columns, neither UPDATE nor DELETE with a WHERE clause is supported. DELETE without WHERE truncates the table but preserves embedding metadata. These restrictions exist because the C engine's filter and sort kernels operate element-wise on flat F32 arrays and are not dimension-aware. DML operations (INSERT, DELETE) on a table with vector indexes automatically drop those indexes, since the underlying column data is reallocated.

The PgWire Server

Sometimes you need external tools -- psql, DBeaver, Python -- to query data your Rust service manages. TeideDB includes a PostgreSQL wire protocol server.

# Start the server on port 5433
cargo run --features server -- --port 5433

# Connect from another terminal
psql -h 127.0.0.1 -p 5433

Session is !Send (see Critical Constraints below), and that constraint dictates the thread model: each connection gets its own OS thread with a dedicated Session, bridged to the async pgwire handler via channels:

// Simplified view of the server thread model:
//
//   tokio runtime (async)          OS thread (sync)
//  +---------------------+       +-------------------+
//  | pgwire connection   | ----> | Session           |
//  | handler             | <---- | (owns Context,    |
//  |                     | chan  |  table registry)  |
//  +---------------------+       +-------------------+
//
//  Each connection = one OS thread = one Session.
//  The Context never crosses thread boundaries.

Each connection has its own isolated table namespace. Tables created in one connection are not visible to others.

Critical Constraints

Context is !Send + !Sync

Context and Session cannot cross thread boundaries. The type system enforces this via PhantomData<*mut ()> -- trying to move a Context into tokio::spawn is a compile error.
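The marker technique itself is plain Rust and easy to replicate. Here is a minimal sketch with a stand-in type (not the real Context): a PhantomData of a raw pointer strips the auto-implemented Send and Sync traits while adding zero bytes to the struct.

```rust
use std::marker::PhantomData;

// Stand-in for the real Context: PhantomData<*mut ()> removes the
// auto-implemented Send and Sync traits without storing any data.
struct NotThreadSafe {
    _marker: PhantomData<*mut ()>,
}

impl NotThreadSafe {
    fn new() -> Self {
        NotThreadSafe { _marker: PhantomData }
    }
}

fn main() {
    let handle = NotThreadSafe::new();
    // std::thread::spawn(move || drop(handle)); // compile error: not Send
    drop(handle);
    // The marker is zero-sized, so the constraint costs nothing at runtime.
    println!("size: {} bytes", std::mem::size_of::<NotThreadSafe>());
}
```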

The correct pattern for async applications: spawn a dedicated OS thread that owns the Session, and communicate via channels.

use std::sync::mpsc;
use std::thread;
use teide::sql::{Session, ExecResult};

fn spawn_db_thread() -> mpsc::Sender<(String, mpsc::Sender<String>)> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let mut session = Session::new().expect("engine init");
        for (sql, reply) in rx {
            let msg = match session.execute(&sql) {
                Ok(ExecResult::Query(r)) => format!("{} rows", r.nrows),
                Ok(ExecResult::Ddl(msg)) => msg,
                Err(e) => format!("error: {e}"),
            };
            let _ = reply.send(msg);
        }
    });
    tx
}
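From the caller's side, each request carries its own reply channel, and recv() blocks until the worker answers. The round trip looks like this, sketched with a stand-in worker (a dummy "engine" that uppercases its input) so it runs without the crate:

```rust
use std::sync::mpsc;
use std::thread;

// Stand-in worker with the same shape as spawn_db_thread above, but a
// dummy "engine" (uppercasing) so the round trip runs without teide.
fn spawn_worker() -> mpsc::Sender<(String, mpsc::Sender<String>)> {
    let (tx, rx) = mpsc::channel::<(String, mpsc::Sender<String>)>();
    thread::spawn(move || {
        for (sql, reply) in rx {
            let _ = reply.send(sql.to_uppercase());
        }
    });
    tx
}

fn main() {
    let db = spawn_worker();
    // One reply channel per request; recv() blocks until the worker answers.
    let (reply_tx, reply_rx) = mpsc::channel();
    db.send(("select 1".to_string(), reply_tx)).unwrap();
    println!("{}", reply_rx.recv().unwrap()); // SELECT 1
}
```

Inside a tokio handler, run the blocking recv() under spawn_blocking, or swap the reply side for tokio::sync::oneshot so the handler can await it.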

ENGINE_LOCK in Tests

Rust's test harness runs tests in parallel. Since the C engine's global state cannot be initialized/destroyed concurrently, you must serialize access with a mutex:

use std::sync::Mutex;
use teide::Context;

static ENGINE_LOCK: Mutex<()> = Mutex::new(());

#[test]
fn test_sensor_query() {
    let _guard = ENGINE_LOCK.lock().unwrap();
    let ctx = Context::new().unwrap();
    // ... your test code ...
    // Context drops here, but the engine singleton persists
    // until all Arc references are gone.
}

#[test]
fn test_another_query() {
    let _guard = ENGINE_LOCK.lock().unwrap();
    let ctx = Context::new().unwrap();
    // Safe: ENGINE_LOCK ensures this doesn't race with the test above.
}

Under the hood, the engine is managed via OnceLock<Mutex<Weak<EngineGuard>>>. Multiple Context handles share one Arc<EngineGuard>; the engine tears down only when the last Arc drops. These constraints are not bugs -- they are the price of zero-copy access to the C engine's thread-local memory arenas. In production, you typically have one Session per thread and no contention.
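The Weak-upgrade pattern behind this is worth seeing in isolation. A minimal sketch with a stand-in guard type (not the crate's internals): handles share one Arc, and a fresh instance is created only after the previous Weak has expired.

```rust
use std::sync::{Arc, Mutex, OnceLock, Weak};

// Stand-in guard: in the real crate this would initialize the C engine
// on construction and tear it down in Drop.
struct EngineGuard;

// Same shape as described above: all handles share one Arc<EngineGuard>;
// a new engine is created only when the stored Weak has expired.
fn engine_handle() -> Arc<EngineGuard> {
    static SLOT: OnceLock<Mutex<Weak<EngineGuard>>> = OnceLock::new();
    let slot = SLOT.get_or_init(|| Mutex::new(Weak::new()));
    let mut weak = slot.lock().unwrap();
    match weak.upgrade() {
        Some(existing) => existing, // engine already alive: share it
        None => {
            let fresh = Arc::new(EngineGuard);
            *weak = Arc::downgrade(&fresh);
            fresh
        }
    }
}

fn main() {
    let a = engine_handle();
    let b = engine_handle();
    // Both handles point at the same engine instance.
    assert!(Arc::ptr_eq(&a, &b));
}
```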

Putting It All Together

Here is a complete example: an IoT monitoring module that wraps a Session in a domain-specific struct, ingests sensor readings, and exposes typed query methods.

use teide::sql::{Session, ExecResult};

struct SensorMonitor { session: Session }

impl SensorMonitor {
    fn new() -> Result<Self, teide::SqlError> {
        let mut session = Session::new()?;
        session.execute(
            "CREATE TABLE readings (
                sensor_id INTEGER, temperature DOUBLE,
                humidity DOUBLE, ts TIMESTAMP)"
        )?;
        Ok(SensorMonitor { session })
    }

    fn ingest(&mut self, id: i64, temp: f64, hum: f64, ts: &str)
        -> Result<(), teide::SqlError>
    {
        self.session.execute(&format!(
            "INSERT INTO readings VALUES ({id}, {temp}, {hum}, '{ts}')"))?;
        Ok(())
    }

    fn avg_by_sensor(&mut self) -> Result<Vec<(i64, f64, f64)>, teide::SqlError> {
        match self.session.execute(
            "SELECT sensor_id, AVG(temperature), AVG(humidity)
             FROM readings GROUP BY sensor_id ORDER BY sensor_id")?
        {
            ExecResult::Query(r) => Ok((0..r.nrows).map(|i| (
                r.table.read_i64(0, i), r.table.read_f64(1, i), r.table.read_f64(2, i)
            )).collect()),
            _ => Ok(vec![]),
        }
    }
}

fn main() -> Result<(), teide::SqlError> {
    let mut m = SensorMonitor::new()?;
    m.ingest(1, 22.5, 45.0, "2024-08-01 10:00:00")?;
    m.ingest(1, 23.1, 44.2, "2024-08-01 10:05:00")?;
    m.ingest(2, 19.8, 62.1, "2024-08-01 10:00:00")?;
    m.ingest(2, 20.1, 61.5, "2024-08-01 10:05:00")?;

    for (sensor, temp, hum) in m.avg_by_sensor()? {
        println!("Sensor {sensor}: avg temp {temp:.1}, avg humidity {hum:.1}");
    }
    Ok(())
}
Sensor 1: avg temp 22.8, avg humidity 44.6
Sensor 2: avg temp 20.0, avg humidity 61.8

Challenges

Challenge 1: Multi-threaded ingestion pipeline. Build a system where multiple producer threads send sensor readings through an mpsc channel to a single consumer thread that owns a Session. The consumer should batch inserts (collect N readings, then execute a single multi-row INSERT). Add a query thread that periodically requests aggregated results via a separate channel. Measure throughput: how many readings per second can you sustain with batch sizes of 1, 10, 100, and 1000?
Challenge 2: Embedding search microservice. Build an HTTP endpoint (using axum or actix-web) that accepts a JSON body with a query vector and returns the top-K nearest neighbors from a TeideDB table with an HNSW index. The handler must not own the Session directly (it runs on tokio). Use the channel pattern from the "Critical Constraints" section to bridge async and sync worlds. Add an endpoint that inserts new documents with embeddings, and handle the index rebuild that insertion triggers.

What's Next