Architecture
How Teide is structured: the C17 core, SQL pipeline, execution model, and memory system.
Overview
Teide is a layered system:
- A zero-dependency C17 columnar engine at the core
- Rust FFI bindings providing safe RAII wrappers
- A SQL frontend (parser, planner, executor)
- REPL and PgWire server as frontend binaries
Project Structure
teide-rs/
  src/
    lib.rs          -- Library root
    ffi.rs          -- Raw C FFI bindings
    engine.rs       -- Safe RAII wrappers (Context, Table, Graph, Column)
    sql/
      mod.rs        -- Session, ExecResult
      planner.rs    -- Query planning (127KB)
      expr.rs       -- Expression planning (59KB)
    cli/            -- REPL binary
    server/         -- PgWire server binary
  build.rs          -- C core vendoring & compilation
SQL Pipeline
SQL text
    |
    v
+-------------+     +--------------+     +------------+     +------------+
|    Parse    | --> |     Plan     | --> |  Optimize  | --> |  Execute   |
| (sqlparser) |     | (planner.rs) |     | (C engine) |     | (C engine) |
+-------------+     +--------------+     +------------+     +------------+
- Parse: SQL text is parsed using the sqlparser crate with the DuckDB dialect
- Plan: The Rust planner walks the AST and builds a Teide operation graph (DAG)
- Optimize: The C engine fuses element-wise operations and optimizes the DAG
- Execute: The C engine processes data in parallel morsels
C17 Core
The core engine is written in C17 with zero external dependencies:
- Self-contained buddy allocator with slab caching
- Thread pool with morsel-based task distribution
- Columnar data structures with type-tagged 32-byte block headers
- Copy-on-write semantics for safe concurrent reads
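The copy-on-write behaviour listed above can be illustrated in Rust with the standard library's Arc, whose atomic reference count stands in for the engine's own ref counts. This is an analogy only, not the C core's implementation; SharedColumn and its methods are hypothetical names invented for this sketch and are not part of engine.rs.

```rust
use std::sync::Arc;

/// Hypothetical handle to a shared, reference-counted column buffer.
#[derive(Clone)]
pub struct SharedColumn {
    data: Arc<Vec<i64>>,
}

impl SharedColumn {
    pub fn new(values: Vec<i64>) -> Self {
        SharedColumn { data: Arc::new(values) }
    }

    /// Readers borrow the buffer directly; nothing is copied.
    pub fn values(&self) -> &[i64] {
        &self.data
    }

    /// Writers copy the buffer only when another handle still shares it.
    pub fn set(&mut self, index: usize, value: i64) {
        // Arc::make_mut clones the Vec if the buffer is shared,
        // otherwise it returns the existing buffer for in-place mutation.
        Arc::make_mut(&mut self.data)[index] = value;
    }

    /// True when both handles point at the same underlying buffer.
    pub fn shares_buffer_with(&self, other: &SharedColumn) -> bool {
        Arc::ptr_eq(&self.data, &other.data)
    }
}
```

Cloning a SharedColumn only bumps the reference count. The first write through one of the clones triggers the copy, so existing readers keep seeing the original data.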
Operation Graph
Computations are expressed as a directed acyclic graph (DAG) of operations:
- Source ops: Scan columns, constants
- Element-wise ops: Arithmetic, comparison, string ops (fuseable into single passes)
- Reductions: SUM, AVG, MIN, MAX, COUNT (pipeline breakers)
- Structural ops: Filter, sort, group, join, window (pipeline breakers)
The optimizer fuses adjacent element-wise operations to minimize memory traffic and maximize cache utilization.
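As a rough illustration of fusion, the sketch below groups consecutive element-wise operations into a single stage and leaves every other operation as its own stage. It simplifies the DAG to a linear chain, and the Op and Stage types and the fuse function are hypothetical names for this example; the real optimizer lives in the C engine and may work differently.

```rust
/// Hypothetical operation kinds, one or more per category named above.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Op {
    Scan,   // source op
    Add,    // element-wise
    Mul,    // element-wise
    Gt,     // element-wise
    Sum,    // reduction (pipeline breaker)
    Filter, // structural op (pipeline breaker)
}

impl Op {
    pub fn is_elementwise(self) -> bool {
        matches!(self, Op::Add | Op::Mul | Op::Gt)
    }
}

/// A stage is either a fused run of element-wise ops or a single other op.
#[derive(Debug, PartialEq)]
pub enum Stage {
    Fused(Vec<Op>),
    Single(Op),
}

/// Groups each maximal run of adjacent element-wise ops into one stage.
pub fn fuse(chain: &[Op]) -> Vec<Stage> {
    let mut stages = Vec::new();
    let mut run: Vec<Op> = Vec::new();
    for &op in chain {
        if op.is_elementwise() {
            run.push(op);
        } else {
            // A non-element-wise op ends the current run.
            if !run.is_empty() {
                stages.push(Stage::Fused(std::mem::take(&mut run)));
            }
            stages.push(Stage::Single(op));
        }
    }
    if !run.is_empty() {
        stages.push(Stage::Fused(run));
    }
    stages
}
```

For the chain Scan, Mul, Add, Sum this yields three stages: the scan, one fused stage holding Mul and Add, and the sum. The fused stage can then be evaluated in one pass over the data instead of materializing an intermediate column between Mul and Add.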
Morsel-Based Execution
Data is processed in morsels of 1024 elements:
- Below 65,536 rows: single-threaded execution
- Above 65,536 rows: automatic parallelization across worker threads
- 8 morsels dispatched concurrently per thread
- Work-stealing between threads for load balancing
Because each morsel is small enough to stay resident in L1/L2 cache (for example, 1024 eight-byte values occupy 8 KB), this approach keeps memory access local while scaling to many cores.
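The morsel arithmetic can be sketched in a few lines of Rust. The constants come from the figures above; the function names are hypothetical, and the sketch assumes that a table of exactly 65,536 rows stays single-threaded, which the description does not specify. Dispatch and work-stealing are not modeled.

```rust
use std::ops::Range;

pub const MORSEL_SIZE: usize = 1024;
pub const PARALLEL_THRESHOLD: usize = 65_536;

/// Splits `row_count` rows into half-open row ranges of at most MORSEL_SIZE.
pub fn morsels(row_count: usize) -> Vec<Range<usize>> {
    (0..row_count)
        .step_by(MORSEL_SIZE)
        .map(|start| start..(start + MORSEL_SIZE).min(row_count))
        .collect()
}

/// Assumption: only inputs strictly above the threshold run in parallel.
pub fn should_parallelize(row_count: usize) -> bool {
    row_count > PARALLEL_THRESHOLD
}
```

A 2,500-row input, for instance, produces three morsels, the last one covering only the remaining 452 rows.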
Memory Model
- Buddy allocator: Orders 5-30 (32 bytes to 1 GB blocks)
- Slab cache: 64 size classes for fast small allocations
- Thread-local heaps: Each worker thread has its own heap (reduces contention)
- Reference counting: Atomic ref counts enable safe sharing and copy-on-write
The .mem REPL command shows detailed allocation statistics including arena usage, direct allocations, slab hit rates, and peak memory.
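To make the order range concrete, the following Rust sketch maps a request size to a buddy order between 5 and 30 and computes a block's buddy address. It assumes the request size already includes any block header and that offsets are relative to the start of an arena; the function names are hypothetical and do not mirror the C core's API.

```rust
pub const MIN_ORDER: u32 = 5; // 2^5 = 32 bytes
pub const MAX_ORDER: u32 = 30; // 2^30 bytes = 1 GB

/// Smallest order whose block holds `bytes`, or None if it exceeds MAX_ORDER.
pub fn order_for_size(bytes: usize) -> Option<u32> {
    // Round up to a power of two; the exponent is the order.
    let rounded = bytes.checked_next_power_of_two()?;
    let order = rounded.trailing_zeros().max(MIN_ORDER);
    if order > MAX_ORDER {
        None
    } else {
        Some(order)
    }
}

/// Block size in bytes for a given order.
pub fn block_size(order: u32) -> usize {
    1usize << order
}

/// Offset of the buddy block: flip the bit that corresponds to the order.
pub fn buddy_offset(offset: usize, order: u32) -> usize {
    offset ^ (1usize << order)
}
```

Under these assumptions a 33-byte request lands in order 6 (a 64-byte block), and the order-5 block at offset 0 has its buddy at offset 32. When both buddies are free they can be merged into one order-6 block.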
Storage Formats
| Format | Access | Use Case |
|---|---|---|
| CSV | Full copy | Import/export, ad-hoc analysis |
| Splayed | Zero-copy mmap | Single tables, fast column access |
| Partitioned | Zero-copy mmap | Time-series data with date partitions |
Build Process
The build script (build.rs) handles C core integration:
- Checks for a local vendor/teide/ directory
- Falls back to git clone --depth=1 for crates.io installs
- Compiles all C source with the C17 standard and O3 optimization
- Links with libm and pthread on Linux/macOS
- Embeds git commit hash for version display
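The first two steps could look roughly like the sketch below, which decides between the vendored directory and a shallow clone. It is not the actual build.rs: CoreSource and resolve_core_source are hypothetical names, the clone destination under the output directory is an assumption, and the repository URL is left as a parameter because the text does not give it. Compilation (for example with the cc crate) and linking are omitted.

```rust
use std::path::{Path, PathBuf};

/// Where the C core sources come from (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub enum CoreSource {
    /// A vendor/teide/ directory already exists next to Cargo.toml.
    Vendored(PathBuf),
    /// No vendored copy: run `git` with these arguments to fetch into `dest`.
    Clone { args: Vec<String>, dest: PathBuf },
}

pub fn resolve_core_source(manifest_dir: &Path, out_dir: &Path, repo_url: &str) -> CoreSource {
    let vendored = manifest_dir.join("vendor").join("teide");
    if vendored.is_dir() {
        return CoreSource::Vendored(vendored);
    }
    // Assumption: the clone lands in a "teide" folder under the build output dir.
    let dest = out_dir.join("teide");
    CoreSource::Clone {
        args: vec![
            "clone".to_string(),
            "--depth=1".to_string(),
            repo_url.to_string(),
            dest.to_string_lossy().into_owned(),
        ],
        dest,
    }
}
```

In a real build script the Clone arguments would be passed to std::process::Command::new("git"), and the resulting directory would then feed the compile step.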