DEVELOPER

Graph API

Build CSR relationship indexes, traverse neighbors, find shortest paths, and run worst-case optimal joins via the Rust Graph API.

Teide's Graph API operates on the same lazy DAG as all other operations. You build a Rel (compressed sparse row index) from table columns, then call traversal operations on a Graph. Everything is optimized and executed together when you call execute().

Rel — Building the Index

A Rel is a CSR-encoded adjacency index between two sets of nodes. There are two ways to build one:

From a foreign key column

Use Rel::build() when one table has a column of I64 row offsets pointing into a target table:

use teide::{Context, Rel};

let ctx = Context::new()?;
let edges = ctx.read_csv("edges.csv")?;   // columns: src, dst, weight

// Build a CSR from the "dst" foreign key column.
// n_target_nodes = number of rows in the target table.
let rel = Rel::build(&edges, "dst", 1000, true)?;

Parameter	Type	Description
`from_table`	`&Table`	Table containing the foreign key column
`fk_col`	`&str`	Name of the I64 column with target row offsets
`n_target_nodes`	`i64`	Number of rows in the target table (upper bound on IDs)
`sort`	`bool`	Sort target lists for deterministic traversal order

From an edge table

Use Rel::from_edges() when you have explicit source/destination columns:

let edges = ctx.read_csv("friendships.csv")?;  // columns: person_a, person_b

let rel = Rel::from_edges(
    &edges,
    "person_a",   // source column
    "person_b",   // destination column
    1000,         // number of source nodes
    1000,         // number of destination nodes
    true,         // sort targets
)?;

Parameter	Type	Description
`edge_table`	`&Table`	Table with source and destination columns
`src_col`	`&str`	Name of the I64 source node column
`dst_col`	`&str`	Name of the I64 destination node column
`n_src`	`i64`	Number of source nodes
`n_dst`	`i64`	Number of destination nodes
`sort`	`bool`	Sort target lists for deterministic traversal order

Rel — Persistence

Relationship indexes can be saved to disk and loaded back, avoiding the cost of rebuilding from raw data:

// Save to disk
rel.save("/data/indexes/friends")?;

// Load from disk (full copy into memory)
let rel = Rel::load("/data/indexes/friends")?;

// Memory-map from disk (zero-copy, pages on demand)
let rel = Rel::mmap("/data/indexes/friends")?;

Method	Description
`save(dir)`	Serialize the CSR to the given directory
`load(dir)`	Deserialize from disk into memory
`mmap(dir)`	Memory-map from disk (zero-copy, OS pages data on demand)

Tip: Use mmap for large relationship indexes. The OS will page in only the portions accessed during traversal, keeping memory usage proportional to query scope rather than index size.

Direction Constants

Traversal operations accept a direction parameter controlling which edges are followed:

Constant	Value	Meaning
`TD_DIR_FWD`	`0`	Follow forward edges only (src → dst)
`TD_DIR_REV`	`1`	Follow reverse edges only (dst → src)
`TD_DIR_BOTH`	`2`	Follow edges in both directions

use teide::ffi::{TD_DIR_FWD, TD_DIR_REV, TD_DIR_BOTH};

Multi-Table Support

A Graph is created from one primary table, but you can register additional tables and scan their columns in the same computation:

let ctx = Context::new()?;
let people = ctx.read_csv("people.csv")?;
let posts = ctx.read_csv("posts.csv")?;

let mut g = ctx.graph(&people)?;

// Register the posts table; get back a table ID
let posts_id = g.add_table(&posts);

// Scan columns from the secondary table
let post_title = g.scan_table(posts_id, "title")?;
let post_author = g.scan_table(posts_id, "author_id")?;

Method	Returns	Description
`add_table(&table)`	`u16`	Register a table and return its ID for use with `scan_table`
`scan_table(id, col)`	`Column`	Scan a named column from a registered table

Graph Traversal

expand — 1-hop neighbor expansion

Expand each source node to its immediate neighbors via the CSR index:

let ctx = Context::new()?;
let people = ctx.read_csv("people.csv")?;
let edges = ctx.read_csv("follows.csv")?;

let rel = Rel::from_edges(&edges, "src", "dst", 1000, 1000, true)?;

let mut g = ctx.graph(&people)?;
let src = g.scan("id")?;
let neighbors = g.expand(src, &rel, TD_DIR_FWD)?;
let result = g.execute(neighbors)?;

Parameter	Type	Description
`src_nodes`	`Column`	I64 column of source node IDs
`rel`	`&Rel`	CSR relationship index
`direction`	`u8`	`TD_DIR_FWD`, `TD_DIR_REV`, or `TD_DIR_BOTH`

Output columns:

Column	Type	Description
`_src`	I64	Source node ID
`_dst`	I64	Destination node ID (neighbor)

var_expand — Variable-length BFS

Breadth-first traversal from start nodes, expanding between min_depth and max_depth hops:

let start = g.scan("id")?;
let reached = g.var_expand(
    start,
    &rel,
    TD_DIR_FWD,
    1,     // min_depth
    3,     // max_depth
    false, // track_path
)?;
let result = g.execute(reached)?;

Parameter	Type	Description
`start`	`Column`	I64 column of starting node IDs
`rel`	`&Rel`	CSR relationship index
`direction`	`u8`	Edge direction
`min_depth`	`u8`	Minimum number of hops (inclusive)
`max_depth`	`u8`	Maximum number of hops (inclusive)
`track_path`	`bool`	Whether to record full traversal paths

Output columns:

Column	Type	Description
`_start`	I64	Starting node ID
`_end`	I64	Reached node ID
`_depth`	I64	Number of hops from start to end

shortest_path — BFS shortest path

Find the shortest path between source and destination nodes via BFS:

let src = g.const_i64(0)?;   // start node
let dst = g.const_i64(42)?;  // target node
let path = g.shortest_path(src, dst, &rel, 10)?;
let result = g.execute(path)?;

Parameter	Type	Description
`src`	`Column`	Source node ID(s)
`dst`	`Column`	Destination node ID(s)
`rel`	`&Rel`	CSR relationship index
`max_depth`	`u8`	Maximum search depth

Output columns:

Column	Type	Description
`_node`	I64	Node ID along the path
`_depth`	I64	Hop distance from the source

wco_join — Worst-case optimal join

Leapfrog Triejoin across multiple relationships. Finds all n-cliques or multi-way join results in worst-case optimal time:

// Find triangles: nodes (a, b, c) where a->b, b->c, a->c
let rel_ab = Rel::from_edges(&edges, "src", "dst", n, n, true)?;
let rel_bc = Rel::from_edges(&edges, "src", "dst", n, n, true)?;
let rel_ac = Rel::from_edges(&edges, "src", "dst", n, n, true)?;

let triangles = g.wco_join(&[&rel_ab, &rel_bc, &rel_ac], 3)?;
let result = g.execute(triangles)?;

Parameter	Type	Description
`rels`	`&[&Rel]`	Slice of relationship indexes to join
`n_vars`	`u8`	Number of join variables (output columns)

Output columns:

Column	Type	Description
`_v0`, `_v1`, ..., `_vN`	I64	Node IDs for each join variable

Constrained Traversal

There is no single "constrained traversal" primitive. Instead, you compose constraints by combining the Rel (structural), direction, depth bounds, and filter (predicate). All constraints are part of the same lazy DAG and are optimized together at execute() time.

Constraint types

Constraint	Mechanism	Applied
Which edges exist	`Rel::build` / `Rel::from_edges`	Before traversal (index construction)
Edge direction	`direction` parameter	At traversal time
Hop depth	`min_depth` / `max_depth`	At traversal (`var_expand`, `shortest_path`)
Source node properties	`filter(col, predicate)` before traversal	Pre-traversal
Result properties	`filter(col, predicate)` after traversal	Post-traversal
Structural patterns	`wco_join` (Leapfrog Triejoin)	Multi-way pattern matching

Pre-traversal filtering

Constrain which nodes start the traversal by filtering source IDs with a predicate before passing them to expand or var_expand:

let g = ctx.graph(&people)?;

// Scan columns from the people table
let id  = g.scan("id")?;
let age = g.scan("age")?;

// Constraint: only start from people older than 30
let threshold = g.const_i64(30)?;
let pred      = g.gt(age, threshold)?;
let old_ids   = g.filter(id, pred)?;

// Traverse 1 hop from only those constrained start nodes
let neighbors = g.expand(old_ids, &rel, TD_DIR_FWD)?;
let result    = g.execute(neighbors)?;

Multiple predicates can be composed with and / or:

let age  = g.scan("age")?;
let city = g.scan("city")?;

// age > 30 AND city = "NYC"
let age_pred  = g.gt(age, g.const_i64(30)?)?;
let city_pred = g.eq(city, g.const_str("NYC")?)?;
let combined  = g.and(age_pred, city_pred)?;
let start_ids = g.filter(g.scan("id")?, combined)?;

Depth-constrained traversal

Use min_depth and max_depth on var_expand to bound how far the BFS goes:

// Exactly 2 hops (friends-of-friends, excluding direct friends)
let fof = g.var_expand(start, &rel, TD_DIR_FWD, 2, 2, false)?;

// Reachable within 1–5 hops in either direction
let reachable = g.var_expand(start, &rel, TD_DIR_BOTH, 1, 5, false)?;

// Shortest path with a maximum depth of 10
let path = g.shortest_path(src, dst, &rel, 10)?;

Post-traversal filtering

Constrain traversal results by executing the traversal first, then building a new graph on the result table to apply property predicates on destination nodes:

// Step 1: execute the traversal
let g = ctx.graph(&people)?;
let start = g.scan("id")?;
let reached = g.var_expand(start, &rel, TD_DIR_FWD, 1, 3, false)?;
let traversal_result = g.execute(reached)?;
// traversal_result columns: _start, _end, _depth

// Step 2: build a new graph on the result to filter by depth
let g2 = ctx.graph(&traversal_result)?;
let depth = g2.scan("_depth")?;
let pred  = g2.le(depth, g2.const_i64(2)?)?;

// Project only the columns we need, filtered
let start_col = g2.filter(g2.scan("_start")?, pred)?;
let end_col   = g2.filter(g2.scan("_end")?, pred)?;
let depth_col = g2.filter(depth, pred)?;
let projected = g2.select(start_col, &[start_col, end_col, depth_col])?;
let filtered  = g2.execute(projected)?;

Enriching results with destination properties

After traversal, use add_table and scan_table to look up properties of destination nodes. Since _dst / _end values are row offsets into the target table, you can use them to index into that table's columns:

// After expand: result has _src, _dst (row offsets into people)
let g = ctx.graph(&people)?;
let start = g.scan("id")?;
let neighbors = g.expand(start, &rel, TD_DIR_FWD)?;
let result = g.execute(neighbors)?;

// Build a new graph to enrich _dst with people properties
let g2 = ctx.graph(&result)?;
let people_id = g2.add_table(&people);
let dst_name = g2.scan_table(people_id, "name")?;   // look up name
let dst_age  = g2.scan_table(people_id, "age")?;     // look up age

// Filter: only neighbors younger than 25
let pred = g2.lt(dst_age, g2.const_i64(25)?)?;
let dst  = g2.filter(g2.scan("_dst")?, pred)?;
let enriched = g2.execute(dst)?;

Combining all constraints

The full pattern: pre-filter source nodes, traverse with depth bounds, then post-filter results.

use teide::{Context, Rel};
use teide::ffi::TD_DIR_FWD;

let ctx = Context::new()?;
let people  = ctx.read_csv("people.csv")?;    // id, name, city, age
let follows = ctx.read_csv("follows.csv")?;   // src, dst
let n = people.nrows() as i64;

// Structural constraint: build the CSR index
let rel = Rel::from_edges(&follows, "src", "dst", n, n, true)?;

let g = ctx.graph(&people)?;
let id   = g.scan("id")?;
let city = g.scan("city")?;

// Pre-traversal constraint: start from NYC residents only
let nyc_pred = g.eq(city, g.const_str("NYC")?)?;
let nyc_ids  = g.filter(id, nyc_pred)?;

// Depth + direction constraints: 1–3 hops, forward only
let reached = g.var_expand(nyc_ids, &rel, TD_DIR_FWD, 1, 3, false)?;

// Execute the constrained traversal
let result = g.execute(reached)?;
// result columns: _start, _end, _depth

Tip: Because the graph is a lazy DAG, pre-traversal filters are composed into the same execution plan as the traversal itself. The optimizer can push predicates down, fuse operators, and eliminate dead code — so building the constraint in multiple steps has no performance penalty compared to a monolithic operation.

Complete Example

End-to-end social network scenario: load data, build a relationship index, find friends-of-friends reachable from users in a specific city, and read the results.

use teide::{Context, Rel};
use teide::ffi::TD_DIR_FWD;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let ctx = Context::new()?;

    // Load a people table and an edge table
    let people  = ctx.read_csv("people.csv")?;    // id, name, city, age
    let follows = ctx.read_csv("follows.csv")?;   // src, dst

    let n = people.nrows() as i64;

    // Build a CSR relationship index from the edge table
    let rel = Rel::from_edges(&follows, "src", "dst", n, n, true)?;

    // Optionally persist for reuse
    rel.save("/tmp/follows_rel")?;

    // Create a graph rooted on the people table
    let g = ctx.graph(&people)?;

    // Pre-traversal constraint: only start from NYC residents
    let id   = g.scan("id")?;
    let city = g.scan("city")?;
    let pred = g.eq(city, g.const_str("NYC")?)?;
    let nyc_ids = g.filter(id, pred)?;

    // Depth-constrained BFS: friends-of-friends (exactly 2 hops)
    let fof = g.var_expand(nyc_ids, &rel, TD_DIR_FWD, 2, 2, false)?;

    // Execute the computation graph
    let result = g.execute(fof)?;

    // Read the output columns
    let nrows = result.nrows();
    println!("Found {} friend-of-friend pairs from NYC", nrows);
    for i in 0..nrows.min(10) {
        let start = result.get_i64(0, i as usize).unwrap();
        let end   = result.get_i64(1, i as usize).unwrap();
        let depth = result.get_i64(2, i as usize).unwrap();
        println!("  {} -> {} (depth {})", start, end, depth);
    }

    Ok(())
}

API Reference Summary

Rel methods

Method	Description
`Rel::build(table, fk_col, n_target, sort)`	Build CSR from a foreign key column
`Rel::from_edges(table, src, dst, n_src, n_dst, sort)`	Build CSR from an edge table
`Rel::load(dir)`	Load CSR from disk
`Rel::mmap(dir)`	Memory-map CSR from disk
`rel.save(dir)`	Save CSR to disk

Graph traversal methods

Method	Output columns	Description
`g.expand(src, rel, dir)`	`_src`, `_dst`	1-hop neighbor expansion
`g.var_expand(start, rel, dir, min, max, path)`	`_start`, `_end`, `_depth`	Variable-length BFS
`g.shortest_path(src, dst, rel, max)`	`_node`, `_depth`	BFS shortest path
`g.wco_join(rels, n_vars)`	`_v0`, `_v1`, ...	Leapfrog Triejoin

Multi-table methods

Method	Returns	Description
`g.add_table(table)`	`u16`	Register a secondary table
`g.scan_table(id, col)`	`Column`	Scan a column from a registered table