GUIDES

Getting Started: From CSV to Insights in 5 Minutes

You have data. You need answers. No setup required.

It is 4:30 PM on a Thursday. Your VP just pinged you: "Can you pull together headcount and salary stats by department for the leadership sync at 5?" You have a CSV export from the HR system sitting in your downloads folder. You need a tool that gets out of your way and lets you think in SQL.

This guide walks you through a realistic workflow -- from raw data to polished summaries -- using nothing but TeideDB's SQL engine. By the end, you will know how to:

Create tables and insert data (including dates and booleans)
Filter rows with WHERE, BETWEEN, IN, and LIKE
Aggregate with GROUP BY, HAVING, and common aggregate functions
Rank within groups using window functions
Derive new tables with CREATE TABLE AS SELECT
Evaluate constant expressions without a FROM clause

Estimated time: about 10 minutes.

The Setup

Let's start by getting some data into TeideDB. In a real workflow you might use read_csv() to load a file directly, but here we will create the table by hand so you can see the full DDL syntax. Our dataset is a 15-row employee roster -- small enough to reason about, large enough to make aggregations interesting.

Fire up the TeideDB REPL (cargo run --features cli) or connect via psql to a running PgWire server, and run the following:

CREATE TABLE employees (
  id INTEGER, name VARCHAR, dept VARCHAR, title VARCHAR,
  salary REAL, hire_date DATE, active BOOLEAN
);

INSERT INTO employees VALUES
  (1,  'Alice Chen',     'engineering', 'Senior Engineer',    125000.0, '2021-03-15', true),
  (2,  'Bob Martinez',   'engineering', 'Staff Engineer',     145000.0, '2019-08-01', true),
  (3,  'Carol White',    'engineering', 'Junior Engineer',     85000.0, '2023-11-20', true),
  (4,  'Dave Kim',       'marketing',   'Marketing Manager',   95000.0, '2020-06-10', true),
  (5,  'Eve Johnson',    'marketing',   'Content Lead',        82000.0, '2022-01-05', true),
  (6,  'Frank Brown',    'marketing',   'SEO Specialist',      72000.0, '2024-02-14', true),
  (7,  'Grace Liu',      'sales',       'Sales Director',     130000.0, '2018-11-30', true),
  (8,  'Heidi Park',     'sales',       'Account Executive',   88000.0, '2022-07-22', true),
  (9,  'Ivan Torres',    'sales',       'SDR',                 65000.0, '2024-08-01', true),
  (10, 'Judy Adams',     'product',     'Product Manager',    115000.0, '2020-04-15', true),
  (11, 'Karl Weber',     'product',     'UX Designer',         92000.0, '2021-09-01', true),
  (12, 'Lisa Patel',     'product',     'Data Analyst',        88000.0, '2023-03-12', true),
  (13, 'Mike Ross',      'engineering', 'DevOps Engineer',    110000.0, '2022-05-18', true),
  (14, 'Nina Chang',     'engineering', 'ML Engineer',        135000.0, '2021-01-10', true),
  (15, 'Oscar Diaz',     'marketing',   'VP Marketing',       155000.0, '2017-12-01', false);

Quick sanity check -- make sure all 15 rows landed:

SELECT COUNT(*) AS total FROM employees;

total 15

Good. We are in business.

Exploring the Data

Before diving into analysis, it pays to get a feel for the shape of the data. What departments exist? Who earns the most? A few quick queries will orient you.

First, let's see what departments we are working with:

SELECT DISTINCT dept FROM employees ORDER BY dept;

dept engineering marketing product sales

Four departments. Now let's look at the highest-paid employees:

SELECT name, dept, salary
FROM employees
ORDER BY salary DESC
LIMIT 5;

name dept salary Oscar Diaz marketing 155000.0 Bob Martinez engineering 145000.0 Nina Chang engineering 135000.0 Grace Liu sales 130000.0 Alice Chen engineering 125000.0

Engineering dominates the top of the salary list, but the single highest earner is actually in marketing. Interesting -- though Oscar is flagged as inactive. Let's keep that in mind.

Filtering with WHERE

Raw data is rarely what you need. The real value comes from slicing it. TeideDB supports the full set of SQL filter predicates, so you can be as precise as you want.

Engineers earning over $100k:

SELECT name, title, salary
FROM employees
WHERE dept = 'engineering' AND salary > 100000
ORDER BY salary DESC;

name title salary Bob Martinez Staff Engineer 145000.0 Nina Chang ML Engineer 135000.0 Alice Chen Senior Engineer 125000.0 Mike Ross DevOps Engineer 110000.0

Four of the five engineers clear six figures. Not bad. What about everyone on the marketing team?

SELECT name, title, salary
FROM employees
WHERE dept IN ('marketing')
ORDER BY salary DESC;

name title salary Oscar Diaz VP Marketing 155000.0 Dave Kim Marketing Manager 95000.0 Eve Johnson Content Lead 82000.0 Frank Brown SEO Specialist 72000.0

Now let's find active employees hired after 2023 -- your recent hires:

SELECT name, dept, hire_date
FROM employees
WHERE active = true AND hire_date > CAST('2023-01-01' AS DATE)
ORDER BY hire_date;

name dept hire_date Lisa Patel product 2023-03-12 Carol White engineering 2023-11-20 Frank Brown marketing 2024-02-14 Ivan Torres sales 2024-08-01

Four people hired in the last year and change, spread across all four departments.

You can also use BETWEEN for range queries. Employees in the $80k--$100k band:

SELECT name, salary
FROM employees
WHERE salary BETWEEN 80000 AND 100000
ORDER BY salary DESC;

name salary Dave Kim 95000.0 Karl Weber 92000.0 Heidi Park 88000.0 Lisa Patel 88000.0 Carol White 85000.0 Eve Johnson 82000.0

Try it yourself: Write a query that finds all employees whose name contains "an" (case-sensitive). Hint: use LIKE '%an%'. You should find Nina Chang, Ivan Torres, and Frank Brown.

Answering Business Questions

This is where things get useful. Your VP does not want raw rows -- she wants summaries. How big is each team? What does compensation look like? GROUP BY with aggregate functions is the tool for the job.

Headcount and salary statistics by department:

SELECT
  dept,
  COUNT(*) AS headcount,
  AVG(salary) AS avg_salary,
  MIN(salary) AS min_salary,
  MAX(salary) AS max_salary
FROM employees
WHERE active = true
GROUP BY dept
ORDER BY avg_salary DESC;

dept headcount avg_salary min_salary max_salary engineering 5 120000.0 85000.0 145000.0 product 3 98333.33333333333 88000.0 115000.0 sales 3 94333.33333333333 65000.0 130000.0 marketing 3 83000.0 72000.0 95000.0

Engineering is both the largest team and the highest-paid on average. Marketing is the smallest active team (Oscar is inactive) and has the lowest average salary. That is the kind of insight that drives a conversation.

What if you only want departments where the average salary exceeds $90k? Use HAVING to filter after aggregation:

SELECT dept, AVG(salary) AS avg_salary
FROM employees
WHERE active = true
GROUP BY dept
HAVING AVG(salary) > 90000
ORDER BY avg_salary DESC;

dept avg_salary engineering 120000.0 product 98333.33333333333 sales 94333.33333333333

Marketing drops off. Three departments clear the $90k bar.

Ranking and Comparing

Aggregates tell you about groups. Window functions tell you about individuals within groups. They are one of the most powerful features in SQL, and TeideDB supports them fully.

Let's rank each employee within their department by salary:

SELECT
  name, dept, salary,
  ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rank
FROM employees
WHERE active = true
ORDER BY dept, rank;

name dept salary rank Bob Martinez engineering 145000.0 1 Nina Chang engineering 135000.0 2 Alice Chen engineering 125000.0 3 Mike Ross engineering 110000.0 4 Carol White engineering 85000.0 5 Dave Kim marketing 95000.0 1 Eve Johnson marketing 82000.0 2 Frank Brown marketing 72000.0 3 Judy Adams product 115000.0 1 Karl Weber product 92000.0 2 Lisa Patel product 88000.0 3 Grace Liu sales 130000.0 1 Heidi Park sales 88000.0 2 Ivan Torres sales 65000.0 3

Now you can instantly see who the top earner is in every department. But sometimes you want to know how much more the top earner makes compared to the next person. LAG lets you reference the previous row:

SELECT
  name, dept, salary,
  LAG(salary) OVER (PARTITION BY dept ORDER BY salary DESC) AS prev_salary,
  salary - LAG(salary) OVER (PARTITION BY dept ORDER BY salary DESC) AS gap
FROM employees
WHERE active = true
ORDER BY dept, salary DESC;

name dept salary prev_salary gap Bob Martinez engineering 145000.0 Nina Chang engineering 135000.0 145000.0 -10000.0 Alice Chen engineering 125000.0 135000.0 -10000.0 Mike Ross engineering 110000.0 125000.0 -15000.0 Carol White engineering 85000.0 110000.0 -25000.0 Dave Kim marketing 95000.0 Eve Johnson marketing 82000.0 95000.0 -13000.0 Frank Brown marketing 72000.0 82000.0 -10000.0 Judy Adams product 115000.0 Karl Weber product 92000.0 115000.0 -23000.0 Lisa Patel product 88000.0 92000.0 -4000.0 Grace Liu sales 130000.0 Heidi Park sales 88000.0 130000.0 -42000.0 Ivan Torres sales 65000.0 88000.0 -23000.0

The biggest salary gap within a department? Sales: $42k separates Grace Liu from Heidi Park. That might be worth flagging for the compensation review.

Try it yourself: Use SUM(salary) OVER (PARTITION BY dept ORDER BY salary DESC) to compute a running total of salaries within each department. The last row in each partition should equal the department's total salary spend.

Deriving New Tables

When you find a query you keep running, materialize it. CREATE TABLE AS SELECT captures a result set as a new table you can query independently. This is great for dashboards, exports, or just keeping your workspace tidy.

CREATE TABLE dept_summary AS
SELECT
  dept,
  COUNT(*) AS headcount,
  AVG(salary) AS avg_salary,
  MIN(salary) AS min_salary,
  MAX(salary) AS max_salary
FROM employees
WHERE active = true
GROUP BY dept;

Now you can query the summary directly:

SELECT * FROM dept_summary ORDER BY avg_salary DESC;

This table is a first-class citizen. You can join it back to the original employees table, filter it, or export it. It persists for the lifetime of your session.

Quick Calculations

Sometimes you just need a calculator. TeideDB evaluates constant expressions without requiring a FROM clause -- handy for quick math, string operations, or type casting.

SELECT 1 + 1;

(1 + 1) 2

SELECT 'foo' || 'bar';

('foo' || 'bar') foobar

SELECT CAST('2025-01-15' AS DATE);

CAST('2025-01-15' AS DATE) 2025-01-15

No table, no ceremony. Just results.

What's Next

You now know the core SQL workflow: create, insert, filter, aggregate, window, derive. That covers a surprising amount of day-to-day analytical work. But TeideDB has much more to offer.

Joins, Windows & CTEs -- multi-table joins, advanced window frames, and common table expressions for complex queries.
Working with Time -- date arithmetic, timestamp handling, and temporal filtering patterns.
Graph Queries -- model relationships with SQL/PGQ, run MATCH patterns, and find shortest paths.
Vector Search -- store embeddings, compute similarity, and build HNSW indexes for fast nearest-neighbor search.
SQL Reference -- complete syntax documentation for every statement and function TeideDB supports.