Getting Started: From CSV to Insights in 5 Minutes
You have data. You need answers. No setup required.
It is 4:30 PM on a Thursday. Your VP just pinged you: "Can you pull together headcount and salary stats by department for the leadership sync at 5?" You have a CSV export from the HR system sitting in your downloads folder. You need a tool that gets out of your way and lets you think in SQL.
This guide walks you through a realistic workflow -- from raw data to polished summaries -- using nothing but TeideDB's SQL engine. By the end, you will know how to:
- Create tables and insert data (including dates and booleans)
- Filter rows with
WHERE,BETWEEN,IN, andLIKE - Aggregate with
GROUP BY,HAVING, and common aggregate functions - Rank within groups using window functions
- Derive new tables with
CREATE TABLE AS SELECT - Evaluate constant expressions without a
FROMclause
Estimated time: about 10 minutes.
The Setup
Let's start by getting some data into TeideDB. In a real workflow you might use
read_csv() to load a file directly, but here we will create the table by hand
so you can see the full DDL syntax. Our dataset is a 15-row employee roster -- small
enough to reason about, large enough to make aggregations interesting.
Fire up the TeideDB REPL (cargo run --features cli) or connect via
psql to a running PgWire server, and run the following:
CREATE TABLE employees (
id INTEGER, name VARCHAR, dept VARCHAR, title VARCHAR,
salary REAL, hire_date DATE, active BOOLEAN
);
INSERT INTO employees VALUES
(1, 'Alice Chen', 'engineering', 'Senior Engineer', 125000.0, '2021-03-15', true),
(2, 'Bob Martinez', 'engineering', 'Staff Engineer', 145000.0, '2019-08-01', true),
(3, 'Carol White', 'engineering', 'Junior Engineer', 85000.0, '2023-11-20', true),
(4, 'Dave Kim', 'marketing', 'Marketing Manager', 95000.0, '2020-06-10', true),
(5, 'Eve Johnson', 'marketing', 'Content Lead', 82000.0, '2022-01-05', true),
(6, 'Frank Brown', 'marketing', 'SEO Specialist', 72000.0, '2024-02-14', true),
(7, 'Grace Liu', 'sales', 'Sales Director', 130000.0, '2018-11-30', true),
(8, 'Heidi Park', 'sales', 'Account Executive', 88000.0, '2022-07-22', true),
(9, 'Ivan Torres', 'sales', 'SDR', 65000.0, '2024-08-01', true),
(10, 'Judy Adams', 'product', 'Product Manager', 115000.0, '2020-04-15', true),
(11, 'Karl Weber', 'product', 'UX Designer', 92000.0, '2021-09-01', true),
(12, 'Lisa Patel', 'product', 'Data Analyst', 88000.0, '2023-03-12', true),
(13, 'Mike Ross', 'engineering', 'DevOps Engineer', 110000.0, '2022-05-18', true),
(14, 'Nina Chang', 'engineering', 'ML Engineer', 135000.0, '2021-01-10', true),
(15, 'Oscar Diaz', 'marketing', 'VP Marketing', 155000.0, '2017-12-01', false);
Quick sanity check -- make sure all 15 rows landed:
SELECT COUNT(*) AS total FROM employees;
Good. We are in business.
Exploring the Data
Before diving into analysis, it pays to get a feel for the shape of the data. What departments exist? Who earns the most? A few quick queries will orient you.
First, let's see what departments we are working with:
SELECT DISTINCT dept FROM employees ORDER BY dept;
Four departments. Now let's look at the highest-paid employees:
SELECT name, dept, salary
FROM employees
ORDER BY salary DESC
LIMIT 5;
Engineering dominates the top of the salary list, but the single highest earner is actually in marketing. Interesting -- though Oscar is flagged as inactive. Let's keep that in mind.
Filtering with WHERE
Raw data is rarely what you need. The real value comes from slicing it. TeideDB supports the full set of SQL filter predicates, so you can be as precise as you want.
Engineers earning over $100k:
SELECT name, title, salary
FROM employees
WHERE dept = 'engineering' AND salary > 100000
ORDER BY salary DESC;
Four of the five engineers clear six figures. Not bad. What about everyone on the marketing team?
SELECT name, title, salary
FROM employees
WHERE dept IN ('marketing')
ORDER BY salary DESC;
Now let's find active employees hired after 2023 -- your recent hires:
SELECT name, dept, hire_date
FROM employees
WHERE active = true AND hire_date > CAST('2023-01-01' AS DATE)
ORDER BY hire_date;
Four people hired in the last year and change, spread across all four departments.
You can also use BETWEEN for range queries. Employees in the $80k--$100k band:
SELECT name, salary
FROM employees
WHERE salary BETWEEN 80000 AND 100000
ORDER BY salary DESC;
Try it yourself: Write a query that finds all employees whose name
contains "an" (case-sensitive). Hint: use LIKE '%an%'. You should find
Nina Chang, Ivan Torres, and Frank Brown.
Answering Business Questions
This is where things get useful. Your VP does not want raw rows -- she wants summaries.
How big is each team? What does compensation look like? GROUP BY with
aggregate functions is the tool for the job.
Headcount and salary statistics by department:
SELECT
dept,
COUNT(*) AS headcount,
AVG(salary) AS avg_salary,
MIN(salary) AS min_salary,
MAX(salary) AS max_salary
FROM employees
WHERE active = true
GROUP BY dept
ORDER BY avg_salary DESC;
Engineering is both the largest team and the highest-paid on average. Marketing is the smallest active team (Oscar is inactive) and has the lowest average salary. That is the kind of insight that drives a conversation.
What if you only want departments where the average salary exceeds $90k? Use
HAVING to filter after aggregation:
SELECT dept, AVG(salary) AS avg_salary
FROM employees
WHERE active = true
GROUP BY dept
HAVING AVG(salary) > 90000
ORDER BY avg_salary DESC;
Marketing drops off. Three departments clear the $90k bar.
Ranking and Comparing
Aggregates tell you about groups. Window functions tell you about individuals within groups. They are one of the most powerful features in SQL, and TeideDB supports them fully.
Let's rank each employee within their department by salary:
SELECT
name, dept, salary,
ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rank
FROM employees
WHERE active = true
ORDER BY dept, rank;
Now you can instantly see who the top earner is in every department. But sometimes
you want to know how much more the top earner makes compared to the next person.
LAG lets you reference the previous row:
SELECT
name, dept, salary,
LAG(salary) OVER (PARTITION BY dept ORDER BY salary DESC) AS prev_salary,
salary - LAG(salary) OVER (PARTITION BY dept ORDER BY salary DESC) AS gap
FROM employees
WHERE active = true
ORDER BY dept, salary DESC;
The biggest salary gap within a department? Sales: $42k separates Grace Liu from Heidi Park. That might be worth flagging for the compensation review.
Try it yourself: Use SUM(salary) OVER (PARTITION BY dept ORDER BY
salary DESC) to compute a running total of salaries within each department. The
last row in each partition should equal the department's total salary spend.
Deriving New Tables
When you find a query you keep running, materialize it. CREATE TABLE AS
SELECT captures a result set as a new table you can query independently. This
is great for dashboards, exports, or just keeping your workspace tidy.
CREATE TABLE dept_summary AS
SELECT
dept,
COUNT(*) AS headcount,
AVG(salary) AS avg_salary,
MIN(salary) AS min_salary,
MAX(salary) AS max_salary
FROM employees
WHERE active = true
GROUP BY dept;
Now you can query the summary directly:
SELECT * FROM dept_summary ORDER BY avg_salary DESC;
This table is a first-class citizen. You can join it back to the original
employees table, filter it, or export it. It persists for the lifetime of
your session.
Quick Calculations
Sometimes you just need a calculator. TeideDB evaluates constant expressions without
requiring a FROM clause -- handy for quick math, string operations, or
type casting.
SELECT 1 + 1;
SELECT 'foo' || 'bar';
SELECT CAST('2025-01-15' AS DATE);
No table, no ceremony. Just results.
What's Next
You now know the core SQL workflow: create, insert, filter, aggregate, window, derive. That covers a surprising amount of day-to-day analytical work. But TeideDB has much more to offer.
- Joins, Windows & CTEs -- multi-table joins, advanced window frames, and common table expressions for complex queries.
- Working with Time -- date arithmetic, timestamp handling, and temporal filtering patterns.
- Graph Queries -- model relationships with SQL/PGQ, run MATCH patterns, and find shortest paths.
- Vector Search -- store embeddings, compute similarity, and build HNSW indexes for fast nearest-neighbor search.
- SQL Reference -- complete syntax documentation for every statement and function TeideDB supports.