GUIDES

Graph Queries with SQL/PGQ: Finding Patterns in Connected Data

Following the money is harder than it looks. SQL/PGQ makes it possible.

You are a fraud analyst at a fintech company. A compliance alert has flagged a cluster of accounts with unusual transfer patterns — small amounts bouncing between accounts, money leaving and mysteriously returning, a single source splitting payments to multiple destinations just below the reporting threshold. Your job is to trace the flows, identify the patterns, and produce evidence for the investigation team.

The catch: the evidence lives in connections, not in rows. A single transfer looks innocent. It is only when you follow the chain — account to account to account — that the pattern emerges. Traditional SQL makes this painful. Graph queries make it natural.

In this guide you will learn how to:

Estimated time: about 15 minutes.

The Problem: Why Self-Joins Fall Apart

Before we reach for graph queries, let's feel the pain of doing this with standard SQL. Suppose you want to trace a 3-hop transfer chain starting from account 101. With regular joins, you would write something like this:

-- The painful way: trace 3 hops with self-joins
SELECT t1.src_account AS hop0,
       t1.dst_account AS hop1,
       t2.dst_account AS hop2,
       t3.dst_account AS hop3,
       t1.amount AS amt1,
       t2.amount AS amt2,
       t3.amount AS amt3
FROM transfers t1
JOIN transfers t2 ON t1.dst_account = t2.src_account
JOIN transfers t3 ON t2.dst_account = t3.src_account
WHERE t1.src_account = 101;

That is three joins for three hops. Want four hops? Add another join. Want variable-length paths? You cannot — not without recursive CTEs that quickly become unreadable. Want to check if the money comes back to where it started? Add a WHERE t3.dst_account = 101 and hope you guessed the right chain length.

Now imagine explaining this query to a colleague. Or extending it to include account owners, filter by amount thresholds, and detect cycles of unknown length. The SQL becomes a wall of aliases and join conditions that obscures the very pattern you are trying to find.

There has to be a better way. There is.

Building the Dataset

Our investigation involves four tables: persons (account holders), accounts, an ownership mapping, and a transfer log. Fire up the TeideDB REPL (cargo run --features cli) and create the tables:

CREATE TABLE persons (
  id INTEGER, name VARCHAR, risk_score REAL, country VARCHAR
);
INSERT INTO persons VALUES
  (1, 'Alice Yang',     0.1,  'US'),
  (2, 'Bob Petrov',     0.3,  'UK'),
  (3, 'Carol Santos',   0.8,  'BR'),
  (4, 'Dave Mueller',   0.2,  'DE'),
  (5, 'Eve Nakamura',   0.9,  'JP'),
  (6, 'Frank Osei',     0.4,  'GH'),
  (7, 'Grace Park',     0.1,  'KR'),
  (8, 'Hector Ruiz',    0.7,  'MX');
CREATE TABLE accounts (
  id INTEGER, owner_id INTEGER, account_type VARCHAR,
  balance REAL, opened_date DATE
);
INSERT INTO accounts VALUES
  (101, 1, 'checking',  50000.0,  '2020-01-15'),
  (102, 1, 'savings',   120000.0, '2020-01-15'),
  (103, 2, 'checking',  35000.0,  '2019-06-01'),
  (104, 3, 'checking',  8000.0,   '2022-11-10'),
  (105, 3, 'business',  250000.0, '2022-11-10'),
  (106, 4, 'checking',  42000.0,  '2021-03-20'),
  (107, 5, 'business',  15000.0,  '2023-05-01'),
  (108, 5, 'checking',  3000.0,   '2023-05-01'),
  (109, 6, 'checking',  28000.0,  '2020-09-15'),
  (110, 7, 'savings',   95000.0,  '2018-12-01'),
  (111, 8, 'checking',  11000.0,  '2023-08-20');
CREATE TABLE owns (
  person_id INTEGER, account_id INTEGER
);
INSERT INTO owns VALUES
  (1, 101), (1, 102), (2, 103), (3, 104), (3, 105),
  (4, 106), (5, 107), (5, 108), (6, 109), (7, 110), (8, 111);
CREATE TABLE transfers (
  id INTEGER, src_account INTEGER, dst_account INTEGER,
  amount INTEGER, transfer_date DATE, memo VARCHAR
);
INSERT INTO transfers VALUES
  (1,  101, 103, 15000, '2024-01-10', 'consulting'),
  (2,  103, 104,  9500, '2024-01-12', 'payment'),
  (3,  104, 107,  9000, '2024-01-14', 'services'),
  (4,  107, 101,  8500, '2024-01-16', 'refund'),
  (5,  102, 106, 25000, '2024-02-01', 'investment'),
  (6,  106, 109, 12000, '2024-02-05', 'transfer'),
  (7,  109, 110,  5000, '2024-02-10', 'savings'),
  (8,  105, 108,  4900, '2024-03-01', 'split-1'),
  (9,  105, 109,  4800, '2024-03-01', 'split-2'),
  (10, 105, 111,  4700, '2024-03-01', 'split-3'),
  (11, 110, 102, 30000, '2024-03-15', 'quarterly'),
  (12, 103, 106,  7000, '2024-04-01', 'payment'),
  (13, 106, 104,  3000, '2024-04-05', 'fee'),
  (14, 108, 103,  2500, '2024-04-10', 'return'),
  (15, 111, 104,  4500, '2024-04-15', 'payment');

Let's do a quick sanity check:

SELECT COUNT(*) AS n_persons FROM persons;
n_persons 8
SELECT COUNT(*) AS n_accounts FROM accounts;
n_accounts 11
SELECT COUNT(*) AS n_transfers FROM transfers;
n_transfers 15

Eight people, eleven accounts, fifteen transfers. Buried in those fifteen transfers are a circular flow, a smurfing pattern, and a web of connections that would take pages of self-joins to untangle. Let's use a graph instead.

Creating a Property Graph

A property graph overlays structure on your existing tables. You tell TeideDB which tables are vertices (nodes) and which are edges (connections), and how they reference each other. The data stays in the tables — the graph is a lens for querying it.

CREATE PROPERTY GRAPH financial
VERTEX TABLES (
  persons KEY (id),
  accounts KEY (id)
)
EDGE TABLES (
  owns SOURCE KEY (person_id) REFERENCES persons (id)
       DESTINATION KEY (account_id) REFERENCES accounts (id),
  transfers SOURCE KEY (src_account) REFERENCES accounts (id)
            DESTINATION KEY (dst_account) REFERENCES accounts (id)
);

Let's break down what this does:

Behind the scenes, TeideDB builds a compressed sparse row (CSR) index for fast graph traversal. But from your perspective, you now have a named graph called financial that you can query with MATCH patterns.

First MATCH: Direct Transfers

The simplest graph query matches a single edge — one hop. Let's see all transfers between accounts:

SELECT src, dst, amount
FROM GRAPH_TABLE (financial
  MATCH (a:accounts)-[t:transfers]->(b:accounts)
  COLUMNS (a.id AS src, b.id AS dst, t.amount AS amount)
);
src dst amount 101 103 15000 103 104 9500 104 107 9000 107 101 8500 102 106 25000 106 109 12000 109 110 5000 105 108 4900 105 109 4800 105 111 4700 110 102 30000 103 106 7000 106 104 3000 108 103 2500 111 104 4500

All fifteen transfers, expressed as a pattern: "account a sent money through transfer t to account b." The arrow -> indicates direction — money flows from source to destination.

Now let's filter for large transfers only:

SELECT src, dst, amount, memo
FROM GRAPH_TABLE (financial
  MATCH (a:accounts)-[t:transfers WHERE t.amount > 10000]->(b:accounts)
  COLUMNS (a.id AS src, b.id AS dst, t.amount AS amount, t.memo AS memo)
);
src dst amount memo 101 103 15000 consulting 102 106 25000 investment 106 109 12000 transfer 110 102 30000 quarterly

Four transfers above $10,000. The WHERE clause inside the MATCH pattern filters at the edge level — only transfers matching the condition are traversed.

Who Owns What?

Fraud investigation is not just about money flows. You need to know who is behind each account. The owns edge connects persons to accounts:

SELECT person_name, acct, acct_type
FROM GRAPH_TABLE (financial
  MATCH (p:persons)-[o:owns]->(a:accounts)
  COLUMNS (p.name AS person_name, a.id AS acct, a.account_type AS acct_type)
);
person_name acct acct_type Alice Yang 101 checking Alice Yang 102 savings Bob Petrov 103 checking Carol Santos 104 checking Carol Santos 105 business Dave Mueller 106 checking Eve Nakamura 107 business Eve Nakamura 108 checking Frank Osei 109 checking Grace Park 110 savings Hector Ruiz 111 checking

Now we can see the full picture: Alice owns accounts 101 and 102, Carol owns 104 and 105, Eve owns 107 and 108. These multi-account holders are particularly interesting for investigation — they can move money between their own accounts to obscure trails.

Following the Money: Multi-Hop Paths

Here is where graph queries truly shine. Instead of writing one join per hop, you specify a variable-length path and let TeideDB do the traversal.

Let's trace all paths of 1 to 3 hops starting from account 101:

SELECT src, dst, hops
FROM GRAPH_TABLE (financial
  MATCH (a:accounts WHERE a.id = 101)-[t:transfers]->{1,3}(b:accounts)
  COLUMNS (a.id AS src, b.id AS dst, path_length(t) AS hops)
);

The results show all accounts reachable from account 101 within 1 to 3 hops, along with the number of hops required to reach each one. Some accounts may appear multiple times if there are multiple distinct paths of different lengths reaching them. For example, account 103 is reachable in one hop (directly), while accounts further along the chain require two or three hops.

What if you only care about the shortest route between two specific accounts? Use ANY SHORTEST:

SELECT DISTINCT src, dst, hops
FROM GRAPH_TABLE (financial
  MATCH ANY SHORTEST (a:accounts WHERE a.id = 101)-[t:transfers]->{1,5}(b:accounts WHERE b.id = 107)
  COLUMNS (a.id AS src, b.id AS dst, path_length(t) AS hops)
);
src dst hops 101 107 3

Three hops: 101 → 103 → 104 → 107. That is the shortest chain of transfers connecting Alice's checking account to Eve's business account. The DISTINCT collapses the per-step rows that ANY SHORTEST returns into a single summary row. In the self-join world, you would have needed to try every possible chain length and pick the shortest. Here, TeideDB handles it in one query.

Detecting Circular Flows

This is the moment the investigation gets interesting. A circular flow — money that leaves an account and eventually returns to it — is one of the oldest money laundering signals. The idea is simple: send money on a tour through several accounts to disguise its origin, then bring it back home.

To detect a cycle, use the same variable name for both the start and end node in the MATCH pattern. This tells TeideDB that the path must begin and end at the same node:

SELECT acct, hops
FROM GRAPH_TABLE (financial
  MATCH (a:accounts WHERE a.id = 101)-[t:transfers]->{2,5}(a)
  COLUMNS (a.id AS acct, path_length(t) AS hops)
);
acct hops 101 4

There it is. A 4-hop cycle: account 101 sent money that traveled through a chain of accounts and returned to 101. The key is the repeated variable a(a:accounts WHERE a.id = 101)-[t:transfers]->{2,5}(a) means "start at a and end at the same node a."

Now let's use CHEAPEST with COST to find the lowest-cost path between two accounts. This uses Dijkstra's algorithm under the hood:

SELECT src, dst, total_cost
FROM GRAPH_TABLE (financial
  MATCH CHEAPEST (a:accounts WHERE a.id = 101)-[t:transfers]->{1,5}(b:accounts WHERE b.id = 110)
  COST t.amount
  COLUMNS (a.id AS src, b.id AS dst, COST() AS total_cost)
);

This finds the cheapest (lowest total transfer amount) path from account 101 to account 110, summing the amount on each edge along the way. The result tells you both the route cost and how many hops were needed. This is useful for understanding the minimum financial exposure along any path between two accounts.

Try it yourself: Check whether any other accounts have circular flows. Try accounts 103, 106, and 110. Which ones have cycles, and how long are they? Hint: widen the path range to {2,6} to catch longer cycles. Remember to use the same variable for start and end: (a:accounts WHERE a.id = 103)-[t:transfers]->{2,6}(a).

Smurfing Detection

Smurfing — also called "structuring" — is when a sender splits a large payment into multiple smaller ones to stay below reporting thresholds. Each individual transfer looks unremarkable, but the pattern is unmistakable when you see the full picture.

Let's look for accounts that sent money to three or more distinct destinations in a single day:

SELECT src, transfer_dt, n_destinations, total_amount
FROM (
  SELECT src, transfer_dt, COUNT(*) AS n_destinations, SUM(amount) AS total_amount
  FROM GRAPH_TABLE (financial
    MATCH (a:accounts)-[t:transfers]->(b:accounts)
    COLUMNS (a.id AS src, b.id AS dst, t.amount AS amount, t.transfer_date AS transfer_dt)
  )
  GROUP BY src, transfer_dt
)
WHERE n_destinations >= 3;
src transfer_dt n_destinations total_amount 105 2024-03-01 3 14400

Account 105 sent three transfers on the same day totaling $14,400. Let's see the details:

SELECT src, dst, amount, memo
FROM GRAPH_TABLE (financial
  MATCH (a:accounts WHERE a.id = 105)-[t:transfers]->(b:accounts)
  COLUMNS (a.id AS src, b.id AS dst, t.amount AS amount, t.memo AS memo)
);
src dst amount memo 105 108 4900 split-1 105 109 4800 split-2 105 111 4700 split-3

Three transfers of $4,900, $4,800, and $4,700 — all on the same day, all just below $5,000, all going to different accounts. The memos even say "split-1", "split-2", "split-3". In the real world, memos would not be this obvious, but the transfer pattern would look exactly the same. Account 105 belongs to Carol Santos (risk score: 0.8). This warrants a closer look.

Graph Algorithms: The Big Picture

Individual patterns are useful, but sometimes you need to step back and look at the structure of the entire network. Which accounts are the most central? Do the accounts cluster into distinct communities? Are there isolated subnetworks?

TeideDB provides graph algorithm functions that you can use directly in GRAPH_TABLE COLUMNS clauses. However, algorithm functions (PAGERANK, COMMUNITY, COMPONENT) require a graph with a single edge table. Our financial graph has two edge tables (owns and transfers), so we need to create a separate graph for the transfer network:

CREATE PROPERTY GRAPH transfer_net
VERTEX TABLES (
  accounts KEY (id)
)
EDGE TABLES (
  transfers SOURCE KEY (src_account) REFERENCES accounts (id)
            DESTINATION KEY (dst_account) REFERENCES accounts (id)
);

This transfer_net graph has only accounts as vertices and transfers as edges — exactly what we need for network analysis.

Let's start with PageRank — a measure of how "important" or central each account is in the transfer network:

SELECT acct, rank
FROM GRAPH_TABLE (transfer_net
  MATCH (a:accounts)
  COLUMNS (a.id AS acct, PAGERANK(transfer_net, a) AS rank)
)
ORDER BY rank DESC;

The results rank all eleven accounts by their PageRank score. Accounts that receive transfers from many sources (like account 104, which receives from accounts 103, 106, and 111) will have the highest scores. Accounts that only send but never receive (like account 105) will have the lowest scores. PageRank values are nondeterministic due to floating-point iteration order, so exact scores may vary between runs — but the relative ranking is stable.

Now let's look at community detection using the Louvain algorithm. This groups accounts that transact heavily with each other:

SELECT acct, community
FROM GRAPH_TABLE (transfer_net
  MATCH (a:accounts)
  COLUMNS (a.id AS acct, COMMUNITY(transfer_net, a) AS community)
)
ORDER BY community, acct;

The Louvain algorithm partitions the accounts into communities of tightly-connected nodes. You will typically see the circular-flow accounts (101, 103, 104, 107) grouped together, the chain from Alice's savings through Dave and Frank to Grace in another community, and Carol's business account with the smurfing destinations in a third. The exact community IDs may vary between runs, but the groupings are consistent.

Finally, connected components tells us whether there are isolated subnetworks:

SELECT acct, comp
FROM GRAPH_TABLE (transfer_net
  MATCH (a:accounts)
  COLUMNS (a.id AS acct, COMPONENT(transfer_net, a) AS comp)
)
ORDER BY comp, acct;
acct comp 101 0 102 0 103 0 104 0 105 0 106 0 107 0 108 0 109 0 110 0 111 0

All accounts belong to a single connected component. Every account in our dataset can reach every other account through some chain of transfers. There are no isolated nodes — this is a fully connected network, which makes containment more difficult if any of these accounts are compromised.

Composing with SQL

Graph queries do not live in isolation. You can wrap GRAPH_TABLE in CTEs, join the results with other tables, and use the full power of SQL aggregation. This is where the investigation comes together.

Let's find the average risk score per community by joining graph results with the persons and ownership data. Note that we use the transfer_net graph (single edge table) for the algorithm query and the financial graph for the ownership query:

WITH communities AS (
  SELECT acct, community
  FROM GRAPH_TABLE (transfer_net
    MATCH (a:accounts)
    COLUMNS (a.id AS acct, COMMUNITY(transfer_net, a) AS community)
  )
),
ownership AS (
  SELECT person_name, acct, risk
  FROM GRAPH_TABLE (financial
    MATCH (p:persons)-[o:owns]->(a:accounts)
    COLUMNS (p.name AS person_name, a.id AS acct, p.risk_score AS risk)
  )
)
SELECT c.community,
       COUNT(*) AS n_accounts,
       AVG(o.risk) AS avg_risk,
       MAX(o.risk) AS max_risk
FROM communities c
JOIN ownership o ON c.acct = o.acct
GROUP BY c.community
ORDER BY avg_risk DESC;

The results show each community alongside the number of accounts it contains and the average and maximum risk scores of its members. The community containing the smurfing accounts (Carol Santos, risk 0.8, and Eve Nakamura, risk 0.9) will have the highest average risk, while the chain community (Alice, Dave, Frank, Grace) will have the lowest. The suspicious structural patterns correlate with the risk scores.

Try it yourself: Write a query that uses CLUSTERING_COEFFICIENT to find accounts with a clustering coefficient above 0.3. These are accounts whose neighbors also transact with each other — a sign of a tightly-knit group. Which accounts are they, and do they overlap with the suspicious communities? Remember to use the transfer_net graph for algorithm functions.

The Full Investigation

Let's bring everything together in a single compound query. We want to identify the highest-PageRank accounts, determine which community they belong to, and look up their owners. We use the transfer_net graph for algorithms and financial for ownership.

WITH ranked AS (
  SELECT acct, rank
  FROM GRAPH_TABLE (transfer_net
    MATCH (a:accounts)
    COLUMNS (
      a.id AS acct,
      PAGERANK(transfer_net, a) AS rank
    )
  )
),
comm AS (
  SELECT acct, community
  FROM GRAPH_TABLE (transfer_net
    MATCH (a:accounts)
    COLUMNS (
      a.id AS acct,
      COMMUNITY(transfer_net, a) AS community
    )
  )
),
ownership AS (
  SELECT person_name, acct, risk, country
  FROM GRAPH_TABLE (financial
    MATCH (p:persons)-[o:owns]->(a:accounts)
    COLUMNS (p.name AS person_name, a.id AS acct,
             p.risk_score AS risk, p.country AS country)
  )
)
SELECT r.acct,
       o.person_name,
       o.country,
       o.risk,
       r.rank,
       c.community
FROM ranked r
JOIN comm c ON r.acct = c.acct
JOIN ownership o ON r.acct = o.acct
ORDER BY r.rank DESC;

The results combine three dimensions of analysis into a single view. Each row shows an account alongside its owner, country, risk score, PageRank centrality, and community membership. The investigation picture becomes clear:

This is the power of graph queries composed with SQL: you start with structure (MATCH), add analytics (PageRank, community), and finish with relational operations (JOIN, ORDER BY) to produce actionable intelligence. Each layer builds on the last, and the final result is something no amount of self-joins could produce this cleanly.

What's Next

You have seen how SQL/PGQ turns relationship-heavy investigations from painful self-join exercises into expressive pattern queries. But graph structure is only one dimension of your data. If your accounts have text descriptions, transaction narratives, or behavioral embeddings, you can combine graph queries with vector similarity search.