PostgreSQL Window Functions: The Secret Sauce of Fancy Queries!

Because sometimes, you need a running total without ruining your dataset!

Introduction: What Are Window Functions?

Ever needed to calculate a running total, a ranking, or a moving average—but without losing individual rows? That's where Window Functions come in!

Unlike GROUP BY, which smashes your data into a single row per group, Window Functions let you keep all rows while still applying calculations over a "window" of rows.

Think of it as:

  • GROUP BY = "Summarize everything into one row per group"
  • WINDOW FUNCTION = "Do fancy calculations while keeping all rows intact"

Real-World Use Cases:

Running totals (e.g., total sales per day)
Ranking customers (e.g., top spenders in a store)
Calculating moving averages (e.g., stock price trends)
Finding first & last transactions (e.g., first order per customer)

Basic Syntax of a Window Function

function_name(column) OVER (
    PARTITION BY column_to_group 
    ORDER BY column_to_sort
)

Breaking It Down:

  • function_name(column) → The function (like SUM, AVG, ROW_NUMBER) applied to a column
  • OVER (...) → Defines the "window" or scope of rows
  • PARTITION BY column_to_group → (Optional) Divides data into smaller groups, like GROUP BY but without collapsing rows
  • ORDER BY column_to_sort → Defines the sequence

ROW_NUMBER(), RANK(), and DENSE_RANK(): Who’s the Boss?

Ranking with ROW_NUMBER()

Assigns a unique row number per partition. No duplicates!

SELECT customer_id, order_date, price,
       ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS order_rank
FROM orders;

Best for: Numbering rows uniquely

RANK(): When Ties Exist

Ranks rows, but if there's a tie, it skips numbers!

SELECT customer_id, price,
       RANK() OVER (ORDER BY price DESC) AS price_rank
FROM orders;

Problem: If two customers spent $100, both get rank 1, but the next rank will be 3, not 2!

DENSE_RANK(): No Skipping

Similar to RANK(), but doesn’t skip numbers when ties occur.

SELECT customer_id, price,
       DENSE_RANK() OVER (ORDER BY price DESC) AS price_rank
FROM orders;

 Best for: Ranking without gaps in numbers.

customer_id price RANK() DENSE_RANK()
A 100 1 1
B 100 1 1
C 90 3 2
D 80 4 3

SUM(), AVG(), and COUNT() as Window Functions

Running Totals with SUM()

SELECT customer_id, order_date, price,
       SUM(price) OVER (PARTITION BY customer_id ORDER BY order_date) AS running_total
FROM orders;

How It Works:

  • The SUM(price) calculates a cumulative total for each customer
  • PARTITION BY customer_id keeps calculations separate for each customer
  • ORDER BY order_date ensures the sum is cumulative over time

Moving Averages with AVG()

SELECT customer_id, order_date, price,
       AVG(price) OVER (PARTITION BY customer_id ORDER BY order_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg
FROM orders;

Why?

  • Helps smooth out fluctuations (e.g., stock market trends 📈)
  • Useful for predicting trends based on past values

LEAD() and LAG(): Looking Ahead & Behind

Sometimes, you need to compare the current row with the next or previous row.

LAG(): Get the Previous Row’s Value

SELECT customer_id, order_date, price,
       LAG(price) OVER (PARTITION BY customer_id ORDER BY order_date) AS prev_order_price
FROM orders;

Use Case:

  • Find price differences between two consecutive orders

LEAD(): Get the Next Row’s Value

SELECT customer_id, order_date, price,
       LEAD(price) OVER (PARTITION BY customer_id ORDER BY order_date) AS next_order_price
FROM orders;

Use Case:

  • Compare current price with the next price

FIRST_VALUE() and LAST_VALUE(): Finding First & Last Items

FIRST_VALUE(): Get the First Row in Each Partition

SELECT customer_id, order_date, price,
       FIRST_VALUE(price) OVER (PARTITION BY customer_id ORDER BY order_date) AS first_order_price
FROM orders;

Use Case:

  • Find the first purchase price per customer

LAST_VALUE(): Get the Last Row (Careful! It’s Tricky)

SELECT customer_id, order_date, price,
       LAST_VALUE(price) OVER (PARTITION BY customer_id ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_order_price
FROM orders;

Why So Complex?

  • By default, LAST_VALUE() only looks at the current row, so we need ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING to get the actual last row.

NTILE(): Splitting Data into Equal Buckets

Want to divide your dataset into quartiles, percentiles, or N groups? Use NTILE()!

SELECT customer_id, price,
       NTILE(4) OVER (ORDER BY price DESC) AS quartile
FROM orders;

Use Cases:

  • Divide customers into top 25%, middle 50%, bottom 25%
  • Rank products by sales performance

Real-World Example: Customer Ranking & Trends

SELECT customer_id, order_date, price,
       SUM(price) OVER (PARTITION BY customer_id ORDER BY order_date) AS running_total,
       LAG(price) OVER (PARTITION BY customer_id ORDER BY order_date) AS prev_order_price,
       RANK() OVER (ORDER BY SUM(price) OVER (PARTITION BY customer_id) DESC) AS customer_rank
FROM orders;

What This Query Does:
Running total of spending per customer
Compare current order with previous order
Ranks customers by total spending

Window Functions Cheat Sheet

Function What It Does
ROW_NUMBER() Assigns a unique row number
RANK() Assigns a rank, skipping numbers for ties
DENSE_RANK() Assigns a rank without skipping numbers
SUM()/AVG()/COUNT() Running totals, moving averages
LAG() Gets the previous row’s value
LEAD() Gets the next row’s value
FIRST_VALUE() Gets the first row in a partition
LAST_VALUE() Gets the last row (requires careful ordering)
NTILE(n) Divides data into n equal parts

Window Functions = Superpowers!

With Window Functions, you can:
Rank data dynamically
Calculate running totals & averages
Compare current rows with previous/next rows
Group data without losing individual records

Post a Comment

0 Comments