Because sometimes, you need a running total without ruining your dataset!
Introduction: What Are Window Functions?
Ever needed to calculate a running total, a ranking, or a moving average—but without losing individual rows? That's where Window Functions come in!
Unlike GROUP BY, which smashes your data into a single row per group, Window Functions let you keep all rows while still applying calculations over a "window" of rows.
Think of it as:
- GROUP BY = "Summarize everything into one row per group"
- WINDOW FUNCTION = "Do fancy calculations while keeping all rows intact"
Real-World Use Cases:
Running totals (e.g., total sales per day)
Ranking customers (e.g., top spenders in a store)
Calculating moving averages (e.g., stock price trends)
Finding first & last transactions (e.g., first order per customer)
Basic Syntax of a Window Function
function_name(column) OVER (
PARTITION BY column_to_group
ORDER BY column_to_sort
)
Breaking It Down:
function_name(column)
→ The function (likeSUM
,AVG
,ROW_NUMBER
) applied to a columnOVER (...)
→ Defines the "window" or scope of rowsPARTITION BY column_to_group
→ (Optional) Divides data into smaller groups, likeGROUP BY
but without collapsing rowsORDER BY column_to_sort
→ Defines the sequence
ROW_NUMBER(), RANK(), and DENSE_RANK(): Who’s the Boss?
Ranking with ROW_NUMBER()
Assigns a unique row number per partition. No duplicates!
SELECT customer_id, order_date, price,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS order_rank
FROM orders;
Best for: Numbering rows uniquely
RANK(): When Ties Exist
Ranks rows, but if there's a tie, it skips numbers!
SELECT customer_id, price,
RANK() OVER (ORDER BY price DESC) AS price_rank
FROM orders;
Problem: If two customers spent $100, both get rank 1, but the next rank will be 3, not 2!
DENSE_RANK(): No Skipping
Similar to RANK()
, but doesn’t skip numbers when ties occur.
SELECT customer_id, price,
DENSE_RANK() OVER (ORDER BY price DESC) AS price_rank
FROM orders;
Best for: Ranking without gaps in numbers.
customer_id | price | RANK() | DENSE_RANK() |
---|---|---|---|
A | 100 | 1 | 1 |
B | 100 | 1 | 1 |
C | 90 | 3 | 2 |
D | 80 | 4 | 3 |
SUM(), AVG(), and COUNT() as Window Functions
Running Totals with SUM()
SELECT customer_id, order_date, price,
SUM(price) OVER (PARTITION BY customer_id ORDER BY order_date) AS running_total
FROM orders;
How It Works:
- The SUM(price) calculates a cumulative total for each customer
PARTITION BY customer_id
keeps calculations separate for each customerORDER BY order_date
ensures the sum is cumulative over time
Moving Averages with AVG()
SELECT customer_id, order_date, price,
AVG(price) OVER (PARTITION BY customer_id ORDER BY order_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg
FROM orders;
Why?
- Helps smooth out fluctuations (e.g., stock market trends 📈)
- Useful for predicting trends based on past values
LEAD() and LAG(): Looking Ahead & Behind
Sometimes, you need to compare the current row with the next or previous row.
LAG(): Get the Previous Row’s Value
SELECT customer_id, order_date, price,
LAG(price) OVER (PARTITION BY customer_id ORDER BY order_date) AS prev_order_price
FROM orders;
Use Case:
- Find price differences between two consecutive orders
LEAD(): Get the Next Row’s Value
SELECT customer_id, order_date, price,
LEAD(price) OVER (PARTITION BY customer_id ORDER BY order_date) AS next_order_price
FROM orders;
Use Case:
- Compare current price with the next price
FIRST_VALUE() and LAST_VALUE(): Finding First & Last Items
FIRST_VALUE(): Get the First Row in Each Partition
SELECT customer_id, order_date, price,
FIRST_VALUE(price) OVER (PARTITION BY customer_id ORDER BY order_date) AS first_order_price
FROM orders;
Use Case:
- Find the first purchase price per customer
LAST_VALUE(): Get the Last Row (Careful! It’s Tricky)
SELECT customer_id, order_date, price,
LAST_VALUE(price) OVER (PARTITION BY customer_id ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_order_price
FROM orders;
Why So Complex?
- By default,
LAST_VALUE()
only looks at the current row, so we needROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
to get the actual last row.
NTILE(): Splitting Data into Equal Buckets
Want to divide your dataset into quartiles, percentiles, or N groups? Use NTILE()
!
SELECT customer_id, price,
NTILE(4) OVER (ORDER BY price DESC) AS quartile
FROM orders;
Use Cases:
- Divide customers into top 25%, middle 50%, bottom 25%
- Rank products by sales performance
Real-World Example: Customer Ranking & Trends
SELECT customer_id, order_date, price,
SUM(price) OVER (PARTITION BY customer_id ORDER BY order_date) AS running_total,
LAG(price) OVER (PARTITION BY customer_id ORDER BY order_date) AS prev_order_price,
RANK() OVER (ORDER BY SUM(price) OVER (PARTITION BY customer_id) DESC) AS customer_rank
FROM orders;
What This Query Does:
Running total of spending per customer
Compare current order with previous order
Ranks customers by total spending
Window Functions Cheat Sheet
Function | What It Does |
---|---|
ROW_NUMBER() |
Assigns a unique row number |
RANK() |
Assigns a rank, skipping numbers for ties |
DENSE_RANK() |
Assigns a rank without skipping numbers |
SUM()/AVG()/COUNT() |
Running totals, moving averages |
LAG() |
Gets the previous row’s value |
LEAD() |
Gets the next row’s value |
FIRST_VALUE() |
Gets the first row in a partition |
LAST_VALUE() |
Gets the last row (requires careful ordering) |
NTILE(n) |
Divides data into n equal parts |
Window Functions = Superpowers!
With Window Functions, you can:
Rank data dynamically
Calculate running totals & averages
Compare current rows with previous/next rows
Group data without losing individual records
0 Comments