Beyond GROUP BY: Mastering MySQL Window Functions for Advanced Ranking

The Power of Window Functions in MySQL

For a long time, MySQL developers relied heavily on GROUP BY to aggregate data. While effective, GROUP BY has a major limitation: it collapses each group of rows into a single summary row. If you need to calculate a running total or rank items while still seeing every row in your result set, you need window functions.

Introduced in MySQL 8.0, window functions perform a calculation across a set of rows related to the current row (that row’s "window") and return a value for every row instead of collapsing them. This makes them well suited to financial reporting, leaderboards, and data cleaning.

The Setup: A Simple Sales Table

To understand how these functions work, let’s look at a typical sales table. We want to track how employees are performing within their respective regions.

CREATE TABLE sales (
    id INT AUTO_INCREMENT PRIMARY KEY,
    employee_name VARCHAR(50),
    region VARCHAR(50),
    sale_amount DECIMAL(10, 2)
);

INSERT INTO sales (employee_name, region, sale_amount) VALUES
('Alice', 'North', 5000),
('Bob', 'North', 4500),
('Charlie', 'North', 5000),
('David', 'South', 6000),
('Eve', 'South', 5500);
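The following sketch makes the GROUP BY limitation from the introduction concrete, using the sales table above. The GROUP BY query returns one row per region. The window version keeps all five sales and adds the region total beside each one, along with a running total within each region. Ordering the running total by id (insertion order) is an assumption made for illustration. Any column that gives each row a unique position would work.

```sql
-- GROUP BY: one row per region; the individual sales disappear
SELECT region, SUM(sale_amount) AS region_total
FROM sales
GROUP BY region
ORDER BY region;

-- Window aggregates: every sale stays in the result
SELECT
    employee_name,
    region,
    sale_amount,
    -- Total for the whole region, repeated on each of its rows
    SUM(sale_amount) OVER (PARTITION BY region) AS region_total,
    -- Running total within the region, accumulating in id order
    SUM(sale_amount) OVER (
        PARTITION BY region
        ORDER BY id
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total
FROM sales
ORDER BY region, id;
```

On the sample data, the GROUP BY query returns North 14500.00 and South 11500.00. The window query returns all five rows. North’s running totals are 5000.00, 9500.00, and 14500.00 (Alice, Bob, Charlie), and South’s are 6000.00 and 11500.00.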

Understanding ROW_NUMBER, RANK, and DENSE_RANK

MySQL offers three primary ways to rank data. Each behaves differently when it encounters a tie (e.g., Alice and Charlie both sold $5,000).

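ROW_NUMBER(): Assigns a unique, sequential number to each row, even when rows tie. Tied rows are numbered in an unspecified order unless the ORDER BY includes a tiebreaker such as id.
RANK(): Assigns the same rank to tied rows and then skips ahead, leaving gaps in the sequence (1, 1, 3).
DENSE_RANK(): Assigns the same rank to tied rows without leaving gaps (1, 1, 2).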

Let’s see them in action within their regions:

SELECT
    employee_name,
    region,
    sale_amount,
    ROW_NUMBER() OVER (PARTITION BY region ORDER BY sale_amount DESC) as row_num,
    RANK() OVER (PARTITION BY region ORDER BY sale_amount DESC) as rank_val,
    DENSE_RANK() OVER (PARTITION BY region ORDER BY sale_amount DESC) as dense_rank_val
FROM sales
-- Without a final ORDER BY, MySQL does not guarantee the order of the output rows
ORDER BY region, sale_amount DESC;
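The query above repeats the same window three times, and ROW_NUMBER() numbers Alice and Charlie in whatever order MySQL happens to process them. The variant below is one possible sketch. It names the shared partition once with a WINDOW clause (w is an arbitrary name), extends it with an ORDER BY in each OVER clause, and gives only ROW_NUMBER() the id column as a tiebreaker. Adding id to the RANK() ordering as well would stop Alice and Charlie from tying.

```sql
SELECT
    employee_name,
    region,
    sale_amount,
    -- id breaks the Alice/Charlie tie, so row numbers are repeatable
    ROW_NUMBER() OVER (w ORDER BY sale_amount DESC, id) AS row_num,
    -- No tiebreaker here: equal amounts must keep the same rank
    RANK()       OVER (w ORDER BY sale_amount DESC)     AS rank_val,
    DENSE_RANK() OVER (w ORDER BY sale_amount DESC)     AS dense_rank_val
FROM sales
WINDOW w AS (PARTITION BY region)
ORDER BY region, row_num;

-- Expected output, assuming the setup insert assigned ids 1 to 5:
-- employee_name | region | sale_amount | row_num | rank_val | dense_rank_val
-- Alice         | North  | 5000.00     | 1       | 1        | 1
-- Charlie       | North  | 5000.00     | 2       | 1        | 1
-- Bob           | North  | 4500.00     | 3       | 3        | 2
-- David         | South  | 6000.00     | 1       | 1        | 1
-- Eve           | South  | 5500.00     | 2       | 2        | 2
```

Bob’s row shows the difference between the two ranking functions. RANK() skips to 3 after the two-way tie, while DENSE_RANK() continues with 2.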

Practical Use Case: Deduplicating Data

One of the most practical applications of ROW_NUMBER() is identifying and removing duplicate records. Suppose the same sale was accidentally inserted into sales more than once. A Common Table Expression (CTE) can number the copies of each (employee_name, sale_amount) combination so that the extras stand out.

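WITH numbered_sales AS (
    -- Number the rows within each group of identical (employee_name, sale_amount) pairs;
    -- the row with the lowest id in each group gets occurrence = 1
    SELECT
        id,
        ROW_NUMBER() OVER (PARTITION BY employee_name, sale_amount ORDER BY id) as occurrence
    FROM sales
)
SELECT * FROM numbered_sales WHERE occurrence > 1;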

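In this query, every row with an occurrence greater than 1 is a duplicate: a later copy of a row already seen with a lower id. On the sample data it returns no rows, because no employee has two identical sale amounts. To remove duplicates, turn the SELECT into a multi-table DELETE that joins sales to the numbered rows. Run the SELECT first to confirm exactly what will be deleted.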
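Here is a minimal sketch of that DELETE, assuming the sales table from the setup. It first inserts a hypothetical duplicate of Bob’s sale so that there is something to remove. It then joins sales to a derived table that numbers the rows. MySQL normally refuses to delete from a table that a subquery in the same statement also reads. It allows it here because a derived table containing a window function is materialized rather than merged into the outer query.

```sql
-- Hypothetical accidental double insert, so there is a duplicate to remove
INSERT INTO sales (employee_name, region, sale_amount) VALUES ('Bob', 'North', 4500.00);

-- Multi-table DELETE: keep the lowest id in each (employee_name, sale_amount)
-- group and delete every later copy
DELETE s
FROM sales AS s
JOIN (
    SELECT
        id,
        ROW_NUMBER() OVER (PARTITION BY employee_name, sale_amount ORDER BY id) AS occurrence
    FROM sales
) AS numbered_sales ON numbered_sales.id = s.id
WHERE numbered_sales.occurrence > 1;
```

After the DELETE, only the first copy of each group remains, so sales is back to its original five rows with Bob’s original id intact.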

Performance Considerations

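While window functions are powerful, MySQL evaluates them after the WHERE, GROUP BY, and HAVING clauses (and before the final ORDER BY and LIMIT). Every row that survives filtering is typically sorted by the PARTITION BY and ORDER BY columns and buffered while its window is computed, which becomes expensive on large result sets. The most reliable way to keep these queries fast is to filter as much as possible in WHERE, so fewer rows reach the window step. An index on the PARTITION BY and ORDER BY columns, for example (region, sale_amount), may also let MySQL read rows in the needed order instead of sorting them, but whether it does depends on the query, so confirm with EXPLAIN.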
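The same evaluation order has a practical consequence. A condition on a window result cannot go in the WHERE clause of the query that computes it, and MySQL rejects a window function there. A common workaround, sketched below with the article’s sales table, is to compute the rank in a CTE and filter in the outer query. The names ranked_sales and rank_val are illustrative.

```sql
-- Rejected: the window has not been computed yet when WHERE runs
-- SELECT employee_name FROM sales
-- WHERE RANK() OVER (PARTITION BY region ORDER BY sale_amount DESC) = 1;

-- Works: rank first, then filter on the finished column
WITH ranked_sales AS (
    SELECT
        employee_name,
        region,
        sale_amount,
        RANK() OVER (PARTITION BY region ORDER BY sale_amount DESC) AS rank_val
    FROM sales
)
SELECT employee_name, region, sale_amount
FROM ranked_sales
WHERE rank_val = 1
ORDER BY region, employee_name;
```

On the sample data this returns Alice and Charlie (tied at 5000.00 in North) and David (South). Prefixing a query with EXPLAIN FORMAT=TREE (available from MySQL 8.0.16) shows its plan, which is a practical way to check whether an index changed how the window input is sorted.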

By mastering these functions, you can write cleaner, more efficient SQL that handles complex analytical tasks directly on the database server, reducing the amount of logic you need to write in your application code.