mysql database official logo

Speed Up Complex Queries: Using MySQL Generated Columns for Faster Indexing

The Problem with Complex WHERE Clauses

In MySQL, indexes are the backbone of performance. However, traditional indexes only work when you query a column directly. As soon as you wrap a column in a function—like WHERE YEAR(created_at) = 2023—MySQL usually abandons the index and performs a full table scan. This happens because the database can't pre-calculate the result of the function for every row in a standard B-Tree index.

Historically, developers solved this by creating a physical column, calculating the value in their application code, and saving it. This approach is brittle and prone to data desync. MySQL Generated Columns (introduced in 5.7) offer a much cleaner, built-in solution.

What are Generated Columns?

A generated column is a column whose value is automatically calculated from an expression involving other columns in the same table. There are two types:

  • VIRTUAL: The values are not stored on disk. They are calculated on the fly when the row is read. However, you can still create an index on a virtual column.
  • STORED: The values are calculated and stored on disk whenever a row is inserted or updated. This uses more disk space but can be slightly faster for reads if the calculation is extremely heavy.

Example 1: Indexing a Calculation

Imagine an e-commerce platform where you frequently filter products by their discounted price. Instead of calculating the discount in every query, you can define a generated column.

ALTER TABLE products 
ADD COLUMN discount_price DECIMAL(10,2) 
AS (original_price * (1 - discount_percentage / 100)) VIRTUAL;

-- Now, add an index to the generated column
CREATE INDEX idx_discount_price ON products(discount_price);

Now, when you run SELECT * FROM products WHERE discount_price < 50;, MySQL uses the index idx_discount_price, making the query nearly instantaneous even with millions of rows.

Example 2: Indexing JSON Data

One of the most powerful use cases for generated columns is indexing specific keys within a JSON blob. Since you cannot directly index a JSON field, a virtual generated column acts as a bridge.

-- Suppose you have a 'settings' JSON column
ALTER TABLE users 
ADD COLUMN theme_preference VARCHAR(20) 
AS (settings->>'$.theme') VIRTUAL;

CREATE INDEX idx_theme ON users(theme_preference);

In this snippet, ->> is the inline path operator that extracts the value and unquotes it. By indexing theme_preference, you can query users based on their JSON settings with the speed of a standard relational table.

When to Use Virtual vs. Stored

In 90% of cases, VIRTUAL is the correct choice. Because MySQL allows you to index virtual columns, the index itself is stored on disk and updated automatically. The "virtual" nature only applies to the column's presence in the actual data rows. Using virtual columns saves disk space and reduces the I/O overhead during write operations.

Use STORED only if the expression is computationally expensive to evaluate or if the column needs to be accessed by a tool that doesn't support virtual columns. For most web applications, virtual columns provide the best balance of performance and efficiency.