casyup.me@outlook.com

0%

read/UnderstandingGeneratedColumns

Understanding Generated Columns

The Theory

Generated Columns is a feature released on MySQL 5.7. They can be used during CREATE TABLEor ALTER TABLE statements. It is a way of storing data without actually sending it through the INSERT or UPDATE clauses in SQL. The database resolves what the data will be.

生成列是在 MySQL 5.7 之后发布的一个特性, 可以在 CREATE TABLE / ALTER TABLE 期间使用. 是一种存储数据而不通过 INSERT / UPDATE 子句的方法. 数据库解析数据是什么

There are two types of Generated Columns: Virtual and Stored. They work with:

  • mathematical expressions (product_price * quantity)
  • built-in functions (RIGHT(), CONCAT(), FROM_UNIXTIME(), JSON_EXTRACT())
  • literals (“2”, “new”, 0)

有两种生成列: virtual 和 stored, 它们工作于:

  • 数学表达式
  • 内建函数
  • 复合常量

Besides that, they can be indexed but they don’t allow subqueries in it.
A Generated Column works within the table domain. If you need subqueries on a particular column, you may have to look at Views.

如上, 它们能够被索引, 但不允许子查询.

生成列在表域中工作, 如果你需要在特殊的列上子查询, 你需要考虑 view

The basic example

As an example I am going to use an e-commerce database as based on my past experience of what I have seen and worked. You will probably have at least these tables or something similar:

  • users – stores user info
  • products – stores product info like price and description
  • orders – stores the user_id, date of order
  • orders_items – stores product_id, order_id, quantity and price at the time of purchase

This is the whole DB: Gist.

Notice the order_items definition:

1
2
3
4
5
6
7
8
9
10
CREATE TABLE `orders_items` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`order_id` int(10) unsigned NOT NULL,
`product_id` int(10) unsigned NOT NULL,
`product_price` decimal(10,2) unsigned NOT NULL DEFAULT '0.00',
`quantity` int(10) unsigned NOT NULL DEFAULT 1,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` varchar(45) NOT NULL DEFAULT 'CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4;

The retrieval would bring:

1
2
3
4
SELECT
`id`, `order_id`, `product_id`, `product_price`, `quantity`
FROM `orders_items`
LIMIT 5;
id order_id product_id product_price quantity
1 369 1304 202.18 7
2 4 1736 250.40 3
3 270 1404 29.89 5
4 256 179 190.40 10
5 107 1911 146.98 1

One example is to get the total of that order_item row, something like total_item_price that would store the value of product_price * quantity to show how much the summed amount of an item would be. Some databases have the MONEY type to store price, as with MySQL it is recommended to work with DECIMAL.

上述例子获取 order_item 所有行, 应该有类似 total_item_price 的列存储 product * quantity, 显示总额.

一些数据库有 MONEY 类型存储价格, MySQL 推荐使用 DECIMAL

People solve this problem in different ways:

  • store the calculated price on a new column to make it easier to retrieve;
  • create a view;
  • or they calculate in the application itself, which in this case might cause problems due to how the language handles floats. There are libraries to deal with money values in a lot of languages and frameworks, however, the overhead of converting each row into a money object could be costly depending on the amount of data being transferred.

有几种不同的方式解决这个问题:

  • 在新的列存储计算好的值, 便于检索
  • 创建视图
  • 或者在应用层计算, 由于不同语言对于浮点数的管理可能造成问题. 大量的语言和框架提供了库处理这个问题.然而, 将每行转换为对象的消耗依赖于有多少数据被调入

Another way I’ve seen is: people calculate in the query the total amount for the orders_items row as product_price * quantity:

还有一种我见过的方法 : 在查询中计算总额

1
2
3
4
5
6
7
8
9
SELECT
`id`,
`order_id`,
`product_id`,
`product_price`,
`quantity`,
`product_price` * `quantity` AS total_item_price
FROM `orders_items`
LIMIT 5;
id order_id product_id product_price quantity total_item_price
1 369 1304 202.18 7 1415.26
2 4 1736 250.40 3 751.20
3 270 1404 29.89 5 149.45
4 256 179 190.40 10 1904.00
5 107 1911 146.98 1 146.98

Virtual Columns

  • They take no disk space, except when using a Virtual Column as in a Secondary Index.
  • They are an INPLACE operation: it means the table definition is changed without having to recopy all the data again. More info.
  • The values are calculated on the fly during read operations and BEFORE triggers.

Consider using virtual columns for data where changes happens in a significant number of times. The cost of a Virtual Column comes from reading a table constantly and the server has to compute every time what that column value will be.

  • 不占磁盘空间, 除非使用虚拟列作为次级索引
  • 是 INPLACE 操作 : 这意味着表定义更改不需要拷贝所有数据
  • 值在读取操作和 BEFORE 触发之间动态计算

Stored Columns

  • They do use disk space.
  • It has the same cost of adding a new column, so it is a COPY operation
  • Values are updated in every INSERT and UPDATE statement.

You should consider using Stored Columns for when the data doesn’t change significantly or at all after creation, like for instance, the example above with the orders_items table. Once a purchase is made, the price of the product is stored, not being changed, neither the quantity. Considering this information we could create total_item_price as a Stored Column.

  • 耗费磁盘空间
  • 和增加行有相等的消耗, 等价于 COPY 操作
  • 值在每次 INSERT 和 UPDATE 语句中更新

你需要考虑使用存储列, 以便数据没有显著变化或在创建后没有显著变化, 例如上面带有 order_items 的表. 当购买操作后, 存储产品的价格, 不会再更改, 数量也不会. 考虑到这个信息, 我们应该创建 total_item_price 存储列

The code

Creating a table

1
`-- Virtual Column` `CREATE` `TABLE` ``orders_items` (```id` ``int``(10) unsigned ``NOT` `NULL` `AUTO_INCREMENT,```order_id` ``int``(10) unsigned ``NOT` `NULL``,```product_id` ``int``(10) unsigned ``NOT` `NULL``,```product_price` ``decimal``(10,2) unsigned ``NOT` `NULL` `DEFAULT` `'0.00'``,```quantity` ``int``(10) unsigned ``NOT` `NULL` `DEFAULT` `1,```total_item_price` ``decimal``(10,2) ``AS` `(`quantity` * `product_price`),```created_at` ``timestamp` `NOT` `NULL` `DEFAULT` `CURRENT_TIMESTAMP``,```updated_at` ``varchar``(45) ``NOT` `NULL` `DEFAULT` `'CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP'``,``PRIMARY` `KEY` `(`id`)``) ENGINE=InnoDB AUTO_INCREMENT=1 ``DEFAULT` `CHARSET=utf8mb4;` `-- Stored Column` `CREATE` `TABLE` ``orders_items` (```id` ``int``(10) unsigned ``NOT` `NULL` `AUTO_INCREMENT,```order_id` ``int``(10) unsigned ``NOT` `NULL``,```product_id` ``int``(10) unsigned ``NOT` `NULL``,```product_price` ``decimal``(10,2) unsigned ``NOT` `NULL` `DEFAULT` `'0.00'``,```quantity` ``int``(10) unsigned ``NOT` `NULL` `DEFAULT` `1,```total_item_price` ``decimal``(10,2) ``AS` `(`quantity` * `product_price`) STORED,```created_at` ``timestamp` `NOT` `NULL` `DEFAULT` `CURRENT_TIMESTAMP``,```updated_at` ``varchar``(45) ``NOT` `NULL` `DEFAULT` `'CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP'``,``PRIMARY` `KEY` `(`id`)``) ENGINE=InnoDB AUTO_INCREMENT=1 ``DEFAULT` `CHARSET=utf8mb4;`

Notice how the definition changes on line 9 and 23: you have another keyword, AS, then an expression and specifically on line 23 you see a STORED keyword. In both lines they are generated columns, if nothing is specified will be a VIRTUAL column.

注意在 9 行 和 23 行的更改, 有另一个关键字, AS, 随之表达式和 STORED 关键字. 如果没有指定, 将会是一个 VIRTUAL 列

Altering a table

It uses the same syntax as adding a column, just adding the “AS (expression)” after the data type:

1
`-- `full_name` as VIRTUAL COLUMN``ALTER` `TABLE` `users``ADD` `COLUMN` ``full_name` ``VARCHAR``(500)``AS` `(CONCAT_WS(``" "``, `first_name`, `last_name`));` `-- `total_item_price` as STORED COLUMN``ALTER` `TABLE` `orders_items``ADD` `COLUMN` ``total_item_price` ``DECIMAL``(10, 2)``AS` `(`quantity` * `product_price`) STORED;`

JSON fields

It is also possible to extract data from JSON fields using generated columns. As the functions for JSON are built-in, JSON_EXTRACT and JSON_UNQUOTE as well “->” and “->>” work as expressionsfor a generated column:

1
`-- Stored Columns``ALTER` `TABLE` ``twitter_users```ADD` `COLUMN` ``location` ``VARCHAR``(255)``AS` `(response->>``"$.location"``) STORED;`

Final considerations

When the type is STORED, it must be specified after the expression otherwise the default behaviour will be to be VIRTUAL.

Generated columns can have indexes created as the following, no matter if stored, virtual or extracted from a JSON field:

1
`ALTER` `TABLE` `users``ADD` `INDEX` ``ix_full_name` (`full_name`);`

Which is the same syntax for normal columns.