Understanding Generated Columns
The Theory
Generated Columns is a feature released on MySQL 5.7. They can be used during CREATE TABLE
or ALTER TABLE
statements. It is a way of storing data without actually sending it through the INSERT
or UPDATE
clauses in SQL. The database resolves what the data will be.
生成列是在 MySQL 5.7 之后发布的一个特性, 可以在 CREATE TABLE / ALTER TABLE 期间使用. 是一种存储数据而不通过 INSERT / UPDATE 子句的方法. 数据库解析数据是什么
There are two types of Generated Columns: Virtual and Stored. They work with:
- mathematical expressions (
product_price
*quantity
) - built-in functions (
RIGHT()
,CONCAT()
,FROM_UNIXTIME()
,JSON_EXTRACT()
) - literals (“2”, “new”, 0)
有两种生成列: virtual 和 stored, 它们工作于:
- 数学表达式
- 内建函数
- 复合常量
Besides that, they can be indexed but they don’t allow subqueries in it.
A Generated Column works within the table domain. If you need subqueries on a particular column, you may have to look at Views.
如上, 它们能够被索引, 但不允许子查询.
生成列在表域中工作, 如果你需要在特殊的列上子查询, 你需要考虑 view
The basic example
As an example I am going to use an e-commerce database as based on my past experience of what I have seen and worked. You will probably have at least these tables or something similar:
users
– stores user infoproducts
– stores product info like price and descriptionorders
– stores the user_id, date of orderorders_items
– stores product_id, order_id, quantity and price at the time of purchase
This is the whole DB: Gist.
Notice the order_items
definition:
1 | CREATE TABLE `orders_items` ( |
The retrieval would bring:
1 | SELECT |
id | order_id | product_id | product_price | quantity |
---|---|---|---|---|
1 | 369 | 1304 | 202.18 | 7 |
2 | 4 | 1736 | 250.40 | 3 |
3 | 270 | 1404 | 29.89 | 5 |
4 | 256 | 179 | 190.40 | 10 |
5 | 107 | 1911 | 146.98 | 1 |
One example is to get the total of that order_item row, something like total_item_price
that would store the value of product_price
* quantity
to show how much the summed amount of an item would be. Some databases have the MONEY
type to store price
, as with MySQL it is recommended to work with DECIMAL
.
上述例子获取 order_item 所有行, 应该有类似 total_item_price 的列存储 product * quantity, 显示总额.
一些数据库有 MONEY 类型存储价格, MySQL 推荐使用 DECIMAL
People solve this problem in different ways:
- store the calculated price on a new column to make it easier to retrieve;
- create a view;
- or they calculate in the application itself, which in this case might cause problems due to how the language handles floats. There are libraries to deal with money values in a lot of languages and frameworks, however, the overhead of converting each row into a money object could be costly depending on the amount of data being transferred.
有几种不同的方式解决这个问题:
- 在新的列存储计算好的值, 便于检索
- 创建视图
- 或者在应用层计算, 由于不同语言对于浮点数的管理可能造成问题. 大量的语言和框架提供了库处理这个问题.然而, 将每行转换为对象的消耗依赖于有多少数据被调入
Another way I’ve seen is: people calculate in the query the total amount for the orders_items row as product_price
* quantity
:
还有一种我见过的方法 : 在查询中计算总额
1 | SELECT |
id | order_id | product_id | product_price | quantity | total_item_price |
---|---|---|---|---|---|
1 | 369 | 1304 | 202.18 | 7 | 1415.26 |
2 | 4 | 1736 | 250.40 | 3 | 751.20 |
3 | 270 | 1404 | 29.89 | 5 | 149.45 |
4 | 256 | 179 | 190.40 | 10 | 1904.00 |
5 | 107 | 1911 | 146.98 | 1 | 146.98 |
Virtual Columns
- They take no disk space, except when using a Virtual Column as in a Secondary Index.
- They are an
INPLACE
operation: it means the table definition is changed without having to recopy all the data again. More info. - The values are calculated on the fly during read operations and
BEFORE
triggers.
Consider using virtual columns for data where changes happens in a significant number of times. The cost of a Virtual Column comes from reading a table constantly and the server has to compute every time what that column value will be.
- 不占磁盘空间, 除非使用虚拟列作为次级索引
- 是 INPLACE 操作 : 这意味着表定义更改不需要拷贝所有数据
- 值在读取操作和 BEFORE 触发之间动态计算
Stored Columns
- They do use disk space.
- It has the same cost of adding a new column, so it is a
COPY
operation - Values are updated in every
INSERT
andUPDATE
statement.
You should consider using Stored Columns for when the data doesn’t change significantly or at all after creation, like for instance, the example above with the orders_items
table. Once a purchase is made, the price of the product is stored, not being changed, neither the quantity. Considering this information we could create total_item_price
as a Stored Column.
- 耗费磁盘空间
- 和增加行有相等的消耗, 等价于 COPY 操作
- 值在每次 INSERT 和 UPDATE 语句中更新
你需要考虑使用存储列, 以便数据没有显著变化或在创建后没有显著变化, 例如上面带有 order_items 的表. 当购买操作后, 存储产品的价格, 不会再更改, 数量也不会. 考虑到这个信息, 我们应该创建 total_item_price 存储列
The code
Creating a table
1 | `-- Virtual Column` `CREATE` `TABLE` ``orders_items` (```id` ``int``(10) unsigned ``NOT` `NULL` `AUTO_INCREMENT,```order_id` ``int``(10) unsigned ``NOT` `NULL``,```product_id` ``int``(10) unsigned ``NOT` `NULL``,```product_price` ``decimal``(10,2) unsigned ``NOT` `NULL` `DEFAULT` `'0.00'``,```quantity` ``int``(10) unsigned ``NOT` `NULL` `DEFAULT` `1,```total_item_price` ``decimal``(10,2) ``AS` `(`quantity` * `product_price`),```created_at` ``timestamp` `NOT` `NULL` `DEFAULT` `CURRENT_TIMESTAMP``,```updated_at` ``varchar``(45) ``NOT` `NULL` `DEFAULT` `'CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP'``,``PRIMARY` `KEY` `(`id`)``) ENGINE=InnoDB AUTO_INCREMENT=1 ``DEFAULT` `CHARSET=utf8mb4;` `-- Stored Column` `CREATE` `TABLE` ``orders_items` (```id` ``int``(10) unsigned ``NOT` `NULL` `AUTO_INCREMENT,```order_id` ``int``(10) unsigned ``NOT` `NULL``,```product_id` ``int``(10) unsigned ``NOT` `NULL``,```product_price` ``decimal``(10,2) unsigned ``NOT` `NULL` `DEFAULT` `'0.00'``,```quantity` ``int``(10) unsigned ``NOT` `NULL` `DEFAULT` `1,```total_item_price` ``decimal``(10,2) ``AS` `(`quantity` * `product_price`) STORED,```created_at` ``timestamp` `NOT` `NULL` `DEFAULT` `CURRENT_TIMESTAMP``,```updated_at` ``varchar``(45) ``NOT` `NULL` `DEFAULT` `'CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP'``,``PRIMARY` `KEY` `(`id`)``) ENGINE=InnoDB AUTO_INCREMENT=1 ``DEFAULT` `CHARSET=utf8mb4;` |
Notice how the definition changes on line 9 and 23: you have another keyword, AS
, then an expression and specifically on line 23 you see a STORED
keyword. In both lines they are generated columns, if nothing is specified will be a VIRTUAL
column.
注意在 9 行 和 23 行的更改, 有另一个关键字, AS, 随之表达式和 STORED 关键字. 如果没有指定, 将会是一个 VIRTUAL 列
Altering a table
It uses the same syntax as adding a column, just adding the “AS (expression)” after the data type:
1 | `-- `full_name` as VIRTUAL COLUMN``ALTER` `TABLE` `users``ADD` `COLUMN` ``full_name` ``VARCHAR``(500)``AS` `(CONCAT_WS(``" "``, `first_name`, `last_name`));` `-- `total_item_price` as STORED COLUMN``ALTER` `TABLE` `orders_items``ADD` `COLUMN` ``total_item_price` ``DECIMAL``(10, 2)``AS` `(`quantity` * `product_price`) STORED;` |
JSON fields
It is also possible to extract data from JSON fields using generated columns. As the functions for JSON are built-in, JSON_EXTRACT
and JSON_UNQUOTE
as well “->” and “->>” work as expressionsfor a generated column:
1 | `-- Stored Columns``ALTER` `TABLE` ``twitter_users```ADD` `COLUMN` ``location` ``VARCHAR``(255)``AS` `(response->>``"$.location"``) STORED;` |
Final considerations
When the type is STORED
, it must be specified after the expression otherwise the default behaviour will be to be VIRTUAL
.
Generated columns can have indexes created as the following, no matter if stored, virtual or extracted from a JSON field:
1 | `ALTER` `TABLE` `users``ADD` `INDEX` ``ix_full_name` (`full_name`);` |
Which is the same syntax for normal columns.