15.10 InnoDB Row Formats
The row format of a table determines how its rows are physically stored, which in turn can affect the performance of queries and DML operations. As more rows fit into a single disk page, queries and index lookups can work faster, less cache memory is required in the buffer pool, and less I/O is required to write out updated values.
The data in each table is divided into pages. The pages that make up each table are arranged in a tree data structure called a B-tree index. Table data and secondary indexes both use this type of structure. The B-tree index that represents an entire table is known as the clustered index, which is organized according to the primary key columns. The nodes of a clustered index data structure contain the values of all columns in the row. The nodes of a secondary index structure contain the values of index columns and primary key columns.
(PS. A secondary index works much like a second-level pointer: its entries hold the primary key, which is then used to find the full row in the clustered index.)
Variable-length columns are an exception to the rule that column values are stored in B-tree index nodes. Variable-length columns that are too long to fit on a B-tree page are stored on separately allocated disk pages called overflow pages. Such columns are referred to as off-page columns. The values of off-page columns are stored in singly-linked lists of overflow pages, with each such column having its own list of one or more overflow pages. Depending on column length, all or a prefix of variable-length column values are stored in the B-tree to avoid wasting storage and having to read a separate page.
The InnoDB storage engine supports four row formats: REDUNDANT, COMPACT, DYNAMIC, and COMPRESSED.
Table 15.16 InnoDB Row Format Overview
Row Format | Compact Storage Characteristics | Enhanced Variable-Length Column Storage | Large Index Key Prefix Support | Compression Support | Supported Tablespace Types |
---|---|---|---|---|---|
REDUNDANT | No | No | No | No | system, file-per-table, general |
COMPACT | Yes | No | No | No | system, file-per-table, general |
DYNAMIC | Yes | Yes | Yes | No | system, file-per-table, general |
COMPRESSED | Yes | Yes | Yes | Yes | file-per-table, general |
The topics that follow describe row format storage characteristics and how to define and determine the row format of a table.
- REDUNDANT Row Format
- COMPACT Row Format
- DYNAMIC Row Format
- COMPRESSED Row Format
- Defining the Row Format of a Table
- Determining the Row Format of a Table
REDUNDANT Row Format
Tables that use the REDUNDANT row format store the first 768 bytes of variable-length column values (VARCHAR, VARBINARY, and BLOB and TEXT types) in the index record within the B-tree node, with the remainder stored on overflow pages. Fixed-length columns greater than or equal to 768 bytes are encoded as variable-length columns, which can be stored off-page. For example, a CHAR(255) column can exceed 768 bytes if the maximum byte length of the character set is greater than 3, as it is with utf8mb4.
If the value of a column is 768 bytes or less, an overflow page is not used, and some savings in I/O may result, since the value is stored entirely in the B-tree node. This works well for relatively short BLOB column values, but may cause B-tree nodes to fill with data rather than key values, reducing their efficiency. Tables with many BLOB columns could cause B-tree nodes to become too full, and contain too few rows, making the entire index less efficient than if rows were shorter or column values were stored off-page.
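The 768-byte prefix rule can be sketched in a few lines of Python. This is only a simplified model of the description above, not InnoDB's actual logic: the function names are made up for illustration, the 20-byte pointer size is taken from the COMPACT storage characteristics later in this section, and the sketch assumes every value longer than 768 bytes is split, whereas the real engine only moves data off-page when the row does not fit.

```python
# Simplified model of the 768-byte prefix rule (REDUNDANT and COMPACT).
# All names are illustrative; they are not part of any MySQL API.
INLINE_PREFIX_BYTES = 768    # prefix kept in the B-tree record
OVERFLOW_POINTER_BYTES = 20  # pointer to the off-page remainder


def split_value(value_len):
    """Return (bytes kept in the index record, bytes on overflow pages)."""
    if value_len <= INLINE_PREFIX_BYTES:
        return value_len, 0
    in_record = INLINE_PREFIX_BYTES + OVERFLOW_POINTER_BYTES
    return in_record, value_len - INLINE_PREFIX_BYTES


def char_treated_as_variable_length(n_chars, max_bytes_per_char):
    """A fixed-length column of 768 bytes or more is encoded as variable-length."""
    return n_chars * max_bytes_per_char >= INLINE_PREFIX_BYTES
```

Under these assumptions a 1000-byte value keeps 788 bytes in the record (768 + 20) and places 232 bytes on overflow pages, while CHAR(255) crosses the threshold with a 4-byte character set (1020 bytes) but not with a 3-byte one (765 bytes).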
REDUNDANT Row Format Storage Characteristics
The REDUNDANT row format has the following storage characteristics:
- Each index record contains a 6-byte header. The header is used to link together consecutive records, and for row-level locking.
- Records in the clustered index contain fields for all user-defined columns. In addition, there is a 6-byte transaction ID field and a 7-byte roll pointer field.
- If no primary key is defined for a table, each clustered index record also contains a 6-byte row ID field.
- Each secondary index record contains all the primary key columns defined for the clustered index key that are not in the secondary index.
- A record contains a pointer to each field of the record. If the total length of the fields in a record is less than 128 bytes, the pointer is one byte; otherwise, two bytes. The array of pointers is called the record directory. The area where the pointers point is the data part of the record. (PS. "Pointer" here probably does not mean a memory pointer; it reads more like an offset within the record.)
- Internally, fixed-length character columns such as CHAR(10) are stored in fixed-length format. Trailing spaces are not truncated from VARCHAR columns.
- Fixed-length columns greater than or equal to 768 bytes are encoded as variable-length columns, which can be stored off-page. For example, a CHAR(255) column can exceed 768 bytes if the maximum byte length of the character set is greater than 3, as it is with utf8mb4.
- An SQL NULL value reserves one or two bytes in the record directory. An SQL NULL value reserves zero bytes in the data part of the record if stored in a variable-length column. For a fixed-length column, the fixed length of the column is reserved in the data part of the record. Reserving fixed space for NULL values permits columns to be updated in place from NULL to non-NULL values without causing index page fragmentation.
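To make the record directory rule concrete, here is a rough Python sketch that adds up the size of a REDUNDANT record from the numbers listed above. The function names are hypothetical, and the model assumes the caller passes the byte length of every field, including system fields such as the transaction ID and roll pointer.

```python
# Rough size model for a REDUNDANT record, based only on the list above.
# Function names are illustrative, not part of MySQL.
REDUNDANT_HEADER_BYTES = 6


def record_directory_bytes(field_lengths):
    """One pointer per field: 1 byte each if all fields total < 128 bytes, else 2."""
    pointer_size = 1 if sum(field_lengths) < 128 else 2
    return pointer_size * len(field_lengths)


def redundant_record_bytes(field_lengths):
    """Header + record directory + data part."""
    return (REDUNDANT_HEADER_BYTES
            + record_directory_bytes(field_lengths)
            + sum(field_lengths))
```

For a clustered index record with a 4-byte INT key, the 6-byte transaction ID, the 7-byte roll pointer, and a 10-byte value, the model gives 6 + 4 + 27 = 37 bytes. Replacing the 10-byte value with a 200-byte one pushes the total field length past 128 bytes, so each of the four pointers grows to two bytes.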
COMPACT Row Format
Tables that use the COMPACT row format store the first 768 bytes of variable-length column values (VARCHAR, VARBINARY, and BLOB and TEXT types) in the index record within the B-tree node, with the remainder stored on overflow pages. Fixed-length columns greater than or equal to 768 bytes are encoded as variable-length columns, which can be stored off-page. For example, a CHAR(255) column can exceed 768 bytes if the maximum byte length of the character set is greater than 3, as it is with utf8mb4.
(PS. This is identical to the REDUNDANT description.)
If the value of a column is 768 bytes or less, an overflow page is not used, and some savings in I/O may result, since the value is stored entirely in the B-tree node. This works well for relatively short BLOB column values, but may cause B-tree nodes to fill with data rather than key values, reducing their efficiency. Tables with many BLOB columns could cause B-tree nodes to become too full, and contain too few rows, making the entire index less efficient than if rows were shorter or column values were stored off-page.
(PS. Again, the same as for REDUNDANT.)
COMPACT Row Format Storage Characteristics
The COMPACT row format has the following storage characteristics:
- Each index record contains a 5-byte header that may be preceded by a variable-length header. The header is used to link together consecutive records, and for row-level locking.
- The variable-length part of the record header contains a bit vector for indicating NULL columns. If the number of columns in the index that can be NULL is N, the bit vector occupies CEILING(N/8) bytes. (For example, if there are anywhere from 9 to 16 columns that can be NULL, the bit vector uses two bytes.) Columns that are NULL do not occupy space other than the bit in this vector. The variable-length part of the header also contains the lengths of variable-length columns. Each length takes one or two bytes, depending on the maximum length of the column. If all columns in the index are NOT NULL and have a fixed length, the record header has no variable-length part.
- For each non-NULL variable-length field, the record header contains the length of the column in one or two bytes. Two bytes are only needed if part of the column is stored externally in overflow pages or the maximum length exceeds 255 bytes and the actual length exceeds 127 bytes. For an externally stored column, the 2-byte length indicates the length of the internally stored part plus the 20-byte pointer to the externally stored part. The internal part is 768 bytes, so the length is 768+20. The 20-byte pointer stores the true length of the column. (PS. Something seems slightly off in this description.)
- The record header is followed by the data contents of non-NULL columns.
- Records in the clustered index contain fields for all user-defined columns. In addition, there is a 6-byte transaction ID field and a 7-byte roll pointer field.
- If no primary key is defined for a table, each clustered index record also contains a 6-byte row ID field.
- Each secondary index record contains all the primary key columns defined for the clustered index key that are not in the secondary index. If any of the primary key columns are variable length, the record header for each secondary index has a variable-length part to record their lengths, even if the secondary index is defined on fixed-length columns.
- Internally, for nonvariable-length character sets, fixed-length character columns such as CHAR(10) are stored in a fixed-length format. Trailing spaces are not truncated from VARCHAR columns.
- Internally, for variable-length character sets such as utf8mb3 and utf8mb4, InnoDB attempts to store CHAR(N) in N bytes by trimming trailing spaces. If the byte length of a CHAR(N) column value exceeds N bytes, trailing spaces are trimmed to a minimum of the column value byte length. The maximum length of a CHAR(N) column is the maximum character byte length × N.
- A minimum of N bytes is reserved for CHAR(N). Reserving the minimum space N in many cases enables column updates to be done in place without causing index page fragmentation. By comparison, CHAR(N) columns occupy the maximum character byte length × N when using the REDUNDANT row format.
- Fixed-length columns greater than or equal to 768 bytes are encoded as variable-length fields, which can be stored off-page. For example, a CHAR(255) column can exceed 768 bytes if the maximum byte length of the character set is greater than 3, as it is with utf8mb4. (PS. This one looks the same as in REDUNDANT.)
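The header rules in the list above reduce to a small calculation. The following Python sketch is an illustrative reading of those rules, with hypothetical function names; it covers only the 5-byte fixed header, the NULL bit vector, and the per-field length bytes.

```python
import math

# Illustrative model of the COMPACT record header size, per the list above.
COMPACT_FIXED_HEADER_BYTES = 5


def null_bitmap_bytes(nullable_columns):
    """CEILING(N/8) bytes for N columns that can be NULL."""
    return math.ceil(nullable_columns / 8)


def length_bytes(max_len, actual_len, stored_externally=False):
    """1 or 2 length bytes for one non-NULL variable-length field."""
    if stored_externally or (max_len > 255 and actual_len > 127):
        return 2
    return 1


def compact_header_bytes(nullable_columns, var_fields):
    """var_fields: (max_len, actual_len, stored_externally) per non-NULL variable-length field."""
    return (COMPACT_FIXED_HEADER_BYTES
            + null_bitmap_bytes(nullable_columns)
            + sum(length_bytes(*field) for field in var_fields))
```

For example, an index with 3 nullable columns and two non-NULL variable-length fields, one VARCHAR with a 1000-byte maximum holding 200 bytes and one with a 50-byte maximum holding 10 bytes, would have a 5 + 1 + 2 + 1 = 9 byte header under this model.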
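(PS. There are two main differences between REDUNDANT and COMPACT: 1. COMPACT does not store the lengths of fixed-length fields. 2. COMPACT trims trailing spaces. That is why REDUNDANT is called REDUNDANT and COMPACT is called COMPACT; REDUNDANT trades space for efficiency.)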
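The CHAR(N) behaviour described for variable-length character sets can be approximated as follows. This is a hedged sketch, not InnoDB code: the function names are invented, and Python's UTF-8 encoder stands in for utf8mb4, which uses the same 1 to 4 byte encoding.

```python
# Approximate storage of a CHAR(n) value, following the description above.
# Names are illustrative only.

def compact_char_bytes(value, n):
    """COMPACT: trim trailing spaces, but never go below the reserved n bytes."""
    trimmed = value.rstrip(" ")
    return max(n, len(trimmed.encode("utf-8")))


def redundant_char_bytes(n, max_bytes_per_char=4):
    """REDUNDANT: always the maximum character byte length times n."""
    return n * max_bytes_per_char
```

With these assumptions, an ASCII value in a utf8mb4 CHAR(10) column takes 10 bytes under COMPACT and 40 bytes under REDUNDANT, while a value made of four 3-byte characters in CHAR(4) takes 12 bytes under COMPACT.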
DYNAMIC Row Format
When a table is created with ROW_FORMAT=DYNAMIC, InnoDB can store long variable-length column values (for VARCHAR, VARBINARY, and BLOB and TEXT types) fully off-page, with the clustered index record containing only a 20-byte pointer to the overflow page. Fixed-length fields greater than or equal to 768 bytes are encoded as variable-length fields. For example, a CHAR(255) column can exceed 768 bytes if the maximum byte length of the character set is greater than 3, as it is with utf8mb4.
(PS. The CHAR(255) example is the same as for the other formats.)
Whether columns are stored off-page depends on the page size and the total size of the row. When a row is too long, the longest columns are chosen for off-page storage until the clustered index record fits on the B-tree page. TEXT and BLOB columns that are less than or equal to 40 bytes are stored in line.
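The selection rule just described, moving the longest columns off-page until the record fits, is essentially a greedy loop. The Python sketch below is a simplification under stated assumptions: `max_record_bytes` is a hypothetical parameter standing in for whatever limit the page size imposes, only variable-length columns are modelled, and each off-page column is assumed to leave behind just the 20-byte pointer.

```python
# Greedy sketch of DYNAMIC off-page selection. Not InnoDB's real algorithm;
# max_record_bytes is a made-up parameter for the page-dependent limit.
POINTER_BYTES = 20     # what remains in the record for an off-page column
INLINE_MAX_BYTES = 40  # values at or below this size always stay inline


def choose_off_page(column_lengths, max_record_bytes):
    """column_lengths: dict of column name -> value length in bytes.

    Returns (set of column names moved off-page, resulting record size).
    """
    inline = dict(column_lengths)
    off_page = set()

    def record_size():
        return sum(inline.values()) + POINTER_BYTES * len(off_page)

    while record_size() > max_record_bytes:
        candidates = {k: v for k, v in inline.items() if v > INLINE_MAX_BYTES}
        if not candidates:
            break  # nothing left to move; the row is simply too large
        longest = max(candidates, key=candidates.get)
        off_page.add(longest)
        del inline[longest]
    return off_page, record_size()
```

With columns of 100, 9000, 5000, and 30 bytes and a limit of 8000 bytes, only the 9000-byte column moves off-page, leaving a 5150-byte record. The shorter columns stay in the B-tree node, which matches the intent described in the next paragraph.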
The DYNAMIC row format maintains the efficiency of storing the entire row in the index node if it fits (as do the COMPACT and REDUNDANT formats), but the DYNAMIC row format avoids the problem of filling B-tree nodes with a large number of data bytes of long columns. The DYNAMIC row format is based on the idea that if a portion of a long data value is stored off-page, it is usually most efficient to store the entire value off-page. With DYNAMIC format, shorter columns are likely to remain in the B-tree node, minimizing the number of overflow pages required for a given row.
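The DYNAMIC row format supports index key prefixes up to 3072 bytes.
(PS. That is surprisingly large. On MySQL versions that still have the innodb_large_prefix variable, it must be enabled for this; the variable was removed in MySQL 8.0, where the larger limit always applies.)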
Tables that use the DYNAMIC row format can be stored in the system tablespace, file-per-table tablespaces, and general tablespaces. To store DYNAMIC tables in the system tablespace, either disable innodb_file_per_table and use a regular CREATE TABLE or ALTER TABLE statement, or use the TABLESPACE [=] innodb_system table option with CREATE TABLE or ALTER TABLE. The innodb_file_per_table variable is not applicable to general tablespaces, nor is it applicable when using the TABLESPACE [=] innodb_system table option to store DYNAMIC tables in the system tablespace.
DYNAMIC Row Format Storage Characteristics
The DYNAMIC row format is a variation of the COMPACT row format. For storage characteristics, see COMPACT Row Format Storage Characteristics.
COMPRESSED Row Format
The COMPRESSED row format uses similar internal details for off-page storage as the DYNAMIC row format, with additional storage and performance considerations from the table and index data being compressed and using smaller page sizes. With the COMPRESSED row format, the KEY_BLOCK_SIZE option controls how much column data is stored in the clustered index, and how much is placed on overflow pages. For more information about the COMPRESSED row format, see Section 15.9, “InnoDB Table and Page Compression”.
The COMPRESSED row format supports index key prefixes up to 3072 bytes.
Tables that use the COMPRESSED row format can be created in file-per-table tablespaces or general tablespaces. The system tablespace does not support the COMPRESSED row format. To store a COMPRESSED table in a file-per-table tablespace, the innodb_file_per_table variable must be enabled. The innodb_file_per_table variable is not applicable to general tablespaces. General tablespaces support all row formats with the caveat that compressed and uncompressed tables cannot coexist in the same general tablespace due to different physical page sizes. For more information, see Section 15.6.3.3, “General Tablespaces”.
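The tablespace rules from Table 15.16 and the paragraph above can be captured as a lookup plus one compatibility check. This Python sketch is illustrative only: the names are invented, and the coexistence check models just the compressed versus uncompressed caveat stated above, ignoring any further restrictions such as differing KEY_BLOCK_SIZE values.

```python
# Lookup of supported tablespace types per row format (from Table 15.16).
# Names are illustrative; this is not a MySQL API.
SUPPORTED_TABLESPACES = {
    "REDUNDANT": {"system", "file-per-table", "general"},
    "COMPACT": {"system", "file-per-table", "general"},
    "DYNAMIC": {"system", "file-per-table", "general"},
    "COMPRESSED": {"file-per-table", "general"},
}


def can_store(row_format, tablespace_type):
    """True if the row format is supported in that kind of tablespace."""
    return tablespace_type in SUPPORTED_TABLESPACES[row_format]


def can_share_general_tablespace(existing_formats, new_format):
    """Compressed and uncompressed tables cannot coexist in one general tablespace."""
    new_is_compressed = new_format == "COMPRESSED"
    return all((fmt == "COMPRESSED") == new_is_compressed
               for fmt in existing_formats)
```

For instance, `can_store("COMPRESSED", "system")` is false, and a general tablespace that already holds DYNAMIC and COMPACT tables would reject a COMPRESSED one under this model.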
COMPRESSED Row Format Storage Characteristics
The COMPRESSED row format is a variation of the COMPACT row format. For storage characteristics, see COMPACT Row Format Storage Characteristics.
Defining the Row Format of a Table
The default row format for InnoDB tables is defined by the innodb_default_row_format variable, which has a default value of DYNAMIC. The default row format is used when the ROW_FORMAT table option is not defined explicitly or when ROW_FORMAT=DEFAULT is specified.
The row format of a table can be defined explicitly using the ROW_FORMAT table option in a CREATE TABLE or ALTER TABLE statement. For example:

    CREATE TABLE t1 (c1 INT) ROW_FORMAT=DYNAMIC;

An explicitly defined ROW_FORMAT setting overrides the default row format. Specifying ROW_FORMAT=DEFAULT is equivalent to using the implicit default.
The innodb_default_row_format variable can be set dynamically:

    mysql> SET GLOBAL innodb_default_row_format=DYNAMIC;

Valid innodb_default_row_format options include DYNAMIC, COMPACT, and REDUNDANT. The COMPRESSED row format, which is not supported for use in the system tablespace, cannot be defined as the default. It can only be specified explicitly in a CREATE TABLE or ALTER TABLE statement. Attempting to set the innodb_default_row_format variable to COMPRESSED returns an error:

    mysql> SET GLOBAL innodb_default_row_format=COMPRESSED;

Newly created tables use the row format defined by the innodb_default_row_format variable when a ROW_FORMAT option is not specified explicitly, or when ROW_FORMAT=DEFAULT is used. For example, the following CREATE TABLE statements use the row format defined by the innodb_default_row_format variable.

    CREATE TABLE t1 (c1 INT);
    CREATE TABLE t2 (c1 INT) ROW_FORMAT=DEFAULT;

When a ROW_FORMAT option is not specified explicitly, or when ROW_FORMAT=DEFAULT is used, an operation that rebuilds a table silently changes the row format of the table to the format defined by the innodb_default_row_format variable.
Table-rebuilding operations include ALTER TABLE operations that use ALGORITHM=COPY or ALGORITHM=INPLACE where table rebuilding is required. See Section 15.12.1, “Online DDL Operations” for more information. OPTIMIZE TABLE is also a table-rebuilding operation.
The following example demonstrates a table-rebuilding operation that silently changes the row format of a table created without an explicitly defined row format.

    mysql> SELECT @@innodb_default_row_format;

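(PS. Pay special attention to this case: an operation that alters the table this way effectively re-creates it, so any option that was left at its default may since have changed.)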
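The interaction between the server default, an explicit ROW_FORMAT, and a later rebuild can be modelled with a toy class. This Python sketch is purely illustrative: `Server`, `create_table`, and `rebuild` are invented names, and tables are plain dictionaries rather than anything resembling InnoDB metadata.

```python
# Toy model of how the default row format is resolved and how a rebuild
# silently changes it. All names here are hypothetical.
VALID_DEFAULTS = {"DYNAMIC", "COMPACT", "REDUNDANT"}  # COMPRESSED is not allowed


class Server:
    def __init__(self, default_row_format="DYNAMIC"):
        self.default_row_format = None
        self.set_default(default_row_format)

    def set_default(self, row_format):
        if row_format not in VALID_DEFAULTS:
            raise ValueError(f"{row_format} cannot be the default row format")
        self.default_row_format = row_format

    def create_table(self, row_format=None):
        # No option or ROW_FORMAT=DEFAULT means "use the current default".
        explicit = None if row_format in (None, "DEFAULT") else row_format
        return {"explicit": explicit,
                "actual": explicit or self.default_row_format}

    def rebuild(self, table):
        # A rebuild re-resolves the format; only explicit settings are sticky.
        table["actual"] = table["explicit"] or self.default_row_format
        return table
```

In this model a table created without an explicit format under a COMPACT default stays COMPACT after the default is changed to DYNAMIC, and only switches to DYNAMIC once it is rebuilt. A table created with an explicit ROW_FORMAT keeps its format across rebuilds.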
Consider the following potential issues before changing the row format of existing tables from REDUNDANT or COMPACT to DYNAMIC.
- The REDUNDANT and COMPACT row formats support a maximum index key prefix length of 767 bytes whereas DYNAMIC and COMPRESSED row formats support an index key prefix length of 3072 bytes. In a replication environment, if the innodb_default_row_format variable is set to DYNAMIC on the master, and set to COMPACT on the slave, the following DDL statement, which does not explicitly define a row format, succeeds on the master but fails on the slave:

      CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 VARCHAR(5000), KEY i1(c2(3070)));

  For related information, see Section 15.6.1.6, “Limits on InnoDB Tables”.
  (PS. The statement fails because REDUNDANT and COMPACT limit index key prefixes to less than 768 bytes. There is one more condition for it to succeed at all: the actual byte length of c2(3070) must stay within 3072 bytes, which depends on the character set.)
- Importing a table that does not explicitly define a row format results in a schema mismatch error if the innodb_default_row_format setting on the source server differs from the setting on the destination server. For more information, refer to the limitations outlined in Section 15.6.3.7, “Copying Tablespaces to Another Instance”.
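The replication failure above comes down to comparing a prefix's byte length against a per-format limit. The Python sketch below illustrates that check; the function name is hypothetical, the byte length is approximated as characters times the character set's maximum bytes per character, and the 3072-byte figure is assumed to be the limit quoted in this section.

```python
# Index key prefix limits per row format, as quoted in this section.
# prefix_fits is an illustrative helper, not a MySQL function.
MAX_PREFIX_BYTES = {
    "REDUNDANT": 767,
    "COMPACT": 767,
    "DYNAMIC": 3072,
    "COMPRESSED": 3072,
}


def prefix_fits(row_format, prefix_chars, max_bytes_per_char=1):
    """True if a prefix of prefix_chars characters fits the format's limit."""
    return prefix_chars * max_bytes_per_char <= MAX_PREFIX_BYTES[row_format]
```

With a single-byte character set, `c2(3070)` fits under DYNAMIC but not under COMPACT, which is the master and slave mismatch described above. With a 4-byte character set the same prefix would need 12280 bytes and would not fit under either format.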
Determining the Row Format of a Table
To determine the row format of a table, use SHOW TABLE STATUS:

    mysql> SHOW TABLE STATUS IN test1\G

Alternatively, query the INFORMATION_SCHEMA.INNODB_TABLES table:

    mysql> SELECT NAME, ROW_FORMAT FROM INFORMATION_SCHEMA.INNODB_TABLES WHERE NAME='test1/t1';