read/Choosing_the_best_indexes_for_MySQL_query_optimization

原文 : https://www.eversql.com/choosing-the-best-indexes-for-mysql-query-optimization/

Which indexes should I create for an SQL query?

我应该为SQL查询创建哪些索引?

As a general rule of thumb, MySQL can only use one index for each table in the query. Therefore, there is no point in creating more than one index for each query. Preferably, same indexes should match as many of the queries as possible, as it will reduce the load on the database when inserting or updating data (which requires updating the indexes as well).

按照一般的经验法则, MySQL查询中的每个表只能使用一个索引

(High Performance MySQL 中澄清了, 有可能会组合, 但最好不要依赖于它)

所以, 为每个查询创建多个索引是没有意义的, 更恰当的情况是, 同样的索引匹配尽可能多的查询

因为这将会在插入或更改数据时, 减少数据库的负载(该操作同样也需要更新索引)

When creating an index, the most important parts are the equality conditions in the WHERE and JOIN conditions. In most cases, conditions such as name = ‘John’ will allow the database to filter most of the rows from the table and go through a small amount of rows to return the required results. Therefore, we should start indexing by adding these columns to the index.

当创建索引时, 最重要的是在 WHERE 和 JOIN 中的等式条件

在大多数情况下, 像 name = ‘John’ 这样的条件会使数据区从表中筛选行, 通过少量的行返回需要的数据

我们应该通过增加这些列到索引中来开始索引

Then, you should look into the range conditions, but you should only add one of them – the most selective one, as MySQL can’t handle more of them. In some cases when there are no range conditions, it makes sense to add the GROUP BY / ORDER BY columns, assuming the ordering is done in only one direction (ASC / DESC).

然后, 你应该关注范围条件, 但是你只能增加其中一个, 可选性最大的一个, 因为MySQL不能处理更多

(PS. 关于 selectivity, High Performance MySQL 5.3 节中 Choosing a Good Column Order 中有介绍)

(简单来理解的话, 就是最具可选性, 最能分辨的列)

在某些情况下没有范围条件, 增加 GROUP BY/ORDER BY 列, 假设顺序由其中一个方向确定(ASC/DESC)

In some cases, it also makes sense to create a separate index that contains the ORDER BY clause’s columns, as MySQL sometimes chooses to use it. Please note though that for this to happen, the index should contain all columns from the ORDER BY clause and they should all be specified with the same order (ASC / DESC). This doesn’t guarantee that the database’s optimizer will pick this index rather than the other compound indexes, but it’s worth a try.

有时候, 也可以创建一个单独的索引包含 ORDER BY 子句的列, 因为 MySQL 有时会选择使用它

请记住, 为了实现这一点, 索引应该包含 ORDER BY 子句中所有的列, 并且被指定为相同的顺序(ASC/DESC)

这不保证数据库的优化是否会选择这个, 而不是其他组合索引, 但是值得一试

Also, in some cases, it makes sense to also add the columns from the SELECT clause to the index, to have a complete covering index. This only relevant if the index isn’t already ‘too large’. What’s too large? Well, no official rule of thumb here, but let’s say… 5-7 columns? Creating a covering index allows the database to not only filter using the index, but to also fetch the information required by the SELECT clause directly from the index, which saves precious I/O operations.

同样的, 有时增加 SELECT 中的列, 以拥有一个完整的覆盖索引也是有必要的, 但只当索引不长时, 才有意义

那么, 多长才是长么? emm… 没有官方规定这个, 但是估计一下… 5-7列?

创建覆盖索引使数据库不仅能过滤, 同时也能直接从索引中获取 SELECT 子句需要的信息, 以节省宝贵的 I/O 操作

Let’s look at an example to clarify: (通过一个例子理解一下)

SELECT
    id, first_name, last_name, age
FROM
    employees
WHERE
    first_name = 'John'
        AND last_name = 'Brack'
        AND age > 25
ORDER BY age ASC;

For this query, we’ll start with adding the columns first_name and last_name, which are compared with an equality operator. Then, we’ll add the age column which is compared with a range condition. No need to have the ORDER BY clause indexed here, as the age column is already in the index. Last but not least, we’ll add id from the SELECT clause to the index to have a covering index.

就这个例子, 我们应该一开始增加在 WHERE 子句中有相等比较操作的列 first_name 和 last_name

然后, 增加具有比较的 age 列. 在这里不需要有 ORDER BY 子句, 因为 age 列已经在索引中了

最后, 但并不是最不重要的. 增加在 SELECT 子句中的 id 列, 以获得一个覆盖索引

So to index this query properly, you should add the index: (为了适当地索引这个查询, 你应该增加这个索引)

1 2	employees (first_name, last_name, age, id). mysql : CREATE INDEX [INDEX_NAME] ON [TABLE_NAME] (first_name, last_name, age, id)

The above is a very simplified pseudo-algorithm that will allow you to build simple indexes for rather simple SQL queries.

上面是一个非常简单的(= = 这东西咋翻译来着… 虚拟方法? 总之就是一种方法)

让你为简单的 SQL 查询构建简单的索引

(实际情况可能会复杂一些, 参考 High Performance MySQL 第 5 章)

If you’re looking for a way to automate your index creation, while also adding the benefit of a proprietary indexing algorithm and query optimization recommendations, you can try out EverSQL Query Optimizer which does all the heavy lifting for you.

(广告 = =)

What not to do when indexing (or writing SQL queries)?

使用索引时, 不要做的事情

We gathered some of the most common mistakes we see programmers and database administrators do when writing queries and indexing their tables.

我们收集了一些当程序/DBA写查询语句和索引他们的表时, 会犯的常见错误

Indexing each and every column in the table separately

索引每一个表中每一列

In most cases, MySQL won’t be able to use more than one index for each table in the query.

Therefore, when creating a separate index for each column in the table, the database is bound to perform only one of the search operations using an index, and the rest of them will be significantly slower, as the database can’t use an index to execute them.

We recommend using compound indexes (explained later in this article) rather than single-column indexes.

通常, MySQL不能在查询语句中, 为同一张表使用超过一个索引

所以, 当为表中的每个列创建一个索引, MySQL被限制只使用其中一个索引, 而其他的会慢很多

所以数据库不能使用索引去执行它们

我们建议使用复合索引(将会在这篇文章之后介绍), 而不是使用单列索引

The OR operator in filtering conditions

过滤中的 OR 操作

Consider this query: (考虑这个查询)

SELECT
    a, b
FROM
    tbl
WHERE
    a = 3 OR b = 8;

In many cases, MySQL won’t be able to use an index to apply an OR condition, and as a result, this query is not index-able.

Therefore, we recommend to avoid such OR conditions and consider splitting the query to two parts, combined with a UNION DISTINCT (or even better, UNION ALL, in case you know there won’t be any duplicate results)

通常, MySQL不能使用一个索引执行 OR 操作, 所以, 这个索引是不可索引的

所以, 我们建议避免这样的 OR 操作, 将其切分为两个部分, 由 UNION DISTINCT 组合

(或者更好地, UNION ALL, 你知道在这个例子中不会有重复的元素)

(PS. 是的, 在这样的情况下, MySQL不会使用索引 )

The order of columns in an index is important

索引中列的顺序很重要

Let’s say I hand you my contacts phone book which is ordered by the contact’s first name and ask you to count how many people are there named “John” in the book. You’ll grab the book in both hands and say “no problem”. You will navigate to the page that holds all names starting with John, and start counting from there.

假设我把我的由联系人的姓作为顺序排序的通讯录给你, 问你在这个通讯录中, 有多少姓 “John” 的人

你会拿着这本书说”没问题”, 你会找到以 “John” 开始的书页, 然后开始计数

Now, let’s say I change the assignment and hand you a phone book that is ordered by the contact’s last name, but ask you to still count all contacts with the first name “John”. How would you approach that? Well, the database scratches his head in this situation as well.

如果我将书的顺序打乱, 以名排序, 问你同样的问题, 你如何回答这个问题?

数据库也会面临同样的麻烦

Now lets look at an SQL query to demonstrate the same behavior with the MySQL optimizer:

现在, 关注一个 SQL 语句来和 MySQL 优化器演示这个行为

Having the index contacts (first_name, last_name) is ideal here, because the index starts with our filtering condition and ends with another column in the SELECT clause.

如果有一个索引组合 (first_name, last_name) 在这里是很好的, 因为索引由 first_name 开始, 由 last_name 结束

(PS. 其实这个 High Performance MySQL 第 5 节有详细讨论)

(这里的优化的前提是, 该索引是 B-tree 类型, 如果是其他的, 比如 hash 类型, 那么可能就没有什么优化效果)

But, having the reverse index contacts (last_name, first_name) is rather useless, as the database can’t use the index for filtering, as the column we need is second in the index and not first.

如果有一个相反顺序的索引, 那么就是无用的

(因为如果是 B-tree 的话, 根本无法索引 = =)

The conclusion from this example is that the order of columns in an index is rather important.

最后点了一下题

Adding redundant indexes

增加冗余索引

Indexes are magnificent when trying to optimize your SQL queries and they can improve performance significantly.

索引在优化你的 SQL 语句和显著提升性能上很有帮助

But, they come with a downside as well. Each index you’re creating should be kept updated and in sync when changes occur in your databases. So for each INSERT / UPDATE / DELETE in your databases, all relevant indexes should be updated. This update can take sometime, especially with large tables / indexes.

但是, 它们也有一些缺点, 每个你创建的索引在你数据库变化时必须保持更新和同步

所以每个 INSERT/UPDATE/DELETE 操作都会引起相关索引的更新, 这些操作所引起的索引可能会很耗时

Therefore, do not create indexes unless you know you’ll need them.

Also, we highly recommend to analyze your database once in a while, searching for any redundant indexes that can be removed.

所以, 除非你知道你需要它们, 否则不要创建无用的索引

同样地, 我们极力推荐每个一段时间, 分析一下数据库, 删除冗余的索引

How to automate index creation and SQL query optimization?

(广告, 有兴趣可以去原页面试试)

How to track redundant indexes in MySQL?

(这是在另一个页面的片段)

如何跟踪 MySQL 中冗余的索引

Starting MySQL 5.6, the database keeps track of index usage as part of its PERFORMANCE SCHEMA. This data can be queried using the schema_unused_indexes view, which displays indexes for which there are no events. Having no events indicates that these indexes might be redundant and unused for a while.

自 MySQL 5.6 开始, 数据库持续跟踪它 PERFORMANCE SCHEMA 中未使用的索引

这些数据可以用 schema_unused_indexes 来查询

显示哪些索引没有工作过, 这表示这些索引是冗余的, 在一段时间内没有使用过

But life isn’t that good, not yet at least. The potential obstacle here is that this information is re-counted every time MySQL is restarted. Therefore, in order to get reliable information, you should query these views a while after the MySQL instance was started. How long after the startup you’re asking? Well, that depends. My question back to you will be – how busy your database is? Do you know if all types of queries are usually executed in the database in a specific period of time? If so, that’s your window.

但是生活没有那么美好, 至少现在没有. 潜在的障碍是这些信息会在每次 MySQL 重启时重新统计

所以, 为了避免这种情况, 你应该在 MySQL 启动一段时间后去查询这些数据

这时间的长短取决与你数据库的忙碌程度

So let’s take a look at how it’s done: (让我们来看一看如何完成)

select * from sys.schema_unused_indexes;

object_schema	object_name	index_name
mydb			age			agecountry_index
mydb			country		agecountry_index

(PS. 显然, 如果数据可靠, 那么 age 表中的 agecountry_index 和 country 表中的 agecountry_index 是可以删除的)