数据库 MySQL 联合索引 (a,b) 的一些困惑

ibugs · August 21, 2015 · Last by ibugs replied at August 26, 2015 · 8968 hits

疑问：

MySQL 建立 (a,b) 的索引，理论上，只有 a, (a,b) 两种情况能用到索引，但是其他情况也能用到索引么？下面就为你一一测试

数据库版本

/usr/local/bin/mysql  Ver 14.14 Distrib 5.6.16, for osx10.9 (x86_64) using  EditLine wrapper

一、表结构

CREATE TABLE `cd_happy_for_ni_deals` (
  `id` int(11) NOT NULL DEFAULT '0',
  `update_time` datetime DEFAULT NULL COMMENT '更新时间',
  `publish_status` int(11) NOT NULL DEFAULT '4' COMMENT '发布状态',
  KEY `idx_of_publish_status_update_time` (`publish_status`,`update_time`,`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

二、唯一性基数

mysql> select count(distinct(update_time)) from cd_happy_for_ni_deals;
+------------------------------+
| count(distinct(update_time)) |
+------------------------------+
|                      1845933 |
+------------------------------+
1 row in set (4.68 sec)

mysql> select count(distinct(publish_status)) from cd_happy_for_ni_deals;
+---------------------------------+
| count(distinct(publish_status)) |
+---------------------------------+
|                               2 |
+---------------------------------+
1 row in set (1.76 sec)

mysql> select count(id) from cd_happy_for_ni_deals;
+-----------+
| count(id) |
+-----------+
|   1907609 |
+-----------+
1 row in set (0.00 sec)

update_time 的选择性：1845933 / 1907609.to_f = 0.9676684268107353 接近 1

publish_status 的选择性：2 / 1907609.to_f = 1.0484328811617055e-06 接近 0

三、建立 (a,b) 索引，分别根据 a 查询，b 查询，(a,b) 查询，(b,a) 查询，统计结果

不走寻常路，我就偏选择选择性低的做索引的第一位。

创建索引

mysql> alter table cd_happy_for_ni_deals add index `idx_of_publish_status_update_time` (`publish_status`, `update_time`, `id`);
Query OK, 0 rows affected (14.69 sec)
Records: 0  Duplicates: 0  Warnings: 0

根据 a 查询

mysql> explain select SQL_NO_CACHE id, publish_status from cd_happy_for_ni_deals where publish_status = 4 \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: cd_happy_for_ni_deals
         type: ref
possible_keys: idx_of_publish_status_update_time
          key: idx_of_publish_status_update_time
      key_len: 4
          ref: const
         rows: 964056 <- 只查询publish_status 的情况
        Extra: Using index
1 row in set (0.00 sec)

查询时间：

mysql> select SQL_NO_CACHE count(id) from cd_happy_for_ni_deals where publish_status = 4 \G;
*************************** 1. row ***************************
count(id): 1858081
1 row in set (0.69 sec)

理论上可以用到索引 (a,b) 中的 a 部分。

根据 b 查询

mysql> explain select SQL_NO_CACHE id, publish_status from cd_happy_for_ni_deals where update_time = '2014-05-17 23:00:48' \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: cd_happy_for_ni_deals
         type: index
possible_keys: NULL
          key: idx_of_publish_status_update_time
      key_len: 17
          ref: NULL
         rows: 1928113 <- 只查询update_time 的情况
        Extra: Using where; Using index
1 row in set (0.01 sec)

查询时间：

mysql> select SQL_NO_CACHE count(id) from cd_happy_for_ni_deals where update_time = '2014-05-17 23:00:48' \G;
*************************** 1. row ***************************
count(id): 1
1 row in set (1.06 sec)

查询 b 的时候，理论上用不到索引的。为啥这里？？？

根据 (a,b) 查询

mysql> explain select SQL_NO_CACHE id, publish_status from cd_happy_for_ni_deals where publish_status = 4 and update_time = '2014-05-17 23:00:48' \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: cd_happy_for_ni_deals
         type: ref
possible_keys: idx_of_publish_status_update_time
          key: idx_of_publish_status_update_time
      key_len: 13
          ref: const,const
         rows: 1
        Extra: Using where; Using index
1 row in set (0.01 sec)

查询时间：

mysql> select SQL_NO_CACHE count(id) from cd_happy_for_ni_deals where publish_status = 4 and update_time = '2014-05-17 23:00:48' \G;
*************************** 1. row ***************************
count(id): 1
1 row in set (0.00 sec)

符合理论上的预期。

根据 (b,a) 查询

mysql> explain select SQL_NO_CACHE id, publish_status from cd_happy_for_ni_deals where  update_time = '2014-05-17 23:00:48' and publish_status = 4 \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: cd_happy_for_ni_deals
         type: ref
possible_keys: idx_of_publish_status_update_time
          key: idx_of_publish_status_update_time
      key_len: 13
          ref: const,const
         rows: 1
        Extra: Using where; Using index
1 row in set (0.00 sec)

查询时间：

mysql> select SQL_NO_CACHE count(id) from cd_happy_for_ni_deals where  update_time = '2014-05-17 23:00:48' and publish_status = 4 \G;
*************************** 1. row ***************************
count(id): 1
1 row in set (0.00 sec)

理论上，这里只能用到（a,b）中的 a 部分，为啥也这么快？？

欢迎群里的小伙伴热烈讨论

暂时的结论：

1、理论上索引对顺序是敏感的，但是由于 MySQL 的查询优化器会自动调整 where 子句的条件顺序以使用适合的索引。

2、将选择性高的列放在索引的最前列。根据场景的不同，这条经验法则并不是完全准确的。在某些场景下，可能需要根据运行频率最高的查询来调整索引列的顺序。

参考

http://www.programering.com/a/MTMwAzMwATM.html

3 likes

sun1752709589 #1 August 21, 2015

那个「根据 b 查询」得，我感觉它是用的索引覆盖，其实就是给你走了遍索引，所以那个 keys 列会说用到了那个索引，况且 extra 列也告诉你说 using index 了，

ibugs #2 August 21, 2015

@sun1752709589 有道理，extra 显示很明显使用到了索引。那就推翻了之前说到的，索引 (a,b) 其中 b 是用不到索引的这种说法。这种理论网上一搜一大把这种。

sun1752709589 #3 August 21, 2015

#2 楼 @ibugs 我认为这种的话不算用到了索引，如果你在查询中再加入其他不在索引中的列，比如*，应该就不会出现用到索引的情况了，不知道说的对不对，楼主试下可以。索引最重要的作用就是快速定位某段符合筛选规则的数据，至于索引覆盖这东西，可以认为是附加的好处

oxffff #4 August 21, 2015

我很想探讨这个问题。我想先来搞清楚两点：

为什么用 select count(1) 来代表平均查询时间？
" ……查询 b 的时候，理论上用不到索引的。为啥这里？？？ ……" 这里是有三个疑问号，想问啥？

ibugs #5 August 25, 2015

@oxffff 第一个问题，平均查询时间就是个近似值，一个查询肯定是不能代表的。不过 MySQL 同一个查询，消耗的时间很相近。第二个查询，请戳这里 http://segmentfault.com/q/1010000000342176 ，这里的疑问是，理论上用不到索引，但为啥这里用到了索引？难道是 MySQL 做了什么优化处理么？

msg7086 #6 August 25, 2015

#5 楼 @ibugs 先回答下第一个问题吧。为什么你认为日常的请求都是和 count(1) 相近的请求？

1 likes

ibugs #7 August 25, 2015

@msg7086 我没有说，认为日常的请求都是和 count(1) 相近呀。文章中我只是表明了一种情况。为了避免这个疑惑，我还是将文章中的 平均查询时间 改成 查询时间。

oxffff #8 August 25, 2015

#6 楼 @msg7086 对对。性能评判的标准很重要哦，我们需要找一个科学点的评判标准。用 count(1) 的做法，我找不到可以认为它是平均时间的依据。 @ibugs 。

我以前除了看 explain 之外，还会用 profile 。虽说只是单次的查询，没办法得出平均时间。但是起码是针对指定的语句做的 profile。用法在这里： https://dev.mysql.com/doc/refman/5.5/en/show-profile.html

ibugs #9 August 25, 2015

@msg7086 @oxffff count(1) 单次查询肯定是不能代表平均查询时间的，文章中的只能表明单次的查询时间。感谢两位小伙伴，让我更加严谨。

ibugs #10 August 25, 2015

@msg7086 @oxffff

根据 b 查询

居然能用到索引，这点二位有什么想说的没有？好多文章中指出 (a,b) 建索引，单对 b 进行查询，是用不到索引的。

oxffff #11 August 25, 2015

#10 楼 @ibugs

你举的这个例子：

mysql> explain select SQL_NO_CACHE id, publish_status from cd_happy_for_ni_deals where update_time = '2014-05-17 23:00:48' \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: cd_happy_for_ni_deals
         type: index
possible_keys: NULL
          key: idx_of_publish_status_update_time
      key_len: 17
          ref: NULL
         rows: 1928113 <- 只查询update_time 的情况
        Extra: Using where; Using index
1 row in set (0.01 sec)

我猜，你这个表的总行数就是 1928113。

这是一个全表查询，并没有用到索引。可别看它的 type 列为 index ，其实和 ALL 差不多。参考：https://dev.mysql.com/doc/refman/5.1/en/explain-output.html#jointype_index

possible_keys: NULL 表示确无索引可以使用。参考：https://dev.mysql.com/doc/refman/5.1/en/explain-output.html#explain_possible_keys

msg7086 #12 August 25, 2015

#9 楼 @ibugs 你说的并不正确。

count(1) 单次查询肯定是不能代表平均查询时间的

实际是，count(1) 单次查询是根本不能代表查询时间的。因为平日里很少很少会去查 count(1)。你用 count(1) 来做性能评判标准毫无意义。你自己想想，你写的应用里，用过多少次 count(1)？

至于你关于查询 b 的疑问，我为你谷歌了一下，解释如下：（其实 #11 楼已经贴了，我补全一下

possible_keys 表示搜索可能用到的索引，key 表示实际选用的索引。有一种可能是 possible_keys 为空，但是 key 不为空的情况。这种情况下说明 mysql 无法利用索引来搜索数据，但是返回的列却是某个索引的一部分，因此可以用覆盖索引的方式优化全表扫描。这里你 idx 包含了 p_s 和 u_t，而返回的列是 id 和 p_s，乍一眼看上去并不包含。然而后面还有解释一句：如果你恰好用的是 InnoDB，那么索引会默认包含主键，也就是说你的索引实际上就是一个 id+p_s+u_t 的索引，完全包含了你的返回值 id+p_s，因此适用于覆盖索引的条件，也就是为什么最后 mysql 会选这个索引的原因了。

希望能帮到你。 http://dev.mysql.com/doc/refman/5.6/en/explain-output.html#explain_key

1 likes

ibugs #13 August 26, 2015

@oxffff @msg7086 谢两位仁兄，我参照，https://dev.mysql.com/doc/ 详细研究下。

You need to Sign in before reply, if you don't have an account, please Sign up first.

3 likes

Total 13 replies

New Reply comming, click to load.