数据库 测试 DB Index 单 column index 和多 column 联合索引的疑问

betterthornbird · July 23, 2015 · Last by xiaoronglv replied at July 27, 2015 · 2574 hits

最近在改数据库 Index 的一些事情

#Table devices, :client_type, :status are strings
add_index :devices, [:client_type]
add_index :devices, [:status, :client_type], length

这个表中还有其他好几个索引,个人觉得既有 :client_type 又有 :status, :client_type 联合索引有点多余。 如果我做个调整,将第二个索引顺序变成:client_type, :status,那根据 MySQL 的最左原则,只查找:client_type也能被 hit 到。 那就可以去掉单一的在:client_type上做的索引了。顺便做了个测试。

移掉:client_type 索引之前

mysql> explain select * from devices where client_type = 'IOSDevice';
+----+-------------+---------+------+------------------------------+------------------------------+---------+-------+------+-----------------------+
| id | select_type | table   | type | possible_keys                | key                          | key_len | ref   | rows | Extra                 |
+----+-------------+---------+------+------------------------------+------------------------------+---------+-------+------+-----------------------+
|  1 | SIMPLE      | devices | ref  | index_devices_on_client_type | index_devices_on_client_type | 258     | const | 5145 | Using index condition |
+----+-------------+---------+------+------------------------------+------------------------------+---------+-------+------+-----------------------+
1 row in set (0.00 sec)

mysql> select * from devices where client_type = 'BlackBerryDevice';
553 rows in set (0.03 sec)

移掉两个索引并建立:client_type, :status之后

mysql> explain select * from devices where client_type = 'IOSDevice';
+----+-------------+---------+------+------------------------------+------------------------------+---------+-------+------+-----------------------+
| id | select_type | table   | type | possible_keys                | key                          | key_len | ref   | rows | Extra                 |
+----+-------------+---------+------+------------------------------+------------------------------+---------+-------+------+-----------------------+
|  1 | SIMPLE      | devices | ref  | index_client_type_and_status | index_client_type_and_status | 258     | const | 9711 | Using index condition |
+----+-------------+---------+------+------------------------------+------------------------------+---------+-------+------+-----------------------+
1 row in set (0.00 sec)

mysql> select * from devices where client_type = 'BlackBerryDevice';
553 rows in set (0.01 sec)

前后结果对比让我很惊讶。联合索引结果比单索引还快。 我的测试是在同一个数据库运行的,会不会是 MySQL 有对结果缓存,还是我测试的数据集太小? 另外像我这种应用场景,是否合适把单索引直接去掉?假设查询:client_type:client_type, :status需求差不多。

当然是联合索引快,两个单索引如果都用到的话,过程是符合两个索引 的记录都找出,然后合并,而联合索引只会在这个联合索引搜索,结果直接返回

但单索引的应用范围更广,因为他可以由应用端来指定组合

联合索引适用于一些已经稳定的用得很多的查询的优化,毕竟你不能改个条件就去改一个联合索引

client_type 和 status 的值的种类都很少

client_type

  • BlackBerryDevice
  • iPhone
  • iPad

status

  • 1
  • 2

某个字段的值特异度很低的情况下,建索引比起全表扫描节省不了多少时间。所以我的看法是,这个字段的索引没啥用。

炮哥,你怎么看?

@ hooopo

#1 楼 @azhao 我的例子是查询一个字段,联合索引比单索引要快。不是查询两个字段,联合索引比两个字段各自单索引再联合快。

可以在本机造一亿条假数据,索引之间的差异会拉大。此外 SET SESSION query_cache_type=0; 可以在当前连接的 session 关闭缓存。

如何在 4 分钟内创建一亿条 mysql 假数据( @vincent 教我的,哈哈)

Without creating a stored procedure, one technique I've applied is to use the table itself to add the columns. First seed it with a value...

insert into table_name ( column1, column2, column3, column4 )
  values ( 1, 2, 3, 4 );

Then insert again, selecting from this table to double the rows each time...

insert into table_name ( column1, column2, column3, uuid)
  select  round( rand()*10000), 0, 0, UUID() from rand_numbers;

You don't need to run the second query that many times to get quite a few random rows. Not as "neat" as using a stored procedure of course, just proposing an alternative.

You need to Sign in before reply, if you don't have an account, please Sign up first.