使用延迟关联对LIMIT语句进行分页性能优化

在SQL语句中，limit语句经常会使用到，在程序中分页都是使用limit语句来完成的。但是如果你只是粗暴地使用select * from table limit start,count;来完成分页操作，那么随着数据量的增大，一次分页操作将会成本将会不断增大。但是如果正确使用limit语句的话，将大大提高查询速度。下面来讲讲如何正确使用limit语句，测试表百万行。

测试环境

先让我们熟悉下基本的sql语句，来查看下我们将要测试表tabletest的基本信息。

mysql> select * from information_schema.tables where TABLE_SCHEMA='dbtest' and TABLE_NAME='tabletest'\G<br>
  TABLE_CATALOG: def
   TABLE_SCHEMA: dbtest
     TABLE_NAME: tabletest
     TABLE_TYPE: BASE TABLE
         ENGINE: InnoDB
        VERSION: 10
     ROW_FORMAT: Compact
     TABLE_ROWS: 4798670
 AVG_ROW_LENGTH: 827
    DATA_LENGTH: 3970957312
MAX_DATA_LENGTH: 0
   INDEX_LENGTH: 1247068160
      DATA_FREE: 7340032
 AUTO_INCREMENT: NULL
    CREATE_TIME: 2016-05-23 10:39:14
    UPDATE_TIME: NULL
     CHECK_TIME: NULL
TABLE_COLLATION: utf8_general_ci
       CHECKSUM: NULL
 CREATE_OPTIONS: 
  TABLE_COMMENT: 
1 row in set (0.00 sec)

mysql> select * from information_schema.tables where TABLE_SCHEMA='dbtest' and TABLE_NAME='tabletest'\G<br>

TABLE_CATALOG: def

TABLE_SCHEMA: dbtest

TABLE_NAME: tabletest

TABLE_TYPE: BASE TABLE

ENGINE: InnoDB

VERSION: 10

ROW_FORMAT: Compact

TABLE_ROWS: 4798670

AVG_ROW_LENGTH: 827

DATA_LENGTH: 3970957312

MAX_DATA_LENGTH: 0

INDEX_LENGTH: 1247068160

DATA_FREE: 7340032

AUTO_INCREMENT: NULL

CREATE_TIME: 2016-05-23 10:39:14

UPDATE_TIME: NULL

CHECK_TIME: NULL

TABLE_COLLATION: utf8_general_ci

CHECKSUM: NULL

CREATE_OPTIONS:

TABLE_COMMENT:

1 row in set (0.00 sec)

查询结果：

从上图中我们可以看到表的基本信息：

表行数：4798670行

平均每行的数据长度：827字节

单表大小：3970957312字节（expr 3970957312 / 1024 / 1024 = 3787M）

tabletest表中字段各种类型都有varchar、datetime、text等，说几个关键表字段：表字段id、mid、bp，其中id为primary key、mid为key、bp没有索引。

测试实验

单表分页语句比较简单，一般就是用index来减少排序，必要的时候需要用到延时关联。我这里为了简化，就没有加入排序这个动作，正常情况下排序后分页才是规范写法。

1）直接用limit start, count分页语句

select * from tabletest limit start, count

1	select * from tabletest limit start, count

当起始页较小时，查询没有性能问题，我们分别看下从10， 100， 1000， 10000开始分页的执行时间（每页取20条），如下：

select * from tabletest limit 1000,20;
20 rows in set (0.00 sec)   #第一次查询
20 rows in set (0.00 sec)   #第二次查询

select * from tabletest limit 10000,20;
20 rows in set (0.14 sec)   #第一次查询
20 rows in set (0.02 sec)   #第二次查询

select * from tabletest limit 100000,20;
20 rows in set (3.46 sec)   #第一次查询
20 rows in set (0.24 sec)   #第二次查询

select * from tabletest limit 500000,20;
20 rows in set (17.71 sec)  #第一次查询
20 rows in set (1.20 sec)   #第二次查询

select * from tabletest limit 1000000,20;
20 rows in set (20.09 sec)  #第一次查询
20 rows in set (2.38 sec)   #第二次查询

select * from tabletest limit 1000,20;

20 rows in set (0.00 sec) #第一次查询

20 rows in set (0.00 sec) #第二次查询

select * from tabletest limit 10000,20;

20 rows in set (0.14 sec) #第一次查询

20 rows in set (0.02 sec) #第二次查询

select * from tabletest limit 100000,20;

20 rows in set (3.46 sec) #第一次查询

20 rows in set (0.24 sec) #第二次查询

select * from tabletest limit 500000,20;

20 rows in set (17.71 sec) #第一次查询

20 rows in set (1.20 sec) #第二次查询

select * from tabletest limit 1000000,20;

20 rows in set (20.09 sec) #第一次查询

20 rows in set (2.38 sec) #第二次查询

PS：第一次执行完毕后缓冲池就会缓冲部分数据，所以你第二次执行会看到时间下降很多。

我们已经看出随着起始记录的增加，时间也随着增大，这说明分页语句limit跟起始页码是有很大关系的。

从中我们也能总结出两件事情：

1）limit语句的查询时间与起始记录的位置成正比；

2）MySQL的limit语句是很方便，但是对记录很多的表并不适合直接使用；

2）对limit分页问题的性能优化方法-延迟关联

推荐使用“延迟关联”的方法来优化排序操作，何谓”延迟关联” ：通过使用覆盖索引查询返回需要的主键，再根据主键关联原表获得需要的数据来加速分页查询。

我们都知道，利用了索引查询的语句中如果只包含了那个索引列（覆盖索引），那么这种情况会查询很快。因为利用索引查找有优化算法，且数据就在查询索引上面，不用再去找相关的数据地址了，这样节省了很多时间。

在我们的例子中，我们知道id字段是主键，自然就包含了默认的主键索引。现在让我们看看利用覆盖索引的查询效果如何。

这次我们利用覆盖索引，只包含id列，如下：

mysql> select id from tabletest limit 1000000,20;
20 rows in set (2.32 sec) #第一次查询
20 rows in set (0.20 sec) #第二次查询

mysql> select id from tabletest limit 1000000,20;

20 rows in set (2.32 sec) #第一次查询

20 rows in set (0.20 sec) #第二次查询

执行计划：

mysql> explain select id from tabletest limit 1000000,20;
+------+-------------+---------------+-------+---------------+------+---------+------+---------+-------------+
| id   | select_type | table         | type  | possible_keys | key  | key_len | ref  | rows    | Extra       |
+------+-------------+---------------+-------+---------------+------+---------+------+---------+-------------+
|    1 | SIMPLE      | tabletest     | index | NULL          | flag | 3       | NULL | 4893802 | Using index |
+------+-------------+---------------+-------+---------------+------+---------+------+---------+-------------+
1 row in set (0.00 sec)

mysql> explain select id from tabletest limit 1000000,20;

+------+-------------+---------------+-------+---------------+------+---------+------+---------+-------------+

+------+-------------+---------------+-------+---------------+------+---------+------+---------+-------------+

+------+-------------+---------------+-------+---------------+------+---------+------+---------+-------------+

1 row in set (0.00 sec)

可以看出查询速度提升很多，那么如果我们也要查询所有列，有两种方法，一种是id>=的形式，另一种就是利用join，看下实际情况：

mysql> select * from tabletest where id >= (select id from tabletest limit 1000000,1) limit 20;
20 rows in set (2.48 sec) #第一次查询
20 rows in set (0.72 sec) #第二次查询

mysql> select * from tabletest where id >= (select id from tabletest limit 1000000,1) limit 20;

20 rows in set (2.48 sec) #第一次查询

20 rows in set (0.72 sec) #第二次查询

执行计划：

mysql> explain select * from tabletest where id >= (select id from tabletest limit 1000000,1) limit 20;
+------+-------------+---------------+-------+---------------+------+---------+------+---------+-------------+
| id   | select_type | table         | type  | possible_keys | key  | key_len | ref  | rows    | Extra       |
+------+-------------+---------------+-------+---------------+------+---------+------+---------+-------------+
|    1 | PRIMARY     | tabletest     | ALL   | PRIMARY       | NULL | NULL    | NULL | 4893900 | Using where |
|    2 | SUBQUERY    | tabletest     | index | NULL          | flag | 3       | NULL | 4893900 | Using index |
+------+-------------+---------------+-------+---------------+------+---------+------+---------+-------------+
2 rows in set (0.00 sec)

mysql> explain select * from tabletest where id >= (select id from tabletest limit 1000000,1) limit 20;

+------+-------------+---------------+-------+---------------+------+---------+------+---------+-------------+

+------+-------------+---------------+-------+---------------+------+---------+------+---------+-------------+

+------+-------------+---------------+-------+---------------+------+---------+------+---------+-------------+

2 rows in set (0.00 sec)

另一种写法，延时join的方式。

mysql> select * from tabletest as a join (select id from tabletest limit 1000000,20) as b on a.id=b.id;
20 rows in set (0.19 sec)   #第一次查询
20 rows in set (0.18 sec)   #第二次查询