Rails 一招秒杀 N+1 agg 问题

hooopo · 2018年11月01日 · 最后由 hooopo 回复于 2018年12月13日 · 7155 次阅读
本帖已被管理员设置为精华贴

关于 n+1 聚合函数问题的描述可以参考这个帖子:https://ruby-china.org/topics/26245

最近发现一个更简单灵活的解法,使用 Rails scope,无任何添加剂。

class Order < ApplicationRecord
  scope :with_items_count, -> () {
    aggs_sql = OrderItem.group(:order_id).select("order_id, COUNT(*) AS items_count").to_sql
    select("orders.*, aggs.*").joins("LEFT JOIN (#{aggs_sql}) AS aggs ON aggs.order_id = orders.id")
  }
end

使用方法:

Order.with_items_count.last(10).each do |x| 
  p x.items_count
  p x.id
  p x.no
end

生产 SQL:

SELECT  orders.*, aggs.* FROM "orders" LEFT JOIN (SELECT order_id, COUNT(*) AS items_count FROM "order_items" WHERE "order_items"."deleted_at" IS NULL GROUP BY "order_items"."order_id") AS aggs ON aggs.order_id = orders.id LIMIT 10

从原来的 n+1 变成了一条带子查询的 SQL,比 eager load 还要高效。如果你用 Postgres 的话,结合Selective Aggregates,多个过滤条件和多个 agg 一起搞定,可以爽到飞...

巧妙。但是当 Order 有附加的 where 条件时,是否需要把这个条件带入到 aggs_sql 里?

hellorails 回复

不需要哇

hooopo 回复

order_items 全表 group?

hellorails 回复

是不是全表要看执行计划,你 explian 一下咯

jasl 将本帖设为了精华贴。 11月01日 10:52
  1. 玩了下 Selective Aggregates,非常爽,学习了(为什么 pg 官方文档上找不到呢?)

  2. 你讲的无任何添加剂的秒杀 n+1 agg 的解法和 https://ruby-china.org/topics/26245的一楼那个物化视图方法貌似是一样的(都是你写的👍 ),区别是,OrderItem.group(:order_id).select("order_id, COUNT(*) AS items_count")在那里变成了物化视图。

  3. 感觉这种解法已经和写原生 SQL 差不多了,可能写原生 SQL 更加易读一些

8 楼 已删除

1.这里有各个版本的特性支持:https://www.postgresql.org/about/featurematrix/

3.主要是可以利用其它 scope relation 等组合,raw SQL 复用困难。。

materialized view 有维护成本的

不给咱们那个表单项目加上吗 😊

flyerhzm 已经把这个功能做成 gem 了, https://github.com/xinminlabs/eager_group,我用过,很不错。

PG 文档上都有,见 4.2.7 Aggregate Expression

其实这个带统计性质的 scope 和领域模型的关注点没有太大关系,重构到另一个查询模块或者类会不会比较好?

yakjuly 回复

eager_group 确实很好用,但存在一些问题:

  1. eager_group 好像不支持 Rails 5,好像也不维护了,这个帖子提的方案不依赖任何 gem,不存在维护问题
  2. 这个帖子的方案无论有多少个需要聚合的指标,都只产生一条 SQL,而 eager group 如果有 n 个指标,需要产生 n+1 个 SQL
  3. eager_group 只是 aggregate_function(aggregate_column),算是一种最经典的形式,但现实需求往往比这要复杂,很多 case 处理不来。
khun84 回复

good point.

jasl 回复

没需求啊

liprais 回复

最近发现 mv refresh 之后还要手动 vacuum,要不空间一直涨。

SELECT  orders.*, aggs.* FROM "orders" LEFT JOIN (SELECT order_id, COUNT(*) AS items_count FROM "order_items" WHERE "order_items"."deleted_at" IS NULL GROUP BY "order_items"."order_id") AS aggs ON aggs.order_id = orders.id LIMIT 10

这个是正序的情况,是很快,但是 last 是带了 order by orders.id desc 的,好像速度不行~

SELECT  orders.*, aggs.* FROM "orders" LEFT JOIN (SELECT order_id, COUNT(*) AS items_count FROM "order_items" WHERE "order_items"."deleted_at" IS NULL GROUP BY "order_items"."order_id") AS aggs ON aggs.order_id = orders.id ORDER BY orders.id DESC LIMIT 10

感觉在中间的子查询中还要再嵌套一层过滤 order_id 的,根据外层的条件来

不带 ORDER BY

Limit  (cost=0.71..6.73 rows=10 width=359) (actual time=0.072..0.127 rows=10 loops=1)
  ->  Merge Left Join  (cost=0.71..40027.18 rows=66537 width=359) (actual time=0.071..0.125 rows=10 loops=1)
        Merge Cond: (orders.id = order_items.order_id)
        ->  Index Scan using orders_pkey on orders  (cost=0.29..9346.63 rows=66537 width=347) (actual time=0.028..0.054 rows=10 loops=1)
        ->  GroupAggregate  (cost=0.42..29731.93 rows=34768 width=12) (actual time=0.039..0.057 rows=5 loops=1)
              Group Key: order_items.order_id
              ->  Index Only Scan using index_order_items_on_order_id on order_items  (cost=0.42..28311.68 rows=214514 width=4) (actual time=0.029..0.044 rows=21 loops=1)
                    Heap Fetches: 21
Planning time: 0.640 ms
Execution time: 0.244 ms

带了 ORDER BY

Limit  (cost=12230.82..12232.33 rows=10 width=359) (actual time=254.210..254.258 rows=10 loops=1)
  ->  Merge Left Join  (cost=12230.82..22265.03 rows=66537 width=359) (actual time=254.208..254.253 rows=10 loops=1)
        Merge Cond: (orders.id = order_items.order_id)
        ->  Index Scan Backward using orders_pkey on orders  (cost=0.29..9346.63 rows=66537 width=347) (actual time=0.025..0.059 rows=10 loops=1)
        ->  Sort  (cost=12230.53..12317.45 rows=34768 width=12) (actual time=254.177..254.179 rows=5 loops=1)
              Sort Key: order_items.order_id DESC
              Sort Method: quicksort  Memory: 3857kB
              ->  HashAggregate  (cost=8912.71..9260.39 rows=34768 width=12) (actual time=202.281..223.324 rows=53143 loops=1)
                    Group Key: order_items.order_id
                    ->  Seq Scan on order_items  (cost=0.00..7840.14 rows=214514 width=4) (actual time=0.014..87.574 rows=214514 loops=1)
Planning time: 0.660 ms
Execution time: 255.422 ms

如果在中间那个 group by 的子查询中再嵌套一个子查询带外层的条件,又感觉比较丑

ysllyfe 回复

可以试试 把 left join 改成 LEFT JOIN LATERAL,加个条件 OrderItem.where("orders.id = order_items.order_id").group

SELECT orders.*, aggs.* FROM "orders" LEFT JOIN LATERAL (SELECT order_id, COUNT(*) AS items_count FROM "order_items" WHERE "order_items"."deleted_at" IS NULL AND (orders.id = order_items.order_id) GROUP BY "order_items"."order_id") AS aggs ON aggs.order_id = orders.id ORDER BY id desc;
Sort  (cost=31853.46..31862.86 rows=3759 width=344) (actual time=26.134..26.694 rows=3759 loops=1)
  Sort Key: orders.id DESC
  Sort Method: quicksort  Memory: 2035kB
  ->  Nested Loop Left Join  (cost=0.28..31630.25 rows=3759 width=344) (actual time=0.041..21.992 rows=3759 loops=1)
        ->  Seq Scan on orders  (cost=0.00..289.59 rows=3759 width=328) (actual time=0.014..0.904 rows=3759 loops=1)
        ->  Subquery Scan on aggs  (cost=0.28..8.33 rows=1 width=16) (actual time=0.004..0.005 rows=1 loops=3759)
              Filter: (aggs.order_id = orders.id)
              ->  GroupAggregate  (cost=0.28..8.31 rows=1 width=16) (actual time=0.004..0.004 rows=1 loops=3759)
                    Group Key: order_items.order_id
                    ->  Index Scan using index_order_items_on_order_id on order_items  (cost=0.28..8.30 rows=1 width=8) (actual time=0.002..0.003 rows=1 loops=3759)
                          Index Cond: (orders.id = order_id)
                          Filter: (deleted_at IS NULL)
                          Rows Removed by Filter: 0
Planning time: 0.411 ms
Execution time: 26.936 ms

不用 lateral 的话,你要加索引 [order_id, deleted_at],我本地是没问题的,带 order by id desc:

Limit  (cost=495.12..498.33 rows=10 width=344) (actual time=16.001..16.218 rows=10 loops=1)
  ->  Merge Left Join  (cost=495.12..1702.50 rows=3759 width=344) (actual time=16.001..16.217 rows=10 loops=1)
        Merge Cond: (orders.id = order_items.order_id)
        ->  Index Scan Backward using orders_pkey on orders  (cost=0.28..1144.43 rows=3759 width=328) (actual time=0.105..0.283 rows=10 loops=1)
        ->  Sort  (cost=494.84..503.81 rows=3589 width=16) (actual time=15.886..15.889 rows=10 loops=1)
              Sort Key: order_items.order_id DESC
              Sort Method: quicksort  Memory: 249kB
              ->  GroupAggregate  (cost=0.28..247.03 rows=3589 width=16) (actual time=1.900..14.025 rows=3257 loops=1)
                    Group Key: order_items.order_id
                    ->  Index Only Scan using tindex on order_items  (cost=0.28..187.50 rows=4727 width=8) (actual time=1.887..11.415 rows=4727 loops=1)
                          Index Cond: (deleted_at IS NULL)
                          Heap Fetches: 32
Planning time: 0.716 ms
Execution time: 16.316 ms

和索引无关,本地索引是对的

order_items 的数据量为 20W+ orders 为 5W+

使用 LEFT JOIN LATERAL 在子查询里面多加 where 条件后好。感觉还是里面的查询不知道要使用 order_id 的索引来过滤数据,需要强制指定。如果没有这种用法,是不是就只能用子查询了?第一次见到这种写法 🎁 🎁 又学了一招

ysllyfe 回复

因为你的条件有 deleted at,要加符合索引,我试过没问题的,你看我上一个回帖后面的结果。

hooopo 回复

嗯,我忘了,我没有 deleted_at 字段,#21 的那里语句写的是按例子来的,实际的是下面这个

SELECT  orders.*, aggs.* FROM "orders" LEFT JOIN (SELECT order_id, COUNT(*) AS items_count FROM "order_items" GROUP BY "order_items"."order_id") AS aggs ON aggs.order_id = orders.id ORDER BY orders.id DESC LIMIT 10;

@jasl @huacnlee 看来需要新会员冷静期和贴子报告功能 😅

👍 这里有个小坑,对于 order_items 没记录的 order, order.items_count会返回 NULL

fangxing204 回复

select 的时候 COALESCE 一下或自己判断都可以

需要 登录 后方可回复, 如果你还没有账号请 注册新账号