在使用 ActiveRecord 这样的 ORM 工具时,常会嵌套遍历 model。
例如,有两个 model,Post、Comment,关系是一对多。
class Post < ApplicationRecord
has_many :comments
end
class Comment < ApplicationRecord
belongs_to :post
end
总共有 4 个 post。
> Post.count
(0.1ms) SELECT COUNT(*) FROM "posts"
=> 4
获取每个 post 的所有 comment,我们可以:
> Post.all.map{|post| post.comments}
Post Load (0.3ms) SELECT "posts".* FROM "posts"
Comment Load (0.2ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = ? [["post_id", 1]]
Comment Load (0.4ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = ? [["post_id", 2]]
Comment Load (0.6ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = ? [["post_id", 3]]
Comment Load (0.6ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = ? [["post_id", 4]]
可以看到为了得到 4 条数据,我们执行了 5(4 + 1)次的查询,这就是所谓 N + 1 查询问题。
除了凭经验外,有一些 gem 也可以帮助我们提早发现 N + 1 查询问题。
例如收费的New Relic,免费的Bullet。
简单来说,就是提前加载 model 关系,让 ActiveRecord 预先加载所需要的数据。
ActiveRecord 提供了以下三个方法预加载。
includes
preload
eager_load
他们的区别可以参考这里或这里。
以最常用的includes
方法为例。
> Post.includes(:comments).map{|post| post.comments}
Post Load (0.2ms) SELECT "posts".* FROM "posts"
Comment Load (0.5ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" IN (1, 2, 3, 4)
得到的结果一样,但执行的查询只有两次。
includes
方法的确很惊艳,但……
第一,代码不够优雅。
例如,假设我们现在想找的是 id 在 1 到 3 之间的 post 的 comment。
一般的我们的逻辑是,查找 id 在 1 到 3 之间的 post,获取各 post 的 comment 然后合并。
而预加载后的逻辑是,查找 id 在 1 到 3 之间的 post,关联 comment,再获取各 post 的 comment 然后合并。
总觉得有点冗余。
> Post.where(id: 1..3).includes(:comments).map{|post| post.comments}
Post Load (0.5ms) SELECT "posts".* FROM "posts" WHERE ("posts"."id" BETWEEN ? AND ?) [["id", 1], ["id", 3]]
Comment Load (0.5ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" IN (1, 2, 3)
第二,不符合 DRY。
既然我们都不喜欢 N + 1,那就应该从源头上杜绝,而不是每次查询时都要主动includes
一次。
懒癌程序员的救星Goldiloader——几乎完美的解决了以上两个问题。
gem 'goldiloader'
bundle install
以后,就可以用最直接(傻瓜)的方式点点点……
> Post.where(id: 1..3).map{|post| post.comments}
Post Load (0.2ms) SELECT "posts".* FROM "posts" WHERE ("posts"."id" BETWEEN ? AND ?) [["id", 1], ["id", 3]]
Comment Load (0.3ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" IN (1, 2, 3)
Goldiloader 默认自动加载所有关联数据,用auto_include: false
可以方便地关闭自动加载。
class Post < ApplicationRecord
has_many :comments, auto_include: false
end
以下的方法比较特殊,如果关系已经加载了,则会直接返回已缓存的值,如果没被加载,则会通过 SQL 查询。
假设现在我们需要获取每个 post 的最新的 comment。
但这不是我们想要的。
> Post.all.sum{|post| [post.id, post.comments.last&.content]}
Post Load (0.1ms) SELECT "posts".* FROM "posts"
Comment Load (0.1ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = ? ORDER BY "comments"."id" DESC LIMIT ? [["post_id", 1], ["LIMIT", 1]]
Comment Load (0.1ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = ? ORDER BY "comments"."id" DESC LIMIT ? [["post_id", 2], ["LIMIT", 1]]
Comment Load (0.1ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = ? ORDER BY "comments"."id" DESC LIMIT ? [["post_id", 3], ["LIMIT", 1]]
Comment Load (0.1ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = ? ORDER BY "comments"."id" DESC LIMIT ? [["post_id", 4], ["LIMIT", 1]]
添加选项full_load: true
后,当调用上述方法时,Goldiloader 会强制自动加载所需的关系。
class Post < ApplicationRecord
has_many :comments, fully_load: true
end
这才是我们想要的。
> Post.all.sum{|post| [post.id, post.comments.last&.content]}
Post Load (0.3ms) SELECT "posts".* FROM "posts"
Comment Load (0.2ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" IN (1, 2, 3, 4)
Goldiloader 是 ActiveRecord 的衍生工具,所以 ActiveRecord 预加载的副作用也一并继承了。
现在我们自定义一个has_one
关系,用以获取最新的一条 comment。
class Post < ApplicationRecord
has_many :comments, fully_load: true
has_one :latest_comment, -> { order(created_at: :desc) }, class_name: 'Comment'
end
遍历 post 获取最新的 comment。
> Post.all.map{|post| post.latest_comment}
不使用 Goldiloader 或者预加载时,每条 SQL 自动回加上limit 1
。
Post Load (0.3ms) SELECT "posts".* FROM "posts"
Comment Load (0.2ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = ? ORDER BY "comments"."created_at" DESC LIMIT ? [["post_id", 1], ["LIMIT", 1]]
Comment Load (0.2ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = ? ORDER BY "comments"."created_at" DESC LIMIT ? [["post_id", 2], ["LIMIT", 1]]
Comment Load (0.1ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = ? ORDER BY "comments"."created_at" DESC LIMIT ? [["post_id", 3], ["LIMIT", 1]]
Comment Load (0.1ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = ? ORDER BY "comments"."created_at" DESC LIMIT ? [["post_id", 4], ["LIMIT", 1]]
使用 Goldiloader 或者预加载时,世界变清净了,但同时会有性能隐患,因为 post 的数据量可能非常大。
Post Load (0.5ms) SELECT "posts".* FROM "posts"
Comment Load (0.2ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" IN (1, 2, 3, 4) ORDER BY "comments"."created_at" DESC
遇到以下的关系(方法),Goldiloader 会自动关闭自动预加载。
N + 1 查询问题是一个容易被忽略的问题。
发现解决它也不难,includes
已经够用,Goldiloader 更是锦上添花,对新手足够友好。
不过对于我这种被 Rails“坑”习惯的斯德哥尔摩症候群患者来说,没有includes
反而没安全感了>_<|||