当业务发展到一定程度,优化成了不得不面对的一个难题。平时自己写写小工具,玩玩就算了,哪怕是满地的 N + 1 查询也不会有什么感觉。然而随着用户量持续增长,就不得不面对,数据库优化,机子扩容等方面的问题。这篇文章简单聊聊 Rails 中解决 N + 1 查询的一个小手段 SQL Caching。原文链接 https://lanzhiheng.com/posts/sql-caching-in-rails-request
要解决 N + 1 查询带来的问题,Rails 提供了不少的工具方法,其中包括includes
, preload
, eager_load
等等,这几者的区别,笔者在上一篇文章中有所提及,这里就不赘述了。通常情况下要解决 N + 1,问题,最简单的方式就是直接采用上面的方法
> Post.eager_load(:category).where("categories.name = 'Blogs'").limit(10).each {|m| puts m.category.name }
SQL (4.7ms) SELECT "posts"."id" AS t0_r0, "posts"."title" AS t0_r1, "posts"."body" AS t0_r2, "posts"."slug" AS t0_r3, "posts"."created_at" AS t0_r4, "posts"."updated_at" AS t0_r5, "posts"."excerpt" AS t0_r6, "posts"."category_id" AS t0_r7, "posts"."draft" AS t0_r8, "categories"."id" AS t1_r0, "categories"."key" AS t1_r1, "categories"."name" AS t1_r2, "categories"."created_at" AS t1_r3, "categories"."updated_at" AS t1_r4 FROM "posts" LEFT OUTER JOIN "categories" ON "categories"."id" = "posts"."category_id" WHERE (categories.name = 'Blogs') LIMIT $1 [["LIMIT", 10]]
Blogs
Blogs
Blogs
...
假设在展示文章列表的时候我想知道这个文章是否打上最热门的标签。那么新手(像楼主)可能会把代码写成这个样子
class Post < ApplicationRecord
def tags_include?(tag_name)
return false if tag_name.blank?
tags.pluck(:name).include?(tag_name)
end
end
class PostsController < ApplicationController
def index
@posts = Post.order('created_at DESC')
end
end
<!-- app/views/posts/_post.html.erb -->
<li class="inline-post-wrapper">
<div class="title-wrapper">
<%= link_to post.title, post_path(post.slug) %> » <span class="post-meta"><%= post.created_at.strftime('%Y-%m-%d') %>
<div><%= post.tags_include?(params[:search_tag]) %></div>
</div>
</li>
当请求为
curl http://127.0.0.1:4000/posts
日志正常
Started GET "/posts" for 127.0.0.1 at 2021-12-30 23:10:33 +0800
Processing by PostsController#index as */*
Rendering posts/index.html.erb within layouts/application
Post Load (11.0ms) SELECT "posts".* FROM "posts" ORDER BY created_at DESC
只查询了数据库一次,然而加上参数后
curl http://127.0.0.1:4000/posts?search_tag=Rails
日志却是这番光景
Started GET "/posts?search_tag=Rails" for 127.0.0.1 at 2021-12-30 23:13:48 +0800
Processing by PostsController#index as */*
Parameters: {"search_tag"=>"Rails"}
Rendering posts/index.html.erb within layouts/application
Post Load (6.6ms) SELECT "posts".* FROM "posts" ORDER BY created_at DESC
↳ activerecord (6.0.3.7) lib/active_record/log_subscriber.rb:100:in `debug'
(0.3ms) SELECT "tags"."name" FROM "tags" INNER JOIN "taggings" ON "tags"."id" = "taggings"."tag_id" WHERE "taggings"."taggable_id" = $1 AND "taggings"."taggable_type" = $2 AND "taggings"."context" = $3 [["taggable_id", 76], ["taggable_type", "Post"], ["context", "tags"]]
↳ activerecord (6.0.3.7) lib/active_record/log_subscriber.rb:100:in `debug'
(0.2ms) SELECT "tags"."name" FROM "tags" INNER JOIN "taggings" ON "tags"."id" = "taggings"."tag_id" WHERE "taggings"."taggable_id" = $1 AND "taggings"."taggable_type" = $2 AND "taggings"."context" = $3 [["taggable_id", 75], ["taggable_type", "Post"], ["context", "tags"]]
↳ activerecord (6.0.3.7) lib/active_record/log_subscriber.rb:100:in `debug'
(0.2ms) SELECT "tags"."name" FROM "tags" INNER JOIN "taggings" ON "tags"."id" = "taggings"."tag_id" WHERE "taggings"."taggable_id" = $1 AND "taggings"."taggable_type" = $2 AND "taggings"."context" = $3 [["taggable_id", 74], ["taggable_type", "Post"], ["context", "tags"]]
↳ activerecord (6.0.3.7) lib/active_record/log_subscriber.rb:100:in `debug'
可见也出现了 N + 1 查询了,而且这种查询还不那么容易发现。仅仅是给渲染层多加一个方法,就容易导致这种问题。当然可以通过简单的Post.includes(:tags)
来解决这个问题。不过今天主要想介绍另一种优化策略,这种策略对于仅在渲染层调用的方法十分有效,策略名为SQL Caching。
官方对它的描述很简单
If Rails encounters the same query again for that request, it will use the cached result set as opposed to running the query against the database again.
也就是在同一个请求里面(我们可以理解成同一个 Action 里面),如果 Rails 检测到两条一摸一样的数据库查询,那么第二次查询将从缓存里面取,而不会真的去查数据库,比方说
class PostsController < ApplicationController
def index
# From Database
@posts = Post.order('created_at DESC')
# From Cached
@posts = Post.order('created_at DESC')
end
end
这种策略对于在渲染层调用复杂方法,而又不太容易通过includes
等手段去优化的场景尤为有效。应用在上面的案例中,可以把Post#tags_include?
调整成下面这个样子
class Post < ApplicationRecord
def tags_include?(tag_name)
return false if tag_name.blank?
tag_ids = ActsAsTaggableOn::Tag.where(name: 'Rails').pluck(:id)
post_ids = ActsAsTaggableOn::Tagging.where(tag_id: tag_ids, taggable_type: 'Post').pluck(:taggable_id).uniq
post_ids.include?(id)
end
end
它要做的事情其实是一样的,然而数据库查询次数却不一样。假设再次发送请求
> curl http://127.0.0.1:4000/posts?search_tag=Rails
它的日志会长得像下面这样
Started GET "/posts?search_tag=Rails" for 127.0.0.1 at 2021-12-31 23:22:05 +0800
Processing by PostsController#index as */*
Parameters: {"search_tag"=>"Rails"}
Rendering posts/index.html.erb within layouts/application
Post Load (5.8ms) SELECT "posts".* FROM "posts" ORDER BY created_at DESC
↳ activerecord (6.0.3.7) lib/active_record/log_subscriber.rb:100:in `debug'
(0.4ms) SELECT "tags"."id" FROM "tags" WHERE "tags"."name" = $1 [["name", "Rails"]]
↳ activerecord (6.0.3.7) lib/active_record/log_subscriber.rb:100:in `debug'
(0.3ms) SELECT "taggings"."taggable_id" FROM "taggings" WHERE "taggings"."tag_id" = $1 AND "taggings"."taggable_type" = $2 [["tag_id", 5], ["taggable_type", "Post"]]
↳ activerecord (6.0.3.7) lib/active_record/log_subscriber.rb:100:in `debug'
CACHE (0.0ms) SELECT "tags"."id" FROM "tags" WHERE "tags"."name" = $1 [["name", "Rails"]]
↳ activerecord (6.0.3.7) lib/active_record/log_subscriber.rb:100:in `debug'
CACHE (0.0ms) SELECT "taggings"."taggable_id" FROM "taggings" WHERE "taggings"."tag_id" = $1 AND "taggings"."taggable_type" = $2 [["tag_id", 5], ["taggable_type", "Post"]]
↳ activerecord (6.0.3.7) lib/active_record/log_subscriber.rb:100:in `debug'
CACHE (0.0ms) SELECT "tags"."id" FROM "tags" WHERE "tags"."name" = $1 [["name", "Rails"]]
↳ activerecord (6.0.3.7) lib/active_record/log_subscriber.rb:100:in `debug'
CACHE (0.0ms) SELECT "taggings"."taggable_id" FROM "taggings" WHERE "taggings"."tag_id" = $1 AND "taggings"."taggable_type" = $2 [["tag_id", 5], ["taggable_type", "Post"]]
为了方便查看我用空行分割开了,可见只有第一次调用该方法的时候会查询数据库,随后 Rails 会将查询结果缓存起来。如果再次遇到相同的查询,那么它将会从缓存中获取结果,而不会再去访问数据库。其实就是 Rails 把查询结果通过合适的数据结构存储在内存中,下次需要的时候直接从内存中获取,节省了查询数据库的时间。
这篇文章作为上一篇文章的补充,在一些场景里面,使用内置的includes
, preload
, eager_load
解决 N + 1 问题,可能会稍显麻烦,并且在一定程度上缺乏灵活性。如果在渲染层中调用了会引发 N + 1 问题的复杂方法,而这个方法并不容易使用includes
等手段来解决的时候,采用SQL Caching或许是一个不错的选择,在同一个请求中相同的查询会被缓存起来,供下一次查询使用,能有效解决 N + 1 问题。