Rails Rails 查询优化小手段-SQL Caching

lanzhiheng · 2022年01月11日 · 最后由 lanzhiheng 回复于 2022年01月23日 · 923 次阅读

当业务发展到一定程度,优化成了不得不面对的一个难题。平时自己写写小工具,玩玩就算了,哪怕是满地的 N + 1 查询也不会有什么感觉。然而随着用户量持续增长,就不得不面对,数据库优化,机子扩容等方面的问题。这篇文章简单聊聊 Rails 中解决 N + 1 查询的一个小手段 SQL Caching。原文链接 https://lanzhiheng.com/posts/sql-caching-in-rails-request

方法层面的优化

要解决 N + 1 查询带来的问题,Rails 提供了不少的工具方法,其中包括includes, preload, eager_load等等,这几者的区别,笔者在上一篇文章中有所提及,这里就不赘述了。通常情况下要解决 N + 1,问题,最简单的方式就是直接采用上面的方法

> Post.eager_load(:category).where("categories.name = 'Blogs'").limit(10).each {|m| puts m.category.name }
  SQL (4.7ms)  SELECT "posts"."id" AS t0_r0, "posts"."title" AS t0_r1, "posts"."body" AS t0_r2, "posts"."slug" AS t0_r3, "posts"."created_at" AS t0_r4, "posts"."updated_at" AS t0_r5, "posts"."excerpt" AS t0_r6, "posts"."category_id" AS t0_r7, "posts"."draft" AS t0_r8, "categories"."id" AS t1_r0, "categories"."key" AS t1_r1, "categories"."name" AS t1_r2, "categories"."created_at" AS t1_r3, "categories"."updated_at" AS t1_r4 FROM "posts" LEFT OUTER JOIN "categories" ON "categories"."id" = "posts"."category_id" WHERE (categories.name = 'Blogs') LIMIT $1  [["LIMIT", 10]]
Blogs
Blogs
Blogs
...

假设在展示文章列表的时候我想知道这个文章是否打上最热门的标签。那么新手(像楼主)可能会把代码写成这个样子

class Post < ApplicationRecord
  def tags_include?(tag_name)
    return false if tag_name.blank?

    tags.pluck(:name).include?(tag_name)
  end
end
class PostsController < ApplicationController
  def index
    @posts = Post.order('created_at DESC')
  end
end
<!-- app/views/posts/_post.html.erb -->

<li class="inline-post-wrapper">
  <div class="title-wrapper">
    <%= link_to post.title, post_path(post.slug) %> » <span class="post-meta"><%= post.created_at.strftime('%Y-%m-%d') %>
    <div><%= post.tags_include?(params[:search_tag]) %></div>
  </div>
</li>

当请求为

curl http://127.0.0.1:4000/posts

日志正常

Started GET "/posts" for 127.0.0.1 at 2021-12-30 23:10:33 +0800
Processing by PostsController#index as */*
  Rendering posts/index.html.erb within layouts/application
  Post Load (11.0ms)  SELECT "posts".* FROM "posts" ORDER BY created_at DESC

只查询了数据库一次,然而加上参数后

curl http://127.0.0.1:4000/posts?search_tag=Rails

日志却是这番光景

Started GET "/posts?search_tag=Rails" for 127.0.0.1 at 2021-12-30 23:13:48 +0800
Processing by PostsController#index as */*
  Parameters: {"search_tag"=>"Rails"}
  Rendering posts/index.html.erb within layouts/application
  Post Load (6.6ms)  SELECT "posts".* FROM "posts" ORDER BY created_at DESC
  ↳ activerecord (6.0.3.7) lib/active_record/log_subscriber.rb:100:in `debug'
   (0.3ms)  SELECT "tags"."name" FROM "tags" INNER JOIN "taggings" ON "tags"."id" = "taggings"."tag_id" WHERE "taggings"."taggable_id" = $1 AND "taggings"."taggable_type" = $2 AND "taggings"."context" = $3  [["taggable_id", 76], ["taggable_type", "Post"], ["context", "tags"]]
  ↳ activerecord (6.0.3.7) lib/active_record/log_subscriber.rb:100:in `debug'
   (0.2ms)  SELECT "tags"."name" FROM "tags" INNER JOIN "taggings" ON "tags"."id" = "taggings"."tag_id" WHERE "taggings"."taggable_id" = $1 AND "taggings"."taggable_type" = $2 AND "taggings"."context" = $3  [["taggable_id", 75], ["taggable_type", "Post"], ["context", "tags"]]
  ↳ activerecord (6.0.3.7) lib/active_record/log_subscriber.rb:100:in `debug'
   (0.2ms)  SELECT "tags"."name" FROM "tags" INNER JOIN "taggings" ON "tags"."id" = "taggings"."tag_id" WHERE "taggings"."taggable_id" = $1 AND "taggings"."taggable_type" = $2 AND "taggings"."context" = $3  [["taggable_id", 74], ["taggable_type", "Post"], ["context", "tags"]]
  ↳ activerecord (6.0.3.7) lib/active_record/log_subscriber.rb:100:in `debug'

可见也出现了 N + 1 查询了,而且这种查询还不那么容易发现。仅仅是给渲染层多加一个方法,就容易导致这种问题。当然可以通过简单的Post.includes(:tags)来解决这个问题。不过今天主要想介绍另一种优化策略,这种策略对于仅在渲染层调用的方法十分有效,策略名为SQL Caching

官方对它的描述很简单

If Rails encounters the same query again for that request, it will use the cached result set as opposed to running the query against the database again.

也就是在同一个请求里面(我们可以理解成同一个 Action 里面),如果 Rails 检测到两条一摸一样的数据库查询,那么第二次查询将从缓存里面取,而不会真的去查数据库,比方说

class PostsController < ApplicationController
  def index
    # From Database
    @posts = Post.order('created_at DESC')
    # From Cached
    @posts = Post.order('created_at DESC')
  end
end

这种策略对于在渲染层调用复杂方法,而又不太容易通过includes等手段去优化的场景尤为有效。应用在上面的案例中,可以把Post#tags_include?调整成下面这个样子

class Post < ApplicationRecord
  def tags_include?(tag_name)
    return false if tag_name.blank?

    tag_ids = ActsAsTaggableOn::Tag.where(name: 'Rails').pluck(:id)
    post_ids = ActsAsTaggableOn::Tagging.where(tag_id: tag_ids, taggable_type: 'Post').pluck(:taggable_id).uniq

    post_ids.include?(id)
  end
end

它要做的事情其实是一样的,然而数据库查询次数却不一样。假设再次发送请求

> curl http://127.0.0.1:4000/posts?search_tag=Rails

它的日志会长得像下面这样

Started GET "/posts?search_tag=Rails" for 127.0.0.1 at 2021-12-31 23:22:05 +0800
Processing by PostsController#index as */*
  Parameters: {"search_tag"=>"Rails"}
  Rendering posts/index.html.erb within layouts/application
  Post Load (5.8ms)  SELECT "posts".* FROM "posts" ORDER BY created_at DESC
  ↳ activerecord (6.0.3.7) lib/active_record/log_subscriber.rb:100:in `debug'
   (0.4ms)  SELECT "tags"."id" FROM "tags" WHERE "tags"."name" = $1  [["name", "Rails"]]
  ↳ activerecord (6.0.3.7) lib/active_record/log_subscriber.rb:100:in `debug'
   (0.3ms)  SELECT "taggings"."taggable_id" FROM "taggings" WHERE "taggings"."tag_id" = $1 AND "taggings"."taggable_type" = $2  [["tag_id", 5], ["taggable_type", "Post"]]
  ↳ activerecord (6.0.3.7) lib/active_record/log_subscriber.rb:100:in `debug'

  CACHE  (0.0ms)  SELECT "tags"."id" FROM "tags" WHERE "tags"."name" = $1  [["name", "Rails"]]
  ↳ activerecord (6.0.3.7) lib/active_record/log_subscriber.rb:100:in `debug'
  CACHE  (0.0ms)  SELECT "taggings"."taggable_id" FROM "taggings" WHERE "taggings"."tag_id" = $1 AND "taggings"."taggable_type" = $2  [["tag_id", 5], ["taggable_type", "Post"]]
  ↳ activerecord (6.0.3.7) lib/active_record/log_subscriber.rb:100:in `debug'
  CACHE  (0.0ms)  SELECT "tags"."id" FROM "tags" WHERE "tags"."name" = $1  [["name", "Rails"]]
  ↳ activerecord (6.0.3.7) lib/active_record/log_subscriber.rb:100:in `debug'
  CACHE  (0.0ms)  SELECT "taggings"."taggable_id" FROM "taggings" WHERE "taggings"."tag_id" = $1 AND "taggings"."taggable_type" = $2  [["tag_id", 5], ["taggable_type", "Post"]]

为了方便查看我用空行分割开了,可见只有第一次调用该方法的时候会查询数据库,随后 Rails 会将查询结果缓存起来。如果再次遇到相同的查询,那么它将会从缓存中获取结果,而不会再去访问数据库。其实就是 Rails 把查询结果通过合适的数据结构存储在内存中,下次需要的时候直接从内存中获取,节省了查询数据库的时间。

尾声

这篇文章作为上一篇文章的补充,在一些场景里面,使用内置的includes, preload, eager_load解决 N + 1 问题,可能会稍显麻烦,并且在一定程度上缺乏灵活性。如果在渲染层中调用了会引发 N + 1 问题的复杂方法,而这个方法并不容易使用includes等手段来解决的时候,采用SQL Caching或许是一个不错的选择,在同一个请求中相同的查询会被缓存起来,供下一次查询使用,能有效解决 N + 1 问题。

好总结。以前试过这种方案,觉得稍微有点 hack 就没这么整😀

捉个虫:

tag_ids = ActsAsTaggableOn::Tag.where(name: 'Rails').pluck(:id)

应该是

tag_ids = ActsAsTaggableOn::Tag.where(name: tag_name).pluck(:id)

顺便问一下楼主:ActsAsTaggableOn::Tag 是在模型下建的命名空间么,还是用的 gem 的要求

jicheng1014 回复

谢了老哥,果然有这样的 gem

楼主高产 🚀

zhengpd 回复

多谢 Eric。😀

需要 登录 后方可回复, 如果你还没有账号请 注册新账号