Homeland: notes on today's ruby-china optimizations

huacnlee · 2012-02-04 · last reply by psvr on 2012-02-06 · 6266 views

I did a few things today:

1. Added last_reply_user_login and node_name fields to the Topic model, written when the data is created, to cut down on dynamic queries on the list page;
2. Fixed a performance problem caused by a pagination bug;
3. Added Mongoid's cache to User, Node, and Section;
4. Moved MongoDB over to MongoIC;
5. Reinstalled Ruby on the server with the falcon patch and tried tuning the GC parameters (it looks better than before).

As the data keeps growing I've felt ruby-china getting slower and slower. The logs show the forum home page sometimes taking 600ms... unacceptable, and the paginated forum pages are even worse at 1300ms+. So I started digging into the slowdown. The first idea was to denormalize the last-reply user and the node directly into Topic:

class Topic
  include Mongoid::Document

  belongs_to :user

  # denormalized copies, written when the record is saved
  belongs_to :last_reply_user, :class_name => "User"
  field :last_reply_user_login

  belongs_to :node
  field :node_name
end

The list page currently needs three associated records: user, last_reply_user, and node. Node data rarely changes, and the list only needs its name field, so if node_name is stored directly on the Topic at save time, a lot of querying is saved; the same goes for last_reply_user. After this change the forum home page is down to just two dynamic queries.
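As a rough sketch of how the denormalized fields can be filled in at write time (the callbacks and the reply hook below are illustrative, not the actual Homeland code):

class Topic
  # copy the node name onto the topic whenever it is saved
  before_save do
    self.node_name = node.name if node
  end
end

class Reply
  include Mongoid::Document
  belongs_to :user
  belongs_to :topic

  # when a reply is created, push the replier's login onto its topic
  after_create do
    topic.update_attributes(:last_reply_user_id => user_id,
                            :last_reply_user_login => user.login)
  end
end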

One downside: this change forced me to turn off the ability to change account names.

The home page did get faster, but pagination was still slow. Then I happened to notice that the includes query for user in the query log carried an enormous number of ids, as many as there are Topics in total...

Only later did I learn that will_paginate doesn't support Mongoid's way of paginating: the pagination we had was an illusion. It fetched all the documents first and then paginated the resulting Array...

The fix is to install the will_paginate_mongoid gem; the details are on its GitHub page: https://github.com/lucasas/will_paginate_mongoid

After that the speed naturally went up. With little data this bug is impossible to notice...
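Roughly, the wiring looks like this (per the gem's README at the time; the controller scope and per-page value are illustrative):

# Gemfile -- adds skip/limit-based pagination to Mongoid criteria,
# instead of loading every document into an Array
gem "will_paginate_mongoid"

# e.g. in TopicsController
@topics = Topic.paginate(:page => params[:page], :per_page => 15)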

A marker to compare against when tuning later: 1 CPU, 1 GB of RAM, 4 unicorn processes, 1000 topics, 1000 users. Current response times:

Started GET "/topics/last"
Completed 200 OK in 265ms (Views: 201.0ms | Mongo: 52.8ms | Solr: 0.0ms)

Started GET "/topics"
Completed 200 OK in 120ms (Views: 88.4ms | Mongo: 30.7ms | Solr: 0.0ms)

Started GET "/topics/node2?page=2"
Completed 200 OK in 198ms (Views: 152.2ms | Mongo: 36.3ms | Solr: 0.0ms)

Started GET "/topics"
Completed 200 OK in 223ms (Views: 90.3ms | Mongo: 130.8ms | Solr: 0.0ms)

The topic view page, taking topic 777 as an example (currently 58 replies):

Started GET "/topics/777" at 2012-02-04 20:41:05 +0800
Processing by TopicsController#show as HTML
  Parameters: {"id"=>"777"}
  Rendered topics/_base.html.erb (1.9ms)
  Rendered app/cells/topics/sidebar_for_new_topic_node.html.erb (0.9ms)
  Rendered topics/_topic_info.html.erb (3.7ms)
  Rendered topics/translation/_topic_reply_statics.zh-CN.html.erb (0.3ms)
  Rendered replies/_reply.html.erb (5.1ms)
  Rendered replies/_reply.html.erb (3.5ms)
  Rendered replies/_reply.html.erb (4.0ms)
  ... (54 more replies/_reply.html.erb renders, roughly 3-10ms each) ...
  Rendered replies/_reply.html.erb (3.7ms)
  Rendered topics/_replies.html.erb (252.6ms)
  Rendered topics/_editor_toolbar.html.erb (0.4ms)
  Rendered app/cells/topics/reply_help_block.zh-CN.html.erb (0.0ms)
  Rendered topics/show.html.erb within layouts/application (279.3ms)
  Rendered common/_search_form.html.erb (0.2ms)
  Rendered common/_user_nav.html.erb (4.9ms)
  Rendered common/_feeds.html.erb (0.7ms)
  Rendered common/_google_analytics.html.erb (0.1ms)
Completed 200 OK in 500ms (Views: 288.6ms | Mongo: 49.6ms | Solr: 0.0ms)

It seems MongoIC still isn't pulling its weight; a few requests just now:

Completed 200 OK in 842ms (Views: 162.0ms | Mongo: 677.9ms | Solr: 0.0ms)
Completed 200 OK in 948ms (Views: 95.1ms | Mongo: 851.6ms | Solr: 0.0ms)

Why render the reply partial that many times? Add the :collection option.

For pagination I'd recommend kaminari; as I recall, pagination with will_paginate has been problematic since mongoid 2.2.

#3 @hooopo What :collection option?

This bit:

<% replies.each_with_index do |reply,i| %>
  <%= render :partial => "replies/reply", :locals => {:topic => topic, :item => reply, :i => i} %>
<% end %>

Change it to:

<%= render :partial => "replies/reply", :collection => replies%>

(the variables inside the partial may need adjusting)

Rendering partials is one of Rails' performance bottlenecks; calling render inside a loop like this is the view-layer equivalent of an n+1 query in the database...
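For reference, :collection can be combined with :locals for extra variables, and :as sets the local name the partial sees; this combined form is just a sketch of the adjustment mentioned above:

<%= render :partial => "replies/reply",
           :collection => replies,
           :as => :reply,
           :locals => {:topic => topic} %>

Inside the partial, reply and reply_counter then come for free from the collection render.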

#6 @hooopo Partial rendering performance has come up in discussion before, I think, but there aren't benchmarks against recent Rails versions; it would be good to set up a proper performance test when conditions allow.

#6 @hooopo So Rails 3 has had this all along; I only just learned about it. http://rails.rubyonrails.org/classes/ActionView/Partials.html

<%= render :partial => "replies/reply", :collection => replies%>

replies/reply.html.erb

Floor <%= reply_counter + 1 %>, created at <%= reply.created_at %>

#6 @hooopo The two approaches look equivalent in effect though; it's just a simplification, and the execution time is about the same.

Just tested: the most expensive part of the topic view page right now is the format_topic_body conversion. Taking topic 777 with its 58 replies, skipping format_topic_body brings the Rendered topics/_replies.html.erb time down from 252.6ms to about 60ms, which is a substantial saving. What needs to happen is to run format_topic_body when the data is submitted, so it comes back from the database already formatted.

@fsword I put together a simple test:

<% unless params[:collection]%>
  <% @topic.posts.each do |post|%>
    <%= render :partial => "post", :locals => {:post => post}%>
  <% end%>
<% else %>
  <%= render :partial => "post", :collection => @topic.posts%>
<% end %>

With the :collection option, 100 requests take 1.73s:

hooopo@hooopo:~/demo$ ab -c1 -n100 http://localhost:3000/topics/1?collection=1
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done


Server Software:        thin
Server Hostname:        localhost
Server Port:            3000

Document Path:          /topics/1?collection=1
Document Length:        26220 bytes

Concurrency Level:      1
Time taken for tests:   1.731 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      2688900 bytes
HTML transferred:       2622000 bytes
Requests per second:    57.78 [#/sec] (mean)
Time per request:       17.307 [ms] (mean)
Time per request:       17.307 [ms] (mean, across all concurrent requests)
Transfer rate:          1517.20 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    11   17  10.7     11      47
Waiting:       11   17  10.7     11      47
Total:         11   17  10.7     11      47

Percentage of the requests served within a certain time (ms)
  50%     11
  66%     13
  75%     14
  80%     36
  90%     37
  95%     38
  98%     43
  99%     47
 100%     47 (longest request)

Rendering in a loop takes 2.24s:

hooopo@hooopo:~/demo$ ab -c1 -n100 http://localhost:3000/topics/1
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done


Server Software:        thin
Server Hostname:        localhost
Server Port:            3000

Document Path:          /topics/1
Document Length:        26467 bytes

Concurrency Level:      1
Time taken for tests:   2.249 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      2713600 bytes
HTML transferred:       2646700 bytes
Requests per second:    44.46 [#/sec] (mean)
Time per request:       22.492 [ms] (mean)
Time per request:       22.492 [ms] (mean, across all concurrent requests)
Transfer rate:          1178.20 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    14   22  12.3     16      61
Waiting:       14   22  12.3     16      61
Total:         15   22  12.3     16      61

Percentage of the requests served within a certain time (ms)
  50%     16
  66%     18
  75%     22
  80%     42
  90%     43
  95%     48
  98%     52
  99%     61
 100%     61 (longest request)

ruby 1.9.3-p0, Rails 3.2.1, 100 comments; _post.html.erb just reads attributes, no complex logic. The logs also show the looped render taking noticeably longer than the collection render.

#10 @huacnlee That falls apart when posts get edited... caching is the only option.

#12 @hooopo And it isn't that easy to move anyway: it depends on a lot of Rails helpers, and if the formatting rules change later the stored data can't be regenerated. Fragment caching it is.

Another problem, though: when a user changes their email, the avatar can't be updated because the cache can't be cleared. Still a nuisance... though it should be rare. If avatars used a fixed URL based on the username, e.g. //l.ruby-china.com/avatar/huacnlee.jpg, with each new upload overwriting that file, the problem would go away.

Set an expiry time on the cache.

#15 @Rei That's a workable compromise; I'll add it.
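A sketch of that compromise (the helper name, cache key, and TTL below are illustrative, not the actual Homeland code):

# cache the resolved avatar URL for a day; a stale avatar fixes
# itself once the entry expires
def cached_avatar_url(user)
  Rails.cache.fetch("avatar-url/#{user.login}", :expires_in => 1.day) do
    user.avatar_url   # hypothetical helper that builds the avatar URL
  end
end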

#11 @hooopo Looks like the difference is quite big. Do you still have the test logs?

#13 @huacnlee I took a look at the format_topic_body helper; it's too complex, and it parses the same string over and over. Could some of that be moved to JS, with the backend serving basic JSON data?

#18 @fsword TimeAgo has already been moved to JS. format_topic_body itself is actually fairly fast already; porting it to JS would mean reimplementing too much from scratch, it would be hard to maintain, and it couldn't be cached. I've added a cache now and the speed shot right up...
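For reference, fragment-caching a reply partial looks roughly like this (the cache key layout is an assumption, not the actual Homeland markup):

<% cache ["reply-body-v1", reply.id, reply.updated_at] do %>
  <div class="reply-body">
    <%= format_topic_body(reply.body) %>
  </div>
<% end %>

Including updated_at in the key means an edited reply gets a fresh fragment automatically.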

#17 @fsword https://gist.github.com/1739013 The first 100 requests use the :collection option; the remainder don't.

It's still slow right now because the connection to MongoIC is slow... it's been unstable lately; I'll move the database back later.

Now, after adding a cache for the reply content:

Started GET "/topics/777"
Completed 200 OK in 124ms (Views: 42.1ms | Mongo: 25.6ms | Solr: 0.0ms)

#22 @huacnlee If all else fails, widen the cache scope and cache as much of the rendered HTML as possible.

Also, the memcached cache in Rails 3.1 seems to have performance problems: it's an order of magnitude slower than the memory store. I'm not sure whether that's down to my using dalli. Rails 3.1 has quite a few performance traps like this.
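For context, a dalli-backed store in Rails 3 is usually wired up like this (server address, namespace, and TTL are illustrative):

# config/environments/production.rb
config.cache_store = :dalli_store, "localhost:11211",
                     { :namespace => "homeland", :expires_in => 1.day, :compress => true }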

#23 @dazuiba Widening the cache scope inevitably raises the odds of cache invalidation. Not worth it, unless you want to go all the way to static pages. Cache granularity should be as fine as possible.

One step at a time; it can carry the load for now, the rest can wait.

#25 @huacnlee Thanks for the hard work! Learned a lot.

@huacnlee Could you talk a bit about Mongoid#cache? The docs say:

Out of the box, Mongoid wraps the MongoDB Ruby Driver's cursor in order to be memory efficient for large queries and data sets. However if you want the query to load all matching documents in memory and return them on n iterations without hitting the database again you may cache a class or criteria:
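For reference, the documented usage is roughly the following (a sketch of the Mongoid 2 API as quoted above; the models and field names stand in for Homeland's):

class Node
  include Mongoid::Document
  field :name

  # cache every query issued through this class
  cache
end

# or cache a single criteria (field name is illustrative)
topics = Topic.where(:node_name => "Ruby").cache
topics.each { |t| t }  # first pass hits MongoDB
topics.each { |t| t }  # later passes iterate the in-memory documents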

Is this cache per request, or global? What's the mechanism, roughly? Thanks.

#27 @niedhui From what I read on the MongoDB side it keeps the query results in memory; I don't know the details yet, still experimenting...

@huacnlee The Quora-clone project has the same performance problem described above.
