新手问题 怎么查 Rails 应用的灵异问题? (结帖,问题解决,但具体原因不清楚)

cxh116 · 2015年01月10日 · 最后由 cxh116 回复于 2015年01月12日 · 3789 次阅读

应用组合 nginx + puma + mysql,网站有时报 502 的错误,查 nginx 错误日志显示是

[error] 992#0: *68629 upstream timed out (110: Connection timed out) while reading response header from upstream, client:

看日志是 nginx 连接 puma 有问题,某些请求运行时间过长. 以为是代码写的有问题,参考 Debugging, Unicorns, Rails and Timeouts 这篇的指导,运行了 24 小时,下面是一些日志摘要

W, [2015-01-09T18:27:22.815086 #9680]  WARN -- : /null? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-09T18:27:43.346427 #9680]  WARN -- : /null? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-09T18:29:35.144107 #9680]  WARN -- : /api/product_orders?page=1&token=xxxx&10=10 I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-09T18:30:03.399384 #9680]  WARN -- : /api/product_orders?page=1&token=xxxx&10=10 I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-09T19:51:58.257420 #9674]  WARN -- : /farm_products? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-09T19:51:58.347016 #9672]  WARN -- : /farm_products/155? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-09T19:52:14.973153 #9672]  WARN -- : /farm_products? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-09T19:52:26.380556 #9672]  WARN -- : /farm_products? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-09T19:52:38.458941 #9672]  WARN -- : /farm_products? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-09T19:52:48.322228 #9680]  WARN -- : /farm_products/155? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-09T19:53:30.383086 #9680]  WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-09T20:02:16.656869 #9680]  WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(

这些请求有时会超过 8 秒钟,但你去访问的时候又正常了,随机在不同的请求路径出现. 最不可理解的是 "/null?" 这个是 android 手机端调用的 bug,地址是不存在的,也就是说在在 routes 就被返回 404 了,根本不会有什么数据库查询之类的,但它处理时间居然超过了 8 秒

服务器应该也没有问题,查了同一时间能正常响应的请求,最快的请求 9ms 就完成了,一般 100ms 左右

感觉也不像是 puma 的问题,因为其它的网站也有用 puma,没有出现这问题。

也不像代码有死循环问题,从 puma 进程启动时间来看,只要我没有部署手动重启,启动时间不会变.god 都没有因为超过内存或 cpu 使用限制而杀掉 puma 进程,因为没有超过限制过

自己对此问题实在不知道怎么下手了,所以用论坛的大牛们求救?指点一下有没有什么工具或方法查此类问题。先说声谢谢了

给出的信息还不够详细,Puma,Nginx 配置发出来看看,还有 502 的是什么样的功能

出现这些情况的时候是否都是并发比较多的时候?可以检查一下操作系统的 ulimit 等参数,可能文件打开句柄数或者 socket 打开数之类超过了限制,导致 nginx 等待 puma 处理掉其他请求的时候,没有办法再打开更多的进程而出现超时。

#1 楼 @huacnlee nginx site conf

server {
    listen       80;
    server_name  www.xxx.com;

    root   /home/manor/app/current/public;

    access_log  /var/log/nginx/xxx.com.access.log  main;

    error_page  404              /404.html;
    error_page   500 502 503 504  /50x.html;
    location = /50x.html {
        root   /usr/share/nginx/html;
    }

    location ~* \.(js|css|png|jpg|jpeg|gif|ico)$ {
        expires 30d;
        access_log off;
    }

    location / {
        index  index.html;
        try_files $uri @manorpuma;
    }

    location @manorpuma {
        proxy_set_header  X-Real-IP        $remote_addr;
        proxy_set_header  X-Forwarded-For  $proxy_add_x_forwarded_for;
        proxy_set_header  Host             $http_host;
        proxy_redirect    off;
        proxy_pass   http://127.0.0.1:9292;
    }
}

nginx conf

user  nginx;
worker_processes  1;

error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;


events {
    worker_connections  1024;
}


http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';


    sendfile        on;
    #tcp_nopush     on;

    keepalive_timeout  65;
    server_tokens off;

    client_max_body_size 10m;

    #gzip  on;

    include /etc/nginx/conf.d/*.conf;
}

puma conf

base_name = 'manor'

directory "/home/#{base_name}/app/current"

worker_timeout 10
workers 2
threads 8,32
#bind "unix:///home/#{base_name}/app/current/tmp/puma.sock?mask=0777"
bind "tcp://127.0.0.1:9292"
pidfile "/home/#{base_name}/app/current/tmp/puma.pid"
state_path "/home/#{base_name}/app/current/tmp/puma.state"

preload_app!

on_worker_boot do
  #$redis.client.reconnect

  ActiveRecord::Base.connection.reconnect!
end

502 请求出现的路径也是随机的,也的只是后台简单的单表查询

#2 楼 @quakewang 并发不高,晚上 0 点时也出现这种情况,那个时候基本上几分钟一个请求

ulimit -a 结果

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63706
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 63706
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

尝试过 ab 简单测试首页,1w 个请求,100 个并发,失败请求数为 0,也没有看到超时,首页最少查了 4 张表

这台物理机上面放的虚拟机蛮多的,会不会是因为 CPU 没有限制,某些虚拟机把 CPU 占用完了而导致超时?

ps

manor    15270  0.0  1.4 291476 115020 ?       Sl   09:19   0:04 puma 2.10.2 (tcp://127.0.0.1:9292)                                                                                                                
manor    15273  1.5  3.0 1321304 252220 ?      Sl   09:19  12:08 puma: cluster worker 0: 15270                                                                                                                     
manor    15277  1.6  3.0 1318972 251180 ?      Sl   09:19  12:54 puma: cluster worker 1: 15270  

rails log, 这是今天上午以来的超时日志,像 squog48480.txt 这个路径是不存在的,是别人在扫描,但也超时了 8 秒钟

W, [2015-01-10T14:10:12.408338 #15273]  WARN -- : /admin/pay_details?utf8=%E2%9C%93&q%5Border_number_cont%5D=&q%5Bpay_order_product_orders_order_number_eq%5D=150109455847&commit=%E6%90%9C%E7%B4%A2 I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T14:10:44.467813 #15273]  WARN -- : /admin/pay_details?utf8=%E2%9C%93&q%5Border_number_cont%5D=&q%5Bpay_order_product_orders_order_number_eq%5D=150109580972&commit=%E6%90%9C%E7%B4%A2 I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T14:25:57.898674 #15273]  WARN -- : /admin/pay_details/415? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T14:36:19.919086 #15273]  WARN -- : /products/7? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T14:36:24.623033 #15273]  WARN -- : /admin/pay_details/415? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T14:36:40.778678 #15273]  WARN -- : /products/7? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T14:39:51.156973 #15273]  WARN -- : /admin/product_orders? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T14:42:20.671041 #15273]  WARN -- : /admin/manors/28? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T14:42:22.235304 #15273]  WARN -- : /admin/manors/28? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T14:56:17.433214 #15273]  WARN -- : /farm_products/151? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T15:28:11.755474 #15273]  WARN -- : /admin/login? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T15:28:19.967264 #15273]  WARN -- : /admin/coupons/1? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:09:20.641891 #15273]  WARN -- : /farm_products/203?from=singlemessage&isappinstalled=0 I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:09:44.190186 #15273]  WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:28:58.509238 #15273]  WARN -- : /manors/28? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:28:58.509812 #15273]  WARN -- : /farm_products/204? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:28:58.512869 #15273]  WARN -- : /about? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:28:58.513834 #15273]  WARN -- : /farm_products? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:28:58.520793 #15273]  WARN -- : /farm_products/203? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:28:59.594871 #15273]  WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:28:59.618843 #15273]  WARN -- : /manors? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:28:59.632609 #15273]  WARN -- : /login? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:28:59.632963 #15273]  WARN -- : /farm_products/list?cid=1 I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:29:08.643499 #15273]  WARN -- : /manors/28? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:29:08.646714 #15273]  WARN -- : /farm_products/204? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:29:09.600770 #15273]  WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:29:13.687735 #15273]  WARN -- : /manors/28? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:29:13.695943 #15273]  WARN -- : /manors/28? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:29:13.695795 #15273]  WARN -- : /manors/28? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:29:15.992802 #15273]  WARN -- : /farm_products/204? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:29:15.994440 #15273]  WARN -- : /farm_products/204? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:29:15.995973 #15273]  WARN -- : /farm_products/204? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:29:16.003169 #15273]  WARN -- : /farm_products/204? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:29:16.006432 #15273]  WARN -- : /farm_products/204? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:29:16.013690 #15273]  WARN -- : /farm_products/204? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:29:19.624268 #15273]  WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:30:02.932361 #15273]  WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T17:17:48.064364 #15273]  WARN -- : /products/23? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T17:18:29.761654 #15273]  WARN -- : /farm_products/151? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T17:23:53.770625 #15273]  WARN -- : /farm_products/168? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T17:30:08.645647 #15273]  WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T17:31:42.602090 #15273]  WARN -- : /mapi/merchants/login? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T18:23:42.086524 #15273]  WARN -- : /manors/27? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T18:24:09.796062 #15273]  WARN -- : /farm_products/209? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T18:24:21.304103 #15273]  WARN -- : /user_coupons? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T18:24:24.149114 #15273]  WARN -- : /pay_orders/274? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T18:24:41.174035 #15273]  WARN -- : /user_coupons? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T18:24:56.974919 #15273]  WARN -- : /user_coupons? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T18:25:31.223952 #15273]  WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T18:27:28.185707 #15273]  WARN -- : /pay_orders/from_product_order?product_order_id=321 I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T18:27:31.065133 #15273]  WARN -- : /products/4? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T18:27:49.097025 #15273]  WARN -- : /farm_products/131? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T18:28:17.217875 #15273]  WARN -- : /tourism_products/14? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T18:28:28.265505 #15273]  WARN -- : /pay_orders/277/go? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T18:28:46.029485 #15273]  WARN -- : /notes/4? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T18:47:42.219719 #15273]  WARN -- : /farm_products/list?cid=4 I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T18:47:43.973957 #15273]  WARN -- : /farm_products/list?cid=4 I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T18:48:05.570790 #15273]  WARN -- : /farm_products/list?cid=2&page=2 I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T19:57:21.140408 #15273]  WARN -- : /buddies/new? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T20:01:20.062225 #15273]  WARN -- : /manors/3? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T20:03:00.752695 #15273]  WARN -- : /manors/3? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T20:28:55.244187 #15273]  WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T20:30:42.501772 #15273]  WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T20:36:58.784065 #15273]  WARN -- : /activities? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T20:40:28.962651 #15273]  WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T20:41:06.718788 #15273]  WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T20:47:15.851442 #15273]  WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T20:56:22.624565 #15273]  WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T20:59:53.813606 #15273]  WARN -- : /uztce33383.txt? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T21:02:09.573501 #15273]  WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T21:08:49.833639 #15273]  WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T21:10:52.103289 #15273]  WARN -- : /squog48480.txt? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T21:19:50.979843 #15273]  WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T21:38:22.530999 #15273]  WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T21:44:01.503328 #15273]  WARN -- : /farm_products/165? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T21:54:59.149493 #15273]  WARN -- : /forget? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T22:00:11.402326 #15273]  WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T22:03:53.743778 #15273]  WARN -- : /manor_product? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T22:04:54.059040 #15273]  WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T22:31:22.260466 #15273]  WARN -- : /tourism_products/6? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T22:41:30.776601 #15273]  WARN -- : /products/40? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T22:41:40.844292 #15273]  WARN -- : /products/40? I’m about to timeout bringing down my unicorn worker too :(

  1. unicorn 有单独的 error log,不知道 puma 有没有,可以看看那个日志
  2. 压力测试不反应真实情况,newrelic 按上试一下,免费版的就可以
  3. 检查一下外部请求和服务,包括 redis memcache 等

#4 楼 @cxh116 我看到 open file 是 1024,这个值是默认值吧,有点偏小。 你监控一下 lsof,看打开的文件数是不是超过了,而且错误日志有点奇怪,你说用的是 puma,但是为什么错误日志里面是 unicorn:I’m about to timeout bringing down my unicorn worker too :(

#5 楼 @hooooopo 后台加上 timeout 的日志,现在会报数据库连接问题,有可能是数据库的问题。之前 web 服务与 mysql 都是同一台机器,这个项目是他们要求数据库分离,所以他们另外配置了一台机器。也许是那边的问题,也许是内网网络环境的问题。这边准备把它放一台机器上面再来试试

之前没有出现数据库获取不到连接的问题,是加了 LogBeforeTimeout 才出现,也许是 LogBeforeTimeout 另外起线程导致每次都要获取新的 mysql 连接。但 mysql 的最大连接数已经设置为 256 了,两个进程,共 64 个线程,感觉连接数应该够用了

没有用到 redis 与 memcache. newrelic 准备试试. puma 有个 stdout_redirect 配置参数,也准备来试试

cat log/production.log| grep -v INFO

W, [2015-01-11T08:56:13.948047 #15273]  WARN -- : /api/users/login? I’m about to timeout bringing down my unicorn worker too :(
F, [2015-01-11T08:56:15.951826 #15273] FATAL -- : 
ActiveRecord::ConnectionTimeoutError (could not obtain a database connection within 5.000 seconds (waited 5.000 seconds)):
  config/initializers/log_before_timeout.rb:15:in `call'


W, [2015-01-11T08:56:16.535154 #15277]  WARN -- : /api/users/login? I’m about to timeout bringing down my unicorn worker too :(
F, [2015-01-11T08:56:18.539041 #15277] FATAL -- : 
ActiveRecord::ConnectionTimeoutError (could not obtain a database connection within 5.000 seconds (waited 5.000 seconds)):
  config/initializers/log_before_timeout.rb:15:in `call'


W, [2015-01-11T08:56:36.107301 #15273]  WARN -- : /api/users/login? I’m about to timeout bringing down my unicorn worker too :(
F, [2015-01-11T08:56:38.111343 #15273] FATAL -- : 
ActiveRecord::ConnectionTimeoutError (could not obtain a database connection within 5.000 seconds (waited 5.000 seconds)):
  config/initializers/log_before_timeout.rb:15:in `call'


W, [2015-01-11T08:56:38.654370 #15277]  WARN -- : /api/users/login? I’m about to timeout bringing down my unicorn worker too :(
F, [2015-01-11T08:56:40.658041 #15277] FATAL -- : 
ActiveRecord::ConnectionTimeoutError (could not obtain a database connection within 5.000 seconds (waited 5.000 seconds)):
  config/initializers/log_before_timeout.rb:15:in `call'


W, [2015-01-11T08:59:19.421961 #15277]  WARN -- : /farm_products/140? I’m about to timeout bringing down my unicorn worker too :(
F, [2015-01-11T08:59:21.425568 #15277] FATAL -- : 
ActiveRecord::ConnectionTimeoutError (could not obtain a database connection within 5.000 seconds (waited 5.000 seconds)):
  config/initializers/log_before_timeout.rb:15:in `call'

#6 楼 @quakewang 错误日志提示 unicorn 是打印的文本就是这样的 http://underthehood.meltwater.com/blog/2014/03/21/debugging-unicorn-rails-timeouts/

#5 楼 @hooooopo

谢谢你们的帮忙,问题暂时算解决了。把数据库迁移到 web 服务器上面来没有发现问题了。数据库服务器是他们甲方配置的,还安装了什么"帕拉迪统一安全管理与综合审计",看这软件的介绍是对系统底层修改蛮大的

需要 登录 后方可回复, 如果你还没有账号请 注册新账号