应用组合 nginx + puma + mysql,网站有时报 502 的错误,查 nginx 错误日志显示是
[error] 992#0: *68629 upstream timed out (110: Connection timed out) while reading response header from upstream, client:
看日志是 nginx 连接 puma 有问题,某些请求运行时间过长. 以为是代码写的有问题,参考 Debugging, Unicorns, Rails and Timeouts 这篇的指导,运行了 24 小时,下面是一些日志摘要
W, [2015-01-09T18:27:22.815086 #9680] WARN -- : /null? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-09T18:27:43.346427 #9680] WARN -- : /null? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-09T18:29:35.144107 #9680] WARN -- : /api/product_orders?page=1&token=xxxx&10=10 I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-09T18:30:03.399384 #9680] WARN -- : /api/product_orders?page=1&token=xxxx&10=10 I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-09T19:51:58.257420 #9674] WARN -- : /farm_products? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-09T19:51:58.347016 #9672] WARN -- : /farm_products/155? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-09T19:52:14.973153 #9672] WARN -- : /farm_products? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-09T19:52:26.380556 #9672] WARN -- : /farm_products? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-09T19:52:38.458941 #9672] WARN -- : /farm_products? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-09T19:52:48.322228 #9680] WARN -- : /farm_products/155? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-09T19:53:30.383086 #9680] WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-09T20:02:16.656869 #9680] WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
这些请求有时会超过 8 秒钟,但你去访问的时候又正常了,随机在不同的请求路径出现. 最不可理解的是 "/null?" 这个是 android 手机端调用的 bug,地址是不存在的,也就是说在在 routes 就被返回 404 了,根本不会有什么数据库查询之类的,但它处理时间居然超过了 8 秒
服务器应该也没有问题,查了同一时间能正常响应的请求,最快的请求 9ms 就完成了,一般 100ms 左右
感觉也不像是 puma 的问题,因为其它的网站也有用 puma,没有出现这问题。
也不像代码有死循环问题,从 puma 进程启动时间来看,只要我没有部署手动重启,启动时间不会变.god 都没有因为超过内存或 cpu 使用限制而杀掉 puma 进程,因为没有超过限制过
自己对此问题实在不知道怎么下手了,所以用论坛的大牛们求救?指点一下有没有什么工具或方法查此类问题。先说声谢谢了
出现这些情况的时候是否都是并发比较多的时候?可以检查一下操作系统的 ulimit 等参数,可能文件打开句柄数或者 socket 打开数之类超过了限制,导致 nginx 等待 puma 处理掉其他请求的时候,没有办法再打开更多的进程而出现超时。
#1 楼 @huacnlee nginx site conf
server {
listen 80;
server_name www.xxx.com;
root /home/manor/app/current/public;
access_log /var/log/nginx/xxx.com.access.log main;
error_page 404 /404.html;
error_page 500 502 503 504 /50x.html;
location = /50x.html {
root /usr/share/nginx/html;
}
location ~* \.(js|css|png|jpg|jpeg|gif|ico)$ {
expires 30d;
access_log off;
}
location / {
index index.html;
try_files $uri @manorpuma;
}
location @manorpuma {
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header Host $http_host;
proxy_redirect off;
proxy_pass http://127.0.0.1:9292;
}
}
nginx conf
user nginx;
worker_processes 1;
error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;
events {
worker_connections 1024;
}
http {
include /etc/nginx/mime.types;
default_type application/octet-stream;
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
sendfile on;
#tcp_nopush on;
keepalive_timeout 65;
server_tokens off;
client_max_body_size 10m;
#gzip on;
include /etc/nginx/conf.d/*.conf;
}
puma conf
base_name = 'manor'
directory "/home/#{base_name}/app/current"
worker_timeout 10
workers 2
threads 8,32
#bind "unix:///home/#{base_name}/app/current/tmp/puma.sock?mask=0777"
bind "tcp://127.0.0.1:9292"
pidfile "/home/#{base_name}/app/current/tmp/puma.pid"
state_path "/home/#{base_name}/app/current/tmp/puma.state"
preload_app!
on_worker_boot do
#$redis.client.reconnect
ActiveRecord::Base.connection.reconnect!
end
502 请求出现的路径也是随机的,也的只是后台简单的单表查询
#2 楼 @quakewang 并发不高,晚上 0 点时也出现这种情况,那个时候基本上几分钟一个请求
ulimit -a 结果
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 63706
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 63706
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
尝试过 ab 简单测试首页,1w 个请求,100 个并发,失败请求数为 0,也没有看到超时,首页最少查了 4 张表
这台物理机上面放的虚拟机蛮多的,会不会是因为 CPU 没有限制,某些虚拟机把 CPU 占用完了而导致超时?
ps
manor 15270 0.0 1.4 291476 115020 ? Sl 09:19 0:04 puma 2.10.2 (tcp://127.0.0.1:9292)
manor 15273 1.5 3.0 1321304 252220 ? Sl 09:19 12:08 puma: cluster worker 0: 15270
manor 15277 1.6 3.0 1318972 251180 ? Sl 09:19 12:54 puma: cluster worker 1: 15270
rails log, 这是今天上午以来的超时日志,像 squog48480.txt 这个路径是不存在的,是别人在扫描,但也超时了 8 秒钟
W, [2015-01-10T14:10:12.408338 #15273] WARN -- : /admin/pay_details?utf8=%E2%9C%93&q%5Border_number_cont%5D=&q%5Bpay_order_product_orders_order_number_eq%5D=150109455847&commit=%E6%90%9C%E7%B4%A2 I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T14:10:44.467813 #15273] WARN -- : /admin/pay_details?utf8=%E2%9C%93&q%5Border_number_cont%5D=&q%5Bpay_order_product_orders_order_number_eq%5D=150109580972&commit=%E6%90%9C%E7%B4%A2 I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T14:25:57.898674 #15273] WARN -- : /admin/pay_details/415? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T14:36:19.919086 #15273] WARN -- : /products/7? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T14:36:24.623033 #15273] WARN -- : /admin/pay_details/415? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T14:36:40.778678 #15273] WARN -- : /products/7? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T14:39:51.156973 #15273] WARN -- : /admin/product_orders? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T14:42:20.671041 #15273] WARN -- : /admin/manors/28? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T14:42:22.235304 #15273] WARN -- : /admin/manors/28? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T14:56:17.433214 #15273] WARN -- : /farm_products/151? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T15:28:11.755474 #15273] WARN -- : /admin/login? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T15:28:19.967264 #15273] WARN -- : /admin/coupons/1? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:09:20.641891 #15273] WARN -- : /farm_products/203?from=singlemessage&isappinstalled=0 I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:09:44.190186 #15273] WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:28:58.509238 #15273] WARN -- : /manors/28? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:28:58.509812 #15273] WARN -- : /farm_products/204? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:28:58.512869 #15273] WARN -- : /about? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:28:58.513834 #15273] WARN -- : /farm_products? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:28:58.520793 #15273] WARN -- : /farm_products/203? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:28:59.594871 #15273] WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:28:59.618843 #15273] WARN -- : /manors? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:28:59.632609 #15273] WARN -- : /login? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:28:59.632963 #15273] WARN -- : /farm_products/list?cid=1 I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:29:08.643499 #15273] WARN -- : /manors/28? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:29:08.646714 #15273] WARN -- : /farm_products/204? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:29:09.600770 #15273] WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:29:13.687735 #15273] WARN -- : /manors/28? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:29:13.695943 #15273] WARN -- : /manors/28? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:29:13.695795 #15273] WARN -- : /manors/28? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:29:15.992802 #15273] WARN -- : /farm_products/204? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:29:15.994440 #15273] WARN -- : /farm_products/204? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:29:15.995973 #15273] WARN -- : /farm_products/204? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:29:16.003169 #15273] WARN -- : /farm_products/204? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:29:16.006432 #15273] WARN -- : /farm_products/204? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:29:16.013690 #15273] WARN -- : /farm_products/204? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:29:19.624268 #15273] WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T16:30:02.932361 #15273] WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T17:17:48.064364 #15273] WARN -- : /products/23? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T17:18:29.761654 #15273] WARN -- : /farm_products/151? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T17:23:53.770625 #15273] WARN -- : /farm_products/168? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T17:30:08.645647 #15273] WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T17:31:42.602090 #15273] WARN -- : /mapi/merchants/login? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T18:23:42.086524 #15273] WARN -- : /manors/27? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T18:24:09.796062 #15273] WARN -- : /farm_products/209? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T18:24:21.304103 #15273] WARN -- : /user_coupons? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T18:24:24.149114 #15273] WARN -- : /pay_orders/274? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T18:24:41.174035 #15273] WARN -- : /user_coupons? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T18:24:56.974919 #15273] WARN -- : /user_coupons? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T18:25:31.223952 #15273] WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T18:27:28.185707 #15273] WARN -- : /pay_orders/from_product_order?product_order_id=321 I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T18:27:31.065133 #15273] WARN -- : /products/4? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T18:27:49.097025 #15273] WARN -- : /farm_products/131? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T18:28:17.217875 #15273] WARN -- : /tourism_products/14? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T18:28:28.265505 #15273] WARN -- : /pay_orders/277/go? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T18:28:46.029485 #15273] WARN -- : /notes/4? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T18:47:42.219719 #15273] WARN -- : /farm_products/list?cid=4 I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T18:47:43.973957 #15273] WARN -- : /farm_products/list?cid=4 I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T18:48:05.570790 #15273] WARN -- : /farm_products/list?cid=2&page=2 I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T19:57:21.140408 #15273] WARN -- : /buddies/new? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T20:01:20.062225 #15273] WARN -- : /manors/3? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T20:03:00.752695 #15273] WARN -- : /manors/3? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T20:28:55.244187 #15273] WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T20:30:42.501772 #15273] WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T20:36:58.784065 #15273] WARN -- : /activities? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T20:40:28.962651 #15273] WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T20:41:06.718788 #15273] WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T20:47:15.851442 #15273] WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T20:56:22.624565 #15273] WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T20:59:53.813606 #15273] WARN -- : /uztce33383.txt? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T21:02:09.573501 #15273] WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T21:08:49.833639 #15273] WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T21:10:52.103289 #15273] WARN -- : /squog48480.txt? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T21:19:50.979843 #15273] WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T21:38:22.530999 #15273] WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T21:44:01.503328 #15273] WARN -- : /farm_products/165? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T21:54:59.149493 #15273] WARN -- : /forget? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T22:00:11.402326 #15273] WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T22:03:53.743778 #15273] WARN -- : /manor_product? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T22:04:54.059040 #15273] WARN -- : /? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T22:31:22.260466 #15273] WARN -- : /tourism_products/6? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T22:41:30.776601 #15273] WARN -- : /products/40? I’m about to timeout bringing down my unicorn worker too :(
W, [2015-01-10T22:41:40.844292 #15273] WARN -- : /products/40? I’m about to timeout bringing down my unicorn worker too :(
#5 楼 @hooooopo 后台加上 timeout 的日志,现在会报数据库连接问题,有可能是数据库的问题。之前 web 服务与 mysql 都是同一台机器,这个项目是他们要求数据库分离,所以他们另外配置了一台机器。也许是那边的问题,也许是内网网络环境的问题。这边准备把它放一台机器上面再来试试
之前没有出现数据库获取不到连接的问题,是加了 LogBeforeTimeout 才出现,也许是 LogBeforeTimeout 另外起线程导致每次都要获取新的 mysql 连接。但 mysql 的最大连接数已经设置为 256 了,两个进程,共 64 个线程,感觉连接数应该够用了
没有用到 redis 与 memcache. newrelic 准备试试. puma 有个 stdout_redirect 配置参数,也准备来试试
cat log/production.log| grep -v INFO
W, [2015-01-11T08:56:13.948047 #15273] WARN -- : /api/users/login? I’m about to timeout bringing down my unicorn worker too :(
F, [2015-01-11T08:56:15.951826 #15273] FATAL -- :
ActiveRecord::ConnectionTimeoutError (could not obtain a database connection within 5.000 seconds (waited 5.000 seconds)):
config/initializers/log_before_timeout.rb:15:in `call'
W, [2015-01-11T08:56:16.535154 #15277] WARN -- : /api/users/login? I’m about to timeout bringing down my unicorn worker too :(
F, [2015-01-11T08:56:18.539041 #15277] FATAL -- :
ActiveRecord::ConnectionTimeoutError (could not obtain a database connection within 5.000 seconds (waited 5.000 seconds)):
config/initializers/log_before_timeout.rb:15:in `call'
W, [2015-01-11T08:56:36.107301 #15273] WARN -- : /api/users/login? I’m about to timeout bringing down my unicorn worker too :(
F, [2015-01-11T08:56:38.111343 #15273] FATAL -- :
ActiveRecord::ConnectionTimeoutError (could not obtain a database connection within 5.000 seconds (waited 5.000 seconds)):
config/initializers/log_before_timeout.rb:15:in `call'
W, [2015-01-11T08:56:38.654370 #15277] WARN -- : /api/users/login? I’m about to timeout bringing down my unicorn worker too :(
F, [2015-01-11T08:56:40.658041 #15277] FATAL -- :
ActiveRecord::ConnectionTimeoutError (could not obtain a database connection within 5.000 seconds (waited 5.000 seconds)):
config/initializers/log_before_timeout.rb:15:in `call'
W, [2015-01-11T08:59:19.421961 #15277] WARN -- : /farm_products/140? I’m about to timeout bringing down my unicorn worker too :(
F, [2015-01-11T08:59:21.425568 #15277] FATAL -- :
ActiveRecord::ConnectionTimeoutError (could not obtain a database connection within 5.000 seconds (waited 5.000 seconds)):
config/initializers/log_before_timeout.rb:15:in `call'
#6 楼 @quakewang 错误日志提示 unicorn 是打印的文本就是这样的 http://underthehood.meltwater.com/blog/2014/03/21/debugging-unicorn-rails-timeouts/
谢谢你们的帮忙,问题暂时算解决了。把数据库迁移到 web 服务器上面来没有发现问题了。数据库服务器是他们甲方配置的,还安装了什么"帕拉迪统一安全管理与综合审计",看这软件的介绍是对系统底层修改蛮大的