运维 [求助] mongodb 老是挂掉,ruby 进程负荷高超时退出,怎么破

xinzhi · July 03, 2013 · Last by ashchan replied at July 04, 2013 · 7736 hits

今天下午出现的状况,查看 log 发现是 mongodb 不能正常服务。系统负荷偏高,unicorn.log 的进程也是进进出出。

tailf log/unicorn.log
I, [2013-07-03T23:56:13.076309 #24580]  INFO -- : worker=16 spawned pid=24580
I, [2013-07-03T23:56:13.085923 #24580]  INFO -- : worker=16 ready
E, [2013-07-03T23:56:15.066372 #20700] ERROR -- : worker=12 PID:24421 timeout (31s > 30s), killing
E, [2013-07-03T23:56:15.079335 #20700] ERROR -- : reaped #<Process::Status: pid 24421 SIGKILL (signal 9)> worker=12
I, [2013-07-03T23:56:15.095747 #24590]  INFO -- : worker=12 spawned pid=24590
I, [2013-07-03T23:56:15.104987 #24590]  INFO -- : worker=12 ready
E, [2013-07-03T23:56:17.085413 #20700] ERROR -- : worker=1 PID:24449 timeout (31s > 30s), killing
E, [2013-07-03T23:56:17.094906 #20700] ERROR -- : reaped #<Process::Status: pid 24449 SIGKILL (signal 9)> worker=1
I, [2013-07-03T23:56:17.111842 #24598]  INFO -- : worker=1 spawned pid=24598
I, [2013-07-03T23:56:17.122091 #24598]  INFO -- : worker=1 ready
E, [2013-07-03T23:56:20.102053 #20700] ERROR -- : worker=6 PID:24455 timeout (31s > 30s), killing
E, [2013-07-03T23:56:20.112134 #20700] ERROR -- : reaped #<Process::Status: pid 24455 SIGKILL (signal 9)> worker=6
I, [2013-07-03T23:56:20.129459 #24610]  INFO -- : worker=6 spawned pid=24610
I, [2013-07-03T23:56:20.138258 #24610]  INFO -- : worker=6 ready
Thu Jul  4 03:57:26 [conn10126] query krypton.topics query: { $query: { deleted_at: null, state: "publish", tags: { $all: [ "p2p租赁" ] }, node_id: { $nin: [ 17 ] } }, $orderby: { created_at: -1 } } ntoreturn:15 ntoskip:60 nscanned:17024 reslen:20 151ms
Thu Jul  4 03:57:26 [conn10126] query krypton.topics query: { $query: { deleted_at: null, state: "publish", tags: { $all: [ "p2p租赁" ] }, node_id: { $nin: [ 17 ] } }, $orderby: { created_at: -1 } } ntoreturn:15 ntoskip:60 nscanned:17024 reslen:20 124ms
Thu Jul  4 03:57:33 [conn10013] query krypton.topics query: { $query: { deleted_at: null, state: "publish", tags: { $in: [ "着迷网", "zhaomiwang" ] }, node_id: { $nin: [ 17 ] } }, $orderby: { created_at: -1 } } ntoreturn:15 nscanned:17024 nreturned:2 reslen:3234 122ms
Thu Jul  4 03:57:35 [conn10147] query krypton.topics query: { $query: { deleted_at: null, state: "publish", tags: { $all: [ "滤镜" ] }, node_id: { $nin: [ 17 ] } }, $orderby: { created_at: -1 } } ntoreturn:15 ntoskip:45 nscanned:17024 reslen:20 126ms
Thu Jul  4 03:57:35 [conn10147] query krypton.topics query: { $query: { deleted_at: null, state: "publish", tags: { $all: [ "滤镜" ] }, node_id: { $nin: [ 17 ] } }, $orderby: { created_at: -1 } } ntoreturn:15 ntoskip:45 nscanned:17024 reslen:20 135ms

http://www.36kr.com

查询太慢了吧,试试把 index 加上

user 和 tag 部分一直是性能瓶颈,这两部分的索引不知道要怎么写了。@huacnlee @Rei @poshboytl

现在索引是怎么样的?进 mongo 终端查询一条加 explain() 看看结果。 http://docs.mongodb.org/manual/reference/operator/explain/

@Rei:

user:

> db.users.find({ login: /^u1371355664$/i }).sort({ _id: 1 }).explain()
{
       "cursor" : "BtreeCursor id",
       "nscanned" : 75882,
       "nscannedObjects" : 75882,
       "n" : 1,
       "millis" : 504,
       "nYields" : 0,
       "nChunkSkips" : 0,
       "isMultiKey" : false,
       "indexOnly" : false,
       "indexBounds" : {
               "_id" : [
                       [
                               {
                                       "$minElement" : 1
                               },
                               {
                                       "$maxElement" : 1
                               }
                       ]
               ]
       }
}

tag:

> db.topics.find({ deleted_at: null, state: "publish", tags: { $all: [ "创业者-30-秒课堂" ] }, node_id: { $nin: [ 17 ] } }).sort({ created_at: -1 }).explain()
{
       "cursor" : "BtreeCursor tags_1",
       "nscanned" : 0,
       "nscannedObjects" : 0,
       "n" : 0,
       "scanAndOrder" : true,
       "millis" : 0,
       "nYields" : 0,
       "nChunkSkips" : 0,
       "isMultiKey" : true,
       "indexOnly" : false,
       "indexBounds" : {
               "tags" : [
                       [
                               "创业者-30-秒课堂",
                               "创业者-30-秒课堂"
                       ]
               ]
       }
}

感觉没有 index 啊,是不是忘了 create_index

db.users.getIndexes()
db.topics.getIndexes()

#5 楼 @Rei 是存在的吧。

> db.users.getIndexes()
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "ns" : "krypton.users",
        "name" : "_id_"
    },
    {
        "v" : 1,
        "key" : {
            "login" : 1
        },
        "ns" : "krypton.users",
        "name" : "login_1"
    },
    {
        "v" : 1,
        "key" : {
            "email" : 1
        },
        "ns" : "krypton.users",
        "name" : "email_1"
    },
    {
        "v" : 1,
        "key" : {
            "location" : 1
        },
        "ns" : "krypton.users",
        "name" : "location_1"
    },
    {
        "v" : 1,
        "key" : {
            "private_token" : 1
        },
        "ns" : "krypton.users",
        "sparse" : true,
        "name" : "private_token_1"
    },
    {
        "v" : 1,
        "key" : {
            "tags" : 1
        },
        "ns" : "krypton.users",
        "background" : true,
        "name" : "tags_1"
    },
    {
        "v" : 1,
        "key" : {
            "tx_openid" : 1
        },
        "ns" : "krypton.users",
        "name" : "tx_openid_1"
    },
    {
        "v" : 1,
        "key" : {
            "invitation_token" : 1
        },
        "ns" : "krypton.users",
        "background" : true,
        "name" : "invitation_token_1"
    },
    {
        "v" : 1,
        "key" : {
            "invitation_by_id" : 1
        },
        "ns" : "krypton.users",
        "background" : true,
        "name" : "invitation_by_id_1"
    },
    {
        "v" : 1,
        "key" : {
            "replies_count" : 1,
            "topics_count" : 1
        },
        "ns" : "krypton.users",
        "name" : "replies_count_1_topics_count_1"
    },
    {
        "v" : 1,
        "key" : {
            "articles_count" : 1,
            "comments_count" : 1
        },
        "ns" : "krypton.users",
        "name" : "articles_count_1_comments_count_1"
    },
    {
        "v" : 1,
        "key" : {
            "login" : -1,
            "_id" : -1
        },
        "ns" : "krypton.users",
        "name" : "login_-1__id_-1"
    }
]
> db.topics.getIndexes()
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "ns" : "krypton.topics",
        "name" : "_id_"
    },
    {
        "v" : 1,
        "key" : {
            "tags" : 1
        },
        "ns" : "krypton.topics",
        "background" : true,
        "name" : "tags_1"
    },
    {
        "v" : 1,
        "key" : {
            "companies" : 1
        },
        "ns" : "krypton.topics",
        "background" : true,
        "name" : "companies_1"
    },
    {
        "v" : 1,
        "key" : {
            "people" : 1
        },
        "ns" : "krypton.topics",
        "background" : true,
        "name" : "people_1"
    },
    {
        "v" : 1,
        "key" : {
            "products" : 1
        },
        "ns" : "krypton.topics",
        "background" : true,
        "name" : "products_1"
    },
    {
        "v" : 1,
        "key" : {
            "node_id" : 1
        },
        "ns" : "krypton.topics",
        "name" : "node_id_1"
    },
    {
        "v" : 1,
        "key" : {
            "user_id" : 1
        },
        "ns" : "krypton.topics",
        "name" : "user_id_1"
    },
    {
        "v" : 1,
        "key" : {
            "replied_at" : -1,
            "_id" : -1
        },
        "ns" : "krypton.topics",
        "name" : "replied_at_-1__id_-1"
    },
    {
        "v" : 1,
        "key" : {
            "likes_count" : 1
        },
        "ns" : "krypton.topics",
        "name" : "likes_count_1"
    },
    {
        "v" : 1,
        "key" : {
            "suggested_at" : 1
        },
        "ns" : "krypton.topics",
        "name" : "suggested_at_1"
    },
    {
        "v" : 1,
        "key" : {
            "created_at" : -1
        },
        "ns" : "krypton.topics",
        "name" : "created_at_-1"
    },
    {
        "v" : 1,
        "key" : {
            "sources" : 1
        },
        "ns" : "krypton.topics",
        "background" : true,
        "name" : "sources_1"
    },
    {
        "v" : 1,
        "key" : {
            "state" : 1
        },
        "ns" : "krypton.topics",
        "name" : "state_1"
    }
]

users 去掉 sort,似乎这条排序导致用错了 index。

topics 查询没有内容比较奇怪,看日志是 scan 了很多目标的,我估计要加个多键的索引 { tags: 1, created_at: -1 } 。而且单 tag 的查询,应该用 { tag: 'tagName' } 更快。

We're sorry, but something went wrong.

We've been notified about this issue and we'll take a look at it shortly.

#7 楼 @Rei 包含 sort 是这样写的: @user = User.where(:login => /^#{params[:id]}$/i).first,我不确定这句要怎么改了..

#9 楼 @xinzhi 是不是用了 default scope

#10 楼 @Rei 没有加,就是.first会有 sort id,去掉之后是没有的。

#11 楼 @xinzhi 用 hint 指定索引吧

#12 楼 @Rei 具体一点?我还不会用 hint。。

你确定搞挂的是这种 tag 查询?

这次搞好以后,赶紧把 newrelic 装上

#14 楼 @huacnlee newrelic 一直在用着。这次挂掉是缓存丢失然后并发重建繁忙导致的,没有用 Delay Jobs 的后果。

#16 楼 @xinzhi 缓存重建还是加上锁比较好

#13 楼 @xinzhi

@user = User.where(:login => /^#{params[:id]}$/i).hint(:login => 1).first

#11 楼 @xinzhi 发现是 mongoid 的行为。

#18 楼 @Rei 换成这句,查询时间还是会很长的,hint 还没有出现在查询中: Parameters: {"id"=>"xinzhi"} MOPED: 127.0.0.1:27017 QUERY database=krypton collection=users selector={"$query"=>{"login"=>/^xinzhi$/i}, "$orderby"=>{:_id=>1}} flags=[:slave_ok] limit=-1 skip=0 batch_size=nil fields=nil (248.6749ms)

#20 楼 @xinzhi 又发现一个问题,用 /i 会用不了索引,去掉 i

不过登陆不是一个频繁操作,觉得这个慢点也无所谓。

#21 楼 @Rei 赞呀!是工作的~ MOPED: 127.0.0.1:27017 QUERY database=krypton collection=users selector={"$query"=>{"login"=>/^xinzhi$/}, "$orderby"=>{:_id=>1}} flags=[:slave_ok] limit=-1 skip=0 batch_size=nil fields=nil (1.2119ms)

我之前 resque 老是 crash 掉,用了 god 解决的,也许这里可以借鉴下。

#23 楼 @wxianfeng 频繁 crash 靠 god 治标不治本。

#22 楼 @xinzhi 只要看到 nscanned:N(其中 N 大于 1),一般就是索引不对。查询条件和排序一起决定了用哪个 index,chain 多个 where 和 order 很容易会造成用不上 index。

You need to Sign in before reply, if you don't have an account, please Sign up first.