Sometimes when running batch operations, for example:
Article.published.each do |ar|
do_sth(ar)
end
I noticed memory being consumed steadily during execution. My expectation was that each returns an iterator and that the temporary objects from each iteration would be garbage-collected, but that is not what happens. I was running this in pry, which may be a factor. The problem is that if the articles table is large, or each iteration allocates a lot (e.g. each article is associated with a pile of other objects), the process can eventually blow up from running out of memory.
Does anyone know what is actually going on here, and how to handle this situation? Thanks.
Why not just read the documentation?
http://guides.rubyonrails.org/active_record_querying.html
But this approach becomes increasingly impractical as the table size increases, since User.all.each instructs Active Record to fetch the entire table in a single pass, build a model object per row, and then keep the entire array of model objects in memory. Indeed, if we have a large number of records, the entire collection may exceed the amount of memory available.
Rails provides two methods that address this problem by dividing records into memory-friendly batches for processing. The first method, find_each, retrieves a batch of records and then yields each record to the block individually as a model. The second method, find_in_batches, retrieves a batch of records and then yields the entire batch to the block as an array of models.
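The difference can be sketched in plain Ruby without a database: find_in_batches is conceptually like each_slice over the rows ordered by primary key, so only one batch of model objects is alive at a time, while find_each additionally yields the records one by one. The "rows" here are made-up stand-ins for database records; with Active Record the call would look like Article.published.find_each(batch_size: 1000) { |ar| do_sth(ar) }.

```ruby
# Stand-in for a large table: pretend each element is a DB row.
rows = (1..10).to_a

# find_in_batches: yields each fixed-size batch as an array,
# so at most one batch of objects needs to be in memory.
batches = []
rows.each_slice(3) { |batch| batches << batch }

# find_each: same batching underneath, but yields one record at a time.
seen = []
rows.each_slice(3) { |batch| batch.each { |row| seen << row } }
```

With real Active Record, each batch is a separate SQL query (WHERE id > last_seen_id LIMIT batch_size), so the previous batch's model objects become collectible once the block moves on.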
Strange — I switched to find_each and memory still grows rapidly. Inside the block I touch another model, and that model declares serialize :data, Hash
...wondering whether that is the cause...
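For context: serialize :data, Hash stores the hash as YAML text in the column, so loading each record runs a YAML parse and allocates a fresh object graph every time. A rough sketch of what happens under the hood (the sample data here is hypothetical):

```ruby
require 'yaml'

# Roughly what `serialize :data, Hash` does:
# dump to YAML on write, parse from YAML on every record load.
data = { "views" => 100, "tags" => %w[ruby rails] }

blob   = YAML.dump(data)  # this string is what sits in the text column
loaded = YAML.load(blob)  # re-parsed (new objects allocated) per row
```

So iterating a large table where each row drags in a serialized column means one YAML parse and a new Hash (plus its contents) per record, which by itself generates a lot of garbage even if nothing is actually leaking.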
No luck — I cleared out everything inside the block, and memory usage still climbs. Maybe it is pry keeping a record of the results...
Reply to #9 @doitian — I had a look, and this seems to be the fix: http://nerdd.dk/posts/YAML-load-considered-harmful But even after applying it, memory still balloons...
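One way to tell whether objects are genuinely being retained (as opposed to the Ruby heap merely growing before a GC runs) is to force a full GC between batches and compare GC.stat counts — if heap_live_slots keeps climbing after GC.start, something is still holding references. Note that pry itself retains recent evaluation results (accessible via _out_), which can pin large query results in memory; if that is the culprit here, re-running the loop in a plain script or rails runner should behave differently. A minimal leak-check sketch, with the "leak" simulated by a deliberately retained array:

```ruby
# Force a full GC, then count live object slots.
GC.start
before = GC.stat(:heap_live_slots)

kept = []  # deliberately retain objects to simulate a leak
1_000.times { kept << "x" * 100 }

GC.start
after = GC.stat(:heap_live_slots)

leaked = after - before  # stays high because `kept` holds references
```

Running the same check with kept cleared (or scoped so it can be collected) should show the live-slot count fall back after GC.start, which is the behavior you would expect from a well-behaved find_each loop.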