Ruby 利用数组的 & 操作,精减代码

luolinae86 · September 21, 2015 · Last by bindiry replied at March 24, 2017 · 4476 hits

精减前

arr = [1,2,3,4,5]
if (arr.include? 1) || (arr.include? 2) || (arr.include? 4)
  #do something
end

精减后

arr = [1,2,3,4,5]
if (arr & [1,2,4]).any?
#do something
end

数组对象的 & 操作,是求数组的交集 因此,需要判断在某个大的集合中,是否包含小集合中的一个或者多个时, 用 & 操作取代 || 操作可以精减代码

数组中,值得注意的还有 - 操作,

a = [1,2,3]
b = [2,3,4]
a - b #[1]
b - a #[4]

a - b 操作得到的数组是在 a 中存在,但 b 中不存在的元素,反之亦然

这个不错~其实好多方法都知道,就是用的时候想不起来😓

a = [1,2,3,4,5]
subset = [1,2,4]
subset.all? &a.method(:include?)

# 或者
require('set')
Set.new(subset).subset? Set.new(a)

如果求交并集场景较多,还是先转换成 Set 更高效

#2 楼 @jex 用 uniq 呢?

#3 楼 @tini8 uniq 之后还是数组啊,求交集并集还是低效(Ruby 中Array#&之类的求交集并集方法内部是用的 Hash),Set#include? 方法复杂度是 O(1),数组的是 O(n) 啊

#2 楼 @jex

SourceCode

#filename: bench_array_and_set.rb
require 'benchmark'
require 'set'

ary = (1..5).to_a
subary = [1, 2, 4]
subset = Set.new(subary)
set = Set.new(ary)
iterations = 10000

Benchmark.bm do |bm|
  bm.report('Array & and any?') do
    iterations.times { (ary & subary).any? }
  end

  bm.report('Array all? and include?') do
    iterations.times { subary.all?(&ary.method(:include?)) }
  end

  bm.report('Set with new') do
    iterations.times  { Set.new(subary).subset? Set.new(ary) }
  end

  bm.report('Set without new') do
    iterations.times { subset.subset? set }
  end
end

Result:

                             user     system      total        real
Array & and any?            0.010000   0.000000   0.010000 (  0.012931)
Array all? and include?     0.020000   0.000000   0.020000 (  0.023838)
Set with new                0.100000   0.010000   0.110000 (  0.101239)
Set without new             0.030000   0.000000   0.030000 (  0.034167)

#5 楼 @davidwei 看来我想当然了,不过这也只能怪 Ruby 自己了,Array#& 是 C 实现的,而 Set模块是用 Ruby 实现的…一个比 Array 更需要高效率的场景反而用 Ruby 写 😢

不过你的测试代码不太好,集合太小,并且不是随机数据,Array#include?理论是比Set#include?慢的,但看 Ruby 源码,Array#& 方法已经将数组转换成 hash 再求并集的,所以求交集并集能和 Set(数据结构意义上的 Set,Ruby 中的 Set 模块太令人失望了)一样高效。

看下面测试:

require 'benchmark'
require 'set'
GC.disable 

ary = (1..1000).to_a.shuffle
subary = (1..100).to_a.shuffle
set = Set.new(ary)
subset = Set.new(subary)
iterations = 2000

Benchmark.bm do |bm|
  bm.report("Array all? and include?\n") do
    iterations.times { subary.all?(&ary.method(:include?)) }
  end

  bm.report("Set all? and include?  \n") do
    iterations.times { subset.all?(&set.method(:include?)) }
  end

  bm.report("Array & and any?       \n") do
    iterations.times { (ary & subary).any? }
  end

end

输出:

       user     system      total        real
Array all? and include?
  2.900000   0.000000   2.900000 (  2.900707)
Set all? and include?  
  0.030000   0.000000   0.030000 (  0.038133)
Array & and any?       
  0.110000   0.000000   0.110000 (  0.112873)

理论上最高效的方法是:

subset.all?(&set.method(:include?))

但鉴于 Set 模块是 Ruby 代码实现的,集合不大的情况下,还是楼主的(arr & subary).any? 实际速度最快 😪

感谢 @jex@davidwei ,另外无意中看到的另一个benchmark源码,贴出来供这个贴子参考。

You need to Sign in before reply, if you don't have an account, please Sign up first.