Beginner question: splitting a math expression into tokens, and a puzzle about the relative performance of two approaches

u4crella · 2019-12-17 · last reply by u4crella on 2019-12-18 · 2461 views

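The code is below.

This really baffles me. I no longer even like the look of the old method (back then I had not thought of using split; I wrote it in C#, debugged it in VS2013, and then ported it to Ruby). It uses far more variables than the new method and relies on some wrapper helpers I wrote for convenience, so how can it be faster than the new method?

Test platform: a Linux virtual machine, x86_64, kernel 4.12.14, Ruby 2.6.5 compiled from source. I have run the test many times, so this should not be a fluke; apart from a music player, nothing else was running on the machine during the tests.

Update, 12-18: the code has been updated. Following the suggestion in the first reply, I replaced the str.sub() call with str[1, id2]; the new method is still slower than the old one.

Search the page for '新方法开始' (new method starts), '旧方法开始' (old method starts), '切换到旧方法' (switch to old method), '切换到新方法' (switch to new method).

The first part is a pile of helper functions that you can skip; search for the strings above to jump straight to the new or old method.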




# The new and old methods start at line 82 of the original file
class Sawdust4ExprError < StandardError
end
=begin
# Removed from this file:
sawdust_fallback_prepare、sawdust_prepare
=end
class Expr
  attr_accessor :type, :val, :priority
  def initialize(type, val, prio = nil)
    @type = type; @val = val
    # Priority: * and / are 3; + and - are 2
    if type == :op
      case val
      when '+', '-'
        @priority = 2
      when '*', '/'
        @priority = 3
      else
        raise Sawdust4ExprError
      end
    else
      @priority = nil
    end
  end
end

def addtmpinfo(type_sym, val)
  #~ tinfo1 = 
  #~ puts 'in addtmpinfo'
  #~ puts type_sym.inspect, "\n", val.inspect
  return Expr.new(type_sym, val.to_s)
end

# Shared helpers

def strrep(src_str, text1, text2 = nil)
  # String-replace wrapper; when the third argument is omitted, text1 is replaced with ''.
  if src_str.include?(text1)
    text2 = '' if text2.nil?
    return src_str.gsub(text1, text2)
  else
    return src_str
  end
end

def ttail(src_text, start_pos, end_pos = 0)
  # Returns the text of src_text from start_pos to end_pos inclusive; positions are 1-based.
  # end_pos = 0 means "up to the end of the text".
  end_pos = src_text.length if end_pos == 0
  return src_text[start_pos - 1, end_pos - start_pos + 1]
end

def tpos(src_text, pos)
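  # Returns the character of src_text at 1-based position pos (index pos - 1).
  # A negative pos counts from the end: -1 returns the last character.
  return src_text[pos - 1] if pos > 0
  return src_text[src_text.length + pos] if pos < 0
  raise "tpos does not accept 0 as a position!"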
end

def tlen(src_text)
  return src_text.length
end

def tib(text)
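  # tib is short for "text is blank".
  # Intended to return true if text is nil or consists only of spaces, false otherwise.
  # NOTE: `!text[ni] == ' '` below parses as `(!text[ni]) == ' '`, which is always false,
  # so newtext stays empty and this function returns true for every non-nil string.
  # The nil check also comes after text.chomp, so a nil argument raises NoMethodError first.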
    text = text.chomp
  return true if text.nil?
  newtext = ''
  (0..(text.length - 1)).to_a.each { |ni|
    if !text[ni] == ' '
      newtext << text[ni]
    end
  }
  return true if newtext == ''
  return false
end

def check_op(obj)
  # Check a delimiter and classify it
  case obj
  when '+', '-', '*', '/'
    return :op
  when '(', ')'
    return :pr
  when ','
    return :cma
  else
    raise StandardError, "check_obj obj=#{obj.inspect}, class = #{obj.class}, if = #{obj == ' '}"
  end

end

# 新方法开始 (new method starts)

def exp_scan_new(formula)
  alist = []
  text = formula.gsub(' ', '').gsub("\t", '')
  raise Sawdust4ExprError, "expression is blank" if (text == '' || text.nil?)
#   puts 'exp scan new text=', "<#{text}>"
  # If the expression starts with '-' (minus sign) or '+' (plus sign), prepend 0 to it
  if (text[0] == '+' || text[0] == '-')
    text = '0' << text
  end
  # Without this step, the leading sign cannot be recovered later

  texts = text.split(/\+|-|\*|\/|\(|\)|\,/)
  # Delimiters: + - * / ( ) ,  (end of list; spaces are not included)
  texts = texts.reject{|t| (t == ' ' || t == ''|| t.nil?)}
#   puts 'texts inspect=', texts.inspect
  #~ p texts
  tf = text; syms = ''
  pos = 0

  while (pos < (texts.size - 1))
    t1 = texts[pos]; 
    alist << addtmpinfo(:nobj, t1)

      tf = tf[(t1.size), (tf.size - t1.size + 1)]; 
#       if pos == texts.size - 1
#         t2 = ''
#       else
        t2 = texts[pos + 1]
#       end
      while (tf != '' && tf[0, t2.size] != t2)
#         puts "tf=<#{tf}>"
        symb = tf[0]
        alist << addtmpinfo(check_op(symb), symb)
        tf = tf[1, (tf.size - 1)]
      end
    pos += 1
  end
#   puts 'tf=', "<#{tf}>"
  # At this point tf starts with the last element of texts
  last_nobj = texts.last
  alist << addtmpinfo(:nobj, last_nobj)
  tf = tf[last_nobj.size, (tf.size - last_nobj.size + 1)]
  while (tf != '' )#&& tf[0, t2.size] != t2)
  #         puts "tf=<#{tf}>"
    symb = tf[0]
    alist << addtmpinfo(check_op(symb), symb)
    tf = tf[1, (tf.size - 1)]
  end

  return alist.clone
end

# 旧方法开始 (old method starts)

def exp_scan_old(formula)
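    # tmp_text1 = src_exp.to_s
    # NOTE: this guard is inverted as written (it raises when the text is NOT blank);
    # it never fires only because tib currently returns true for every string.
    raise "empty text" unless (tib(formula))
    blist = []
    src_exp = strrep(formula, ' ', '')
    src_exp = strrep(src_exp, '()', '')
  # For now, strip spaces and empty pairs of parentheses first
    switcher = ''
    # Check the end of the expression

        #~ raise new StandardError, "the expression must not end with an operator"

    # Check the start of the expression

        #~ raise new StandardError, "the expression must not start with an operator"

    # Check character by character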
    tmpid = 1; last_obj_start_id = 1 # start recording from the beginning (positions are 1-based)

    while (tmpid <= tlen(src_exp))
        switcher = tpos(src_exp, tmpid); tmp_text1 = ''
        #~ # sayln('current switcher = [' + switcher + ']')
        case switcher
        when '('
            if (last_obj_start_id < tmpid)
                tmp_text1 = ttail(src_exp, last_obj_start_id, tmpid - 1)
                blist << addtmpinfo(:nobj, tmp_text1)
            end
            blist << addtmpinfo(:pr, '(')
            last_obj_start_id = tmpid + 1
        when ')'
            if (last_obj_start_id == tmpid)
                if blist[-1].type == :op
                    raise "misplaced right parenthesis )"
                end
            elsif (last_obj_start_id < tmpid)
                tmp_text1 = ttail(src_exp, last_obj_start_id, tmpid - 1)
                blist << addtmpinfo(:nobj, tmp_text1)
            end
            blist << addtmpinfo(:pr, ')')
            last_obj_start_id = tmpid + 1
        when '+', '-', '*', '/'
            if (last_obj_start_id == tmpid)
                if (blist[-1].type == :pr)
                    # the character before the operator is a left or right parenthesis
                elsif (blist[-1].type == :op)
                    raise StandardError, "expression #{src_exp} contains consecutive operators, loid = #{last_obj_start_id} tmpid = #{tmpid}"
                else
                    tmp_text1 = ttail(src_exp, last_obj_start_id, tmpid)
                    blist << addtmpinfo(:nobj, tmp_text1)
                end
            elsif (last_obj_start_id < tmpid)
                tmp_text1 = ttail(src_exp, last_obj_start_id, tmpid - 1)
                blist << addtmpinfo(:nobj, tmp_text1)
            end
            blist << addtmpinfo(:op, switcher)
            last_obj_start_id = tmpid + 1
        when ','
            if (last_obj_start_id == tmpid)
                raise StandardError, "',' must not directly follow an operator or a parenthesis"
            elsif (last_obj_start_id < tmpid)
                tmp_text1 = ttail(src_exp, last_obj_start_id, tmpid - 1)
                blist << addtmpinfo(:nobj, tmp_text1)
                blist << addtmpinfo(:cma, ',')
                last_obj_start_id = tmpid + 1
            end

        else
        end
        tmpid += 1
    end
    if (last_obj_start_id <= src_exp.length)
        tmp_text1 = ttail(src_exp, last_obj_start_id)
        blist << addtmpinfo(:nobj, tmp_text1)
    end
  #~ puts 'check tmplist=', tmplist.inspect
    return blist
end 

#~ require 'ruby-prof'
#~ RubyProf.start
tn1 = Time.now
#~ formula = '1.5+3' # with this formula, the new method is faster than the old one
formula = '1.5+((3+1.5)*3)' # with this formula, the old method is faster than the new one


100000.times do
# 切换到旧方法 (switch to old method)
   $a = exp_scan_old(formula)
#   
# 切换到新方法 (switch to new method)
  #~ $a = exp_scan_new(formula)
end
tn2 = Time.now
# 
#~ result = RubyProf.stop
#~ printer = RubyProf::GraphPrinter.new(result)
# ~ printer = RubyProf::CallStackPrinter.new(result)
#~ fio = File.open("./prof-stack.html", 'w')
#~ printer.print(fio)
#~ fio.close

 #~ $a.each do |kk|
   #~ p kk
 #~ end
puts "time=", (tn2 - tn1)

__END__
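
For comparison, the whole split-then-reattach bookkeeping of the new method can be avoided with a single `String#scan` pass. The following is only a sketch of that alternative, not the code measured in this post: `TOKEN_RE` and `tokenize_with_scan` are hypothetical names, the result is a list of `[type, text]` pairs instead of `Expr` objects, and it does not prepend `0` to a leading sign or perform any of the old method's validity checks.

```ruby
# Hypothetical alternative tokenizer (not part of the original post).
# Either one delimiter character, or a run of non-delimiter, non-space characters.
TOKEN_RE = /[+\-*\/(),]|[^+\-*\/(),\s]+/

def tokenize_with_scan(formula)
  # scan returns every match in order, so delimiters and operands come out interleaved
  formula.scan(TOKEN_RE).map do |tok|
    type = case tok
           when '+', '-', '*', '/' then :op
           when '(', ')'           then :pr
           when ','                then :cma
           else                         :nobj
           end
    [type, tok]
  end
end

p tokenize_with_scan('1.5+((3+1.5)*3)')
```

Because whitespace is skipped rather than deleted, an operand written with an inner space (such as `1 5`) would come out as two tokens here, whereas the methods above would join it into one.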

luikore #0 2019-12-18

tf = tf.sub(t1, '') is too slow; you can change it to tf = tf[t1.size..]
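
A small sketch of the two forms being compared here, using the standard `benchmark` library. The values of `tf` and `t1` are taken from the formula in the post; the iteration count `n` is an arbitrary assumption. The two forms give the same result only because `tf` begins with `t1`, which is the situation in the loop of the new method.

```ruby
require 'benchmark'

tf = '1.5+((3+1.5)*3)'
t1 = '1.5'

by_sub   = tf.sub(t1, '')   # searches for t1, then builds a new string without it
by_slice = tf[t1.size..-1]  # drops the first t1.size characters directly

puts by_sub == by_slice     # both are "+((3+1.5)*3)"

n = 100_000 # arbitrary iteration count
Benchmark.bm(6) do |x|
  x.report('sub')   { n.times { tf.sub(t1, '') } }
  x.report('slice') { n.times { tf[t1.size..-1] } }
end
```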

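u4crella #1 2019-12-18

In reply to luikore:

I had considered going in that direction. But when I saw string[a, b], I assumed it walks through the string character by character, so I figured it could not be much faster and dropped the idea. Thanks.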

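u4crella #2 2019-12-18

In reply to luikore:

I made the change you suggested, and the new method is still slower than the old one, so I am baffled again.

There is one odd thing, though: the gap between the two methods is considerably larger on the Linux VM with Ruby 2.6.5 than on Win7 with Ruby 2.5.7.

Also,

str[id..]

seems to be available only from Ruby 2.6 on: I could not use it on 2.5.7 or on the Ruby 2.4 of the 菜鸟 online editor.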
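
The endless range `str[id..]` was introduced in Ruby 2.6, which matches what was observed. A minimal sketch of two spellings that give the same tail on older versions as well; `str` and `id` are sample values chosen for illustration.

```ruby
str = '+((3+1.5)*3)'
id  = 1

tail_a = str[id..-1]             # range ending at the last character
tail_b = str[id, str.size - id]  # start index plus length
# tail_c = str[id..]             # endless range: Ruby 2.6 and later only

puts tail_a            # ((3+1.5)*3)
puts tail_a == tail_b  # true
```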

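u4crella #3 2019-12-18

In reply to luikore:

Problem solved. In the new method of the current code, change

tf != ''

to the equivalent

tf.size > 0

and the new method takes about the same time as the old one. Then exclude the effect of the following code

(the old method achieves the effect of this code in a separate function, so it should be left out of the comparison here):

if (text[0] == '+' || text[0] == '-')
  text = '0' << text
end

With that, the average difference between the new and old methods on Ruby 2.5.7 is under 0.02 s, so I consider their performance about the same.

Good enough for now.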
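
A sketch for checking this observation in isolation. It first confirms that the three emptiness tests agree on a few sample strings, then times them. One plausible explanation for the difference, not verified for this script, is that without a `# frozen_string_literal: true` magic comment every evaluation of the literal `''` allocates a new String before the comparison runs, while `size` and `empty?` allocate nothing. The sample strings and the count `n` are assumptions.

```ruby
require 'benchmark'

# The three predicates should agree on any string
samples = ['', '+', ')*3)']
samples.each do |s|
  a = (s != '')
  b = (s.size > 0)
  c = !s.empty?
  raise 'predicates disagree' unless a == b && b == c
end

n  = 200_000 # arbitrary iteration count
tf = ')*3)'
Benchmark.bm(10) do |x|
  x.report("!= ''")    { n.times { tf != '' } }
  x.report('size > 0') { n.times { tf.size > 0 } }
  x.report('!empty?')  { n.times { !tf.empty? } }
end
```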

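u4crella #4 2019-12-18

The problem does not seem fully solved after all. On the Linux VM with Ruby 2.6.5, the ratio of time taken by the old method to the new method is about 1.47:1.52, and that gap probably still means something. ……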
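
A gap of this size is small enough that run-to-run noise matters, especially inside a virtual machine. A hedged sketch of a slightly more careful harness: it uses the monotonic clock instead of `Time.now` and keeps the fastest of several runs. `best_of` is a hypothetical helper, and the `split` call is only a stand-in workload so that the block runs on its own; in the script above it would be replaced by `exp_scan_old(formula)` or `exp_scan_new(formula)`.

```ruby
# Hypothetical helper: run the block `runs` times, return the fastest time in seconds
def best_of(runs)
  times = Array.new(runs) do
    t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
  end
  times.min
end

formula = '1.5+((3+1.5)*3)'
# Stand-in workload; substitute the method under test here
elapsed = best_of(5) { 100_000.times { formula.split(/[-+*\/(),]/) } }
puts "best of 5: #{elapsed}"
```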
