Newbie question: how can I improve this clumsy code?

ynopeeb · 2017-02-21 · last reply by ynopeeb on 2017-02-23 · 1986 views

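I need to process a log with Ruby: extract the value of the duration field from each entry and add the values up to get a total. The log has to be preprocessed first: find the kafka field, pull out the JSON string it carries, and then work on that. As a newbie I stumbled my way through part of this in Ruby, but the more I look at the result the more awkward it seems. I would appreciate it if someone could point out the problems and tell me whether there is a simpler, more elegant way to do this. Thanks.

The log file is test.log. Its content has been sanitized; a sample follows: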

/disk/sata13/xxx-platform/2017_01_03/0.0.0.0/platform.07_30.log.0.0.0.0.2017_01_03.gz: /naga/v0.5.3-small: {"log":"[Naga 0.5.3 I 17-01-03 07:19:43] ddc7c32422a7a2e413346258034c39e6 - kafka:{\"media_uris\": [\"\/uploads\/audio\/120\/14833991837592.amr\"], \"status\": 200, \"tasks\": [\"transcode\"], \"request_time\": 1483399183, \"task_id\": \"ddc7c32422a7a2e413346258034cxxx39e6\", \"timestamp\": 1483399183, \"bucket\": \"xxx\", \"convtime\": 0.18000006675720215, \"notify_url\": \"http:\/\/www.xxx.net\/api\/upload\/xxx\", \"input_info\": {\"streams\": [{\"index\": 0, \"audio_channels\": 1, \"audio_samplerate\": 8000.0, \"bitrate\": 8800, \"codec_desc\": \"AMR-NB (Adaptive Multi-Rate NarrowBand)\", \"codec\": \"amrnb\", \"duration\": 0.029125000000000002, \"type\": \"audio\"}], \"format\": {\"duration\": 0.029125000000000002, \"fullname\": \"3GPP AMR\", \"bitrate\": 10437.0, \"filesize\": 38.0, \"format\": \"amr\"}}, \"uploads\": [\"\/uploads\/audio\/120\/14833991837592.mp3\"], \"starttime\": 1483399183, \"async\": true, \"media_info\": {\"streams\": [{\"index\": 0, \"audio_channels\": 1, \"audio_samplerate\": 8000.0, \"bitrate\": 64000, \"codec_desc\": \"MP3 (MPEG audio layer 3)\", \"codec\": \"mp3\", \"duration\": 0.216, \"type\": \"audio\"}], \"format\": {\"duration\": 0.216, \"fullname\": \"MP2\/3 (MPEG audio layer 2\/3)\", \"bitrate\": 87000.0, \"filesize\": 2349.0, \"format\": \"mp3\"}}, \"type\": \"audio\", \"options\": {\"ab\": \"64\", \"f\": \"mp3\"}, \"errmsg\": \"\"}","time":"2017\/01\/02 23:19:43 UTC","hostname":"bd39a20ecef9@xxx"}
/disk/sata13/xxxx-platform/2017_01_03/0.0.0.0/platform.07_30.log.0.0.0.0.2017_01_03.gz: /naga/v0.5.3-small: {"log":"[Naga 0.5.3 I 17-01-03 07:19:43] ddc7c32422a7a2e413346258034c39e6 - notify:message:{'bucket': 'xxx', 'notify_url': 'http:\/\/www.xxx.net\/api\/upload\/xxx', 'content_type': 'application\/x-www-form-urlencoded', 'operator': 'xxx', 'password': 'xxx', 'data': 'xxx'}","time":"2017\/01\/02 23:19:43 UTC","hostname":"bd39a20ecef9@xxx"}
/disk/sata13/xxx-platform/2017_01_03/0.0.0.0/platform.07_30.log.0.0.0.0.2017_01_03.gz: /naga/v0.5.3-small: {"log":"[Naga 0.5.3 I 17-01-03 07:19:43] ddc7c32422a7a2e413346258034c39e6 - kafka:{\"media_uris\": [\"\/uploads\/audio\/120\/14833991837592.amr\"], \"status\": 200, \"tasks\": [\"transcode\"], \"request_time\": 1483399183, \"task_id\": \"ddc7c32422a7a2e413346258034cxxx39e6\", \"timestamp\": 1483399183, \"bucket\": \"xxx\", \"convtime\": 0.18000006675720215, \"notify_url\": \"http:\/\/www.xxx.net\/api\/upload\/xxx\", \"input_info\": {\"streams\": [{\"index\": 0, \"audio_channels\": 1, \"audio_samplerate\": 8000.0, \"bitrate\": 8800, \"codec_desc\": \"AMR-NB (Adaptive Multi-Rate NarrowBand)\", \"codec\": \"amrnb\", \"duration\": 0.029125000000000002, \"type\": \"audio\"}], \"format\": {\"duration\": 0.029125000000000002, \"fullname\": \"3GPP AMR\", \"bitrate\": 10437.0, \"filesize\": 38.0, \"format\": \"amr\"}}, \"uploads\": [\"\/uploads\/audio\/120\/14833991837592.mp3\"], \"starttime\": 1483399183, \"async\": true, \"media_info\": {\"streams\": [{\"index\": 0, \"audio_channels\": 1, \"audio_samplerate\": 8000.0, \"bitrate\": 64000, \"codec_desc\": \"MP3 (MPEG audio layer 3)\", \"codec\": \"mp3\", \"duration\": 0.216, \"type\": \"audio\"}], \"format\": {\"duration\": 0.216, \"fullname\": \"MP2\/3 (MPEG audio layer 2\/3)\", \"bitrate\": 87000.0, \"filesize\": 2349.0, \"format\": \"mp3\"}}, \"type\": \"audio\", \"options\": {\"ab\": \"64\", \"f\": \"mp3\"}, \"errmsg\": \"\"}","time":"2017\/01\/02 23:19:43 UTC","hostname":"bd39a20ecef9@xxx"}
/disk/sata13/docker-platform/2017_01_03/0.0.0.0/platform.07_30.log.0.0.0.0.2017_01_03.gz: /naga/v0.5.3-small: {"log":"[Naga 0.5.3 I 17-01-03 07:19:43] ddc7c32422a7a2e413346258034c39e6 - notify:message:{'bucket': 'xxx', 'notify_url': 'http:\/\/www.xxx.net\/api\/upload\/xxx', 'content_type': 'application\/x-www-form-urlencoded', 'operator': 'xxx', 'password': 'xxx', 'data': 'xxx'}","time":"2017\/01\/02 23:19:43 UTC","hostname":"bd39a20ecef9@xxx"}
/disk/sata13/docker-platform/2017_01_03/0.0.0.0/platform.07_30.log.0.0.0.0.2017_01_03.gz: /naga/v0.5.3-small: {"log":"[Naga 0.5.3 I 17-01-03 07:19:43] ddc7c32422a7a2e413346258034c39e6 - notify:message:{'bucket': 'xxx', 'notify_url': 'http:\/\/www.xxx.net\/api\/upload\/xxx', 'content_type': 'application\/x-www-form-urlencoded', 'operator': 'xxx', 'password': 'xxx', 'data': 'xxx'}","time":"2017\/01\/02 23:19:43 UTC","hostname":"bd39a20ecef9@xxx"}
/disk/sata13/xxx-platform/2017_01_03/0.0.0.0/platform.07_30.log.0.0.0.0.2017_01_03.gz: /naga/v0.5.3-small: {"log":"[Naga 0.5.3 I 17-01-03 07:19:43] ddc7c32422a7a2e413346258034c39e6 - kafka:{\"media_uris\": [\"\/uploads\/audio\/120\/14833991837592.amr\"], \"status\": 200, \"tasks\": [\"transcode\"], \"request_time\": 1483399183, \"task_id\": \"ddc7c32422a7a2e413346258034cxxx39e6\", \"timestamp\": 1483399183, \"bucket\": \"xxx\", \"convtime\": 0.18000006675720215, \"notify_url\": \"http:\/\/www.xxx.net\/api\/upload\/xxx\", \"input_info\": {\"streams\": [{\"index\": 0, \"audio_channels\": 1, \"audio_samplerate\": 8000.0, \"bitrate\": 8800, \"codec_desc\": \"AMR-NB (Adaptive Multi-Rate NarrowBand)\", \"codec\": \"amrnb\", \"duration\": 0.029125000000000002, \"type\": \"audio\"}], \"format\": {\"duration\": 0.029125000000000002, \"fullname\": \"3GPP AMR\", \"bitrate\": 10437.0, \"filesize\": 38.0, \"format\": \"amr\"}}, \"uploads\": [\"\/uploads\/audio\/120\/14833991837592.mp3\"], \"starttime\": 1483399183, \"async\": true, \"media_info\": {\"streams\": [{\"index\": 0, \"audio_channels\": 1, \"audio_samplerate\": 8000.0, \"bitrate\": 64000, \"codec_desc\": \"MP3 (MPEG audio layer 3)\", \"codec\": \"mp3\", \"duration\": 0.216, \"type\": \"audio\"}], \"format\": {\"duration\": 0.216, \"fullname\": \"MP2\/3 (MPEG audio layer 2\/3)\", \"bitrate\": 87000.0, \"filesize\": 2349.0, \"format\": \"mp3\"}}, \"type\": \"audio\", \"options\": {\"ab\": \"64\", \"f\": \"mp3\"}, \"errmsg\": \"\"}","time":"2017\/01\/02 23:19:43 UTC","hostname":"bd39a20ecef9@xxx"}
/disk/sata13/docker-platform/2017_01_03/0.0.0.0/platform.07_30.log.0.0.0.0.2017_01_03.gz: /naga/v0.5.3-small: {"log":"[Naga 0.5.3 I 17-01-03 07:19:43] ddc7c32422a7a2e413346258034c39e6 - notify:message:{'bucket': 'xxx', 'notify_url': 'http:\/\/www.xxx.net\/api\/upload\/xxx', 'content_type': 'application\/x-www-form-urlencoded', 'operator': 'xxx', 'password': 'xxx', 'data': 'xxx'}","time":"2017\/01\/02 23:19:43 UTC","hostname":"bd39a20ecef9@xxx"}

The code:

#! /usr/bin/ruby
# something
require 'json'

File.open('test.log', 'r') do |f|
    f.each do |line|
        re = /(kafka:)({.+}(?=",))+/
        match_data = re.match(line)
        data = $2.gsub('\\', '') if match_data.class == MatchData
        data1 = JSON.parse(data,{ symbolize_names: true }) if data.class == String
        if data1.is_a? Hash
            puts data1[:media_info][:format][:duration].to_f 
        end
    end
end

liprais #0 2017-02-22

It looks like each log line is just three fields separated by ":", and the last one is JSON. So why not find the last field first and then parse it as JSON?
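A minimal sketch of that idea, assuming the separator is exactly a colon followed by a space and that neither the path nor the tag contains it. The helper name split_log_line and the shortened sample line are made up for illustration. The limit of 3 matters because the JSON field itself contains ": ".

```ruby
require 'json'

# Hypothetical helper: split one log line into path, tag and parsed JSON.
# Assumes the fields are separated by ": " and only the last one may
# contain that separator again, hence the limit of 3.
def split_log_line(line)
  path, tag, json_text = line.chomp.split(': ', 3)
  return nil unless json_text

  { path: path, tag: tag, record: JSON.parse(json_text) }
end

# Shortened, made-up line in the same shape as the sample log.
sample = '/disk/a.gz: /naga/v0.5.3-small: ' \
         '{"log":"id - notify:message:{}","time":"2017\/01\/02 23:19:43 UTC","hostname":"h@xxx"}'

fields = split_log_line(sample)
puts fields[:tag]
puts fields[:record]['time']
```

JSON.parse already turns the escaped \/ in the time field back into a plain slash, so no manual unescaping is needed at this stage.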

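doitian #1 2017-02-22

Handling the escaping yourself is error-prone. Extract the trailing JSON first; its log field then comes out already unescaped, and you can parse the JSON that follows kafka: inside it. In other words, call JSON.parse twice.
In the code, checks like .class == XXX are unnecessary. re.match returns nil when nothing matches, so a plain if is enough. Likewise, if a variable is assigned only in one branch of an if and that branch never runs, the variable's value is also nil.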
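The two-pass idea could look roughly like the sketch below. The helper name kafka_payload and the shortened sample line are hypothetical, and the code assumes that the first "{" on a line starts the trailing JSON object. The nil checks use plain return ... unless instead of class comparisons.

```ruby
require 'json'

# Hypothetical helper: returns the parsed kafka payload of one log line,
# or nil when the line has no kafka payload or does not parse as JSON.
# Assumes the first "{" on the line starts the trailing JSON object.
def kafka_payload(line)
  start = line.index('{')
  return nil unless start

  outer = JSON.parse(line[start..-1]) # first parse: undoes \" and \/
  log = outer['log']
  return nil unless log.is_a?(String)

  _prefix, marker, json_text = log.partition('kafka:')
  return nil if marker.empty?         # no "kafka:" in this log message

  JSON.parse(json_text)               # second parse: the payload itself
rescue JSON::ParserError
  nil
end

# Shortened, made-up line in the same shape as the sample log.
SAMPLE_LINE = <<~'LOG'
  /disk/platform.log.gz: /naga/v0.5.3-small: {"log":"[Naga 0.5.3 I 17-01-03 07:19:43] ddc7 - kafka:{\"status\": 200, \"notify_url\": \"http:\/\/www.xxx.net\/api\", \"media_info\": {\"format\": {\"duration\": 0.216, \"format\": \"mp3\"}}}","time":"2017\/01\/02 23:19:43 UTC","hostname":"h@xxx"}
LOG

payload = kafka_payload(SAMPLE_LINE)
puts payload.dig('media_info', 'format', 'duration')
```

Because the JSON parser does the unescaping, a backslash that legitimately belongs to a string value survives, which stripping every backslash with gsub would not guarantee.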

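ynopeeb #2 2017-02-22

#2 @doitian Thanks. I tested it: match with the regex, group, and output the second group directly. I revised the code following your suggestions, and it looks much better. 😊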

File.open('kafka_log', 'r') do |f|
    f.each do |line|
        re = /(kafka:)({.+}(?=",))+/
        match_data = re.match(line)
        unless match_data.nil?
            data = JSON.parse($2.to_s.gsub("\\", ''))
            puts data["media_info"]["format"]["duration"]
        end
    end
end

ynopeeb #3 2017-02-22

#1 @liprais Yes, it looks simple, but getting it to work was quite a struggle. I guess I am still not fluent with the basics.

timlen #4 2017-02-22

How about moving re out of the second loop?

ynopeeb #5 2017-02-22

In reply to timlen:

Moving it out of the second loop: what exactly do you mean?

timlen #6 2017-02-22

In reply to ynopeeb:

I misspoke. I meant the second block: move re out of f.each.
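One way to read that suggestion is to define the pattern once as a constant, outside both blocks. The sketch below uses a hypothetical constant KAFKA_RE and helper kafka_json_strings, and a simplified pattern with a single capture group instead of the original one. As far as I know, Ruby builds a regex literal without interpolation only once anyway, so the gain here is mostly readability.

```ruby
# Defined once, outside any block. Simplified from the pattern in the
# question: one capture group holding the escaped JSON after "kafka:".
KAFKA_RE = /kafka:(\{.+\})(?=",)/

# Hypothetical helper: collect the escaped JSON text of every kafka line.
# Accepts anything that responds to each_line (a File, an IO, a String).
def kafka_json_strings(io)
  io.each_line.map { |line| line[KAFKA_RE, 1] }.compact
end

if File.exist?('test.log')
  File.open('test.log') { |f| puts kafka_json_strings(f).size }
end
```

line[KAFKA_RE, 1] returns the first capture or nil, so lines without a kafka payload simply drop out in compact.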

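saiga #7 2017-02-22

You can use =~ for the regex match and drop the redundant type checks; for example, there is no need to check whether the value returned by gsub is a String. For the hash, use dig to fetch the value. That said, the regex may not be able to guarantee that the extracted JSON string is valid.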

require 'json'

IO.foreach('test.log') do |line|
  # $2 holds the escaped JSON text captured after "kafka:"
  if /(kafka:)({.+}(?=",))+/ =~ line
    hash = JSON.parse($2.gsub('\\', ''))
    puts hash.dig('media_info', 'format', 'duration')
  end
end
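To illustrate the two caveats above, here is a small sketch with a hypothetical helper format_duration. Hash#dig (Ruby 2.3+) returns nil when a key along the path is missing, whereas chaining ['media_info']['format'] would raise NoMethodError on nil. Text that the regex extracted but that is not valid JSON makes JSON.parse raise JSON::ParserError, which the helper turns into nil.

```ruby
require 'json'

# Hypothetical helper: duration from already-unescaped JSON text,
# or nil if the text is not valid JSON or a key on the path is missing.
def format_duration(json_text)
  JSON.parse(json_text).dig('media_info', 'format', 'duration')
rescue JSON::ParserError
  nil
end

p format_duration('{"media_info": {"format": {"duration": 0.216}}}') # prints 0.216
p format_duration('{"status": 200}')                                 # prints nil
p format_duration('{"media_info": ')                                 # prints nil
```

Returning nil for both failure cases is a design choice for this sketch; logging the offending line instead may be more useful on a real log.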

One-liner style...

IO.read('test.log').scan(/(?>kafka:)((?:{.+?}(?=",))+)/m).flatten.map { |l| JSON.parse(l.gsub('\\', ''))&.dig('media_info', 'format', 'duration') }

ynopeeb #8 2017-02-23

In reply to saiga:

👍 Much more concise. I thought $2 had to be converted to a String before gsub could be called on it.

ynopeeb #9 2017-02-23

In reply to saiga:

How do I add up the values extracted with hash.dig('media_info', 'format', 'duration') to get a total?
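One possible way to get the total, sketched under a few assumptions: the helper names media_duration and total_duration are made up, the pattern is a simplified single-group version of the one used above, and Enumerable#sum requires Ruby 2.4 or later (on older versions, inject(0.0, :+) should do the same job).

```ruby
require 'json'

# Simplified pattern: one capture group with the escaped JSON after "kafka:".
KAFKA_JSON = /kafka:(\{.+\})(?=",)/

# Hypothetical helper: duration of one log line, or nil if the line has
# no kafka payload, the payload is not valid JSON, or the key is missing.
def media_duration(line)
  escaped = line[KAFKA_JSON, 1]
  return nil unless escaped

  JSON.parse(escaped.gsub('\\', '')).dig('media_info', 'format', 'duration')
rescue JSON::ParserError
  nil
end

# Sum the durations of all lines; lines without a duration are skipped.
def total_duration(lines)
  lines.map { |line| media_duration(line) }.compact.sum(0.0)
end

# File.foreach without a block returns an enumerator over the lines.
puts total_duration(File.foreach('test.log')) if File.exist?('test.log')
```

Starting the sum at 0.0 keeps the result a Float even when no line matches.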

ynopeeb closed the discussion on 03-09 at 16:43.
