How do I convert GBK-encoded Chinese characters to Unicode escapes in Ruby?

dawei_xia · 2013-11-01 · last reply by wlchn on 2017-03-06 · 12347 views

For example, "中文" -> "\u4e2d\u6587". I have tried:
puts Iconv.conv('utf-8','gbk',"中文").inspect  # "\344\270\255\346\226\207"
puts NKF.nkf('-w',"中文").inspect               # "\345\266\204\347\214\237"
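The octal escapes printed by the Iconv attempt are in fact the UTF-8 bytes of 中文 (E4 B8 AD E6 96 87); Ruby 1.8.7's `inspect` simply shows raw bytes. A quick check, decoding them back into code points with `unpack('U*')`, which exists in 1.8.7 and later:

```ruby
# The bytes printed by Iconv.conv('utf-8', 'gbk', "中文").inspect above
utf8_bytes = "\344\270\255\346\226\207"

# unpack('U*') decodes UTF-8 bytes into integer code points (1.8.7 and 1.9+)
code_points = utf8_bytes.unpack('U*')

# Show them in U+XXXX notation
puts code_points.map { |cp| format('U+%04X', cp) }.join(' ')  # prints "U+4E2D U+6587"
```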

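To clarify: the converted string is used by Selenium. The Selenium framework passes the string into a web page, and the framework requires UTF-8. After converting with iconv or nkf, the text shows up garbled on the page, whereas passing the literal string "\u4e2d\u6587" displays correctly. So what I need is a way to get the string "\u4e2d\u6587" from "中文". Also, I am stuck on Ruby 1.8.7 for now, so String#encode is not available.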
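A minimal sketch of the full conversion asked for here, assuming the input holds raw GBK bytes: decode GBK to UTF-8 (Iconv on 1.8.7, String#encode on 1.9+), then turn each code point into a `\uXXXX` escape with `unpack('U*')`. The method name `gbk_to_unicode_escapes` is hypothetical, and the four-digit format only fits characters in the Basic Multilingual Plane, which includes the common CJK characters.

```ruby
# Hypothetical helper: GBK bytes -> "\uXXXX" escape text.
def gbk_to_unicode_escapes(gbk_bytes)
  utf8 =
    if ''.respond_to?(:encode)
      # Ruby 1.9+: relabel the bytes as GBK, then transcode to UTF-8
      gbk_bytes.dup.force_encoding('GBK').encode('UTF-8')
    else
      # Ruby 1.8.7: strings are plain bytes, so let Iconv transcode them
      require 'iconv'
      Iconv.conv('UTF-8', 'GBK', gbk_bytes)
    end
  # unpack('U*') yields integer code points; format each as four lowercase hex digits
  utf8.unpack('U*').map { |cp| format('\\u%04x', cp) }.join
end

puts gbk_to_unicode_escapes("\xD6\xD0\xCE\xC4")  # GBK bytes of "中文"; prints \u4e2d\u6587
```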


teddy #0 2013-11-01

First call force_encoding("gbk"), then encode!("utf-8").
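On Ruby 1.9+ the two calls in this suggestion do different jobs: `force_encoding` only relabels the bytes, while `encode!` actually rewrites them. A small sketch using the GBK bytes of 中文 (D6 D0 CE C4) as sample input:

```ruby
# Ruby 1.9+ only: force_encoding and encode! do not exist in 1.8.7.
# Sample input: the GBK bytes of "中文", e.g. as read from a GBK file.
s = "\xD6\xD0\xCE\xC4".dup

s.force_encoding('gbk')   # relabel only: the bytes are unchanged
puts s.bytes.map { |b| '%02X' % b }.join(' ')   # prints "D6 D0 CE C4"

s.encode!('utf-8')        # transcode in place to UTF-8
puts s.bytes.map { |b| '%02X' % b }.join(' ')   # prints "E4 B8 AD E6 96 87"
```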

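zhangyuan #1 2013-11-01

# Treat the bytes as UTF-8 first; if that is not valid, assume they are GBK
cleaned = string.dup.force_encoding(Encoding::UTF_8)
# encode returns a new string, so the result has to be reassigned
cleaned = cleaned.encode(Encoding::UTF_8, Encoding::GBK) unless cleaned.valid_encoding?
cleaned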
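A sketch wrapping this approach in a reusable method (Ruby 1.9+ only; the name `ensure_utf8` is hypothetical). It is a heuristic: a short GBK string can happen to form valid UTF-8 and would then be returned unconverted.

```ruby
# Hypothetical helper (Ruby 1.9+): return the input as UTF-8 text.
def ensure_utf8(string)
  # Relabel the raw bytes as UTF-8 without changing them
  cleaned = string.dup.force_encoding(Encoding::UTF_8)
  # Not valid UTF-8: reinterpret the bytes as GBK and transcode to UTF-8
  cleaned = cleaned.encode(Encoding::UTF_8, Encoding::GBK) unless cleaned.valid_encoding?
  cleaned
end

puts ensure_utf8("\xD6\xD0\xCE\xC4")  # GBK bytes of "中文", transcoded
puts ensure_utf8("中文")              # already UTF-8, returned as-is
```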

putty #2 2013-11-01

iconv -f gbk -t utf-8 filename >> xxxx
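The same file conversion can also be done from Ruby 1.9+ without shelling out. A minimal sketch; the method name `append_gbk_file_as_utf8` is hypothetical, and it opens the target in append mode to mirror the `>>` redirection:

```ruby
# Hypothetical helper (Ruby 1.9+): mirrors `iconv -f gbk -t utf-8 src >> dst`.
def append_gbk_file_as_utf8(src, dst)
  # Read the source as raw bytes, then label them as GBK
  text = File.open(src, 'rb') { |f| f.read }.force_encoding('GBK')
  # Transcode to UTF-8 and append, like the shell's >> redirection
  File.open(dst, 'ab') { |f| f.write(text.encode('UTF-8')) }
end

# Usage (hypothetical file names):
# append_gbk_file_as_utf8('input_gbk.txt', 'output_utf8.txt')
```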


dylanjiao #3 2013-11-11

Same problem here. Any solution?

sevk #4 2013-11-11


#!/usr/bin/env ruby
# -*- coding: UTF-8 -*-
#
# Ruby conversion between UTF-8, GB2312, GBK and GB18030
require 'rubygems'

if RUBY_VERSION > '1.9'
   if RUBY_VERSION > '1.9.2'
      # GB18030 -> UTF-8, used by gb_to_utf8 / to_utf8 (was wrongly "UTF-16LE")
      $ec1 = Encoding::Converter.new("GB18030", "UTF-8", :universal_newline => true)
      $ec2 = Encoding::Converter.new("UTF-8","GB2312", :universal_newline => true)
   else
      require 'iconv'
   end
else
   require 'iconv'
end

# Iconv suffix that skips characters the target charset cannot represent
Ig = '//IGNORE'

class String
   #s.encode!("gbk")
   def code_a2b(a,b)
      if RUBY_VERSION > '1.9.2' and defined? Encoding::Converter
        tmp = Encoding::Converter.new(a,b, :universal_newline => true)
        tmp.convert self rescue self
      else
        Iconv.conv("#{b}//IGNORE","#{a}//IGNORE",self)
      end
   end
   def gbtoX(code)
     code_a2b('GB18030',code)
     #code_a2b('CP20936',code)
     #code_a2b('GB2312',code)
   end

   def togb2312
      return $ec2.convert self if RUBY_VERSION > '1.9.2'
      Iconv.conv("CP20936#{Ig}","UTF-8#{Ig}",self)
   end

   def togbk
      if RUBY_VERSION > '1.9.2'
         $ec2.convert self rescue self
      else
         Iconv.conv("GBK#{Ig}","UTF-8#{Ig}",self)
      end
   end

   def togb
      if RUBY_VERSION > '1.9.2'
         $ec2.convert self rescue self
      else
         Iconv.conv("GB2312#{Ig}","UTF-8#{Ig}",self)
      end
   end
   alias to_gb togb

   def utf8_to_gb
      return $ec2.convert self if RUBY_VERSION > '1.9.2'
      Iconv.conv("GB18030#{Ig}","UTF-8#{Ig}",self)
   end
   def gb_to_utf8
      return $ec1.convert self if RUBY_VERSION > '1.9.2'
      Iconv.conv("UTF-8#{Ig}","GB18030#{Ig}",self)
   end
   def to_utf8
      return $ec1.convert self if RUBY_VERSION > '1.9.2'
      Iconv.conv("UTF-8#{Ig}","GB18030#{Ig}",self)
   end
   alias toutf8 to_utf8

   def to_hex(s=' ')
      self.each_byte.map{|b| "%02X" % b}.join(s)
   end
end

begin
  require 'rchardet' if RUBY_VERSION < '1.9'
  require 'rchardet19' if RUBY_VERSION > '1.9'
rescue LoadError
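  s="Failed to load a library. Run:
  apt-get install rubygems; # install the Ruby library manager
gem install rchardet; # install the charset-detection library\nOtherwise charset detection may not work.\n"
  s = s.utf8_to_gb if Gem.win_platform?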
  puts s
  puts $!.message + $@[0]
end
def guess(s)
  CharDet.detect(s)['encoding'].upcase
end

if $0 == __FILE__
   puts '中文'.togbk
end
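On Ruby 1.9+ the Encoding::Converter path that the script relies on can be checked in isolation. A short round-trip sketch between UTF-8 and GB18030 (a superset of GBK, so 中文 keeps the same two-byte codes D6 D0 CE C4):

```ruby
# Ruby 1.9+: one converter per direction, as the script does with $ec1/$ec2
utf8_to_gb = Encoding::Converter.new('UTF-8', 'GB18030')
gb_to_utf8 = Encoding::Converter.new('GB18030', 'UTF-8')

gb = utf8_to_gb.convert('中文')
puts gb.bytes.map { |b| '%02X' % b }.join(' ')   # prints "D6 D0 CE C4"

back = gb_to_utf8.convert(gb)
puts back == '中文'                               # prints "true"
```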

sunhaolin #5 2014-06-24

How do I convert UTF-8 text to Unicode escapes? For example, "中文" -> "\u4e2d\u6587".

wlchn #6 2017-03-06

"中文".unpack('U*').map{ |i| "\\u" + i.to_s(16).rjust(4, '0') }.join
=> "\\u4e2d\\u6587"
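One caveat when the escapes end up in a web page: the JavaScript `\uXXXX` escape always takes exactly four hex digits and denotes a UTF-16 code unit, so a character above U+FFFF (an emoji, or a rare CJK Extension B character) must be written as a surrogate pair in that form; the one-liner above would emit five hex digits instead. A sketch extending it for that case; `to_js_unicode_escapes` is a hypothetical name:

```ruby
# Hypothetical helper: UTF-8 text -> "\uXXXX" escapes, using UTF-16
# surrogate pairs for code points above U+FFFF.
def to_js_unicode_escapes(utf8)
  utf8.unpack('U*').map do |cp|
    if cp > 0xFFFF
      v = cp - 0x10000
      high = 0xD800 + (v >> 10)    # top 10 bits -> high surrogate
      low  = 0xDC00 + (v & 0x3FF)  # bottom 10 bits -> low surrogate
      format('\\u%04x\\u%04x', high, low)
    else
      format('\\u%04x', cp)
    end
  end.join
end

puts to_js_unicode_escapes('中文')  # prints \u4e2d\u6587
```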
