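I'm using Elasticsearch for search, and these are the characters that need to be escaped
#4 @as181920 This is the syntax for a Unicode character property group. Many regex engines support it, for example Perl, .NET, and Ruby. The full list is in the docs: http://www.ruby-doc.org/core-1.9.3/Regexp.html#label-Character+Properties
Another common use is matching Chinese characters with \p{Han}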
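A minimal sketch of \p{Han} in use. The sample string and variable names are made up for illustration:

```ruby
# Hypothetical sample text mixing Latin and Han characters.
text = "Ruby 正则表达式 regex 很强大"

# Collect each consecutive run of Han characters.
han_runs = text.scan(/\p{Han}+/)
p han_runs        # => ["正则表达式", "很强大"]

# Check whether the string contains any Han character at all.
contains_han = !!(text =~ /\p{Han}/)
p contains_han    # => true
```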
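Some notes to supplement the docs:
Letter: letters, including ideographs such as Han characters
Mark: small additions placed above or beside a character, such as tone marks or the two dots on ü (when they are encoded as a separate combining character)
Symbol: symbols, such as + = < > $ (note that Unicode classifies - * / as Punctuation, not Symbol)
Punctuation: punctuation marks, such as , .
Separator: there are 3 kinds:
  space separator Zs
  line separator Zl
  paragraph separator Zp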
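To make the categories concrete, here is a rough sketch that reports which top-level category a single character falls into. `CATEGORIES` and `category_of` are hypothetical names introduced only for this example. Note that Unicode puts `-` in Punctuation, while `+` is a Symbol.

```ruby
# Top-level Unicode general categories, keyed by the names used above.
CATEGORIES = {
  "Letter"      => /\p{L}/,
  "Mark"        => /\p{M}/,
  "Symbol"      => /\p{S}/,
  "Punctuation" => /\p{P}/,
  "Separator"   => /\p{Z}/
}

# Returns the name of the first category whose pattern matches char,
# or nil if none does. Intended for single-character strings.
def category_of(char)
  found = CATEGORIES.find { |_name, pattern| char =~ pattern }
  found && found.first
end

["a", "中", "\u0308", "+", "-", ",", " ", "\u2028"].each do |char|
  puts "#{char.inspect} => #{category_of(char)}"
end
```

The two-letter subcategories work the same way, for example `/\p{Zs}/`, `/\p{Zl}/`, and `/\p{Zp}/` for the three kinds of separator.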
Ruby 2.0 added \X, which matches a grapheme (the word was coined by analogy with "phoneme")
#6 @luikore Regexes are really powerful. For simple cases, String#encode is also handy:
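A small sketch of what \X buys you, assuming Ruby 2.0 or later. The sample string is made up: a "u" followed by a combining diaeresis is two code points but renders as a single ü.

```ruby
# "u" followed by U+0308 COMBINING DIAERESIS, then a plain "x".
decomposed = "u\u0308x"

# Counting code points sees the base letter and the mark separately.
codepoint_count = decomposed.length

# \X keeps the base letter and its combining mark together.
graphemes = decomposed.scan(/\X/)

p codepoint_count    # => 3
p graphemes.length   # => 2
```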
[8] pry(main)> p "<>&\"".encode(:xml => :text)
"&lt;&gt;&amp;\""
=> "&lt;&gt;&amp;\""
[9] pry(main)> p "<>&\"".encode(:xml => :attr)[1..-2]
"&lt;&gt;&amp;&quot;"
=> "&lt;&gt;&amp;&quot;"
PS. It is also somewhat faster than a regex, and it can convert CRLF and the like too.
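One rough way to check the speed claim on your own machine is to compare a gsub-based version with `encode`. This is only a sketch: `XML_TEXT_ESCAPES`, `escape_with_gsub`, `escape_with_encode`, and the sample input are hypothetical, and the timings will vary with Ruby version and input.

```ruby
require "benchmark"

# Replacements that mirror what :xml => :text does.
XML_TEXT_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }.freeze

# Regex-based escaping: look up each matched character in the hash.
def escape_with_gsub(str)
  str.gsub(/[&<>]/, XML_TEXT_ESCAPES)
end

# Escaping via the built-in XML text decorator of String#encode.
def escape_with_encode(str)
  str.encode(:xml => :text)
end

# Made-up input, repeated to give the benchmark something to chew on.
input = %(<a href="x">Tom & Jerry</a>) * 1_000

gsub_time   = Benchmark.realtime { 100.times { escape_with_gsub(input) } }
encode_time = Benchmark.realtime { 100.times { escape_with_encode(input) } }

puts format("gsub: %.4fs, encode: %.4fs", gsub_time, encode_time)
```

Both methods should return the same string, so the comparison is only about speed.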
:invalid => nil # raise error on invalid byte sequence (default)
:invalid => :replace # replace invalid byte sequence
:undef => nil # raise error on undefined conversion (default)
:undef => :replace # replace undefined conversion
:replace => string # replacement string ("?" or "\uFFFD" if not specified)
:newline => :universal # decorator for converting CRLF and CR to LF
:newline => :crlf # decorator for converting LF to CRLF
:newline => :cr # decorator for converting LF to CR
:universal_newline => true # decorator for converting CRLF and CR to LF
:crlf_newline => true # decorator for converting LF to CRLF
:cr_newline => true # decorator for converting LF to CR
:xml => :text # escape as XML CharData.
:xml => :attr # escape as XML AttValue
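A few of the options above in action, as a minimal sketch. The sample strings and variable names are made up for illustration:

```ruby
# Normalize CRLF and lone CR to LF.
unix_text = "line1\r\nline2\rline3".encode(:universal_newline => true)
p unix_text   # => "line1\nline2\nline3"

# Convert LF to CRLF.
dos_text = "line1\nline2".encode(:crlf_newline => true)
p dos_text    # => "line1\r\nline2"

# U+00E9 (é) has no US-ASCII equivalent. With :undef => :replace it is
# replaced instead of raising Encoding::UndefinedConversionError.
ascii_text = "caf\u00E9".encode("US-ASCII", :undef => :replace)
p ascii_text  # => "caf?"

# The same conversion with a custom replacement string.
ascii_custom = "caf\u00E9".encode("US-ASCII", :undef => :replace, :replace => "e")
p ascii_custom  # => "cafe"
```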
refs: http://www.ruby-doc.org/core-2.0/String.html#method-i-encode