Newbie question: "illegal characters" error when building an index with sunspot

kingguy · April 27, 2014 · last reply by ryancheung on May 12, 2014 · 2845 views

I get the following error when building the index with sunspot:

bundle exec rake sunspot:solr:reindex
rake aborted!
RSolr::Error::Http: RSolr::Error::Http - 400 Bad Request
Error: Illegal character ((CTRL-CHAR, code 12))
at [row,col {unknown-source}]: [155,1]

How can I fix this?

I've run into this problem too and am still wrestling with it.

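After fighting with it for a whole afternoon, I finally solved it. By a painful process of elimination I found that one product.description contained a ^K character (in a terminal you can type it with Ctrl-V followed by Ctrl-K). Once I deleted that character, the reindex ran without problems!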
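The elimination hunt can be automated. Below is a minimal sketch, assuming the 400 response comes from Solr's XML parser rejecting characters that XML 1.0 does not allow (code 12 in the log above is form feed, ^L; the ^K found here is vertical tab, code 11). `illegal_control_chars` is a hypothetical helper name, not part of sunspot, and the Rails loop in the trailing comment assumes a `Product` model with a `description` attribute as mentioned in this reply.

```ruby
# Characters below 0x20 that XML 1.0 forbids: everything except
# tab (0x09), line feed (0x0A) and carriage return (0x0D).
ILLEGAL_XML_CONTROL = /[\x00-\x08\x0B\x0C\x0E-\x1F]/

# Hypothetical helper: returns [[offset, code], ...] for every
# forbidden control character found in +text+.
def illegal_control_chars(text)
  return [] unless text.is_a?(String)

  hits = []
  text.each_char.with_index do |char, index|
    hits << [index, char.ord] if char =~ ILLEGAL_XML_CONTROL
  end
  hits
end

p illegal_control_chars("good\ttext\fwith\vjunk")
# => [[9, 12], [14, 11]]

# Assumed Rails usage (model and attribute names taken from this thread):
#
#   Product.find_each do |product|
#     hits = illegal_control_chars(product.description)
#     puts "Product ##{product.id}: #{hits.inspect}" unless hits.empty?
#   end
```

The tab at offset 4 is not reported because it is legal in XML; only the form feed and the vertical tab show up.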

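#2 @ryancheung My first approach was to find the character and delete it directly in the database. Later I filed a bug with the sunspot project and someone replied with the fix below: patch the DataExtractor directly. Note that it is unofficial.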

module Sunspot
  # 
  # DataExtractors present an internal API for the indexer to use to extract
  # field values from models for indexing. They must implement the #value_for
  # method, which takes an object and returns the value extracted from it.
  #
  module DataExtractor #:nodoc: all
    # 
    # AttributeExtractors extract data by simply calling a method on the object.
    #
    class AttributeExtractor
      def initialize(attribute_name)
        @attribute_name = attribute_name
      end

      def value_for(object)
        Filter.new( object.send(@attribute_name) ).value
      end
    end

    # 
    # BlockExtractors extract data by evaluating a block in the context of the
    # object instance, or if the block takes an argument, by passing the object
    # as the argument to the block. Either way, the return value of the block is
    # the value returned by the extractor.
    #
    class BlockExtractor
      def initialize(&block)
        @block = block
      end

      def value_for(object)
        Filter.new( Util.instance_eval_or_call(object, &@block) ).value
      end
    end

    # 
    # Constant data extractors simply return the same value for every object.
    #
    class Constant
      def initialize(value)
        @value = value
      end

      def value_for(object)
        Filter.new(@value).value
      end
    end

    # 
    # A Filter to allow easy value cleaning
    #
    class Filter
      def initialize(value)
        @value = value
      end
      def value
        strip_control_characters @value
      end
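      def strip_control_characters(value)
        return value unless value.is_a? String

        # Remove ASCII control characters (and DEL), but keep tab, line feed
        # and carriage return: they are legal in XML, and dropping them
        # would glue adjacent words together in the index.
        value.gsub(/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/, "")
      end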
    end

  end
end
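
If reopening sunspot's internals feels too fragile, the same cleaning can probably be done per field instead. This is only a sketch under assumptions: `IndexSanitizer` is a hypothetical helper, not part of sunspot, and the commented `searchable` block assumes the `Product` model from this thread. It relies on the behaviour described in the BlockExtractor comment above, where a block without arguments is evaluated in the context of the model instance.

```ruby
# Hypothetical helper, not part of sunspot.
module IndexSanitizer
  # Control characters that XML 1.0 rejects, plus DEL (127),
  # which the Filter class above also strips.
  CONTROL_CHARS = /[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/

  # Returns a cleaned copy of a String; any other value passes through.
  def self.clean(value)
    value.is_a?(String) ? value.gsub(CONTROL_CHARS, "") : value
  end
end

p IndexSanitizer.clean("line one\nline\vtwo\f")
# => "line one\nlinetwo"

# Assumed usage inside a model (names taken from this thread):
#
#   class Product < ActiveRecord::Base
#     searchable do
#       text(:description) { IndexSanitizer.clean(description) }
#     end
#   end
```

The trade-off is that every indexed text field has to opt in, whereas the patch above covers all extractors at once.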