Rails ActiveSupport::HashWithIndifferentAccess Source Code Analysis: The Class That Gives Hashes Indifferent Access

lanzhiheng · August 6, 2020 · 1622 views

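ActiveSupport::HashWithIndifferentAccess is a subclass of Ruby's native Hash class that ships with Rails. It extends Hash so that its keys can be accessed indifferently, whether as symbols or as strings. This article walks through its source code. Originally published at https://www.lanzhiheng.com/posts/source%E2%80%93code-analysis-for-active-support-hash-with-indifferent-access

This article is mainly a source code analysis of the ActiveSupport::HashWithIndifferentAccess class. Its instances are Hash-like objects that treat a symbol and a string with the "same content" as the same key, so users can reach the corresponding value with either one.

"Same content" here means a symbol and a string that can be converted into each other with Symbol#to_s and String#to_sym.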

Origins

Ruby hashes have one quirk (not really a problem as such): a symbol and a string are treated as different keys no matter how alike they look. Roughly like this:

irb(main):006:0> hash = { a: 1, b: 2}
irb(main):007:0> hash[:a]
=> 1
irb(main):008:0> hash['a']
=> nil

Users often expect a Hash to treat the symbol :a and the string "a" as the same key, that is:

irb(main):007:0> hash[:a]
=> 1
irb(main):008:0> hash['a'] # this is the expectation
=> 1

They would like Ruby to quietly perform this conversion for them:

irb(main):008:0> hash['a'.to_sym]
=> 1

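Personal opinion: this distinction is part of Ruby's design philosophy, and to some extent it prevents unnecessary bugs. If we really want indifferent access, we can override the relevant Hash methods ourselves or turn to a third-party library.

If you are looking for a library, ActiveSupport::HashWithIndifferentAccess may be exactly what you want.

Someone has already done this

Some objects in Rails can indeed be accessed indifferently with symbols and strings. The most familiar one is params. Here is what I got while debugging in a controller with binding.pry: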

[1] pry(#<HomeController>)> params
=> <ActionController::Parameters {"controller"=>"home", "action"=>"index"} permitted: false>
[2] pry(#<HomeController>)> params[:sym_key] = "You can access me by String"
=> "You can access me by String"
[3] pry(#<HomeController>)> params['sym_key']
=> "You can access me by String"

Judging from the source in actionpack/lib/action_controller/metal/strong_parameters.rb, params is actually a method:

module ActionController
  # ...
  class Parameters
    # ...
    def initialize(parameters = {})
      @parameters = parameters.with_indifferent_access
      @permitted = self.class.permit_all_parameters
    end

    # ....
    def []=(key, value)
      @parameters[key] = value
    end
  end

  module StrongParameters
    def params
      @_params ||= Parameters.new(request.parameters)
    end
  end
end

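It instantiates the ActionController::Parameters class, passing request.parameters as the argument. Leaving aside what request.parameters is for now, the initializer shows that even if the argument is a plain Hash instance, Hash#with_indifferent_access is called on it. As the name suggests, it converts the current hash into an object with indifferent access and returns it.

Native Hash has no such method, so Rails must extend it somewhere. A project-wide search shows that the source lives in activesupport/lib/active_support/core_ext/hash/indifferent_access.rb: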

require "active_support/hash_with_indifferent_access"

class Hash
  def with_indifferent_access
    ActiveSupport::HashWithIndifferentAccess.new(self)
  end

  alias nested_under_indifferent_access with_indifferent_access
end

In effect it converts a native Hash instance into an instance of ActiveSupport::HashWithIndifferentAccess, and that class is the key to indifferent access.
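To make the delegation concrete, here is a minimal plain-Ruby sketch of the same pattern. `TinyIndifferentHash` and `Hash#to_tiny_indifferent` are hypothetical names invented for this illustration and are not part of Rails; the class only covers `[]` and `[]=`, whereas the real one handles far more.

```ruby
# Hypothetical stand-in for ActiveSupport::HashWithIndifferentAccess.
# It only normalizes keys on read and write.
class TinyIndifferentHash < Hash
  def initialize(source = {})
    super()
    # Go through the overridden []= so every key is normalized.
    source.each { |key, value| self[key] = value }
  end

  def []=(key, value)
    super(normalize(key), value)
  end

  def [](key)
    super(normalize(key))
  end

  private

  # Symbols become strings; every other key type is left alone.
  def normalize(key)
    key.is_a?(Symbol) ? key.to_s : key
  end
end

# Hypothetical core extension with the same shape as Hash#with_indifferent_access.
class Hash
  def to_tiny_indifferent
    TinyIndifferentHash.new(self)
  end
end

params = { controller: "home", "action" => "index" }.to_tiny_indifferent
puts params["controller"] # prints home
puts params[:action]      # prints index
```

The conversion happens once, when the wrapper is built, so every later read or write only has to normalize the key it is given.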

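The example above is not quite enough evidence on its own, though. There are several layers of wrapping, and it is easy to lose track. A simple way to verify is to check the class of request.parameters: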

[1] pry(#<PostsController>)> request.parameters.class
=> ActiveSupport::HashWithIndifferentAccess

It happens to be an instance of ActiveSupport::HashWithIndifferentAccess. Does it support indifferent access?

[1] pry(#<PostsController>)> hash = request.parameters
=> ...
[2] pry(#<PostsController>)> hash[:sym_key] = "Access Me"
=> "Access Me"
[3] pry(#<PostsController>)> hash[:sym_key]
=> "Access Me"
[4] pry(#<PostsController>)> hash['sym_key']
=> "Access Me"

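So the guess was right: indifferent access is provided by the ActiveSupport::HashWithIndifferentAccess class. Let's turn to its source code.

Source code analysis

The source of ActiveSupport::HashWithIndifferentAccess is in activesupport/lib/active_support/hash_with_indifferent_access.rb. The file is fairly long, so I only analyze the key parts.

1. Simple reads and writes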

module ActiveSupport
  class HashWithIndifferentAccess < Hash
    alias_method :regular_writer, :[]= unless method_defined?(:regular_writer)

    # ....
    # Assigns a new value to the hash:
    #
    #   hash = ActiveSupport::HashWithIndifferentAccess.new
    #   hash[:key] = 'value'
    #
    # This value can be later fetched using either +:key+ or <tt>'key'</tt>.
    def []=(key, value)
      regular_writer(convert_key(key), convert_value(value, conversion: :assignment))
    end

    def [](key)
      super(convert_key(key))
    end
  end
end

The first thing to notice is that HashWithIndifferentAccess is a subclass of Hash; everything that follows extends Hash.

There is this line above:

alias_method :regular_writer, :[]= unless method_defined?(:regular_writer)

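This line keeps a reference to the native Hash#[]= method under another name. We are about to override []=, and the new method still needs to call the native one. The alias also avoids the recursion that would occur if the override called itself by the same name.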

def []=(key, value)
  regular_writer(convert_key(key), convert_value(value, conversion: :assignment))
end

Of course, Hash#[] achieves the same effect with the super keyword:

def [](key)
  super(convert_key(key))
end

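However, the native Hash#[]= is not only needed inside the override; many other places in the file use it too, I just have not pasted them here. super only reaches the parent method of the same name, so those other methods need an alias for the native Hash#[]=, and here it is named regular_writer.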
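The alias-then-override pattern can be tried in isolation. The sketch below uses a hypothetical `StringKeyHash` class and a hypothetical `store_pair` method, neither of which exists in Rails, to show a second method reaching the native writer through the alias.

```ruby
# Hypothetical class used only to illustrate the alias-then-override pattern.
class StringKeyHash < Hash
  # Keep a handle on the inherited Hash#[]= before overriding it.
  # The guard skips the alias if the class body is evaluated again; without
  # it, regular_writer would then point at the override and recurse forever.
  alias_method :regular_writer, :[]= unless method_defined?(:regular_writer)

  def []=(key, value)
    regular_writer(key.is_a?(Symbol) ? key.to_s : key, value)
  end

  # A differently named method cannot use super to reach Hash#[]=,
  # so it goes through the alias instead.
  def store_pair(pair)
    key, value = pair
    regular_writer(key.is_a?(Symbol) ? key.to_s : key, value)
  end
end

h = StringKeyHash.new
h[:a] = 1
h.store_pair([:b, 2])
p h.keys # prints ["a", "b"]
```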

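2. Internal conversion

After the overrides above, the class has two new methods, HashWithIndifferentAccess#[] and HashWithIndifferentAccess#[]=. All they do is convert the key (and, on writes, the value) before reading from or writing to the hash. Keys are converted by HashWithIndifferentAccess#convert_key and values by HashWithIndifferentAccess#convert_value, so let's look at these two methods next.

1). Key conversion

First, the HashWithIndifferentAccess#convert_key method: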

module ActiveSupport
  class HashWithIndifferentAccess < Hash
    # ...
    private
      def convert_key(key)
        key.kind_of?(Symbol) ? key.to_s : key
      end
    # ...
  end
end

It is a private method and does something very simple: it converts symbol keys to strings (with Symbol#to_s) and leaves every other type as is.
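The rule is small enough to try on its own. Here is the same one-liner as a standalone top-level method, copied out of the class purely for experimentation:

```ruby
# Standalone copy of the key rule, so it can be run without loading Rails.
def convert_key(key)
  key.kind_of?(Symbol) ? key.to_s : key
end

p convert_key(:name)  # prints "name"
p convert_key("name") # prints "name"
p convert_key(42)     # prints 42
p convert_key(nil)    # prints nil
```

Since only symbols are touched, keys such as 42 and "42" remain distinct; the indifference applies to the symbol and string pair only.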

2). Value conversion

Value conversion is a bit more involved. For convenience, I list all the methods related to HashWithIndifferentAccess#convert_value together:

module ActiveSupport
  class HashWithIndifferentAccess < Hash
    def nested_under_indifferent_access
      self
    end

    # Convert to a regular hash with string keys.
    def to_hash
      _new_hash = Hash.new
      set_defaults(_new_hash)

      each do |key, value|
        _new_hash[key] = convert_value(value, conversion: :to_hash)
      end
      _new_hash
    end

    private
      def convert_value(value, conversion: nil)
        if value.is_a? Hash
          if conversion == :to_hash
            value.to_hash
          else
            value.nested_under_indifferent_access
          end
        elsif value.is_a?(Array)
          if conversion != :assignment || value.frozen?
            value = value.dup
          end
          value.map! { |e| convert_value(e, conversion: conversion) }
        else
          value
        end
      end
      # ...
  end
end

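Likewise, it is a private method, but it handles values differently depending on their type.

When the value is a hash:

When the value is a Hash, the method looks at the conversion keyword argument and checks conversion == :to_hash. If true, it calls to_hash on the value. This class overrides that method too; in short, it turns a HashWithIndifferentAccess instance into a Hash instance whose keys are all strings (I won't analyze that code in detail here).

Otherwise it calls Hash#nested_under_indifferent_access. I pasted this method in the previous section: it builds a HashWithIndifferentAccess instance from the current Hash. This class, however, overrides it with an optimized version: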

def nested_under_indifferent_access
  self
end

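If the receiver is already a HashWithIndifferentAccess instance, it simply returns itself, skipping the lengthy initialization.

When the value is an array:

If the value is an array, the method first checks whether conversion equals :assignment. If it does not, or if the array is frozen (frozen?), the array is shallow-copied with Object#dup. (For more on Object#dup, see an article I translated earlier.)

Then each element is processed recursively with convert_value, using the same keyword argument (conversion: ...). Because this uses Array#map!, the array is modified in place: the copy when one was made, or the caller's own array when an unfrozen array is assigned.

Other types:

Other types are not converted at all.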
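The three branches can be tried without Rails using the standalone sketch below. It is a simplification under a stated assumption: the Hash branch builds a plain Hash with string keys as a stand-in for both nested_under_indifferent_access and to_hash, so only the array handling mirrors the source closely.

```ruby
# Simplified, standalone version of convert_value for experimentation.
def convert_value(value, conversion: nil)
  if value.is_a?(Hash)
    # Stand-in for the real Hash branch: a new plain Hash with string keys.
    value.each_with_object({}) do |(key, inner), out|
      out[key.is_a?(Symbol) ? key.to_s : key] = convert_value(inner, conversion: conversion)
    end
  elsif value.is_a?(Array)
    # Copy unless this is an assignment of an unfrozen array.
    value = value.dup if conversion != :assignment || value.frozen?
    value.map! { |element| convert_value(element, conversion: conversion) }
  else
    value
  end
end

tags = [{ name: "ruby" }, :plain, 1]

copy = convert_value(tags)   # not an assignment, so tags is copied first
puts copy.equal?(tags)       # prints false
p tags[0].keys               # prints [:name]

assigned = convert_value(tags, conversion: :assignment)
puts assigned.equal?(tags)   # prints true
p tags[0].keys               # prints ["name"]
```

Note that the symbol :plain inside the array is left alone: only keys are stringified, never values.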

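3. Update (merge)

A slightly more involved operation is updating. The native method is Hash#update, which merges one or more other hashes into the receiver (passing several hashes at once requires Ruby 2.6 or later). Roughly like this: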

> a = {a: 1, b: 2, c: 1}
> b = {a: 100, b: 10000, d: 90}
> c = {e: 3000}
> a.update(b)
=> {:a=>100, :b=>10000, :c=>1, :d=>90}
> a.update(b, c)
=> {:a=>100, :b=>10000, :c=>1, :d=>90, :e=>3000}

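Personally I find the name merge more intuitive, and Ruby provides it: Hash#merge! is an alias of Hash#update, while Hash#merge returns a new hash and leaves the receiver untouched: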

> a.merge(b)
=> {:a=>100, :b=>10000, :c=>1, :d=>90}
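The difference between the two is easy to miss in an irb transcript, so here is a short plain-Ruby check of which call mutates the receiver:

```ruby
a = { a: 1, b: 2 }
b = { b: 20, c: 30 }

merged = a.merge(b)    # returns a new Hash
puts a.size            # prints 2, a is unchanged
puts merged.size       # prints 3

result = a.update(b)   # modifies a in place and returns it
puts a.size            # prints 3
puts result.equal?(a)  # prints true
```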

HashWithIndifferentAccess overrides Hash#update as well:

module ActiveSupport
  class HashWithIndifferentAccess < Hash
    # ...
    alias_method :regular_update, :update unless method_defined?(:regular_update)

    # ...
    def update(*other_hashes, &block)
      if other_hashes.size == 1
        update_with_single_argument(other_hashes.first, block)
      else
        other_hashes.each do |other_hash|
          update_with_single_argument(other_hash, block)
        end
      end
      self
    end

    private
    # ...
    def update_with_single_argument(other_hash, block)
      if other_hash.is_a? HashWithIndifferentAccess
        regular_update(other_hash, &block)
      else
        other_hash.to_hash.each_pair do |key, value|
          if block && key?(key)
            value = block.call(convert_key(key), self[key], value)
          end
          regular_writer(convert_key(key), convert_value(value))
        end
      end
    end
  end
end

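First comes the usual step: the alias regular_update keeps the native Hash#update available. Then the class defines its own HashWithIndifferentAccess#update.

The update method accepts multiple arguments. With a single argument it calls HashWithIndifferentAccess#update_with_single_argument directly; otherwise it iterates over the arguments and calls that method on each one in turn.

Finally, let's look at HashWithIndifferentAccess#update_with_single_argument. It first checks whether the value to merge is a HashWithIndifferentAccess instance. If so, it is handed straight to the native Hash#update (aliased as regular_update).

Otherwise it first calls to_hash. The benefit is that an object without a to_hash method raises an exception immediately. No extra type checks are needed to tell the user to pass valid data, which is one of the advantages of duck typing.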

> HashWithIndifferentAccess.new.update(1)
Traceback (most recent call last):
        1: from (irb):6
NoMethodError (undefined method `to_hash' for 1:Integer)
Did you mean?  to_s
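The flip side of that duck typing is that any object responding to to_hash is accepted. The sketch below illustrates this with a hypothetical `Settings` class and a hypothetical helper `merge_with_string_keys`, a simplified stand-in for the non-indifferent branch rather than the Rails code itself.

```ruby
# Hypothetical value object; any class works as long as it responds to to_hash.
class Settings
  def initialize(theme, locale)
    @theme = theme
    @locale = locale
  end

  def to_hash
    { theme: @theme, locale: @locale }
  end
end

# Simplified stand-in for the branch that handles non-indifferent arguments.
def merge_with_string_keys(target, other)
  other.to_hash.each_pair do |key, value|
    target[key.is_a?(Symbol) ? key.to_s : key] = value
  end
  target
end

prefs = merge_with_string_keys({}, Settings.new("dark", "en"))
puts prefs["theme"] # prints dark

begin
  merge_with_string_keys({}, 1)
rescue NoMethodError => e
  puts e.name # prints to_hash
end
```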

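Next, it iterates over all key-value pairs and writes each converted key and value into the current object with the native writer Hash#[]= (aliased here as regular_writer).

Also note this piece of code: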

if block && key?(key)
  value = block.call(convert_key(key), self[key], value)
end

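update may receive a block, which is passed on to HashWithIndifferentAccess#update_with_single_argument. When the block is given and the key already exists in the current object, the block is called with the key, the existing value, and the incoming value to compute the value to store. Execution then continues and the result is written.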
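The block handling can be sketched with plain hashes. `update_with_block` below is a hypothetical helper written for this illustration; it converts the key up front, whereas the class relies on its own overridden lookups, so treat it as an approximation of the flow rather than the actual implementation.

```ruby
# Simplified stand-in for update with a block, using plain hashes and string keys.
def update_with_block(target, other, &block)
  other.to_hash.each_pair do |key, value|
    string_key = key.is_a?(Symbol) ? key.to_s : key
    # Only consult the block when the key is already present.
    if block && target.key?(string_key)
      value = block.call(string_key, target[string_key], value)
    end
    target[string_key] = value
  end
  target
end

stock = { "apple" => 3, "pear" => 1 }
update_with_block(stock, { apple: 2, plum: 5 }) { |_key, old, new| old + new }
puts stock["apple"] # prints 5
puts stock["pear"]  # prints 1
puts stock["plum"]  # prints 5
```

New keys such as "plum" bypass the block entirely, which matches the behaviour of a block passed to the native Hash#update.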

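Wrapping up

Space is limited, so I'll stop here. If any part of the analysis is wrong, corrections are welcome. Besides the methods above, ActiveSupport::HashWithIndifferentAccess overrides many other existing Hash methods to provide indifferent access more thoroughly; take a look if you are interested.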
