Rails ancestry 和 closure_tree 的批量读对比

mizuhashi · 2017年04月17日 · 最后由 playmonkey 回复于 2017年05月17日 · 4268 次阅读
本帖已被设为精华帖!

本次测试中使用了细粒度缓存插件 second_level_cache,测试数据是一个 7 个节点的二叉树。


批量读 parent:

ancestry: 
pry(main)> Catalog.all.map(&:parent)
  Catalog Load (0.1ms)  SELECT "catalogs".* FROM "catalogs"
Cache read: slc/catalogs/1/0/4b739449bcf124e7ace371e00a28ea69
Cache read: slc/catalogs/1/0/4b739449bcf124e7ace371e00a28ea69
Cache read: slc/catalogs/2/0/4b739449bcf124e7ace371e00a28ea69
Cache read: slc/catalogs/2/0/4b739449bcf124e7ace371e00a28ea69
Cache read: slc/catalogs/3/0/4b739449bcf124e7ace371e00a28ea69
Cache read: slc/catalogs/3/0/4b739449bcf124e7ace371e00a28ea69
closure_tree:
pry(main)> Catalog.includes(:parent).all.map(&:parent)
  Catalog Load (0.1ms)  SELECT "catalogs".* FROM "catalogs"
Cache read_multi: ["slc/catalogs/1/0/e1a68710d2b02b4e6409f3222c9e280e", "slc/catalogs/2/0/e1a68710d2b02b4e6409f3222c9e280e", "slc/catalogs/3/0/e1a68710d2b02b4e6409f3222c9e280e"]
missed ids -> [] | hitted ids -> [1, 2, 3]

ancestry 和 closure_tree 都可以用到 slc,但 closure_tree 使用 includes 可从 multi_get 中受益,若不 includes,则和 ancestry 完全一致。


查询 children:

ancestry:
pry(main)> Catalog.all.map(&:children)
  Catalog Load (0.2ms)  SELECT "catalogs".* FROM "catalogs"
  Catalog Load (0.1ms)  SELECT "catalogs".* FROM "catalogs" WHERE "catalogs"."ancestry" = '1'
  Catalog Load (0.1ms)  SELECT "catalogs".* FROM "catalogs" WHERE "catalogs"."ancestry" = '1/2'
  Catalog Load (0.1ms)  SELECT "catalogs".* FROM "catalogs" WHERE "catalogs"."ancestry" = '1/3'
  Catalog Load (0.1ms)  SELECT "catalogs".* FROM "catalogs" WHERE "catalogs"."ancestry" = '1/2/4'
  Catalog Load (0.1ms)  SELECT "catalogs".* FROM "catalogs" WHERE "catalogs"."ancestry" = '1/2/5'
  Catalog Load (0.1ms)  SELECT "catalogs".* FROM "catalogs" WHERE "catalogs"."ancestry" = '1/3/6'
  Catalog Load (0.1ms)  SELECT "catalogs".* FROM "catalogs" WHERE "catalogs"."ancestry" = '1/3/7'
closure_tree:
pry(main)> Catalog.includes(:children).all.map(&:children)
  Catalog Load (0.1ms)  SELECT "catalogs".* FROM "catalogs"
  Catalog Load (0.2ms)  SELECT "catalogs".* FROM "catalogs" WHERE "catalogs"."parent_id" IN (1, 2, 3, 4, 5, 6, 7)

双方都不能命中缓存,但 closure_tree 可受益于 includes。


查询 ancestors:

ancestry:
pry(main)> Catalog.all.map(&:ancestors)
  Catalog Load (0.1ms)  SELECT "catalogs".* FROM "catalogs"
  Catalog Load (0.1ms)  SELECT "catalogs".* FROM "catalogs" WHERE 1=0 ORDER BY coalesce("catalogs"."ancestry", '')
  Catalog Load (0.1ms)  SELECT "catalogs".* FROM "catalogs" WHERE "catalogs"."id" IN (1) ORDER BY coalesce("catalogs"."ancestry", '')
  Catalog Load (0.1ms)  SELECT "catalogs".* FROM "catalogs" WHERE "catalogs"."id" IN (1) ORDER BY coalesce("catalogs"."ancestry", '')
  Catalog Load (0.3ms)  SELECT "catalogs".* FROM "catalogs" WHERE "catalogs"."id" IN (1, 2) ORDER BY coalesce("catalogs"."ancestry", '')
  Catalog Load (0.4ms)  SELECT "catalogs".* FROM "catalogs" WHERE "catalogs"."id" IN (1, 2) ORDER BY coalesce("catalogs"."ancestry", '')
  Catalog Load (0.3ms)  SELECT "catalogs".* FROM "catalogs" WHERE "catalogs"."id" IN (1, 3) ORDER BY coalesce("catalogs"."ancestry", '')
  Catalog Load (0.2ms)  SELECT "catalogs".* FROM "catalogs" WHERE "catalogs"."id" IN (1, 3) ORDER BY coalesce("catalogs"."ancestry", '')
closure_tree:
pry(main)> Catalog.includes(:self_and_ancestors).all.map{|x| x.self_and_ancestors[0..-2]}
  Catalog Load (0.1ms)  SELECT "catalogs".* FROM "catalogs"
  CatalogHierarchy Load (0.2ms)  SELECT "catalog_hierarchies".* FROM "catalog_hierarchies" WHERE "catalog_hierarchies"."descendant_id" IN (1, 2, 3, 4, 5, 6, 7) ORDER BY "catalog_hierarchies".generations asc
Cache read_multi: ["slc/catalogs/1/0/e1a68710d2b02b4e6409f3222c9e280e", "slc/catalogs/2/0/e1a68710d2b02b4e6409f3222c9e280e", "slc/catalogs/3/0/e1a68710d2b02b4e6409f3222c9e280e", "slc/catalogs/4/0/e1a68710d2b02b4e6409f3222c9e280e", "slc/catalogs/5/0/e1a68710d2b02b4e6409f3222c9e280e", "slc/catalogs/6/0/e1a68710d2b02b4e6409f3222c9e280e", "slc/catalogs/7/0/e1a68710d2b02b4e6409f3222c9e280e"]
missed ids -> [] | hitted ids -> [1, 2, 3, 4, 5, 6, 7]

closure_tree 没直接将 ancestors 作为关联,所以 includes 的时候要用 self_and_ancestors,并自己去掉 self。因为关联排过序了,所以可以直接用 range 取值。查询 descendants 的情况与 ancestors 的类似。


查询 siblings:

ancestry:
pry(main)> Catalog.all.map(&:siblings)
  Catalog Load (0.1ms)  SELECT "catalogs".* FROM "catalogs"
  Catalog Load (0.1ms)  SELECT "catalogs".* FROM "catalogs" WHERE "catalogs"."ancestry" IS NULL
  Catalog Load (0.1ms)  SELECT "catalogs".* FROM "catalogs" WHERE "catalogs"."ancestry" = '1'
  Catalog Load (0.0ms)  SELECT "catalogs".* FROM "catalogs" WHERE "catalogs"."ancestry" = '1'
  Catalog Load (0.1ms)  SELECT "catalogs".* FROM "catalogs" WHERE "catalogs"."ancestry" = '1/2'
  Catalog Load (0.0ms)  SELECT "catalogs".* FROM "catalogs" WHERE "catalogs"."ancestry" = '1/2'
  Catalog Load (0.1ms)  SELECT "catalogs".* FROM "catalogs" WHERE "catalogs"."ancestry" = '1/3'
  Catalog Load (0.0ms)  SELECT "catalogs".* FROM "catalogs" WHERE "catalogs"."ancestry" = '1/3'
closure_tree:
pry(main)> Catalog.includes(parent: :children).all.map{|x| x.parent&.children&.reject{|y| y == x}}
  Catalog Load (0.1ms)  SELECT "catalogs".* FROM "catalogs"
Cache read_multi: ["slc/catalogs/1/0/e1a68710d2b02b4e6409f3222c9e280e", "slc/catalogs/2/0/e1a68710d2b02b4e6409f3222c9e280e", "slc/catalogs/3/0/e1a68710d2b02b4e6409f3222c9e280e"]
missed ids -> [] | hitted ids -> [1, 2, 3]
  Catalog Load (0.2ms)  SELECT "catalogs".* FROM "catalogs" WHERE "catalogs"."parent_id" IN (1, 2, 3)

因为 siblings 依赖于不同查询条件,不能直接使用 includes,但是可以用 parent.children 实现。


总结:

我们的使用情景是对产品批量查出他们的 catalogs,并且要用到 parent 和 grandparent。从生成的查询来看,有理由相信 closure_tree 的性能更好。ancestry 所有查询都无法使用 includes,也只有 parent 可以用到缓存,而 closure_tree 在缓存使用上占优,其他查询也大多可以用 includes 和 eager_load 进行优化。

tips: 使用 closure_tree,如果需要特定 depth 的 parent 和 children,可以自己定义一个关联:

has_one :grandparent, class_name: 'Catalog', through: :grandparent_hierarchy, source: :ancestor
has_one :grandparent_hierarchy, ->{ where(generations: 2) }, class_name: 'CatalogHierarchy', foreign_key: :descendant_id

但是对于 grandparent 没必要这么做,使用 includes(parent: :parent) 会更好。

huacnlee 将本帖设为了精华贴 04月18日 14:02

@darkbaby123 PostgreSQL 的话直接用 ltree 也是不错

需要 登录 后方可回复, 如果你还没有账号请点击这里 注册