SetとArrayのinclude?の違い

published_at: 2022-11-13

概要

先日社内のPRコードレビューを受けて、Array#include?よりSet#include?の方が高速だということを知った

その時にソースを追ってみたまとめ

要約

  • Array#include?は計算量がO(n)
  • Set#include?は計算量がO(1)

Array#include?

1VALUE 2rb_ary_includes(VALUE ary, VALUE item) 3{ 4 long i; 5 VALUE e; 6 for (i=0; i<RARRAY_LEN(ary); i++) { 7 e = RARRAY_AREF(ary, i); 8 if (rb_equal(e, item)) { 9 return Qtrue; 10 } 11 } 12 return Qfalse; 13}

refs. https://github.com/ruby/ruby/blob/master/array.c#L5417-L5430

1/** 2 * This function is an optimised version of calling `#==`. It checks equality 3 * between two objects by first doing a fast identity check using using C's 4 * `==` (same as `BasicObject#equal?`). If that check fails, it calls `#==` 5 * dynamically. This optimisation actually affects semantics, because when 6 * `#==` returns false for the same object obj, `rb_equal(obj, obj)` would 7 * still return true. This happens for `Float::NAN`, where `Float::NAN == 8 * Float::NAN` is `false`, but `rb_equal(Float::NAN, Float::NAN)` is `true`. 9 * 10 * @param[in] lhs Comparison LHS. 11 * @param[in] rhs Comparison RHS. 12 * @retval RUBY_Qtrue They are the same. 13 * @retval RUBY_Qfalse They are different. 14 */ 15 16VALUE rb_equal(VALUE lhs, VALUE rhs);

refs. https://github.com/ruby/ruby/blob/master/include/ruby/ruby.h#L164-L178

説明によるとオブジェクト同士の比較ということ

1describe "rb_equal" do 2 it "returns true if the arguments are the same exact object" do 3 s = "hello" 4 @o.rb_equal(s, s).should be_true 5 end 6 it "calls == to check equality and coerces to true/false" do 7 m = mock("string") 8 m.should_receive(:==).and_return(8) 9 @o.rb_equal(m, "hello").should be_true 10 m2 = mock("string") 11 m2.should_receive(:==).and_return(nil) 12 @o.rb_equal(m2, "hello").should be_false 13 end 14end

refs. https://github.com/ruby/ruby/blob/master/spec/ruby/optional/capi/object_spec.rb#L764-L779

念の為テストも確認して同値比較メソッドであることを確認

つまりArray#include?は1つずつ同値比較をしているので計算量がO(n)

Set#include?

1# Returns true if the set contains the given object. 2# 3# Note that <code>include?</code> and <code>member?</code> do not test member 4# equality using <code>==</code> as do other Enumerables. 5# 6# See also Enumerable#include? 7 8def include?(o) 9 @hash[o] 10end 11alias member? include?

refs. https://github.com/ruby/ruby/blob/master/lib/set.rb#L397-L406

hashでのアクセスにより O(1)の計算量を実現していることが分かった

ちなみにSetはイニシャライズでhashとして扱われている

1def initialize(enum = nil, &block) # :yields: o 2 @hash ||= Hash.new(false) 3 enum.nil? and return 4 if block 5 do_with_enum(enum) { |o| add(block[o]) } 6 else 7 merge(enum) 8 end 9end

refs. https://github.com/ruby/ruby/blob/master/lib/set.rb#L245-L255