
GetHashCode and Equals override in C#

Published Aug 26, 2019. Last updated Feb 21, 2020.

I was interviewing a candidate with 6+ years of experience and asked:

What is GetHashCode in C#/.NET, and where is it used?

He replied that it's the memory address where the object is stored. I paused: most of his previous answers had been very impressive, but this one did not satisfy me. Then again, I was not sure of the default implementation myself, because in my previous projects I had always overridden this method. So I started to wonder whether the default implementation really returns a memory address. I did a little research and thought I would share it, since many developers struggle here.

It is a myth that GetHashCode returns the object's address in the managed heap. It cannot, because the address is not constant: when the garbage collector compacts the heap, it moves objects and so changes their addresses.
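One way to see this for yourself is to check that the default hash code stays the same after a garbage collection. The sketch below is only an illustration: the class name HashStabilityDemo is made up, and the GC is not guaranteed to actually move the object during this run.

```csharp
using System;

public static class HashStabilityDemo
{
    // Returns true if the default hash code of an object is unchanged
    // after a forced garbage collection.
    public static bool HashSurvivesCollection()
    {
        var obj = new object();
        int before = obj.GetHashCode();

        // Create some garbage, then force a collection.
        // The GC may relocate obj here, but it does not have to.
        for (int i = 0; i < 10000; i++)
        {
            var garbage = new byte[128];
        }
        GC.Collect();
        GC.WaitForPendingFinalizers();

        int after = obj.GetHashCode();
        return before == after;
    }

    public static void Main()
    {
        Console.WriteLine(HashSurvivesCollection()); // True
    }
}
```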

Let's start with a question: why do we need this?

The GetHashCode method provides a hash code for algorithms that need quick checks of object equality. A hash code is a numeric value that is used to insert and identify an object in a hash-based collection such as the Dictionary<TKey,TValue> class, the Hashtable class, or a type derived from the DictionaryBase class.

Two objects that are equal return hash codes that are equal. However, the reverse is not true: equal hash codes do not imply object equality, because different (unequal) objects can have identical hash codes.

So when the hash codes of two objects are the same, the collection calls the Equals method to check whether the objects really are equal. Let's look at this with the code below:

using System;
using System.Collections.Generic;

class Program
{
    static void Main(string[] args)
    {
        // Two distinct instances with identical field values.
        var obj1 = new AllowedItem("A-Key", "A-Value", true);
        var obj2 = new AllowedItem("A-Key", "A-Value", true);

        var dic = new Dictionary<AllowedItem, string>();
        dic.Add(obj1, "obj1");
        dic.Add(obj2, "obj2"); // throws ArgumentException: obj2 equals obj1
    }
}

public class AllowedItem
{
    public string Name { get; private set; }
    public string Value { get; private set; }
    public bool IsAllowed { get; private set; }

    public AllowedItem(string name, string value, bool isAllowed)
    {
        Name = name;
        Value = value;
        IsAllowed = isAllowed;
    }

    public override bool Equals(object obj)
    {
        // Equal when obj is an AllowedItem with the same field values.
        return obj is AllowedItem other
            && Name == other.Name
            && Value == other.Value
            && IsAllowed == other.IsAllowed;
    }

    public override int GetHashCode()
    {
        // Combine the field hash codes; assumes Name and Value are not null.
        return Name.GetHashCode() ^
            Value.GetHashCode() ^
            IsAllowed.GetHashCode();
    }
}

We are trying to insert two equal objects as keys into the dictionary. The second Add throws the exception below, because a duplicate key is being inserted into the Dictionary.

System.ArgumentException: 'An item with the same key has already been added. Key: ConsoleApp2.AllowedItem'

The important point to note here is that when the first item is added, the Dictionary calls GetHashCode and stores the resulting integer alongside the key. When the second object is inserted, GetHashCode is called again, and the result is used to look for existing keys with the same hash code. If one is found, the Dictionary calls the Equals override, which here also reports that the keys are equal, so we get the duplicate-key error.
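The opposite case is worth seeing too: two keys with the same hash code that Equals reports as different. The sketch below uses a hypothetical XorKey class that combines its fields with XOR, as AllowedItem does. Because XOR is commutative, swapping the two strings gives the same hash code, yet both keys are accepted.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type that combines its fields with XOR, like AllowedItem.
public sealed class XorKey
{
    public string Name { get; }
    public string Value { get; }

    public XorKey(string name, string value)
    {
        Name = name;
        Value = value;
    }

    public override bool Equals(object obj) =>
        obj is XorKey other && Name == other.Name && Value == other.Value;

    // XOR is commutative, so swapping Name and Value gives the same hash code.
    public override int GetHashCode() => Name.GetHashCode() ^ Value.GetHashCode();
}

public static class CollisionDemo
{
    public static void Main()
    {
        var a = new XorKey("A", "B");
        var b = new XorKey("B", "A");

        Console.WriteLine(a.GetHashCode() == b.GetHashCode()); // True
        Console.WriteLine(a.Equals(b));                        // False

        // Same hash code, but Equals says the keys differ, so both are added.
        var dic = new Dictionary<XorKey, string>();
        dic.Add(a, "a");
        dic.Add(b, "b");
        Console.WriteLine(dic.Count);                          // 2
    }
}
```

Collisions like this are legal but slow lookups down. On runtimes that provide the System.HashCode struct (.NET Core 2.1 and later), HashCode.Combine(Name, Value) is an order-sensitive alternative to XOR.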

Hash codes are also used by HashSet<T>, which ensures that no duplicate items can be added to the set. It works on the same equality principle. If an object that is used as a key in a hash table does not provide a useful implementation of GetHashCode, you can specify a hash code provider by supplying an IEqualityComparer implementation to one of the overloads of the Hashtable class constructor. The generic collections Dictionary<TKey,TValue> and HashSet<T> accept an IEqualityComparer<T> in the same way:

public interface IEqualityComparer<in T>
    {
        bool Equals(T x, T y);
        int GetHashCode(T obj);
    }
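As a rough sketch of how such a comparer is used, the example below defines a hypothetical Person class that overrides neither Equals nor GetHashCode, and a hypothetical PersonNameComparer that treats two people as equal when their names match. Both names are invented for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type that does not override Equals or GetHashCode.
public sealed class Person
{
    public string Name { get; }
    public Person(string name) { Name = name; }
}

// Hypothetical comparer: two Person objects are equal when their names match.
public sealed class PersonNameComparer : IEqualityComparer<Person>
{
    public bool Equals(Person x, Person y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return string.Equals(x.Name, y.Name, StringComparison.Ordinal);
    }

    // Must agree with Equals: equal names give equal hash codes.
    public int GetHashCode(Person obj) =>
        obj.Name == null ? 0 : StringComparer.Ordinal.GetHashCode(obj.Name);
}

public static class ComparerDemo
{
    public static void Main()
    {
        var withDefault = new HashSet<Person>();
        withDefault.Add(new Person("Ann"));
        withDefault.Add(new Person("Ann"));
        Console.WriteLine(withDefault.Count);  // 2: default equality is by reference

        var withComparer = new HashSet<Person>(new PersonNameComparer());
        withComparer.Add(new Person("Ann"));
        withComparer.Add(new Person("Ann"));
        Console.WriteLine(withComparer.Count); // 1: the comparer treats them as equal
    }
}
```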

Now that we know why we use GetHashCode, let's answer another important question.

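What is the default implementation for value types vs reference types?
For simple value types such as int and bool, GetHashCode returns the value itself (in effect, return this), because the value fits in 4 bytes (the size of an int). Structs that do not override GetHashCode fall back to ValueType.GetHashCode, which derives the hash code from the struct's fields. For example: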

int x = 16;
var intHash = x.GetHashCode(); //Result: 16
bool y = true;
var boolHash = y.GetHashCode(); //Result:1
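For a custom struct without its own override, the inherited ValueType implementation is used. The small sketch below (the Point2D struct is hypothetical) shows that two structs with equal fields compare equal and therefore share a hash code. The actual hash value is an implementation detail, so it is not printed.

```csharp
using System;

// Hypothetical struct with no Equals or GetHashCode override.
public struct Point2D
{
    public int X;
    public int Y;

    public Point2D(int x, int y)
    {
        X = x;
        Y = y;
    }
}

public static class StructHashDemo
{
    public static void Main()
    {
        var p1 = new Point2D(3, 4);
        var p2 = new Point2D(3, 4);

        // ValueType.Equals compares field values, not references.
        Console.WriteLine(p1.Equals(p2));                        // True

        // Equal values must produce equal hash codes.
        Console.WriteLine(p1.GetHashCode() == p2.GetHashCode()); // True
    }
}
```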

Reference types are a little trickier. The hashing algorithm changed in .NET 2.0: it now uses the managed identifier of the thread in which the method is running. The method looks like this:

inline DWORD GetNewHashCode()
{
      // Every thread has its own generator for hash codes so that we won't get into a situation
      // where two threads consistently give out the same hash codes.
      // Choice of multiplier guarantees period of 2**32 - see Knuth Vol 2 p16 (3.2.1.2 Theorem A)
     DWORD multiplier = m_ThreadId*4 + 5;
     m_dwHashCodeSeed = m_dwHashCodeSeed*multiplier + 1;
     return m_dwHashCodeSeed;
}

Thus, each thread has its own hash code generator, so two threads do not end up consistently handing out the same hash codes.
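To get a feel for the arithmetic, here is a C# sketch that mirrors the two lines of GetNewHashCode. It is an illustration only, not the CLR's code: the ThreadHashGenerator class, the thread ids and the initial seed of 0 are all assumptions made for the demo.

```csharp
using System;

// Illustration only: mirrors the arithmetic of GetNewHashCode.
public sealed class ThreadHashGenerator
{
    private readonly uint _multiplier;
    private uint _seed;

    // threadId and initialSeed are hypothetical inputs chosen for the demo.
    public ThreadHashGenerator(uint threadId, uint initialSeed)
    {
        _multiplier = unchecked(threadId * 4 + 5);
        _seed = initialSeed;
    }

    public uint Next()
    {
        // uint arithmetic wraps modulo 2^32, like DWORD in the native code.
        _seed = unchecked(_seed * _multiplier + 1);
        return _seed;
    }
}

public static class GeneratorDemo
{
    public static void Main()
    {
        var t1 = new ThreadHashGenerator(threadId: 1, initialSeed: 0);
        var t2 = new ThreadHashGenerator(threadId: 2, initialSeed: 0);

        // Different thread ids give different multipliers (9 and 13),
        // so the sequences diverge after the first value.
        Console.WriteLine($"{t1.Next()}, {t1.Next()}, {t1.Next()}"); // 1, 10, 91
        Console.WriteLine($"{t2.Next()}, {t2.Next()}, {t2.Next()}"); // 1, 14, 183
    }
}
```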

The first time GetHashCode is called, the CLR computes the hash code and stores it in the SyncBlockIndex field of the object. If a SyncBlock is already associated with the object, i.e. the SyncBlockIndex field is in use, the CLR records the hash code in the SyncBlock itself. Once the SyncBlock is freed, the CLR copies the hash code from it back to the SyncBlockIndex field in the object's header. That's all.
