Override the Object.GetHashCode method, why you should bother

When you design a class you sometimes want to override the Equals method in order to use some internal fields to determine if two instances of the class are equal.

For example, let's assume that you have implemented a class Money that has two fields, one named _amount of type decimal and another named _currency of type string. You decide that two objects of type Money are equal if and only if the amount and currency fields have the same values.

The implementation of Equals looks like this:

public override bool Equals(object obj)
{
  if (obj is Money other)
  {
    return _currency.Equals(other._currency, StringComparison.Ordinal)
        && _amount.Equals(other._amount);
  }
  return false;
}

You then notice that you get a warning during compilation: Warning CS0659 'Money' overrides Object.Equals(object o) but does not override Object.GetHashCode(). So what is this all about?

If you look into it, you will find a rule saying that if two objects are equal, they must have the same hash code. Since you have just overridden the method that determines whether two objects are equal, it is your responsibility to ensure that equal objects also get the same hash code. But why do two equal objects need to have the same hash code?

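The hash code is used when you put objects in a HashSet or use them as keys in a Dictionary. These collections first use the hash code to find the right bucket and only then call Equals on the candidates in it. If you don't override GetHashCode, the default implementation is used. For a class, it is based on the identity of the object rather than on the values of its fields, so two instances that you consider equal will almost always get different hash codes and be treated as different by the HashSet or Dictionary. This breaks the use of HashSets and Dictionaries with your objects, and you do not want that.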
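To make the problem concrete, here is a minimal sketch of a class that overrides Equals but not GetHashCode. The class name MoneyWithoutHash, its constructor, and the demo class are assumptions made for illustration; only the Equals logic comes from the post.

```csharp
using System;
using System.Collections.Generic;

#pragma warning disable CS0659 // Equals is overridden without GetHashCode, on purpose

// Hypothetical variant of Money that overrides Equals but not GetHashCode.
public sealed class MoneyWithoutHash
{
    private readonly decimal _amount;
    private readonly string _currency;

    // The constructor is an assumption; the post does not show one.
    public MoneyWithoutHash(decimal amount, string currency)
    {
        _amount = amount;
        _currency = currency;
    }

    public override bool Equals(object obj)
    {
        if (obj is MoneyWithoutHash other)
        {
            return _currency.Equals(other._currency, StringComparison.Ordinal)
                && _amount.Equals(other._amount);
        }
        return false;
    }
}

#pragma warning restore CS0659

public static class MissingHashCodeDemo
{
    public static void Main()
    {
        var a = new MoneyWithoutHash(10m, "SEK");
        var b = new MoneyWithoutHash(10m, "SEK");

        // Equals compares the fields, so this prints True.
        Console.WriteLine(a.Equals(b));

        // The set looks b up by its default, identity-based hash code,
        // so this is expected to print False even though a.Equals(b) is true.
        var set = new HashSet<MoneyWithoutHash> { a };
        Console.WriteLine(set.Contains(b));
    }
}
```

The second result is not strictly guaranteed, since two different objects could in principle receive the same default hash code, but in practice the lookup fails.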

However, the rule only says that equal objects must have the same hash code; it does not say that unequal objects must have different ones. A quick fix is therefore to implement a GetHashCode method that always returns the same integer value. This is actually a valid implementation, but it means that every instance of your class collides with every other one. If the instances are used as keys in a Dictionary, they all end up in the same bucket, and insertion and lookup degrade from near constant time to a linear search through the stored keys.
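The constant hash code can be sketched as follows. The class name MoneyConstantHash, the constructor, and the EqualsCalls counter are assumptions added for illustration; the counter is only there to make the cost of the collisions visible.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical variant of Money whose GetHashCode always returns the same value.
public sealed class MoneyConstantHash
{
    // Counts Equals calls so the cost of the collisions becomes visible.
    public static int EqualsCalls;

    private readonly decimal _amount;
    private readonly string _currency;

    public MoneyConstantHash(decimal amount, string currency)
    {
        _amount = amount;
        _currency = currency;
    }

    public override bool Equals(object obj)
    {
        EqualsCalls++;
        if (obj is MoneyConstantHash other)
        {
            return _currency.Equals(other._currency, StringComparison.Ordinal)
                && _amount.Equals(other._amount);
        }
        return false;
    }

    // Valid, because equal objects trivially share the hash code,
    // but every instance collides with every other instance.
    public override int GetHashCode()
    {
        return 1;
    }
}

public static class ConstantHashDemo
{
    public static void Main()
    {
        var set = new HashSet<MoneyConstantHash>();
        for (var i = 0; i < 1000; i++)
        {
            set.Add(new MoneyConstantHash(i, "SEK"));
        }

        // Lookup by a separate but equal instance works: prints True.
        Console.WriteLine(set.Contains(new MoneyConstantHash(500m, "SEK")));

        // Prints the number of Equals calls. Each Add has to be compared
        // against entries that are already stored under the same hash code.
        Console.WriteLine(MoneyConstantHash.EqualsCalls);
    }
}
```

The set behaves correctly, so the implementation is legal. The counter shows why it is still a poor choice: the number of comparisons grows much faster than the number of items.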

The best thing you can do is to use the same data that you use to check if the two instances of the class are equal to also calculate the hash code. For the Money class the GetHashCode method can look like this:

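public override int GetHashCode()
{
  // unchecked: the sum may overflow. That is fine for a hash code, but it
  // would throw if the project is compiled with overflow checking enabled.
  unchecked
  {
    return _currency.GetHashCode() + _amount.GetHashCode();
  }
}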
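As an alternative to adding the two hash codes by hand, the framework's System.HashCode.Combine helper can mix them for you. The sketch below assumes a target framework that ships System.HashCode (for example .NET Core 2.1 or later). The constructor and the demo class are also assumptions, as the post does not show them.

```csharp
using System;
using System.Collections.Generic;

public sealed class Money
{
    private readonly decimal _amount;
    private readonly string _currency;

    // The constructor is an assumption; the post does not show one.
    public Money(decimal amount, string currency)
    {
        _amount = amount;
        _currency = currency;
    }

    public override bool Equals(object obj)
    {
        if (obj is Money other)
        {
            return _currency.Equals(other._currency, StringComparison.Ordinal)
                && _amount.Equals(other._amount);
        }
        return false;
    }

    public override int GetHashCode()
    {
        // Mixes the hash codes of the same two fields that Equals compares.
        return HashCode.Combine(_currency, _amount);
    }
}

public static class CombinedHashDemo
{
    public static void Main()
    {
        var prices = new Dictionary<Money, string>
        {
            [new Money(10m, "SEK")] = "coffee"
        };

        // A separate but equal instance finds the entry: prints coffee.
        Console.WriteLine(prices[new Money(10m, "SEK")]);

        // An unequal instance does not: prints False.
        Console.WriteLine(prices.ContainsKey(new Money(10m, "EUR")));
    }
}
```

Because both Equals and GetHashCode are derived from _currency and _amount, equal instances always produce the same hash code within a run of the program, while unequal instances are spread across buckets.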

It is worth putting some effort into your hash code method!
