How do you hash 4 doubles into a size_t?

Question

I have bounding boxes the key type.

Box {
  double mins[2];
  double maxs[2];
}

And I want to have Box as the key type in the D programming language, so I have to implement:

size_t toHash() const @safe pure nothrow {
   size_t hash;
   for(size_t k=0; k < 2; k++) {
       // do something here
   }
   return hash;
}

Should I have a linked list on the output of the associative array and search through the list of there's a collision? Should I find some reasonable bound of double that is in my application then come up with some formula by shifting?

Not sure what to do here.

score 3 · Answer 1 · edited May 23 '17 at 12:40

Hashing makes only sense in conjunction with an equality comparison. But if types contain double values like in your example, any sensible equality comparison should work with an "epsilon", something like

   bool Equal(Box b1,Box b2, double epsilon)
   {
      return fabs(b1.mins[0]-b2.mins[0])<epsilon &&
             fabs(b1.mins[1]-b2.mins[1])<epsilon &&
             fabs(b1.maxs[0]-b2.maxs[0])<epsilon &&
             fabs(b1.maxs[1]-b2.maxs[1])<epsilon;
   }

However, this makes hashing pointless, because now "equal" boxes according to this comparison are not guaranteed to produce the same hash code if they are just "almost" equal (for any sensible hash function, as long as you do not implement the hashing function trivially by returning always 0).

Hashing something like double coordinates starts only making sense when you can map them to a discrete range, which means you have some kind of mapping from double to int. And as soon as you have integer coordinates, it should be easy to create a hash value for them (for example, like shown here).

score 2 · Accepted Answer · edited May 23 '17 at 12:40

If you can generate a guaranteed unique hash, then do so. However this means your box coordinates must be relatively small (eg if a box can be between 0 and 1, in increments of 0.1 then you have 10 steps for each co-ordinate, making a possible 10^4 unique values, assuming no 2 boxes can overlap)

In most cases you will not be able to guarantee a unique hash, so generate what you can and understand your hashes may collide. If this case, it depends what you're trying to do - I think you'd want to store these clashing hashes in a list similar collection and when reading a box, iterate through all clashes to find the exact one you wanted. The point is that you can quickly skip nearly all the un-needed boxes to get to the one you wanted.

SO has a good explanation of what hash tables are for and how they're used.

In your case, you might want to consider a single corner to identify your boxes if they don't overlay each other, the size of the box is irrelevant in such a case. Whatever you do, try to make calculating the hash as fast as possible as it will be calculated often.

How do you hash 4 doubles into a size_t?

2 Answers2