
I’ve been toying and experimenting with different data structures to represent the domain name cache in mdnsd.

These are the structures I’ve seriously considered:

  • Radix tree/trie – O(k) lookup is attractive, but with keys up to 255 bytes (UTF-8) that’s 2040 bits, and the tree would be huge. I’m not sure it would be efficient even with all the compression techniques in the book.
  • Splay tree (self-balancing binary tree) – scales well, but every key comparison is a strcmp().
  • Hash table – obviously fast, but it could scale badly as collisions become more frequent.
  • “Chained” hash table – exploit the fact that domain names are divided into labels (label1.label2.tld) and use one hash table per level.

Very early on I also considered skip lists, but a test showed they didn’t beat the splay tree, so they got ditched. The radix tree was also ditched, as I believe it would be quite hard to make it efficient for such large keys.

I measured the performance of the three remaining structures using 1,000,000 random domain names; the numbers are averages of 10 runs. The code was compiled with -O2 and the tests were run on a 1.7 GHz Pentium M. “add” is the time it took to insert 1,000,000 entries, and “find” the time it took to look up those same entries.

  • Splay tree (sys/tree.h), add 4.90300 seconds, find 3.3907 seconds
  • “Chained” hash table, add 7.62640 seconds, find 1.38780 seconds
  • Hash table, add 2.19680 seconds, find 1.04970 seconds

The chained hash table’s poor add time is mainly due to malloc() overhead when initializing a hash table for each level; there is code to reduce it (pre-allocating a large chunk and splitting it up), but it’s not enough. The hash table is self-growing: it doubles in size whenever any bucket hits a threshold number of collisions, and experiments showed that up to about 10 collisions per bucket is reasonable. The tests above that included a hash table capped it at 131072 buckets; raising that cap (trading memory for speed) obviously buys better performance.

My initial doubt that a hash table would scale badly turned out to be unfounded. 1,000,000 entries is A LOT more than the mdnsd responder will ever cache in practice, and because the table is self-growing it won’t waste much memory. I guess I’ll use a hash table then.

2 Responses to “Never let a tree do a hash table’s job”
  1. What’s wrong with Patricia trees?

  2. fli says:

    Probably nothing, but my experience with them is about nil. Would be interesting to see how they perform.
    Do you have any document pointers?

    Wikipedia says they are the same as radix trees but for generic input, but will I still need to work at the bit level? For example, how many nodes would a 255-byte key need?