|
May
25
2007
Never let a tree do a hash table’s jobPosted by fli in FreeBSD, tags: FreeBSD, Summer of Code 2007…or something like that. I’ve been toying and experimenting with different data structures to represent the domain name cache in mdnsd. These are the structures I’ve seriously considered
Very early on I also considered skip lists, but a test showed that it didn’t beat the splay tree, so it got ditched. Radix tree were also ditched as I believe it will be quite hard to make it effective for such large keys. I measured the performance of the three remaining structures using 1000000 random domain names, the numbers are an average of 10 runs. Code compiled with -O2, tests performed on a Pentium M 1.7Ghz. “add” refers to the time it took to insert 1000000 entries, and “find” refers to the time it took to find those 1000000 entries.
The reason the chained hash table has such a bad add time is mainly because of malloc() call overhead in the hash table initiation for each level, there is code to reduce malloc overhead (pre-allocating a large chunk and splitting it) but it’s not enough. The hash table is self-growing and will double in size when any bucket hits a threshold number of collisions, experiments showed that up to about 10 collisions per bucket was reasonable. The tests above that included a hash table had a maximum hash table size of 131072 buckets, obviously by increasing the max size (trading memory for speed) you get better performance. My initial doubts that a hash table would scale bad were not true, 1000000 entries is A LOT more than what the mdnsd responder will ever cache in practice, and because it’s self-growing it won’t waste a lot of memory. I guess I’ll use a hash table then.
2 Responses to “Never let a tree do a hash table’s job”
Leave a Reply
|
Entries (RSS)
What’s wrong with Patricia trees?
Probably nothing, but my experience with them is about null. Would be interesting to see how they perform.
Do you have any document pointers?
Wikipedia says they are the same as radix trees but for generic input, will I still need to work on the bit level? For example, how many nodes would a 255 byte long key need?