Store and look up 3,000 ZStrings - possible?

Nick Mueller · 2010-11-15 07:23

So to do a binary compare, the dictionary values would have to be in order, right?

Yes, sorted. Like with any less-than-stupid search algorithm.

Hashing needs the list to be sorted too but also needs to add gaps to make hashing more effective.
Yes, binary search works by "divide and conquer". If you do a bit of google, you will find the algorithm with all the details.

Nick

Bobb Fwed · 2010-11-15 08:20

but if he has the memory to spare, its not a big deal to do an indexed storage system.If he intends to add words later on, indexing wouldn't require an entire rewrite of the dictionary. Though its probably not a huge deal, it could be convenient to add single entries on the fly.

Nick Mueller · 2010-11-15 08:34

If he intends to add words later on, indexing wouldn't require an entire rewrite of the dictionary.

Open program in editor, add word at the right place, compile and run.
As long as the data fits in RAM (depends on how big the rest of the program is), he could have the entries unsorted at first and sort them after boot in place. Makes a nice benchmark for sorting algorithms too.

Nick

Bobb Fwed · 2010-11-15 08:44

I understand that its easy. I was saying on the fly meaning without use of a computer. Entries could be added directly on the device. Of course with some sorting software and a bit of time, your method could also do this. I still contend my way is easier for him, especially since I've already written it (minus the lines of code to wrap around). It all really depends on what his needs are and what resources he'll have available if be wants to make changes.

John Kauffman · 2010-11-15 10:28

Since this is a learning exercise for me, I really appreciate both of you providing the background logic for the two approaches.

Kal_Zakkath · 2010-11-15 11:56

For some background reading check out HashTable which is basically the structure being described here.

Nick Mueller wrote:

...Hashing needs the list to be sorted too...

this is not true, hashing effectively maps a value to an array index (or EEPROM address in this case), list ordering is irrelevant.

I would say that, based on the prop's architecture you have the following situation:

Binary Search
Pros:

Garaunteed O(log n) runtime (as nick said, worst case for 3,000 entries is 12 checks)
No 'wasted' storage (Hashtables work well until they start to get too full and collisions become more likely)

Cons:

Difficult to add new items 'on-the-fly'
Input list must be sorted
Difficult if the items require more than 32-bits (i.e. longer words) due to the prop's architecture

Hashing/HashTable
Pros:

Easy to add new items at runtime
Input can be in any order
Easier to adapt for longer words
Code has already been provided (always a plus )
Best case O(1) runtime (i.e. runtime does not expand regardless of the number of items)

Cons:

Less efficient use of space (Rule of thumb: Hashtables should be less than 80% full)
Performance relies on choice of hashing algorithm
Worst case runtime O(n) (every item is searched) - this could be improved with smarter collision rules, but also gets more complicated.

Both of these methods are in wide use in computing today, and both have advantages/disadvantages - so learning one or the other will certainly be handy. Having said that, be weary of 'premature optimisation', the size of your list is small (in computing terms) and, as others have pointed out, even the standard linear search will complete in microseconds.

Store and look up 3,000 ZStrings - possible?

Comments