So you need to do some signal processing and you know you need to do a fourier transform and have made a stab at it by copying and modifying a cookbook FFT and you might even have gotten it to work but have no firm idea what the heck it is doing.    Ok, Ok so here I'm assuming you had the math in school and you are not stupid but it was awhile ago and most importantly you don't want to look lost when you need to defend what you did or god forbid a bug crops up.    Ladies and gentlemen we are here to help.

So first lets talk about our data, most importantly how it got here and how we are representing that information.  First from algebra we have graphs with an X and a Y axis and we know that any point on the graph can be represented by an X value and a Y value which we write  point X,Y.  And all the electrical engineer here will remember we can also write that same point as ( x+jy)  we also remember pythagorm theorum for converting from cartesian coordinates to Polar coordinates  at  this point it is just a reminder and notice that you may need to refresh your memory here.

Now it is possible to convert naturally occurring signals into electrical voltages and currents and graph the voltage levels against time and produce nice graphs but real signal graphs are really messy and don't appear to show any periodocity but if we look at and plot the voltage levels quick enough we can get a really good approximation of the original signal.  So then our data consists of these samples of voltage or current or pressure or whatever in numerical form taken at regular time intervals.  And this process is called digital sampling.   Real world signals are way too “messy' for us to work with in this paper so we will use some pure sine waves but all the while realize that it represents some real signal.


<<<<
Here I think we should introduce the PDF that Bean has put together for us and hopefully will modify to fit the final form we end up with.

And next we bring in Heaters description of the butterfly which I have paraphrased below

THANK YOU BEAN & HEATER  your work is critical to my aha insight and I hope others
>>>>>

A few decades ago I first encountered the FAST Fourier Transform.  It was written in basic without any explanation of how it worked.  On reading the listing it started off with a weird thing.  One by one it took all the elements in the input array and reorganized them in the output array by taking the binary element index and  reversing all the bits forming the index for the output array.  Such that the element at $000 was written to $0000 and $0001 was written to $1000 and the element at $0100 went to $0100 etc etc.   The rest of the program then operated on that reordered output array.   Why I wondered and the rest of the code was of no help so the mystery continued.

By way of explanation , lets now perform a silly little experiment and see what happens.
Imagine we want to add up 8 numbers so, s1 … s6, s7 and we have a function called sum() 
       we could write sum(s0,s1,s2,s3,s4,s5,s6,s7)
in our experiment however we will do some weird things,  lets sum all the even numbers and all the odd numbers separately
      result = sum(s0,s2,s4,s6) + sum(s1,s3,s5,s7)
Obviously this does not change the overall result so lets continue with summing the numbers whose index is even and the numbers whose index is odd in each of the two previous sums, now we have
     result = sum(s0,s4) + sum(s2,s6) + sum(s1,s3) + sum (s3,s7)
and oh what the heck because we are on a roll lets do it again ending up with
    result = sum(s0)+sum(s4)+sum(s2)+sum(s6)sum(s1)+sum(s5)+sum(s3)+sum(s9)
Now our sum does nothing except return its argument and for the final part of our experiment
we rewrite what we ended up with result=s0+s4+s2+s6+s1+s5+s3+s7 and use our sum function again
    result=sum(s0,s4,s2,s6,s1,s5,s3,s7)
Se we are back to where we began except the arguments to sum are reordered kinda strange.
 So now lets examine this reordering in both decimal and binary
     Initial order                                             Final order
Decimal        Binary                              decimal      Binary
0                    000                                    0                000
1                    001                                   4                100
2                    010                                   2                 010
3                    011                                   6                 110
4                   100                                    1                 001
5                   101                                    5                 101
6                   110                                    3                 011
7                   111                                    7                 111

And at this point we notice the binary representation of the final indices are simply the bit reversal of the starting indices.    OK so you noticed and you also remember how the basic program started so there must be something afoot here. So why did we do this in the first place?  Before we continue here a little aside.  This reordering algorithm (when we finally get there) is called the butterfly because if you trace lines from the starting index to the final index the resulting graphic kinda sorta resembles a butterfly flying left to right  (kinda like an X)  So now we continue.

Now look at how a program would compute the FT you will find a nested deep inside a loop inside a another loop is where the multiplication by sin and cos as described above occurs (another aside: this inner loop function is sometimes referred to as the twiddle factors, we shall call these inner loop operations the “guts”.) and for most practical sample sizes results in many thousands (millions) of operations.   As you might expect slows down with more operations.  The bit reversal (aka butterfly) is all about reducing the inner loop operations.   Heres how it works.


Looking at the guts in the inner loop
for 8 samples there are 8 * 8 guts to perform = 64 guts.
For 1024 operations there are 1024 * 1024 = 1048576 guts

Turns out that for the FT you can split the job in half a operate on each half and then add the results using the evens and odds much as we did in our experiment above.  So what you say.  Well!!!
If you start with 16 samples you have 16*16= 256 guts
if you split them in half you have (8*8) + (8*8) = 128 guts
and half again is (4*4) + (4*4) + (4*4) + (4*4)  = 64 guts
and so on and so on.

By splitting the thing up we get rid of the N squared execution time problem!!!
Now we see what all the bit reversal stuff was all about and why the butterfly algorithm is so important.

If we put this all together we have the FFT as an iterative process.
As it turns out recursion does exactly what the butterfly does in a slightly different form each recursive call splits the dataset into smaller and smaller sets until we are left with a single data point at which time the returns start summing up the pieces as we climb back out of the call stack ending up with the finished transform.  The key of course is to select the right arguments for each call and once again the butterfly is our friend.