To Float or Not to float, The noob's question
TJHJ
Posts: 243
So I've been thinking about this and have decided that, instead of pondering aimlessly, I would seek y'all's advice again. Thanks as always.
Is float really that much slower?
Per Example
Really take Var1*.9
PusdoNumber := ((Var1 * 1000) / 900)
ActualResult := PusdoNumber / 1000
So if I am reading the assembly instructions correctly, each operation takes 4 clock cycles. So to read Var1, multiply by 1000, divide by 900, store, read again, and divide by 1000 would take a total of 24 clock cycles: 6 operations, 4 cycles each. (I think, not counting any influence that the hub would have on it.)
Ok versus
Con
  Con1 = 0.9
Obj
  Flo : "Floatmath"   '' I'm not on my Spin computer, so I can't remember the object name offhand, sorry
Pub Main
  ActualResult := Flo.FloatMultiply(Var1, Con1)
Is it really that much slower? It seems faster operation-wise. Maybe I misread, but I only performed 5 operations, counting the store of the variable, if I am reading the float object correctly. So this only took 20 clock cycles.
The other thought is that the packing/unpacking is where the extra time comes from in float-based operations. So if the constant was packed once and the float operation was used later with the same packed value, it would actually be as fast, if not faster.
Such that
Flo.Pack(Con1)
' Do whatever
Repeat
  ActualResult := Flo.FloatMultiply(Var1, Con1)
  ' Do some whatever with the result.
Is this still slower, or almost negligible at this point, because it doesn't have to be packed each time? Maybe I'm missing a large chunk of this whole float idea, but I'm an engineering student, I like decimals.
It almost seems that the key factor would be whether the cog has to access the hub. If so, then in either case the addition of up to 15 cycles destroys any speed advantage that would have been gained? I'm having a hard time figuring out what takes "hub access" and what doesn't. So if that's a horrible question, don't worry about answering it, just say it's evil or something.
Also, I see that Parallax sells a floating point co-processor, but I can't seem to find any documentation on using it with the Prop chip. I'm thinking this would be kinda dumb, because I could just use an open cog to perform the float calculations. But let's assume I am stretching the Prop chip to the max, and all the cogs are full and occupied. (Most likely because I am an inefficient coder. :( )
Can the Prop Chip account for the Floating Point Processor? Would this even be advantageous?
Thank You for your time and help
TJ
Comments
There would be something like
PUSH Var1
PUSH 1000
MUL
POP something
This isn't the actual result, but just to give you an idea. Now, because it is being run in an interpreter and the op-codes are stored in the hub, each instruction is going to need at least one hub access. Then there is also the time to decode and execute the instructions. So each bit of Spin code takes around 1000 cycles on average to perform.
Why are you multiplying Var1 by 1000/900? Did you mean 900/1000? The problem you run into with this is that this is done using integer maths. So if you started with 10 the result would be 9 which is what you would expect. However, if you tried this with 1 you would end up with 0!
To multiply the floating point numbers you must first push all the arguments onto the stack, call the float object, unpack the two arguments, multiply them together, add the exponents, fix the result so that it is the right position, repack them, return to the main function and pop the result from the stack.
This is obviously a lot slower than your earlier example.
Do you need floating point? Don't forget that you can use a fixed point system and things will be a lot easier and quicker.
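To make the fixed-point suggestion concrete, here is a small Python model of the integer arithmetic. It is Python rather than Spin so that it can be run anywhere, and `//` stands in for Spin's integer division (the two agree for non-negative values). The function names and the scale factor of 1000 are illustrative assumptions, not part of any float object.

```python
SCALE = 1000  # assumed fixed-point scale: the value 1.000 is stored as 1000


def times_point_nine_naive(value):
    """Multiply by 0.9 with plain integers: multiply first, then divide."""
    return value * 9 // 10


def times_point_nine_fixed(value):
    """Same multiplication, but the result stays scaled by SCALE so the
    fractional part survives. Divide by SCALE only when displaying."""
    return value * SCALE * 9 // 10


def format_fixed(scaled):
    """Render a scaled value with three decimal places."""
    return f"{scaled // SCALE}.{scaled % SCALE:03d}"


print(times_point_nine_naive(10))               # whole-number input works
print(times_point_nine_naive(1))                # fraction is lost
print(format_fixed(times_point_nine_fixed(1)))  # fraction kept in scaled form
```

The design choice is to carry the scale through the whole calculation and only drop it at the very end, which is what keeps small inputs from collapsing to 0.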
As for the co-processor I think that the floating point object is nearly as fast as the co-processor so there is not much point in using it.
Hope this helps.
It was meant to follow my example. I guess I should have specified what the 1st number is but to take a number and multiply by .9, eg, 1000*.9 = 900. Which is what I think I would return.
So what I am really doing, is measuring the time on a Crank sensor, But it is not a traditional one.
It is 163.41 (Deg) High, Notch 16.59 (deg), Repeat
The engines TDC arrives at the start of the 1st notch. Being a two crank, Square, two stroke the engine will fire 2x each revolution, A dead fire in two cylinders. So numbering Crank 1 has cylinder 1,2 and Crank 2 has cylinder 3,4, So the Fire order is 1,4- 2,3. But in the original setup it fires both coils( 2 plugs per coil) at the same time. So this helps me keep it simple.
So I am using the BS2 function Pulsin, looking for the high time, which is .907833 of the half revolution (163.41/180). So normally I would take measured time/.907833 to determine speed per 1/2 revolution. Convert this to a traditional RPM number by multiplying by 2. Then times 1/1_000_000 of a second, sec to MS.
So it looks like this
But when all I really needed to do was take hightime/.907833 but no float = my problems.
I have it split right now because it does funny things with the math. My best guess is that I am going over the size limit of a long variable, trying to maintain too much accuracy.
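One hedged guess at where the funny results come from: a signed 32-bit long tops out at 2,147,483,647, so multiplying a 10^6 scale factor by 907833 overflows well before the division happens. The Python model below assumes the high time is measured in microseconds and that the sensor pattern repeats every 180 degrees as described. All names are made up for illustration.

```python
INT32_MAX = 2**31 - 1            # largest value a signed 32-bit long can hold

HIGH_FRACTION_PPM = 907_833      # 163.41 / 180, in parts per million
US_PER_MINUTE = 60_000_000

# Fold the constants together ahead of time so every run-time value is small:
# rpm = 60e6 / (2 * high_us / 0.907833) = 27_234_990 / high_us
RPM_NUMERATOR = (US_PER_MINUTE // 2) * HIGH_FRACTION_PPM // 1_000_000


def fits_in_long(value):
    """True if value can be held in a signed 32-bit long."""
    return -INT32_MAX - 1 <= value <= INT32_MAX


def rpm_from_high_time(high_us):
    """Integer RPM from the measured high time in microseconds."""
    return RPM_NUMERATOR // high_us


print(fits_in_long(30_000_000 * HIGH_FRACTION_PPM))  # intermediate product is too big
print(fits_in_long(RPM_NUMERATOR))                   # pre-folded constant is fine
print(rpm_from_high_time(2270))                      # high time near 12,000 rpm
```

With the constant folded in advance (on a PC or in a constant expression), the run-time work is a single 32-bit divide with no scaling up and down.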
So then the RPM gets dumped into another cog that Looks up timing, and figures how long the next delay from the start of high part of the sensor.
So in another cog
it looks up the Degrees advance, from a pre made table, that I have already scaled by 10^6 and divided by 180(Deg)
Even this is getting to be a headache. I am trying to make it adjustable from an LCD screen, so doing the calcs in Excel and then scaling it all up and down is running me nuts.
Then another cog is waiting for the next TDC to pass by to start that timing.
So I feel like Im doing a whole bunch of operations for no reason.
Now comes the concern about time. The motor's maximum RPM is 13,000.
But for simplicity's sake let's say 12,000.
The time for 1 revolution at 12,000 rpm is 1/200th of a second.
So in order to stay only 1/2 of a revolution behind, ideally it would compute the new delay in the time the sensor reads the gap in the crank wheel.
So each revolution at 12,000 rpm takes 5 milliseconds, or 5000 microseconds.
So the time the gap is there before it needs to start counting from TDC is very short,
(16.59/180)*(5000)=460 microseconds
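The gap figure is worth double-checking. If the 16.59 degrees are crank degrees, the gap is 16.59/360 of a full revolution, which works out to about 230 microseconds at 12,000 rpm. Equivalently, it is 16.59/180 of the 2,500 microsecond half revolution. A short Python sketch of the arithmetic follows, with function names invented for illustration.

```python
def revolution_time_us(rpm):
    """Time for one full crank revolution, in microseconds."""
    return 60_000_000 / rpm


def angle_time_us(rpm, degrees):
    """Time the crank takes to sweep the given angle, in microseconds."""
    return revolution_time_us(rpm) * degrees / 360.0


print(revolution_time_us(12_000))    # full revolution
print(angle_time_us(12_000, 180.0))  # half revolution
print(angle_time_us(12_000, 16.59))  # the notch
```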
Is this a long time? I can't seem to tell. Would it be possible to do float in that time, 2x, or is it even possible to do pseudo-math in this short a time? There are some other solutions, like updating every complete revolution, giving me a whole extra 5 milliseconds to work with....
Another thought is having a fixed delay plus the delay to get to the new timing delay, giving more calculation time, but this adds more code, which means more time to process. I'm just not sure at this point.
Honestly, I can't seem to figure out what fast and slow mean in the computer time world versus something I can relate to. So here is where I am at. Any advice or suggestions are greatly appreciated, as always.
Thanks again
TJ
Post Edited (TJHJ) : 3/29/2008 1:18:29 AM GMT
As for speed. People normally run the propeller with an 80MHz clock. However, most assembly instructions take 4 clock cycles which means effectively 20MHz or 50ns/instruction. So you can fit 100,000 assembly instructions in the time it takes to do one rotation. But Spin is up to 200 times slower so that will reduce to about 500 spin instructions. So there shouldn't be a problem with getting the work done but it may have to be done in assembly.
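The budget above can be reproduced with a few lines of Python. The 80 MHz clock, 4 clocks per instruction, and the worst-case factor of 200 are taken from the paragraph. The function names are only for illustration.

```python
CLOCK_HZ = 80_000_000
CLOCKS_PER_INSTRUCTION = 4
SPIN_SLOWDOWN = 200  # "up to 200 times slower", so this is a pessimistic figure

# 80 MHz / 4 clocks = 20 million instructions per second = 20 per microsecond
PASM_PER_US = CLOCK_HZ // CLOCKS_PER_INSTRUCTION // 1_000_000


def pasm_budget(microseconds):
    """Assembly instructions that fit in the given time (hub waits ignored)."""
    return microseconds * PASM_PER_US


def spin_budget(microseconds):
    """Rough worst-case count of Spin instructions in the same time."""
    return pasm_budget(microseconds) // SPIN_SLOWDOWN


print(pasm_budget(5000))  # one revolution at 12,000 rpm
print(spin_budget(5000))
```

The same two functions can be called with a shorter window, such as the time the notch takes to pass, to see how much work fits there.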
FLOAT32 operations need around 10 to 20 µs
A SPIN instruction takes around the same time.
This, however, changes dramatically when comparing it to PASM performance.
These are the data from a "natural benchmark" I did a long time ago:
----
Edit: The main reason why SPIN compares so favourably is that multiplication and division are always done in machine code,
thus there is no real difference to a PASM program.
IOW, the PASM performance depends highly on the addition/multiplication ratio,
whereas the SPIN performance does not.
I used 16x16 multiplication in PASM, whereas SPIN unavoidably does 32x32.
Post Edited (deSilva) : 3/29/2008 10:57:34 AM GMT
My boss there used to say "If you think you have to use floating point then you don't understand the problem"
And as for the info on Float32 and such, it was exactly what I needed. Thanks for all the help.
If I were to use the result of a FMul from the Float32 object, Would I need to truncate it, and/or perform any additional functions in order to use it in a WaitCnt(Fmulresult +cnt)? or will spin just ignore anything past the decimal?
Thanks again all.
Is there no spin command for this? Also what does Ffloat do?
Thanks again.
TJ
FFloat converts an integer to a floating point number. There is a mistake in the PDF that goes with the float objects. On page 8 this
should be
FTRUNC and/or FROUND
Obvious, what they do
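On the WaitCnt question: Spin has no separate float type, so a float result is just a 32-bit IEEE-754 single-precision pattern sitting in a long. Used directly, those bits are read as a large integer, and the decimals are not simply dropped. The Python model below shows this. The `ftrunc_model` and `fround_model` helpers are hypothetical stand-ins for what truncation and rounding do to non-negative values. They are not the float object's code, and the exact tie-breaking of FRound is not modelled.

```python
import math
import struct


def float32_bits(x):
    """IEEE-754 single-precision bit pattern of x, as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def ftrunc_model(x):
    """Model of truncation: drop the fractional part (non-negative x)."""
    return int(x)


def fround_model(x):
    """Model of rounding to the nearest integer (non-negative x)."""
    return math.floor(x + 0.5)


print(float32_bits(900.0))   # what an integer-only operation would see
print(ftrunc_model(900.7))
print(fround_model(900.7))
```

So a float result needs to go through a truncate or round step before it is added to cnt.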