To Float or Not to float, The noob's question

TJHJ · 2008-03-28 19:36

So I've been thinking about this and have decided that instead of pondering aimlessly I would seek y'all's advice again. Thanks as always.

Is float really that much slower?

For example, to really take Var1 * .9:

PusdoNumber := ((Var1 *1000)/900) 
Actualresult := Pusdonumber/1000

So if I am reading the assembly instructions correctly, each operation takes 4 clock cycles. So to read Var1, multiply by 1000, divide by 900, store, read again, and divide by 1000 would take a total of 24 clock cycles: 6 operations, 4 cycles each (I think, not counting any influence the hub would have on it).

Ok versus

Con
Con1 = .9

Obj
Flo : "FloatMath"  '' I'm not on my Spin computer, so I can't remember the object name offhand, sorry

Pub Main
  ActualResult := Flo.FloatMultiply(Var1, Con1)

Is it really that much slower? It seems faster operation-wise. Maybe I misread, but I count only 5 operations, including storing the variable, if I am reading the float object correctly. So this only took 20 clock cycles.

The other thought is that the packing/unpacking is where the extra time comes from in float-based operations. So if the constant were packed once and the float operation used later with that same packed value, it would actually be as fast, if not faster.

Such that


Flo.Pack(Con1)

Do whatever


Repeat
  ActualResult := Flo.FloatMultiply(Var1, Con1)

  Do something or other with the result.

Is this still slower, or almost negligible at this point, because it doesn't have to be packed each time? Maybe I'm missing a large chunk of this whole float idea, but I'm an engineering student; I like decimals.

It almost seems that the key factor would be whether the cog has to access the hub. If so, then in either case the addition of up to 15 cycles destroys any speed advantage that would have been gained? I'm having a hard time figuring out what takes "hub access" and what doesn't. So if that's a horrible question, don't worry about answering it, just say it's evil or something.

Also, I see that Parallax sells a floating-point coprocessor, but I can't seem to find any documentation on using it with the Prop chip. I'm thinking this would be kind of dumb, because I could just use an open cog to perform the float calculations. But let's assume I am stretching the Prop chip to the max, and all the cogs are full and occupied (most likely because I am an inefficient coder. :( )
Can the Prop chip work with the floating-point coprocessor? Would this even be advantageous?

Thank You for your time and help

TJ

stevenmess2004 · 2008-03-28 22:17

It's not quite that simple. Spin is an interpreted language that is very much stack based. So for just this code

something:=Var1 *1000

There would be something like
PUSH Var1
PUSH 1000
MUL
POP something
This isn't the actual result, but it should give you an idea. Now, because it is being run in an interpreter and the op-codes are stored in the hub, each instruction is going to need at least one hub access. Then there is also the time to decode and execute the instructions. So each bit of Spin code takes around 1000 cycles on average to perform.
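
To make the stack idea concrete, here is a toy stack machine in Python (not Spin). The opcode names are made up for illustration and are not real Spin bytecodes; the fetch counter simply stands in for "one hub read per opcode" as described above.

```python
# Toy stack machine: a rough model of how an interpreter might run
# "something := Var1 * 1000". Opcode names are illustrative only.

def run(program, variables):
    """Execute (opcode, operand) pairs; return the variables and fetch count."""
    stack = []
    fetches = 0  # each opcode fetch stands in for one hub access
    for opcode, operand in program:
        fetches += 1
        if opcode == "PUSH_VAR":
            stack.append(variables[operand])
        elif opcode == "PUSH_CONST":
            stack.append(operand)
        elif opcode == "MUL":
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        elif opcode == "POP_VAR":
            variables[operand] = stack.pop()
        else:
            raise ValueError(f"unknown opcode {opcode}")
    return variables, fetches

program = [
    ("PUSH_VAR", "Var1"),
    ("PUSH_CONST", 1000),
    ("MUL", None),
    ("POP_VAR", "something"),
]

result, fetches = run(program, {"Var1": 7})
print(result["something"], fetches)
```

Even this one-line assignment costs four fetch, decode, and execute rounds, which is why counting 4 clocks per operation underestimates Spin by a wide margin.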

Why are you multiplying Var1 by 1000/900? Did you mean 900/1000? The problem you run into with this is that this is done using integer maths. So if you started with 10 the result would be 9 which is what you would expect. However, if you tried this with 1 you would end up with 0!
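
A quick way to see the difference is to model the integer math in Python. This is only a sketch: it assumes the divide truncates toward zero, and the function names are invented for the comparison.

```python
def trunc_div(a, b):
    """Integer division that truncates toward zero (assumed Spin behaviour)."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q

def as_posted(var1):
    """Model of (Var1 * 1000) / 900 followed by / 1000."""
    pseudo = trunc_div(var1 * 1000, 900)
    return trunc_div(pseudo, 1000)

def scaled_by_900(var1):
    """Model of (Var1 * 900) / 1000, i.e. Var1 * 0.9 truncated."""
    return trunc_div(var1 * 900, 1000)

for var1 in (1, 10, 1000):
    print(var1, as_posted(var1), scaled_by_900(var1))
```

With 1000 as the input, the posted ordering collapses to 1 while the 900/1000 ordering gives 900; with 1 as the input both give 0, which is the truncation problem described above.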

To multiply the floating point numbers you must first push all the arguments onto the stack, call the float object, unpack the two arguments, multiply them together, add the exponents, fix the result so that it is the right position, repack them, return to the main function and pop the result from the stack.

This is obviously a lot slower than your earlier example.

Do you need floating point? Don't forget that you can use a fixed point system and things will be a lot easier and quicker.
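
As a rough illustration of a fixed point system, here is a Python sketch of multiplying by 0.9 with integers only. The 16.16 format and all names are assumptions chosen for the example, not something taken from a Propeller object.

```python
# Fixed-point sketch: multiply by 0.9 using only integer operations.
FRACTION_BITS = 16                 # hypothetical 16.16 format
ONE = 1 << FRACTION_BITS           # 65536 represents 1.0
POINT_NINE = round(0.9 * ONE)      # constant computed once, ahead of time

def scale_by_point_nine(value):
    """Return value * 0.9 rounded to the nearest integer."""
    # Adding half of ONE before the shift rounds instead of truncating.
    return (value * POINT_NINE + ONE // 2) >> FRACTION_BITS

for sample in (10, 1000, 12_000):
    print(sample, scale_by_point_nine(sample))
```

On the Propeller the intermediate product has to fit in a 32-bit long, so with this particular scale the input would need to stay below roughly 36,000. Spin's `**` operator (multiply, return high 32 bits) may be a way around that limit, but that is untested here.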

As for the co-processor, I think that the floating point object is nearly as fast as the co-processor, so there is not much point in using it.

Hope this helps.

Mike Green · 2008-03-28 22:20

The floating point library's documentation gives the timing for the operations. You should read it.

TJHJ · 2008-03-29 01:13

Thanks again. Someday I'll stop asking dumb questions, I swear, Mike :).

stevenmess2004 said...

Why are you multiplying Var1 by 1000/900? Did you mean 900/1000? The problem you run into with this is that this is done using integer maths. So if you started with 10 the result would be 9 which is what you would expect. However, if you tried this with 1 you would end up with 0!

It was meant to follow my example. I guess I should have specified what the first number is, but the idea is to take a number and multiply it by .9, e.g. 1000*.9 = 900, which is what I think it would return.

So what I am really doing is measuring the time on a crank sensor, but it is not a traditional one.
It is 163.41 (deg) high, notch 16.59 (deg), repeat.

The engine's TDC arrives at the start of the first notch. Being a two-crank, square, two-stroke, the engine will fire 2x each revolution, a dead fire in two cylinders. Numbering them, crank 1 has cylinders 1,2 and crank 2 has cylinders 3,4, so the fire order is 1,4 - 2,3. But in the original setup it fires both coils (2 plugs per coil) at the same time. So this helps me keep it simple.

So I am using the BS2 function Pulsin, looking for the high time, which is .907833 of the half revolution (163.41/180). So normally I would take measured time/.907833 to determine the time per 1/2 revolution. Convert this to a traditional per-revolution number by multiplying by 2. Then scale by 1/1_000_000 of a second (microseconds to seconds).

So it looks like this

HighTime := Bs2.Pulsin(Crankin, 1)
Cranktime := (((Hightime*100_000)/90783)*2)
RPM := 1_000_000/Cranktime

But all I really needed to do was take hightime/.907833; no float = my problems.
I have it split up right now because it does funny things with the math. My best guess is that I am going over the limit of a long variable, trying to maintain too much accuracy.
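
The guess about the long limit can be checked with a small Python model. This is a sketch under assumptions: longs are treated as signed 32-bit values that wrap silently, division truncates toward zero, and HighTime is in whatever units Pulsin returns. The helper names are invented.

```python
INT32_MAX = 2**31 - 1

def wrap32(value):
    """Wrap an arbitrary integer to a signed 32-bit long."""
    value &= 0xFFFF_FFFF
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value

def trunc_div(a, b):
    """Integer division that truncates toward zero."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q

def crank_time_as_posted(high_time):
    """Model of Cranktime := ((Hightime*100_000)/90783)*2 with wraparound."""
    product = wrap32(high_time * 100_000)
    return wrap32(trunc_div(product, 90_783) * 2)

largest_safe = INT32_MAX // 100_000   # biggest HighTime before the multiply wraps
print("largest safe HighTime:", largest_safe)
for high_time in (2_270, largest_safe, largest_safe + 1):
    print(high_time, crank_time_as_posted(high_time))
```

In this model the multiply by 100_000 wraps negative as soon as HighTime exceeds 21,474, so slow cranking speeds would give nonsense while high RPM values look fine. A smaller scale factor, or working from HighTime directly, would push that limit out.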

So then the RPM gets dumped into another cog that looks up the timing and figures out how long the next delay is from the start of the high part of the sensor.

So in another cog

it looks up the degrees of advance from a pre-made table that I have already scaled by 10^6 and divided by 180 (deg)

RNew := RPM/1000 '' Truncate it so it looks up in the table correctly
*Get lookup value for M and B for the correct RPM.* '' Using y = mx + b; I don't even want to think about quadratics.

Advance := M*RPM+B
Waittime := (Cranktime*1_000_000)/Advance

Even this is getting to be a headache. I am trying to make it adjustable from an LCD screen, so doing the calcs in Excel and then scaling it all up and down is driving me nuts.

Then another cog is waiting for the next TDC to pass by to start that timing.

Repeat
  Waitpeq(|< Crankin, |< Crankin, 0)
  Waitcnt(Waittime + Cnt)
  Outa[FirePlug]
  Waitcnt(100+ cnt)

So I feel like I'm doing a whole bunch of operations for no reason.

Now comes the concern about time. The motor's maximum RPM is 13,000,
but for simplicity's sake let's say 12,000.
The time for 1 revolution at 12,000 rpm is 1/200th of a second.
So in order to stay only 1/2 of a revolution behind, ideally it would compute the new delay in the time the sensor reads the gap in the crank wheel.
So each revolution at 12,000 rpm takes 5 milliseconds, 5000 microseconds.
So the time the gap is there before it needs to start counting from TDC is very short:
(16.59/360)*(5000) = 230 microseconds

Is this a long time? I can't seem to tell. Would it be possible to do float in that time, 2x, or is it even possible to do pseudo-math in this short a time? There are some other solutions, like updating every complete revolution, giving me a whole extra 5 milliseconds to work with....
Another thought is having a fixed delay plus the delay to get to the new timing delay, giving more calculation time, but this adds more code, which means more time to process. I'm just not sure at this point.

Honestly, I can't seem to figure out what fast and slow mean in the computer time world versus something I can relate to. So here is where I am at. Any advice or suggestions are greatly appreciated, as always.

Thanks again
TJ

Post Edited (TJHJ) : 3/29/2008 1:18:29 AM GMT

stevenmess2004 · 2008-03-29 01:59

I would change your table so that it uses HighTime rather than RPM. This should be easier and save a lot of time.

As for speed: people normally run the Propeller with an 80MHz clock. However, most assembly instructions take 4 clock cycles, which means effectively 20MHz or 50ns/instruction. So you can fit 100,000 assembly instructions in the time it takes to do one rotation (5 ms at 12,000 rpm). But Spin is up to 200 times slower, so that will reduce to about 500 Spin instructions. So there shouldn't be a problem with getting the work done, but it may have to be done in assembly.
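
The same budget arithmetic can be written out as a short Python sketch. It uses the figures quoted in this post (80MHz, 4 clocks per assembly instruction, Spin up to 200 times slower) and takes the notch as 16.59 degrees out of a full 360 degree revolution; the function name is invented.

```python
CLOCK_HZ = 80_000_000
CLOCKS_PER_PASM_INSTRUCTION = 4
SPIN_SLOWDOWN = 200   # "up to 200 times slower", taken from the post

def window_budget(rpm, degrees):
    """Return (microseconds, PASM instructions, Spin instructions) for an arc."""
    revolution_us = 60_000_000 / rpm
    window_us = revolution_us * degrees / 360
    pasm = window_us * (CLOCK_HZ / 1_000_000) / CLOCKS_PER_PASM_INSTRUCTION
    spin = pasm / SPIN_SLOWDOWN
    return window_us, pasm, spin

for label, degrees in (("full revolution", 360), ("notch", 16.59)):
    us, pasm, spin = window_budget(12_000, degrees)
    print(f"{label}: {us:.1f} us, ~{pasm:.0f} PASM, ~{spin:.0f} Spin instructions")
```

On these numbers the notch alone leaves room for only a couple of dozen Spin instructions at 12,000 rpm, while a full revolution leaves about 500, which is one argument for updating the delay once per revolution or moving the math into assembly.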

deSilva · 2008-03-29 10:36

FLOATMATH should be used only when time is of no concern at all; in all other cases use FLOAT32!

FLOAT32 operations need around 10 to 20 µs.
A SPIN instruction takes around the same time.

This however changes dramatically when comparing it to PASM performance.

These are the data from a "natural benchmark" I did a long time ago:

FLOATMATH:                 16 minutes
FLOAT32:                    1 minute 50 seconds
scaled numbers w/ SPIN:    40 seconds
scaled numbers PASM:        3 seconds

----
Edit: The main reason why SPIN compares so favourably is that multiplication and division are always done in machine code,
thus there is no real difference from a PASM program.
IOW the PASM performance depends highly on the addition/multiplication ratio,
whereas the SPIN performance does not.
I used 16x16 multiplication in PASM, whereas SPIN unavoidably does 32x32.

Post Edited (deSilva) : 3/29/2008 10:57:34 AM GMT

heater · 2008-03-29 19:03

Many years ago I was writing code for radar systems at the Marconi Radar company.
My boss there used to say "If you think you have to use floating point then you don't understand the problem"


TJHJ · 2008-03-30 02:00

Thanks for the idea of using hightime instead of rpm to get that calculation rolling faster.

And as for the info on Float32 and such, exactly what I needed. Thanks for all the help.

TJHJ · 2008-03-30 02:16

OK, dumb question:
If I were to use the result of an FMul from the Float32 object, would I need to truncate it and/or perform any additional functions in order to use it in a WaitCnt(Fmulresult + cnt)? Or will Spin just ignore anything past the decimal?

Thanks again all.

stevenmess2004 · 2008-03-30 02:26

You will need to convert it from a floating point number back to an integer. There is a function in Float32 for doing this. I think it is called unpack. The reason is that floating point numbers are stored differently to normal numbers. I think this is all described in the documentation or object somewhere.
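
To show why the raw float cannot be handed to WaitCnt, here is a Python sketch that assumes the float objects store values in the standard IEEE 754 single-precision layout. The helper names are invented for the example.

```python
import struct

def float_to_bits(value):
    """Return the 32-bit pattern of a single-precision float as an integer."""
    return struct.unpack("<I", struct.pack("<f", value))[0]

def bits_to_float(bits):
    """Interpret a 32-bit integer pattern as a single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

delay = 900.0
raw = float_to_bits(delay)
print(hex(raw), raw)             # what an integer operation would see
print(int(bits_to_float(raw)))   # what an explicit conversion recovers
```

Read as an integer, the float 900.0 looks like a number above one billion, so adding it to cnt would give a wildly wrong delay. An explicit conversion back to an integer is needed first.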

TJHJ · 2008-03-30 03:46

I might be having a special moment, I hope not, but I can't seem to find it listed. Unless it is assumed? It is used in FloatMath, but not in Float32 under Spin, only under assembly. So do I have to use FloatMath for packing/unpacking?

Is there no Spin command for this? Also, what does FFloat do?

Thanks again.
TJ

stevenmess2004 · 2008-03-30 04:49

Sorry, I got the wrong function. The one you want is FInteger in FloatMath. This is a private method, so if you want to use it you will need to make it public. If you are using Float32, just copy FInteger and Unpack into your code.

FFloat converts an integer to a floating point number. There is a mistake in the PDF that goes with the float objects. On page 8 this

long FMulCmd+@y ' fnumA = fnumA + y

should be

long FMulCmd+@y ' fnumA = fnumA * y

deSilva · 2008-03-31 12:35

What you need is
FTRUNC
and/or
FROUND
Obvious, what they do
