PASM Devision routine that works for full 32 bit devision,Help Fix this one or
TJHJ
Posts: 243
The current routine I have been using for devision in PASM is
Thanks Propeller WIKI [noparse]:)[/noparse]
The problem I have run into is that for small devision it works great, anything like 17/2, 20/3, 45/7 just some I tried. But when I try to divide 153_333_333/21_343. I get crazy results. Any suggestions for a routine that can do full 32 bit devision.
If you have a Pasm program you would be willing to share that would save me a ton of time and a headache, I cant thank you enough.
TJ
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I owe everyone here a bunch, So thanks again for answering my dumb questions.
Projects. RG500 ECU system. PropCopter. Prop CanSat. Prop Paste Gun.
Suzuki RG500 in a RGV 250 frame.
Bimota V-Due (Running on the fuel injection system)
Aprilia RS250
Post Edited (TJHJ) : 4/2/2009 5:35:05 AM GMT
mov t,#16 shl y,#15 loop cmpsub x,y wc rcl x,#1 djnz t,#loop
Thanks Propeller WIKI [noparse]:)[/noparse]
The problem I have run into is that for small devision it works great, anything like 17/2, 20/3, 45/7 just some I tried. But when I try to divide 153_333_333/21_343. I get crazy results. Any suggestions for a routine that can do full 32 bit devision.
If you have a Pasm program you would be willing to share that would save me a ton of time and a headache, I cant thank you enough.
TJ
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I owe everyone here a bunch, So thanks again for answering my dumb questions.
Projects. RG500 ECU system. PropCopter. Prop CanSat. Prop Paste Gun.
Suzuki RG500 in a RGV 250 frame.
Bimota V-Due (Running on the fuel injection system)
Aprilia RS250
Post Edited (TJHJ) : 4/2/2009 5:35:05 AM GMT
Comments
Tell us more! Would be really nice to know what you get exactly...
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Airspace V - international hangar flying!
www.airspace-v.com/ggadgets for tools & toys
I am working on a full table interploate that has to be done in a limited time. So I wrote the following program to test and see if spin would be fast enough.
So here are the results I get that are odd. It seems anything over about 20,000 in the numerator makes it blow up.
Num = 25_000
Div = 24_683
Spin took : 1600 And the result was : 1
ASM took : 208 And the result was : 20774913
Num = 153_333_333
Div = 24_683
Spin took : 1664 And the result was : 6212
ASM took : 208 And the result was : 166271044
Now intrestingly these numbers work
Num = 20_000
Div = 10_000
Spin took : 1600 And the result was : 2
ASM took : 208 And the result was : 2
Yet does not
Num = 21_000
Div = 10_000
Spin took : 1600 And the result was : 2
ASM took : 208 And the result was : 65536002
Num = 20_000
Div = 15_000
Spin took : 1600 And the result was : 1
ASM took : 208 And the result was : 327680001
Note : Smaller number seem to work just fine.
So I have been trying to read and·create a new div routine·but my asm is just not that good.
·@Brian
I am totally happy with a truncated result, but it would be nice if I could handle signed numbers. I dont think any of the results of the numbers that woud be devided would ever grow beyond 32 bits.
Thank you for the time and help to look at this,
TJ
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I owe everyone here a bunch, So thanks again for answering my dumb questions.
Projects. RG500 ECU system. PropCopter. Prop CanSat. Prop Paste Gun.
Suzuki RG500 in a RGV 250 frame.
Bimota V-Due (Running on the fuel injection system)
Aprilia RS250
Post Edited (TJHJ) : 4/2/2009 3:26:03 PM GMT
You haven't said for sure whether Spin would be fast enough for your app. If it is, I have an object in the OBEX that will help called "umath". It's entirely written in Spin, so you don't have to use up another cog, and includes a 32 * 32 / 32 "fractional" multiply that retains the 64-bit intermediate product for the divide. This may be just right for your interpolation problem. You can either use it as an object or copy the relevant sections into your own program.
-Phil
TJ
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I owe everyone here a bunch, So thanks again for answering my dumb questions.
Projects. RG500 ECU system. PropCopter. Prop CanSat. Prop Paste Gun.
Suzuki RG500 in a RGV 250 frame.
Bimota V-Due (Running on the fuel injection system)
Aprilia RS250
First off, it looks like you're not shifting far enough. You're shifting only 16 bits, where 153_333_333 is a 27-bit number.
As one of your tests, you should always be able to divide by 1, and get back the original dividend. Looks like your function won't do that.
In general, if you want to divide two general 32-bit numbers, you need a dividend register that's 64 (actually 65) bits long. Is the issue really that you want to divide 16-bit numbers, using a 32-bit dividend register? If so, 153_333_333 is too big for an input value.
Whatever method you use, and however long the input words, it's critically important that every trial subtraction end up leaving no non-zero bits at the "high end." As with decimal long division,
you have to shift the operands at the beginning, so you can guarantee that the first subtraction doesn't leave any 1-bits at the high end.
It looks like your routine doesn't do this preconditioning. Granted, it's going to take extra cycles, but it's necessary.
Years ago I wrote a 32x16 div routine for the z80. Recently, the topic came up again in comp.os.cpm. I no longer have a z80 system, so couldn't show them the z80 code, but I wrote a C version to show them the algorithm. I'm attaching it so it may help.
I see that Phil Pilgrim's ASM code has the right algorithm, but again, there needs to be that initial preconditioning.
Let me explain again, and I'll try to be as clear as I can: In that first comparison, the divisor must be aligned so that it either just fits into the top bits of the dividend, or is just barely too large. After the subtraction, the high bit of the result _MUST_ be a zero. If you're dividing x by 1, the 1 must be left justified so this is true. Otherwise, you won't get back the original dividend, and non-zero high bits end up shifted off into outer space.
Jack
Jack
-Phil
It works, but I'm not terribly enamoured with that extra sub. There has to be a more efficient way...
-Phil
Post Edited (Phil Pilgrim (PhiPi)) : 4/2/2009 6:57:13 PM GMT
The code has been lifted from one of the early parallax documents and is missing the routines header. Here is the correct code...
No wonder his answers were wrong.
Phil and only 676 cycles for div. I Think it will fit now.
Thank you all again.
TJ
EDIT Brian, That helps a whole bunch..... Thanks for finding that for me
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I owe everyone here a bunch, So thanks again for answering my dumb questions.
Projects. RG500 ECU system. PropCopter. Prop CanSat. Prop Paste Gun.
Suzuki RG500 in a RGV 250 frame.
Bimota V-Due (Running on the fuel injection system)
Aprilia RS250
FYI·if takes 208 cycles to process. Ill have to see if I can get the wiki updated.
Thanks again for finding that.
TJ
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I owe everyone here a bunch, So thanks again for answering my dumb questions.
Projects. RG500 ECU system. PropCopter. Prop CanSat. Prop Paste Gun.
Suzuki RG500 in a RGV 250 frame.
Bimota V-Due (Running on the fuel injection system)
Aprilia RS250
The new 64bit divide routine takes ~676 clocks for 64 / 32 = 32
Just wanted to point this out for anyone referencing this post in the future.
TJHJ, if you know what kind of math you're doing (aka converting something like temperature repetatively), you can use the 32 bit routine without losing much accuracy. I've done this with a few different sensors. It all depends on just how accurate you need the divide. But, at 676 cycles, it's nice to have the 64 bit divide.
-Parsko
Example
Worst case is 2^32/1 = 2^32 a 32 bit number. And the simplest cast is 2^32/2^32 = 1.
I cant think of a case that causes a result larger than 32 bits if its only for devision because of the fact its integer math.
Maybe Im looking at this wrong. Or is it just the fact we need a 64 bit register to start with?
Parsko,
Thanks, Its a balance of accuracy versus time, if I can use the full 64 bit routine, why not keep the accuracy, if I cant make it crunch in time then Ill sacrifice the accuracy for the 32 bit routine.
TJ
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I owe everyone here a bunch, So thanks again for answering my dumb questions.
Projects. RG500 ECU system. PropCopter. Prop CanSat. Prop Paste Gun.
Suzuki RG500 in a RGV 250 frame.
Bimota V-Due (Running on the fuel injection system)
Aprilia RS250
Also just for the record ICCV7 C took 1008.
So Your choices are
32bit/16 bit at 208 clks
32/32 signed at 676 clks
ICCV7 At 1008 clks
And Spin at 1600-1604
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I owe everyone here a bunch, So thanks again for answering my dumb questions.
Projects. RG500 ECU system. PropCopter. Prop CanSat. Prop Paste Gun.
Suzuki RG500 in a RGV 250 frame.
Bimota V-Due (Running on the fuel injection system)
Aprilia RS250
-Phil
Actually, Phil, ra doesn't have to be smaller than rx. It only has to be smaller than twice rx,
so when you compute ra - rx, the high bit is always zero.
Consider this example, though: ra:rb = $0000_0001:0000_0000, rx = $0000_0001 (i.e. ra = rx). The quotient ($1_0000_0000) overflows 32 bits. I believe this would be the case any time ra > rx.
-Phil
I think Phil pointed out above the possibility of a 32 * 32 / 32 "fractional" multiply, which can be faster than a genera purpose division.
In terms of interpolation, you enter with a value x, and then from lookup or lookdown operation find that it lies between two entries xm and xn, where xm = x <= xn. You grab the corresponding values ym and yn. Then, by linear interpolation,
y = ym + (yn - ym) * (x - xm) / (xn - xm)
The term on the right after the + is of the form c * a/b with a/b<1.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Tracy Allen
www.emesystems.com
Im having a hard enough time keeping the math in the original formula context, let alone trying to convert it for efficiency. Im trying to get it working 1st.
Its a monster, I am working on an intelligent step estimator that takes a know data set from any two sensors, compiling them into a 3d table based on time. Then based on the previous data set results develop a least squares 6th order equation defining the system. Make the system move to the next point most likely containing the goal based on the predicted fit of the spline. Rinse and repeat, ditching outliers that are no longer be prevalent to the current data.
If I have the math correct it would basically be a self tuning PID system capable of accepting multiple inputs and balancing them automatically to achieve the desired goal. Ideally its a pid loop that is not a wave, and therefore correctly responds to each change in conditions with a very close estimate of its desired goal.
Who knows, if it works NO MORE PID TUNING [noparse]:)[/noparse] or Non stable loops (That in my case tend to tear a bunch of stuff up always, why do they always go unstable when your not watching them?)
TJ
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I owe everyone here a bunch, So thanks again for answering my dumb questions.
Projects. RG500 ECU system. PropCopter. Prop CanSat. Prop Paste Gun.
Suzuki RG500 in a RGV 250 frame.
Bimota V-Due (Running on the fuel injection system)
Aprilia RS250
Nonetheless, the operation of taking proportions right up there with multiplication and division. It is indeed an optimization, because the original terms in the formulas have to be evaluated a priori to see where a proportion is involved. Then adjust the order of operations to avoid double precision, that is, calculate (a/b)*c rather than (c*a)/b. Well, it depends too on how much precision the problem really needs.
In spin, ** is a double precision operation, but you get to hand that off to the interpreter.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Tracy Allen
www.emesystems.com
It doesn't work that way, Phil. Or at least, it shouldn't. rx should always be left justified, so that its high bit is always
a one. With that arrangement, rx ends up as $1000_0000. Even if rx = $1111_1111, subtracting rx gives
$0111_1111. The high bit has been reset, the 1 has been recorded as the first bit of the result, and the process
can continue.
This is nothing new. Remember that in ordinary long division, we had to "point off" the divisor so that it was as large
as possible and still "go into" the dividend. The binary algorithm works exactly the same way, except that the trial
divisions are much easier because the partial quotient is always either a 1 or a 0. You never have to back up to a
smaller quotient digit, as sometimes happens with decimal division.
Jack
-Phil