Shop OBEX P1 Docs P2 Docs Learn Events
PASM Devision routine that works for full 32 bit devision,Help Fix this one or — Parallax Forums

PASM Devision routine that works for full 32 bit devision,Help Fix this one or

TJHJTJHJ Posts: 243
edited 2009-04-07 20:25 in Propeller 1
The current routine I have been using for devision in PASM is
                mov   t,#16
                shl   y,#15
loop
                cmpsub x,y   wc
                rcl   x,#1
                djnz  t,#loop



Thanks Propeller WIKI [noparse]:)[/noparse]

The problem I have run into is that for small devision it works great, anything like 17/2, 20/3, 45/7 just some I tried. But when I try to divide 153_333_333/21_343. I get crazy results. Any suggestions for a routine that can do full 32 bit devision.

If you have a Pasm program you would be willing to share that would save me a ton of time and a headache, I cant thank you enough.

TJ

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I owe everyone here a bunch, So thanks again for answering my dumb questions.
Projects. RG500 ECU system. PropCopter. Prop CanSat. Prop Paste Gun.
Suzuki RG500 in a RGV 250 frame.
Bimota V-Due (Running on the fuel injection system)
Aprilia RS250

Post Edited (TJHJ) : 4/2/2009 5:35:05 AM GMT

Comments

  • virtuPICvirtuPIC Posts: 193
    edited 2009-04-02 07:27
    TJHJ said...
    But when I try to divide 153_333_333/21_343. I get crazy results.

    Tell us more! Would be really nice to know what you get exactly...

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Airspace V - international hangar flying!
    www.airspace-v.com/ggadgets for tools & toys
  • Brian FairchildBrian Fairchild Posts: 549
    edited 2009-04-02 12:32
    Are you only wanting to do unsigned integer division with an unsigned integer result or do you require a fractional result? Or do you need to handle signed numbers? Division of a 32-bit integer by another 32-bit integer will give a 64-bit result.
  • TJHJTJHJ Posts: 243
    edited 2009-04-02 14:43
    Sorry for not being more specific on this one.
    I am working on a full table interploate that has to be done in a limited time. So I wrote the following program to test and see if spin would be fast enough.
    Con
    _clkmode = xtal1 + pll16x
    _xinfreq = 5_000_000
    LedPinNum = 4 
    Num = 25_000'153_333_333
    Div = 24_683   
    obj
    Term : "Serial_Terminal"
    Var
    Long StartPoint,ASMResult, CNT1, CNT2, ResultStore
     
    Pub Main
    StartPoint := LedPinNum   'Set the pin number to be correct
    dira[noparse][[/noparse]LedPinNum]~~   'Test chip by turing on LED It comes on
    outa[noparse][[/noparse]LedPinNum]~~
    waitcnt(clkfreq+cnt)         'wait 1 second
    outa[noparse][[/noparse]LedPinNum]~             'Turn off led, start up asm cog
    waitcnt(clkfreq + cnt)       'Wait 1 second. This lets me make sure my ASM code isnt causing full lock
    
    cognew(@DivTest, @StartPoint) 'Start the new asm cog
    Term.start                    'Start up the terminal
    term.getc                     'Wait for a char
    
     
    CNT1 := cnt                   'Store the cnt before spin div
    ResultStore := Num/Div        'Do spin Div
    CNT2 := cnt                   'Store cnt after
     
    waitcnt(clkfreq + cnt)        'wait 1 second
    Term.out(0)                   'Clear screen
     
    
    term.str(string("Spin took : ")) 'Output the result
    term.dec(CNT2-CNT1)              'Delta CNT Spin
    Term.str(string(" And the result was : "))
    Term.dec(ResultStore)         
    Term.out(13)
    term.str(String("ASM took : "))
    term.dec(StartPoint)
    Term.str(String(" And the result was : "))
    Term.dec(ASMResult)
    repeat                  'Forever repeat to keep chip running
      waitcnt(clkfreq+cnt)
     
     
    Dat
    org
    DivTest
                    mov _cnt1, cnt    'Ssve the counter time before devision
                    mov   t,#16
                    shl   y,#15
    loop
                    cmpsub x,y  wc 
                    rcl   x,#1
                    djnz  t,#loop
                    
                    mov _cnt2, cnt      'Save the second counter time
                    sub     _cnt2, _cnt1    'Find out how long it took
                    
                    mov     t, par          'Use t for the adress of Par
                    add     t, #4           ' add 4 to move 1 long
                    wrlong  _cnt2, par      'Write long at par address
                    wrlong  x, t            'Write address at 1 long + par
    jmpforever      jmp       #JmpForever            'Just keep repeating so it never stops
     
    x       long    Num
    y       long    Div
    t       res       1
    _cnt1   res       1
    _cnt2   res       1  
    


    So here are the results I get that are odd. It seems anything over about 20,000 in the numerator makes it blow up.

    Num = 25_000
    Div = 24_683
    Spin took : 1600 And the result was : 1
    ASM took : 208 And the result was : 20774913


    Num = 153_333_333
    Div = 24_683
    Spin took : 1664 And the result was : 6212
    ASM took : 208 And the result was : 166271044

    Now intrestingly these numbers work
    Num = 20_000
    Div = 10_000
    Spin took : 1600 And the result was : 2
    ASM took : 208 And the result was : 2

    Yet does not
    Num = 21_000
    Div = 10_000
    Spin took : 1600 And the result was : 2
    ASM took : 208 And the result was : 65536002


    Num = 20_000
    Div = 15_000
    Spin took : 1600 And the result was : 1
    ASM took : 208 And the result was : 327680001

    Note : Smaller number seem to work just fine.

    So I have been trying to read and·create a new div routine·but my asm is just not that good.

    ·@Brian
    I am totally happy with a truncated result, but it would be nice if I could handle signed numbers. I dont think any of the results of the numbers that woud be devided would ever grow beyond 32 bits.


    Thank you for the time and help to look at this,
    TJ

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    I owe everyone here a bunch, So thanks again for answering my dumb questions.
    Projects. RG500 ECU system. PropCopter. Prop CanSat. Prop Paste Gun.
    Suzuki RG500 in a RGV 250 frame.
    Bimota V-Due (Running on the fuel injection system)
    Aprilia RS250

    Post Edited (TJHJ) : 4/2/2009 3:26:03 PM GMT
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2009-04-02 15:55
    TJ,

    You haven't said for sure whether Spin would be fast enough for your app. If it is, I have an object in the OBEX that will help called "umath". It's entirely written in Spin, so you don't have to use up another cog, and includes a 32 * 32 / 32 "fractional" multiply that retains the 64-bit intermediate product for the divide. This may be just right for your interpolation problem. You can either use it as an object or copy the relevant sections into your own program.

    -Phil
  • TJHJTJHJ Posts: 243
    edited 2009-04-02 16:02
    Phil My aplogies. Although the spin devision is correct it takes to long. The problem is I am having to do this live, so the aprrox 1600 clk cycles to get result is to long as for the interpolate it has 4 devision routines + 2 multiply commands.



    TJ

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    I owe everyone here a bunch, So thanks again for answering my dumb questions.
    Projects. RG500 ECU system. PropCopter. Prop CanSat. Prop Paste Gun.
    Suzuki RG500 in a RGV 250 frame.
    Bimota V-Due (Running on the fuel injection system)
    Aprilia RS250
  • Brian FairchildBrian Fairchild Posts: 549
    edited 2009-04-02 16:45
    Your PASM routine isn't a 32-bit divide, it's only 16-bit. The clue is t1, the loop counter, which only iterates 16 times.
  • Jack CrenshawJack Crenshaw Posts: 46
    edited 2009-04-02 16:51
    TJHJ said...
    The problem I have run into is that for small devision it works great, anything like 17/2, 20/3, 45/7 just some I tried. But when I try to divide 153_333_333/21_343. I get crazy results. Any suggestions for a routine that can do full 32 bit devision.

    First off, it looks like you're not shifting far enough. You're shifting only 16 bits, where 153_333_333 is a 27-bit number.

    As one of your tests, you should always be able to divide by 1, and get back the original dividend. Looks like your function won't do that.

    In general, if you want to divide two general 32-bit numbers, you need a dividend register that's 64 (actually 65) bits long. Is the issue really that you want to divide 16-bit numbers, using a 32-bit dividend register? If so, 153_333_333 is too big for an input value.

    Whatever method you use, and however long the input words, it's critically important that every trial subtraction end up leaving no non-zero bits at the "high end." As with decimal long division,
    you have to shift the operands at the beginning, so you can guarantee that the first subtraction doesn't leave any 1-bits at the high end.

    It looks like your routine doesn't do this preconditioning. Granted, it's going to take extra cycles, but it's necessary.

    Years ago I wrote a 32x16 div routine for the z80. Recently, the topic came up again in comp.os.cpm. I no longer have a z80 system, so couldn't show them the z80 code, but I wrote a C version to show them the algorithm. I'm attaching it so it may help.

    I see that Phil Pilgrim's ASM code has the right algorithm, but again, there needs to be that initial preconditioning.

    Let me explain again, and I'll try to be as clear as I can: In that first comparison, the divisor must be aligned so that it either just fits into the top bits of the dividend, or is just barely too large. After the subtraction, the high bit of the result _MUST_ be a zero. If you're dividing x by 1, the 1 must be left justified so this is true. Otherwise, you won't get back the original dividend, and non-zero high bits end up shifted off into outer space.

    Jack


    Jack
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2009-04-02 16:58
    I deleted my post ahead of Jack's comments. It's a routine I've used, but I wasn't entirely satisfied with it upon looking at it again and needed my morning coffee before proceeding further.

    -Phil
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2009-04-02 18:49
    Here's the code I deleted, but with some preconditioning to check for invalid operands. Attached is the same with a Spin wrapper for testing.

    '-----------[noparse][[/noparse]* Divide 64 / 32 = 32 ]--------------------------------------------
    
    ' rb := ra:rb / rx, where ra < rx.
    
    divabx        [b]cmpsub[/b]    ra,rx [b]wc[/b],[b]nr[/b]             'Check to make sure ra < rx. Carry if not.
            [b]if_c[/b]  [b]jmp[/b]       divabx_ret              'Return with error in carry.
            
                  [b]mov[/b]       cctr,#32 [b]wc[/b]             'Initialize counter.
            
    :loop         [b]rcl[/b]       rb,#1 [b]wc[/b]                'Shift dividend left.
                  [b]rcl[/b]       ra,#1 [b]wc[/b]
            [b]if_c[/b]  [b]sub[/b]       ra,rx                   'If bit shifted out, then definitely subtract.
           [b]if_nc[/b]  [b]cmpsub[/b]    ra,rx [b]wc[/b]                'Else conditionally subtract, setting carry.
                  [b]djnz[/b]      cctr,#:loop             'Back for more.
                  
                  [b]rcl[/b]       rb,#1                   'Shift in last bit of quotient.
                  [b]test[/b]      cctr,cctr [b]wc[/b]            'Clear carry (error flag).
    divabx_ret    [b]ret[/b]
    
    
    


    It works, but I'm not terribly enamoured with that extra sub. There has to be a more efficient way...

    -Phil

    Post Edited (Phil Pilgrim (PhiPi)) : 4/2/2009 6:57:13 PM GMT
  • Brian FairchildBrian Fairchild Posts: 549
    edited 2009-04-02 19:11
    There's something been bugging me about TJHJs original code and I've put my finger on it. It's wrong, or to be more accurate it only works if you know what the input and output format is. That information is missing from the Wiki.

    The code has been lifted from one of the early parallax documents and is missing the routines header. Here is the correct code...




    Here is an unsigned divider routine that divides a 32-bit value by a 16-bit value to yield a 16-bit quotient and a 16-bit remainder: 
     
    ' 
    ' Divide x[noparse][[/noparse]31..0] by y[noparse][[/noparse]15..0] (y[noparse][[/noparse]16] must be 0) 
    ' on exit, quotient is in x[noparse][[/noparse]15..0] and remainder is in x[noparse][[/noparse]31..16] 
    ' 
    divide          shl     y,#15           'get divisor into y[noparse][[/noparse]30..15] 
                    mov     t,#16           'ready for 16 quotient bits 
     
    :loop           cmpsub  x,y     wc      'if y =< x then subtract it, quotient bit into c 
                    rcl     x,#1            'rotate c into quotient, shift dividend 
                    djnz    t,#:loop        'loop until done 
     
    divide_ret      ret                     'quotient in x[noparse][[/noparse]15..0], remainder in x[noparse][[/noparse]31..16] 
                     
    
    



    No wonder his answers were wrong.
  • TJHJTJHJ Posts: 243
    edited 2009-04-02 19:28
    Awesome thank you so much everyone for the help.

    Phil and only 676 cycles for div. I Think it will fit now.

    Thank you all again.

    TJ



    EDIT Brian, That helps a whole bunch..... Thanks for finding that for me


    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    I owe everyone here a bunch, So thanks again for answering my dumb questions.
    Projects. RG500 ECU system. PropCopter. Prop CanSat. Prop Paste Gun.
    Suzuki RG500 in a RGV 250 frame.
    Bimota V-Due (Running on the fuel injection system)
    Aprilia RS250
  • TJHJTJHJ Posts: 243
    edited 2009-04-02 19:35
    Brain, Which knowing that makes the whole routine work.
    ((ASMResult<<16)>>16)  ' Ditch out the remainder and the number makes sense.
    

    FYI·if takes 208 cycles to process. Ill have to see if I can get the wiki updated.

    Thanks again for finding that.

    TJ

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    I owe everyone here a bunch, So thanks again for answering my dumb questions.
    Projects. RG500 ECU system. PropCopter. Prop CanSat. Prop Paste Gun.
    Suzuki RG500 in a RGV 250 frame.
    Bimota V-Due (Running on the fuel injection system)
    Aprilia RS250
  • parskoparsko Posts: 501
    edited 2009-04-02 19:39
    The regular 32bit divide routine takes ~208 clocks for 32 / 16 = 16.16
    The new 64bit divide routine takes ~676 clocks for 64 / 32 = 32

    Just wanted to point this out for anyone referencing this post in the future.

    TJHJ, if you know what kind of math you're doing (aka converting something like temperature repetatively), you can use the 32 bit routine without losing much accuracy. I've done this with a few different sensors. It all depends on just how accurate you need the divide. But, at 676 cycles, it's nice to have the 64 bit divide.

    -Parsko
  • parskoparsko Posts: 501
    edited 2009-04-02 19:42
    This is also quite convenient cause it will gladly divide a 32bit number by a 32bit number!!!
  • TJHJTJHJ Posts: 243
    edited 2009-04-02 23:29
    Hang on a second here, so maybe Im thinking about this wrong, but why would a 32 bit divided by a 32 bit number yield a 64 bit number?

    Example
    Worst case is 2^32/1 = 2^32 a 32 bit number. And the simplest cast is 2^32/2^32 = 1.

    I cant think of a case that causes a result larger than 32 bits if its only for devision because of the fact its integer math.

    Maybe Im looking at this wrong. Or is it just the fact we need a 64 bit register to start with?

    Parsko,
    Thanks, Its a balance of accuracy versus time, if I can use the full 64 bit routine, why not keep the accuracy, if I cant make it crunch in time then Ill sacrifice the accuracy for the 32 bit routine.
    TJ

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    I owe everyone here a bunch, So thanks again for answering my dumb questions.
    Projects. RG500 ECU system. PropCopter. Prop CanSat. Prop Paste Gun.
    Suzuki RG500 in a RGV 250 frame.
    Bimota V-Due (Running on the fuel injection system)
    Aprilia RS250
  • TJHJTJHJ Posts: 243
    edited 2009-04-03 00:36
    I just ran this in ICCV7 To see how long it would take.

    Also just for the record ICCV7 C took 1008.

    So Your choices are

    32bit/16 bit at 208 clks

    32/32 signed at 676 clks

    ICCV7 At 1008 clks

    And Spin at 1600-1604



    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    I owe everyone here a bunch, So thanks again for answering my dumb questions.
    Projects. RG500 ECU system. PropCopter. Prop CanSat. Prop Paste Gun.
    Suzuki RG500 in a RGV 250 frame.
    Bimota V-Due (Running on the fuel injection system)
    Aprilia RS250
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2009-04-03 05:58
    TJHJ said...
    ...why would a 32 bit divided by a 32 bit number yield a 64 bit number?
    It doesn't. Brian was thinking of multiplication.

    -Phil
  • Brian FairchildBrian Fairchild Posts: 549
    edited 2009-04-03 06:31
    Phil Pilgrim (PhiPi) said...
    TJHJ said...
    ...why would a 32 bit divided by a 32 bit number yield a 64 bit number?
    It doesn't. Brian was thinking of multiplication.
    Actually, I was thinking of a 32-bit integer dividend and a 32-bit integer divider producing a 64-bit fractional result. smile.gif
  • Jack CrenshawJack Crenshaw Posts: 46
    edited 2009-04-04 15:21
    PhiPi said...
    Here's the code I deleted, but with some preconditioning to check for invalid operands. Attached is the same with a Spin wrapper for testing.


    '
    [noparse][[/noparse]* Divide 64 / 32 = 32 ]

    ' rb := ra:rb / rx, where ra < rx.

    divabx cmpsub ra,rx wc,nr 'Check to make sure ra < rx. Carry if not.
    if_c jmp divabx_ret 'Return with error in carry.

    Actually, Phil, ra doesn't have to be smaller than rx. It only has to be smaller than twice rx,
    so when you compute ra - rx, the high bit is always zero.
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2009-04-04 16:22
    Jack,

    Consider this example, though: ra:rb = $0000_0001:0000_0000, rx = $0000_0001 (i.e. ra = rx). The quotient ($1_0000_0000) overflows 32 bits. I believe this would be the case any time ra > rx.

    -Phil
  • Tracy AllenTracy Allen Posts: 6,664
    edited 2009-04-04 16:35
    TJ, in the context of interpolation in a table, there are specific conditions met. It comes down to c * a/b, where a/b is less than unity. In many interpolation problems involving lookup (as opposed to lookdown), b is a constant, the spacing between table entries.

    I think Phil pointed out above the possibility of a 32 * 32 / 32 "fractional" multiply, which can be faster than a genera purpose division.

    In terms of interpolation, you enter with a value x, and then from lookup or lookdown operation find that it lies between two entries xm and xn, where xm = x <= xn. You grab the corresponding values ym and yn. Then, by linear interpolation,
    y = ym + (yn - ym) * (x - xm) / (xn - xm)

    The term on the right after the + is of the form c * a/b with a/b<1.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Tracy Allen
    www.emesystems.com
  • TJHJTJHJ Posts: 243
    edited 2009-04-05 02:31
    Thanks for the thought Tracy,
    Im having a hard enough time keeping the math in the original formula context, let alone trying to convert it for efficiency. Im trying to get it working 1st.
    Its a monster, I am working on an intelligent step estimator that takes a know data set from any two sensors, compiling them into a 3d table based on time. Then based on the previous data set results develop a least squares 6th order equation defining the system. Make the system move to the next point most likely containing the goal based on the predicted fit of the spline. Rinse and repeat, ditching outliers that are no longer be prevalent to the current data.
    If I have the math correct it would basically be a self tuning PID system capable of accepting multiple inputs and balancing them automatically to achieve the desired goal. Ideally its a pid loop that is not a wave, and therefore correctly responds to each change in conditions with a very close estimate of its desired goal.

    Who knows, if it works NO MORE PID TUNING [noparse]:)[/noparse] or Non stable loops (That in my case tend to tear a bunch of stuff up always, why do they always go unstable when your not watching them?)

    TJ

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    I owe everyone here a bunch, So thanks again for answering my dumb questions.
    Projects. RG500 ECU system. PropCopter. Prop CanSat. Prop Paste Gun.
    Suzuki RG500 in a RGV 250 frame.
    Bimota V-Due (Running on the fuel injection system)
    Aprilia RS250
  • Tracy AllenTracy Allen Posts: 6,664
    edited 2009-04-05 16:28
    Wow! That's a much much bigger monster than I imagined. Good luck!

    Nonetheless, the operation of taking proportions right up there with multiplication and division. It is indeed an optimization, because the original terms in the formulas have to be evaluated a priori to see where a proportion is involved. Then adjust the order of operations to avoid double precision, that is, calculate (a/b)*c rather than (c*a)/b. Well, it depends too on how much precision the problem really needs.

    PRI ratio(a, b, c) : f            ' calculate c *a/b 
                                    ' enter with a<b; 0<=a<2^30, 0<b<=2^30 (positive)
                                    ' c can be any valid twos complement number.
      repeat 31                      'calculate result such that  result / (2^31) = a / b , approximates a/b
        a <<= 1
        result <<= 1
        if a => b
          a -= b
          result++    
      result :=  c ** result * 2    ' return this value, division by 2^32 is implicit in **
                                      '  for smaller values |c|<2^30, the *2 can precede the **.
    
    



    In spin, ** is a double precision operation, but you get to hand that off to the interpreter.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Tracy Allen
    www.emesystems.com
  • Jack CrenshawJack Crenshaw Posts: 46
    edited 2009-04-07 19:58
    PhiPi said...
    Jack,

    Consider this example, though: ra:rb = $0000_0001:0000_0000, rx = $0000_0001 (i.e. ra = rx). The quotient ($1_0000_0000) overflows 32 bits. I believe this would be the case any time ra > rx.

    It doesn't work that way, Phil. Or at least, it shouldn't. rx should always be left justified, so that its high bit is always
    a one. With that arrangement, rx ends up as $1000_0000. Even if rx = $1111_1111, subtracting rx gives
    $0111_1111. The high bit has been reset, the 1 has been recorded as the first bit of the result, and the process
    can continue.

    This is nothing new. Remember that in ordinary long division, we had to "point off" the divisor so that it was as large
    as possible and still "go into" the dividend. The binary algorithm works exactly the same way, except that the trial
    divisions are much easier because the partial quotient is always either a 1 or a 0. You never have to back up to a
    smaller quotient digit, as sometimes happens with decimal division.

    Jack
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2009-04-07 20:25
    Jack Crenshaw said...
    rx should always be left justified, so that its high bit is always a one.
    ??? Are you sure you're not thinking about floating point? In unsigned integer 64/32=32 division, anytime the most significant long of the dividend is greater than or equal to the divisor, you will get an overflow (i.e. the quotient won't fit in 32 bits).

    -Phil
Sign In or Register to comment.