64-bit integer math (PASM)

ManAtWork · 2013-12-11 03:11

I found out that sometimes it's easier and more efficient (code size) to use 64 bit math instead of trying to fit everything into 32 bits with lots of shift operations. In this thread I'd like to share my solutions for simple math routines in assembler. Suggestions for improvements/optimisations are welcome.

64 bit add and subtract are easy and are documeted in the propeller manual.

' unsigned x:= x+y (or also signed if flags don't care)
    add  xl,yl wz,wc
    addx xh,yh wz

' signed
    add xl,yl wz,wc
    addsx xh,yh wz

' unsigned x:= x-y
    sub xl,yl wc,wz
    subx xh,yh wz   ' signed: use subsx

If you have to add a signed 32 bit value to a 64 bit long you can use this:

    move temp,y
    sar  temp,#31    ' sign extend
    add  xl,y wc
    addx xh,temp

However, abs and neg are a bit more difficult.

            abs  xh,xh wc,wz
if_c        neg  xl,xl wz
if_c_and_nz sub  xh,#1
if_nc_and_z mov xl,xl wz   ' test for Z-flag (optional)

      neg  xl,xl wz
      neg  xh,xh wc
if_nz sub  xh,#1
if_z  mov  xh,xh wz   ' test for Z-flag (optional)

ManAtWork · 2013-12-11 07:37

Here is a 64:32 fixed point division. xh is the dividend "whole part" and xl the "fractional part". x is divided by y, the quotient is returned in xl, the remainder in xh.

              mov    count,#32
              shl    xl,#1 wc
              rcl    xh,#1
divloop       cmpsub xh,y wc
              rcl    xl,#1 wc
              rcl    xh,#1
              djnz   count,#divloop

It is assumed that xh[31..30] and y[31] are 0. This is no problem in my case because the input has a limited range and the sign bits are always vleared. Does anybody have a division routine without this restrictions?

jmg · 2013-12-11 14:30

ManAtWork wrote: »

I found out that sometimes it's easier and more efficient (code size) to use 64 bit math instead of trying to fit everything into 32 bits with lots of shift operations. In this thread I'd like to share my solutions for simple math routines in assembler.

Cool.

One 64 bit lib that is nice to have, that can extend the use of 32 bit, is the classic scaling algorithm

R = (A *

/ C

Here A * B => 64 bit results, and then 64/32(C) applies to that result. The 64 bit value is important, but hidden from the user.

and another is Binary to BCD, see the Prop2 thread for some examples.

richaj45 · 2013-12-11 18:04

How about a 32x32 signed and unsigned multiply?

thanks
richard

average joe · 2013-12-11 21:51

Here's my 64 bit add, subtract, multiply and divide object. Not perfect, the multiply could be optimized but seems to work well enough. It does use 1 pin to signal PASM there is an operation waiting. This is to keep power consumption low, instead of staying in a rdlong loop.

{{
//////////////////////////////////////////////////////////////////////////////////
//      64 bit signed addition and subtraction methods                          //
//      Author : Joe Heinz                                                      //
//      Copyright : Joe Heinz - 2013                                            //
//      Grateful acknowledgement to kuroneko (parallax forum)                   //
//      data structure as follows :                                             //
//      VAR  long  hostLong1, hostLong2, nodeLong1, nodeLong2                   //
//////////////////////////////////////////////////////////////////////////////////
}}

DAT       '' mailboxes
pin       long    ' local copy of control pin
hPtr      long    ' local copy of host pointer
nPtr      long    ' local copy of node pointer
function  byte    ' local copy of function
cog       byte    ' cog PASM object is running in

PUB NULL        ' not a top level object
  
PUB Start(controlPin)                   '' Start math object in new cog 
{{    
//////////////////////////////////////////////////////////////////////////////////
//  controlPin is pin used to indicate there is a math operation to do          //
//  returns cog ASM engine was run in or false if no cog available              //
//////////////////////////////////////////////////////////////////////////////////
}}          
    pin   := controlPin                                   ' move controlPin to pin
    hPtr  := |< pin                                       ' decode pin into hPtr
    outa[pin] := 0                                        ' make pin low
    dira[pin] := 1                                        ' make pin an output
    cog   := cognew(@entry, @pin) +1                      ' start asm cog
    return cog                                            ' return cog
    
PUB Stop                                '' Stop Math Cog
   if cog
     cogstop(cog~~ - 1)  

PUB Addition(hostPtr, nodePtr)          '' Perform 64 bit subtraction
{{
//////////////////////////////////////////////////////////////////////////////////
//  hostPtr is pointer to least significant long of host time                   //  
//  nodePtr is pointer to least significant long of node time                   //
//  result of addition is stored in host longs                                  //
//////////////////////////////////////////////////////////////////////////////////
}}

  DoMath(hostPtr, nodePtr, "a")

PUB Subtraction(hostPtr, nodePtr)       '' Perform 64 bit subtraction
{{
//////////////////////////////////////////////////////////////////////////////////
//  hostPtr is pointer to least significant long of host time, minuend          //  
//  nodePtr is pointer to least significant long of node time, subtractend      //
//  result of subtraction is stored in host longs                               //
//////////////////////////////////////////////////////////////////////////////////
}}

   DoMath(hostPtr, nodePtr, "s")

PUB multiply(hostPtr, nodePtr)          '' Perform 64 bit subtraction
{{
//////////////////////////////////////////////////////////////////////////////////
//  hostPtr is pointer to least significant long of multiplicand                //  
//  nodePtr is pointer to multiplier                                            //
//  result of multiplication is stored in host longs                            //
//////////////////////////////////////////////////////////////////////////////////
}}

   DoMath(hostPtr, nodePtr, "m")  

PUB divide(hostPtr, nodePtr)            '' Perform 64 bit subtraction
{{
//////////////////////////////////////////////////////////////////////////////////
//  hostPtr is pointer to least significant long of dividend                    //  
//  nodePtr is pointer to divisor                                               //
//  result of division is stored in host longs                                  //
//  remainder is stored in nodelong1                                            //
//////////////////////////////////////////////////////////////////////////////////
}}

  DoMath(hostPtr, nodePtr, "d")    
  
PRI DoMath(hoPtr, noPtr, func)          '' do the math operation
                                                                                                
  hPtr     := hoPtr                                         ' move pointers to mailboxes
  nPtr     := noPtr                                         ' move pointers to mailboxes
  function := func                                          ' move pointers to mailboxes
  outa[pin] := 1                                            ' make pin high
  repeat while function                                     ' wait for asm cog to finish
  outa[pin] := 0                                            ' make pin low

DAT       org       
entry             mov       address, par                    ' get first address of passing method, hPtr                                      4    3
                  rdlong    cpin, par                       ' get pin from pointer
                  add       address, #4                     ' add long offset
                  rdword    host_ptr, address               ' read the address to get pointer to Least Significant long of host time         8   1*2   2
                  add       address, #4                     ' add a word offset to address, points to nPtr                                   4    3
                  rdword    node_ptr, address               ' read the address to get pointer to Least Significant long of node time         8   1*2   3
                  add       address, #4                     ' add a word offset to address, points to function                               4    3  

loop              waitpne   zero, cpin                      ' wait for pin to be high
                  rdbyte    funct, address                  ' read the function type *add or subtract*                                       8   1*2   4
                  rdlong    host_long1, host_ptr            ' get least significant long of host time                                        8   1*2   5
                  add       host_ptr, #4                    ' add long offset to host pointer                                                4    3
                  rdlong    host_long2, host_ptr            ' get most significant long of host time                                         8   1*2   6
                  rdlong    node_long1, node_ptr            ' get least significant long of node time                                        8   1*2   7
                  add       node_ptr, #4                    ' add long offset to node pointer                                                4    3
                  rdlong    node_long2, node_ptr            ' get most significant long of node time
                  
                  cmp       funct, #"a"   wz
      if_z        call      #addi   
                  cmp       funct, #"s"   wz
      if_z        call      #subt      
                  cmp       funct, #"m"   wz
      if_z        call      #mult                              
                  cmp       funct, #"d"   wz
      if_z        call      #div      
                  
write             wrlong    host_long2, host_ptr            ' write most significant long of result                                          8   1*2   9
                  sub       host_ptr, #4                    ' subtract long offset from host pointer                                         4    3
                  wrlong    host_long1, host_ptr            ' write least significant long of result                                         8   1*2   10
                  wrlong    node_long2, node_ptr            ' write most significant long of result
                  sub       node_ptr, #4                    ' subtract long offset from node pointer
                  wrlong    node_long1, node_ptr            ' write least significant long of result
                  wrlong    zero, address                   ' clear function to return to caller                                         8   1*2   11
                  jmp       #loop                           ' 
                  
div 
                  cmp   node_long1, zero    wz              ' check to see if divisor is zero
          if_z    mov   host_long1, zero                    ' and zero out result
          if_z    mov   host_long2, zero                    ' undefined
          if_z    jmp   #write                              ' write to buffers
           
                  ' find most significant bit shift offset of divisor < mplace = 0 to 31 >             
                  mov   mplace, #0                          ' prepare MSB place
findm             rol   node_long1, #1            wc        ' rotate divisor left 1 place and check for overflow   
          if_nc   add   mplace, #1                          ' if no overflow, add 1 to MSB place
          if_nc   jmp   #findm                              ' repeat loop until overflow
                  ror   node_long1, #1                      ' rotate back one                   
                  mov   mask, #1                            ' prepare mask
                  mov   tmp, mplace                         ' move msb shift factor to tmp for itteration count
                  shl   mask, mplace                        ' move mask by msb shift factor                  
div2              cmpsub host_long2, node_long1   wc        ' subtract divsor from dividend and watch for subtraction
                  muxc  d2, mask                            ' mux c flag to d2 at mask place
                  cmpsub tmp, #1                  wz        ' subtract 1 from shift factor watching for last division
          if_nz   shr   mask, #1                            ' shift mask right by 1
          if_nz   shr   node_long1, #1                      ' shift divisor right by 1
          if_nz   jmp   #div2                               ' and do loop again                 
                  shr   mask, #1                            ' shift mask right by 1
                  shr   node_long1, #1                      ' shift divisor right by 1                                     
                  cmpsub host_long2, node_long1   wc        ' subtract divsor from dividend and watch for subtraction
                  muxc  d2, mask                            ' mux c flag to d2 at mask place
                  sub   tmp, #1                   wz        ' subtract 1 from shift factor watching for last division
                  mov   tmp, #1                             ' move 1 to tmp for muxc
                  shl   mask, #31                           ' move mask to MSB
                  mov   temp, #32                           ' move 32 to temp for itteration count                   
div1              shl   host_long2, #1                      ' make room for next bit from LSL
                  shl   host_long1, #1            wc        ' shift next bit from LSL watching for carry
                  muxc  host_long2, tmp                     ' mux carry into MSL
                  cmpsub host_long2, node_long1   wc        ' subtract divisor from MSL, watch for subtraction
                  muxc  d1, mask                            ' mux c flag to d1  
                  sub   temp, #1                  wz        ' subtract 1 from itteration count
          if_nz   shr   mask, #1                            ' move mask to next place
          if_nz   jmp   #div1                               ' do loop again         
                  mov   node_long1, host_long2              ' move remainder in node_long1
                  mov   node_long2, #0                      ' move zero to node_long2
                  mov   host_long2,  d2                     ' move d2 to host_long2    
                  mov   host_long1,  d1                     ' move d1 to host_long1  
div_ret           ret



mult              mov       multiplier, node_long1          ' get multiplier from node_long1 and place in multiplier
                  cmp       multiplier, zero          wz    ' check for multiply by 0
if_z              mov       host_long1, zero                ' if multiply by zero just zero host_long1
if_z              mov       host_long2, zero                ' and zero host_long2
if_z              jmp       #write                          ' jump to write         
                  mov       node_long1, host_long1          ' else copy longs for repeated addidion
                  mov       node_long2, host_long2          ' copy the next long
                  sub       multiplier, #1                  ' subtract one from multiplier
:loop             call      #addi                           ' jmp to add
                  djnz      multiplier, #:loop              ' and loop
mult_ret          ret                  


subt              sub       host_long1, node_long1  wc      ' do first subtraction and write c flag                                          4    4
                  subsx     host_long2, node_long2          ' do second subtraction  
subt_ret          ret
            

addi              add       host_long1, node_long1  wc      ' do first addition and write c flag
                  addsx     host_long2, node_long2          ' do second addition
addi_ret          ret
             

zero          long 0
             
address       res
host_ptr      res
node_ptr      res
host_long1    res
host_long2    res
node_long1    res
node_long2    res
funct         res
p_cog         res 
multiplier    res

mask          res
tmp           res
temp          res
mplace        res
d2            res 
d1            res
cpin          res

{{
&#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
&#9474;                                                   TERMS OF USE: MIT License                                                  &#9474;                                                            
&#9500;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9508;
&#9474;Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation    &#9474; 
&#9474;files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy,    &#9474;
&#9474;modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software&#9474;
&#9474;is furnished to do so, subject to the following conditions:                                                                   &#9474;
&#9474;                                                                                                                              &#9474;
&#9474;The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.&#9474;
&#9474;                                                                                                                              &#9474;
&#9474;THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE          &#9474;
&#9474;WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR         &#9474;
&#9474;COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,   &#9474;
&#9474;ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.                         &#9474;
&#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;
}}

64-bit integer math (PASM)

Comments