I'm having trouble finding the brief info Chip posted on the math instructions — Parallax Forums

potatohead Posts: 10,254
edited 2013-04-11 08:30 in Propeller 2
Anyone bookmark that? I didn't. Pretty sure I saw a post with some brief info on how to use those instructions.

Comments

  • potatohead Posts: 10,254
    edited 2013-04-10 23:31
    Found it!
    Here is some quick documentation about the big multiplier, divider, and square root circuits:

    As an aside, the MUL instruction returns immediately with the result. There is also an SCL instruction, which scales by a max value of 2, with many fractional bits.

    There are separate large divider and multiplier circuits which take ~16 clocks each.

    To do a 32x32 multiply (A*B):

    SETMULA D/#n 'provide 1st term for signed multiply (use SETMULU for unsigned multiply)
    SETMULB D/#n 'provide 2nd term and start multiply
    (~16 clocks, if you want to do anything here)
    GETMULL D 'get lower-long result, waits if not ready yet
    GETMULH D 'get upper-long result, waits if not ready yet

    You can also poll GETMULL/GETMULH D WC, where C=1 and D=result if done. This is useful for multi-threading, where you don't want to stall the pipeline.

    To do a 32/32 divide (A/B):

    SETDIVA D/#n 'provide dividend term for signed divide (use SETDIVU for unsigned divide)
    SETDIVB D/#n 'provide divisor term and start divide
    (~16 clocks, if you want to do anything here)
    GETDIVQ D 'get quotient long result, waits if not ready yet
    GETDIVR D 'get remainder long result, waits if not ready yet

    You can also poll GETDIVQ/GETDIVR D WC, where C=1 and D=result if done.

    To do a 64/32 divide, do an extra SETDIVA/SETDIVU to provide the upper long of the dividend.

    To compute a 16-bit root of a 32-bit unsigned input:

    SETSQRL D/#n 'set 32-bit unsigned input and start computation
    (~16 clocks, if you want to do anything here)
    GETSQRT D 'get square root, waits if not ready yet

    You can also poll GETSQRT D WC, where C=1 and D=result if done.

    To compute a 32-bit root of a 64-bit unsigned input:

    SETSQRH D/#n 'set upper long of a 64-bit unsigned input
    SETSQRL D/#n 'set lower long of a 64-bit unsigned input and start computation
    (~32 clocks, if you want to do anything here)
    GETSQRT D 'get square root, waits if not ready yet
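
To make the result layout above concrete, here is a minimal Python sketch that models what the GETMULL/GETMULH, GETDIVQ/GETDIVR, and GETSQRT reads would be expected to return. It is a host-side model, not Propeller code, and all function names are hypothetical. It assumes two's-complement signed operands, models only the unsigned divide (the post does not say how the signed divide rounds), and assumes the square root is rounded down.

```python
import math

MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mul_32x32(a, b, signed=True):
    """Model a 32x32 multiply; return (lower long, upper long) of the 64-bit product.

    signed=True corresponds to starting with SETMULA, signed=False to SETMULU.
    """
    if signed:
        a, b = to_signed32(a), to_signed32(b)
    else:
        a, b = a & MASK32, b & MASK32
    product = (a * b) & MASK64          # wrap to 64 bits, two's complement
    return product & MASK32, product >> 32


def div_unsigned(dividend, divisor):
    """Model an unsigned 32/32 or 64/32 divide; return (quotient, remainder).

    Assumption: plain truncating division. Divide-by-zero and a quotient
    wider than 32 bits are not described in the post, so they are left
    to Python's behaviour (ZeroDivisionError, unbounded int).
    """
    return divmod(dividend & MASK64, divisor & MASK32)


def sqrt_unsigned(value):
    """Model the square root of an unsigned input of up to 64 bits.

    Assumption: the result is rounded down (floor).
    """
    return math.isqrt(value & MASK64)
```

For example, `mul_32x32(0xFFFFFFFE, 3)` treats the first term as -2 and yields the pair `(0xFFFFFFFA, 0xFFFFFFFF)`, that is, -6 spread across two longs, while the same call with `signed=False` yields `(0xFFFFFFFA, 2)`.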
  • Seairth Posts: 2,474
    edited 2013-04-11 05:39
    Are those clock estimates the maximums or the averages?
  • Ariba Posts: 2,682
    edited 2013-04-11 08:30
    Seairth wrote: »
    Are those clock estimates the maximums or the averages?

    Both, and also the minimum ;-) The CORDIC engine takes a fixed number of clocks. For divide it depends on the bit size (32/32 or 64/32 bits).

    My observation is that the CORDIC process needs 17 cycles for MUL and DIV (32/32). But you need 2 additional instructions to load the start values and 1 instruction to read the result, which makes 20 cycles for a multiplication (21 for a 64-bit result).

    But it's much better than that: the CORDIC engine works in parallel with the CPU, so once you have written the start values with SETMULx you can execute other instructions and read the result at least 17 cycles later. So with optimized code you only need to count 3 to 4 cycles for a multiplication!

    The MUL and SCL instructions take 2 cycles and use a hardware 20x20-bit signed multiplier. SCL is normalized to 19 bits, that is, 1.0 is $40000. This allows factors from -2.0 to +2.0 (minus 1 LSB).

    Andy
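
The cycle arithmetic and the SCL normalization described in this reply can be checked with a small Python sketch. It is only a model of the numbers quoted in the thread: the function names are hypothetical, the 17-cycle figure is Andy's observation rather than a documented constant, and each setup or read instruction is assumed to cost one cycle, as his totals imply.

```python
ENGINE_CYCLES = 17      # observed engine latency for MUL and DIV (32/32)
SETUP_INSTRUCTIONS = 2  # e.g. SETMULA + SETMULB
SCL_ONE = 0x40000       # SCL factor representing 1.0
SCL_MIN = -0x80000      # most negative 20-bit signed value (-2.0)
SCL_MAX = 0x7FFFF       # most positive 20-bit signed value (+2.0 minus 1 LSB)


def multiply_cost(result_longs=1, overlapped=False):
    """Cycles charged to one big multiply.

    result_longs: 1 for GETMULL only, 2 for GETMULL plus GETMULH.
    overlapped:   True if other useful instructions fill the engine latency,
                  so only the setup and read instructions are counted.
    """
    cost = SETUP_INSTRUCTIONS + result_longs
    if not overlapped:
        cost += ENGINE_CYCLES
    return cost


def scl_encode(factor):
    """Convert a real scale factor to the 20-bit signed SCL representation."""
    raw = round(factor * SCL_ONE)
    if not SCL_MIN <= raw <= SCL_MAX:
        raise ValueError("factor outside -2.0 .. +2.0 (minus 1 LSB)")
    return raw


def scl_decode(raw):
    """Convert a 20-bit signed SCL value back to a real scale factor."""
    return raw / SCL_ONE
```

With these assumptions, `multiply_cost(1)` and `multiply_cost(2)` reproduce the 20 and 21 cycles quoted above, and the overlapped variants give the 3 to 4 cycles. `scl_encode(2.0)` raises `ValueError`, because the largest representable factor is one LSB below 2.0.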