fast int sqrt for the prop

VIRAND · 2008-08-28 00:02

I'm surprised no one used the LOG Tables!

Tracy Allen · 2008-08-28 02:06

Chip, I'd very much like the ratio computation, right up there on a priority level with sqrt. You did that in hardware as a separate circuit?

The way I have been doing it in pasm is the following and I wonder in the spirit of this thread if the code can be shortened.

'  convert ratio y / x, where y < x, into a binary fraction f 
' enter with y, x:{y > x, x < 2^31, y <= 2^31}
' exit with binary fraction, f,   where f is the numerator of the approximation, y /x = f / (2^n).
' 
fractyxb              ' enter here , n contains number of bits, usually will be 32 bits for frqx calc.
    mov f,#0    ' result
:loop
    shl y,#1        ' double the value of y
    cmpsub y, x   wc,wr  ' if y>=x then y:=y-x, carry:=1, else a unchanged, carry:=0
    rcl f,#1        ' shift in carry, double the estimate of f, appoximation f/(2^n) =~ y/x
    djnz n,#:loop    ' do all bits
fractyxb_ret    ret

Here is the equivalent in spin, which can then be used with the ** operator when the number of bits=32 for calculating the trinary
result = z * y / x. (result := f ** z)

PRI fractyxzb (y, x,  b) : f
'' calculate f = y/x * 2^nb
'' enter with y < x, and b=number of bits in approximation 
'' b is number of bits
'' same as f: y/x = f / (2^nb) < y/x < (f+1)/(2^nb), 
'' that is, f over a power of two is a close appoximation to the original fraction.
  repeat b                      
    y <<= 1
    f <<= 1
    if y => x    '  
      y -= x
      f++

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Tracy Allen
www.emesystems.com

cgracey · 2008-08-28 03:57

Tracy,

I think the only way that routine can get sped up is by straight-lining each iteration. In the next Propeller you could do this:

··········· rep···· [noparse][[/noparse]32,3]····· 'repeat 3 instructions 32 times

··········· nop················ 'must execute two instructions here

··········· nop

··········· shl···· x,#1······· 'begin 3-instruction block

··········· cmpsub· x,y····· wc

··········· rcl···· q,#1······· 'total cycles = 3 + 3*32 = 99

With the divider circuit, you'd do this:

··········· setdivx x·········· 'set x

··········· setdivy y,[noparse][[/noparse]32]····· 'set y, begin 32 iterations

··········· (nop)·············· 'you could execute up to 31 instructions here

··········· getdivq q·········· 'get result (waits), total cycles = 2 + 31 + 1 = 34

So, it's ~3 times faster, but works in the background, so in a way it only takes 3 instructions (33x faster). This is the way the CORDIC works, too.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔

Chip Gracey
Parallax, Inc.

Post Edited (Chip Gracey (Parallax)) : 8/28/2008 4:03:09 AM GMT

Ale · 2008-08-28 08:31

In a way how many FPUs work, I mean in the background. If we could access external memory fast enough, I mean at 160 MHz we could safely use SDRAM maybe at 66 or 80 MHz, that could easily compete with some multimedia optimized processors in the type or ARM or similar. There are many multimedia players with limited resources that decode video at 160x128 or 320x240 and 24 to 30 fps (No idea if it is doable).

The SH-1 (and I think SH-2) have an iterative division that takes 32 or so instructions and cycles. but htis way is better because you have 31 instructions to do something.

In a not so related note, is there any help for pointers ?, scale, auto-post or -pre decrement/increment ? (Maybe it is moot with the speed).

What happens with consumption ? is there an estimate or something like that ?

Cluso99 · 2008-08-29 03:52

Hey... Did anyone notice this...

Chip said...

··········· rep···· [noparse][[/noparse]32,3]····· 'repeat 3 instructions 32 times

WOW !!!!!

Ale · 2008-08-29 10:26

I did. The newest way to do a loop without variable and without jump instruction. Pretty neat. I do not see the hour this chip shows up at my desk !!.
Between this and the other thread I lost the count on how many improvements this prop 2 has, many more than what it was outlined in that long thread from last year We need it !. At last something we want will be created, just for us (and some other customers [noparse]:)[/noparse] ).

fast int sqrt for the prop

Comments