I was just responding to Leon's incorrect statement that a "Four cycle divide isn't possible". It was hinv who suggested the four-cycle divider, along with a four-cycle multiplier. Personally, I don't think a fast divider is worth the amount of silicon it would require. I believe Prop 2 will have a single-cycle 16x16 multiplier, which will be very useful. A 32x32 multiply could be implemented with four 16x16 multiplies. I believe Prop 2 will also have multi-cycle macro instructions that implement 32-bit multiplies and divides.
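To make the four-multiply idea concrete, here is a minimal C sketch of an unsigned 32x32 -> 64-bit multiply built from four 16x16 -> 32-bit multiplies. The helper mul16x16() is a hypothetical stand-in for a single-cycle hardware multiplier; nothing here reflects how Prop 2 will actually sequence the operation. Each operand is split into 16-bit halves, and the four partial products are shifted into place and summed.

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical stand-in for a single-cycle 16x16 -> 32-bit unsigned multiplier. */
static uint32_t mul16x16(uint16_t a, uint16_t b)
{
    return (uint32_t)a * (uint32_t)b;   /* max 0xFFFE0001, always fits in 32 bits */
}

/* Unsigned 32x32 -> 64-bit multiply from four 16x16 partial products. */
uint64_t mul32x32(uint32_t a, uint32_t b)
{
    uint16_t a_lo = (uint16_t)(a & 0xFFFFu), a_hi = (uint16_t)(a >> 16);
    uint16_t b_lo = (uint16_t)(b & 0xFFFFu), b_hi = (uint16_t)(b >> 16);

    uint64_t lo_lo = mul16x16(a_lo, b_lo);   /* weight 2^0  */
    uint64_t lo_hi = mul16x16(a_lo, b_hi);   /* weight 2^16 */
    uint64_t hi_lo = mul16x16(a_hi, b_lo);   /* weight 2^16 */
    uint64_t hi_hi = mul16x16(a_hi, b_hi);   /* weight 2^32 */

    /* The exact product is below 2^64, so these additions never wrap. */
    return lo_lo + ((lo_hi + hi_lo) << 16) + (hi_hi << 32);
}
```

This only covers the unsigned case; a signed 32x32 multiply would need a correction step for negative operands.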
Mike Green,
There is indeed a 16x16 multiply with a 32-bit result. That doesn't exclude a 32x32 multiply with a 64-bit result, nor does it exclude a 32-bit divide.
I've just been playing with some assembler maths stuff on the PIC32 (MIPS32 core) using the MPLAB simulator. A 32-bit multiply with a 64-bit result takes one clock, as I expected, but a 32-bit divide (32-bit quotient and 32-bit remainder) takes 15 clocks. Here is my test code:
#include <p32xxxx.h>
.global main
.data
var1: .word 2
var2: .word 8
.text
.ent main
main:
loop:
nop
lw $t1,var1
lw $t2,var2
add $t0,$t1,$t2 # $t0 = $t1 + $t2
mult $t1,$t2 # (Hi,Lo) = $t1 * $t2
div $t2,$t1 # Lo = $t2 / $t1 Hi = $t2 mod $t1
nop
j loop
.end main
It took me a couple of hours to work out how to write a standalone MIPS32 assembly language program; everyone seems to use C! The assembly language is rather nice once you get into it, but it is a lot harder to use than Propeller assembler. We have yet to see what the equivalent operations on the Propeller II will look like, of course.
In case anyone asks, mult and div place the results in a pair of dedicated Hi and Lo registers.
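For anyone who hasn't met the Hi/Lo pair, the C sketch below models what the two instructions in the test program leave behind: mult puts the upper and lower words of the signed 64-bit product in Hi and Lo, and div puts the quotient in Lo and the remainder in Hi. The hilo_t type and the mips_mult()/mips_div() names are made up for illustration; this models only the architectural results, not the timing.

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical model of the MIPS32 Hi/Lo result register pair. */
typedef struct {
    uint32_t hi;
    uint32_t lo;
} hilo_t;

/* mult rs,rt: signed 32x32 -> 64-bit product; upper word to Hi, lower word to Lo. */
hilo_t mips_mult(int32_t rs, int32_t rt)
{
    uint64_t product = (uint64_t)((int64_t)rs * (int64_t)rt);
    hilo_t r = { (uint32_t)(product >> 32), (uint32_t)product };
    return r;
}

/* div rs,rt: signed divide; remainder to Hi, quotient to Lo.
   C99 '/' and '%' truncate toward zero, matching MIPS div.
   Divide by zero (unpredictable on MIPS) and INT32_MIN / -1
   (undefined in C) are rejected by this model. */
hilo_t mips_div(int32_t rs, int32_t rt)
{
    assert(rt != 0);
    assert(!(rs == INT32_MIN && rt == -1));
    hilo_t r = { (uint32_t)(rs % rt), (uint32_t)(rs / rt) };
    return r;
}
```

With var1 = 2 and var2 = 8 as in the test program, mips_mult(2, 8) leaves Hi = 0 and Lo = 16, and mips_div(8, 2) leaves Lo = 4 and Hi = 0.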
MIPS assembly... mmm... it was always nice and clean and had delay slots. 15 clocks for a 32/32 divide is rather good. Sadly we don't get 8 of them in one package.
I thought it would be useful to have something for comparison.
It's not actually as good as I thought; it takes 22 clocks in some circumstances, which is weird. I found the reason in a MIPS document: the divider uses an iterative algorithm, so its latency depends on the size of the operand (8/16/24/32 bits). The 5-stage pipeline complicates things as well.
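The operand-size dependence can be illustrated with a plain shift-subtract (restoring) divider that skips the dividend's leading zero bytes before it starts iterating. This is a hypothetical software model of that general idea, not the MIPS core's actual divider: the function name, the choice of the dividend as the operand being sized, and the one-bit-per-iteration assumption are all illustrative, and it does not try to reproduce the 15 or 22 clock figures.

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical model: unsigned restoring division that starts at the highest
   nonzero byte of the dividend, so it runs 8, 16, 24 or 32 iterations. */
uint32_t div_iterative(uint32_t dividend, uint32_t divisor,
                       uint32_t *remainder, unsigned *steps)
{
    assert(divisor != 0);

    /* Choose 8, 16, 24 or 32 bits depending on how large the dividend is. */
    unsigned bits = 32;
    if (dividend <= 0xFFu)          bits = 8;
    else if (dividend <= 0xFFFFu)   bits = 16;
    else if (dividend <= 0xFFFFFFu) bits = 24;

    uint64_t rem = 0;   /* partial remainder; 64 bits so the shift never overflows */
    uint32_t quot = 0;

    for (unsigned i = bits; i-- > 0; ) {
        rem = (rem << 1) | ((dividend >> i) & 1u);  /* bring down the next bit */
        quot <<= 1;
        if (rem >= divisor) {                       /* trial subtraction succeeds */
            rem -= divisor;
            quot |= 1u;
        }
    }

    *remainder = (uint32_t)rem;
    *steps = bits;
    return quot;
}
```

Under this model a dividend that fits in one byte finishes in a quarter of the iterations needed for a full 32-bit one, which is the kind of data-dependent latency described above.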
Eight MIPS cores could go into an FPGA, of course.
Who said we don't get a 32x32 multiply?
Last I remember, it was Chip who said we would have a 16x16 unsigned multiply with a 32-bit result.