Does the propeller do multiply/divide in hardware?

webmasterpdx · 2009-09-01 08:14

Does the Propeller have hardware multiply and divide in its processors?

Thanks
-D

MagIO2 · 2009-09-01 08:28

No

.... that was easy ;o)

webmasterpdx · 2009-09-01 08:35

That sucks. Any hardware assist?
How about a barrel shifter to shift any number of bits in one instruction time? That could speed up a multiply algorithm.

BradC · 2009-09-01 08:39

webmasterpdx said...
That sucks. Any hardware assist?
How about a barrel shifter to shift any number of bits in one instruction time? That could speed up a multiply algorithm.

Yes.

www.parallax.com/Portals/0/Downloads/docs/prod/prop/WebPM-v1.1.pdf

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
It's not particularly silly, is it?

Phil Pilgrim (PhiPi) · 2009-09-01 08:45

webmasterpdx,

Here's a download page containing documents that will answer most questions you may have about the Propeller: www.parallax.com/tabid/442/Default.aspx. If, after reading them, you have further questions, please do not hesitate to pose them here.

Thanks,
-Phil

heater · 2009-09-01 08:55

Didn't we already answer this one? Whatever.

There are lots of fast multiplies in PASM for the Prop floating around here: 8-, 16- and 32-bit.

For example, Cheetah Integer Division: http://forums.parallax.com/showthread.php?p=828437
and Kenyan Multiplication: http://forums.parallax.com/showthread.php?p=828096
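
For readers who have not met the idea before, here is a minimal sketch of a shift-and-add multiply. It assumes that "Kenyan Multiplication" refers to the classic halve-and-double scheme; the linked thread is not reproduced here, so this is not that author's PASM code. It is written in Python for readability, and the function name is made up for illustration.

```python
def halve_and_double_multiply(a: int, b: int) -> int:
    """Multiply two non-negative integers using only shifts, bit tests and adds."""
    if a < 0 or b < 0:
        raise ValueError("this sketch only handles non-negative integers")
    product = 0
    while a:
        if a & 1:        # lowest bit of a set: this multiple of b contributes
            product += b
        a >>= 1          # halve a
        b <<= 1          # double b
    return product


print(halve_and_double_multiply(13, 11))
```

The loop runs once per significant bit of the first operand, which is why a fast shifter helps this kind of routine.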

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.

webmasterpdx · 2009-09-01 09:02

Thanks to all who have replied....
Got the answers I need, plus some interesting algorithms I wasn't familiar with.....I love playing around with that kind of stuff.

Will have to check out the fast multiply/divides presented above.

Thanks again.
-Donald

heater · 2009-09-01 09:22

Yeah. That Kenyan thing really got me when it turned up here recently. Never heard of such a thing before.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.

Nick Mueller · 2009-09-01 10:09

@thoufiq987:
Are you posting here just to drop your SPAM-links?

Or does your Prop have Cellulite?

Nick

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Never use force, just go for a bigger hammer!

The DIY Digital-Readout for mills, lathes etc.:
YADRO

Kye · 2009-09-01 11:15

Spam

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Nyamekye,

heater · 2009-09-01 11:16

Machine Intelligence just got something working!!!

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.

Agent420 · 2009-09-01 11:21

^ I don't care who you are, that there's funny


webmasterpdx · 2009-09-01 15:30

I'm not sure who the tape recorder is....but I have another question about multiply/divide.

Has anyone used the huge log table to do a fast multiply? If the table is 2K, that's a 12-bit multiply and divide converted into an add and a subtract respectively.

Multiply should be:

lx = LOG2(x)
ly = LOG2(y)
lxy = lx + ly
xy = ALOG2(lxy)
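
The four steps above can be tried out with a short Python sketch. It is only an illustration under stated assumptions: the logarithms are computed with `math.log2` rather than read from the Propeller's ROM table, and the 16-bit fraction (`FRACTION_BITS`) and all function names are assumptions, not anything taken from the manual.

```python
import math

FRACTION_BITS = 16  # assumed fixed-point precision of the stored logarithms


def log2_fixed(x: int) -> int:
    """log2(x) as a fixed-point integer with FRACTION_BITS fractional bits."""
    if x <= 0:
        raise ValueError("log2 is only defined for positive integers")
    return round(math.log2(x) * (1 << FRACTION_BITS))


def antilog2_fixed(lx: int) -> int:
    """2**lx rounded to the nearest integer, for a fixed-point lx."""
    return round(2.0 ** (lx / (1 << FRACTION_BITS)))


def log_multiply(x: int, y: int) -> int:
    """Multiply two non-negative integers by adding their logarithms."""
    if x == 0 or y == 0:
        return 0                 # zero has no logarithm, so handle it up front
    lx = log2_fixed(x)
    ly = log2_fixed(y)
    lxy = lx + ly                # the multiply has become an add
    return antilog2_fixed(lxy)


print(log_multiply(12, 10), 12 * 10)
print(log_multiply(4095, 4095), 4095 * 4095)
```

Because each logarithm is rounded to a fixed number of fractional bits, the result is in general only approximate once the operands get large; comparing the two printed pairs shows how close it gets.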

The only issue is to get the log and antilog you'd have to wait for the hub to get to that COG.

Question: How many INSTRUCTION (4x) cycles does it take from one hub appearance to the next for a specific COG? Also, how many instructions does the HUB remain with a COG before moving on?

Thx
-D

Agent420 · 2009-09-01 15:59

Page 382

Page 24


webmasterpdx · 2009-09-01 17:58

A couple of good and bad points based on the manual. I had trouble downloading that manual. Eventually, I had to right click and download the link instead of trying to view it in my browser. I think there is a vista problem with wireless and big files in that in some situations it seems to hang when downloading files. Anyway, when I downloaded it I was able to store a local copy of the manual which seems to run fine now.

What I was very pleasantly surprised by was the fact that the Propeller allows one COG to begin its HUB operation while another COG can begin its HUB operation 2 cycles later (regardless of whether it takes 7 cycles to complete the first COG's operation). So basically, it's all pipelined, and every COG gets access to the hub to execute a hub instruction every 16 clocks, or 4 instruction cycles. It can also run 2 local instructions between each hub instruction without missing a beat. Pretty efficient. This means that almost no time is spent waiting on the hub.

OK, on to the multiply. Unfortunately, using logs won't work for a quick integer multiply as you have to take account of decimal exponents. It's OK for fixed point math, but for byte multiplies you need exact results typically.....so it's not the best way to proceed.

There are some ways to speed up multiply. e.g. You could use 4 bit multiplies using a 256 byte table. Then a byte multiply can be done quickly by combining nibbles together and using the result to look up that table, shifting values left 4 bits and adding these numbers together, etc...
I'll need to think on this. I like the new algorithms, but we can always use a faster algorithm....
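
A minimal sketch of the nibble-table idea, in Python for readability. The table layout (index = high nibble operand shifted left 4, OR the other operand) and the names are assumptions for illustration; a PASM version would need its own table placement and lookup code.

```python
# 256-entry table of 4-bit x 4-bit products; the largest entry is 15 * 15 = 225,
# so every product fits in one byte.
NIBBLE_PRODUCTS = bytes(a * b for a in range(16) for b in range(16))


def nibble_mul(a: int, b: int) -> int:
    """Look up the product of two 4-bit values."""
    return NIBBLE_PRODUCTS[(a << 4) | b]


def byte_multiply(x: int, y: int) -> int:
    """8-bit x 8-bit multiply built from four table lookups, shifts and adds."""
    if not (0 <= x <= 0xFF and 0 <= y <= 0xFF):
        raise ValueError("operands must fit in one byte")
    xh, xl = x >> 4, x & 0xF
    yh, yl = y >> 4, y & 0xF
    # (16*xh + xl) * (16*yh + yl) = 256*xh*yh + 16*(xh*yl + xl*yh) + xl*yl
    return ((nibble_mul(xh, yh) << 8)
            + ((nibble_mul(xh, yl) + nibble_mul(xl, yh)) << 4)
            + nibble_mul(xl, yl))


print(byte_multiply(200, 123), 200 * 123)
```

Unlike the log approach, this is exact for every pair of bytes, at the cost of four lookups per multiply.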

Thx
-D

Leon · 2009-09-01 19:34

An 18 pin 40 MIPS PIC24 could be used as a maths co-processor. It'll do a 17x17 bit multiply in one clock, and has fast divide hardware. They are cheap.

Leon

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Amateur radio callsign: G1HSM
Suzuki SV1000S motorcycle

Phil Pilgrim (PhiPi) · 2009-09-01 19:45

The problem with a coprocessor, of course, is the time required to get the arguments in and the results out, especially with a limited number of pins. My guess is that, compared with software multiply/divide, it's a push.

-Phil

Leon · 2009-09-01 20:19

The bigger 28 pin version has a full 16-bit port available with DMA. The main problem is how many pins can be made available on the Propeller. Some of the larger chips have an 8-bit bidirectional port intended for very fast inter-chip data transfers.

Leon

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Amateur radio callsign: G1HSM
Suzuki SV1000S motorcycle

webmasterpdx · 2009-09-01 21:40

The whole idea though is to be able to do all this in the one chip.
I know from my past that if you have a multiply you can use this to calculate divide quickly, so just a hardware multiply would have been nice. Maybe in the next rev

I'd like to see a byte multiply in under 5 instructions using a combination of lookups and algorithm, or a 16-bit multiply in under 12 instructions.
I need to look more at the log table to see if it could be used more efficiently....
-D

webmasterpdx · 2009-09-01 22:03

OK, here's an idea. I don't know the Propeller well enough yet to tell how to do this....but I suspect the exponent example on page 323 is not optimized at all.

To get log2 of a 32-bit integer, you first get the location of the MS bit and that gives you a value from 0-31 which is the integer part of the log2 function (fits in 5 bits). This is placed in bits 20-16 of the result (high 16 bits). Then you take the next 11 bits below the MS bit and use them as an index into the log table. The 16-bit value returned is placed in bits 15-0 (low 16 bits) of the result. And you are done! That's the log2(x).
The example they give seems to be way too long for that.
If there is an assembler instruction that tells you the MS set bit, or something like that, it could give you the integer part. The fractional part (to the right of the binary point) can be got simply by taking the 11 bits below the MS bit and indexing into the log table.
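
The recipe above can be sketched in Python as follows. This is an illustration under assumptions, not the manual's routine: the 2048-entry table is generated here with `math.log2` instead of being read from the Propeller's ROM, `int.bit_length()` stands in for the missing "find MS bit" instruction, and the names are made up.

```python
import math

# Assumed stand-in for the ROM log table: 2048 entries, each the 16-bit
# fraction of log2(1 + i/2048).
LOG_TABLE = [round(math.log2(1 + i / 2048) * 65536) for i in range(2048)]


def log2_table_lookup(x: int) -> int:
    """Return log2(x) with the integer part in bits 20-16 and fraction in bits 15-0."""
    if not 0 < x < (1 << 32):
        raise ValueError("x must be a non-zero 32-bit unsigned value")
    msb = x.bit_length() - 1                  # position of the MS set bit, 0..31
    aligned = (x << (31 - msb)) & 0xFFFFFFFF  # move the MS bit up to bit 31
    index = (aligned >> 20) & 0x7FF           # the 11 bits just below the MS bit
    return (msb << 16) | LOG_TABLE[index]


value = log2_table_lookup(3)
print(value >> 16, value & 0xFFFF)
```

Bits below the 11 used for the index are simply dropped, so two inputs that differ only in those low bits get the same logarithm.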

I suspect that with this, and with a quick way to do the calculations above, this way of doing multiply will give accurate values for byte calculations, and possibly up to 11 or 12 bits, with 100% accuracy.
The trick is doing this calculation very quickly.

Be interested in seeing:
1. How many bits it can do the calculation without any loss.
2. How many cycles (minimum) can it be done in.

-D

Kye · 2009-09-01 22:43

Ask cnesspilot (author of the Kenyan integer multiplication stuff). He's the one who was interested in very fast multiply code.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Nyamekye,

Phil Pilgrim (PhiPi) · 2009-09-01 22:44

The Prop I does not have an encode instruction (although, as with multiply, there's a place reserved in the instruction set for one). The Prop II will have hardware multiply, but I'm not sure about the encode instruction.

-Phil

webmasterpdx · 2009-09-02 03:49

I'm not sure who to ask, but a bit reverse operation and digit reverse might be very useful for fast addressing in FFT calculations (just something I remember from my old supercomputer days).
A digit reverse is where each pair of bits is taken as a digit and the address is reversed with that in mind (for radix 4 FFTs).
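
To make the two operations concrete, here is a minimal Python sketch of both. The function names and the loop-based approach are assumptions for illustration; it says nothing about how any Propeller instruction is implemented.

```python
def bit_reverse(value: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of value (radix-2 FFT addressing)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)     # move the lowest bit across
        value >>= 1
    return result


def digit_reverse_radix4(value: int, digits: int) -> int:
    """Reverse the lowest `digits` base-4 digits of value (radix-4 FFT addressing)."""
    result = 0
    for _ in range(digits):
        result = (result << 2) | (value & 0b11)  # move one 2-bit digit across
        value >>= 2
    return result


print(format(bit_reverse(0b1011, 4), "04b"))
print(format(digit_reverse_radix4(0b000110, 3), "06b"))
```

In the digit reverse, each pair of bits keeps its internal order; only the order of the pairs is reversed.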

-D

Mike Green · 2009-09-02 03:54

The Prop does have a bit reverse instruction that reverses a specified number of low order bits in the destination operand.

webmasterpdx · 2009-09-02 05:09

cool. I couldn't find that in the manual.

Thx
-D

Mike Green · 2009-09-02 05:13

The instruction is REV and the equivalent Spin operator is "><".

webmasterpdx · 2009-09-02 05:16

Page 163 under Bitwise Reverse.

Phil Pilgrim (PhiPi) · 2009-09-02 05:31

Donald,

The best chart of assembly instructions is in Chip's Propeller Guts (pdf) paper. The chart in the manual doesn't say what the instructions do; the one in Chip's paper does. ('Not sure why they left that out of the one in the manual. I refer to Chip's version constantly.)

-Phil
