Propeller II MAC instruction
BradC
Posts: 2,601
G'day Chip,
I've had a pretty good root around all the information currently available on the MAC instruction proposed for the new chip, and from what I've been able to ascertain it's a 16x16 bit multiply?
The reason for this post is to query the accumulator size. I've been making heavy use of the MAC on another small processor recently for digital audio filtering, and that chip has a 40-bit accumulator. I've found it absolutely essential to have the extra bits available to prevent overflow when using a Direct Form 1 biquad filter. Is the new chip going to have any facility for an accumulator > 32 bits?
The other "feature" that is really nice is selectable automatic writeback saturation, where you write the contents of the accumulator back to a user register and it can automatically saturate to that register size rather than truncating the high bits and causing a wraparound. This seems pretty essential when the accumulator is larger than the native word size.
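To make the difference concrete, here is a minimal Python sketch of the two writeback behaviours. The function names and the 40-bit accumulator / 32-bit register widths are assumptions chosen for illustration; this is not a description of any particular chip's instruction.

```python
def wrap_to_bits(value, bits):
    """Truncate to `bits` and reinterpret as two's complement (wraparound)."""
    mask = (1 << bits) - 1
    value &= mask
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


def saturate_to_bits(value, bits):
    """Clamp to the signed range representable in `bits`."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


# A value that fits a 40-bit accumulator but not a 32-bit register.
acc = 3 * (1 << 30)
wrapped = wrap_to_bits(acc, 32)        # sign flips: large positive becomes negative
saturated = saturate_to_bits(acc, 32)  # pinned at the largest positive value
print(wrapped, saturated)
```

With truncation the sample jumps to the opposite rail, which is far more audible than a clipped peak.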
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Life may be "too short", but it's the longest thing we ever do.
Comments
Oh, I'm not suggesting 16x16 is inadequate (although it's about the bare minimum you want for reasonable quality digital audio work), I'm just hoping the accumulator (the register that all MAC multiplies get added into) has an extension past 32 bits.
Let's say, for example, you have three MAC and two MSC instructions (a pretty basic implementation of a biquad filter).
Your first three MAC instructions (16x16) can each generate a 32-bit result, so their sum can overflow a 32-bit accumulator. Your next two MSC instructions (multiply and subtract) also generate 32-bit results. The end result is a number that will comfortably fit in a 32-bit register, but if your accumulator is only 32 bits you have lost information in the actual filter process, as your first couple of intermediate results have been truncated.
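A rough Python sketch of that three-MAC, two-MSC sequence follows. It assumes the accumulator clamps (saturates) after every step, which is my assumption for the 32-bit case; a purely wrapping two's-complement accumulator behaves differently. The worst-case product value and the helper names are also just for illustration.

```python
def saturate(value, bits):
    """Clamp to the signed range representable in `bits`."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def accumulate(products, signs, acc_bits):
    """Add (MAC, sign +1) or subtract (MSC, sign -1) each product in turn."""
    acc = 0
    for product, sign in zip(products, signs):
        acc = saturate(acc + sign * product, acc_bits)
    return acc


# Largest 16x16 signed product: (-32768) * (-32768) = 2**30.
worst = (-32768) * (-32768)
products = [worst] * 5
signs = [+1, +1, +1, -1, -1]   # three MACs followed by two MSCs

exact = sum(s * p for p, s in zip(products, signs))
acc32 = accumulate(products, signs, 32)
acc40 = accumulate(products, signs, 40)
print(exact, acc32, acc40)
```

The exact answer fits in 32 bits, but only the 40-bit accumulator arrives at it, because the running sum passes through values that need 33 bits.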
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Life may be "too short", but it's the longest thing we ever do.
Leon
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Amateur radio callsign: G1HSM
Is there an inexpensive way to get started playing with the XMOS chips?
Do you get paid to constantly plug XMOS when unsolicited?
Doug
SFE has a nice little XS1-L1-64 XMOS board for $49.
I use both Propeller and XMOS chips, depending on the application. They don't really compete with each other, they are intended for completely different markets.
Leon
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Amateur radio callsign: G1HSM
Post Edited (Leon) : 1/11/2010 5:25:25 AM GMT
There is an XMOS forum (http://www.xmoslinkers.org/) but it doesn't seem to get much traffic. I guess the XMOSers spend most of their time on forums like this one.
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Catalina - a FREE C compiler for the Propeller - see Catalina
www.xcore.com
A 32-bit MAC with a 32-bit result, as is available on some ARM chips, would be a nice compromise.
Leon
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Amateur radio callsign: G1HSM
Post Edited (Leon) : 1/11/2010 5:41:50 AM GMT
It's telling that over Christmas and New Year this forum was humming with activity but over there it was, well, quiet.
Now the thing is that when we had those long threads about new Prop II features I would sometimes suggest features that were included in the Inmos Transputer of the 1980s, which was a very "out of the box" design for its time. The Transputer is dead and gone but now, decades later, parallelism in embedded systems is back in the form of the Propeller and the XMOS. That's why that name comes up here from time to time.
I still think there are some ideas Prop II could adapt from the Transputer, and now from XMOS, which continues in its footsteps.
I also believe Chip can teach XMOS a thing or two about making a usable device :)
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
cmapspublic3.ihmc.us:80/servlet/SBReadResourceServlet?rid=1181572927203_421963583_5511&partName=htmltext
Hello Rest Of The World
Hello Debris
Install a propeller and blow them away
Exactly.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
sends the value of expression over a channel to another process.
receives a value from the channel and stores it in the variable.
It would be damn nice to have such high speed communication between COGs sometimes. No going through HUB RAM.
I'm sure the Prop's idea of locks could be extended into channels. Two COGs wanting to communicate would "check out" a channel, rather like they do locks now, and then swap longs through it.
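As a thought experiment only, the check-out-and-swap idea can be modelled in Python with threads standing in for COGs. Everything here is hypothetical: the `ChannelPool` class, its method names and the pool size of 8 are invented for the sketch, and the one-slot queue is a small buffer rather than a true rendezvous.

```python
import queue
import threading


class ChannelPool:
    """Hypothetical pool of channels, checked out by ID much like locks."""

    def __init__(self, count=8):
        self._free = list(range(count))
        self._channels = [queue.Queue(maxsize=1) for _ in range(count)]
        self._guard = threading.Lock()

    def checkout(self):
        """Return a free channel ID, or -1 if none is available."""
        with self._guard:
            return self._free.pop(0) if self._free else -1

    def release(self, chan_id):
        """Hand the channel ID back to the pool."""
        with self._guard:
            self._free.append(chan_id)

    def send(self, chan_id, value):
        # Blocks while the previous long has not been taken yet.
        self._channels[chan_id].put(value & 0xFFFFFFFF)

    def receive(self, chan_id):
        # Blocks until the sender has supplied a long.
        return self._channels[chan_id].get()


pool = ChannelPool()
chan = pool.checkout()
received = []


def producer():
    for value in (1, 2, 3):
        pool.send(chan, value)


def consumer():
    for _ in range(3):
        received.append(pool.receive(chan))


threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
pool.release(chan)
print(received)
```

The point of the model is just that neither side touches shared memory directly; each blocks until the other end is ready.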
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
Look, don't get me wrong, I'd wet my pants over a 32x32 MAC. Even a 24x24 (which is what you get on the ColdFire, with a 56-bit accumulator) would make me really, really excited and completely remove the need for me to be even looking at outboard DSP chips. Having said that, I'm managing with 16x16 on the dsPIC at the moment (although the low resolution does make you jump through hoops with regard to noise shaping).
The real beauty of the PIC MAC is it can do a MAC, 2 pre-fetches and a saturated writeback in every instruction cycle. That makes filtering pretty damn efficient, even if you do have to jump through hoops to save intermediate results for noise shaping.
If the XMOS does a 32x32 multiply and only has a 64-bit accumulator, I'd reckon that was a pretty poor design decision.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Life may be "too short", but it's the longest thing we ever do.
Speaking as an old 8 biter, 32 bits seems huge and 64 bits unimaginably so.
Now maybe I'm missing a point but when dealing with any real world quantities, like audio samples, 24 bits is the limit of sensible resolution nowadays. So isn't multiplying 24 by 24 into 48 and accumulating into 64 bits enough?
"They" may have made a poor design choice, on the other hand "They" seem to be doing quite well at handling multiple streams of 24bit audio, filtering etc etc.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
24x24 into 64 would be great. If you are using 16x16 into 32 you may as well not actually have a MAC instruction and just do multiply and multi-word addition separately.
16x16 into 40 would do the job.
It comes down to the resolution of the filtering you are doing: how close the poles are determines the size of the coefficients. 16 bit is really Q15 + sign, so your coefficients have only 15 bits of magnitude. As an example, I'm doing some work at ~78 kHz. Using a standard Butterworth LPF biquad, I run out of resolution below about 600 Hz when I set my coefficients up. Again, because of the low resolution, the coefficients need to be < 1. Where I have a coefficient > 1, I need to use multiple MAC instructions to accommodate the extra gain.
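To put rough numbers on that, here is a Python sketch that computes a Butterworth low-pass biquad at 600 Hz with a 78 kHz sample rate and rounds it to Q15. The coefficient formulas are the widely used "audio EQ cookbook" ones, which is an assumption on my part about how the coefficients were derived; the helper names are invented for the example.

```python
import math


def butterworth_lpf_biquad(fc, fs):
    """Second-order Butterworth low-pass (cookbook form), normalised so a0 = 1."""
    w0 = 2.0 * math.pi * fc / fs
    q = 1.0 / math.sqrt(2.0)            # Butterworth Q
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0 = (1.0 - math.cos(w0)) / 2.0 / a0
    b1 = (1.0 - math.cos(w0)) / a0
    b2 = b0
    a1 = -2.0 * math.cos(w0) / a0
    a2 = (1.0 - alpha) / a0
    return b0, b1, b2, a1, a2


def to_q15(x):
    """Round to Q15; raises if x is outside the representable range."""
    value = round(x * 32768)
    if not -32768 <= value <= 32767:
        raise OverflowError(f"{x} does not fit in Q15")
    return value


b0, b1, b2, a1, a2 = butterworth_lpf_biquad(600.0, 78000.0)
b0_q15 = to_q15(b0)            # only a handful of LSBs survive
a1_half_q15 = to_q15(a1 / 2)   # |a1| > 1: store half, issue the MAC twice
print(b0_q15, a1_half_q15)
```

The feed-forward coefficient ends up only a few LSBs wide, and the a1 term has a magnitude near 2, so it has to be halved and applied with two MACs.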
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Life may be "too short", but it's the longest thing we ever do.
Given that 16 bits is a signal-to-noise ratio of 98 dB and that we are working with 24 bits accumulating into 64, I would have guessed that any noise introduced by the filtering calcs was nigh on invisible. At least in audio work.
One day I might want to pick your brains re: a 3-way digital crossover algorithm for active loudspeakers that I have had running on a PC for some time. I had hoped it would be doable on the Prop so as to make a small stand-alone unit. As I say, I know nothing of DSP; I just wrote my best simulation of a famous opamp-based design and it works very well. It's a surprisingly simple and short piece of code. Just need to get away from floating point to fixed point :)
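For reference, the 98 dB figure matches the usual textbook estimate for an ideal quantiser driven by a full-scale sine wave, roughly 6.02 dB per bit plus 1.76 dB. A short Python check, with the function name being my own:

```python
import math


def ideal_snr_db(bits):
    """SNR of an ideal `bits`-bit quantiser for a full-scale sine wave."""
    return 20.0 * math.log10(2.0 ** bits) + 10.0 * math.log10(1.5)


snr16 = ideal_snr_db(16)
snr24 = ideal_snr_db(24)
print(round(snr16, 1), round(snr24, 1))
```

This only covers the sample word length; it says nothing about error introduced by coefficient quantisation inside the filter.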
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
Leon
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Amateur radio callsign: G1HSM
And then you lose those extra bits off to the right causing quantization errors and more noise.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Life may be "too short", but it's the longest thing we ever do.
I wrote quite an extensive post from here down, but deleted it as the original question stands. No point in any more speculation.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Life may be "too short", but it's the longest thing we ever do.
The problem with the opening question is that every one of the 512 registers in a Cog is considered an accumulator in the general microprocessor sense, and each is 32 bits wide. Changing that is a big ask.
That said, I guess that to get the best performance, Chip will possibly be adding a special non-addressable register just for the MAC's accumulation. In which case there is nothing stopping it even being 64 bits. :)
Not really. If you do what the dsPIC does with its standard multiply results, you use any destination of a multiply as 2 paired registers. You must specify an even register for the destination, and the destination+1 automatically becomes its partner. Now, you only need logic to add or subtract to the destination. Destination +1 is simply the carried result of the destination arithmetic. Sure 64 bits is overkill, where 40 would do, but doing it this way just uses pairs of registers. I'd have thought the logic to be a bit simpler.
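Here is a minimal Python sketch of that paired-register idea: the low word takes the add, and the high word just absorbs the carry and the sign extension of the product. The register images are modelled as unsigned 32-bit integers and the function names are made up for the example; it is not meant to describe actual dsPIC or Propeller hardware.

```python
MASK32 = 0xFFFFFFFF


def mac_pair(lo, hi, a, b):
    """Accumulate the signed 16x16 product a*b into the register pair (lo, hi)."""
    product = a * b                        # always fits in 32 signed bits
    p_lo = product & MASK32                # low word, two's complement
    p_hi = MASK32 if product < 0 else 0    # sign-extension word
    total = lo + p_lo
    carry = total >> 32                    # 0 or 1
    lo = total & MASK32
    hi = (hi + p_hi + carry) & MASK32
    return lo, hi


def pair_to_int(lo, hi):
    """Interpret the pair as one signed 64-bit value."""
    value = (hi << 32) | lo
    return value - (1 << 64) if value & (1 << 63) else value


lo, hi = 0, 0
for _ in range(5):
    lo, hi = mac_pair(lo, hi, -32768, -32768)   # five worst-case products
total_positive = pair_to_int(lo, hi)

neg_lo, neg_hi = mac_pair(0, 0, -32768, 32767)
total_negative = pair_to_int(neg_lo, neg_hi)
print(total_positive, total_negative)
```

Five worst-case products push a carry into the high register without any special accumulator, which is the behaviour being argued for.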
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Life may be "too short", but it's the longest thing we ever do.
The max value in the move instruction could also be used for the final shifter/truncator.