jmg was suggesting there is potentially a power saving on a per counter basis when compared to having them in the Cogs. So, 32 counters on a pin-pair basis should be lighter than the 32 counters we currently have across the 16 Cogs.
In the end, we'll likely have hardware multi-tasking back, too. Sorry for all the alarm.
No, no, no! Don't go saying things like that!
No, I'm not opposed to hardware multitasking being added back in. (Well, I am, but for other reasons.) I'm opposed to you making statements like this! Be ruthless in what you cut out (for now). Get something working. And then add features back in if you want. But don't say you'll likely add something back in until you are ready to actually add it. Otherwise, all you're doing is putting that pressure back on yourself!
jmg was suggesting there is potentially a power saving on a per counter basis when compared to having them in the Cogs. So, 32 counters on a pin-pair basis should be lighter than the 32 counters we currently have across the 16 Cogs.
I don't understand the logic in that. The counters should consume the same amount of power independent of where they are located. I think putting them at the pins is just a way to move some of the power dissipation away from the cogs.
The problem with deciding on whether hubex and/or multi-tasking should be in is that there is no data about how much real-estate they take, or how much power they dissipate. We know that it's possible to implement both of them, since they exist in the P2 design. If the size and power dissipation figures were known, a solid decision could be made on whether to include them. Without those figures the decision must be made on some ad hoc basis.
I am also in favor of not asking Chip to bother with multi-tasking for the reasons cited above and a few others.
Propeller 2 (that's what I'm calling it for now, unless we decide on Propeller 16 or another name) will be used by lots of inventors and entrepreneurs, and their specialty will often lie in fields other than embedded design. The capabilities and needs of these customers are very different from those of the forum members who contribute to this thread. They might be specialists in renewable energy, medical, robotics, aeronautics, environmental measurements, etc. We will likely have ten thousand customers using 100 to 1000 units each, plus a number of big wins. This particular customer can be overwhelmed by design considerations and possibilities, sometimes so much that they think this chip isn't right for their project because it offers so much. Telling them that "you've got 16 cores to use here - no need to worry about interrupts" will take care of 98% of our users for now. But when you further explain that each core has even more capabilities, like multitasking, then they really start to wonder how the heck they'll architect their program and the discussion will quickly diverge from the joy of understanding a simple design to wondering how it all works. Tracking program flow would be a challenge for them.
When Propeller 1 first came out we did a seminar for our European distributors. These people are technical buyers, representing electronic distribution companies. They can hook it up, load and modify our code examples, and have enough skill to show their customers what a Propeller can do. For two days we went through the Propeller architecture, far too deep for most of them. While multitasking may not be new to a crowd like this, I can imagine it being too much when combined with multicore. They already have trouble envisioning and accounting for the fact any core can write to any pin, how a system clock is accessed, etc. There's no need to complicate something that we expect people to sell.
And all the time adds up, too. I have no idea how much design time it might take, but I'm sure it's a week or more to bring P3 Verilog into P2. And it'll take a week or more for Jeff to document in the data sheet. Drawings, explanations, sample code - it could quickly add up to another month.
I don't know what I don't know, and perhaps that's all I know for certain. This is my take and for now I'm sticking to it. Don't mistake me for a wet blanket, as I only believe there's a lot of sense in optimizing what we've got and refusing as much temptation as possible for now. There's also a very strong business case for making frequent iterations and design improvements, which could be every year or two in our case.
Glad to hear you say all that, Ken. Most of it seems spot on.
Many of us have been campaigning for simplicity for a long while. Imagine having to explain those 500 opcodes of the previous P2 incarnation to those European distributors. You would be in Europe for a lot longer than 2 days!
We all want silicon, now!
I do like the idea of frequent iterations and improvements.
Is anyone going to be disappointed if we get rid of hardware multitasking in the cogs?
It's a cool feature, but does introduce jitter in tasks, depending on the instruction mix. It also takes some extra flops and logic to support properly, beyond the Z/C/PC's.
I'm thinking about the ROM_Monitor and realizing that I could code it with a single task by doing cooperative multitasking at a few time-critical points in the program. I wouldn't need hardware multitasking, after all.
No multitasking would keep the cogs very simple to understand, and keep them deterministic.
Any thoughts?
Oh, I like the way this is (or at least was) going.
I suspect that most folks, not having used co-operative multitasking, don't realize how fast and simple it is... once you have an effective scheduler up and running.
In following the banter here and reading comments about why the co-operative approach is bad, it just seems to me that the approach is poorly understood. From the comments, most seem to have it backwards.
So, while I would probably make some use of a hardware approach if it existed, it certainly is not a "required" feature.
And with the new instructions, I expect the P1 style scheduler's performance will be significantly faster. I'm really looking forward to making one.
So I vote for dropping the hardware approach as I believe it does not add a lot of value.
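For anyone who has not used the co-operative approach, here is a minimal sketch of the idea. It is written in Python rather than PASM, and it is only an illustration, not the P1-style scheduler mentioned above: each task runs a small unit of work and then hands control back voluntarily (the `yield` stands in for a task-switch jump). The names `blinker` and `run_round_robin` are made up for this example.

```python
from collections import deque

def blinker(name, count, log):
    """A task that does one small unit of work, then yields control."""
    for i in range(count):
        log.append((name, i))   # stand-in for toggling a pin
        yield                   # cooperative hand-off point

def run_round_robin(tasks):
    """Run generator-based tasks in turn until all of them have finished."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run until the task's next yield
        except StopIteration:
            continue            # task finished, so drop it
        ready.append(task)      # otherwise queue it for another turn

log = []
run_round_robin([blinker("A", 2, log), blinker("B", 3, log)])
print(log)
```

The tasks interleave strictly in the order they yield, so timing depends entirely on how much work each task does between hand-offs.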
Nope, we don't have it backwards re: cooperative vs hardware scheduling. Certainly not if you are also including a software scheduler into the mix (as opposed to the coroutines you find in FullDuplexSerial for example)
Not that I'm saying hardware scheduling is essential for this chip.
I always thought that hardware multitasking made things unnecessarily complicated. IIRC, some of the coding required to make it work seemed rather absurd (that may have been cleaned up after I lost interest).
I am in favor of making the core logic simple. I like the idea of Hubex, but I don't like the idea of hubex and multiple tasks.
I specifically recommended against this because I know all the baggage it brings with it. It begets feature creep and makes the elegant design ugly. I really didn't like the look of the P2 after Hubex was added because it lost all of its elegance.
The chip needs to have as few "rules" as possible. Tasking creates more "rules". Hubex creates more "rules". Multiple tasks with hubex creates yet more compound "rules".
The P1 is simple, there are few rules, basically they fall into "hub instruction" and "not hub instruction". The difference is that one takes 8-22 clocks and the other takes 4 clocks.
The P-X needs to be simple. If Hubex exists, all instructions run from Hub need to take a certain number of clocks, whether 8 or 16 or 4. Multi-tasking was an artifact of the pipeline structure of the P2, the P-X isn't pipelined, so I don't think it should have tasks.
Keep the video simple, but improved over the P1.
Keep the counters working like the P1, with PHSA and PHSB, etc. We like to have the ability to write to them in realtime to synthesize FM output. I wrote an FM transmitter program that could broadcast audio at FM broadcast frequencies. Having actual FM in addition to AM output to the pins is useful and is "free" in the sense that the counter just toggles a pin and doesn't have to communicate a value to the pin I/O circuit.
Above all else, add simplicity, not complexity. I really like the idea of "Hub" peripherals, because they are space efficient and give you a few dedicated building blocks to make stuff. Most importantly, they work in a simpler fashion, rather than multiplying the kitchen sink by n number of Cogs. The P2 rationale was to push everything into the COGs to avoid clock issues, but that just made an overly complex COG.
Hub exec IS going to be in the next chip. I've just been stalled out over the last few days getting the new instruction set nailed down. At times, all the details seem overwhelming and I think about paring it down, just to get it going again. This new memory scheme (dual-port 128x128 bits, instead of quad-port 512x32) changes a lot of things. It's hard for me to get there in one step. Asking you guys how you'd feel about dropping certain features alleviates the pressure on me. In the end, we'll likely have hardware multi-tasking back, too. Sorry for all the alarm.
Great. This 'divide and conquer' sounds an ideal way to proceed.
You can also get MHz number indicators from any Builds you do along the way, to map the impact of the changes.
jmg was suggesting there is potentially a power saving on a per counter basis when compared to having them in the Cogs. So, 32 counters on a pin-pair basis should be lighter than the 32 counters we currently have across the 16 Cogs.
Power is Cpd * Ft * Vcc^2, where Cpd is the sum of the Register plus routing Loads.
So identical sized routing will have identical powers.
The scope I see for power saving in a PinCell, is that the routing tools can focus just on that locally, whilst the COG counter will have to juggle for space with all the other critical paths in the COG during the Autoroute.
Because the PinCell is not large, there may even be some manual layout assist possible, especially if needed to meet MHz targets (and usually a smaller cell results).
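To put rough numbers on the formula above, here is a small Python sketch. Every capacitance, frequency and voltage value in it is a made-up placeholder, not a figure from any Propeller design; the point is only that identical Cpd gives identical power, and that a counter with a smaller routing load comes out lighter.

```python
def dynamic_power_watts(cpd_farads, toggle_hz, vcc_volts):
    """Dynamic power as given above: P = Cpd * Ft * Vcc^2."""
    return cpd_farads * toggle_hz * vcc_volts ** 2

# Hypothetical values, for illustration only.
REGISTER_LOAD = 2.0e-12     # farads, register load of one counter
COG_ROUTING = 3.0e-12       # farads, routing load when placed inside a cog
PINCELL_ROUTING = 1.5e-12   # farads, routing load when placed in a pin cell
FT = 100e6                  # hertz, toggle frequency
VCC = 1.8                   # volts

# Cpd is the sum of the register load and the routing load.
cog_counter = dynamic_power_watts(REGISTER_LOAD + COG_ROUTING, FT, VCC)
pin_counter = dynamic_power_watts(REGISTER_LOAD + PINCELL_ROUTING, FT, VCC)

print(f"cog counter: {cog_counter * 1e3:.3f} mW")
print(f"pin counter: {pin_counter * 1e3:.3f} mW")
```

If the two routing loads are set equal, the two results are equal, which is the "identical sized routing will have identical powers" case.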
Keep the counters working like the P1, with PHSA and PHSB, etc. We like to have the ability to write to them in realtime to synthesize FM output. I wrote an FM transmitter program that could broadcast audio at FM broadcast frequencies. Having actual FM in addition to AM output to the pins is useful and is "free" in the sense that the counter just toggles a pin and doesn't have to communicate a value to the pin I/O circuit.
There is a backward-compatible case for COG counters, but IIRC the design flow process means the PLL is not there (Chip can confirm?), so a P1+ COG counter, will be a subset of things possible on a P1.
Also, having the wide-adder not in the Pin Cell, can make the PinCell a little smaller and faster, but at the cost of a larger overall Logic area, from the duplicated counters.
So we don't establish a new term how about naming them "standard" I/O for now if they're not all "smart" on the FPGA, if it is the case that they don't have the same characteristics?
"Standard" can mean many things, and even a Basic Logic I/O on a FPGA platform is not going to be the same as the Final Pin designs, (of any pins without Counters).
Looks good...hubexec at 50% should be just fine...it will give people incentive to think creatively in using cog memory for faster code:) Plus the option of using hubexec.
No, I'm not opposed to hardware multitasking being added back in. (Well, I am, but for other reasons.) I'm opposed to you making statements like this! Be ruthless in what you cut out (for now). Get something working. And then add features back in if you want. But don't say you'll likely add something back in until you are ready to actually add it. Otherwise, all you're doing is putting that pressure back on yourself!
I agree. If Hubexec is available it will get used. Ditto for multitasking.
But if they are not, there are simple software alternatives that can achieve most of the benefits. The omission of these hardware features won't seriously impact on most people, despite all the cries of woe and despondency you tend to get in these threads when someone's "favorite" feature appears to be under threat.
I think Chip's best course of action would be to take a week or two away from the forums and sort these things out before making any more announcements about this stuff.
Comments
I am also in favor of not asking Chip to bother with multi-tasking for the reasons cited above and a few others.
Propeller 2 (that's what I'm calling it for now, unless we decide on Propeller 16 or another name) will be used by lots of inventors and entrepreneurs, and their specialty will often lie in fields other than embedded design. The capabilities and needs of these customers are very different from those of the forum members who contribute to this thread. They might be specialists in renewable energy, medical, robotics, aeronautics, environmental measurements, etc. We will likely have ten thousand customers using 100 to 1000 units each, plus a number of big wins. This particular customer can be overwhelmed by design considerations and possibilities, sometimes so much that they think this chip isn't right for their project because it offers so much. Telling them that "you've got 16 cores to use here - no need to worry about interrupts" will take care of 98% of our users for now. But when you further explain that each core has even more capabilities, like multitasking, then they really start to wonder how the heck they'll architect their program and the discussion will quickly diverge from the joy of understanding a simple design to wondering how it all works. Tracking program flow would be a challenge for them.
When Propeller 1 first came out we did a seminar for our European distributors. These people are technical buyers, representing electronic distribution companies. They can hook it up, load and modify our code examples, and have enough skill to show their customers what a Propeller can do. For two days we went through the Propeller architecture, far too deep for most of them. While multitasking may not be new to a crowd like this, I can imagine it being too much when combined with multicore. They already have trouble envisioning and accounting for the fact any core can write to any pin, how a system clock is accessed, etc. There's no need to complicate something that we expect people to sell.
And all the time adds up, too. I have no idea how much design time it might take, but I'm sure it's a week or more to bring P3 Verilog into P2. And it'll take a week or more for Jeff to document in the data sheet. Drawings, explanations, sample code - it could quickly add up to another month.
I don't know what I don't know, and perhaps that's all I know for certain. This is my take and for now I'm sticking to it. Don't mistake me for a wet blanket, as I only believe there's a lot of sense in optimizing what we've got and refusing as much temptation as possible for now. There's also a very strong business case for making frequent iterations and design improvements, which could be every year or two in our case.
Ken Gracey
Oh, I like the way this is (or at least was) going.
I suspect that most folks, not having used co-operative multitasking, don't realize how fast and simple it is... once you have an effective scheduler up and running.
In following the banter here and reading comments about why the co-operative approach is bad, it just seems to me that the approach is poorly understood. From the comments, most seem to have it backwards.
So, while I would probably make some use of a hardware approach if it existed, it certainly is not a "required" feature.
And with the new instructions, I expect the P1 style scheduler's performance will be significantly faster. I'm really looking forward to making one.
So I vote for dropping the hardware approach as I believe it does not add a lot of value.
Cheers,
Peter (pjv)
If Chip puts in tasks, great. It would be great for packing up to drivers into a cog.
If he does not, that is his choice.
I've used co-operating multitasking many times... over many decades.
It is far inferior to hardware tasks. (ask XMOS <grin>)
It saves memory, it makes timing high speed signals far easier. It does not need a scheduler.
I will readily grant you that for low speed signals, toggling a ton of LEDs etc, cooperative is all you need.
But.
Let's take practical examples.
P1+ style cog as discussed: 100 MIPS (for simple two-clock-cycle instructions).
Hardware tasks, interleaved every instruction:
- 25 MIPS per task
- LED toggling test: 25M toggles (XOR) for each task
- bit-banged serial, half duplex: >5 Mbps
Co-operative version?
- LED toggling: half the performance (XOR, JMPSW), 12.5M toggles
- bit-banged serial: best guess, as the interleaving has to happen while waiting for the start bit edge and at every bit cell thereafter, ~2 Mbps max
For high speed signals,
- tasks give you a roughly 2:1 speed advantage
- much easier to write code for
- uses less cog memory
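The LED-toggling figures above can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: the task count of 4 is inferred from the 100 MIPS / 25 MIPS per task figures, the co-operative loop is assumed to cost exactly one extra instruction (the JMPSW) per toggle, and the function names are invented for this example. The serial estimates are not modelled.

```python
COG_MIPS = 100.0   # from the post: simple two-clock instructions
TASKS = 4          # assumption, implied by 100 MIPS / 25 MIPS per task

def hardware_toggles_per_task(cog_mips, tasks, instrs_per_toggle=1):
    """Hardware interleaving: each task gets an equal share of the MIPS."""
    return cog_mips / tasks / instrs_per_toggle

def cooperative_toggles_per_task(cog_mips, tasks, instrs_per_toggle=1,
                                 switch_instrs=1):
    """Co-operative: every toggle also pays for the task-switch instruction."""
    return cog_mips / tasks / (instrs_per_toggle + switch_instrs)

hw = hardware_toggles_per_task(COG_MIPS, TASKS)       # XOR only
coop = cooperative_toggles_per_task(COG_MIPS, TASKS)  # XOR + JMPSW

print(f"hardware tasks: {hw} M toggles/s per task")
print(f"co-operative:   {coop} M toggles/s per task")
print(f"ratio:          {hw / coop}")
```

With more useful work between switches (a larger `instrs_per_toggle`), the switch overhead is amortised and the ratio moves closer to 1, which is why the gap matters most for tight, high-speed loops.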
As for people will not understand it / too complex... they don't have to use it if they don't want to, heck co-operative threads are still possible.
Bottom line:
It is up to Chip
KISS!
-Phil
I mostly stopped following this forum too, after there was so much discussion about tasks. I only regained interest when hub exec was added.
Having a FPGA mix of Standard Logic and smarter CounterCells makes sense, and improves test coverages from a finite FPGA.
"Standard" can mean many things, and even a Basic Logic I/O on a FPGA platform is not going to be the same as the Final Pin designs, (of any pins without Counters).
For my applications... No.
New Propeller Chip - 16 April 2014

200MHz system clock
16 cogs with 2-clock instructions, hub execution at 50% cog speed
512KB hub memory with 8/16/32/128 bit cog transfers
64 smart I/O pins
100-pin 14x14mm TQFP with exposed thermal GND pad

-- addressable cog registers
--
-- addr     read    write       name    hidden
-- -----------------------------------------------------------------------
--
-- 000-1EF  RAM     RAM
--
-- 1F0      CNT     -           CNT     ICACHE0
-- 1F1      RND     -           RND     ICACHE0
-- 1F2      INA     -           INA     ICACHE0
-- 1F3      INB     -           INB     ICACHE0
-- 1F4      RAM     RAM+OUTA    OUTA
-- 1F5      RAM     RAM+OUTB    OUTB
-- 1F6      RAM     RAM+DIRA    DIRA
-- 1F7      RAM     RAM+DIRB    DIRB
-- 1F8      RAM     RAM+CTRA    CTRA
-- 1F9      RAM     RAM+CTRB    CTRB
-- 1FA      RAM     RAM+FRQA    FRQA
-- 1FB      RAM     RAM+FRQB    FRQB
-- 1FC      PHSA    PHSA        PHSA    ICACHE1
-- 1FD      PHSB    PHSB        PHSB    ICACHE1
-- 1FE      PTRA    PTRA        PTRA    ICACHE1
-- 1FF      PTRB    PTRB        PTRB    ICACHE1

ZCDS (for D column: W=write, M=modify, R=read, L=read/immediate)
----------------------------------------------------------------------------------------------------------------------
ZCWS 0000000 ZC I CCCC DDDDDDDDD SSSSSSSSS  RDBYTE  D,S/PTRA/PTRB    (waits for hub)
ZCWS 0000001 ZC I CCCC DDDDDDDDD SSSSSSSSS  RDWORD  D,S/PTRA/PTRB    (waits for hub)
ZCWS 0000010 ZC I CCCC DDDDDDDDD SSSSSSSSS  RDLONG  D,S/PTRA/PTRB    (waits for hub)
ZCWS 0000011 ZC I CCCC DDDDDDDDD SSSSSSSSS  RDQUAD  D,S/PTRA/PTRB    (waits for hub)
ZCMS 0000100 ZC I CCCC DDDDDDDDD SSSSSSSSS  SYSOP   D,S/#            (waits for hub, S/# determines four write-long enables)
ZCWS 0000101 ZC I CCCC DDDDDDDDD SSSSSSSSS  MSGIN   D,S/#            (receives message on pin, C=timeout)
ZCMS 0000110 ZC I CCCC DDDDDDDDD SSSSSSSSS  MUL     D,S/#            multiplier (16 x 16 unsigned multiply)
ZCMS 0000111 ZC I CCCC DDDDDDDDD SSSSSSSSS  MULS    D,S/#            multiplier (16 x 16 signed multiply)
ZCMS 0001000 ZC I CCCC DDDDDDDDD SSSSSSSSS  ISOB    D,S/#            bitop
ZCMS 0001001 ZC I CCCC DDDDDDDDD SSSSSSSSS  NOTB    D,S/#            bitop
ZCMS 0001010 ZC I CCCC DDDDDDDDD SSSSSSSSS  CLRB    D,S/#            bitop
ZCMS 0001011 ZC I CCCC DDDDDDDDD SSSSSSSSS  SETB    D,S/#            bitop
ZCMS 0001100 ZC I CCCC DDDDDDDDD SSSSSSSSS  SETBC   D,S/#            bitop
ZCMS 0001101 ZC I CCCC DDDDDDDDD SSSSSSSSS  SETBNC  D,S/#            bitop
ZCMS 0001110 ZC I CCCC DDDDDDDDD SSSSSSSSS  SETBZ   D,S/#            bitop
ZCMS 0001111 ZC I CCCC DDDDDDDDD SSSSSSSSS  SETBNZ  D,S/#            bitop
ZCMS 0010000 ZC I CCCC DDDDDDDDD SSSSSSSSS  ANDN    D,S/#            logic
ZCMS 0010001 ZC I CCCC DDDDDDDDD SSSSSSSSS  AND     D,S/#            logic
ZCMS 0010010 ZC I CCCC DDDDDDDDD SSSSSSSSS  OR      D,S/#            logic
ZCMS 0010011 ZC I CCCC DDDDDDDDD SSSSSSSSS  XOR     D,S/#            logic
ZCMS 0010100 ZC I CCCC DDDDDDDDD SSSSSSSSS  MUXC    D,S/#            logic
ZCMS 0010101 ZC I CCCC DDDDDDDDD SSSSSSSSS  MUXNC   D,S/#            logic
ZCMS 0010110 ZC I CCCC DDDDDDDDD SSSSSSSSS  MUXZ    D,S/#            logic
ZCMS 0010111 ZC I CCCC DDDDDDDDD SSSSSSSSS  MUXNZ   D,S/#            logic
ZCMS 0011000 ZC I CCCC DDDDDDDDD SSSSSSSSS  ROR     D,S/#            rotator
ZCMS 0011001 ZC I CCCC DDDDDDDDD SSSSSSSSS  ROL     D,S/#            rotator
ZCMS 0011010 ZC I CCCC DDDDDDDDD SSSSSSSSS  SHR     D,S/#            rotator
ZCMS 0011011 ZC I CCCC DDDDDDDDD SSSSSSSSS  SHL     D,S/#            rotator
ZCMS 0011100 ZC I CCCC DDDDDDDDD SSSSSSSSS  RCR     D,S/#            rotator
ZCMS 0011101 ZC I CCCC DDDDDDDDD SSSSSSSSS  RCL     D,S/#            rotator
ZCMS 0011110 ZC I CCCC DDDDDDDDD SSSSSSSSS  SAR     D,S/#            rotator
ZCMS 0011111 ZC I CCCC DDDDDDDDD SSSSSSSSS  REV     D,S/#            rotator
ZCWS 0100000 ZC I CCCC DDDDDDDDD SSSSSSSSS  MOV     D,S/#            adder
ZCWS 0100001 ZC I CCCC DDDDDDDDD SSSSSSSSS  ABS     D,S/#            adder
ZCWS 0100010 ZC I CCCC DDDDDDDDD SSSSSSSSS  ABSNEG  D,S/#            adder
ZCWS 0100011 ZC I CCCC DDDDDDDDD SSSSSSSSS  NEG     D,S/#            adder
ZCWS 0100100 ZC I CCCC DDDDDDDDD SSSSSSSSS  NEGC    D,S/#            adder
ZCWS 0100101 ZC I CCCC DDDDDDDDD SSSSSSSSS  NEGNC   D,S/#            adder
ZCWS 0100110 ZC I CCCC DDDDDDDDD SSSSSSSSS  NEGZ    D,S/#            adder
ZCWS 0100111 ZC I CCCC DDDDDDDDD SSSSSSSSS  NEGNZ   D,S/#            adder
ZCMS 0101000 ZC I CCCC DDDDDDDDD SSSSSSSSS  MIN     D,S/#            adder
ZCMS 0101001 ZC I CCCC DDDDDDDDD SSSSSSSSS  MAX     D,S/#            adder
ZCMS 0101010 ZC I CCCC DDDDDDDDD SSSSSSSSS  MINS    D,S/#            adder
ZCMS 0101011 ZC I CCCC DDDDDDDDD SSSSSSSSS  MAXS    D,S/#            adder
ZCMS 0101100 ZC I CCCC DDDDDDDDD SSSSSSSSS  SUMC    D,S/#            adder
ZCMS 0101101 ZC I CCCC DDDDDDDDD SSSSSSSSS  SUMNC   D,S/#            adder
ZCMS 0101110 ZC I CCCC DDDDDDDDD SSSSSSSSS  SUMZ    D,S/#            adder
ZCMS 0101111 ZC I CCCC DDDDDDDDD SSSSSSSSS  SUMNZ   D,S/#            adder
ZCMS 0110000 ZC I CCCC DDDDDDDDD SSSSSSSSS  ADD     D,S/#            adder
ZCMS 0110001 ZC I CCCC DDDDDDDDD SSSSSSSSS  SUB     D,S/#            adder
ZCMS 0110010 ZC I CCCC DDDDDDDDD SSSSSSSSS  ADDS    D,S/#            adder
ZCMS 0110011 ZC I CCCC DDDDDDDDD SSSSSSSSS  SUBS    D,S/#            adder
ZCMS 0110100 ZC I CCCC DDDDDDDDD SSSSSSSSS  ADDX    D,S/#            adder
ZCMS 0110101 ZC I CCCC DDDDDDDDD SSSSSSSSS  SUBX    D,S/#            adder
ZCMS 0110110 ZC I CCCC DDDDDDDDD SSSSSSSSS  ADDSX   D,S/#            adder
ZCMS 0110111 ZC I CCCC DDDDDDDDD SSSSSSSSS  SUBSX   D,S/#            adder
ZCWS 0111000 ZC I CCCC DDDDDDDDD SSSSSSSSS  NOT     D,S/#            adder
ZCMS 0111001 ZC I CCCC DDDDDDDDD SSSSSSSSS  SUBR    D,S/#            adder
ZCMS 0111010 ZC I CCCC DDDDDDDDD SSSSSSSSS  ADDABS  D,S/#            adder
ZCMS 0111011 ZC I CCCC DDDDDDDDD SSSSSSSSS  SUBABS  D,S/#            adder
ZCMS 0111100 ZC I CCCC DDDDDDDDD SSSSSSSSS  INCMOD  D,S/#            adder
ZCMS 0111101 ZC I CCCC DDDDDDDDD SSSSSSSSS  DECMOD  D,S/#            adder
ZCMS 0111110 ZC I CCCC DDDDDDDDD SSSSSSSSS  CMPSUB  D,S/#            adder
ZCMS 0111111 ZC I CCCC DDDDDDDDD SSSSSSSSS  WAITCNT D,S/#            adder
ZCMS 1000000 ZC I CCCC DDDDDDDDD SSSSSSSSS  SETS    D,S/#            muxer
ZCWS 1000001 ZC I CCCC DDDDDDDDD SSSSSSSSS  GETS    D,S/#            muxer
ZCMS 1000010 ZC I CCCC DDDDDDDDD SSSSSSSSS  SETD    D,S/#            muxer
ZCWS 1000011 ZC I CCCC DDDDDDDDD SSSSSSSSS  GETD    D,S/#            muxer
ZCMS 1000100 ZC I CCCC DDDDDDDDD SSSSSSSSS  SETCOND D,S/#            muxer
ZCWS 1000101 ZC I CCCC DDDDDDDDD SSSSSSSSS  GETCOND D,S/#            muxer
ZCMS 1000110 ZC I CCCC DDDDDDDDD SSSSSSSSS  SETI    D,S/#            muxer
ZCWS 1000111 ZC I CCCC DDDDDDDDD SSSSSSSSS  GETI    D,S/#            muxer
--MS 100100n nn I CCCC DDDDDDDDD SSSSSSSSS  RORNIBn D,S/#            muxer
--MS 100101n nn I CCCC DDDDDDDDD SSSSSSSSS  ROLNIBn D,S/#            muxer
--WS 100110n nn I CCCC DDDDDDDDD SSSSSSSSS  GETNIBn D,S/#            muxer
--MS 100111n nn I CCCC DDDDDDDDD SSSSSSSSS  SETNIBn D,S/#            muxer
--MS 1010000 nn I CCCC DDDDDDDDD SSSSSSSSS  RORBYTn D,S/#            muxer
--MS 1010001 nn I CCCC DDDDDDDDD SSSSSSSSS  ROLBYTn D,S/#            muxer
--WS 1010010 nn I CCCC DDDDDDDDD SSSSSSSSS  GETBYTn D,S/#            muxer
--MS 1010011 nn I CCCC DDDDDDDDD SSSSSSSSS  SETBYTn D,S/#            muxer
--MS 1010100 0n I CCCC DDDDDDDDD SSSSSSSSS  RORWRDn D,S/#            muxer
--MS 1010100 1n I CCCC DDDDDDDDD SSSSSSSSS  ROLWRDn D,S/#            muxer
--WS 1010101 0n I CCCC DDDDDDDDD SSSSSSSSS  GETWRDn D,S/#            muxer
--MS 1010101 1n I CCCC DDDDDDDDD SSSSSSSSS  SETWRDn D,S/#            muxer
ZCWS 1010110 ZC I CCCC DDDDDDDDD SSSSSSSSS  ESWAP4  D,S/#            muxer
ZCWS 1010111 ZC I CCCC DDDDDDDDD SSSSSSSSS  ESWAP8  D,S/#            muxer
ZCWS 1011000 ZC I CCCC DDDDDDDDD SSSSSSSSS  SPLITW  D,S/#            muxer
ZCWS 1011001 ZC I CCCC DDDDDDDDD SSSSSSSSS  MERGEW  D,S/#            muxer
ZCMS 1011010 ZC I CCCC DDDDDDDDD SSSSSSSSS  DJZ     D,S/@            adder
ZCMS 1011011 ZC I CCCC DDDDDDDDD SSSSSSSSS  DJNZ    D,S/@            adder
ZCWS 1011100 ZC I CCCC DDDDDDDDD SSSSSSSSS  TOPBIT  D,S/#            miscellaneous
ZCWS 1011101 ZC I CCCC DDDDDDDDD SSSSSSSSS  DECOD   D,S/#
ZCMS 1011110 ZC I CCCC DDDDDDDDD SSSSSSSSS  ALTDS   D,S/#            (set up redirection for result/D/S)
ZCWS 1011111 ZC I CCCC DDDDDDDDD SSSSSSSSS  JMPSW   D,S/@            (jump to S/@, store return address in D, WZ/WC to save/load flags)
ZCRS 1100000 ZC I CCCC DDDDDDDDD SSSSSSSSS  TESTB   D,S/#            bitop tests and compares
ZCRS 1100001 ZC I CCCC DDDDDDDDD SSSSSSSSS  TESTN   D,S/#            logic
ZCRS 1100010 ZC I CCCC DDDDDDDDD SSSSSSSSS  TEST    D,S/#            logic
ZCRS 1100011 ZC I CCCC DDDDDDDDD SSSSSSSSS  CMP     D,S/#            adder
ZCRS 1100100 ZC I CCCC DDDDDDDDD SSSSSSSSS  CMPX    D,S/#            adder
ZCRS 1100101 ZC I CCCC DDDDDDDDD SSSSSSSSS  CMPS    D,S/#            adder
ZCRS 1100110 ZC I CCCC DDDDDDDDD SSSSSSSSS  CMPSX   D,S/#            adder
ZCRS 1100111 ZC I CCCC DDDDDDDDD SSSSSSSSS  CMPR    D,S/#            adder
ZCRS 1101000 ZC I CCCC DDDDDDDDD SSSSSSSSS  TJZ     D,S/@
ZCRS 1101001 ZC I CCCC DDDDDDDDD SSSSSSSSS  TJNZ    D,S/@
ZCRS 1101010 ZC I CCCC DDDDDDDDD SSSSSSSSS  TJS     D,S/@
ZCRS 1101011 ZC I CCCC DDDDDDDDD SSSSSSSSS  TJNS    D,S/@
ZCRS 1101100 ZC I CCCC DDDDDDDDD SSSSSSSSS  -       D,S/#
ZCRS 1101101 ZC I CCCC DDDDDDDDD SSSSSSSSS  -       D,S/#
ZCRS 1101110 ZC I CCCC DDDDDDDDD SSSSSSSSS  -       D,S/#
ZCRS 1101111 ZC I CCCC DDDDDDDDD SSSSSSSSS  -       D,S/#
--LS 1110000 0L I CCCC DDDDDDDDD SSSSSSSSS  WRBYTE  D/#,S/PTRA/PTRB  (waits for hub)
--LS 1110000 1L I CCCC DDDDDDDDD SSSSSSSSS  WRWORD  D/#,S/PTRA/PTRB  (waits for hub)
--LS 1110001 0L I CCCC DDDDDDDDD SSSSSSSSS  WRLONG  D/#,S/PTRA/PTRB  (waits for hub)
--LS 1110001 1L I CCCC DDDDDDDDD SSSSSSSSS  WRQUAD  D/#,S/PTRA/PTRB  (waits for hub, zero-extends #)
--LS 1110010 0L I CCCC DDDDDDDDD SSSSSSSSS  MSGOUTA D/#,S/#          (send message to pin(s) on OUTA)
--LS 1110010 1L I CCCC DDDDDDDDD SSSSSSSSS  MSGOUTB D/#,S/#          (send message to pin(s) on OUTB)
--LS 1110011 0L I CCCC DDDDDDDDD SSSSSSSSS  MSGDIRA D/#,S/#          (send message to pin(s) on DIRA)
--LS 1110011 1L I CCCC DDDDDDDDD SSSSSSSSS  MSGDIRB D/#,S/#          (send message to pin(s) on DIRB)
--LS 1110100 0L I CCCC DDDDDDDDD SSSSSSSSS  WAITPAE D/#,S/#          (waits for INA)
--LS 1110100 1L I CCCC DDDDDDDDD SSSSSSSSS  WAITPAN D/#,S/#          (waits for INA)
--LS 1110101 0L I CCCC DDDDDDDDD SSSSSSSSS  WAITPBE D/#,S/#          (waits for INB)
--LS 1110101 1L I CCCC DDDDDDDDD SSSSSSSSS  WAITPBN D/#,S/#          (waits for INB)
--LS 1110110 0L I CCCC DDDDDDDDD SSSSSSSSS  WAITVID D/#,S/#          (waits for video)
--LS 1110110 1L I CCCC DDDDDDDDD SSSSSSSSS  PICKZC  D/#,S/#          (always writes Z/C)
--LS 1110111 0L I CCCC DDDDDDDDD SSSSSSSSS  JP      D/#,S/@          (jump if pin IN high, pins registered at beginning of ALU cycle)
--LS 1110111 1L I CCCC DDDDDDDDD SSSSSSSSS  JNP     D/#,S/@          (jump if pin IN high, pins registered at beginning of ALU cycle)
--LS 1111000 0L I CCCC DDDDDDDDD SSSSSSSSS  REP     D/#,S/#          (begin repeat block of size D/# with S/# iterations)
--LS 1111000 1L I CCCC DDDDDDDDD SSSSSSSSS  -       D/#,S/#
--LS 1111001 0L I CCCC DDDDDDDDD SSSSSSSSS  -       D/#,S/#
--LS 1111001 1L I CCCC DDDDDDDDD SSSSSSSSS  -       D/#,S/#
--LS 1111010 0L I CCCC DDDDDDDDD SSSSSSSSS  -       D/#,S/#
--LS 1111010 1L I CCCC DDDDDDDDD SSSSSSSSS  -       D/#,S/#
--LS 1111011 0L I CCCC DDDDDDDDD SSSSSSSSS  -       D/#,S/#
--LS 1111011 1L I CCCC DDDDDDDDD SSSSSSSSS  -       D/#,S/#
--LS 1111100 0L I CCCC DDDDDDDDD SSSSSSSSS  -       D/#,S/#
--LS 1111100 1L I CCCC DDDDDDDDD SSSSSSSSS  -       D/#,S/#
---- 1111101 00 n nnnn nnnnnnnnn nnnnnnnnn  AUGS    #23bits          (appends n to upper bits of next immediate S in same task)
---- 1111101 01 n nnnn nnnnnnnnn nnnnnnnnn  AUGD    #23bits          (appends n to upper bits of next immediate D in same task)
---- 1111101 10 0 CCCC 0 nnnnnnnnnnnnnnnnn  LOC     #abs             (write 17-bit absolute address to $1EF)
---- 1111101 10 0 CCCC 1 nnnnnnnnnnnnnnnnn  LOC     @rel             (write 17-bit relative address to $1EF)
---- wr 1111101 10 1 CCCC 0 nnnnnnnnnnnnnnnnn  JMP     #abs          (jump to 17-bit absolute address and write {Z,C,P[16:0]} to $1EF)
---- wr 1111101 10 1 CCCC 1 nnnnnnnnnnnnnnnnn  JMP     @rel          (jump to 17-bit relative address and write {Z,C,P[16:0]} to $1EF)
---- 1111101 11 0 CCCC 0 nnnnnnnnnnnnnnnnn  CALL    #abs             (call to 17-bit absolute address using 4-level stack)
---- 1111101 11 0 CCCC 1 nnnnnnnnnnnnnnnnn  CALL    @rel             (call to 17-bit relative address using 4-level stack)
---- 1111101 11 1 CCCC 0 nnnnnnnnnnnnnnnnn  CALLA   #abs             (call to 17-bit absolute address using PTRA)
---- 1111101 11 1 CCCC 1 nnnnnnnnnnnnnnnnn  CALLA   @rel             (call to 17-bit relative address using PTRA)
---- 1111110 00 n CCCC n nnnnnnnnnnnnnnnnn  SETPTRA #abs             (write 19-bit absolute address to PTRA)
---- 1111110 01 n CCCC n nnnnnnnnnnnnnnnnn  SETPTRA @rel             (write 19-bit relative address to PTRA)
---- 1111110 10 n CCCC n nnnnnnnnnnnnnnnnn  SETPTRB #abs             (write 19-bit absolute address to PTRB)
---- 1111110 11 n CCCC n nnnnnnnnnnnnnnnnn  SETPTRB @rel             (write 19-bit relative address to PTRB)
--L- 1111111 00 L CCCC DDDDDDDDD xxxx00000  WAIT    D/#              (wait for some number of clocks, 0 same as 1)
--L- 1111111 00 L CCCC DDDDDDDDD xxxx00001  WAITPX  D/#              (wait for any edge on pin D/#)
--L- 1111111 00 L CCCC DDDDDDDDD xxxx00010  WAITPR  D/#              (wait for pos edge on pin D/#)
--L- 1111111 00 L CCCC DDDDDDDDD xxxx00011  WAITPF  D/#              (wait for neg edge on pin D/#)
--L- 1111111 00 L CCCC DDDDDDDDD xxxx00100  PUSH    D/#              (push D/# into 4-level stack)
--L- 1111111 00 L CCCC DDDDDDDDD xxxx00101  SETVID  D/#              (set video mode)
--L- 1111111 00 L CCCC DDDDDDDDD xxxx00110  -       D/#
--L- 1111111 00 
L CCCC DDDDDDDDD xxxx00111 - D/# (D[18:17] into Z/C via WZ/WC for JMP/CALL/CALLA/POP D) ZCR- wr 1111111 ZC x CCCC DDDDDDDDD xxxx01000 JMP D (jump to D[16:0] and write {Z,C,P[16:0]} to $1EF) ZCR- 1111111 ZC x CCCC DDDDDDDDD xxxx01001 CALL D (call to D[16:0] using 4-level stack) ZCR- 1111111 ZC x CCCC DDDDDDDDD xxxx01010 CALLA D (call to D[16:0] using PTRA stack) ZCR- 1111111 ZC x CCCC DDDDDDDDD xxxx01011 - D ZCR- 1111111 ZC x CCCC DDDDDDDDD xxxx01100 - D --R- 1111111 00 x CCCC DDDDDDDDD xxxx01101 - D --R- 1111111 00 x CCCC DDDDDDDDD xxxx01110 - D --R- 1111111 00 x CCCC DDDDDDDDD xxxx01111 - D ZCW- 1111111 ZC x CCCC DDDDDDDDD xxxx10000 POP D (pop 4-level stack into D) --W- 1111111 00 x CCCC DDDDDDDDD xxxx10001 - D --W- 1111111 00 x CCCC DDDDDDDDD xxxx10010 - D --W- 1111111 00 x CCCC DDDDDDDDD xxxx10011 - D --W- 1111111 00 x CCCC DDDDDDDDD xxxx10100 - D --W- 1111111 00 x CCCC DDDDDDDDD xxxx10101 - D --W- 1111111 00 x CCCC DDDDDDDDD xxxx10110 - D --W- 1111111 00 x CCCC DDDDDDDDD xxxx10111 - D ZC-- 1111111 ZC x CCCC xxxxxxxxx xxxx11000 RET (return using 4-level stack) ZC-- 1111111 ZC x CCCC xxxxxxxxx xxxx11001 RETA (return using PTRA stack) ZC-- 1111111 ZC x CCCC xxxxxxxxx xxxx11010 POLVID (C = ready for WAITVID) -C-- 1111111 0C x CCCC xxxxxxxxx xxxx11011 CACHEX (invalidate instruction cache) ---- 1111111 00 x CCCC xxxxxxxxx xxxx11100 - ---- 1111111 00 x CCCC xxxxxxxxx xxxx11101 - ---- 1111111 00 x CCCC xxxxxxxxx xxxx11110 - ---- 1111111 00 x CCCC xxxxxxxxx xxxx11111 - ---- 0000000 00 0 0000 000000000 000000000 NOP Aliases for WRLONG/RDLONG: PUSHA/PUSHB/POPA/POPB
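For anyone trying to picture what AUGS/AUGD do with their 23 bits: the listing says the value is appended to the upper bits of the next 9-bit immediate, which would give a full 32-bit constant. Here is a rough Python sketch of that arithmetic. The function names are made up for illustration, and the exact bit placement (23 bits on top, 9 bits below) is my reading of the listing, not something confirmed.

```python
# Sketch only: models the AUGS/AUGD arithmetic as read from the listing.
# Assumption: the 23 AUG bits become bits 31..9 and the next instruction's
# 9-bit immediate supplies bits 8..0. Names here are hypothetical.

def split_for_aug(value32):
    """Split a 32-bit constant into (aug23, imm9) pieces."""
    value32 &= 0xFFFFFFFF
    aug23 = value32 >> 9        # upper 23 bits, carried by AUGS/AUGD
    imm9 = value32 & 0x1FF      # lower 9 bits, carried by the S/# or D/# field
    return aug23, imm9

def combine_aug(aug23, imm9):
    """Rebuild the 32-bit immediate the next instruction would see."""
    return ((aug23 & 0x7FFFFF) << 9) | (imm9 & 0x1FF)
```

So a long constant would cost one extra instruction slot in front of the instruction that uses it, rather than a separate register holding the value.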
Note that the JMP instructions save a return address into $1EF, so these double as the old LINK instructions.
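A quick sketch of what that $1EF link word might look like, in Python. It assumes {Z,C,P[16:0]} is packed with Z in bit 18 and C in bit 17, which lines up with the "D[18:17] into Z/C" note in the listing. The helper names are invented for illustration.

```python
# Sketch only: packs/unpacks the {Z,C,P[16:0]} word that JMP writes to $1EF.
# Assumption: Z is bit 18, C is bit 17, P occupies bits 16..0.
# LINK_REG and the function names are hypothetical labels, not from the spec.

LINK_REG = 0x1EF

def pack_link(z, c, p):
    """Build the 19-bit link word from the flags and a 17-bit address."""
    return ((z & 1) << 18) | ((c & 1) << 17) | (p & 0x1FFFF)

def unpack_link(word):
    """Recover (z, c, p) from a link word, as a later JMP D with WZ/WC might."""
    return (word >> 18) & 1, (word >> 17) & 1, word & 0x1FFFF

def jmp(regs, z, c, p, target):
    """Model a JMP: save the link word in $1EF, return the new address."""
    regs[LINK_REG] = pack_link(z, c, p)
    return target & 0x1FFFF
```

A subroutine entered with a plain JMP could then return with JMP D on $1EF, optionally restoring the flags, which is what the old LINK instructions were for.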
I see it, never mind.
Looks good to me.
-Phil
I read that as an alias-write design, so RAM has a copy of the (last) DIRA value, and that means 'or' will work as expected?
That's right. Prop1 works like this, too. It saves a bunch of D and S muxes.
Looks good... hubexec at 50% should be just fine... it will give people an incentive to think creatively about using cog memory for faster code :) Plus there's the option of using hubexec.
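To make the alias-write idea concrete, here is a small Python model. A write to the DIRA address updates both the cog RAM location and the real direction latch, and reads always come from RAM, so a read-modify-write like 'or' sees the last value written. The class, the method names and the $1F6 address (Prop1's DIRA) are assumptions for illustration, not the actual design.

```python
# Sketch only: models an alias-write register as described in the posts above.
# Assumption: DIRA sits at $1F6 as on Prop1; Cog and its methods are hypothetical.

DIRA_ADDR = 0x1F6

class Cog:
    def __init__(self):
        self.ram = [0] * 512    # cog RAM, including the shadow copy of DIRA
        self.dira_latch = 0     # the real direction register driving the pins

    def write(self, addr, value):
        value &= 0xFFFFFFFF
        self.ram[addr] = value          # every write lands in RAM
        if addr == DIRA_ADDR:
            self.dira_latch = value     # aliased write also updates the latch

    def read(self, addr):
        return self.ram[addr]           # reads never need a mux to the latch

    def op_or(self, d, s_value):
        # Read-modify-write: works because RAM holds the last DIRA value.
        self.write(d, self.read(d) | s_value)
```

Since the D and S read paths only ever see RAM, no extra multiplexers are needed to route the latch back in, which is the saving mentioned above.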
I agree. If Hubexec is available it will get used. Ditto for multitasking.
But if they are not, there are simple software alternatives that can achieve most of the benefits. Omitting these hardware features won't seriously affect most people, despite all the cries of woe and despondency you tend to get in these threads when someone's "favorite" feature appears to be under threat.
I think Chip's best course of action would be to take a week or two away from the forums and sort these things out before making any more announcements about this stuff.
Ross.
Multiple cogs at 50 MIPS, that sounds rather exciting!!
If we have 4 cogs per DE0, it should be an easy rewrite of invaders: just split the tasks out to a cog each.
Lots more hub activity, perhaps. It'll be interesting to compare the current consumption on the DE0, for the old vs new solution.