The New 16-Cog, 512KB, 64 analog I/O Propeller Chip

evanh · 2014-04-16 07:57

jmg was suggesting there is potentially a power saving on a per counter basis when compared to having them in the Cogs. So, 32 counters on a pin-pair basis should be lighter than the 32 counters we currently have across the 16 Cogs.

Seairth · 2014-04-16 07:57

cgracey wrote: »

In the end, we'll likely have hardware multi-tasking back, too. Sorry for all the alarm.

No, no, no! Don't go saying things like that!

No, I'm not opposed to hardware multitasking being added back in. (Well, I am, but for other reasons.) I'm opposed to you making statements like this! Be ruthless in what you cut out (for now). Get something working. And then add features back in if you want. But don't say you'll likely add something back in until you are ready to actually add it. Otherwise, all you're doing is putting that pressure back on yourself!

Dave Hein · 2014-04-16 08:19

evanh wrote: »

jmg was suggesting there is potentially a power saving on a per counter basis when compared to having them in the Cogs. So, 32 counters on a pin-pair basis should be lighter than the 32 counters we currently have across the 16 Cogs.

I don't understand the logic in that. The counters should consume the same amount of power independent of where they are located. I think putting them at the pins is just a way to move some of the power dissipation away from the cogs.

Dave Hein · 2014-04-16 08:27

The problem with deciding on whether hubex and/or multi-tasking should be in is that there is no data about how much real estate they take, or how much power they dissipate. We know that it's possible to implement both of them, since they exist in the P2 design. If the size and power dissipation figures were known, a solid decision could be made on whether to include them. Without those figures the decision must be made on some ad hoc basis.

Ken Gracey · 2014-04-16 09:20

Dave Hein wrote: »

The problem with deciding on whether hubex and/or multi-tasking should be in is that there is no data about how much real-estate they take, or how much power they dissipate.

I am also in favor of not asking Chip to bother with multi-tasking for the reasons cited above and a few others.

Propeller 2 (that's what I'm calling it for now, unless we decide on Propeller 16 or another name) will be used by lots of inventors and entrepreneurs, and their specialty will often lie in fields other than embedded design. The capabilities and needs of these customers are very different from those of the forum members who contribute to this thread. They might be specialists in renewable energy, medical, robotics, aeronautics, environmental measurements, etc. We will likely have ten thousand customers using 100 to 1000 units each, plus a number of big wins. This kind of customer can be overwhelmed by design considerations and possibilities, sometimes so much that they think this chip isn't right for their project because it offers so much. Telling them that "you've got 16 cores to use here - no need to worry about interrupts" will take care of 98% of our users for now. But when you further explain that each core has even more capabilities, like multitasking, then they really start to wonder how the heck they'll architect their program, and the discussion will quickly diverge from the joy of understanding a simple design to wondering how it all works. Tracking program flow would be a challenge for them.

When Propeller 1 first came out we did a seminar for our European distributors. These people are technical buyers, representing electronic distribution companies. They can hook it up, load and modify our code examples, and have enough skill to show their customers what a Propeller can do. For two days we went through the Propeller architecture, far too deep for most of them. While multitasking may not be new to a crowd like this, I can imagine it being too much when combined with multicore. They already have trouble envisioning and accounting for the fact that any core can write to any pin, how a system clock is accessed, etc. There's no need to complicate something that we expect people to sell.

And all the time adds up, too. I have no idea how much design time it might take, but I'm sure it's a week or more to bring P3 Verilog into P2. And it'll take a week or more for Jeff to document in the data sheet. Drawings, explanations, sample code - it could quickly add up to another month.

I don't know what I don't know, and perhaps that's all I know for certain. This is my take and for now I'm sticking to it. Don't mistake me for a wet blanket, as I only believe there's a lot of sense in optimizing what we've got and refusing as much temptation as possible for now. There's also a very strong business case for making frequent iterations and design improvements, which could be every year or two in our case.

Ken Gracey

Heater. · 2014-04-16 09:30

Glad to hear you say all that, Ken. Most of it seems spot on.

Many of us have been campaigning for simplicity for a long while. Imagine having to explain those 500 opcodes of the previous P2 incarnation to those European distributors. You would be in Europe for a lot longer than 2 days!

We all want silicon, now!

I do like the idea of frequent iterations and improvements.

pjv · 2014-04-16 09:31

cgracey wrote: »

Question:

Is anyone going to be disappointed if we get rid of hardware multitasking in the cogs?

It's a cool feature, but does introduce jitter in tasks, depending on the instruction mix. It also takes some extra flops and logic to support properly, beyond the Z/C/PC's.

I'm thinking about the ROM_Monitor and realizing that I could code it with a single task by doing cooperative multitasking at a few time-critical points in the program. I wouldn't need hardware multitasking, after all.

No multitasking would keep the cogs very simple to understand, and keep them deterministic.

Any thoughts?

Oh, I like the way this is (or at least was) going.

I suspect that most folks, not having used co-operative multitasking, don't realize how fast and simple it is... once you have an effective scheduler up and running.

In following the banter here and reading comments about why the co-operative approach is bad, it just seems to me that the approach is poorly understood. From the comments, most seem to have it backwards.

So, while I would probably make some use of a hardware approach if it existed, it certainly is not a "required" feature.

And with the new instructions, I expect the P1 style scheduler's performance will be significantly faster. I'm really looking forward to making one.

So I vote for dropping the hardware approach as I believe it does not add a lot of value.

Cheers,

Peter (pjv)
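pjv's claim is that cooperative multitasking is fast and simple once a scheduler exists. As a language-neutral sketch (Python, not PASM), the round-robin scheduler below resumes each task until it voluntarily yields; every `yield` plays roughly the role of a JMPSW switch point in cog code. The task function `toggler` is hypothetical, and nothing here models cog timing.

```python
from collections import deque
from typing import Deque, Generator, List

Task = Generator[None, None, None]


def run_round_robin(tasks: List[Task], max_switches: int) -> int:
    """Resume tasks in turn until all finish or max_switches is reached.

    Each `yield` inside a task is a voluntary switch point, so a task keeps
    the CPU exactly as long as it chooses to. Returns the number of resumes.
    """
    ready: Deque[Task] = deque(tasks)
    switches = 0
    while ready and switches < max_switches:
        task = ready.popleft()
        try:
            next(task)          # run until the task's next switch point
            ready.append(task)  # still alive: back of the queue
        except StopIteration:
            pass                # task finished: drop it
        switches += 1
    return switches


def toggler(log: List[str], name: str, count: int) -> Task:
    """Hypothetical task: records `count` toggles, yielding after each."""
    for _ in range(count):
        log.append(name)
        yield


log: List[str] = []
n = run_round_robin([toggler(log, "A", 3), toggler(log, "B", 3)], max_switches=100)
print(log)  # ['A', 'B', 'A', 'B', 'A', 'B']
```

Because no task is ever preempted, the interleaving is fully deterministic; the cost is that every task must reach a switch point often enough to meet the others' timing.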

Heater. · 2014-04-16 09:39

pjv,

Nope, we don't have it backwards re: cooperative vs. hardware scheduling. Certainly not if you are also including a software scheduler in the mix (as opposed to the coroutines you find in FullDuplexSerial, for example).

Not that I'm saying hardware scheduling is essential for this chip.

jazzed · 2014-04-16 09:51

Chip,

I always thought that hardware multitasking made things unnecessarily complicated. IIRC, some of the coding required to make it work seemed rather absurd (that may have been cleaned up after I lost interest).

cgracey wrote: »

Question:

Is anyone going to be disappointed if we get rid of hardware multitasking in the cogs?

It's a cool feature, but does introduce jitter in tasks, depending on the instruction mix. It also takes some extra flops and logic to support properly, beyond the Z/C/PC's.

I'm thinking about the ROM_Monitor and realizing that I could code it with a single task by doing cooperative multitasking at a few time-critical points in the program. I wouldn't need hardware multitasking, after all.

No multitasking would keep the cogs very simple to understand, and keep them deterministic.

Any thoughts?

Bill Henning · 2014-04-16 09:55

pjv,

If Chip puts in tasks, great. It would be great for packing several drivers into one cog.

If he does not, that is his choice.

I've used co-operative multitasking many times... over many decades.

It is far inferior to hardware tasks. (ask XMOS <grin>)

Hardware tasking saves memory and makes timing high-speed signals far easier. It does not need a scheduler.

I will readily grant you that for low-speed signals, toggling a ton of LEDs, etc., cooperative is all you need.

But.

Let's take practical examples.

P1+ style cog as discussed: 100 MIPS (for simple two-clock-cycle instructions)

hardware tasks, interleaved every instruction:

25 MIPS per task

LED toggling test: 25M toggles per second (XOR) for each task.

bit-banged serial, half duplex: >5 Mbps

co-operative version?

LED toggling: half the performance (XOR, JMPSW), 12.5M toggles per second

bit-banged serial: best guess ~2 Mbps max, as the task switch has to happen while waiting for the start-bit edge and at every bit cell thereafter

For high speed signals,

- tasks give you a roughly 2:1 speed advantage
- much easier to write code for
- uses less cog memory
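The LED-toggling figures follow from simple arithmetic. The sketch below reproduces them, assuming four tasks share a 100 MIPS cog in strict rotation and that a cooperative switch costs exactly one extra JMPSW per toggle; the serial-port estimates depend on more detailed timing and are not modelled.

```python
COG_IPS = 100e6  # 200 MHz clock / 2 clocks per simple instruction
TASKS = 4        # 100 MIPS / 25 MIPS per task, as in the numbers above


def toggles_per_task(cog_ips: float, tasks: int, instrs_per_toggle: int) -> float:
    """Toggles per second each task achieves under strict round-robin sharing.

    Hardware tasking: 1 instruction per toggle (XOR); switching is free.
    Cooperative tasking: 2 instructions per toggle (XOR + JMPSW).
    """
    return cog_ips / tasks / instrs_per_toggle


hw = toggles_per_task(COG_IPS, TASKS, instrs_per_toggle=1)
coop = toggles_per_task(COG_IPS, TASKS, instrs_per_toggle=2)
print(f"hardware tasks: {hw / 1e6:.1f}M toggles/s per task")   # 25.0M
print(f"cooperative:    {coop / 1e6:.1f}M toggles/s per task")  # 12.5M
print(f"advantage:      {hw / coop:.1f}:1")                     # 2.0:1
```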

As for the objection that people will not understand it or that it's too complex... they don't have to use it if they don't want to. Heck, co-operative threads are still possible.

Bottom line:

It is up to Chip

pjv wrote: »

Oh, I like the way this is (or at least was) going.

I suspect that most folks, not having used co-operative multitasking, don't realize how fast and simple it is... once you have an effective scheduler up and running.

In following the banter here and reading comments about why the co-operative approach is bad, it just seems to me that the approach is poorly understood. From the comments, most seem to have it backwards.

So, while I would probably make some use of a hardware approach if it existed, it certainly is not a "required" feature.

And with the new instructions, I expect the P1 style scheduler's performance will be significantly faster. I'm really looking forward to making one.

So I vote for dropping the hardware approach as I believe it does not add a lot of value.

Cheers,

Peter (pjv)

pedward · 2014-04-16 10:49

I am in favor of making the core logic simple. I like the idea of Hubex, but I don't like the idea of hubex and multiple tasks.

I specifically recommended against this because I know all the baggage it brings with it. It begets feature creep and makes the elegant design ugly. I really didn't like the look of the P2 after Hubex was added because it lost all of its elegance.

The chip needs to have as few "rules" as possible. Tasking creates more "rules". Hubex creates more "rules". Multiple tasks with hubex creates yet more compound "rules".

The P1 is simple; there are few rules, and basically they fall into "hub instruction" and "not hub instruction". The difference is that one takes 8-22 clocks and the other takes 4 clocks.

The P-X needs to be simple. If Hubex exists, all instructions executed from hub need to take a fixed number of clocks, whether 8 or 16 or 4. Multi-tasking was an artifact of the pipeline structure of the P2; the P-X isn't pipelined, so I don't think it should have tasks.

Keep the video simple, but improved over the P1.

Keep the counters working like the P1, with PHSA and PHSB, etc. We like to have the ability to write to them in realtime to synthesize FM output. I wrote an FM transmitter program that could broadcast audio at FM broadcast frequencies. Having actual FM in addition to AM output to the pins is useful and is "free" in the sense that the counter just toggles a pin and doesn't have to communicate a value to the pin I/O circuit.
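pedward's FM trick rests on how a P1 counter behaves in NCO mode: every clock it adds FRQx to PHSx, and the pin follows PHSx[31], so the output frequency is FRQ * clkfreq / 2^32 and rewriting FRQ on the fly modulates that frequency. A minimal Python model of that P1 behaviour (the 200 MHz clock is borrowed from the plan posted later in the thread; nothing about the new chip's counters is implied, and the function names are hypothetical):

```python
MASK32 = 0xFFFF_FFFF


def frq_for(freq_hz: float, clk_hz: float) -> int:
    """FRQ value for an NCO output of freq_hz: FRQ = freq * 2^32 / clkfreq."""
    return round(freq_hz * 2**32 / clk_hz) & MASK32


def nco_output(frq_per_clock, phs: int = 0):
    """Yield the pin state (PHS bit 31) once per clock.

    frq_per_clock supplies one FRQ value per clock; changing it over time
    is exactly frequency modulation.
    """
    for frq in frq_per_clock:
        phs = (phs + frq) & MASK32  # PHS += FRQ every clock, wrapping at 32 bits
        yield phs >> 31             # pin follows the top bit of PHS


frq = frq_for(25e6, 200e6)  # 25 MHz out of a 200 MHz clock: period of 8 clocks
print(hex(frq))                     # 0x20000000
print(list(nco_output([frq] * 8)))  # [0, 0, 0, 1, 1, 1, 1, 0]
```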

Above all else, add simplicity, not complexity. I really like the idea of "Hub" peripherals, because they are space efficient and give you a few dedicated building blocks to make stuff. Most importantly, they work in a simpler fashion, rather than multiplying the kitchen sink by n number of Cogs. The P2 rationale was to push everything into the COGs to avoid clock issues, but that just made an overly complex COG.

KISS!

Phil Pilgrim (PhiPi) · 2014-04-16 11:10

+16 What pedward said!

-Phil

Kye · 2014-04-16 11:15

+1 Pedward

I mostly stopped following this forum, too, after there was so much discussion about tasks. I only regained interest when hub exec was added.

jmg · 2014-04-16 11:32

cgracey wrote: »

Hub exec IS going to be in the next chip. I've just been stalled out over the last few days getting the new instruction set nailed down. At times, all the details seem overwhelming and I think about paring it down, just to get it going again. This new memory scheme (dual-port 128x128 bits, instead of quad-port 512x32) changes a lot of things. It's hard for me to get there in one step. Asking you guys how you'd feel about dropping certain features alleviates the pressure on me. In the end, we'll likely have hardware multi-tasking back, too. Sorry for all the alarm.

Great. This 'divide and conquer' sounds an ideal way to proceed.
You can also get MHz number indicators from any Builds you do along the way, to map the impact of the changes.

jmg · 2014-04-16 11:44

evanh wrote: »

jmg was suggesting there is potentially a power saving on a per counter basis when compared to having them in the Cogs. So, 32 counters on a pin-pair basis should be lighter than the 32 counters we currently have across the 16 Cogs.

Power is Cpd * Ft * Vcc^2, where Cpd is the sum of the register plus routing loads.
So identically sized routing will have identical power.
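As a quick illustration of jmg's formula: power depends only on the switched capacitance, the toggle frequency and the supply voltage, not on where the logic sits. The values below are hypothetical placeholders, not Propeller figures.

```python
def dynamic_power(cpd_farads: float, freq_hz: float, vcc_volts: float) -> float:
    """Dynamic (switching) power in watts: P = Cpd * Ft * Vcc^2."""
    return cpd_farads * freq_hz * vcc_volts ** 2


# Hypothetical numbers for illustration only: the same counter with the same
# load, placed either in a cog or in a pin cell.
CPD = 1e-12  # 1 pF of register + routing load (assumed)
FT = 200e6   # toggle rate in Hz (assumed)
VCC = 1.8    # core supply in volts (assumed)

in_cog = dynamic_power(CPD, FT, VCC)
in_pin_cell = dynamic_power(CPD, FT, VCC)
print(f"{in_cog * 1e3:.3f} mW vs {in_pin_cell * 1e3:.3f} mW")  # 0.648 mW vs 0.648 mW

# Only a change in Cpd (e.g. shorter local routing) changes the power.
print(f"{dynamic_power(0.8e-12, FT, VCC) * 1e3:.3f} mW")  # 0.518 mW
```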

The scope I see for power saving in a PinCell, is that the routing tools can focus just on that locally, whilst the COG counter will have to juggle for space with all the other critical paths in the COG during the Autoroute.

Because the PinCell is not large, there may even be some manual layout assist possible, especially if needed to meet MHz targets (and usually a smaller cell results).

jmg · 2014-04-16 11:52

pedward wrote: »

Keep the counters working like the P1, with PHSA and PHSB, etc. We like to have the ability to write to them in realtime to synthesize FM output. I wrote an FM transmitter program that could broadcast audio at FM broadcast frequencies. Having actual FM in addition to AM output to the pins is useful and is "free" in the sense that the counter just toggles a pin and doesn't have to communicate a value to the pin I/O circuit.

There is a backward-compatibility case for COG counters, but IIRC the design flow means the PLL is not there (Chip can confirm?), so a P1+ COG counter will be a subset of what is possible on a P1.

Also, having the wide-adder not in the Pin Cell, can make the PinCell a little smaller and faster, but at the cost of a larger overall Logic area, from the duplicated counters.

jmg · 2014-04-16 12:04

Bill Henning wrote:

Drooling...
On the DE0, could you bring out all 64 I/Os? Make the rest of them 'dumb' but still available for regular dumb I/O.

Having an FPGA mix of standard logic and smarter CounterCells makes sense, and improves test coverage on a finite FPGA.

Ken Gracey wrote: »

So that we don't establish a new term, how about naming them "standard" I/O for now, if they're not all "smart" on the FPGA and don't have the same characteristics?

"Standard" can mean many things, and even a Basic Logic I/O on a FPGA platform is not going to be the same as the Final Pin designs, (of any pins without Counters).

rabaggett · 2014-04-16 12:58

cgracey wrote: »

Is hub exec worth slowing the cog down for?

For my applications... No.

cgracey · 2014-04-16 13:18

Okay. Here is the tentative plan for the new chip:

New Propeller Chip - 16 April 2014

	200MHz system clock
	16 cogs with 2-clock instructions, hub execution at 50% cog speed
	512KB hub memory with 8/16/32/128 bit cog transfers
	64 smart I/O pins
	100-pin 14x14mm TQFP with exposed thermal GND pad
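The headline throughput numbers follow directly from the first two lines of the plan. A back-of-envelope sketch, assuming every instruction is a simple 2-clock one and ignoring hub waits, branches and WAITx stalls, so these are ceilings rather than benchmarks:

```python
SYSCLK_HZ = 200e6       # "200MHz system clock"
CLOCKS_PER_INSTR = 2    # "2-clock instructions"
HUBEXEC_FRACTION = 0.5  # "hub execution at 50% cog speed"
COGS = 16

cog_ips = SYSCLK_HZ / CLOCKS_PER_INSTR  # per cog, executing from cog RAM
hub_ips = cog_ips * HUBEXEC_FRACTION    # per cog, executing from hub
peak_ips = cog_ips * COGS               # all cogs, cog-RAM code, no stalls

print(f"cog exec: {cog_ips / 1e6:.0f} MIPS per cog")    # 100 MIPS
print(f"hub exec: {hub_ips / 1e6:.0f} MIPS per cog")    # 50 MIPS
print(f"peak:     {peak_ips / 1e6:.0f} MIPS, 16 cogs")  # 1600 MIPS
```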


-- addressable cog registers
--
--	addr		read		write		name		hidden
--	-----------------------------------------------------------------------
--
--	000-1EF		RAM		RAM
--
--	1F0		CNT		-		CNT		ICACHE0
--	1F1		RND		-		RND		ICACHE0
--	1F2		INA		-		INA		ICACHE0
--	1F3		INB		-		INB		ICACHE0
--	1F4		RAM		RAM+OUTA	OUTA
--	1F5		RAM		RAM+OUTB	OUTB
--	1F6		RAM		RAM+DIRA	DIRA
--	1F7		RAM		RAM+DIRB	DIRB
--	1F8		RAM		RAM+CTRA	CTRA
--	1F9		RAM		RAM+CTRB	CTRB
--	1FA		RAM		RAM+FRQA	FRQA
--	1FB		RAM		RAM+FRQB	FRQB
--	1FC		PHSA		PHSA		PHSA		ICACHE1
--	1FD		PHSB		PHSB		PHSB		ICACHE1
--	1FE		PTRA		PTRA		PTRA		ICACHE1
--	1FF		PTRB		PTRB		PTRB		ICACHE1



ZCDS (for D column: W=write, M=modify, R=read, L=read/immediate)
----------------------------------------------------------------------------------------------------------------------

ZCWS		0000000 ZC I CCCC DDDDDDDDD SSSSSSSSS		RDBYTE	D,S/PTRA/PTRB		(waits for hub)
ZCWS		0000001 ZC I CCCC DDDDDDDDD SSSSSSSSS		RDWORD	D,S/PTRA/PTRB		(waits for hub)
ZCWS		0000010 ZC I CCCC DDDDDDDDD SSSSSSSSS		RDLONG	D,S/PTRA/PTRB		(waits for hub)
ZCWS		0000011 ZC I CCCC DDDDDDDDD SSSSSSSSS		RDQUAD	D,S/PTRA/PTRB		(waits for hub)

ZCMS		0000100 ZC I CCCC DDDDDDDDD SSSSSSSSS		SYSOP	D,S/#			(waits for hub, S/# determines four write-long enables)

ZCWS		0000101 ZC I CCCC DDDDDDDDD SSSSSSSSS		MSGIN	D,S/#			(receives message on pin, C=timeout)

ZCMS		0000110 ZC I CCCC DDDDDDDDD SSSSSSSSS		MUL	D,S/#			multiplier	(16 x 16 unsigned multiply)
ZCMS		0000111 ZC I CCCC DDDDDDDDD SSSSSSSSS		MULS	D,S/#			multiplier	(16 x 16 signed multiply)

ZCMS		0001000 ZC I CCCC DDDDDDDDD SSSSSSSSS		ISOB	D,S/#			bitop
ZCMS		0001001 ZC I CCCC DDDDDDDDD SSSSSSSSS		NOTB	D,S/#			bitop
ZCMS		0001010 ZC I CCCC DDDDDDDDD SSSSSSSSS		CLRB	D,S/#			bitop
ZCMS		0001011 ZC I CCCC DDDDDDDDD SSSSSSSSS		SETB	D,S/#			bitop
ZCMS		0001100 ZC I CCCC DDDDDDDDD SSSSSSSSS		SETBC	D,S/#			bitop
ZCMS		0001101 ZC I CCCC DDDDDDDDD SSSSSSSSS		SETBNC	D,S/#			bitop
ZCMS		0001110 ZC I CCCC DDDDDDDDD SSSSSSSSS		SETBZ	D,S/#			bitop
ZCMS		0001111 ZC I CCCC DDDDDDDDD SSSSSSSSS		SETBNZ	D,S/#			bitop

ZCMS		0010000 ZC I CCCC DDDDDDDDD SSSSSSSSS		ANDN	D,S/#			logic
ZCMS		0010001 ZC I CCCC DDDDDDDDD SSSSSSSSS		AND	D,S/#			logic
ZCMS		0010010 ZC I CCCC DDDDDDDDD SSSSSSSSS		OR	D,S/#			logic
ZCMS		0010011 ZC I CCCC DDDDDDDDD SSSSSSSSS		XOR	D,S/#			logic
ZCMS		0010100 ZC I CCCC DDDDDDDDD SSSSSSSSS		MUXC	D,S/#			logic
ZCMS		0010101 ZC I CCCC DDDDDDDDD SSSSSSSSS		MUXNC	D,S/#			logic
ZCMS		0010110 ZC I CCCC DDDDDDDDD SSSSSSSSS		MUXZ	D,S/#			logic
ZCMS		0010111 ZC I CCCC DDDDDDDDD SSSSSSSSS		MUXNZ	D,S/#			logic

ZCMS		0011000 ZC I CCCC DDDDDDDDD SSSSSSSSS		ROR	D,S/#			rotator
ZCMS		0011001 ZC I CCCC DDDDDDDDD SSSSSSSSS		ROL	D,S/#			rotator
ZCMS		0011010 ZC I CCCC DDDDDDDDD SSSSSSSSS		SHR	D,S/#			rotator
ZCMS		0011011 ZC I CCCC DDDDDDDDD SSSSSSSSS		SHL	D,S/#			rotator
ZCMS		0011100 ZC I CCCC DDDDDDDDD SSSSSSSSS		RCR	D,S/#			rotator
ZCMS		0011101 ZC I CCCC DDDDDDDDD SSSSSSSSS		RCL	D,S/#			rotator
ZCMS		0011110 ZC I CCCC DDDDDDDDD SSSSSSSSS		SAR	D,S/#			rotator
ZCMS		0011111 ZC I CCCC DDDDDDDDD SSSSSSSSS		REV	D,S/#			rotator

ZCWS		0100000 ZC I CCCC DDDDDDDDD SSSSSSSSS		MOV	D,S/#			adder
ZCWS		0100001 ZC I CCCC DDDDDDDDD SSSSSSSSS		ABS	D,S/#			adder
ZCWS		0100010 ZC I CCCC DDDDDDDDD SSSSSSSSS		ABSNEG	D,S/#			adder
ZCWS		0100011 ZC I CCCC DDDDDDDDD SSSSSSSSS		NEG	D,S/#			adder
ZCWS		0100100 ZC I CCCC DDDDDDDDD SSSSSSSSS		NEGC	D,S/#			adder
ZCWS		0100101 ZC I CCCC DDDDDDDDD SSSSSSSSS		NEGNC	D,S/#			adder
ZCWS		0100110 ZC I CCCC DDDDDDDDD SSSSSSSSS		NEGZ	D,S/#			adder
ZCWS		0100111 ZC I CCCC DDDDDDDDD SSSSSSSSS		NEGNZ	D,S/#			adder

ZCMS		0101000 ZC I CCCC DDDDDDDDD SSSSSSSSS		MIN	D,S/#			adder
ZCMS		0101001 ZC I CCCC DDDDDDDDD SSSSSSSSS		MAX	D,S/#			adder
ZCMS		0101010 ZC I CCCC DDDDDDDDD SSSSSSSSS		MINS	D,S/#			adder
ZCMS		0101011 ZC I CCCC DDDDDDDDD SSSSSSSSS		MAXS	D,S/#			adder
ZCMS		0101100 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUMC	D,S/#			adder
ZCMS		0101101 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUMNC	D,S/#			adder
ZCMS		0101110 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUMZ	D,S/#			adder
ZCMS		0101111 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUMNZ	D,S/#			adder

ZCMS		0110000 ZC I CCCC DDDDDDDDD SSSSSSSSS		ADD	D,S/#			adder
ZCMS		0110001 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUB	D,S/#			adder
ZCMS		0110010 ZC I CCCC DDDDDDDDD SSSSSSSSS		ADDS	D,S/#			adder
ZCMS		0110011 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUBS	D,S/#			adder
ZCMS		0110100 ZC I CCCC DDDDDDDDD SSSSSSSSS		ADDX	D,S/#			adder
ZCMS		0110101 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUBX	D,S/#			adder
ZCMS		0110110 ZC I CCCC DDDDDDDDD SSSSSSSSS		ADDSX	D,S/#			adder
ZCMS		0110111 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUBSX	D,S/#			adder

ZCWS		0111000 ZC I CCCC DDDDDDDDD SSSSSSSSS		NOT	D,S/#			adder
ZCMS		0111001 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUBR	D,S/#			adder
ZCMS		0111010 ZC I CCCC DDDDDDDDD SSSSSSSSS		ADDABS	D,S/#			adder
ZCMS		0111011 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUBABS	D,S/#			adder
ZCMS		0111100 ZC I CCCC DDDDDDDDD SSSSSSSSS		INCMOD	D,S/#			adder
ZCMS		0111101 ZC I CCCC DDDDDDDDD SSSSSSSSS		DECMOD	D,S/#			adder
ZCMS		0111110 ZC I CCCC DDDDDDDDD SSSSSSSSS		CMPSUB	D,S/#			adder
ZCMS		0111111 ZC I CCCC DDDDDDDDD SSSSSSSSS		WAITCNT	D,S/#			adder

ZCMS		1000000 ZC I CCCC DDDDDDDDD SSSSSSSSS		SETS	D,S/#			muxer
ZCWS		1000001 ZC I CCCC DDDDDDDDD SSSSSSSSS		GETS	D,S/#			muxer
ZCMS		1000010 ZC I CCCC DDDDDDDDD SSSSSSSSS		SETD	D,S/#			muxer
ZCWS		1000011 ZC I CCCC DDDDDDDDD SSSSSSSSS		GETD	D,S/#			muxer
ZCMS		1000100 ZC I CCCC DDDDDDDDD SSSSSSSSS		SETCOND	D,S/#			muxer
ZCWS		1000101 ZC I CCCC DDDDDDDDD SSSSSSSSS		GETCOND	D,S/#			muxer
ZCMS		1000110 ZC I CCCC DDDDDDDDD SSSSSSSSS		SETI	D,S/#			muxer
ZCWS		1000111 ZC I CCCC DDDDDDDDD SSSSSSSSS		GETI	D,S/#			muxer

--MS		100100n nn I CCCC DDDDDDDDD SSSSSSSSS		RORNIBn	D,S/#			muxer
--MS		100101n nn I CCCC DDDDDDDDD SSSSSSSSS		ROLNIBn	D,S/#			muxer
--WS		100110n nn I CCCC DDDDDDDDD SSSSSSSSS		GETNIBn	D,S/#			muxer
--MS		100111n nn I CCCC DDDDDDDDD SSSSSSSSS		SETNIBn	D,S/#			muxer

--MS		1010000 nn I CCCC DDDDDDDDD SSSSSSSSS		RORBYTn	D,S/#			muxer
--MS		1010001 nn I CCCC DDDDDDDDD SSSSSSSSS		ROLBYTn	D,S/#			muxer
--WS		1010010 nn I CCCC DDDDDDDDD SSSSSSSSS		GETBYTn	D,S/#			muxer
--MS		1010011 nn I CCCC DDDDDDDDD SSSSSSSSS		SETBYTn	D,S/#			muxer

--MS		1010100 0n I CCCC DDDDDDDDD SSSSSSSSS		RORWRDn	D,S/#			muxer
--MS		1010100 1n I CCCC DDDDDDDDD SSSSSSSSS		ROLWRDn	D,S/#			muxer
--WS		1010101 0n I CCCC DDDDDDDDD SSSSSSSSS		GETWRDn	D,S/#			muxer
--MS		1010101 1n I CCCC DDDDDDDDD SSSSSSSSS		SETWRDn	D,S/#			muxer

ZCWS		1010110 ZC I CCCC DDDDDDDDD SSSSSSSSS		ESWAP4	D,S/#			muxer
ZCWS		1010111 ZC I CCCC DDDDDDDDD SSSSSSSSS		ESWAP8	D,S/#			muxer

ZCWS		1011000 ZC I CCCC DDDDDDDDD SSSSSSSSS		SPLITW	D,S/#			muxer
ZCWS		1011001 ZC I CCCC DDDDDDDDD SSSSSSSSS		MERGEW	D,S/#			muxer

ZCMS		1011010 ZC I CCCC DDDDDDDDD SSSSSSSSS		DJZ	D,S/@			adder
ZCMS		1011011 ZC I CCCC DDDDDDDDD SSSSSSSSS		DJNZ	D,S/@			adder

ZCWS		1011100 ZC I CCCC DDDDDDDDD SSSSSSSSS		TOPBIT	D,S/#			miscellaneous
ZCWS		1011101 ZC I CCCC DDDDDDDDD SSSSSSSSS		DECOD	D,S/#
ZCMS		1011110 ZC I CCCC DDDDDDDDD SSSSSSSSS		ALTDS	D,S/#			(set up redirection for result/D/S)
ZCWS		1011111 ZC I CCCC DDDDDDDDD SSSSSSSSS		JMPSW	D,S/@			(jump to S/@, store return address in D, WZ/WC to save/load flags)

ZCRS		1100000 ZC I CCCC DDDDDDDDD SSSSSSSSS		TESTB	D,S/#			bitop	tests and compares
ZCRS		1100001 ZC I CCCC DDDDDDDDD SSSSSSSSS		TESTN	D,S/#			logic
ZCRS		1100010 ZC I CCCC DDDDDDDDD SSSSSSSSS		TEST	D,S/#			logic
ZCRS		1100011 ZC I CCCC DDDDDDDDD SSSSSSSSS		CMP	D,S/#			adder
ZCRS		1100100 ZC I CCCC DDDDDDDDD SSSSSSSSS		CMPX	D,S/#			adder
ZCRS		1100101 ZC I CCCC DDDDDDDDD SSSSSSSSS		CMPS	D,S/#			adder
ZCRS		1100110 ZC I CCCC DDDDDDDDD SSSSSSSSS		CMPSX	D,S/#			adder
ZCRS		1100111 ZC I CCCC DDDDDDDDD SSSSSSSSS		CMPR	D,S/#			adder

ZCRS		1101000 ZC I CCCC DDDDDDDDD SSSSSSSSS		TJZ	D,S/@
ZCRS		1101001 ZC I CCCC DDDDDDDDD SSSSSSSSS		TJNZ	D,S/@
ZCRS		1101010 ZC I CCCC DDDDDDDDD SSSSSSSSS		TJS	D,S/@
ZCRS		1101011 ZC I CCCC DDDDDDDDD SSSSSSSSS		TJNS	D,S/@

ZCRS		1101100 ZC I CCCC DDDDDDDDD SSSSSSSSS		-	D,S/#
ZCRS		1101101 ZC I CCCC DDDDDDDDD SSSSSSSSS		-	D,S/#
ZCRS		1101110 ZC I CCCC DDDDDDDDD SSSSSSSSS		-	D,S/#
ZCRS		1101111 ZC I CCCC DDDDDDDDD SSSSSSSSS		-	D,S/#

--LS		1110000 0L I CCCC DDDDDDDDD SSSSSSSSS		WRBYTE	D/#,S/PTRA/PTRB		(waits for hub)
--LS		1110000 1L I CCCC DDDDDDDDD SSSSSSSSS		WRWORD	D/#,S/PTRA/PTRB		(waits for hub)
--LS		1110001 0L I CCCC DDDDDDDDD SSSSSSSSS		WRLONG	D/#,S/PTRA/PTRB		(waits for hub)
--LS		1110001 1L I CCCC DDDDDDDDD SSSSSSSSS		WRQUAD	D/#,S/PTRA/PTRB		(waits for hub, zero-extends #)

--LS		1110010 0L I CCCC DDDDDDDDD SSSSSSSSS		MSGOUTA	D/#,S/#			(send message to pin(s) on OUTA)
--LS		1110010 1L I CCCC DDDDDDDDD SSSSSSSSS		MSGOUTB	D/#,S/#			(send message to pin(s) on OUTB)
--LS		1110011 0L I CCCC DDDDDDDDD SSSSSSSSS		MSGDIRA	D/#,S/#			(send message to pin(s) on DIRA)
--LS		1110011 1L I CCCC DDDDDDDDD SSSSSSSSS		MSGDIRB	D/#,S/#			(send message to pin(s) on DIRB)

--LS		1110100 0L I CCCC DDDDDDDDD SSSSSSSSS		WAITPAE	D/#,S/#			(waits for INA)
--LS		1110100 1L I CCCC DDDDDDDDD SSSSSSSSS		WAITPAN	D/#,S/#			(waits for INA)
--LS		1110101 0L I CCCC DDDDDDDDD SSSSSSSSS		WAITPBE	D/#,S/#			(waits for INB)
--LS		1110101 1L I CCCC DDDDDDDDD SSSSSSSSS		WAITPBN	D/#,S/#			(waits for INB)

--LS		1110110 0L I CCCC DDDDDDDDD SSSSSSSSS		WAITVID	D/#,S/#			(waits for video)
--LS		1110110 1L I CCCC DDDDDDDDD SSSSSSSSS		PICKZC	D/#,S/#			(always writes Z/C)
--LS		1110111 0L I CCCC DDDDDDDDD SSSSSSSSS		JP	D/#,S/@			(jump if pin IN high, pins registered at beginning of ALU cycle)
--LS		1110111 1L I CCCC DDDDDDDDD SSSSSSSSS		JNP	D/#,S/@			(jump if pin IN low, pins registered at beginning of ALU cycle)

--LS		1111000 0L I CCCC DDDDDDDDD SSSSSSSSS		REP	D/#,S/#			(begin repeat block of size D/# with S/# iterations)
--LS		1111000 1L I CCCC DDDDDDDDD SSSSSSSSS		-	D/#,S/#
--LS		1111001 0L I CCCC DDDDDDDDD SSSSSSSSS		-	D/#,S/#
--LS		1111001 1L I CCCC DDDDDDDDD SSSSSSSSS		-	D/#,S/#

--LS		1111010 0L I CCCC DDDDDDDDD SSSSSSSSS		-	D/#,S/#
--LS		1111010 1L I CCCC DDDDDDDDD SSSSSSSSS		-	D/#,S/#
--LS		1111011 0L I CCCC DDDDDDDDD SSSSSSSSS		-	D/#,S/#
--LS		1111011 1L I CCCC DDDDDDDDD SSSSSSSSS		-	D/#,S/#

--LS		1111100 0L I CCCC DDDDDDDDD SSSSSSSSS		-	D/#,S/#
--LS		1111100 1L I CCCC DDDDDDDDD SSSSSSSSS		-	D/#,S/#

----		1111101 00 n nnnn nnnnnnnnn nnnnnnnnn		AUGS	#23bits			(appends n to upper bits of next immediate S in same task)
----		1111101 01 n nnnn nnnnnnnnn nnnnnnnnn		AUGD	#23bits			(appends n to upper bits of next immediate D in same task)

----		1111101 10 0 CCCC 0 nnnnnnnnnnnnnnnnn		LOC	#abs			(write 17-bit absolute address to $1EF)
----		1111101 10 0 CCCC 1 nnnnnnnnnnnnnnnnn		LOC	@rel			(write 17-bit relative address to $1EF)
---- wr		1111101 10 1 CCCC 0 nnnnnnnnnnnnnnnnn		JMP	#abs			(jump to 17-bit absolute address and write {Z,C,P[16:0]} to $1EF)
---- wr		1111101 10 1 CCCC 1 nnnnnnnnnnnnnnnnn		JMP	@rel			(jump to 17-bit relative address and write {Z,C,P[16:0]} to $1EF)
----		1111101 11 0 CCCC 0 nnnnnnnnnnnnnnnnn		CALL	#abs			(call to 17-bit absolute address using 4-level stack)
----		1111101 11 0 CCCC 1 nnnnnnnnnnnnnnnnn		CALL	@rel			(call to 17-bit relative address using 4-level stack)
----		1111101 11 1 CCCC 0 nnnnnnnnnnnnnnnnn		CALLA	#abs			(call to 17-bit absolute address using PTRA)
----		1111101 11 1 CCCC 1 nnnnnnnnnnnnnnnnn		CALLA	@rel			(call to 17-bit relative address using PTRA)

----		1111110 00 n CCCC n nnnnnnnnnnnnnnnnn		SETPTRA	#abs			(write 19-bit absolute address to PTRA)
----		1111110 01 n CCCC n nnnnnnnnnnnnnnnnn		SETPTRA	@rel			(write 19-bit relative address to PTRA)
----		1111110 10 n CCCC n nnnnnnnnnnnnnnnnn		SETPTRB	#abs			(write 19-bit absolute address to PTRB)
----		1111110 11 n CCCC n nnnnnnnnnnnnnnnnn		SETPTRB	@rel			(write 19-bit relative address to PTRB)

--L-		1111111 00 L CCCC DDDDDDDDD xxxx00000		WAIT	D/#			(wait for some number of clocks, 0 same as 1)
--L-		1111111 00 L CCCC DDDDDDDDD xxxx00001		WAITPX	D/#			(wait for any edge on pin D/#)
--L-		1111111 00 L CCCC DDDDDDDDD xxxx00010		WAITPR	D/#			(wait for pos edge on pin D/#)
--L-		1111111 00 L CCCC DDDDDDDDD xxxx00011		WAITPF	D/#			(wait for neg edge on pin D/#)
--L-		1111111 00 L CCCC DDDDDDDDD xxxx00100		PUSH	D/#			(push D/# into 4-level stack)
--L-		1111111 00 L CCCC DDDDDDDDD xxxx00101		SETVID	D/#			(set video mode)
--L-		1111111 00 L CCCC DDDDDDDDD xxxx00110		-	D/#
--L-		1111111 00 L CCCC DDDDDDDDD xxxx00111		-	D/#
												(D[18:17] into Z/C via WZ/WC for JMP/CALL/CALLA/POP D)
ZCR- wr		1111111 ZC x CCCC DDDDDDDDD xxxx01000		JMP	D			(jump to D[16:0] and write {Z,C,P[16:0]} to $1EF)
ZCR-		1111111 ZC x CCCC DDDDDDDDD xxxx01001		CALL	D			(call to D[16:0] using 4-level stack)
ZCR-		1111111 ZC x CCCC DDDDDDDDD xxxx01010		CALLA	D			(call to D[16:0] using PTRA stack)
ZCR-		1111111 ZC x CCCC DDDDDDDDD xxxx01011		-	D
ZCR-		1111111 ZC x CCCC DDDDDDDDD xxxx01100		-	D
--R-		1111111 00 x CCCC DDDDDDDDD xxxx01101		-	D
--R-		1111111 00 x CCCC DDDDDDDDD xxxx01110		-	D
--R-		1111111 00 x CCCC DDDDDDDDD xxxx01111		-	D

ZCW-		1111111 ZC x CCCC DDDDDDDDD xxxx10000		POP	D			(pop 4-level stack into D)
--W-		1111111 00 x CCCC DDDDDDDDD xxxx10001		-	D
--W-		1111111 00 x CCCC DDDDDDDDD xxxx10010		-	D
--W-		1111111 00 x CCCC DDDDDDDDD xxxx10011		-	D
--W-		1111111 00 x CCCC DDDDDDDDD xxxx10100		-	D
--W-		1111111 00 x CCCC DDDDDDDDD xxxx10101		-	D
--W-		1111111 00 x CCCC DDDDDDDDD xxxx10110		-	D
--W-		1111111 00 x CCCC DDDDDDDDD xxxx10111		-	D

ZC--		1111111 ZC x CCCC xxxxxxxxx xxxx11000		RET				(return using 4-level stack)
ZC--		1111111 ZC x CCCC xxxxxxxxx xxxx11001		RETA				(return using PTRA stack)
ZC--		1111111 ZC x CCCC xxxxxxxxx xxxx11010		POLVID				(C = ready for WAITVID)
-C--		1111111 0C x CCCC xxxxxxxxx xxxx11011		CACHEX				(invalidate instruction cache)
----		1111111 00 x CCCC xxxxxxxxx xxxx11100		-
----		1111111 00 x CCCC xxxxxxxxx xxxx11101		-
----		1111111 00 x CCCC xxxxxxxxx xxxx11110		-
----		1111111 00 x CCCC xxxxxxxxx xxxx11111		-

----		0000000 00 0 0000 000000000 000000000		NOP
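Most two-operand instructions above share one 32-bit layout: a 7-bit opcode, the Z and C bits (presumably flag-write enables, as WZ/WC were on the P1), an immediate flag I, a 4-bit condition field CCCC, then 9-bit D and S fields. A sketch of an encoder/decoder for that layout only; the AUGS/AUGD, LOC/JMP/CALL and single-operand rows pack their bits differently and are not handled. The condition value %1111 in the example is assumed to mean "always", as on the P1; the tentative table does not define the condition codes.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    opcode: int  # 7 bits, e.g. 0b0010011 for XOR
    z: int       # Z bit
    c: int       # C bit
    i: int       # 1 = S field is an immediate (#)
    cond: int    # 4-bit CCCC condition field
    d: int       # 9-bit D field (register address)
    s: int       # 9-bit S field (register address or immediate)


# (field, lowest bit, width) for: OOOOOOO ZC I CCCC DDDDDDDDD SSSSSSSSS
FIELDS = (("opcode", 25, 7), ("z", 24, 1), ("c", 23, 1), ("i", 22, 1),
          ("cond", 18, 4), ("d", 9, 9), ("s", 0, 9))


def encode(ins: Instr) -> int:
    """Pack the fields into a 32-bit instruction word."""
    word = 0
    for name, shift, width in FIELDS:
        value = getattr(ins, name)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << shift
    return word


def decode(word: int) -> Instr:
    """Split a 32-bit instruction word back into its fields."""
    return Instr(**{name: (word >> shift) & ((1 << width) - 1)
                    for name, shift, width in FIELDS})


# XOR OUTA,#1 with WZ, assuming CCCC = %1111 means "always"
xor_outa = Instr(opcode=0b0010011, z=1, c=0, i=1, cond=0b1111, d=0x1F4, s=1)
word = encode(xor_outa)
print(f"{word:08X}")                            # 277FE801
print(decode(word) == xor_outa)                 # True
print(encode(Instr(0, 0, 0, 0, 0, 0, 0)) == 0)  # the NOP row: all zeros
```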

Aliases for WRLONG/RDLONG: 	PUSHA/PUSHB/POPA/POPB

Note that the JMP instructions save a return address into $1EF, so these double as the old LINK instructions.
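Read literally, the JMP rows store a 19-bit link value in $1EF, with the flags above the 17-bit address, and the note before the single-operand JMP D row says D[18:17] go back into Z/C. Assuming P is the address to return to (as the LINK comparison implies), a jump-and-link return amounts to packing and unpacking that word. A Python sketch of the bit layout only; the function names are hypothetical:

```python
from typing import Tuple

PC_MASK = (1 << 17) - 1  # 17-bit program address, P[16:0]


def pack_link(z: int, c: int, pc: int) -> int:
    """Build {Z,C,P[16:0]}: Z in bit 18, C in bit 17, address in bits 16..0."""
    return ((z & 1) << 18) | ((c & 1) << 17) | (pc & PC_MASK)


def unpack_link(value: int) -> Tuple[int, int, int]:
    """Recover (Z, C, address); bits 18:17 are the D[18:17] that the table
    says JMP/CALL/CALLA/POP D can load into Z/C via WZ/WC."""
    return (value >> 18) & 1, (value >> 17) & 1, value & PC_MASK


link = pack_link(z=1, c=0, pc=0x1234)
print(hex(link))          # 0x41234
print(unpack_link(link))  # (1, 0, 4660)
```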

Bill Henning · 2014-04-16 13:26

I took a quick read, will digest later, but.... Looks good!

potatohead · 2014-04-16 13:33

I only saw POLVID in there, no WAITVID. Or is that covered by the two ops shown below, which are still a little undefined?

I see it, never mind.

Looks good to me.

Phil Pilgrim (PhiPi) · 2014-04-16 13:36

Chip, any particular reason for the write-only-ness of some of the registers? And how would a read-modify-write operation work on them? For example,

or dira,#1

-Phil

jmg · 2014-04-16 13:39

Phil Pilgrim (PhiPi) wrote: »

Chip, any particular reason for the write-only-ness of some of the registers? And how would a read-modify-write operation work on them? For example,
or dira,#1

I read that as an alias-write design, so RAM has a copy of (last) dira, and that means 'or' will work as expected?

cgracey · 2014-04-16 13:59

jmg wrote: »

I read that as an alias-write design, so RAM has a copy of (last) dira, and that means 'or' will work as expected?

That's right. Prop1 works like this, too. It saves a bunch of D and S muxes.
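jmg's reading, confirmed here, is that $1F4-$1FB are write-aliased: a write lands in both the RAM long and the hardware register, while reads return the RAM copy. That is why `or dira,#1` still works as a read-modify-write. A toy Python model of that behaviour (the class, method and dict names are hypothetical, and the other special registers are deliberately left out):

```python
class CogRegisters:
    """Toy model of the write-aliased registers at $1F4-$1FB.

    Writes go to both the RAM long and the hardware register; reads of these
    addresses come from RAM. CNT/RND/INA/INB and PHSx/PTRx are not modelled.
    """

    ALIASED = {0x1F4: "OUTA", 0x1F5: "OUTB", 0x1F6: "DIRA", 0x1F7: "DIRB",
               0x1F8: "CTRA", 0x1F9: "CTRB", 0x1FA: "FRQA", 0x1FB: "FRQB"}

    def __init__(self) -> None:
        self.ram = [0] * 0x200  # $000-$1FF
        self.hw = {name: 0 for name in self.ALIASED.values()}

    @staticmethod
    def _check(addr: int) -> None:
        if 0x1F0 <= addr <= 0x1F3 or addr >= 0x1FC:
            raise NotImplementedError(f"${addr:03X} is not modelled")

    def read(self, addr: int) -> int:
        self._check(addr)
        return self.ram[addr]  # reads see RAM only: no extra D/S mux inputs

    def write(self, addr: int, value: int) -> None:
        self._check(addr)
        value &= 0xFFFF_FFFF
        self.ram[addr] = value  # RAM copy is always updated
        if addr in self.ALIASED:
            self.hw[self.ALIASED[addr]] = value  # ...and so is the real register

    def or_(self, d: int, s: int) -> None:
        """OR D,S as a read-modify-write through the RAM copy."""
        self.write(d, self.read(d) | s)


DIRA = 0x1F6
regs = CogRegisters()
regs.write(DIRA, 0b100)  # make pin 2 an output
regs.or_(DIRA, 0b001)    # or dira,#1
print(bin(regs.hw["DIRA"]), bin(regs.read(DIRA)))  # 0b101 0b101
```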

dr hydra · 2014-04-16 14:04

Chip

Looks good...hubexec at 50% should be just fine...it will give people incentive to think creatively in using cog memory for faster code:) Plus the option of using hubexec.

Kye · 2014-04-16 14:12

50 MIPS hub exec works, that's fast enough to compete.

RossH · 2014-04-16 14:14

Seairth wrote: »

No, no, no! Don't go saying things like that!

No, I'm not opposed to hardware multitasking being added back in. (Well, I am, but for other reasons.) I'm opposed to you making statements like this! Be ruthless in what you cut out (for now). Get something working. And then add features back in if you want. But don't say you'll likely add something back in until you are ready to actually add it. Otherwise, all you're doing is putting that pressure back on yourself!

I agree. If Hubexec is available it will get used. Ditto for multitasking.

But if they are not, there are simple software alternatives that can achieve most of the benefits. The omission of these hardware features won't seriously impact on most people, despite all the cries of woe and despondency you tend to get in these threads when someone's "favorite" feature appears to be under threat.

I think Chip's best course of action would be to take a week or two away from the forums and sort these things out before making any more announcements about this stuff.

Ross.

mindrobots · 2014-04-16 14:14

Kye wrote: »

50 MIPS hub exec works, that's fast enough to compete.

Multiple cogs at 50 MIPS, that sounds rather exciting!!

Tubular · 2014-04-16 14:54

ozpropdev wrote: »

Fabulous! Cool! Great! Awesome! Excellent!

If we have 4 cogs per DE0, should be an easy re-write of invaders, just split the tasks out to a cog each.

Lots more hub activity, perhaps. It'll be interesting to compare the current consumption on the DE0, for the old vs new solution.

Brian Fairchild · 2014-04-16 15:04

Kye wrote: »

50 MIPS hub exec works, that's fast enough to compete.

Especially with 16 cores to play with.
