The Parallax Chip I Really Want...

Brian Fairchild · 2014-05-15 05:18

I'm under no illusion that I'll ever get one, but for the record, the chip I'd really like is...

16 x 100 MIPS cores (cores 1-16) each with 4k of 32-bit longs for registers, no hubexec, smart pins as previously described
1 additional (identical) core 0 with no IO but with hubexec from central memory
256k of central memory accessed in a strict round robin core0, core1, core0, core2, core0, core3....
central cordic and central big multiplier

that's it, no slot allocation, no tables, no complicated schemes.
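
For illustration, the strict round robin described above can be sketched in a few lines of Python. This is only a rough model, assuming one hub access per slot and the 17-core layout from the list; the name `hub_schedule` is made up for the sketch.

```python
from itertools import islice

def hub_schedule(num_io_cores=16):
    """Yield core numbers in the order core0, core1, core0, core2, ..."""
    while True:
        for core in range(1, num_io_cores + 1):
            yield 0      # core 0 gets every other slot
            yield core   # I/O cores 1..16 take the remaining slots in turn

first_slots = list(islice(hub_schedule(), 8))
print(first_slots)  # [0, 1, 0, 2, 0, 3, 0, 4]
```

Under these assumptions the pattern repeats every 32 slots: core 0 gets 16 of them and each I/O core gets exactly one, so every core's hub timing stays fixed and predictable.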

I hear the cry "but the core only has 9-bit address fields" to which I reply "so what, that's the last processor, the proposed one bears no resemblance to it, and existing objects won't run without modification anyway."

Plus a decent IDE C compiler, for which I'd pay money, and a set of known working soft-peripherals to get designs off the ground quickly.

The reality is that I'll not see one of those nor any other silicon in the wild for at least a year. Shame really. But, on the upside, I can buy M4-based boards now for less than the projected cost of a P16X64A chip, and if I bought a DE0-Nano I could fit 12 NIOS-IIs in it.

Heater. · 2014-05-15 06:26

Brian,

To address those 4K registers you will need at least 12 bits. That's another 8 bits per instruction. Are you proposing a 40 bit wide instruction in order to fit them in?

Clearly not as you specified 32-bit longs.

So how are you going to do this? I look forward to seeing your proposed instruction set and instruction encodings.
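
As a rough check on the bit counting, here is a small Python sketch. It assumes the P1-style layout of 14 non-address bits (6-bit opcode, 4 flag bits, 4 condition bits) plus two equal-width register fields; that layout is not spelled out in the thread, and `instruction_width` is a made-up helper name.

```python
def instruction_width(num_registers, other_bits=14, address_fields=2):
    """Bits needed for one instruction with two register address fields."""
    field_bits = (num_registers - 1).bit_length()  # bits to address every register
    return other_bits + address_fields * field_bits

def round_up_to_bytes(bits):
    """Round a bit count up to a whole number of bytes."""
    return -(-bits // 8) * 8

print(instruction_width(512))                      # 32
print(instruction_width(4096))                     # 38
print(round_up_to_bytes(instruction_width(4096)))  # 40
```

With these assumptions the raw count for 4K registers comes to 38 bits, and rounding up to a whole number of bytes gives the 40 bits mentioned above.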

mindrobots · 2014-05-15 07:16

OK, how about we make the instruction fetch just a double long fetch (increment P by 2 with each instruction fetch). There's a 64 bit instruction decoder which sees the second long as 16 bits of D and 16 bits of S. The 18 bits that used to be D and S in the first long can be all kinds of fun things! (You could even add a few of those bits to the OPCODE field so you could support 1024 or 2048 different OPCODES!!!)
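
A minimal Python sketch of that double long decode, purely as an illustration. Putting D in the upper half and S in the lower half of the second long is an assumption, and `decode_double_long` is a hypothetical name.

```python
def decode_double_long(first, second):
    """Split a two-long instruction into its first long plus 16-bit D and S fields."""
    d = (second >> 16) & 0xFFFF  # assumed: D occupies the upper 16 bits
    s = second & 0xFFFF          # assumed: S occupies the lower 16 bits
    return {"first": first & 0xFFFFFFFF, "d": d, "s": s}

fields = decode_double_long(0x12345678, 0xABCD0123)
print(hex(fields["d"]), hex(fields["s"]))  # 0xabcd 0x123
```

Each 16-bit field could address 65536 registers, and the program counter would step by 2 longs per instruction.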

There's like over 8 billion mnemonic combinations using the English alphabet, so we certainly won't run out of mnemonics!

Or go back to variable length instructions on a LONG boundary just like we used to have variable length instructions on BYTE boundaries in the old days. Remember all the fun of hand disassembling 8 bit machine code?? It will be like deja vu all over again!!

Martin Hodge · 2014-05-15 09:19

My ideal chip? P1 ported to a smaller faster process.

That's it...

(Of course it's way too late for that.)

Heater. · 2014-05-15 11:32

Martin,

P1 ported to a smaller faster process.

Sounds exactly what we are heading toward now.

Except with more pins, more RAM and more COGS.

Sounds great!

base2design · 2014-05-15 13:22

I was musing about this too and find myself just wanting two (faster) P1's on a chip with some kind of FIFO/hub-thingy between them and on-chip flash.

That's my .02$ worth!

-joe

Dave Hein · 2014-05-15 13:34

At this point I would be satisfied with a 200 MHz P1 with Port B implemented, 64K of hub RAM and executing one instruction every two cycles. That would be 5 times faster than the current Propeller, and it would take almost no effort to develop.
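
The "5 times faster" figure can be checked with a quick Python calculation. The baseline of an 80 MHz P1 taking 4 clocks per instruction is an assumption here, since the post does not state it.

```python
def mips(clock_mhz, cycles_per_instruction):
    """Instructions per microsecond (MIPS) for a fixed-cycle core."""
    return clock_mhz / cycles_per_instruction

current_p1 = mips(80, 4)    # assumed baseline: 80 MHz, 4 clocks per instruction
proposed_p1 = mips(200, 2)  # 200 MHz, one instruction every two cycles
speedup = proposed_p1 / current_p1

print(current_p1, proposed_p1, speedup)  # 20.0 100.0 5.0
```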

base2design · 2014-05-15 13:35

Excuse my laziness, but would "Port B" translate into a second set of 32-I/O pins?

Dave Hein · 2014-05-15 13:37

Oh, and could I have a 16x16 multiplier and 8K of RAM per cog?

EDIT: Yes, port B would provide an additional 32 bits of I/O.

Dave Hein · 2014-05-15 13:40

Heater. wrote: »

Brian,

To address those 4K registers you will need at least 12 bits. That's another 8 bits per instruction. Are you proposing a 40 bit wide instruction in order to fit them in?

Clearly not as you specified 32-bit longs.

So how are you going to do this? I look forward to seeing your proposed instruction set and instruction encodings.

It would use the hubexec method, except as applied to cog RAM.

kwinn · 2014-05-15 18:26

No reason you couldn't have cogs with 40 bit registers and hub ram with 32 bit longs. There were some Harvard architecture systems built, and some current chips use it internally by implementing 2 separate address and data buses between the cpu and data/instruction caches. A bit more complicated, but 40 bit math ops might make it worthwhile.

Heater. wrote: »

Brian,

To address those 4K registers you will need at least 12 bits. That's another 8 bits per instruction. Are you proposing a 40 bit wide instruction in order to fit them in?

Clearly not as you specified 32-bit longs.

So how are you going to do this? I look forward to seeing your proposed instruction set and instruction encodings.

Cluso99 · 2014-05-15 18:58

Dave Hein wrote: »

It would use the hubexec method, except as applied to cog RAM.

Spot on Dave!

And in fact it would not be hubexec at all but native execution, because that is how modern micros (both microcontrollers and microprocessors) work...
Main program and data memory, and registers. In our case we have ~496 registers + ~16 special registers.
Running code in those registers is abnormal in the normal scheme of things.

And absolutely deterministic while running within the cog (core).

Heater. · 2014-05-15 21:22

mindrobots,

...how about we make the instruction fetch just a double long fetch...

Better to leave all that for the 64 bit P3

Then we keep the same nice regular architecture and instruction encoding style but can extend those src/dst fields to enable addressing millions of HUB registers.

That should satisfy all those who want more memory space in HUB without making crude hacks in the architecture.

I suspect a 40 bit wide architecture would do. But that might be too weird for people to accept.

kwinn

No reason you couldn't have cogs with 40 bit registers and hub ram with 32 bit longs.

Eeew. I can see many reasons for not wanting to do that.

mindrobots · 2014-05-15 21:59

Heater. wrote: »

....without making crude hacks in the architecture.

Um, no, we wouldn't want to do that!

JonnyMac · 2014-05-17 07:38

On my morning walk I was wishing for a 4 core P1 with hubexec and smart pins, a smaller chip with say, 16 IOs. I have a lot of small projects that cannot bear the cost of the P1. I think, considering the market(s) for small micros, that a smaller Propeller would do very well with volume vendors. But that's me thinking aloud as I type....

Kerry S · 2014-05-17 08:52

I would have been happy with a faster P1 core with a few new instructions (bit and pin control operators mainly like we had on the SX) and 16K or 32K cog ram for programs/data and more I/O with simple analog capabilities. Shared Hub ram could be 16K or 32K.

jmg · 2014-05-17 13:50

JonnyMac wrote: »

On my morning walk I was wishing for a 4 core P1 with hubexec and smart pins, a smaller chip with say, 16 IOs. I have a lot of small projects that cannot bear the cost of the P1. I think, considering the market(s) for small micros, that a smaller Propeller would do very well with volume vendors. But that's me thinking aloud as I type....

I agree that is an opening, but the questions are: what price and what package?
Examples: the latest Cypress PSoC4000 claims prices from ~29c (but that is the 8-pin part, and the next size up is 2x the price), and Nuvoton shows ~30c for a TSSOP20 part. Both of those parts have Flash on chip.

A two-chip solution, pitched at 'Low Pin Counts', is a little counterintuitive.

kwinn · 2014-05-18 13:07

Heater. wrote: »

mindrobots,

Better to leave all that for the 64 bit P3

Then we keep the same nice regular architecture and instruction encoding style but can extend those src/dst fields to enable addressing millions of HUB registers.

That should satisfy all those who want more memory space in HUB without making crude hacks in the architecture.

I suspect a 40 bit wide architecture would do. But that might be too weird for people to accept.

kwinn

Eeew. I can see many reasons for not wanting to do that.

Keeping code and data the same bit width does make for a simpler, cleaner design, but it is not always the most efficient design. One instrument I worked on had an embedded computer with 12 bit program memory and IIRC 36 bit registers. It used the minimum hardware possible for doing the calculations with the precision needed for that application, and could be made at a reasonable cost.

As much as I like the precision and memory space 64 bits makes possible, I have to wonder if it is worth the inefficiency of 64 bit instructions on a microcontroller. How much hardware can be placed on a chip for instruction decoding, CPUs, memory, and I/O? If instructions can address gigabytes of memory but the chip only has space for megabytes, aren't those extra bits in the instructions wasted space?
