What if the Propeller Architecture just makes excellent Microcontrollers??

mindrobots · 2014-05-09 06:07

Maybe the P2 is just destined to be the excellent evolution of the very capable and underappreciated P1 Microcontroller.

Bristling with smart I/O pins, packed full of more capable cores and all the empty spaces filled with HUBRAM.

Load COGS up with Parallax-vetted soft peripherals glued together with Spin or C Application Orchestration / HMI code (I don't like "Business Logic"), or try some OBEX objects for more obscure soft peripherals and applications.

When you exceed the capabilities of a MICROCONTROLLER, you grab your favorite microprocessor or SOC, replace the Orchestration object in your Propeller with the Parallax-provided SPI or I2C (or whatever else is cool) interface object and turn your Propeller into an awesome Soft Peripheral Collection. Once you do this, go crazy with your HMI stuff in a more appropriate playground. (Of course, you could also just put another P2 in front of your original P2.)

So what does the P2 REALLY need to be the best Microcontroller and Soft Peripheral Controller it can be?

More HUBRAM?
More Smart Pins?
Better P1 counter emulation?
SERDES?
Anything extra to make SPI slave high speed and flawless? (You don't need a bazillion /CS lines, you can have the SPI interface protocol use first command as a peripheral select - a soft /CS)
Multi-tasking?
HubExec? (I'm thinking not)
Expanded COGRAM?? (can we give up two of the cccc bits to provide one more address bit for SREG and DREG?) 1024 longs is twice as good as 512 longs!
Faster COGS?
DSP type features?
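
To make the "soft /CS" item in the list above concrete, here is a minimal sketch of a slave-side dispatcher that treats the first byte of each SPI frame as the peripheral select. It is only an illustration: the frame layout, the select values, the 0xFF "no such peripheral" reply and the SoftPeripheral class are all assumptions, not anything Parallax has defined.

```python
# Sketch only: route SPI slave frames by a leading "peripheral select" byte.
# The frame layout, select values and reply codes are hypothetical.

class SoftPeripheral:
    """Stand-in for a soft peripheral that would run in its own cog."""

    def __init__(self, name):
        self.name = name
        self.received = []

    def handle(self, payload):
        # Record the payload and reply with the number of bytes accepted.
        self.received.append(bytes(payload))
        return bytes([len(payload)])


def dispatch(frame, peripherals):
    """Use the first byte of the frame as the soft /CS."""
    if not frame:
        raise ValueError("empty frame")
    select, payload = frame[0], frame[1:]
    target = peripherals.get(select)
    if target is None:
        return b"\xff"  # hypothetical 'no such peripheral' reply
    return target.handle(payload)


peripherals = {0x01: SoftPeripheral("uart"), 0x02: SoftPeripheral("pwm")}
reply = dispatch(bytes([0x02, 0x10, 0x20]), peripherals)
```

The host then needs only one physical /CS line; everything after the select byte belongs to whichever soft peripheral was addressed.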

What else is needed to travel down this road?

Parallax supported soft peripheral library?
Soft Peripheral builder tool (drag and drop configuration for components and interfaces)?

Is it a better road to travel?

Maybe it just can't evolve into a general purpose microprocessor?

Just another road to consider...probably not popular with a lot of the factions but the P1 did get us here because of the soft peripheral concept and its capabilities as a microcontroller.

EDIT: This is also an excellent stepping stone to Heater's diabolical plan of having the P3 be a SOC with an ARM core for Application Orchestration and a 64 bit collection of SUPER COGs and Smart Pins.

Kerry S · 2014-05-09 07:35

With the death of the P2 it is clear we will need to use external processors for video, HMI and large data functions.

Your overview is a good one. The dream of an 'all in one' chip is gone, so what is the next step? I love Heater's Cog+ARM concept but that is, for now, just a dream and less likely than the P2 getting made.

So, following on from your ideas:

1. SERDES for as fast as possible serial comm as this is now critical in data exchange with the 'other' chip.

2. Better PASM with the new bit/nibble/pin operators that Chip has been working on. That allows tighter faster code.

3. Smart pins sound good, if they don't end up being so complex that they break the design or power envelope...

4. I know it cannot be done, due to instruction length limitations, but I would rather see the COGS with larger memory pools so that they can run programs internally without hub ram being involved. Hub ram should just be for data sharing. If we are moving our video/HMI/DATA off chip then the cogs should be optimized for doing complex driver functions. HUBEXEC, I fear, is going to be the eternal 'breaks this design' that leaves us in an infinite while loop waiting for it.

5. More pins, but that will not happen due to package choice.

6. Hardware tasking would be good. Something that does not break determinism. For a lot of I/O we don't need the bandwidth we have without using the cog to do multiple tasks.

Here is what I would like to see:

Hardware 1 / 2 / 4 task settings. Each of these hardware tasks would have their OWN 2K memory block with a small shared ram section (similar to the P2) for internal data sharing/coordination by special instructions. By having the tasks have their own memory blocks we get around the 9bit addressing issue with the instruction set. Now we, in effect, get 4x the memory for 4 tasks running in a hard locked time frame. Determinism is kept. This in effect gives us 4 micro cores within each cog. It also reduces the need for HUBEXEC which we almost HAVE to have in order to have these cogs working on multiple processes (I/O) at one time.

With this you can choose for each cog 1 x 80 MIPS, 2 x 40 MIPS or 4 x 20 MIPS.
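
A rough sketch of the arithmetic behind this proposal, assuming the 80 MIPS figure above and that each 2K block is 512 longs reached with the existing 9-bit address. The function names and the flat "physical address" layout are made up for illustration.

```python
# Sketch only: per-task throughput and task-private memory blocks.
COG_MIPS = 80          # per-cog figure quoted in the post
LONGS_PER_BLOCK = 512  # 9-bit address -> 512 longs = 2 KB per task


def mips_per_task(tasks):
    """Evenly split the cog's instruction rate between 1, 2 or 4 tasks."""
    if tasks not in (1, 2, 4):
        raise ValueError("only 1, 2 or 4 tasks are proposed")
    return COG_MIPS // tasks


def physical_address(task, addr9):
    """Map a task-local 9-bit address into a hypothetical 4-block cog RAM."""
    if not 0 <= task < 4:
        raise ValueError("task must be 0..3")
    if not 0 <= addr9 < LONGS_PER_BLOCK:
        raise ValueError("address must fit in 9 bits")
    return task * LONGS_PER_BLOCK + addr9
```

The point of the mapping is that the instruction word never needs more than 9 address bits: the task number, not the opcode, selects the block.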

Then forget Hubexec, just coordinate with the C compiler guys to make sure they can optimize LMM programming.

jmg · 2014-05-09 14:15

mindrobots wrote: »

Bristling with smart I/O pins, packed full of more capable cores and all the empty spaces filled with HUBRAM.

Yes, and in the order you have written.

mindrobots wrote: »

So what does the P2 REALLY need to be the best Microcontroller and Soft Peripheral Controller it can be?

More HUBRAM?
More Smart Pins?
Better P1 counter emulation?
SERDES?
Anything extra to make SPI slave high speed and flawless? (You don't need a bazillion /CS lines, you can have the SPI interface protocol use first command as a peripheral select - a soft /CS)
Multi-tasking?
HubExec? (I'm thinking not)
Expanded COGRAM?? (can we give up two of the cccc bits to provide one more address bit for SREG and DREG?) 1024 longs is twice as good as 512 longs!
Faster COGS?
DSP type features?

What else is needed to travel down this road?

Soft Peripheral Controller is a very good name, and it focuses that in larger designs, this is not doing all the lifting.

A clear market space, to me, is LCD display drivers like the SSD1963, which is itself a focused Soft Peripheral Controller.
That needs a good chunk of RAM, and good QuadSPI and Parallel Slave Port, and not a lot of Core IQ.
Of course, Higher CoreIQ can grow it from a scan engine, to a smarter graphics handler.

Another market space is between Microcontroller and CPLD/Small FPGA.
Moderate CPLDs are cheap, but to get one that can also do even modest uC work bumps you into P2 territory.
There are many designs where a Small FPGA is used as a Peripheral Controller, where a P2 can compete.
Especially if it focuses on the Smart Pins, and doing things FPGA cannot touch, like ADC on pins...
With a simple slave parallel port, it can also perform as 'Smarter SRAM'
(e.g. I see Cypress now support NVRAM with asynchronous Open NAND Flash Interface (ONFI))

mindrobots wrote: »

Maybe it just can't evolve into a general purpose microprocessor?

Nor should it try, but note that once it can do the above, it is already more capable than your Grandfather's Z80.

mindrobots wrote: »

Expanded COGRAM?? (can we give up two of the cccc bits to provide one more address bit for SREG and DREG?) 1024 longs is twice as good as 512 longs!

Because it is quad port, COGRAM 'costs' twice as much as HubRam, so any added COGRAM subtracts 2x that in HUB.
HUB is already just a bit small, but once HUBRAM is enough to support LCDs, then more COGRAM could be looked at.

There may already be opcode support, in the form of longer jumps & relative jumps. That would make the lower 512 code/data, and anything above 512 could be code / indirect data only, which might allow lower cost 2 port memory.
A COG might then be 1024, with lower 512 as register/code and upper 512 as code/idata
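
For reference, the bit budget behind the quoted cccc idea and the split layout just described can be sketched as follows. The P1 field widths (6-bit opcode, 4 zcri bits, 4 condition bits, 9-bit destination, 9-bit source) are the existing encoding; the widened layout and the region names are only an illustration of the proposals in this thread, not a real P2 encoding.

```python
# Sketch only: field widths of a 32-bit P1 instruction and a hypothetical
# variant that trades two condition bits for wider D and S fields.
P1_FIELDS = {"opcode": 6, "zcri": 4, "cond": 4, "dest": 9, "src": 9}


def widen_addresses(fields, bits_from_cond=2):
    """Move bits from the condition field evenly into dest and src."""
    new = dict(fields)
    new["cond"] -= bits_from_cond
    new["dest"] += bits_from_cond // 2
    new["src"] += bits_from_cond // 2
    return new


def region(addr):
    """Classify a long address under the suggested 512 + 512 split."""
    if 0 <= addr < 512:
        return "register/code"
    if 512 <= addr < 1024:
        return "code/indirect data"
    raise ValueError("address outside a 1024-long cog")


wide = widen_addresses(P1_FIELDS)
```

Note the cost: with only two condition bits left, there would be 4 condition codes instead of 16.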

OR, if that upper memory is dual ported to an adjacent COG, would 512 as register/code + 256 as code/idata/shared work?

jmg · 2014-05-09 14:25

Kerry S wrote: »

6. Hardware tasking would be good. Something that does not break determinism. For a lot of I/O we don't need the bandwidth we have without using the cog to do multiple tasks.

Here is what I would like to see:

Hardware 1 / 2 / 4 task settings. Each of these hardware tasks would have their OWN 2K memory block with a small shared ram section (similar to the P2) for internal data sharing/coordination by special instructions. By having the tasks have their own memory blocks we get around the 9bit addressing issue with the instruction set. Now we, in effect, get 4x the memory for 4 tasks running in a hard locked time frame. Determinism is kept. This in effect gives us 4 micro cores within each cog. It also reduces the need for HUBEXEC which we almost HAVE to have in order to have these cogs working on multiple processes (I/O) at one time.

With this you can choose for each cog 1 x 80 MIPS, 2 x 40 MIPS or 4 x 20 MIPS.

Then forget Hubexec, just coordinate with the C compiler guys to make sure they can optimize LMM programming.

Interesting idea. I've not seen info on the 'relative challenge' in doing each of HubExec or Tasking, but tasking certainly appeals as a way to not waste silicon: many I/O problems are small, but determinism is essential.
Chip already had a flexible Task mapper, and I see no reason to leave out the 3x choice, or the abac choice,
i.e. above you have aaaa, abab, abcd; it would be easy to include the useful but missing abc, abac, aab and aaab.
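
A small sketch of what those slot patterns mean for each task, assuming a pattern is simply a repeating sequence of time slots with one letter per task. The helper names are invented for illustration and say nothing about how Chip's task mapper is actually built.

```python
# Sketch only: share of the cog and worst-case wait for each task in a
# repeating slot pattern such as "abac".
from collections import Counter
from fractions import Fraction


def task_shares(pattern):
    """Fraction of all slots that each task receives."""
    counts = Counter(pattern)
    return {task: Fraction(n, len(pattern)) for task, n in counts.items()}


def worst_gap(pattern, task):
    """Longest distance, in slots, between consecutive turns of a task."""
    slots = [i for i, t in enumerate(pattern) if t == task]
    if not slots:
        raise ValueError("task does not appear in pattern")
    n = len(pattern)
    # Distance to the next turn, wrapping around the end of the pattern.
    gaps = [(slots[(k + 1) % len(slots)] - slots[k]) % n or n
            for k in range(len(slots))]
    return max(gaps)
```

Because the pattern is fixed, both numbers are constants for a given task, which is what keeps the timing deterministic: in abac, task a gets half the cog and never waits more than two slots.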

dMajo · 2014-05-10 06:12

Kerry S wrote: »

...

Here is what I would like to see:

Hardware 1 / 2 / 4 task settings. Each of these hardware tasks would have their OWN 2K memory block with a small shared ram section (similar to the P2) for internal data sharing/coordination by special instructions. By having the tasks have their own memory blocks we get around the 9bit addressing issue with the instruction set. Now we, in effect, get 4x the memory for 4 tasks running in a hard locked time frame. Determinism is kept. This in effect gives us 4 micro cores within each cog. It also reduces the need for HUBEXEC which we almost HAVE to have in order to have these cogs working on multiple processes (I/O) at one time.

With this you can choose for each cog 1 x 80 MIPS, 2 x 40 MIPS or 4 x 20 MIPS.

Then forget Hubexec, just coordinate with the C compiler guys to make sure they can optimize LMM programming.

This is exactly what I also like and already similarly proposed.

With 8K cogs perhaps you also do not need a common shared area when in threading mode. It will simply suffice to extend the mov instruction set a bit with:

mov01, mov02, mov03, mov04: to move from current (page) source to hard-wired 2K page destination
mov10, mov20, mov30, mov40: to move from hard-wired 2K page source to current (page) destination
the classic mov (and other opcodes) will always address source and destination within the current page.

Additionally, a jmp1, jmp2, jmp3, jmp4 can jump within the addressed 2K page to switch to it, so that a single thread can execute from the whole 8K area, or from 4K in the case of 2 threads.
In this way the shared (communication) area between threads (where and how wide) is up to the programmer.
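
The paging scheme above can be modelled in a few lines. This is a sketch under assumptions: pages are numbered 0 to 3 here (the post numbers them 1 to 4), each page is 512 longs, and the PagedCog class and its method names are invented stand-ins for the proposed mov0N / movN0 / jmpN instructions.

```python
# Sketch only: a toy model of an 8K cog split into four 2K pages.
PAGE_LONGS = 512  # 2K bytes = 512 longs, reachable with a 9-bit address
PAGES = 4


class PagedCog:
    def __init__(self):
        self.ram = [[0] * PAGE_LONGS for _ in range(PAGES)]
        self.page = 0  # page the running code currently addresses

    def _check(self, page):
        if not 0 <= page < PAGES:
            raise ValueError("page must be 0..3")

    def mov(self, dest, src):
        """Classic mov: source and destination in the current page."""
        self.ram[self.page][dest] = self.ram[self.page][src]

    def mov_to_page(self, page, dest, src):
        """Like mov01..mov04: current-page source, fixed-page destination."""
        self._check(page)
        self.ram[page][dest] = self.ram[self.page][src]

    def mov_from_page(self, page, dest, src):
        """Like mov10..mov40: fixed-page source, current-page destination."""
        self._check(page)
        self.ram[self.page][dest] = self.ram[page][src]

    def jmp_page(self, page):
        """Like jmp1..jmp4: make another page the current one."""
        self._check(page)
        self.page = page
```

In this model any long in any page can serve as a mailbox between threads, which is the sense in which the shared area is left up to the programmer.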

coginit, cogstop will index cog,thread .. in case of cogstop of multithreaded cogs it will not change the behavior but start executing nops in place of the killed thread code.

This does not require any additional read/write port to the cog ram because the cpu executing the code is always the same. Only the IO registers should be replicated/mapped at the end of each 2K block.

With such 8K cogs, even with only 8 of them, I'll not miss hubexec. rd/wr-quad would still be wanted.
