The New 16-Cog, 512KB, 64 analog I/O Propeller Chip

davidsaunders · 2015-04-05 05:46

David Betz wrote: »

Wasn't the quad width hub lost with the new "simpler" P2? I thought that was one of the features of P2-hot that caused its demise.

It may have been. I was thinking that it was one of the few things that is being carried over. Someone will have to go back through this thread and see.

davidsaunders · 2015-04-05 05:47

Heater. wrote: »

David,

I second that motion.

And I second the second.

The term COG does seem to put off those accustomed to other multicore processors.

davidsaunders · 2015-04-05 05:51

David Betz wrote:

Wasn't the quad width hub lost with the new "simpler" P2? I thought that was one of the features of P2-hot that caused its demise.

That is what I thought. Chip said that the RDQUAD/WRQUAD access is still going to be there.

cgracey wrote: »

This thread is about the new chip we are going to build in the 180nm process.
...

Though the cog memory map is still 512x32, cog RAM will be physically organized as 128 x 128, so we can read or write four contiguous registers with RDQUAD/WRQUAD instructions. This is way better than what we had on the Prop2, because, rather than just affecting mappable overlay registers, these transfers are into and out of the actual cog registers, themselves. These 128 bit paths don't take too much mux'ing and they keep the power down to reasonable levels. Interfaces to any peripherals can take advantage of them, too. This also gives cogs running at 200MHz (100 MIPS) a hub memory bandwidth of 200MB/s, which is enough to do any kind of VGA that we have the internal hub memory to support, at any color depth (up to 24bpp) - without any hub slot reallocation scheme needed to favor particular cogs. LMM greatly benefits from this, too.
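Chip's 200MB/s figure and his "any kind of VGA that we have the internal hub memory to support" claim can be checked with a little arithmetic. Below is a minimal sketch, assuming a 16-slot round-robin hub (one slot per cog every 16 clocks; my assumption, not stated in the post) and 16 bytes per RDQUAD/WRQUAD. The video modes are hypothetical examples, and the rate ignores blanking.

```python
# Sketch: per-cog hub bandwidth implied by the RDQUAD/WRQUAD figures.
# ASSUMPTION: each of the 16 cogs gets one hub slot every 16 clocks
# (round-robin), and each RDQUAD/WRQUAD moves 128 bits (16 bytes).

SYS_CLOCK_HZ = 200_000_000    # 200 MHz system clock, from the post
NUM_COGS = 16                 # assumed round-robin slot rotation
BYTES_PER_QUAD = 128 // 8     # one quad transfer = 4 longs = 16 bytes
HUB_RAM_BYTES = 512 * 1024    # 512KB hub RAM


def hub_bandwidth(clock_hz: int, cogs: int, bytes_per_access: int) -> float:
    """Peak bytes/s one cog sees if it uses every hub slot it is offered."""
    return clock_hz / cogs * bytes_per_access


def frame_bytes(width: int, height: int, bpp: int) -> int:
    """Size of a packed frame buffer in bytes."""
    return width * height * bpp // 8


def video_rate(width: int, height: int, bpp: int, refresh_hz: int) -> float:
    """Average pixel-data rate in bytes/s (blanking ignored, so the
    instantaneous rate during active video is somewhat higher)."""
    return frame_bytes(width, height, bpp) * refresh_hz


if __name__ == "__main__":
    bw = hub_bandwidth(SYS_CLOCK_HZ, NUM_COGS, BYTES_PER_QUAD)
    print(f"per-cog hub bandwidth: {bw / 1e6:.0f} MB/s")
    # Hypothetical modes, checked against both bandwidth and hub RAM size.
    for w, h, bpp in [(320, 240, 24), (640, 480, 8), (640, 480, 24)]:
        size = frame_bytes(w, h, bpp)
        rate = video_rate(w, h, bpp, 60)
        print(f"{w}x{h}x{bpp}bpp@60Hz: {rate / 1e6:5.1f} MB/s, "
              f"fits hub RAM: {size <= HUB_RAM_BYTES}")
```

Every mode here stays well under 200MB/s; the 640x480 24bpp frame (921,600 bytes) is the one that fails, and it fails on hub RAM size rather than bandwidth, which matches Chip's point.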

tonyp12 · 2015-04-05 09:17

Maybe P2 should drop DRAM support and go with native hyperbus support? Though Spansion is dragging its feet, it should be fully available before the end of 2015, I guess.
Having 64MEG of Flash and 2MEG of RAM using only 12 pins.

Bold move, stack these extra two dies on top of the P2 using buried pins and make it the default.

Second bold move, redesign the APU to natively run in hyperbus mode (maybe more for P3)
e.g., the P2/3 does not have any internal ROM or Hub RAM (but a 1K bus cache).
It's a hyperbus MCU, the first of its kind (it should also have native HDMI and/or USB-C support, of course).

>microcontrollers where both RAM and Flash are connected to the same Hyperbus Interface.
http://news.spansion.com/2015-02-23-Spansion-Introduces-HyperRAM-Memory

jmg · 2015-04-05 12:28

David Betz wrote: »

What if you just add some really fast chip-to-chip communications to talk to a separate "application processor" and only run "soft peripherals" on the P2?

That should come by default - the smart pins should give fast master and slave operation.
HyperBUS slave could be interesting as well as master, since I see MCUs with HyperBUS masters coming in 2015.

David Betz wrote: »

Is there any chance that P2 could dispense with the term "COG" in favor of just calling them "cores"? I think one of the problems with talking about the Propeller with the uninitiated is the different jargon and COG is at the top of the list.

Agreed, and it is quite an easy change to make: it has zero silicon impact.

jmg · 2015-04-05 12:39

tonyp12 wrote: »

Maybe P2 should drop DRAM support and go with native hyperbus support? Though Spansion is dragging its feet, it should be fully available before the end of 2015, I guess.
Having 64MEG of Flash and 2MEG of RAM using only 12 pins.

Yes, 2015 should see release of HyperFLASH, HyperRAM and MCUs with hyperFLASH hosting.
A design can emulate a large portion of HyperBUS right now, with 2 x QuadSPI in DDR mode. HyperBUS is actually a simpler setup than QuadSPI DDR, as it has fewer modes, but it would be a good idea to support both 2 x QuadSPI and HyperBUS links in a peripheral block (only minor config changes are needed).
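To make the "2 x QuadSPI in DDR mode" idea concrete, here is a minimal sketch of the HyperBus command-address (CA) phase: a 48-bit word sent as 6 bytes over 3 clocks on an 8-bit DDR bus, with each byte split into two nibbles for two side-by-side QuadSPI devices. The CA bit layout follows my reading of the HyperBus specification and should be verified against the part's datasheet; the nibble-to-chip wiring and all function names are hypothetical.

```python
# Sketch: HyperBus command-address (CA) phase as it might be emulated on an
# 8-bit DDR link built from two QuadSPI devices side by side.
# ASSUMPTION: CA bit layout as I read the HyperBus spec (verify against the
# datasheet); the nibble-to-chip wiring is hypothetical.


def hyperbus_ca(word_addr: int, read: bool,
                register_space: bool = False, linear: bool = True) -> int:
    """Pack a 48-bit CA word. word_addr counts 16-bit words."""
    if not 0 <= word_addr < (1 << 32):
        raise ValueError("word address must fit in 32 bits")
    ca = int(read) << 47              # CA[47]: 1 = read, 0 = write
    ca |= int(register_space) << 46   # CA[46]: 1 = register space
    ca |= int(linear) << 45           # CA[45]: 1 = linear, 0 = wrapped burst
    ca |= (word_addr >> 3) << 16      # CA[44:16]: upper address bits A31..A3
    ca |= word_addr & 0x7             # CA[2:0]: lower column bits A2..A0
    return ca                         # CA[15:3] stay 0 (reserved)


def ddr_cycles(ca: int) -> list[tuple[int, int]]:
    """Split the CA word into (rising-edge byte, falling-edge byte) per clock,
    most significant byte first: 6 bytes take 3 clocks on an 8-bit DDR bus."""
    data = ca.to_bytes(6, "big")
    return [(data[i], data[i + 1]) for i in range(0, len(data), 2)]


def split_for_dual_qspi(byte: int) -> tuple[int, int]:
    """Hypothetical wiring: high nibble to QuadSPI chip A, low nibble to chip B."""
    return byte >> 4, byte & 0xF


if __name__ == "__main__":
    ca = hyperbus_ca(0x123, read=True)
    for clk, (rise, fall) in enumerate(ddr_cycles(ca)):
        print(f"clk {clk}: rise {rise:02X} -> {split_for_dual_qspi(rise)}, "
              f"fall {fall:02X} -> {split_for_dual_qspi(fall)}")
```

A real controller would also need chip-select, RWDS handling and the initial latency count, which this sketch leaves out.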

tonyp12 wrote: »

Second bold move, redesign the APU to natively run in hyperbus mode (maybe more for P3)
e.g., the P2/3 does not have any internal ROM or Hub RAM (but a 1K bus cache).
It's a hyperbus MCU, the first of its kind (it should also have native HDMI and/or USB-C support, of course).

Certainly it could support eXecute In Place (XIP) on HyperBUS, but the random access time is still a little too slow to dispense entirely with HUB memory.
Maybe a dual HyperBUS design, with one bus for data and one for code, could be worth investigating?
It depends on how smart the smart pin cell is going to be.

potatohead · 2015-04-05 14:19

Well you all know what I think about COG and CORE.

Frankly, COGS are just a bit different. They do not have interrupts, and they have a small address space, and they are isolated from the shared memory space too.

Typical multicore designs I see have no isolation, have interrupts, and are intended to run larger programs too.

I may feel better about it, depending on whether we get HUBEXE and how well it works compared to LMM and bytecode programs.

Say CORE and most of those expectations jump to the forefront, and then a lot of time gets spent explaining away how the Propeller core is so different from what most people think a core should be.

Additionally, those differences get us some cool stuff like objects and the ease of combining them, all well understood and proven here well in advance of the general multi processing trend for microcontrollers.

I generally switch to core when a COG looks a lot more like one, and Chip decides to go along for the ride.

The big one for me is that memory isolation. It's a notable thing that differentiates a Propeller from a device simply having more than one CPU running with a pool of RAM.

evanh · 2015-04-05 16:15

davidsaunders wrote: »

That is what I thought, Chip said that the RDQUAD/WRQUAD access is still going to be there .

Yep. What was dropped was the 128-bit data path into HubRAM. It's still 128 bits from Cog to cache, but the Hub databus changed from a single 128-bit bus (or was it 256-bit?) to sixteen ambidextrous 32-bit buses. Either way, peak throughput went up while power consumption went down.
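The "peak throughput went up" part is easy to quantify in terms of bus width per clock. This is a minimal sketch assuming every bus moves its full width once per clock; the 200 MHz clock is borrowed from Chip's earlier figure, and since the post is unsure whether the old bus was 128 or 256 bits, both are computed.

```python
# Sketch: aggregate hub data-path width, old single bus vs. sixteen 32-bit buses.
# ASSUMPTIONS: every bus moves its full width once per clock, and the 200 MHz
# clock is borrowed from Chip's earlier figure. The old width is uncertain in
# the post (128 or 256 bits), so both are computed.

SYS_CLOCK_HZ = 200_000_000
NUM_COGS = 16


def bytes_per_clock(num_buses: int, bus_bits: int) -> int:
    """Aggregate bytes that can cross the hub interface in one clock."""
    return num_buses * bus_bits // 8


def peak_throughput(num_buses: int, bus_bits: int,
                    clock_hz: int = SYS_CLOCK_HZ) -> int:
    """Aggregate peak bytes/s across all buses."""
    return bytes_per_clock(num_buses, bus_bits) * clock_hz


if __name__ == "__main__":
    new = bytes_per_clock(16, 32)
    for old_bits in (128, 256):
        old = bytes_per_clock(1, old_bits)
        print(f"single {old_bits}-bit bus: {old} B/clk; "
              f"16 x 32-bit: {new} B/clk ({new // old}x)")
    per_cog_bits = 8 * bytes_per_clock(16, 32) // NUM_COGS
    print(f"per cog: {per_cog_bits} bits/clk, "
          f"aggregate {peak_throughput(16, 32) / 1e9:.1f} GB/s")
```

Even against a 256-bit single bus, sixteen 32-bit buses double the peak width, and 32 bits per clock per cog lines up with Chip's later "up to 32 bits per clock" remark. The power side of the claim can't be checked this way.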

Dave Hein · 2015-04-06 05:58

P2 Watch: 75 more days till the end of Spring.

        April                   May                   June        
Su Mo Tu We Th Fr Sa   Su Mo Tu We Th Fr Sa   Su Mo Tu We Th Fr Sa
         -- -- -- --                   1  2       1  2  3  4  5  6
--  6  7  8  9 10 11    3  4  5  6  7  8  9    7  8  9 10 11 12 13
12 13 14 15 16 17 18   10 11 12 13 14 15 16   14 15 16 17 18 19 20
19 20 21 22 23 24 25   17 18 19 20 21 22 23   -- -- -- -- -- -- --
26 27 28 29 30         24 25 26 27 28 29 30   -- -- --
                       31

David Betz · 2015-04-06 06:31

75 days until the end of spring? Then what happens? We start counting the number of days until the end of summer?

Dave Hein · 2015-04-06 07:19

Ken has stated that there will be a get-together in Rocklin in the Fall. The main point of the event is to work with and demonstrate the P2 on Parallax's FPGA board. The event will be announced 6 months ahead, when the P2 design is completed and an FPGA image is available. 6 months before Fall is Spring, so an FPGA image should be available sometime this Spring. Spring started on March 20, and the last full day of Spring is June 20. There are 75 more days until June 20, so a P2 FPGA image should be available sometime within the next 75 days.
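The 75-day countdown is plain date arithmetic. A minimal sketch using Python's standard `datetime` module, with the post date (April 6, 2015) and the last full day of Spring (June 20) taken from the thread; the helper name is just for illustration.

```python
from datetime import date


def days_between(start: date, end: date) -> int:
    """Number of whole days from start to end."""
    return (end - start).days


post_date = date(2015, 4, 6)            # date of the "P2 Watch" post
spring_start = date(2015, 3, 20)        # first day of Spring, per the post
last_day_of_spring = date(2015, 6, 20)  # last full day of Spring, per the post

print(days_between(post_date, last_day_of_spring))      # days left in Spring
print(days_between(spring_start, last_day_of_spring))   # length of Spring
```

April 6 to April 30 is 24 days, plus 31 for May and 20 for June, which gives the 75.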

David Betz · 2015-04-06 07:20

Dave Hein wrote: »

Ken has stated that there will be a get-together in Rocklin in the Fall. The main point of the event is to work with and demonstrate the P2 on Parallax's FPGA board. The event will be announced 6 months ahead, when the P2 design is completed and an FPGA image is available. 6 months before Fall is Spring, so an FPGA image should be available sometime this Spring. Spring started on March 20, and the last full day of Spring is June 20. So there are 75 more days until June 20.

Yes, I understand all of that logic but it's probably just wishful thinking to believe that they can stick to that schedule rigidly. You can't schedule creativity! :-)

Dave Hein · 2015-04-06 07:31

Of course you can schedule creativity. Companies do it all the time. That's how they remain in business and are competitive. To be honest, if Parallax doesn't fulfill this latest version of the schedule I am probably done with them. It's ludicrous that Parallax has taken soooooooo long to develop the P2. The P1 is no longer competitive with AVR's latest offerings. The P2 could compete well against it, but if Parallax continues to take a lackadaisical approach to developing it we will never see it.

kwinn · 2015-04-06 08:30

potatohead wrote: »

Well you all know what I think about COG and CORE.

Frankly, COGS are just a bit different. They do not have interrupts, and they have a small address space, and they are isolated from the shared memory space too.

Typical multicore designs I see have no isolation, have interrupts, and are intended to run larger programs too.

I may feel better about it, depending on whether we get HUBEXE and how well it works compared to LMM and bytecode programs.

Say CORE and most of those expectations jump to the forefront, and then a lot of time gets spent explaining away how the Propeller core is so different from what most people think a core should be.

Additionally, those differences get us some cool stuff like objects and the ease of combining them, all well understood and proven here well in advance of the general multi processing trend for microcontrollers.

I generally switch to core when a COG looks a lot more like one, and Chip decides to go along for the ride.

The big one for me is that memory isolation. It's a notable thing that differentiates a Propeller from a device simply having more than one CPU running with a pool of RAM.

So what! It's an atypical "core", but still the core of a processor. Personally I don't care if it's called a core or a cog, but if it helps in making the chip a success I am happy to call it anything at all.

David Betz · 2015-04-06 08:43

Dave Hein wrote: »

Of course you can schedule creativity. Companies do it all the time. That's how they remain in business and are competitive. To be honest, if Parallax doesn't fulfill this latest version of the schedule I am probably done with them. It's ludicrous that Parallax has taken soooooooo long to develop the P2. The P1 is no longer competitive with AVR's latest offerings. The P2 could compete well against it, but if Parallax continues to take a lackadaisical approach to developing it we will never see it.

Yeah, I guess you're right. At this point, it seems like the P2 will have to compete against the lowRISC chip as well, since it seems to be coming out at about the same time.

potatohead · 2015-04-06 08:57

kwinn wrote: »

It's an atypical "core", but still the core of a processor. Personally I don't care if it's called a core or a cog, but if it helps in making the chip a success I am happy to call it anything at all.

In the "dark silicon" papers recently linked here, the word "tile" is being used. We all trashed on "tile" as some kind of silly thing. But the truth is, "tile" has some connotations associated with it that happen to be important in terms of power, silicon use, performance, etc...

Calling the COGS cores may well help with success. If so, great! I'm on board.

However, I've not really seen any compelling cases on that yet.

I like COG, because it triggers the question, "What is a COG?", just as "tile" triggers, "What is a tile?" and in that moment, the other person is open to understanding things more clearly, and that is a nice thing to get done when your product is highly differentiated.

In terms of "tile", a similar dynamic plays out when discussing the latest design trends. Tiles are important, and people can, should, and I would argue want to understand those matters of importance when confronted with "tile" rather than "core". "Core" would be the standard word, and it would overgeneralize the concepts the authors found important enough to write a paper about.

On the other hand, "core" may well result in people thinking they understand when they really don't. And that's not their fault either. It just is. Then the questions become uglier ones: "How come this core doesn't work like most other cores?" In general, I find that setting expectations right away is much better for products and people than having to reset or manage them later.

Generalizations have merit. I'm actually a fan, and I use them regularly. I do that so ordinary people, such as myself, have a shot at understanding my efforts to share something with them.

But when there are specifics, necessary understandings that people won't always have a strong motive to grok, having a "token" or "buzz" word, term, etc. can often make a ton of sense. Additionally, there is a long-established observation in play, which I first read in a BYTE column by Jerry Pournelle: "If you use the term, you buy into the concept," on a basic level. He was referring to the many new terms Microsoft, along with others, was putting out there. Owning a bit of the language isn't a bad thing when it's associated with our products in a good way. This observation has been of high value to me professionally over the years.

The value comes in easy, positive differentiation. People will make choices when they are aware of them. And when they understand the choices, they make better choices too.

And that's why I hold the position that I do. It's not contrary just because. I've got very specific reasons for it.

Yes, I am a pain in the Smile!

Bill Henning · 2015-04-06 13:25

Adding support for hyperbus? Not a bad idea.

Dropping DRAM support? Very bad idea.

tonyp12 wrote: »

Maybe P2 should drop DRAM support and go with native hyperbus support? Though Spansion is dragging its feet, it should be fully available before the end of 2015, I guess.
Having 64MEG of Flash and 2MEG of RAM using only 12 pins.

Bold move, stack these extra two dies on top of the P2 using buried pins and make it the default.

Second bold move, redesign the APU to natively run in hyperbus mode (maybe more for P3)
e.g., the P2/3 does not have any internal ROM or Hub RAM (but a 1K bus cache).
It's a hyperbus MCU, the first of its kind (it should also have native HDMI and/or USB-C support, of course).

>microcontrollers where both RAM and Flash are connected to the same Hyperbus Interface.
http://news.spansion.com/2015-02-23-Spansion-Introduces-HyperRAM-Memory

potatohead · 2015-04-06 14:54

Can hyperbus be done without licenses?

Seconded on DRAM. Needs to be possible.

JDat · 2015-04-06 15:40

Today you want hyperbus. What will you want tomorrow? CSI? DSI? I2C? UART? SPI? HDMI? VGA? Ethernet? SATA? WS2812? Wifi? Bluetooth? GSM? The beauty of the Propeller is bit-banging, not peripheral hardware. I don't want to tie my design to specific pins. I want pin assignment on the PCB to be 99.95% flexible, like on P1. No hardware peripherals on die.

jmg · 2015-04-06 15:44

potatohead wrote: »

Can hyperbus be done without licenses?

Seconded on DRAM. Needs to be possible.

Given 8-bit (dual chip) QuadSPI with DDR hardware covers most of the bases, you could even avoid using the HyperBUS name, if that really was a costly sticking point.
Once you have that HW, the address-serving choice of nibble or byte is pretty much all you need for modest-speed operation.
The P2 is unlikely to go fast enough to need the CLK return signal timing.

DRAM will always be possible; it comes down to how much HW-layer support you need around RAS/CAS and what ultimate MHz you chase.

jmg · 2015-04-06 15:51

JDat wrote: »

Today you want hyperbus. What will you want tomorrow? CSI? DSI? I2C? UART? SPI? HDMI? VGA? Ethernet? SATA? WS2812? Wifi? Bluetooth? GSM? The beauty of the Propeller is bit-banging, not peripheral hardware. I don't want to tie my design to specific pins. I want pin assignment on the PCB to be 99.95% flexible, like on P1. No hardware peripherals on die.

P1 already has Counters and Video peripheral hardware on die, and Chip has indicated Smart pins are coming on P2.
Such HW peripherals are needed to avoid thrashing about in software, trying to do at modest speeds what HW can do much faster, and lower power.
Of course, if your needs are only modest, and you have COGs to burn, feel free to do it all in software.

JDat · 2015-04-06 16:20

Why hyperbus? Maybe somebody wants more advanced counters for DRAM addressing and refresh, or DMA, or a JPEG/MPEG encoder/decoder? Whatever. Maybe somebody wants an FPGA or ARM on the P2 die? On an FPGA you can do whatever you want, but please don't put Smile on the die. The Propeller is for high-speed real-time processing, not for Linux or whatever OS. Don't make the P2 a general-purpose MCU. Regarding energy burning. Actually...
...it depends.

jmg · 2015-04-06 16:56

JDat wrote: »

Why hyperbus?

You really need to ask ?
There are other threads on this, but HyperBUS is really just a grown-up QuadSPI, with both Flash and RAM offerings.
It gives DRAM performance with far fewer pins, and the P2 is going to be very pin-constrained if it has to support full DRAM.
The QuadSPI chips are out there now, and very cheap, and are obvious targets to support in HW, given the low die costs and large performance gains over thrashing pins in software.
HyperBUS is a (very) small incremental step from QuadSPI support, and when releasing a MCU that you hope for design longevity on, you certainly need to look around at what cheap and small support chips are out there.
That is one reason I2C was dropped on P2 and SPI is used for boot instead.

Tubular · 2015-04-06 17:32

jmg, are there any likely second sources for hyperbus on the horizon?

Also, any news on MIPI SenseWire?

JDat · 2015-04-06 17:54

Okay! Where are working examples of hyperbus? A 30-second Google search shows only typical advertising-style information, no objective third-party information. This looks similar to Audinate's Dante protocol: a 12-year-old proprietary and expensive system that only now (in the last 3 years) is slowly being integrated into existing products. Hyperbus is too young to be a universally accepted standard. By the time hyperbus is accepted as a standard, if it survives, there will be Parallax P3 or P4 MCUs.

jmg · 2015-04-06 19:13

Tubular wrote: »

jmg, are there any likely second sources for hyperbus on the horizon?

ISSI have signed already, and I think Freescale have MCUs coming, so I expect it to ramp quite quickly, given that it is really SPI v2.
Of course, any HyperBUS support should not support only HyperBUS, it should have a nibble mode for QuadSPI address too, to cover the widest Advanced-Serial-mode parts.

rod1963 · 2015-04-06 19:41

I hope Chip isn't listening. The last time he listened to the gurus they turned the P-2 design into a fiasco.

koehler · 2015-04-06 20:50

Double-post removed

koehler · 2015-04-06 21:09

potatohead wrote: »

We all trashed on "tile" as some kind of silly thing. But the truth is, "tile" has some connotations associated with it that happen to be important in terms of power, silicon use, performance, etc...
Calling the COGS cores may well help with success. If so, great! I'm on board.
However, I've not really seen any compelling cases on that yet.

Computing 'tiles' may have been trashed by some, but that merely reflects either their ignorance or a fantasy POV of the Cog as somehow superior. The idea of tiles has been around longer than the Prop.
Persisting in calling them Cogs, as if they were somehow radically different from a Core, is wrong and only helps convince otherwise knowledgeable people not to bother with the Prop.
Most people have a decently tuned BS detector. Taking something as simple as a Core and trying to repackage it as something esoteric a la Cog is probably the first thing that sets off the BSD.
Next comes PASM or SPIN, with C in some funky *MM mode that, at first glance, is vaguely reminiscent of paging or bank switching to those with mid-to-long-term experience. Yes, it may not actually be that, but the 2-3 minutes new eyeballs are going to spend on the site are all going to be skimming and scanning.

Parallax already seems to have figured out that the Cog nomenclature is not much of a boon, or at least that was the case the last time I looked at something on the main site, IIRC.

Nothing stopping you from calling them Cogs if you want to, though.

jmg · 2015-04-07 17:11

This latest update from Chip: (I added some bold)

cgracey wrote: »

Things are coming together well for the Prop 2.

Last night I got the new transfer/DDS block finished and I hooked it into the cog. It shuttles bytes/words/longs of I/O pin data to/from hub RAM at up to 32 bits per clock. It also drives DACs at those rates and performs DDS/Goertzel operations. It uses a 256x32 look-up RAM for outputting pixel-type and DDS/Goertzel data. All cogs can utilize the full bandwidth without affecting the others. I need to thoroughly test it now and then get onto the next things: hub execution (not much code, but challenging to think about), hub-based CORDIC (straightforward), and smart pins (not hard, but rather open-ended).

I hope that in two months' time, I'll have an FPGA image ready. We are making a final PCB for our Cyclone V -A7 board now. We've already proven it and developed its loader which uses 2Mbps FTDI USB serial talking to a Prop 1. It loads about 35x-70x faster than Quartus' built-in programmer (3 seconds to load straight into the FPGA, 6 seconds to load into flash for cold booting).

Our -A7 board will support all 16 cogs and 512KB hub RAM.

The DE2-115 will fit ~12 cogs and 256KB hub RAM.

All this memory and I/O bandwidth, plus hub exec, is going to be really fun to work with.
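Chip doesn't say how the transfer/DDS block does DDS or Goertzel internally, so the following is only a generic software model of the two techniques, not his hardware. A phase accumulator (32 bits, my assumption) adds a tuning word each sample, and its top 8 bits index a 256-entry sine table, in the spirit of the 256x32 look-up RAM; a textbook Goertzel filter then measures the energy at a single frequency. The sample rate and test frequencies are hypothetical.

```python
import math

ACC_BITS = 32            # assumed phase-accumulator width
LUT_BITS = 8             # 2**8 = 256 entries, like the 256x32 look-up RAM
LUT_SIZE = 1 << LUT_BITS
ACC_MASK = (1 << ACC_BITS) - 1

# One full sine cycle as signed 16-bit values.
SINE_LUT = [round(32767 * math.sin(2 * math.pi * i / LUT_SIZE))
            for i in range(LUT_SIZE)]


def dds_samples(freq_hz: float, sample_rate_hz: float, n: int) -> list[int]:
    """Direct digital synthesis: add a tuning word to a phase accumulator each
    sample and use its top LUT_BITS bits to index the sine table."""
    tuning_word = round(freq_hz / sample_rate_hz * (1 << ACC_BITS)) & ACC_MASK
    phase = 0
    out = []
    for _ in range(n):
        out.append(SINE_LUT[phase >> (ACC_BITS - LUT_BITS)])
        phase = (phase + tuning_word) & ACC_MASK
    return out


def goertzel_power(samples: list[int], target_hz: float,
                   sample_rate_hz: float) -> float:
    """Squared magnitude of the DFT bin nearest target_hz (Goertzel algorithm)."""
    n = len(samples)
    k = round(n * target_hz / sample_rate_hz)
    coeff = 2 * math.cos(2 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


if __name__ == "__main__":
    fs = 48_000                          # hypothetical sample rate
    tone = dds_samples(1_000, fs, 480)   # 480 samples = 10 full cycles
    for f in (1_000, 2_000):
        print(f"{f} Hz bin power: {goertzel_power(tone, f, fs):.3e}")
```

Choosing N so the tone lands exactly on a bin (10 cycles in 480 samples here) keeps spectral leakage low, so the 1 kHz bin dwarfs the 2 kHz one. Goertzel is attractive for this kind of block because it needs only one multiply-accumulate per sample per frequency, rather than a full FFT.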

Some Altera links are broken, but this document from Dec 2014
https://www.altera.com/en_US/pdfs/literature/po/ss_quartussevswe.pdf
says the free Web Edition supports:
Cyclone® V FPGAs: All (excluding 5CEA9, 5CGXC9, and 5CGTD9)
Cyclone V SoCs: All
That may help those budgeting for a Parallax A7 board.
