He, he, no way am I suggesting abandoning the COG. I like the COG. I like multiple COGs. Lots of them.
I do worry that the P2 instruction set and registers have become too complex. The P1 was a joy to program in PASM - about the simplest assembly language I have ever used. I guess the thing is I'm kind of done with having to learn yet another instruction set, bunch of mnemonics, register layouts etc.; I've been round that loop too many times. I'm hoping others will get those tricky high-speed drivers the P2 will need written.
The COG has one great advantage over regular processors. It can avoid the shared-memory bottleneck you run into when multiple processors share RAM. It can run code entirely within its own isolated space, deterministically, at full speed.
The RISC-V is another world, but this thread is about "the state of microcontrollers these days" and I believe RISC-V is growing to be a big thing in that space. Hardly noticeable now, but being worked on by a lot of players, large and small. They want to get away from proprietary ISA shackles and expense.
RISC-V is very interesting for academic use or if you want to implement your own CPU in an FPGA. No license problems, a scalable instruction set, and free C/C++ toolchains.
But I don't see the advantage for the user of microcontrollers. RISC-V based controllers will do the same as ARMs or PICs; presentations from manufacturers even show the same internal buses, so they can reuse the ARM peripherals. The silicon area will be a bit smaller and there are no license costs, but the small ARMs are so cheap now that this cannot have a big effect.
I don't see why a RISC-V is another world compared to a Propeller cog. You can have exactly the same local instruction memory per core, and only the load and store instructions access hub RAM. An RV32I core is more lightweight than a P1 cog, but it misses a lot of what P2 cogs can do, so - no - I don't suggest that Chip should start over.
Here is an interesting talk about a huge parallel processor on an FPGA with a lot of RISC-V cores. The clusters are quite similar to 8 cogs with a shared 32 KB hub (=P1), and then many clusters are connected by a network-on-chip.
Cool, yep, hierarchical linkage does seem superior to the flat grid that has been common for massively parallel computing.
Looking at the cluster block diagram, as you say, it really is very like an eight-cog Prop2 without the Smart Pins: the eight processing elements, a crossbar switch to the cluster RAM, and the accelerator playing the role of the Hub's CORDIC solver. I'm actually surprised they aren't caching instructions from the cluster RAM.
Of course they are aiming at a much more data rich target than the Prop by putting so many in such a tight space.
Ah, I understand why the instruction RAM is not a cache. It avoids duplicating data in precious FPGA resources, not to mention the more complex implementation a cache would require.
The ideal is that external DRAM is still the main RAM for both program and data, and that the cluster RAM is fully usable for data only. I'll give them points for staying focused there.
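For anyone who hasn't met CORDIC, this is roughly what that style of accelerator computes: sines, cosines, rotations and magnitudes out of nothing but shifts and adds. Below is a textbook fixed-point sketch in C - just the general idea, not Parallax's solver or the design from the talk.

```c
/* Minimal fixed-point CORDIC rotation: sin/cos from shifts and adds only.
   A textbook sketch, not any particular chip's implementation. */
#include <stdio.h>
#include <math.h>

#define ITER 16

int main(void)
{
    /* arctan(2^-i) table, scaled to Q16 fixed point */
    long atan_tab[ITER];
    for (int i = 0; i < ITER; i++)
        atan_tab[i] = (long)(atan(ldexp(1.0, -i)) * 65536.0);

    double target = 0.7;                       /* angle in radians */
    long z = (long)(target * 65536.0);         /* residual angle, Q16 */
    long x = (long)(0.607252935 * 65536.0);    /* pre-scaled by CORDIC gain */
    long y = 0;

    for (int i = 0; i < ITER; i++) {
        long dx = y >> i, dy = x >> i;
        if (z >= 0) { x -= dx; y += dy; z -= atan_tab[i]; }  /* rotate by +atan(2^-i) */
        else        { x += dx; y -= dy; z += atan_tab[i]; }  /* rotate by -atan(2^-i) */
    }

    printf("cos(0.7) ~ %f (libm says %f)\n", x / 65536.0, cos(target));
    printf("sin(0.7) ~ %f (libm says %f)\n", y / 65536.0, sin(target));
    return 0;
}
```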
There is certainly a lot of interest in RISC-V in the academic community. They need an open standard to work to. They are fed up with their work becoming obsolete when architectures disappear - DEC Alpha was a favourite, for example, and it is no more.
But I don't see the advantage for the user of microcontrollers. RISC-V based controllers will do the same as ARMs or PICs; presentations from manufacturers even show the same internal buses, so they can reuse the ARM peripherals. The silicon area will be a bit smaller and there are no license costs, but the small ARMs are so cheap now that this cannot have a big effect.
From what I'm reading around, there is a lot of interest in RISC-V from hardware developers. If you are a small outfit and you want to get some funky custom device made with a processor on it, what can you do:
1) License an ARM core. What I'm reading is that this may well be cheap, but not cheap enough for many, and that a worse problem is simply the time it takes to get a license. Besides, what if your plan would benefit from a customized instruction set with some little extensions? No go with ARM. The same can be said of MIPS cores, apparently.
2) Use one of the many open-source cores that exist already - OpenRISC, for example. There you are out on a limb with something rather non-standard.
3) Hire a bunch of CS grads and design your own CPU architecture. This is a lot of work, expensive, and fraught with problems and risks. I think the P2 development demonstrates these issues well.
4) Get together with a worldwide community and adopt a standard: a standard that will be around and supported for a long time, and for which free and open-source implementations exist.
RISC-V is being designed to be scalable from small MCUs to big Linux-running machines. From 32 to 128 bits!
This is why there is a lot of interest from commercial chip builders as well as academics.
One could have used similar arguments about why developing/adopting Linux was pointless back in the day. Why do that? Windows is available, it works, and it's cheap enough not to be a hindrance. But there are deeper arguments for doing it - arguments about ownership, self-determination, independence, and security - such that Linux is now everywhere. I believe the same arguments apply to hardware, and RISC-V is a step in that direction.
You are right, there may be no advantage to MCU users. They continue to use C/C++ as they do; they may have to tackle a new assembly language. Hopefully they do get to experience the advantage of cheaper stuff that gets developed quicker.
I don't see why a RISC-V is another world compared to a Propeller cog. You can have exactly the same local instruction memory per core, and only the load and store instructions access hub RAM.
I'm not sure I follow you there. A RISC has registers - 64 is it, for RISC-V? - and it has RAM. It has to fetch instructions from RAM, and it has to use load/store instructions to get data from RAM. Typically multiple cores share RAM, which is a bottleneck.
The COG can run independently with all its registers/RAM/data/instructions in the same place.
Perhaps what you suggest is workable. I don't see it yet.
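To put the two pictures side by side, here is a plain-C sketch of the model Ariba describes: a core that keeps its working set in its own local memory and only touches shared "hub" RAM through explicit loads and stores. All the names here are illustrative, not a real Propeller or RISC-V API.

```c
/* Sketch of "local memory per core, load/store to shared hub RAM".
   hub_buffer stands in for shared hub RAM; local_scratch for cog/core-local RAM. */
#include <stdio.h>

static volatile int hub_buffer[1024];    /* shared RAM: every access is arbitrated  */
static int          local_scratch[64];   /* core-local RAM: no contention, no stalls */

static int worker(void)
{
    /* one burst of loads: the only traffic that touches the shared hub */
    for (int i = 0; i < 64; i++)
        local_scratch[i] = hub_buffer[i];

    /* everything else runs out of local memory, so timing is deterministic */
    int acc = 0;
    for (int i = 0; i < 64; i++)
        acc += local_scratch[i] * local_scratch[i];

    hub_buffer[0] = acc;                 /* one store back to shared RAM */
    return acc;
}

int main(void)
{
    for (int i = 0; i < 64; i++)
        hub_buffer[i] = i;
    printf("sum of squares 0..63 = %d\n", worker());
    return 0;
}
```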
You're kind of agreeing with one another. Ariba pointed out an impressively tiny example where RISC-V cores are being applied in a Prop-like manner, specifically for FPGA-targeted deployments.
I agree about the P2-Hot cog performance. MAC was at least 6x faster per clock, with auto pointer updates, not to mention higher resolution.
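("MAC with auto pointer updates" in plain C terms is just the loop below, done in one instruction per step - a generic sketch, not the actual P2-Hot opcode.)

```c
/* Multiply-accumulate where both operand pointers advance on every step. */
long mac(const short *a, const short *b, int n)
{
    long acc = 0;
    while (n--)
        acc += (long)(*a++) * (*b++);   /* multiply, accumulate, bump both pointers */
    return acc;
}
```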
Current P2 CORDIC is superior, though, along with egg-beater hub RAM, streamer, and smart pins. Plus 2x the cogs.
Putting all the best parts together would be nice, but would require a denser process. What we have now is more energy efficient.
I think P2 has moved in a better direction with Smart Pins.
6x MAC is only relative, and other chips will always have better MAC, so trying to compete in the numeric-sandpit is risky and short-lived.
P2 now targets the Smart IO area, which I see as having longer design-lifetimes, and is actually helped by advances in other areas.
e.g. RaspPi-level OS hosts, which have MIPS to burn but are rather poor at real-time IO.
In the more general MCU space, there is a growing move to use more controllers, as designers find they cannot actually get one chip that does all they need, or get all they need done on one chip.
I think designers generally are getting more used to multiple suppliers too. 10 years ago, most tended to start with one Core, and stick with it - not so much today.
Bill's post is an example, where he lists quite a few cores, all under active design.
All this combines to make the take-up prospects of the P2 look quite good.
(e.g. it's surprising how many 32-bit MCUs have only 16b timers?!)
- Is there an increasing sense of liberation using modern microcontrollers, or do you sense a long arm of increasing control?
microcontrollers: opening up
sbc's: opening up slowly, except binary GPU blobs
x86 pc's: closing tighter every minute
Intel makes an interesting example.
Their high-end PC processors target a small number of clients, taking large volumes, and with relatively short design life-cycles.
- but Intel are also making stuttering steps back into the microcontroller space they used to be a big player in.
They seem to still be re-learning what is needed to play in this space, but will be worth tracking.
e.g. I've been looking at their D2000 (32 KB flash, 8 KB RAM, $2.30/1k), and the eval board ($14.95) has been sold out at Mouser for a while, but their delivery dates do show spare and booked stats:
1,983 on order. Estimated ship dates: 283 on 5/13/2016, 1,700 on 5/23/2016.
That 1,700 drop is new, and shows just how quickly these are selling.
Data looks OK, but is patchy in coverage. Comms and maths look OK; MHz and timers are more ordinary.
(e.g. it's surprising how many 32-bit MCUs have only 16b timers?!)
I don't think the demand is there; it's not that they can't do 24 or 32 bit - they can. There probably isn't that much demand outside of certain industries like the automotive and industrial sectors. That's where you see the multi-core offerings from Infineon and Freescale loaded to the gills with all sorts of I/O options. Even there, 24 bit seems to be the most common, with Freescale's eTPU engine.
16 bits is just too small, and juggling a prescaler and a 16b timer is possible, but quite constraining.
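To put numbers on that, here is the range/resolution trade-off worked out for an assumed 48 MHz peripheral clock (the clock value is just an example):

```c
/* Why a 16-bit timer + prescaler is constraining: you trade range for resolution. */
#include <stdio.h>

int main(void)
{
    const double fclk = 48e6;                    /* assumed timer clock */

    double t16    = 65536.0 / fclk;              /* 16b, /1 prescale   */
    double t16_ps = 65536.0 * 256.0 / fclk;      /* 16b, /256 prescale */
    double t32    = 4294967296.0 / fclk;         /* 32b, /1 prescale   */

    printf("16b /1   : %.2f ms range, 20.8 ns resolution\n", t16 * 1e3);
    printf("16b /256 : %.1f ms range, but only 5.33 us resolution\n", t16_ps * 1e3);
    printf("32b /1   : %.1f s range at the full 20.8 ns resolution\n", t32);
    return 0;
}
```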
NXP is one of the few who designed 32b timers and 32b prescalers.
Nuvoton use a 24b timer and 8b prescaler.
Infineon tend to be good on timers, but they favour 16b with cascade options, which means you consume a pair of timers for each 32b channel.
Many others seem to be lazy and cut & paste smaller MCU peripherals, because they have those on file...
Or maybe their process is so trailing edge, the larger counters cannot clock at full speed ?
Not really the right way to move technology forward, is it ?
Given the 32b ALU, why limit the peripherals ?
Sorry I've been MIA, but I've been working on other products.
I've been spending time in ARM land, PICs, AVRs and others.
ARM SBCs: full Linux, HDMI, USB etc., for as little as $5
PIC/AVR/etc: very low cost
huge data sheets, zillion unique peripheral blocks, (mostly) single core
on the SBC's: not deterministic
More peripherals, more memory, lower cost
Generally opening up.
microcontrollers: opening up
sbc's: opening up slowly, except binary GPU blobs
x86 pc's: closing tighter every minute
SBC's: yes
uC's: not really, it is work
pc's: no
I think in a way it's lucky - you have not been polluted and have kept the Prop vision.
I am REALLY looking forward to it! Can't wait to play with silicon.
I have not had time to load my DE2-115 or Nano's, as I have no time to play
Since you asked...
I drop by every week or two to see the P2's progress.
I love the USB, 16 cores, 512 KB, bigger LUTs, etc.
I still liked the HOT design more, purely due to better performance from the caches and threading.
Having said that, I am very likely to make a few P2 based boards, but frankly, the cost of the P2 can't be too high.
Why?
Only low-volume, high-price designs will be able to afford to design it in then.
I'll keep checking back every week or two... now I am off to work on some new designs.
FYI - I am considering two more P1-based designs, but the high cost of the P1 hurts the BOM cost and limits the potential market.
I see the P2 positioned as the "ultimate I/O coprocessor".
Potentially, I could see a P3 / P4 as a general purpose CPU, but that is far in the future...
Nice to see you pop up on the forum, Bill - it's been a while.
Yep, too long. I missed this place...
Re: P2-Hot
Maybe after the new P2 is available as a chip, a P3 could be based on the best of P2-Hot and P2-LP (low power) at 90nm or smaller...