NXP produces multicore (dual Cortex M0 & M4) processor

Heater. · 2011-01-23 14:40

Leon,

ARM is the dominant architecture in 32-bit embedded applications.

Yep, been using them myself for a long time.

As I said previously, ARM Cortex chips offer fully deterministic I/O and interrupt handling.

That may well be true up to some level of performance. But if the chip is up against the wall getting the timing right for one critical event with tight tolerances, it cannot also get the timing right for some other event. Enter the requirement for lots of cores, or clocked I/O as XMOS has.

There isn't much point in using them with Propeller devices.

Is there really an ARM that will, for example, drive 32 independent pulse-width-modulated signals out at the same time? Or mix and match doing some of that with a bunch of other weird logic?
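A minimal sketch of the idea behind software PWM on a core with no PWM hardware: one free-running counter shared by all channels, compared against each channel's duty value to build the 32-bit output word for that tick. It is written in Python only to show the logic; on a Propeller the equivalent loop would run in PASM in a cog and write each word to the output register. The function names and the 256-tick period are illustrative assumptions, not taken from any existing driver.

```python
def pwm_mask(counter: int, duties: list[int]) -> int:
    """Return the 32-bit output word for one PWM tick.

    Bit i is high while the shared counter is below duties[i], so every
    channel gets its own duty cycle from a single common timebase.
    """
    mask = 0
    for pin, duty in enumerate(duties):
        if counter < duty:
            mask |= 1 << pin
    return mask


def pwm_frame(duties: list[int], period: int) -> list[int]:
    """Generate every output word for one full PWM period."""
    return [pwm_mask(t, duties) for t in range(period)]


if __name__ == "__main__":
    period = 256                                    # hypothetical PWM resolution
    duties = [(i * 8) % period for i in range(32)]  # 32 independent duty values
    frame = pwm_frame(duties, period)
    # Each pin is high for exactly duties[pin] ticks out of the period.
    for pin in range(32):
        assert sum((word >> pin) & 1 for word in frame) == duties[pin]
    print(hex(frame[0]))  # prints 0xfffffffe (pin 0 has a duty of 0)
```

Because every channel is derived from the same counter, all 32 outputs stay phase-aligned, and the work per tick grows with the channel count; that per-tick budget is what a dedicated core, or hardware PWM, has to absorb.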

There isn't much point in using them with Propeller devices.

[Yes, I repeated that.] Except when the ARM is running Linux as God intended and provides the IDE for the Prop(s), as well as taking care of any bulky application logic, networking, etc.

I can't see the Propeller replacing FPGAs. XMOS chips were designed for that sort of thing, though.

Yeah, OK. But I still see a Prop as a baby XMOS. It just hasn't got the speed to compete with XMOS or try to be an FPGA, but in principle it's on the way. The Prop II will push it further in that direction.

RossH · 2011-01-23 14:42

Leon wrote: »

As I said previously, ARM Cortex chips offer fully deterministic I/O and interrupt handling. There isn't much point in using them with Propeller devices.

The Cortex appears to make interrupt times deterministic by artificially stretching EVERY interrupt service time to the LONGEST possible interrupt service time (see here). That means that if you elect to enable this option, ALL your interrupts will be slower than normal to service. Given this, I wonder how many people actually elect to use this option.

Also, making the instruction and interrupt times deterministic is one thing. Making your program deterministic is another. Any interrupt will disrupt the normal flow of program control and give you non-deterministic program timing - unless the source of external interrupts is itself deterministic (in which case you don't really need interrupts) or you implement some additional synchronization mechanism (which both complicates and slows down your program).

None of this is necessary on the Propeller, which offers fully deterministic behaviour at the program level, not just the instruction level.
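To make the point about program-level timing concrete, here is a toy model (not a model of any real Cortex-M) assuming each interrupt costs exactly the same fixed number of cycles, i.e. the "deterministic latency" case. A foreground loop toggles a pin every `work` cycles; the measured toggle periods still vary with when the interrupts happen to arrive. The function names and cycle counts are hypothetical.

```python
def toggle_times(n_toggles: int, work: int, irq_arrivals: list[int],
                 isr_cost: int) -> list[int]:
    """Simulate a foreground loop that toggles a pin every `work` cycles.

    Each interrupt (arrival cycle in irq_arrivals) preempts the loop and
    costs exactly isr_cost cycles, so interrupt handling itself is fully
    deterministic. Returns the cycle at which each toggle happens.
    """
    pending = sorted(irq_arrivals)
    t = 0
    times = []
    for _ in range(n_toggles):
        remaining = work
        while remaining > 0:
            if pending and pending[0] < t + remaining:
                # Run the loop up to the interrupt (0 cycles if it is
                # already pending), then spend isr_cost cycles in the handler.
                ran = max(pending.pop(0) - t, 0)
                remaining -= ran
                t += ran + isr_cost
            else:
                t += remaining
                remaining = 0
        times.append(t)
    return times


def periods(times: list[int]) -> list[int]:
    """Intervals between successive toggles, starting from cycle 0."""
    out, prev = [], 0
    for t in times:
        out.append(t - prev)
        prev = t
    return out


if __name__ == "__main__":
    print(periods(toggle_times(5, 100, [], 13)))          # [100, 100, 100, 100, 100]
    print(periods(toggle_times(5, 100, [150, 420], 13)))  # [100, 113, 100, 100, 113]
```

The handler cost is identical every time, yet two of the five loop periods are stretched by 13 cycles. The jitter comes from the arrival times, which a fixed interrupt latency cannot remove.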

Heater. · 2011-01-23 14:45

RossH,

Yep. I think that is what I was trying to say. You put it much better.

Leon · 2011-01-23 14:53

However, the 13 clocks minimum latency mentioned for zero wait-states isn't much, when you have a device clocked at 100 MHz or 150 MHz.
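For scale, converting the 13-cycle figure into time is a simple division; a sketch:

```python
def latency_ns(cycles: int, clock_mhz: float) -> float:
    """Convert a latency in CPU clock cycles to nanoseconds.

    At f MHz one cycle lasts 1000 / f ns.
    """
    return cycles * 1000.0 / clock_mhz


if __name__ == "__main__":
    for mhz in (100, 150):
        print(f"13 cycles at {mhz} MHz = {latency_ns(13, mhz):.1f} ns")
    # 13 cycles at 100 MHz = 130.0 ns
    # 13 cycles at 150 MHz = 86.7 ns
```

That is only the entry latency; the handler's own run time, and any wait for a handler that is already running, come on top of it.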

RossH · 2011-01-23 14:57

Leon wrote: »

However, the 13 clocks minimum latency mentioned for zero wait-states isn't much, when you have a device clocked at 100 MHz or 150 MHz.

Then I suppose all ARM users enable this option by default. And they probably get very confused when their programs are still not deterministic.

Ross.

Leon · 2011-01-23 15:00

I doubt it. There aren't many applications where interrupt operation needs to be deterministic; most people just want it as fast as possible. It's often used with DMA, bypassing the CPU.

I once developed a large program for a medical instrument on an NXP LPC2148 ARM7 chip that needed lots of digital I/O, several ADC inputs, timers, etc., which would have been impossible on a Propeller. I didn't have any problems handling several interrupts (I didn't need them to be deterministic). The whole thing was written in C and developed on a low-cost prototyping board.

Heater. · 2011-01-23 15:05

Leon,

However, the 13 clocks minimum mentioned for zero wait-states isn't much, when you have a device clocked at 100 MHz or 150 MHz.

It does not matter how quick the time from interrupt signal to interrupt handler starting is. It also does not matter what the clock speed or MIPs of your CPU is.

Because:

1) These things get quicker all the time. However, if event A is being serviced NOW (and needs to be) and event B comes along, which also needs to be serviced within a few instructions, then the system fails. The problem is always there; only the time scales get shorter as technology progresses.

2) As Ross says, any event handled by an interrupt disrupts the deterministic timing of whatever was going on before the interrupt. There is no way around that.

That is, unless you have multiple cores to dedicate to critical tasks, OR you have deterministic hardware threading support and things like clocked I/O to help you in extreme cases, as XMOS does.
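A minimal sketch of point 1 under simplifying assumptions: event A's handler is already running when event B arrives, B must start within a deadline counted in cycles, and a dedicated core waiting on its own event wakes after a small fixed number of cycles. The `wake_cycles` value and the function names are hypothetical placeholders, not figures from any datasheet.

```python
def b_start_delay(a_start: int, a_cost: int, b_arrival: int,
                  dedicated_core: bool, wake_cycles: int = 0) -> int:
    """Cycles from event B arriving until its handler can start.

    Shared core: B waits until A's handler (a_cost cycles from a_start)
    finishes. Dedicated core: B's core is already idle and only needs
    wake_cycles. Assumes A was already being serviced (b_arrival >= a_start).
    """
    if dedicated_core:
        return wake_cycles
    return max(a_start + a_cost - b_arrival, 0)


def meets_deadline(delay: int, deadline: int) -> bool:
    """True if B's handler starts within its deadline."""
    return delay <= deadline


if __name__ == "__main__":
    # A's handler runs for 50 cycles from cycle 0; B arrives at cycle 10
    # and must be serviced within 5 cycles.
    shared = b_start_delay(0, 50, 10, dedicated_core=False)
    own = b_start_delay(0, 50, 10, dedicated_core=True, wake_cycles=2)
    print(shared, meets_deadline(shared, 5))  # 40 False
    print(own, meets_deadline(own, 5))        # 2 True
```

If B's deadline is counted in instructions, as described above, a faster clock does not help the shared-core case: A's handler and B's deadline shrink together, and B still misses.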

Heater. · 2011-01-23 15:11

Leon,

I doubt it. There aren't many applications where interrupt operation needs to be deterministic; most people just want it as fast as possible. It's often used with DMA, bypassing the CPU.

I'm surprised to hear you say that. That implies there is no point in the Prop having 8 COGs and there is no point in XMOS providing up to 4 cores plus all that nice deterministic thread scheduling in hardware and other I/O timing goodness.

DMA is fine if you have a peripheral hardware block handling some I/O functionality; then the pressure on interrupts is not so great. But as you know, the idea of the Prop and XMOS is that you don't have any dedicated peripheral hardware blocks. Rather, you have a bunch of free cores that can be programmed to be whatever peripheral you like.

Leon · 2011-01-23 15:36

I think that's the case with deterministic processing: most of the time it simply isn't needed, and designers feel at home with peripherals implemented in hardware. That is why conventional devices like the ARM are so widely used, and will be for the foreseeable future. XMOS is doing very well in markets where determinism is important, however, such as high-end audio processing. That is a niche market at present, but the technology will be very widely used in the next generation of home entertainment systems.

potatohead · 2011-01-23 15:53

They want it as fast as possible because they don't get concurrent multi-processing on those chips. The Prop isn't just multi-core; it's a concurrent multi-processor. Symmetric, too. Those are not trivial differentiators, nor are they useless. Secondly, Props have a built-in "supervisor" type memory model. We've got virtual machines on these chips. Right now, they are just kernels executing code, and the complexity is high. But as the scale grows, that's going to prove competitive.

What I see happening is a steady investment in concurrent multi-processing that can begin to compare with the existing investment we've got in more common interrupt-based systems. As that happens, these discussions will change. The scale of Prop I is small, but it's completely possible to have SPIN, C, PASM, etc. running concurrently now. That took a lot of investment and discovery. As has been written before, Propeller bootstrapping is still going on, whereas on the other, more established designs it has already happened. Because of that, some of the advantages of mature designs carry more weight today than they will in the future.

At some point, it's going to get really interesting. Take some C code, compile it, and load in binaries that fire off SPIN / PASM bits to do things, where those SPIN / PASM bits operate just as dedicated hardware does. Need a different functional spec? Great, just swap in "a new chip" virtually, and leave the C program unchanged, if desired. That kind of thing isn't so easy on every other device detailed here, because those devices have hardware dependencies the Propeller does not have.

And DMA? Why do we bypass the CPU? Because that CPU cannot do the job, that's why. Usually DMA is indicated where either concurrency or speed is required, often pushing the CPU to the side. It's interesting to contrast that with how Props do things. Adding those capabilities also adds complexity to the hardware design, which will add modes that need to be tested, etc...

Re: Xmos. I've said from the beginning, it's an ideal target for cable boxes, audio processors, and other streaming kinds of things.

I suspect Prop II will do well in small devices, TV game toys, instrumentation, etc... That's a nice, growing niche where there are lots of opportunities, and often a much lower barrier to entry, as opposed to being the next General Dynamics.

Finally, designers feel at home with dedicated peripherals NOW. IMHO, more or less constant supply sourcing and revision issues will continue to raise the benefit of software solutions, particularly as existing devices scale and overall bootstrapping like we see here continues to chip away at the entrenched means and methods.

Having to rev a board, or suffer a regression due to a device changing, or that cannot be sourced, isn't cheap. Software solutions compare very favorably, and can be built around more robust and common devices to avoid those issues. Say a product has a 10 year life. How many revs, what are the field support issues, inventory, tracking, and other costs associated with these things? High. High enough to warrant giving the idea of more software, less hardware more than passing consideration as time goes by.

Edit: If those things were not significant issues, multi-core / concurrent development would not be growing, now would it?

Cluso99 · 2011-01-23 17:58

The Prop solves a lot of issues. No interrupts is a big point and usually missed. I agree with Heater that for a standalone device, an ARM & Prop together would be a good solution, but the ARM would just do the heavy lifting and leave the Prop for the I/O. This is where the 32 I/O becomes a problem. The Prop II will solve that. Likewise, I do not see the Prop II becoming a device. That market is pretty much sewn up by ARM.

MS is late to the device market, having made lots of attempts with WinCE. Renaming it W7xxx will not solve the bloat problems which underlie this OS. And Android is free and simpler than Linux. Only MS cash and muscle and business methods (brute force) could overcome this, but I doubt it. Things have evolved, and the PC is no longer where it was. I see the revolution in the iPad (and equivalents), where you take your tablet with you. You sit it on a little frame and use a wireless keyboard/mouse, and perhaps even a large LCD screen at workplaces, including a projected keyboard at times. This is a use-anywhere technology that has only just begun, although Apple has tried many times before with things such as the Newton. Now the pieces, including the internet, are in place. Believe me, this will be the way of the future, and two OSes will be at the forefront: Apple's proprietary one and Android.

Ale mentioned the PC clunkers. I have maintained for 10 years that the market was ripe for small boxes. Only Apple really responded, with the Mac mini, and what a lovely box it is. My mum uses a powerful Acer small box, and it sits on top of the desk instead of hidden away underneath. Except for games and servers, these clunkers are coming to the end of their life.

Back to topic (although this has been fun)

The dual ARM already comes in about 10 announced variants. We have 1 Prop. Yes, I know we would like 3 or 4 (Prop 1, Prop II, Prop 1B with 64 I/O, a cheaper Prop 1 with 16 I/O & 2-4 cogs). We don't have to worry about pins because they can basically go anywhere. We don't have to worry about peripherals in our chip. Like I said in that comment, we can have 15 UARTs if we want, etc., etc. Really complex devices like fully compliant USB have been out of reach, but hey, we can now do USB thanks to Micah, and no doubt that can be improved. We really have not touched on what else the video counter can do. I have just understood a bit more about it, and it is a lot more useful than I had realised. The dual ARM still communicates between its cores by interrupts, although I suspect that this is not required because of the common SRAM sections.
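As a sketch of what "15 UARTs in software" means per channel: each software UART only has to produce (or sample) a standard 8N1 frame, one start bit, eight data bits least-significant first, and one stop bit, at a fixed number of clock ticks per bit. The Python below just builds and checks the frame and computes the bit timing; a real Propeller driver would do this in PASM, and the function names are hypothetical. The 80 MHz value is the common Propeller clock setting (5 MHz crystal with the 16x PLL), used here only as an example.

```python
def uart_frame(byte: int) -> list[int]:
    """Line levels for one 8N1 frame (line idles high).

    Start bit 0, eight data bits LSB first, stop bit 1.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte out of range")
    data = [(byte >> i) & 1 for i in range(8)]
    return [0] + data + [1]


def uart_decode(bits: list[int]) -> int:
    """Recover the byte from a 10-bit 8N1 frame, checking start and stop bits."""
    if len(bits) != 10 or bits[0] != 0 or bits[9] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))


def bit_period_clocks(clock_hz: int, baud: int) -> int:
    """System clock ticks per UART bit, rounded to the nearest tick."""
    return round(clock_hz / baud)


if __name__ == "__main__":
    print(uart_frame(0x41))                         # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
    print(bit_period_clocks(80_000_000, 115_200))   # 694
```

Each extra UART is just another copy of this timing loop, which is why the count is limited by free cores and pins rather than by on-chip peripheral blocks.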

Anyway, my reply to the article was to enlighten them that the multicore micro is not new, and to enlighten the wider audience of this fact.

What I realised, is engineers look at the block diagrams to see what functions (peripherals) are in a chip. The prop does not show any, hence the perception there are none, so the writeup does not get a look-in. IMHO, we need a new block diagram which contains 8 blocks? titled "configurable peripheral set" and then small blocks showing what can be made.

Heater. · 2011-01-24 00:20

Cluso,

Agreed.

One nitpick:

Android is free and simpler than Linux.

Don't forget Android is based on Linux just as much as Debian or Ubuntu is.

Cluso99 · 2011-01-24 00:33

Heater: I wasn't aware of that. It is a much cut-down version, though, isn't it? It would certainly have made sense to use a lot of what was available.

While initially I was quite happy to pay what MS asked for their windoze OS, lately the price to upgrade has become very expensive. In fact, it is almost cheaper to buy a new laptop with W7 than an upgrade to W7. Office went the same way. IMHO, MS drove the move to open OSes by overcharging and abusing a monopoly, so I will be happy to see Android or Apple's OS succeed.

Heater. · 2011-01-24 00:59

Cluso,

The confusion is in the fact that Linux is NOT an operating system. Linux is a Unix like kernel.

If you put enough user land on top of the Linux kernel, you end up with Debian or Red Hat or whatever, all of which are commonly and incorrectly referred to as Linux.

Android is a Linux kernel with a totally different user land on top. Specifically built for small screen mobile devices. I guess it still has a lot of familiar libraries under the hood somewhere as well.

Batang · 2011-01-24 01:43

CSX700 Processor

http://www.clearspeed.com/products/csx700.php

Performance
250MHz core clock frequency
96 GFLOPS single or double precision
75 GFLOPS sustained double precision DGEMM
48 GMAC/s integer performance
9W typical power dissipation
192 Gbytes/s internal memory bandwidth
2 x 4 Gbytes/s external memory bandwidth
4 Gbytes/s chip-to-chip bandwidth

Features
Each CSX700 processor contains:
Dual independent Multi-threaded SIMD array processors (MTAP)
each containing 96 processing elements
PCIe x16 host interface
2 x 64-bit DDR2 DRAM interface with ECC support
256 Kbytes on-chip scratchpad memory
On-chip instruction and data caches
ECC protection on all on and off-chip memory
Host debug port
64-bit virtual, 48-bit physical addressing
On-chip DMA controller

192 high-performance processing elements, containing:
32/64-bit FP Multiplier
32/64-bit FP Adder
128-byte register file
6 Kbytes of high bandwidth SRAM
High speed I/O channel
Integer ALU and 16-bit integer MAC
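The headline figures are consistent with the element count and clock, assuming each processing element completes one FP multiply and one FP add per cycle (for the GFLOPS figure) and one integer MAC per cycle (for the GMAC/s figure). Those per-cycle rates are an assumption inferred from the listed multiplier, adder and MAC units, not something the spec above states.

```python
pes = 2 * 96        # two MTAP arrays of 96 processing elements each
clock_hz = 250e6    # 250 MHz core clock

# ASSUMPTION: one FP multiply + one FP add per element per cycle.
peak_gflops = pes * clock_hz * 2 / 1e9
# ASSUMPTION: one integer MAC per element per cycle.
peak_gmacs = pes * clock_hz / 1e9
# Quoted sustained DGEMM rate as a fraction of the computed peak.
dgemm_fraction = 75 / peak_gflops

print(peak_gflops, peak_gmacs, dgemm_fraction)  # 96.0 48.0 0.78125
```

So 96 GFLOPS is a peak rate, and the 75 GFLOPS DGEMM figure is about 78% of it.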

Leon · 2011-01-24 01:50

It isn't much use for embedded applications, though.

Batang · 2011-01-24 01:54

Hi Leon,

True but it does illustrate that there is more to the multi core world than some would think.

Cheers

Edit: And since this thread has diverged from embedded applications, imagine what you could do video wise with it.

jmg · 2011-01-24 11:33

Cluso99 wrote: »

NXP produces multicore (dual Cortex M0 & M4) processor

Nice looking device, and the asymmetric nature of the Dual-Core is interesting.
As others have said, Peripheral I/O cores are not new, but being able to 'get at them' has not always been easy.

The other trend NXP (and others) are starting to support, which is highly significant for embedded apps, is Execute in Place from QuadSPI.
QuadSPI comes in Flash and nvSRAM parts, and is small and cheap (at least in Flash).

QuadSPI currently runs at over 100 MHz, giving 50 MBytes/sec of memory bandwidth, which is plenty for most decision-based code and leaves on-chip RAM for the small/critical stuff.
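The 50 MBytes/sec figure follows from four data lines each carrying one bit per clock (single data rate assumed) at 100 MHz. A sketch of the arithmetic, with plain single-line SPI for comparison:

```python
def raw_bandwidth_mb_s(clock_mhz: float, data_lines: int) -> float:
    """Peak MBytes/s for a serial bus moving one bit per data line per clock.

    Ignores command/address overhead, so this is an upper bound.
    """
    return clock_mhz * data_lines / 8


if __name__ == "__main__":
    print(raw_bandwidth_mb_s(100, 4))  # QuadSPI: 50.0
    print(raw_bandwidth_mb_s(100, 1))  # standard SPI: 12.5
```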

I see Nuvoton offer an 8051-cored PC I/O controller, now with Execute in Place from QuadSPI, so the idea is already in embedded.
To my thinking, Prop 2 really has to offer EiP from QuadSPI support.

Heater. · 2011-01-24 12:15

jmg,

I have heard of SPI FLASH before, and there are those here executing various kinds of LMM or byte codes from SPI FLASH on the Prop.

QuadSPI is new to me and sounds like exactly what these Prop interpreters need in order to get large programs into external memory whilst conserving Prop pins.

Has anyone looked into using QuadSPI for the Prop? Have I missed something here?

http://www.eetasia.com/ART_8800475283_499486_NP_c4c2800a.HTM

Bill Henning · 2011-01-24 13:08

I've had some W25Qxxx parts sitting beside me waiting for me to have time for them for a while... no time yet :-(

Heater. wrote: »

jmg,

I have heard of SPI FLASH before, and there are those here executing various kinds of LMM or byte codes from SPI FLASH on the Prop.

QuadSPI is new to me and sounds like it's exactly what these Prop interpreters need in order to get large Programs into external memory whilst conserving Prop pins.

Has any one looked into using QuadSPI for the Prop? Have I missed something here?

http://www.eetasia.com/ART_8800475283_499486_NP_c4c2800a.HTM

RossH · 2011-01-24 14:08

Heater. wrote: »

I have heard of SPI FLASH before, and there are those here executing various kinds of LMM or byte codes from SPI FLASH on the Prop.

QuadSPI is new to me and sounds like it's exactly what these Prop interpreters need in order to get large Programs into external memory whilst conserving Prop pins.

I'd not heard the term "Execute In Place" before, but this is what Catalina is currently doing on the Morpheus and the C3 (using "Standard SPI"). Surprisingly, the performance is not as awful as you might expect, provided you use Sequential (or Continuous) mode.

"Quad SPI" would presumably be around four times faster, at the cost of 3 more pins. However, this might be a reasonable compromise.

Ross.
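Why Sequential mode matters, and why "four times faster" is the best case, can be sketched by charging every new read for its command and address clocks. The overhead numbers below are assumptions loosely modelled on a typical 24-bit-address serial flash (an 8-clock command plus 24 address clocks for a standard read; an 8-clock command, 8 clocks of address and mode bits over four lines, and 4 dummy clocks for a quad I/O read); check the datasheet of the actual part. The 20 MHz clock is an arbitrary example, since the ratios do not depend on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadMode:
    name: str
    overhead_clocks: int  # command + address (+ mode/dummy) clocks per new read
    clocks_per_byte: int  # 8 on one data line, 2 on four data lines


# ASSUMED overheads for a typical 24-bit-address serial flash.
STANDARD = ReadMode("standard", overhead_clocks=8 + 24, clocks_per_byte=8)
QUAD = ReadMode("quad I/O", overhead_clocks=8 + 8 + 4, clocks_per_byte=2)


def throughput_mb_s(mode: ReadMode, clock_mhz: float, burst_bytes: int) -> float:
    """Effective MBytes/s when every burst_bytes bytes needs a fresh read command."""
    clocks = mode.overhead_clocks + burst_bytes * mode.clocks_per_byte
    return burst_bytes * clock_mhz / clocks


if __name__ == "__main__":
    for burst in (4, 4096):  # one 32-bit LMM instruction vs. a long sequential run
        std = throughput_mb_s(STANDARD, 20, burst)
        quad = throughput_mb_s(QUAD, 20, burst)
        print(f"{burst:5d} bytes: standard {std:.2f} MB/s, "
              f"quad {quad:.2f} MB/s, ratio {quad / std:.2f}")
```

Under these assumptions a fresh 4-byte read gains only about 2.3x from the three extra pins, while a long sequential run approaches the full 4x.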

jmg · 2011-01-24 17:22

Expanding these SPI options a little, there are SRAMs too, lowish speed, low price, SPI 8KB/32KB SRAM from OnSemi / Microchip.

I see Cypress are also adding SPI RAM, but as NV SRAM (I guess chasing FRAMs?) - not as cheap, but the brave could mix Code/Data ?

Cypress is launching a new second generation serial nvSRAM in Q3’10. This device, available in densities from 64Kbit to 2Mbit, will support both SPI and I2C interfaces. Some of the key features of the product :

· Low active and standby current
· New deep sleep mode
· 40MHz and 104MHz SPI interface
· Up to 3.4MHz I2C interface
· 1.8V IO interface
· Standard SOIC8 package
· Integrated RTC
· Multiple voltage options
· And much more.

They do not explicitly mention QuadSPI, but they are still in the SO8 footprint & 104MHz is a good speed.

Cluso99 · 2011-01-26 18:02

Interesting to see that the latest newsletter has the reply quotes, so the Prop effectively gets a mention as being around for over 5 years. Nice

Toby Seckshund · 2011-01-26 23:22

Heater

On the link "http://www.eetasia.com/ART_880047528...P_c4c2800a.HTM" the ads are running something about Solar fuses. Judging by the cold wind coming from your direction again, I think one may have blown.

Heater. · 2011-01-26 23:39

Toby, you'll have to fix that link, I have no idea what you are talking about:)

My father, a Czechoslovakian, used to say "No good wind blows from the east" but I don't think he was referring to the weather.


K2 · 2011-01-27 07:11

Ross, the last paragraph of your post on the other site was perfect! Good job.

Leon · 2011-01-27 07:47

I just gave XMOS a plug, to provide a bit of balance.

Cluso99 · 2011-01-27 14:58

Great Leon. This is a perfect avenue to let the world know there are other multicore micros available. Neither the Propeller nor Xmos seem to get much press!

Leon · 2011-01-27 15:16

XMOS does quite well, actually. Here is a piece in the Financial Times:

http://www.ft.com/cms/s/0/bd45e1d0-28b3-11e0-aa18-00144feab49a.html#ixzz1CHUVMICn

It's our equivalent of the Wall Street Journal. Registration is required to read it, but here are the first three paras:

Small chipmaker’s big ambitions

Under the solemn gaze of a life-size toy moose and rhinoceros, David May and Terry Leeder are explaining their plans for turning the small business started by Prof May into a big part of the global semiconductor industry.

They are the driving forces behind a British company that has developed what many regard as the ultimate – but ever elusive – aim of the industry: a “universal” product that can be programmed in a huge number of ways to suit almost any application in fields from industrial motors to hand-held music players.

“Welcome to the Department of Hard Problems,” says Prof May, a jovial 59-year-old who is regarded by many as one of the world’s top microchip designers, indicating the faintly down-at-heel headquarters of Xmos, in Bristol.

Heater. · 2011-01-27 18:18

Not to worry. ARM is soon up to four cores http://www.eetimes.com/electronics-news/4212605/Four-core-ARM-A9-to-run-Sony-game-console?cid=NL_EETimesDaily

Not that we will be dropping them into our hobby projects any time soon.
