
Is it time to re-examine the P2 requirements ???


Comments

  • Heater.Heater. Posts: 21,230
    edited 2015-02-12 01:12
    There is a very big tension in the Propeller design philosophy.

    On the one hand it is intended to be as generic as possible. One might see it as an alternative to building custom logic or having to press an FPGA into service.

    On the other hand, the means to do this is by making it programmable in a normal software fashion. Writing Spin (or whatever) and PASM can be a lot easier than designing and building logic circuits or fighting with VHDL/Verilog and the enormously complex tools used for that.

    The end result is that many will say it sucks as an FPGA replacement: too small and too slow. And many will say it sucks as a micro-controller: too slow, no interrupts, no peripherals, etc., etc.

    I believe XMOS has been suffering from this same tension. They are the only other guys I know building devices that spring from the same "let's make it generic and programmable", the "soft peripheral" philosophy with many cores and real-time determinism.

    However last I checked XMOS had caved in and started adding junk to their new devices, including: UART, I2C, SPI, Smartcard, ADC, DAC, USB 2.0 and yes a complete ARM Cortex M3 CPU!!! I guess they just dropped a typical ARM SoC design into the XMOS silicon.

    I gather some here would like to see that happen to the P2 as well. Well, if anyone wants such a beast of a device, I know where to get it already: http://www.xmos.com/products/silicon

    Have fun : )
  • jmgjmg Posts: 15,148
    edited 2015-02-12 01:31
    evanh wrote: »
    I remember being very enthusiastic towards the hardware threads on the basis that combo drivers could be engineered around them very effectively. And later on when Chip was struggling to fit them to HubExec I was still supportive even though I didn't see great value in them being visible at that level.
    Yes, hardware threads were a good idea, but you make a good point about HubExec - threads are 'nice to have' at that level, but not as compelling as they are within the COG.

    Would threads that disabled or detuned the pipeline, to make the HW design simpler, still have a use ?

    e.g. let's suppose the fetch MHz halved, but was assignable on a slot basis to 4 program counters; you would still have ~4x Prop1 in total, or 4 x Prop1s in a single COG, if they equally shared slots.
    That still sounds useful?
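    Rough numbers to put that in perspective (assuming, purely for illustration, the new design runs at around 160 MHz with one instruction per clock - not a committed figure): halving fetch to 80 MHz and spreading it across 4 program counters gives each thread about 20 MIPS, which is what an 80 MHz P1 cog delivers (80 MHz / 4 clocks per instruction = 20 MIPS), so roughly 4x Prop1 per COG in total.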
  • msrobotsmsrobots Posts: 3,704
    edited 2015-02-12 01:49
    @jmg,

    I am a programmer, not a chip designer nor an EE. I started soldering shortly after I began losing my eyesight and after my hands started to be somehow - hmm - unsteady?

    So even through-hole stuff is a challenge for me. SMD? Well - not sure I could handle that.

    But I have had way more fun programming the Propeller than I have had for a long time at work. I do quite simple projects with it, but they give me instant(?) reward. I am able to program a Prop and use it.

    The P1 is quite robust. I have made a lot of mistakes while playing around with them, and so far I have only managed to destroy one pin on a Demo Board. Accidental abuse. My fault.

    That is one board out of ~20 I used/installed. I am just a tinkerer. Not some high value customer of Parallax.

    The simplicity of the Prop is what draws me in. I tried some other MCs, but - well - I was pushed away by not really understanding even parts of their data sheets. I am just a code monkey, not an EE.

    Let's hope the P2 will be as easy to use.

    Mike
  • evanhevanh Posts: 15,209
    edited 2015-02-12 02:15
    jmg wrote: »
    Would threads that disabled or detuned the pipeline, to make the HW design simpler, still have a use ?

    It wasn't hard for Chip to make it slice cleanly within the Cog. The only immediate hiccup he had was that a Hub read would stall all threads. That might have to be the one sacrifice to keep it small.
  • kwinnkwinn Posts: 8,697
    edited 2015-02-12 07:42
    @ Mike Green

    Yes, an interrupt would require some additional hardware, but in its simplest form that would be very little. This is what I wished had been on the P1 for a couple of projects.

    1- A control register that has the interrupt program counter and status bits. Nine bits for the program counter, 5 bits for the input pin number, 1 bit for interrupt enable, and 1 bit for interrupt state.

    2- Instructions to enable, disable, and return from the interrupt. That could be a single instruction that writes the enable, interrupt, and start address bits to the control register. These bits would be in the S and D fields of the interrupt control instruction.

    On reset/power up interrupts would be disabled and the cog would work exactly as it does now.

    Use of the interrupts would be very simple since there is no need to store state. When interrupt enable and interrupt are both 1 the interrupt program counter is used, otherwise the regular program counter is used.

    Code for the regular and interrupt routines would be written as a single PASM program and could even share data and subroutines.
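
    As a rough illustration of how little hardware that is, here is a minimal Verilog sketch of the control register and PC mux described above. The module and signal names (cog_int_ctrl, int_pc, int_pin, int_en, int_state, reti, eff_pc) are invented for the sketch and are not taken from the actual P1 Verilog.

        // Sketch only: the minimal per-cog interrupt proposed above.
        // All names are illustrative, not from the real P1 source.
        module cog_int_ctrl (
            input         clk,
            input         wr_ctrl,     // strobe from the proposed interrupt-control instruction
            input  [14:0] ctrl_bits,   // {int_en, int_pin[4:0], int_pc[8:0]} from the S/D fields
            input  [31:0] ina,         // input pin states
            input  [8:0]  pc,          // regular program counter
            input         reti,        // "return from interrupt" strobe
            output [8:0]  eff_pc       // program counter actually used for the next fetch
        );
            reg [8:0] int_pc    = 9'd0;   // interrupt program counter
            reg [4:0] int_pin   = 5'd0;   // which input pin triggers the interrupt
            reg       int_en    = 1'b0;   // interrupt enable (cleared at reset/power-up)
            reg       int_state = 1'b0;   // 1 while the interrupt PC is in use

            // When interrupt enable and interrupt state are both 1 the interrupt
            // program counter is used, otherwise the regular program counter is used.
            assign eff_pc = (int_en && int_state) ? int_pc : pc;

            always @(posedge clk) begin
                if (wr_ctrl)
                    {int_en, int_pin, int_pc} <= ctrl_bits;   // enable/disable/set address
                if (int_en && ina[int_pin] && !int_state)
                    int_state <= 1'b1;                        // event seen: switch PCs
                else if (reti)
                    int_state <= 1'b0;                        // return from interrupt
            end
        endmodule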
  • Mike GreenMike Green Posts: 23,101
    edited 2015-02-12 08:50
    It's a little more complicated than that, particularly with a pipelined processor ... not terrible, but the details can keep it from working. For example, if a JMPRET is executing when an interrupt occurs, the write of the incremented PC has to complete before the switch to the interrupt PC occurs. I think that's already handled by the pipeline and RAM write logic, but needs to be checked. Should there be separate flags registers for interruptible and interrupt states to reduce latency? It costs a couple of instructions to save and restore the flags otherwise. There's very little real code that doesn't use at least one flag once. There's still the question of what to do with the WAITxxx instructions when they're interrupted and the condition they're waiting for occurs during the interrupt processing.
  • Dave HeinDave Hein Posts: 6,347
    edited 2015-02-12 09:09
    Mike, each of the issues you presented has a known solution. The effect on the pipeline is similar to what happens when a jump is performed. Instructions that are already in the pipeline would be invalidated, and a return address would be saved that points to the next instruction to be executed. The flags could also be saved along with the return address, which is similar to what Chip has already implemented for calls in the P2 FPGA image from a year ago.

    The WAITxxx instructions would need special handling. Dedicated logic could be used to latch the WAITxxx event so that on return from interrupt the processor would not miss the event.
  • kwinnkwinn Posts: 8,697
    edited 2015-02-12 09:55
    @msrobots
    msrobots wrote: »
    Right. P1 has 8 of them and P2 hopefully 16. So just use one of them as interrupt controller. Even on P1, waitpne and friends can check for multiple interrupt pins at once while in a low power state.

    This cog can then either handle the needed job by itself or 'outsource' it via mailbox to another cog and return to its idle state.

    Not doable on any single core.

    And you still have other execution engines working in parallel without interruption while your interrupt gets handled by the handler and eventually dispatched to some other parallel process.

    All valid points, provided you have enough execution engines for the tasks that are needed, or the timing requirement makes it possible to outsource it.
    msrobots wrote: »
    For what exactly do you need an interrupt again?

    To add a very small task that had to respond to an input pin in 8.5uSec or less. I already had 8 cogs in use.
    msrobots wrote: »
    I do love the propeller for its unorthodox way of doing things. The main plan behind it is to use parallel execution instead of an interrupted single-core computation. Why do you want to press this back into the mainstream, boring, complicated mess of interrupts and hardware registers for dedicated SPI, Hyper-bus, Quad-whatever?

    It's not what the propeller is. Sure, dedicated hardware for protocols and interrupts is what the mainstream MCs are doing. Almost all of them. If you think you need this in your life, do it. Use a PIC, ARM, Pentium, whatever. Why can't you let the propeller do it differently?

    Why are you constantly voting for dedicated HW support for whatever protocol, interrupts and other 'mainstream features' when the main goal of @chip on the propeller was to NOT have those things in the first place?

    NOT having dedicated hardware and NOT having interrupts and NOT having tons of HW registers and data sheets and different versions of the same stupid chip because of different HW and pins is exactly what the Propeller is about.

    If you remove that from the propeller (or add, depending on your view) it is not a propeller anymore, but just another MC like all of the other ones.

    I'm not looking for dedicated peripherals, mainstream interrupt systems, or dedicated hardware registers. I like the general and unorthodox nature of the propeller. What I would like to see is generic hardware that makes it easier to create the standard peripherals in software. After all, we do have to interface with the rest of the world.
    Quite confused!

    Virtually everything in the propeller has similarities to mainstream systems. Registers, CPU, memory, I/O and instructions are similar. It is how they are combined that makes it unique.

    Do some members suffer from interrupt phobia?

    I don't see how adding a simple interrupt to every cog detracts from the cogs. All it would do is allow you to switch from one task to a second one based on an external signal. Not so different from the time sliced threads proposed previously. Not even all that far removed from the waitxx instructions, except that the cog can be doing something else instead of waiting. For some applications that could be a substantial increase in processing power.
  • potatoheadpotatohead Posts: 10,254
    edited 2015-02-12 10:11
    You got the task done right?

    Had there been interrupts a whole pile of things may not have gotten done.

    Now we know a lot more about using concurrent multiprocessors. And those things are robust and easy to combine too.

    Your use case required some effort. Lots of cases don't. And we have accumulated many more that are easy and effective too.

    It is the way they are combined that makes a Prop unique.

    There are always niche cases. Not all of them warrant a feature, given that solutions exist. This is the basic complexity-and-dilution-of-the-primary-feature argument.

    Lots of products end up messy because this dynamic wasn't well considered.
  • jmgjmg Posts: 15,148
    edited 2015-02-12 11:28
    evanh wrote: »
    It wasn't hard for Chip to make it slice cleanly within the Cog. The only immediate hiccup he had was that a Hub read would stall all threads. That might have to be the one sacrifice to keep it small.
    I can see HUB accesses have to cause some interaction, as the slot is finite.

    A full-on stall would be a problem, but a HUB queue would be tolerable (i.e. only a HUB-to-HUB stall).
    The Condition code field could even be used to give queue priority.

    A possible alternative would be to have only one task (0?) in charge of HUB slots, but no stalls at all on the others.
    (In those other tasks, HUB opcodes become NOPs)

    Less ideal, but it may simplify the workaround? It defaults to standard operation.
  • jmgjmg Posts: 15,148
    edited 2015-02-12 11:43
    kwinn wrote: »
    1- A control register that has the interrupt program counter and status bits. Nine bits for the program counter, 5 bits for the input pin number, 1 bit for interrupt enable, and 1 bit for interrupt state.
    I would expand that a little with :
    + 2 bits for Edge/Level control (LO, HI, rising edge _/¯, falling edge ¯\_), maybe one bit for pending/new rules
    +1 bit to the Pin field, to allow a choice of Counter events to trigger. This also allows Watchdog operation.

    Addit: +1 global bit that can be set/polled by any/all COGs, for inter-COG sync tasks.
    kwinn wrote: »
    Use of the interrupts would be very simple since there is no need to store state. When interrupt enable and interrupt are both 1 the interrupt program counter is used, otherwise the regular program counter is used.
    Once you have that, you have task-switching hardware done.
    e.g. fire that state toggle every other cycle, and you get 50% cycle sharing.
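
    Sticking with the illustrative Verilog sketch from earlier in the thread (same invented names - none of this is the real P1 source), those extra bits might look something like this:

        // Sketch only, extending the cog_int_ctrl illustration above.
        // mode[1:0] would hold the two proposed edge/level bits; slice_mode
        // shows the last point: toggle int_state every clock and the same
        // PC mux becomes a 50/50 hardware time slicer.
        reg [1:0] mode       = 2'b01;   // 00 = LO level, 01 = HI level, 10 = rising, 11 = falling
        reg       slice_mode = 1'b0;    // 1 = alternate the two PCs every cycle
        reg       pin_q      = 1'b0;    // previous sample of the selected pin

        always @(posedge clk) pin_q <= ina[int_pin];

        wire event_hit = (mode == 2'b00) ? ~ina[int_pin]              // LO level
                       : (mode == 2'b01) ?  ina[int_pin]              // HI level
                       : (mode == 2'b10) ? ( ina[int_pin] & ~pin_q)   // rising edge
                       :                   (~ina[int_pin] &  pin_q);  // falling edge

        // Replaces the int_state update in the earlier sketch.
        always @(posedge clk)
            if (slice_mode)
                int_state <= ~int_state;             // 50% cycle sharing between the two PCs
            else if (int_en && event_hit && !int_state)
                int_state <= 1'b1;                   // hardware-event-activated task switch
            else if (reti)
                int_state <= 1'b0;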
  • jmgjmg Posts: 15,148
    edited 2015-02-12 11:50
    kwinn wrote: »
    Do some members suffer from interrupt phobia?

    I don't see how adding a simple interrupt to every cog detracts from the cogs. All it would do is allow you to switch from one task to a second one based on an external signal. Not so different from the time sliced threads proposed previously. Not even all that far removed from the waitxx instructions, except that the cog can be doing something else instead of waiting. For some applications that could be a substantial increase in processing power.

    Then do not call it an interrupt; call it a hardware-event-activated task switch instead :)
    With no stack, it is not quite your grandfather's interrupt anyway.
  • Heater.Heater. Posts: 21,230
    edited 2015-02-12 12:13
    Any time anyone says they want an interrupt, what they mean is "I want a blob of code to be run when some event happens". What they are also saying is "I don't care if whatever background code is halted for the 10 or 100 or 1000 instructions the interrupt routine takes".

    We say: "Put that blob of code in another CPU and there you go."

    They say: "I have run out of CPUs, but I notice I have a CPU that is not doing much and does not require low latency. Can't I have an interrupt to do what I want?"

    I say "If you are in that situation then an interrupt would indeed fix it. BUT if you have the mechanisms in place to do interrupts: divert flow of execution, remember where you came from, return back there when the interrupt handler is done, have a second PC and flags etc etc THEN you might as well go the whole hog and have hardware time sliced threading in there"

    Basically the idea is that there is no "background" and "interrupt handler". No, everything in an MCU is event based. Everything is an event handler. Threads of code waiting on events.

    By the way, what is the deal with the 8.5uS latency requirement? I think Kye's Full Duplex Serial driver that works at 250K baud shows how to meet that requirement using WAITCNT and coroutines already.
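
    (A quick bit of arithmetic - mine, not from the driver source: at 250K baud a bit period is 1/250,000 s = 4uS, so a receive coroutine that keeps up at that rate is already reacting to pin events in well under 8.5uS, with the transmit side sharing the same cog.)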
  • jmgjmg Posts: 15,148
    edited 2015-02-12 12:31
    Heater. wrote: »
    I say "If you are in that situation then an interrupt would indeed fix it. BUT if you have the mechanisms in place to do interrupts: divert flow of execution, remember where you came from, return back there when the interrupt handler is done, have a second PC and flags etc etc THEN you might as well go the whole hog and have hardware time sliced threading in there".

    I agree, any "hardware event activated task switch" solution has hardware that is a task-switch engine.
    A few control bits add cycle-level task switching.
  • Dave HeinDave Hein Posts: 6,347
    edited 2015-02-12 13:11
    Cycle-level task switching requires zero-overhead hardware support. That is, there can be no performance hit associated with a task switch. However, if you only do 10,000 task switches per second you could allow for some task-switching overhead. A 100 MIPS processor could have a 10 cycle task-switch overhead, which would only take 0.1% of the processor.
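
    Spelling that arithmetic out: 10,000 switches/s x 10 cycles = 100,000 cycles/s of overhead, and 100,000 / 100,000,000 cycles = 0.1% of a 100 MIPS processor.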
  • evanhevanh Posts: 15,209
    edited 2015-02-12 13:15
    kwinn wrote: »
    I don't see how adding a simple interrupt to every cog detracts from the cogs. All it would do is allow you to switch from one task to a second one based on an external signal. Not so different from the time sliced threads proposed previously. Not even all that far removed from the waitxx instructions, except that the cog can be doing something else instead of waiting. For some applications that could be a substantial increase in processing power.

    Make your argument for revisiting the threading implementation instead. There is opportunity for an improved version of what Chip had done previously. Real interrupts have very little chance of seeing any traction. That was true even in the earlier Prop2 design.

    Waiting is good. It saves power and it responds predictably and very rapidly, and doesn't have curly side effects like long-latency instruction fetches or a potentially hefty context switch. This is why I was happy when Chip dumped the earlier Prop2 design. The Cogs were getting so fat they were begging to be kept busy. I'm happy to say "Not any longer".
  • Heater.Heater. Posts: 21,230
    edited 2015-02-12 13:18
    So, the crux of the matter is: how to map external events onto code that deals with them. That is what we do in MCUs.

    Throwing a CPU at each possible event handles that nicely. Until you run out of CPUs.

    Having a single CPU deal with multiple events must degrade performance of some code somewhere.

    The nice way to do this is for all events and code to be symmetrical. There is no "background" and "interrupt handler", only code that is activated by events.

    Ergo, no interrupts but thread slicing.
  • evanhevanh Posts: 15,209
    edited 2015-02-12 13:21
    Dave Hein wrote: »
    However, if you only do 10,000 task switches per second you could allow for some task-switching overhead. A 100 MIPS processor could have a 10 cycle task-switch overhead, which would only take 0.1% of the processor.

    That's a kernel-level function. Totally different discussion. Not used seriously on the Prop1. Should it be treated seriously on the Prop2?

    Custom cooperative tasks in a single app can and will do it already. That's sufficient for real use.

    EDIT: Also, once you go down that path you also want to start introducing memory protection/virtualisation and all the bloat and isolation that comes with it.
  • markmark Posts: 252
    edited 2015-02-12 13:59
    Heater. wrote: »
    Throwing a CPU at each possible event handles that nicely. Until you run out of CPUs.

    It's not like this is just an issue with an OS-free uC. Eventually you run out of resources on *any* machine, no matter how powerful it is relative to something else. Sometimes we don't want to simply accept that something is not suitable for a particular job, and instead argue "if only it had more ______" or "if only it had feature _____". Generic multitools are great in theory, but one is only preferable in a pinch over a more proper and focused tool. When working on something, I never chose the screwdriver in a Swiss Army knife over a proper screwdriver. It was a similar case with the Lockheed F-35, a plane that had to be everything to everyone, frequently being required to do things that were at odds with one another, and in the end wasn't particularly good at anything.

    This is obviously why the processor market (including uCs and high-performance CPUs) is as vast as it is. This is tough for small semiconductor companies who don't have the resources to do things the way the big players can. They need a differentiating and unique (ideally in a good way) product for a segment of the market that was missed/ignored, and hopefully have it be profitable enough for them. Based on what we know of the current P2 design, it will no doubt be unique, and be able to do some things that probably no other uC or even big-boy CPU can as simply and cheaply. It has no business being compared to a 50c PIC or AVR. It has no business being compared to some high-performance ARM-based application processor. It might be capable of doing some of the same things those others can, but the goal should always be to focus on doing what those others can't.

    I don't think anyone here honestly expects the P2 to be the be-all-end-all of uCs. Is software polling inferior to hardware supported cog multi-threading or interrupts? I dunno. Maybe. In some ways. Whatever, it's irrelevant. Will the P2 be a flop just because it lacks those things? Hey, the P1 is still around and Parallax seems pleased with it, so that should be proof enough that a design lacking those features will not result in failure on the market.
  • David BetzDavid Betz Posts: 14,511
    edited 2015-02-12 14:04
    [QUOTE=mark
  • David BetzDavid Betz Posts: 14,511
    edited 2015-02-12 14:09
    I think it's great that there are so many ideas here about how to improve the Propeller. We have the P1 Verilog code. Why don't some of the people making these suggestions try to implement them as an extension of P1 so we can try them on for size? I suppose a feature that is fully fleshed out and tested is more likely to be considered for inclusion in P2 than one that is just some vague ideas.
  • Heater.Heater. Posts: 21,230
    edited 2015-02-12 14:22
    @mark
  • markmark Posts: 252
    edited 2015-02-12 14:23
    David Betz wrote: »
    I think the worry is that it will be compared to a 50c ARM micro controller which is probably a valid comparison.

    Unless a 50c ARM chip is magically several orders of magnitude more powerful than an AVR, I don't see how that's true. Perhaps the ARM part might be able to do some things the P2 can't, but it's more than likely that the reverse will be true as well. If you can't accomplish something even with ten 50c chips, what good are they to the engineer?

    @heater

    I didn't mean to actually direct it at you, I just wanted to quote what you said and expand on it.
  • Heater.Heater. Posts: 21,230
    edited 2015-02-12 14:44
    mark
  • David BetzDavid Betz Posts: 14,511
    edited 2015-02-12 14:44
    [QUOTE=mark
  • jmgjmg Posts: 15,148
    edited 2015-02-12 15:42
    [QUOTE=mark
  • jmgjmg Posts: 15,148
    edited 2015-02-12 15:48
    David Betz wrote: »
    I think it's great that there are so many ideas here about how to improve the Propeller. We have the P1 Verilog code. Why don't some of the people making these suggestions try to implement them as an extension of P1 so we can try them on for size? I suppose a feature that is fully fleshed out and tested is more likely to be considered for inclusion in P2 than one that is just some vague ideas.

    I'm sure that will happen.
    The P1V is ported to ever-more FPGAs, and with the MAX10, it makes sense to deploy a P1 and a P1V.MAX10 in a design.

    That will make it natural to have the P1V features extended towards P2, so such a design can migrate to P2 when that releases. There is already a list of counter extensions, using the unused bits, and once more details on the Smart Pins are released, they will form another extension template.
    If I understood an earlier post of Ken Gracey right, he is even keen to have P2 Verilog pieces tested on P1V platforms.
  • evanhevanh Posts: 15,209
    edited 2015-02-12 20:18
    jmg wrote: »
    ... and do so at customer-expected speeds.

    There are three groups of people:

    1: Those that insist on standards being met. There is no expectation of speed other than it meeting the spec. The spec will often allow for slower devices, and slower devices indeed do exist.

    2: Those that just want it to work. There is no expectation of speed other than getting the job done. The extra speed is often just a luxury but sometimes it's important. Not everything can be done with one solution. I'll take whatever comes to be.

    3: Those that crow about comparisons just for a laugh. There is no expectation of speed other than comparing which is faster. Take USB, for example: the top listed speed is usually double what is ever achieved by any host controller. No one complains, because there is nothing that achieves better than half the full rated speed.
  • Heater.Heater. Posts: 21,230
    edited 2015-02-13 07:26
    Holy cow, look what I found!

    No more discussing P2 requirements for me. I'm going to blow all the money saved up for P2s on some of these https://www.96boards.org/products/

    The HiKey Board - a 1/96 board.

    8 cores, 64-bit, 1.2 GHz, plus GPU, plus a ton of interfaces. 130 dollars!

    Software all supported by the wonderful Linaro folks.
  • Brian FairchildBrian Fairchild Posts: 549
    edited 2015-02-13 08:23
    Heater. wrote: »
    Holy cow, look what I found!

    I've been trying to think of a reason not to buy one of these...

    http://uk.rs-online.com/web/p/processor-microcontroller-development-kits/8194709/

    The Parallella Epiphany III Desktop Computer

    FPGA + dual Cortex A9s + 16 x 32-bit cores + a host of IO. Peak performance = 32GFLOPS


    The (harsh) reality is that the world has moved on since the P2 was first mooted. By the time we have real production silicon in 2016 I'm not sure where it will fit.