The New 16-Cog, 512KB, 64 analog I/O Propeller Chip

mmm · 2015-07-20 04:51

Heater, being able to reduce complexity is a use case in itself. If you cannot reduce the scope of code that you're running, then your trusted computing base (TCB) is the entire code that you're running. As I said earlier, my use case may be none of your use cases and vice versa.

potatohead · 2015-07-20 05:00

Protected modes are way out of scope for this design.

We know how to do this stuff well and we know it works well from our experience on P1.

Peter Jakacki · 2015-07-20 05:05

Rather than saying this later than sooner I will say it sooner than later or else sooner or later we are going to be in the P2-hot position again. We had lots of really good ideas last time but each idea introduced further complications making the P2 silicon target drift further and further away until it became impractical to manufacture into silicon.

One of these great ideas is the simple unnested interrupt scheme but most of these other ideas are fine for P2-soft or P3 or an ARM but not for P2-2015. Just think like this, any current design in silicon is better than great ideas but no silicon and yet what we pretty much have now is really great and has a chance of becoming silicon maybe even this year???

However if this kitchen gets too crowded again there is probably a pretty good chance we will go hungry for another year, or worse, some of us hungry cats could find another home where we can get our fill.

Heater. · 2015-07-20 05:13

mmm,
I don't get the idea. This is a micro-controller we are talking about. I write the code, I run it, I trust it. Well, as much as I can trust myself not to put bugs in there.
Or perhaps I grab some modules that others have written and add them to my project. I can trust those as well: I have the source, and others will have used and abused it and shaken out any bugs.
Finally, let's say some buggy/malicious code does get into my system when I build it. Well then, confining its actions to a sandbox is not much use. It has to be able to communicate with my code. As soon as I allow that, it has the means to thwart my application.
This is not a general purpose computer on which one will be loading and running random binary blobs of code of unknown provenance that may be out to do bad things to people's systems.
Now, I will admit that having limits on accessible memory and throwing a fault trap when they are hit may help in debugging code. I don't know of any MCUs that do that.
What actually is your use case for this feature?

mmm · 2015-07-20 05:15

Cluso99, you may not use the chip in a workstation computer, but nothing should prevent you from doing just that. Didn't Chip even mention that the new interrupt features "allow a whole console computer with video, keyboard, mouse, and RTC to operate from one cog"?

Also, please note that it is not my intention to add to complexity or power consumption here. While I said that I lack the intuition to judge what adaptations would be necessary to provide the suggested functionality, I speculated that this should be easy and essentially doesn't add functionality at all but rather just disables features that are already in. So, naive as I am, I don't expect any impact on power consumption, should this restricted mode be added. If you tell me that this is a complicated addition that takes two weeks of full-time thinking to implement, adds 10 % to cog complexity or sends power consumption through the roof, well, I'd be among the first to say "don't add it"!

In the same vein, I'd even like to propose more possibilities to restrict functionality. I don't know if it is possible, but I could imagine that it might save power if the CORDIC could be completely switched off, or if hub memory could partially be switched off in cases where it is known that only 256 KB of it would ever be accessed, for instance. Would instructions for such functionality add complexity? In my view, taking features away doesn't count as "adding complexity". But I admittedly have no clue about the electrical implementation required to provide such functionality.

Heater. · 2015-07-20 05:36

Let's talk about protected modes for a moment...
Back in the early 1980s Intel introduced the 286 microprocessor. Its major feature was an MMU and "protected" modes. This 16-bit processor could make use of lots of RAM thanks to that MMU and it could isolate memory between threads or processes. It had descriptor tables and segments to let you do all that. At the time the company I worked for, Northern Telecom, started to find bugs in some of the 286 instructions.
One that I recall is that on a 286 if you do a multiply by an immediate value that happened to be negative you got the wrong result! Eventually we got, under non-disclosure, a huge document from Intel that described all the bugs in the 286 that they knew about. Sure enough that multiply bug was in there. 90% of the other 286 bugs described were all about the protected modes and how they could be circumvented!
Now, whilst you are musing about such process isolation, what about restricting access to the IO pins as well? Plenty of scope for buggy/malicious code to mess you up there.
All in all, I think that if one had some spare transistors to play with there are many other features that would be useful to implement before thinking about memory protection that nobody is ever going to use. If you need that get an ARM!

mmm · 2015-07-20 06:00

Maybe I made the description of what I wanted too complicated, hence the heated rejection, although I thought it was the simplest thing to do. Let me clarify again:

(1) I don't really want to add anything. I want to be able to take away from what's already in.

(2) If this feature cannot be implemented in an afternoon, then don't think any further about it. Just leave it out!

potatohead, the proposed feature has nothing to do with x86 protected mode. This is not about running code in different modes on the same cog. Instead, it is something that is set per cog, and if set, takes away features that are *already* in the cog.

The same goes for Peter Jakacki: the proposal is not really about adding something, it's about restricting something. Your warnings are generally right, though.

Heater, I don't know what happens in the current design when the large hub is accessed out of bounds, but I don't think I proposed any trap, just that any restricted cog should only see a virtual hub space from 0 to 32 KB, which physically starts at an offset of cog number x 32 KB (assuming cogs numbered from 0 to 15). Whether out-of-bounds access attempts wrap within the virtual 32 KB or trap in some way doesn't matter, I think. My use case is to use it SoC or CPU style, as you might call it. You don't know of an MCU that supports memory protection, not even in the trivially simple poor man's way that I'm proposing? Fine, a good reason to equip this one with this unique selling proposition. It doesn't add anything that isn't already in, I believe. And what alternative would you suggest? That I use a more complex chip although I want a less complex chip?

jmg · 2015-07-20 06:04

mmm wrote: I uttered my speculation, that this should be easy and essentially doesn't add functionality at all but rather just disables features that are already in. So, naive as I am, I don't expect any impact on power consumption, should this restricted mode be added. If you tell me, that this is a complicated addition that takes two weeks of full-time thinking to implement, adds 10 % to cog complexity or raises power consumption to the roof, well, I'd be among the first to say "don't add it"!

Yup, it is a complicated addition that will consume logic, and impact speed.

What you call 'just disable' needs a run-time check, and that == ++Power and --Speed.
Likewise, pins need to check some allocation table before they act.
All that extra mapping configuration of 'legal access' is costly to design in, and costly to test.

On the other hand, I can see a lot of benefits with the HW Vectors as added, especially as Threads are now gone.

However, it may be simple to trap out-of-range addresses on a large boundary (e.g. 512k), as that simply tests the upper address bits. That would support (very) large code with auto-location.

Much less complex than your allowance-mapping request for Pins + Memory.

mmm · 2015-07-20 06:10

Heater, thanks for your excursus on x86 protected mode. Great example of what my proposal is not about.

As for your suggestion about restricting access to the IO pins: I thought I had mentioned this in my original proposal. Yes, cogs in restricted mode cannot access any pins either. You might have overlooked this detail.

mmm · 2015-07-20 06:14

jmg, well, if that is so, I guess, it should be left out then. Didn't think that so much would be involved with it.

Heater. · 2015-07-20 06:21

mmm,
What you describe as "taking away" is actually "adding". As a minimum one would have to add some means to enable and disable this feature as required. That is additional register bits to be provided or even special instructions.
Who controls this anyway? Clearly you cannot have all COGs able to disable this RAM restriction at will, that would defeat the point. That means having some "master COG" that sets up memory access for all the others, and there we have a bucket of complexity straight away.
What addresses does a COG running in the restricted mode see? Do they get mapped from whatever space in memory to a range 0 to 32K that the COG sees? That is more complexity. Or does the COG see the actual physical address but cannot step outside its 32K range? Again more logic to do all this.
How do these restricted COGs communicate with each other?
As for "unique selling point", what is the point? Who would use it? For what?

mmm · 2015-07-20 07:08

Heater, I suggest that you reread my original proposal. All your questions have already been addressed in my earlier posts. But let me answer them again:

Yes, some means to enable and disable restricted mode would obviously have to be provided. In that sense, it is an addition. You would certainly need to equip each cog with an additional internal one-bit register so that it knows how to act. That didn't seem like more than half a percent of additional complexity to me, but jmg told me that this was too naive.

Who controls this? As I mentioned in the original proposal, you start in unrestricted mode. The instruction "RESTRICT " would set or unset the mode. Any cog running in unrestricted mode can set any cog's mode. Obviously, the instruction must be disabled, once a cog's restricted mode bit is set and until it is unset, such that a cog in restricted mode cannot use it. No need to have a "master cog".
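
The rules in the paragraph above can be sketched as a tiny state model. This is only an illustration of the proposal as described in the thread, not anything that exists in the chip: the class name `CogModes` and the method `restrict` are hypothetical, and treating an ignored instruction as a NOP is an assumption taken from the later remark that some instructions would be forced to NOPs in restricted mode.

```python
class CogModes:
    """Toy model of the proposed per-cog restricted-mode bit (hypothetical)."""

    def __init__(self, num_cogs=16):
        # Every cog starts in unrestricted mode, as the proposal describes.
        self.restricted = [False] * num_cogs

    def restrict(self, caller, target, on):
        """Model the proposed RESTRICT instruction executed by cog `caller`.

        Returns True if the instruction took effect, False if it was
        ignored because the calling cog is itself restricted.
        """
        if self.restricted[caller]:
            # Assumption: the instruction behaves as a NOP in restricted mode.
            return False
        self.restricted[target] = on
        return True


modes = CogModes()
modes.restrict(caller=0, target=3, on=True)    # cog 0 restricts cog 3
modes.restrict(caller=3, target=3, on=False)   # ignored: cog 3 cannot free itself
print(modes.restricted[3])
```

In this model no dedicated "master cog" is needed: any cog whose own bit is clear may set or clear any bit, and a cog that restricts itself stays restricted until an unrestricted cog releases it.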

What addresses do cogs in restricted mode see? Like any unrestricted cog would see hub addresses from 0 to 512 KB (or is that 511 KB?), any restricted cog would see hub addresses from 0 to 32 KB. For cog 0, this physically maps to 0 to 32 KB, for cog 1, this maps to 32 to 64 KB and so on.
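
As a worked example of this mapping, here is a minimal sketch. It assumes a byte-addressed 512 KB hub and assumes that out-of-range addresses wrap within the 32 KB slice (the proposal leaves open whether they wrap or trap); the function name `physical_address` is made up for illustration.

```python
HUB_SIZE = 512 * 1024    # total hub memory in bytes
SLICE_SIZE = 32 * 1024   # window seen by a restricted cog


def physical_address(cog_id, virtual_addr, restricted):
    """Translate the address a cog issues into a physical hub address."""
    if not restricted:
        # Unrestricted cogs see the whole hub.
        return virtual_addr % HUB_SIZE
    # Restricted cogs see 0..32 KB, placed at cog_id x 32 KB.
    # Assumption: addresses beyond the window wrap instead of trapping.
    return cog_id * SLICE_SIZE + (virtual_addr % SLICE_SIZE)


for cog in (0, 1, 15):
    first = physical_address(cog, 0, restricted=True)
    last = physical_address(cog, SLICE_SIZE - 1, restricted=True)
    print(f"cog {cog:2d}: {first // 1024} KB .. {(last + 1) // 1024} KB")
```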

How do restricted cogs communicate with each other? They cannot, since their accessible hub slices are disjoint. They either only communicate with unrestricted cogs or have an unrestricted cog handle their messages as an intermediary (with all the delays that such brokering would cause).

As for "unique selling point", well, I said I would buy it. If you don't find compelling use cases for yourself, maybe there aren't any for you. What have I read here about interrupts? "If you don't need them, you don't have to use them." Again, when I made the proposal, I was not under the impression that this would be complicated to add or use any significant resources. I'm still not sure I understand what makes it complicated, but jmg told me it is. I still think you could have read the proposal more carefully.

I guess, there's no point in discussing this any further, if the costs are like jmg described them.

mmm · 2015-07-20 07:30

I see, the forum has eaten my brackets and the characters between them. The instruction was supposed to read "RESTRICT .ON/OFF. .COG NUMBER." (brackets replaced with dots, hope this displays correctly now and clarifies the intention).

Tubular · 2015-07-20 07:45

Hey mmm
What you described would be pretty trivial to implement: you're just (selectively) forcing the 4 MSBs to the cog id, in effect, for hub memory access, and perhaps pin access too. It would be easy to do this in Verilog, and I wouldn't expect much "cost" to it as described.
I'd suggest, though, that it makes sense to implement this kind of thing in a P1V (Verilog version of the P1) or, when we get there, in P2V. If it's for a commercial application this might also be arranged through Parallax.
As you might have gleaned, the appetite for new suggestions and discussion has waned. There are a lot of hungry forumistas with knife and fork in hand, bib on, waiting for their P2 dinner, and they're hungry, and they're not so fussed about the gravy right now.
But it's important that these things are still raised and discussed.
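
At the bit level, the MSB forcing described here might look like the following sketch. It assumes a 19-bit byte address for the 512 KB hub, so the top 4 bits select one of sixteen 32 KB slices; the names `force_msbs`, `ADDR_BITS` and `LOW_BITS` are illustrative only and do not come from any Propeller source.

```python
ADDR_BITS = 19                     # assumption: 512 KB hub, byte addressed
COG_BITS = 4                       # 16 cogs need a 4-bit cog id
LOW_BITS = ADDR_BITS - COG_BITS    # 15 bits address a 32 KB slice
LOW_MASK = (1 << LOW_BITS) - 1
ADDR_MASK = (1 << ADDR_BITS) - 1


def force_msbs(cog_id, addr, restricted):
    """Replace the top 4 address bits with the cog id when restricted."""
    if not restricted:
        return addr & ADDR_MASK
    return (cog_id << LOW_BITS) | (addr & LOW_MASK)


# A restricted cog 3 issuing address 0x12345 ends up inside its own slice.
print(hex(force_msbs(3, 0x12345, restricted=True)))   # 0x1a345
```

No comparison or table lookup is involved in this form, only a multiplexer on the upper address bits selected by the mode bit.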

jmg · 2015-07-20 08:12

Tubular wrote: Hey mmm What you described would be pretty trivial to implement, you're just (selectively) forcing 4 MSBs to the cog id, in effect, for hub memory access, and perhaps pin access too.

A simple 4 bit mask is unworkable - that gives customers only either 512k or 32k, with no overlaps.
Two 'secured' COGS cannot ever see each other - oops.
So, you need to add more bits to allow more control of overlaps and constraints, and even with all this, someone can still just launch a COG to replace what was there. The 'protection' is a costly illusion.

Heater. · 2015-07-20 08:28

Seems the idea is that unrestricted COGs can see into the space of the restricted COGs in order to facilitate communication.
What's missing here is any possible use for the thing.

Cluso99 · 2015-07-20 08:33

mmm: just because you are taking away functions, it does not mean there are fewer transistors involved. On the contrary, you need more logic to test for the function access, and then more to gate or force an alternative. In most cases, not only does it add to LEs and power, it adds delays that may ultimately be in the critical path and hence restrict the ultimate chip speed.

The hot chip was hot because of all the extras that were added. A lot was also learnt about how to minimise the power.

mmm · 2015-07-20 08:36

jmg, that was the intention, as I mentioned in my answer to Heater. Restricted cogs cannot directly communicate with each other and either only communicate with unrestricted cogs or have an unrestricted cog pass their data to each other (that would delay communication quite a lot). I thought that more flexibility would require more complexity, but the no-overlap solution would already satisfy me (as I said, more wishes for P3). I'm not sure about the last bit about "launching a cog and replacing what was there". A cog in restricted mode shouldn't be able to "free itself" from restrictions, so the mode setting instruction would have to be disabled in restricted mode alongside other instructions that might be used to raise privileges. So launching an unrestricted cog and causing it to execute code should certainly not be allowed in restricted mode. Only unrestricted cogs shall be able to do setup tasks. So some instructions would have to be forced to NOPs in restricted mode.

Tubular, thanks for the encouraging words. It gives me some hope, that, if it causes too many issues for the real chip, I might at least be able to use it in a P2V, if I invest some time in it.

jmg · 2015-07-20 08:40

mmm wrote: jmg, that was the intention, as I mentioned in my answer to Heater. Restricted cogs cannot directly communicate with each other and either only communicate with unrestricted cogs or have an unrestricted cog pass their data to each other (that would delay communication quite a lot).

Yup, and you just consumed 6% of the chip as a costly message passer, as a 'fix' - and one that has no protection. It is easy to replace that 3rd COG with a hacked one. The two restricted COGs think their little worlds are fine.

Tubular · 2015-07-20 08:45

Actually I think the 'use for the thing' is probably what mmm is most sure about.
The "no overlaps" *is* what's being asked for by the OP. We're reading other requirements into the request, then deciding it's too costly.
Even then there may be other ways to communicate via locks or pins (remember Linus's "Turbulence" P1 demo?)
I'm aware of one space project that would benefit from something along these lines, but I'm sure there are others.

mmm · 2015-07-20 08:56

Cluso99, well, I didn't say I thought it wouldn't cost anything, I just thought it would be simple enough to be negligible compared with the alternative of sandboxing code in a VM. I'm still clueless about how much hotter or slower a P2 with this feature would run compared to one without it. I certainly don't want to be the one responsible for customers' complaints about slow chips running too hot. If it costs too much, it can't be had. Sorry for the hassle. Looking forward to P2V then.

jmg · 2015-07-20 09:00

Tubular wrote: The "no overlaps" *is* what's being asked for by the OP. We're reading other requirements into the request, then deciding it's too costly

Not quite - we are trying to take a single-use spec and turn it into something that someone else might be able to use too (without consuming a whole COG as traffic manager). Caution is needed - a feature that is too constrained will be laughed at. (Ask Apple about USB ports...)
You can constrain HUB writes with a single write subroutine that masks bits in SW, with no added silicon cost, and without needing another COG. Users can freely choose their masking designs & memory maps, it is not cast into the Silicon.
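
A minimal sketch of such a masking write routine follows. It is a simulation in Python, not Propeller code: the hub is modelled as a `bytearray`, and `make_masked_writer`, `base` and `size` are hypothetical names chosen for this example. The region size is assumed to be a power of two so that a simple AND can confine the address.

```python
HUB_SIZE = 512 * 1024


def make_masked_writer(hub, base, size):
    """Return a write routine that can only touch hub[base : base + size]."""
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a power of two")
    mask = size - 1

    def write_byte(addr, value):
        # Masking the address in software keeps every write inside the region.
        hub[base + (addr & mask)] = value & 0xFF

    return write_byte


hub = bytearray(HUB_SIZE)
write = make_masked_writer(hub, base=2 * 32 * 1024, size=32 * 1024)
write(0x0000, 0xAB)   # lands at physical 0x10000
write(0x8001, 0xCD)   # out of range, wraps to physical 0x10001
print(hex(hub[0x10000]), hex(hub[0x10001]))
```

The memory map is chosen freely by the caller here, which is the flexibility being argued for. The constraint only holds for code that actually goes through this routine.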

mmm · 2015-07-20 09:19

jmg, I'm not sure what "fix" you're talking about. There is no actual intention to have two restricted cogs talk to each other. I just mentioned that it could still be done if you really wanted to. The important case is to have cogs that don't have global access. You cannot "hack" unrestricted cogs, because these are meant to run your TCB code, which you control. If you have no way to restrict cogs in their scope of action, you lose all integrity guarantees as soon as you run any alien code.

Tubular · 2015-07-20 09:26

Then you're adding an execution speed, code size, and complexity penalty to something that could be tidily done in hardware.
We need to leave the decision about what goes into the silicon to the chef. I'm not arguing this should "go in", if anything do a P1V or P2V with it, but the proposal still deserves evaluation regardless.

And don't worry too much about being laughed at

mmm · 2015-07-20 10:04

To address the concerns about the scheme's inflexibility, it might be an option to increase a restricted cog's access window to 64 KB but keep the 32 KB multiples for starting offsets. That is, just as any unrestricted cog would see hub addresses from 0 to 512 KB (or is that 511 KB?), any restricted cog would see hub addresses from 0 to 64 KB. For cog 0, this physically maps to 0 to 64 KB, for cog 1, this maps to 32 to 96 KB, for cog 2, this maps to 64 to 128 KB and so on. The slices wouldn't be disjoint anymore, but adjacent cogs would share a 32 KB common area. The last cog (number 15) would be a bit weird though, since it would physically map to both the last and the first 32 KB of hub memory, in that order. More sophisticated schemes could be implemented with this adaptation, while it would still be simple, provided the weird wrapping case for cog 15 doesn't introduce any undue complexity. Having more hub space available to restricted cogs might also come in handy, although you would have fewer disjoint areas at your disposal. Generally though, it would be more flexible.
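
The overlapping variant, including the wrap-around for cog 15, can be sketched as follows. As before this is only a model under assumptions: a byte-addressed 512 KB hub, addresses beyond the 64 KB window wrapping rather than trapping, and a hypothetical function name `physical_address_64k`.

```python
HUB_SIZE = 512 * 1024
STEP = 32 * 1024      # starting offsets stay at 32 KB multiples
WINDOW = 64 * 1024    # each restricted cog sees 64 KB


def physical_address_64k(cog_id, virtual_addr):
    """Map a restricted cog's 0..64 KB window onto the physical hub."""
    # The final modulo handles cog 15, whose upper half wraps to address 0.
    return (cog_id * STEP + (virtual_addr % WINDOW)) % HUB_SIZE


# Cog 15: lower half is the last 32 KB of hub, upper half the first 32 KB.
print(physical_address_64k(15, 0) // 1024)          # 480
print(physical_address_64k(15, 32 * 1024) // 1024)  # 0
```

In this model the upper half of one cog's window is the same physical memory as the lower half of the next cog's window, which is the shared 32 KB area between neighbours.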

jmg · 2015-07-20 10:26

mmm wrote: To address the scheme's inflexibility concerns, it might be an option to increase a restricted cog's access window to 64 KB but keep the 32 KB multiples for starting offsets.
Yup, you need to add more bits, and all the storage registers and run-time comparators to check them.
Next, duplicate that for pins, where you may also want to share some overlapping resource.

jmg · 2015-07-20 10:42

mmm wrote: .... If you have no way to restrict cogs in their scope of action, you lose all integrity guarantees as soon as you run any alien code.

That's a complicated objective, that needs more than just simple masking.
A single alien COG you might be able to manage, but users will want/expect to run up to 15 alien COGs, and I cannot see a means to sensibly house-keep all that, with the P2's resources.
A Microprocessor it is not.

A Custom Loader could give some measure of Alien checking/constraint on memory access, but I'm less sure about any system to constrain pin-access, or access to common shared resource.
(there are still other weaknesses even if you do manage to constrain HUB and Pins )

Memory Protection (be it HW, or SW, or SW + Loader Vfy) would catch errant pointers, which is probably the most common failure mode?

jmg · 2015-07-20 10:44

Grr forum Sw..

ctwardell · 2015-07-20 10:55

...P2-2015...

Thanks Peter, we should rally around THAT name!
C.W.

koehler · 2015-07-20 13:22

Simple interrupts, or the closest thing Chip has at the moment.

+1 Peter, starting to get too many Chefs in the kitchen again.
