
Prop2 MMU

135 Comments

  • Give me a while to get my counter arguments lined up...

    heh, I know I was being a bit provocative, sorry 'bout that. Maybe I'm assuming certain things are obvious when they aren't, though I felt a little tired of explaining. Maybe I'm completely off and just not making any sense at all. Regardless, didn't mean to anger anyone too much.
  • Heater. Posts: 21,230
    m00tykins,

    I don't worry about provocative. I love a good debate.

    It's not just that things are often not as obvious as one might first think; it's that the facts you are presenting as "obvious" are actually false :)

    I have to chew over your screed and pick out the juicy bits to refute.

    No anger here. Only love...man.


  • The main thing is, if you go to the effort to get some form of MMU working on the Prop, what is the benefit?

    For yourself, you can say you did it. That is not unworthy.

    For anyone else, it would seem to negate the OBEX, and in return require a similar type of repository.

    I can see some people on the forum playing around with it, maybe a few would join you as hardcore proponents.
    But on the whole, it would seem to be of little/no benefit to the community at large.
    Especially since I believe it would end up reducing performance for no real gain.
  • Heater. wrote: »
    m00tykins,

    I don't worry about provocative. I love a good debate.

    It's not just that things are often not as obvious as one might first think; it's that the facts you are presenting as "obvious" are actually false :)

    I have to chew over your screed and pick out the juicy bits to refute.

    No anger here. Only love...man.

    far out duuuude
         ."".    ."",
         |  |   /  /
         |  |  /  /
         |  | /  /
         |  |/  ;-._ 
         }  ` _/  / ;
         |  /` ) /  /
         | /  /_/\_/\
         |/  /      |
         (  ' \ '-  |
          \    `.  /
           |      |
           |      |
    
  • Heater. Posts: 21,230
    I ain't no jive turkey.
  • Heater. Posts: 21,230
    edited 2015-08-17 10:53
    m00tykins,

    Firstly, let's get this straight. Your contention is that it is impossible to create a safe programming system without an MMU. My contention is that it is possible.

    Can we define "safe" as a system where I can download unknown program "X" and also another unknown program "Y" and run them at the same time on the same CPU, in such a way that they cannot affect each other's operation? Further, that they cannot affect the operation of my operating system "O"?

    The basis for your assertion seems to be some hand waving argument about "Because Turing..."

    Part of the difference of opinion here seems to be down to the idea of "Turing Machine" and how it applies.

    The Turing Machine is a very simple but rigorously defined idea. A mathematical artefact. It has the concept of memory, a means of traversing that memory, and changing elements in that memory. There are strict logical rules. From that we can derive all kinds of conclusions about computation, what can be computed, what cannot, and so on. The basis of computer science.

    I agree with you of course that many programming languages are "Turing complete". Basically capable of whatever a Turing machine can do and therefore, basically, all logically equivalent.

    BUT, here is the kicker, sit down, this might be disturbing:

    C/C++ are not Turing Machines.

    Yes, C is Turing complete. Anything you can compute with a Turing machine can be computed in C. However, the reverse is not true.

    The C specification is incomplete. It is full of "implementation dependent" features and "unspecified behaviours". For example: How big is an int? What is the result of an integer overflow? What is the result of an out-of-bounds array index access? What is the result of calling some function through some arbitrary pointer? Or without the right number of parameters? And so on and so on. (A tiny C sketch at the end of this post illustrates a couple of these.) C and C++ are not rigorously defined enough to be Turing Machines. They are more akin to random number generators. Real random number generators that is! That is to say, given any rigorously defined, Turing complete language one can determine what it does, even if that means building a system to run it. Given any arbitrary C/C++ program it is not possible to determine what it does without taking its external environment into account.

    Point is, to consider C/C++ and such languages as Turing Machines one has to include the entire environment they run in. All that memory and stack they may happen to use, intended by the programmer or not. One has to include the instruction set of the machine running the program; after all, a C program may well try and execute some code at any arbitrary address. None of which is analyzable from just looking at the C source alone.

    For this reason one has to wrap an MMU around them, to contain the problem.

    I actually think that, due to the above observations, anyone attempting to formally prove the operation of a C program, like that seL4 high-assurance kernel project, is either nuts or a charlatan.

    If you cannot find any holes in the argument so far I will continue with part two... :)
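
    To make the "implementation dependent" point concrete, here is a tiny, hypothetical C fragment (nothing Propeller specific, just a sketch). Nothing in the source tells you what it prints; the answers depend on the compiler, the target and whatever happens to sit in neighbouring memory:

        #include <stdio.h>
        #include <limits.h>

        int main(void)
        {
            int a[4] = {1, 2, 3, 4};
            int i = INT_MAX;

            /* Implementation defined: an int might be 16, 32 or 64 bits wide. */
            printf("sizeof(int) = %zu\n", sizeof(int));

            /* Undefined behaviour: signed overflow. The compiler may wrap, trap,
               or assume it can never happen and "optimise" accordingly. */
            printf("INT_MAX + 1 = %d\n", i + 1);

            /* Undefined behaviour: out-of-bounds index. The result is whatever
               the surrounding memory layout happens to contain. */
            printf("a[5] = %d\n", a[5]);

            return 0;
        }

    None of those three results can be pinned down from the C text alone, which is the sense in which the source by itself is not a complete, analysable machine.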
  • As far as I understand @m00tykins wants an MMU for process isolation.

    But process isolation is given by the fact that one cog cannot access the memory of another cog.

    Hub memory is a shared resource, just as the hard drive on systems with an MMU is shared, and two otherwise isolated processes can without problem crash each other's files on it.

    But CODE runs on a COG. Perfectly isolated from every other COG.

    my 2 cents.

    Mike
  • Heater. Posts: 21,230
    msrobots,

    Yes, the whole discussion is about process isolation.

    You are correct. Code executing entirely within a COG cannot possibly interfere with code executing in any other COG. They have their own private memory spaces.

    That is fine until your programs want to use any HUB RAM. Which is pretty much essential to make a useful system.

    At that point any COG can mess with any location in HUB and therefore mess with any data any other COG is using in HUB.

    In the P1 we have LMM mode of execution, in the PII we get that baked into silicon. So now not only can a COG mess with another COG's data, it can change the code it is running as well.

    There is no process isolation.

    Certainly Spin programs are working in a shared memory space. Spin has the @ operator and no array bounds checking. Which means any Spin code running on any COG can mess with any other Spin code (or data) running on any other COG on the Propeller.

    Spin of course, like C, is not a Turing machine. It has no formal definition and many undefined behaviours. As such it is not amenable to analysis as to how one object can interfere with another object.
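
    As a toy illustration (ordinary C standing in for shared hub memory, nothing Propeller specific): two "drivers" are each given their own region of one shared array, and a small bug in the first silently corrupts the second one's data in the shared area:

        #include <stdio.h>

        /* Pretend this array is HUB RAM, shared by every COG. */
        static int hub[16];

        /* "Driver A" owns hub[0..7], "driver B" owns hub[8..15]. */
        static void driver_a_fill(int count)
        {
            for (int i = 0; i < count; i++)      /* no bounds check, as in PASM or Spin */
                hub[i] = 0xAAAA;
        }

        static void driver_b_init(void)
        {
            for (int i = 8; i < 16; i++)
                hub[i] = i;                      /* B's private state, or so it thinks */
        }

        int main(void)
        {
            driver_b_init();
            driver_a_fill(12);                   /* bug: writes 4 entries into B's area */
            printf("hub[8] = 0x%X (B expected 8)\n", hub[8]);
            return 0;
        }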

  • If you look at current real-world examples of "safety" MCUs like the Hercules line from TI, they do have hardware memory protection along with a host of other silicon to ensure high reliability.

    http://www.ti.com/lit/ug/spnu552a/spnu552a.pdf

  • Heater. Posts: 21,230
    edited 2015-08-16 20:18
    rod,

    Interesting.

    Ten pages into that document and I have yet to see how or what is safe about that MCU. I don't have the patience to read it all.

    That does of course expand the whole "safety" idea into a world far beyond the simple process isolation discussion here. In the world of safety critical systems there are a billion other things to think about: What happens if the power supply is noisy? Or there is a brown out? What happens when a RAM bit gets flipped by a cosmic ray? What happens when a transistor fails in your CPU? What happens when an essential software process does actually have a bug that causes it to make an illegal memory access? What happens when the temperature is very high or very low?

    I'm sure the TI chip has many features to address those issues.

    In the world of avionics systems I have dealt with, safety means having multiple redundant systems with their own independent power supplies. Never mind process isolation, they are processes on different machines!

    TI may well have hardware memory protection "to ensure hi-reliability". I am midway through an argument here to demonstrate that it is not required. It does not even help ensure your software actually works as intended.
  • jmg Posts: 15,148
    msrobots wrote: »
    As far as I understand @m00tykins wants an MMU for process isolation.

    But process isolation is given by the fact that one cog cannot access the memory of another cog.

    But CODE runs on a COG. Perfectly isolated from every other COG.

    Even that is conditional, with caveats, as COGS can change PLL settings and PINs are not isolated, so it is still possible to have system-level failures.

    That said, it may be possible to have a safety simulator that checks binaries for access to critical resources, but self-modifying code can subvert that.
    Checking for HUB collision is harder. Index bounds can be prescribed, but COGs often want to share memory areas, albeit in a carefully agreed manner.
    With a common config file and special global analysis tools, you could report disallowed overlaps, but that is a long way from running an unknown program.
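
    A rough sketch of what the global-analysis part could look like (hypothetical region table and names, no real toolchain implied): each object declares the hub region it intends to use, and the tool does a pairwise interval-overlap check:

        #include <stdio.h>

        /* Hypothetical: each object declares the HUB region it intends to use. */
        struct region {
            const char *owner;
            unsigned    start;    /* first byte of the region      */
            unsigned    size;     /* length of the region in bytes */
        };

        static const struct region regions[] = {
            { "serial_driver", 0x0000, 0x0100 },
            { "vga_driver",    0x0100, 0x0800 },
            { "app_buffers",   0x0400, 0x0200 },    /* oops: overlaps vga_driver */
        };

        static int overlap(const struct region *a, const struct region *b)
        {
            return a->start < b->start + b->size &&
                   b->start < a->start + a->size;
        }

        int main(void)
        {
            const int n = sizeof regions / sizeof regions[0];
            for (int i = 0; i < n; i++)
                for (int j = i + 1; j < n; j++)
                    if (overlap(&regions[i], &regions[j]))
                        printf("disallowed overlap: %s and %s\n",
                               regions[i].owner, regions[j].owner);
            return 0;
        }

    Deliberately shared regions would have to be whitelisted, and as noted it says nothing about an unknown binary that never declared its regions in the first place.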
  • Heater. Posts: 21,230
    jmg,

    Yes, pins, PLL settings and locks are ways a COG can subvert other processes on the Prop. Heck, any COG can start any other with new code. Can anyone think of more ways?

    All in all this MMU process isolation idea is doomed to failure without taking a lot of other things into account.

    So far I am going to limit my arguments to the simple case of program "X", program "Y" and operating/run-time system "O" that can run together safely. Clearly pins, PLLs, locks, cogstarts etc. can be abstracted away into memory accesses, so if we can deal with the memory problem we are in good shape.

    More in part II of my story...

  • Heater. wrote: »
    msrobots,

    Yes, the whole discussion is about process isolation.

    You are correct. Code executing entirely within a COG cannot possibly interfere with code executing in any other COG. They have their own private memory spaces.

    That is fine until your programs want to use any HUB RAM. Which is pretty much essential to make a useful system.

    At that point any COG can mess with any location in HUB and therefore mess with any data any other COG is using in HUB.

    In the P1 we have LMM mode of execution, in the PII we get that baked into silicon. So now not only can a COG mess with another COG's data, it can change the code it is running as well.

    There is no process isolation.

    Certainly Spin programs are working in a shared memory space. Spin has the @ operator and no array bounds checking. Which means any Spin code running on any COG can mess with any other Spin code (or data) running on any other COG on the Propeller.

    Spin of course, like C, is not a Turing machine. It has no formal definition and many undefined behaviours. As such it is not amenable to analysis as to how one object can interfere with another object.

    In my example I stated that hub RAM may be compared with, say, a hard drive on a computer with processes isolated by an MMU.

    It is a shared resource.

    Even if you are able to use an MMU to separate processes you will need some shared resource. Like you said, it is essential to make a useful system.

    Let's put it into some perspective. We are talking about a microcontroller, not a full-grown computer system. What we have now is complete process isolation for 8 processes running in their own memory. Quite neat. (Already on P1)

    Like with an MMU, the COG programs have a simple memory model starting at zero.

    I even think that COG memory is safer and provides more separation than a programmable MMU. (The OS can do BS.)

    HubExec changes all of this, but for critical processes you may not need to use it.


    Mike

  • Heater. Posts: 21,230
    msrobots
    What we have now is complete process isolation for 8 processes running in their own memory. Quite neat. (Already on P1)
    No, we do not. Not in any way, shape or form.

    We are all used to building Propeller programs by mixing and matching objects from here and there, significantly from OBEX, to achieve our goal.

    If one of those objects, running from its own COG, happens to randomly write to some random location in HUB then you are going to have an interesting time figuring out why your program does not run reliably.

    Admittedly my PASM code running entirely in a COG is immune to abuse. If we ignore the fact that it can be killed at any time by any other COG. Or perhaps confused by activity on the I/O pins.

    The argument that HUB RAM is just an I/O device for a COG, like a hard drive, is a nice way to look at things but again there is no control or arbitration over that.

    Yes, I do agree we are talking a micro-controller here and an MMU is redundant.

    My objective here, and my debate with m00tykins, is to show that an MMU in hardware is always redundant!

  • Heater. wrote: »
    rod,

    Interesting.

    Ten pages into that document and I have yet to see how or what is safe about that MCU. I don't have the patience to read it all.

    That does of course expand the whole "safety" idea into a world far beyond the simple process isolation discussion here. In the world of safety critical systems there are a billion other things to think about: What happens if the power supply is noisy? Or there is a brown out? What happens when a RAM bit gets flipped by a cosmic ray? What happens when a transistor fails in your CPU? What happens when an essential software process does actually have a bug that causes it to make an illegal memory access? What happens when the temperature is very high or very low?

    I'm sure the TI chip has many features to address those issues.

    In the world of avionics systems I have dealt with, safety means having multiple redundant systems with their own independent power supplies. Never mind process isolation, they are processes on different machines!

    TI may well have hardware memory protection "to ensure hi-reliability". I am midway through an argument here to demonstrate that it is not required. It does not even help ensure your software actually works as intended.

    As far as software working as intended - you do something called testing and validation to make sure it does work as expected under a wide variety of circumstances. You know the drill.

    That said, your criticism of the TI gear applies equally if not more to the Prop, which is quite long in the tooth and was never intended for safety-critical systems. So what's yer point?

    BTW Hercules is not aimed at military aviation or rad-hard space systems, but thanks for bringing it up. I doubt TI is going to get it flight certified any more than Chip and Ken are going to get the Prop certified. It's a non-sequitur.

    Those markets are very small and belong to specialized vendors.

    I just brought the Hercules up as an example of what some of the big dogs are doing. It's certainly an I/O beast but it's not aimed at hobbyists.

    BTW I have also dealt with high-rel systems - military aviation, 10 years all told and 5 on the data acquisition end. I'm quite familiar with the avionics suites of certain airborne platforms. I know about the isolation and redundancies in them. FWIW I started with the old Mil-Spec 1750A processors with core memory (IBM AP-101 series, then 102).


  • Heater. Posts: 21,230
    rod1963,
    ...what's yer point?
    I did not actually offer any criticism of the TI part. I went back and had another look at the Hercules document. It certainly is loaded with hardware diagnostics, error detection and similar features. Just what you want in a safety critical application. I have never worked with a Hercules, so no criticism from me.

    My point, such as it is, is that we can take many different inferences from the word "safe". Devices like the Hercules are clearly aimed at one application area of safety critical systems. Those avionics systems I worked on are old school now, a lot less integrated. The Boeing 777 PFCs, for example, use Intel 486, Motorola 68000 and AMD 29K. There was a lot of input checking and cross-referencing going on in external ASICs and micro-controllers. All of that was triple redundant, with a 4th box on standby for use at the pilot's discretion. The 777 can be flown, not so easily, with analogue electronic control should all that digital stuff go AWOL. Just hit the kill switch in the roof of the cockpit!

    I don't recall any of those systems making use of an MMU. That would just add complication and further unprovability to a job already done by the languages and run-times used. Ada and Lucol mainly.

    Which brings us to the other use of the word "safe". The far simpler problem raised by m00tykins: that in a general-purpose, non-safety-critical operating system we may want to load and run code that we cannot trust, "safe" in the knowledge that our machine will not allow that code to bring down, interfere with or snoop on any other code we may have running at the time.

    The claim is we need a hardware MMU in order to make that possible. My claim is that we don't.

    More on that later...



  • Hey guys, Heater,

    Sorry to get back to you so late; honestly I haven't even read the entire thread yet. I've been terribly busy at work and getting home too late at night to get back to this thread and make a detailed response (even now I need to leave soon haha). But, yes, I do agree so far, Heater, and I'm really interested in hearing the rest of your argument. Please continue. :)
  • Ok, I actually managed to get enough time to make a decent post :P
    Heater. wrote: »
    m00tykins,

    Point is, to consider C/C++ and such languages as Turing Machines one has to include the entire environment they run in. All that memory and stack they may happen to use, intended by the programmer or not. One has to include the instruction set of the machine running the program; after all, a C program may well try and execute some code at any arbitrary address. None of which is analyzable from just looking at the C source alone.

    For this reason one has to wrap an MMU around them, to contain the problem.

    I actually think that, due to the above observations, anyone attempting to formally prove the operation of a C program, like that seL4 high-assurance kernel project, is either nuts or a charlatan.

    Actually, the formal verification process for seL4 was entirely math-based, meaning that the code is only verified to have no *semantic* errors. Basically the code is compared with an idealized mathematical model of the system and is proven to implement the specification, but of course there are countless other ways such a system could fail. To quote the seL4 website:

    "The security proofs state that if the kernel is configured according to the proof assumptions and further hardware assumptions are met, this specification (and with it the seL4 kernel implementation) enforces a number of strong security properties: integrity, confidentiality, and availability.

    There may still be unexpected features in the specification and one or more of the assumptions may not apply."

    So yes, you are right: although seL4 may be bug-free, the implementation will not be. Basically I completely agree with you that no system can currently be made bug-free, just pointing out a technicality.

    Heater. wrote: »
    I don't recall any of those systems making use of an MMU. That would just add complication and further unprovability to a job already done by the languages and run-times used. Ada and Lucol mainly.

    I totally agree here too. In this case, all code is trusted, so an MMU is unnecessary complexity.

  • Ale Posts: 2,363
    edited 2015-08-27 16:36
    To isolate processes we only need the protection part of the MMU, not the remapping of addresses. Even in a monotask system, ensuring that the program does not overwrite memory is not enough: what if the processor is glitchy? Say ADD does not always return the right result? You do not even need a badly/incompletely/wrongly coded program/algorithm! :)
    The problem is to know exactly what requirements you have to fulfill and not to want something with vaguely defined boundaries.
  • Heater. Posts: 21,230
    Ale,

    My gut feeling is that if you are prepared to accept that your processor may be "glitchy", basically has a probability of failing to produce a correct result, then you have to accept that any hardware you add to check memory bounds or any other odd behavior can also be "glitchy". It's all on the same chip and built from the same transistors, right?

    That means your memory protection feature can glitch and cause failure of an otherwise perfectly running process.

    That is to say, by adding such checking you have increased the probability of failure rather than reducing it!

    You are right, the requirements for safety features depend on your exact requirements for fault tolerance, accuracy of results, availability and so on.

  • jmg Posts: 15,148
    Heater. wrote: »
    My gut feeling is that if you are prepared to accept that your processor may be "glitchy", basically has a probability of failing to produce a correct result, then you have to accept that any hardware you add to check memory bounds or any other odd behavior can also be "glitchy". It's all on the same chip and built from the same transistors, right?

    Not quite correct - there are soft errors in SRAM that are not the same as 'logic' errors, which means you can get a gain in reliability with (extra) code (or HW) that confirms SRAM integrity.

    In a P2, interrupt-driven COP/watchdog type code could check the code area for validity.
    Of course, self-modifying code is a wrinkle in this, but less of that is needed in P2, right?
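
    Roughly what such a COP/watchdog style check could look like (a plain C sketch; the code-area address, size and reset hook are made-up placeholders, not any real P2 API): record a checksum of the code area at startup, then have a periodic task verify it and force a reset if it has changed:

        #include <stdint.h>
        #include <stddef.h>

        /* Hypothetical code area in shared RAM, and a hypothetical reset hook. */
        #define CODE_BASE   ((const uint32_t *)0x1000)
        #define CODE_LONGS  1024u

        extern void request_system_reset(void);      /* assumed, platform specific */

        static uint32_t code_checksum_ref;

        /* Simple additive checksum; a CRC would catch more corruption patterns. */
        static uint32_t checksum(const uint32_t *p, size_t n)
        {
            uint32_t sum = 0;
            while (n--)
                sum += *p++;
            return sum;
        }

        void cop_init(void)
        {
            code_checksum_ref = checksum(CODE_BASE, CODE_LONGS);
        }

        /* Called periodically, e.g. from a timer event or a spare COG. */
        void cop_check(void)
        {
            if (checksum(CODE_BASE, CODE_LONGS) != code_checksum_ref)
                request_system_reset();               /* code area no longer valid */
        }

    Any legitimately self-modifying code would defeat, or have to be excluded from, a check like this.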

  • Heater. Posts: 21,230
    jmg,

    You may have to elaborate on that. What is a "soft error" in RAM and how is it different from a "glitchy" processor? They all sound like random errors to me.

    Let's say a "soft error" in RAM provides a wrong bit when the processor reads it. Before and after that the RAM is correct. Now our processor calculates an addition wrongly, which is used for an address which is now out of bounds, and our MMU catches that fault.

    Sounds great. Disaster averted. Provided of course you did not actually need the result of whatever calculation it was that failed and was aborted. Or provided you have time to run it again and hope that it succeeds.

    BUT given that the RAM and CPU and MMU are all built out of the same technology on the same chip, all of their transistors have an equal probability of failing. The more transistors you add the more probable something is going to fail.

    Ergo, adding an MMU increases the risk of failure rather than reducing it.

  • jmg Posts: 15,148
    Heater. wrote: »
    jmg,

    You may have to elaborate on that. What is a "soft error" in RAM and how is it different from a "glitchy" processor? They all sound like random errors to me.

    Here is a starting point
    https://en.wikipedia.org/wiki/Soft_error

    FPGAs have the config memory deliberately larger and slower, to mitigate this.

    I know some industry sectors that use power removal watchdogs, as the ultimate in error recovery. Seems on many devices these days, reset is more a reset request, and many errors need power removal.
  • Ale Posts: 2,363
    Some mechanisms, like end-to-end ECC, can help to mitigate delivery problems like Heater says. ECC in RAM can help with those random cosmic rays that flip a bit. There are dual-CPU processors with the second processor being used to check on the main processor's results... How far do you need to go? is the real question. But to assure reliability, an MMU/MPU is just one of the mechanisms.
    You can have multitasking systems without memory protection and bounds checks. Each "technology" solves one problem.
    For instance, ASIL D systems have two separate processors doing the same work and checking up on each other; results not in range trigger a reset...
  • Heater. Posts: 21,230
    We know what a "soft error" is. It's that cosmic ray flipping a bit in RAM, it's that EMI corrupting received data with noise spikes, and so on. It is that randomly occurring error that can be recovered from, as opposed to a permanent fault condition like a blown transistor.

    The proposal was that a processor can be "glitchy", i.e. subject to such random soft errors, and that this was a reason to need an MMU. My statement is simply that if your device has a CPU and memory and other required features, and those are all built on the same chip with the same technology, then adding more of the same transistors to detect those errors actually increases the probability of failure. It's a simple probability calculation (see the little sketch at the end of this post). Think of your processing as throwing dice: every time a six comes up it is a failure. Obviously throwing two dice at a time increases the probability that you see a six come up. Add more dice and it gets worse. Adding that MMU is like adding more dice to this game.

    Which is not to say adding such error checking is not useful, even if adding the error checking increases the chance of failure. If you can detect the error and abort the process at least you don't produce a wrong result, which is good, and you have the chance of running it again. Think of it like people adding up lists of numbers. If one person does it there is a chance of a mistake. If two people do it and cross-check there is twice the chance of a mistake, but the cross-check catches it and they do it again.

    You are suggesting FPGAs have a different "larger slower" memory to increase reliability. That changes the picture somewhat. Different technology, different failure rates. Mix and match them to get the overall lowest failure rate.

    Certainly watchdogs that remove power have been around for ages. Why should I assume the reset circuitry itself does not need a reset?

    That made me chuckle; I have a PC here where occasionally USB stops working. No amount of rebooting will get it going again. It requires a total power down to un-stick it.
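
    The dice argument in numbers (a trivial sketch with made-up figures): if each block of logic has the same small, independent probability p of glitching in a given interval, the chance that something glitches grows with every block you add:

        #include <stdio.h>
        #include <math.h>

        /* P(at least one failure) = 1 - (1 - p)^n for n independent blocks. */
        static double p_any_failure(double p, int n)
        {
            return 1.0 - pow(1.0 - p, n);
        }

        int main(void)
        {
            const double p = 1e-6;     /* made-up per-block glitch probability */
            printf("CPU only        : %.6e\n", p_any_failure(p, 1));
            printf("CPU + RAM       : %.6e\n", p_any_failure(p, 2));
            printf("CPU + RAM + MMU : %.6e\n", p_any_failure(p, 3));
            return 0;
        }

    More hardware means more chances for something to fail; the win only comes if the extra hardware turns silent wrong answers into detected, recoverable ones, which is the trade-off described above.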

  • evanh Posts: 15,192
    edited 2015-08-28 12:18
    Extensive integration of ECC is very effective, but that's got zippo to do with MMUs.

    The MMU's sole reason for being is to isolate foreign programs from each other:
    - Programs that are intended to be vying for resources, without knowledge of their competition, on complex OSes that themselves contain many competing programs.
    - Programs that are written and delivered by unknown agents.
    - Programs that can be hostile, whether by design or ignorance.

    A single tailored application like what microcontrollers are used for has no need of an MMU. The only reason MMUs are in vogue is because of the large OS eco-systems that have evolved to use them ... and it's become cheap to throw the kitchen sink at every application under the sun.
  • evanh Posts: 15,192
    Heater. wrote: »
    You are suggesting FPGAs have a different "larger slower" memory to increase reliability. That changes the picture somewhat. Different technology, different failure rates. Mix and match them to get the overall lowest failure rate.

    I doubt that's generic practice, just in the more rugged implementations I'd think. Unless it's done in some multilayer trick that takes no extra space, every oversized config cell is eating valuable real estate. Not only that, but some FPGAs allow config space to become selectable distributed block RAM.
  • Heater. Posts: 21,230
    Arguably an MMU can protect you from your own buggy program. Program steps out of bounds, it gets aborted. This at least saves you from continuing with incorrect data and producing wrong results.

    Question is, then what? Halt the machine? Reboot it automatically? This rather depends on your application requirements. A reset is often acceptable; that is why we have watchdogs that do that.
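
    For what it's worth, the "steps out of bounds, it gets aborted" behaviour can also be had in software, with no MMU at all, if every shared-memory access goes through a checked accessor. A minimal sketch (hypothetical names, and of course it only helps if all code actually uses the accessor):

        #include <stdint.h>
        #include <stdio.h>
        #include <stdlib.h>

        #define HUB_SIZE 1024u

        static uint8_t hub[HUB_SIZE];

        /* Every access to shared memory goes through this check. On a real
           system "abort" might mean killing the offending task or resetting. */
        static uint8_t *hub_at(unsigned addr)
        {
            if (addr >= HUB_SIZE) {
                fprintf(stderr, "out-of-bounds hub access at %u, aborting\n", addr);
                exit(EXIT_FAILURE);
            }
            return &hub[addr];
        }

        int main(void)
        {
            *hub_at(10) = 42;          /* fine */
            printf("%d\n", *hub_at(10));
            *hub_at(5000) = 1;         /* caught and aborted, no silent corruption */
            return 0;
        }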
  • evanh Posts: 15,192
    edited 2015-08-29 23:46
    That's not a soft error. That would be a hostile program.

    The great part about ECC is not that it can detect errors but that it can correct them on the fly with zero performance penalty.
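
    For anyone curious what "correct them on the fly" looks like, here is a toy Hamming(7,4) single-error-correcting code in C (an illustrative sketch only; real memory ECC is wider, typically SECDED over 32- or 64-bit words, and done in hardware):

        #include <stdio.h>
        #include <stdint.h>

        /* Hamming(7,4): 4 data bits -> 7-bit codeword, corrects any single flipped bit.
           Positions 1..7, parity at 1,2,4, data at 3,5,6,7 (stored in bits 0..6). */

        static unsigned bit(unsigned v, unsigned pos)        /* pos is 1-based */
        {
            return (v >> (pos - 1)) & 1u;
        }

        static uint8_t encode(uint8_t data)                  /* data in low 4 bits */
        {
            unsigned d0 = data & 1, d1 = (data >> 1) & 1,
                     d2 = (data >> 2) & 1, d3 = (data >> 3) & 1;
            uint8_t c = 0;
            c |= d0 << 2; c |= d1 << 4; c |= d2 << 5; c |= d3 << 6;   /* pos 3,5,6,7 */
            c |= (d0 ^ d1 ^ d3) << 0;                                 /* parity, pos 1 */
            c |= (d0 ^ d2 ^ d3) << 1;                                 /* parity, pos 2 */
            c |= (d1 ^ d2 ^ d3) << 3;                                 /* parity, pos 4 */
            return c;
        }

        static uint8_t decode(uint8_t c)                      /* returns 4 data bits */
        {
            unsigned s = 0;
            s |= (bit(c,1) ^ bit(c,3) ^ bit(c,5) ^ bit(c,7)) << 0;
            s |= (bit(c,2) ^ bit(c,3) ^ bit(c,6) ^ bit(c,7)) << 1;
            s |= (bit(c,4) ^ bit(c,5) ^ bit(c,6) ^ bit(c,7)) << 2;
            if (s != 0)
                c ^= 1u << (s - 1);                           /* flip the bad bit back */
            return (uint8_t)(bit(c,3) | bit(c,5) << 1 | bit(c,6) << 2 | bit(c,7) << 3);
        }

        int main(void)
        {
            uint8_t word = 0xB;                   /* 4-bit value to protect */
            uint8_t code = encode(word);
            uint8_t hit  = code ^ (1u << 5);      /* a "cosmic ray" flips position 6 */
            printf("sent %X, got back %X after correction\n", word, decode(hit));
            return 0;
        }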
  • evanh Posts: 15,192
    Well, I guess maybe I shouldn't say zero performance penalty. There is the built-in ECC redundancy that always consumes something.