The New 16-Cog, 512KB, 64 analog I/O Propeller Chip

evanh · 2015-07-19 14:57

Smile, I'm now totally against supporting interrupts from pin inputs. The only reason I suggested Smartpins in the first place was to make use of the counters there. But that hasn't panned out.

Seairth · 2015-07-19 16:12

Ugh. You know, the more scenarios I consider with the current interrupt implementation, the more I think we've taken it too far. Here are a couple of reasons why:
Suppose both the pin and timer interrupt occur "at the same time". The current rule is that pin interrupts are higher priority than timer interrupts, so the pin interrupt would be serviced first. What this really means is that LINK $1F3 will get inserted into the pipeline first. But that turns out to be wrong. In this case, LINK $1F5 should be inserted first. Why? Because if LINK $1F3 is inserted first, then LINK $1F5 will be inserted on the very next instruction cycle. This will result in the timer ISR interrupting the pin ISR, which means the effective priorities have reversed. On the other hand, if you are using auto-ILOCK, then you do want to insert LINK $1F3 first, which will then delay the insertion of LINK $1F5.
So, now the LINK-insertion order must be dependent on which auto-ILOCK bits are set. And users must also be able to keep track of this! This is the exact sort of thing that makes "traditional" interrupt architectures such a bother to use.
Then I was thinking about the whole INTMASK thing. Earlier, I had suggested that maybe each mask should be 2 bits, considering the fact that sometimes you need to delay another ISR and other times you need to suppress it altogether. But this made me realize that you may want different behaviors at different times. In other words, when in an XFER ISR, you want one mask, but in TIMER ISR, you want a different mask. Now, this would mean that an instruction like INTEDGE would need to be 8 bits: 2 bits for edge configuration and 6 bits for mask configuration (3x2 bits for each interrupt).
Again, this starts to get really complicated and difficult to track! Even if you went back to a 1-bit mask, you would still need 3 bits per configuration.
In the end, all of this has come about because the architecture now allows simultaneous interrupts. It makes me think that the prior approach, where a cog can have only one interrupt mode active at a time, was the best balance of simplicity and flexibility.
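The ordering problem described above can be sketched with a toy model. This is only an illustration under stated assumptions, not the actual cog logic: it assumes one LINK can be inserted per instruction slot, that an unlocked ISR is preempted by the very next LINK, and that auto-ILOCK holds off further LINKs until the ISR returns. The function name `run` and the labels "pin" and "timer" are made up for the example.

```python
def run(insert_order, auto_ilock):
    """Return the order in which ISRs finish for a given LINK insertion order."""
    pending = list(insert_order)  # LINKs waiting to be inserted, first one first
    stack = []                    # nested ISRs, innermost last
    finished = []
    locked = False
    while pending or stack:
        if pending and not locked:
            # A LINK is inserted: the ISR is entered, preempting whatever ran before.
            stack.append(pending.pop(0))
            locked = auto_ilock   # auto-ILOCK blocks further LINKs until return
            continue
        # No LINK can be inserted, so the innermost ISR runs to its return.
        finished.append(stack.pop())
        locked = False            # the return performs the IFREE
    return finished


# Pin LINK inserted first, no locking: the timer ISR nests inside and finishes first.
reversed_case = run(["pin", "timer"], auto_ilock=False)
# Same insertion order with auto-ILOCK: the pin ISR completes before the timer ISR starts.
locked_case = run(["pin", "timer"], auto_ilock=True)
```

In this model the insertion order that gives the pin ISR effective priority flips depending on whether auto-ILOCK is set, which is the dependency the post objects to.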

potatohead · 2015-07-19 16:31

A lot of that complexity goes away if they lock on entry and free on exit no matter what. Then the priority is clear.

It's not going to work for as many use cases, but it will be clear as to what is going on, and given the number of COGS, maybe that's OK. There are other ways to solve problems.

And if that's a hard line, people will design for it too, which improves on it all over time. This idea got applied to P1, and people did design for it to great success. Why not apply it here too?

Really fleshing out those cases is where the complexity comes from, and it's necessary on devices that do not have concurrent, multi-processing on board. That's understandable. P2 doesn't need to do that.

Having one at a time is golden, agreed. I struggle with flexibility though. Using two of them may make great sense, and if the conflict is interrupts interrupting interrupts, let's just keep that out of the equation.

Locking / freeing no matter what makes sure they happen in the order specified and in a predictable way.

I think that's best.

For cases where it needs to be a bit more than that, use multiple COGS, or go back to tried and true Propeller methods well known on P1.

I submit always locking and freeing is the best balance.

I'm neutral on pin interrupts. I see it as enabling advanced comms vs robust operation. What is worth what?

Sapphire · 2015-07-19 16:40

I got the new system all implemented and it seems to work fine. Oh, and the compiled design got smaller by 30 LE's! This system is quite a bit simpler on the inside. It was way easier to think about.
Here are the new instructions:
INTPER D/#   set period for timer interrupt
INTPIN D/#   set pin for edge interrupt
INTMODE D/#  set interrupt mode
INTMASK D/#  set interrupt mask via d[0] - IFREE/ILOCK

mode %LSS_LPP_LTT

%LSS = streamer interrupt, issues 'LINK $1F0,$1F1 WC,WZ'
  L: 1 = ILOCK on interrupt and IFREE on 'LINK D,$1F0 WC WZ'
  SS: x0 = disable, 01=rollover, 11=block wrap
%LPP = pin edge interrupt, issues 'LINK $1F2,$1F3 WC,WZ'
  L: 1 = ILOCK on interrupt and IFREE on 'LINK D,$1F2 WC,WZ'
  PP: 00 = disable, 01=any edge, 10=pos edge, 11=neg edge
%LTT = timer interrupt, issues 'LINK $1F4,$1F5 WC,WZ'
  L: 1 = ILOCK on interrupt and IFREE on 'LINK D,$1F4 WC,WZ'
  TT: x0 = disable, 01=initialize and enable, 11=enable
Any reason why the pin edge interrupt %LPP bits couldn't be more logical?
PP: 00 = disable, 01=neg edge, 10=pos edge, 11=any edge
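For anyone trying to keep the %LSS_LPP_LTT layout straight, here is a small decoder sketch. It assumes the mode value is 9 bits with the streamer field in bits 8..6, the pin field in bits 5..3 and the timer field in bits 2..0, which is how the binary literal reads left to right; that bit placement and the name `decode_intmode` are assumptions for illustration. The PP table uses the encoding as originally posted, not the reordering suggested above.

```python
# Field encodings as listed in the post ("x0" means either 00 or 10).
SS = {0b00: "disable", 0b10: "disable", 0b01: "rollover", 0b11: "block wrap"}
PP = {0b00: "disable", 0b01: "any edge", 0b10: "pos edge", 0b11: "neg edge"}
TT = {0b00: "disable", 0b10: "disable", 0b01: "initialize and enable", 0b11: "enable"}


def decode_intmode(mode):
    """Split a 9-bit %LSS_LPP_LTT value into its three interrupt fields."""
    def field(shift, names):
        bits = (mode >> shift) & 0b111
        # Top bit of each 3-bit field is L (auto ILOCK/IFREE), low two bits select the mode.
        return {"auto_ilock": bool(bits & 0b100), "mode": names[bits & 0b11]}

    return {
        "streamer": field(6, SS),
        "pin": field(3, PP),
        "timer": field(0, TT),
    }


# Example: streamer off, pin on positive edge with auto-ILOCK, timer initialize and enable.
example = decode_intmode(0b000_110_001)
```

Swapping the entries in `PP` is all it would take to model the proposed "11 = any edge" ordering.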

potatohead · 2015-07-19 16:51

Well, we could get totally pedantic and discuss the high bit being the sign bit:

PP: 00 = disable, 01=pos edge, 10=neg edge, 11=any edge

Good observation! I'm not really serious about the above, BTW.

Seairth · 2015-07-19 17:41

Well, we could get totally pedantic and discuss the high bit being the sign bit:

PP: 00 = disable, 01=pos edge, 10=neg edge, 11=any edge

Good observation! I'm not really serious about the above, BTW.

Either of those should work. That makes those 2 bits an OR truth table for the edge transitions (similar to way that instruction predicates work).
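A quick sketch of the "OR truth table" reading, assuming the 01=pos edge, 10=neg edge variant quoted just above (swap the two masks for the other variant). The function name `edge_fires` and the use of 0/1 pin samples are assumptions for the example, not anything from the hardware.

```python
def edge_fires(pp, prev, cur):
    """Return True if a pin going from prev to cur triggers under the 2-bit PP setting."""
    rising = (prev, cur) == (0, 1)
    falling = (prev, cur) == (1, 0)
    # Each PP bit enables one transition; the result is the OR of the enabled ones.
    return bool((pp & 0b01 and rising) or (pp & 0b10 and falling))
```

With this reading %00 naturally means "disable" and %11 naturally means "any edge", with no special cases.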

Seairth · 2015-07-19 18:14

Supposing we really want to hold on to multiple interrupts, here might be another approach. I haven't fully thought this out, but I'm throwing it out there anyhow.
INTXFER D/# - enable/disable XFER interrupts
INTEDGE D/# - enable/disable EDGE interrupts
INTIMER D/# - enable/disable TIMER interrupts
(note: Instead of auto-ILOCK, there is a 2-bit field for INTCLR-on-JMP. Details below.)
ILOCK - delays LINK insertion, regardless of interrupt priority.
IFREE - disables ILOCK
INTCLR D/# - clear the highest-priority interrupt flag, where D/# is:
  %00 : NOP
  %01 : clears pending flag
  %10 : clears active flag
  %11 : clears both pending and active flags
With these instructions in mind, consider the following rules:
* An interrupt state is internally represented by two bits:
  00: Not active
  -1: Pending (waiting to insert a LINK-to-ISR)
  1-: Active (LINK has been inserted)
* An interrupt state transitions as follows: 00 -> 01 -> 10. If another interrupt condition occurs while the state is 10, it would transition to 11. INTCLR (and INTCLR-on-JMP) determines whether the final transition will be 00 (INTCLR %11) or 01 (INTCLR %10).
* Interrupt priority, from high to low, is: XFER, PIN, TIMER.
* When two interrupts trigger at the same time, the lower-priority ISR is LINKed first (see prior comment why this is necessary).
* A higher-priority interrupt can always interrupt a lower-priority interrupt, unless the lower-priority ISR has called ILOCK.
* In all cases, a given interrupt's active flag is not cleared until INTCLR is called with the appropriate mask (or INTCLR-on-JMP is performed). This means that an ISR cannot be accidentally re-entered.
* INTCLR only clears the highest-priority interrupt. Since a lower-priority ISR cannot be executing when a higher-priority ISR is active, it is not possible for an ISR to affect interrupt state of an unrelated interrupt.
So, with the above specification, I think it might be possible to support multiple interrupts in a simple, straight-forward manner.
One observation (though I am sure more will follow):
* INTEDGE might not need a delay with this approach. Since you are guaranteed to not have a spurious edge interrupt while the EDGE ISR is active, the only concern is with spurious interrupts that occur after the ISR has completed. In this case, one approach would be for the ISR to track the duration between interrupts and simply ignore interrupts that are "too soon".
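The two-bit state and the INTCLR masks in the rules above can be sketched for a single interrupt source like this. It is a rough model of the proposal only; the class name `IntState` and its method names are invented for the example, and priorities and ILOCK are left out.

```python
class IntState:
    """Active/pending flags for one interrupt source, per the proposed rules."""

    def __init__(self):
        self.active = False
        self.pending = False

    def trigger(self):
        # An interrupt condition occurred: 00 -> 01, or 10 -> 11 while the ISR runs.
        self.pending = True

    def link(self):
        # Insert the LINK-to-ISR only if pending and not already active (no re-entry).
        if self.pending and not self.active:
            self.pending, self.active = False, True
            return True
        return False

    def intclr(self, mask):
        # %01 clears pending, %10 clears active, %11 clears both, %00 is a NOP.
        if mask & 0b01:
            self.pending = False
        if mask & 0b10:
            self.active = False

    def bits(self):
        return f"{int(self.active)}{int(self.pending)}"
```

Walking through it: trigger gives 01, link gives 10, a second trigger gives 11, and then INTCLR %10 leaves 01 (the ISR will run again) while INTCLR %11 leaves 00 (the second event is dropped).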

dnalor · 2015-07-19 19:01

* A higher-priority interrupt can always interrupt a lower-priority interrupt, unless the lower-priority ISR has called ILOCK.
* INTCLR only clears the highest-priority interrupt. Since a lower-priority ISR cannot be executing when a higher-priority ISR is active, it is not possible for an ISR to affect interrupt state of an unrelated interrupt.

* I would make it the other way around. The ISR has to enable that it can be interrupted (so I know it from other microcontrollers).
* INTCLR then should clear the active interrupt.

rod1963 · 2015-07-19 19:15

At first having a limited interrupt capability looked good on a Prop, now it's morphed into something quite complicated.

jmg · 2015-07-19 19:34

mov $1F5*4,#tmrint 'set timer interrupt vector
mov $1F3*4,#pinint 'set pin interrupt vector
jmp $1F4*4 wc,wz 'return
jmp $1F2*4 wc,wz 'return

One compiler question.
Can you add pseudo-instructions to the compiler for those returns?
retit
retip
For simplicity of use and readability of programs.

Of course, all generation of the MOV and JMP opcodes around the vectors should be handled in a clearer way by the assembler. The code above just allows quick testing.

I thought about some nonsensical INTMODE data pattern being used for IFREE/ILOCK, but it just seemed easier to give it a whole instruction. Do you see a nice way to exploit INTMODE for that purpose?

What the user sees matters more to them than how the binary opcode is managed. A better question is which approach is smallest and fastest, as the ASM can present either cleanly to users?

Maybe it would be good to break INTMODE into three different instructions: INTXFER, INTEDGE, INTIMER

At the user source level, sure - clarity is always good and mask patterns are never easy to remember. The alternative is a lot of Mask equates that always have to be included.

Seairth · 2015-07-19 19:35

* A higher-priority interrupt can always interrupt a lower-priority interrupt, unless the lower-priority ISR has called ILOCK.
* INTCLR only clears the highest-priority interrupt. Since a lower-priority ISR cannot be executing when a higher-priority ISR is active, it is not possible for an ISR to affect interrupt state of an unrelated interrupt.

* I would make it the other way around. The ISR has to enable that it can be interrupted (so I know it from other microcontrollers).
* INTCLR then should clear the active interrupt.

I'm not sure about the first point. An ISR can call ILOCK to prevent being interrupted (just as the "main" task can). If you implicitly ILOCK at the beginning of every ISR, then it means that the interrupt priorities only apply when more than one interrupt is pending at the same time. Also, if you intend for higher-priority interrupts to be allowed, you would need to insert an IFREE at the beginning of your ISRs.
Having said all that, I don't know which approach is better. This might be one of those places where "principle of least surprise" should come into effect. In which case, your approach might be "less surprising". I suppose you could also add the auto-ILOCK option back to the configuration, thereby leaving the choice to the programmer.

As for the second point, that's what INTCLR does. You stated it a bit more succinctly, though. Note that INTCLR could also clear the pending bit of the active interrupt.

Seairth · 2015-07-19 19:38

At first having a limited interrupt capability looked good on a Prop, now it's morphed into something quite complicated.

Agreed. I'm fine with going back to an older iteration of the design. My proposal above was mostly an attempt to strike a compromise in case Chip wanted to keep multiple interrupts.

Seairth · 2015-07-19 19:42

Chip,
Does the LINK instruction cause the pipeline to be flushed like JMP does? Assuming this is true, then I foresee an issue with multiple interrupts. If two interrupt LINK instructions are queued back-to-back, the second one will be flushed from the pipeline and lost.

jmg · 2015-07-19 19:59

Do you think we are lacking any helpful masks, or anything? One thing that can be done to mask a source is to just turn it off, and then back on later. It works like a newbie would imagine.
And you can do an INTPER any time. When the timer reloads, it will grab the last-written value.

There are two possible desirable modes on Timer Reload, that could need a control Flag.
a) The queue register is loaded, and it waits for next roll-over.(as above) Good for Modulation/PWM etc
b) The timer is reloaded when queue register is loaded.
Option b) avoids the issue of needing a rapid timer response, say when going from a long watchdog timer set for boot to a shorter timer for run. Here, you want the new time cadence to apply immediately. This also allows quite long times to be set for testing / sleep, but a rapid change of gears on the timer when stuff needs doing.
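The difference between the two reload modes can be shown with a toy down-counter. This is a sketch under assumptions, not the real timer: it assumes the timer counts down by one per tick and fires at zero, and the names `Timer`, `reload_immediately` and `ticks_until_interrupt` are hypothetical.

```python
class Timer:
    """Down-counting timer with a queued period register (period must be > 0)."""

    def __init__(self, period, reload_immediately=False):
        self.queued = period
        self.count = period
        self.reload_immediately = reload_immediately  # False = mode a), True = mode b)

    def intper(self, period):
        # Mode a): only the queue register changes; it takes effect at the next rollover.
        self.queued = period
        if self.reload_immediately:
            # Mode b): the running count is replaced right away.
            self.count = period

    def tick(self):
        # Returns True on the tick where the timer rolls over (interrupt fires).
        self.count -= 1
        if self.count == 0:
            self.count = self.queued
            return True
        return False


def ticks_until_interrupt(timer):
    """Count ticks up to and including the next rollover."""
    n = 1
    while not timer.tick():
        n += 1
    return n
```

Starting from a period of 1000, running 10 ticks and then writing a period of 5, mode a) still waits out the remaining 990 ticks before the new cadence starts, while mode b) fires after 5 ticks.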
I don't think there are any hidden, gotchas. If you do an INTPIN and switch to another pin which happens to be in the opposite state of the prior pin, the edge sensor won't wrongly trigger because it will wait for two clocks to get two new pin samples.
It seems complete, to me, but there may be some thing(s) missing. Timer and streamer interrupts are quite safe, but pin interrupts can potentially bring things to a halt. Do you think we need some hold-off counter to delay when the next pin interrupt is registered? Pin interrupts concern me.

Timers can have an always sticky-bit, which allows short term stretch without missing counts.
A good test code for that would be to trigger PinInt from inside timer, and PinINT (single edge) has a REP opcode on Toggle that lasts for 1.5 Timer Periods. This should skew one timer edge 0.5 periods. Of course a REP that is > 2 Timer periods will start to drop edges.
If the edge-state-engine re-arms on any change, then edge sense is always 'live' and not sticky. That's usually good for Pins, but maybe not so good for Pin-Cell events like Timer/Capture, those are more flagged than edge. I think those pin-cell signals use the same Pin pathway, so that means some method of 'co-operation' is needed between the INT state engine and Pin-Cell to not miss Timer/capture signals from the PinCell.

jmg · 2015-07-19 20:09

At first having a limited interrupt capability looked good on a Prop, now it's morphed into something quite complicated.

Most of the internal state engine handling being discussed here, is invisible to the user.
It is less complicated than it looks - in fact, Chip says it uses less Logic

jmg · 2015-07-19 20:12

* INTEDGE might not need a delay with this approach. Since you are guaranteed to not have a spurious edge interrupt while the EDGE ISR is active, the only concern is with spurious interrupts that occur after the ISR has completed. In this case, one approach would be for the ISR to track the duration between interrupts and simply ignore interrupts that are "too soon".

I think some of that type of house-keeping can be done in the Smart-pins, which have (IIRC) a debounce timer. See also my comments on Sticky and not-sticky flag handling.
Some use-cases need one, some use-cases need the other.
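For the software side of that trade-off, the "ignore interrupts that are too soon" idea from the quoted post could look roughly like the sketch below. It assumes the ISR can read a monotonically increasing timestamp (counter wrap-around is ignored here), and the names `make_edge_isr`, `min_gap` and `handler` are made up for illustration.

```python
def make_edge_isr(min_gap, handler):
    """Build an ISR that drops edges arriving less than min_gap after the last accepted one."""
    last = None

    def isr(now):
        nonlocal last
        if last is not None and now - last < min_gap:
            return False  # too soon after the previous accepted edge: treat as spurious
        last = now
        handler(now)
        return True

    return isr


accepted = []
isr = make_edge_isr(100, accepted.append)
for timestamp in (0, 30, 99, 100, 250):
    isr(timestamp)
```

Here the edges at 30 and 99 are ignored and the ones at 0, 100 and 250 reach the handler. Note the ISR still gets entered for every edge, so this filters events but does not stop a noisy pin from eating cog time, which is where a hardware hold-off or debounce would differ.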

dnalor · 2015-07-19 20:19

* A higher-priority interrupt can always interrupt a lower-priority interrupt, unless the lower-priority ISR has called ILOCK.
* INTCLR only clears the highest-priority interrupt. Since a lower-priority ISR cannot be executing when a higher-priority ISR is active, it is not possible for an ISR to affect interrupt state of an unrelated interrupt.

* I would make it the other way around. The ISR has to enable that it can be interrupted (so I know it from other microcontrollers).
* INTCLR then should clear the active interrupt.

I'm not sure about the first point. An ISR can call ILOCK to prevent being interrupted (just as the "main" task can). If you implicitly ILOCK at the beginning of every ISR, then it means that the interrupt priorities only apply when more than one interrupt is pending at the same time. Also, if you intend for higher-priority interrupts to be allowed, you would need to insert an IFREE at the beginning of your ISRs.
Having said all that, I don't know which approach is better. This might be one of those places where "principle of least surprise" should come into effect. In which case, your approach might be "less surprising". I suppose you could also add the auto-ILOCK option back to the configuration, thereby leaving the choice to the programmer.

As for the second point, that's what INTCLR does. You stated it a bit more succinctly, though. Note that INTCLR could also clear the pending bit of the active interrupt.

In my imagination of the hardware (pipeline?), I would think that it might be easier to implement an auto-ILOCK than to make sure that a lock in the ISR does not come too late (clock cycles?). I am maybe totally wrong.
Yes, you have to insert an IFREE. But then you know that you can be interrupted. No surprise then.
You wrote: "INTCLR only clears the highest-priority interrupt." Maybe I understood something wrong.
"succinctly"
Yes, sorry. Thinking/writing in English is difficult for me. And I am a little lazy ;-)
Yes, I understood:
%01 : clears pending flag
%10 : clears active flag
%11 : clears both pending and active flags
That's logical to me.
At first having a limited interrupt capability looked good on a Prop, now it's morphed into something quite complicated.

You don't have to use it.

jmg · 2015-07-19 20:26

Supposing we really want to hold on to multiple interrupts

That depends on what Multiple means.

I think Chip needs to post some scope examples of multiple interrupts firing, to see how the present design unravels and queues (or skips) multiple requests.
The example given so far, is only one interrupt at a time, but multiple sources.

potatohead · 2015-07-19 21:28

Set up a poly counter in the FPGA and trigger them all that way. Run the counter a bit faster than the chip to simulate odd edge cases.

At the least, we could identify a set of test cases and check those.

Cluso99 · 2015-07-19 22:11

I suggest that Chip leaves what we currently have, warts and all, and move on.

This way we can get an FPGA code sooner, and we can test real time. We can start a new thread to discuss the outcomes which doesn't need to involve chip until we can at least list some common issues.

Does this make sense?

BTW I am happy with what we have, and just don't allow any nested interrupts. This way, the timer is used as I envisaged, a timer to activate if a pin interrupt did not occur within a specific time period.
The timer can also be used for its original purpose (without pin interrupt) for multi tasking.

Dave Hein · 2015-07-19 23:14

P2 Watch: 28 days since the end of Spring.

June                  July
Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa
   -- -- -- -- -- --            1  2  3  4
-- -- -- -- -- -- --   5  6  7  8  9 10 11
-- -- -- -- -- -- --  12 13 14 15 16 17 18
21 22 23 24 25 26 27  19 -- -- -- -- -- --
28 29 30              -- -- -- -- -- --

P2 Day is almost here! It will get here sooner if those darn interrupts would stop.

jmg · 2015-07-19 23:40

Any reason why the pin edge interrupt %LPP bits couldn't be more logical?
PP: 00 = disable, 01=neg edge, 10=pos edge, 11=any edge

Those are usually managed as defined names in an Include file these days, but a purist would say the most logical at a bit-level would be to have
01=pos edge, 10=neg edge
as 01 is _/= and 10 is =\_

Sapphire · 2015-07-20 00:38

The point is I think we can all agree that %11 should be any edge.

mmm · 2015-07-20 02:20

It does not seem to be possible to restrict a cog's access rights. This is unfortunate, since this prevents sandboxed code execution despite the chip's logical partition into independent cores. I'd like to suggest, that an instruction be added to set a cog into a restricted mode, that should work as follows:

(1) Accessible hub memory is transparently reduced to 32 KB and mapped to disjoint slices (16 x 32 = 512, one for each cog, fixed physical mapping but transparent (virtual) addressing).
(2) Access to pins is disabled.
(3) Restricted cogs cannot use the mode setting instruction.

By default, cogs start in unrestricted mode. Any cog's mode can be set by any cog (in unrestricted mode). Transparent hub memory reduction means that only 32 KB of hub memory can be used. Attempts to read from or write to locations above the respective virtually visible 32 KB of hub memory should wrap (or have the same effect that an attempt to read from or write to locations above the normally visible 512 KB in unrestricted mode would have).

Implications are:

(1) Cogs in restricted mode only have physical access to their respective hub slice.
(2) Restricted cogs cannot communicate with each other (unless an unrestricted cog acts as a broker, copying message data between disjoint hub memory slices).
(3) Restricted cogs cannot directly access external memory, because they cannot read from or write to pins.
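The address mapping proposed above can be sketched as follows. This is only an illustration of the idea: it assumes out-of-range accesses wrap (one of the two behaviours the proposal allows), that slices are laid out in cog order, and the names `physical_address`, `SLICE` and `HUB` are hypothetical.

```python
SLICE = 32 * 1024   # bytes visible to a restricted cog
COGS = 16
HUB = SLICE * COGS  # 512 KB of hub memory in total


def physical_address(cog_id, virtual_addr, restricted):
    """Map a cog's hub address to a physical hub address."""
    if not 0 <= cog_id < COGS:
        raise ValueError("cog_id out of range")
    if not restricted:
        # Unrestricted cogs see the whole hub (wrapping at 512 KB is an assumption).
        return virtual_addr % HUB
    # Restricted cogs see only their own fixed 32 KB slice, addressed from 0.
    return cog_id * SLICE + (virtual_addr % SLICE)
```

Since every restricted address lands inside `cog_id * SLICE ... cog_id * SLICE + SLICE - 1`, two restricted cogs can never touch the same byte, which is what implications (1) and (2) rely on.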

I hope that this could easily be implemented, since practically no new functionality has to be added; only a way to restrict functionality needs to be provided. This should add negligible complexity and would allow safe execution of untrusted code and memory protection without necessitating a memory management unit (MMU).

The recent discussion around interrupts shows that there are numerous things that people would like to see in a processor for their individual use cases. If I remember correctly, one interrupt feature iteration has added 2.5 % to cog complexity. I don't know how much additional complexity further iterations of the interrupt feature, which currently seem to be in discussion, would add, but I just wanted to throw in the feature which to me would be most useful and on which I would spend any remaining complexity budget, rather than on any further iteration of an interrupt feature (or even the first one), if those features were mutually exclusive for remaining complexity budget reasons. While I have no useful intuition about the actual complexity, I naively assume that the feature proposed here would add even less than 2.5 % to cog complexity, and hope that it will be considered.

jmg · 2015-07-20 02:46

It does not seem to be possible to restrict a cog's access rights. This is unfortunate, since this prevents sandboxed code execution despite the chip's logical partition into independent cores.

I'm not quite following the case you want to cover here?
By design, COGS are likely to want to share some memory, and that access can be masked in software if you want to exclude access to other memory. All that can be set at compile time.
It could even be a compiler switch. A COG that operates as expected is fully sandboxed.
If you want to protect against a runaway/corrupted code COG, or some deliberate attack, that's a very different ball-game. I could see a use in mapping access to the supported-in-opcode-but-not-physically-present upper 512k of memory*, to one of the Vector Traps (interrupts) - that's not quite an all-bells MMU, but it would give a means to slot-in an off-chip stub, that would be automatic (from user code perspective) and highly flexible.
* Or any memory outside on-chip and the Vector-stub can decide what to do about it, before return.

mmm · 2015-07-20 03:25

Yes, restricted mode would allow protection against erroneous or hostile code. The impact that such code might have could be confined. The proposed scheme is admittedly rather coarse and unsophisticated to keep it simple (I have more wishes for P3). Restricted mode does not prohibit a cog from sharing memory with other cogs, since any unrestricted cog would still be able to access the complete hub address space. Only the direct communication between two cogs that both run in restricted mode would be affected, which is due to the unsophisticated scheme to keep things simple. Restricting restricted cogs completely from accessing pins is also pretty blunt, but a more fine-grained approach would add more complexity. And I don't want to add complexity, I just want strong isolation of untrusted code to be achievable. It should be possible to detect a failed or misbehaving cog and to restart or disable it.

jmg · 2015-07-20 03:55

I can't see truly hostile code being protected - once it gets inside a P2, the P2 cannot distinguish good from bad, and there is a shipload of things bad code could do. (not just memory access)
There are encrypted loaders & ID fuses, to help prevent hostile code from ever getting inside.

Runaway/lost code could need far more tests than just memory access, and that could be done nicely with a timer vector / watchdog, which I see as a significant use of the new features.

Heater. · 2015-07-20 04:06

mmm,
Seems pointless to me. What use case would your proposal satisfy?

Cluso99 · 2015-07-20 04:15

I really don't see the need to restrict a cogs access to pins or memory.
Remember we don't have huge programs and an os running on a single processor. This is not a PC nor is it an ARM.
Some generally available drivers will be pretested and it will be possible to use these just as in the P1. These aren't part of the main program(s) running in other cogs. This all makes for simpler code that shouldn't require sandboxing.

IMHO if there was time and space and power available, I could think of far better things to add that ought to be more general use.

mmm · 2015-07-20 04:37

What shipload of bad things could bad code (either intentionally or unintentionally bad) do outside its restricted cog's memory slice and without access to pins, assuming that the system has been carefully designed to shield against misbehaving subsystems compromising the entire system? Sure, the subsystem might excessively use resources. Periodic health or timeout checks should be performed.

Preventing potentially corrupt code from becoming part of the system is theoretically possible, if you can read all of it and have the budget to formally prove it. If you can't, having means to confine it means, you can at least maintain the integrity of your own code by careful construction. And since the Propeller chip will probably also be used as an educational tool, instructors might find uses for such functionality to teach defensive programming.
