The New 16-Cog, 512KB, 64 analog I/O Propeller Chip

potatohead · 2015-07-20 15:28

Yep. That's gonna happen. It's up to Chip to manage it.

Seairth · 2015-07-20 15:58

Chip,
In the interest of moving forward, I suggest going back to the initial interrupt solution. It may be easiest to leave the design as it currently stands, but I'm concerned that there are edge cases in the current design that we will overlook. It seems much better to reduce/eliminate the edge cases and figure out how to work around limitations, than to risk putting out something that goes too far and requires us to avoid hidden pitfalls.
If you don't want to go back to the earlier design, maybe this approach will work for you:
* Only one interrupt mode is active at a time.
* Include TIMER and XFER interrupts, but maybe leave out the EDGE interrupts for now. There is still WAITxxx and whatever is available via smart pins, so EDGE behavior already has some decent support. TIMER and XFER interrupts, on the other hand, are new capabilities.
* Interrupts have two bit flags: Pending, Lock. When an interrupt condition occurs, Pending is set. At instruction fetch, LINK is inserted for [Pending && !Lock]. When the LINK is inserted, also set Pending to 0 and Lock to 1.
* Interrupts always lock-on-LINK. An ISR must either call INTCLR before returning, or use INTRET instead of JMP.
Here's a matching instruction set:
INTXFER D/# - enable XFER interrupts (D = mode)
INTIMER D/# - enable TIMER interrupts (D = timer length)
INTEDGE D/# - enable EDGE interrupts (D = pin number)
INTOFF - disable interrupts (could be an alias for "INTIMER 0" or "INTEDGE 0" or "INTXFER 0")
INTHOLD - sets the lock flag (only useful in the "main" task)
INTCLR D/# - clear the interrupt flags, where D/# is:
    %00 : NOP
    %01 : clears pending flag
    %10 : clears lock flag
    %11 : clears both flags
INTCLR - alias for "INTCLR %10"
INTRET - performs both "JMP $1F4" and "INTCLR %10"
(note: I left INTEDGE in the instruction list, though I initially suggested leaving it out of the initial P2 release. Since ISRs must clear the Lock bit, and can also clear the pending bit, I don't think there's a need for the pin time-delay from earlier designs.)
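The Pending/Lock behaviour proposed above can be modelled in a few lines. This is only a rough sketch of the proposal as worded in this post, not of any actual hardware; the class and method names (`InterruptFlags`, `event`, `fetch`, `intclr`) are made up for illustration.

```python
class InterruptFlags:
    """Toy model of the proposed Pending/Lock flag pair (hypothetical names)."""

    def __init__(self):
        self.pending = False
        self.lock = False

    def event(self):
        # An interrupt condition occurred: latch it in Pending.
        self.pending = True

    def fetch(self):
        """Called at each instruction fetch; True means a LINK is inserted."""
        if self.pending and not self.lock:
            self.pending = False  # LINK inserted: Pending -> 0
            self.lock = True      # ... and Lock -> 1
            return True
        return False

    def intclr(self, mask):
        # Bit 0 clears Pending, bit 1 clears Lock; %00 does nothing.
        if mask & 0b01:
            self.pending = False
        if mask & 0b10:
            self.lock = False


flags = InterruptFlags()
flags.event()
first = flags.fetch()    # LINK inserted, ISR is now running
flags.event()            # a second event arrives during the ISR
blocked = flags.fetch()  # Lock is set, so no LINK is inserted yet
flags.intclr(0b10)       # ISR ends (what INTRET would do to Lock)
second = flags.fetch()   # the latched event is serviced now
print(first, blocked, second)
```

The point of the sketch is that an event arriving while the ISR runs is held in Pending rather than lost, and is serviced on the first fetch after Lock is cleared.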

potatohead · 2015-07-20 17:42

Personally, I agree with Cluso in that what got done should stay for now.

We can test auto lock and free, and/or test not doing that too. It is possible to run a lot of scenarios on the current feature definition.

And we need to do that too. Adding interrupts, in my view, is necessary to fully exploit this design. A propose, test, refine design cycle is to be expected.

cgracey · 2015-07-20 17:44

Here are the new instructions:

Here is a program:

dat     org
        mov     dira,#%111      'make pins 2..0 outputs
        mov     $1F5*4,#tmrint  'set timer interrupt vector
        mov     $1F3*4,#pinint  'set pin interrupt vector
        intper  #100            'interrupt every 100 clocks
        intpin  #31             'pushbutton input
        intmode #%011_001       'interrupt on pushbutton and timer
loop    xor     outa,#%001      'main loop, toggle pin 0
        jmp     @loop
tmrint  xor     outa,#%010      'timer interrupt, toggle pin 1
        jmp     $1F4*4 wc,wz    'return
pinint  xor     outa,#%100      'pin interrupt, toggle pin 2
        jmp     $1F2*4 wc,wz    'return

Hi Chip.

One compiler question.

Can you add pseudo-instructions to the compiler for those returns?

retit

retip

For simplicity of use and readability of programs.

Yes, I will add those instructions: RETIP (pins), RETIX (transfer/streamer), RETIT (timer)

cgracey · 2015-07-20 17:57

Do you think we are lacking any helpful masks, or anything? One thing that can be done to mask a source is to just turn it off, and then back on later. It works like a newbie would imagine.

If you are suggesting that INTMASK could instead be performed by disabling an interrupt with INTMODE, I think that's a bad idea. First of all, INTMASK does not prevent the interrupt from triggering; it just delays when the associated LINK is inserted into the pipeline. Temporarily disabling an interrupt with INTMODE causes the code to miss an interrupt altogether. There are use cases for both approaches, so I would keep INTMASK.
Also, if you use INTMODE, it means that code will have to track whether a specific interrupt was enabled prior to disabling it, so that it can know whether to enable it again later. INTMASK, on the other hand, can be performed without that knowledge.

You guys found the problems!
I'll have a new system done today where we have real priorities and it will be very straightforward. For three interrupt sources, there are only six possible priority orders. To make this really right, we need to have higher-order interrupts be able to interrupt lower-order interrupts, and remove IFREE/ILOCK from that whole equation. IFREE and ILOCK will become software-only controlled, so that they can be used in both mainline code and interrupt code. Also, I will break each interrupt's control into its own instruction. This, I think, will take the interrupt handling as far as it can go. This is going to be very simple, hardware-wise, too. As one of you pointed out, we will use two bits to track the state of each interrupt: %00=inactive, %01=pending, %10=executing, after which the appropriate RETIx (actually LINK D,$1Fx WC,WZ) returns the state to %00. If a higher-priority interrupt is pending, while a lower-priority interrupt is executing, the higher-priority interrupt's 'LINK $1Fx,$1Fx+1 WC,WZ' will be injected into the pipeline to cause it to execute. Does this sound complete?

potatohead · 2015-07-20 18:03

Nice!!

Yeah, it does seem very complete! I'll toss my post above. I wrote it based on the worry of endless changes more than anything else. Bad assumption.

Some quick iteration on this seems to continue to center in on something really solid, and that is, IMHO, the real concern.

Either nail it, or don't do it at all, kind of thing.

cgracey · 2015-07-20 18:16

cgracey said:
Switching is probably 10% here, compared to what it was in P2-Hot.
Does this mean a P2-Hot with gating would only be warm?

Maybe not. Gating takes time, which saves power. P2-Hot was as fast as could be.

cgracey · 2015-07-20 18:20

Well, we could get totally pedantic and discuss the high bit being the sign bit:

PP: 00 = disable, 01=pos edge, 10=neg edge, 11=any edge

Good observation! I'm not really serious about the above, BTW.

Either of those should work. That makes those 2 bits an OR truth table for the edge transitions (similar to the way that instruction predicates work).

You are right. We will make these patterns logical.
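As a rough sketch of the "OR truth table" reading of the two PP bits, assuming bit 0 enables positive edges and bit 1 enables negative edges (which matches the 01/10/11 encoding listed above). The function name `edge_triggers` is made up for illustration.

```python
def edge_triggers(pp, prev, curr):
    """Return True if a pin change from prev to curr (0 or 1) matches mode pp."""
    pos_edge = prev == 0 and curr == 1
    neg_edge = prev == 1 and curr == 0
    # Assumed layout: bit 0 enables positive edges, bit 1 enables negative edges.
    return bool((pp & 0b01) and pos_edge) or bool((pp & 0b10) and neg_edge)


# Print, for each mode, whether a rising and a falling edge would trigger.
for pp in (0b00, 0b01, 0b10, 0b11):
    print(format(pp, "02b"), edge_triggers(pp, 0, 1), edge_triggers(pp, 1, 0))
```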

Seairth · 2015-07-20 18:20

You guys found the problems!
I'll have a new system done today where we have real priorities and it will be very straightforward. For three interrupt sources, there are only six possible priority orders. To make this really right, we need to have higher-order interrupts be able to interrupt lower-order interrupts, and remove IFREE/ILOCK from that whole equation. IFREE and ILOCK will become software-only controlled, so that they can be used in both mainline code and interrupt code. Also, I will break each interrupt's control into its own instruction. This, I think, will take the interrupt handling as far as it can go. This is going to be very simple, hardware-wise, too. As one of you pointed out, we will use two bits to track the state of each interrupt: %00=inactive, %01=pending, %10=executing, after which the appropriate RETIx (actually LINK D,$1FX WC,WZ) returns the state to %00. If a higher-priority interrupt is pending, while a lower-priority interrupt is executing, the higher-priority interrupt's 'LINK $1Fx,$1Fx+1 WC,WZ' will be injected into the pipeline to cause it to execute. Does this sound complete?

I still think it is too much. A couple notes:
* The insertion of the second interrupt LINK must be delayed by more than one instruction fetch from a prior interrupt LINK insertion. Otherwise, the second LINK will be lost when the first LINK causes the pipeline to get flushed.

* Remember that, if two or more interrupts are pending, you have to insert the lower-priority LINK first. Otherwise, you end up with inverted priorities.
* It seems that EDGE should be higher priority than XFER. I'm guessing that timely response to XFER interrupts only really affects performance, while timely response to EDGE interrupts can affect function. Taking this one step further, is there a use case to have both XFER and TIMER active at the same time? If not, maybe those can be combined into a single interrupt. That would leave you with one interrupt for external events and one interrupt for internal events.

potatohead · 2015-07-20 18:31

Timer can be used to schedule tasks while xfer is streaming data.

Seairth · 2015-07-20 18:33

Also, to make sure I am understanding the allowable states, you can only have:
00 - Inactive
01 - Pending
10 - Active
This means it is not possible to have a pending interrupt queued while that interrupt's ISR is active (i.e. "%11").

Seairth · 2015-07-20 18:36

Timer can be used to schedule tasks while xfer is streaming data.

But is that likely to happen? I'd expect streaming to typically occur on a different cog from the task-switching cog.
Regardless, do you agree that EDGE should probably be a higher priority than XFER?

potatohead · 2015-07-20 18:36

I plan on it for video.

The two together replicate waitvid, and allow for simple hold-a-value-on-the-DAC-for-x-time functionality, so that processing can happen during both types of events, similar to what we had on Hot.

Seairth · 2015-07-20 18:41

Also, you should only need a single RETI instruction. You will always be returning from the highest-priority interrupt, so you should be able to mux the inserted LINK (shouldn't that be JMP?) based on the Active bits.

potatohead · 2015-07-20 18:43

I am mulling edge over right now.

cgracey · 2015-07-20 18:54

It does not seem to be possible to restrict a cog's access rights. This is unfortunate, since this prevents sandboxed code execution despite the chip's logical partition into independent cores. I'd like to suggest, that an instruction be added to set a cog into a restricted mode, that should work as follows:

(1) Accessible hub memory is transparently reduced to 32 KB and mapped to disjoint slices (16 x 32 = 512, one for each cog; fixed physical mapping, but transparent (virtual) addressing).
(2) Access to pins is disabled.
(3) Restricted cogs cannot use the mode setting instruction.

By default, cogs start in unrestricted mode. Any cog's mode can be set by any cog in unrestricted mode. Transparent hub memory reduction means that only 32 KB of hub memory can be used. Attempts to read from or write to locations above the virtually visible 32 KB of hub memory should wrap (or have the same effect that an attempt to read from or write to locations above the normally visible 512 KB would have in unrestricted mode).

Implications are:

(1) Cogs in restricted mode only have physical access to their respective hub slice.
(2) Restricted cogs cannot communicate with each other (unless an unrestricted cog acts as a broker, copying message data between disjoint hub memory slices).
(3) Restricted cogs cannot directly access external memory, because they cannot read from or write to pins.
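To make the address mapping in the proposal concrete, here is a minimal sketch. It assumes that cog n's slice starts at n x 32 KB and that out-of-range accesses wrap (the proposal leaves the exact out-of-range behaviour open); `physical_address` is a hypothetical helper, not part of any tool.

```python
SLICE_SIZE = 32 * 1024  # 32 KB visible to each restricted cog
NUM_COGS = 16           # 16 x 32 KB = 512 KB of hub memory


def physical_address(cog, virtual_address):
    """Map a restricted cog's virtual hub address to a physical hub address."""
    if not 0 <= cog < NUM_COGS:
        raise ValueError("cog must be in 0..15")
    # Assumption: slices are laid out in cog order; addresses wrap at 32 KB.
    return cog * SLICE_SIZE + (virtual_address % SLICE_SIZE)


print(hex(physical_address(1, 0x0000)))  # start of cog 1's slice
print(hex(physical_address(2, 0x8000)))  # wraps back to the start of cog 2's slice
```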

I hope that this could easily be implemented, since practically no new functionality has to be added; only a way to restrict functionality needs to be provided. This should add negligible complexity and would allow safe execution of untrusted code and memory protection without necessitating a memory management unit (MMU).

The recent discussion around interrupts shows that there are numerous things that people would like to see in a processor for their individual use cases. If I remember correctly, one interrupt feature iteration added 2.5% to cog complexity. I don't know how much additional complexity the further iterations currently under discussion would add, but I wanted to throw in the feature that would be most useful to me, and on which I would spend any remaining complexity budget rather than on a further iteration of the interrupt feature (or even the first one), if those features were mutually exclusive for budget reasons. While I have no useful intuition about the actual complexity, I naively assume that the feature proposed here would add even less than 2.5% to cog complexity, and I hope that it will be considered.

One thing to consider is that cogs can configure smart I/O pins, too - and even leave them configured after they stop. To really sandbox a cog, you would have to limit its I/O pins, not just memory, and clean up those I/O pins after it stopped. Having done both, I think it would be pretty isolated. This is something we can do maybe later. As you said, it would be easy to do. This would be really useful if the Prop2 was acting as a computer, where unknown software was going to be run on it, especially as a development platform. As an embedded, and presumably debugged, system, this might not have much value, though it might have helped during its development.
Also, you'd need to inhibit the cog's ability to start and stop other cogs, as that could be disruptive.
Time will show if we need this.

cgracey · 2015-07-20 19:03

I see, the forum has eaten my brackets and the characters between them. The instruction was supposed to read "RESTRICT .ON/OFF. .COG NUMBER." (brackets replaced with dots, hope this displays correctly now and clarifies the intention).

There is a button in the post editor that looks like '<>'. That toggles between what is viewed on the forum and the source behind it. I've had to use that a few times to extract errant '<blockcode>' and '</blockcode>' commands, along with the stuff before, after, and between them. This editor software makes a huge mess and looks like Hello Kitty.

dnalor · 2015-07-20 19:53

The interrupt can "only" insert a LINK in a running cog, right? How difficult would it be to make it possible to wake a sleeping (waitcnt) cog? This feature would make the "interrupt thing" complete. E.g.: a UART cog would poll for TX data only each ms (not wasting energy in a full-speed loop). Incoming data would wake it "immediately".

pedward · 2015-07-20 20:51

I vote for the CLI/STI mnemonics for disabling and enabling interrupts.

IFREE and ILOCK are just too far removed from what they mean. CLear Interrupt and SeT Interrupt refer to the interrupt flag: if it's cleared, then interrupts can't fire; if it's set, then interrupts can fire.

jmg · 2015-07-20 20:52

* The insertion of the second interrupt LINK must be delayed by more than one instruction fetch from a prior interrupt LINK insertion. Otherwise, the second LINK will be lost when the first LINK causes the pipeline to get flushed.
Details like this are why I asked earlier for a nested test case of real code, with all mixes active. That's the only real way to nail down all the hand-over and interactions that can occur. I think three INTs are fine, but the handling details matter.
* It seems that EDGE should be higher priority than XFER. I'm guessing that timely response to XFER interrupts only really affects performance, while timely response to EDGE interrupts can affect function. Taking this one step further, is there a use case to have both XFER and TIMER active at the same time? If not, maybe those can be combined into a single interrupt. That would leave you with one interrupt for external events and one interrupt for internal events.

I can think of cases where you would want XFER and TIMER both enabled, i.e. the user does not want to fiddle with the vectors, as that has its own dangers.
An important use of TIMER will be for COP (Cog Operating Properly) WDOG-type self-sanity checks, and for background debug trace. In those cases the user system needs to almost not know it is there.
MCUs that do implement priority on interrupts usually allow the user to select which one is King. Most often, those top-level INTs are very small and compact, and do things like count totals.

jmg · 2015-07-20 22:17

The interrupt can "only" insert a LINK in a running cog, right? How difficult would it be to make it possible to wake a sleeping (waitcnt) cog? This feature would make the "interrupt thing" complete. E.g.: a UART cog would poll for TX data only each ms (not wasting energy in a full-speed loop). Incoming data would wake it "immediately".

I think that is possible now. An INT can (and usually would) 'wake' an idle COG, and it has choices on RET: exit back to the last PC, change the RET address to +1, or modify the WAITCNT value. To keep RET lockout times low, designs can also do an early RETI to non-critical 'INT' code, before a final back-to-user RET. I'm not sure you would want an INT to force an exit of WAITCNT on RETI?
Note: A PinINT can wake on Pin Cell UART CHAR, it does not need to be timer polled.

jmg · 2015-07-20 22:23

Also, you should only need a single RETI instruction. You will always be returning from the highest-priority interrupt, so you should be able to mux the inserted LINK (shouldn't that be JMP?) based on the Active bits.

Without a true stack, the present scheme of RETIP (pins), RETIX (transfer/streamer), RETIT (timer) is better, as the address the INT exits to can be user-adjusted (and it is clear where it is). This is useful in cases where you want to do fast important stuff, and remove the blocking, to allow other INTs to occur while you finish.

Cluso99 · 2015-07-20 23:13

What Chip has now makes sense and covers the parts I would use in some sets of serial receive. It also works for threading by timer and, as has been pointed out, as a watchdog.

I really think it's time to move on. Let's test what's there. If problems are found, then let's discuss them without ditching what we have (from someone who didn't want interrupts). Maybe there are some gotchas. If so, then they can be fixed or noted as a restriction for use.

I see these interrupts as a use for drivers and other standard objects, rather than everyday use.

Anyway, let's move on for now

Seairth · 2015-07-21 01:17

Also, you should only need a single RETI instruction. You will always be returning from the highest-priority interrupt, so you should be able to mux the inserted LINK (shouldn't that be JMP?) based on the Active bits.

Without a true stack, the present scheme of RETIP (pins), RETIX (transfer/streamer), RETIT (timer) is better, as the address the INT exits to can be user-adjusted (and it is clear where it is). This is useful in cases where you want to do fast important stuff, and remove the blocking, to allow other INTs to occur while you finish.

I don't think I explained myself correctly. You don't need three separate instructions. The interrupt state bits already contain the information you need for a single RETI to perform the correct JMP instruction, as shown by this verilog pseudo-code:
isr_ret <= st_xfer[1] ? JMP $1F0 :
           st_edge[1] ? JMP $1F2 :
           st_timr[1] ? JMP $1F4 :
                        NOP;
In other words, looking at each of the interrupt's "active" bit in order, RETI knows exactly which JMP to perform.
Besides, this also avoids the issue of what to do if someone calls RETIX when not in the XFER ISR.
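The same selection can be written as a small, testable model. This is a sketch of the pseudo-code in this post only; it assumes bit 1 of each two-bit state is the "active" bit and keeps the XFER, EDGE, TIMER check order used there. `reti_target` is a hypothetical name.

```python
def reti_target(st_xfer, st_edge, st_timr):
    """Return the register a single RETI would jump through, or None for NOP.

    Each argument is a two-bit interrupt state; bit 1 is assumed to mean
    "ISR active", as in the pseudo-code from this post.
    """
    for state, ret_reg in ((st_xfer, 0x1F0), (st_edge, 0x1F2), (st_timr, 0x1F4)):
        if state & 0b10:
            return ret_reg
    return None


# TIMER ISR was interrupted by the EDGE ISR: RETI returns from EDGE first.
print(hex(reti_target(0b00, 0b10, 0b10)))
```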

jmg · 2015-07-21 03:02

I can see you do not need them, but you've added more logic to remove them, and users still have to preload the three vectors by name. Given they may want to patch the contents of the 3 possible return vectors, explicit RETIX (etc.) makes the code clearer. RETI to me implies a normal stack, which P2 does not have.
As for 'calls RETIX when not in the XFER ISR', that should do what it usually does: clear the INT flag (redundant, as it is not set) and jump to the vector in the RETIX register. I do not see an issue?
In the early stages of developing / measuring interrupt routines, I often call the INT code without a real interrupt. The MCU just treats RETI like a normal return.

cgracey · 2015-07-21 07:39

I think I have an interrupt solution that does everything, plus some extra stuff for almost free.
First, though, PJV had made some request that we could know when a certain hub location was being written to by another cog. The setup and address comparison for that would have been way too complex. We talked about this and found that a much simpler goal would get us there: Have each cog get a signal when the correspondingly-numbered hub RAM (of which there are 16) gets written to at its first address. This amounts to a little combinational logic that feeds a flop that goes out to each cog. If some cog writes within $0..$3, cog 0 gets a pulse. If some cog writes to $4..$7, cog 1 gets a pulse, and so on, up to $3C..$3F causing cog 15 to get a pulse. This would take about 100 LE's for a full 16-cog implementation, which is 0.07% of the total digital logic. But we could even make it better by signalling reads, too, not just writes, for a likely increase of 16 LE's. The point of all this being that we could use these RDZ/WRZ signals for interrupts, waiting, and polling. Cogs could use this mechanism to fully handshake asynchronous 32-bit data streams between them (in the background, even).
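The address-to-cog mapping described here is just the hub byte address divided by four, for the first 64 bytes of hub RAM. A minimal sketch follows; `notified_cog` is a made-up helper name.

```python
def notified_cog(hub_address):
    """Return the cog that gets a pulse for an access at hub_address, or None.

    Follows the mapping in the post: $0..$3 -> cog 0, $4..$7 -> cog 1, ...,
    $3C..$3F -> cog 15. Addresses above $3F notify no cog.
    """
    if 0 <= hub_address <= 0x3F:
        return hub_address >> 2  # one long (4 bytes) per cog
    return None


print(notified_cog(0x04), notified_cog(0x3F), notified_cog(0x40))
```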
Consider that all interrupt sources are signals which go high for one clock per event: timer reload, pin edge, transfer rollover, transfer block wrap, hub long write at cogid*4, hub long read at cogid*4, and whatever else we can mux into the interrupt logic.
There are lots of headaches surrounding interrupt priority, though. Those can be simplified in a big way by having three generic interrupts that are fixed in priority. You just need to mux the desired interrupt source signal into each one, until you've max'd out at the third, lowest-priority interrupt. This way, your interrupt vectors can always build from $1F4/$1F5, downwards, with the top one being the highest priority:
IRET2 = $1F0
IJMP2 = $1F1
IRET1 = $1F2
IJMP1 = $1F3
IRET0 = $1F4
IJMP0 = $1F5
Here are the instructions that make this work:
SETIPER D/# - set interrupt timer period (32 bits)
SETIPIN D/# - set interrupt pin (6 bits)
SETIBRK D/# - set interrupt breakpoint (20 bits) - NEW!!!
SETINT0 D/# - set interrupt 0 mode (4 bits)
SETINT1 D/# - set interrupt 1 mode (4 bits)
SETINT2 D/# - set interrupt 2 mode (4 bits)
IFREE - allow interrupts
ILOCK - ignore interrupts
Here are the modes for SETINTx:
0000  off
0001  timer interrupt
0010  transfer rollover interrupt
0011  transfer block wrap interrupt
0100  breakpoint interrupt
0101  pin pos-edge interrupt
0110  pin neg-edge interrupt
0111  pin any-edge interrupt
1000  read mem interrupt
1001  write mem interrupt
1010  <unused>
1011  <unused>
1100  <unused>
1101  <unused>
1110  <unused>
1111  <unused>

Interrupt 0 always has highest priority and can interrupt both interrupts 1 and 2.
Interrupt 1 has middle priority and can only interrupt interrupt 2.
Interrupt 2 has lowest priority and cannot interrupt any other interrupts.
Each interrupt's progress is tracked by two bits: %00=inactive, %01=pending, %10=executing (after which return to %00).
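A toy model of the fixed-priority rule may help when reasoning about nesting. It is only a sketch of the behaviour described in this post, ignoring IFREE/ILOCK and pipeline timing; `next_to_inject` and the state constants are hypothetical names.

```python
INACTIVE, PENDING, EXECUTING = 0b00, 0b01, 0b10


def next_to_inject(states):
    """Pick which interrupt may start next, or None.

    states[0] is interrupt 0 (highest priority), states[2] is interrupt 2.
    A pending interrupt may start only if no higher-priority interrupt
    is currently executing.
    """
    for index, state in enumerate(states):
        if state == EXECUTING:
            return None   # nothing at or below this priority may preempt it
        if state == PENDING:
            return index  # highest-priority pending interrupt wins
    return None


print(next_to_inject([INACTIVE, PENDING, EXECUTING]))  # interrupt 1 preempts 2
print(next_to_inject([EXECUTING, PENDING, INACTIVE]))  # interrupt 1 must wait
```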
The breakpoint uses a 20-bit register to store the address, and a comparator to check it against P (program counter), which result gets qualified with the 'GET' signal, which is high for the first clock in each instruction. This would cause the interrupt return address to point to the instruction that was going to execute after the instruction at the breakpoint address. This, of course, can be used for debugging.
I'm still thinking about what kind of practical/short hold-off could be employed for pin-edge interrupts. If too many slow down the cog, that's okay, but the cog shouldn't grind to a halt. I'm thinking 16, or so, clocks should be adequate to wait after a return from a pin-edge interrupt, before recognizing a new edge.

Sapieha · 2015-07-21 08:31

Hi Chip.

Is it not possible to make that user-programmable?

I'm still thinking about what kind of practical/short hold-off could be employed for pin-edge interrupts. If too many slow down the cog, that's okay, but the cog shouldn't grind to a halt. I'm thinking 16, or so, clocks should be adequate to wait after a return from a pin-edge interrupt, before recognizing a new edge.

cgracey · 2015-07-21 08:46

Hi Chip.

Is it not possible to make that user-programmable?

I'm still thinking about what kind of practical/short hold-off could be employed for pin-edge interrupts. If too many slow down the cog, that's okay, but the cog shouldn't grind to a halt. I'm thinking 16, or so, clocks should be adequate to wait after a return from a pin-edge interrupt, before recognizing a new edge.

I would like to come up with some value that works adequately for all cases. By the Prop2 user's design, this should not ever be necessary, but in the real world, over-toggling could happen. I'm just thinking about not allowing the cog to get completely tied up. Some hold-off would permit other things to keep going and let the designer see that he has a slow system, and then address his pin problem.
I could make it user-programmable, but it would involve a 32-bit register and counter. The nice thing about doing that is that there's no feedback needed from the return-from-interrupt, which keeps all the pre-mux interrupt sources open-loop. That would certainly solve the problem, but it's kind of expensive.

Sapieha · 2015-07-21 09:10

Hi Chip.

If 16 is enough, then make the 4 least-significant bits read-only with a preloaded value.

And allow reprogramming of only the higher bits.

cgracey · 2015-07-21 09:17

Hi Chip.

If 16 is enough, then make the 4 least-significant bits read-only with a preloaded value.

And allow reprogramming of only the higher bits.

I was actually just thinking the same thing!

I'll make it like that, as it simplifies the interrupt logic quite a bit. 16 bits with four implied lsb's would allow hold-offs from 16 to 1M clocks with 16-clock increments. That would be totally adequate.
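The quoted range can be checked with a little arithmetic. The sketch below assumes the hold-off is (setting + 1) x 16 clocks, which is one reading that yields exactly 16 to 1M clocks in 16-clock steps; the actual encoding is not spelled out in this post, and `holdoff_clocks` is a hypothetical name.

```python
def holdoff_clocks(setting):
    """Hold-off length in clocks for a 16-bit setting with four implied LSBs.

    Assumption: the hardware counts (setting + 1) * 16 clocks, so that
    setting 0 gives 16 clocks and setting $FFFF gives 1,048,576 (1M) clocks.
    """
    if not 0 <= setting <= 0xFFFF:
        raise ValueError("setting must fit in 16 bits")
    return (setting + 1) << 4


print(holdoff_clocks(0), holdoff_clocks(1), holdoff_clocks(0xFFFF))
```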
