The New 16-Cog, 512KB, 64 analog I/O Propeller Chip

evanh · 2015-07-19 00:28

I'm certainly struggling to work out what Cluso is wanting added to the hardware that isn't just a software concern.

The Hold-off timer is purely for noise filtering. To prevent too many IRQs coming at the software.

PS: Transmitting can be handled in the same ISR given it's a regular tick that can check both buffers.

evanh · 2015-07-19 00:53

I have to admit I am suitably impressed with what Chip has done. Looking at the scope traces he presented ( http://forums.parallax.com/discussion/comment/1337306/#Comment_1337306 ), it looks pretty compact:

I see it's taking 6 clocks per toggle of the main loop. And only a mere 10 clocks for the interrupt to interject its toggle every 100 clocks. Okay, the main loop JMP can be considered as overhead, presumably that's 4 clocks. So, there are 8 clocks (of hardwired ISR overhead + return), which equates to two JMPs, which figures.
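The clock arithmetic above can be written out as a small sketch. All figures are the estimates read off the scope traces in this post, not documented timings, and the names are made up for illustration. The 2-clock toggle is inferred from the 6-clock loop minus the presumed 4-clock JMP.

```python
# Clock budget estimated from the scope traces (assumed figures, not measured).
MAIN_LOOP_CLOCKS = 6       # one toggle plus the looping JMP
JMP_CLOCKS = 4             # presumed cost of a taken JMP
ISR_TOTAL_CLOCKS = 10      # time the interrupt steals from the main loop
TIMER_PERIOD_CLOCKS = 100  # interrupt fires once per 100 clocks


def toggle_clocks(loop=MAIN_LOOP_CLOCKS, jmp=JMP_CLOCKS):
    """Clocks left for the toggle instruction once the JMP is subtracted."""
    return loop - jmp


def isr_overhead(total=ISR_TOTAL_CLOCKS, loop=MAIN_LOOP_CLOCKS, jmp=JMP_CLOCKS):
    """Clocks spent entering and leaving the ISR, excluding its one toggle."""
    return total - toggle_clocks(loop, jmp)


def isr_load(total=ISR_TOTAL_CLOCKS, period=TIMER_PERIOD_CLOCKS):
    """Fraction of cog time consumed by the interrupt."""
    return total / period


if __name__ == "__main__":
    print("toggle clocks:", toggle_clocks())
    print("ISR overhead clocks:", isr_overhead())
    print("overhead in JMPs:", isr_overhead() / JMP_CLOCKS)
    print("ISR load:", isr_load())
```

With these numbers the overhead comes to two JMP-equivalents, which matches the reading of one inserted LINK plus one return.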

Do JMPs have to be four clocks? On the Prop1, they are only double time when not taken.

cgracey · 2015-07-19 00:59

Do JMPs have to be four clocks? On the Prop1, they are only double time when not taken.

JMPs are four clocks because they cancel the trailing instruction, which needs to get through the pipe before the new instruction stream gets to the execute stage.
The Prop1 assumed the JMP in DJNZ would happen, to keep things fast most of the time, and then it patched things up if it didn't JMP. There was no pipeline in the Prop1, so it didn't have to worry about in-progress instructions getting through the pipe.

evanh · 2015-07-19 01:07

cgracey said:
The Prop1 assumed the JMP in DJNZ would happen, to keep things fast most of the time, and then it patched things up if it didn't JMP. There was no pipeline in the Prop1, so it didn't have to worry about in-progress instructions getting through the pipe.

Patching things up sounds awfully similar to worrying about in-progress instructions.

cgracey · 2015-07-19 01:22

cgracey said:
The Prop1 assumed the JMP in DJNZ would happen, to keep things fast most of the time, and then it patched things up if it didn't JMP. There was no pipeline in the Prop1, so it didn't have to worry about in-progress instructions getting through the pipe.

Patching things up sounds awfully similar to worrying about in-progress instructions.

I think of the Prop1 like a chopping block and the Prop2 like a GI tract.

evanh · 2015-07-19 01:23

Ah, I think I understand a bit more. The Prop1 JMP/JMPRET instructions didn't need an early decision to fetch the branched instruction, whereas the Prop2 would need this because there is more instruction overlap.

I don't know how hard this is but I'd be keen to have it added to each Cog.

evanh · 2015-07-19 01:30

I suppose the REP instruction will compensate in many situations.

cgracey · 2015-07-19 01:40

Ah, I think I understand a bit more. The Prop1 JMP/JMPRET instructions didn't need an early decision to fetch the branched instruction, whereas the Prop2 would need this because there is more instruction overlap.

I don't know how hard this is but I'd be keen to have it added to each Cog.

You mean have DJNZ start the JMP early? It would speed things up in cog exec mode, but really explode the timing in hub exec mode on a fall-through. Hmmm, we know if we are in cog exec mode, so that could be done, saving two clocks on a looping DJNZ, but then costing maybe four clocks on a fall-through. I think REP takes care of fast looping pretty well, but too bad it must block interrupts.
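To make the trade-off concrete, here is a rough break-even sketch. It assumes the figures floated above (2 clocks saved per taken DJNZ, "maybe" 4 clocks lost on the fall-through) and a hypothetical function name; neither is a confirmed timing.

```python
def djnz_net_saving(iterations, saved_per_taken=2, fallthrough_penalty=4):
    """Net clocks saved over one complete loop if DJNZ starts its JMP early.

    The DJNZ is taken on every pass except the last, where it falls through.
    Both clock figures are the tentative numbers from the post above.
    """
    if iterations < 1:
        raise ValueError("a loop runs at least once")
    taken = iterations - 1
    return taken * saved_per_taken - fallthrough_penalty


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 10, 100):
        print(n, "iterations ->", djnz_net_saving(n), "clocks saved")
```

Under these assumptions a loop of three passes breaks even and anything longer comes out ahead, so the early JMP would only hurt very short loops.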

potatohead · 2015-07-19 01:45

People can always unroll code to marginalize the longer JMP.

Seairth · 2015-07-19 01:45

Chip, without opening up a can of worms, might it be possible to have 2 sets of interrupt tables, one pair for the timer interrupt and one pair for the other interrupts? The intent is to permit a separate timeout over the other interrupts.
So the inserted instruction would be
LINK $1F2,$1F3 WC,WZ and
LINK $1F4,$1F5 WC,WZ

It would be simple to accommodate multiple sources with multiple vectors, but I'm worried we would need to start making a more complicated system to handle simultaneous interrupts. Right now, there is no special context of being in an interrupt. A link instruction was inserted, that's all. It seems the hold-off counter solution wouldn't suffice, anymore, with multiple sources. It's hard to think about!

Fine Chip. We don't want to over complicate things and I am not really a proponent of interrupts anyway. If they are there then we will find ways to use them in special drivers.

Cluso, I was thinking about your request, as it pertained to serial. Of course, I don't know which serial you were talking about, but I'm going to guess you meant TTL self-clocked asynchronous serial (a UART). With that in mind, it occurs to me that a pin interrupt timeout should not be necessary in the first place. Here's the general approach I'm thinking of:
* Use the interrupt mechanism for read, use the "main" task for write
For the receive routine:
1. Read is always started by setting up an interrupt on RX falling edge (presumably the beginning of the start bit).
2. Once a falling edge is discovered, switch from pin interrupts to timer interrupts, such that the RX pin is sampled at bittime/2. If the value is not low, then this was not a start bit. Go back to step 1.
3. Now, set the timer interrupt to bittime, thereby sampling the bit stream in the middle of each following bit.
4. Continue sampling for the number of bits, plus the parity and stop bit(s). If the parity is wrong, return to step 1. If stop bits are low, return to step 1.
5. Otherwise, good data. Store to hub (or to local buffer that the "main" task can transfer later).
6. Go back to step 1.
Now, the obvious issue here is that, if you start a read mid-frame, you need a way to correct for this. There are two different approaches I can think of, both of which can be done with the current interrupt approach.
The point is, I don't think you will need a timeout, at least for the "TTL serial". And I think the code itself will be very straightforward. As much as I am still in awe of Chip's one-cog serial driver, I suspect the current simple interrupts will make that sort of legerdemain unnecessary most of the time.
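The receive steps can be sketched as a host-side simulation rather than cog code. This is only an illustration under stated assumptions: the line is a list of 0/1 samples taken at a fixed rate, the frame is 8 data bits LSB first with no parity and one stop bit, and `encode_frame` and `receive` are hypothetical helper names. The edge search stands in for the pin interrupt and the index arithmetic stands in for the timer interrupt.

```python
def encode_frame(byte, samples_per_bit):
    """Line samples for one frame: start bit, 8 data bits LSB first, stop bit."""
    bits = [0] + [(byte >> k) & 1 for k in range(8)] + [1]
    samples = []
    for bit in bits:
        samples.extend([bit] * samples_per_bit)
    return samples


def receive(line, samples_per_bit, data_bits=8):
    """Decode bytes from 0/1 line samples using the edge-then-timer approach."""
    received = []
    n = len(line)
    i = 1
    while i < n:
        # Step 1: wait for a falling edge on RX (the "pin interrupt").
        if not (line[i - 1] == 1 and line[i] == 0):
            i += 1
            continue
        # Step 2: sample half a bit time later; a high level means a glitch.
        t = i + samples_per_bit // 2
        if t >= n:
            break
        if line[t] != 0:
            i = t + 1
            continue
        # Steps 3-4: sample the middle of each data bit, one bit time apart.
        value = 0
        for k in range(data_bits):
            t += samples_per_bit
            if t >= n:
                return received  # ran out of samples mid-frame
            value |= line[t] << k
        # Step 4 (continued): the stop bit must be high.
        t += samples_per_bit
        if t >= n:
            return received
        if line[t] == 1:
            received.append(value)  # Step 5: good data
        i = t + 1  # Step 6: go back to waiting for an edge
    return received


if __name__ == "__main__":
    spb = 8
    line = [1] * 20 + encode_frame(0x55, spb) + encode_frame(0xA3, spb) + [1] * 20
    print([hex(b) for b in receive(line, spb)])
```

A short low pulse ahead of a real frame is rejected at step 2, so no separate timeout is involved in this sketch. Parity checking and mid-frame resynchronisation are left out.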

Yes, I was thinking serial in general as well as async ttl uart. I thought the receive could be done automatically via interrupts, and the Tx in main.

I think there are other potential uses too. I am not so sure about the usefulness of the time delay that Chip has implemented although we will see.

Looking forward to doing some testing. Also the smart pins will be very interesting in seeing their capabilities.

Well, right off the bat, the minimum time delay has to be something like 7 clock cycles, or whatever is long enough to jump to the interrupt routine and execute the ILOCK to prevent another interrupt trigger.
I suppose Chip could have hard-coded a time delay to allow for this.
But suppose instead that you wanted to simply count the number of edges. To do so, you would need only two instructions: ADD, JMP. So, if you don't want to also use ILOCK/IFREE, then those 7 cycles now need to be 9.
Now, suppose instead that you want to do that AND switch the interrupt to look for the opposite edge. Now you need at least 6 instructions (TEST, ADD, INTPR, ADD, INTPF, JMP).
Or... well, you get the idea. The reality is that there will always be an application-specific minimum time delay that must be guaranteed before you can allow an interrupt to re-occur.

jmg · 2015-07-19 01:54

I'm certainly struggling to work out what Cluso is wanting added to the hardware that isn't just a software concern.

The Hold-off timer is purely for noise filtering. To prevent too many IRQs coming at the software.

PS: Transmitting can be handled in the same ISR given it's a regular tick that can check both buffers.

There is hold-off for pin-style INTs and a Timer for time-paced INTs.
I think currently, you have to choose Pin-Cell or Time mode, and cannot support both.
It would be good to have Pin-Cell on one vector and Timer on another, but on a first-come basis - question is, how much does that impact the logic?

evanh · 2015-07-19 01:59

People can always unroll code to marginalize the longer JMP.

I'm mostly concerned with bit-bashing via interrupts right now. The ISR overhead can be halved by making the JMPs a two clock instruction.

jmg · 2015-07-19 02:21

People can always unroll code to marginalize the longer JMP.

I'm mostly concerned with bit-bashing via interrupts right now. The ISR overhead can be halved by making the JMPs a two clock instruction.

If you want to bit-bash via interrupts, then you'll need to add lines to de-jitter the INT delays (assuming that can actually be done). Reading the buried timer would be one way; running a parallel CNT delta would be another.

evanh · 2015-07-19 02:37

Jitter is a separate issue and is application dependent, just like throughput is. For the most part there can only be 2 clocks of jitter when executing Cog code. Chip's example had none at all because it ran in beat with the timer. There will be predictable lag for sure but no significant jitter unless instructions like REP are used inappropriately.

10 ns (peak-to-peak) jitter isn't such a big deal and good luck with compensating for it.

evanh · 2015-07-19 02:54

PS: I'm thinking Chip got it right first when he said two clocks rather than two instructions. The two instruction delay is part of the lag rather than the jitter.

And, of course, lag is another issue that is application dependent. Many many, nearly all, applications can operate perfectly with huge and variable lag.

Seairth · 2015-07-19 03:00

The INT delays could be significantly more than 2 cycles. For instance, an AUGD/S would add another two clock cycles. If an unaligned hubop is in the pipeline, the delay could vary considerably.

potatohead · 2015-07-19 03:04

I wonder how those work now with the new HUB design?

cgracey · 2015-07-19 03:07

We've all been ruminating on the same things, I think.
How about this:
INTMODE D/# - set interrupt mode
INTPER D/# - set period for timer interrupt
INTPIN D/# - set pin for edge interrupt
IFREE - allow interrupt
ILOCK - don't allow interrupt until IFREE
Interrupt mode settings: %LSS_LPP_LT
%LSS = streamer interrupt (issues 'LINK $1F0,$1F1 WC,WZ')
L: 1=ILOCK on interrupt
SS: 0x=disable, 10=rollover, 11=block wrap
%LPP = pin edge interrupt (issues 'LINK $1F2,$1F3 WC,WZ')
L: 1=ILOCK on interrupt
PP: 00=disable, 01=any edge, 10=pos edge, 11=neg edge
%LT = timer interrupt (issues 'LINK $1F4,$1F5 WC,WZ')
L: 1=ILOCK on interrupt
T: 0=disable, 1=enable
Whenever a JMP/LINK to $1F0/$1F2/$1F4 occurs (return from interrupt), IFREE happens automatically.
This system takes about the same amount of logic, but has these advantages:
- multiple concurrent interrupts are allowed on a first-come basis, but in the event of simultaneous occurrence, they are prioritized as: streamer, pin-edge, timer.
- Timer period and pin number can be changed on the fly.
- Interrupt routines can be protected from interruption by the L mode bits, cancelling the need for hold-off counting. When they exit, IFREE always occurs.
This would take an hour to change. I like the idea of being able to have slow housekeeping interrupts with fast event-driven interrupts that are protected from interruption. This could allow a whole console computer with video, keyboard, mouse, and RTC to operate from one cog, with mainline code not needing to worry about any keyboard/mouse/RTC details - they would all be taken care of in the background.
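As a way to read the proposed mode byte, here is a small decoding sketch. It assumes %LSS_LPP_LT is laid out most-significant bit first exactly as written (streamer L in bit 7 down to timer T in bit 0), which the post does not state explicitly, and the function names are hypothetical. It models the proposal in this post only, not any shipped silicon.

```python
STREAMER_MODES = {0b00: "disable", 0b01: "disable", 0b10: "rollover", 0b11: "block wrap"}
PIN_MODES = {0b00: "disable", 0b01: "any edge", 0b10: "pos edge", 0b11: "neg edge"}

# LINK operands issued for each source, as listed in the proposal.
VECTORS = {"streamer": (0x1F0, 0x1F1), "pin": (0x1F2, 0x1F3), "timer": (0x1F4, 0x1F5)}

# Priority when sources fire simultaneously: streamer, pin-edge, timer.
PRIORITY = ("streamer", "pin", "timer")


def decode_intmode(mode):
    """Split an 8-bit %LSS_LPP_LT value into per-source settings."""
    if not 0 <= mode <= 0xFF:
        raise ValueError("mode must fit in 8 bits")
    return {
        "streamer": {"lock": bool((mode >> 7) & 1),
                     "mode": STREAMER_MODES[(mode >> 5) & 0b11]},
        "pin": {"lock": bool((mode >> 4) & 1),
                "mode": PIN_MODES[(mode >> 2) & 0b11]},
        "timer": {"lock": bool((mode >> 1) & 1),
                  "mode": "enable" if mode & 1 else "disable"},
    }


def next_interrupt(mode, pending):
    """Pick the highest-priority pending source that is enabled, or None."""
    settings = decode_intmode(mode)
    for name in PRIORITY:
        if name in pending and settings[name]["mode"] != "disable":
            return name, VECTORS[name]
    return None


if __name__ == "__main__":
    mode = 0b000_111_01  # streamer off, pin: lock + neg edge, timer: enabled
    print(decode_intmode(mode))
    print(next_interrupt(mode, {"timer", "pin"}))
```

With the example value, a simultaneous pin edge and timer tick would resolve to the pin-edge vector first, and the pin routine would run with interrupts locked until it returns.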

cgracey · 2015-07-19 03:08

The INT delays could be significantly more than 2 cycles. For instance, an AUGD/S would add another two clock cycles. If an unaligned hubop is in the pipeline, the delay could vary considerably.

There are no penalties for unaligned accesses. Everything takes the same amount of time now.

potatohead · 2015-07-19 03:14

I'm very strongly in favor of this. If we are going to do interrupts, sorting out a really sweet set of cases is worth it. We are thinking about the same things.

And the mix 'n match only makes sense if we've got great capability on all axes too.

Doing this means this chip design is capable of most of what the "hot" one was.

And it's got a ton of COGS!

We will need to provide some canned examples for common things. Those will get sorted as we jam on the FPGA and build some basic stuff.

potatohead · 2015-07-19 03:15

And for those who will say, "oh, but spud! My god. INTERRUPTS!"

First, we need 'em. With no tasker, the COGS won't actually be able to really saturate that nice HUB. And that's a shame.

Second, it's been a design goal to have this chip compete and be able to do a lot of tasks concurrently. Heck, serving up BASIC to say... 10 users at the same time, might be pretty great. So will streaming data from all over the place through the nice, fast HUB.

Finally, when we go to market it, all the discussion has been on dealing with this big exception. How many pages on that? I don't even want to count.

If it's a hybrid, great! I've dealt with hybrid products and their sales / marketing / implementation. Hybrid, or dual purpose type designs SUCK, unless both modes nail their use cases. So, that's another way of saying, we need to nail this one.

Interrupts designed to not conflict, or glitch, and are easy to use. Seriously. What Chip just put here is easy. Really easy.

And, unlike other designs, one can take that "full computer cog" Chip described, set pins, buffer memories, insert it with a COGSTART and GO, no different than say, "FullDuplexSerial" on P1. That's compelling. It's not going to require the usual game of weaving the new demand into a house of cards type exercise most people associate with this kind of thing.

Yanomani · 2015-07-19 03:26

Hi Chip

In the instruction,

INTMODE D/# - set interrupt mode
Are the interrupt mode settings: %LSS_LPP_LT limited to 8 bits by space constraints or because there is no use for the ninth bit at the immediate parameter?

Yanomani

jmg · 2015-07-19 03:28

We've all been ruminating on the same things, I think.
How about this:
INTMODE D/# - set interrupt mode
INTPER D/# - set period for timer interrupt
INTPIN D/# - set pin for edge interrupt
IFREE - allow interrupt
ILOCK - don't allow interrupt until IFREE
Interrupt mode settings: %LSS_LPP_LT
%LSS = streamer interrupt (issues 'LINK $1F0,$1F1 WC,WZ')
L: 1=ILOCK on interrupt
SS: 0x=disable, 10=rollover, 11=block wrap
%LPP = pin edge interrupt (issues 'LINK $1F2,$1F3 WC,WZ')
L: 1=ILOCK on interrupt
PP: 00=disable, 01=any edge, 10=pos edge, 11=neg edge
%LT = timer interrupt (issues 'LINK $1F4,$1F5 WC,WZ')
L: 1=ILOCK on interrupt
T: 0=disable, 1=enable
Whenever a JMP/LINK to $1F0/$1F2/$1F4 occurs (return from interrupt), IFREE happens automatically.
This system takes about the same amount of logic, but has these advantages:
- multiple concurrent interrupts are allowed on a first-come basis, but in the event of simultaneous occurrence, they are prioritized as: streamer, pin-edge, timer.
- Timer period and pin number can be changed on the fly.
- Interrupt routines can be protected from interruption by the L mode bits, cancelling the need for hold-off counting. When they exit, IFREE always occurs.
This would take an hour to change. I like the idea of being able to have slow housekeeping interrupts with fast event-driven interrupts that are protected from interruption.

Sounds nice and flexible to me, especially if "This system takes about the same amount of logic"
This could allow a whole console computer with video, keyboard, mouse, and RTC to operate from one cog, with mainline code not needing to worry about any keyboard/mouse/RTC details - they would all be taken care of in the background.

All of that in one cog is a compelling demonstration.
Not related to INTs, but related to this example : Can the PLL be made to lock to external Sync, allowing easy Character Insertion over video ?

evanh · 2015-07-19 03:29

Thanks Chip, looks good. The period and pin select is going to all be in the Smartpins isn't it? They can't be two clock instructions can they?

cgracey · 2015-07-19 03:31

Glad you approve, potatohead! I was half-expecting you to say something like, "Stop the madness!"
I'll get this changed tonight. The implementation is just a twist on what I already finished and cleaned up yesterday. This makes it feel quite complete.

cgracey · 2015-07-19 03:34

Hi Chip

In the three instructions,

INTMODE D/# - set interrupt mode
INTPER D/# - set period for timer interrupt
INTPIN D/# - set pin for edge interrupt

Are the interrupt mode settings: %LSS_LPP_LT limited to 8 bits by space constraints or because there is no use for the ninth bit at the immediate parameter?

Yanomani

I thought of how we could use that 9th bit for the LT, to become LTT, but I don't know what kind of functionality we could add to the timer interrupt. Any ideas?

jmg · 2015-07-19 03:35

The period and pin select is going to all be in the Smartpins isn't it?

I think there is a Local timer, and also the Pin Cell timer(s) - so this ?
INTPER D/# - set period for timer interrupt
This one has a local reload-timer.
INTPIN D/# - set pin for edge interrupt
This one attaches to a Pin-channel, but the 'Pin' signal could be from a Pin-Cell timer overflow, or a Pin-Cell timer capture, or a UART ready ? or ..?.

potatohead · 2015-07-19 03:36

lol

If it got to a point where the interrupts were confusing, yeah. Maybe. But these are distinct purposes, each with a hardware vector, and frankly, will be needed to make a COG do what a COG can do. Edit: This design dispenses with "in COG" special functions. WAITVID Great! WAITVID was cool for video, but everyone really wanted it to go bi-directional.

Without these interrupts, or a tasker, these COGS actually have some limits P1 COGS do not.

What sense does that make?

For me, I was thinking about the monitor, and on chip type development I did on "hot", and replicating that would take a lot of COGS. That made the 16 look more like 6 or maybe even 4 COGS. Nice, but a far cry from where it was.

The other thing is heat. All the polling and background activity in "hot" was quite expensive! When the decision was made to cut some features, like that video system and the in COG math, it just didn't seem to make any sense to be left at such a reduced performance level.

So my one question is, "are we headed toward hot 2.0?" Seriously. Tell me we aren't, and this is all good.

cgracey · 2015-07-19 03:36

Not related to INTs, but related to this example : Can the PLL be made to lock to external Sync, allowing easy Character Insertion over video ?

Well, if you fed the XI pin with a clock that was also used in the video, I suppose you could get them to sync. I don't know, though.

jmg · 2015-07-19 03:39

So my one question is, "are we headed toward hot 2.0?" Seriously. Tell me we aren't, and this is all good.
This has to lower total power in typical use cases. The Main COG code can now spend longer in a WAITxx state, and is less likely to need a power-hogging tight polling loop.
