The New 16-Cog, 512KB, 64 analog I/O Propeller Chip

cgracey · 2015-07-14 21:50

As for one task per cog, that's perfectly fine; it's best to keep the real estate small. Having to avoid AUGS/D or ALTDS when using interrupts is totally fine with me, too!

If we don't do the CALL when AUGS/AUGD/ALTDS is in progress, but just wait another instruction or two for them to finish, there's no problem. Then, AUGS/AUGD/ALTDS can be used in both mainline and interrupt code without any problem. I've just got to make sure there are no other gotchas hiding in the shadows.
Every time I add something like this, it's hard on my brain, but it's never more than just a few lines of Verilog once it's done and cleaned up. Hub exec went in really cleanly and simply, after a LOT of brain pain. I might just be getting old. My dad said several years ago that he could only get a few good hours of programming in during the morning, and then he was shot for the rest of the day for that kind of work. In contrast, driving a tractor is very easy on the brain and relaxing to think about.

pjv · 2015-07-14 22:24

Chip;
IFFFFF you are contemplating some sort of interrupt scheme, could that then possibly include a JMPRET when executing a read or write to a hub address specified in a cog or hub location? Such a scheme would eliminate the need for polling in a tight loop. The multi-thread scheduler could really benefit from that; eliminating polling would make it much more responsive.
Cheers,
Peter (pjv)

cgracey · 2015-07-14 22:27

Chip;

IFFFFF you are contemplating some sort of interrupt scheme, could that then possibly include a JMPRET when executing a read or write to a hub address specified in a cog or hub location? Such a scheme would eliminate the need for polling in a tight loop. The multi-thread scheduler could really benefit from that; eliminating polling would make it much more responsive.

Cheers,

Peter (pjv)

Peter, I don't fully understand what you mean. Could you give an example?

cgracey · 2015-07-14 22:54

Having at least some simple mechanism to cause periodic, automatic CALLs would be good, though. I'm still ruminating over this timer interrupt idea. It's too expensive to add a bunch of hardware to allow breaks at any time; however, if I qualify the interrupt request with no-AUGS/D-in-progress and no-ALTDS-in-progress, it might have very minimal cost. I'm still trying to make sure this isn't just delusional. This would make things like background serial ports possible without much coding.

Sounds a good idea.
If you do need to pair opcodes to manage this, I'd suggest making the travel time the same in all cases, to remove any opcode-dependent jitter; that may mean a dummy delay in some opcodes.
Such a fixed delay may even make the coding simpler?

It might make things simpler, though I imagine it would slow down the majority of cases. Instructions can be regarded as atomic, we just can't interrupt if a coupled set of instructions is executing. I think these are all the cases:
1) If AUGS/AUGD have executed and their values are waiting for an instruction, don't interrupt.
2) If ALTDS is executing, the next instruction must be allowed to execute, as well.
3) If REP is active, wait for it to finish iterating.
4) Some interrupt inhibitors would be needed to allow critical segments to complete without interruption: TLOCK/TFREE. Interrupt code wouldn't ever be interrupted, unless the timer setting was too aggressive.
I think that's all the stuff to worry about. Now, how to sneak a CALL in?
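The four cases above boil down to one gating condition on a pending interrupt. Below is a minimal Python sketch of that condition, not the Verilog; the `CogState` fields and `can_take_interrupt` are hypothetical names, and the model only answers whether a pending timer CALL may be slipped in before the next instruction.

```python
from dataclasses import dataclass

@dataclass
class CogState:
    # Hypothetical snapshot of the conditions listed above; names are illustrative.
    aug_pending: bool = False   # AUGS/AUGD value waiting for its instruction
    altds_active: bool = False  # ALTDS executing; its partner instruction must follow
    rep_active: bool = False    # REP block still iterating
    tlocked: bool = False       # between TLOCK and TFREE

def can_take_interrupt(s: CogState) -> bool:
    """True if a pending timer CALL may be slipped in before the next instruction."""
    return not (s.aug_pending or s.altds_active or s.rep_active or s.tlocked)

# While an AUGS prefix is waiting, the CALL is simply deferred.
print(can_take_interrupt(CogState(aug_pending=True)))  # False
print(can_take_interrupt(CogState()))                  # True
```

Nothing here blocks re-entry from interrupt code itself, which matches the remark that only an overly aggressive timer setting would interrupt the handler.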

pjv · 2015-07-14 22:58

Well, this is now all based on the fact that there will be no hardware multi-threading. So I presume there could be some interest in doing multi-threading with a software scheduler. This already works very well on the P1, but the main task (the scheduler itself) must never be WAITed by any of its associated tasks; it is the tasks that are WAITed by the scheduler.
So, in order to signal from threads in one cog to threads in another cog, a PAR location is polled. This polling occurs when all scheduled tasks are idle (WAITing). When the poller sees an event, indicated by the contents of its PAR location, it executes an "interrupt" routine specified by those contents. The rate of this polling is set by programmable maximum and minimum values in the scheduler, but the response time to a PAR read/write event still depends on that poll.
If a mechanism existed where a read or write to the PAR location could trigger a JMPRET in the scheduler to deal with that "interrupt", then the polling could be eliminated, yielding a much more responsive and deterministic operation. After servicing the "interrupt", one would just return to the interrupted activity. To keep the hardware simpler, one could dispense with saving the C or Z status and handle that in the service routine.
Hope that explanation helps.
Cheers,
Peter (pjv)
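A minimal Python model of the polling arrangement Peter describes may make the latency issue concrete. `PollingScheduler`, `mailbox`, and `post` are hypothetical names; the mailbox stands in for the hub (PAR) location another cog writes, and each task returns True when it did work this pass (i.e. is not WAITing).

```python
class PollingScheduler:
    """Hypothetical model of a cooperative scheduler that polls a mailbox when idle."""

    def __init__(self, handlers):
        self.handlers = handlers  # mailbox value -> service routine
        self.mailbox = 0          # stands in for the hub (PAR) location; 0 = no event
        self.tasks = []           # callables returning True if they did work this pass

    def post(self, event):
        """Another cog signals an event by writing a non-zero value."""
        self.mailbox = event

    def run_once(self):
        busy = False
        for task in self.tasks:
            busy = task() or busy      # run every task, remember if any was active
        if not busy:                   # poll only when all tasks are idle
            event, self.mailbox = self.mailbox, 0
            if event:
                self.handlers[event]()
```

A posted event sits unserviced for as long as any task stays busy; the hardware trigger Peter asks for would instead run the handler at the moment of the write.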

jmg · 2015-07-15 00:00

It might make things simpler, though I imagine it would slow down the majority of cases. Instructions can be regarded as atomic, we just can't interrupt if a coupled set of instructions is executing. I think these are all the cases:

Sure, but on something like a timer, extra flight time is not an issue, jitter certainly is.
I'm not sure how many cycles each of those opcodes use, but REP sounds like a special case that would be hard to mix with jitter free.
I think there are SW alternatives to REP, if needed.

Where other opcodes are 'paired', the 'packer' time is only one opcode cycle.
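The trade-off here, a variable entry delay versus a constant worst-case one, can be sketched as below. The one-slot maximum comes from the remark about paired opcodes; REP is deliberately excluded because its deferral is unbounded. The names and the unit (instruction slots) are illustrative assumptions, not measured P2 timing.

```python
MAX_DEFER = 1  # assumption: a paired opcode delays entry by at most one instruction slot

def entry_delay(defer: int, padded: bool) -> int:
    """Extra instruction slots before the interrupt CALL is taken."""
    assert 0 <= defer <= MAX_DEFER, "unbounded (REP-style) deferral is not modelled"
    return MAX_DEFER if padded else defer   # padding always pays the worst case

def jitter(padded: bool) -> int:
    """Spread between the fastest and slowest entry over all deferral cases."""
    delays = [entry_delay(d, padded) for d in range(MAX_DEFER + 1)]
    return max(delays) - min(delays)
```

Padding makes jitter zero at the cost of always taking the slowest path, which is the "extra flight time" that a timer can tolerate.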

Cluso99 · 2015-07-15 00:12

Personally I would rather not have any interrupts. There are far more useful things to waste development time on.

Do remember we have 16 cogs.

Perhaps there are simpler and other ways to achieve some cooperative co-processing???

For example, what if there was an extra waitxxx instruction that could wait for some common general purpose flags. Now suppose some of these flags were private to the respective cogs, and some were shared between all cogs. This would be similar to the internal I/O ports discussed a long time ago. Now, suppose counters could also set/reset these.

We have hubexec, so using software, we could define vectors or a jump table, one for each of these flags. This would enable one cog to fire off a number of cogs to perform certain functions in parallel.

This is not exactly what is trying to be achieved here, but maybe it provokes some thoughts.
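A rough model of the flag-and-vector idea, assuming a 32-bit private flag word per cog ORed with a 32-bit shared word, and a software vector table indexed by flag number. None of this is in the posted instruction list; `pending_flags` and `dispatch` are hypothetical helpers.

```python
def pending_flags(private_flags, shared_flags, mask):
    """Flag numbers that would wake a WAIT on `mask`, lowest first (hypothetical)."""
    pending = (private_flags | shared_flags) & mask & 0xFFFF_FFFF
    return [bit for bit in range(32) if (pending >> bit) & 1]

def dispatch(private_flags, shared_flags, mask, vectors):
    """Call the software vector for each pending flag; `vectors` maps flag number -> routine."""
    for bit in pending_flags(private_flags, shared_flags, mask):
        vectors[bit]()
```

One cog setting a shared bit would wake every cog waiting on that bit, and each could jump through its own table entry.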

jmg · 2015-07-15 00:24

Perhaps there are simpler and other ways to achieve some cooperative co-processing???

A common flag set is a good idea for cooperative co-processing, but the timer branch idea is a little different from cooperative co-processing.

For example, a timer branch would allow a software operating-properly (a.k.a. watchdog) stub.
I can also see Debug uses, where you want to avoid changing the running code, but be able to snoop on it from some low bandwidth back channel.
Chip's item 4) above would work nicely in this use case.
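A sketch of such a watchdog stub, assuming mainline code bumps a heartbeat counter while healthy and a periodic timer handler checks that it moved. `WatchdogStub`, `kick`, and `on_timer` are hypothetical names; what happens on a trip (cog restart, report over the debug link) is left open.

```python
class WatchdogStub:
    """Sketch of a COP/watchdog check run from a periodic timer interrupt (hypothetical)."""

    def __init__(self, max_missed: int):
        self.max_missed = max_missed
        self.heartbeat = 0   # bumped by mainline code while it is healthy
        self.last_seen = 0
        self.missed = 0
        self.tripped = False

    def kick(self):
        """Called from mainline code."""
        self.heartbeat += 1

    def on_timer(self):
        """Called from the timer interrupt."""
        if self.heartbeat != self.last_seen:
            self.last_seen = self.heartbeat
            self.missed = 0
        else:
            self.missed += 1
            if self.missed >= self.max_missed:
                self.tripped = True   # real code might restart the cog or report via debug
```

Because the check runs on the same cog, it can watch that cog's own mainline, which a separate cog cannot do as directly.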

potatohead · 2015-07-15 00:55

How does HUBEXEC differ from COGEXEC?

Are the proposed atomic instruction cases the same, given the FIFO / STREAMER are involved?

David Betz · 2015-07-15 02:09

Ummm... are we starting to go down the same rathole as last time with endless suggestions for new features?

potatohead · 2015-07-15 02:15

Potentially.

Frankly, the simple, "just stop at the next instruction and jump" timer feature proposed would have some good utility for tasking. Now it's more complicated, or not good enough, etc...

Chip mentioned "shadows" and that's why I asked what I did. That's probably where the shadows lie.

I swear, if we go off on another feature slippery slope, I'm out.

Dave Hein · 2015-07-15 02:16

Oh boy, P2 day is almost here! I'm happy to wait for the op code list until the FPGA image and assembler are ready. If Chip releases bits and pieces early it will just slow down progress on the FPGA image because he will be bombarded with questions and change requests.

Is it Groundhog Day again?

ozpropdev · 2015-07-15 02:31

After moving to Propellers in my designs/projects I have developed an "allergy" to interrupts.
I feel 16 cogs is a far more "civilized" alternative.
IMHO Interrupts

jmg · 2015-07-15 02:39

I feel 16 cogs is a far more "civilized" alternative.

Those 16 COGs are still there - it is not an either/or choice.
Problems are: "use another COG" sounds simple, but you do run out of COGs; 16 is a finite number.
The other issue is that some things you simply cannot do by 'just throwing another COG on the barbie', like a COP stub on the running COG, or a Debug stub.
Chip raised the topic, so I'm happy to let him explore it.

potatohead · 2015-07-15 03:07

Yeah, I'm there too. We don't need it at all. And simple seems to always end up messy.

Tor · 2015-07-15 07:46

Hmm.. on the desktop PC I use threads, not to "do more in parallel", I do it to simplify the design: One thread to forever listen to a port (and that's all), for example. Imagine a terminal emulator working with a serial port: One thread listening to the keyboard, another thread listening to the serial port. The result is less code, the simplest design possible.

So in general I use threads as one would use cogs: Dedicate a thread to do one specific thing. Another to do something else. Thus remove complexity.

The thing is: I never need 16 threads for any (PC) project.. not even close. I don't see the need for an interrupt system on the P2 if the worry is only to run out of cogs. There was some discussion earlier about counters and pins and internal ports (in the previous interrupt/no interrupt discussion). One should possibly look there instead if anything should be done at all. I mean, timers are fine.. but I don't need it to interrupt a running process.

-Tor

Cluso99 · 2015-07-15 08:10

There are far more useful things that could be done if there were time. IMHO, interrupts are not one of them.

evanh · 2015-07-15 09:51

Chip,
I've just been skimming the instruction list ... The reverse subtract and compares seem extraneous. What's their purpose?

MJB · 2015-07-15 10:52

An internal FLAG vector / hidden port to WAITxxx on would surely have many uses, especially if it were shared between all the COGs, so there would be no need to waste pins on it.

Dave Hein · 2015-07-15 12:26

Chip,
I've just been skimming the instruction list ... The reverse subtract and compares seem extraneous. What's their purpose?

There are times when you want to do "B = A - B" instead of "A = A - B". At least I think that's what the reverse subtract is for. The reverse compare is just a special case of the reverse subtract where the result is not stored. I don't see a strong argument for keeping the reverse compare.

ozpropdev · 2015-07-15 13:12

@evanh
sub dest,src (dest = dest - src)
subr dest,src (dest = src - dest)

evanh · 2015-07-15 13:38

Doh! Of course, D is simultaneously an input and output. That makes sense now.

Seairth · 2015-07-15 14:20

@cgracey: There are two LINK instructions in the latest instruction set.
(edit: updated with more context, and a mention, or whatever it's called.)

Ariba · 2015-07-15 15:46

Chip,
I've just been skimming the instruction list ... The reverse subtract and compares seem extraneous. What's their purpose?

...I don't see a strong argument for keeping the reverse compare.

Not sure if this argument is strong, but you can do a higher-than comparison using only the C flag. This spares the Z flag for other purposes. The reverse compare comes for free anyway; it's just a SUBR without write-back.

Andy
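To make Andy's point concrete, here is a sketch of the unsigned flag results, assuming CMP sets C on the borrow of D - S (as on the P1) and CMPR sets C on the borrow of S - D. Under those assumptions, "D higher than S" needs both C and Z after CMP, but only C after CMPR. The function names are illustrative.

```python
def cmp_flags(d, s):
    """CMP D,S modelled on unsigned D - S: C = borrow (D < S), Z = (D == S)."""
    return d < s, d == s

def cmpr_flags(d, s):
    """CMPR D,S modelled on unsigned S - D: C = borrow (S < D), i.e. D > S; Z = (D == S)."""
    return s < d, d == s

def higher_with_cmp(d, s):
    c, z = cmp_flags(d, s)
    return not c and not z   # needs both flags

def higher_with_cmpr(d, s):
    c, _ = cmpr_flags(d, s)
    return c                 # C alone; Z need not be written (no WZ)
```

Inputs are assumed to be unsigned 32-bit values, as on the cog.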

rabaggett · 2015-07-15 17:53

rabaggett · 2015-07-15 18:06

I need to figure out how to quote.
I find lots of ways to keep a cog busy. I'd trade speed for interrupts any day.

cgracey · 2015-07-15 22:33

Chip,
I've just been skimming the instruction list ... The reverse subtract and compares seem extraneous. What's their purpose?

...I don't see a strong argument for keeping the reverse compare.

Not sure if this argument is strong, but you can do a higher-than comparison using only the C flag. This spares the Z flag for other purposes. The reverse compare comes for free anyway; it's just a SUBR without write-back.

Andy

They come in really handy, sometimes. They save a few instructions when you need them.

cgracey · 2015-07-15 22:42

Hmm.. on the desktop PC I use threads, not to "do more in parallel", I do it to simplify the design: One thread to forever listen to a port (and that's all), for example. Imagine a terminal emulator working with a serial port: One thread listening to the keyboard, another thread listening to the serial port. The result is less code, the simplest design possible.

So in general I use threads as one would use cogs: Dedicate a thread to do one specific thing. Another to do something else. Thus remove complexity.

The thing is: I never need 16 threads for any (PC) project.. not even close. I don't see the need for an interrupt system on the P2 if the worry is only to run out of cogs. There was some discussion earlier about counters and pins and internal ports (in the previous interrupt/no interrupt discussion). One should possibly look there instead if anything should be done at all. I mean, timers are fine.. but I don't need it to interrupt a running process.

-Tor

I like a simple timer interrupt because it does make code a lot simpler when you want to do a few concurrent things in a single cog. For example, I'd hate the Monitor program to need extra cogs for handling serial. On the Prop2 Hot, we had multi-tasking, which was really nice. This will get us similar functionality with coarser time granularity, but way less hardware.
It's pretty much figured out now, and I'm starting the implementation. The Verilog code to do this is quite minimal, and it mainly involves muxing 'LINK $1F4,$1F5 WC,WZ' into the instruction pipe and then muxing different values into the result. That LINK instruction that David Betz requested is perfect for something like this.
There will be an instruction 'TSHARE D/#' to enable/disable the timer interrupt. If D/# is zero, the timer interrupt is disabled; otherwise, D/# sets the number of clocks between interrupts (i.e., #200 means every 200 clocks). Two other instructions, TLOCK/TFREE, can be used to prevent the interrupt from occurring within a block of code. When the timer rolls under and reloads, a bit is set which gets cleared as soon as the 'LINK' instruction can be issued.
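A clock-by-clock Python sketch of the behaviour just described, under stated assumptions: the counter decrements once per clock, a reload value of 0 disables it, and the actual LINK $1F4,$1F5 injection is reduced to a counter. `TimerShare` and its methods are hypothetical stand-ins for TSHARE/TLOCK/TFREE, not the Verilog.

```python
class TimerShare:
    """Hypothetical model of the TSHARE timer interrupt with TLOCK/TFREE gating."""

    def __init__(self):
        self.reload = 0       # 0 = interrupt disabled
        self.count = 0
        self.pending = False  # the bit set on roll-under
        self.locked = False   # set by TLOCK, cleared by TFREE
        self.links = 0        # number of injected LINKs (interrupt entries)

    def tshare(self, clocks):
        self.reload = clocks
        self.count = clocks
        self.pending = False

    def tlock(self):
        self.locked = True

    def tfree(self):
        self.locked = False

    def clock(self, can_issue=True):
        """Advance one clock; can_issue is False while AUGS/AUGD/ALTDS/REP block entry."""
        if self.reload:
            self.count -= 1
            if self.count == 0:       # roll under: reload and flag the request
                self.count = self.reload
                self.pending = True
        if self.pending and can_issue and not self.locked:
            self.pending = False      # cleared as soon as LINK can be issued
            self.links += 1
```

With a single pending bit, several roll-unders during a long TLOCK block collapse into one interrupt entry once TFREE executes.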

jmg · 2015-07-15 22:49

I like a simple timer interrupt because it does make code a lot simpler when you want to do a few concurrent things in a single cog. For example, I'd hate the Monitor program to need extra cogs. On the Prop2 Hot, we had multi-tasking, which was really nice. This will get us similar functionality with coarser time granularity, but way less hardware.
It's pretty much figured out now, and I'm starting the implementation. The Verilog code to do this is quite minimal, and it mainly involves muxing 'LINK $1F4,$1F5 WC,WZ' into the instruction pipe and then muxing different values into the result. That LINK instruction that David Betz requested is perfect for something like this.

Sounds great. Good debug is going to be important, along with operational watchdog uses.
Here is an example of the 'state of play': http://www.bbc.com/news/technology-33409311
This has a separate debug MCU to handle USB and the debug link.

cgracey · 2015-07-15 22:55

About the reverse subtract and compare instructions SUBR/CMPR: they are really useful for processing CNT values, where you want CNT minus a register into the register. They make code a lot tighter in those cases.
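A sketch of the CNT case, assuming SUBR D,S computes S - D into D with 32-bit wraparound (as ozpropdev describes above). The register starts out holding an earlier CNT sample; one SUBR with the current CNT leaves the elapsed clocks in that same register. Because the subtraction is modulo 2^32, the result stays correct even when CNT wraps between samples.

```python
MASK32 = 0xFFFF_FFFF

def subr(d, s):
    """Model of SUBR D,S: D = S - D, wrapped to 32 bits."""
    return (s - d) & MASK32

start = 0xFFFF_FF00              # register holds CNT sampled just before the counter wraps
cnt_now = 0x0000_0100            # CNT sampled later, after the wrap
start = subr(start, cnt_now)     # one instruction: register becomes CNT - register
print(hex(start))                # 0x200
```

With plain SUB, the same result would need an extra instruction, for example negating the register and then adding CNT.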
