The New 16-Cog, 512KB, 64 analog I/O Propeller Chip

jmg · 2014-04-23 19:02

Rayman wrote: »

Maybe they're thinking that using smaller pin spacing means that pins are closer to the die and therefore are better at dissipating heat...

'They' meaning Parallax ? - the Thermal PAD does most of the cooling on the 14mm package, and 0.5mm is ok for Solder paste/reflow volume production.

Amkor spec a 20x20mm package, with smaller Thermal PAD, at close to the same as the 14 x14mm one ( 20.0 vs 20.6 °C/W)
- so the bigger package and more leads, helps the cooling, but most of the work is done by the Thermal PAD and Via-array.

Seairth · 2014-04-23 19:59

evanh wrote: »

Start with NOP, or more accurately IF_NEVER. There are currently over 268 million possible variants of NOP in the instruction set.

I wonder if there would be any value in getting rid of the predicate field altogether. You could still have an instruction that could conditionally NOP the following instruction:

        if_z
        mov f1, f2

Yes, it would use up a few additional registers. The instruction could even include a field for the number of following instructions that are "gated":

        if_z #2         ' the next two instructions are gated
        mov f1, f2
        add f1, f3
        cmp f1, #32     ' first non-gated instruction

You could even support the common if/else idiom (e.g. where you get an instruction with the IF_Z predicate, followed by an instruction with the IF_NZ predicate):

        if_z #2, #1     ' the next two instructions are executed if Z=1, otherwise NOP.  After that, the next (one) instruction is executed if Z<>1, otherwise NOP.
        mov f1, f2
        add f1, f3
        cmp f1, #32
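
A minimal sketch of the gating semantics described above, written in Python because `if_z #n, #m` is a hypothetical instruction and not something an assembler accepts. The function name `run_gated` and the tuple format are assumptions made for illustration only. Every slot is still stepped through whether it runs or not, which is what keeps the timing independent of Z.

```python
def run_gated(program, z):
    """Return the mnemonics that actually execute for a given Z flag.

    program is a list of tuples. ("if_z", n_if, n_else) models the
    hypothetical gating instruction; any other tuple is an ordinary
    instruction whose first element is its mnemonic.
    Nested gating instructions are not modeled.
    """
    executed = []
    gates = []  # one boolean per upcoming gated slot: True means "execute"
    for instr in program:
        if gates:
            # This slot is gated: it is consumed either way (acts as a NOP if False).
            if gates.pop(0):
                executed.append(instr[0])
            continue
        if instr[0] == "if_z":
            _, n_if, n_else = instr
            # The first n_if slots run when Z=1, the next n_else slots when Z=0.
            gates = [z] * n_if + [not z] * n_else
        else:
            executed.append(instr[0])
    return executed


if_else_example = [("if_z", 2, 1), ("mov",), ("add",), ("cmp",)]
print(run_gated(if_else_example, z=True))
print(run_gated(if_else_example, z=False))
```

With Z=1 only the `mov` and `add` run; with Z=0 only the `cmp` runs, matching the if/else reading of the example.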

And, on the flip side, you would also gain 4 bits per instruction. So, if you don't use predicates any more than an average of once per 8 instructions, you'd actually be increasing code density! Just think of the things you could maybe do with those extra bits:

Increase the instruction size (which isn't really necessary for P1+, I believe).
Add flag bits for indirect registers (Chip mentioned this several days ago; not sure if this is still possible)
Increase D/S size. (I wonder how much HUB ram you'd have to give up to get 1024 registers instead.)
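
The "once per 8 instructions" break-even figure follows from simple bit counting, sketched here in Python. It assumes 32-bit instructions with a 4-bit predicate field, and that each gating instruction costs one full 32-bit slot; `net_bits_saved` is a hypothetical helper name. The freed bits only count as a gain if they are actually put to use, as in the list above.

```python
INSTR_BITS = 32      # width of one instruction slot
PREDICATE_BITS = 4   # width of the predicate field that would be removed


def net_bits_saved(n_instructions, n_gate_instructions):
    """Bits freed across n_instructions, minus the cost of the extra gating slots."""
    freed = n_instructions * PREDICATE_BITS
    cost = n_gate_instructions * INSTR_BITS
    return freed - cost


# One gating instruction per 8 ordinary instructions is exactly break-even.
print(net_bits_saved(8, 1))
# Using predicates less often than that leaves a net gain.
print(net_bits_saved(16, 1))
```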

In truth, I don't really see this happening. But it's fun to think about.

jmg · 2014-04-23 20:22

Seairth wrote: »
You could even support the common if/else idiom (e.g. where you get an instruction with the IF_Z predicate, followed by an instruction with the IF_NZ predicate):
        if_z #2, #1     ' the next two instructions are executed if Z=1, otherwise NOP.  After that, the next (one) instruction is executed if Z<>1, otherwise NOP.
        mov f1, f2
        add f1, f3
        cmp f1, #32

Do you mean this as a Conditional Skip instruction ?

With JUMPS now taking longer, and a growing number of cases where jumps are best avoided
(LMM-feeders, and Execute in Place engines on QuadSPI memory) there is merit in having a SKIP opcode.

When using Skip, the code-fetch stream is not broken as it is with JUMPS; instead, some code is just NOP'd or ignored as it streams past.

For small-sized code decisions, this will execute faster than Jumps

It can also be smaller, and have a faster longest path, to use pseudocode like this
   Var = GuessValue
   If TestGuessWrong Then  // this can be a skip, faster than a JUMP ?
      Var = FixGuessValue
   end if 

than this 
   If TestGuessWrong Then
      Var = FixGuessValue
      // a JUMP is needed here
   Else 
      Var = GuessValue
   end if

Electrodude · 2014-04-23 20:45

evanh wrote: »

Start with NOP, or more accurately IF_NEVER. There are currently over 268 million possible variants of NOP in the instruction set.

It would really be nice if that were true, because then Chip would be able to put instructions there so he could re-add the wr bit. Unfortunately, all 268435456 of the IF_NEVER instructions are used, as disabled forms of instructions that are still enablable. If you want to reversibly disable an instruction (on the P1), all you have to do is clear all the conditional bits. When you want to re-enable it, you just put them back to 1111 or whatever they used to be.
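
As a rough illustration of that disable/re-enable trick, here is a Python sketch that operates on a 32-bit P1 instruction word. It relies on the P1 layout in which the 4-bit condition field occupies bits 21..18; the helper names `disable` and `enable` are made up for this example, and in a real program the same masking would be done in PASM on the instruction's own register.

```python
COND_SHIFT = 18                 # P1 condition field sits in bits 21..18
COND_MASK = 0xF << COND_SHIFT
WORD_MASK = 0xFFFFFFFF


def disable(instr):
    """Clear the condition bits (IF_NEVER). Returns (disabled_word, saved_condition)."""
    saved = (instr & COND_MASK) >> COND_SHIFT
    return instr & ~COND_MASK & WORD_MASK, saved


def enable(instr, cond):
    """Put a previously saved 4-bit condition back into the word."""
    return (instr & ~COND_MASK & WORD_MASK) | ((cond & 0xF) << COND_SHIFT)


word = 0xA0BC0A01               # arbitrary example word with condition 1111
off, cond = disable(word)
print(hex(off), bin(cond))
print(hex(enable(off, cond)))

# 32 bits minus the 4 condition bits leaves 28 bits of "disabled" encodings.
print(2 ** 28)
```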

I really like the idea of repurposing conditionals in cases like IF_[N]Z NEGZ x, y and such, though. Make sure you don't do that for IF_[N][ZC] mux[n][zc] though, because that's actually useful (set/clear if the flag is set, otherwise leave it alone).

Seairth wrote: »
I wonder if there would be any value in getting rid of the predicate field altogether. You could still have an instruction that could conditionally NOP the following instruction:
        if_z
        mov f1, f2

Please don't do that. It's worse than AUGx.

electrodude

Cluso99 · 2014-04-23 20:53

The conditionals on each instruction made for deterministic code and for alternative instructions on conditions (if z load #0, if c load #F etc) without having to use a jump. It makes simple code easy.

As for package, Parallax is aiming at the commercial market and the realities are they want small packages. I thought 0.4 pitch was tight, but 0.5 can be hand soldered with care and practice. The center ground pad makes this a bit harder, but still doable. Tip: if you plan on hand soldering, extend the pads out from the IC another 0.5-1.0mm. This makes it easier to draw the iron out from the pins and remove any shorts.

tonyp12 · 2014-04-23 21:09

0.50mm should be fine with a stencil (oshstencils.com etc.); just try to drop the IC straight down the first time, as you don't want to smudge the paste.

Phil Pilgrim (PhiPi) · 2014-04-23 21:15

With that fine a pitch, you do have to be more careful. And you can't put soldermask between the pads, either, because it'll be too narrow to adhere to the board properly and could drift over the pads themselves. So precision pasting and placement are vital to prevent solder bridges. (BTW, I don't even mask between pads with 0.65mm spacing in my layouts.)

-Phil

Lawson · 2014-04-23 22:20

0.4mm pin pitch will only be an annoyance. While I don't expect to be able to individually solder more than half the pins without bridging, flux and solder wick cleans up bridges so beautifully it's depressing. i.e. the joints I cleaned up look *better* than all the joints I did right the first time. (as long as all the pins of the chip touch the pcb pads)

Marty

Cluso99 · 2014-04-23 23:02

Power draw will be interesting with ~1/20th LEs per cog, and the ALU divided into 6 blocks to reduce consumption. Might even end up being a lowish power ic, nothing at all as originally expected.

evanh · 2014-04-24 01:00

Cluso99 wrote: »

Power draw will be interesting with ~1/20th LEs per cog, and the ALU divided into 6 blocks to reduce consumption. Might even end up being a lowish power ic, nothing at all as originally expected.

Yeah. Also, cool running is one of those things that's under-appreciated until it's gone.

Heater. · 2014-04-24 01:19

On other threads Loopy is implementing a watchdog timer in a COG so as to emulate Arduino behaviour.

Which makes me think: "Does the P2 have a hardware watchdog like AVRs and other MCUs and SoCs do?"

Cluso99 · 2014-04-24 01:28

Heater. wrote: »

On other threads Loopy is implementing a watchdog timer in a COG so as to emulate Arduino behaviour.

Which makes me think: "Does the P2 have a hardware watchdog like AVRs and other MCUs and SoCs do?"

You can always implement a software watchdog in its own cog. This is the beauty of cogs (cores). It can watch as many cogs (cores) as you like.

jmg · 2014-04-24 01:52

Heater. wrote: »

On other threads Loopy is implementing a watchdog timer in a COG so as to emulate Arduino behaviour.

Which makes me think: "Does the P2 have a hardware watchdog like AVRs and other MCUs and SoCs do?"

Watchdogs take many forms, the more strict software standards require Watchdogs that are independent of Main SysClk (so many uC have their own WDOG Osc), and some have Osc Fail detects on the Main Osc.

Those that do not, can use external Watchdogs, and an Async Reset to force a known state on WDOG fail.

Also, when coming out of reset, being able to confirm OSC operation (Xtals, PLL etc) before change-over is another common requirement.

I think Chip was going to look at improving details around this, with some more counter options.

If hardware tasking makes the cut (hopefully it will), some quite advanced Watchdog / COP can be done in one Task, as well as core-dump style failure reports.

Heater. · 2014-04-24 03:25

I have seen all kinds of watchdogs. A simple C and R can do it on the cheap. I have seen watchdogs on watchdogs in avionics! In the extreme you end up with triple or quad redundant systems all being "watchdog" for each other.

What bugs me about a software watchdog in COG is that it is quite possible for crashed code in other COGS to stop the watchdog by reinitializing the COG.

Seairth · 2014-04-24 04:54

Cluso99 wrote: »

The conditionals on each instruction made for deterministic code and for alternative instructions on conditions (if z load #0, if c load #F etc) without having to use a jump. It makes simple code easy.

In my post, the IF_xxx instruction still results in deterministic code, as it doesn't cause a jump (the same clock cycles are consumed whether it executes the gated instructions or not). This, I believe, is the same thing that jmg was referring to as SKIP. As for my ruminations on how it might be extended, none of that's really necessary to simply implement existing functionality. And, if this allowed us to go to 1024 registers per cog, it's not like the additional instruction would be taking up too much space.

But, since I'm on the topic, suppose the following syntax:

IFELSE #cond [, #if [, #else]]

where #cond is one of the 16 possible conditional values, #if is the number of instructions (0-31) that get executed if cond is true and ignored (treated as NOPs) if cond is false, and #else is the inverse of that. #else defaults to zero if left off and #if defaults to one if left off. To make things a little cleaner in code, the instruction could have a simpler "IF #cond [, #if]" alias. This instruction would require one 4-bit field and two 5-bit fields to encode the information, which should be easy to do if the INSTR, D, and S were each extended by one bit (the cond could be encoded as the INSTR lsb plus Z/C/I, and #if[/#else/I] could be packed in D or S).

(Incidentally, IF always, #0 would be a NOP, meaning that no dedicated NOP instruction would have to be added, just an alias.)
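
To make the field budget concrete, here is a small Python sketch that packs the one 4-bit and two 5-bit fields into a single 14-bit value. The bit positions are purely an assumption for illustration (the post only says the fields would be spread across INSTR, Z/C/I and D or S), and `pack_ifelse` / `unpack_ifelse` are hypothetical names. The defaults mirror the ones described above: #if defaults to one, #else to zero.

```python
def pack_ifelse(cond, n_if=1, n_else=0):
    """Pack cond (0-15), n_if (0-31) and n_else (0-31) into 14 bits.

    Assumed layout: cond in bits 13..10, n_if in bits 9..5, n_else in bits 4..0.
    """
    if not 0 <= cond <= 15:
        raise ValueError("cond must fit in 4 bits")
    if not 0 <= n_if <= 31 or not 0 <= n_else <= 31:
        raise ValueError("instruction counts must fit in 5 bits")
    return (cond << 10) | (n_if << 5) | n_else


def unpack_ifelse(word):
    """Inverse of pack_ifelse: returns (cond, n_if, n_else)."""
    return (word >> 10) & 0xF, (word >> 5) & 0x1F, word & 0x1F


IF_ALWAYS = 0b1111  # P1 encoding of the always-true condition

print(unpack_ifelse(pack_ifelse(0b1010, 2, 1)))
# "IF always, #0" gates nothing, so it behaves as a NOP.
print(unpack_ifelse(pack_ifelse(IF_ALWAYS, 0)))
```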

Seairth · 2014-04-24 05:08

Heater. wrote: »

What bugs me about a software watchdog in COG is that it is quite possible for crashed code in other COGS to stop the watchdog by reinitializing the COG.

Agreed! I wonder how hard it would be to change COGINIT such that COG 0 could not be restarted by any other cog other than itself. This would at least allow one cog to be protected.

Alternatively, maybe Chip could add a HUBOP that enables/disables COGINIT for a given cog or set of cogs. With this, a cog could protect itself by disabling COGINIT for itself. This would not completely prevent other cogs from calling COGINIT on that cog, but would now require an additional HUBOP to re-enable COGINIT first. This two-step process would probably be enough to protect from accidental COGINITs from errant cogs. (I'd probably also have a cog that calls COGSTOP on itself implicitly re-enable COGINIT.)
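
A toy Python model of that two-step protection, just to pin down the intended behaviour. No such HUBOP exists; the class `Hub` and its method names are hypothetical, and the model ignores everything about a real cog except a running flag and a per-cog COGINIT enable bit.

```python
class Hub:
    """Toy model of a hub with a per-cog COGINIT enable bit (hypothetical feature)."""

    def __init__(self, n_cogs=16):
        self.running = [False] * n_cogs
        self.coginit_enabled = [True] * n_cogs

    def set_coginit_enable(self, cog, enabled):
        # Models the proposed HUBOP that enables/disables COGINIT for one cog.
        self.coginit_enabled[cog] = enabled

    def coginit(self, target):
        # Returns False, leaving the target untouched, while it is protected.
        if not self.coginit_enabled[target]:
            return False
        self.running[target] = True
        return True

    def cogstop(self, caller, target):
        self.running[target] = False
        if caller == target:
            # A cog stopping itself implicitly gives up its protection.
            self.coginit_enabled[target] = True


hub = Hub()
hub.coginit(0)                       # start the watchdog cog
hub.set_coginit_enable(0, False)     # the watchdog protects itself
print(hub.coginit(0))                # an errant COGINIT is refused
hub.set_coginit_enable(0, True)      # the deliberate two-step path
print(hub.coginit(0))
```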

rjo__ · 2014-04-24 05:43

Do we know yet if the new FPGA image will have connections to the available SDRAM of the DE2 and/or Nano?
Will an SDRAM driver with the kind of throughput in the previous chip be possible using one cog or are we looking at multiple cog implementations to get there?

Seairth · 2014-04-24 05:53

Heater. wrote: »

Which makes me think: "Does the P2 have a hardware watchdog like AVRs and other MCUs and SoCs do?"

And this reminds me of my earlier post about changing the way WAITCNT worked (or, at least, adding NEXTCNT/TESTCNT/JMPCNT). It occurred to me that my suggestion was reminiscent of timers. Which also has a bit of overlap with the existing counters.

In typical hardware timers, you have:

A timer frequency
A timer duration
An interrupt to indicate when the duration has been reached

So, suppose each cog had 4 timers (or more?). The instructions might include:

SETTMR #c/D, #p/S, #n : sets a timer
GETTMR D, #n : gets the current count and sets the C flag if the timer has expired
TESTTMR #n : Sets the C flag if the timer has expired
GETTMRS D, #m : gets all timers matching mask #m and sets the matching bit if expired (Z = 1 means none of the masked timers have expired yet)
WAITTMR #m : waits until one of the masked timers expires
JMPTMR D, #n : Jumps to D if the timer has expired
JMPNTMR D, #n : Jumps to D if the timer has not expired

where

#n is an immediate value between 0 and 3, indicating the specific timer
#c/D is the number of time units before the timer expires
#p/S is the period (in clock cycles) between each decrement of #c.
#m is a bit mask of the timers
D is the address to jump to.

So, one-second timer might look like:

SETTMR #1, period, #0 ' where period is a register containing the frequency value

The timer range (at 200MHz) would be up to 2940.826 years. And you could also call "SETTMR #0, #0, #n" to disable a timer (to reduce power usage).
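
A behavioural sketch of these timers in Python, for anyone who wants to play with the semantics. None of this is real hardware: the class `CogTimers` and the `tick` method are hypothetical, register widths are not modeled, and it assumes a timer "expires" when its count reaches zero and that a zero count or zero period means "disabled".

```python
class CogTimers:
    """Toy model of the proposed per-cog timers (SETTMR/GETTMR/TESTTMR/GETTMRS)."""

    def __init__(self, n_timers=4):
        self.count = [0] * n_timers
        self.period = [0] * n_timers
        self.phase = [0] * n_timers       # clocks since the last decrement
        self.expired = [False] * n_timers

    def settmr(self, count, period, n):
        self.count[n], self.period[n] = count, period
        self.phase[n] = 0
        self.expired[n] = False

    def tick(self, cycles=1):
        """Advance all timers by the given number of system clocks."""
        for n in range(len(self.count)):
            if self.count[n] == 0 or self.period[n] == 0:
                continue                  # disabled, or already expired
            self.phase[n] += cycles
            steps, self.phase[n] = divmod(self.phase[n], self.period[n])
            self.count[n] = max(0, self.count[n] - steps)
            if self.count[n] == 0:
                self.expired[n] = True

    def gettmr(self, n):
        return self.count[n], self.expired[n]   # (value, C flag)

    def testtmr(self, n):
        return self.expired[n]                   # C flag only

    def gettmrs(self, mask):
        bits = sum(1 << n for n, e in enumerate(self.expired) if e) & mask
        return bits, bits == 0                   # (expired bits, Z flag)


timers = CogTimers()
timers.settmr(1, 1000, 0)     # one time unit of 1000 clocks on timer 0
timers.tick(999)
print(timers.testtmr(0))
timers.tick(1)
print(timers.gettmr(0), timers.gettmrs(0b0001))
```

A watchdog built on this would re-arm its timer with `settmr` on every healthy pass and treat an expired flag as the failure condition.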

You can see how a watchdog could easily use this feature. And, taking it back to my original inspiration for NEXTCNT/TESTCNT/etc, FDS could also use this for the bit timing. This might even supplant some uses of CTRx.

evanh · 2014-04-24 05:54

rjo__ wrote: »

Do we know yet if the new FPGA image will have connections to the available SDRAM of the DE2 and/or Nano?
Will an SDRAM driver with the kind of throughput in the previous chip be possible using one cog or are we looking at multiple cog implementations to get there?

Unlikely given the reduced pin count. However, I wouldn't discount it completely; the new direction of moving non-core functionality away from Cogs is a sign of a wider selection of specialised smarts in general.

Seairth · 2014-04-24 06:02

rjo__ wrote: »

Do we know yet if the new FPGA image will have connections to the available SDRAM of the DE2 and/or Nano?
Will an SDRAM driver with the kind of throughput in the previous chip be possible using one cog or are we looking at multiple cog implementations to get there?

I imagine the SDRAM driver will be a bit harder to do now. The last version of the code I saw depended on AUX and SETXFR, neither of which are available in P1+.

rjo__ · 2014-04-24 07:49

Seairth,

Thanks

On the first pass, I wouldn't expect any connections. But the adaptor boards only have room for about 32+/-pins. That leaves plenty of room to dedicate prop pins to available hardware assets on the FPGA boards. I know that there were SDRAM solutions for the Prop1, so it should be possible to do something on the P1Rex. Having the connections would allow prototyping complete solutions for which the new chip seems uniquely suited. I know that Chip dealt with some hairy constraints and I was just wondering what those constraints mean in the context of the new chip. It is perfect for third party help, if the basic connections are crafted.

Rich

Lawson · 2014-04-24 08:46

Heater. wrote: »

I have seen all kinds of watchdogs. A simple C and R can do it on the cheap. I have seen watchdogs on watchdogs in avionics! In the extreme you end up with triple or quad redundant systems all being "watchdog" for each other.

What bugs me about a software watchdog in COG is that it is quite possible for crashed code in other COGS to stop the watchdog by reinitializing the COG.

I've used an external analog watchdog in a recent project. It's based on a charge pump, so periodic bursts of pin toggling are needed to re-set it. I figure it's less likely for the code to crash with the watchdog pin toggling than with the watchdog pin high or low.

If the P2 does get a watchdog, it sounds like a good thing to put next to the reset pin and config with the MSGX instructions the "smart pins" use.

Marty

jmg · 2014-04-24 13:14

Seairth wrote: »

In typical hardware timers, you have:
A timer frequency

A timer duration

An interrupt to indicate when the duration has been reached

So, suppose each cog had 4 timers (or more?). The instructions might include:
SETTMR #c/D, #p/S, #n : sets a timer

GETTMR D, #n : gets the current count and sets the C flag if the timer has expired

TESTTMR #n : Sets the C flag if the timer has expired

GETTMRS D, #m : gets all timers matching mask #m and sets the matching bit if expired (Z = 1 means none of the masked timers have expired yet)

WAITTMR #m : waits until one of the masked timers expires

JMPTMR D, #n : Jumps to D if the timer has expired

JMPNTMR D, #n : Jumps to D if the timer has not expired

Chip is placing Timer/Counter Cells at the pins, and the usual pin Read visible to the COG can be remapped from a Pin, to a Timer Flag. (this in addition to the std P1 cloned COG Adder/Counter)
That will cover most of your cases above, and I guess even the multiple masked ones, via the waitpin mask.
It will cover full range PWM, and quadrature counting, and other capture features.

Chip did mention some means to start multiple Pin-cells at the same time ( IIRC the Pin Write remapped to an Enable/Trigger)

These opcodes handle Pin-cell communication

MSGIN	D,S/#			(receives message on pin, C=timeout)
MSGOUTA	D/#,S/#			(send message to pin(s) on OUTA)
MSGOUTB	D/#,S/#			(send message to pin(s) on OUTB)
MSGDIRA	D/#,S/#			(send message to pin(s) on DIRA)
MSGDIRB	D/#,S/#			(send message to pin(s) on DIRB)

jmg · 2014-04-24 13:25

rjo__ wrote: »

Do we know yet if the new FPGA image will have connections to the available SDRAM of the DE2 and/or Nano?
Will an SDRAM driver with the kind of throughput in the previous chip be possible using one cog or are we looking at multiple cog implementations to get there?

To support SDRAM and LCD Display writes, some form of parallel strobed IO will be needed.
I think the Video-pathway could manage this, by supporting bypass of the DAC choice, and also allowing burst-read

That's not quite a full SDRAM driver, but it would support reasonable software-assisted burst access R/W and should allow good bandwidths on streaming writes.

With a little care around the Chip selects, it may be possible to parallel a 16b LCD bus to 16b SDRAM and alternate flows ?
ie Software does the command control, and HW can stream a buffer-size in or out (or by Counter control).

JonnyMac · 2014-04-24 13:38

I've been very busy coding several P1 projects so I haven't visited this forum.

For the most part, I'm thrilled with the Propeller as it is, and will heartily welcome more cogs and memory. What I'd really love to have is the ability to do set-and-forget PWM control using a counter -- control that allows me to maintain duty cycle and frequency without having to reload the phsx register in a loop. This feature does in fact exist in the SX48, and I used it for motor control in a product built by Camera Turret Company (www.cameraturret.com). With the demise of the SX, I encouraged Lou to embrace the Propeller and now all of his products use it. What's frustrating, though, is that we have to use a cog to run motors with a specific duty cycle and frequency.

In my idyllic world, there would be two frqx registers for each counter: frx1 and frx2. The frx1 register would be used in the modes we know and use today. In set-and-forget mode, frx1 would hold the "on" ticks for a pin, and the frx2 register would hold the "off" ticks for a pin (of course, those would be reversed for a differential pin).
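
For a sense of what would get loaded into those two registers, here is a small Python sketch that turns a PWM frequency and duty cycle into "on" and "off" tick counts. The function name `pwm_ticks` and the rounding choices are assumptions for illustration; the frx1/frx2 roles are the ones described above.

```python
def pwm_ticks(sysclk_hz, pwm_hz, duty):
    """Return (on_ticks, off_ticks) for one PWM period.

    on_ticks would go in frx1 and off_ticks in frx2 in the proposed
    set-and-forget mode. duty is a fraction between 0.0 and 1.0.
    """
    if not 0.0 <= duty <= 1.0:
        raise ValueError("duty must be between 0.0 and 1.0")
    period_ticks = round(sysclk_hz / pwm_hz)
    on_ticks = round(period_ticks * duty)
    return on_ticks, period_ticks - on_ticks


# 20 kHz motor PWM at 25% duty on an 80 MHz system clock.
print(pwm_ticks(80_000_000, 20_000, 0.25))
```

Once both values are written, the counter would alternate between them on its own, with no cog needed to reload anything.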

It's a small request, Chip... what do you say?

jmg · 2014-04-24 13:48

JonnyMac wrote: »

It's a small request, Chip... what do you say?

You are in luck - This (and many other) features are already in the Pin Counter cells, see Thread Putting smarts into the I/O pins

Invent-O-Doc · 2014-04-24 13:49

Glad that is in the smart pin already. I see diminishing value to a lot of things that used to take a COG. I mean, there's 16 of them now. I won't mind doing a watchdog timer, or serial driver, or whatever....

Seairth · 2014-04-24 14:22

jmg wrote: »

Chip is placing Timer/Counter Cells at the pins, and the usual pin Read visible to the COG can be remapped from a Pin, to a Timer Flag. (this in addition to the std P1 cloned COG Adder/Counter)
That will cover most of your cases above, and I guess even the multiple masked ones, via the waitpin mask.
It will cover full range PWM, and quadrature counting, and other capture features.

Chip did mention some means to start multiple Pin-cells at the same time ( IIRC the Pin Write remapped to an Enable/Trigger)

Yeah, it might be possible to re-utilize the smart pins. Maybe. But the timers I speak of are internal only. While there is similarity to the CTRx and smart pins, they are not (necessarily) meant to drive pins. It seems like a misuse of the smart pins, but I'll happily reserve judgement until I see what Chip comes up with.

jmg · 2014-04-24 15:19

Seairth wrote: »

Yeah, it might be possible to re-utilize the smart pins. Maybe. But the timers I speak of are internal only. While there is similarity to the CTRx and smart pins, they are not (necessarily) meant to drive pins. It seems like a misuse of the smart pins, but I'll happily reserve judgement until I see what Chip comes up with.

Timers all cost area, but I think most of what you are after, is do-able with a saturating option on the present Counter, set to decrement ?

In this mode, <> 0 is still running, and =0 is timed-out, so no new opcodes are needed, just a simple added mode-option.
An Auto-reload alternative could auto-repeat.
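
A quick Python model of those two modes, to show how little is involved. The class `DownCounter` is hypothetical and only models the counting behaviour (no pins, no CTRx mode bits): the counter decrements to zero and either sticks there or, with auto-reload, starts again on the next clock.

```python
class DownCounter:
    """Saturating down-counter with an optional auto-reload mode."""

    def __init__(self, start, auto_reload=False):
        self.reload = start
        self.value = start
        self.auto_reload = auto_reload

    def clock(self):
        """Advance by one clock."""
        if self.value > 0:
            self.value -= 1
        elif self.auto_reload:
            self.value = self.reload   # repeat the interval
        # Otherwise the counter saturates: it stays at zero.

    @property
    def timed_out(self):
        # <> 0 means still running, = 0 means timed out.
        return self.value == 0


one_shot = DownCounter(3)
for _ in range(10):
    one_shot.clock()
print(one_shot.value, one_shot.timed_out)

repeating = DownCounter(2, auto_reload=True)
values = []
for _ in range(5):
    repeating.clock()
    values.append(repeating.value)
print(values)
```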

There are a number of spare bits in CTRx config regs - enough to have 2 bits for Mode and some OF Flags. ( The 3 bit PLLDIV field will also be spare, but is probably best reserved for backward compatibility )
Each new mode usually slows down a counter slightly, so the MHz impact would need to be checked.

That keeps the same number of counters and registers, but gives some more choices on how they run/pause ?

evanh · 2014-04-24 16:16

I don't think counters in the smart-pins has happened just yet. Chip has been way too busy refactoring the Cogs - which still have the Prop1 counters in them, afaik.
