The New 16-Cog, 512KB, 64 analog I/O Propeller Chip

Rayman · 2014-04-08 11:25

Yeah, with only 8 cogs, I thought multitasking was essential.
But, with 16, I don't think it's necessary...

jazzed · 2014-04-08 11:46

Bill Henning wrote: »

Yep, I missed his answer to you.

I was trying to save him time

I really don't know how Chip manages with so little sleep.

Chip can handle himself very well. The Walnut harvest is a little more demanding if you're available ;-)

His personal schedule does seem amazing. He is in very good physical shape though, so that makes it easier.

Seairth · 2014-04-08 12:12

Rayman wrote: »

Yeah, with only 8 cogs, I thought multitasking was essential.
But, with 16, I don't think it's necessary...

It seems, however, that Chip is determined to put some form of it back in (see his earlier comment about the ROM_Monitor needing it). If it's going to be added, I wonder if it can be kept very simple:

Two tasks (T0 - T1)
Simple round-robin of enabled tasks
T0 is always enabled
JMPT1 (same as P2, there is no JMPT0)
THALT (same as P2)
TMODE #n/S (LSB enables/disables T1)
No register remapping
HUBOPs stall
WAITxxx self-jumps

Looking at version 0.5 of the ROM_Monitor, I wonder if this would be enough. It may not be strictly necessary to run the baud_task after initial detection. Alternatively, it could be re-enabled (in place of rx_task) after a read timeout period.

If more than two tasks are necessary, then maybe the following approach would work:

Four tasks (T0 - T3)
Simple round-robin of enabled tasks
T0 is always enabled
JMPT1, JMPT2, JMPT3 (same as P2, there is no JMPT0)
THALT (same as P2)
TMODE #n/S (enable/disable tasks; four-bit value, where LSB is enable/disable and next three bits are a mask)
No register remapping
HUBOPs stall
WAITxxx self-jumps
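A toy model of the four-task proposal above may help when reasoning about slot order. This is only a sketch: the class and method names are made up for illustration, and the TMODE encoding (bit 0 = enable/disable, bits 1..3 = mask selecting T1..T3) is one possible reading of the list above, not a confirmed design.

```python
class TaskScheduler:
    """Toy model of the proposed four-task round-robin (T0..T3)."""

    NUM_TASKS = 4

    def __init__(self):
        # T0 is always enabled; T1..T3 start out disabled.
        self.enabled = [True, False, False, False]
        # Pretend T3 ran last so that the first slot goes to T0.
        self._last = self.NUM_TASKS - 1

    def tmode(self, value):
        """Apply a 4-bit TMODE value.

        Assumed encoding: bit 0 is 1 to enable or 0 to disable,
        bits 1..3 form a mask selecting T1..T3. T0 cannot be touched.
        """
        enable = bool(value & 1)
        for task in range(1, self.NUM_TASKS):
            if value & (1 << task):
                self.enabled[task] = enable

    def next_task(self):
        """Return the task that owns the next instruction slot."""
        for step in range(1, self.NUM_TASKS + 1):
            candidate = (self._last + step) % self.NUM_TASKS
            if self.enabled[candidate]:
                self._last = candidate
                return candidate
        raise RuntimeError("unreachable: T0 is always enabled")

    def slots(self, count):
        """Return the owners of the next `count` instruction slots."""
        return [self.next_task() for _ in range(count)]
```

With only T0 enabled, every slot goes to T0; enabling T2 with `tmode(0b0101)` makes the slots alternate between T2 and T0. The two-task variant is the same model with `NUM_TASKS = 2` and a one-bit TMODE.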

wmosscrop · 2014-04-08 12:17

cgracey wrote: »

Do we really need to support words any more, with RDWORD/WRWORD? Bytes, I think, are always needed, but I don't think I've ever used words for anything before. Convention says we need them, but do we, really?

I might be coming in a little late on this subtopic, but my IBM 1130 emulator relies on these instructions.

Walter

dnalor · 2014-04-08 12:42

Chip wrote somewhere in this thread, that they stay.

Invent-O-Doc · 2014-04-08 17:03

I will miss color composite video but can see that being a technology of the past.

I recommend few suggestions for features, or we'll end up with another kludge that won't fit on this process. Even a basic 16-cog, 512K Propeller will allow for many new applications and bring in money to make a third-generation Propeller on 65nm.

The technical risk must be kept as low as possible, even at the cost of a favorite convenience feature.

David Betz · 2014-04-08 17:25

Wow! A lot has happened since yesterday! I'm looking forward to seeing a feature list for the new chip. I'm glad it was possible to add hubexec and I hope its complexity doesn't add too much risk to the project.

RossH · 2014-04-08 17:52

David Betz wrote: »

Wow! A lot has happened since yesterday! I'm looking forward to seeing a feature list for the new chip. I'm glad it was possible to add hubexec and I hope its complexity doesn't add too much risk to the project.

Amen to that.

Cluso99 · 2014-04-08 18:18

I really like the way Chip is using the "shadow registers" as the D & I caches - from a wasted, unusable resource, it's sheer magic!

Having the hub and cog the same width - 128 bits = 4 longs = QUAD also gives a big improvement to hub bandwidth. He says he will even look at 256 bits = WIDE if the power is within limits. While this isn't necessary IMHO, it surely would be a bonus if it works out.

The change to OnSemi's memory has opened up the whole inner die space and made these things easier.

As for the instructions, many of the extras (on the P2 they were SETxxxx, GETxxxx, CFGxxxx, etc.) should really be considered as just simple SET, GET, CFG, etc. instructions where one of the operands is xxxx (i.e. one of the counters, pins, pointers, etc.). Perhaps the new similar instructions should be converted to the simple form with an additional operand, so the instruction count becomes significantly smaller. I know this is semantics, but it's a real consideration in the mind of someone deciding whether to use the P16. Fewer base instructions sounds so much better!

I do really like the way Chip is regrouping and working his magic. As long as it doesn't go off the rails again I will be happy.

koehler · 2014-04-08 18:29

I doubt the majority of folks want to see feature creep, however rjo's previous post is valid.
Perhaps suggestions for additional instructions or specific customized ones could be discussed in a separate thread, and hashed out.

Once Chip has really gotten a handle on the overall design, and worked out any kinks/dependencies, some time, if available, could be spent reviewing those vetted suggestions.
Can't remember how many P1+ Cores will fit into the P2 Core area, however it 'seems' as though a fair number of those from the P2 might be able to be added to available silicon die space, which we have in abundance*.

*Abundance is assuming that 16 P1+ Cores are taking up 1-3 P2 Core die space.
**And also assuming instructions take up minuscule numbers of transistors such that truly useful, germane programmatic expressions will increase speed yet again, and reduce code complexity.
Though I may be off-base.

cgracey wrote: »

Yes, I cringed when I thought you'd see that.

It just takes that much to get program flow and computation running smoothly.

I'd post the instruction set, but it's still a mess. I just got rid of the pixel blending instructions. They are totally fun to play with, but an excess in this chip.

mindrobots · 2014-04-08 18:34

I get the feeling Chip is back on target with the vision of the fun chip he wants this to be. Back in charge of balancing his goals and protecting against bloat.

cgracey · 2014-04-08 19:30

Roy Eltham wrote: »

Here is the output from Phil's perl script run against my copy of the obex (as of august 2012):

Opcode frequencies for 88602 lines of PASM code.
_______________________


byte:     22989
mov:      12233
long:      9398
add:       5024
jmp:       4203
call:      4092
word:      3944
or:        2508
test:      2296
rdlong:    1775
shl:       1718
cmp:       1523
sub:       1464
shr:       1428
djnz:      1391
and:       1187
wrlong:     960
andn:       860
xor:        758
movs:       658
jmpret:     574
waitcnt:    490
muxc:       487
rol:        449
rdbyte:     448
rcr:        426
movd:       426
cmps:       383
ret:        371
waitvid:    349
wrbyte:     306
neg:        276
rcl:        271
ror:        259
movi:       231
nop:        219
rdword:     209
cmpsub:     167
tjz:        153
sar:        149
tjnz:       143
muxnz:      106
abs:        104
muxnc:       94
max:         80
waitpeq:     77
waitpne:     72
muxz:        71
subs:        70
wrword:      66
addx:        66
min:         58
sumc:        53
cogstop:     51
cogid:       50
negc:        50
mins:        45
sumnc:       40
maxs:        39
rev:         37
absneg:      32
negnz:       32
adds:        30
clkset:      29
coginit:     13
sumnz:        9
lockclr:      7
subx:         6
negnc:        5
lockret:      4
lockset:      3
negz:         3
addabs:       2
sumz:         2
locknew:      1
hubop:        0
subabs:       0
cmpsx:        0
cmpx:         0
addsx:        0
subsx:        0

This represents actual PASM usage in the files, eliminating comments, strings, etc, and only looking in DAT sections. Thanks Phil!
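For anyone who wants to repeat this kind of count on their own code, here is a rough sketch of the idea. It is not Phil's perl script (which isn't shown in this thread), and it makes simplifying assumptions: block designators start in column 0, only `'` comments are stripped (no `{ }` blocks), and `MNEMONICS` is a small illustrative subset rather than the full P1 instruction list.

```python
import re
from collections import Counter

# Spin block designators; assumed to start in column 0.
BLOCKS = {"con", "var", "obj", "pub", "pri", "dat"}

# Small illustrative subset of P1 mnemonics and data directives.
MNEMONICS = {"mov", "add", "jmp", "call", "rdlong", "wrlong", "djnz",
             "lockset", "lockclr", "byte", "word", "long"}

SAMPLE = """\
CON
  _clkmode = xtal1
DAT
        org     0
entry   mov     dira, mask      ' set pin directions
loop    rdlong  t1, par
        add     t1, #1
        djnz    count, #loop
mask    long    $FF
PUB start
  byte[ptr] := 0
"""


def count_opcodes(spin_source):
    """Count the first known mnemonic on each line inside DAT blocks."""
    counts = Counter()
    in_dat = False
    for line in spin_source.splitlines():
        line = line.split("'", 1)[0]          # drop ' comments
        tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", line.lower())
        if not tokens:
            continue
        # A block designator in column 0 switches the current section.
        if tokens[0] in BLOCKS and not line[0].isspace():
            in_dat = tokens[0] == "dat"
            tokens = tokens[1:]
        if not in_dat:
            continue
        for tok in tokens:
            if tok in MNEMONICS:              # skip labels and conditions
                counts[tok] += 1
                break                         # operands are not opcodes
    return counts


if __name__ == "__main__":
    for name, n in count_opcodes(SAMPLE).most_common():
        print(f"{name + ':':<10}{n:>6}")
```

Note that the `byte[ptr]` in the PUB block is not counted, because only DAT sections are scanned.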

Wow! Look at how few times LOCKSET and LOCKCLR were used: 3 and 7, only! I wonder if they were even needed, or used just because their existence suggested they were needed.

I've never used these, myself, though I put them in the design because I could imagine they might be necessary, as some others now adamantly suppose. They've hardly been used on the OBEX, though. I would like to just get rid of them. There are other ways to do such things and, ideally, you wouldn't be writing code that required them. I think they might just be faking people out into thinking they are useful. In practice, I've never found them even remotely necessary. I SAY JETTISON THOSE THINGS! Anyone care, who's actually used them? Please speak up!

Ken Gracey · 2014-04-08 19:41

cgracey wrote: »

I SAY JETTISON THOSE THINGS! Anyone care, who's actually used them? Please speak up!

Open the capsule door. We've ridden a horse to the universe of Complexity and we're finally taking a ship back to Simplicity.

Especially if there's another way to achieve the same thing as you mentioned, why waste thousands of dollars documenting?

While we're simplifying, I'd like to propose we jettison the word "cog" in favor of "core". When introducing the Propeller - especially to non-native English speakers in their own country, Propeller-specific terms like this become a hurdle to their understanding and sometimes interest level. Each explanation becomes a tangent of its own, sometimes not coming back to the original message about how it works.

What do ya say? Propeller is a multicore processor, not a multicog processor.

Ken Gracey

potatohead · 2014-04-08 19:58

It's hard to just say "core..." COG is different, but...

"Concurrent Multi-Processor" isn't as overused as "multi-core" is. And the name space for "multi-core" includes a lot of things not Propeller-like. Some devices have multiple cores, and they multi-process, but they don't do it concurrently in the way a Propeller does. We don't use the more accurate terms much these days either, but we've stuck with "COG".

That all could be expanded on with "Symmetric Concurrent Multi-Processor", and/or the same with multi-core, and we would get much closer and be more differentiated too. All a mouthful. Marketing this stuff is a PITA, if you ask me, on that basis. Buzzwords are a dime a dozen.

The thing about "COG" is that it is different. We own COG. We don't own "multi-core". COG means a very specific thing, whereas "multi-core" can mean a whole ton of things, many of which have nothing to do with what COG means.

A similar thing is true with "HUB." Interestingly, we could call this "shared memory" and on the P1 that's a reasonable thing to say. We could say "Shared resource" on the P1+ as it's currently evolving. There isn't as much push to be rid of HUB, and why is that? Just musing.

From a simple, "what is it?" point of view, COG is a problem. But then again, so is "multi-core" because saying that fails to properly differentiate the product, IMHO of course. "Concurrent Multi-Processor" is much closer. CMP...

From a documentation point of view, COG is much less of a problem. Truth is, people have to understand the programming model, instructions, data sheet and such to get going. There is a nice synergy with the instructions, COGNEW, COGSTART, etc...

If we change that, do we do CMPSTART, CMPNEW? Or maybe MCORENEW? How about CORENEW, CORSTART?

In marketing, I get not wanting to use COG. So don't. But the case for eliminating it in documentation, and in casual use, is much weaker.

COG is one of the things that binds this community. We have a lingo associated with the Propeller and other things that's somewhat differentiated from other things out there. And given most of our discussion, COG comes out in context rather quickly. We have surprisingly few "WHAT IS A COG?" questions, and maybe we actually have more "WHAT IS THE HUB EXACTLY?" ones, but that could just be me.

I kind of don't want to quit saying COG when discussing Propeller related things with peers here. I wouldn't mind not saying COG when discussing it in a more general, "Hey! Try this thing out." kind of context though.

Now Sapieha did coin "Smart IO".

If I were you, I would latch onto that one. It's a winner! And it's going to near perfectly describe what we will have in the P1+ chip too.

Finally, we did have a discussion similar to this quite some time back. Multi-core in the micro-controller realm was trendy. Still is, but there are now offerings diluting the trend. Still, the Propeller was somewhat trendy at that time. Still is, when properly differentiated.

(goes back into quiet hole while details on P1+ gestate)

Electrodude · 2014-04-08 20:01

cgracey wrote: »

Wow! Look at how few times LOCKSET and LOCKCLR were used: 3 and 7, only! I wonder if they were even needed, or used just because their existence suggested they were needed.

I've never used these, myself, though I put them in the design because I could imagine they might be necessary, as some others now adamantly suppose. They've hardly been used on the OBEX, though. I would like to just get rid of them. There are other ways to do such things and, ideally, you wouldn't be writing code that required them. I think they might just be faking people out into thinking they are useful. In practice, I've never found them even remotely necessary. I SAY JETTISON THOSE THINGS! Anyone care, who's actually used them? Please speak up!

Please keep the locks. I've found them very useful for doing work on large structures in ram in multiple cogs and preventing access conflicts. Another important use is arbitrating a debug terminal when it's being printed to by multiple cogs at the same time. Locks theoretically can be implemented in software, but it's much more convenient to have them in hardware. Besides, don't they not take up an opcode because they're part of hubop?
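The debug-terminal case can be sketched in a few lines. This is a simulation, not Propeller code: Python threads stand in for cogs, `HubLocks` is a made-up model of the P1 semaphores (LOCKSET/LOCKCLR report the previous state, LOCKNEW returns -1 when none are free), and an internal `threading.Lock` stands in for the hub granting one cog access at a time.

```python
import threading
import time


class HubLocks:
    """Toy model of the P1 hardware lock bits."""

    def __init__(self, count=8):
        self._hub = threading.Lock()      # models one-cog-at-a-time hub access
        self._allocated = [False] * count
        self._state = [False] * count

    def locknew(self):
        """Check out a free lock ID, or return -1 if none are left."""
        with self._hub:
            for lock_id, used in enumerate(self._allocated):
                if not used:
                    self._allocated[lock_id] = True
                    return lock_id
            return -1

    def lockret(self, lock_id):
        """Return a lock ID to the pool."""
        with self._hub:
            self._allocated[lock_id] = False

    def lockset(self, lock_id):
        """Atomically set the lock; return True if it was already set."""
        with self._hub:
            previous = self._state[lock_id]
            self._state[lock_id] = True
            return previous

    def lockclr(self, lock_id):
        """Atomically clear the lock; return its previous state."""
        with self._hub:
            previous = self._state[lock_id]
            self._state[lock_id] = False
            return previous


def run_demo(workers=4, lines_each=50):
    """Have several 'cogs' print whole lines to one shared terminal."""
    locks = HubLocks()
    term_lock = locks.locknew()
    terminal = []                          # shared character stream

    def worker(cog):
        message = f"cog{cog}\n"
        for _ in range(lines_each):
            while locks.lockset(term_lock):
                time.sleep(0)              # lock busy: yield and retry
            for ch in message:             # one character at a time
                terminal.append(ch)
            locks.lockclr(term_lock)

    threads = [threading.Thread(target=worker, args=(n,))
               for n in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    locks.lockret(term_lock)
    return "".join(terminal).splitlines()
```

Because each writer holds the lock for a whole message, every line in the result is intact; without the `lockset`/`lockclr` pair the characters from different writers could interleave.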

However, yes, please jettison most of the stuff in the P2 not in the original P1 for this chip and save it for the real P2, especially things like tasks and hubexec.

Something's wrong with that instruction usage table. There is only one locknew but 4 lockrets?!? All of my programs that use locks have one locknew and no lockret's because they keep their locks forever.

electrodude

cgracey · 2014-04-08 20:03

Electrodude wrote: »

Please keep the locks. I've found them very useful for doing work on large structures in ram in multiple cogs and preventing access conflicts. Another important use is arbitrating a debug terminal when it's being printed to by multiple cogs at the same time. Locks theoretically can be implemented in software, but it's much more convenient to have them in hardware. Besides, don't they not take up an opcode because they're part of hubop?

However, yes, please jettison most of the stuff in the P2 not in the original P1 for this chip and save it for the real P2, especially things like tasks and hubexec.

Something's wrong with that instruction usage table. There is only one locknew but 4 lockrets?!? All of my programs that use locks have one locknew and no lockret's because they keep their locks forever.

electrodude

All right I'll keep them. How many would you ideally like to see in the design?

RossH · 2014-04-08 20:03

cgracey wrote: »

Wow! Look at how few times LOCKSET and LOCKCLR were used: 3 and 7, only! I wonder if they were even needed, or used just because their existence suggested they were needed.

I've never used these, myself, though I put them in the design because I could imagine they might be necessary, as some others now adamantly suppose. They've hardly been used on the OBEX, though. I would like to just get rid of them. There are other ways to do such things and, ideally, you wouldn't be writing code that required them. I think they might just be faking people out into thinking they are useful. In practice, I've never found them even remotely necessary. I SAY JETTISON THOSE THINGS! Anyone care, who's actually used them? Please speak up!

NOOOOOOOOOOOO!!!!

All these figures mean is that you are not looking at a sufficiently representative sample of code. The OBEX is not representative of application-level programs - for instance, you probably don't need any locks if you are using only one OBEX object, but you may need them if you are using two or more - so they would be under-represented in the OBEX code, but may be widely used in applications.

I use them regularly, and I rely on them! So do the people who use Catalina (although they may not realize it!). If PropGCC had the kind of low-level multi-threading that Catalina supports, it would need them too.

PLEASE STOP MESSING WITH THE P1 INSTRUCTION SET!

Ross.

Alex.Stanfield · 2014-04-08 20:04

cgracey wrote: »

Wow! Look at how few times LOCKSET and LOCKCLR were used: 3 and 7, only! I wonder if they were even needed, or used just because their existence suggested they were needed.

I've never used these, myself, though I put them in the design because I could imagine they might be necessary, as some others now adamantly suppose. They've hardly been used on the OBEX, though. I would like to just get rid of them. There are other ways to do such things and, ideally, you wouldn't be writing code that required them. I think they might just be faking people out into thinking they are useful. In practice, I've never found them even remotely necessary. I SAY JETTISON THOSE THINGS! Anyone care, who's actually used them? Please speak up!

I'm afraid I've used them many times when sharing stuff between cores. Maybe I haven't used LOCKNEW/LOCKRET from PASM directly, but for sure I did from Spin, which in turn would have used the instructions, wouldn't it?

BTW: how can you guarantee atomic reservation of a resource in a parallel environment without a lock?

Alex

RossH · 2014-04-08 20:05

Ken Gracey wrote: »

Open the capsule door. We've ridden a horse to the universe of Complexity and we're finally taking a ship back to Simplicity.

Especially if there's another way to achieve the same thing and the benefits are unknown, why waste thousands of dollars documenting something potentially obscure?

While we're simplifying, I'd like to propose we jettison the word "cog" in favor of "core". When introducing the Propeller - especially to non-native English speakers in their own country, Propeller-specific terms like this become a hurdle to their understanding and sometimes interest level. Each explanation becomes a tangent of its own, sometimes not coming back to the original message about how it works.

What do ya say? Propeller is a multicore processor, not a multicog processor.

Ken Gracey

Cogs are not cores. You are just going to confuse people even more.

Seairth · 2014-04-08 20:06

cgracey wrote: »

Wow! Look at how few times LOCKSET and LOCKCLR were used: 3 and 7, only! I wonder if they were even needed, or used just because their existence suggested they were needed.

I've never used these, myself, though I put them in the design because I could imagine they might be necessary, as some others now adamantly suppose. They've hardly been used on the OBEX, though. I would like to just get rid of them. There are other ways to do such things and, ideally, you wouldn't be writing code that required them. I think they might just be faking people out into thinking they are useful. In practice, I've never found them even remotely necessary. I SAY JETTISON THOSE THINGS! Anyone care, who's actually used them? Please speak up!

I'm not opposed, but what would be the alternative?

(note: I'd be a bit cautious about basing overall usage statistics on just the OBEX codebase. The design intent for reusable objects and the design intent of the code that makes use of them can be considerably different. While LOCKxxx may not be used in the OBEX, it might very well be used when coordinating multiple OBEX objects that are otherwise not designed to work together.)

RossH · 2014-04-08 20:07

cgracey wrote: »

All right I'll keep them. How many would you ideally like to see in the design?

Keeping the 8 is fine.

If you need more (as I sometimes do) I use one lock to provide a temporary semaphore while I manage others implemented in software.
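The technique described here, one hardware lock acting as a short-lived guard around a table of software locks, might look roughly like the sketch below. It is a Python simulation under stated assumptions, not Catalina's actual implementation: `HardwareLock` is a made-up stand-in for a single LOCKSET/LOCKCLR bit, and the owner table stands in for a small array in hub RAM.

```python
import threading


class HardwareLock:
    """Stand-in for one hardware lock bit (test-and-set semantics)."""

    def __init__(self):
        self._hub = threading.Lock()   # models atomic hub access
        self._set = False

    def lockset(self):
        """Set the bit; return True if it was already set."""
        with self._hub:
            previous, self._set = self._set, True
            return previous

    def lockclr(self):
        """Clear the bit."""
        with self._hub:
            self._set = False


class SoftLocks:
    """Any number of software locks guarded by one hardware lock."""

    def __init__(self, count):
        self._guard = HardwareLock()
        self._owner = [None] * count   # would live in hub RAM on a real chip

    def _enter(self):
        while self._guard.lockset():   # spin until the guard is ours
            pass

    def try_acquire(self, index, cog):
        """Try to take software lock `index` for `cog`; never blocks long."""
        self._enter()
        try:
            if self._owner[index] is None:
                self._owner[index] = cog
                return True
            return False
        finally:
            self._guard.lockclr()      # guard is held only for the table update

    def release(self, index, cog):
        """Release software lock `index`; only its owner may do so."""
        self._enter()
        try:
            if self._owner[index] != cog:
                raise ValueError("lock not held by this cog")
            self._owner[index] = None
        finally:
            self._guard.lockclr()
```

The hardware lock is held only for the few instructions needed to inspect and update the table, so it stays available even when a software lock is held for a long time.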

Ross.

Alex.Stanfield · 2014-04-08 20:12

cgracey wrote: »

All right I'll keep them. How many would you ideally like to see in the design?

Glad to hear that!

If it's ok to have 32 that would be great!

Alex

jazzed · 2014-04-08 20:12

cgracey wrote: »

Wow! Look at how few times LOCKSET and LOCKCLR were used: 3 and 7, only! I wonder if they were even needed, or used just because their existence suggested they were needed.

I've never used these, myself, though I put them in the design because I could imagine they might be necessary, as some others now adamantly suppose. They've hardly been used on the OBEX, though. I would like to just get rid of them. There are other ways to do such things and, ideally, you wouldn't be writing code that required them. I think they might just be faking people out into thinking they are useful. In practice, I've never found them even remotely necessary. I SAY JETTISON THOSE THINGS! Anyone care, who's actually used them? Please speak up!

The locks are useful, Chip. We use them in PropellerGCC. Removing them would break PropellerGCC. They are also used in the PST driver to allow ungarbled multi-cog messages. I know you've had an axe for these things for a while, but they are actually useful. Please don't JETTISON them.

Ken: yes, Propeller is a multi-core processor. The Wikipedia page even has a Propeller entry ;-)

Electrodude · 2014-04-08 20:15

cgracey wrote: »

All right I'll keep them. How many would you ideally like to see in the design?

I guess 16 or 32 would be more than plenty. There should definitely be at least twice as many as in the P1 because there are twice as many cogs.

RossH wrote: »

Keeping the 8 is fine.

If you need more (as I sometimes do) I use one lock to provide a temporary semaphore while I manage others implemented in software.

Ross.

Good point, technically only one is really necessary. It's still nice to have more, though. However, you've proven that it is possible to use more than 8, so it will be possible to use more than 16 on this chip because there are 16 cogs, so there should be 32 locks.

By the way, can you add the cogid trick in the P2 to figure out if other cogs are running? I assume it's trivial, right?

Thanks,
electrodude

Ken Gracey · 2014-04-08 20:17

potatohead,

Thanks for the thoughtful reply.

You pointed out that cog is a term that bonds the community. I agree, but we also want the community to be much larger. In fact, it has to be bigger to make this business viable. While there are many reasons it's not a larger community, some of them are small, and this is one of them. We've spent much time in China promoting the Propeller in education and we've made some headway. Several universities in Shaanxi province have taken it on. But English is a relatively new language, especially to the older educators (but the students are now fluent). We've used every possible presentation approach, and our best results are when we get to engage people with hardware in front of them and code on the projection screen. However, even in this very favorable environment with impressive high-level demos, the choice of words affects the amount of retention and interest. As you present the terms - cog, Spin, hub RAM - the interest can fade. But if you replace them with core, Propeller programming language, and RAM, the barriers are lower because they already know these terms. And, at the right time, after they've accepted what they're seeing, we use Propeller-specific terms and get better results.

The process of introducing Propeller-specific terms is easier in Europe, and of course even easier here at home.

You mentioned that perhaps we should use "core" for marketing but leave the term "cog" in place. That's basically what we're doing, so it's good to hear your thoughts.

But I still think we should take it a step further: standardize, make it easier to adopt. That's why I propose "core".

Of course, with instructions based on the word COG this is difficult to do and Chip would need to accept bigger changes. We don't have complete P1 backward-compatibility anyway.

We have made some mistakes in the past and I don't care to see us repeat them. One of the bigger ones was not having C support right out of the gate.

SmartIO is a great term. I don't know what it means yet, but it's somewhat self-explanatory. Thanks to Sapieha.

Ken Gracey

potatohead · 2014-04-08 20:21

If you personify the COG, you will find it considerably more engaging and easily adopted by the Asian people in that part of the world.

Just saying...

That said, I would feel goofy writing and saying CORE instead of COG. Not sure where to go with that, but there it is. Not saying no...

T Chap · 2014-04-08 20:24

Ken, while you're in a marketing mood on the forum, may I suggest a clearer naming convention, i.e. a name that rolls off the tongue more easily than P16xxxx etc. TurboProp.... SuperPropeller....

potatohead · 2014-04-08 20:31

Ok, answer me this:

If we refer to a COG as a CORE, and we say multi-core, just what is it about our multi-core that is different, or better than the other multi-core devices out there?

What is the secret sauce?

Now, I would take the idea of a COG, personify it, and in some parts of the world would potentially include a character to represent it, and in others, simply do it by reference, and link the COG to multi-core in a way that implies the secret sauce, that partner that helps you get through, just works, etc... and then play off it all having something different than the other guys and having a nice, compact, well differentiated term for it too. But that's me.

Seairth · 2014-04-08 20:31

Alex.Stanfield wrote: »

Glad to hear that!

If it's ok to have 32 that would be great!

If the number of locks is going to be increased, I'd suggest doing them as several discrete sets. For instance, you could do 4 sets of 8. This will not have any effect on LOCKSET/LOCKCLR. However, LOCKNEW/LOCKRET would use D to select one of the four sets. This would allow unrelated semaphores to be more easily implemented.
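To make the bookkeeping concrete, here is a small sketch of how allocation by set could behave. It is only one interpretation of the idea: the `LockPool` name, the global 0..31 lock IDs, and the choice to have `lockret` take the global ID are all assumptions, and the sketch leaves out the atomicity a real hub operation would provide.

```python
class LockPool:
    """Toy allocator: 4 discrete sets of 8 lock IDs (0..31 overall)."""

    SETS = 4
    PER_SET = 8

    def __init__(self):
        self._allocated = [False] * (self.SETS * self.PER_SET)

    def locknew(self, set_index):
        """Check out the lowest free lock in the chosen set, or -1 if full."""
        if not 0 <= set_index < self.SETS:
            raise ValueError("set_index out of range")
        base = set_index * self.PER_SET
        for lock_id in range(base, base + self.PER_SET):
            if not self._allocated[lock_id]:
                self._allocated[lock_id] = True
                return lock_id
        return -1

    def lockret(self, lock_id):
        """Return a lock ID to its set."""
        self._allocated[lock_id] = False

    @classmethod
    def set_of(cls, lock_id):
        """Report which set a global lock ID belongs to."""
        return lock_id // cls.PER_SET
```

Exhausting one set leaves the others untouched, so unrelated subsystems that each allocate from their own set cannot starve each other of lock IDs.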

Just a thought...

Dave Hein · 2014-04-08 20:32

We need at least one atomic read-modify-write instruction, so please keep LOCKSET and LOCKCLR.
