Hub Execution Model Thread (split from blog)

pedward · 2013-12-05 17:43

cgracey wrote: »

I think that in hub-execute mode, PTRA will become the program counter and its three LSBs will dictate which long in WIDEs is executing. There should be no problem with 8-long boundaries, as they should seamlessly connect. There will be pipeline stalls, though, when WIDEs are reloading. Those stalls don't break up the continuity in the pipeline, though.

I don't know if having a visible BIG register is good. It seems kind of dirty. I think it would be better to sublimate the function by the trailing pipeline AUGD/AUGS instructions. Also, I understand Bill's argument about BIG being reusable, but couldn't the same thing be achieved just by moving a 32-bit constant into a cog register and using that? I might be missing the boat on a few things here.

About hub-execute mode: To make this really seamless and not like some half-baked LMM mode from before, there needs to be an overall mode awareness, or awareness by task. To support multitasking, plus hub-execute for a particular task, we'd need the latter, plus a task-aware BIG constant mechanism which would have to be some kind of a prefix.

Chip, I knew that HUBEX would have tons of dependencies and little corner cases. I really think this is something you need to shelve until after the P2 goes gold. You can work through the FPGA and firm it up afterwards, but I really don't think it is a good thing to focus on right now. My gut tells me that it is a solid 4 months of testing and fixing to make it work right; there is A LOT of testing that needs to happen around that feature.

As it stands, the WIDE is a relatively minor architecture change; you already have thousands of hours of successful user testing on the existing design, this would all need to start over with HUBEX to prove it doesn't break normal stuff.

cgracey · 2013-12-05 18:36

David Betz wrote: »

Why not just range check the immediate value? Is a special syntax really needed?

By making the programmer express ## for 32-bit values, he's signifying that he understands he's going to get an extra long and an extra clock cycle. This scheme is wasteful for non-hub-exec code, because it takes an unnecessary cycle - that 32-bit constant could have been declared in a register elsewhere so that it didn't waste time by being in-line. This mechanism only makes sense within hub-exec code.

David Betz · 2013-12-05 18:38

cgracey wrote: »

By making the programmer express ## for 32-bit values, he's signifying that he understands he's going to get an extra long and an extra clock cycle. This scheme is wasteful for non-hub-exec code, because it takes an unnecessary cycle - that 32-bit constant could have been declared in a register elsewhere so that it didn't need to waste time by being in-line. This mechanism only makes sense within hub-exec code.

Good point.

ozpropdev · 2013-12-05 18:43

cgracey wrote: »

By making the programmer express ## for 32-bit values, he's signifying that he understands he's going to get an extra long and an extra clock cycle. This scheme is wasteful for non-hub-exec code, because it takes an unnecessary cycle - that 32-bit constant could have been declared in a register elsewhere so that it didn't need to waste time by being in-line. This mechanism only makes sense within hub-exec code.

I agree, it makes it a deliberate action by the user thereby clearly defining the outcome.

potatohead · 2013-12-05 18:45

One way to simplify things is to not allow hardware tasks during HUBEXEC.

I'm with Ozpropdev though. If this can be worked out so that it's robust and it can actually replace LMM, I think it's worth it, but it's not worth a long delay or the risk of buggy, trouble-prone features.

@David, Chip basically articulated why I liked the "##" syntax. I was thinking the same thing, only use it in the context of hub executable code. Wasn't able to post for a bit.

@Cluso, yeah if we get the feature, I would make use of it during scan lines. At the production clock speed, a ton can get done. However, it is a little less useful if no tasks are permitted, but not so much that it would be worth making video mutually exclusive IMHO. Having a big COG would still be very useful in that context, IMHO.

cgracey · 2013-12-05 18:56

pedward wrote: »

Chip, I knew that HUBEX would have tons of dependencies and little corner cases. I really think this is something you need to shelve until after the P2 goes gold. You can work through the FPGA and firm it up afterwards, but I really don't think it is a good thing to focus on right now. My gut tells me that it is a solid 4 months of testing and fixing to make it work right; there is A LOT of testing that needs to happen around that feature.

As it stands, the WIDE is a relatively minor architecture change; you already have thousands of hours of successful user testing on the existing design, this would all need to start over with HUBEX to prove it doesn't break normal stuff.

I've got to get into it a little to see what it's going to take, but you may be right. I can already see that we'll likely need a separate cache mechanism from WIDE, or else other features won't work concurrently (like automated moves to/from hub), and there will be lots of 'splainin to do that nobody will like. The 32-bit immediate D/S mechanism is a keeper, though, because it would be just as important to have with old-style LMM, if that's all we can safely do. Of course, it might be important to make it some kind of a task-tied prefix, so that multi-tasking can still work, along with LMM. We'll see.

David Betz · 2013-12-05 19:00

cgracey wrote: »

I've got to get into it a little to see what it's going to take, but you may be right. I can already see that we'll likely need a separate cache mechanism from WIDE, or else other features won't work concurrently (like automated moves to/from hub), and there will be lots of 'splainin to do that nobody will like. The 32-bit immediate D/S mechanism is a keeper, though, because it would be just as important to have with old-style LMM, if that's all we can safely do. Of course, it might be important to make it some kind of a task-tied prefix, so that multi-tasking can still work, along with LMM. We'll see.

I don't think that the BIG idea will work with LMM mode. The problem is that the instruction being modified and the BIG prefix/suffix will get separated by the LMM loop code. I think you'll need to do something more like LDLO and LDHI to load 32 bit constants if you want to use them in LMM code. Or am I missing something?

cgracey · 2013-12-05 19:02

David Betz wrote: »

I don't think that the BIG idea will work with LMM mode. The problem is that the instruction being modified and the BIG prefix/suffix will get separated by the LMM loop code. I think you'll need to do something more like LDLO and LDHI to load 32 bit constants if you want to use them in LMM code. Or am I missing something?

You're right.
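
A minimal sketch of the separation problem David describes, in Python rather than PASM. It assumes the classic four-instruction P1-style LMM loop (RDLONG, ADD, the fetched instruction slot, JMP) and compares the order in which instructions actually execute under hub-exec and under LMM. The names `hubexec_stream`, `lmm_stream` and `aug_is_adjacent` are hypothetical helpers for illustration only, not anything from the P2 tools.

```python
# Assumed classic LMM fetch/execute loop; "<instr>" is the slot where the
# long fetched from hub memory is executed.
LMM_LOOP = ["RDLONG instr,pc", "ADD pc,#4", "<instr>", "JMP #loop"]


def hubexec_stream(program):
    """In hub-exec, instructions execute in the order they sit in hub memory."""
    return list(program)


def lmm_stream(program):
    """In LMM, every hub instruction is wrapped in one pass of the loop."""
    stream = []
    for op in program:
        for step in LMM_LOOP:
            stream.append(op if step == "<instr>" else step)
    return stream


def aug_is_adjacent(stream, op, aug):
    """True if `aug` is the very next executed instruction after `op`."""
    i = stream.index(op)
    return i + 1 < len(stream) and stream[i + 1] == aug


# A 32-bit immediate expressed as an instruction plus its trailing AUGS.
program = ["ADD reg,#low9", "AUGS #high23"]

print(aug_is_adjacent(hubexec_stream(program), *program))
print(aug_is_adjacent(lmm_stream(program), *program))
```

Under these assumptions the AUGS directly follows the ADD in hub-exec, while in LMM the loop's JMP and the next RDLONG/ADD sit between them, which is why something like separate LDLO/LDHI loads would be needed there.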

jmg · 2013-12-05 19:02

ozpropdev wrote: »

I agree, it makes it a deliberate action by the user thereby clearly defining the outcome.

I'm ok with a deliberate option, but it's preferable NOT to deviate from industry Assembler standards.

A clear mnemonic is preferable to a cryptic, non standard, ## that someone maintaining the code in 5 years time, will think is a typo.

If you want to make this explicit, I'd add an assembler directive, so automated tools can use the universal syntax, and so that source code is more portable.
Code clarity, predictability to average users, and portability, should win every time.

ozpropdev · 2013-12-05 19:02

If the only way we can get HUBEXEC is to lock out multi-tasking then I'm cool with that.
Yes people, you read correctly.... sacrifice multi-tasking mode to use HUBEXEC!

potatohead · 2013-12-05 19:03

Yep. My thoughts exactly.

potatohead · 2013-12-05 19:10

There is "industry standard" assembler syntax, and there is PASM syntax.

We've all used PASM syntax, and it resonates with SPIN syntax and that "##" makes perfect sense to anyone who's been doing either for a while now. Same as "@@" and a variety of other constructs.

Frankly, "industry standard" assembler syntax, something like GAS, can be whatever it is, and the syntax released with or as part of Pnut / SPIN + PASM should be the same sort of syntax everybody using SPIN + PASM is used to.

Those that want to use the Gcc tools will know GAS, and they will be tuned into that syntax, directives, etc...

I think that makes the best sense.

Regarding "industry standard" assembler syntax, what is that exactly? I've seen a pile of assemblers, some do macros, some don't, some part of larger compiler suites like GAS, etc... and frankly, they vary considerably enough to warrant keeping the SPIN + PASM syntax as is.

And I'll be honest here, I really like that syntax and style, and I feel the call to bend it toward what is often a lot more verbose is pollution in the same way.

Beauty is, to each their own. There will be at least two basic development paths. Gcc and friends, and SPIN + PASM, and I would expect the same "do it all in one file if you want to" lean and mean and productive syntax we've used to great effect on P1 to continue right into P2.

And the Pro / industry standard syntax can just be in the Gcc branch of things. No worries.

jmg · 2013-12-05 19:10

ozpropdev wrote: »

If the only way we can get HUBEXEC is to lock out multi-tasking then I'm cool with that.
Yes people, you read correctly.... sacrifice multi-tasking mode to use HUBEXEC!

I assume you mean only in the COG running HUBEXEC ?

Even there, is such a blunt-hammer needed - you can impose limits on what the task-sharing code is allowed to do, if HUBEXEC will tangle with other HUB accesses.

potatohead · 2013-12-05 19:11

I would prefer it either works and the cases are all worked out, or going into HUB mode cancels the tasks on that COG.

Sapieha · 2013-12-05 19:12

Hi ozpropdev.

I will say more --- STOP multi-tasking permanently if COG enter HUBEXEC mode

ozpropdev wrote: »

If the only way we can get HUBEXEC is to lock out multi-tasking then I'm cool with that.
Yes people, you read correctly.... sacrifice multi-tasking mode to use HUBEXEC!

ozpropdev · 2013-12-05 19:22

jmg wrote: »

I assume you mean only in the COG running HUBEXEC ?

Even there, is such a blunt-hammer needed - you can impose limits on what the task-sharing code is allowed to do, if HUBEXEC will tangle with other HUB accesses.

Oops! I did mean in the cog running HUBEXEC!
I'm not totally giving up multi-tasking cake, just a slice of it.

jmg · 2013-12-05 19:22

potatohead wrote: »

Those that want to use the Gcc tools will know GAS, and they will be tuned into that syntax, directives, etc...

I think that makes the best sense.

Now you've just made code less portable

The best sense, is something you can hand to a first year student (ex AVR or PIC or 8051), and say 'What does this do' ?

Parallax are strong in education, so they should focus on code clarity. Inventing NEW magic operators, is not clever..

ozpropdev · 2013-12-05 19:29

Sapieha wrote: »

Hi ozpropdev.

I will say more --- STOP multi-tasking permanently if COG enter HUBEXEC mode

Yes Sapieha, that's what I imagined.
Give 100% of cog power to run HUBEXEC.

Bill Henning · 2013-12-05 19:33

Ray,

This would be hidden inside a macro.

Having 32-bit constant arguments is just as much against the 9-bit S rules.

Check my post, in one example it saved two longs out of nine. Big win.

Cluso99 wrote: »

BILL,
OK, I took a look at your post about reversing the BIG to use lower 23 bits and the XXX #S using the high 9 bits. I really do not like this idea. It goes against all the #S immediate values. I know you can cheat and use it for other obscure results, but for me IMHO it is a real no-no. This is irrespective of whether you automatically save the resultant bits to $1F1 or not.

Bill Henning · 2013-12-05 19:36

Good idea re/ AUGD & AUGS !

Btw, please consider putting lowest 23 bits in AUG, that way aug instructions can be re-used by other code in the 8 line cache as pointers. One example I posted saved 2 of 9 longs using that trick.

Also, if the composed value (aug + 9 bit constant) is exposed at say $1F1, that can have further memory savings.

cgracey wrote: »

I'm thinking that there should be two BIG instructions: AUGD and AUGS. This way, in case both D and S of the initial instruction are immediate, you can control which one gets augmented without any precedence rules. Also, 32-bit constants can be denoted in PASM by ##:

ADD reg,##100_000

...becomes...

ADD reg,#100_000 & $1FF
AUGS #100_000 >> 9

...and this...

SETSERA ##configvalue,#baud

...becomes...

SETSERA #configvalue & $1FF, #baud
AUGD #configvalue >> 9
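
The arithmetic behind the ## expansion above can be sketched in Python. This follows the layout in Chip's example (low 9 bits stay in the instruction, the remaining 23 bits go in the AUGS/AUGD long); it does not model the alternative of putting the lowest 23 bits in the AUG. `split_imm32` and `join_imm32` are hypothetical names used only for illustration.

```python
MASK9 = 0x1FF          # the 9-bit immediate field of a normal instruction
MASK32 = 0xFFFFFFFF


def split_imm32(value):
    """Split a 32-bit constant into (low 9 bits, high 23 bits)."""
    value &= MASK32
    return value & MASK9, value >> 9


def join_imm32(low9, high23):
    """Recombine the two fields into the original 32-bit constant."""
    return ((high23 << 9) | (low9 & MASK9)) & MASK32


low9, high23 = split_imm32(100_000)

# Mirrors the expansion of: ADD reg,##100_000
print(f"ADD  reg,#{low9}")      # 100_000 & $1FF
print(f"AUGS #{high23}")        # 100_000 >> 9

assert join_imm32(low9, high23) == 100_000
```

For 100_000 the split is 160 in the instruction and 195 in the AUGS, and 195 * 512 + 160 gives back 100_000.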

Bill Henning · 2013-12-05 19:39

I can see HUBEXEC being used for graphics libraries, gui code etc. but not high performance display drivers.

I suppose low to medium performance graphics drivers are possible in it, but frankly cog-only drivers will seriously out perform it, and not have to share bandwidth with hubexec.

Cluso99 wrote: »

David and others,
Do you see that the HUBEXEC model would also be used to generate video (as in AUX being used to drive the video DACs)? If so, would it be a problem if this was not permitted??? I have an idea.

Bill Henning · 2013-12-05 19:45

Good points.

Using PTRA for the PC would simplify matters, as otherwise every cog would need dual-mode program counters. I was just trying to think of a way of saving the pointer resource.

Correct, the same thing could be achieved by building a temp variable and using that, however that requires an additional instruction.

It's up to you, we make suggestions, you get to cherry pick and polish them.

To fully support multi-tasking is more complex.

If the PC is PTRA, only one task of the four can be in hubexec mode - which is fine. I envisioned hubexec as a whole cog thing, for maximum performance.

Unless there is a separate cache for the RDxxxxC instructions, the WIDE cache would constantly thrash, totally killing performance for both the hubexec and regular tasks.

Allowing multiple hubexec tasks per cog requires not using PTRA, making four wide dual-mode program counters (one for each task), four BIG hidden/exposed registers, four separate WIDE cache buffers, and four separate caches for RDxxxxC's. Too much to add at this stage, raises the risk of problems significantly. Great idea for P3 though.

cgracey wrote: »

I think that in hub-execute mode, PTRA will become the program counter and its three LSBs will dictate which long in WIDEs is executing. There should be no problem with 8-long boundaries, as they should seamlessly connect. There will be pipeline stalls, though, when WIDEs are reloading. Those stalls don't break up the continuity in the pipeline, though.

I don't know if having a visible BIG register is good. It seems kind of dirty. I think it would be better to sublimate the function by the trailing pipeline AUGD/AUGS instructions. Also, I understand Bill's argument about BIG being reusable, but couldn't the same thing be achieved just by moving a 32-bit constant into a cog register and using that? I might be missing the boat on a few things here.

About hub-execute mode: To make this really seamless and not like some half-baked LMM mode from before, there needs to be an overall mode awareness, or awareness by task. To support multitasking, plus hub-execute for a particular task, we'd need the latter, plus a task-aware BIG constant mechanism which would have to be some kind of a prefix.
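
A rough Python model of the two points above: the three LSBs of the program counter selecting a long within the 8-long WIDE window, and the thrashing that follows if instruction fetches and cached data reads share one window. Addresses are counted in longs here, and `WideCache` and `run` are hypothetical names; this is a sketch under those assumptions, not a description of the actual hardware.

```python
WIDE_LONGS = 8  # one WIDE window holds 8 longs


class WideCache:
    """A single 8-long window onto hub memory (addresses counted in longs)."""

    def __init__(self, hub):
        self.hub = hub
        self.base = None    # long address of the first cached long
        self.reloads = 0    # each reload stands in for a pipeline stall

    def read(self, long_addr):
        base = long_addr & ~(WIDE_LONGS - 1)    # which 8-long block
        index = long_addr & (WIDE_LONGS - 1)    # three LSBs pick the long
        if base != self.base:
            self.base = base
            self.reloads += 1
        return self.hub[base + index]


def run(shared):
    """Fetch 16 sequential instructions, each paired with one data read."""
    hub = list(range(64))
    icache = WideCache(hub)
    dcache = icache if shared else WideCache(hub)
    for pc in range(16):
        icache.read(pc)         # instruction fetch via the program counter
        dcache.read(32 + pc)    # data read, standing in for RDxxxxC
    if shared:
        return icache.reloads
    return icache.reloads + dcache.reloads


print("shared window reloads:  ", run(shared=True))
print("separate window reloads:", run(shared=False))
```

In this toy run the shared window reloads on every access, while separate windows reload only when each stream crosses an 8-long boundary.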

Bill Henning · 2013-12-05 19:46

Good point!

That would work.

potatohead wrote: »

@Cluso, what do you mean?

Maybe HUBEXEC can happen on part of a scan line. Sometimes graphics engines have exceeded what we can get into a COG. On P1, we simply didn't do them, or we had to pack things into two COGS, etc...

At clock speed, quick dips into HUBEXEC mode while AUX is being drawn to the screen could be VERY useful. Nice to have the execute space frankly. Would be a shame to give it up before we've even written any code!

If AUX were linked to HUBEXEC, it would preclude using it in this way. COG mode used to fill AUX, for example.

Then those pixels take a LONG time to render to the screen at the speed of the P2. Drop into HUBEXEC to get various things done, return to COG mode for more feeding of WAITVIDS, and on it all goes. IMHO, people will do this. I'm thinking of it.

I agree with this overall. So many optimizations done already. Digging too deep may well overcomplicate things and limit some of the potential we've already got.

Bill Henning · 2013-12-05 19:46

Ok, I give up

cgracey wrote: »

I agree, too.

Bill Henning · 2013-12-05 19:51

+1

cgracey wrote: »

By making the programmer express ## for 32-bit values, he's signifying that he understands he's going to get an extra long and an extra clock cycle. This scheme is wasteful for non-hub-exec code, because it takes an unnecessary cycle - that 32-bit constant could have been declared in a register elsewhere so that it didn't waste time by being in-line. This mechanism only makes sense within hub-exec code.

Bill Henning · 2013-12-05 19:53

+1

ozpropdev wrote: »

If the only way we can get HUBEXEC is to lock out multi-tasking then I'm cool with that.
Yes people, you read correctly.... sacrifice multi-tasking mode to use HUBEXEC!

jazzed · 2013-12-05 19:55

potatohead wrote: »

There is "industry standard" assembler syntax, and there is PASM syntax. ....

Doug, Propeller GAS is essentially a PASM Macro Assembler.

potatohead · 2013-12-05 19:57

jmg wrote: »

Now you've just made code less portable

The best sense, is something you can hand to a first year student (ex AVR or PIC or 8051), and say 'What does this do' ?

Parallax are strong in education, so they should focus on code clarity. Inventing NEW magic operators, is not clever..

I think the idea of portable assembly language is pretty funny.

That said, PASM in the form that Chip produced it is frankly the leanest, easiest to learn, effective assembly language I've ever used. I've taught assembly language a few times, and PASM is dead simple to teach to people. I didn't even have to break a sweat this last time, and the lean, unified environment of SPIN + PASM rocks hard.

Sorry, I really don't care how portable that is. SPIN + PASM as currently envisioned needs to exist. Quite frankly, let the other environments compete with it. I'm sure they will do fine.

And to be really clear, none of that is intended as some sort of dig on gcc, or any other thing. For those who have mastered gcc and friends, that industry standard syntax will be there for them, and for those students in education on a C path, it will be there for them too.

We have no need to start trying to make one unified syntax / environment. It's not gonna happen. I don't think it even should.

So then, there is the hobby, learn by doing education and there is the more formal education. The educators called for C, and Parallax has that path covered. The others call for the magic that is SPIN + PASM, which I might add, is a serious reason why many of us are here in the first place, and there is a path for that too.

Chip needs to do the syntax he would do. I feel strongly about this. That syntax won't be on the beaten path, and that is a very good thing.

Bill Henning · 2013-12-05 20:06

+1

potatohead, I totally agree.

potatohead · 2013-12-05 20:09

@Steve: Yes I know.

My objection isn't about gas. It's specifically about leaving the syntax Chip would create intact. For those so inclined to use it, they will find it very productive. Lean and mean like they do on P1 right now.

And I know that is a special thing, not standard, etc... Let's just let it exist for those inclined to use it. That's all.

Let me put it this way. I know some very experienced C programmers, who do low level things regularly in that tool chain, who feel exactly the same way I do about SPIN + PASM. It's as brilliant as it is non-standard.

I really don't want those bridges crossed because there just isn't a lot of value doing so. However, there is a TON of value in the chip, PASM + SPIN being designed to work together. --again, for those so inclined.

Once again, not a dig. Just a boundary I think needs to be intact.

And one other thing: It's ASSEMBLY LANGUAGE. Assembly language programmers do lots of stuff that higher level programmers would consider inadvisable. Great, no worries. That is why they are assembly language programmers and not higher level programmers. And at the end of the day, when it's all out there, those people work pure magic and they do so precisely because they CAN. This needs to exist. I feel strongly about it.

There are times and places where it does not make sense to do that, and we will have the tools for all of those times and places. But, there are times and places where it's AWESOME to do that, and I want that too.

Edit: And one more thing. Once the P2 is done and everybody gets to work on the things they want to work on, I very strongly suspect Chip is going to return to development tools that run on the P2, and that's yet another reason to leave the syntax vision of SPIN + PASM alone.

Think of it like the Forthers do. They know all the standard stuff, yet they use Forth. They have the same reasons for doing that we find people using SPIN + PASM do, and it's OK. In fact, it's brilliant and the wonderful thing about computing is it all plays out just fine. No worries.

In the end, these things all compete, none of them win the day, and we all use what we use and when we work together, we compromise and that is as good as it all will ever get.
