Pnut Assembler Bug - Immediate representation of cog address labels

ozpropdev · 2015-09-30 05:27

Hi Chip
If I refer to a cog address by name as an immediate it fails. If I manually enter the cog address it works.
An example

	mov	regy,#one	'fails
	mov	regy,#$30	'works

See attached demo program

Test result for manually coded immediate value

Prop2 "ALTDS" instruction test #1

regy = 00000030
-=> Results <=-
11111111
22222222
33333333
44444444
55555555
66666666
77777777
88888888

Results of Pnut's immediate assignment

Prop2 "ALTDS" instruction test #1

regy = 000000C0
-=> Results <=-
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000

cgracey · 2015-09-30 08:54

I've looked over your code, but haven't run it, yet.

I think when you do '#one' you are getting the byte-oriented address. To get the register, you'd need to do '#one>>2'. At least, that's a problem I see.

This byte-oriented addressing is a pain, in some ways.
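The numbers in the first post are consistent with this: the failing run reports regy = 000000C0 and the working one 00000030, and $C0 >> 2 = $30. A minimal Python sketch of that conversion, assuming 4-byte longs; the helper name `byte_addr_to_reg` is hypothetical and not part of Pnut:

```python
BYTES_PER_LONG = 4  # assumption: one cog register is one 32-bit long


def byte_addr_to_reg(byte_addr: int) -> int:
    """Convert a byte-oriented label address to a cog register index."""
    if byte_addr % BYTES_PER_LONG:
        raise ValueError(f"${byte_addr:X} is not long-aligned")
    return byte_addr >> 2  # same as dividing by 4


# '#one' produced $C0 in the failing run; the register itself sits at $30.
print(f"${byte_addr_to_reg(0xC0):02X}")
```

The alignment check is an addition of this sketch; a plain `>>2` in the assembler would silently drop the two low bits.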

cgracey · 2015-09-30 08:56

I'm concerned that this whole addressing issue is causing so much fatigue among us that we are not feeling as free as we should. Would you concur?

cgracey · 2015-09-30 08:58

Having long-oriented (as opposed to byte-oriented) addressing throughout the whole system of hardware and tools would take a huge load off our minds, but then there's the pesky problem of how to address bytes and words.

ozpropdev · 2015-09-30 09:01

Thanks Chip, that fixed it!
Mind was in old cog mode. I forgot about the new byte addressing model.

ozpropdev · 2015-09-30 09:05

Can the assembler assume that the last directive was an "ORG" and convert cog register references the "old" way?

cgracey · 2015-09-30 09:09

ozpropdev wrote: »

Can the assembler assume that the last directive was an "ORG" and convert cog register references the "old" way?

Sure, but then we've got another problem:

How to handle JMP #label?

What if it's a hub address that uses the two LSBs?

We can make all kinds of patches, but we need a fundamental change to make things really straight.

ozpropdev · 2015-09-30 09:25

Oh yeah, a JMP to hub-exec does upset things a bit.

While on the subject of making things nice and easy, I was playing with ALTDS today.
One idea on a supporting syntax for this instruction was along these lines.

    mov regx,(regy)         'use s value in regy to replace original s value
    mov regx,(regy++)       'same as above, but with increment
    add regx,regy `newreg   'redirects result to field supplied by newreg
.. and similar for d-field
    mov (regy),#0
...etc

Like the AUGx implementation currently in Pnut these would generate ALTDS instructions.
Cleans code up and eliminates remembering the ALTDS modes.
Just an idea...

jmg · 2015-09-30 09:42

cgracey wrote: »

Sure, but then we've got another problem:

How to handle JMP #label?

What if it's a hub address that uses the two LSBs?

We can make all kinds of patches, but we need a fundamental change to make things really straight.

A good start would be to write the address without the # (just like nearly all other assemblers do).

Then you could introduce an ADR() operator for loading an address for index use.
User intent is clear, and the assembler checks the segment and applies the required >>, if any.
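One way such an operator might behave, sketched in Python rather than in the assembler itself. Everything here is an assumption made for illustration: the `Symbol` record, the "cog"/"hub" segment tags, and the rule that cog symbols are shifted right by two while hub symbols keep their byte address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    name: str
    segment: str    # "cog" or "hub" (hypothetical segment tags)
    byte_addr: int  # address as the assembler tracks it, in bytes


def adr(sym: Symbol) -> int:
    """Hypothetical ADR(): return the address in the units the segment uses."""
    if sym.segment == "cog":
        if sym.byte_addr & 3:
            raise ValueError(f"{sym.name}: cog symbol is not long-aligned")
        return sym.byte_addr >> 2   # cog registers are counted in longs
    if sym.segment == "hub":
        return sym.byte_addr        # hub addresses stay byte-oriented
    raise ValueError(f"{sym.name}: unknown segment {sym.segment!r}")


one = Symbol("one", "cog", 0xC0)
table = Symbol("table", "hub", 0x1002)
print(hex(adr(one)), hex(adr(table)))
```

The point of the design is that the programmer states intent once, with ADR(), and the shift becomes the tool's job rather than something to remember at every use.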

cgracey · 2015-09-30 09:46

ozpropdev wrote: »
Oh yeah, a JMP to hub-exec does upset things a bit.

While on the subject of making things nice and easy, I was playing with ALTDS today.
One idea on a supporting syntax for this instruction was along these lines.
    mov regx,(regy)         'use s value in regy to replace original s value
    mov regx,(regy++)       'same as above, but with increment
    add regx,regy `newreg   'redirects result to field supplied by newreg
.. and similar for d-field
    mov (regy),#0
...etc
Like the AUGx implementation currently in Pnut these would generate ALTDS instructions.
Cleans code up and eliminates remembering the ALTDS modes.
Just an idea...

I've been thinking pretty much the same thing. A way to hide all the ugliness and make it easy to use.

What's really eating me up at the moment is this issue of cog registers being located at 1/4 of the measured address. This creates all sorts of nasty gotchas. We need a long-based machine that can address byte and word data, as well. It seems like we have it backwards.

jmg · 2015-09-30 09:46

cgracey wrote: »

Having long-oriented (as opposed to byte-oriented) addressing throughout the whole system of hardware and tools would take a huge load off our minds, but then there's the pesky problem of how to address bytes and words.

Maybe you could default to implicit long, and use something like
ByteAdr(ByteVariableName)
WordAdr(WordVariableName)
for loading the other cases?

cgracey · 2015-09-30 09:52

jmg wrote: »

cgracey wrote: »

Having long-oriented (as opposed to byte-oriented) addressing throughout the whole system of hardware and tools would take a huge load off our minds, but then there's the pesky problem of how to address bytes and words.

Maybe you could default to implicit long, and use something like
ByteAdr(ByteVariableName)
WordAdr(WordVariableName)
for loading the other cases?

Perhaps.

What P2-Hot did was track long addresses for PC use (got rid of two LSBs), and then had instructions which could take PC values and shift them left two bits and add offsets for array lookups, and what not. It worked pretty well. I just can't get over the estrangement from not being able to call out cog registers by their numbers.

jmg · 2015-09-30 09:59

cgracey wrote: »

I just can't get over the estrangement from not being able to call out cog registers by their numbers.

Do you mean using a raw decimal number?
That is a rare usage, as most ASM code reserves locations by name, and then uses the VarName as the reference.
Same with Labels.

The tools can then do pretty much what you describe the P2-Hot silicon as having done.

I think the silicon has to 'think in bytes', but the tools do not need to use bytes as the default - the most common usage would be a better default.

David Betz · 2015-09-30 10:20

cgracey wrote: »

I'm concerned that this whole addressing issue is causing so much fatigue among us that we are not feeling as free as we should. Would you concur?

It seems to me that all assemblers for machines with long-aligned instructions have faced this problem since the beginning of time. How do they handle this? I think part of it is to implicitly shift branch target addresses right by two when assembling instructions. Part of how this is handled has to do with the whole "jmp foo" vs. "jmp #foo" syntax change that others have suggested. If branch or jump targets are treated differently from immediate values, the rules for assembling addresses can be different. Does this solve the problem?
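A minimal Python sketch of that two-rule idea, assuming every branch target is long-aligned (which is not what the current hub addressing allows). The function name and the `is_branch_target` flag are hypothetical; they only illustrate how the same label could assemble to different field values depending on how it is used.

```python
def encode_operand(byte_addr: int, is_branch_target: bool) -> int:
    """Encode a label under separate rules for branch targets and immediates."""
    if is_branch_target:
        if byte_addr & 3:
            raise ValueError("branch target must be long-aligned")
        return byte_addr >> 2   # "jmp foo": field holds a long address
    return byte_addr            # "mov r,#foo": field holds the raw byte address


foo = 0xC0  # byte address of a label
print(hex(encode_operand(foo, is_branch_target=True)))
print(hex(encode_operand(foo, is_branch_target=False)))
```

Under this assumption the ambiguity moves out of the label and into the instruction form, so the assembler never has to guess which units the programmer meant.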

Electrodude · 2015-09-30 13:11

cgracey wrote: »

Having long-oriented (as opposed to byte-oriented) addressing throughout the whole system of hardware and tools would take a huge load off our minds, but then there's the pesky problem of how to address bytes and words.

Then please make all instruction addresses long-aligned. Why do we even need misaligned instructions? It won't save any significant amount of memory. Also, forcing instructions to be long-aligned doesn't have to disallow non-aligned data accesses (unless I have no idea how streamer addressing works) - keep the hub streamer how it is now, and feed it {pc, 2'b00} as the address when doing hubexec.
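In Verilog, {pc, 2'b00} concatenates two zero bits below pc, which is the same as shifting pc left by two. A small Python model of that mapping, assuming a long-oriented program counter; `hub_fetch_addr` is a hypothetical name, not a signal in the actual design:

```python
def hub_fetch_addr(pc_long: int) -> int:
    """Model Verilog's {pc, 2'b00}: append two zero bits to a long-oriented PC."""
    return (pc_long << 2) | 0b00


pc = 0x400
# Consecutive long-oriented PC values map to byte addresses 4 apart.
print(hex(hub_fetch_addr(pc)), hex(hub_fetch_addr(pc + 1)))
```

Because the two low bits are always zero, instruction fetches can never be misaligned, while data accesses remain free to use any byte address.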

I'm sure many if not most would disagree with me on this last part, but I would actually prefer long addressing for cog/lutram and byte addressing for hubram over consistent byte addressing for both.

Hubexec is not the main feature of the P2. In fact, it is arguably the least unique feature about the P2 - it's all most processors have. It should not dictate how everything else works.

User Name · 2015-09-30 16:06

> Hubexec is not the main feature of the P2. In fact, it is arguably the least unique feature about the P2 - it's all most processors have. It should not dictate how everything else works.

Yes x 10. It is distressing to me to see everything else compromised for hubexec. But then that has been a fundamental split among Prop users for a long while. My privately held opinion has been that if hubexec is your thing, you might do well to look into an ARM.

David Betz · 2015-09-30 16:10

User Name wrote: »

> Hubexec is not the main feature of the P2. In fact, it is arguably the least unique feature about the P2 - it's all most processors have. It should not dictate how everything else works.

Yes x 10. It is distressing to me to see everything else compromised for hubexec. But then that has been a fundamental split among Prop users for a long while. My privately held opinion has been that if hubexec is your thing, you might do well to look into an ARM.

Unfortunately, that is exactly what most people have already done. How many of us are left here?

User Name · 2015-09-30 16:47

Wish I knew whether the answer was to be more like the competition or to up your own game.

David Betz · 2015-09-30 16:51

User Name wrote: »

Wish I knew whether the answer was to be more like the competition or to up your own game.

I thought the idea was to be somewhere in between. Hub execution gives us the ability to run largish programs that are compiled to native code. This is necessary for some applications. Sometimes you can't pay the penalty for a bytecode interpreter. However, we also have the ability to run fast COG code like we did in P1. Isn't this the best of both worlds?

User Name · 2015-09-30 17:06

Pretty clear that new P2 cogs will not be as fast as they could have been if there were no such thing as hub exec. The many little performance compromises that have had to be made have stacked up. Maybe that will prove to have been the proper direction to take. Time will tell.

David Betz · 2015-09-30 17:11

User Name wrote: »

Pretty clear that new P2 cogs will not be as fast as they could have been if there were no such thing as hub exec. The many little performance compromises that have had to be made have stacked up. Maybe that will prove to have been the proper direction to take. Time will tell.

Interesting. I wasn't aware that there were any compromises made in COG mode execution to support hub execution. What are they? I realize that the setup for COG execution is a bit more complicated than it used to be. Is that what you're talking about? Or is execution after setup also slower for some reason?

cgracey · 2015-09-30 19:12

There are no performance compromises in the cog to support hub exec. It's just that the resultant addressing scheme complicates the user's experience. The tools can only go so far in trying to ameliorate this.

User Name · 2015-09-30 23:02

I'm mostly referring to the basic architecture... The depth of the pipeline required to make the magic happen... The number of gates occupying critical paths... The increase in the total number of gates... The number of gates being clocked... The heat produced... The die size... Etc.

I think I've read every word that Chip has ever posted to the Forum, and while I don't have perfect recall (not even close), the overall impression has always been that features have their costs.

If hub exec is a true freebie (other than the addressing complications) I'll shut up right now.

Actually, I'll probably shut up anyway.

cgracey · 2015-10-01 01:22

User Name wrote: »

I'm mostly referring to the basic architecture... The depth of the pipeline required to make the magic happen... The number of gates occupying critical paths... The increase in the total number of gates... The number of gates being clocked... The heat produced... The die size... Etc.

I think I've read every word that Chip has ever posted to the Forum, and while I don't have perfect recall (not even close), the overall impression has always been that features have their costs.

If hub exec is a true freebie (other than the addressing complications) I'll shut up right now.

Actually, I'll probably shut up anyway.

It actually was almost free, after implementing the eggbeater hub memory scheme. When I went to connect it up for hub exec, it was literally a line of Verilog here, a line there, and one over there. I thought it was, somehow, going to be a lot more complicated than that. The eggbeater was needed for other features, as well, so it wasn't hub-exec-centric.

David Betz · 2015-10-01 01:28

cgracey wrote: »

User Name wrote: »

I'm mostly referring to the basic architecture... The depth of the pipeline required to make the magic happen... The number of gates occupying critical paths... The increase in the total number of gates... The number of gates being clocked... The heat produced... The die size... Etc.

I think I've read every word that Chip has ever posted to the Forum, and while I don't have perfect recall (not even close), the overall impression has always been that features have their costs.

If hub exec is a true freebie (other than the addressing complications) I'll shut up right now.

Actually, I'll probably shut up anyway.

It actually was almost free, after implementing the eggbeater hub memory scheme. When I went to connect it up for hub exec, it was literally a line of Verilog here, a line there, and one over there. I thought it was, somehow, going to be a lot more complicated than that. The eggbeater was needed for other features, as well, so it wasn't hub-exec-centric.

I'm not sure why it is necessary to apologize for hub exec mode. It makes perfect sense to be able to execute instructions from the largest RAM on the chip. HLL performance on the P1 is crippled because of a lack of this capability. And how many P1 sales were lost because of this limitation?

cgracey · 2015-10-01 01:43

David Betz wrote: »

cgracey wrote: »

User Name wrote: »

I'm mostly referring to the basic architecture... The depth of the pipeline required to make the magic happen... The number of gates occupying critical paths... The increase in the total number of gates... The number of gates being clocked... The heat produced... The die size... Etc.

I think I've read every word that Chip has ever posted to the Forum, and while I don't have perfect recall (not even close), the overall impression has always been that features have their costs.

If hub exec is a true freebie (other than the addressing complications) I'll shut up right now.

Actually, I'll probably shut up anyway.

It actually was almost free, after implementing the eggbeater hub memory scheme. When I went to connect it up for hub exec, it was literally a line of Verilog here, a line there, and one over there. I thought it was, somehow, going to be a lot more complicated than that. The eggbeater was needed for other features, as well, so it wasn't hub-exec-centric.

I'm not sure why it is necessary to apologize for hub exec mode. It makes perfect sense to be able to execute instructions from the largest RAM on the chip. HLL performance on the P1 is crippled because of a lack of this capability. And how many P1 sales were lost because of this limitation?

Good point. It did turn out to be pretty simple, though, once the eggbeater was there to serve a few other necessary purposes, as well.
