The addressing conundrum

Cluso99 · 2015-10-03 12:45

David Betz wrote: »
cgracey wrote: »
I got the new memory and branching model working. I also got REP working in hub exec.

There's full binary compatibility now between cog/lut code and hub code that use relative addressing.

In cog code now, we are back to the good old 1:1 addressing - no more 4x'd register addresses. What a relief!

Here's the new map for code execution :

00000..001FF = cog
00200..003FF = lut
00400..FFFFF = hub

Downloaded programs start at $400.

When in the cog, all registers are long, with their addresses being contiguous integers. The PC steps by 1.

When in the hub, instructions take 4 bytes. The PC steps by 4.

To bridge the two contexts, there are two simple things done:

The 9-bit-constant relative branches DJNZ/DJZ/TJZ/... encode the -256..+255 instruction range into their S field. When in cog exec, that value is sign-extended and added to the PC. When in hub exec, it is shifted left two bits and used the same way. This way, both cog and hub contexts get the max use out of these instructions and maintain binary compatibility.

The 20-bit-constant relative branches JMP/CALL/CALLA/... are encoded for hub exec as you imagine they would be, where they track byte offset. When the cog uses these branches, it shifts them right two bits to get cog-relative values. They are assembled pre-4x'd in cog code that way. So, these instructions are now binary compatible between cog/lut and hub code.

REP now works in hub exec by forcing a jump during the last instruction in the repeat block. It didn't take much logic to implement and it works just as you'd expect. Even though it's slow in hub exec, because of the branching on each iteration, it is a convenient instruction to have for doing simple loops.

The assembler generates the same code for relative branches and REP in both cog/lut exec and hub exec contexts.

I will have updated FPGA files done tomorrow. I just finished the Prop123-A7 compile and now I need to make the DE2-115 version.

Here's what the all_cogs_blink program looks like now. Note the ORGH and the REP:
dat
	orgh	$400

' launch 15 cogs (cog 0 falls through and runs 'blink', too)
' any cogs missing from the FPGA won't blink

	loc	x,@blink

	rep	@repend,#15
	coginit	#16,x
repend

blink	cogid	x		'which cog am I?
	setb	dirb,x		'make that pin an output
	notb	outb,x		'flip its output state
	add	x,#16		'add to my id
	shl	x,#18		'shift up to make it big
	waitx	x		'wait that many clocks
	jmp	@blink		'do it again

	org
x	res	1		'variable at cog register 8

Looks good except I find it odd that the 9 bit immediate addresses get treated as long addresses but the 20 bit addresses get treated as byte addresses. Seems like it would be better if they were both shifted left by 2 to get hub addresses for consistency. Then all immediate address fields are treated as long addresses or long offsets.

This is precisely what I have been pushing for without success. Makes for a standard instruction model of all long addresses. The consequence of this is all instructions must be long aligned, which IMHO is fine.
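
For anyone trying to picture the two branch rules Chip describes in the quote above, here is a minimal Python sketch of the arithmetic. It is not the hardware or the assembler, only an illustration. The function names are made up, the exec mode is assumed to follow from the PC using the quoted memory map, and the quote does not say whether the offset is applied to the address of the branch itself or of the following instruction, so `pc` here is simply "whatever base the hardware uses".

```python
HUB_START = 0x400  # from the quoted map: 00400..FFFFF = hub


def is_hub_exec(pc):
    """Assumption: exec mode follows the PC (cog/lut below $400, hub from $400)."""
    return pc >= HUB_START


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def branch9_target(pc, s_field):
    """9-bit relative branch (DJNZ/DJZ/TJZ/...): the field counts instructions."""
    offset = sign_extend(s_field, 9)   # -256..+255 instructions
    if is_hub_exec(pc):
        offset <<= 2                   # hub exec: instructions are 4 bytes apart
    return pc + offset


def branch20_target(pc, field):
    """20-bit relative branch (JMP/CALL/CALLA/...): the field counts bytes."""
    offset = sign_extend(field, 20)    # byte offset, as encoded for hub exec
    if not is_hub_exec(pc):
        offset >>= 2                   # cog exec: convert bytes to registers
    return pc + offset


# The same encoded field means "one instruction back" in both contexts.
print(hex(branch9_target(0x010, 0x1FF)))     # cog exec, PC steps by 1
print(hex(branch9_target(0x1000, 0x1FF)))    # hub exec, PC steps by 4
print(hex(branch20_target(0x1000, 0xFFFF8)))  # hub exec, 8 bytes back
print(hex(branch20_target(0x010, 0xFFFF8)))   # cog exec, 2 registers back
```

The asymmetry under discussion is visible in the two functions: the 9-bit field is scaled up on the hub side, while the 20-bit field is scaled down on the cog side.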

David Betz · 2015-10-03 12:50

Cluso99 wrote: »
This is precisely what I have been pushing for without success. Makes for a standard instruction model of all long addresses. The consequence of this is all instructions must be long aligned, which IMHO is fine.

Yes, I realize I didn't originate this idea. I just thought I'd give it one more push! :-)

rjo__ · 2015-10-03 14:28

oops

Heater. · 2015-10-03 14:35

LONG aligned instructions sound fine to me. Would not expect anything else really.

rjo__ · 2015-10-03 14:42

ok, here's a conundrum, you guys can all agree upon:)

dat
	orgh	1

' launch cog 1 (cog 0 falls through and runs 'blink', too)
	coginit	#1,#blink

blink	cogid	x		'which cog am I?
	setb	dirb,x		'make that pin an output
	notb	outb,x  	'flip its output state
	waitx	myval		'wait that many clocks
	jmp	@blink		'do it again

	org
myval     long    $2FAF080  '50_000_000
x       res     1

2 issues:

1. When cog0 gets to waitx ... it doesn't get myval, effectively executing the following:

waitx #0

And the cog0 LED stays on for about a minute and 24 seconds... presumably until waitx rolls over its counter.

2. When cog1 gets to waitx, it doesn't get myval either, but does get a value... a pretty big one and apparently always the same value (or nearly), but not the one I am trying to send it (myval).

What am I doing wrong??
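
A quick sanity check on the "rolls over its counter" guess, as a Python sketch. Two things here are assumptions and not stated in the post: that the wait counter is 32 bits wide, and that the clock is 50 MHz (suggested only by the choice of 50_000_000 as the delay value).

```python
CLOCK_HZ = 50_000_000   # assumed clock rate, not stated in the post
COUNTER_BITS = 32       # assumed width of the wait counter


def rollover_seconds(clock_hz=CLOCK_HZ, bits=COUNTER_BITS):
    """Time for a free-running counter of `bits` bits to wrap once."""
    return (1 << bits) / clock_hz


# The hex constant in the listing really is fifty million.
assert 0x2FAF080 == 50_000_000

seconds = rollover_seconds()
minutes, remainder = divmod(seconds, 60)
print(f"{int(minutes)} min {remainder:.1f} s")
```

Under those assumptions a full wrap takes a little under a minute and a half, which is in the same range as the observed delay.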

rjo__ · 2015-10-03 14:56

I can fix it by making myvalue a constant and then using

waitx ##myvalue

but I still don't understand what my code is doing wrong.

Roy Eltham · 2015-10-03 15:41

David, Cluso, Heater,

Chip doesn't want to give up unaligned code in HUB. It does allow for simpler mixed code and data, as in his example with strings.

I think the only real argument here is that, because of this, it's still possible to write HUB code that is not binary compatible with COG/LUT, so it can't also run there. His example with mixed data and code will not work if copied into a cog. Of course, even if he made hub instructions long aligned only, his example wouldn't work in cog anyway because it needs byte access to the string data.

The only real complete solution would actually be to go in the other direction and make it so COG/LUT could run unaligned code and also have byte/word data access (rdxxxx/wrxxxx would need to be able to read COG/LUT addresses, and the memory map would have to change so hub would not start at 0). I don't think this is feasible to do for P2's timeframe, if at all with this architecture.

Seairth · 2015-10-03 15:45

rjo__ wrote: »
What am I doing wrong??

You are executing in hub exec mode. But you are treating myval like you are running in cog exec mode.

Seairth · 2015-10-03 15:49

Following up, you either need to copy your data into the cog, or you need to use rdlong to read the value from its location in hub memory.

Incidentally, this sort of mistake is going to be a common occurrence. Not sure what can be done about it, though.

Roy Eltham · 2015-10-03 15:52

rjo__

The problem is that data elements are not automatically copied up into the COG, so your code is reading address $10 in the COG memory for myval, but it's not being initialized to your 50 million value. You need to copy the value into the COG register yourself in code.
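
To make the distinction concrete, here is a toy Python model of the two address spaces. It is only a sketch: the hub address, the stale register contents, and the helper names are all hypothetical, and the register index is just the $10 mentioned above. The point is that a register operand is looked up in cog RAM, which nothing has filled in, while the assembled value sits in hub RAM until something copies it.

```python
MYVAL_HUB_ADDR = 0x1000      # hypothetical hub address where the long was assembled
MYVAL_REG = 0x10             # cog register index, as mentioned in the post above
STALE = 0xDEADBEEF           # stand-in for whatever the register held at startup

# Hub RAM holds the assembled data; the cog's registers were never loaded.
hub = {MYVAL_HUB_ADDR: 50_000_000}
cog = [STALE] * 512


def register_operand(cog_regs, reg):
    """What an instruction like `waitx myval` sees: a cog register, not hub RAM."""
    return cog_regs[reg]


def rdlong(cog_regs, reg, hub_ram, addr):
    """Model of an explicit copy from hub RAM into a cog register."""
    cog_regs[reg] = hub_ram[addr]


before = register_operand(cog, MYVAL_REG)   # stale contents, not 50_000_000
rdlong(cog, MYVAL_REG, hub, MYVAL_HUB_ADDR)
after = register_operand(cog, MYVAL_REG)    # now the intended delay

print(hex(before), after)
```

Running in hub exec changes where the instructions come from, not where register operands come from, which is why the explicit copy is needed.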

rjo__ · 2015-10-03 16:02

That's not so bad:)

This works

con

'myvalue = 50_000_000

dat
	orgh	1

' launch cog 1 (cog 0 falls through and runs 'blink', too)
	coginit	#1,#blink
myvalue     long    $2FAF080  '50_000_000

blink	cogid	x		'which cog am I?
	setb	dirb,x		'make that pin an output
	notb	outb,x  	'flip its output state
        rdlong  mycogvalue,##myvalue
	waitx	mycogvalue		'wait that many clocks
	jmp	@blink		'do it again

	org
x                  res     1
mycogvalue  res 1

rjo__ · 2015-10-03 16:05

But... I want to be in cog execute mode... conserving as much HUB RAM as possible... So?

potatohead · 2015-10-03 17:50

Doesn't Intel use non-aligned instructions? Those are variable length, and it's pretty common.

The way Chip has it now we get the best of both options. I understand it is compelling to just pick hub or cog, but the reality is the two have basic differences that make doing that largely impractical.

Additionally, we are back to cog code being simple and fun. This is important because cog code being easy and fun helps with learning, drivers, and/or getting the max performance.

Assembly language is looking fun now with these latest decisions.

That is a design goal guys. Higher level tool considerations are important too. And gcc, etc... will be just fine with what has been done.

Finally, as it stands right now, an on chip dev system has great potential. I want to see that happen. All of that is the "is fun" part of our design spec. Why not? How often is that part of the discussion? Never! So let's maximize that part right along with the practicalities.

Future geeks will thank us.

cgracey · 2015-10-03 18:52

Guys, do you think we should give COGINIT an option for loading a cog's RAM and then JMPing to it at $008? It would make it much easier to start up small programs. Having cogs start up in hub exec is like making everybody jump into the deep end of the pool.

Heater. · 2015-10-03 19:08

Having everything start as hub exec is exactly how the Propeller 1 works. As far as I know there is no way to write a pure PASM program for the P1 without at least a few Spin byte codes to get it started.

Luckily, back in the dark ages when the Prop Tool was Windows only, before there was a BST, SimpleIDE, HomeSpun, OpenSpin, Catalina, Prop GCC, etc., there was at least Cliff Biffle's propasm (https://github.com/cbiffle/propasm), which did exactly that.

Bill Henning · 2015-10-03 19:11

That would be handy!

Btw, I like your solution re/ addressing. Personally, I plan on using the first 4k in the hub for mailboxes and system wide data.

Keep the short immediate address jumps in longs, and the branches in +/- longs. Be green. Don't waste two address bits on 0's.

It would be a terrible waste to limit the 9 bit cog # branches to the first 128 cog addresses, and the 9 bit relative addresses to +/- 64 longs - it would be taking purism too far.
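
The numbers behind that can be checked with a few lines of Python. This is only arithmetic on a 9-bit field, with a made-up helper name: if the low two bits of the field were always zero in cog code, two of the nine bits would carry no information.

```python
FIELD_BITS = 9


def reach(field_bits, wasted_low_bits):
    """Return (absolute addresses, min relative, max relative) for a field
    whose low `wasted_low_bits` bits are forced to zero."""
    usable = field_bits - wasted_low_bits
    absolute = 1 << usable
    rel_min = -(1 << (usable - 1))
    rel_max = (1 << (usable - 1)) - 1
    return absolute, rel_min, rel_max


print(reach(FIELD_BITS, 0))   # field counts longs: full cog range
print(reach(FIELD_BITS, 2))   # field counts bytes: only a quarter of it
```

With the field counting longs, all 512 cog addresses and the -256..+255 relative range stay reachable; counting bytes would cut both by a factor of four.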

cgracey wrote: »

Guys, do you think we should give COGINIT an option for loading a cog's RAM and then JMPing to it at $008? It would make it much easier to start up small programs. Having cogs start up in hub exec is like making everybody jump into the deep end of the pool.

potatohead · 2015-10-03 19:34

That is a nice option Chip.

jmg · 2015-10-03 19:37

cgracey wrote: »

Guys, do you think we should give COGINIT an option for loading a cog's RAM and then JMPing to it at $008? It would make it much easier to start up small programs. Having cogs start up in hub exec is like making everybody jump into the deep end of the pool.

Choice is always good. It should be clear which mode a user is starting in.
Is there a cost to this option?

Electrodude · 2015-10-03 20:14

Instead of doing cogram COGINIT in hardware, what if you modify COGINIT to take a SETQ x, where x initializes the new cog's ptra, and then put a subroutine in hubram like this:

DAT
               orgh
coginit_cogram setq    #$1FF
               rdlong  0, ptra
               jmp     #8

Then, to load a cog P1-style, you would just do:

               setq    ##cog_code
               coginit #16, ##coginit_cogram ' is augs+setq+augs+instr like this legal?
               ...

DAT
               org
cog_code       long    0[8] ' special registers
               ' cogexec code

Seairth · 2015-10-03 20:30

cgracey wrote: »

Guys, do you think we should give COGINIT an option for loading a cog's RAM and then JMPing to it at $008? It would make it much easier to start up small programs. Having cogs start up in hub exec is like making everybody jump into the deep end of the pool.

Yes!

cgracey · 2015-10-03 20:42

jmg wrote: »

cgracey wrote: »

Guys, do you think we should give COGINIT an option for loading a cog's RAM and then JMPing to it at $008? It would make it much easier to start up small programs. Having cogs start up in hub exec is like making everybody jump into the deep end of the pool.

Choice is always good. It should be clear which mode a user is starting in.
Is there a cost to this option?

Another bit for cog launching and 2..3 more cog ROM instructions. I'm thinking of supporting LUT load after cog load, too. It's about an hour of work. I'll do this.

Cluso99 · 2015-10-03 20:43

cgracey wrote: »

Guys, do you think we should give COGINIT an option for loading a cog's RAM and then JMPing to it at $008? It would make it much easier to start up small programs. Having cogs start up in hub exec is like making everybody jump into the deep end of the pool.

Yes, provided it does not take up too much space.
However, this could also be a routine in hub loaded from ROM that the user could jump to via coginit.

BTW wouldn't a cog start at $010 make more sense?

cgracey · 2015-10-03 20:52

Cluso99 wrote: »

...BTW wouldn't a cog start at $010 make more sense?

No. If interrupt vectors are present at $00A..$00F, they will look like NOPs. Otherwise, code can start at $008. A few instructions can always go at $008..$009. Being able to automatically load interrupt vectors without having to manually set them would be nice.

rjo__ · 2015-10-03 21:11

Roy Eltham · 2015-10-03 22:31

Chip,
I would like that option on COGINIT, but I really like the start in HUBEXEC mode too, so I hope it stays having both options.

Also, will the initial startup still have the first cog start in HUBEXEC? I assume so, right?

potatohead · 2015-10-04 02:41

Unless we provide an option to protect the RAM copy of internal ROM, it's probably best we don't put routines and such in there that people would need to depend on.
