Handy P2 PASM2 feature: _RET_ PUSH {#}D

rogloh · 2020-03-09 01:31

I just found a very handy feature I can use in PASM2 to determine where to jump next, and it sort of lets code sequences be chained together dynamically. I can already make very good use of this in my HyperRAM driver.

To use it you can do a _ret_ while you push new data in the same instruction. The top of stack element will essentially be swapped with a new value you decide on and you still get to return to the original caller at the same time, all in the same single instruction. I think this feature might be useful for quite a few things when you combine it with execf too.

             call    #pusher
             ....
    _ret_    _any_instruction_       ' code will now branch off to something determined by the pusher routine

pusher
             ....
    _ret_    push    #return_addr

I tested it and it seems to work but I didn't try it exhaustively. Hopefully no hidden gotchas if the stack wraps around while using it, etc.

cgracey · 2020-03-09 02:13

Interesting. I will look at the Verilog source to see what is actually happening.

rogloh · 2020-03-09 02:47

Great. If this "feature" is reliable, I'd really quite like to make use of it. Though if it is some type of race condition or something very unexpected, perhaps I'd better not.

cgracey · 2020-03-09 02:51

rogloh wrote: »

Great. If this "feature" is reliable, I'd really quite like to make use of it. Though if it is some type of race condition or something very unexpected, perhaps I'd better not.

It won't be a race condition, at all. It's just a matter of the steering logic and what it does when both return and push are happening at the same time.

rogloh · 2020-03-09 03:00

Ok, that is good. It should be consistent then. I'm putting it into my code for future testing at the moment. I'm finding it saves some instruction space.

cgracey · 2020-03-09 03:29

rogloh wrote: »

Ok, that is good. It should be consistent then. I'm putting it into my code for future testing at the moment. I'm finding it saves some instruction space.

I need to get my head around the ramifications of using it.

rogloh · 2020-03-09 03:37

Yeah, and I now wonder what this one does...probably not as useful.

_ret_ pop a

Also these:

_ret_ call #a

_ret_ jmp #a

_ret_ execf a

_ret_ ret

cgracey · 2020-03-09 03:51

The _ret_ only happens if a branch is NOT happening. So, a jump with a _ret_ in front of it will just jump.

Also, the hardware is set up so that only one value can be popped at a time. So, I think the first example would do a return and put the return address into 'a'.

AJL · 2020-03-09 05:29

Could this be used as the basis of the JMPRET replacement that @Cluso99 had been asking after?

Initialize      mov     Task2, #SecondTask      'Initialize 1st Dest.
FirstTask       <start of first task>
                ...
                jmpret  Task1, Task2            'Give 2nd task cycles
                <more first task code>
                ...
                jmpret  Task1, Task2            'Give 2nd task cycles
                jmp     #FirstTask              'Loop first task

SecondTask      <start of second task>
                ...
                jmpret  Task2, Task1            'Give 1st task cycles
                <more second task code>
                ...
                jmpret  Task2, Task1            'Give 1st task cycles
                jmp     #SecondTask             'Loop second task

Task1           res     1                       'Declare task address
Task2           res     1                       'storage space

becomes

Initialize      PUSH    #SecondTask             'Initialize 1st Dest.
FirstTask       <start of first task>
                ...
        _ret_   PUSH    #Task1A                 'Give 2nd task cycles
Task1A          <more first task code>
                ...
        _ret_   PUSH    #Task1B                 'Give 2nd task cycles
Task1B          <more first task code>
                jmp     #FirstTask              'Loop first task

SecondTask      <start of second task>
                ...
        _ret_   PUSH    #Task2A                 'Give 1st task cycles
Task2A          <more second task code>
                ...
        _ret_   PUSH    #SecondTask             'Give 1st task cycles and loop second task

Not a drop-in replacement, but is it similar enough?
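
One way to sanity-check the task-switching idea on paper is to step through it with a toy model of the hardware stack. The Python sketch below assumes the behaviour reported further down this thread: `_ret_ push #x` jumps to whatever is on top of the 8-level stack and shifts `x` in above it. The labels are taken from the listing above; `HwStack`, `YIELD_PUSHES` and `run` are hypothetical names made up for the illustration and are not part of any P2 tool.

```python
class HwStack:
    """Toy model of the 8-level hardware stack; index 0 is the top."""

    def __init__(self):
        self.levels = [None] * 8

    def push(self, value):
        # New value enters at the top; the deepest level falls off the end.
        self.levels = [value] + self.levels[:7]

    def ret_push(self, value):
        # Assumed `_ret_ push #value`: branch to the old top, then push.
        target = self.levels[0]
        self.push(value)
        return target


# Label that each slice pushes when it yields (from the listing above).
# Task1B jumps back to FirstTask, so the yield it reaches pushes Task1A.
YIELD_PUSHES = {
    "FirstTask": "Task1A",
    "Task1A": "Task1B",
    "Task1B": "Task1A",
    "SecondTask": "Task2A",
    "Task2A": "SecondTask",
}


def run(slices):
    """Return the order in which the code fragments get control."""
    stack = HwStack()
    stack.push("SecondTask")      # Initialize: PUSH #SecondTask
    pc = "FirstTask"
    trace = []
    for _ in range(slices):
        trace.append(pc)
        pc = stack.ret_push(YIELD_PUSHES[pc])
    return trace


print(run(6))
```

In this model the two tasks alternate as intended (FirstTask, SecondTask, Task1A, Task2A, Task1B, SecondTask), and the already-used addresses simply sink down the stack and drop off the far end. It says nothing about interrupts or other code that also uses the stack.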

rogloh · 2020-03-09 05:55

Yes I think it might be able to work that way. You are telling the code you are returning to where to come back to next when you do a "_ret_ push #xxx". When that caller code returns it will jump there.

It will consume one stack entry, and you have to branch between your code fragments at the same stack level, so it is not quite the same as JMPRET, which is probably a little more versatile. But I do think this feature could be used in interesting ways and could save some COG RAM at times.

TonyB_ · 2020-03-09 19:21

A few thoughts, untested, in no particular order:

1. This is a potentially very useful discovery that risks becoming forgotten in a few days' or weeks' time. I suggest editing the title to include the most useful instruction, as follows:
Handy P2 PASM2 feature: _RET_ PUSH {#}D

(Aside. Are the pixel doubling routines buried deep somewhere in a video thread?)

2. Implementing _RET_ to POP the return address before the PUSH is the only way round that adds any value. Great design work!

3. Thinking about XBYTE, this could be exactly what I was looking for here assuming that $1FF is no longer top of stack after the PUSH. Apart from my example of a cleanup routine, it could be a way of passing a parameter from one bytecode to the next using the hardware stack.

4. A Where Am I? routine could be a single long:

		call	#pop_PC		'Where am I?
here		...			'You are here
		
pop_PC	
	_ret_	pop	PC		'PC[31:0] = {C, Z, 10'b0, PC[19:0]}

N.B. PC register also contains copy of C and Z as high bits.

cgracey · 2020-03-09 20:22

I looked at the Verilog code and it seems that if you do this:

_ret_ push value

...what happens is the 'push value' occurs, while a jump to the return address also takes place. Afterwards, you have returned, as expected, and the bottom of the stack contains the pushed value, while the next level of the stack contains the address that was returned to. So, it kind of makes a mess of the stack.

Stack before:

7: ?
6: ?
5: ?
4: ?
3: ?
2: ?
1: ?
0: return address

Stack after:

7: ?
6: ?
5: ?
4: ?
3: ?
2: ?
1: return address
0: value
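
The before/after picture can be reproduced with a few lines of Python. This is only a sketch of the description above: the stack is a plain list with index 0 as level 0, `ret_push` is a hypothetical helper name, and the placeholder letters stand in for the unknown "?" entries. That level 7 is discarded on a push is taken from the Verilog posted later in the thread.

```python
def ret_push(stack, value):
    """Model `_ret_ push value` on an 8-level stack (index 0 = level 0)."""
    target = stack[0]                    # address the cog jumps back to
    return target, [value] + stack[:7]   # value shifts in, level 7 is lost


before = ["return address", "a", "b", "c", "d", "e", "f", "g"]
target, after = ret_push(before, "value")

# Print in the same orientation as the diagrams above (level 7 first).
for level in range(7, -1, -1):
    print(f"{level}: {after[level]}")
```

Here `target` is the old level-0 entry, and that same entry is still present at level 1 afterwards, which is the "mess" being described.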

Cluso99 · 2020-03-09 21:24

So while it takes the return address from the top of the stack, the removal does not take place because another item is pushed onto the stack effectively cancelling the pop's adjustment to the stack pointer.

cgracey · 2020-03-09 21:53

Cluso99 wrote: »

So while it takes the return address from the top of the stack, the removal does not take place because another item is pushed onto the stack effectively cancelling the pop's adjustment to the stack pointer.

I could have made it NOT push or pop the stack if BOTH were happening, but it never occurred to me that anyone would write code like that.

jmg · 2020-03-09 21:58

cgracey wrote: »

...what happens is the 'push value' occurs, while a jump to the return address also takes place. Afterwards, you have returned, as expected, and the bottom of the stack contains the pushed value, while the next level of the stack contains the address that was returned to. So, it kind of makes a mess of the stack.

So the stack creeps and wraps around ?
That's likely to be a big problem with interrupts, but may be ok/tolerable if you know interrupts will never be used.

How about debug of this code ? IIRC that uses interrupts to step, so this may not be step-able ?

rogloh · 2020-03-09 22:01

Ok that will limit things a bit. Interestingly I think I can still use this feature the way I intended in my driver because I don't really use the stack otherwise before this. Everything else I do is branching before this level. So it's probably still okay for the stack to grow in size and get corrupted in this way each time it is called if you only use it at the top level in your code and don't need to return somewhere after this.

rogloh · 2020-03-09 22:22

cgracey wrote: »

Cluso99 wrote: »

So while it takes the return address from the top of the stack, the removal does not take place because another item is pushed onto the stack effectively cancelling the pop's adjustment to the stack pointer.

I could have made it NOT push or pop the stack if BOTH were happening, but it never occurred to me that anyone would write code like that.

I guess we should probably see what happens with the _RET_ POP D case too and how it leaves things. It might be possible to gain some value from it if it also produces some interesting side effect on the stack.

TonyB_ · 2020-03-09 22:50

If the return address is $1FF, does the _ret_ start a new bytecode? If so, it seems to be exactly what I wanted earlier in the XBYTE thread. The return address is not popped, but that's $1FF for XBYTE and so would not be popped anyway. The important thing for me is to start a new bytecode with something other than $1FF on the top of stack. Having $1FF at one level below top of stack is ideal.

rogloh wrote: »

So it's probably still okay for the stack to grow in size and get corrupted in this way each time it is called if you only use it at the top level in your code and don't need to return somewhere after this.

Agreed, it's still useful.

EDIT:
I wouldn't say the stack gets "corrupted" as POPs could remove the unwanted already-used return addresses, if necessary. The effective stack depth is reduced, though.

cgracey · 2020-03-10 00:16

Here is the Verilog for the hardware stack:

// stack

wire stk_push		= push || callpa || callpb || callr || calli;					// pushes

wire stk_write		= ptr_w || go && (stk_push || pop || (ret || ret_auto) && !stk_xbyte);		// writes

wire [7:0][31:0] stkx	= ptr_w || stk_push ? {stk[6:0], push && !ptr_w ? d : {c, z, 10'b0, p}}		// push/call, ptr_w traps execution address
					    : {stk[7], stk[7:1]};					// pop/ret/ret_auto

`regscan (stk,     256'b0, stk_write, stkx)								// stack levels
`regscan (stk_xbyte, 1'b0, stk_write, stkx[0][19:0] == 20'b0000_0000_0001_1111_1111)			// top stack value is $001FF (speeds up xbytex)

wire [1:0] popret_cz	= {stk[0][31], pop_reg ? ~|stk[0] : stk[0][30]};

ptr_w is the signal that the cog receives during its COGINIT. It's used to push the execution address onto the hardware stack, so that you can find out where the cog hung up, if need be.
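
For readers who don't read Verilog, the following is a rough Python transliteration of the stack logic above, intended only as a sketch. It folds callpa/callpb/callr/calli into one `call` flag and ret/ret_auto into one `ret` flag, and it assumes that `_ret_ push` shows up as `ret` and `push` both being true in the same clock, which the listing itself does not confirm. `StackModel` and `clock` are hypothetical names.

```python
XBYTE_ADDR = 0x001FF


class StackModel:
    def __init__(self):
        self.stk = [0] * 8        # stk[0] is the top; zero after a chip reset
        self.stk_xbyte = False

    def clock(self, *, d=0, c=0, z=0, p=0, push=False, call=False,
              pop=False, ret=False, ptr_w=False, go=True):
        stk_push = push or call
        stk_write = ptr_w or (go and (stk_push or pop or
                                      (ret and not self.stk_xbyte)))
        if not stk_write:
            return                # stack and stk_xbyte hold their values
        if ptr_w or stk_push:
            # push/call: shift everything down one level, new long on top
            if push and not ptr_w:
                word = d
            else:
                word = (c << 31) | (z << 30) | (p & 0xFFFFF)
            self.stk = [word] + self.stk[:7]
        else:
            # pop/ret: shift up one level, deepest long is repeated
            self.stk = self.stk[1:] + [self.stk[7]]
        self.stk_xbyte = (self.stk[0] & 0xFFFFF) == XBYTE_ADDR


m = StackModel()
m.clock(push=True, d=0x1FF)              # $1FF on top
m.clock(ret=True)                        # plain return: write is suppressed
held = list(m.stk)
m.clock(ret=True, push=True, d=0x1234)   # ret and push together: push wins
print([hex(x) for x in m.stk[:3]], m.stk_xbyte)
```

In this model a plain return with $1FF on top leaves the stack untouched, while a return combined with a push still writes, leaving the pushed value on top and $1FF one level below.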

Cluso99 · 2020-03-10 02:02

cgracey wrote: »

Cluso99 wrote: »

So while it takes the return address from the top of the stack, the removal does not take place because another item is pushed onto the stack effectively cancelling the pop's adjustment to the stack pointer.

I could have made it NOT push or pop the stack if BOTH were happening, but it never occurred to me that anyone would write code like that.

There is always someone who will try something unintended.

It's good to know just what will happen though.

rogloh · 2020-03-10 02:48

I just tried out the "_RET_ POP D" case to see what would happen. It looks like it returns OK to where it was called from, and the D register gets a copy of the return address of the calling code (i.e. caller location in COG RAM + 1), plus the state of the C and Z flags at calling time.

I'm not sure what final uses this has, but perhaps it could be used for recording/tracing where each function last returned to, and/or for capturing the original caller's flag state in a given register, which normally wouldn't be done with other RET cases. You could perhaps also restore the original flags later with "RCZL D WCZ" if you need to, while still making use of the C, Z flags set by the called code before this operation (which could be useful for returning state).

As with the push case, I didn't check whether it cleans up the stack fully, but the pop term in the OR condition of the stk_write wire in the Verilog makes it look like it should work in its usual way this time.
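
As an illustration of what ends up in D, the long saved by a call has the layout {C, Z, 10'b0, PC[19:0]} according to the Verilog and the "Where am I?" example earlier in the thread. The Python helpers below just pack and unpack that layout. The function names and the example address are made up for the illustration.

```python
def pack_return_long(c, z, pc):
    """Build the 32-bit long a call saves: {C, Z, 10'b0, PC[19:0]}."""
    return (c << 31) | (z << 30) | (pc & 0xFFFFF)


def unpack_return_long(value):
    """Split a popped long into the caller's C flag, Z flag and address."""
    return (value >> 31) & 1, (value >> 30) & 1, value & 0xFFFFF


# Hypothetical case: CALL at cog address $040 with C=1, Z=0,
# so the saved return address is $041.
saved = pack_return_long(1, 0, 0x041)
c, z, pc = unpack_return_long(saved)
print(hex(saved), c, z, hex(pc))
```

So a tracing routine that did `_ret_ pop D` would find the caller's flags in the top two bits of D and the return address in the low 20 bits.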

TonyB_ · 2020-03-10 11:00

cgracey wrote: »

Here is the Verilog for the hardware stack:

// stack

wire stk_push		= push || callpa || callpb || callr || calli;					// pushes

wire stk_write		= ptr_w || go && (stk_push || pop || (ret || ret_auto) && !stk_xbyte);		// writes

wire [7:0][31:0] stkx	= ptr_w || stk_push ? {stk[6:0], push && !ptr_w ? d : {c, z, 10'b0, p}}		// push/call, ptr_w traps execution address
					    : {stk[7], stk[7:1]};					// pop/ret/ret_auto

`regscan (stk,     256'b0, stk_write, stkx)								// stack levels
`regscan (stk_xbyte, 1'b0, stk_write, stkx[0][19:0] == 20'b0000_0000_0001_1111_1111)			// top stack value is $001FF (speeds up xbytex)

wire [1:0] popret_cz	= {stk[0][31], pop_reg ? ~|stk[0] : stk[0][30]};

I think that seven consecutive POPs will guarantee that all eight longs on the stack are the same and further POPs will keep reading this value, assuming no intervening CALLs or PUSHes.

Does reset clear the stack?

cgracey · 2020-03-10 19:06

TonyB_ wrote: »

I think that seven consecutive POPs will guarantee that all eight longs on the stack are the same and further POPs will keep reading this value, assuming no intervening CALLs or PUSHes.

Does reset clear the stack?

A chip reset clears the stacks, but not a cog start or stop.

jmg · 2020-03-10 19:25

TonyB_ wrote: »

I think that seven consecutive POPs will guarantee that all eight longs on the stack are the same and further POPs will keep reading this value, assuming no intervening CALLs or PUSHes.

Aren't those backwards ? 8 PUSHes would fully define the stack.

TonyB_ · 2020-03-10 19:39

cgracey wrote: »

A chip reset clears the stacks, but not a cog start or stop.

Thanks, Chip.

jmg wrote: »

TonyB_ wrote: »

I think that seven consecutive POPs will guarantee that all eight longs on the stack are the same and further POPs will keep reading this value, assuming no intervening CALLs or PUSHes.

Aren't those backwards ? 8 PUSHes would fully define the stack.

Just a fun comment based on my reading of the Verilog posted above. The seven longs below top of stack move up one level for each POP and long furthest from top stays the same.
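
The seven-POPs observation is easy to check with a list standing in for the eight stack longs. This is a sketch of the pop path of the Verilog only ({stk[7], stk[7:1]}), with index 0 as top of stack. `pop` and the `long0` to `long7` placeholders are names chosen for the example.

```python
def pop(stack):
    """One POP: read the top, move the rest up, repeat the deepest long."""
    return stack[0], stack[1:] + [stack[7]]


stack = [f"long{i}" for i in range(8)]   # long0 is on top, long7 is deepest
reads = []
for _ in range(7):
    value, stack = pop(stack)
    reads.append(value)

print(reads)
print(stack)
```

After the seven POPs every level holds the original deepest long, and any further POP keeps returning it.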
