Handy P2 PASM2 feature: _RET_ PUSH {#}D — Parallax Forums

rogloh Posts: 5,122
edited 2020-03-09 22:06 in Propeller 2
I just found a very handy PASM2 feature for deciding where to jump next; it lets code sequences be chained together dynamically. I can already make very good use of this in my HyperRAM driver.

To use it, put a _ret_ prefix on a push so that the return and the push of new data happen in the same instruction. The top-of-stack element is essentially swapped for a new value of your choosing, and you still return to the original caller, all in a single instruction. I think this feature might be useful for quite a few things, especially when combined with execf.
             call    #pusher
             ....
   _ret_     _any_instruction_       ' code now branches to the address chosen by the pusher routine

pusher
             ....
   _ret_     push    #return_addr    ' return to the caller and leave return_addr on top of the stack

I tested it and it seems to work, though I didn't try it exhaustively. Hopefully there are no hidden gotchas, for example if the stack wraps around while using it.

Comments

  • cgracey Posts: 14,133
    Interesting. I will look at the Verilog source to see what is actually happening.
  • Great. If this "feature" is reliable, I'd really quite like to make use of it. Though if it is some type of race condition or something very unexpected, perhaps I'd better not.
  • cgracey Posts: 14,133
    rogloh wrote: »
    Great. If this "feature" is reliable, I'd really quite like to make use of it. Though if it is some type of race condition or something very unexpected, perhaps I'd better not.

    It won't be a race condition, at all. It's just a matter of the steering logic and what it does when both return and push are happening at the same time.
  • rogloh Posts: 5,122
    edited 2020-03-09 03:00
    Ok, that is good. It should be consistent then. I'm putting it into my code for future testing at the moment. I'm finding it saves some instruction space.
  • cgracey Posts: 14,133
    rogloh wrote: »
    Ok, that is good. It should be consistent then. I'm putting it into my code for future testing at the moment. I'm finding it saves some instruction space.

    I need to get my head around the ramifications of using it.
  • rogloh Posts: 5,122
    edited 2020-03-09 03:41
    Yeah, and I now wonder what this one does...probably not as useful.
    _ret_ pop a
    

    Also these:
    _ret_ call #a
    
    _ret_ jmp #a
    
    _ret_ execf a
    
    _ret_ ret
    
  • cgracey Posts: 14,133
    The _ret_ only happens if a branch is NOT happening. So, a jump with a _ret_ in front of it will just jump.

    Also, the hardware is set up so that only one value can be popped at a time. So, I think the first example would do a return and put the return address into 'a'.
  • Could this be used as the basis of the JMPRET replacement that @Cluso99 had been asking after?
    Initialize      mov     Task2, #SecondTask      'Initialize 1st Dest.

    FirstTask       <start of first task>
                    ...
                    jmpret  Task1, Task2            'Give 2nd task cycles
                    <more first task code>
                    ...
                    jmpret  Task1, Task2            'Give 2nd task cycles
                    jmp     #FirstTask              'Loop first task

    SecondTask      <start of second task>
                    ...
                    jmpret  Task2, Task1            'Give 1st task cycles
                    <more second task code>
                    ...
                    jmpret  Task2, Task1            'Give 1st task cycles
                    jmp     #SecondTask             'Loop second task

    Task1           res     1                       'Declare task address
    Task2           res     1                       'storage space
    

    becomes
    Initialize      PUSH    #SecondTask             'Initialize 1st Dest.

    FirstTask       <start of first task>
                    ...
            _ret_   PUSH    #Task1A                 'Give 2nd task cycles
    Task1A          <more first task code>
                    ...
            _ret_   PUSH    #Task1B                 'Give 2nd task cycles
    Task1B          <more first task code>
                    jmp     #FirstTask              'Loop first task

    SecondTask      <start of second task>
                    ...
            _ret_   PUSH    #Task2A                 'Give 1st task cycles
    Task2A          <more second task code>
                    ...
            _ret_   PUSH    #SecondTask             'Give 1st task cycles and loop second task
    

    Not a drop-in replacement, but is it similar enough?
  • rogloh Posts: 5,122
    edited 2020-03-09 06:05
    Yes, I think it might be able to work that way. With a "_ret_ push #xxx" you are telling the code you are returning to where it should come back to next. When that code returns, it will jump there.

    It will consume one stack entry, and you have to branch between your code fragments at the same stack level, so it is not quite the same as JMPRET, which is probably a little more versatile. But I do think this feature could be used in interesting ways and could save some COGRAM at times.
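
The hand-off between the two tasks can be traced with a small Python model. This is only a sketch under stated assumptions: a "_ret_ PUSH #label" is taken to branch to the current top of stack and leave the pushed label as the new top, the fragment table mirrors the labels in the example above, and the names run and fragments are made up for illustration. Nothing here was run on real hardware.

```python
DEPTH = 8  # the hardware stack has eight levels

def run(steps):
    """Trace which code fragment executes at each hand-off."""
    # Initialize: PUSH #SecondTask (index 0 models the top of the stack)
    stack = ["SecondTask"] + [None] * (DEPTH - 1)
    # label -> ("yield", resume_label) models `_ret_ PUSH #resume_label`
    # label -> ("jmp", target) models a plain `jmp #target`
    fragments = {
        "FirstTask": ("yield", "Task1A"),
        "Task1A": ("yield", "Task1B"),
        "Task1B": ("jmp", "FirstTask"),
        "SecondTask": ("yield", "Task2A"),
        "Task2A": ("yield", "SecondTask"),
    }
    pc = "FirstTask"
    trace = []
    for _ in range(steps):
        trace.append(pc)
        kind, arg = fragments[pc]
        if kind == "jmp":
            pc = arg
        else:
            target = stack[0]                   # return to the current top of stack
            stack = [arg] + stack[:DEPTH - 1]   # the resume label becomes the new top
            pc = target
    return trace

print(run(8))
# ['FirstTask', 'SecondTask', 'Task1A', 'Task2A', 'Task1B', 'FirstTask', 'SecondTask', 'Task1A']
```

Only the top entry matters for this ping-pong, so in this model the tasks alternate regardless of what accumulates in the deeper levels.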
  • TonyB_ Posts: 2,108
    edited 2020-03-10 10:49
    A few thoughts, untested, in no particular order:

    1. This is a potentially very useful discovery that risks becoming forgotten in a few days' or weeks' time. I suggest editing the title to include the most useful instruction, as follows:
    Handy P2 PASM2 feature: _RET_ PUSH {#}D

    (Aside. Are the pixel doubling routines buried deep somewhere in a video thread?)

    2. Implementing _RET_ to POP the return address before the PUSH is the only way round that adds any value. Great design work!

    3. Thinking about XBYTE, this could be exactly what I was looking for here assuming that $1FF is no longer top of stack after the PUSH. Apart from my example of a cleanup routine, it could be a way of passing a parameter from one bytecode to the next using the hardware stack.

    4. A Where Am I? routine could be a single long:
    		call	#pop_PC		'Where am I?
    here		...			'You are here
    		
    pop_PC	
    	_ret_	pop	PC		'PC[31:0] = {C, Z, 10'b0, PC[19:0]}
    
    N.B. PC register also contains copy of C and Z as high bits.
  • cgracey Posts: 14,133
    I looked at the Verilog code and it seems that if you do this:

    _ret_ push value

    ...what happens is the 'push value' occurs, while a jump to the return address also takes place. Afterwards, you have returned, as expected, and the bottom of the stack contains the pushed value, while the next level of the stack contains the address that was returned to. So, it kind of makes a mess of the stack.

    Stack before:

    7: ?
    6: ?
    5: ?
    4: ?
    3: ?
    2: ?
    1: ?
    0: return address

    Stack after:

    7: ?
    6: ?
    5: ?
    4: ?
    3: ?
    2: ?
    1: return address
    0: value
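
The before/after picture above can be reproduced with a few lines of Python. This is a minimal sketch, not the hardware: the list representation and the helper names push and ret_push are made up for illustration, with list index 0 standing for stack level 0.

```python
DEPTH = 8  # eight stack levels; index 0 is level 0 in the listing above

def push(stack, value):
    """Shift every level one place deeper and put value at level 0 (level 7 is lost)."""
    return [value] + stack[:DEPTH - 1]

def ret_push(stack, value):
    """Model `_ret_ push value`: branch to the current level 0, but only the push moves the stack."""
    target = stack[0]
    return target, push(stack, value)

stack = ["return address"] + ["?"] * (DEPTH - 1)
target, stack = ret_push(stack, "value")
print(target)     # return address
print(stack[:2])  # ['value', 'return address']
```

Each use therefore costs one stack level, and the oldest entry falls off the far end.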
  • Cluso99 Posts: 18,066
    So while it takes the return address from the top of the stack, the removal does not take place because another item is pushed onto the stack effectively cancelling the pop's adjustment to the stack pointer.
  • cgracey Posts: 14,133
    edited 2020-03-09 21:54
    Cluso99 wrote: »
    So while it takes the return address from the top of the stack, the removal does not take place because another item is pushed onto the stack effectively cancelling the pop's adjustment to the stack pointer.

    I could have made it NOT push or pop the stack if BOTH were happening, but it never occurred to me that anyone would write code like that.
  • jmg Posts: 15,140
    cgracey wrote: »
    ...what happens is the 'push value' occurs, while a jump to the return address also takes place. Afterwards, you have returned, as expected, and the bottom of the stack contains the pushed value, while the next level of the stack contains the address that was returned to. So, it kind of makes a mess of the stack.

    So the stack creeps and wraps around?
    That's likely to be a big problem with interrupts, but may be OK/tolerable if you know interrupts will never be used.

    How about debugging this code? IIRC that uses interrupts to step, so this may not be steppable?

  • rogloh Posts: 5,122
    edited 2020-03-09 22:03
    Ok that will limit things a bit. Interestingly I think I can still use this feature the way I intended in my driver because I don't really use the stack otherwise before this. Everything else I do is branching before this level. So it's probably still okay for the stack to grow in size and get corrupted in this way each time it is called if you only use it at the top level in your code and don't need to return somewhere after this.
  • rogloh Posts: 5,122
    edited 2020-03-09 22:29
    cgracey wrote: »
    Cluso99 wrote: »
    So while it takes the return address from the top of the stack, the removal does not take place because another item is pushed onto the stack effectively cancelling the pop's adjustment to the stack pointer.

    I could have made it NOT push or pop the stack if BOTH were happening, but it never occurred to me that anyone would write code like that.

    I guess we should probably see what happens in the _RET_ POP D case too, and how it leaves things. It might be possible to gain some value from that if it does something interesting as well, such as some other side effect on the stack.
  • TonyB_ Posts: 2,108
    edited 2020-03-09 23:45
    If the return address is $1FF, does the _ret_ start a new bytecode? If so, it seems to be exactly what I wanted earlier in the XBYTE thread. The return address is not popped, but that's $1FF for XBYTE and so would not be popped anyway. The important thing for me is to start a new bytecode with something other than $1FF on the top of stack. Having $1FF at one level below top of stack is ideal.
    rogloh wrote: »
    So it's probably still okay for the stack to grow in size and get corrupted in this way each time it is called if you only use it at the top level in your code and don't need to return somewhere after this.
    Agreed, it's still useful.

    EDIT:
    I wouldn't say the stack gets "corrupted" as POPs could remove the unwanted already-used return addresses, if necessary. The effective stack depth is reduced, though.
  • cgracey Posts: 14,133
    edited 2020-03-10 00:20
    Here is the Verilog for the hardware stack:
    // stack
    
    wire stk_push		= push || callpa || callpb || callr || calli;					// pushes
    
    wire stk_write		= ptr_w || go && (stk_push || pop || (ret || ret_auto) && !stk_xbyte);		// writes
    
    wire [7:0][31:0] stkx	= ptr_w || stk_push ? {stk[6:0], push && !ptr_w ? d : {c, z, 10'b0, p}}		// push/call, ptr_w traps execution address
    					    : {stk[7], stk[7:1]};					// pop/ret/ret_auto
    
    `regscan (stk,     256'b0, stk_write, stkx)								// stack levels
    `regscan (stk_xbyte, 1'b0, stk_write, stkx[0][19:0] == 20'b0000_0000_0001_1111_1111)			// top stack value is $001FF (speeds up xbytex)
    
    wire [1:0] popret_cz	= {stk[0][31], pop_reg ? ~|stk[0] : stk[0][30]};
    

    ptr_w is the signal that the cog receives during its COGINIT. It's used to push the execution address onto the hardware stack, so that you can find out where the cog hung up, if need be.
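
One reading of the Verilog above, written as a Python model. It is a sketch under assumptions: the class name HwStack and its methods are hypothetical, ptr_w and go are left out, calls that push {c, z, 10'b0, p} are not modelled, and the return target is assumed to be the low 20 bits of level 0. The point it illustrates is the stk_xbyte term: a plain return does not write the stack while the top value is $1FF, whereas a push or pop always does.

```python
DEPTH = 8
XBYTE_ADDR = 0x1FF  # top-of-stack value that sets stk_xbyte

class HwStack:
    def __init__(self):
        self.levels = [0] * DEPTH   # a chip reset clears the stack
        self.stk_xbyte = False

    def _write(self, new_levels):
        # Both registers update together whenever stk_write is true.
        self.levels = new_levels
        self.stk_xbyte = (new_levels[0] & 0xFFFFF) == XBYTE_ADDR

    def push(self, value):
        # {stk[6:0], d}: every level moves one place deeper.
        self._write([value & 0xFFFFFFFF] + self.levels[:DEPTH - 1])

    def pop(self):
        # {stk[7], stk[7:1]}: levels move up, the deepest level is kept.
        top = self.levels[0]
        self._write(self.levels[1:] + [self.levels[-1]])
        return top

    def ret(self):
        # (ret || ret_auto) && !stk_xbyte: no stack write while the top is $1FF.
        target = self.levels[0] & 0xFFFFF
        if not self.stk_xbyte:
            self._write(self.levels[1:] + [self.levels[-1]])
        return target

s = HwStack()
s.push(0x1FF)
print(hex(s.ret()), hex(s.levels[0]))  # 0x1ff 0x1ff  ($1FF is not popped)
s.push(0x123)
print(hex(s.ret()), hex(s.levels[0]))  # 0x123 0x1ff  (normal return pops)
```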
  • Cluso99 Posts: 18,066
    cgracey wrote: »
    Cluso99 wrote: »
    So while it takes the return address from the top of the stack, the removal does not take place because another item is pushed onto the stack effectively cancelling the pop's adjustment to the stack pointer.

    I could have made it NOT push or pop the stack if BOTH were happening, but it never occurred to me that anyone would write code like that.
    There is always someone that will try something unintended ;)

    It's good to know just what will happen tho.
  • rogloh Posts: 5,122
    edited 2020-03-10 11:13
    I just tried out the "_RET_ POP D" case to see what would happen. It returns fine to where it was called from, and the D register gets a copy of the return address of the calling code (i.e. caller location in COG RAM + 1) plus the state of the C and Z flags at the time of the call. I'm not sure what the final uses are, but it could perhaps be used to record or trace where each function last returned to, and/or to capture the original caller's flag state in a given register, which the other RET cases normally don't do. You could perhaps also restore the original flags later with "RCZL D WCZ" if you need to, while still making use of the C and Z flags from the called code before this operation (which could be useful for returning state). As with the push case, I didn't check whether it cleans up the stack fully, but the OR condition in the Verilog stk_write wire makes it look like it should work in its usual way this time.
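
For reference, the long that ends up in D has the layout {C, Z, 10'b0, PC[19:0]} shown in the Verilog. A small Python sketch of packing and unpacking it follows; the helper names pack_return and unpack_return and the example address $040 are made up for illustration.

```python
def pack_return(c, z, pc):
    """Build the 32-bit long a call pushes: C in bit 31, Z in bit 30, PC in bits 19..0."""
    return (int(c) << 31) | (int(z) << 30) | (pc & 0xFFFFF)

def unpack_return(value):
    """Split a popped long back into (C, Z, return address)."""
    c = bool((value >> 31) & 1)
    z = bool((value >> 30) & 1)
    return c, z, value & 0xFFFFF

# Hypothetical case: a CALL at COG address $040 made with C=1, Z=0,
# so the return address is $041.
d = pack_return(True, False, 0x041)
print(hex(d))            # 0x80000041
print(unpack_return(d))  # (True, False, 65)
```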
  • cgracey wrote: »
    Here is the Verilog for the hardware stack:
    // stack
    
    wire stk_push		= push || callpa || callpb || callr || calli;					// pushes
    
    wire stk_write		= ptr_w || go && (stk_push || pop || (ret || ret_auto) && !stk_xbyte);		// writes
    
    wire [7:0][31:0] stkx	= ptr_w || stk_push ? {stk[6:0], push && !ptr_w ? d : {c, z, 10'b0, p}}		// push/call, ptr_w traps execution address
    					    : {stk[7], stk[7:1]};					// pop/ret/ret_auto
    
    `regscan (stk,     256'b0, stk_write, stkx)								// stack levels
    `regscan (stk_xbyte, 1'b0, stk_write, stkx[0][19:0] == 20'b0000_0000_0001_1111_1111)			// top stack value is $001FF (speeds up xbytex)
    
    wire [1:0] popret_cz	= {stk[0][31], pop_reg ? ~|stk[0] : stk[0][30]};
    

    I think that seven consecutive POPs will guarantee that all eight longs on the stack are the same and further POPs will keep reading this value, assuming no intervening CALLs or PUSHes.

    Does reset clear the stack?
  • cgracey Posts: 14,133
    TonyB_ wrote: »
    I think that seven consecutive POPs will guarantee that all eight longs on the stack are the same and further POPs will keep reading this value, assuming no intervening CALLs or PUSHes.

    Does reset clear the stack?

    A chip reset clears the stacks, but not a cog start or stop.
  • jmg Posts: 15,140
    TonyB_ wrote: »
    I think that seven consecutive POPs will guarantee that all eight longs on the stack are the same and further POPs will keep reading this value, assuming no intervening CALLs or PUSHes.
    Aren't those backwards ? 8 PUSHes would fully define the stack.



  • TonyB_ Posts: 2,108
    edited 2020-03-10 19:40
    cgracey wrote: »
    A chip reset clears the stacks, but not a cog start or stop.
    Thanks, Chip.
    jmg wrote: »
    TonyB_ wrote: »
    I think that seven consecutive POPs will guarantee that all eight longs on the stack are the same and further POPs will keep reading this value, assuming no intervening CALLs or PUSHes.
    Aren't those backwards ? 8 PUSHes would fully define the stack.
    Just a fun comment based on my reading of the Verilog posted above. The seven longs below the top of the stack move up one level for each POP, and the long furthest from the top stays the same.
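
Both observations can be checked with a short Python model of the shift behaviour in the posted Verilog. This is a sketch only: the list representation (index 0 as top of stack) and the helper names are made up for illustration.

```python
DEPTH = 8

def pop(stack):
    """POP/RET: levels move up one place; the deepest level keeps its value."""
    return stack[1:] + [stack[-1]]

def push(stack, value):
    """PUSH/CALL: levels move one place deeper; the deepest value is lost."""
    return [value] + stack[:DEPTH - 1]

# Seven POPs copy the deepest long into every level.
stack = list("ABCDEFGH")
for _ in range(7):
    stack = pop(stack)
print(stack)   # ['H', 'H', 'H', 'H', 'H', 'H', 'H', 'H']

# Eight PUSHes replace every level with a known value.
pushed = list("ABCDEFGH")
for n in range(8):
    pushed = push(pushed, n)
print(pushed)  # [7, 6, 5, 4, 3, 2, 1, 0]
```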