Shop OBEX P1 Docs P2 Docs Learn Events
Recursive/Reentrant code in COG? — Parallax Forums

Recursive/Reentrant code in COG?

Heater.Heater. Posts: 21,230
edited 2013-09-29 15:27 in Propeller 2
We are all familaiar with the fact that the Propeller COG does not have a call stack. Rather it uses an old computer technique of writing the return address for subroutine calls to some known location where it can be picked up at the end of the subroutine (JMPRET).

This is not normally an issue for COG code as it's not often one want's to put a recursive algorithm in COG. (Has any one ever done so? Even the recursive Fibo benchmark?).

However we now have the possibility of running multiple threads in a COG. That means that different threads may want to call the same subroutine. Rather than having multiple copies of the same code in COG we migh need to make it a subroutine(s) to fit things in the space available.

What happens then?

Using normal COG CALL/RET won't work. Thread A can call a subroutine, which places the return address for thread A at the end of the routine (normally). Thread B migh call the same routine before thread A has returned. Ooops we have just now overwritten A's return address. Thread A could eaily end up running thread B's code or some such horror.

What we need is a stack based call return mechanism for thread safe calls.

Anyone attempted this yet?

Comments

  • ozpropdevozpropdev Posts: 2,791
    edited 2013-09-28 23:36
    Hi Heater

    That's a tricky one!
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2013-09-28 23:42
    If there were a block of RAM with a common address range, but with different physcial memory for each thread, the return addresses could be stored there.

    -Phil
  • AribaAriba Posts: 2,682
    edited 2013-09-29 00:06
    There is a stack based call and return mechanism in Prop 2 It even saves the C and Z flag.
    See CALLA, CALLB, RETA, RETB

    Andy
  • ozpropdevozpropdev Posts: 2,791
    edited 2013-09-29 00:27
    Hi Andy

    Using CALLA,RETA doesn't get around the multi thread issue of who called it first?

    For example

    Thread 1 calls routine CALLA #XYZ.
    Thread 2 then does the same, CALLA #XYZ
    When thread 1 completes call it hits a RETA which returns to thread 2's PC not thread 1's.

    I think I've got that right?

    Cheers
    Brian
  • Heater.Heater. Posts: 21,230
    edited 2013-09-29 00:36
    I haven't really looked into this but isn't there some weird memory tricks going on with threads such that, say, four threads all running the same code all see different memory for some variables? Sure I saw an example like that float by once.

    If so, all that is needed is for a call, which is actually a JMPRET to place the return address in some place unique to the thread where it can be picked up by another JMPRET at return time. Rather than placing the return address in the RETURN instruction as usual.

    This would not allow recursive calls within a thread but would allow reentrant calls between threads as the return addresses get stored in different places.

    No stack required.

    Edit: Oops, sorry Phil that's what you said:
    If there were a block of RAM with a common address range, but with different physcial memory for each thread, the return addresses could be stored there.
  • AribaAriba Posts: 2,682
    edited 2013-09-29 00:41
    Brian
    Yes you are right.
    I think there are anyway other problems beside the return address. Also all the registers you use in the subroutine can get written by one thread and read by another. You will also need local registers on the stack and a stackpointer per thread.

    But maybe Prop 2's register remapping can help here? I don't really understand that yet...

    Andy
  • ozpropdevozpropdev Posts: 2,791
    edited 2013-09-29 00:55
    Hi Guys

    I think using SETMAP could get around the register overwrite issue and the return address storage issues.
    The problem is the mechanism to retrieve the return address related to the current thread.

    I'm dreaming here but if we had SPC and SPD as well as SPA and SB, we could have a CALLC and CALLD.
    This in addition to a RETX for example that pops a return address from the stack based on thread number. Thread 0 pops SPA,1 pops SPB etc.
    That would be nice. Just dreaming....(Sorry Chip, Ken, Beau and others..)

    Cheers
    Brian
  • Heater.Heater. Posts: 21,230
    edited 2013-09-29 00:56
    Ah, yes, those pesky local variables.

    In the back of my mind was the idea that every thread is working with it's own variable, via "register remapping" or what ever it is called. It was only the return addresses that woke me up in the middle of the night nagging that they are in the shared instruction space normally.

    Now, where is that example of threads using common code and separate variables?
  • Heater.Heater. Posts: 21,230
    edited 2013-09-29 01:04
    ospropdev,
    I think using SETMAP could get around the register overwrite issue and the return address storage issues.
    That sounds like the thing I am looking for to get thread local variables. Which could include space for return addresses.
    The problem is the mechanism to retrieve the return address related to the current thread.
    This should be simple. I have to look again but in Full Duplex Serial don't we have two coroutines calling each other with JMPRET, the return address being stored in some variable somewhere rather than in the normal return instruction location?

    Seems to me the same technique of making calls and returns with JMPRET could be used by a thread such that it's return address is stored in it's thread local storage. Of course every subroutine you call would have to have it's own return address storage location defined.

    We should not need to request any new instructions in the PII.
  • ozpropdevozpropdev Posts: 2,791
    edited 2013-09-29 01:08
    Register remapping in the unofficial docs
    dat
           org                        'longs are like nop's, get skipped
    
    pin    long        0                'task 0 data
    count  long        1
    delay  long        0
    extra  long        0
    
          long        1                'task 1 data
          long        5
          long        0
          long        0
    
          long        2                'task 2 data
          long        13
          long        0
          long        0
    
          long        3                'task 3 data
          long        29
          long        0
          long        0
    
          setmap        #%1_010_010        'remap registers by task, 4 sets, 4 registers each
          settask        #%%3210                'enable all tasks
          jmptask        #loop,#%1111        'before any newly-started tasks get to execute stage, jump all tasks to loop
    
    loop   notp        pin                'toggle task x pin
          mov        delay,count        'get task x delay
          djnz        delay,#$        'count down delay
          jmp        #loop                'loop (count + 3 clocks)
    
    
  • Heater.Heater. Posts: 21,230
    edited 2013-09-29 02:13
    ozpropdev,

    Bingo, that's just what I was thinking of. Thanks.

    So, can we do reentrant, thread safe, subroutines as simply as this:
    1) Make the subroutine call with CALL as normal.
    2) Ensure the subroutine_ret lable and return instruction are in the threads data area.
    3) To return from the subroutine just jump there.

    Like so:
    pin             long        0     'task 0 data
    count           long        1
    delay           long        0
    subroutine_ret  ret               ' Return address for thread safe calls of subroutine.
    
           long        1              'task 1 data
           long        5
           long        0
           long        0
    
           long        2              'task 2 data
           long        13
           long        0
           long        0
    
           long        3              'task 3 data
           long        29
           long        0
           long        0
    
           setmap      #%1_010_010     'remap registers by task, 4 sets, 4 registers each
           settask     #%%3210         'enable all tasks
           jmptask     #thread, #%1111 'before any newly-started tasks get to execute stage, jump all tasks to loop
    
    thread notp        pin             'toggle task x pin
           mov         delay,count     'get task x delay
           djnz        delay,#$        'count down delay
    
           call        #subroutine     'Jump to subroutine
                                       '  saves return address in
                                       '  thread local subroutine_ret.
    
           jmp         #loop           'loop (count + 3 clocks)
    
    subroutine
           'bla bla
           add count,  #1              'tweaks thread local count.
           'bla bla
           jmp         #subroutine_ret 'Returns via thread local return address.
    
           'jmp        subroutine_ret   'Or can we do like this?
    

    Anyone got their FPGA set up to test this?
  • ozpropdevozpropdev Posts: 2,791
    edited 2013-09-29 02:26
    Hi Heater

    Wouldn't the RET instruction cause a problem when the cog is launched?
    My understanding is that the variables are treated as NOP's and are ignored.
    This relies on the values not being decoded as instructions?

    Edit: Only 18 bit values can be used so there not decoded.

    Brian
  • Heater.Heater. Posts: 21,230
    edited 2013-09-29 03:47
    Brian,

    OK can't we just reserve a space with zero in it to store return addresses and and just do a JMP through that location (no #) for our returns? Like so:
    pin             long        0     'task 0 data
    count           long        1
    delay           long        0
    subroutine_ret  long        0     ' Space for return address for thread safe calls of subroutine.
    
           long        1              'task 1 data
           long        5
           long        0
           long        0
    
           long        2              'task 2 data
           long        13
           long        0
           long        0
    
           long        3              'task 3 data
           long        29
           long        0
           long        0
    
           setmap      #%1_010_010     'remap registers by task, 4 sets, 4 registers each
           settask     #%%3210         'enable all tasks
           jmptask     #thread, #%1111 'before any newly-started tasks get to execute stage, jump all tasks to loop
    
    thread notp        pin             'toggle task x pin
           mov         delay,count     'get task x delay
           djnz        delay,#$        'count down delay
    
           call        #subroutine     'Jump to subroutine
                                       '  saves return address in
                                       '  thread local subroutine_ret.
    
           jmp         #loop           'loop (count + 3 clocks)
    
    subroutine
           'bla bla
           add count,  #1              'tweaks thread local count.
           'bla bla
           jmp        subroutine_ret   'Return from subroutine through thread local address.
    
  • ozpropdevozpropdev Posts: 2,791
    edited 2013-09-29 04:51
    Ok, I think we've got it now.
    Here's a program I just tested on my new DE2 board.
    Running 4 tasks calling the same routine.
    Both work fine. Hooray!

    con
    
    {{
    Chip answered my question on SETMAP as follows:
    
    
    	SETMAP #%1_010_010
    
    Those two 3-bit sub-fields can be %000..%111 to represent 1/2/4/8/16/32/64/128 sets
    or registers. For task-switched remapping, it makes sense to have
    4 sets (bits 5..3 = %010), but the set size (number of registers in a set, selected
    by bits 2..0) can be any power of 2 you'd like, up to 128
    }}
    
    
    	_clkmode = xinput
    	_xinfreq = 60_000_000
    
    
    	led0 = 32
    	led2 = 34
    	led4 = 36
    	led6 = 38
    
    dat
    		orgh	$e80
    
    		org
    
    pin		long	led0
    time		long	0
    delay		long	60_000_000
    flash_ret	long	0 
    
    		long	led2
    		long	0
    		long	30_000_000
    flash_ret2	long	0
    
    		long	led4
    		long	0
    		long	15_000_000
    flash_ret3	long	0
    
    		long	led6
    		long	0
    		long	7_500_000
    flash_ret4	long	0
    
    
    		mov	flash_ret,jmp_inst
    		mov	flash_ret2,jmp_inst
    		mov	flash_ret3,jmp_inst
    		mov	flash_ret4,jmp_inst
    
    		setmap	#%1_010_010        'remap registers by task, 4 sets, 4 registers each
    		jmptask	#task1,#%0010
    		jmptask	#task2,#%0100
    		jmptask	#task3,#%1000
    		settask	#%%3210
    
    
    task0		jmpret	flash_ret,#flash
    		jmp	#task0
    
    task1		jmpret	flash_ret,#flash
    		jmp	#task1
    
    
    task2		jmpret	flash_ret,#flash
    		jmp	#task2
    
    
    task3		jmpret	flash_ret,#flash
    		jmp	#task3
    
    
    flash		getcnt	time
    		add	time,delay
    		passcnt	time
    		notp	pin
    		jmp	flash_ret
    
    jmp_inst	jmp	#0
    
    
  • Heater.Heater. Posts: 21,230
    edited 2013-09-29 06:02
    Cool!

    I'm getting tired but isn't:
            jmpret  flash_ret,#flash
    
    the same as:
           call  #flash
    

    And what is that "jmp_inst" business about?

    Not sure I see why we need it.

    "mov flash_ret, jmp_inst" puts a jump to zero instruction at the return address location.
    The subroutine calls will over write 9 bits of that with their return addresses.
    But when we do the return with "jmp flash_ret" we are not using the JMP opcode we put in their (or are we?)

    Am I missing some points here?
  • timx8timx8 Posts: 6
    edited 2013-09-29 13:17
    I'm also curious if the call instruction would do the same thing. If "call #flash" is smart enough to follow the register remapping, to know where to write the return address, then it should be the same.

    Flash_ret(,2,3,4) holds the task-specific return address in the lower 9 bits. I don't think it also needs to hold the jump instruction. If you get rid of jmp_inst and the 4 lines that reference it, the code still works, right?

    Thanks for a demonstration of some cool techniques!
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2013-09-29 13:26
    Heater is right. The following is unnecessary:
    		mov	flash_ret,jmp_inst
    		mov	flash_ret2,jmp_inst
    		mov	flash_ret3,jmp_inst
    		mov	flash_ret4,jmp_inst
    

    -Phil
  • Heater.Heater. Posts: 21,230
    edited 2013-09-29 13:29
    Yes. We need to know! :)

    That "call #flash" is an assmbler pseudo op that should build the exact same "jmpret flash_ret,#flash" instruction bit for bit. At least it does for P1.

    And as you say we don't need the "jmp" opcode bits in those flash_retx slots.

    I just found out my DE0 nano has been trodden on, don't ask, and many of it's header pins are severely bent. Hope they don't fall off when I straighten them out and it still works.
  • Heater.Heater. Posts: 21,230
    edited 2013-09-29 13:31
    Thanks Phil,

    Our posts crossed over. I may be right but has anyone tested it yet? As I said above I'm having some technical hitches here....
  • ozpropdevozpropdev Posts: 2,791
    edited 2013-09-29 15:03
    Hi Guys

    You are correct!

    It was very late for me and looking at it this morning ...huh?
    For some reason I thought the JMP instruction needed to be there?
    Here's the corrected code that has been tested.

    Cheers
    Brian
    con
    
    	_clkmode = xinput
    	_xinfreq = 60_000_000
    
    
    	led0 = 32
    	led2 = 34
    	led4 = 36
    	led6 = 38
    
    dat
    		orgh	$e80
    
    		org
    
    pin		long	led0
    time		long	0
    delay		long	60_000_000
    flash_ret	long	0 
    
    		long	led2
    		long	0
    		long	30_000_000
    		long	0
    
    		long	led4
    		long	0
    		long	15_000_000
    		long	0
    
    		long	led6
    		long	0
    		long	7_500_000
    		long	0
    
    
    		setmap	#%1_010_010        'remap registers by task, 4 sets, 4 registers each
    		jmptask	#task1,#%0010
    		jmptask	#task2,#%0100
    		jmptask	#task3,#%1000
    		settask	#%%3210
    
    
    task0		call	#flash
    		jmp	#task0
    
    task1		call	#flash
    		jmp	#task1
    
    
    task2		call	#flash
    		jmp	#task2
    
    
    task3		call	#flash
    		jmp	#task3
    
    
    flash		getcnt	time
    		add	time,delay
    		passcnt	time
    		notp	pin
    		jmp	flash_ret
    
    
    
  • Heater.Heater. Posts: 21,230
    edited 2013-09-29 15:27
    ozpropdev,

    Fantastic, thanks for that confirmation.

    That is so clean and elegant.

    This problem turned out to be much easier than I was imagining yesterday.
Sign In or Register to comment.