Recursive/Reentrant code in COG?
Heater.
Posts: 21,230
We are all familaiar with the fact that the Propeller COG does not have a call stack. Rather it uses an old computer technique of writing the return address for subroutine calls to some known location where it can be picked up at the end of the subroutine (JMPRET).
This is not normally an issue for COG code as it's not often one want's to put a recursive algorithm in COG. (Has any one ever done so? Even the recursive Fibo benchmark?).
However we now have the possibility of running multiple threads in a COG. That means that different threads may want to call the same subroutine. Rather than having multiple copies of the same code in COG we migh need to make it a subroutine(s) to fit things in the space available.
What happens then?
Using normal COG CALL/RET won't work. Thread A can call a subroutine, which places the return address for thread A at the end of the routine (normally). Thread B migh call the same routine before thread A has returned. Ooops we have just now overwritten A's return address. Thread A could eaily end up running thread B's code or some such horror.
What we need is a stack based call return mechanism for thread safe calls.
Anyone attempted this yet?
This is not normally an issue for COG code as it's not often one want's to put a recursive algorithm in COG. (Has any one ever done so? Even the recursive Fibo benchmark?).
However we now have the possibility of running multiple threads in a COG. That means that different threads may want to call the same subroutine. Rather than having multiple copies of the same code in COG we migh need to make it a subroutine(s) to fit things in the space available.
What happens then?
Using normal COG CALL/RET won't work. Thread A can call a subroutine, which places the return address for thread A at the end of the routine (normally). Thread B migh call the same routine before thread A has returned. Ooops we have just now overwritten A's return address. Thread A could eaily end up running thread B's code or some such horror.
What we need is a stack based call return mechanism for thread safe calls.
Anyone attempted this yet?
Comments
That's a tricky one!
-Phil
See CALLA, CALLB, RETA, RETB
Andy
Using CALLA,RETA doesn't get around the multi thread issue of who called it first?
For example
Thread 1 calls routine CALLA #XYZ.
Thread 2 then does the same, CALLA #XYZ
When thread 1 completes call it hits a RETA which returns to thread 2's PC not thread 1's.
I think I've got that right?
Cheers
Brian
If so, all that is needed is for a call, which is actually a JMPRET to place the return address in some place unique to the thread where it can be picked up by another JMPRET at return time. Rather than placing the return address in the RETURN instruction as usual.
This would not allow recursive calls within a thread but would allow reentrant calls between threads as the return addresses get stored in different places.
No stack required.
Edit: Oops, sorry Phil that's what you said:
Yes you are right.
I think there are anyway other problems beside the return address. Also all the registers you use in the subroutine can get written by one thread and read by another. You will also need local registers on the stack and a stackpointer per thread.
But maybe Prop 2's register remapping can help here? I don't really understand that yet...
Andy
I think using SETMAP could get around the register overwrite issue and the return address storage issues.
The problem is the mechanism to retrieve the return address related to the current thread.
I'm dreaming here but if we had SPC and SPD as well as SPA and SB, we could have a CALLC and CALLD.
This in addition to a RETX for example that pops a return address from the stack based on thread number. Thread 0 pops SPA,1 pops SPB etc.
That would be nice. Just dreaming....(Sorry Chip, Ken, Beau and others..)
Cheers
Brian
In the back of my mind was the idea that every thread is working with it's own variable, via "register remapping" or what ever it is called. It was only the return addresses that woke me up in the middle of the night nagging that they are in the shared instruction space normally.
Now, where is that example of threads using common code and separate variables?
Seems to me the same technique of making calls and returns with JMPRET could be used by a thread such that it's return address is stored in it's thread local storage. Of course every subroutine you call would have to have it's own return address storage location defined.
We should not need to request any new instructions in the PII.
Bingo, that's just what I was thinking of. Thanks.
So, can we do reentrant, thread safe, subroutines as simply as this:
1) Make the subroutine call with CALL as normal.
2) Ensure the subroutine_ret lable and return instruction are in the threads data area.
3) To return from the subroutine just jump there.
Like so:
Anyone got their FPGA set up to test this?
Wouldn't the RET instruction cause a problem when the cog is launched?
My understanding is that the variables are treated as NOP's and are ignored.
This relies on the values not being decoded as instructions?
Edit: Only 18 bit values can be used so there not decoded.
Brian
OK can't we just reserve a space with zero in it to store return addresses and and just do a JMP through that location (no #) for our returns? Like so:
Here's a program I just tested on my new DE2 board.
Running 4 tasks calling the same routine.
Both work fine. Hooray!
I'm getting tired but isn't: the same as:
And what is that "jmp_inst" business about?
Not sure I see why we need it.
"mov flash_ret, jmp_inst" puts a jump to zero instruction at the return address location.
The subroutine calls will over write 9 bits of that with their return addresses.
But when we do the return with "jmp flash_ret" we are not using the JMP opcode we put in their (or are we?)
Am I missing some points here?
Flash_ret(,2,3,4) holds the task-specific return address in the lower 9 bits. I don't think it also needs to hold the jump instruction. If you get rid of jmp_inst and the 4 lines that reference it, the code still works, right?
Thanks for a demonstration of some cool techniques!
-Phil
That "call #flash" is an assmbler pseudo op that should build the exact same "jmpret flash_ret,#flash" instruction bit for bit. At least it does for P1.
And as you say we don't need the "jmp" opcode bits in those flash_retx slots.
I just found out my DE0 nano has been trodden on, don't ask, and many of it's header pins are severely bent. Hope they don't fall off when I straighten them out and it still works.
Our posts crossed over. I may be right but has anyone tested it yet? As I said above I'm having some technical hitches here....
You are correct!
It was very late for me and looking at it this morning ...huh?
For some reason I thought the JMP instruction needed to be there?
Here's the corrected code that has been tested.
Cheers
Brian
Fantastic, thanks for that confirmation.
That is so clean and elegant.
This problem turned out to be much easier than I was imagining yesterday.