Calling conventions
moony
Posts: 27
in Propeller 2
Bit of a preface: I was unable to get P2LLVM to build. User error? Probably. But I thought it'd be a more fun learning experience to start the process of designing an LLVM backend from scratch, so here we are.
I was hoping the people here could help me figure out which calling conventions would be worth implementing. The P2 has a very large regfile, so giving LLVM as much of that as possible is going to be beneficial for performance, especially as LLVM is smart enough to not take more than it needs. However, there isn't really a one-size-fits-all solution. I was hoping to get a discussion going on which CC layouts could be best, and which one to make the default.
I personally was thinking a 256-register CC, with 8 caller-saved registers, 248 callee-saved registers, two return registers, and some number of argument registers (I couldn't figure out a good number), could make a good default. This leaves 240 registers for programmer allocation and possibly also for use by interrupt routines (which need their own CC; I'm even further from figuring that out, as I need a different CC for each interrupt level).
A 32 register option would also be good for low-intrusiveness C functions (where much of the code may be ASM and wants as much of cog ram as possible), or even a minimal CC (all regs callee saved, only 8-16 regs) to avoid needing to spill at all in an assembly caller.
I wanted to make sure that SP is in PTRA in all calling conventions, specifically as this allows very easily starting a C function on another cog with no setup cost due to being able to specify the PTRA from the starting cog.
Any thoughts? Any other CCs that may be a good idea? I know spin2interp and TAQOZ compatible CCs would be good as well.
Comments
Putting PA/PB/PTRB to use for something might be a good idea, too. Not sure if that's possible, but passing the first pointer-type argument (read: the C++ this pointer) in PTRB would map really well to object oriented code.
This is my current draft of the C Calling Convention.
Hoping for feedback on the layout, as, importantly, once these are implemented they're not really changeable without breaking backwards compatibility.
I think you're missing a stack frame pointer?
PTRB might be better off callee-saved (and not carrying the return value), since it is quite likely to stay the same through multiple call levels.
Also, flexspin does a cool trick to get out of having to push/pop the hub stack for every call: The internal stack is used for calls, but non-leaf functions pop the return addr/flags into a register and save it along with the other registers (and then JMP through it to return). This, as it turns out, is mostly faster than CALLA/RETA (see below), but takes some additional instrs in non-leaf functions (offset by the ability to use _RET_ in leaf functions). Then again, maybe a bit complex for the basic CC.