Calling conventions

moony · 2021-01-20 03:49

Bit of a preface: I was unable to get P2LLVM to build. User error? Probably. But I thought it'd be a more fun learning experience to start the process of designing an LLVM backend from scratch, so here we are.

I was hoping the people here could help me figure out which calling conventions would be worth implementing. The P2 has a very large regfile, so giving LLVM as much of it as possible should help performance, especially as LLVM is smart enough not to take more than it needs. However, there isn't really a one-size-fits-all solution. I was hoping to get a discussion going on which CC layouts would work best, and which one to make the default.

I personally was thinking that a 256-register CC, with 8 caller-saved registers, 248 callee-saved registers, two return registers, and some number of argument registers (I couldn't settle on a good number), could make a good default. This leaves 240 registers for programmer allocation and possibly also for use by interrupt routines, which need their own CC. I'm even further from figuring that part out, as I need a different CC for each interrupt level.
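As a rough sanity check on those numbers, here is a minimal Python sketch. It assumes 512 cog longs with the top 16 set aside as special or dual-purpose registers; that figure and the helper name `register_budget` are assumptions for illustration, not part of any actual backend.

```python
# Assumed cog layout: 512 longs, with the top 16 treated as unavailable to the
# register allocator (special and dual-purpose registers). This is an
# assumption for the sketch, not something the backend defines.
COG_LONGS = 512
SPECIAL_LONGS = 16


def register_budget(cc_size, caller_saved):
    """Split a calling convention of cc_size registers and report what is left."""
    general = COG_LONGS - SPECIAL_LONGS
    if not 0 <= caller_saved <= cc_size <= general:
        raise ValueError("calling convention does not fit in cog RAM")
    return {
        "caller_saved": caller_saved,
        "callee_saved": cc_size - caller_saved,
        "free_for_programmer": general - cc_size,
    }


# The proposed default: 256 registers, 8 of them caller-saved.
default_cc = register_budget(cc_size=256, caller_saved=8)
print(default_cc)
```

The same helper can be pointed at a smaller layout, for example `register_budget(32, 8)`, to see how much cog RAM a low-intrusiveness CC would leave for hand-written assembly.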

A 32 register option would also be good for low-intrusiveness C functions (where much of the code may be ASM and wants as much of cog ram as possible), or even a minimal CC (all regs callee saved, only 8-16 regs) to avoid needing to spill at all in an assembly caller.

I wanted to make sure that SP is in PTRA in all calling conventions, specifically as this allows very easily starting a C function on another cog with no setup cost due to being able to specify the PTRA from the starting cog.

Any thoughts? Any other CCs that may be a good idea? I know spin2interp and TAQOZ compatible CCs would be good as well.

Wuerfel_21 · 2021-01-20 12:42

(For those playing along at home, we two talked about this on discord a bunch already)
Putting PA/PB/PTRB to use for something might be a good idea, too. Not sure if that's possible, but passing the first pointer-type argument (read: the C++ this pointer) in PTRB would map really well to object oriented code.

moony · 2021-01-20 14:08

@Wuerfel_21: Oh absolutely, I was going to try and see if I could make intelligent use of PTRB in the future. I'll note that down, as yea, it'd be good for OO and for functions that do most of their operations on a struct. Not using it would be silly, as it's the best way to access structures in memory and often it may be worth the allocation pressure to try and keep an in-use struct pointer in it at all times. As spilling should be pretty rare due to the high reg count (likely will do most of the spilling at entry), could also make sense to use PTRA for this task sometimes. Moving the SP can be a bit complicated though so that'd definitely be a stretch goal backend wise.

moony · 2021-01-20 14:52

On a different note, @"Peter Jakacki" Do you have any recommendations for producing a TAQOZ-compatible calling convention? I don't fully understand how TAQOZ works so you're probably the best person to ask.

moony · 2021-01-21 17:28

https://gist.github.com/moonheart08/517dd2ac1909ebfb15cd9eb69d421353
This is my current draft of the C Calling Convention.
Hoping for feedback on the layout, as, importantly, once these are implemented they're not really changeable without breaking backwards compatibility.

Wuerfel_21 · 2021-01-21 18:43

Looks good.

I think you're missing a stack frame pointer?

PTRB might be better off callee-saved (and not carrying the return value), since it is quite likely to stay the same through multiple call levels.

Also, flexspin does a cool trick to get out of having to push/pop the hub stack for every call: The internal stack is used for calls, but non-leaf functions pop the return addr/flags into a register and save it along with the other registers (and then JMP through it to return). This, as it turns out, is mostly faster than CALLA/RETA (see below), but takes some additional instrs in non-leaf functions (offset by the ability to use _RET_ in leaf functions).

(All of these not counting FIFO load delay)
CALLA+RETA = 5..12 + 11..18 = 16..30 cycles
CALL+RET (leaf func) = 4+4 = 8 cycles
CALL+POP+save/restore 1 more register+JMP (non-leaf, saving other regs) = 4+2+1+1+4 = 12 cycles
CALL+POP+PUSHA+RETA (non-leaf, not saving other regs) = 4+2+3..10+11..18 = 20..34 cycles

Then again, maybe a bit complex for the basic CC.
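
For anyone who wants to play with the timing comparison, the sums can be reproduced with a small Python sketch. The per-instruction costs are copied from the figures in this post (FIFO load delay excluded); the helper `add_ranges` and the dictionary layout are hypothetical names made up for the example.

```python
def add_ranges(*parts):
    """Sum cycle costs, where each part is a fixed int or a (min, max) tuple."""
    lo = hi = 0
    for part in parts:
        p_lo, p_hi = (part, part) if isinstance(part, int) else part
        lo += p_lo
        hi += p_hi
    return lo, hi


# Costs as quoted in the post; hub-stack instructions vary with hub timing.
CALL, RET, POP, JMP = 4, 4, 2, 4
SAVE, RESTORE = 1, 1  # one extra register in an existing save/restore
CALLA, PUSHA, RETA = (5, 12), (3, 10), (11, 18)

sequences = {
    "CALLA+RETA": add_ranges(CALLA, RETA),
    "CALL+RET (leaf)": add_ranges(CALL, RET),
    "CALL+POP+save/restore+JMP": add_ranges(CALL, POP, SAVE, RESTORE, JMP),
    "CALL+POP+PUSHA+RETA": add_ranges(CALL, POP, PUSHA, RETA),
}

for name, (lo, hi) in sequences.items():
    cycles = str(lo) if lo == hi else f"{lo}..{hi}"
    print(f"{name}: {cycles} cycles")
```

Keeping the costs as (min, max) pairs makes it easy to swap in different figures and see which sequence still comes out ahead.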
