sharing PASM2 routines among multiple languages
ersmith
in Propeller 2
It would be nice if there were a standard interface for calling a PASM2 routine from a high level language; a "pasmcall" calling convention, for lack of a better word. Such a calling convention would generally be in addition to the normal calling convention for the high level language.
The benefit is that we could re-use PASM2 hubexec code for common routines without having to re-invent the wheel. For example, it'd be nice to have a standard floating point library that could be plugged in to different languages. I'm not suggesting that this is the only way to do PASM2, or that it's a solution that works in all cases. I'm just proposing that it be an optional language feature that, if supported, would allow programmers to re-use their code in many places.
With that caveat in mind, what would such a calling convention look like? I can think of a few basic principles:
(a) The _ret_ instruction prefix is incredibly useful, but it returns through the cog's internal hardware stack, which means that the call would be via a standard "call" instruction (not "calla" or "calld"). Again, this would apply only to "pasmcall" functions; the normal language calling convention might require a stack for recursion, for example, and so be based on one of those other call instructions.
(b) The "pa" and "pb" registers are clearly designed to accept arguments, so let's pass the first two parameters in those. Return values could come back in "pa". Additional parameters would either go in the COG scratch registers (see below) or on the stack, if we can agree on what the stack looks like. Or, we could limit pasmcall functions to having at most 2 parameters.
(c) The function would need some scratch registers in COG memory, and/or some stack space to save registers in. We could specify that COG memory registers $1e8-$1ef are temporary registers that may be modified in pasmcall routines. It'd also be nice to have some standard for the stack, but that could be harder to reach agreement on (does the stack grow up or down? which register holds the stack pointer?).
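To make (a) through (c) concrete, here is a host-side C model rather than real PASM2: a cog's 512 longs of register space are an array, arguments go into PA ($1F6) and PB ($1F7), the result comes back in PA, and a wrapper checks that the routine touched nothing outside $1E8-$1EF, PA and PB. Treating PB as clobberable, and the names `pasmcall2`, `example_add` and `example_bad`, are my own assumptions for illustration, not part of any existing tool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* P2 cog register addresses (in longs). PA/PB are hardware registers;
   the $1E8-$1EF scratch range is only the proposal being discussed. */
enum {
    COG_REGS   = 512,
    SCRATCH_LO = 0x1E8,
    SCRATCH_HI = 0x1EF,
    REG_PA     = 0x1F6,
    REG_PB     = 0x1F7
};

typedef struct {
    uint32_t reg[COG_REGS];
} Cog;

/* A "pasmcall" routine modelled as a C function acting on the cog. */
typedef void (*PasmcallFn)(Cog *cog);

/* Registers a routine may change under this sketch of the convention. */
static bool may_clobber(unsigned addr) {
    return (addr >= SCRATCH_LO && addr <= SCRATCH_HI) ||
           addr == REG_PA || addr == REG_PB;
}

/* Pass a in PA and b in PB, run fn, return PA, and set *ok to false if
   fn modified any register outside the clobber set. */
static uint32_t pasmcall2(Cog *cog, PasmcallFn fn, uint32_t a, uint32_t b, bool *ok) {
    assert(cog != NULL && fn != NULL && ok != NULL);
    cog->reg[REG_PA] = a;
    cog->reg[REG_PB] = b;
    Cog before = *cog;                 /* snapshot after loading arguments */
    fn(cog);
    *ok = true;
    for (unsigned i = 0; i < (unsigned)COG_REGS; i++) {
        if (!may_clobber(i) && cog->reg[i] != before.reg[i]) {
            *ok = false;
        }
    }
    return cog->reg[REG_PA];
}

/* Hypothetical well-behaved routine: PA = PA + PB, via one scratch register. */
static void example_add(Cog *cog) {
    cog->reg[SCRATCH_LO] = cog->reg[REG_PA];
    cog->reg[REG_PA] = cog->reg[SCRATCH_LO] + cog->reg[REG_PB];
}

/* Hypothetical misbehaving routine: overwrites an ordinary register. */
static void example_bad(Cog *cog) {
    cog->reg[0x010] = 0xDEADBEEFu;
    cog->reg[REG_PA] = 0;
}
```

On the real chip the caller would load PA/PB and `call #routine`, and the routine would finish with a `_ret_`-prefixed instruction; the model only captures which registers are inputs, outputs, and fair game.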
Here's the stack usage I'm aware of at present:
(1) All the fastspin languages use ptra as the stack pointer, and the stack grows up.
(2) p2gcc uses COG memory location $27 (?) as the stack pointer, and the stack grows down.
(3) RISC-V gcc uses COG location $02 as the stack pointer, and the stack grows down.
(4) Catalina uses a COG location as the stack pointer (I think it may vary with mode?) and the stack grows down.
(5) In traditional Spin the stack grows up, so I expect the Spin2 interpreter will do this as well.
(6) TAQOZ seems to store its stack in LUT memory rather than HUB, and the stack grows up.
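The practical difference between these HUB stacks is the order of pointer update and memory access. Here is a rough C model over a small array standing in for hub RAM (the 1 KB size and all names are arbitrary assumptions): a grow-up push stores and then increments, roughly `wrlong x, ptra++`, while a grow-down push decrements and then stores, roughly `wrlong x, --ptra`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HUB_BYTES 1024u   /* small stand-in for hub RAM */

typedef struct {
    uint8_t  hub[HUB_BYTES];
    uint32_t sp;          /* byte address, playing the role of ptra */
} Stack;

static void wr_long(Stack *s, uint32_t addr, uint32_t v) {
    assert(addr <= HUB_BYTES - 4u);   /* also catches a wrapped-around sp */
    memcpy(&s->hub[addr], &v, 4);
}

static uint32_t rd_long(const Stack *s, uint32_t addr) {
    uint32_t v;
    assert(addr <= HUB_BYTES - 4u);
    memcpy(&v, &s->hub[addr], 4);
    return v;
}

/* Grow-up: push = store then +4 (wrlong x, ptra++),
            pop  = -4 then load  (rdlong x, --ptra). */
static void push_up(Stack *s, uint32_t v) { wr_long(s, s->sp, v); s->sp += 4; }
static uint32_t pop_up(Stack *s) { s->sp -= 4; return rd_long(s, s->sp); }

/* Grow-down: push = -4 then store (wrlong x, --ptra),
              pop  = load then +4  (rdlong x, ptra++). */
static void push_down(Stack *s, uint32_t v) { s->sp -= 4; wr_long(s, s->sp, v); }
static uint32_t pop_down(Stack *s) {
    uint32_t v = rd_long(s, s->sp);
    s->sp += 4;
    return v;
}
```

A routine that pushes in the wrong direction for its caller's stack would overwrite the caller's live data, which is why some agreement (or an explicit scratch pointer) is needed.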
It'd be nice if we all agreed to make HUB stacks grow down from the top of memory (I'm willing to change fastspin to do this). But if that's not feasible, then we could define that pasmcall routines would always get a pointer to some scratch memory in HUB that they could use. This memory could come from the stack, or a static area. I think it should be passed in ptra or ptrb.
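A sketch of the scratch-pointer idea, again as host-side C with hypothetical helper names: whichever way the caller's stack grows, the routine only ever sees PTRA pointing at the low end of a block that it may use in ascending order. The block size is left as a parameter; nothing here is an existing API.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t ptra;   /* value the caller would load into PTRA */
    uint32_t sp;     /* caller's stack pointer while the block is reserved */
} ScratchGrant;

/* Caller with a grow-down stack: reserve `bytes` below sp. */
static ScratchGrant scratch_from_down_stack(uint32_t sp, uint32_t bytes, uint32_t stack_limit) {
    assert(bytes % 4 == 0);
    assert(sp >= stack_limit + bytes);   /* stay inside the stack block */
    ScratchGrant g = { sp - bytes, sp - bytes };
    return g;
}

/* Caller with a grow-up stack: the block starts at the current sp. */
static ScratchGrant scratch_from_up_stack(uint32_t sp, uint32_t bytes, uint32_t stack_end) {
    assert(bytes % 4 == 0);
    assert(sp + bytes <= stack_end);
    ScratchGrant g = { sp, sp + bytes };
    return g;
}

/* Caller without a usable stack: hand out a fixed static area. */
static ScratchGrant scratch_from_static(uint32_t static_base, uint32_t sp) {
    ScratchGrant g = { static_base, sp };
    return g;
}
```

After the call the caller would release the block (a grow-down caller adds the size back to its stack pointer). A single static area would have to be per cog if more than one cog can call the routine at the same time.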
Comments? Suggestions?
Comments
Usually there are variables immediately after the code -- in fact if the video buffer is declared as an array that's where it probably will be. In any event the usual convention in most architectures is for the stack to grow down from the top of memory and the heap (dynamically allocated memory) to grow upwards. It's that way in ARM, x86, RISC-V, SPARC, and MIPS. (The various RISC processors generally don't have dedicated push/pop instructions and could grow the stack either way, but the official calling conventions use down).
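A small C model of that conventional layout, assuming the full 512 KB of P2 hub RAM and a hypothetical `end_of_program` address: the heap starts just past the program's code and variables and grows up, the stack starts at the top and grows down, and both draw from the single free gap between them, so neither side needs a fixed size up front.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HUB_TOP 0x80000u   /* 512 KB of P2 hub RAM */

typedef struct {
    uint32_t heap_end;   /* first free byte above the heap; grows up */
    uint32_t sp;         /* stack pointer; grows down from HUB_TOP */
} MemMap;

static MemMap memmap_init(uint32_t end_of_program) {
    assert(end_of_program <= HUB_TOP);
    MemMap m = { end_of_program, HUB_TOP };
    return m;
}

/* Heap grows up; returns 0 if the allocation would cross the stack. */
static uint32_t heap_alloc(MemMap *m, uint32_t bytes) {
    uint32_t size = (bytes + 3u) & ~3u;          /* keep long alignment */
    if (size > m->sp - m->heap_end) return 0;     /* heap_end <= sp always */
    uint32_t p = m->heap_end;
    m->heap_end += size;
    return p;
}

/* Stack grows down; fails rather than overwrite the heap. */
static bool stack_push_frame(MemMap *m, uint32_t bytes) {
    uint32_t size = (bytes + 3u) & ~3u;
    if (size > m->sp - m->heap_end) return false;
    m->sp -= size;
    return true;
}
```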
I agree that $1E8..$1EF would be good for a scratchpad. Probably also optimal for parameter passing. Maybe it should be fully $1E0..$1EF for everything. It's true that PA and PB have advantages in some cases, but I don't know if mandating their use as the conduit would be best. Certainly, PA/PB/PTRA/PTRB should be free for use within called routines. Spin2 restores them on return after an outside call.
For the called hub code, it might be good to afford 2 to 4 hardware-stack levels for simple CALL/RET activity.
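A budget check for that suggestion, sketched in C. It assumes the P2 cog's 8-level hardware stack and picks 4 levels (the upper end of the suggested range) for the routine's own nested CALL/RET; the constant and function names are hypothetical. The CALL that enters the routine takes one level itself.

```c
#include <assert.h>
#include <stdbool.h>

#define HW_STACK_LEVELS    8   /* depth of the cog's internal CALL/RET stack */
#define PASMCALL_HW_LEVELS 4   /* assumed allowance for the called routine */

/* caller_depth: hardware-stack levels already in use when the caller
   issues the CALL into the pasmcall routine. */
static bool can_pasmcall(int caller_depth) {
    assert(caller_depth >= 0 && caller_depth <= HW_STACK_LEVELS);
    /* one level for the entering CALL, plus the routine's own budget */
    return caller_depth + 1 + PASMCALL_HW_LEVELS <= HW_STACK_LEVELS;
}
```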
Yes, it depends on the mode - in Compact or LMM mode, Catalina uses arbitrary cog locations since these modes were carried over from the P1, and it doesn't seem worth the effort of changing them.
But in Native mode it uses PTRA as stack pointer and PTRB as frame pointer. If I ever do go back and change the other modes, this is what I would change them to.
As you say, the usual convention in stack-oriented languages is that stack grows down and heap grows up (and ne'er the twain shall meet!). I see no real need to flout this convention as it will just make porting or integrating with new languages that much more complicated.
I don't really understand Chip's comment about multiple cogs making a difference here. Each cog has to be allocated a fixed and separate block of RAM as stack, or they will interfere, so it doesn't seem to matter whether you start from the top or the bottom of the block. I do know Chip has "hardwired" stacks to grow upwards if you use the CALLA or CALLB instructions, but I don't use them because CALLD gives you easier access to the PC, which is necessary in many cases (so CALLA and CALLB are actually more complex to use).
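A C sketch of the per-cog point: each cog gets its own fixed region, and the initial stack pointer is just the bottom of that block for a grow-up stack or one past its top for a grow-down stack. Eight equal-sized blocks and the helper names are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_COGS 8

typedef struct {
    uint32_t base;   /* lowest byte address of this cog's stack block */
    uint32_t size;   /* block size in bytes */
} StackBlock;

/* Split [region_base, region_base + region_size) into equal per-cog blocks. */
static void assign_stack_blocks(uint32_t region_base, uint32_t region_size,
                                StackBlock out[NUM_COGS]) {
    uint32_t each = (region_size / NUM_COGS) & ~3u;   /* long-aligned share */
    assert(each > 0);
    for (int i = 0; i < NUM_COGS; i++) {
        out[i].base = region_base + (uint32_t)i * each;
        out[i].size = each;
    }
}

/* Grow-up stacks start at the bottom of the block, grow-down stacks one
   past the top; either way all pushes stay inside this block. */
static uint32_t initial_sp(const StackBlock *b, int grows_up) {
    return grows_up ? b->base : b->base + b->size;
}
```

Because the blocks don't overlap, the direction chosen inside one block has no effect on any other cog.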
However, for PASM functions called from high-level languages, we could have a separate convention that the block of memory allocated for stack grows up, not down. This would allow PASM functions to use CALLA and CALLB.
Multiple processors can share a heap, but not a stack. So it makes sense for the heap to start at one end of memory or the other. Stacks can be allocated anywhere.
Exactly.
Perhaps Spin2 should declare a stack size?
That's a good idea, Ross. Maybe the "pasmcall" convention should specify that ptra points to the beginning of a block of HUB memory that's at least 32 longs (128 bytes) in size, which the PASM code may use for a stack or for any other purpose it wants.
Yes! I know in the case of Spin2, PTRA will already be pointing to the current stack, so the PASM code could just continue from there, building upward using CALLA. No need to do anything special, except be sure you've got 32 more stack longs than you'd otherwise need for your Spin2 code.
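A sketch of that sizing rule in C, assuming a grow-up Spin2-style stack with byte addresses and the 32-long minimum proposed for pasmcall; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PASMCALL_SCRATCH_LONGS 32u   /* proposed minimum block for pasmcall */

/* Longs to declare for a stack whose own code needs `spin_longs`
   at its deepest point, leaving room for a pasmcall routine on top. */
static uint32_t stack_longs_needed(uint32_t spin_longs) {
    return spin_longs + PASMCALL_SCRATCH_LONGS;
}

/* At the moment of the call, can the routine continue upward from PTRA
   (e.g. with CALLA) for at least 32 longs without leaving the stack? */
static bool room_for_pasmcall(uint32_t ptra, uint32_t stack_base, uint32_t stack_longs) {
    uint32_t stack_end = stack_base + stack_longs * 4u;
    assert(ptra >= stack_base && ptra <= stack_end);
    return stack_end - ptra >= PASMCALL_SCRATCH_LONGS * 4u;
}
```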