sharing PASM2 routines among multiple languages

ersmith · 2019-06-27 13:07

It would be nice if there were a standard interface for calling a PASM2 routine from a high level language; a "pasmcall" calling convention, for lack of a better word. Such a calling convention would generally be in addition to the normal calling convention for the high level language.

The benefit is that we could re-use PASM2 hubexec code for common routines without having to re-invent the wheel. For example, it'd be nice to have a standard floating point library that could be plugged in to different languages. I'm not suggesting that this is the only way to do PASM2, or that it's a solution that works in all cases. I'm just proposing that it be an optional language feature that, if supported, would allow programmers to re-use their code in many places.

With that caveat in mind, what would such a calling convention look like? I can think of a few basic principles:

(a) The _ret_ instruction prefix is incredibly useful, but it means that the call would be via a standard "call" instruction (not "calla" or "calld"). Again, this would apply only to "pasmcall" functions; the normal language calling convention might require a stack for recursion, for example, and so be based on one of those other call instructions.

(b) The "pa" and "pb" registers are clearly designed to accept arguments, so let's pass the first two parameters in those. Return values could come back in "pa". Additional parameters would either go in the COG scratch registers (see below) or on the stack, if we can agree on what the stack looks like. Or, we could limit pasmcall functions to having at most 2 parameters.

(c) The function would need some scratch registers in COG memory, and/or some stack space to save registers in. We could specify that COG memory registers $1e8-$1ef are temporary registers that may be modified in pasmcall routines. It'd also be nice to have some standard for the stack, but that could be harder to reach agreement on (does the stack grow up or down? which register holds the stack pointer?).
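
To make points (a) through (c) concrete, here is a small Python model of the proposed contract. It is only an illustration, not PASM2 and not what any compiler actually does. The helper names `add_u32` and `bad_routine` are made up for the example, and the clobber set (PA, PB and $1e8-$1ef, nothing else) is just one reading of the proposal; $1F6 and $1F7 are the P2 register addresses of PA and PB.

```python
# Toy model of the proposed "pasmcall" contract (an illustration, not PASM2).
PA, PB = 0x1F6, 0x1F7                 # P2 register addresses for PA and PB
SCRATCH = range(0x1E8, 0x1F0)         # proposed temporaries $1e8-$1ef
MASK32 = 0xFFFFFFFF


def pasmcall(routine, regs, a=0, b=0):
    """Call `routine` with a in PA and b in PB; return the value left in PA.

    Raises AssertionError if the routine modifies any register outside
    PA, PB and the scratch range (ASSUMED here to be the clobber set).
    """
    regs[PA], regs[PB] = a & MASK32, b & MASK32
    before = list(regs)
    routine(regs)
    allowed = {PA, PB, *SCRATCH}
    changed = {i for i, (old, new) in enumerate(zip(before, regs)) if old != new}
    assert changed <= allowed, f"clobbered: {sorted(hex(i) for i in changed - allowed)}"
    return regs[PA]


def add_u32(regs):
    """Hypothetical leaf routine: models `_ret_ add pa, pb`."""
    regs[PA] = (regs[PA] + regs[PB]) & MASK32


def bad_routine(regs):
    """Hypothetical routine that breaks the contract by writing $010."""
    regs[0x010] = 123


regs = [0] * 512                      # one cog's 512 register longs
result = pasmcall(add_u32, regs, 5, 7)
```

A checker like this could be pointed at an emulated routine to confirm it stays within whatever register set the convention finally allows.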

Here's the stack usage I'm aware of at present:

(1) All the fastspin languages use ptra as the stack pointer, and the stack grows up.
(2) p2gcc uses COG memory location $27 (?) as the stack pointer, and the stack grows down.
(3) RISC-V gcc uses COG location $02 as the stack pointer, and the stack grows down.
(4) Catalina uses a COG location as the stack pointer (I think it may vary with mode?) and the stack grows down.
(5) In traditional Spin the stack grows up, so I expect the Spin2 interpreter will do this as well.
(6) TAQOZ seems to store its stack in LUT memory rather than HUB, and the stack grows up.

It'd be nice if we all agreed to make HUB stacks grow down from the top of memory (I'm willing to change fastspin to do this). But if that's not feasible, then we could define that pasmcall routines would always get a pointer to some scratch memory in HUB that they could use. This memory could come from the stack, or a static area. I think it should be passed in ptra or ptrb.

Comments? Suggestions?

Electrodude · 2019-06-27 14:39

Why shouldn't the standard be that the stack grows up, as in Spin on the P1? That way, if it overflows, it will clobber buffers at the end of memory rather than code at the beginning. A partially corrupted video buffer is better than corrupted code.

ersmith · 2019-06-27 15:26

Electrodude wrote: »

Why shouldn't the standard be that the stack grows up, as in Spin on the P1? That way, if it overflows, it will clobber buffers at the end of memory rather than code at the beginning. A partially corrupted video buffer is better than corrupted code.

Usually there are variables immediately after the code -- in fact if the video buffer is declared as an array that's where it probably will be. In any event the usual convention in most architectures is for the stack to grow down from the top of memory and the heap (dynamically allocated memory) to grow upwards. It's that way in ARM, x86, RISC-V, SPARC, and MIPS. (The various RISC processors generally don't have dedicated push/pop instructions and could grow the stack either way, but the official calling conventions use down).

cgracey · 2019-06-27 18:49

Eric, sounds good, but I'm kind of a fan of stacks growing upwards, since there is likely more than one of them. Growing downwards would be great if there was only one cog, but we have eight cogs. So, there are going to be potentially 8 stacks. Nothing special about growing down from the top in that case.

I agree that $1E8..$1EF would be good for a scratchpad. Probably also optimal for parameter passing. Maybe it should be fully $1E0..$1EF for everything. It's true that PA and PB have an advantage in some cases, but I don't know if mandating their use as the conduit would be best. Certainly, PA/PB/PTRA/PTRB should be free for use within called routines. Spin2 restores them on return after an outside call.

For the called hub code, it might be good to afford 2 to 4 hardware-stack levels for simple CALL/RET activity.

Dave Hein · 2019-06-27 19:05

FYI, when calling immediate addresses greater than 511 the CALLD instruction can only use PA, PB, PTRA or PTRB for the destination register. p2gcc uses PA to hold the return address.

RossH · 2019-06-29 00:03

ersmith wrote: »

(4) Catalina uses a COG location as the stack pointer (I think it may vary with mode?) and the stack grows down

Yes, it depends on the mode - in Compact or LMM mode, Catalina uses arbitrary cog locations since these modes were carried over from the P1, and it doesn't seem worth the effort of changing them.

But in Native mode it uses PTRA as stack pointer and PTRB as frame pointer. If I ever do go back and change the other modes, this is what I would change them to.

It'd be nice if we all agreed to make HUB stacks grow down from the top of memory (I'm willing to change fastspin to do this). But if that's not feasible, then we could define that pasmcall routines would always get a pointer to some scratch memory in HUB that they could use. This memory could come from the stack, or a static area. I think it should be passed in ptra or ptrb

Comments? Suggestions?

As you say, the usual convention in stack-oriented languages is that stack grows down and heap grows up (and ne'er the twain shall meet!). I see no real need to flout this convention as it will just make porting or integrating with new languages that much more complicated.

I don't really understand Chip's comment about multiple cogs making a difference here. Each cog has to be allocated a fixed and separate block of RAM as stack, or they will interfere - so it doesn't seem to matter whether you start from the top or the bottom of the block. I do know Chip has "hardwired" stacks to grow upwards if you use the CALLA or CALLB instructions, but I don't use them because CALLD gives you easier access to the PC, which is necessary in many cases (so CALLA and CALLB are actually more complex to use).

However, for PASM functions called from high-level languages, we could have a separate convention that the block of memory allocated for stack grows up, not down. This would allow PASM functions to use CALLA and CALLB.

cgracey · 2019-06-29 00:27

This business about the stack growing down and the heap growing up implies single-processor activity, doesn't it? If you had two processors, you would need two stacks. No point in pinning stacks to the end of memory, then. As long as you have to have multiple RAM blocks allocated for stacks, it doesn't really matter whether they grow up or down, anymore, does it?

RossH · 2019-06-29 00:30

cgracey wrote: »

This business about the stack growing down and the heap growing up implies single-processor activity, doesn't it?

Multiple processors can share a heap, but not a stack. So it makes sense for the heap to start at one end of memory or the other. Stacks can be allocated anywhere.

If you had two processors, you would need two stacks. No point in pinning stacks to the end of memory, then. As long as you have to have multiple RAM blocks allocated for stacks, it doesn't really matter whether they grow up or down, anymore, does it?

Exactly.
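
The point of agreement here can be sketched with a toy Python model. The block size and base addresses are made up for the illustration. Once every cog has its own fixed stack block, the growth direction only decides which neighbour gets hit on overflow.

```python
# Toy model: per-cog stack blocks, growing up versus growing down.
LONG = 4                              # bytes per long
BLOCK = 256                           # hypothetical stack block size in bytes
BASE = 0x40000                        # arbitrary example HUB address

# Eight hypothetical stack blocks, one per cog, laid out back to back.
bases = [BASE + n * BLOCK for n in range(8)]


def first_overflow_address(base, grows_up):
    """Byte address of the first long written outside [base, base + BLOCK)."""
    return base + BLOCK if grows_up else base - LONG


def victim(cog, grows_up):
    """Index of the neighbouring block hit when `cog` overflows, or None
    if the overflow lands outside all eight stack blocks."""
    addr = first_overflow_address(bases[cog], grows_up)
    for n, b in enumerate(bases):
        if b <= addr < b + BLOCK:
            return n
    return None


up_victim = victim(3, grows_up=True)      # block just above cog 3's block
down_victim = victim(3, grows_up=False)   # block just below cog 3's block
```

Either way each block holds the same number of longs. Only the stack at the very edge of the region overflows into something other than another stack.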

Cluso99 · 2019-06-29 01:57

I think the Spin stack grows up because the first Spin program knows where the end of its code is, and the stack grows up from there. The primary Spin program doesn’t have any reserved size. Other Spin processes (the equivalent of other cogs) must declare how much stack space is required, and that is reserved below the primary cog's stack.
Perhaps spin2 should declare a stack size???

ersmith · 2019-06-29 13:23

RossH wrote: »

However, for PASM functions called from high-level languages, we could have a separate convention that the block of memory allocated for stack grows up, not down. This would allow PASM functions to use CALLA and CALLB.

That's a good idea, Ross. Maybe the "pasmcall" convention should specify that ptra points to the beginning of a block of HUB memory that's at least 32 longs (128 bytes) in size, which the PASM code may use for a stack or for any other purpose it wants.

cgracey · 2019-06-29 15:27

ersmith wrote: »

RossH wrote: »

However, for PASM functions called from high-level languages, we could have a separate convention that the block of memory allocated for stack grows up, not down. This would allow PASM functions to use CALLA and CALLB.

That's a good idea, Ross. Maybe the "pasmcall" convention should specify that ptra points to the beginning of a block of HUB memory that's at least 32 longs (128 bytes) in size, which the PASM code may use for a stack or for any other purpose it wants.

Yes! I know in the case of Spin2, PTRA will already be pointing to the current stack, so the PASM code could just continue from there, building upward using CALLA. No need to do anything special, except be sure you've got 32 more stack longs than you'd otherwise need for your Spin2 code.
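
As a rough illustration of "building upward using CALLA", here is a toy Python model of the push and pop behaviour: CALLA writes the return long at PTRA and then advances PTRA by 4, and RETA steps PTRA back by 4 and reads the long. The `Cog` class, the example addresses and the overflow check are assumptions made for the sketch. The real hardware has no such check and also stores the C and Z flags in the pushed long, which the model ignores.

```python
# Toy model of CALLA/RETA pushing return addresses upward through PTRA.
import struct

SCRATCH_LONGS = 32                      # block size suggested in the thread


class Cog:
    def __init__(self, hub_size, ptra):
        self.hub = bytearray(hub_size)  # stand-in for HUB RAM
        self.ptra = ptra                # byte address of the next free long
        self.limit = ptra + 4 * SCRATCH_LONGS

    def calla(self, return_addr):
        """Model CALLA: write the return long at PTRA, then PTRA += 4."""
        if self.ptra + 4 > self.limit:  # model-only guard, not in hardware
            raise OverflowError("scratch block exhausted")
        struct.pack_into("<I", self.hub, self.ptra, return_addr)
        self.ptra += 4

    def reta(self):
        """Model RETA: PTRA -= 4, then read the return long back."""
        self.ptra -= 4
        return struct.unpack_from("<I", self.hub, self.ptra)[0]


cog = Cog(hub_size=1024, ptra=0x100)    # 0x100 is an arbitrary example address
cog.calla(0x00400)                      # outer PASM routine calls a helper
cog.calla(0x00480)                      # helper calls another helper
inner = cog.reta()                      # pops the most recent return address
outer = cog.reta()                      # pops the first one; PTRA is back at 0x100
```

In this model the 32-long block allows 32 nested CALLA levels if the PASM code uses the block for nothing else. Any registers saved there reduce the depth accordingly.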